D2I DEV (non-prod workbench)
D2I Development Plan Overview
Key Objectives (Phase 1 & 2)
- Extract specified data sub-set as JSON via query
- Provide capability for automated JSON query extract via script
- Enhance automated JSON query extract(now payload) to enable send to defined API endpoint
- Develop mechanism(s) to enable API response logging within SSD/persistent payload table
- Develop mechanism(s) to enable SSD row-level change tracking towards delta extracts
- Transition from initial full payload submissions to daily delta updates
- Ensure minimal manual intervention with configurable automation
- Design towards potential additional fields inclusion/future changes
Development Data flows Roadmap (Phase 1 & 2)
sequenceDiagram
autonumber
participant DevSim as Python Script (Simulate Changes)
participant SQL as CMS Db Server: ssd_api_data_staging(_anon)
participant Ingest as SQL Update SSD Staging Table
participant PSScript as PowerShell Script (Delta Submission)
participant API as External API
%% Development | Testing
DevSim->>SQL: Modify json_payload, current_hash
DevSim->>SQL: Set row_state = 'updated'
DevSim->>SQL: Set submission_status = 'pending'
Note over DevSim,SQL: Development | Testing
%% Deployment | Live Start
Ingest->>SQL: Build JSON structure (per child record)
Ingest->>SQL: MERGE into SSD staging table
alt Matched and changed
SQL->>SQL: Update json_payload, current_hash
SQL->>SQL: Set row_state = 'updated'
else Matched and unchanged
SQL->>SQL: Set row_state = 'unchanged'
else New record (not matched)
SQL->>SQL: Insert new row
SQL->>SQL: Set row_state = 'new'
SQL->>SQL: Set submission_status = 'pending'
else Missing in source
SQL->>SQL: Set row_state = 'deleted'
end
Note over Ingest,SQL: Deployment | Live Daily Ingestion
%% PowerShell Script Starts
PSScript->>SQL: SELECT records WHERE submission_status IN ('pending', 'error')
alt usePartialPayload = true
PSScript->>SQL: Generate-AllPartialPayloads()
SQL-->>PSScript: partial_json_payload
else use full json_payload
SQL-->>PSScript: json_payload
end
loop Max 100 records per batch
PSScript->>API: POST JSON batch
alt HTTP 200 OK
API-->>PSScript: DfE uuid response + timestamp
PSScript->>SQL: Set submission_status = 'sent'
PSScript->>SQL: Set api_response = DfE uuid
PSScript->>SQL: Copy json_payload → previous_json_payload
PSScript->>SQL: Copy current_hash → previous_hash
PSScript->>SQL: Set row_state = 'unchanged'
else Retryable Errors (401, 403, 429)
API-->>PSScript: Retryable error
PSScript->>API: Retry with backoff
alt Max retries reached
PSScript->>SQL: Set submission_status = 'error'
PSScript->>SQL: Set api_response = error message
end
else Fatal Errors (400, 204, 413)
API-->>PSScript: Non-retryable error
PSScript->>SQL: Set submission_status = 'error'
PSScript->>SQL: Set api_response = error message
end
end
Note over PSScript,API: Deployment | Daily Submission (Batched, Status-driven)
SSD Staging Table Load|API Event Flag Updates (Phase 1 & 2)
Event |
submission_status |
row_state |
Initial load (new record) |
pending |
new |
JSON modified (Python or SQL hash) |
pending |
updated |
API success |
sent |
unchanged |
API failure (retry exhausted) |
error |
(unchanged) |
Deleted in source |
(unchanged) |
deleted |
Conceptual Overview (Phase 1)
flowchart TD
%% Local Authority systems
subgraph Local_Authority ["**Local Authority**"]
style Local_Authority font-weight:bold
style Local_Authority width:360px,height:500px %% Ensures wider boundary for Local Authority
subgraph Source_DB ["Source DB (CMS)"]
style Source_DB width:300px,height:250px %% Wider and taller to fit SSD box
cms_raw_tables["CMS Raw CSC Tables"]
style cms_raw_tables text-align:center
subgraph SSD ["Standard Safeguarding Dataset"]
style SSD width:230px,height:200px %% Adjust height to fit content
ssd_tables["SSD"]
style ssd_tables text-align:center
end
end
subgraph Shell_Process ["Scripted-Shell Process"]
style Shell_Process padding-top:100px %% Adds spacing from the box label
json_query["JSON Extract Query"]
api_payload["Prepare API Payload"]
handle_status["Handle Response Status"]
style json_query text-align:center
style api_payload text-align:center
style handle_status text-align:center
end
end
style Local_Authority stroke-dasharray: 5,5 %% LA boundary
%% API System Owner system(s)
subgraph API_System ["**API System Owner**"]
style API_System font-weight:bold
api_receive["Receive JSON Data"]
api_status["Send Response Status"]
end
%% Arrows showing data flow
cms_raw_tables -->|Extract| ssd_tables
ssd_tables -->|Full Data Extract | json_query
json_query -->|Format for API - Full Data Extract| api_payload
api_payload -->|Send Payload| api_receive
api_receive -->|Response Status| api_status
api_status -->|Process Response| handle_status
%% Apply Colors Using Class Definitions
classDef existingProcess fill:#cfe2f3,stroke:#000,stroke-width:1px %% Light Blue
classDef newProcess fill:#f9cb9c,stroke:#000,stroke-width:1px %% Light Orange
%% Assign Classes
class json_query newProcess
class api_payload newProcess
class handle_status newProcess
%% Legend
subgraph Legend ["Legend"]
new_process_key["Stage 1 Development"]
style new_process_key fill:#f9cb9c,stroke:#000,stroke-width:1px %% Match newProcess color
end
style Legend width:200px %% Legend size
Project Breakdown (Phase 1)
Development Task Status Key
🔲 Backlog | 🔄 In Progress | 🛠 Testing | 🚀 In Review | ✅ Completed
⏳ Blocked | 🗄 Deferred
Task Area |
Task |
Status |
Review Initial Specification |
Review specification for project scope |
✅ |
|
Ensure any project required permissions/software is available |
✅ |
|
Complete API to SSD fields mapping |
✅ |
SSD Changes |
Add API specified fields into SSD and data spec (pushed to public SSD front-end?) |
✅ |
|
SystemC (SQL Server) |
🗄 |
|
Mosaic (SQL Server) |
🗄 |
|
Eclipse (Postgres+) |
🗄 |
|
Azeus (Oracle) (March development to prioritise API object requirements) |
🗄 |
Create Documentation (Framework & Plan) |
Request guidance on documentation preferences/standards |
⏳ |
|
Create initial documentation framework (is there an existing req standard/pref?) |
✅ |
|
Define/write up development plan stage 1 |
🚀 |
|
Define/write up development plan stage 2 |
🔄 |
Review and Complete SSD Backlog Tickets |
Backlog board review |
🔄 |
|
Work to close required backlog tickets (known blockers affecting API data flow processes or data) |
🔲 |
Write JSON Data Extract (SQL Query) |
Partial JSON extract query with Header + Top-level child details only (process testing) |
✅ |
|
Full JSON extract query with Header + Top-level child details + all sub-level elements |
✅ |
Automate Data Extraction |
Investigation towards suitable process/script for data extract + API workflow |
🚀 |
|
Develop API workflow shell script(s) incl. DB access, JSON query extraction |
🚀 |
|
Test API workflow locally within host LA (extract only) |
🚀 |
Create Documentation (Playbook) |
Write up final LA playbook details |
🚀 |
|
Update documentation based on pilot LA 1 + stakeholder(s) feedback |
🔲 |
Simulate API Integration local within ESCC |
Create/generate/Anonymise dummy data for initial API send (SSD structure + repeatable) |
✅ |
|
Test with complete (non-delta) payload of null/dummy data |
✅ |
|
Test each response code(s), & logging within payload table |
✅ |
Test API Integration with a Pilot LA |
Test with complete (non-delta) payload of null/dummy data |
⏳ |
|
Test each response code(s), & logging within payload table |
⏳ |
Refinements/Granular end-goal fixes |
Process to handle (mid-)record 'purges' |
⏳ |
|
Discuss/investigate longer-term/wider API use and potential process changes (e.g. do we need combined payload staging table as mid-term historic record) |
🔲 |
Simulated API Overview (Phase 1 #1)
flowchart TD
%% Legend
subgraph Legend ["Legend"]
process_key["Process Steps"]
style process_key fill:#cfe2f3;
api_key["API Interaction"]
style api_key fill:#f9cb9c;
hosted_key["Hosted CMS DB Server"]
style hosted_key fill:#d9ead3;
python_key["During Development Phase"]
style python_key fill:#ffe599;
external_key["External Receiving System"]
style external_key fill:#f4cccc;
end
flowchart TD
Title["**API Data Flow - Phase1 Pt1**"]
subgraph LA_Server_Instance
%% SystemC Reporting Instance DB (Containing all DB-related and API elements)
subgraph SystemC_DB ["SystemC Reporting Instance DB"]
style SystemC_DB fill:#d9ead3,stroke:#000,stroke-width:1px; %% Light Green, same as CMS CSC Schema
subgraph CMS_CSC_Schema ["CMS CSC Schema"]
end
style CMS_CSC_Schema fill:#d9ead3,stroke:#000,stroke-width:1px; %% Light Green
CMS_CSC_Schema -->|Extract SSD Data| SSD_Tables["SSD Tables"]
%% Child-Level JSON Extract Step
SSD_Tables -->|SQL JSON Extract| Child_JSON_Extract["Child-level JSON Extract Process"]
Child_JSON_Extract -->|Extract API JSON Data| Bulk_Populate["ssd_api_data_staging table submission_status=Pending curr_prev_hash_vals=checksums"]
subgraph Development_Phase_Only_DB
%% Anonymised Data in DB
SSD_Staging_Anon["ssd_api_data_staging_anon Replicated staging table **ANONYMISED**"]
end
end
API_Powershell_Live -->|Anon API Array Payload+Header| API_External["API Live Call"]
subgraph Development_Phase_Only_API ["Development Phase only"]
API_Powershell_Dev -->|Anon API Array Payload| API_Simulated["API Simulated Call"]
end
end
%% External Receiving System (Outside LA Server)
subgraph External_Receiving_System ["External Receiving System"]
style External_Receiving_System fill:#f4cccc; %% Light Red for external systems
API_External -->|API Send Request| API_Endpoint
API_Endpoint -->|API Response Codes| API_External
%% API Processing (Inside LA Server)
SSD_Staging_Anon -->|Extract Anon JSON | API_Powershell_Live["API Powershell Live"]
SSD_Staging_Anon -->|Extract Anon JSON | API_Powershell_Dev["API Powershell Dev"]
end
%% Ensure the return flow is fully outside LA_Server_Instance
API_Simulated -->|Simulated Test Response| API_Powershell_Dev
API_External -->|Handle API Response| API_Powershell_Live
API_Powershell_Dev -->|Update R-Cd & sub_status| SSD_Staging_Anon
API_Powershell_Live -->|Update R-Code & status| SSD_Staging_Anon
%% Local Anaconda Environment for Python Processing
subgraph Development_Phase_Only_Py ["Development Phase only"]
Python_Anon["Python Anonymisation in Local Anaconda Env."]
end
%% Anonymisation Flow (now separated)
Bulk_Populate -->|Process Data for Anonymisation| Python_Anon
Python_Anon -->|Store Anonymised Data| SSD_Staging_Anon
%% Apply Colors Using Class Definitions
classDef processNode fill:#cfe2f3; %% Light Blue for processes
classDef apiNode fill:#f9cb9c; %% Light Orange for API interaction
classDef hostedNode fill:#d9ead3; %% Light Green for Hosted DB elements
classDef Testing fill:#ffe599,stroke:#000,stroke-width:1px; %% Light Yellow for Python Processing
classDef externalNode fill:#f4cccc,stroke:#000,stroke-width:1px; %% Light Red for External Receiving System
%% Assign Classes
class SystemC_DB hostedNode;
class CMS_CSC_Schema hostedNode;
class SSD_Tables,Child_JSON_Extract,Bulk_Populate,SSD_Staging_Anon processNode;
class API_Powershell,API_External,API_Endpoint apiNode;
class External_Receiving_System externalNode;
class Development_Phase_Only_Py Testing;
class Development_Phase_Only_API Testing;
class Development_Phase_Only_DB Testing;
class Python_Anon pythonEnv;
Ref: ssd_api_data_staging
The Phase 1(S1) payload data is agreed as the full refresh of all payload data. A staging table, added to the core SSD implementation is the suggested method towards achieving this and onward stages, an example shown here. This enables all staged 'Pending' records to be extracted by the API process. (Note: Hashed/Anonymised test data table shown here).
As per the above diagram, during development, we're aiming to replicate the live staging table using anonymised data. It's from this replicated oject that all Phase 1 tests will be run. At the point where live data from an agreed pilot/project LA can be sent, the shown api data flows will switch over to using the live staging table. During Phase|Stage 2 development (From May 2025->), the staging and API process will be further developed such that a row|record status provides the flag of which records form each delta-payload, e.g. 'New', 'Deleted', 'Updated' included with 'Unchanged' records being ignored.
Switch to data hitting API endpoint Overview (Phase1 #2)
Essentially as #1 above, but switch to (full payload) data hitting defined endpoint. It's recommended that initially this be continued using only the anonymised data, and thus retain the 'development' process areas defined above(orange). The aim that when agreed, to shift the data flow onto live data, dropping the anonymisation processes labelled in the diagram as in development.
flowchart TD
Title["**API Data Flow - Phase1 #2**"]
subgraph LA_Server_Instance
%% SystemC Reporting Instance DB (Containing all DB-related and API elements)
subgraph SystemC_DB ["SystemC Reporting Instance DB"]
style SystemC_DB fill:#d9ead3,stroke:#000,stroke-width:1px; %% Light Green, same as CMS CSC Schema
subgraph CMS_CSC_Schema ["CMS CSC Schema"]
end
style CMS_CSC_Schema fill:#d9ead3,stroke:#000,stroke-width:1px; %% Light Green
CMS_CSC_Schema -->|Extract SSD Data| SSD_Tables["SSD Tables"]
%% Child-Level JSON Extract Step
SSD_Tables -->|SQL JSON Extract| Child_JSON_Extract["Child-level JSON Extract Process"]
Child_JSON_Extract -->|Extract API JSON Data| Bulk_Populate["ssd_api_data_staging table submission_status=Pending curr_prev_hash_vals=checksums"]
end
API_Powershell_Live -->|Prepare Payload and Header| API_External["API Live Call"]
end
%% External Receiving System (Outside LA Server)
subgraph External_Receiving_System ["External Receiving System"]
style External_Receiving_System fill:#f4cccc; %% Light Red for external systems
API_External -->|API Send Request| API_Endpoint
API_Endpoint -->|API Response Codes| API_External
%% API Processing (Inside LA Server)
Bulk_Populate -->|Extract JSON to API array | API_Powershell_Live["API Powershell Live"]
end
%% Ensure the return flow is fully outside LA_Server_Instance
API_External -->|API Response| API_Powershell_Live
API_Powershell_Live -->|Update R-Cd & sub_status| Bulk_Populate
%% Apply Colors Using Class Definitions
classDef processNode fill:#cfe2f3; %% Light Blue for processes
classDef apiNode fill:#f9cb9c; %% Light Orange for API interaction
classDef hostedNode fill:#d9ead3; %% Light Green for Hosted DB elements
classDef Testing fill:#ffe599,stroke:#000,stroke-width:1px; %% Light Yellow for Python Processing
classDef externalNode fill:#f4cccc,stroke:#000,stroke-width:1px; %% Light Red for External Receiving System
%% Assign Classes
class SystemC_DB hostedNode;
class CMS_CSC_Schema hostedNode;
class SSD_Tables,Child_JSON_Extract,Bulk_Populate processNode;
class API_Powershell,API_External,API_Endpoint apiNode;
class External_Receiving_System externalNode;
class Python_Anon pythonEnv;
Conceptual Overview (Phase2)
On completion of Phase 1 #2, development work shifts to refine the full data payloads into record-level update deltas. This to be combined with both the ongoing support of, and wider take-on where agreed pilot LAs into the testing/development of the API data flow process.
flowchart TD
%% Local Authority systems
subgraph Local_Authority ["**Local Authority**"]
style Local_Authority font-weight:bold;
subgraph Source_DB ["Source DB (CMS)"]
cms_raw_tables["CMS Raw CSC Tables"]
subgraph SSD ["Standard Safeguarding Dataset (SSD)"]
ssd_tables["SSD"]
hash_table_process["SSD Change Management & API Log"]
end
end
subgraph Shell_Process ["Scripted-Shell Process"]
json_query["JSON Extract Query"]
api_payload["Prepare API Payload"]
handle_status["Handle Response Status"]
end
end
style Local_Authority stroke-dasharray: 5,5 %% LA boundary
%% API System Owner system(s)
subgraph API_System ["**API System Owner**"]
style API_System font-weight:bold;
api_receive["Receive JSON Data"]
api_status["Send Response Status"]
end
%% Arrows showing data flow
cms_raw_tables -->|Extract| ssd_tables
ssd_tables -->|SSD Generate Hash Log| hash_table_process
hash_table_process -->|Extract Changes| json_query
json_query -->|Format for API - Delta Changes| api_payload
json_query -->|Store Payload| hash_table_process
api_payload -->|Send Payload| api_receive
api_receive -->|Response Status| api_status
api_status -->|Process Response| handle_status
handle_status -->|Update| hash_table_process
%% Apply Colors Using Class Definitions
classDef existingProcess fill:#cfe2f3,stroke:#000,stroke-width:1px; %% Light Blue
classDef newProcess fill:#f9cb9c,stroke:#000,stroke-width:1px; %% Light Orange
%% Assign Classes
class json_query newProcess;
class api_payload newProcess;
class handle_status newProcess;
class hash_table_process newProcess;
%% Legend
subgraph Legend ["Legend"]
new_process_key["Stage 2 Development"]
style new_process_key fill:#f9cb9c,stroke:#000,stroke-width:1px; %% Match newProcess color
end
style Legend width:200px %% Legend size
Project Breakdown (Phase 2) (tbc)
Development Task Status Key
🔲 Backlog | 🔄 In Progress | 🛠 Testing | 🚀 In Review | ✅ Completed
⏳ Blocked | 🗄 Deferred
Task Area |
Task |
Status |
Enable SSD Row-Level Change Tracking |
Develop mechanism(s) to enable record-level/deltas change tracking |
🔄 |
|
Re-develop API process to integrate change tracking/record-level deltas |
🔲 |
Provide Configuration Playbook and Guidance for LAs |
SystemC |
🔄 |
|
Mosaic |
⏳ |
|
Eclipse |
⏳ |
Expand Pilot |
Expand pilot to further LAs with D2I support |
🔲 |