⚙️ Development — Activities

Purpose

Overview of core activities and role assignments during the Development phase, from data automation to model development and test validation.

When to use this?

You have passed Gate 2 and are ready to build the production-ready AI solution. This page guides you through the Specification-First Method, Knowledge Coupling, automated validation, and the SaaS procurement variant.


🎯 Objective

Execute the Development phase activities to produce a production-ready AI system that meets the specification, passes all validation levels, and is documented in a complete Technical Model Card.


✅ Entry Criteria (Definition of Ready)

  • Gate 2 (PoV Investment) is approved.
  • The Golden Set is available and approved as the baseline for testing.
  • The development team has access to models, infrastructure, and data pipelines.

⚙️ Core Activities

1. Automating Data Flows

Set up pipelines that clean and supply data automatically, replacing manual preparation. Reliable data flows are the backbone of any production AI system.

Steps:

  1. Design Data Pipelines: Map the data flow from source to model. Identify all extraction points, transformation steps, and loading targets. Document the schema at each stage.
  2. Implement ETL Processes: Build automated Extract, Transform, Load processes. Use orchestration tools (e.g., Airflow, Prefect) to schedule and monitor pipeline execution.
  3. Set Up Quality Controls: Implement automatic validation of incoming data. Check for completeness, format consistency, and value ranges. Reject or flag records that fail validation.
  4. Enable Version Control: Track data changes and lineage. Every dataset used for training or inference must be versioned so that results are reproducible.
  5. Test Pipeline Resilience: Simulate failures (missing data, schema changes, source outages) and verify that the pipeline handles them gracefully — with alerts, not silent failures.
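The quality controls in step 3 can be sketched as a small validation function. This is a minimal illustration, not a prescribed implementation: the field names (`permit_id`, `submitted_on`, `fee_eur`) and the fee range are hypothetical, chosen only to show completeness, format, and range checks side by side.

```python
from datetime import date

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"permit_id": str, "submitted_on": date, "fee_eur": float}
FEE_RANGE = (0.0, 10_000.0)  # assumed plausible range, not taken from the source


def validate_record(record: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the record passes."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        value = record.get(name)
        if value is None:
            problems.append(f"missing field: {name}")  # completeness check
        elif not isinstance(value, expected_type):
            problems.append(f"wrong type for {name}: {type(value).__name__}")  # format check
    fee = record.get("fee_eur")
    if isinstance(fee, float) and not (FEE_RANGE[0] <= fee <= FEE_RANGE[1]):
        problems.append(f"fee_eur out of range: {fee}")  # value-range check
    return problems


def split_batch(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Separate passing records from rejected ones, keeping the reasons for rejection."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the rejection reasons next to each rejected record lets the pipeline raise a specific alert instead of failing silently.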

Manual data handling is a production risk

If your Validation Pilot relied on manually prepared data, the Development phase must automate this. Manual processes do not scale and introduce inconsistency.

Practical Example

Situation: A municipal government was building an AI assistant to help citizens navigate the permit application process, from building permits to event licences. The municipality had 2,400 pages of regulations spread across PDFs, intranet pages, and legacy Word documents, updated irregularly by different departments.

Approach: The ML Engineer designed an ETL pipeline using Apache Airflow that ingested documents from three sources: a SharePoint document library (regulations), a Confluence space (internal procedures), and a PostgreSQL database (historical permit decisions). The pipeline ran nightly, chunking documents into 500-token segments with 50-token overlap, embedding them with a multilingual model (Dutch/English), and storing them in a vector database. Quality controls flagged documents older than their stated review date, triggering an alert to the responsible department.

The Knowledge Coupling was configured with source attribution: every answer cited the specific regulation article. During prompt iteration, the team discovered that the AI conflated "building permit" (omgevingsvergunning) with "environmental permit" (milieuvergunning) in 8% of cases. They refined the Steering Instructions with explicit differentiation criteria and added a disambiguation question to the prompt template. The RAG Design Canvas (template) documented the chunking strategy, retrieval parameters, and the decision to use hybrid search (semantic + keyword) for regulation article numbers.

Result: The automated data pipeline reduced manual document preparation from 3 days per update cycle to zero. Retrieval quality testing showed 94% of test queries returned the correct regulation article in the top 3 results. The Technical Model Card (template) captured the full configuration (base model, embedding model, chunk size, retrieval strategy, and prompt version), enabling reproducible builds and audit compliance under the EU AI Act.
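The chunking step in this example (500-token segments with 50-token overlap) can be sketched as a sliding window. This is an illustration only: the function name is hypothetical, and it operates on an already tokenised list, whereas a real pipeline would count tokens with the tokenizer of its embedding model.

```python
def chunk_tokens(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new window starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, a 1,000-token document yields windows starting at tokens 0, 450, and 900, so the last 50 tokens of each window reappear at the start of the next.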

2. Knowledge Coupling & Fine-Tuning

Connecting the AI to internal documents and configuring the model for optimal performance.

Knowledge Coupling (RAG Setup):

  1. Inventory Knowledge Sources: Identify all internal documents, FAQs, procedures, and data sources the AI needs to access.
  2. Design Chunking Strategy: Determine how to split documents into chunks for embedding. Balance chunk size (too small loses context, too large dilutes relevance) against retrieval performance.
  3. Configure Retrieval: Set up the embedding model, vector store, and retrieval strategy. Test retrieval quality against known queries.
  4. Implement Source Attribution: Ensure the AI cites its sources. This is critical for auditability and user trust.
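Testing retrieval quality against known queries (step 3) can be expressed as a top-k hit rate. The sketch below assumes a hypothetical `retrieve(query, k)` callable that returns source identifiers in ranked order; the actual interface depends on the vector store in use.

```python
from typing import Callable


def hit_rate_at_k(
    expected_sources: dict[str, str],
    retrieve: Callable[[str, int], list[str]],
    k: int = 3,
) -> float:
    """Fraction of test queries whose expected source appears in the top-k results."""
    if not expected_sources:
        raise ValueError("no test queries supplied")
    hits = sum(
        1
        for query, expected in expected_sources.items()
        if expected in retrieve(query, k)[:k]
    )
    return hits / len(expected_sources)
```

Because the metric is keyed on source identifiers, it also exercises source attribution: a query only counts as a hit when the correct document can be cited.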

Prompt Engineering:

  1. Draft Steering Instructions: Write the system prompt based on the Goal Definition. Include the system's role, objectives, boundaries, and output format.
  2. Iterate with the Golden Set: Test each prompt iteration against the Golden Set. Record scores and identify failure patterns.
  3. Refine Based on Failures: Analyse where the AI fails. Is it a knowledge gap (add to RAG), a reasoning gap (improve the prompt), or a boundary violation (strengthen the Hard Boundaries)?
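One way to record scores and failure patterns per prompt iteration is sketched below. The class and function names are hypothetical; the three failure types and their follow-up actions mirror the distinction made in step 3.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureType(Enum):
    # Each value is the follow-up action named in step 3.
    KNOWLEDGE_GAP = "add to RAG"
    REASONING_GAP = "improve the prompt"
    BOUNDARY_VIOLATION = "strengthen the Hard Boundaries"


@dataclass
class CaseResult:
    case_id: str
    passed: bool
    failure: Optional[FailureType] = None  # assigned by the reviewer for failed cases


def summarise_iteration(prompt_version: str, results: list[CaseResult]) -> dict:
    """Summarise one Golden Set run: pass rate plus a count of follow-up actions."""
    if not results:
        raise ValueError("no Golden Set results supplied")
    failures = Counter(
        r.failure for r in results if not r.passed and r.failure is not None
    )
    return {
        "prompt_version": prompt_version,
        "pass_rate": sum(r.passed for r in results) / len(results),
        "next_actions": {kind.value: count for kind, count in failures.most_common()},
    }
```

Storing one such summary per prompt version makes it visible whether an iteration fixed the dominant failure pattern or merely shifted it.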

Model Fine-Tuning (when needed):

  1. Assess Need: Fine-tuning is only justified when prompt engineering and Knowledge Coupling cannot achieve the required performance.
  2. Prepare Training Data: Assemble a dataset of input-output pairs. The dataset must be representative of the production workload and include edge cases.
  3. Train and Evaluate: Run the fine-tuning process and evaluate against the Golden Set. Compare results with the non-fine-tuned baseline.
  4. Document the Model: Record the base model, fine-tuning parameters, training data provenance, and evaluation results in the Technical Model Card.
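The fields listed in step 4 can be captured in a small record that is easy to version alongside the code. This is a sketch under assumptions: the class name and the `missing_fields` helper are hypothetical, and the actual Technical Model Card template may contain more fields than the four named here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TechnicalModelCard:
    base_model: str
    fine_tuning_parameters: dict
    training_data_provenance: str
    evaluation_results: dict

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialise the card so it can be committed with the release."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A release check could refuse to proceed while `missing_fields()` returns anything, which keeps the card from lagging behind the model.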

3. Specification-First Method (SDD Cycle)

We write the expected outcome (the test) first, then the implementation. This ensures quality and prevents drift from the original intent.

The SDD Cycle:

  1. AI Product Manager defines the Goal Definition — what the system must achieve.
  2. The team drafts the initial Steering Instructions based on the Goal Definition.
  3. The system generates a detailed specification of the expected behaviour.
  4. Human Review — the team validates the specification against the intent before spending resources.
  5. The approved specification drives development — every feature is built to meet the specification, and every change is validated against it.

For systems with a higher degree of autonomy, behaviour changes are implemented in small, bounded steps. Per change, record the intended effect, applicable boundaries, and verification method. Only after successful verification is a change permanently applied.
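The rule for bounded changes can be sketched as a record plus a guard that only keeps a change once its verification passes. All names here are hypothetical, and the configuration is modelled as a plain dictionary for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BehaviourChange:
    intended_effect: str            # what the change is meant to achieve
    boundaries: list[str]           # boundaries that apply to this change
    verify: Callable[[dict], bool]  # verification method, run on the candidate config


def apply_if_verified(current: dict, patch: dict, change: BehaviourChange) -> dict:
    """Return the patched configuration only if verification succeeds."""
    candidate = {**current, **patch}  # build a candidate without mutating the current config
    if change.verify(candidate):
        return candidate
    return current  # verification failed: the change is not applied
```

Because the current configuration is never mutated, a failed verification needs no rollback step.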

4. Validation at Three Levels

Every change is tested on three dimensions. All three must pass before a change is deployed.

Syntactic Validation

  • Question: Does the code work? No crashes or errors?
  • Check: Unit tests, integration tests, schema validation, linting.
  • Automation: Run in CI/CD on every commit. Block the pipeline on failure.

Technical Delivery & Pipelines

  • Data Pipelines: Setting up robust flows for training and inference.
  • Automated Gates (Governance-as-Code): Integrate the Hard Boundaries and success metrics directly into the CI/CD pipeline. Example: the build fails automatically if the bias score is too high or accuracy drops below the threshold.
  • Continuous Testing (CT): Automated evaluation of model outputs with every change to the Steering Instructions.
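An automated gate of this kind can be sketched as a script whose exit code the CI/CD pipeline acts on. The threshold values and metric names below are hypothetical placeholders; real values would come from the Hard Boundaries and success metrics agreed for the project.

```python
import sys

# Hypothetical thresholds for illustration only.
THRESHOLDS = {"min_accuracy": 0.90, "max_bias_score": 0.10}


def gate_violations(metrics: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Return a human-readable message for every threshold the metrics violate."""
    violations = []
    if metrics["accuracy"] < thresholds["min_accuracy"]:
        violations.append(
            f"accuracy {metrics['accuracy']:.3f} is below {thresholds['min_accuracy']:.3f}"
        )
    if metrics["bias_score"] > thresholds["max_bias_score"]:
        violations.append(
            f"bias score {metrics['bias_score']:.3f} is above {thresholds['max_bias_score']:.3f}"
        )
    return violations


def main(metrics: dict) -> int:
    """Print violations to stderr and return an exit code (0 = pass, 1 = fail)."""
    violations = gate_violations(metrics)
    for message in violations:
        print(f"GATE FAILED: {message}", file=sys.stderr)
    return 1 if violations else 0
```

In a pipeline step, the script would end with `sys.exit(main(metrics))`, where `metrics` is loaded from the evaluation run; a non-zero exit code then blocks the build.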

Behavioural Validation

  • Question: Does it do what we expect?
  • Check: Functional tests, regression tests, Golden Set evaluation.
  • Method: Run the Golden Set against the current configuration. Compare scores with the baseline. Maximum regression: 5% on existing metrics.
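The baseline comparison can be sketched as follows. Two assumptions are made that the text does not settle: the 5% limit is read as a relative drop against the baseline score, and every metric is treated as higher-is-better. The function name is hypothetical.

```python
def regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    max_regression: float = 0.05,
) -> dict[str, float]:
    """Return metrics whose relative drop against the baseline exceeds the limit."""
    failing = {}
    for metric, base in baseline.items():
        if base <= 0:
            continue  # a relative drop is undefined for a non-positive baseline
        # A metric missing from the current run is treated as a score of 0.0.
        drop = (base - current.get(metric, 0.0)) / base
        if drop > max_regression:
            failing[metric] = round(drop, 4)
    return failing
```

An empty result means the configuration stays within the allowed regression on every baseline metric.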

Goal-Aligned Validation

  • Question: Does it help the user? Does it deliver value?
  • Check: User acceptance testing, scenario-based evaluation by domain experts.
  • Method: Have domain experts assess the system's output in realistic scenarios. Record their judgement on relevance, usefulness, and safety.

5. Variant: SaaS & Procurement (Buy vs. Build)

Not all AI solutions are built in-house. When purchasing standard AI software (SaaS), the focus of the Development phase changes:

  • From Building to Configuring: Focus on setting up the right system prompts, Knowledge Coupling sources and safety filters within the vendor environment.
  • Validation Remains Identical: Even a purchased tool must pass the Validation Pilot and Golden Set test before going live. Do not blindly trust the vendor's "demo".
  • Model Card becomes Configuration Card: Document which settings, plugins and data connections are active.
  • Vendor Lock-in Check: Verify that data and logs are exportable for compliance (EU AI Act). Ensure the contract includes data portability and deletion clauses.

Vendor demos are not evidence

A vendor's demonstration uses curated data and optimal conditions. Your Golden Set tests the system in your context with your data. Require the vendor to run your Golden Set and share the results.


👥 RACI

| Role | RACI | Responsibility in Development |
|---|---|---|
| Data Scientist | Responsible | Development of AI models and Knowledge Coupling. |
| ML Engineer | Responsible | Building data pipelines and infrastructure. |
| AI Product Manager | Accountable | Owner of the product backlog and prioritisation. |
| QA Engineer | Responsible | Performing automated tests and validation. |
| Guardian (Ethicist) | Consulted | Reviews Hard Boundary compliance and fairness results. |
| DevOps | Consulted | Advises on Go-live and infrastructure. |

✅ Exit Criteria (Gate 3 — Production-Ready)

The Development phase activities are complete when:

  • All specification items are implemented and pass Golden Set validation.
  • Automated validation pipeline runs successfully on every commit.
  • Three-level validation (Syntactic, Behavioural, Goal-Aligned) is complete.
  • Technical Model Card is completed.
  • Logging plan is configured and tested.

Collaboration Mode: [Mode X — Name]. The SDD specification states the mode as a constraint. Required validation for this mode: → See Evidence Standards.


📦 Deliverables

  1. Technical Model Card — complete system documentation.
  2. Automated Test Suite — integrated into CI/CD pipeline.
  3. Validation Report (release candidate) — meeting Evidence Standards.
  4. Data Pipeline Documentation — schema, flow, and quality controls.
  5. Configuration Card (SaaS variant) — settings, plugins, and data connections.

Next step: Start the SDD cycle: write the spec, derive the Golden Set, build and validate.
→ Use the Technical Model Card as your starting point.
→ See also: Objectives | SDD Pattern | Validation Report


Version: 1.1 Date: 07 May 2026 Status: Final