🎯 Development — Objectives

Purpose

Objectives of Phase 3: building a robust, production-ready AI solution that meets all quality and safety requirements.

🎯 Objective

The primary objective of the Development phase is to build a robust, production-ready AI solution that meets all quality, safety, and compliance requirements. We apply the Specification-First Method (Spec-Driven Development): we define the expected behaviour before building, derive the Golden Set from the specification, and validate every change against it. This ensures that the system does what it is supposed to do — not just what the code happens to produce.

In traditional software development, logic is deterministic: a test that passes keeps passing until the code changes. AI systems behave probabilistically, so the same input can produce different outputs, and they therefore require continuous validation. Every change to Steering Instructions, Knowledge Coupling, or model parameters must pass through the three validation dimensions: Syntactic, Behavioural, and Goal-Aligned.

Key result: A fully functional AI system ready for go-live, including automated tests, documentation, and a complete technical dossier that demonstrates compliance with the Hard Boundaries and Evidence Standards for the applicable risk level.


✅ Entry Criteria (Definition of Ready)

Before this phase starts, the following conditions must be met:

  • Gate 2 (PoV Investment) is approved with a documented Go decision.
  • The Validation Pilot has demonstrated that the solution works — the AI meets or exceeds the success criteria for its risk level.
  • The Cost Overview is positive and approved by the Business Sponsor and Finance.
  • The development team is complete and has access to all required resources: models, infrastructure, data pipelines, and development tools.
  • The Golden Set from the Validation phase is available and approved as the baseline for ongoing testing.

Do not start Development without an approved Golden Set

The Golden Set is the foundation of Specification-First Development. Without it, you cannot validate whether changes improve or degrade the system. If the Validation phase did not produce a sufficient Golden Set, return to Validation and complete it first.


⚙️ Core Activities

1. Specification-First Development

We write the expected outcome (the specification) first, then the implementation. This ensures quality and prevents drift from the original intent.

  • Define the Goal Definition: The AI Product Manager articulates what the system must achieve. This is the single source of truth for the system's purpose.
  • Draft Steering Instructions: The team (AI Engineer, Developer, or prompt specialist) drafts the initial Steering Instructions based on the Goal Definition.
  • Generate Specification: The system generates a detailed specification of the expected behaviour.
  • Human Review: The team validates the specification against the intent before spending resources on training or test runs.
  • Derive Golden Set: The approved specification drives the creation or refinement of the Golden Set — the test cases that verify the system meets the specification.
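
As an illustration of the last step, the sketch below shows one way to keep the Golden Set traceable to the approved specification. The names `Requirement`, `GoldenCase`, and `uncovered_requirements`, and the example requirements, are hypothetical and not part of the method itself. The sketch assumes each specification item has an ID and each Golden Set case points at exactly one item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One testable statement from the approved specification (hypothetical shape)."""
    req_id: str
    statement: str


@dataclass(frozen=True)
class GoldenCase:
    """A Golden Set test case, traceable to exactly one requirement."""
    case_id: str
    req_id: str
    prompt: str
    expected: str


def uncovered_requirements(requirements, golden_set):
    """Return the IDs of requirements that no Golden Set case verifies."""
    covered = {case.req_id for case in golden_set}
    return [r.req_id for r in requirements if r.req_id not in covered]


# Illustrative data only; real requirements come from the reviewed specification.
requirements = [
    Requirement("R1", "Answers refund questions using the refund policy."),
    Requirement("R2", "Declines to give legal advice."),
]
golden_set = [
    GoldenCase("G1", "R1", "Can I return an opened item?", "Cites the refund policy."),
]

print(uncovered_requirements(requirements, golden_set))  # ['R2']
```

A coverage check like this makes gaps visible before any test run: a requirement without a Golden Set case cannot be validated.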

2. Knowledge Coupling & Model Configuration

We connect the AI to internal business information and configure the model for optimal performance in the specific business context.

  • Knowledge Coupling: Connect the AI to internal documents, FAQs, procedures, and data sources. Set up RAG (Retrieval Augmented Generation) pipelines with appropriate chunking, embedding, and retrieval strategies.
  • Prompt Engineering: Optimise the Steering Instructions through iterative testing against the Golden Set. Each change is validated before being applied.
  • Model Fine-Tuning: If prompt engineering and Knowledge Coupling are insufficient, adjust model parameters for the specific use case. Fine-tuning requires a larger dataset and more compute — justify the investment against the expected improvement.

3. Automated Validation Pipeline

We build automated tests that run with every change to ensure the system continues to meet the specification.

  • Syntactic Validation: Automated checks on structure, schemas, and linting. Every change must pass before proceeding.
  • Behavioural Conformance: Automated evaluation against the Golden Set. The system must maintain or improve its scores with every change.
  • Goal Alignment: Scenario-based evaluation by domain experts. This cannot be fully automated — human judgement is required to assess whether the system genuinely helps the user.
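
The three dimensions can be combined into a single gate. The sketch below is one possible shape under stated assumptions: the system returns JSON with `answer` and `sources` keys, a Golden Set case is a `(prompt, expected phrase)` pair, and Goal Alignment enters as a recorded expert sign-off instead of a computed value. All function names and the output schema are hypothetical.

```python
import json


def syntactic_ok(raw_output, required_keys=("answer", "sources")):
    """Syntactic: the output parses as a JSON object containing the required keys."""
    try:
        data = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def behavioural_score(outputs, golden_set):
    """Behavioural: fraction of Golden Set cases whose answer contains the expected phrase."""
    passed = 0
    for raw, (_prompt, expected) in zip(outputs, golden_set):
        if syntactic_ok(raw):
            answer = str(json.loads(raw)["answer"])
            passed += expected.lower() in answer.lower()
    return passed / len(golden_set)


def validation_gate(system, golden_set, baseline_score, expert_signoff):
    """Run all three dimensions and report each one plus the overall verdict."""
    if not golden_set:
        raise ValueError("the Golden Set must contain at least one case")
    outputs = [system(prompt) for prompt, _expected in golden_set]
    score = behavioural_score(outputs, golden_set)
    result = {
        "syntactic": all(syntactic_ok(raw) for raw in outputs),
        "behavioural": score >= baseline_score,  # maintain or improve
        "goal_aligned": bool(expert_signoff),    # recorded human judgement, not computed
        "score": score,
    }
    result["passed"] = (
        result["syntactic"] and result["behavioural"] and result["goal_aligned"]
    )
    return result
```

Substring matching is a deliberately simple scoring rule; a real pipeline would likely use a richer evaluator, but the gate logic (every dimension must pass, and the score must not drop below the baseline) stays the same.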

4. Controlled Behaviour Changes

Changes to the behaviour of an AI system are implemented in bounded steps. Per change, we record:

  • The intended effect (what should improve and why).
  • The applicable boundaries and limits (which Hard Boundaries are relevant).
  • How we determine that the change meets its objective and respects the Hard Boundaries (validation method and acceptance criteria).

Only after successful verification is a change permanently applied. This prevents uncontrolled drift and ensures every change is traceable.
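
A minimal sketch of this rule, assuming the Steering Instructions are held as a single string: a change is applied only when its record is marked as verified, and every applied change stays in the history. `ChangeRecord` and `SteeringConfig` are hypothetical names, not part of an existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """What is recorded for each behaviour change."""
    intended_effect: str       # what should improve and why
    hard_boundaries: list      # which Hard Boundaries are relevant
    acceptance_criteria: str   # validation method and pass condition
    verified: bool = False     # set to True only after successful verification


@dataclass
class SteeringConfig:
    """Current Steering Instructions plus a traceable history of applied changes."""
    instructions: str
    history: list = field(default_factory=list)

    def apply(self, new_instructions, record):
        """Permanently apply a change, but only if it has been verified."""
        if not record.verified:
            raise PermissionError("change has not passed verification")
        self.history.append((self.instructions, record))
        self.instructions = new_instructions
```

Keeping the previous instructions next to the record means any change can be traced back and, if needed, reverted.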

5. SaaS & Procurement Variant (Buy vs. Build)

Not all AI solutions are built in-house. When purchasing standard AI software (SaaS), the focus of the Development phase changes:

  • From Building to Configuring: Focus on setting up the right system prompts, Knowledge Coupling sources, and safety filters within the vendor environment.
  • Validation Remains Identical: Even a purchased tool must pass the Validation Pilot and Golden Set test before going live. Do not blindly trust the vendor's "demo".
  • Model Card becomes Configuration Card: Document which settings, plugins, and data connections are active.
  • Vendor Lock-in Check: Verify that data and logs are exportable for compliance (EU AI Act). Ensure the contract includes data portability and deletion clauses.
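
As a rough illustration, a Configuration Card can be kept as structured data and exported as JSON alongside the technical dossier. The field names below are assumptions based on the bullets above (settings, plugins, data connections, exportability, contract clauses); they do not represent a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ConfigurationCard:
    """Hypothetical record of an active SaaS configuration."""
    vendor_product: str
    settings: dict = field(default_factory=dict)
    plugins: list = field(default_factory=list)
    data_connections: list = field(default_factory=list)
    logs_exportable: bool = False
    portability_and_deletion_clause: bool = False

    def lock_in_issues(self):
        """List the open findings of the Vendor Lock-in Check."""
        issues = []
        if not self.logs_exportable:
            issues.append("data and logs are not exportable")
        if not self.portability_and_deletion_clause:
            issues.append("contract lacks data portability and deletion clauses")
        return issues

    def to_json(self):
        """Serialise the card for the technical dossier."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

An empty list from `lock_in_issues()` means both checks in this sketch are satisfied; it does not replace a legal review of the contract.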

👥 RACI

Role | Responsibility in Development
Data Scientist | Responsible: Development of AI models and Knowledge Coupling.
ML Engineer | Responsible: Building data pipelines, infrastructure, and automated validation.
AI Product Manager | Accountable: Owner of the product backlog, prioritisation, and specification approval.
QA Engineer | Responsible: Performing automated tests and validation.
Guardian (Ethicist) | Consulted: Reviews Hard Boundary compliance and fairness results.
DevOps | Consulted: Advises on infrastructure and go-live readiness.

✅ Exit Criteria (Gate 3 — Production-Ready)

The Development phase closes when all of the following are satisfied:

  • All features in the specification are implemented and tested against the Golden Set.
  • The automated validation pipeline is operational and passes on the current configuration.
  • Validation is complete across all three dimensions: Syntactic, Behavioural, and Goal-Aligned.
  • The Technical Model Card is complete and fully documents the running system.
  • A logging plan that meets the requirements for the risk level is in place.
  • The incident response procedure is drafted and tested.
  • The Gate 3 review has been conducted with the Business Sponsor and Guardian.
  • The Go/No-Go decision for the Delivery phase is documented.

Collaboration Mode: [Mode X — Name]. The SDD specification states the mode as a constraint. Required validation for this mode: → See Evidence Standards.


📦 Deliverables

The following artefacts are produced during this phase:

  1. Technical Model Card — complete documentation of the model, prompts, configuration, and data sources.
  2. Automated Test Suite — Golden Set tests, regression tests, and adversarial tests integrated into CI/CD.
  3. Validation Report (release candidate) — results meeting the standards from the Evidence Standards.
  4. Logging Plan — audit trail configuration meeting the retention requirements for the risk level.
  5. Incident Response Procedure — draft procedure for handling production incidents.

Next step: Start the SDD cycle: write the spec, derive the Golden Set, build and validate.
→ Use the Goal Definition as your starting point.
→ See also: Activities | Spec-Driven Development | SDD Pattern


Version: 1.1 Date: 07 May 2026 Status: Final