🎯 Validation — Objectives

Purpose

This page sets out the objectives and approach of Phase 2 (Validation): proving that the AI idea works and is financially viable before a major investment is made.

🎯 Objective

The primary objective of the Validation phase is to prove that the AI idea works in the specific business context and is financially viable before committing to a full development investment. We run a small-scale Validation Pilot (Proof of Value) with real data, measure the results against a Golden Set, and produce a Cost Overview that enables an informed Go/No-Go decision at Gate 2.

This phase answers two critical questions: (1) Does the AI understand our business context well enough to deliver value? and (2) Is the investment justified by the expected return? If either answer is negative, we stop — and that is a successful outcome, because it prevents wasted resources on an unviable project.

Key result: A working Validation Pilot demonstrating that the AI understands the specific business context and delivers measurable value, supported by a Cost Overview with ROI calculation and a Validation Report with evidence against the applicable Evidence Standards.

✅ Entry Criteria (Definition of Ready)

Before this phase starts, the following conditions must be met:

Gate 1 (Go/No-Go Discovery) is approved with a documented Go decision.
The Data Evaluation has been completed with a positive result on Access, Quality, and Relevance.
A test set (Golden Set) is available with representative real-world examples. The minimum size depends on the risk level: 20 cases for Minimal Risk, 50 cases for Limited Risk, 150 cases for High Risk.
The team has access to the required tools, models, and data for experimentation.
The intended Collaboration Mode is recorded in the Project Charter.
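The Golden Set size requirement can be checked mechanically before the phase starts. The snippet below is a minimal sketch: the minimum sizes come from the entry criteria above, while the function name `golden_set_is_large_enough` and the lowercase risk-level keys are illustrative assumptions, not part of any prescribed tooling.

```python
# Minimum Golden Set sizes per risk level, as listed in the entry criteria.
MIN_GOLDEN_SET_SIZE = {
    "minimal": 20,
    "limited": 50,
    "high": 150,
}


def golden_set_is_large_enough(case_count: int, risk_level: str) -> bool:
    """Return True if the Golden Set meets the minimum size for the risk level.

    The function name and the lowercase keys are hypothetical conventions.
    """
    key = risk_level.strip().lower()
    if key not in MIN_GOLDEN_SET_SIZE:
        raise ValueError(f"Unknown risk level: {risk_level!r}")
    return case_count >= MIN_GOLDEN_SET_SIZE[key]
```

Size is only one of the requirements: the check says nothing about whether the cases are representative, which still needs human review.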

Do not start Validation without a Golden Set

A Validation Pilot without a representative test set measures nothing. The Golden Set must contain real-world examples, not synthetic or "happy path" cases. See Evidence Standards for Golden Set requirements per risk level.

Case study

See Case Studies — Scenario 2: Customer Service Automation for a conceptual example of the Validation phase in practice.

⚙️ Core Activities

1. Validation Pilot (Proof of Value)

We run a small-scale experiment to test whether the AI understands the specific business context. The pilot is deliberately limited in scope — it tests the core hypothesis, not the full solution.

Assemble Test Set: Collect representative real-world examples from the Golden Set. Ensure coverage of standard cases, edge cases, and adversarial cases (for Limited and High Risk projects).
Baseline Measurement: Measure how humans or existing systems perform on the same test set. This establishes the benchmark the AI must exceed.
AI Experiment: Have the AI process the same examples using the current Steering Instructions and Knowledge Coupling configuration.
Compare Results: Evaluate AI performance against the baseline and the success criteria defined in Discovery. The AI must meet or exceed the threshold for its risk level (see Evidence Standards).
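The four steps above can be sketched as a small comparison routine. This is an illustration under stated assumptions, not a prescribed implementation: it scores answers by exact match, which will not suit every use case, and the names `accuracy`, `compare_pilot`, and `PilotComparison`, the sample labels, and the 0.9 threshold are all hypothetical. The real threshold comes from the Evidence Standards for the project's risk level.

```python
from dataclasses import dataclass


@dataclass
class PilotComparison:
    baseline_accuracy: float
    ai_accuracy: float
    beats_baseline: bool
    meets_threshold: bool


def accuracy(predictions: list[str], expected: list[str]) -> float:
    """Fraction of Golden Set cases answered exactly as expected."""
    if len(predictions) != len(expected) or not expected:
        raise ValueError("predictions and expected must be non-empty and equal in length")
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


def compare_pilot(
    expected: list[str],
    baseline_predictions: list[str],
    ai_predictions: list[str],
    threshold: float,
) -> PilotComparison:
    """Score the baseline and the AI on the same test set and compare them."""
    baseline = accuracy(baseline_predictions, expected)
    ai = accuracy(ai_predictions, expected)
    return PilotComparison(
        baseline_accuracy=baseline,
        ai_accuracy=ai,
        beats_baseline=ai > baseline,
        meets_threshold=ai >= threshold,  # meet-or-exceed, as in the exit criteria
    )


# Illustrative data only: four cases with hypothetical labels.
expected = ["refund", "escalate", "refund", "close"]
human_baseline = ["refund", "escalate", "close", "close"]
ai_output = ["refund", "escalate", "refund", "close"]

result = compare_pilot(expected, human_baseline, ai_output, threshold=0.9)
print(result)
```

Scoring the baseline and the AI with the same function on the same cases keeps the comparison fair. If the Evidence Standards require a strict "above threshold" test, replace `>=` with `>`.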

2. Reliability Testing

We verify that the Validation Pilot results are stable and not based on chance or a favourable test set composition.

Reproducibility: Run the AI multiple times on the same test set. Does it give consistent answers? Measure variation across runs.
Edge Cases: How does the system respond to unusual, ambiguous, or extreme input? Edge cases reveal the boundaries of the AI's capability.
Bias Detection: Are there systematic errors in certain categories? Perform a fairness check across demographic or business-relevant groups. For Limited Risk, the difference in Major error rate between groups must be ≤ 10%; for High Risk, ≤ 5%.
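Two of these checks lend themselves to a short calculation. The sketch below measures reproducibility as the share of test cases on which every run gave the same answer, and the fairness gap as the largest difference in Major error rate between any two groups. Both definitions are assumptions made for illustration. In particular, the limits of 10% and 5% are read here as absolute differences (percentage points); confirm the intended reading in the Evidence Standards. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Maximum allowed gap in Major error rate between groups, from the text above.
# Assumption: the limits are absolute differences (percentage points).
MAX_GAP = {"limited": Fraction(10, 100), "high": Fraction(5, 100)}


def run_agreement(runs: list[list[str]]) -> float:
    """Fraction of test cases on which every run produced the same answer."""
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one case")
    if any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("all runs must cover the same test cases")
    stable = sum(len(set(answers)) == 1 for answers in zip(*runs))
    return stable / len(runs[0])


def major_error_rate_gap(records: list[tuple[str, bool]]) -> Fraction:
    """Largest difference in Major error rate between any two groups.

    Each record is (group, had_major_error). Fractions keep the comparison
    against the limit exact, with no floating-point rounding at the boundary.
    """
    if not records:
        raise ValueError("need at least one record")
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for group, had_major_error in records:
        totals[group] += 1
        errors[group] += int(had_major_error)
    rates = [Fraction(errors[g], totals[g]) for g in totals]
    return max(rates) - min(rates)


def passes_fairness_check(records: list[tuple[str, bool]], risk_level: str) -> bool:
    """Compare the observed gap with the limit for the given risk level."""
    return major_error_rate_gap(records) <= MAX_GAP[risk_level]
```

With small groups, a single error moves the rate by several points, so the gap is only meaningful when each group has enough cases in the Golden Set.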

3. Cost Overview

We produce a complete estimate of investment and operational costs to enable an informed financial decision.

Investment Costs: People (development, training, management FTEs), Technology (licences, cloud infrastructure, tools), Data (cleaning, labelling, enrichment).
Operational Costs (per month/year): Usage costs (cloud/API costs per task or transaction), Maintenance (monitoring, updates, support), Risk costs (potential costs of errors or incidents).
Return on Investment (ROI): Time savings (hours saved per week/month), Quality improvement (fewer errors, higher customer satisfaction), Revenue growth (new opportunities, faster turnaround).
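To make the three cost categories concrete, the sketch below combines them into a simple ROI and payback calculation. The formula used, (total benefit − total cost) / total cost over a chosen horizon, is a common convention and an assumption here; this page does not prescribe a specific ROI formula. The class name `CostOverview`, its fields, and all figures are illustrative, and benefits such as time savings must first be converted to a monetary value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostOverview:
    investment_cost: float           # one-off: people, technology, data
    monthly_operational_cost: float  # usage, maintenance, risk costs
    monthly_benefit: float           # monetised time savings, quality, revenue

    def roi(self, months: int) -> float:
        """Simple ROI over the horizon: (benefit - cost) / cost."""
        total_cost = self.investment_cost + self.monthly_operational_cost * months
        total_benefit = self.monthly_benefit * months
        if total_cost <= 0:
            raise ValueError("total cost must be positive")
        return (total_benefit - total_cost) / total_cost

    def payback_months(self) -> Optional[float]:
        """Months until the investment is recovered, or None if it never is."""
        net_monthly = self.monthly_benefit - self.monthly_operational_cost
        if net_monthly <= 0:
            return None
        return self.investment_cost / net_monthly


# Hypothetical figures, for illustration only.
overview = CostOverview(
    investment_cost=60_000,
    monthly_operational_cost=2_000,
    monthly_benefit=7_000,
)
print(f"ROI over 24 months: {overview.roi(24):.1%}")
print(f"Payback period: {overview.payback_months()} months")
```

Reporting the payback period next to ROI helps at Gate 2, because a project with a positive long-run ROI can still take longer to pay back than the organisation is willing to wait.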

4. Evidence Collection

We gather the evidence required for Gate 2. The evidence pack must include the Validation Report, the Technical Model Card (draft), and Guardian approval of the Hard Boundaries. The depth of evidence depends on the risk level and Collaboration Mode.

→ See Evidence Standards for the full requirements per risk level.

👥 RACI

Role	Responsibility in Validation
Data Scientist	Responsible: Performing the Validation Pilot and reliability testing.
AI Product Manager	Accountable: Owner of the business case and ROI calculation (Cost Overview).
Business Sponsor	Consulted: Validates the test set and success criteria.
Finance	Consulted: Reviews the cost estimate and ROI calculation.
Guardian (Ethicist)	Consulted: Reviews fairness check results and approves Hard Boundaries.
Stakeholders	Informed: Receive updates on progress and pilot results.

✅ Exit Criteria (Gate 2 — PoV Investment)

The Validation phase closes when all of the following are satisfied:

Validation Pilot is completed with results meeting or exceeding the success criteria for the risk level.
Reliability testing confirms stable results across multiple runs.
Cost Overview is completed with ROI calculation and reviewed by Finance.
Validation Report is drafted with evidence against the applicable Evidence Standards.
Guardian has approved the Hard Boundaries.
Gate 2 review is conducted with the Business Sponsor.
Go/No-Go decision for Development phase is documented.
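Teams that track gate readiness in code or a notebook could model the exit criteria as a checklist. The sketch below is one possible shape: the field names paraphrase the seven criteria above, and the class `Gate2Checklist` is hypothetical, not an official artefact of this framework.

```python
from dataclasses import dataclass, fields


@dataclass
class Gate2Checklist:
    # One flag per exit criterion listed above (names are illustrative).
    pilot_meets_success_criteria: bool = False
    reliability_confirmed: bool = False
    cost_overview_reviewed_by_finance: bool = False
    validation_report_drafted: bool = False
    hard_boundaries_approved_by_guardian: bool = False
    gate_review_held_with_sponsor: bool = False
    go_no_go_documented: bool = False

    def open_items(self) -> list[str]:
        """Names of the criteria that are not yet satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def phase_can_close(self) -> bool:
        """True only when every exit criterion is satisfied."""
        return not self.open_items()


checklist = Gate2Checklist(
    pilot_meets_success_criteria=True,
    reliability_confirmed=True,
)
print(checklist.open_items())
```

The checklist records status only. The Go/No-Go decision itself is made in the Gate 2 review with the Business Sponsor.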

Collaboration Mode: [Mode X — Name] as recorded in the Project Charter. Validate that the mode still matches the risk level — if the risk classification has changed, reassess the mode. Required validation for this mode: → See Evidence Standards.

📦 Deliverables

The following artefacts are produced during this phase:

Validation Report — pilot results, reliability testing outcomes, and conclusion.
Cost Overview — investment costs, operational costs, and ROI calculation.
Technical Model Card (draft) — documentation of the model, prompts, and configuration used in the pilot.
Golden Set Test Results — detailed scores per test case and aggregate metrics.

Next step: Complete the Validation Pilot and document the results in the Validation Report.

→ Use the Validation Report as your starting point.

→ See also: Activities | Business Case | Gate 2 Checklist

Version: 1.1 Date: 07 May 2026 Status: Final
