Skip to content

versie: '1.0' type: guide layer: 2 phase: [1, 2, 3, 4, 5] summary: 'Setup of layered dashboards and KPIs to make the AI system's health continuously visible for the operations team.' answers: ["How does Metrics & Dashboards work?", "What roles do I need?"]


3. Metrics & Dashboards

Purpose

Setup of layered dashboards and KPIs to make the AI system's health continuously visible for the operations team.

1. Objective

We make the health of the AI system continuously visible via layered dashboards and unambiguous KPIs, so that the management team can intervene in a timely manner when deviations occur.


2. Entry Criteria

  • System is in production (Gate 4 approved).
  • SLOs are agreed in writing.
  • Logging and telemetry are actively set up.

3. Core Activities

The Four KPI Categories

We measure at four levels. Each category has a fixed owner and reporting cadence:

Category Example metrics Owner Cadence
Model performance Accuracy, F1-score, deviation vs Golden Set Data Scientist Daily
Operational Latency P95, error rate, uptime, throughput (requests/min) MLOps Engineer Real-time
Usage costs Cost per call, monthly compute costs AI PM Monthly
Governance Number of Hard Boundary violations, Guardian interventions, bias signals Guardian Weekly

Dashboard Layers

We distinguish three layers. Each dashboard has a different audience and granularity:

Layer 1 — Operational (real-time): Visible to MLOps and tech team. Shows system health, alerts and active incidents.

Layer 2 — Model quality (daily/weekly): Visible to Data Scientist and AI PM. Shows accuracy trends, Performance Degradation signals and comparison with the Golden Set.

Layer 3 — Strategic (monthly/quarterly): Visible to CAIO and management. Shows ROI realisation, cost trends and compliance status.

Thresholds and Alerts

For each critical metric we define three levels:

Level Action
🟡 Warning Notification to management team; investigation required within 48 hours
🟠 Critical Immediate intervention required; Guardian is informed
🔴 Circuit Breaker Automatic blocking or escalation; human approval required before restart

Example: If accuracy drops below 85% (Warning), below 80% (Critical) or below 70% (Circuit Breaker).

SLO Definition and Monitoring

An SLO (Service Level Objective) is an internally binding target. We define at a minimum:

  • Availability: e.g. ≥ 99.5% uptime per month.
  • Latency: e.g. P95 response time ≤ 2 seconds.
  • Accuracy floor: e.g. F1-score ≥ 0.80 on the Golden Set.

SLOs are established before Gate 4 and included in the handover documentation.


4. Team & Roles

Role Responsibility R/A/C/I
MLOps Engineer Manages operational dashboard, configures alerts R
Data Scientist Manages model quality dashboard, analyses trends R
AI Product Manager Manages strategic dashboard, guards ROI and SLOs A
Guardian Guards governance dashboard, reports deviations C
CAIO Receives monthly strategic report I

5. Exit Criteria

  • All four KPI categories are visible in the right dashboard.
  • Thresholds and alert rules are documented and tested.
  • SLOs are established and shared with the management organisation.
  • First monthly report has been delivered to the CAIO.

6. Deliverables

Deliverable Description Owner
Operational dashboard Real-time health monitoring MLOps Engineer
Model quality report Weekly summary of performance vs Golden Set Data Scientist
Monthly Strategic Report ROI, cost, compliance status AI PM
SLO document Established service standards and thresholds AI PM

7. DORA Framework and AI-Specific Extensions

The four DORA metrics (DevOps Research and Assessment) are an established standard for measuring software delivery performance. For AI systems we extend these with AI-specific indicators:

DORA Metric Definition AI Extension
Lead Time for Changes Time from commit to production + Time from prompt change to validated deployment
Deployment Frequency How often deployments occur + Frequency of model/prompt updates
Change Failure Rate % of deployments causing an incident + % of prompt changes causing quality decline
Mean Time to Recovery (MTTR) Average recovery time after incident + Recovery time after drift detection

AI-Specific Additional Metrics

Metric Definition Owner Cadence
Acceptance Rate % of AI suggestions actually adopted AI PM Weekly
Rework Percentage % of AI output requiring correction Tech Lead Weekly
Cost per Feature Total cost (tokens + compute + review) per delivered feature AI PM Monthly

Related modules:


Next step: Measure realised benefits via Benefits Realisation → See also: Performance Degradation Detection