AI Safety Checklist

Purpose

Structured safety checks across four dimensions: training, deployment, monitoring and governance. Use this checklist at every Gate Review; which sections are mandatory depends on the system's risk class.

Risk-proportional use

Minimal Risk: complete section 4 (Governance).
Limited Risk: complete sections 2 and 4.
High Risk: all four sections are mandatory.

Section 1 — Training & Data Safety

Applies to self-trained and fine-tuned models. Skip this section if you only use a foundation model through its API.

Check	Status	Note
Training data evaluated for harmful content	☐
Bias detected and documented in training data	☐
Personal data in training data minimised or pseudonymised	☐
Data sources documented (origin, licence, dates)	☐
Adversarial examples included in training set	☐
Model weights securely stored (access control, version management)	☐
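One way to satisfy the pseudonymisation item is keyed hashing. The sketch below is a minimal illustration, not a prescribed method: it drops every field outside a hypothetical allow-list (`keep`) and replaces direct identifiers (the hypothetical `DIRECT_IDENTIFIERS` set) with an HMAC-SHA256 token. Field names and key handling are assumptions; the key would normally live in a secrets store, kept apart from the training data.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers; adapt to your own schema.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymise(value: str, key: bytes) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256).

    The same value and key always yield the same token, so records can still
    be linked, but without the key the token cannot be recomputed from a
    guessed value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise_record(record: dict, key: bytes, keep: set) -> dict:
    """Keep only the fields in `keep`; pseudonymise direct identifiers among them."""
    out = {}
    for name, value in record.items():
        if name not in keep:
            continue  # data minimisation: drop fields the model does not need
        out[name] = pseudonymise(str(value), key) if name in DIRECT_IDENTIFIERS else value
    return out


# Usage (placeholder key; load the real one from a secrets store)
record = {"email": "a@example.com", "age": 42, "notes": "free text"}
print(minimise_record(record, b"placeholder-key", keep={"email", "age"}))
```

Keyed hashing is used rather than a plain hash because unkeyed hashes of e-mail addresses or phone numbers can be reversed by hashing candidate values.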

Section 2 — Deployment Safety

Check	Status	Note
Input filtering configured (block prohibited inputs)	☐
Output filtering configured (block prohibited outputs)	☐
Hard Boundaries documented and technically enforced	☐
Rate limiting configured (abuse prevention)	☐
Circuit Breaker configured (see Incident Response)	☐
Least-privilege access: system has minimum required permissions	☐
System prompt protected against extraction	☐
Users informed they are interacting with AI (transparency obligation)	☐
Human-in-the-loop mechanism operational for impactful decisions	☐
Exit procedure for users documented (escalation to human)	☐
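For the rate-limiting item, a token bucket is one common technique. The following is a minimal in-memory sketch that assumes a single process and a hypothetical `client_id` per caller: `capacity` bounds bursts and `rate` sets the sustained number of requests per second. The default values are placeholders, and a production deployment would more likely enforce this at an API gateway or in a shared store.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity`,
    refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock          # injectable for testing
        self.tokens = capacity      # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: reject (e.g. respond with HTTP 429)


# One bucket per client (in-memory, single process only).
buckets = {}


def check_request(client_id: str, capacity: float = 10, rate: float = 1.0) -> bool:
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity, rate)
    return buckets[client_id].allow()
```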

Section 3 — Monitoring Safety

Check	Status	Note
Logging of inputs and outputs active (with retention policy)	☐
Quality monitoring active (thresholds configured)	☐
Drift detection configured (see Drift Detection)	☐
Fairness metrics monitored (if the system serves multiple user groups)	☐
Anomaly detection on usage (unusual patterns, abuse)	☐
Alerting to responsible party on threshold breach	☐
Procedure in place for users to report harmful outputs	☐
Periodic sample review of outputs scheduled	☐

Section 4 — Governance Safety

Check	Status	Note
Guardian appointed and actively involved	☐
Safety review performed at every Gate	☐
Red Teaming performed (High/Limited Risk)	☐
Incident response procedure documented and tested	☐
Accountable owner for the system named	☐
Model Card up-to-date with known limitations and risks	☐
Periodic recertification scheduled (min. annually for High Risk)	☐
EU AI Act compliance status documented	☐

Constitutional AI — Guidelines for Autonomous Systems

For Collaboration Modes 4 and 5 (where the system acts autonomously), the following Constitutional AI principles also apply:

The Three Core Principles

1. Harmlessness: no harm. The system avoids actions that could cause harm to users, third parties or the organisation. Explicitly define which actions are prohibited regardless of instructions.

2. Honesty: no deception. The system communicates transparently about its capabilities, uncertainties and limitations. It does not fabricate facts and says so when it does not know something.

3. Helpfulness: relevant assistance. The system genuinely tries to be helpful within the defined scope. Whenever it refuses a request, it explains why and offers an alternative.

Implementation Checklist for Autonomous Systems

Requirement	Status
Action scope technically bounded (which systems/actions are accessible)	☐
Prohibited actions explicitly documented (not only implicitly expected)	☐
Maximum impact per action bounded (e.g. maximum transaction value)	☐
Self-critique mechanism: system checks own output before execution	☐
Human approval required above defined impact threshold	☐
Audit trail of all autonomous actions (immutable)	☐
Explainability: system can explain its decision on request	☐
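Several of these requirements (bounded action scope, explicitly prohibited actions, a maximum impact per action, human approval above a threshold, and an audit trail) can be combined into a single pre-execution gate. The sketch below is hypothetical: the action names, `MAX_IMPACT` and `APPROVAL_THRESHOLD` are placeholder values, not recommendations. Each decision is appended to a hash-chained log, which makes tampering detectable; actual immutability would also require write-once storage.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy for illustration; replace with your documented values.
ALLOWED_ACTIONS = {"read_invoice", "send_reminder", "issue_refund"}
PROHIBITED_ACTIONS = {"delete_customer", "change_bank_details"}
MAX_IMPACT = 5_000.0         # hard cap per action (e.g. transaction value)
APPROVAL_THRESHOLD = 500.0   # above this, a named human must approve

GENESIS = "0" * 64


def _digest(prev: str, record: dict) -> str:
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log; each entry hashes its predecessor, so edits are detectable."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"record": record, "prev": prev, "hash": _digest(prev, record)})

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


def gate(action: str, impact: float, approved_by: Optional[str], log: AuditLog) -> str:
    """Return 'execute', 'needs_approval' or 'blocked', and log the decision."""
    if action in PROHIBITED_ACTIONS or action not in ALLOWED_ACTIONS:
        decision = "blocked"         # explicit deny list, then allow-list scope
    elif impact > MAX_IMPACT:
        decision = "blocked"         # exceeds the per-action impact cap
    elif impact > APPROVAL_THRESHOLD and approved_by is None:
        decision = "needs_approval"  # human-in-the-loop above the threshold
    else:
        decision = "execute"
    log.append({"action": action, "impact": impact,
                "approved_by": approved_by, "decision": decision})
    return decision
```

For example, `gate("issue_refund", 800, None, log)` returns `"needs_approval"`, while the same call with a named approver returns `"execute"`. Blocked and pending decisions are logged as well, so the audit trail also shows what the system attempted.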

Safety Score

Count the checked items per section. The safety score is the number of checked items divided by the total number of items, expressed as a percentage:

Section	Checked	Total	%
1 — Training & Data Safety		6
2 — Deployment Safety		10
3 — Monitoring Safety		8
4 — Governance Safety		8
Total		32

Minimum thresholds for go-live:

High Risk: ≥ 90% (≥ 29/32)
Limited Risk: ≥ 75% (≥ 24/32, section 1 optional)
Minimal Risk: section 4 complete
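The score and go-live thresholds can be checked mechanically. The sketch below uses the section totals and percentages from this page; the function names are hypothetical. It assumes the denominator is always 32 (as in the table), and it interprets "section 4 complete" for Minimal Risk as all eight Governance items checked.

```python
# Section totals from the Safety Score table.
SECTION_TOTALS = {1: 6, 2: 10, 3: 8, 4: 8}
TOTAL = sum(SECTION_TOTALS.values())  # 32


def safety_score(checked: dict) -> float:
    """Percentage of checked items out of all 32, given {section: checked_count}."""
    for section, n in checked.items():
        if not 0 <= n <= SECTION_TOTALS[section]:
            raise ValueError(f"section {section}: {n} outside 0..{SECTION_TOTALS[section]}")
    return 100.0 * sum(checked.values()) / TOTAL


def go_live_allowed(risk: str, checked: dict) -> bool:
    if risk == "high":
        return safety_score(checked) >= 90.0   # i.e. at least 29/32
    if risk == "limited":
        return safety_score(checked) >= 75.0   # i.e. at least 24/32
    if risk == "minimal":
        # Assumption: "section 4 complete" means every Governance item is checked.
        return checked.get(4, 0) == SECTION_TOTALS[4]
    raise ValueError(f"unknown risk class: {risk}")


print(safety_score({1: 6, 2: 10, 3: 7, 4: 6}))  # → 90.625
```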