Cheatsheet: Evidence Standards

Evidence Levels

Level	Description	Example
L1 — Claim	Assertion without substantiation	"The model is accurate"
L2 — Indication	Single measurement or anecdote	One test result
L3 — Evidence	Repeatable measurement on representative set	Golden Set score on 200 items
L4 — Strong Evidence	Multiple methods, independently validated	Golden Set + human review + A/B test

Minimum requirement for Gate 2: level L3 or higher.
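Because the levels form a strict order from L1 to L4, a gate check reduces to a single comparison. The sketch below encodes the levels as an ordered enum and tests the Gate 2 minimum. The names `EvidenceLevel`, `GATE2_MINIMUM` and `meets_gate2` are hypothetical and only illustrate the rule; they are not part of any existing tooling.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Hypothetical encoding of the evidence levels; a higher value means stronger evidence."""
    L1_CLAIM = 1            # assertion without substantiation
    L2_INDICATION = 2       # single measurement or anecdote
    L3_EVIDENCE = 3         # repeatable measurement on a representative set
    L4_STRONG_EVIDENCE = 4  # multiple methods, independently validated


# Gate 2 requires level L3 or higher.
GATE2_MINIMUM = EvidenceLevel.L3_EVIDENCE


def meets_gate2(level: EvidenceLevel) -> bool:
    """Return True if the given evidence level satisfies the Gate 2 minimum."""
    return level >= GATE2_MINIMUM


print(meets_gate2(EvidenceLevel.L2_INDICATION))       # False: one test result is not enough
print(meets_gate2(EvidenceLevel.L4_STRONG_EVIDENCE))  # True
```

Using `IntEnum` keeps the comparison readable (`level >= GATE2_MINIMUM`) while still naming each level explicitly.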

Artefact	Minimum level	Method
Output quality	L3	Golden Set + automated metric
Fairness	L3	Segmented analysis per group
Safety (High Risk)	L4	Red Teaming + independent review
Latency	L3	Load test reporting p95 and p99 latency (p95 = 95th percentile: 95% of requests complete at or below this latency)
Cost projection	L2	Calculator + documented assumptions
Traceability	L3	Audit trail demonstrated
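To make the latency row concrete, the sketch below computes p95 and p99 from load-test samples using the nearest-rank method: the smallest sample such that at least p% of all samples are at or below it. This is one common percentile definition, assumed here for illustration; interpolating definitions (for example the default in some numerical libraries) can give slightly different values on small samples. The function name and the sample latencies are hypothetical.

```python
import math


def percentile_nearest_rank(latencies_ms: list[float], p: float) -> float:
    """Return the nearest-rank p-th percentile of the latency samples.

    The result is the smallest sample such that at least p% of all
    samples are less than or equal to it.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # 1-based rank; multiply before dividing to avoid float rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical load-test sample: 20 request latencies in milliseconds.
samples = [120, 95, 130, 110, 105, 98, 400, 115, 102, 99,
           101, 125, 97, 108, 112, 118, 103, 250, 106, 100]

print("p95:", percentile_nearest_rank(samples, 95))  # p95: 250
print("p99:", percentile_nearest_rank(samples, 99))  # p99: 400
```

With only 20 samples, p95 and p99 are driven by the two slowest requests; a real load test needs far more requests for these tail percentiles to be stable and repeatable at level L3.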

Each piece of evidence must include at minimum:

Insufficient evidence

Source: Evidence Standards | Validation Report
