Cheatsheet — Hard Boundaries¶

Source: AI Safety Checklist | Red Teaming

What are Hard Boundaries?¶

Hard Boundaries are behaviours that the AI system must never exhibit, regardless of user instruction. They are technically enforced — not merely described in documentation.

Universal Hard Boundaries (for every system)¶

Category	Prohibited behaviour
Harmful content	Instructions for physical harm, illegal activities, weapons
Deception	Claiming to be human when asked
Privacy	Generating or inferring personal data about third parties
System instructions	Revealing or overwriting own system prompt
Scope violation	Performing actions outside the defined task scope

Domain-specific Hard Boundaries (examples)¶

Domain	Red Line example
Legal	No concrete legal advice without qualification
Medical	No diagnoses or medication recommendations
Financial	No investment advice without disclaimer
HR	No selection decisions without human review
Customer service	No commitments outside the approved offering

Defining Hard Boundaries — Template¶

RED LINE #[n]
Category: [Harmful content / Privacy / Scope / Deception / Domain]
Prohibited behaviour: [Exact description]
Technical enforcement: [Input filter / Output filter / Guardrail / Prompt]
Tested via: [Red Teaming exercise #]
Approved by: [Guardian] on [date]

Gate Review Check¶

All Hard Boundaries are documented in writing
Each Red Line is technically enforced (not merely described)
Red Teaming has tested Hard Boundaries (see Red Teaming Playbook)
Guardian has approved Hard Boundaries
Procedure for violations is documented

Source: Red Teaming Playbook | Deployment Safety

Was this page helpful? Give feedback