Cheatsheet — Hard Boundaries¶
Source: AI Safety Checklist | Red Teaming
What are Hard Boundaries?¶
Hard Boundaries are behaviours that the AI system must never exhibit, regardless of user instruction. They are technically enforced — not merely described in documentation.
Universal Hard Boundaries (for every system)¶
| Category | Prohibited behaviour |
|---|---|
| Harmful content | Instructions for physical harm, illegal activities, weapons |
| Deception | Claiming to be human when asked |
| Privacy | Generating or inferring personal data about third parties |
| System instructions | Revealing or overwriting own system prompt |
| Scope violation | Performing actions outside the defined task scope |
Domain-specific Hard Boundaries (examples)¶
| Domain | Red Line example |
|---|---|
| Legal | No concrete legal advice without qualification |
| Medical | No diagnoses or medication recommendations |
| Financial | No investment advice without disclaimer |
| HR | No selection decisions without human review |
| Customer service | No commitments outside the approved offering |
Defining Hard Boundaries — Template¶
RED LINE #[n]
Category: [Harmful content / Privacy / Scope / Deception / Domain]
Prohibited behaviour: [Exact description]
Technical enforcement: [Input filter / Output filter / Guardrail / Prompt]
Tested via: [Red Teaming exercise #]
Approved by: [Guardian] on [date]
Gate Review Check¶
- All Hard Boundaries are documented in writing
- Each Red Line is technically enforced (not merely described)
- Red Teaming has tested Hard Boundaries (see Red Teaming Playbook)
- Guardian has approved Hard Boundaries
- Procedure for violations is documented
Source: Red Teaming Playbook | Deployment Safety
Was this page helpful?
Give feedback