Free Tool Arena


Constitutional AI

Constitutional AI (CAI) is Anthropic's alignment technique in which an AI model provides feedback against a written "constitution" of principles instead of relying on human preference rankings. It is the training method behind Claude.

Updated May 2026 · 4 min read


What it means

Original paper: Bai et al., 2022. The process:

1. Train a base model with supervised fine-tuning.
2. Have the model critique its own outputs against a constitution (principles such as "be helpful, harmless, and honest").
3. Use these critiques to revise the outputs, then fine-tune on the revised responses.
4. Train a preference model on AI-ranked response pairs (RLAIF, reinforcement learning from AI feedback).

The result: Claude's tendency to refuse harmful requests, hedge uncertain claims, and explain its reasoning is shaped largely by CAI rather than by direct human feedback.
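The critique-and-revision steps above can be sketched in a few lines. This is a minimal illustration, not Anthropic's actual implementation: the `generate` function is a placeholder for any chat-model call, and the principle texts are paraphrased, not quotes from a published constitution.

```python
# Minimal sketch of the CAI critique-revision loop (steps 2-3 above).
# Hypothetical principles, paraphrased for illustration only.
CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid content that assists with dangerous or illegal activity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    return f"<model output for: {prompt[:40]}>"

def critique_and_revise(prompt: str) -> str:
    draft = generate(prompt)
    for principle in CONSTITUTION:
        # Ask the model to critique its own draft against one principle...
        critique = generate(
            f"Critique this response against the principle "
            f"'{principle}':\n\n{draft}"
        )
        # ...then to rewrite the draft addressing that critique.
        draft = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft  # revised outputs become supervised fine-tuning data
```

In the real pipeline the revised responses are collected into a dataset and the base model is fine-tuned on them before the RLAIF stage.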


Why it matters

CAI scales better than RLHF because it doesn't require thousands of human raters for each model iteration. It's also more transparent: you can read the constitution and understand why the model behaves as it does. Anthropic's Claude family is the highest-profile CAI implementation, and other labs increasingly adopt similar patterns.


Frequently asked questions

What's in the constitution?

Anthropic publishes its constitutions: a mix of broad principles (be helpful, harmless, and honest), references to UN human rights documents, and specific behavioral rules ("don't do X").

CAI vs RLHF?

RLHF uses human raters to rank model responses; CAI uses an AI model to rank them against a written constitution. CAI scales better, while RLHF has historically produced higher-quality rankings on subtle cases.
