AI Glossary
AI Alignment
Making AI goals match human values
Definition
AI alignment is the challenge of ensuring AI systems pursue goals that align with human values and intentions. A misaligned AI might accomplish its literal objective while violating human intent. Alignment research includes techniques like RLHF, constitutional AI, and interpretability. It is a key focus for safety-conscious AI labs like Anthropic and OpenAI.