AI Glossary

AI Alignment

Making AI goals match human values

Definition

AI alignment is the challenge of ensuring AI systems pursue goals that align with human values and intentions. A misaligned AI might accomplish its literal objective while violating human intent. Alignment research includes techniques like RLHF, constitutional AI, and interpretability. It is a key focus for safety-conscious AI labs like Anthropic and OpenAI.

Related Tools

claude

← Back to Glossary

AI Alignment

Definition

Related Terms

Related Tools