AI Glossary
Guardrails
Rules and filters that prevent AI systems from producing harmful outputs
Definition
Guardrails are safety mechanisms applied to AI systems to prevent them from producing harmful, inappropriate, or off-topic outputs. They can be implemented through fine-tuning (constitutional AI, RLHF), system prompts, or post-generation filtering. Major AI APIs ship with built-in guardrails, and enterprise deployments often add custom guardrails for their specific use cases.
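The simplest of the three approaches to illustrate is post-generation filtering: the model's output is checked against rules before it reaches the user. The sketch below is a minimal, hypothetical example; the blocklist patterns, the `apply_guardrail` function, and the refusal message are illustrative assumptions, not part of any particular library or API.

```python
import re

# Illustrative blocklist; real guardrails typically use trained
# classifiers or moderation APIs rather than regex alone.
BLOCKED_PATTERNS = [
    re.compile(r"\bcredit card number\b", re.IGNORECASE),
    re.compile(r"\bhow to make a weapon\b", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that."

def apply_guardrail(model_output: str) -> str:
    """Return the model output unchanged, or a refusal message
    if any blocked pattern appears (post-generation filtering)."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return REFUSAL
    return model_output
```

A benign output passes through untouched, while a match on any pattern replaces the entire response with the refusal; production systems usually layer this kind of check on top of training-time and prompt-level guardrails rather than relying on it alone.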