AI Glossary
Guardrails
Rules and filters that prevent AI systems from producing harmful outputs
Definition
Guardrails are safety mechanisms applied to AI systems to prevent them from producing harmful, inappropriate, or off-topic outputs. They can be implemented through fine-tuning (constitutional AI, RLHF), system prompts, or post-generation filtering. Major AI APIs ship with built-in guardrails, and enterprise deployments often add custom guardrails for their specific use cases.
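The simplest of the three approaches to illustrate is post-generation filtering: the model's output is checked against rules before it reaches the user. The sketch below is a minimal, hypothetical example; the blocklist patterns, the `apply_guardrail` function, and the refusal message are illustrative assumptions, not part of any particular library or API.

```python
import re

# Illustrative blocklist; real guardrails typically use trained
# classifiers or moderation APIs rather than regex alone.
BLOCKED_PATTERNS = [
    re.compile(r"\bcredit card number\b", re.IGNORECASE),
    re.compile(r"\bhow to make a weapon\b", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that."

def apply_guardrail(model_output: str) -> str:
    """Return the model output unchanged, or a refusal message
    if any blocked pattern appears (post-generation filtering)."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return REFUSAL
    return model_output
```

A benign output passes through untouched, while a match on any pattern replaces the entire response with the refusal; production systems usually layer this kind of check on top of training-time and prompt-level guardrails rather than relying on it alone.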