AI Glossary
Quantization
Making large AI models smaller and faster
Definition
Quantization is a model compression technique that reduces the numerical precision of a model's weights (e.g., from 32-bit floats to 4-bit integers). Lower-precision weights take less memory and are usually faster to run, typically with only a small loss in quality. As a result, quantized models can run on consumer hardware like laptops and phones, which is why tools like Ollama can run capable models locally.
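To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor quantization, written in plain Python. The function names (quantize, dequantize) and the sample weights are illustrative assumptions, not the method used by any particular tool; real systems typically quantize in small blocks and pack the integers tightly in memory.

```python
def quantize(weights, bits=4):
    """Map float weights to signed integers using one shared scale (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1                 # largest magnitude used, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Illustrative values only, not taken from a real model.
weights = [0.12, -0.53, 0.98, -0.05, 0.31]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)          # small integers that each fit in 4 bits
print(restored)   # close to, but not exactly, the original weights
print(max_error)  # rounding error, at most half of one scale step
```

Each weight is stored as a small integer plus one shared scale, so storage drops from 32 bits to roughly 4 bits per weight. The restored values differ slightly from the originals, which is the source of the quality loss mentioned above.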