AI Glossary

Quantization

Making large AI models smaller and faster

Definition

Quantization is a model compression technique that reduces the numerical precision of a model's weights (e.g., from 32-bit floating-point numbers to 4-bit integers). Lower-precision weights take up less memory and are faster to load and compute with, usually at the cost of only a small drop in output quality. As a result, quantized models can run on consumer hardware like laptops and phones, and quantization is a large part of why tools like Ollama can run capable models locally.
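To make "reducing precision" concrete, here is a minimal sketch of one simple scheme: symmetric per-tensor ("absmax") rounding of float weights to signed 4-bit integers, plus the rough storage arithmetic. The function names are hypothetical, and the scheme is deliberately simplified. Production quantization methods usually add refinements such as per-block or per-channel scales, so treat this as an illustration of the idea, not how any particular tool does it.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed ints in [-qmax, qmax] using one shared scale.

    Hypothetical helper: simple per-tensor absmax quantization.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    # One float scale per tensor; guard against an all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats; each error is at most scale / 2."""
    return [v * scale for v in q]


def weight_storage_gb(num_params, bits):
    """Approximate weight storage in GB (ignores scales and other overhead)."""
    return num_params * bits / 8 / 1e9


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_symmetric(weights, bits=4)
print(q)  # [1, -4, 3, 7, -1]
print(dequantize(q, scale))  # close to, but not exactly, the original weights

# A hypothetical 7-billion-parameter model:
print(weight_storage_gb(7e9, 32))  # 28.0
print(weight_storage_gb(7e9, 4))   # 3.5
```

Each weight now needs only 4 bits instead of 32 (an 8x reduction), while the small rounding errors visible after dequantizing are the source of the "small drop in output quality."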

