AI Glossary
Inference
Running a trained AI model to generate output
Definition
Inference is the process of using a trained AI model to generate outputs from new inputs. It is distinct from training, which is the process that creates the model. For language models, inference speed is typically measured in tokens per second, and it directly affects perceived responsiveness. Inference cost is the primary driver of API pricing: more capable models are more expensive to run per token.
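The generate-and-measure loop behind tokens per second can be sketched in plain Python. The `dummy_model` below is a hypothetical stand-in for a trained model (it just emits a fixed sequence) so the autoregressive loop and the throughput calculation run without any ML framework; real inference would call a trained network at each step.

```python
import time

# Hypothetical stand-in for a trained model: maps the tokens generated
# so far to the next token. A real model would run a neural network here.
def dummy_model(tokens):
    vocabulary = ["Hello", ",", " world", "!", "<eos>"]
    return vocabulary[len(tokens) % len(vocabulary)]

def generate(model, prompt_tokens, max_new_tokens=4):
    """Autoregressive inference: repeatedly run the model on the
    sequence so far, append its output token, and time the loop."""
    tokens = list(prompt_tokens)
    start = time.perf_counter()
    for _ in range(max_new_tokens):
        next_token = model(tokens)
        if next_token == "<eos>":  # stop token ends generation early
            break
        tokens.append(next_token)
    elapsed = time.perf_counter() - start
    new_tokens = len(tokens) - len(prompt_tokens)
    # Tokens per second: the headline inference-speed metric.
    tokens_per_second = new_tokens / elapsed if elapsed > 0 else float("inf")
    return tokens, tokens_per_second

output, tps = generate(dummy_model, [])
print(output)
print(f"{tps:.0f} tokens/sec")
```

Each new token requires another pass through the model, which is why generation time grows with output length and why throughput is reported per token rather than per request.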