AI Glossary
Inference
Running a trained AI model to generate output
Definition
Inference is the process of using a trained AI model to generate outputs from new inputs. It is distinct from training, which is the process that creates the model. For language models, inference speed is typically measured in tokens per second, and it directly affects perceived responsiveness. Inference cost is the primary driver of API pricing: more capable models are more expensive to run per token.
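The generate-and-measure loop behind tokens per second can be sketched in plain Python. The `dummy_model` below is a hypothetical stand-in for a trained model (it just emits a fixed sequence) so the autoregressive loop and the throughput calculation run without any ML framework; real inference would call a trained network at each step.

```python
import time

# Hypothetical stand-in for a trained model: maps the tokens generated
# so far to the next token. A real model would run a neural network here.
def dummy_model(tokens):
    vocabulary = ["Hello", ",", " world", "!", "<eos>"]
    return vocabulary[len(tokens) % len(vocabulary)]

def generate(model, prompt_tokens, max_new_tokens=4):
    """Autoregressive inference: repeatedly run the model on the
    sequence so far, append its output token, and time the loop."""
    tokens = list(prompt_tokens)
    start = time.perf_counter()
    for _ in range(max_new_tokens):
        next_token = model(tokens)
        if next_token == "<eos>":  # stop token ends generation early
            break
        tokens.append(next_token)
    elapsed = time.perf_counter() - start
    new_tokens = len(tokens) - len(prompt_tokens)
    # Tokens per second: the headline inference-speed metric.
    tokens_per_second = new_tokens / elapsed if elapsed > 0 else float("inf")
    return tokens, tokens_per_second

output, tps = generate(dummy_model, [])
print(output)
print(f"{tps:.0f} tokens/sec")
```

Each new token requires another pass through the model, which is why generation time grows with output length and why throughput is reported per token rather than per request.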