AI Glossary
Latency
How long an AI model takes to respond
Definition
Latency in AI refers to the time between sending a request to a model and receiving its response. For real-time applications such as chatbots and code assistants, low latency (milliseconds to a few seconds) is critical. Factors affecting latency include model size, hardware, and network conditions. Streaming responses (emitting output token by token) reduce perceived latency, since the user sees the first token almost immediately, even though total generation time is unchanged.
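The streaming effect can be sketched with a minimal simulation. This is not a real model API; generate_tokens, its delays, and the token strings are all hypothetical, chosen only to show how time-to-first-token can be far smaller than total generation time.

```python
import time

def generate_tokens(n_tokens=5, per_token_delay=0.05):
    """Hypothetical stand-in for a model that streams output token by token."""
    for i in range(n_tokens):
        time.sleep(per_token_delay)  # simulated per-token generation cost
        yield f"token{i}"

start = time.perf_counter()
first_token_latency = None
tokens = []
for tok in generate_tokens():
    if first_token_latency is None:
        # With streaming, the user starts reading here,
        # after roughly one per-token delay.
        first_token_latency = time.perf_counter() - start
    tokens.append(tok)
# Without streaming, the user waits this long for any output.
total_latency = time.perf_counter() - start

# Total generation time is the same either way; only the wait
# before the first visible output shrinks.
print(f"first token after {first_token_latency:.2f}s, "
      f"full response after {total_latency:.2f}s")
```

In a real system the same measurement applies: perceived latency is the time to the first streamed token, while total latency is the time to the final one.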