AI Glossary
Latency
How long an AI model takes to respond
Definition
Latency in AI refers to the time between sending a request to a model and receiving its response. For real-time applications such as chatbots and code assistants, low latency (milliseconds to a few seconds) is critical. Factors affecting latency include model size, hardware, and network conditions. Streaming responses (emitting output token by token) reduce perceived latency, since the user sees the first token almost immediately, even though total generation time is unchanged.
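The streaming effect can be sketched with a minimal simulation. This is not a real model API; generate_tokens, its delays, and the token strings are all hypothetical, chosen only to show how time-to-first-token can be far smaller than total generation time.

```python
import time

def generate_tokens(n_tokens=5, per_token_delay=0.05):
    """Hypothetical stand-in for a model that streams output token by token."""
    for i in range(n_tokens):
        time.sleep(per_token_delay)  # simulated per-token generation cost
        yield f"token{i}"

start = time.perf_counter()
first_token_latency = None
tokens = []
for tok in generate_tokens():
    if first_token_latency is None:
        # With streaming, the user starts reading here,
        # after roughly one per-token delay.
        first_token_latency = time.perf_counter() - start
    tokens.append(tok)
# Without streaming, the user waits this long for any output.
total_latency = time.perf_counter() - start

# Total generation time is the same either way; only the wait
# before the first visible output shrinks.
print(f"first token after {first_token_latency:.2f}s, "
      f"full response after {total_latency:.2f}s")
```

In a real system the same measurement applies: perceived latency is the time to the first streamed token, while total latency is the time to the final one.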