
AI Glossary

Training Data

The text and examples that shape an AI model

Definition

Training data is the large collection of text, images, or other data from which an AI model learns. For LLMs, it includes books, websites, code, scientific papers, and more, often totaling trillions of tokens. The quality, diversity, and recency of the training data heavily influence a model's capabilities and biases. Training data also ends at a knowledge cutoff: the model has no built-in knowledge of anything that happened after that date.
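As a rough illustration of the knowledge cutoff, the sketch below filters a toy corpus by date and counts the remaining tokens. The corpus, the helper names `select_training_docs` and `count_tokens`, and the whitespace-based token count are all assumptions made for this example; real training pipelines and tokenizers are far more involved.

```python
from datetime import date

# Hypothetical toy corpus: (publication date, text) pairs.
corpus = [
    (date(2021, 5, 1), "Transformers use attention to process sequences."),
    (date(2023, 3, 14), "A new model was released this week."),
    (date(2024, 9, 30), "Results from a recent benchmark were published."),
]


def select_training_docs(docs, cutoff):
    """Keep only documents published on or before the cutoff date."""
    return [(published, text) for published, text in docs if published <= cutoff]


def count_tokens(docs):
    """Approximate the token count by splitting on whitespace.

    Real tokenizers split text differently, so this is only a rough stand-in.
    """
    return sum(len(text.split()) for _, text in docs)


cutoff = date(2023, 12, 31)  # assumed cutoff, chosen for illustration
training_docs = select_training_docs(corpus, cutoff)
print(len(training_docs), count_tokens(training_docs))
```

Anything dated after the cutoff (here, the 2024 document) never reaches the model during training, which is why the model cannot know about it unless the information is supplied some other way.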

