AI Glossary

Benchmark

Standardized tests for comparing AI model capability

Definition

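A benchmark is a standardized test or dataset used to measure and compare the performance of AI models under the same conditions. Common benchmarks include MMLU (broad knowledge), HumanEval (code generation), GPQA (graduate-level reasoning), and LMSYS Chatbot Arena (head-to-head user preference). Because every model is scored on the same tasks, benchmarks give buyers a common basis for comparison. Scores can be "gamed", however: a model trained on benchmark data, whether deliberately or through contamination, may score higher than its real-world capability warrants.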
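As a rough illustration of how a benchmark score is produced, the sketch below grades two models against the same answer key and reports accuracy. The answer key, model names, and outputs are hypothetical placeholders, not data from any real benchmark, and real benchmarks often use more involved metrics than exact-match accuracy.

```python
def accuracy(predictions, answers):
    """Return the fraction of items where the prediction matches the reference answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("benchmark must contain at least one item")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical answer key for a four-item multiple-choice benchmark.
answer_key = ["B", "D", "A", "C"]

# Hypothetical outputs from two models on the same four items.
model_outputs = {
    "model_x": ["B", "D", "A", "A"],
    "model_y": ["B", "C", "A", "A"],
}

# Score every model against the same key so the results are comparable.
scores = {name: accuracy(preds, answer_key) for name, preds in model_outputs.items()}

for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

The comparison is meaningful only because both models answer the same items and are graded against the same key. If one model had seen the answer key during training, its score would no longer say much about how it handles unseen questions.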

Related Terms

Large Language Model (LLM)
Inference
