AI Glossary
Mixture of Experts (MoE)
A model architecture that activates only part of its capacity per token
Definition
Mixture of Experts (MoE) is a neural network architecture in which the model contains many specialized sub-networks ("experts") plus a learned router (gating network) that activates only a small subset of experts, typically the top-k by router score, for each input token. Because most parameters sit idle on any given token, a model can have a very large total parameter count while keeping per-token compute low. Mixtral 8x7B, for example, routes each token to 2 of 8 experts per layer, and GPT-4 reportedly uses an MoE architecture as well.
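The routing idea above can be sketched in a few lines. This is a minimal illustration, not any production implementation: the experts are plain linear maps (real models use small feed-forward networks), and the dimensions, weight names, and top-k choice are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden size, expert count, experts active per token

# Router: one score per expert for a given token (illustrative random weights).
W_router = rng.standard_normal((D, N_EXPERTS))
# Experts: each a simple linear layer here; real MoE experts are small FFNs.
W_experts = rng.standard_normal((N_EXPERTS, D, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token):
    """Route one token through its top-k experts and mix their outputs."""
    scores = token @ W_router                 # shape (N_EXPERTS,)
    top = np.argsort(scores)[-TOP_K:]         # indices of the k highest-scoring experts
    weights = softmax(scores[top])            # renormalize over the chosen experts only
    # Only TOP_K of the N_EXPERTS experts do any computation for this token,
    # which is the source of MoE's efficiency.
    return sum(w * (token @ W_experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(D))
print(out.shape)  # (8,)
```

Note that the output keeps the token's hidden dimension, so the layer can drop into a transformer block in place of a dense feed-forward layer; total parameters grow with N_EXPERTS, but per-token work grows only with TOP_K.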