AI Glossary
Mixture of Experts (MoE)
A model architecture that activates only part of its capacity per token
Definition
Mixture of Experts (MoE) is a neural network architecture in which the model contains many specialized sub-networks ("experts") plus a learned router (gating network) that activates only a small subset of experts, typically the top-k by router score, for each input token. Because most parameters sit idle on any given token, a model can have a very large total parameter count while keeping per-token compute low. Mixtral 8x7B, for example, routes each token to 2 of 8 experts per layer, and GPT-4 reportedly uses an MoE architecture as well.
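The routing idea above can be sketched in a few lines. This is a minimal illustration, not any production implementation: the experts are plain linear maps (real models use small feed-forward networks), and the dimensions, weight names, and top-k choice are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden size, expert count, experts active per token

# Router: one score per expert for a given token (illustrative random weights).
W_router = rng.standard_normal((D, N_EXPERTS))
# Experts: each a simple linear layer here; real MoE experts are small FFNs.
W_experts = rng.standard_normal((N_EXPERTS, D, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token):
    """Route one token through its top-k experts and mix their outputs."""
    scores = token @ W_router                 # shape (N_EXPERTS,)
    top = np.argsort(scores)[-TOP_K:]         # indices of the k highest-scoring experts
    weights = softmax(scores[top])            # renormalize over the chosen experts only
    # Only TOP_K of the N_EXPERTS experts do any computation for this token,
    # which is the source of MoE's efficiency.
    return sum(w * (token @ W_experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(D))
print(out.shape)  # (8,)
```

Note that the output keeps the token's hidden dimension, so the layer can drop into a transformer block in place of a dense feed-forward layer; total parameters grow with N_EXPERTS, but per-token work grows only with TOP_K.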