
Mixture of Experts (MoE)

A neural network architecture in which the model is divided into specialized sub-networks called "experts", and a learned routing mechanism activates only a few of them for each input. MoE allows models to have very large total parameter counts while keeping inference costs manageable: a model with 400 billion total parameters might activate only 50 billion for any given query.
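
The sketch below illustrates the idea with a minimal top-k MoE layer in PyTorch. It is illustrative only: the class name SimpleMoELayer, the expert design, and the hyperparameters (8 experts, top-2 routing) are hypothetical choices, not taken from any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each "expert" is an independent feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = self.router(x)                           # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the rest stay idle,
        # which is what keeps compute per query far below the total parameter count.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out
```

With 8 experts and top-2 routing, each token passes through only a quarter of the expert parameters, even though all of them contribute to the model's total size.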

See: Architecture; Compute; Inference; Parameter