AI MODEL RUNS ON 12.5% OF EXPERTS WITH MINIMAL LOSS
AI DESK
SAT, MAY 16, 2026■ AI-SUMMARIZED FROM 1 SOURCE
Researchers at the Allen Institute for AI and UC Berkeley have developed EMO, a mixture-of-experts model that maintains near-full performance while using just one-eighth of its experts. The breakthrough could make advanced AI systems practical for memory-constrained devices.
A new approach to training mixture-of-experts (MoE) models shows that such a model can retain near-full performance with only a fraction of its experts loaded.
Traditional MoE architectures organize experts by word types or linguistic features. The new EMO model instead groups experts by content domains, enabling researchers to remove 75 percent of the experts while sacrificing only about one percentage point of performance.
This efficiency gain addresses a critical limitation of current MoE models: every expert normally has to be held in memory, which makes deployment difficult in resource-constrained environments. By sharply reducing the number of experts that must be kept, EMO could make it possible to run these models on devices with limited RAM and storage.
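To give a sense of scale, the memory argument can be sketched as simple arithmetic. The figures below (128 experts, 50 million parameters per expert, 2 bytes per parameter) are hypothetical placeholders chosen for illustration, not EMO's published configuration, and the helper `expert_weight_bytes` is an invented name. The estimate covers expert weights only; shared components such as attention layers and embeddings are not reduced by dropping experts.

```python
def expert_weight_bytes(num_experts, params_per_expert, bytes_per_param,
                        keep_fraction=1.0):
    """Bytes needed to store the experts that are kept.

    All inputs are illustrative assumptions, not values from the EMO work.
    """
    # Keep at least one expert, even for very small fractions.
    kept = max(1, round(num_experts * keep_fraction))
    return kept * params_per_expert * bytes_per_param


# Hypothetical model: 128 experts, 50M parameters each, 16-bit weights.
full = expert_weight_bytes(128, 50_000_000, 2)
pruned = expert_weight_bytes(128, 50_000_000, 2, keep_fraction=0.125)

print(f"all experts:      {full / 2**30:.2f} GiB")
print(f"12.5% of experts: {pruned / 2**30:.2f} GiB")
print(f"ratio:            {pruned / full:.3f}")
```

Under these assumptions the expert weights shrink in direct proportion to the fraction kept, so total savings for a real model depend on how much of its size sits in the experts rather than in shared layers.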
The domain-based specialization appears to create cleaner separation between expert functions than traditional approaches. This structure allows for more aggressive pruning without cascading performance degradation. Researchers can identify and remove experts that handle less common or overlapping content areas.
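One simple way to picture this kind of pruning is to count how often the router sends a domain's tokens to each expert and keep only the most-used ones. The sketch below is an illustration under that assumption, not the researchers' actual selection procedure; `select_experts` and the toy routing log are hypothetical.

```python
import math
from collections import Counter


def select_experts(routing_log, num_experts, keep_fraction):
    """Return the sorted IDs of the most frequently routed-to experts.

    routing_log: iterable of expert IDs, one per routed token, collected on
    sample text from the target domain (hypothetical data format).
    """
    counts = Counter(routing_log)
    keep = max(1, math.ceil(num_experts * keep_fraction))
    # Rank by descending usage; break ties by expert ID for determinism.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    return sorted(ranked[:keep])


# Toy log: tokens from one domain mostly land on experts 3 and 5.
routing_log = [0, 3, 3, 5, 3, 0, 5, 5, 3, 1]
kept = select_experts(routing_log, num_experts=8, keep_fraction=0.25)
print(kept)  # [3, 5]
```

If experts are cleanly separated by domain, the usage counts for a given domain concentrate on a few experts, which is what makes dropping the rest cheap in this picture.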
The one-percentage-point performance drop represents a practical trade-off. For many applications, particularly those not requiring maximum accuracy, the efficiency gains could outweigh this modest performance cost.
The work carries implications beyond just smaller devices. Faster inference speeds from reduced expert activation could cut operational costs for large-scale AI services. Lower memory requirements could also expand the scope of edge deployment scenarios where full models currently prove impractical.
Mixture-of-experts has emerged as a key scaling strategy for large language models, with major implementations from Meta, Google, and others. However, scaling benefits come with memory and latency penalties that have limited real-world adoption. Solutions that make MoE models more efficient address a genuine infrastructure challenge.
The research suggests that architectural choices shape how readily an AI model can be optimized after training. Organizing experts by content domain rather than by linguistic patterns appears to produce models that tolerate more aggressive pruning. This insight could influence how future large models structure their expert components.
Further work will likely explore how this approach scales to even larger models and whether similar domain-based specialization benefits other AI architectures.
■ SOURCES
► The Decoder■ SUMMARY WRITTEN BY AI FROM THE LINKS ABOVE