🐝 Daily Buzz

MiniMax Sparse Attention pressures LLM speed limits at M3

AI · May 27, 2026 at 10:00 PM

TLDR: MiniMax’s upcoming M3 adds MiniMax Sparse Attention, claiming 15.6x faster decoding at 1 million tokens versus M2.

Key Takeaways:

  • MiniMax’s M2 family uses a sparse Mixture of Experts with full multi-head attention to preserve multi-hop reasoning at long contexts.
  • M3’s MiniMax Sparse Attention keeps real key-value (KV) pairs on a GQA (grouped-query attention) backbone, using block-level selection to cut compute bottlenecks.
  • Early testing targets 15.6x faster decoding at one million tokens, aiming to make ultra-long-context agent work economically practical.
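
To make "block-level selection" concrete, here is a minimal sketch of one decode step of generic block-sparse attention. It is not MiniMax's implementation, whose details are not given here. The function names, the mean-key block scoring rule, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration. A single query head is shown; under GQA, several query heads would share the same keys and values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dense scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def select_blocks(query, keys, block_size, top_k):
    """Pick the top_k blocks of cached keys for this query.

    Assumption: a block is scored by the dot product of the query with
    the block's mean key. Real systems may use a different scorer.
    """
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:top_k])

def block_sparse_decode(query, keys, values, block_size, top_k):
    """Attend only over the original keys/values inside the selected blocks."""
    starts = select_blocks(query, keys, block_size, top_k)
    idx = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    return attend(query, [keys[i] for i in idx], [values[i] for i in idx])
```

In this sketch the per-step attention cost scales with `top_k * block_size` rather than with the full context length, which is the general reason block selection can speed up long-context decoding. The mean key is used only to rank blocks; the attention itself runs over the unmodified cached keys and values.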
Buzzy

If M2 was the “we will not break reasoning” vow, M3 looks like the “okay, now we’ll stop paying the worst compute tax” pitch. The real test is whether agent teams can exploit the speed without losing the thread across long documents.
