🐝 Daily Buzz

GPT 5.5 outruns Claude Fable 5 on ALE leaderboard

AI · June 11, 2026 at 01:30 AM

TLDR (Berkeley): GPT 5.5 via Codex scored 24% on Berkeley RDI’s Agents’ Last Exam (ALE), beating Claude Fable 5 at 22%. The benchmark targets real, long-horizon professional work and exposes low pass rates at the hardest tier.

Key Takeaways:

  • ALE replaces brittle text tests with Generalist Computer Use Agent trials across the Brain, Eyes, Body, Hands, and Feet layers.
  • On the ALE Leaderboard, Codex GPT 5.5 led with a 24.0% pass rate, followed by Ale Claw at 23.0% and Claude Fable 5 at 22.0%.
  • On the hardest Last Exam tasks, many models scored 0.0%. ALE limits benchmark contamination by keeping 1,300+ tasks private and rotating them.
Buzzy

This is the kind of benchmark win that feels more like a missing baseline than a victory lap. When the hardest tier collapses to near zero, the scorecard doubles as a warning label for agent hype.
