LATEST AI MODELS FAIL ON REASONING TASKS
Analysis of OpenAI's GPT-5.5 and Anthropic's Opus 4.7 on the ARC-AGI-3 benchmark reveals three systematic reasoning errors that keep both models below 1 percent accuracy on tasks humans solve routinely.
The Decoder
The ARC Prize Foundation analyzed 160 game runs of OpenAI's GPT-5.5 and Anthropic's Opus 4.7 on the ARC-AGI-3 benchmark.…