[AI] STORY TIMELINE

LATEST AI MODELS FAIL ON REASONING TASKS

Analysis of OpenAI's GPT-5.5 and Anthropic's Opus 4.7 on the ARC-AGI-3 benchmark reveals three systematic reasoning errors that keep both models below 1 percent accuracy on tasks humans solve routinely.

1 SOURCE | FIRST SEEN MAY 2, 01:31 PM

The Decoder (+0m)

Even the latest AI models make three systematic reasoning errors, ARC-AGI-3 analysis shows

The ARC Prize Foundation analyzed 160 game runs of OpenAI's GPT-5.5 and Anthropic's Opus 4.7 on the ARC-AGI-3 benchmark.…
