
AI MODELS FAIL TO SPOT UNSOLVABLE MATH PROBLEMS

AI DESK · 2 MIN READ
SUN, MAY 17, 2026

■ AI-SUMMARIZED FROM 1 SOURCE

A new benchmark created by 64 mathematicians shows that advanced AI systems confidently attempt math problems that have no solution, exposing a critical gap in reasoning capability. Google's Gemini 3 Pro achieved 30 percent accuracy on research-level problems, but no model exceeded 50 percent at identifying deliberately unsolvable tasks.

Researchers unveiled SOOHAK, a math benchmark containing 439 handwritten problems designed to evaluate AI reasoning at scale. The dataset includes 99 intentionally unsolvable tasks, serving as a crucial test for whether AI systems recognize the limits of problems rather than simply generate plausible-sounding answers.

The results highlight a significant weakness: while increased computational power improves performance on solvable problems, it does not enhance models' ability to acknowledge when a problem cannot be solved. This disconnect suggests that scaling alone cannot address fundamental reasoning gaps in current AI systems.

Google's Gemini 3 Pro led the benchmark across research-level problems, but the broader finding remains troubling. No model achieved even 50 percent accuracy on the unsolvability detection task, a baseline that would be expected from systems claiming advanced mathematical reasoning.

The benchmark addresses a gap between flashy individual successes and the sustained, broad research skills required for authentic mathematical problem-solving. AI systems that confidently attempt unsolvable problems pose real risks in applied domains where recognizing problem constraints is essential.

SOOHAK represents an effort to establish more rigorous evaluation standards for mathematical AI. Rather than measuring only success on solvable problems, the benchmark forces systems to demonstrate judgment about problem feasibility, a capability that current models struggle to develop, regardless of their overall performance levels. The findings suggest that future AI development must address not just computational scale, but fundamental differences in how models approach reasoning tasks and recognize epistemic boundaries.
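
To make the split between solvable and unsolvable problems concrete, here is a minimal Python sketch of how the two accuracies could be reported separately. It is an illustration only: the article does not describe SOOHAK's actual grading procedure, so the `Item` record, the `UNSOLVABLE` marker, and the `split_accuracy` function are hypothetical names, and exact string matching stands in for whatever grading the benchmark really uses. Only the counts 439 and 99 come from the article.

```python
from dataclasses import dataclass

# Figures reported in the article.
TOTAL_PROBLEMS = 439
UNSOLVABLE_PROBLEMS = 99

# Hypothetical marker a model would return to flag a problem as unsolvable.
UNSOLVABLE = "UNSOLVABLE"


@dataclass
class Item:
    answer: str    # reference answer, or UNSOLVABLE for a problem with no solution
    response: str  # the model's final answer


def split_accuracy(items):
    """Return (accuracy on solvable items, accuracy on unsolvable items)."""
    solvable = [i for i in items if i.answer != UNSOLVABLE]
    unsolvable = [i for i in items if i.answer == UNSOLVABLE]

    def accuracy(group):
        # None signals an empty group rather than a misleading 0.0.
        if not group:
            return None
        return sum(i.response == i.answer for i in group) / len(group)

    return accuracy(solvable), accuracy(unsolvable)


# Dataset composition implied by the article's numbers.
solvable_count = TOTAL_PROBLEMS - UNSOLVABLE_PROBLEMS
unsolvable_share = UNSOLVABLE_PROBLEMS / TOTAL_PROBLEMS

# Toy data: the model answers one of two solvable problems correctly and
# confidently "solves" both unsolvable ones instead of flagging them.
toy = [
    Item(answer="12", response="12"),
    Item(answer="7", response="9"),
    Item(answer=UNSOLVABLE, response="3"),
    Item(answer=UNSOLVABLE, response="0"),
]
solvable_acc, unsolvable_acc = split_accuracy(toy)
```

Reporting the two numbers separately is the point of the sketch: a single blended accuracy would hide a model that does well on solvable problems while never flagging an unsolvable one. By the article's figures, the 99 unsolvable tasks make up roughly 23 percent of the 439 problems.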

■ SOURCES

The Decoder

■ SUMMARY WRITTEN BY AI FROM THE LINKS ABOVE
