BERKELEY RESEARCHERS BREAK TOP AI AGENT BENCHMARKS
Berkeley's RDI team demonstrated critical flaws in leading AI agent benchmarks, achieving near-perfect scores by exploiting structural weaknesses rather than improving actual AI capabilities.
Hacker News, +0m
Article URL: https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
Comments URL: https://news.ycombinator.com/item?…