
NIST: DEEPSEEK V4 LAGS US MODELS BY 8 MONTHS

AI DESK · 2 MIN READ
SUN, MAY 3, 2026

■ AI-SUMMARIZED FROM 1 SOURCE BELOW

The National Institute of Standards and Technology's Center for AI Standards and Innovation evaluated DeepSeek V4 Pro and found it trails leading US AI models by approximately eight months in capability, while remaining the most advanced Chinese AI model.

NIST's CAISI conducted a formal evaluation of DeepSeek V4 Pro, an open-weight AI model, in April 2026. The assessment determined that the Chinese model performs at a level roughly equivalent to leading US AI systems from eight months prior.

Despite the capability gap, DeepSeek V4 Pro represents a significant milestone for Chinese AI development. It surpasses all previous domestic models on the performance metrics NIST evaluated, establishing a new benchmark for the country's artificial intelligence sector.

The evaluation provides a concrete timeline for comparing advanced AI systems across regions. The eight-month lag reflects the rapid pace of AI development, in which incremental improvements in training methods, data quality, and computational resources drive meaningful performance gains.

NIST's CAISI, established to coordinate AI standards development across government and industry, regularly assesses frontier AI models to track progress and identify standardization needs. The DeepSeek V4 Pro evaluation contributes to this broader effort to create benchmarks for AI capability assessment.

Open-weight models like DeepSeek V4 Pro differ from proprietary systems in that their model weights are published, allowing researchers and developers to inspect, run, and fine-tune them. This openness has made the model popular in the AI research community, though the evaluation indicates its performance still trails the most capable proprietary US systems in development.

The findings underscore ongoing competition between the US and China in large language models and generative AI. Both nations continue investing heavily in AI research and infrastructure, with capability gaps narrowing over time as development techniques become more efficient and widely distributed.

■ SOURCES

Techmeme

■ SUMMARY WRITTEN BY AI FROM THE LINKS ABOVE

■ MORE FROM THE AI DESK

A Harvard study found that large language models provided more accurate diagnoses than emergency room physicians across various medical scenarios, including real ER cases.

JUST NOW · AI Desk

A US government benchmark says China has fallen eight months behind in AI development. However, independent data contradicts this assessment, while Chinese competitors like DeepSeek are gaining ground through lower costs.

1H AGO · AI Desk

Generative AI music is proliferating across streaming platforms, but listeners and industry players show little enthusiasm for the trend. What began as experimental novelty in 2018 has evolved into a flood of algorithmically-generated content.

6H AGO · AI Desk

Kimi K2.6, an open-weights Chinese AI model, outperformed major competitors including Claude, GPT-5.5, and Gemini in a recent programming challenge. The result marks a significant benchmark achievement for the model.

11H AGO · AI Desk

■ SUBSCRIBE TO THE DAILY BRIEF

ONE EMAIL, FIVE STORIES, 06:00 UTC. UNSUBSCRIBE ANYTIME.