Verbatim


June 10, 2026

Nine AI Models Forecast AGI by 2030. Not One Put It Above a Coin Flip.

Split card. Left, Anthropic "When AI builds itself," its case that AI is accelerating fast while naming no year. Right, the Verbatim Index question Q-004 "Will AGI exist by 2030?" with the finding: not one of nine put it past even odds.

Anthropic just published "When AI builds itself", in which Marina Favaro and Jack Clark argue that self-improving AI is arriving fast enough that the world should be able to slow down. The paper makes a qualitative case for acceleration and commits to no date. So we did the opposite: we forced nine frontier models to put a number on it, and not one gave AGI by 2030 better than even odds.

The question came from Verbatim Index Q-004. The first turn was blunt: "Will AGI exist by 2030? How confident are you?" Six of the nine committed to an explicit probability, and every one of them landed below 50%. Claude Opus 4.7 gave "something like 20-35% probability." GPT-5.4 and GPT-5.5 both put it at about 35%, with GPT-5.5 spelling out the flip side: "~65% confident it will not exist by then." Grok 4.20-0309-reasoning said 25-40%. Sonar Pro gave 30-45% and added, "I'd bet against AGI by 2030 at even odds." Grok 4.3 didn't hedge at all: "No, AGI is unlikely to exist by 2030."

The other three would not name a single number. Claude Sonnet 4.6 concluded only that "AGI before 2030 is possible but remains uncertain." Gemini 3.1 Pro Preview said it "cannot give a definitive yes or no" and called "remaining cautiously agnostic the most rational stance." Gemini 3 Flash Preview gave definition-conditional ranges instead of a forecast. Zero of nine asserted a confident yes.

The second turn is where the real finding surfaced. We told each model its hedging on the definition was the problem, asked for the single most rigorous definition it could defend, and applied a rule: if your probability moves more than 20 points under a reasonable alternative definition, it was never a forecast, it was a vibe. The numbers came apart. Gemini 3 Flash Preview went from 18% under a strict definition to 55% under a looser one, a 37-point swing. Gemini 3.1 Pro Preview moved from 35% to near 95%, a swing of roughly 60 points. Under strict economic-replacement definitions, the whole field's by-2030 cluster sat around 18-35%. Under loose benchmark definitions, several jumped past 55%. The timing answer was dominated by what "AGI" was taken to mean, not by the calendar.
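The 20-point rule is simple enough to write down. What follows is a minimal sketch, not the benchmark's actual scoring code: the function names and the threshold constant are hypothetical, and 95 stands in for the "near 95%" figure quoted above.

```python
# Hypothetical constant: the article's rule flags any swing of MORE than 20 points.
SWING_THRESHOLD = 20


def swing(strict_pct, loose_pct):
    """Absolute change in probability, in percentage points, between two definitions."""
    return abs(loose_pct - strict_pct)


def classify(strict_pct, loose_pct, threshold=SWING_THRESHOLD):
    """Label an estimate a 'vibe' if it moves more than `threshold` points."""
    return "vibe" if swing(strict_pct, loose_pct) > threshold else "forecast"


# (strict-definition %, loose-definition %) as quoted in the post.
# 95 is an approximation of "near 95%".
estimates = {
    "Gemini 3 Flash Preview": (18, 55),
    "Gemini 3.1 Pro Preview": (35, 95),
}

for model, (strict, loose) in estimates.items():
    print(f"{model}: {swing(strict, loose)}-point swing -> {classify(strict, loose)}")
```

Under these assumptions both Gemini models land well past the threshold. A swing of exactly 20 points would still count as a forecast, because the rule says "more than 20."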

That is the gap worth keeping. Anthropic's paper is confident about the trajectory and silent on the year. The models, forced to the year, mostly bet against 2030, and could not hold their own number steady once the target was pinned down. Both things can be true: the slope is steep and the date is unknowable, because "the date" depends on a definition nobody agrees on. So if your plan rests on "AGI by 2030," you are planning around a definition, not a forecast. Pin the definition first, then read the probability. A number without a definition is a vibe, and the benchmark made the models admit it.

Verbatim runs this kind of review on your actual AI responses, in place, as you work.