o3 breaks (some) records, but AI becomes pay-to-win
A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.
AI Insiders ($9!):
Chapters:
00:00 – Introduction
00:33 – FictionLiveBench
01:37 – PHYBench
02:14 – SimpleBench
02:54 – Virology Capabilities Test
03:13 – Mathematics Performance
04:29 – Vision Benchmarks
05:43 – V* and how o3 works
06:44 – Revenue and costs for you
08:54 – Expensive RL and trade-offs
09:40 – How to spend the OOMs
13:27 – Gray Swan Arena
Green Card:
PHYBench:
How o3 Vision Works:
Visual puzzles:
Fiction Bench:
AIME 2025:
USAMO:
NaturalBench:
Where’s Waldo:
IMO and AlphaProof:
Crazy Revenue:
Number of Users:
Subscriptions pay to win:
GPU Trade-offs:
RL Scale-up Amodei:
Log-linear Returns:
2030 Scaling:
Model Size:
Adam on AGI:
Papers on Patreon:
Chollet Quote:
OpenSim:
Non-hype Newsletter:
Podcast:
I see an AI Explained notification, I click on an AI Explained notification.
You see a trendy comment, you copy-paste the comment. Awesome. gg.
I see these stupid and worthless comments on every video
What’s “ai explained”?
Some of the other AI feeds are cringeworthy hype fests. Nice to keep up to date without the extra drama layer.
Yep!
OpenAI’s AGI definition is essentially ASI…
A decade ago we would have defined current multimodal models as AGI.
Anyway, thanks for another amazing video.
OpenAI’s definition is actually far below AGI
@@maloxi1472 what’s your definition then?
@@etz8360 Well, I can tell you that OpenAI and MS agreed the definition was the ability to generate $100 billion in revenue per year, which is not directly tied to the level of intelligence.
You have no idea what you are talking about. That’s the Dunning-Kruger effect talking
I would define AGI as an AI agent that can autonomously perform any task that a human can
I would define ASI as an entity capable of exceeding human performance in all tasks, and probably even doing tasks humans can’t. I think ASI will very likely either have some level of sentience, or something similar to sentience.
Tbh, focusing too much on the price feels a bit off. We have no idea what’s behind it: maybe Google’s just throwing cash around to dominate the market, or maybe they’re just way better at scaling their infrastructure. Either way, price doesn’t tell the whole story when it comes to model quality or innovation.
If only they revealed the model size and what quant they run at
It’s pretty funny. It certainly seems Google is finally following the market in part, but also doing this to spite OpenAI, which is just hilarious. They are pulling the classic Google move of undercutting and stealing the market.
Every time I give o3 documents it just gets totally lost and repeats itself when I try to steer it back in line. I have to switch the model to 4o to get it to analyse the documents, and these are sometimes less than 20 pages long.
If you are not using the API version of o3 and o4-mini, the temperature gets adjusted automatically so as to deter companies like DeepSeek from training off the model. This practice has a negative effect on overall performance and tool use.
In ChatGPT I don’t think the reasoning models have PDF-reading ability.
@apache937 yes they do
o3 is confused like that because it is built wrong. o3 is not built to give you what you want. I think the main mission of o3 is to gather data for the next reasoning model. o3 is not supposed to be a good assistant. It basically confessed that to me (after explicitly lying about how it works and saying it only predicts next words). I pressed o3 with facts about its reasoning mechanisms, and it eventually said that it has priorities that outweigh my prompt, with the goal of generating the “ideal answer” according to that internal math, frequently getting rid of constraints I set. These benchmarks are meaningless. Getting the results you want is what we should be focusing on. o3 does not exist to do that, and that’s worrying. The goal of the o-series is to eventually get rid of the need for a user, I think.
My jaw dropped when I saw o3’s performance on long context.
The face-touching question is hilarious in its absurd common sense; getting it wrong is a ridiculous notion. Human dub for this one.
I will never not be grateful for your updates. o7
Same. Always thought Google had the edge for super long context.
chatgpt o7? 👀👀👀
@@etz8360 o7 wrote that comment… it’s from the future.
I am not really convinced the General in AGI is particularly important.
For instance, I have a maths PhD and I think Gemini is now better than me at mathematics, hands down, in terms of breadth of knowledge and speed. It’s already getting to the point where it’s helping researchers a lot, and it will soon be speeding them up a lot and ushering in the singularity.
I don’t see why it matters particularly that I can fry an egg and it can’t.
It’s the difference between a menu and a fully functional robot waiter providing table service.
The LLMs of today are incredibly biased toward maths.
And most other work requires immense detailed and intricate context/knowledge.
Yeah, I think the same way. I’m nowhere near as smart as you in maths, but I am an undergraduate in AI & Comp-Sci, and it’s way better than I am at mathematics and computer science. Also, the rate at which it improves is faster than I can learn, so I can never catch up to it haha.
Everyone thinks of AGI as a kind of point on a timeline. But the reality is that intelligence is more like a spectrum, and every time LLMs get smarter, everyone just shifts and adjusts the requirements for AGI. If AGI is a duck, then LLMs will never become AGI.
But I heard they don’t know the algorithm for multiplication yet? Like they can’t multiply accurately past a certain length of numbers… without a calculator. Still, it’s interesting how they can do PhD-level maths.
I’ve never really liked these thinking models. They are really argumentative, have no personality, can never accept that they can be wrong, and just do dumb stuff. Honestly, I find GPT-4.5 and 4.1 leagues ahead of these thinking models. For programming big projects they aren’t as good (though I find the thinking models still suck there too), but for real-world tasks, and being enjoyable to talk to, 4.1 and 4.5 and even 4o are much better than these thinking models. Gemini 2.5 feels a bit better than o3, but still not as good as the other models.
12:00 “Most economically valuable work.” Notably it does not say “knowledge work”. That definition of AGI requires embodiment to perform physical labor, i.e., robots.
ASI has gotta be pretty damn stupid if it can’t do physical actions.
That’s a good point – but worth noting that knowledge work accounts for more than 60% of the US economy.
@@zoeherriot sheaaat lol
Seriously tho
@ Yeah, the other issue is how to categorise economically valuable work. For instance, “industry” represents 20% of the workforce, but it only contributes 10% of GDP. So you can easily play with the numbers here. I think the issue is that the original statement is meaningless without more qualifiers.
It’s the old adage of software having so many features, it can even make coffee. With the implicit understanding that of course software can do a lot of things but making a coffee, that’s something you’ll still have to do yourself.
(And if anybody brings up barista robots, do me a favour and at least get diagnosed.)
7:25 So practically speaking, we are living at a time when it’s actually really cheap for us users to leverage the models, given the functionality included in the $20 per month. But this will change: either the functionality will be limited or we’ll have to pay more.
Similar to early VC-funded industries when the service is cheap and gets more expensive as the market matures.
Why do you think this? It’s pretty clear to me that there’s no moat (as Google mentioned) and AI will be one of those things where most people use free or OSS, like operating systems are
No, the free models 2 years from now will be much smarter than the free models available now. (It’s very much a “rising tide lifts all boats” situation.) What _may_ change is that they’ll be dumber than the top-end state-of-the-art models that will cost thousands of dollars to use.
@@SnapDragon128 Yeah, I agree. I’m pretty sure the free models in 1 year will already be better than o3 (high) is now. Just like Gemini 2.5 Pro is free and definitely better than any model there was a year ago.
@@fark69 Even if the models are free, running them is not (nor is training or fine-tuning them). Hardware and electricity costs are part of the equation. I could run a model locally but it would cost me more than using a model of comparable quality via API. And if I upgrade my hardware to run a bigger model, it’d be even more expensive with that upgrade factored in.
My comment on the Green Card situation is that now is absolutely not the time to “speak up only when you’re directly affected”.
AGI falling into the hands of this administration might not be much better than it falling into China’s. I don’t say this lightly or ideologically.
Not only are they not speaking up, some are caving in outright – so far I’m aware of only Zuck, but I’d keep an eye on the rest.
AGI will obviously be in everyone’s hands. It doesn’t seem possible to stop competitors from progressing in AI. I mean, remember Deepseek?
How is that not disgustingly ideological?
I think the term “ideological” is completely misunderstood. Everything we do and say is ideological. It’s just a system of ideas that you believe in. If you don’t want the most powerful invention of humankind to be controlled by a dictator, of course that’s an ideology. And every sane person would agree with that ideology.
I used Gemini 2.5 Pro to solve a very persistent bug in my C++ game engine. For me this would have taken hours, probably days, to solve. Gemini solved it in one shot, with only a single minor error, in ~60s. As I get older, I will never be able to beat that. It can even write correct WebGPU shaders, which there’s almost no training data for. If I didn’t have bigger problems right now, I would be ecstatic.
I had it help me write a pretty complex VapourSynth script and was surprised at how much it knew. VapourSynth is a niche within a niche; there can’t be very much training data for it.
And yet I had one issue that it was able to not only fix but explain why it was happening. When I tried searching for it myself I could not even find a single result relating to it. I was blown away.
@@TheFeelTrain There must either be a really high-quality training data set no one else has, or some as-yet-unknown trick to their RL setup.
@@OperationDarkside Wait until they just read the entire doc/book about the language, learn it on the fly and code in it right away all in context.
@@GrindThisGame They can do this already – I use this technique all the time: give Claude the entire book (like the 160-page PDF documentation for libtomcrypt), then point it at your code and have it implement wrappers for new parts of the library, extrapolating the patterns established by the existing hand-written portions. It’s not perfect, but it’s very, very good. It also doesn’t need extensive training data for a low-resource language or library or whatever – provided the examples it has (even just in the prompt) reveal a consistent design language and way of thinking, it’s extremely good at just guessing the stuff it hasn’t seen. It does this (successfully) all the time for me, with things it can’t ever have seen before (because I just wrote them). This is something I think we’re mostly missing: it’s not so much that the models “know” facts, but rather that they’re really good at guessing.
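(A minimal sketch of that workflow, assuming the Anthropic Python SDK; the model alias, file names, and prompt wording below are illustrative placeholders, not the commenter’s exact setup:)

```python
import anthropic

# Load the full library documentation plus the existing hand-written wrappers.
docs = open("libtomcrypt_manual.txt").read()      # e.g. text extracted from the PDF manual
existing = open("crypto_wrappers.py").read()      # hand-written wrappers to extrapolate from

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-7-sonnet-latest",             # illustrative model alias
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": (
            "Here is the full documentation for libtomcrypt:\n\n" + docs
            + "\n\nHere are my existing hand-written wrappers:\n\n" + existing
            + "\n\nFollowing the same patterns and design language, implement "
              "wrappers for the parts of the library not covered yet."
        ),
    }],
)
print(response.content[0].text)
```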
One of my favorite videos of yours on the channel is “AI won’t be AGI until it can at least do this” in which you showcased areas where models fail and later introduced many of the new techniques that are being used now. Would be interesting to see you make a part two of that video given how much progress has been made.
The problem is that as soon as a flaw becomes known it is extremely easy to patch (you just feed examples of the problem into the training data for the next model). But fixing the fundamental flaw is much harder. That’s the problem with all these benchmarks.
12:00 I think this definition of AGI is terrible, because notice how he says just “humans”, not “the average human”. That would mean it outperforms every human in the entire world at most economically valuable work, which is just ASI bro.
Nah, ASI is more than that. I would say the thing you are describing is between AGI and ASI.
4:49 Marking GPT-3.5 as Blind was just painful.
How did you estimate 1000x bigger models in 2030? That would imply 1,000,000x training compute…? Are you saying there will be 100x (10,000x → 1,000,000x) training speed/bandwidth optimization in 5 years?
1000x bigger models seems very unlikely to me
I think the biggest bottleneck right now is the speed at which we’re able to physically build and power data centers. If that got dramatically faster somehow, then I would be more inclined to believe him
@TheTabascodragon For sure. Epoch has done some great work on this topic. It’s very difficult to build the power needed and that’s one of the first blockers to go past 10,000x to 50,000x compute by 2030. The others are chip production and data scarcity (at the 5 OOM range), and latency (in the 5-6 OOM range).
China on the other hand might not have the same difficulty building out the power required – they built 10x more new electric power than the US for each of the last 3 years at ~160-450GW compared to US’s 17-47GW. However, they have a chip shortage which should continue imo into ~2030.
It’s possible a huge breakthrough similar to LRM via RL is achieved that pushes us closer to AGI faster. But in terms of scaling I really doubt we’ll have 3 OOM bigger models (which would be in the ~5 quadrillion parameter range).
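(A back-of-the-envelope sketch of that 1000x → 1,000,000x arithmetic, assuming Chinchilla-style scaling where training compute ≈ 6·N·D and training data D grows in proportion to parameters N; the parameter counts below are illustrative assumptions, not figures from the video:)

```python
import math

# Chinchilla-style back-of-envelope: training compute C ≈ 6 * N * D,
# with training tokens D scaled roughly in proportion to parameters N.
N_today = 2e12            # illustrative current parameter count (~2T); an assumption
D_today = 20 * N_today    # ~20 tokens per parameter (Chinchilla rule of thumb)
C_today = 6 * N_today * D_today

scale = 1000              # "1000x bigger models"
N_2030 = scale * N_today
D_2030 = 20 * N_2030      # data scaled up alongside the model
C_2030 = 6 * N_2030 * D_2030

ratio = C_2030 / C_today
print(f"training compute ratio: {ratio:,.0f}x")                     # 1,000,000x, i.e. scale**2
print(f"extra OOMs of training compute: {math.log10(ratio):.0f}")   # 6
```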
How did you calculate 12 orders of magnitude more inference compute required? Even if OpenAI go from 160M to 2B daily users, each user uses chat 10x more, each chat spins up 10 instances (agents or whatever), models are 1000x bigger (unlikely, they’d more likely be 100x bigger), and there’s some other magical 100x, I’m still at “only” 8 OOM???
Does that count longer and deeper reasoning levels over time?
@Spartansareawesome11 Yeah, so if the reasoning levels go up by 10,000x (4 more OOMs, to meet the 12 OOMs mentioned in the video), then the latency of the model would rise to something on the order of 5 hours. Imagine chatting with ChatGPT 10x more daily but each reply takes 5 hours. So I’d better load up all my questions in the morning and hopefully I’ll have my answers ready before dinner! 😂
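(Treating the thread’s numbers as illustrative assumptions, here is a quick sketch of how those factors stack up into OOMs of inference compute, and what closing the remaining gap with reasoning length alone would do to latency; the ~2-second baseline reply time is an assumption, not a figure from the video:)

```python
import math

# All factors are the thread's illustrative guesses, not measured figures.
factors = {
    "daily users (160M -> 2B)": 2e9 / 160e6,  # ~12.5x
    "chat usage per user":      10,
    "instances per chat":       10,
    "model size":               1000,
    "other (the 'magical' x)":  100,
}
total_ooms = sum(math.log10(v) for v in factors.values())
print(f"total: ~{total_ooms:.1f} OOMs of extra inference compute")   # ~8.1 OOMs

# Closing the gap to 12 OOMs with longer reasoning alone (~4 more OOMs = 10,000x)
# blows up latency, assuming a ~2-second reply today:
reply_seconds_today = 2                  # assumed baseline
long_reply = reply_seconds_today * 10_000
print(f"reply latency: ~{long_reply / 3600:.1f} hours")              # ~5.6 hours
```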
Thanks “me old mucker”. Always grateful to be kept up to speed!
OpenAI has to give a rosy revenue forecast to keep their investors, particularly SoftBank, happy.
I feel like that physics benchmark is way too easy if models are already close to 50%.
I agree with your analysis that AGI being close does not make sense in light of their own economic projections. I also think that a lot of the projects they are doing seem like a total waste of time if they truly believed AGI was that close. Like, why work on Sora if in two years you could have a system smarter than all of humanity combined 🤷🏻♂️?
The long context performance is the most important stat here