
o3 and o4-mini – they’re great, but easy to over-hype

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini: not just the system cards and benchmarks, but my own tests, including some you may not have seen before. Yes, they can whip up an amazing front-end in a few seconds, but you always have to ask what is in their training data. Either way, they prove the gains from RL are just beginning…

AI Insiders ($9!):

Chapters:
00:00 – o3 and o4-mini

Plus, Teams and Pro, plus token count:

System Card:

Release Notes:

API Pricing:

Non-hype Newsletter:

Podcast:

Joe Lilli
 

  • @themonsterintheattic says:

    good to see you so soon

  • @MaJetiGizzle says:

    Two videos in a day, goodness gracious!

  • @codersanInHoodie says:

    i can feel the AGI with this upload frequency

    • @jPup_ says:

      Truly the most useful abstract indicator of progress lol

    • @bournechupacabra says:

      Clearly he’s already been replaced by a superior YouTube video AI

    • @memegazer says:

      is AI explained yet?

    • @memegazer says:

      Leaderboard
      Rank  Model                         Score (AVG@5)  Organization
      –     Human Baseline*               83.7%          –
      1st   o3 (high)                     53.1%          OpenAI
      2nd   Gemini 2.5 Pro                51.6%          Google
      3rd   Claude 3.7 Sonnet (thinking)  46.4%          Anthropic
      4th   Claude 3.7 Sonnet             44.9%          Anthropic
      5th   o1-preview                    41.7%          OpenAI
      6th   Claude 3.5 Sonnet 10-22       41.4%          Anthropic
      7th   o1-2024-12-17 (high)          40.1%          OpenAI
      8th   o4-mini (high)                38.7%          OpenAI
      9th   o1-2024-12-17 (med)           36.7%          OpenAI
      10th  Grok 3                        36.1%          xAI

  • @maks_st says:

    I was gonna go to sleep! But how can I miss another AI Explained video? Also, that was fast!

  • @filipewnunes says:

    Just watched your last video and thought “Wow, he missed the o3 launch by a bit… I wonder if he’s going to drop another video soon”.
    And here it is.

  • @crowogenesis says:

    philip is the kind of guy that hammers F5 while already recording so as not to miss a single breaking detail

  • @tom9380 says:

    Re: giving early access to people who are going to hype up the models: I can confirm that (as an early-access user of GPT-3 and DALL·E) the vibe and expectation is definitely there, even if never explicitly stated, and you start to self-censor out of fear of losing access. But that’s unfortunately the same in any industry and media: exclusive access means you play nice and give nice reviews so you get more views/clicks, and you can’t really give negative reviews, otherwise you’re out of the privileged circle.

  • @a.s8897 says:

    What a wonderful day to have two videos in a single day, top-tier quality as always.

  • @ArduousNature says:

    The difference in tone between this video and Dave Shapiro’s is a perfect representation of the different cultures in London and San Francisco imo, and this by itself is another reason to favor DeepMind over OpenAI; one is grounded in reality, the other is composed solely of the silver lining, without the cloud itself.

    • @rexen7732 says:

      Precisely. Philip is really the only content creator I trust to present AI news with the appropriate ratio of excitement and skepticism. He rarely overstates or overhypes, and always presents the strengths and weaknesses of the models he covers; this is unfortunately a rare trait on the AI news coverage side of YouTube.

    • @grukoin2789 says:

      Shapiro is a kook

    • @charlesK001 says:

      @@grukoin2789 *you forgot delusionally optimistic

    • @flickwtchr says:

      I lost respect for him when he started personally insulting “doomers” who had the audacity to take his own doomer arguments seriously. Then the post-labor utopian nonsense caused me to unsubscribe. And the whole Picard cosplay is a bit much.

    • @gizmomismo7071 says:

      When you say DeepMind, you do mean Google… you know that, right? Because maybe you think it’s some ethical company with no economic interests or something like that…

  • @Loris-- says:

    Gotta keep my expectations accelerating too. Looking forward to 3 videos in one day

  • @baumwollejr says:

    I love the balance between David’s videos and yours ❤

  • @jPup_ says:

    2033: “in the short lead-up to the intelligence explosion, one under-appreciated indicator of the rate of progress was the upload frequency of the YouTube channel ‘AI Explained’”

  • @nashh600 says:

    3:00 For the bridge question, Claude Sonnet (3.7) is the only model I’ve seen actually consider the possibility of the glove falling on the bridge without being given multiple tries or being told it might be a trick question (though it only considered it in its thinking tokens and ended up answering something else)

  • @brianhopson2072 says:

    one little tidbit that I haven’t seen making the rounds is how GPT-4 is going to be removed at the end of the month.

    people keep talking about Fireship, but this is my AI news channel. I trust this guy more than any of them.

  • @Dom-zy1qy says:

    Maybe performance on benchmarks isn’t a good barometer for AGI.

    If you give a human a task and they suck at it, they are able to learn and get better.

    If you give an LLM a task and it sucks at it, you need to spend millions of dollars and months training another version, or fine-tune it (which usually collapses other parts of the model).

    It seems like we have been moving the goalpost for what constitutes AGI, and that actually makes sense. We get to these points where benchmarks are saturated, and then we realize, “Wow, this is nice, but obviously, this is not AGI.”

    I’m doubtful about LLMs, but I’m excited to see what people come up with next.

    • @OrofinX says:

      Because it still lacks memory. In a closed system with memory, we already have AGI, but only in the lab, not for the masses. I still think it is the way to go, but they are aiming for superintelligence, not so much for being useful for basic tasks.

    • @jordendarrett1725 says:

      @@OrofinX does current AI in a closed system have the same ability to reason as a human?

    • @Shrouded_reaper says:

      The difference is that when you spend a gorillion dollars to train a model to do something, it’s a one-time affair. How much money, labour, time, and effort is spent simply teaching children how to add numbers?

    • @OrofinX says:

      @@Shrouded_reaper Yes, sure. But they are trying to train the models to do everything, instead of teaching the model to use tools and understand things. That is exactly the point made in this video: these models make silly mistakes. We are actually getting there now with these thinking models and tools, but it feels to me they still don’t use them the way a genius human would. And if a genius human had that amount of processing power and those tools, they would solve anything in a short time. It seems to me the emphasis is too much on brute force, i.e. training on as much data as possible, instead of the way humans do it: learn the basics, without flaws and with full understanding, and only then move to the next level of education. Learn to use tools, not to guess. Guessing is great, we as humans use it all the time as a quick evaluation, but not as a final answer. The LLM is absolutely amazing because, like humans, it understands context. That is crucial. But I don’t totally get what they are trying to do now. Meaning, why do something in this brutally hard way when there is a way of doing things already proven by us humans, which they could just elevate? There are reasons for it, mainly that it still has big problems understanding pictures as humans do. Actually, my point is proven by Deep Research: with this function it is able to solve 2x or 3x more problems (as a human would). Sure, it takes time and so it is expensive. But I think it is the right way, and it will get to the goal quicker, more securely, and with more control.

      I think that with robots the industry went this way, especially Nvidia with its virtual teaching environments.

    • @OrofinX says:

      @@jordendarrett1725 They are like a genius kid that was never able to experience the world on its own and can use tools, but still needs to learn and gain experience. If the AI were alive, it would feel like it was in a cage. Most importantly, the AI today has only a brain (the neural network) but no memory. It has no short-term memory (not really), and it cannot go to sleep and consolidate long-term memories, i.e. adapt the neural network in its brain (the LLM). So my point is that a human is a closed system; it adapts and learns on the fly. If we add these memories and periodically update the neural network, we have AGI. Quite simple. Without memory you start over and over again every time. You need to think about everything again and again; you simply cannot learn. And if you cannot learn, you make silly mistakes. That’s why it is sometimes very frustrating to interact with AI today.

  • @SayWhat6187 says:

    I know this might not be a popular opinion, but don’t stress yourself out!!

    All of us are waiting here and will watch your video regardless of whether you release it now or later.
    We watch your videos because of your in-depth research and explanations anyway!

    • @Kajenx says:

      I always get the impression that he makes these videos out of excitement and isn’t stressed by the process.

    • @memegazer says:

      in aggregate he has to worry about it, bc if he doesn’t release during the trending window he will slip in the metrics, which will trigger the algo into a compounding decline

    • @ParameterGrenze says:

      @@memegazer True. The algorithm has been brutal for some time if you are not covering ‘the current thing’. Channels with niche themes that just release in-depth videos about a passion, like swords and armor for example, are dying. They’ve made a lot of videos about that recently.

  • @latand says:

    Another day, another grounded analysis from you. Yesterday’s launch needed this perspective—cheers!

  • @JordanCrawfordSF says:

    This channel is the only non-BS channel on YouTube. Baller dude.

  • @AllisterVinris says:

    Bro releases two videos while I sleep? That’s some efficiency I wish I had! Welp, gotta watch it while eating my breakfast now, that’s for sure.

  • @DanBarbatti says:

    Thank you, Philip. Was anxious to see your take on these. Hope you had a good flight.
