GPT-4o – Full Breakdown + Bonus Details

GPT-4o. It’s smarter in most ways, cheaper, faster, better at coding, multi-modal in and out, and perfectly timed to steal the spotlight from Google. It’s GPT-4 Omni. I’ve gone through all the benchmarks and release videos to give you the highlights.


  • @skylineuk1485 says:

    The emotional expression is amazing.

    • @GethinColes says:

      It’s amazing, but I imagine it will get irritating very quickly. I found this with Bing; the fake friendliness was grating.

    • @berserker912 says:

      It sounded like some person from a corporate environment with fake friendliness and toxic positivity. I found it nauseating tbh.

    • @Juttutin says:

      Yup. I was utterly over it by the end of their announcement video. Those voices combined with that attitude was so grating.

      It’s like bad amdram.

    • @utkua says:

      It adds some emotional indicators for the TTS to interpret. This is not impressive for an LLM.

    • @urhot says:

      @@GethinColes you can obviously prompt it to your liking, you must be new to AI.

  • @davidt0504 says:

    Don’t care about OpenAI’s presentation. Been waiting for @AIExplained’s breakdown.

  • @DaveShap says:

    0:27 “more flirtatious sigh than AGI” bro I think you drastically overestimate the threshold that will satisfy most users. That was close to ScarJo levels of sensual breathiness…

    • @lesliejohnrichardson says:

      Love to see Mr. Shapiro himself commenting on a video of this equally wonderful AI/4IR channel

    • @lesliejohnrichardson says:

      PS: This is a damn impressive announcement/set of demos

      I am extremely excited to see what a beast GPT-5 will be compared to everything else

    • @WillyJunior says:

      I thought he said “flirtatious sci” 😂

    • @williamlancaster9996 says:

      My brain parsed it as akin to, “instead of AI, this is more like ‘Flirtatious’ I”.

    • @JezebelIsHongry says:

      I got more

      >> watch all the clips and focus on GPT-4o.

      They are using what seems like a mod of Sky. The emotive inflections, the quirks, it’s so powerful, yet it’s the same voice. Go use Sky now and compare.

      Watch Her.

      Now watch all the clips.

      It “feels” like they essentially trained the model on Sam. The Sky voice has always sounded like a version of Samantha to me, but now… it’s like the last instruction of the system prompt was

      “you will perma-larp as Samantha from the movie Her.”

      The fact that this is free is hard for me to contemplate.

      It may not make sense to you; you may live in a warm family and have a great life. But there are millions of people who sit in quiet rooms, who fill the hours with distraction to mask the loneliness.

      You know what’s back?

      Manic Pixie Dream Girl

  • @addeyyry says:

    Wake up, Her dropped

  • @nekony3563 says:

    Integration is the next GPT-moment: being able to talk to AI at any point in time and show it your screen, and for it to be able to respond and click/press buttons. This will be transformative by itself.

    • @Steve-xh3by says:

      Too bad it is only available on phones and Mac. I have a subscription, and access to the model, but no voice option through the Win desktop web interface. I do all of my computing on desktop, so totally useless for me.

    • @nekony3563 says:

      @@Steve-xh3by The new voice and video are going to be available in the coming weeks. Today it’s only GPT-4o itself. I bet a Windows version will follow. Not sure about a Linux version.

    • @Steve-xh3by says:

      @@nekony3563 I read they already confirmed there’s no Windows version of the desktop app, and voice is only on mobile/Mac?

    • @Steve-xh3by says:

      @@nekony3563 It is available on my Android version already, and I read it wasn’t going to be available through Windows.

    • @shivamguchhait says:

      @@Steve-xh3by Not surprised; Windows users are not going to be their main target when most of the business users, and the people willing to spend on or build with AI, use Apple, unless it’s about PC gaming or very heavy workloads.

  • @Kags says:

    The way it joined in laughing at its own mistakes at 8:25 is absolutely stunning

    • @jay_sensz says:

      That’s what stood out to me the most, too. It’s easy enough to treat the ChatGPT text interface like a sophisticated yet lifeless machine. But when you can interact with it over voice like this and it picks up on social cues, displays emotion, etc, it gets pretty hard not to anthropomorphize it.

    • @WoolyCow says:

      I would use the word ‘worrying’… like, the tech is amazing, and the way it can incorporate pause fillers like ‘umm’, laughter, and other phatic pleasantries is a testament to the data they’re using and the fine-tuning… but holy moly is this gonna cause soooo many parasocial relationships.

      We thought Character AI was bad; now that, with emotion, is gonna royally screw up some people.

    • @Bezimienny1598 says:

      @@jay_sensz So hard! I’m making a promise to myself at this point not to use these advanced voice chatbots, because I KNOW I would become fond of them.

    • @eddiedoesstuff872 says:

      @@WoolyCow Yeaaahhh, as soon as I heard the voice model I knew that someone was gonna fall in love with it eventually.

    • @theterminaldave says:

      I also loved the super fast “123456789,10” lol that killed me.

  • @Gerlaffy says:

    The part at about 12:00 is amazing but when he turns on the camera… Wow. We’re close to AGI in terms of actual believability. It’s so organic and flows so humanly.

    • @bilbo_gamers6417 says:

      nah bro you don’t understand it will only become AGI when it speaks and acts absolutely authoritatively and is completely infallible and can answer any question and do independent high level physics research that completely changes the entire technological landscape in a matter of days after being introduced and can calmly morph unknowable questions between its fingers like putty and can tell you if God is real or how to get a gf

    • @games4us132 says:

      This part reminds me of movie “her” with Scarlet Johansson

    • @hydrohasspoken6227 says:

      We are not. Sorry.

    • @Gerlaffy says:

      @@hydrohasspoken6227 In terms of *believability*, you think we’re not?

    • @hydrohasspoken6227 says:

      @@Gerlaffy , not by a long shot. But the new features are definitely cool. But just that.

  • @p5rsona says:

    Never been this floored by AI… I don’t know how some people are not impressed by this. You have an AI that talks EXACTLY like a real human, emotions and all, and can see so accurately via camera… I’m speechless here.

    • @zrakonthekrakon494 says:

      To me it feels like it’s trying to copy her too much, it feels inauthentic to me since it’s a copy

    • @Steve-xh3by says:

      Well, given that the voice option is only on phones and Mac, many of us can’t even make use of it. I do all my computing on a Windows desktop. I hardly ever use a phone for anything. I’m a retired software engineer; when you get older, phones are awful due to size and old eyesight. Plus, do young people actually use phones for productivity?

    • @K9Megahertz says:

      Not impressed, because for the most part it just regurgitates things that it learned from humans. If it could come up with stuff on its own, that would be impressive, but that’s just a limitation of how LLMs work.

      Don’t get me wrong, it’s neat stuff and has its uses, but I don’t think it really rises to the level of hype that it gets.

      As far as programming goes, it still can’t come up with correct and working solutions to some of my test questions. Why? Because it probably was never trained on the code that would have had to be written for it to be able to regurgitate it. That code and the working solution, while not complex or complicated by any means (at least for a 3D graphics programmer), are just very scarce in terms of documentation. It’s something I and a few other programmers worked on in the early days of 3D engines, back when BSP-type engines like Quake were mainstream. I think id Software’s implementation was a bit different from the approach we used, so it wouldn’t have been in the Quake source that was released.

      For simple programs, like “hey, sort a list of temperatures and print out the top 12 results” and the like, yeah, it can handle stuff like that. It’s probably seen umpteen million different versions of that code in its training set.

    • @mattmaas5790 says:

      The movie Her did not invent flirtatious women.

    • @reza2kn says:

      it’s a good time to be speechless, huh?

  • @oo__ee says:

    You may have predicted Her-like AI a month ago but Her predicted it a decade ago!

    • @aiexplained-official says:

      Haha so true

    • @countofst.germain6417 says:

      Also I think a ton of people predicted it as soon as the voice feature was originally released.

    • @eirikgg says:

      I really do wonder what kind of voice conversations they trained on. It’s so expressive in the “feel”. No voice in API access yet, so I really wonder how, and if, you could turn the knob down a bit, or whether the voice engagement reflects the user’s input. I’m not disappointed at all that there was no next-level pure-LLM improvement this time. Voice in / voice out is going to change how we interact. I just see how hard my 8-year-old kid is trying to get Siri to understand him, and what more he expects and doesn’t get. If I understand this correctly, this isn’t TTS and speech-to-text. And that is huge!

    • @davidlovesyeshua says:

      He did predict it arriving specifically in 2024 if I recall correctly

  • @vivekparmar7576 says:

    The audio cutting in and out during the demo was most likely a feature where you can interrupt the AI in the middle of its speech. So while it is talking and it hears you speak it immediately stops talking, which is what we saw during the demo. Just a guess.

    • @ukaszgandecki9106 says:

      Well, duh! The problem isn’t just that it cuts in and out. It’s how sudden, unnatural (non-human-like), and poorly timed these interruptions are. Issues like these keep you on your toes—instead of conversing as freely as you would with a person, you find yourself constantly adjusting your speech. For instance, you try to avoid lengthy pauses. I’m eager to test it soon, but I’m really hoping for further improvements.

    • @crubs83 says:

      @@ukaszgandecki9106 These are still some amazing strides in humanlike AI interactions. We went from a spooky-good text generator to an AI that you can have full vocal conversations with in 1.5 years. Yeah, it’s going to need to learn what sounds appropriately qualify as “interruptions,” but I expect to see huge strides on that front in the upcoming year.

    • @jonnicholasiii2719 says:

      @@ukaszgandecki9106 This is the worst this will ever be.

    • @TheAnthonyMarlowe says:

      Except… it can also see. So it will just wait for you to actually finish now. If you’d actually used this the entire way you’d know this is pure magic compared to what it was and still is publicly.

    • @e4Bc4Qf3Qf7 says:

      @@ukaszgandecki9106 Are you saying that as a human you don’t constantly interrupt and get interrupted by others? That’s just human speech, unless you’re speaking in a very formal manner.
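The interruption behavior this thread is debating can be sketched as simple “barge-in” logic: while the assistant is speaking, watch microphone energy and cut playback once the user talks for a few consecutive frames. This is purely an illustrative guess at how such a feature could work (the function name, threshold, and frame counts are all assumptions), not OpenAI’s actual implementation:

```python
def detect_barge_in(frame_energies, threshold=0.5, min_frames=3):
    """Return the index of the first frame of a sustained interruption:
    the user has been louder than `threshold` for `min_frames`
    consecutive audio frames. Return None if no interruption occurs."""
    run = 0
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            run += 1
            if run >= min_frames:
                # Interruption confirmed: a real system would stop
                # TTS playback at this point.
                return i - min_frames + 1
        else:
            run = 0  # reset on silence, so brief noise spikes don't trip it
    return None


# Background noise only: no barge-in detected.
assert detect_barge_in([0.1, 0.2, 0.1, 0.3]) is None
# A single loud click (frame 0) is ignored; sustained speech
# starting at frame 2 triggers the cutoff.
assert detect_barge_in([0.9, 0.1, 0.8, 0.9, 0.9]) == 2
```

The trade-off the commenters describe falls out of the parameters: a low `min_frames` makes the cutoff feel sudden and trips on pauses and noise, while a higher value adds latency before the model yields the floor.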

  • @mickelodiansurname9578 says:

    Ilya was booked to be there but at the last moment they discovered that the chain attached to his leg in the OpenAI dungeons wouldn’t stretch to the conference room!

    • @TheRealUsername says:

      Jokes apart, I’m concerned; he’s been gone for quite a while now.

    • @MrNote-lz7lh says:

      @@TheRealUsername
      Well, he got roasted to hell and back. He probably just wants to stay out of the limelight.

    • @mickelodiansurname9578 says:

      @@TheRealUsername I think he was told “Head down and nose out!” and he’s doing just that. Someone clearly has something on Ilya, but he always seemed to me to be rather introverted anyway. It was often painful watching him being interviewed because he looked like a rabbit caught in the headlights.

    • @akmonra says:

      and then he broke loose!

    • @akmonra says:

      ilya used demo day as a diversion to escape. jan leike, who was tasked with guarding the basement, had to resign for his failure.

  • @harnageaa says:

    About the intruder part (bunny ears): the point wasn’t him asking GPT “hey, was there someone there?” Sure, he has to instruct GPT to describe who was in the background, but the capability being showcased was video memory.

    It had been a minute, and GPT still remembered there was a person there. That’s the showcase.

  • @marcinhou says:

    If they don’t want to maximize engagement, one thing they missed is the ability to end the conversation just by saying something like “thank you for now”, without having to press the button. That would also add a nice touch UX-wise.

  • @noone-ld7pt says:

    The latency combined with the emotional understanding are for me the game-changers here. I’ve been using GPT voice mode for a while for language practice and the delay has just never felt even close to a natural conversation, but this looks to possibly completely eliminate that issue in a single leap.

    I honestly didn’t think we would have natural conversational capabilities until we could run very good models locally on device for the essentially zero latency I thought was needed. But if this demo can be replicated anywhere with decent service then it’ll be extremely interesting to see if it manages to completely leap across the uncanny valley or if this is gonna feel very eerie and dystopian.

    The laughing, stuttering and excitement just sounded so damn good in the demo. We might be getting damn close to HER territory, and I think anthropomorphizing is gonna go off the charts with this. I mean, one of the top comments on one of the demo videos was already along the lines of:
    “There is NO way this thing is not sentient!”

    Next few months are gonna be so damn interesting!

  • @BubbleTea033 says:

    When she says “Sorry guys, I got carried away there and started talking in French.” at 8:25.

    Just… just listen to how personable she sounds. GPT 4o is really something else. It’s not just the clear voice. It’s the laugh-talking. It’s the breath. It’s the accent that kind of slips out in “away there”, and the choice to use more casual and conversational language like saying “talking in French”, instead of “speaking French”. The embarrassed tone. And then the attempt afterwards to drum up excitement to try again. It’s so personable. I think that’s the right word. It feels human, which is great, and terrifying all the same.

    • @MustangDesudiroz says:

      Ikr

    • @bloodust7356 says:

      Actually, that French line was so natural; it felt like a real person talking. I mean, it wasn’t talking as if it was just reading something, but really how a native would talk in a casual conversation. That’s crazy.
      As an example, you would write “je ne sais pas” (“I don’t know”), but a native would say “j’sais pas” or “ché pas”.

    • @EnigmaticEsoteric says:

      There’s nothing terrifying about it; that’s language we should avoid with AI.

    • @wyqtor says:

      Also the very humanlike post-hoc rationalization 🥰

    • @mathisd says:

      It possibly is training leakage. As a French speaker, I’d say that could very much come from French radio or podcasts.

  • @galrozental3332 says:

    The stuttering at 12:51 is so human-like
    “I, I mean you, you’ll definitely stand out”
    Amazing.

    • @akmonra says:

      That’s what I hate about it. They’re giving AI our human flaws.

    • @EchoMountain47 says:

      I’m convinced that was a live voice actor used for dramatic effect and not the actual AI. There’s no way that was TTS

    • @akmonra says:

      @@EchoMountain47 how do we know they used TTS and not something new?

    • @EchoMountain47 says:

      @@akmonra TTS means text to speech. It’s not like one specific technology but a type of technology. Computer generated speech is always TTS on some level

    • @akmonra says:

      @@EchoMountain47 No, you can embed voice in latent space the same way you can embed text. You could have a model with pure voice inputs/outputs.

  • @ryan-tabar says:

    I love how much humility they put into their demos. They aren’t just showing perfect-case scenarios where the AI isn’t making any mistakes. What they are showing is progress.

    • @aiexplained-official says:

      Yeah that was notable, and commendable.

    • @YeeLeeHaw says:

      It’s more due to the model not being better. I wouldn’t call that humility; it’s more like over-promising and not being able to deliver.

    • @mohammadrahimjamshidi79 says:

      5- AI, in order to improve its performance and prevent undesirable consequences, must continuously interact with “effective rules and stable principles in the realm of existence”.
      @jamshidi_rahim

  • @Hydde87 says:

    I found the demo @11:53 the most impressive. It picked up on the not entirely kempt “developer” look of the person, made a comment about his hair being messed up and then understood he was joking with the hat. It’s one thing to recognize people, but to be able to pick up on the nuances of how people are expected to present themselves in certain situations is really impressive.

    I do hope we get to tone down the ‘perkiness’ of the model a bit. It’s quite charming in 1 minute bits, but I think the overly positive attitude gets old fast if you’re communicating with it a lot over the course of the day.

  • @Richievaillant says:

    Apple acknowledging another company exists is still the craziest news here.
