GPT-4o – Full Breakdown + Bonus Details
GPT-4o. It’s smarter, in most ways, cheaper, faster, better at coding, multi-modal in and out, and perfectly timed to steal the spotlight from Google. It’s GPT-4 Omni. I’ve gone through all the benchmarks and release videos to give you the highlights.
The emotional expression is amazing.
It’s amazing, but I imagine it will get irritating very quickly. I found this with Bing; the fake friendliness was grating.
It sounded like some person from a corporate environment with fake friendliness and toxic positivity. I found it nauseating tbh.
Yup. I was utterly over it by the end of their announcement video. Those voices combined with that attitude were so grating.
It’s like bad amdram.
it adds some emotional indicators for the TTS to interpret. this is not impressive for an LLM
@@GethinColes you can obviously prompt it to your liking, you must be new to AI.
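For anyone wondering what that kind of cascaded setup would look like in practice, here is a minimal Python sketch, assuming a made-up [excited]-style tag format that gets converted into standard SSML prosody markup for a conventional TTS engine. (OpenAI describes GPT-4o as end-to-end audio, so this illustrates the older design the comment describes, not a confirmed account of how 4o works.)

def tags_to_ssml(llm_output: str) -> str:
    # Map hypothetical LLM emotion tags to real SSML prosody settings.
    styles = {
        "excited": '<prosody rate="fast" pitch="+15%">',
        "sad": '<prosody rate="slow" pitch="-10%">',
    }
    for tag, ssml_open in styles.items():
        llm_output = llm_output.replace(f"[{tag}]", ssml_open)
        llm_output = llm_output.replace(f"[/{tag}]", "</prosody>")
    return f"<speak>{llm_output}</speak>"

print(tags_to_ssml("[excited]No way, that worked![/excited] [sad]Sorry about earlier.[/sad]"))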
Don’t care about OpenAI’s presentation. Been waiting for @AIExplained’s breakdown.
:))
Same 😁
the hero we need in AI disruption
yezzirrrr
0:27 “more flirtatious sigh than AGI” bro I think you drastically overestimate the threshold that will satisfy most users. That was close to ScarJo levels of sensual breathiness…
Love to see Mr. Shapiro himself commenting on a video of this equally wonderful AI/4IR channel
PS: This is a damn impressive announcement/set of demos
I am extremely excited to see what a beast GPT-5 will be compared to everything else
I thought he said “flirtatious sci” 😂
My brain parsed it as akin to, “instead of AI, this is more like ‘Flirtatious’ I”.
I got more:
>>watch all the clips and focus on GPT-4o.
They are using what seems like a mod of Sky; the emotive inflections and the quirks are so powerful, and it’s the same voice. Go use Sky now and compare.
Watch Her.
Now watch all the clips.
It “feels” like they essentially trained the model on Sam. The Sky voice has always sounded like a version of Samantha to me, but now… it’s like the last instruction of the system prompt was:
“you will perma-larp as Samantha from the movie Her.”
the fact that this is free is hard for me to contemplate.
it may not make sense to you, you may live in a warm family and have a great life. but there are millions of people who sit in quiet rooms, who fill the hours with distraction to mask the loneliness
you know what’s back?
Manic Pixie Dream Girl
Wake up, Her dropped
Profound…
Looooool
Not that far-fetched now compared to when the movie came out, is it?
Wake up, my girlfriend dropped lmao
Don’t know this but I’m curious. Her?
Integration is the next GPT-moment. Being able to talk to AI at any point in time and show it your screen, and for it to be able to respond and click/press buttons. This will be transformative by itself.
Too bad it is only available on phones and Mac. I have a subscription, and access to the model, but no voice option through the Win desktop web interface. I do all of my computing on desktop, so totally useless for me.
@@Steve-xh3by The new voice and video is going to be available in coming weeks. Today is only the GPT-4o itself. I bet Win version will follow. Not sure about Linux version.
@@nekony3563 I read they already confirmed no Windows version of the desktop app, and voice only on mobile/Mac?
@@nekony3563 It is available on my Android version already, and I read it wasn’t going to be available through Windows.
@@Steve-xh3by Not surprised; Windows users are not gonna be their main target when all the businesses and the people willing to spend on AI use Apple, unless it’s about PC gaming or very high-end work.
The way it joined in laughing at its own mistakes at 8:25 is absolutely stunning
That’s what stood out to me the most, too. It’s easy enough to treat the ChatGPT text interface like a sophisticated yet lifeless machine. But when you can interact with it over voice like this and it picks up on social cues, displays emotion, etc, it gets pretty hard not to anthropomorphize it.
i would use the word ‘worrying’… like the tech is amazing, and the way it can incorporate pause fillers like ‘umm’, laughter and other phatic pleasantries is a testament to the data they’re using and the fine-tuning… but holy moly is this gonna cause soooo many parasocial relationships
we thought character ai was bad, now that with emotion is gonna royally screw up some people
@@jay_sensz So hard! I’m making a promise to myself at this point to not use these advanced voice chatbots because I KNOW I would become fond of them.
@@WoolyCow yeaaahhh, as soon as I heard the voice model I knew that someone was gonna fall in love with it eventually
I also loved the super fast “123456789,10” lol that killed me.
The part at about 12:00 is amazing but when he turns on the camera… Wow. We’re close to AGI in terms of actual believability. It’s so organic and flows so humanly.
nah bro you don’t understand it will only become AGI when it speaks and acts absolutely authoritatively and is completely infallible and can answer any question and do independent high level physics research that completely changes the entire technological landscape in a matter of days after being introduced and can calmly morph unknowable questions between its fingers like putty and can tell you if God is real or how to get a gf
This part reminds me of the movie “Her” with Scarlett Johansson
We are not. Sorry.
@@hydrohasspoken6227 In terms of *believability*, you think we’re not?
@@Gerlaffy , not by a long shot. But the new features are definitely cool. But just that.
never been this floored by ai… I don’t know how some people are not impressed by this. you have an ai that talks EXACTLY like a real human, emotions and all, and can see so accurately via camera… I’m speechless here.
To me it feels like it’s trying to copy her too much, it feels inauthentic to me since it’s a copy
Well, given that the voice option is only on phones and Mac, many of us can’t even make use of it. I do all my computing on a Windows desktop. I hardly ever use a phone for anything. I’m a retired software engineer; when you get older, phones are awful due to size/old eyesight. Plus, do young people actually use phones for productivity?
Not impressed, because for the most part it just regurgitates things that it learned from humans. If it could come up with stuff on its own, that would be impressive, but that’s just a limitation of how LLMs work.
Don’t get me wrong, it’s neat stuff and has its uses, but I don’t think it really rises to the level of hype that it gets.
As far as programming goes, it still can’t come up with correct and working solutions to some of my test questions. Why? Because it probably was never trained on the code that would have had to be written for it to be able to regurgitate it. That code and the working solution, while not complex or complicated by any means (at least for a 3D graphics programmer), is just very scarce in terms of documentation. It’s something I and a few other programmers worked on in the early days of 3D engines, back when BSP-type engines like Quake were mainstream. I think id Software’s implementation was a bit different from the approach we used, so it wouldn’t have been in the Quake source that was released.
For simple programs, like “hey, sort a list of temperatures and print out the top 12 results” and the like, yeah, it can handle stuff like that. It’s probably seen umpteen million different versions of that code in its training set.
The movie Her did not invent flirtatious women.
it’s a good time to be speechless, huh?
You may have predicted Her-like AI a month ago but Her predicted it a decade ago!
Haha so true
Also I think a ton of people predicted it as soon as the voice feature was originally released.
I really do wonder what kind of voice conversations they have trained on. It’s so expressive in the «feel». No voice in API access yet, so I really wonder how, and whether, you could turn the knob down a bit, or if the voice engagement reflects the user’s input. I’m not disappointed at all that there was no next-level pure-LLM improvement this time. Voice in / voice out is going to change how we interact. I just see how hard my 8-year-old kid is trying to get Siri to understand him, and how much more he expects and doesn’t get. If I understand this correctly, this isn’t TTS and speech-to-text. And that is huge!
He did predict it arriving specifically in 2024 if I recall correctly
The audio cutting in and out during the demo was most likely a feature where you can interrupt the AI in the middle of its speech. So while it is talking and it hears you speak it immediately stops talking, which is what we saw during the demo. Just a guess.
Well, duh! The problem isn’t just that it cuts in and out. It’s how sudden, unnatural (non-human-like), and poorly timed these interruptions are. Issues like these keep you on your toes—instead of conversing as freely as you would with a person, you find yourself constantly adjusting your speech. For instance, you try to avoid lengthy pauses. I’m eager to test it soon, but I’m really hoping for further improvements.
@@ukaszgandecki9106 These are still some amazing strides in humanlike AI interactions. We went from a spooky-good text generator to an AI that you can have full vocal conversations with in 1.5 years. Yeah, it’s going to need to learn what sounds appropriately qualify as “interruptions,” but I expect to see huge strides on that front in the upcoming year.
@@ukaszgandecki9106 This is the worst this will ever be.
Except… it can also see. So it will just wait for you to actually finish now. If you’d actually used this the entire way you’d know this is pure magic compared to what it was and still is publicly.
@@ukaszgandecki9106 are you saying as a human you don’t constantly interrupt and get interrupted by others? That’s just human speech, unless you’re speaking in a very formal manner
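That guess is roughly how barge-in is usually built. A minimal Python sketch of the control flow, with all audio I/O stubbed out (nothing here reflects OpenAI’s actual implementation):

import time

SPEECH_THRESHOLD = 0.5  # made-up energy level above which we treat the user as talking

def mic_energy() -> float:
    # Stub: a real app would read the current microphone input level here.
    return 0.0

def play_tts_chunk(chunk: str) -> None:
    # Stub: a real app would stream one short audio chunk to the speaker.
    time.sleep(0.1)

def speak_with_barge_in(chunks: list[str]) -> None:
    # Play the reply chunk by chunk, bailing out the instant the user talks over it.
    for chunk in chunks:
        if mic_energy() > SPEECH_THRESHOLD:
            print("(user spoke: stop playback, go back to listening)")
            return
        play_tts_chunk(chunk)

speak_with_barge_in(["Sure, ", "the answer ", "is 42."])

The hard part, as the replies above suggest, is deciding which sounds count as a genuine interruption rather than a cough or an “mm-hm”.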
Ilya was booked to be there but at the last moment they discovered that the chain attached to his leg in the OpenAI dungeons wouldn’t stretch to the conference room!
Jokes aside, I’m concerned; he’s been gone for quite a while now.
@@TheRealUsername Well, he got roasted to hell and back. He probably just wants to stay out of the limelight.
@@TheRealUsername I think he was told “Head down and nose out!” and he’s doing just that. Someone clearly has something on Ilya, but he always seemed to me to be rather introverted anyway. It was often painful watching him being interviewed because he looked like a rabbit caught in the headlights.
and then he broke loose!
ilya used demo day as a diversion to escape. jan leike, who was tasked with guarding the basement, had to resign for his failure.
About the intruder part (bunny ears): that wasn’t him telling GPT “hey, was there someone?”. Sure, he has to instruct GPT to say who was in the background, but the capability being showcased was video memory.
A minute had passed and GPT still remembered there was a person there. That’s the showcase.
if they don’t want to maximize engagement, one thing they missed out on is the ability to stop the conversation just by ending it verbally, like a ‘thank you for now’, without having to press the button; that would also add a nice touch UX-wise
i think you can; it’s just faster and easier to click a button…
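The suggestion is simple enough to sketch. A trivial, hypothetical closing-phrase check on the transcript might look like the Python below; a real version would need to handle follow-ons like “thank you for now, but one more thing”:

CLOSING_PHRASES = ("thank you for now", "that's all", "goodbye")

def should_end_session(utterance: str) -> bool:
    # End the voice session when the user's last utterance is a sign-off.
    return utterance.strip().rstrip(".!?").lower() in CLOSING_PHRASES

print(should_end_session("Thank you for now!"))  # True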
The latency combined with the emotional understanding are for me the game-changers here. I’ve been using GPT voice mode for a while for language practice and the delay has just never felt even close to a natural conversation, but this looks to possibly completely eliminate that issue in a single leap.
I honestly didn’t think we would have natural conversational capabilities until we could run very good models locally on device for the essentially zero latency I thought was needed. But if this demo can be replicated anywhere with decent service then it’ll be extremely interesting to see if it manages to completely leap across the uncanny valley or if this is gonna feel very eerie and dystopian.
The laughing, stuttering and excitement just sounded so damn good in the demo. We might be getting damn close to HER territory, and I think anthropomorphizing is gonna go off the charts with this. I mean, one of the top comments on one of the demo videos was already along the lines of:
“There is NO way this thing is not sentient!”
Next few months are gonna be so damn interesting!
When she says “Sorry guys, I got carried away there and started talking in French.” at 8:25.
Just… just listen to how personable she sounds. GPT 4o is really something else. It’s not just the clear voice. It’s the laugh-talking. It’s the breath. It’s the accent that kind of slips out in “away there”, and the choice to use more casual and conversational language like saying “talking in French”, instead of “speaking French”. The embarrassed tone. And then the attempt afterwards to drum up excitement to try again. It’s so personable. I think that’s the right word. It feels human, which is great, and terrifying all the same.
Ikr
Actually that French line was so natural, it felt like a real person talking. I mean, it wasn’t talking like it was just reading something, but really how a native would talk in a casual conversation. That’s crazy.
As an example, you would write “je ne sais pas” (“I don’t know”), but a native would say “j’sais pas” or “ché pas”.
There’s nothing terrifying about it; that’s language we should avoid with AI.
Also the very humanlike post-hoc rationalization 🥰
It possibly is training leakage. As a French speaker, that could very much be coming from French radio / podcasts.
The stuttering at 12:51 is so human-like
“I, I mean you, you’ll definitely stand out”
Amazing.
that’s what i hate about it. they’re giving ai our human flaws.
I’m convinced that was a live voice actor used for dramatic effect and not the actual AI. There’s no way that was TTS
@@EchoMountain47 how do we know they used TTS and not something new?
@@akmonra TTS means text to speech. It’s not like one specific technology but a type of technology. Computer generated speech is always TTS on some level
@@EchoMountain47 no, you can embed voice in latent space the same way you can embed text. you could have a model with pure voice inputs/outputs.
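Both replies can be right, depending on the architecture. A stub-level Python sketch of the two designs being argued about; every function here is an illustrative placeholder, not a real API:

def speech_to_text(audio: bytes) -> str:
    return "hello"  # stub ASR

def llm(text: str) -> str:
    return f"you said: {text}"  # stub text-only model

def text_to_speech(text: str) -> bytes:
    return text.encode()  # stub TTS

def cascaded(audio_in: bytes) -> bytes:
    # Classic pipeline: audio -> text -> text -> audio.
    # Tone, laughter and timing are lost at the text bottleneck,
    # which is the sense in which it is "TTS on some level".
    return text_to_speech(llm(speech_to_text(audio_in)))

def audio_tokenizer(audio: bytes) -> list[int]:
    return list(audio)  # stub: audio mapped to discrete tokens

def multimodal_model(tokens: list[int]) -> list[int]:
    return tokens  # stub: one model, audio tokens in and out

def audio_detokenizer(tokens: list[int]) -> bytes:
    return bytes(tokens)

def end_to_end(audio_in: bytes) -> bytes:
    # What OpenAI says GPT-4o does: audio lives in the same token space
    # as text, so the model can hear and produce tone, pauses and
    # laughter directly, with no text bottleneck.
    return audio_detokenizer(multimodal_model(audio_tokenizer(audio_in)))

print(cascaded(b"hi"), end_to_end(b"hi"))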
I love how much humility they put into their demos. They aren’t just showing perfect-case scenarios where the AI isn’t making any mistakes. What they are showing is progress.
Yeah that was notable, and commendable.
It’s more due to the model not being better. I wouldn’t call that humility, more so an over-promise and not being able to deliver.
I found the demo @11:53 the most impressive. It picked up on the not entirely kempt “developer” look of the person, made a comment about his hair being messed up and then understood he was joking with the hat. It’s one thing to recognize people, but to be able to pick up on the nuances of how people are expected to present themselves in certain situations is really impressive.
I do hope we get to tone down the ‘perkiness’ of the model a bit. It’s quite charming in one-minute bits, but I think the overly positive attitude gets old fast if you’re communicating with it a lot over the course of the day.
I get Scarlett Johansson vibes in this demo
You could always just ask it to chill out a bit and it will adhere
@@gmmgmmg Totally. I honestly think they did it on purpose to evoke comparisons with ‘Her’, and they’ve completely succeeded.
@@Gerlaffy is right. You can change its personality in real time. Been doing this since 3.5
Apple acknowledging another company exists is still the craziest news here.
They know OpenAI is the future of technology and they’re jumping on it sooner rather than later