Never Browse Alone? Gemini 2 Live and ChatGPT Vision

The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, better for satisfying curiosity than for bleeding-edge intelligence. I give you the benchmarks, the highlights, and of course the latest from OpenAI: Advanced Voice Mode with Vision.

Assembly AI Speech to Text:

Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa, and what might be, for some of you, Google’s deflating admission.

AI Insiders (Now $9!):

Chapters:
00:00 – Introduction
00:38 – Live Interaction
03:43 – Gemini 2.0 Flash Benchmarks
05:10 – Audio and Image Output
06:38 – Project Mariner (+ WebVoyager Bench)
08:49 – But Progress Slowing Down?
10:43 – OpenAI Announcements + Games

Gemini 2.0 Flash Benchmarks:
Project Mariner:
WebVoyager:
Gemini Gameplay:
Advanced Voice Mode OpenAI:

Claude Computer Use:
Oriol Vinyals Interview:

The 8 Most Controversial Terms in AI:

Non-hype Newsletter:

Podcast:


  • @Creepaminer says:

    So far this new Gemini is the only amazing thing to come out during OpenAI’s 12 days

  • @markopolio4163 says:

    The real shipmas is the frequency of these AI Explained Videos.

  • @MaJetiGizzle says:

    “There isn’t really a wall per se, but there is a bit of a hill that we need to hike.” – Sundar Pichai

    • @byrnemeister2008 says:

      Pretty much what Altman is saying: no wall, just harder to make progress.

    • @MattGreenfield says:

      @byrnemeister2008 Meanwhile, Noam Brown at OpenAI (one of the main guys behind o1) has just confidently said that progress will accelerate in 2025.

  • @TheYoxiz says:

    I’ve tried Gemini 2 Flash in my native language (French) and the results were HILARIOUSLY bad. I asked it, “hey! can you hear me okay?” and it wrote me, I kid you not, *an essay about the meaning of the phrase “hey! can you hear me okay?”*, instead of just replying. It did that for anything I asked. Like, I would literally just say “hello!” and instead of saying hello back to me it would offer translations, suggestions, explanations of… “hello”, instead of talking. I’ve never seen a language model do that before.

    • @maciejbala477 says:

      that’s a true LLM moment right there lol. Ah, the experience of running into a mistake that a human would never make under any circumstances…

    • @Pixcrafts says:

      I spoke to it in Greek and it replied with bogus TV series Greek 😂😂

    • @jozefwoo8079 says:

      Advanced Voice Mode from OpenAI started speaking Dutch with mistakes when I asked it to have a basic conversation in Dutch with my wife to help her learn the language. One time it started speaking German with me, and when I told it I don’t speak German, it switched to English with a German accent 😂

  • @ibonitog says:

    Most surprising fact from today’s video is that your name is Philip 😀

  • @robkline6809 says:

    Thanks for the timely video, Philip – you are the go-to source for me!

  • @DentoxRaindrops says:

    “AGI is in the air” Brockman said. Hope we end this shipmas with a banger from both Google and OpenAI. Great video, dude!

    • @julius4858 says:

      Wouldn’t hold my breath, they already gave us o1 pro, what else is gonna come now 😂

    • @DentoxRaindrops says:

      @julius4858 Right, but you never know

    • @DreckbobBratpfanne says:

      @julius4858 There was either an error or a leak of a GPT-4.5 in their web UI on the first day, when they released o1, so maybe that. Although why would they need an o1 pro if they had a 4.5 (which at least needs to be able to beat Sonnet 3.5 (new))

    • @AmandaFessler says:

      @julius4858 Alas, as a pessimist, I agree. Start with the big one. Everything else is gravy.

    • @julius4858 says:

      @AmandaFessler I mean, it wouldn’t even make sense to work on o1, release it with a $200 price tag, and then release an even better model days later. Idk, I don’t see that happening.

  • @robkline6809 says:

    The debate over slowing progress in LLMs overlooks a key point: while model advancement rates may be debatable, we’re nowhere near realizing the potential of existing capabilities. Emergence isn’t just about unexpected model capabilities appearing; it’s also about practitioners discovering unexpected possibilities through creative applications of current systems.

    • Anonymous says:

      Not to mention that if Einstein was struck by lightning and became 10x smarter, the benchmark wouldn’t be able to reflect it. It would look like 1.1x or something 😂

    • @robkline6809 says:

      🙂 – lol

    • @maciejbala477 says:

      Yep. There are still ways to use the current LLMs, without improving them, that don’t exist yet. That’s also progress if it’s implemented, just perhaps slightly less exciting.

    • @joaquinhernangomez says:

      I get what you mean, but if the long-term goal is to at some point reach something that can actually be called AGI, the foundation being used won’t ever bring us there no matter what novel approach comes out. I think we need one or two major breakthroughs at the level of “Attention Is All You Need” from 2017. These architectures leveraging transformers initially started as a very effective tool for translation, and eventually, with fine-tuning, we were able to get GPT-3.5; currently the newest approach is to use chain of thought to imitate reasoning.

      While the progress is still impressive, we won’t ever build something truly intelligent this way (it still seems that if you give a SOTA LLM a problem it hasn’t seen, it will struggle, and even if it has seen it, it is still prone to generating a misleading token that derails it entirely from the right solution). Another problem I’ve been encountering online, and even experienced recently, is the skepticism about using these models in production within a somewhat serious application: the tendency of every single model to “hallucinate” carries too much risk, since at the end of the day that’s basically what they’re designed to do, hallucinate based on the text they’ve been trained on and given as a prompt. Very excited to see where we will be in the next 5 years regardless.

    • @phillmeredith says:

      In reality, this is how all technology advances: massive strides, then iterative steps that lead to enormous advancement over time.

  • @cacogenicist says:

    Major progress, I suspect, will shift from scaling giant general models to assembling smaller, narrower-domain specialized models — along with memory storage and management components, and some kind of domain identification/routing element — into a sort of modular system that’s smarter than the sum of its parts. (A toy sketch of this routing idea follows the thread.)

    • @goldenshirt says:

      that would surely introduce a lot of latency, and I doubt that’s the way for AGI

    • @kyneticist says:

      @goldenshirt Cacogenicist’s observations do seem like the most likely areas of progress. People’s desperation for AGI blinds them.

    • @davidlovesyeshua says:

      Any reason for thinking this?

    • @cacogenicist says:

      @davidlovesyeshua Human minds comprise somewhat domain-biased, sort of soft cognitive modules (permeable, still relatively domain-flexible), along with harder (less permeable, less flexible), evolutionarily older modules with clear neuro-structural correlates. They aren’t just big, homogeneous natural-language-processing blobs.

    • @goldenshirt says:

      @davidlovesyeshua If you’re talking to me, it’s because using many different specialized models is very different from AGI; you could even say it’s the opposite.

      AGI is supposed to be the best, or close to it, in all domains (general intelligence), so using different models will not create one model that’s good at everything.

      Of course it’s doubtful that AGI is possible, but I hope it can happen.
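
A minimal sketch (editor’s illustration, in Python) of the “router + specialists” idea from the thread above. Every name here is hypothetical: the `DomainRouter` class, the keyword table, and the specialist functions are invented for the example, not any lab’s actual architecture. A real system would presumably use a learned classifier (or a small model) for routing, plus the memory storage and management components the comment mentions.

```python
# Toy "router + specialists" sketch. Everything below (DomainRouter,
# the keyword table, the specialist functions) is hypothetical, made
# up to illustrate routing queries to narrow-domain models.

from typing import Callable, Dict

# Stand-ins for narrow-domain specialist models.
def answer_math(query: str) -> str:
    return f"[math specialist] {query}"

def answer_code(query: str) -> str:
    return f"[code specialist] {query}"

def answer_general(query: str) -> str:
    return f"[general model] {query}"

class DomainRouter:
    """Dispatches a query to a specialist via crude keyword matching.

    A real system would replace this with a learned classifier (or a
    small LLM), with memory storage/management around the whole loop.
    """

    def __init__(self) -> None:
        self.specialists: Dict[str, Callable[[str], str]] = {
            "math": answer_math,
            "code": answer_code,
        }
        self.keywords: Dict[str, set] = {
            "math": {"prove", "integral", "equation"},
            "code": {"function", "compile", "bug"},
        }

    def route(self, query: str) -> str:
        tokens = set(query.lower().replace("?", "").split())
        for domain, kws in self.keywords.items():
            if tokens & kws:  # any keyword overlap -> use that specialist
                return self.specialists[domain](query)
        return answer_general(query)  # no match -> fall back to generalist

router = DomainRouter()
print(router.route("Can you prove this equation holds?"))  # -> math
print(router.route("Why does my function not compile?"))   # -> code
print(router.route("Tell me about the weather."))          # -> general
```

The open question, as the replies note, is whether the routing and memory glue adds more latency and failure modes than it saves over one giant generalist.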

  • @HAL9000. says:

    Proud of OAI shipping Gemini Flash 2.0 and all those amazing tools for their shipmas lol

  • @OZtwo says:

    I was and still am supporting OpenAI, yet this last year they have been hit hard by a lot of key developers leaving. One of the biggest issues, I think, was that o1 was meant to be ChatGPT 5.0, yet it wasn’t what they were hoping for. The only answer they have to fix the current issue is to simply throw more compute at it, which is why it has started to cost so much, as the compute should only be needed at the training level.

  • @user-sl6gn1ss8p says:

    This is the only channel I still trust to get my Tic Tac Toe news

  • @autingo6583 says:

    The most impressive thing for me is that they actually have the capacity to roll this out. We’ve come a long way since Google got caught flat-footed and had nothing more than poor old LaMDA-based Bard prototypes, because everything else was too heavy to serve.

  • @turner-tune says:

    the tic-tac-toe part was gold 🤣 Amazing video as always! Thank you for the laugh and the great info 👏

    • @Fluffy-v9p says:

      was it a reference to the last video where he was told he got the tic tac toe problem wrong and told the ai (which got it right) that it was wrong?

    • @waffemitaffe8031 says:

      Yes. The AI also got it wrong though, not just him.

      And it was quite an easy problem, but the man was up all night reading research papers, so we shouldn’t be too hard on him.

  • @brianjanssens8020 says:

    This is probably the only Google product I can say I approve of so far, mostly because they finally listened and toned down the censorship. You can now create a tailor-made model specifically for you, with tons of fancy features that can help you, and a lot of input tokens.

  • @sub-vibes says:

    2:14 – Gemini couldn’t sound _LESS_ enthused to be there if they tried!! 😆😆😆😆😆

  • @wisdomking8305 says:

    Loving the frequent updates from you

  • @georgesos says:

    Pichai was saying that it gets really steep, but when the “competition” was mentioned, he changed his tune (investors are listening)…

  • @pareak says:

    Thank you so much for your videos! Quick uploads, high quality, intelligent, and yet still fun to watch. In the past few weeks, the amount of time I have has decreased drastically. I stopped watching a lot of different AI YouTube channels. But let me tell you this: I did not miss a single video of yours, and I don’t plan to ever miss one!

  • @Likou_ says:

    6:20 the reference to the mistake on the previous video is hilarious
