DeepMind’s New AIs: The Future is Here!

❤️ Check out Lambda here and sign up for their GPU Cloud:

Guide for using DeepSeek on Lambda:

📝 The Gemma 3 paper and the rest are available here:

Sources:

📝 My paper on simulations that look almost like reality is available for free here:

Or this is the orig. Nature Physics link with clickable citations:

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here:

My research:
X/Twitter:
Thumbnail design: Felícia Zsolnai-Fehér –

Joe Lilli
 

  • @teamredstudio7012 says:

    Spectacular! What a time to be alive!

  • @fim-43redeye31 says:

    I’ve tested ShieldGemma 2 myself, and it has a major bias against anime and cute things. Gemma 3 impresses me, though.

  • @rando6836 says:

    Generative edits are one thing, but I want to see the ability to create consistent characters and styles. Not similar, the same.

  • @FictionalMarine says:

    Now we get a new AI almost every day.

  • @TheAkdzyn says:

    This is very impressive. It feels like another leap: coherent text in images, image editing on par with Photoshop, all in an incredible form factor. Sounds like a creative box of knowledge.

  • @JorgetePanete says:

    I remember watching a technique here some years ago for recoloring a photo of Abraham Lincoln; the hair was slightly out of pose, which revealed it to be a GAN at work.
    I can’t believe the progress we’ve seen since then, and what’s still to come with video generation.

  • @_Inevitability_ says:

    What a time to be AI

  • @bluehorizon2006 says:

    Yess!! Local and Open models for the Win!

  • @torarinvik4920 says:

    I am fine-tuning gemma3 1b as we speak 😎

  • @fus3n says:

    To add: image generation and editing are only available in Gemini 2.0 Flash Experimental (not the regular Gemini 2.0 Flash), not in Gemma, and the image generation is native to the model rather than coming from an external model.

    • @nikhilsharma32907 says:

      @@fus3n No, it has image-to-text.

    • @fus3n says:

      @nikhilsharma32907 Gemma has image-to-text, not text-to-image. The new Gemini can output images, meaning it generates them natively, which isn’t supported in Gemma. That wasn’t clarified properly in this video; everything was just mushed together.

    • @mcasma1523 says:

      So it’s a scam video?

    • @fus3n says:

      @@mcasma1523 It’s not a scam; it just wasn’t organized properly, making it seem like all of that was possible with Gemma. There was a small caption saying the generation was by Gemini Flash, but that really isn’t enough for the rest, as I can see many people thinking it was possible with Gemma, an open model.

  • @WinonaNagy says:

    Gemma 3 optimized to run on just one GPU? Now that’s impressive! From creative writing to high-dexterity robotics, this is a game-changer. Hats off to Google DeepMind!

  • @June-1980 says:

    With how fast things are moving, we will either be the last generation or the eternal generation. What a time to be alive…

    • @TheBeastDispenser says:

      Like with most things, it will be somewhere in between. Even though things are moving fast, I think it will take longer than some expect. However, I do think this generation will include people who live to 150+ years old.

    • @puppergump4117 says:

      This is already taking much longer than expected. If simply scaling up worked, they should have had something twice as good last year. They hit a limit in training data, and now we’re relying on breakthroughs to shrink the models. Basically, the upgrades will now be less consistent but much more rewarding.

    • @shiroi5672 says:

      @@TheBeastDispenser We already know all the hallmarks of aging, so 150+ seems a bit pessimistic.

    • @DendrocnideMoroides says:

      @@TheBeastDispenser You are thinking in terms of breakthroughs in aging that could come in the next decade or two. If someone does live to 150, then there will be 150 years of technological progress by that time, which is such an insane amount that it is basically guaranteed they will live almost forever.

    • @himanshusingh5214 says:

      Things are heating up in India. Imagine how hot it would get in May.

  • @kylek29 says:

    I suspect one reason they’re trying to reduce the footprint of their model is for the purpose of putting it on the Pixel devices, which hold smaller versions of models for specific tasks.

    • @dantedamean says:

      @@kylek29 that’s definitely the future. Whoever can get AI integrated into a phone first is gonna make billions.

    • @skyak4493 says:

      That is what I am looking to do, so I am sure the creators had this intent years ago when they were assigned to the task. I am thrilled such capable models are open source. The giant cutting edge models are of no value because in the time it takes to build trust in them, they are obsolete!

    • @exoticredtadpole2713 says:

      @@skyak4493 Not open source. Not even open weights, but rather “weights available”.

    • @cascadecontroller says:

      Also, autonomy. If you want to build a truly autonomous robot all of the things have to be run locally.

  • @mighty2146 says:

    But can it fill my wine glass to the brim ?

  • @npc-drew says:

    What do you mean Gemma 2 was just okay? It was literally one of the best among the small LLMs, and the most concise.

    • @jimhrelb2135 says:

      @@npc-drew Gemma-2 9B beats any <= 10B text model in my personal evals. It’s definitely my favorite to this day.

    • @npc-drew says:

      Exactly, same with Gemma-2 27B beating other 70B models.

      My go-to for local RAG is Gemma-2 3B; the biggest difference is that its punctuation is more human-like and less robotic (less obviously LLM-generated).

      My favorite framework is onnxruntime: it hit 100 tk/s, faster than llama.cpp at around 70–80, with ExLlama last. However, gone are the days when GGUF quants were worse than EXL2.

      For VL, right now I’m using Qwen-2.5 7B, and its OCR capabilities beat Google Cloud OCR. Crazy how just 1–2 years ago multimodal LLMs did worse at OCR than dedicated OCR models, and now the tide has turned.
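  A rough rule of thumb behind these local-inference comparisons: the memory needed just for the weights scales as parameters × bits-per-weight ÷ 8. A minimal sketch (model sizes taken from the comments above; this deliberately ignores KV cache and activation overhead, which add several more GB in practice):

  ```python
  def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
      """Approximate memory for quantized weights only.

      Ignores KV cache, activations, and runtime overhead.
      """
      total_bytes = params_billion * 1e9 * bits_per_weight / 8
      return total_bytes / 1e9

  # Gemma-2 27B at 4-bit quantization: ~13.5 GB of weights
  print(round(weight_memory_gb(27, 4), 1))   # 13.5
  # A 12B model at 4-bit: ~6 GB, which is why it fits on an 8 GB card
  print(round(weight_memory_gb(12, 4), 1))   # 6.0
  ```

  This is also why a 27B model at 4-bit spills past an 8 GB GPU and drops to ~1 token/s once layers are offloaded to the CPU, as reported below.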

  • @Archipelagoes says:

    What? That’s crazy. This really shows that massive leaps in improvement aren’t always about raw compute; how efficient it is should also be considered.

  • @patrickdegenaar9495 says:

    Holy moly… I uploaded a Glastonbury photo, and it recognised the Pyramid Stage!!! I’m getting 8 tokens/s on the 12B param version and 1 token/s on the full 27B, using my 5-year-old computer with an 8 GB VRAM 3060 Ti. That is really impressive!

  • @sirrobinofloxley7156 says:

    That’s definitely working almost perfectly; just a few little tweaks were needed, and I made a fantastic thing, which Gemma 3 helped a lot to fix. I just made the relevant fixes, put them back in, and reasoned with the model about the aims of the work. One thing though: I did have to change the settings in AI Studio, and you have to scroll past the full gamut of Gemini models before it shows up, but it gives the 2T option FOR FREE!!! Thanks for sharing!

  • @mrrfyW says:

    I LOVE that new AI image editing thing with Google’s AI Studio. I even made a video (shared somewhere else) using it! What a wonderful time to be alive!

  • @xeleader says:

    “This is not amazing, this is beyond amazing!”
    Another Two Minute Papers mood
