DeepMind’s New AIs: The Future is Here!
❤️ Check out Lambda here and sign up for their GPU Cloud:
Guide for using DeepSeek on Lambda:
📝 The Gemma 3 paper and the rest are available here:
Sources:
📝 My paper on simulations that look almost like reality is available for free here:
Or this is the orig. Nature Physics link with clickable citations:
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here:
My research:
X/Twitter:
Thumbnail design: Felícia Zsolnai-Fehér –
Spectacular! What a time to be alive!
After you lose your job, it will become “what a time not to be alive” 😂
I’ve tested ShieldGemma 2 myself, and it has a major bias against anime and cute things. Gemma 3 impresses me, though.
@fim-43redeye31 that’s good. we don’t want our ai liking 3000 yo little girls
@handsanitizer2457 Maybe I want my AI liking actual adults. ShieldGemma hates *anything* feminine and animesque.
@@fim-43redeye31 good to know 👍
Generative edits are one thing, but I want to see the ability to create consistent characters and styles. Not similar, the same.
It’s close
You mean to put the same characters in an already existing scene?
This would be devastating if we think about DeepFakes
For that you need to get an artist that puts actual effort into art
Generate a character sheet and train a lora?
Now we have a new AI almost every day
Really weird times, aren’t they?
Every minute!
We’re entering the singularity lol, it’s gonna get faster
@@SockOrSomething Let’s hope for a new global paradigm shift soon 😉
The speed at which these are improving is incredible.
This is very impressive. It feels like another leap with the coherent text in images, image editing on par with photoshop all in an incredible form factor. Sounds like a creative box of knowledge.
I remember watching here, some years ago, a technique for recoloring a photo of Abraham Lincoln; the hair was slightly out of pose, and it was revealed to be a GAN at work.
I can’t believe the progress we’ve seen until today and what’s left to come with video generation.
good old GAN. How far we’ve come since
What a time to be AI
🤣
Terminator coming, Irobot too
Yess!! Local and Open models for the Win!
I am fine-tuning gemma3 1b as we speak 😎
how can you do that?
@@torarinvik4920 where do you find your data? And how much do you need ?
@@MrHanil1 by fine tuning it
Hi.. Please tell which GPU, quantization etc. you are using and share your results as well
Do I learn it by going through an LLM engineering course? Please advise
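For those asking how fine-tuning a model like this is feasible on one GPU: the usual trick is LoRA, which freezes the pretrained weights and trains only a small low-rank update to each weight matrix. A minimal NumPy sketch of the idea (dimensions are illustrative, not Gemma’s actual sizes):

```python
import numpy as np

d, r = 64, 4  # hidden size and adapter rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))  # frozen pretrained weight
A = rng.standard_normal((r, d))  # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection, zero-initialized

# The rank-r update: only 2*d*r parameters train instead of d*d.
delta = B @ A
W_adapted = W + delta

# Because B starts at zero, training begins exactly at the pretrained model.
assert np.allclose(W_adapted, W)
print(2 * d * r, "trainable params vs", d * d, "frozen")
```

In practice you would do this through a library like Hugging Face’s peft rather than by hand, but the parameter-count arithmetic above is why a 1B model fits in consumer VRAM for fine-tuning.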
To add: image generation and editing are only available in Gemini 2.0 Flash Experimental (not the normal Gemini 2.0 Flash); they’re not available in Gemma. And the image generation is native to the model, not done by an external model.
@@fus3n no, it has image-to-text
@nikhilsharma32907 Gemma has image-to-text, not text-to-image. The new Gemini can output images, meaning it generates images natively, which isn’t supported in Gemma. That wasn’t clarified properly in this video; everything was just mushed together.
So it’s a scam video?
@@mcasma1523 It’s not a scam, it just wasn’t organized properly, making it seem like all of that was possible with Gemma. There was a small caption saying the generation was by Gemini Flash, but that really isn’t enough for all the other stuff, as I can see many people thinking it was possible with Gemma, an open model.
Gemma 3 optimizing with just one GPU? Now that’s impressive! From creative writing to high-dex robotics, this is a game-changer. Hats off to Google DeepMind!
Not really though. It’s already outdated. Qwen’s QwQ beat them to the punch.
@@_CRiT_hits_ QwQ is significantly bigger than even the largest Gemma.
@@_CRiT_hits_ shh let the bot advertise
@@puppergump4117 You’re probably replying to a Qwen hype bot… It’s robot warfare!
What a time to be Alive!
With how fast things are moving, we will either be the last generation or the eternal generation. What a time to be alive..
like with most things, it will be somewhere in between. Even though things are moving fast, I think it will take longer than some expect. However, I do think this generation will include people who live to 150+ years old.
This is already taking much longer than expected. If they just scaled up they should have had something twice as good last year. They encountered some limit in training data and now we’re relying on breakthroughs to minimize the models. Basically, the upgrades will now be inconsistent but much more rewarding.
@@TheBeastDispenser We already know all the hallmarks of aging, so 150+ seems a bit pessimistic.
@@TheBeastDispenser You are thinking in terms of breakthroughs in aging that can come in the next decade or two. If someone does live to 150, then there will be 150 years of progress in technology by that time, which is such an insane amount that it is basically guaranteed they will live almost forever.
Things are heating up in India. Imagine how hot it would get in May.
I suspect one reason they’re trying to reduce the footprint of their model is for the purpose of putting it on the Pixel devices, which hold smaller versions of models for specific tasks.
@@kylek29 that’s definitely the future. Whoever can get AI integrated into a phone first is gonna make billions.
That is what I am looking to do, so I am sure the creators had this intent years ago when they were assigned to the task. I am thrilled such capable models are open source. The giant cutting edge models are of no value because in the time it takes to build trust in them, they are obsolete!
@@skyak4493 Not open source. Not open weights either, but rather “weights available”
Also, autonomy. If you want to build a truly autonomous robot all of the things have to be run locally.
But can it fill my wine glass to the brim ?
What do you mean Gemma 2 was just okay? It was literally one of the best among the small LLM models, the most concise.
@@npc-drew gemma2-9b beats any <= 10B text model from my personal eval. It’s definitely still my favorite to this day.
exactly, same with Gemma-2 27B beating other 70B models.
My go-to for rag locally is Gemma-2 3B, the biggest difference is its punctuation is more human-like and less LLM (robotic in a way you can tell it’s a LLM).
My favorite framework is onnxruntime: speed was 100 tk/s, faster than llama.cpp at around 70-80 tk/s, and last was ExLlama. However, gone are the days when GGUF quants were worse than EXL2.
For VL, right now I’m using Qwen-2.5 7B, and its OCR capabilities beat Google Cloud OCR. Crazy how just 1-2 years ago multimodal LLMs did worse at OCR than dedicated OCR AI, and now the tide has reversed.
What? That’s crazy.. This really goes to show that a massive leap in improvement isn’t always about raw compute; how efficient it is should also be considered..
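On the GGUF-vs-EXL2 point above: the formats differ in details, but the core idea of weight quantization is the same in both: store weights at low bit-width together with a scale factor, and dequantize on the fly. A minimal symmetric int8 sketch of that idea (not either format’s actual scheme):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for a weight tensor

q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale  # dequantized weights

# Rounding error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

Real formats improve on this with per-block scales and mixed bit-widths, which is why modern quants lose so little quality while cutting memory by 4x or more.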
Holy moly… I uploaded a Glastonbury photo, and it recognised the pyramid stage!!! I’m getting 8 tokens/s on the 12B param version and 1 token/s on the full 27B, using my 5-year-old computer with an (8 GB VRAM) 3060 Ti. That is really impressive!
That’s definitely working almost perfectly; just a few little tweaks were needed and I made a fantastic thing, which Gemma 3 helped a lot to fix. I just made the relevant fixes, put them back in, and reasoned with the model about the aims of the work. One thing though: I did have to change the settings in AI Studio, and it’s listed after the full gamut of Gemini models before it shows up, but it gives the 2T option FOR FREE!!! Thanks for sharing!
I LOVE that new AI image editing thing with Google’s AI Studio. I even made a video (shared somewhere else) using it! What a wonderful time to be alive!
By somewhere else I mean on my second channel, by the way.
“This is not amazing, this is beyond amazing!”
Another Two Minute Papers mood