Finally, DeepMind Made An IQ Test For AIs! 🤖

❤️ Try Macro for free and supercharge your learning:

πŸ“ The papers are available here:

πŸ“ My paper on simulations that look almost like reality is available for free here:

Or here is the original Nature Physics link with clickable citations:

πŸ™ We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi

If you wish to appear here or pick up other perks, click here:

My research:
X/Twitter:
Thumbnail design: Felícia Zsolnai-Fehér –

Joe Lilli
 

  • @MarixEm says:

    the evil pillow part killed me

  • @Seboko-o7j says:

    Lumiere kills me 😂

  • @oatlord says:

    I really liked Sora’s painting video. I think that’s reality now.

  • @MCA5EY says:

    So what I’ve inferred from current video models is that they’re getting better at consistency, but I don’t think they’ve been trained alongside physics understanding. Google just released their Gemma 3 multi-modal model that’s pretty good at text and image consistency because they were trained together. I think there needs to be a video generator trained alongside physics: a bunch of ground-truth videos with detailed physics breakdowns explaining, step by step, all the interactions, and then reinforcement learning.

    • @sebastianjost says:

      @@MCA5EY That’s why people work on training robots in simulation and on making those simulations faster and faster.

      For now, this is still difficult, but we’re getting there.

    • @jaiveersingh5538 says:

      @@MCA5EY Sounds like NVIDIA’s COSMOS

  • @vixxcelacea2778 says:

    I look forward to it understanding physics and also human emotions. I look forward to the movies people are going to make and share. Besides obvious medical and scientific application of AI, the creative aspect of people being able to make things they otherwise would never be able to due to resources or personal constraints (like body or brain) is gonna be so neat to see. Even the crappy or weird stuff. Just unleashing creativity at the core.

    • @June-1980 says:

      Agreed! I hope it becomes great for humanity and life in general

    • @cajampa says:

      @@vixxcelacea2778 This! So much.
      I can’t wait until we get to see all the creations from people who never would have had a chance to create something like movies and other kinds of striking visual narratives before. I am one of those who look forward to one day being able to create what I already visualize in my stories.

  • @TheAkdzyn says:

    Finally, a video discussing the unique challenges in the physics behind AI videos. I wonder what type of data it would need to accurately model our physics. Exciting stuff.

  • @justtiredthings says:

    I wish Veo 2 had been included in this

  • @tHEuKER says:

    I always find it hilarious when you put the video’s intro halfway through it. 😂

  • @Songfugel says:

    Thank you for easing my existential crisis even if just a tiny bit

  • @muridsilat says:

    I may misunderstand the comment starting at 5:53. I know some AI models are trained on physics simulations for things like robotics and self-driving cars. I was under the impression that this training was effective. I would have thought that an AI playing in a digital physics sandbox (particularly with the sort of visual information humans get in a video game) would improve at creating and predicting visual representations of those phenomena.

    • @AlienXtream1 says:

      It is, to a degree. But the thing to understand is that the methods, models, and time involved are not the same. They are separate things that have been done, just not yet all combined into a single system. At present, the ability to have an internal sandbox is too computationally intensive for these types of things (text-to-image/video, or generative text like ChatGPT), as those need to run fast and relatively efficiently. ChatGPT doesn’t need to speculate or consider your whole point of view to respond to a generic query or input.

      By contrast, something like a robot manipulating things in the real world needs to be sure that all the signals it sends to the motors, actuators, and whatever else articulates the system are performed accurately. Different contexts require different tools and different levels of precision. Most of the “really powerful” AIs that you hear about are only that way because they are task-specific and very narrow in scope, having been able to train over long periods of time (not in real time). A self-driving car doesn’t need to plan out the route taking into consideration all the things you have to do that day and their timings; it just has to move the vehicle from point A to point B without hitting anything or anyone (plus a bunch of other things like road rules, but the general idea is much the same).

      To my understanding, it’s largely a pathfinding algorithm combined with a bunch of data from peripheral sensors and collision avoidance. These are things that can actually be manually programmed with staggering effectiveness. The machine-learning part would largely come into play for converting the sensor data into the correct formats so that these algorithms (similar kinds of which are already used for things like AI in games) have access to the required information: how many cars are around me, where are the people, what’s the speed limit along this stretch of road. All that stuff is trivial to get in a virtual world like GTA, as it’s all there already in the way the computer understands it; the graphics are then converted into human-compatible representations. Machine learning is needed to do the inverse effectively, though: taking an image and other data from proximity sensors and such, and converting that into what the algorithms need to work effectively. I’m sure newer self-driving AI uses actual machine learning for aspects of the driving algorithm itself, but that’d be to improve the adaptability of the system for real-world scenarios and details. The general principle is the same, though: the hard part isn’t driving the car, it’s interpreting the sensor and camera data reliably enough for the car to drive.

    • @muridsilat says:

      @@AlienXtream1 Yeah. I was thinking the overall integration might be what’s missing, but it seems like recognizing relationships between visual data and corresponding physics calculations would be right in AI’s wheelhouse. I’d certainly expect a ton of training to take place, just like any other AI model. Once that’s done, however, I would assume predicting the next few frames of a video of a ball dropping is pretty much the same as all the other predictive stuff AI does.

  • @Justin_Arut says:

    Advanced understanding and probably AGI will come with embodied experience.

  • @comfortablesofa says:

    So … when fixed, you’ll have an unbelievably powerful AI that fails the next round of basic tests that someone else will imagine. The real question is: how can we get beyond benchmark hacking and truly move beyond LLMs to causal modeling…

  • @CGDive says:

    This was an awesome video. Lots of fun!

  • @Topnichemarket says:

    It’s incredible to see how AI can generate stunningly realistic videos yet still struggle so badly with basic physics. The teapot growing a pedestal? The fire refusing to go out in water? Absolutely mind-blowing failures! It really makes you question what “intelligence” means for AI. The fact that more training doesn’t necessarily improve their physical understanding is shocking. I can’t wait to see how AI evolves from here. Thanks for sharing this amazing breakdown — I loved it!

    • @drhxa says:

      @@Topnichemarket The thing to keep in mind is that although these deep networks like GPT-4 or o1 are very good at sounding intelligent and can ace college math exams that appear very difficult to 98% of people, they are fundamentally very different kinds of intelligence. What is easy for us can be extremely hard for them, and vice versa!

      I suspect this trend will last for a while longer.

    • @michaelwoodby5261 says:

      The new Gemini Flash 2 Experimental, or whatever it’s called, may hold the solution. It allows the full model, which has an internal understanding of the world, to produce pictures, instead of shunting that job off to a different image program. The end result has the ability to futz with images as ideas instead of just pixels.
      I suspect once they are willing to pay for the processing to just fold an MMM into the video-producing program, they’ll solve… maybe not every problem with AI video, but the vast majority.

    • @eleklink8406 says:

      Humans learn an enormous amount of knowledge _before_ we know what learning is, e.g. what an object is, what a living thing is, how gravity and fluids work, and so on.
      These things are extremely hard for AI, because no human knows how they learned them, so humans are unable to “teach” them to “young” AIs.

    • @andrewmurphy8154 says:

      I wonder how much of our physical intuition is hard-wired in from millions of years of evolution. By contrast, AI systems have only been in existence for a few tens of years, with limited to no real-world physical feedback. I imagine things will change radically within the next couple of decades.

  • @VidSkipperAI says:

    Summary (with chapter timestamps): This video explores how well AI models understand the physics of the world by testing their ability to predict the outcomes of simple video scenarios, revealing that even the most advanced models often fail to grasp basic physical principles.
    0:00 🤖 Intro to AI Understanding
    • 🤖 AI techniques are being tested to see if they truly understand what they are looking at in videos.
    • 🧪 Google DeepMind is testing AIs by showing them the start of a video and asking them to predict what will happen in the next 5 seconds.
    • 🔬 The video will go through 4 experiments, each more challenging than the last, to test the AI’s understanding of physics.

    1:18 🧪 Experimenting with AI Prediction
    • ☕ Experiment 1: A rotating teapot. Some AIs failed to predict the rotation, while others had issues with object permanence.
    • 🎨 Experiment 2: Painting something. OpenAI’s Sora failed despite performing well in the previous experiment. VideoPoet performed reasonably well.
    • ⚖️ Experiment 3: Light versus heavy. A kettlebell and a piece of paper are dropped on a pillow. All AIs failed to accurately predict the outcome.

    3:16 🔥 More Challenging Experiments
    • 🔥 Experiment 4: A match on fire is put into water. Most AIs failed, with Sora initially getting it right but then making a mistake.
    • 📊 AIs were tested on solid dynamics, fluid dynamics, optics, thermodynamics, and magnetism.
    • 📉 Sora came in last, while the multiframe version of VideoPoet performed the best, but still below 30% accuracy.

    4:29 🌊 Results and Analysis
    • 🌊 AIs understood fluid mechanics better than solid dynamics, which is surprising.
    • 🧠 Visual realism and physical understanding do not necessarily go hand in hand.
    • 🤔 GPT-like AI assistants were tested with visual IQ tests, and their performance was surprisingly poor.

    5:45 📚 Why AIs Struggle with Physics
    • 📚 Physical understanding differs significantly from the tasks these systems are trained for.
    • 📈 Teaching algorithms more does not necessarily improve their scores on these tests.
    • 💡 AIs can do amazing things but are fundamentally different from human intelligence and have a long way to go.

    ** Generated using ✨ VidSkipper AI Chrome Plugin
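
The benchmark procedure the summary above describes (condition a model on the start of a clip, have it generate the next five seconds, then score the prediction against the real continuation per physics category) can be sketched as a simple evaluation loop. This is a hedged illustration, not the paper's actual code: `PhysicsClip`, `evaluate`, and the `similarity` callback are all hypothetical names, and the real benchmark defines its own scoring metrics.

```python
# Minimal sketch of a predict-the-continuation benchmark loop.
# All names here are hypothetical; the actual paper uses its own metrics.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PhysicsClip:
    conditioning_frames: list  # the shown beginning of the video
    ground_truth: list         # what actually happens next (the 5 s continuation)
    category: str              # e.g. "solid dynamics", "fluid dynamics", "optics"


def evaluate(model: Callable[[list], list],
             clips: List[PhysicsClip],
             similarity: Callable[[list, list], float]) -> Dict[str, float]:
    """Average similarity between predicted and true continuations, per category."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for clip in clips:
        predicted = model(clip.conditioning_frames)       # model predicts the next 5 s
        score = similarity(predicted, clip.ground_truth)  # 0.0 (wrong) .. 1.0 (perfect)
        totals[clip.category] = totals.get(clip.category, 0.0) + score
        counts[clip.category] = counts.get(clip.category, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}
```

Under this framing, a model can look visually flawless per frame and still average a low score, because only agreement with the ground-truth continuation counts; that is the gap between visual realism and physical understanding the video highlights.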

  • @LukeJAllen says:

    I’m very happy to see a channel not just hyping up AI, but showing a more realistic view including its shortcomings. I very much dislike how companies who make the models only focus on the positive points, making them sound almost perfect, when in reality that is completely not the case.

  • @nikhilsultania170 says:

    1. They should have included Veo 2 (their own product) in this; apparently it’s very good.
    2. This is a great observation, and a strong motivation for physics-based video generators.

    • @MrRandomPlays_1987 says:

      Same thought; it’s odd how they included almost every possible AI video model, yet not the physically based Veo 2.

  • @mattilindstrom says:

    So, we’ve come up with excellent visual comedy generators. I’d count that as a win.

  • @Sophistry0001 says:

    This is actually kind of wild. Humans have an intuitive understanding of the kind of physics we interact with daily; it’s so simple and intuitive that kids by age 3 or 4 understand most of it. These bots don’t have a clue, though.

  • @pritpatel8368 says:

    Where is Veo 2?
    Why do you keep forgetting to compare it in every video?
