Sora – Full Analysis (with new details)

Sora, the text-to-video model from OpenAI, is here. I go over the bonus details and demos released in the last few hours, and the technical paper. I’ll also give you a glimpse of what’s to come next and a host of implications. Even if you’ve seen every Sora video, I bet you won’t know all of this!

AI Insiders:

Sora:

ViT Transformers:
Captioning Innovation:
NaViT:
OpenAI Exclusives:

And far too many tweets to list here!
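
The ViT and NaViT links above point at the core technique in Sora’s technical report: a video is compressed and cut into “spacetime patches” that a transformer consumes as tokens, and NaViT-style packing lets clips of different resolutions and durations share one batch. Below is a minimal, hypothetical sketch of that patching idea in Python; the function name and patch sizes are illustrative, not OpenAI’s actual implementation.

```python
# Hypothetical sketch of the "spacetime patch" idea from the Sora technical
# report: cut a video tensor into fixed-size space-time blocks and flatten
# each block into one transformer token. Sizes are illustrative only.
import numpy as np

def video_to_spacetime_patches(video, pt=2, ph=16, pw=16):
    """video: (T, H, W, C) array -> (num_patches, pt*ph*pw*C) token matrix."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Split each axis into (num_blocks, block_size), then group block dims.
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)    # (Tb, Hb, Wb, pt, ph, pw, C)
    return v.reshape(-1, pt * ph * pw * C)  # one row per spacetime patch

# NaViT-style packing: every clip becomes a flat token sequence, so videos
# of different shapes and lengths can be batched together.
clip_a = np.random.rand(16, 256, 256, 3)   # 16 frames, square
clip_b = np.random.rand(8, 128, 224, 3)    # shorter, different aspect ratio
print([video_to_spacetime_patches(c).shape for c in (clip_a, clip_b)])
# [(2048, 1536), (448, 1536)]
```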

AI Insiders: Non-Hype, Free Newsletter:

Joe Lilli

  • @onlymediumsteak9005 says:

    January was slow, but February is already delivering more than I hoped for all of 2024 🤯

  • @petermcind says:

    This video is history. One of those things people will look back on in years and remember what the beginning felt like.

    • @Techtalk2030 says:

      the early years of the 4th industrial revolution

    • @stockholmpublishings2937 says:

      the beginning of the end when Skynet was activated

    • @seanmurphy6481 says:

      Will Smith eating spaghetti.

    • @archvaldor says:

      I think people are being a bit credulous here. When CGI first came out, it was breathtaking watching something like Terminator 2, which did it right, but very quickly CGI became difficult to watch, and movies are now turning back towards mixing old-school realism with CGI enhancement. This will be similar. AI videos will saturate YouTube, and there will be pushback as everyone notices how flawed the concept is.

    • @theterminaldave says:

      @@seanmurphy6481 I actually want an AI that will create the weirdly misinterpreted imagery that the Will Spaghetti AI did.

  • @lodepublishing says:

    OpenAI: “We can now create HD movies based on text prompts.”
    Everyone: “Can it contain text?”
    OpenAI: “No, we can’t do text yet.”

    • @Techtalk2030 says:

      It’ll all be fixed up by the end of this year, most likely. Video, text, audio.

    • @stockholmpublishings2937 says:

      but you can add text with separate AIs

    • @MrMnmn911 says:

      Give it 2 weeks. It will be capable of generating text.

    • @orterves says:

      I’m guessing having a refining process where the generated movie is run through specialised models – one to correct text, another to ensure finger consistency, another for eye colour, another for jiggle physics, etc. – could be used to fix up the raw output (see the sketch after this thread).

    • @RexelBartolome says:

      @@orterves Putting trust in an AI model (or several) to fix temporal and physical coherence is just way too much compute/scale to solve, and also a bit unreliable, considering my experience with similar models being used to fix Stable Diffusion’s hands and faces, for example. I predict the future of video generation is actually going to be 3D-based: perhaps an animated NeRF will be generated and you can just control the camera afterwards. That would ensure that everything is ‘accurate’, with object permanence etc., instead of going this route of solving everything frame by frame in one camera perspective.
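
For concreteness, here is a toy Python sketch of the multi-pass refinement pipeline @orterves describes above, assuming each corrector is an independent model applied to the previous pass’s output; every function name here is made up for illustration.

```python
# Toy sketch of a multi-pass refinement pipeline: the raw generated clip is
# handed through a chain of specialised correctors, each fixing one class
# of artefact. All correctors here are hypothetical placeholders.
from typing import Callable, List
import numpy as np

Video = np.ndarray  # (frames, height, width, channels)
Refiner = Callable[[Video], Video]

def refine(video: Video, passes: List[Refiner]) -> Video:
    """Feed each specialised corrector the previous pass's output."""
    for correct in passes:
        video = correct(video)
    return video

# Placeholder correctors; real ones would be trained models.
def fix_text(v: Video) -> Video: return v    # re-render legible signage
def fix_hands(v: Video) -> Video: return v   # enforce finger consistency
def fix_eyes(v: Video) -> Video: return v    # keep eye colour stable

raw = np.zeros((48, 270, 480, 3), dtype=np.uint8)  # stand-in raw output
final = refine(raw, [fix_text, fix_hands, fix_eyes])
```

@RexelBartolome’s objection maps onto this structure directly: each pass sees the clip in isolation, so nothing in the chain enforces temporal coherence between passes, which is what motivates their 3D-representation alternative.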

  • @iandanforth4313 says:

    Correction: both videos in their interpolation examples *are* generated by Sora.

    • @h-di4qd says:

      I thought so too. The fact that it’s open to correction and second-guessing is indicative of how advanced it is. Ohhh, I’m not looking forward to the era of generated political and global-conflict videos.

    • @sebastianjost says:

      You’re right. This is also indicated by the changing watermark in the bottom right corner.

    • @GS-tk1hk says:

      I was gonna say the same thing; it’s pretty clear if you look at the people moving around that it doesn’t quite look right. Still, the fact that you can barely tell apart a real video and an AI video is just bonkers. This really is the DALL-E 2 moment of text-to-video.

    • @dunar1005 says:

      you must have missed his own research papers @@JBroMCMXCI

    • @thanos879 says:

      @@JBroMCMXCI That’s totally false. This guy always reads the research papers and everything, even finding mistakes in them. And he has interviewed people in the industry. And I’m sure a lot more that I don’t know about. YouTubers make it look effortless.

  • @Theonlyrealcornpop says:

    OpenAI’s text-to-worldbuilding follow-up – combined with Apple’s quiet unveiling of Keyframer for animation – legitimately blew my mind. I just don’t even know how creatives as individual contributors are expected to integrate this into their workflows at the pace it’s moving – and that’s literally my entire job.

    • @JohnSmith762A11B says:

      It’s true. I’m overwhelmed with creative possibilities, but I know that if I wait just a bit longer I’ll have an even better set of tools ready to go. It’s all starting to feel a bit “singularity”, as it’s exhausting even to try to keep up.

    • @RosscoAW says:

      Weird, it’s almost like our socioeconomic system is even more woefully inadequate for dealing with the realities of a legitimately semi-automated, borderline post-scarcity world than it is at dealing with our normal, industrialized, globalized blue-collar world. I wonder if anybody has ever devised an alternative economic system predicated on adapting to and accommodating the changes that come with a highly industrialized economy and a workforce of intellectuals instead of 90%+ peasants. If they had, I bet it would have a boring name like “socialism,” or something. 😂

    • @JBroMCMXCI says:

      @@RosscoAW name one communist regime that didn’t genocide its intellectuals

    • @NihongoWakannai says:

      @@RosscoAW how do you see AI automating a bunch of highly creative white-collar jobs and come to the conclusion that peasantry is ending?

    • @basilmcdonnell9807 says:

      I spent 20 years building and maintaining workflow systems for animation. As of now the industry, all of it, is at a dead standstill. No one knows what to do with this stuff. How do you go from script to storyboard to animation to render now? We don’t even know the job titles any more. How do you propose a budget for a show when you have no idea how to make it?

  • @EthanHaluzaDelay says:

    Two AI Explained videos in two days! Your speed is incredible!

  • @RazorbackPT says:

    7:45 “The video you see was NOT generated by Sora” Are you sure? It really looks like it is. The stairs that lead nowhere, the choppy motion of the people.

  • @bryanp8042 says:

    The biggest implication I see with this is what it means for multimodal models. This is currently caption->video, but if the technology behind this were implemented in a multimodal GPT model (which I get the feeling is already happening behind the scenes), the implications are absurd. Having spatio-temporal abstractions of this fidelity existing in the same parameter space as text abstractions would have massive implications for the reasoning capability of GPT models. OpenAI themselves posed Sora as a world simulator in their technical report; imagine what future GPT models might be capable of if they can internally visualize the world to this degree.

    • @GrindThisGame says:

      They have eyes and ears. With Optimus they will have touch.

    • @urhot says:

      @@GrindThisGame are they partnered with Tesla?

    • @concernedindian144 says:

      Absolutely. Imagine you ask a question and GPT simulates the reality of the question and then starts answering; that would be AGI.

    • @gclip9883 says:

      @@GrindThisGame I’m sorry, but I’m still extremely sceptical about Optimus. Whereas OpenAI managed to actually back up their claims, Tesla has done nothing but make massive promises that it couldn’t deliver. They haven’t solved FSD and are in fact behind compared to other companies. The new robot looks cool but uses technology that has existed in robotics for decades. The only real innovation in their robot is its motors, and that is not exactly groundbreaking. I’m happy to be proven wrong, but until then I would not put Tesla anywhere near OpenAI in terms of innovation.

    • @wolfganggager5110 says:

      Yes, but in my opinion their technical approach is extremely resource-intensive and blurred. But maybe that will change soon with knowledge graphs.
      https://www.youtube.com/watch?v=nPG_jKrSpi0

  • @QuickM8tey says:

    I showed some of the Sora videos to friends, and they suspected some of it was AI-generated, knowing my passion for the topic, but none of them guessed the entire videos were. I cannot even imagine what Sora videos will look like one or two major upgrades later. I’m hoping there’s a breakthrough with math and LLMs for education by 2025. Great video, man.

  • @spaceadv6060 says:

    I’ve been following AI progress for about a year, but to be honest Sora blindsighted me. I thought I had a mental model of what exponential progress looks like, but I realize now that I have no idea. Thanks again for your high-quality videos! You are my go-to creator for AI content.

    • @aktchungrabanio6467 says:

      Thank you for being so candid

    • @ClayMann says:

      I can’t even describe what Sora is doing, compared to models a year ago, as an exponential leap. It’s not twice as good, or even 10x. It’s somewhere my mind can’t even measure: the style transformations, the morphing, the temporal accuracy and super-stable occlusion. It’s all just, well, magical is all I can come up with. If we get one more leap like this in another year, we’re in a completely new world that I do not think the public is ready for. Imagine real-time Sora. *slow-motion mind explosion*

    • @scaryjam8 says:

      Blindsided*

    • @theeternalnow6506 says:

      Agree on this one. This one genuinely made a leap forward that caught me off guard.

      Now think what we’re getting 6 to 12 months from now.

      Google with the 10 million tokens.

      It’s going to get wilder and wilder very rapidly.

    • @ShawnFumo says:

      Yeah, I felt like this at the end of last year, actually. After keeping track of image generation since Midjourney v3, I had some idea of the quality I thought we’d have at the start of this year, but we were already past it, probably by the third quarter. And now Sora is so far beyond that: it’s like v4 or v5 quality at a minute long instead of a single frame. And for all the good work Runway and Pika have done, their 4-second cap is still a huge limitation. But I’m sure they’ve looked closely at what OpenAI has said and the papers it referenced, and are working on their response already.

  • @JohnSmith762A11B says:

    Sora kinda ate my entire day today. I’m exhausted thinking about the possibilities, limitations, and implications. I’m going to watch a movie now, performed by human actors, filmed with real cameras. How quaint.

  • @MemesnShet says:

    You just dropped so many bombs about the implications of this project and OpenAI’s future plans, and much more, that it’s hard to keep track. Wow!

    Even though this channel is very fast-paced about what’s happening right now, I believe making short compilations by topic of all the incredible predictions, scoops and information gems that you keep finding, instead of having them scattered throughout the videos, would BLOW PEOPLE’S MINDS!

    I’m sure there are many people interested in AI who have no idea about all the plans and projects that OpenAI has been working on aside from LLMs.

    Your videos are amazing, with information gems across your whole catalogue of videos, and I believe showcasing those gems, especially the ones mainstream media hasn’t even caught up with yet, would blow this channel into the stratosphere and beyond, as it should.

  • @pareak says:

    Sora was literally the first time that I could not believe the AI progress I was seeing.

  • @k.a.8725 says:

    After watching Rabbit AI, Gemini 1.5 Pro and now Sora, I am convinced that AI will just continue to completely shatter our expectations for the next few years.

  • @Macieks300 says:

    The fact that that Berkeley robot was deployed zero-shot is crazy to me. It means that when AGI truly arrives, the hardware won’t be far behind and won’t actually be its biggest limitation.

  • @shadowdragon3521 says:

    12:33 I believe the social response people are supposed to give is along the lines of “omg how am I supposed to tell what footage is genuine and what is generated anymore?”. I don’t think he was talking about filmmakers’ jobs getting replaced.

    • @chrism1503 says:

      I think people talking about filmmakers’ jobs being replaced is absolutely part of the “social response”.

    • @neutra__l8525 says:

      @@chrism1503 Yes, it’s part of it, but as mentioned in the video, this was released as something of a warning about what is coming. Sure, a warning to everyone involved in film that their jobs may be in trouble is necessary, but it is also only letting them know that they face the same challenge in the near future that almost everyone else does: unemployment. However, not being able to differentiate fake footage from real footage (should that happen) becomes a massive problem for all of society, as it throws the legal system into utter chaos. If the legal system fails, society could quickly crumble. That is a much bigger problem than the film industry. And as we know, governments are slow and lumbering, while AI creates new problems before the government has even heard of the old ones. And the problems get worse every minute. These companies need to slow down massively, but they won’t. Who is going to slow down on developing the greatest and last technology that humans will ever create? It’s winner-takes-all and everyone knows it.

  • @GrindThisGame says:

    This is my favorite YT channel (and I’m subbed to 100s of channels). I watch every episode from start to end. Thank you for doing what you do.

  • @jarekstorm6331 says:

    The anomalies are like things that happen in dreams, bizarre and surreal yet you just accept them when dreaming. Still, these leaps are amazing to see.

  • @vladdata741 says:

    Great analysis. It’s crucial to see how Sora feeds into the accelerating feedback loops for AGI. Pair it with a vision model which selects accurate videos and discards the bad ones, and you have a synthetic generator of endless high-quality video data (see the sketch after this thread). Pair it with an LLM, and you have an agent that can imagine its action plan in a 3D environment (like we do) and simulate 3D scenarios to think about physics and other problems. Put all of these in a robot… well, you can see where this is going.

    • @skierpage says:

      I wonder if Sora had a fine-tuning step where they said: now that you’ve learned the features, textures and visual appearance of millions of items in video scenes, here are the best video clips, to learn what makes a great video. Similar to how some LLMs are fine-tuned by re-reading all of Wikipedia.
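
Here is a toy Python sketch of the generate-and-filter loop @vladdata741 proposes above (essentially rejection sampling for synthetic training data); the generator, critic and threshold are all stand-ins, not real APIs.

```python
# Toy sketch of a synthetic-data loop: a video generator produces candidate
# clips, a vision-model critic scores them, and only clips above a quality
# threshold are kept as training data. All components are stand-ins.
def synthesize_dataset(generator, critic, prompts, threshold=0.8):
    """Keep only generated clips the critic judges faithful and plausible."""
    kept = []
    for prompt in prompts:
        clip = generator(prompt)         # text -> candidate video
        score = critic(clip, prompt)     # 0..1 plausibility/faithfulness
        if score >= threshold:
            kept.append((prompt, clip))  # accepted as training data
    return kept

# Toy stand-ins so the sketch runs end to end.
demo = synthesize_dataset(
    generator=lambda p: f"<video for {p!r}>",
    critic=lambda clip, p: 0.9 if "cat" in p else 0.5,
    prompts=["a cat on a skateboard", "impossible stairs"],
)
print(demo)  # only the cat clip survives the filter
```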

  • @Madlintelf says:

    It’s one thing to look back with hindsight and realize you lived through a significant historical period; it’s quite another to realize it’s happening in real time and there is no end in sight! What a time to be alive. Thanks for documenting as much as you can.

    • @theeternalnow6506 says:

      Yeah. The future feels incredibly, uhhh, unpredictable in what it’s actually going to look like.

      I do know that we’re in a science-fiction movie and it’s going to get crazier and crazier very soon.

      Those reports of DeepMind synthesizing 2 million potential new materials, etc. All the new things that AI is currently creating will have their own ripple effects in industries, and it’s going to get really fucking wild pretty soon. The end of this video shows the robot walking, and I’ve been convinced for a while now that we’re going to have actual robots that we can talk to walking around in certain places within 5 years. Might even be 3 at the current rate.

      It’s nuts.
