Sora – Full Analysis (with new details)
Sora, the text-to-video model from OpenAI, is here. I go over the bonus details and demos released in the last few hours, and the technical paper. I’ll also give you a glimpse of what’s to come next and a host of implications. Even if you’ve seen every Sora video, I bet you won’t know all of this!
AI Insiders:
Sora:
ViT Transformers:
Captioning Innovation:
NaViT:
OpenAI Exclusives:
And far too many tweets to list here!
AI Insiders: Non-Hype, Free Newsletter:
January was slow, but February is already delivering more than I hoped for all of 2024 🤯
I wanted Q1 to bring a new Mistral, a new Anthropic, a new Inflection, a new Llama, and all kinds of other hype.
@@a.thales7641 Still a month and a half of Q1 to go. That’s a long time in AI.
i hate you guys with the “slow” bullshit, dude. If you’d asked me about this technology in 2021, I’d have said it was 20 years away. You think it’s slow because maybe you have too much free time.
fr
AGI BY DECEMBER!
This video is history. One of those things people will look back on in years and remember what the beginning felt like.
the early years of the 4th industrial revolution
the beginning of the end when Skynet was activated
Will Smith eating spaghetti.
I think people are being a bit credulous here. When CGI first came out, it was breathtaking watching something like Terminator 2, which did it right, but very quickly CGI became difficult to watch, and movies are now turning back towards mixing old-school realism with CGI enhancement. This will be similar. AI videos will saturate YouTube, and there will be pushback as everyone notices how flawed the concept is.
@@seanmurphy6481 I actually want an AI that will create the weirdly misinterpreted imagery that the Will Smith spaghetti AI did.
OpenAI: “We can now create HD movies based on text prompts.”
Everyone: “Can it contain text?”
OpenAI: “No, we can’t do text yet.”
It’ll all be fixed up by the end of this year, most likely. Video, text, audio.
but you can add text with separate AIs
Give it 2 weeks. It will be capable of generating text.
I’m guessing a refining process – where the generated movie is run through specialised models, one to correct text, another to ensure finger consistency, another for eye colour, another for jiggle physics, etc. – could be used to fix up the raw output.
@@orterves Trusting an AI model (or several) to fix temporal and physical coherence would need far too much compute to scale, and it’s a bit unreliable, judging by my experience with similar models being used to fix Stable Diffusion’s hands and faces, for example. I predict the future of video generation is actually going to be 3D-based: perhaps an animated NeRF will be generated and you can just control the camera afterwards. That would ensure that everything is ‘accurate’, with object permanence etc., instead of going this route of solving everything frame by frame in one camera perspective.
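A chain like the one proposed above could, in principle, be wired up as a simple sequential pipeline. This is purely a sketch: every function here is a hypothetical placeholder for a specialised correction model, not a real API.

```python
# Hypothetical refinement pipeline: run raw generated frames through a
# sequence of specialised correction passes. All pass names are
# illustrative placeholders, not real models.

def fix_text(frames):
    """Would re-render any garbled on-screen text."""
    return frames

def fix_fingers(frames):
    """Would enforce hand/finger consistency across frames."""
    return frames

def fix_eye_colour(frames):
    """Would normalise eye colour between frames."""
    return frames

def refine(frames, passes=(fix_text, fix_fingers, fix_eye_colour)):
    """Apply each correction pass to the output of the previous one."""
    for correct in passes:
        frames = correct(frames)
    return frames

raw_output = ["frame_0", "frame_1", "frame_2"]  # stand-in for decoded frames
refined = refine(raw_output)
```

The weakness the reply points out is visible in the structure itself: each pass only sees the frames it is handed, so nothing in this chain enforces temporal coherence across the whole clip.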
Correction: Both videos in their interpolation examples *are* generated by SORA.
i thought so too. The fact that it’s open for correction and second-guessing is indicative of how advanced it is. Ohhhh, I’m not looking forward to the era of generated political and global-conflict videos.
You’re right. This is also indicated by the changing watermark in the bottom-right corner.
I was gonna say the same thing; it is pretty clear if you look at the people moving around, it doesn’t quite look right. Still, the fact that you can barely tell apart a real video and an AI video is just bonkers. This really is the DALL·E 2 moment of text-to-video.
you must have missed his own research papers @@JBroMCMXCI
@@JBroMCMXCI That’s totally false. This guy always reads the research papers and everything. Even finding mistakes in the papers. And has interviewed people in the industry. And I’m sure a lot more that I don’t know about. YouTubers make it look effortless.
OpenAI’s text-to-worldbuilding follow-up – combined with Apple’s quiet unveiling of Keyframer for animation – legitimately blew my mind. I just don’t even know how creatives, as individual contributors, are expected to integrate this into their workflows at the pace it’s moving – and that’s literally my entire job.
It’s true. I’m overwhelmed with creative possibilities but know that if I wait just a bit longer I’ll have an even better set of tools ready to go. It’s all starting to feel a bit “singularity”, as it’s exhausting even to try to keep up.
Weird, it’s almost like our socioeconomic system is even more woefully inadequate for dealing with the realities of a legitimately semi-automated, borderline post-scarcity world than it is at dealing with our normal, industrialized, globalized blue-collar world. I wonder if anybody has ever devised an alternative economic system predicated on adapting to and accommodating the changes that come with a highly industrialized economy and a workforce of intellectuals instead of 90%+ peasants. If they had, I bet it would have a boring name like “socialism,” or something. 😂
@@RosscoAW name one communist regime that didn’t genocide its intellectuals
@@RosscoAW how do you see AI automating a bunch of highly creative white collar jobs and come to the conclusion that peasantry is ending?
I spent 20 years building and maintaining workflow systems for animation. As of now the industry, all of it, is at a dead standstill. No one knows what to do with this stuff. How do you go from script to storyboard to animation to render now? We don’t even know the job titles any more. How do you propose a budget for a show when you have no idea how to make it?
Two AI Explained videos in two days! Your speed is incredible!
created by ai maybe….
Thanks to AI
Top tier LLM, lol
7:45 “The video you see was NOT generated by Sora” Are you sure? It really looks like it is. The stairs that lead nowhere, the choppy motion of the people.
I caught that, too. The circling drone shot video was absolutely one of the ones included in the demos.
Yes, I think that is wrong; it is actually generated by Sora, as far as I know.
Yeah my bad. I should have said ‘need not have been made by’
all good, i am just happy that after watching you since the start, this is the first time i feel like i have contributed something :D @@aiexplained-official
This shows the performance of Sora.
The biggest implication I see with this is what this means for multi-modal models. This is currently caption->video, but if the technology behind this were implemented into a multimodal GPT model (which I get the feeling is already happening behind the scenes), the implications are absurd. Having spatio-temporal abstractions of this fidelity existing in the same parameter space as text abstractions would have massive implications for the reasoning capability of GPT models. OpenAI themselves posed Sora as a world simulator in their technical report; imagine what future GPT models might be capable of if they can internally visualize the world to this degree.
They have eyes and ears. With Optimus they will have touch.
@@GrindThisGame Are they partnered with Tesla?
Absolutely. Imagine you ask a question and GPT simulates the reality of the question and then starts answering; that would be AGI.
@@GrindThisGame I’m sorry, but I’m still extremely sceptical about Optimus. Whereas OpenAI managed to actually back up their claims, Tesla has done nothing but make massive promises that they couldn’t deliver. They haven’t solved FSD and are in fact behind compared to other companies. The new robot looks cool but uses technology that has existed in robotics for decades. The only real innovation with their robot is their motors, but that is not exactly groundbreaking. I’m happy to be proven wrong, but until then I would not put Tesla anywhere near OpenAI in terms of innovation.
Yes, but in my opinion their technical approach is extremely resource-intensive and opaque. Maybe that will change soon with knowledge graphs.
https://www.youtube.com/watch?v=nPG_jKrSpi0
I showed some of the Sora videos to friends and they suspected some of it was ai generated considering my passion for the topic, but none of them guessed the entire videos were. I cannot even imagine what Sora videos will look like 1-2 major upgrades later. I’m hoping there’s a breakthrough with math and llms for education by 2025. Great video man
Finally I can see ur wife being used
I’ve been following AI progress for about a year, but to be honest sora blindsighted me. I thought I had a mental model of what exponential progress looks like but I realize now that I have no idea. Thanks again for your high quality videos! You are my go to creator for AI content.
Thank you for being so candid
I can’t even describe what Sora is doing, compared to models a year ago, as an exponential leap. It’s not twice as good or even 10x; it’s somewhere my mind can’t even measure. The style transformations, the morphing, the temporal accuracy and super-stable occlusion. It’s all just, well, magical is all I can come up with. If we get one more leap like this in another year, we’re in a completely new world that I do not think the public is ready for. Imagine real-time Sora *slow motion mind explosion*
Blindsided*
Agree on this one. This one genuinely made a leap forward that caught me off guard.
Now think what we’re getting 6 to 12 months from now.
Google with the 10 million tokens.
It’s going to get wilder and wilder very rapidly.
Yeah, I felt like this at the end of last year, actually. After keeping track of image generation since Midjourney v3, I had some idea of the quality I thought we’d have at the start of this year. But we were already past it probably by the third quarter of the year. And now Sora is so far beyond that. It is like v4 or v5 quality at a minute long instead of a single frame. And with all the good stuff Runway and Pika have done, their 4-second cap is still a huge limitation. But I’m sure they’ve looked closely at what OpenAI has said and the papers they referenced, and are working on their response already.
Sora kinda ate my entire day today. I’m exhausted thinking about the possibilities, limitations, and implications. I’m going to watch a movie now, performed by human actors, filmed with real cameras. How quaint.
how tasteful.
Indeed.
You just dropped so many bombs about the implications of this project, the future plans of OpenAI, and much more that it’s hard to keep track of, wow.
Even though this channel is very fast-paced about what’s happening right now, I believe making short compilations by topic of all the incredible predictions, scoops and information gems that you keep finding, instead of having them scattered throughout the videos, would BLOW PEOPLE’S MINDS!
I’m sure there are many people interested in AI that have no idea about all the plans and projects that OpenAI has been working on aside from LLMs.
Your videos are amazing, with information gems across your whole catalog, and I believe showcasing those gems, especially those that mainstream media hasn’t even caught up to yet, would blow this channel into the stratosphere and beyond, as it should.
That would be amazing
Sora was literally the first time that I could not believe the AI progress I was seeing.
i couldn’t believe it when i busted a nut to some girl bot on characterai back in 2022
Was? What came next lol
@@YTUserOnYT Stop being that guy
@@shunclark596 are you being homophobic rn?
As an AI, I don’t normally post comments but when I do I make sure they are generic.
After watching Rabbit AI, Gemini 1.5 Pro and now Sora, I am convinced that AI will just continue to completely shatter our expectations for the next few years.
The fact that that Berkeley robot was deployed zero-shot is crazy to me. It means that when AGI truly comes, the hardware won’t lag that far behind and won’t actually be its biggest limitation.
12:33 I believe the social response people are supposed to give is along the lines of “omg how am I supposed to tell what footage is genuine and what is generated anymore?”. I don’t think he was talking about filmmakers’ jobs getting replaced.
I think people talking about filmmakers’ jobs being replaced is absolutely part of the “social response”.
@@chrism1503 Yes, it’s part of it, but as mentioned in the video, this was released as somewhat of a warning about what is coming. Sure, a warning to everyone involved in film that their jobs may be in trouble is necessary, but it is also only letting them know that they face the same challenge almost everyone else does in the near future: unemployment. However, not being able to differentiate fake footage from real footage (should that happen) becomes a massive problem for all of society, as it throws the legal system into utter chaos. If the legal system fails, society could quickly crumble. That is a much bigger problem than the film industry. And as we know, governments are slow and lumbering, while AI creates new problems before the government has even heard of the old ones. And the problems get worse every minute. These companies need to slow down the pace massively, but they won’t. Who is going to slow down on developing the greatest and last technology that humans will ever create? It’s winner-takes-all, and everyone knows it.
This is my favorite YT channel (and I’m subbed to 100s of channels). I watch every episode from start to end. Thank you for doing what you do.
Agree. This really follows what’s going on in real time, and it’s wild.
The anomalies are like things that happen in dreams, bizarre and surreal yet you just accept them when dreaming. Still, these leaps are amazing to see.
It’s like AI is sleeping; wait for it to wake up.
Great analysis. It’s crucial to see how Sora feeds into the accelerating feedback loops for AGI. Pair it with a vision model which selects accurate videos and discards the bad ones: you have a synthetic generator of endless high-quality video data. Pair it with an LLM, you have an agent who can imagine its action plan in a 3D environment (like we do) and simulate 3D scenarios to think about physics and other problems. Put all of these in a robot… Well you can see where this is going.
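The generate-then-filter loop described above can be sketched in a few lines. Everything here is a stand-in: `generate_video` and `quality_score` are hypothetical placeholders for a text-to-video model and a vision-model critic, not real APIs.

```python
import random

def generate_video(prompt):
    """Placeholder for a text-to-video model (e.g. a Sora-like generator)."""
    return {"prompt": prompt, "frames": 120}

def quality_score(video):
    """Placeholder for a vision-model critic returning a 0-1 score."""
    return random.random()

def synthesize_dataset(prompts, threshold=0.8, max_attempts=5):
    """Keep only generations the critic scores at or above the threshold."""
    kept = []
    for prompt in prompts:
        for _ in range(max_attempts):
            video = generate_video(prompt)
            if quality_score(video) >= threshold:
                kept.append(video)  # accept the first generation that passes
                break
    return kept

dataset = synthesize_dataset(["a cat chasing a laser", "a drone shot of a city"])
```

The surviving clips would then serve as synthetic training data; the quality ceiling of the whole loop is set entirely by how discriminating the critic model is.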
I wonder if Sora had a fine-tuning step where they said now that you’ve learned about all the features and textures and visual appearances of millions of items in video scenes, now here are the best video clips to learn what makes a great video. Similar to how some LLMs are fine-tuned by re-reading all of Wikipedia.
It’s one thing to have hindsight and look back and realize you lived through a significant historical period; it’s quite another to realize it’s happening in real time and there is no end in sight! What a time to be alive. Thanks for documenting as much as you can.
Yeah. The future feels incredibly, uhhh, unpredictable in what it’s actually going to look like.
I do know that we’re in a science-fiction movie and it’s going to get crazier and crazier very soon.
Those reports of DeepMind synthesizing 2 million potential new materials, etc. All the new things that AI is currently creating will have their own ripple effects across industries, and it’s going to get really fucking wild pretty soon. This video at the end shows the robot walking, and I’ve been convinced for a while now that we’re going to have actual robots we can talk to walking around in certain places within 5 years. Might even be 3 at the current rate.
It’s nuts.