o1 – What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know
o1 is different, and even sceptics are calling it a 'large reasoning model'. But why is it so different, and what does that say about the future? These are models rewarded for the correctness of their answers, not just for harmlessness or for predicting the next word. But does even o1 lack spatial reasoning? How did the White House react yesterday? And did Ilya Sutskever warn of o1 getting … 'creative'?
AI Insiders – Now $9:
Chapters:
00:00 – Intro
01:04 – How o1 Works (The 3rd Paradigm)
03:10 – We Don’t Need Human Examples (OpenAI)
03:54 – How o1 Works (Temp 1 Graded)
06:28 – Is This Reasoning?
08:48 – Personal Announcement
11:27 – Hidden, Serial Thoughts?
13:11 – Memorized Reasoning?
15:40 – 10 Facts
o1, Learning to Reason –
Species Tweet:
Noam Brown Video:
2021 Paper on Verifiers:
Let’s Verify Step By Step:
DeepMind Not Far Behind:
Chain of Thought for Serial Problems:
Q* Clues (yes, I am proud of that one):
Let’s Think Pixel by Pixel:
Or Dot by Dot (the general power of CoT):
RL by Karpathy:
Not Prompt Engineering:
ARC-AGI Analysis:
Reality Foundation models:
Memorising Reasoning vs CoT Q-Table:
When You Know the Right CoT:
Original Information Report:
Stockfish (chess):
Will AGI fall like chess:
Fei Fei Start-up $1B:
White House Report:
Simple-Bench:
o1 Fails:
My New Coursera Course! The 8 Most Controversial Terms in AI:
Non-hype Newsletter:
GenAI Hourly Consulting:
I use Descript to edit my videos:
Many people expense AI Insiders for work. Feel free to use this template:
AI Insiders – Now $9:
Here for the singularity
babe wake up, the best AI channel posted again
Yeah, you’re definitely single.
Hey babe, I’m awake and was already watching no worries
Yes honey
First off, I’m not your babe. I am a bearded man in my 50s who has just worked out at Planet Fitness.
Hey diddle diddle.
Some people are honestly just impatient. They think that because we haven’t got AGI now, it’s all over and we aren’t ever gonna get it. Have patience, folks, because I get the feeling that the next year will really wow you all.
Cope. It is here or it isn’t. Choose one or be a wordcel.
Make it ten years… maybe
y’all forgot that nothing ever happens
said that last year
It’s because, while our models are getting better and better on an almost monthly basis, there are still very few commercial applications for them. Once we solve the hallucination issue, I think that’s when things will explode. Researching or just searching for any piece of information will be SO easy and companies will do so many things with these models.
The concept of ‘dreaming up’ some really interesting and creative solution (high temp) followed by harshly cutting those down to what is reasonable and realistic and plausible (low temp) feels very left brain / right brain in humans. Interesting.
It would be cool to have two different models working together to coordinate…
Wait is that kinda how the brain reasons? Do you have any references? That’s so cool
Every day, LLMs get more and more similar to brains (very inefficient ones)
See “Thinking, Fast and Slow” by Kahneman. Intuition vs reasoning.
Underrated comment. There’s a reason animal brains (not just humans) have this collaborative bifurcation.
@@mathufnn The comment below may have tried to answer you; there is also a video by Veritasium on thinking fast and slow.
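On the ‘dream up at high temperature, then prune harshly’ idea: the simplest concrete analogue is best-of-N sampling with a grader, roughly the ‘temperature 1, graded’ setup the video describes. A minimal sketch in Python; the generate and grade functions are toy stand-ins (the real o1 generator and verifier are not public):

import random

def generate(question, temperature=1.0):
    # "Dreaming": propose a wide spread of candidate answers at high temperature.
    return random.randint(0, 100)

def grade(question, candidate):
    # "Pruning": a verifier scores each candidate. Here the check is just the
    # known answer; a real verifier would score the reasoning steps instead
    # (cf. "Let's Verify Step by Step").
    return 1.0 if candidate == 7 * 6 else 0.0

def best_of_n(question, n=256):
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda c: grade(question, c))

print(best_of_n("What is 7 * 6?"))  # usually 42, whenever any of the 256 samples hits it

The split between a broad, noisy proposer and a strict selector is what the left-brain/right-brain analogy is gesturing at.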
The key to getting AI beyond being a very good librarian that pulls existing information, to actually coming up with novel ideas and innovating, is going to be enabling agentic behavior and autonomous experimentation.
FYI, Claude gets the block stacking problem right, too.
Let’s think through this step-by-step:
1. Initial state:
C
A
–
B (on the table)
2. We need to get to this state:
A
B
C
3. The key constraint is that we cannot move block C.
4. Given these conditions, it’s actually impossible to achieve the desired stack without moving block C. Here’s why:
– Block C must remain at the bottom of the stack since we can’t move it.
– Block B needs to be placed on top of C.
– Block A needs to end up on top of B.
5. However, block A is currently under C, and we’re not allowed to move C to free A.
Therefore, it’s not possible to arrange the blocks in the requested order (A on B on C) without moving block C. The constraint of not moving C prevents any solution to this problem.
Man, Claude seems to have really good natural reasoning. Makes me curious about how it will fare with chain of thought learning
Claude has a stronger “intuition”, so to say. @@mathufnn
Yeah, I just got the right answer too with 3.5 Sonnet but it took a couple tries of telling it it was wrong
@@VeganCheeseburger For me it got it right first time without any help.
I’m pretty sure Subbarao Kambhampati, Noam Brown, Philip and all the LLMs are wrong. There is indeed a way to rearrange the blocks if you don’t just follow the reasoning you memorized and instead use your human superpower of lateral thinking: fix block C in place with an external tool attached to the floor/ceiling/wall/whatever, shorten all the legs of the table by the height of block B, and fit the latter back into the stack. Easy-peasy!
Note that the figure at 6:10 has a log scale on the x-axis. This is logarithmic growth in pass@1 accuracy, not linear growth.
Exactly was about to comment
So it’s diminishing returns, huh?
This is to be expected: gains in accuracy become exponentially harder at higher accuracy, because for each item there is some chance of the model not knowing the correct answer. This corresponds to a conjunction of many events, each with probability < 1, whose product tends toward zero. Accuracy, hence, does not measure capabilities very well in the lower and upper ranges. I think you would actually have to transform the y-axis by an inverse sigmoid to get something more intuitive. However, accuracy can of course still be a misleading measure due to things like data leakage, overfitting, ceiling effects, lack of generalisation to other datasets, and so on.
@@panzerofthelake4460 Not necessarily. The spacing of the points along the log-scale compute axis is not uniform as compute increases, so the next plot point could end up significantly higher than the previous one given more test-time compute. We won’t know until it’s run.
I removed my comment making the same point; it’s a good point and you are right. OpenAI’s sentence below the graph is correct, but misleading.
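To make the inverse-sigmoid point concrete: if each step of a solution is independently correct with probability p, a k-step problem is answered correctly with probability p^k, so raw pass@1 accuracy compresses progress near 0% and 100%. A minimal sketch in Python, assuming independent steps purely for illustration (the numbers are not from the video or the paper):

import math

def logit(acc: float) -> float:
    # Inverse sigmoid: stretches out the ends of the [0, 1] accuracy scale.
    return math.log(acc / (1.0 - acc))

def pass_at_1(p: float, k: int) -> float:
    # Chance of getting a k-step problem right if each step is right with probability p.
    return p ** k

for p in (0.90, 0.95, 0.99):
    acc = pass_at_1(p, k=10)
    print(f"per-step {p:.2f} -> 10-step accuracy {acc:.3f}, logit {logit(acc):+.2f}")

Equal-looking improvements in per-step reliability (0.90 to 0.95 to 0.99) show up as very uneven jumps in raw accuracy, which is why a log-compute x-axis with a plain accuracy y-axis is hard to read at the extremes.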
Thank you for reducing the price! I’m one of those people that has been holding out for this to happen. I even browsed through your patreon a couple days ago and wished I could sign up
Same, just signed up!
@@autocatalyst Thank you both so much
O1 helped me code something in my Unreal Engine project that was previously impossible. Now, I can create anything and run it in my game—I’m blown away.
huh, my experience with o1 is that it just takes longer to get the wrong answer.
What was your workflow to do this?
@@jonschlinkert I’ve had a great experience with o1. Do you mean in general, or specific to programming?
@@jonschlinkert It depends on what you’re asking it. It’s better for some coding and reasoning problems. However, it’s not as good in some other aspects
It’s the preview, the real deal will come later this year.
Imagine putting on a blindfold and listening to someone describe their kitchen to you in great detail. With enough detail you could probably figure out how to navigate their kitchen with your blindfold still on … but you’d probably also make some weird mistakes as well — this is the state of AI.
“Hallucinating” in AI is like having been told during training that “kids drawings are often posted to the fridge” and so you reach up to look for a drawing there but don’t find one. Your training told you to expect it, but you experience a mismatch between training and reality.
In o1-preview I asked it to decode gibberish… it spent 127 seconds trying to do it. In its output it didn’t say “I don’t know” flat out, but it didn’t give a wrong answer either. Instead, it explained the information it had used.
I used Yann LeCun’s current LLM bench question – walking along the sphere. It failed.
@@CellarDoorCS Why are people so desperate to beat all the benchmarks? While some are still unbeaten, we’re still not in the Terminator timeline 😀 You should be happy.
@@r-saint No Terminators until we get integrated spatial reasoning. Although $1 billion raised in 4 months for that startup gives me a feeling it’ll come sooner rather than later.
No, it explained the information it likely could have used. It has no idea what it actually did. It’s all just mimicry.
@@CellarDoorCS I have so far not been happy with the fact that pretty much everyone using that test seems to be completely unaware of how straight lines work on spheres. That said, I also haven’t found an AI model that gives a good answer. But the problem isn’t limited to AI models; most humans can’t answer it either, or more often answer it confidently but incorrectly. So I wouldn’t use a model’s failure to answer it in an attempt to demonstrate that it’s any lesser than humans.
To make this as quick as I can (which, it’s me, so that’s probably not gonna be very quick):
A straight line on a sphere is not a line of latitude. People often read the question as meaning you walk some way south around the earth’s curvature, and then, because you turn to face east and start walking in a straight line, they assume you will continue to travel exactly east on a line of latitude, a constant distance from the pole. That is not the case. To see why, imagine walking only one meter away from the north pole in a physical space (literally imagine yourself standing at the north pole and walking away 1 meter). Then turn 90 degrees left and walk in a straight line. The assumption made in most people’s attempts implies that you will now walk in a 1 meter radius circle (a line of latitude at very nearly 90 degrees north) around the north pole. That is clearly not a straight line as the question requires, as you would be actively turning the entire time. Unless you walk all the way to the equator, you cannot think of the problem like this.
Straight lines on spheres are called great circles. On earth, they loop all the way around the planet in a circle and have the core of the earth at their centre. All of them have an approximate length of 40,000 km and a radius roughly equal to the standard radius of the earth (it varies slightly depending on your starting distance from the equator and direction).
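To put numbers on that: walking “straight” on a sphere means rotating about an axis through the earth’s centre, i.e. following a great circle, not a latitude circle. A small Python sketch, treating the earth as a perfect sphere (the 6,371 km radius, the 1 m offset from the pole and the distances walked are just illustrative):

import numpy as np

R = 6_371_000.0   # mean earth radius in metres (perfect-sphere assumption)
offset = 1.0      # start 1 metre from the north pole

colat = offset / R                                  # angle from the pole, in radians
start = np.array([np.sin(colat), 0.0, np.cos(colat)]) * R
east = np.array([0.0, 1.0, 0.0])                    # local "east" at that start point

# Walking straight = rotating about the axis perpendicular to start and east.
axis = np.cross(start, east)
axis /= np.linalg.norm(axis)

def walk(distance_m):
    angle = distance_m / R                          # arc length -> rotation angle
    # Rodrigues' rotation formula
    return (start * np.cos(angle)
            + np.cross(axis, start) * np.sin(angle)
            + axis * np.dot(axis, start) * (1 - np.cos(angle)))

north_pole = np.array([0.0, 0.0, R])
for d in (1_000.0, np.pi * R / 2, np.pi * R):       # 1 km, a quarter circle, half the circle
    p = walk(d)
    dist = R * np.arccos(np.clip(np.dot(p, north_pole) / R**2, -1.0, 1.0))
    print(f"after {d/1000:8.0f} km walked: {dist/1000:8.0f} km from the north pole")

A latitude circle 1 m from the pole is only about 6.28 m around; the straight path is a ~40,030 km great circle that reaches the equator after a quarter turn and passes within about a metre of the south pole, which is why the usual “walk south, then east along a latitude line” reading of the question breaks down.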
I used o1-preview on graduate physics and math problems and it was stunning. So stunning. I have chills. It spent 143 seconds.
How long do you usually take?
@@theWACKIIRAQI 4 hours per problem
We’re doomed
If only it could answer problems with no current solutions
@@Walczykand They say LLMs are not calculators, so it will not be possible for an LLM to calculate accurately 😂
You are the best, most informative AI channel by far. I remember your Q* video that predicted many of the developments that took place with o1, before anyone else! You don’t merely consolidate AI news like so many other channels; you thoroughly investigate the latest research, all so the general public can understand what is going on at the frontier of AI in simple terms.
The price you are charging is more than fair, and we appreciate your content very much. Thank you.
When a channel is so much in a league of its own that the only guest who can match the quality is your past self.
AI Explained training the next AI Explained model
True.
Bro should be opening his own lab
Gary Marcus thinks this is just clever brute force, “pushing the limits of dead end approach”, but does it matter when this approach gets better over time and hasn’t shown signs of stopping?
Like Altman said, if you strap a rocket to a stochastic parrot you can still reach the moon.
@@ThreeChe who knows, maybe this dumpster rocket will be creative enough to generate us a more efficient approach to AGI? totally possible 🤷🏻
@@timwang4659 Exactly! lmao
@@timwang4659 I can see it being used to do nonstop AI research autonomously; definitely possible it comes up with novel algos, architectures, etc. that are more efficient. But I also don’t see why this current approach can’t produce an AGI model (likely a mixture of experts). Performance improvements with compute scaling are not slowing down. The money is there to continue scaling. Maybe energy constraints will get in the way before it happens, maybe not.
I mean, human intelligence isn’t merely brute force but I think it’s clear there is an aspect of brute force to it. The advancement of many fields seems to be contingent on trialing different things until something sticks.
AI grifters and clickbaiters: watch this video, study it, and improve your game. This is what AI content on YouTube should look like.
@@faizanrana2998 Please start making sense. Are you a bot, or just dumb?
They don’t care about AI as much as they care about ad revenue. They’ll produce clickbaity slop just to keep the gravy train rolling.
If Ilya ever gets tired of making AI, he could get a role in movies giving super subtle but incredibly ominous and convincing predictions about various future scenarios. “Unexpected creativity that makes the antics of Sydney look very modest”. 😮
Yeah, and his manner of speech is very distinctive – perfect for ominous statements 🙂
True. He would be the best “mad/genius AI scientist.”
Saying “I don’t know” is a challenge for most humans, too.
Now I have the very strong feeling that “RL is creative” combined with “effective CoT is not English anymore” is actually what Ilya saw
As one of the people who complained about the price for AI Insiders previously being too high, I now stand by my word and have subbed for a year!
Thanks so much Mario!!