Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out
OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today…
Chapters:
00:00 – Introduction
01:13 – Pro Cost and OpenAI Operator
04:00 – Agent Benchmarks Being Targeted
07:48 – Fast Take-off, Altman
08:48 – Altman flip-flops
10:02 – Deepseek R1 First Reaction
Altman ‘100x expectations out of control’:
OpenAI Operator Table:
WebVoyager:
OSWorld:
Axios Exclusive 1 (Super-Agent):
Axios Exclusive 2:
Deepseek R1 Numbers:
Does 1.5B outperform 3.5 Sonnet on Math?:
Deepseek R1 (deepseek-reasoner) Pricing:
Altman Fast Takeoff:
OpenAI Economic Blueprint:
Target is Long-horizon Tasks:
Support Regulations:
Donation:
Amodei on Regulations by 2025:
‘Feel the AGI’:
GPT-5 and o-series merger:
o1 Thinks in Chinese:
R1 is 100% free and unlimited on their chat platform, and the API is dirt cheap too. Insane. It can even correctly answer this prompt, which o1 can’t:
“Write a haiku where the second letter of each word when put together spells ‘SIMPLE’”
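For anyone who wants to check a model's answer to that prompt themselves, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes punctuation is ignored and that a one-letter word counts as a failure (the prompt doesn't say either way). It only checks the letter constraint, not the haiku syllable count.

```python
import re


def second_letters(text: str) -> str:
    """Return the second letter of each word, joined and upper-cased.

    Assumption: punctuation is stripped before counting letters, and a
    word with fewer than two letters raises ValueError, because it has
    no second letter to contribute.
    """
    letters = []
    for token in text.split():
        word = re.sub(r"[^A-Za-z]", "", token)  # keep letters only
        if not word:
            continue  # token was pure punctuation, e.g. a lone dash
        if len(word) < 2:
            raise ValueError(f"word {token!r} has no second letter")
        letters.append(word[1])
    return "".join(letters).upper()


def satisfies_constraint(text: str, target: str = "SIMPLE") -> bool:
    """True if the second letters of the words spell the target."""
    try:
        return second_letters(text) == target.upper()
    except ValueError:
        return False


# Hand-made word sequence (not a model output, and not claimed to be a haiku).
candidate = "As dim amber spills slowly seaward"
print(second_letters(candidate), satisfies_constraint(candidate))
```

Paste any model's haiku into `candidate` to see which letters it actually produced.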
Regarding coding, Aider in Architect mode with the deepseek-reasoner and deepseek-chat pair will be insane.
Is it good at coding?
@warmesuppe DeepSeek V3 is very good at coding, on par with the latest Claude 3.5 Sonnet. I haven’t tested DeepSeek R1 at coding yet.
Good timing. Distraction from inauguration
What a shitshow man… not this video, this video is a masterpiece as usual
It’ll probably be fine, maybe even really good.
Good times
Democrat with PTSD detected. Maybe ChatGPT can help you with some counselling?
Excited to hear about the coding project you mentioned! Interesting that, for the moment, Sonnet still beats o1 in professional use cases.
Maybe I’m not driving it hard enough, but seeing little difference 4o v o1 in real-life use cases.
As a Help Desk Technician, there goes my job. I’ve been out of work at this level of the industry since November 2024.
I did a similar job for a few months back in 2019. And even then, way before ChatGPT became a thing, I already had a feeling that much of the work I did could eventually be automated away.
Fast Take-off and we have virtually stopped talking about alignment.
Thank God. An AI “aligned” with a species constantly killing each other at scale and rapidly, knowingly destroying its own environment is terrifying.
@jyjjy7 Indeed. All alignment “options” for powerful AI are only bad outcomes.
I’m glad about that. I hope OpenAI finally stops wasting money on alignment research and focuses on increasing capabilities.
I really don’t trust benchmarks at all now. I’m an AI coder, which basically means that I knew nothing when I started to code. In my experience, Claude 3.5 Sonnet has been the best tool by far.
We really need a diverse range of benchmarks for models as we accelerate because everyone has different needs. A coder who knows some really basic stuff may find other models more helpful.
It’s really a tricky question/benchmark. Would you hire a mathematician to teach your 3-year-old basic math, or someone who’s more experienced with teaching young kids?
I’m curious how far you get without any coding knowledge. I started coding about 6-7 years ago and I rarely find it helpful to use AI to code. You’ve probably already learned a bit about coding from using the AI by now, but what are the problems you face when using code assistants?
Paper has been out for 90 minutes and you haven’t read the whole thing??
I was shocked, your relentless excellence has trained me to expect you to have always read everything moments after release haha.
I had always thought that he read papers into existence.
My benchmark for ASI is when all the career pages for AI companies are blank
The quality of the new Sonnet 3.5 still blows my mind, how it’s able to compete with OpenAI’s reasoning models. That’s why I’m also more hyped for Anthropic’s next move than OpenAI’s.
The simple fact that they can’t search the web makes Anthropic’s models useless for a large number of people, myself included.
Agree. I’m waiting with bated breath.
@ArianeQube you can always use Perplexity with Sonnet 3.5
@ArianeQube Sure, that’s valid but doesn’t apply to everyone.
I tend to find DeepSeek concerning. Either the Chinese have been able to match, or nearly match, the progress in the West without the latest chips, meaning they have better algorithms, or they have secretly gotten a large number of the latest chips illegally. I guess time will tell.
Can’t wait for this to turn into one of the biggest RCE vulnerabilities in the world.
All the more reason to use adblock
“buy me the best esports gaming mouse on the market”
Operator: *buys “ultimate esports super mouse” for $10 on temu*
Either that, or a 5 year old mouse.
Hell of a way to start a Monday. Hosting a workshop on LLMs for several friends who are academics, and it feels like the ground is shifting under our feet continuously. This channel is a great resource as always
What is the end-game here? The economy will collapse if AI takes over even 20-30% of white-collar jobs in the next 5 years. The last time we had such high unemployment was during the Great Depression. UBI is not going to solve this problem! AI might lead to a violent revolution against oligarchy. What say?
UBI could solve the problem, if there was the will and a massive cultural shift. Not optimistic that will happen quickly or easily, but we’ve seen how bad human societies are at managing long-horizon problems like climate change, so if it is a problem that’s heading our way maybe it’s better for it to come as a system shock than a slow collapse.
Universal basic income. But don’t even think you’re going to be able to afford many goods and services.
2:53 small flex having Noam Brown following you haha
I use Descript as well. Same pinch 😊 5:10
The two main models I use at the moment are o1 Pro and Gemini 2.0 Flash. I’m currently waiting for o3 and other advancements. Thanks for your great videos! 🙂
I’ll believe it when I see it.
Not sure why they’re asking agents to click a mouse to navigate a website – HTML is structured and rich with markup and metadata, it would make a lot more sense for an agent to interact at the DOM level than with pixels!
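To make the DOM-level idea concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It is not how Operator or any real agent works. The sample page, the class name and the choice of "interactive" tags are all assumptions for illustration, and pages that build their content with JavaScript would need a live browser DOM rather than static HTML.

```python
from html.parser import HTMLParser


class InteractiveElementFinder(HTMLParser):
    """Collect tags an agent could act on, together with their attributes."""

    # Assumed set of "interactive" tags; a real agent would need more
    # (ARIA roles, click handlers on arbitrary elements, and so on).
    INTERACTIVE = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.elements.append((tag, dict(attrs)))


def find_interactive(html: str):
    """Return a list of (tag, attributes) pairs in document order."""
    finder = InteractiveElementFinder()
    finder.feed(html)
    finder.close()
    return finder.elements


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """
<form action="/search">
  <input type="text" name="q" placeholder="Search products">
  <button type="submit" id="go">Search</button>
</form>
<a href="/cart">Cart</a>
"""

for tag, attrs in find_interactive(SAMPLE_PAGE):
    print(tag, attrs)
```

The markup hands the agent names, types and link targets directly, which is the information a pixel-based approach has to infer from a screenshot.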
When do you think Anthropic’s next release is? Do you think they are working on a CoT model too?