ChatGPT O1 Explained

I reverse-engineered OpenAI's o1-preview model using o1-preview! I asked it to generate a full research paper with code, and I gave it dozens of related research papers from the past few years as context. It recreated a working version of the o1 model to the best of its ability. In this video, we'll go over all the details of the model, the code, and the research techniques that make the o1 model series state of the art across so many benchmarks. LET'S SPREAD THIS AI POWER. I can't wait to see what you think, enjoy!

Code & paper for this video:

Deploy your own AI trading bot (no code):

Want more AI/ML education? Connect with me here:
X:
Instagram:
Linkedin:
TikTok:
Facebook:

⏱️ **Chapters:**
0:00 – Introduction: Reproducing OpenAI's o1 Model Series
1:30 – Generating a Research Paper Using o1 Preview
2:30 – Overview of 'o1-nano': An Open Source, Explainable Model
3:30 – Understanding Chain-of-Thought Reasoning in o1 Models
4:30 – How Reinforcement Learning is Used in Training and Inference
5:30 – Exploring Reasoning Paths and Subtasks During Inference
6:30 – Unpacking OpenAI's Reasoning Tokens
7:30 – Overview of the Model Architecture
8:30 – Core Components: Transformer, Chain-of-Thought Module, Reasoning Token Generator
9:30 – Training the Model to Reason Better Using Reinforcement Learning
13:30 – Historical Papers Leading to o1: Chain-of-Thought and 'Let's Verify Step by Step'
15:30 – The New Scaling Law: Inference Time Scaling
16:30 – The Usage of Reinforcement Learning
17:30 – Demo of the Code: Running the Test
18:30 – Conclusion: Open Source Code and Research Paper as a Starting Point
19:00 – Closing Remarks and Encouragement to Explore the GitHub Repository

Don't forget to like, share, and subscribe for more deep dives into AI advancements!

I Built a Sports Betting Bot with ChatGPT:

I Built a Trading Bot with ChatGPT:

Watch ChatGPT Build an AI Startup:

Watch ChatGPT Build a Finance Startup:

Watch Me Build a Startup Playlist:

🔔 Subscribe and hit the notification bell to join the AI revolution!

Joe Lilli
 

  • @rahulvmp2050 says:

    Cool

  • @PharoahJardin says:

    Nice !

  • @mercymay42 says:

    Love your videos! 🥰

  • @alexiades says:

    Awesome man, great to see you back into AI.

  • @zacharybamberger6965 says:

    I’m sorry to be “that negative guy” in the comments, but some of your claims here are overstretched, and the concepts you’re throwing around are at a surface level. You made little reference to the importance of reward models in PPO and did not distinguish between per-step and global evaluation (a critical aspect of creating the tree structure you made reference to). There’s also no evidence that reasoning models require special tokens. Finally, the applicability of your method here is super constrained, whereas other MCTS-based methods with language models manage to generalize to non-math based tasks.

    You’ve produced excellent videos in the past, but this one unfortunately falls short

  • @sapandeepsandhu4410 says:

    back to track great

  • @competidor64 says:

    Thanks Siraj

  • @robbybobby6464 says:

    Havent seen or kept up with your channel in a long while but glad to see you’re still creating awesome and well-explained content!

  • @mootytootyfrooty says:

    Ha this is great, I was thinking a few days ago about what would happen if we used o1 to document itself and the paper chain, and you went and did it!

  • @swapnil6996 says:

    Cheater.

  • @arthur...barros says:

    mind blowing

  • @Stan_144 says:

    Siraj rocks ..
