This New Free AI Is History In The Making!
❤️ Check out Lambda here and sign up for their GPU Cloud:
Try it out (choose DeepSeek as your model):
Official (read the privacy policy below before you use this one):
Run it at home:
Links:
📝 My paper on simulations that look almost like reality is available for free here:
Or this is the orig. Nature Physics link with clickable citations:
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here:
My research:
X/Twitter:
Thumbnail design: Felícia Zsolnai-Fehér –
Where is Anthropic? No model from them in a while
Anthropic pulled ahead with Claude 3.5, like: “We’re the best! Now we can slow down AI dev by just not working!” RIP
(I’m pretty sure this was what actually happened. The CEO commented something about “acknowledging that Anthropic was contributing to the AI development terminal race conditions.”)
Anthropic’s flagship model underperformed compared to just training 3.5 for longer and improving their post-training, so they released 3.6 instead. They’re currently working on their own version of test-time scaling models, and on Claude 3.5 Opus.
They all have something up their sleeves that they’re holding back to see what other companies release.
I think Anthropic is out of the AI race
They’re obviously working on their new model, but right now their current model outperforms ChatGPT in most cases, so unless they have something actually useful to release, I don’t see why they should release useless models like o1, for instance. Yeah, o1 is interesting on paper, but useless in most real-world scenarios, on top of being slow and expensive as hell. So I hope Anthropic is just focusing on their next useful model and will release it when it’s ready
It’s definitely impressive, although in terms of coding the distilled versions are still giving me subpar responses compared to Claude
They can’t write an AI summarisation app when handed the working example code to work from!
The distilled versions are just fine-tunes of the base models; they didn’t go through reinforcement learning. Honestly, calling them “Small R1” is a travesty.
Obviously; they’re distilled versions. If you scaled Claude down to the same size it wouldn’t do much better.
@DefaultFlame Definitely, I just wanted to highlight this since some reviews are so enthusiastic, and the distilled models in theory perform exceptionally well on some coding benchmarks
Don’t use the distilled models then
Unlike OpenAI, DeepSeek is open AI.
China No. 1, period
@CharlesLijt Ask it to criticize China
Ask it anything about China and it won’t answer
“Closed AI companies like OpenAI.” – Fireship 2025
@Joso997 Why would I need it to be able to criticize China? I don’t care; I just want it to perform well at what I need.
Wait, did you run full DeepSeek R1 locally on your Mac M2 Ultra at 1:15 in this video? Or is it some smaller distilled model?
Isn’t the full DeepSeek R1 400GB or something? How is he supposed to run it locally without using a pruned model?
@ImNotQualifiedToSayThisBut By using quantization.
The RAM is 192GB on that device!
@SrIgort Yes, this is what I meant. Of course the full-precision model would be too big, but is it a quantized “full model” or a small distilled model (trained on full R1 outputs)? Does anyone know? And how good is the “best” model you can fit locally (on a reasonable computer) for coding? (Not all code should be shared with web-based models, you know…)
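(For anyone wondering whether a quantized full model fits in 192GB, here is the back-of-envelope arithmetic; this counts weights only, ignoring the KV cache and runtime overhead.)

```python
# Rough weight-memory math for running a model locally.
# Weights only -- the KV cache and runtime overhead add more on top.
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes."""
    return params_billions * bits_per_weight / 8

for bits in (16, 8, 4, 2):
    print(f"R1 671B @ {bits}-bit: ~{weights_gb(671, bits):.0f} GB")

# 16-bit: ~1342 GB, 8-bit: ~671 GB, 4-bit: ~336 GB, 2-bit: ~168 GB
# So even a 4-bit quant of the full 671B model overflows 192GB of RAM;
# only an aggressive ~2-bit quant (or a distilled model) fits on that Mac.
```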
Sam should change his company name to ClosedAI
hur hur hur
“Military AI” is more likely to be
💀
NSA AI
They could go with a name that sounds like a spring… “SPYAIAiaiai”… hehe
They’re suddenly irrelevant, LMAO
You can run this with roughly 400GB of RAM (the best, 671B version), or from 43GB down to 6GB (the 70B down to 12B versions)
The smaller models are pretty stupid though. I asked the 7B to give me a riddle to solve. It then hallucinated that I had given it a riddle and solved it itself.
RAM or ROM?
I thought R1 671B was 404GB ROM
@@hippopotamus86 Yes, that’s kinda obvious, but they are good for programming still for example. The smallest I would go is around 12B.
@@w04h I’ve been experimenting with it all day. I’ve really not managed to get much good code out of it. The model ran on their website does a good job, but ultimately I had to go back to O1 to get scripts working. I created an Outlook addin which uses the 7B model to summarise emails. Works good enough for that!
@hippopotamus86 Probably because 7B models tend to over-hallucinate; the 32B and 70B models are pretty good though
AGI = Altman Gets Investment
😂😂😂😂
Clever 😂
yes! lolol
Chinese AI is very impressive. They have caught up despite the US embargoes placed on GPUs. When China starts producing competent GPUs, I have little doubt they will be class-leading, or at least comparable.
Thanks to the Republican party, China has passed us on pretty much every single front. This is the last era of America’s soft and hard power.
Sounds like you don’t even know the difference between training and a breakthrough. The Chinese AI models didn’t start on their own; they started off of open-source models. The US cannot stop China from innovating on AI architecture, but they can slow down the training part, which often takes a long time before you get a decent AI
Just don’t ask DeepSeek about Taiwan
@Trumben Keep crying about a Chinese state
@shikyokira3065 That’s a painful oversimplification; I doubt you know any technical machine learning basics, let alone have read the DeepSeek paper.
I really enjoy using this model. The chain of thought is amazing to behold. And anything that can rip AGI away from a tiny set of USD-almost-trillionaires is a huge benefit to the world. “Bravo” as you say. And all this apparently is a side-project!
I was amazed by how sophisticated and human-like its reasoning process is, especially considering they used RL rather than human-supervised training. For me, this is the most impressive model released to the public to date.
If you ask DeepSeek about President Xi Jinping it does not provide any answer, but if you ask about Donald Trump it gives a long answer
In my experience, for writing code it works significantly better than OpenAI’s current best model (o1-2024-12-17) at a 30-times-cheaper price.
But it is a bit lacking in general knowledge compared to GPT-4o, as it is a smaller model. Also, R1 kind of stops following instructions when the chat gets too long; just reminding it of the older parts of the chat fixes this.
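(For context on the price comparison above: DeepSeek’s API is OpenAI-compatible, so a minimal sketch of calling R1 looks like this. The API key is a placeholder, and the prompt is just an example.)

```python
# Minimal sketch: DeepSeek exposes an OpenAI-compatible API, so the
# standard openai client works by pointing base_url at their endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 reasoning model
    messages=[{"role": "user", "content": "Refactor this function to be iterative: ..."}],
)
print(response.choices[0].message.content)
```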
no way they opened the AI
3:54 How, if most people will be using quantized models, and you can’t do anything with a quantized model (except inference)?
Just shows that AI development has plateaued, since everybody has pretty much caught up to OpenAI
Honestly, I tried this AI for coding a plugin and it kind of sucks. It gives better results than Claude Sonnet 3.5 sometimes, but the fact that it can think for a long time doesn’t mean the quality of the thinking is high. It missed some obvious stuff and doesn’t see connections between facts the way Sonnet does without any thinking process.
Which model?
There are 4 that I know of.
I think you want to use chain of thought models for concepting/designing/reviewing, and use the normal models to write plain code.
Every government should have a simple law for IT companies.
If the product is closed source, it can NOT be called “open” or any similar name, as that misleads consumers.
If the product is open source, it can have the name “open”. If it changes to closed source, the name must be changed beforehand.
Failure to comply should result in heavy fines.
Their product name is ChatGPT 🙂
The company could just have one product that is open and keep all the rest closed, though, and still call itself “open”!
I compared two 14B-parameter (same quantization) models (DeepSeek-R1-Distill-Qwen-14B-GGUF/DeepSeek-R1-Distill-Qwen-14B-Q6_K.gguf vs. phi-4/phi-4-Q6_K.gguf) on a local machine (i9-13900K / 64GB RAM / RTX 4090) to generate Python code for hand tracking using OpenCV and MediaPipe. While DeepSeek-R1’s innovative approach is promising, the phi-4 model consistently outperformed it in this specific task. The phi-4 model not only successfully generated the code but also corrected errors present in the DeepSeek-R1 outputs. This preliminary test suggests that while DeepSeek-R1 shows potential, further refinement is needed to surpass the current performance of competing models in this domain.
This is where the 32B version (deepseek-r1-distill-qwen-32b) and Phi-4 14B start to show their difference, for the obvious reason that it’s more than double the parameter size. However, the idea of applying the deepseek-r1-zero approach to Phi-4 is there, which means any base model can be used to add this thought process.
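(For reference, the hand-tracking task used in the comparison above boils down to something like this: a minimal sketch using OpenCV and MediaPipe’s hands solution, with illustrative parameter values.)

```python
# Minimal hand tracking: read webcam frames, detect hand landmarks,
# and draw them back onto the frame.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for landmarks in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
cap.release()
cv2.destroyAllWindows()
```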
I must have chosen the wrong quantized model, because it’s so slow. Does anyone know what I should select to run it locally on a 4090?
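(A 4090 has 24GB of VRAM, so a 14B distill at Q4/Q6 quantization fits entirely on the GPU; anything that spills into system RAM will crawl. A minimal sketch using the llama-cpp-python bindings, with a placeholder GGUF path:)

```python
# Minimal sketch: run a distilled R1 GGUF fully offloaded to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload every layer; a 14B Q4 fits in 24GB of VRAM
    n_ctx=8192,       # context window; larger values use more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```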
Good luck with the Stargate, murikans!
So two Nvidia Digits (256GB unified RAM) for $6,000 will run the full 600B locally? 😮 That’s actually quite affordable; great for many small companies’ internal stuff
HOLD ON TO YOUR PAPERS BOIS!!