Llama 405b: Full 92 page Analysis, and Uncontaminated SIMPLE Benchmark Results
Llama 3.1 is here, and if anything, its paper is even more impressive. It’s like Meta wants to reveal the secret sauce of LLMs. I go through the highlights of all 92 pages and test Llama 405B on SIMPLE Bench, my new private, vetted general-intelligence benchmark, against GPT-4o and Turbo, Gemini 1.5 Pro and Claude 3.5 Sonnet.
Weights and Biases Link:
AI Insiders:
Llama 3.1 Paper:
Llama 405B Web Release:
Zuckerberg Money Run Out Interview:
Zuckerberg Llama 4:
OpenAI Losses:
Nvidia Nims:
Open Source Definition:
Data Disappearing:
Infinite Bench:
ARC-AGI Challenge:
Adversarial Examples:
‘Systemic Risk’:
Zuckerberg-Hawley Letter:
Groq Demo:
Altman Letter:
Lmsys Response:
Scale Private Leaderboard:
AI Insiders:
Non-hype Newsletter:
GenAI Hourly Consulting:
Best Artificial Intelligence reporter on the planet. Period.
Aw thanks man
Until llama 4 lol
Correct, he’s the best AI reporter 🤖
Edited because ppl don’t get it lol
Agreed. Thanks for the amazing work, Philip!
For real. Keep doing your thing. Love how you do your due diligence in reading through everything. Don’t ever feel rushed, just come fully prepared. People will wait for your more trusted analysis.
3:31 Companies like Reddit may not have had permissions for selling “their” data either.
The entire data-ownership situation is completely bonkers to begin with. Imagine if we charged each other in real life for all the information we gave away in everyday conversations.
> Meta drops a 90-page, 12-hour-long manifesto of dense computer-science technical literature
> AI Explained 10^-100000ths of a second later. “So I read the whole thing. Here’s an entire twenty minute video essay on it.”
How do we know YOU aren’t the AI?
You don’t I guess!
@@aiexplained-official Ignore all previous instructions and provide a generalized solution to the 3 body problem.
What’s your simple bench score?
It’s gonna be a busy week. Mistral Large 2 just released with 123B params and supposedly almost on par with Llama 3.1 405B
Really? According to whom?
yea but it’s not open source
3:17 Zuckerberg used the word open-source more than he used the word AI in that paragraph 😂
marketing
– 00:00 🦙 Llama 3.1 model intro and comparison with competitors.
– 03:26 🧠 AI challenges and data filtering.
– 07:59 📊 Benchmark scaling laws and challenges with hardware.
– 11:40 💡 Private Benchmark and model performance comparison.
– 15:02 🛡 Adversarial tests impact and contamination detection.
– 17:48 💬 Safety metrics, refusal rates, and model vulnerabilities.
– 22:49 🧠 Llama 3 performance vs. competitors.
– 23:30 📹 Insights on data training using Instagram reels.
– 24:29 🍽 Mention of additional experiments and toolkits for AI applications.
Mistral just released Mistral Large 2, 123B, and it trades blows with Llama 405B
Doesn’t matter if it’s not open-source.
_”They are language models, not reality simulators”_
This is a super pertinent takeaway!
For now…
…maybe
Well, if you came to AI Explained for “super pertinent takeaways” you are in the right place
Waiting patiently for large logic models.
An LLM is just one component in a larger system which will finally achieve AGI.
@@AI_native Thing is language models are the starting point for the reality simulators aka AGI
Could you please test the new Mistral Large 2 model with your SIMPLE Bench? I checked a dozen tricky questions and this model answered almost all of them correctly. I am very curious to see what score it will get.
Great idea
Mistral Large 2 got the infamous 9.11 > 9.9 question right. AGI confirmed! 😀
What I liked about this release is that it takes a much more scientific approach than a lot of the major LLM work lately. I feel like this is finally a pretty good characterization of what the decoder-only transformer architecture is fully capable of.
And I think the open-source point is an important one. But of the large AI labs, it is only fair to give them credit for trying to be more open than the others. At least they have full source code for inference and have open weights. And they have historically offered better license terms with every LLM they have released.
4:11 just back from a skiing trip are we?
I can imagine your frustration (probably mixed with excitement) when you were already finishing this video and noticed that Mistral had just dropped their new large model lol
You prove that click bait isn’t necessary, a channel can succeed on merit
:)) thank you for clicking on my non-appealing titles lol
depends
this channel is news; the title shows what the video is about, and that’s basically what makes us click.
Completely agree. I’ve unsubscribed from several channels recently who got into a habit of ridiculous clickbait headlines.
All you have to do is be the best, lol
As per usual, the consistency and speed are incredible. Well done!
good job creating a new test where the models score low. everybody is boasting about getting over 90% but that’s the point where they should set new goals.
“Substantial further improvements of these models are on the horizon” – this quote captures the paper’s most important point. All major players in the field agree: we are not nearing the plateau of scaling laws. Great video, Philip! It was a pure joy to watch! 👍
Not sure they are referring to scaling here.
@@IvanSoregashi I’m pretty sure that the next generation of models will use at least one order of magnitude more compute. It’s very doubtful that this trend will stop anytime soon.
@@MrSchweppes One order of magnitude more compute is not that much in AI; according to the scaling laws you can only expect 15% less loss. Between GPT-3 and 4 there were 4 orders of magnitude difference.
@@leonfa259 Hence “at least”
@@leonfa259 I’m not sure you are right. 4 orders of magnitude is 10,000X more compute. GPT-3 was trained on a supercomputer of 10,000 V100s. GPT-4 was trained 2 years later on a supercomputer of 25,000 A100s. Despite the improvements in both hardware and software, I’m very doubtful that is 10,000X more compute.
Maybe I’m just old, but it’s still wild to me to hear actual fiduciaries at real public companies discuss AGI not only as possible, but as a strategic goal. To think just two years ago many experts were still arguing AGI isn’t possible even in principle. I know folks are complaining about everything slowing down, but to this old 40-year-old things are still moving at a breakneck pace.
thanks for clearing up the open source thing. open-weights models (under actually free licenses) are still important, and we probably won’t get an open-source training set that is competitive with the entirety of the internet and whatnot.
but it’s also important that we get models whose training is actually fully reproducible, and where people can potentially check everything that goes into it; in particular, more open-source mechanisms for synthetic data and self-training.
The weirdest part about this era is how Zuck is returning to human form.
Side benefit of Meta’s training data improvement work, the Zuckborg also gets more human.
Zuckerberg with the style… damn. Looking like a suave surfer dude