
Llama 405b: Full 92 page Analysis, and Uncontaminated SIMPLE Benchmark Results

Llama 3.1 is here, and if anything, its paper is even more impressive. It’s as if Meta wants to reveal the secret sauce of LLMs. I go through the highlights of all 92 pages and test Llama 405B on SIMPLE Bench, my new private, vetted general-intelligence benchmark, against GPT-4o and Turbo, Gemini 1.5 Pro, and Claude 3.5 Sonnet.

Weights and Biases Link:  

AI Insiders:

Llama 3.1 Paper:

Llama 405B Web Release:

Zuckerberg Money Run Out Interview:

Zuckerberg Llama 4:

OpenAI Losses:

Nvidia Nims:

Open Source Definition:

Data Disappearing:

Infinite Bench:

ARC-AGI Challenge:

Adversarial Examples:

‘Systemic Risk’:

Zuckerberg-Hawley Letter:

Groq Demo:

Altman Letter:

Lmsys Response:

Scale Private Leaderboard:

Non-hype Newsletter:

GenAI Hourly Consulting:

Joe Lilli
 

  • @rickandelon9374 says:

    Best Artificial Intelligence reporter on the planet. Period.

  • @Lishtenbird says:

3:31 Companies like Reddit may not have had permission to sell “their” data either.

    • @YeeLeeHaw says:

The entire data-ownership thing is completely bonkers to begin with. Imagine if we charged each other in real life for all the information we give away in everyday conversations.

  • @Radicoly says:

> Meta drops a 90-page, 12-hour-long manifesto of dense, technical computer-science literature

    > AI Explained 10^-100000ths of a second later. “So I read the whole thing. Here’s an entire twenty minute video essay on it.”

    How do we know YOU aren’t the AI?

  • @Neomadra says:

It’s gonna be a busy week. Mistral Large 2 just released with 123B params, and it’s supposedly almost on par with Llama 3.1 405B.

  • @novachromatic says:

    3:17 Zuckerberg used the word open-source more than he used the word AI in that paragraph 😂

  • @saipien says:

    – 00:00 🦙 Llama 3.1 model intro and comparison with competitors.

    – 03:26 🧠 AI challenges and data filtering.

    – 07:59 📊 Benchmark scaling laws and challenges with hardware.

    – 11:40 💡 Private Benchmark and model performance comparison.

    – 15:02 🛡 Adversarial tests impact and contamination detection.

    – 17:48 💬 Safety metrics, refusal rates, and model vulnerabilities.

    – 22:49 🧠 Llama 3 performance vs. competitors.

    – 23:30 📹 Insights on data training using Instagram reels.

    – 24:29 🍽 Mention of additional experiments and toolkits for AI applications.

  • @michaelroberts9587 says:

Mistral just released Mistral Large 2, 123B, and it trades blows with Llama 405B.

  • @AI_native says:

    _”They are language models, not reality simulators”_

    This is a super pertinent takeaway!

  • @revo2499 says:

    Could you please test the new Mistral Large 2 model with your SIMPLE Bench? I checked a dozen tricky questions and this model answered almost all of them correctly. I am very curious to see what score it will get.

  • @timseguine2 says:

What I liked about this release is that it is a lot more scientific in its approach than most of the major LLM work lately. I feel like this is finally a pretty good characterization of what the decoder-only transformer architecture is fully capable of.

And I think the open source thing is an important point. But among the large AI labs, it is only fair to give Meta credit for trying to be more open than the others. At least they have full source code for inference and open weights. And their license terms have historically improved with every LLM they have released.

  • @unvergebeneid says:

    4:11 just back from a skiing trip are we?

  • @artemiyshadrin1980 says:

    I can imagine your frustration (probably mixed with excitement) when you were already finishing this video and noticed that Mistral had just dropped their new large model lol

  • @psylocyn says:

You prove that clickbait isn’t necessary; a channel can succeed on merit.

  • @londonl.5892 says:

    As per usual, the consistency and speed are incredible. Well done!

  • @marc_frank says:

Good job creating a new test where the models score low. Everybody is boasting about getting over 90%, but that’s exactly the point where they should set new goals.

  • @MrSchweppes says:

‘Substantial further improvements of these models are on the horizon’ – this quote captures the paper’s most important point. All major players in the field agree: we are not nearing the plateau of scaling laws. Great video, Philip! It was a pure joy to watch! 👍

    • @IvanSoregashi says:

      Not sure they are referring to scaling here.

    • @MrSchweppes says:

      @@IvanSoregashi I’m pretty sure that the next generation of models will use at least one order of magnitude more compute. It’s very doubtful that this trend will stop anytime soon.

    • @leonfa259 says:

@@MrSchweppes One order of magnitude more compute is not that much in AI; according to the scaling laws, you can only expect about 15% less loss. There were 4 orders of magnitude between GPT-3 and GPT-4.

    • @MrSchweppes says:

      @@leonfa259 Hence “at least”

    • @MrSchweppes says:

@@leonfa259 I’m not sure you are right. 4 orders of magnitude is 10,000X more compute. GPT-3 was trained on a supercomputer of 10,000 V100s. GPT-4 was trained 2 years later on a supercomputer of 25,000 A100s. Despite the improvements in both hardware and software, I very much doubt that amounts to 10,000X more compute.
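The back-of-envelope arithmetic in this thread can be sketched in a few lines. Note the exponent below is an illustrative assumption (on the order of published power-law fits such as Kaplan et al.), not a figure from the video or the Llama paper:

```python
# Rough scaling-law arithmetic: assume loss scales as L ∝ C^(-alpha),
# where C is training compute. alpha = 0.05 is an assumed, illustrative exponent.

def loss_reduction(compute_ratio: float, alpha: float = 0.05) -> float:
    """Fractional loss reduction from multiplying compute by compute_ratio."""
    return 1.0 - compute_ratio ** (-alpha)

# One order of magnitude more compute:
print(f"10x compute:     {loss_reduction(10):.1%} less loss")
# Four orders of magnitude (the GPT-3 -> GPT-4 gap claimed above):
print(f"10,000x compute: {loss_reduction(10_000):.1%} less loss")
```

Under this assumed exponent, 10x compute buys roughly a 10–15% loss reduction, which is the ballpark the commenters are debating; the exact figure depends entirely on which fitted exponent you plug in.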

  • @JohnVance says:

    Maybe I’m just old, but it’s still wild to me to hear actual fiduciaries at real public companies discuss AGI not only as possible, but as a strategic goal. To think just two years ago many experts were still arguing AGI isn’t possible even in principle. I know folks are complaining about everything slowing down, but to this old 40-year-old things are still moving at a breakneck pace.

  • @sofia.eris.bauhaus says:

    thanks for clearing up the open source thing. open weights (under actually free licenses) models are still important, and we probably won’t get an open source training set that is competitive with the entirety of the internet and whatnot.

    but it’s also important that we get models whose training is actually fully reproducible, and where people can potentially check everything that goes into it. in particular more open source mechanisms for synthetic data and self-training.

  • @ZalexMusic says:

    The weirdest part about this era is how Zuck is returning to human form.

    • @chrisanderson7820 says:

      Side benefit of Meta’s training data improvement work, the Zuckborg also gets more human.

  • @Jack-gl2xw says:

Zuckerberg with the style… damn. Looking like a suave surfer dude.
