Orca: The Model Few Saw Coming

The first model set to be open-sourced that actually comes close to ChatGPT, and it's just 13B (small enough for a laptop). The 51-page report from Microsoft was released just 48 hours ago, but I have gone through it all and bring in relevant insights from 5 other papers.

By imitating the logic and explanations of GPT-4 (and using GPT-3.5 as an assistant), and by training on diverse tasks and an order of magnitude more examples, we get Orca. I will showcase it on a dozen benchmarks and go through in detail how and why it works.

I will also end with comments from Sam Altman and Ilya Sutskever on whether open source will catch up…

Orca Paper:
False Promise Paper:

FLAN:
Vicuna:
No Moat memo:
LLM Leaderboard:
AGIEval:
BIG-Bench Hard:
Language Models as Tool Makers:
Altman Interview:
DERA Paper:
Let's Verify Step by Step:

Non-Hype, Free Newsletter:

Joe Lilli
 

  • @laslog says:

    Matching gpt4 performance in some areas with 13B! INSANE!

    • @aiexplained-official says:

      Yeah just a couple. More of a GPT 3.5 rival though.

    • @EddieBurke says:

      @@aiexplained-official considering I was using GPT-2 to help me write stuff like 2 years ago, it is still fucking unreal how fast open source has caught up to commercial.

    • @blablabic2024 says:

      It will definitely surpass closed source in a couple of years. OpenAI is a great platform that will become obsolete in about five years, and along with it the market cap of the software giants. Even the hardware giants will get displaced in market share by a slew of open-source hardware (RISC-V et al.). We’re not talking about moats, we’re talking about dams that just got breached…

    • @EddyLeeKhane says:

      @@EddieBurke have they though?

      Still looks like rote memorization by fine-tuning to me

    • @dv_interval42 says:

      @@EddyLeeKhane Yep. Real progress would be apparent, we wouldn’t have to bend over backwards to justify actual leaps!

  • @michabrugger7664 says:

    This is top quality content! Thanks for keeping me up to date 🙂

  • @atpray says:

    If a 13B-parameter model can do that, I cannot imagine what GPT-4 with further improvements can do.

    • @zyansheep says:

      @@fontende more data isn’t the only way to improve a model…

    • @MindFactoryAI says:

      @@zyansheep Right, essentially they have done the easiest thing, text prediction, with a bit of inverse RL. There is a much more complex space of composite models, inductive biases, reasoning processes and loss functions as yet unexplored.

    • @martiddy says:

      @Phobos Deimos Actually Meta is working on a multimodal AI that includes images, text, audio, depth information and more.

    • @bartpelle3460 says:

      @@fontende *beep beep boop, I am a bot and this action was performed by CommentGPT*

  • @DaveShap says:

    “Everyone and their gramma used it for whatever” – spoken in an erudite English accent. My life is infinitely more complete for having heard you utter these words. Thank you.

    • @aiexplained-official says:

      Thanks David. My BBC British accent is a perfect disguise if I don’t actually know something. No one would guess.

    • @DaveShap says:

      We Americans have been thoroughly trained – Oh yeah this guy sounds SUPER credible! 😀

    • @GuinessOriginal says:

      @@DaveShap it’s Grandma David 😉 just an FYI in case you weren’t sure

  • @FrancoisPesce says:

    I find it intriguing how this research has leveraged imitation learning to essentially ‘distill’ the extensive knowledge of large language models such as ChatGPT and GPT-4 into Orca. My interpretation is that the choice of 5 million distilled examples essentially creates a filtration process, condensing and harnessing the most valuable insights from the sea of information these large models have processed.

    Remember, these large language models have been trained on an incredibly diverse range of data, which includes valuable knowledge, but also less useful or even misleading information. The challenge in training these models has been to sift through this ‘noise’, identifying the truly useful signals. When we reach training convergence, these models have either generated a functional approximation of the data’s meaning, or perhaps have pruned the absurdities in the dataset by plateauing towards the end of the training process, thereby not allocating further weight to insignificant data.

    To provide an analogy, consider the breadth of Wikipedia, with its 6 million English articles. Only a fraction of these, about 0.1% (or 6000), are ‘Featured Articles’, denoting the highest quality. I would compare the approach of this research to understanding the world through these 6000 best articles. By focusing on content that has been consensually deemed as superior, you likely cover a vast spectrum of world knowledge.

    It’s as if the Orca model is absorbing the distilled wisdom of the large language models, somewhat akin to a student learning from a world-class tutor. And while this may not fully capture the intricate reasoning process of the original models, it clearly leads to significant improvements in performance, as demonstrated in the zero-shot reasoning benchmarks and academic tests mentioned in the paper. This suggests that learning from step-by-step explanations, whether they originate from humans or advanced AI models, holds great promise for advancing model capabilities and skills.
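
    To make that distillation loop concrete, here is a minimal sketch of how I picture the data collection, using the openai Python client; the system message, field names and file path are my own illustrative choices, not the paper’s:

    import json
    import openai

    SYSTEM = ("You are a helpful assistant. Think step by step and "
              "justify your answer before giving it.")

    def collect_example(question: str) -> dict:
        # Ask the teacher model for a full explanation, not just the final answer.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": question}],
        )
        return {"system": SYSTEM,
                "question": question,
                "explanation": resp["choices"][0]["message"]["content"]}

    # Millions of such (system, question, explanation) triples are then used to
    # fine-tune the 13B student with an ordinary causal-LM loss.
    with open("distilled.jsonl", "a") as f:
        f.write(json.dumps(collect_example("Why does ice float on water?")) + "\n")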

    • @GuinessOriginal says:

      I suspect you’re on the right track, but it might be more akin to the simpler 80-20 rule. It’s likely that 80% of user requests can be met using only 20% of the training data. The exact numbers aren’t important but you get my point. I wonder how Meta’s MegaByte hierarchical architecture will affect this too; it looked very interesting to me.

    • @ivan24zg says:

      Once a neural network “groks” something you can remove activations that were used only for memorization before generalization occurred. But these redundant activations have NOT been pruned from models like ChatGPT after training is done; they are still there. The distillation process done by ORCA fast-tracks the generalizations, ultimately reducing the size of the network needed to learn something. ChatGPT can probably be reduced to 10% of its size (or less) if they pruned it after training. The 50%-400% gains that ORCA has over Vicuna are absurd, and indicate that we are nowhere near the diminishing returns threshold. Once all the algorithmic optimizations are done, the consumer-grade LLMs will probably end up more powerful than anyone ever imagined. OpenAI is free to spend millions to train the network, but they CANNOT prevent extraction of the knowledge from the network to produce better models. And that guy at the end claiming that there will always be a “gap” is deluded – we only need to produce AGI *once*, and after that it’s in self-driving mode.
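
      (If anyone wants to see what that pruning step even looks like mechanically, here is a rough PyTorch sketch; the 90% figure is just the number I claimed above, not something I have validated:)

      import torch.nn as nn
      import torch.nn.utils.prune as prune

      def magnitude_prune(model: nn.Module, amount: float = 0.9) -> nn.Module:
          # Zero out the smallest-magnitude weights in every linear layer and
          # make the mask permanent. Whether quality survives such aggressive
          # pruning is exactly the open question - this only shows the mechanics.
          for module in model.modules():
              if isinstance(module, nn.Linear):
                  prune.l1_unstructured(module, name="weight", amount=amount)
                  prune.remove(module, "weight")
          return model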

    • @GuinessOriginal says:

      @@ivan24zg is what you’re talking about similar to having a sparse architecture in your neural network, or is that something completely different?

    • @sebastianjost says:

      While my initial reaction was to expect some major knowledge gaps in Orca, I now notice that humans are often taught in a similar way.

      Consider a mathematical theorem. There are often many different proofs, developed and refined over hundreds of years. Students are usually taught the most elegant proof, not the original or, worse, all the different proofs. While the remaining proofs could still provide valuable insights, if you see enough proofs of different theorems you should still learn enough. That’s what we seem to expect from humans, at least.

      While the objectivity of maths is rarely found elsewhere, the same principle should apply to other areas as well. I think that helps in understanding why Orca is so good/competitive.

    • @clray123 says:

      I wonder if Orca also loves to go in loops like a mad parrot when you disable sampling and go for greedy token generation. This sort of absurd behavior (with probability of the next tokens being amplified by whatever the model has already spit out) makes me quite skeptical of whether the small models really are very “smart” … or just more successful at parroting what’s contained in the smart model.
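
      (For anyone who wants to reproduce that check at home, here is roughly how I would do it with Hugging Face transformers; Vicuna stands in below only because Orca itself has not been released:)

      from transformers import AutoModelForCausalLM, AutoTokenizer

      name = "lmsys/vicuna-13b-v1.3"  # stand-in checkpoint, swap for whatever you can run
      tok = AutoTokenizer.from_pretrained(name)
      model = AutoModelForCausalLM.from_pretrained(name)

      ids = tok("Explain why the sky is blue.", return_tensors="pt").input_ids
      greedy = model.generate(ids, max_new_tokens=200, do_sample=False)   # prone to loops
      sampled = model.generate(ids, max_new_tokens=200, do_sample=True,
                               temperature=0.7, top_p=0.9)                # usually varied
      print(tok.decode(greedy[0]), tok.decode(sampled[0]), sep="\n---\n")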

  • @octia2817 says:

    I would love more open-source model content!

  • @ahtoshkaa says:

    Yes please! More about open source LLMs would be great from you, since you study everything in so much detail. Don’t be too hasty to edit stuff out. I’m sure people will love listening to a 30-40+ minute video when it comes from you.

    • @TarninTheGreat says:

      Oh yeah, I’d watch hour long videos from him every day. I don’t think that’s what he wants to make; but yeah man, don’t worry about length, you’re making the best content on the subject, how long it goes is not a concern.

    • @StevenAkinyemi says:

      I would too. Absolutely

    • @OlebileWareus says:

      i would

    • @ParameterGrenze says:

      Agreed. Your viewers are not the 10-minute-attention-span crowd. Don’t try to game your vids with best practices optimized for mainstream viewers.

  • @nihilistoner says:

    You know this already, but we all appreciate your work so much. Thank you! 🙂

  • @GoldenBeholden says:

    This field has been an absolute joy to be a part of these past few months.

  • @reinerheiner1148 says:

    This paper is more important than one might think. It could lead the way to AI learning like humans, because it already shows that a) learning easier stuff before more difficult stuff improves learning even in an LLM, and b) the better the explanation for an answer, the more the model will understand through reasoning. Going down that path could mean hugely decreased training times while further improving the LLM’s reasoning capabilities. It is like memorizing vs understanding. And yes, please talk more about open source LLMs, especially the ones that work with LangChain. Thanks for the video!

    • @Silduril says:

      Exactly what I was thinking! Super exciting stuff 😀

    • @clray123 says:

      It is naive to assume that the AI is “learning like humans”. Humans do not learn by memorizing millions of text examples.

    • @jeff__w says:

      @@clray123 Absolutely. And I doubt the model “understands” anything by reasoning. What these explanations do is allow the model to refine its neural net so that its verbal output emulates the verbal behavior that we would call “reasoning” in humans.

    • @reinerheiner1148 says:

      @@clray123 you did not understand my message. What I was getting at was that currently, yes, the AI is memorizing by going through loads of data, unlike humans. But this paper shows that memorizing is inferior to understanding, i.e. providing extensive reasoning for each sample so the AI can learn why the answer is correct. This, and the fact that the AI learns better when given simple problems before more complex ones, shows that the exclusively brute-force approach is inferior. Any researcher reading this paper will realize that there is probably a lot of potential to optimize training, so that the model will need less training to learn the same thing. Which in theory could lead to models learning just as fast as humans, if we are able to optimize the training methods (and probably the model structure) enough. After all, it’s possible, because humans can do it. And I have an example for you of how learning can be massively improved in another branch of machine learning, reinforcement learning: look at OpenAI’s paper on hindsight training, which hugely decreased the amount of samples needed to learn a task. It’s not a far stretch that we can have similar progress with LLMs. So yeah, I don’t think I am naive… but I am well aware of where we are right now.
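
      A toy sketch of the easy-before-hard ordering I mean, on an Orca-style dataset (the file name and the length-as-difficulty proxy are placeholders I made up):

      import json

      def difficulty(example: dict) -> int:
          # Crude proxy: longer teacher explanations ~ harder reasoning.
          return len(example["explanation"].split())

      with open("orca_style_data.jsonl") as f:
          data = [json.loads(line) for line in f]

      curriculum = sorted(data, key=difficulty)  # feed to the trainer in this order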

    • @reinerheiner1148 says:

      @@jeff__w In the end, reasoning is also a consequence of refining a neural network in humans as well, so… you oversell humans. Yes, we are superior, but it’s not magic. At the end of the day we are the sum of our learned information plus biological biases (hormones, neurotransmitters, brain structure). Which is not pleasant for many to realize, because it limits the concept of free will as well.

  • @msylvestre says:

    I really feel lucky to have found your channel. I genuinely think it’s the best source for no-shill, in-depth, AI and LLMs news.

    • @RandyHawkinsMD says:

      I couldn’t agree more.

    • @skierpage says:

      This channel is really very good. There’s also Yannic Kilcher’s ML News, which is a sporadic but excellent summary of what’s been going on, not just in papers but in new software releases and other AI-related goings-on. He also does deep dives into particular papers.

      Let us not speak of Two Minute Papers and KZ-F’s mindless recycled eye-candy visuals with no information about the paper itself or what’s novel in it.

    • @Fru1tpunch says:

      oh god since ai is in the news now it feels like all the channels are like crypto shillers

    • @ToriKo_ says:

      +

  • @Cacti_hipster says:

    The web of lies at 11:05 flew over my head until I thought of it more as a recursive base case. Mad props to this team!

  • @lukeg1680 says:

    Your content is deeply moving, thank you. I get this sense of vertigo, as the ground shifts beneath our feet, steadied only by your academic tone and deep commitment to the facts. I’ve never seen awe-inspiring breaking news told through academic papers before; gripping.

  • @harveyhutsby7697 says:

    I have a large appetite for this kind of content and you are by far the best source I’ve found on youtube, so personally I would like to see more.

  • @sbondi says:

    Actually, as to the “more open source?” comment, because “you have a lot more to say about it”, I say “YES, PLEASE”! Everything you say in every video is done with such intelligence and quality that it ALWAYS has great value! I can see that you put your heart into these videos, and I really appreciate all the heavy lifting that you are doing for all of us! 😃

  • @alpha007org says:

    Stephen Wolfram explained his thoughts about LLMs, where he says that LLMs (like ChatGPT) can be distilled down to a much smaller size. It was on the Lex Fridman podcast.

  • @deciphrai says:

    Timestamps courtesy of Deciphr AI 👨‍💻

    0:02:32 – Orca’s 13 billion parameters
    0:04:12 – Orca leveraged system instructions
    0:05:58 – Task complexity and diverse examples
    0:07:18 – Orca matches text-davinci-003 on the SAT, LSAT, GRE and GMAT
    0:08:03 – Orca reaches parity with ChatGPT
    0:08:39 – Microsoft’s involvement in Orca’s research
    0:09:04 – Orca vs Vicuna
    0:11:28 – Orca in common sense reasoning questions
    0:12:10 – Orca’s potential for improvement
    0:15:22 – Gap between open source and private models
    0:16:32 – Sam Altman’s perspective on OpenAI’s unique moat
    0:17:23 – Possible future videos on open source models

  • @TheMirrorslash says:

    I have a feeling that your theory about why Microsoft conducted this research is spot on. The fact that using LLMs to train other models was called “a false promise” to begin with is wild. It feels 100% like the logical step you’d take to build on existing models. And the fact that models can be “robbed” like this just shows that this technology will be everywhere, no matter what format you release it in.

    • @electrolove9538 says:

      It’s still a bit bizarre that MS would publish these findings. Why not keep them private? It also may rub OpenAI the wrong way and hurt their relationship… still doesn’t make sense to me 🤔

    • @TheJackiMonster says:

      I think it definitely shows that companies cannot really sell their models as easily as they thought with the current legality around AI training. But then the problem is their models are all built around the fact that they could train them without caring about the copyright of the information used as training sets.

      So either everyone gets robbed or AI might not be profitable because paying for copyright will be too expensive when it comes to generalized networks.

      Either way it’s interesting to watch, I think.

    • @mohammednisham7126 says:

      ​@@TheJackiMonster it might be more like foundational models won’t be profitable, but applications built on top of them definitely could be

  • @BodyMusicification says:

    Please more coverage on open source models. This is the most uplifting, hope-inspiring video I’ve watched of yours yet.

  • @RandyHawkinsMD says:

    This channel constitutes a highly valued way for me to benefit from our moderator’s experience and efforts. His insights are thoughtful, his research is current, and both guide my investigations. Many thanks. And by the way, open source developments are of particular interest.
