Q* – Clues to the Puzzle?
Are these some clues to the Q* (Q star) mystery? Featuring barely noticed references, YouTube videos, article exclusives and more, I put together a theory about OpenAI’s apparent breakthrough. Join me for the journey and let me know what you think at the end.
AI Explained Bot:
AI Explained Twitter:
Lukasz Kaiser Videos:
Let’s Verify Step by Step:
The Information Exclusive:
Reuters Article:
Original Test Time Compute Paper:
OpenAI Denial:
DeepMind Music:
Altman Angelo:
Karpathy:
STaR:
Noam Brown Tweets:
Q Policy:
Sutskever Alignment:
Non-Hype, Free Newsletter:
Who else is addicted to this channel?
me
He is literally me
I am
7 min later: he digs a bit further than I have time to. And yeah, Ilya was on our side (humanity's). Remember that.
who else do you follow? (Please feed me)
Ah, the Q* video I have been waiting for from the only youtuber I really trust on the subject. Thanks!
Let me know what you think of the theory
Same
@@aiexplained-official Rather hypothesis, not theory.
@@Elintasokas but the evidence came first, so a theory, no?
My thought exactly
Dude woke up, thought to himself “How thorough will I be today?” and said: “Yes!” You should definitely get some interviews with those top researchers.
Oooh, that would be interesting!
Stay tuned 🙂
@@aiexplained-official🔥🫡
We’ll hold you to this. 😉 @@aiexplained-official
Philip is nothing if not thorough. Dude reads like several novels worth of text per day.
I’m truly grateful for this channel. Finding accurate news about almost anything is hard as heck, and having accurate AI news is especially important. We can’t afford to be misled.
He’s good, I’m very interested in reading the references.
Q* as an optimizing search through the action space sounds quite plausible, just like the A* algorithm, which is a generic optimal pathfinding algorithm.
ohhh that Q* / A* link is very interesting!
This was my thought too. Possibly using edits of the step-by-step reasoning as the edges, or some more abstract model. You could then weight the edges by using a verifier that only needs to see a bounded context (the original, the edited, and the prompt) to say whether or not the edit is of high quality. It’s sort of like graph-of-thought, but more efficient.
A* was my first thought as well; it’s such a famous CompSci algorithm.
(Sagittarius A* is also the name of the Milky Way’s central supermassive black hole)
Even if it’s just speculative it’s a decent idea for an actual study. Wish someone would test it.
I was thinking something along these lines too. Is there a way to prune chains of thought, the way alpha-beta pruning trims minimax?
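To make the search analogy concrete, here is a minimal sketch of best-first search over chains of thought, assuming a hypothetical verifier `score_step` that rates partial reasoning (pure speculation; nothing here is a confirmed OpenAI interface):

```python
import heapq

def best_first_cot(prompt, expand, score_step, budget=50):
    """Greedy best-first search over chains of thought.

    expand(chain) proposes candidate next reasoning steps (strings) and
    score_step(prompt, chain) returns a hypothetical verifier score in
    [0, 1] that acts as the heuristic. Both are assumed stand-ins.
    """
    frontier = [(-score_step(prompt, ()), ())]  # heapq is a min-heap, so negate
    while frontier and budget > 0:
        budget -= 1
        _, chain = heapq.heappop(frontier)
        steps = expand(chain)
        if not steps:  # no continuations proposed: treat the chain as finished
            return chain
        for step in steps:
            new_chain = chain + (step,)
            heapq.heappush(frontier, (-score_step(prompt, new_chain), new_chain))
    return None  # search budget exhausted
```

Strictly this is greedy best-first search (heuristic only); full A* would also add an accumulated path cost g(n). The pruning asked about above falls out for free: low-scoring chains simply never get popped.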
This is way better than breaking AES-192.
Loved your video mate!
You are a big person for acknowledging that this video is better than yours!
@@prolamer7 we’re all speculating here and I have a lot of respect for my fellow creators. I view it as all part of a bigger conversation.
@@DaveShap That said!!! Of all the AI youtubers, you are consistently among the TOP too!!! I hate to sound too simplistic; sadly the yt comment system is kinda designed to allow only short thoughts and shouts.
THIS was the technical dive I’ve wanted to find for the last few days. thank you so much for taking the time to dig into the development of these papers and the technologies they represent.
Wow. Very impressive investigative journalism. No other AI channel does their homework better than you. Well done sir.
you are my first source for AI news, you go deep into the details and do not cut corners, like a true teacher
We all spent the last week watching the soap opera drama and listening to wild ideas, and nobody put it all together in a nice package with a bow on it until you posted this video. It is a theory, but one that is well thought out, has references, and seems extremely logical. Thanks for putting so much work into this; it’s not falling on deaf ears, we truly appreciate you. Thanks, Bill Borgeson
Who is listening? Remember I just want like a fucking job. From OpenAI specifically.
When do we announce that I’m the new ceo of open ai
Lmao
My computer crashed 7 times while making this video and I had a hard deadline to get a flight. There is little of my normal editing in here, or captions, just my raw investigation! Do follow the links for more details.
We appreciate your dedication, sir!
Still the best AI channel on YouTube, none of the hype of the other channels. Maybe the news cycle will calm down and you can get some sleep!
Bon voyage
It’s great as always!
You did a great job Philip, as always! Much appreciated attention to detail and balance. Exciting times ahead! Have a safe trip. 🙏👍
I would say he’s actually understating the dramatic impact CoT has on multi-modal output. Also things get wacky when you combine vertical CoT iteratively reflecting horizontal CoT outputs (actual outputted tokens). Increasing model inner monologue (computation width) across layers is def the wave.
This is again why I think synthetic/hybrid data curation costs will soon match model pretraining costs. Even if you’re perturbing existing data, you can lift its salient density to better fit this framework. It’s also why I keep saying local models are the way, and why I’ve been obsessed with increasing representational capacity in smaller models.
I’d expect the Q to refer to Q-learning. Human beings think/function by predicting the future and acting upon those predictions, at least at a subconscious level. The way we make these predictions is by simulating our environment and observing what would happen in different variations of that simulation given the different choices we make. We then pick the future we feel is best and take the actions to manifest that future.
I think a good example might be walking through a messy room with legos everywhere. You observe the environment (the room), identify the hazards (legos), then plan out a course through the room of where you can step safely (not on a lego). You might imagine that stepping in one spot would leave you stuck or stepping on a lego, so that whole route is bad and you try another. Repeat till you find a solution, or decide there isn’t one and just pick some legos up, or give up, or whatever. Of course not everyone does this; some people just walk on through without thought and either accept stepping on legos or regret that they did not stop to think. Those emotional responses, accepting the consequences or regretting them, are more akin to reinforcement learning imo. There are times when you need to act without thought: for example, if the room was on fire you might not have the time (or compute) to plan it all out.
The Q-learning stuff, in the context of these LLMs, seems like it would be their version of simulating the future/environment. It would generate a whole bunch of potential options (futures), then pick the best one. The difficult task there is creating a program that knows which option actually is best, but they apparently already have that figured out.
My bet is we will need to add in a few different systems of ‘thought’ that the AI can choose from given different contexts and circumstances; these different methods of decision-making will become tools for the AI to deploy, and at that point it will really look like AGI. That’s just my guess, and who knows how many tools it will even need.
Either way it’s cool to see progress and all this stuff is so cool and exciting.
Now to go look for some mundane job so I can eat and pay off student loans lmao, post-money world come quickly plz XD.
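For anyone who hasn’t met Q-learning outside comments like the one above, a minimal tabular sketch of what it actually learns; the `env` interface (reset, step, actions) is an assumed Gym-style toy, not anything from OpenAI:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning. Q(s, a) estimates the return of taking action a
    in state s and acting greedily afterwards. The env interface
    (reset() -> state, step(a) -> (state, reward, done), .actions) is an
    assumption, loosely Gym-style."""
    Q = defaultdict(float)  # (state, action) -> value, default 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit current Q, occasionally explore.
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # Core update: nudge Q(s, a) toward the bootstrapped target.
            best_next = 0.0 if done else max(Q[(s2, x)] for x in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

Generating options, scoring imagined futures, and picking the best is exactly what the learned Q table lets an agent do greedily at decision time.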
Normally a long post like this would be trash, so THANK YOU for this helpful and engaging response ❤
Many years ago AI researchers speculated how to represent “thoughts.” One approach was to treat them essentially as “mental objects,” the other was to resort to possible worlds theory.
What you described is just reinforcement learning, Q-learning is a specific algorithm for solving the RL objective and the “Q” refers to the Q-function, which has a specific meaning in RL. It seems likely that Q* refers to the Q-function (and star generally means “optimal”), but not necessarily the Q-learning algorithm.
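Worth spelling out, since the star does a lot of work here: in standard RL notation, Q* is the optimal action-value function, the unique fixed point of the Bellman optimality equation:

```latex
Q^{*}(s, a) = \mathbb{E}_{s'}\left[\, r(s, a) + \gamma \max_{a'} Q^{*}(s', a') \,\right]
```

That is, the expected return of taking action a in state s and acting optimally ever after. Q-learning is just one algorithm for estimating it; whether OpenAI’s Q* has anything to do with this notation is, of course, speculation.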
But if you have the whole world in Q-learning, you can just use your intelligence to make money and pay someone to sweep up the room.
Anyone that has played a bit with those LLMs intuitively knows this already. I prompt it all the time with chain-of-thought and other reasoning methods like “Write a truth table to check for errors in our logic”. The major issue I always arrive at is that it ends up getting stuck somewhere along its line of reasoning and needs human intervention. This happens exactly because it was never taught how to think and structure its thoughts; that was just a side-effect of language. I believe once it’s able to reason through mathematical problems with proper proofs, it will be able to generalize to any field thanks to lateral knowledge transfer. So they will just need to keep fine-tuning the model in that direction, effectively creating a feedback loop that improves its capability at reasoning correctly, so that it requires fewer parameters and less compute for the same quality. Adding new breakthroughs on top, such as a bigger context window, AGI is just a matter of quantity and quality of the same technique.
Just run that thing in a loop, because that’s how thinking happens: it’s a trial-and-error process. Then fine-tune it to be better at trial and error, instead of simply at giving seemingly useful answers. We were being lazy by tuning it towards being useful quickly, without caring about how it gets there in the first place.
It is already AGI, but it’s severely misaligned, just like GPT-3 was impressive before Chat fine-tuning. Now, we are fine-tuning Chat as Q*. It’s just a step.
After Q*, it will probably be fine-tuned for improvement at further generalization, instead of simply the domain of math/programming.
This will be tricky to train: humans don’t generate textual content for the sake of thinking through it (perhaps only mathematical proofs get there), and doing so is extremely time-consuming. Because we make assumptions about the reader’s pre-existing intelligence, we convey information through text without ever showing our full thought process.
In other words, we are truly starting to fine-tune it for using text for thinking, not simply generating cute answers to fool humans. This may seem obvious, but I don’t think people get this.
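A toy version of the loop described above, with `llm` as a hypothetical stand-in for whatever model you are prompting (no real API implied):

```python
def refine(llm, question, rounds=3):
    """Naive generate-critique-revise loop. llm(prompt) -> str is a
    hypothetical text-completion function, not a real API."""
    answer = llm(f"Think step by step, then answer:\n{question}")
    for _ in range(rounds):
        critique = llm(
            f"Question: {question}\nProposed reasoning:\n{answer}\n"
            "Write a truth table to check for errors in this logic."
        )
        if "no errors" in critique.lower():
            break  # crude stopping rule; a trained verifier would sit here
        answer = llm(
            f"Question: {question}\nPrevious attempt:\n{answer}\n"
            f"Critique:\n{critique}\nRewrite the reasoning, fixing the errors."
        )
    return answer
```

The weak link is the stopping rule: swapping the string check for a trained verifier is where something like the process supervision of “Let’s Verify Step by Step” (linked above) would plug in.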
I was in two minds about whether to take the Q* thing seriously until you posted about it. Now I accept that it is at least not just sensational hype. Thanks for keeping us up to date!
This channel outpaces ANY other AI news channel on YouTube in quality. The way you try your best to keep the hype out and reduce the amount of speculation is really something to be proud of, and really what makes your content so different from other creators.
Yours, sir, is the only channel on the topic where I am happy to watch (and like) every video. ❤
Cheers from Brazil!
Thanks so much cai
At 17:20 Lukasz Kaiser says multi-modal chain of thought would basically be a simulation of the world. Unpacking this, you can think of our own imaginations as essentially a multi-modal “next experience predictor”, which we run forwards as part of planning future actions. We imagine a series of experiences, evaluate the desirability of those experiences, and then make choices to select the path to the desired outcome. This description of human planning sounds a lot like Q-learning: modeling the future experience space as a graph of nodes, where the nodes are experiences and the edges are choices, then evaluating paths through that space based on expected reward. An A* algorithm could also be used to navigate the space of experiences and choices, possibly giving rise to the name Q*. But it’s been many years since I formally studied abstract pathfinding as a planning method for AI, and as far as I can tell from googling just now over my morning coffee, the A* algorithm would not be an improvement over the Markov decision process traditionally used to map the state space underlying Q-learning.
My extrapolation gets a bit muddy at that point, but maybe there’s something there. To me, a method that allows AI to choose a path to a preferred future experience would seem a valuable next step in AI development, and a possible match for both the name Q* and the thoughts of a researcher involved with it.
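The simplest possible form of that “next experience predictor” planning, assuming hypothetical `model`, `reward`, and `value` functions (none of these are confirmed components of anything):

```python
def plan_one_step(state, actions, model, reward, value, gamma=0.99):
    """Greedy one-step lookahead with a learned world model.
    model(s, a) -> imagined next state, reward(s, a) -> float, and
    value(s) -> desirability score are all hypothetical stand-ins."""
    def imagined_return(a):
        s_next = model(state, a)  # "imagine" the outcome of action a
        return reward(state, a) + gamma * value(s_next)
    # Pick the action whose imagined future looks best.
    return max(actions, key=imagined_return)
```

Deeper planning would chain these imagined steps into the graph of experiences and choices described above.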
18:49 I believe Q* is a reference to the “A* search algorithm” in graph theory. Machine learning is fundamentally described by graph theory, and an algorithm like A* (which finds a least-cost path through a graph as efficiently as possible) would make total sense.
It was also my thought when I heard the algorithm name. It is basically a cost-minimization algorithm for reaching a target node. The difficult part in this context is figuring out what heuristic to use to evaluate whether one step of reasoning is closer to answering the question than another. Maybe that’s where the Q-learning policy plays a role.
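For reference, the one-line version of A*: it always expands the frontier node minimizing

```latex
f(n) = \underbrace{g(n)}_{\text{cost paid so far}} + \underbrace{h(n)}_{\text{estimated cost to go}}
```

and it is guaranteed to find an optimal path when h never overestimates the true remaining cost. In the reasoning-as-search reading, a learned verifier or Q-function would be supplying h(n), which is exactly the hard part pointed out above; that pairing is speculation, not anything confirmed.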
The New York Times or another major newspaper should hire you, seriously. The amount and quality of research and the way you explain and convey AI news and information is truly remarkable. You are currently my favourite yt channel.
Thanks so much gm, too kind
A* is an algorithm mainly used in pathfinding, which works very similarly to what you described as Q. Imagine the idea landscape as a set of information you need to search through to find a path to the answer. That is what I think they mean by Q*.
“You need to give the model the ability to think longer than it has layers” is what really sticks with me; it’s such an obvious next step for LLMs, which currently run in constant time per token. Let’s see where this leads!
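One simple way to let a model “think longer than it has layers” is to spend more compute at inference time, roughly the idea in the test-time compute paper linked above. A minimal best-of-n sketch, with `llm` and `verifier` as hypothetical stand-ins:

```python
def best_of_n(llm, verifier, question, n=16):
    """Sample n independent chain-of-thought answers and return the one
    the verifier scores highest. llm(prompt) -> str and
    verifier(question, answer) -> float are assumptions, not real APIs."""
    samples = [llm(f"Think step by step:\n{question}") for _ in range(n)]
    return max(samples, key=lambda ans: verifier(question, ans))
```

Same model, same layers; quality scales with how many samples you are willing to pay for.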