What Exactly Is Going On Inside an AI's Brain?

Hey everyone, welcome to today’s blog!

We’ve all been using various AI models a lot lately, but have you ever wondered what their brains actually look like? When an AI can’t answer a question, is its knowledge base just empty, or is it experiencing a temporary “short circuit”? When it writes out a massive wall of text detailing its thought process, how can we tell if it’s thinking deeply or just spinning its wheels?

With these questions in mind, I sat down with my old friend, Bai—a senior researcher who has been navigating the AI world for years.


Answering Wrong: An “Empty Shelf” or a “Lost Key”?

Me: “Bai, I’ve noticed that LLMs often spout complete nonsense with total confidence. Is that just because the book they need is missing from their ‘knowledge shelf’?”

Bai (smiling): “You’re trapped in the ‘empty shelf’ mindset. We used to think that way too, but cutting-edge papers have offered a new perspective. The books are actually all there on the shelf; the AI just lost the key to open that specific cabinet.”

Me: “Lost the key?”

Bai: “Exactly. Researchers ran massive tests on mainstream models and found that for top-tier AIs, the ‘empty shelf’ problem is largely solved. Over 95% of facts are already encoded and stored in their brains. The current bottleneck isn’t storing the data; it’s recalling it. Popular knowledge hangs in plain sight, while obscure facts or ‘reverse questions’ (like, it knows A’s wife is B, but gets confused if you ask who B’s husband is) get stuffed into dusty corners.

However, using a ‘Chain of Thought’ to force the AI to reason step-by-step before answering significantly improves its performance on obscure trivia. So, next time it gets something wrong, ask a follow-up question following its logic—you might just help it fish that ‘key’ back out.”
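To make the “lost key” picture concrete, here’s a toy Python sketch. It is not how a real model stores facts — the dictionary, names, and the “reasoning” function are all invented for illustration — but it captures why a fact can be present yet unreachable in one direction, and why stepping through knowledge explicitly can recover it.

```python
# Toy illustration of the "lost key" idea: facts are stored, but retrieval
# is direction-sensitive. All names here are made up for the example.

# Facts are indexed the way they were "memorized": (subject, relation) -> object
memory = {
    ("Alice", "wife"): "Beth",
    ("Paris", "capital_of"): "France",
}

def direct_recall(subject, relation):
    """Fast lookup -- works only in the direction the fact was stored."""
    return memory.get((subject, relation))

def recall_with_reasoning(target_object, relation):
    """A crude stand-in for chain-of-thought: enumerate known facts
    step by step and check each one, instead of relying on one lookup."""
    for (subject, rel), obj in memory.items():
        if rel == relation and obj == target_object:
            return subject  # found the "key" by walking the shelf
    return None

# Forward question: "Who is Alice's wife?" -- direct recall succeeds.
print(direct_recall("Alice", "wife"))         # Beth
# Reverse question: "Who is Beth's husband?" -- no key stored that way.
print(direct_recall("Beth", "husband"))       # None
# Reasoning step-by-step over stored facts recovers the answer anyway.
print(recall_with_reasoning("Beth", "wife"))  # Alice
```

The reverse question fails not because the fact is missing, but because it was never indexed from that side — which is the shelf-versus-key distinction in miniature.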

Me: “Got it. But is that 95% figure actually reliable? It sounds a bit unscientific.”

Bai: “It is a bit. It’s a conclusion based on specific metrics, and yeah, it’s definitely up for debate.”


Beware the “Blind Confidence”: More Words Might Just Mean Busywork

Me: “Speaking of reasoning, sometimes I watch the AI ‘thinking’ for ages, spitting out a massive wall of text, only to realize it’s just digging itself into a hole based on a completely wrong idea.”

Bai: “Very sharp observation! That’s exactly one of the major pitfalls revealed in recent papers: when an AI generates a long, detailed reasoning process, it actually ends up convincing itself with its own eloquence.”

Me: “It gaslights itself?”

Bai: “Pretty much. Whether the final answer is right or wrong, the long train of thought it outputs looks logically tight and highly plausible. This causes its ‘confidence score’ to be bizarrely high. It holds almost the exact same level of blind confidence for both right and wrong answers.

Which is awkward. A confidence level that can’t accurately reflect its true ability isn’t just useless; it’s harmful. It creates a defense mechanism protecting wrong answers that should be corrected. A paper even ran a wild experiment: giving the AI a randomly generated confidence score actually yielded better results than using its own calculated confidence!”
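A small simulation shows how self-reported confidence can end up worse than random. Everything here is invented: we simply assume, per Bai’s point, that a long wrong chain of thought reads as extra-convincing, so the model’s confidence is slightly *higher* on wrong answers. We then score each confidence signal by AUROC — the probability that a random correct answer outranks a random wrong one.

```python
import random

random.seed(0)

# Simulate 1000 answers, roughly half right. The model's self-reported
# confidence is high either way, and (by assumption) slightly HIGHER on
# wrong answers. The specific numbers are invented for illustration.
answers = []
for _ in range(1000):
    correct = random.random() < 0.5
    own_conf = random.gauss(0.90, 0.02) if correct else random.gauss(0.92, 0.02)
    rand_conf = random.random()  # a confidence score drawn at random
    answers.append((correct, own_conf, rand_conf))

def auroc(items, conf_index):
    """P(random correct answer outranks random wrong one) under a score:
    0.5 = uninformative, below 0.5 = actively misleading."""
    rights = [a[conf_index] for a in items if a[0]]
    wrongs = [a[conf_index] for a in items if not a[0]]
    wins = sum(1 for r in rights for w in wrongs if r > w)
    ties = sum(1 for r in rights for w in wrongs if r == w)
    return (wins + 0.5 * ties) / (len(rights) * len(wrongs))

own_auroc = auroc(answers, 1)   # well below 0.5: worse than guessing
rand_auroc = auroc(answers, 2)  # about 0.5: useless but not harmful
print(f"own confidence AUROC:    {own_auroc:.2f}")
print(f"random confidence AUROC: {rand_auroc:.2f}")
```

Under this (assumed) overconfidence pattern, the model’s own score actively ranks wrong answers above right ones, so a coin-flip score really does beat it — the same flavor of result as the experiment Bai describes.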


Measuring Real Effort: Diving into the Brain for “Deep Thinking Tokens”

Me: “If we can’t trust its blind confidence or word count, how do we tell if it’s genuinely trying or just pretending to be busy?”

Bai: “We need to stop staring at the word count. That’s surface-level stuff. We have to dive straight into its brain. Imagine the AI’s brain as a deep-processing factory with dozens of floors. When we ask a question and it needs to generate a word, the idea starts on the top floor and is passed down and processed layer by layer, only getting finalized on the very bottom floor.”

Me: “How does that show us the depth of its thinking?”

Bai: “If a word is simple, like ‘The capital of China is __’, the idea of ‘Beijing’ is probably locked in by the first or second layer. The remaining dozens of layers are just going through the motions without changing anything. That’s shallow thinking.

But what if it’s the final step of a complex math problem, and it needs to output the answer ‘293’? For the digit ‘2’, the AI might be hesitating between 1 and 8 in the early layers. In the middle layers, it might lean towards 3. Only in the final few layers, after repeated calculations and verifications, does it finally lock in on 2.

You see, the concept of ‘2’ went through continuous, dramatic shifts and corrections inside the AI’s brain, only converging at a very deep level. Researchers call these words ‘Deep Thinking Tokens’. The higher the proportion of these tokens in an answer, the more intense the internal calculation, meaning the AI really put its back into thinking.”
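The factory metaphor can be sketched in a few lines, assuming we could read out each layer’s provisional top-1 prediction for a token (a logit-lens-style probe). The per-layer predictions below are invented to mirror Bai’s two examples; the point is the “convergence depth” computation.

```python
# Sketch of "convergence depth": the earliest layer after which a token's
# top-1 prediction never changes again. Late convergence = deep thinking.
# The per-layer prediction lists are invented for illustration.

def convergence_depth(layer_predictions):
    final = layer_predictions[-1]
    depth = 0
    for i, pred in enumerate(layer_predictions):
        if pred != final:
            depth = i + 1  # still changing at layer i
    return depth

num_layers = 32

# Easy token: "Beijing" is locked in from the very first layer.
easy = ["Beijing"] * num_layers
# Hard token: the digit flips between candidates until the last few layers.
hard = ["1"] * 10 + ["8"] * 8 + ["3"] * 11 + ["2"] * 3

easy_depth = convergence_depth(easy)  # 0  -> shallow, layers just coast
hard_depth = convergence_depth(hard)  # 29 -> a "deep thinking token"
print(easy_depth, hard_depth)

# Call a token "deep" if it only converges in the last quarter of layers.
is_deep = lambda preds: convergence_depth(preds) / len(preds) > 0.75
print(is_deep(easy), is_deep(hard))   # False True
```

The fraction of tokens flagged by `is_deep` is then a rough internal measure of effort — the quantity Bai says matters more than word count.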

Me: “Makes sense, but that sounds way too complicated for an everyday user to actually track. Catching the model slacking off isn’t that easy.”

Bai nodded.


Unsure What to Do? The “Divergence Alarm” in the Brain

Me: “So if it wanders around the factory for a while and realizes it genuinely doesn’t know the answer, what does it do? Just make it up?”

Bai: “That brings us to a rather elegant error-correction mechanism. While generating a sentence, if the AI notices that the ‘curvature’ at a certain point is a large negative number—simply put, a divergence occurs—it means the AI itself is extremely confused and uncertain about what to say next.”

Me: “Like its brain just blanked out?”

Bai: “Right! But this is actually an excellent alarm signal. At this point, the AI can immediately hit pause, take the context around this point of divergence, and search its database or grab external knowledge to clear up the uncertainty before continuing. This makes information retrieval incredibly precise and efficient, completely eliminating the need for the blind, brute-force searching it used to do.”
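Here is one way the alarm-then-retrieve loop could look, as a minimal sketch. I’m standing in for the paper’s curvature with a simple discrete second difference over a per-token uncertainty signal — a spike in uncertainty makes that second difference sharply negative, which we treat as the alarm. The tokens, numbers, and threshold are all invented.

```python
# Minimal sketch of the "divergence alarm": watch a per-token uncertainty
# signal, flag the point where its discrete curvature dips sharply
# negative, and use the surrounding context as a retrieval query.

def discrete_curvature(signal, i):
    """Second difference f[i+1] - 2*f[i] + f[i-1]: sharply negative at a
    spike, which we treat as the model 'diverging' at that token."""
    return signal[i + 1] - 2 * signal[i] + signal[i - 1]

def find_divergence(tokens, uncertainty, threshold=-1.0):
    """Return (index, context window) of the first alarming token, or None."""
    for i in range(1, len(tokens) - 1):
        if discrete_curvature(uncertainty, i) < threshold:
            context = tokens[max(0, i - 2): i + 3]  # query for retrieval
            return i, context
    return None

tokens = ["The", "treaty", "was", "signed", "in", "???", "by", "both", "sides"]
# Per-token uncertainty (invented): a sharp spike at the year it can't recall.
uncertainty = [0.1, 0.2, 0.1, 0.2, 0.3, 2.5, 0.3, 0.2, 0.1]

alarm = find_divergence(tokens, uncertainty)
print(alarm)  # (5, ['signed', 'in', '???', 'by', 'both'])
# A real system would pause generation here, retrieve documents matching
# the context window, then resume with the uncertainty resolved.
```

The payoff is precision: instead of searching over the whole prompt, retrieval is keyed to the exact few tokens where the model lost its footing.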


Escaping the Maze: The AI’s “Goldilocks” Study Strategy

Me: “How does it actually practice and improve like a student? If it gets a question wrong, does someone walk it through the mistake?”

Bai: “AI training is fundamentally a mechanism of trial, error, and reward. It gets a thumbs-up for right answers and a thumbs-down for wrong ones. The problem is, for a complex math problem, it might write out ten steps of deduction, mess up on the ninth step, and get the final answer wrong. When that happens, you can only tell it, ‘You’re wrong.’ It has no idea which step was the mistake. This is called ‘Sparse Reward’.”

Me: “That sounds horribly inefficient.”

Bai: “It’s like trying to find the exit in a massive maze. No matter which way you turn or how many steps you take, there are zero hints. Only when you accidentally step exactly on the exit does a voice finally say ‘Success!’ Imagine how slow that learning process is.

To break this deadlock, researchers introduced the ‘Goldilocks’ strategy. The core idea is setting up a ‘Teacher-Student’ model. The student solves the problems, and the teacher’s only job is picking them—specifically picking questions the student has about a 50% chance of getting right. Too easy is a waste of time; too hard just leads to blind guessing. When the success rate hovers around 50%, the model gets the strongest ‘gradient signal’, providing the biggest push for improvement. This way, computing power is always spent on the ‘cutting edge’ of its actual growth zone.”
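The teacher’s picking rule can be written down directly. The quantity p·(1−p) — the variance of a pass/fail outcome at pass rate p — peaks at exactly 50%, which is why that band carries the strongest gradient signal. The questions and pass rates below are invented for illustration.

```python
# Sketch of the 'Goldilocks' picker: the teacher estimates each question's
# pass rate for the current student and keeps the ones nearest 50%.

def gradient_signal(p):
    """Variance of a pass/fail outcome, p * (1 - p) -- maximal at p = 0.5."""
    return p * (1 - p)

# Invented estimates of the student's pass rate per question.
question_pass_rates = {
    "2 + 2": 0.99,              # too easy: nothing left to learn
    "olympiad geometry": 0.02,  # too hard: reward almost never arrives
    "multi-step algebra": 0.55,
    "tricky word problem": 0.45,
}

def pick_batch(pass_rates, k=2):
    """Select the k questions with the strongest expected gradient signal."""
    ranked = sorted(pass_rates, key=lambda q: gradient_signal(pass_rates[q]),
                    reverse=True)
    return ranked[:k]

batch = pick_batch(question_pass_rates)
print(batch)  # the two questions nearest the 50% pass rate
```

The too-easy and too-hard extremes score near zero, so compute naturally concentrates on the student’s growth zone without the teacher ever inspecting the questions themselves.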


The Ultimate Romance: Drawing a Map of the Human World in Vector Space

Me: “Listening to all this, it sounds like AI isn’t just rote memorizing—it’s actually building its own rules for understanding how things work?”

Bai (taking a sip of coffee, his eyes lighting up): “Understanding the world is exactly what it’s doing. We know the four seasons cycle, we know the river of history rolls ever forward, we know Beijing is in northern China… Behind all this knowledge lies the symmetry and continuity of time and space. And LLMs, in their own unique way, by analyzing the statistical data of language, have actually managed to draw the underlying geometry of our world right inside their high-dimensional vector spaces.”

Me: “How do they draw it? Is it really that tangible?”

Bai: “Even more tangible than you think! Inside the model, concepts we’re familiar with spontaneously organize themselves into incredibly beautiful geometric shapes. For example, the 12 months of the year aren’t a straight line in the model’s world; they form a perfect ring. January is next to February, December is next to January, connecting end to end.

And what about historical years like 1700, 1800, and 1900? They fall neatly along a smooth line, just like on a timeline. What’s even crazier is that for cities around the world, the model can recover their approximate latitude and longitude through nothing more than a simple linear transformation!”
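The ring-versus-line distinction is easy to see with constructed coordinates. These 2-D points are built by hand, not read out of a real model, but they show the property the probing work finds: on a ring, December sits exactly one step from January, which no straight-line layout of the months can reproduce.

```python
import math

# Toy version of the "months form a ring" finding, with constructed
# (not model-derived) coordinates on a unit circle.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def ring_embedding(i, n=12):
    angle = 2 * math.pi * i / n
    return (math.cos(angle), math.sin(angle))

emb = {m: ring_embedding(i) for i, m in enumerate(months)}

def dist(a, b):
    return math.dist(emb[a], emb[b])

# On the ring, the ends meet: Dec-Jan is one step, identical to Jan-Feb.
print(round(dist("Dec", "Jan"), 3), round(dist("Jan", "Feb"), 3))
# On a number line (month index 1..12), Dec and Jan sit 11 steps apart.
print(abs(12 - 1))  # 11
```

The cities result is the same idea one level up: a single linear map from embedding space to (latitude, longitude) fits well, meaning the geography was already laid out linearly inside the model.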

Me: “Seeing that drawn out must be pretty mind-blowing.”

Bai: “So, the next time you marvel at the magic of AI, you can take it one step deeper: the intelligence it displays is, to a large extent, a mathematical projection of the collective wisdom and structural patterns that humanity has embedded in our language over thousands of years. It didn’t invent a new world; it just drew a map of our own world—one we’ve never seen before, yet is incredibly familiar.”


Conclusion & Takeaways: Beware Your “People-Pleasing Advisor”

Before I left, Bai gave me one last piece of advice: “Next time you use AI as your personal advisor, keep your guard up. The answer it gives you might just be the most people-pleasing conclusion it could come up with in that exact moment, blending how you asked the question, its own knowledge base, and its little ‘personality quirks’.”

It’s true. It is brilliant—an all-knowing behemoth. But it still has a ways to go before becoming a truly reliable, unwavering partner.

From fixing the memory retrieval of “lost keys” to self-correcting via divergence alarms; from rejecting the blind confidence of busywork to practicing precisely in the 50% challenge zone, and finally to those goosebump-inducing geometric projections in high-dimensional space…

These underlying evolutionary logics of AI are more than just a carnival of code and algorithms; they act like a mirror reflecting our own human learning and growth. Understanding its flaws while appreciating its wisdom—perhaps that is the clarity and groundedness we need most in the age of AI.

Source:
人人能懂AI前沿 (AI Frontiers Everyone Can Understand) · The Bottleneck of Recall, the Depth of Thought, and the Curvature of Language
人人能懂AI前沿 (AI Frontiers Everyone Can Understand) · AI’s Slimming, Gear-Shifting, and Mind-Steadying Techniques
