The rise of reasoning machines

https://www.interconnects.ai/p/the-rise-of-reasoning-machines

Note: voiceover coming later in the day. I may fix a couple of typos then too.

A sufficiently general definition of reasoning I've been using is:

Reasoning is the process of drawing conclusions by generating inferences from observations.

Ross Taylor gave this definition on his Interconnects Interview, which I re-used in my State of Reasoning recap to start the year (and he has expanded upon it on his YouTube channel). Reasoning is a general space of behaviors or skills that can be expressed in many different ways. At the same time, reasoning for humans is naturally tied to experiences such as consciousness and free will.

In the case of human brains, we collectively know very little about how they actually work. We know the subjective experience of our reasoning extremely well, of course, but we hardly know the mechanistic processes at all.

When it comes to language models, we're coming at it from a somewhat different angle. We know the processes we took to build these systems, but we don't really know "how deep learning works" mechanistically. The missing piece is that we don't have a deep sense of the subjective experience of an AI model like we do with ourselves. Overall, the picture is quite similar.

To set the stage for why this post is needed now, even as reasoning model progress has been rampaging across the technology industry in 2025: last week, an Apple paper titled The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity reignited the "reasoning debate" with newfound vigor.

Some of the key examples in the paper, other than traditional reasoning evaluations such as MATH-500, were scaled-up versions of toy problems that the AIs struggled to solve. These are problems whose complexity one can increase programmatically.

The argument was that language models cannot generalize to higher-complexity problems. On one of these toy problems, the Tower of Hanoi, the models structurally cannot output enough tokens to solve the problem, yet the authors still took this as a claim that "these models cannot reason" or "they cannot generalize." This is a small scientific error.
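To make the token-budget point concrete: the optimal Tower of Hanoi solution for n disks takes 2^n - 1 moves, so a model asked to write out every move runs out of output tokens at modest disk counts no matter how well it understands the puzzle. Below is a minimal back-of-the-envelope sketch, assuming roughly five tokens per written move and a 64k-token output window (both illustrative numbers, not from the paper).

```python
# Back-of-the-envelope sketch: optimal Tower of Hanoi solutions take 2^n - 1
# moves for n disks, so a transcript that writes out every move blows past a
# fixed output budget long before "reasoning ability" is the binding constraint.
# TOKENS_PER_MOVE and OUTPUT_LIMIT are assumed, illustrative numbers.

def hanoi_moves(n: int) -> int:
    """Number of moves in the optimal n-disk Tower of Hanoi solution."""
    return 2**n - 1

TOKENS_PER_MOVE = 5      # e.g. "move disk 3 from A to C"
OUTPUT_LIMIT = 64_000    # a generous output window, in tokens

for n in range(5, 31, 5):
    tokens = hanoi_moves(n) * TOKENS_PER_MOVE
    verdict = "fits" if tokens <= OUTPUT_LIMIT else "cannot fit"
    print(f"{n:>2} disks: {hanoi_moves(n):>13,} moves, ~{tokens:>14,} tokens -> {verdict}")
```

Under these assumptions, the full move list alone exceeds the output window somewhere around 15 disks. That is the structural limit referenced above, and it holds independent of any claim about whether the model can reason.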
The paper does do some good work in showing the limitations of current models (and methods generally) when it comes to handling complex questions. In many ways, answering those with a single chain of thought is unlikely to ever actually work, but they could be problems that the model learns to solve with code execution or multiple passes referencing internal memory. We still need new methods or systems, of course, but that is not a contribution to the question of whether language models can reason. Demonstrating the existence of a trait like reasoning only needs small, contained problems; showing individual failures cannot be a proof of absence.

This summary of the paper, written by o3-pro for fun, sets up the argument well:

The presence of a coherent-looking chain-of-thought is not reliable evidence of an internal reasoning algorithm; it can be an illusion generated by the same pattern-completion process that writes the final answer.

The thing is, the low-level behavior isn't evidence of reasoning. A tiny AI model or program can create sequences of random strings that look like chains of thought. The evidence of reasoning is that these structures are used to solve real tasks.

That the models we use are imperfect is not a conclusive argument that they cannot perform the behavior at all. We are dealing with the first generation of these models. Even humans, who have been reasoning for hundreds of thousands of years, still show complete illusions of reasoning. I, for one, benefited in my coursework days from regurgitating a random process for solving a problem from my repertoire to trick the grader into giving me a substantial amount of partial credit.

Another point the paper makes is that on the hardest problems, AI models will churn through thinking for a while but then suddenly collapse, even when compute is left. Back to the test-taking analogy: who doesn't remember the drama of a middle-of-the-pack classmate leaving early during a brutally hard exam because they knew they had nothing left? Giving up and pivoting to a quick guess almost mirrors human intelligence too.

This all brings us back to the story of human intelligence. Human intelligence is the existence proof that has motivated modern efforts in AI for decades. The goal has been to recreate it.

Humans have long been drawn to nature for inspiration in their creations. We long sought flying machines inspired by nature's most common flying instrument, flapping wings, by building ornithopters.

Let's remember how that turned out. The motivation is surely essential to achieving our goal of making the thing, but the original goal ...
