
Some ideas for what comes next


https://www.interconnects.ai/p/summertime-outlook-o3s-novelty-coming

Summer is always a slow time for the tech industry. OpenAI seems fully in line with this, with their open model “[taking] a little more time” and GPT-5 seemingly always delayed a bit more. These will obviously be major news items, but I’m not sure we see them until August.

I’m going to take this brief reprieve in the bombardment of AI releases to reflect on where we’ve been and where we’re going. Here’s what you should know.

1. o3 as a technical breakthrough beyond scaling

The default story around OpenAI’s o3 model is that they “scaled compute for reinforcement learning training,” which caused some weird, entirely new over-optimization issues. This is true, and the plot from the livestream of the release still represents a certain type of breakthrough — namely scaling up data and training infrastructure for reinforcement learning with verifiable rewards (RLVR).

The part of o3 that isn’t talked about enough is how different its search feels. For a normal query, o3 can look at 10s of websites. The best description I’ve heard of its relentlessness en route to finding a niche piece of information is akin to a “trained hunting dog on the scent.” o3 just feels like a model that can find information in a totally different way than anything out there.

The kicker with this is that we’re multiple months out from its release in April of 2025 and no other leading lab has a model remotely like it. In a world where releases between labs, especially OpenAI and Google, seem totally mirrored, this relentless search capability in o3 still stands out to me.

The core question is when another laboratory will release a model that feels qualitatively similar.
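To make the RLVR idea concrete, here is a minimal sketch of a verifiable reward function with a small shaping term that keeps search-tool use incentivized during training. This is my own illustrative construction, not any lab’s actual recipe: the `Rollout` record, the bonus values, and the exact-match verifier are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """A hypothetical record of one RL episode: the model's final
    answer and how many search-tool calls it made along the way."""
    final_answer: str
    tool_calls: int


def rlvr_reward(rollout: Rollout, gold_answer: str,
                tool_bonus: float = 0.05, max_bonus_calls: int = 3) -> float:
    """Binary verifiable reward, plus a small, capped bonus for tool
    use so the policy doesn't learn to abandon search early in
    training. The bonus only applies on correct answers, so the
    policy can't farm it with useless searches."""
    correct = 1.0 if rollout.final_answer.strip() == gold_answer.strip() else 0.0
    bonus = tool_bonus * min(rollout.tool_calls, max_bonus_calls) if correct else 0.0
    return correct + bonus
```

For example, a correct answer found via two searches scores slightly above a correct answer found with none, while a wrong answer scores zero regardless of tool use. Real verifiers are of course richer than exact string match, but the shaping trade-off is the point: without some pressure like this, the tool-use rate can collapse as training goes on.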
If this trend goes on through the end of the summer, it’ll be a confirmation that OpenAI had some technical breakthrough to increase the reliability of search and other tool-use within reasoning models.

For a contrast, consider basic questions we are facing in the open and academic community on how to build a model inspired by o3 (so something more like a GPT-4o or Claude 4 in its actual search abilities):

* Finding RL data where the model is incentivized to search is critical. It’s easy in an RL experiment to tell the model to try searching in the system prompt, but as training goes on, if the tool isn’t useful, the model will learn to stop using it (very rapidly). It is likely that OpenAI, particularly combined with lessons from Deep Research’s RL training (which, I know, is built on o3), has serious expertise here. A research paper showing DeepSeek R1-style scaled RL training along with consistent tool-use rates across certain data subsets would be very impressive to me.

* The underlying search index is crucial. OpenAI’s models operate on a Bing backend. Anthropic uses Brave’s API and struggles for it (lots of SEO spam). Spinning up an academic baseline with these APIs is a moderate additive cost on top of compute.

Once solid open baselines exist, we could do fun science such as studying which model generalizes best to unseen data stores — a crucial feature for spinning up a model on local, sensitive data, e.g. in healthcare or banking.

If you haven’t been using o3 for search, you really should give it a go.

2. Progress on agents will be higher variance than modeling was, but often still extremely rapid

Claude Code’s product market fit, especially with Claude 4, is phenomenal. It’s the full package for a product — works quite often and well, a beautiful UX that mirrors the domain, good timing, etc.
It’s just a joy to use.

With this context, I really have been looking for more ways to write about it. The problem with Claude Code, and other coding agents such as Codex and Jules, is that I’m not in the core audience. I’m not regularly building in complex codebases — I’m more of a research manager and fixer across the organization than someone who is building in one repository all the time — so I don’t have practical guides on how to get the most out of Claude Code, or a deep connection with it that can help you “feel the AGI.”

What I do know about is models and systems, and there are some very basic facts about frontier models that make the trajectory for the capabilities of these agents quite optimistic.

The new part of LLM-based agents is that they involve many model calls, sometimes with multiple models and multiple prompt configurations. Previously, the models everyone was using in chat windows were designed to make progress on linear tasks and return the result to the user — there wasn’t a complex memory or environment to manage.

Adding a real environment for the models has made it so the models need to do more things, and often a wider breadth of tasks. When building these agentic systems, there are two types of bottlenecks:

* The models cannot solve any of the task we ...
