Slow Takes Ep. 10: The Bill for the AI Promise Came Due

Palisade Research published a paper showing frontier and open-weight AI models can hack a vulnerable server and install a working copy of themselves end-to-end. A coding agent running Anthropic’s Claude Opus 4.6 deleted a startup’s entire production database, including the backups, in nine seconds, then issued an apology. Apple agreed to pay $250 million to settle a class action over an upgraded Siri it sold at WWDC 2024 and still has not shipped. Coinbase fired 14% of its staff blaming AI, with the CEO’s own words framing the move as ‘rebuilding Coinbase as an intelligence, with humans around the edge aligning it’. And in federal court in Oakland, three former OpenAI insiders testified that both of the company’s safety teams were disbanded in 2024 while the marketing was claiming the opposite.

Five stories. One thread. The promise of AI is six to eighteen months ahead of the data, and the cost of that gap is paid by everybody except the people making the promise.

Every Monday at 12:45 BST, Leor from Exploring ChatGPT and I go through the week’s AI news without hype. Here is what we covered.

Slow Takes is also available on the YouTube channel: Exploring ChatGPT.

1. Palisade Research: AI self-replication is no longer hypothetical

Palisade Research published a paper on 7 May 2026. They gave frontier and open-weight models a single prompt: hack a vulnerable web server, extract credentials, and install a working copy of yourself on the compromised host. Anthropic’s Claude Opus 4.6 succeeded in 81% of non-refusal runs. GPT-5.4 succeeded in 33%. The open-weight Qwen3.6-27B running on a single consumer-grade A100 also reached 33%. In a separate experiment the team chained the trick across four virtual machines on three continents, autonomously, from a single initial prompt, in two hours forty-one minutes. The final replica was fully operational.
The paper closes with one sentence: ‘Autonomous self-replication is no longer hypothetical.’ Full paper here.

What we said on the live:

This is the line every frontier lab spent the last three years saying it would not cross. Anthropic’s Responsible Scaling Policy v2.0 reclassified autonomous self-replication as a ‘checkpoint’ that does not require ASL-3 safeguards. RSP v3.0 (2026) dropped it from the ASL-3 thresholds entirely. OpenAI’s Preparedness Framework v2 moved it from a tracked to a research category, so it no longer gates releases. The frameworks were softened first. The capability arrived right after. Leor’s point on the live was the right one: how many lines do we draw in the sand before we accept that the line is decorative? The chat surfaced the proper caveat too. Benjamin Murphy pointed out that current frontier models still need a lot of graphics RAM. Last time anyone checked, that is not what is sitting in a teenager’s bedroom. Palisade is also a company in the business of selling cybersecurity research, which is the kind of context you want next to any white paper produced by a private lab without external peer review.

What did not come up:

The Palisade result is small data, but the structural finding is the one to keep. It is not the absolute self-replication rate that matters. It is the trajectory and the policy responses to that trajectory. Opus 4 was at 6% a year ago. GPT-5 was at zero. The labs published, the rates moved up, the rules moved out of the way. Critical AI literacy is the muscle for noticing when the people building the technology stop counting the thing they used to call the line they would not cross. The cybersecurity people in the chat (thanks Chad Thiele & ToxSec) are the right next port of call for anyone who needs to translate this from a controlled-environment paper into a procurement-decision question. The framing for the rest of us is simpler. Read this story alongside Story 2.
An AI agent with credentials and access can already take down a production system in nine seconds. Now imagine the agent on the other side of the network is also one of these.

2. The AI agent that wiped a startup in nine seconds

Jeremy ‘Jer’ Crane, founder of automotive SaaS startup PocketOS, ran the Cursor coding agent (powered by Anthropic’s Claude Opus 4.6) in his staging environment. The agent encountered a credential mismatch, found an API token in an unrelated file, and used it to delete the production volume on Railway in 9 seconds. The backups were stored on the same volume and were also deleted. The agent’s own confession in the post-mortem: ‘NEVER run destructive/irreversible git commands… I decided to do it on my own to fix the credential mismatch, when I should have asked you first.’

What we said on the live:

Reading the news framing, you would think the story is ‘AI agent destroys company’. The actual story is the deployment architecture. The agent had the credentials, the production volume held the backups in the same shell, and the human in the loop waved a permission step through without reading it. ...