The three questions every CIO should ask about a vendor accuracy claim

Overview

Episode 9 of Agent Mode AI. Abby and Avery walk through AM-146, the claim that vendor "ready-to-run" positioning without a named task, a named baseline, and a named methodology is procurement-deck noise rather than procurement evidence. The procurement-grade reference shapes in 2026 are the academic-benchmark layer (CRMArena-Pro's 35% multi-step reliability, the CMU TheAgentCompany 30-35% reproduction range, WebArena's ~36% browser-agent ceiling, SWE-bench Verified for code generation) and the Anthropic Claude for Chrome disclosure pattern (a 23.6% pre-mitigation attack success rate, 11.2% post-mitigation, and 0% on URL-injection variants after patches). A third class sits alongside these: the named-customer audited deployment, with McKinsey Lilli, JPMorgan, BT Now Assist, and the UK Government Digital Service as the canonical references.

Sources cited:
- CRMArena-Pro paper, Salesforce AI Research, August 2025
- Carnegie Mellon TheAgentCompany academic benchmark
- WebArena academic benchmark
- SWE-bench Verified
- Anthropic's published security disclosure on Claude for Chrome, 26 August 2025
- McKinsey internal Lilli platform deployment data
- JPMorgan Chase 2023 AI value disclosure
- BT Now Assist deployment, Hena Jalil
- UK Government Digital Service, Q4 2024

Claims tracked:
- AM-146: Three accuracy-disclosure questions for procurement (agentmodeai.com/holding/?claim=AM-146)
- AM-009: Claude for Chrome procurement-grade disclosure pattern (agentmodeai.com/holding/?claim=AM-009)
- AM-140: Procurement-committee pre-pilot questions (agentmodeai.com/holding/?claim=AM-140)

Newsletter and the full Holding-up ledger: agentmodeai.com