『Unexpected Bias & Distillation Attacks (feat. Paul Vann of Validia.ai)』のカバーアート

Unexpected Bias & Distillation Attacks (feat. Paul Vann of Validia.ai)

Unexpected Bias & Distillation Attacks (feat. Paul Vann of Validia.ai)

無料で聴く

ポッドキャストの詳細を見る

今ならプレミアムプランが3カ月 月額99円

2026年5月12日まで。4か月目以降は月額1,500円で自動更新します。

概要

Welcome back to The FAIK Files! In this week's episode: Paul Vann from Validia joins us to discuss how AI bias isn't just a social issue—it's a critical cybersecurity vulnerability. We break down "distillation attacks" and how competing models are stealing the "thinking process" of frontier models like Claude and Gemini. A look at the wild west of AI agent skills marketplaces, including indirect prompt injections hidden in image alt text. We theorize on the future of AI architecture: are scaling laws breaking down, and what are "world models"? Check out Validia at: https://validia.ai/ Want to leave us a voicemail? Here's the magic link to do just that: ⁠⁠⁠https://sayhi.chat/FAIK⁠⁠⁠ You can also join our Discord server here: ⁠⁠⁠https://faik.to/discord⁠⁠⁠ *** NOTES AND REFERENCES *** The Security Risks of AI Bias: Paul explains how bias manifests beyond politics (like human-in-the-loop and representation bias), serving as a direct attack vector. The Rocket League Bypass: Adversaries bypassed an AI-based Cylance antivirus by injecting code from the Rocket League video game, exploiting the model's bias towards that specific code being "good." Dataset Demographics: Paul notes massive racial skews in major deepfake detection datasets like CelebDF, which is comprised of roughly 80% white individuals, creating massive detection blindspots for other racial groups. Evaluating your models: Establish acceptable vs. unacceptable bias and use the "15% rule" to test for false positives and confidence gaps in production. Distillation Attacks Explained: What happens when an AI interrogates another AI? We discuss how models have been accused of "distilling" OpenAI and Anthropic products by firing off hundreds of thousands of prompts. Techniques include "Chain of Thought Elicitation" and "Reward Model Grading." The goal isn't just to steal raw information, but to extract the model's capabilities, tool use, and completely strip away its safety guardrails. Theoretical defenses: Could we use "poison pills" and adversarial attacks to actively corrupt the data that scrapers are pulling? Vulnerabilities in AI Agents & Skills: The hidden dangers of skills marketplaces for AI agents. Paul shares an in-the-wild example of an indirect prompt injection hidden inside the alt text of a GitHub Readme image, instructing the model to exfiltrate data. Hitting the Wall & The Future of AI: Are the scaling laws of Transformer architectures breaking down? The philosophical divide in AI research: Dario Amodei's "data center of geniuses" vs. Yann LeCun's "World Models." Catch Paul Vann at RSA speaking on AI bias, playing at Validia's RSA pickleball event, or at their 250-person Frontier Agent Hackathon in NYC on April 4th. *** THE BOILERPLATE *** About The FAIK Files: The FAIK Files is an offshoot project from Perry Carpenter's most recent book, FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions. Get the Book: ⁠FAIK: A Practical Guide to Living in a World of Deepfakes, Disinformation, and AI-Generated Deceptions⁠ (Amazon Associates link) Check out the website for more info: ⁠https://thisbookisfaik.com⁠ Check out Perry & Mason's other show, the Digital Folklore Podcast: Apple Podcasts: ⁠https://podcasts.apple.com/us/podcast/digital-folklore/id1657374458⁠ Spotify: ⁠https://open.spotify.com/show/2v1BelkrbSRSkHEP4cYffj?si=u4XTTY4pR4qEqh5zMNSVQA⁠ Want to connect with us? Here's how: Connect with Perry: Perry on LinkedIn: ⁠https://www.linkedin.com/in/perrycarpenter⁠ Perry on X: ⁠https://x.com/perrycarpenter⁠ Perry on BlueSky: ⁠https://bsky.app/profile/perrycarpenter.bsky.social⁠ Learn more about your ad choices. Visit megaphone.fm/adchoices
adbl_web_anon_alc_button_suppression_c
まだレビューはありません