The Battle Test Podcast

エピソード

Short Story by a Small Agent Model (SAM)

2025/08/07

Welcome to a unique storytelling experiment. What you're about to hear wasn't written in the traditional sense. It was generated entirely by SAM—the Small Agent Model—an AI trained on technical documents, research papers, and patterns pulled from a vast archive of open-access knowledge, including thousands of PDFs from arXiv.org. This story began with a single prompt and evolved entirely within SAM’s internal reasoning. No plot outline. No human editing. Just raw output shaped by the model’s logic, curiosity, and sense of narrative. Our goal? To test whether a small, locally-running AI could hold focus across a long-form story—maintaining character development, tension, and thematic consistency. The result is a digital hallucination… but one grounded in real science, speculative fiction, and machine-learned creativity. Let’s begin.

続きを読む一部表示

36 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 40 - Teaching Smaller AI Models to Think Like Cybersecurity Experts: A Deep Dive into Knowledge Distillation

2025/05/05

In this episode, we unpack a cutting-edge approach to building lean, high-performance AI models tailored for cybersecurity. Based on our latest white paper, we explore a multi-stage knowledge distillation pipeline that transfers expertise from large teacher models to smaller, more efficient student models like Phi-3 Mini. Topics include structured data enrichment, virtual machine-based learning, test-time reinforcement learning (TTRL), and curiosity-driven exploration powered by Information Theory. Whether you're an AI researcher, cybersecurity professional, or tech strategist, this episode offers a deep yet accessible guide to making specialized AI practical for real-world, resource-constrained environments.

続きを読む一部表示

25 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 39 - The Dark Side of MCP: How LLMs Can Be Hacked by Design

2025/04/14

The paper titled "MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits" by Brandon Radosevich and John Halloran investigates security vulnerabilities introduced by the Model Context Protocol (MCP), an open standard designed to streamline integration between large language models (LLMs), data sources, and agentic tools. While MCP aims to facilitate seamless AI workflows, the authors identify significant security risks associated with its current design.

続きを読む一部表示

13 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 38 - Unmasking Cyber Threats: Agentless Emulation for Next-Gen Cyber Defense

2025/04/02

In this episode, we explore how modern cybersecurity is transforming with agentless threat emulation. We discuss a cutting-edge platform that simulates advanced persistent threat (APT) tactics without installing agents—leveraging open-source tools like Atomic Red Team and PurpleSharp alongside the MITRE ATT&CK framework. Discover how the platform’s user-friendly, drag-and-drop scenario builder, remote execution via SSH/WinRM, and real-time monitoring empower cyber defenders to train effectively, identify detection gaps, and bolster overall security. Join us as we break down the technical innovations, operational benefits, and strategic value of continuous, automated threat simulations in today’s dynamic cyber landscape.

続きを読む一部表示

23 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 37 - NIST Report on Adversarial Machine Learning Taxonomy and Terminology

2025/04/02

This NIST report offers a comprehensive exploration of adversarial machine learning (AML), detailing threats against both predictive AI (PredAI) and generative AI (GenAI) systems. It presents a structured taxonomy and terminology of various attacks, categorising them by the AI system properties they target, such as availability, integrity, and privacy, with an additional category for GenAI focusing on misuse enablement. The document outlines the stages of learning vulnerable to attacks and the varying capabilities and knowledge an attacker might possess. Furthermore, it describes existing and potential mitigation strategies to defend against these evolving threats, highlighting the inherent trade-offs and challenges in securing AI systems.

続きを読む一部表示

37 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 37 - Distilling Knowledge: How Mechanistic Interpretability Elevates AI Models"

2025/04/02

In this episode, we delve into a newly published white paper that outlines a cutting-edge pipeline for enhancing language models through knowledge distillation and post-hoc mechanistic interpretability analysis. We explore how the approach integrates data enrichment, teacher pair generation, parameter-efficient fine-tuning, and a self-study loop to specialize a base language model—particularly for cybersecurity tasks—while preserving its broader language capabilities. We also discuss the newly introduced Mechanistic Interpretability Framework, which sheds light on the internal workings of the distilled model, offering insights into layer activations and causal pathways. Whether you're building domain-specific AI or curious about making large language models more transparent, this conversation reveals how domain expertise and interpretability can come together to create more trustworthy and efficient AI systems.

続きを読む一部表示

22 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 36 - Cyber Common Operational Picture Framework for Situational Awareness

2025/02/11

This research paper proposes a Cyber Common Operational Picture (CyCOP) framework for enhancing cyber situational awareness. The framework integrates various data streams to provide a comprehensive visual representation of cyber threats, enabling faster responses to attacks. The authors present five visualisations designed to meet specific needs in cyber defence, detailing their design and testing response times. The study's findings suggest that a well-designed CyCOP, adhering to proposed criteria for interface design and response speed, can significantly improve situational awareness and preparedness against cyberattacks, with applications in both military and civilian contexts. Future work will expand the framework to include further functionalities for threat response.

続きを読む一部表示

26 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く
Episode 35 - 2024 Wrap up and Innovation in Wargaming

2024/12/27

This article by Tyson Kackley details the development of a new wargaming system for the US Marine Corps, designed to support Force Design 2030. The system integrates multiple simulations across all warfighting domains, employing a modular, cloud-based architecture for scalability and adaptability. A key feature is its emphasis on data management and a continuous verification, validation, and accreditation (VV&A) process. This ensures the system’s outputs are reliable and defensible, informing crucial decision-making. The system uses a framework of simulations, allowing for the strengths of individual tools to compensate for each other's weaknesses. The system's design prioritizes the use of validated conceptual models and facilitates collaboration amongst subject-matter experts.

続きを読む一部表示

28 分

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

無料で聴く

エピソード

Short Story by a Small Agent Model (SAM)

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 40 - Teaching Smaller AI Models to Think Like Cybersecurity Experts: A Deep Dive into Knowledge Distillation

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 39 - The Dark Side of MCP: How LLMs Can Be Hacked by Design

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 38 - Unmasking Cyber Threats: Agentless Emulation for Next-Gen Cyber Defense

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 37 - NIST Report on Adversarial Machine Learning Taxonomy and Terminology

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 37 - Distilling Knowledge: How Mechanistic Interpretability Elevates AI Models"

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 36 - Cyber Common Operational Picture Framework for Situational Awareness

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Episode 35 - 2024 Wrap up and Innovation in Wargaming

カートのアイテムが多すぎます

カートに追加できませんでした。

ウィッシュリストに追加できませんでした。

ほしい物リストの削除に失敗しました。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました