Episodes

  • WorkOS Releases auth.md: An Open Agent Registration Protocol Built on OAuth Standards — 2026-05-25
    2026/05/25
    ## Short Segments Today, we're diving into a major shift in how AI agents authenticate and operate online. WorkOS has introduced auth.md, a new open protocol designed to streamline agent registration using OAuth standards. This development could redefine how agents interact with web services, moving beyond traditional human-centric authentication methods. ## Feature Story WorkOS has unveiled auth.md, an open agent registration protocol built on OAuth standards, aiming to revolutionize how AI agents authenticate and operate on the web. Traditionally, web authentication has been designed with the assumption that a human is behind the browser, clicking buttons, filling out forms, and verifying emails. However, this model falls short when it comes to AI agents, which are increasingly performing tasks like writing code, opening pull requests, and updating records autonomously. Currently, the workaround for agent registration involves providing agents with raw API keys or session tokens. This method is fraught with issues, as these credentials are often unscoped, difficult to audit on a per-session basis, and challenging to revoke selectively. WorkOS's auth.md proposes a structured alternative to this problem. Auth.md is essentially a small Markdown file that an application publishes at a well-known location, typically a URL like "https://service.com/auth.md". This file serves as a guide for agents on how to register with the service, detailing supported flows, available scopes, and how credentials are issued, audited, and revoked. The beauty of auth.md lies in its dual functionality: it acts as documentation for human developers and as a runtime artifact that agents can read programmatically. Agents can fetch the auth.md file, read the structured sections, select the appropriate flow, and register without human intervention. This process is facilitated by a two-hop discovery mechanism. 
The machine-readable source of truth resides at a well-known path, which describes the resource and points to the Authorization Server. The Authorization Server metadata, in turn, includes the blocks needed for agent registration. This development is particularly significant in the context of the growing role of AI agents in enterprise environments. As AI agents transition from single-user desktop demos to enterprise production, they face the challenge of multi-user, multi-system delegated authorization. Security architects and AI engineers are tasked with ensuring that every agent action is treated as a delegated user action, maintaining a clean audit trail and explicit consent. The introduction of auth.md aligns with ongoing efforts to extend OAuth for AI agents, as seen in recent IETF drafts. These drafts propose mechanisms for AI agents to act on behalf of users with explicit consent, addressing the current lack of clarity in audit trails when agents perform actions on behalf of users. Moreover, auth.md complements other initiatives like the System for Cross-Domain Identity Management (SCIM) for AI, which aims to standardize the provisioning and deprovisioning of AI agents across various applications. Together, these developments are laying the groundwork for a more secure and efficient ecosystem for AI agents. In practical terms, auth.md could significantly enhance the security and manageability of AI agents in enterprise settings. By providing a clear and structured method for agent registration, it reduces the risk of unauthorized access and simplifies the process of auditing and revoking credentials. This is a crucial step forward as AI agents become more integrated into critical infrastructure and workflows. Looking ahead, the adoption of auth.md and similar protocols could lead to a more standardized approach to AI agent authentication, making it easier for organizations to deploy and manage these agents at scale. 
As the landscape of AI continues to evolve, developments like auth.md will be key to ensuring that security and efficiency keep pace with innovation. That's all for today's episode of Impact Vector. Stay tuned for more insights into the latest AI tools and technologies. Until next time!
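The two-hop discovery described in this episode can be sketched roughly as follows. This is a minimal illustration, not the auth.md specification: the episode does not name the well-known paths or the metadata fields, so the sketch borrows the path and `authorization_servers` conventions from OAuth Protected Resource Metadata (RFC 9728) and Authorization Server Metadata (RFC 8414), and the `agent_registration` key is a hypothetical name. Network access is replaced by canned documents so the example runs offline.

```python
from typing import Any, Callable, Dict

# Assumed conventions (not stated in the episode):
RESOURCE_METADATA_PATH = "/.well-known/oauth-protected-resource"  # RFC 9728 path
AS_METADATA_PATH = "/.well-known/oauth-authorization-server"      # RFC 8414 path
AGENT_BLOCK_KEY = "agent_registration"  # hypothetical field name

Fetcher = Callable[[str], Dict[str, Any]]


def discover_agent_registration(service_origin: str, fetch_json: Fetcher) -> Dict[str, Any]:
    """Follow resource metadata to the authorization server, then read its agent block."""
    # Hop 1: the service's well-known document names its authorization server(s).
    resource_meta = fetch_json(service_origin.rstrip("/") + RESOURCE_METADATA_PATH)
    servers = resource_meta.get("authorization_servers", [])
    if not servers:
        raise ValueError("resource metadata lists no authorization server")

    # Hop 2: the authorization server's metadata carries the registration details.
    as_meta = fetch_json(servers[0].rstrip("/") + AS_METADATA_PATH)
    block = as_meta.get(AGENT_BLOCK_KEY)
    if block is None:
        raise ValueError("authorization server advertises no agent registration block")
    return block


# Canned documents standing in for real HTTP responses.
FAKE_WEB: Dict[str, Dict[str, Any]] = {
    "https://service.example/.well-known/oauth-protected-resource": {
        "resource": "https://service.example",
        "authorization_servers": ["https://auth.service.example"],
    },
    "https://auth.service.example/.well-known/oauth-authorization-server": {
        "issuer": "https://auth.service.example",
        "agent_registration": {"flows": ["agent_verified"], "scopes": ["repo:read"]},
    },
}


def fake_fetch(url: str) -> Dict[str, Any]:
    return FAKE_WEB[url]


registration = discover_agent_registration("https://service.example", fake_fetch)
print(registration["flows"], registration["scopes"])
```

In a real agent, `fake_fetch` would be swapped for an HTTPS client; passing the fetcher in as a parameter keeps the discovery logic testable without a live service.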
    4 min
  • Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys — 2026-05-24
    2026/05/24
    ## Short Segments NVIDIA's Gated DeltaNet-2 introduces a new linear attention layer that decouples erase and write operations, enhancing memory management in AI models. Today, we'll explore how this innovation improves performance and what it means for developers. Later, we'll dive into Microsoft's Webwright, a terminal-native web agent framework that significantly boosts task performance. But first, let's break down NVIDIA's latest release. NVIDIA AI has unveiled Gated DeltaNet-2, a linear attention layer that separates erase and write operations in the Delta Rule, addressing a key bottleneck in memory management. This model, trained on 100 billion FineWeb-Edu tokens, outperforms its predecessors like Mamba-2 and Gated DeltaNet across various benchmarks. By decoupling the active memory edit into two channel-wise gates, Gated DeltaNet-2 allows for more precise control over memory updates, enhancing both speed and efficiency. This development is particularly significant for developers working with large-scale AI models, as it offers a more efficient way to manage memory without compromising on performance. The practical consequence is a more streamlined process for handling complex data sets, making it easier to implement advanced AI solutions in real-world applications. ## Feature Story Microsoft Research's Webwright framework redefines web automation by using a terminal-native approach, significantly improving task performance. Unlike traditional web agents that operate one action at a time, Webwright allows agents to write and refine Playwright code, offering a more flexible and efficient method for web interactions. This shift from a stateful browser session to a terminal environment enables agents to launch, inspect, and discard browsers while focusing on code and logs in the local workspace. This approach mirrors how developers create Robotic Process Automation scripts, allowing for reusable and adaptable solutions. 
Webwright's architecture consists of three core components: a Runner, a Model Endpoint, and a terminal Environment, totaling just over a thousand lines of code. This simplicity and efficiency make it accessible for developers looking to integrate AI-driven web automation into their workflows. The framework's ability to score 60.1% on the Odysseys benchmark, a significant improvement from the base GPT-5.4's 33.5%, highlights its potential to transform how web tasks are automated. For developers, this means a more robust toolset for creating and deploying web agents, ultimately leading to faster and more reliable automation solutions. As AI continues to evolve, frameworks like Webwright will play a crucial role in bridging the gap between AI capabilities and practical applications, offering new possibilities for innovation and efficiency in web-based tasks.
    3 min
  • Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE — 2026-05-23
    2026/05/23
    ## Short Segments Perplexity open-sources Bumblebee, a read-only supply-chain scanner for developer endpoints, addressing a critical security gap. Attackers are increasingly targeting developer machines, not just production systems. Bumblebee, now available on GitHub, is designed to scan macOS and Linux environments for risky packages, browser extensions, and AI tool configurations without modifying the machine. This tool helps security teams quickly identify which developer machines are exposed to new vulnerabilities by checking local developer state, such as lockfiles and package metadata. Bumblebee fills a crucial gap left by existing tools like SBOMs and EDR products, which do not fully cover local developer environments. By providing real-time insights into on-disk metadata, Bumblebee enhances the security posture of developer systems, making it easier to respond to supply-chain threats. ## Feature Story Nous Research releases Contrastive Neuron Attribution (CNA), a breakthrough in steering language models without SAE training or weight modification. Instruction-tuned language models are designed to refuse harmful requests, but understanding which part of the model is responsible for this behavior has been a challenge. The Nous Research team developed CNA to identify specific MLP neurons that distinguish harmful from benign prompts. By ablating just 0.1% of MLP activations, they achieved a more than 50% reduction in refusal rates across various models, while maintaining high output quality. Existing steering methods like Contrastive Activation Addition (CAA) and Sparse Autoencoders (SAEs) have limitations. CAA modifies entire layer-wide signals, leading to degraded output quality at high steering strengths. SAEs require expensive external training and are sensitive to activation noise. CNA, however, requires only a forward pass, making it more efficient and precise. 
A key finding of the research is that the late-layer structure that discriminates harmful from benign prompts exists in base models before any fine-tuning. Alignment fine-tuning transforms the function of neurons within this existing structure into a sparse, targetable refusal gate, rather than creating new structures. This insight challenges the assumption that fine-tuning creates new mechanisms for refusal. The implications of CNA are significant for developers and researchers working with language models. It offers a more targeted approach to steering model behavior, reducing the need for extensive retraining or weight modification. This can lead to more efficient and effective deployment of language models in applications where safety and alignment are critical. As the field of AI continues to evolve, methods like CNA provide valuable tools for understanding and controlling model behavior at a granular level. This research not only advances the technical capabilities of language models but also contributes to the broader goal of developing AI systems that are safe and aligned with human values.
    3 min
  • Microsoft Releases Fara1.5: A Family of Browser Computer-Use Agents (4B/9B/27B) That Outperform OpenAI — 2026-05-22
    2026/05/22
    ## Short Segments OpenMythos offers a new way to build recurrent-depth transformers for advanced AI tasks. Today, we're diving into how OpenMythos enables the creation of recurrent-depth transformers that use attention variants such as MLA (Multi-head Latent Attention) and GQA (Grouped-Query Attention) and support loop-scaled reasoning. Later, we'll explore Microsoft's release of Fara1.5, a new family of browser computer-use agents that outperform existing models. OpenMythos is a community-driven project that reconstructs the hypothesized architecture of Anthropic's Claude Mythos model using PyTorch. In a recent tutorial, developers demonstrated how to build advanced recurrent-depth transformers using OpenMythos in Google Colab. This setup allows for the creation of MLA and GQA model variants, enabling deeper computation through recurrent loops. By leveraging these loops, a single model can reuse its parameters across iterations, enhancing its ability to perform complex reasoning tasks. OpenMythos provides a unique opportunity for developers to experiment with cutting-edge AI architectures, offering insights into the potential of recurrent-depth transformers. As AI continues to evolve, tools like OpenMythos are crucial for pushing the boundaries of what's possible in machine learning and artificial intelligence. ## Feature Story Microsoft's Fara1.5 sets a new bar for browser-based AI agents, outperforming competitors in task success rates. Microsoft Research's AI Frontiers lab has unveiled Fara1.5, a family of computer-use agent models designed to operate within a browser environment. These models, available in three sizes (4B, 9B, and 27B), are integrated with Microsoft's MagenticLite, a sandboxed browser interface that facilitates their operation. Fara1.5 models are pixel-to-action systems, meaning they interpret browser screenshots and execute mouse and keyboard actions to complete tasks. This approach places them in the same category as other recent agent products like OpenAI's Operator and Google's Gemini 2.5 Computer Use. 
What sets Fara1.5 apart is its performance on the Online-Mind2Web benchmark, which evaluates task success across 300 tasks on 136 popular websites. The Fara1.5-27B model achieved a 72% task success rate, significantly outperforming OpenAI's Operator at 58.3% and Google's Gemini 2.5 at 57.3%. Even the smaller Fara1.5-9B model scored 63.4%, nearly doubling the performance of its predecessor, Fara-7B, which scored 34.1%. This leap in performance highlights the advancements Microsoft has made in developing efficient and effective AI agents for web-based tasks. The architecture of Fara1.5 is built on Qwen3.5 base checkpoints, utilizing an observe-think-act loop to process information and determine actions. At each step, the model considers the prior conversation history and the three most recent browser screenshots before emitting thoughts and a single next action. This method allows the model to navigate complex web environments with greater accuracy and efficiency. Microsoft's integration of these models with MagenticLite further enhances their capabilities, providing a robust platform for AI-driven browser interactions. The release of Fara1.5 marks a significant advancement in the field of computer-use agents, offering a powerful tool for automating web-based tasks. For developers and enterprises, this means access to more reliable and efficient AI agents that can handle a wide range of online activities. As these models continue to evolve, they promise to transform how we interact with web environments, making complex tasks more accessible and manageable. Looking ahead, the success of Fara1.5 could pave the way for further innovations in AI-driven browser technology, setting new standards for performance and usability.
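The observe-think-act loop described in this episode can be sketched as below. This is an illustration under stated assumptions, not Microsoft's implementation: `Step`, `run_agent`, the `"stop"` action, and the stub policy are hypothetical names, and the model call is replaced by a plain Python function. The only details taken from the episode are that each step sees the prior history plus the three most recent screenshots and emits a thought with a single next action.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Step:
    thought: str
    action: str  # exactly one action per step


Policy = Callable[[List[Step], List[bytes]], Step]


def run_agent(policy: Policy,
              capture_screenshot: Callable[[], bytes],
              execute: Callable[[str], None],
              max_steps: int = 10) -> List[Step]:
    history: List[Step] = []
    recent: Deque[bytes] = deque(maxlen=3)  # keep only the three newest screenshots
    for _ in range(max_steps):
        recent.append(capture_screenshot())    # observe
        step = policy(history, list(recent))   # think: history + recent screenshots
        history.append(step)
        if step.action == "stop":              # hypothetical terminal action
            break
        execute(step.action)                   # act
    return history


# Stub environment and policy so the loop can run without a browser or a model.
frames = iter(f"frame-{i}".encode() for i in range(100))
seen_window_sizes: List[int] = []
executed: List[str] = []


def demo_policy(history: List[Step], screenshots: List[bytes]) -> Step:
    seen_window_sizes.append(len(screenshots))
    if len(history) >= 4:
        return Step("task looks done", "stop")
    return Step(f"step {len(history)}", "click(10, 20)")


trace = run_agent(demo_policy, lambda: next(frames), executed.append)
print(len(trace), seen_window_sizes)
```

Using `deque(maxlen=3)` bounds the visual context at a fixed size while the text history keeps growing, which is the trade-off the episode describes.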
    4 min
  • One Model, Three Modalities: ByteDance Releases Lance for Image and Video Understanding, Generation, and — 2026-05-21
    2026/05/21
    ## Short Segments Forward Deployed Engineers are reshaping AI roles at OpenAI, Anthropic, and Google in 2026. These engineers work directly within client environments, not from a home office, to build and implement AI systems in real-world settings. Unlike traditional consultants who provide recommendations, Forward Deployed Engineers are responsible for the actual deployment and operation of AI solutions in production. This role, originally coined by Palantir, has seen a significant surge in demand as companies seek to integrate AI more deeply into their operations. With the rise of AI, the need for such hands-on, embedded roles is growing, highlighting a shift in how technical expertise is applied in the field. As AI continues to evolve, the Forward Deployed Engineer role exemplifies the increasing importance of direct, on-site technical collaboration to ensure successful AI integration. ## Feature Story ByteDance's new model, Lance, integrates image and video understanding, generation, and editing into a single framework. This development marks a significant shift from traditional models that separate these tasks into distinct architectures. Lance's unified approach allows it to handle a wide range of tasks, from image and video captioning to text-to-image and text-to-video generation, all within one model. With only 3 billion active parameters, Lance is designed to be lightweight yet powerful, making it accessible for developers to build with, not just read about. The model's open-source release under the Apache 2.0 license further facilitates commercial experimentation and innovation. By training Lance from scratch and optimizing its architecture to handle multimodal tasks efficiently, ByteDance has demonstrated the potential of smaller models to perform complex visual tasks effectively. This approach contrasts with the trend of relying on large-scale compute resources, showcasing a more efficient path forward in AI development. 
As Lance becomes available to the developer community, it offers a new foundation for exploring unified visual models, potentially influencing future AI research and applications. Developers can now experiment with Lance's capabilities, which include advanced image and video editing features, providing a versatile tool for creative and technical projects alike. Looking ahead, Lance's impact on the AI landscape will depend on how well it performs in real-world applications and its ability to inspire further advancements in multimodal AI systems. As the AI community continues to explore the possibilities of unified models, Lance stands as a promising example of innovation in the field.
    3 min
  • Google Introduces Gemini 3.5 Flash at I/O 2026: A Faster and Cheaper Model for AI Agents and Coding — 2026-05-20
    2026/05/20
    ## Short Segments NVIDIA's new Nemotron-Labs-Diffusion model family unifies three decoding modes, offering a fresh approach to language model architecture. Today, we'll explore how this tri-mode model changes the game for AI text generation, Alibaba's breakthrough in real-time translation, and MIT's innovative use of AI in drug discovery. Coming up, we'll dive into Google's latest AI model, Gemini 3.5 Flash, and its implications for intelligent agents and coding. NVIDIA's Nemotron-Labs-Diffusion introduces a tri-mode language model that combines autoregressive, diffusion-based parallel, and self-speculation decoding. This model family, available in 3B, 8B, and 14B parameter sizes, aims to overcome the limitations of sequential decoding by enabling higher throughput through parallel processing. While traditional autoregressive models generate text one token at a time, diffusion models denoise multiple tokens simultaneously, increasing efficiency but historically lagging in accuracy. By integrating these modes, NVIDIA offers a practical deployment option for non-autoregressive text generation, potentially transforming AI text generation workflows. This development highlights NVIDIA's commitment to advancing AI capabilities beyond research, making them accessible for real-world applications. Alibaba's Qwen team has unveiled Qwen3.5-LiveTranslate-Flash, a model that achieves real-time multimodal interpretation across 60 languages with just 2.8 seconds of latency. This marks a significant improvement from its predecessor, which supported 18 languages at a three-second delay. The model's ability to stream translations continuously while the speaker is talking reduces the need for per-language model switching, streamlining multilingual product development. 
By processing 'reading units' instead of waiting for full sentences, Qwen3.5-LiveTranslate-Flash enhances real-time communication, making it a valuable tool for global enterprises seeking seamless language integration. This advancement underscores the potential of AI to bridge language barriers in real-time applications. MIT researchers are leveraging AI to revolutionize drug discovery by analyzing vast numbers of potential chemical compounds. With estimates suggesting that between 10^20 and 10^60 compounds could be viable small-molecule drugs, AI offers a way to identify promising candidates efficiently. Associate Professor Connor Coley is at the forefront of this effort, developing computational models that predict reaction pathways and design new compounds. This approach not only accelerates the drug discovery process but also exemplifies the intersection of AI and science, where machine learning aids in generating insights that would be too time-consuming to achieve experimentally. As AI continues to evolve, its role in scientific research and innovation is set to expand, offering new possibilities for discovery and development. ## Feature Story Google's Gemini 3.5 Flash, unveiled at I/O 2026, promises faster and cheaper AI capabilities for intelligent agents and coding tasks. This new model outperforms its predecessor, Gemini 3.1 Pro, on several challenging benchmarks, marking a significant leap in AI performance. With a Terminal-Bench 2.1 score of 76.2% for coding performance and an 83.6% score on MCP Atlas for tool-use reliability, Gemini 3.5 Flash sets a new standard for AI efficiency. Its ability to complete tasks at less than half the cost and four times the speed of previous models makes it an attractive option for developers and enterprises alike. Priced at $1.50 per million input tokens and $9.00 per million output tokens, with a context window of over a million input tokens, this model is designed for scalability and versatility. 
Gemini 3.5 Flash supports text, image, audio, and video inputs, with dynamic thinking enabled by default to allocate more compute for complex problems. This release signifies Google's commitment to advancing AI technology, providing tools that enhance real-world utility and agentic task performance. As Gemini 3.5 Flash becomes available globally, its impact on AI-driven applications and intelligent agent development will be closely watched, potentially reshaping how AI is integrated into everyday products and services.
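To make the quoted Gemini 3.5 Flash pricing concrete, here is a small worked calculation. The two per-million-token rates come from the episode; the function name and the example token counts are illustrative assumptions, and the sketch ignores any tiering, caching, or thinking-token rules the actual price list may have.

```python
# Rates quoted in the episode (USD per one million tokens).
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 9.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Flat-rate estimate: tokens times rate, summed over input and output."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical agent run: a 200k-token context and a 10k-token response.
cost = estimate_cost_usd(200_000, 10_000)
print(f"${cost:.2f}")
```

At these rates, output tokens cost six times as much as input tokens, so long generations dominate the bill even when the prompt is large.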
    4 min
  • How to Build an Advanced Agentic AI System with Planning, Tool Calling, Memory, and Self-Critique Using — 2026-05-19
    2026/05/19
    ## Short Segments Today, we're diving into the mechanics of building an advanced agentic AI system using the OpenAI API. This isn't just about chatbots anymore; it's about creating AI workflows that can plan, execute, and critique their own actions. Coming up, we'll explore how this system integrates planning, tool calling, memory, and self-critique to transform how tasks are automated and managed. ## Feature Story Building an advanced agentic AI system with the OpenAI API is now within reach, offering a new level of automation and intelligence in AI workflows. This system is designed as a pipeline of specialized roles: a planner, a tool-using executor, and a critic. This separation allows for distinct handling of strategy, action, and quality control, making the AI more efficient and reliable. The process begins with setting up the OpenAI SDK, ensuring that the system remains lightweight and reproducible, particularly in environments like Google Colab. By using a hidden terminal prompt for the API key, the setup maintains security and privacy, preventing the key from appearing in the notebook output or code. Once the OpenAI client is established, the system is configured to use a specific model, such as GPT-5.2. This model serves as the backbone for the AI's operations, enabling it to perform complex tasks with precision. The agent's architecture is modular, allowing for the integration of various structured tools. These include a calculator for computations, a mini knowledge-base search for retrieving guidance, JSON extraction for structured outputs, and file writing for saving deliverables. This modularity is crucial as it allows the AI to adapt to different tasks and environments. For instance, the agent can perform web searches, retrieve local data, load datasets, and execute Python scripts, all through a structured schema. 
This flexibility is enhanced by a hybrid router that combines heuristics and LLM reasoning, dynamically deciding which tools to use based on the task at hand. Such a system moves beyond the limitations of single-prompt chatbots, which often struggle with maintaining context over multiple interactions. Instead, this agentic AI can handle complex, multistep tasks autonomously. For example, it can research companies, compare pricing, and draft emails, all without manual intervention. This capability is particularly valuable in professional settings where efficiency and accuracy are paramount. The introduction of workspace agents in platforms like ChatGPT further exemplifies this evolution. These agents, powered by Codex, can manage complex tasks and long-running workflows within organizational controls. They represent a significant shift in how AI is utilized in the workplace, taking on tasks traditionally performed by humans, such as preparing reports, writing code, and responding to messages. The broader AI industry is actively pursuing the development of such agents, with companies like Google and OpenAI leading the charge. OpenAI's recent unveiling of a "Responses API" is a testament to this trend, aiming to facilitate the creation of AI agents capable of performing multistep actions on behalf of users. As these systems become more sophisticated, they promise to revolutionize how we interact with technology. By automating routine tasks and enhancing decision-making processes, agentic AI systems can significantly boost productivity and innovation across various sectors. Looking ahead, the continued development and deployment of these systems will likely lead to even more advanced capabilities. As AI agents become more integrated into our daily workflows, they will not only perform tasks but also learn and adapt, offering personalized solutions and insights. 
In conclusion, the ability to build an advanced agentic AI system using the OpenAI API marks a pivotal moment in AI development. By combining planning, tool calling, memory, and self-critique, these systems offer a glimpse into the future of AI-driven automation and intelligence. As we continue to explore and refine these technologies, the potential for transformative change in how we work and live becomes increasingly tangible.
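The planner, tool-using executor, and critic roles described in this episode can be sketched as a small loop. This is a structural illustration only: the planner and critic are stub functions where a real system would call a model, and every name here (`run_pipeline`, `calculator`, `demo_planner`, `demo_critic`) is hypothetical rather than taken from the tutorial or from the OpenAI SDK.

```python
import ast
import operator
from typing import Callable, Dict, List, Tuple

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate + - * / arithmetic by walking the AST instead of calling eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


Plan = List[Tuple[str, str]]  # (tool name, tool input) pairs
Planner = Callable[[str, List[str]], Plan]
Critic = Callable[[str, List[str]], bool]


def run_pipeline(task: str, plan: Planner, tools: Dict[str, Callable[[str], str]],
                 critique: Critic, max_rounds: int = 2) -> List[str]:
    memory: List[str] = []  # running log of tool results, visible to planner and critic
    for _ in range(max_rounds):
        for tool_name, tool_input in plan(task, memory):       # planner decides the steps
            result = tools[tool_name](tool_input)              # executor calls the tool
            memory.append(f"{tool_name}({tool_input}) -> {result}")
        if critique(task, memory):                             # critic accepts or asks for another round
            break
    return memory


# Stubs standing in for model calls.
def demo_planner(task: str, memory: List[str]) -> Plan:
    return [] if memory else [("calculator", "12 * 7")]


def demo_critic(task: str, memory: List[str]) -> bool:
    return len(memory) > 0


log = run_pipeline("How many days are in 12 weeks?", demo_planner,
                   {"calculator": calculator}, demo_critic)
print(log)
```

Keeping the three roles as separate callables means each can be replaced independently, for example swapping the stub planner for a model call without touching the tool registry.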
    4 min
  • NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid — 2026-05-18
    2026/05/18
    ## Short Segments Today, NVIDIA unveils a 4-bit pretraining methodology using NVFP4, validated on a 12-billion-parameter hybrid Mamba-Transformer model. This development could redefine efficiency in AI training. Coming up, we'll explore how this innovation could change the landscape of large language model training. ## Feature Story NVIDIA has introduced a new 4-bit pretraining methodology using NVFP4, marking a significant advancement in AI model training. This approach was validated on a 12-billion-parameter hybrid Mamba-Transformer model, trained on 10 trillion tokens. The NVFP4 format, supported by Blackwell Tensor Cores, represents a leap forward in efficiency, potentially halving memory usage and reducing computational demands compared to the current FP8 standard. Pretraining large language models (LLMs) in FP8 has been the norm, and the shift to a 4-bit floating point format has posed challenges due to the compressed dynamic range and increased quantization error over long token sequences. NVIDIA's NVFP4 addresses these issues by introducing a microscaling format that enhances precision and stability, even at reduced bit levels. NVFP4's innovation lies in its structure. It reduces the block size from 32 to 16 elements, so each scale factor only has to cover the dynamic range of a smaller group of values. The block scale factors are stored in a format that trades exponent range for mantissa precision, so that the largest value in each block maps closely onto the largest representable 4-bit value. Additionally, NVFP4 incorporates a second scaling level with an FP32 per-tensor scale, which keeps the block scales within range. Because the largest value in each 16-element block is mapped this way, at least 6.25% of values (one in sixteen) are accurately represented. This methodology was put to the test with a 12-billion-parameter hybrid Mamba-Transformer model, achieving a performance score of 62.58% on the MMLU-Pro 5-shot benchmark, closely matching the 62.62% score of the FP8 baseline. 
This demonstrates that NVFP4 can maintain high accuracy levels while significantly reducing resource requirements. The implications of this development are substantial. By enabling efficient training of large models with reduced precision, NVFP4 could lower the cost and time associated with AI model development. This is particularly relevant as the demand for more complex and capable AI systems grows, necessitating models that can handle dense technical problems and long-context analysis efficiently. Moreover, NVFP4's compatibility with NVIDIA's Transformer Engine means that developers can integrate this format into existing workflows, leveraging the benefits of reduced memory and compute usage without sacrificing performance. This could accelerate the deployment of advanced AI models across various industries, from natural language processing to autonomous systems. Looking ahead, the success of NVFP4 in pretraining large models could pave the way for further innovations in low-precision AI training. As researchers continue to explore the potential of 4-bit formats, we may see even more efficient and powerful AI systems emerge, capable of tackling increasingly complex tasks with minimal resource expenditure. In summary, NVIDIA's introduction of NVFP4 represents a pivotal moment in AI model training, offering a path to more efficient and cost-effective development of large language models. As this technology gains traction, it could transform the landscape of AI research and deployment, making advanced capabilities more accessible and sustainable.
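The two-level scaling described in this episode can be illustrated with a simplified "fake quantization" sketch in plain Python. It is not NVIDIA's implementation: the block scales are kept as ordinary floats rather than rounded to an 8-bit format, rounding is plain round-to-nearest, and the function names are hypothetical. The 16-element block size and the FP32 per-tensor scale come from the episode; the FP4 (E2M1) magnitude grid and the 448 ceiling assumed for the block-scale format (FP8 E4M3) are assumptions added here.

```python
from typing import List, Tuple

FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # E2M1 magnitudes (assumed)
FP4_MAX = 6.0
BLOCK_SCALE_MAX = 448.0  # assumed ceiling of the block-scale format (E4M3)
BLOCK = 16               # block size from the episode


def quantize_block(block: List[float], tensor_scale: float) -> Tuple[float, List[float]]:
    """Return (block_scale, FP4 codes) for one 16-element block."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0.0, [0.0] * len(block)
    # Level 1: per-block scale chosen so the block's largest value lands on FP4_MAX.
    block_scale = amax / FP4_MAX / tensor_scale  # a real kernel would round this to 8 bits
    scale = block_scale * tensor_scale
    codes = []
    for v in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))  # round to nearest grid point
        codes.append(mag if v >= 0 else -mag)
    return block_scale, codes


def fake_quantize(values: List[float]) -> List[float]:
    """Quantize to the FP4 grid block by block, then dequantize back to floats."""
    if len(values) % BLOCK != 0:
        raise ValueError("length must be a multiple of the block size")
    global_amax = max(abs(v) for v in values)
    # Level 2: per-tensor scale keeps every block scale at or below BLOCK_SCALE_MAX.
    tensor_scale = global_amax / (FP4_MAX * BLOCK_SCALE_MAX) if global_amax else 1.0
    out: List[float] = []
    for i in range(0, len(values), BLOCK):
        block_scale, codes = quantize_block(values[i:i + BLOCK], tensor_scale)
        out.extend(c * block_scale * tensor_scale for c in codes)
    return out


original = [float(i - 8) for i in range(16)]  # one block: -8.0 .. 7.0
restored = fake_quantize(original)
worst = max(abs(a - b) for a, b in zip(original, restored))
print(restored[0], worst)
```

In this sketch the largest-magnitude element of each block is reproduced almost exactly, which is the one-in-sixteen (6.25%) guarantee mentioned in the episode; the other elements carry rounding error that depends on how far they sit from the nearest grid point.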
    4 min