Episodes

  • Nature: Large Language Models Are Proficient in Solving and Creating Emotional Intelligence Tests
    2025/06/25

    Summary of https://www.nature.com/articles/s44271-025-00258-x

    Explores the emotional intelligence capabilities of Large Language Models (LLMs), specifically their ability to solve and create emotional intelligence tests. It highlights that several LLMs, including ChatGPT-4, consistently outperformed human averages on various established emotional intelligence assessments.

    The research also investigated LLMs' capacity to generate new, psychometrically sound test items, finding that these AI-created questions demonstrated comparable difficulty and a strong correlation with original human-designed tests. While some minor differences were observed in clarity, realism, and content diversity, the study ultimately suggests that LLMs can reason accurately about human emotions and their regulation, indicating their potential for use in socio-emotional applications and psychometric development.

    • LLMs demonstrate superior performance in solving emotional intelligence tests compared to humans. Six widely used Large Language Models (ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3) collectively achieved an average accuracy of 81% on five standard emotional intelligence (EI) tests, significantly outperforming the human average of 56% reported in original validation studies. All tested LLMs scored more than one standard deviation above the human mean, with ChatGPT-o1 and DeepSeek V3 exceeding two standard deviations above it.
    • LLMs are proficient at generating new, high-quality emotional intelligence test items. ChatGPT-4 successfully generated new test items (scenarios and response options) for five different ability EI tests, and these new versions demonstrated statistically equivalent test difficulty compared to the original tests when administered to human participants. Importantly, ChatGPT-4 did not simply paraphrase existing items; participants perceived a low level of similarity to any original test scenario in 88% of the newly created scenarios.
    • LLM-generated tests exhibit psychometric properties largely comparable to original human-designed tests, though with some minor differences. While not all psychometric properties (such as perceived item clarity, realism, item content diversity, internal consistency, and correlations with vocabulary or other EI tests) were statistically equivalent between original and ChatGPT-generated versions, any differences observed were small (Cohen’s d within ±0.25) and none of the 95% confidence interval boundaries exceeded a medium effect size (d = ±0.50). Furthermore, original and ChatGPT-generated tests were strongly correlated (r = 0.46), suggesting they measure similar constructs (a worked equivalence-test sketch follows this list).
    • LLMs show potential for "cognitive empathy" and consistent application of emotional knowledge. The findings support the idea that LLMs can generate responses consistent with accurate knowledge of emotional concepts, emotional situations, and their implications, indicating they fulfill the aspect of cognitive empathy. LLMs offer advantages such as processing emotional scenarios based on extensive datasets, which may lead to fewer errors, and providing consistent emotional knowledge unaffected by human variability like mood, fatigue, or personal preferences.
    • LLMs can significantly aid psychometric test development but cannot fully replace human validation processes. The research highlights that LLMs like ChatGPT can be powerful tools for assisting in the psychometric development of standardized assessments, particularly in the domain of emotion, by generating complete tests with generally acceptable psychometric properties using few prompts. However, the study also notes that while valuable for creating an initial item pool, LLMs cannot replace the necessary pilot and validation studies to refine or eliminate poorly performing items.
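
    The equivalence claims above rest on standard effect-size arithmetic. As a minimal illustrative sketch (not the paper's analysis code), the Python below computes Cohen's d between scores on an original and a ChatGPT-generated test version and runs a two one-sided tests (TOST) equivalence check against the ±0.25 bound cited above; the data are synthetic placeholders.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical per-participant accuracy on the original vs. the
    # ChatGPT-generated test version (synthetic data, not the study's).
    original  = rng.normal(0.60, 0.15, size=120)
    generated = rng.normal(0.61, 0.15, size=120)

    def pooled_sd(a, b):
        """Pooled standard deviation of two independent samples."""
        na, nb = len(a), len(b)
        return np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                       / (na + nb - 2))

    def cohens_d(a, b):
        """Standardized mean difference (Cohen's d)."""
        return (a.mean() - b.mean()) / pooled_sd(a, b)

    def tost_equivalent(a, b, bound_d=0.25, alpha=0.05):
        """Two one-sided t-tests: equivalent if the mean difference lies
        within +/- bound_d pooled-SD units at level alpha."""
        na, nb = len(a), len(b)
        sd = pooled_sd(a, b)
        se = sd * np.sqrt(1 / na + 1 / nb)
        diff = a.mean() - b.mean()
        margin = bound_d * sd
        df = na + nb - 2
        p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
        p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
        return max(p_lower, p_upper) < alpha

    print(f"Cohen's d = {cohens_d(original, generated):+.3f}")
    print("equivalent within d = ±0.25:", tost_equivalent(original, generated))
    ```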
    14 min
  • OpenAI: Multi-Agent Portfolio Collaboration with OpenAI Agents SDK
    2025/06/24

    Summary of https://cookbook.openai.com/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration

    This guide from OpenAI introduces a multi-agent collaboration system built using the OpenAI Agents SDK, specifically designed for complex tasks like investment research. It demonstrates a "hub-and-spoke" architecture where a central Portfolio Manager agent orchestrates specialized agents (Macro, Fundamental, Quantitative) as callable tools.

    The system leverages various tool types, including custom Python functions, managed OpenAI tools like Code Interpreter and WebSearch, and external MCP servers, to provide deep, high-quality analysis and scalable workflows. The document emphasizes modularity, parallelism, and auditability through structured prompts and tracing, offering a blueprint for building robust, expert-collaborative AI systems.

    • Multi-Agent Collaboration is Essential for Complex Tasks. The core concept is that multiple autonomous LLM agents can coordinate to achieve overarching goals that would be difficult for a single agent to handle. This approach is particularly useful for complex systems, such as financial analysis, where different specialist agents (e.g., Macro, Fundamental, Quantitative) can each handle a specific subtask or expertise area.
    • The "Agent as a Tool" Pattern is Highly Effective. This guide specifically highlights and uses the "agent as a tool" collaboration model. In this pattern, a central agent (the Portfolio Manager) orchestrates the workflow by calling other specialist agents as if they were tools for specific subtasks. This design maintains a single thread of control, simplifies coordination, ensures transparency, and allows for parallel execution of sub-tasks, which is ideal for complex analyses (see the sketch after this list).
    • Modular Design Fosters Specialization, Parallelism, and Maintainability. Breaking down a complex problem into specialized agents, each with a clear role, leads to deeper, higher-quality research because each agent can focus on its domain with the right tools and prompts. This modularity also makes the system easier to update, test, or improve without affecting other components, and allows independent agents to work concurrently, dramatically reducing task completion time.
    • Flexible Integration of Diverse Tool Types Enhances Agent Capabilities. The OpenAI Agents SDK provides significant flexibility in defining and using various tool types. Agents can leverage custom Python functions for domain-specific logic, managed tools like Code Interpreter (for quantitative analysis) and WebSearch (for real-time information), and external MCP (Model Context Protocol) servers for standardized access to external data sources like Yahoo Finance.
    • Structured Orchestration and Observability are Crucial for Robust Systems. The Head Portfolio Manager agent's system prompt is central to the workflow, encoding the firm's philosophy, clear tool usage rules, and a multi-step process. This ensures consistent, auditable, and high-quality outputs. Furthermore, OpenAI Traces provide detailed visibility into every agent and tool call, allowing for real-time monitoring, debugging, and full transparency of the workflow.
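
    To make the "agent as a tool" pattern concrete, here is a minimal sketch using the OpenAI Agents SDK (`pip install openai-agents`). The agent names, instructions, and query are invented placeholders rather than the cookbook's actual prompts, and the Code Interpreter and MCP tools of the full example are omitted for brevity.

    ```python
    import asyncio
    from agents import Agent, Runner, WebSearchTool

    # Specialist agents (spokes). Instructions here are placeholders.
    macro_agent = Agent(
        name="Macro Analyst",
        instructions="Analyze macroeconomic conditions relevant to the question.",
        tools=[WebSearchTool()],  # managed tool for real-time information
    )
    fundamental_agent = Agent(
        name="Fundamental Analyst",
        instructions="Assess company fundamentals: earnings, margins, guidance.",
    )

    # Hub: the Portfolio Manager calls the specialists *as tools*, keeping a
    # single thread of control while allowing parallel tool calls.
    portfolio_manager = Agent(
        name="Portfolio Manager",
        instructions=(
            "Answer investment research questions. Call the macro and "
            "fundamental specialist tools, then synthesize their findings."
        ),
        tools=[
            macro_agent.as_tool(
                tool_name="macro_analysis",
                tool_description="Macroeconomic analysis of a research question.",
            ),
            fundamental_agent.as_tool(
                tool_name="fundamental_analysis",
                tool_description="Fundamental analysis of a research question.",
            ),
        ],
    )

    async def main():
        result = await Runner.run(
            portfolio_manager,
            "How would rising interest rates affect ACME Corp?",  # placeholder
        )
        print(result.final_output)

    asyncio.run(main())
    ```

    Because the specialists are exposed as tools rather than handoffs, the hub retains conversation state, and the SDK's tracing records each specialist call, supporting the auditability the guide emphasizes.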
    22 min
  • BCG: AI-First Companies Win the Future
    2025/06/23

    Summary of https://media-publications.bcg.com/BCG-Executive-Perspectives-AI-First-Companies-Win-the-Future-Issue1-10June2025.pdf

    This Boston Consulting Group (BCG) Executive Perspectives document, from June 2025, addresses how companies can become "AI-first" to achieve future success. It explains that the democratization of AI, shifting business economics, and the ability of AI-native firms to scale rapidly with lean teams necessitate this transformation.

    The report details five key characteristics of an AI-first organization: a wider competitive moat, a reshaped profit and loss (P&L) model, a decentralized tech foundation, an AI-first operating model, and specialized, scalable talent.

    It also provides five actionable steps for executives to begin their AI transformation journey, emphasizing a business-led AI agenda and the importance of demonstrating measurable impact.

    • Wider Competitive Moat. Companies will increase their ability to capitalize on key assets such as brand, intellectual property (IP), and talent. Brand trust, direct relationships with customers, ownership of innovations (including patents, trademarks, and copyrights), and exclusive, high-quality data sets become crucial as AI democratizes access and commoditizes content and advice.
    • Reshaped P&L Model. There will be high technology spending to support AI, with the value unlocked from efficiencies being reinvested. This involves a significant increase in tech spending (estimated 25-45%) and a decline in labor spending as AI reduces reliance on human-driven processes, ultimately boosting operating margins by redeploying value into growth priorities.
    • Decentralized Tech Foundation. Business units will be empowered to lead AI adoption and deploy AI solutions with increased speed and independence, while IT provides and maintains enterprise-wide AI platforms, agent ecosystems, and the overall tech, data, and cyber foundation.
    • AI-First Operating Model. Organizations will streamline their operations through reusable AI workflows and reduced duplication. This model shifts from traditional, people-centric processes supplemented by digital tools to processes built around AI agents, with human oversight for gap closure. This leads to flattened hierarchies, real-time governance, and an AI-embracing culture.
    • Specialized, Scalable Talent. Companies will develop lean, high-performing teams with specialized skills, focusing roles on judgment, strategy, and human-AI collaboration. AI will automate routine tasks, reshaping roles and potentially reducing headcount, while increasing productivity for top performers and intensifying the competition for skilled AI-fluent talent who will command a premium.
    15 min
  • McKinsey: Seizing the Agentic AI Advantage – A CEO Playbook
    2025/06/20

    Summary of https://www.mckinsey.com/~/media/mckinsey/business%20functions/quantumblack/our%20insights/seizing%20the%20agentic%20ai%20advantage/seizing-the-agentic-ai-advantage.pdf

    This McKinsey & Company report, "Seizing the Agentic AI Advantage," examines the current "gen AI paradox," where widespread adoption of generative AI has led to minimal organizational impact.

    The authors explain that AI agents, which are autonomous and goal-driven, can overcome this paradox by transforming complex business processes beyond simple task automation. The report outlines a strategic shift required for CEOs to implement agentic AI effectively, emphasizing the need to move from scattered experiments to integrated, large-scale transformations.

    This includes reimagining workflows around agents, establishing a new agentic AI mesh architecture, and addressing the human and governance challenges associated with deploying autonomous AI. Ultimately, the text argues that successful adoption of agentic AI will redefine how organizations operate, compete, and create value.

    • The Generative AI Paradox: Despite widespread adoption, nearly eight in ten companies using generative AI (gen AI) report no significant bottom-line impact. This "gen AI paradox" stems from an imbalance where easily scaled "horizontal" enterprise-wide tools (like copilots and chatbots) provide diffuse, hard-to-measure gains, while more transformative "vertical" (function-specific) use cases remain largely stuck in pilot mode.
    • Agentic AI as the Catalyst: AI agents offer a way to overcome this paradox by automating complex business processes. Unlike reactive gen AI tools, agents combine autonomy, planning, memory, and integration to become proactive, goal-driven virtual collaborators, unlocking potential far beyond mere efficiency gains.
    • Reinventing Workflows is Crucial: Realizing the full potential of agentic AI requires more than simply plugging agents into existing workflows; it necessitates reimagining and redesigning those workflows from the ground up, with agents at the core. This involves reordering steps, reallocating responsibilities between humans and agents, and leveraging agents' strengths like parallel execution and real-time adaptability for transformative impact.
    • New Architecture and Enablers for Scale: To effectively scale agents, organizations need a new AI architecture paradigm called the "agentic AI mesh". This composable, distributed, and vendor-agnostic framework enables agents to collaborate securely across systems while managing risks like uncontrolled autonomy and sprawl. Additionally, scaling requires critical enablers such as upskilling the workforce, adapting technology infrastructure, accelerating data productization, and deploying agent-specific governance mechanisms.
    • The CEO's Mandate and Human Challenge: The primary challenge in scaling agentic AI is not technical but human: earning trust, driving adoption, and establishing proper governance for autonomous systems. CEOs must lead this transformation by concluding the experimentation phase, realigning AI priorities with strategic programs, redesigning AI governance, and launching high-impact agent-driven projects to redefine how their organizations operate.
    25 min
  • LEGO/The Alan Turing Institute: Understanding the Impacts of Generative AI Use on Children
    2025/06/19

    Summary of https://www.turing.ac.uk/sites/default/files/2025-05/combined_briefing_-_understanding_the_impacts_of_generative_ai_use_on_children.pdf

    Presents the findings of a research project on the impacts of generative AI on children, combining both quantitative survey data from children, parents, and teachers with qualitative insights gathered from school workshops.

    The research, guided by a framework focusing on children's wellbeing, explores how children use generative AI for activities like creativity and learning. Key findings indicate that nearly a quarter of children aged 8-12 have used generative AI, primarily ChatGPT, with usage varying by factors such as age, gender, and educational needs.

    The document also highlights parent, carer, and teacher concerns regarding potential exposure to inappropriate content and the impact on critical thinking skills, while noting that teachers are generally more optimistic about their own use of the technology than its use by students.

    The research concludes with recommendations for policymakers and industry to promote child-centered AI development, improve AI literacy, address bias, ensure equitable access, and mitigate environmental impacts.

    • Despite a general lack of research specifically focused on the impacts of generative AI on children, and the fact that these tools have often not been developed with children's interests, needs, or rights in mind, a significant number of children aged 8-12 are already using generative AI, with ChatGPT being the most frequently used tool.
    • The patterns of generative AI use among children vary notably based on age, gender, and additional learning needs. Furthermore, there is a clear disparity in usage rates between children in private schools (52% usage) and those in state schools (18% usage), indicating a potential widening of the digital divide.
    • There are several significant concerns shared by children, parents, carers, and teachers regarding generative AI, including the risk of children being exposed to inappropriate information (cited by 82% of parents) or inaccurate information (77%), worries about the negative impact on children's critical thinking skills (shared by 76% of parents/carers and 72% of teachers), concerns about environmental impacts, potential bias in outputs, and teachers reporting students submitting AI-generated work as their own.
    • Despite concerns, the research highlights potential benefits of generative AI, particularly its potential to support children with additional learning needs, an area children and teachers both support for future development. Teachers who use generative AI also report positive impacts on their own work, including increased productivity and improved performance on teaching tasks.
    • To address the risks and realize the benefits, the sources emphasize the critical need for child-centred AI design, meaningful participation of children and young people in decision-making processes, improving AI literacy for children, parents, and teachers, and ensuring equitable access to both the tools and educational resources about them.
    22 min
  • OpenAI: Disrupting Malicious Uses of AI – June 2025
    2025/06/19

    Summary of https://cdn.openai.com/threat-intelligence-reports/5f73af09-a3a3-4a55-992e-069237681620/disrupting-malicious-uses-of-ai-june-2025.pdf

    Report detailing OpenAI's efforts to identify and counter various abusive activities leveraging their AI models. It presents ten distinct case studies of disrupted operations, including deceptive employment schemes, covert influence operations, cyberattacks, and scams.

    The report highlights how threat actors, often originating from China, Russia, Iran, Cambodia, and the Philippines, utilized AI for tasks ranging from generating social media content and deceptive resumes to developing malware and social engineering tactics.

    OpenAI emphasizes that threat actors' integration of AI into their workflows has paradoxically increased OpenAI's visibility into those malicious operations, allowing for quicker disruption and sharing of insights with industry partners.

    • OpenAI's mission is to ensure that artificial general intelligence (AGI) benefits all of humanity by deploying AI tools to solve difficult problems and defend against various abuses. This includes preventing AI use by authoritarian regimes, and combating covert influence operations (IO), child exploitation, scams, spam, and malicious cyber activity.
    • OpenAI has successfully detected, disrupted, and exposed a range of abusive activities by leveraging AI as a force multiplier for their expert investigative teams. These malicious uses of AI include social engineering, cyber espionage, deceptive employment schemes (like the "IT Workers" case), covert influence operations (such as "Sneer Review," "High Five," "VAGue Focus," "Helgoland Bite," "Uncle Spam," and "STORM-2035"), cyber operations ("ScopeCreep," "Vixen," and "Keyhole Panda"), and scams (like "Wrong Number").
    • These malicious operations originated from various global locations, demonstrating a widespread threatscape. Four of the ten cases in the report likely originated from China, spanning social engineering, covert influence operations, and cyber threats. Other disruptions involved activities from Cambodia (task scam), the Philippines (comment spamming), and covert influence attempts potentially linked with Russia and Iran. Additionally, deceptive employment schemes showed behaviors consistent with North Korea (DPRK)-linked activity.
    • Threat actors utilized AI to evolve and scale their operations, yet this reliance also increased their exposure and aided in their disruption. For example, AI was used for automating resume creation, generating social media content, translating messages for social engineering, and developing malware. Paradoxically, this integration of AI into their workflows provided OpenAI with insights, enabling quicker identification and disruption of these threats.
    • AI investigations are an evolving discipline, and ongoing disruptions help refine defenses and contribute to a broader understanding of the AI threatscape. OpenAI emphasizes that each disrupted operation improves their understanding of how threat actors abuse their models, allowing them to refine their defenses and share findings with industry peers and authorities to strengthen collective defenses across the internet.
    24 min
  • Oakland University: The Memory Paradox – Why Our Brains Need Knowledge in an Age of AI
    2025/06/18

    Summary of https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250447

    Argues that human memory remains crucial even in the age of AI. It explores the neuroscience behind learning, detailing how the brain utilizes declarative and procedural memory systems and organizes knowledge into schemata and neural manifolds.

    The authors propose that cognitive offloading to digital tools, while seemingly efficient, can undermine these internal cognitive processes, potentially contributing to phenomena like the reversal of the Flynn Effect.

    They advocate for educational approaches that balance technology use with the active internalization of knowledge, suggesting that understanding the brain's natural learning mechanisms is key to designing effective education in the digital age.

    • The central "Memory Paradox" is that in the age of generative AI and ubiquitous digital tools, increasing reliance on external aids to store or handle information can weaken human cognitive capacities by reducing the exercise of internal memory systems.
    • Neuroscience explains that developing deep understanding, fluency, and intuition requires internalizing knowledge through repeated practice, allowing information to transition from the declarative memory system (facts and concepts) to the procedural memory system (skills and routines); excessive reliance on external tools prevents this crucial "proceduralization".
    • Building robust internal mental frameworks, known as schemata, which are supported by optimized neural patterns called neural manifolds, is essential for organizing knowledge, enabling efficient thinking, detecting errors, and supporting critical thinking and creativity; constantly looking information up hinders the formation of these internal structures.
    • Shifts in educational practices away from emphasizing memorization and explicit content instruction, coinciding with the rise of digital tools and cognitive offloading, are linked to the recent reversal of the Flynn Effect—the decline in IQ scores observed in developed countries—suggesting societal-level consequences for cognitive performance when internal memory is devalued.
    • Effective learning in the digital age requires balancing the use of external technology to support internal cognitive work rather than replacing it. Strategies should promote active engagement, structured practice, memorization of foundational knowledge, and utilizing tools that encourage the brain's natural learning mechanisms like prediction error detection and schema formation.
    28 min
  • Pearson: Asking to Learn – What Student Queries to Generative AI Reveal About Cognitive Engagement
    2025/06/17

    Summary of https://plc.pearson.com/sites/pearson-corp/files/asking-to-learn.pdf

    Analyzing student queries to an AI-powered study tool reveals that while many questions focus on basic factual and conceptual knowledge, a significant portion demonstrates higher-order thinking skills, suggesting the tool can support deeper learning.

    Insights from this study are being used to develop features that encourage students to ask more complex questions. The authors emphasize that meaningfully integrating AI tools into learning can foster a richer, more active educational experience.

    • A large-scale study analyzed 128,725 student queries from 8,681 unique users interacting with the "Explain" feature of an AI-powered study tool embedded in an eTextbook. Because Explain queries are open-ended, the analysis treated them as a window into student thought processes.
    • Using Bloom's Taxonomy, the analysis found that 80% of student inputs related to basic Factual or Conceptual knowledge, such as definitions or understanding connections. This aligns with the introductory biology course context (a classification sketch follows this list).
    • However, the data also showed that about one-third of inputs reflected more advanced cognitive complexity, and 20% were at levels suggesting higher-order thinking skills (Analyze and above), indicating potential for deeper learning beyond basic recall.
    • The presence of higher-level queries suggests that many students are actively framing their inquiries rather than passively seeking information, pointing to the tool's potential to foster more advanced cognitive skills when thoughtfully integrated.
    • Insights from the analysis have directly informed the development of a new "Go Deeper" feature which suggests follow-up questions targeting higher cognitive levels to encourage deeper engagement.
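
    The report does not publish its query-coding pipeline, but the Bloom-level labeling it describes can be approximated with a zero-shot classifier. Everything below (model choice, prompt wording, and the sample queries) is an assumption for illustration, not Pearson's method.

    ```python
    # Illustrative sketch: label student queries with Bloom's taxonomy levels.
    from openai import OpenAI

    BLOOM_LEVELS = ["Remember", "Understand", "Apply",
                    "Analyze", "Evaluate", "Create"]

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def bloom_level(query: str) -> str:
        """Ask the model to assign exactly one Bloom level to a query."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model, not the report's
            messages=[
                {"role": "system",
                 "content": "Classify the student's question into exactly one "
                            "Bloom's taxonomy level. Reply with one word from: "
                            + ", ".join(BLOOM_LEVELS) + "."},
                {"role": "user", "content": query},
            ],
            temperature=0,
        )
        return response.choices[0].message.content.strip()

    # Hypothetical queries spanning lower- and higher-order levels.
    for q in ["What is a ribosome?",
              "Why would a mutation here change the protein's shape?"]:
        print(f"{bloom_level(q):>10}  <-  {q}")
    ```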
    14 min