Security Incubation
Posts
Rethinking AI Red Teaming: From Model Bugs to Systemic Resilience

Rethinking AI Red Teaming: From Model Bugs to Systemic Resilience

A systems-level approach to AI red teaming for managing emergent risks and sociotechnical vulnerabilities

Matt James
July 18, 2025

Red teaming, long rooted in military strategy and later adopted by cybersecurity professionals, is undergoing a transformation in the context of artificial intelligence. As generative AI systems become increasingly embedded in critical social and economic infrastructures, organizations and policymakers have turned to red teaming as a core technique for identifying weaknesses and enhancing safety. However, a recent paper from researchers at Stanford and Georgetown argues that the current approach to AI red teaming is too narrow and risks missing the forest for the trees.

The Problem: A Narrow Focus on Models

Today's AI red teaming efforts largely concentrate on exposing flaws within a single AI model—what the authors refer to as micro-level red teaming. These exercises often involve prompt injection, jailbreaks, or adversarial testing to surface undesired outputs. While important, this approach falls short of addressing the deeper and more complex risks associated with real-world deployment of AI systems.

The paper critiques this reductionist focus, pointing out that most red teaming exercises fail to account for sociotechnical dynamics—the ways AI models interact with users, institutions, and environments. As the authors highlight, emergent risks often arise not from isolated model failures but from the broader context in which AI is used, such as how misinformation spreads, how feedback loops reinforce biases, or how users exploit system-level blind spots.

A Broader Vision: Two Levels of AI Red Teaming

To fill this gap, the authors propose a comprehensive framework that distinguishes between two complementary levels of AI red teaming:

Micro-level red teaming targets the model itself, evaluating prompt behavior, guardrails, and output reliability. This is the current norm in most AI labs.
Macro-level red teaming, by contrast, considers the entire AI system lifecycle, from data sourcing and training pipeline to deployment environment and downstream impacts. It involves assessing risks that emerge from complex interactions between models, users, interfaces, and institutions.

By operationalizing red teaming at both levels, organizations can shift from reactive patchwork to proactive risk management.

Recommendations for the Future

Drawing on decades of experience from cybersecurity and systems engineering, the authors offer a set of actionable recommendations to improve AI red teaming practices:

Treat Red Teaming as an Ongoing Process: Red teaming should not be a one-time product launch checklist but an iterative, continuous evaluation across the AI lifecycle.
Build Multifunctional Red Teams: Effective red teams should include experts in machine learning, human-computer interaction, social science, cybersecurity, and domain-specific knowledge. This diversity is essential to uncovering emergent, cross-cutting vulnerabilities.
Evaluate Sociotechnical Contexts: Red teams must consider how AI systems are used in practice—including how users might manipulate them, how they interact with other tools, and how organizational incentives shape deployment.
Incorporate Systems Thinking: Rather than viewing AI safety as a matter of model robustness alone, organizations must assess systemic resilience—identifying feedback loops, cascades, and coordination failures that lead to large-scale harm.
Embrace Transparency and External Testing: Independent red teaming and disclosure mechanisms can surface risks that internal teams may miss due to institutional blind spots or incentive structures.

The field of AI safety is at an inflection point. As AI systems become more powerful and socially embedded, the traditional model-focused approach to red teaming must evolve. This paper offers a timely and much-needed reframing of red teaming—not as a narrow adversarial testing tool, but as a holistic practice rooted in systems thinking, interdisciplinary collaboration, and continuous scrutiny.

Organizations that adopt this broader vision will be better positioned to identify and mitigate not just model-level bugs, but the real-world harms that arise when AI meets messy, unpredictable human systems.

Citation:
Sharkey, L., Pasquinelli, M., Cheng, B., Dobbe, R., et al. Operationalizing Red Teaming for AI Systems. arXiv preprint arXiv:2507.05538. July 2025.
https://arxiv.org/abs/2507.05538