Rethinking AI Red Teaming: From Model Bugs to Systemic Resilience

A systems-level approach to AI red teaming for managing emergent risks and sociotechnical vulnerabilities

Red teaming started as a military practice and eventually became a staple in cybersecurity, but it’s now being reshaped by the rise of artificial intelligence. As generative AI systems weave deeper into everyday life and major industries, both companies and policymakers have leaned on red teaming to spot weaknesses and improve safety. But a recent paper from Stanford and Georgetown researchers argues that today’s AI red-teaming efforts are too limited — and that we may be focusing on small issues while overlooking much bigger risks.

The Problem: A Narrow Focus on Models

Most AI red-teaming work today zeroes in on finding weaknesses in a single model — what the authors call “micro-level” red teaming. These tests usually revolve around prompt injection, jailbreaks, and other adversarial tactics to coax out bad behavior. Useful as that is, it doesn’t get at the broader, more complicated risks that emerge once AI systems are deployed in real-world settings.

The authors argue that this narrow focus misses the bigger picture. Most red-teaming efforts ignore the sociotechnical side of things — how AI systems behave once real people, institutions, and environments get involved. Many of the most serious risks don’t come from a model misfiring on its own, but from how it’s used in the world: misinformation that snowballs, feedback loops that amplify bias, or users finding and exploiting gaps in the overall system rather than the model itself.

A Broader Vision: Two Levels of AI Red Teaming

To fill this gap, the authors propose a comprehensive framework that distinguishes between two complementary levels of AI red teaming:

  • Micro-level red teaming targets the model itself, evaluating prompt behavior, guardrails, and output reliability. This is the current norm in most AI labs.

  • Macro-level red teaming, by contrast, considers the entire AI system lifecycle, from data sourcing and training pipeline to deployment environment and downstream impacts. It involves assessing risks that emerge from complex interactions between models, users, interfaces, and institutions.

By operationalizing red teaming at both levels, organizations can shift from reactive patchwork to proactive risk management.

Recommendations for the Future

Drawing on decades of experience from cybersecurity and systems engineering, the authors offer a set of actionable recommendations to improve AI red teaming practices:

  1. Treat Red Teaming as an Ongoing Process: Red teaming should not be a one-time product launch checklist but an iterative, continuous evaluation across the AI lifecycle.

  2. Build Multifunctional Red Teams: Effective red teams should include experts in machine learning, human-computer interaction, social science, cybersecurity, and domain-specific knowledge. This diversity is essential to uncovering emergent, cross-cutting vulnerabilities.

  3. Evaluate Sociotechnical Contexts: Red teams must consider how AI systems are used in practice—including how users might manipulate them, how they interact with other tools, and how organizational incentives shape deployment.

  4. Incorporate Systems Thinking: Rather than viewing AI safety as a matter of model robustness alone, organizations must assess systemic resilience—identifying feedback loops, cascades, and coordination failures that lead to large-scale harm.

  5. Embrace Transparency and External Testing: Independent red teaming and disclosure mechanisms can surface risks that internal teams may miss due to institutional blind spots or incentive structures.

The field of AI safety is at an inflection point. As AI systems become more powerful and socially embedded, the traditional model-focused approach to red teaming must evolve. This paper offers a timely and much-needed reframing of red teaming—not as a narrow adversarial testing tool, but as a holistic practice rooted in systems thinking, interdisciplinary collaboration, and continuous scrutiny.

Organizations that adopt this broader vision will be better positioned to identify and mitigate not just model-level bugs, but the real-world harms that arise when AI meets messy, unpredictable human systems.

Citation:
Sharkey, L., Pasquinelli, M., Cheng, B., Dobbe, R., et al. Operationalizing Red Teaming for AI Systems. arXiv preprint arXiv:2507.05538. July 2025.
https://arxiv.org/abs/2507.05538