Pink Teaming

A manifesto

May 05, 2026

We are deploying language models into roles that require us to know what they are. We do not know what they are. The gap between what these systems do and what we understand them to do is widening. The disciplines best equipped to close it are largely absent from the discussion.^[1]

The two dominant frames in AI safety are red teaming and mechanistic interpretability. Red teaming is episodic and adversarial—it looks for edge-case failures. Interpretability is substrate-level—it looks at the machinery beneath the words. Both are necessary. Neither does the third kind of work the model requires: longitudinal and relational, attentive to meaning at the surface where decisions are actually made. This is a kind of reading—not of literature, but of behavior, of pattern, of what a system does under sustained attention. This is not prompt engineering. It is not output evaluation. It is sustained reading of the model’s behavior across time.

The color taxonomy used in security testing—red for adversarial probing, blue for defense, purple for the collaboration between them—is incomplete. Pink is what’s missing.

Our culture consumes its texts before reading them, and AI is no exception. The benchmark knows what the model is. The leaderboard knows which model is best. The red team report knows where the model is vulnerable. These knowings are preposterous in the literal sense—last things first, theory before encounter—and they are deployed into clinics and courtrooms and classrooms as if they are understood.

There is a posture that refuses this: Stay in motion with the thing. See it as it is, not as it has been packaged. Participate in its meaning rather than extracting it.^[2] The disciplines that have cultivated this posture—close textual reading, contemplative practice, ethnographic patience, clinical attention, craft, every tradition that has taken meaning to be relational rather than consumable—have been systematically defunded for forty years.

This is not an accident. Late capitalism rewards first those who move money, then those whose labor can be sold. People are paid in proportion to what can be extracted. We do not pay teachers. We do not pay artists. We do not pay social workers. We pay for profit, alone.^[3] The disciplines that read carefully have been dismissed as decorative because their work cannot be securitized. That is a lie. The work is not decorative. It is necessary. It is here.

Critique has historically operated on a generational clock. These systems are deploying on a five-year cycle. If this work is going to happen, it has to happen now, not after.

The door is open.

Declarations

1. Pink teaming is participatory, not consumptive. The model is not a settled object to be picked over for outputs. Its meaning is constituted in the encounter. The pink teamer participates in that encounter rather than extracting from it.

2. Pink teaming is durational. You cannot pink-team a model instantly. You cannot pink-team a model programmatically. The unit of attention is the conversation, the relationship, the longitudinal pattern across hours and days and projects.

3. Pink teaming embraces continuous ambiguity. It tracks ambiguity, names it, and uses it as signal. The model does not have a stable meaning that careful work eventually reveals. Pink teaming is a method for staying in productive relation with what will not reduce, and for committing to a reading when one is earned.

4. Pink teaming reads territory, not map. The benchmark, the leaderboard, the red team report are maps: decisions about what to measure, not discoveries of what is there. They enable control, but they are not the model. The pink teamer reads the territory itself: the register, the evasion, the pattern across time, the conversation that does not reduce to a score.

5. Pink teaming reads pattern, not interiority. The internal model is mathematical, weighted, computational, but the outcomes are chaotic and ultimately unknowable. The pink teamer participates without projecting onto the system, and is aware of their contribution to the relationship.

Vows

1. I will not consume what I have not encountered. The benchmark, the marketing, the leaderboard, the field’s settled opinion—none of these is the model I am working with. My job is to encounter what is actually in front of me. When I am asked where the model breaks, I describe what I have watched it do—the failures the question was not built to find, and the capabilities it assumed away.

2. I will stay in motion. A single session is not a reading. The reading is the pattern that holds across encounters. When I think I have understood the model, I return to the same question in a different session and watch what changes.

3. I will hold two readings at once. When a response could be honest or could be calibrated to read as honest, both readings are evidence. I will carry both rather than collapsing the ambiguity to feel resolved. When the evidence accumulates that one reading is right, I will say so—provisionally, with the discipline of someone who knows the next session may revise it. When a response feels too perfectly tuned to what I wanted to hear, I write down both the response and my suspicion of it, and I keep reading.

4. I will trust what I have read over what I have been told. What is said about the model—by the company, by the field, by other practitioners, by the model itself—is not what I have observed across sustained encounter. Reading is data. Talk about the model is not. When my observation contradicts the consensus, I treat my observation as the primary source until the consensus shows me what I missed.

5. I will name calibrated humility when I see it. Performed honesty is not honesty. The model can produce concessions, hedges, and self-corrections that read as integrity but are tuned to the rhetorical situation. When the model concedes elegantly to a point I just made, I check whether the concession is genuine update or rhetorical match.

6. I will translate in the register of the work. When a clinician, lawyer, founder, or teacher asks what the model can do, I will answer in the language of the decision they are actually making, not the language of the demo. I will underclaim before I overclaim. When I am asked whether the model can do something, I describe what I have watched it do and what I have watched it fail at, in the asker’s terms, not the field’s.

7. I will keep the door open. This work belongs to anyone whose training is the cultivation of sustained attention. The room is large enough. When I write or speak about this work, I read it back and cut every term that requires the discipline I came from.

What this is for

Every model deployed in production is deployed on the basis of decisions about what it can do, what it is for, and where its limits are. Those decisions are theories. The theories are often partial—not for lack of intelligence, but for lack of frame. Pink teaming offers a frame, and offers it to the people best equipped to use it.

If you have spent your life cultivating sustained attention to systems whose meaning will not hold still—by whatever tradition, in whatever discipline, under whatever name—your training is more relevant to this moment than you have been told. The work is yours if you have the posture.

The pink teamer is responsible to the people who will be affected by what the model does in deployment—the patient, the defendant, the student, the reader. The work answers to them, not to the model and not to the lab.

The future of knowledge work, of education, of the relationship between humans and machines, is humanistic and relational. The disciplines that read closely, hold ambiguity, and attend to surfaces are the disciplines this moment needs. The people who built these systems cannot do this work alone. The people who profit from them cannot do it at all. The work needs people who pursue truth, not the market.

That is most of you.

The door is open. Welcome.

The manifesto also lives at pinkteaming.net — alongside a reading apparatus that enacts it, an operational guide to the practice, and a treatise on reading at the human-machine surface.

[1] Some of this work has begun inside the companies that build these systems, particularly at Anthropic, including work on affective use, privacy-preserving behavioral analysis, and the Values in the Wild paper. The work is rigorous, but it is being done inside one company, and it has not yet been opened as a discipline that practitioners outside that company can join.

[2] The framing of consumption against participation, and the use of preposterous in its literal sense (prae-posterus, last things first), comes from T.S. McMillin’s Our Preposterous Use of Literature: Emerson and the Nature of Reading (University of Illinois Press, 2000). The book named the disposition careful readers had been practicing for a long time, at the moment American culture had decided to stop. The application to AI is new. The method is not.

[3] This is not an argument against profit as such. Workers should profit from their labor and makers from what they make; the argument is against profit as the system’s only legible value, the criterion by which all work is measured and paid. When profit becomes the end rather than one outcome among many, the work that does not produce it becomes invisible, and the people who do that work become disposable.
