Protecting Cognitive Integrity: Our internal AI use policy (V1)

We at the GPAI Policy Lab wanted to share our V1 policy as an invitation to argue about it. What motivates it is partly extrapolation, partly internal conversations about AI capabilities and their effects on cognition, and partly some empirical evidence. I think the expected cost of being somewhat over-cautious here is lower than the cost of being under-cautious, and the topic deserves considerably more attention than it's currently getting. I'd love to see more orgs publish their own policies on this, both to compare experiences and to develop shared best practices.

I'd particularly welcome:
- Counterarguments from people who think this kind of policy is overblown, counterproductive, or targets the wrong mechanisms.
- Experience from other AI safety or AI policy orgs that have tried something similar: what worked, what didn't, and what you'd change for V2.
- Specific critiques of the restrictions themselves: too narrow, too broad, wrong category, wrong threshold.
- Or even alternative framings. Do you think "cognitive integrity" is the right handle? Is this a special case of a more general problem we should be thinking about differently?

If you've written anything on this, or are working on something similar inside your own organization, I'd be very interested in hearing from you.

Why I'm writing this

I want to share a policy we've put in place internally at the GPAI Policy Lab, because I think the underlying problem is one that more AI safety and AI policy organizations should be thinking about, and acting on, now rather than later.

The concern is that daily use of capable AI systems can gradually compromise the cognitive integrity we most need to do our work. I take this very seriously, and I think more people should. A significant part of what we do requires good reasoning, good judgment, and object-level understanding of the issues.

To preserve those capacities, and to keep making progress in a world of AI agents, I don't think individual willpower is enough. My view is that we need norms, concretely specified and shared, rather than asking each person to manage this on their own. The failure modes are subtle enough, and could unfold fast enough, that people often don't notice them happening, which is part of what makes this potentially dangerous.

So we wrote a V1 and we apply it internally. I'm publishing and sharing it for two reasons:
- One is that having an internal policy, even an imperfect one, is probably a useful lever for awareness, both inside your own org and more broadly. Writing something concrete forces specificity, and specificity is what lets you actually notice when you're violating it. If this problem is real and growing, I'd like the conversation to be happening inside more orgs sooner rather than later.
- The other is that I want feedback. A broader reflection on these questions should probably exist and, as far as I can tell, doesn't really. I'd like to know what other AI safety and AI policy organizations have tried, what has worked, what hasn't, and what we're not seeing. Our V1 is only one data point, but I'd like there to be many.

What follows is a translation of our internal document, originally written in French. I've kept the original structure: hard restrictions, then warning signals, then what to do about them.


The policy

We use AI every day, on many tasks. Daily interaction with AI systems can affect our cognitive and judgment capacities, and we expect these effects to worsen as model capabilities grow. We think it's necessary to take preventive measures to avoid these effects, or at least limit them.

The document sets out our hard restrictions and the signals we watch for collectively.

1. Hard restrictions

No delegation of value judgments to AI: asking it to evaluate a situation, an action, or a moral dilemma. Forming a considered judgment, built on reasoning and properly supported, is a core capacity we want to preserve and deliberately keep exercising. It's not a capacity we want to delegate.

Good uses:

  • Asking AI how different schools of thought would approach a given dilemma, to enrich your own thinking.
  • Asking AI to list arguments for and against a position, likely objections, or angles you hadn't considered.

Uses to avoid:

  • Asking AI "should we publish this note?", "is this position morally defensible?", "am I right in this disagreement?"
  • Using AI's response as an arbiter in a team disagreement ("I checked with Claude, it agrees with me").

No emotional management, relationship management, or metaphysical conversation via AI in a professional context. These are the conversations where the documented patterns of cognitive drift (AI psychosis, among others) tend to crystallize, and where they have been most frequently observed. They also short-circuit exchanges that should be happening inside the team.

Good uses:

  • "I've thought about what I want to say to X and why; I'm asking AI to help me rephrase for clarity or diplomacy."
  • "I've drafted a difficult message; I'm asking AI to spot ambiguous passages or ones that could land badly."

Uses to avoid:

  • Asking AI "am I right to be annoyed with X?"
  • Asking AI "how should I react to what Y said?" and delegating the judgment.
  • Venting professional frustration to AI to offload it. At best the frustration is discharged without the problem being resolved; at worst, a conversation that should have happened inside the team doesn't happen.
  • Sending a message whose final wording came from AI without checking that it actually matches what you think and how you want to be perceived.

No prolonged, solitary AI sessions on high-stakes topics. If you find yourself in conversation with AI for more than 90 minutes on the same topic - a dilemma, a consequential decision, an analysis whose conclusions depend on initial framing - without having talked to a human, stop. The documented patterns of cognitive drift start with prolonged, isolated sessions. The phenomenon will likely intensify as models become more capable, so it's better to establish the norm ahead of time.

Good uses:

  • Working two hours with AI on reformatting, debugging, or layout. The rule doesn't apply to mechanical tasks.

Uses to avoid:

  • Stringing together several hours of reflection without talking to anyone.

No AI interaction as the first action for reasoning and analysis tasks. These faculties are critical for what we do at GPAI Policy Lab. We must maintain, and keep developing, our capacity to think and reason without relying primarily on framings we didn't produce.

Good uses:

  • "I started by reasoning on my own, I thought about my plan and the arguments; now I want to stress-test and rephrase with AI."
  • Using AI for mechanical translation, formatting, research.

Uses to avoid:

  • Using AI to start a reasoning process and build an argument from scratch.

2. Individual and collective warning signals

Erosion of cognitive integrity can be subtle and gradual, which makes it harder to detect and prevent. This is why watchfulness has to be collective.

Individual warning signals (watch for in yourself):

  • Difficulty formulating a position on a topic without having consulted AI first.
  • Defaulting to AI as your first interlocutor.
  • Needing an immediate response; discomfort with uncertainty or with sustained autonomous reflection.
  • The feeling that AI always confirms your intuitions, combined with noticing that no recent interaction has led you to revise a position or recognize an objection you hadn't already identified.

Collective warning signals (watch for in the team):

  • AI invoked as an argument from authority in meetings ("I checked with Claude", "the AI confirms that…").
  • Deliverable quality is constant or improving, but the ability to defend the work orally is declining.

3. Protocol

⚠️ When one or more warning signals appear:

  • If you see the signal in yourself: stop using AI on the task in question and pick it back up autonomously, then talk to a colleague or manager about it.
  • If you see it in a colleague: give them the feedback directly, in an informal setting.
  • If the pattern persists or affects more than one person: raise it at a team meeting.



