Author name: Prasad Mahadik, Adrians Skapars

Do Linear Probes Generalize Better in Persona Coordinates?

Prasad Mahadik, Adrians Skapars / May 18, 2026

arXiv:2605.09391v2 Announce Type: replace
Abstract: Monitors that check for harmful behaviors during language model interactions are increasingly necessary, but text-only monitoring has not been sufficient. This is because models sometim…

cs.AI

Do Linear Probes Generalize Better in Persona Coordinates?

Prasad Mahadik, Adrians Skapars / May 12, 2026

arXiv:2605.09391v1 Announce Type: new
Abstract: Monitors that check for harmful behaviors during language model interactions are increasingly necessary, but text-only monitoring has not been sufficient. This is because models sometimes e…