cs.AI, cs.CL, cs.LG

Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs

arXiv:2503.05371v3 Announce Type: replace
Abstract: We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We compute 8 steering vectors, each correspo…
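To illustrate the general idea of steering model activations during a forward pass, here is a minimal sketch in plain Python. It is not the paper's method: the abstract is truncated before it describes how the vectors are computed, so the difference-of-means construction below is an assumption, chosen because it is a common way to build steering vectors. The names `mean_vector`, `steering_vector`, `apply_steering`, and the scaling factor `alpha` are hypothetical, and the toy 2-dimensional activations are made up for the example.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Average a set of activation vectors dimension by dimension."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(target: Sequence[Sequence[float]],
                    source: Sequence[Sequence[float]]) -> Vector:
    """Assumed construction: mean(target activations) - mean(source activations)."""
    mean_target = mean_vector(target)
    mean_source = mean_vector(source)
    return [t - s for t, s in zip(mean_target, mean_source)]


def apply_steering(hidden: Sequence[float],
                   vector: Sequence[float],
                   alpha: float = 1.0) -> Vector:
    """Shift one hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations (illustrative only) recorded at one layer for two prompt sets.
biased_acts = [[1.0, 0.0], [3.0, 2.0]]
unbiased_acts = [[0.0, 1.0], [2.0, 3.0]]

# Direction pointing from the "biased" mean toward the "unbiased" mean.
v = steering_vector(unbiased_acts, biased_acts)

# Modify a hidden state as it would be modified during a forward pass.
steered = apply_steering([0.5, 0.5], v, alpha=2.0)
print(v, steered)
```

In a real model the same addition would typically be applied to a chosen layer's hidden states at inference time, for example through a forward hook. Which layer to steer and how large to make `alpha` are tuning choices that this sketch does not address.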