Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
arXiv:2503.05371v3 Announce Type: replace
Abstract: We present a novel approach to bias mitigation in large language models (LLMs) that applies steering vectors to modify model activations during the forward pass. We compute 8 steering vectors, each correspo…
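
The visible part of the abstract does not say how the steering vectors are computed or where they are applied, so the following is only a minimal sketch of the general idea of activation steering, not the authors' method. It assumes, purely for illustration, that a steering vector is the difference between the mean activations of two contrasting sets of inputs and that it is added to a hidden activation with a scale factor. The names `mean_vector`, `steering_vector`, `apply_steering`, and `alpha`, as well as the toy numbers, are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Assumed recipe: difference of mean activations between two contrast sets."""
    mean_pos = mean_vector(pos)
    mean_neg = mean_vector(neg)
    return [a - b for a, b in zip(mean_pos, mean_neg)]


def apply_steering(hidden: Vector, v: Vector, alpha: float = 1.0) -> Vector:
    """Shift one hidden activation by alpha * v (assumed additive intervention)."""
    return [h + alpha * s for h, s in zip(hidden, v)]


# Toy activations standing in for one layer's hidden states (made-up values).
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [2.0, 1.0]]

v = steering_vector(pos, neg)
steered = apply_steering([0.5, 0.5], v, alpha=2.0)
print(v, steered)
```

In a real model the same addition would typically happen inside the network at a chosen layer, for example through a forward hook, rather than on standalone lists.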