HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
arXiv:2511.06391v3 Announce Type: replace
Abstract: Optimization of offensive content moderation models for different types of hateful messages is typically achieved through continued pre-training or fine-tuning on new hate speech benchmarks. However,…