Author name: Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang

Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs

Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang / April 20, 2026

arXiv:2604.15780v1 Announce Type: new
Abstract: Machine learning models are increasingly deployed in real-world applications, but even aligned models such as Mistral and LLaVA still exhibit unsafe behaviors inherited from pre-training. Current alignme…

cs.CL

A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang / April 20, 2026

arXiv:2604.15789v1 Announce Type: new
Abstract: As Large Language Models (LLMs) receive increasing attention and are being deployed across various domains, their potential risks, including generating harmful or biased content, producing unsupported cl…