cs.AI, cs.LG

Bounded Ratio Reinforcement Learning

arXiv:2604.18578v1 Announce Type: new
Abstract: Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significan…

cs.CV, cs.LG

Vision Language Models are Biased

arXiv:2505.23941v4 Announce Type: replace
Abstract: Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks but also may notoriously sway their outputs towards wrong or biased answer…