Carnegie Mellon at ICLR 2026

CMU researchers are presenting 194 papers at the Fourteenth International Conference on Learning Representations (ICLR 2026), held from April 23rd-April 27th at the Riocentro Convention and Event Center in Rio de Janeiro, Brazil. Here is a quick overview of the areas our researchers are working on:

Here are our most frequent collaborator institutions:

Oral Papers

EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits

Authors: Wayne Chi (CMU), Valerie Chen (Carnegie Mellon University), Ryan Shar (Apple), Aditya Mittal (CMU, Carnegie Mellon University), Jenny Liang (School of Computer Science, Carnegie Mellon University), Wei-Lin Chiang (UC Berkeley / LMSYS), Anastasios Angelopoulos (University of California Berkeley), Ion Stoica (), Graham Neubig (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Chris Donahue (CMU / Google DeepMind)

This work introduces EditBench, a new benchmark for testing how well AI models can edit existing code based on user instructions. Unlike prior benchmarks, it uses real-world coding tasks and contexts, including things like the surrounding code and cursor position. The benchmark includes 545 diverse problems, and results show that most models struggle—only a few achieve strong performance. The study also finds that having more realistic context significantly impacts how well models perform, highlighting the importance of evaluating code-editing in real-world settings.

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning

Authors: Jinchuan Tian (CMU, Carnegie Mellon University), Sang-gil Lee (NVIDIA), Zhifeng Kong (NVIDIA), Sreyan Ghosh (Nvidia), Arushi Goel (NVIDIA), Chao-Han Huck Yang (NVIDIA Research), Wenliang Dai (NVIDIA), Zihan Liu (Nvidia), Hanrong Ye (NVIDIA), Shinji Watanabe (Carnegie Mellon University), Mohammad Shoeybi (NVIDIA), Bryan Catanzaro (NVIDIA), Rafael Valle (NVIDIA), Wei Ping (Nvidia)

This paper introduces the Unified Audio Language Model (UALM), a single model designed to handle audio understanding, text-to-audio generation, and multimodal reasoning together. Instead of treating these as separate tasks, UALM learns to both interpret and generate audio, achieving performance comparable to specialized state-of-the-art models. The authors also show that combining text and audio during the model’s reasoning process improves its ability to handle complex tasks. Overall, the work demonstrates a step toward more general AI systems that can reason across both language and sound.

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Authors: Yueqi Song (CMU), Ketan Ramaneti (Amazon), Zaid Sheikh (Carnegie Mellon University), Ziru Chen (Ohio State University, Columbus), Boyu Gou (Ohio State University, Columbus), Tianbao Xie (the University of Hong Kong, University of Hong Kong), Yiheng Xu (University of Hong Kong), Danyang Zhang (Shanghai Jiao Tong University), Apurva Gandhi (Carnegie Mellon University), Fan Yang (Fujitsu), Joseph Liu (School of Computer Science, Carnegie Mellon University), Tianyue Ou (Carnegie Mellon University), Zhihao Yuan (Carnegie Mellon University), Frank F Xu (Carnegie Mellon University), Shuyan Zhou (Facebook), Xingyao Wang (All Hands AI), Xiang Yue (Carnegie Mellon University), Tao Yu (University of Hong Kong), Huan Sun (Ohio State University), Yu Su (Ohio State University), Graham Neubig (Carnegie Mellon University)

This work introduces the Agent Data Protocol (ADP), a standardized format for representing training data for AI agents. The authors argue that the main challenge isn’t a lack of data, but that existing datasets are fragmented across different formats and tools. ADP acts as a common “interlingua,” making it easier to combine diverse data sources—like coding, browsing, and tool use—into a single training pipeline. By converting 13 datasets into this unified format, the authors show that models trained on the combined data achieve improved performance.

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Authors: Joonghyuk Shin (Seoul National University), Zhengqi Li (Google), Richard Zhang (Adobe), Jun-Yan Zhu (Carnegie Mellon University), Jaesik Park (Seoul National University), Eli Shechtman (Adobe), Xun Huang (Adobe Research)

This paper introduces MotionStream, a system for generating videos in real time based on motion and text inputs. Unlike prior methods that take minutes to produce a video, MotionStream can stream results at up to 29 frames per second on a single GPU. The key idea is to train a fast, causal model that can generate video continuously, using techniques that prevent quality from degrading over long sequences. As a result, users can interactively control motion—like drawing paths or moving a camera—and see the video update instantly.

OpenThoughts: Data Recipes for Reasoning Models

Authors: Etash Guha (Stanford University, Anthropic), Ryan Marten (Harbor), Sedrick Keh (Toyota Research Institute), Negin Raoof (University of California, Berkeley), Georgios Smyrnis (University of Texas, Austin), Hritik Bansal (University of California, Los Angeles), Marianna Nezhurina (Juelich Supercomputing Center, LAION, Tuebingen University), Jean Mercat (Toyota Research Institute (TRI)), Trung Vu (Google), Zayne Sprague (New York University), Ashima Suvarna (UCLA), Benjamin Feuer (Stanford University), Leon Liangyu Chen (Stanford University), Zaid Khan (University of North Carolina at Chapel Hill), Eric Frankel (Department of Computer Science, University of Washington), Sachin Grover (Arizona State University), Caroline Choi (None), Niklas Muennighoff (Stanford University), Shiye Su (Stanford University), Wanjia Zhao (Stanford University), John Yang (Princeton University), Shreyas Pimpalgaonkar (New York University), Kartik sharma (Georgia Institute of Technology), Charlie Ji (University of California, Berkeley), Yichuan Deng (Department of Computer Science, University of Washington), Sarah Pratt (University of Washington), Vivek Ramanujan (Department of Computer Science, University of Washington), Jon Saad-Falcon (Computer Science Department, Stanford University), Stutee Acharya (University of South Florida), Jeffrey Li (Carnegie Mellon University), Achal Dave (Anthropic), Alon Albalak (SynthLabs), Kushal Arora (McGill University), Blake Wulfe (Toyota Research Institute), Chinmay Hegde (New York University), Greg Durrett (New York University), Sewoong Oh (University of Washington), Mohit Bansal (UNC Chapel Hill), Saadia Gabriel (University of Washington), Aditya Grover (UCLA), Kai-Wei Chang (University of Virginia Main Campus), Vaishaal Shankar (Apple), Aaron Gokaslan (Cornell University), Mike Merrill (None), Tatsunori Hashimoto (Stanford University), Yejin Choi (Stanford University / NVIDIA), Jenia Jitsev (LAION; Juelich Supercomputing Center, Research Center Juelich), Reinhard Heckel (Technical University Munich), Maheswaran Sathiamoorthy (University of Southern California), Alex Dimakis (Electrical Engineering & Computer Science Department, University of California, Berkeley), Ludwig Schmidt (University of Washington / Stanford / Anthropic)

This work introduces the OpenThoughts project, which aims to create high-quality, open-source datasets for training reasoning-focused AI models. The authors show that models trained on their public data can match or exceed the performance of strong existing systems that rely on private datasets. By carefully studying and improving their data generation process, they build larger and better datasets that significantly boost performance across math, coding, and science benchmarks. Overall, the project demonstrates that open data alone can be enough to train highly capable reasoning models.

Mamba-3: Improved Sequence Modeling using State Space Principles

Authors: Aakash Sunil Lahoti (CMU, Carnegie Mellon University), Kevin Li (Carnegie Mellon University), Berlin Chen (Princeton University), Caitlin Wang (Princeton University), Aviv Bick (Carnegie Mellon University), Zico Kolter (Carnegie Mellon University), Tri Dao (Princeton University), Albert Gu (Cartesia AI CMU)

This paper introduces Mamba-3, a new model designed to make AI inference faster and more efficient without sacrificing performance. While many efficient alternatives to Transformers reduce computation, they often struggle with tasks like tracking long-term information; Mamba-3 addresses this with improved state modeling and a more expressive update mechanism. The model also uses a multi-input, multi-output design to boost accuracy without slowing down generation. Overall, Mamba-3 shows that it’s possible to improve both efficiency and capability at the same time, pushing forward the tradeoff between speed and performance.

Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

Authors: Yuxuan Zhou (Independent Researcher), Fei Huang (Alibaba Group), Heng Li (Carnegie Mellon University), Fengyi Wu (University of Washington), Tianyu Wang (University of Washington), Jianwei Zhang (Alibaba Group), Junyang Lin (Alibaba Group), Zhi-Qi Cheng (University of Washington)

This paper introduces Hierarchical Speculative Decoding (HSD), a new method to speed up large language model inference by improving the verification step in speculative decoding while preserving exact output distributions. It addresses the challenge of “joint intractability” in sequence-level verification by organizing resampling into a hierarchy that redistributes probability mass across branches, enabling more tokens to be accepted at once. The approach is theoretically proven to be lossless and empirically shows consistent speed improvements across models and benchmarks, outperforming prior tokenwise and blockwise verification methods. Overall, HSD offers a practical and general way to accelerate decoding without sacrificing fidelity, achieving state-of-the-art efficiency when integrated into existing frameworks.

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning

Authors: Haoyue Dai (Carnegie Mellon University), Immanuel Albrecht (FernUniversität in Hagen), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)

This paper studies causal discovery in linear non-Gaussian models with latent variables and cycles, focusing on when different causal graphs are observationally indistinguishable. It provides the first general characterization of distributional equivalence in this setting, introducing new tools—especially edge rank constraints—to describe when two models generate the same observed data. Building on this theory, the authors derive practical graphical criteria and transformations to enumerate all equivalent models and propose an algorithm to recover the entire equivalence class from data. Overall, the work removes the need for strong structural assumptions and offers a general, principled framework for latent-variable causal discovery.

Revela: Dense Retriever Learning via Language Modeling

Authors: Fengyu Cai (Technische Universität Darmstadt), Tong Chen (University of Washington), Xinran Zhao (Carnegie Mellon University), Sihao Chen (Microsoft), Hongming Zhang (Tencent AI Lab Seattle), Sherry Wu (Carnegie Mellon University), Iryna Gurevych (Technical University of Darmstadt / Mohamed bin Zayed University of Artificial Intelligence), Heinz Koeppl (TU Darmstadt)

This paper introduces Revela, a self-supervised framework for training dense retrievers by leveraging language modeling objectives instead of relying on annotated query-document pairs. It augments next-token prediction with an in-batch attention mechanism that allows documents to attend to each other, enabling the retriever to learn cross-document relationships jointly with a language model. Experiments across domain-specific, reasoning-intensive, and general benchmarks show that Revela matches or surpasses supervised and API-based retrievers while using significantly less data and compute. Overall, the work demonstrates a scalable and efficient alternative for retriever learning directly from raw text with strong generalization across domains.

Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling

Authors: Tal Daniel (Carnegie Mellon University), Carl Qi (University of Texas at Austin), Dan Haramati (Brown University), Amir Zadeh (Lambda), Chuan Li (Lambda Labs), Aviv Tamar (Technion), Deepak Pathak (Carnegie Mellon University), David Held (Carnegie Mellon University)

This paper introduces the Latent Particle World Model (LPWM), a self-supervised, object-centric world model that learns to decompose scenes into latent particles (e.g., keypoints, masks, and object attributes) directly from raw video without supervision. It proposes a novel per-particle latent action mechanism that models stochastic dynamics, enabling the system to capture complex multi-object interactions and generate diverse future predictions. The model is trained end-to-end and supports flexible conditioning on actions, language, and goal images, achieving state-of-the-art performance on both real-world and synthetic video prediction tasks. Beyond video modeling, LPWM also demonstrates strong potential for decision-making applications such as imitation learning by leveraging its learned latent dynamics.

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Authors: Siyuan Wang (Shanghai Jiao Tong University), Gaokai Zhang (Carnegie Mellon University), Li Lyna Zhang (Microsoft Research Asia), Ning Shang (Microsoft), Fan Yang (Microsoft Research), Dongyao Chen (Shanghai Jiaotong University), Mao Yang (Peking University)

The authors introduce LoongRL, a reinforcement learning framework designed to improve long-context reasoning in large language models by training them on challenging, synthesized tasks. They propose KeyChain, a data construction method that embeds hidden question chains within long documents, forcing models to perform multi-step planning, retrieval, and reasoning rather than relying on shortcuts. Through RL training, models develop an emergent “plan–retrieve–reason–recheck” reasoning pattern that generalizes from shorter (16K) to much longer (128K) contexts. Experiments show that LoongRL significantly boosts long-context reasoning performance while maintaining strong short-context abilities, achieving results comparable to much larger models.

Exchangeability of GNN Representations with Applications to Graph Retrieval

Authors: Kartik Nair (Carnegie Mellon University), Indradyumna Roy (IIT Bombay, Aalto University), Soumen Chakrabarti (IIT Bombay), Anirban Dasgupta (IIT Gandhinagar), Abir De (Indian Institute of Technology Bombay)

This paper introduces the concept of exchangeability in graph neural networks (GNNs), showing that the dimensions of learned node embeddings are statistically interchangeable due to random initialization and permutation-invariant training. This property implies that embedding components share identical distributions, enabling simplifications in how graph similarities are computed. Leveraging this insight, the authors approximate complex transportation-based graph distances using simpler Euclidean operations on sorted embedding values. They further propose GRAPHHASH, a locality-sensitive hashing framework that enables efficient and scalable graph retrieval, achieving strong performance compared to existing methods.

Poster Papers

Applications

TusoAI: Agentic Optimization for Scientific Methods

Authors: Alistair Turcan (School of Computer Science, Carnegie Mellon University), Kexin Huang (Stanford University), Lei Li (School of Computer Science, Carnegie Mellon University), Martin J. Zhang (Carnegie Mellon University)

AutoLibra: Agent Metric Induction from Open-Ended Human Feedback

Authors: Hao Zhu (Carnegie Mellon University), Phil Cuvin (Stanford University), Xinkai Yu (University of Pennsylvania, University of Pennsylvania), Charlotte Yan (Stanford University), Jason Zhang (Stanford University), Diyi Yang (Stanford University)

DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials

Authors: Kevin Han (Carnegie Mellon University), Bowen Deng (UC Berkeley), Amir Barati Farimani (CMU, Carnegie Mellon University), Gerbrand Ceder (University of California, Berkeley)

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

Authors: Ganlin Yang (University of Science and Technology of China), Tianyi Zhang (Zhejiang University; Shanghai Artificial Intelligence Laboratory), Haoran Hao (Carnegie Mellon University), Weiyun Wang (Fudan University), Yibin Liu (Northeastern University), Dehui Wang (Shanghai Jiaotong University), Guanzhou Chen (Shanghai AI Laboratory, Shanghai Jiaotong University), Zijian Cai (Shenzhen University), Junting Chen (national university of singaore, National University of Singapore), Weijie Su (University of Science and Technology of China), Wengang Zhou (University of Science and Technology of China), Yu Qiao (Shanghai Aritifcal Intelligence Laboratory), Jifeng Dai (Tsinghua University, Tsinghua University), Jiangmiao Pang (Shanghai AI Laboratory), Gen Luo (Shanghai AI Laboratory), Wenhai Wang (Shanghai AI Laboratory), Yao Mu (Shanghai Jiao Tong University), Zhi Hou (Shanghai Artificial Intelligence Laboratory)

DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

Authors: Shivin Dass (University of Texas at Austin), Alaa Khaddaj (OpenAI), Logan Engstrom (Massachusetts Institute of Technology), Aleksander Madry (OpenAI), Andrew Ilyas (Carnegie Mellon University), Roberto Martín-Martín (University of Texas at Austin)

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

Authors: Malgorzata Gwiazda (Technical University of Munich), Yifu Cai (Millennium Management LLC), Mononito Goswami (Carnegie Mellon University), Arjun Choudhry (Georgia Institute of Technology), Artur Dubrawski (Carnegie-Mellon University)

MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation

Authors: Chen Li (Carnegie Mellon University), Zhantao Yang (Carnegie Mellon University), Han Zhang (Carnegie Mellon University), Fangyi Chen (ByteDance Inc.), Chenchen Zhu (Meta AI), Anudeepsekhar Bolimera (Carnegie Mellon University), Marios Savvides (Carnegie Mellon University)

Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters

Authors: Danqing Wang (Carnegie Mellon University), Zhuorui Ye (Tsinghua University), Xinran Zhao (CMU, Carnegie Mellon University), Fei Fang (Carnegie Mellon University), Lei Li (School of Computer Science, Carnegie Mellon University)

Go-Browse: Training Web Agents with Structured Exploration

Authors: Apurva Gandhi (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University)

Mode-conditioning unlocks superior test-time compute scaling

Authors: Chen Wu (Carnegie Mellon University), Sachin Goyal (Anthropic), Aditi Raghunathan (Carnegie Mellon University)

What Generative Search Engines Like and How to Optimize Web Content Cooperatively

Authors: Yujiang Wu (Carnegie Mellon University), Shanshan Zhong (Carnegie Mellon University), Yubin Kim (Vody, LLC), Chenyan Xiong (School of Computer Science, Carnegie Mellon University)

TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis

Authors: Vijay Ekambaram (IBM), Subodh Kumar (International Business Machines), Arindam Jati (International Business Machines (IBM)), Sumanta Mukherjee (International Business Machines), Tomoya Sakai (International Business Machines), Pankaj Dayama (International Business Machines), Wesley Gifford (IBM Research), Jayant Kalagnanam (Carnegie Mellon University)

RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation

Authors: Yash Jangir (Carnegie Mellon University), Yidi Zhang (), Kashu Yamazaki (CMU, Carnegie Mellon University), Chenyu Zhang (Peking University), Kuan-Hsun Tu (National Taiwan University), Tsung-Wei Ke (Department of computer science and informational engineering, National Taiwan University), Lei Ke (Carnegie Mellon University), Yonatan Bisk (Carnegie Mellon University), Katerina Fragkiadaki (CMU)

Generalizable End-to-End Tool-Use RL with Synthetic CodeGym

Authors: Weihua Du (Tsinghua University), HaileiGong (Huawei Technologies Ltd.), Zhan Ling (UC San Diego), Kang Liu (ByteDance Inc.), Lingfeng Shen (Johns Hopkins University), Xuesong Yao (ByteDance Inc.), Yufei Xu (ByteDance Inc.), Dingyuan Shi (ByteDance Inc.), Yiming Yang (Carnegie Mellon University), Jiecao Chen (ByteDance Inc.)

WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables

Authors: Zhaojiang Lin (Meta), YONG XU (Meta), Kai Sun (Meta), Jing Zheng (Ant Group), Yin Huang (Facebook), Surya Appini (Meta), Krish Narang (Facebook), Renjie Tao (Facebook), Ishan Jain (Facebook), Siddhant Arora (Carnegie Mellon University), Ruizhi Li (Facebook), Yiteng Huang (Facebook), Kaushik Patnaik (Apple), Wenfang Xu (Meta Platforms, Inc.), Suwon Shon (ASAPP), Yue Liu (Meta), Ahmed Aly (Facebook), Anuj Kumar (Meta), Florian Metze (Carnegie Mellon University), Xin Dong (Facebook)

A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex

Authors: Rabia Gondur (Cold Spring Harbor Laboratory), Patricia Stan (CMU, Carnegie Mellon University), Matthew A Smith (Carnegie Mellon University), Benjamin Cowley (Cold Spring Harbor Laboratory)

FictionalQA: A Dataset for Studying Memorization and Knowledge Acquisition

Authors: John Kirchenbauer (University of Maryland, College Park), Natjanan Mongkolsupawan (Carnegie Mellon University), Yuxin Wen (University of Maryland), Tom Goldstein (University of Maryland), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

Authors: Justin Lin (Computer Science Department, Stanford University), Eliot Jones (Gray Swan), Donovan Jasper (Stanford University), Ethan Ho (Stanford University), Anna Wu (Computer Science Department, Stanford University), Arnold Yang (Stanford University), Neil Perry (Princeton University), Andy Zou (CMU, Carnegie Mellon University), Matt Fredrikson (University of Wisconsin, Madison), Zico Kolter (Carnegie Mellon University), Percy Liang (Stanford University), Dan Boneh (Stanford University), Daniel Ho (Stanford University)

Bound by semanticity: universal laws governing the generalization-identification tradeoff

Authors: Marco Nurisso (Polytechnic University of Turin), Jesseba Fernando (Northeastern University), Raj Deshpande (Northeastern University London), Alan Perotti (Intesa Sanpaolo AI Research), Raja Marjieh (Princeton University), Steven Frankland (Dartmouth College), Richard Lewis (Carnegie Mellon University), Taylor Webb (University of California, Los Angeles), Declan Campbell (Princeton University), Francesco Vaccarino (Politecnico di Torino), Jonathan Cohen (Princeton University), Giovanni Petri (Network Science Institute, Northeastern University London)

Zero-shot Forecasting by Simulation Alone

Authors: Boris Oreshkin (Amazon), Mayank Jauhari (Amazon), Ravi Kiran Selvam (Amazon), Malcolm Wolff (Amazon), Wenhao Pan (University of Washington), Shankar Ramasubramanian (Amazon), KIN GUTIERREZ (Carnegie Mellon University), Tatiana Konstantinova (Amazon), Andres Potapczynski (New York University), Mengfei Cao (Amazon.com), Dmitry Efimov (Amazon), Michael W Mahoney (University of California Berkeley), Andrew Gordon Wilson (New York University)

Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

Authors: Wenli Xiao (Carnegie Mellon University), Haotian Lin (CMU, Carnegie Mellon University), Andy Peng (University of California, Berkeley), Haoru Xue (University of California, Berkeley), Tairan He (NVIDIA), Zhengyi Luo (Carnegie Mellon University), Yuqi Xie (NVIDIA), Fengyuan Hu (NVIDIA), Jim Fan (NVIDIA), Guanya Shi (CMU, Carnegie Mellon University), Yuke Zhu (NVIDIA / UT-Austin)

Improving Attributed Long-form Question Answering with Intent Awareness

Authors: Xinran Zhao (CMU, Carnegie Mellon University), Aakanksha Naik (Allen Institute for Artificial Intelligence), Jay DeYoung (Allen Institute for Artificial Intelligence), Joseph Chee Chang (Allen Institute for Artificial Intelligence), Jena Hwang (Allen Institute for Artificial Intelligence), Sherry Wu (Carnegie Mellon University), Varsha Kishore (Cornell University)

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Authors: Yuxiao Qu (Carnegie Mellon University), Anikait Singh (Stanford University), Yoonho Lee (Stanford University), Amrith Setlur (Carnegie Mellon University), Russ Salakhutdinov (CMU), Chelsea Finn (Stanford University, Physical Intelligence), Aviral Kumar (University of California Berkeley)

Measuring LLM Novelty As The Frontier Of Original And High-Quality Output

Authors: Vishakh Padmakumar (Stanford University), Chen Yueh-Han (New York University), Jane Pan (New York University), Valerie Chen (Carnegie Mellon University), He He (New York University)

BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning

Authors: Yitang Li (), Zhengyi Luo (Carnegie Mellon University), Tonghe Zhang (Carnegie Mellon University), Cunxi Dai (Carnegie Mellon University), Anssi Kanervisto (Microsoft Research), Andrea Tirinzoni (Meta, FAIR), Haoyang Weng (Tsinghua University, Tsinghua University), Kris Kitani (Carnegie Mellon University), Mateusz Guzek (Meta AI), Ahmed Touati (Meta AI Research), Alessandro Lazaric (Facebook), Matteo Pirotta (Meta), Guanya Shi (CMU, Carnegie Mellon University)

CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning

Authors: Chengsong Huang (Washington University, Saint Louis), Langlin Huang (Washington University, Saint Louis), Jixuan Leng (Carnegie Mellon University), Jiacheng Liu (NVIDIA), Jiaxin Huang (Washington University in St. Louis)

Real-Time Reasoning Agents in Evolving Environments

Authors: Yule Wen (Tsinghua University, Tsinghua University), Yixin Ye (Shanghai Jiaotong University), Yanzhe Zhang (Georgia Institute of Technology), Diyi Yang (Stanford University), Hao Zhu (Carnegie Mellon University)

ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists

Authors: Jie Ruan (University of Michigan – Ann Arbor), Inderjeet Nair (University of Michigan – Ann Arbor), Shuyang Cao (Bloomberg), Amy Liu (University of Michigan), Sheza Munir (University of Toronto), Micah Pollens-Dempsey (University of Michigan – Ann Arbor), Yune-Ting Chiang (University of Michigan – Ann Arbor), Lucy Kates (University of Michigan – Ann Arbor), Nicholas David (University of Michigan – Ann Arbor), Sihan Chen (Carnegie Mellon University), Ruxin Yang (University of Michigan – Ann Arbor), Yuqian Yang (University of Michigan – Ann Arbor), Jihyun Gump (University of Michigan – Ann Arbor), Tessa Bialek (University of Michigan Law School), Vivek Sankaran (University of Michigan – Ann Arbor), Margo Schlanger (University of Michigan – Ann Arbor), Lu Wang (University of Michigan)

From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking

Authors: Gyeongwon J Kim (Carnegie Mellon University), Alex Wilf (Carnegie Mellon University), Louis-Philippe Morency (Carnegie Mellon University), Daniel Fried (Carnegie Mellon University)

PRISM: Enhancing PRotein Inverse Folding through Fine- Grained Retrieval on Structure-Sequence Multimodal Representations

Authors: Sazan Mahbub (Carnegie Mellon University School of Computer Science), Souvik Kundu (Intel), Eric P Xing (CMU)

MAPSS: Manifold-based Assessment of Perceptual Source Separation

Authors: Amir Ivry (Technion), Samuele Cornell (None), Shinji Watanabe (Carnegie Mellon University)

ViPRA: Video Prediction for Robot Actions

Authors: Sandeep Kumar Routray (Skild AI), Hengkai Pan (CMU, Carnegie Mellon University), Unnat Jain (Facebook AI Research), Shikhar Bahl (Skild AI), Deepak Pathak (Carnegie Mellon University)

Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives

Authors: Zihan Wang (Amazon), Jiashun Wang (School of Computer Science, Carnegie Mellon University), Jeff Tan (Carnegie Mellon University), Yiwen Zhao (School of Computer Science, Carnegie Mellon University), Jessica Hodgins (RAI Institute), Shubham Tulsiani (Carnegie Mellon University), Deva Ramanan (School of Computer Science, Carnegie Mellon University)

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Authors: Junlong Li (The Hong Kong University of Science and Technology), Wenshuo Zhao (Zhejiang University), Jian Zhao (Beijing University of Posts and Telecommunications), Weihao Zeng (Hong Kong University of Science and Technology), Haoze Wu (Zhejiang University), Xiaochen Wang (None), Rui Ge (Shanghai Jiaotong University), Yuxuan Cao (HKUST), Yuzhen Huang (HKUST), Wei Liu (HKUST), Junteng LIU (HKUST), Zhaochen Su (The Hong Kong University of Science and Technology), Yiyang Guo (Fudan University), FAN ZHOU (Shanghai Jiao Tong University), Lueyang Zhang (The Hong Kong University of Science and Technology), Juan Michelini (Universidad de la República), Xingyao Wang (All Hands AI), Xiang Yue (Carnegie Mellon University), Shuyan Zhou (Facebook), Graham Neubig (Carnegie Mellon University), Junxian He (HKUST)

SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling

Authors: Yixian Zhang (Tsinghua University, Tsinghua University), Shu-ang Yu (Tsinghua University), Tonghe Zhang (Carnegie Mellon University), Mo Guang (Li Auto Inc.), Haojia Hui (Li Auto Inc.), Kaiwen Long (Li Auto Inc.), Yu Wang (Tsinghua Univ.), Chao Yu (Tsinghua University), Wenbo Ding (Tsinghua University, Tsinghua University)

Computer Vision

Multi-Object System Identification from Videos

Authors: Chunjiang Liu (Carnegie Mellon University), Xiaoyuan Wang (Carnegie Mellon University), Qingran Lin (Georgia Institute of Technology), Albert Xiao (Carnegie Mellon University), Haoyu Chen (Harvard University, Harvard University), Shizheng Wen (ETHZ – ETH Zurich), Hao Zhang (UIUC), Lu Qi (Insta360), Ming-Hsuan Yang (Google DeepMind), Laszlo A. Jeni (Carnegie Mellon University), Min Xu (Carnegie Mellon University), Yizhou Zhao (Snap Inc.)

FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction

Authors: Yixiang Dai (Carnegie Mellon University), Fan Jiang (AMAP, Alibaba), Chiyu Wang (Alibaba Group), Mu Xu (Alibaba Group), Yonggang Qi (Beijing University of Posts and Telecommunications)

Learning an Image Editing Model without Image Editing Pairs

Authors: Nupur Kumari (Carnegie Mellon University), Sheng-Yu Wang (CMU, Carnegie Mellon University), Cherry Zhao (Adobe Research), Yotam Nitzan (Adobe Research), Yuheng Li (Adobe Systems), Krishna Kumar Singh (Adobe Systems), Richard Zhang (Adobe), Eli Shechtman (Adobe), Jun-Yan Zhu (Carnegie Mellon University), Xun Huang (Adobe Research)

Controllable Video Generation with Provable Disentanglement

Authors: Yifan Shen (Mohamed bin Zayed University of Artificial Intelligence), Peiyuan Zhu (Mohamed bin Zayed University of Artificial Intelligence), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Shaoan Xie (Carnegie Mellon University), Namrata Deka (Carnegie Mellon University), Zongfang Liu (Zhejiang University), Zeyu Tang (Stanford University), Guangyi Chen (MBZUAI&CMU), Kun Zhang (Carnegie Mellon University & MBZUAI)

Virtual Community: An Open World for Humans, Robots, and Society

Authors: Qinhong Zhou (University of Massachusetts at Amherst), Hongxin Zhang (UMass Amherst), Xiangye Lin (University of Massachusetts at Amherst), Zheyuan Zhang (Johns Hopkins University), Yutian Chen (Carnegie Mellon University), Wenjun Liu (University of Massachusetts at Amherst), Zunzhe Zhang (Tsinghua University), Sunli Chen (University of Massachusetts at Amherst), Lixing Fang (University of Massachusetts at Amherst), Qiushi Lyu (University of Illinois, Urbana-Champaign), Xinyu Sun (South China University of Technology), Jincheng Yang (University of Maryland, College Park), Zeyuan Wang (Tsinghua University, Tsinghua University), Bao Dang (University of Massachusetts at Amherst), Zhehuan Chen (Peking University), Daksha Ladia (University of Massachusetts Amherst), Quang Dang (University of Massachusetts at Amherst), Jiageng Liu (University of Massachusetts at Amherst), Chuang Gan (MIT-IBM Watson AI Lab)

Faster Vision Transformers with Adaptive Patches

Authors: Rohan Choudhury (None), JungEun Kim (General Robotics), Jinhyung Park (Carnegie Mellon University), Eunho Yang (Korea Advanced Institute of Science & Technology), Laszlo A. Jeni (Carnegie Mellon University), Kris Kitani (Carnegie Mellon University)

Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

Authors: Shengguang Wu (Stanford University), Xiaohan Wang (Stanford University), Yuhui Zhang (Stanford University), Hao Zhu (Carnegie Mellon University), Serena Yeung (Stanford University)

VINCIE: Unlocking In-context Image Editing from Video

Authors: Leigang Qu (National University of Singapore), Feng Cheng (ByteDance Seed), Ziyan Yang (ByteDance Inc.), Qi Zhao (ByteDance Inc.), Shanchuan Lin (ByteDance), Yichun Shi (None), Yicong Li (National University of Singapore), Wenjie Wang (University of Science and Technology of China), Tat-Seng Chua (National University of Singapore), Lu Jiang (Carnegie Mellon University)

RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

Authors: Isaac Robinson (Roboflow), Peter Robicheaux (Roboflow), Matvei Popov (Roboflow, Inc), Deva Ramanan (School of Computer Science, Carnegie Mellon University), Neehar Peri (Carnegie Mellon University)

lmgame-Bench: How Good are LLMs at Playing Games?

Authors: Lanxiang Hu (University of California, San Diego), Mingjia Huo (University of California, San Diego), Yuxuan Zhang (University of California, San Diego), Haoyang Yu (University of California San Diego), Eric P Xing (CMU), Ion Stoica (), Tajana Rosing (University of California, San Diego), Haojian Jin (None), Hao Zhang (University of California, San Diego)

ASCIIEval: Benchmarking Models’ Visual Perception in Text Strings via ASCII Art

Authors: Qi Jia (Shanghai Artificial Intelligence Laboratory), Xiang Yue (Carnegie Mellon University), Shanshan Huang (Guangzhou University), Ziheng Qin (Facebook), Yizhu Liu (Meituan), Bill Yuchen Lin (xAI), Yang You (National University of Singapore), Guangtao Zhai (Shanghai Jiao Tong University)

SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

Authors: Yuyou Zhang (CMU, Carnegie Mellon University), Radu Corcodel (Mitsubishi Electric Research Labs), Chiori Hori (Mitsubishi Electric Research Labs), Anoop Cherian (Australian National University), DING ZHAO (Carnegie Mellon University)

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus

Authors: Ming Zhao (Jilin University), Wenhui Dong (NanJing University), Yang Zhang (Chinese People’s Liberation Army General Hospital), wangyou (University of the Chinese Academy of Sciences), Zhonghao Zhang (Ningxia University), Zian Zhou (Zhejiang University), YUNZHI GUAN (Fudan University), Liukun Xu (Nanjing Medical University), Wei Peng (Stanford University), Zhaoyang Gong (Fudan University), Zhicheng Zhang (Chinese People’s Liberation Army General Hospital), Dachuan li (Fudan University), Xiaosheng Ma (Fudan University), Yuli Ma (Peking University), Jianing Ni (Carnegie Mellon University), Changjiang Jiang (Ant Group), Lixia Tian (Beijing Jiaotong University), Chen Qixin (Zhejiang University), Xia Kaishun (Zhejiang University of Technology), Pingping Liu (Jilin University), Tongshun Zhang (Jilin University), ZhiqiangLiu (Huazhong University of Science and Technology), Zhongan Bi (Zhejiang Lab), Chenyang Si (Nanyang Technological University), Tiansheng Sun (Chinese People’s Liberation Army General Hospital), Caifeng Shan (Nanjing University)

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Authors: Jianyi Wang (Nanyang Technological University), Shanchuan Lin (ByteDance), Zhijie Lin (Zhejiang University), Yuxi Ren (ByteDance Inc.), Meng Wei (ByteDance Inc.), Zongsheng Yue (Xi’an Jiaotong University), Shangchen Zhou (Nanyang Technological University), Hao Chen (ByteDance Inc.), Yang Zhao (Bytedance Inc.), Ceyuan Yang (ByteDance), Xuefeng Xiao (ByteDance), Chen Change Loy (Nanyang Technological University), Lu Jiang (Carnegie Mellon University)

Mixture of Contexts for Long Video Generation

Authors: Shengqu Cai (Stanford University), Ceyuan Yang (ByteDance), Lvmin Zhang (Stanford University), Yuwei Guo (The Chinese University of Hong Kong), Junfei Xiao (Johns Hopkins University), Ziyan Yang (ByteDance Inc.), Yinghao Xu (Stanford University), Zhenheng Yang (Tiktok), Alan Yuille (Johns Hopkins University), Leonidas Guibas (Stanford University), Maneesh Agrawala (Stanford University), Lu Jiang (Carnegie Mellon University), Gordon Wetzstein (Stanford University)

pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning

Authors: Zhanpeng Luo (University of Pittsburgh), Ce Zhang (Carnegie Mellon University), Silong Yong (Department of Automation, Tsinghua University, Tsinghua University), Cunxi Dai (Carnegie Mellon University), Qianwei Wang (University of Michigan – Ann Arbor), Haoxi Ran (Carnegie Mellon University), Guanya Shi (CMU, Carnegie Mellon University), Katia Sycara (Carnegie Mellon University), Yaqi Xie (Carnegie Mellon University)

Sharp Monocular View Synthesis in Less Than a Second

Authors: Lars Mescheder (Apple), Wei Dong (Apple), Shiwei Li (Apple), Xuyang BAI (Apple), Marcel Santos (Apple), Peiyun Hu (Carnegie Mellon University), Bruno Lecouat (Telecom ParisTech), Mingmin Zhen (Apple), Amaël Delaunoy (Apple), Tian Fang (Hong Kong University of Science and Technology), Yanghai Tsin (Apple), Stephan Richter (Apple), Vladlen Koltun (Apple)

S2GO: Streaming Sparse Gaussian Occupancy

Authors: Jinhyung Park (Carnegie Mellon University), Chensheng Peng (University of California, Berkeley), yihan hu (Applied Intuition), Wenzhao Zheng (UC Berkeley), Kris Kitani (Carnegie Mellon University), Wei Zhan (University of California Berkeley)

Captain Cinema: Towards Short Movie Generation

Authors: Junfei Xiao (Johns Hopkins University), Ceyuan Yang (ByteDance), Lvmin Zhang (Stanford University), Shengqu Cai (Stanford University), Yang Zhao (Bytedance Inc.), Yuwei Guo (The Chinese University of Hong Kong), Gordon Wetzstein (Stanford University), Maneesh Agrawala (Stanford University), Alan Yuille (Johns Hopkins University), Lu Jiang (Carnegie Mellon University)

Deep Learning

Chimera: State Space Models Beyond Sequences

Authors: Aakash Sunil Lahoti (CMU, Carnegie Mellon University), Tanya Marwah (CMU), (None), Albert Gu (Cartesia AI CMU)

VisCoder2: Building Multi-Language Visualization Coding Agents

Authors: Yuansheng Ni (University of Waterloo), Songcheng Cai (University of Waterloo), Xiangchao Chen (University of Waterloo), Jiarong Liang (University of Waterloo), Zhiheng LYU (University of Hong Kong), Jiaqi Deng (Korea Advanced Institute of Science & Technology), Kai Zou (NetMind.AI), PING NIE (Peking University), Fei Yuan (Shanghai Artificial Intelligent Laboratory), Xiang Yue (Carnegie Mellon University), Wenhu Chen (University of Waterloo)

Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks

Authors: Yubo Li (Carnegie Mellon University), Ramayya Krishnan (Carnegie Mellon University), Rema Padman (Carnegie Mellon University)

Tracking Equivalent Mechanistic Interpretations Across Neural Networks

Authors: Alan Sun (Carnegie Mellon University), Mariya Toneva (None)

How Dark Patterns Manipulate Web Agents

Authors: Phil Cuvin (Stanford University), Hao Zhu (Carnegie Mellon University), Diyi Yang (Stanford University)

ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases

Authors: Ziqian Zhong (Carnegie Mellon University), Aditi Raghunathan (Carnegie Mellon University), Nicholas Carlini (Anthropic)

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

Authors: Syeda Nahida Akter (Carnegie Mellon University), Shrimai Prabhumoye (NVIDIA), Eric Nyberg (Carnegie Mellon University), Mostofa Patwary (NVIDIA), Mohammad Shoeybi (NVIDIA), Yejin Choi (Stanford University / NVIDIA), Bryan Catanzaro (NVIDIA)

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Authors: Matthew Yang (Carnegie Mellon University), Hao Bai (University of Illinois at Urbana-Champaign), Ian Wu (Carnegie Mellon University), Gene Yang (Carnegie Mellon University), Amrith Setlur (Carnegie Mellon University), Aviral Kumar (University of California Berkeley)

Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

Authors: Sukjun Hwang (Carnegie Mellon University), Brandon Wang (onepot), Albert Gu (Cartesia AI CMU)

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Authors: Amrith Setlur (Carnegie Mellon University), Matthew Yang (Carnegie Mellon University), Charlie Snell (University of California, Berkeley), Jeremiah Greer (Oumi AI PBC), Ian Wu (Carnegie Mellon University), Virginia Smith (Carnegie Mellon University), Max Simchowitz (Massachusetts Institute of Technology), Aviral Kumar (University of California Berkeley)

FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel

Authors: Ran Yan (The Hong Kong University of Science and Technology), YOUHE JIANG (University of Cambridge), Zhuoming Chen (Carnegie Mellon University), Haohui Mai (Hong Kong University of Science and Technology), Beidi Chen (CMU, Carnegie Mellon University), Binhang Yuan (HKUST)

In Good GRACES: Principled Teacher Selection for Knowledge Distillation

Authors: Abhishek Panigrahi (Princeton), Bingbin Liu (Carnegie Mellon University), Sadhika Malladi (Princeton University), Sham Kakade (Harvard University), Surbhi Goel (Microsoft)

Log-Linear Attention

Authors: Guo (), Songlin Yang (ShanghaiTech University), Tarushii Goel (Massachusetts Institute of Technology), Eric P Xing (CMU), Tri Dao (Princeton University), Yoon Kim (MIT)

Generalized Parallel Scaling with Interdependent Generations

Authors: Harry Dong (Carnegie Mellon University), David Brandfonbrener (NYU), Eryk Helenowski (Facebook), Yun He (Meta), Mrinal Kumar (Facebook), Han Fang (Meta GenAI), Yuejie Chi (Carnegie Mellon University), Karthik Abinav Sankararaman (Facebook)

KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning

Authors: Yinyi Luo (Carnegie Mellon University), Zhexian Zhou (CMU, Carnegie Mellon University), Hao Chen (Google DeepMind), Kai Qiu (CMU, Carnegie Mellon University), Marios Savvides (Carnegie Mellon University), Yixuan Li (University of Wisconsin, Madison), Jindong Wang (William & Mary)

General Machine Learning

VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding

Authors: Abdul Waheed (Maharaja Agrasen Institute of Technology, New Delhi), Zhen Wu (Carnegie Mellon University), Dareen Alharthi (LTI CMU), Seungone Kim (Carnegie Mellon University), Bhiksha Raj (Carnegie Mellon University)

Score-based Greedy Search for Structure Identification of Partially Observed Causal Models

Authors: Xinshuai Dong (CMU), Ignavier Ng (Carnegie Mellon University), Haoyue Dai (Carnegie Mellon University), Jiaqi Sun (CMU, Carnegie Mellon University), Xiangchen Song (Carnegie Mellon University), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)

On Code-Induced Reasoning in LLMs

Authors: Abdul Waheed (Maharaja Agrasen Institute of Technology, New Delhi), Zhen Wu (Carnegie Mellon University), Carolyn Rose (School of Computer Science, Carnegie Mellon University), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)

Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables

Authors: Yewei Xia (Fudan University), Zhengming Chen (Guangdong University of Technology), Haoyue Dai (Carnegie Mellon University), Fuhong Wang (Guangdong University of Technology), Yixin Ren (Fudan University), Yiqing Li (MBZUAI), Kun Zhang (Carnegie Mellon University & MBZUAI), Shuigeng Zhou (Fudan University)

Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering

Authors: Sanidhya Vijayvargiya (Samsung), Xuhui Zhou (Carnegie Mellon University), Akhila Yerukola (Carnegie Mellon University), Maarten Sap (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University)

Navigating the Accuracy-Size Trade-Off with Flexible Model Merging

Authors: Akash Dhasade (EPFL), Divyansh Jhunjhunwala (Amazon), Milos Vujasinovic (EPFL), Gauri Joshi (Carnegie Mellon University), Anne-Marie Kermarrec (School of Computer and Communication Sciences, EPFL – EPF Lausanne)

Dual Perspectives on Non-Contrastive Self-Supervised Learning

Authors: Jean Ponce (NYU/ENS-PSL), Basile Terver (AMI Labs), Martial Hebert (Carnegie Mellon University), Michael Arbel (INRIA)

Multiple-Prediction-Powered Inference

Authors: Charlie Cowen-Breen (Massachusetts Institute of Technology), Alekh Agarwal (Google), Stephen Bates (Massachusetts Institute of Technology), William W. Cohen (Carnegie Mellon University), Jacob Eisenstein (Google), Amir Globerson (Google), Adam Fisch (Google DeepMind)

Command-V: Training-Free Representation Finetuning Transfer

Authors: Barry Wang (Carnegie Mellon University), Avi Schwarzschild (Carnegie Mellon University), Alexander Robey (CMU, Carnegie Mellon University), Ali Payani (Cisco Systems), Charles Fleming (Cisco), Mingjie Sun (School of Computer Science, Carnegie Mellon University), Daphne Ippolito (School of Engineering and Applied Science, University of Pennsylvania)

SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis

Authors: Shahriar Noroozizadeh (Carnegie Mellon University), Xiaobin Shen (Carnegie Mellon University), Jeremy Weiss (National Library of Medicine), George H. Chen (Carnegie Mellon University)

Conformalized Decision Risk Assessment

Authors: Wenbin Zhou (Carnegie Mellon University), Agni Orfanoudaki (University of Oxford), Shixiang Zhu (Carnegie Mellon University)

Prompt-MII: Meta-Learning Instruction Induction for LLMs

Authors: Emily Xiao (Carnegie Mellon University), Yixiao Zeng (XPeng Motors / Carnegie Mellon University), Ada Chen (CMU, Carnegie Mellon University), Chin-Jou Li (Language Technologies Institute, Carnegie Mellon University), Amanda Bertsch (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University)

Optimization

Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence

Authors: Shuhua Yu (Carnegie Mellon University), Dusan Jakovetic (University of Novi Sad Faculty of Sciences), Soummya Kar (CMU, Carnegie Mellon University)

FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization

Authors: Shengyu Feng (Carnegie Mellon University), Weiwei Sun (Carnegie Mellon University), Shanda Li (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Yiming Yang (Carnegie Mellon University)

Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making

Authors: Prince Wang (Carnegie Mellon University), Shuyi Chen (Carnegie Mellon University), Jinhao Liang (University of Virginia, Charlottesville), Ferdinando Fioretto (University of Virginia, Charlottesville), Shixiang Zhu (Carnegie Mellon University)

Reinforcement Learning

Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM

Authors: Zhuoming Chen (Carnegie Mellon University), Hongyi Liu (CMU, Carnegie Mellon University), Yang Zhou (CMU, Carnegie Mellon University), Haizhong Zheng (CMU, Carnegie Mellon University), Beidi Chen (CMU, Carnegie Mellon University)

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

Authors: Paresh Chaudhary (Department of Computer Science, University of Washington), Yancheng Liang (Department of Computer Science, University of Washington), Daphne Chen (Carnegie Mellon University), Simon Du (University of Washington), Natasha Jaques (University of Washington, Google DeepMind)

From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism

Authors: Zhuohao Yu (Peking University), Steven Wu (Carnegie Mellon University), Adam Block (Columbia University)

Online Decision Making with Generative Action Sets

Authors: Jianyu Xu (Carnegie Mellon University), Vidhi Jain (Netaji Subhas University of Technology), Bryan Wilder (Carnegie Mellon University), Aarti Singh (Carnegie Mellon University)

TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks

Authors: Vansh Kapoor (School of Computer Science, Carnegie Mellon University), Aman Gupta (Carnegie Mellon University), Hao Chen (Amazon), Anurag Beniwal (Elevenlabs), Jing Huang (Stanford), Aviral Kumar (University of California Berkeley)

Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion

Authors: Dan Haramati (Brown University), Carl Qi (University of Texas at Austin), Tal Daniel (Carnegie Mellon University), Amy Zhang (UT Austin), Aviv Tamar (Technion), George D Konidaris (Brown)

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Authors: Mingxuan Cui (University of Illinois at Urbana-Champaign), Duo Zhou (University of Illinois Urbana-Champaign), Yuxuan Han (New York University), Grani A. Hanasusanto (University of Illinois Urbana-Champaign), Qiong Wang (University of Illinois at Urbana-Champaign), Huan Zhang (CMU), Zhengyuan Zhou (New York University)

Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

Authors: Xintong Duan (Morgan Stanley), Yutong (Kelly) He (Carnegie Mellon University), Fahim Tajwar (Carnegie Mellon University), Russ Salakhutdinov (CMU), Zico Kolter (Carnegie Mellon University), Jeff Schneider (School of Computer Science)

Demystifying The Mechanisms Behind Emergent Exploration in Goal-Conditioned RL

Authors: Mahsa Bastankhah (None), Grace Liu (Carnegie Mellon University), Dilip Arumugam (Princeton University), Thomas L. Griffiths (Princeton), Benjamin Eysenbach (Princeton University)

MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance

Authors: Narjes Nourzad (University of Southern California), Carlee Joe-Wong (Carnegie Mellon University)

Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions

Authors: Zequan Wu (Carnegie Mellon University), Mengye Ren (New York University)

All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning

Authors: Gokul Swamy (Carnegie Mellon University), Sanjiban Choudhury (None), Wen Sun (Cornell University), Steven Wu (Carnegie Mellon University), Drew Bagnell (Carnegie-Mellon University)

HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithimic Coding

Authors: Zhongmou He (Carnegie Mellon University), Yee Man Choi (University of Waterloo), Kexun Zhang (Carnegie Mellon University), Ivan Bercovich (UC Santa Barbara + ScOp VC), Jiabao Ji (University of California, Santa Barbara), Junting Zhou (Peking University), Dejia Xu (University of Texas at Austin), Aidan Zhang (Carnegie Mellon University), Yixiao Zeng (XPeng Motors / Carnegie Mellon University), Lei Li (School of Computer Science, Carnegie Mellon University)

Reevaluating Policy Gradient Methods for Imperfect-Information Games

Authors: Max Rudolph (University of Texas at Austin), Nathan Lichtlé (Electrical Engineering & Computer Science Department, University of California, Berkeley), Sobhan Mohammadpour (MIT), Alexandre M Bayen (None), Zico Kolter (Carnegie Mellon University), Amy Zhang (UT Austin), Gabriele Farina (Massachusetts Institute of Technology), Eugene Vinitsky (New York University), Samuel Sokota (Carnegie Mellon University)

GEM: A Gym for Generalist LLMs

Authors: Zichen Liu (Sea AI Lab), Anya Sims (University of Oxford), Keyu Duan (national university of singaore, National University of Singapore), Changyu Chen (Stanford University), Simon Yu (Northeastern University), Xiangxin Zhou (UCAS), Haotian Xu (Tsinghua University, Tsinghua University), Shaopan Xiong (Alibaba Group), Bo Liu (National University of Singapore), Chenmien Tan (University of Edinburgh), Weixun Wang (Tianjin University), Hao Zhu (Carnegie Mellon University), Weiyan Shi (Columbia University), Diyi Yang (Stanford University), Michael Qizhe Shieh (National University of Singapore), Yee Whye Teh (University of Oxford and Google DeepMind), Wee Sun Lee (National University of Singapore), Min Lin (Sea AI Lab)

Learning to Maximize Rewards via Reaching Goals

Authors: Chongyi Zheng (Princeton University), Mahsa Bastankhah (None), Grace Liu (Carnegie Mellon University), Benjamin Eysenbach (Princeton University)

Social Aspects

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

Authors: Nan Zhang (Pennsylvania State University), Eugene Kwek (Pennsylvania State University), Yusen Zhang (Columbia University), Hieu Nguyen (The Pennsylvania State University), Prasenjit Mitra (Carnegie Mellon University), Rui Zhang (Pennsylvania State University)

Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas

Authors: Luke Guerdan (Carnegie Mellon University), Justin Whitehouse (Stanford University), Kimberly Truong (Carnegie Mellon University), Ken Holstein (School of Computer Science, Carnegie Mellon University), Steven Wu (Carnegie Mellon University)

BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning

Authors: Qiusi Zhan (University of Illinois Urbana-Champaign), Hyeonjeong Ha (University of Illinois Urbana-Champaign), Rui Yang (Hong Kong University of Science and Technology), Sirui Xu (University of Illinois at Urbana-Champaign), Hanyang Chen (University of Illinois at Urbana-Champaign), Liang-Yan Gui (UIUC), Yu-Xiong Wang (UIUC), Huan Zhang (CMU), Heng Ji (University of Illinois at Urbana-Champaign), Daniel Kang (UIUC)

VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety

Authors: Shruti Palaskar (Apple), Leon Gatys (Apple), Mona Abdelrahman (Apple), Mar Jacobo (Apple), Laurence Lindsey (Apple), Rutika Moharir (Apple), Gunnar Lund (Grammarly), Yang Xu (Apple), Navid Shiee (Apple), Jeffrey Bigham (Carnegie Mellon University), Charles Maalouf (Apple), Joseph Cheng (Apple)

Generative Value Conflicts Reveal LLM Priorities

Authors: Andy Liu (Carnegie Mellon University), Kshitish Ghate (University of Washington), Mona Diab (Carnegie Mellon University), Daniel Fried (Carnegie Mellon University), Atoosa Kasirzadeh (Alan Turing Institute), Max Kleiman-Weiner (Common Sense Machines)

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs

Authors: Ziqian Zhong (Carnegie Mellon University), Aditi Raghunathan (Carnegie Mellon University)

CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs

Authors: Niloofar Mireshghallah (Carnegie Mellon University), Neal Mangaokar (Meta FAIR), Narine Kokhlikyan (Facebook), Arman Zharmagambetov (Meta AI (FAIR)), Manzil Zaheer (Google), Saeed Mahloujifar (Princeton University), Kamalika Chaudhuri (Facebook)

On The Fragility of Benchmark Contamination Detection in Reasoning Models

Authors: Han Wang (University of Illinois at Urbana-Champaign), Haoyu Li (University of Illinois at Urbana-Champaign), Brian Ko (University of Washington), Huan Zhang (CMU)

ELEPHANT: Measuring and understanding social sycophancy in LLMs

Authors: Myra Cheng (Stanford University), Sunny Yu (Stanford University), Cinoo Lee (Stanford University), Pranav Khadpe (Carnegie Mellon University), Lujain Ibrahim (University of Oxford), Dan Jurafsky (Stanford University)

Operationalizing Data Minimization for Privacy-Preserving LLM Prompting

Authors: Jijie Zhou (Northeastern University), Niloofar Mireshghallah (Carnegie Mellon University), Tianshi Li (Northeastern University)

PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm

Authors: Jing-Jing Li (University of California, Berkeley), Joel Mire (Carnegie Mellon University), Eve Fleisig (UC Berkeley), Valentina Pyatkin (Ai2, ETH AI Center), Anne Collins (University of California, Berkeley), Maarten Sap (Carnegie Mellon University), Sydney Levine (NYU / Google Deepmind)

Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability

Authors: Taylor Sorensen (humans&), Benjamin Newman (University of Washington), Jared Moore (Computer Science Department, Stanford University), Chan Young Park (University of Texas at Austin), Jillian Fisher (University of Washington), Niloofar Mireshghallah (Carnegie Mellon University), Liwei Jiang (None), Yejin Choi (Stanford University / NVIDIA)

OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety

Authors: Sanidhya Vijayvargiya (Samsung), Aditya Soni (School of Computer Science, Carnegie Mellon University), Xuhui Zhou (Carnegie Mellon University), Zora Zhiruo Wang (Carnegie Mellon University), Nouha Dziri (Allen Institute for AI), Graham Neubig (Carnegie Mellon University), Maarten Sap (Carnegie Mellon University)

Theory

Learning a Game by Paying the Agents

Authors: Brian Zhang (MIT), Tao Lin (Harvard University), Yiling Chen (Harvard University), Tuomas Sandholm (Carnegie Mellon University)

Convergence of Regret Matching in Potential Games and Constrained Optimization

Authors: Ioannis Anagnostides (Carnegie Mellon University), Emanuel Tewolde (Carnegie Mellon University), Brian Zhang (MIT), Ioannis Panageas (Donald Bren School of Information and Computer Sciences, University of California, Irvine), Vincent Conitzer (Carnegie Mellon University), Tuomas Sandholm (Carnegie Mellon University)

Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams

Authors: Vincent Cohen-Addad (Google), David Woodruff (Carnegie Mellon University), Shenghao Xie (Texas A&M University), Samson Zhou (Texas A&M University)

Sample Complexity and Representation Ability of Test-time Scaling Paradigms

Authors: Baihe Huang (University of California, Berkeley), Shanda Li (Carnegie Mellon University), Tianhao Wu (University of California, Berkeley), Yiming Yang (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Kannan Ramchandran (), Michael Jordan (University of California, Berkeley), Jiantao Jiao (University of California Berkeley)

General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess

Authors: Brian Zhang (MIT), Tuomas Sandholm (Carnegie Mellon University)

Internal Planning in Language Models: Characterizing Horizon and Branch Awareness

Authors: Muhammed Ustaomeroglu (Carnegie Mellon University), Baris Askin (Carnegie Mellon University), Gauri Joshi (Carnegie Mellon University), Carlee Joe-Wong (Carnegie Mellon University), Guannan Qu (Carnegie Mellon University)

Action Chunking and Data Augmentation Yield Exponential Improvements in Behavior Cloning for Continuous Spaces

Authors: Thomas T. Zhang (University of Pennsylvania), Daniel Pfrommer (None), Chaoyi Pan (Carnegie Mellon University), Nikolai Matni (School of Engineering and Applied Science, University of Pennsylvania), Max Simchowitz (Massachusetts Institute of Technology)

Polynomial Convergence of Riemannian Diffusion Models

Authors: Xingyu Xu (CMU, Carnegie Mellon University), Ziyi Zhang (CMU, Carnegie Mellon University), Yorie Nakahira (Researcher at NII LLM Center Assistant Professor at CMU), Guannan Qu (Carnegie Mellon University), Yuejie Chi (Carnegie Mellon University)

Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information

Authors: Nina Balcan (Carnegie-Mellon University), Martino Bernasconi (Bocconi University), Matteo Castiglioni (Politecnico di Milano), Andrea Celli (Bocconi University), Keegan Harris (UC Berkeley), Steven Wu (Carnegie Mellon University)

Dimension-Free Decision Calibration for Nonlinear Loss Functions

Authors: Jingwu Tang (Carnegie Mellon University), Jiayun Wu (Carnegie Mellon University), Steven Wu (Carnegie Mellon University), Jiahao Zhang (Carnegie Mellon University)

Learning-Augmented Moment Estimation on Time-Decay Models

Authors: Soham Nagawanshi (Texas A&M University – College Station), Shalini Panthangi (CMU, Carnegie Mellon University), Chen Wang (Rice University and Texas A&M University), David Woodruff (Carnegie Mellon University), Samson Zhou (Texas A&M University)

Designing Rules to Pick a Rule: Aggregation by Consistency

Authors: Ratip Emin Berker (Carnegie Mellon University), Ben Armstrong (Tulane University), Vincent Conitzer (Carnegie Mellon University), Nihar Shah (CMU)

Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

Authors: Cyrus Cousins (Duke University), Vijay Keswani (Indian Institute of Technology, Delhi), Vincent Conitzer (Carnegie Mellon University), Hoda Heidari (Carnegie Mellon University), Jana Borg (Duke University), Walter Sinnott-Armstrong (Duke University)

Uncategorized

Online Selective Conformal Inference: Errors and Solutions

Authors: Yusuf Sale (LMU Munich), Aaditya Ramdas (Carnegie Mellon University)

PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans

Authors: Giang Nguyen (Auburn University), Valerie Chen (Carnegie Mellon University), Mohammad Reza Taesiri (None), Anh Nguyen (Auburn University)

Distributed Algorithms for Euclidean Clustering

Authors: Vincent Cohen-Addad (Google), Liudeng Wang (Texas A&M University – College Station), David Woodruff (Carnegie Mellon University), Samson Zhou (Texas A&M University)

LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models

Authors: Kai Hu (Carnegie Mellon University), Haoqi Hu (Amazon), Matt Fredrikson (University of Wisconsin, Madison)

Consistent Low-Rank Approximation

Authors: David Woodruff (Carnegie Mellon University), Samson Zhou (Texas A&M University)

Optimizing Canaries for Privacy Auditing with Metagradient Descent

Authors: Matteo Boglioni (ETH Zürich), Terrance Liu (Carnegie Mellon University), Andrew Ilyas (Carnegie Mellon University), Steven Wu (Carnegie Mellon University)

Flow Autoencoders are Effective Protein Tokenizers

Authors: Rohit Dilip (California Institute of Technology), Evan Zhang (California Institute of Technology), Ayush Varshney (Carnegie Mellon University), David Van Valen (None)

Better Bounds for the Distributed Experts Problem

Authors: David Woodruff (Carnegie Mellon University), Samson Zhou (Texas A&M University)

Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models

Authors: Xinyue Ai (Peking University), Yutong (Kelly) He (Carnegie Mellon University), Albert Gu (Cartesia AI CMU), Russ Salakhutdinov (CMU), Zico Kolter (Carnegie Mellon University), Nicholas Boffi (CMU, Carnegie Mellon University), Max Simchowitz (Massachusetts Institute of Technology)

RLP: Reinforcement as a Pretraining Objective

Authors: Ali Hatamizadeh (Nvidia), Syeda Nahida Akter (Carnegie Mellon University), Shrimai Prabhumoye (NVIDIA), Jan Kautz (NVIDIA), Mostofa Patwary (NVIDIA), Mohammad Shoeybi (NVIDIA), Bryan Catanzaro (NVIDIA), Yejin Choi (Stanford University / NVIDIA)

Programming with Pixels: Can Computer-Use Agents do Software Engineering?

Authors: Pranjal Aggarwal (CMU), Sean Welleck (Carnegie Mellon University)

Think Then Embed: Generative Context Improves Multimodal Embedding

Authors: Xuanming Cui (University of Central Florida), Jianpeng Cheng (Meta), Hong-You Chen (Ohio State University), Satya Narayan Shukla (Meta), Abhijeet Awasthi (Indian Institute of Technology Bombay), Xichen Pan (New York University), Chaitanya Ahuja (Carnegie Mellon University), Shlok Mishra (Facebook), Taipeng Tian (Meta), Qi Guo (Facebook), Ser-Nam Lim (University of Central Florida), Aashu Singh (Facebook), Xiangjun Fan (Meta)

ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning

Authors: Shuo Yang (The University of Melbourne), Caren Han (University of Melbourne), Yihao Ding (University of Western Australia), Shuhe Wang (University of Melbourne), Eduard H Hovy (Carnegie Mellon University)

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Authors: Yanghao Li (Apple), Rui Qian (Apple), Bowen Pan (Massachusetts Institute of Technology), Haotian Zhang (NVIDIA), Haoshuo Huang (Apple), Bowen Zhang (Apple), Jialing Tong (Apple), Haoxuan You (Apple AI/ML), Xianzhi Du (Apple), Zhe Gan (Apple), Hyunjik Kim (DeepMind), Chao Jia (Google), Zhenbang Wang (Apple), Yinfei Yang (Apple), Mingfei Gao (Apple), Zi-Yi Dou (Carnegie Mellon University), Wenze Hu (UCLA, University of California, Los Angeles), Chang Gao (Waymo), Dongxu Li (SalesForce.com), Philipp Dufter (Apple), Zirui Wang (Apple AI/ML), Guoli Yin (Apple), Zhengdong Zhang (Google), Chen Chen (Apple), Yang Zhao (University of California, Berkeley), Ruoming Pang (None), Zhifeng Chen (Apple)

TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models

Authors: Yue Huang (University of Notre Dame), Chujie Gao (Mohamed bin Zayed University of Artificial Intelligence), Siyuan Wu (None), Haoran Wang (Emory University), Xiangqi Wang (University of Notre Dame), Jiayi Ye (Sichuan University), Yujun Zhou (University of Notre Dame), Yanbo Wang (Mohamed bin Zayed University of Artificial Intelligence), Jiawen Shi (Huazhong University of Science and Technology), Qihui Zhang (Sichuan University), Han Bao (University of Notre Dame), Zhaoyi Liu (University of Illinois at Urbana-Champaign), Yuan Li (University of Cambridge), Tianrui Guan (Department of Computer Science, University of Maryland, College Park), Peiran Wang (University of California, Los Angeles), Haomin Zhuang (University of Notre Dame), Dongping Chen (University of Washington), Kehan Guo (University of Notre Dame), Andy Zou (CMU, Carnegie Mellon University), Bryan Hooi (National University of Singapore), Caiming Xiong (Salesforce Research), Elias Stengel-Eskin (Department of Computer Science, UT Austin), Hongyang Zhang (University of Waterloo), Hongzhi Yin (University of Queensland), Huan Zhang (CMU), Huaxiu Yao (UNC-Chapel Hill), Jieyu Zhang (Department of Computer Science, University of Washington), Jaehong Yoon (NTU Singapore), Kai Shu (Emory University), Ranjay Krishna (Department of Computer Science), Swabha Swayamdipta (University of Southern California), Weijia Shi (University of Washington, Seattle), Xiang Li (Massachusetts General Hospital), Yuexing Hao (Massachusetts Institute of Technology), Zhihao Jia (School of Computer Science, Carnegie Mellon University), Zhize Li (KAUST), Xiuying Chen (Mohamed bin Zayed University of Artificial Intelligence), Zhengzhong Tu (Texas A&M University – College Station), Xiyang Hu (Arizona State University), Tianyi Zhou (MBZUAI), Jieyu Zhao (University of Southern California), Lichao Sun (Lehigh University), Furong Huang (University of Maryland), Or Cohen-Sasson (University of Miami), Prasanna Sattigeri (IBM Research), Anka Reuel (Stanford University), Max Lamparth (Stanford University), Yue Zhao (University of Southern California), Nouha Dziri (Allen Institute for AI), Yu Su (Ohio State University), Huan Sun (Ohio State University), Heng Ji (University of Illinois at Urbana-Champaign), Chaowei Xiao (Johns Hopkins University/NVIDIA), Mohit Bansal (UNC Chapel Hill), Nitesh Chawla (University of Notre Dame), Jian Pei (Simon Fraser University), Jianfeng Gao (Microsoft Research), Michael Backes (CISPA Helmholtz Center for Information Security), Philip Yu (University of Illinois, Chicago), Neil Gong (), Pin-Yu Chen (IBM Research AI), Bo Li (University of Illinois, Urbana Champaign), Dawn Song (Berkeley), Xiangliang Zhang (University of Notre Dame)

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

Authors: Young-Jun Lee (KAIST), Seungone Kim (Carnegie Mellon University), Byung-Kwan Lee (NVIDIA), Minkyeong Moon (Yonsei University), Yechan Hwang (), Jong Myoung Kim (Korea Advanced Institute of Science & Technology), Graham Neubig (Carnegie Mellon University), Sean Welleck (Carnegie Mellon University), Ho-Jin Choi (Korea Advanced Institute of Science & Technology)

LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding

Authors: Khurram Yamin (CMU, Carnegie Mellon University), Shantanu Gupta (Carnegie Mellon University), Gaurav Ghosal (CMU, Carnegie Mellon University), Zachary Lipton (Carnegie Mellon University), Bryan Wilder (Carnegie Mellon University)

Scaling Group Inference for Diverse and High-Quality Generation

Authors: Gaurav Parmar (Carnegie Mellon University), Or Patashnik (Tel Aviv University), Daniil Ostashev (Snap Inc.), Kuan-Chieh Wang (Snap Inc.), Kfir Aberman (Google), Srinivasa Narasimhan (Carnegie Mellon University), Jun-Yan Zhu (Carnegie Mellon University)

Much Ado About Noising: Dispelling the Myths of Generative Robotic Control

Authors: Chaoyi Pan (Carnegie Mellon University), Giridharan Anantharaman (Facebook), Nai-Chieh Huang (Carnegie Mellon University), Claire Jin (School of Computer Science, Carnegie Mellon University), Daniel Pfrommer (None), Chenyang Yuan (Toyota Research Institute), Frank Permenter (Toyota Research Institute), Guannan Qu (Carnegie Mellon University), Nicholas Boffi (CMU, Carnegie Mellon University), Guanya Shi (CMU, Carnegie Mellon University), Max Simchowitz (Massachusetts Institute of Technology)

Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking

Authors: Dhruv Rohatgi (Massachusetts Institute of Technology), Abhishek Shetty (University of California Berkeley), Donya Saless (University of California, Berkeley), Yuchen Li (Carnegie Mellon University), Ankur Moitra (Massachusetts Institute of Technology), Andrej Risteski (Carnegie Mellon University), Dylan Foster (Microsoft Research NYC)

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Authors: Pranjal Aggarwal (CMU), Seungone Kim (Carnegie Mellon University), Jack Lanchantin (Meta), Sean Welleck (Carnegie Mellon University), Jason E Weston (), Ilia Kulikov (New York University), Swarnadeep Saha (FAIR at Meta)

CoMind: Towards Community-Driven Agents for Machine Learning Engineering

Authors: Sijie Li (Peking University), Weiwei Sun (Carnegie Mellon University), Shanda Li (Carnegie Mellon University), Ameet Talwalkar (University of California-Los Angeles), Yiming Yang (Carnegie Mellon University)

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Authors: Fan Feng (University of California, San Diego), Selena Ge (University of California, San Diego), Minghao Fu (University of California, San Diego), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Yujia Zheng (Carnegie Mellon University), Zeyu Tang (Stanford University), Yingyao Hu (Johns Hopkins University), Biwei Huang (University of California, San Diego), Kun Zhang (Carnegie Mellon University & MBZUAI)

ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval

Authors: Weiwei Sun (Carnegie Mellon University), Keyi Kong (Shandong University), xinyu ma (Institute of Computing Technology，Chinese Academy of Science), Shuaiqiang Wang (Baidu Inc.), Dawei Yin (Baidu), Maarten de Rijke (University of Amsterdam), Zhaochun Ren (Leiden University), Yiming Yang (Carnegie Mellon University)

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

Authors: Tunyu Zhang (Rutgers University), Haizhou Shi (ML Lab@Rutgers), Yibin Wang (None), Hengyi Wang (Rutgers University), Xiaoxiao He (Facebook), Zhuowei Li (Amazon), Haoxian Chen (Columbia University), Ligong Han (Rutgers University), Kai Xu (Amazon), Huan Zhang (CMU), Dimitris Metaxas (Rutgers University), Hao Wang (Rutgers University)

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Authors: Mike Merrill (None), Alexander Shaw (Brigham Young University), Nicholas Carlini (Anthropic), Boxuan Li (Microsoft), Harsh Raj (Northeastern University), Ivan Bercovich (UC Santa Barbara + ScOp VC), Lin Shi (Cornell University), Jeong Shin (Snorkel AI), Thomas Walshe (Reflection AI), E. Kelly Buchanan (Columbia University), Junhong Shen (Carnegie Mellon University), Guanghao Ye (Massachusetts Institute of Technology), Haowei Lin (Peking University), Jason Poulos (Independent Researcher), Maoyu Wang (), Marianna Nezhurina (Juelich Supercomputing Center, LAION, Tuebingen University), Di Lu (Tencent), Orfeas Menis Mastromichalakis (National Technical University of Athens), Zhiwei Xu (University of Michigan), Zizhao Chen (Department of Computer Science, Cornell University), Yue Liu (NUS), Robert Zhang (University of Texas at Austin), Leon Liangyu Chen (Stanford University), Anurag Kashyap (Amazon), Jan-Lucas Uslu (Stanford University), Jeffrey Li (Carnegie Mellon University), Jianbo Wu (University of California, Merced), Minghao Yan (Department of Computer Science, University of Wisconsin – Madison), Song Bian (University of Wisconsin-Madison), Vedang Sharma (Fremont Unified School District), Ke Sun (Amazon), Steven Dillmann (Stanford University), Akshay Anand (University of California, Berkeley), Andrew Lanpouthakoun (Stanford University), Bardia Koopah (University of California, Berkeley), Changran Hu (Sambanova Systems, Inc), Etash Guha (Stanford University, Anthropic), Gabriel Dreiman (Insitro), Jiacheng Zhu (Massachusetts Institute of Technology), Karl Krauth (Stanford), Li Zhong (Anthropic), Niklas Muennighoff (Stanford University), Robert Amanfu (Independent), Shangyin Tan (University of California, Berkeley), Shreyas Pimpalgaonkar (New York University), Tushar Aggarwal (Microsoft Research / Stanford), Xiangning Lin (CMU), Xin Lan (Michigan State University), Xuandong Zhao (UC Berkeley), Yiqing Liang (Brown University), Yuanli Wang (Boston University), Zilong (Ryan) Wang (UC San Diego), Changzhi Zhou (Tencent), David Heineman (Allen Institute for Artificial Intelligence), Hange Liu (Microsoft), Harsh Trivedi (Allen Institute for Artificial Intelligence), John Yang (Princeton University), Junhong Lin (Massachusetts Institute of Technology), Manish Shetty (University of California, Berkeley), Michael Yang (University of California, Santa Barbara), Nabil Omi (Microsoft Research), Negin Raoof (University of California, Berkeley), Shanda Li (Carnegie Mellon University), Terry Yue Zhuo (Data61, CSIRO), Wuwei Lin (OpenAI), Yiwei Dai (Cornell University), Yuxin Wang (Dartmouth College), Wenhao Chai (Princeton University), Shang Zhou (University of California, San Diego), Dariush Wahdany (CISPA Helmholtz Center), Ziyu She (None), Jiaming Hu (Boston University), Zhikang Dong (State University of New York at Stony Brook), Yuxuan Zhu (University of Illinois Urbana-Champaign), Sasha Cui (Yale University), Ahson Saiyed (University of Virginia, Charlottesville), Arinbjörn Kolbeinsson (UVA & K01), Christopher Rytting (Brigham Young University), Ryan Marten (Harbor), Yixin Wang (University of Michigan – Ann Arbor), Jenia Jitsev (LAION; Juelich Supercomputing Center, Research Center Juelich), Alex Dimakis (Electrical Engineering & Computer Science Department, University of California, Berkeley), Andy Konwinski (University of California, Berkeley), Ludwig Schmidt (University of Washington / Stanford / Anthropic)

A Dense Subset Index for Collective Query Coverage

Authors: Kartik Nair (Carnegie Mellon University), Pritish Chakraborty (Indian Institute of Technology Bombay, Indian Institute of Technology, Bombay), Atharva Tambat (Indian Institute of Technology Bombay, Indian Institute of Technology, Bombay), Indradyumna Roy (IIT Bombay, Aalto University), Soumen Chakrabarti (IIT Bombay), Anirban Dasgupta (IIT Gandhinagar), Abir De (Indian Institute of Technology Bombay,)

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Authors: Zhenting Wang (Rutgers University), Qi Chang (Accenture), Hemani Patel (University of California, Berkeley), Shashank Biju (University of California, Berkeley), Cheng-En Wu (Accenture), Quan Liu (Accenture), Aolin Ding (Accenture), Alireza Rezazadeh (Accenture), Ankit Parag Shah (Carnegie Mellon University), Yujia Bao (Accenture), Eugene Siow (Accenture)

STEM: SCALING TRANSFORMERS WITH EMBEDDING MODULES

Authors: Ranajoy Sadhukhan (Carnegie Mellon University), Sheng Cao (Meta), Harry Dong (Carnegie Mellon University), Changsheng Zhao (Meta Inc.), Attiano Purpura-Pontoniere (Meta – UCLA), Yuandong Tian (Meta AI Research), Zechun Liu (Meta), Beidi Chen (CMU, Carnegie Mellon University)

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Authors: Zhepeng Cen (CMU), Haolin Chen (), Shiyu Wang (Emory University), Zuxin Liu (Salesforce AI Research), Zhiwei Liu (SalesForce.com), DING ZHAO (Carnegie Mellon University), Caiming Xiong (Salesforce Research), Huan Wang (Salesforce Research), Weiran Yao (SalesForce.com)

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Authors: Ruibin Yuan (Hong Kong University of Science and Technology), Hanfeng Lin (Hong Kong University of Science and Technology), Shuyue Guo (Beijing University of Posts and Telecommunications), Ge Zhang (University of Waterloo), Jiahao Pan (Hong Kong University of Science and Technology), Yongyi Zang (Smule, Inc.), Haohe Liu (Ohio State University), Yiming Liang (University of the Chinese Academy of Sciences), Wenye Ma (Mohamed bin Zayed University of Artificial Intelligence), Xingjian Du (University of Rochester), Xeron Du (01.AI), Zhen Ye (The Hong Kong University of Science and Technology), Tianyu Zheng (Beijing University of Posts and Telecommunications), Zhengxuan Jiang (Zhejiang University), Yinghao MA (Queen Mary University of London), Minghao Liu (2077AI), Zeyue Tian (Hong Kong University of Science and Technology), Ziya Zhou (The Hong Kong University of Science and Technology), Liumeng Xue (Hong Kong University of Science and Technology), Xingwei Qu (University of Manchester), Yizhi Li (University of Manchester), Shangda Wu (Tencent), Tianhao Shen (Tianjin University), Ziyang Ma (Shanghai Jiao Tong University), Jun Zhan (Fudan University), Chunhui Wang (JD.com), Yatian Wang (The Hong Kong University of Science and Technology), Xiaowei Chi (Hong Kong University of Science and Technology), Xinyue Zhang (National University of Singapore), Zhenzhu Yang (China University of Geoscience Beijing), XiangzhouWang (Wuhan University of Engineering Science), Shansong Liu (Institute of Artificial Intelligence (TeleAI), China Telecom), Lingrui Mei (University of the Chinese Academy of Sciences), Peng Li (Hong Kong University of Science and Technology), JUNJIE WANG (None), Jianwei Yu (Microsoft), Guojian Pang (ByteDance Inc.), Xu Li (Kuaishou- 快手科技), Zihao Wang (CMU, Carnegie Mellon University；ZJU，Zhejiang University), Xiaohuan Zhou (ByteDance Inc.), Lijun Yu (Google DeepMind), Emmanouil Benetos (Queen Mary University of London), Yong Chen (Geely Automobile Research Institute (Ningbo) Co., Ltd), Chenghua Lin (University of Manchester ), Xie Chen (Shanghai Jiaotong University), Gus Xia (MBZUAI), Zhaoxiang Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences), Chao Zhang (Department of Electronic Engineering, Tsinghua University), Wenhu Chen (University of Waterloo), Xinyu Zhou (Megvii Inc.), Xipeng Qiu (Fudan University), Roger Dannenberg (Carnegie Mellon University), JIAHENG LIU (Nanjing University), Jian Yang (Beihang University), Wenhao Huang (01.AI), Wei Xue (Hong Kong University of Science and Technology), Xu Tan (Microsoft Research), Yike Guo (Imperial College London)

PAT3D: Physics-Augmented Text-to-3D Scene Generation

Authors: Guying Lin (Carnegie Mellon University), Kemeng Huang (University of Hong Kong), Michael Liu (CMU, Carnegie Mellon University), Ruihan Gao (Carnegie Mellon University), Hanke Chen (Carnegie Mellon University), Lyuhao Chen (Carnegie Mellon University), Beijia Lu (Carnegie Mellon University), Taku Komura (the University of Hong Kong, University of Hong Kong), Yuan Liu (The University of Hong Kong), Jun-Yan Zhu (Carnegie Mellon University), Minchen Li (School of Engineering and Applied Science, University of Pennsylvania)

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Authors: Yong Lin (Princeton University), Shange Tang (Princeton University), Bohan Lyu (Tsinghua University), Ziran Yang (Princeton University), Jui-Hui Chung (Princeton University), Haoyu Zhao (Princeton University), Lai Jiang (University of Illinois at Urbana-Champaign), Yihan Geng (Peking University), Jiawei Ge (Princeton University), Jingruo Sun (Stanford University), Jiayun Wu (Carnegie Mellon University), Jiri Gesi (Amazon Science), Ximing Lu (University of Washington), David Acuna (NVIDIA / Univ of Toronto), Kaiyu Yang (Meta), Hongzhou Lin (Amazon), Yejin Choi (Stanford University / NVIDIA), Danqi Chen (Princeton University), Sanjeev Arora (Princeton University), Chi Jin (Princeton University)

Diverse Dictionary Learning

Authors: Yujia Zheng (Carnegie Mellon University), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Shunxing Fan (Mohamed bin Zayed University of Artificial Intelligence), Andrew Gordon Wilson (New York University), Kun Zhang (Carnegie Mellon University & MBZUAI)

Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data

Authors: Gongxu Luo (Mohamed bin Zayed University of Artificial Intelligence), Loka Li (MBZUAI), Guangyi Chen (MBZUAI&CMU), Haoyue Dai (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)

Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter

Authors: Hong Wang (University of Science and Technology of China), Jie Wang (University of Science and Technology of China), Jian Luo (Stony Brook University), huanshuo dong (University of Science and Technology of China), Yeqiu Chen (University of Science and Technology of China), Runmin Jiang (Carnegie Mellon University), Zhen Huang (University of Science and Technology of China)

Beyond Hearing: Learning Task-Agnostic ExG Representations from Earphones via Physiology-Informed Tokenization

Authors: Hyungjun Yoon (Korea Advanced Institute of Science & Technology), Seungjoo Lee (Carnegie Mellon University), Yu Wu (Dartmouth College), XiaoMeng Chen (Shanghai Jiaotong University), Taiting Lu (Pennsylvania State University), Freddy Liu (University of Pennsylvania, University of Pennsylvania), Taeckyung Lee (KAIST), Hyeongheon Cha (Korea Advanced Institute of Science & Technology), Haochen Zhao (), Gaoteng Zhao (Northwest University), Dongyao Chen (Shanghai Jiaotong University), Cecilia Mascolo (University of Cambridge), Sung-Ju Lee (UCLA Computer Science Department, University of California, Los Angeles), Lili Qiu (Microsoft)

Understanding and Fixing Bottlenecks in State Space Models: What Recency and Over-Smoothing Tell Us

Authors: Adrita Das (Carnegie Mellon University), Dantong Zhu (Columbia University)

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Authors: Seongyun Lee (KAIST AI), Seungone Kim (Carnegie Mellon University), Minju Seo (Korea Advanced Institute of Science & Technology), Yongrae Jo (KAIST), Dongyoung Go (Cornell University), Hyeonbin Hwang (Korea Advanced Institute of Science & Technology), Jinho Park (Korea Advanced Institute of Science & Technology), Xiang Yue (Carnegie Mellon University), Sean Welleck (Carnegie Mellon University), Graham Neubig (Carnegie Mellon University), Moontae Lee (University of Illinois, Chicago), Minjoon Seo (KAIST)

PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

Authors: Loka Li (MBZUAI), Wong Kang (Mohamed bin Zayed University of Artificial Intelligence), Minghao Fu (University of California, San Diego), Guangyi Chen (MBZUAI&CMU), Zhenhao Chen (MBZUAI), Gongxu Luo (Mohamed bin Zayed University of Artificial Intelligence), Yuewen Sun (Mohamed bin Zayed University of Artificial Intelligence), Salman Khan (Mohamed bin Zayed University of Artificial Intelligence), Peter Spirtes (Carnegie Mellon University), Kun Zhang (Carnegie Mellon University & MBZUAI)