


EMNLP 2025: Suzhou, China
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng:

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025. Association for Computational Linguistics 2025, ISBN 979-8-89176-332-6 - Dominic Petrak, Thy Thy Tran, Iryna Gurevych:

Towards Automated Error Discovery: A Study in Conversational AI. 1-23 - Mohsinul Kabir, Ajwad Abrar, Sophia Ananiadou:

Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs. 24-51 - Donya Rooein, Vilém Zouhar, Debora Nozza, Dirk Hovy:

Biased Tales: Cultural and Topic Bias in Generating Children's Stories. 52-72 - Donghyun Kim, Sriram Ravula, Taemin Ha, Alex Dimakis, Daehyeok Kim, Aditya Akella:

Large Language Models as Realistic Microservice Trace Generators. 73-91 - David Beauchemin, Michelle Albert-Rochette, Richard Khoury, Pierre-Luc Déziel:

JUDGEBERT: Assessing Legal Meaning Preservation Between Sentences. 92-118 - David Beauchemin, Richard Khoury:

QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments. 119-130 - Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea:

Revisiting LLM Value Probing Strategies: Are They Robust and Expressive? 131-145 - Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara:

A Systematic Analysis of Base Model Choice for Reward Modeling. 146-164 - Branislav Pecher, Ivan Srba, Mária Bieliková:

Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance. 165-184 - Melanie Subbiah, Akankshya Mishra, Grace Kim, Liyan Tang, Greg Durrett, Kathleen McKeown:

Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding. 185-203 - Jakub Macina, Nico Daheim, Ido Hakimi, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan:

MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors. 204-221 - Haishuo Fang, Xiaodan Zhu, Iryna Gurevych:

Preemptive Detection and Correction of Misaligned Actions in LLM Agents. 222-244 - Simon Münker:

Fingerprinting LLMs through Survey Item Factor Correlation: A Case Study on Humor Style Questionnaire. 245-258 - Tianlu Zheng, Yifan Zhang, Xiang An, Ziyong Feng, Kaicheng Yang, Qichuan Ding:

Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval. 259-271 - David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan:

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning. 272-292 - Yuhang Tian, Dandan Song, Zhijing Wu, Pan Yang, Changzhi Zhou, Jun Yang, Hao Wang, Huipeng Ma, Chenhao Li, Luan Zhang:

CompKBQA: Component-wise Task Decomposition for Knowledge Base Question Answering. 293-309 - Yang Zhao, Yixin Wang, Mingzhang Yin:

Permutative Preference Alignment from Listwise Ranking of Human Judgments. 310-334 - Junyu Cheng, Chang Pan, Shuangyin Li:

ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches. 335-353 - Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim:

SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition. 354-379 - Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do:

MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora. 380-396 - Patrick Giedemann, Pius von Däniken, Jan Milan Deriu, Álvaro Rodrigo, Anselmo Peñas, Mark Cieliebak:

ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos. 397-413 - Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, Pengfei Liu:

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments. 414-431 - Enjun Du, Siyi Liu, Yongqi Zhang:

Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning. 432-453 - Zhaodan Zhang, Jin Zhang, Hui Xu, Jiafeng Guo, Xueqi Cheng:

MPRF: Interpretable Stance Detection through Multi-Path Reasoning Framework. 454-470 - Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan:

Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels. 471-513 - Jingyu Wei, Bo Liu, Tianjiao Wan, Baoyun Peng, Xingkong Ma, Mengmeng Guo:

JI2S: Joint Influence-Aware Instruction Data Selection for Efficient Fine-Tuning. 514-527 - Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui:

SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models. 528-540 - Xiangchen Wang, Jinrui Zhang, Teng Wang, Haigang Zhang, Feng Zheng:

Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors. 541-558 - Xuanliang Zhang, Dingzirui Wang, Keyan Xu, Qingfu Zhu, Wanxiang Che:

RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals. 559-579 - ZhaoDan Zhang, Jin Zhang, Xueqi Cheng, Hui Xu:

T-MAD: Target-driven Multimodal Alignment for Stance Detection. 580-595 - Kun Peng, Cong Cao, Hao Peng, Guanlin Wu, Zhifeng Hao, Lei Jiang, Yanbing Liu, Philip S. Yu:

Emotion Transfer with Enhanced Prototype for Unseen Emotion Recognition in Conversation. 596-608 - Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Ranjie Duan, Xiaoshuang Jia, Shaowei Yuan, Simeng Qin, Zhiqiang Wang, Xiaojun Jia:

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization. 609-628 - Yilong Xu, Jinhua Gao, Xiaoming Yu, Yuanhai Xue, Baolong Bi, Huawei Shen, Xueqi Cheng:

Training a Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models. 629-648 - Kaiyue Feng, Siyue Zhang, Bingsen Chen, Yilun Zhao, Chen Zhao:

SportReason: Evaluating Retrieval-Augmented Reasoning across Tables and Text for Sports Question Answering. 649-662 - Junsheng Huang, Zhitao He, Yuchen Huang, Sandeep Polisetty, Qingyun Wang, Yi R. Fung:

MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness. 663-676 - Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He:

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation. 677-693 - Chenxing Wei, Mingwen Ou, Ying He, Yao Shu, Fei Yu:

PAFT: Prompt-Agnostic Fine-Tuning. 694-717 - Linger Deng, Linghao Zhu, Yuliang Liu, Yu Wang, Qunyi Xie, Jingjing Wu, Gang Zhang, Yingying Zhu, Xiang Bai:

Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning. 718-735 - Yanshu Li, Jianjiang Yang, Tian Yun, Pinyuan Feng, Jinfa Huang, Ruixiang Tang:

TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration. 736-763 - Tianxin Xie, Yan Rong, Pengfei Zhang, Wenwu Wang, Li Liu:

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey. 764-791 - Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng:

Automating Steering for Safe Multimodal Large Language Models. 792-814 - Yilin Jiang, Mingzi Zhang, Sheng Jin, Zengyi Yu, Xiangjie Kong, Binghao Tu:

EMNLP: Educator-role Moral and Normative Large Language Models Profiling. 815-843 - Bohao Chu, Meijie Li, Sameh Frihat, Chengyu Gu, Georg Lodde, Elisabeth Livingstone, Norbert Fuhr:

TracSum: A New Benchmark for Aspect-Based Summarization with Sentence-Level Traceability in Medical Domain. 844-864 - Wenbin Hu, Haoran Li, Huihao Jing, Qi Hu, Ziqian Zeng, Sirui Han, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song:

Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning. 865-883 - Liqiang Ming, Sheng-hua Zhong, Yuncong Li:

Towards General-Domain Word Sense Disambiguation: Distilling Large Language Model into Compact Disambiguator. 884-897 - Hongyuan Lu, Zixuan Li, Zefan Zhang, Wai Lam:

SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models. 898-913 - Haoyi Wu, Zhihao Teng, Kewei Tu:

Parallel Continuous Chain-of-Thought with Jacobi Iteration. 914-926 - Yuhang Chen, Zhen Tan, Tianlong Chen:

EQA-RM: A Generative Embodied Reward Model with Test-time Scaling. 927-945 - Yongkang Chen, Xiaohu Du, Xiaotian Zou, Chongyang Zhao, Huan Deng, Hu Li, Xiaohui Kuang:

Refusal-Aware Red Teaming: Exposing Inconsistency in Safety Evaluations. 946-955 - Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang:

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking. 956-976 - Yihan Wang, Peiyu Liu, Xin Yang:

LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL. 977-991 - Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze:

On Relation-Specific Neurons in Large Language Models. 992-1022 - Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji:

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents. 1023-1039 - Xingjian Diao, Weiyi Wu, Keyi Kong, Peijun Qing, Xinwen Xu, Ming Cheng, Soroush Vosoughi, Jiang Gui:

ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering. 1040-1057 - Yuanyang Yin, Yaqi Zhao, Yajie Zhang, Yuanxing Zhang, Ke Lin, Jiahao Wang, Xin Tao, Pengfei Wan, Wentao Zhang, Feng Zhao:

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs. 1058-1070 - George Arthur Baker, Mario Sanz-Guerrero, Katharina von der Wense:

Molecular String Representation Preferences in Pretrained LLMs: A Comparative Study in Zero- & Few-Shot Molecular Property Prediction. 1071-1085 - Ming Wang, Miao Zhang, Xuebo Liu, Liqiang Nie:

Weight-Aware Activation Sparsity with Constrained Bayesian Optimization Scheduling for Large Language Models. 1086-1098 - Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huamin Zhang, Yu Huang:

DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation. 1099-1123 - Yang Du, Zhuoran Lin, Kaiqiang Song, Biao Wang, Zhicheng Zheng, Tiezheng Ge, Bo Zheng, Qin Jin:

VC4VG: Optimizing Video Captions for Text-to-Video Generation. 1124-1138 - Alireza Salemi, Hamed Zamani:

LaMP-QA: A Benchmark for Personalized Long-form Question Answering. 1139-1159 - Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao:

The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations. 1160-1176 - Huihao Jing, Haoran Li, Wenbin Hu, Qi Hu, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song:

MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol. 1177-1194 - Wenyu Tao, Xiaofen Xing, Zeliang Li, Xiangmin Xu:

SAKI-RAG: Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration. 1195-1213 - Yuchen Ji, Bo Xu, Jie Shi, Jiaqing Liang, Deqing Yang, Yu Mao, Hai Chen, Yanghua Xiao:

Skeletons Matter: Dynamic Data Augmentation for Text-to-Query. 1214-1236 - Cheng Shen, Yew-Soon Ong, Joey Tianyi Zhou:

CondenseLM: LLMs-driven Text Dataset Condensation via Reward Matching. 1237-1252 - Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu:

MovieCORE: COgnitive REasoning in Movies. 1253-1272 - Yuhao Chen, Yuanjie Lyu, Shuochen Liu, Chao Zhang, Junhui Lv, Tong Xu:

Think Wider, Detect Sharper: Reinforced Reference Coverage for Document-Level Self-Contradiction Detection. 1273-1288 - Arijit Maji, Raghvendra Kumar, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha:

DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture. 1289-1313 - Changbing Yang, Franklin Ma, Freda Shi, Jian Zhu:

LingGym: How Far Are LLMs from Thinking Like Field Linguists? 1314-1340 - Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie:

Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation. 1341-1358 - Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar, Ali Emami:

Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks. 1359-1372 - Yiming Jia, Jiachen Li, Xiang Yue, Bo Li, Ping Nie, Kai Zou, Wenhu Chen:

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search. 1373-1393 - Qingcheng Zeng, Weihao Xuan, Leyang Cui, Rob Voigt:

Thinking Out Loud: Do Reasoning Models Know When They're Right? 1394-1407 - Weihao Xuan, Qingcheng Zeng, Heli Qi, Junjue Wang, Naoto Yokoya:

Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models. 1408-1450 - Mengqi Liao, Xiangyu Xi, Ruinian Chen, Jia Leng, Yangen Hu, Ke Zeng, Shuai Liu, Huaiyu Wan:

Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs. 1451-1463 - Ingroj Shrestha, Padmini Srinivasan:

LLM Bias Detection and Mitigation through the Lens of Desired Distributions. 1464-1480 - Teng Lin, Yuyu Luo, Honglin Zhang, Jicheng Zhang, Chunlin Liu, Kaishun Wu, Nan Tang:

MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering. 1481-1494 - Yifei Wang, Feng Xiong, Yong Wang, Linjing Li, Xiangxiang Chu, Daniel Dajun Zeng:

POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation. 1495-1512 - Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen, Douglas Teodoro, Nan Liu, Randy Goebel, Lei Ma, Edison Marrese-Taylor, Shijian Lu, Yusuke Iwasawa, Yutaka Matsuo, Irene Li:

MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation. 1513-1532 - Weiming Zhang, Qingyao Li, Xinyi Dai, Jizheng Chen, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang:

NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging. 1533-1549 - Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee:

Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD. 1550-1575 - Yuan Liu, Zhongyin Zhao, Le Tian, Haicheng Wang, Xubing Ye, Yangxiu You, Zilin Yu, Chuhan Wu, Zhou Xiao, Yang Yu, Jie Zhou:

POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion. 1576-1601 - Xuemei Tang, Xufeng Duan, Zhenguang G. Cai:

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition. 1602-1617 - Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner:

CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs. 1618-1639 - Huan Xu, Zequn Li, Wen Tang, Jian Jun Zhang:

From Schema to State: Zero-Shot Scheme-Only Dialogue State Tracking via Diverse Synthetic Dialogue and Step-by-Step Distillation. 1640-1652 - Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin:

Beyond the Surface: Measuring Self-Preference in LLM Judgments. 1653-1672 - Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu:

Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders. 1673-1682 - Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng:

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation. 1683-1702 - Ege Yigit Çelik, Selma Tekir:

CiteBART: Learning to Generate Citations for Local Citation Recommendation. 1703-1719 - Lan Zhang, Marco Valentino, André Freitas:

Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions. 1720-1738 - Caleb Ziems, William Barr Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang:

Culture Cartography: Mapping the Landscape of Cultural Knowledge. 1739-1757 - Gregory Polyakov, Christian Hepting, Carsten Eickhoff, Seyed Ali Bahrainian:

Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models. 1758-1777 - Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp:

SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. 1778-1818 - Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian, Laleh Seyyed-Kalantari, Ali Emami:

We Politely Insist: Your LLM Must Learn the Persian Art of Taarof. 1819-1838 - Dustin Wright, Zain Muhammad Mujahid, Lu Wang, Isabelle Augenstein, David Jurgens:

Unstructured Evidence Attribution for Long Context Query Focused Summarization. 1839-1867 - Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam:

RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language. 1868-1894 - Mingyuan Wu, Jize Jiang, Haozhen Zheng, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen, Yongjoo Park, Minjia Zhang, ChengXiang Zhai, Klara Nahrstedt:

Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning. 1895-1909 - Xuyang Liu, Yiyu Wang, Junpeng Ma, Linfeng Zhang:

Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models. 1910-1924 - Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Dong Yu:

Router-Tuning: A Simple and Effective Approach for Dynamic Depth. 1925-1938 - Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang:

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs. 1939-1950 - Yuan Yuan, Muyu He, Muhammad Adil Shahid, Ziyang Li, Jiani Huang, Li Zhang:

TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games. 1951-1965 - Minghui Li, Hao Zhang, Yechao Zhang, Wei Wan, Shengshan Hu, Pei Xiaobing, Jing Wang:

Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling. 1966-1978 - Peifeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty:

Direct Judgement Preference Optimization. 1979-2009 - Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong:

WebInject: Prompt Injection Attack to Web Agents. 2010-2030 - Tian Lan, Jiang Li, Yemin Wang, Xu Liu, Xiangdong Su, Guanglai Gao:

F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations. 2031-2046 - Taylor Sorensen, Pushkar Mishra, Roma Patel, Michael Henry Tessler, Michiel A. Bakker, Georgina Evans, Iason Gabriel, Noah D. Goodman, Verena Rieser:

Value Profiles for Encoding Human Variation. 2047-2095 - Lucius E. J. Bynum, Kyunghyun Cho:

Language Models as Causal Effect Generators. 2096-2115 - Joshua Rozner, Leonie Weissweiler, Kyle Mahowald, Cory Shain:

Constructions are Revealed in Word Distributions. 2116-2138 - Yilun Yang, Yekun Chai:

CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages. 2139-2169 - Jiyue Jiang, Yitao Xu, Zikang Wang, Yihan Ye, Yanruisheng Shao, Yuheng Shan, Jiuming Wang, Xiaodan Fan, Jiao Yuan, Yu Li:

RBPtool: A Deep Language Model Framework for Multi-Resolution RBP-RNA Binding Prediction and RNA Molecule Design. 2170-2185 - Yiran Yang, Haifeng Sun, Jingyu Wang, Qi Qi, Zirui Zhuang, Huazheng Wang, Pengfei Ren, Jing Wang, Jianxin Liao:

Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio. 2186-2206 - Zirui He, Mingyu Jin, Bo Shen, Ali Payani, Yongfeng Zhang, Mengnan Du:

SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models. 2207-2236 - Joshua Rozner, Leonie Weissweiler, Cory Shain:

BabyLM's First Constructions: Causal interventions provide a signal of learning. 2237-2249 - Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor:

Effective Red-Teaming of Policy-Adherent Agents. 2250-2268 - Zongxi Li, Yang Li, Haoran Xie, S. Joe Qin:

CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering. 2269-2288 - Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yu Su, Haofei Yu, Jiaxuan You:

SafeScientist: Enhancing AI Scientist Safety for Risk-Aware Scientific Discovery. 2289-2317 - Adrian Benton, Alexander Gutkin, Christo Kirov, Brian Roark:

Improving Informally Romanized Language Identification. 2318-2336 - Ivan Kobyzev, Abbas Ghaddar, Dingtao Hu, Boxing Chen:

Integral Transformer: Denoising Attention, Not Too Much Not Too Little. 2337-2354 - Yicheng Fu, Zhemin Huang, Liuxin Yang, Yumeng Lu, Zhongdongming Dai:

CHENGYU-BENCH: Benchmarking Large Language Models for Chinese Idiom Understanding and Use. 2355-2366 - Divyanshu Aggarwal, Ashutosh Sathe, Sunayana Sitaram:

Improving Cross Lingual Transfer by Pretraining with Active Forgetting. 2367-2378 - Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu:

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization. 2379-2397 - Crystal Qian, Aaron T. Parisi, Clémentine Bouleau, Vivian Tsai, Maël Lebreton, Lucas Dixon:

To Mask or to Mirror: Human-AI Alignment in Collective Reasoning. 2398-2423 - Krishna C. Puvvada, Faisal Ladhak, Santiago Akle Serano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg:

SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling. 2424-2438 - Melissa Roemmele, John Joon Young Chung, Taewook Kim, Yuqian Sun, Alex Calderwood, Max Kreminski:

LLMs Behind the Scenes: Enabling Narrative Scene Illustration. 2439-2457 - Le Zhang, Bo Wang, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal:

REARANK: Reasoning Re-ranking Agent via Reinforcement Learning. 2458-2471 - Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan, Jesse Thomason, Shrikanth Narayanan:

Large Language Models Do Multi-Label Classification Differently. 2472-2495 - Lester James Validad Miranda, Elyanah Aco, Conner G. Manuel, Jan Christian Blaise Cruz, Joseph Marvin Imperial:

FilBench: Can LLMs Understand and Generate Filipino? 2496-2529 - ChengYan Wu, Bolei Ma, Yihong Liu, Zheyu Zhang, Ningyuan Deng, Yanshu Li, Baolan Chen, Yi Zhang, Yun Xue, Barbara Plank:

M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis. 2530-2557 - Alexandr Nesterov, Andrey Sakhovskiy, Ivan Sviridov, Airat Valiev, Vladimir Makharev, Petr Anokhin, Galina Zubkova, Elena Tutubalina:

RuCCoD: Towards Automated ICD Coding in Russian. 2558-2585 - Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian J. McAuley:

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs. 2586-2616 - Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu:

Efficient Model Development through Fine-tuning Transfer. 2617-2636 - Mingyang Wang, Lukas Lange, Heike Adel, Yunpu Ma, Jannik Strötgen, Hinrich Schütze:

Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes. 2637-2665 - Yuhan Liu, Michael JQ Zhang, Eunsol Choi:

User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal. 2666-2681 - Yu-Wen Chen, Melody Ma, Julia Hirschberg:

Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs. 2682-2694 - Sanchit Sinha, Guangzhi Xiong, Aidong Zhang:

COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision-Language Models. 2695-2711 - Tong Bao, Mir Tafseer Nayeem, Davood Rafiei, Chengzhi Zhang:

SurveyGen: Quality-Aware Scientific Survey Generation with Large Language Models. 2712-2736 - Zhisheng Zheng, Puyuan Peng, Anuj Diwan, Cong Phuoc Huynh, Xiaohang Sun, Zhu Liu, Vimal Bhat, David Harwath:

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing. 2737-2756 - Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu:

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge. 2757-2791 - Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea:

MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification. 2792-2808 - Prakamya Mishra, Jiang Liu, Jialian Wu, Xiaodong Yu, Zicheng Liu, Emad Barsoum:

TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games. 2809-2831 - Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, Jundong Li:

Learning from Diverse Reasoning Paths with Routing and Collaboration. 2832-2845 - Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, Junde Wu:

Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning. 2846-2857 - Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding:

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models. 2858-2873 - Jonathan Ivey, Susan Gauch, David Jurgens:

NUTMEG: Separating Signal From Noise in Annotator Disagreement. 2874-2887 - Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das:

Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations. 2888-2947 - Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur, Munmun De Choudhury, Tanushree Mitra:

MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform. 2948-2982 - Rimon Melamed, Lucas H. McCabe, H. Howie Huang:

Demystifying optimized prompts in language models. 2983-2999 - Cihan Xiao, Matthew Wiesner, Debashish Chakraborty, Reno Kriz, Keith Cunningham, Kenton Murray, Kevin Duh, Luis Tavarez-Arce, Paul McNamee, Sanjeev Khudanpur:

Whisper-UT: A Unified Translation Framework for Speech and Text. 3000-3016 - Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen:

Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem. 3017-3027 - Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang:

Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation. 3028-3046 - Tianhao Zhang, Zhecheng Sheng, Zhexiao Lin, Chen Jiang, Dongyeop Kang:

BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation. 3047-3061 - Yu Xia, Yiran Shen, Junda Wu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Julian J. McAuley:

SAND: Boosting LLM Agents with Self-Taught Action Deliberation. 3062-3077 - Lingyao Li, Dawei Li, Zhenhui Ou, Xiaoran Xu, Jingxiao Liu, Zihui Ma, Runlong Yu, Min Deng:

LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment. 3078-3096 - Hua Shen, Nicholas Clark, Tanu Mitra:

Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values? 3097-3118 - Jiazheng Li, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui, Cesare Aloisi, Yulan He:

Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time. 3119-3140 - Sania Waheed, Na Min An:

Image Embedding Sampling Method for Diverse Captioning. 3141-3157 - Huihan Li, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta, Xiang Ren:

Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time. 3158-3180 - Jiarui Yao, Ruida Wang, Tong Zhang:

FANS: Formal Answer Selection for LLM Natural Language Math Reasoning Using Lean4. 3181-3200 - Gagan Bhatia, Maxime Peyrard, Wei Zhao:

Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning. 3201-3219 - Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen:

Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark. 3220-3248 - Boyu Guan, Chuang Han, Yining Zhang, Yupu Liang, Zhiyang Zhang, Yang Zhao, Chengqing Zong:

SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation. 3249-3267 - Bohan Lyu, Siqiao Huang, Zichen Liang, Qian Sun, Jiaming Zhang:

Surge: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors. 3268-3308 - Carlos Mullov, Alexander Waibel:

Few-Shot Learning Translation from New Languages. 3309-3330 - Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu, Mona T. Diab:

Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design. 3331-3350 - Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, Wenjie Li:

TokenSkip: Controllable Chain-of-Thought Compression in LLMs. 3351-3363 - Tu Anh Dinh, Jan Niehues:

Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability. 3364-3382 - Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad:

reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs. 3383-3409 - Ting-Yun Chang, Muru Zhang, Jesse Thomason, Robin Jia:

Why Do Some Inputs Break Low-Bit LLM Quantization? 3410-3429 - Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci:

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation. 3430-3442 - Hao Nan Sheng, Zhi-Yong Wang, Hing Cheung So, Mingrui Yang:

AROMA: Autonomous Rank-one Matrix Adaptation. 3443-3459 - Ziyang Ma, Qingyue Yuan, Zhenglin Wang, Deyu Zhou:

Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens. 3460-3477 - Qibin Li, Zhen Xu, Shengyuan Bai, Nianmin Yao, Kaili Sun, Bowen Wu, Ying Li, Baoxun Wang:

Anchoring-Guidance Fine-Tuning (AnGFT): Elevating Professional Response Quality in Role-Playing Conversational Agents. 3478-3496 - Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet:

RiTTA: Modeling Event Relations in Text-to-Audio Generation. 3497-3511 - Xiaofeng Zhang, Yihao Quan, Chen Shen, Chaochen Gu, Xiaosong Yuan, Shaotian Yan, Jiawei Cao, Hao Cheng, Kaijie Wu, Jieping Ye:

Shallow Focus, Deep Fixes: Enhancing Shallow Layers Vision Attention Sinks to Alleviate Hallucination in LVLMs. 3512-3534 - Peerat Limkonchotiwat, Pume Tuchinda, Lalita Lowphansirikul, Surapon Nonesung, Panuthep Tasawong, Alham Fikri Aji, Can Udomcharoenchaikit, Sarana Nutanong:

WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai. 3535-3558 - Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Yanxi Zhao, Yifan Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu:

MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models. 3559-3582 - Dongning Rao, Rongchu Zhou, Peng Chen, Zhihua Jiang:

A Comprehensive Literary Chinese Reading Comprehension Dataset with an Evidence Curation Based Solution. 3583-3603 - Jie Shi, Xi Cao, Bo Xu, Jiaqing Liang, Yanghua Xiao, Jia Chen, Peng Wang, Wei Wang:

Dialect-SQL: An Adaptive Framework for Bridging the Dialect Gap in Text-to-SQL. 3604-3619 - Yixuan Tang, Yi Yang:

FinMTEB: Finance Massive Text Embedding Benchmark. 3620-3638 - Anuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi:

Scaling Rich Style-Prompted Text-to-Speech Datasets. 3639-3659 - Mahammed Kamruzzaman, Gene Louis Kim:

Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs. 3660-3678 - Jianxing Yu, Zihao Gou, Chen Li, Zhisheng Wang, Peiji Yang, Wenqing Chen, Jian Yin:

Eliciting Implicit Acoustic Styles from Open-domain Instructions to Facilitate Fine-grained Controllable Generation of Speech. 3679-3695 - Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu:

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models. 3696-3715 - Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li:

AdaptThink: Reasoning Models Can Learn When to Think. 3716-3730 - Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu:

T2: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering. 3731-3756 - Yang Wu, Ruijia Wang, Jie Wu:

Non-Existent Relationship: Fact-Aware Multi-Level Machine-Generated Text Detection. 3757-3768 - Ziwei Ji, Lei Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda:

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations. 3769-3793 - Huanghai Liu, Quzhe Huang, Qingjing Chen, Yiran Hu, Jiayu Ma, Yun Liu, Weixing Shen, Yansong Feng:

JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning. 3794-3814 - Vinay Samuel, Harshita Diddee, Yiming Zhang, Daphne Ippolito:

CIE: Controlling Language Model Text Generations Using Continuous Signals. 3815-3825 - Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Bin Ji, Ma Jun, Xiaodong Liu, Jing Wang, Jianfeng Zhang, Jie Yu, Feilong Bao, Wangbaosheng:

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience. 3826-3843 - Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang:

Language-to-Space Programming for Training-Free 3D Visual Grounding. 3844-3864 - Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang:

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions. 3865-3888 - Yilong Lai, Jialong Wu, Zhenglin Wang, Deyu Zhou:

AdaRewriter: Unleashing the Power of Prompting-based Conversational Query Reformulation via Test-Time Adaptation. 3889-3905 - Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li:

SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? 3906-3931 - Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, Dongyan Zhao:

F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task. 3932-3948 - Qiyuan Chen, Hongsen Huang, Qian Shao, Jiahe Chen, Jintai Chen, Hongxia Xu, Renjie Hua, Ren Chuan, Jian Wu:

Icon2: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent Regulation. 3949-3968 - Ming Dong, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu, Tingting He:

DSCD: Large Language Model Detoxification with Self-Constrained Decoding. 3969-3984 - Jue Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang:

From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models. 3985-4002 - Songbo Hu, Ivan Vulic, Anna Korhonen:

Quantifying Language Disparities in Multilingual Large Language Models. 4003-4018 - Jihyung Lee, Daehui Kim, Seonjeong Hwang, Hyounghun Kim, Gary Lee:

KoBLEX: Open Legal Question Answering with Multi-hop Reasoning. 4019-4053 - Bichen Wang, Yuzhe Zi, Yixin Sun, Hao Yang, Yanyan Zhao, Bing Qin:

End-to-End Learnable Psychiatric Scale Guided Risky Post Screening for Depression Detection on Social Media. 4054-4066 - Xinjie Zhao, Fan Gao, Xingyu Song, Yingjian Chen, Rui Yang, Yanran Fu, Yuyang Wang, Yusuke Iwasawa, Yutaka Matsuo, Irene Li:

ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA. 4067-4089 - Peter A. Jansen, Samiah Hassan, Ruoyao Wang:

Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science. 4090-4102 - Jiale Kang, Ziyin Yue, Qingyu Yin, Rui Jiang, Weile Li, Zening Lu, Zhouran Ji:

ModRWKV: Transformer Multimodality in Linear Time. 4103-4115 - Jiaao Yu, Yijing Lin, Zhipeng Gao, Xuesong Qiu, Lanlan Rui:

Multimedia Event Extraction with LLM Knowledge Editing. 4116-4124 - Shuo Wang, Renhao Li, Xi Chen, Yulin Yuan, Min Yang, Derek F. Wong:

Exploring the Impact of Personality Traits on LLM Bias and Toxicity. 4125-4143 - Chenyuan He, Yuxiang Jia, Fei Gao, Senbin Zhu, Hongde Liu, Hongying Zan, Min Peng:

Task-aware Contrastive Mixture of Experts for Quadruple Extraction in Conversations with Code-like Replies and Non-opinion Detection. 4144-4159 - Dianqing Liu, Yi Liu, Guoqing Jin, Zhendong Mao:

Mitigating Biases in Language Models via Bias Unlearning. 4160-4178 - Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Min Yang, Lingpeng Kong, Ngai Wong:

UNComp: Can Matrix Entropy Uncover Sparsity? - A Compressor Design from an Uncertainty-Aware Perspective. 4179-4199 - Haiquan Qiu, You Wu, Dong Li, Jianmin Guo, Quanming Yao:

Superpose Task-specific Features for Model Merging. 4200-4214 - Suifeng Zhao, Zhuoran Jin, Sujian Li, Jun Gao:

FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain. 4215-4249 - Qinzhuo Wu, Pengzhi Gao, Wei Liu, Jian Luan:

BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism. 4250-4272 - Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao:

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective. 4273-4303 - Heng Wang, Yotaro Shimose, Shingo Takamatsu:

BannerAgency: Advertising Banner Design with Multimodal LLM Agents. 4304-4329 - Weijie Shi, Jipeng Zhang, Yaguang Wu, Jingzhi Fang, Shibo Zhang, Yao Zhao, Hao Chen, Ruiyuan Zhang, Yue Cui, Jia Zhu, Sirui Han, Jiajie Xu, Xiaofang Zhou:

DIDS: Domain Impact-aware Data Sampling for Large Language Model Training. 4330-4350 - Chang Su, Dengliang Shi, Siyuan Huang, Jintao Du, Changhua Meng, Yu Cheng, Weiqiang Wang, Zhouhan Lin:

Training LLMs to be Better Text Embedders through Bidirectional Reconstruction. 4351-4369 - Shaomu Tan, Christof Monz:

ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling. 4370-4387 - Zhiyuan Peng, Xin Yin, Rui Qian, Peiqin Lin, Yongkang Liu, Hao Zhang, Chenhao Ying, Yuan Luo:

SolEval: Benchmarking Large Language Models for Repository-level Solidity Smart Contract Generation. 4388-4411 - Nathan Roll, Calbert Graham, Yuka Tatsumi, Kim Tien Nguyen, Meghan Sumner, Dan Jurafsky:

In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties. 4412-4426 - Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu:

Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills. 4427-4443 - Yijun Shen, Delong Chen, Fan Liu, Xingyu Wang, Chuanyi Zhang, Liang Yao, Yuhui Zheng:

Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions. 4444-4464 - Hao Sun, Zile Qiao, Bo Wang, Guoxin Chen, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang:

DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling. 4465-4478 - Jianwei Wang, Chengming Shi, Junyao Yang, Haoran Li, Qianli Ma, Huiping Zhuang, Cen Chen, Ziqian Zeng:

RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis. 4479-4500 - Haorui Wang, Zheng Wang, Yuxuan Zhang, Bo Wang, Bin Wu:

Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition. 4501-4520 - Chaeeun Kim, Jinu Lee, Wonseok Hwang:

LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation. 4521-4554 - Jingxuan Wei, Nan Xu, Junnan Zhu, Haoyanni, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang:

ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering. 4555-4569 - Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, Zhao Lv:

COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation. 4570-4593 - Jiguo Liu, Chao Liu, Meimei Li, Nan Li, Shihao Gao, Dali Zhu:

DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness. 4594-4610 - Avinash Madasu, Vasudev Lal, Phillip Howard:

Pruning the Paradox: How CLIP's Most Informative Heads Enhance Performance While Amplifying Bias. 4611-4626 - Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Mingsong Yan, Zi Yang, Paul D. Hovland, Bogdan Nicolae, Franck Cappello, Sui Tang, Zheng Zhang:

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation. 4627-4645 - Ziwen Chen, Xiaoyuan Zhang, Ming Zhu:

TS-CLIP: Time Series Understanding by CLIP. 4646-4664 - Yangyang Xu, Jinpeng Hu, Zhuoer Zhao, Zhangling Duan, Xiao Sun, Xun Yang:

MultiAgentESC: A LLM-based Multi-Agent Collaboration Framework for Emotional Support Conversation. 4665-4681 - Yilin Wang, Heng Wang, Yuyang Bai, Minnan Luo:

Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models. 4682-4698 - Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers:

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding. 4699-4713 - Jun-Yu Ma, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu:

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation. 4714-4720 - Zhongyi Ye, Weitai Zhang, Xinyuan Zhou, Yongxin Zhu, Ninghui Rao, Enhong Chen:

Scalable Data Synthesis through Human-like Cognitive Imitation and Data Recombination. 4721-4735 - Jianan Wang, Bin Li, Jingtao Qi, Xueying Wang, Fu Li, Lihanxun Li:

BeSimulator: A Large Language Model Powered Text-based Behavior Simulator. 4736-4754 - Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng:

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs. 4755-4765 - Zhanming Shen, Tianqi Xu, Hao Wang, Jian Li, Miao Pan:

pFedGPT: Hierarchically Optimizing LoRA Aggregation Weights for Personalized Federated GPT Models. 4766-4778 - Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu:

QSpec: Speculative Decoding with Complementary Quantization Schemes. 4779-4795 - Zetong Li, Qinliang Su, Minhua Huang, Yin Yang:

Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering. 4796-4808 - Yidan Zhang, Yu Wan, Boyi Deng, Baosong Yang, Haoran Wei, Fei Huang, Bowen Yu, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou:

P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs. 4809-4836 - Yutao Zhu, Jiajie Jin, Hongjin Qian, Zheng Liu, Zhicheng Dou, Ji-Rong Wen:

Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization. 4837-4856 - Zezhong Jin, Shubhang Desai, Xu Chen, Biyi Fang, Zhuoyi Huang, Zhe Li, Chong-Xin Gan, Xiao Tu, Man-Wai Mak, Yan Lu, Shujie Liu:

TrInk: Ink Generation with Transformer Network. 4857-4864 - Xiaoyi Bao, Zhongqing Wang, Jinghang Gu, Chu-Ren Huang:

CalligraphicOCR for Chinese Calligraphy Recognition. 4865-4877 - Cheng Wang, Gelei Deng, Xianglin Yang, Han Qiu, Tianwei Zhang:

When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models. 4878-4888 - Pingyi Hu, Xiaofan Bai, Xiaojing Ma, Chaoxiang He, Dongmei Zhang, Bin Benjamin Zhu:

RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models. 4889-4903 - Zhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang:

Model-based Large Language Model Customization as Service. 4904-4921 - Haochen Sun, Shuwen Zhang, Lujie Niu, Lei Ren, Hao Xu, Hao Fu, Fangkun Zhao, Caixia Yuan, Xiaojie Wang:

Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents. 4922-4951 - Yao Chen, Jiawei Sheng, Wenyuan Zhang, Tingwen Liu:

Improving Reasoning Capabilities in Small Models through Mixture-of-layers Distillation with Stepwise Attention on Key Information. 4952-4971 - Renjie Luo, Jiaxi Li, Chen Huang, Wei Lu:

Through the Valley: Path to Effective Long CoT Training for Small Language Models. 4972-4992 - Jiahui Li, Lin Li, Tai-Wei Chang, Kun Kuang, Long Chen, Jun Zhou, Cheng Yang:

RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution. 4993-5022 - Peng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming Wang, Jiajun Chen, Shujian Huang:

SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models. 5023-5037 - Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang:

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles. 5038-5076 - Zekun Moore Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jessie Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang:

MIO: A Foundation Model on Multimodal Tokens. 5077-5099 - Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian:

DART: Distilling Autoregressive Reasoning to Silent Thought. 5100-5108 - Qi Zhang, Shouqing Yang, Lirong Gao, Hao Chen, Xiaomeng Hu, Jinglei Chen, Jiexiang Wang, Sheng Guo, Bo Zheng, Haobo Wang, Junbo Zhao:

LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization. 5109-5122 - Zhanming Shen, Hao Chen, Yulei Tang, Shaolin Zhu, Wentao Ye, Xiaomeng Hu, Haobo Wang, Gang Chen, Junbo Zhao:

CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency. 5123-5137 - Grace LeFevre, Qingcheng Zeng, Adam Leif, Jason Jewell, Denis Peskoff, Rob Voigt:

Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where? 5138-5150 - Zhihan Guo, Jiele Wu, Wenqian Cui, Yifei Zhang, Minda Hu, Yufei Wang, Irwin King:

From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models. 5151-5166 - Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang:

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model. 5167-5186 - Bajian Xiang, Shuaijiang Zhao, Tingwei Guo, Wei Zou:

Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models. 5187-5202 - Yifan Liu, Wenkuan Zhao, Shanshan Zhong, Jinghui Qin, Mingfu Liang, Zhongzhan Huang, Wushao Wen:

AssoCiAm: A Benchmark for Evaluating Association Thinking while Circumventing Ambiguity. 5203-5219 - Zexuan Li, Hongliang Dai, Piji Li:

M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models. 5220-5238 - Sangyeon Yoon, Wonje Jeung, Albert No:

R-TOFU: Unlearning in Large Reasoning Models. 5239-5258 - Zequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, Tao Jin:

Chat-Driven Text Generation and Interaction for Person Retrieval. 5259-5270 - Yuxuan Li, Hirokazu Shirado:

Spontaneous Giving and Calculated Greed in Language Models. 5271-5286 - Lei Jiang, Desheng Wu, Xiaolong Zheng:

SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion. 5287-5302 - Mo Zhiqiang, Yang Hua, Jiahui Li, Yuan Liu, Shawn Wong, Jianmin Huang:

Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language Models. 5303-5320 - Zhuo Li, Yuhao Du, Xiaoqi Jiao, Steven Y. Guo, Yuege Feng, Xiang Wan, Anningzhe Gao, Jinpeng Hu:

Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm. 5321-5340 - Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang:

QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models. 5341-5359 - Yingfa Chen, Yutong Wu, Chenyang Song, Zhen Leng Thai, Xingyu Shen, Xu Han, Zhiyuan Liu, Maosong Sun:

Cost-Optimal Grouped-Query Attention for Long-Context Modeling. 5360-5376 - Zhongyi Zhou, Yichen Zhu, Minjie Zhu, Junjie Wen, Ning Liu, Zhiyuan Xu, Weibin Meng, Yaxin Peng, Chaomin Shen, Feifei Feng, Yi Xu:

ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model. 5377-5395 - Ziyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Yuzhi Zhao, Mengyang Wu, Jinpeng Chen, Thanh-Toan Nguyen, Pengfei Xian, Wenao Ma, Shengchao Qin, Graziano Chesi, Ngai Wong:

KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation. 5396-5405 - Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng:

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling. 5406-5419 - Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou:

Search-o1: Agentic Search-Enhanced Large Reasoning Models. 5420-5438 - Shenghan Wu, Yimo Zhu, Wynne Hsu, Mong-Li Lee, Yang Deng:

From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations. 5439-5453 - Shuodi Liu, Yingzhuo Liu, Zi Wang, Yusheng Wang, Huijia Wu, Liuyu Xiang, Zhaofeng He:

Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models. 5454-5477 - Junchen Ding, Jiahao Zhang, Yi Liu, Ziqi Ding, Gelei Deng, Yuekang Li:

TombRaider: Entering the Vault of History to Jailbreak Large Language Models. 5478-5493 - Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang:

Text Meets Topology: Rethinking Out-of-distribution Detection in Text-Rich Networks. 5494-5523 - Zhuo Li, Yuege Feng, Dandan Guo, Jinpeng Hu, Anningzhe Gao, Xiang Wan:

APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport. 5524-5538 - Feng Xiong, Hongling Xu, Yifei Wang, Runxi Cheng, Yong Wang, Xiangxiang Chu:

HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation. 5539-5555 - Wonje Jeung, Sangyeon Yoon, Albert No:

SEPS: A Separability Measure for Robust Unlearning in LLMs. 5556-5587 - Zehong Yan, Peng Qi, Wynne Hsu, Mong-Li Lee:

TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection. 5588-5604 - Justin Xu, Yiming Li, Zizheng Zhang, Augustine Yui Hei Luk, Mayank Jobanputra, Samarth Oza, Ashley Murray, Meghana Reddy Kasula, Andrew Parker, David W. Eyre:

Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning. 5605-5622 - Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen:

UnitCoder: Scalable Code Synthesis from Pre-training Corpora. 5623-5641 - Jixiao Zhang, Chunsheng Zuo:

GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models. 5642-5654 - Peichao Lai, Jiaxin Gan, Feiyang Ye, Wentao Zhang, Fangcheng Fu, Yilei Wang, Bin Cui:

Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations. 5655-5674 - Congchi Yin, Qian Yu, Zhiwei Fang, Changping Peng, Piji Li:

Rethinking Cross-Subject Data Splitting for Brain-to-Text Decoding. 5675-5689 - Dongjun Jang, Youngchae Ahn, Hyopil Shin:

RCScore: Quantifying Response Consistency in Large Language Models. 5690-5708 - Hui Li, Ante Wang, Kunquan Li, Zhihao Wang, Liang Zhang, Delai Qiu, Qingsong Liu, Jinsong Su:

A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation Detection. 5709-5725 - Shuting Wang, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen:

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain. 5726-5751 - Xiaopeng Ke, Hexuan Deng, Xuebo Liu, Jun Rao, Zhenxi Song, Jun Yu, Min Zhang:

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs. 5752-5785 - Junxi Wu, Jinpeng Wang, Zheng Liu, Bin Chen, Dongjian Hu, Hao Wu, Shu-Tao Xia:

MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds. 5786-5805 - Lin Lu, Zhigang Zuo, Ziji Sheng, Pan Zhou:

Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging. 5806-5825 - Xi Chen, Shuo Wang:

Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language. 5826-5841 - Wang Cai, Hsiu-Yuan Huang, Zhixiang Wang, Yunfang Wu:

Beyond Demonstrations: Dynamic Vector Construction from Latent Representations. 5842-5857 - Ying Zhao, Yuanzhao Guo, Xuemeng Weng, Yuan Tian, Wei Wang, Yi Chang:

Detoxifying Large Language Models via the Diversity of Toxic Samples. 5858-5871 - Yanxu Ji, Jinzhong Ning, Yi-Jia Zhang, Zhi Liu, Hongfei Lin:

LLM-Driven Implicit Target Augmentation and Fine-Grained Contextual Modeling for Zero-Shot and Few-Shot Stance Detection. 5872-5884 - Mengze Hong, Wailing Ng, Chen Jason Zhang, Yuanfeng Song, Di Jiang:

Dial-In LLM: Human-Aligned LLM-in-the-loop Intent Clustering for Customer Service Dialogues. 5885-5900 - Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Leyan Pan, Soroush Vosoughi, Wenke Lee:

Superficial Self-Improved Reasoners Benefit from Model Merging. 5901-5921 - Wenqiao Zhu, Ji Liu, Rongjunchen Zhang, Haipang Wu, Yulun Zhang:

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning. 5922-5937 - Mengze Hong, Wailing Ng, Chen Jason Zhang, Di Jiang:

QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation. 5938-5953 - Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji:

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models. 5954-5983 - Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Basura Fernando, Jun Liu:

Diagram-Driven Course Questions Generation. 5984-5999 - Yuanyuan He, Yongsen Pan, Wei Li, Jiali You, Jiawen Deng, Fuji Ren:

ECC: An Emotion-Cause Conversation Dataset for Empathy Response. 6000-6017 - Zijian Wang, Chang Xu:

ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations. 6018-6039 - Jinwang Song, Hongying Zan, Kunli Zhang, Lingling Mu, Yingjie Han, Haobo Hua, Min Peng:

JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling. 6040-6053 - Zhibo Man, Yuanmeng Chen, Yujie Zhang, Jinan Xu:

DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation. 6054-6071 - David Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan:

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature. 6072-6109 - Xinkui Lin, Yuhui Zhang, Yongxiu Xu, Kun Huang, Hongzhang Mu, Yubin Wang, Gaopeng Gou, Li Qian, Li Peng, Wei Liu, Jian Luan, Hongbo Xu:

MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition. 6110-6130 - Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He:

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models. 6131-6144 - Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura:

Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors. 6145-6163 - Shuo Yan, Ruochen Li, Ziming Luo, Zimu Wang, Daoyang Li, Liqiang Jing, Kaiyu He, Peilin Wu, Juntong Ni, George Michalopoulos, Yue Zhang, Ziyang Zhang, Mian Zhang, Zhiyu Chen, Xinya Du:

LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research. 6164-6186 - Jinlin Wang, Yulong Ji, Hongyu Yang:

RAV: Retrieval-Augmented Voting for Tactile Descriptions Without Training. 6187-6194 - Takashi Wada, Yuki Hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito:

Static Word Embeddings for Sentence Semantic Representation. 6195-6211 - Jingjin Wang, Jiawei Han:

PropRAG: Guiding Retrieval with Beam Search over Proposition Paths. 6212-6227 - Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia:

Rethinking Backdoor Detection Evaluation for Language Models. 6228-6239 - Pingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng, Yi-Lin Sung, Mohit Bansal, Tianlong Chen:

Glider: Global and Local Instruction-Driven Expert Router. 6240-6301 - Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu:

CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language Models. 6302-6314 - Jinman Zhao, Xueyan Zhang, Jiaru Li, Jingcheng Niu, Yulan Hu, Erxue Min, Gerald Penn:

Tiny Budgets, Big Gains: Parameter Placement Strategy in Parameter Super-Efficient Fine-Tuning. 6315-6333 - Junkai Liu, Yujie Tong, Hui Huang, Bowen Zheng, Yiran Hu, Peicheng Wu, Chuan Xiao, Makoto Onizuka, Muyun Yang, Shuyuan Zheng:

Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction. 6334-6349 - Xu Zhang, Xunjian Yin, Dinghao Jing, Huixuan Zhang, Xinyu Hu, Xiaojun Wan:

DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language Models. 6350-6366 - Qihan Wang, Shidong Pan, Tal Linzen, Emily Black:

Multilingual Prompting for Improving LLM Generation Diversity. 6367-6389 - Genglin Liu, Vivian T. Le, Salman Rahman, Elisa Kreiss, Marzyeh Ghassemi, Saadia Gabriel:

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations. 6390-6417 - Wenzhi Wang, Paul Reisert, Shoichi Naito, Naoya Inoue, Machi Shimmei, Surawat Pothong, Jungmin Choi, Kentaro Inui:

Identification of Multiple Logical Interpretations in Counter-Arguments. 6418-6433 - Peng Wang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu:

LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing. 6434-6459 - Mengyu Bu, Shaolei Zhang, Zhongjun He, Hua Wu, Yang Feng:

AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment. 6460-6489 - Gangwei Jiang, Yahui Liu, Zhaoyi Li, Wei Bi, Fuzheng Zhang, Linqi Song, Ying Wei, Defu Lian:

What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning. 6490-6514 - Yiding Wang, Fanxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang:

HD-PiSSA: High-Rank Distributed Orthogonal Adaptation. 6515-6528 - Runyu Peng, Yunhua Zhou, Kai Lv, Yang Gao, Qipeng Guo, Xipeng Qiu:

Firewall Routing: Blocking Leads to Better Hybrid Inference for LLMs. 6529-6554 - Chengyu Jiao, Shuhao Chen, Yu Zhang:

SPE Attention: Making Attention Equivariant to Semantic-Preserving Permutation for Code Processing. 6555-6568 - Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun Ma, Chao Zhang:

Audio-centric Video Understanding Benchmark without Text Shortcut. 6569-6587 - Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang:

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text. 6588-6601 - Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin:

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration. 6602-6618 - Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, Qianchun Lu:

Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation. 6619-6633 - Keer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Bin Cui, Tengjiao Wang, Wentao Zhang:

VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs. 6634-6658 - Hengxing Cai, Jinhan Dong, Jingjun Tan, Jingcheng Deng, Sihang Li, Zhifeng Gao, Haidong Wang, Zicheng Su, Agachai Sumalee, Renxin Zhong:

FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models. 6659-6676 - Haoran Chen, Junyan Lin, Xinghao Chen, Yue Fan, Jianfeng Dong, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen:

Multimodal Language Models See Better When They Look Shallower. 6677-6695 - Xujia Wang, Yunjia Qi, Bin Xu:

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization. 6696-6715 - Tianle Gu, Zongqi Wang, Kexin Huang, Yuanqi Yao, Xiangliang Zhang, Yujiu Yang, Xiuying Chen:

Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking. 6716-6733 - Bufan Gao, Elisa Kreiss:

Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases. 6734-6750 - Jikai Wang, Zhenxu Tian, Juntao Li, Qingrong Xia, Xinyu Duan, Zhe-Feng Wang, Baoxing Huai, Min Zhang:

Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification. 6751-6763 - Haoqin Tu, Weitao Feng, Hardy Chen, Hui Liu, Xianfeng Tang, Cihang Xie:

ViLBench: A Suite for Vision-Language Process Reward Modeling. 6764-6779 - Hwan Chang, Yumin Kim, Yonghyun Jun, Hwanhee Lee:

Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering. 6780-6800 - Wei Shi, Sihang Li, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang, Xiangnan He:

Route Sparse Autoencoder to Interpret Large Language Models. 6801-6815 - Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Nicolaus Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis:

BTS: Harmonizing Specialized Experts into a Generalist LLM. 6816-6834 - Anant Khandelwal, Manish Gupta, Puneet Agrawal:

CoCoA: Confidence- and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language Models. 6835-6855 - Huixuan Zhang, Xiaojun Wan:

R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion Models. 6856-6870 - Zinan Tang, Xin Gao, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu:

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning. 6871-6891 - Wei Liu, Nai Ding:

Information Integration in Large Language Models is Gated by Linguistic Structural Markers. 6892-6904 - Chengfeng Zhao, Shizhu He, Shanshan Jiang, Bin Dong, Jun Zhao, Kang Liu:

Why and How LLMs Benefit from Knowledge Introspection in Commonsense Reasoning. 6905-6920 - Jing He, Mingyang Lv, Qing Shi, Gong Cheng:

GraDaSE: Graph-Based Dataset Search with Examples. 6921-6932 - Youwon Jang, Woo Suk Choi, Minjoon Jung, Min Su Lee, Byoung-Tak Zhang:

Confidence-guided Refinement Reasoning for Zero-shot Question Answering. 6933-6950 - Yiqi Li, Yusheng Liao, Zhe Chen, Yanfeng Wang, Yu Wang:

DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction. 6951-6966 - Zhenhua Xu, Xixiang Zhao, Xubin Yue, Shengwei Tian, Changting Lin, Meng Han:

CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor. 6967-6989 - Yikuan Xia, Jiazun Chen, Sujian Li, Jun Gao:

Realistic Training Data Generation and Rule Enhanced Decoding in LLM for NameGuess. 6990-7007 - Zhenhua Xu, Meng Han, Wenpeng Xing:

EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint. 7008-7031 - Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou:

Selective Preference Optimization via Token-Level Reward Function Estimation. 7032-7056 - Seonil Son, Ju-Min Oh, Heegon Jin, Cheolhun Jang, Jeongbeom Jeong, Kuntae Kim:

Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct Comparisons. 7057-7075 - Ruiyi Yan, Yugo Murawaki:

Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models. 7076-7098 - Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang:

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation. 7099-7125 - Junnan Zhu, Jingyi Wang, Bohan Yu, Xiaoyu Wu, Junbo Li, Lei Wang, Nan Xu:

TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering. 7126-7146 - Jinyang Zhang, Kexin Yang, Yu Wan, Muyang Ye, Baosong Yang, Fei Huang, Junyang Lin, Dayiheng Liu:

NOVA-63: Native Omni-lingual Versatile Assessments of 63 Disciplines. 7147-7189 - Zihan Wang, Zihan Liang, Zhou Shao, Yufei Ma, Huangyu Dai, Ben Chen, Lingtao Mao, Chenyi Lei, Yuqing Ding, Han Li:

InfoGain-RAG: Boosting Retrieval-Augmented Generation through Document Information Gain-based Reranking and Filtering. 7190-7204 - Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li:

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning. 7205-7219 - Muhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya, Fajri Koto:

What Do Indonesians Really Need from Language Technology? A Nationwide Survey. 7220-7245 - Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki:

LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts. 7246-7261 - Wessel Poelman, Thomas Bauwens, Miryam de Lhoneux:

Confounding Factors in Relating Model Performance to Morphology. 7262-7287 - Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri:

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models. 7288-7310 - Gustave Cortal, Alain Finkel:

Formalizing Style in Personal Narratives. 7311-7326 - Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi:

TopicAttack: An Indirect Prompt Injection Attack via Topic Transition. 7327-7345 - Gianluca Sperduti, Dong Nguyen:

PSET: a Phonetics-Semantics Evaluation Testbed. 7346-7356 - Yingli Shen, Wen Lai, Shuo Wang, Ge Gao, Kangyang Luo, Alexander Fraser, Maosong Sun:

From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora. 7357-7379 - Shuzheng Si, Haozhe Zhao, Gang Chen, Yunshui Li, Kangyang Luo, Chuancheng Lv, Kaikai An, Fanchao Qi, Baobao Chang, Maosong Sun:

GATEAU: Selecting Influential Samples for Long Context Alignment. 7380-7411 - Wangyi Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun:

Teach Small Models to Reason by Curriculum Distillation. 7412-7422 - Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang:

Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment. 7423-7438 - Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Yulan He:

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning. 7439-7458 - Lena Sophia Bolliger, Lena Ann Jäger:

Genre Matters: How Text Types Interact with Decoding Strategies and Lexical Predictors in Shaping Reading Behavior. 7459-7476 - Aziguli Wulamu, Kaiyuan Gong, Lyu Zhengyu, Yu Han, Zhihong Zhu, Bowen Xing:

RTE-GMoE: A Model-agnostic Approach for Relation Triplet Extraction via Graph-based Mixture-of-Expert Mutual Learning. 7477-7488 - Kyeongman Park, Nakyeong Yang, Kyomin Jung:

Avoidance Decoding for Diverse Multi-Branch Story Generation. 7489-7505 - Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong:

Probabilistic Soundness Guarantees in LLM Reasoning Chains. 7506-7525 - Heng-Da Xu, Xian-Ling Mao, Fanshu Sun, Tian-Yi Che, Cheng-Xin Xin, Heyan Huang:

SQLWOZ: A Realistic Task-Oriented Dialogue Dataset with SQL-Based Dialogue State Representation for Complex User Requirements. 7526-7551 - Yuxin Gou, Xiaoning Dong, Qin Li, Shishen Gu, Richang Hong, Wenbo Hu:

SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language Models. 7552-7593 - Minh-Phuc Truong, Hai An Vu, Tu Vu, Nguyen Thi Ngoc Diep, Linh Van Ngo, Thien Huu Nguyen, Trung Le:

EMO: Embedding Model Distillation via Intra-Model Relation and Optimal Transport Alignments. 7594-7606 - Kun Li, Lai Man Po, Hongzheng Yang, Xuyuan Xu, Kangcheng Liu, Yuzhi Zhao:

AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic Assessment. 7607-7620 - Anum Afzal, Florian Matthes, Alexander R. Fabbri:

DA-Pred: Performance Prediction for Text Summarization under Domain-Shift and Instruct-Tuning. 7621-7632 - Jielong Tang, Yang Yang, Jianxing Yu, Zhen-Xing Wang, Haoyuan Liang, Liang Yao, Jian Yin:

UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER. 7633-7651 - Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Yizhen Yuan, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu:

An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint. 7652-7671 - Songze Li, Zhiqiang Liu, Zhengke Gui, Huajun Chen, Wen Zhang:

Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching. 7672-7692 - Yuanjun Feng, Vivek Choudhary, Yash Raj Shrestha:

Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making. 7693-7706 - Johannes Moll, Louisa Fay, Asfandyar Azhar, Sophie Ostmeier, Sergios Gatidis, Tim C. Lueth, Curtis Langlotz, Jean-Benoit Delbrouck:

Structuring Radiology Reports: Challenging LLMs with Lightweight Models. 7707-7724 - Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen:

PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks. 7725-7734 - Yuebin Xu, Zhiyi Chen, Zeyi Wen:

EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference. 7735-7745 - Xia Du, Shuhan Sun, Pengyuan Liu, Dong Yu:

Investigating Value-Reasoning Reliability in Small Large Language Models. 7746-7786 - Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar:

Can LLMs Explain Themselves Counterfactually? 7787-7815 - Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Zhenguo Li, Yu Li:

Self-Adjust Softmax. 7816-7836 - Shaoqing Lin, Chong Teng, Fei Li, Donghong Ji, Lizhen Qu, Zhuang Li:

DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement. 7837-7862 - Ernesto Luis Estevanell-Valladares, Suilan Estevez-Velarde, Yoan Gutiérrez, Andrés Montoyo, Ruslan Mitkov:

XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML. 7863-7880 - Roman Vashurin, Maiya Goloburda, Preslav Nakov, Maxim Panov:

UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models. 7881-7908 - Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li:

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning. 7909-7928 - Tobias Domhan, Dawei Zhu:

Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models. 7929-7947 - Petros Raptopoulos, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou:

PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements. 7948-7984 - Xu Sun, Lionel Delphin-Poulat, Christèle Tarnec, Anastasia Shimorina:

PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization. 7985-8009 - Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Guanbo Wang, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang:

ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning. 8010-8029 - Hao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha:

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment. 8030-8050 - Yuxia Gong, Shuguo Hu, Huaiwen Zhang:

Cross-domain Rumor Detection via Test-Time Adaptation and Large Language Models. 8051-8066 - Chun Hu, Junhui He, Shangyu Wu, Yuxin He, Chun Jason Xue, Qingan Li:

MLWQ: Efficient Small Language Model Deployment via Multi-Level Weight Quantization. 8067-8077 - Seongryong Jung, Suwan Yoon, DongGeon Kim, Hwanhee Lee:

ToDi: Token-wise Distillation via Fine-Grained Divergence Control. 8078-8091 - Qingyao Li, Wei Xia, Xinyi Dai, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang:

RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation. 8092-8110 - Yucheng Sun, Alessandro Stolfo, Mrinmaya Sachan:

Probing for Arithmetic Errors in Language Models. 8111-8128 - Minda Hu, Qiyuan Zhang, Yufei Wang, Bowei He, Hongru Wang, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma, Irwin King:

NILE: Internal Consistency Alignment in Large Language Models. 8129-8147 - Rong Ma, Lei Wang, Yating Yang, Bo Ma, Rui Dong, Fengyi Yang, Ahtamjan Ahmat, Kaiwen Lu, Xinyue Wang:

Mining the Past with Dual Criteria: Integrating Three types of Historical Information for Context-aware Event Forecasting. 8148-8163 - Andrei Catalin Coman, Ionut-Teodor Sorodoc, Leonardo F. R. Ribeiro, Bill Byrne, James Henderson, Adrià de Gispert:

RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation. 8164-8211 - Minh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, Katharina von der Wense:

Large Language Models Discriminate Against Speakers of German Dialects. 8212-8240 - Yini Wang, Xian Zhou, Shengan Zheng, Linpeng Huang, Zhunchen Luo, Wei Luo, Xiaoying Bai:

Uncovering Argumentative Flow: A Question-Focus Discourse Structuring Framework. 8241-8259 - Tarun Tater, Diego Frassinelli, Sabine Schulte im Walde:

AbsVis - Benchmarking How Humans and Vision-Language Models "See" Abstract Concepts in Images. 8260-8281 - Tatiana Anikina, Ján Cegin, Jakub Simko, Simon Ostermann:

A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages. 8282-8303 - Houxing Ren, Zimu Lu, Weikang Shi, Haotian Hou, Yunqiao Yang, Ke Wang, Aojun Zhou, Junting Pan, Mingjie Zhan, Hongsheng Li:

Alignment with Fill-In-the-Middle for Enhancing Code Generation. 8304-8320 - Hanbo Huang, Yihan Li, Bowen Jiang, Bo Jiang, Lin Liu, Zhuotao Liu, Ruoyu Sun, Shiyu Liang:

A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality. 8321-8359 - Jonghyun Hong, Sungyoon Lee:

Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers. 8360-8378 - Min Hyuk Kim, Changheon Kim, Seok Bong Yoo:

X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA. 8379-8397 - Ahmet Yavuz Uluslu, Tannon Kew, Tilia Ellendorff, Gerold Schneider, Rico Sennrich:

Robust Native Language Identification through Agentic Decomposition. 8398-8414 - Jiawei Chen, Xinyan Guan, Qianhao Yuan, Guozhao Mo, Weixiang Zhou, Yaojie Lu, Hongyu Lin, Ben He, Le Sun, Xianpei Han:

ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch. 8415-8441 - Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun:

Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study. 8442-8456 - Nisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy:

When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity. 8457-8480 - Yingming Wang, Pepa Atanasova:

Self-Critique and Refinement for Faithful Natural Language Explanations. 8481-8507 - Arghodeep Nandi, Megha Sundriyal, Euna Mehnaz Khan, Jikai Sun, Emily K. Vraga, Jaideep Srivastava, Tanmoy Chakraborty:

The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection. 8508-8525 - Xinhao Huang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen:

SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval. 8526-8536 - Yu Zhang, Dong Guo, Fang Wu, Guoliang Zhu, Dian Ding, Yiming Zhang:

AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity. 8537-8549 - Michael Sejr Schlichtkrull:

Attacks by Content: Automated Fact-checking is an AI Security Issue. 8550-8565 - Yuezhang Peng, Yuxin Liu, Fei Wen, Xie Chen:

MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language Models. 8566-8584 - Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang:

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors. 8585-8602 - Sergey Pletenev, Maria Marina, Nikolay Ivanov, Daria Galimzianova, Nikita Krayko, Mikhail Salnikov, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii:

Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA. 8603-8620 - Alina Klerings, Jannik Brinkmann, Daniel Ruffinelli, Simone Paolo Ponzetto:

Steering Language Models in Multi-Token Generation: A Case Study on Tense and Aspect. 8621-8639 - Navve Wasserman, Oliver Heinimann, Yuval Golbari, Tal Zimbalist, Eli Schwartz, Michal Irani:

DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers. 8640-8658 - Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank:

Reason to Rote: Rethinking Memorization in Reasoning. 8659-8679 - Kazuki Matsuda, Yuiga Wada, Shinnosuke Hirano, Seitaro Otsuki, Komei Sugiura:

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions. 8680-8696 - Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii:

LLM-Independent Adaptive RAG: Let the Question Speak for Itself. 8697-8709 - Hongyi Luo, Qing Cheng, Daniel Matos, Hari Krishna Gadi, Yanfeng Zhang, Lu Liu, Yongliang Wang, Niclas Zeller, Daniel Cremers, Liqiu Meng:

TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route. 8710-8729 - Yuqicheng Zhu, Jingcheng Wu, Yizhen Wang, Hongkuan Zhou, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab:

Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees. 8730-8752 - Shengxiang Gao, Jey Han Lau, Jianzhong Qi:

Beyond Seen Data: Improving KBQA Generalization Through Schema-Guided Logical Form Generation. 8753-8772 - Yan Li, Tianyi Zhang, Zechuan Li, Caren Han:

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation. 8773-8793 - Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Justin Li, Jian Gao, Zhang Li, Hao Yang, Boxing Chen, Osamu Yoshie:

Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance. 8794-8811 - Dong Nguyen, Esther Ploeger:

We Need to Measure Data Diversity in NLP - Better and Broader. 8812-8821 - Lei Yu, Jingcheng Niu, Zining Zhu, Xi Chen, Gerald Penn:

Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity. 8822-8837 - Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares:

Hierarchical Bracketing Encodings Work for Dependency Graphs. 8838-8851 - Zhenqi Jia, Rui Liu, Berrak Sisman, Haizhou Li:

Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis. 8852-8858 - Mehdi Ali, Manuel Brack, Max Lübbering, Elias Wendt, Abbas Goher Khan, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, Felix Stollenwerk, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Patrick Schramowski, Michael Fromm, Kristian Kersting:

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models. 8859-8898 - Hyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin Jung:

Conditional [MASK] Discrete Diffusion Language Model. 8899-8923 - Yogesh Kumar:

Language-Guided Temporal Token Pruning for Efficient VideoLLM Processing. 8924-8931 - Anda Cheng, Wei Huang, Yinggui Wang:

A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization. 8932-8943 - Xinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Tong Xiao, JingBo Zhu:

IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method. 8944-8958 - Tianqing Fang, Hongming Zhang, Zhisong Zhang, Kaixin Ma, Wenhao Yu, Haitao Mi, Dong Yu:

WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World Model. 8959-8975 - Stephen Meisenbacher, Maulik Chevli, Florian Matthes:

Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees. 8976-8992 - Yiheng Jing, Mingming Zhang, Yong Zhuang, Jiacheng Guo, Juan Wang, Xiaoyang Xu, Wenzhe Yi, Keyan Guo, Hongxin Hu:

HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection. 8993-9006 - Yijiong Yu, Wei Wang, Ran Chen, Ji Pei:

Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence. 9007-9014 - Wenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu:

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design. 9015-9039 - Hongyao Tu, Liang Zhang, Yujie Lin, Xin Lin, Haibo Zhang, Long Zhang, Jinsong Su:

LLM-OREF: An Open Relation Extraction Framework Based on Large Language Models. 9040-9052 - Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu:

Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization. 9053-9063 - Leonardo Ranaldi, Federico Ranaldi, Fabio Massimo Zanzotto, Barry Haddow, Alexandra Birch:

Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations. 9064-9085 - Jiajun Chen, Yik-Cheung Tam:

Predicate-Guided Generation for Mathematical Reasoning. 9086-9099 - Raphael Gruber, Abdelrahman Abdallah, Michael Färber, Adam Jatowt:

ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering. 9100-9112 - Qiuchen Wang, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, Feng Zhao:

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents. 9113-9134 - Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, Fajri Koto:

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages. 9135-9166 - Harsh Vishwakarma, Ankush Agarwal, Ojas Patil, Chaitanya Devaguptapu, Mahesh Chandran:

Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments. 9167-9201 - Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov:

Steering LLM Reasoning Through Bias-Only Adaptation. 9202-9211 - Zuojin Tang, Bin Hu, Chenyang Zhao, De Ma, Gang Pan, Bin Liu:

VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making. 9212-9232 - Yew Ken Chia, Liying Cheng, Hou Pong Chan, Maojia Song, Chaoqun Liu, Mahani Aljunied, Soujanya Poria, Lidong Bing:

M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework. 9233-9250 - Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang:

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models. 9251-9270 - Youquan Li, Miao Zheng, Fan Yang, Guosheng Dong, Bin Cui, Weipeng Chen, Zenan Zhou, Wentao Zhang:

FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback. 9271-9291 - Fabian Karl, Ansgar Scherp:

HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text Classification. 9292-9303 - Pengyu Zeng, Jun Yin, Miao Zhang, Yuqin Dai, Jizhizi Li, ZhanXiang Jin, Shuai Lu:

CARD: Cross-modal Agent Framework for Generative and Editable Residential Design. 9304-9319 - Jusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang, Chengpei Tang, Keze Wang:

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off. 9320-9340 - Thibaut Thonet, Germán Kruszewski, Jos Rozen, Pierre Erbacher, Marc Dymetman:

FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data. 9341-9370 - Brian S. Lin, Jiaxin Yuan, Zihan Zhou, Shouli Wang, Shuo Wang, Cunliang Kong, Qi Shi, Yuxuan Li, Liner Yang, Zhiyuan Liu, Maosong Sun:

On LLM-Based Scientific Inductive Reasoning Beyond Equations. 9371-9394 - Xiaofu Chen, Israfel Salazar, Yova Kementchedjhieva:

SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation. 9395-9407 - Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zheng, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, Hongsheng Li:

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding. 9408-9421 - Anmol Mekala, Anirudh Atmakuru, Yixiao Song, Marzena Karpinska, Mohit Iyyer:

Does quantization affect models' performance on long-context tasks? 9422-9470 - Tianbo Wang, Yuqing Ma, Kewei Liao, Chengzhao Yang, Zhange Zhang, Jiakai Wang, Xianglong Liu:

Token-Aware Editing of Internal Activations for Large Language Model Alignment. 9471-9509 - Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki M. Asano:

Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs. 9510-9536 - Md. Mehrab Tanjim, Yeonjun In, Xiang Chen, Victor S. Bursztyn, Ryan A. Rossi, Sungchul Kim, Guang-Jie Ren, Vaishnavi Muppala, Shun Jiang, Yongsung Kim, Chanyoung Park:

Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey. 9537-9550 - Xueguan Zhao, Wenpeng Lu, Chaoqun Zheng, Weiyu Zhang, Jiasheng Si, Deyu Zhou:

Plan Dynamically, Express Rhetorically: A Debate-Driven Rhetorical Framework for Argumentative Writing. 9551-9573 - Kechen Jiao, Zhirui Fang, Jiahao Liu, Bei Li, Qifan Wang, Xinyu Liu, Junhao Ruan, Zhongjian Qiao, Yifan Zhu, Yaxin Xu, Jingang Wang, Xiu Li:

TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making. 9574-9588 - Yifan Xia, Guorui Chen, Wenqian Yu, Zhijiang Li, Philip Torr, Jindong Gu:

Reimagining Safety Alignment with An Image. 9589-9603 - Siva Rajesh Kasa, Karan Gupta, Sumegh Roychowdhury, Ashutosh Kumar, Yaswanth Biruduraju, Santhosh Kumar Kasa, Nikhil Priyatam Pattisapu, Arindam Bhattacharya, Shailendra Agarwal, Vijay Huddar:

Generative or Discriminative? Revisiting Text Classification in the Era of Transformers. 9604-9626 - Ziqi Miao, Yi Ding, Lijun Li, Jing Shao:

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection. 9627-9644 - Alessio Cocchieri, Luca Ragazzi, Giuseppe Tagliavini, Lorenzo Tordi, Antonella Carbonaro, Gianluca Moro:

Can Large Language Models Win the International Mathematical Games? 9645-9671 - Jian Yang, Jiaxi Yang, Wei Zhang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Zhoujun Li, Binyuan Hui, Junyang Lin:

CodeArena: Evaluating and Aligning CodeLLMs on Human Preference. 9672-9683 - Yuekun Yao, Yupei Du, Dawei Zhu, Michael Hahn, Alexander Koller:

Language models can learn implicit multi-hop reasoning, but only if they have lots of training data. 9684-9702 - Joseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Muñoz Sánchez, Lingyun Gao, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R. Jablonkai, Ekaterina Kochmar, Robert Reynolds, Eugénio Ribeiro, Horacio Saggion, Elena Volodina, Sowmya Vajjala, Thomas François, Fernando Alva-Manchego, Harish Tayyar Madabushi:

UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment. 9703-9755 - Jiawei Guo, Feifei Zhai, Pu Jian, Qianrun Wei, Yu Zhou:

CROP: Contextual Region-Oriented Visual Token Pruning. 9756-9772 - Andrew Piper, Robert Budac:

CR4-NarrEmote: An Open Vocabulary Dataset of Narrative Emotions Derived Using Citizen Science. 9773-9784 - Haoqi Yang, Yao Yao, Zuchao Li, Baoyuan Qi, Guoming Liu, Hai Zhao:

XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression. 9785-9800 - Yueyang Cang, Yuhang Liu, Xiaoteng Zhang, Erlu Zhao, Li Shi:

DINT Transformer. 9801-9809 - Zhiyu Cao, Peifeng Li, Qiaoming Zhu:

ICR: Iterative Clarification and Rewriting for Conversational Search. 9810-9824 - Tong Zhang, Kuofeng Gao, Jiawang Bai, Leo Yu Zhang, Xin Yin, Zonghui Wang, Shouling Ji, Wenzhi Chen:

Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment. 9825-9838 - Weicong Qin, Yi Xu, Weijie Yu, Teng Shi, Chenglei Shen, Ming He, Jianping Fan, Xiao Zhang, Jun Xu:

Similarity = Value? Consultation Value-Assessment and Alignment for Personalized Search. 9839-9852 - Zhaoyan Gong, Juan Li, Zhiqiang Liu, Lei Liang, Huajun Chen, Wen Zhang:

RTQA : Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models. 9853-9870 - Yao Wang, Di Liang, Minlong Peng:

Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance. 9871-9885 - Xiaonan Wang, Bo Shao, Hansaem Kim:

AI Knows Where You Are: Exposure, Bias, and Inference in Multimodal Geolocation with KoreaGEO. 9886-9903 - Kairong Han, Wenshuo Zhao, Ziyu Zhao, Ye Jun Jian, Lujia Pan, Kun Kuang:

CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models. 9904-9921 - Zhaoheng Huang, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou:

Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency. 9922-9934 - Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov:

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps. 9935-9960 - Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang:

Stop Looking for "Important Tokens" in Multimodal Language Models: Duplication Matters More. 9961-9980 - Yuchen Deng, Shichen Fan, Naibo Wang, Xinkui Zhao, See-Kiong Ng:

AgentPro: Enhancing LLM Agents with Automated Process Supervision. 9981-10006 - Lorenzo Molfetta, Giacomo Frisoni, Nicolò Monaldini, Gianluca Moro:

PORTS: Preference-Optimized Retrievers for Tool Selection with Large Language Models. 10007-10030 - Xin Song, Haiyan Liu, Haiyang Wang, Ye Wang, Kai Chen, Bin Zhou:

MusKGC: A Flexible Multi-source Knowledge Enhancement Framework for Open-World Knowledge Graph Completion. 10031-10049 - Kai Tang, Rui Wang, Renyu Zhu, Minmin Lin, Xiao Ding, Tangjie Lv, Changjie Fan, Runze Wu, Haobo Wang:

Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications. 10050-10066 - Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari:

Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models. 10067-10079 - Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray:

Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation. 10080-10098 - Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, Chen Zhao:

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain. 10099-10128 - Yangqin Jiang, Xubin Ren, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang:

RecGPT: A Foundation Model for Sequential Recommendation. 10129-10143 - Chih-Kai Yang, Neo S. Ho, Hung-yi Lee:

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey. 10144-10170 - Nikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koriagin, Daniil Gavrilov:

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy. 10171-10179 - Taiming Lu, Philipp Koehn:

Learn and Unlearn: Addressing Misinformation in Multilingual LLMs. 10180-10195 - Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel:

PRISM: Efficient Long-Range Reasoning With Short-Context LLMs. 10196-10218 - Yichen Tang, Weihang Su, Yujia Zhou, Yiqun Liu, Min Zhang, Shaoping Ma, Qingyao Ai:

Augmenting Multi-Agent Communication with State Delta Trajectory. 10219-10240 - Dana Arad, Aaron Mueller, Yonatan Belinkov:

SAEs Are Good for Steering - If You Select the Right Features. 10241-10259 - Kyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, YoungBin Kim:

CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples. 10260-10278 - Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand, Smaranda Muresan, Kathleen McKeown:

Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers. 10279-10292 - Yingming Zheng, Hanqi Li, Kai Yu, Lu Chen:

When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models. 10293-10308 - Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Henok Biadglign Ademtew, Hizkiel Mitiku Alemayehu, Negasi Haile Abadi, Tadesse Destaw Belay, Seid Muhie Yimam:

A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script. 10309-10320 - Syeda Jannatus Saba, Steven Skiena:

Evaluating Language Translation Models by Playing Telephone. 10321-10336 - Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci:

Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs. 10337-10358 - Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp:

SPaRC: A Spatial Pathfinding Reasoning Challenge. 10359-10390 - Yao-Ching Yu, Tsun-Han Chiang, Cheng-Wei Tsai, Chien-Ming Huang, Wen-Kwang Tsao:

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training. 10391-10413 - Yuhang Chen, Zhen Tan, Ajay Kumar Jaiswal, Huaizhi Qu, Xinyu Zhao, Qi Lin, Yu Cheng, Andrew Kwong, Zhichao Cao, Tianlong Chen:

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework. 10414-10424 - Wei Jie Yeo, Ranjan Satapathy, Erik Cambria:

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models. 10425-10447 - Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese Smiley, Kundan Thind, Mohammad M. Ghassemi:

Calibrating LLM Confidence by Probing Perturbed Representation Stability. 10448-10514 - Yuanzhe Shen, Yide Liu, Zisu Huang, Ruicheng Yin, Xiaoqing Zheng, Xuanjing Huang:

SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading. 10515-10529 - Rui Ha, Chaozhuo Li, Rui Pu, Litian Zhang, Xi Zhang, Sen Su:

DSG-MCTS: A Dynamic Strategy-Guided Monte Carlo Tree Search for Diversified Reasoning in Large Language Models. 10530-10544 - Juntae Lee, Jihwan Bang, Seunghan Yang, Simyung Chang:

CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM. 10545-10559 - Zhuo Liu, Ding Yu, Hangfeng He:

On the Role of Model Prior in Real-World Inductive Reasoning. 10560-10583 - Hellina Hailu Nigatu, Nikita Mehandru, Negasi Haile Abadi, Blen Gebremeskel, Ahmed Alaa, Monojit Choudhury:

Viability of Machine Translation for Healthcare in Low-Resourced Languages. 10584-10598 - Yilun Qiu, Tianhao Shi, Xiaoyan Zhao, Fengbin Zhu, Yang Zhang, Fuli Feng:

Latent Inter-User Difference Modeling for LLM Personalization. 10599-10617 - Kangyu Qiao, Shaolei Zhang, Yang Feng:

IG-Pruning: Input-Guided Block Pruning for Large Language Models. 10618-10629 - Momoka Furuhashi, Kouta Nakayama, Takashi Kodama, Saku Sugawara:

Are Checklists Really Useful for Automatic Evaluation of Generative Tasks? 10630-10653 - Kirill Semenov, Rico Sennrich:

Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks. 10654-10672 - Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang, Yiqun Liu:

Knowledge Editing through Chain-of-Thought. 10673-10693 - Qian Dong, Jia Chen, Qingyao Ai, Hongning Wang, Haitao Li, Yi Wu, Yao Hu, Yiqun Liu, Shaoping Ma:

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation. 10694-10705 - Yufei Wang, Adriana Kovashka:

Probing Logical Reasoning of MLLMs in Scientific Diagrams. 10706-10718 - Huishuai Zhang, Bohan Wang, Luoxin Chen:

AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training. 10719-10738 - Feiyang Kang, Newsha Ardalani, Michael Kuchnik, Youssef Emad, Mostafa Elhoushi, Shubhabrata Sengupta, Shang-wen Li, Ramya Raghavendra, Ruoxi Jia, Carole-Jean Wu:

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls. 10739-10758 - Yumeng Shi, Quanyu Long, Wenya Wang:

Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering. 10759-10771 - Zonghai Yao, Michael Sun, Won Seok Jang, Sunjae Kwon, Soie Kwon, Hong Yu:

DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge. 10772-10798 - Monjoy Narayan Choudhury, Junling Wang, Yifan Hou, Mrinmaya Sachan:

Can Vision-Language Models Solve Visual Math Equations? 10799-10808 - Benlu Wang, Iris Xia, Yifan Zhang, Junda Wang, Feiyun Ouyang, Shuo Han, Arman Cohan, Hong Yu, Zonghai Yao:

From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations. 10809-10833 - Yi Sui, Chaozhuo Li, Chen Zhang, Dawei Song, Qiuchi Li:

Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge. 10834-10858 - Ziliang Qiu, Renfen Hu:

Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models. 10859-10872 - Advit Deepak, Megan Mou, Jing Huang, Diyi Yang:

Identifying Unlearned Data in LLMs via Membership Inference Attacks. 10873-10892 - Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du:

Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models. 10893-10913 - KV Aditya Srivatsa, Kaushal Kumar Maurya, Ekaterina Kochmar:

LLMs cannot spot math errors, even when allowed to peek into the solution. 10914-10928 - Haoyu Huang, Chong Chen, Zeang Sheng, Yang Li, Wentao Zhang:

Can LLMs be Good Graph Judge for Knowledge Graph Construction? 10929-10948 - Zhi Zhang, Yixian Shen, Congfeng Cao, Ekaterina Shutova:

NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning. 10949-10966 - Abdellah El Mekki, Houdaifa Atou, Omer Nacar, Shady Shehata, Muhammad Abdul-Mageed:

NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities. 10967-10991 - Yuan Gao, Weiwei Sun:

A Computational Simulation of Language Production in First Language Acquisition. 10992-11006 - Danna Zheng, Mirella Lapata, Jeff Z. Pan:

Long-Form Information Alignment Evaluation Beyond Atomic Facts. 11007-11027 - AbdelRahim A. Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed:

Voice of a Continent: Mapping Africa's Speech Technology Frontier. 11028-11050 - Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma:

Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains. 11051-11079 - Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Jiahao Zhang:

Circuit Complexity Bounds for RoPE-based Transformer Architecture. 11080-11097 - Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma:

Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments. 11098-11126 - Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang:

Towards Infinite-Long Prefix in Transformer. 11127-11191 - Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese:

LATTE: Learning to Think with Vision Specialists. 11192-11229 - Xianren Zhang, Hui Liu, Delvin Ce Zhang, Xianfeng Tang, Qi He, Dongwon Lee, Suhang Wang:

SUA: Stealthy Multimodal Large Language Model Unlearning Attack. 11230-11243 - Hongbo Liu, Jia Xu:

ResFormer: All-Time Reservoir Memory for Long Sequence Classification. 11244-11256 - Zeping Yu, Yonatan Belinkov, Sophia Ananiadou:

Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models. 11257-11272 - Enora Rice, Katharina von der Wense, Alexis Palmer:

Interdisciplinary Research in Conversation: A Case Study in Computational Morphology for Language Documentation. 11273-11285 - Huanxin Sheng, Xinyi Liu, Hangfeng He, Jieyu Zhao, Jian Kang:

Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction. 11286-11328 - Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang:

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time. 11329-11354 - Miao Zhou, Lina Yang, Thomas Wu, Dongnan Yang, Xinru Zhang:

Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis. 11355-11365 - Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng:

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners. 11366-11382 - Yuheng Wu, Jianwen Xie, Denghui Zhang, Zhaozhuo Xu:

DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic. 11383-11397 - Yangyifan Xu, Shuo Ren, Jiajun Zhang:

Collaborative Beam Search: Enhancing LLM Reasoning via Collective Consensus. 11398-11410 - Keane Ong, Rui Mao, Deeksha Varshney, Paul Pu Liang, Erik Cambria, Gianmarco Mengaldo:

Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation. 11411-11434 - Zhuohang Li, Chao Yan, Nicholas J. Jackson, Wendi Cui, Bo Li, Jiaxin Zhang, Bradley A. Malin:

Towards Statistical Factuality Guarantee for Large Vision-Language Models. 11435-11456 - Guangzhi Sun, Potsawee Manakul, Xiao Zhan, Mark J. F. Gales:

Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? 11457-11467 - Bolian Li, Yanran Wu, Xinyu Luo, Ruqi Zhang:

Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner. 11468-11478 - Ruiyu Xiao, Lei Wu, Yuanxing Liu, Weinan Zhang, Ting Liu:

Stimulate the Critical Thinking of LLMs via Debiasing Discussion. 11479-11492 - Xintong Li, Jalend Bantupalli, Ria Dharmani, Yuwei Zhang, Jingbo Shang:

Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning. 11493-11506 - Ozan Irsoy, Pengxiang Cheng, Jennifer L. Chen, Daniel Preotiuc-Pietro, Shiyue Zhang, Duccio Pappadopulo:

Improving Instruct Models for Free: A Study on Partial Adaptation. 11507-11521 - Xintong Li, Junda Wu, Tong Yu, Rui Wang, Yu Wang, Xiang Chen, Jiuxiang Gu, Lina Yao, Julian J. McAuley, Jingbo Shang:

CoMMIT: Coordinated Multimodal Instruction Tuning. 11522-11536 - Tianhao Wu, Weizhe Yuan, Olga Golovneva, Jing Xu, Yuandong Tian, Jiantao Jiao, Jason E. Weston, Sainbayar Sukhbaatar:

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. 11537-11554 - Song Wang, Zhen Tan, Zihan Chen, Shuang Zhou, Tianlong Chen, Jundong Li:

AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction. 11555-11567 - Nishant Balepur, Matthew Shu, Yoo Yeon Sung, Seraphina Goldfarb-Tarrant, Shi Feng, Fumeng Yang, Rachel Rudinger, Jordan Lee Boyd-Graber:

A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users. 11568-11595 - Jocelyn J. Shen, Akhila Yerukola, Xuhui Zhou, Cynthia Breazeal, Maarten Sap, Hae-Won Park:

Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication. 11596-11614 - Song Wang, Zihan Chen, Peng Wang, Zhepei Wei, Zhen Tan, Yu Meng, Cong Shen, Jundong Li:

Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation. 11615-11631 - Devin R. Wright, Jisun An, Yong-Yeol Ahn:

Cognitive Linguistic Identity Fusion Score (CLIFS): A Scalable Cognition-Informed Approach to Quantifying Identity Fusion from Text. 11632-11662 - Tan-Hanh Pham, Hoang-Nam Le, Phu-Vinh Nguyen, Chris Ngo, Truong-Son Hy:

SilVar: Speech-Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization. 11663-11674 - Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini:

CEMTM: Contextual Embedding-based Multimodal Topic Modeling. 11675-11692 - Jonathan Rusert:

RedHerring Attack: Testing the Reliability of Attack Detection. 11693-11708 - Cui Ding, Yanning Yin, Lena Ann Jäger, Ethan Wilcox:

Modeling Bottom-up Information Quality during Language Processing. 11709-11721 - Tian Qin, Naomi Saphra, David Alvarez-Melis:

Data Drives Unstable Hierarchical Generalization in LMs. 11722-11740 - Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, Mengdi Wang:

EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety. 11741-11756 - Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, Susmit Jha:

Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs. 11757-11770 - François Ledoyen, Gaël Dias, Jérémie Pantin, Alexis Lechervy, Fabrice Maurel, Youssef Chahir:

Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation. 11771-11797 - Yiyang Huang, Yizhou Wang, Yun Fu:

D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition. 11798-11811 - Ruochen Li, Jun Li, Bailiang Jian, Kun Yuan, Youxiang Zhu:

ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment. 11812-11826 - Khai Le-Duc, Tuyen Tran, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang Anh, Hung-Phong Tran, Thanh Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh, Thanh Nguyen-Tang:

MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation. 11827-11952 - Nafis Irtiza Tripto, Saranya Venkatraman, Mahjabin Nahar, Dongwon Lee:

Beyond Checkmate: Exploring the Creative Choke Points for AI Generated Texts. 11953-11970 - Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu:

MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers. 11971-11990 - Seunghan Yang, Juntae Lee, Jihwan Bang, Kyuhong Shim, Minsoo Kim, Simyung Chang:

Learning Contextual Retrieval for Robust Conversational Search. 11991-12003 - Reza Averly, Frazier N. Baker, Ian A Watson, Xia Ning:

LIDDIA: Language-based Intelligent Drug Discovery Agent. 12004-12028 - Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang:

Agentic-R1: Distilled Dual-Strategy Reasoning. 12029-12043 - Yichi Zhang, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, Joyce Chai, Seungwhan Moon:

Proactive Assistant Dialogue Generation from Streaming Egocentric Videos. 12044-12068 - Dayeon Ki, Kevin Duh, Marine Carpuat:

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation. 12069-12092 - Ali Salamatian, Amirhossein Abaskohi, Wan-Cyuan Fan, Mir Rayat Imtiaz Hossain, Leonid Sigal, Giuseppe Carenini:

ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement. 12093-12113 - Yanzhen Shen, Sihao Chen, Xueqiang Xu, Yunyi Zhang, Chaitanya Malaviya, Dan Roth:

LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval. 12114-12125 - Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu:

ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt. 12126-12141 - Xiaoshu Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, Xinwang Liu:

Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster. 12142-12157 - Fengyuan Liu, Rui Zhao, Shuo Chen, Guohao Li, Philip Torr, Lei Han, Jindong Gu:

Can an Individual Manipulate the Collective Decisions of Multi-Agents? 12158-12182 - Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee:

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages. 12183-12201 - Xiaotong Zhang, Ying Li:

Improving Clustering with Positive Pairs Generated from LLM-Driven Labels. 12202-12218 - Lijia Lv, Yuanshu Zhao, Guan Wang, Xuehai Tang, Jie Wen, Jizhong Han, Songlin Hu:

Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models. 12219-12231 - Jingyang Lin, Andy Wong, Tian Xia, Shenghua He, Hui Wei, Mei Han, Jiebo Luo:

Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning. 12232-12248 - Ya Su, Hu Zhang, Yue Fan, Guangjun Zhang, Yujie Wang, Ru Li, Hongye Tan:

Dynamic Energy-Based Contrastive Learning with Multi-Stage Knowledge Verification for Event Causality Identification. 12249-12267 - Zhipeng Bian, Jieming Zhu, Qijiong Liu, Wang Lin, Guohao Cai, Zhaocheng Du, Jiacheng Sun, Zhou Zhao, Zhenhua Dong:

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment. 12268-12278 - Jianzhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang, Buzhou Tang:

From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement. 12279-12295 - Chong Tian, Qirong Ho, Xiuying Chen:

A Symbolic Adversarial Learning Framework for Evolving Fake News Generation and Detection. 12296-12310 - Huimin Wang, Yutian Zhao, Yefeng Zheng, Xian Wu:

RareSyn: Health Record Synthesis for Rare Disease Diagnosis. 12311-12327 - Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen:

Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework. 12328-12338 - Guixian Xu, Zeli Su, Ziyin Zhang, Jianing Liu, Xu Han, Ting Zhang, Yushuang Dong:

CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China. 12339-12346 - Xu Shen, Yixin Liu, Yiwei Dai, Yili Wang, Rui Miao, Yue Tan, Shirui Pan, Xin Wang:

Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems. 12347-12361 - Chao Huang, Fengran Mo, Yufeng Chen, Changhao Guan, Zhenrui Yue, Xinyu Wang, Jinan Xu, Kaiyu Huang:

Boosting Data Utilization for Multilingual Dense Retrieval. 12362-12378 - Chien Hung Chen, Hen-Hsen Huang, Hsin-Hsi Chen:

Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs. 12379-12391 - Hang Ni, Fan Liu, Xinyu Ma, Lixin Su, Shuaiqiang Wang, Dawei Yin, Hui Xiong, Hao Liu:

TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning. 12392-12418 - Ivory Yang, Xiaobo Guo, Yuxin Wang, Hefan Zhang, Yaning Jia, William Dinauer, Soroush Vosoughi:

Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language. 12419-12428 - Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo:

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving. 12429-12449 - Tianduo Wang, Lu Xu, Wei Lu, Shanbo Cheng:

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition. 12450-12464 - Yong Zhao, Kai Xu, Zhengqiu Zhu, Yue Hu, Zhiheng Zheng, Yingfeng Chen, Yatai Ji, Chen Gao, Yong Li, Jincai Huang:

CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space. 12465-12480 - Sreetama Sarkar, Yue Che, Alex Gavin, Peter Anthony Beerel, Souvik Kundu:

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression. 12481-12500 - Yu Wang, Nan Yang, Liang Wang, Furu Wei, Fuli Feng:

Examining False Positives under Inference Scaling for Mathematical Reasoning. 12501-12520 - Yikang Liu, Wanyang Zhang, Yiming Wang, Jialong Tang, Pei Zhang, Baosong Yang, Fei Huang, Rui Wang, Hai Hu:

Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese. 12521-12538 - Ruifeng Ren, Zhicong Li, Yong Liu:

Exploring the Limitations of Mamba in COPY and CoT Reasoning. 12539-12563 - Dong Wang, Xinghang Li, Zhengshen Zhang, Jirong Liu, Xiao Ma, Hanbo Zhang, Tao Kong, Huaping Liu:

ProcWorld: Benchmarking Large Model Planning in Reachability-Constrained Environments. 12564-12594 - Kaijie Chen, Zihao Lin, Zhiyang Xu, Ying Shen, Yuguang Yao, Joy Rimchala, Jiaxin Zhang, Lifu Huang:

R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation. 12595-12630 - Xiaoqiang Kang, Shengen Wu, Zimu Wang, Yilin Liu, Xiaobo Jin, Kaizhu Huang, Wei Wang, Yutao Yue, Xiaowei Huang, Qiufeng Wang:

Can GRPO Boost Complex Multimodal Table Understanding? 12631-12644 - Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, Eshwar Chandrasekharan:

MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance. 12645-12660 - Jingcheng Deng, Zhongtao Jiang, Liang Pang, Zihao Wei, Liwei Chen, Kun Xu, Yang Song, Huawei Shen, Xueqi Cheng:

Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment. 12661-12677 - Chumeng Liang, Jiaxuan You:

Evaluating LLM-Generated Diagrams as Graphs. 12678-12690 - Agam Goyal, Vedant Rathi, William Yeh, Yian Wang, Yuen Chen, Hari Sundaram:

Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders. 12691-12709 - Shi-Yu Tian, Zhi Zhou, Kun-Yang Yu, Ming Yang, Lin-Han Jia, Lan-Zhe Guo, Yufeng Li:

VCSearch: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning. 12710-12731 - Wang Peixu, Chen Yu, Yu Ming, Cheng Xiang:

How do autoregressive transformers solve full addition? 12732-12756 - Fanyi Yang, Jianfeng Liu, Xin Zhang, Haoyu Liu, Xixin Cao, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Qi Zhang:

MAIN: Mutual Alignment Is Necessary for instruction tuning. 12757-12769 - Dingwei Chen, Ziqiang Liu, Feiteng Fang, Chak Tou Leong, Shiwen Ni, Ahmadreza Argha, Hamid Alinejad-Rokny, Min Yang, Chengming Li:

Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation. 12770-12785 - Wenyu Qiu, Yuxiong Wang, Jiajun Tan, Hanchao Hou, Qinda Liu, Wei Yao, Shiguang Ni:

DeepWell-Adol: A Scalable Expert-Based Dialogue Corpus for Adolescent Positive Mental Health and Wellbeing Promotion. 12786-12810 - Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Muchao Ye, Weicheng Ma, Zhaohan Xi:

Data to Defense: The Role of Curation in Aligning Large Language Models Against Safety Compromise. 12811-12826 - Xuekang Wang, Shengyu Zhu, Xueqi Cheng:

Speculative Safety-Aware Decoding. 12827-12841 - Jihyun Lee, Yejin Min, San Kim, Yejin Jeon, SungJun Yang, Hyounghun Kim, Gary Lee:

PanicToCalm: A Proactive Counseling Agent for Panic Attacks. 12842-12874 - Youngbin Choi, Seunghyuk Cho, Minjong Lee, MoonJeong Park, Yesong Ko, Jungseul Ok, Dongwoo Kim:

CoPL: Collaborative Preference Learning for Personalizing LLMs. 12875-12893 - Chao Hao, Zezheng Wang, Yanhua Huang, Ruiwen Xu, Wenzhe Niu, Xin Liu, Zitong Yu:

Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units. 12894-12911 - Wenwen Li, Kangwei Shi, Yidong Chai:

AI Chatbots as Professional Service Agents: Developing a Professional Identity. 12912-12925 - Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu, Hiromi Wakaki, Yuki Mitsufuji:

DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning. 12926-12948 - Giulia Pucci, Leonardo Ranaldi:

Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent. 12949-12965 - Han Peng, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Lei Fang:

CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability. 12966-12978 - Senyu Li, Jiayi Wang, Felermino D. M. A. Ali, Colin Cherry, Daniel Deutsch, Eleftheria Briakou, Rui Sousa-Silva, Henrique Lopes Cardoso, Pontus Stenetorp, David Ifeoluwa Adelani:

SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages? 12979-12998 - Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung:

FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge. 12999-13014 - Weiyi Yang, Richong Zhang, Junfan Chen, Jiawei Sheng:

Calibrating Pseudo-Labeling with Class Distribution for Semi-supervised Text Classification. 13015-13028 - Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, Bo Xu:

Coarse-to-Fine Grounded Memory for LLM Agent Planning. 13029-13056 - Xisheng Xiao, Hanlin Zhao:

From A and B to A+B: Can Large Language Models Solve Compositional Math Problems? 13057-13078 - Mohammad Beigi, Ying Shen, Parshin Shojaee, Qifan Wang, Zichao Wang, Chandan K. Reddy, Ming Jin, Lifu Huang:

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories. 13079-13092 - Bangde Du, Ziyi Ye, Zhijing Wu, Monika Jankowska, Shuqi Zhu, Qingyao Ai, Yujia Zhou, Yiqun Liu:

SimVBG: Simulating Individual Values by Backstory Generation. 13093-13122 - Dingchu Zhang, Yida Zhao, Jialong Wu, Liwen Zhang, Baixuan Li, Wenbiao Yin, Yong Jiang, Yu-Feng Li, Kewei Tu, Pengjun Xie, Fei Huang:

EvolveSearch: An Iterative Self-Evolving Search Agent. 13123-13136 - Canmiao Zhou, Han Huang:

Syntax-Aware Retrieval Augmentation for Neural Symbolic Regression. 13137-13147 - Dingkun Zhang, Shuhan Qi, Xinyu Xiao, Kehai Chen, Xuan Wang:

Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs. 13148-13164 - Chunyang Jiang, Chi-Min Chan, Yiyang Cai, Yulong Liu, Wei Xue, Yike Guo:

Graceful Forgetting in Generative Language Models. 13165-13180 - Yunxiao Shi, Haoning Shang, Xing Zi, Wujiang Xu, Yue Feng, Min Xu:

Answering Narrative-Driven Recommendation Queries via a Retrieve-Rank Paradigm and the OCG-Agent. 13181-13202 - Hongbo Zhang, Han Cui, Guangsheng Bao, Linyi Yang, Jun Wang, Yue Zhang:

Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values. 13203-13216 - Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Tom Tseng, Julius Broomfield, Adam Gleave, Kellin Pelrine:

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility. 13217-13246 - Jiyuan Liu, Jiaxing Yan, Chunjiang Zhu, Xingyu Liu, Li Qing, Yanghui Rao:

Neural Topic Modeling via Contextual and Graph Information Fusion. 13247-13263 - Jiyuan Liu, Jielin Song, Yunhe Pang, Zhiyu Shen, Yanghui Rao:

CARE: A Disagreement Detection Framework with Concept Alignment and Reasoning Enhancement. 13264-13279 - Yejin Yoon, Yuri Son, Namyoung So, Minseo Kim, Minsoo Cho, Chanhee Park, Seungshin Lee, Taeuk Kim:

Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents. 13280-13306 - Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang:

LightThinker: Thinking Step-by-Step Compression. 13307-13328 - Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Yang Wang, Liangming Pan:

How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark. 13329-13347 - Debdeep Sanyal, Agniva Maiti, Umakanta Maharana, Dhruv Kumar, Ankur Mali, C. Lee Giles, Murari Mandal:

Investigating Pedagogical Teacher and Student LLM Agents: Genetic Adaptation Meets Retrieval-Augmented Generation Across Learning Styles. 13348-13389 - Yujie Feng, Li-Ming Zhan, Zexin Lu, Yongxin Xu, Xu Chu, Yasha Wang, Jiannong Cao, Philip S. Yu, Xiao-Ming Wu:

GeoEdit: Geometric Knowledge Editing for Large Language Models. 13390-13405 - Bo Li, Huanming Zhang, Yuhua Jiang, Yucong Wang, Tengyu Zhang, Shaoqiang Yan, Hongyao Li, Yihong Liu, Feifei Gao:

A Generative Pre-Trained Language Model for Channel Prediction in Wireless Communications Systems. 13406-13419 - Yujie Feng, Jian Li, Xiaoyu Dong, Pengfei Xu, Xiaohui Zhou, Yujia Zhang, Zexin Lu, Yasha Wang, Alan Zhao, Xu Chu, Xiao-Ming Wu:

AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning. 13420-13437 - Shuaijie She, Junxiao Liu, Yifeng Liu, Jiajun Chen, Xin Huang, Shujian Huang:

R-PRM: Reasoning-Driven Process Reward Modeling. 13438-13451 - Yuqian Fu, Yuanheng Zhu, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, Dongbin Zhao:

RLAE: Reinforcement Learning-Assisted Ensemble for LLMs. 13452-13466 - Yang Yan, Yu Lu, Renjun Xu, Zhenzhong Lan:

Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic. 13467-13483 - Xuan Zhang, Yongliang Shen, Zhe Zheng, Linjuan Wu, Wenqi Zhang, Yuchen Yan, Qiuying Peng, Jun Wang, Weiming Lu:

AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification. 13484-13511 - Chengpeng Li, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang, Beichen Zhang, Bowen Yu, Binyuan Hui, Junyang Lin, Xiang Wang, Dayiheng Liu:

START: Self-taught Reasoner with Tools. 13512-13553 - Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim:

The Impact of Negated Text on Hallucination with Large Language Models. 13554-13572 - Zhe Yang, Yichang Zhang, Yudong Wang, Ziyao Xu, Junyang Lin, Zhifang Sui:

A Probabilistic Inference Scaling Theory for LLM Self-Correction. 13573-13587 - Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu:

MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media. 13588-13603 - Xurui Li, Wanghaijiao, Kaisong Song, Rui Zhu, Haixu Tang:

Knowledge-Aware Co-Reasoning for Multidisciplinary Collaboration. 13604-13620 - Yueen Ma, Dafeng Chi, Shiguang Wu, Yuecheng Liu, Yuzheng Zhuang, Irwin King:

Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following. 13621-13639 - Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu:

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation. 13640-13668 - Wenshuo Zhao, Haoxing Zhai, Xinyu Qiu, Zhenting Qi, Shuhe Li, Linchao Zhu:

MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning. 13669-13681 - Yanzhi Tian, Zeming Liu, Zhengyang Liu, Chong Feng, Xin Li, Heyan Huang, Yuhang Guo:

PRIM: Towards Practical In-Image Multilingual Machine Translation. 13682-13697 - Beatrice Savoldi, Giuseppe Attanasio, Eleonora Cupin, Eleni Gkovedarou, Janiça Hackenbuchner, Anne Lauscher, Matteo Negri, Andrea Piergentili, Manjinder Thind, Luisa Bentivogli:

Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE. 13698-13720 - Jianxiang Peng, Ling Shi, Xinwei Wu, Hanwen Zhang, Fujiang Liu, Haocheng Lyu, Deyi Xiong:

DiplomacyAgent: Do LLMs Balance Interests and Ethical Principles in International Events? 13721-13739 - She Yifei, Xinhao Wei, Yulong Wang:

DisLoRA: Task-specific Low-Rank Adaptation via Orthogonal Basis from Singular Value Decomposition. 13740-13755 - Zixin Chen, Sicheng Song, KaShun Shum, Yanna Lin, Rui Sheng, Weiqi Wang, Huamin Qu:

Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering. 13756-13789 - Lingjie Jiang, Shaohan Huang, Xun Wu, Furu Wei:

Textual Aesthetics in Large Language Models. 13790-13818 - Jan Bakker, Jaap Kamps:

Section-Level Simplification of Biomedical Abstracts. 13819-13833 - Abhinav Joshi, Vaibhav Sharma, Sanjeet Singh, Ashutosh Modi:

PoseStitch-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation. 13834-13853 - Avyav Kumar Singh, Helen Yannakoudakis:

Few-Shot Open-Set Classification via Reasoning-Aware Decomposition. 13854-13875 - Beatrice Savoldi, Alan Ramponi, Matteo Negri, Luisa Bentivogli:

Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions. 13876-13889 - Yirong Zeng, Xiao Ding, Yuxian Wang, Weiwen Liu, Yutai Hou, Wu Ning, Xu Huang, Duyu Tang, Dandan Tu, Bing Qin, Ting Liu:

iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use. 13890-13905 - Guangzhan Wang, Hongyu Zhang, Beijun Shen, Xiaodong Gu:

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation. 13906-13920 - Agostina Calabrese, Tom Sherborne, Björn Ross, Mirella Lapata:

Compositional Generalisation for Explainable Hate Speech Detection. 13921-13943 - Jinyoung Kim, Ji Won Yoon:

CCQA: Generating Question from Solution Can Improve Inference-Time Reasoning in SLMs. 13944-13956 - Jiu Sha, Yu Weng, Mengxiao Zhu, Chong Feng, Zheng Liu, Jialedongzhu:

TVQACML: Benchmarking Text-Centric Visual Question Answering in Multilingual Chinese Minority Languages. 13957-13967 - Shane Storks, Itamar Bar-Yossef, Yayuan Li, Zheyuan Zhang, Jason J. Corso, Joyce Chai:

Transparent and Coherent Procedural Mistake Detection. 13968-14002 - Jie Wu, Haoling Li, Xin Zhang, Xiao Liu, Yangyu Huang, Jianwen Luo, Yizhen Zhang, Zuchao Li, Ruihang Chu, Yujiu Yang, Scarlett Li:

Teaching Your Models to Understand Code via Focal Preference Alignment. 14003-14023 - Xixi Wu, Yanchao Tan, Nan Hou, Ruiyang Zhang, Hong Cheng:

MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval. 14024-14045 - Ioanna Ntinou, Alexandros Xenos, Yassine Ouali, Adrian Bulat, Georgios Tzimiropoulos:

Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions. 14046-14062 - Xiaohan Yu, Pu Jian, Chong Chen:

TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning. 14063-14082 - Jongyeop Hyun, Bumsoo Kim:

Retrieval Enhanced Feedback via In-context Neural Error-book. 14083-14098 - Jiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li:

Improve LLM-as-a-Judge Ability as a General Ability. 14099-14115 - Zhiwen Ruan, Yixia Li, Yefeng Liu, Yun Chen, Weihua Luo, Peng Li, Yang Liu, Guanhua Chen:

G2: Guided Generation for Enhanced Output Diversity in LLMs. 14116-14134 - Yuejin Xie, Youliang Yuan, Wenxuan Wang, Fan Mo, Jianmin Guo, Pinjia He:

ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations. 14135-14156 - Sangyeon Cho, Mingi Kim, Jinkwon Hwang, Jaehoon Go, Minuk Ma, Sunjae Yoon, Junyeong Kim:

Learning to See through Sound: From VggCaps to Multi2Cap for Richer Automated Audio Captioning. 14157-14175 - Guohong Li, Deyi Xiong:

Towards Optimal Evaluation Efficiency for Large Language Models. 14176-14183 - Yiheng Hu, Xiaoyang Wang, Qing Liu, Xiwei Xu, Qian Fu, Wenjie Zhang, Liming Zhu:

MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs. 14184-14200 - Sugyeong Eo, Jung Jun Lee, Chanjun Park, Heuiseok Lim:

Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning. 14201-14212 - Yufan Ye, Ting Zhang, Wenbin Jiang, Hua Huang:

Process-Supervised Reinforcement Learning for Code Generation. 14213-14226 - Yifei Song, Claire Gardent:

MuCAL: Contrastive Alignment for Preference-Driven KG-to-Text Generation. 14227-14270 - Wei Wang, Zhaowei Li, Qi Xu, Linfeng Li, Yiqing Cai, Botian Jiang, Hang Song, Xingcan Hu, Pengyu Wang, Li Xiao:

Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models. 14271-14290 - Menghua Wu, Cai Zhou, Stephen Bates, Tommi S. Jaakkola:

Thought calibration: Efficient and confident test-time scaling. 14291-14305 - Ziling Cheng, Meng Cao, Leila Pishdad, Yanshuai Cao, Jackie CK Cheung:

Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation. 14306-14333 - Wei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao:

QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models. 14334-14345 - Junfei Wu, Yue Ding, Guofan Liu, Tianze Xia, Ziyue Huang, Dianbo Sui, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan:

SHARP: Steering Hallucination in LVLMs via Representation Engineering. 14346-14361 - Tony Woo, Sehun Lee, Kang-Wook Kim, Gunhee Kim:

Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech. 14362-14379 - Safal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith W. Ross:

Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings. 14380-14401 - Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun:

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides. 14402-14418 - Mei Guo, Chen Chen, Chunyan Hou, Yike Wu, Xiaojie Yuan:

SWAM: Adaptive Sliding Window and Memory-Augmented Attention Model for Rumor Detection. 14419-14430 - Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, Xin Yuan, Liming Zhu, Wenjie Zhang:

HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning. 14431-14459 - Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu:

VRoPE: Rotary Position Embedding for Video Large Language Models. 14460-14472 - Decheng Duan, Jitong Peng, Yingyi Zhang, Chengzhi Zhang:

SciNLP: A Domain-Specific Benchmark for Full-Text Scientific Entity and Relation Extraction in NLP. 14473-14486 - Jinke Wang, Zenan Ying, Qi Liu, Wei Chen, Tong Xu, Huijun Hou, Zhi Zheng:

Think and Recall: Layer-Level Prompting for Lifelong Model Editing. 14487-14502 - Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas:

SPIRIT: Patching Speech Language Models against Jailbreak Attacks. 14503-14520 - Liangyu Xu, Xuemiao Zhang, Feiyu Duan, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai:

FIRE: Flexible Integration of Data Quality Ratings for Effective Pretraining. 14521-14541 - Nitay Calderon, Liat Ein-Dor, Roi Reichart:

Multi-Domain Explainability of Preferences. 14542-14575 - Shuyun Yang, Yan Zhang, Zhengmao Ye, Lei Duan, Mingjie Tang:

Tuning Less, Prompting More: In-Context Preference Learning Pipeline for Natural Language Transformation. 14576-14587 - Shounak Paul, Dhananjay Ghumare, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi:

IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval. 14588-14611 - Chaoyue He, Xin Zhou, Yi Wu, Xinjia Yu, Yan Zhang, Lei Zhang, Di Wang, Shengfei Lyu, Hong Xu, Xiaoqiao Wang, Wei Liu, Chunyan Miao:

ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge. 14612-14653 - Hansi Wang, Yue Wang, Qiliang Liang, Yang Liu:

How Sememic Components Can Benefit Link Prediction for Lexico-Semantic Knowledge Graphs? 14654-14673 - Yiwen Jiang, Deval Mehta, Siyuan Yan, Yaling Shen, Zimu Wang, Zongyuan Ge:

WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification. 14674-14685 - Abhinav Joshi, Areeb Ahmad, Ashutosh Modi:

Calibration Across Layers: Understanding Calibration Evolution in LLMs. 14686-14714 - Aida Ramezani, Yang Xu:

The discordance between embedded ethics and cultural inference in large language models. 14715-14736 - Cheng Xu, Nan Yan, Shuhao Guan, Yuke Mei, M. Tahar Kechadi:

SSA: Semantic Contamination of LLM-Driven Fake News Detection. 14737-14751 - Jingyao Li, Senqiao Yang, Sitong Wu, Han Shi, Chuanyang Zheng, Hong Xu, Jiaya Jia:

Logits-Based Finetuning. 14752-14764 - Jiaqian Li, Qisheng Hu, Jing Li, Wenya Wang:

STARE at the Structure: Steering ICL Exemplar Selection with Structural Alignment. 14765-14782 - Tao Fan, Guoqiang Ma, Yuanfeng Song, Lixin Fan, Qiang Yang:

PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation. 14783-14794 - Brian J. Chan, Mao Xun Huang, Jui-Hung Cheng, Chao-Ting Chen, Hen-Hsen Huang:

Efficient Beam Search for Large Language Models Using Trie-Based Decoding. 14795-14807 - Pavan Sai Balaga, Nagasamudram Karthik, Challa Vishwanath, Raksha Sharma, Rudra Murthy, Ashish R. Mittal:

Power doesn't reside in size: A Low Parameter Hybrid Language Model (HLM) for Sentiment Analysis in Code-mixed data. 14808-14816 - David G. Hobson, Derek Ruths, Andrew Piper:

Evaluating Taxonomy Free Character Role Labeling (TF-CRL) in News Stories using Large Language Models. 14817-14839 - Subin Kim, Hoonrae Kim, Jihyun Lee, Yejin Jeon, Gary Lee:

MIRROR: Multimodal Cognitive Reframing Therapy for Rolling with Resistance. 14840-14869 - Bin Deng, Yizhe Feng, Zeming Liu, Qing Wei, Xiangrong Zhu, Shuai Chen, Yuanfang Guo, Yunhong Wang:

RETAIL: Towards Real-world Travel Planning for Large Language Models. 14870-14902 - Tuc Nguyen, Yifan Hu, Thai Le:

Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification. 14903-14919 - Elle:

Reward Model Perspectives: Whose Opinions Do Reward Models Reward? 14920-14944 - Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu:

FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference. 14945-14955 - Eshaan Tanwar, Anwoy Chatterjee, Michael Saxon, Alon Albalak, William Yang Wang, Tanmoy Chakraborty:

Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge. 14956-14979 - Ang Li, Yiquan Wu, Yinghao Hu, Lizhi Qing, Shihang Wang, Chengyuan Liu, Tao Wu, Adam Jatowt, Ming Cai, Fei Wu, Kun Kuang:

CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval. 14980-14999 - Shiyu Li, Yang Tang, Ruijie Liu, Shi-Zhe Chen, Xi Chen:

Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings. 15000-15016 - Yue Zhang, Tianyi Ma, Zun Wang, Yanyuan Qiao, Parisa Kordjamshidi:

Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs. 15017-15025 - Xiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai, Xinyue Lou, Yunghwei Lai, Ziyue Wang, Yawen Wang, Kaiyu Huang, Yile Wang, Peng Li, Yang Liu:

MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models. 15026-15048 - Chi-Yun Chang, Xueyang Huang, Humaira Nasir, Shane Storks, Olawale Akingbade, Huteng Dai:

Mind the Gap: How BabyLMs Learn Filler-Gap Dependencies. 15049-15065 - Meng Lu, Ruochen Zhang, Carsten Eickhoff, Ellie Pavlick:

Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline. 15066-15096 - Zsolt T. Kardkovács, Lynda Djennane, Anna Field, Boualem Benatallah, Yacine Gaci, Fabio Casati, Walid Gaaloul:

BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models. 15097-15113 - Chen Han, Wenzhen Zheng, Xijin Tang:

Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models. 15114-15129 - Chenjie Ni, Zhepeng Wang, Runxue Bao, Shangqian Gao, Yanfu Zhang:

Controllable Memorization in LLMs via Weight Pruning. 15130-15145 - Poorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, Antonios Anastasopoulos:

Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations. 15146-15167 - Lei Yang, Shaoyang Xu, Jianxiang Peng, Shaolin Zhu, Deyi Xiong:

DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search. 15168-15182 - Jiayu Yao, Shenghua Liu, Yiwei Wang, Lingrui Mei, Baolong Bi, Yuyao Ge, Zhecheng Li, Xueqi Cheng:

Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation. 15183-15193 - Punit Kumar Singh, Nishant Kumar, Akash Ghosh, Kunal Pasad, Khushi Soni, Manisha Jaishwal, Sriparna Saha, Syukron Abu Ishaq Alfarozi, Asres Temam Abagissa, Kitsuchart Pasupa, Haiqin Yang, José G. Moreno:

Let's Play Across Cultures: A Large Multilingual, Multicultural Benchmark for Assessing Language Models' Understanding of Sports. 15194-15241 - Jiaxin Li, Geng Zhao, Xiaoci Zhang:

Multilingual Federated Low-Rank Adaptation for Collaborative Content Anomaly Detection across Multilingual Social Media Participants. 15242-15262 - Arkadeep Acharya, Akash Ghosh, Pradeepika Verma, Kitsuchart Pasupa, Sriparna Saha, Priti Singh:

M3Retrieve: Benchmarking Multimodal Retrieval for Medicine. 15263-15276 - Zengqing Wu, Takayuki Ito:

The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent Systems. 15277-15297 - Ana Sabina Uban, Liviu P. Dinu, Ioan-Bogdan Iordache, Simona Georgescu, Claudia Vlad:

Friend or Foe? A Computational Investigation of Semantic False Friends across Romance Languages. 15298-15312 - Seorin Kim, Dongyoung Lee, Jaejin Lee:

KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models. 15313-15334 - Runfei Chen, Shuyang Jiang, Wei Huang:

SeMob: Semantic Synthesis for Dynamic Urban Mobility Prediction. 15335-15355 - Yize Cheng, Wenxiao Wang, Mazda Moayeri, Soheil Feizi:

DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors. 15356-15373 - Feng He, Chao Zhang, Zhixue Zhao:

Minimal, Local, and Robust: Embedding-Only Edits for Implicit Bias in T2I Models. 15374-15392 - Dahyun Lee, Jonghyeon Choi, Jiyoung Han, Kunwoo Park:

Journalism-Guided Agentic In-context Learning for News Stance Detection. 15393-15416 - Victor Charpenay, Steven Schockaert:

Less Is MuRE: Revisiting Shallow Knowledge Graph Embeddings. 15417-15443 - Shuangjie Fu, Du Su, Beining Huang, Fei Sun, Jingang Wang, Wei Chen, Huawei Shen, Xueqi Cheng:

Jailbreak LLMs through Internal Stance Manipulation. 15444-15459 - Haoming Huang, Yibo Yan, Jiahao Huo, Xin Zou, Xinfeng Li, Kun Wang, Xuming Hu:

Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis. 15460-15479 - Jun Zhang, Haihong E, Tianyi Hu, Yifan Zhu, Meina Song, Haoran Luo:

Complex Numerical Reasoning with Numerical Semantic Pre-training Framework. 15480-15514 - Sydney Anuyah, Mehedi Mahmud Kaushik, Sri Rama Krishna Reddy Dwarampudi, Rakesh Shiradkar, Arjan Durresi, Sunandan Chakraborty:

Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling. 15515-15539 - Sadam Al-Azani, Maad Alowaifeer, Alhanoof Alhunief, Ahmed Abdelali:

OntologyRAG-Q: Resource Development and Benchmarking for Retrieval-Augmented Question Answering in Qur'anic Tafsir. 15540-15558 - Allison Lahnala, Charles Welch, David Jurgens, Lucie Flek:

The Practical Impacts of Theoretical Constructs on Empathy Modeling. 15559-15586 - Sashuai Zhou, Weinan Gan, Qijiong Liu, Ke Lei, Jieming Zhu, Hai Huang, Yan Xia, Ruiming Tang, Zhenhua Dong, Zhou Zhao:

RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation. 15587-15599 - Amit Gajbhiye, Thomas Bailleux, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert:

Grouping Entities with Shared Properties using Multi-Facet Prompting and Property Embeddings. 15600-15615 - Kun Zhu, Lizi Liao, Yuxuan Gu, Lei Huang, Xiaocheng Feng, Bing Qin:

Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering. 15616-15634 - Dongjun Kim, Gyuho Shim, Yongchan Chun, Minhyuk Kim, Chanjun Park, Heuiseok Lim:

Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks. 15635-15650 - Yuan Chang, Ziyue Li, Hengyuan Zhang, Yuanbo Kong, Yanru Wu, Hayden Kwok-Hay So, Zhijiang Guo, Liya Zhu, Ngai Wong:

TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review. 15651-15682 - Yunhui Jang, Jaehyung Kim, Sungsoo Ahn:

Improving Chemical Understanding of LLMs via SMILES Parsing. 15683-15698 - Yiheng Wu, Ningchao Ge, Yanmin Li, Liwei Qian, Mengna Zhu, Haoyu Yang, Haiwen Chen, Jibing Wu:

Can Large Language Models Tackle Graph Partitioning? 15699-15719 - Zitao Fang, Guodong Du, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh:

To See a World in a Spark of Neuron: Disentangling Multi-Task Interference for Training-Free Model Merging. 15720-15740 - Binh Nguyen, Shuju Shi, Ryan Ofman, Thai Le:

What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection. 15741-15755 - Weiqing Luo, Zhen Tan, Yifan Li, Xinyu Zhao, Kwonjoon Lee, Behzad Dariush, Tianlong Chen:

Task-Aware Resolution Optimization for Visual Large Language Models. 15756-15770 - Yukyung Lee, JoongHoon Kim, Jaehee Kim, Hyowon Cho, Jaewook Kang, Pilsung Kang, Najoung Kim:

CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists. 15771-15798 - Lingjun Zhao, Hal Daumé III:

A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations. 15799-15813 - Qihang Ma, Shengyu Li, Jie Tang, Dingkang Yang, Chenshaodong, Yingyi Zhang, Chao Feng, Ran Jiao:

Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models. 15814-15827 - Tianhao Niu, Yiming Cui, Baoxin Wang, Xiao Xu, Xin Yao, Qingfu Zhu, Dayong Wu, Shijin Wang, Wanxiang Che:

Chart2Code53: A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code Generation. 15828-15844 - Zheng Xin Yong, Beyza Ermis, Marzieh Fadaee, Stephen H. Bach, Julia Kreutzer:

The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It. 15845-15860 - Saket S. Chaturvedi, Gaurav Bagwe, Lan Zhang, Xiaoyong Yuan:

AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt. 15861-15878 - Lanxiao Huang, Daksh Dave, Tyler Cody, Peter A. Beling, Ming Jin:

From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing. 15879-15905 - Nadir Durrani, Basel Mousi, Fahim Dalvi:

Editing Across Languages: A Survey of Multilingual Knowledge Editing. 15906-15918 - Gaurav Bagwe, Saket S. Chaturvedi, Xiaolong Ma, Xiaoyong Yuan, Kuang-Ching Wang, Lan Zhang:

Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor Attacks. 15919-15937 - Harshil Vejendla:

Drift-Adapter: A Practical Approach to Near Zero-Downtime Embedding Model Upgrades in Vector Databases. 15938-15949 - Ya Wu, Qiang Sheng, Danding Wang, Guang Yang, Yifan Sun, Zhengjia Wang, Yuyan Bu, Juan Cao:

The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas. 15950-15970 - Harshil Vejendla:

SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling. 15971-15978 - Heng Zhou, Hejia Geng, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang, Zhenfei Yin, Lei Bai:

ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks. 15979-15998 - Weichun Shi, Minghao Liu, Wanting Zhang, Langchen Shi, Fuqi Jia, Feifei Ma, Jian Zhang:

ConstraintLLM: A Neuro-Symbolic Framework for Industrial-Level Constraint Programming. 15999-16019 - Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu:

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms. 16020-16047 - Navid Madani, Rohini K. Srihari:

ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents. 16048-16065 - Ko-Wei Huang, Yi-Fu Fu, Ching-Yu Tsai, Yu-Chieh Tu, Tzu-ling Cheng, Cheng-Yu Lin, Yi-Ting Yang, Heng-Yi Liu, Keng-Te Liao, Da-Cheng Juan, Shou-De Lin:

Neuron-Level Differentiation of Memorization and Generalization in Large Language Models. 16066-16080 - Zhuoxuan Zhang, Jinhao Duan, Edward Kim, Kaidi Xu:

Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs. 16081-16099 - Supriti Sinhamahapatra, Jan Niehues:

Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks. 16100-16110 - Tianyi Lorena Yan, Robin Jia:

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries. 16111-16134 - Sahithya Ravi, Gabriel Herbert Sarch, Vibhav Vineet, Andrew D. Wilson, Balasaravanan Thoravi Kumaravel:

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames. 16135-16150 - Yiru Tang, Kun Zhou, Yingqian Min, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang:

Enhancing Chain-of-Thought Reasoning via Neuron Activation Differential Analysis. 16151-16159 - Abdullah Hashmat, Muhammad Arham Mirza, Agha Ali Raza:

PakBBQ: A Culturally Adapted Bias Benchmark for QA. 16160-16172 - Sahil Verma, Keegan Hines, Jeff A. Bilmes, Charlotte Siska, Luke Zettlemoyer, Hila Gonen, Chandan Singh:

MULTIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and Modalities. 16173-16187 - Haoran Zhao, Robert D. Hawkins:

Comparing human and LLM politeness strategies in free production. 16188-16216 - Deuksin Kwon, Jiwon Hae, Emma Clift, Daniel Shamsoddini, Jonathan Gratch, Gale M. Lucas:

ASTRA: A Negotiation Agent with Adaptive and Strategic Reasoning via Tool-integrated Action for Dynamic Offer Optimization. 16217-16238 - Nura Aljaafari, Danilo S. Carvalho, André Freitas:

CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information Alignment. 16239-16259 - Runjia Zeng, Guangyan Sun, Qifan Wang, Tong Geng, Sohail A. Dianat, Xiaotian Han, Raghuveer Rao, Xueling Zhang, Cheng Han, Lifu Huang, Dongfang Liu:

MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper. 16260-16280 - Chi Minh Bui, Ngoc Mai Thieu, Van Vinh Nguyen, Jason J. Jung, Khac-Hoai Nam Bui:

KG-CQR: Leveraging Structured Relation Representations in Knowledge Graphs for Contextual Query Retrieval. 16281-16298 - Maithili Joshi, Palash Nandi, Tanmoy Chakraborty:

SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection. 16299-16314 - Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li:

When Truthful Representations Flip Under Deceptive Instructions? 16315-16335 - Yuya Asano, Diane J. Litman, Erin Walker:

Can LLMs simulate the same correct solutions to free-response math problems as real students? 16336-16365 - Deuksin Kwon, Kaleen Shrestha, Bin Han, Elena Hayoung Lee, Gale M. Lucas:

Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans. 16366-16380 - Bowen Wang, Haiyuan Wan, Liwen Shi, Chen Yang, Peng He, Yue Ma, Haochen Han, Wenhao Li, Tiao Tan, Yongjian Li, Fangming Liu, Yifan Gong, Sheng Zhang:

RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging. 16381-16395 - Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, Lara Marinov, Michael Chen, Shreya Singhal, Carolin Lawrence, Aditi Raghunathan, Kiril Gashteovski, Graham Neubig:

Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions. 16396-16427 - Jiarui Liu, Yueqi Song, Yunze Xiao, Mingqian Zheng, Lindia Tjuatja, Jana Schaich Borg, Mona T. Diab, Maarten Sap:

Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics. 16428-16458 - Ziniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang, Jennifer G. Dy, Hongyang R. Zhang:

Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation. 16459-16477 - Chutong Meng, Philipp Koehn:

Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents. 16478-16494 - Ezgi Basar, Francesca Padovani, Jaap Jumelet, Arianna Bisazza:

TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs. 16495-16510 - Hanjun Luo, Yingbin Jin, Yiran Wang, Xinfeng Li, Tong Shang, Xuecheng Liu, Ruizhe Chen, Kun Wang, Hanan Salam, Qingsong Wen, Zuozhu Liu:

DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition. 16511-16535 - Mossad Helali, Yutai Luo, Tae Jun Ham, Jim Plotts, Ashwin Chaugule, Jichuan Chang, Parthasarathy Ranganathan, Essam Mansour:

Reliable and Cost-Effective Exploratory Data Analysis via Graph-Guided RAG. 16536-16553 - Jaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Daniel Shao, Yonghoe Koo, Minhyeok Ko, Qingyu Chen, Mark Gerstein, Michael Moor, Jaewoo Kang:

Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards. 16554-16571 - Jin Peng Zhou, Sébastien M. R. Arnold, Nan Ding, Kilian Q. Weinberger, Nan Hua, Fei Sha:

Graders Should Cheat: Privileged Information Enables Expert-Level Automated Evaluations. 16572-16590 - Yubin Ge, Salvatore Romeo, Jason Cai, Monica Sunkara, Yi Zhang:

SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection. 16591-16610 - Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park:

Database-Augmented Query Representation for Information Retrieval. 16611-16633 - Naama Rivlin-Angert, Guy Mor-Lan:

The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech. 16634-16647 - Pedram Zaree, Md Abdullah Al Mamun, Quazi Mishkatul Alam, Yue Dong, Ihsen Alouani, Nael B. Abu-Ghazaleh:

Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment. 16648-16668 - Jianglin Lu, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, Yun Fu:

Representation Potentials of Foundation Models for Multimodal Alignment: A Survey. 16669-16684 - Ziyin Zhang, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Rui Wang, Zhaopeng Tu:

Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation. 16685-16697 - Balaji Darur, Karan Singla:

Visual-Aware Speech Recognition for Noisy Scenarios. 16698-16706 - Abubakr Mohamed, Hamdy Mubarak:

Advancing Arabic Diacritization: Improved Datasets, Benchmarking, and State-of-the-Art Models. 16707-16719 - Arjun Arunasalam, Madison Pickering, Z. Berkay Celik, Blase Ur:

Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks. 16720-16743 - Mahmud Wasif Nafee, Maiqi Jiang, Haipeng Chen, Yanfu Zhang:

Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization. 16744-16757 - Zhengxiang Wang, Weiling Li, Panagiotis Kaliosis, Owen Rambow, Susan Brennan:

LVLMs are Bad at Overhearing Human Referential Communication. 16758-16782 - Ruida Wang, Yuxin Li, Yi R. Fung, Tong Zhang:

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability. 16783-16809 - Minhyuk Kim, Seungyoon Lee, Heuiseok Lim:

TORSO: Template-Oriented Reasoning Towards General Tasks. 16810-16818 - Sheshera Mysore, Debarati Das, Hancheng Cao, Bahareh Sarrafzadeh:

Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild. 16819-16846 - Gagan Mundada, Yash Vishe, Amit Namburi, Xin Xu, Zachary Novack, Julian J. McAuley, Junda Wu:

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning. 16847-16863 - Hyukkyu Kang, Injung Kim, Wook-Shin Han:

TRIAL: Token Relations and Importance Aware Late-interaction for Accurate Text Retrieval. 16864-16877 - Jin Jiang, Jianing Wang, Yuchen Yan, Yang Liu, Jianhua Zhu, Mengdi Zhang, Liangcai Gao:

Do Large Language Models excel in Complex Logical Reasoning with Formal Language? 16878-16903 - Junho Yoo, Youhyun Shin:

Fair or Framed? Political Bias in News Articles Generated by LLMs. 16904-16930 - Sihang Zeng, Kai Tian, Kaiyan Zhang, Yuru Wang, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, Bowen Zhou:

ReviewRL: Towards Automated Scientific Review with RL. 16931-16943 - Octavian Alexandru Trifan, Jason Lee Weber, Marc Titus Trifan, Alexandru Nicolau, Alexander V. Veidenbaum:

Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI. 16944-16957 - Terrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro, Yash Chandarana, Chirag Gupta:

Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies. 16958-16982 - Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan:

REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing. 16983-17000 - Chung-En Sun, Ge Yan, Tsui-Wei Weng:

ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models. 17001-17025 - Meng-Chen Wu, Si-Chi Chin, Tess Wood, Ayush Goyal, Narayanan Sadagopan:

Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional Framework. 17026-17061 - Yubo Xie, Chenkai Wang, Zongyang Ma, Fahui Miao:

Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation. 17062-17083 - Luyang Zhang, Shuaimin Li, Yishuo Li, Kunpeng Kang, Kaiyuan Zhang, Cong Wang, Wenpeng Lu:

RoDEval: A Robust Word Sense Disambiguation Evaluation Framework for Large Language Models. 17084-17115 - Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin:

PychoAgent: Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events. 17116-17134 - Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong:

Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning. 17135-17148 - Yu Zhang, Zhaoman Zhong, Huihui Lv:

Inter-sentence Context Modeling and Structure-aware Representation Enhancement for Conversational Sentiment Quadruple Extraction. 17149-17159 - Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin:

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards. 17160-17186 - Chenhao Huang, Ziyu Shen, Yicong Ren, Huiyuan Zheng, Jiazheng Zhang, Mingxu Chai, Ming Zhang, Shihan Dou, Fan Mo, Jie Shi, Tao Gui, Qi Zhang, Xuanjing Huang:

Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety. 17187-17210 - Yisheng Zhong, Yizhu Wen, Junfeng Guo, Mehran Kafai, Heng Huang, Hanqing Guo, Zhuangdi Zhu:

Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models. 17211-17224 - Bofu Dong, Pritesh Shah, Sumedh Sonawane, Tiyasha Banerjee, Erin Brady, Xinya Du, Ming Jiang:

SciEvent: Benchmarking Multi-domain Scientific Event Extraction. 17225-17255 - Sunhao Dai, Zhanshuo Cao, Wenjie Wang, Liang Pang, Jun Xu, See-Kiong Ng, Tat-Seng Chua:

Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations. 17256-17276 - Can Lin, Zhengwang Jiang, Ling Zheng, Qi Zhao, Yuhang Zhang, Qi Song, Wangqiu Zhou:

RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMs. 17277-17294 - Taisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka:

Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset. 17295-17313 - Jane Xing, Tianyi Niu, Shashank Srivastava:

Chameleon LLMs: User Personas Influence Chatbot Personality Shifts. 17314-17332 - Dylan Hutson, Daniel Vennemeyer, Aneesh Deshmukh, Justin Zhan, Tianyu Jiang:

GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models. 17333-17349 - Shang Liu, Yao Lu, Wenji Fang, Jing Wang, Zhiyao Xie:

SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language Models. 17350-17365 - Zhiyu Yang, Shuo Wang, Yukun Yan, Yang Deng:

Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors. 17366-17381 - Libo Zhang, Zhaoning Zhang, Xubaizhou, Rui Li, Zhiliang Tian, Songzhu Mei, Dongsheng Li:

Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference. 17382-17395 - Qidong Wang, Junjie Hu, Ming Jiang:

V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models. 17396-17420 - Alham Fikri Aji, Trevor Cohn:

LORAXBENCH: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages. 17421-17446 - Jingyan Shen, Jiarui Yao, Rui Yang, Yifan Sun, Feng Luo, Rui Pan, Tong Zhang, Han Zhao:

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning. 17447-17463 - Sangoh Lee, Sungho Park, Wook-Shin Han:

SAFE: Schema-Driven Approximate Distance Join for Efficient Knowledge Graph Querying. 17464-17489 - Xiwen Liang, Min Lin, Weiqi Ruan, Rongtao Xu, Yuecheng Liu, Jiaqi Chen, Bingqian Lin, Yuzheng Zhuang, Xiaodan Liang:

Structured Preference Optimization for Vision-Language Long-Horizon Task Planning. 17490-17515 - Jingheng Ye, Shen Wang, Deqing Zou, Yibo Yan, Kun Wang, Hai-Tao Zheng, Ruitong Liu, Zenglin Xu, Irwin King, Philip S. Yu, Qingsong Wen:

Position: LLMs Can be Good Tutors in English Education. 17516-17535 - Haobo Li, Zhaowei Wang, Jiachen Wang, Yueya Wang, Alexis Kai-Hon Lau, Huamin Qu:

CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting. 17536-17562 - Zhipeng Chen, Kun Zhou, Liang Song, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen:

Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models. 17563-17580 - Pranjal A. Chitale, Bishal Santra, Yashoteja Prabhu, Amit Sharma:

Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval. 17581-17617 - Ashutosh Bajpai, Tanmoy Chakraborty:

Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References? 17618-17636 - Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Yayue Deng, Jing Ma:

MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models. 17637-17659 - Yusuke Noro:

Multi-perspective Analysis of Large Language Model Domain Specialization: An Experiment in Accounting Audit Procedures Generation. 17660-17682 - Xingzuo Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yong Xu, Min Zhang:

Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent. 17683-17700 - Li Sun, Liu He, Shuyue Jia, Yangfan He, Chenyu You:

DocAgent: An Agentic Framework for Multi-Modal Long-Context Document Understanding. 17701-17716 - Xubin Ren, Chao Huang:

EasyRec: Simple yet Effective Language Models for Recommendation. 17717-17732 - Tianshi Zheng, Zheye Deng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Zihao Wang, Yangqiu Song:

From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery. 17733-17750 - Zhen Xiong, Yujun Cai, Zhecheng Li, Yiwei Wang:

Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs. 17751-17763 - Shichen Lu, Tongtian Yue, Longteng Guo, Handong Li, Xingjian He, Si Liu, Jing Liu:

ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding. 17764-17775 - Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Shuai Fan, Lu Chen, Kai Yu:

Alignment for Efficient Tool Calling of Large Language Models. 17776-17792 - Jiani Guo, Zuchao Li, Jie Wu, Qianren Wang, Yun Li, Lefei Zhang, Hai Zhao, Yu-Jiu Yang:

ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models. 17793-17812 - Md Ayon Mia, Akm Moshiur Rahman Mazumder, Khadiza Sultana Sayma, Md Fahim, Md. Tahmid Hasan Fuad, Muhammad Ibrahim Khan, AKMMahbubur Rahman:

BANMIME : Misogyny Detection with Metaphor Explanation on Bangla Memes. 17813-17839 - Yifan Lan, Yuanpu Cao, Weitong Zhang, Lu Lin, Jinghui Chen:

Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time. 17840-17865 - Ran Xu, Kaixin Ma, Wenhao Yu, Hongming Zhang, Joyce C. Ho, Carl Yang, Dong Yu:

Retrieval-augmented GUI Agents with Generative Guidelines. 17866-17875 - Chun Kang, Zhigu Qian, Zhen Fu, Jiaojiao Fu, Yangfan Zhou:

COAS2W: A Chinese Older-Adults Spoken-to-Written Transformation Corpus with Context Awareness. 17876-17895 - Xin Liu, Lu Wang:

Answer Convergence as a Signal for Early Stopping in Reasoning. 17896-17907 - Xin Liu, Lechen Zhang, Sheza Munir, Yiyang Gu, Lu Wang:

VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts. 17908-17925 - Simone Papicchio, Luca Cagliero, Paolo Papotti:

SQUAB: Evaluating LLM robustness to Ambiguous and Unanswerable Questions in Semantic Parsing. 17926-17946 - Auguste Poiroux, Gail Weiss, Viktor Kuncak, Antoine Bosselut:

Reliable Evaluation and Benchmarks for Statement Autoformalization. 17947-17969 - Jen-Tse Huang, Jiantong Qin, Jianping Zhang, Youliang Yuan, Wenxuan Wang, Jieyu Zhao:

VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models. 17970-17993 - Nannan Huang, Haytham M. Fayek, Xiuzhen Zhang:

Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions. 17994-18018 - Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao:

AI Sees Your Location - But With A Bias Toward The Wealthy World. 18019-18039 - Jinglin Chen, Qiwei Li, Zuchao Li, Baoyuan Qi, Guoming Liu, Haojun Ai, Hai Zhao, Ping Wang:

Faster In-Context Learning for LLMs via N-Gram Trie Speculative Decoding. 18040-18051 - Muhammad Farid Adilazuarda, Chen Cecilia Liu, Iryna Gurevych, Alham Fikri Aji:

From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs. 18052-18079 - Jinwoo Jeon, JunHyeok Oh, Hayeong Lee, Byung-Jun Lee:

Iterative Prompt Refinement for Safer Text-to-Image Generation. 18080-18096 - Peidong Wang, Ming Wang, Zhiming Ma, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song:

Language Models as Continuous Self-Evolving Data Engineers. 18097-18116 - Hua Cai, Shuang Zhao, Liang Zhang, Xuli Shen, Qing Xu, Weilin Shen, Zihao Wen, Tianke Ban:

Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference. 18117-18131 - Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Jungang Li, Jingyu Wang, Peijie Jiang, Aiwei Liu, Jia Liu, Xuming Hu:

Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios. 18132-18173 - Jiaxin Liu, Yixuan Tang, Yi Yang, Kar Yan Tam:

Evaluating and Aligning Human Economic Risk Preferences in LLMs. 18174-18188 - Mingxuan Xia, Zhijie Jiang, Haobo Wang, Junbo Zhao, Tianlei Hu, Gang Chen:

Ensembling Prompting Strategies for Zero-Shot Hierarchical Text Classification with Large Language Models. 18189-18208 - Eugene Jang, Kimin Lee, Jin-Woo Chung, Keuntae Park, Seungwon Shin:

Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers. 18209-18216 - Jiwen Zhang, Ya-Qi Yu, Minghui Liao, WenTao Li, Jihao Wu, Zhongyu Wei:

UI-Hawk: Unleashing the Screen Stream Understanding for Mobile GUI Agents. 18217-18236 - Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu:

UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging. 18237-18266 - Ming Li, Nan Zhang, Chenrui Fan, Hong Jiao, Yanbin Fu, Sydney Peters, Qingshu Xu, Robert Lissitz, Tianyi Zhou:

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory. 18267-18288 - Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Shuzheng Si, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Baobao Chang:

Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation. 18289-18308 - Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza:

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement. 18309-18326 - Kai Chen, Zihao He, Taiwei Shi, Kristina Lerman:

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models. 18327-18355 - Marija Sakota, Robert West:

Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction. 18356-18371 - Yeliang Xiu, Yongmei Liu:

MultiLogicNMR(er): A Benchmark and Neural-Symbolic Framework for Non-monotonic Reasoning with Multiple Extensions. 18372-18405 - Haijiang Liu, Qiyuan Li, Chao Gao, Yong Cao, Xiangyu Xu, Xun Wu, Daniel Hershcovich, Jinguang Gu:

Beyond Demographics: Enhancing Cultural Value Survey Simulation with Multi-Stage Personality-Driven Cognitive Reasoning. 18406-18428 - Ruobing Wang, Qiaoyu Tan, Yili Wang, Ying Wang, Xin Wang:

CrystalICL: Enabling In-Context Learning for Crystal Generation. 18429-18444 - Zhuowen Han, Xinwei Wu, Dan Shi, Renren Jin, Deyi Xiong:

Towards a Unified Paradigm of Concept Editing in Large Language Models. 18445-18461 - Kaiyan Chang, Yonghao Shi, Chenglong Wang, Hang Zhou, Chi Hu, Xiaoqian Liu, Yingfeng Luo, Yuan Ge, Tong Xiao, JingBo Zhu:

Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models. 18462-18477 - Junzhuo Li, Bo Wang, Xiuze Zhou, Xuming Hu:

Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation. 18478-18493 - Zhuozhuo Tu, Cheng Chen, Yuxuan Du:

RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion Models. 18494-18507 - Luisa Geiger, Mareike Hartmann, Michael Sullivan, Alexander Koller:

Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions. 18508-18525 - Zhen Zhang, Yifan Yang, Kai Zhen, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang:

MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models. 18526-18543 - Michael Sullivan, Mareike Hartmann, Alexander Koller:

Procedural Environment Generation for Tool-Use Agents. 18544-18562 - Yanling Wang, Haoyang Li, Hao Zou, Jing Zhang, Xinlei He, Qi Li, Ke Xu:

FacLens: Transferable Probe for Foreseeing Non-Factuality in Fact-Seeking Question Answering of Large Language Models. 18563-18582 - Bowen Chen, Zhao Wang, Shingo Takamatsu:

OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent. 18583-18601 - Guangfu Guo, Xiaoqian Lu, Yue Feng:

Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents. 18602-18616 - Asif Hanif, Maha Tufail Agro, Fahad Shamshad, Karthik Nandakumar:

TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models. 18617-18633 - Sourav Das, Kripabandhu Ghosh:

Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification. 18634-18656 - Davide Ghilardi, Federico Belotti, Marco Molinari, Tao Ma, Matteo Palmonari:

Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups. 18657-18677 - Lei Hei, Tingjing Liao, Peiyingxin, Yiyang Qi, Jiaqi Wang, Ruiting Li, Feiliang Ren:

Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction. 18678-18693 - Zhijun Xu, Siyu Yuan, Yiqiao Zhang, Jingyu Sun, Tong Zheng, Deqing Yang:

PunMemeCN: A Benchmark to Explore Vision-Language Models' Understanding of Chinese Pun Memes. 18694-18710 - Kaikai An, Li Sheng, Ganqu Cui, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang:

UltraIF: Advancing Instruction Following from the Wild. 18711-18726 - Hongyi Tang, Zhihao Zhu, Yi Yang:

Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework. 18727-18740 - Boyi Zhang, Zhuo Liu, Hangfeng He:

TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering. 18741-18762 - Jan Fillies, Michael Peter Hoffmann, Rebecca Reichel, Roman Salzwedel, Sven Bodemer, Adrian Paschke:

Mapping Toxic Comments Across Demographics: A Dataset from German Public Broadcasting. 18763-18779 - Danielle Cohen, Yoni Halpern, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan, Anatoly Efros:

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition. 18780-18799 - Tamer Ghattas, Michael Hassid, Roy Schwartz:

On Pruning State-Space LLMs. 18800-18814 - Xin Zhang, Guang-Ze Chen, Shuzhen Li, Zhulin Liu, C. L. Philip Chen, Tong Zhang:

An Orthogonal High-Rank Adaptation for Large Language Models. 18815-18833 - WenJie Zhou, Bohan Wang, Wei Chen, Xueqi Cheng:

BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training. 18834-18849 - Noy Sternlicht, Ariel Gera, Roy Bar-Haim, Tom Hope, Noam Slonim:

Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation. 18850-18869 - Mengyue Wang, Shuo Chen, Kristian Kersting, Volker Tresp, Yunpu Ma:

METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding. 18870-18884 - Yingqi Fan, Anhao Zhao, Jinlan Fu, Junlong Tong, Hui Su, Yijie Pan, Wei Zhang, Xiaoyu Shen:

VisiPruner: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs. 18885-18902 - Song Jin, Juntian Zhang, Yuhan Liu, Xun Zhang, Yufei Zhang, Guojun Yin, Fei Jiang, Wei Lin, Rui Yan:

Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems. 18903-18920 - Qin Chen, Yuanyi Ren, Xiaojun Ma, Mugeng Liu, Shi Han, Dongmei Zhang:

SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection. 18921-18939 - Amit Giloni, Chiara Picardi, Roy Betser, Shamik Bose, Aishvariya Priya Rathina Sabapathy, Roman Vainshtein:

CAIR: Counterfactual-based Agent Influence Ranker for Agentic AI Workflows. 18940-18966 - Yiming Du, Yifan Xiang, Bin Liang, Dahua Lin, Kam-Fai Wong, Fei Tan:

ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning. 18967-18985 - Yoav Gur-Arieh, Clara Suslik, Yihuai Hong, Fazl Barez, Mor Geva:

Precise In-Parameter Concept Erasure in Large Language Models. 18986-19006 - Jianfei Ma, Zhaoxin Feng, Emmanuele Chersoni, Huacheng Song, Ziqi Zhang:

PhonoThink: Improving Large Language Models' Reasoning on Chinese Phonological Ambiguities. 19007-19022 - Jimin Lee, Ingeol Baek, Byeongjeong Kim, Hyunkyung Bae, Hwanhee Lee:

SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL. 19023-19035 - Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu:

ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance. 19036-19054 - Alejandro Cuevas, Saloni Dash, Bharat Kumar Nayak, Dan Vann, Madeleine I. G. Daepp:

Anecdoctoring: Automated Red-Teaming Across Language and Place. 19055-19074 - Salma Kharrat, Fares Fourati, Marco Canini:

ACING: Actor-Critic for Instruction Learning in Black-Box LLMs. 19075-19102 - Sourabrata Mukherjee, Atharva Mehta, Sougata Saha, Akhil Arora, Monojit Choudhury:

Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi. 19103-19126 - Hanyin Wang, Chufan Gao, Qiping Xu, Bolun Liu, Guleid Hussein, Hariprasad Reddy Korsapati, Mohamad El Labban, Kingsley Iheasirim, Mohamed Hassan, Gokhan Anil, Brian Bartlett, Jimeng Sun:

Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain Expertise. 19127-19147 - Zejiang He, Jingyuan Huang, Menglong Lu, Zhen Huang, Shanshan Liu, Zhiliang Tian, Dong Sheng Li:

GCML: Gradient Coherence Guided Meta-Learning for Cross-Domain Emerging Topic Rumor Detection. 19148-19162 - Neh Majmudar, Elena Filatova:

Can LLMs Generate and Solve Linguistic Olympiad Puzzles? 19163-19200 - Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang:

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning. 19201-19230 - Zhihui Chen, Kai He, Yucheng Huang, Yunxiao Zhu, Mengling Feng:

DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains. 19231-19253 - Qingkai Min, Zitian Qu, Qipeng Guo, Xiangkun Hu, Zheng Zhang, Yue Zhang:

Multi-Document Event Extraction Using Large and Small Language Models. 19254-19285 - Zike Yuan, Ming Liu, Hui Wang, Bing Qin:

MA-GTS: A Multi-Agent Framework for Solving Complex Graph Problems in Real-World Applications. 19286-19304 - Weiqiao Shan, Yuang Li, Yuhao Zhang, Yingfeng Luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, JingBo Zhu:

Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders. 19305-19320 - Runze Li, Siyu Wu, Jun Wang, Wei Zhang:

CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language Models. 19321-19334 - Chenlin Liu, Minghui Fang, Patrick Zhang, Wei Zhou, Jie Gao, Jiqing Han:

Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets. 19335-19353 - Yuyang Wu, Jinhui Ye, Shuhao Zhang, Lu Dai, Yonatan Bisk, Olexandr Isayev:

MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Correction. 19354-19371 - Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li:

Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities. 19372-19388 - Chaojun Nie, Jun Zhou, Guanxiang Wang, Shisong Wu, Zichen Wang:

Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation. 19389-19406 - Jian Zhang, Junyi Guo, Junyi Yuan, Huanda Lu, Yanlin Zhou, Fangyu Wu, Qiufeng Wang, Dongming Lu:

LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval. 19407-19417 - Nicholas Deas, Kathleen McKeown:

Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions. 19418-19444 - Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao, Min Zhang:

Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization. 19445-19461 - Hongxin Ding, Yue Fang, Runchuan Zhu, Xinke Jiang, Jinyang Zhang, Yongxin Xu, Weibin Liao, Xu Chu, Junfeng Zhao, Yasha Wang:

3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection. 19462-19484 - Kirolos Ataallah, Eslam Mohamed Bakr, Mahmoud Ahmed, Chenhui Gou, Khushbu Pahwa, Jian Ding, Mohamed Elhoseiny:

InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows. 19485-19512 - Yihuai Hong, Lei Yu, Haiqin Yang, Shauli Ravfogel, Mor Geva:

Intrinsic Test of Unlearning Using Parametric Knowledge Traces. 19513-19535 - Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Antonie Lin, Mohammad Rastegari, Mahyar Najibi:

Speculative Streaming: Efficient and Scalable Speculative Decoding with Multi-Stream Attention. 19536-19559 - Yujie Wang, Yunwei Zhao, Jing Yang, Han Han, Shiguang Shan, Jie Zhang:

Evaluating Cognitive-Behavioral Fixation via Multimodal User Viewing Patterns on Social Media. 19560-19572 - Mario Sanz-Guerrero, Minh Duc Bui, Katharina von der Wense:

Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs. 19573-19583 - Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang:

VocalNet: Speech LLMs with Multi-Token Prediction for Faster and High-Quality Generation. 19584-19601 - Yuyi Huang, Runzhe Zhan, Lidia S. Chao, Ailin Tao, Derek F. Wong:

Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety. 19602-19616 - Jiaxuan Zhao, Naibin Gu, Yuchen Feng, Xiyu Liu, Peng Fu, Zheng Lin, Weiping Wang:

CBP-Tuning: Efficient Local Customization for Black-box Large Language Models. 19617-19630 - Ahmed Karim, Qiao Wang, Zheng Yuan:

Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment. 19631-19636 - Georgios Chochlakis, Peter Wu, Arjun Bedi, Marcus Ma, Kristina Lerman, Shrikanth Narayanan:

Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts. 19637-19656 - Sofia Jamil, Kotla Sai Charan, Sriparna Saha, Koustava Goswami, K. J. Joseph:

Do It Yourself (DIY): Modifying Images for Poems in a Zero-Shot Setting Using Weighted Prompt Manipulation. 19657-19665 - Haozhe Zhao, Shuzheng Si, Liang Chen, Yichi Zhang, Maosong Sun, Baobao Chang, Minjia Zhang:

Looking Beyond Text: Reducing Language Bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance. 19666-19690 - Lubna Zahan Lamia, Mabsur Fatin Bin Hossain, Md. Mosaddek Khan:

Who Holds the Pen? Caricature and Perspective in LLM Retellings of History. 19691-19710 - Minxuan Lv, Zhenpeng Su, Leiyu Pan, Yizhe Xiong, Zijia Lin, Hui Chen, Wei Zhou, Jungong Han, Guiguang Ding, Wenwu Ou, Di Zhang, Kun Gai, Songlin Hu:

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs. 19711-19722 - Peilin Wu, Mian Zhang, Xinlu Zhang, Xinya Du, Zhiyu Chen:

Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty. 19723-19734 - Francesca Padovani, Jaap Jumelet, Yevgen Matusevych, Arianna Bisazza:

Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models. 19735-19756 - Nicolas Audinet de Pieuchon, Adel Daoud, Connor Thomas Jerzak, Moa Johansson, Richard Johansson:

Benchmarking Debiasing Methods for LLM-based Parameter Estimates. 19757-19772 - Jaisidh Singh, Diganta Misra, Boris Knyazev, Antonio Orvieto:

(Almost) Free Modality Stitching of Foundation Models. 19773-19789 - Tingqiao Xu, Ziru Zeng, Jiayu Chen:

VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data. 19790-19809 - Rushi Wang, Jiateng Liu, Cheng Qian, Yifan Shen, Yanzhou Pan, Zhaozhuo Xu, Ahmed Abbasi, Heng Ji, Denghui Zhang:

Rescorla-Wagner Steering of LLMs for Undesired Behaviors over Disproportionate Inappropriate Context. 19810-19845 - Zhengkang Zhang, Zhongqing Wang, Guodong Zhou:

Exploring Artificial Image Generation for Stance Detection. 19846-19861 - Jonathan Pofcher, Christopher M. Homan, Randall Sell, Ashiqur R. KhudaBukhsh:

Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech. 19862-19888 - Andong Hua, Kenan Tang, Chenhe Gu, Jindong Gu, Eric Wong, Yao Qin:

Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs. 19889-19899 - Wonbin Kweon, SeongKu Kang, Runchu Tian, Pengcheng Jiang, Jiawei Han, Hwanjo Yu:

Topic Coverage-based Demonstration Retrieval for In-Context Learning. 19900-19912 - Linlu Qiu, Cedegao E. Zhang, Joshua B. Tenenbaum, Yoon Kim, Roger P. Levy:

On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts. 19913-19935 - Ali Sarosh Bangash, Krish Veera, Ishfat Abrar Islam, Raiyan Abdul Baten:

MuseScorer: Idea Originality Scoring At Scale. 19936-19954 - João Fonseca, Andrew Bell, Julia Stoyanovich:

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs. 19955-19969 - Debrup Das, Seán Ó Nualláin, Razieh Rahimi:

RaDeR: Reasoning-aware Dense Retrieval Models. 19970-19997 - Bhuiyan Sanjid Shafique, Ashmal Vayani, Muhammad Maaz, Hanoona Abdul Rasheed, Dinura Dissanayake, Mohammed Irfan Kurpath, Yahya Hmaiti, Go Inoue, Jean Lahoud, Md. Safirur Rashid, Shadid Intisar Quasem, Maheen Fatima, Franco Vidal, Mykola Maslych, Ketan Pravin More, Sanoojan Baliah, Hasindri Watawana, Yuhao Li, Fabian Farestam, Leon Schaller, Roman Tymtsiv, Simon Weber, Hisham Cholakkal, Ivan Laptev, Shin'ichi Satoh, Michael Felsberg, Mubarak Shah, Salman H. Khan, Fahad Shahbaz Khan:

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model. 19998-20022 - Faramarz Farhangian, Leandro Augusto Ensina, George D. C. Cavalcanti, Rafael M. O. Cruz:

DRES: Fake news detection by dynamic representation and ensemble selection. 20023-20041 - Rashin Rahnamoun, Mehrnoush Shamsfard:

A Graph-Theoretical Framework for Analyzing the Behavior of Causal Language Models. 20042-20073 - Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi:

Membership and Memorization in LLM Knowledge Distillation. 20074-20084 - Masahiro Kaneko, Alham Fikri Aji, Timothy Baldwin:

Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models. 20085-20104 - Chihiro Taguchi, Seiji Maekawa, Nikita Bhutani:

Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-k. 20105-20130 - Chihiro Taguchi, Seng Mai, Keita Kurabe, Yusuke Sakai, Georgina Agyei, Soudabeh Eslami, David Chiang:

Languages Still Left Behind: Toward a Better Multilingual Machine Translation Benchmark. 20131-20143 - César Guerra-Solano, Zhuochun Li, Xiang Lorraine Li:

Think Globally, Group Locally: Evaluating LLMs Using Multi-Lingual Word Grouping Games. 20144-20165 - Renjie Pi, Kehao Miao, Li Peihang, Runtao Liu, Jiahui Gao, Jipeng Zhang, Xiaofang Zhou:

Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models. 20166-20180 - Renjie Pi, Haoping Bai, Qibin Chen, Xiaoming Simon Wang, Jiulong Shan, Xiaojiang Liu, Meng Cao:

MR. Judge: Multimodal Reasoner as a Judge. 20181-20205 - Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram:

MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines. 20206-20223 - Wafa Al Ghallabi, Ritesh Thawkar, Sara Ghaboura, Ketan Pravin More, Omkar Thawakar, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer:

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs. 20224-20244 - Joshua Ong Jun Leang, Aryo Pradipta Gema, Shay B. Cohen:

CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning. 20245-20274 - Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel J. Candès, Tatsunori Hashimoto:

s1: Simple test-time scaling. 20275-20321 - Mohammed Fayiz Parappan, Ricardo Henao:

Learning Subjective Label Distributions via Sociocultural Descriptors. 20322-20338 - Gaoxiang Luo, Aryan Deshwal:

COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier. 20339-20352 - Yohei Seki, Hakusen Shu, Anaïs Lhuissier, Hanwool Lee, Juyeon Kang, Min-Yuh Day, Chung-Chi Chen:

ML-Promise: A Multilingual Dataset for Corporate Promise Verification. 20353-20366 - Vera Neplenbroek, Arianna Bisazza, Raquel Fernández:

Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization. 20367-20400 - Yen-Ju Lu, Thomas Thebaud, Laureano Moro-Velázquez, Najim Dehak, Jesús Villalba:

Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation. 20401-20423 - Di Wu, Seth Aycock, Christof Monz:

Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation. 20424-20440 - Ingeol Baek, Hwan Chang, Sunghyun Ryu, Hwanhee Lee:

How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads. 20441-20453 - Lucas Resck, Isabelle Augenstein, Anna Korhonen:

Explainability and Interpretability of Multilingual Large Language Models: A Survey. 20454-20486 - Youngwoo Kim, Himanshu Beniwal, Steven L. Johnson, Thomas Hartvigsen:

Decoding the Rule Book: Extracting Hidden Moderation Criteria from Reddit Communities. 20487-20498 - Vatsal Malaviya, Agneet Chatterjee, Maitreya Patel, Yezhou Yang, Chitta Baral:

AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models. 20499-20516 - Wafa Aissa, Thibault Bañeras-Roux, Elodie Vanzeveren, Lingyun Gao, Rodrigo Wilkens, Thomas François:

Assessing French Readability for Adults with Low Literacy: A Global and Local Perspective. 20517-20539 - Joohyung Yun, Doyup Lee, Wook-Shin Han:

LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval. 20540-20559 - Tanmay Parekh, Kartik Mehta, Ninareh Mehrabi, Kai-Wei Chang, Nanyun Peng:

DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning. 20560-20582 - Tanmay Parekh, Yuxuan Dong, Lucas Bandarkar, Artin Kim, I-Hung Hsu, Kai-Wei Chang, Nanyun Peng:

SNaRe: Domain-aware Data Generation for Low-Resource Event Detection. 20583-20604 - Zheyuan Yang, Lyuhao Chen, Arman Cohan, Yilun Zhao:

Table-R1: Inference-Time Scaling for Table Reasoning Tasks. 20605-20624 - Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan:

LimRank: Less is More for Reasoning-Intensive Information Reranking. 20625-20639 - Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long T. Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, Hamid Palangi:

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving. 20640-20666 - Shubham Gandhi, Atharva Naik, Yiqing Xie, Carolyn P. Rosé:

An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation. 20667-20686 - Anton Lavrouk, Tarek Naous, Alan Ritter, Wei Xu:

What are Foundation Models Cooking in the Post-Soviet World? 20687-20709 - Tianshi Zheng, Cheng Jiayang, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See:

LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning. 20710-20731 - Han Liu, Ruoyao Wen, Srijith Nair, Jia Liu, Wenjing Lou, Chongjie Zhang, William Yeoh, Yevgeniy Vorobeychik, Ning Zhang:

EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models. 20732-20746 - Boxiang Ma, Ru Li, Yuanlong Wang, Hongye Tan, Xiaoli Li:

Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition? 20747-20763 - Hong Zhang, Feng Zhao, Ruilin Zhao, Cheng Yan, Kangzheng Liu:

Priority on High-Quality: Selecting Instruction Data via Consistency Verification of Noise Injection. 20764-20776 - Xin Gao, Ruiyi Zhang, Daniel Du, Saurabh Mahindre, Sai Ashish Somayajula, Pengtao Xie:

Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs. 20777-20788 - YiQiu Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang:

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models. 20789-20808 - Hyeonseok Moon, Seongtae Hong, Jaehyung Seo, Heuiseok Lim:

Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models. 20809-20823 - Yuanchang Luo, Daimeng Wei, Shaojun Li, Hengchao Shang, Jiaxin Guo, Zongyao Li, Zhanglin Wu, Xiaoyu Chen, Zhiqiang Rao, Jinlong Yang, Hao Yang:

Generative Annotation for ASR Named Entity Correction. 20824-20835 - Younghun Lee, Dan Goldwasser:

SOLAR: Towards Characterizing Subjectivity of Individuals through Modeling Value Conflicts and Trade-offs. 20836-20851 - Kang He, Kaushik Roy:

LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models. 20852-20881 - Michiharu Yamashita, Thanh Tran, Delvin Ce Zhang, Dongwon Lee:

Unmasking Fake Careers: Detecting Machine-Generated Career Trajectories via Multi-layer Heterogeneous Graphs. 20882-20897 - Zhihua Ban, Haotian Ma, Siheng Zhang, Shengyu Liu, Xichen Chen, Ming Yang:

GAP: a Global Adaptive Pruning Method for Large Language Models. 20898-20903 - Haojin Wang, Zining Zhu, Freda Shi:

Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce. 20904-20917 - Feng Zhao, Ruoyu Chai, Kangzheng Liu, Xianggan Liu:

LGA: LLM-GNN Aggregation for Temporal Evolution Attribute Graph Prediction. 20918-20929 - Tao Zou, Xinghua Zhang, Haiyang Yu, Minzheng Wang, Fei Huang, Yongbin Li:

EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models. 20930-20953 - Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi:

Tool Preferences in Agentic LLMs are Unreliable. 20954-20969 - Yu Liu, Yanan Cao, Xixun Lin, Yanmin Shang, Shi Wang, Shirui Pan:

Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-Tuning. 20970-20984 - Joongmin Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo, Heuiseok Lim:

MultiDocFusion : Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents. 20985-21004 - Qiang Liu, Xinlong Chen, Yue Ding, Bowen Song, Weiqiang Wang, Shu Wu, Liang Wang:

Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models. 21005-21021 - Huy Nghiem, Phuong-Anh Nguyen-Le, John Prindle, Rachel Rudinger, Hal Daumé III:

'Rich Dad, Poor Lad': How do Large Language Models Contextualize Socioeconomic Factors in College Admission ? 21022-21056 - Licheng Pan, Yongqi Tong, Xin Zhang, Xiaolu Zhang, Jun Zhou, Zhixuan Chu:

Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary. 21057-21075 - Xinpan Yuan, Mingzhu Huang, Liujie Hua, Jianuo Ju, Xu Zhang:

MMAG: Multimodal Learning for Mucus Anomaly Grading in Nasal Endoscopy via Semantic Attribute Prompting. 21076-21086 - Linyao Yang, Jian-Tao Huang, Yafei Lu, Zhenhui Jessie Li, Guirong Xue:

The Emperor's New Reasoning: Format Imitation Overshadows Genuine Mathematical Understanding in SFT. 21087-21100 - Lang Cao, Yingtian Zou, Chao Peng, Renhong Chen, Wu Ning, Yitong Li:

Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning. 21101-21118 - Cai Ke, Yiming Du, Bin Liang, Yifan Xiang, Lin Gui, Zhongyang Li, Baojun Wang, Yue Yu, Hui Wang, Kam-Fai Wong, Ruifeng Xu:

Flexibly Utilize Memory for Long-Term Conversation via a Fragment-then-Compose Framework. 21119-21136 - Tianyu Zhang, Xinyu Wang, Lu Li, Zhenghan Tai, Jijun Chi, Jingrui Tian, Hailin He, Suyuchen Wang:

STRICT: Stress-Test of Rendering Image Containing Text. 21137-21150 - Chung-Nan Tsai, Xin Wang, Cheng-Hsiung Lee, Ching-Sheng Lin:

A Sequential Multi-Stage Approach for Code Vulnerability Detection via Confidence- and Collaboration-based Decision Making. 21151-21157 - Zhaoyi Joey Hou, Adriana Kovashka, Xiang Lorraine Li:

Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity. 21158-21177 - Wenjie Hua, Hoang H. Nguyen, Gangyan Ge:

BIRD: Bronze Inscription Restoration and Dating. 21178-21190 - Lei Jiang, Zixun Zhang, Yuting Zeng, Chunzhao Xie, Tongxuan Liu, Zhen Li, Lechao Cheng, Xiaohua Xu:

DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models. 21191-21204 - Suyuchen Wang, Jinlin Wang, Xinyu Wang, Shiqi Li, Xiangru Tang, Sirui Hong, Xiao-Wen Chang, Chenglin Wu, Bang Liu:

Improving Context Fidelity via Native Retrieval-Augmented Reasoning. 21205-21218 - Shehzeen Samarah Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Roy Fejgin, Mikyas T. Desta, Rafael Valle, Jason Li:

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance. 21219-21234 - Soumya Sanyal, Tianyi Xiao, Xiang Ren:

Mixing Inference-time Experts for Enhancing LLM Reasoning. 21235-21249 - Xubo Qin, Jun Bai, Jiaqi Li, Zixia Jia, Zilong Zheng:

Reinforced Query Reasoners for Reasoning-intensive Retrieval Tasks. 21250-21263 - Wei Wu, Zhuoshi Pan, Kun Fu, Chao Wang, Liyi Chen, Yunchu Bai, Tianfu Wang, Zheng Wang, Hui Xiong:

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection. 21264-21281 - Siyu Yan, Long Zeng, Xuecheng Wu, Chengcheng Han, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo:

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models. 21282-21303 - Sen Yang, Yu Bao, Yu Lu, Jiajun Chen, Shujian Huang, Shanbo Cheng:

EnAnchored-X2X: English-Anchored Optimization for Many-to-Many Translation. 21304-21317 - Jianshuo Dong, Yutong Zhang, Liu Yan, Zhenyu Zhong, Tao Wei, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu:

"I've Decided to Leak": Probing Internals Behind Prompt Leakage Intents. 21318-21348 - Yi Han, Yuanxing Liu, Weinan Zhang, Ting Liu:

Nullspace Disentanglement for Red Teaming Language Models. 21349-21365 - Sijie Mai, Shiqin Han, Haifeng Hu:

Supervised Attention Mechanism for Low-quality Multimodal Data. 21366-21386 - Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Zhimeng Guo, Shijie Zhou, Shuyue Hu, Vasant G. Honavar:

Reinforcement Learning for Large Language Models via Group Preference Reward Shaping. 21387-21400 - Dhananjaya Gowda, Seoha Song, Harshith Goka, Junhyun Lee:

zFLoRA: Zero-Latency Fused Low-Rank Adapters. 21401-21418 - Mihir Parmar, Palash Goyal, Xin Liu, Yiwen Song, Mingyang Ling, Chitta Baral, Hamid Palangi, Tomas Pfister:

PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving. 21419-21433 - Jinsung Kim, Seonmin Koo, Heuiseok Lim:

Semantic Inversion, Identical Replies: Revisiting Negation Blindness in Large Language Models. 21434-21471 - Hyuk Namgoong, Jeesu Jung, Hyeonseok Kang, Yohan Lee, Sangkeun Jung:

AMACE: Automatic Multi-Agent Chart Evolution for Iteratively Tailored Chart Generation. 21472-21487 - Jianguo Zhang, Thai Hoang, Ming Zhu, Zuxin Liu, Shiyu Wang, Tulika Awalgaonkar, Akshara Prabhakar, Haolin Chen, Weiran Yao, Zhiwei Liu, Juntao Tan, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong:

ActionStudio: A Lightweight Framework for Data and Training of Large Action Models. 21488-21502 - Seongmin Lee, Aeree Cho, Grace C. Kim, Shengyun Peng, Mansi Phute, Duen Horng Chau:

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety. 21503-21534 - Sohee Kim, Soohyun Ryu, Joonhyung Park, Eunho Yang:

Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens. 21535-21557 - Abhinav Arabelly, Jagrut Nemade, Robert D. Nowak, Jifan Zhang:

Improving Task Diversity in Label Efficient Supervised Finetuning of LLMs. 21558-21570 - Xing Fu, Haozhen Li, Bichen Wang, Hao Yang, Yanyan Zhao, Bing Qin:

Look Beyond Feeling: Unveiling Latent Needs from Implicit Expressions for Proactive Emotional Support. 21571-21598 - Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han:

s3: You Don't Need That Much Data to Train a Search Agent via RL. 21599-21617 - Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan:

FuseChat: Knowledge Fusion of Chat Models. 21618-21642 - Yukun Zhang, Xueqing Zhou:

Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers. 21643-21663 - Nurit Cohen-Inger, Yehonatan Elisha, Bracha Shapira, Lior Rokach, Seffi Cohen:

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon. 21664-21677 - Jisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, Richeng Xuan, Taeuk Kim:

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs. 21678-21699 - Haihua Xie, Yinzhu Cheng, Yaqing Wang, Miao He, Mingming Sun:

RD-MCSA: A Multi-Class Sentiment Analysis Approach Integrating In-Context Classification Rationales and Demonstrations. 21700-21723 - Heekyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan:

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint. 21724-21737 - Gihun Cho, Seunghyun Jang, Hanbin Ko, Inhyeok Baek, Chang Min Park:

CREPE: Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error Counts. 21738-21755 - Jihee Kim, Subeen Park, Hakyung Lee, YongTaek Lim, Hyo-won Suh, Kyungwoo Song:

TIDES: Technical Information Discovery and Extraction System. 21756-21772 - Wenxuan Wang, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu:

Learning to Ask: When LLM Agents Meet Unclear Instruction. 21773-21784 - Yuchi Wang, Yishuo Cai, Shuhuai Ren, Sihan Yang, Linli Yao, Yuanxin Liu, Yuanxing Zhang, Pengfei Wan, Xu Sun:

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction. 21785-21804 - Xuhui Zheng, Kang An, Ziliang Wang, Yuhang Wang, Yichao Wu:

StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization. 21805-21830 - Yanshuo Wang, Yanghao Zhou, Yukang Lin, Haoxing Chen, Jin Zhang, Wentao Zhu, Jie Hong, Xuesong Li:

Dynamic Model-Bank Test-Time Adaptation for Automatic Speech Recognition. 21831-21841 - Wei Huang, Anda Cheng, Yinggui Wang:

Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning. 21842-21856 - Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim:

Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models. 21857-21869 - Omkar Gurjar, Agam Goyal, Eshwar Chandrasekharan:

ArgCMV: An Argument Summarization Benchmark for the LLM-era. 21870-21883 - Honghao Fu, Junlong Ren, Qi Chai, Deheng Ye, Yujun Cai, Hao Wang:

VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft. 21884-21898 - Xuelin Li, Xiangqi Jin, Linfeng Zhang:

GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction. 21899-21909 - Wei Liu, Michael Strube:

Joint Modeling of Entities and Discourse Relations for Coherence Assessment. 21910-21926 - Jun Bai, Minghao Tong, Yang Liu, Zixia Jia, Zilong Zheng:

Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs. 21927-21942 - An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, Weidong Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-Zhong Xu:

HMoE: Heterogeneous Mixture of Experts for Language Modeling. 21943-21957 - Yaoyao Qian, Yifan Zeng, Yuchao Jiang, Chelsi Jain, Huazheng Wang:

The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking. 21958-21968 - Hailin Hao, Elsi Kaiser:

Uniform Information Density and Syntactic Reduction: Revisiting *that*-Mentioning in English Complement Clauses. 21969-21983 - Yujin Kang, Park Seong Woo, Yoon-Sik Cho:

GRIT: Guided Relational Integration for Efficient Multi-Table Understanding. 21984-21997 - Yiming Zhang, Siyue Zhang, Junbo Zhao, Chen Zhao:

RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering. 21998-22012 - Lorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, Jordan Lee Boyd-Graber:

Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering. 22013-22054 - Yizhou Ying, Geng Zhang, Cui Danxin, Chengyu Du, Guanglei Yue, Sihang Jiang, Jiaqing Liang, Yifei Fu, Hailin Hu, Yanghua Xiao:

Data-Efficient Selection via Grammatical Complexity in Continual Pre-training of Domain-Specific LLMs. 22055-22069 - Guangyu Xie, Yice Zhang, Jianzhu Bao, Qianlong Wang, Yang Sun, Bingbing Wang, Ruifeng Xu:

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models. 22070-22091 - Huy Quang Dao, Lizi Liao:

One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues. 22092-22116 - Ponhvoan Srey, Xiaobao Wu, Anh Tuan Luu:

Unsupervised Hallucination Detection by Inspecting Reasoning Processes. 22117-22129 - Yi Feng, Chuanyi Li, Jiatong He, Zhenyu Hou, Vincent Ng:

Multimodal Neural Machine Translation: A Survey of the State of the Art. 22130-22147 - Magdalena Król, Aleksander Smywinski-Pohl, Zbigniew Kaleta, Pawel Lewkowicz:

Lemmatization of Polish Multi-word Expressions. 22148-22157 - Yice Zhang, Guangyu Xie, Jingjie Lin, Jianzhu Bao, Qianlong Wang, Xi Zeng, Ruifeng Xu:

Targeted Distillation for Sentiment Analysis. 22158-22181 - Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, Chengwei Pan, Minlie Huang, Lei Sha:

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak. 22182-22194 - Zicheng Zhou, Min Huang, Qinghai Miao:

Rank-Awareness and Angular Constraints: A New Perspective on Learning Sentence Embeddings from NLI Data. 22195-22209 - Qianrui Zhou, Hua Xu, Yifan Wang, Xinzhi Dong, Hanlei Zhang:

LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition. 22210-22226 - Burak Satar, Zhixin Ma, Patrick Amadeus Irawan, Wilfried A. Mulyawan, Jing Jiang, Ee-Peng Lim, Chong-Wah Ngo:

Seeing Culture: A Benchmark for Visual Reasoning and Grounding. 22227-22243 - Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu:

GRADA: Graph-based Reranking against Adversarial Documents Attack. 22244-22266 - Yehang Zhang, Xinli Xu, Xiaojie Xu, Doudou Zhang, Li Liu, Ying-Cong Chen:

Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis. 22267-22282 - Kaiyuan Zhang, Qian Liu, Luyang Zhang, Chaoqun Zheng, Shuaimin Li, Bing Xu, Muyun Yang, Xinxiao Qiao, Wenpeng Lu:

MADAWSD: Multi-Agent Debate Framework for Adversarial Word Sense Disambiguation. 22283-22302 - Juri Opitz, Lucas Möller, Andrianos Michail, Sebastian Padó, Simon Clematide:

Interpretable Text Embeddings and Text Similarity Explanation: A Survey. 22303-22319 - Jianyuan Zhong, Zeju Li, Zhijian Xu, Xiangyu Wen, Qiang Xu:

Dyve: Thinking Fast and Slow for Dynamic Process Verification. 22320-22333 - Soda Marem Lo, Silvia Casola, Erhan Sezerer, Valerio Basile, Franco Sansonetti, Antonio Uva, Davide Bernardi:

PERSEVAL: A Framework for Perspectivist Classification Evaluation. 22334-22359 - Yuto Harada, Yusuke Yamauchi, Yusuke Oda, Yohei Oseki, Yusuke Miyao, Yu Takagi:

Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality. 22360-22381 - Ujjwal Sharma, Pushpak Bhattacharyya:

IndiGEC: Multilingual Grammar Error Correction for Low-Resource Indian Languages. 22382-22396 - Giorgos Filandrianos, Angeliki Dimitriou, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou:

Bias Beware: The Impact of Cognitive Biases on LLM-Driven Product Recommendations. 22397-22426 - Jie Zhang, Changzai Pan, Sishi Xiong, Kaiwen Wei, Yu Zhao, Xiangyu Li, Jiaxin Peng, Xiaoyan Gu, Jian Yang, Wenhan Chang, Zhenhe Wu, Jiang Zhong, Shuangyong Song, Xuelong Li:

T2R-BENCH: A Benchmark for Real World Table-to-Report Task. 22427-22451 - Zifeng Ding, Sikuan Yan, Moy Yuan, Xianglong Hu, Fangru Lin, Andreas Vlachos:

TCP: a Benchmark for Temporal Constraint-Based Planning. 22452-22475 - Felix Stahlberg, Shankar Kumar:

The Role of Outgoing Connection Heterogeneity in Feedforward Layers of Large Language Models. 22476-22484 - Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, Dinesh Manocha:

Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents. 22485-22508 - Lautaro Estienne, Gabriel Ben Zenou, Nona Naderi, Jackie CK Cheung, Pablo Piantanida:

Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog. 22509-22523 - Qiwei Peng, Yekun Chai, Anders Søgaard:

Understanding Subword Compositionality of Large Language Models. 22524-22535 - Zhipeng Yang, Junzhuo Li, Siyu Xia, Xuming Hu:

Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs. 22536-22564 - Viktor Hangya, Fabian Küch, Darina Gold:

From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models. 22565-22581 - Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard:

Debiasing Multilingual LLMs in Cross-lingual Latent Space. 22582-22593 - Max Conti, Manuel Faysse, Gautier Viaud, Antoine Bosselut, Céline Hudelot, Pierre Colombo:

Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings. 22594-22608 - Xiaozhou You, Yahui Luo, Lihong Gu:

MS-RAG: Simple and Effective Multi-Semantic Retrieval-Augmented Generation. 22609-22625 - Wei Wu, Mark Last:

Transitive self-consistency evaluation of NLI models without gold labels. 22626-22642 - Jonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Jungseul Ok, Gary Lee:

MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries. 22643-22659 - Junqi Wu, Shujie Ji, Kang Zhong, Huiling Peng, Zhendongxiao, Xiongding Liu, Wu Wei:

Enhancing Chinese Offensive Language Detection with Homophonic Perturbation. 22660-22675 - Kimberly Le Truong, Riccardo Fogliato, Hoda Heidari, Steven Wu:

Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles. 22676-22709 - Esther Shizgal, Eitan Wagner, Renana Keydar, Omri Abend:

Computational Analysis of Character Development in Holocaust Testimonies. 22710-22734 - Daiye Miao, Yufang Liu, Jie Wang, Changzhi Sun, Yunke Zhang, Demei Yan, Shaokang Dong, Qi Zhang, Yuanbin Wu:

TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation. 22735-22747 - Rui Liu, Jiahao Cao, Jiaqian Ren, Xu Bai, Yanan Cao:

Dual-Path Counterfactual Integration for Multimodal Aspect-Based Sentiment Classification. 22748-22758 - Camilla Casula, Sebastiano Vecellio Salto, Elisa Leonardelli, Sara Tonelli:

Job Unfair: An Investigation of Gender and Occupational Bias in Free-Form Text Completions by LLMs. 22759-22777 - Chengqian Ma, Wei Tao, Steven Y. Guo:

C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations. 22778-22796 - Changjiang Gao, Hankun Lin, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Jiajun Chen, Shujian Huang:

Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From. 22797-22826 - Mahdi Zakizadeh, Mohammad Taher Pilehvar:

Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets. 22827-22840 - Sergio E. Zanotto, Segun Aroyehun:

Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models. 22841-22858 - Marine Carpuat, Omri Asscher, Kalika Bali, Luisa Bentivogli, Frédéric Blain, Lynne Bowker, Monojit Choudhury, Hal Daumé III, Kevin Duh, Ge Gao, Alvin Grissom II, Marzena Karpinska, Elaine C. Khoong, William D. Lewis, André F. T. Martins, Mary Nurminen, Douglas W. Oard, Maja Popovic, Michel Simard, François Yvon:

An Interdisciplinary Approach to Human-Centered Machine Translation. 22859-22879 - Gleb Mezentsev, Ivan V. Oseledets:

Exploring the Hidden Capacity of LLMs for One-Step Text Generation. 22880-22889 - Guanghui Song, Dongping Liao, Yiren Zhao, Kejiang Ye, Chengzhong Xu, Xitong Gao:

Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization. 22890-22903 - Hengrui Zhang, Pin-Siang Huang, Zhen Zhang, Peican Lin, Yao-Ching Yu, Bo Hu, Yulu Du:

PathwiseRAG: Multi-Dimensional Exploration and Integration Framework. 22904-22925 - Anh Ngo, Nicolas Rollet, Catherine Pelachaud, Chloé Clavel:

"Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue. 22926-22939 - Nancy Hamdan, Osama Rakan Al Mraikhat, Fadi A. Zaraket:

R-BPE: Improving BPE-Tokenizers with Token Reuse. 22940-22948 - Diogo Tavares, David Semedo, Alexander Rudnicky, João Magalhães:

Language Models Can be Efficiently Steered via Minimal Embedding Layer Transformations. 22949-22967 - Fanzhen Liu, Sharif Abuadbba, Kristen Moore, Surya Nepal, Cécile Paris, Jia Wu, Jian Yang, Quan Z. Sheng:

Adversarial Attacks Against Automated Fact-Checking: A Survey. 22968-22990 - An-Lan Wang, Jingqun Tang, Lei Liao, Hao Feng, Qi Liu, Xiang Fei, Jinghui Lu, Han Wang, Hao Liu, Yuliang Liu, Xiang Bai, Can Huang:

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? 22991-23001 - Cheng Xu, Nan Yan, Shuhao Guan, Changhong Jin, Yuke Mei, Yibing Guo, M. Tahar Kechadi:

DCR: Quantifying Data Contamination in LLMs Evaluation. 23002-23020 - Svetlana Maslenkova, Clément Christophe, Marco AF Pimentel, Tathagata Raha, Muhammad Umar Salman, Ahmed Al-Mahrooqi, Avani Gupta, Shadab Khan, Ronnie Rajan, Praveen K. Kanithi:

Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency. 23021-23044 - Zhihang Tan, Jingrui Hou, Ping Wang, Qibiao Hu, Peng Zhu:

Surprise Calibration for Better In-Context Learning. 23045-23060 - Bowen Zhang, Yi Yang, Fuqiang Niu, Xianghua Fu, Genan Dai, Hu Huang:

SPARK: Simulating the Co-evolution of Stance and Topic Dynamics in Online Discourse with LLM-based Agents. 23061-23073 - Yang Wang, Chenghao Xiao, Chia-Yi Hsiao, Zi Yan Chang, Chi-Li Chen, Tyler Loakman, Chenghua Lin:

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth. 23074-23096 - Ryang Heo, Yongsik Seo, Junseong Lee, Dongha Lee:

Can Large Language Models be Effective Online Opinion Miners? 23097-23136 - Dianqing Lin, Aruukhan, Hongxu Hou, Shuo Sun, Wei Chen, Yichen Yang, Guodong Shi:

Can Large Language Models Translate Unseen Languages in Underrepresented Scripts? 23137-23150 - Yue Yang, Yinzhi Xu, Chenghao Huang, JohnMichael Jurgensen, Han Hu, Hao Wang:

InterIDEAS: Philosophical Intertextuality via LLMs. 23151-23172 - Yangfan Wang, Jie Liu, Chen Tang, Lian Yan, Jingchi Jiang:

KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling. 23173-23185 - Yerin Hwang, Dongryeol Lee, Kyungmin Min, Taegwan Kang, Yongil Kim, Kyomin Jung:

Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation. 23186-23205 - Yidan Xu, Xinghao Yang, Wei Liu, Bao-di Liu, Weifeng Liu:

Disentangled Information Bottleneck for Adversarial Text Defense. 23206-23218 - Zerui Chen, Huiming Fan, Qianyu Wang, Tao He, Ming Liu, Heng Chang, Weijiang Yu, Ze Li, Bing Qin:

How do Language Models Reshape Entity Alignment? A Survey of LM-Driven EA Methods: Advances, Benchmarks, and Future. 23219-23234 - Fanqi Kong, Xiaoyuan Zhang, Xinyu Chen, Yaodong Yang, Song-Chun Zhu, Xue Feng:

Enhancing LLM-Based Social Bot via an Adversarial Learning Framework. 23235-23260 - Haojia Zhu, Zhicheng Li, Jiahui Jin:

GER-LLM: Efficient and Effective Geospatial Entity Resolution with Large Language Model. 23261-23277 - Sheng Zhang, Yifan Ding, Shuquan Lian, Shun Song, Hui Li:

CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion. 23278-23288 - Brendon Boldt, David R. Mortensen:

Searching for the Most Human-like Emergent Language. 23289-23307 - Debasmita Bhattacharya, David Sasu, Michela Marchini, Natalie Schluter, Julia Hirschberg:

Does Context Matter? A Prosodic Comparison of English and Spanish in Monolingual and Multilingual Discourse Settings. 23308-23322 - Seungyoun Yi, Minsoo Khang, Sungrae Park:

ZERA: Zero-init Instruction Evolving Refinement Agent - From Zero Instructions to Structured Prompts via Principle-based Optimization. 23323-23337 - Matthias Sperber, Maureen de Seyssel, Jiajun Bao, Matthias Paulik:

Toward Machine Interpreting: Lessons from Human Interpreting Studies. 23338-23353 - Jaewoo Ahn, Junseo Kim, Heeseung Yun, Jaehyeon Son, Dongmin Park, Jaewoong Cho, Gunhee Kim:

FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games. 23354-23384 - Erik Arakelyan, Pasquale Minervini, Patrick S. H. Lewis, Pat Verga, Isabelle Augenstein:

FLARE: Faithful Logic-Aided Reasoning and Exploration. 23385-23403 - Debasmita Bhattacharya, Juan Junco, Divya Tadimeti, Julia Hirschberg:

Discourse-Driven Code-Switching: Analyzing the Role of Content and Communicative Function in Spanish-English Bilingual Speech. 23404-23419 - Jiale Chen, Xuelian Dong, Qihao Yang, Wenxiu Xie, Tianyong Hao:

Can Large Language Models Translate Spoken-Only Languages through International Phonetic Transcription? 23420-23435 - Ruiran Su, Jiasheng Si, Zhijiang Guo, Janet B. Pierrehumbert:

ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts. 23436-23458 - Hyuntae Park, Yeachan Kim, SangKeun Lee:

Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment. 23459-23479 - Victor Adelakun Omolaoye, Babajide Alamu Owoyele, Gerard de Melo:

SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints. 23480-23495 - Divy Kala, Eshika Khandelwal, Makarand Tapaswi:

What You See is What You Ask: Evaluating Audio Descriptions. 23496-23518 - Ekaterina Taktasheva, Jeff Dalton:

TAPS: Tool-Augmented Personalisation via Structured Tagging. 23519-23544 - Masahiro Kaneko, Timothy Baldwin:

Investigating How Pre-training Data Leakage Affects Models' Reproduction and Detection Capabilities. 23545-23555 - Wenda Qin, Andrea Burns, Bryan A. Plummer, Margrit Betke:

Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning. 23556-23570 - Junho Kim, Soyeon Bak, Mingyu Lee, Minju Hong, Songha Kim, Tae-Eui Kam, SangKeun Lee:

Connecting the Knowledge Dots: Retrieval-augmented Knowledge Connection for Commonsense Reasoning. 23571-23590 - Yeonseok Jeong, Minsoo Kim, Seung-won Hwang, Byung-Hak Kim:

Agent-as-Judge for Factual Summarization of Long Narratives. 23591-23608 - Miriam Wanner, Benjamin Van Durme, Mark Dredze:

DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation. 23609-23626 - Alberto Testoni, Barbara Plank, Raquel Fernández:

RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs. 23627-23647 - Thomas Hikaru Clark, Jacob Hoover Vigly, Edward Gibson, Roger P. Levy:

Resource-Rational Noisy-Channel Language Processing: Testing the Effect of Algorithmic Constraints on Inferences. 23648-23661 - Ine Gevers, Victor De Marez, Jens Van Nooten, Jens Lemmens, Andriy Kosar, Ehsan Lotfi, Nikolay Banar, Pieter Fivez, Luna De Bruyne, Walter Daelemans:

In Benchmarks We Trust ... Or Not? 23662-23676 - Xueqiao Zhang, Chao Zhang, Jingtao Xu, Yifan Zhu, Xin Shi, Yi Yang, Yawei Luo:

Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents. 23677-23703 - Maureen de Seyssel, Jie Chi, Skyler Seto, Maartje ter Hoeve, Masha Fedzechkina, Natalie Schluter:

Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks. 23704-23725 - Juntong Wu, Zijing Liu, He Cao, Li Hao, Bin Feng, Zishan Shu, Ke Yu, Li Yuan, Yu Li:

Rethinking Text-based Protein Understanding: Retrieval or LLM? 23726-23746 - Claudiu Daniel Hromei, Antonio Scaiella, Danilo Croce, Roberto Basili:

Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands. 23747-23770 - Kai Golan Hashiloni, Ofri Hefetz, Kfir Bar:

Easy as PIE? Identifying Multi-Word Expressions with LLMs. 23771-23790 - Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye:

Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking. 23791-23805 - Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne:

Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection. 23806-23828 - Zhifei Xie, Mingbao Lin, Zihang Liu, Pengcheng Wu, Shuicheng Yan, Chunyan Miao:

Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models. 23829-23851 - Marvin Lavechin, Thomas Hueber:

From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation model. 23852-23863 - Pinhuan Wang, Zhiqiu Xia, Chunhua Liao, Feiyi Wang, Hang Liu:

REALM: Recursive Relevance Modeling for LLM-based Document Re-Ranking. 23864-23878 - Karolina Seweryn, Anna Kolos, Agnieszka Karlinska, Katarzyna Lorenc, Katarzyna Dziewulska, Maciej Chrabaszcz, Aleksandra Krasnodebska, Paula Betscher, Zofia Cieslinska, Katarzyna Kowol, Julia Moska, Dawid Motyka, Pawel Walkowiak, Bartosz Zuk, Arkadiusz Janz:

PLLuM-Align: Polish Preference Dataset for Large Language Model Alignment. 23879-23908 - Yicong Wu, Guangyue Lu, Yuan Zuo, Huarong Zhang, Junjie Wu:

Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit Reasoning. 23909-23927 - Weicheng Ma, John J. Guerrerio, Soroush Vosoughi:

Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration. 23928-23956 - LiQing Xu, Qiwei Li, Tianshuo Peng, Zuchao Li, Hai Zhao, Ping Wang:

Can Large Language Models Be Good Language Teachers? 23957-23971 - Qian Wan, Wangzi Shi, Jintian Feng, Shengyingjie Liu, Luona Wei, Zhicheng Dai, Jianwen Sun:

Empowering Math Problem Generation and Reasoning for Large Language Model via Synthetic Data based Continual Learning Framework. 23972-23991 - Vani Kanjirangat, Tanja Samardzic, Ljiljana Dolamic, Fabio Rinaldi:

Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks. 23992-24010 - Isabel Cachola, Daniel Khashabi, Mark Dredze:

Evaluating the Evaluators: Are readability metrics good measures of readability? 24011-24027 - Ankan Mullick, Saransh Sharma, Abhik Jana, Pawan Goyal:

Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection. 24028-24058 - Raphaël Sarfati, Haley Moller, Toni J. B. Liu, Nicolas Boullé, Christopher J. Earls:

What's in a prompt? Language models encode literary style in prompt embeddings. 24059-24068 - Zijie Wang, Eduardo Blanco:

Identifying and Answering Questions with False Assumptions: An Interpretable Approach. 24069-24087 - Zhaowei Liu, Xin Guo, Haotian Xia, Lingfeng Zeng, Fangqi Lou, Jinyi Niu, Mengping Li, Qi Qi, Jiahuan Li, Wei Zhang, Yinglong Wang, Weige Cai, Weining Shen, Liwen Zhang:

VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding. 24088-24146 - David Acuna, Ximing Lu, Jaehun Jung, Hyunwoo Kim, Amlan Kar, Sanja Fidler, Yejin Choi:

Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions. 24147-24160 - Harry Mayne, Ryan Othniel Kearns, Yushi Yang, Andrew M. Bean, Eoin D. Delaney, Chris Russell, Adam Mahdi:

LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations. 24161-24186 - Jean de Dieu Nyandwi, Yueqi Song, Simran Khanuja, Graham Neubig:

Grounding Multilingual Multimodal LLMs With Cultural Knowledge. 24187-24231 - Weizhe Yuan, Ilia Kulikov, Ping Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason E. Weston, Jing Xu:

Following Length Constraints in Instructions. 24232-24243 - Hongda Jiang, Xinyuan Zhang, Siddhant Garg, Rishab Arora, Shiunzu Kuo, Jiayang Xu, Aaron Colak, Xin Luna Dong:

Memory-QA: Answering Recall Questions Based on Multimodal Memories. 24244-24266 - Javad Rafiei Asl, Sidhant Narula, Mohammad GhasemiGol, Eduardo Blanco, Daniel Takabi:

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks. 24267-24295 - Simon A. Aytes, Jinheon Baek, Sung Ju Hwang:

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching. 24296-24320 - Badr AlKhamissi, Greta Tuckute, Yingtian Tang, Taha Osama A Binhuraib, Antoine Bosselut, Martin Schrimpf:

From Language to Cognition: How LLMs Outgrow the Human Language Network. 24321-24339 - Ilya Ovodov, Petr Surovtsev, Karina Kvanchiani, Alexander Kapitanov, Alexander Nagaev:

Logos as a Well-Tempered Pre-train for Sign Language Recognition. 24340-24353 - Jakub Binkowski, Denis Janiak, Albert Sawczyn, Bogdan Gabrys, Tomasz Kajdanowicz:

Hallucination Detection in LLMs Using Spectral Features of Attention Maps. 24354-24385 - Sanwoo Lee, Kun Liang, Yunfang Wu:

Composable Cross-prompt Essay Scoring by Merging Models. 24386-24400 - Yuho Lee, Jiaqi Deng, Nicole Hee-Yeon Kim, Hyangsuk Min, Taewon Yun, Minjeong Ban, Kim Yul, Hwanjun Song:

Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts. 24401-24425 - Hy Dang, Tianyi Liu, Zhuofeng Wu, Jingfeng Yang, Haoming Jiang, Tao Yang, Pei Chen, Zhengyang Wang, Helen Wang, Huasheng Li, Bing Yin, Meng Jiang:

Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates. 24426-24442 - Katerina Korre, Dimitris Tsirmpas, Nikos Gkoumas, Emma Cabalé, Danai Myrtzani, Theodoros Evgeniou, Ion Androutsopoulos, John Pavlopoulos:

Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey. 24443-24462 - Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Wei Huang, Jianwei Niu, Jungong Han, Guiguang Ding:

Temporal Scaling Law for Large Language Models. 24463-24483 - Yi Feng, Jiaqi Wang, Wenxuan Zhang, Zhuang Chen, Yutong Shen, Xiyao Xiao, Minlie Huang, Liping Jing, Jian Yu:

Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models. 24484-24509 - Xunlian Dai, Li Zhou, Benyou Wang, Haizhou Li:

From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test. 24510-24526 - Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, Jiliang Tang:

Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data. 24527-24558 - Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu:

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender. 24559-24577 - Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, Haofen Wang:

Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities. 24578-24597 - Inderjeet Singh, Ramya Srinivasan, Roman Vainshtein, Hisashi Kojima:

TFDP: Token-Efficient Disparity Audits for Autoregressive LLMs via Single-Token Masked Evaluation. 24598-24615 - Li Zhou, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li, Haizhou Li:

Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation. 24616-24638 - Zhongyu Yang, Junhao Song, Siyang Song, Wei Pang, Yingfang Yuan:

MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition. 24639-24655 - Seungjong Sun, Seo Yeon Baek, Jang Hyun Kim:

Personality Vector: Modulating Personality of Large Language Models by Model Merging. 24656-24677 - Ruibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Mingchen Zhuge, Jürgen Schmidhuber:

Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models. 24678-24714 - Qianqi Yan, Hongquan Li, Shan Jiang, Yang Zhao, Xinze Guan, Ching-Chen Kuo, Xin Eric Wang:

Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs. 24715-24735 - Rik Koncel-Kedziorski, Brihi Joshi, Tim Paek:

PrimeX: A Dataset of Worldview, Opinion, and Explanation. 24736-24761 - Amruta Parulekar, Preethi Jyothi:

LASER: An LLM-based ASR Scoring and Evaluation Rubric. 24762-24771 - Zhenyun Deng, Yulong Chen, Andreas Vlachos:

Improving Zero-shot Sentence Decontextualisation with Content Selection and Planning. 24772-24788 - Jiankun Zhang, Shenglai Zeng, Jie Ren, Tianqi Zheng, Hui Liu, Xianfeng Tang, Hui Liu, Yi Chang:

Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation. 24789-24810 - Dongwon Jung, Wenxuan Zhou, Muhao Chen:

Code Execution as Grounded Supervision for LLM Reasoning. 24811-24822 - Sai Sundaresan, Harshita Chopra, Atanu R. Sinha, Koustava Goswami, Nagasai Saketh Naidu, Raghav Karan, N. Anushka:

Subjective Behaviors and Preferences in LLM: Language of Browsing. 24823-24836 - Michal Golovanevsky, William Rudman, Michael A. Lepori, Amir Bar, Ritambhara Singh, Carsten Eickhoff:

Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts. 24837-24852 - Benyamin Jamialahmadi, Parsa Kavehzadeh, Mehdi Rezagholizadeh, Parsa Farinneya, Hossein Rajabzadeh, Aref Jafari, Boxing Chen, Marzieh S. Tahaei:

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models. 24853-24867 - Leena Mathur, Marian Qian, Paul Pu Liang, Louis-Philippe Morency:

Social Genome: Grounded Social Reasoning Abilities of Multimodal Models. 24868-24891 - Hanxi Guo, Siyuan Cheng, Xiaolong Jin, Zhuo Zhang, Guangyu Shen, Kaiyuan Zhang, Shengwei An, Guanhong Tao, Xiangyu Zhang:

Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis. 24892-24912 - Dingdong Wang, Junan Li, Mingyu Cui, Dongchao Yang, Xueyuan Chen, Helen M. Meng:

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs. 24913-24924 - Kun Li, Yunxiang Li, Tianhua Zhang, Hongyin Luo, Xixin Wu, James R. Glass, Helen M. Meng:

RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning. 24925-24943 - Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi:

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index. 24944-24969 - Sujoy Sarkar, Gourav Sarkar, Manoj Balaji Jagadeeshan, Jivnesh Sandhan, Amrith Krishna, Pawan Goyal:

Mahānāma: A Unique Testbed for Literary Entity Discovery and Linking. 24970-24984 - Davis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, Eric Wong:

Adaptively profiling models with task elicitation. 24985-25020 - Sasha Boguraev, Christopher Potts, Kyle Mahowald:

Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions. 25021-25042 - Yiwei Liu, Emma Jane Pretty, Jiahao Huang, Saku Sugawara:

TactfulToM: Do LLMs have the Theory of Mind ability to understand White Lies? 25043-25061 - Colten DiIanni, Daniel Deutsch:

Don't Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation. 25062-25070 - Alexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew S. Lan:

SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction. 25071-25094 - Guido Ivetta, Marcos J. Gomez, Sofía Martinelli, Pietro Palombini, Maria Emilia Echeveste, Nair Carolina Mazzeo, Beatriz Busaniche, Luciana Benotti:

HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America. 25095-25117 - Rabiul Awal, Mahsa Massoud, Aarash Feizi, Zichao Li, Suyuchen Wang, Christopher Pal, Aishwarya Agrawal, David Vázquez, Siva Reddy, Juan A. Rodríguez, Perouz Taslakian, Spandana Gella, Sai Rajeswar:

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation. 25118-25145 - Jules Watson, Xi Wang, Raymond Liu, Suzanne Stevenson, Barend Beekhuizen:

Analyzing values about gendered language reform in LLMs' revisions. 25146-25161 - Zihan Chen, Lei Shi, Weize Wu, Qiji Zhou, Yue Zhang:

ALLabel: Three-stage Active Learning for LLM-based Entity Recognition using Demonstration Retrieval. 25162-25176 - Lihui Liu:

HyperKGR: Knowledge Graph Reasoning in Hyperbolic Space with Graph Neural Network Encoding Symbolic Path. 25177-25188 - Yuan Chiang, Elvis Hsieh, Chia-Hong Chou, Janosh Riebesell:

LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval. 25189-25221 - Stéphane Aroca-Ouellette, Katharina von der Wense, Alessandro Roncone:

ReSeeding Latent States for Sequential Language Understanding. 25222-25236 - Shuya Feng, Yuan Hong:

DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text Embeddings. 25237-25245 - Mert Inan, Anthony Sicilia, Alex Xie, Saujas Vaduguru, Daniel Fried, Malihe Alikhani:

Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation. 25246-25263 - Brendon Boldt, David R. Mortensen:

Morpheme Induction for Emergent Language. 25264-25279 - Siyuan Wang, Enda Zhao, Xiang Ren:

Stepwise Informativeness Search for Improving LLM Reasoning. 25280-25298 - Eric Chamoun, Nedjma Ousidhoum, Michael Sejr Schlichtkrull, Andreas Vlachos:

Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP Artefacts. 25299-25335 - Mintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah:

FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance. 25336-25350 - Artemis Panagopoulou, Le Xue, Honglu Zhou, Silvio Savarese, Ran Xu, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles:

Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D. 25351-25365 - Guilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota:

Proactive Hearing Assistants that Isolate Egocentric Conversations. 25366-25383 - Weijia Xu, Nebojsa Jojic, Nicolas Le Roux:

fLSA: Learning Semantic Structures in Document Collections Using Foundation Models. 25384-25395 - Kaiwen Zhou, Xuandong Zhao, Jayanth Srinivasa, Gaowen Liu, Aosong Feng, Dawn Song, Xin Eric Wang:

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning. 25396-25412 - Rosni Vasu, Chandrayee Basu, Bhavana Dalvi Mishra, Cristina Sarasua, Peter Clark, Abraham Bernstein:

HypER: Literature-grounded Hypothesis Generation and Distillation with Provenance. 25413-25438 - Kai Guo, Harry Shomer, Shenglai Zeng, Haoyu Han, Yu Wang, Jiliang Tang:

Empowering GraphRAG with Knowledge Filtering and Integration. 25439-25453 - Jaewook Lee, Alexander Scarlatos, Andrew Lan:

Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization. 25454-25475 - Jean-Flavien Bussotti, Paolo Papotti:

Refining Attention for Explainable and Noise-Robust Fact-Checking with Transformers. 25476-25488 - Seongho Joo, Hyukhun Koh, Kyomin Jung:

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding. 25489-25524 - Meng Lu, Catherine Chen, Carsten Eickhoff:

Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25. 25525-25547 - Andre Wang He, Daniel Fried, Sean Welleck:

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening. 25548-25560 - Sana Kang, Myeongseok Gwon, Su Young Kwon, Jaewook Lee, Andrew Lan, Bhiksha Raj, Rita Singh:

PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs. 25561-25593 - Sahana Ramnath, Anurag Mudgil, Brihi Joshi, Skyler Hallinan, Xiang Ren:

Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries. 25594-25635 - Yunfan Zhang, Kathleen McKeown, Smaranda Muresan:

Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment. 25636-25649 - Yunyan Zhang, Zhihong Zhu, Xian Wu:

CMedCalc-Bench: A Fine-Grained Benchmark for Chinese Medical Calculations in LLM. 25650-25659 - Guanyu Hou, Jiaming He, Yinhang Zhou, Ji Guo, Yitong Qiao, Rui Zhang, Wenbo Jiang:

Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study. 25660-25676 - Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang:

How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison. 25677-25691 - Yejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chan Young Park:

Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making. 25692-25733 - Aurick Qiao, Zhewei Yao, Samyam Rajbhandari, Yuxiong He:

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation. 25734-25753 - Ling-I Wu, Weijie Wu, Minyu Chen, Jianxin Xue, Guoqiang Li:

Co-Eval: Augmenting LLM-based Evaluation with Machine Metrics. 25754-25776 - MinJu Jeon, Si-Woo Kim, Ye-Chan Kim, HyunGee Kim, Dong-Jin Kim:

Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning. 25777-25790 - Pingjing Yang, Sullam Jeoung, Jennifer Cromley, Jana Diesner:

Semantic Networks Extracted from Students' Think-Aloud Data are Correlated with Students' Learning Performance. 25791-25804 - York Hay Ng, Phuong Hanh Hoang, En-Shiun Annie Lee:

Less is More: The Effectiveness of Compact Typological Language Representations. 25805-25816 - Runcong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Lyu, Shun Shao, Lin Gui, Ruifeng Xu, Yulan He:

Sparse Activation Editing for Reliable Instruction Following in Narratives. 25817-25832 - Asif Shahriar, Rifat Shahriyar, M. Saifur Rahman:

Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and Languages. 25833-25848 - Sakiko Yahata, Zhen Wan, Fei Cheng, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai:

Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension. 25849-25867 - Alisha Srivastava, Emir Korukluoglu, Minh Nhat Le, Duyen Tran, Chau Minh Pham, Marzena Karpinska, Mohit Iyyer:

OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature. 25868-25895 - Bingyang Ye, Jingxuan Tu, James Pustejovsky:

Enhanced Noun-Noun Compound Interpretation through Textual Enrichment. 25896-25911 - Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi:

ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers. 25912-25933 - Yunhao Gou, Hansi Yang, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu, Bo Han, James Kwok, Yu Zhang:

Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning. 25934-25960 - Jiazheng Kang, Mingming Ji, Zhe Zhao, Ting Bai:

Memory OS of AI Agent. 25961-25970 - Juyoung Han, Hyunsun Hwang, Changki Lee:

Rule Discovery for Natural Language Inference Data Generation Using Out-of-Distribution Detection. 25971-25991 - Zesen Lyu, Dandan Zhang, Wei Ye, Fangdi Li, Zhihang Jiang, Yao Yang:

Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models. 25992-26003 - Francesco Periti, Roksana Goworek, Haim Dubossarsky, Nina Tahmasebi:

Definition Generation for Word Meaning Modeling: Monolingual, Multilingual, and Cross-Lingual Perspectives. 26004-26024 - Juncheng Wang, Chao Xu, Cheng Yu, Zhe Hu, Haoyu Xie, Guoqi Yu, Lei Shang, Shujun Wang:

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers. 26025-26043 - Huaqin Zhao, Jiaxi Li, Yi Pan, Shizhe Liang, Xiaofeng Yang, Fei Dou, Tianming Liu, Jin Lu:

HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization. 26044-26067 - Yejin Choi, Jae-Woo Park, Janghan Yoon, Saejin Kim, Jaehyun Jeon, Youngjae Yu:

Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation. 26068-26083 - Suqing Wang, Zuchao Li, Luohe Shi, Bo Du, Hai Zhao, Yun Li, Qianren Wang:

From Parameters to Performance: A Data-Driven Study on LLM Structure and Development. 26084-26101 - Ramya Keerthy Thatikonda, Wray L. Buntine, Ehsan Shareghi:

Logical Reasoning with Outcome Reward Models for Test-Time Scaling. 26102-26112 - Qingjie Zhang, Di Wang, Haoting Qian, Liu Yan, Tianwei Zhang, Ke Xu, Qi Li, Minlie Huang, Hewu Li, Han Qiu:

Speculating LLMs' Chinese Training Data Pollution from Their Tokens. 26113-26133 - Abhay Gupta, Kevin Zhu, Vasu Sharma, Sean O'Brien, Michael Lu:

NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts. 26134-26151 - Chenxu Yang, Ruipeng Jia, Mingyu Zheng, Naibin Gu, Zheng Lin, Siyuan Chen, Weichong Yin, Hua Wu, Weiping Wang:

Weights-Rotated Preference Optimization for Large Language Models. 26152-26175 - Yuhan Liu, Zirui Song, Juntian Zhang, Xiaoqing Zhang, Xiuying Chen, Rui Yan:

The Stepwise Deception: Simulating the Evolution from True News to Fake News with LLM Agents. 26176-26192 - Kangtao Lv, Haibin Chen, Yujin Yuan, Langming Liu, Shilei Liu, Yongwei Wang, Wenbo Su, Bo Zheng:

How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models. 26193-26208 - Biao Zhang, Lixin Chen, Tong Liu, Bo Zheng:

SMEC:Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression. 26209-26222 - Hanqing Li, Diego Klabjan:

Reverse Prompt Engineering: A Zero-Shot, Genetic Algorithm Approach to Language Model Inversion. 26223-26245 - Hang Wu, Hongkai Chen, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang:

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning. 26246-26256 - Jia Wang, Ziyu Zhao, Tingjuntao Ni, Zhongyu Wei:

SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models. 26257-26289 - Wei-Ning Chiu, Yu-Hsiang Wang, Andy Hsiao, Yu-Shiang Huang, Chuan-Ju Wang:

Financial Risk Relation Identification through Dual-view Adaptation. 26290-26300 - Razvan-Gabriel Dumitru, Minglai Yang, Vikas Yadav, Mihai Surdeanu:

CopySpec: Accelerating LLMs with Speculative Copy-and-Paste. 26301-26332 - Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao:

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression. 26333-26348 - Yuhao Yang, Jiabin Tang, Lianghao Xia, Xingchen Zou, Yuxuan Liang, Chao Huang:

GraphAgent: Agentic Graph Language Assistant. 26349-26368 - Zhihao Jia, Mingyi Jia, Junwen Duan, Jian-xin Wang:

DDO: Dual-Decision Optimization for LLM-Based Medical Consultation via Multi-Agent Collaboration. 26369-26386 - WenHao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Guangyi Liu, Liang Liu, Siheng Chen, Yanfeng Wang:

FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data. 26387-26408 - Shuliang Liu, Zheng Qi, Jesse Jiaxi Xu, Yibo Yan, Junyan Zhang, He Geng, Aiwei Liu, Peijie Jiang, Jia Liu, Yik-Cheung Tam, Xuming Hu:

VLA-Mark: A cross modal watermark for large vision-language alignment models. 26409-26427 - Hongji Li, Andrianos Michail, Reto Gubelmann, Simon Clematide, Juri Opitz:

Sentence Smith: Controllable Edits for Evaluating Text Embeddings. 26428-26445 - Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Deli Zhao, Wenbing Huang, Tingyang Xu, Qifeng Bai, Yu Rong:

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning. 26446-26467 - Seongwan Park, Taeklim Kim, Youngjoong Ko:

Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval. 26468-26485 - Yuanzhang Lin, Zhe Zhang, He Rui, Qingao Dong, Mingyi Zhou, Jing Zhang, Xiang Gao, Hailong Sun:

UICOMPASS: UI Map Guided Mobile Task Automation via Adaptive Action Generation. 26486-26506 - Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh:

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers. 26507-26529 - Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou:

Model Unlearning via Sparse Autoencoder Subspace Guided Projections. 26530-26546 - Changtai Zhu, Siyin Wang, Ruijun Feng, Kai Song, Xipeng Qiu:

ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning. 26547-26564 - Wen Tao, Jing Tang, Alvin Chan, Bryan Hooi, Baolong Bi, Nanyun Peng, Yuansheng Liu, Yiwei Wang:

How to Make Large Language Models Generate 100% Valid Molecules? 26565-26580 - Jianzhu Bao, Yuqi Huang, Yang Sun, Wenya Wang, Yice Zhang, Bojun Jin, Ruifeng Xu:

Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining. 26581-26604 - Mohammad Amin Ghanizadeh, Mohammad Javad Dousti:

Dynamic Jointly Batch Selection for Data Efficient Machine Translation Fine-Tuning. 26605-26613 - Ivan Sviridov, Amina Miftakhova, Artemiy Tereshchenko, Galina Zubkova, Pavel Blinov, Andrey V. Savchenko:

3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark. 26614-26654 - Lucio La Cava, Andrea Tagarelli:

OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution. 26655-26671 - Shiting Huang, Zhen Fang, Zehui Chen, Siyu Yuan, Junjie Ye, Yu Zeng, Lin Chen, Qi Mao, Feng Zhao:

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios. 26672-26704 - Marek Kadlcík, Michal Stefánik, Timothee Mickus, Josef Kuchar, Michal Spiegel:

Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers. 26705-26714 - Yu Zeng, Yukun Qi, Yiming Zhao, Xikun Bao, Lin Chen, Zehui Chen, Shiting Huang, Jie Zhao, Feng Zhao:

Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation. 26715-26741 - António Farinhas, Nuno Miguel Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins:

Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral. 26742-26756 - Julius Mayer, Mohamad Ballout, Serwan Jassim, Farbod Nosrat Nezami, Elia Bruni:

iVISPAR - An Interactive Visual-Spatial Reasoning Benchmark for VLMs. 26757-26781 - Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart:

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance. 26782-26809 - Holli Sargeant, Andreas Östling, Måns Magnusson:

Detecting Legal Citations in United Kingdom Court Judgments. 26810-26836 - Guangxiang Zhao, Saier Hu, Xiaoqi Jian, Jinzhu Wu, Yuhan Wu, Lin Sun, Xiangzheng Zhang:

Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements. 26837-26846 - Ehsan Doostmohammadi, Marco Kuhlmann:

Studying the Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training Efficiency. 26847-26856 - Pedro Henrique Luz de Araujo, Paul Röttger, Dirk Hovy, Benjamin Roth:

Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance. 26857-26886 - Taha Ceritli, Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli:

HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging. 26887-26909 - Senjie Jin, Lu Chen, Zhiheng Xi, Yuhui Wang, Sirui Song, Yuhao Zhou, Xinbo Zhang, Peng Sun, Hong Lu, Tao Gui, Qi Zhang, Xuanjing Huang:

Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning. 26910-26927 - Songsheng Wang, Rucheng Yu, Zhihang Yuan, Chao Yu, Feng Gao, Yu Wang, Derek F. Wong:

Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance. 26928-26940 - Quang Anh Nguyen, Nadi Tomeh, Mustapha Lebbah, Thierry Charnois, Hanane Azzag:

Leveraging Text-to-Text Transformers as Classifier Chain for Few-Shot Multi-Label Classification. 26941-26950 - Rochelle Choenni, Ivan Titov:

M-Wanda: Improving One-Shot Pruning for Multilingual LLMs. 26951-26964 - Hamidreza Saffari, Mohammadamin Shafiei, Hezhao Zhang, Lasana T. Harris, Nafise Sadat Moosavi:

Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language. 26965-26980 - Eunseong Choi, June Park, Hyeri Lee, Jongwuk Lee:

Conflict-Aware Soft Prompting for Retrieval-Augmented Generation. 26981-26995 - Haiming Qin, Jiwei Zhang, Wei Zhang, Kezhong Lu, Mingyang Zhou, Hao Liao, Rui Mao:

R-CHAR: A Metacognition-Driven Framework for Role-Playing in Large Language Models. 26996-27014 - Gaifan Zhang, Yi Zhou, Danushka Bollegala:

Annotating Training Data for Conditional Semantic Textual Similarity Measurement using Large Language Models. 27015-27027 - Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Erik Cambria, Min Zhang, Hao Fei:

When Words Smile: Generating Diverse Emotional Facial Expressions from Text. 27028-27046 - Kai Krüger, Johanna Binnewitt, Kathrin Ehmann, Stefan Winnige, Alan Akbik:

Improving Online Job Advertisement Analysis via Compositional Entity Extraction. 27047-27065 - Qiunan Du, Zhiliang Tian, Zhen Huang, Kailun Bian, Tianlun Liu, Zhaoning Zhang, Xinwang Liu, Feng Liu, Dong Sheng Li:

Correlation-Aware Example Selection for In-Context Learning with Nonsymmetric Determinantal Point Processes. 27066-27082 - Effrosyni Sokli, Georgios Peikos, Pranav Kasela, Gabriella Pasi:

Leveraging Cognitive Complexity of Texts for Contextualization in Dense Retrieval. 27083-27096 - Zhang Zhang, Guhao Feng, Jian Guan, Di He, Wei Wu:

Beyond Online Sampling: Bridging Offline-to-Online Alignment via Dynamic Data Transformation for LLMs. 27097-27109 - Rishika Bhagwatkar, Syrielle Montariol, Angelika Romanou, Beatriz Borges, Irina Rish, Antoine Bosselut:

CAVE : Detecting and Explaining Commonsense Anomalies in Visual Environments. 27110-27151 - Linjuan Wu, Haoran Wei, Huan Lin, Tianhao Li, Baosong Yang, Fei Huang, Weiming Lu:

Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-training. 27152-27166 - Sifan Li, Yujun Cai, Yiwei Wang:

SemVink: Advancing VLMs' Semantic Understanding of Optical Illusions via Visual Global Thinking. 27167-27177 - Qianxi He, Qianyu He, Jiaqing Liang, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao:

Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation. 27178-27192 - Alessandro De Bellis, Salvatore Bufi, Giovanni Servedio, Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio:

Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models. 27193-27209 - Tsedeniya Kinfe Temesgen, Marion Di Marco, Alexander Fraser:

Extracting Linguistic Information from Large Language Models: Syntactic Relations and Derivational Knowledge. 27210-27226 - Qianxi He, Qingyu Ren, Shanzhe Lei, Xuhong Wang, Yingchun Wang:

Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning. 27227-27243 - Dominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp:

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent. 27244-27261 - Jean-Baptiste Sevestre, Emmanuel Dupoux:

Frequency & Compositionality in Emergent Communication. 27262-27274 - Fabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Shinji Watanabe, Jan Niehues, Alexander Waibel:

Summarizing Speech: A Comprehensive Survey. 27275-27306 - Cheng Liu, Yifei Lu, Fanghua Ye, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li:

CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards. 27307-27336 - Yifan Deng, Spencer S. Ericksen, Anthony Gitter:

Assay2Mol: Large Language Model-based Drug Design Using BioAssay Context. 27337-27362 - Zehan Li, Fu Zhang, Wenqing Zhang, Jiawei Li, Zhou Li, Jingwei Cheng, Tianyue Peng:

Frame First, Then Extract: A Frame-Semantic Reasoning Pipeline for Zero-Shot Relation Triplet Extraction. 27363-27376 - Yahan Yang, Soham Dan, Shuo Li, Dan Roth, Insup Lee:

MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety. 27377-27396 - Ruochun Jin, Xiyue Wang, Dong Wang, Haoqi Zheng, Yunpeng Qi, Silin Yang, Meng Zhang:

TALON: A Multi-Agent Framework for Long-Table Exploration and Question Answering. 27397-27413 - Pawel Maka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis:

You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models. 27414-27437 - Jessica Hoffmann, Christiane Ahlheim, Zac Yu, Aria Walfrand, Jarvis Jin, Marie Tano, Ahmad Beirami, Erin MacMurray van Liemt, Nithum Thain, Hakim Sidahmed, Lucas Dixon:

Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL. 27438-27467 - Emmanouil Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng:

Randomized Smoothing Meets Vision-Language Models. 27468-27478 - Matthew Zent, Digory Smith, Simon Woodhead:

PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues. 27479-27488 - Yinuo Wang, Baiyang Wang, Robert E. Mercer, Frank Rudzicz, Sudipta Singha Roy, Pengjie Ren, Zhumin Chen, Xindi Wang:

Trustworthy Medical Question Answering: An Evaluation-Centric Survey. 27489-27502 - Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider:

Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning. 27503-27514 - Pierre Andrews, Mikel Artetxe, Mariano Coria Meglioli, Marta R. Costa-jussà, Joe Chuang, David Dale, Mark Duppenthaler, Nathanial Paul Ekberg, Cynthia Gao, Daniel Edward Licht, Jean Maillard, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Eduardo Sánchez, Ioannis Tsiamas, Arina Turkatenko, Albert Ventayol-Boada, Shireen Yates:

BOUQuET : dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation. 27515-27535 - Qian Wu, Zheyao Gao, Longfei Gou, Yifan Hou, Ann Sin Nga Lau, Qi Dou:

HealthCards: Exploring Text-to-Image Generation as Visual Aids for Healthcare Knowledge Democratizing and Education. 27536-27558 - Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker:

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs. 27559-27583 - Yi-Cheng Lin, Kang-Chieh Chen, Zhe-Yan Li, Tzu-Heng Wu, Tzu-Hsuan Wu, Kuan-Yu Chen, Hung-yi Lee, Yun-Nung Chen:

Creativity in LLM-based Multi-Agent Systems: A Survey. 27584-27607 - Chenwei Xie, Matthew King-Hang Ma, Wenbo Wang, William Shi-Yuan Wang:

Context and POS in Action: A Comparative Study of Chinese Homonym Disambiguation in Human and Language Models. 27608-27625 - Piotr Przybyla, Euan McGill, Horacio Saggion:

Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models. 27626-27642 - Felermino D. M. A. Ali, Henrique Lopes Cardoso, Rui Sousa-Silva:

Leveraging Loanword Constraints for Improving Machine Translation in a Low-Resource Multilingual Context. 27643-27657 - Yuemei Xu, Kexin Xu, Jian Zhou, Ling Hu, Lin Gui:

Linguistic Neuron Overlap Patterns to Facilitate Cross-lingual Transfer on Low-resource Languages. 27658-27673 - Ona de Gibert, Joseph Attieh, Teemu Vahtola, Mikko Aulamo, Zihao Li, Raúl Vázquez, Tiancheng Hu, Jörg Tiedemann:

Scaling Low-Resource MT via Synthetic Data Generation with LLMs. 27674-27692 - Da Li, Keping Bi, Jiafeng Guo, Xueqi Cheng:

Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective. 27693-27704 - Sotaro Takeshita, Yurina Takeshita, Daniel Ruffinelli, Simone Paolo Ponzetto:

Randomly Removing 50% of Dimensions in Text Embeddings has Minimal Impact on Retrieval and Classification Tasks. 27705-27726 - Matteo Marcuzzo, Alessandro Zangari, Andrea Albarelli, José Camacho-Collados, Mohammad Taher Pilehvar:

Morables: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables. 27727-27751 - Francisco Valentini, Viviana Cotik, Damián Ariel Furman, Ivan Bercovich, Edgar Altszyler, Juan Manuel Pérez:

MessIRve: A Large-Scale Spanish Information Retrieval Dataset. 27752-27769 - Jesujoba Oluwadara Alabi, Israel Abebe Azime, Miaoran Zhang, Cristina España-Bonet, Rachel Bawden, Dawei Zhu, David Ifeoluwa Adelani, Clement Odoje, Idris Akinade, Iffat Maab, Davis David, Shamsuddeen Hassan Muhammad, Neo Putini, David O. Ademuyiwa, Andrew Caines, Dietrich Klakow:

AFRIDOC-MT: Document-level MT Corpus for African Languages. 27770-27806 - Jesujoba Oluwadara Alabi, Michael A. Hedderich, David Ifeoluwa Adelani, Dietrich Klakow:

Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead. 27807-27841 - Yiyang Zhou, Linjie Li, Shi Qiu, Zhengyuan Yang, Yuyang Zhao, Siwei Han, Yangfan He, Kangqi Li, Haonian Ji, Zihao Zhao, Haibo Tong, Lijuan Wang, Huaxiu Yao:

GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them? 27842-27856 - Lance Calvin Lim Gamboa, Yue Feng, Mark G. Lee:

Social Bias in Multilingual Language Models: A Survey. 27857-27880 - Costas Mavromatis, Soji Adeshina, Vassilis N. Ioannidis, Zhen Han, Qi Zhu, Ian Robinson, Bryan Thompson, Huzefa Rangwala, George Karypis:

BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering. 27881-27898 - Avijit Mitra, Zhichao Yang, Emily Druhl, Raelene Goodwin, Hong Yu:

Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text. 27899-27935 - Alessandro Zangari, Matteo Marcuzzo, Andrea Albarelli, Mohammad Taher Pilehvar, José Camacho-Collados:

Pun Unintended: LLMs and the Illusion of Humor Understanding. 27936-27971 - Jaehong Yoon, Shoubin Yu, Mohit Bansal:

RACCooN: Versatile Instructional Video Editing with Auto-Generated Narratives. 27972-28008 - Yanjin He, Qingkai Zeng, Meng Jiang:

Pre-trained Models Perform the Best When Token Distributions Follow Zipf's Law. 28009-28021 - Florin Cuconasu, Simone Filice, Guy Horowitz, Yoelle Maarek, Fabrizio Silvestri:

Do RAG Systems Really Suffer From Positional Bias? 28022-28036 - Wonjin Yoon, Boyu Ren, Spencer Thomas, Chanhwi Kim, Guergana Savova, Mei-Hua Hall, Tim Miller:

Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction. 28037-28054 - Tamara Quiroga, Felipe Bravo-Marquez, Valentin Barriere:

Adapting Bias Evaluation to Domain Contexts using Generative Models. 28055-28066 - Jon Gauthier, Canaan Breiss, Matthew K. Leonard, Edward F. Chang:

Emergent morpho-phonological representations in self-supervised speech models. 28067-28086 - Jiayi Wang, Yao Lu, Maurice Weber, Max Ryabinin, David Ifeoluwa Adelani, Yihong Chen, Raphael Tang, Pontus Stenetorp:

Multilingual Language Model Pretraining using Machine-translated Data. 28087-28107 - Jinggui Liang, Dung Vo, Lizi Liao:

IntentionFrame: A Semi-Structured, Multi-Aspect Framework for Fine-Grained Conversational Intention Understanding. 28108-28125 - Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal:

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning. 28126-28140 - Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli:

Efficient Compositional Multi-tasking for On-device Large Language Models. 28141-28165 - Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin:

Improving Large Language Model Safety with Contrastive Representation Learning. 28166-28194 - Taehee Park, Heejin Do, Gary Lee:

Leveraging What's Overfixed: Post-Correction via LLM Grammatical Error Overcorrection. 28195-28207 - Aoming Liu, Kevin Miller, Venkatesh Saligrama, Kate Saenko, Boqing Gong, Ser-Nam Lim, Bryan A. Plummer:

Scaling Up Temporal Domain Generalization via Temporal Experts Averaging. 28208-28231 - Yi Jing, Zijun Yao, Hongzhu Guo, Lingxu Ran, Xiaozhi Wang, Lei Hou, Juanzi Li:

LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder. 28232-28251 - Adrian Cosma, Stefan Ruseti, Emilian Radoi, Mihai Dascalu:

The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models. 28252-28263 - Aloka Fernando, Nisansa de Silva, Menan Velayuthan, Charitha Rathnayake, Surangika Ranathunga:

Improving the Quality of Web-mined Parallel Corpora of Low-Resource Languages using Debiasing Heuristics. 28264-28281 - Rohit Khoja, Devanshu Gupta, Yanjie Fu, Dan Roth, Vivek Gupta:

Weaver: Interweaving SQL and LLM for Table Reasoning. 28282-28308 - Seungmin Shin, Dooyoung Kim, Youngjoong Ko:

ECO Decoding: Entropy-Based Control for Controllability and Fluency in Controllable Dialogue Generation. 28309-28321 - Antara Raaghavi Bhattacharya, Isabel Papadimitriou, Kathryn Davidson, David Alvarez-Melis:

Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles. 28322-28332 - Hannah Cyberey, Yangfeng Ji, David Evans:

Unsupervised Concept Vector Extraction for Bias Control in LLMs. 28333-28355 - Jin Zhao, Xinrui Hu, Nianwen Xue:

Seeing the Same Story Differently: Framing-Divergent Event Coreference for Computational Framing Analysis. 28356-28371 - Fan Bai, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze:

LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition. 28372-28392 - Jaewon Cheon, Pilsung Kang:

COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down Projection. 28393-28409 - Chelsi Jain, Yiran Wu, Yifan Zeng, Jiale Liu, Shengyu Dai, Zhenwen Shao, Qingyun Wu, Huazheng Wang:

SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement. 28410-28427 - Runze Liu, Chenjia Bai, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li:

VLP: Vision-Language Preference Learning for Embodied Manipulation. 28428-28444 - Kuei-Chun Kao, Hsu Tzu-Yin, Yunqi Hong, Ruochen Wang, Cho-Jui Hsieh:

QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models. 28445-28460 - Ashish Seth, Utkarsh Tyagi, Ramaneswaran Selvakumar, Nishit Anand, Sonal Kumar, Sreyan Ghosh, Ramani Duraiswami, Chirag Agarwal, Dinesh Manocha:

EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding. 28461-28480 - Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha:

MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions. 28481-28493 - Minyeong Choe, Haehyun Cho, Changho Seo, Hyunil Kim:

Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms. 28494-28513 - Luca Mitran, Sophie Wu, Andrew Piper:

Probing Narrative Morals: A New Character-Focused MFT Framework for Use with Large Language Models. 28514-28529 - Dezhi Zhao, Xin Liu, Xiaocheng Feng, Hui Wang, Bing Qin:

Probing and Boosting Large Language Models Capabilities via Attention Heads. 28530-28544 - Jiyao Wei, Saiping Guan, Da Li, Zhongni Hou, Miao Su, Yucan Guo, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng:

A Survey of Link Prediction in N-ary Knowledge Graphs. 28545-28567 - Bingqian Liu, Fu Zhang, Guoqing Chen, Jingwei Cheng:

Multi-Frequency Contrastive Decoding: Alleviating Hallucinations for Large Vision-Language Models. 28568-28584 - Yifan Duan, Yihong Tang, Kehai Chen, Liqiang Nie, Min Zhang:

ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities. 28585-28600 - Tianyuan Huang, Zepeng Zhu, Hangdi Xing, Zirui Shao, Zhi Yu, Chaoxiong Yang, Jiaxian He, Xiaozhong Liu, Jiajun Bu:

BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain Tasks. 28601-28612 - Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal:

MAviS: A Multimodal Conversational Assistant For Avian Species. 28613-28639 - Manato Tajiri, Michimasa Inaba:

Refining Text Generation for Realistic Conversational Recommendation via Direct Preference Optimization. 28640-28661 - Shashank Srivastava:

Large Language Models Threaten Language's Epistemic and Communicative Foundations. 28662-28676 - Zhuo Chen, Xinyu Wang, Yong Jiang, Zhen Zhang, Xinyu Geng, Pengjun Xie, Fei Huang, Kewei Tu:

Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference. 28677-28692 - Jeongwoo Na, Jun Kwon, Eunseong Choi, Jongwuk Lee:

Multi-view-guided Passage Reranking with Large Language Models. 28693-28706 - Özge Alaçam, Sanne Hoeken, Andreas Säuberli, Hannes Gröner, Diego Frassinelli, Sina Zarrieß, Barbara Plank:

Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze. 28707-28724 - Junhyuk Choi, Ro-hoon Oh, Jihwan Seol, Bugeun Kim:

VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model. 28725-28736 - Advaith Malladi, Rakesh R. Menon, Yuvraj Jain, Shashank Srivastava:

Explaining Differences Between Model Pairs in Natural Language through Sample Learning. 28737-28759 - Yu-Ang Lee, Guan-Ting Yi, Mei-Yi Liu, Jui-Chao Lu, Guan-Bo Yang, Yun-Nung Chen:

Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions. 28760-28775 - Xiaohan Ding, Kaike Ping, Buse Çarik, Eugenia Ha Rim Rho:

A Multi-Level Benchmark for Causal Language Understanding in Social Media Discourse. 28776-28790 - Zihan Liang, Ziwen Pan, Ruoxuan Xiong:

Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness. 28791-28808 - Keon-Woo Roh, Yeong-Joon Ju, Seong-Whan Lee:

XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering. 28809-28821 - Xin Su, Phillip Howard, Steven Bethard:

Transformer-Based Temporal Information Extraction and Application: A Review. 28822-28841 - Ruohao Guo, Wei Xu, Alan Ritter:

How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation. 28842-28861 - Yejin Lee, Joonghyuk Hahn, Hyeseon Ahn, Yo-Sub Han:

AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection. 28862-28874 - Hanqi Duan, Yao Cheng, Jianxiang Yu, Yao Liu, Xiang Li:

Can Large Language Models Act as Ensembler for Multi-GNNs? 28875-28894 - Younwoo Choi, Changling Li, Yongjin Yang, Zhijing Jin:

Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models. 28895-28928 - Ridwan Mahbub, Mohammed Saidul Islam, Mir Tafseer Nayeem, Md. Tahmid Rahman Laskar, Mizanur Rahman, Shafiq Joty, Enamul Hoque:

From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text. 28929-28947 - Tongtong Liu, Zhaohui Wang, Meiyue Qin, Zenghui Lu, Xudong Chen, Yuekui Yang, Peng Shu:

Real-time Ad Retrieval via LLM-generative Commercial Intention for Sponsored Search Advertising. 28948-28960 - Ikhyun Cho, Julia Hockenmaier:

Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language Models. 28961-28973 - Boyu Zhang, Ping He, Tianyu Du, Xuhong Zhang, Lei Yun, Kingsum Chow, Jianwei Yin:

CLMTracing: Black-box User-level Watermarking for Code Language Model Tracing. 28974-28990 - Abdelrahman Sadallah, Tim Baumgärtner, Iryna Gurevych, Ted Briscoe:

The Good, the Bad and the Constructive: Automatically Measuring Peer Review's Utility for Authors. 28991-29021 - Linfeng Liu, Hongqiu Wu, Hai Zhao:

Evolving Chinese Spelling Correction with Corrector-Verifier Collaboration. 29022-29028 - Yang Zhou, Pengfei Cao, Yubo Chen, Qingbin Liu, Dianbo Sui, Xi Chen, Kang Liu, Jun Zhao:

M2Edit: Locate and Edit Multi-Granularity Knowledge in Multimodal Large Language Model. 29029-29042 - Haochen Shi, Shaobo Li, Guoqing Chao, Xiaoliang Shi, Wentao Chen, Zhenzhou Ji:

Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions. 29043-29056 - Alan Ramponi, Marco Rovera, Róbert Móro, Sara Tonelli:

Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches. 29057-29076 - Saad Obaid ul Islam, Anne Lauscher, Goran Glavas:

How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination. 29077-29098 - Ran Zhang, Wei Zhao, Lieve Macken, Steffen Eger:

LiTransProQA: An LLM-based Literary Translation Evaluation Metric with Professional Question Answering. 29099-29121 - Alessa Carbo, Eric T. Nalisnick:

Improving Handshape Representations for Sign Language Processing: A Graph Neural Network Approach. 29122-29135 - Oscar Sainz, Naiara Pérez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa:

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque. 29136-29160 - Ritam Dutt, Carolyn P. Rosé, Maarten Sap:

SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks. 29161-29197 - Haotian Dong, Jingyan Jiang, Rongwei Lu, Jiajun Luo, Jiajun Song, Bowen Li, Ying Shen, Zhi Wang:

Beyond A Single AI Cluster: A Survey of Decentralized LLM Training. 29198-29212 - Pranav Bhandari, Nicolas Fay, Michael J. Wise, Amitava Datta, Stephanie Meek, Usman Naseem, Mehwish Nasim:

Can LLM Agents Maintain a Persona in Discourse? 29213-29229 - Shun Shao, Yftah Ziser, Zheng Zhao, Yifu Qiu, Shay B. Cohen, Anna Korhonen:

Iterative Multilingual Spectral Attribute Erasure. 29230-29255 - Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah:

TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research. 29256-29284 - Fares Fawzi, Vinitra Swamy, Dominik Glandorf, Tanya Nazaretsky, Tanja Käser:

SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling. 29285-29310 - Jianfeng Deng, Qingfeng Chen, Debo Cheng, Jiuyong Li, Lin Liu:

Logit Space Constrained Fine-Tuning for Mitigating Hallucinations in LLM-Based Recommender Systems. 29311-29324 - Dongjie Fu, Xize Cheng, Linjun Li, Xiaoda Yang, Lujia Yang, Tao Jin:

PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue. 29325-29342 - Junda Zhu, Lingyong Yan, Shuaiqiang Wang, Dawei Yin, Lei Sha:

Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking. 29343-29361 - Shuzhou Yuan, Jingyi Sun, Ran Zhang, Michael Färber, Steffen Eger, Pepa Atanasova, Isabelle Augenstein:

Graph-Guided Textual Explanation Generation Framework. 29362-29386 - Leonardo Bertolazzi, Philipp Mondorf, Barbara Plank, Raffaella Bernardi:

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It. 29387-29424 - Kerem Zaman, Shashank Srivastava:

A Causal Lens for Evaluating Faithfulness Metrics. 29425-29449 - Yifei Yu, Qian-Wen Zhang, Lingfeng Qiao, Di Yin, Fang Li, Jie Wang, Chen Zeng Xi, Suncong Zheng, Xiaolong Liang, Xing Sun:

Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts. 29450-29468 - Pengxiang Zhao, Hanyu Hu, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan:

FISTAPruner: Layer-wise Post-training Pruning for Large Language Models. 29469-29487 - Jayanth Krishna Chundru, Rudrashis Poddar, Jie Cao, Tianyu Jiang:

Do LLMs Encode Frame Semantics? Evidence from Frame Identification. 29488-29500 - Kyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu:

StepER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models. 29501-29523 - Yushi Yang, Filip Sondej, Harry Mayne, Andrew Lee, Adam Mahdi:

How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis. 29524-29543 - Yue Li, Zhixue Zhao, Carolina Scarton:

It's All About In-Context Learning! Teaching Extremely Low-Resource Languages to LLMs. 29544-29559 - Kwesi A. Cobbina, Tianyi Zhou:

Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning. 29560-29593 - Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott:

Multilingual Pretraining for Pixel Language Models. 29594-29611 - Gabrielle Kaili-May Liu, Gal Yona, Avi Caciularu, Idan Szpektor, Tim G. J. Rudner, Arman Cohan:

MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs. 29612-29656 - George Drayson, Emine Yilmaz, Vasileios Lampos:

Machine-generated text detection prevents language model collapse. 29657-29673 - Faeze Ghorbanpour, Daryna Dementieva, Alexander Fraser:

Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data. 29674-29692 - Qi Lin, Weikai Xu, Lisi Chen, Bin Dai:

V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat. 29693-29706 - João Maria Janeiro, Belen Alastruey, Francisco Massa, Maha Elbayad, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault:

Mixture of Languages: Improved Multilingual Encoders Through Language Grouping. 29707-29722 - Gautam Siddharth Kashyap, Mark Dras, Usman Naseem:

Too Helpful, Too Harmless, Too Honest or Just Right? 29723-29734 - Danrui Li, Sen Zhang, Samuel S. Sohn, Kaidong Hu, Muhammad Usman, Mubbasir Kapadia:

Cardiverse: Harnessing LLMs for Novel Card Game Prototyping. 29735-29762 - Ignacio J. Tripodi, Greg Buda, Margaret Meagher, Elizabeth A. Olson:

Assessing effective de-escalation of crisis conversations using transformer-based models and trend statistics. 29763-29777 - Seong-Jin Park, Kang-Min Kim:

Measuring and Mitigating Media Outlet Name Bias in Large Language Models. 29778-29797 - Stephanie Schoch, Yangfeng Ji:

The Good, the Bad, and the Debatable: A Survey on the Impacts of Data for In-Context Learning. 29798-29812 - Thibaud Ardoin, Yi Cai, Gerhard Wunder:

Where Confabulation Lives: Latent Feature Discovery in LLMs. 29813-29837 - Samuel Lewis-Lim, Xingwei Tan, Zhixue Zhao, Nikolaos Aletras:

Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? 29838-29853 - Nicola Horst, Davide Mazzaccara, Antonia Schmidt, Michael Sullivan, Filippo Momentè, Luca Franceschetti, Philipp Sadler, Sherzod Hakimov, Alberto Testoni, Raffaella Bernardi, Raquel Fernández, Alexander Koller, Oliver Lemon, David Schlangen, Mario Giulianelli, Alessandro Suglia:

Playpen: An Environment for Exploring Learning From Dialogue Game Feedback. 29854-29891 - Zhifeng Hao, Junqi Huang, Shaobin Shi, Ruichu Cai, Boyan Xu:

GenLink: Generation-Driven Schema-Linking via Multi-Model Learning for Text-to-SQL. 29892-29905 - Marek Strong, Andreas Vlachos:

TSVer: A Benchmark for Fact Verification Against Time-Series Evidence. 29906-29926 - Ruizheng Huang, Zhicheng Zhang, Yong Wang:

Cross-MoE: An Efficient Temporal Prediction Framework Integrating Textual Modality. 29927-29938 - Jack Gallifant, Shan Chen, Kuleen Sasse, Hugo J. W. L. Aerts, Thomas Hartvigsen, Danielle S. Bitterman:

Sparse Autoencoder Features for Classifications and Transferability. 29939-29963 - Yang Yang, Mohan Timilsina, Edward Curry:

KGE Calibrator: An Efficient Probability Calibration Method of Knowledge Graph Embedding Models for Trustworthy Link Prediction. 29964-29987 - Takumi Shibata, Yuichi Miyamura:

LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models. 29988-30001 - Sanad Shaban, Nizar Habash:

The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness. 30002-30013 - Mostafa Saeed, Nizar Habash:

Lemmatization as a Classification Task: Results from Arabic across Multiple Genres. 30014-30029 - Aida Mostafazadeh Davani, Sunipa Dev, Héctor Pérez-Urbina, Vinodkumar Prabhakaran:

A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations. 30030-30043 - Amber Shore, Russell Scheinberg, Ameeta Agrawal, So Young Lee:

Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMs. 30044-30058 - Melissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Shahed Sorower:

GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection. 30059-30077 - Taro Yano, Yoichi Ishibashi, Masafumi Oyamada:

LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents. 30078-30095 - Akaash Kolluri, Shengguang Wu, Joon Sung Park, Michael S. Bernstein:

Finetuning LLMs for Human Behavior Prediction in Social Science Experiments. 30096-30111 - Anthony Hughes, Nikolaos Aletras, Ning Ma:

How Private are Language Models in Abstractive Summarization? 30112-30130 - Zelin Li, Dawei Song:

Expectation Preference Optimization: Reliable Preference Estimation for Improving the Reasoning Capability of Large Language Models. 30131-30146 - Sruthi Gorantla, Aditya Rawal, Devamanyu Hazarika, Kaixiang Lin, Mingyi Hong, Mahdi Namazifar:

Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs. 30147-30166 - Ashwin Ramaswamy, Nestor Demeure, Ermal Rrapaj:

Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores. 30167-30175 - Xueqing Peng, Triantafillos Papadopoulos, Efstathia Soufleri, Polydoros Giannouris, Ruoyu Xiang, Yan Wang, Lingfei Qian, Jimin Huang, Qianqian Xie, Sophia Ananiadou:

Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance. 30176-30202 - Avishek Lahiri, Yufang Hou, Debarshi Kumar Sanyal:

TaxoAlign: Scholarly Taxonomy Generation Using Language Models. 30203-30223 - Witold Sosnowski, Arkadiusz Modzelewski, Kinga Skorupska, Adam Wierzbicki:

DiNaM: Disinformation Narrative Mining with Large Language Models. 30224-30251 - Lesheng Jin, Zhenyuan Ruan, Haohui Mai, Jingbo Shang:

VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM. 30252-30262 - Mohamed Bayan Kmainasi, Abul Hasnat, Md. Arid Hasan, Ali Ezzat Shahroor, Firoj Alam:

MemeIntel: Explainable Detection of Propagandistic and Hateful Memes. 30263-30279 - Seoyoon Park, Hyeji Choi, Minseon Kim, Subin An, Xiaonan Wang, Gyuri Choi, Hansaem Kim:

FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean. 30280-30294 - Mohna Chakraborty, Lu Wang, David Jurgens:

Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework. 30295-30323 - Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li:

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following. 30324-30339 - Ruihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Dong Yu, Nigel Collier, Deqing Yang:

UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation. 30340-30356 - Massimiliano Pronesti, Michela Lorandi, Paul Flanagan, Oisin Redmond, Anya Belz, Yufang Hou:

Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning. 30357-30373 - Ali Veisi, Hamidreza Amirzadeh, Amir Mansourian:

Context-aware Biases for Length Extrapolation. 30374-30395 - Yifei Li, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun:

AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists. 30396-30418 - Nir Sweed, Hanit Hakim, Ben Wolfson, Hila Lifshitz, Dafna Shahaf:

Finding your MUSE: Mining Unexpected Solutions Engine. 30419-30434 - Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, Pan Li:

Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs. 30435-30458 - Su-Hyeong Park, Ho-Beom Kim, Seong-Jin Park, Dinara Aliyeva, Kang-Min Kim:

Leveraging Knowledge Graph-Enhanced LLMs for Context-Aware Medical Consultation. 30459-30475 - Fatemeh Haji, Mazal Bethany, Cho-Yu Jason Chiang, Anthony Rios, Peyman Najafirad:

Reflective Agreement: Combining Self-Mixture of Agents with a Sequence Tagger for Robust Event Extraction. 30476-30492 - Maya Kruse, Majid Afshar, Saksham Khatwani, Anoop M. Mayampurath, Guanhua Chen, Yanjun Gao:

Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification. 30493-30504 - Alba Táboas García, Piotr Przybyla, Leo Wanner:

Exploring morphology-aware tokenization: A case study on Spanish language modeling. 30505-30518 - Oghenevovwe Ikumariegbe, Eduardo Blanco, Ellen Riloff:

Studying Rhetorically Ambiguous Questions. 30519-30529 - Xiaoyuan Wu, Weiran Lin, Omer Akgul, Lujo Bauer:

Estimating LLM Consistency: A User Baseline vs Surrogate Metrics. 30530-30544 - DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu:

Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study. 30545-30588 - Varun Dhanraj, Chris Eliasmith:

Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations. 30589-30608 - Jacob Daniel Devasier, Rishabh Mediratta, Chengkai Li:

Can LLMs Extract Frame-Semantic Arguments? 30609-30622 - Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati:

Accelerated Test-Time Scaling with Model-Free Speculative Sampling. 30623-30636 - Karim Galliamov, Ivan Titov, Ilya Pershin:

Enhancing RLHF with Human Gaze Modeling. 30637-30643 - Maithe van Noort, Michal Korenar, Jelke Bloem:

Mapping semantic networks to Dutch word embeddings as a diagnostic tool for cognitive decline. 30644-30659 - Aneesh Komanduri, Karuna Bhaila, Xintao Wu:

CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language Models. 30660-30680 - Yunzhe Wang, Gale M. Lucas, Burcin Becerik-Gerber, Volkan Ustun:

Implicit Behavioral Alignment of Language Agents in High-Stakes Crowd Simulations. 30681-30698 - Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin:

Are Language Models Consequentialist or Deontological Moral Reasoners? 30699-30726 - Yongmin Yoo, Qiongkai Xu, Longbing Cao:

PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims. 30727-30746 - Siddarth Mamidanna, Daking Rai, Ziyu Yao, Yilun Zhou:

All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens. 30747-30760 - Roelien C. Timmer, Yufang Hou, Stephen Wan:

A Position Paper on the Automatic Generation of Machine Learning Leaderboards. 30761-30784 - AmirHossein Dabiri Aghdam, Lele Wang:

SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models. 30785-30806 - Thong Nguyen, Yibin Lei, Jia-Huei Ju, Andrew Yates:

SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models. 30807-30822 - Han Wu, Jie Yin:

Meta-Semantics Augmented Few-Shot Relational Learning. 30823-30835 - Rui Wang, Bohao Li, Xiyang Dai, Jianwei Yang, Yi-Ling Chen, Zhen Xing, Yifan Yang, Dongdong Chen, Xipeng Qiu, Zuxuan Wu, Yu-Gang Jiang:

ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning. 30836-30849 - Ashima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel:

ModelCitizens: Representing Community Voices in Online Safety. 30850-30866 - Pengyu Wang, Shaojun Zhou, Chenkun Tan, Xinghao Wang, Wei Huang, Zhen Ye, Zhaowei Li, Botian Jiang, Dong Zhang, Xipeng Qiu:

UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets. 30867-30899 - Suhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Christopher W. Wiese, Saeed Abdullah:

The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support. 30900-30922 - Zirui Shao, Feiyu Gao, Zhaoqing Zhu, Chuwei Luo, Hangdi Xing, Zhi Yu, Qi Zheng, Ming Yan, Jiajun Bu:

Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding. 30923-30944 - Fengze Liu, Haoyu Wang, Joonhyuk Cho, Dan Roth, Andrew Lo:

AutoCT: Automating Interpretable Clinical Trial Prediction with LLM Agents. 30945-30970 - Kuicai Dong, Yujing Chang, Derrick-Goh-Xin Deik, Dexun Li, Ruiming Tang, Yong Liu:

MMDocIR: Benchmarking Multimodal Retrieval for Long Documents. 30971-31005 - Subhendu Khatuya, Shashwat Naidu, Pawan Goyal, Niloy Ganguly:

Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval. 31006-31018 - Muhammad Ali, Salman Khan:

Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments. 31019-31032 - Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty:

Demystifying Domain-adaptive Post-training for Financial LLMs. 31033-31059 - Mian Zhong, Pristina Wang, Anjalie Field:

HICode: Hierarchical Inductive Coding with LLMs. 31060-31078 - Zhiyao Ma, In Gim, Lin Zhong:

Cacheback: Speculative Decoding With Nothing But Cache. 31079-31084 - Yifan Liu, Qianfeng Wen, Mark Zhao, Jiazhou Liang, Scott Sanner:

MA-DPR: Manifold-aware Distance Metrics for Dense Passage Retrieval. 31085-31103 - Md Mezbaur Rahman, Cornelia Caragea:

LLM-Guided Co-Training for Text Classification. 31104-31121 - Yike Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, Lili Qiu:

LeanK: Learnable K Cache Channel Pruning for Efficient Decoding. 31122-31137 - Hammad A. Ayyubi, Puneet Mathur, Md. Mehrab Tanjim, Vlad I. Morariu:

DELOC: Document Element Localizer. 31138-31147 - Yue Fang, Shaohan Huang, Xin Yu, Haizhen Huang, Zihan Zhang, Weiwei Deng, Furu Wei, Feng Sun, Qi Zhang, Zhi Jin:

NL2Lean: Translating Natural Language into Lean 4 through Multi-Aspect Reinforcement Learning. 31148-31158 - Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, Si-Qing Chen:

A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications. 31159-31183 - Prasanna Reddy Pulakurthi, Jiamian Wang, Majid Rabbani, Sohail A. Dianat, Raghuveer Rao, Zhiqiang Tao:

X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning. 31184-31195 - Yichen Ouyang, Lu Wang, Fangkai Yang, Pu Zhao, Chenghua Huang, Jianfeng Liu, Bochen Pang, Yaming Yang, Yuefeng Zhan, Hao Sun, Qingwei Lin, Saravan Rajmohan, Weiwei Deng, Dongmei Zhang, Feng Sun:

Token-level Proximal Policy Optimization for Query Generation. 31196-31210 - Pittawat Taveekitworachai, Potsawee Manakul, Sarana Nutanong, Kunat Pipatanakul:

Prior Prompt Engineering for Reinforcement Fine-Tuning. 31211-31236 - Siyu Liang, Nicolas Ballier, Gina-Anne Levow, Richard A. Wright:

Beyond WER: Probing Whisper's Sub-token Decoder Across Diverse Language Resource Levels. 31237-31247 - Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou:

ThinkTuning: Instilling Cognitive Reflections without Distillation. 31248-31262 - Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov:

Droid: A Resource Suite for AI-Generated Code Detection. 31263-31289 - Guanyu Li, Zhiheng Xi, Zhihao Zhang, Boyang Hong, Tao Gui, Qi Zhang, Xuanjing Huang:

LoRACoE: Improving Large Language Model via Composition-based LoRA Expert. 31290-31304 - Tingchen Fu, Fazl Barez:

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness. 31305-31319 - Jiayou Zhong, Anudeex Shetty, Chao Jia, Xuanrui Lin, Usman Naseem:

Pluralistic Alignment for Healthcare: A Role-Driven Framework. 31320-31343 - Andrew Zhang, Anushka Sivakumar, Chia-Wei Tang, Chris Thomas:

Flexible-length Text Infilling for Discrete Diffusion Models. 31344-31359 - Sabri Boughorbel, Fahim Dalvi, Nadir Durrani, Majd Hawasly:

Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing. 31360-31371 - Malik Marmonier, Rachel Bawden, Benoît Sagot:

Explicit Learning and the LLM in Machine Translation. 31372-31422 - Jacob Lee Suchardt, Hana El-Shazli, Pierluigi Cassotti:

Towards Language-Agnostic STIPA: Universal Phonetic Transcription to Support Language Documentation at Scale. 31423-31439 - Alon Eirew, Kfir Bar, Ido Dagan:

Beyond Pairwise: Global Zero-shot Temporal Graph Generation. 31440-31458 - Hongyu Chen, Neele Falk, Michael Roth, Agnieszka Falenska:

"Feels Feminine to Me": Understanding Perceived Gendered Style through Human Annotations. 31459-31480 - Fabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, Sergiu Nisioi:

RALS: Resources and Baselines for Romanian Automatic Lexical Simplification. 31481-31492 - Herun Wan, Minnan Luo, Zihan Ma, Guang Dai, Xiang Zhao:

How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis. 31493-31516 - Anthony Dubreuil, Antoine Gourru, Christine Largeron, Amine Trabelsi:

Are Stereotypes Leading LLMs' Zero-Shot Stance Detection ? 31517-31530 - Arnav Arora, Srishti Yadav, Maria Antoniak, Serge J. Belongie, Isabelle Augenstein:

Multi-Modal Framing Analysis of News. 31531-31553 - Junjie Huang, Ruiquan Zhang, Jinsong Su, Yidong Chen:

TempParaphraser: "Heating Up" Text to Evade AI-Text Detection through Paraphrasing. 31554-31573 - Sandro Paval, Pascal Meißner, Ivan P. Yamshchikov:

ComicScene154: A Scene Dataset for Comic Analysis. 31574-31580 - Roman Christof, Farnaz Zeidi, Manuela Messelhäußer, Dirk Mentzer, Renate König, Liam Harold Childs, Alexander Mehler:

MedLinkDE - MedDRA Entity Linking for German with Guided Chain of Thought Reasoning. 31581-31593 - Longkai Cheng, Along He, Mulin Li, Xueshuo Xie, Tao Li:

HookMoE: A learnable performance compensation strategy of Mixture-of-Experts for LLM inference acceleration. 31594-31606 - Mengying Yuan, WenHao Wang, Zixuan Wang, Yujie Huang, Kangli Wei, Fei Li, Chong Teng, Donghong Ji:

Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction. 31607-31629 - Longxuan Ma, Xiao Wu, Yuxin Huang, Shengxiang Gao, Zhengtao Yu:

3R: Enhancing Sentence Representation Learning via Redundant Representation Reduction. 31630-31643 - Abhirama Subramanyam Penamakuri, Navlika Singh, Piyush Arora, Anand Mishra:

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs. 31644-31661 - Jingqi Zhou, Sheng Wang, Jingwei Dong, Kai Liu, Lei Li, Jiahui Gao, Jiyue Jiang, Lingpeng Kong, Chuan Wu:

ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom. 31662-31691 - Nicholas Popovic, Michael Färber:

Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass. 31692-31705 - Bryan Eikema, Anna Rutkiewicz, Mario Giulianelli:

Structure-Conditional Minimum Bayes Risk Decoding. 31706-31723 - Yue Li, Zhixue Zhao, Carolina Scarton:

Label Set Optimization via Activation Distribution Kurtosis for Zero-Shot Classification with Generative Models. 31724-31741 - Hinata Tezuka, Naoya Inoue:

The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs. 31742-31792 - Thu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko, Taehwan Kim:

VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions. 31793-31813 - Caiqi Zhang, Chang Shu, Ehsan Shareghi, Nigel Collier:

All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model Reasoning. 31814-31824 - Arvindh Arun, Sumit Kumar, Mojtaba Nayyeri, Bo Xiong, Ponnurangam Kumaraguru, Antonio Vergari, Steffen Staab:

SEMMA: A Semantic Aware Knowledge Graph Foundation Model. 31825-31848 - Mizanur Rahman, Md. Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque:

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text. 31849-31874 - Mansi Dhamne, Sneha Raman, Preeti Rao:

Predicting Prosodic Boundaries for Children's Texts. 31875-31885 - Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Maria Liakata, Nikolaos Aletras:

Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision. 31886-31900 - Piotr Sawicki, Marek Grzes, Dan Brown, Fabrício Góes:

Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique. 31901-31918 - Zijian Ling, Han Zhang, Jiahao Cui, Zhequn Wu, Xu Sun, Guohao Li, Xiangjian He:

Beyond Human Labels: A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing. 31919-31945 - Zeju Qiu, Weiyang Liu, Adrian Weller, Bernhard Schölkopf:

Orthogonal Finetuning Made Scalable. 31946-31963 - Wei Liu, Yancheng He, Yu Li, Hui Huang, Chengwei Hu, Jiaheng Liu, Shilong Li, Wenbo Su, Bo Zheng:

AIR: Complex Instruction Generation via Automatic Iterative Refinement. 31964-31986 - Mushtari Sadia, Zhenning Yang, Yunming Xiao, Ang Chen, Amrita Roy Chowdhury:

SQUiD: Synthesizing Relational Databases from Unstructured Text. 31987-32012 - Yu Wang, Shiwan Zhao, Zhihu Wang, Ming Fan, Xicheng Zhang, Yubo Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu:

RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning. 32013-32037 - Wentao Wang, Guangyuan Jiang, Tal Linzen, Brenden M. Lake:

Rapid Word Learning Through Meta In-Context Learning. 32038-32073 - Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou, Shannon Vallor, Alexandra Birch:

EuroGEST: Investigating gender stereotypes in multilingual language models. 32074-32096 - Tu Nguyen, Kevin Du, Alexander Miserlis Hoyle, Ryan Cotterell:

How Persuasive Is Your Context? 32097-32123 - Yu Fan, Yang Tian, Shauli Ravfogel, Mrinmaya Sachan, Elliott Ash, Alexander Miserlis Hoyle:

The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure. 32124-32143 - Hauke Licht, Rupak Sarkar, Patrick Y. Wu, Pranav Goel, Niklas Stoehr, Elliott Ash, Alexander Miserlis Hoyle:

Measuring scalar constructs in social science with LLMs. 32144-32171 - Jing Yu, Yibo Zhao, Jiapeng Zhu, Wenming Shao, Bo Pang, Zhao Zhang, Xiang Li:

Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization. 32172-32186 - Kiana Aghakasiri, Noopur Zambare, JoAnn Thai, Carrie Ye, Mayur Mehta, J. Ross Mitchell, Mohamed Abdalla:

Not What the Doctor Ordered: Surveying LLM-based De-identification and Quantifying Clinical Information Loss. 32187-32203 - Zhenning Shi, Yijia Zhu, Yi Xie, Junhan Shi, Guorui Xie, Haotian Zhang, Yong Jiang, Congcong Miao, Qing Li:

Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling. 32204-32218 - Charles Nimo, Shuheng Liu, Irfan Essa, Michael L. Best:

Africa Health Check: Probing Cultural Bias in Medical LLMs. 32219-32232 - Orfeas Menis-Mastromichalakis, Giorgos Filandrianos, Maria Symeonaki, Giorgos Stamou:

Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms. 32233-32249 - Aly M. Kassem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi:

REVIVING YOUR MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing. 32250-32263 - Matteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling:

ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions. 32264-32289 - Grgur Kovac, Jérémy Perez, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer:

Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data? 32290-32309 - Hazel Kim, Tom A. Lamb, Adel Bibi, Philip Torr, Yarin Gal:

Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions. 32310-32322 - Kuang-Da Wang, Shuoyang Ding, Chao-Han Huck Yang, Ping-Chun Hsieh, Wen-Chih Peng, Vitaly Lavrukhin, Boris Ginsburg:

Extending Automatic Machine Translation Evaluation to Book-Length Documents. 32323-32339 - Tong Chen, Zimu Wang, Yiyi Miao, Haoran Luo, Yuanfei Sun, Wei Wang, Zhengyong Jiang, Procheta Sen, Jionglong Su:

MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-checking of LLM Responses. 32340-32353 - Yogesh Kulkarni, Pooyan Fazli:

VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment. 32354-32379 - Seyedali Mohammadi, Bhaskara Hanuma Vedula, Hemank Lamba, Edward Raff, Ponnurangam Kumaraguru, Francis Ferraro, Manas Gaur:

Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions. 32380-32393 - Oron Anschel, Alon Shoshan, Adam Botach, Shunit Haviv Hakimi, Asaf Gendler, Emanuel Ben Baruch, Nadav Bhonker, Igor Kviatkovsky, Manoj Aggarwal, Gérard G. Medioni:

Group-Aware Reinforcement Learning for Output Diversity in Large Language Models. 32394-32415 - Abteen Ebrahimi, Adam Wiemerslage, Katharina von der Wense:

Model-Based Ranking of Source Languages for Zero-Shot Cross-Lingual Transfer. 32416-32461 - Byeongho Yu, Changhun Lee, Jungyu Jin, Eunhyeok Park:

PruneCD: Contrasting Pruned Self Model to Improve Decoding Factuality. 32462-32473 - Jinfeng Zhou, Yuxuan Chen, Jianing Yin, Yongkang Huang, Yihan Shi, Xikun Zhang, Libiao Peng, Rongsheng Zhang, Tangjie Lv, Zhipeng Hu, Hongning Wang, Minlie Huang:

Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues. 32474-32503 - Srikant Panda, Amit Agarwal, Hitesh Laxmichand Patel:

AccessEval: Benchmarking Disability Bias in Large Language Models. 32504-32530 - Yihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle H. Ungar:

The Impact of Language Mixing on Bilingual LLM Reasoning. 32531-32548 - Stella Frank, Emily Allaway:

VISaGE: Understanding Visual Generics and Exceptions. 32549-32558 - Alex Laitenberger, Christopher D. Manning, Nelson F. Liu:

Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models. 32559-32569 - Yisong Miao, Min-Yen Kan:

Discursive Circuits: How Do Language Models Understand Discourse Relations? 32570-32589 - Chan Young Park, Jillian Fisher, Marius Memmel, Dipika Khullar, Seoho Yun, Abhishek Gupta, Yejin Choi:

Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning. 32590-32611 - Gaurav Srivastava, Shuxiang Cao, Xuan Wang:

ThinkSLM: Towards Reasoning in Small Language Models. 32612-32662 - Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal:

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning. 32663-32686 - Anton Korikov, Pan Du, Scott Sanner, Navid Rekabsaz:

Batched Self-Consistency Improves LLM Relevance Assessment and Ranking. 32687-32703 - Marc Felix Brinner, Sina Zarrieß:

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts. 32704-32719 - Zihao Zhao, Anjalie Field:

Controlled Generation for Private Synthetic Text. 32720-32735 - Kilichbek Haydarov, Youssef Mohamed, Emilio Goldenhersch, Paul OCallaghan, Li-jia Li, Mohamed Elhoseiny:

Towards AI-Assisted Psychotherapy: Emotion-Guided Generative Interventions. 32736-32755 - Yongjian Chen, Antonio Toral:

From Shortcuts to Balance: Attribution Analysis of Speech-Text Feature Utilization in Distinguishing Original from Machine-Translated Texts. 32756-32763 - Gaurav Srivastava, Zhenyu Bi, Meng Lu, Xuan Wang:

DEBATE, TRAIN, EVOLVE: Self-Evolution of Language Model Reasoning. 32764-32810 - Wentao Zhang, Woojeong Kim, Yuntian Deng:

From Chat Logs to Collective Insights: Aggregative Question Answering. 32811-32850 - Tonmoy Hasan, Razvan C. Bunescu:

A Text-Based Recommender System that Leverages Explicit Affective State Preferences. 32851-32865 - Geyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, Wei Xu:

CARE: Multilingual Human Preference Learning for Cultural Awareness. 32866-32895 - Justin Vasselli, Eunike Andriani Kardinata, Yusuke Sakai, Taro Watanabe:

Multilingual Dialogue Generation and Localization with Dialogue Act Scripting. 32896-32911 - Tamás Ficsor, Gábor Berend:

SUE: Sparsity-based Uncertainty Estimation via Sparse Dictionary Learning. 32912-32929 - Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang:

Planning-Aware Code Infilling via Horizon-Length Prediction. 32930-32942 - Ashmari Pramodya, Nirasha Nelki, Heshan Shalinda, Chamila Liyanage, Yusuke Sakai, Randil Pushpananda, Ruvan Weerasinghe, Hidetaka Kamigaito, Taro Watanabe:

SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala. 32943-32961 - Kartik Sharma, Peeyush Kumar, Yunqing Li:

OG-RAG: Ontology-grounded retrieval-augmented generation for large language models. 32962-32981 - Finlay Fehlauer, Kyle Mahowald, Tiago Pimentel:

Convergence and Divergence of Language Models under Different Random Seeds. 32982-32991 - Liuxuan Jiao, Chen Gao, Yiqian Yang, Chenliang Zhou, YiXian Huang, Xinlei Chen, Yong Li:

Analyzing and Modeling LLM Response Lengths with Extreme Value Theory: Anchoring Effects and Hybrid Distributions. 32992-33002 - Jio Choi, Mohit Bansal, Elias Stengel-Eskin:

Language Models Identify Ambiguities and Exploit Loopholes. 33003-33018 - Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang:

Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance. 33019-33036 - Alhanoof Althnian, Norah A. Alzahrani, Shaykhah Z. Alsubaie, Eman Albilali, Ahmed Abdelali, Nouf M. Alotaibi, M. Saiful Bari, Yazeed Alnumay, Abdulhamed Alothaimen, Maryam Saif, Shahad D. Alzaidi, Faisal Abdulrahman Mirza, Yousef Almushayqih, Mohammed Al Saleem, Ghadah Alabduljabbar, Abdulmohsen Al-Thubaity, Areeb Alowisheq, Nora Al-Twairesh:

AraEval: An Arabic Multi-Task Evaluation Suite for Large Language Models. 33037-33061 - Yumeng Wang, Xiuying Chen, Suzan Verberne:

QUIDS: Query Intent Description for Exploratory Search via Dual Space Modeling. 33062-33077 - Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang, Haozhu Wang, Han Ding, Yuzhe Lu, Zhichao Xu, Yun Zhou, Balasubramaniam Srinivasan, Qiaojing Yan, Yueyan Chen, Haibo Ding, Panpan Xu, Lin Lee Cheong:

A Systematic Survey of Automatic Prompt Optimization Techniques. 33078-33110 - Beiduo Chen, Yang Janet Liu, Anna Korhonen, Barbara Plank:

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation. 33111-33135 - Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, Yassine Benajiba:

MemInsight: Autonomous Memory Augmentation for LLM Agents. 33136-33152 - Chenglong Lu, Chenxiao Li, Jingwei Cheng, Yongquan Ji, Guoqing Chen, Fu Zhang:

Breaking the Noise Barrier: LLM-Guided Semantic Filtering and Enhancement for Multi-Modal Entity Alignment. 33153-33167 - Zeinab Sadat Taghavi, Ali Modarressi, Yunpu Ma, Hinrich Schütze:

ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge. 33168-33190 - Lisa Alazraki, Maximilian Mozes, Jon Ander Campos, Yi Chern Tan, Marek Rei, Max Bartolo:

No Need for Explanations: LLMs can implicitly learn from mistakes in-context. 33191-33215 - Ziyu Chen, Junfei Sun, Chenxi Li, Tuan Dung Nguyen, Jing Yao, Xiaoyuan Yi, Xing Xie, Chenhao Tan, Lexing Xie:

MoVa: Towards Generalizable Classification of Human Morals and Values. 33216-33260 - Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu:

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration. 33261-33278 - Wenyuan Zhang, Shuaiyi Nie, Jiawei Sheng, Zefeng Zhang, Xinghua Zhang, Yongquan He, Tingwen Liu:

Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing. 33279-33302 - Jiazheng Liu, Sipeng Zheng, Börje F. Karlsson, Zongqing Lu:

Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning. 33303-33324 - Shengjie Li, Vincent Ng:

Graph-Based Multi-Trait Essay Scoring. 33325-33351 - John Salvador, Naman Bansal, Mousumi Akter, Souvika Sarkar, Anupam Das, Shubhra Kanti Karmaker:

Benchmarking LLMs on Semantic Overlap Summarization. 33352-33373 - Siddhant Bikram Shah, Kristina T. Johnson:

N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal Vocalizations. 33374-33391 - Jinwook Park, Kangil Kim:

Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction. 33392-33403 - Alexander Spangher, Michael Vu, Arda Kaz, Naitian Zhou, Ben Welsh:

Spatial Layouts in News Homepages Capture Human Preferences. 33404-33420 - Taebaek Hwang, Minseo Kim, Gisang Lee, Seonuk Kim, Hyunjun Eun:

KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts. 33421-33432 - Jeonghye Kim, Sojeong Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung:

ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection. 33433-33465 - Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen:

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward. 33466-33494 - Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Yanyuan Qiao, Imran Razzak, Yutong Xie:

A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making. 33495-33512 - Yongye Su, Yucheng Zhang, Zeru Shi, Bruno Ribeiro, Elisa Bertino:

Castle: Causal Cascade Updates in Relational Databases with Large Language Models. 33513-33525 - Vishnu Raja, Adithya V. Ganesan, Anand Syamkumar, Ritwik Banerjee, H. Andrew Schwartz:

Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies. 33526-33537 - Kinjal Basu, Ibrahim Abdelaziz, Kiran Kate, Mayank Agarwal, Maxwell Crouse, Yara Rizk, Kelsey Bradford, Asim Munawar, Sadhana Kumaravel, Saurabh Goyal, Xin Wang, Luis A. Lastras, Pavan Kapanipathi:

NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls. 33538-33547 - Md. Atabuzzaman, Ali Asgarov, Christopher Thomas:

Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models. 33548-33562 - Sandeep Kumar, Tirthankar Ghosal, Vinayak Goyal, Asif Ekbal:

Can Large Language Models Unlock Novel Scientific Research Ideas? 33563-33587 - Wenya Xie, Shaochen Zhong, Hoang Anh Duy Le, Zhaozhuo Xu, Jianwen Xie, Zirui Liu:

Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly. 33588-33598 - Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar:

DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context. 33599-33626 - Shuyang Cao, Kaijian Zou, Lu Wang:

SYNC: A Synthetic Long-Context Understanding Benchmark for Controlled Comparisons of Model Capabilities. 33627-33648 - Chester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, Constantine Lignos:

OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages. 33649-33674 - Elizabeth Orwig, Shinwoo Park, Hyundong Jin, Yo-Sub Han:

Mondrian: A Framework for Logical Abstract (Re)Structuring. 33675-33690 - Hiroyuki Deguchi, Masaaki Nagata:

Case-Based Decision-Theoretic Decoding with Quality Memories. 33691-33706 - Xinliang Frederick Zhang, Nicholas Beauchamp, Lu Wang:

PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process. 33707-33736 - Ananth Agarwal, Jasper Jian, Christopher D. Manning, Shikhar Murty:

Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations. 33737-33757 - Zihan Huang, Junda Wu, Rohan Surana, Tong Yu, David Arbour, Ritwik Sinha, Julian J. McAuley:

Image Difference Captioning via Adversarial Preference Optimization. 33758-33770 - Mohammad Ramezanali, Mo Vazifeh, Paolo Santi:

seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs. 33771-33792 - Minki Hong, Jangho Choi, Jihie Kim:

NormGenesis: Multicultural Dialogue Generation via Exemplar-Guided Social Norm Modeling and Violation Recovery. 33793-33831 - Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken:

SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas. 33832-33849 - Chaeri Kim, Jaeyeon Bae, Taehwan Kim:

Data Descriptions from Large Language Models with Influence Estimation. 33850-33867 - Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken:

EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking. 33868-33881 - Shiqi Wang, Qi Wang, Runliang Niu, He Kong, Yi Chang:

MicroEdit: Neuron-level Knowledge Disentanglement and Localization in Lifelong Model Editing. 33882-33896 - Domenico Meconi, Simone Stirpe, Federico Martelli, Leonardo Lavalle, Roberto Navigli:

Do Large Language Models Understand Word Senses? 33897-33916 - Vijeta Deshpande, Debasmita Ghose, John D. Patterson, Roger E. Beaty, Anna Rumshisky:

Diverse, not Short: A Length-Controlled Data Selection Strategy for Improving Response Diversity of Language Models. 33917-33938 - Yixuan Tang, Yuanyuan Shi, Yiqun Sun, Anthony Kum Hoe Tung:

Uncovering the Bigger Picture: Comprehensive Event Understanding Via Diverse News Retrieval. 33939-33957 - Hyungjune Bu, ChanJoo Jung, Minjae Kang, Jaehyung Kim:

Personalized LLM Decoding via Contrasting Personal Preference. 33958-33978 - Yixuan Tang, Jincheng Wang, Anthony Kum Hoe Tung:

The Missing Parts: Augmenting Fact Verification with Half Truth Detection. 33979-33996 - Yimin Xiao, Yongle Zhang, Dayeon Ki, Calvin Bao, Marianna J. Martindale, Charlotte Vaughn, Ge Gao, Marine Carpuat:

Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect Translations. 33997-34014 - Emanuele Moscato, Tiancheng Hu, Matthias Orlikowski, Paul Röttger, Debora Nozza:

Personalization up to a Point: Why Personalized Content Moderation Needs Boundaries, and How We Can Enforce Them. 34015-34029 - Jun Rong Brian Chong, Yixuan Tang, Anthony Kum Hoe Tung:

MPCG: Multi-Round Persona-Conditioned Generation for Modeling the Evolution of Misinformation with LLMs. 34030-34064 - Pingjun Hong, Beiduo Chen, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank:

LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference. 34065-34085 - Tommaso Bonomo, Luca Gioffrè, Roberto Navigli:

LiteraryQA: Towards Effective Evaluation of Long-document Narrative QA. 34086-34107 - Seung-Bin Kim, Junhyeok Cha, Hyung-Seok Oh, Heejin Choi, Seong-Whan Lee:

FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Insertion and Filler Style Control. 34108-34125 - Luca Moroni, Javier Aula-Blasco, Simone Conia, Irene Baucells, Naiara Pérez, Silvia Paniagua Suárez, Anna Salles, Malte Ostendorff, Júlia Falcão, Guijin Son, Aitor Gonzalez-Agirre, Roberto Navigli, Marta Villegas:

Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages? 34126-34157 - Yixuan Wang, Shiyu Ji, Yijun Liu, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che:

Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query. 34158-34174 - Eva Maria Vecchi, Neele Falk, Carlotta Quensel, Iman Jundi, Gabriella Lapesa:

PerspectiveMod: A Perspectivist Resource for Deliberative Moderation. 34175-34198 - Hongyu Sun, Yusuke Sakai, Haruki Sakajo, Shintaro Ozaki, Kazuki Hayashi, Hidetaka Kamigaito, Taro Watanabe:

LoCt-Instruct: An Automatic Pipeline for Constructing Datasets of Logical Continuous Instructions. 34199-34218 - Shweta Verma, Abhinav Anand, Mira Mezini:

CodeSSM: Towards State Space Models for Code Understanding. 34219-34235 - Numaan Naeem, Abdellah El Mekki, Muhammad Abdul-Mageed:

EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs. 34236-34263 - Giuliano Martinelli, Bruno Gatti, Roberto Navigli:

xCoRe: Cross-context Coreference Resolution. 34264-34278 - Jeongyeon Hwang, Junyoung Park, Hyejin Park, Dongwoo Kim, Sangdon Park, Jungseul Ok:

Retrieval-Augmented Generation with Estimation of Source Reliability. 34279-34303 - Pawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot, Panuthep Tasawong, Thitiwat Nopparatbundit, Keerakiat Pratai, Sarana Nutanong:

NitiBench: Benchmarking LLM Frameworks on Thai Legal Question Answering Capabilities. 34304-34327 - Maggie Mi, Aline Villavicencio, Nafise Sadat Moosavi:

From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become Errors. 34328-34341 - Alaa Aljabari, Mohammed Khalilia, Mustafa Jarrar:

WojoodRelations: Arabic Relation Extraction Corpus and Modeling. 34342-34360 - Murathan Kurfali, Robert Östling:

Conflicting Needles in a Haystack: How LLMs behave when faced with contradictory information. 34361-34376 - Wenxuan Liu, Zixuan Li, Long Bai, Yuxin Zuo, Daozhu Xu, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng:

Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning Extraction. 34377-34399 - Sherrie Shen, Weixuan Wang, Alexandra Birch:

Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation. 34400-34416 - Karim Ghonim, Andrei Stefan Bejgu, Alberte Fernández-Castro, Roberto Navigli:

Concept-pedia: a Wide-coverage Semantically-annotated Multimodal Dataset. 34417-34438 - Karim Ghonim, Pere-Lluís Huguet Cabot, Riccardo Orlando, Roberto Navigli:

RAED: Retrieval-Augmented Entity Description Generation for Emerging Entity Linking and Disambiguation. 34439-34452 - Kyuyoung Kim, Jinwoo Shin, Jaehyung Kim:

Personalized Language Models via Privacy-Preserving Evolutionary Model Merging. 34453-34468 - Padakanti Srijith, Khushbu Pahwa, Radhika Mamidi, Bapi Raju Surampudi, Manish Gupta, Subba Reddy Oota:

Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening. 34469-34486 - Mounica Maddela, Lingjue Xie, Daniel Preotiuc-Pietro, Mausam:

STARQA: A Question Answering Dataset for Complex Analytical Reasoning over Structured Databases. 34487-34499 - Colin Hong, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov:

Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency. 34500-34517 - Chenxin An, Zhihui Xie, Xiaonan Li, Ming Zhong, Shansan Gong, Lei Li, Jun Zhang, Jingjing Xu, Lingpeng Kong:

Long Chain-of-Thought Fine-tuning via Understanding-to-Reasoning Transition. 34518-34534 - Gleb Kuzmin, Petr Strepetov, Maksim Stankevich, Natalya V. Chudova, Artem Shelmanov, Ivan V. Smirnov:

Exploring Large Language Models for Detecting Mental Disorders. 34535-34559 - Joonho Ko, Jinheon Baek, Sung Ju Hwang:

Efficient Real-time Refinement of Language Model Text Generation. 34560-34573 - Daehoon Gwak, Minseo Jung, Junwoo Park, Minho Park, ChaeHun Park, Junha Hyung, Jaegul Choo:

Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs. 34574-34594 - Esra Dönmez, Maximilian Maurer, Gabriella Lapesa, Agnieszka Falenska:

AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts. 34595-34626 - Souha Hassine, Asma Arrak, Marouene Addhoum, Steven R. Wilson:

TounsiBench: Benchmarking Large Language Models for Tunisian Arabic. 34627-34642 - Ines Rehbein, Ines Reinig, Simone Paolo Ponzetto:

Moral Framing in Politics (MFiP): A new resource and models for moral framing. 34643-34663 - Aakash Kumar Agarwal, Saprativa Bhattacharjee, Mauli Rastogi, Jemima Jacob, Biplab Banerjee, Rashmi Gupta, Pushpak Bhattacharyya:

ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social Media. 34664-34682 - Michel Olvera, Changhong Wang, Paraskevas Stamatiadis, Gaël Richard, Slim Essid:

iKnow-audio: Integrating Knowledge Graphs with Audio-Language Models. 34683-34700 - Sourjyadip Ray, Shubham Sharma, Somak Aditya, Pawan Goyal:

EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos. 34701-34727 - Denis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Shwartz-Ziv, Tomasz Kajdanowicz:

The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs. 34728-34745 - Rachneet Singh Sachdeva, Rima Hazra, Iryna Gurevych:

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions. 34746-34776 - Sina J. Semnani, Han Zhang, Xinyan He, Merve Tekgurler, Monica Lam:

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition. 34777-34824 - Inbar Pendzel, Einat Minkov:

Towards Author-informed NLP: Mind the Social Bias. 34825-34838 - Sina J. Semnani, Jirayu Burapacheep, Arpandeep Khatua, Thanawan Atchariyachanvanit, Zheng Wang, Monica S. Lam:

Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models. 34839-34866 - Junghwan Kim, Haotian Zhang, David Jurgens:

Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains. 34867-34892 - Libo Zhao, Jing Li, Ziqian Zeng:

DrFrattn: Directly Learn Adaptive Policy from Attention for Simultaneous Machine Translation. 34893-34906 - Fagun Patel, Duc Q. Nguyen, Sang T. Truong, Jody Vaynshtok, Sanmi Koyejo, Nick Haber:

The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology. 34907-34925 - Sina Abbasi, Mohammad Reza Modarres, Mohammad Taher Pilehvar:

NormXLogit: The Head-on-Top Never Lies. 34926-34947 - Akriti Jain, Pritika Ramu, Aparna Garimella, Apoorv Saxena:

Doc2Chart: Intent-Driven Zero-Shot Chart Generation from Documents. 34948-34963 - Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang:

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification. 34964-34976 - Tanawan Premsri, Parisa Kordjamshidi:

FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks. 34977-35003 - Roksana Goworek, Haim Dubossarsky:

Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Cross-Lingual Transfer in Sense-Aware Tasks. 35004-35029 - Arturo Oncevay, Elena Kochkina, Keshav Ramani, Toyin Aguda, Simerjot Kaur, Charese Smiley:

Translating Domain-Specific Terminology in Typologically-Diverse Languages: A Study in Tax and Financial Education. 35030-35044 - Tomohiro Sawada, Kartik Goyal:

Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models. 35045-35058 - Nandan Kumar Jha, Brandon Reagen:

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space? 35059-35070 - Fan Gao, Cheng Huang, Yutong Liu, Nyima Tashi, Xiangxiang Wang, Thupten Tsering, Ban Ma-bao, Renzeng Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng, Yongbin Yu, Hao Wang:

TLUE: A Tibetan Language Understanding Evaluation Benchmark. 35071-35097 - Zeyu Zhang, Alessandro Moschitti, Thuy Vu:

Retrieving Support to Rank Answers in Open-Domain Question Answering. 35098-35105 - Adam Zahradník, Marek Suppa:

Trojsten Benchmark: Evaluating LLM Problem-Solving in Slovak STEM Competition Problems. 35106-35121 - Alexandre Costa Ferro Filho, Rafaello Virgilli, Lucas Alcântara Souza, Frederico Santos de Oliveira, Marcelo Henrique Lopes Ferreira, Daniel Tunnermann, Gustavo dos Reis Oliveira, Anderson da Silva Soares, Arlindo Rodrigues Galvão Filho:

BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS. 35122-35127 - Shaona Ghosh, Amrita Bhattacharjee, Yftah Ziser, Christopher Parisien:

A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs. 35128-35148 - Jaden Kapali, Keaton Williamson, Winston Wu:

Statistical and Neural Methods for Hawaiian Orthography Modernization. 35149-35155 - Sriharsh Bhyravajjula, Melanie Walsh, Anna Preus, Maria Antoniak:

so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs. 35156-35173 - Jingyu Zhang, Jiacan Yu, Marc Marone, Benjamin Van Durme, Daniel Khashabi:

Certified Mitigation of Worst-Case LLM Copyright Infringement. 35174-35195 - Eduard Tulchinskii, Laida Kushnareva, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov:

Quantifying Logical Consistency in Transformers via Query-Key Alignment. 35196-35211 - Yao Dou, Michel Galley, Baolin Peng, Chris Kedzie, Weixin Cai, Alan Ritter, Chris Quirk, Wei Xu, Jianfeng Gao:

SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants? 35212-35290 - Sophia Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen, Chen Liu, Yixin Liu, Roque K. Thuo, Sonia Knowlton, Ruzica Piskac, Scott J. Shapiro, Arman Cohan:

CourtReasoner: Can LLM Agents Reason Like Judges? 35291-35306 - Sharanya Majumder, Zehua Li, Derek Ouyang, Kit T. Rodolfa, Elena Eneva, Julian Nyarko, Daniel E. Ho:

Not Your Typical Government Tipline: LLM-Assisted Routing of Environmental Protection Agency Citizen Tips. 35307-35315 - Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen, Charles Fleming, Ming Jin, Ruoxi Jia:

Retracing the Past: LLMs Emit Training Data When They Get Lost. 35316-35337 - Linyang He, Qiaolin Wang, Xilin Jiang, Nima Mesgarani:

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations. 35338-35353 - Khonzoda Umarova, Lillian Lee, Laerdon Kim:

Current Semantic-change Quantification Methods Struggle with Discovery in the Wild. 35354-35367 - Jay Patel, Hrudayangam Mehta, Jeremy Blackburn:

Evaluating Large Language Models for Detecting Antisemitism. 35368-35397 - Guangze Gao, Zixuan Li, Chunfeng Yuan, Jiawei Li, Wu Jianzhuo, Yuehao Zhang, Xiaolong Jin, Bing Li, Weiming Hu:

D-RAG: Differentiable Retrieval-Augmented Generation for Knowledge Graph Question Answering. 35398-35417 - Thang Luong, Dawsen Hwang, Hoang H. Nguyen, Golnaz Ghiasi, Yuri Chervonyi, Insuk Seo, Junsu Kim, Garrett Bingham, Jonathan Lee, Swaroop Mishra, Alex Zhai, Clara Huiyi Hu, Henryk Michalewski, Jimin Kim, Jeonghyun Ahn, Junhwi Bae, Xingyou Song, Trieu H. Trinh, Quoc V. Le, Junehyuk Jung:

Towards Robust Mathematical Reasoning. 35418-35442 - Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang, Surajit Chaudhuri:

Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning. 35443-35460 - Ankan Mullick, Sombit Bose, Rounak Saha, Ayan Kumar Bhowmick, Aditya Vempaty, Prasenjit Dey, Ravi Kokku, Pawan Goyal, Niloy Ganguly:

Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from Documents. 35461-35489 - Moritz Altemeyer, Steffen Eger, Johannes Daxenberger, Yanran Chen, Tim Altendorf, Philipp Cimiano, Benjamin Schiller:

Argument Summarization and its Evaluation in the Era of Large Language Models. 35490-35511 - Margaret A. Hughes, Brandon Roy, Elinor Poole-Dayan, Deb Roy, Jad Kabbara:

Computational Analysis of Conversation Dynamics through Participant Responsivity. 35512-35531 - Sangjun Lee, Seung-taek Woo, Jungyu Jin, Changhun Lee, Eunhyeok Park:

AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models. 35532-35550 - Alejandro Benito-Santos, Adrián Ghajari:

Beyond Averages: Learning with Annotator Disagreement in STS. 35551-35557 - Wenyang Hu, Gregory Kang Ruey Lau, Diwen Liu, Jizhuo Chen, See-Kiong Ng, Bryan Kian Hsiang Low:

Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning Tasks. 35558-35572 - Seyedeh Fatemeh Ebrahimi, Jaakko Peltonen:

Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics. 35573-35598 - Nadine El-Naggar, Tatsuki Kuribayashi, Ted Briscoe:

Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages. 35599-35613 - Megi Dervishi, Alexandre Allauzen, Gabriel Synnaeve, Yann LeCun:

Training compute-optimal transformer encoder models. 35614-35629 - Hyungyu Shin, Jingyu Tang, Yoonjoo Lee, Nayoung Kim, Hyunseung Lim, Ji Yong Cho, Hwajung Hong, Moontae Lee, Juho Kim:

Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews. 35630-35656 - Zoe Wanying He, Sean Trott, Meenakshi Khosla:

Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models. 35657-35672 - Artem Vazhentsev, Ekaterina Fadeeva, Rui Xing, Gleb Kuzmin, Ivan Lazichny, Alexander Panchenko, Preslav Nakov, Timothy Baldwin, Maxim Panov, Artem Shelmanov:

Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models. 35673-35694 - Xintong Wang, Yixiao Liu, Jingheng Pan, Liang Ding, Longyue Wang, Chris Biemann:

Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites. 35695-35711 - Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun, Ivan Tsvigun, Zhuohan Xie, Igor Kiselev, Nico Daheim, Caiqi Zhang, Artem Vazhentsev, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin:

A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs. 35712-35731
