


19th EACL 2026: Rabat, Morocco - Volume 1: Long Papers
Vera Demberg, Kentaro Inui, Lluís Marquez: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2026 - Volume 1: Long Papers, Rabat, Morocco, March 24-29, 2026. Association for Computational Linguistics 2026, ISBN 979-8-89176-380-7

Yang Liu, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li, Lingyong Yan: LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts. 1-22

Yuxuan Zhu, Antony Kellermann, Akul Gupta, Philip Li, Richard Fang, Rohan Bindu, Daniel Kang: Teams of LLM Agents can Exploit Zero-Day Vulnerabilities. 23-35

Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Alexander Miserlis Hoyle, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Elliott Ash: Can Reasoning Help Large Language Models Capture Human Annotator Disagreement? 36-54

Vilém Zouhar, Maike Züfle, Beni Egressy, Julius Cheng, Mrinmaya Sachan, Jan Niehues: Early-Exit and Instant Confidence Translation Quality Estimation. 55-76

Justus-Jonas Erker, Nils Reimers, Iryna Gurevych: GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval. 77-94

Gyubeum Lim, Yemo Koo, Vijay Krishna Madisetti: SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models. 95-140

Shuhaib Mehri, Xiusi Chen, Heng Ji, Dilek Hakkani-Tür: Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis. 141-164

Jan Strich, Enes Kutay Isgorur, Maximilian Trescher, Chris Biemann, Martin Semmann: T2-RAGBench: Text-and-Table Aware Retrieval-Augmented Generation. 165-191

Kefan Yu, Qingcheng Zeng, Weihao Xuan, Wanxin Li, Jingyi Wu, Rob Voigt: The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models. 192-213

Jonas Golde, Nicolaas Paul Jedema, RaviKiran Krishnan, Phong Le: Hierarchical Text Classification with LLM-Refined Taxonomies. 214-228

Chengsong Huang, Langlin Huang, Jiaxin Huang: Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning. 229-249

Sarah Ball, Frauke Kreuter, Nina Panickssery: Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models. 250-279

Tianyu Cao, Neel Bhandari, Akhila Yerukola, Akari Asai, Maarten Sap: Out of Style: RAG's Fragility to Linguistic Variation. 280-318

Franziska Weeber, Tanise Ceron, Sebastian Padó: Do Political Opinions Transfer Between Western Languages? An Analysis of Unaligned and Aligned Multilingual LLMs. 319-340

Haoran Sun, Shaoning Zeng, Bob Zhang: H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents. 341-350

Abid Ali, Diego Mollá, Usman Naseem: MULSUM: A Multimodal Summarization System with Vis-Aligner and Diversity-Aware Image Selection. 351-362

Federico Marcuzzi, Xuefei Ning, Roy Schwartz, Iryna Gurevych: How Quantization Shapes Bias in Large Language Models. 363-404

Jasmin Orth, Philipp Mondorf, Barbara Plank: If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models. 405-427

Sangmitra Madhusudan, Kaige Chen, Ali Emami: The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts. 428-453

Alperen Ozturk, Saziye Betül Özates, Sophia Bahar Root, Angela Violi, Nicholas A. Kotov, J. Scott Vanepps, Emine Sumeyra Turali-Emre: Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation. 454-465

Jiaxin Bai, Zhaobo Wang, Junfei Cheng, Dan Yu, Zerui Huang, Weiqi Wang, Xin Liu, Chen Luo, Yanming Zhu, Bo Li, Yangqiu Song: Intention Knowledge Graph Construction for User Intention Relation Modeling. 466-484

Chunyang Jiang, Paola Merlo: Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction. 485-500

Yunze Xiao, Tingyu He, Lionel Z. Wang, Yiming Ma, Xingyu Song, Xiaohang Xu, Mona T. Diab, Irene Li, Ka Chung Ng: JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human risky health behavior Content in Jirai Community. 501-517

Manoj Balaji Jagadeeshan, Samarth Bhatia, Pretam Ray, Harshul Raj Surana, Akhil Rajeev P, Priya Mishra, Annarao Kulkarni, Ganesh Ramakrishnan, Prathosh A. P., Pawan Goyal: Chandomitra: Towards Generating Structured Sanskrit Poetry from Natural Language Inputs. 518-534

Chen Cecilia Liu, Hiba Arnaout, Nils Kovacic, Dana Atzil-Slonim, Iryna Gurevych: Tailored Emotional LLM-Supporter: Enhancing Cultural Sensitivity. 535-574

Hussein Abdallah, Ibrahim Abdelaziz, Panos Kalnis, Essam Mansour: Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs. 575-592

David Guzman Piedrahita, Irene Strauss, Rada Mihalcea, Zhijing Jin: Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models. 593-652

Yufeng Zou, Jean Utke, Diego Klabjan, Han Liu: PromptFE: Automated Feature Engineering by Prompting. 653-681

Maor Juliet Lavi, Tova Milo, Mor Geva: Detecting (Un)answerability in Large Language Models with Linear Directions. 682-699

Sanghwan Bae, Jiwoo Hong, Min Young Lee, Hanbyul Kim, Jeongyeon Nam, Donghyun Kwak: Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning. 700-719

Elize Herrewijnen, Dong Nguyen, Floris Bex, Albert Gatt: BERT, are you paying attention? Attention regularization with human-annotated rationales. 720-751

Jasper Jian, Christopher D. Manning: Humans and transformer LMs: Abstraction drives language learning. 752-765

Minh Duc Chu, Kshitij Pawar, Zihao He, Roxanna Sharifi, Ross M. Sonnenblick, Magdalayna Curry, Laura D'Adamo, Lindsay Young, Stuart B. Murray, Kristina Lerman: BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok. 766-790

Terra Blevins, Susanne Sophie Schmalwieser, Benjamin Roth: Do language models accommodate their users? A study of linguistic convergence. 791-807

Anmol Goel, Alan Ritter, Iryna Gurevych: Auditing Language Model Unlearning via Information Decomposition. 808-826

Yu-Shin Huang, Peter Just, Hanyun Yin, Krishna Narayanan, Ruihong Huang, Chao Tian: OD-Stega: LLM-Based Relatively Secure Steganography via Optimized Distributions. 827-851

Min Zeng, Xi Chen, Haiqin Yang, Yike Guo: Sparse Adapter Fusion for Continual Learning in NLP. 852-863

Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Tianjiao Li, Chua Jia Jim Deryl, Lee Onn Mak, Gee Wah Ng, Kezhi Mao: Rethinking Prompt Optimizers: From Prompt Merits to Optimization. 864-892

Ana-Maria Bucur, Marcos Zampieri, Tharindu Ranasinghe, Fabio Crestani: A Survey on Multilingual Mental Disorders Detection from Social Media Data. 893-918

Ilias Chalkidis, Stephanie Brandl, Paris Aslanidis: Identifying Fine-grained Forms of Populism in Political Discourse: A Case Study on Donald Trump's Presidential Campaigns. 919-936

Xingyu Zhu, Claire Nédellec, Balázs Nagy, László Vidács, Robert Bossy: SCoNE: a Self-Correcting and Noise-Augmented Method for Complex Biological and Chemical Named Entity Recognition. 937-952

Iwona Christop, Mateusz Czyznikiewicz, Pawel Skórzewski, Lukasz Bondaruk, Jakub Kubiak, Marcin Lewandowski, Marek Kubis: A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models. 953-983

Syeda Nahida Akter, Shrimai Prabhumoye, Matvei Novikov, Seungju Han, Ying Lin, Evelina Bakhturina, Eric Nyberg, Yejin Choi, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro: Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning. 984-1002

Aleksandra Krasnodebska, Katarzyna Dziewulska, Karolina Seweryn, Maciej Chrabaszcz, Wojciech Kusa: Safety of Large Language Models Beyond English: A Systematic Literature Review of Risks, Biases, and Safeguards. 1003-1034

Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu: InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection. 1035-1051

Yakup Abrek Er, Ilker Kesen, Gözde Gül Sahin, Aykut Erdem: Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish. 1052-1085

Bastien Liétard, Gabriel Loiseau: CALE : Concept-Aligned Embeddings for Both Within-Lemma and Inter-Lemma Sense Differentiation. 1086-1100

Felix Matthias Saaro, Pius von Däniken, Mark Cieliebak, Jan Milan Deriu: Do NOT Classify and Count: Hybrid Attribute Control Success Evaluation. 1101-1114

Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, William Yang Wang: Detecting Training Data of Large Language Models via Expectation Maximization. 1115-1129

Pritam Sil, Durgaprasad Karnam, Vinay Reddy Venumuddala, Pushpak Bhattacharyya: How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers? 1130-1140

Simon Münker, Nils Schwager, Achim Rettinger: Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism. 1141-1151

Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov, Evelyn Luise Brinkmann, Vera Schmitt, Nils Feldhus: Persona Prompting as a Lens on LLM Social Reasoning. 1152-1170

Michele Joshua Maggini, Paloma Piot, Anxo Pérez, Erik Bran Marino, Lúa Santamaría Montesinos, Ana Lisboa Cotovio, Marta Vázquez Abuín, Javier Parapar, Pablo Gamallo: PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media. 1171-1186

Lei Xu, Pierre Beckmann, Marco Valentino, André Freitas: Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition. 1187-1208

Elena Sofia Ruzzetti, Fabio Massimo Zanzotto, Tommaso Caselli: Lexical Popularity: Quantifying the Impact of Pre-training for LLM Performance. 1209-1230

Paul He, Yinya Huang, Mrinmaya Sachan, Zhijing Jin: Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification. 1231-1250

Punya Syon Pandey, Yongjin Yang, Jiarui Liu, Zhijing Jin: CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures. 1251-1266

Thomas F. Burns, Letitia Parcalabescu, Stephan Wäldchen, Michael Barlow, Gregor Ziegltrum, Volker Stampa, Bastian Harren, Björn Deiseroth: Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation. 1267-1283

Zijun Wu, Yongchang Hao, Lili Mou: Ultra-Low-Dimensional Prompt Tuning via Random Projection. 1284-1303

Robin Young: NP-Hard Lower Bound Complexity for Semantic Self-Verification. 1304-1318

Fengwei Tian, Payel Bhattacharjee, Heidi A. Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon: STAMP: Selective Task-Aware Mechanism for Text Privacy. 1319-1333

Alberto Purpura, Li Wang, Sahil Badyal, Eugenio Beaufrand, Adam Faulkner: Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities. 1334-1349

Muyang Zhou, Huaxia Rui: Utterance-level Detection Framework for LLM-Involved Content Detection in Conversational Setting. 1350-1366

Yifei Shen, Yilun Zhao, Justice Ou, Tinglin Huang, Arman Cohan: Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL. 1367-1412

Vishal Anand, Milad Alshomary, Kathleen McKeown: iBERT: Interpretable Embeddings via Sense Decomposition. 1413-1429

Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews, Lun Wang: Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World. 1430-1440

Cléa Chataigner, Rebecca Ma, Prakhar Ganesh, Yuhao Chen, Afaf Taïk, Elliot Creager, Golnoosh Farnadi: Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework. 1441-1467

Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon: AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation. 1468-1493

Jiaqian Zhang, Zhaozhi Qian, Faroq Al-Tam, Ignacio Iacobacci, Muhammad Al-Qurishi, Riad Souissi: Improving LLM Domain Certification with Pretrained Guide Models. 1494-1510

Kevin Han, Siddharth Maddikayala, Tim Knappe, Om Patel, Austen Liao, Amir Barati Farimani: TDFlow: Agentic Workflows for Test Driven Development. 1511-1527

Igor Sterner, Alex Lascarides, Frank Keller: Contrastive Learning with Narrative Twins for Modeling Story Salience. 1528-1550

Yachuan Liu, Xiaochun Wei, Lin Shi, Xinnuo Li, Bohan Zhang, Paramveer Dhillon, Qiaozhu Mei: ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models. 1551-1571

Grace Byun, Rebecca Lipschutz, Sean T. Minton, Abigail Powers, Jinho D. Choi: CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection. 1572-1590

Tessa Masis, Brendan T. O'Connor: Coordinates from Context: Using LLMs to Ground Complex Location References. 1591-1606

Viet-Thanh Pham, Minghan Wang, Hao-Han Liao, Thuy-Trang Vu: Discourse Graph Guided Document Translation with Large Language Models. 1607-1627

Patrice Bechard, Chao Wang, Amirhossein Abaskohi, Juan A. Rodríguez, Christopher Pal, David Vázquez, Spandana Gella, Sai Rajeswar, Perouz Taslakian: StarFlow: Generating Structured Workflow Outputs From Sketch Images. 1628-1645

Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Chieh-Yen Lin, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun: Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors. 1646-1668

Reza Khanmohammadi, Erfan Miahi, Simerjot Kaur, Charese Smiley, Ivan Brugere, Kundan Thind, Mohammad M. Ghassemi: How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains. 1669-1754

Hoang-Quoc Nguyen-Son, Minh-Son Dao, Koji Zettsu: SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine. 1755-1772

Zichen Song, Weijia Li: RoZO: Geometry-Aware Zeroth-Order Fine-Tuning on Low-Rank Adapters for Black-Box Large Language Models. 1773-1783

Ryusei Nishide, Makoto Miwa: Mitigating Degree Bias in Hypergraphs via Attribute-as-Structure Approach. 1784-1801

Pengda Wang, Huiqi Zou, Han Jiang, Hanjie Chen, Tianjun Sun, Xiaoyuan Yi, Ziang Xiao, Frederick L. Oswald: Generative Personality Simulation via Theory-Informed Structured Interview. 1802-1888

Chongwen Zhao, Yutong Ke, Kaizhu Huang: Unraveling LLM Jailbreaks Through Safety Knowledge Neurons. 1889-1906

Shristi Das Biswas, Yue Zhang, Anwesan Pal, Radhika Bhargava, Kaushik Roy: ELLA: Efficient Lifelong Learning for Adapters in Large Language Models. 1907-1924

Mohamed Elgaar, Hadi Amiri: LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking. 1925-1942

Ömer Faruk Akgül, Feiyu Zhu, Yuxin Yang, Rajgopal Kannan, Viktor K. Prasanna: RECIPE-TKG: From Sparse History to Structured Reasoning for LLM-based Temporal Knowledge Graph Completion. 1943-1965

Michelle Yuan, Weiyi Sun, Amir Rezaeian, Jyotika Singh, Sandip Ghoshal, Yao-Ting Wang, Miguel Ballesteros, Yassine Benajiba: Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth. 1966-1978

James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy: PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR. 1979-1997

Bolei Ma, Yong Cao, Indira Sen, Anna-Carolina Haensch, Frauke Kreuter, Barbara Plank, Daniel Hershcovich: Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation. 1998-2016

Ze Yu Zhang, Zitao Li, Yaliang Li, Bolin Ding, Bryan Kian Hsiang Low: Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graph for Retrieval-Augmented Generation. 2017-2054

Kai Sun, Yin Huang, Srishti Mehra, Mohammad Kachuee, Xilun Chen, Renjie Tao, Zhaojiang Lin, Andrea Jessee, Nirav Shah, Alex Betty, Yue Liu, Anuj Kumar, Wen-tau Yih, Xin Luna Dong: Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs? 2055-2074

Saptarshi Ghosh, Linfeng Liu, Tianyu Jiang: A Computational Approach to Visual Metonymy. 2075-2099

Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash: A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic. 2100-2113

Artem Chervyakov, Ulyana Isaeva, Anton A. Emelyanov, Artem Safin, Maria Tikhonova, Alexander Kharitonov, Yulia Lyakh, Petr Surovtsev, Denis Shevelev, Vildan Saburov, Vasily Konovalov, Elisei Rykov, Ivan Sviridov, Amina Miftakhova, Ilseyar Alimova, Alexander Panchenko, Alexander Kapitanov, Alena Fenogenova: Multimodal Evaluation of Russian-language Architectures. 2114-2161

Abhilekh Borah, Shubhra Ghosh, Kedar Joshi, Aditya Kumar Guru, Kripabandhu Ghosh: Don't Judge a Book by its Cover: Testing LLMs' Robustness Under Logical Obfuscation. 2162-2180

Shifali Agrahari, Moushumi Mahato, Abhisek Tiwari, Javaid Nabi: I know you are different! Towards Persona Driven Knowledge-infused Dialogue Assistant. 2181-2205

Qifan Yu, Zhenyu He, Sijie Li, Zhou Xun, Jun Zhang, Jingjing Xu, Di He: Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning. 2206-2222

Hyunseo Shin, Wonseok Hwang: Layer-wise Swapping for Generalizable Multilingual Safety. 2223-2238

Sondre Wold, Étienne Simon, Erik Velldal, Lilja Øvrelid: Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality. 2239-2252

Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang, Jack W. Stokes: Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers. 2253-2272

Qian Wang, Ziqi Huang, Ruoxi Jia, Paul Debevec, Ning Yu: MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling. 2273-2295

Salam Khalifa, Abdelrahim Qaddoumi, Nizar Habash, Owen Rambow: Computational Benchmarks for Egyptian Arabic Child Directed Speech. 2296-2307

Wooseok Choi, Hyungbin Kim, Yon Dohn Chung: K-LegalDeID: A Benchmark Dataset and KLUEBERT-CRF for De-identification in Korean Court Judgments. 2308-2325

Yuanbo Tang, Naifan Zhang, Yan Tang, Meixuan Chen, Shuhan Huang, Tingyu Cao, Yang Li: Specialization through Collaboration: Understanding Expert Interaction in Mixture-of-Expert Large Language Models. 2326-2339

Kellen Tan Cheng, Ganesh Ramesh, Nafiul Rashid, Geoffrey J. Tso, Jilong Kuang: Compact Language Models with Iterative Text Refinement for Health Dialogue Summarization. 2340-2363

Alberto Testoni, Iacer Calixto: Mind the Gap: Benchmarking LLM Uncertainty and Calibration with Specialty-Aware Clinical QA and Reasoning-Based Behavioural Features. 2364-2382

Andreas Säuberli, Darja Jepifanova, Diego Frassinelli, Barbara Plank: Controlling Reading Ease with Gaze-Guided Text Generation. 2383-2397

Marie Bexte, Andrew Caines, Diane Nicholls, Paula Buttery, Torsten Zesch: PictureStories: Predicting the Task Adherence of Language Learner Answers to a Picture Story-Based Writing Task. 2398-2415

Vitalii Hirak, Jaap Jumelet, Arianna Bisazza: Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models. 2416-2434

Sviatoslav Lushnei, Dmytro Shumskyi, Severyn Shykula, Ernesto Jiménez-Ruiz, Artur d'Avila Garcez: Large Language Models as Oracles for Ontology Alignment. 2435-2449

Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, James Zou: Reasoning or Knowledge: Stratified Evaluation of Biomedical LLMs. 2450-2483

Jonathan Davidov, Aviv Slobodkin, Shmuel Tomi Klein, Reut Tsarfaty, Ido Dagan, Ayal Klein: Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages. 2484-2502

Wessel Poelman, Miryam de Lhoneux: Form and Meaning in Intrinsic Multilingual Evaluations. 2503-2521

Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Steffen Staab, Evgeny Kharlamov: What Breaks Knowledge Graph based RAG? Benchmarking and Empirical Insights into Reasoning under Incomplete Knowledge. 2522-2538

Ivan Vykopal, Matús Pikuliak, Simon Ostermann, Marián Simko: Assessing Web Search Credibility and Response Groundedness in Chat Assistants. 2539-2560

Gautam Siddharth Kashyap, Mark Dras, Usman Naseem: When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified. 2561-2572

Rongzhi Li, Hitomi Yanaka: NeuronMoE: Efficient Cross-Lingual Extension via Neuron-Guided Mixture-of-Experts. 2573-2586

Vigneshwaran Shankaran, Gabriella Lapesa, Claudia Wagner: From Emotion to Expression: Theoretical Foundations and Resources for Fear Speech. 2587-2606

Vijini Liyanage, François Yvon: AdaptBPE: From General Purpose to Specialized Tokenizers. 2607-2620

Julia Romberg, Christopher Schröder, Julius Gonsior, Katrin Tomanek, Fredrik Olsson: Reassessing Active Learning Adoption in Contemporary NLP: A Community Survey. 2621-2647

Osama Mohammed Afzal, Preslav Nakov, Tom Hope, Iryna Gurevych: Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback. 2648-2671

Busayo Awobade, Mardhiyah Sanni, Tassallah Abdullahi, Chibuzor Okocha, Kelechi Ezema, Devendra Deepak Kayande, Lukman E. Ismaila, Tobi Olatunji, Gloria Ashiya Katuka: AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs. 2672-2690

Tomás Freitas Osório, Henrique Lopes Cardoso: PortOldBERT: Portuguese Historical Language Models. 2691-2705

Alessio Cocchieri, Luca Ragazzi, Giuseppe Tagliavini, Gianluca Moro: ReMedQA: Are We Done With Medical Multiple-Choice Benchmarks? 2706-2738

Gabriele Maraia, Leonardo Ranaldi, Marco Valentino, Fabio Massimo Zanzotto: Can Activation Steering Generalize Across Languages? A Study on Syllogistic Reasoning in Language Models. 2739-2753

Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro: SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space. 2754-2775

Nuhu Ibrahim, Robert Stevens, Riza Batista-Navarro: Knowledge Augmentation Enhances Token Classification for Recipe Understanding. 2776-2788

Basit Ali, Anubhav Sinha, Nitin Ramrakhiyani, Sachin Pawar, Girish Keshav Palshikar, Manoj Apte: Argumentation and Judgement Factors: LLM-based Discovery and Application in Insurance Disputes. 2789-2804

Quan Hung Tran, Pham Tien Nam, Son T. Luu, Kiet Van Nguyen: ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts. 2805-2831

Manuel Frank, Haithem Afli: PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs. 2832-2851

Maria Korobeynikova, Alessia Battisti, Lukas Fischer, Yingqiang Gao: DETECT: Determining Ease and Textual Clarity of German Text Simplifications. 2852-2882

Wei-Ling Hsu, Yu-Chien Tang, An-Zi Yen: MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support. 2883-2901

Zihao Li, Shaoxiong Ji, Jörg Tiedemann: Test-Time Scaling of Reasoning Models for Machine Translation. 2902-2917

Kiran Kate, Yara Rizk, Poulami Ghosh, Ashu Gulati, Tathagata Chakraborti, Zidane Wright, Mayank Agarwal: How Good Are LLMs at Processing Tool Outputs? 2918-2941

Soyoung Oh, Xinting Huang, Mathis Pink, Michael Hahn, Vera Demberg: Tug-of-war between idioms' figurative and literal interpretations in LLMs. 2942-2958

Debtanu Datta, Mohan Kishore Chilukuri, Yash Kumar, Saptarshi Ghosh, Muhammad Bilal Zafar: Do LLM hallucination detectors suffer from low-resource effect? 2959-2985

Anas Belfathi, Nicolas Hernandez, Laura Monceaux, Warren Bonnard, Mary Catherine Lavissière, Christine Jacquin, Richard Dufour: Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling. 2986-3004

Juncheng Wang, Zhe Hu, Chao Xu, Siyue Ren, Yuxiang Feng, Yang Liu, Baigui Sun, Shujun Wang: Guided by the Plan: Enhancing Faithful Autoregressive Text-to-Audio Generation with Guided Decoding. 3005-3018

Andrea Ermellino, Lorenzo Malandri, Fabio Mercorio, Antonio Serino: Safe-Unsafe Concept Separation Emerges from a Single Direction in Language Models Activation Space. 3019-3034

Róbert Belanec, Branislav Pecher, Ivan Srba, Mária Bieliková: PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark. 3035-3054

Chenhui Li, Weihai Lu: Decoding the Market's Pulse: Context-Enriched Agentic Retrieval Augmented Generation for Predicting Post-Earnings Price Shocks. 3055-3073

May Bashendy, Walid Massoud, Sohaila Eltanbouly, Salam Albatarni, Marwan Sayed, Abrar Abir, Houda Bouamor, Tamer Elsayed: LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring. 3074-3091

Benjamin Elder, Anupama Murthi, Jungkoo Kang, Ankita Naik, Kinjal Basu, Kiran Kate, Danish Contractor: Live API-Bench: 2500+ Live APIs for Testing Multi-Step Tool Calling. 3092-3124

Arkadiusz Modzelewski, Witold Sosnowski, Eleni Papadopulos, Elisa Sartori, Tiziano Labruna, Giovanni Da San Martino, Adam Wierzbicki: MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection. 3125-3148

Felicia Körner, Max Müller-Eberstein, Anna Korhonen, Barbara Plank: When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training. 3149-3169

Qiao Liang, Yanjiang Liu, Weixiang Zhou, Ben He, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun, Yingfei Sun: Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models. 3170-3184

Kin Kwan Leung, Mouloud Belbahri, Yi Sui, Alex Labach, Xueying Zhang, Stephen Rose, Jesse C. Cresswell: Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems. 3185-3207

Haoyu Jiang, Fanjie Zeng, Boan Qu, Xiaojie Lin, Wei Zhong: Helios: A Foundational Language Model for Smart Energy Knowledge Reasoning and Application. 3208-3220

Georgii Aparin, Tasnima Sadekova, Alexey D. Rukhovich, Assel Yermekova, Laida Kushnareva, Vadim Popov, Kristian Kuznetsov, Irina Piontkovskaya: AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders. 3221-3254

Anna Bavaresco, Marianne de Heer Kloots, Sandro Pezzelle, Raquel Fernández: Vision-Language Models Align with Human Neural Representations in Concept Processing. 3255-3274

Minh Ngoc Ta, Dong Cao Van, Duc-Anh Hoang, Minh Le-Anh, Truong Nguyen, My Anh Tran Nguyen, Yuxia Wang, Preslav Nakov, Dinh Viet Sang: FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning. 3275-3296

Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galván-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen: BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data. 3297-3329

Seojin Hwang, Yumin Kim, Byeongjeong Kim, Donghoon Shin, Hwanhee Lee: Personality Editing for Language Models through Adjusting Self-Referential Queries. 3330-3351

Daniel Fadlon, Kfir Bar: How Much Pretraining Does Structured Data Need? 3352-3365

Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov: Finding Culture-Sensitive Neurons in Vision-Language Models. 3366-3381

Léo Labat, Étienne Ollion, François Yvon: Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions. 3382-3398

Serwar Basch, Ilia Kuznetsov, Tom Hope, Iryna Gurevych: ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links. 3399-3423

Sukannya Purkayastha, Nils Dycke, Anne Lauscher, Iryna Gurevych: Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue. 3424-3465

Shreyas N. Samaga, Gilberto Gonzalez Arroyo, Tamal K. Dey: HalluZig: Hallucination Detection using Zigzag Persistence. 3466-3482

Matt Pauk, Maria Leonor Pacheco: Mapping the Course for Prompt-based Structured Prediction. 3483-3508

Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu: Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models. 3509-3521

Huayu Li, Zhengxiao He, Siyuan Tian, Jinghao Wen, Ao Li: Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding. 3522-3533

Ian Berlot-Attwell, Tobias Sesterhenn, Frank Rudzicz, Xujie Si: Is This LLM Library Learning? Evaluation Must Account For Compute and Behaviour. 3534-3568

Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryu, Donghyun Kim, Michael S. Ryoo: Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA. 3569-3588

Aishwarya Maheswaran, Maunendra Sankar Desarkar: A Unified View on Emotion Representation in Large Language Models. 3589-3610

Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi: TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models. 3611-3625

Mohamed Elaraby, Diane J. Litman: ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs. 3626-3643

Potsawee Manakul, Woody Haosheng Gan, Michael J. Ryan, Ali Sartaz Khan, Warit Sirichotedumrong, Kunat Pipatanakul, William Barr Held, Diyi Yang: AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation. 3644-3663

Leonardo Ranaldi, Giulia Pucci: Learning Multilingual Agentic Policy to Control Sycophancy. 3664-3681

Seungho Lee, Kyumin Lee: ToxiPrompt: A Two-Stage Red-Teaming Approach for Balancing Adversarial Prompt Diversity and Response Toxicity. 3682-3696

Kosei Uemura, Miaoran Zhang, David Ifeoluwa Adelani: AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages. 3697-3717

Shanshan Liu, Noriki Nishida, Fei Cheng, Narumi Tokunaga, Rumana Ferdous Munne, Yuki Yamagata, Kouji Kozaki, Takehito Utsuro, Yuji Matsumoto: Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition. 3718-3734

Dhananjay Ashok, Suraj Nair, Mutasem Al-Darabsah, Choon Hui Teo, Tarun Agarwal, Jonathan May: A Representation Sharpening Framework for Zero Shot Dense Retrieval. 3735-3751

Praveen Venkateswaran, Danish Contractor: Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering. 3752-3770

Matthew Toles, Isaac Song, Rattandeep Singh, Zhou Yu: FormGym: Doing Paperwork with Agents. 3771-3785

Sil Hamilton, Matthew Wilkens, Andrew Piper: NarraBench: A Comprehensive Framework for Narrative Benchmarking. 3786-3801

Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Ruixiang Tang, Shaochen Zhong, Fan Yang, Andrew Wen, Mengnan Du, Xuanting Cai, Vladimir Braverman, Xia Hu: FaithLM: Towards Faithful Explanations for Large Language Models. 3802-3824

Matteo Gay, Coleman Haley, Mario Giulianelli, Edoardo M. Ponti: Is Information Density Uniform when Utterances are Grounded on Perception and Discourse? 3825-3853

Ayoub Hammal, Pierre Zweigenbaum, Caio Corro: KAD: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral. 3854-3872

Abeer Badawi, Elahe Rahimi, Md. Tahmid Rahman Laskar, Sheri Grach, Lindsay Bertrand, Lames Danok, Prathiba Dhanesh, Jimmy Huang, Frank Rudzicz, Elham Dolatabadi: When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation. 3873-3896

Benno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro: DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures. 3897-3907

Hyunji Nam, Lucía Langlois, James Malamut, Mei Tan, Dorottya Demszky: IDEAlign: Comparing Ideas of Large Language Models to Domain Experts. 3908-3925

Yue Zhou, Xiaobo Guo, Belhassen Bayar, Srinivasan H. Sengamedu: Amory: Building Coherent Narrative-Driven Agent Memory through Agentic Reasoning. 3926-3938

Cristian Santini, Marieke van Erp, Mehwish Alam: It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models. 3939-3954

Carolin Holtermann, Florian Schneider, Anne Lauscher: SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image Generation. 3955-3995

Ahmad Aljanaideh: Gender and Politeness Perception: A Novel Approach for Exploring Annotations Disagreement. 3996-4005

Carolin Holtermann, Nina Krebs, Anne Lauscher: TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models. 4006-4028

Peiran Li, Jan Fillies, Adrian Paschke: ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation. 4029-4044

Adriana Valentina Costache, Silviu Florin Gheorghe, Eduard Gabriel Poesina, Paul Irofti, Radu Tudor Ionescu: Text Classification Under Class Distribution Shift: A Survey. 4045-4060

Atoosa Malemir Chegini, Hamid Kazemi, Garrett Souza, Maria Safi, Yang Song, Samy Bengio, Sinead Williamson, Mehrdad Farajtabar: Reasoning's Razor: Reasoning Improves Accuracy but Hurts Recall at Critical Operating Points in Safety and Hallucination Detection. 4061-4086

Huaiyuan Yao, Wanpeng Xu, Justin Turnau, Nadia Kellam, Hua Wei: Instructional Agents: Reducing Teaching Faculty Workload through Multi-Agent Instructional Design. 4087-4109

Weishi Wang, Hengchang Hu, Daniel Dahlmeier: Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling. 4110-4130

Yi Zheng, Björn Ross, Walid Magdy: Validating Automatic Evaluation of Controllable Counterspeech Generation: Rankings Matter More Than Scores. 4131-4146

Rheeya Uppaal, Phu Mon Htut, Min Bai, Nikolaos Pappas, Zheng Qi, Sandesh Swamy:

Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking. 4147-4168 - Ha Min Son, Huan Ren, Xin Liu, Zhe Zhao:

Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools. 4169-4189 - Roelien C. Timmer, Necva Bölücü, Stephen Wan:

MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning Experiments. 4190-4206 - Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani:

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations. 4207-4220 - Yujia Hu, Roy Ka-Wei Lee:

HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations. 4221-4240 - Zhengyang Shan, Aaron Mueller:

Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics? 4241-4265 - Bo Ni, Yu Wang, Leyao Wang, Branislav Kveton, Franck Dernoncourt, Yu Xia, Hongjie Chen, Reuben Luera, Samyadeep Basu, Subhojyoti Mukherjee, Puneet Mathur, Nesreen K. Ahmed, Junda Wu, Li Li, Huixin Zhang, Ruiyi Zhang, Tong Yu, Sungchul Kim, Jiuxiang Gu, Zhengzhong Tu, Alexa F. Siu, Zichao Wang, Seunghyun Yoon, Nedim Lipka, Namyong Park, Zihao Lin, Trung Bui, Yue Zhao, Tyler Derr, Ryan A. Rossi:

A Survey on LLM-based Conversational User Simulation. 4266-4301 - Iffat Maab, Usman Haider, Junichi Yamagishi:

Prompt-driven Detection of Offensive Urdu Language using Large Language Models. 4302-4327 - Tiejin Chen, Kaishen Wang, Hua Wei:

Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models. 4328-4344 - Youngseung Jeon, Ziwen Li, Thomas Li, JiaSyuan Chang, Morteza Ziyadi, Xiang Anthony Chen:

RAGPPI: Retrieval-Augmented Generation Benchmark for Protein-Protein Interactions in Drug Discovery. 4345-4363 - Hee-Soo Kim, Jun-Young Kim, Jeong-Hwan Lee, Seong-Jin Park, Kang-Min Kim:

Don't Generate, Classify! Low-Latency Prompt Optimization with Structured Complementary Prompt. 4364-4383 - Bingxuan Hou, Jiayi Lin, Chenyang Zhang, Dapeng Yin, Shuyue Zhu, Qingqing Hong, Mengna Gao, Jun-li Wang:

CHROMIC: Chronological Reasoning Across Multi-Panel Comics. 4384-4400 - Kai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, Penglei Gao:

GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection. 4401-4416 - Chuyuan Li, Giuseppe Carenini:

BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models. 4417-4479 - Chuang Zhang, Zizhen Zhu, Yihao Wei, Bing Tian, Junyi Liu, Henan Wang, Wang Xavier, Yaxiao Liu:

Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning. 4480-4501 - Anubhab Mandal, Sandeep Mishra, Bishal Santra, Tushar Abhishek, Pawan Goyal, Manish Gupta:

Chat-Ghosting: Methods for Auto-Completion in Dialog Systems. 4502-4528 - Sirat Samyoun, Yingtai Xiao, Jian Du:

Attribution-Guided Multi-Object Hallucination and Bias Detection in Vision-Language Models. 4529-4548 - Ning Shi, Bradley Hauer, David Basil, John Zhang, Grzegorz Kondrak:

Word Surprisal Correlates with Sentential Contradiction in LLMs. 4549-4564 - Sharanya Dasgupta, Arkaprabha Basu, Sujoy Nath, Swagatam Das:

ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models. 4565-4584 - Chen Kim Heng, Shao Wen Tong, Julian Wong Wei Sheng:

Re2-DocRED: Revisiting Revisited-DocRED for Joint Entity and Relation Extraction. 4585-4621 - Nura Aljaafari, Danilo S. Carvalho, André Freitas:

Where Do LLMs Compose Meaning? A Layerwise Analysis of Compositional Robustness. 4622-4646 - Bryan Chen Zhengyu Tan, Weihua Zheng, Zhengyuan Liu, Nancy F. Chen, Hwaran Lee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee:

BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models. 4647-4669 - Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam:

Document-Level Zero-Shot Relation Extraction with Entity Side Information. 4670-4680 - Daniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim:

Steering Large Language Models for Machine Translation Personalization. 4681-4701 - Eunkyung Choi, Young Jin Suh, Siun Lee, Hongseok Oh, Juheon Kang, Won Hur, Hun Park, Wonseok Hwang:

Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties. 4702-4726 - Shigeng Chen, Linhao Luo, Zhangchi Qiu, Yanan Cao, Carl Yang, Shirui Pan:

Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing. 4727-4751 - Wafaa Mohammed, Vlad Niculae, Chrysoula Zerva:

Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding. 4752-4774 - Ryo Nagata, Daichi Mochihashi, Misato Ido, Yusuke Kubota, Naoki Otani, Yoshifumi Kawasaki, Hiroya Takamura:

Cross-lingual and Word-Independent Methods for Quantifying Degree of Grammaticalization. 4775-4787 - Hans Hergen Lehmann, Jae Hee Lee, Steven Schockaert, Stefan Wermter:

Knowing the Facts but Choosing the Shortcut: Understanding How Large Language Models Compare Entities. 4788-4821 - Everlyn Asiko Chimoto, Mostafa Elhoushi, Bruce A. Bassett:

Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLMs. 4822-4838 - Khanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen:

LaCoMSA: Language-Consistency Multilingual Self-Alignment with Latent Representation Rewarding. 4839-4853 - Kartik Ravisankar, HyoJung Han, Sarah Wiegreffe, Marine Carpuat:

Can you map it to English? The Role of Cross-Lingual Alignment in the Multilingual Performance of LLMs. 4854-4872 - Ponrawee Prasertsom, Andrea Silvi, Jennifer Culbertson, Devdatt P. Dubhashi, Moa Johansson, Kenny Smith:

Recursive numeral systems are highly regular and easy to process. 4873-4885 - Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares:

Bringing Emerging Architectures to Sequence Labeling in NLP. 4886-4909 - Zijie Wang, Xinyu Yan, Che Wang, Zihao Zeng, Lei Xiao, Wei Yang Bryan Lim:

SEMIROUTER: Sparse-Data Enhanced Routing for Adaptive Multi-LLM System. 4910-4921 - Hyeseon An, Shinwoo Park, Suyeon Woo, Yo-Sub Han:

DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation. 4922-4936 - Zhaoyue Sun, Gabriele Pergola, Yulan He:

Boundary-Aware LLM Augmentation for Low-Resource Event Argument Extraction. 4937-4953 - Gaifan Zhang, Yi Zhou, Danushka Bollegala:

CASE - Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement. 4954-4968 - Mingyang Li, Viktor Schlegel, Tingting Mu, Wuraola Oyewusi, Kai Kang, Goran Nenadic:

Evaluation and LLM-Guided Learning of ICD Coding Rationales. 4969-5003 - Tianhui Zhang, Yi Zhou, Danushka Bollegala:

Evaluating the Effect of Retrieval Augmentation on Social Biases. 5004-5026 - Angana Borah, Rada Mihalcea, Verónica Pérez-Rosas:

Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM Interactions. 5027-5053 - Xianzhi Li, Ethan Callanan, Abdellah Ghassel, Xiaodan Zhu:

Entropy-Gated Branching for Efficient Test-Time Reasoning. 5054-5069 - Sriram Balasubramanian, Samyadeep Basu, Koustava Goswami, Ryan Anthony Rossi, Varun Manjunatha, Roshan Santhosh, Ruiyi Zhang, Soheil Feizi, Nedim Lipka:

Decomposition-Enhanced Training for Post-Hoc Attributions in Language Models. 5070-5084 - Shubham Kulkarni, Alexander Lyzhov, Preetam Joshi, Shiva Chaitanya:

INSURE-Dial: A Phase-Aware Conversational Dataset Benchmark for Compliance Verification and Phase Detection. 5085-5109 - Antonia Karamolegkou, Angana Borah, Eunjung Cho, Sagnik Ray Choudhury, Martina Galletti, Pranav Gupta, Oana Ignat, Priyanka Kargupta, Neema Kotonya, Hemank Lamba, Sun-Joo Lee, Arushi Mangla, Ishani Mondal, Fatima Zahra Moudakir, Deniz Nazar, Poli Nemkova, Dina Pisarevskaya, Naquee Rizwan, Nazanin Sabri, Keenan Samway, Dominik Stammbach, Anna Steinberg, David Tomás, Steven R. Wilson, Bowen Yi, Jessica H. Zhu, Arkaitz Zubiaga, Anders Søgaard, Alexander Fraser, Zhijing Jin, Rada Mihalcea, Joel R. Tetreault, Daryna Dementieva:

NLP for Social Good: A Survey and Outlook of Challenges, Opportunities and Responsible Deployment. 5110-5170 - Suyash Fulay, Jocelyn Zhu, Michiel A. Bakker:

From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLMs. 5171-5194 - Ivan Vykopal, Antonia Karamolegkou, Jaroslav Kopcan, Qiwei Peng, Tomas Javurek, Michal Gregor, Marián Simko:

Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection. 5195-5221 - Faezeh Hosseini, Mohammadali Yousefzadeh, Yadollah Yaghoobzadeh:

FFE-Hallu: Hallucinations in Fixed Figurative Expressions: A Benchmark of Idioms and Proverbs in the Persian Language. 5222-5235 - Delvin Ce Zhang, Suhan Cui, Zhelin Chu, Xianren Zhang, Dongwon Lee:

MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval. 5236-5255 - Shubham Patle, Sara Ghaboura, Hania Tariq, Mohammad Usman Khan, Omkar Thawakar, Rao Muhammad Anwer, Salman H. Khan:

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding. 5256-5269 - Ofer Meshi, Krisztian Balog, Sally Goldman, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, Craig Boutilier:

ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders. 5270-5304 - Yu Wu, Ke Shu, Jonas Fischer, Lidia Pivovarova, David Rosson, Eetu Mäkelä, Mikko Tolonen:

Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark. 5305-5328 - Pedro Henrique Luz de Araujo, Michael A. Hedderich, Ali Modarressi, Hinrich Schütze, Benjamin Roth:

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions. 5329-5359 - Paul Grundmann, Jan Frick, Dennis Fast, Thomas Steffek, Felix A. Gers, Wolfgang Nejdl, Alexander Löser:

CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models. 5360-5378 - Mohd Mujtaba Akhtar, Girish, Muskaan Singh:

DIVINE : Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment. 5379-5392 - Imry Ziv, Nur Geffen Lan, Emmanuel Chemla:

Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the Impossible. 5393-5403 - Mohd Mujtaba Akhtar, Girish, Farhan Sheth, Muskaan Singh:

Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech. 5404-5413 - Pranav Shetty, Mirazul Haque, Zhiqiang Ma, Xiaomo Liu:

Detecting Non-Membership in LLM Training Data via Rank Correlations. 5414-5429 - Jiarui Liu, Weihao Xuan, Zhijing Jin, Mona T. Diab:

Taming Object Hallucinations with Verified Atomic Confidence Estimation. 5430-5444 - Nithin Sivakumaran, Justin Chih-Yao Chen, David Wan, Yue Zhang, Jaehong Yoon, Elias Stengel-Eskin, Mohit Bansal:

DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning. 5445-5464 - Saptarshi Sengupta, Zhengyu Zhou, Jun Araki, Xingbo Wang, Bingqing Wang, Suhang Wang, Zhe Feng:

ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers. 5465-5482 - Luca Mainardi, Selçuk Sandikci, Joaquin Vanschoren:

An Empirical Study of Speculative Decoding for Small Language Models. 5483-5497 - Rishi Ravikumar, Nuhu Ibrahim, Riza Batista-Navarro:

Lost in Formatting: How Output Formats Skew LLM Performance on Information Extraction. 5498-5513 - Shiv Shankar:

Pseudo-Likelihood Training for Reasoning Diffusion Language Models. 5514-5529 - Ján Cegin, Branislav Pecher, Ivan Srba, Jakub Simko:

RoSE: Round-robin Synthetic Data Evaluation for Selecting LLM Generators without Human Test Sets. 5530-5545 - Tianyi Niu, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal:

RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation. 5546-5569 - Alireza Dehghanpour Farashah, Aditi Khandelwal, Marylou Fauchard, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi:

Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs. 5570-5589 - Yuxuan Jiang, Francis Ferraro:

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs. 5590-5607 - Disha Makhija, Manoj Ghuhan Arivazhagan, Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah:

Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis. 5608-5620 - Paul Quinlan, Qingguo Li, Xiaodan Zhu:

Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data. 5621-5647 - Luca Benedetto, Antonia Donvito, Alberto Lucchetti, Andrea Cappelli, Paula Buttery:

Beyond Names: How Grammatical Gender Markers Bias LLM-based Educational Recommendations. 5648-5668 - Mathieu Sibue, Andrés Muñoz Garza, Samuel Mensah, Pranav Shetty, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso:

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images. 5669-5688 - Zhaotian Weng, Haoxuan Li, Xin Eric Wang, Kuan-Hao Huang, Jieyu Zhao:

What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning. 5689-5701 - Mingrui Ye, Chanjin Zheng, Zengyi Yu, Chenyu Xiang, Zhixue Zhao, Zheng Yuan, Helen Yannakoudakis:

KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMs. 5702-5722 - Navita Goyal, Hal Daumé III:

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions. 5723-5738 - Xin Zhao, Naoki Yoshinaga, Tsuta Yuma, Akiko Aizawa:

Tracing Multilingual Knowledge Acquisition Dynamics in Domain Adaptation: A Case Study of Biomedical Adaptation. 5739-5760 - Marisa Hudspeth, Patrick J. Burns, Brendan T. O'Connor:

Contextual morphologically-guided tokenization for Latin encoder models. 5761-5775 - Yang Zhang, Amr Mohamed, Hadi Abdine, Guokan Shang, Michalis Vazirgiannis:

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning. 5776-5794 - Shiyi Ding, Shaoen Wu, Ying Chen:

ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR Environments. 5795-5812 - Yiyang Feng, Zeming Chen, Haotian Wu, Jiawei Zhou, Antoine Bosselut:

Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge. 5813-5847 - Jingyi Chen, Zhimeng Guo, Jiyun Chun, Pichao Wang, Andrew Perrault, Micha Elsner:

Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance. 5848-5877 - Zili Huang, Matthew Maciejewski, Leibny Paola García-Perera, Shinji Watanabe, Sanjeev Khudanpur:

CSPB: Conversational Speech Processing Benchmark for Self-supervised Speech Models. 5878-5893 - Pulkit Madaan, Krithika Ramesh, Lisa Bauer, Charith Peris, Anjalie Field:

Multi-Token Completion for Text Anonymization. 5894-5908 - Kosei Uemura, David Guzmán, Quang Phuoc Nguyen, Jesujoba Oluwadara Alabi, En-Shiun Annie Lee, David Ifeoluwa Adelani:

MERLIN: Multi-Stage Curriculum Alignment for Multilingual Encoder-LLM Integration in Cross-Lingual Reasoning. 5909-5924 - Ye Yu, Haibo Jin, Yaoning Yu, Jun Zhuang, Haohan Wang:

Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models. 5925-5939 - Aaron J. Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju:

Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders. 5940-5957 - Karin de Langis, Püren Öncel, Ryan Peters, Andrew Elfenbein, Laura Kristen Allen, Andreas Schramm, Dongyeop Kang:

Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? 5958-5970 - Karin de Langis, Jong Inn Park, Khanh Chi Le, Andreas Schramm, Andrew Elfenbein, Michael C. Mensink, Dongyeop Kang:

Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs. 5971-5986 - Soma Sato, Ryohei Sasano:

How Do Language Models Acquire Character-Level Information? 5987-5997 - Sean Leishman, Sarenne Wallbridge, Peter Bell:

Analysing the role of lexical and temporal information in turn-taking through predictability. 5998-6009 - Jiyun Chun, Eric Fosler-Lussier, Michael White, Andrew Perrault:

Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child Utterances. 6010-6030 - Zilong Li, Jie Cao:

Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese. 6031-6045 - Yuatyong Chaichana, Pittawat Taveekitworachai, Warit Sirichotedumrong, Potsawee Manakul, Kunat Pipatanakul:

Extending Audio Context for Long-Form Understanding in Large Audio-Language Models. 6046-6066 - Sai Akhil Kogilathota, Sripadha Vallabha E. G, Luzhe Sun, Jiawei Zhou:

HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single Token. 6067-6085 - Aaryamonvikram Singh, Debopriyo Banerjee, Dhruv Sahnan, Monojit Choudhury, Shivam Chauhan, Rocktim Jyoti Das, Xudong Han, Haonan Li, Alok Anil Jadhav, Utkarsh Agarwal, Mukund Choudhary, Fajri Koto, Junaid Bhat, Awantika Shukla, Samujjwal Ghosh, Samta Kamboj, Onkar Pandit, Lalit Pradhan, Rahul Pal, Sunil Kumar Sahu, Parvez Mullah, Ali El Filali, Zainul Abedien Ahmed Quraishi, Neha Sengupta, Gokul Ramakrishnan, Rituraj Joshi, Gurpreet Gosal, Avraham Sheinin, Natalia Vassilieva, Preslav Nakov:

Nanda Family: Open-Weights Generative Large Language Models for Hindi. 6086-6108 - Daniel Brubaker, William Sheffield, Junyi Jessy Li, Kanishka Misra:

Wugnectives: Novel Entity Inferences of Language Models from Discourse Connectives. 6109-6127 - Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty:

Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval over haystacks. 6128-6152 - Sravanthi Machcha, Sushrita Yerra, Sahil Gupta, Aishwarya Sahoo, Sharmin Sultana, Hong Yu, Zonghai Yao:

Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty. 6153-6182 - Zonghai Yao, Zihao Zhang, Chaolong Tang, Xingyu Bian, Youxia Zhao, Zhichao Yang, Junda Wang, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Hong Yu:

MedQA-CS: Objective Structured Clinical Examination (OSCE)-Style Benchmark for Evaluating LLM Clinical Skills. 6183-6257 - Santosh Srinath K, Mudit Somani, Varun Reddy Padala, Prajna Upadhyay, Abhijit Das:

Continual-learning for Modelling Low-Resource Languages from Large Language Models. 6258-6275 - Jongwon Ryu, Joonhyung Park, Jaeho Han, Yeong-Seok Kim, Hye-Rin Kim, Sunjae Yoon, Junyeong Kim:

Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance. 6276-6288 - Junior Cedric Tonga, Chen Cecilia Liu, Iryna Gurevych, Fajri Koto:

LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction. 6289-6309 - Hamdy Mubarak, Majd Hawasly, Abubakr Mohamed:

Nahw: A Comprehensive Benchmark of Arabic Grammar Understanding, Error Detection, Correction, and Explanation. 6310-6328 - Li-Chun Lu, Miri Liu, Pin-Chun Lu, Yufei Tian, Shao-Hua Sun, Nanyun Peng:

Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations. 6329-6352 - Inho Won, Hangyeol Yoo, Minkyung Cho, Jungyeul Park, Hoyun Song, KyungTae Lim:

TReX: Tokenizer Regression for Optimal Data Mixture. 6353-6370 - Jiangnan Li, Thuy-Trang Vu, Christian Herold, Amirhossein Tebbifakhr, Shahram Khadivi, Gholamreza Haffari:

CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment. 6371-6387 - Pranav Bhandari, Nicolas Fay, Sanjeevan Selvaganapathy, Amitava Datta, Usman Naseem, Mehwish Nasim:

Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs. 6388-6403 - Sergey Pankratov, Dan Alistarh:

Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks. 6404-6418 - Vítor Lourenço, Aline Paes, Tillman Weyde, Audrey Depeige, Mohnish Dubey:

KG-CRAFT: Knowledge Graph-based Contrastive Reasoning with LLMs for Enhancing Automated Fact-checking. 6419-6439 - Hang Ding, Yilun Zhao, Tiansheng Hu, Manasi Patwardhan, Arman Cohan:

SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature. 6440-6460 - Marton Szep, Jorge Marin Ruiz, Georgios Kaissis, Paulina Seidl, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer, Daniel Rueckert:

Unintended Memorization of Sensitive Information in Fine-Tuned Language Models. 6461-6480 - Giuseppe Russo, Debora Nozza, Paul Röttger, Dirk Hovy:

The Pluralistic Moral Gap: Understanding Moral Judgment and Value Differences between Humans and Large Language Models. 6481-6497 - Van-Quang Nguyen, Takayuki Okatani:

CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning. 6498-6523 - Yuxi Xia, Kinga Stanczak, Benjamin Roth:

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis. 6524-6546 - Elena Spaziani, Kamyar Zeinalipour, Pierluigi Cassotti, Nina Tahmasebi:

Elections go bananas: A First Large-scale Multilingual Study of Pluralia Tantum using LLMs. 6547-6570 - Giulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti:

CacheNotes: Task-Aware Key-Value Cache Compression for Reasoning-Intensive Knowledge Tasks. 6571-6590 - Yao Fu, Ran Qiu, Xinhe Wang, Jacob Sansom, Sathvika Ayyappa Prabhu, Huijie Tang, Jaekyeom Kim, Sungryull Sohn, Honglak Lee:

Beyond Blind Following: Evaluating Robustness of LLM Agents under Imperfect Guidance. 6591-6618 - Van Bach Nguyen, Jörg Schlötterer, Christin Seifert:

How Do LLMs Generate Contrastive Sentiments? A Mechanistic Perspective. 6619-6635 - Charu Karakkaparambil James, Waleed Mustafa, Marcio Monteiro, Marius Kloft, Sophie Fellenz:

Continual Neural Topic Model. 6636-6658 - Vasudha Varadarajan, Hui Xu, Rebecca Astrid Boehme, Mariam Marlen Mirström, Sverker Sikström, H. Andrew Schwartz:

MAQuA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response Theory. 6659-6677 - Masaki Asada, Makoto Miwa:

Principled Self-Correction in Discrete Diffusion: A UCB-Guided Framework for Text Generation. 6678-6692 - Negar Foroutan, Jakhongir Saydaliev, Grace Kim, Antoine Bosselut:

ConLID: Supervised Contrastive Learning for Low-Resource Language Identification. 6693-6708 - Yanran Chen, Lynn Greschner, Roman Klinger, Michael Klenk, Steffen Eger:

Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection. 6709-6732 - Mizanur Rahman, Mohammed Saidul Islam, Md. Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque:

Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization. 6733-6750 - Saeed Najafi, Alona Fyshe:

Offline Preference Optimization via Maximum Marginal Likelihood Estimation. 6751-6764 - Michael Wiegand, Elisabeth Eder, Josef Ruppenhofer:

The Relevance of Value Systems for Offensive Language Detection. 6765-6789 - Hyunji Lee, Seunghyun Yoon, Yunjae Won, Hanseok Oh, Geewook Kim, Trung Bui, Franck Dernoncourt, Elias Stengel-Eskin, Mohit Bansal, Minjoon Seo:

Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact. 6790-6810 - Aashiq Muhamed, Leonardo F. R. Ribeiro, Markus Dreyer, Virginia Smith, Mona T. Diab:

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models. 6811-6856 - Roxana Petcu, Kenton Murray, Daniel Khashabi, Evangelos Kanoulas, Maarten de Rijke, Dawn J. Lawrie, Kevin Duh:

Query Decomposition for RAG: Balancing Exploration-Exploitation. 6857-6871 - Chi Zhang, Wenxuan Ding, Jiale Liu, Mingrui Wu, Qingyun Wu, Ray Mooney:

Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs. 6872-6895 - Rifo Ahmad Genadi, Munachiso Nwadike, Nurdaulet Mukhituly, Tatsuya Hiraoka, Hilal AlQuabeh, Kentaro Inui:

Sycophancy Hides Linearly in the Attention Heads. 6896-6912 - Daniil Orel, Dilshod Azizov, Indraneil Paul, Yuxia Wang, Iryna Gurevych, Preslav Nakov:

AICD Bench: A Challenging Benchmark for AI-Generated Code Detection. 6913-6938 - Shahar Katz, Bar Alon, Ariel Shaulov, Lior Wolf, Mahmood Sharif:

Safeguarding Language Models via Self-Destruct Trapdoor. 6939-6958 - Prakhar Ganesh, Reza Shokri, Golnoosh Farnadi:

Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity. 6959-6978 - Bojan Batalo, Erica K. Shimomoto, Dipesh Satav, Neil Millar:

Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research. 6979-6992 - Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Zachary Yahn, Ling Liu:

H3Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs. 6993-7013 - Yeganeh Kordi, Nihal V. Nayak, Max Zuo, Ilana Nguyen, Stephen H. Bach:

Revisiting Generalization Across Difficulty Levels: It's Not So Easy. 7014-7042 - Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, Mingyi Hong:

BLUR: A Bi-Level Optimization Approach for LLM Unlearning. 7043-7058 - Moulik Choraria, Xinbo Wu, Akhil Bhimaraju, Nitesh Sekhar, Yue Wu, Xu Zhang, Prateek Singhal, Lav R. Varshney:

DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding. 7059-7079 - Mirac Suzgun, Mert Yüksekgönül, Federico Bianchi, Dan Jurafsky, James Zou:

Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory. 7080-7106 - Lucie Kunitomo-Jacquin, Edison Marrese-Taylor, Ken Fukuda, Masahiro Hamasaki:

Evidential Semantic Entropy for LLM Uncertainty Quantification. 7107-7122 - Laya Iyer, Angelina Wang, Sanmi Koyejo:

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases. 7123-7137 - Yige Yuan, Teng Xiao, Shuchang Tao, Xue Wang, Jinyang Gao, Bolin Ding, Bingbing Xu:

Incentivizing Strong Reasoning from Weak Supervision. 7138-7156 - Brahim Touayouch, Loïc Fosse, Géraldine Damnati, Gwénolé Lecorvé:

DivMerge: A divergence-based model merging method for multi-tasking. 7157-7180 - Li An, Yujian Liu, Yepeng Liu, Yuheng Bu, Yang Zhang, Shiyu Chang:

A Reinforcement Learning Framework for Robust and Secure LLM Watermarking. 7181-7198 - Sameer Komoravolu, Khalil Mrini:

Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents. 7199-7214 - Guy Alt, Eran Hirsch, Serwar Basch, Ido Dagan, Oren Glickman:

User-Centric Evidence Ranking for Attribution and Fact Verification. 7215-7237 - Mena Attia, Aashiq Muhamed, Mai Alkhamissi, Thamar Solorio, Mona T. Diab:

Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language. 7238-7265 - Hieu Tran, Phuong-Anh Nguyen-Le, Huy Nghiem, Quang-Nhan Nguyen, Wei Ai, Marine Carpuat:

VietMix: A Naturally-Occurring Parallel Corpus and Augmentation Framework for Vietnamese-English Code-Mixed Machine Translation. 7266-7284 - Aditya Sanjiv Kanade, Tanuja Ganu:

Do You See Me : A Multidimensional Benchmark for Evaluating Visual Perception in Multimodal LLMs. 7285-7326 - Farnoosh Hashemi, Michael Macy:

An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model Agents. 7327-7351 - Shayan Bali, Farhan Farsi, Mohammad Hosseini, Adel Khorramrouz, Ehsaneddin Asgari:

Detecting Subtle Biases: An Ethical Lens on Underexplored Areas in AI Language Models Biases. 7352-7379 - Hamid Jahad Sarvestani, Vida Ramezanian, Saee Saadat, Neda Taghizadeh Serajeh, Maryam Sadat Razavi Taheri, Shohreh Kasaei, MohammadAmin Fazli, Ehsaneddin Asgari:

HarfoSokhan: A Comprehensive Parallel Dataset for Transitions between Persian Colloquial and Formal Variations. 7380-7392 - Miles Williams, George Chrysostomou, Vitor Jeronymo, Nikolaos Aletras:

Compressing Language Models for Specialized Domains. 7393-7415 - Priyanka Dey, Daniele Rosa, Wenqing Zheng, Daniel Barcklow, Jieyu Zhao, Emilio Ferrara:

GRAVITY: A Framework for Personalized Text Generation via Profile-Grounded Synthetic Preferences. 7416-7436 - Kent K. Chang, Mackenzie Cramer, Anna Ho, Ti Ti Nguyen, Yilin Yuan, David Bamman:

Multimodal Conversation Structure Understanding. 7437-7458 - Zizhou Liu, Ziwei Gong, Lin Ai, Zheng Hui, Run Chen, Colin Wayne Leach, Michelle R. Greene, Julia Hirschberg:

A Review of Incorporating Psychological Theories in LLMs. 7459-7495 - Aly M. Kassem, Bernhard Schölkopf, Zhijing Jin:

How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities. 7496-7507 - Kaiwen Shi, Zheyuan Zhang, Zhengqing Yuan, Keerthiram Murugesan, Vincent Galassi, Chuxu Zhang, Yanfang Ye:

NG-Router: Graph-Supervised Multi-Agent Collaboration for Nutrition Question Answering. 7508-7527 - Tianyang Xu, Dan Zhang, Kushan Mitra, Estevam Hruschka:

Verification-Aware Planning for Multi-Agent Systems. 7528-7546 - Xueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang, Maxwell J. Giammona, Geeth De Mel, Jiawei Han:

Zero-Shot Open-Schema Entity Structure Discovery. 7547-7561 - Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Zoran Tiganj:

Beyond Semantics: How Temporal Biases Shapes Retrieval in Transformer and State-Space Models. 7562-7581 - Kazuki Hayashi, Shintaro Ozaki, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe:

Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies. 7582-7605 - Fan Jiang, Honglin Yu, Grace Chung, Trevor Cohn:

Tokenizer-Aware Cross-Lingual Adaptation of Decoder-Only LLMs through Embedding Relearning and Swapping. 7606-7636 - Henry Peng Zou, Siffi Singh, Yi Nian, Jianfeng He, Jason Cai, Saab Mansour, Hang Su:

Active Generalized Category Discovery with Diverse LLM Feedback. 7637-7658 - Chenyang Zhu, Spencer Hong, Jingyu Wu, Kushal Chawla, Yuhui Tang, Youbing Yin, Nathan Wolfe, Erin Babinsky, Daben Liu:

RAFFLES: Reasoning-based Attribution of Faults for LLM Systems. 7659-7688 - James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, Mubarak Shah:

Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs. 7689-7713 - Roy Xie, Deepak Gopinath, David Qiu, Dong Lin, Haitian Sun, Saloni Potdar, Bhuwan Dhingra:

Over-Searching in Retrieval-Augmented Large Language Models. 7714-7739 - Daniel Fein, Sebastian Russo, Violet Xiang, Kabir Jolly, Rafael Rafailov, Nick Haber:

LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing. 7740-7755 - Zihe Ye, Jingyuan Huang, Weixin Chen, Yongfeng Zhang:

H-Mem: Hybrid Multi-Dimensional Memory Management for Long-Context Conversational Agents. 7756-7775 - Xuefeng Wei, Xuan Zhou, Yusuke Sakai, Taro Watanabe:

"Yuki Gets Sushi, David Gets Steak?": Uncovering Gender and Racial Biases in LLM-Based Meal Recommendations. 7776-7796 - Haeji Jung, Jinju Kim, Kyungjin Kim, Youjeong Roh, David R. Mortensen:

Happiness is Sharing a Vocabulary: A Study of Transliteration Methods. 7797-7816 - Renxi Wang, Honglin Mu, Liqun Ma, Lizhi Lin, Yunlong Feng, Timothy Baldwin, Xudong Han, Haonan Li:

SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning. 7817-7830 - Hiroaki Kingetsu, Kaoru Yokoo, Kenji Fukumizu, Manohar Kaul:

Look Before You Leap: A Lookahead Reasoning Quality Gate for Speculative Decoding. 7831-7847 - Masoomali Fatehkia, Enes Altinisik, Husrev Taha Sencar:

FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models. 7848-7869 - Tsung-Min Pai, Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-yi Lee, Kai-Wei Chang:

BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation. 7870-7915 - Vladislav Pedashenko, Laida Kushnareva, Yana Khassan Nibal, Eduard Tulchinskii, Kristian Kuznetsov, Vladislav Zharchinskii, Yury Maximov, Irina Piontkovskaya:

Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story. 7916-7944 - Zongyu Wu, Minhua Lin, Zhiwei Zhang, Fali Wang, Xianren Zhang, Xiang Zhang, Suhang Wang:

Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models. 7945-7957 - Chengzhi Zhong, Fei Cheng, Qianying Liu, Yugo Murawaki, Chenhui Chu, Sadao Kurohashi:

Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language Models. 7958-7970 - Atharvan Dogra, Soumya Suvra Ghosal, Ameet Deshpande, Ashwin Kalyan, Dinesh Manocha:

Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models. 7971-7990 - Yujia Zheng, Tianhao Li, Haotian Huang, Tianyu Zeng, Jingyu Lu, Chuangxin Chu, Yuekai Huang, Ziyou Jiang, Qian Xiong, Yuyao Ge, Mingyang Li:

Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in LLMs. 7991-8019 - Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han:

A Regex Minimization Benchmark: A PSPACE-Complete Challenge for Language Models. 8020-8048 - Leonardo Bertolazzi, Manuel Vargas Guzmán, Raffaella Bernardi, Maciej Malicki, Jakub Szymanik:

Teaching Small Language Models to Learn Logic through Meta-Learning. 8049-8080 - Ayush Singh, Dishank Aggarwal, Pranav Bhagat, Ainulla Khan, Sameer Malik, Amar Prakash Azad:

COMPACT: Building Compliance Paralegals via Clause Graph Reasoning over Contracts. 8081-8112 - Omar Momen, Emilie Sitter, J. Berenike Herrmann, Sina Zarrieß:

Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets. 8113-8127 - Sicheol Sung, Joonghyuk Hahn, Yo-Sub Han:

Repairing Regex Vulnerabilities via Localization-Guided Instructions. 8128-8142 - Jana Jung, Marlene Lutz, Indira Sen, Markus Strohmaier:

Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality. 8143-8173 - Yindong Wang, Martin Preiß, Margarita Bugueño, Jan Vincent Hoffbauer, Abdullatif Ghajar, Tolga Buz, Gerard de Melo:

ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations. 8174-8187 - Tomoyuki Jinno, Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe:

Cosine Similarity as Logits?: A Scalable Knowledge Probe Using Embedding Vectors from Generative Language Models. 8188-8200 - Zahra Abbasiantaeb, Simon Lupart, Mohammad Aliannejadi:

Generating Multi-Aspect Queries for Conversational Search. 8201-8217 - Guozhao Mo, Yanjiang Liu, Yafei Shi, Jiawei Chen, Yang Li, Yaojie Lu, Hongyu Lin, Ben He, Le Sun, Bo Zheng, Xianpei Han:

Navigating the Infinite Dynamic Web Space: Effective In-Context Exploration via Cognitive Multi-Agent Collaboration. 8218-8232 - Ryo Fujii, Makoto Morishita, Kazuki Yano, Jun Suzuki:

TimeMachine-bench: A Benchmark for Evaluating Model Capabilities in Repository-Level Migration Tasks. 8233-8264 - Robert West, Ashton Anderson, Ece Kamar, Eric Horvitz:

Tandem Training for Language Models. 8265-8278 - Dwip Dalal, Utkarsh Mishra, Narendra Ahuja, Nebojsa Jojic:

Can MLLMs Find Their Way in a City? Exploring Emergent Navigation from Web-Scale Knowledge. 8279-8303 - Alla Chepurova, Aydar Bulatov, Mikhail Burtsev, Yuri Kuratov:

Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models. 8304-8319 - Arnav Yayavaram, Siddharth Yayavaram, Simran Khanuja, Michael Saxon, Graham Neubig:

CAIRE: Cultural Attribution of Images with Retrieval. 8320-8338 - Xinlan Yan, Di Wu, Yibin Lei, Christof Monz, Iacer Calixto:

What Does Infect Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs. 8339-8358 - Giovanni Trappolini, Florin Cuconasu, Simone Filice, Yoelle Maarek, Fabrizio Silvestri:

Redefining Retrieval Evaluation in the Era of LLMs. 8359-8375 - Abir Harrasse, Chaithanya Bandi, Hari Bandi:

Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation. 8376-8392 - Christine de Kock, Arij Riabi, Zeerak Talat, Michael Sejr Schlichtkrull, Pranava Madhyastha, Eduard H. Hovy:

IYKYK: Using language models to decode extremist cryptolects. 8393-8409 - Sawsan Alqahtani, Mir Tafseer Nayeem, Md. Tahmid Rahman Laskar, Tasnim Mohiuddin, M. Saiful Bari:

Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models. 8410-8432
