


ICMR 2026: Amsterdam, The Netherlands
- Stevan Rudinac, Zi Helen Huang, Marcel Worring, Lucia Vadicamo, Xirong Li, Pascal Mettes, Lizi Liao:

Proceedings of the 2026 International Conference on Multimedia Retrieval, ICMR 2026, Amsterdam, The Netherlands, June 16-19, 2026. ACM 2026, ISBN 979-8-4007-2617-0
Keynote Talks
- Marcella Cornia:

From Captioning to Multimodal Reasoning: The Evolution of Vision-Language Research in the Era of Multimodal LLMs. 1 - Evangelos Kanoulas:

Retrieval in the Age of Agents: From Ranking Documents to Querying Experts. 2
Multimedia Retrieval Methods and Systems
- Feng Ding, Huailiang Peng, Hao Sun, Chi Zhang, Qiong Dai:

Graphs for Logic and Texts for Context: A Multi-Agent Orchestrated Hybrid RAG with Stepwise Question Decomposition. 3-11 - Haowei Li, Haojie Wu, Jie Bao, Zhen Chen, Yong Liao:

Mitigating Semantic Bias in Multilingual Visual Document Retrieval via Language-Vision-Aware Late Interaction. 12-20 - Guquan Jing, Peng Gao, Yujian Lee, Hui Zhang:

Describing-Verifying-Scoring: A Hierarchical Reasoning Framework for Zero-Shot Composed Image Retrieval. 21-30 - Huaqing Song, Baichuan Lin, Shuofeng Sun, Lanchi Xie, Haibin Yan, Zhihui Li:

Spherical Projected Bézier Flow: A Geometry-Constrained Manifold Transport Framework for Cross-Age Face Retrieval. 31-39 - Huatuan Sun, Yunshan Ma, Changguang Wu, Yanxin Zhang, Pengfei Wang, Xiaoyu Du:

Frozen LVLMs for Micro-Video Recommendation: A Systematic Study of Feature Extraction and Fusion. 40-49 - Xiaoqin Zhang, Xingpeng Wu, Yanjun Lu, Shuhan Nie, Tian Shi, Xianglong Liu, Xin Yang:

PIRSA: Profile-Indexed Retrieval and Semantic Frame Alignment for Ambiguous Spoken Language Understanding. 50-58 - Quanyou Li, Yunqi Cao, Le Yu, Wenxin Li, Jin Xu:

PAMS-GNN: Popularity-Aware Multimodal Semantic Graph Neural Networks for Recommendation. 59-68 - Fengbin Zhu, Zijing Cai, Yuzhe Wang, Pengyang Shao, Wenjie Wang, Fuli Feng, Richang Hong, Tat-Seng Chua:

MURE: Hierarchical Multi-Resolution Encoding via Vision-Language Models for Visual Document Retrieval. 69-78 - Xiaotian Li, Chenyu Zhang, Changbai Chen, Zhongjie Zhu:

DIR-RSCLIP: A Degradation-Invariant Cross-Modal Retrieval Network for Remote Sensing Images. 79-87 - Yifei Cao, Guolong Wang, Mingliang Hou, Jizhe Yu, Xianjie Zhang, Xiya Bu, Zhizhen Li, Yu Liu:

BiOVQL: Brain-inspired One-stage Egocentric Visual Query Localization. 88-98 - Daichi Sugita, Huu-Long Pham, Makoto P. Kato, Hiroaki Ohshima, Sumio Fujita, Yoshiyuki Shoji:

Which LoRA Should Be Merged Next? Retrieving an Additional LoRA from a Target Image. 99-107 - Yixing Ke, Dong Liang, Kun Shang:

NeuroAlign: Dynamic Dual-Stream Alignment of Perception and Cognition for Zero-Shot Brain-Image Retrieval. 108-117 - Xuewen He, Yuning Guo, Fumao Xu, Mingyong Li:

M-STAR: Multi-view Semantic Topology Alignment with Reasoning from VLMs for Image-Text Retrieval. 118-127 - Yi Jin, Weichao Chen, Shengjie Zhao:

Scaling Multimodal Retrieval and Generation for Long Documents through Visual Tiling and Context Compression. 128-137 - Wenkai Ma, Wentao Ma, Tan Wang, Yuwei Wang, Lu Liu, Junfei Yang, Xianbao Xu, Yuan Rao:

GjPest4CMR: LLMs-Assisted Morphology-Aware Fine-grained Goji Pest Image-Text Retrieval. 138-146 - Peng Jin, Yuxuan Qiu, Zhaofa Wang, Xiaoyue Hu, Xiabing Zhou, Na Jiang:

LSR²: Learning to Select Relational and Reasoning Feature for Multi-modal Re-identification. 147-156 - Qiyou Huang, Meng Wang:

Unsupervised Cross-Modal Semantic Invariance Hashing. 157-166 - Honglei Zhang, Pengfei Zhou, Ruohan Wang, Siyue Zhang, Yilei Shi:

RoATR: A Systematic Study of Audio-Text Retrieval Robustness Against Realistic Perturbations. 167-176 - Junsong Wang, Weiqing Min, Guorui Sheng, Tao Yao, Lili Wang, Shuqiang Jiang:

RFHNet: Relational and Frequency-Aware Hashing Network for Large-Scale Fine-Grained Food Image Retrieval. 177-185 - Anjun Jia, Xiaowei Zhang:

DeHub: Learning Hub-Resistant Representations for Text-Video Retrieval. 186-195 - Neha Choudhary, Abhiraj Painuly, Yaman Kumar, Varun Khurana, Poonam Goyal:

AdRetr: Theme-Aware Advertisement Video Retrieval Beyond Keywords. 196-205 - Haoran Wang, Dan Wan:

Empowering Vision Language Models for Training-Free Visual Search via Context-Aware Scanpath Simulation. 206-214 - Ning Han, Xiubo Liang:

DyCa-GRPO: Calibrated and Efficient Learning-to-Rank for Multimodal Retrieval with Human Feedback. 215-222 - Xingchen Han, Ruihao Zhang, Ruiting Li, Yingxin Pei, Jiaqi Wang, Zhe Ji, Feiliang Ren, Yongkang Liu:

CCRA: A Cross-modal Complementary Representation Alignment Framework for Bridging the Modality Gap. 223-232 - Xinwei Li, Gaoquan Liu, Changqiang Xu, Ruohong Huan, Xiaomin Zhao:

PRISM-GCN: Principled Representation Integration on Graphs for Multi-Behavior Recommendation. 233-241 - Rui Xu, Lin Yao, Zhiyang Wu, Xuyun Zhang, Guowei Wu:

B-HFA: Parameter-Efficient Vision-Language Retrieval via Block-shared Adapters and Hierarchical Aggregation. 242-251 - Yuan Ren, Mingxue Liao, Gang Zhou, Jinxing Peng, Pin Lv:

MCHRAG: Multi-Centroid Hierarchical Indexing for Efficient Incremental RAG. 252-260 - Shih-Chih Lin:

Adaptive Knowledge-Language Alignment for Industrial Anomaly Detection and Segmentation. 261-268 - Jiale Huang, Zixu Li, Zhiheng Fu, Zhiwei Chen, Qinlei Huang, Yupeng Hu:

RankVR: Low-Rank Structure Perception and Value Recalibration for Robust Composed Image Retrieval. 269-278 - Xing Zhang, Xu Cheng, Likang Wu, Yingyuan Xiao, Wenguang Zheng:

Target-Enhanced Gated Transformer: A Multi-Behavior Recommendation Framework for Noise Suppression and Target Signal Preservation. 279-287 - Jiale Huang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Chunxiao Wang, Yupeng Hu:

IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval. 288-297 - Junhao Gao, Chao Yang, Bin Jiang:

CREAM: Collaborative Representation with Self-supervised Alignment for Multimedia Recommendation. 298-307 - Qinze Zhu, Jiayi Yang, Yijie Zhu, Mingyong Li:

Contrastive Multimodal Fusion and Pseudo-Label method for Unsupervised Cross-Modal Hashing Retrieval. 308-316 - Yunlong He, Ziqi Gu, Tong Zhang, Zhen Cui, Chunyan Xu:

From Noisy Candidates to Reliable Grounding in Weakly-Supervised Referring Expression Comprehension. 317-326 - Jialong Hu, Zijie Song, Yang Wang, Zhenzhen Hu, Jia Li, Yixiao Ma, Richang Hong:

Fine-grained Text-Video Retrieval with Patch-level Temporal Difference and Aggregation. 327-336 - Zeyu Li, Lei Li:

MMRet3D: A Multi-Modal Matching Framework for 3D Object Retrieval from Multi-View Images. 337-346 - Yewen Li, Zongwei Tang, Xiaodong Wang:

Global-Regional Dual Hashing for Unsupervised Visual-Textual Retrieval via Concept Similarity Guidance. 347-355 - Ziyi Wu, Jiahao Li, Mingyuan Jiu, Hongru Zhao, Hichem Sahbi, Mingliang Xu:

Calibrate and Aggregate: Cross-Modal Retrieval with Distribution Alignment and Token Reduction. 356-365 - Dixin Chen, Baoyao Yang, Haifeng Lin, Canrong Du, Wenbin Yao:

SCAN: Self-Calibrated Textual Anchoring for Dual-Granularity Video Screening in Text-Video Retrieval. 366-375 - Houlin Zhu, Libo Liu, Ruonan Zhang, Zhen Deng:

Deconstructing Centrality: Scale-Hierarchy for Hubness in Text-Video Retrieval. 376-385 - Zhoubo Xu, Zizhao Pang, Yu Meng, Huijie Cong:

Robust Graph Matching via Confidence-Guided Distillation and Random Node Masking. 386-394 - Mengxue Yang, Xitong Li, Ziyi Yang, Jingqi Zhang, Xiaruo Zhang, Ying Li:

CARE: A Constraint-Aware Hypergraph Framework for Knowledge-Driven Event-Centric Retrieval. 395-404 - Yifan Huo, Ming Liu, Junhong Zheng, Lili He:

Cross-View Collaborative Recommendation with Multimodal and Multi-Scale User Behaviors. 405-409 - Yulin Xu, Chunqi Guo, Yuanzhen Shuai, Jianyuan Ni:

Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery. 410-414 - Shuming Hu, Chuan Fu, Shuting He, Henghui Ding:

Instance-Adaptive Routing for Object Re-Identification with Heterogeneous Experts. 415-419 - Huaying Zhang, Ren Togo, Takahiro Ogawa, Miki Haseyama:

Reranking-based Analysis on Visual and Textual Query Contribution to Composed Image Retrieval. 420-424 - Zixuan Yu, Xiaoping Liang, Zhenjun Tang:

Video Hashing with Robust Secondary Frames and Local Tangent Space Alignment for Copy Detection. 425-429 - Po-Chih Lin, Yixuan Dong, Fang-Yi Su, Haijie Yang, Zelin Zang, Hongliang Zhang, Wing-Kuen Ling, Fuji Yang:

POSITIVE4Rec: Incorporating the Recency Effect with Positional Inductive Bias for Sequential Recommendation. 430-434 - Qiqi He, Dichucheng Li, Xiaoheng Sun, Anqi Huang:

A Decomposed Retrieval-Edit-Rerank Framework for Chord Generation. 435-439
Multimedia Analysis, Representation, and Understanding
- Feiyi He, Zihan Cheng, Jielei Wang, Cencen Liu, Guoming Lu:

DTSR: High-Frequency Prior-Based Dynamic Texture Synthesis for Real-World Image Super-Resolution. 440-448 - Ruoxuan Yang, Chuan Li:

Semi-Automatic Correction of 3D Tubular Structure Skeletons via Component-Wise MST and Filtered Delaunay Triangulation. 449-458 - Yifei Wang, Gaozhi Liu, Sheng Li, Xinpeng Zhang, Zhenxing Qian:

Neural Representations for Animated GIFs. 459-467 - Guogen Zeng, Juan Luo, Peng Sun, Ying Qiao, Anping Liu:

SatCLA: A Collaborative LLM-Driven Framework for Annotation of LEO Satellite Imagery. 468-477 - Na Dong, Gim Hee Lee:

Semi-3DETR: Semi-Supervised Detection Transformer for 3D Object Detection. 478-486 - Qiuyu Mei, Hong Yu, Shijie Yu, Yan Yang:

Evidential Uncertainty Modulated Adaptive Predictive Contrastive Learning for Multimodal Fusion. 487-496 - Jiakang Yu, Shizhou Huang, Xiaode Chen, Hongtao Deng, Wang Gao, Xun Zhu:

CodeMNER: Vision-Language Models are Better Multimodal Named Entity Recognizers via Progressive Vision-Code Alignment. 497-505 - Xin Qin, Wenjie Wang, Mengna Liu, Xu Cheng, Fan Shi, Shengyong Chen:

A2P-Net: Asymmetric Domain-Adaptive Prototype Network for Cross-Domain Multimodal Sensor Retrieval. 506-514 - Yijie Hong, Xiaofei Yin, Xinzhong Wang, Ya Guo, Huijia Zhu, Sufeng Duan:

Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting. 515-524 - Zijie Zhou, Dandan Zhu, Hangxiangpan Wang, Heng Zhang, Huishen Jiao, Yi Zhao:

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models. 525-534 - Zhengyu Ma, Qifeng Zhou:

Improving Pseudo-Labeling and Representation Balance in Realistic Long-Tailed Semi-Supervised Learning. 535-543 - Pu Wang, Dehui Kong, Jinghua Li, Baocai Yin:

SubGAva: 3D Gaussian Primitive Subdivision for Photo-realistic and Animatable Human Avatars from Monocular Video. 544-552 - Xiaohong Jia, Zhiwei Xia, Baijing Wu, Yunchao Wei, Yao Zhao:

Entropy Regularized Simple Multiple Kernel K-Means with Adaptive Deviation Correction. 553-557 - Hongyu Chen, Liang Lin, Guangrun Wang:

OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling. 558-566 - Ao Xu, Rujin Zhao, Xiong Xu, Boceng Huang, Yujia Jia, Hongfeng Long, Fuxuan Chen, Zilong Cao, Fangyuan Chen:

MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching. 567-576 - Ziniu Liu, Shuheng Zhou, Jin Zeng, Mingqing Liu, Yulong Zhang, Hao Deng:

Adaptive Spatial-Channel Masked Reconstruction Knowledge Distillation for Dense Prediction. 577-585 - Jinguo Hu, Yuelan Qi, Song Yu, Zhifang Liao, Wendong Zhang:

PURE: Probabilistic Utility for Robust Evidence in Multimodal Recommendation. 586-594 - Fengrui Liu, Ruiyang Huang, Qijian Zheng, Yuanfang Wang, Feng Liu:

From Physics to Representation: Audio Learning with Synthetic Pre-training via Procedural Generation. 595-604 - Song Shi, Chengqiao Liu, Jinfang Jia:

Parameter-Efficient Adaptation with Proxy Anchors for Cross-Domain Few-Shot Learning. 605-614 - Fan Wu, Amine Kacete, Jérôme Royan, Guillaume Moreau:

Real-Time Retrieval-Free Camera Pose Estimation via Sparse Cross-Modal 2D-3D Matching with Projection-Guided Refinement. 615-624 - Bingyu Zhu, Dongbo Yu, Yunbiao Wang, Jun Xiao, Haiyong Jiang:

AF-BEV: Object-Aware Adaptive Frustum-based BEV Aggregation for 3D Object Detection. 625-634 - Yujun Hu, Changbo Wang, Gaoqi He:

KG-CPEN: Knowledge-Guided Compositional Prototype Evolution for Unbiased Scene Graph Generation. 635-644 - Yifan Hong, Jiao Luo, Wang Jiawei, Yangchen Zeng, Yusen Wu:

Anomaly Detection in Dynamic Networks with Hyperspherical Projection and DBN-based Anomaly Synthesis. 645-653 - Kaiyuan Jin, Wayit Abliz, Maihemuti Maimaiti, Zaokere Kadeer, Aishan Wumaier, Abdujelil Abdurahman, Panpan Zheng:

SSHF-CLIP: Semantic-Guided Scale Selection and Hybrid Fusion for Zero-Shot Anomaly Detection. 654-662 - Ye Wang, Kai Huang, Kai Zhao, Sumin Shen:

Detail-Aware Camouflaged Object Detection via Multi-Granularity Collaborative Learning. 663-671 - Guohui Ding, Chufei Wang, Hongfeng Wang, Zakovorotnyi Oleksandr:

TASEN: Topic-Aware Semantic Enhancement Network for Multimodal Named Entity Recognition. 672-680 - Yishan Zou, Chris D. Nugent, Matthew Burns, Shengli Wu, Mingzhu Xu, Meng Liu:

Egocentric Action Recognition with Retrieval-Augmented Learning. 681-689 - Guoxu Li, Cheng Han, Chao Zhang, Tongzhou Zhang, Shuang Yang:

GADRNet: Geometry-Prior-Guided Adaptive Distortion Rectification Network for Panoramic Depth Estimation. 690-699 - Chenyang Fan, Wen Yang, Junshi Cheng, Zihong Li, Wenfeng Zhang, Wei Hu, Yi Zhang, Pan Zeng:

MOC-3D: Manifold-Order Consistency for Text-to-3D Generation. 700-709 - Feifei Xu, Puzhe Li, Dongyang Li, Bo Li, Luobing Huang, Wenjing Zhu, Zirui Xu, Yu Xie:

From Confrontation to Balance: A Kronecker-Constrained Spectral Entropy Joint Optimization Framework for Multimodal Sentiment Analysis. 710-718 - Hongning Liu, Xianchao Zhang, Hui Sun, Linlin Zong, Wenxin Liang, Xinyue Liu:

Graph-Based Rotation-Robust Semantic Modeling for Oriented Object Detection in Remote Sensing Images. 719-727 - Haiyan Liu, Xin Song, Ye Wang, Kai Chen, Yuying Liu, Bin Zhou, Hongkui Tu, Liqun Gao, Yue Qian:

Robust Multi-modal Knowledge Graph Completion via Modality-Specific Experts. 728-737 - Yan Li, Junjie Zheng, Zhouchao Fu, Shengjie Yang, Junjie Liao, Jianwei Zheng:

LaViSE: Language-aware Vision Scale Enhancement for Referring Remote Sensing Image Segmentation. 738-747 - Yuanming Xie, Yanzhuo Xiang, Haochen You, Nuoya Liu, Fangzhou Liu, Bo Zhao, Zhaolu Kang, Yue Li, Yuqi Li:

Symmetry-Aware Causal Inference for Robust Neural PDE Solvers. 748-757 - Zijian Song, Qichang Li, Sihan Qin, Yuhao Chen, Tianshui Chen, Liang Lin, Guangrun Wang:

Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation. 758-767 - Yining Xu, Yuanyang Zhang, Jingjiao You, Yingjie Huang, Jianbo Mei, Yilong Liu, Li Yao:

Ref2Inpaint: 3D Gaussian Inpainting via Visibility-Aware Mask Refinement and VLM-Guided Reference Retrieval. 768-776 - Yilong Liu, Yuanyang Zhang, Zirui Luo, Kaixi Xu, Yining Xu, Li Yao:

SaF-AD: Saliency-Adaptive and Feature-Consistent Diffusion for Industrial Anomaly Detection. 777-786 - Yongzhu Miao, Puzhen Su, Haoran Yin, Shasha Li, Jintao Tang, Ting Wang:

Beyond Inconsistent Reasoning: Intermediate Process Direct Preference Optimization on Small Multi-modal Large Language Models for Visual Question Answering. 787-796 - Yifan Wang, Peiwu Wang, Yunxian Chi, Zhinan Gou, Kai Gao:

Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition. 797-806 - Binbin Zhang, Fang Zhou, Liang Xiao, Zhiyong Su, Weiqing Li:

OpenSGG-VL: Open-Vocabulary 3DSGG with Orthogonal Residual Fusion and Iterative Relation Refinement. 807-816 - Haowei Ran, Zheng Wang, Meijun Sun:

TEA-Mamba: Textual Event-Aware Mamba for Multimodal Time Series Forecasting. 817-825 - Binbin Li, Shupei Xiao, Dayan Wu, Gengqi Yang, Siyu Jia, Zisen Qi:

Seedcap: Semantic Expansion and Entity-Driven Zero-Shot Image Captioning. 826-834 - Jiagao Hu, Daiguo Zhou, Danzhen Fu, Fuhao Li, Zepeng Wang, Fei Wang, Wenhua Liao, Jiayi Xie, Haiyang Sun:

AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos. 835-844 - Nengbo Lu, Minghua Pan:

CausalGS: Learning Physical Causality of 3D Dynamic Scenes with Gaussian Representations. 845-854 - Fang Zhang, Anqi Gou, Linli Xu:

ExpV2S: Zero-Shot Expressive Video-to-Speech Synthesis via Latent Diffusion Model. 855-864 - Zijian Liu, Cong Pan, Chengjie Fan, Wanjie Cai, Xichen Ding, Jie Qin:

PRISM: Preference-Guided Semantic Reasoning with Vision-Language Models for Object Goal Navigation. 865-873 - Longkun Shi, Hanlin Hu, Haoze Zheng, Yuanyuan Liao, Turdi Tohti:

Rectifying Multimodal Variance: UAPA-HCF for Weakly Supervised Violence Detection. 874-883 - Zhaokun Huang, Yonghong Song, Heyao Zhang:

Where & What Anomaly? A Framework for Pose-Agnostic Anomaly Detection and Zero-Shot Semantic Classification. 884-892 - Jiong Wang, Zhongwei Huang, Chong Wang, Endai Huang, Ran Zhou, Haitao Gan, Yingying Zhu, Xiaoyu Shen:

Learning A Bank of Transferable Prompts for Vision-Language Models. 893-901 - Yuanyu Zheng, Xumin Shen, Yunda Sun, Ying Shen, Lin Zhang:

Decision-Invariant Sim-to-Real Vision-and-Language Navigation with Pseudo-Panoramic Observations. 902-910 - Shuyi Jiang, Zhihao Yuan, Na Zhao:

TAVEN: Task-driven Adaptive Viewpoint Exploration for Training-Free 3D Spatial Reasoning and Understanding. 911-920 - Zheng Liang, Yisu Liu, Shuo Yang:

SGFR-Track: Uncertainty-Gated Spectral Feature Restoration for Online Multi-Object Tracking. 921-929 - Mengjingcheng Mo, Jiankang Zheng, Jiaxu Leng, Xinbo Gao:

Retrieval-Guided Contextual Inference for Training-Free Video Anomaly Detection in Low-Light Scenarios. 930-939 - Zhixun Wang, Liqun Kuang, Song Wang, Shichao Jiao, Zhongyu Chen, Fengguang Xiong:

Agglomerative Model Meets Multi-Scale Adaptive Fusion for Cross-Modal Unsupervised Domain Adaptation. 940-948 - Xiangyu Ye, Yatie Xiao, Qingxiao Guan, Zhenbang Liu:

CHM: Context Hiding and Misguidance for Robust Adversarial Attacks on Active Speaker Detection. 949-957 - Ruihan Wang, Mengqi Lei, Siqi Li, Wei Bao:

HIRNet: Hypergraph-Induced Iterative Reasoning Network for Crowd Counting. 958-966 - Tengfei Ma, Weiran Pan, Wei Wei:

Unveiling PEFT Robustness to Noisy Labels in VLMs: A Gradient-Loss Decoupling Perspective. 967-976 - Yiqing Shen, Feng Chen:

Decoupling Multimodal Perception and Reasoning for Image Reasoning Segmentation with Large Language Models. 977-985 - Liang Yang, Hongyuan Xiao, Songtao He, Ye Lin, Zhenchang Zhang:

Beyond Post-hoc Fusion: Rethinking Cross-Modal Interaction Timing in Few-Shot Learning. 986-995 - Xun Fang, Zixuan Hua, Lihua Zhang:

GeoPro-Depth: Geometrically Consistent Prompting for Robust Metric Depth Completion. 996-1005 - Feifei Xu, Wenjing Zhu, Dongyang Li, Puzhe Li, Luobin Huang, Yu Xie, Zirui Xu:

A Unified Object-Centric Spatio-Temporal Graph Reasoning Framework for Audio-Visual Question Answering. 1006-1014 - Weiyi Bu, Xiaodong Cun, Rui Yin, Jiantao Yuan, Wei Qi:

RainbowDreamer: Taming Semantic Controls for Attribute-Consistent Text-to-3D Generation. 1015-1024 - Xiangyuan Peng, Kay Bierzynski, Lorenzo Servadei, Robert Wille:

Text4Radar-V2X: Text-guided 4D Radar for Cooperative 3D Object Detection. 1025-1034 - Songtao Li, Di Zheng, Xihua Zou:

Hierarchical Local-Global Context Modeling with Selective Modulation for Point Cloud Semantic Segmentation. 1035-1044 - Zeyu Li, Lei Li:

SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making. 1045-1054 - Yuntao Shou, Tao Meng, Wei Ai, Keqin Li:

C²MOE: Consistency and Complementarity-guided Mixture of Experts for Incomplete Multimodal Emotion Learning. 1055-1063 - Wei Xue, Zhiyong Huo:

PanoAdapter: Efficient Adaptation of Depth Foundation Models for Immersive Multimedia via Spherical Rectification. 1064-1072 - Yufeng Xu, Yunjia Li, Hai Wang, Wei Li:

Teaching Audio-Language Models to Reason over Time. 1073-1082 - Lanlan Lu, Qimeng Yang, Yi Liu, Xinjun Pei, Jinmiao Song:

Structure Aware Distillation for Multimodal Intent Understanding Under Missing Modalities. 1083-1091 - Thiago César Castilho Almeida, Gustavo Rosseto Leticio, Vinicius Atsushi Sato Kawai, Daniel Carlos Guimarães Pedronette:

Context-Aware Interpretable Representations for Retrieval and Graph Convolutional Network Classification. 1092-1101 - Kunfang Song, Guowei Yan, Jiaqing Wang, Shufen Ruan, Yanwen Wang:

STeP-Net: A Spatio-Temporal Perception Network for Action Detection. 1102-1110 - Luofeng Zhang, Shengzhe You, Qian Shao, Yanjing Lei, Yu Liu, Fei Gao:

Text-Guided Prototype Replay and Classifier Guidance for Incremental Few-Shot Semantic Segmentation. 1111-1119 - Jianhui Zheng, Zhiyong Huo, Jiahao Zheng:

Towards Robust Sparse-View 3D Gaussian Splatting via Hierarchical Depth and Multi-View Consistency. 1120-1127 - Kejun Liu, Yuanyuan Liu, Ke Wang, Jiahao Zhang, Lei Xu, Chang Tang, Zhe Chen, Yibing Zhan:

CERA: Conflict-Explicit Reflective Agent for Multimodal Emotion Reasoning. 1128-1137 - Kaicheng Peng, Ya Li:

PCMA: Prompt-guided Cross-domain Multi-prototype Alignment for Source-Free Domain Adaptation. 1138-1146 - Chengxi Zeng, Yuxuan Jiang, Ge Gao, Shuai Wang, Duolikun Danier, Bin Zhu, Stevan Rudinac, David Bull, Fan Zhang:

SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation. 1147-1156 - Wei Li, Haiyun Guo, Manli Tao, Honghui Dong, Ming Tang, Jinqiao Wang:

HB-Mamba: Hierarchical Bi-directional State Space Modeling for LiDAR Semantic Segmentation in Autonomous Driving. 1157-1165 - Qichao Zhang, Genlang Chen, Chengcheng Jia, Jiajian Zhang:

LTA-Gait: In-Network Temporal Activation of Large Vision Models for Gait Recognition. 1166-1173 - Zhihui Sun, Diwei Su, Xiuxing Li, Qixin Wang, Shihao Zhang, Xia Wu:

Adaptive Knowledge Generation via Reinforcement-Guided Pattern Completion for Zero-Shot Visual Question Answering. 1174-1183 - Yongkang Jin, Jianwen Luo, Jingjing Wang, Jianmin Yao, Yu Hong:

RMPL: Relation-aware Multi-task Progressive Learning with Stage-wise Training for Multimedia Event Extraction. 1184-1193 - Wenfeng Li, Ru Wang, Fuyong Xu, Guoshuai Yang, Peiyu Liu:

ESG-Rec: Enhancing Static-Graph Representations via Tri-view Contrastive Learning for Multimodal Recommendation. 1194-1202 - Leyuan Liu, Xiaoqing Chen, Yufei Qian, Jingying Chen:

GD-Head: Reconstructing High-quality 3D Avatar Head from a Single Image Using Geometry-guided Diffusion Models. 1203-1212 - Jiahao Chang, Haohua Zhao, Liqing Zhang:

HM-NVS: Hierarchical Multi-Modal Novel View Synthesis with Uncertainty-Aware Progressive Refinement. 1213-1221 - Ming Meng, Hanwen Liu, Xingxing Xiang, Long Ye, Lei Zhang, Zhaoxin Fan:

PSRNet: Progressive Semantic Refinement for Human Parsing via Text Conditioning and Embedding-Based Calibration. 1222-1230 - Zongcheng Han, Yu Hong, Haoran Sun, Dongyan Cao:

VAR-3D: View-aware Auto-Regressive Model for Text-to-3D Generation via a 3D Tokenizer. 1231-1240 - Zishen Qu, Xuesong Li, Hongwei Kang, Haijian Gu, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan:

From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation. 1241-1250 - Shibin Xie, Hao Yin, Shuting Wang, Xiaokang Fang, Liang Jin, Haotian Liu, Yanting Zhang, Shen Cai:

Geometry-Guided Depth Correction for Metric Relative Pose Estimation. 1251-1259 - Yangchen Zeng, Zhenyu Yu, Dongming Jiang, Wenbo Zhang, Yifan Hong, Zhanhua Hu, Jiao Luo, Kangning Cui:

Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection. 1260-1269 - Tiantian Xiao, Xi Wu, Hongbin Lv, Daoyuan Wang:

Time-Surface Self-Attention: Restoring Temporal Connectivity in Spiking Transformers. 1270-1278 - Junhao Yang, Chunguo Wu, Bo Yang, Hongwei Ge, Yanchun Liang, Heow Pueh Lee:

RHVI-FDD: A Hierarchical Decoupling Framework for Low-Light Image Enhancement. 1279-1288 - Hang Li, Yulong Sun, Binhong Zhao, Feng Chen, Limei Hu:

BSSNet: Block-Aware Serialized Spatial Learning for Robust Point Cloud Registration. 1289-1297 - Xin Li, Yongxiu Xu, Yuyao Kong, Hongbo Xu, Gaopeng Gou, Yubin Wang:

Event-Centric Structural Modeling for Zero-Shot Video Moment Retrieval. 1298-1307 - Lihong Huang, Sheng-Hua Zhong, Qiao Yan, Zhijiao Xiao, Yan Liu:

SliceCSRef: Dual-Level Semantic Alignment for Robust Speech Referring Expression Comprehension. 1308-1316 - Jiachen Tan, Tingting Zhang, Tao Zhou, Guangyao Su, Jianwei Fang, Bin Wu, Chunping Zheng:

DenseSpeech: Dense Multi-Segment Temporal Grounding in Public Speaking Videos. 1317-1326 - Wei Tan, Yuanhao Li, Wenkai Liang:

Evo-CuRL: Curriculum-Aware Reinforcement Learning over Code Lineage Graphs for Software Engineering Reasoning. 1327-1335 - Shuwei Guo, Simin Luan, John See, Zeyd Boukhers, Miho Ohsaki, Kimiaki Shirahama, Cong Yang:

MoiréNet: Leveraging Directional Priors for Compact Dual-Domain Image Demoiréing. 1336-1345 - Zhiqiang You, Yutong Jiang, Shenguang Huang, Zhangjie Liu, Gaode Wu, Haoyu Wang:

QMTD: Query Vector Guided Multi-Scale Text Detection. 1346-1354 - Tiantai Zhai, Yan Zhuang, Fuji Ren, Jiawen Deng, Liang Luo:

ReNoRD: Learning from Relations under Noisy Pseudo Labels via Relational Distillation for Multimodal Sentiment. 1355-1364 - Chengzheng Fu, Rong Quan, Siyu Chen, Yiming Ni, Jie Qin:

Dual-view Driver Gaze Estimation via Mutual Enhancement. 1365-1374 - Yan Shi, Qingdong He, Yijun Liu, Jingyong Su:

KAN or MLP? Point Cloud Shows the Way Forward. 1375-1384 - Zhenrong Guo, Bowen Fei, Daqian Liu, Jiahao Zhang:

HD²FI-Net: Hierarchical Dual-Domain Fusion-Interaction Network for RGB-T Semantic Segmentation. 1385-1393 - Yunqi Gao, Zhanfeng Liao, Dongbo Zhou, Leyuan Liu:

STAR-GS: Spatio-Temporal Geometry Alignment and Generative Refinement for Sparse-View 4D Gaussian Splatting. 1394-1403 - Longlong Zhai, Jiao Tian, Feng Yan, Lei Su, Yanjun Qin, Shaochen Jiang, Chong Peng, Panpan Zheng:

STEP: Stable Gradient Projection for Continual Learning. 1404-1412 - Wei Feng, Xin Wang, Hong Chen, Yu-Wei Zhan, Zihan Song, Bin Huang, Kecheng Zheng, Wenwu Zhu:

Reflective Cross-Granularity Grounding with Preference Optimization for Long Video Understanding. 1413-1422 - Jingjiao You, Yuanyang Zhang, Yining Xu, Yunjie Zhang, Li Yao, Cunjian Chen, Tien-Tsin Wong:

GlassSplat: Geometric Consistency and Pruning for Reflection-Free 3D Scene Reconstruction. 1423-1431 - Lu Dong, Haiyu Zhang, Han Lin, Ziang Yan, Xiangyu Zeng, Hongjie Zhang, Yifei Huang, Yi Wang, Zhen-Hua Ling, Limin Wang, Yali Wang:

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations. 1432-1441 - Kun Yao, Hao Zeng, Yida Wang, Ming Ma:

ICMF-Net: Interactive Cross-Modal Fusion with Attention and Selection Network for Remote Sensing Object Detection. 1442-1449 - Wei Xu, Gu Geng, Di Yuan, Rui Chen, Qiao Liu:

DINOTrack: Leverage Differential Attention for Noise-Aware Visual Tracking with DINOv3. 1450-1458 - Guoliang Liu, Zhengwei Miao, Jianlin Zhang, Haorui Zuo:

RaC-Stitch: Reliability-Aware Collaborative Smoothing for Unsupervised Video Stitching. 1459-1467 - Zifeng Zhu, Yixian Dai, Binghui Guo, Hongwei Zheng, Yifan Sun, Zhaoxin Fan:

Beyond Over-Editing: Important Weight Constrained Knowledge Editing in Large Language Models. 1468-1476 - Zihao Zhang, Xunkai Li, Rong-Hua Li, Zhenjun Li, Bing Zhou, Guoren Wang:

Toward General and Robust LLM-enhanced Text-attributed Graph Learning. 1477-1485 - Shengchang Wang, Yue Han, Yongquan Xue, Jiao Li, Sizhe Tan, Yunfei Zhang, Liejun Wang, Panpan Zheng:

BSR-CLIP: Background-Calibrated Structural Reasoning for Zero-Shot Visual-Language Anomaly Detection. 1486-1494 - Shuo Liu, Jiakang Yu, Xun Zhu, Hongtao Deng, Yinxia Lou:

Query-Guided Conflict Inference and Incongruity-Aware Alignment for Implicit Hate Speech Detection in Videos. 1495-1503 - Jun Liu, Guoqiang Xiao, Michael S. Lew, Song Wu:

YDANet: Leveraging YCbCr Color Space and Dual-Path Attention Network for Depth Completion. 1504-1513 - Zhenpeng Zeng, Xiaoyu Wu, Xuxu Wang, Qian Yu, Yudong Wang, Zihao Liu:

TCRS-QA: Training-Free Chain-of-Thought Reasoning for Shot-Aware Storyline Question Answering in Long-Form Videos. 1514-1522 - Dongxing Duan, Xu Liu, Jingyuan Xu, Ruijie Liu, Dan Guo:

SIGaze: Toward Pixel-Level Single-Instance Gaze Object Prediction. 1523-1532 - Qingpo Wuwu, Xiaobao Wei, Peng Chen, Nan Huang, Zhongyu Zhao, Hao Wang, Ming Lu, Ningning Ma, Shanghang Zhang:

SparseStreet: Sparse Gaussian Splatting for Real-Time Street Scene Simulation. 1533-1541 - Pengfei Huang, Xuezhen Hou:

TD-CoT: Bridging the Holistic-Atomic Gap for Training-Free Temporal Reversal Detection in Video LVLMs. 1542-1546 - Seungmin Ha, Wei Li, Yulun Wu:

Efficient Music Denoising with Channel Attention and Multi-Scale Sequence Encoding. 1547-1552 - Lingyan Liang, Zhibin Zhang, Gang Dong, Dongchao Wen, Kaihua Zhang:

Robust Exemplar Prompt Learning via Bi-directional Visual-Semantic Alignment for Multi-Object Tracking. 1553-1557 - Shuang Guo, Ying Zhou, Meina Song, Huilin Ai, Jia-Long Li, Zi Yang Chen:

LDCS-Net: Local Dual-Context Attention Network for 3D Semantic Segmentation. 1558-1562 - Yingbin Wang, Jielei Wang, Qianxin Xia, Xuewan He, Guoming Lu:

CPD: Distilling Semantics in Feature Space via Class-Projection Interaction. 1563-1567 - Jizhe Yu, Hao Zhang, Xiya Bu, Yuhang Duan, Xiaoshuai Wu, Yu Liu:

Locate Core, Refine Path: A Training-Free Closed-Loop Paradigm for Referring Video Object Segmentation. 1568-1572 - Hongda Zhang, Siao Liu, Yi Liu, Chun Ouyang, Zhongxue Gan:

Joint-Guided Spatial and Semantic Sensitive Diffusion Policy for Robotic Manipulation. 1573-1577 - Lingtao Huang, Chengshuo Xia:

LimbAug: Enhancing Virtual IMU Generalization in Human Activity Recognition via Learning Limb Movement Difference. 1578-1582 - Xinyu Nan, Lingtao Mao, Huangyu Dai, Zexin Zheng, Xinyu Sun, Zihan Liang, Ben Chen, Chenyi Lei:

UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition. 1583-1587 - Ziyuan Zhao, Yuhua Wang, Yifang Yin, Yichen Zhang, Xulei Yang, Jun Cheng, Roger Zimmermann, Cuntai Guan, S. Kevin Zhou:

BL-UDA: Towards Unsupervised Domain-Adaptive Surgical Instrument Segmentation with Source Box Labels. 1588-1592 - Qianlei Wang, Kexun Chen, Yuhuang He, Xiaolin Qin:

PEGE: Monocular 6D Pose Estimation with Geometry-Aware Enhancements for Small Objects. 1593-1597
Large-Scale and Efficient Multimedia Retrieval
- Jiaxin Wu, Xiao-Yong Wei, Qing Li:

Adaptive Multi-Agent Reasoning for Text-to-Video Retrieval. 1598-1607 - Yuanxin Wei, Lansong Diao, Bujiao Chen, Shenggan Cheng, Zhengping Qian, Wenyuan Yu, Nong Xiao, Wei Lin, Jiangsu Du:

MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration. 1608-1617 - Qing Ma, Siyang Zhang, Yue Jiang, Cong Bai:

Short-Length Hashing via Bit-Level Semantic Representation for Image-Text Retrieval. 1618-1625 - Keli Liu, Zhendong Wang, Wengang Zhou, Houqiang Li:

StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models. 1626-1634 - Yueqi Zhu, Baiwen Zhang, Guo Cheng, Yongkang Zhang, Feiran Liu, Er Cao, Meng Xu:

HSS-Net: Hybrid State Space Modeling for Efficient Unified Adverse Weather Restoration. 1635-1643 - Jingru Li, Haowen Zheng:

Stillness is Redundant: Motion-Aware KV Cache Retrieval for Efficient Video Understanding. 1644-1652 - Yangyang Liu:

Audit-and-Repair Indexing: Probabilistic Caption Indexes for Robust Multimodal Retrieval. 1653-1661 - Zongze Wu, Yani Guo, Runnan Li:

Lookahead-R: Budget-Aware Tool Retrieval via Execution-Centric Planning. 1662-1671 - Yihao Song, Teng Hu, Ran Yi, Xiaoning Lei, Bin Sheng:

SA-Edit: Accelerating Editing Models via Test-time Spatial Acceleration. 1672-1681 - Han Sun, Liji Wu, Zixin Wang, Baisong Li, Huiping Zhuang, Weiping Wang, Yaoyi Deng, Mingxuan Li, Hailong Zhang:

LP-VFedNN: A Lightweight and Lossless Privacy-Preserving Vertical Federated Learning Framework For Heterogeneous Neural Network Via Homomorphic Encryption and Intel SGX. 1682-1691 - Yiming Ding, Ziang Chen, Jianguo Wei:

Uncertainty-Gated Generative Compression for Structure-Preserving Multimedia Retrieval. 1692-1700 - Haifeng Ma, Mingyue Guo, Linhui Xiao, Qingfang Zheng, Qingming Huang:

SHARP: Semantic Head-Aware Representation Pruning for Efficient MLLMs. 1701-1705 - Shuang Hu, Likai Yang, Xiaoping Liang, Zhenjun Tang:

Self-Supervised Video Hashing with Consistent Short-Term and Long-Range Temporal Modeling. 1706-1710 - Jiangtao Xie, Junjie Wu, Zhaolin Zhang, Qilong Wang, Peihua Li:

GDT-VLM: Global Distribution Modeling for Visual Token Compression in Efficient Multimodal Large Language Models. 1711-1715 - Junfeng Fang, Yan Zhang, Mingyu Liu, Zhaoxi Feng, Manzhou Li, Yan Li:

An Energy-Efficient Multimodal Retrieval Framework for Inference on Heterogeneous Edge Nodes. 1716-1720
User Interaction, Queries and Perception
- Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji:

Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA. 1721-1729 - Bastian Jäckl, Omar Shahbaz Khan, Benjamin Verner, Zuzana Vopálková, Udo Schlegel, Daniel A. Keim, Jakub Lokoc:

What Drove Success at the 15th Video Browser Showdown? A Comprehensive Interaction-Logging Analysis. 1730-1739 - Shuaiwei Xie, Yating Yang, Bo Ma, Xi Zhou, Zhen Wang, Ahtamjan Ahmat:

Ask-to-Retrieve: VQA-Guided Information-Gain Question Selection for Interactive Text-to-Image Retrieval. 1740-1748 - Hongyi Zhu, Shuai Wang, Jia-Hong Huang, Yixian Shen, Stevan Rudinac, Evangelos Kanoulas:

Agent-Based Query Reformulation: Simulating Feedback and Mitigating Negation Blindness in Interactive Image Retrieval. 1749-1758 - Yu-Tong Cheng, Phuong Anh Nguyen, Chong-Wah Ngo:

A Pruning-based Question-Answering for Interactive Video Search: A Simple Baseline. 1759-1767 - Bastian Jäckl, Jirí Kruchina, Lucas Joos, Daniel A. Keim, Ladislav Peska, Jakub Lokoc:

Evaluating Keyframe Layouts for Visual Known-Item Search in Homogeneous Collections. 1768-1777 - Sigurður Þórarinsson, Björn Þór Jónsson, Omar Shahbaz Khan:

Optimization of Long-Running Media Aggregation Queries. 1778-1786 - Seongbo Jang, Seonghyeon Lee, Dongha Lee, Hwanjo Yu:

On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval. 1787-1792 - Rong Quan, Yantao Lai, Dong Liang, Jie Qin:

Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models. 1793-1797 - Huilin Ai, Ying Zhou, Zhonghua Peng, Ziyang Chen, Shuang Guo, Heng Xu, Zhiwei Yu, Jialong Li:

FMD-AL: Cold-Start Active Learning based on Foundation Model. 1798-1802
Social, Narrative and Generative Multimedia
- Xiao Liang, Bangxin Li, Zixuan Chen, Hanyue Zheng, Zhi Ma, Di Wang, Cong Tian, Quan Wang:

VideoAgent: Personalized Synthesis of Scientific Videos. 1803-1811 - Tingrun Chen, Xudong Ling, Shicai Wei, Guiduo Duan, Yue Zhang:

FreSCo: Joint Frequency-Aware and Spatial Control for Image Zero-Shot Style Transfer. 1812-1821 - Lining Wang, Hongxun Yao, Jinyu Zhang:

Retrieval-Augmented Camera Control for Video Diffusion. 1822-1831 - Xudong Zhou, Guozheng Li, Chi Harold Liu:

LayoutGD: Content-Aware Layout Generation via Graph-Enhanced Diffusion Model. 1832-1841 - Yueqian Guo, Tianzhao Li, Xin Lv, Jiehaolin Chen, Zhaohan Wang, Yurun Chen, Sirui Xiao, Yezi He, Helin Li, Fan Zhang:

NeRAG: Neuro-Explicit Retrieval-Augmented Generation for Real-Time Interaction in Digital Humans. 1842-1850 - Jiahui Liu, Tao Sun, Zimeng Xu, Zhipeng Shen, Yifan Kong, Jiaao Zhou:

Context Relation-Aware and Fine-Grained Token Interaction for Dialogue-Level Aspect-Based Sentiment Quadruple Analysis. 1851-1860 - Alexander Vincent Lewi, Rainer Tan, Shengfeng He:

InterFold: Learning Interpretable Diffusion Manifolds Beyond Binary Samples. 1861-1869 - Jianran Liu, Wen Ji, Xiaokai Meng, Wancai Zhang, Ying Wang:

Toward Generation-Centric Coding: Compressing Latents representation for TI2V Synthesis. 1870-1878 - Lianyu Pang, Ji Zhou, Qiping Wang, Baoquan Zhao, Zhenguo Yang, Qing Li, Xudong Mao:

Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization. 1879-1888 - Wei Zhang, Changhong Jiang, Lulu Wang, Mingting Yu, Siyuan Zhao, Ronghan Li:

Sticker-Enriched Empathetic Response Generation: A Role-Aware Pairing Benchmark and A Novel Multimodal Framework. 1889-1898 - Yushe Cao, Luoxi Jing, Yuanze Wang, Dianxi Shi, Chun Yu, Junliang Xing:

Dual-Pathway Diffusion for Hand Correction in Synthetic Portraits: Global Context Aware and Local Structure Refinement. 1899-1907 - Hualiang Wei, Wenhui Li:

ExpPortrait: Novel Expression Generation for Fine-Grained Controllable Portrait Animation. 1908-1917 - Jianxuan Yang, Xiaoran Yang, Lipan Zhang, Xinyue Guo, Zhao Wang, Gongping Huang:

MultiSoundGen: Video-to-Audio Generation for Multi-Event Scenarios via SlowFast Contrastive Audio-Visual Pretraining and Direct Preference Optimization. 1918-1926
Trustworthy and Secure Multimedia
- Zehua Cheng, Wei Dai, Jiahao Sun:

Identity-Decoupled Anonymization for Visual Evidence in Multi-modal Retrieval-Augmented Generation. 1927-1935 - Yunfei Yang, Xiaojun Chen, Zhendong Zhao, Yu Zhou, Xiaoyan Gu, Juan Cao:

ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples. 1936-1945 - Jianbin Ye, Man Xiao, Bo Liu, Huaping Hu, Zijian Gao, Shaojing Fu, Kele Xu, Huaimin Wang:

Multimodal Deepfake Detection with Quantum State Inspired Analytic Incremental Adaptability Learning. 1946-1955 - Yuze Li, Zhilei Liu:

UAU-Net: Uncertainty-aware Representation Learning and Evidential Classification for Facial Action Unit Detection. 1956-1964 - Yunfei Wang, Bo Du, Zhe Yang, Xin Liu, Zhiyu Lin, Tianxin Xu, Ji-Zhe Zhou:

Towards Generalized Image Manipulation Localization via Score-based Model. 1965-1973 - Chengyin Hu, Yijun Chen, Tianle Liang, Yiwei Wei, Ang Li, Haitao Shi, Lingyan Bian, Xuelian Shi:

GMAE: Gated Multi-signal Adaptive Ensemble for Targeted Transfer Attacks on Commercial Large Vision-Language Models. 1974-1983 - Hongying Zheng, Aijia Zhou, Di Xiao:

Preserving Texture in Chaos: Texture-Aware Deep Unfolding for Secure Compressed Sensing. 1984-1993 - Yongfei Tao, Yi Zhang, Runze Liao, Jingwei Qu, Bingyao Huang:

Text Recovery Attacks and Defenses on Physical Envelopes. 1994-2002 - Guanyu Wang, Kailong Wang, Yihao Huang, Mingyi Zhou, Geguang Pu, Li Li:

Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints. 2003-2011 - Hao Wang, Beichen Zhang, Yanpei Gong, Shaoyi Fang, Zhaobo Qi, Yuanrong Xu, Xinyan Liu, Weigang Zhang:

AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection. 2012-2021 - Xuanyu Yin, Daowan Peng, Wei Wei:

Mitigating Hallucinations in Vision-Language Models via Contextual Entropy Calibration. 2022-2031 - Gang Zhou, Shibiao Xu, Xiaolong Zheng:

Spatial-Frequency Domain Complementary Learning for Robust Cross-Modal Hashing. 2032-2040 - Bin Zhu, Yinxuan Gui, Huiyan Qi, Jingjing Chen, Chong-Wah Ngo, Ee-Peng Lim:

Benchmarking Gaslighting Negation Attacks Against Multimodal Large Language Models. 2041-2049 - Yu Chen, Ke Wang, Fan Yang, Qiang Xu, Jinwei Chi, Honghao Wei:

Semantic-Guided Fast Adversarial Training via Class Relationship Exploitation. 2050-2058 - Huadi Zheng, Cheng Li, Xin Zhou, Wei Wang, Yuanhang Yu, Feng Wang, Yan Ding:

MirageNet: A Secure, Efficient, and Scalable DNN Protection for Edge-Computing Multimedia Retrieval. 2059-2067 - Junfeng Fang, Yan Zhang, Manzhou Li, Xinjin Ge, Yan Li, Weiyuan Cui:

Privacy-Constrained Low-Bit Representation Learning for Person Image Retrieval. 2068-2076 - Wenzheng Liu, Cheng Fu, Hujin Peng, Junlong Wu, Xianhong Chen, Lan Huang, Jing Xu, Yixi Tian, Menghan Liang, Xiaofeng Wang, Tan Deng, Wanwei Jiang, Xiang Li:

TSAD: Trace-Semantic Adaptive Disentanglement for Detecting and Grounding Multi-Modal Media Manipulation. 2077-2086 - Meng Yang, Peirou Liang, Zhiqian Wu, Yong Liao:

Structure-Induced Safety Gaps in Multimodal Reasoning Systems. 2087-2095 - Minxi Li, Naen Xu, Hengyu An, Tianyu Du:

Content-Adaptive Implicit Neural Representations for Resolution-Agnostic Remote Sensing Watermarking. 2096-2105 - Qichao Xiong, Yiheng Huang, Junhong Chen, Zhanhong Liang, Lunke Fei:

HiChrom-MAE: Frequency-Chromaticity Masked Autoencoding for Palmprint Presentation Attack Detection. 2106-2114 - Boyao Wei, Ruixia Liu, Yinglong Wang:

Enhancing Deepfake Detection Reliability via Risk-Regulated Dual-Threshold Interval Selection. 2115-2123 - Tongzheng Zhao, Yan Chen, Peng Wu, Chao Wen, Peng Zhou, Liang Du:

FairFBC: Scalable Fair Fuzzy Clustering via Group-Balanced Anchor Graphs. 2124-2132 - Fuqiang Du, Min Yu, Jianguo Jiang, Yixin Zhang, Yachao Liang, Meng Zhang, Weiqing Huang:

Thinking in High-Frequency: Practical Defense for Deepfake Detectors Against Black-box Adversarial Attacks. 2133-2141 - Jiaxin Chen, Long Sun, Dengyong Zhang:

DocForensic: A VMamba-Wavelet Cross-Attention Network for Document Image Tampering Localization. 2142-2146 - Xiaomeng Wang, Martha A. Larson, Zhengyu Zhao:

Revealing the Impact of Visual Text Style on Attribute-based Descriptions Produced by Large Visual Language Models. 2147-2151 - Dan Wu, Saihui Hou, Kang Yang, Jian Zhao, Min Ren, Zhaofeng He:

ForgeryMoE: Mixture of Experts for Image Forgery Detection under JPEG Compression. 2152-2156 - Zhiyao Xie, Tong Liu, Jiahao Huang, Nuno Lourenço, Xiaochen Yuan:

BiRNet: Bilateral-Attentive Refinement Network for Tampering Detection Across Manipulation Paradigms. 2157-2161
Evaluation and Applications
- Yueqin Luo, Yunfan Li:

Risk-Aware Medical Question Answering as a Reliability-Critical Decision Process. 2162-2171 - Jiaxin Gao, Yaohua Liu, Danchen Cui, Zhihui Zhao:

SNOC: Subtle Nested Objective Configuration for Joint Ultra-Low-Light Enhancement and Super-Resolution. 2172-2181 - Junliang Liu, Jingyu Xiao, Wenxin Tang, Zhixian Wang, Zipeng Xie, Wenxuan Wang, Minrun Zhang, Shuangheng Yu:

Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety. 2182-2190 - Zhiwei Hong, Qing Lei, Hongbo Zhang, Jixiang Du:

Skeleton-Guided Spatio-Temporal Video Representation for Long-Term Action Quality Assessment. 2191-2199 - Zhe Jin, Tat-Seng Chua:

CompArt: Operationalizing Aesthetic Alignment in Text-to-Image Generation via Principles of Art. 2200-2209 - Ruiqi Song, Lei Liu, Ya-Nan Zhang, Chao Wang, Xiaoning Li, Nan Mu:

VFGS-Net: Frequency-Guided State-Space Learning for Topology-Preserving Retinal Vessel Segmentation. 2210-2218 - Xiaoyang Wei, Camille Kurtz, Florence Cloppet:

Augmenting medical vision-language pretraining via domain-aware retrieval-augmented caption enrichment. 2219-2227 - Kaixi Xu, Yuanyang Zhang, Yilong Liu, Zirui Luo, Yutong He, Li Yao:

SAND: Semantic-Aware Anomaly Detection with Region-Consistent Memory for Noisy Training. 2228-2236 - Bangling Wang, Fengqi Hao, Jinqiang Bai, Huijuan Hao, Xiangjun Dong, Dexin Ma, Hoiio Kong:

CoMemRet: Wood Surface Anomaly Detection Based on Multiview Contrastive Learning and Memory Bank Retrieval. 2237-2245 - Shuai Wang, Hongyi Zhu, Jiahong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring:

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding. 2246-2255 - Yufan Zhou, Zhiyuan Ma, Lei Wang, Hongrui Ren, Zhuolun Zhong, Shangpeng Wang, Chenyuan Zhang, Haoran Yang:

GS³-ICL: Graph-Structured Sparse Selection with Invariance-Consistent Learning for multimodal SER. 2256-2265 - Yifan Shuai, Ke Wang, Weiming Feng, Shuai Pang, Dehua Zhou, Yikui Zhai:

GAformer: Low-Light Image Enhancement Based on Gradient-Aware Kernel and Frequency-Modulated Transformer. 2266-2275 - Xingming Liao, Meiyu Zeng, Canyu Chen, Nankai Lin, Zhuowei Wang, Aimin Yang:

Chameleon: Benchmarking Detection and Backtracking on Commercial-Grade AI-Generated Videos. 2276-2285 - Yahui Li, Yinfeng Yu, Liejun Wang, Shengjie Shen:

EAD-Net: Emotion-Aware Talking Head Generation with Spatial Refinement and Temporal Coherence. 2286-2294 - Xiaowen Chen, Guoqiang Xiao, Michael S. Lew, Song Wu:

Coupling Frequency-domain Stepwise Enhancement and Statistical Moment Adjustment for Single Image Dehazing. 2295-2304 - Jidong Li, Xiaofei Yin, Ya Guo, Shuheng Zhou, Yi Tu, Haodong Zhao, Sufeng Duan, Gongshen Liu, Huijia Zhu:

OpenImplicit: Benchmarking Implicit Reasoning in MLLMs via Open-Ended Evaluation. 2305-2313 - Yiguo Jiang, Xiaodong Cun, Chen-Bin Feng, Jian Sun, Chi-Man Pun:

Decoupling Vocal and Rhythmic Conditioning for Music-Driven Singing Avatar Animation. 2314-2322 - Xiuze Dong, Maosheng He, Zhao Wang, Zhiqiang Liang, Yifan He:

TF-Count: A Training-Free Exemplar-Based Method for Class-Agnostic Cell Counting Using DINOv2. 2323-2332 - Nian Ai, Daizong Liu, Pan Zhou:

Exploring Potential Knowledge Correlations of Disease Diagnosis with Partial Labels. 2333-2342 - Youchen Luo, Zhaodong Sun, Huiyu Yang, Wenye Geng, Yuwei Chen, Xiaobai Li:

PhysFlow: Frequency-Selective Flow Matching with Dual-Stream Expert Fusion for Remote Photoplethysmography. 2343-2351 - Feng Li, Wen Luo, Bing Wang, Yongwei Li:

Sparse Implicit Connectivity Graphs with Scheduled Emotion History Sampling for Multimodal Emotion Recognition in Conversation. 2352-2360 - Linfeng Ke:

MDHL-FND: Multi-Domain Hierarchical LoRA for Fake News Detection. 2361-2370 - Lianwei Yang, Haokun Lin, Yichen Wu, Zhenan Sun, Qingyi Gu:

DapQ-DiT: Distribution-Aware Post-Training Quantization for Efficient Generative Tasks in Diffusion Transformers. 2371-2380 - Jiaxin Chen, Zhixiang Zheng, Pei Yi, Dengyong Zhang, Yuxu Peng, Lei Wang:

Multi-Scale Perception and Channel-Gated Fusion Network for Aerial Image Detection. 2381-2389 - Lemin Cheng:

Synergizing Agentic Data Generation and Efficient Representation Learning for Medical Cross-Modal Understanding. 2390-2399 - Zhihuan Lin, Fei Wang, Weihong Cai, Hao Cai:

Anatomic-Decoupling Adaptation: A Unified Framework for Zero/Few-Shot Medical Anomaly Detection. 2400-2408 - Yanjun Chi, Keqiang Wang, Wei Huang, Wei Xu, Jiaen Liang, Jun Yu:

MetaDB: Metadata-Guided Diffusion Bridge Model for High-Fidelity Medical Image Synthesis. 2409-2417 - Ke Zhang, Chu-Hsuan Hsueh, Kokolo Ikeda:

Event-Based Token Sequences for Audio-Conditioned Music-Game Level Modeling. 2418-2427 - Shijie Xuyang, Bingzhe Yu, Minyi Zhao, Guangze Li, Jihong Guan, Shuigeng Zhou:

Anime-2026: A Large-scale Anime Character Dataset for Anime-related AI Tasks. 2428-2437 - Changjiang Jiang, Wenhui Dong, Zhonghao Zhang, Fengchang Yu, Wei Peng, Xinbin Yuan, Yifei Bi, Ming Zhao, Zian Zhou, Chenyang Si, Caifeng Shan:

Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection. 2438-2447 - Wenhan Tao, Yongsheng Bai, Feng Wang, Jelena Tesic:

Data-Centric Multimodal Pavement Distress Detection Using Intensity-Range Imagery. 2448-2456 - Pengkun Jiao, Xinlan Wu, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang:

RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models. 2457-2466 - Dongjin Huang, Yichuan Liu, Jiantao Qu, Qinghang Wu:

ImageNetion: Retrieval-as-Policy for Creative Gollin Figure Completion via Generative Feedback. 2467-2476 - Gang Liu, Xiaotian Tang, Jiacheng Gan, Tianyu Ren, Tingyao Liu, Shenjun Zhong:

MMG-RAG: Multimodal Graph RAG for Medical Report Generation. 2477-2485 - Yuqian Zheng, Hyeongjin Ahn, Juyeob Lee, Eunil Park:

CLIP-SegFusion: An Attention-Guided Feature Fusion Framework for Multi-Level Detection on AI-Generated Artworks. 2486-2495 - Jing Guo, Baoqi Huang, Qing Miao, Bing Jia, Runze Yang:

Depth-Guided Perception and Policy-Driven Adaptation for Wind Turbine Anomaly Detection. 2496-2504 - Mihai-Bogdan Bîndila, Shenghui Wang, Gwenn Englebienne:

Fine-Grained Cross-Modal Retrieval in Art via Region-Level Grounding of Symbolic Narratives. 2505-2514 - Hangshen Nong, Mingyuan Ge, Mingyong Li:

DynBrush: Structure-Aware and Style-Adaptive Transfer for Traditional Chinese Landscape Rendering. 2515-2523 - Youxin Liao, Jiahao Zhang, Guang Long, Yueran Wang, Ying Qi, Chenyang Wang, Qiang Zhang, Shanxiong Chen:

ICSGDiff: A Multimodal Structure-Aware Diffusion Network for Restoring Ancient Bamboo Slips. 2524-2532 - Junjiang Chen, Wenzhong Yang, Yabo Yin, Hongzhen Lv, Fuyuan Wei, Jingfeng He, Zongxu Luo, Junhang Wu:

TR-GLP: A Two-Stage Framework with Temporal Reprogramming and Residual Graph Label Propagation for Short Video Fake News Detection. 2533-2542 - Ji Zhang, Xiao Luo, Hang Zhou:

CB-CV: A Cluster-Based Cross-Validation Benchmark for Multimodal Video Out-of-Distribution Detection. 2543-2551 - Di Xiao, Yuhan Gou, Yu Ren, Shijia Xu, Yue Zhang:

LLM-Guided Secure Federated Visual Prompts with Deep Unfolding for MRI Reconstruction. 2552-2560 - Aoduo Li, Haoran Lv, Hongjian Xu, Shengmin Li, Sihao Qin, Zimeng Li, Chi Man Pun, Xuhang Chen:

ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis. 2561-2570 - Long Yu, Chunfang Yang, Ma Zhu, Xu Wang, Yang Pei:

Progressive High-Confidence Pseudo-Labeling for Unsupervised Cross-View Image Geo-Localization. 2571-2579 - Yi Zhao, Qianqian Ren, Boyuan Wang, Hui Xu:

Explainable Multimodal Modeling with KANs for Urban EV Charging Demand Forecasting. 2580-2587 - Hoang-Nguyen Cao, Le-Hoang Bui, Dinh-Khoi Vo, Minh-Triet Tran, Trung-Nghia Le:

VietFashion: Benchmarking Sketch-Text Composed Image Retrieval for Cultural Outfits. 2588-2596 - Peng Zhang, Xin Lin:

Multi-Modal 3D Object Detection in Autonomous Driving: A New Survey. 2597-2606 - Yuma Oe, Katsumi Tanaka, Yoshiyuki Shoji:

Asymmetric Pipeline for Dataset Construction and Situation-aware Generative Outfit Retrieval Leveraging Differences in Task Difficulty. 2607-2615 - Jizhe Yu, Xiya Bu, Yu Liu, Kaiping Xu, Yifei Cao, Zhizhen Li:

TrackNetV6: A Unified Framework for Lightweight and Robust Fast-Moving Tiny Ball Tracking. 2616-2625 - Xueru Zhao, Yanrong Hao, Xin Wen, Mengni Zhou, Jing Bian, Rui Cao:

NIGCL: Neuro-Image Geometric Contrastive Learning for Robust EEG-Based Visual Retrieval. 2626-2635 - Yiqiang Zhou, Xindan Gao, Jifeng Guo, Yanfeng Qiao, Yan Chen, Guang Li, Lu Wang:

ClearNight: Synergistic Dual-Prior Aggregation for Nighttime Image Dehazing. 2636-2645 - Hongxia Sun, Wenzhong Yang, Yabo Yin, Fuyuan Wei, Junhang Wu, Junjiang Chen:

HCG-MPB: Hierarchical Complementary Gating Mechanism with Multimodal Pattern Bank for Hateful Video Detection. 2646-2655 - Kaiwen Zheng, Junchen Fu, Songpei Xu, Yaoqin He, Joemon M. Jose, Hu Han, Xuri Ge:

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions. 2656-2665 - Feng Li, Chen Sun, Bing Wang, Zongyu Xie:

TG-MUNet: A Lightweight Text-Guided Mamba Unet for Semi-Supervised Medical Image Segmentation. 2666-2675 - Tianyuan Li, Lei Wang, Yating Yang, Rui Dong, Ahtamjan Ahmat, Bangju Han, Zhijie Li:

Flickr-ABS: A Benchmark for Abstract Intent in Image-Text Retrieval. 2676-2684 - Bowei Fang, Yunyi Tang, Wentao Mu, Wenbo Liu, Fei Yan, Tao Deng:

DDL-Net: A Task-Balanced Panoptic Perception Network with Channel-Reorganized Attention in Autonomous Driving. 2685-2693 - Weitao Tang, Meijie Du, Die Hu, Shu Li, Zhao Li, Rong Yang, Qingyun Liu:

Blazer: Encrypted Video Traffic Identification for Mixed Segment Transmission Pattern based on LLM. 2694-2703 - Jiahao Chang, Haohua Zhao, Liqing Zhang:

Beyond Semantic Understanding: Physics-Informed Adaptation of Video Foundation Models for Precision Industrial Metrology. 2704-2713 - Hongfei Ye, Bin Chen, Wenxi Liu, Yu Zhang, Zhao Li, Dandan Ni, Hongyang Chen:

Assessing Color Vision Test in Large Vision-language Models. 2714-2722 - Shaojie Hu, Yong Liu, Xiaoguang Zhang:

SPGR: Semantic Purification and Geometric Routing for Fake Short-Video Detection. 2723-2731 - Xiao Ma, Jin Yuan, Yao Zhang, Cheng Zhong, Zhongchao Shi, Jianping Fan, Guihua Zeng:

SlimNet: High-Quality and Efficient Object Removal by Eliciting Latent Capabilities of Diffusion Models. 2732-2740 - Guangrun Chen, Xiaoli Yang, Litan Sun, Ming Jin, Xiaofeng Qu, Sijie Niu:

Entropy-aware Mutual Student Co-training for Source-free Domain Adaptation in Medical Image Segmentation. 2741-2750 - Xiaofeng Shen, Ze Rong, Haoyang Qin, Binhao Zhao, Yue Xu, Lei Ma:

SCORE: Soccer Curriculum for Ordinal Affect Recognition with Evidence. 2751-2759 - Songming Li, Jun Long:

EC-RAG: Evidence-Centric Retrieval-Augmented Generation for Medical Visual Question Answering. 2760-2767 - Jiang Liu, Yuhang Liu, Juan Yang, Qianhao Ren, Yutong Wang, Zhongjiang He, Jingmin Xin, Hao Sun:

VAV-R1: Difficulty-Aware Multimodal Reasoning for Video Anomaly Validation. 2768-2772 - Min Feng, Sheng-hua Zhong, Tianhao Gao, Rongrong Lu:

HSAMoE: Hemiparetic-Side-Aware Mixture of Experts for EEG-Based Motor Imagery Classification in Stroke Patients. 2773-2777 - Zhibo Zhang, Lin Zhao, Wenyan Xing, Di Wang, Chen Gong, Le Zhang:

Efficient Video Anomaly Detection for Edge Devices via Background Feature Caching. 2778-2782 - Lei Xu, Ru Li, Chunyu Zhao, Jiaqiang Zhang:

Centripetal-Scale Coupling and Sparsity-Guided Label Assignment for Tiny Object Detection in Steppe Rathole Monitoring. 2783-2787 - Min Dang, Gang Liu, Zhaolu Zheng, Luyi Qiu, Zihao Li, Jing Liu:

FSGD-Det: Frequency-Spectrum-Guided Dehazing for Aerial Object Detection in Hazy Images. 2788-2792 - Zijian Zhao, Dian Jin, Zijing Zhou:

Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach. 2793-2797 - Xinyu Li, Wenjun Zhu, Xiaoyu Ji, Wenyuan Xu:

Uncovering Frequency Cues for Robust Event-Based UAV Detection. 2798-2802 - Haowen Hua, Zeyi Shao, Shuwei Guo, John See, Zeyd Boukhers, Miho Ohsaki, Kimiaki Shirahama, Cong Yang:

TrinitySeg: A Benchmark and Baseline for Monochrome NIR Smoke Segmentation. 2803-2807 - Juan Chen, Yang Wu, Lei Guo, Wenping Ge, Jie Ma:

Stat-SAM: Learning Global Echo-Intensity Priors as Prompts for SAM in Ultrasound Image Segmentation. 2808-2812 - Jiayi Ding, Libo Liu, Ruonan Zhang:

Topology-Guided Feature Integration for Structured Object Detection. 2813-2817
Brave New Ideas Papers
- Yuro Kanada, Yuma Oe, Huu-Long Pham, Makoto P. Kato, Hiroaki Ohshima, Sumio Fujita, Yoshiyuki Shoji:

Retrieval of LoRA Models based on Layer-Wise Weight Embedding without Metadata. 2818-2825 - Bo-Wen Zhang, Zhou Wang, Songlu Chen, Chun Yang, Xiaobin Zhu, Xu-Cheng Yin:

Beyond Optical Flow: Latent Micro-Motion as Visual Evidence in UAV Video. 2826-2832 - Tendai Mukande, Noel E. O'Connor:

Towards Self-Evolving Knowledge Systems: Enhancing Multimodal Agentic RAG with Hyperbolic Flows. 2833-2841
Technical Demonstrations
- Miriama Jánosová, Andrej Cernek, Radovan Dvorský, Vasil Poposki, Petra Budíková, Jan Sedmidubský:

Real-Time Monitoring and Analysis of Rehabilitation Exercises from a Smartphone Camera Video Stream. 2842-2846 - Allie Tran:

Keep Your Memories, Lose the Evidence: Privacy-Aware Lifelogging. 2847-2851 - Zhengyang Liang, Yan Shu, Cathal Gurrin, Nicu Sebe, Lizi Liao:

VideoCreator: An Agentic System for Multi-turn Video Production. 2852-2856 - Emil Leidland, Duc-Tien Dang-Nguyen:

TrackAnon: Towards Consistent Full-Body Anonymization. 2857-2861 - Stefan J. Arzberger, Helmut Neuschmied, Werner Bailer:

Discovery of Visually Novel Content in Developing News Stories. 2862-2866 - Tatsuro Banno, Koki Kawada, Mizuki Takenawa, Masatoshi Denda, Kiyoharu Aizawa:

Realistic Virtual Flood Experience System Using 360° Videos and 3D City Models Constructed from Building Footprints. 2867-2871 - Arun George Zachariah, Barnaby Simkin, Nikki Pope, Jibin Varghese, Michael Boone:

Context-Aware Visual Redaction Pipeline: Leveraging Vision-Language Models for High-Fidelity Content Inpainting. 2872-2876
Challenge Papers
- Duc-Tien Dang-Nguyen, Kha-Luan Pham, Minh-Anh Pham, Silje Førsund, Henrik Brattli Vold, Minh-Triet Tran, Anh-Duy Tran:

The 2026 Grand Challenge on Multimedia Verification: Overview and Key Directions. 2877-2881 - Hoang-Quoc Nguyen-Son, Tung-Duong Le-Duc, Quynh-Huong Dinh-Nguyen, Hai-Chau Nguyen-Le, Anh-Duy Tran, Minh-Son Dao:

DeepVerify: The End-to-end Software for Evidence-Based Multi-Modal Online Information Verification with Explainable Reasoning. 2882-2886 - Hung Truong Thanh Nguyen, Vo Thanh Khang Nguyen, Hoang-Loc Cao, Phuc Ho, Van Pham, Hung Cao:

Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification. 2887-2891 - Muhammad Shahid Muneer, Khoa Van Tran, Van Tuan Nguyen, Simon S. Woo:

MOSAIV: Multi-Agent LLM Swarms for Automated Multimedia News Verification: Fake News Detection. 2892-2896
Tutorial Abstracts
- Bo Peng:

Detecting Distribution Shifts In the Wild: Techniques and Future Perspectives. 2897-2898 - Chen Xu, Clara Rus, Yuanna Liu, Marleen de Jonge, Jun Xu, Maarten de Rijke:

Fairness in Information Retrieval: An Economic Perspective. 2899-2900 - Piera Riccio, Selina Khan, Ludovica Schaerf, Shuai Wang, Tiancheng Liu, Athanasios Efthymiou, Noa Garcia, Nanne van Noord:

Understanding Art & Culture. 2901-2903
Workshop Abstracts
- Allie Tran, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Steve Hodges, Björn Þór Jónsson, Wolfgang Hürst, Luca Rossetto, Klaus Schoeffmann, Minh-Triet Tran, Liting Zhou, Cathal Gurrin:

Introduction to the 9th Annual Lifelog Search Challenge, LSC'26. 2904-2905 - Bo Peng, Xuefeng Du, Zhen Fang:

Toward Trustworthy Vision-language Models in the Wild: Theory, Algorithm and Application. 2906-2907 - Minh-Son Dao, Duc-Tien Dang-Nguyen, Son N. Tran:

The 7th International Workshop on Intelligent Cross-Data Analysis and Retrieval. 2908-2909 - Dan-Cristian Stanciu, Symeon Papadopoulos, Giorgos Kordopatis-Zilos, Bogdan Ionescu, Adrian Popescu, Roberto Caldelli, Milica Gerhardt, Vera Schmitt:

The 5th ACM International Workshop on Multimedia AI against Disinformation (MAD'26). 2910-2913
Doctoral Symposium Papers
- Yunming Hui:

User Influence Characterization and Market Event Attribution via GNN-LLM Collaboration in Financial Networks. 2914-2917 - Di Dai, Bo Liu, Liwei Wang:

Semantic-Anchored Multi-State Retrieval for Robust 3D Perception. 2918-2926 - Ander Etxezarreta:

Object pose estimation for upper-limb prostheses grasping. 2927-2930 - Toya Oyama:

Towards Scalable and Adaptive Multi-Teacher Distillation for Text-to-Video Retrieval. 2931-2934
