ICCV 2025: Honolulu, HI, USA
- IEEE/CVF International Conference on Computer Vision, ICCV 2025, Honolulu, HI, USA, October 19-25, 2025. IEEE 2025, ISBN 979-8-3315-8775-8

- Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins, Yawar Siddiqui, Julian Straub, Richard A. Newcombe, Vasileios Balntas, Jakob J. Engel:

Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling. 5657-5666 - Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng:

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers. 18262-18272 - Sagi Polaczek, Yuval Alaluf, Elad Richardson, Yael Vinker, Daniel Cohen-Or:

NeuralSVG: An Implicit Representation for Text-to-Vector Generation. 15458-15468 - Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Arsh Koneru, Yusuke Kato, Kazuki Kozuka, Aditya Grover:

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection. 15657-15668 - Petr Hruby, Marc Pollefeys:

Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras. 7143-7153 - Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen:

VideoVAE+: Large Motion Video Autoencoding with Cross-Modal Video VAE. 17951-17960 - Sangyun Shin, Yuhang He, Xinyu Hou, Samuel Hodgson, Andrew Markham, Niki Trigoni:

DiffRefine: Diffusion-Based Proposal Specific Point Cloud Densification for Cross-Domain Object Detection. 4888-4897 - Radu Beche, Sergiu Nedevschi:

ClaraVid: A Holistic Scene Reconstruction Benchmark from Aerial Perspective with Delentropy-Based Complexity Profiling. 26015-26025 - Gen Li, Yutong Chen, Yiqian Wu, Kaifeng Zhao, Marc Pollefeys, Siyu Tang:

EgoM2P: Egocentric Multimodal Multitask Pretraining. 10830-10843 - Shicai Wei, Chunbo Luo, Yang Luo:

Improving Multimodal Learning via Imbalanced Learning. 2250-2259 - Ryan Ramos, Vladan Stojnic, Giorgos Kordopatis-Zilos, Yuta Nakashima, Giorgos Tolias, Noa Garcia:

Processing and Acquisition Traces in Visual Encoders: What Does CLIP Know About Your Camera? 17056-17066 - Xin Dong, Shichao Dong, Jin Wang, Jing Huang, Li Zhou, Zenghui Sun, Lihua Jing, Jinsong Lan, Xiaoyong Zhu, Bo Zheng:

Inter: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling. 2534-2544 - Changsheng Gao, Yifan Ma, Qiaoxi Chen, Yenan Xu, Dong Liu, Weisi Lin:

Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark. 1068-1077 - Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi, Sebastian Ziegler, Michal Nohel, Robin Peretzke, Gregor Köhler, Klaus Maier-Hein:

An OpenMind for 3D Medical Vision Self-supervised Learning. 23839-23879 - Carl Olsson, Yaroslava Lochman, Johan Malmport, Christopher Zach:

Certifiably Optimal Anisotropic Rotation Averaging. 14856-14865 - Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Zike Yan, Qi Wu, Zhihua Wei, Jing Liu:

COSMO: Combination of Selective Memorization for Low-Cost Vision-and-Language Navigation. 5511-5522 - Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark J. Matthews, Andrea Tagliasacchi:

StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting. 26326-26335 - Susan Liang, Chao Huang, Yunlong Tang, Zeliang Zhang, Chenliang Xu:

$\pi$-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis? 13942-13951 - Xin Zhang, Anpei Chen, Jincheng Xiong, Pinxuan Dai, Yujun Shen, Weiwei Xu:

Neural Shell Texture Splatting: More Details and Fewer Primitives. 25229-25238 - Wonseok Roh, Hwanhee Jung, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim:

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from a Single-View Image. 28228-28238 - Karhan Kayan, Stamatis Alexandropoulos, Rishabh Jain, Yiming Zuo, Erich Liang, Jia Deng:

Princeton365: A Diverse Dataset with Accurate Camera Pose. 1-10 - Zhidan Xu, Xiaoqin Zhang, Shijian Lu:

Face Retouching with Diffusion Data Generation and Spectral Restorement. 14722-14731 - Zhijing Sun, Senyan Xu, Kean Liu, Runze Tian, Xueyang Fu, Zheng-Jun Zha:

EVDM: Event-based Real-World Video Deblurring with Mamba. 13793-13803 - Shaojie Zhang, Jiahui Yang, Jianqin Yin, Zhenbo Luo, Jian Luan, MiLM Plus:

Q-Frame: Query-Aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs. 22056-22065 - Xihong Yang, Siwei Wang, Jiaqi Jin, Fangdi Wang, Tianrui Liu, Yueming Jin, Xinwang Liu, En Zhu, Kunlun He:

Generalized Deep Multi-View Clustering Via Causal Learning With Partially Aligned Cross-View Correspondence. 1990-1999 - Jiawei Xu, Kai Deng, Zexin Fan, Shenlong Wang, Jin Xie, Jian Yang:

AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving. 24770-24779 - Tianhong Gao, Yannian Fu, Weiqun Wu, Haixiao Yue, Shanshan Liu, Gang Zhang:

MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning. 1484-1494 - Jeremy Styborski, Mingzhi Lyu, Jiayou Lu, Nupur Kapur, Adams Wai-Kin Kong:

When and Where Do Data Poisons Attack Textual Inversion? 19439-19449 - Marcin Przewiezlikowski, Randall Balestriero, Wojciech Jasinski, Marek Smieja, Bartosz Zielinski:

Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations. 23442-23452 - Yukang Cao, Chenyang Si, Jinghao Wang, Ziwei Liu:

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model. 18111-18120 - Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka, Masaki Onishi, Yoshitaka Ushiku:

AgroBench: Vision-Language Model Benchmark in Agriculture. 7634-7644 - Zheyuan Zhang, Weihao Tang, Hong Chen:

Rethinking Key-Frame-Based Micro-Expression Recognition: a Robust and Accurate Framework Against Key-Frame Errors. 12274-12283 - Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner, Angela Dai:

QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization. 27851-27861 - Xuan Yao, Junyu Gao, Changsheng Xu:

NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments. 5536-5546 - Sébastien Herbreteau, Michael Unser:

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising. 10496-10506 - Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim, Sanghyuk Chun, Taesup Moon:

An Efficient Post-Hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval. 3895-3904 - Yan Zhang, Yao Feng, Alpár Cseke, Nitin Saini, Nathan Bajandas, Nicolas Heron, Michael J. Black:

Primal: Physically Reactive and Interactive Motor Model for Avatar Learning. 12725-12736 - Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Eduard Zamfir, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte:

XTrack: Multimodal Training Boosts RGB-X Video Object Trackers. 5734-5744 - Fangyikang Wang, Hubery Yin, Lei Qian, Yinan Li, Shaobin Zhuang, Huminhao Zhu, Yilin Zhang, Yanlong Tang, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li:

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin. 10453-10464 - Yinuo Zhao, Jiale Yuan, Zhiyuan Xu, Xiaoshuai Hao, Xinyi Zhang, Kun Wu, Zhengping Che, Chi Harold Liu, Jian Tang:

Training-Free Generation of Temporally Consistent Rewards from VLMs. 8133-8143 - Congyi Fan, Jian Guan, Xuanjia Zhao, Dongli Xu, Youtian Lin, Tong Ye, Pengming Feng, Haiwei Pan:

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation. 13193-13202 - Chiao-An Yang, Kuan-Chuan Peng, Raymond A. Yeh:

Toward Long-Tailed Online Anomaly Detection Through Class-Agnostic Concepts. 23419-23430 - Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan:

Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis. 17140-17149 - Haochen Wang, Qirui Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Weidi Xie, Stratis Gavves:

Object-Centric Video Question Answering with Visual Grounding and Referring. 22274-22284 - Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary, Hervé Turlier:

Inverse 3D Microscopy Rendering for Cell Shape Inference With Active Mesh. 26987-26998 - Wei Xu, Kangjie Chen, Jiawei Qiu, Yuyang Zhang, Run Wang, Jin Mao, Tianwei Zhang, Lina Wang:

Automated Red Teaming for Text-to-Image Models Through Feedback-Guided Prompt Iteration with Vision-Language Models. 18575-18584 - Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue, Kota Yamaguchi:

LayerD: Decomposing Raster Graphic Designs into Layers. 17783-17792 - Zhewei Dai, Shilei Zeng, Haotian Liu, Xurui Li, Feng Xue, Yu Zhou:

SeaS: Few-Shot Industrial Anomaly Image Generation with Separation and Sharing Fine-Tuning. 23135-23144 - Kiseong Hong, Gyeong-Hyeon Kim, Eunwoo Kim:

RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning. 1130-1140 - Chengyao Qian, Trung Le, Mehrtash Harandi:

A Good Teacher Adapts Their Knowledge for Distillation. 1239-1248 - Wenhao Xu, Wenming Weng, Yueyi Zhang, Ruikang Xu, Zhiwei Xiong:

Event-Boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction. 28334-28343 - Jiaqi Xu, Wenbo Li, Haoze Sun, Fan Li, Zhixin Wang, Long Peng, Jingjing Ren, Haoran Yang, Xiaowei Hu, Renjing Pei, Pheng-Ann Heng:

Fast Image Super-Resolution via Consistency Rectified Flow. 11755-11765 - Xiaolin Liu, Tianyi Zhou, Hongbo Kang, Jian Ma, Ziwen Wang, Jing Huang, Wenguo Weng, Yu-Kun Lai, Kun Li:

RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters. 1-10 - Qiangqiang Wu, Yi Yu, Chenqi Kong, Ziquan Liu, Jia Wan, Haoliang Li, Alex C. Kot, Antoni B. Chan:

Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking. 11110-11121 - Fengzhe Zhou, Humphrey Shi:

HyPiDecoder: Hybrid Pixel Decoder for Efficient Segmentation and Detection. 22100-22109 - Markus Knoche, Daan de Geus, Bastian Leibe:

DONUT: A Decoder-Only Model for Trajectory Prediction. 28903-28912 - Junlong Tong, Wei Zhang, Yaohui Jin, Xiaoyu Shen:

Context Guided Transformer Entropy Modeling for Video Compression. 18885-18894 - Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang:

Visual-RFT: Visual Reinforcement Fine-Tuning. 2034-2044 - Lily Goli, Sara Sabour, Mark J. Matthews, Marcus A. Brubaker, Dmitry Lagun, Alec Jacobson, David J. Fleet, Saurabh Saxena, Andrea Tagliasacchi:

RoMo: Robust Motion Segmentation Improves Structure from Motion. 6155-6164 - Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, Djamila Aouada:

CAD-Recode: Reverse Engineering CAD Code From Point Clouds. 9801-9811 - Adeela Islam, Stefano Fiorini, Stuart James, Pietro Morerio, Alessio Del Bue:

ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction. 9048-9057 - Jiahao Ma, Tianyu Wang, Miaomiao Liu, David Ahmedt-Aristizabal, Chuong Nguyen:

DCHM: Depth-Consistent Human Modeling for Multiview Detection. 7731-7740 - Jiazheng Liu, Zejin Wang, Bohao Chen, Hua Han:

Blind2Sound: Self-Supervised Image Denoising Without Residual Noise. 12937-12946 - Huaqiu Li, Yong Wang, Tongwen Huang, Hailang Huang, Haoqian Wang, Xiangxiang Chu:

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling. 13684-13694 - Ziqiang Zheng, Yuk-Kwan Wong, Binh-Son Hua, Jianbo Shi, Sai-Kit Yeung:

CoraLSRT: Revisiting Coral Reef Semantic Segmentation by Feature Rectification via Self-Supervised Guidance. 19967-19977 - Juntao Wu, Xianting Huang, Yu Chen, Shuai Pang, Ke Wang:

Scaling and Taming Adversarial Training with Synthetic Data. 2951-2960 - Fengrui Tian, Tianjiao Ding, Jinqi Luo, Hancheng Min, René Vidal:

Voyaging into Perpetual Dynamic Scenes from a Single View. 7698-7708 - Sung Ju Lee, Nam Ik Cho:

Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity. 18759-18769 - Yuting He, Shuo Li:

Vector Contrastive Learning for Pixel-Wise Pretraining in Medical Vision. 1-11 - Tianming Liang, Kun-Yu Lin, Chaolei Tan, Jianguo Zhang, Wei-Shi Zheng, Jian-Fang Hu:

ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations. 20009-20019 - Xi Yu, Xiang Gu, Zhihao Shi, Jian Sun:

Wasserstein Style Distribution Analysis and Transform for Stylized Image Generation. 17496-17505 - Rongyao Fang, Chengqi Duan, Kun Wang, Hao Li, Linjiang Huang, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Hongsheng Li, Xihui Liu:

PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation. 15447-15457 - Chen Liang, Wenguan Wang, Yi Yang:

Towards Human-Like Virtual Beings: Simulating Human Behavior in 3D Scenes. 10753-10763 - Yuting Liu, Liu Yang, Yu Wang:

Long-Tailed Classification with Multi-Granularity Semantics. 4285-4294 - Wenhan Wu, Zhishuai Guo, Chen Chen, Hongfei Xue, Aidong Lu:

Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-Based Action Recognition. 11122-11131 - Tianli Liao, Chenyang Zhao, Lei Li, Heling Cao:

Leveraging Local Patch Alignment to Seam-Cutting for Large Parallax Image Stitching. 27262-27271 - Ta Duc Huy, Duy Anh Huynh, Yutong Xie, Yuankai Qi, Qi Chen, Phi Le Nguyen, Sen Kim Tran, Son Lam Phung, Anton van den Hengel, Zhibin Liao, Minh-Son To, Johan W. Verjans, Vu Minh Hieu Phan:

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding. 24445-24455 - Wenqi Ouyang, Zeqi Xiao, Danni Yang, Yifan Zhou, Shuai Yang, Lei Yang, Jianlou Si, Xingang Pan:

TokensGen: Harnessing Condensed Tokens for Long Video Generation. 18197-18206 - Jianfeng Dong, Danfeng Luo, Daizong Liu, Jie Sun, Xiaoye Qu, Xun Yang, Dongsheng Liu, Xun Wang:

LLM-Assisted Entropy-Based Adaptive Distillation for Unsupervised Fine-Grained Visual Representation Learning. 383-392 - Jonathan Roberts, Kai Han, Samuel Albanie:

GRAB: A Challenging Graph Analysis Benchmark for Large Multimodal Models. 1644-1654 - Junwei Luo, Yingying Zhang, Xue Yang, Kang Wu, Qi Zhu, Lei Liang, Jingdong Chen, Yansheng Li:

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning. 9206-9217 - Zeyu Xi, Haoying Sun, Yaofei Wu, Junchi Yan, Haoran Zhang, Lifang Wu, Liang Wang, Changwen Chen:

Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning. 24330-24339 - Pawel Skiers, Kamil Deja:

Joint Diffusion Models in Continual Learning. 4380-4390 - Zerui Gong, Zhonghua Wu, Qingyi Tao, Qinyue Li, Chen Change Loy:

SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer. 18294-18303 - Zhongwei Qiu, Hanqing Chao, Tiancheng Lin, Wanxing Chang, Zijiang Yang, Wenpei Jiao, Yixuan Shen, Yunshuo Zhang, Yelin Yang, Wenbin Liu, Hui Jiang, Yun Bian, Ke Yan, Dakai Jin, Le Lu:

Bridging Local Inductive Bias and Long-Range Dependencies With Pixel-Mamba for End-To-End Whole Slide Image Analysis. 22738-22747 - Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini, Avani Gupta, P. J. Narayanan:

Prototype Guided Backdoor Defense via Activation Space Manipulation. 2195-2205 - Shengqi Dang, Yi He, Long Ling, Ziqing Qian, Nanxuan Zhao, Nan Cao:

EmotiCrafter: Text-to-Emotional-Image Generation Based on Valence-Arousal Model. 15218-15228 - Yuran Wang, Yingping Liang, Yutao Hu, Ying Fu:

RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather. 25134-25144 - Rui Wang, Yimu Sun, Jingxing Guo, Huisi Wu, Jing Qin:

GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule. 12191-12200 - Linwei Chen, Lin Gu, Ying Fu:

Frequency-Dynamic Attention Modulation for Dense Prediction. 22620-22632 - Maksim Siniukov, Di Chang, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani:

Ditailistener: Controllable High Fidelity Listener Video Generation with Diffusion. 11991-12001 - Elena Belén Bueno-Benito

, Mariella Dimiccoli:
CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation. 10719-10729 - Jongseo Lee, Kyungho Bae, Kyle Min, Gyeong-Moon Park, Jinwoo Choi:

ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning. 17546-17556 - Lojze Zust, Yohann Cabon, Juliette Marrie, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud, Gabriela Csurka:

PanSt3R: Multi-View Consistent Panoptic Segmentation. 5856-5886 - Haoji Zhang, Yiqin Wang, Yansong Tang, Yong Liu, Jiashi Feng, Xiaojie Jin:

Flash-Vstream: Efficient Real-Time Understanding for Long Video Streams. 21059-21069 - Timo Teufel, Pulkit Gera, Xilong Zhou, Umar Iqbal, Pramod Rao, Jan Kautz, Vladislav Golyanik, Christian Theobalt:

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis. 29131-29141 - Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogério Feris, Yunhui Guo:

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP. 1569-1579 - Xiangxiang Chu, Renda Li, Yong Wang:

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding. 18475-18486 - Kefan Chen, Sreyas Mohan, Justin Theiss, Sergiu Oprea, Srinath Sridhar, Aayush Prakash:

InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians. 10410-10420 - Xinyao Liu, Diping Song:

Constructing Ophthalmic MLLM for Positioning-Diagnosis Collaboration Through Clinical Cognitive Chain Reasoning. 21547-21556 - Feifei Zhang, Zhihao Wang, Xi Zhang, Changsheng Xu:

Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering. 4413-4423 - Junhao Zheng, Jiahao Sun, Chenhao Lin, Zhengyu Zhao, Chen Ma, Chong Zhang, Chao Shen, Cong Wang, Qian Wang:

Revisiting Adversarial Patch Defenses on Object Detectors: Unified Evaluation, Large-Scale Dataset, and New Insights. 23476-23486 - Chengbo Wang, Guozheng Ma, Yifei Xue, Yizhen Lao:

Faster and Better 3D Splatting via Group Training. 27968-27977 - Bingchao Wang, Zhiwei Ning, Jianyu Ding, Xuanang Gao, Yin Li, Dongsheng Jiang, Jie Yang, Wei Liu:

FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text. 20694-20704 - Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech, Xi Wang, Luc Van Gool, Danda Pani Paudel:

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description. 5633-5644 - Haonan Wang, Qixiang Zhang, Lehan Wang, Xuanqi Huang, Xiaomeng Li:

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction. 18367-18376 - Cihang Peng, Qiming Hou, Zhong Ren, Kun Zhou:

ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation. 20204-20214 - Zitian Wang, Yue Liao, Kang Rong, Fengyun Rao, Yibo Yang, Si Liu:

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs. 2010-2021 - Jaeseok Jeong, Junho Kim, Gayoung Lee, Yunjey Choi, Youngjung Uh:

Stylekeeper: Prevent Content Leakage using Negative Visual Query Guidance. 15760-15769 - Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma:

Dataset Distillation as Data Compression: A Rate-Utility Perspective. 1-11 - Zifu Wan, Ce Zhang, Silong Yong, Martin Q. Ma, Simon Stepputtis, Louis-Philippe Morency, Deva Ramanan, Katia Sycara, Yaqi Xie:

ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models. 3225-3234 - Quanfeng Lu, Wenqi Shao, Zitao Liu, Lingxiao Du, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Ping Luo:

GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices. 22404-22414 - Emery Pierson, Lei Li, Angela Dai, Maks Ovsjanikov:

DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-Rigid Shape Matching. 5745-5756 - Qifan Yu, Zhebei Shen, Zhongqi Yue, Yang Wu, Bosheng Qin, Wenqiao Zhang, Yunfei Li, Juncheng Li, Siliang Tang, Yueting Zhuang:

Mastering Collaborative Multi-Modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness. 155-165 - Qiaole Dong, Yanwei Fu:

Online Dense Point Tracking with Streaming Memory. 8710-8720 - Yanwen Fang, Wenqi Jia, Xu Cao, Peng-Tao Jiang, Guodong Li, Jintai Chen:

Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction. 13912-13921 - Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos, Panagiotis C. Petrantonakis:

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting. 1-12 - Byeongjun Kwon, Munchurl Kim:

One Look is Enough: Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation on High-Resolution Images. 8077-8087 - Chenyu Mu, Yijun Qu, Jiexi Yan, Erkun Yang, Cheng Deng:

Meta-Learning Dynamic Center Distance: Hard Sample Mining for Learning with Noisy Labels. 415-425 - Stefan Stojanov, Linan Zhao, Yunzhi Zhang, Daniel L. K. Yamins, Jiajun Wu:

Weakly-Supervised Learning of Dense Functional Correspondences. 1-13 - Changha Shin, Woong Oh Cho, Seon Joo Kim:

Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images. 1-10 - Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra:

DiMPLE - Disentangled Multi-Modal Prompt Learning: Enhancing Out-of-Distribution Alignment with Invariant and Spurious Feature Separation. 1634-1643 - I-Hsiang Chen, Hua-En Chang, Wei-Ting Chen, Jenq-Neng Hwang, Sy-Yen Kuo:

Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation. 21755-21765 - Kunyang Li, Jean-Charles Noirot Ferrand, Ryan Sheatsley, Blaine Hoak, Yohan Beugin, Eric Pauley, Patrick D. McDaniel:

On the Robustness Tradeoff in Fine-Tuning. 4898-4907 - Yufei Han, Bowen Tie, Heng Guo, Youwei Lyu, Si Li, Boxin Shi, Yunpeng Jia, Zhanyu Ma:

PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction. 28073-28082 - Seungjin Jung, Kanghee Lee, Yonghyun Jeong, Haeun Noh, Jungmin Lee, Jongwon Choi:

Group-Wise Scaling and Orthogonal Decomposition for Domain-Invariant Feature Extraction in Face Anti-Spoofing. 13372-13381 - Jiahao Zhu, Zixuan Chen, Guangcong Wang, Xiaohua Xie, Yi Zhou:

SegmentDreamer: Towards High-Fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation. 15864-15874 - Sunung Mun, Jinhwan Nam, Sunghyun Cho, Jungseul Ok:

Addressing Text Embedding Leakage in Diffusion-Based Image Editing. 16451-16460 - Kyusu Ahn, JiSoo Kim, Sangik Lee, HyunGyu Lee, Byeonghyun Ko, Chanwoo Park, Jaejin Lee:

UDC-VIT: A Real-World Video Dataset for Under-Display Cameras. 10950-10960 - Tianyi Wang, Shuaicheng Niu, Harry Cheng, Xiao Zhang, Yinglong Wang:

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping. 9945-9954 - Yifan Liu, Shengjun Zhang, Chensheng Dai, Yang Chen, Hao Liu, Chen Li, Yueqi Duan:

Learning Efficient and Generalizable Human Representation with Human Gaussian Model. 11797-11806 - Mahir Atmis, Levent Karacan, Mehmet Sarigül:

One-Step Specular Highlight Removal with Adapted Diffusion Models. 16313-16322 - Jiwen Yu, Yiran Qin, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu:

GameFactorly: Creating New Games with Generative Interactive Videos. 11590-11599 - Haitao Tian:

DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-Based Human Action Segmentation. 13772-13782 - Adrian Chow, Evelien Riddell, Yimu Wang, Sean Sedwards, Krzysztof Czarnecki:

OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection. 7990-8000 - Jieyi Tan, Chengwei Zhang, Bo Dang, Yansheng Li:

Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-Guidance Learning. 1804-1814 - Azim Ospanov, Mohammad Jalali, Farzan Farnia:

Scendi Score: Prompt-Aware Diversity Evaluation Via Schur Complement of Clip Embeddings. 16927-16937 - Hongyang Sun, Qinglin Yang, Jiawei Wang, Zhen Xu, Chen Liu, Yida Wang, Kun Zhan, Hujun Bao, Xiaowei Zhou, Sida Peng:

Hierarchy UGP: Hierarchy Unified Gaussian Primitive for Large-Scale Dynamic Scene Reconstruction. 26252-26262 - Carlos Esteves, Mohammed Suhail, Ameesh Makadia:

Spectral Image Tokenizer. 17181-17190 - Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, Yuki Mitsufuji:

TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models. 1-10 - Yeon-Ji Song, Jaein Kim, Suhyung Choi, Jin-Hwa Kim, Byoung-Tak Zhang:

Ock: Unsupervised Dynamic Video Prediction With Object-Centric Kinematics. 11359-11368 - Li Hu, Guangyuan Wang, Zhen Shen, Xin Gao, Dechao Meng, Lian Zhuo, Peng Zhang, Bang Zhang, Liefeng Bo:

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance. 10207-10217 - Nguyen Cong Dat, Bao Hieu Tran, Tung Hoang-Thanh:

Guiding Noisy Label Conditional Diffusion Models with Score-Based Discriminator Correction. 18531-18541 - Ruiqi Du, Xu Tang, Xiangrong Zhang, Jingjing Ma:

Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification. 3757-3766 - Haochen Zhao, Jianwei Niu, Xuefeng Liu, Xiaozheng Xie, Li Kuang, Haotian Yang, Bin Dai, Hui Meng, Yong Wang:

Keep Your Friends Close, and Your Enemies Farther: Distance-Aware Voxel-Wise Contrastive Learning for Semi-Supervised Multi-Organ Segmentation. 21832-21842 - Wenting Luan, Siqi Lu, Yongbin Zheng, Wanying Xu, Lang Nie, Zongtan Zhou, Kang Liao:

Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling. 25529-25538 - Jiaqi Wu, Simin Chen, Jing Tang, Yuzhe Yang, Yiming Chen, Lixu Wang, Song Lin, Zehua Wang, Wei Chen, Zijian Tian:

FDPT: Federated Discrete Prompt Tuning for Black-Box Visual-Language Models. 1-10 - Yan Xia, Yunxiang Lu, Rui Song, Oussema Dhaouadi, João F. Henriques, Daniel Cremers:

Trafficloc: Localizing Traffic Surveillance Cameras in 3D Scenes. 28685-28695 - Joonmyung Choi, Sanghyeok Lee, Byungoh Ko, Eunseo Kim, Jihyung Kil, Hyunwoo J. Kim:

Representation Shift: Unifying Token Compression with Flashattention. 20456-20466 - Shicai Wei, Chunbo Luo, Yang Luo:

Boosting Multimodal Learning via Disentangled Gradient Learning. 1-10 - Jingjing Ren, Wenbo Li, Zhongdao Wang, Haoze Sun, Bangzhen Liu, Haoyu Chen, Jiaqi Xu, Aoxue Li, Shifeng Zhang, Bin Shao, Yong Guo, Lei Zhu:

Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis. 18155-18165 - Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas-Velasquez, Soubhik Sanyal, Michael J. Black, Silvia Zuffi, Peter Kulits:

Generative Zoo. 8492-8502 - Yufei Wang, Lanqing Guo, Zhihao Li, Jiaxing Huang, Pichao Wang, Bihan Wen, Jian Wang:

Training-Free Text-Guided Image Editing with Visual Autoregressive Model. 17577-17586 - Yukai Shi, Jiarong Ou, Rui Chen, Haotian Yang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Kun Gai:

Imbalance in Balance: Online Concept Balancing in Generation Models. 17432-17442 - Qi Xun Yeo, Yanyan Li, Gim Hee Lee:

Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images. 24999-25008 - Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan:

Robin3D Improving 3D Large Language Model via Robust Instruction Tuning. 3905-3915 - Xingyu Miao, Haoran Duan, Quanhao Qian, Jiuniu Wang, Yang Long, Ling Shao, Deli Zhao, Ran Xu, Gongjie Zhang:

Towards Scalable Spatial Intelligence Via 2D-To-3D Data Lifting. 945-959 - Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun:

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers. 329-339 - Zuhao Yang, Jiahui Zhang, Yingchen Yu, Shijian Lu, Song Bai:

Versatile Transition Generation with Image-to-Video Diffusion. 16981-16990 - Christian Löwens, Thorben Funke, Jingchao Xie, Alexandru Paul Condurache:

PseudoMapTrainer: Learning Online Mapping without HD Maps. 5263-5272 - Zhile Chen, Hui Ji:

Robust Unfolding Network for HDR Imaging with Modulo Cameras. 25218-25228 - Qiyu Xu, Zhanxuan Hu, Yu Duan, Ercheng Pei, Yonghang Tai:

A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention. 405-414 - Xiaobao Wei, Qingpo Wuwu, Zhongyu Zhao, Zhuangzhe Wu, Nan Huang, Ming Lu, Ningning Ma, Shanghang Zhang:

EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting. 28462-28472 - Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan:

External Knowledge Injection for CLIP-Based Class-Incremental Learning. 3314-3325 - Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang:

Efficient Input-Level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation. 15182-15193 - Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti, Chao Xu, Boxin Shi:

Event-Based Visual Vibrometry. 24666-24676 - Zihan Ding, Chi Jin, Difan Liu, Haitian Zheng, Krishna Kumar Singh, Qiang Zhang, Yan Kang, Zhe Lin, Yuchen Liu:

DOLLAR: Few-Step Video Generation Via Distillation and Latent Reward Optimization. 17961-17971 - Yang Liu, Xudong Xie, Yuliang Liu, Xiang Bai:

Multi-Scenario Overlapping Text Segmentation with Depth Awareness. 17454-17463 - Hao Tang, Zhiqing Guo, Liejun Wang, Chao Liu:

Similarity Memory Prior is All You Need for Medical Image Segmentation. 23009-23018 - Hayeon Kim, Ji Ha Jang, Se Young Chun:

Robust 3D-Masked Part-Level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling. 5501-5510 - Zichun Su, Zhi Lu

, Yutong Wu, Renfei Shen, Songfeng Lu:
FLSeg: Enhancing Privacy and Robustness in Federated Learning under Heterogeneous Data via Model Segmentation. 3916-3925 - Junjie Shan, Ziqi Zhao, Jialin Lu, Rui Zhang, Siu Ming Yiu, Ka-Ho Chow:

Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning. 1-10 - Lu Liu, Huiyu Duan, Qiang Hu, Liu Yang, Chunlei Cai, Tianxiao Ye, Huayu Liu, Xiaoyun Zhang, Guangtao Zhai:

F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration. 10982-10994 - Takahiro Kushida, Kenichiro Tanaka:

Thermal Polarimetric Multi-View Stereo. 27390-27399 - Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zhili Feng, Zenghui Ding, Yining Sun:

RankCLIP: Ranking-Consistent Language-Image Pretraining. 3874-3884 - Sicheng Zhang, Binzhu Xie, Zhonghao Yan, Yuli Zhang, Donghao Zhou, Xiaofei Chen, Shi Qiu, Jiaqi Liu, Guoyang Xie, Zhichao Lu:

Trade-Offs in Image Generation: How Do Different Dimensions Interact? 1-12 - Xiaoyu Zhou, Jingqi Wang, Yongtao Wang, Yufei Wei, Nan Dong, Ming-Hsuan Yang:

AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting. 3367-3377 - Zhehui Wu, Yong Chen, Naoto Yokoya, Wei He:

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration. 13009-13020 - Ho Kei Cheng, Alexander Gerhard Schwing:

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation. 15875-15884 - Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Enci Liu, Zhuoyi Shang, Xiangyang Ji, Wu Liu:

PlugMark: A Plug-In Zero-Watermarking Framework for Diffusion Models. 17335-17345 - Seunghyun Shin, Dongmin Shin, Jisu Shin, Hae-Gon Jeon, Joon-Young Lee:

Video Color Grading via Look-Up Table Generation. 19141-19152 - Hang Yang, Le Hui, Jianjun Qian, Jin Xie, Jian Yang:

GSRecon: Efficient Generalizable Gaussian Splatting for Surface Reconstruction from Sparse Views. 25346-25356 - Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, SevenShu, Yunsheng Wu, Yongge Liu, Rongrong Ji:

OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography. 19893-19902 - Johannes Künzel, Anna Hilsmann, Peter Eisert:

RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction. 4868-4877 - Hongxin Li, Jingran Su, Jingfan Chen, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang:

UIPro: Unleashing Superior Interaction Capability for GUI Agents. 1613-1623 - Rishubh Parihar, Sachidanand VS, R. Venkatesh Babu:

Zero-Shot Depth Aware Image Editing With Diffusion Models. 15748-15759 - Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang:

GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability. 614-623 - Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo, Chen Sun:

How Can Objects Help Video-Language Understanding? 21994-22003 - Seunggwan Lee, Hwanhee Jung, Byoungsoo Koh, Qixing Huang, Sang Ho Yoon, Sangpil Kim:

PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior. 18585-18595 - Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, Nima Khademi Kalantari:

RI3D: Few-Shot Gaussian Splatting with Repair and Inpainting Diffusion Priors. 25094-25103 - Ruiyang Zhang, Hu Zhang, Zhedong Zheng:

Harnessing Uncertainty-Aware Bounding Boxes for Unsupervised 3D Object Detection. 9230-9240 - Shenyu Lu, Zhaoying Pan, Xiaoqian Wang:

Think Twice: Test-Time Reasoning for Robust CLIP Zero-Shot Classification. 2919-2929 - Taowen Wang, Cheng Han, James Liang, Wenhao Yang, Dongfang Liu, Luna Xinyu Zhang, Qifan Wang, Jiebo Luo, Ruixiang Tang:

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics. 6948-6958 - Zhongyu Yang, Jun Chen, Dannong Xu, Junjie Fei, Xiaoqian Shen, Liangbing Zhao, Chun-Mei Feng, Mohamed Elhoseiny:

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation. 15532-15541 - Chuxin Wang, Yixin Zha, Wenfei Yang, Tianzhu Zhang:

StruMamba3D: Exploring Structural Mamba for Self-Supervised Point Cloud Representation Learning. 28546-28555 - Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou:

MAVias: Mitigate any Visual Bias. 1271-1281 - Jinjing Zhu, Tianbo Pan, Zidong Cao, Yexin Liu, James T. Kwok, Hui Xiong:

Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation. 5146-5155 - Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi, Guangming Lu, Daojing He, Wenjie Pei, Li Jiang:

Enhancing Spatial Reasoning in Multimodal Large Language Models Through Reasoning-Based Segmentation. 7851-7860 - Tiankai Chen, Yushu Li, Adam Goodge, Fei Teng, Xulei Yang, Tianrui Li, Xun Xu:

Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation. 28797-28807 - Yuxuan Yuan, Luyao Tang, Yixin Chen, Chaoqi Chen, Yue Huang, Xinghao Ding:

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching. 20911-20921 - Juncan Deng, Shuaiting Li, Zeyu Wang, Kedong Xu, Hong Gu, Kejie Huang:

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba. 24518-24527 - Yuanlin Wang, Ruiqin Xiong, Rui Zhao, Jin Wang, Xiaopeng Fan, Tiejun Huang:

ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning. 11547-11557 - Zijun Zhou, Yingying Deng, Xiangyu He, Weiming Dong, Fan Tang:

Multi-Turn Consistent Image Editing. 15792-15801 - Mengxue Qu, Yibo Hu, Kunyang Han, Yunchao Wei, Yao Zhao:

ReCot: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models. 9147-9157 - Peng Liao, Xilu Wang, Yaochu Jin, Wenli Du, Han Hu:

Neural Architecture Search Driven by Locally Guided Diffusion for Personalized Federated Learning. 4222-4231 - Rongpei Hong, Jian Lang, Ting Zhong, Fan Zhou:

Borrowing Eyes for the Blind Spot: Overcoming Data Scarcity in Malicious Video Detection Via Cross-Domain Retrieval Augmentation. 22728-22737 - Jiale Cheng, Ruiliang Lyu, Xiaotao Gu, Xiao Liu, Jiazheng Xu, Yida Lu, Jiayan Teng, Zhuoyi Yang, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang:

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization. 15636-15645 - Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei:

Aligning Global Semantics and Local Textures in Generative Video Enhancement. 17087-17096 - Yu Wei, Jiahui Zhang, Xiaoqin Zhang, Ling Shao, Shijian Lu:

PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations. 26499-26508 - Junjia Huang, Pengxiang Yan, Jiyang Liu, Jie Wu, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li:

DreamFuse: Adaptive Image Fusion with Diffusion Transformer. 1-10 - Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse, Yicong Li, Arjun R. Akula, Angela Yao:

Visual Intention Grounding for Egocentric Assistants. 2512-2522 - Xiaoyang Hao, Han Li:

PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation. 8110-8119 - Aniket Roy, Shubhankar Borse, Shreya Kadambi, Debasmit Das, Shweta Mahajan, Risheek Garrepalli, Hyojin Park, Ankita Nayak, Rama Chellappa, Munawar Hayat, Fatih Porikli:

DuoLoRA: Cycle-Consistent and Rank-Disentangled Content-Style Personalization. 15395-15404 - Huixin Sun, Yanjing Li, Linlin Yang, Xianbin Cao, Baochang Zhang:

Uncertainty-Aware Gradient Stabilization for Small Object Detection. 8407-8417 - Xin You, Runze Yang, Chuyan Zhang, Zhongliang Jiang, Jie Yang, Nassir Navab:

FB-Diff: Fourier Basis-Guided Diffusion for Temporal Interpolation of 4D Medical Imaging. 28010-28020 - Yusuke Hirota, Ryo Hachiuma, Boyi Li, Ximing Lu, Michael Ross Boone, Boris Ivanovic, Yejin Choi, Marco Pavone, Yu-Chiang Frank Wang, Noa Garcia, Yuta Nakashima, Chao-Han Huck Yang:

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation. 8634-8644 - Junyoung Lim, Jaewoo Ahn, Gunhee Kim:

ChartCap: Mitigating Hallucination of Dense Chart Captioning. 13171-13182 - Tianyu Hong, Xiaobo Zhou, Wenkai Hu, Qi Xie, Zhihui Ke, Tie Qiu:

Communication-Efficient Multi-Vehicle Collaborative Semantic Segmentation via Sparse 3D Gaussian Sharing. 28622-28631 - Yecheng Wu, Junyu Chen, Zhuoyang Zhang, Enze Xie, Jincheng Yu, Junsong Chen, Jinyi Hu, Yao Lu, Song Han, Han Cai:

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer. 18034-18045 - Jiwoo Chung, Sangeek Hyun, Hyunjun Kim, Eunseo Koh, MinKyu Lee, Jae-Pil Heo:

Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation. 19174-19184 - Leonard Bruns, Axel Barroso-Laguna, Tommaso Cavallari, Áron Monszpart, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann:

ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training. 26751-26761 - Jincheng Li, Chunyu Xie, Ji Ao, Dawei Leng, Yuhui Yin:

LMM-Det: Make Large Multimodal Models Excel in Object Detection. 308-318 - Hongyu Shen, Junfeng Ni, Yixin Chen, Weishuo Li, Mingtao Pei, Siyuan Huang:

Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing. 6656-6666 - Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, Enze Xie:

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation. 16185-16195 - Jimin Dai, Jiexi Yan, Jian Yang, Lei Luo:

Straighten Viscous Rectified Flow via Noise Optimization. 15005-15014 - Zihui Gao, Jia-Wang Bian, Guosheng Lin, Hao Chen, Chunhua Shen:

SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting. 1-10 - Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Kewei Li, Jun Du, Lei Sun, Jianqing Gao, Ruoyu Wang, Jiefeng Ma:

Latent Swap Joint Diffusion for 2D Long-Form Latent Generation. 11006-11015 - Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu:

From One to More: Contextual Part Latents for 3D Generation. 8230-8240 - Xuying Zhang, Yupeng Zhou, Kai Wang, Yikai Wang, Zhen Li, Shaohui Jiao, Daquan Zhou, Qibin Hou, Ming-Ming Cheng:

AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction. 26273-26283 - Kangan Qian, Jinyu Miao, Xinyu Jiao, Ziang Luo, Zheng Fu, Yining Shi, Yunlong Wang, Kun Jiang, Diange Yang:

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors. 27284-27294 - Xin Wang, Xinlin Wang, Shuiping Gou:

TopicGeo: An Efficient Unified Framework for Geolocation. 8241-8251 - Mengwei Xie, Shuang Zeng, Xinyuan Chang, Xinran Liu, Zheng Pan, Mu Xu, Xing Wei:

SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions. 27166-27175 - Zexi Jia, Chuanwei Huang, Yeshuang Zhu, Hongyan Fei, Ying Deng, Zhiqiang Yuan, Jiapei Zhang, Jinchao Zhang, Jie Zhou:

From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection. 18980-18989 - Guangben Lu, Yuzhen Du, Yizhe Tang, Zhimin Sun, Ran Yi, Yifan Qi, Tianyi Wang, Lizhuang Ma, Fangyuan Zou:

Pinco: Position-Induced Consistent Adapter for Diffusion Transformer in Foreground-Conditioned Inpainting. 15266-15276 - Li Yi, Jie Hu, Songan Zhang, Guannan Jiang:

Adapt Foundational Segmentation Models with Heterogeneous Searching Space. 23364-23373 - Boyang Deng, Songyou Peng, Kyle Genova, Gordon Wetzstein, Noah Snavely, Leonidas J. Guibas, Thomas A. Funkhouser:

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images. 12769-12778 - Tao Wang, Changxu Cheng, Lingfeng Wang, Senda Chen, Wuyue Zhao:

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model. 23267-23278 - Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang:

Long-Context State-Space Video World Models. 8733-8744 - Gwanghyun Kim, Suh Yoon Jeon, Seunggyu Lee, Se Young Chun:

PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion. 12034-12044 - Ziyun Wang, Ruijun Zhang, Zi-Yan Liu, Yufu Wang, Kostas Daniilidis:

Continuous-Time Human Motion Field from Event Cameras. 11502-11512 - Chengkai Hou, Yanjie Ze, Yankai Fu, Zeyu Gao, Songbo Hu, Yue Yu, Shanghang Zhang, Huazhe Xu:

4D Visual Pre-Training for Robot Learning. 8451-8461 - Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji:

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models. 16333-16344 - Taewoo Kim, Kuk-Jin Yoon:

Event-guided Unified Framework for Low-light Video Enhancement, Frame Interpolation, and Deblurring. 8524-8534 - Kecheng Chen, Xinyu Luo, Tiexin Qin, Jie Liu, Hui Liu, Victor Ho-fun Lee, Hong Yan, Haoliang Li:

Test-time Adaptation for Foundation Medical Segmentation Model without Parametric Updates. 20075-20084 - Tuo Feng, Wenguan Wang, Yi Yang:

Gaussian-Based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction. 25239-25249 - Tianhang Cheng, Albert J. Zhai, Evan Z. Chen, Rui Zhou, Yawen Deng, Zitong Li, Kejie Zhao, Janice Shiu, Qianyu Zhao, Yide Xu, Xinlei Wang, Yuan Shen, Sheng Wang, Lisa Ainsworth, Kaiyu Guan, Shenlong Wang:

Demeter: A Parametric Model of Crop Plant Morphology from the Real World. 28740-28751 - Haipeng Li, Tianhao Zhou, Zhanglei Yang, Yi Wu, Yan Chen, Zijing Mao, Shen Cheng, Bing Zeng, Shuaicheng Liu:

Estimating 2D Camera Motion with Hybrid Motion Basis. 7624-7633 - Stuti Pathak, Prashant Kumar, Dheeraj Baiju, Nicholus Mboga, Gunther Steenackers, Rudi Penne:

Revisiting Point Cloud Completion: Are We Ready for the Real-World? 25388-25398 - Xiaoding Yuan, Guofeng Zhang, Prakhar Kaushik, Artur Jesslen, Adam Kortylewski, Alan L. Yuille:

Scaling 3D Compositional Models for Robust Classification and Pose Estimation. 6406-6415 - Andreas Engelhardt, Mark Boss, Vikram Voleti, Chun-Han Yao, Hendrik P. A. Lensch, Varun Jampani:

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation. 28428-28439 - Tianyuan Qu, Longxiang Tang, Bohao Peng, Senqiao Yang, Bei Yu, Jiaya Jia:

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? 20889-20899 - Kaisi Guan, Zhengfeng Lai, Yuchong Sun, Peng Zhang, Wei Liu, Kieran Liu, Meng Cao, Ruihua Song:

ETVA: Evaluation of Text-to-Video Alignment via Fine-Grained Question Generation and Answering. 21299-21309 - Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang, Jianzong Wang:

Federated Domain Generalization with Domain-Specific Soft Prompts Generation. 2366-2375 - Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno, Babak Taati:

LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding. 4828-4837 - Zixiang Ai, Zhenyu Cui, Yuxin Peng, Jiahuan Zhou:

UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis. 27359-27368 - Marvin Burges, Philipe Ambrozio Dias, Carson Woody, Sarah Walters, Dalton D. Lunga:

Active Learning Meets Foundation Models: Fast Remote Sensing Data Annotation for Object Detection. 6058-6068 - Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta:

HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos. 19862-19871 - Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso, Dietmar Saupe:

Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution. 12863-12872 - Mutian Xu, Chongjie Ye, Haolin Liu, Yushuang Wu, Jiahao Chang, Xiaoguang Han:

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion. 2609-2619 - Rakshith Madhavan, Federica Arrigoni:

On the Recovery of Cameras from Fundamental Matrices. 20934-20943 - Qi Li, Runpeng Yu, Xinchao Wang:

Towards Performance Consistency in Multi-Level Model Collaboration. 2567-2576 - Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa, Matthieu Cord, Joshua M. Susskind, Alaaeldin El-Nouby:

Scaling Laws for Native Multimodal Models. 12-23 - Mattia Segù, Marta Tintore Gazulla, Yongqin Xian, Luc Van Gool, Federico Tombari:

MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning. 20726-20736 - Yongkang Zhang, Dongyu She, Zhong Zhou:

Adaptive Prompt Learning via Gaussian Outlier Synthesis for Out-Of-Distribution Detection. 3235-3244 - Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue:

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation. 5912-5922 - Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong, Gyugeun Lee, Jinwoo Ahn, Eun-Sol Kim:

Zero-Shot Compositional Video Learning with Coding Rate Reduction. 20508-20518 - Maoxian Wan, Kaige Li, Qichuan Geng, Weimin Shi, Zhong Zhou:

Incremental Few-Shot Semantic Segmentation Via Multi-Level Switchable Visual Prompts. 24113-24122 - Tongkai Shi, Lianyu Hu, Fanhua Shang, Liqing Gao, Wei Feng:

Greg: GEometry-Aware RegIon Refinement for Sign Language Video Generation. 16472-16481 - Jingxi Liao, Shijie Hao, Richang Hong, Meng Wang:

GT-Mean Loss: A Simple Yet Effective Solution for Brightness Mismatch in Low-Light Image Enhancement. 6112-6121 - Zhaoyang Wu, Fang Liu, Licheng Jiao, Shuo Li, Lingling Li, LiXu Liu, Puhua Chen, Wenping Ma:

Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization. 2325-2335 - Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang:

MotionStreamer: Streaming Motion Generation via Diffusion-Based Autoregressive Model in Causal Latent Space. 10086-10096 - Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin, Andreas Velten, Yin Li, Mohit Gupta:

Robust 3D Object Detection Using Probabilistic Point Clouds From Single-Photon Lidars. 28417-28427 - Chin-Yang Lin, Cheng Sun, Fu-En Yang, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu:

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos. 27412-27422 - Tao Gong, Qi Chu, Bin Liu, Wei Zhou, Nenghai Yu:

FE-CLIP: Frequency Enhanced CLIP Model for Zero-Shot Anomaly Detection and Segmentation. 21220-21230 - Guangzhao He, Yuxi Xiao, Zhen Xu, Xiaowei Zhou, Sida Peng:

ERNet: Efficient Non-Rigid Registration Network for Point Sequences. 27156-27165 - Ciyu Ruan, Ruishan Guo, Zihang Gong, Jingao Xu, Wenhan Yang, Xinlei Chen:

PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining. 9169-9180 - Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, Cordelia Schmid, Tobias Weyand:

Minerva: Evaluating Complex Video Reasoning. 23968-23978 - Zihao Xu, Yuzhi Tang, Bowen Xu, Qingquan Li:

NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion. 12491-12501 - Matteo Poggi, Fabio Tosi:

FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases. 5667-5679 - Zuhao Yang, Yingchen Yu, Yunqing Zhao, Shijian Lu, Song Bai:

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding. 24286-24296 - Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu:

VAGUE: Visual Contexts Clarify Ambiguous Expressions. 1537-1547 - Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu:

DreamCube: RGB-D Panorama Generation via Multi-Plane Synchronization. 24922-24932 - Shaowen Tong, Zimin Xia, Alexandre Alahi, Xuming He, Yujiao Shi:

GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization. 25357-25366 - Fengbo Lan, Chang Wen Chen:

Removing Out-of-Focus Reflective Flares via Color Alignment. 9770-9779 - Yuwen Pan, Rui Sun, Wangkai Li, Tianzhu Zhang:

Exploring Weather-aware Aggregation and Adaptation for Semantic Segmentation under Adverse Conditions. 13952-13962 - Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, Varun Jampani:

SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation. 13248-13258 - Yu-Ju Tsai, Brian L. Price, Qing Liu, Luis Figueroa, Daniil Pakhomov, Zhihong Ding, Scott Cohen, Ming-Hsuan Yang:

CompleteMe: Reference-Based Human Image Completion. 18252-18261 - Mahnoor Fatima Saad, Ziad Al-Halah:

How Would it Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes. 12232-12241 - Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang:

End-to-End Driving with Online Trajectory Evaluation via BEV World Model. 27137-27146 - SaiKiran Kumar Tedla, Junyong Lee, Beixuan Yang, Mahmoud Afifi, Michael S. Brown:

Multispectral Demosaicing via Dual Cameras. 5405-5414 - David G. Shatwell, Ishan Rajendrakumar Dave, Sirnam Swetha, Mubarak Shah:

GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space. 1-11 - Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang:

Learning Few-Step Diffusion Models by Trajectory Distribution Matching. 17719-17728 - Zhuo Li, Mingshuang Luo, Ruibing Hou, Xin Zhao, Hao Liu, Hong Chang, Zimo Liu, Chen Li:

Morph: A Motion-Free Physics Optimization Framework for Human Motion Generation. 14580-14589 - Hao Ban, Gokul Ram Subramani, Kaiyi Ji:

SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation. 785-795 - Achint Soni, Meet Soni, Sirisha Rambhatla:

LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing. 18804-18814 - Shengrong Yuan, Runmin Wang, Ke Hao, Xuqi Ma, Changxin Gao, Li Liu, Nong Sang:

StyleSRN: Scene Text Image Super-Resolution with Text Style Embedding. 18693-18702 - Wajahat Khalid, Bin Liu, Xulin Li, Muhammad Waqas, Muhammad Sher Afgan:

Bridging the Sky and Ground: Towards View-Invariant Feature Learning for Aerial-Ground Person Re-Identification. 9749-9758 - Ran Zhao, Xinxin Dai, Pengpeng Hu, Vasile Palade, Adrian Munteanu:

MeasureXpert: Automatic Anthropometric Measurement Extraction from Two Unregistered, Partial, Posed, and Dressed Body Scans. 9605-9615 - Seokho Han, Seoyeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko:

MSQ: Memory-Efficient Bit Sparsification Quantization. 21885-21894 - Anik Sarker, Alan T. Asbeck:

Correspondence-Free Fast and Robust Spherical Point Pattern Registration. 28156-28166 - Siyu Chen, Ting Han, Changshe Zhang, Xin Luo, Meiliu Wu, Guorong Cai, Jinhe Su:

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation. 8285-8295 - Ruitao Wu, Yifan Zhao, Jia Li:

Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement. 21623-21634 - Francesco Taioli, Edoardo Zorzi, Gianni Franchi, Alberto Castellini, Alessandro Farinelli, Marco Cristani, Yiming Wang:

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues. 18781-18792 - Prafful Kumar Khoba, Zijian Wang, Chetan Arora, Mahsa Baktashmotlagh:

PEFTDiff: Diffusion-Guided Transferability Estimation for Parameter-Efficient Fine-Tuning. 1454-1463 - Mert Sonmezer, Matthew Zheng, Pinar Yanardag:

LoRAverse: A Submodular Framework to Retrieve Diverse Adapters for Diffusion Models. 17879-17888 - Nurbek Tastan, Karthik Nandakumar:

A Framework for Double-Blind Federated Adaptation of Foundation Models. 923-933 - Xiwen Chen, Peijie Qiu, Wenhui Zhu, Hao Wang, Huayu Li, Xuanzhao Dong, Xiaotong Sun, Xiaobing Yu, Yalin Wang, Abolfazl Razi, Aristeidis Sotiras:

Cracking Instance Jigsaw Puzzles: An Alternative to Multiple Instance Learning for Whole Slide Image Analysis. 21353-21363 - Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth, Danh Le Phuoc:

ReasonVQA: A Multi-Hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering. 18793-18803 - Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo, Alexander Gerhard Schwing, Jia-Bin Huang:

Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework. 13481-13490 - Qi Bi, Jingjun Yi, Huimin Huang, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng:

A Simple Yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks. 6748-6760 - Dong Liu, Chunhui Luo, Yuanfei Bao, Gang Yang, Jie Xiao, Xueyang Fu, Zheng-Jun Zha:

Enhanced Pansharpening Via Quaternion Spatial-Spectral Interactions. 10908-10918 - Xianfu Cheng, Wei Zhang, Shiwei Zhang, Jian Yang, Xiangyuan Guan, Xianjie Wu, Xiang Li, Ge Zhang, Jiaheng Liu, Yuying Mai, Yutao Zeng, Zhoufutu Wen, Ke Jin, Baorui Wang, Weixiao Zhou, Yunhong Lu, Hangyuan Ji, Tongliang Li, Wenhao Huang, Zhoujun Li:

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models. 4637-4646 - Pei An, Jiaqi Yang, Muyao Peng, You Yang, Qiong Liu, Xiaolin Wu, Liangliang Nan:

MinCD-PnP: Learning 2D-3D Correspondences with Approximate Blind PnP. 26519-26528 - Changwon Kang, Jisong Kim, Hongjae Shin, Junseo Park, Jun Won Choi:

MAESTRO: Task-Relevant Optimization Via Adaptive Feature Enhancement and Suppression for Multi-Task 3D Perception. 28313-28323 - Boyu Chen, Zhengrong Yue, Siran Chen, Zikang Wang, Yang Liu, Peng Li, Yali Wang:

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents. 20237-20246 - Yufei Cai, Hu Han, Yuxiang Wei, Shiguang Shan, Xilin Chen:

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-To-Video Diffusion Models. 10592-10601 - Parnian Zameni, Yuhan Shen, Ehsan Elhamifar:

MOSCATO: Predicting Multiple Object State Change through Actions. 11600-11611 - Bin Rao, Haicheng Liao, Yanchen Guan, Chengyue Wang, Bonan Wang, Jiaxun Zhang, Zhenning Li:

AMD: Adaptive Momentum and Decoupled Contrastive Learning Framework for Robust Long-Tail Trajectory Prediction. 28849-28858 - Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, Angjoo Kanazawa:

Shape of Motion: 4D Reconstruction From a Single Video. 9660-9672 - Zhi Chen, Zecheng Zhao, Jingcai Guo, Jingjing Li, Zi Huang:

SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning. 3346-3356 - Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai:

GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects. 7841-7850 - Pengjie Zhang, Lin Zhu, Xiao Wang, Lizhi Wang, Hua Huang:

EMatch: A Unified Framework for Event-Based Optical Flow and Stereo Matching. 5845-5855 - Sicong Du, Jiarun Liu, Qifeng Chen, Haoxiang Chen, Tai-Jiang Mu, Sheng Yang:

RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors. 25756-25764 - Shizun Wang, Zhenxiang Jiang, Xingyi Yang, Xinchao Wang:

C4D: 4D Made from 3D Through Dual Correspondences. 7570-7580 - Zhengyao Lv, Tianlin Pan, Chenyang Si, Zhaoxi Chen, Wangmeng Zuo, Ziwei Liu, Kwan-Yee K. Wong:

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers. 5934-5943 - Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Xingyu Liu, Menghao Zhang, Lei Zhang, Jing Wang, Jianxin Liao:

Prior-Aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation. 6476-6487 - Wongyun Yu, Ahyun Seo, Minsu Cho:

Axis-Level Symmetry Detection with Group-Equivariant Representation. 24791-24800 - Keming Wu, Junwen Chen, Zhanhao Liang, Yinuo Wang, Ji Li, Chao Zhang, Bin Wang, Yuhui Yuan:

Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics. 17930-17940 - Seung-Wook Kim, Seongyeol Kim, Jiah Kim, Seowon Ji, Se-Ho Lee:

FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization. 4616-4625 - Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi:

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics. 13405-13415 - Yuxuan Wang, Xuanyu Yi, Haohan Weng, Qingshan Xu, Xiaokang Wei, Xianghui Yang, Chunchao Guo, Long Chen, Hanwang Zhang:

Nautilus: Locality-Aware Autoencoder for Scalable Mesh Generation. 1-10 - Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, Kaixun Jiang, Zhaoyu Chen, Shuyong Gao, Runze Li, Xingdong Sheng, Wei Zhang, Hong Lu, Wenqiang Zhang:

General Compression Framework for Efficient Transformer Object Tracking. 13427-13437 - Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis:

The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models. 2396-2406 - Zefeng Qian, Xincheng Yao, Yifei Huang, Chongyang Zhang, Jiangyong Ying, Hong Sun:

Beyond Label Semantics: Language-Guided Action Anatomy for Few-Shot Action Recognition. 10421-10431 - Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu:

D3QE: Learning Discrete Distribution Discrepancy-Aware Quantization Error for Autoregressive-Generated Image Detection. 16292-16301 - Zipei Ma, Junzhe Jiang, Yurui Chen, Li Zhang:

BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting. 25519-25528 - Eric Slyman, Md. Mehrab Tanjim, Kushal Kafle, Stefan Lee:

Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles. 17224-17234 - Pulkit Kumar, Shuaiyi Huang, Matthew Walmer, Sai Saketh Rambhatla, Abhinav Shrivastava:

Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition. 13544-13556 - Darshan Thaker, Abhishek Goyal, René Vidal:

Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration. 12873-12882 - Xinkuan Qiu, Meina Kan, Yongbin Zhou, Shiguang Shan:

Benchmarking Multimodal Large Language Models Against Image Corruptions. 9014-9023 - Shunjie Yuan, Xinghua Li, Xuelin Cao, Haiyan Zhang, Mengyao Zhu, Robert H. Deng:

SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection. 4029-4038 - Zhixuan Li, Binqian Xu, Xiangbo Shu, Jiachao Zhang, Yazhou Yao, Guo-Sen Xie, Jinhui Tang:

Tensor-Aggregated LoRA in Federated Fine-Tuning. 1058-1067 - Ming Dai, Wenxuan Cheng, Jiedong Zhuang, Jiang-jiang Liu, Hongshen Zhao, Zhenhua Feng, Wankou Yang:

PropVG: End-To-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination. 7058-7068 - Shuting Dong, Mingzhi Chen, Feng Lu, Hao Yu, Guanghao Li, Zhe Wu, Ming Tang, Chun Yuan:

VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition. 7197-7208 - Weixi Zheng, Jingwang Ling, Zhibo Wang, Quan Wang, Feng Xu:

Teeth Reconstruction and Performance Capture Using a Phone Camera. 9998-10008 - Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee:

M2SFormer: Multi-Spectral and Multi-Scale Attention With Edge-Aware Difficulty Guidance for Image Forgery Localization. 15927-15938 - Yang Yang, Zhendong Mao, Hiroaki Santo, Yasuyuki Matsushita, Fumio Okura:

NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement. 28167-28176 - Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio, Marcella Cornia, Giuseppe Boccignone, Rita Cucchiara:

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction. 16206-16216 - Zelong Sun, Dong Jing, Zhiwu Lu:

CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval. 22675-22684 - Jingjing Wang, Qirui Hu, Chong Bao, Yuke Zhu, Hujun Bao, Zhaopeng Cui, Guofeng Zhang:

LightCity: An Urban Dataset for Outdoor Inverse Rendering and Reconstruction Under Multi-Illumination Conditions. 26477-26487 - Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Yang Zhao, Shanchuan Lin, Jiepeng Cen, Zhibei Ma, Alan L. Yuille, Lu Jiang:

VideoAuteur: Towards Long Narrative Video Generation. 19163-19173 - Xianglin Qiu, Xiaoyang Wang, Zhen Zhang, Jimin Xiao:

Bias-Resilient Weakly Supervised Semantic Segmentation Using Normalizing Flows. 21321-21330 - Jonathan Ventura, Viktor Larsson, Fredrik Kahl:

Uncalibrated Structure from Motion on a Sphere. 69-78 - Hyundong Jin, Hyung Jin Chang, Eunwoo Kim:

Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models. 3466-3476 - Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing:

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining. 4647-4658 - Seokjun Choi, Hoon-Gyu Chung, Yujin Jeon, Giljoo Nam, Seung-Hwan Baek:

A Real-World Display Inverse Rendering Dataset. 25272-25283 - Rongtao Xu, Jian Zhang, Minghao Guo, Youpeng Wen, Haoting Yang, Min Lin, Jianzheng Huang, Zhe Li, Kaidong Zhang, Liqiong Wang, Yuxuan Kuang, Meng Cao, Feng Zheng, Xiaodan Liang:

$A_{0}$: An Affordance-Aware Hierarchical Model for General Robotic Manipulation. 13491-13501 - Rohan Sharma, Changyou Chen, Feng-Ju Chang, Seongjun Yun, Xiaohu Xie, Rui Meng, Dehong Xu, Alejandro Mottini, Qingjun Cui:

Multi-Modal Multi-Task Unified Embedding Model (M3T-UEM): A Task-Adaptive Representation Learning Framework. 22783-22793 - Hanwen Jiang, Qixing Huang, Georgios Pavlakos:

Real3D: Towards Scaling Large Reconstruction Models with Real Images. 5821-5833 - Alakh Desai, Nuno Vasconcelos:

Guiding Diffusion Models With Adaptive Negative Sampling Without External Resources. 16122-16131 - Wenwen Yu, Zhibo Yang, Yuliang Liu, Xiang Bai:

DocThinker: Explainable Multimodal Large Language Models with Rule-Based Reinforcement Learning for Document Understanding. 837-847 - Eyad Alshami, Shashank Agnihotri, Bernt Schiele, Margret Keuper:

AIM: Amending Inherent Interpretability via Self-Supervised Masking. 993-1003 - Langyu Wang, Bingke Zhu, Yingying Chen, Yiyuan Zhang, Ming Tang, Jinqiao Wang:

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing. 20637-20646 - Ying Guo, Xi Liu, Cheng Zhen, Pengfei Yan, Xiaoming Wei:

ARIG: Autoregressive Interactive Head Generation for Real-Time Conversations. 12956-12965 - Elisabetta Fedele, Boyang Sun, Leonidas J. Guibas, Marc Pollefeys, Francis Engelmann:

SuperDec: 3D Scene Decomposition with Superquadric Primitives. 24625-24635 - Ke Niu, Haiyang Yu, Mengyang Zhao, Teng Fu, Siyang Yi, Wei Lu, Bin Li, Xuelin Qian, Xiangyang Xue:

ChatReID: Open-Ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models. 24245-24254 - Jungmin Lee, Seonghyuk Hong, Juyong Lee, Jaeyoon Lee, Jongwon Choi:

InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation. 25820-25830 - Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati:

AlignGuard: Scalable Safety Alignment for Text-to-Image Generation. 17024-17034 - Mateusz Michalkiewicz, Sheena Bai, Mahsa Baktashmotlagh, Varun Jampani, Guha Balakrishnan:

Not All Views Are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models. 9113-9123 - Yuqing Wang, Zhijie Lin, Yao Teng, Yuanzhi Zhu, Shuhuai Ren, Jiashi Feng, Xihui Liu:

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. 18596-18605 - Yunchuan Guan, Yu Liu, Ke Zhou, Zhiqi Shen, Jenq-Neng Hwang, Serge J. Belongie, Lei Li:

Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy. 4188-4197 - Yufeng Zhong, Chengjian Feng, Feng Yan, Fanfan Liu, Liming Zheng, Lin Ma:

RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction. 6416-6425 - Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, Yunzhu Li:

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos. 7219-7230 - Yilin Wang, Zunlei Feng, Jiachi Wang, Hengrui Lou, Binjia Zhou, Jie Lei, Mingli Song, Yijun Bei:

Spatial-Temporal Forgery Trace Based Forgery Image Identification. 17067-17076 - Yunsong Zhou, Naisheng Ye, William Ljungbergh, Tianyu Li, Jiazhi Yang, Zetong Yang, Hongzi Zhu, Christoffer Petersson, Hongyang Li:

Decoupled Diffusion Sparks Adaptive Scene Generation. 27760-27770 - Wenjie Xuan, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao:

Rethink Sparse Signals for Pose-Guided Text-to-Image Generation. 15896-15906 - Xiangbin Wei, Yuanfeng Wang, Ao Xu, Lingyu Zhu, Dongyong Sun, Keren Li, Yang Li, Qi Qin:

Noise2Score3D: Tweedie's Approach for Unsupervised Point Cloud Denoising. 25993-26003 - Sijie Li, Chen Chen, Jungong Han:

SimMLM: A Simple Framework for Multi-Modal Learning with Missing Modality. 24068-24077 - Jin-Hee Lee, Jae-Keun Lee, Jeseok Kim, Kwon Soon:

Power of Cooperative Supervision: Multiple Teachers Framework for Advanced 3D Semi-Supervised Object Detection. 6994-7003 - Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem:

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting. 7187-7196 - Tong Wei, Yijun Yang, Junliang Xing, Yuanchun Shi, Zongqing Lu, Deheng Ye:

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-Based VLM Agent Training. 18855-18865 - Jongsuk Kim, Jaeyoung Lee, Gyojin Han, Dong-Jae Lee, Minki Jeong, Junmo Kim:

SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration. 25197-25206 - Ziyu Zhang, Binbin Huang, Hanqing Jiang, Liyang Zhou, Xiaojun Xiang, Shuhan Shen:

Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-Order Geometric Primitives. 28260-28270 - Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath, Anuj Karpatne, Wei-Lun Chao, Cheng Zhang:

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation. 8579-8589 - Haisheng Su, Junjie Zhang, Feixiang Song, Sanping Zhou, Wei Wu, Junchi Yan, Nanning Zheng:

FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers. 28145-28155 - Yuchong Chen, Jian Yu, Shaoyan Gai, Zeyu Cai, Feipeng Da:

High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach. 25670-25679 - Ziwen Chen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Fuxin Li, Zexiang Xu:

Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats. 4349-4359 - Peixi Wu, Bosong Chai, Menghua Zheng, Wei Li, Zhangchi Hu, Jie Chen, Zheyu Zhang, Hebei Li, Xiaoyan Sun:

Efficient Spiking Point Mamba for Point Cloud Analysis. 26393-26403 - Tajamul Ashraf, Janibul Bashir:

TITAN: Query-Token Based Domain Adaptive Adversarial Learning. 250-262 - Haotian Dong, Xin Wang, Di Lin, Yipeng Wu, Qin Chen, Ruonan Liu, Kairui Yang, Ping Li, Qing Guo:

NoiseController: Towards Consistent Multi-View Video Generation via Noise Decomposition and Collaboration. 14443-14452 - Jeongyun Kim, Seunghoon Jeong, Giseop Kim, Myung-Hwan Jeon, Eunji Jun, Ayoung Kim:

2D Gaussian Splatting-Based Sparse-View Transparent Object Depth Reconstruction Via Physics Simulation for Scene Update. 27927-27936 - Leekyeung Han, Hyunji Min, Gyeom Hwangbo, Jonghyun Choi, Paul Hongsuck Seo:

DialNav: Multi-Turn Dialog Navigation with a Remote Guide. 8514-8523 - Yusen Zhang, Wenliang Zheng, Aashrith Madasu, Peng Shi, Ryo Kamoi, Hao Zhou, Zhuoyang Zou, Shu Zhao, Sarkar Snigdha Sarathi Das, Vipul Gupta, Xiaoxin Lu, Nan Zhang, Ranran Haoran Zhang, Avitej Iyer, Renze Lou, Wenpeng Yin, Rui Zhang:

HRScene: How Far are VLMs from Effective High-Resolution Image Understanding? 22922-22933 - Zongyao Xue, Meina Kan, Shiguang Shan, Xilin Chen:

Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental Learning. 3153-3162 - Shengpeng Wang, Yulong Xie, Qing Liao, Wei Wang:

S3E: Self-Supervised State Estimation for Radar-Inertial System. 26686-26695 - Yucheng Suo, Fan Ma, Linchao Zhu, Tianyi Wang, Fengyun Rao, Yi Yang:

From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-Reward Alignment. 23243-23255 - Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey, Peter Wonka:

EditClip: Representation Learning for Image Editing. 15960-15970 - Byungjun Kim, Shunsuke Saito, Giljoo Nam, Tomas Simon, Jason M. Saragih, Hanbyul Joo, Junxuan Li:

HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars. 9966-9976 - Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong:

IRASim: A Fine-Grained World Model for Robot Manipulation. 9834-9844 - Lujun Li, Cheng Lin, Dezhi Li, You-Liang Huang, Wei Li, Tianyu Wu, Jie Zou, Wei Xue, Sirui Han, Yike Guo:

Efficient Fine-Tuning of Large Models Via Nested Low-Rank Adaptation. 22252-22262 - Haifeng Zhong, Fan Tang, Zhuo Chen, Hyung Jin Chang, Yixing Gao:

AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation. 10645-10655 - Chengchang Tian, Jianwei Ma, Yan Huang, Zhanye Chen, Honghao Wei, Hui Zhang, Wei Hong:

DATA: Domain-And-Time Alignment for High-Quality Feature Fusion in Collaborative Perception. 28643-28652 - Xiaoxiao Wang, Chunxiao Li, Peng Sun, Boming Miao, Yunjian Zhang, Yao Zhu:

Towards Annotation-Free Evaluation: KPAScore for Human Keypoint Detection. 8441-8450 - Jingting Li, Yu Qian, Lin Zhao, Su-Jing Wang:

FED-PsyAU: Privacy-Preserving Micro-Expression Recognition Via Psychological AU Coordination and Dynamic Facial Motion Modeling. 14453-14463 - Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Hongxiang Huang, Yue Zhou, Haotian Fu, Bojun Cheng:

ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring. 7462-7471 - Shibo Wang, Haonan He, Maria Parelli, Christoph Gebhardt, Zicong Fan, Jie Song:

MagicHOI: Leveraging 3D Priors for Accurate Hand-Object Reconstruction from Short Monocular Video Clips. 5957-5968 - Samuel Clarke, Suzannah Wistreich, Yanjie Ze, Jiajun Wu:

X-Capture: An Open-Source Portable Device for Multi-Sensory Learning. 6436-6446 - Ying-Tian Liu, Jiajun Li, Yu-Tao Liu, Xin Yu, Yuan-Chen Guo, Yan-Pei Cao, Ding Liang, Ariel Shamir, Song-Hai Zhang:

NeuFrameQ: Neural Frame Fields for Scalable and Generalizable Anisotropic Quadrangulation. 28000-28009 - Tatiana Zemskova, Dmitry A. Yudin:

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding. 8885-8895 - Zhenghong Zhou, Jie An, Jiebo Luo:

Latent-Reframe: Enabling Camera Control for Video Diffusion Models Without Training. 12779-12789 - Liang Xu, Chengqun Yang, Zili Lin, Fei Xu, Yifan Liu, Congsheng Xu, Yiyi Zhang, Jie Qin, Xingdong Sheng, Yunhui Liu, Xin Jin, Yichao Yan, Wenjun Zeng, Xiaokang Yang:

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions. 12535-12548 - Yuxin Cheng, Binxiao Huang, Taiqiang Wu, Wenyong Zhou, Chenchen Ding, Zhengwu Liu, Graziano Chesi, Ngai Wong:

Perspective-Aware 3D Gaussian Inpainting with Multi-View Consistency. 28503-28513 - Nuo Chen, Chao Xiao, Yimian Dai, Shiman He, Miao Li, Wei An:

Event-Based Tiny Object Detection: A Benchmark Dataset and Baseline. 7209-7218 - Hanyu Zhou, Gim Hee Lee:

LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs. 22294-22304 - Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Yang Li, Minghan Qin, Yu Li, Haoqian Wang:

GUAVA: Generalizable Upper Body 3D Gaussian Avatar. 14205-14217 - Moslem Yazdanpanah, Ali Bahri, Mehrdad Noori, Sahar Dastani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Ismail Ben Ayed, Christian Desrosiers:

Purge-Gate: Backpropagation-Free Test-Time Adaptation for Point Clouds Classification via Token Purging. 27640-27649 - Wenyao Zhang, Hongsi Liu, Bohan Li, Jiawei He, Zekun Qi, Yunnan Wang, Shengyang Zhao, Xinqiang Yu, Wenjun Zeng, Xin Jin:

Hybrid-Grained Feature Aggregation with Coarse-to-Fine Language Guidance for Self-Supervised Monocular Depth Estimation. 6678-6692 - Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu:

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models. 3650-3661 - Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang:

ROSS3D: Reconstructive Visual Instruction Tuning With 3D-Awareness. 9275-9286 - Revant Teotia, Candace Ross, Karen Ullrich, Sumit Chopra, Adriana Romero-Soriano, Melissa Hall, Matthew J. Muckley:

DIMCIM: A Quantitative Evaluation Framework for Default-Mode Diversity and Generalization in Text-to-Image Generative Models. 16431-16440 - Guowei Xu, Peng Jin, Ziang Wu, Hao Li, Yibing Song, Lichao Sun, Li Yuan:

LLaVA-CoT: Let Vision Language Models Reason Step-By-Step. 2087-2098 - Kuniaki Saito, Donghyun Kim, Kwanyong Park, Atsushi Hashimoto, Yoshitaka Ushiku:

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning. 19872-19881 - Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou:

SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion. 6726-6737 - Ji Du, Xin Wang, Fangwei Hao, Mingyang Yu, Chunyuan Chen, Jiesheng Wu, Bin Wang, Jing Xu, Ping Li:

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection. 22131-22142 - Yudong Liu, Jingwei Sun, Yueqian Lin, Jianyi Zhang, Jingyang Zhang, Ming Yin, Qinsi Wang, Hai Li, Yiran Chen:

Keyframe-Oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-form Video Processing. 20802-20811 - Zhaoxin Yuan, Shuang Yang, Shiguang Shan, Xilin Chen:

Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information. 3091-3100 - Zeyu Liu, Zanlin Ni, Yeguo Hua, Xin Deng, Xiao Ma, Cheng Zhong, Gao Huang:

CODA: Repurposing Continuous VAEs for Discrete Tokenization. 18906-18916 - Jinglun Li, Kaixun Jiang, Zhaoyu Chen, Bo Li, Yao Tang, Weifeng Ge, Wenqiang Zhang:

Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection. 4496-4506 - Fanjie Kong, Yitong Li, Weihuang Chen, Chen Min, Yizhe Li, Zhiqiang Gao, Haoyang Li, Zhongyu Guo, Hongbin Sun:

VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving. 26966-26976 - Shuchang Ye, Usman Naseem, Mingyuan Meng, Jinman Kim:

Alleviating Textual Reliance in Medical Language-Guided Segmentation via Prototype-Driven Semantic Approximation. 22316-22326 - Beier Zhu, Ruoyu Wang, Tong Zhao, Hanwang Zhang, Chi Zhang:

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models. 19557-19566 - Katja Schwarz, Norman Müller, Peter Kontschieder:

Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors. 27510-27520 - Chengxuan Zhu, Qingnan Fan, Qi Zhang, Jinwei Chen, Huaqi Zhang, Chao Xu, Boxin Shi:

BokehDiff: Neural Lens Blur with One-Step Diffusion. 1369-1379 - Shehreen Azad, Yogesh Singh Rawat:

DisenQ: Disentangling Q-Former for Activity-Biometrics. 13502-13512 - Yatai Ji, Jiacheng Zhang, Jie Wu, Shilong Zhang, Shoufa Chen, Chongjian Ge, Peize Sun, Weifeng Chen, Wenqi Shao, Xuefeng Xiao, Weilin Huang, Ping Luo:

Prompt-A-Video: Prompt your Video Diffusion Model via Preference-Aligned LLM. 18725-18735 - Kang Zeng, Guojin Zhong, Jintao Cheng, Jin Yuan, Zhiyong Li:

AVAM: A Universal Training-Free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-Image Question Answering. 2292-2302 - Tianma Shen, Aditya Puranik, James Vong, Vrushabh Abhijit Deogirikar, Ryan Fell, Julianna Dietrich, Maria Kyrarini, Christopher Kitts, David C. Jeong:

Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision. 6498-6507 - Yu Cheng, Fajie Yuan:

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models. 15692-15702 - Yang Liu, Yufei Yin, Chenchen Jing, Muzhi Zhu, Hao Chen, Yuling Xi, Bo Feng, Hao Wang, Shiyu Li, Chunhua Shen:

Unified Open-World Segmentation with Multi-Modal Prompts. 21557-21567 - Ziyin Zhou, Yunpeng Luo, Yuanchen Wu, Ke Sun, Jiayi Ji, Ke Yan, Shouhong Ding, Xiaoshuai Sun, Yunsheng Wu, Rongrong Ji:

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models. 18746-18758 - Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai, Carlo Masone, C. V. Jawahar:

Towards Safer and Understandable Driver Intention Prediction. 25378-25387 - Liang Chen, Ghazi Shazan Ahmad, Tianjun Yao, Lingqiao Liu, Zhiqiang Shen:

One Last Attention for Your Vision-Language Model. 1-10 - Jiajin Tang, Zhengxuan Wei, Yuchen Zhu, Cheng Shi, Guanbin Li, Liang Lin, Sibei Yang:

Sim-DETR: Unlock DETR for Temporal Sentence Grounding. 22760-22771 - Dong Zhao, Qi Zang, Shuang Wang, Nicu Sebe, Zhun Zhong:

Pseudo-SD: Pseudo Controlled Stable Diffusion for Semi-Supervised and Cross-Domain Semantic Segmentation. 22393-22403 - Ryan Rabinowitz, Steve Cruz, Walter J. Scheirer, Terrance E. Boult:

COSTARR: Consolidated Open Set Technique with Attenuation for Robust Recognition. 4146-4155 - Hoang Phan, Lam Tran, Quyen Tran, Ngoc N. Tran, Tuan Truong, Qi Lei, Nhat Ho, Dinh Q. Phung, Trung Le:

Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective. 2440-2450 - Xiaofan Li, Zhihao Xu, Chenming Wu, Zhao Yang, Yumeng Zhang, Jiang-Jiang Liu, Haibao Yu, Xiaoqing Ye, Yuan Wang, Shirui Li, Xun Sun, Ji Wan, Jun Wang:

U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration. 24889-24898 - Yifan Jiao, Yunhao Li, Junhua Ding, Qing Yang, Song Fu, Heng Fan, Libo Zhang:

GSOT3D: Towards Generic 3D Single Object Tracking in the Wild. 5469-5478 - David Pujol-Perich, Sergio Escalera, Albert Clapés:

Sparse-Dense Side-Tuner for Efficient Video Temporal Grounding. 21515-21524 - Xiang Zhang, Yawar Siddiqui, Armen Avetisyan, Chris Xie, Jakob J. Engel, Henry Howard-Jenkins:

VertexRegen: Mesh Generation with Continuous Level of Detail. 12570-12580 - Zizhuo Li, Yifan Lu, Linfeng Tang, Shihua Zhang, Jiayi Ma:

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching. 18521-18530 - Simon Niedermayr, Christoph Neuhauser, Rüdiger Westermann:

Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images. 25862-25871 - Maitreya Patel, Song Wen, Dimitris N. Metaxas, Yezhou Yang:

FlowChef: Steering of Rectified Flow Models for Controlled Generations. 15308-15318 - Dongli Tan, Xingyi He, Sida Peng, Yiqing Gong, Xing Zhu, Jiaming Sun, Ruizhen Hu, Yujun Shen, Hujun Bao, Xiaowei Zhou:

ReTracker: Exploring Image Matching for Robust Online Any Point Tracking. 4306-4316 - Ziyu Zhu, Xilin Wang, Yixuan Li, Zhuofan Zhang, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Wei Liang, Qian Yu, Zhidong Deng, Siyuan Huang, Qing Li:

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation. 8120-8132 - Chang Liu, Mingxu Zhu, Zheyuan Zhang, Linna Song, Xiao Zhao, Qingliang Luo, Qi Wang, Chufan Guo, Kuifeng Su:

TAD-E2E: A Large-Scale End-to-End Autonomous Driving Dataset. 26600-26609 - Xingbo Yao, Xuanmin Wang, Hao Wu, Chengliang Ping, Doudou Zhang, Hui Xiong:

MagicCity: Geometry-Aware 3D City Generation from Satellite Imagery with Multi-View Consistency. 25325-25334 - Elias Marks, Lucas Nunes, Federico Magistri, Matteo Sodano, Rodrigo Marcuzzi, Lars Zimmermann, Jens Behley, Cyrill Stachniss:

Tree Skeletonization From 3D Point Clouds by Denoising Diffusion. 27607-27608 - Ivan Sabolic, Matej Grcic, Sinisa Segvic:

Seal Your Backdoor with Variational Defense. 752-764 - Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan:

FoundIR: Unleashing Million-Scale Training Data to Advance Foundation Models for Image Restoration. 12626-12636 - Jeonghoon Park, Juyoung Lee, Chaeyeon Chung, Jaeseong Lee, Jaegul Choo, Jindong Gu:

Fair Generation without Unfair Distortions: Debiasing Text-To-Image Generation with Entanglement-Free Attention. 17567-17576 - Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Zixin Wang, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan:

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text. 10097-10107 - Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu, Feng Xue, Daniel Cores, Nicu Sebe, Manuel Mucientes, Elisa Ricci:

Superpowering Open-Vocabulary Object Detectors for X-ray Vision. 20770-20779 - JiaKui Hu, Yuxiao Yang, Jialun Liu, Jinbo Wu, Chen Zhao, Yanye Lu:

Auto-Regressively Generating Multi-View Consistent Images. 2556-2566 - Yuechen Xie, Jie Song, Yicheng Shan, Xiaoyan Zhang, Yuanyu Wan, Shengxuming Zhang, Jiarui Duan, Mingli Song:

Dataset Ownership Verification for Pre-Trained Masked Models. 3132-3142 - Yiming Cui, Liang Li, Haibing Yin, Yuhan Gao, Yaoqi Sun, Chenggang Yan:

Debiased Teacher for Day-to-Night Domain Adaptive Object Detection. 2577-2587 - Peiqi Chen, Lei Yu, Yi Wan, Yingying Pei, Xinyi Liu, Yongxiang Yao, Yingying Zhang, Lixiang Ru, Liheng Zhong, Jingdong Chen, Ming Yang, Yongjun Zhang:

CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance. 28063-28072 - Liangyu Xiang, Junyu Gao, Changsheng Xu:

Evidential Knowledge Distillation. 2814-2824 - Hyeonho Jeong, Suhyeon Lee, Jong Chul Ye:

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation. 11164-11175 - Yuheng Liu, Xinke Li, Yuning Zhang, Lu Qi, Xin Li, Wenping Wang, Chongshou Li, Xueting Li, Ming-Hsuan Yang:

Controllable 3D Outdoor Scene Generation via Scene Graphs. 28052-28062 - Yuru Jia, Valerio Marsocci, Ziyang Gong, Xue Yang, Maarten Vergauwen, Andrea Nascetti:

Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? 8429-8440 - Cui Miao, Tao Chang, Meihan Wu, Hongbin Xu, Chun Li, Ming Li, Xiaodong Wang:

FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation. 6904-6913 - Grzegorz Gruszczynski, Jakub J. Meixner, Michal Jan Wlodarczyk, Przemyslaw Musialski:

Beyond Blur: A Fluid Perspective on Generative Diffusion Models. 17818-17827 - Junjie Wu, Jiangtao Xie, Zhaolin Zhang, Qilong Wang, Qinghua Hu, Peihua Li, Sen Xu:

DALIP: Distribution Alignment-Based Language-Image Pre-Training for Domain-Specific Data. 2099-2109 - Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer:

Removing Cost Volumes from Optical Flow Estimators. 79-89 - Marshall Thomas, Edward Fish, Richard Bowden:

VALLR: Visual ASR Language Model for Lip Reading. 2846-2856 - Hanshi Wang, Jin Gao, Weiming Hu, Zhipeng Zhang:

Height-Fidelity Dense Global Fusion for Multi-Modal 3D Object Detection. 26664-26674 - Bowen Chen, Yun Sing Koh, Gillian Dobbie:

GloPER: Unsupervised Animal Pattern Extraction from Local Reconstruction. 6519-6529 - Heng Su, Mengying Xie, Nieqing Cao, Yan Ding, Beichen Shao, Xianlei Long, Fuqiang Gu, Chao Chen:

OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection. 6385-6395 - Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei:

WonderTurbo: Generating Interactive 3D World in 0.72 Seconds. 27423-27434 - In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim:

Representing 3D Shapes with 64 Latent Vectors for 3D Diffusion Models. 28556-28566 - Zexi Jia, Chuanwei Huang, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Jinchao Zhang, Jie Zhou:

A Visual Leap in CLIP Compositionality Reasoning Through Generation of Counterfactual Sets. 23498-23507 - Taiga Yamane, Ryo Masumura, Satoshi Suzuki, Shota Orihashi:

MVTrajecter: Multi-View Pedestrian Tracking With Trajectory Motion Cost and Trajectory Appearance Cost. 13270-13280 - Joonghyuk Shin, Alchan Hwang, Yujin Kim, Daneul Kim, Jaesik Park:

Exploring Multimodal Diffusion Transformers for Enhanced Prompt-Based Image Editing. 19492-19502 - Wenxuan Wu, Ruowen Qu, Zhongliang Liu, Zhuoyan Dai, Dongzi Shi, Sijin Yu, Tong Xiong, Shiping Liu, Xiangmin Xu, Xiaofen Xing, Xin Zhang:

Drawing Developmental Trajectory From Cortical Surface Reconstruction. 11026-11035 - Kent Gauen, Stanley H. Chan:

Bayesian-Inspired Space-Time Superpixels. 5382-5391 - Hanwen Jiang, Hao Tan, Peng Wang, Hai Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, Georgios Pavlakos:

RayZer: A Self-Supervised Large View Synthesis Model. 4918-4929 - Tao Wang, Peiwen Xia, Bo Li, Peng-Tao Jiang, Zhe Kong, Kaihao Zhang, Tong Lu, Wenhan Luo:

MOERL: When Mixture-Of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration. 13673-13683 - Xuemeng Yang, Licheng Wen, Tiantian Wei, Yukai Ma, Jianbiao Mei, Xin Li, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Liang He, Yong Liu, Botian Shi, Yu Qiao:

DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving. 26933-26943 - Melanie Gotz, Torsten Krauß, Alexandra Dmitrienko:

Sibai: A Few-Shot Meta-Classifier for Poisoning Detection in Federated Learning. 3787-3797 - Yaqing Ding, Viktor Kocur, Václav Vávra, Zuzana Berger Haladová, Jian Yang, Torsten Sattler, Zuzana Kukelova:

RePoseD: Efficient Relative Pose Estimation With Known Depth Information. 14876-14886 - Tianshuo Peng, Mingsheng Li, Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang, Xiangyu Yue:

Chimera: Improving Generalist Model with Domain-Specific Experts. 3011-3022 - Byeonghun Lee, Hyunmin Cho, Hong Gyu Choi, Soo Min Kang, Iljun Ahn, Kyong Hwan Jin:

Reference-Based Super-Resolution via Image-Based Retrieval-Augmented Generation Diffusion. 10764-10774 - Xiang Lv, Mingwen Shao, Lingzhuang Meng, Chang Liu, Yecong Wan, Xinyuan Chen:

SUV: Suppressing Undesired Video Content via Semantic Modulation Based on Text Embeddings. 18357-18366 - Hongyu Zhu, Sichu Liang, Wenwen Wang, Zhuomeng Zhang, Fangqi Li, Shi-Lin Wang:

Evading Data Provenance in Deep Neural Networks. 1249-1260 - David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue:

BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis. 25029-25039 - Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis C. Koelma, Yuki M. Asano, Martin R. Oswald, Cees G. M. Snoek:

TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning. 1359-1368 - Francisco Caetano, Christiaan G. A. Viviers, Luis Albert Zavala-Mondragón, Peter H. N. de With, Fons van der Sommen:

DisCoPatch: Taming Adversarially-Driven Batch Statistics for Improved Out-of-Distribution Detection. 2898-2908 - Raphi Kang, Yue Song, Georgia Gkioxari, Pietro Perona:

Is CLIP Ideal? No. Can We Fix It? Yes! 22436-22446 - Amir Mehrpanah, Matteo Gamba, Kevin Smith, Hossein Azizpour:

On the Complexity-Faithfulness Trade-Off of Gradient-Based Explanations. 3531-3541 - Longfei Huang, Yu Liang, Hao Zhang, Jinwei Chen, Wei Dong, Lunde Chen, Wanyu Liu, Bo Li, Peng-Tao Jiang:

SDMatte: Grafting Diffusion Models for Interactive Matting. 15229-15239 - Qian Liang, Ruixu Geng, Jinbo Chen, Haoyu Wang, Yan Chen, Yang Hu:

Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement. 8623-8633 - Shunsuke Yasuki, Taiki Miyanishi, Nakamasa Inoue, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Masato Taki, Yutaka Matsuo:

GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields. 9737-9748 - Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He:

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens. 4019-4028 - Anh Thai, Songyou Peng, Kyle Genova, Leonidas J. Guibas, Thomas A. Funkhouser:

SplatTalk: 3D VQA with Gaussian Splatting. 4712-4721 - Yiyu Li, Haoyuan Wang, Ke Xu, Gerhard Petrus Hancke, Rynson W. H. Lau:

SeHDR: Single-Exposure HDR Novel View Synthesis Via 3D Gaussian Bracketing. 26045-26054 - Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Zhucun Xue, Yong Liu, Xiang Bai:

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models. 239-249 - Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang:

MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes. 27828-27838 - Xinzi Cao, Ke Chen, Feidiao Yang, Xiawu Zheng, Yonghong Tian, Yutong Lu:

ALLGCD: Leveraging All Unlabeled Data for Generalized Category Discovery. 3293-3303 - Zikai Zhou, Shitong Shao, Lichen Bai, Shufei Zhang, Zhiqiang Xu, Bo Han, Zeke Xie:

Golden Noise for Diffusion Models: A Learning Framework. 17688-17697 - Yu Sheng, Jiajun Deng, Xinran Zhang, Yu Zhang, Bei Hua, Yanyong Zhang, Jianmin Ji:

SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images. 26404-26414 - Le Zhuo, Liangbing Zhao, Sayak Paul, Yue Liao, Renrui Zhang, Yi Xin, Peng Gao, Mohamed Elhoseiny, Hongsheng Li:

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning. 15329-15339 - Qingqian Yang, Peishen Yan, Xiaoyu Wu, Jiaru Zhang, Tao Song, Yang Hua, Hao Wang, Liangliang Wang, Haibing Guan:

Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-Wise Gradient Alignment. 29163-29172 - Sitao Zhang, Hongda Mao, Qingshuang Chen, Yelin Kim:

Efficient Visual Place Recognition Through Multimodal Semantic Knowledge Integration. 5601-5610 - Bhavna Gopal, Huanrui Yang, Mark Horton, Yiran Chen:

SAFER: Sharpness Aware Layer-Selective Finetuning for Enhanced Robustness in Vision Transformers. 3999-4008 - Gencer Sumbul, Chang Xu, Emanuele Dalsasso, Devis Tuia:

SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images. 5569-5578 - Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang:

Synthetic Video Enhances Physical Fidelity in Video Synthesis. 12135-12146 - Zhongquan Jian, Yanhao Chen, Yancheng Wang, Junfeng Yao, Meihong Wang, Qingqiang Wu:

Supervised Exploratory Learning for Long-Tailed Visual Recognition. 1870-1880 - Pei He, Lingling Li, Licheng Jiao, Ronghua Shang, Fang Liu, Shuang Wang, Xu Liu, Wenping Ma:

Domain-Aware Category-Level Geometry Learning Segmentation for 3D Point Clouds. 28324-28333 - Jack Langerman, Denys Rozumnyi, Yuzhong Huang, Dmytro Mishkin:

Explaining Human Preferences via Metrics for Structured 3D Reconstruction. 26944-26953 - Matteo Dunnhofer, Zaira Manigrasso, Christian Micheloni:

Is Tracking Really More Challenging in First Person Egocentric Vision? 5879-5889 - Huiyang Hu, Peijin Wang, Hanbo Bi, Boyuan Tong, Zhaozhi Wang, Wenhui Diao, Hao Chang, Yingchao Feng, Ziqi Zhang, Yaowei Wang, Qixiang Ye, Kun Fu, Xian Sun:

RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model. 9876-9887 - Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu:

IGL-Nav: Incremental 3D Gaussian Localization for Image-Goal Navigation. 6808-6817 - Zijie Wang, Weiming Zhang, Wei Zhang, Xiao Tan, Hongxing Liu, Yaowei Wang, Guanbin Li:

LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation. 27052-27062 - Xingjian Wang, Li Chai, Jiming Chen:

Debiasing Trace Guidance: Top-Down Trace Distillation and Bottom-up Velocity Alignment for Unsupervised Anomaly Detection. 22989-22998 - Shihao Zhou, Dayu Li, Jinshan Pan, Juncheng Zhou, Jinglei Shi, Jufeng Yang:

Devil is in the Uniformity: Exploring Diverse Learners Within Transformer for Image Restoration. 12307-12317 - Mingfeng Zha, Tianyu Li, Guoyin Wang, Peng Wang, Yangyang Wu, Yang Yang, Heng Tao Shen:

Implicit Counterfactual Learning for Audio-Visual Segmentation. 22349-22360 - Nisha Huang, Henglin Liu, Yizhou Lin, Kaer Huang, Chubin Chen, Jie Guo, Tong-Yee Lee, Xiu Li:

MaTe: Images are All You Need for Material Transfer via Diffusion Transformer. 15117-15126 - Weijia Zhang, Fei Xie, Tom Weidong Cai, Chao Ma:

VRM: Knowledge Distillation via Virtual Relation Matching. 2707-2717 - Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu, Drew A. Hudson, Alexander Lerchner, Andrew Zisserman, Mehdi S. M. Sajjadi, João Carreira:

LayerLock: Non-Collapsing Representation Learning with Progressive Freezing. 19461-19470 - Ömer Veysel Çagatan, Ömer Faruk Tal, M. Emre Gürsoy:

Adversarial Robustness of Discriminative Self-Supervised Learning in Vision. 2313-2324 - Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, Jian Ren, Anil Kag:

Scalable Ranked Preference Optimization for Text-To-Image Generation. 1-12 - Xinlong Ding, Hongwei Yu, Jiawei Li, Feifan Li, Yu Shang, Bochao Zou, Huimin Ma, Jiansheng Chen:

Kaleidoscopic Background Attack: Disrupting Pose Estimation With Multi-Fold Radial Symmetry Textures. 28483-28492 - Wontae Kim, Keuntek Lee, Nam Ik Cho:

Lightweight and Fast Real-Time Image Enhancement via Decomposition of the Spatial-Aware Lookup Tables. 11895-11905 - Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu:

Audio-Visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation. 12549-12558 - Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli:

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models. 19721-19730 - Boxiao Pan, Adam W. Harley, Francis Engelmann, C. Karen Liu, Leonidas J. Guibas:

LookOut: Real-World Humanoid Egocentric Navigation. 24977-24988 - Rui Yang, Huining Li, Yiyi Long, Xiaojun Wu, Shengfeng He:

Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation. 16545-16554 - Hyunjoon Lee, Joonkyu Min, Jaesik Park:

CF3: Compact and Fast 3D Feature Fields. 27906-27916 - Fu Rong, Meng Lan, Qian Zhang, Lefei Zhang:

MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation. 23979-23989 - Panjian Huang, Saihui Hou, Junzhou Huang, Yongzhen Huang:

Learning a Unified Template for Gait Recognition. 12459-12469 - Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai:

InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis. 17941-17950 - Ming Li, Xin Gu, Fan Chen, Xiaoying Xing, Longyin Wen, Chen Chen, Sijie Zhu:

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing. 19206-19215 - Yufei Zhu, Yiming Zhong, Zemin Yang, Peishan Cong, Jingyi Yu, Xinge Zhu, Yuexin Ma:

EvolvingGrasp: Evolutionary Grasp Generation Via Efficient Preference Alignment. 11665-11674 - Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang:

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs. 2270-2280 - Yachun Mi, Yu Li, Weicheng Meng, Chaofeng Chen, Chen Hui, Shaohui Liu:

MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment. 18498-18509 - Yash Garg, Saketh Bachu, Arindam Dutta, Rohit Lal, Sarosij Bose, Calvin-Khang Ta, M. Salman Asif, Amit K. Roy-Chowdhury:

VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation Under Real Occlusions. 7350-7360 - Qin Zhou, Guoyan Liang, Xindi Li, Jingyuan Chen, Zhe Wang, Chang Yao, Sai Wu:

Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation. 22529-22538 - Yunjiang Xu, Lingzhi Li, Jin Wang, Yupeng Ouyang, Benyuan Yang:

INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception. 25464-25473 - Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu:

B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens. 24549-24559 - Hailong Yan, Ao Li, Xiangtao Zhang, Zhe Liu, Zenglin Shi, Ce Zhu, Le Zhang:

MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices. 21949-21960 - Ada Gorgun, Bernt Schiele, Jonas Fischer:

VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow. 4403-4412 - Huanjin Yao, Jiaxing Huang, Yawen Qiu, Michael K. Chen, Wenzheng Liu, Wei Zhang, Wenjie Zeng, Xikun Zhang, Jingyi Zhang, YuXin Song, Wenhao Wu, Dacheng Tao:

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI. 273-283 - Byung-Ki Kwon, Qi Dai, Lee Hyoseok, Chong Luo, Tae-Hyun Oh:

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers. 25261-25271 - Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang:

ZipVL: Accelerating Vision-Language Models Through Dynamic Token Sparsity. 20477-20486 - Tianyu Zou, Shengwu Xiong, Ruilin Yao, Yi Rong:

Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation. 20561-20571 - Ming Hu, Kun Yuan, Yaling Shen, Feilong Tang, Xiaohao Xu, Lin Zhou, Wei Li, Ying Chen, Zhongxing Xu, Zelin Peng, Siyuan Yan, Vinkle Srivastav, Diping Song, Tianbin Li, Danli Shi, Jin Ye, Nicolas Padoy, Nassir Navab, Junjun He, Zongyuan Ge:

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining. 1-12 - Shuang Guo, Friedhelm Hamann, Guillermo Gallego:

Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras. 7980-7989 - Zongyu Lin, Wei Liu, Chen Chen, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang, Cha Chen, Yiran Fei, Lezhi Li, Yinfei Yang, Yizhou Sun, Kai-Wei Chang:

STIV: Scalable Text and Image Conditioned Video Generation. 16249-16259 - Zepeng Su, Zhulin Liu, Zongyan Zhang, Tong Zhang, C. L. Philip Chen:

TimeBooth: Disentangled Facial Invariant Representation for Diverse and Personalized Face Aging. 12147-12157 - Yuhang Li, Zhuying Li, Yuheng Jia:

Boosting Class Representation via Semantically Related Instances for Robust Long-Tailed Learning with Noisy Labels. 1516-1525 - Lin Bie, Siqi Li, Yifan Feng, Yue Gao:

Hyper-Depth: Hypergraph-Based Multi-Scale Representation Fusion for Monocular Depth Estimation. 5081-5090 - Yuhao Wang, Wei Xi:

UniConvNet: Expanding Effective Receptive Field While Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale. 20922-20933 - Fucai Ke, Vijay Kumar B. G, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, Manmohan Chandraker:

DWIM: Towards Tool-Aware Visual Reasoning via Discrepancy-Aware Workflow Generation & Instruct-Masking Tuning. 3378-3389 - Heng Jia, Linchao Zhu, Na Zhao:

H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction. 7655-7665 - Yannick Burkhardt, Simon Schaefer, Stefan Leutenegger:

SuperEvent: Cross-Modal Learning of Event-Based Keypoint Detection for SLAM. 8918-8928 - Yitong Jiang, Jinwei Gu, Tianfan Xue, Ka Chun Cheung, Pavlo Molchanov, Hongxu Yin, Sifei Liu:

Token-Efficient VLM: High-Resolution Image Understanding via Dynamic Region Proposal. 24147-24158 - Kun Li, Pengyu Liu, Dan Guo, Fei Wang, Zhiliang Wu, Hehe Fan, Meng Wang:

MMAD: Multi-Label Micro-Action Detection in Videos. 13225-13236 - Shijie Ma, Yuying Ge, Teng Wang, Yuxin Guo, Yixiao Ge, Ying Shan:

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers. 24402-24412 - Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec:

FlashDepth: Real-Time Streaming Video Depth Estimation at 2K Resolution. 9638-9648 - Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, Jianqiang Wan, Humen Zhong, Xuejing Liu, Mingkun Yang, Peng Wang, Shuai Bai, Lianwen Jin, Junyang Lin:

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy. 21744-21754 - Yunqi Liu, Xue Ouyang, Xiaohui Cui:

GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-Training Models via Global-Local Transformations. 1665-1674 - Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen:

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models. 17013-17023 - Minghang Zheng, Yuxin Peng, Benyuan Sun, Yi Yang, Yang Liu:

Hierarchical Event Memory for Accurate and Low-Latency Online Video Temporal Grounding. 21589-21599 - Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, Angel X. Chang:

Diorama: Unleashing Zero-Shot Single-View 3D Indoor Scene Modeling. 8896-8907 - Hyun Jun Yook, Ga San Jhun, Jae Hyun Cho, Min Jeon, Donghyun Kim, Tae Hyung Kim, Youn Kyu Lee:

ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models. 3926-3935 - Dongwon Kim, Ju He, Qihang Yu, Chenglin Yang, Xiaohui Shen, Suha Kwak, Liang-Chieh Chen:

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens. 18442-18452 - Sihang Li, Zeyu Jiang, Grace Chen, Chenyang Xu, Siqi Tan, Xue Wang, Irving Fang, Kristof Zyskowski, Shannon P. McPherron, Radu Iovita, Chen Feng, Jing Zhang:

GARF: Learning Generalizable 3D Reassembly for Real-World Fractures. 5711-5721 - Taehwan Lee, Kyeongkook Seo, Jaejun Yoo, Sung Whan Yoon:

Understanding Flatness in Generative Models: Its Role and Benefits. 4908-4917 - Grace Luo, Jonathan Granskog, Aleksander Holynski, Trevor Darrell:

Dual-Process Image Generation. 17972-17983 - Chensheng Peng, Ido Sobol, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu, Or Litany:

A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision. 28707-28717 - Yuhang Lu, Jiadong Tu, Yuexin Ma, Xinge Zhu:

ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving. 27783-27793 - Wentian Cai, Weizhao Weng, Zihao Huang, Yandan Chen, Siquan Huang, Ping Gao, Victor C. M. Leung, Ying Gao:

Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint. 23332-23341 - Zijun Lin, Shuting He, Cheston Tan, Bihan Wen:

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding. 28774-28784 - Yunuo Chen, Zezheng Lyu, Bing He, Ning Cao, Gang Chen, Guo Lu, Wenjun Zhang:

Knowledge Distillation for Learned Image Compression. 4996-5006 - Hengjia Li, Lifan Jiang, Xi Xiao, Tianyang Wang, Hongwei Yi, Boxi Wu, Deng Cai:

MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization. 12737-12746 - Mahdiyar Molahasani, Azadeh Motamedi, Michael A. Greenspan, Il-Min Kim, Ali Etemad:

PRISM: Reducing Spurious Implicit Biases in Vision-Language Models with LLM-Guided Embedding Projection. 688-697 - Jiawei Gu, Ziyue Qiao, Zechao Li:

Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention. 457-466 - Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi:

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation. 21600-21611 - Zhizhong Huang, Xiaoming Liu:

Generalizable Object Re-Identification via Visual in-Context Prompting. 22539-22550 - Yingjie Chen, Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo:

Perception-as-Control: Fine-Grained Controllable Image Animation with 3D-Aware Motion Representation. 14380-14389 - Tommaso Galliena, Tommaso Apicella, Stefano Rosa, Pietro Morerio, Alessio Del Bue, Lorenzo Natale:

Embodied Image Captioning: Self-Supervised Learning Agents for Spatially Coherent Image Descriptions. 24370-24379 - Tan Pan, Zhaorui Tan, Kaiyu Guo, Dongli Xu, Weidi Xu, Chen Jiang, Xin Guo, Yuan Qi, Yuan Cheng:

Structure-Aware Semantic Discrepancy and Consistency for 3D Medical Image Self-Supervised Learning. 20257-20267 - Qing Lin, Jingfeng Zhang, Yew-Soon Ong, Mengmi Zhang:

Make Me Happier: Evoking Emotions through Image Diffusion Models. 16367-16376 - Miaowei Wang, Changjian Li, Amir Vaxman:

CanFields: Consolidating Diffeomorphic Flows for Non-Rigid 4D Interpolation From Arbitrary-Length Sequences. 28587-28598 - Ye Lu, Jie Wang, Jianjun Gao, Rui Gong, Chen Cai, Kim-Hui Yap:

A Structure-Aware and Motion-Adaptive Framework for 3D Human Pose Estimation with Mamba. 7958-7968 - Haokai Zhu, Bo Qu, Si-Yuan Cao, Runmin Zhang, Shujie Chen, Bailin Yang, Hui-Liang Shen:

EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration. 5102-5111 - Guopeng Li, Qiang Wang, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia:

Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation. 3445-3454 - Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala, Congcong Wen, Anthony Tzes, Yi Fang:

Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks. 12349-12359 - Weitian Zhang, Yichao Yan, Sijing Wu, Manwen Liao, Xiaokang Yang:

Disentangled Clothed Avatar Generation with Layered Representation. 11327-11338 - Lilika Makabe, Hiroaki Santo, Fumio Okura, Michael S. Brown, Yasuyuki Matsushita:

Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating. 27252-27261 - Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière, Joost van de Weijer:

Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering. 18078-18089 - Chaonan Ji, Jinwei Qi, Peng Zhang, Bang Zhang, Liefeng Bo:

Controllable and Expressive One-Shot Video Head Swapping. 10239-10250 - Ziyang Luo, Nian Liu, Xuguang Yang, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Junwei Han:

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models. 24014-24023 - Simon Boeder, Fabian Gigengack, Benjamin Risse:

GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow. 24943-24954 - Jianing Zhang, Jiayi Zhu, Feiyu Ji, Xiaokang Yang, Xiaoyun Yuan:

Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography. 25914-25924 - Shangwen Zhu, Han Zhang, Zhantao Yang, Qianyu Peng, Zhao Pu, Huangji Wang, Fan Cheng:

Accelerating Diffusion Sampling via Exploiting Local Transition Coherence. 18284-18293 - Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang:

Pretrained Reversible Generation as Unsupervised Visual Representation Learning. 19216-19226 - Mohammed Rakib, Arunkumar Bagavathi:

G2D: Boosting Multimodal Learning with Gradient-Guided Distillation. 4059-4068 - Xuejian Gou, Fang Liu, Licheng Jiao, Shuo Li, Lingling Li, Hao Wang, Xu Liu, Puhua Chen, Wenping Ma:

Knowledge-Guided Part Segmentation. 5490-5500 - Clément Chadebec, Onur Tasar, Sanjeev Sreetharan, Benjamin Aubin:

LBM: Latent Bridge Matching for Fast Image-to-Image Translation. 29086-29098 - Xiyu Zhang, Jiayi Ma, Jianwei Guo, Wei Hu, Zhaoshuai Qi, Fei Hui, Jiaqi Yang, Yanning Zhang:

HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration. 24750-24759 - Xiao Fang, Minhyek Jeon, Shuowen Hu, Zheyang Qin, Shayok Chakraborty, Stanislav Panev, Celso de Melo, Fernando De la Torre:

Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision. 8088-8099 - Guanghui Shi, Xuefeng Liang, Wenjie Li, Xiaoyu Lin:

Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition. 870-879 - Yichen Li, Antonio Torralba:

MultiModal Action Conditioned Video Simulation. 14173-14183 - Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha:

IM360: Large-Scale Indoor Mapping with 360 Cameras. 29040-29050 - Hao Mark Chen, Shell Xu Hu, Wayne Luk, Timothy M. Hospedales, Hongxiang Fan:

FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization. 3390-3400 - Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, Weijia Li:

Where am I? Cross-View Geo-localization with Natural Language Descriptions. 5890-5900 - Yaxin Xiao, Qingqing Ye, Li Hu, Huadi Zheng, Haibo Hu, Zi Liang, Haoyang Li, Yijie Jiao:

Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy. 3058-3068 - Hongyang Wei, Shuaizheng Liu, Chun Yuan, Lei Zhang:

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models. 18640-18650 - Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi, Kazuki Kozuka, Juan Carlos Niebles, Ehsan Adeli:
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation. 10318-10329 - Aysan Aghazadeh, Adriana Kovashka:

Cap: Evaluation of Persuasive and Creative Image Generation. 16970-16980 - Chieh-Yun Chen, Min Shi, Gong Zhang, Humphrey Shi:

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation. 19396-19405 - Zongheng Tang, Yi Liu, Yifan Sun, Yulu Gao, Jinyu Chen, Runsheng Xu, Si Liu:

CoST: Efficient Collaborative Perception from Unified Spatiotemporal Perspective. 1120-1129 - Anurag Ghosh, Shen Zheng, Robert Tamburo, Khiem Vuong, Juan R. Alvarez-Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa G. Narasimhan:

ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones. 6132-6142 - Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai:

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution. 17108-17118 - Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee, Chih-Hai Su, Yu-Lun Liu:

LightsOut: Diffusion-Based Outpainting for Enhanced Lens Flare Removal. 6353-6363 - Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie:

Adaptive Caching for Faster Video Generation With Diffusion Transformers. 15240-15252 - Zukang Liao, Min Chen:

Background Invariance Testing According to Semantic Proximity. 8056-8065 - Jingwei Liu, Ling Yang, Hao Luo, Fan Wang, Hongyan Li, Mengdi Wang:

Preacher: Paper-to-Video Agentic System. 17129-17139 - Xiaohang Yang, Qing Wang, Jiahao Yang, Gregory G. Slabaugh, Shanxin Yuan:

STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints. 12947-12955 - Lin Zhang, Xianfang Zeng, Kangcong Li, Gang Yu, Tao Chen:

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning. 23145-23155 - Youming Deng, Wenqi Xian, Guandao Yang, Leonidas J. Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec:

Self-Calibrating Gaussian Splatting for Large Field-of-View Reconstruction. 25124-25133 - Haiming Zhu, Yangyang Xu, Chenshu Xu, Tingrui Shen, Wenxi Liu, Yong Du, Jun Yu, Shengfeng He:

Stable Score Distillation. 16597-16606 - Tsu-Jui Fu, Yusu Qian, Chen Chen, Wenze Hu, Zhe Gan, Yinfei Yang:

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing. 17160-17170 - Min Kim, Younho Jeon, Sungho Jo:

Probabilistic Inertial Poser (ProbIP): Uncertainty-Aware Human Motion Modeling from Sparse Inertial Sensors. 1-10 - Junru Lin, Chirag Vashist, Mikaela Angelina Uy, Colton Stearns, Xuan Luo, Leonidas J. Guibas, Ke Li:

Global Motion Corresponder for 3D Point-Based Scene Interpolation Under Large Motion. 7884-7893 - Shuang Xu, Zixiang Zhao, Haowen Bai, Chang Yu, Jiangjun Peng, Xiangyong Cao, Deyu Meng:

Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image. 12002-12011 - Xinyi Lai, Luojun Lin, Weijie Chen, Yuanlong Yu:

A Tiny Change, a Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment. 1444-1453 - Jiahao Zhang, Zongli Jiang, Jinli Zhang, Yixin Wei, Liang Li, Yizheng Wang, Gang Wang:

Tracking Tiny Drones Against Clutter: Large-Scale Infrared Benchmark with Motion-Centric Adaptive Algorithm. 7361-7371 - Luca Bartolomei, Enrico Mannocci, Fabio Tosi, Matteo Poggi, Stefano Mattoccia:

Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation. 19669-19678 - Junhong Min, Youngpil Jeon, Jimin Kim, Minyong Choi:

S2M2: Scalable Stereo Matching Model for Reliable Depth Estimation. 26729-26739 - Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari:

UIP2P: Unsupervised Instruction-Based Image Editing via Edit Reversibility Constraint. 18895-18905 - Lingyong Fang, Xinzhong Wang, Depeng Wang, Zongru Wu, Ya Guo, Huijia Zhu, Zhuosheng Zhang, Gongshen Liu:

Can Knowledge be Transferred from Unimodal to Multimodal? Investigating the Transitivity of Multimodal Knowledge Editing. 2482-2490 - Ruiyuan Gao, Kai Chen, Bo Xiao, Lanqing Hong, Zhenguo Li, Qiang Xu:

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control. 28135-28144 - Minghao Wen, Shengjie Wu, Kangkan Wang, Dong Liang:

InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior. 26136-26145 - Yue Duan, Taicai Chen, Lei Qi, Yinghuan Shi:

Divide-And-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-Supervised Continual Learning. 583-593 - Zekun Qian, Ruize Han, Zhixiang Wang, Junhui Hou, Wei Feng:

COVTrack: Continuous Open-Vocabulary Tracking via Adaptive Multi-Cue Fusion. 1-10 - Chih-Hao Lin, Zian Wang, Ruofan Liang, Yuxuan Zhang, Sanja Fidler, Shenlong Wang, Zan Gojcic:

Controllable Weather Synthesis and Removal with Video Diffusion Models. 13580-13591 - Yuchen Liu, Yaoming Wang, Bowen Shi, Xiaopeng Zhang, Wenrui Dai, Chenglin Li, Hongkai Xiong, Qi Tian:

METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models. 21492-21504 - Xinyao Liao, Xianfang Zeng, Liao Wang, Gang Yu, Guosheng Lin, Chi Zhang:

MotionAgent: Fine-Grained Controllable Video Generation via Motion Field Agent. 11305-11316 - Bangxiang Lan, Ruobing Xie, Ruixiang Zhao, Xingwu Sun, Zhanhui Kang, Gang Yang, Xirong Li:

Hybrid-Tower: Fine-Grained Pseudo-Query Interaction and Generation for Text-to-Video Retrieval. 24497-24506 - Fang Zhang, Wenzhao Zheng, Linqing Zhao, Zelan Zhu, Jiwen Lu, Xiuzhuang Zhou:

PlaneRAS: Learning Planar Primitives for 3D Plane Recovery. 6882-6891 - Yang Tian, Zheng Lu, Mingqi Gao, Zheng Liu, Bo Zhao:

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers. 1-10 - Yun Wang, Longguang Wang, Chenghao Zhang, Yongjian Zhang, Zhanjie Zhang, Ao Ma, Chenyou Fan, Tin Lun Lam, Junjie Hu:

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts. 21276-21287 - Phillip Y. Lee, Jihyeon Je, Chanho Park, Leonidas J. Guibas, Mikaela Angelina Uy, Minhyuk Sung:

Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation. 9241-9251 - Zexuan Yan, Yue Ma, Chang Zou, Wenteng Chen, Qifeng Chen, Linfeng Zhang:

EEdit ⚡: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing. 17474-17484 - Shizhen Zhao, Jiahui Liu, Xin Wen, Haoru Tan, Xiaojuan Qi:

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection. 1751-1761 - Weikang Wang, Tobias Weißberg, Nafie El Amrani, Florian Bernard:

χ: Symmetry Understanding of 3D Shapes via Chirality Disentanglement. 28292-28302 - Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan:

Bridging the Gap Between Brain and Machine in Interpreting Visual Semantics: Towards Self-Adaptive Brain-to-Text Decoding. 21938-21948 - Jun Li, Jinpeng Wang, Chaolei Tan, Niu Lian, Long Chen, Yaowei Wang, Min Zhang, Shu-Tao Xia, Bin Chen:

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning. 23074-23084 - Seungju Yoo, Hyuk Kwon, Joong-Won Hwang, Kibok Lee:

Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability. 19764-19773 - Chu Zhou, Yixin Yang, Junda Liao, Heng Guo, Boxin Shi, Imari Sato:

Polarimetric Neural Field via Unified Complex-Valued Wave Representation. 25660-25669 - Wangze Xu, Yifan Zhan, Zhihang Zhong, Xiao Sun:

Sequential Gaussian Avatars with Hierarchical Motion Context. 13592-13603 - Deng Li, Aming Wu, Yang Li, Yaowei Wang, Yahong Han:

Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios. 4434-4443 - Chong Cheng, Yu Hu, Sicheng Yu, Beizhen Zhao, Zijian Wang, Hao Wang:

RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration. 8100-8109 - Guan Luo, Jianfeng Zhang:

MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling. 26336-26348 - Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Jiaming Liu, Chuang Zhang:

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks. 19085-19096 - Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, José M. Álvarez:

Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training. 27305-27314 - Hebaixu Wang, Jiayi Ma:

Deep Adaptive Unfolded Network via Spatial Morphology Stripping and Spectral Filtration for Pan-Sharpening. 10730-10740 - Chenghao Xiao, Isaac Chung, Imene Kerboua, Jamie Stirling, Xin Zhang, Márton Kardos, Roman Solomatin, Noura Al Moubayed, Kenneth C. Enevoldsen, Niklas Muennighoff:

MIEB: Massive Image Embedding Benchmark. 22187-22198 - Sheng Ye, Xin Chen, Yan Zhang, Xianming Lin, Liujuan Cao:

ESCNet: Edge-Semantic Collaborative Network for Camouflaged Object Detection. 20053-20063 - Muleilan Pei, Shaoshuai Shi, Xuesong Chen, Xu Liu, Shaojie Shen:

Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics. 28303-28312 - Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li, Jay Mehta, Aniket Bera:

MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation. 13932-13941 - Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim, Se-Young Yun:

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation. 13151-13161 - Shida Sun, Yue Li, Yueyi Zhang, Zhiwei Xiong:

Generalizable Non-Line-of-Sight Imaging with Learnable Physical Priors. 1-10 - Ihab Asaad, Maha Shadaydeh, Joachim Denzler:

Gradient Extrapolation for Debiased Representation Learning. 3819-3829 - Yuxuan Luo, Zhengkun Rong, Lizhen Wang, Longhao Zhang, Tianshu Hu:

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance. 11036-11046 - Townim Faisal Chowdhury, Vu Minh Hieu Phan, Kewen Liao, Nanyu Dong, Minh-Son To, Anton van den Hengel, Johan W. Verjans, Zhibin Liao:

Looking in the Mirror: A Faithful Counterfactual Explanation Method for Interpreting Deep Image Classification Models. 2239-2249 - Pou-Chun Kung, Skanda Harisha, Ram Vasudevan, Aline Eid, Katherine A. Skinner:

RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes. 27596-27606 - Peizheng Li, Shuxiao Ding, You Zhou, Qingwen Zhang, Onat Inak, Larissa Triess, Niklas Hanselmann, Marius Cordts, Andreas Zell:

AGO: Adaptive Grounding for Open World 3D Occupancy Prediction. 8645-8655 - Ashutosh Anshul, Shreyas Gopal, Deepu Rajan, Eng Siong Chng:

Intra-Modal and Cross-Modal Synchronization for Audio-Visual Deepfake Detection and Temporal Localization. 13826-13836 - Inzamamul Alam, Md Tanvir Islam, Simon S. Woo, Khan Muhammad:

SpecGuard: Spectral Projection-Based Advanced Invisible Watermarking. 17984-17993 - Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen:

Edicho: Consistent Image Editing in the Wild. 15277-15287 - Connor Malone, Somayeh Hussaini, Tobias Fischer, Michael Milford:

A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors for Visual Place Recognition. 1-12 - Zhaojie Zeng, Yuesong Wang, Tao Guan, Chao Yang, Lili Ju:

Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting. 27896-27905 - Haoran Lou, Chunxiao Fan, Ziyan Liu, Yuexin Wu, Xinliang Wang:

LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs. 22004-22024 - Lening Wang, Wenzhao Zheng, Dalong Du, Yunpeng Zhang, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jie Zhou, Shanghang Zhang:

Authentic 4D Driving Simulation with a Video Generation Model. 28892-28902 - Jonas Belouadi, Eddy Ilg, Margret Keuper, Hideki Tanaka, Masao Utiyama, Raj Dabre, Steffen Eger, Simone Paolo Ponzetto:

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis. 17793-17806 - Aoxiang Fan, Corentin Dumery, Nicolas Talabot, Pascal Fua:

A View-Consistent Sampling Method for Regularized Training of Neural Radiance Fields. 25961-25971 - Yufei Zhan, Shurong Zheng, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang:

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring. 22947-22957 - Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Raoul de Charette:

FLOSS: Free Lunch in Open-Vocabulary Semantic Segmentation. 21471-21481 - Philipp Wulff, Felix Wimbauer, Dominik Muhle, Daniel Cremers:

Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images. 9352-9362 - Siqi Yang, Jinxiu Liang, Zhaojun Huang, Yeliduosi Xiaokaiti, Yakun Chang, Zhaofei Yu, Boxin Shi:

SpikeDiff: Zero-Shot High-Quality Video Reconstruction from Chromatic Spike Camera and Sub-Millisecond Spike Streams. 7905-7914 - Kwanseok Kim, Jaehoon Hahm, Sumin Kim, Jinhwan Sul, Byunghak Kim, Joonseok Lee:

SummDiff: Generative Modeling of Video Summarization with Diffusion. 15096-15106 - Ziye Wang, Minghang Yu, Chunyan Xu, Zhen Cui:

Semantic Discrepancy-Aware Detector for Image Forgery Identification. 18388-18398 - Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Wanchen Sui, Shen Li, Yong Li, Fei Chao, Rongrong Ji:

Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers. 12479-12490 - Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, Yurong You, Lingjie Liu, Yue Wang, Marco Pavone, Chen Feng, Yiming Li:

Extrapolated Urban View Synthesis Benchmark. 28718-28728 - Zhe Cao, Jin Zhang, Ruiheng Zhang:

IRGPT: Understanding Real-World Infrared Image with Bi-Cross-Modal Curriculum on Large-Scale Benchmark. 166-176 - Zheng Li, Yibing Song, Ming-Ming Cheng, Xiang Li, Jian Yang:

Advancing Textual Prompt Learning with Anchored Attributes. 3618-3627 - Chao Zhou, Tianyi Wei, Nenghai Yu:

Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling. 15171-15181 - Han Qiu, Peng Gao, Lewei Lu, Xiaoqin Zhang, Ling Shao, Shijian Lu:

Spatial Preference Rewarding for MLLMs Spatial Understanding. 720-730 - Ying Ba, Tianyu Zhang, Yalong Bai, Wenyi Mo, Tao Liang, Bing Su, Ji-Rong Wen:

Enhancing Reward Models for High-Quality Image Generation: Beyond Text-Image Alignment. 19022-19031 - Bowen Fu, Wei Wei, Jiaqi Tang, Jiangtao Nie, Yanyu Ye, Xiaogang Xu, Ying-Cong Chen, Lei Zhang:

Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection. 16830-16839 - Sanghyun Jo, Seo Jin Lee, Seungwoo Lee, Seohyung Hong, Hyungseok Seo, Kyungsu Kim:

COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation. 20324-20335 - Qi Zhang, Chi Huang, Qian Zhang, Nan Li, Wei Feng:

SU-RGS: Relightable 3D Gaussian Splatting from Sparse Views Under Unconstrained Illuminations. 26859-26868 - Qiaomu Miao, Vivek Raju Golani, Jingyi Xu, Progga Paromita Dutta, Minh Hoai, Dimitris Samaras:

Multi-View Gaze Target Estimation. 5371-5381 - Zhiyuan Zhang, Dongdong Chen, Jing Liao:

I2V3D: Controllable Image-to-Video Generation with 3D Guidance. 13360-13371 - Yingyue Li, Bencheng Liao, Wenyu Liu, Xinggang Wang:

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling. 20878-20888 - Seunggeun Chi, Enna Sachdeva, Pin-Hao Huang, Kwonjoon Lee:

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting. 9487-9496 - Lin Sun, Jiale Cao, Jin Xie, Xiaoheng Jiang, Yanwei Pang:

CLIPeR: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation. 23199-23209 - Xueyi Zhang, Peiyin Zhu, Chengwei Zhang, Zhiyuan Yan, Jikang Cheng, Mingrui Lao, Siqi Cai, Yanming Guo:

Generalization-Preserved Learning: Closing the Backdoor to Catastrophic Forgetting in Continual Deepfake Detection. 3798-3808 - Yue Qiu, Yanjun Sun, Takuma Yagi, Shusaku Egami, Natsuki Miyata, Ken Fukuda, Kensho Hara, Ryusuke Sagawa:

VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos. 12242-12252 - Yang Liu, Wentao Feng, Zhuoyao Liu, Shudong Huang, Jiancheng Lv:

Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching. 1-10 - Cong Wei, Yujie Zhong, Haoxian Tan, Yingsen Zeng, Yong Liu, Hongfa Wang, Yujiu Yang:

InstructSeg: Unifying Instructed Visual Segmentation with Multi-Modal Large Language Models. 20193-20203 - Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim:

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling. 1-12 - Gang Fu:

Neural Solver of Dichromatic Reflection Model for Specular Highlight Removal. 7241-7250 - Yujia Tong, Yuze Wang, Jingling Yuan, Chuang Hu:

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels. 20603-20612 - Andy Regensky, Marc Windsheimer, Fabian Brand, André Kaup:

Beyond Perspective: Neural 360-Degree Video Compression. 16143-16153 - Mohammad Mohammadi, Ziyi Wu, Igor Gilitschenski:

TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras. 7782-7793 - Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han:

CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction. 27187-27196 - Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, Jian-Huang Lai:

Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis. 16818-16829 - Xinyu Mao, Xiaohan Xing, Fei Meng, Jianbang Liu, Fan Bai, Qiang Nie, Max Q.-H. Meng:

One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution. 24182-24191 - Yong Zhang, Feng Liang, Guanghu Yuan, Min Yang, Chengming Li, Xiping Hu:

FedPall: Prototype-Based Adversarial and Collaborative Learning for Federated Learning with Feature Drift. 3111-3120 - Luong Tran, Thieu Vo, Anh Nguyen, Sang Dinh, Van Nguyen:

More Reliable Pseudo-Labels, Better Performance: A Generalized Approach to Single Positive Multi-Label Learning. 1349-1358 - Xuhong Huang, Shiqi Liu, Kai Zhang, Ying Tai, Jian Yang, Hui Zeng, Lei Zhang:

Reverse Convolution and its Applications to Image Restoration. 10507-10516 - Hanling Zhang, Rundong Su, Zhihang Yuan, Pengtao Chen, Mingzhu Shen, Yibo Fan, Shengen Yan, Guohao Dai, Yu Wang:

DiTFastAttnV2: Head-Wise Attention Compression for Multi-Modality Diffusion Transformers. 16399-16409 - Mingfang Zhang, Ryo Yonetani, Yifei Huang, Liangyang Ouyang, Ruicong Liu, Yoichi Sato:

Egocentric Action-Aware Inertial Localization in Point Clouds with Vision-Language Guidance. 27209-27219 - Juan Hu, Shaojing Fan, Terence Sim:

Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection. 14517-14527 - Abiao Li, Chenlei Lv, Yuming Fang, Yifan Zuo, Jian Zhang, Guofeng Mei:

PointGAC: Geometric-Aware Codebook for Masked Point Cloud Modeling. 24989-24998 - Hugo Blanc, Jean-Emmanuel Deschaud, Alexis Paljic:

RayGaussX: Accelerating Gaussian-Based Ray Marching for Real-Time and High-Quality Novel View Synthesis. 27575-27584 - Huilin Xu, Jian Ding, Jiakun Xu, Ruixiang Wang, Jun Chen, Jinjie Mai, Yanwei Fu, Bernard Ghanem, Feng Xu, Mohamed Elhoseiny:
Diffusion-Based Imaginative Coordination for Bimanual Manipulation. 11469-11479 - Haoning Wu, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie:

MRGen: Segmentation Data Engine for Underrepresented MRI Modalities. 19903-19913 - Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang:

From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning. 2728-2737 - Jingyi Pan, Dan Xu, Qiong Luo:

DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting. 16345-16355 - Xiuyu Wu, Xinhao Wang, Xiubin Zhu, Lan Yang, Jiyuan Liu, Xingchen Hu:

Measuring the Impact of Rotation Equivariance on Aerial Object Detection. 7329-7339 - Yiqing Shen, Bohan Liu, Chenjia Li, Lalithkumar Seenivasan, Mathias Unberath:

Online Reasoning Video Segmentation with Just-in-Time Digital Twins. 24698-24706 - Bowen Wang, Yafei Wang, Wei Gong, Siheng Chen, Genjia Liu, Minhao Xiong, Chin Long Ng:

V2XScenes: A Multiple Challenging Traffic Conditions Dataset for Large-Range Vehicle-Infrastructure Collaborative Perception. 28385-28395 - Lujun Li, Dezhi Li, Cheng Lin, Wei Li, Wei Xue, Sirui Han, Yike Guo:

AIRA: Activation-Informed Low-Rank Adaptation for Large Models. 1729-1739 - Yating Wang, Haoyi Zhu, Mingyu Liu, Jiange Yang, Haoshu Fang, Tong He:

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers. 11089-11099 - Lijie Liu, Tianxiang Ma, Bingchuan Li, Zhuowei Chen, Jiawei Liu, Gen Li, Siyu Zhou, Qian He, Xinglong Wu:

Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment. 14951-14961 - Juan Yeo, Soonwoo Cha, Jiwoo Song, Hyunbin Jin, Taesup Kim:

ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction. 20390-20400 - Hui Zhang, Dexiang Hong, Yitong Wang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang:

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation. 18487-18497 - Bimsara Pathiraja, Maitreya Patel, Shivam Singh, Yezhou Yang, Chitta Baral:

RefEdit: A Benchmark and Method for Improving Instruction-Based Image Editing Model on Referring Expressions. 15646-15656 - Zitong Zhang, Suranjan Gautam, Rui Yu:

Top2Pano: Learning to Generate Indoor Panoramas from Top-Down View. 28493-28502 - Zengyu Wan, Wei Zhai, Yang Cao, Zhengjun Zha:

Emotive: Event-Guided Trajectory Modeling for 3D Motion Estimation. 9342-9351 - Jiaqi Han, Haotian Ye, Puheng Li, Minkai Xu, James Zou, Stefano Ermon:

CHORDS: Diffusion Sampling Accelerator with Multi-Core Hierarchical ODE Solvers. 19386-19395 - Fabio De Sousa Ribeiro, Omar Todd, Charles Jones, Avinash Kori, Raghav Mehta, Ben Glocker:

Flow Stochastic Segmentation Networks. 14754-14765 - Shubhendu Jena, Amine Ouasfi, Mae Younes, Adnane Boukhayma:

Sparfels: Fast Reconstruction from Sparse Unposed Imagery. 27476-27487 - Lin Zeng, Boming Zhao, Jiarui Hu, Xujie Shen, Ziqiang Dang, Hujun Bao, Zhaopeng Cui:

GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments. 25800-25809 - Yiming Zuo, Willow Yang, Zeyu Ma, Jia Deng:

OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration. 9287-9297 - Rui Liu, Sheng Fan, Wenguan Wang, Yi Yang:

Underwater Visual SLAM with Depth Uncertainty and Medium Modeling. 970-980 - Khaled Abud, Sergey Lavrushkin, Alexey Kirillov, Dmitriy S. Vatolin:

IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models. 15469-15480 - Maximilian Andreas Hoefler, Karsten Müller, Wojciech Samek:

FedXDS: Leveraging Model Attribution Methods to Counteract Data Heterogeneity in Federated Learning. 4572-4581 - Meiqi Gong, Hao Zhang, Xunpeng Yi, Linfeng Tang, Jiayi Ma:

TemCoCo: Temporally Consistent Multi-Modal Video Fusion with Visual-Semantic Collaboration. 14326-14335 - Jeong Woon Lee, Hyoseok Hwang:

Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning. 880-889 - Yifei Xia, Suhan Ling, Fangcheng Fu, Yujie Wang, Huixia Li, Xuefeng Xiao, Bin Cui:

Training-Free and Adaptive Sparse Attention for Efficient Long Video Generation. 15982-15993 - Daqian Shi, Xiaolei Diao, Xu Chen, Cédric M. John:

Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification. 2981-2990 - Yuan Liu, Saihui Hou, Saijie Hou, Jiabao Du, Shibei Meng, Yongzhen Huang:

OmniDiff: A Comprehensive Benchmark for Fine-Grained Image Difference Captioning. 21440-21449 - Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile, Florian Chabot, Loïc Barthe:

DiSCO-3D: Discovering and Segmenting Sub-Concepts from Open-Vocabulary Queries in NeRF. 20043-20052 - Guilian Chen, Huisi Wu, Jing Qin:

STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning. 21364-21373 - Jiajia Li, Huisi Wu, Jing Qin:

WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation. 21984-21993 - Liya Ji, Chenyang Qi, Qifeng Chen:

Instruction-Based Image Editing with Planning, Reasoning, and Generation. 17506-17515 - Yulin Wang, Mengting Hu, Hongli Li, Chen Luo:

HccePose(BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation. 7166-7175 - Chi-Ping Su, Ching-Hsun Tseng, Bin Pu, Lei Zhao, Jiewen Yang, Zhuangzhuang Chen, Shin-Jye Lee:

EA-KD: Entropy-Based Adaptive Knowledge Distillation. 731-740 - Xiaolei Wang, Xiaoyang Wang, Huihui Bai, Eng Gee Lim, Jimin Xiao:

DecAD: Decoupling Anomalies in Latent Space for Multi-Class Unsupervised Anomaly Detection. 21568-21577 - Zhiqiang Yan, Zhengxue Wang, Haoye Dong, Jun Li, Jian Yang, Gim Hee Lee:

DuCos: Duality Constrained Depth Super-Resolution via Foundation Model. 8361-8371 - Weiying Xie, Zihan Meng, Jitao Ma, Wenjin Guo, Haowei Li, Haonan Qin, Leyuan Fang, Yunsong Li:

Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization. 24615-24624 - Siyoon Jin, Jisu Nam, Jiyoung Kim, Dahyun Chung, Yeong-Seok Kim, Joonhyung Park, Heonjeong Chu, Seungryong Kim:

AM-Adapter: Appearance Matching Adapter for Exemplar-Based Semantic Image Synthesis In-the-Wild. 17077-17086 - Enyu Liu, En Yu, Sijia Chen, Wenbing Tao:

Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion. 26999-27009 - Hyewon Park, Hyejin Park, Jueun Ko, Dongbo Min:

Hybrid-Tta: Continual Test-Time Adaptation Via Dynamic Domain Shift Detection. 2877-2886 - Guosheng Zhao, Xiaofeng Wang, Chaojun Ni, Zheng Zhu, Wenkang Qin, Guan Huang, Xingang Wang:

ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation. 26718-26728 - Jiesi Hu, Hanyang Peng, Yanwu Yang, Xutao Guo, Yang Shang, Pengcheng Shi, Chenfei Ye, Ting Ma:

Neuroverse3D: Developing in-Context Learning Universal Model for Neuroimaging in 3D. 21721-21731 - Yuanhong Yu, Xingyi He, Chen Zhao, Junhao Yu, Jiaqi Yang, Ruizhen Hu, Yujun Shen, Xing Zhu, Xiaowei Zhou, Sida Peng:

BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation. 9374-9384 - Lena Wild, Rafael Valencia, Patric Jensfelt:

ArgoTweak: Towards Self-Updating HD Maps Through Structured Priors. 6091-6100 - Jiakai Zhang, Shouchen Zhou, Haizhao Dai, Xinhang Liu, Peihao Wang, Zhiwen Fan, Yuan Pei, Jingyi Yu:

CryoFastAR: Fast Cryo-EM AB Initio Reconstruction Made Easy. 8462-8471 - Bingyi Liu, Jian Teng, Hongfei Xue, Enshu Wang, Chuanhui Zhu, Pu Wang, Libing Wu:

mmCooper: A Multi-Agent Multi-Stage Communication-Efficient and Collaboration-Robust Cooperative Perception Framework. 28396-28406 - Xingsong Ye, Yongkun Du, Yunbo Tao, Zhineng Chen:

TextSSR: Diffusion-Based Data Synthesis for Scene Text Recognition. 17464-17473 - Maximilian Ulmer, Wout Boerdijk, Rudolph Triebel, Maximilian Durner:

Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation. 24360-24369 - Xinyu Hou, Zongsheng Yue, Xiaoming Li, Chen Change Loy:

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis. 19353-19362 - Hyojin Bahng, Caroline Chan, Frédo Durand, Phillip Isola:

Cycle Consistency as Reward: Learning Image-Text Alignment Without Human Preferences. 22934-22946 - Chuanyu Fu, Yuqi Zhang, Kunbin Yao, Guanying Chen, Yuan Xiong, Chuan Huang, Shuguang Cui, Xiaochun Cao:

RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS. 27126-27136 - Min Yang, Zihan Jia, Zhilin Dai, Sheng Guo, Limin Wang:

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices. 20824-20835 - Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nicholas I. Kolkin:

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models. 15994-16003 - Jing Wu, Mehrtash Harandi:

MUNBa: Machine Unlearning Via Nash Bargaining. 1-12 - Rui Yu, Xianghang Zhang, Runkai Zhao, Huaicheng Yan, Meng Wang:

DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model. 26188-26197 - Xu Cao, Takafumi Taketomi:

Neural Multi-View Self-Calibrated Photometric Stereo without Photometric Stereo Cues. 27552-27562 - Ao Li, Jinpeng Liu, Yixuan Zhu, Yansong Tang:

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion. 7592-7602 - Zhenbang Du, Yonggan Fu, Lifu Wang, Jiayi Qian, Xiao Luo, Yingyan (Celine) Lin:

Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment. 3001-3010 - Baihui Xiao, Chengjian Feng, Zhijian Huang, Feng Yan, Yujie Zhong, Lin Ma:

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case. 27380-27389 - Luca Collorone, Matteo Gioia, Massimiliano Pappa, Paolo Leoni, Giovanni Ficarra, Or Litany, Indro Spinelli, Fabio Galasso:

MonSTeR: A Unified Model for Motion, Scene, Text Retrieval. 10940-10949 - Lorenzo Mur-Labadia, Maria Santos-Villafranca, Jesus Bermudez-Cameo, Alejandro Pérez-Yus, Ruben Martinez-Cantin, Josechu J. Guerrero:

O-MaMa: Learning Object Mask Matching Between Egocentric and Exocentric Views. 6892-6903 - Yixiang Chen, Peiyan Li, Yan Huang, Jiabing Yang, Kehan Chen, Liang Wang:

EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow. 11958-11968 - Runhao Zeng, Jiaqi Mao, Minghao Lai, Minh Hieu Phan, Yanjie Dong, Wei Wang, Qi Chen, Xiping Hu:

OVG-HQ: Online Video Grounding with Hybrid-Modal Queries. 21085-21096 - Yi Wang, Zhitong Xiong, Chenying Liu, Adam J. Stewart, Thomas Dujardin, Nikolaos-Ioannis Bountos, Angelos Zavras, Franziska Gerken, Ioannis Papoutsis, Laura Leal-Taixé, Xiao Xiang Zhu:

Towards a Unified Copernicus Foundation Model for Earth Vision. 9888-9899 - Inwoo Hwang, Bing Zhou, Young Min Kim, Jian Wang, Chuan Guo:

Scenemi: Motion In-Betweening for Modeling Human-Scene Interactions. 6034-6045 - Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho, Chun-Hung Wu, Huu-Tai Phung, Martin Benjak, Jörn Ostermann, Wen-Hsiao Peng:

HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding. 17889-17898 - Zihan Zhou, Li Li, Yanli Ren, Chuan Qin, Guorui Feng:

Leveraging Spatial Invariance to Boost Adversarial Transferability. 1423-1432 - Longliang Liu, Miaojie Feng, Junda Cheng, Jijun Xiang, Xuan Zhu, Xin Yang:

PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View. 5326-5336 - Mahmoud Ahmed, Junjie Fei, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny:

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description. 8973-8983 - Yibin Yan, Jilan Xu, Shangzhe Di, Yikun Liu, Yudi Shi, Qirui Chen, Zeqian Li, Yifei Huang, Weidi Xie:

Learning Streaming Video Representation via Multitask Training. 9900-9912 - Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai, Yunjie Tian, Samuel P. Border, Nan Xi, Pinaki Sarder, Junsong Yuan, David S. Doermann, Xuan Gong:

PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions. 22415-22424 - Junhyeog Yun, Minui Hong, Gunhee Kim:

FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields. 2161-2171 - Renjie Lu, Yu Zhou, Hao Cheng, Jingke Meng, Wei-Shi Zheng:

monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation. 9477-9486 - Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman:

Switch-a-View: View Selection Learned from Unlabeled In-the-Wild Videos. 11969-11979 - Xuelin Zhu, Jian Liu, Jiuxin Cao, Bing Wang:

MambaMl: Exploring State Space Models for Multi-Label Image Classification. 4743-4753 - Hai Huang, Yan Xia, Shulei Wang, Hanting Wang, Minghui Fang, Shengpeng Ji, Sashuai Zhou, Tao Jin, Zhou Zhao:

Open-Set Cross Modal Generalization via Multimodal Unified Representation. 541-551 - Chengchao Zhang, Fanhua Shang, Hongyin Liu, Liang Wan, Wei Feng:

FedAGC: Federated Continual Learning with Asymmetric Gradient Correction. 3841-3850 - Geonho Bang, Minjae Seong, Jisong Kim, Geunju Baek, Daye Oh, Junhyung Kim, Junho Koh, Jun Won Choi:

RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion. 25315-25324 - Yi Liu, Shengqian Li, Zuzeng Lin, Feng Wang, Si Liu:

CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation. 15194-15204 - Xiaoxue Chen, Bhargav Chandaka, Chih-Hao Lin, Ya-Qin Zhang, David A. Forsyth, Hao Zhao, Shenlong Wang:

InvRGB+L: Inverse Rendering of Complex Scenes with Unified Color and LiDAR Reflectance Modeling. 27176-27186 - Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang:

Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension. 1173-1184 - Zixuan Hu, Dongxiao Li, Xinzhu Ma, Shixiang Tang, Xiaotong Li, Wenhan Yang, Ling-Yu Duan:

Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts. 7273-7283 - Boqian Li, Haiwen Feng, Zeyu Cai, Michael J. Black, Yuliang Xiu:

ETCH: Generalizing Body Fitting to Clothed Humans Via Equivariant Tightness. 8264-8274 - Geon Yeong Park, Sang Wan Lee, Jong Chul Ye:

Inference-Time Diffusion Model Distillation. 4049-4058 - Hang Guo, Yawei Li, Taolin Zhang, Jiangshan Wang, Tao Dai, Shu-Tao Xia, Luca Benini:

FastVAR: Linear Visual Autoregressive Modeling Via Cached Token Pruning. 19011-19021 - Jinpeng Dong, Chen Li, Yutong Lin, Jingwen Fu, Sanping Zhou, Nanning Zheng:

DAMap: Distance-Aware MapNet for High Quality HD Map Construction. 5285-5294 - Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong:

Moderating the Generalization of Score-Based Generative Model. 360-369 - Hao Zheng, Yuting Zheng, Hanbo Huang, Chaofan Sun, Enhui Liao, Lin Liu, Yi Han, Hao Zhou, Shiyu Liang:

$\text{CO}_{2}$-Net: A Physics-Informed Spatio-Temporal Model for Global Surface $\text{CO}_{2}$ Reconstruction. 6220-6230 - Han Yu, Kehan Li, Dongbai Li, Yue He, Xingxuan Zhang, Peng Cui:

ODP-Bench: Benchmarking Out-Of-Distribution Performance Prediction. 1-13 - Jin Cao, Hongrui Wu, Ziyong Feng, Hujun Bao, Xiaowei Zhou, Sida Peng:

UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction. 27031-27041 - Zhanfeng Liao, Hanzhang Tu, Cheng Peng, Hongwen Zhang, Boyao Zhou, Yebin Liu:

HADES: Human Avatar with Dynamic Explicit Hair Strands. 12318-12327 - Ke Fan, Shunlin Lu, Minyue Dai, Runyi Yu, Lixing Xiao, Zhiyang Dou, Junting Dong, Lizhuang Ma, Jingbo Wang:

Go to Zero: Towards Zero-Shot Motion Generation with Million-Scale Data. 13336-13348 - Jiarui Wang, Huiyu Duan, Yu Zhao, Juntong Wang, Guangtao Zhai, Xiongkuo Min:

LMM4LMM: Benchmarking and Evaluating Large-Multimodal Image Generation With LMMs. 17312-17323 - Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jinyang Guo, Dongzhan Zhou, Wanli Ouyang, Shixiang Tang:

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation. 7014-7024 - Shiming Chen, Bowen Duan, Salman Khan, Fahad Shahbaz Khan:

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model. 478-487 - Munish Monga, Vishal M. Chudasama, Pankaj Wasnik, Biplab Banerjee:

DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic. 3121-3131 - Yufei Shi, Weilong Yan, Gang Xu, Yumeng Li, Yucheng Chen, Zhenxi Li, Fei Yu, Ming Li, Si Yong Yeo:

PVChat: Personalized Video Chat with One-Shot Learning. 23321-23331 - Jaeseok Byun, Young Kyun Jang, Seokhyeon Jeong, Donghyun Kim, Taesup Moon:

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval. 21342-21352 - David A. Kelly, Akchunya Chanchal, Nathan Blake:

I Am Big, You Are Little; I Am Right, You Are Wrong. 1-10 - Renzhi He, Haowen Zhou, Yubei Chen, Yi Xue:

Recover Biological Structure from Sparse-View Diffraction Images with Neural Volumetric Prior. 27771-27782 - Mingqi Fang, Ziguang Li, Lingyun Yu, Quanwei Yang, Hongtao Xie, Yongdong Zhang:

Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces With Mixture of Experts. 17772-17782 - Baris Zöngür, Robin Hesse, Stefan Roth:

Activation Subspaces for Out-of-Distribution Detection. 3509-3519 - Bin Cao, Sipeng Zheng, Ye Wang, Lujie Xia, Qianshan Wei, Qin Jin, Jing Liu, Zongqing Lu:

MotionCtrl: A Real-Time Controllable Vision-Language-Motion Model. 12253-12262 - Thomas Kreutz, Max Mühlhäuser, Alejandro Sánchez Guinea:

DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding. 14633-14643 - Jie Feng, Shengyuan Wang, Tianhui Liu, Yanxin Xi, Yong Li:

UrbanLLaVA: A Multi-Modal Large Language Model for Urban Intelligence. 6209-6219 - Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu, Xiaoxiao Li:

Federated Representation Angle Learning. 1314-1324 - Yingqi Tang, Zhuoran Xu, Zhaotie Meng, Erkang Cheng:

HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder. 25605-25615 - Jong-Hyeon Baek, Jiwon Oh, Yeong Jun Koh:

EVOLVE: Event-Guided Deformable Feature Transfer and Dual-Memory Refinement for Low-Light Video Object Segmentation. 11273-11282 - Jie Chen, Zhangchi Hu, Peixi Wu, Huyue Zhu, Hebei Li, Xiaoyan Sun:

DASH: 4D Hash Encoding with Self-Supervised Decomposition for Real-Time Dynamic Scene Rendering. 26349-26359 - Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong:

SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets. 5122-5133 - Dibyadip Chatterjee, Edoardo Remelli, Yale Song, Bugra Tekin, Abhay Mittal, Bharat Bhatnagar, Necati Cihan Camgöz, Shreyas Hampali, Eric Sauser, Shugao Ma, Angela Yao, Fadime Sener:

Streaming Videollms for Real-Time Procedural Video Understanding. 22586-22598 - Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, Andrew Zisserman:

SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications. 21800-21811 - Zeqi Zheng, Yanchen Huang, Yingchao Yu, Zizheng Zhu, Junfeng Tang, Zhaofei Yu, Yaochu Jin:

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition. 24539-24548 - Chang Liu, Yunfan Ye, Fan Zhang, Qingyang Zhou, Yuchuan Luo, Zhiping Cai:

HumanSAM: Classifying Human-Centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly. 14028-14038 - Varun Sundar, Tianyi Zhang, Sacha Jungerman, Mohit Gupta:

Quanta Neural Networks: From Photons to Perception. 5091-5101 - Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li:

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling. 14822-14833 - Haoyu Yao, Bin Yang, Wenke Huang, Bo Du, Mang Ye:

Unsupervised Visible-Infrared Person Re-Identification Under Unpaired Settings. 11916-11926 - Benjin Zhu, Xiaogang Wang, Hongsheng Li:

ConsistentCity: Semantic Flow-Guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis. 26382-26392 - Javier Tirado-Garín, Javier Civera:

AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration. 8044-8055 - Jiangran Lyu, Ziming Li, Xuesong Shi, Chaoyi Xu, Yizhou Wang, He Wang:

DyWA: Dynamics-Adaptive World Action Model for Generalizable Non-Prehensile Manipulation. 11058-11068 - Changwoon Choi, Jeongjun Kim, Geonho Cha, Minkwan Kim, Dongyoon Wee, Young Min Kim:

Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos. 6598-6608 - Jihun Kim, Hoyong Kwon, Hyeokjun Kweon, Wooseong Jeong, Kuk-Jin Yoon:

DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation. 23279-23289 - Zican Wang, Michael Fischer, Tobias Ritschel:

Stochastic Gradient Estimation for Higher-Order Differentiable Rendering. 28198-28206 - Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou, Bernhard Kainz, Stefanos Zafeiriou:

SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models. 14346-14356 - Stanislaw Szymanowicz, Jason Y. Zhang, Pratul P. Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T. Barron, Philipp Henzler:

Bolt3D: Generating 3D Scenes in Seconds. 24846-24857 - Ruiyun Yu, Bingyang Guo, Haoyuan Li:

Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application. 22563-22574 - Youwei Zheng, Yuxi Ren, Xin Xia, Xuefeng Xiao, Xiaohua Xie:

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation. 18661-18670 - Sixiang Chen, Tian Ye, Yunlong Lin, Yeying Jin, Yijun Yang, Haoyu Chen, Jianyu Lai, Song Fei, Zhaohu Xing, Fugee Tsung, Lei Zhu:

Genhaze: Pioneering Controllable One-Step Realistic Haze Generation for Real-World Dehazing. 9194-9205 - Haoyang Liu, Yijiang Li, Tiancheng Xing, Peiran Wang, Vibhu Dalal, Luwei Li, Jingrui He, Haohan Wang:

Dataset Distillation via the Wasserstein Metric. 1205-1215 - Michihiro Kuroki, Toshihiko Yamasaki:

CE-FAM: Concept-Based Explanation via Fusion of Activation Maps. 1413-1422 - Tianao Li, Manxiu Cui, Cheng Ma, Emma Alexander:

Coordinate-Based Speed of Sound Recovery for Aberration-Corrected Photoacoustic Computed Tomography. 27466-27475 - Jinsoo Bae, Seoung Bum Kim, Hyungrok Do:

CaliMatch: Adaptive Calibration for Improving Safe Semi-Supervised Learning. 2867-2876 - Yapeng Meng, Yihan Lin, Taoyi Wang, Yuguo Chen, Lijian Wang, Rong Zhao:

Diffusion-Based Extreme High-Speed Scenes Reconstruction with the Complementary Vision Sensor. 5701-5710 - Mingqian Ji, Shanshan Zhang, Jian Yang:

OcRFDet: Object-Centric Radiance Fields for Multi-View 3D Object Detection in Autonomous Driving. 24933-24942 - Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu:

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding. 14602-14612 - Zuyu Zhang, Ning Chen, Yongshan Liu, Qinghua Zhang, Xu Zhang:

Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization. 552-561 - Ziling Wu, Armaghan Moemeni, Praminda Caleb-Solly:

Ensemble Foreground Management for Unsupervised Object Discovery. 20268-20279 - Xuying Zhang, Yutong Liu, Yangguang Li, Renrui Zhang, Yufei Liu, Kai Wang, Wanli Ouyang, Zhiwei Xiong, Peng Gao, Qibin Hou, Ming-Ming Cheng:

TAR3D: Creating High-Quality 3D Assets Via Next-Part Prediction. 1-12 - Zhefei Gong, Pengxiang Ding, Shangke Lyu, Siteng Huang, Mingyang Sun, Wei Zhao, Zhaoxin Fan, Donglin Wang:

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction. 13460-13470 - Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Yu Qiao, Bo Zhang, Xiaohong Liu, Hongsheng Li, Chang Xu, Peng Gao:

Lumina-Image 2.0: a Unified and Efficient Image Generative Framework. 20031-20042 - Emily Yue-Ting Jia, Jiageng Mao, Zhiyuan Gao, Yajie Zhao, Yue Wang:

Learning an Implicit Physics Model for Image-Based Fluid Simulation. 7055-7057 - Jianlong Jin, Chenglong Zhao, Ruixin Zhang, Sheng Shang, Yang Zhao, Jun Wang, Jingyun Zhang, Shouhong Ding, Wei Jia, Yunsheng Wu:

Unified Adversarial Augmentation for Improving Palmprint Recognition. 14141-14151 - Dejie Yang, Zijing Zhao, Yang Liu:

AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning. 6818-6827 - Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas J. Guibas, Songyou Peng:

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization. 7808-7817 - Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher A. Metzler:

Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models. 18207-18217 - Bo-Lun Huang, Zi-Xiang Ni, Feng-Kai Huang, Hong-Han Shuai, Wen-Huang Cheng:

When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection. 27917-27926 - Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang:

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos. 21655-21666 - Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Weiming Zhang, Nenghai Yu:

Deciphering Cross-Modal Alignment in Large Vision-Language Models Via Modality Integration Rate. 218-227 - Wentao Xiang, Haoxian Tan, Yujie Zhong, Cong Wei, Dengjie Li, Yujiu Yang:

Advancing Visual Large Language Model for Multi-Granular Versatile Perception. 22153-22164 - Artem V. Nikonorov, Georgy Perevozchikov, Andrei Korepanov, Nancy Mehta, Mahmoud Afifi, Egor Ershov, Radu Timofte:

Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks. 7099-7109 - Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen, Hsien-Kai Kuo, Chun-Yi Lee:

EAMamba: Efficient All-Around Vision State Space Model for Image Restoration. 11708-11719 - Zhixuan Liu, Haokun Zhu, Rui Chen, Jonathan Francis, Soonmin Hwang, Ji Zhang, Jean Oh:

MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments. 27456-27465 - Takumi Kobayashi:

Temperature in Cosine-based Softmax Loss. 22199-22208 - Ruifei Zhang, Wei Zhang, Xiao Tan, Sibei Yang, Xiang Wan, Xiaonan Luo, Guanbin Li:

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-Grounded Autonomous Driving. 5923-5933 - Wanchang Yu, Qing Zhang, Rongjia Zheng, Wei-Shi Zheng:

Structure-Guided Diffusion Models for High-Fidelity Portrait Shadow Removal. 11675-11684 - Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, Ye Yuan:

GENMO: A GENeralist Model for Human MOtion. 11766-11776 - Xiaobao Wei, Peng Chen, Guangyu Li, Ming Lu, Hui Chen, Feng Tian:

GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting. 13293-13303 - Björn Braun, Rayan Armani, Manuel Meier, Max Möbus, Christian Holz:

EgoPPG: Heart Rate Estimation From Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks. 1-12 - Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia:

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers. 14464-14474 - Inkyu Shin, Chenglin Yang, Liang-Chieh Chen:

Deeply Supervised Flow-Based Generative Models. 16535-16544 - Yingying Zhang, Lixiang Ru, Kang Wu, Lei Yu, Lei Liang, Yansheng Li, Jingdong Chen:

SkySense V2: A Unified Foundation Model for Multi-Modal Remote Sensing. 9136-9146 - Kaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin:

Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering. 9091-9101 - Maolin Wei, Wanzhou Liu, Eshed Ohn-Bar:

Passing the Driving Knowledge Test. 8395-8406 - Wenjin Zhang, Xinyu Li, Chenyang Gao, Ivan Marsic:

SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance. 1195-1204 - Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, Anyang Wei, Perry Pengyun Gu, Lingyun Sun:

Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion. 5007-5016 - Mincheol Park, Cheonjun Park, Seungseop Lim, Mijin Koo, Hyunwuk Lee, Won Woo Ro, Suhyun Kim:

Adversarial Purification via Super-Resolution and Diffusion. 4605-4615 - Shunya Nagashima, Komei Sugiura:

Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images. 9396-9405 - Peng Wu, Qiuxia Lai, Hao Fang, Guo-Sen Xie, Yilong Yin, Xiankai Lu, Wenguan Wang:

A Conditional Probability Framework for Compositional Zero-Shot Learning. 3673-3683 - Yijun Liang, Shweta Bhardwaj, Tianyi Zhou:

Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion. 1697-1707 - Jiajin Tang, Zhengxuan Wei, Ge Zheng, Sibei Yang:

Closed-Loop Transfer for Weakly-Supervised Affordance Grounding. 9530-9539 - Ruoyu Wang, Huayang Huang, Ye Zhu, Olga Russakovsky, Yu Wu:

The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation. 17618-17628 - Wangbo Yu, Chaoran Feng, Jianing Li, Jiye Tang, Jiashu Yang, Zhenyu Tang, Meng Cao, Xu Jia, Yuchao Yang, Li Yuan, Yonghong Tian:

Evagaussians: Event Stream Assisted Gaussian Splatting from Blurry Images. 24780-24790 - Yang Xiao, Wang Lu, Jie Ji, Ruimeng Ye, Gen Li, Xiaolong Ma, Bo Hui:

Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing. 20445-20455 - Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao, Yueyao Lin, Linkai Liu, Zipeng Guo, Hao Fei, Xiaobo Xia, Chao Gou:

Where, What, Why: Towards Explainable Driver Attention Prediction. 2675-2685 - Yuning Gong, Jiaming Chen, Xiaohua Ren, Yuanjun Liao, Yanci Zhang:

FlowStyler: Artistic Video Stylization Via Transformation Fields Transports. 10229-10238 - Juliette Marrie, Romain Menegaux, Michael Arbel, Diane Larlus, Julien Mairal:

LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes. 7440-7450 - Yuran Dong, Mang Ye:

Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images. 1-10 - Tianyu Zhang, Xin Luo, Li Li, Dong Liu:

StableCodec: Taming One-Step Diffusion for Extreme Image Compression. 17379-17389 - Chikai Shang, Mengke Li, Yiqun Zhang, Zhen Chen, Jinlin Wu, Fangqing Gu, Yang Lu, Yiu-ming Cheung:

PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation. 1558-1568 - Romain Thoreau, Valerio Marsocci, Dawa Derksen:

Parameter-Efficient Adaptation of Geospatial Foundation Models Through Embedding Deflection. 9594-9604 - Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang:

DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference. 15481-15490 - Zheng Zhang, Lihe Yang, Tianyu Yang, Chaohui Yu, Xiaoyang Guo, Yixing Lao, Hengshuang Zhao:

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth. 7069-7078 - Siyu Jiao, Haoye Dong, Yuyang Yin, Zequn Jie, Yinlong Qian, Yao Zhao, Humphrey Shi, Yunchao Wei:

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting. 4670-4680 - Wenda Shi, Yiren Song, Dengming Zhang, Jiaming Liu, Xingxing Zou:

FonTS: Text Rendering with Typography and Style Controls. 18463-18474 - Loïck Chambon, Eloi Zablocki, Alexandre Boulch, Mickaël Chen, Matthieu Cord:

GaussRender: Learning 3D Occupancy with Gaussian Rendering. 27010-27020 - Binbin Xiang, Maciej Wielgosz, Stefano Puliti, Kamil Král, Martin Krucek, Azim Missarov, Rasmus Astrup:

ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds. 24717-24727 - Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong, Amir Hosein Khasahmadi, Rahul G. Krishnan:

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models. 7318-7328 - Zhixi Cai, Fucai Ke, Simindokht Jahangard, Maria Garcia de la Banda, Reza Haffari, Peter J. Stuckey, Hamid Rezatofighi:

Naver: a Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning. 24078-24089 - Soham Dasgupta, Shanthika Naik, Preet Savalia, Sujay Kumar Ingle, Avinash Sharma:

NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction. 25485-25495 - Yaowu Fan, Jia Wan, Tao Han, Antoni B. Chan, Andy J. Ma:

Video Individual Counting for Moving Drones. 1-10 - Xi Cheng, Ruiqi Lei, Di Huang, Zhichao Liao, Fengyuan Piao, Yan Chen, Pingfa Feng, Long Zeng:

Constraint-Aware Feature Learning for Parametric Point Cloud. 28114-28124 - Yuhwan Jeong, Yunseo Yang, Youngho Yoon, Kuk-Jin Yoon:

Robust Adverse Weather Removal via Spectral-based Spatial Grouping. 11872-11883 - Simon Reiß, Zdravko Marinov, Alexander Jaus, Constantin Seibold, M. Saquib Sarfraz, Erik Rodner, Rainer Stiefelhagen:

Is Visual in-Context Learning for Compositional Medical Tasks Within Reach? 2642-2652 - Fei Peng, Junqiang Wu, Yan Li, Tingting Gao, Di Zhang, Huiyuan Fu:

MUSE: Multi-Subject Unified Synthesis Via Explicit Layout Semantic Expansion. 15885-15895 - Yangfu Li, Hongjian Zhan, Qi Liu, Li Sun, Yu-Jie Xiong, Yue Lu:

MSA2: Multi-Task Framework With Structure-Aware and Style-Adaptive Character Representation for Open-Set Chinese Text Recognition. 23095-23104 - Ying Xue, Jiaxi Jiang, Rayan Armani, Dominik Hollidt, Yi-Chi Liao, Christian Holz:

Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging. 24910-24921 - Han Han, Wei Zhai, Yang Cao, Bin Li, Zhengjun Zha:

MATE: Motion-Augmented Temporal Consistency for Event-Based Point Tracking. 8340-8349 - Walid Bousselham, Angie W. Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne:

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity. 1-10 - Tongyan Hua, Lutao Jiang, Ying-Cong Chen, Wufan Zhao:

Sat2City: 3D City Generation from a Single Satellite Image with Cascaded Latent Diffusion. 27978-27988 - Jongseob Yun, Yong-Hoon Kwon, Min-Gyu Park, Ju-Mi Kang, Min-Ho Lee, Inho Chang, Ju Hong Yoon, Kuk-Jin Yoon:

WarpHE4D: Dense 4D Head Map Toward Full Head Reconstruction. 11480-11490 - Jiahui Wang, Zuyan Liu, Yongming Rao, Jiwen Lu:

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs. 23177-23187 - Luoxi Zhang, Pragyan Shrestha, Yu Zhou, Chun Xie, Itaru Kitahara:

Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction. 25104-25113 - Yan Li, Yang Xu, Changhao Chen, Zhongchen Shi, Wei Chen, Liang Xie, Hongbo Chen, Erwei Yin:

M2EIT: Multi-Domain Mixture of Experts for Robust Neural Inertial Tracking. 28207-28216 - Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, Jun Zhu:

DeepMesh: Auto-Regressive Artist-Mesh Creation with Reinforcement Learning. 10612-10623 - Zhixiang Chi, Yanan Wu, Li Gu, Huan Liu, Ziqiang Wang, Yang Zhang, Yang Wang, Konstantinos N. Plataniotis:

Plug-in Feedback Self-Adaptive Attention in CLIP for Training-Free Open-Vocabulary Segmentation. 22815-22825 - Shengcao Cao, Zijun Wei, Jason Kuen, Kangning Liu, Lingzhi Zhang, Jiuxiang Gu, Hyunjoon Jung, Liang-Yan Gui, Yu-Xiong Wang:

Refer to Any Segmentation Mask Group with Vision-Language Prompts. 21853-21863 - JiaKui Hu, Zhengjian Yao, Lujia Jin, Hangzhou He, Yanye Lu:

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance. 16047-16057 - Brian K. S. Isaac-Medina, Mauricio Che, Yona Faline A. Gaus, Samet Akcay, Toby P. Breckon:

FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection. 4529-4538 - Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu, Tzu-Ling Lin, Hong-Han Shuai:

TF-TI2I: Training-Free Text-And-Image-To-Image Generation via Multi-Modal Implicit-Context Learning in Text-To-Image Models. 18377-18387 - Yunwei Lan, Zhigao Cui, Xin Luo, Chang Liu, Nian Wang, Menglin Zhang, Yanzhao Su, Dong Liu:

When Schrödinger Bridge Meets Real-World Image Dehazing with Unpaired Training. 8756-8765 - Mingze Sun, Shiwei Mao, Keyi Chen, Yurun Chen, Shunlin Lu, Jingbo Wang, Junting Dong, Ruqi Huang:

ARMO: Autoregressive Rigging for Multi-Category Objects. 7721-7730 - Zhirui Gao, Renjiao Yi, Yuhang Huang, Wei Chen, Chenyang Zhu, Kai Xu:

Self-Supervised Learning of Hybrid Part-Aware 3D Representations of 2D Gaussians and Superquadrics. 9649-9659 - Yunheng Li, Yuxuan Li, Quan-Sheng Zeng, Wenhai Wang, Qibin Hou, Ming-Ming Cheng:

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction. 23795-23805 - Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez, Paul Couairon, Clément Rambour, Raphaël Fournier-S'niehotta, Ismail Ben Ayed, Jose Dolz, Nicolas Thome:

ViLU: Learning Vision-Language Uncertainties for Failure Prediction. 17807-17817 - Siqi Luo, Haoran Yang, Yi Xin, Mingyang Yi, Guangyang Wu, Guangtao Zhai, Xiaohong Liu:

TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning. 4360-4369 - Xilin He, Cheng Luo, Xiaole Xian, Bing Li, Muhammad Haris Khan, Zongyuan Ge, Weicheng Xie, Siyang Song, Linlin Shen, Bernard Ghanem, Xiangyu Yue:

SynFER: Towards Boosting Facial Expression Recognition With Synthetic Data. 10184-10195 - Soumyadipta Banerjee, Jiaul H. Paik, Debashis Sen:

Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation. 29001-29009 - Jieming Bian, Lei Wang, Letian Zhang, Jie Xu:

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement. 3737-3746 - Lei Tian, Xiaomin Li, Liqian Ma, Hao Yin, Zirui Zheng, Hefei Huang, Taiqing Li, Huchuan Lu, Xu Jia:

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting. 9855-9864 - Anjun Hu, Richard Tomsett, Valentin Gourmet, Massimo Camplani, Jas Kandola, Hanting Xie:

MiDSummer: Multi-Guidance Diffusion for Controllable Zero-Shot Immersive Gaussian Splatting Scene Generation. 26793-26805 - Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Weijia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He:

LEGION: Learning to Ground and Explain for Synthetic Image Detection. 18937-18947 - Junwen Huang, Shishir Reddy Vutukur, Peter KT Yu, Nassir Navab, Slobodan Ilic, Benjamin Busam:

RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation. 9700-9710 - Hao Chen, Tao Han, Song Guo, Jie Zhang, Yonghan Dong, Yue Yu, Lei Bai:

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting. 7915-7924 - Yuanhong Zheng, Ruixuan Yu, Jian Sun:

Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions. 10844-10853 - Ruijie Lu, Yixin Chen, Yu Liu, Jiaxiang Tang, Junfeng Ni, Diwen Wan, Gang Zeng, Siyuan Huang:

TACO: Taming Diffusion for In-the-Wild Video Amodal Completion. 13639-13650 - Yinhan Zhang, Yue Ma, Bingyuan Wang, Qifeng Chen, Zeyu Wang:

Magiccolor: Multi-Instance Sketch Colorization. 15205-15217 - Fangwei Zhong, Kui Wu, Churan Wang, Hao Chen, Hai Ci, Zhoujun Li, Yizhou Wang:

UnrealZoo: Enriching Photo-Realistic Virtual Worlds for Embodied AI. 5769-5779 - Wenjie Zhuo, Fan Ma, Hehe Fan:

InfiniDreamer: Arbitrarily Long Human Motion Generation Via Segment Score Distillation. 14688-14698 - Zhimin Liao, Ping Wei, Ruijie Zhang, Shuaijia Chen, Haoxuan Wang, Ziyang Ren:

$I^{\mathbf{2}}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting. 25810-25819 - Qinglei Cao, Ziyao Tang, Xiaoqin Tang:

TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-View Imaging. 28239-28248 - Yujian Lee, Peng Gao, Yongqi Xu, Wentao Fan:

How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation? 23342-23352 - Ilan Naiman, Emanuel Ben Baruch, Oron Anschel, Alon Shoshan, Igor Kviatkovsky, Manoj Aggarwal, Gérard G. Medioni:

LV-MAE: Learning Long Video Representations Through Masked-Embedding Autoencoders. 21398-21407 - Chuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Xiaoyue Duan, Jinchao Zhang, Jie Zhou:

MCID: Multi-aspect Copyright Infringement Detection for Generated Images. 16154-16164 - Yooshin Cho, Hanbyel Cho, Janghyeon Lee, Hyeong Gwon Hong, Jaesung Ahn, Junmo Kim:

Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation. 4550-4560 - Zhihao Zhu, Yifan Zheng, Siyu Pan, Yaohui Jin, Yao Mu:

PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation. 8950-8960 - Matic Fucka, Vitjan Zavrtanik, Danijel Skocaj:

SALAD - Semantics-Aware Logical Anomaly Detection. 21843-21852 - Mengkun She, Felix Seegräber, David Nakath, Patricia Schöntag, Kevin Köser:

Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes. 29110-29119 - Teng Ma, Xiaojun Jia, Ranjie Duan, Xinfeng Li, Yihao Huang, Xiaoshuang Jia, Zhixuan Chu, Wenqi Ren:

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models. 2686-2696 - Chaesong Park, Eunbin Seo, Jihyeon Hwang, Jongwoo Lim:

SC-Lane: Slope-Aware and Consistent Road Height Estimation Framework for 3D Lane Detection. 28407-28416 - Xidan Zhang, Yihan Zhuang, Qian Guo, Haodong Yang, Xuelin Qian, Gong Cheng, Junwei Han, Zhongling Huang:

$\Phi$-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data. 29075-29085 - Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton:

Di[M]O: Distilling Masked Diffusion Models Into One-Step Generator. 18606-18618 - Handong Li, Yiyuan Zhang, Longteng Guo, Xiangyu Yue, Jing Liu:

Breaking the Encoder Barrier for Seamless Video-Language Understanding. 23167-23176 - Han Wang, Shengyang Li, Jian Yang, Yuxuan Liu, Yixuan Lv, Zhuang Zhou:

Cross-Modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method. 7873-7883 - Donggeun Lim, Jinseok Bae, Inwoo Hwang, Seungmin Lee, Hwanhee Lee, Young Min Kim:

Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene. 11654-11664 - Hyung Kyu Kim, Sangmin Lee, Hak Gu Kim:

MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization. 11241-11251 - Jingwen Deng, Zihao Wang, Shaofei Cai, Anji Liu, Yitao Liang:

Open-World Skill Discovery from Unsegmented Demonstration Videos. 10708-10718 - Yan Zhuang, Minhao Liu, Wei Bai, Yanru Zhang, Xiaoyue Zhang, Jiawen Deng, Fuji Ren:

CMAD: Correlation-Aware and Modalities-Aware Distillation for Multimodal Sentiment Analysis with Missing Modalities. 4626-4636 - Mengyu Wang, Henghui Ding, Jianing Peng, Yao Zhao, Yunpeng Chen, Yunchao Wei:

CharaConsist: Fine-Grained Consistent Character Generation. 16058-16067 - Shawn Li, Peilin Cai, Yuxiao Zhou, Zhiyu Ni, Renjie Liang, You Qin, Yi Nian, Zhengzhong Tu, Xiyang Hu, Yue Zhao:

Secure On-Device Video OOD Detection without Backpropagation. 112-121 - Kasra Arabi, R. Teal Witter, Chinmay Hegde, Niv Cohen:

SEAL: Semantic Aware Image Watermarking. 16196-16205 - Hsuan-I Ho, Chen Guo, Po-Chen Wu, Ivan Shugurov, Chengcheng Tang, Abhay Mittal, Sizhe An, Manuel Kaufmann, Linguang Zhang:

PHD: Personalized 3D Human Body Fitting with Point Diffusion. 7526-7537 - Yiyuan Zhang, Handong Jing, Jing Liu, Xiangyu Yue:

Learning Beyond Still Frames: Scaling Vision-Language Models with Video. 22425-22435 - Junqi Ge, Ziyi Chen, Jintao Lin, Jinguo Zhu, Xihui Liu, Jifeng Dai, Xizhou Zhu:

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding. 21070-21084 - Sandro Papais, Letian Wang, Brian Cheong, Steven L. Waslander:

ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting. 25474-25484 - Junyi Guo, Jingxuan Zhang, Fangyu Wu, Huanda Lu, Qiufeng Wang, Wenmian Yang, Eng Gee Lim, Dongming Lu:

HiGarment: Cross-Modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image. 18542-18551 - Tri Ton, Ji Woo Hong, Chang D. Yoo:

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-To-Audio Synthesis. 14228-14237 - Guanyi Qin, Ziyue Wang, Daiyun Shen, Haofeng Liu, Hantao Zhou, Junde Wu, Runze Hu, Yueming Jin:

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation. 14431-14442 - Yingyan Xu, Kate Gadola, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley:

Monocular Facial Appearance Capture in the Wild. 12078-12088 - Haodong Zhu, Wenhao Dong, Linlin Yang, Hong Li, Yuguang Yang, Yangyang Ren, Qingcheng Zhu, Zichao Feng, Changbai Li, Shaohui Lin, Runqi Wang, Xiaoyan Luo, Baochang Zhang:

WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection. 11219-11229 - Hengzhe Jin, Lang Nie, Chunyu Lin, Xiaomei Feng, Yao Zhao:

PixelStitch: Structure-Preserving Pixel-Wise Bidirectional Warps for Unsupervised Image Stitching. 28125-28134 - Hyolim Kang, Yunsu Park, Youngbeom Yoo, Yeeun Choi, Seon Joo Kim:

Open-Ended Hierarchical Streaming Video Understanding with Vision Language Models. 20715-20725 - Chengyu Tao, Xuanming Cao, Juan Du:

G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection. 20551-20560 - Ronggang Huang, Haoxin Yang, Yan Cai, Xuemiao Xu, Huaidong Zhang, Shengfeng He:

ViewSRD: 3D Visual Grounding Via Structured Multi-View Decomposition. 11 - Hyeonwoo Kim, Sangwon Baik, Hanbyul Joo:

DAViD: Modeling Dynamic Affordance of 3D Objects Using Pre-Trained Video Diffusion Models. 10330-10341 - Jiayi Li:

SD2 Actor: Continuous State Decomposition Via Diffusion Embeddings for Robotic Manipulation. 13751-13760 - Chiao-An Yang, Raymond A. Yeh:

Heatmap Regression without Soft-Argmax for Facial Landmark Detection. 28729-28739 - Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad, Zsolt Kira:

EmbodiedSplat: Personalized Real-To-Sim-To-Real Navigation with Gaussian Splats From a Mobile Device. 25431-25441 - Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu, Radu Timofte:

After the Party: Navigating the Mapping from Color to Ambient Lighting. 9218-9229 - Shangpin Peng, Songiao Yang, Li Jiang, Zhuotao Tian:

Mitigating Object Hallucinations via Sentence-Level Early Intervention. 635-646 - Zichen Tang, Haihong E, Jiacheng Liu, Zhongjun Yang, Rongjin Li, Zihua Rong, Haoyang He, Zhuodi Hao, Xinyang Hu, Kun Ji, Ziyan Ma, Mengyuan Ji, Jun Zhang, Chenghao Ma, Qianhe Zheng, Yang Liu, Yiling Huang, Xinyi Hu, Qing Huang, Zijian Xie, Shiyao Peng:

$\mathcal{F}_{M}$ FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging. 3245-3257 - Haipeng Xiong, Kai Xu, Angela Yao:

Diagnosing Pretrained Models for Out-of-Distribution Detection. 1836-1845 - Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang:

Bootstrap3D: Improving Multi-View Diffusion Model with Synthetic Data. 15714-15726 - Haoran Wang, Zekun Li, Jian Zhang, Lei Qi, Yinghuan Shi:

Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild. 1-10 - Jiaxin Liu, Qichao Ying, Zhenxing Qian, Sheng Li, Runqi Zhang, Jian Liu, Xinpeng Zhang:

MoFRR: Mixture of Diffusion Models for Face Retouching Restoration. 12842-12851 - Ruofan Wang, Juncheng Li, Yixu Wang, Bo Wang, Xiaosen Wang, Yan Teng, Yingchun Wang, Xingjun Ma, Yu-Gang Jiang:

Ideator: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves. 8875-8884 - Giacomo D'Amicantonio, Snehashis Majhi, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond, Egor Bondarev:

Mixture of Experts Guided by Gaussian Splatters Matters: A New Approach to Weakly-Supervised Video Anomaly Detection. 10275-10285 - Zhengxuan Wei, Jiajin Tang, Sibei Yang:

Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning. 3401-3412 - Hanqing Liu, Shouwei Ruan, Yao Huang, Shiji Zhao, Xingxing Wei:

When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack. 10485-10495 - Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag:

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation. 18090-18100 - Meiqi Wang, Han Qiu:

VISO: Accelerating In-Orbit Object Detection with Language-Guided Mask Learning and Sparse Inference. 23300-23310 - Feixiang Wang, Shuang Yang, Shiguang Shan, Xilin Chen:

CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement. 21408-21418 - Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan, Gady Agam:

MaskSAM: Auto-Prompt SAM with Mask Classification for Volumetric Medical Image Segmentation. 24423-24433 - Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian:

Mobile Video Diffusion. 19450-19460 - Yihang Zhu, Jinhao Zhang, Yuxuan Wang, Aming Wu, Cheng Deng:

VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding. 5295-5304 - Wenxuan Zhu, Bing Li, Cheng Zheng, Jinjie Mai, Jun Chen, Letian Jiang, Abdullah Hamdi, Sara Rojas Martinez, Chia-Wen Lin, Mohamed Elhoseiny, Bernard Ghanem:

4D-Bench: Benchmarking Multi-Modal Large Language Models for 4D Object Understanding. 21129-21143 - Yu Lei, Bingde Liu, Qingsong Xie, Haonan Lu, Zhijie Deng:

Advancing Text-to-3D Generation with Linearized Lookahead Variational Score Distillation. 19567-19576 - Xiang Xu, Lingdong Kong, Song Wang, Chuanwei Zhou, Qingshan Liu:

Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations. 25506-25518 - Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas, Jiankang Deng, Stefanos Zafeiriou:

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator. 23806-23816 - Athinoulla Konstantinou, Georgios Leontidis, Mamatha Thota, Aiden Durrant:

Equicaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks. 7947-7957 - Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo:

Learning 3D Object Spatial Relationships From Pre-Trained 2D Diffusion Models. 8418-8428 - Jizong Peng, Tze Ho Elden Tse, Kai Xu, Wenchao Gao, Angela Yao:

A Constrained Optimization Approach for Gaussian Splatting from Coarsely-Posed Images and Noisy Lidar Point Clouds. 2961-2970 - Zhongdao Wang, Guodongfang Zhao, Jingjing Ren, Bailan Feng, Shifeng Zhang, Wenbo Li:

TurboVSR: Fantastic Video Upscalers and Where to Find Them. 18132-18142 - Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Zhiguo Cao, Wei Li, Ziwei Liu:

Free4D: Tuning-Free 4D Scene Generation with Spatial-Temporal Consistency. 25571-25582 - Hongrui Yu, Lu Qi, Wanyu Lin, Jian Chen, Hailong Sun, Chengbin Sun:

Backdoor Defense via Enhanced Splitting and Trap Isolation. 1708-1717 - Zhi Hou, Tianyi Zhang, Yuwen Xiong, Haonan Duan, Hengjun Pu, Ronglei Tong, Chengyang Zhao, Xizhou Zhu, Yu Qiao, Jifeng Dai, Yuntao Chen:

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy. 7686-7697 - Meihan Wu, Tao Chang, Cui Miao, Jie Zhou, Chun Li, Xiangyu Xu, Ming Li, Xiaodong Wang:

EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Clients. 1815-1824 - Zijie Xin, Minquan Wang, Jingyu Liu, Quan Chen, Ye Ma, Peng Jiang, Xirong Li:

Music Grounding by Short Video. 22285-22293 - Shuo Zhang, Chen Gao, Youfang Lin:

Exploring View Consistency for Scene-Adaptive Low-Light Light Field Image Enhancement. 1-10 - Dadong Jiang, Zhi Hou, Zhihui Ke, Xianghui Yang, Xiaobo Zhou, Tie Qiu:

Timeformer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction. 8721-8732 - Ahmed S. Nassar, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Peter W. J. Staar:

SmolDocling: An Ultra-Compact Vision-Language Model for End-To-End Multi-Modal Document Conversion. 21972-21983 - Yi Chen, Yuying Ge, Weiliang Tang, Yizhuo Li, Yixiao Ge, Mingyu Ding, Ying Shan, Xihui Liu:

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos. 19752-19763 - Liwei Luo, Shuaitengyuan Li, Dongwei Ren, Qilong Wang, Pengfei Zhu, Qinghua Hu:

Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning. 3628-3638 - Zeyu Wang, Jizheng Zhang, Haiyu Song, Mingyu Ge, Jiayu Wang, Haoran Duan:

Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion. 12637-12647 - Hanyu Zhou, Haonan Wang, Haoyue Liu, Yuxing Duan, Luxin Yan, Gim Hee Lee:

STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene. 24801-24810 - Peng Zheng, Junke Wang, Yi Chang, Yizhou Yu, Rui Ma, Zuxuan Wu:

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis. 17390-17400 - Yunfei Long, Zilin Tian, Liguo Zhang, Huosheng Xu:

Boosting Adversarial Transferability via Negative Hessian Trace Regularization. 2386-2395 - Shuai Liu, Peng Zhang, Shiwei Zhang, Wei Ke:

CountSE: Soft Exemplar Open-Set Object Counting. 21536-21546 - Alireza Esmaeilzehi, Hossein Zaredar, Yapeng Tian, Laleh Seyyed-Kalantari:

ZFusion: Efficient Deep Compositional Zero-Shot Learning for Blind Image Super-Resolution with Generative Diffusion Prior. 12338-12348 - Haoxuan Wang, Zhenghao Zhao, Junyi Wu, Yuzhang Shang, Gaowen Liu, Yan Yan:

CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation. 4722-4731 - Jiale Xu, Shenghua Gao, Ying Shan:

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction. 25442-25452 - Zhaolun Li, Jichang Li, Yinqi Cai, Junye Chen, Xiaonan Luo, Guanbin Li, Rushi Lan:

FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos. 13382-13392 - Junjia Huang, Pengxiang Yan, Jinhang Cai, Jiyang Liu, Zhao Wang, Yitong Wang, Xinglong Wu, Guanbin Li:

DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model. 3357-3366 - Rui Sun, Huayu Mai, Wangkai Li, Yujia Chen, Yuan Wang:

Two Losses, One Goal: Balancing Conflict Gradients for Semi-Supervised Semantic Segmentation. 20357-20367 - Jiahao Xia, Yike Wu, Wenjian Huang, Jianguo Zhang, Jian Zhang:

Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints. 8668-8677 - Yichi Zhang, Le Xue, Wenbo Zhang, Lanlan Li, Yuchen Liu, Chen Jiang, Yuan Cheng, Yuan Qi:

SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images. 21107-21116 - Jing Wang, Rui Zhao, Ruiqin Xiong, Xingtao Wang, Xiaopeng Fan, Tiejun Huang:

SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition. 14409-14419 - Liwen Xiao, Zhiyu Pan, Zhicheng Wang, Zhiguo Cao, Wei Li:

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement. 960-969 - Yuxuan Zhang, Yirui Yuan, Yiren Song, Haofan Wang, Jiaming Liu:

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer. 19513-19524 - Kesen Zhao, Beier Zhu, Qianru Sun, Hanwang Zhang:

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization. 2303-2312 - Hui Sun, Shiyin Lu, Huanyu Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming Li:

MDP3: A Training-Free Approach for List-Wise Frame Selection in Video-LLMs. 24090-24101 - Cheng-Fu Yang, Da Yin, Wenbo Hu, Heng Ji, Nanyun Peng, Bolei Zhou, Kai-Wei Chang:

Verbalized Representation Learning for Interpretable Few-Shot Generalization. 1602-1612 - Ziqi Ma, Yisong Yue, Georgia Gkioxari:

Find any Part in 3D. 1-10 - Eunjin Son, HyungGi Jo, Wookyong Kwon, Sang Jun Lee:

MDP-Omni: Parameter-Free Multimodal Depth Prior-Based Sampling for Omnidirectional Stereo Matching. 26178-26187 - Tim Seizinger, Florin-Alexandru Vasluianu, Marcos V. Conde, Zongwei Wu, Radu Timofte:

Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures. 8908-8917 - Liwei Wang, Yanduo Zhang, Tao Lu, Fang Liu, Huiqin Zhang, Jiayi Ma, Huabing Zhou:

End-to-End Entity-Predicate Association Reasoning for Dynamic Scene Graph Generation. 17729-17738 - George Ciubotariu, Zhuyun Zhou, Zongwei Wu, Radu Timofte:

MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration. 19784-19793 - Samir Khaki, Junxian Guo, Jiaming Tang, Shang Yang, Yukang Chen, Konstantinos N. Plataniotis, Yao Lu, Song Han, Zhijian Liu:

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference. 23784-23794 - Shi-Chen Zhang, Yunheng Li, Yu-Huan Wu, Qibin Hou, Ming-Ming Cheng:

Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment. 22361-22371 - Jun Xiang, Yudong Guo, Leipeng Hu, Boyang Guo, Yancheng Yuan, Juyong Zhang:

Expressive Talking Human from Single-Image with Imperfect Priors. 10398-10409 - Chenjian Gao, Lihe Ding, Rui Han, Zhanpeng Huang, Zibin Wang, Tianfan Xue:

From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos. 25712-25721 - Nicolai Hermann, Jorge Condor, Piotr Didyk:

Puzzle Similarity: A Perceptually-Guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions. 28881-28891 - Yu-Lin Tsai, Yizhe Li, Chia-Mu Yu, Xuebin Ren, Po-Yu Chen, Zekai Chen, Francois Buet-Golfouse:

Differentially Private Fine-Tuning of Diffusion Models. 4561-4571 - Tahira Shehzadi, Khurram Azeem Hashmi, Shalini Sarode, Didier Stricker, Muhammad Zeshan Afzal:

STEP-DETR: Advancing DETR-based Semi-Supervised Object Detection with Super Teacher and Pseudo-Label Guided Text Queries. 3069-3079 - Hang Du, Jiayang Zhang, Guoshun Nan, Wendi Deng, Zhenyan Chen, Chenyang Zhang, Xiao Wang, Shan Huang, Yuqi Pan, Tao Qi, Sicong Leng:

From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning. 859-869 - Jiale Zhao, Xinyang Jiang, Junyao Gao, Yuhao Xue, Cairong Zhao:

One Object, Multiple Lies: A Benchmark for Cross-Task Adversarial Attack on Unified Vision-Language Models. 187-196 - Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani:

LightSwitch: Multi-View Relighting with Material-Guided Diffusion. 27750-27759 - Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, Xiaomeng Li:

Token Activation Map to Visually Explain Multimodal LLMs. 48-58 - Ruonan Liu, Lin Zhu, Xijie Xiang, Lizhi Wang, Hua Huang:

Noise-Modeled Diffusion Models for Low-Light Spike Image Restoration. 4080-4089 - Runqi Wang, Yang Chen, Sijie Xu, Tianyao He, Wei Zhu, Dejia Song, Nemo Chen, Xu Tang, Yao Hu:

DynamicFace: High-Quality and Consistent Face Swapping for Image and Video Using Composable 3D Facial Priors. 13438-13447 - Abhinav Kumar, Yuliang Guo, Zhihao Zhang, Xinyu Huang, Liu Ren, Xiaoming Liu:

CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector. 8777-8788 - Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, Shuai Xiao, Enzo Tartaglione:

FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance. 23614-23625 - Feng Yan, Fanfan Liu, Yiyang Huang, Zechao Guan, Liming Zheng, Yufeng Zhong, Chengjian Feng, Lin Ma:

RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation. 13707-13718 - Alexander C. Ogren, Berthy T. Feng, Jihoon Ahn, Katherine L. Bouman, Chiara Daraio:

Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves. 26446-26455 - Hengyuan Zhang, Zhe Li, Xingqun Qi, Mengze Li, Muyi Sun, Siye Wang, Man Zhang, Sirui Han:

DanceEditor: Towards Iterative Editable Music-Driven Dance Generation with Open-Vocabulary Descriptions. 12158-12168 - Giacomo Meanti, Thomas Ryckeboer, Michael Arbel, Julien Mairal:

Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching. 28364-28374 - Khurram Azeem Hashmi, Karthik Palyakere Suresh, Didier Stricker, Muhammad Zeshan Afzal:

TorchAdapt: Towards Light-Agnostic Real-Time Visual Perception. 5645-5656 - Song Wang, Xie Han, Liqun Kuang, Boying Wang, Zhongyu Chen, Zherui Qiao, Fan Yang, Xiaoxia Liu, Bingyu Zhang, Zhixun Wang:

The Source Image Is the Best Attention for Infrared and Visible Image Fusion. 13513-13522 - Yuan Tian, Shuo Wang, Rongzhao Zhang, Zijian Chen, Yankai Jiang, Chunyi Li, Xiangyang Zhu, Fang Yan, Qiang Hu, Xiaosong Wang, Guangtao Zhai:

Semantics Versus Identity: A Divide-and-Conquer Approach Towards Adjustable Medical Image De-Identification. 20613-20625 - Yusen Xie, Zhenmin Huang, Jin Wu, Jun Ma:

GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting. 26869-26878 - Jiaming Liu, Linghe Kong, Guihai Chen:

Improving SAM for Camouflaged Object Detection via Dual Stream Adapters. 21906-21916 - Shuyu Yang, Yaxiong Wang, Li Zhu, Zhedong Zheng:

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-Based Person Anomaly Search. 11720-11730 - Yuan Yao, Qiushi Yang, Miaomiao Cui, Liefeng Bo:

Towards Fine-Grained Interactive Segmentation in Images and Videos. 22509-22518 - Xiefan Guo, Miaomiao Cui, Liefeng Bo, Di Huang:

ShortFT: Diffusion Model Alignment via Shortcut-Based Fine-Tuning. 678-687 - Mattia Soldan, Fabian Caba Heilbron, Bernard Ghanem, Josef Sivic, Bryan C. Russell:

ResidualViT for Efficient Temporally Dense Video Encoding. 22305-22315 - Feihong Yan, Qingyan Wei, Jiayi Tang, Jiajun Li, Yulin Wang, Xuming Hu, Huiqi Li, Linfeng Zhang:

LazyMAR: Accelerating Masked Autoregressive Models Via Feature Caching. 15552-15561 - Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, Xin Yang:

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image. 6101-6111 - Yuval Grader, Hadar Averbuch-Elor:

Supercharging Floorplan Localization with Semantic Rays. 27116-27125 - Yuheng Du, Sheng Yang, Lingxuan Wang, Zhenghua Hou, Chengying Cai, Zhitao Tan, Mingxia Chen, Shi-Sheng Huang, Qiang Li:

RTMap: Real-Time Recursive Mapping with Change Detection and Localization. 28021-28030 - Yanchen Liu, Yanan Sun, Zhening Xing, Junyao Gao, Kai Chen, Wenjie Pei:

MotionShot: Adaptive Motion Transfer Across Arbitrary Objects for Text-to-Video Generation. 11861-11871 - Yiyang Chen, Shanshan Zhao, Lunhao Duan, Changxing Ding, Dacheng Tao:

Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning. 26156-26166 - Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi:

Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction. 1-14 - Feng Huang, Shuyuan Zheng, Zhaobing Qiu, Huanxian Liu, Huanxin Bai, Liqiong Chen:

Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes. 10635-10644 - Chengxu Liu, Lu Qi, Jinshan Pan, Xueming Qian, Ming-Hsuan Yang:

Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing. 7538-7547 - Zonglin Di, Jing Shi, Yifei Fan, Hao Tan, Alexander Black, John P. Collomosse, Yang Liu:

DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes. 24580-24590 - Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, Ting Pan, Wenxuan Wang, Huchuan Lu, Xinlong Wang:

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models. 21014-21025 - Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen:

GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing. 18014-18023 - Jian Wang, Tianhong Dai, Bingfeng Zhang, Siyue Yu, Eng Gee Lim, Jimin Xiao:

Class Token as Proxy: Optimal Transport-Assisted Proxy Learning for Weakly Supervised Semantic Segmentation. 21645-21654 - Ru Zeng, Yan Song, Yang Zhang, Yanling Hu, Hui Yu:

Agreement Aware and Dissimilarity Oriented GLOM. 24351-24359 - Shaobo Zhang, Yuhang Huang, Wanqing Zhao, Wei Zhao, Ziyu Guan, Jinye Peng:

Environment-Agnostic Pose: Generating Environment-Independent Object Representations for 6D Pose Estimation. 8678-8687 - Tianyu Fu, Tengxuan Liu, Qinghao Han, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang:

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models. 22654-22663 - Shaohan Li, Hao Yang, Min Chen, Xiaolin Qin:

Met2Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems. 5458-5468 - Zheyun Qin, Deng Yu, Chuanchen Luo, Zhumin Chen:

Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation. 12470-12478 - Muhammad Aqeel, Shakiba Sharifi, Marco Cristani, Francesco Setti:

Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning. 4858-4867 - Xu Cheng, Xin Jiang, Zechao Li:

A Unified Interpretation of Training-Time Out-Of-Distribution Detection. 2142-2151 - Dongwoo Kang, Akhil Perincherry, Zachary Coalson, Aiden Gabriel, Stefan Lee, Sanghyun Hong:

Harnessing Input-Adaptive Inference for Efficient VLN. 8219-8229 - Martin de La Gorce, Charlie Hewitt, Tibor Takács, Robert Gerdisch, Zafiirah Hosenie, Givi Meishvili, Marek Kowalski, Thomas J. Cashman, Antonio Criminisi:

VoluMe - Authentic 3D Video Calls from Live Gaussian Splat Prediction. 13783-13792 - Yufei Zhang, Zijun Cui, Jeffrey O. Kephart, Qiang Ji:

Diffusion-Based 3D Hand Motion Recovery with Intuitive Physics. 7306-7317 - Quanhao Li, Zhen Xing, Rui Wang, Hui Zhang, Qi Dai, Zuxuan Wu:

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance. 12112-12123 - Junyu Lou, Xiaorui Zhao, Kexuan Shi, Shuhang Gu:

Learning Pixel-Adaptive Multi-Layer Perceptrons for Real-Time Image Enhancement. 14095-14105 - Aniruddha Bala, Rohit Chowdhury, Rohan Jaiswal, Siddharth Roheda:

DCT-Shield: A Robust Frequency Domain Defense Against Malicious Image Editing. 18876-18884 - Wenlun Zhang, Yunshan Zhong, Shimpei Ando, Kentaro Yoshioka:

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model. 22383-22392 - Jinming Li, Yichen Zhu, Zhibin Tang, Junjie Wen, Minjie Zhu, Xiaoyu Liu, Chengmeng Li, Ran Cheng, Yaxin Peng, Yan Peng, Feifei Feng:

CoA-VLA: Improving Vision-Language-Action Models via Visual-Textual Chain-of-Affordance. 9759-9769 - Zhengzhuo Xu, SiNan Du, Yiyan Qi, Siwen Lu, Chengjin Xu, Chun Yuan, Jian Guo:

ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning. 426-436 - Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Varma Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder:

FlowR: Flowing from Sparse to Dense 3D Reconstructions. 27702-27712 - Songchun Zhang, Huiyao Xu, Sitong Guo, Zhongwei Xie, Hujun Bao, Weiwei Xu, Changqing Zou:

SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations. 27794-27805 - Tobias Kirschstein, Javier Romero, Artem Sevastopolsky, Matthias Nießner, Shunsuke Saito:

Avat3r: Large Animatable Gaussian Reconstruction Model for High-Fidelity 3D Head Avatars. 12089-12100 - Gaojie Lin, Jianwen Jiang, Jiaqi Yang, Zerong Zheng, Chao Liang, Yuan Zhang, Jingtuo Liu:

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models. 13848-13858 - Boyong He, Yuxiang Ji, Zhuoyue Tan, Liaoni Wu:

Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability. 1912-1923 - Liming Lu, Shuchao Pang, Xu Zheng, Xiang Gu, Anan Du, Yunhuai Liu, Yongbin Zhou:

Ciard: Cyclic Iterative Adversarial Robustness Distillation. 350-359 - Haoran Chen, Ping Wang, Zihan Zhou, Xu Zhang, Zuxuan Wu, Yu-Gang Jiang:

Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning. 340-349 - You Huang, Lichao Chen, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji:

Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation. 19816-19826 - Shaofeng Yin, Ting Lei, Yang Liu:

ToolVQA: A Dataset for Multi-Step Reasoning VQA with External Tools. 4424-4433 - Jiahao Wu, Rui Peng, Jianbo Jiao, Jiayu Yang, Luyang Tang, Kaiqiang Xiong, Jie Liang, Jinbo Yan, Runling Liu, Ronggang Wang:

LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling. 9519-9529 - Jaemin Kim, Bryan Sangwoo Kim, Jong Chul Ye:

Free²Guide: Training-Free Text-to-Video Alignment Using Image LVLM. 17920-17929 - Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang:

Votesplat: Hough Voting Gaussian Splatting for 3D Scene Understanding. 6456-6465 - Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang:

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation. 15949-15959 - Ziwei Wang, Sameera Ramasinghe, Chenchen Hu, Julien Monteil, Loris Bazzani, Thalaiyasingam Ajanthan:

Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval. 9924-9934 - ZiYi Dong, Chengxing Zhou, Weijian Deng, Pengxu Wei, Xiangyang Ji, Liang Lin:

Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention Into Convolutions. 17401-17410 - Kailong Zhang, Youwei Lyu, Heng Guo, Si Li, Zhanyu Ma, Boxin Shi:

PolarAnything: Diffusion-based Polarimetric Image Synthesis. 26466-26476 - Qing Jiang, Lin Wu, Zhaoyang Zeng, Tianhe Ren, Yuda Xiong, Yihao Chen, Qin Liu, Lei Zhang:

Referring to Any Person. 21667-21678 - Ming Dai, Wenxuan Cheng, Jiang-Jiang Liu, Sen Yang, Wenxiao Cai, Yanpeng Sun, Wankou Yang:

DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation Through Loopback Synergy. 19936-19946 - Tewodros W. Ayalew, Xiao Zhang, Kevin Yuanbo Wu, Tianchong Jiang, Michael Maire, Matthew R. Walter:

Progressor: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement. 10297-10306 - Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, Tat-Jen Cham:

Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images. 9181-9193 - Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, Guosheng Lin:

MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization. 13922-13931 - Yung-Hsu Yang, Luigi Piccinelli, Mattia Segù, Siyuan Li, Rui Huang, Yuqian Fu, Marc Pollefeys, Hermann Blum, Zuria Bauer:

3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection. 7429-7439 - Léopold Maillard, Tom Durand, Adrien Ramanana Rahary, Maks Ovsjanikov:

Laconic: A 3D Layout Adapter for Controllable Image Creation. 18046-18057 - Xiaohui Li, Yihao Liu, Shuo Cao, Ziyan Chen, Shaobin Zhuang, Xiangyu Chen, Yinan He, Yi Wang, Yu Qiao:

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations. 15319-15328 - Raiyaan Abdullah, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh S. Rawat:

Punching Bag vs. Punching Person: Motion Transferability in Videos. 11348-11358 - Jiacheng Li, Feiran Li, Daisuke Iso:

Learning Hierarchical Line Buffer for Image Processing. 11132-11141 - Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve, Chengyuan Xu, Ratheesh Kalarot, Junsong Yuan:

CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation. 16682-16691 - Hyuck Lee, Taemin Park, Heeyoung Kim:

Learnable Logit Adjustment for Imbalanced Semi-Supervised Learning Under Class Distribution Mismatch. 2664-2674 - Atin Pothiraj, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal:

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting. 8001-8010 - Tianrui Zhu, Shiyi Zhang, Jiawei Shao, Yansong Tang:

KV-Edit: Training-Free Image Editing for Precise Background Preservation. 16607-16617 - Valter Piedade, Chitturi Sidhartha, José Gaspar, Venu Madhav Govindu, Pedro Miraldo:

SAC-GNC: Sample Consensus for Adaptive Graduated Non-Convexity. 5780-5790 - Yufei Zhu, Hao Chen, Yongjian Deng, Wei You:

Separation for Better Integration: Disentangling Edge and Motion in Event-Based Deblurring. 14732-14742 - Kyle Sargent, Kyle Hsu, Justin Johnson, Li Fei-Fei, Jiajun Wu:

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization. 19471-19481 - Gen Li, Yang Xiao, Jie Ji, Kaiyuan Deng, Bo Hui, Linke Guo, Xiaolong Ma:

Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization. 19659-19668 - Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, Jongwon Choi:

Beyond Spatial Frequency: Pixel-Wise Temporal Frequency-Based Deepfake Video Detection. 11198-11207 - Elena Buglakova, Anwai Archit, Edoardo D'Imprima, Julia Mahamid, Constantin Pape, Anna Kreshuk:

Tiling Artifacts and Trade-Offs of Feature Normalization in the Segmentation of Large Biological Images. 13109-13118 - Wenhang Ge, Jiantao Lin, Guibao Shen, Jiawei Feng, Tao Hu, Xinli Xu, Ying-Cong Chen:

PRM: Photometric Stereo Based Large Reconstruction Model. 25009-25018 - Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong, Michael Zollhöfer, Dimitris Samaras, Alexander Richard:

AV-Flow: Transforming Text to Audio-Visual Human-Like Interactions. 14270-14282 - Ju He, Qihang Yu, Qihao Liu, Liang-Chieh Chen:

FlowTok: Flowing Seamlessly Across Text and Image Tokens. 1-12 - Chengbo Yuan, Geng Chen, Li Yi, Yang Gao:

Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos. 8863-8874 - Yixing Lu, Junting Dong, Youngjoong Kwon, Qin Zhao, Bo Dai, Fernando De la Torre:

GAS: Generative Avatar Synthesis from a Single Image. 12883-12893 - Han Ling, Xian Xu, Yinghui Sun, Quansen Sun:

OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGS. 25680-25689 - Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira, Yasutomo Kawanishi:

FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images. 28818-28827 - Shengbang Tong, David Fan, Jiachen Zhu, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann LeCun, Saining Xie, Zhuang Liu:

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning. 17001-17012 - Yuedong Tan, Jiawei Shao, Eduard Zamfir, Ruanjun Li, Zhaochong An, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte, Zongwei Wu:

What You Have is What You Track: Adaptive and Robust Multimodal Tracking. 3455-3465 - Xiaowen Ma, Zhenliang Ni, Xinghao Chen:

TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba. 23519-23529 - Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur, Gorkem Durak, Ulas Bagci:

ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis. 22772-22782 - Yiming Huang, Zhiyang Dou, Lingjie Liu:

ModSkill: Physical Character Skill Modularization. 12394-12404 - Chenzhong Gao, Wei Li, Desheng Weng:

HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation. 10538-10548 - Dohwan Ko, Ji Soo Lee, Minhyuk Choi, Zihang Meng, Hyunwoo J. Kim:

Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval. 22263-22273 - Yasser Dahou, Ngoc Dung Huynh, Phuc H. Le-Khac, Wamiq Reyaz Para, Ankit Singh, Sanath Narayan:

Vision-Language Models Can't See the Obvious. 24159-24169 - Guobin Shen, Jindong Li, Tenglong Li, Dongcheng Zhao, Yi Zeng:

SpikePack: Enhanced Information Flow in Spiking Neural Networks with High Hardware Compatibility. 23385-23395 - Hai Jiang, Binhao Guan, Zhen Liu, Xiaohong Liu, Jian Yu, Zheng Liu, Songchen Han, Shuaicheng Liu:

Learning to See in the Extremely Dark. 7676-7685 - Etai Sella, Noam Atia, Ron Mokady, Hadar Averbuch-Elor:

Blended Point Cloud Diffusion for Localized Text-Guided Shape Editing. 19119-19129 - Chen Zhu, Wangbo Zhao, Huiwen Zhang, Yuhao Zhou, Weidong Tang, Shuo Wang, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Kai Wang, Dawei Yang:

EA-Vit: Efficient Adaptation for Elastic Vision Transformer. 1-10 - Huu-Phu Do, Yu-Wei Chen, Yi-Cheng Liao, Chi-Wei Hsiao, Han-Yang Wang, Wei-Chen Chiu, Ching-Chun Huang:

DynFaceRestore: Balancing Fidelity and Quality in Diffusion-Guided Blind Face Restoration with Dynamic Blur-Level Mapping and Guidance. 10432-10441 - Jingqiao Xiu, Yicong Li, Na Zhao, Han Fang, Xiang Wang, Angela Yao:

Geometric Alignment and Prior Modulation for View-Guided Point Cloud Completion on Unseen Categories. 27435-27444 - Wei-Jer Chang, Wei Zhan, Masayoshi Tomizuka, Manmohan Chandraker, Francesco Pittaluga:

LangTraj: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation. 26622-26631 - Lingyu Chen, Yawen Zeng, Yue Wang, Peng Wan, Guochen Ning, Hongen Liao, Daoqiang Zhang, Fang Chen:

COME: Dual Structure-Semantic Learning with Collaborative MOE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets. 21460-21470 - Linlan Huang, Xusheng Cao, Haori Lu, Yifan Meng, Fei Yang, Xialei Liu:

Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning. 3777-3786 - Sunghyun Park, Seokeon Choi, Hyoungwoo Park, Sungrack Yun:

Steering Guidance for Personalized Text-to-Image Diffusion Models. 15907-15916 - Junho Kim, Hyungjin Chung, Byung-Hoon Kim:

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models. 22889-22898 - Chandan Yeshwanth, Dávid Rozenberszki, Angela Dai:

ExCap3d: Expressive 3D Scene Understanding via Object Captioning with Varying Detail. 21699-21709 - Yun Li, Yiming Zhang, Tao Lin, XiangRui Liu, Wenxiao Cai, Zheng Liu, Bo Zhao:

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? 5622-5632 - Gaoyang Zhang, Bingtao Fu, Qingnan Fan, Qi Zhang, Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu:

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models. 15253-15265 - Zhifeng Gu, Bing Wang:

MMOne: Representing Multiple Modalities in One Scene. 1088-1098 - Xinhua Lu, Runhe Lai, Yanqi Wu, Kanghao Chen, Wei-Shi Zheng, Ruixuan Wang:

FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection. 1152-1161 - Zhankai Li, Weiping Wang, Jie Li, Shigeng Zhang, Yunan Hu, Song Guo:

Enhancing Transferability of Targeted Adversarial Examples Via Inverse Target Gradient Competition and Spatial Distance Stretching. 3716-3725 - Xiaojie Zhang, Yuanfei Wang, Ruihai Wu, Kunqi Xu, Yu Li, Liuyu Xiang, Hao Dong, Zhaofeng He:

Adaptive Articulated Object Manipulation on the Fly with Foundation Model Reasoning and Part Grounding. 13032-13042 - Omkar Thawakar, Dmitry Demidov, Ritesh Thawkar, Rao Muhammad Anwer, Mubarak Shah, Fahad Shahbaz Khan, Salman Khan:

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications. 20435-20444 - Xiangdong Zhang, Shaofeng Zhang, Junchi Yan:

Towards More Diverse and Challenging Pre-Training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views. 28696-28706 - Hongyi Zhang, Laurie Bose, Jianing Chen, Piotr Dudek, Walterio W. Mayol-Cuevas:

Focal Plane Visual Feature Generation and Matching on a Pixel Processor Array. 29031-29041 - Yan Liu, Zehao Chen, Haojie Yan, De Ma, Huajin Tang, Qian Zheng, Gang Pan:

E-NeMF: Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes. 10854-10864 - Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li:

CameraCtrl II: Dynamic Scene Exploration via Camera-Controlled Video Diffusion Models. 13416-13426 - Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner:

GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars. 13065-13075 - Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke:

VGGSounder: Audio-Visual Evaluations for Foundation Models. 1027-1037 - Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang:

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation. 22685-22694 - Jie Shao, Hanxiao Zhang, Hao Yu, Jianxin Wu:

Memory-Efficient Generative Models via Product Quantization. 16871-16881 - Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam, Junghyun Cho, Ig-Jae Kim:

VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset. 10043-10053 - Yu'ang Feng, Shuyong Gao, Fuzhen Yan, Yicheng Song, Lingyi Hong, Junjie Hu, Wenqiang Zhang:

Scoring, Remember, and Reference: Catching Camouflaged Objects in Videos. 13043-13052 - Yefei He, Yuanyu He, Shaoxuan He, Feng Chen, Hong Zhou, Kaipeng Zhang, Bohan Zhuang:

Neighboring Autoregressive Modeling for Efficient Visual Generation. 19000-19010 - Xin Jin, Haisheng Su, Cong Ma, Kai Liu, Wei Wu, Fei Hui, Junchi Yan:

GeoFormer: Geometry Point Encoder for 3D Object Detection with Graph-Based Transformer. 26879-26889 - Zhengkang Xiang, Zizhao Li, Amir Khodabandeh, Kourosh Khoshelham:

SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion. 24965-24976 - Yanbing Zhang, Zhe Wang, Qin Zhou, Mengping Yang:

FreeCus: Free Lunch Subject-Driven Customization in Diffusion Transformers. 15521-15531 - Haowei Kuang, Wenhan Yang, Zongming Guo, Jiaying Liu:

Cross-Granularity Online Optimization with Masked Compensated Information for Learned Image Compression. 1-10 - Andrea Simonelli, Norman Müller, Peter Kontschieder:

Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation. 24707-24716 - Shengyuan Ding, Shenxi Wu, Xiangyu Zhao, Yuhang Zang, Haodong Duan, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Dahua Lin, Jiaqi Wang:

MM-IFEngine: Towards Multimodal Instruction Following. 1099-1109 - Sanghun Jung, Jingjing Zheng, Ke Zhang, Nan Qiao, Albert Y. C. Chen, Lu Xia, Chi Liu, Yuyin Sun, Xiao Zeng, Hsiang-Wei Huang, Byron Boots, Min Sun, Cheng-Hao Kuo:

Details Matter for Indoor Open-Vocabulary 3D Instance Segmentation. 9627-9637 - Zizhang Li, Hong-Xing Yu, Wei Liu, Yin Yang, Charles Herrmann, Gordon Wetzstein, Jiajun Wu:

Wonderplay: Dynamic 3D Scene Generation From a Single Image and Actions. 9080-9090 - Baofeng Tan, Xiu-Shen Wei, Lin Zhao:

Prototype-Based Contrastive Learning with Stage-Wise Progressive Augmentation for Self-Supervised Fine-Grained Learning. 4125-4134 - Zheyuan Zhang, Wanying Dou, Linkai Peng, Hongyi Pan, Ulas Bagci, Boqing Gong:

VideoAds for Fast-Paced Video Understanding. 21812-21821 - Guanning Zeng, Xiang Zhang, Zirui Wang, Haiyang Xu, Zeyuan Chen, Bingnan Li, Zhuowen Tu:

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation. 16765-16775 - Wenkun He, Yun Liu, Ruitao Liu, Li Yi:

SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis. 11731-11743 - Jingyi Yang, Xun Lin, Zitong Yu, Liepiao Zhang, Xin Liu, Hui Li, Xiaochen Yuan, Xiaochun Cao:

DADM: Dual Alignment of Domain and Modality for Face Anti-Spoofing. 12045-12056 - Lei Sun, Yuhan Bao, Jiajun Zhai, Jingyun Liang, Yulun Zhang, Kaiwei Wang, Danda Pani Paudel, Luc Van Gool:

Low-Light Image Enhancement Using Event-Based Illumination Estimation. 6667-6677 - Xiaoyi Feng, Tao Huang, Peng Wang, Zizhou Huang, Haihang Zhang, Yuntao Zou, Dagang Li, Kaifeng Zou:

A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness. 19301-19310 - Yuan-Fu Yang, Hsiu-Hui Hsiao:

Photolithography Overlay Map Generation with Implicit Knowledge Distillation Diffusion Transformer. 15288-15297 - Kaiyu Yue, Vasu Singla, Menglin Jia, John Kirchenbauer, Rifaa Qadri, Zikui Cai, Abhinav Bhatele, Furong Huang, Tom Goldstein:

Zero-Shot Vision Encoder Grafting via LLM Surrogates. 4275-4284 - Xuzhi Wang, Xinran Wu, Song Wang, Lingdong Kong, Ziping Zhao:

Monocular Semantic Scene Completion via Masked Recurrent Networks. 24811-24822 - Eunchan Jo, Dahyun Kang, Sanghyun Kim, Yunseon Choi, Minsu Cho:

Few-Shot Pattern Detection via Template Matching and Regression. 21578-21588 - Hongqiu Wang, Wu Chen, Xiangde Luo, Zhaohu Xing, Lihao Liu, Jing Qin, Shaozhi Wu, Lei Zhu:

Toward Fair and Accurate Cross-Domain Medical Image Segmentation: a Vlm-Driven Active Domain Adaptation Paradigm. 24102-24112 - Anlin Zheng, Haochen Wang, Yucheng Zhao, Weipeng Deng, Tiancai Wang, Xiangyu Zhang, Xiaojuan Qi:

Holistic Tokenizer for Autoregressive Image Generation. 16916-16926 - Weiming Zhang, Dingwen Xiao, Lei Chen, Lin Wang:

E-SAM: Training-Free Segment Every Entity Model. 24688-24697 - Mingtao Feng, Longlong Mei, Zijie Wu, Jianqiao Luo, Fenghao Tian, Jie Feng, Weisheng Dong, Yaonan Wang:

Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization. 8296-8305 - Zijian Dong, Longteng Duan, Jie Song, Michael J. Black, Andreas Geiger:

MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction. 13304-13314 - Han Jiang, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang:

Diffusion-Based Source-Biased Model for Single Domain Generalized Object Detection. 1548-1557 - Hongjin Lyu, Bo Li, Paul L. Rosin, Yu-Kun Lai:

LGA-Net: Learning Local and Global Affinities for Sparse Scribble Based Image Colorization. 8144-8153 - Vishwesh Ramanathan, Tony Xu, Pushpak Pati, Faruk Ahmed, Maged Goubran, Anne L. Martel:

Modaltune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-Task Learning in Digital Pathology. 23912-23923 - Juelin Zhu, Shuaibang Peng, Long Wang, Hanlin Tan, Yu Liu, Maojun Zhang, Shen Yan:

LoD-Loc v2: Aerial Visual Localization Over Low Level-of-Detail City Models using Explicit Silhouette Alignment. 26610-26621 - Nan Chen, Mengqi Huang, Yihao Meng, Zhendong Mao:

LongAnimation: Long Animation Generation with Dynamic Global-Local Memory. 10032-10042 - Pan Liu, Jinshi Liu:

When Confidence Fails: Revisiting Pseudo-Label Selection in Semi-Supervised Semantic Segmentation. 21874-21884 - Zechao Hu, Zhengwei Yang, Hao Li, Zheng Wang, Yixiong Zou:

Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID. 22644-22653 - Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang:

MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild. 8372-8383 - Nuoye Xiong, Anqi Dong, Ning Wang, Cong Hua, Guangming Zhu, Lin Mei, Peiyi Shen, Liang Zhang:

Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding. 2836-2845 - Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li:

Uniocc: a Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving. 25560-25570 - Matthew Beveridge, Shree K. Nayar:

Hierarchical Material Recognition from Local Appearance. 8165-8176 - Yuqi Li, Haotian Zhang, Li Li, Dong Liu:

Learned Image Compression with Hierarchical Progressive Context Modeling. 18834-18843 - Hao Li, Ju Dai, Feng Zhou, Kaida Ning, Lei Li, Junjun Pan:

AU-Blendshape for Fine-Grained Stylized 3D Facial Expression Manipulation. 12605-12614 - Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa, Richard Chen, Ruofan Liang, Yushi Guan, Nilesh A. Ahuja, Nilesh Jain, Selvakumar Panneer, Nandita Vijaykumar:

ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction. 28935-28945 - Haowen Li, Zhenfeng Fan, Zhang Wen, Zhengzhou Zhu, Yunjin Li:

AIcomposer: Any Style and Content Image Composition via Feature Integration. 16840-16850 - Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun:

CARL: Causality-Guided Architecture Representation Learning for an Interpretable Performance Predictor. 23019-23029 - Ruiyang Ha, Songyi Jiang, Bin Li, Bikang Pan, Yihang Zhu, Junjie Zhang, Xiatian Zhu, Shaogang Gong, Jingya Wang:

Multi-Modal Multi-Platform Person Re-Identification: Benchmark and Method. 10251-10261 - Xingyu Zhu, Shuo Wang, Beier Zhu, Miaoge Li, Yunfan Li, Junfeng Fang, Zhicai Wang, Dongsheng Wang, Hanwang Zhang:

Dynamic Multimodal Prototype Learning in Vision-Language Models. 2501-2511 - Anthony Bisulco, Rahul Ramesh, Randall Balliestro, Pratik Chaudhari:

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations. 16441-16450 - Jinshu Chen, Bingchuan Li, Fan Zhang, Songtao Zhao, Qian He:

OneGT: One-Shot Geometry-Texture Neural Rendering for Head Avatars. 11294-11304 - Xiaolong Jin, Zixuan Weng, Hanxi Guo, Chenlong Yin, Siyuan Cheng, Guangyu Shen, Xiangyu Zhang:

JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models. 16461-16471 - Mao Mao, Xujie Shen, Guyuan Chen, Boming Zhao, Jiarui Hu, Hujun Bao, Zhaopeng Cui:

AccidentalGS: 3D Gaussian Splatting from Accidental Camera Motion. 27445-27455 - Sunpill Kim, Seunghun Paik, Chanwoo Hwang, Dongsoo Kim, Junbum Shin, Jae Hong Seo:

IDFace: Face Template Protection for Efficient and Secure Identification. 13995-14005 - Tianwei Xiong, Jun Hao Liew, Zilong Huang, Jiashi Feng, Xihui Liu:

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation. 18770-18780 - Yue Li, Meng Tian, Zhenyu Lin, Jiangtong Zhu, Dechang Zhu, Haiqiang Liu, Yueyi Zhang, Zhiwei Xiong, Xinhai Zhao:

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving. 9431-9442 - Kazuma Nagata, Naoshi Kaneko:

DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images. 17899-17908 - Jingyang Li, Kuangyu Ding, Kim-Chuan Toh, Pan Zhou:

Memory-Efficient 4-bit Preconditioned Stochastic Optimization. 22633-22643 - Rongjia Zheng, Qing Zhang, Chengjiang Long, Wei-Shi Zheng:

DNF-Intrinsic: Deterministic Noise-Free Diffusion for Indoor Inverse Rendering. 10342-10352 - Junhao Dong, Jiao Liu, Xinghua Qu, Yew-Soon Ong:

Confound from all Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness. 624-634 - Shuhang Chen, Hangjie Yuan, Pengwei Liu, Hanxue Gu, Tao Feng, Dong Ni:

SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images. 21209-21219 - Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Qian, Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao:

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception. 10741-10752 - Xihua Wang, Xin Cheng, Yuyue Wang, Ruihua Song, Yunfeng Wang:

VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching. 11777-11786 - Yin Xie, Kaicheng Yang, Xiang An, Kun Wu, Yongle Zhao, Weimo Deng, Zimin Ran, Yumeng Wang, Ziyong Feng, Roy Miles, Ismail Elezi, Jiankang Deng:

Region-based Cluster Discrimination for Visual Representation Learning. 1793-1803 - Shuaiting Li, Juncan Deng, Chengxuan Wang, Kedong Xu, Rongtao Deng, Hong Gu, Haibin Shen, Kejie Huang:

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting. 23710-23719 - Jiachen Sun, De Cheng, Xi Yang, Nannan Wang:

Dual Domain Control via Active Learning for Remote Sensing Domain Incremental Object Detection. 3809-3818 - Tu Bui, Shruti Agarwal, John P. Collomosse:

TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images. 18629-18639 - Zhirui Gao, Renjiao Yi, Yaqiao Dai, Xuening Zhu, Wei Chen, Chenyang Zhu, Kai Xu:

Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction. 27531-27541 - Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, Dan Zhang:

LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models. 9913-9923 - U.-Chae Jun, Jaeeun Ko, Jiwoo Kang:

Generative Adversarial Diffusion. 16786-16796 - Yuanhan Zhang, Yunice Chew, Yuhao Dong, Aria Leo, Bo Hu, Ziwei Liu:

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding. 20626-20636 - Meiqi Cao, Xiangbo Shu, Xin Jiang, Rui Yan, Yazhou Yao, Jinhui Tang:

Exploiting Frequency Dynamics for Enhanced Multimodal Event-Based Action Recognition. 5969-5979 - Fangfu Liu, Hanyang Wang, Yimo Cai, Kaiyan Zhang, Xiaohang Zhan, Yueqi Duan:

Video-T1: Test-Time Scaling for Video Generation. 18671-18681 - Wenxuan Bao, Ruxi Deng, Ruizhong Qiu, Tianxin Wei, Hanghang Tong, Jingrui He:

Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning. 133-143 - Ranran Huang, Krystian Mikolajczyk:

No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views. 27947-27957 - Mo Zhou, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi:

UniRes: Universal Image Restoration for Complex Degradations. 13237-13247 - Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu, Yi-Lin Wei, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng:

iManip: Skill-Incremental Learning for Robotic Manipulation. 13890-13900 - Byeongjun Park, Hyojun Go, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim:

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering. 27326-27337 - Yangyang Guo, Mohan Kankanhalli:

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency. 3662-3672 - Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias, Jiankang Deng, Hang Xu, Chao Ma:

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning. 24203-24213 - Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, Linfeng Zhang:

From Reusing to Forecasting: Accelerating Diffusion Models With Taylorseers. 15853-15863 - Du Chen, Liyi Chen, Zhengqiang Zhang, Lei Zhang:

Generalized and Efficient 2D Gaussian Splatting for Arbitrary-Scale Super-Resolution. 26435-26445 - Huachao Zhu, Zelong Liu, Zhichao Sun, Yuda Zou, Gui-Song Xia, Yongchao Xu:

Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes. 8472-8481 - Honghui Xu, Chuangjie Fang, Yibin Wang, Jie Wu, Jianwei Zheng:

Laboring on Less Labors: RPCA Paradigm for Pan-Sharpening. 11393-11402 - Weinan He, Yixin Zhang, Zilei Wang:

Progressive Distribution Bridging: Unsupervised Adaptation for Large-Scale Pre-Trained Models via Adaptive Auxiliary Data. 3280-3292 - Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha:

Aurelia: Test-Time Reasoning Distillation in Audio-Visual LLMs. 22899-22910 - Chaoyong Yang, Jia-Li Yin, Bin Chen, Zhaozhe Hu, Xiaolei Liu, Wei Lin:

KOEnsAttack: Towards Efficient Data-Free Black-Box Adversarial Attacks via Knowledge-Orthogonalized Substitute Ensembles. 3101-3110 - Dubing Chen, Huan Zheng, Yucheng Zhou, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen:

Semantic Causality-Aware Vision-Based 3D Occupancy Prediction. 24878-24888 - Geonhee Sim, Gyeongsik Moon:

PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image. 12670-12680 - Jessica Bader, Leander Girrbach, Stephan Alaniz, Zeynep Akata:

SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions. 23188-23198 - Syed Talal Wasim, Hamid Suleman, Olga Zatsarynna, Muzammal Naseer, Juergen Gall:

MixANT: Observation-Dependent Memory Propagation for Stochastic Dense Action Anticipation. 14613-14622 - Weitian Wang, Shubham Rai, Cecilia De la Parra, Akash Kumar:

Mixa-Q: Revisiting Activation Sparsity for Vision Transformers From a Mixed-Precision Quantization Perspective. 22143-22152 - Junhao Ge, Zuhong Liu, Longteng Fan, Yifan Jiang, Jiaqi Su, Yiming Li, Zhejun Zhang, Siheng Chen:

Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving. 28859-28869 - Hanwen Zhang, Congqi Cao, Qinyi Lv, Lingtong Min, Yanning Zhang:

Autoregressive Denoising Score Matching Is a Good Video Anomaly Detector. 12057-12067 - Heyi Sun, Cong Wang, Tian-Xing Xu, Jingwei Huang, Di Kang, Chunchao Guo, Song-Hai Zhang:

SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing. 13326-13335 - Junhao Xiao, Yang Wei, Jingyu Wang, Yongchao Wang, Xiuli Bi, Bin Xiao:

Breaking Grid Constraints: Dynamic Graph Reconstruction Network for Multi-Organ Segmentation. 24413-24422 - Shengdong Han, Shangdong Yang, Yuxuan Li, Xin Zhang, Xiang Li, Jian Yang, Ming-Ming Cheng, Yimian Dai:

DISTA-Net: Dynamic Closely-Spaced Infrared Small Target Unmixing. 14655-14664 - Yiping Ji, Hemanth Saratchandran, Peyman Moghadam, Simon Lucey:

Always Skip Attention. 23115-23123 - Ziyan Guo, Zeyu Hu, De Wen Soh, Na Zhao:

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm. 13869-13879 - Xiaokun Sun, Zeyu Cai, Ying Tai, Jian Yang, Zhenyu Zhang:

StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors. 13393-13404 - Xinli Xu, Wenhang Ge, Jiantao Lin, Jiawei Feng, Lie Xu, HanFeng Zhao, Shunsi Zhang, Ying-Cong Chen:

FlexGen: Flexible Multi-View Generation from Text and Image Inputs. 18714-18724 - Mingyuan Sun, Zheng Fang, Jiaxu Wang, Kunyi Zhang, Qiang Zhang, Renjing Xu:

Learning Null Geodesics for Gravitational Lensing Rendering in General Relativity. 28473-28482 - Zijia Lu, Ehsan Elhamifar:

Multi-Modal Few-Shot Temporal Action Segmentation. 14106-14116 - Jeongho Kim, Hoiyeong Jin, Sunghyun Park, Jaegul Choo:

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-Aware Mask. 16026-16036 - Gwanghyun Kim, Xueting Li, Ye Yuan, Koki Nagano, Tianye Li, Jan Kautz, Se Young Chun, Umar Iqbal:

GeoMan: Temporally Consistent Human Geometry Estimation Using Image-to-Video Diffusion. 7451-7461 - Taimur Hassan, Anabia Sohail, Muzammal Naseer, Naoufel Werghi:

Vision-Language Neural Graph Featurization for Extracting Retinal Lesions. 23700-23709 - Yongsheng Yu, Ziyun Zeng, Haitian Zheng, Jiebo Luo:

Omnipaint: Mastering Object-Oriented Editing Via Disentangled Insertion-Removal Inpainting. 17324-17334 - Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen:

R1-Onevision: Advancing Generalized Multimodal Reasoning Through Cross-Modal Formalization. 2376-2385 - Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Hengshuang Zhao:

ViLLa: Video Reasoning Segmentation with Large Language Model. 23667-23677 - Jerred Chen, Ronald Clark:

Image as an Imu: Estimating Camera Motion From a Single Motion-Blurred Image. 90-99 - Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge:

Modeling Saliency Dataset Bias. 22077-22088 - Tiankai Hang, Shuyang Gu, Jianmin Bao, Fangyun Wei, Dong Chen, Xin Geng, Baining Guo:

Improved Noise Schedule for Diffusion Training. 4796-4806 - Julia Machnio, Mads Nielsen, Mostafa Mehdipour-Ghazi:

To Label or Not to Label: PALM - a Predictive Model for Evaluating Sample Efficiency in Active Learning Models. 4039-4048 - Kanoko Goto, Takumi Hirose, Mahiro Ukai, Shuhei Kurita, Nakamasa Inoue:

Referring Expression Comprehension for Small Objects. 21231-21242 - Bhishma Dedhia, David Bourgin, Krishna Kumar Singh, Yuheng Li, Yan Kang, Zhan Xu, Niraj K. Jha, Yuchen Liu:

Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks. 15385-15394 - Yuwen Du, Anning Hu, Zichen Chao, Yifan Lu, Junhao Ge, Genjia Liu, Weitao Wu, Lanjun Wang, Siheng Chen:

RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation. 26977-26986 - Xin Shen, Xinyu Wang, Lei Shen, Kaihao Zhang, Xin Yu:

Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement. 20647-20657 - Liuyue Xie, Jiancong Guo, Ozan Cakmakci, Andre Araujo, László A. Jeni, Zhiheng Jia:

AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion. 26901-26911 - Quanmin Liang, Qiang Li, Shuai Liu, Xinzi Cao, Jinyi Lu, Feidiao Yang, Wei Zhang, Kai Huang, Yonghong Tian:

Efficient Event Camera Data Pretraining with Adaptive Prompt Fusion. 8656-8667 - Peiming Li, Ziyi Wang, Yulin Yuan, Hong Liu, Xiangming Meng, Junsong Yuan, Mengyuan Liu:

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling. 6738-6747 - Kunlun Xu, Fan Zhuo, Jiangmeng Li, Xu Zou, Jiahuan Zhou:

Self-Reinforcing Prototype Evolution with Dual-Knowledge Cooperation for Semi-Supervised Lifelong Person Re-Identification. 3564-3574 - Kelin Yu, Sheng Zhang, Harshit Soora, Furong Huang, Heng Huang, Pratap Tokekar, Ruohan Gao:

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning. 13183-13192 - Yunhao Li, Yifan Jiao, Dan Meng, Heng Fan, Libo Zhang:

Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking. 14390-14398 - Heyan Liu, Jianing Sun, Jun Liu, Xi-Le Zhao, Tingting Wu, Tieyong Zeng:

Blind Noisy Image Deblurring Using Residual Guidance Strategy. 11016-11025 - Yuqi Wu, Wenzhao Zheng, Sicheng Zuo, Yuanhui Huang, Jie Zhou, Jiwen Lu:

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-Based Online Scene Understanding. 26360-26370 - Quanwei Yang, Luying Huang, Kaisiyuan Wang, Jiazhi Guan, Shengyi He, Fengguo Li, Hang Zhou, Lingyun Yu, Yingying Li, Haocheng Feng, Hongtao Xie:

GestureHYDRA: Semantic Co-Speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation. 12615-12625 - Shuchao Pang, Zhenghan Chen, Shen Zhang, Liming Lu, Siyuan Liang, Anan Du, Yongbin Zhou:

Towards a 3D Transfer-Based Black-Box Attack via Critical Feature Guidance. 26912-26922 - Lijie Hu, Tianhao Huang, Huanyi Xie, Xilin Gong, Chenyang Ren, Zhengyu Hu, Lu Yu, Ping Ma, Di Wang:

Semi-Supervised Concept Bottleneck Models. 2110-2119 - Wanquan Feng, Tianhao Qi, Jiawei Liu, Mingzhen Sun, Pengqi Tu, Tianxiang Ma, Fei Dai, Songtao Zhao, SiYu Zhou, Qian He:

I2VControl: Disentangled and Unified Video Motion Synthesis Control. 14051-14060 - Yan Wang, Da-Wei Zhou, Han-Jia Ye:

Integrating Task-Specific and Universal Adapters for Pre-Trained Model-Based Class-Incremental Learning. 806-816 - Jinpei Guo, Zheng Chen, Wenbo Li, Yong Guo, Yulun Zhang:

Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal. 14930-14939 - Xin Hu, Ke Qin, Guiduo Duan, Ming Li, Yuan-Fang Li, Tao He:

SPADE: Spatial-Aware Denoising Network for Open-Vocabulary Panoptic Scene Graph Generation with Long- and Local-Range Context Reasoning. 15562-15572 - Jae-Young Kang, Hoonhee Cho, Kuk-Jin Yoon:

Unleashing the Temporal Potential of Stereo Event Cameras for Continuous-Time 3D Object Detection. 6869-6881 - Wufei Xie, Yalin Wang, Chenliang Liu, Zhaohui Jiang, Xue Yang:

Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning. 2451-2460 - Andrea Conti, Matteo Poggi, Valerio Cambareri, Martin R. Oswald, Stefano Mattoccia:

ToF-Splatting: Dense SLAM Using Sparse Time-of-Flight Depth and Multi-Frame Integration. 28344-28353 - Jiaxin Ai, Pengfei Zhou, Zhaopan Xu, Ming Li, Fanrui Zhang, Zizhen Li, Jianwen Sun, Yukang Feng, Baojin Huang, Zhongyuan Wang, Kaipeng Zhang:

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for Mllm-Based Process Judges. 4681-4690 - Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao:

DiffDoctor: Diagnosing Image Diffusion Models Before Treating. 18917-18926 - Jiale Chen, Wei Wang, Chongyang Shi, Li Dong, Xiping Hu:

Learning Robust Image Watermarking with Lossless Cover Recovery. 15056-15065 - Luyao Tang, Kunze Huang, Chaoqi Chen, Yuxuan Yuan, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang:

Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction. 297-307 - Shenghe Zheng, Hongzhi Wang:

Free-Merging: Fourier Transform for Efficient Model Merging. 3863-3873 - Changhao Li, Xinrui Chen, Ji Wang, Kang Zhao, Jianfei Chen:

Task-Specific Zero-Shot Quantization-Aware Training for Object Detection. 22868-22878 - Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen:

VAMBA: Understanding Hour-Long Videos with Hybrid Mamba-Transformers. 21197-21208 - Yiran Qin, Li Kang, Xiufeng Song, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai:

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints. 10075-10085 - Junsung Park, Jungbeom Lee, Jongyoon Song, Sangwon Yu, Dahuin Jung, Sungroh Yoon:

Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP. 2825-2835 - Shuren Qi, Yushu Zhang, Chao Wang, Zhihua Xia, Xiaochun Cao, Fenglei Fan:

Transparent Vision: A Theory of Hierarchical Invariant Representations. 3435-3444 - Xiaolong Xu, Lei Zhang, Jiayi Li, Lituan Wang, Yifan Guan, Yu Yan, Leyi Zhang, Hao Song:

Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation. 10775-10785 - Jiahao Zhang, Anoop Cherian, Cristian Rodriguez, Weijian Deng, Stephen Gould:

Manual-PA: Learning 3D Part Assembly from Instruction Diagrams. 6304-6314 - Chenghu Du, Shengwu Xiong, Yi Rong:

All Parts Matter: A Unified Mask-Free Virtual Try-On Framework. 19525-19534 - Letian Zhang, Quan Cui, Bingchen Zhao, Cheng Yang:

Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis. 3542-3551 - Yikang Zhou, Tao Zhang, Shilin Xu, Shihao Chen, Qianyu Zhou, Yunhai Tong, Shunping Ji, Jiangning Zhang, Lu Qi, Xiangtai Li:

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs. 17663-17674 - ChangHee Yang, Hyeonseop Song, Seokhun Choi, Seungwoo Lee, Jaechul Kim, Hoseok Do:

PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data. 5611-5621 - Jiancheng Zhao, Yifan Zhan, Qingtian Zhu, Mingze Ma, Muyao Niu, Zunian Wan, Xiang Ji, Yinqiang Zheng:

Tree-NeRV: Efficient Non-Uniform Sampling for Neural Video Representation via Tree-Structured Feature Grids. 15076-15085 - Chengxu Liu, Lu Qi, Jinshan Pan, Xueming Qian, Ming-Hsuan Yang:

Learning Deblurring Texture Prior From Unpaired Data with Diffusion Model. 14195-14204 - Zeyinzi Jiang, Zhen Han, Chaojie Mao, Jingfeng Zhang, Yulin Pan, Yu Liu:

VACE: All-in-One Video Creation and Editing. 17191-17202 - Yuyan Chen, Yifan Jiang, Li Zhou, Jinghan Cao, Yu Guan, Ming-Hsuan Yang, Qing Guo:

Engage for All: Making Ordinary Image Descriptions Appealing Again! 19342-19352 - Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Suya You, Rares Ambrus, Katerina Fragkiadaki, Leonidas J. Guibas:

AllTracker: Efficient Dense Point Tracking at High Resolution. 5253-5262 - Xiao Lin, Yun Peng, Liuyi Wang, Xianyou Zhong, Minghao Zhu, Yi Feng, Jingwei Yang, Chengju Liu, Qijun Chen:

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation. 5990-6000 - Delong Zhang, Qiwei Huang, Yang Sun, Yuanliu Liu, Wei-Shi Zheng, Pengfei Xiong, Wei Zhang:

Learning Implicit Features with Flow-Infused Transformations for Realistic Virtual Try-On. 18736-18745 - Chen Li, Chinthani Sugandhika, Ee Yeo Keat, Eric P. Xing, Hao Zhang, Hong Yang, Deepu Rajan, Basura Fernando:

Imore: Implicit Program-Guided Reasoning for Human Motion QA. 12987-12996 - Chende Zheng, Ruiqi Suo, Chenhao Lin, Zhengyu Zhao, Le Yang, Shuai Liu, Minghui Yang, Cong Wang, Chao Shen:

D3: Training-Free AI-Generated Video Detection Using Second-Order Features. 1979-1989 - Rolandos Alexandros Potamias, Stathis Galanakis, Jiankang Deng, Athanasios Papaioannou, Stefanos Zafeiriou:

ImHead: A Large-Scale Implicit Morphable Model for Localized Head Modeling. 10196-10206 - Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt:

Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation. 9497-9507 - Yanqi Li, Jianwei Niu, Tao Ren:

Benefit from Seen: Enhancing Open-Vocabulary Object Detection by Bridging Visual and Textual Co-Occurrence Knowledge. 22110-22119 - Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu:

PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model. 26306-26315 - Pingchuan Ma, Xiaopei Yang, Yusong Li, Ming Gui, Felix Krause, Johannes Schusterbauer, Björn Ommer:

SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models. 14919-14929 - Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari:

4D Gaussian Splatting SLAM. 25019-25028 - Zelin Li, Ruohan Zong, Yifan Liu, Ruichen Yao, Yaokun Liu, Yang Zhang, Dong Wang:

Anti-Tamper Protection for Unauthorized Individual Image Generation. 15501-15510 - Haoyang Xu, Tianhao Zhao, Sibei Yang, Yutian Lin:

Penalizing Boundary Activation for Object Completeness in Diffusion Models. 14962-14972 - Ruofei Wang, Peiqi Duan, Boxin Shi, Renjie Wan:

Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset. 10141-10150 - Nicole Hee-Yeon Kim, Hwanjun Song:

Robust Dataset Condensation using Supervised Contrastive Learning. 2857-2866 - Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang:

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control. 13214-13224 - Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li:

X-Fusion: Introducing New Modality to Frozen Large Language Models. 228-238 - Chunyi Li, Xiaozhe Li, Zicheng Zhang, Yuan Tian, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Jia Wang, Haodong Duan, Kai Chen, Guangtao Zhai:

Information Density Principle for MLLM Benchmarks. 4167-4177 - Avihai Naaman, Ron Shapira Weber, Oren Freifeld:

Synchronization of Multiple Videos. 12514-12523 - Chirui Chang, Jiahui Liu, Zhengzhe Liu, Xiaoyang Lyu, Yi-Hua Huang, Xin Tao, Pengfei Wan, Di Zhang, Xiaojuan Qi:

How Far are AI-Generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach. 10307-10317 - Junjie Nan, Jianing Li, Wei Chen, Mingkun Zhang, Xueqi Cheng:

NAPPure: Adversarial Purification for Robust Image Classification Under Non-Additive Perturbations. 2260-2269 - Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng, Zhi Ouyang, Jingke Meng, Wei-Shi Zheng:

Viperson: Flexibly Generating Virtual Identity for Person Re-Identification. 23374-23384 - Ryan Wong, Necati Cihan Camgöz, Richard Bowden:

SignRep: Enhancing Self-Supervised Sign Representations. 22804-22814 - Sudong Wang, Yunjian Zhang, Yao Zhu, Enci Liu, Jianing Li, Yanwei Liu, Xiangyang Ji:

SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models. 3639-3649 - Tongfan Guan, Jiaxin Guo, Chen Wang, Yun-Hui Liu:

BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment. 27681-27691 - Seunghun Lee, Jiwan Seo, Minwoo Choi, Kiljoon Han, Jaehoon Jeong, Zane Durante, Ehsan Adeli, Sang Hyun Park, Sunghoon Im:

LOMM: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation. 13719-13729 - Hao Lu, Yuting Zhang, Jiaqi Tang, Bowen Fu, Wenhang Ge, Wei Wei, Kaishun Wu, Yingcong Chen:

Rhythmguassian: Repurposing Generalizable Gaussian Model for Remote Physiological Measurement. 20780-20790 - Kwanyoung Kim, Byeongsu Sim:

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity. 16238-16248 - Jianfang He, Min Cao, Silong Peng, Qiong Xie:

RareCLIP: Rarity-Aware Online Zero-Shot Industrial Anomaly Detection. 24478-24487 - Chen-Yi Lu, Md. Mehrab Tanjim, Ishita Dasgupta, Somdeb Sarkhel, Gang Wu, Saayan Mitra, Somali Chaterji:

Skald: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation. 17859-17868 - Yingjian Chen, Lei Zhang, Yakun Niu:

ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection. 16270-16280 - Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z. Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, Xin Xia, Zhiyu Huang, Bolei Zhou, Jiaqi Ma:

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction. 25399-25409 - Aneel Damaraju, Dean Hazineh, Todd E. Zickler:

CObL: Toward Zero-Shot Ordinal Layering Without User Prompting. 8154-8164 - Xin Ding, Hao Wu, Yifan Yang, Shiqi Jiang, Qianxi Zhang, Donglin Bai, Zhibo Chen, Ting Cao:

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition. 13448-13459 - Tianrui Lou, Xiaojun Jia, Siyuan Liang, Jiawei Liang, Ming Zhang, Yanjun Xiao, Xiaochun Cao:

3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation. 28752-28762 - Dahee Kwon, Sehyun Lee, Jaesik Choi:

Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations. 2356-2365 - Han Fang, Kejiang Chen, Zehua Ma, Jiajun Deng, Yicong Li, Weiming Zhang, Ee-Chien Chang:

SynTag: Enhancing the Geometric Robustness of Inversion-Based Generative Image Watermarking. 15416-15425 - Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt, Lohit Petikam, Xian Xiao, Antonio Criminisi, Thomas J. Cashman, Tadas Baltrusaitis:

DAViD: Data-Efficient and Accurate Vision Models from Synthetic Data. 5348-5358 - Jan Skvrna, Lukás Neumann:

MonoSOWA: Scalable Monocular 3D Object Detector Without Human Annotations. 7613-7623 - Mohamed El Amine Boudjoghra, Ivan Laptev, Angela Dai:

ScanEdit: Hierarchically-Guided Functional 3D Scan Editing. 27105-27115 - Frano Rajic, Haofei Xu, Marko Mihajlovic, Siyuan Li, Irem Demir, Emircan Gündogdu, Lei Ke, Sergey Prokudin, Marc Pollefeys, Siyu Tang:

Multi-View 3D Point Tracking. 59-68 - Donghyeon Kwon, Youngseok Yoon, Hyeongseok Son, Suha Kwak:

MemDistill: Distilling LiDAR Knowledge into Memory for Camera-Only 3D Object Detection. 6828-6838 - Jianfei Jiang, Qiankun Liu, Haochen Yu, Hongyuan Liu, Liyong Wang, Jiansheng Chen, Huimin Ma:

MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network. 27806-27816 - David Stotko, Reinhard Klein:

SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video. 27660-27670 - Jianting Tang, Yubo Wang, Haoyu Cao, Linli Xu:

BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models. 20582-20592 - Jiahui Geng, Qing Li:

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders. 3023-3033 - Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Sicheng Zhao, Yimian Dai, Xiangyu Yue:

From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision. 2588-2598 - Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shu-Tao Xia, Yaowei Wang:

Cassic: Towards Content-Adaptive State-Space Models for Learned Image Compression. 15727-15736 - Zheng-Peng Duan, Jiawei Zhang, Xin Jin, Ziheng Zhang, Zheng Xiong, Dongqing Zou, Jimmy S. Ren, Chunle Guo, Chongyi Li:

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution. 18948-18958 - Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai:

DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space. 1-10 - Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan, Héctor Andrade-Loarca, Gitta Kutyniok, René Vidal:

Learning Interpretable Queries for Explainable Image Classification with Information Pursuit. 3947-3956 - Junghyup Lee, Jeimin Jeon, Dohyung Kim, Bumsub Ham:

Scheduling Weight Transitions for Quantization-Aware Training. 23466-23475 - Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal:

Controlling Multimodal Llms Via Reward-Guided Decoding. 1391-1401 - Jiayi Guo, Chuanhao Yan, Xingqian Xu, Yulin Wang, Kai Wang, Gao Huang, Humphrey Shi:

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance. 16079-16089 - Xavier Thomas, Deepti Ghadiyaram:

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization. 2183-2194 - Yiyang Su, Yunping Shi, Feng Liu, Xiaoming Liu:

HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-Based Person ReID. 11525-11536 - Yafei Zhang, Lingqi Kong, Huafeng Li, Jie Wen:

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning. 12659-12669 - Yanrui Bin, Wenbo Hu, Haoyuan Wang, Xinya Chen, Bing Wang:

NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors. 8330-8339 - Zengbin Wang, Saihui Hou, Junjie Li, Xu Liu, Chunshui Cao, Yongzhen Huang, Siye Wang, Man Zhang:

Gait: Exploring X Modality for Generalized Gait Recognition. 13259-13269 - Haejun Han, Hang Lu:

ASCENT: Annotation-Free Self-Supervised Contrastive Embeddings for 3D Neuron Tracking in Fluorescence Microscopy. 14676-14687 - Anand Kumar, Jiteng Mu, Nuno Vasconcelos:

IntroStyle: Training-Free Introspective Style Attribution Using Diffusion Features. 14909-14918 - Ping Cao, Yepeng Tang, Chunjie Zhang, Xiaolong Zheng, Chao Liang, Yunchao Wei, Yao Zhao:

Visual Relation Diffusion for Human-Object Interaction Detection. 23551-23560 - Xinye Cao, Hongcan Guo, Jiawen Qian, Guoshun Nan, Chao Wang, Yuqi Pan, Tianhao Hou, Xiaojuan Wang, Yutong Gao:

VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-Based Group Relative Policy Optimization. 23773-23783 - Hailing Wang, Jianglin Lu, Yitian Zhang, Yun Fu:

Outlier-Aware Post-Training Quantization for Image Super-Resolution. 16175-16184 - Yingsong Huang, Hui Guo, Jing Huang, Bing Bai, Qi Xiong:

Diffusion Epistemic Uncertainty with Asymmetric Learning for Diffusion-Generated Image Detection. 17097-17107 - Peng Chen, Pi Bu, Yingyao Wang, Xinyi Wang, Ziming Wang, Jie Guo, Yingxiu Zhao, Qi Zhu, Jun Song, Siran Yang, Jiamang Wang, Bo Zheng:

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games. 10919-10928 - Hongjae Lee, Myungjun Son, Dongjea Kang, Seung-Won Jung:

Text Embedding Knows How to Quantize Text-Guided Diffusion Models. 15426-15436 - Shuai Jin, Yuhua Qian, Feijiang Li, Guoqing Liu, Xinyan Liang:

PASD: A Pixel-Adaptive Swarm Dynamics Approach for Unsupervised Low-Light Image Enhancement. 9070-9079 - Xuan-Hao Liu, Bao-Liang Lu, Wei-Long Zheng:

EEGMirror: Leveraging EEG Data in the Wild Via Montage-Agnostic Self-Supervision for EEG to Video Decoding. 18273-18283 - Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin, Chia-Wen Lin:

PHATNet: A Physics-Guided Haze Transfer Network for Domain-Adaptive Real-World Image Dehazing. 5591-5600 - Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali, Tae Hyun Kim:

IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising. 12180-12190 - Jorge Alejandro Amador Herrera, Yi Zhou, Xin Sun, Zhixin Shu, Chengan He, Sören Pirk, Dominik L. Michels:

Augmented Mass-Spring Model for Real-Time Dense Hair Simulation. 11339-11347 - Jialu Gao, K. J. Joseph, Fernando De la Torre:

Teleportraits: Training-Free People Insertion Into Any Scene. 18866-18875 - Qizhe Zhang, Aosong Cheng, Ming Lu, Renrui Zhang, Zhiyong Zhuo, Jiajun Cao, Shaobo Guo, Qi She, Shanghang Zhang:

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs. 20857-20867 - Soorena Salari, Arash Harirpoush, Hassan Rivaz, Yiming Xiao:

Cabld: Contrast-Agnostic Brain Landmark Detection With Consistency-Based Regularization. 1-12 - Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini, Jan Ulrich Bartels, Katherine J. Kuchenbecker, Michael J. Black:

Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing. 5071-5080 - Marco Paul E. Apolinario, Sakshi Choudhary, Kaushik Roy:

CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning. 775-784 - Jiahe Zhao, Ruibing Hou, Zejie Tian, Hong Chang, Shiguang Shan:

HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding. 4317-4327 - Chengwei Ren, Fan Zhang, Liangchao Xu, Liang Pan, Ziwei Liu, Wenping Wang, Xiao-Ping Zhang, Yuan Liu:

GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination. 28653-28663 - Yuan Sun, Xuan Wang, Cong Wang, WeiLi Zhang, Yanbo Fan, Yu Guo, Fei Wang:

Fine-Grained 3D Gaussian Head Avatars Modeling from Static Captures Via Joint Reconstruction and Registration. 14293-14304 - WonJun Moon, Hyun Seok Seong, Jae-Pil Heo:

Selective Contrastive Learning for Weakly Supervised Affordance Grounding. 5210-5220 - Risa Shinoda, Nakamasa Inoue, Iro Laina, Christian Rupprecht, Hirokatsu Kataoka:

AnimalClue: Recognizing Animals by their Traces. 14776-14786 - Lukas Kuhn, Sari Sadiya, Jörg Schlötterer, Florian Buettner, Christin Seifert, Gemma Roig:

Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers. 2217-2226 - Zhengbo Zhang, Lin Geng Foo, Hossein Rahmani, Jun Liu, De Wen Soh:

Performing Defocus Deblurring by Modeling its Formation Process. 5791-5801 - Yeming Yang, Qingling Zhu, Jianping Luo, Ka-Chun Wong, Qiuzhen Lin, Jianqiang Li:

TRNAS: A Training-Free Robust Neural Architecture Search. 2336-2345 - Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Ziyong Feng, Jing Yang, Rolandos Alexandros Potamias, Linchao Zhu, Jiankang Deng:

HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization. 13523-13532 - Jiawei He, Danshi Li, Xinqiang Yu, Zekun Qi, Wenyao Zhang, Jiayi Chen, Zhaoxiang Zhang, Zhizheng Zhang, Li Yi, He Wang:

DexVLG: Dexterous Vision-Language-Grasp Model at Scale. 14248-14258 - Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Qidong Huang, Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Anyi Rao, Jiaqi Wang, Li Niu:

Light-a-Video: Training-Free Video Relighting via Progressive Light Fusion. 13315-13325 - Yuan Gao, Sangwook Kim, Jianzhong You, Chris McIntosh:

ProbMED: A Probabilistic Framework for Medical Multimodal Binding. 20157-20167 - Sanghyun Son, Matheus Gadelha, Yang Zhou, Matthew Fisher, Zexiang Xu, Yi-Ling Qiao, Ming C. Lin, Yi Zhou:

DMesh++: An Efficient Differentiable Mesh for Complex Shapes. 26590-26599 - Yuang Wang, Chao Wen, Haoyu Guo, Sida Peng, Minghan Qin, Hujun Bao, Xiaowei Zhou, Ruizhen Hu:

Precise Action-to-Video Generation Through Visual Action Prompts. 12713-12724 - Derong Jin, Ruohan Gao:

Differentiable Room Acoustic Rendering with Multi-View Vision Priors. 37-47 - Yongchuan Cui, Peng Liu, Hui Zhang:

Enpowering Your Pansharpening Models with Generalizability: Unified Distribution Is All You Need. 11850-11860 - Tiange Xiang, Kai Li, Chengjiang Long, Christian Häne, Peihong Guo, Scott L. Delp, Ehsan Adeli, Li Fei-Fei:

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation. 16492-16502 - Ziv Weiss Haddad, Oren Barkan, Yehonatan Elisha, Noam Koenigstein:

Soft Local Completeness: Rethinking Completeness in XAI. 19794-19804 - Shiyu Zhang, Cheng Yan, Yang Liu, Chenchen Jing, Lei Zhou, Wenjun Wang:

Learning Visual Proxy for Compositional Zero-Shot Learning. 2793-2802 - Ruchit Rawal, Reza Shirkavand, Heng Huang, Gowthami Somepalli, Tom Goldstein:

ARGUS: Hallucination and Omission Evaluation in Video-LLMs. 20280-20290 - Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique:

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches. 22999-23008 - Yian Zhao, Rushi Ye, Ruochong Zheng, Zesen Cheng, Chaoran Feng, Jiashu Yang, Pengchong Qiao, Chang Liu, Jie Chen:

Tune-Your-Style: Intensity-Tunable 3D Style Transfer with Gaussian Splatting. 19032-19042 - Fei Xie, Zhongdao Wang, Weijia Zhang, Chao Ma:

PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation. 10218-10228 - Shengqi Liu, Yuhao Cheng, Zhuo Chen, Xingyu Ren, Wenhan Zhu, Lincheng Li, Mengxiao Bi, Xiaokang Yang, Yichao Yan:

Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation. 17640-17650 - Jiahao Li, Xinhong Chen, Zhengmin Jiang, Qian Zhou, Yung-Hui Li, Jianping Wang:

Global Regulation and Excitation via Attention Tuning for Stereo Matching. 25539-25549 - Zhaonan Wang, Manyi Li, Changhe Tu:

AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing. 26806-26816 - Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong, Jinhong Wang, Rao Muhammad Anwer:

All in One: Visual-Description-Guided Unified Point Cloud Segmentation. 24835-24845 - Ziliang Miao, Runjian Chen, Yixi Cai, Buwei He, Wenquan Zhao, Wenqi Shao, Bo Zhang, Fu Zhang:

Temporal Overlapping Prediction: A Self-Supervised Pre-Training Method for LiDAR Moving Object Segmentation. 26653-26663 - Youngho Kim, Hoonhee Cho, Kuk-Jin Yoon:

From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras. 9406-9417 - Wentao Zhu, Zhining Zhang, Yuwei Ren, Yin Huang, Hao Xu, Yizhou Wang:

Embodied Representation Alignment with Mirror Neurons. 11948-11957 - Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, Shiqi Wang:

An Information-Theoretic Regularizer for Lossy Neural Image Compression. 15573-15582 - Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Di Zhang, Pengfei Wan, Guangyong Chen, Yijun Li, Ying-Cong Chen:

Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification. 15437-15446 - Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, Meng Wang:

Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models. 2491-2500 - Jiasheng Guo, Xin Gao, Yuxiang Yan, Guanghao Li, Jian Pu:

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection. 9583-9593 - Zhichuan Wang, Yang Zhou, Zhe Liu, Rui Yu, Song Bai, Yulong Wang, Xinwei He, Xiang Bai:

Describe, Adapt and Combine: Empowering CLIP Encoders for Open-Set 3D Object Retrieval. 21026-21036 - Youliang Zhang, Ronghui Li, Yachao Zhang, Liang Pan, Jingbo Wang, Yebin Liu, Xiu Li:

A Plug-And-Play Physical Motion Restoration Approach for In-The-Wild High-Difficulty Motions. 13281-13292 - Donglin Di, He Feng, Wenzhang Sun, Yongjia Ma, Hao Li, Wei Chen, Lei Fan, Tonghua Su, Xun Yang:

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation. 12124-12134 - Fuyan Ma, Yiran He, Bin Sun, Shutao Li:

Multimodal Prompt Alignment for Facial Expression Recognition. 12581-12591 - Ruonan Yu, Songhua Liu, Zigeng Chen, Jingwen Ye, Xinchao Wang:

Heavy Labels Out! Dataset Distillation with Label Space Lightening. 5017-5026 - Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh:

CLIPSym: Delving into Symmetry Detection with CLIP. 21003-21013 - Byung Hyun Lee, Wongi Jeong, Woojae Han, Kyoungbun Lee, Se Young Chun:

Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis. 23232-23242 - Do Huu Dat, Nam Hyeon-Woo, Po Yuan Mao, Tae-Hyun Oh:

VSC: Visual Search Compositional Text-to-Image Diffusion Model. 19153-19162 - Yuan Wang, Yuxin Chen, Zhongang Qi, Lijun Liu, Jile Jiao, Xuetao Feng, Yujia Liang, Ying Shan, Zhipeng Zhang:

Mamba-3VL: Taming State Space Model for 3D Vision Language Learning. 6273-6283 - Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang:

Differential-Informed Sample Selection Accelerates Multimodal Contrastive Learning. 2930-2940 - Dongyue Wu, Zilin Guo, Jialong Zuo, Nong Sang, Changxin Gao:

Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration. 1-10 - Jaeho Shin, Hyeonjae Gil, Junwoo Jang, Maani Ghaffari, Ayoung Kim:

Registration beyond Points: General Affine Subspace Alignment via Geodesic Distance on Grassmann Manifold. 3767-3776 - Zhuoguang Chen, Minghui Qin, Tianyuan Yuan, Zhe Liu, Hang Zhao:

LONG3R: Long Sequence Streaming 3D Reconstruction. 5273-5284 - Peng Cai, Qiang Li, Kaicheng Yang, Dong Guo, Jia Li, Nan Zhou, Xiang An, Ninghua Yang, Jiankang Deng:

ForCenNet: Foreground-Centric Network for Document Image Rectification. 15137-15146 - Chuyan Zhang, Kefan Wang, Yun Gu:

Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes. 1-10 - Hyungjin Kim, Seokho Ahn, Young-Duk Seo:

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models. 17171-17180 - Aleksandar Jevtic, Christoph Reich, Felix Wimbauer, Oliver Hahn, Christian Rupprecht, Stefan Roth, Daniel Cremers:

Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion. 6784-6796 - Sijie Wang, Siqi Li, Yawei Zhang, Shangshu Yu, Shenghai Yuan, Rui She, Quanjiang Guo, Jinxuan Zheng, Ong Kang Howe, Leonrich Chandra, Shrivarshann Srijeyan, Aditya Sivadas, Toshan Aggarwal, Heyuan Liu, Hongming Zhang, Chujie Chen, Junyu Jiang, Lihua Xie, Wee Peng Tay:

UAVScenes: A Multi-Modal Dataset for UAVs. 28946-28958 - Seogkyu Jeon, Kibeom Hong, Hyeran Byun:

Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation. 20791-20801 - Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang, Yechan Hwang, Byungsoo Ko, Han-Gyu Kim, Dongyu Yao, Xuankun Rong, Eojin Joo, Seung-Ho Han, Bowon Ko, Ho-Jin Choi:

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models. 708-719 - Ruotong Wang, Mingli Zhu, Jiarong Ou, Rui Chen, Xin Tao, Pengfei Wan, Baoyuan Wu:

BadVideo: Stealthy Backdoor Attack Against Text-to-Video Generation. 19075-19084 - David Serrano-Lozano, Aditya Arora, Luis Herranz, Konstantinos G. Derpanis, Michael S. Brown, Javier Vazquez-Corral:

Revisiting Image Fusion for Multi-Illuminant White-Balance Correction. 8275-8284 - Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabé-Moreno, Nicolas Longépé:

TerraMind: Large-Scale Generative Multimodality for Earth Observation. 7383-7394 - Xinran Ling, Chen Zhu, Meiqi Wu, Hangyu Li, Xiaokun Feng, Cundian Yang, Aiming Hao, Jiashu Zhu, Jiahong Wu, Xiangxiang Chu:

VMBench: A Benchmark for Perception-Aligned Video Motion Generation. 13087-13098 - Ziyang Ren, Ping Wei, Shangqi Deng, Haowen Tang, Jiapeng Li, Huan Li:

TOTP: Transferable Online Pedestrian Trajectory Prediction with Temporal-Adaptive Mamba Latent Diffusion. 26263-26272 - Yuhan Li, Xianfeng Tan, Wenxiang Shang, Yubo Wu, Jian Wang, Xuanhong Chen, Yi Zhang, Hangcheng Zhu, Bingbing Ni:

RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation. 17485-17495 - Zonglin Lyu, Chen Chen:

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation. 1-10 - Olaf Dükel, Thomas Wimmer, Christian Theobalt, Christian Rupprecht, Adam Kortylewski:

Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels. 5834-5844 - Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti, Giorgio Grisetti, Martin R. Oswald:

Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping. 27630-27639 - Jinhyung Park, Javier Romero, Shunsuke Saito, Fabian Prada, Takaaki Shiratori, Yichen Xu, Federica Bogo, Shoou-I Yu, Kris Kitani, Rawal Khirodkar:

ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling. 6508-6518 - Yongjin Lee, Hyeon Mun Jeong, Yurim Jeon, Sanghyun Kim:

EVT: Efficient View Transformation for Multi-Modal 3D Object Detection. 26632-26642 - Jinhong Wang, Shuo Tong, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Hongxia Xu, Danny Z. Chen, Jintai Chen, Jian Wu:

OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM. 3477-3487 - Shangbo Wu, Yu-an Tan, Ruinan Ma, Wencong Ma, Dehua Zhu, Yuanzhang Li:

Boosting Generative Adversarial Transferability with Self-Supervised Vision Transformer Features. 530-540 - Thu Hang Phung, Duong M. Nguyen, Thanh Trung Huynh, Quoc Viet Hung Nguyen, Trong Nghia Hoang, Phi Le Nguyen:

Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data. 3936-3946 - Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang:

Aid: Adapting Image2video Diffusion Models for Instruction-Guided Video Prediction. 21243-21253 - Hao Zhou, Zhanning Gao, Zhili Chen, Maosheng Ye, Qifeng Chen, Tongyi Cao, Honggang Qi:

Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving. 6165-6175 - Yihong Luo, Tianyang Hu, Yifan Song, Jiacheng Sun, Zhenguo Li, Jing Tang:

Adding Additional Control to One-Step Diffusion with Joint Distribution Matching. 4009-4018 - Naresh Kumar Devulapally, Mingzhen Huang, Vishal Asnani, Shruti Agarwal, Siwei Lyu, Vishnu Suresh Lokhande:

Your Text Encoder Can Be an Object-Level Watermarking Controller. 16576-16585 - Victor Quétu, Zhu Liao, Nour Hezbri, Fabio Pizzati, Enzo Tartaglione:

LaCoOT: Layer Collapse through Optimal Transport. 20497-20507 - Maximilian Pittner, Joel Janai, Mario Faigle, Alexandru Paul Condurache:

SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection. 29099-29109 - Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, Sergey Tulyakov:

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis. 9955-9965 - Yixin Yang, Jiawei Zhang, Yang Zhang, Yunxuan Wei, Dongqing Zou, Jimmy S. Ren, Boxin Shi:

Event-Guided HDR Reconstruction with Diffusion Priors. 11787-11796 - Tianshu Huang, Akarsh Prabhakara, Chuhan Chen, Jay Karhade, Deva Ramanan, Matthew O'Toole, Anthony Rowe:

Towards Foundational Models for Single-Chip Radar. 24655-24665 - Yi Huang, Wei Xiong, He Zhang, Chaoqi Chen, Jianzhuang Liu, Mingfu Yan, Shifeng Chen:

DIVE: Taming DINO for Subject-Driven Video Editing. 16004-16014 - Chunhao Lu, Qiang Lu, Meichen Dong, Jake Luo:

End-to-End Multi-Modal Diffusion Mamba. 20529-20540 - Ragav Sachdeva, Andrew Zisserman:

From Panels to Prose: Generating Literary Narratives from Comics. 21864-21873 - Tongshun Zhang, Pingping Liu, Yubing Lu, Mengen Cai, Zijian Zhang, Zhe Zhang, Qiuzhan Zhou:

CWNet: Causal Wavelet Network for Low-Light Image Enhancement. 8789-8799 - Ava Pun, Kangle Deng, Ruixuan Liu, Deva Ramanan, Changliu Liu, Jun-Yan Zhu:

Generating Physically Stable and Buildable Brick Structures from Text. 14798-14809 - Chen Liu, Tobias Ritschel:

Generative Video Bi-Flow. 19363-19372 - Wenshuo Gao, Xicheng Lan, Shuai Yang:

AnyPortal: Zero-Shot Consistent Video Background Replacement. 18990-18999 - Pradyumn Goyal, Dmitry Petrov, Sheldon Andrews, Yizhak Ben-Shabat, Hsueh-Ti Derek Liu, Evangelos Kalogerakis:

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes. 9332-9341 - Shaokui Wei, Jiayin Liu, Hongyuan Zha:

Backdoor Mitigation by Distance-Driven Detoxification. 4465-4474 - Yufan Liu, Wanqian Zhang, Huashan Chen, Lin Wang, Xiaojun Jia, Zheng Lin, Weiping Wang:

AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts. 1-10 - Xinqi Lyu, Yihao Liu, Yanjie Li, Bin Xiao:

PLA: Prompt Learning Attack Against Text-To-Image Generative Models. 16851-16860 - Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara, Ali Shafahi, Amin Ghiasi, Charan Prakash, Reza Ardekani:

Stable Diffusion Models Are Secretly Good at Visual In-Context Learning. 23604-23613 - Wenjie Huang, Qi Yang, Shuting Xia, He Huang, Yiling Xu, Zhu Li:

LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression. 28577-28586 - Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im:

CAVIS: Context-Aware Video Instance Segmentation. 4507-4517 - Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj, Stefan Schmid, Steffen Staab, Claudia Plant:

MultiADS: Defect-Aware Supervision for Multi-Type Anomaly Detection and Segmentation in Zero-Shot Learning. 22978-22988 - Weihao Yu, Xiaoqing Guo, Xinyu Liu, Yifan Liu, Hao Zheng, Yawen Huang, Yixuan Yuan:

GaussianReg: Rapid 2D/3D Registration for Emergency Surgery Via Explicit 3D Modeling with Gaussian Primitives. 21482-21491 - Yuxin Jiang, Liming Jiang, Shuai Yang, Jia-Wei Liu, Ivor W. Tsang, Mike Zheng Shou:

Balanced Image Stylization with Style Matching Score. 1-10 - Haochen Han, Alex Jinpeng Wang, Peijun Ye, Fangming Liu:

Unlearning the Noisy Correspondence Makes CLIP More Robust. 4518-4528 - Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie:

Less is More: Empowering GUI Agent with Context-Aware Simplification. 5901-5911 - Hung-Chieh Fang, Hsuan-Tien Lin, Irwin King, Yifei Zhang:

Soft Separation and Distillation: Toward Global Uniformity in Federated Unsupervised Learning. 2971-2980 - Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan, Eric Granger, Marco Pedersoli:

Visual Modality Prompt for Adapting Vision-Language Object Detectors. 2172-2182 - Wenbo Yang, Zhongling Wang, Zhou Wang:

Towards a Universal Image Degradation Model via Content-Degradation Disentanglement. 12966-12975 - Ruifei Zhang, Junlin Xie, Wei Zhang, Weikai Chen, Xiao Tan, Xiang Wan, Guanbin Li:

AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving. 5112-5121 - Pei Wang, Zhaowei Cai, Hao Yang, Davide Modolo, Ashwin Swaminathan:

Enhancing Numerical Prediction of MLLMs With Soft Labeling. 3424-3434 - Zhenjun Yu, Wenqiang Xu, Pengfei Xie, Yutong Li, Brian W. Anthony, Zhuorui Zhang, Cewu Lu:

Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-Aware Contact Representation. 8590-8599 - Chengyu Zheng, Jin Huang, Honghua Chen, Mingqiang Wei:

RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning. 26549-26558 - Xiequn Wang, Zhan Zhuang, Yu Zhang:

PLAN: Proactive Low-Rank Allocation for Continual Learning. 2909-2918 - Reza Rezaeian, Moein Heidari, Reza Azad, Dorit Merhof, Hamid Soltanian-Zadeh, Ilker Hacihaliloglu:

SL2A-INR: Single-Layer Learnable Activation for Implicit Neural Representation. 26065-26074 - Wenjing Bian, Axel Barroso-Laguna, Tommaso Cavallari, Victor Adrian Prisacariu, Eric Brachmann:

Scene Coordinate Reconstruction Priors. 25765-25776 - Jaeha Kim, Junghun Oh, Kyoung Mu Lee:

Exploiting Diffusion Prior for Task-Driven Image Restoration. 10151-10161 - Yicheng Feng, Yijiang Li, Wanpeng Zhang, Sipeng Zheng, Hao Luo, Zihao Yue, Zongqing Lu:

VideoOrion: Tokenizing Object Dynamics in Videos. 20401-20412 - Zhiqi Pang, Chunyu Wang, Lingling Zhao, Junjie Wang:

Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification. 14400-14409 - Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan:

GeometryCrafter: Consistent Geometry Estimation for Open-World Videos With Diffusion Priors. 1-13 - Tianfang Zhu, Hongyang Zhou, Anan Li:

MorphoGen: Efficient Unconditional Generation of Long-Range Projection Neuronal Morphology via a Global-to-Local Framework. 13021-13031 - Hui Li:

LLM Thought Divergence and Convergence for Dialogue-Based Image Generation Control. 18101-18110 - Joowon Kim, Ziseok Lee, Donghyeon Cho, Sanghyun Jo, Yeonsung Jung, Kyungsu Kim, Eunho Yang:

Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing. 18844-18854 - Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, Zhou Zhao:

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations. 22488-22498 - Ao Wang, Lihao Liu, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding:

YOLOE: Real-Time Seeing Anything. 24591-24602 - Peiran Xu, Xicheng Gong, Yadong Mu:

NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation. 6327-6341 - Kai Ye, Chong Gao, Guanbin Li, Wenzheng Chen, Baoquan Chen:

GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-Based Inverse Rendering. 28991-29000 - Ruyang Liu, Shangkun Sun, Haoran Tang, Wei Gao, Ge Li:

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow. 23817-23827 - Yi Qin, Rui Wang, Tao Huang, Tong Xiao, Liping Jing:

SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures. 10624-10634 - Liying Yang, Chen Liu, Zhenwei Zhu, Ajian Liu, Hui Ma, Jian Nong, Yanyan Liang:

Not All Frame Features are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features. 7494-7504 - Jialiang Wang, Xianming Liu, Xiong Zhou, Gangfeng Hu, Deming Zhai, Junjun Jiang, Xiangyang Ji:

Joint Asymmetric Loss for Learning with Noisy Labels. 1947-1956 - Teng Zhou, Xiaoyu Zhang, Yongchuan Tang:

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs. 1-10 - Yuyang Yang, We Li, Sheng Ao, Qingshan Xu, Shangshu Yu, Yu Guo, Yin Zhou, Siqi Shen, Cheng Wang:

RALoc: Enhancing Outdoor LiDAR Localization via Rotation Awareness. 3304-3313 - Kang Du, Zhihao Liang, Yulin Shen, Zeyu Wang:

GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors. 26220-26229 - Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai:

Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads. 12201-12211 - Xiang Li, Lannan Luo, Qiang Zeng:

Backdoor Attacks on Neural Networks Via One-Bit Flip. 4328-4338 - Jinghan You, Shanglin Li, Yuanrui Sun, Jiangchuan Wei, Mingyu Guo, Chao Feng, Jiao Ran:

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition. 11840-11849 - Mengmeng Sheng, Zeren Sun, Tianfei Zhou, Xiangbo Shu, Jinshan Pan, Yazhou Yao:

CA2C: A Prior-Knowledge-Free Approach for Robust Label Noise Learning via Asymmetric Co-Learning and Co-Training. 901-911 - Xiaoxi Liang, Yanbo Fan, Qiya Yang, Xuan Wang, Wei Gao, Ge Li:

DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads. 11079-11088 - Nahyuk Lee, Juhong Min, Junhong Lee, Chunghyun Park, Minsu Cho:

Combinative Matching for Geometric Shape Assembly. 9540-9549 - Hongchi Ma, Guanglei Yang, Debin Zhao, Yanli Ji, Wangmeng Zuo:

ReMP-AD: Retrieval-Enhanced Multi-Modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection. 20425-20434 - Nam Duong Tran, Nam Nguyen Phuong, Hieu H. Pham, Phi Le Nguyen, My T. Thai:

ConstStyle: Robust Domain Generalization with Unified Style Transformation. 1-10 - Yupeng Hu, Changxing Ding, Chang Sun, Shaoli Huang, Xiangmin Xu:

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection. 20126-20136 - Tuo Xiang, Xuemiao Xu, Bangzhen Liu, Jinyi Li, Yong Li, Shengfeng He:

Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification. 6761-6771 - Jiayuan Zhu, Junde Wu, Cheng Ouyang, Konstantinos Kamnitsas, J. Alison Noble:

SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation. 23731-23740 - Kim Kiehn, Albin Ahlbäck, Kathlén Kohn:

PLMP - Point-Line Minimal Problems for Projective SfM. 8558-8567 - Xiaoyu Zhang, Weihong Pan, Xiaojun Xiang, Hongjia Zhai, Liyang Zhou, Hanqing Jiang, Guofeng Zhang:

Tile-Wise Vs. Image-Wise: Random-Tile Loss and Training Paradigm for Gaussian Splatting. 26923-26932 - Matt De Vries, Reed Naidoo, Olga Fourkioti, Lucas G. Dent, Nathan Curry, Christopher Dunsby, Chris Bakal:

Interpretable Point Cloud Classification Using Multiple Instance Learning. 22209-22220 - Zihan Cao, Yu Zhong, Liang-Jian Deng:

Taming Flow Matching With Unbalanced Optimal Transport Into Fast Pansharpening. 2803-2813 - Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Sun-Ao Liu, Xiaopeng Zhang, Qi Tian, Yongdong Zhang:

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation. 24034-24044 - Zhexiong Wan, Jianqin Luo, Yuchao Dai, Gim Hee Lee:

Event-Aided Dense and Continuous Point Tracking: Everywhere and Anytime. 7936-7946 - Haidong Kang, Lianbo Ma, Pengjun Chen, Guo Yu, Xingwei Wang, Min Huang:

Beyond the Limits: Overcoming Negative Correlation of Activation-Based Training-Free NAS. 796-805 - Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Yuwei Guo, Dahua Lin, Tianfan Xue, Bo Dai:

Multi-Identity Human Image Animation with Structural Video Diffusion. 11937-11947 - Jiaru Zhong, Jiahao Wang, Jiahui Xu, Xiaofan Li, Zaiqing Nie, Haibao Yu:

CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception. 26954-26965 - Yanwen Wang, Yiyu Zhuang, Jiawei Zhang, Li Wang, Yifei Zeng, Xun Cao, Xinxin Zuo, Hao Zhu:

Tera: Rethinking Text-Guided Realistic 3D Avatar Generation. 10686-10697 - Yogesh Kumar, Uday Agarwal, Manish Gupta, Anand Mishra:

Aligning Moments in Time Using Video Queries. 20215-20225 - Leon Sick, Dominik Engel, Sebastian Hartwig, Pedro Hermosilla, Timo Ropinski:

CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation. 21265-21275 - Gang Dai, Yifan Zhang, Yutao Qin, Qiangya Guo, Shuangping Huang, Shuicheng Yan:

Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation. 19054-19064 - Lingyun Huang, Jianxu Mao, Junfei Yi, Ziming Tao, Yaonan Wang:

CVPT: Cross Visual Prompt Tuning. 848-858 - Wen Yang, Guodong Liu, Di Ming:

SMP-Attack: Boosting the Transferability of Feature Importance-Based Adversarial Attack with Semantics-Aware Multi-Granularity Patchout. 4444-4454 - Wenhao Wang, Yi Yang:

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation. 14898-14908 - Zefu Lin, Wenbo Chen, Xiaojuan Jin, Yuran Yang, Lue Fan, Yixin Zhang, Yufeng Zhang, Zhaoxiang Zhang:

MCOP: Multi-UAV Collaborative Occupancy Prediction. 27242-27251 - Guangyao Li, Siping Zhuang, Yajun Jian, Yan Yan, Hanzi Wang:

Language Decoupling with Fine-Grained Knowledge Guidance for Referring Multi-Object Tracking. 23626-23635 - Yiwen Zhao, Yang Wang, Liting Wen, Hengyuan Zhang, Xingqun Qi:

Freedance: Towards Harmonic Free-Number Group Dance Generation Via a Unified Framework. 10560-10569 - Sara Rojas, Matthieu Armando, Bernard Ghanem, Philippe Weinzaepfel, Vincent Leroy, Grégory Rogez:

HAMSt3R: Human-Aware Multi-View Stereo 3D Reconstruction. 5027-5037 - Evan Casey, Tianyu Zhang, Shu Ishida, John Roger Thompson, Amir Khasahmadi, Joseph George Lambourne, Pradeep Kumar Jayaraman, Karl D. D. Willis:

Aligning Constraint Generation with Design Intent in Parametric CAD. 8613-8622 - Yanyun Wang, Li Liu:

Failure Cases Are Better Learned but Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training. 4691-4700 - Jinxin Shi, Jiabao Zhao, Yifan Yang, Xingjiao Wu, Jiawen Li, Liang He:

Lark: Low-Rank Updates After Knowledge Localization for Few-Shot Class-Incremental Learning. 3607-3617 - Yihan Cao, Jiazhao Zhang, Zhinan Yu, Shuzhen Liu, Zheng Qin, Qin Zou, Bo Du, Kai Xu:

CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs. 9550-9560 - Jeonghyeok Do, Munchurl Kim:

Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-Shot Skeleton-Based Action Recognition. 12757-12768 - Xinhang Liu, Jiawei Shi, Zheng Dang, Yuchao Dai:

MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation. 9024-9035 - Ziqian Lu, Yunlong Yu, Qinyue Tong, Jun Liu:

Hierarchical Divide-And-Conquer Grouping for Classification Adaptation of Pre-Trained Models. 3575-3584 - Miroslav Purkrábek, Jiri Matas:

Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle. 9004-9013 - Wenliang Zhong, Robert A. Barton, Weizhi An, Feng Jiang, Hehuan Ma, Yuzhi Guo, Abhishek Dan, Shioulin Sam, Karim Bouyarmane, Junzhou Huang:

Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation. 22221-22231 - Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim:

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs. 23990-24000 - Woojung Son, Yoonki Cho, Guoyuan An, Chanmi Lee, Sung-Eui Yoon:

Towards Robustness of Person Search Against Corruptions. 23408-23418 - Zishu Qin, Junhao Xu, Weifeng Ge:

DeFSS: Image-to-Mask Denoising Learning for Few-Shot Segmentation. 22232-22240 - Runze Zhang, Guoguang Du, Xiaochuan Li, Qi Jia, Liang Jin, Lu Liu, Jingjing Wang, Cong Xu, Zhenhua Guo, Yaqian Zhao, Xiaoli Gong, Rengang Li, Baoyu Fan:

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation. 15583-15593 - Wenbin Teng, Gonglin Chen, Haiwei Chen, Yajie Zhao:

FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation. 26095-26105 - Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert:

ReferEverything: Towards Segmenting Everything we can Speak of in Videos. 23221-23231 - Hengjia Li, Haonan Qiu, Shiwei Zhang, Xiang Wang, Yujie Wei, Zekun Li, Yingya Zhang, Boxi Wu, Deng Cai:

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation. 19406-19416 - Alessandro Conti, Massimiliano Mancini, Enrico Fini, Yiming Wang, Paolo Rota, Elisa Ricci:

On Large Multimodal Models as Open-World Image Classifiers. 16388-16398 - Jiaxin Lu, Chun-Hao Paul Huang, Uttaran Bhattacharya, Qixing Huang, Yi Zhou:

Humoto: A 4D Dataset of Mocap Human Object Interactions. 10886-10897 - Shijie Li, Chunyu Liu, Xun Xu, Si Yong Yeo, Xulei Yang:

Future-Aware Interaction Network for Motion Forecasting. 7505-7515 - Hae Jin Song, Laurent Itti:

Riemannian-Geometric Fingerprints of Generative Models. 11425-11435 - Alessio Spagnoletti, Jean Prost, Andrés Almansa, Nicolas Papadakis, Marcelo Pereyra:

LATINO-PRO: Latent Consistency Inverse Solver with Prompt Optimization. 19597-19607 - Xingyu Hu, Junjun Jiang, Chenyang Wang, Kui Jiang, Xianming Liu, Jiayi Ma:

Balancing Task-Invariant Interaction and Task-Specific Adaptation for Unified Image Fusion. 11262-11272 - Dongheon Lee, Seokju Yun, Youngmin Ro:

Emulating Self-attention with Convolution for Efficient Image Super-Resolution. 24467-24477 - Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang:

Efficient Concertormer for Image Deblurring and Beyond. 14665-14675 - Zhiyuan Fang, Rengan Xie, Xuancheng Jin, Qi Ye, Wei Chen, Wenting Zheng, Rui Wang, Yuchi Huo:

A3GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting. 17751-17760 - Yifei Zhang, Lei Chen:

LEGO-Maker: A Semantic-Driven Algorithm for Text-to-3D Generation. 15127-15136 - Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu:

Sparse Fine-Tuning of Transformers for Generative Tasks. 18703-18713 - Shengkai Sun, Zefan Zhang, Jianfeng Dong, Zhiyong Cheng, Xiaojun Chang, Meng Wang:

Towards Efficient General Feature Prediction in Masked Skeleton Modeling. 12212-12221 - Gavriel Habib, Noa Barzilay, Or Shimshi, Rami Ben-Ari, Nir Darshan:

CarGait: Cross-Attention Based Re-ranking for Gait Recognition. 11884-11894 - Yizhou Zhao, Haoyu Chen, Chunjiang Liu, Zhenyang Li, Charles Herrmann, Junhwa Hur, Yinxiao Li, Ming-Hsuan Yang, Bhiksha Raj, Min Xu:

Toward Material-Agnostic System Identification From Videos. 5944-5956 - Han-Hung Lee, Qinghong Han, Angel X. Chang:

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes. 26509-26518 - Tao Lei, Ziyao Yang, Xingwu Wang, Yi Wang, Xuan Wang, Feiman Sun, Asoke K. Nandi:

Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation. 21450-21459 - Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia:

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior. 12360-12370 - Yuan Bian, Min Liu, Yunqi Yi, Xueping Wang, Shuai Jiang, Yaonan Wang:

Prompt-Driven Transferable Adversarial Attack on Person Re-identification with Attribute-Aware Textual Inversion. 22599-22609 - Zhengyin Liang, Hui Yin, Min Liang, Qianqian Du, Ying Yang, Hua Huang:

UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation. 20346-20356 - Tianyi Zhao, Boyang Liu, Yanglei Gao, Yiming Sun, Maoxun Yuan, Xingxing Wei:

Rethinking Multi-Modal Object Detection From the Perspective of Mono-Modality Feature Learning. 6364-6373 - Thomas Carr, Depeng Xu, Shuhan Yuan, Aidong Lu:

Privacy-Centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization. 13162-13170 - Zewei Zhou, Seth Z. Zhao, Tianhui Cai, Zhiyu Huang, Bolei Zhou, Jiaqi Ma:

TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction. 4391-4402 - Haoyu Zhen, Qiao Sun, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan:

Learning 4D Embodied World Models. 5337-5347 - Yilei Jiang, Wei-Hong Li, Yiyuan Zhang, Minghong Cai, Xiangyu Yue:

FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions. 18411-18420 - James Amato, Yunan Xie, Leonel Medina-Varela, Ammar Aljerwi, Adam McCutcheon, T. Seth Rippentrop, Kristian Gonzalez, Jacques Delabrouille, Mustapha Ishak, Nicholas Ruozzi:

CMB-ML: A Cosmic Microwave Background Dataset for the Oldest Possible Computer Vision Task. 9418-9430 - Zeyi Sun, Ziyang Chu, Pan Zhang, Tong Wu, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang:

X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting. 17268-17280 - Yingping Liang, Yutao Hu, Wenqi Shao, Ying Fu:

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space. 6621-6631 - Uzay Gökay, Federico Spurio, Dominik R. Bach, Juergen Gall:

Skeleton Motion Words for Unsupervised Skeleton-Based Temporal Action Segmentation. 12101-12111 - WonJun Moon, Cheol-Ho Cho, Woojin Jun, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Minho Shim, Jae-Pil Heo:

Prototypes Are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval. 21789-21799 - Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Yunhai Tong, Xiangtai Li:

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer. 10995-11005 - Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li:

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment. 25367-25377 - Jieun Kim, Jinmyeong Kim, Yoonji Kim, Sung-Bae Cho:

Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models. 20572-20581 - Zesen Cheng, Kehan Li, Yian Zhao, Hang Zhang, Chang Liu, Jie Chen:

Temporal-Aware Query Routing for Real-Time Video Instance Segmentation. 22467-22476 - Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi, Ziqi Gao, Vishnu Iyengar, Norimasa Kobori, Quan Kong, Ranjay Krishna:

One Trajectory, One Token: Grounded Video Tokenization Via Panoptic Sub-Object Trajectory. 23156-23166 - Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu:

ZIM: Zero-Shot Image Matting for Anything. 23828-23838 - Tingwei Li, Jun Bao, Zhenzhong Kuang, Buyu Liu:

What we Need is Explicit Controllability: Training 3D Gaze Estimator Using Only Facial Images. 11414-11424 - Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel:

SceneSplat: Gaussian Splatting-Based Scene Understanding with Vision-Language Pretraining. 4961-4972 - Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, Dong Xu:

On-Device Diffusion Transformer Policy for Efficient Robot Manipulation. 14073-14083 - Junjie He, Yifeng Geng, Liefeng Bo:

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization. 14399-14408 - Aniket Rege, Zinnia Nie, Mahesh Ramesh, Unmesh Raskar, Zhuoran Yu, Aditya Kusupati, Yong Jae Lee, Ramya Korlakai Vinayak:

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems. 15680-15691 - Jiawei Liang, Siyuan Liang, Tianrui Lou, Ming Zhang, Wenjin Li, Dunqiu Fan, Xiaochun Cao:

Gradient-Reweighted Adversarial Camouflage for Physical Object Detection Evasion. 13880-13889 - Shuofeng Sun, Haibin Yan:

Mitigating Geometric Degradation in Fast DownSampling via FastAdapter for Point Cloud Segmentation. 25983-25992 - Zeyuan Chen, Hongyi Xu, Guoxian Song, You Xie, Chenxu Zhang, Xin Chen, Chao Wang, Di Chang, Linjie Luo:

X-Dancer: Expressive Music to Human Dance Video Generation. 10602-10611 - Michael Steiner, Thomas Köhler, Lukas Radl, Felix Windisch, Dieter Schmalstieg, Markus Steinberger:

AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering. 27650-27659 - Chao Liu, Yangbo Jiang, Nenggan Zheng:

NETracer: A Topology-Aware Iterative Tracing Approach for Tubular Structure Extraction. 20593-20602 - Jiawei Wang, Zhiming Cui, Changjian Li:

VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation. 19311-19320 - Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan:

QuEST: Low-Bit Diffusion Model Quantization via Efficient Selective Finetuning. 15542-15551 - Kaijie Yin, Zhiyuan Zhang, Shu Kong, Tian Gao, Cheng-Zhong Xu, Hui Kong:

Information-Bottleneck Driven Binary Neural Network for Change Detection. 7176-7186 - Lanmiao Liu, Esam Ghaleb, Asli Özyürek, Zerrin Yumak:

SemGes: Semantics-Aware Co-Speech Gesture Generation Using Semantic Coherence and Relevance Learning. 13963-13973 - Yuanhe Guo, Linxi Xie, Zhuoran Chen, Kangrui Yu, Ryan Po, Guandao Yang, Gordon Wetzstein, Hongyi Wen:

ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization. 1-10 - Hengyu Meng, Duotun Wang, Zhijing Shao, Ligang Liu, Zeyu Wang:

Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting. 16882-16892 - Chong Cheng, Sicheng Yu, Zijian Wang, Yifan Zhou, Hao Wang:

Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps. 26035-26044 - Zhentao Tan, Ben Xue, Jian Jia, Junhao Wang, Wencai Ye, Shaoyun Shi, Mingjie Sun, Wenjin Wu, Quan Chen, Peng Jiang:

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization. 23541-23550 - Subhajit Maity, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song:

Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection. 284-296 - Jiaying Ying, Heming Du, Kaihao Zhang, Lincheng Li, Xin Yu:

LDPose: Towards Inclusive Human Pose Estimation for Limb-Deficient Individuals in the Wild. 9865-9875 - Qian Feng, Jiahang Tu, Mintong Kang, Hanbin Zhao, Chao Zhang, Hui Qian:

FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning. 1957-1967 - Hemanth Saratchandran, Simon Lucey:

Enhancing Transformers Through Conditioned Embedded Tokens. 4786-4795 - Zhixiang Wei, Guangting Wang, Xiaoxiao Ma, Ke Mei, Huaian Chen, Yi Jin, Fengyun Rao:

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models. 22447-22456 - Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee, Ji Woo Hong, Chang D. Yoo:

Occlusion-Robust Stylization for Drawing-Based 3D Animation. 12263-12273 - Xingshuo Han, Xuanye Zhang, Xiang Lan, Haozhao Wang, Shengmin Xu, Shen Ren, Jason Zeng, Ming Wu, Michael Heinrich, Tianwei Zhang:

Mind the Cost of Scaffold! Benign Clients May Even Become Accomplices of Backdoor Attack. 1580-1589 - Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Shouwei Ruan, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei:

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency. 2045-2054 - Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang:

OminiControl: Minimal and Universal Control for Diffusion Transformer. 14940-14950 - Fatemeh Ghezloo, Mehmet Saygin Seyfioglu, Rustin Soraki, Wisdom Oluchi Ikezogwo, Beibin Li, Tejoram Vivekanandan, Joann G. Elmore, Ranjay Krishna, Linda G. Shapiro:

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology. 23431-23441 - Zhen Wu, Jiaman Li, Pei Xu, C. Karen Liu:

Human-Object Interaction from Human-level Instructions. 11176-11186 - Priyank Pathak, Yogesh S. Rawat:

Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement. 16797-16807 - Jungwoo Huh, Yeseung Park, Seongjean Kim, Jungsu Kim, Sanghoon Lee:

MBTI: Masked Blending Transformers with Implicit Positional Encoding for Frame-rate Agnostic Motion Estimation. 11568-11578 - Jimyeong Kim, Jungwon Park, Yeji Song, Nojun Kwak, Wonjong Rhee:

ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation. 15939-15948 - Lingxiao Li, Kaixuan Fan, Boqing Gong, Xiangyu Yue:

HYPDAE: Hyperbolic Diffusion Autoencoders for Hierarchical Few-Shot Image Generation. 17119-17128 - Jianwei Fei, Yunshu Dai, Peipeng Yu, Zhe Kong, Jiantao Zhou, Zhihua Xia:

Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models. 15025-15034 - Bo Liu, Ke Zou, Li-Ming Zhan, Zexin Lu, Xiaoyu Dong, Yidi Chen, Chengqiang Xie, Jiannong Cao, Xiao-Ming Wu, Huazhu Fu:

GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-Ray Diagnosis. 21310-21320 - Yuheng Shi, Minjing Dong, Chang Xu:

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation. 23487-23497 - Kailai Zhou, Fuqiang Yang, Shixian Wang, Bihan Wen, Chongde Zi, Linsen Chen, Qiu Shen, Xun Cao:

M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision. 7872 - Tom Fischer, Xiaojie Zhang, Eddy Ilg:

Unified Category-Level Object Detection and Pose Estimation from RGB Images Using 3D Prototypes. 9790-9800 - Jonas Mirlach, Lei Wan, Andreas Wiedholz, Hannan Ejaz Keen, Andreas Eich:

R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception. 28375-28384 - Hanshen Zhu, Zhen Zhu, Kaile Zhang, Yiming Gong, Yuliang Liu, Xiang Bai:

Training-Free Geometric Image Editing on Diffusion Models. 19130-19140 - Fangfu Liu, Hao Li, Jiawei Chi, Hanyang Wang, Ming-Hsuan Yang, Fudong Wang, Yueqi Duan:

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion. 29010-29020 - Kahim Wong, Jicheng Zhou, Haiwei Wu, Yain-Whar Si, Jiantao Zhou:

ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement. 19280-19289 - Daixun Li, Yusi Zhang, Mingxiang Cao, Donglai Liu, Weiying Xie, Tianlin Hui, Lunkai Lin, Zhiqiang Xie, Yunsong Li:

Towards Long-Horizon Vision-Language-Action System: Reasoning, Acting and Memory. 6839-6848 - Runze He, Bo Cheng, Yuhang Ma, Qingxiang Jia, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Liebucha Wu, Dawei Leng, Yuhui Yin:

PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models. 18143-18154 - Jinghao Wang, Zhang Li, Zi Wang, Banglei Guan, Yang Shang, Qifeng Yu:

Deterministic Object Pose Confidence Region Estimation. 14866-14875 - Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Yaoting Wang, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha:

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs. 1590-1601 - Yongxin Guo, Lin Wang, Xiaoying Tang, Tao Lin:

Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing. 1433-1443 - Ruidong Chen, Honglin Guo, Lanjun Wang, Chenyu Zhang, Weizhi Nie, An-An Liu:

TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models. 1-10 - Hang Su, Yunlong Feng, Daniel Gehrig, Panfeng Jiang, Ling Gao, Xavier Lagorce, Laurent Kneip:

A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks. 1-10 - Hallee E. Wong, Jose Javier Gonzalez Ortiz, John V. Guttag, Adrian V. Dalca:

MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance. 20966-20980 - Yue Fan, Xiaojian Ma, Rongpeng Su, Jun Guo, Rujie Wu, Xi Chen, Qing Li:

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding. 6342-6352 - David Fan, Shengbang Tong, Jiachen Zhu, Koustuv Sinha, Zhuang Liu, Xinlei Chen, Michael Rabbat, Nicolas Ballas, Yann LeCun, Amir Bar, Saining Xie:

Scaling Language-Free Visual Representation Learning. 1-13 - Mingxuan Wu, Huang Huang, Justin Kerr, Chung Min Kim, Anthony Zhang, Brent Yi, Angjoo Kanazawa:

Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding. 6575-6584 - Zhongze Wang, Haitao Zhao, Lujian Yao, Jingchao Peng, Kaijie Zhao:

Dual-Level Prototype Learning for Composite Degraded Image Restoration. 14006-14061 - Manahil Raza, Ayesha Azam, Talha Qaiser, Nasir M. Rajpoot:

PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction. 22175-22186 - Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu:

TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation. 13604-13613 - Ruangrawee Kitichotkul, Shashwath Bharadwaj, Joshua Rapp, Yanting Ma, Alexander Mehta, Vivek K. Goyal:

Free-Running vs. Synchronous: Single-Photon Lidar for High-Flux 3D Imaging. 25972-25982 - Chunwei Wang, Guansong Lu, Junwei Yang, Runhui Huang, Jianhua Han, Lu Hou, Wei Zhang, Hang Xu:

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance. 21612-21622 - Alexey Kravets, Da Chen, Vinay P. Namboodiri:

Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting. 1902-1911 - Junkai Deng, Hanting Niu, Jiaze Li, Fei Hou, Ying He:

UNIS: A Unified Framework for Achieving Unbiased Neural Implicit Surfaces in Volume Rendering. 27671-27680 - Saemi Moon, Minjong Lee, Sangdon Park, Dongwoo Kim:

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning. 16356-16366 - Xuechao Zou, Yue Li, Shun Zhang, Kai Li, Shiying Wang, Pin Tao, Junliang Xing, Congyan Lang:

Dynamic Dictionary Learning for Remote Sensing Image Segmentation. 22457-22466 - Katja Schwarz, Denis Rozumny, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder:

A Recipe for Generating 3D Worlds from a Single Image. 3520-3530 - Haoyang Chen, Dongfang Sun, Caoyuan Ma, Shiqin Wang, Kewei Zhang, Zheng Wang, Zhixiang Wang:

Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction Through Sequence-Aware Sketch-Guided Diffusion. 17838-17847 - Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella, Matteo Ragaglia, Alessandro Oliva, Giuseppe Lisanti, Luigi Di Stefano:

SiM3D: Single-Instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark. 20944-20953 - Yong Liu, Hang Dong, Jinshan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang:

PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution. 11283-11293 - Stefan Andreas Baumann, Nick Stracke, Timy Phan, Björn Ommer:

What If: Understanding Motion Through Sparse Interactions. 10286-10296 - Siddharth Tourani, Jayaram Reddy, Akash Kumbar, Satyajit Tourani, Nishant Goyal, Madhava Krishna, N. Dinesh Reddy, Muhammad Haris Khan:

Leveraging 2D Priors and SDF Guidance for Dynamic Urban Scene Rendering. 29051-29063 - Harry Cheng, Yangyang Guo, Qing Guo, Ming-Hsuan Yang, Tian Gan, Weili Guan, Liqiang Nie:

Social Debiasing for Fair Multi-Modal LLMs. 1740-1750 - Boyi Sun, Yuhang Liu, Houxin He, Yonglin Tian, Fei-Yue Wang:

AnnofreeOD: Detecting All Classes at Low Frame Rates Without Human Annotations. 5315-5325 - Ruida Zhang, Chengxi Li, Chenyangguang Zhang, Xingyu Liu, Haili Yuan, Yanyan Li, Xiangyang Ji, Gim Hee Lee:

Street Gaussians Without 3D Object Tracker. 2522-2534 - Xiaohang Zhan, Dingming Liu:

LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering. 19679-19688 - Shanlin Sun, Yifan Wang, Hanwen Zhang, Yifeng Xiong, Qin Ren, Ruogu Fang, Xiaohui Xie, Chenyu You:

Ouroboros: Single-Step Diffusion Models for Cycle-Consistent Forward and Inverse Rendering. 10386-10397 - Yangyang Xu, Bangzhen Liu, Wenqi Shao, Yong Du, Shengfeng He, Tingting Zhu:

Cross-Subject Mind Decoding from Inaccurate Representations. 15066-15075 - Tianyi Xu, Fan Zhang, Boxin Shi, Tianfan Xue, Yujin Wang:

AdaptiveAE: An Adaptive Exposure Strategy for HDR Capturing in Dynamic Scenes. 25176-25185 - Fanhong Zeng, Huanan Li, Juntao Guan, Rui Fan, Tong Wu, Xilong Wang, Rui Lai:

An Efficient Hybrid Vision Transformer for TinyML Applications. 19914-19924 - Tongtong Cheng, Rongzhen Li, Yixin Xiong, Tao Zhang, Jing Wang, Kai Liu:

MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding. 5479-5489 - Yujeong Chae, Heejun Park, Hyeonseong Kim, Kuk-Jin Yoon:

Doppler-Aware LiDAR-RADAR Fusion for Weather-Robust 3D Detection. 27197-27208 - Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Arnaud Dapogny, Matthieu Cord:

Analyzing Fine-Tuning Representation Shift for Multimodal LLMs Steering. 2206-2216 - Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache:

Variance-Based Pruning for Accelerating and Compressing Trained Networks. 1-10 - Zedong Wang, Siyuan Li, Dan Xu:

Rep-MTL: Unleashing the Power of Representation-Level Task Saliency for Multi-Task Learning. 3413-3423 - Suorong Yang, Peijia Li, Furao Shen, Jian Zhao:

Reinforcement Learning-Guided Data Selection via Redundancy Assessment. 1004-1015 - Chen Chen, Kangcheng Bin, Ting Hu, Jiahao Qi, Xingyue Liu, Tianpeng Liu, Zhen Liu, Yongxiang Liu, Ping Zhong:

Fusion Meets Diverse Conditions: A High-Diversity Benchmark and Baseline for UAV-Based Multimodal Object Detection with Condition Cues. 27958-27967 - Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang:

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models. 27272-27283 - Xiaobiao Du, Yida Wang, Haiyang Sun, Zhuojie Wu, Hongwei Sheng, Shuyun Wang, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu:

3DRealCar: An In-the-Wild RGB-D Car Dataset with 360-Degree Views. 26488-26498 - Peilin Tao, Hainan Cui, Diantao Tu, Shuhan Shen:

MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion. 1-10 - Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang:

Knowledge Distillation with Refined Logits. 1110-1119 - Fei Zhou, Peng Wang, Lei Zhang, Wei Wei, Chen Ding, Guosheng Lin, Yanning Zhang:

Towards Effective Foundation Model Adaptation for Extreme Cross-Domain Few-Shot Learning. 4582-4593 - Jensen Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani:

Stable Virtual Camera: Generative View Synthesis with Diffusion Models. 12405-12414 - Hongyi Zhou, Yulan Guo, Xiaogang Wang, Kai Xu:

MonoMobility: Zero-Shot 3D Mobility Analysis From Monocular Videos. 8800-8809 - Feng Yang, Yichao Cao, Xiu Su, Dan Niu, Xuanpeng Li:

CounterPC: Counterfactual Feature Realignment for Unsupervised Domain Adaptation on Point Clouds. 24760-24769 - Zikun Xu, Shaobing Xu:

MergeOcc: Bridge the Domain Gap between Different Lidars for Robust Occupancy Prediction. 26539-26548 - Marcos V. Conde, Zihao Lu, Radu Timofte:

PixTalk: Controlling Photorealistic Image Processing and Editing with Language. 19269-19279 - Wenlong Luo, Shizhou Zhang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang:

Gradient Decomposition and Alignment for Incremental Object Detection. 4486-4495 - Yongsheng Yuan, Jie Zhao, Dong Wang, Huchuan Lu:

CAT: A Unified Click-and-Track Framework for Realistic Tracking. 5690-5700 - Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, Liu Ren:

Online Language Splatting. 25882-25892 - Michael Bernasconi, Abdelaziz Djelouah, Yang Zhang, Markus Gross, Christopher Schroers:

LDIP: Long Distance Information Propagation for Video Super-Resolution. 11558-11567 - Yuekun Dai, Haitian Li, Shangchen Zhou, Chen Change Loy:

Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting. 15015-15024 - Hyeongjin Nam, Donghwan Kim, Gyeongsik Moon, Kyoung Mu Lee:

PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image. 8547-8557 - Ziyi Liu, Zhe Xu, Jiabo Ma, Wenqiang Li, Ruixuan Wang, Bo Du, Hao Chen:

Conditional Visual Autoregressive Modeling for Pathological Image Restoration. 17828-17837 - Junbo Zhao, Ting Zhang, Jiayu Sun, Mi Tian, Hua Huang:

Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information. 1526-1536 - Linshen Liu, Boyan Su, Junyue Jiang, Guanlin Wu, Cong Guo, Ceyu Xu, Hao (Frank) Yang:

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge. 25903-25913 - Ziyue Huang, Yongchao Feng, Ziqi Liu, Shuai Yang, Qingjie Liu, Yunhong Wang:

OpenRSD: Towards Open-Prompts for Object Detection in Remote Sensing Images. 8384-8394 - Chi-Jui Ho, Yash Belhe, Steve Rotenberg, Ravi Ramamoorthi, Tzu-Mao Li, Nicholas Antipa:

A Differentiable Wave Optics Model for End-To-End Computational Imaging System Optimization. 28042-28051 - Yuhui Zeng, Haoxiang Wu, Wenjie Nie, Guangyao Chen, Xiawu Zheng, Yunhang Shen, Jun Peng, Yonghong Tian, Rongrong Ji:

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-Guided Symbolic Reasoning. 24380-24391 - Haoyi Duan, Hong-Xing Yu, Sirui Chen, Li Fei-Fei, Jiajun Wu:

WorldScore: A Unified Evaluation Benchmark for World Generation. 1-12 - Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu:

FaceLift: Learning Generalizable Single Image 3D Face Reconstruction From Synthetic Heads. 12691-12701 - Qihan Huang, Weilong Dai, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jingyuan Chen, Chang Yao, Jie Song:

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO. 4848-4857 - Jixuan Fan, Wanhua Li, Yifei Han, Tianru Dai, Yansong Tang:

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction. 25250-25260 - Yiren Song, Xiaokang Liu, Mike Zheng Shou:

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity. 16904-16915 - Xiaoqi Wang, Clint Sebastian, Wenbin He, Liu Ren:

ProSAM: Enhancing the Robustness of SAM-Based Visual Reference Segmentation with Probabilistic Prompts. 1-10 - Mahmoud Afifi, Luxi Zhao, Abhijith Punnappurath, Mohammed A. Abdelsalam, Ran Zhang, Michael S. Brown:

Time-Aware Auto White Balance in Mobile Photography. 5038-5047 - Minh Tran, Hongda Mao, Qingshuang Chen, Yelin Kim:

Head2Body: Body Pose Generation from Multi-Sensory Head-Mounted Inputs. 6849-6858 - Jinhong Ni, Chang-Bin Zhang, Qiang Zhang, Jing Zhang:

What Makes for Text to 360-Degree Panorama Generation with Stable Diffusion? 1-10 - Dongming Wu, Yanping Fu, Saike Huang, Yingfei Liu, Fan Jia, Nian Liu, Feng Dai, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jianbing Shen:

RAGNet: Large-Scale Reasoning-Based Affordance Segmentation Benchmark Towards General Grasping. 11980-11990 - Yingfan Ma, Bohan An, Ao Shen, Mingzhi Yuan, Minghong Duan, Manning Wang:

Flow-MIL: Constructing Highly-Expressive Latent Feature Space for Whole Slide Image Classification using Normalizing Flow. 23561-23570 - Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao, Kaixin Wang, Kuang Cao, Ji Wu, Jiayuan Li:

TurboReg: TurboClique for Robust and Efficient Point Cloud Registration. 26371-26381 - Tongkun Guan, Zining Wang, Pei Fu, Zhengtao Guo, Wei Shen, Kai Zhou, Tiezhu Yue, Chen Duan, Hao Sun, Qianyi Jiang, Junfeng Luo, Xiaokang Yang:

A Token-Level Text Image Foundation Model for Document Understanding. 23210-23220 - Jon Nyffeler, Federico Tombari, Daniel Barath:

Hierarchical 3D Scene Graphs Construction Outdoors. 26817-26826 - Jian Shi, Peter Wonka:

VoxelKP: A Voxel-Based Network Architecture for Human Keypoint Estimation in LiDAR Data. 28282-28291 - Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal:

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility. 2055-2066 - Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, Ming-Ming Cheng:

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning. 18969-18979 - Xinqi Fan, Xueli Chen, Luoxiao Yang, Chuin Hong Yap, Rizwan Qureshi, Qi Dou, Moi Hoon Yap, Mubarak Shah:

Test-Time Retrieval-Augmented Adaptation for Vision-Language Models. 8810-8819 - Zhu Xu, Ting Lei, Zhimin Li, Guan Wang, Qingchao Chen, Yuxin Peng, Yang Liu:

TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-Enhanced Relation-Aware Knowledge Transferring. 15812-15821 - Mainak Singha, Subhankar Roy, Sarthak Mehrotra, Ankit Jha, Moloud Abdar, Biplab Banerjee, Elisa Ricci:

FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models. 1-10 - Mingquan Zhou, Chen He, Ruiping Wang, Xilin Chen:

OV3D-CG: Open-Vocabulary 3D Instance Segmentation with Contextual Guidance. 5305-5314 - Xunpeng Yi, Yibing Zhang, Xinyu Xiang, Qinglong Yan, Han Xu, Jiayi Ma:

LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables. 14559-14568 - Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee:

Auto-Regressive Transformation for Image Alignment. 13569-13579 - Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang, Xipeng Qiu:

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks. 11142-11152 - Junchao Huang, Xinting Hu, Shaoshuai Shi, Zhuotao Tian, Li Jiang:

Edit360: 2D Image Edits to 3D Assets From Any Angle. 16618-16628 - Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, Baining Guo:

Gaussian Variation Field Diffusion for High-Fidelity Video-to-4D Synthesis. 12502-12513 - Peijun Bao, Chenqi Kong, Siyuan Yang, Zihao Shao, Xinghao Jiang, Boon Poh Ng, Meng Hwa Er, Alex ChiChung Kot:

Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild. 20541-20550 - Guangyu Ren, Hengyan Liu, Michalis Lazarou, Tania Stathaki:

Multi-Modal Segment Anything Model for Camouflaged Scene Segmentation. 19882-19892 - Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu:

7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting. 26316-26325 - Zhixuan Li, Hyunse Yoon, Sanghoon Lee, Weisi Lin:

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA. 21927-21937 - Yuzhu Wang, Manni Duan, Shu Kong:

Attention to the Burstiness in Visual Prompt Tuning! 4253-4263 - Shiwei Zhang, Qi Zhou, Wei Ke:

Enhancing Zero-Shot Object Counting via Text-Guided Local Ranking and Number-Evoked Global Attention. 21097-21106 - Jie Xu, Na Zhao, Gang Niu, Masashi Sugiyama, Xiaofeng Zhu:

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation. 4232-4241 - Zihan Cao, Yu Zhong, Ziqi Wang, Liang-Jian Deng:

MMAIF: Multi-Task and Multi-Degradation All-in-One for Image Fusion with Language Guidance. 11744-11754 - Zhuoyuan Li, Jiahao Lu, Jiacheng Deng, Hanzhi Chang, Lifan Wu, Yanzhe Liang, Tianzhu Zhang:

SAS: Segment Any 3D Scene with Integrated 2D Priors. 8306-8318 - Zhang Li, Biao Yang, Qiang Liu, Shuo Zhang, Zhiyin Ma, Liang Yin, Linger Deng, Yabo Sun, Yuliang Liu, Xiang Bai:

LIRA: Inferring Segmentation in Large Multi-Modal Models with Local Interleaved Region Assistance. 24056-24067 - Weijia Zhang, Yuehao Liu, Wu Ran, Chao Ma:

Cross-Architecture Distillation Made Simple with Redundancy Suppression. 23256-23266 - Yuanze Li, Shihao Yuan, Haolin Wang, Qizhang Li, Ming Liu, Chen Xu, Guangming Shi, Wangmeng Zuo:

Triad: Empowering LMM-Based Anomaly Detection with Expert-Guided Region-of-Interest Tokenizer and Manufacturing Process. 21917-21926 - Viet Nguyen, Anh Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran:

Supercharged One-Step Text-to-Image Diffusion Models with Negative Prompts. 18004-18013 - Gaurav Patel, Qiang Qiu:

Learning to Unlearn While Retaining: Combating Gradient Conflicts in Machine Unlearning. 4211-4221 - Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu:

TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination via Latent Truthful-Guided Pre-Intervention. 7372-7382 - Jiawei Wang, Yushen Zuo, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng, Kin-Man Lam:

Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-Based Attacks. 2773-2782 - Xin Wen, Bingchen Zhao, Ismail Elezi, Jiankang Deng, Xiaojuan Qi:

"Principal Components" Enable a New Language of Images. 16641-16651 - Hyunjin Cho, Giyun Choi, Jongwon Choi:

AJAHR: Amputated Joint Aware 3D Human Mesh Recovery. 7925-7935 - Wei Liao, Chunyan Xu, Chenxu Wang, Zhen Cui:

LLM-Assisted Semantic Guidance for Sparsely Annotated Remote Sensing Object Detection. 22519-22528 - Christophe Bolduc, Yannick Hold-Geoffroy, Jean-François Lalonde:

GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR. 29120-29130 - Nikita Karaev, Yuri Makarov, Jianyuan Wang, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht:

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labeling Real Videos. 1-10 - Jiacheng Ruan, Wenzhen Yuan, Xiqi Gao, Ye Guo, Daoxin Zhang, Zhe Xu, Yao Hu, Ting Liu, Yuzhuo Fu:

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models. 3163-3173 - Richard Liu, Daniel Fu, Noah Tan, Itai Lang, Rana Hanocka:

WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction. 14810-14821 - Ruijie Zhu, Mulin Yu, Linning Xu, Lihan Jiang, Yixuan Li, Tianzhu Zhang, Jiangmiao Pang, Bo Dai:

ObjectGS: Object-Aware Scene Reconstruction and Scene Understanding via Gaussian Splatting. 8350-8360 - Hanlin Li, Wenming Weng, Yueyi Zhang, Zhiwei Xiong:

GenFlow3D: Generative Scene Flow Estimation and Prediction on Point Cloud Sequences. 27488-27497 - Weiyi You, Mingyang Zhang, Leheng Zhang, Xingyu Zhou, Kexuan Shi, Shuhang Gu:

Consistency Trajectory Matching for One-Step Generative Super-Resolution. 12747-12756 - Kai Jia, Tengyu Liu, Yixin Zhu, Mingtao Pei, Siyuan Huang:

PrimHOI: Compositional Human-Object Interaction via Reusable Primitives. 11491-11501 - Hao Si, Ehsan Javanmardi, Manabu Tsukada:

You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception. 27521-27530 - Chenhang Ying, Huiyu Yang, Jieyi Ge, Zhaodong Sun, Xu Cheng, Kui Ren, Xiaobai Li:

FusionPhys: A Flexible Framework for Fusing Complementary Sensing Modalities in Remote Physiological Measurement. 9363-9373 - Jiahua Dong, Hui Yin, Wenqi Liang, Hanbin Zhao, Henghui Ding, Nicu Sebe, Salman Khan, Fahad Shahbaz Khan:

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation. 11829-11839 - Sivan Doveh, Nimrod Shabtay, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogério Feris, Leonid Karlinsky, James R. Glass, Assaf Arbelle, Shimon Ullman, Muhammad Jehanzeb Mirza:

Teaching VLMs to Localize Specific Objects from In-Context Examples. 9572-9582 - Liwei Che, Tony Qingze Liu, Jing Jia, Weiyi Qin, Ruixiang Tang, Vladimir Pavlovic:

Hallucinatory Image Tokens: A Training-Free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs. 21635-21644 - Yuxuan Luo, Jiaqi Tang, Chenyi Huang, Feiyang Hao, Zhouhui Lian:

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model. 23030-23040 - Shenghao Fu, Qize Yang, Yuan-Ming Li, Yi-Xing Peng, Kun-Yu Lin, Xihan Wei, Jian-Fang Hu, Xiaohua Xie, Wei-Shi Zheng:

ViSpeak: Visual Instruction Feedback in Streaming Videos. 21778-21788 - Minwen Liao, Hao Bo Dong, Xinyi Wang, Kurban Ubul, Yihua Shao, Ziyang Yan:

GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts. 8766-8776 - Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jonathan T. Barron, Yinda Zhang:

EVER: Exact Volumetric Ellipsoid Rendering for Real-Time View Synthesis. 4930-4939 - Yuhui Wu, Liyi Chen, Ruibin Li, Shihao Wang, Chenxi Xie, Lei Zhang:

InsViE-1M: Effective Instruction-Based Video Editing with Elaborate Dataset Construction. 16692-16701 - Xinyu Fang, Zhijian Chen, Kai Lan, Lixin Ma, Shengyuan Ding, Yingji Liang, Xiangyu Zhao, Farong Wen, Zicheng Zhang, Guofeng Zhang, Haodong Duan, Kai Chen, Dahua Lin:

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs. 447-456 - Gabriele Moreno Berton, Alex Stoken, Carlo Masone:

AstroLoc: Robust Space to Ground Image Localizer. 5811-5820 - Tim Elsner, Paula Usinger, Julius Nehring-Wirxel, Gregor Kobsik, Victor Czech, Yanjiang He, Isaak Lim, Leif Kobbelt:

Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation. 21331-21341 - Rangel Daroya, Elijah Cole, Oisin Mac Aodha, Grant Van Horn, Subhransu Maji:

WildSAT: Learning Satellite Image Representations from Wildlife Observations. 6143-6154 - Sebastian Schmidt, Julius Körner, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, Stephan Günnemann:

Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation. 23646-23656 - Zhu Yu, Bowen Pang, Lizhe Liu, Runmin Zhang, Qiang Li, Si-Yuan Cao, Maochun Luo, Mingxia Chen, Sheng Yang, Hui-Liang Shen:

Language Driven Occupancy Prediction. 7548-7558 - Fei Yin, Mallikarjun B. R., Chun-Han Yao, Rafal K. Mantiuk, Varun Jampani:

FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image. 11612-11621 - Xiaomeng Fu, Jia Li:

TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration. 18552-18562 - Junpeng Jing, Weixun Luo, Ye Mao, Krystian Mikolajczyk:

Stereo Any Video: Temporally Consistent Stereo Matching. 20836-20846 - Guanxing Lu, Tengbo Yu, Haoyuan Deng, Season Si Chen, Yansong Tang, Ziwei Wang:

AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation. 13662-13672 - Luming Zhao, Jingwen Xuan, Jiamin Lou, Yonghui Yu, Wenwu Yang:

Context-Aware Academic Emotion Dataset and Benchmark. 13859-13868 - Ao Liang, Lingdong Kong, Dongyue Lu, Youquan Liu, Jian Fang, Huaici Zhao, Wei Tsang Ooi:

Perspective-Invariant 3D Object Detection. 27725-27738 - Habin Lim, Yeongseob Won, Juwon Seo, Park Park:

ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-Wise Adaptation and Attention Disentanglement. 18421-18430 - Shivani Mall, João F. Henriques:

CRAM: Large-Scale Video Continual Learning with Bootstrapped Compression. 1504-15055 - Xi Li, Tong Rao, Cihui Pan:

EDM: Efficient Deep Feature Matching. 26198-26208 - Wonwoong Cho, Yan-Ying Chen, Matthew Klenk, David I. Inouye, Yanxia Zhang:

Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder. 15626-15635 - Xinyue Hao, Gen Li, Shreyank N. Gowda, Robert B. Fisher, Jonathan Huang, Anurag Arnab, Laura Sevilla-Lara:

Principles of Visual Tokens for Efficient Video Understanding. 21254-21264 - Keon-Hee Park, Seun-An Choe, Gyeong-Moon Park:

SFUOD: Source-Free Unknown Object Detection. 3499-3508 - Sung-Yeon Park, Can Cui, Yunsheng Ma, Ahmadreza Moradipari, Rohit Gupta, Kyungtae Han, Ziran Wang:

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models. 8066-8076 - Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia:

Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning. 647-656 - Hongliang Zhou, Yongxiang Liu, Canyu Mo, Weijie Li, Bowen Peng, Li Liu:

When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection. 24309-24318 - Chia-Wen Kuo, Sijie Zhu, Fan Chen, Xiaohui Shen, Longyin Wen:

$\mathcal{D}$-Attn: Decomposed Attention for Large Vision-and-Language Models. 23935-23944 - Longxin Kou, Fei Ni, Yan Zheng, Peilong Han, Jinyi Liu, Haiqin Cui, Rui Liu, Jianye Hao:

RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-Horizon Robot Demonstration. 10353-10363 - Wanting Zhang, Zhenhui Ding, Guilian Chen, Huisi Wu, Jing Qin:

RA-BUSSeg: Relation-Aware Semi-Supervised Breast Ultrasound Image Segmentation via Adjacent Propagation and Cross-Layer Alignment. 21689-21698 - Jeongeun Park, Sungjoon Choi, Sangdoo Yun:

A Unified Framework for Motion Reasoning and Generation in Human Interaction. 10698-10707 - Jingyu Liu, Zijie Xin, Yuhan Fu, Ruixiang Zhao, Bangxiang Lan, Xirong Li:

Multi-Object Sketch Animation by Scene Decomposition and Motion Planning. 11537-11546 - Woo Kyoung Han, Yongjun Lee, Byeonghun Lee, Sanghyun Park, Sunghoon Im, Kyong Hwan Jin:

JPEG Processing Neural Operator for Backward-Compatible Coding. 19503-19512 - Jun Zhang, Desen Meng, Zhengming Zhang, Zhenpeng Huang, Tao Wu, Limin Wang:

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay. 3705-3715 - Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu, Branislav Kveton, Yufan Zhou, Jiuxiang Gu, Jian Chen, Changyou Chen:

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation. 19638-19648 - Ruiqian Li, Siyuan Shen, Suan Xia, Ziheng Wang, Xingyue Peng, Chengxuan Song, Yingsheng Zhu, Tao Wu, Shiying Li, Jingyi Yu:

TransiT: Transient Transformer for Non-Line-of-Sight Videography. 27542-27551 - Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Zhonghua Wu, Qingyi Tao, Wentao Liu, Wei Li, Chen Change Loy:

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation. 17739-17750 - Yuci Liang, Xinheng Lyu, Wenting Chen, Meidan Ding, Jipeng Zhang, Xiangjian He, Song Wu, Xiaohan Xing, Sen Yang, Xiyue Wang, Linlin Shen:

WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image. 22718-22727 - Longhua Li, Lei Qi, Xin Geng:

One-Shot Knowledge Transfer for Scalable Person Re-Identification. 668-677 - Zhenxing Dong, Jiazhou Chen:

Transformer-Based Tooth Alignment Prediction with Occlusion and Collision Constraints. 25145-25154 - Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen:

Generalized Tensor-Based Parameter-Efficient Fine-Tuning via Lie Group Transformations. 197-207 - Nithin Gopalakrishnan Nair, Srinivas Kaza, Xuan Luo, Vishal M. Patel, Stephen Lombardi, Jungyeon Park:

Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data. 28567-28576 - Xianqi Wang, Hao Yang, Gangwei Xu, Junda Cheng, Min Lin, Yong Deng, Jinliang Zang, Yurui Chen, Xin Yang:

ZeroStereo: Zero-Shot Stereo Matching from Single Images. 28177-28187 - Shaoyuan Xie, Lingdong Kong, Yuhao Dong, Chonghao Sima, Wenwei Zhang, Qi Alfred Chen, Ziwei Liu, Liang Pan:

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives. 6285-6297 - Tianyi Wei, Yifan Zhou, Dongdong Chen, Xingang Pan:

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing. 16745-16754 - Constantin Patsch, Yuankai Wu, Marsil Zakour, Driton Salihu, Eckehard Steinbach:

MistSense: Versatile Online Detection of Procedural and Execution Mistakes. 14528-14537 - Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano:

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning. 6541-6551 - Wufei Ma, Haoyu Chen, Guofeng Zhang, Yu-Cheng Chou, Jieneng Chen, Celso de Melo, Alan L. Yuille:

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark. 6924-6934 - Alberto Jaenal, Paula Carbó Cubero, José Araujo, André Mateus:

Towards Visual Localization Interoperability: Cross-Feature for Collaborative Visual Localization and Mapping. 26783-26792 - Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu:

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts. 7794-7807 - Jiang Yuan, Ji Ma, Bo Wang, Guanzhou Ke, Weiming Hu:

LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning. 11927-11936 - Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai:

RAGD: Regional-Aware Diffusion Model for Text-to-Image Generation. 19331-19341 - Jiro Abe, Gaku Nakano, Kazumine Ogura:

NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals. 25421-25430 - Ao Ma, Jiasong Feng, Ke Cao, Jing Wang, Yun Wang, Quanwei Zhang, Zhanjie Zhang:

Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation. 16102-16111 - Yu-Chien Liao, Jr-Jen Chen, Chi-Pin Huang, Ci-Siang Lin, Meng-Lin Wu, Yu-Chiang Frank Wang:

Continual Personalization for Diffusion Models. 15511-15520 - Chamin Hewa Koneputugodage, Dylan Campbell, Stephen Gould:

Leaps and Bounds: An Improved Point Cloud Winding Number Formulation for Fast Normal Estimation and Surface Reconstruction. 26116-26125 - Zhenyu Li, Mykola Lavreniuk, Jian Shi, Shariq Farooq Bhat, Peter Wonka:

Amodal Depth Anything: Amodal Depth Estimation in the Wild. 9673-9682 - Shijie Fang, Hongping Gan:

Unfolding-Associative Encoder-Decoder Network with Progressive Alignment for Pansharpening. 13651-13661 - Weiqi Zhang, Junsheng Zhou, Haotian Geng, Wenyuan Zhang, Yu-Shen Liu:

GAP: Gaussianize Any Point Clouds with Text Guidance. 25627-25638 - Bingchen Gong, Diego Gomez, Abdullah Hamdi, Abdelrahman Eldesokey, Ahmed Abdelreheem, Peter Wonka, Maks Ovsjanikov:

ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models. 22089-22099 - Jiamin Wu, Kenkun Liu, Xiaoke Jiang, Yuan Yao, Lei Zhang:

UniGS: Modeling Unitary 3D Gaussians for Novel View Synthesis from Sparse-View Images. 26241-26251 - Zeyuan Yang, Delin Chen, Xueyang Yu, Maohao Shen, Chuang Gan:

VCA: Video Curious Agent for Long Video Understanding. 20168-20179 - Rui Ma, Qilong Wang, Bing Cao, Qinghua Hu, Yahong Han:

Unknown Text Learning for CLIP-Based Few-Shot Open-Set Recognition. 657-667 - SungMin Jang, Wonjun Kim:

Identity-Aware Language Gaussian Splatting for Open-Vocabulary 3D Semantic Segmentation. 1-10 - Runjia Li, Philip Torr, Andrea Vedaldi, Tomas Jakab:

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory. 25690-25699 - Taihang Hu, Linxuan Li, Kai Wang, Yaxing Wang, Jian Yang, Ming-Ming Cheng:

Anchor Token Matching: Implicit Structure Locking for Training-Free AR Image Editing. 18166-18176 - Yandan Wang, Chenqi Guo, Yinglong Ma, Jiangyan Chen, Yuan Gao, Weiming Dong:

Bridging Class Imbalance and Partial Labeling Via Spectral-Balanced Energy Propagation for Skeleton-Based Action Recognition. 10162-10172 - Wenjin Mo, Zhiyuan Li, Minghong Fang, Mingwei Fang:

Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning. 3967-3976 - Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer:

Web Artifact Attacks Disrupt Vision Language Models. 1048-1057 - Anusha Krishnan, Shaohui Liu, Paul-Edouard Sarlin, Oscar Gentilhomme, David Caruso, Maurizio Monge, Richard Newcombe, Jakob J. Engel, Marc Pollefeys:

Benchmarking Egocentric Visual-Inertial SLAM at City Scale. 25207-25217 - Alejandro Pardo, Fabio Pizzati, Tong Zhang, Alexander Pondaven, Philip Torr, Juan C. Pérez, Bernard Ghanem:

MatchDiffusion: Training-Free Generation of Match-Cuts. 14973-14982 - Lei Fan, Junjie Huang, Donglin Di, Anyang Su, Tianyou Song, Maurice Pagnucco, Yang Song:

Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection. 21419-21428 - Fengchen He, Dayang Zhao, Hao Xu, Tingwei Quan, Shaoqun Zeng:

Simulating Dual-Pixel Images From Ray Tracing for Depth Estimation. 26106-26115 - Shani Gamrian, Hila Barel, Feiran Li, Masakazu Yoshimura, Daisuke Iso:

Beyond RGB: Adaptive Parallel Processing for RAW Object Detection. 5547-5557 - Baicheng Li, Zike Yan, Dong Wu, Hongbin Zha:

Proactive Scene Decomposition and Reconstruction. 9780-9789 - Xiaogang Xu, Jiafei Wu, Qingsen Yan, Jiequan Cui, Richang Hong, Bei Yu:

Learnable Feature Patches and Vectors for Boosting Low-Light Image Enhancement Without External Knowledge. 7761-7770 - Kien Nguyen, Anh Tran, Cuong Pham:

SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models. 19587-19596 - Olaf Dünkel, Artur Jesslen, Jiahao Xie, Christian Theobalt, Christian Rupprecht, Adam Kortylewski:

CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts. 19978-19988 - Zebin He, Mingxin Yang, Shuhui Yang, Yixuan Tang, Tao Wang, Kaihao Zhang, Guanying Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Wenhan Luo:

MaterialMVP: Illumination-Invariant Material Generation via Multi-View PBR Diffusion. 26294-26305 - Zenghao Niu, Weicheng Xie, Siyang Song, Zitong Yu, Feng Liu, Linlin Shen:

Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling. 3885-3894 - Weiwei Cao, Jianpeng Zhang, Zhongyi Shui, Sinuo Wang, Zeli Chen, Xi Li, Le Lu, Xianghua Ye, Tingbo Liang, Qi Zhang, Ling Zhang:

Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-Language Pre-Training. 23041-23050 - Zixian Guo, Ming Liu, Qilong Wang, Zhilong Ji, Jinfeng Bai, Lei Zhang, Wangmeng Zuo:

Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving. 3988-3998 - Peyman Gholami, Robert Xiao:

Streamlining Image Editing with Layered Diffusion Brushes. 17368-17378 - Robin Swanson, Esther Y. H. Lin, Masen Lamb, Suresh Sivanandam, Kiriakos N. Kutulakos:

Super Resolved Imaging with Adaptive Optics. 29142-29152 - Ting Lei, Shaofeng Yin, Qingchao Chen, Yuxin Peng, Yang Liu:

Open-Vocabulary HOI Detection With Interaction-Aware Prompt and Concept Calibration. 23945-23957 - Yangyi Huang, Ye Yuan, Xueting Li, Jan Kautz, Umar Iqbal:

AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion. 13533-13543 - Weida Wang, Changyong He, Jin Zeng, Di Qiu:

Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention. 5188-5197 - Yifan Li, Xin Li, Tianqin Li, Wenbin He, Yu Kong, Liu Ren:

Svit-Split: Unleashing the Power of Vision Foundation Models Via Efficient Splitting Heads. 1979-1989 - Matan Kichler, Shai Bagon, Mark Sheinin:

Learning to See Inside Opaque Liquid Containers Using Speckle Vibrometry. 9466-9476 - Divyansh Srivastava, Xiang Zhang, He Wen, Chenru Wen, Zhuowen Tu:

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers. 17909-17919 - Dimitrios Mallis, Ahmet Serdar Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, Djamila Aouada:

CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers. 7284-7294 - Qi He, Xiao Wu, Jun-Yan He, Shuai Li:

Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection. 2067-2076 - Sophia Sirko-Galouchenko, Spyros Gidaris, Antonín Vobecký, Andrei Bursuc, Nicolas Thome:

DIP: Unsupervised Dense In-Context Post-Training of Visual Representations. 4264-4274 - Parag Dutta, Mohd Ayyoob, Shalabh Bhatnagar, Ambedkar Dukkipati:

One Encoder to Rule Them All: Representation Learning for Model-Free Visual Reinforcement Learning Using Fourier Neural Operators. 4818-4827 - Binjian Xie, Pengju Zhang, Hao Wei, Yihong Wu:

Hi-Gaussian: Hierarchical Gaussians Under Normalized Spherical Projection for Single-View 3D Reconstruction. 28664-28673 - Xiaoling Hu, Xiangrui Zeng, Oula Puonti, Juan Eugenio Iglesias, Bruce Fischl, Yaël Balbastre:

Learn2Synth: Learning Optimal Data Synthesis Using Hypergradients for Brain Image Segmentation. 20368-20378 - Sebastian Höfer, Dorian Fritz Henning, Artemij Amiranashvili, Douglas Morrison, Mariliza Tzes, Ingmar Posner, Marc Matvienko, Alessandro Rennola, Anton Milan:

Kaputt: A Large-Scale Dataset for Visual Defect Detection. 24224-24233 - Minghe Gao, Xuqi Liu, Zhongqi Yue, Yang Wu, Shuang Chen, Juncheng Li, Siliang Tang, Fei Wu, Tat-Seng Chua, Yueting Zhuang:

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program. 1718-1728 - Bin Fu, Zixuan Wang, Kainan Yan, Shitian Zhao, Qi Qin, Jie Wen, Junjun He, Peng Gao:

Fontanimate: High Quality Few-Shot Font Generation Via Animating Font Transfer Process. 16015-16025 - Chang Qiu, Feipeng Da, Zilei Zhang:

Feature Extraction and Representation of Pre-Training Point Cloud Based on Diffusion Models. 26559-26568 - Juhyung Ha, Vibhas K. Vats, Soon-Heung Jung, Md. Alimoor Reza, David J. Crandall:

HVPUNet: Hybrid-Voxel Point-Cloud Upsampling Network. 29153-29162 - Wenqi Wang, Reuben Tan, Pengyue Zhu, Jianwei Yang, Zhengyuan Yang, Lijuan Wang, Andrey Kolobov, Jianfeng Gao, Boqing Gong:

SITE: Towards Spatial Intelligence Thorough Evaluation. 9058-9069 - Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang:

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion. 25335-25345 - Byungchul Chae, Seonyeong Heo:

What to Distill? Fast Knowledge Distillation with Adaptive Sampling. 2407-2416 - Sofiène Boutaj, Marin Scalbert, Pierre Marza, Florent Couzinie-Devy, Maria Vakalopoulou, Stergios Christodoulidis:

Controllable Latent Space Augmentation for Digital Pathology. 22165-22174 - Jinsol Song, Jiamu Wang, Anh Tien Nguyen, Keunho Byeon, Sangjeong Ahn, Sung Hak Lee, Jin Tae Kwak:

Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images. 22066-22076 - Mainak Biswas, Ambedkar Dukkipati, Devarajan Sridharan:

Semi-Supervised Deep Transfer for Regression Without Domain Alignment. 827-836 - Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue:

Scaling Omni-Modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities. 1336-1348 - Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae, Ha Young Kim:

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation. 21048-21058 - Han Wang, Yuxiang Nie, Yongjie Ye, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang:

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM. 20812-20823 - Chenxin Li, Yifan Liu, Panwang Pan, Hengyu Liu, Xinyu Liu, Wuyang Li, Cheng Wang, Weihao Yu, Yiyang Lin, Yixuan Yuan:

InfoBridge: Balanced Multimodal Integration through Conditional Dependency Modeling. 393-404 - Hoonhee Cho, Yuhwan Jeong, Kuk-Jin Yoon:

Learning Large Motion Estimation from Intermediate Representations with a High-Resolution Optical Flow Dataset Featuring Long-Range Dynamic Motion. 6176-6187 - Ming-Yu Liu, Mikaela Angelina Uy, Donglai Xiang, Hao Su, Sanja Fidler, Nicholas Sharp, Jun Gao:

PartField: Learning 3D Feature Fields for Part Segmentation and Beyond. 9704-9715 - Huan Wang, Haoran Li, Huaming Chen, Jun Yan, Jiahua Shi, Jun Shen:

FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning. 3726-3736 - Zewei Xin, Qinya Li, Chaoyue Niu, Fan Wu, Guihai Chen:

Adaptive Routing of Text-to-Image Generation Requests between Large Cloud Model and Light-Weight Edge Model. 19482-19491 - Jingyi Lu, Kai Han:

Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping. 18304-18313 - Peng Wang, Yongcai Wang, Hualong Cao, Wang Chen, Deying Li:

LA-MOTR: End-to-End Multi-Object Tracking by Learnable Association. 12438-12448 - Xiaojie Li, Ronghui Li, Shukai Fang, Shuzhao Xie, Xiaoyang Guo, Jiaqing Zhou, Junkun Peng, Zhi Wang:

Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling. 14420-14430 - Chang Liu, Viraj Shah, Aiyu Cui, Svetlana Lazebnik:

UnZipLoRA: Separating Content and Style from a Single Image. 16776-16785 - Qi Bi, Yixian Shen, Jingjun Yi, Gui-Song Xia:

AdaDCP: Learning an Adapter with Discrete Cosine Prior for Clear-to-Adverse Domain Generalization. 12997-13008 - Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi:

Radiant Foam: Real-Time Differentiable Ray Tracing. 4135-4145 - George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, Judy Hoffman:

Contrastive Flow Matching. 1185-1194 - Zichen Liu, Yihao Meng, Hao Ouyang, Yue Yu, Bolin Zhao, Daniel Cohen-Or, Huamin Qu:

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior. 14787-14797 - Dale Decatur, Thibault Groueix, Wang Yifan, Rana Hanocka, Vladimir G. Kim, Matheus Gadelha:

Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets. 16482-16491 - Zhenwei Shao, Mingyang Wang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Ning Mao, Wei Chen, Jun Yu:

Growing a Twig to Accelerate Large Vision-Language Models. 20064-20074 - Huanpeng Chu, Wei Wu, Guanyu Feng, Yutao Zhang:

OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models. 16302-16312 - Eunseo Koh, Seunghoo Hong, Tae-Young Kim, Simon S. Woo, Jae-Pil Heo:

Translation of Text Embedding Via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models. 15365-15374 - Dengke Zhang, Fagui Liu, Quan Tang:

CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation. 24677-24687 - Junyuan Deng, Wei Yin, Xiaoyang Guo, Qian Zhang, Xiaotao Hu, Weiqiang Ren, Xiao-Xiao Long, Ping Tan:

Boost 3D Reconstruction Using Diffusion-Based Monocular Camera Calibration. 7110-7121 - Xiaorui Jiang, Buyun He, Peng Yuan Zhou, Xinyue Chen, Jingcai Guo, Jie Xu, Yong Liao:

A Unified Framework to BRIDGE Complete and Incomplete Deep Multi-View Clustering Under Non-IID Missing Patterns. 594-603 - Jiaqi Liao, Yuwei Niu, Fanqing Meng, Hao Li, Changyao Tian, Yinuo Du, Yuwen Xiong, Dianqi Li, Xizhou Zhu, Li Yuan, Jifeng Dai, Yu Cheng:

LangBridge: Interpreting Image as a Combination of Language Embeddings. 23752-23762 - Bo Peng, Jie Lu, Guangquan Zhang, Zhen Fang:

On the Provable Importance of Gradients for Autonomous Language-Assisted Image Clustering. 19805-19815 - Qingwang Zhang, Yingying Zhu:

Breaking Rectangular Shackles: Cross-View Object Segmentation for Fine-Grained Object Geo-Localization. 8197-8206 - Xiaokun Feng, Shiyu Hu, Xuchen Li, Dailing Zhang, Meiqi Wu, Jing Zhang, Xiaotang Chen, Kaiqi Huang:

ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking. 19850-19861 - Zhen Zhang, Shuai Yang, Qianlong Dang, Zhize Wu, Lichuan Gu:

Split-And-Combine: Enhancing Style Augmentation for Single Domain Generalization. 15616-15625 - Liang Han, Xu Zhang, Haichuan Song, Kanle Shi, Yu-Shen Liu, Zhizhong Han:

SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies. 28514-28524 - Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner:

SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians. 14162-14172 - Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li:

Learning to Inference Adaptively for Multimodal Large Language Models. 3552-3563 - Yuxuan Wang, Tianwei Cao, Huayu Zhang, Zhongjiang He, Kongming Liang, Zhanyu Ma:

FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models. 17046-17055 - Jiahui Ren, Mochu Xiang, Jiajun Zhu, Yuchao Dai:

PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction. 28959-28969 - Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang:

DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-Free 3D Reconstruction. 6294-6303 - Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan L. Yuille, Liang-Chieh Chen:

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation. 15781-15791 - Jiaxu Wan, Hong Zhang, Ziqi He, Yangyan Deng, Qishu Wang, Ding Yuan, Yifan Yang:

SP2T: Sparse Proxy Attention for Dual-Stream Point Transformer. 27885-27895 - Mang Cao, Sanping Zhou, Yizhe Li, Ye Deng, Wenli Huang, Le Wang:

Enhancing Mamba Decoder with Bidirectional Interaction in Multi-Task Dense Prediction. 1-10 - Runqi Wang, Caoyuan Ma, Guopeng Li, Hanrui Xu, Yuke Li, Zheng Wang:

You Think, You ACT: the New Task of Arbitrary Text to Motion Generation. 12012-12022 - Yi-Lin Wei, Mu Lin, Yuhao Lin, Jian-Jian Jiang, Xiao-Ming Wu, Ling-An Zeng, Wei-Shi Zheng:

AffordDexGrasp: Open-Set Language-Guided Dexterous Grasp With Generalizable-Instructive Affordance. 11818-11828 - Xin Wei, Qin Yang, Yijie Fang, Mingrui Zhu, Nannan Wang:

3D Test-Time Adaptation via Graph Spectral Driven Point Shift. 26762-26771 - Zhenghao Gao, Shengjie Xu, Zijing Li, Meixi Chen, Chaojian Yu, Yuanjie Shao, Changxin Gao:

FastJSMA: Accelerating Jacobian-Based Saliency Map Attacks Through Gradient Decoupling. 1506-1515 - Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, Naoto Yokoya:

GaussianOcc: Fully Self-Supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting. 28980-28990 - Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu:

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity. 10898-10907 - Xinyu Liu, Guolei Sun, Cheng Wang, Yixuan Yuan, Ender Konukoglu:

MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation. 11697-11707 - Xu Yang, Shaoli Huang, Shenbo Xie, Xuelin Chen, Yifei Liu, Changxing Ding:

Democratizing High-Fidelity Co-Speech Gesture Video Generation. 14283-14292 - Shanshan Yan, Zexi Li, Chao Wu, Meng Pang, Yang Lu, Yan Yan, Hanzi Wang:

You are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-Tailed Data. 2750-2759 - Xiao Zhang, Fei Wei, Yong Wang, Wenda Zhao, Feiyi Li, Xiangxiang Chu:

UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement. 508-518 - Yuhang Ma, Xiaoshi Wu, Keqiang Sun, Hongsheng Li:

HPSv3: Towards Wide-Spectrum Human Preference Score. 15086-15095 - Yuejiang Dong, Wang Zhao, Jiale Xu, Ying Shan, Song-Hai Zhang:

DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation. 5415-5425 - Pedro R. A. S. Bassi, Mehmet Can Yavuz, Ibrahim Ethem Hamamci, Sezgin Er, Xiaoxi Chen, Wenxuan Li, Bjoern Menze, Sergio Decherchi, Andrea Cavalli, Kang Wang, Yang Yang, Alan L. Yuille, Zongwei Zhou:

RadGPT: Constructing 3D Image-Text Tumor Datasets. 23720-23730 - Shuangkang Fang, I-Chao Shen, Takeo Igarashi, Yufeng Wang, Zesheng Wang, Yi Yang, Wenrui Ding, Shuchang Zhou:

NeRF is a Valuable Assistant for 3D Gaussian Splatting. 26230-26240 - Yuchen Guan, Chong Sun, Canmiao Fu, Zhipeng Huang, Chun Yuan, Chen Li:

Text-Guided Visual Prompt DINO for Generic Segmentation. 21288-21298 - Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen:

Randomized Autoregressive Visual Generation. 18431-18441 - Yifei Feng, Mingxin Yang, Shuhui Yang, Sheng Zhang, Jiaao Yu, Zibo Zhao, Yuhong Liu, Jie Jiang, Chunchao Guo:

RomanTex: Decoupling 3D-Aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis. 17203-17213 - Yejun Shou, Haocheng Wang, Lingfeng Shen, Qian Zheng, Gang Pan, Yanlong Cao:

Unsupervised RGB-D Point Cloud Registration for Scenes With Low Overlap and Photometric Inconsistency. 24868-24877 - Zhaorui Tan, Xi Yang, Tan Pan, Tianyi Liu, Chen Jiang, Xin Guo, Qiufeng Wang, Anh Nguyen, Yuan Qi, Kaizhu Huang, Yuan Cheng:

Towards a Universal 3D Medical Multi-Modality Generalization via Learning Personalized Invariant Representation. 21895-21905 - Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu:

DLF: Extreme Image Compression with Dual-Generative Latent Fusion. 19227-19236 - Bowen Wang, Zhouqiang Jiang, Yasuaki Susumu, Shotaro Miwa, Tianwei Chen, Yuta Nakashima:

Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown. 4732-4742 - Felix Krause, Timy Phan, Ming Gui, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer:

Tread: Token Routing for Efficient Architecture-Agnostic Diffusion Training. 15703-15713 - Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan:

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models. 22857-22867 - Lixu Wang, Chenxi Liu, Junfeng Guo, Qingqing Ye, Heng Huang, Haibo Hu, Wei Dong:

Federated Continuous Category Discovery and Learning. 2429-2439 - Jisoo Kim, Wooseok Seo, Junwan Kim, Seungho Park, Sooyeon Park, Youngjae Yu:

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models. 17235-17245 - Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu:

Trust but Verify: Programmatic VLM Evaluation in the Wild. 3258-3267 - Xiaofei Hui, Haoxuan Qu, Ping Hu, Hossein Rahmani, Jun Liu:

Boundary Probing for Input Privacy Protection when Using LMM Services. 467-477 - Fitim Abdullahu, Helmut Grabner:

Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests. 15350-15364 - Yu Wang, Bo Dang, Wanchun Li, Wei Chen, Yansheng Li:

HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery. 8482-8491 - Xu Chen, Yang Li, Yahong Han, Guangquan Xu, Jialie Shen:

Coupling the Generator with Teacher for Effective Data-Free Knowledge Distillation. 2152-2160 - Shuyi Ouyang, Ziwei Niu, Hongyi Wang, Yen-Wei Chen, Lanfen Lin:

Region-Aware Anchoring Mechanism for Efficient Referring Visual Grounding. 24192-24202 - Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan N. Aakur:

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition. 14128-14140 - Jianhua Sun, Yuxuan Li, Jiude Wei, Longfei Xu, Nange Wang, Yining Zhang, Cewu Lu:

Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations. 6396-6405 - Carter Sifferman, Yiquan Li, Yiming Li, Fangzhou Mu, Michael Gleicher, Mohit Gupta, Yin Li:

Recovering Parametric Scenes from Very Few Time-of-Flight Pixels. 27989-27999 - Lanning Zhang, Ying Zhou, Fei Gao, Ziyun Li, Maoying Qiao, Jinlan Xu, Nannan Wang:

Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization. 13901-13911 - Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He, Ling-An Zeng, Yi-Lin Wei, Dandan Zhang, Wei-Shi Zheng:

Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework. 12427-12437 - Pooyan Rahmanzadehgervi, Hung Huy Nguyen, Rosanne Liu, Long Mai, Anh Totti Nguyen:

TAB: Transformer Attention Bottlenecks Enable User Intervention and Debugging in Vision-Language Models. 22551-22562 - Hongyang He, Hongyang Xie, Haochen You, Victor Sanchez:

Semi-ViM: Bidirectional State Space Model for Mitigating Label Imbalance in Semi-Supervised Learning. 765-774 - Bozhong Zheng, Jinye Gan, Xiaohao Xu, Xintao Chen, Wenqiao Li, Xiaonan Huang, Na Ni, Yingna Wu:

Bridging 3D Anomaly Localization and Repair Via High-Quality Continuous Geometric Representation. 27063-27072 - Jeong Hun Yeo, Minsu Kim, Chae Won Kim, Stavros Petridis, Yong Man Ro:

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations. 6693-6703 - Hamadi Chihaoui, Paolo Favaro:

Diffusion Image Prior. 24636-24644 - Taeuk Jang, Hoin Jung, Xiaoqian Wang:

Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models With Bias Corpus. 1-12 - Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Shang-Hong Lai, Winston H. Hsu:

HERMES: Temporal-Coherent Long-form Understanding with Episodes and Semantics. 22911-22921 - Amirhossein Ansari, Ke Wang, Pulei Xiong:

NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection. 573-582 - Zhaoyang Li, Yuan Wang, Guoxin Xiong, Wangkai Li, Yuwen Pan, Tianzhu Zhang:

Generalized Few-Shot Point Cloud Segmentation via LLM-Assisted Hyper-Relation Matching. 23063-23073 - Renye Yan, Jikang Cheng, Yaozhong Gan, Shikun Sun, You Wu, Yunfan Yang, Ling Liang, Jinlong Lin, Yeshuang Zhu, Jie Zhou, Jinchao Zhang, Junliang Xing, Yimao Cai, Ru Huang:

Entropy-Adaptive Diffusion Policy Optimization with Dynamic Step Alignment. 1924-1934 - Junhao Wei, Yu Zhe, Jun Sakuma:

Disrupting Model Merging: A Parameter-Level Defense without Sacrificing Accuracy. 17698-17707 - Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Yan Xu:

Visual Textualization for Image Prompted Object Detection. 20900-20910 - Xiuyu Yang, Shuhan Tan, Philipp Krähenbühl:

Long-Term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation. 25305-25314 - Qiaosi Yi, Shuai Liu, Rongyuan Wu, Lingchen Sun, Yuhui Wu, Lei Zhang:

Fine-Structure Preserved Real-World Image Super-Resolution Via Transfer VAE Training. 12415-12426 - Zhihang Yuan, Rui Xie, Yuzhang Shang, Hanling Zhang, Siyuan Wang, Shengen Yan, Guohao Dai, Yu Wang:

Dlfr-Gen: Diffusion-Based Video Generation With Dynamic Latent Frame Rate. 16410-16419 - Yingxian Chen, Jiahui Liu, Ruidi Fan, Yanwei Li, Chirui Chang, Shizhen Zhao, Wilton W. T. Fok, Xiaojuan Qi, Yik-Chung Wu:

Aligning Effective Tokens with Video Anomaly in Large Language Models. 22695-22706 - Xiao Chen, Tai Wang, Quanyi Li, Tao Huang, Jiangmiao Pang, Tianfan Xue:

Gleam: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes. 5558-5568 - Youzhuo Wang, Jiayi Ye, Chuyang Xiao, Yiming Zhong, Heng Tao, Hang Yu, Yumeng Liu, Jingyi Yu, Yuexin Ma:

DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-To-Robot Handover. 12702-12712 - Xixi Hu, Runlong Liao, Keyang Xu, Bo Liu, Yeqing Li, Eugene Ie, Hongliang Fei, Qiang Liu:

Improving Rectified Flow with Boundary Conditions. 18177-18186 - Jinseok Bae, Inwoo Hwang, Young Yoon Lee, Ziyu Guo, Joseph Liu, Yizhak Ben-Shabat, Young Min Kim, Mubbasir Kapadia:

Less is More: Improving Motion Diffusion Models with Sparse Keyframes. 11069-11078 - Jungho Lee, Donghyeong Kim, Dogyoon Lee, Suhwan Cho, Minhyeok Lee, Wonjoon Lee, Taeoh Kim, Dongyoon Wee, Sangyoun Lee:

CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images. 26415-26424 - Sejin Park, Sangmin Lee, Kyong Hwan Jin, Seung-Won Jung:

IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution. 14317-14325 - Yu Zheng, Boyang Gong, Fanye Kong, Yueqi Duan, Bingyao Yu, Wenzhao Zheng, Lei Chen, Jiwen Lu, Jie Zhou:

Learning Counterfactually Decoupled Attention for Open-World Model Attribution. 122-132 - Yuanrui Wang, Cong Han, Yafei Li, Zhipeng Jin, Xiawei Li, SiNan Du, Wen Tao, Shuanglong Li, Yi Yang, Chun Yuan, Liu Lin:

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis. 18335-18344 - Xiaomeng Chu, Jiajun Deng, Guoliang You, Wei Liu, Xingchen Li, Jianmin Ji, Yanyong Zhang:

GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping Under Flexible Language Instructions. 10130-10140 - Jiawei Mao, Yuhan Wang, Yucheng Tang, Daguang Xu, Kang Wang, Yang Yang, Zongwei Zhou, Yuyin Zhou:

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs. 21525-21535 - Hanyang Kong, Xingyi Yang, Xinchao Wang:

RoGSplat: Robust Gaussian Splatting Via Generative Priors. 25735-25745 - Mengchen Zhang, Tong Wu, Jing Tan, Ziwei Liu, Gordon Wetzstein, Dahua Lin:

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography. 18229-18239 - Yujie Wei, Shiwei Zhang, Hangjie Yuan, Biao Gong, Longxiang Tang, Xiang Wang, Haonan Qiu, Hengjia Li, Shuai Tan, Yingya Zhang, Hongming Shan:

DreamRelation: Relation-Centric Video Customization. 12381-12393 - Dominik Scheuble, Hanno Holzhüter, Steven Peters, Mario Bijelic, Felix Heide:

Lidar Waveforms are Worth 40×128×33 Words. 28913-28924 - Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo:

Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation. 144-154 - Weihong Pan, Xiaoyu Zhang, Hongjia Zhai, Xiaojun Xiang, Hanqing Jiang, Guofeng Zhang:

Liberated-GS: 3D Gaussian Splatting Independent From SfM Point Clouds. 26675-26685 - Hongyu Wen, Yiming Zuo, Venkat Subramanian, Patrick Chen, Jia Deng:

Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation. 6715-6725 - Donald Shenaj, Ondrej Bohdal, Mete Ozay, Pietro Zanuttigh, Umberto Michieli:

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation. 16132-16142 - Xirui Hu, Jiahao Wang, Hao Chen, Weizhan Zhang, Benqi Wang, Yikun Li, Haishun Nan:

DynamicID: Zero-Shot Multi-ID Image Personalization With Flexible Facial Editability. 10549-10559 - Zhen Zhou, Tong Wang, Yunkai Ma, Xiao Tan, Fengshui Jing:

LIRA: Reasoning Reconstruction via Multimodal Large Language Models. 1762-1772 - Jianqi Chen, Biao Zhang, Xiangjun Tang, Peter Wonka:

V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video. 11643-11653 - Zhenyang Liu, Yikai Wang, Kuanning Wang, Longfei Liang, Xiangyang Xue, Yanwei Fu:

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning. 1-10 - Jie Liu, Jiayi Shen, Pan Zhou, Jan-Jakob Sonke, Efstratios Gavves:

Probabilistic Prototype Calibration of Vision-Language Models for Generalized Few-Shot Semantic Segmentation. 21155-21165 - Tianci Wen, Zhiang Liu, Yongchun Fang:

SEGS-SLAM: Structure-Enhanced 3D Gaussian Splatting SLAM With Appearance Embedding. 28103-28113 - Runkai Zheng, Vishnu Asutosh Dasu, Yinong Oliver Wang, Haohan Wang, Fernando De la Torre:

Improving Noise Efficiency in Privacy-Preserving Dataset Distillation. 4838-4847 - Jie Zhu, Yiyang Su, Minchul Kim, Anil K. Jain, Xiaoming Liu:

A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition. 13076-13086 - Junseong Shin, Seungwoo Chung, Yunjeong Yang, Tae Hyun Kim:

HazeFlow: Revisit Haze Physical Model as ODE and Non-Homogeneous Haze Generation for Real-World Dehazing. 6263-6272 - Marvin Heidinger, Snehal Jauhri, Vignesh Prasad, Georgia Chalvatzaki:

2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos. 14743-14753 - Tengjin Weng, Jingyi Wang, Wenhao Jiang, Zhong Ming:

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models. 3830-3840 - Zhixin Cheng, Jiacheng Deng, Xinjun Li, Xiaotian Yin, Bohao Liao, Baoqun Yin, Wenfei Yang, Tianzhu Zhang:

CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection. 27739-27749 - Kai Tong, Kang Pan, Xiao Zhang, Erli Meng, Run He, Yawen Cui, Nuoyan Guo, Huiping Zhuang:

Any-SSR: How Recursive Least Squares Works in Continual Learning of Large Language Models. 3047-3057 - Fabian Perez, Sara Rojas, Carlos Hinojosa, Hoover Rueda-Chacón, Bernard Ghanem:

UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields. 26284-26293 - Haihao Zhang, Yunjian Zhang, Jianing Li, Lin Zhu, Meng Lv, Yao Zhu, Yanwei Liu, Xiangyang Ji:

Enhanced Event-Based Dense Stereo via Cross-Sensor Knowledge Distillation. 5437-5447 - Qingcheng Zhao, Xiang Zhang, Haiyang Xu, Zeyuan Chen, Jianwen Xie, Yuan Gao, Zhuowen Tu:

DepR: Depth Guided Single-View Scene Reconstruction with Instance-Level Diffusion. 5722-5733 - Akshat Ramachandran, Mingyu Lee, Huan Xu, Souvik Kundu, Tushar Krishna:

Ouromamba: a Data-Free Quantization Framework for Vision Mamba. 21177-21186 - Aryan Yazdan Parast, Basim Azam, Naveed Akhtar:

DDB: Diffusion Driven Balancing to Address Spurious Correlations. 17526-17535 - Pinxin Liu, Luchuan Song, Junhua Huang, Haiyang Liu, Chenliang Xu:

GestureLSM: Latent Shortcut Based Co-Speech Gesture Generation with Spatial-Temporal Modeling. 10929-10939 - Seongmin Park, Hyungmin Kim, Sangwoo Kim, Wonseok Jeon, Juyoung Yang, Byeongwook Jeon, Yoonseon Oh, Jungwook Choi:

Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control. 13140-13150 - Qianqian Wang, Bowen Zhao, Zhengming Ding, Wei Feng, Quanxue Gao:

Hypergraph Clustering Network with Partial Attribute Imputation. 2697-2706 - Xiao Li, Qi Chen, Xiulian Peng, Kai Yu, Xie Chen, Yan Lu:

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video. 12904-12914 - Sounak Mondal, Naveen Sendhilnathan, Ting Zhang, Yue Liu, Michael Proulx, Michael Louis Iuzzolino, Chuan Qin, Tanya R. Jonker:

Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths. 1-12 - Federico Girella, Davide Talon, Ziyue Liu, Zanxi Ruan, Yiming Wang, Marco Cristani:

LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing. 19711-19720 - Jingjing Jiang, Chao Ma, Xurui Song, Hanwang Zhang, Jun Luo:

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning. 3034-3046 - Yuanhao Zhai, Yen-Liang Lin, Minxu Peng, Larry S. Davis, Ashwin Chandramouli, Junsong Yuan, David S. Doermann:

Text2Outfit: Controllable Outfit Generation With Multimodal Language Models. 16165-16174 - Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran:

Joint Self-Supervised Video Alignment and Action Segmentation. 10807-10818 - Subrat Kishore Dutta, Xiao Zhang:

IAP: Invisible Adversarial Patch Attack Through Perceptibility-Aware Localization and Perturbation Optimization. 14766-14775 - Edoardo Palladin, Samuel Brucker, Filippo Ghilotti, Praveen Narayanan, Mario Bijelic, Felix Heide:

Self-Supervised Sparse Sensor Fusion for Long Range Perception. 27498-27509 - Wenjie Chang, Hanzhi Chang, Yueyi Zhang, Wenfei Yang, Tianzhu Zhang:

Learning Neural Scene Representation from iToF Imaging. 27937-27946 - Ziqi Gao, Qiufu Li, Linlin Shen:

DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning. 3488-3498 - Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong:

ETA: Energy-Based Test-Time Adaptation for Depth Completion. 1-12 - Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel:

FaceXFormer: A Unified Transformer for Facial Analysis. 11369-11382 - Baoyue Hu, Yang Wei, Junhao Xiao, Wendong Huang, Xiuli Bi, Bin Xiao:

Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis. 15832-15841 - Jianyun Xu, Song Wang, Ziqian Ni, Chunyong Hu, Sheng Yang, Jianke Zhu, Qiang Li:

SAM4D: Segment Anything in Camera and LiDAR Streams. 28535-28545 - Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad, Vitor Campagnolo Guizilini, Rares Andrei Ambrus, Greg Shakhnarovich, Matthew R. Walter:

SPLART: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting. 8841-8851 - Liuchi Xu, Kang Liu, Jinshuai Liu, Lu Wang, Lisheng Xu, Jun Cheng:

Local Dense Logit Relations for Enhanced Knowledge Distillation. 4539-4549 - Chen Chen, Zhirui Wang, Taowei Sheng, Yi Jiang, Yundu Li, Peirui Cheng, Luning Zhang, Kaiqiang Chen, Yanfeng Hu, Xue Yang, Xian Sun:

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World. 27021-27030 - Sabbir Ahmed, Jingtao Li, Weiming Zhuang, Chen Chen, Lingjuan Lyu:

MixA: A Mixed Attention Approach with Stable Lightweight Linear Attention to Enhance Efficiency of Vision Transformers at the Edge. 21187-21196 - Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, Xianpeng Lang, Dongbin Zhao:

World4Drive: End-to-End Autonomous Driving via Intention-Aware Physical Latent World Model. 28632-28642 - Fan Pei, Jinchen Bai, Xiang Feng, Zoubin Bi, Kun Zhou, Hongzhi Wu:

OpenSubstance: A High-Quality Measured Dataset of Multi-View and -Lighting Images and Shapes. 5221-5231 - Zhaotong Yang, Yuhui Li, Shengfeng He, Xinzhe Li, Yangyang Xu, Junyu Dong, Yong Du:

OmniVTON: Training-Free Universal Virtual Try-On. 16702-16711 - Wuyang Li, Wentao Pan, Xiaoyuan Liu, Zhendong Luo, Chenxin Li, Hengyu Liu, Din Ping Tsai, Mu Ku Chen, Yixuan Yuan:

Metascope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy. 25938-25950 - Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia:

LYRA: An Efficient and Speech-Centric Framework for Omni-Cognition. 3694-3704 - Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed, Eric Granger:

Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation. 3143-3152 - Sunghyun Park, Jungsoo Lee, Shubhankar Borse, Munawar Hayat, Sungha Choi, Kyuwoong Hwang, Fatih Porikli:

Understanding Personal Concept in Open-Vocabulary Semantic Segmentation. 1-10 - Qi Chen, Lingxiao Yang, Yun Chen, Nailong Zhao, Jianhuang Lai, Jie Shao, Xiaohua Xie:

Training-Free Class Purification for Open-Vocabulary Semantic Segmentation. 23124-23134 - Karlo Koledic, Luka Petrovic, Ivan Markovic, Ivan Petrovic:

GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles Based on Probabilistic Cue Fusion. 26126-26135 - Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zenghui Ding, Xianjun Yang, Yining Sun:

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding. 22046-22055 - Kaining Ying, Henghui Ding, Guangquan Jie, Yu-Gang Jiang:

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation. 22575-22585 - Aoxiong Yin, Xu Tan, Kai Shen, Yichong Leng, Xinyu Zhou, Juncheng Li, Siliang Tang:

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation. 1-12 - Shih-Po Lee, Ehsan Elhamifar:

Error Recognition in Procedural Videos Using Generalized Task Graph. 10009-10021 - Taesung Kwon, Jong Chul Ye:

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models. 10465-10474 - Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, Guanbin Li:

DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis. 12524-12534 - Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan:

X²-Gaussian: 4D Radiative Gaussian Splatting for Continuous-Time Tomographic Reconstruction. 24728-24738 - Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen:

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation. 16281-16291 - Zhensheng Yuan, Haozhi Huang, Zhen Xiong, Di Wang, Guanghua Yang:

Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction. 26209-26219 - Haiyang Ying, Matthias Zwicker:

SketchSplat: 3D Edge Reconstruction Via Differentiable Multi-View Sketch Splatting. 25649-25659 - Chenming Zhu, Tai Wang, Wenwei Zhang, Jiangmiao Pang, Xihui Liu:

LLaVA-3D: A Simple Yet Effective Pathway to Empowering LMMs with 3D Capabilities. 4295-4305 - Haotian Wang, Aoran Xiao, Xiaoqin Zhang, Meng Yang, Shijian Lu:

PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency. 7709-7720 - Junyu Shi, Lijiang Liu, Yong Sun, Zhiyuan Zhang, Jinni Zhou, Qiang Nie:

GenM3: Generative Pretrained Multi-Path Motion Model for Text Conditional Human Motion Generation. 13129-13139 - Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin:

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts. 2131-2141 - Yuqian Fu, Runze Wang, Bin Ren, Guolei Sun, Biao Gong, Yanwei Fu, Danda Pani Paudel, Xuanjing Huang, Luc Van Gool:

ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives. 6530-6540 - Guowei Shi, Zian Mao, Peisen Huang:

Ultra-Precision 6DoF Pose Estimation Using 2-D Interpolated Discrete Fourier Transform. 5802-5810 - Rongchang Xie, Chen Du, Ping Song, Chang Liu:

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding. 24135-24146 - Nicholas S. DiBrita, Jason Han, Tirthak Patel:

ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers. 20085-20094 - Minghao Fu, Guo-Hua Wang, Xiaohao Chen, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang:

Teefusion: Blending Text Embeddings to Distill Classifier-Free Guidance. 16652-16661 - Bo Wang, Huiyuan Fu, Zhiye Huang, Siru Zhang, Xin Wang, Huadong Ma:

From Abyssal Darkness to Blinding Glare: a Benchmark on Extreme Exposure Correction in Real World. 7666-7675 - Zhangquan Chen, Xufang Luo, Dongsheng Li:

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning. 2545-2555 - Xinyu Sun, Zhikun Zhao, Congyan Lang, Bing Li, Juan Wang:

Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning. 437-446 - Inwoo Hwang, Jinseok Bae, Donggeun Lim, Young Min Kim:

Motion Synthesis with Sparse and Flexible Keyjoint Control. 13203-13213 - Xuran Ma, Yexin Liu, Yaofu Liu, Xianfeng Wu, Mingzhe Zheng, Zihao Wang, Ser-Nam Lim, Harry Yang:

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models. 17150-17159 - Peng-Hao Hsu, Ke Zhang, Fu-En Wang, Tao Tu, Ming-Feng Li, Yu-Lun Liu, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo:

OpenM3D: Open Vocabulary Multi-View Indoor 3D Object Detection without Human Annotations. 8688-8698 - Ziye Li, Hao Luo, Xincheng Shuai, Henghui Ding:

AnyI2V: Animating Any Conditional Image with Motion Control. 17302-17311 - Juntao Chen, Wen Shen, Zhihua Wei, Lijun Sun, Hongyun Zhang:

Leveraging Debiased Cross-Modal Attention Maps and Code-Based Reasoning for Zero-Shot Referring Expression Comprehension. 20413-20424 - Haoxuan Li, Ziya Erkoç, Lei Li, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner:

MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing. 16227-16237 - Heeji Yoon, Heeseong Shin, Eunbeen Hong, Hyunwook Choi, Hansang Cho, Daun Jeong, Seungryong Kim:

S4M: Boosting Semi-Supervised Instance Segmentation with SAM. 20226-20236 - Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego:

Simultaneous Motion and Noise Estimation with Event Cameras. 6959-6969 - Weili Zeng, Ziyuan Huang, Kaixiang Ji, Yichao Yan:

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models Via Adaptive Token Skipping. 21384-21397 - Mark Yu, Wenbo Hu, Jinbo Xing, Ying Shan:

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models. 100-111 - Chao Pan, Ke Tang, Qing Li, Xin Yao:

Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination. 2991-3000 - Zekun Qian, Ruize Han, Junhui Hou, Linqi Song, Wei Feng:

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking. 7472-7482 - Hanyi Wang, Han Fang, Shi-Lin Wang, Ee-Chien Chang:

ROAR: Reducing Inversion Error in Generative Image Watermarking. 19742-19751 - Conghao Wong, Ziqian Zou, Beihao Xia:

Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations. 25788-25799 - Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Mingda Wan, Yufa Zhou:

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective. 11436-11446 - Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, Liang Zheng:

Effective Training Data Synthesis for Improving MLLM Chart Understanding. 2653-2663 - Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu:

Dual-Expert Consistency Model for Efficient and High-Quality Video Generation. 14983-14993 - Liyuan Deng, Yunpeng Bai, Yongkang Dai, Xiaoshui Huang, Hongping Gan, Dongshuo Huang, Jiacheng Hao, Yilei Shi:

MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence. 10517-10526 - Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo:

LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds. 14184-14194 - Xinwei Long, Kai Tian, Peng Xu, Guoli Jia, Jingxuan Li, Sa Yang, Yihua Shao, Kaiyan Zhang, Che Jiang, Hao Xu, Yang Liu, Jiaheng Ma, Bowen Zhou:

AdsQA: Towards Advertisement Video Understanding. 1-12 - Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li, Anush Venkatesh, Danna Gurari:

Acknowledging Focus Ambiguity in Visual Questions. 1228-1238 - Xuan Ju, Weicai Ye, Quande Liu, Qiulin Wang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Qiang Xu:

FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention. 15737-15747 - Zesong Yang, Bangbang Yang, Liyuan Cui, Yuewen Ma, Wenqi Dong, Zhaopeng Cui, Chenxuan Cao, Hujun Bao:

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction From Cluttered Scenes. 7771-7781 - Mingyang Liu, Xinyang Chen, Yang Shu, Xiucheng Li, Weili Guan, Liqiang Nie:

Debiased Curriculum Adaptation for Safe Transfer Learning in Chest X-Ray Classification. 1-10 - Hang Xu, Jie Huang, Linjiang Huang, Dong Liu, Yidi Liu, Feng Zhao:

FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment. 3268-3279 - Yikun Ma, Yiqing Li, Jiawei Wu, Xing Luo, Zhi Jin:

MotionDiff: Training-Free Zero-Shot Interactive Motion Editing via Flow-Assisted Multi-View Diffusion. 14475-14485 - Baijun Ye, Minghui Qin, Saining Zhang, Moonjun Goon, Shaoting Zhu, Hao Zhao, Hang Zhao:

GS-Occ3D: Scaling Vision-Only Occupancy Reconstruction with Gaussian Splatting. 25925-25937 - Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, Yanyun Qu:

Multi-Schema Proximity Network for Composed Image Retrieval. 19999-20008 - Corentin Dumery, Noa Etté, Aoxiang Fan, Ren Li, Jingyi Xu, Hieu Le, Pascal Fua:

Counting Stacked Objects. 1-10 - Min-Jung Kim, Minsang Kim, Seung Jun Baek:

ContextFace: Generating Facial Expressions from Emotional Contexts. 11383-11392 - Quankai Gao, Iliyan Georgiev, Tuanfeng Y. Wang, Krishna Kumar Singh, Ulrich Neumann, Jae Shin Yoon:

Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians. 9320-9331 - Teng Li, Guangcong Zheng, Rui Jiang, Shuigen Zhan, Tao Wu, Yehao Lu, Yining Lin, Chuanyun Deng, Yepan Xiong, Min Chen, Lin Cheng, Xi Li:

RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control. 28785-28796 - Baoli Sun, Ning Wang, Xinzhu Ma, Anqi Zou, Yihang Lu, Chuixuan Fan, Zhihui Wang, Kun Lu, Zhiyong Wang:

RobAVA: A Large-Scale Dataset and Baseline Towards Video Based Robotic Arm Action Understanding. 13985-13994 - Yuxin Deng, Kaining Zhang, Linfeng Tang, Jiaqi Yang, Jiayi Ma:

ArgMatch: Adaptive Refinement Gathering for Efficient Dense Matching. 27369-27379 - Trevor D. Canham, SaiKiran Kumar Tedla, Michael J. Murdoch, Michael S. Brown:

Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP. 18619-18628 - Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu, Jingyun Fu, Peng Xu, Shaohong Wang, Zhihao Yang, Tianyu Pu, Eryun Liu:

CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection. 28188-28197 - Weirong Chen, Ganlin Zhang, Felix Wimbauer, Rui Wang, Nikita Araslanov, Andrea Vedaldi, Daniel Cremers:

Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction. 4951-4960 - Hyeonjoong Jang, Dongyoung Choi, Donggun Kim, Woohyun Kang, Min H. Kim:

Splat-Based 3D Scene Reconstruction with Extreme Motion-Blur. 26425-26434 - Taekyung Ki, Dongchan Min, Gyeongsu Chae:

FLOAT: Generative Motion Latent Flow Matching for Audio-Driven Talking Portrait. 14699-14710 - Guoyizhe Wei, Rama Chellappa:

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models. 20737-20747 - Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu, Fan Zhang, David Bull, Bing Zeng:

Blind Video Super-Resolution Based on Implicit Kernels. 10971-10981 - Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen:

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training. 9158-9168 - Xiangzeng Liu, Chi Wang, Guanglu Shi, Xiaodong Zhang, Qiguang Miao, Miao Fan:

SGAD: Semantic and Geometric-Aware Descriptor for Local Feature Matching. 27095-27104 - Yongwei Jiang, Yixiong Zou, Yuhua Li, Ruixuan Li:

Revisiting Pool-Based Prompt Learning for Few-Shot Class-Incremental Learning. 1303-1313 - Pingrui Zhang, Xianqiang Gao, Yuhan Wu, Kehui Liu, Dong Wang, Zhigang Wang, Bin Zhao, Yan Ding, Xuelong Li:

MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation. 6315-6326 - Jiacheng Chen, Ziyu Jiang, Mingfu Liang, Bingbing Zhuang, Jong-Chyi Su, Sparsh Garg, Ying Wu, Manmohan Chandraker:

Autoscape: Geometry-Consistent Long-Horizon Scene Generation. 25700-25711 - Xi Fang, Jiankun Wang, Xiaochen Cai, Shangqian Chen, Shuwen Yang, Haoyi Tao, Nan Wang, Lin Yao, Linfeng Zhang, Guolin Ke:

MolParser: End-to-End Visual Recognition of Molecule Structures in the Wild. 24528-24538 - Dat Nguyen, Marcella Astrid, Anis Kacem, Enjie Ghorbel, Djamila Aouada:

Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection. 10786-10796 - Biao Zhang, Jing Ren, Peter Wonka:

Geometry Distributions. 1495-1505 - Srikumar Sastry, Aayush Dhakal, Eric Xing, Subash Khanal, Nathan Jacobs:

Global and Local Entailment Learning for Natural World Imagery. 15770-15780 - Wang Liu, Wei Gao:

Omni-Scene Perception-Oriented Point Cloud Geometry Enhancement for Coordinate Quantization. 26055-26064 - Ruyi Xu, Yen-Tzu Chiu, Tai-I Chen, Oscar Chew, Yung-Yu Chuang, Wen-Huang Cheng:

Training-Free Industrial Defect Generation with Diffusion Models. 24214-24223 - Maximilian Augustin, Yannic Neuhaus, Matthias Hein:

DASH: Detection and Assessment of Systematic Hallucinations of VLMs. 22748-22759 - Dubing Chen, Jin Fang, Wencheng Han, Xinjing Cheng, Junbo Yin, Chenzhong Xu, Fahad Shahbaz Khan, Jianbing Shen:

ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions. 4156-4166 - Yijia Hong, Yuan-Chen Guo, Ran Yi, Yulong Chen, Yan-Pei Cao, Lizhuang Ma:

SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates. 25083-25093 - Sakuya Ota, Qing Yu, Kent Fujiwara, Satoshi Ikehata, Ikuro Sato:

PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups. 1-10 - Tuo Chen, Jie Gui, Minjing Dong, Ju Jia, Lanting Fang, Jian Liu:

Backdooring Self-Supervised Contrastive Learning by Noisy Alignment. 3684-3693 - Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk, Jaehyup Lee, Munchurl Kim:

PAN-Crafter: Learning Modality-Consistent Alignment for Pan-Sharpening. 4242-4252 - Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang:

SAM2LONG: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree. 13614-13624 - Hui Lu, Albert Ali Salah, Ronald Poppe:

Snakes and Ladders: Two Steps Up for VideoMamba. 24234-24244 - Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong, Pitchaporn Rewatbowornwong, Pramook Khungurn, Supasorn Suwajanakorn:

LUSD: Localized Update Score Distillation for Text-Guided Image Editing. 1-10 - Guohao Sun, Can Qin, Yihao Feng, Zeyuan Chen, Ran Xu, Sohail A. Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao:

Structured Policy Optimization: Enhance Large Vision-Language Model via Self-Referenced Dialogue. 741-751 - Jian Ma, Qirong Peng, Xu Guo, Chen Chen, Haonan Lu, Zhenyu Yang:

X2i: Seamless Integration of Multimodal Understanding Into Diffusion Transformer Via Attention Distillation. 16733-16744 - Jefferson Hernandez, Jing Shi, Simon Jenni, Vicente Ordonez, Kushal Kafle:

Improving Large Vision and Language Models by Learning from a Panel of Peers. 1402-1412 - Bizhu Wu, Jinheng Xie, Meidan Ding, Zhe Kong, Jianfeng Ren, Ruibin Bai, Rong Qu, Linlin Shen:

FineMotion: A Dataset and Benchmark with Both Spatial and Temporal Annotation for Fine-Grained Motion Generation and Editing. 13837-13846 - Qingyuan Zhou, Yuehu Gong, Weidong Yang, Jiaze Li, Yeqi Luo, Baixin Xu, Shuhao Li, Ben Fei, Ying He:

MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction Under Various Light Conditions. 27295-27304 - An-Lun Liu, Yu-Wei Chao, Yi-Ting Chen:

Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers. 10375-10385 - Qi Wang, Zeyu Zhang, Dong Wang, Di Gai, Xin Xiong, Jiyang Xu, Ruihua Zhou:

Vehiclemae: View-Asymmetry Mutual Learning for Vehicle Re-Identification Pre-Training Via Masked Autoencoders. 4701-4711 - Hanwen Cao, Haobo Lu, Xiaosen Wang, Kun He:

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers. 2000-2009 - Chen Liang, Zhicheng Shi, Wenguan Wang, Yi Yang:

Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation. 6252-6262 - Pierre-André Brousseau, Sébastien Roy:

Spherical Epipolar Rectification for Deep Two-View Absolute Depth Estimation. 28925-28934 - Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu, Xiaoxia Sun, Chong Shang, Kiran S. Bhat, Deva Ramanan, Jun-Yan Zhu, Maneesh Agrawala, Tinghui Zhou:

Efficient Autoregressive Shape Generation Via Octree-Based Adaptive Tokenization. 11685-11696 - Justin Kay, Grant Van Horn, Subhransu Maji, Daniel Sheldon, Sara Beery:

Consensus-Driven Active Model Selection. 4594-4604 - Enming Zhang, Yuzhe Li, Yuliang Liu, Yingying Zhu, Xiang Bai:

Towards Comprehensive Lecture Slides Understanding: Large-Scale Dataset and Effective Method. 4455-4464 - Haiyang Liu, Zhan Xu, Fa-Ting Hong, Hsin-Ping Huang, Yi Zhou, Yang Zhou:

Video Motion Graphs. 13730-13740 - Hanxue Zhang, Haoran Jiang, Qingsong Yao, Yanan Sun, Renrui Zhang, Hao Zhao, Hongyang Li, Hongzi Zhu, Zetong Yang:

Detect Anything 3D in the Wild. 1-12 - Xudong Lu, Yinghao Chen, Renshou Wu, Haohao Gao, Xi Chen, Xue Yang, Xiangyu Zhao, Aojun Zhou, Fangyuan Li, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li:

GenieBlue: Integrating Both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices. 4198-4210 - Yawen Zou, Guang Li, Duo Su, Zi Wang, Jun Yu, Chao Zhang:

Dataset Distillation Via Vision-Language Category Prototype. 2941-2950 - Xiao Liu, Nan Pu, Haiyang Zheng, Wenjing Li, Nicu Sebe, Zhun Zhong:

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery. 1078-1087 - Chen Lin, Weizhi Du, Zhixiang Min, Baochen She, Enrique Dunn, Sonya M. Hanson:

DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation. 6447-6455 - Ziyu Guo, Yizhak Ben-Shabat, Young Yoon Lee, Joseph Liu, Victor Zordan, Mubbasir Kapadia:

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion. 13349-13359 - Xingxiang Zhou, Xiangdong Su, Haoran Zhang, Wei Chen, Guanglai Gao:

Task-Decoupled Bézier Surface Constraint for Uneven Low-Light Image Enhancement. 6859-6868 - Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, Conghui He:

Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis. 28451-28461 - Young Kyun Jang, Ser-Nam Lim:

Towards Cross-Modal Backward-Compatible Representation Learning for Vision-Language Models. 1783-1792 - Jinxiu Liang, Bohan Yu, Siqi Yang, Haotian Zhuang, Jieji Ren, Peiqi Duan, Boxin Shi:

EventUPS: Uncalibrated Photometric Stereo Using an Event Camera. 7516-7525 - Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun:

Loss Functions for Predictor-Based Neural Architecture Search. 1624-1633 - Jiao Tang, Junjie Zhou, Bo Qian, Peng Wan, Yingli Zuo, Wei Shao, Daoqiang Zhang:

AcZeroTS: Active Learning for Zero-Shot Tissue Segmentation in Pathology Images. 23508-23518 - Debasmit Das, Hyoungwoo Park, Munawar Hayat, Seokeon Choi, Sungrack Yun, Fatih Porikli:

ConsNoTrainLoRA: Data-driven Weight Initialization of Low-Rank Adapters Using Constraints. 498-507 - Haochen Chang, Pengfei Ren, Haoyang Zhang, Liang Xie, Hongbo Chen, Erwei Yin:

Hierarchical-Aware Orthogonal Disentanglement Framework for Fine-Grained Skeleton-Based Action Recognition. 1-10 - Peng Ren, Tian Bai, Jing Sun, Fuming Sun:

Seeing the Unseen: A Semantic Alignment and Context-Aware Prompt Framework for Open-Vocabulary Camouflaged Object Segmentation. 1-10 - Lisa Dunlap, Joseph E. Gonzalez, Trevor Darrell, Fabian Caba Heilbron, Josef Sivic, Bryan C. Russell:

Discovering Divergent Representations Between Text-To-Image Models. 17516-17525 - Saarthak Kapse, Pushpak Pati, Srikar Yellapragada, Srijan Das, Rajarsi R. Gupta, Joel H. Saltz, Dimitris Samaras, Prateek Prasanna:

GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology. 20020-20030 - Yinan Zhou, Yuxin Chen, Haokun Lin, Yichen Wu, Shuyu Yang, Zhongang Qi, Chen Ma, Li Zhu:

DOGR: Towards Versatile Visual Document Grounding and Referring. 3596-3606 - Yiyi Ma, Yuanzhi Liang, Xiu Li, Chi Zhang, Xuelong Li:

InterSyn: Interleaved Learning for Dynamic Motion Synthesis in the Wild. 12832-12841 - Chen Shi, Shaoshuai Shi, Kehua Sheng, Bo Zhang, Li Jiang:

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving. 28599-28609 - Rongqing Li, Changsheng Li, Ruilin Lv, Yuhang Li, Yang Gao, Xiaolu Zhang, Jun Zhou:

NATRA: Noise-Agnostic Framework for Trajectory Prediction with Noisy Observations. 27872-27884 - Xiao Liang, Di Wang, Zhicheng Jiao, Ronghan Li, Pengfei Yang, Quan Wang, Tat-Seng Chua:

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models. 21144-21154 - Jiwoo Park, Tae Eun Choi, Youngjun Jun, Seong Jae Hwang:

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image. 11906-11915 - Jiong Yin, Liang Li, Jiehua Zhang, Yuhan Gao, Chenggang Yan, Xichun Sheng:

Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning. 2022-2033 - Jing Yang, Qunliang Xing, Mai Xu, Minglang Qiao:

Uncover Treasures in DCT: Advancing JPEG Quality Enhancement by Exploiting Latent Correlations. 17598-17607 - Zhuoran Yang, Xi Guo, Chenjing Ding, Chiyu Wang, Wei Wu, Yanyong Zhang:

InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation. 25410-25420 - Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Ding Wang, Botian Shi:

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning. 981-992 - Mian Zou, Nan Zhong, Baosheng Yu, Yibing Zhan, Kede Ma:

Bi-Level Optimization for Self-Supervised AI-Generated Face Detection. 18959-18968 - Paul Roetzer, Florian Bernard:

Fast Globally Optimal and Geometrically Consistent 3D Shape Matching. 912-922 - Jianzhe Gao, Rui Liu, Wenguan Wang:

3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation. 9252-9262 - Ada-Astrid Balauca, Sanjana Garai, Stefan Balauca, Rasesh Udayakumar Shetty, Naitik Agrawal, Dhwanil Subhashbhai Shah, Yuqian Fu, Xi Wang, Kristina Toutanova, Danda Pani Paudel, Luc Van Gool:

Understanding Museum Exhibits using Vision-Language Reasoning. 2227-2238 - Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng:

Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior. 24024-24033 - Ruiting Dai, Chenxi Li, Yandong Yan, Lisi Mo, Ke Qin, Tao He:

Unbiased Missing-Modality Multimodal Learning. 24507-24517 - Yexin Huang, Yongbin Lin, Lishengsa Yue, Zhihong Yao, Jie Wang:

From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning. 26146-26155 - Junho Kim, Gwangtak Bae, Eun Sun Lee, Young Min Kim:

Learning 3D Scene Analogies With Neural Contextual Scene Maps. 7828-7840 - Baojie Fan, Xiaotian Li, Yuhan Zhou, Yuyu Jiang, Jiandong Tian, Huijie Fan:

RIOcc: Efficient Cross-Modal Fusion Transformer with Collaborative Feature Refinement for 3D Semantic Occupancy Prediction. 25851-25861 - Zuo-Liang Zhu, Jian Yang, Beibei Wang:

Gaussian Splatting with Discretized SDF for Relightable Assets. 25155-25164 - Daehee Park, Monu Surana, Pranav Desai, Ashish Mehta, Reuben MV John, Kuk-Jin Yoon:

Generative Active Learning for Long-Tail Trajectory Prediction via Controllable Diffusion Model. 27839-27850 - Ziyang Leng, Jiawei Yang, Wenlong Yi, Bolei Zhou:

Occupancy Learning with Spatiotemporal Memory. 26569-26578 - Shoubin Yu, Difan Liu, Ziqiao Ma, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal:

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation. 15147-15158 - Zhi-Wei Xia, Kun-Yu Lin, Yuan-Ming Li, Wei-Jin Huang, Xian-Tuo Tan, Wei-Shi Zheng:

Less Static, More Private: Towards Transferable Privacy-Preserving Action Recognition by Generative Decoupled Learning. 12894-12903 - Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang:

MotionFollower: Editing Video Motion via Score-Guided Diffusion. 12822-12831 - Deepayan Das, Davide Talon, Yiming Wang, Massimiliano Mancini, Elisa Ricci:

Training-Free Personalization via Retrieval and Reasoning on Fingerprints. 9683-9692 - Shuo Liang, Yiwu Zhong, Zi-Yuan Hu, Yeyao Tao, Liwei Wang:

Fine-Grained Spatiotemporal Grounding on Egocentric Videos. 9385-9395 - Hongjun Wang, Jiyuan Chen, Zhengwei Yin, Xuan Song, Yinqiang Zheng:

Not All Degradations are Equal: A Targeted Feature Denoising Framework for Generalizable Image Super-Resolution. 14152-14161 - Giyeol Kim, Sooyoung Yang, Jihyong Oh, Myungjoo Kang, Chanho Eom:

Leveraging Prior Knowledge of Diffusion Model for Person Search. 20301-20312 - Shaowei Liu, Chuan Guo, Bing Zhou, Jian Wang:

Ponimator: Unfolding Interactive Pose for Versatile Human-Human Interaction Animation. 12068-12077 - Chenting Wang, Kunchang Li, Tianxiang Jiang, Xiangyu Zeng, Yi Wang, Limin Wang:

Make Your Training Flexible: Towards Deployment-Efficient Video Models. 23880-23891 - Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar:

Identity Preserving 3D Head Stylization with Multiview Score Distillation. 12169-12179 - Edgar Sucar, Zihang Lai, Eldar Insafutdinov, Andrea Vedaldi:

Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction. 7295-7305 - Jinjia Peng, Zeze Tao, Huibing Wang, Meng Wang, Yang Wang:

Boosting Adversarial Transferability via Residual Perturbation Attack. 1261-1270 - Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan L. Yuille:

Baking Gaussian Splatting Into Diffusion Denoiser for Fast and Scalable Single-Stage Image-to-3D Generation and Reconstruction. 25062-25072 - Shijie Wang, Jian Shi, Haojie Li:

Adversarial Reconstruction Feedback for Robust Fine-Grained Generalization. 3080-3090 - Liang Qin, Min Wang, Peiwei Li, Wengang Zhou, Houqiang Li:

Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments. 7603-7612 - Hao Zhang, Haolan Xu, Chun Feng, Varun Jampani, Narendra Ahuja:

PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling. 6609-6620 - Donghyun Lee, Dawoon Jeong, Jae W. Lee, Hongil Yoon:

FastPoint: Accelerating 3D Point Cloud Model Inference via Sample Point Distance Prediction. 25114-25123 - Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, Siyuan Huang:

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation. 9263-9274 - Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Xian Liu, Zhongang Cai, Lei Yang, Yulun Zhang, Haoqian Wang, Ziwei Liu:

DPoser-X: Diffusion Model as Robust 3D Whole-Body Human Pose Prior. 9988-9997 - Yicong Li, Yiyang Chen, Zhenyuan Ma, Junbin Xiao, Xiang Wang, Angela Yao:

Intermediate Connectors and Geometric Priors for Language-Guided Affordance Segmentation on Unseen Object Categories. 22836-22845 - Junhao Cheng, Yuying Ge, Yixiao Ge, Jing Liao, Ying Shan:

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction. 10875-10885 - Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Michael Rubinstein, David E. Jacobs, Shlomi Fruchter:

Magic Insert: Style-Aware Drag-And-Drop. 15971-15981 - Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Hao Wu, Shu-Tao Xia, Ke Xu:

One Perturbation is Enough: On Generating Universal Adversarial Perturbations Against Vision-Language Pre-Training Models. 4090-4100 - Lukas Höllein, Aljaz Bozic, Michael Zollhöfer, Matthias Nießner:

3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt. 26740-26750 - Aniruddha Mahapatra, Long Mai, David Bourgin, Yitian Zhang, Feng Liu:

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces. 17629-17639 - Rundong Luo, Matthew Wallingford, Ali Farhadi, Noah Snavely, Wei-Chiu Ma:

Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos. 14336-14345 - Jiayuan Lu, Rengan Xie, Zixuan Xie, Zhizhen Wu, Dianbing Xi, Qi Ye, Rui Wang, Hujun Bao, Yuchi Huo:

IntrinsicControlNet: Cross-Distribution Image Generation with Real and Unreal. 27315-27325 - Jiahui Lei, Kyle Genova, George Kopanas, Noah Snavely, Leonidas J. Guibas:

MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps. 10022-10031 - Sixian Zhang, Xinyao Yu, Xinhang Song, Yiyao Wang, Shuqiang Jiang:

Function-Centric Bayesian Network for Zero-Shot Object Goal Navigation. 19535-19545 - Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer, Tolga Birdal:

CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations. 27084-27094 - Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Wenjing Yang, Jing Zhang:

Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling. 6935-6947 - Songsong Duan, Xi Yang, Nannan Wang:

𝒟ℐℋ-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation. 22794-22803 - Xianghan Meng, Zhengyu Tong, Zhiyuan Huang, Chun-Guang Li:

Temporal Rate Reduction Clustering for Human Motion Segmentation. 14644-14654 - Hao Ju, Shaofei Huang, Si Liu, Zhedong Zheng:

Video2BEV: Transforming Drone Videos to BEVs for Video-Based Geo-Localization. 27073-27083 - Zihang Zou, Boqing Gong, Liqiang Wang:

Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images! 19546-19556 - Vladislav Bargatin, Egor Chistov, Alexander Yakovenko, Dmitriy S. Vatolin:

MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation. 1-10 - Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu:

PBCAT: Patch-Based Composite Adversarial Training Against Physically Realizable Attacks on Object Detection. 24456-24466 - Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon:

Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning. 1-12 - Katie Z. Luo, Minh-Quan Dao, Zhenzhen Liu, Mark E. Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Frémont, Bharath Hariharan, Mao Shan, Stewart Worrall, Julie Stephany Berrio Perez:

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration. 28763-28773 - Francesco Milano, Manuel López-Antequera, Naina Dhingra, Roland Siegwart, Robert Thiel:

Discontinuity-Aware Normal Integration for Generic Central Camera Models. 26026-26034 - Haonan He, Yufeng Zheng, Jie Song:

Capturing Head Avatar with Hand Contacts from a Monocular Video. 13099-13108 - Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib, Emeline Got, Rafael M. O. Cruz, Marc-André Carbonneau:

SEREP: Semantic Facial Expression Representation for Robust in-the-Wild Capture and Retargeting. 14538-14548 - Xinyi Zheng, Steve Zhang, Weizhe Lin, Aaron Zhang, Walterio W. Mayol-Cuevas, Yunze Liu, Junxiao Shen:

CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering. 29064-29074 - Phillip Mueller, Talip Uenlue, Sebastian Schmidt, Marcel Kollovieh, Jiajie Fan, Stephan Günnemann, Lars Mikelsons:

GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation. 6374-6384 - Tingting Zheng, Hongxun Yao, Kui Jiang, Yi Xiao, Sicheng Zhao:

GMMamba: Group Masking Mamba for Whole Slide Image Classification. 9935-9944 - Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, Xiaowei Zhou:

Diffuman4D: 4D Consistent Human View Synthesis From Sparse-View Videos With Spatio-Temporal Diffusion Models. 11047-11057 - Sihan Yang, Runsen Xu, Chenhang Cui, Tai Wang, Dahua Lin, Jiangmiao Pang:

VFLowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization. 23924-23934 - Jiawen Zhu, Yew-Soon Ong, Chunhua Shen, Guansong Pang:

Fine-Grained Abnormality Prompt Learning for Zero-Shot Anomaly Detection. 22241-22251 - Yabo Zhang, Xinpeng Zhou, Yihan Zeng, Hang Xu, Hui Li, Wangmeng Zuo:

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors. 18121-18131 - Suchisrit Gangopadhyay, Jung Hee Kim, Xien Chen, Patrick Rim, Hyoungseob Park, Alex Wong:

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens. 5198-5209 - Shijie Li, Zhongyao Cheng, Rong Li, Shuai Li, Juergen Gall, Xun Xu, Xulei Yang:

Global-Aware Monocular Semantic Scene Completion with State Space Models. 25550-25559 - Quang Nguyen, Nhat Le, Baoru Huang, Minh Nhat Vu, Chengcheng Tang, Van Nguyen, Ngan Le, Thieu Vo, Anh Nguyen:

EgoMusic-Driven Human Dance Motion Estimation with Skeleton Mamba. 12023-12033 - Kaiyang Ji, Ye Shi, Zichen Jin, Kangyi Chen, Lan Xu, Yuexin Ma, Jingyi Yu, Jingya Wang:

Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis. 10173-10183 - Juntao Jian, Xiuping Liu, Zixuan Chen, Manyi Li, Jian Liu, Ruizhen Hu:

G-DexGrasp: Generalizable Dexterous Grasping Synthesis via Part-Aware Prior Retrieval and Prior-Assisted Generation. 11447-11457 - Zhanzhou Feng, Qingpei Guo, Xinyu Xiao, Ruihan Xu, Ming Yang, Shiliang Zhang:

Unified Visual Generation via Next-Set Prediction in Continuous Domain. 19427-19438 - Thomas Dagès, Michael Lindenbaum, Alfred M. Bruckstein:

Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions. 13974-13984 - Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer:

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation. 12371-12380 - Juncheng Mu, Chengwei Ren, Weixiang Zhang, Liang Pan, Xiao-Ping Zhang, Yue Gao:

Diff2I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior. 25777-25787 - Bo-Hsu Ke, You-Zhe Xie, Yu-Lun Liu, Wei-Chen Chiu:

StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions. 27400-27411 - Hongdi Yang, Chengyang Li, Zhenxuan Wu, Gaozheng Li, Jingya Wang, Jingyi Yu, Zhuo Su, Lan Xu:

SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models. 11807-11817 - Shiyong Liu, Xiao Tang, Zhihao Li, Yingfan He, Chongjie Ye, Jianzhuang Liu, Binxiao Huang, Shunbo Zhou, Xiaofei Wu:

OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering. 26643-26652 - Jinxi Li, Ziyang Song, Bo Yang:

TRACE: Learning 3D Gaussian Physical Dynamics from Multi-View Videos. 8820-8829 - Wenxue Li, Tian Ye, Xinyu Xiong, Jinbin Bai, Feilong Tang, Wenxuan Song, Zhaohu Xing, Lie Ju, Guanbin Li, Lei Zhu:

GlassWizard: Harvesting Diffusion Priors for Glass Surface Detection. 17848-17858 - Kuo Wang, Quanlong Zheng, Junlin Xie, Yanhao Zhang, Jinguo Luo, Haonan Lu, Liang Lin, Fan Zhou, Guanbin Li:

Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs Within Single Inference. 22499-22508 - Yongxin Zhu, Bocheng Li, Yifei Xin, Zhihua Xia, Linli Xu:

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer. 22968-22977 - Yanguang Sun, Jiawei Lian, Jian Yang, Lei Luo:

Controllable-LPMoE: Adapting to Challenging Object Segmentation Via Dynamic Local Priors From Mixture-Of-Experts. 22327-22337 - Shouwei Ruan, Hanqing Liu, Yao Huang, Xiaoqi Wang, Caixin Kang, Hang Su, Yinpeng Dong, Xingxing Wei:

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations? 7894-7904 - Chong Xia, Shengjun Zhang, Fangfu Liu, Chang Liu, Khodchaphun Hirunyaratsameewong, Yueqi Duan:

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment. 28808-28817 - Richard D. Paul, Johannes Seiffarth, David Rügamer, Katharina Nöh, Hanno Scharr:

How to Make Your Cell Tracker Say "I Dunno!". 6914-6923 - Lei-Lei Li, Jianwu Fang, Junbin Xiao, Shanmin Pang, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua:

Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis. 11208-11218 - Yuan Liang, Yang Zhou, Ziming Sun, Tianyi Xiang, Guiqing Li, Shengfeng He:

Instance-Level Video Depth in Groups Beyond Occlusions. 7581-7591 - Hyunjun Jung, Hae-Gon Jeon:

Inverse Image-Based Rendering for Light Field Generation From Single Images. 24739-24749 - Jiaxin Huang, Sheng Miao, Bangbang Yang, Yuewen Ma, Yiyi Liao:

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting. 1-13 - Ze Li, Feng Zhang, Xiatian Zhu, Meng Zhang, Yanghong Zhou, P. Y. Mok:

Robust Low-Light Scene Restoration via Illumination Transition. 6188-6197 - Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Tze Ho Elden Tse, Angela Yao:

Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery. 6069-6079 - Kaining Ying, Hengrui Hu, Henghui Ding:

MOVE: Motion-Guided Few-Shot Video Object Segmentation. 11632-11642 - Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen, Cordelia Schmid:

HORT: Monocular Hand-held Objects Reconstruction with Transformers. 6046-6057 - Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, Anton Obukhov:

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion. 5359-5370 - Tianjiao Jiang, Zhen Zhang, Yuhang Liu, Javen Qinfeng Shi:

Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning. 890-900 - Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Yueqi Duan, Jie Zhou, Jiwen Lu:

SpectralAR: Spectral Autoregressive Visual Generation. 15842-15852 - Weixian Lei, Jiacong Wang, Haochen Wang, Xiangtai Li, Jun Hao Liew, Jiashi Feng, Zilong Huang:

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer. 20758-20769 - Wooseong Jeong, Kuk-Jin Yoon:

Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning. 2887-2897 - Yitian Zhang, Long Mai, Aniruddha Mahapatra, David Bourgin, Yicong Hong, Jonah Casebeer, Feng Liu, Yun Fu:

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder. 18453-18462 - Ding Zhong, Xu Zheng, Chenfei Liao, Yuanhuiyi Lyu, Jialei Chen, Shengyang Wu, Linfeng Zhang, Xuming Hu:

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation. 23892-23901 - Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang:

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions. 19617-19627 - Jiahao Wang, Ning Kang, Lewei Yao, Mengzhao Chen, Chengyue Wu, Songyang Zhang, Shuchen Xue, Yong Liu, Taiqiang Wu, Xihui Liu, Kaipeng Zhang, Shifeng Zhang, Wenqi Shao, Zhenguo Li, Ping Luo:

LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation. 16068-16078 - Fan Li, Xuanbin Wang, Xuan Wang, Zhaoxiang Zhang, Yuelei Xu:

Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation. 24255-24265 - Yuhan Liu, Jingwen Fu, Yang Wu, Kangyi Wu, Pengna Li, Jiayi Wu, Sanping Zhou, Jingmin Xin:

Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching. 20313-20323 - Yong Liu, Song-Li Wu, Sule Bai, Jiahao Wang, Yitong Wang, Yansong Tang:

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation. 22664-22675 - Lennart Bastian, Mohammad Rashed, Nassir Navab, Tolga Birdal:

Forecasting Continuous Non-Conservative Dynamical Systems in SO(3). 14845-14855 - Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai:

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation. 13557-13568 - Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava, Gaurav Sharma:

Preserve Anything: Controllable Image Synthesis with Object Preservation. 18058-18067 - Jiaqi Jin, Siwei Wang, Zhibin Dong, Xihong Yang, Xinwang Liu, En Zhu, Kunlun He:

Deep Incomplete Multi-View Clustering with Distribution Dual-Consistency Recovery Guidance. 1016-1026 - Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park, Hae-Gon Jeon:

Test-Time Prompt Tuning for Zero-Shot Depth Completion. 9443-9454 - Baoyou Chen, Ce Liu, Weihao Yuan, Zilong Dong, Siyu Zhu:

Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration. 14507-14516 - Clinton Ansun Mo, Kun Hu, Chengjiang Long, Dong Yuan, Wan-Chi Siu, Zhiyong Wang:

PUMPS: Skeleton-Agnostic Point-Based Universal Motion Pre-Training for Synthesis in Human Motion Tasks. 14496-14506 - Songhua Liu, Ruonan Yu, Xinchao Wang:

UniversalBooth: Model-Agnostic Personalized Text-To-Image Generation. 18314-18324 - Yanzhe Lyu, Kai Cheng, Xin Kang, Xuejin Chen:

ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery. 28093-28102 - Jiayuan Chen, Thai-Hoang Pham, Yuanlong Wang, Ping Zhang:

Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines. 22846-22856 - Xiangyang Luo, Ye Zhu, Yunfei Liu, Lijian Lin, Cong Wan, Zijian Cai, Yu Li, Shao-Lun Huang:

CanonSwap: High-Fidelity and Consistent Video Face Swapping Via Canonical Space Modulation. 10064-10074 - Ling Liu, Jun Tian, Li Yi:

4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads. 7089-7098 - Hao Zheng, Shunzhi Yang, Zhuoxin He, Jinfeng Yang, Zhenhua Huang:

Hierarchical Cross-Modal Prompt Learning for Vision-Language Models. 1891-1901 - Dahye Kim, Xavier Thomas, Deepti Ghadiyaram:

Revelio: Interpreting and Leveraging Semantic Information in Diffusion Models. 4659-4669 - Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, Xiang Bai:

Orion: A Holistic End-To-End Autonomous Driving Framework by Vision-Language Instructed Action Generation. 24823-24834 - Wenchuan Wang, Mengqi Huang, Yijing Tu, Zhendong Mao:

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization. 16565-16575 - Qizhen Lan, Qing Tian:

ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation. 3957-3966 - Jathushan Rajasegaran, Ilija Radosavovic, Rahul Ravishankar, Yossi Gandelsman, Christoph Feichtenhofer, Jitendra Malik:

An Empirical Study of Autoregressive Pre-Training from Videos. 19108-19118 - Jiacheng Lu, Hui Ding, Shiyu Zhang, Guoping Huo:

M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast. 1-10 - Weili Xu, Enxin Song, Wenhao Chai, Xuexiang Wen, Tian Ye, Gaoang Wang:

Bringing RNNs Back to Efficient Open-Ended Video Understanding. 23453-23465 - Pengfei Zhang, Pinxin Liu, Pablo Garrido, Hyeongwoo Kim, Bindita Chaudhuri:

KinMo: Kinematic-Aware Human Motion Understanding and Generation. 11187-11197 - Yi Huang, Ke Zhang, Wei Liu, Yuanyuan Wang, Vishal M. Patel, Le Lu, Xu Han, Dakai Jin, Ke Yan:

HarmonySeg: Tubular Structure Segmentation With Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss. 23571-23581 - Haoran Wang, Bo Zhao, Jinghui Wang, Hanzhang Wang, Huan Yang, Wei Ji, Hao Liu, Xinyan Xiao:

SEGA: A Stepwise Evolution Paradigm for Content-Aware Layout Generation with Design Prior. 19321-19330 - Ziqi Wang, Chang Che, Qi Wang, Yangyang Li, Zenglin Shi, Meng Wang:

SMoLoRa: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning. 177-186 - Xiaoyi Bao, Chenwei Xie, Hao Tang, Tingyu Weng, Xiaofeng Wang, Yun Zheng, Xingang Wang:

DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding. 23678-23688 - Sung-Bin Kim, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath:

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models. 14623-14632 - Yixu Wang, Yan Teng, Yingchun Wang, Xingjun Ma:

StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data. 263-272 - Yatian Pang, Bin Zhu, Bin Bin, Mingzhe Zheng, Francis E. H. Tay, Ser-Nam Lim, Harry Yang, Li Yuan:

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses. 14039-14050 - Zhihui Zhang, Luanyuan Dai, Qika Lin, Yunfeng Diao, Guangyin Jin, Yufei Guo, Jing Zhang, Xiaoshuai Hao:

Synergistic Prompting for Robust Visual Recognition with Missing Modalities. 1881-1890 - Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian:

GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation. 10108-10117 - Jiale Zhao, Xinyang Jiang, Junyao Gao, Yuhao Xue, Cairong Zhao:

Domain Generalizable Portrait Style Transfer. 15802-15811 - Minghan Li, Chenxi Xie, Yichen Wu, Lei Zhang, Mengyu Wang:

FiVE-Bench: A Fine-Grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models. 16672-16681 - Haru Kondoh, Asako Kanezaki:

Embodied Navigation with Auxiliary Task of Action Description Prediction. 7025-7036 - Youwei Zhou, Tianyang Xu, Cong Wu, Xiao-jun Wu, Josef Kittler:

Adaptive Hyper-Graph Convolution Network for Skeleton-Based Human Action Recognition with Virtual Connections. 12648-12658 - Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen:

Efficient Adaptation of Pre-Trained Vision Transformer Underpinned by Approximately Orthogonal Fine-Tuning Strategy. 4878-4887 - Ziyue Wang, Yurui Dong, Fuwen Luo, Minyuan Ruan, Zhili Cheng, Chi Chen, Peng Li, Yang Liu:

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in an Extensible Escape Game. 4807-4817 - Qingyuan Liu, Ke Lu, Kun Dong, Jian Xue, Zehai Niu, Jinbao Wang:

Text-to-Any-Skeleton Motion Generation Without Retargeting. 12926-12936 - Yaoye Zhu, Zhe Wang, Yan Wang:

MamV2XCalib: V2X-based Target-Less Infrastructure Camera Calibration with State Space Model. 26696 - Runmin Zhang, Zhu Yu, Si-Yuan Cao, Lingyu Zhu, Guangyi Zhang, Xiaokai Bai, Hui-Liang Shen:

Boosting Multi-View Indoor 3D Object Detection Via Adaptive 3D Volume Construction. 5980-5989 - Jiaxu Zhang, Xianfang Zeng, Xin Chen, Wei Zuo, Gang Yu, Zhigang Tu:

MikuDance: Animating Character Art With Mixed Motion Dynamics. 19689-19699 - Changxing Liu, Genjia Liu, Zijun Wang, Jinchang Yang, Siheng Chen:

CoLMDriver: LLM-Based Negotiation Benefits Cooperative Autonomous Driving. 25951-25960 - Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, Xun Cao, Wei Yin:

Epona: Autoregressive Diffusion World Model for Autonomous Driving. 27220-27230 - Qiao Zhang, Mingwen Shao, Xinyuan Chen, Xiang Lv, Kai Xu:

Wave-MambaAD: Wavelet-Driven State Space Model for Multi-Class Unsupervised Anomaly Detection. 20868-20877 - Yuheng Shi, Mingjia Li, Minjing Dong, Chang Xu:

VSSD: Vision Mamba With Non-Causal State Space Duality. 10819-10829 - Trong Bang Nguyen, Phi Le Nguyen, Simon Lucey, Minh Hoai:

Region-Level Data Attribution for Text-To-Image Generative Models. 18825-18833 - Zhiliang Wu, Kerui Chen, Kun Li, Hehe Fan, Yi Yang:

BVINet: Unlocking Blind Video Inpainting With Zero Annotations. 14017-14027 - Jiale Zhou, Wenhan Wang, Shikun Li, Xiaolei Qu, Xin Guo, Yizhong Liu, Wenzhong Tang, Xun Lin, Yefeng Zheng:

TopoTTA: Topology-Enhanced Test-Time Adaptation for Tubular Structure Segmentation. 24123-24134 - Shengao Wang, Arjun Chandra, Aoming Liu, Venkatesh Saligrama, Boqing Gong:

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning. 1380-1390 - Shaojie Ma, Yawei Luo, Wei Yang, Yi Yang:

MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-Adsorbed Gaussian Splatting. 8745-8755 - Weihao Xia, A. Cengiz Öztireli:

Exploring the Visual Feature Space for Multimodal Neural Decoding. 4370-4379 - Bing Fan, Yunhe Feng, Yapeng Tian, James Chenhao Liang, Yuewei Lin, Yan Huang, Heng Fan:

PRVQL: Progressive Knowledge-Guided Refinement for Robust Egocentric Visual Query Localization. 5156-5165 - Sixian Chan, Zedong Li, Wenhao Li, Shijian Lu, Chunhua Shen, Xiaoqin Zhang:

SMStracker: Tri-Path Score Mask Sigma Fusion for Multi-Modal Tracking. 4766-4775 - Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishabh Kabra, Anurag Arnab, Mehdi S. M. Sajjadi:

From Image to Video: An Empirical Study of Diffusion Representations. 16948-16958 - Dmitrii Torbunov, Yihui Ren, Animesh Ghose, Odera Dim, Yonggang Cui:

EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-Based Vision. 9812-9821 - Yi Zhang, Yuhang Chen, Zhen Chen, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Xingyu Zhao:

Adversarial Training for Probabilistic Robustness. 1675-1685 - Peng Du, Hui Li, Han Xu, Paul Barom Jeon, Dongwook Lee, Daehyun Ji, Ran Yang, Feng Zhu:

Diffusion Transformer Meets Multi-Level Wavelet Spectrum for Single Image Super-Resolution. 19700-19710 - Yuyang Ji, Zeyi Huang, Haohan Wang, Yong Jae Lee:

Customizing Domain Adapters for Domain Generalization. 934-944 - Sitong Wu, Haoru Tan, Yukang Chen, Shaofeng Zhang, Jingyao Li, Bei Yu, Xiaojuan Qi, Jiaya Jia:

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code. 24603-24614 - Xiangyu Yin, Boyuan Yang, Weichen Liu, Qiyao Xue, Abrar Alamri, Goeran Fiedler, Wei Gao:

ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users. 8984-8993 - Sihang Li, Siqi Tan, Bowen Chang, Jing Zhang, Chen Feng, Yiming Li:

Adversarial Exploitation of Data Diversity Improves Visual Localization. 26848-26858 - Tiancheng Shen, Zilong Huang, Xiangtai Li, Zhijie Lin, Jiyang Liu, Yitong Wang, Jiashi Feng, Ming-Hsuan Yang, Jun Hao Liew:

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing. 19043-19053 - Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesa Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu:

CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image. 9124-9135 - Zhe Ma, Qingming Li, Xuhong Zhang, Tianyu Du, Ruixiao Lin, Zonghui Wang, Shouling Ji, Wenzhi Chen:

An Inversion-Based Measure of Memorization for Diffusion Models. 16959-16969 - Gyuejeong Lee, Daeyoung Choi:

Class-Wise Federated Averaging for Efficient Personalization. 1-10 - Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang:

Scalable Image Tokenization with Index Backpropagation Quantization. 16037-16046 - Zhe Li, Lei Zhang, Zheren Fu, Kun Zhang, Zhendong Mao:

Hierarchy-Aware Pseudo Word Learning with Text Adaptation for Zero-Shot Composed Image Retrieval. 24319-24329 - Ge Zheng, Jiaye Qian, Jiajin Tang, Sibei Yang:

Why LVLMs are More Prone to Hallucinations in Longer Responses: The Role of Context. 4101-4113 - Hongwei Yu, Xinlong Ding, Jiawei Li, Jinlong Wang, Yudong Zhang, Rongquan Wang, Huimin Ma, Jiansheng Chen:

DADet: Safeguarding Image Conditional Diffusion Models Against Adversarial and Backdoor Attacks via Diffusion Anomaly Detection. 17411-17421 - Yingsi Qin, Aswin C. Sankaranarayanan, Matthew O'Toole:

Spatially-Varying Autofocus. 24645-24654 - Qianjiang Hu, Wei Hu:

Large Scene Generation with Cube-Absorb Discrete Diffusion. 25186-25196 - Zhenyu Yan, Jian Wang, Aoqiang Wang, Yuhan Li, Wenxiang Shang, Zhu Hangcheng:

TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control. 16112-16121 - Wooseong Jeong, Jegyeong Cho, Youngho Yoon, Kuk-Jin Yoon:

Synchronizing Task Behavior: Aligning Multiple Tasks During Test-Time Training. 24340-24350 - Yingde Song, Zongyuan Yang, Baolin Liu, Yongping Xiong, Sai Chen, Lan Yi, Zhaohe Zhang, Xunbo Yu:

EYE3: Turn Anything into Naked-Eye 3D. 27862-27871 - Erik A. Daxberger, Nina Wenzel, David Griffiths, Haiming Gang, Justin Lazarow, Gefen Kohavi, Kai Kang, Marcin Eichner, Yinfei Yang, Afshin Dehghan, Peter Grasch:

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs. 7395-7408 - Linjing You, Jiabao Lu, Xiayuan Huang, Xiangli Nie:

FRET: Feature Redundancy Elimination for Test Time Adaptation. 2120-2130 - Danhui Chen, Ziquan Liu, Chuxi Yang, Dan Wang, Yan Yan, Yi Xu, Xiangyang Ji:

ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction. 24045-24055 - Jie Zhu, Sungkil Lee:

PBFG: A New Physically-Based Dataset and Removal of Lens Flares and Glares. 5448-5457 - Junxiang Qiu, Lin Liu, Shuo Wang, Jinda Lu, Kezhou Chen, Yanbin Hao:

Accelerating Diffusion Transformer via Gradient-Optimized Cache. 17608-17617 - Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi:

SynCity: Training-Free Generation of 3D Worlds. 27585-27595 - Sindhu B. Hegde, K. R. Prajwal, Taein Kwon, Andrew Zisserman:

Understanding Co-Speech Gestures in-the-Wild. 9977-9987 - Jaehwan Jeong, Sumin In, Sieun Kim, Hannie Shin, Jongheon Jeong, Sang Ho Yoon, Jaewook Chung, Sangpil Kim:

FaceShield: Defending Facial Image Against Deepfake Threats. 10364-10374 - Zhengyuan Peng, Jianqing Xu, Yuge Huang, Jinkun Hao, Shouhong Ding, Zhizhong Zhang, Xin Tan, Lizhuang Ma:

Stylized-Face: A Million-Level Stylized Face Dataset for Face Recognition. 13053-13064 - Yue Su, Xinyu Zhan, Hongjie Fang, Han Xue, Hao-Shu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang:

Dense Policy: Bidirectional Autoregressive Learning of Actions. 14486-14495 - Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis:

DIMO: Diverse 3D Motion Generation for Arbitrary Objects. 14357-14368 - Runpeng Yu, Xinyin Ma, Xinchao Wang:

Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens. 21822-21831 - Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park, Daniel H. Pak, Anne-Marie Rickmann, Lawrence H. Staib, James S. Duncan, Alex Wong:

Progressive Test Time Energy Adaptation for Medical Image Segmentation. 22338-22348 - Kaidong Zhang, Rongtao Xu, Pengzhen Ren, Junfan Lin, Hefeng Wu, Liang Lin, Xiaodan Liang:

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation. 14590-14601 - Shijie Zhou, Alexander Vilesov, Xuehai He, Ziyu Wan, Shuwang Zhang, Aditya Nagachandra, Di Chang, Dongdong Chen, Xin Eric Wang, Achuta Kadambi:

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models. 8600-8612 - Roi Benita, Michael Finkelson, Tavi Halperin, Gleb Sterkin, Yossi Adi:

CAFA: A Controllable Automatic Foley Artist. 15917-15926 - Xinyue Li, Zhangkai Ni, Wenhan Yang:

AFUNet: Cross-Iterative Alignment-Fusion Synergy for HDR Reconstruction via Deep Unfolding Paradigm. 10666-10675 - Zhiyuan Yang, Anqi Cheng, Haiyue Zhu, Tianjiao Li, Pey Yuen Tao, Kezhi Mao:

HFD-Teacher: High-Frequency Depth Distillation From Depth Foundation Models for Enhanced Depth Completion. 8994-9003 - Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Ziqiang Dang, Jianqiang Ren, Liefeng Bo, Zhigang Tu:

SemTalk: Holistic Co-Speech Motion Generation with Frame-Level Semantic Emphasis. 13761-13771 - Kuangpu Guo, Lijun Sheng, Yongcan Yu, Jian Liang, Zilei Wang, Ran He:

Cooperative Pseudo Labeling for Unsupervised Federated Classification. 3326-3336 - Veedant Jain, Gabriel Kreiman, Felipe dos Santos Alves Feitosa:

HumorDB: Can AI Understand Graphical Humor? 604-613 - Thuy Tran, Ruochen Chen, Shaifali Parashar:

Image-Guided Shape-From-Template Using Mesh Inextensibility Constraints. 7419-7428 - Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel J. Brostow, Sara Vicente, Guillermo Garcia-Hernando:

PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes. 6645-6655 - Chancharik Mitra, Brandon Huang, Tianning Chai, Zhiqiu Lin, Assaf Arbelle, Rogério Feris, Leonid Karlinsky, Trevor Darrell, Deva Ramanan, Roei Herzig:

Enhancing Few-Shot Vision-Language Classification With Large Multimodal Model Features. 2760-2772 - Xianhang Li, Yanqing Liu, Haoqin Tu, Cihang Xie:

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning. 3977-3987 - Xiaorong Qin, Xinhang Song, Sixian Zhang, Xinyao Yu, Xinmiao Zhang, Shuqiang Jiang:

Learning on the Go: A Meta-Learning Object Navigation Model. 8939-8949 - Junyuan Zhang, Qintong Zhang, Bin Wang, Linke Ouyang, Zichen Wen, Ying Li, Ka-Ho Chow, Conghui He, Wentao Zhang:

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation. 17443-17453 - Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu:

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion. 16893-16903 - Yifan Zhan, Qingtian Zhu, Muyao Niu, Mingze Ma, Jiancheng Zhao, Zhihang Zhong, Xiao Sun, Yu Qiao, Yinqiang Zheng:

Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars. 14259-14269 - Tianrun Xu, Guanyu Chen, Ye Li, Yuxin Xi, Zeyu Mu, Ruichen Wang, Tianren Zhang, Haichuan Gao, Feng Chen:

OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding. 18240-18251 - Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu:

Federated Continual Instruction Tuning. 1325-1335 - Chenghui Lu, Jianlong Kwan, Dilong Li, Ziyi Chen, Haiyan Guan:

Serialization Based Point Cloud Oversegmentation. 25831-25840 - Jiahe Zhao, Rongkun Zheng, Yi Wang, Helin Wang, Hengshuang Zhao:

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs. 21710-21720 - Shraman Pramanick, Effrosyni Mavroudi, Yale Song, Rama Chellappa, Lorenzo Torresani, Triantafyllos Afouras:

Enrich and Detect: Video Temporal Grounding With Multimodal LLMs. 24297-24308 - Xinyu Yan, Meijun Sun, Ge-Peng Ji, Fahad Shahbaz Khan, Salman Khan, Deng-Ping Fan:

LawDIS: Language-Window-Based Controllable Dichotomous Image Segmentation. 23902-23911 - Chunxiao Li, Xiaoxiao Wang, Meiling Li, Boming Miao, Peng Sun, Yunjian Zhang, Xiangyang Ji, Yao Zhu:

Bridging the Gap Between Ideal and Real-World Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios. 20379-20389 - Yuki Urakawa, Yoshihiro Watanabe:

Neural Inverse Rendering for High-Accuracy 3D Measurement of Moving Objects with Fewer Phase-Shifting Patterns. 27692-27701 - Zhiwei Xu:

DAA*: Deep Angular A Star for Image-based Path Planning. 25284-25293 - Vanessa Sklyarova, Egor Zakharov, Malte Prinzler, Giorgio Becherini, Michael J. Black, Justus Thies:

Im2Haircut: Single-View Strand-Based Hair Reconstruction for Human Avatars. 10656-10665 - Shijie Huang, Yiren Song, Yuxuan Zhang, Hailong Guo, Xueyin Wang, Jiaming Liu:

ArtEditor: Learning Customized Instructional Image Editor From Few-Shot Examples. 17651-17662 - Saihui Hou, Panjian Huang, Zengbin Wang, Yuan Liu, Zeyu Li, Man Zhang, Yongzhen Huang:

OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization. 14369-14379 - Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab, Aashu Singh, Qifan Wang, David Yang, ShengYun Peng, Hanchao Yu, Shen Yan, Xuewen Zhang, Baosheng He:

CompCap: Improving Multimodal Large Language Models with Composite Captions. 23582-23592 - Junho Lee, Jeongwoo Shin, Hyungwook Choi, Joonseok Lee:

Latent Diffusion Models With Masked Autoencoders. 17422-17431 - Jianhui Zhang, Sheng Cheng, Qirui Sun, Jia Liu, Wang Luyang, Chaoyu Feng, Chen Fang, Lei Lei, Jue Wang, Shuaicheng Liu:

Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter. 16991-17000 - Li Mi, Manon Béchaz, Zeming Chen, Antoine Bosselut, Devis Tuia:

GeoExplorer: Active Geo-Localization with Curiosity-Driven Exploration. 6122-6131 - Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov:

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation. 19373-19385 - Mazlum Ferhat Arslan, Weihong Guo, Shuo Li:

Neuromanifold-Regularized KANs for Shape-fair Feature Representations. 12790-12799 - Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, Ling Liu:

Adversarial Attention Perturbations for Large Object Detection Transformers. 3184-3193 - Ruixuan Cong, Yu Wang, Mingyuan Zhao, Da Yang, Rongshan Chen, Hao Sheng:

Rethinking the Upsampling Process in Light Field Super-Resolution with Spatial-Epipolar Implicit Image Function. 7559-7569 - Junhyuk So, Juncheol Shin, Hyunho Kook, Eunhyeok Park:

Grouped Speculative Decoding for Autoregressive Image Generation. 15375-15384 - Chongjie Ye, Yushuang Wu, Ziteng Lu, Jiahao Chang, Xiaoyang Guo, Jiaqing Zhou, Hao Zhao, Xiaoguang Han:

Hi3DGen: High-Fidelity 3D Geometry Generation From Images Via Normal Bridging. 1-12 - Haiping Wang, Yuan Liu, Ziwei Liu, Wenping Wang, Zhen Dong, Bisheng Yang:

VistaDream: Sampling Multiview Consistent Images for Single-View Scene Reconstruction. 26772-26782 - Junli Liu, Qizhi Chen, Zhigang Wang, Yiwen Tang, Yiting Zhang, Chi Yan, Dong Wang, Xuelong Li, Bin Zhao:

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations. 5177-5187 - Mostofa Rafid Uddin, Jana Armouti, Min Xu:

Unsupervised Identification of Protein Compositions and Conformations Via Implicit Content-Transformation Disentanglement. 7483-7493 - Yuhao Sun, Yihua Zhang, Gaowen Liu, Hongtao Xie, Sijia Liu:

Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design. 2417-2428 - Yidi Liu, Dong Liu, Yuxin Ma, Jie Huang, Wenlong Zhang, Xueyang Fu, Zheng-Jun Zha:

Decouple to Reconstruct: High Quality UHD Restoration Via Active Feature Disentanglement and Reversible Fusion. 11622-11631 - Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim:

VSRM: A Robust Mamba-Based Framework for Video Super-Resolution. 14711-14721 - Liuyi Wang, Xinyuan Xia, Hui Zhao, Hanqing Wang, Tai Wang, Yilun Chen, Chengju Liu, Qijun Chen, Jiangmiao Pang:

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities. 9455-9465 - Siyuan Yao, Rui Zhu, Ziqi Wang, Wenqi Ren, Yanyang Yan, Xiaochun Cao:

UMDATrack: Unified Multi-Domain Adaptive Tracking under Adverse Weather Conditions. 6466-6475 - Wen Jiang, Boshu Lei, Katrina Ashton, Kostas Daniilidis:

Multimodal LLM Guided Exploration and Active Mapping Using Fisher Information. 5392-5404 - Mengdi Liu, Zhangyang Gao, Hong Chang, Stan Z. Li, Shiguang Shan, Xilin Chen:

G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction Via Evolutionary Diffusion. 20705-20714 - Ce Wang, Zhenyu Hu, Wanjie Sun, Zhenzhong Chen:

Timestep-Aware Diffusion Model for Extreme Image Rescaling. 15594-15603 - En Ci, Shanyan Guan, Yanhao Ge, Yilin Zhang, Wei Li, Zhenyu Zhang, Jian Yang, Ying Tai:

Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent. 1-10 - Xinggang Hu, Chenyangguang Zhang, Mingyuan Zhao, Yuanze Gui, Xiangkui Zhang, Xiangyang Ji:

DyGS-SLAM: Real-Time Accurate Localization and Gaussian Reconstruction for Dynamic Scenes. 9561-9571 - Ata Çelen, Marc Pollefeys, Dániel Baráth, Iro Armeni:

HouseTour: A Virtual Real Estate A(I)gent. 17761-17771 - Ting Yao, Yehao Li, Yingwei Pan, Zhaofan Qiu, Tao Mei:

Denoising Token Prediction in Masked Autoregressive Models. 1-10 - Haoye Dong, Gim Hee Lee:

PS-Mamba: Spatial-Temporal Graph Mamba for Pose Sequence Refinement. 8568-8578 - Dayong Su, Yafei Zhang, Huafeng Li, Jinxing Li, Yu Liu:

UniFuse: A Unified All-In-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments. 14238-14247 - Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui:

Describe Anything: Detailed Localized Image and Video Captioning. 21766-21777 - Wenjun Miao, Guansong Pang, Zihan Wang, Jin Zheng, Xiao Bai:

Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection. 1-10 - Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He:

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation. 18682-18692 - Yuansheng Li, Yunhao Zou, Linwei Chen, Ying Fu:

Physical Degradation Model-Guided Interferometric Hyperspectral Reconstruction with Unfolding Transformer. 13815-13825 - Jaejun Hwang, Dayoung Gong, Manjin Kim, Minsu Cho:

Generic Event Boundary Detection via Denoising Diffusion. 14084-14094 - Junyi Wu, Zhiteng Li, Zheng Hui, Yulun Zhang, Linghe Kong, Xiaokang Yang:

QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation. 15035-15044 - Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng:

ULTHO: Ultra-Lightweight Yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning. 2620-2630 - Zhigang Wang, Yifei Su, Chenhui Li, Dong Wang, Yan Huang, Xuelong Li, Bin Zhao:

Open-Vocabulary Octree-Graph for 3D Scene Understanding. 7037-7047 - Hongwei Lin, Dongyu Pan, Qiming Xia, Hai Wu, Cheng Wang, Siqi Shen, Chenglu Wen:

Pretend Benign: A Stealthy Adversarial Attack by Exploiting Vulnerabilities in Cooperative Perception. 19947-19956 - Zhijian Huang, Chengjian Feng, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang, Lin Ma:

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving. 8011-8021 - Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang:

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning. 20180-20192 - Xudong Li, Zihao Huang, Yan Zhang, Yunhang Shen, Ke Li, Xiawu Zheng, Liujuan Cao, Rongrong Ji:

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models. 10442-10452 - Qinqian Lei, Bo Wang, Robby T. Tan:

HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation. 1825-1835 - Jingyi Zhang, Jiaxing Huang, Huanjin Yao, Shunyu Liu, Xikun Zhang, Shijian Lu, Dacheng Tao:

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-Wise Group Relative Policy Optimization. 1859-1869 - Lu Chen, Yizhou Wang, Shixiang Tang, Qianhong Ma, Tong He, Wanli Ouyang, Xiaowei Zhou, Hujun Bao, Sida Peng:

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds. 6970-6980 - Harsh Agrawal, Eldon Schoop, Xinlei Pan, Anuj Mahajan, Ari Seff, Di Feng, Ruijia Cheng, Andres Romero Mier Y. Teran, Esteban Gomez, Abhishek Sundararajan, Forrest Huang, Amanda Swearngin, Mohana Prasad Sathya Moorthy, Jeffrey Nichols, Alexander Toshev:

UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents. 23353-23363 - Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, Di Zhang:

ReCamMaster: Camera-Controlled Generative Rendering From a Single Video. 14834-14844 - Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan L. Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen:

Medical World Model. 8319-8329 - Qing Li, Huifang Feng, Xun Gong, Yu-Shen Liu:

Learning Normals of Noisy Points by Local Gradient-Aware Surface Filtering. 28828-28838 - Seunghyun Lee, Tae-Kyun Kim:

Joint Learning of Pose Regression and Denoising Diffusion with Score Scaling Sampling for Category-Level 6D Pose Estimation. 5757-5768 - Yuanshen Guan, Ruikang Xu, Yinuo Liao, Mingde Yao, Lizhi Wang, Zhiwei Xiong:

HDR Image Generation via Gain Map Decomposed Diffusion. 17536-17545 - Yunqi Miao, Zhiyu Qu, Mingqi Gao, Changrui Chen, Jifei Song, Jungong Han, Jiankang Deng:

Unlocking the Potential of Diffusion Priors in Blind Face Restoration. 13471-13480 - Will Gao, Dilin Wang, Yuchen Fan, Aljaz Bozic, Tuur Stuyck, Zhengqin Li, Zhao Dong, Rakesh Ranjan, Nikolaos Sarafianos:

3D Mesh Editing Using Masked LRMs. 7154-7165 - Shenxing Wei, Jinxi Li, Yafei Yang, Siyuan Zhou, Bo Yang:

RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians. 25616-25626 - Xincheng Shuai, Henghui Ding, Zhenyuan Qin, Hao Luo, Xingjun Ma, Dacheng Tao:

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation. 12449-12458 - Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara:

Learning Precise Affordances From Egocentric Videos for Robotic Manipulation. 10581-10591 - Ge Gao, Siyue Teng, Tianhao Peng, Fan Zhang, David Bull:

GIViC: Generative Implicit Video Compression. 1-12 - Hahyeon Choi, Junhoo Lee, Nojun Kwak:

What's Making That Sound Right Now? Video-Centric Audio-Visual Localization. 20095-20104 - Yadong Qu, Hongtao Xie, Yongdong Zhang, Shancheng Fang, Yuxin Wang, Xiaorui Wang, Zhineng Chen:

IGD: Instructional Graphic Design With Multimodal Layer Generation. 18218-18228 - Yihong Cao, Jiaming Zhang, Xu Zheng, Hao Shi, Kunyu Peng, Hang Liu, Kailun Yang, Hui Zhang:

Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation. 8961-8972 - Junhao Dong, Piotr Koniusz, Liaoyuan Feng, Yifei Zhang, Hao Zhu, Weiming Liu, Xinghua Qu, Yew-Soon Ong:

Robustifying Zero-Shot Vision Language Models by Subspaces Alignment. 21037-21047 - Minkyun Seo, Hyungtae Lim, Kanghee Lee, Luca Carlone, Jaesik Park:

Buffer-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes. 3851-3862 - Zihan Wang, Jeff Tan, Tarasha Khurana, Neehar Peri, Deva Ramanan:

MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion. 8252-8263 - Jingye Chen, Zhaowen Wang, Nanxuan Zhao, Li Zhang, Difan Liu, Jimei Yang, Qifeng Chen:

Rethinking Layered Graphic Design Generation with a Top-Down Approach. 16861-16870 - Runyang Feng, Hyung Jin Chang, Tze Ho Elden Tse, Boeun Kim, Yi Chang, Yixing Gao:

High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation. 1-10 - Yaopeng Lou, Li Shen, Tianqi Liu, Jiaqi Li, Zihao Huang, Huiqiang Sun, Zhiguo Cao:

MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction. 25583-25593 - Yijing Lin, Mengqi Huang, Shuhan Zhuang, Zhendong Mao:

RealGeneral: Unifying Visual Generation Via Temporal In-Context Learning With Video Models. 14994-15004 - Hongji Yang, Wencheng Han, Yucheng Zhou, Jianbing Shen:

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models. 1-10 - Wenzhuang Wang, Yifan Zhao, Mingcan Ma, Ming Liu, Zhonglin Jiang, Yong Chen, Jia Li:

FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation. 19097-19107 - Zheng Gao, Jifei Song, Zhensong Zhang, Jiankang Deng, Ioannis Patras:

Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation. 19195-19205 - Qihang Fan, Huaibo Huang, Yuang Ai, Ran He:

Rectifying Magnitude Neglect in Linear Attention. 21505-21514 - Mengmeng Wang, Haonan Wang, Yulong Li, Xiangjie Kong, Jiaxin Du, Guojiang Shen, Feng Xia:

TrackAny3D: Transferring Pretrained 3D Models for Category-Unified 3D Point Cloud Tracking. 28249-28259 - Ayush Gupta, Anirban Roy, Rama Chellappa, Nathaniel D. Bastian, Alvaro Velasquez, Susmit Jha:

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision. 23593-23603 - Ling Lo, Kelvin C. K. Chan, Wen-Huang Cheng, Ming-Hsuan Yang:

From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition. 1-10 - Kaixiang Yang, Xin Li, Qiang Li, Zhiwei Wang:

CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition. 23741-23751 - Xinyang Zhou, Fanyue Wei, Lixin Duan, Angela Yao, Wen Li:

The Devil Is in the Spurious Correlations: Boosting Moment Retrieval With Dynamic Learning. 20981-20990 - Artem Zholus, Carl Doersch, Yi Yang, Skanda Koppula, Viorica Patraucean, Xu Owen He, Ignacio Rocco, Mehdi S. M. Sajjadi, Sarath Chandar, Ross Goroshin:

TAPNext: Tracking Any Point (TAP) as Next Token Prediction. 9693-9703 - Ibtihel Amara, Ahmed Imtiaz Humayun, Ivana Kajic, Zarana Parekh, Natalie Harris, Sarah Young, Chirag Nagpal, Najoung Kim, Junfeng He, Cristina Nader Vasconcelos, Deepak Ramachandran, Golnoosh Farnadi, Katherine A. Heller, Mohammad Havaei, Negar Rostamzadeh:

Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts. 16420-16430 - Chen Gao, Shuo Zhang, Youfang Lin:

Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation. 6488-6497 - Shouwen Wang, Qian Wan, Junbin Gao, Zhigang Zeng:

Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity. 1968-1978 - Md Ashiqur Rahman, Chiao-An Yang, Michael N. Cheng, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh:

Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer. 10527-10537 - Zhimin Chen, Xuewei Chen, Xiao Guo, Yingwei Li, Longlong Jing, Liang Yang, Bing Li:

Point Cloud Self-Supervised Learning via 3D to Multi-View Masked Learner. 27618-27629 - Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu, Hong-Xia Xie, Hong-Han Shuai, Wen-Huang Cheng:

Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation. 4178-4187 - Bin Yang, Yulin Zhang, Hong-Yu Zhou, Sibei Yang:

No More Sibling Rivalry: Debiasing Human-Object Interaction Detection. 22707-22717 - Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll:

CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving. 28031-28041 - Qing Ma, Pengwei Liang, Xiong Zhou, Jiayi Ma, Junjun Jiang, Zhe Peng:

Robust Test-Time Adaptation for Single Image Denoising Using Deep Gaussian Prior. 11230-11240 - Yuval Nissan, Marc Pollefeys, Daniel Barath:

Planar Affine Rectification from Local Change of Scale and Orientation. 27147-27155 - M. Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk:

PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning. 2471-2481 - Songlin Yang, Yushi Lan, Honghua Chen, Xingang Pan:

Textured 3D Regenerative Morphing with 3D Diffusion Prior. 15159-15170 - Dae-Young Song, Jung-Jae Yu, Donghyeon Cho:

Progressive Artwork Outpainting Via Latent Diffusion Models. 15405-15415 - Mengyu Gao, Qiulei Dong:

Causality-Guided Prompt Learning for Vision-Language Models via Visual Granulation. 1141-1151 - Zhen Qu, Xian Tao, Xinyi Gong, Shichen Qu, Xiaopei Zhang, Xingang Wang, Fei Shen, Zhengtao Zhang, Mukesh Prasad, Guiguang Ding:

DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup. 20519-20528 - Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He:

AETHER: Geometric-Aware Unified World Modeling. 8535-8546 - Qi Chen, Xinze Zhou, Chen Liu, Hao Chen, Wenxuan Li, Zekun Jiang, Ziyan Huang, Yuxuan Zhao, Dexin Yu, Junjun He, Yefeng Zheng, Ling Shao, Alan L. Yuille, Zongwei Zhou:

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data. 24001-24013 - Hyungrok Jung, Daneul Kim, Seunggyun Lim, Jeany Son, Jonghyun Choi:

Online Generic Event Boundary Detection. 13741-13750 - Yuxue Yang, Lue Fan, Zuzeng Lin, Feng Wang, Zhaoxiang Zhang:

LayerAnimate: Layer-Level Control for Animation. 10865-10874 - Yuxuan Wang, Yiqi Song, Cihang Xie, Yang Liu, Zilong Zheng:

VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges. 24170-24181 - Hui Li, Xiaoyu Ren, Hongjiu Yu, Ying Chen, Kai Li, L. Wang, Xiongkuo Min, Huiyu Duan, Guangtao Zhai, Xu Liu:

FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching. 11458-11468 - Duo Wu, Jinghe Wang, Yuan Meng, Yanning Zhang, Le Sun, Zhi Wang:

CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning. 8699-8709 - Yuxiang Ji, Boyong He, Zhuoyue Tan, Liaoni Wu:

MMGeo: Multimodal Compositional Geo-Localization for UAVs. 25165-25175 - Yunqiu Xu, Linchao Zhu, Yi Yang:

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs. 17675-17687 - Yujie Xue, Huilong Pi, Jiapeng Zhang, Yunchuan Qin, Zhuo Tang, Kenli Li, Ruihui Li:

SDFormer: Vision-Based 3D Semantic Scene Completion via SAM-Assisted Dual-Channel Voxel Transformer. 26837-26847 - Yunzhe Shao, Xinyu Yi, Lu Yin, Shihui Guo, Junhai Yong, Feng Xu:

MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances. 29021-29030 - Kai Huang, Hao Zou, Bochen Wang, Ye Xi, Zhen Xie, Hao Wang:

AirCache: Activating Inter-Modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference. 23958-23967 - Wenzheng Zeng, Difei Gao, Mike Zheng Shou, Hwee Tou Ng:

Factorized Learning for Temporally Grounded Video-Language Models. 20683-20693 - Haiwen Feng, Junyi Zhang, Qianqian Wang, Yufei Ye, Pengcheng Yu, Michael J. Black, Trevor Darrell, Angjoo Kanazawa:

ST4RTrack: Simultaneous 4D Reconstruction and Tracking in the World. 8503-8513 - Yunze Tong, Fengda Zhang, Didi Zhu, Jun Xiao, Kun Kuang:

Decoding Correlation-Induced Misalignment in the Stable Diffusion Workflow for Text-to-Image Generation. 1-10 - Chenwei Lin, Hanjia Lyu, Xian Xu, Jiebo Luo:

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance. 9036-9047 - Aashish Sharma:

DM-EFS: Dynamically Multiplexed Expanded Features Set form for Robust and Efficient Small Object Detection. 24569-24579 - Ilia A. Petrov, Riccardo Marin, Julian Chibane, Gerard Pons-Moll:

TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions. 5523-5535 - Seonghoon Yu, Joonbeom Hong, Joonseok Lee, Jeany Son:

Latent Expression Generation for Referring Image Segmentation and Grounding. 21374-21383 - Vittorio Pipoli, Alessia Saporita, Federico Bolelli, Marcella Cornia, Lorenzo Baraldi, Costantino Grana, Rita Cucchiara, Elisa Ficarra:

MISSRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models. 3215-3224 - Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky, Vladislav Shakhrai, Di Liu, Peiye Zhuang, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee, James Davis, Jian Wang:

T2Bs: Text-to-Character Blendshapes via Video Generation. 13625-13637 - Haoyu Zhao, Hao Wang, Xingyue Zhao, Hao Fei, Hongqiu Wang, Chengjiang Long, Hua Zou:

PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting. 5242-5252 - Qiqi Liu, Jiaqiang Li, Yuchen Liu, Yaochu Jin, Lingjuan Lyu, Xiaohu Wu, Han Yu:

Personalized Federated Learning Under Local Supervision. 4069-4079 - Chen-Liang Fan, Mingpei Cao, Chih Chien Hung, Yuesheng Zhu:

Optical Model-Driven Sharpness Mapping for Autofocus in Small Depth-of-Field and Severe Defocus Scenarios. 6426-6435 - Jens U. Kreber, Joerg Stueckler:

Guiding Diffusion-Based Articulated Object Generation by Partial Point Cloud Alignment and Physical Plausibility Constraints. 3206-3214 - Shaokai Wu, Yuxiang Lu, Yapan Guo, Wei Ji, Suizhi Huang, Fengyu Yang, Shalayiding Sirejiding, Qichen He, Jing Tong, Yanbiao Ji, Yue Ding, Hongtao Lu:

Discretized Gaussian Representation for Tomographic Reconstruction. 25073-25082 - Xin Qiao, Matteo Poggi, Xing Wei, Pengchao Deng, Yanhui Zhou, Stefano Mattoccia:

Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond. 6080-6090 - Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, Lu Sheng:

MV-Adapter: Multi-View Consistent Image Generation Made Easy. 16377-16387 - Hai Wu, Hongwei Lin, Xusheng Guo, Xin Li, Mingming Wang, Cheng Wang, Chenglu Wen:

Motal: Unsupervised 3D Object Detection by Modality and Task-Specific Knowledge Transfer. 6284-6293 - Yihang Liu, Ying Wen, Longzhen Yang, Lianghua He, Heng Tao Shen:

CoSMIC: Continual Self-Supervised Learning for Multi-Domain Medical Imaging Via Conditional Mutual Information Maximization. 23051-23062 - Dongyeun Lee, Jiwan Hur, Hyounguk Shon, Jae Young Lee, Junmo Kim:

DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization. 18510-18520 - Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski, Simone Schaub-Meyer, Kristian Kersting, Stefan Roth:

ART: Adaptive Relation Tuning for Generalized Relation Prediction. 16323-16332 - Yusuke Yoshiyasu, Leyuan Sun, Ryusuke Sagawa:

MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction. 6563-6574 - Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, Yunde Jia:

Diving into the Fusion of Monocular Priors for Generalized Stereo Matching. 14887-14897 - Yiming Gong, Zhen Zhu, Minjia Zhang:

InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow. 16808-16817 - Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang:

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces. 4475-4485 - Xuehan Chen, Guangyu Ren, Tianhong Dai, Tania Stathaki, Hengyan Liu:

Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection. 20672-20682 - Ye Tao, Jiawei Zhang, Yahao Shi, Dongqing Zou, Bin Zhou:

GSV3D: Gaussian Splatting-Based Geometric Distillation With Stable Video Diffusion for Single-Image 3D Object Generation. 7751-7760 - Wenkui Yang, Jie Cao, Junxian Duan, Ran He:

Towards Robust Defense Against Customization via Protective Perturbation Resistant to Diffusion-based Purification. 19290-19300 - Yingsen Zeng, Zepeng Huang, Yujie Zhong, Chengjian Feng, Jie Hu, Lin Ma, Yang Liu:

DisTime: Distribution-Based Time Representation for Video Large Language Models. 21961-21971 - Xinhang Wan, Jiyuan Liu, Qian Qu, Suyuan Liu, Chuyu Zhang, Fangdi Wang, Xinwang Liu, En Zhu, Kunlun He:

Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery. 4114-4124 - Jingming He, Chongyi Li, Shiqi Wang, Sam Kwong:

Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding. 28354-28363 - Haoxuan Wang, Jinlong Peng, Qingdong He, Hao Yang, Ying Jin, Jiafu Wu, Xiaobin Hu, Yanjie Pan, Zhenye Gan, Mingmin Chi, Bo Peng, Yabiao Wang:

UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer. 18325-18334 - G. Thomas Hudson, Dean L. Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed:

Everything is a Video: Unifying Modalities Through Next-Frame Prediction. 22004-22013 - Dimitrije Antic, Georgios Paschalidis, Shashank Tripathi, Theo Gevers, Sai Kumar Dwivedi, Dimitrios Tzionas:

SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image. 96116-96226 - Yuval Haitman, Oded Bialer:

DoppDrive: Doppler-Driven Temporal Aggregation for Improved Radar Object Detection. 26085-26094 - Yilin Gao, Kangyi Chen, Zhongxing Peng, Hengjie Lu, Shugong Xu:

Knowledge Transfer from Interaction Learning. 3585-3595 - Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang:

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers. 26890-26900 - Tianyu Zhang, Haobo Jiang, Jian Yang, Jin Xie:

DiffPCI: Large Motion Point Cloud Frame Interpolation with Diffusion Model. 27348-27358 - Haoyu Wu, Jingyi Xu, Hieu Le, Dimitris Samaras:

Importance-Based Token Merging for Efficient Image and Video Generation. 4983-4995 - Jin Hu, Mingjia Li, Xiaojie Guo:

ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer. 11403-11413 - Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu, Jiapei Zhang, Ying Deng, Zexi Jia, Peixiang Luo, Xiaoyue Duan, Jie Zhou, Jinchao Zhang:

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model. 9845-9854 - Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Ming Ding, Xiaotao Gu, Shiyu Huang, Bin Xu, Yuxiao Dong, Jie Tang:

LVBench: An Extreme Long Video Understanding Benchmark. 22958-22967 - Bowei Guo, Shengkun Tang, Cong Zeng, Zhiqiang Shen:

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics. 1655-1664 - Kejia Zhang, Juanjuan Weng, Shaozi Li, Zhiming Luo:

Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment. 2783-2792 - Shadi Hamdan, Chonghao Sima, Zetong Yang, Hongyang Li, Fatma Güney:

ETA: Efficiency through Thinking Ahead, a Dual Approach to Self-Driving with Large Models. 26529-26538 - Borui Kang, Lei Wang, Zhiping Wu, Tao Feng, Yawen Li, Yang Gao, Wenbin Li:

Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning. 2077-2086 - Jialong Wu, Marco Braun, Dominic Spata, Matthias Rottmann:

TARS: Traffic-Aware Radar Scene Flow Estimation. 26075-26084 - Yansong Guo, Jie Hu, Yansong Qu, Liujuan Cao:

WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images. 5166-5176 - Sijia Chen, Bin Song:

RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications. 1686-1696 - Fu-Zhao Ou, Chongyi Li, Shiqi Wang, Sam Kwong:

MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation. 12915-12925 - Shiqi Huang, Shuting He, Huaiyuan Qin, Bihan Wen:

SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation. 12559-12569 - Hanyuan Liu, Chengze Li, Minshan Xie, Zhenni Wang, Jiawen Liang, Chi-Sing Leung, Tien-Tsin Wong:

BlueNeg: A 35MM Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration. 13119-13128 - Xiaoyue Mi, Fan Tang, Zonghan Yang, Danding Wang, Juan Cao, Peng Li, Yang Liu:

Adversarial Robust Memory-Based Continual Learner. 562-572 - Zhixiang Guo, Siyuan Liang, Aishan Liu, Dacheng Tao:

CopyrightShield: Enhancing Diffusion Model Security Against Copyright Infringement Attacks. 19417-19426 - Tom Nuno Wolf, Emre Kavak, Fabian Bongratz, Christian Wachinger:

SIC: Similarity-Based Interpretable Image Classification with Neural Networks. 24276-24285 - Jiwon Kim, Pu-Reum Kim, SeonHwa Kim, Soobin Park, Eunju Cha, Kyong Hwan Jin:

Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion. 15491-15500 - Junfu Tan, Peiguang Jing, Yu Zhu, Yu Liu:

MPBR: Multimodal Progressive Bidirectional Reasoning for Open-Set Fine-Grained Recognition. 1282-1291 - Lorenzo Baraldi, Davide Bucciarelli, Federico Betti, Marcella Cornia, Lorenzo Baraldi, Nicu Sebe, Rita Cucchiara:

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models. 16217-16226 - Haowen Bai, Jiangshe Zhang, Zixiang Zhao, Lilun Deng, Yukun Cui, Shuang Xu:

Retinex-MEF: Retinex-Based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion. 7251-7261 - Ran Ran, Jiwei Wei, Shiyuan He, Zeyu Ma, Chaoning Zhang, Ning Xie, Yang Yang:

KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding. 23311-23320 - Dewei Zhou, Mingwei Li, Zongxin Yang, Yi Yang:

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models. 16712-16722 - Dongyoung Kim, Mahmoud Afifi, Dongyun Kim, Michael S. Brown, Seon Joo Kim:

CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy. 6198-6208 - Tiange Luo, Lajanugen Logeswaran, Justin Johnson, Honglak Lee:

Visual Test-Time Scaling for GUI Agent Grounding. 19989-19998 - Yukuan Min, Muli Yang, Jinhao Zhang, Yuxuan Wang, Aming Wu, Cheng Deng:

Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation. 16755-16764 - Sherry X. Chen, Yi Wei, Luowei Zhou, Suren Kumar:

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation. 18345-18356 - Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu:

Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-To-3D Generation. 18563-18574 - Jiaqi Liao, Zhengyuan Yang, Linjie Li, Dianqi Li, Kevin Lin, Yu Cheng, Lijuan Wang:

ImageGen-CoT: Enhancing Text-to-Image in-context Learning with Chain-of-Thought Reasoning. 17214-17223 - Evangelos Kazakos, Cordelia Schmid, Josef Sivic:

Large-Scale Pre-Training for Grounded Video Caption Generation. 24434-24444 - Ruoxi Guo, Huaijin Pi, Zehong Shen, Qing Shuai, Zechen Hu, Zhumei Wang, Yajiao Dong, Ruizhen Hu, Taku Komura, Sida Peng, Xiaowei Zhou:

Motion-2-To-3: Leveraging 2D Motion Data for 3D Motion Generations. 14305-14316 - Xinhao Cai, Qiuxia Lai, Gensheng Pei, Xiangbo Shu, Yazhou Yao, Wenguan Wang:

Cycle-Consistent Learning for Joint Layout-to-Image Generation and Object Detection. 6797-6807 - Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy-Chowdhury:

Uncertainty-Aware Diffusion-Guided Refinement of 3D Scenes. 28271-28281 - Qikui Zhu:

PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function. 24858-24867 - Kaname Yokoyama, Chihiro Nakatani, Norimichi Ukita:

Dynamic Group Detection using VLM-augmented Temporal Groupness Graph. 10475-10484 - Nupur Kumari, Xi Yin, Jun-Yan Zhu, Ishan Misra, Samaneh Azadi:

Generating Multi-Image Synthetic Data for Text-to-Image Customization. 16524-16534 - Pingchuan Ma, Ming Gui, Johannes Schusterbauer, Xiaopei Yang, Olga Grebenkova, Vincent Tao Hu, Björn Ommer:

Stochastic Interpolants for Revealing Stylistic Flows Across the History of Art. 5867-5878 - Huu-Tai Phung, Zong-Lin Gao, Yi-Chen Yao, Kuan-Wei Ho, Yi-Hsin Chen, Yu-Hsiang Lin, Alessandro Gnutti, Wen-Hsiao Peng:

MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding. 19649-19658 - Marko Mihajlovic, Siwei Zhang, Gen Li, Kaifeng Zhao, Lea Müller, Siyu Tang:

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions. 5060-5070 - Jeongmin Yu, Susang Kim, Kisu Lee, Taekyoung Kwon, Won-Yong Shin, Ha Young Kim:

Multi-View Slot Attention using Paraphrased Texts for Face Anti-Spoofing. 21117-21128 - Kota Shimomura, Masaki Nambata, Atsuya Ishikawa, Ryota Mimura, Koki Inoue, Takayoshi Yamashita, Takayuki Kawabuchi:

OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving. 26167-26177 - Kevin Tandi, Xiang Dai, Chinmay Talegaonkar, Gal Mishne, Nick Antipa:

RnGCam: High-Speed Video from Rolling & Global Shutter Measurements. 8830-8840 - Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma:

ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation. 20954-20965 - Weihao Wang, Yu Lan, Mingyu You, Bin He:

Completing 3D Partial Assemblies with View-Consistent 2D-3D Correspondence. 7741-7750 - Sen Wang, Shao Zeng, Tianjun Gu, Zhizhong Zhang, Ruixin Zhang, Shouhong Ding, Jingyun Zhang, Jun Wang, Xin Tan, Yuan Xie, Lizhuang Ma:

From Enhancement to Understanding: Build a Generalized Bridge for Low-Light Vision via Semantically Consistent Unsupervised Fine-Tuning. 13804-13814 - Yufeng Jin, Vignesh Prasad, Snehal Jauhri, Mathias Franzius, Georgia Chalvatzaki:

6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting. 8032-8043 - Siyu Ren, Junhui Hou, Weiyao Lin, Wenping Wang:

Neural Compression for 3D Geometry Sets. 25294-25304 - Haiyang Bai, Jiaqi Zhu, Songru Jiang, Wei Huang, Tao Lu, Yuanqi Li, Jie Guo, Runze Fu, Yanwen Guo, Lijun Chen:

GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections. 26456-26465 - Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, Weidi Xie, Andrew Zisserman:

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation. 16503-16513 - Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito, Jason M. Saragih, Fabian Prada, Yichen Xu, Shoou-I Yu, Ryosuke Furuta, Yoichi Sato, Takaaki Shiratori:

Generative Modeling of Shape-Dependent Self-Contact Human Poses. 5426-5436 - Changsong Lei, Yaqian Liang, Shaofeng Wang, Jiajia Dai, Yong-Jin Liu:

TeethGenerator: A Two-Stage Framework for Paired Pre- and Post-Orthodontic 3D Dental Data Generation. 25872-25881 - Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu:

Unified Multimodal Understanding via Byte-Pair Visual Encoding. 12976-12986 - Yuntao Shou, Xiangyong Cao, Peiqiang Yan, Qiaohui, Qian Zhao, Deyu Meng:

Graph Domain Adaptation With Dual-Branch Encoder and Two-Level Alignment for Whole Slide Image-Based Survival Prediction. 19925-19935 - Anja Delic, Matej Grcic, Sinisa Segvic:

Sequential Keypoint Density Estimator: an Overlooked Baseline of Skeleton-Based Video Anomaly Detection. 11579-11589 - Yiren Song, Danze Chen, Mike Zheng Shou:

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer. 19731-19741 - Ryan Webster, Teddy Furon:

Multi-modal Identity Extraction. 10797-10806 - Jiahui Yang, Yongjia Ma, Donglin Di, Jianxun Cui, Hao Li, Wei Chen, Yan Xie, Xun Yang, Wangmeng Zuo:

QR-LoRA: Efficient and Disentangled Fine-Tuning via QR Decomposition for Customized Generation. 17587-17597 - Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi:

DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness. 6772-6783 - Rui Tian, Qi Dai, Jianmin Bao, Kai Qiu, Yifan Yang, Chong Luo, Zuxuan Wu, Yu-Gang Jiang:

REDUCIO! Generating 1K Video Within 16 Seconds Using Extremely Compressed Motion Latents. 19237-19247 - Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha, Yi-Ting Chen, David Crandall, Yi-Hsuan Tsai:

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning. 12294-12306 - Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner, Supasorn Suwajanakorn:

Zero-Shot Inexact CAD Model Alignment from a Single Image. 6231-6241 - Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan, Malcolm Chadwick, Luca Morreale, Mehdi Noroozi, Alberto Gil C. P. Ramos, Sourav Bhattacharya:

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention. 19608-19616 - Oindrila Saha, Logan Lawrence, Grant Van Horn, Subhransu Maji:

Generate, Transduct, Adapt: Iterative Transduction with VLMs. 1369-1379 - Mengxiao Tian, Xinxiao Wu, Shuo Yang:

LLM-Enhanced Action-Aware Multi-Modal Prompt Tuning for Image-Text Matching. 1-10 - Shin Ishihara, Imari Sato:

Spatio-Spectral Pattern Illumination for Direct and Indirect Separation from a Single Hyperspectral Image. 26827-26836 - Qi Wang, Zhipeng Zhang, Baao Xie, Xin Jin, Yunbo Wang, Shiyu Wang, Liaomo Zheng, Xiaokang Yang, Wenjun Zeng:

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning. 2599-2608 - Yiting Li, Fayao Liu, Jingyi Liao, Sichao Tian, Chuan-Sheng Foo, Xulei Yang:

FIND: Few-Shot Anomaly Inspection with Normal-Only Multi-Modal Data. 1-10 - Jun-Hee Kim, Jumin Han, Seong-Whan Lee:

PoseAnchor: Robust Root Position Estimation for 3D Human Pose Estimation. 7079-7088 - Yunpeng Bai, Qixing Huang:

FiffDepth: Feed-Forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation. 6023-6033 - Shuangkang Fang, I-Chao Shen, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Shuchang Zhou, Wenrui Ding, Takeo Igarashi, Ming-Hsuan Yang:

MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh. 14061-14072 - Hualong Ke, Jiangming Shi, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu:

Task-Aware Prompt Gradient Projection for Parameter-Efficient Tuning Federated Class-Incremental Learning. 2631-2641 - Yiwen Chen, Hieu T. Nguyen, Vikram Voleti, Varun Jampani, Huaizu Jiang:

HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models. 28440-28450 - Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu:

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting. 7262-7272 - Zhangjun Zhou, Yiping Li, Chunlin Zhong, Jianuo Huang, Jialun Pei, Hua Li, He Tang:

Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes. 22372-22382 - Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald:

Auto-Vocabulary Semantic Segmentation. 24266-24275 - Xuan Han, Yihao Zhao, Yanhao Ge, Mingyu You:

Toward Better Out-Painting: Improving the Image Composition With Initialization Policy Model. 16938-16947 - Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee, Masoud Hadi, Moein Madadi, Mackenzie W. Mathis:

DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion. 3194-3205 - Jeongsol Kim, Bryan Sangwoo Kim, Jong Chul Ye:

FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems. 12328-12337 - Haoang Lu, Yuanqi Su, Xiaoning Zhang, Longjun Gao, Yu Xue, Le Wang:

VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions. 28674-28684 - Yunyang Xiong, Chong Zhou, Xiaoyu Xiang, Lemeng Wu, Chenchen Zhu, Zechun Liu, Saksham Suri, Balakrishnan Varadarajan, Ramya Akula, Forrest N. Iandola, Raghuraman Krishnamoorthi, Bilge Soran, Vikas Chandra:

Efficient Track Anything. 11513-11524 - Rui Chen, Zehuan Wu, Yichen Liu, Yuxin Guo, Jingcheng Ni, Haifeng Xia, Siyu Xia:

UniMLVG: Unified Framework for Multi-View Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving. 25453-25463 - Xin Zhou, Dingkang Liang, Sifan Tu, Xiwu Chen, Yikang Ding, Dingyuan Zhang, Feiyang Tan, Hengshuang Zhao, Xiang Bai:

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation. 27817-27827 - Luca Barsellotti, Lorenzo Bianchi, Nicola Messina, Fabio Carrara, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Rita Cucchiara:

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation. 22025-22035 - Renshan Zhang, Rui Shao, Gongwei Chen, Miao Zhang, Kaiwen Zhou, Weili Guan, Liqiang Nie:

FALCON: Resolving Visual Redundancy and Fragmentation in High-Resolution Multimodal Large Language Models via Visual Registers. 23530-23540 - Yuzhuo Chen, Zehua Ma, Han Fang, Weiming Zhang, Nenghai Yu:

TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity. 16723-16732 - Minsu Kim, Subin Jeon, In Cho, Mijin Yoo, Seon Joo Kim:

ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors. 1-10 - Guanjie Chen, Xinyu Zhao, Yucheng Zhou, Xiaoye Qu, Tianlong Chen, Yu Cheng:

Towards Stabilized and Efficient Diffusion Transformers Through Long-Skip-Connections With Spectral Constraints. 17708-17718 - Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller:

Learning Normal Flow Directly from Events. 7969-7979 - Inseung Hwang, Kiseok Choi, Hyunho Ha, Min H. Kim:

Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis. 24899-24909 - Yulin Pan, Xiangteng He, Chaojie Mao, Zhen Han, Zeyinzi Jiang, Jingfeng Zhang, Yu Liu:

ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing. 16586-16596 - Songyan Zhang, Yongtao Ge, Jinyuan Tian, Guangkai Xu, Hao Chen, Chen Lv, Chunhua Shen:

POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction. 5680-5689 - Yuyi Liu, Xinhang Song, Tianliang Qi, Shuqiang Jiang:

Trial-Oriented Visual Rearrangement. 8022-8031 - Yibing Wei, Samuel Church, Victor Suciu, Jinhong Lin, Cheng-En Wu:

Trackverse: a Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning. 11153-11163 - Trong-Thang Pham, Akash Awasthi, Saba Khan, Esteban Duran Marti, Tien-Phat Nguyen, Khoa Vo, Minh Tran, Son Nguyen, Cuong Tran, Yuki Ikebe, Anh Totti Nguyen, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le:

CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling. 21732-21743 - Yahao Liu, Qin Wang, Lixin Duan, Wen Li:

Balanced Sharpness-Aware Minimization for Imbalanced Regression. 6242-6251 - Chenxu Zhao, Wei Qian, Aobo Chen, Mengdi Huai:

Membership Inference Attacks With False Discovery Rate Control. 1216-1227 - Feng Qiao, Zhexiao Xiong, Eric Xing, Nathan Jacobs:

Towards Open-World Generation of Stereo Images and Unsupervised Matching. 26579-26589 - Yang Li, Jinglu Wang, Lei Chu, Xiao Li, Shiu-Hong Kao, Ying-Cong Chen, Yan Lu:

StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams. 25841-25850 - Guangting Zheng, Jiajun Deng, Xiaomeng Chu, Yu Yuan, Houqiang Li, Yanyong Zhang:

S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction. 25594-25604 - Shuo Jin, Siyue Yu, Bingfeng Zhang, Mingjie Sun, Yi Dong, Jimin Xiao:

Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation. 20291-20300 - Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, HanFeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen:

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs. 7231-7240 - Mark Endo, Xiaohan Wang, Serena Yeung-Levy:

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration. 22826-22835 - Qi Fan, Kaiqi Liu, Nian Liu, Hisham Cholakkal, Rao Muhammad Anwer, Wenbin Li, Yang Gao:

Adapting In-Domain Few-Shot Segmentation to New Domains Without Source Domain Retraining. 21429-21439 - Shuai Tan, Bill Gong, Bin Ji, Ye Pan:

FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases. 24-36 - Jiaxin Lu, Gang Hua, Qixing Huang:

Jigsaw++: Imagining Complete Shape Priors for Object Reassembly. 6704-6714 - Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang:

Long Context Tuning for Video Generation. 17281-17291 - Yongkun Du, Zhineng Chen, Hongtao Xie, Caiyan Jia, Yu-Gang Jiang:

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition. 20147-20156 - Xianghui Xie, Jan Eric Lenssen, Gerard Pons-Moll:

MVGBench: A Comprehensive Benchmark for Multi-View Generation Models. 8207-8218 - Xun Wu, Shaohan Huang, Lingjie Jiang, Furu Wei:

Rethinking DPO-Style Diffusion Aligning Frameworks. 18068-18077 - Songru Yang, Zhenwei Shi, Zhengxia Zou:

Unified Multi-Agent Trajectory Modeling with Masked Trajectory Diffusion. 27563-27574 - Wei Suo, Ji Ma, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang:

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models. 20247-20256 - Minjoo Ki, Daejung Kim, Kisung Kim, Seon Joo Kim, Jinhan Lee:

CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching. 22036-22045 - Seunghoo Hong, Geonho Son, Juhun Lee, Simon S. Woo:

DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models. 17994-18003 - Wen Qian:

TryOn-Refiner: Conditional Rectified-Flow-Based Tryon Refiner for More Accurate Detail Reconstruction. 15669-15679 - Hongsong Wang, Renxi Cheng, Yang Zhang, Chaolei Han, Jie Gui:

LOTA: Bit-Planes Guided AI-Generated Image Detection. 17246-17255 - Jun Yin, Pengyu Zeng, Licheng Shen, Miao Zhang, Jing Zhong, Yuxing Han, Shuai Lu:

ArchiSet: Benchmarking Editable and Consistent Single-View 3D Reconstruction of Buildings with Specific Window-to-Wall Ratios. 26004-26014 - Wenhao Zhang, Hao Zhu, Delong Wu, Di Kang, Linchao Bao, Xun Cao, Zhan Ma:

WIPES: Wavelet-based Visual Primitives. 27338-27347 - Ziyi Wang, Peiming Li, Hong Liu, Zhichao Deng, Can Wang, Jun Liu, Junsong Yuan, Mengyuan Liu:

Recognizing Actions From Robotic View for Natural Human-Robot Interaction. 14218-14227 - Seungjun Moon, Hah Min Lew, Seungeun Lee, Ji-Su Kang, Gyeong-Moon Park:

GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar. 12811-12821 - Zeyu Yang, Zijie Pan, Yuankun Yang, Xiatian Zhu, Li Zhang:

Driving View Synthesis on Free-Form Trajectories with Generative Prior. 28083-28092 - Oliver J. Sutton, Qinghua Zhou, George Leete, Alexander N. Gorban, Ivan Yu. Tyukin:

Staining and Locking Computer Vision Models Without Retraining. 2346-2355 - Regine Hartwig, Dominik Muhle, Riccardo Marin, Daniel Cremers:

GECO: Geometrically Consistent Embedding with Lightspeed Inference. 9309-9319 - Xiaolong Sun, Le Wang, Sanping Zhou, Liushuai Shi, Kun Xia, Mengnan Liu, Yabing Wang, Gang Hua:

Moment Quantization for Video Temporal Grounding. 1-10 - Qiusheng Huang, Xiaohui Zhong, Xu Fan, Hao Li:

FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling. 8852-8862 - Soonbin Lee, Fangwen Shu, Yago Sánchez de la Fuente, Thomas Schierl, Cornelius Hellge:

Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs. 25496-25505 - Zhiqi Ge, Juncheng Li, Xinglei Pang, Minghe Gao, Kaihang Pan, Wang Lin, Hao Fei, Wenqiao Zhang, Siliang Tang, Yueting Zhuang:

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining. 1-10 - Gangwei Xu, Jiaxin Liu, Xianqi Wang, Junda Cheng, Yong Deng, Jinliang Zang, Yurui Chen, Xin Yang:

BANet: Bilateral Aggregation Network for Mobile Stereo Matching. 28870-28880 - Wenqiang Sun, Shuo Chen, Fangfu Liu, Zilong Chen, Yueqi Duan, Jun Zhu, Jun Zhang, Yikai Wang:

DimensionX: Create Any 3D and 4D Scenes From a Single Image With Decoupled Video Diffusion. 13695-13706 - Spyros Kondylatos, Nikolaos-Ioannis Bountos, Dimitrios Michail, Xiao Xiang Zhu, Gustau Camps-Valls, Ioannis Papoutsis:

On the Generalization of Representation Uncertainty in Earth Observation. 6552-6562 - Youngeun Kim, Seunghwan Lee, Aecheon Jung, Bogon Ryu, Sungeun Hong:

Task Vector Quantization for Memory-Efficient Model Merging. 20105-20115 - Tanay Agrawal, Abid Ali, Antitza Dantcheva, François Brémond:

Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation. 12222-12231 - Tang Tao, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, Xia Zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang:

RoboPearls: Editable Video Simulation for Robot Manipulation. 1-12 - Ron Raphaeli, Sean Man, Michael Elad:

SILO: Solving Inverse Problems with Latent Operators. 10570-10580 - Xuange Zhang, Dengjie Li, Bo Liu, Zenghao Bao, Yao Zhou, Baisong Yang, Zhongying Liu, Yujie Zhong, Tongtong Yuan:

Layer-Wise Vision Injection With Disentangled Attention for Efficient LVLMs. 7004-7013 - Joëlle Hanna, Damian Borth:

Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation. 23763-23772 - Rui Hu, Lianghui Zhu, Yuxuan Zhang, Tianheng Cheng, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang:

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding. 23105-23114 - Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qingxiang Lin, Jingwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue:

Unleashing Vecset Diffusion Model for Fast Shape Generation. 2523-2533 - Xueqing Deng, Linjie Yang, Qihang Yu, Chenglin Yang, Liang-Chieh Chen:

Leveraging Panoptic Scene Graph for Evaluating Fine-Grained Text-to-Image Generation. 15107-15116 - Haodong Jing, Dongyao Jiang, Yongqiang Ma, Haibo Hua, Bo Huang, Nanning Zheng:

Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI. 19258-19268 - Siyuan Yan, Ming Hu, Yiwen Jiang, Xieji Li, Hao Fei, Philipp Tschandl, Harald Kittler, Zongyuan Ge:

Derm1M: A Million-Scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology. 12681-12690 - Min Cen, Zhenfeng Zhuang, Yuzhe Zhang, Min Zeng, Baptiste Magnier, Lequan Yu, Hong Zhang, Liansheng Wang:

C²MIL: Synchronizing Semantic and Topological Causalities in Multiple Instance Learning for Robust and Interpretable Survival Analysis. 24392-24401 - Zongyang Ma, Yuxin Chen, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Shaojie Zhu, Chengxiang Zhuo, Bing Li, Ye Liu, Zang Li, Ying Shan, Weiming Hu:

VisionMath: Vision-Form Mathematical Problem-Solving. 1162-1172 - Meiao Wang, Xuejing Kang, Yaxi Lu, Jie Xu:

RetinexMCNet: A Memory Controller Dominated Network for Low-Light Video Enhancement Based on Retinex. 9716-9725 - Hongcheng Li, Yucan Zhou, Xiaoyan Gu, Bo Li, Weiping Wang:

Diversity-Enhanced Distribution Alignment for Dataset Distillation. 1-10 - Wenjia Wang, Liang Pan, Zhiyang Dou, Jidong Mei, Zhouyingcheng Liao, Yuke Lou, Yifan Wu, Lei Yang, Jingbo Wang, Taku Komura:

SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation. 14117-14127 - Lizhen Xu, Xiuxiu Bai, Xiaojun Jia, Jianwu Fang, Shanmin Pang:

Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning. 23085-23094 - Yi-Ting Shen, Sungmin Eum, Doheon Lee, Rohit Shete, Chiao-Yi Wang, Heesung Kwon, Shuvra S. Bhattacharyya:

Autocompose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs. 7409-7418 - Cheonjun Park, Hyun Jae Oh, Mincheol Park, Hyunchan Moon, Minsik Kim, Suhyun Kim, Myung Kuk Yoon, Won Woo Ro:

WINS: Winograd Structured Pruning for Fast Winograd Convolution. 22477-22487 - Yehao Lu, Minghe Weng, Zekang Xiao, Rui Jiang, Wei Su, Guangcong Zheng, Ping Luo, Xi Li:

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-Time Open-Vocabulary Object Detection. 20847-20856 - Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Anton van den Hengel, Ehsan Abbasnejad:

Towards Higher Effective Rank in Parameter-Efficient Fine-Tuning Using Khatri-Rao Product. 1292-1302 - Akshay Krishnan, Xinchen Yan, Vincent Casser, Abhijit Kundu:

Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation. 28217-28227 - Qi Guo, Zhen Tian, Minghao Yao, Saiyu Qi, Yong Qi, Bingyi Liu:

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation. 1474-1483 - Jiaer Xia, Bingkui Tong, Yuhang Zang, Rui Shao, Kaiyang Zhou:

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation. 208-217 - Zhaoyang Li, Zhu Teng, Baopeng Zhang, Jianping Fan:

Open-Unfairness Adversarial Mitigation for Generalized Deepfake Detection. 698-707 - Bingqing Zhang, Zhuo Cao, Heming Du, Yang Li, Xue Li, Jiajun Liu, Sen Wang:
Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization. 22120-22130 - Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian, Jun Yu:

D²ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-Shot Action Recognition. 1-10 - Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann:

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis. 4940-4950 - Yating Yu, Congqi Cao, Yifan Zhang, Yanning Zhang:

Learning to Generalize Without Bias for Open-Vocabulary Action Recognition. 12800-12810 - Chunlin Wen, Yu Zhang, Jie Fan, Hongyuan Zhu, Xiu-Shen Wei, Yijun Wang, Zhiqiang Kou, Shuzhou Sun:

Object-Level Correlation for Few-Shot Segmentation. 23689-23699 - Yichen Lu, Siwei Nie, Minlong Lu, Xudong Yang, Xiaobo Zhang, Peng Zhang:

Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection. 19248-19257 - Yesheng Zhang, Xu Zhao:

Semantic-Guided Camera Ray Regression for Visual Localization. 25639-25648 - Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Shahbaz Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan:

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks. 7132-7142 - Yichen Shen, Yijin Li, Shuo Chen, Guanglin Li, Zhaoyang Huang, Hujun Bao, Zhaopeng Cui, Guofeng Zhang:

BlinkTrack: Feature Tracking Over 80 FPS via Events and Images. 9298-9308 - Mengyu Yang, Yiming Chen, Haozheng Pei, Siddhant Agarwal, Arun Balajee Vasudevan, James Hays:

Clink! Chop! Thud! - Learning Object Sounds From Real-World Interactions. 14549-14558 - Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang, Danda Pani Paudel, Luc Van Gool, Xuming Hu:

Reducing Unimodal Bias in Multi-Modal Semantic Segmentation With Multi-Scale Functional Entropy Regularization. 21166-21176 - Chengjun Yu, Wei Zhai, Yuhang Yang, Yang Cao, Zheng-Jun Zha:

HERO: Human Reaction Generation from Videos. 10262-10274 - Yuxiao Wang, Yu Lei, Zhenao Wei, Weiying Xue, Xinyu Jiang, Nan Zhuang, Qi Liu:

Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss. 23636-23645 - Zhuoling Li, Haoxuan Qu, Jason Kuen, Jiuxiang Gu, Qiuhong Ke, Jun Liu, Hossein Rahmani:

DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models. 17035-17045 - Maksim Golyadkin, Valeria Rubanova, Aleksandr Utkov, Dmitry Nikolotov, Ilya Makarov:

MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition. 24488-24496 - Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao:

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation. 27231-27241 - Yisu Zhang, Chenjie Cao, Chaohui Yu, Jianke Zhu:

LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion. 14569-14579 - Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Xianpeng Lang:

HiNeuS: High-Fidelity Neural Surface Mitigating Low-Texture and Reflective Ambiguity. 25746-25755 - Xinyu Chen, Haotian Zhai, Can Zhang, Xiupeng Shi, Ruirui Li:

Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models. 2281-2291 - Mai Su, Zhongtao Wang, Huishan Au, Yilong Li, Xizhe Cao, Chengwei Pan, Yisong Chen, Guoping Wang:

HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes. 28839-28848
