


27th MMSP 2025: Beijing, China
- IEEE International Workshop on Multimedia Signal Processing, MMSP 2025, Beijing, China, September 21-23, 2025. IEEE 2025, ISBN 979-8-3315-9241-7

- Haolin Yu, Yanxiong Li:

Infant Cry Detection In Noisy Environment Using Blueprint Separable Convolutions and Time-Frequency Recurrent Neural Network. 1-6 - Yuanhao Gong, Yongfei Guo:

Anderson Accelerated Residual Solver for Total Variation Models in Image Processing. 7-11 - Shuai Chen, Fanman Meng, Xiwei Zhang, Haoran Wei, Chenhao Wu, Qingbo Wu, Hongliang Li:

DFR: A Decompose-Fuse-Reconstruct Framework for Multi-Modal Few-Shot Segmentation. 12-17 - Menghui Zhang, Jing Zhang, Lin Chen, Li Zhuo:

Prototype Embedding Optimization for Human-Object Interaction Detection in Livestreaming : PeO-HOI. 18-23 - Jiajie Guo, Qingpeng Zhu, Jin Zeng, Xiaolong Wu, Changyong He, Weida Wang:

SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion. 24-29 - Olivier Lézoray, Anass Nouri:

Learning 3D mesh saliency from spiral patch features. 30-35 - Takahiro Shindo, Yui Tatsumi, Taiju Watanabe, Hiroshi Watanabe:

Guided Diffusion for the Extension of Machine Vision to Human Visual Perception. 36-41 - Yongfei Guo, Xudong Niu, Chizhi Zhang, Yuanhao Gong:

FPGA Accelerated One-Sided Box Filter for Edge-Preserving Image Processing. 42-47 - Yajun Qiu, Shuyuan Zhu, Lantao Yu, Bing Zeng:

Blind Image Super-Resolution with Local and Global Dual-Guidance. 48-53 - Jiarui Wang, Huiyu Duan, Yuke Xing, Yiling Xu, Guangtao Zhai, Xiongkuo Min:

CompBench: Benchmarking and Comparing Image Generation with Large Multimodal Models. 54-59 - Manu Gond, Mohammadreza Shamshirgarha, Emin Zerman, Sebastian Knorr, Mårten Sjöström:

Real-Time View Synthesis with Multiplane Image Network using Multimodal Supervision. 60-65 - Mahmoud Z. A. Wahba, Sara Baldoni, Federica Battisti:

Sphere-GAN: a GAN-based approach for saliency estimation in 360° videos. 66-71 - Bing Han, Yuhua Huang, Pan Gao:

HyperDiff: Hypergraph Guided Diffusion Model for 3D Human Pose Estimation. 72-77 - Xinhui Yu, Sophie Liu, Chunhua Wu:

Multimodal Federated Learning for Personalized Clothing Recommendation. 78-83 - Peng Liu, Zitai Jiang:

CG-SMFNet: Consensus-Guided Selective Multimodal Fusion for Weakly Supervised Temporal Action Localization. 84-89 - Yui Tatsumi, Ziyue Zeng, Hiroshi Watanabe:

Explicit Residual-Based Scalable Image Coding for Humans and Machines. 90-95 - Emin Zerman, Soheib Takhtardeshir, Anthony Trioux, Jianlong Qin, Wenjie Wu, Roger Olsson, Mårten Sjöström:

Subjective Visual Quality Assessment of Compressed Light Field Images: Learning-based vs. Conventional Methods. 96-101 - Khélian Larvet, Jean-Pierre Pedeboy, William Puech:

Secure protection of 3D content through reversible geometric deformation. 102-107 - Shi Pan, Hongshuai Li, Zhengxian Yang, Le Wang, Cheng Su, Liqian Ma, Hua Du, Borong Lin, Tao Yu:

Towards Volumetric Video: a Technical Overview of Immersive Media. 108-113 - Weixiang Zhao, Fei Shang, Jin Li, Jingyang Wen, Xiangui Kang, Z. Jane Wang:

Secure INN-based Steganography via Model Smoothing and Adversarial Attacks. 114-119 - Ali Hassan, Tingting Zhang, Karen Egiazarian, Mårten Sjöström:

EPINET-Lite: Rethinking Mixed Convolutions for Efficient Light Field Disparity Estimation Network. 120-125 - Yue Gao, Xiao Xu, Eckehard G. Steinbach, Daniel E. Lucani, Qi Zhang:

Touch-Augmented Gaussian Splatting for Enhanced 3D Scene Reconstruction. 126-131 - Zhuobin Yuan, Rui Dai, Rayan Alghamdi:

Real-Time Distortion Detection for PTZ Camera Systems. 132-137 - Yongyi Zang, Zheqi Dai, Mark D. Plumbley, Qiuqiang Kong:

Music Source Restoration. 138-143 - Qingtong Xu, Chao Zhang, Ao Li, Xiaoning Liu, Ce Zhu:

IdCo: Joint Identification and Contrastive Learning for Masked Face Recognition. 144-149 - Fei He, Houji Du, Nipon Theera-Umpon, Yipeng Liu, Ce Zhu:

Flexibly Constrained Tucker Decomposition for High-Order Spectral Analysis. 150-155 - Chenghao Qi, Heqian Qiu, Zhaofeng Shi, Lanxiao Wang, Hanwen Zhang, Xinyu Chen, Hongliang Li:

D3Net: Dual-Path Decoupling-Distillation for Adaptive Fusion in Continual Egocentric Learning. 156-161 - Nassim Ali Ousalah, Peyman Rostami, Anis Kacem, Enjie Ghorbel, Emmanuel Koumandakis, Djamila Aouada:

FPG-NAS: FLOPs-Aware Gated Differentiable Neural Architecture Search for Efficient 6DoF Pose Estimation. 162-167 - Konstantinos Drossos, Mikko Heikkinen, Paschalis Tsiaflakis:

Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns. 168-173 - Zhenchao Wu, Hongteng Xu, Xu Chen:

Meta Learning for Adaptive Disentangled User Preference Integration Toward Multimodal Recommendation. 174-179 - Lei Xiong, Zihao Wang, Boyuan Zhang, Feiyu Chen, Shuyuan Zhu, Bing Zeng:

Task-Aware Optimized Color Image Demosaicing. 180-185 - Hanwen Zhang, Heqian Qiu, Lanxiao Wang, Chenghao Qi, Ruisong Dai, Hongliang Li:

Efficient Polyp Detection via Wavelet-Driven Boundary Enhancement and Temporal Consistency. 186-191 - Jungwoo Kim, Jong-Seok Lee:

Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning. 192-197 - Yen-Ku Yeh, Chun-Hao Yang, Kun-Tai Wu, Yan-Tsung Peng, Chun-Rong Huang, Jun-Cheng Chen:

Restore Anything Anywhere: Targeted Image Restoration with Object Segmentation and Text Guidance. 198-203 - Ruisong Dai, Hanwen Zhang, Xinyu Chen, Chenghao Qi, Heqian Qiu, Hongliang Li:

OrthCal: Synergizing Orthogonal Contrastive Learning and Prototype Calibration for Few-Shot Class-Incremental Learning. 204-209 - Xinyu Chen, Heqian Qiu, Chenghao Qi, Ruisong Dai, Hongliang Li:

DBAB: A Dual-Branch Adaptive Balance Framework with Optimized Plasticity Branch for Class-Incremental Learning. 210-215 - Taiga Hayami, Kakeru Koizumi, Hiroshi Watanabe:

Structure-Preserving Patch Decoding for Efficient Neural Video Representation. 216-221 - Diana-Alexandra Sas, Florin Oniga:

S-LAM3D: Segmentation-Guided Monocular 3D Object Detection via Feature Space Fusion. 222-227 - Owen Dossett, Ke Lyu, Maohong Liao, Han Li, Xianglong Feng:

An Exploration of User Biometric Identification In XR Applications Based On User Head Movement. 224-229 - Xun Wang, Xutao Xue, Xubing Kang, Siyuan Li, Shayer Shabab Utsho, Kun Li, Mengqi Ji:

PromptGS: Visual Prompting for Tiny Object Reconstruction in 3DGS Optimization. 228-233 - Can Cui, Paul Magron, Mostafa Sadeghi, Emmanuel Vincent:

Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR. 240-245 - Taous Iatariene, Alexandre Guérin, Romain Serizel:

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings. 246-251 - Avinash Kumar Sharma, Tushar Shinde:

Efficient Generative Defect Synthesis for Industrial Anomaly Detection on MVTec AD. 252-257 - Lingyu Shi, Jiaqi Zou, Songlin Sun, Geert Van der Auwera, Zhu Li:

Low Latency Immersive Visual Communication with Scalable Gaussian Splatting Coding. 258-263 - Antoine R. Souchaud, Pedro Lladó, Annika Neidhardt, Zoran Cvetkovic, Enzo De Sena:

White-box Differentiable Model of Perceived Localisation. 264-268 - Duc V. Nguyen:

Tackling Re-buffering in Adaptive Video Streaming over Dynamic Networks: A Generative AI Approach. 269-273 - Zhenchao Wu, Hongteng Xu, Xu Chen:

Meta Learning-based Multimodal Recommendation with Adaptive User Modality-Aware Preference Integration. 274-279 - Luoxu Jin, Hiroshi Watanabe:

Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation. 280-285 - Yuxiang Liu, Shanxin Zhang, Zhenyong Li, Chuanfen Feng, Hui Ji, Jiande Sun:

MGFT: Multi-Geometric Fusion Transformer for Robust Point Cloud Registration. 286-291 - Zhenyong Li, Shanxin Zhang, Yuxiang Liu, Chuanfen Feng, Hui Ji, Jiande Sun:

HGS_OFAT: High-fidelity Gaussian SLAM based on Optical Flow Assisted Tracking. 292-297 - Honglei Zhang, A. Burakhan Koyuncu, Jukka I. Ahonen, Nannan Zou, Francesco Cricri:

Learned Image Codec with Progressive Multi-Scale Probability Model for Streaming in Unreliable Communication Channels. 298-303 - Zichen Zhu, Tian Guo, Sheng Wei:

Carbon-Efficient Internet Video Streaming. 304-309 - Nasser-Eddine Monir, Paul Magron, Romain Serizel:

Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement. 310-315 - Jin Zhou, Mufeng Zhu, Yao Liu, Songqing Chen:

NeRFCompressor: Enhancing Dynamic Scene Representation for Efficient 6-DoF Object Transportation. 316-321 - Sayush Maharjan, Raghunath Sai Puttagunta, Zach Button, Zhu Li:

Cross-Modal Thermal Image Compression via RGB Side Information. 322-327 - Jacob Chakareski, Lingdong Wang, Nicholas Mastronarde:

Reinforcement Learning-Based Dynamic Resource Allocation for Aerial 360° Video VR Streaming. 328-333 - Abderrezzaq Sendjasni, Mohamed-Chaker Larabi:

Latent Space Stability vs. Perceptual Sensitivity: A Study of Visual Encoders under Distortion. 334-339 - Zhehao Shen, Yiwen Cai, Yuanji Lu, Yu Hong, Yize Wu, Meihan Zheng, Yingliang Zhang, Lan Xu:

Dynamic Gaussian Streams for Volumetric Video via Codebook-Based Quantization. 340-345 - Tao Lu, Gaochang Wu:

Lightweight Steel Surface Defect Detection via Knowledge Distillation. 346-351 - Wenxi Li, Chenyang Lyu, Wei Ji, Liting Zhou, Cathal Gurrin, Yuchen Guo:

Rethinking Document Layout Analysis through Text Clustering via Multi-Modal Graph Convolution Networks. 352-357 - Zhuojiang Cai, Yiheng Zhang, Meitong Guo, Mingdao Wang, Yuwang Wang:

LSS3D: Learnable Spatial Shifting for Consistent and High-Quality 3D Generation from Single-Image. 358-363
