


MMM 2025, Nara, Japan - Part III
- Ichiro Ide, Ioannis Kompatsiaris, Changsheng Xu, Keiji Yanai, Wei-Ta Chu, Naoko Nitta, Michael Riegler, Toshihiko Yamasaki:
MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part III. Lecture Notes in Computer Science 15522, Springer 2025, ISBN 978-981-96-2063-0
Regular Papers
- Hanxu Ai, Xiaomei Tao, Xingbing Li, Yanling Gan:
Modeling High-Order Relationships Between Human and Video for Emotion Recognition in Video Learning. 3-16
- Shyi-Chyi Cheng, Yen-Lin Chen, Shih-Yu Li:
MPPQNet: A Moment-Preserving Product Quantization Neural Network for Progressive 3D Point Cloud Transmission. 17-30
- Enhui Yang, Zhibin Zhang:
MS-SAM: Multi-scale SAM Based on Dynamic Weighted Agent Attention. 31-44
- Bin Wang, Zekun Chen, Lei Zhang, Shili Liang, Sijia Guo, Xinyu Kang, Huajing Li:
MSA-Former: Multi-scale Adaptive Transformer for Image Snow Removal. 45-58
- Dongyu Liu, Yuan Zhu, Rui Liu, Zhecong Xing, Weiyang Geng, Yanqiang Wang:
MSD-YOLO: An Efficient Algorithm for Small Target Detection. 59-72
- Yijie Zhu, Mingyong Li:
Multi-modal Information Multi-angle Mining for Multimedia Recommendation. 73-86
- Feifei Xu, Fumiaoyue Jia, Wang Zhou:
Multimodal Prompt Learning for Audio Visual Scene-Aware Dialog. 87-100
- Tin Yui Yip, Chuck-jee Chau:
Music2MIDI: Pop Music to MIDI Piano Cover Generation. 101-113
- Wanchang Jiang, Yuxin Jiang:
Noise-Robust Separating Multi-source Aliased Vibration Signal Based on Transformer Demucs. 114-127
- Yanru Xiang, Yi Li:
One-Shot Generative Domain Adaptation by Constructing Self-amplifying Datasets. 128-141
- Yuta Goto, Satoshi Yamazaki, Takashi Shibata, Jianquan Liu:
Open-Vocabulary Scene Graph Generation via Synonym-Based Predicate Descriptor. 142-156
- Aoto Sugahara, Soma Kishimoto, Yuji Adachi, Kiyoto Tai, Ryoichi Takashima, Tetsuya Takiguchi:
Operatic Singing Voice Synthesis From Inexperienced Voice Considering Tempo and Vowel Change. 157-170
- Cheng-Yuan Wu, Yuan-Chun Sun, Cheng-Tse Lee, Cheng-Hsin Hsu:
Optimally Planning Drone Trajectories to Capture 3D Gaussian Splatting Objects. 171-185
- Jizhe Yu, Yu Liu, Xiaoshuai Wu, Kaiping Xu, Jiangquan Li:
PA2Net: Pyramid Attention Aggregation Network for Saliency Detection. 186-200
- Yufei Wang, Junfeng Yao, Zefeng Wang:
PianoPal: A Robotic Multimedia System for Interactive Piano Instruction Based on Q-Learning and Real-Time Feedback. 201-214
- Yulan Su, Sisi Zhang, Zechao Lin, Xingbin Wang, Lutan Zhao, Dan Meng, Rui Hou:
Poseidon: A NAS-Based Ensemble Defense Method Against Multiple Perturbations. 215-228
- Zhengzhuo Zhang, Liansheng Zhuang:
Progressive Neural Architecture Generation with Weaker Predictors. 229-242
- Pengzhou Cai, Lu Jiang, Yanxin Li, Xiaojuan Liu, Libin Lan:
Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution. 243-256
- Yingqian Zhu, Guanyu Gao:
QRALadder: QoE and Resource Consumption-Aware Encoding Ladder Optimization for Live Video Streaming. 257-269
- Yuzhang Shang, Gaowen Liu, Ramana Kompella, Yan Yan:
Quantized-ViT Efficient Training via Fisher Matrix Regularization. 270-284
- Mu-Jan Shih, Yi-Yu Hsu:
Real-Time Action Detection in Volleyball Matches Using DETR Architecture. 285-296
- Xuan Shao, Leming Huang, Xinghua Liu:
Revisit Data Association in Semantic SLAM Systems for Autonomous Parking. 297-310
- Yulan Su, Sisi Zhang, Yan Wang, Xingbin Wang, Lutan Zhao, Dan Meng, Rui Hou:
RobSparse: Automatic Search for GPU-Friendly Robust and Sparse Vision Transformers. 311-325
- Yongqian Li, Yong Luo, Xin Zhou:
Robust Active Speaker Detection in Challenging Environments Using GNN-Fused Multi-modal Cues and Body Language. 326-339
- Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, Jianlong Fu:
RoLD: Robot Latent Diffusion for Multi-task Policy Modeling. 340-353
- Wolfgang Hürst, Leo Zeches:
Rotation Methods for 360-Degree Videos in Virtual Reality - A Comparative Study. 354-366
- Yongqiang Kong, Yunhong Wang, Annan Li:
Saliency Based Data Augmentation for Few-Shot Video Action Recognition. 367-380
- Xiwen Wang, Jizhe Zhou, Xuekang Zhu, Cheng Li, Mao Li:
Saliency Guided Optimization of Diffusion Latents. 381-394
- Dajiang Yang, Wei Wu, Yuxing Lee:
SCANet: Semantic Coherence Attention Network for Clothing Change Person Re-identification. 395-409
- Min Yin, Liang Xie, Haoran Liang, Xing Zhao, Ben Chen, Ronghua Liang:
SCLSTE: Semi-supervised Contrastive Learning-Guided Scene Text Editing. 410-424
- Hujiang Huang, Yu Xie, Jun Gao, Chuanliu Fan, Ziqiang Cao:
Select and Order: Optimizing Few-Shot Image Classification with In-Context Learning. 425-438
- Shuai Shi, Na Qi, Yezi Li, Qing Zhu:
Self-supervised Reference-Based Image Super-Resolution with Conditional Diffusion Model. 439-452
