WASPAA 2025: Tahoe City, CA, USA
- IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2025, Tahoe City, CA, USA, October 12-15, 2025. IEEE 2025, ISBN 979-8-3315-3745-6

- Pascal Baumann, Seraina Glaus, Ludovic Amruthalingam, Fabian Gröger, Ruksana Giurda, Simone Lionetti: Listening Intention Estimation for Hearables with Natural Behavior Cues. 1-5
- Robin Scheibler, John R. Hershey, Arnaud Doucet, Henry Li: Source Separation by Flow Matching. 1-5
- Shigeki Karita, Yuma Koizumi, Heiga Zen, Haruko Ishikawa, Robin Scheibler, Michiel Bacchiani: Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration. 1-5
- Yunyun Wang, Jiaqi Su, Adam Finkelstein, Rithesh Kumar, Ke Chen, Zeyu Jin: DiTVC: One-Shot Voice Conversion via Diffusion Transformer with Environment and Speaking Rate Cloning. 1-5
- Dimitrios Bralios, Paris Smaragdis, Jonah Casebeer: Learning to Upsample and Upmix Audio in the Latent Domain. 1-5
- Yulun Wu, Zhongweiyang Xu, Jianchong Chen, Zhong-Qiu Wang, Romit Roy Choudhury: Unsupervised Multi-channel Speech Dereverberation via Diffusion. 1-5
- Mirco Pezzoli, Federico Miotello, Shoichi Koyama, Fabio Antonacci: Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction. 1-4
- Liang Xu, Longfei Yan, W. Bastiaan Kleijn: Robust One-step Speech Enhancement via Consistency Distillation. 1-5
- Haocheng Liu, Diego Di Carlo, Aditya Arie Nugraha, Kazuyoshi Yoshii, Gaël Richard, Mathieu Fontaine: Physically Informed Spatial Regularization for Sound Event Localization and Detection. 1-5
- Hongzhi Shu, Xinglin Li, Hongyu Jiang, Minghao Fu, Xinyu Li: Benchmarking Sub-Genre Classification For Mainstage Dance Music. 1-5
- Yihui Fu, Renzheng Shi, Marvin Sach, Wouter Tirry, Tim Fingscheidt: EffDiffSE: Efficient Diffusion-Based Frequency-Domain Speech Enhancement with Hybrid Discriminative and Generative DNNs. 1-5
- Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen: Multi-Utterance Speech Separation and Association Trained on Short Segments. 1-5
- Yuexuan Kong, Vincent Lostanlen, Romain Hennequin, Mathieu Lagrange, Gabriel Meseguer-Brocal: Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval. 1-5
- Enric Gusó, Joanna Luberadzka, Umut Sayin, Xavier Serra: MB-RIRs: a Synthetic Room Impulse Response Dataset with Frequency-Dependent Absorption Coefficients. 1-5
- Satvik Dixit, Sungjoon Park, Chris Donahue, Laurie M. Heller: Learning Perceptually Relevant Temporal Envelope Morphing. 1-5
- Qiquan Zhang, Moran Chen, Zeyang Song, Hexin Liu, Xiangyu Zhang, Haizhou Li: Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study. 1-5
- Mary Pilataki, Matthias Mauch, Simon Dixon: Self-Supervised Representation Learning with a JEPA Framework for Multi-instrument Music Transcription. 1-5
- Luca Becker, Rainer Martin: Contrastive Representation Learning for Privacy-Preserving Fine-Tuning of Audio-Visual Speech Recognition. 1-5
- Tong Xiao, Reinhild Roden, Matthias Blau, Simon Doclo: Soft-Constrained Spatially Selective Active Noise Control for Open-Fitting Hearables. 1-5
- Wiebke Middelberg, Jung-Suk Lee, Saeed Bagheri Sereshki, Ali Aroudi, Vladimir Tourbabin, Daniel D. E. Wong: Microphone Occlusion Mitigation for Own-Voice Enhancement in Head-Worn Microphone Arrays Using Switching-Adaptive Beamforming. 1-5
- Manu Harju, Annamaria Mesaros: Sound event detection with audio-text models and heterogeneous temporal annotations. 1-5
- Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto: SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation. 1-5
- Jiarui Hai, Mounya Elhilali: SynSonic: Augmenting Sound Event Detection through Text-to-Audio Diffusion ControlNet and Effective Sample Filtering. 1-5
- Seungheon Doh, Junghyun Koo, Marco A. Martínez Ramírez, Wei-Hsiang Liao, Juhan Nam, Yuki Mitsufuji: Can Large Language Models Predict Audio Effects Parameters from Natural Language? 1-5
- Zhi Zhong, Akira Takahashi, Shuyang Cui, Keisuke Toyama, Shusuke Takahashi, Yuki Mitsufuji: SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet. 1-5
- Noah Jaffe, John Ashley Burgoyne: Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception. 1-5
- Orchisama Das, Gloria Dal Santo, Sebastian J. Schlecht, Zoran Cvetkovic: Neural-network based interpolation of late reverberation in coupled spaces using the common slopes model. 1-5
- Ryan Niu, Shoichi Koyama, Tomohiko Nakamura: Head-Related Transfer Function Individualization Using Anthropometric Features and Spatially Independent Latent Representation. 1-5
- David Rowe, Jean-Marc Valin: RADE: A Neural Codec for Transmitting Speech over HF Radio Channels. 1-5
- Zachary Novack, Zach Evans, Zack Zukowski, Josiah Taylor, CJ Carr, Julian Parker, Adnan Al-Sinan, Gian Marco Iodice, Julian J. McAuley, Taylor Berg-Kirkpatrick, Jordi Pons: Fast Text-to-Audio Generation with Adversarial Post-Training. 1-5
- Sungkyun Chang, Simon Dixon, Emmanouil Benetos: RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection. 1-5
- Julia Wilkins, Sivan Ding, Magdalena Fuentes, Juan Pablo Bello: Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning. 1-5
- Adrian S. Roman, Irán R. Román, Juan Pablo Bello: Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach. 1-5
- Ryo Matsuda, Juliano G. C. Ribeiro, Hitoshi Akiyama, Jorge Treviño: Kernel ridge regression based sound field estimation using a rigid spherical microphone array. 1-5
- Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira: Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures. 1-5
- Johannes W. de Vries, Timm-Jonas Bäumer, Stephan Töpken, Richard Heusdens, Steven van de Par, Richard C. Hendriks: Beamforming with Interaural Time-To-Level Difference Conversion for Hearing Loss Compensation. 1-5
- Xilin Jiang, Junkai Wu, Vishal Choudhari, Nima Mesgarani: Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation. 1-5
- Chin-Yun Yu, Marco A. Martínez Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas: Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior. 1-5
- Silvia Arellano, Chunghsin Yeh, Gautam Bhattacharya, Daniel Arteaga: Room Impulse Response Generation Conditioned on Acoustic Parameters. 1-5
- Riccardo Passoni, Francesca Ronchini, Luca Comanducci, Romain Serizel, Fabio Antonacci: Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models. 1-5
- Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets: Device-Centric Room Impulse Response Augmentation Evaluated on Room Geometry Inference. 1-5
- Yoshiki Masuyama, François G. Germain, Gordon Wichern, Christopher Ick, Jonathan Le Roux: Physics-Informed Direction-Aware Neural Acoustic Fields. 1-5
- Juan Sebastián Gómez Cañón, Camille Noufi, Jonathan Berger, Karen J. Parker, Daniel L. Bowling: The Test of Auditory-Vocal Affect (TAVA) dataset. 1-5
- Eric Grinstein, Ashutosh Pandey, Cole Li, Shanmukha Srinivas, Juan Azcarreta, Jacob Donley, Sanha Lee, Ali Aroudi, Çagdas Bilen: Controlling the Parameterized Multi-channel Wiener Filter using a tiny neural network. 1-5
- Stephen Voran, Jaden Pieper: Frequency-Domain Signal-to-Noise Ratios Illuminate the Effects of the Spectral Consistency Constraint and Griffin-Lim Algorithms. 1-5
- Xinmeng Luan, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti: Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography. 1-5
- Shihori Kozuka, Shoichi Koyama, Hiroaki Itou, Noriyoshi Kamado: Source and Sensor Placement for Sound Field Control Based on Mean Square Error with Prior Spatial Covariance. 1-5
- Kishan Gupta, Srikanth Korse, Andreas Brendel, Nicola Pia, Guillaume Fuchs: UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension. 1-5
- Krishna Subramani, Paris Smaragdis, Takuya Higuchi, Mehrez Souden: Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations. 1-5
- Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-Jin Shim, Soham Deshmukh, Shinji Watanabe: OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder. 1-5
- Jan Büthe, Jean-Marc Valin: A lightweight and robust method for blind wideband-to-fullband extension of speech. 1-5
- Manjunath Mulimani, Annamaria Mesaros: Online incremental learning for audio classification using a pretrained audio model. 1-5
- Joanna Luberadzka, Enric Gusó, Umut Sayin: Conditioned Wave-U-Net for Acoustic Matching of Speech in Shared XR Environments. 1-5
- Clémentine Berger, Paraskevas Stamatiadis, Roland Badeau, Slim Essid: IS3: Generic Impulsive-Stationary Sound Separation in Acoustic Scenes using Deep Filtering. 1-5
- Pedro Lladó, Annika Neidhardt, Antoine R. Souchaud, Zoran Cvetkovic, Enzo De Sena: Perceptually-driven panning for an extended listening area. 1-5
- Holger Severin Bovbjerg, Jan Østergaard, Jesper Jensen, Shinji Watanabe, Zheng-Hua Tan: Learning Robust Spatial Representations from Binaural Audio through Feature Distillation. 1-5
- Philipp Grundhuber, Mhd Modar Halimeh, Martin Strauss, Emanuël A. P. Habets: Robust Speech Activity Detection in the Presence of Singing Voice. 1-5
- Kohei Saijo, Yoshiaki Bando: Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation. 1-5
- Xiajie Zhou, Candy Olivia Mawalim, Masashi Unoki: Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction. 1-5
- Yixiao Zhang, Haonan Chen, Ju-Chiang Wang, Jitong Chen: Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis. 1-5
- Tsun-An Hsieh, Minje Kim: TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction. 1-5
- Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss: Modulation Discovery with Differentiable Digital Signal Processing. 1-5
- Haoshuai Zhou, Boxuan Cao, Changgeng Mo, Linkai Li, Shan Xiang Wang: Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People. 1-5
- Filippo Maria Fazi, Francesco Veronesi, Marcos F. Simón Gálvez, Andreas Franck: Theoretical Analysis of Recursive Implementations of Multi-Channel Cross-Talk Cancellation Systems. 1-5
- Paul Armin Bereuter, Benjamin Stahl, Mark D. Plumbley, Alois Sontacchi: Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models. 1-5
- Shuichiro Nishigori, Koichi Saito, Naoki Murata, Masato Hirano, Shusuke Takahashi, Yuki Mitsufuji: Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement. 1-5
- Francesco Paissan, Gordon Wichern, Yoshiki Masuyama, Ryo Aihara, François G. Germain, Kohei Saijo, Jonathan Le Roux: FasTUSS: Faster Task-Aware Unified Source Separation. 1-5
- Junyi Fan, Donald Williamson: JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs. 1-5
- Ryo Sato, Chiho Haruta, Nobuhiko Hiruma, Keisuke Imoto: Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries. 1-5
- Chuan Wen, Sarah Verhulst: Low-Complexity Individualized Noise Reduction for Real-Time Processing. 1-5
- Gal Itzhak, Simon Doclo, Israel Cohen: Optimal Region-of-Interest Beamforming for Audio Conferencing with Dual Perpendicular Sparse Circular Sectors. 1-5
- Anastasia Kuznetsova, Inseon Jang, Wootaek Lim, Minje Kim: Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine. 1-5
- Yun-Ning Hung, Igor Pereira, Filip Korzeniowski: Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation. 1-5
- Jianyuan Feng, Guangzheng Li, Yangfei Xu: Hybrid-Sep: Language-queried audio source separation via pre-trained Model Fusion and Adversarial Consistent Training. 1-5
- Gene-Ping Yang, Sebastian Braun: Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention. 1-5
- Venkatakrishnan Vaidyanathapuram Krishnan, Nathaniel Condit-Schultz: The Perception of Phase Intercept Distortion and its Application in Data Augmentation. 1-5
- Wataru Nakata, Yuma Koizumi, Shigeki Karita, Robin Scheibler, Haruko Ishikawa, Adriana Guevara-Rukoz, Heiga Zen, Michiel Bacchiani: ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability. 1-5
- Aleksandr Lukoianov, Anssi Klapuri: Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music. 1-5
- Tanmay Khandelwal, Magdalena Fuentes: Post-Training Quantization for Audio Diffusion Transformers. 1-5
- Paul Primus, Florian Schmid, Gerhard Widmer: TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining. 1-5
- Takuma Okamoto: SFC-L1: Sound Field Control With Least Absolute Deviation Regression. 1-4
- You Zhang, Andrew Francl, Ruohan Gao, Paul Calamia, Zhiyao Duan, Ishwarya Ananthabhotla: Towards Perception-Informed Latent HRTF Representations. 1-5
- Yuta Kusaka, Akira Maezawa: Learn from Virtual Guitar: A Comparative Analysis of Automatic Guitar Transcription using Synthetic and Real Audio. 1-5
- Riccardo Miccini, Minje Kim, Clément Laroche, Luca Pezzarossa, Paris Smaragdis: Adaptive Slimming for Scalable and Efficient Speech Enhancement. 1-5
- Yassine El Kheir, Arnab Das, Enes Erdem Erdogan, Fabian Ritter Gutierrez, Tim Polzehl, Sebastian Möller: Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection. 1-5
- Jiarui Hai, Helin Wang, Weizhe Guo, Mounya Elhilali: FlexSED: Towards Open-Vocabulary Sound Event Detection. 1-5
- Huajian Fang, Buye Xu, Jacob Donley, Ashutosh Pandey, DeLiang Wang, Daniel Wong: A Unified Framework for Evaluating DNN-Based Feedforward, Feedback, and Hybrid Active Noise Cancellation. 1-5
- Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann: Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance. 1-5
- Srikanth Korse, Andreas Walther, Emanuël A. P. Habets: Stereo Reproduction in the Presence of Sample Rate Offsets. 1-5
- Cameron Churchwell, Minje Kim, Paris Smaragdis: Combolutional Neural Networks. 1-5
- Giovanni Bologni, Richard Heusdens, Richard C. Hendriks: Cyclic Multichannel Wiener Filter for Acoustic Beamforming. 1-5
- Ricardo Falcón Pérez, Ruohan Gao, Gregor Mueckl, Sebastià Vicenc Amengual Garí, Ishwarya Ananthabhotla: Scene-wide Acoustic Parameter Estimation. 1-5
- Ahmad Aloradi, Ünal Ege Gaznepoglu, Emanuël A. P. Habets, Daniel Tenbrinck: VoxATtack: A Multimodal Attack on Voice Anonymization Systems. 1-5
- Klaus Brümann, Kouei Yamaoka, Nobutaka Ono, Simon Doclo: Incremental Averaging Method to Improve Graph-Based Time-Difference-of-Arrival Estimation. 1-5
- Harnick Khera, Johan Pauwels, Alan Archer-Boyd, Mark B. Sandler: Beyond Architecture: The Critical Impact of Inference Overlap on Music Source Separation Benchmarks. 1-5