


10th SSW 2019: Vienna, Austria
- Michael Pucher (ed.): 10th ISCA Speech Synthesis Workshop, SSW 2019, Vienna, Austria, September 20-22, 2019. ISCA 2019
Keynote 1
- Aäron van den Oord: Deep learning for speech synthesis.
Oral 1: Neural vocoder
- Xin Wang, Junichi Yamagishi: Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. 1-6
- Prachi Govalkar, Johannes Fischer, Frank Zalkow, Christian Dittmar: A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction. 7-12
- Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda: Deep neural network based real-time speech vocoder with periodic and aperiodic inputs. 13-18
- Qiao Tian, Xucheng Wan, Shan Liu: Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder. 19-23
Oral 2: Adaptation
- Qiong Hu, Erik Marchi, David Winarsky, Yannis Stylianou, Devang Naik, Sachin Kajarekar: Neural Text-to-Speech Adaptation from Low Quality Public Recordings. 24-28
- Bastian Schnell, Philip N. Garner: Neural VTLN for Speaker Adaptation in TTS. 29-34
- David Álvarez, Santiago Pascual, Antonio Bonafonte: Problem-Agnostic Speech Embeddings for Multi-Speaker Text-to-Speech with SampleRNN. 35-39
Poster 1: Voice conversion and multi-speaker TTS
- Hiroki Kanagawa, Yusuke Ijima: Multi-Speaker Modeling for DNN-based Speech Synthesis Incorporating Generative Adversarial Networks. 40-44
- Ivan Himawan, Sandesh Aryal, Iris Ouyang, Shukhan Ng, Pierre Lanchantin: Speaker Adaptation of Acoustic Model using a Few Utterances in DNN-based Speech Synthesis Systems. 45-50
- Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari: DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis. 51-56
- Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda: Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion. 57-62
- Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda: Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder. 63-68
- Hitoshi Suda, Daisuke Saito, Nobuaki Minematsu: Voice Conversion without Explicit Separation of Source and Filter Components Based on Non-negative Matrix Factorization. 69-74
- Gaku Kotani, Daisuke Saito: Voice conversion based on full-covariance mixture density networks for time-variant linear transformations. 75-80
- Tobias Gburrek, Thomas Glarner, Janek Ebbers, Reinhold Haeb-Umbach, Petra Wagner: Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion. 81-86
- Maitreya Patel, Mihir Parmar, Savan Doshi, Nirmesh Shah, Hemant A. Patil: Novel Inception-GAN for Whispered-to-Normal Speech Conversion. 87-92
- Riku Arakawa, Shinnosuke Takamichi, Hiroshi Saruwatari: Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device. 93-98
Keynote 2
- W. Tecumseh Fitch, Bart de Boer: Synthesizing animal vocalizations and modelling animal speech.
Oral 3: Evaluation and performance
- Rob Clark, Hanna Silén, Tom Kenter, Ralph Leith: Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs. 99-104
- Petra Wagner, Jonas Beskow, Simon Betz, Jens Edlund, Joakim Gustafson, Gustav Eje Henter, Sébastien Le Maguer, Zofia Malisz, Éva Székely, Christina Tånnander, Jana Voße: Speech Synthesis Evaluation - State-of-the-Art Assessment and Suggestion for a Novel Research Program. 105-110
- Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi: Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences. 111-116
- Matthew P. Aylett, David A. Braude, Christopher J. Pidcock, Blaise Potard: Voice Puppetry: Exploring Dramatic Performance to Develop Speech Synthesis. 117-120
Oral 4: Speech science
- Avashna Govender, Cassia Valentini-Botinhao, Simon King: Measuring the contribution to cognitive load of each predicted vocoder speech parameter in DNN-based speech synthesis. 121-126
- Lorenz Gutscher, Michael Pucher, Carina Lozo, Marisa Hoeschele, Daniel C. Mann: Statistical parametric synthesis of budgerigar songs. 127-131
- Marc Freixes, Marc Arnela, Francesc Alías, Joan Claudi Socoró: GlottDNN-based spectral tilt analysis of tense voice emotional styles for the expressive 3D numerical synthesis of vowel [a]. 132-136
Poster 2: Applications and practical issues
- Christina Tånnander, Jens Edlund: Preliminary guidelines for the efficient management of OOV words for spoken text. 137-142
- Noriyuki Matsunaga, Yamato Ohtani, Tatsuya Hirahara: Loss Function Considering Temporal Sequence for Feed-Forward Neural Network-Fundamental Frequency Case. 143-148
- Tomoki Koriyama, Shinnosuke Takamichi, Takao Kobayashi: Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis. 149-154
- Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas W. D. Evans, Jean-François Bonastre: Speaker Anonymization Using X-vector and Neural Waveform Models. 155-160
- Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari: V2S attack: building DNN-based voice conversion from automatic speaker verification. 161-165
- Takato Fujimoto, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda: Impacts of input linguistic feature representation on Japanese end-to-end speech synthesis. 166-171
- Nobuyuki Nishizawa, Tomohiro Obara, Gen Hattori: Evaluation of Block-Wise Parameter Generation for Statistical Parametric Speech Synthesis. 172-176
- Motoki Shimada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda: Low computational cost speech synthesis based on deep neural networks using hidden semi-Markov model structures. 177-182
- Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura: Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework. 183-188
Keynote 3
- Claire Gardent: Natural Language Generation: Creating Text.
Oral 5: Language and dialect varieties
- Aye Mya Hlaing, Win Pa Pa, Ye Kyaw Thu: Enhancing Myanmar Speech Synthesis with Linguistic Information and LSTM-RNN. 189-193
- Anusha Prakash, Anju Leela Thomas, Srinivasan Umesh, Hema A. Murthy: Building Multilingual End-to-End Speech Synthesisers for Indian Languages. 194-199
- Michael Pucher, Carina Lozo, Philip Vergeiner, Dominik Wallner: Diphthong interpolation, phone mapping, and prosody transfer for speech synthesis of similar dialect pairs. 200-204
- Elshadai Tesfaye Biru, Yishak Tofik Mohammed, David Tofu, Erica Cooper, Julia Hirschberg: Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis. 205-210
Oral 6: Sequence to sequence model
- Yusuke Yasuda, Xin Wang, Junichi Yamagishi: Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments. 211-216
- Oliver Watts, Gustav Eje Henter, Jason Fong, Cassia Valentini-Botinhao: Where do the improvements come from in sequence-to-sequence neural TTS? 217-222
- Jason Fong, Jason Taylor, Korin Richmond, Simon King: A Comparison of Letters and Phones as Input to Sequence-to-Sequence Models for Speech Synthesis. 223-227
Poster 3: Prosody
- Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu: Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control. 228-233
- Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari: Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation. 234-238
- Zack Hodari, Oliver Watts, Simon King: Using generative modelling to produce varied intonation for speech synthesis. 239-244
- Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson: How to train your fillers: uh and um in spontaneous speech synthesis. 245-250
- Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, Hirokazu Kameoka, Tomoki Toda: An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement. 251-256
- Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson: PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network. 257-262
- Raul Fernandez: Deep Mixture-of-Experts Models for Synthetic Prosodic-Contour Generation. 263-268
- Rose Sloan, Syed Sarfaraz Akhtar, Bryan Li, Ritvik Shrivastava, Agustín Gravano, Julia Hirschberg: Prosody Prediction from Syntactic, Lexical, and Word Embedding Features. 269-274
- Slava Shechtman, Alexander Sorin: Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities. 275-280