TTS with neural networks has been around for decades, though to my knowledge not for German. These approaches replace the HMM approach for predicting the best acoustic parameters for a given sequence of symbols representing text. They are trained on a relatively large data corpus, but have a small footprint at synthesis time because they don't operate on the waveform directly but on some parameterized representation (e.g. LPC). However, this is also the reason they tend to produce artifacts.
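Since LPC is named above as a typical parameterized representation, here is a minimal sketch of the idea (my own illustration, not from any system mentioned in the post; the function names are made up): estimate all-pole filter coefficients from a frame via the autocorrelation method and Levinson-Durbin recursion, then resynthesize by running an excitation signal through the filter 1/A(z).

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method + Levinson-Durbin recursion.
    Returns coefficients a (with a[0] == 1) and the residual energy."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_synthesize(a, excitation):
    """Source-filter synthesis: excitation through the all-pole
    filter 1/A(z), i.e. out[n] = e[n] - sum_k a[k] * out[n-k]."""
    p = len(a) - 1
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        out[n] = excitation[n]
        for k in range(1, min(p, n) + 1):
            out[n] -= a[k] * out[n - k]
    return out
```

As a sanity check, analysing a signal generated by a known second-order all-pole filter with `order=2` should approximately recover that filter's coefficients.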
Assorted notes:

- NLP module: (in some demos this sentence is truncated).
- LPC (linear predictive coding): originally a compression algorithm, useful for synthesis because it is based on a source-filter model of speech.
- Hybrid approach: formant synthesis for voiced phonemes, concatenated with waveform-coded units for the unvoiced parts.
- Pitch Synchronous Overlap and Add (PSOLA): famous algorithm to change pitch and duration.
- Also a great tool for teaching speech synthesis, because the output and input of the different processing modules can be viewed as text.
- Speech Communication, Volume 52, Issue 2, February 2010, Pages 164-179.
- The NLP (text phonemisation) component is Txt2Pho, the Hadifix NLP, in combination with Mbrola synthesis.
- The encoder is a bidirectional recurrent neural network that accepts text or phonemes as input, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features.
- For the German samples, the Pavoque database was used for training.
- Systems primarily do either system modeling or signal modeling.
- Removed the product comparison table (too much work to keep up to date).
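The core trick of PSOLA mentioned above can be sketched in a few lines: cut two-period, Hann-windowed grains at the pitch marks, then overlap-add them at a re-spaced interval. This is a toy version of my own (assuming a perfectly periodic input with a known, constant period in samples); real PSOLA additionally estimates pitch marks from the signal and can change duration independently.

```python
import numpy as np

def td_psola(x, period, factor):
    """Toy TD-PSOLA pitch modification: re-space windowed grains
    of a periodic signal x. factor > 1 raises pitch, < 1 lowers it."""
    # Analysis pitch marks and two-period Hann-windowed grains.
    marks = np.arange(period, len(x) - period, period)
    win = np.hanning(2 * period)
    grains = [x[m - period:m + period] * win for m in marks]

    # Overlap-add the grains at the new (synthesis) period.
    new_period = max(1, int(round(period / factor)))
    out = np.zeros(len(x))
    wsum = np.zeros(len(x))  # accumulated window, for normalization
    t = period
    while t + period <= len(x):
        # Pick the analysis grain whose mark is nearest in time.
        i = min(len(grains) - 1, int(round(t / period)) - 1)
        out[t - period:t + period] += grains[i]
        wsum[t - period:t + period] += win
        t += new_period
    wsum[wsum < 1e-8] = 1.0  # avoid division by ~0 at the edges
    return out / wsum
```

With `factor=1.0` every grain lands back where it was cut, so the interior of the signal is reconstructed exactly after window normalization; with other factors the fundamental period of the output shifts accordingly.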