RHVoice 1.6.0 speech synthesizer release

RHVoice 1.6.0, an open source speech synthesis system, has been released. The project was initially developed to provide high-quality support for the Russian language and was later adapted for other languages, including English, Portuguese, Ukrainian, Kyrgyz, Tatar and Georgian. The code is written in C++ and distributed under the LGPL 2.1 license. GNU/Linux, Windows and Android are supported. The program is compatible with the standard text-to-speech (TTS) interfaces: SAPI5 (Windows), Speech Dispatcher (GNU/Linux) and the Android Text-To-Speech API, and can also be used in the NVDA screen reader. The creator and main developer of RHVoice is Olga Yakovleva, who develops the project despite being completely blind.
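
As an illustration of the Speech Dispatcher route on GNU/Linux, the sketch below builds a command line for the `spd-say` client that ships with Speech Dispatcher. It is a minimal sketch, not part of RHVoice: the helper name `build_spd_say_command` is hypothetical, and the output module name `rhvoice` is an assumption that depends on how the module is registered on a given system.

```python
import shlex


def build_spd_say_command(text, language="ru", module="rhvoice",
                          rate=0, pitch=0, volume=0):
    """Build an argv list for spd-say (hypothetical helper).

    The module name "rhvoice" is an assumption; check the local
    Speech Dispatcher configuration for the actual name.
    spd-say accepts rate, pitch and volume in the range -100..100.
    """
    for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume)):
        if not -100 <= value <= 100:
            raise ValueError(f"{name} must be between -100 and 100, got {value}")
    return [
        "spd-say",
        "-o", module,        # output module (synthesizer backend)
        "-l", language,      # language code of the text
        "-r", str(rate),     # speech rate
        "-p", str(pitch),    # pitch
        "-i", str(volume),   # volume
        text,
    ]


# Print the command instead of running it, so the sketch works
# even on a machine without Speech Dispatcher installed.
print(shlex.join(build_spd_say_command("Hello from RHVoice", language="en")))
```

To actually speak the text, the resulting list could be passed to `subprocess.run` on a system where Speech Dispatcher and the RHVoice module are installed.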

The new version adds 5 new voices for Russian speech and implements support for the Albanian language. The dictionary for the Ukrainian language has been updated, and support for voicing emoji characters has been expanded. In the Android app, bugs have been fixed, importing custom dictionaries has been simplified, and support for the Android 11 platform has been added. New settings and functionality have been added to the engine core, including g2p.case, word_break, and support for equalization filters.

RHVoice builds on the work of the HTS project (HMM/DNN-based Speech Synthesis System) and uses statistical parametric synthesis based on hidden Markov models (HMM). The advantages of the statistical model are low overhead and modest CPU requirements. All operations are performed locally on the user's system. Three levels of speech quality are supported (the lower the quality, the higher the performance and the shorter the response time).

The disadvantage of the statistical model is the relatively low quality of pronunciation, which does not reach the level of synthesizers that generate speech by combining fragments of natural speech. Nevertheless, the result is quite intelligible and resembles a recording broadcast over a loudspeaker. By comparison, the Silero project, which provides an open speech synthesis engine based on machine learning and a set of models for the Russian language, surpasses RHVoice in quality.

There are 13 voices available for Russian and 5 for English. The voices are built from recordings of natural speech. The speed, pitch and volume can be changed in the settings, and the Sonic library can be used to change the tempo. The language can be detected and switched automatically based on analysis of the input text (for example, words and quotes in another language can be spoken with the native synthesis model for that language). Voice profiles are supported, which define combinations of voices for different languages.
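
The idea of switching voices by language can be sketched roughly as follows. This is not RHVoice's actual detection algorithm, which the announcement does not describe: the sketch simply assumes that Cyrillic letters mean Russian and Latin letters mean English (a simplification that would, for instance, not tell Russian from Ukrainian), and the `PROFILE` mapping and its voice names are hypothetical placeholders.

```python
import unicodedata

# Hypothetical voice profile: one voice per language (placeholder names).
PROFILE = {"ru": "russian-voice", "en": "english-voice"}


def script_language(ch):
    """Guess a language from a single character's Unicode script.

    Assumption for this sketch: Cyrillic -> "ru", Latin -> "en".
    Digits, spaces and punctuation return None (language-neutral).
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "ru"
    if name.startswith("LATIN"):
        return "en"
    return None


def split_by_language(text, default="ru"):
    """Split text into (language, chunk) runs.

    Neutral characters stay with the language of the preceding run.
    """
    segments = []
    current = default
    for ch in text:
        lang = script_language(ch) or current
        if segments and segments[-1][0] == lang:
            segments[-1][1] += ch
        else:
            segments.append([lang, ch])
        current = lang
    return [(lang, chunk) for lang, chunk in segments]


# Each run would then be sent to the voice chosen by the profile.
for lang, chunk in split_by_language("Привет, world!"):
    print(PROFILE[lang], repr(chunk))
```

A real engine would need a finer-grained detector, but the structure is the same: segment the input, then pick the voice for each segment from the active profile.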

Source: opennet.ru
