RHVoice 1.8.0 speech synthesizer release

The open source speech synthesis system RHVoice 1.8.0 has been released. It was initially developed to provide high-quality support for the Russian language and was later adapted for other languages, including English, Portuguese, Ukrainian, Kyrgyz, Tatar and Georgian. The code is written in C++ and distributed under the LGPL 2.1 license. GNU/Linux, Windows and Android are supported. The program is compatible with the standard text-to-speech (TTS) interfaces: SAPI5 (Windows), Speech Dispatcher (GNU/Linux) and the Android Text-To-Speech API, and it can also be used with the NVDA screen reader. The creator and main developer of RHVoice is Olga Yakovleva, who develops the project despite being completely blind.
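As an illustration of the Speech Dispatcher route on GNU/Linux, the following minimal sketch builds a command line for `spd-say`, the command-line client shipped with Speech Dispatcher. It assumes that the RHVoice output module is installed and registered under the name `rhvoice`; that name, the helper names `build_spd_say_command` and `speak`, and the default values are assumptions made for this example and are not taken from the release notes.

```python
import shutil
import subprocess


def build_spd_say_command(text, language="en", rate=0, pitch=0, volume=0,
                          module="rhvoice"):
    """Return an argument list for spd-say.

    rate, pitch and volume use the spd-say scale from -100 to 100.
    The module name "rhvoice" is an assumption about the local setup.
    """
    for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume)):
        if not -100 <= value <= 100:
            raise ValueError(f"{name} must be between -100 and 100, got {value}")
    return [
        "spd-say",
        "-o", module,      # output module (synthesizer) to use
        "-l", language,    # language code, e.g. "ru" or "en"
        "-r", str(rate),   # speech rate
        "-p", str(pitch),  # pitch
        "-i", str(volume), # volume
        "--", text,        # "--" keeps text starting with "-" from being parsed as an option
    ]


def speak(text, **options):
    """Speak text through Speech Dispatcher; return False if spd-say is missing."""
    if shutil.which("spd-say") is None:
        return False
    subprocess.run(build_spd_say_command(text, **options), check=True)
    return True
```

Calling `speak("Привет, мир", language="ru", rate=20)` would then hand the text to Speech Dispatcher, provided the daemon and the RHVoice module are available on the system.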

Version 1.8 for the Android platform introduces a new voice and language data management system that makes it possible to download voice data updates without updating the mobile application. Data updates for installed voices and languages are checked automatically. In addition, the new release introduces support for the Polish language and adds a new voice for the Macedonian language. Compatibility with the latest alpha and beta releases of the NVDA screen reader has been ensured. Build problems on Linux that occurred when Speech Dispatcher was not present have been fixed.

Recall that RHVoice builds on the work of the HTS project (HMM/DNN-based Speech Synthesis System) and uses statistical parametric synthesis based on hidden Markov models (HMM). The advantage of the statistical model is its low overhead and modest CPU requirements. All operations are performed locally on the user's system. Three levels of speech quality are supported (the lower the quality, the higher the performance and the shorter the response time).

The disadvantage of the statistical model is the relatively low quality of pronunciation, which does not reach the level of synthesizers that assemble speech from fragments of natural speech. Nevertheless, the result is quite intelligible and resembles a recording broadcast through a loudspeaker. For comparison, the Silero project, which provides an open speech synthesis engine based on machine learning and a set of models for the Russian language, surpasses RHVoice in quality.

There are 14 voices available for Russian and 6 for English. The voices are built from recordings of natural speech. Speed, pitch and volume can be changed in the settings, and the Sonic library can be used to change the tempo. The language can be detected and switched automatically based on an analysis of the input text (for example, words and quotations in another language can be spoken with a native synthesis model for that language). Voice profiles, which define combinations of voices for different languages, are supported.
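The idea of switching voices within a text can be sketched as follows. This is not RHVoice's actual detection algorithm, which the announcement does not describe; it is a simplified stand-in that splits text by Unicode script (Cyrillic or Latin) and looks up a voice for each run. The table `VOICE_BY_SCRIPT`, the voice names in it and all function names are hypothetical.

```python
import unicodedata

# Hypothetical "voice profile": one voice per script. The names are placeholders.
VOICE_BY_SCRIPT = {"CYRILLIC": "russian_voice", "LATIN": "english_voice"}


def script_of(ch):
    """Return the script key for a letter, or None for digits, spaces, punctuation."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for script in VOICE_BY_SCRIPT:
        if name.startswith(script):
            return script
    return None


def split_by_script(text, default="LATIN"):
    """Split text into (script, fragment) runs.

    Neutral characters stay with the run they follow; leading neutral
    characters join the first run.
    """
    runs = []
    current = None
    buffer = []
    for ch in text:
        script = script_of(ch)
        if script is not None and script != current:
            if current is not None:
                # The script changed: close the previous run.
                runs.append((current, "".join(buffer)))
                buffer = []
            current = script
        buffer.append(ch)
    if buffer:
        runs.append((current or default, "".join(buffer)))
    return runs


def plan_synthesis(text):
    """Pair each fragment with the voice assigned to its script."""
    return [(VOICE_BY_SCRIPT[script], fragment)
            for script, fragment in split_by_script(text)]
```

For a sentence such as "Он сказал hello world, и ушёл." this yields three fragments, with the English words assigned to the Latin-script voice. A real implementation would need more than script detection, since many languages share a script.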

Source: opennet.ru
