Spleeter, a system for separating music and voice, is open source

Streaming provider Deezer has released the source code of Spleeter, an experimental project that develops a machine learning system for separating sound sources in complex audio mixes. The program can remove the vocals from a track and leave only the accompaniment, manipulate the sound of individual instruments, or discard the music and keep the voice for overlaying onto another audio track, creating mixes, karaoke, or transcription. The code is written in Python using the TensorFlow framework and is distributed under the MIT license.

Pre-trained models are available for download: one for separating vocals (a single voice) from the accompaniment, and others for splitting a track into 4 and 5 stems, including vocals, drums, bass, piano, and the rest of the sound. Spleeter can be used both as a Python library and as a standalone command-line utility. In the simplest case, two, four, or five files with the separated components are created from the source file (vocals.wav, drums.wav, bass.wav, piano.wav, other.wav).
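
As a rough illustration of library use, here is a minimal Python sketch. It assumes the `spleeter` package is installed and relies on its `Separator` class and `separate_to_file` method. The helper names `expected_outputs` and `separate`, the `output` directory, and the `accompaniment.wav` name for the two-stem case are assumptions made for this example, as is the default layout of one subdirectory per source file.

```python
from pathlib import Path

# Stem names per pre-trained model. The 2-stem file name "accompaniment"
# is an assumption of this sketch; the rest follow the list in the article.
STEMS = {
    2: ["vocals", "accompaniment"],
    4: ["vocals", "drums", "bass", "other"],
    5: ["vocals", "drums", "bass", "piano", "other"],
}


def expected_outputs(source, stems, out_dir="output"):
    """Return the WAV paths a separation run is expected to produce."""
    if stems not in STEMS:
        raise ValueError("stems must be 2, 4 or 5")
    # Assumed default layout: <out_dir>/<source name without extension>/<stem>.wav
    target = Path(out_dir) / Path(source).stem
    return [target / f"{name}.wav" for name in STEMS[stems]]


def separate(source, stems=2, out_dir="output"):
    """Split `source` into stems with a pre-trained Spleeter model."""
    # Imported here so the helper above works without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator(f"spleeter:{stems}stems")
    separator.separate_to_file(source, out_dir)
    return expected_outputs(source, stems, out_dir)
```

Calling `separate("song.mp3", stems=4)` would download the model on first use and should leave four WAV files under `output/song/`.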

When splitting into 2 and 4 stems, Spleeter is very fast: on a GPU, for example, splitting an audio file into 4 stems takes about 100 times less time than the duration of the original track. On a system with an NVIDIA GeForce GTX 1080 GPU and a 32-core Intel Xeon Gold 6134 CPU, the musDB test collection, which lasts three hours and 27 minutes, was processed in 90 seconds.
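
The speed figures can be checked with a little arithmetic. The sketch below assumes the musDB benchmark is read as 3 hours 27 minutes of audio processed in 90 seconds. The 4-minute track and the function name `realtime_factor` are hypothetical and used only for illustration.

```python
def realtime_factor(audio_seconds, processing_seconds):
    """How many times faster than real time the processing ran."""
    return audio_seconds / processing_seconds


# musDB test set, read as 3 h 27 min of audio processed in 90 s (assumption).
musdb_audio_seconds = 3 * 3600 + 27 * 60
musdb_factor = realtime_factor(musdb_audio_seconds, 90)

# Hypothetical 4-minute track at the quoted "100 times faster" rate.
track_seconds = 4 * 60
estimated_processing_seconds = track_seconds / 100

print(f"musDB: {musdb_factor:.0f}x real time")
print(f"4-minute track: about {estimated_processing_seconds:.1f} s")
```

Under that reading the benchmark works out to roughly 138 times faster than real time, which is consistent with the "100 times" claim.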

Among Spleeter's advantages over other audio separation projects, such as the open source Open-Unmix, the developers cite higher-quality models built from an extensive collection of audio files. Because of copyright restrictions, machine learning researchers are limited to fairly meager public collections of music files, whereas the Spleeter models were built using data from Deezer's extensive music catalog.

Compared with Open-Unmix, Spleeter runs about 35% faster when tested on a CPU, supports MP3 files, and produces noticeably better results (vocal extraction in Open-Unmix leaves traces of some instruments, probably because the Open-Unmix models were trained on a collection of only 150 songs).

Source: opennet.ru
