Read Between Notes: Data Transfer System Within Music

To express what words cannot convey; to feel a wide range of emotions swirling together in a hurricane of feeling; to break away from the earth, the sky and even the universe itself on a journey with no maps, no roads and no signs; to invent, tell and live through a whole story that will always remain unique and unrepeatable. Music makes all of this possible: an art that has existed for many thousands of years and still delights our ears and hearts.

However, music, or rather individual musical works, can serve not only for aesthetic pleasure but also for transmitting information that is encoded in them, intended for a device and imperceptible to the listener. Today we look at a very unusual study in which graduate students from ETH Zurich in Switzerland embedded data into musical works in a way the human ear cannot detect, so that the music itself becomes a data transmission channel. How exactly did they implement the technology, how different are the melodies with and without embedded data, and what did practical tests show? The researchers' report has the answers. Let's go.

Research basis

The researchers call their technology acoustic data transmission. When a speaker plays a modified melody, a person perceives it as normal, but a smartphone, for example, can read the information encoded between the lines, or rather between the notes. The scientists (being graduate students does not make them any less scientists) consider the most important requirement to be that transmission speed and reliability stay at the same level regardless of the audio file chosen. Psychoacoustics, which studies the psychological and physiological aspects of how humans perceive sound, helps with this task.

The backbone of acoustic data transmission is OFDM (orthogonal frequency-division multiplexing). Combined with adapting the subcarriers to the original music over time, it lets the system use as much of the transmitted frequency spectrum as possible for data. As a result, a transmission rate of 412 bps was achieved over distances of up to 24 meters (error rate < 10%). Practical experiments with 40 volunteers confirmed that it is almost impossible to hear the difference between the original melody and the one with embedded information.

Where can this technology be applied in practice? The researchers have their own answer: almost all modern smartphones, laptops and other portable devices have microphones, and many public places (cafes, restaurants, shopping centers, etc.) have speakers playing background music. That background melody could carry, for example, the data needed to connect to a Wi-Fi network, with no extra steps required.

The general features of acoustic data transmission are now clear, so let's move on to a detailed look at how the system is built.

Description of the system

Data is embedded into the melody by means of frequency masking. In each time slot, the masking frequencies are identified, and the OFDM subcarriers close to them are filled with data.

Image #1: converting the original file into a composite signal (melody + data) transmitted through the speakers.

First, the original audio signal is divided into successive segments for analysis. Each segment (Hi) of L = 8820 samples, equal to 200 ms, is multiplied by a window* to minimize edge effects.

Window* is a weighting function used to control the effects due to the presence of side lobes in the spectral estimates.
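The segmentation step can be sketched in a few lines. This is only an illustration under stated assumptions: the 44.1 kHz sample rate follows from 8820 samples lasting 200 ms, but the Hann window and the non-overlapping layout of the segments are choices made here for the example, since the article does not say which window or overlap the researchers used.

```python
import math

SAMPLE_RATE = 44_100   # Hz; implied by 8820 samples = 200 ms
SEGMENT_LEN = 8820     # L in the text


def hann_window(n):
    """Symmetric Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_segments(samples, seg_len=SEGMENT_LEN):
    """Yield consecutive, non-overlapping windowed segments H_i."""
    window = hann_window(seg_len)
    for start in range(0, len(samples) - seg_len + 1, seg_len):
        segment = samples[start:start + seg_len]
        yield [s * w for s, w in zip(segment, window)]


# One second of a 1 kHz test tone splits into five 200 ms segments.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
segments = list(windowed_segments(tone))
```

The window tapers both ends of every segment towards zero, which is what suppresses the edge effects mentioned above.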

Next, the dominant frequencies of the original signal were found in the range from 500 Hz to 9.8 kHz, which yielded the masking frequencies fM,l for the segment. In addition, data was transmitted in a narrow band from 9.8 to 10 kHz so that the receiver could locate the subcarriers. The upper limit of the usable frequency range was set to 10 kHz because smartphone microphones have low sensitivity at high frequencies.

Masking frequencies were determined individually for each analyzed segment. The three dominant frequencies were determined using the HPS (Harmonic Product Spectrum) method, after which they were rounded to the nearest notes of the chromatic scale. This is how the fundamental notes fF,i (i = 1…3) were obtained, lying between the keys C0 (16.35 Hz) and B0 (30.87 Hz). Because the fundamental notes are too low to be used for data transmission, their higher octaves 2^k · fF,i were calculated in the range 500 Hz ... 9.8 kHz. Many of these frequencies (fO,l1) were more pronounced due to the nature of the HPS.

Image #2: Calculated octaves fO,l1 for the fundamental notes and harmonics fH,l2 of the strongest tone.
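The note and octave arithmetic can be illustrated with a small sketch. It assumes standard equal-tempered tuning with A4 = 440 Hz (the article does not state the tuning), takes a dominant frequency as given instead of running HPS, and treats the octaves as 2^k multiples of the fundamental note inside the 500 Hz ... 9.8 kHz band. The function names are made up for this example.

```python
import math

A4 = 440.0                   # reference pitch in Hz (assumed tuning)
C0 = A4 * 2 ** (-57 / 12)    # about 16.35 Hz, 57 semitones below A4


def fundamental_note(freq_hz):
    """Snap freq_hz to the nearest equal-tempered note, folded into C0..B0."""
    semitones_above_c0 = round(12 * math.log2(freq_hz / C0))
    pitch_class = semitones_above_c0 % 12
    return C0 * 2 ** (pitch_class / 12)


def octaves_in_band(fundamental_hz, low_hz=500.0, high_hz=9800.0):
    """Return every 2**k * fundamental_hz that lies inside [low_hz, high_hz]."""
    result = []
    freq = fundamental_hz
    while freq <= high_hz:
        if freq >= low_hz:
            result.append(freq)
        freq *= 2
    return result


# A slightly sharp A (445 Hz) snaps to the note A, whose lowest octave is A0.
f_fund = fundamental_note(445.0)     # about 27.5 Hz
masking = octaves_in_band(f_fund)    # about 880, 1760, 3520 and 7040 Hz
```

Each dominant note therefore contributes only a handful of candidate masking frequencies, one per octave inside the usable band.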

The resulting set of octaves and harmonics was used as the masking frequencies, from which the OFDM subcarrier frequencies fSC,k were derived. Two subcarriers were inserted below and above each masking frequency.

Next, the spectrum of the audio segment Hi was filtered at the subcarrier frequencies fSC,k. After that, an OFDM symbol was created from the information bits in Bi, producing the composite segment Ci that is played through the speaker. The magnitude and phase of the subcarriers must be chosen so that the receiver can extract the transmitted data while the listener does not notice any change in the melody.

Image #3: part of the spectrum and the subcarrier frequencies of segment Hi of the original melody.

When the audio signal with encoded information is played through the speakers, the microphone of the receiving device records it. To find the starting positions of the embedded OFDM symbols, the recording must first be bandpass filtered. This extracts the upper frequency range, where no music interferes between the subcarriers. The start of each OFDM symbol can then be found using the cyclic prefix.
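The idea behind cyclic-prefix detection is that the prefix is a copy of the end of the symbol, so the recording correlates strongly with itself at a lag of one symbol length exactly where a symbol begins. The toy sketch below shows only that principle and is not the researchers' receiver: the lengths are tiny hypothetical values, the symbol body is a stand-in sequence of ±1 samples rather than a real OFDM waveform, and the surrounding noise is deliberately quiet.

```python
import random

SYMBOL_LEN = 64   # samples in the symbol body (hypothetical, tiny for clarity)
CP_LEN = 16       # samples in the cyclic prefix (hypothetical)


def add_cyclic_prefix(body):
    """Prepend a copy of the last CP_LEN samples of the symbol body."""
    return body[-CP_LEN:] + body


def find_symbol_start(recording):
    """Return the index where the prefix best matches the samples one symbol later."""
    best_index, best_score = 0, float("-inf")
    for n in range(len(recording) - SYMBOL_LEN - CP_LEN + 1):
        score = sum(recording[n + i] * recording[n + SYMBOL_LEN + i]
                    for i in range(CP_LEN))
        if score > best_score:
            best_index, best_score = n, score
    return best_index


def quiet_noise(rng, count):
    """Low-level background samples surrounding the symbol."""
    return [rng.uniform(-0.3, 0.3) for _ in range(count)]


rng = random.Random(0)
body = [rng.choice((-1.0, 1.0)) for _ in range(SYMBOL_LEN)]  # stand-in symbol body
true_start = 37
recording = quiet_noise(rng, true_start) + add_cyclic_prefix(body) + quiet_noise(rng, 50)
detected = find_symbol_start(recording)
```

In this clean setup `detected` equals `true_start`. In a real room, echoes and louder noise smear the correlation peak, which is why the measured starts deviate from the true ones.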

After detecting the start of the OFDM symbols, the receiver learns which notes are the most dominant by decoding the upper frequency range. In addition, OFDM is quite resistant to narrowband interference, since such sources affect only some of the subcarriers.

Practical tests

A KRK Rokit 8 speaker served as the source of the modified melodies, and a Nexus 5X smartphone acted as the receiver.

Image #4: Difference between the actual OFDM symbol starts and the correlation peaks, measured indoors at a distance of 5 m between speaker and microphone.

Most of the detected OFDM symbol starts deviate from the actual ones by 0 to 25 ms, so a valid start can be found within the 66.6 ms cyclic prefix. The researchers note that the receiver (in this experiment, a smartphone) takes into account that OFDM symbols are played periodically, which improves their detection.

The first thing to check was the effect of distance on the bit error rate (BER). To do this, three tests were carried out in different types of rooms: a carpeted corridor, an office with linoleum on the floor, and an auditorium with a wooden floor.


The song "And The Cradle Will Rock" by Van Halen was chosen as the test subject.

The sound volume was adjusted so that the sound level measured by a smartphone at a distance of 2 m from the speaker was 63 dB.

Image #5: BER values depending on the distance between the speaker and the microphone (blue line - auditorium, green - corridor, orange - office).

In the corridor, the smartphone picked up the sound at 40 dB at distances of up to 24 meters from the speaker. In the auditorium, at a distance of 15 m, the level was 55 dB, and in the office, at a distance of 8 meters, the level registered by the smartphone was 57 dB.

Because the auditorium and the office are more reverberant*, late echoes of the OFDM symbols outlast the cyclic prefix and increase the BER.

Reverberation* - a gradual decrease in the intensity of sound due to its multiple reflections.

The researchers further demonstrated the versatility of their system by applying it to 6 different songs in three genres (table below).

Table 1: songs used in the tests.

The table also shows the bit rate and bit error rate for each song. The data rate differs between songs because differential BPSK (binary phase-shift keying) works better when the same subcarriers are reused, which is possible when adjacent segments contain the same masking elements. Continuously loud songs provide the best basis for hiding data, because their masking frequencies are more pronounced over a wide frequency range. Rapidly changing music can only partially mask the OFDM symbols, because the analysis window has a fixed length.
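Why differential BPSK depends on reusing the same subcarrier is easier to see in code. In the sketch below, the list of symbols stands for the values sent on one subcarrier in consecutive segments, and each bit is carried by the phase change from one segment to the next. The mapping (a 1 flips the phase, a 0 keeps it) and the function names are assumptions made for this example, not details taken from the study.

```python
def dbpsk_encode(bits, reference=1):
    """Map bits to +1/-1 symbols: a 1 flips the previous phase, a 0 keeps it."""
    symbols = [reference]
    for bit in bits:
        symbols.append(-symbols[-1] if bit else symbols[-1])
    return symbols


def dbpsk_decode(symbols):
    """Recover bits by comparing the sign of each symbol with its predecessor."""
    return [0 if prev * cur > 0 else 1 for prev, cur in zip(symbols, symbols[1:])]


bits = [1, 0, 1, 1, 0, 0, 1]
sent = dbpsk_encode(bits)

# An unknown but constant channel response (here: attenuation plus a
# 180 degree phase flip) does not disturb the decoded bits.
received = [-0.4 * s for s in sent]
decoded = dbpsk_decode(received)
```

The decoder never needs an absolute phase reference, only the previous symbol on the same subcarrier. If the subcarrier disappears because the masking frequencies changed, that reference is lost and a fresh one has to be sent before data can follow, which fits the lower data rates for rapidly changing songs.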

Next, the system was tested on people, who had to determine which melody was the original and which had been modified by embedding information. For this, 12-second excerpts of the songs from Table 1 were posted on a dedicated website.

In the first experiment (E1), each participant was given either the modified or original piece to listen to and had to decide whether the piece was original or changed. In the second experiment (E2), participants could listen to both versions as many times as they wanted, and then decide which one was original and which one was modified.

Table #2: Results of experiments E1 and E2.

The results of the first experiment contain two indicators: p(O|O), the percentage of participants who correctly marked the original melody as original, and p(O|M), the percentage of participants who marked the modified version of the melody as original.

Curiously, according to the researchers, some participants judged certain altered melodies to be more original than the original itself. Averaged over both experiments, the results suggest that the average listener will not notice the difference between a regular tune and one with embedded data.

Naturally, music connoisseurs and musicians will be able to catch some inaccuracies and suspicious elements in the changed melodies, but these elements are not so significant as to cause discomfort.

And now we can take part in the experiment ourselves. Below are two versions of the same melody, original and modified. Can you hear the difference?

The original version of the melody
vs
Modified version of the melody

For a more detailed look at the nuances of the study, I recommend reading the research group's report.

You can also download the ZIP archive of the audio files of the original and modified melodies used in the study at this link.

Finale

In this work, PhD students at ETH Zurich in Switzerland described an amazing system for transferring data within music. To do this, they used frequency masking, which made it possible to embed data into the melody played by a speaker. The melody is picked up by the microphone of a device, which recognizes the hidden data and decodes it, while the average listener will not even notice the difference. In the future, the researchers plan to develop their system further, choosing more advanced methods of embedding data into audio.

When someone comes up with something unusual that, most importantly, actually works, we are always happy. It is an even greater joy that this invention was created by young people. Science has no age limits. And if young people think science is boring, then it is being presented from the wrong angle, so to speak. After all, as we know, science is an amazing world that never ceases to amaze.

Friday off-top:


Since we are talking about music, or rather about rock music, here is a wonderful journey through the expanses of rock.


Queen, "Radio Ga Ga" (1984).

Thanks for watching, stay curious, and have a great weekend everyone! 🙂

Source: habr.com