SIP phone on STM32F7-Discovery

Hello.

A while ago we wrote about how we managed to launch a SIP phone on STM32F4-Discovery (1 MB ROM and 192 KB RAM) based on Embox. That version was minimal: it connected two phones directly, without a server, and transmitted voice in only one direction. So we decided to launch a more complete phone, with calls going through a server and voice transmitted in both directions, while still keeping the memory footprint as small as possible.


For the phone we chose simple_pjsua, an application from the PJSIP library. It is a minimal application that can register on a server and receive and answer calls. Below is a description of how to run it on STM32F7-Discovery.

How to launch

  1. Configure Embox:
    make confload-platform/pjsip/stm32f7cube
  2. Set the required SIP account in the conf/mods.config file.
    
    include platform.pjsip.cmd.simple_pjsua_imported(
        sip_domain="server", 
        sip_user="username",
        sip_passwd="password")
    

    where server is the SIP server (for example, sip.linphone.org), and username and password are the account's user name and password.

  3. Build Embox with the make command. Flashing the board is described on our wiki and in an article.
  4. Run the “simple_pjsua_imported” command in the Embox console:
    
    00:00:12.870    pjsua_acc.c  ....SIP outbound status for acc 0 is not active
    00:00:12.884    pjsua_acc.c  ....sip:[email protected]: registration success, status=200 (Registration succes
    00:00:12.911    pjsua_acc.c  ....Keep-alive timer started for acc 0, destination:91.121.209.194:5060, interval:15s
    

  5. Finally, plug speakers or headphones into the audio output and speak into the two small MEMS microphones next to the display. We call from Linux using the simple_pjsua or pjsua application. You can also use any other client, such as linphone.

All this is described on our wiki.

How we got there

The first question was which hardware platform to choose. Since it was clear that STM32F4-Discovery did not have enough memory, we chose STM32F7-Discovery. It has 1 MB of flash and 256 KB of RAM (plus 64 KB of special fast memory, which we will also use). That is still not much for calls through a server, but we decided to try to fit.

We roughly divided the task into several stages:

  • Running PJSIP on QEMU. It was convenient for debugging, plus we already had support for the AC97 codec there.
  • Voice recording and playback on QEMU and on STM32.
  • Porting the simple_pjsua application from PJSIP. It can register on a SIP server and make calls.
  • Deploying our own Asterisk-based server and testing against it, then trying external servers such as sip.linphone.org.

Sound in Embox works through Portaudio, which is also used in PJSIP. The first problems appeared on QEMU: WAV files played well at 44100 Hz, but at 8000 Hz something clearly went wrong. It turned out to be a matter of setting the sample rate: the hardware defaulted to 44100 Hz, and our software never changed it.

Here it is worth explaining a little how sound is played in general. The sound card is given a pointer to a region of memory that it should play from or record into at a preset sample rate. When the buffer ends, an interrupt is generated and execution continues with the next buffer. The catch is that these buffers must be filled in advance, while the previous one is still being played. We will run into this problem again on STM32F7.
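
The scheme can be illustrated with a minimal sketch in plain C, where the hardware interrupt is simulated by an ordinary function call. All names here (audio_out_t, fill_next_buffer, audio_irq_buffer_done) are hypothetical and are not Embox or Portaudio APIs, and the buffer size is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SAMPLES 160  /* illustrative: 10 ms of mono audio at 16000 Hz */

/* Hypothetical driver state: two buffers used in ping-pong fashion. */
typedef struct {
    int16_t  buf[2][BUF_SAMPLES];
    int      playing;     /* index of the buffer the hardware reads now */
    int      next_ready;  /* 1 if the other buffer was filled in time */
    unsigned underruns;   /* times the hardware switched to an unfilled buffer */
} audio_out_t;

static void audio_out_init(audio_out_t *ao) {
    memset(ao, 0, sizeof(*ao));
}

/* Called by the application: fill the buffer that is NOT being played. */
static void fill_next_buffer(audio_out_t *ao, const int16_t *src, size_t n) {
    assert(n <= BUF_SAMPLES);
    int16_t *dst = ao->buf[1 - ao->playing];
    memcpy(dst, src, n * sizeof(int16_t));
    /* Pad a short frame with zeros. */
    memset(dst + n, 0, (BUF_SAMPLES - n) * sizeof(int16_t));
    ao->next_ready = 1;
}

/* Simulated "buffer finished" interrupt: switch to the other buffer. */
static const int16_t *audio_irq_buffer_done(audio_out_t *ao) {
    if (!ao->next_ready) {
        /* The application was too slow: stale data gets replayed. */
        ao->underruns++;
    }
    ao->playing = 1 - ao->playing;
    ao->next_ready = 0;
    return ao->buf[ao->playing];
}
```

If audio_irq_buffer_done() runs twice without a fill_next_buffer() call in between, the underruns counter grows. That is the situation in which the listener hears distorted sound.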

Next, we rented a server and deployed Asterisk on it. Since there was a lot of debugging to do and we did not want to talk into the microphone all the time, we needed automatic playback and recording. To do this, we patched simple_pjsua so that files can be substituted for audio devices. This is quite simple in PJSIP, because it has the concept of a port, which can be either a device or a file, and these ports can be flexibly connected to other ports. You can see the code in our pjsip repositories.

The resulting scheme was as follows. On the Asterisk server we created two accounts, one for Linux and one for Embox. Then the simple_pjsua_imported command is run on Embox, Embox registers on the server, and we call Embox from Linux. At the moment of connection we check on the Asterisk server that the connection is established. After a while we should hear the sound from Linux in Embox, and in Linux we save the file that is played from Embox.

After it worked on QEMU, we moved on to porting to STM32F7-Discovery. The first problem was that the image did not fit into 1 MB of ROM without the “-Os” compiler size optimization, so we enabled “-Os”. Next, we patched out C++ support, since it is needed only for pjsua, and we use simple_pjsua.

Once simple_pjsua fit, we decided that there was now a chance of launching it. But first we had to deal with voice recording and playback. The question was where to record to. We chose external memory, SDRAM (128 Mbit). You can try this yourself:

Record a stereo WAV at 16000 Hz with a duration of 10 seconds:


record -r 16000 -c 2 -d 10000 -m C0000000

Play it back:


play -m C0000000

There were two problems here. The first was with the codec: the board uses the WM8994, which has the notion of slots, and there are four of them. By default, if nothing is configured, audio is played in all four slots. As a result, audio at 16000 Hz came out at 8000 Hz, and at 8000 Hz playback simply did not work. When we selected only slots 0 and 2, it worked as it should. The second problem was the audio interface in STM32Cube, where the audio output works via SAI (Serial Audio Interface) synchronously with the audio input (I did not dig into the details, but it appears that they share a common clock and that the audio input is somehow tied to the output when the output is initialized). In other words, they cannot be run separately, so we did the following: the audio input and audio output are always running, and their interrupts are always generated. When nothing is being played, we simply feed an empty buffer to the audio output, and when playback starts, we begin filling it with real data.
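
The "always running output" trick can be sketched as follows. This is only an illustration under assumptions: sai_out_t, sai_out_queue and sai_out_next are hypothetical names, not STM32Cube or Embox functions, and the real driver works with DMA buffers rather than plain pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OUT_SAMPLES 320  /* illustrative frame size */

/* Shared all-zero frame that is sent whenever there is nothing to play. */
static const int16_t silence[OUT_SAMPLES] = {0};

typedef struct {
    const int16_t *pending;  /* frame queued by the player, NULL when idle */
} sai_out_t;

/* Hypothetical: called by the player when it has a real frame to send. */
static void sai_out_queue(sai_out_t *out, const int16_t *frame) {
    assert(frame != NULL);
    out->pending = frame;
}

/*
 * Hypothetical: called from the output interrupt. The interrupt fires all
 * the time, because input and output cannot be started separately.
 */
static const int16_t *sai_out_next(sai_out_t *out) {
    const int16_t *frame = out->pending ? out->pending : silence;
    out->pending = NULL;  /* each queued frame is played once */
    return frame;
}
```

With this shape the recording path never has to care whether playback is active: the output side keeps consuming frames, and they are simply silent until the player queues real data.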

Next, we found that the recorded voice was very quiet. The reason is that the MEMS microphones on STM32F7-Discovery for some reason do not work well at sample rates below 16000 Hz. So we always set 16000 Hz, even when 8000 Hz is requested. This, however, required adding software conversion from one sample rate to the other.
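
The article does not say which conversion method was used. As a minimal sketch, assuming the simplest possible approach, a factor-of-two conversion between 8000 Hz and 16000 Hz can be done with linear interpolation in one direction and pair averaging in the other. The function names are hypothetical, and a production converter would normally use a proper low-pass filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upsample 8000 Hz -> 16000 Hz. The out array must hold 2 * n samples. */
static size_t upsample_x2(const int16_t *in, size_t n, int16_t *out) {
    for (size_t i = 0; i < n; i++) {
        /* The last sample has no right neighbour, so it is repeated. */
        int32_t next = (i + 1 < n) ? in[i + 1] : in[i];
        out[2 * i] = in[i];
        out[2 * i + 1] = (int16_t)((in[i] + next) / 2);
    }
    return 2 * n;
}

/* Downsample 16000 Hz -> 8000 Hz by averaging pairs. n must be even. */
static size_t downsample_x2(const int16_t *in, size_t n, int16_t *out) {
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[i] = (int16_t)(((int32_t)in[2 * i] + in[2 * i + 1]) / 2);
    }
    return n / 2;
}
```

In this sketch, frames captured from the microphones at 16000 Hz would pass through downsample_x2() before being handed to an 8000 Hz stream, and frames to be played would pass through upsample_x2().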

Next, I had to increase the size of the heap, which is located in RAM. According to our calculations, pjsip required about 190 KB, and we had only about 100 KB left. So I had to put part of the heap into external memory, SDRAM (about 128 KB).

After all these edits, I saw the first packets between Linux and Embox, and I heard sound! But the sound was terrible, nothing like on QEMU, and it was impossible to make anything out. Then we started thinking about what the cause could be. Debugging showed that Embox simply could not fill and drain the audio buffers in time. While pjsip was processing one frame, two buffer-completion interrupts had time to occur, which is too many. The first idea for speeding things up was compiler optimization, but it was already enabled in PJSIP. The second was hardware floating point, which we described in an article. But in practice, the FPU did not give a significant speedup. The next step was thread priorities. Embox has different scheduling strategies, and I enabled one that supports priorities and gave the audio threads the highest priority. This did not help either.

The next idea was that, since we were working with external memory, it would be good to move the structures that are accessed extremely often out of it. I did a preliminary analysis of when and for what simple_pjsua allocates memory. It turned out that of the 190 KB, the first 90 KB are allocated for the internal needs of PJSIP and are not accessed very often. Then, during an incoming call, the pjsua_call_answer function is called, which allocates the buffers for working with incoming and outgoing frames. That was about another 100 KB. So we did the following. Until a call arrives, we place data in external memory. As soon as a call arrives, we immediately replace the heap with another one, located in RAM. This way, all the “hot” data was moved to faster and more predictable memory.
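
The heap switch can be sketched with two simple bump allocators and a pointer to the currently active one. This is an illustration under assumptions, not the actual Embox heap code: arena_t, heap_alloc, on_call_start and on_call_end are hypothetical names, and ordinary static arrays stand in for SDRAM and internal RAM. The 90 KB and 100 KB sizes follow the figures above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} arena_t;

static uint8_t sdram_area[90 * 1024];   /* stands in for external SDRAM */
static uint8_t sram_area[100 * 1024];   /* stands in for internal RAM */

static arena_t slow_heap = { sdram_area, sizeof(sdram_area), 0 };
static arena_t fast_heap = { sram_area, sizeof(sram_area), 0 };
static arena_t *current_heap = &slow_heap;  /* before a call: SDRAM */

/* Bump allocation from whichever heap is active; offsets are 8-byte steps. */
static void *heap_alloc(size_t n) {
    arena_t *h = current_heap;
    size_t aligned = (n + 7u) & ~(size_t)7u;
    if (aligned < n || aligned > h->size - h->used) {
        return NULL;  /* overflow or not enough space left */
    }
    void *p = h->base + h->used;
    h->used += aligned;
    return p;
}

/* Incoming call: the "hot" frame buffers must come from internal RAM. */
static void on_call_start(void) {
    current_heap = &fast_heap;
}

/* Call finished: drop the call buffers and go back to SDRAM. */
static void on_call_end(void) {
    fast_heap.used = 0;
    current_heap = &slow_heap;
}

/* Helper for checking which memory region a pointer belongs to. */
static int in_arena(const arena_t *a, const void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)a->base;
    return addr >= start && addr < start + a->size;
}
```

The design choice here is that the allocator itself does not know which data is hot. The switch relies on the observation that everything allocated after the call starts belongs to the frame-processing path.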

All of this together made it possible to launch simple_pjsua and place calls through our own server, and then through other servers such as sip.linphone.org.

Conclusions

As a result, we managed to launch simple_pjsua with voice transmission in both directions through a server. The extra 128 KB of SDRAM could be avoided by using a slightly more powerful Cortex-M7 (for example, STM32F769NI with 512 KB of RAM), but we still have not given up hope of fitting into 256 KB 🙂 We will be glad if someone finds this interesting, or better yet, tries it out. All sources, as usual, are in our repositories.

Source: habr.com
