OpenCV on STM32F7-Discovery

I am one of the developers of the Embox operating system, and in this article I will describe how I managed to run OpenCV on the STM32F746G-Discovery board.

If you type something like “OpenCV on STM32 board” into a search engine, you can find quite a few people who are interested in using this library on STM32 boards or other microcontrollers.
There are several videos whose titles promise exactly that, but in all the ones I have seen the STM32 board only captured the image from the camera and displayed the result on the screen, while the image processing itself was done either on a regular computer or on a more powerful board (for example, a Raspberry Pi).

Why is it difficult?

The popularity of these search queries is explained by the fact that OpenCV is the most popular computer vision library, so many developers are already familiar with it, and being able to run desktop-ready code on a microcontroller greatly simplifies development. But why are there still no popular ready-made recipes for this?

The problem of using OpenCV on small boards comes down to two features:

  • Even with a minimal set of modules, the compiled library simply will not fit into the flash memory of the STM32F7Discovery (even without counting the OS) because the code is very large (several megabytes of instructions).
  • The library itself is written in C++, which means:
    • The C++ runtime must be supported (exceptions, etc.).
    • LibC/POSIX support, which is what embedded operating systems usually provide, is not enough: you also need the C++ standard library and the STL (vector, etc.).
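To make the second point concrete, here is a minimal sketch of a smoke test for the C++ features listed above (exceptions and STL containers). The function name is hypothetical and this is not part of Embox or OpenCV. It only illustrates the kind of runtime support that has to work before a library like OpenCV can be expected to run.

```cpp
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical smoke test: exercises exceptions and STL containers
// without touching OpenCV itself. Returns true when both behave.
bool cxx_runtime_smoke_test() {
    // STL container with dynamic allocation (needs operator new and a heap).
    std::vector<int> values(10);
    std::iota(values.begin(), values.end(), 1);  // 1, 2, ..., 10
    const int sum = std::accumulate(values.begin(), values.end(), 0);

    // Exception thrown and caught (needs stack unwinding support).
    bool caught = false;
    try {
        throw std::runtime_error("unwinding works");
    } catch (const std::exception &e) {
        caught = std::string(e.what()) == "unwinding works";
    }

    return sum == 55 && caught;
}
```

If a check like this fails or does not even link on the target, the problem is in the C++ support of the OS rather than in OpenCV.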

Porting to Embox

As usual, before porting a program to the operating system, it is a good idea to build it the way its developers intended. In our case this is no problem: the source code is on GitHub, and the library builds under GNU/Linux with the usual cmake.

The good news is that OpenCV can be built as a static library out of the box, which makes porting easier. We build the library with the standard config and see how much space it takes up. Each module is built as a separate library.

> size lib/*so --totals
    text    data     bss      dec     hex filename
 1945822   15431     960  1962213  1df0e5 lib/libopencv_calib3d.so
17081885  170312   25640 17277837 107a38d lib/libopencv_core.so
10928229  137640   20192 11086061  a928ed lib/libopencv_dnn.so
  842311   25680    1968   869959   d4647 lib/libopencv_features2d.so
  423660    8552     184   432396   6990c lib/libopencv_flann.so
 8034733   54872    1416  8091021  7b758d lib/libopencv_gapi.so
   90741    3452     304    94497   17121 lib/libopencv_highgui.so
 6338414   53152     968  6392534  618ad6 lib/libopencv_imgcodecs.so
21323564  155912  652056 22131532 151b34c lib/libopencv_imgproc.so
  724323   12176     376   736875   b3e6b lib/libopencv_ml.so
  429036    6864     464   436364   6a88c lib/libopencv_objdetect.so
 6866973   50176    1064  6918213  699045 lib/libopencv_photo.so
  698531   13640     160   712331   ade8b lib/libopencv_stitching.so
  466295    6688     168   473151   7383f lib/libopencv_video.so
  315858    6972   11576   334406   51a46 lib/libopencv_videoio.so
76510375  721519  717496 77949390 4a569ce (TOTALS)

As the last line shows, .bss and .data do not take up much space, but there is more than 70 MiB of code. Of course, when the library is statically linked into a specific application, less code will remain.

Let's try to throw out as many modules as possible while still being able to build a minimal example (one that, say, simply prints the OpenCV version). We look at the output of cmake .. -LA and switch off every option that can be switched off.

        -DBUILD_opencv_java_bindings_generator=OFF 
        -DBUILD_opencv_stitching=OFF 
        -DWITH_PROTOBUF=OFF 
        -DWITH_PTHREADS_PF=OFF 
        -DWITH_QUIRC=OFF 
        -DWITH_TIFF=OFF 
        -DWITH_V4L=OFF 
        -DWITH_VTK=OFF 
        -DWITH_WEBP=OFF 
        <...>

> size lib/libopencv_core.a --totals
   text    data     bss     dec     hex filename
3317069   36425   17987 3371481  3371d9 (TOTALS)

On the one hand, this is only one module of the library; on the other hand, this is without compiler optimization for code size (-Os). About 3 MiB of code is still quite a lot, but it already gives hope for success.

Run in emulator

It is much easier to debug on an emulator, so first we make sure that the library works on QEMU. As the emulated platform I chose Integrator/CP because, first, it is also ARM and, second, Embox supports graphics output on this platform.

Embox has a mechanism for building external libraries. Using it, we add OpenCV as a module (passing all the same options as for the "minimal" build in the form of static libraries). After that I add a simple application that looks like this:

version.cpp:

#include <stdio.h>
#include <opencv2/core/utility.hpp>

int main() {
    printf("OpenCV: %s", cv::getBuildInformation().c_str());

    return 0;
}

We build the system, run it, and get the expected output.

root@embox:/#opencv_version                                                     
OpenCV: 
General configuration for OpenCV 4.0.1 =====================================
  Version control:               bd6927bdf-dirty

  Platform:
    Timestamp:                   2019-06-21T10:02:18Z
    Host:                        Linux 5.1.7-arch1-1-ARCH x86_64
    Target:                      Generic arm-unknown-none
    CMake:                       3.14.5
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               Debug

  CPU/HW features:
    Baseline:
      requested:                 DETECT
      disabled:                  VFPV3 NEON

  C/C++:
    Built as dynamic libs?:      NO
< The remaining build parameters follow: which flags were used for compilation,
  which OpenCV modules are included in the build, etc. >

The next step is to run some example, preferably one of the standard ones offered by the developers themselves on their site. I chose the Canny edge detector.

The example had to be slightly rewritten to display the resulting image directly in the frame buffer. I had to do this because the imshow() function can draw images through the Qt, GTK and Windows interfaces, which, of course, will definitely not be in the config for STM32. In fact, Qt can also be run on STM32F7Discovery, but this will be discussed in another article 🙂

After briefly figuring out the format in which the edge detector stores its result, we get an image.

[Image: original picture]

[Image: result]
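The conversion from the edge detector's output to the frame buffer can be sketched roughly as follows. cv::Canny produces an 8-bit single-channel edge map, so each pixel has to be expanded to the pixel format of the display. The function name, the XRGB8888 layout and the stride parameters are assumptions for illustration. This is not the code from the Embox example and not the Embox frame buffer API.

```cpp
#include <cstddef>
#include <cstdint>

// Copies an 8-bit single-channel image (the layout cv::Canny produces)
// into a 32-bit frame buffer. The XRGB8888 pixel format is an assumption.
//   src_stride   - bytes per source row
//   fb_stride_px - pixels per frame buffer row
void blit_gray8_to_xrgb8888(const std::uint8_t *src, std::size_t src_stride,
                            std::uint32_t *fb, std::size_t fb_stride_px,
                            std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t *src_row = src + y * src_stride;
        std::uint32_t *fb_row = fb + y * fb_stride_px;
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint32_t g = src_row[x];
            // Replicate the gray value into the R, G and B channels.
            fb_row[x] = (g << 16) | (g << 8) | g;
        }
    }
}
```

With a cv::Mat named edges, the source arguments would be edges.data, edges.step, edges.cols and edges.rows. The frame buffer pointer and its stride depend on the platform.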

Running on STM32F7Discovery

The 32F746GDISCOVERY has several hardware memory regions that we can use in one way or another:

  1. 320KiB RAM
  2. 1MiB flash for image
  3. 8MiB SDRAM
  4. 16MiB QSPI NOR Flash
  5. microSD card slot

An SD card can be used to store images, but in the context of running a minimal example this is not very useful.
The display has a resolution of 480×272, which means that the framebuffer takes 522,240 bytes at a depth of 32 bits. This is more than the size of RAM, so the framebuffer and the heap (which OpenCV, among others, needs to store image data and auxiliary structures) will be placed in SDRAM, and everything else (memory for stacks and other system needs) will go to RAM.
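The arithmetic behind this placement can be checked with a small sketch. It assumes a 32-bit pixel depth (4 bytes per pixel) and uses the RAM and SDRAM sizes from the list above. The constant and function names are made up for illustration.

```cpp
#include <cstddef>

// Bytes needed for one full-screen frame buffer.
constexpr std::size_t framebuffer_bytes(std::size_t width, std::size_t height,
                                        std::size_t bits_per_pixel) {
    return width * height * (bits_per_pixel / 8);
}

constexpr std::size_t kKiB = 1024;
constexpr std::size_t kRamBytes = 320 * kKiB;          // on-chip RAM
constexpr std::size_t kSdramBytes = 8 * 1024 * kKiB;   // external SDRAM
constexpr std::size_t kFbBytes = framebuffer_bytes(480, 272, 32);

static_assert(kFbBytes == 522240, "480 x 272 x 4 bytes");
static_assert(kFbBytes > kRamBytes, "frame buffer does not fit into RAM");
static_assert(kFbBytes < kSdramBytes, "frame buffer fits into SDRAM");
```

The checks run at compile time, so a wrong assumption about the display or memory sizes shows up as a build error rather than at run time.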

If we take the minimum config for STM32F7Discovery (throw out the entire network, all commands, make stacks as small as possible, etc.) and add OpenCV with examples there, the required memory will be as follows:

   text    data     bss     dec     hex filename
2876890  459208  312736 3648834  37ad42 build/base/bin/embox

For those who are not very familiar with which sections go where, I will explain: .text and .rodata hold instructions and constants (roughly speaking, read-only data), .data holds mutable data, and .bss holds zero-initialized variables, which nevertheless need space (this section goes to RAM).

The good news is that .data/.bss should fit, but .text is a problem, because there is only 1MiB of memory for the image. We could remove the picture used by the example from .text and read it at startup from, say, the SD card into memory, but fruits.png weighs about 330KiB, so this will not solve the problem: most of .text consists of OpenCV code.
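To see how far off the image is, here is a rough sketch of the flash budget. It takes the text column from the size output above, assumes the picture occupies about 330 KiB of it, and compares the result with the 1 MiB of internal flash. All names are illustrative.

```cpp
#include <cstdint>

constexpr std::uint32_t kFlashBytes = 1024u * 1024u;  // 1 MiB internal flash
constexpr std::uint32_t kTextBytes  = 2876890u;       // "text" column from size
constexpr std::uint32_t kImageBytes = 330u * 1024u;   // approximate size of fruits.png

// How many bytes of read-only content exceed the internal flash.
constexpr std::uint32_t overflow_bytes(std::uint32_t text, std::uint32_t flash) {
    return text > flash ? text - flash : 0u;
}

// As built: about 1.74 MiB too large.
static_assert(overflow_bytes(kTextBytes, kFlashBytes) == 1828314u, "");
// With the picture moved out of the image: still about 1.42 MiB too large.
static_assert(overflow_bytes(kTextBytes - kImageBytes, kFlashBytes) == 1490394u, "");
```

Under these assumptions, moving the picture elsewhere removes less than a fifth of the excess.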

By and large, only one option remains: placing part of the code in QSPI flash (it has a special memory-mapped mode in which it is exposed on the system bus, so the processor can access the data directly). This raises two problems: first, QSPI flash is not available immediately after the device resets (the memory-mapped mode has to be initialized separately), and second, this memory cannot be "flashed" with a familiar bootloader.

As a result, it was decided to link all the code into QSPI and to flash it with a self-written loader that receives the required binary via TFTP.

Result

The idea of porting this library to Embox appeared about a year ago, but it was postponed over and over again for various reasons. One of them was support for libstdc++ and the standard template library. The problem of C++ support in Embox is beyond the scope of this article, so here I will only say that we managed to get this support to the level the library needs in order to work 🙂

In the end, these problems were overcome (at least enough for the OpenCV example to work), and the example ran. The board takes a long 40 seconds to find the edges with the Canny filter. This is, of course, too long (there are ideas on how to optimize it; if they work out, that will be material for a separate article).


However, the intermediate goal was to build a prototype showing that running OpenCV on STM32 is possible in principle, and this goal was achieved, hooray!

tl;dr: step by step instructions

0: Download Embox sources, like this:

    git clone https://github.com/embox/embox && cd ./embox

1: Let's start by building the bootloader that will "flash" the QSPI flash.

    make confload-arm/stm32f7cube

Now you need to configure the network, because we will upload the image via TFTP. To set the board and host IP addresses, edit conf/rootfs/network.

Configuration example:

iface eth0 inet static
    address 192.168.2.2
    netmask 255.255.255.0
    gateway 192.168.2.1
    hwaddress aa:bb:cc:dd:ee:02

gateway is the address of the host from which the image will be loaded, address is the address of the board.

After that, we build the bootloader:

    make

2: Load the bootloader onto the board in the usual way (sorry for the pun). There is nothing specific here; do it as for any other application for STM32F7Discovery. If you don't know how to do it, you can read about it here.

3: Build an image with the config for OpenCV.

    make confload-platform/opencv/stm32f7discovery
    make

4: Extract the ELF sections that must be written to QSPI into qspi.bin

    arm-none-eabi-objcopy -O binary build/base/bin/embox build/base/bin/qspi.bin \
        --only-section=.text --only-section=.rodata \
        --only-section='.ARM.ex*' \
        --only-section=.data

There is a script in the conf directory that does this, so you can run it:

    ./conf/qspi_objcopy.sh # The resulting binary is build/base/bin/qspi.bin

5: Using TFTP, upload qspi.bin to the QSPI flash. To do this, on the host copy qspi.bin to the root folder of the TFTP server (usually /srv/tftp/ or /var/lib/tftpboot/; packages for the server are available in most popular distributions, usually under the name tftpd or tftp-hpa, and sometimes you have to run systemctl start tftpd.service to start it).

    # for tftpd
    sudo cp build/base/bin/qspi.bin /srv/tftp
    # for tftp-hpa
    sudo cp build/base/bin/qspi.bin /var/lib/tftpboot

On Embox (i.e. in the bootloader), you need to execute the following command (we assume that the server has the address 192.168.2.1):

    embox> qspi_loader qspi.bin 192.168.2.1

6: Use the goto command to "jump" into QSPI memory. The exact location depends on how the image is linked; you can see this address with the command mem 0x90000000 (the start address is in the second 32-bit word of the image). You will also need to set the stack with the -s flag; the stack address is stored at 0x90000000. Example:

    embox>mem 0x90000000
    0x90000000:     0x20023200  0x9000c27f  0x9000c275  0x9000c275
                      ↑           ↑
                    stack       address of
                    address     the first
                                instruction

    embox>goto -i 0x9000c27f -s 0x20023200 # The -i flag disables interrupts during system initialization

    < From here on, the output comes from the image with OpenCV, not from the bootloader >
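The two words read by mem above are the first two entries of the Cortex-M vector table: the initial stack pointer and the reset handler. Here is a hedged sketch of how they could be decoded from the first bytes of qspi.bin on the host. The struct and function names are hypothetical and are not part of Embox.

```cpp
#include <cstdint>

// First two entries of a Cortex-M vector table.
struct BootVectors {
    std::uint32_t initial_sp;     // word 0: initial stack pointer (for -s)
    std::uint32_t reset_handler;  // word 1: address of the first instruction
};

// Decodes the two little-endian 32-bit words at the start of an image.
// The caller must provide at least 8 readable bytes.
inline BootVectors parse_boot_vectors(const std::uint8_t *image) {
    auto word = [image](unsigned index) {
        const std::uint8_t *p = image + 4 * index;
        return static_cast<std::uint32_t>(p[0]) |
               (static_cast<std::uint32_t>(p[1]) << 8) |
               (static_cast<std::uint32_t>(p[2]) << 16) |
               (static_cast<std::uint32_t>(p[3]) << 24);
    };
    return BootVectors{word(0), word(1)};
}

// Thumb code addresses have bit 0 set, which is why 0x9000c27f is odd.
inline bool is_thumb_address(std::uint32_t address) {
    return (address & 1u) != 0;
}
```

For the image from the example, the first eight bytes decode to 0x20023200 and 0x9000c27f, the two values passed to goto.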

7: Launch

    embox> edges 20

and enjoy the 40-second edge detection 🙂

If something goes wrong, open an issue in our repository, write to the mailing list [email protected], or leave a comment here.

Source: habr.com
