Video with Cloud Object Detector on Raspberry Pi

Prologue

A video showing how Tesla's autopilot sees the road has been making the rounds online.

I had long been itching to stream video enriched with an object detector, in real time.


The problem is that I want to stream video from a Raspberry Pi, and the performance of a neural-network detector on it leaves much to be desired.

Intel Neural Compute Stick

I considered different solutions.

In a previous article I experimented with the Intel Neural Compute Stick. The hardware is powerful, but it requires its own network format.

Although Intel provides converters for the main frameworks, there are a number of pitfalls here.

For example, the format of the network you want may be incompatible; if it is compatible, some layers may not be supported on the device; and even if they are supported, errors may creep in during conversion, leaving you with strange output.

In general, if you want some arbitrary neural network, it may well not work with the NCS. So I decided to try solving the problem with the most mainstream and widely available tools.

Cloud

The obvious alternative to a local hardware solution is to go to the cloud.

The choice of ready-made options is overwhelming.

All the market leaders offer one:

… And dozens of lesser known ones.

It is not easy to choose among this variety.

And I decided not to choose, but to wrap the good old working OpenCV-based pipeline in Docker and run it in the cloud.

The advantage of this approach is flexibility and control: you can swap the neural network, the hosting, the server; in short, indulge any whim.

Server

Let's start with a local prototype.

As usual, I use Flask for the REST API, plus OpenCV and the MobileNet-SSD network.

Having installed the current versions in Docker, I found that OpenCV 4.1.2 does not work with MobileNet-SSD v1_coco_2018_01_28, so I had to roll back to the proven 11_06_2017 version.

At the start of the service, we load the class names and the network:

def init():
    # Load the class labels and the TensorFlow model via OpenCV's dnn module
    tf_labels.initLabels(dnn_conf.DNN_LABELS_PATH)
    return cv.dnn.readNetFromTensorflow(dnn_conf.DNN_PATH, dnn_conf.DNN_TXT_PATH)

In local Docker (on a not-so-young laptop) this takes 0.3 seconds; on the Raspberry, 3.5.

Now the inference itself:

def inference(img):
    # Scale the image into a normalized 300x300 blob and run a forward pass
    net.setInput(cv.dnn.blobFromImage(img, 1.0/127.5, (300, 300), (127.5, 127.5, 127.5), swapRB=True, crop=False))
    return net.forward()

Docker - 0.2 sec, Raspberry - 1.7.
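One way to reproduce such per-call timings is to wrap the call with `time.monotonic()`. In this self-contained sketch `net.forward()` is replaced by a sleep, so the numbers are illustrative only:

```python
import time

def inference(img):
    # Stand-in for the real net.forward() call timed in the article
    time.sleep(0.05)
    return []

start = time.monotonic()
inference(None)
elapsed = time.monotonic() - start
print(f"inference took {elapsed:.3f} s")
```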

Turning the tensor output into readable json:

def build_detection(data, thr, rows, cols):
    ret = []
    for detection in data[0,0,:,:]:
        score = float(detection[2])
        if score > thr:
            cls = int(detection[1])
            a = {"class" : cls, "name" : tf_labels.getLabel(cls),  "score" : score}
            a["x"] = int(detection[3] * cols)
            a["y"] = int(detection[4] * rows)
            a["w"] = int(detection[5] * cols ) - a["x"]
            a["h"] = int(detection[6] * rows) - a["y"]
            ret.append(a)
    return ret

Next, we expose this operation via Flask (the input is an image, the output is the detector results in JSON).

There is also an alternative endpoint that shifts more work to the server: it draws boxes around the detected objects itself and returns the finished image.

This option is useful where we do not want to pull OpenCV onto the client.
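A minimal sketch of what the JSON-returning Flask endpoint might look like. The `/detect` route name is an assumption, and `inference` and `build_detection` are stubbed here so the example runs on its own; in the real server they are the helpers shown above and the JPEG body is decoded with `cv.imdecode`:

```python
import json

from flask import Flask, Response, request

app = Flask(__name__)

def inference(img_bytes):
    # Stub standing in for the real cv.dnn forward pass shown above
    return "raw-tensor"

def build_detection(data, thr, rows, cols):
    # Stub standing in for the json-building helper shown above
    return [{"class": 1, "name": "person", "score": 0.9,
             "x": 10, "y": 20, "w": 100, "h": 200}]

@app.route("/detect", methods=["POST"])
def detect():
    # The client POSTs raw JPEG bytes; the real server decodes them
    # with cv.imdecode before running the network
    img = request.get_data()
    detections = build_detection(inference(img), 0.3, 300, 300)
    return Response(json.dumps(detections), mimetype="application/json")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
```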

Docker

Let's build the image.

The code has been cleaned up and posted on GitHub; Docker will pull it directly from there.

As the base image, we take the same Debian Stretch that runs on the Raspberry: no straying from the proven tech stack.

We need to install flask, protobuf, requests and opencv_python, download MobileNet-SSD and the server code from GitHub, and start the server.

FROM python:3.7-stretch

RUN pip3 install flask
RUN pip3 install protobuf
RUN pip3 install requests
RUN pip3 install opencv_python

ADD http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz /
RUN tar -xvf /ssd_mobilenet_v1_coco_11_06_2017.tar.gz

ADD https://github.com/tprlab/docker-detect/archive/master.zip /
RUN unzip /master.zip

EXPOSE 80

CMD ["python3", "/docker-detect-master/detect-app/app.py"]

A simple detector client based on requests.
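Such a client might look like this. The `/detect` route and the response fields are assumptions; the actual names depend on app.py in the repository:

```python
import requests

# Hypothetical endpoint; the actual route depends on the server code
SERVER_URL = "http://localhost:80/detect"

def detect_remote(jpeg_bytes, url=SERVER_URL):
    """POST raw JPEG bytes to the detector service, return parsed JSON."""
    resp = requests.post(url, data=jpeg_bytes,
                         headers={"Content-Type": "image/jpeg"})
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    with open("frame.jpg", "rb") as f:
        for d in detect_remote(f.read()):
            print(d["name"], d["score"], d["x"], d["y"], d["w"], d["h"])
```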

Publishing to Docker Hub

Docker registries proliferate as fast as cloud detectors.

To keep it simple, we will conservatively go with Docker Hub.

  1. Sign up for an account.
  2. Log in:
    docker login
  3. Come up with a meaningful name:
    docker tag opencv-detect tprlab/opencv-detect-ssd
  4. Push the image to the registry:
    docker push tprlab/opencv-detect-ssd

Running in the cloud

The choice of where to run the container is also very wide.

All the big players (Google, Microsoft, Amazon) offer a micro instance for free for the first year.
After experimenting with Microsoft Azure and Google Cloud, I settled on the latter, because it got up and running faster.

I did not write instructions here, since this part is very specific to the selected provider.

I tried different hardware tiers:
Low-end instances (shared and dedicated cores): 0.4-0.5 seconds per frame.
More powerful machines: 0.25-0.3.
Even in the worst case that is a threefold win over the Raspberry, so it is worth a try.

Video

We run a simple OpenCV video streamer on the Raspberry, with detection done via Google Cloud.
For the experiment I used a video file once filmed at a random intersection.


def handle_frame(frame):
    return detect.detect_draw_img(frame)
       
def generate():
    while True:
        rc, frame = vs.read()
        outFrame = handle_frame(frame)
        if outFrame is None:
            (rc, outFrame) = cv.imencode(".jpg", frame)
        yield(b'--frame\r\n' b'Content-Type: image/jpeg\r\n\r\n' + bytearray(outFrame) + b'\r\n')

@app.route("/stream")
def video_feed():
    return Response(generate(), mimetype = "multipart/x-mixed-replace; boundary=frame")

With the detector we get no more than three frames per second; everything moves very slowly.
A more powerful GCloud machine can manage 4-5 frames per second, but the difference is barely perceptible to the eye: it is still slow.
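To put numbers on "slow", a small client-side FPS meter (a hypothetical helper, not from the original code) can average the frame rate over a sliding window:

```python
import time
from collections import deque

class FpsMeter:
    """Measure effective frame rate over a sliding window of timestamps."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self):
        # Call once per displayed frame
        self.stamps.append(time.monotonic())

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0
```

Calling `tick()` inside the frame loop and printing `fps()` every few seconds shows how far from real time the stream actually is.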


The cloud and network transport are not to blame: on ordinary hardware the detector simply runs at this speed.

Neural Compute Stick

I could not resist and ran the benchmark on the NCS.

The detector took slightly over 0.1 seconds per frame, in any case 2-3 times faster than the cloud on a weak machine, i.e. 8-9 frames per second.


The difference in detection results comes from the fact that the NCS was running MobileNet-SSD version 2018_01_28.

P.S. In addition, experiments showed that a reasonably powerful desktop machine with an i7 processor does slightly better: it managed to squeeze out 10 frames per second.

Cluster

Taking the experiment further, I installed the detector on five nodes in Google Kubernetes.
The pods themselves were weak: none of them could process more than 2 frames per second.
But if you run a cluster of N nodes and fan frames out to them in N parallel streams, then with enough nodes (five) you can reach the desired 10 frames per second.

# Q is a collections.deque of pending futures, executor is a
# concurrent.futures ThreadPoolExecutor, and M caps the frames in flight
def generate():
    while True:
        rc, frame = vs.read()
        if frame is not None:
            future = executor.submit(handle_frame, (frame.copy()))
            Q.append(future)

        keep_polling = len(Q) > 0
        while(keep_polling):            
            top = Q[0]
            if top.done():
                outFrame = top.result()
                Q.popleft()
                if outFrame:
                    yield(b'--frame\r\n' b'Content-Type: image/jpeg\r\n\r\n' + bytearray(outFrame) + b'\r\n')
                keep_polling = len(Q) > 0
            else:
                keep_polling = len(Q) >= M

Here's what happened:


A little less brisk than with the NCS, but noticeably livelier than a single stream.

The gain, of course, is not linear: there is overhead from synchronization and from deep-copying OpenCV images.

Conclusion

Overall, the experiment suggests that with some effort you can get by with a simple cloud setup.

But a powerful desktop or dedicated local hardware achieves better results, without any tricks.


Source: habr.com
