Pointing the camera by voice has become more accessible: the universal SmartCam A12 Voice Tracking solution

Tracking the active speaker in a video conference has gained a lot of momentum over the past few years. Advances in technology made sophisticated real-time audio and video processing algorithms practical, which prompted Polycom to introduce the world's first mass-market solution with intelligent automatic speaker tracking almost 10 years ago. For several years Polycom remained the only vendor of such a solution, but Cisco soon followed and brought to market its own intelligent dual-camera system, a worthy competitor to Polycom's. For many years this videoconferencing segment was limited to a handful of proprietary products. This article is dedicated to the first universal solution for pointing the camera by voice, compatible with both hardware and software videoconferencing infrastructure.
Before describing the solutions and demonstrating what they can do, I want to mention an important event:
I am honored to present to the Habr community a new hub dedicated to videoconferencing solutions (VCS). Thanks to our combined efforts (mine and the UFO's), videoconferencing now has its own home on Habr, and I invite everyone involved in this broad and timely topic to subscribe to the new hub.

Two scenarios for pointing the camera at the speaker

At the moment, integrators of video conferencing solutions choose between two different ways of pointing the camera at the speaker:

  1. Automatic - intelligent
  2. Semi-automatic - programmable

The first option covers the solutions from Cisco, Polycom and other manufacturers, which we will look at below. Here the pointing of the camera at the speaking participant is fully automated: unique algorithms for processing audio and video signals let the camera choose the right position on its own.

The second option covers automation systems built on various external control controllers. We will not consider them in detail, because this article is devoted specifically to automatic speaker tracking.
The second scenario has quite a few supporters, and for good reason. Experienced integrators know that the smart solutions from Polycom and Cisco need ideal operating conditions for the automation to work reliably. Such conditions cannot always be provided, so the following approach to camera pointing is sometimes what guarantees that the system works:

1. All the necessary presets (PTZ positions and optical zoom ratio) are entered manually in advance into the camera's memory, or sometimes into the control controller. As a rule, these are a wide shot of the meeting room and a portrait shot of each conference participant.

2. Next, devices that trigger the required preset are installed at the relevant seats. These can be microphone consoles or wireless buttons: in general, any device that can send the control controller a signal it understands.

3. The control controller is programmed so that each trigger has its own preset. The wide shot of the room corresponds to all triggers being off.
As a result, when a congress system is used together with a control controller, for example, the speaker activates their personal microphone console before starting to speak, and the control system instantly recalls the saved camera position.

This scenario works flawlessly, because the system does not need to perform voice triangulation or video analytics. Press the button and the preset is recalled, with no delays and no false positives.
Control and automation systems are used in large, complex rooms, where sometimes several video cameras are installed rather than one. For small and medium-sized meeting rooms, automatic systems are quite suitable (if the budget allows).
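
The semi-automatic scenario described above boils down to a lookup table from trigger devices to stored presets. Below is a minimal Python sketch of that idea, not any vendor's actual controller firmware. The names `Preset`, `PresetController`, `mic-1` and `mic-2`, the preset values, and the rule that the most recently activated console wins are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """A stored camera position: PTZ angles in degrees plus optical zoom ratio."""
    pan: float
    tilt: float
    zoom: float


class PresetController:
    """Hypothetical control controller: maps trigger devices to camera presets."""

    def __init__(self, wide_shot: Preset) -> None:
        self.wide_shot = wide_shot
        self.presets: dict[str, Preset] = {}
        self.active: list[str] = []  # triggers that are on, oldest first

    def assign(self, trigger_id: str, preset: Preset) -> None:
        """Give each trigger (microphone console, button) its own preset."""
        self.presets[trigger_id] = preset

    def on_trigger(self, trigger_id: str, is_on: bool) -> Preset:
        """Record a console being switched on or off; return the preset to recall."""
        if trigger_id not in self.presets:
            raise KeyError(f"no preset assigned to trigger {trigger_id!r}")
        if trigger_id in self.active:
            self.active.remove(trigger_id)
        if is_on:
            self.active.append(trigger_id)
        return self.current()

    def current(self) -> Preset:
        """All triggers off means the wide shot; otherwise the newest trigger wins
        (that tie-break is an assumption, the article does not specify one)."""
        if not self.active:
            return self.wide_shot
        return self.presets[self.active[-1]]


wide = Preset(pan=0.0, tilt=0.0, zoom=1.0)
controller = PresetController(wide_shot=wide)
controller.assign("mic-1", Preset(pan=-20.0, tilt=-5.0, zoom=4.0))
controller.assign("mic-2", Preset(pan=15.0, tilt=-5.0, zoom=4.0))

print(controller.on_trigger("mic-1", True))   # portrait preset of seat 1
print(controller.on_trigger("mic-1", False))  # back to the wide shot
```

Because the decision is a plain table lookup, there is nothing to analyze at run time, which is why this scenario has no pointing delay and no false positives.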
Let's start with the founding fathers.

Polycom EagleEye Director

At one time, this solution created a sensation in the field of videoconferencing. The Polycom EagleEye Director was the first intelligent camera pointing solution. It consists of an EagleEye Director base unit and two cameras. A distinctive feature of that first implementation is that one camera is dedicated to the close-up of the speaker and the other to the wide shot of the meeting room. The wide-shot camera can even be placed separately from the base, elsewhere in the meeting room, since it does not take part directly in the automatic pointing process.
The system works as follows:

  1. The wide-shot camera is active while everyone is silent.
  2. A speaker begins to speak: the microphone array picks up the voice, and the camera moves towards the sound using a patented technology that includes voice triangulation. The wide-shot camera is still active.
  3. The main camera starts looking for the sound source using video analytics. The system identifies the speaker by eyes, nose and mouth, frames the picture around the speaker and puts the stream from the main camera on screen.
  4. The speaker changes. The microphone array detects that the voice is coming from somewhere else, and the wide shot is switched on again.
  5. The cycle then repeats from step 2.
  6. If the new speaker is in the same frame as the previous one, the system repositions on the fly without switching the active stream to the wide shot.
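
The switching sequence above can be read as a small state machine. The Python sketch below models only which stream is on air, as described in the six steps; it is not Polycom's implementation. The names `Stream`, `DirectorSketch`, `voice_from_new_position` and `speaker_framed` are hypothetical, and the behavior after prolonged silence is left out because the text does not describe it.

```python
from enum import Enum, auto


class Stream(Enum):
    WIDE = auto()  # wide-shot camera
    MAIN = auto()  # main (close-up) camera


class DirectorSketch:
    """Hypothetical model of the single-main-camera switching logic."""

    def __init__(self) -> None:
        self.on_air = Stream.WIDE  # step 1: wide shot while everyone is silent
        self.aiming = False        # True while the main camera is searching

    def voice_from_new_position(self, inside_current_frame: bool = False) -> Stream:
        """The microphone array reports a voice from a new direction."""
        if self.on_air is Stream.MAIN and inside_current_frame:
            # Step 6: reposition on the fly, the close-up stays on air.
            return self.on_air
        # Steps 2 and 4: show the wide shot while the main camera re-aims.
        self.on_air = Stream.WIDE
        self.aiming = True
        return self.on_air

    def speaker_framed(self) -> Stream:
        """Video analytics has found and framed the speaker's face (step 3)."""
        if self.aiming:
            self.on_air = Stream.MAIN
            self.aiming = False
        return self.on_air


director = DirectorSketch()
for event, stream in [
    ("first speaker starts", director.voice_from_new_position()),
    ("face framed", director.speaker_framed()),
    ("neighbor in same frame speaks", director.voice_from_new_position(True)),
    ("speaker across the room", director.voice_from_new_position()),
]:
    print(f"{event}: {stream.name}")
```

Every change of speaker outside the current frame passes through `Stream.WIDE`, which is the flicker discussed in the next paragraph.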

The downside, in my opinion, is that there is only one main camera. This causes a significant delay when the speaker changes. And every time the camera is being pointed, the system switches to the wide shot of the room; in a lively conversation this flickering becomes annoying.

Polycom EagleEye Director II

This is the second version of Polycom's solution, released relatively recently. The principle of operation has changed and now looks more like Cisco's solution. Both PTZ cameras are now main cameras and are used to switch seamlessly from one speaker to another. A separate camera integrated into the body of the EagleEye Director II base unit is now responsible for the wide shot of the meeting room. For some reason, the stream from this wide-angle camera is displayed in an additional window in the corner of the screen, occupying 1/9 of the main stream. The positioning principle is the same: voice triangulation and video stream analysis. The weak points are the same too: if the system does not see a speaking mouth, the camera will not point. And this can happen very often: the speaker turned away, the speaker turned sideways, the speaker is a ventriloquist, the speaker covered their mouth with a hand or a document.
Both promo videos were filmed skilfully: two people speak in turn, opening their mouths as if at a speech therapist's appointment. But even under such sanitized conditions there is a very noticeable delay. On the other hand, the framing is flawless: a comfortable portrait shot.

Cisco TelePresence SpeakerTrack 60

To describe this solution, I will use the text from the official brochure.
The SpeakerTrack 60 uses a unique dual camera approach to quickly switch directly between participants. One camera quickly finds a close-up of the active speaker, while the other camera searches for and displays the next speaker. The MultiSpeaker feature prevents unnecessary switching if the next speaker is already present in the current frame.
Unfortunately, I did not get a chance to test the SpeakerTrack 60 myself, so my conclusions rest on opinions “from the field” and on an analysis of the demo video. I counted a maximum delay of almost 8 seconds when pointing at a new speaker. The average delay, judging by the video, was 2-3 seconds.

HUAWEI Intelligent Tracking Video Camera VPT300

I came across this solution from Huawei by accident. The system costs about $9K and works only with Huawei terminals. The developers added a trick of their own: video from two speakers is laid out on one screen if there is no one else in the room. Judging by the specifications and the declared functionality, this is a very interesting automatic pointing system. Unfortunately, I found no demo material at all. The only video that turned up on the topic was an edited review of the solution set to music, without the original sound, so it was not possible to assess the quality of the system. For this reason, I will not consider this option.
I see that Huawei has an active blog on Habr, so perhaps colleagues will be able to publish some useful information about this product.

New: the universal SmartCam A12 Voice Tracking solution

SmartCam A12VT is a monoblock that includes two PTZ cameras for tracking speakers, two built-in cameras for analyzing the wide view of the room, and a microphone array built into the base of the case. As you can see, there are no bulky and fragile structures like those of its competitors.
Before describing the new product, I will summarize the characteristics and features of the solutions from Cisco and Polycom so that you can compare the SmartCam A12VT with existing offerings.

Polycom EagleEye Director

  • Retail price of the system without a terminal: $13K
  • Minimum cost of EagleEye Director + RealPresence Group 500: $19K
  • Average switching delay: 3 seconds
  • Voice guidance + video analytics
  • High demands on the speaker's face: the mouth must not be hidden
  • Incompatible with third-party equipment

Cisco TelePresence SpeakerTrack 60

  • Retail price of the system without a terminal: $15.9K
  • Minimum cost of the TelePresence SpeakerTrack 60 + SX80 Codec solution: $30K
  • Average switching delay: 3 seconds
  • Voice guidance + video analytics
  • Requirements for the speaker's face: not tested, no information found
  • Incompatible with third-party equipment

SmartCam A12 Voice Tracking

  • Retail price of the system without a terminal: $6.2K
  • Minimum cost of the SmartCam A12VT + Yealink VC880 solution: $10.8K
  • Minimum cost of the SmartCam A12VT + software terminal solution: $7.7K
  • Average switching delay: 3 seconds
  • Voice guidance + video analytics
  • Requirements for the speaker's face: none
  • Third-party equipment compatibility: via HDMI

I see two main and indisputable advantages of the SmartCam A12 Voice Tracking solution:

  1. Versatile connectivity: via HDMI, the system integrates with both hardware and software video conferencing terminals
  2. Low cost: with similar functionality, the A12VT camera system costs, by the list prices above, roughly 2 to 2.5 times less than the alternatives.
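
The price gap can be quantified directly from the figures listed in the comparison. The short Python sketch below does only that arithmetic; the dictionary layout and the `price_ratios` helper are illustrative, the numbers are the thousands of US dollars quoted above, and the bundle for the A12VT is the hardware-terminal variant with the Yealink VC880.

```python
# Prices in thousands of US dollars, taken from the comparison lists above.
PRICES_K_USD = {
    "Polycom EagleEye Director": {"system": 13.0, "bundle": 19.0},
    "Cisco TelePresence SpeakerTrack 60": {"system": 15.9, "bundle": 30.0},
    "SmartCam A12VT": {"system": 6.2, "bundle": 10.8},
}


def price_ratios(prices: dict, baseline: str) -> dict:
    """Return how many times more each product costs than the baseline."""
    base = prices[baseline]
    return {
        name: {kind: round(value / base[kind], 2) for kind, value in entry.items()}
        for name, entry in prices.items()
        if name != baseline
    }


for name, ratio in price_ratios(PRICES_K_USD, "SmartCam A12VT").items():
    print(f"{name}: system x{ratio['system']}, with terminal x{ratio['bundle']}")
```

By these list prices the competing camera systems cost about 2.1 and 2.6 times as much, and the complete hardware bundles about 1.8 and 2.8 times as much.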

To demonstrate how the system works, we recorded a video review. The aim was functional rather than promotional, so the video lacks the gloss of the Polycom promo videos. As a venue, we chose not a showcase room but the laboratory meeting room of our partner, iPiMatika.
My goal was not to hide the flaws of the system but, on the contrary, to expose its weak points and make the system fail.

In my opinion, the system passed the test. I say this with confidence because, at the time of writing, the SmartCam A12 Voice Tracking solution has been tried in a dozen real meeting rooms of our customers. Automation failures were observed only when the recommended operating rules were violated, in particular the minimum distance to the nearest participants. If you sit very close to the camera, less than a meter away, the microphone array will not be able to recognize you and the lens will not be able to track you.


In addition to distance, there is another requirement: the height of the camera.


If the camera is mounted too low, voice positioning may not work properly. Placing it under the TV, unfortunately, did not work.
Mounting the system above the display, however, is the ideal arrangement for the device. A shelf for the camera is included; only wall mounting is supported.

How SmartCam A12 Voice Tracking works

The main PTZ lenses have equal roles: their task is to follow the speakers in turn and to show the wide shot. The overall picture in the room is analyzed, and the distance to objects determined, using the video streams from the two cameras integrated into the base of the system. This reduces the lens reaction time when the speaker changes to 1-2 seconds. The camera manages to switch between participants at a comfortable pace, even if they exchange short sentences.
The video demonstration fully reflects the functionality of the SmartCam A12VT. For those who have not watched the video, I will describe the principle of the automation in words:

  1. The room is empty: one of the lenses shows the wide shot, the second stands ready, waiting for people
  2. People enter the room and sit down: the free lens finds the two outermost participants and frames the image around them, cropping out the empty part of the room
  3. While people are moving, the lenses take turns tracking everyone in the room, keeping them in the center of the frame
  4. A speaker begins to speak: the active lens stays on the wide shot, while the second lens aims at the speaker and only then goes on air
  5. The speaker changes: the active lens stays on the first speaker, while the second lens leaves the wide shot and adjusts to the new speaker
  6. At the moment the picture switches from the first speaker to the second, the free lens immediately adjusts to the wide shot of the room
  7. If everyone is silent, the free lens shows a ready-made wide shot without any delay
  8. If the speaker changes again, the free lens goes in search of the new speaker.
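
The role swap between the two PTZ lenses in steps 4 to 8 can be sketched in a few lines of Python. This is only a reading of the sequence described above, not the device's firmware: the names `Lens`, `TwoLensSketch`, `lens A` and `lens B` are hypothetical, aiming is treated as instantaneous, and the framing and motion tracking of steps 2 and 3 are not modelled.

```python
from dataclasses import dataclass

WIDE = "wide shot"


@dataclass
class Lens:
    name: str
    target: str = WIDE


class TwoLensSketch:
    """Hypothetical model: one lens is on air, the other prepares the next shot."""

    def __init__(self) -> None:
        self.on_air = Lens("lens A")  # step 1: shows the wide shot
        self.free = Lens("lens B")    # step 1: standing by

    def _swap(self) -> None:
        self.on_air, self.free = self.free, self.on_air

    def speaker_changed(self, speaker: str) -> str:
        # Steps 4, 5 and 8: the free lens aims at the speaker while off air,
        # and only then is put on air.
        self.free.target = speaker
        self._swap()
        # Step 6: the lens that just left the air parks on the wide shot.
        self.free.target = WIDE
        return self.describe()

    def silence(self) -> str:
        # Step 7: the free lens already holds a ready wide shot.
        if self.on_air.target != WIDE:
            self._swap()
        return self.describe()

    def describe(self) -> str:
        return f"{self.on_air.name} on air: {self.on_air.target}"


room = TwoLensSketch()
print(room.describe())
print(room.speaker_changed("first speaker"))
print(room.speaker_changed("second speaker"))
print(room.silence())
```

The on-air picture never has to wait for a lens to travel, because the free lens always finishes aiming before the swap. That is the difference from a single-main-camera design, where the wide shot has to cover the travel time.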

Conclusion

In my opinion, this solution, presented at ISE and ISR last year, brings high technology closer, if not to the general public, then certainly to business. Clearly, few people will buy such a “toy” for home at 400 thousand rubles, but for business and corporate video conferencing it is a very affordable and convenient solution to the problem of automatic camera pointing.
Given the versatility of SmartCam A12 Voice Tracking, the system can be used either as a solution from scratch or as an extension of an existing video conferencing infrastructure. Connecting via HDMI is a big step towards the user, in contrast to the proprietary systems from the manufacturers above.

I want to thank the partners who assisted with the testing.
iPiMatika, for the Yealink VC880 terminal, the meeting room and Yura Yakushin.
Smart AV, for the right to the first and exclusive review of the solution and for providing the SmartCam A12 Voice Tracking system for testing.

In the previous article, "Online meeting room constructor - selection of the optimal video conferencing solution", to promote the vc4u.ru website and the VCS constructor, we announced a 10% discount off catalog prices with the code word HORNBEAM until the end of summer 2019.

The discount applies to products in the sections:

For the SmartCam A12 Voice Tracking solution I offer an additional 5% discount on top of the existing 10%, for a total of 15% until the end of summer 2019.

I look forward to your comments and answers in the survey!

Thank you for your attention.
Best regards,
Kirill Usikov (Usikoff)
Head of
Video surveillance and video conferencing systems
[email protected]
stss.ru
vc4u.ru


How useful is SmartCam A12 Voice Tracking?

  • Finally, there is a universal solution for software and hardware terminals!

  • The solution is good, but there are other options available (I will write in the comments)

  • The system is rather weak and falls short of Polycom and Cisco - I will write in the comments why it is worth paying 3 times more!

  • Who needs auto-guidance in a meeting room anyway?

  • Who needs a PTZ camera in a conference room anyway? Just plug in a webcam and you're fine!

8 users voted. 5 users abstained.

Source: habr.com
