ITMO Research_ podcast: how to approach syncing AR content with a stadium-wide show

This is the first part of the text transcript of the second interview for our program (Apple Podcasts, Yandex.Music). The guest of this episode is Andrey Karsakov (capc3d), Ph.D., senior researcher at the National Center for Cognitive Research and Associate Professor at the Faculty of Digital Transformations.

Since 2012, Andrey has been working in the Visualization and Computer Graphics research group, where he has taken part in major applied projects at the national and international level. In this part of the conversation, we talk about his experience providing AR support for mass events.

Photo: This is Engineering RA Eng (unsplash.com)

Context and objectives of the project

Timecode (in the audio version): 00:41

dmitrykabanov: I'd like to start with the European Games project. It had many components, several teams took part in the preparation, and delivering augmented reality to an audience of many thousands right at the stadium during the event is quite a serious task. Was your part of it primarily software?

capc3d: Yes, we did the software side and provided support during the show. Everything had to be tracked, monitored and launched in real time, and we also worked with the television crew. If we look at this project as a whole, we can talk about the opening and closing ceremonies of the European Games in Minsk, as well as the opening ceremony of the WorldSkills championship in Kazan. The scheme of work was the same, but the events were different, with a gap of about two months between them. We prepared the project together with the guys from Sechenov.com.

We met them by chance at Science Fest, which took place in autumn 2018, where our undergraduates were showing their VR course project. The guys came up to us and asked what we do in our lab. The conversation went something like this:

- You work with VR, but do you know how to work with augmented reality?

- Well, sort of, yes.

- We have a task like this, with these inputs. Can you do it?

We scratched our heads a little; nothing about it seemed impossible:

- Let's try to study everything in advance, and then we will find a solution.

Dmitriy: Do they only do media support?

Andrei: They do the full stack. In terms of management and organization, they handle directing, staging, set design, logistics and other technical support themselves. But they wanted to do something special for the European Games. Special effects like mixed reality have been made for television for a long time, but they are far from the cheapest in terms of technical implementation, so the guys were looking for alternative options.

Dmitriy: Let's discuss the problem in more detail. What was it?

Andrei: There is an event that lasts an hour and a half. We need to make sure that viewers watching the live broadcast and those sitting in the stadium can see augmented reality effects fully synchronized with the live show in both time and position on the site.

There were a number of technical constraints. Time synchronization over the Internet was out of the question: there were fears of excessive network load with the stands full, and heads of state were expected to attend, which could mean the mobile networks being jammed.

Andrey Karsakov, photo from ITMO University materials
This project had two key components: the personal experience people get through their mobile devices, and what goes into the television broadcast and onto the information screens in the stadium itself.

If a person happens to be watching an augmented reality episode through a mobile device and at the same time glances at the big screen, they should see the same picture.

We needed two essentially different systems to be completely synchronized in time. The peculiarity of such shows is that they are complex events involving a large number of technical services, and every operation is performed according to timecodes. A timecode is a specific moment in time at which something starts: lights, sound, performers entering, stage petals opening, and so on. We had to adapt to this system so that everything started at the right time for us as well. Another feature was that the scenes and episodes with augmented reality were scripted relative to one another.

Dmitriy: So did you decide against delivering the timecodes over the network because of the high risk of force majeure, or did you estimate the load characteristics in advance and realize that the load on the whole system would be quite high?

Andrei: Building a synchronization service for an audience of that size is not actually that difficult; the requests would not all land at the same moment anyway. Yes, the load is high, but it is not an emergency. The question is whether it is worth spending resources and time on it if the network suddenly goes down, and we could not be sure that would not happen. In the end, everything worked, with interruptions because of the load, but it worked, and we synchronized against the timecode using a different scheme. That was one of the global challenges.

Difficulties of implementation in terms of UX

Timecode (in the audio version): 10:42

Andrei: We also had to take into account that a stadium is not a classic concert venue, and synchronize the systems spatially on mobile devices. The story with augmented reality at Eminem concerts went viral a while ago, and then there was a similar case with Loboda.

Photo: Robert Bye (unsplash.com)
But those are always experiences happening right in front of you: the whole crowd is standing facing the stage, and synchronization is fairly simple. In the case of a stadium, you need to know which part of the circumference you are on and your relative position, so that the stadium fits into the space that exists in the virtual environment. That was quite a challenge. We tried to solve it in various ways, and ended up with something close to what was implemented at the Loboda shows, though not in every respect.

We let users decide for themselves where they are. We made a map of the stadium in which people chose their sector, row and seat, all in four taps. Next, we had to determine the direction to the stage. To do that, we showed a silhouette of roughly how the stage should look from the user's point of view; the user aligned it, tapped, and that was it - the stage was anchored. We tried to make this process as simple as possible, because 90% of the viewers who wanted to watch the show were not people with any prior experience of augmented reality.
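
To make the idea more concrete, here is a minimal sketch of what such four-tap localization might look like. The real application was built in Unity; the seating layout, constants and function names below are invented for illustration. The stands are modeled as an ellipse, the sector/row/seat pick gives an approximate position, and the silhouette-alignment tap gives the yaw correction toward the stage.

```python
import math

# Hypothetical seating layout: sectors evenly spaced around an elliptical bowl.
NUM_SECTORS = 40
ROW_STEP_M = 0.8              # radial distance between rows (assumption)
INNER_RADII = (90.0, 60.0)    # semi-axes of the innermost row, in meters (assumption)
STAGE_POS = (0.0, 45.0)       # stage center in the same stadium coordinate frame

def seat_to_position(sector: int, row: int, seat: int, seats_per_row: int = 50):
    """Map a (sector, row, seat) pick to an approximate position on the stands."""
    # Angle of the middle of the sector plus an offset for the seat within it.
    sector_angle = 2 * math.pi * sector / NUM_SECTORS
    seat_offset = (seat / seats_per_row - 0.5) * (2 * math.pi / NUM_SECTORS)
    angle = sector_angle + seat_offset
    # Rows move outward from the inner edge of the bowl.
    a = INNER_RADII[0] + row * ROW_STEP_M
    b = INNER_RADII[1] + row * ROW_STEP_M
    return (a * math.cos(angle), b * math.sin(angle))

def anchor_heading(user_pos, compass_heading_at_tap: float):
    """When the user aligns the stage silhouette and taps, the device's current
    heading is assumed to point at the stage; the difference between that and the
    geometric direction to the stage gives the yaw correction for the AR scene."""
    dx = STAGE_POS[0] - user_pos[0]
    dy = STAGE_POS[1] - user_pos[1]
    geometric_heading = math.atan2(dy, dx)
    return geometric_heading - compass_heading_at_tap

pos = seat_to_position(sector=12, row=20, seat=33)
yaw_correction = anchor_heading(pos, compass_heading_at_tap=math.radians(140))
print(pos, math.degrees(yaw_correction))
```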

Dmitriy: Was there a separate application for this project?

Andrei: Yes, an application for iOS and Android, which we pushed to the stores. There was a separate promotional campaign for it, with detailed instructions published in advance on how to download it and so on.

Dmitriy: We should keep in mind that a person has nowhere to physically try out and learn how to use such an application beforehand, so the task of "educating" the audience became more complicated.

Andrei: Yes, yes. We took a lot of lumps with the UX, because the user wants to get the experience in three taps: download, install, run, and it works. Many people can't be bothered to go through complex tutorials, read instructions and so on. And we did try to explain everything to the user as thoroughly as possible in the tutorial: a window will open here, camera access goes here, otherwise it won't work, and so on. But no matter how many explanations you write, no matter how much detail you chew through, no matter what GIFs you insert, people don't read it.

In Minsk we collected a large pool of feedback on this part, and a lot was already changed for the Kazan application. We bundled in not only the audio tracks and timecodes that correspond to a specific augmented reality episode, but all of the show's audio tracks and timecodes. So the application could hear what was happening at the moment it was launched, and if a person opened it at the wrong moment, it reported: "Comrade, I'm sorry, your AR episode will start in 15 minutes."
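
The "your episode is in 15 minutes" logic can be sketched roughly like this, assuming the audio recognition returns which track is playing and the offset into it. The track names and cue times below are made up for illustration.

```python
# Hypothetical show cue table: absolute start time of each audio track (seconds
# from the start of the show) and the start times of the AR episodes.
TRACK_STARTS = {"anthem": 0.0, "act_one": 310.0, "act_two": 1480.0}
AR_EPISODE_STARTS = [520.0, 1650.0, 3740.0]   # show-time seconds (illustrative)

def show_time(recognized_track: str, offset_in_track: float) -> float:
    """Convert 'track X at offset t' (what audio recognition gives us) into
    absolute show time."""
    return TRACK_STARTS[recognized_track] + offset_in_track

def next_ar_episode(now: float):
    """Return seconds until the next AR episode, or None if none are left."""
    upcoming = [t for t in AR_EPISODE_STARTS if t > now]
    return (min(upcoming) - now) if upcoming else None

now = show_time("act_one", 120.0)   # e.g. recognized two minutes into act one
wait = next_ar_episode(now)
if wait is not None:
    print(f"Sorry, your AR episode starts in {wait / 60:.0f} minutes")
```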

A little about the architecture and approach to synchronization

Timecode (in the audio version): 16:37

Dmitriy: So in the end you decided to do the synchronization by sound?

Andrei: Yes, it happened by accident. We were going through options and came across a company called cifrasoft from Izhevsk. They make a not particularly sophisticated but workmanlike SDK that lets you synchronize against the timeline by sound. The system was positioned for working with TV, where you can show something in an app triggered by the sound of, say, an ad, or provide interactive information alongside a film's soundtrack.

Dmitriy: But it's one thing to be sitting in your living room and quite another to be in a stadium with thousands of people. How did it work out with the quality of the sound capture and its subsequent recognition?

Andrei: There were many fears and doubts, but in most cases everything was recognized well. Their clever algorithms build signatures from the audio track, and the result weighs less than the original audio file. When the microphone listens to the ambient sound, it tries to find those features and recognize the track from them. In good conditions, the synchronization accuracy is 0.1-0.2 seconds, which was more than enough. In poor conditions, the discrepancy was up to 0.5 seconds.
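
For a rough idea of how this kind of sound-based synchronization can work (this is not the cifrasoft SDK, whose internals we don't know, just the common landmark-fingerprint approach): matching hashes between the reference track and the microphone recording vote for a time offset, and the true offset wins as the most frequent vote. The hash values and times below are toy data.

```python
from collections import Counter

# Assume we already have "landmark" hashes: (hash_value, time_in_seconds) pairs
# computed from spectral peaks of the reference track and of the mic recording.
reference = [(0xA1, 1.2), (0xB7, 1.9), (0xC3, 2.4), (0xA1, 5.0), (0xD9, 6.1)]
microphone = [(0xB7, 0.4), (0xC3, 0.9), (0xD9, 4.6), (0xEE, 5.0)]

def estimate_offset(ref, mic, bin_size=0.1):
    """For every pair of matching hashes, compute (reference time - mic time).
    The true offset shows up as the most common value; random collisions spread out."""
    ref_by_hash = {}
    for h, t in ref:
        ref_by_hash.setdefault(h, []).append(t)
    votes = Counter()
    for h, t_mic in mic:
        for t_ref in ref_by_hash.get(h, []):
            votes[round((t_ref - t_mic) / bin_size)] += 1
    if not votes:
        return None
    best_bin, _ = votes.most_common(1)[0]
    return best_bin * bin_size

print(estimate_offset(reference, microphone))  # -> 1.5: the recording started 1.5 s into the track
```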

Much depends on the device. We worked with a large fleet of devices. For iPhones it's only about 10 models, and they worked fine in terms of quality and other characteristics. But with Android there is an enormous zoo of devices. Sound synchronization didn't work everywhere; there were cases where, on different devices, different tracks couldn't be heard because of hardware quirks. On some the low frequencies drop out, on others the highs start to wheeze. But if the device had a normalizer on the microphone, synchronization always worked.

Dmitriy: Please tell us about the architecture - what did you use in the project?

Andrei: We built the application in Unity - the easiest option in terms of being multi-platform and working with graphics - and used AR Foundation. We said from the start that we didn't want to overcomplicate the system, so we limited ourselves to the fleet of devices that support ARKit and ARCore, in order to have time to test everything. We made a plug-in for the cifrasoft SDK; it's on our GitHub. And we made a content management system so that the scripted scenes run along a timeline.
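
A toy sketch of that timeline idea, i.e. scripted episodes driven by the synchronized show clock. The episode names and times are invented; the real system lives inside Unity.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    name: str
    start: float        # show time in seconds at which the episode begins
    duration: float
    started: bool = False

class Timeline:
    """Keeps scripted AR episodes and drives them from the synchronized show clock."""
    def __init__(self, episodes):
        self.episodes = sorted(episodes, key=lambda e: e.start)

    def tick(self, show_time: float):
        """Call with the current synchronized show time; returns the active episode
        and the local time inside it (so content can be scrubbed to the right frame)."""
        for ep in self.episodes:
            if ep.start <= show_time < ep.start + ep.duration:
                if not ep.started:
                    ep.started = True
                    print(f"starting episode '{ep.name}'")
                return ep, show_time - ep.start
        return None, 0.0

timeline = Timeline([Episode("fire_birds", 520.0, 180.0), Episode("flags", 1650.0, 240.0)])
print(timeline.tick(530.0))   # fire_birds, 10 seconds into the episode
```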

We fiddled a bit with the particle systems, because a user can join a particular episode at any moment, and they need to see everything from the point in time they synchronized to. We built a system that lets the scripted scenes be played back along the timeline, so the 3D experience can be scrubbed back and forth like a movie. With classic animations this works out of the box, but particle systems needed extra work. At some point they start to spawn, and if you drop in at a moment when they should already be in flight, they haven't been born yet even though they ought to be there. But this problem is actually quite easy to solve.
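
The particle issue boils down to this: if you join mid-episode and simply switch the emitter on at the sync time, the particles that should already be in flight don't exist yet. A generic way to solve it is to deterministically fast-forward the emitter from its own start time up to the synchronized time, as in this toy sketch. This is plain Python for illustration, not engine code; Unity offers a comparable way to pre-simulate a particle system to a given time.

```python
import random

def particles_at(sync_time: float, emitter_start: float, emission_rate: float,
                 lifetime: float, seed: int = 42):
    """Fast-forward a deterministic emitter to `sync_time` and return the particles
    that should currently be alive. Simply 'switching on' the emitter at sync_time
    would give an empty sky for the first `lifetime` seconds after joining."""
    rng = random.Random(seed)   # fixed seed: every viewer reconstructs the same state
    alive = []
    t = emitter_start
    while t <= sync_time:
        birth = t
        # Velocity must be drawn even for already-expired particles to keep the
        # random stream (and thus the reconstructed state) deterministic.
        velocity = (rng.uniform(-1, 1), rng.uniform(2, 5))
        age = sync_time - birth
        if age < lifetime:      # keep only particles that have not expired yet
            pos = (velocity[0] * age, velocity[1] * age - 0.5 * 9.8 * age * age)
            alive.append(pos)
        t += 1.0 / emission_rate
    return alive

# Joining 12 seconds into an episode whose fireworks started at t = 4 s:
print(len(particles_at(sync_time=12.0, emitter_start=4.0, emission_rate=30.0, lifetime=3.0)))
```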

For the mobile part, the architecture is quite simple. For the TV broadcast, everything is more difficult. We had hardware restrictions. The customer set a condition: "Here's the hardware fleet we have; roughly speaking, everything has to run on it." So from the start we were focused on working with relatively inexpensive video capture cards. But budget doesn't mean bad.

We were constrained by the hardware, by the video capture cards and by the working conditions - how we had to receive the picture. The capture cards were Blackmagic Design, working in an internal keying scheme: a video frame arrives from the camera, the card has its own processing chip into which we also feed a frame to be overlaid on top of the incoming one, and the card mixes them itself - we don't touch anything else and don't affect the frame from the camera. It then outputs the result through its video output to the control room. This is a good method for overlaying titles and the like, but it's not very well suited to mixed reality effects, because it puts a lot of restrictions on the render pipeline.

Dmitriy: In terms of real-time computation, object anchoring, or something else?

Andrei: In terms of quality and of achieving the desired effects, because we don't know what we are putting our picture on top of. We just send color and transparency information to be laid over the original stream. Some effects, like refraction, correct transparency, or additional shadows, simply can't be achieved with this scheme; for that you need to render everything together. For example, there is no way to create the effect of air shimmering above a fire or hot asphalt, and the same goes for conveying transparency with the refractive index taken into account. We built the content around these restrictions from the start and tried to use effects that would work.
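
In essence, with internal keying the engine hands the card only a fill (color) and a key (alpha), and the blend happens inside the card, roughly as in the per-pixel sketch below (the frame data is made up). Anything that would need to read the camera pixels on the engine side, such as refraction or heat shimmer, is therefore impossible by construction.

```python
import numpy as np

H, W = 1080, 1920

# What the graphics engine outputs to the capture card:
fill = np.zeros((H, W, 3), dtype=np.float32)   # RGB of the AR overlay ("fill")
key = np.zeros((H, W, 1), dtype=np.float32)    # alpha of the overlay ("key")
fill[400:680, 800:1120] = (1.0, 0.5, 0.1)      # a made-up orange AR object
key[400:680, 800:1120] = 0.8                   # 80% opaque

# What the card does internally with the incoming camera frame; the engine never sees it:
camera = np.random.rand(H, W, 3).astype(np.float32)
mixed = fill * key + camera * (1.0 - key)      # straight per-pixel alpha-over blend

# Anything smarter than this blend (e.g. bending the camera pixels to fake refraction)
# would require access to `camera` on the engine side, which internal keying does not give.
```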

Instagram post: Closing of the II European Games in Minsk (shared by Alena Lanskaya, @alyonalanskaya, Jun 30, 2019)

Dmitriy: On the first project, for the European Games, was the content already your own?

Andrei: No, the main stage of content development was handled by the guys from Sechenov.com. Their graphic artists drew the base content with animations and so on, while we integrated everything into the engine, added extra effects and adapted it all so that it worked correctly.

As for the pipeline, for the TV broadcast we assembled everything on Unreal Engine 4. Coincidentally, that was exactly when Epic started pushing their mixed reality toolkit. It turned out that things were not so simple: all the tools are still raw even now, and we had to finish a lot by hand. In Minsk we worked on a custom build of the engine, that is, we rewrote some things inside the engine so that, for example, shadows could be cast onto real objects. The engine version that was current at the time had no features that allowed this with standard tools, so our guys made their own custom build to provide everything that was vital.

Other nuances and adaptation to WorldSkills in Kazan

Timecode (in the audio version): 31:37

Dmitriy: But all this in a fairly short period of time?

Andrei: The deadlines were tight for the Kazan project; for Minsk they were normal - about six months of development, but bear in mind that only six people were involved. At the same time we were doing the mobile part and developing tools for the TV production. It wasn't just about outputting the picture; there was, for example, an optical tracking system, and we had to build our own tools for that.

Dmitriy: Was there an adaptation from one project to the other? Within a month and a half you had to build on the existing work and move the project, with new content, to a new venue?

Andrei: Yes, it was exactly a month and a half. We had planned a two-week vacation for the whole team after the Minsk project, but right after the closing ceremony the guys from Sechenov.com came up and said: "Well, let's do Kazan then." We still managed to take a short break, but we switched to the new project rather quickly. Some of the work had already been done on the technical side. Most of the time went into content, because for WorldSkills we made it entirely ourselves and only coordinated it with the directing team; on their side there was just the script. But that made it easier - no extra iterations were needed. When you make the content yourself, you immediately see how it works in the engine and can quickly edit and agree on it.


As for the mobile part, we took into account all the subtleties we'd run into in Minsk. We made a new design for the application, reworked the architecture a bit and added tutorials, while trying to keep them as short and clear as possible. We reduced the number of user steps from launching the application to viewing the content. A month and a half was enough to make a decent project; a week and a half before the event we went to the venue. It was easier to work there, because full control over the project was in the hands of the organizers and there was no need to coordinate with other committees. Working in Kazan was simpler overall, so it was quite alright that there was less time.

Dmitriy: But you decided to keep the synchronization approach as it was, by sound?

Andrei: Yes, we kept the sound-based sync. It worked well; as the saying goes, if it works, don't touch it. We just took into account the nuances of the audio track quality. When they did the intro there was essentially a practice episode, so people could try it before the show started. It was surprising that when a track is playing in the stadium amid a storm of "live" applause, the system can still synchronize well against that track, but if recorded applause is mixed into the track at that moment, the track stops being recognized. We took those nuances into account, and in terms of sound everything synchronized quite well.

P.S. In the second part of the episode, we talk about scientific data visualization, process modeling in other projects, game development and the "Computer Game Development Technology" master's program. We will publish the continuation in the next article. You can listen to us and show your support here:

P.P.S. Meanwhile, on the English-language version of Habr: a closer look at ITMO University.

Source: habr.com
