🥇How and why we won the Big Data track at the Urban Tech Challenge hackathon

My name is Dmitry, and I want to tell you how our team reached the finals of the Urban Tech Challenge hackathon on the Big Data track. I should say right away that this is not the first hackathon I have taken part in, nor the first where I have won a prize. So in this story I also want to share some general observations and conclusions about the hackathon industry as a whole, and to offer my own point of view as a counterweight to the negative reviews that appeared online right after the Urban Tech Challenge ended (this one, for example).

So, first, some general observations.

1. Surprisingly, quite a few people naively think that a hackathon is a kind of sports competition where the best coders win. This is wrong. I am setting aside the cases where the organizers themselves do not know what they want (I have seen that too). As a rule, though, a company that organizes a hackathon pursues its own goals. These can vary: a technical solution to some problem, a search for new ideas and people, and so on. The goals often determine the format of the event, its timing, whether it is online or offline, how the tasks are formulated (and whether they are formulated at all), whether there will be a code review, and so on. Both the teams and their work are evaluated from this point of view. The winning teams are the ones that best hit what the company is looking for, and many hit it quite unconsciously and by accident, thinking that they really are taking part in a sporting event. My observations show that, to motivate participants, the organizers need to create at least the appearance of a sporting environment and equal conditions; otherwise they get a wave of negativity, as in the review mentioned above. But I digress.

2. Hence the next conclusion. The organizers want participants to come to the hackathon with their own prior work; sometimes a special online stage is even arranged for this. It lets them get stronger solutions in the end. The concept of “own work” is very relative: any experienced programmer can pull thousands of lines of code from old projects into the first commit. Does that count as pre-prepared work? In any case, the following rule applies, which I expressed in the form of a well-known meme:

To win, you must have something, some kind of competitive advantage: a similar project you did in the past, knowledge and experience in a specific topic, or a ready-made piece of work built before the hackathon starts. Yes, it is not sporting. Yes, it may not repay the effort (everyone decides for themselves whether it is worth coding at night for 3 weeks for a prize of 100 thousand split across the whole team, with the risk of not getting it at all). But often this is the only chance to get ahead.

3. Team selection. As I have noticed in hackathon chats, many people take this question rather lightly, although it is the most important decision and will determine your result. In many areas (in sports as well as at hackathons) I have seen that strong people tend to team up with the strong, the weak with the weak, the smart with the smart; you get the idea. Roughly the same thing happens in the chats: more or less strong programmers are snapped up immediately, while people without skills that are valuable at a hackathon hang around in the chat for a long time and pick a team on the principle of “as long as someone takes me”. At some hackathons participants are assigned to teams at random, and the organizers claim that random teams perform no worse than established ones. But in my observation, motivated people usually find a team on their own, and those who have to be assigned often do not show up at all.

As for the composition of the team, it is very individual and depends heavily on the task. I would say that the minimum viable team is a front-end developer plus a designer, or a back-end developer, a front-end developer, and a designer. But I also know of winning teams made up only of front-end developers, who bolted on a simple node.js backend or built a mobile application in React Native, and of teams made up only of back-end developers, who did a simple layout themselves. My plan was to assemble or join a team of the form front-end developer, back-end developer, designer (I am a front-end developer myself). Fairly quickly I got talking to a Python back-end developer and a designer, who accepted an invitation to join us. A little later a business analyst asked to join; she already had experience of winning a hackathon, and that settled the question of taking her on. After a short meeting we decided to call ourselves U4 (URBAN 4, Urban Four) by analogy with the Fantastic Four. We even put the corresponding picture on the avatar of our Telegram channel.

4. Selecting a task. As I said, you must have a competitive advantage, and the task is chosen on that basis. With that in mind, after looking through the task list and assessing the complexity of each task, we settled on two: a catalog of innovative enterprises from DPiIR and a chat bot from EFKO. The back-end developer chose the DPiIR task; I chose the EFKO task because I had experience writing chat bots on node.js and DialogFlow. The EFKO task also assumed ML, in which I have some experience, though not much. Judging by the task statement, it seemed to me unlikely that it could be solved with ML. That feeling grew stronger when I went to the Urban Tech Challenge meetup, where the organizers showed me the EFKO dataset: about 100 photos of product layouts (taken from different angles) and about 20 classes of layout errors, which is about five photos per class on average. At the same time, the customers wanted a classification success rate of 90%. In the end I prepared a presentation of a solution without ML, the back-end developer prepared a presentation for the catalog, and after finalizing both together we sent them to the Urban Tech Challenge. Already at this stage the motivation and contribution of each participant became clear. Our designer did not take part in the discussions, answered late, and filled in the information about himself in the presentation at the very last moment; in short, there were doubts.

As a result, we were selected for the DPiIR task and were not at all upset about being rejected for the EFKO one, since that task seemed to us, to put it mildly, strange.

5. Preparing for the hackathon. Once it was finally confirmed that we were going to the hackathon, we started preparing a starter project. I am not urging you to start writing code a week before the hackathon begins. But at a minimum you should have a boilerplate ready, so that you can start working immediately without setting up tools and without running into bugs in some library you decided to try for the first time at the hackathon. I know a story about Angular developers who came to a hackathon and spent both days setting up the project build, so everything should be prepared in advance. We planned to split responsibilities as follows: the back-end developer writes crawlers that scour the Internet and put everything they collect into the database, while I write an API on node.js that queries this database and sends data to the front end. So I prepared a server skeleton on express.js and a front-end skeleton on React in advance. I don't use CRA; I always configure webpack myself, and I know very well what risks that carries (remember the story about the Angular developers). At this point I asked our designer for interface drafts, or at least mockups, to get an idea of what I would be laying out. In theory he should also have done his own preparation and coordinated it with us, but I never got an answer. In the end I borrowed the design from one of my old projects. That turned out even faster, since all the styles for that project were already written. Hence the conclusion: a team does not always need a designer))). With this groundwork we came to the hackathon.

6. Work at the hackathon. I saw my team in person for the first time only at the opening of the hackathon at the CDP. We met and discussed the solution and the stages of work on the task. Although after the opening we were supposed to go by bus to Red October, we went home to sleep, having agreed to be on site by 9.00. Why? The organizers apparently wanted to squeeze the maximum out of the participants, which is why they set up such a schedule. In my experience you can code normally through one sleepless night, but I am not sure about a second. A hackathon is a marathon; you need to gauge and plan your strength sensibly. Besides, we had our starter project ready.

So, having had a good night's sleep, at 9.00 we were sitting on the sixth floor of Deworkacy. Then our designer unexpectedly announced that he did not have a laptop, that he would work from home, and that we would communicate by phone. This was the last straw. And so we turned from a four into a three, although we did not change the name of the team. Again, this was not a heavy blow for us, since I already had the design from the old project. At first everything went quite smoothly and according to plan. We loaded the organizers' dataset of innovative companies into the database (we had decided to use neo4j). I started on the layout, then moved on to node.js, and that is where the problems began. I had never worked with neo4j before. First I looked for a working driver for this database, then I figured out how to write a query, and then I was surprised to find that a query returns entities as an array of node objects and their edges. That is, when I requested an organization and all its data by TIN, instead of a single organization object I got back a long array of objects containing the data on this organization and the relationships between them. I wrote a mapper that walked through the entire array and glued all the objects for each organization into one object. But under real load, on a database of 8 thousand organizations, it ran extremely slowly, about 20 to 30 seconds. I started thinking about optimization... And then we stopped in time and moved to MongoDB, which took us about 30 minutes. In total, about 4 hours were lost on neo4j.
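
For readers who have not run into this, here is a minimal sketch in TypeScript of the kind of mapper described above. It is not the code from the hackathon: the `GraphNode`, `GraphEdge` and `Organization` shapes, the `tin` property and the `HAS_CEO` relationship type are all assumptions made for illustration, and the real neo4j driver returns richer objects. The sketch indexes nodes by id first, so every edge is attached with a direct lookup instead of a repeated scan of the whole array.

```typescript
// Hypothetical, simplified shapes (not the neo4j driver's real types).
interface GraphNode {
  id: number;
  label: string; // e.g. "Organization", "Person"
  props: Record<string, unknown>;
}

interface GraphEdge {
  from: number; // id of the source node
  to: number; // id of the target node
  type: string; // e.g. "HAS_CEO"
}

interface Organization {
  tin: string;
  props: Record<string, unknown>;
  // Properties of related nodes, grouped by relationship type.
  related: Record<string, Record<string, unknown>[]>;
}

function mergeOrganizations(nodes: GraphNode[], edges: GraphEdge[]): Organization[] {
  const nodesById: Record<number, GraphNode> = {};
  const orgsById: Record<number, Organization> = {};
  const result: Organization[] = [];

  // Pass 1: index every node by id and create one object per organization.
  for (const node of nodes) {
    nodesById[node.id] = node;
    if (node.label === "Organization") {
      const org: Organization = {
        tin: String(node.props["tin"]),
        props: node.props,
        related: {},
      };
      orgsById[node.id] = org;
      result.push(org);
    }
  }

  // Pass 2: attach the target of each edge to the organization it starts from.
  for (const edge of edges) {
    const org = orgsById[edge.from];
    const target = nodesById[edge.to];
    if (!org || !target) continue; // skip edges that do not start at a known organization
    const list = org.related[edge.type] || [];
    list.push(target.props);
    org.related[edge.type] = list;
  }

  return result;
}

// Usage with made-up data.
const merged = mergeOrganizations(
  [
    { id: 1, label: "Organization", props: { tin: "0000000000", name: "Example LLC" } },
    { id: 2, label: "Person", props: { fullName: "Ivan Ivanov" } },
  ],
  [{ from: 1, to: 2, type: "HAS_CEO" }],
);
console.log(JSON.stringify(merged, null, 2));
```

Whether something like this would have been fast enough on 8 thousand organizations I cannot say; it only illustrates the shape of the problem.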

Remember: never bring a technology you are not familiar with to a hackathon, because there may be surprises. But apart from this failure, everything went according to plan, and by the morning of December 9 we already had a fully working application. We planned to spend the rest of the day adding extra features. From then on everything went relatively smoothly for me, but the back-end developer had a whole bunch of problems: search engines banned his crawlers, and spam from legal-entity aggregators took the top places in the search results for every company query. But he had better tell that story himself. The first extra feature I bolted on was a search for the CEO on VKontakte by full name. This took several hours.

So the company page in our application now showed the CEO's avatar, a link to his VKontakte page and some other data. It was a nice icing on the cake, although it may not have been what secured our victory. Next I wanted to bolt on some kind of analytics. But after going through many options (a lot of nuances came up with the UI), I settled on the simplest one: aggregating organizations by economic activity code. In the evening, in the last hours, I was already laying out a template for displaying innovative products (our application was supposed to have a Products and Services section), although the backend for it was not ready. Meanwhile the database was growing by leaps and bounds, the crawlers kept working, and the back-end developer was experimenting with NLP to distinguish innovative texts from non-innovative ones))). But it was time for the final presentation.
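
To show what “the simplest aggregation” amounts to, here is a rough TypeScript sketch that counts organizations per economic activity code. It is an illustration under assumptions, not our hackathon code: the `Company` shape, the `activityCode` field name and the sample codes are made up.

```typescript
// Hypothetical record shape; the field names are assumptions.
interface Company {
  name: string;
  activityCode: string; // economic activity code
}

interface ActivityCount {
  code: string;
  count: number;
}

// Count companies per activity code, most frequent codes first.
function countByActivityCode(companies: Company[]): ActivityCount[] {
  const counts: Record<string, number> = {};
  for (const company of companies) {
    counts[company.activityCode] = (counts[company.activityCode] || 0) + 1;
  }
  return Object.keys(counts)
    .map((code) => ({ code, count: counts[code] || 0 }))
    // Sort by count descending; ties are ordered by code for a stable result.
    .sort((a, b) => b.count - a.count || a.code.localeCompare(b.code));
}

// Usage with made-up data.
const stats = countByActivityCode([
  { name: "Alpha", activityCode: "62.01" },
  { name: "Beta", activityCode: "72.19" },
  { name: "Gamma", activityCode: "62.01" },
]);
console.log(stats);
```

With the data in MongoDB, the same grouping could presumably be pushed into the database with a `$group` stage of an aggregation pipeline rather than done in application code.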

7. Presentation. From my own experience, you should switch to preparing the presentation about 3-4 hours before you have to deliver it, especially if it includes a video: shooting and editing take quite a lot of time. We were supposed to have a video, and we had a dedicated person who took care of it and also handled a number of other organizational matters. Thanks to that, we were not distracted from coding until the very last moment.

8. Pitch. I did not like that the presentations and the final were held on a separate weekday (Monday). Most likely this was a continuation of the organizers' policy of squeezing the maximum out of the participants. I had not planned to take time off work and only wanted to come to the final, although the rest of my team had taken the day off. However, my emotional immersion in the hackathon was already so deep that at 8 in the morning I wrote in my work team's chat (my job, not the hackathon team) that I was taking an unpaid day off, and went to the CDP for the pitches. There were a lot of pure data scientists on our task, and this strongly affected how teams approached it. Many had good data science, but no one had a working prototype, and many could not get around the search engines banning their crawlers. We were the only team with a working prototype, and we knew how to solve the problem. As a result we won the track, although we were very lucky to have chosen the least competitive task. Looking at the pitches in other tracks, we realized that we would not have stood a chance there. I also want to say that we were very lucky with the jury: they checked the code meticulously. Judging by the reviews, that did not happen in every track.

9. Final. After being called to the jury several times for code review, we thought we had finally settled everything and went to have lunch at Burger King. There the organizers called us again, and we had to hastily pack up our orders and go back.

The organizer showed us which room to go to, and when we walked in we found ourselves at a public speaking training session for the winning teams. The guys who were going to speak on stage got well fired up; they all came out like real showmen.

And I must admit that in the final, against the backdrop of the strongest teams from other tracks, we looked pale; the win in the government customer category deservedly went to the team from the real estate tech track. I think the key factors behind our victory in the track were: having a ready-made starter project, which let us build a prototype quickly; the standout features in the prototype (the search for CEOs on social networks); and the NLP skills of our back-end developer, which also greatly interested the jury.

And in conclusion, the traditional thanks to everyone who supported us, to the jury of our track, to Evgeny Evgrafiev (the author of the task we solved at the hackathon) and, of course, to the organizers. It was perhaps the biggest and coolest hackathon I have ever taken part in, and we can only wish the guys to keep up such a high standard in the future!

Source: habr.com
