From physicists to Data Science (From engines of science to office plankton). The third part

This picture by Artur Kuzin (n01z3) summarizes the content of this blog post quite accurately. As a consequence, what follows should be read as a Friday story rather than anything especially useful or technical. It is also worth noting that the text is saturated with English words. Some of them I don't know how to translate correctly, and some I just don't want to translate.

The first part.
The second part.

The first two parts describe how I moved from academia to industry. This one is about what happened next.

It was January 2017. By then I had a little over a year of work experience, and I was working in San Francisco at TrueAccord as a Sr. Data Scientist.

TrueAccord is a debt collection startup. Simply put, it's a collection agency. Collectors usually make a lot of phone calls. We sent a lot of emails and called only a little. Each email led to the company's website, where the debtor was offered a discount on the debt and could even pay in installments. This approach improved collection rates, scaled well, and led to fewer lawsuits.

The company was fine. The product was clear. The management was reasonable. The location was good.

On average, people in the Valley stay at one place for about a year and a half. That is, any company you work for is just one small step. On that step you earn some money and gain new knowledge, skills, connections, and lines in your resume. Then you move on to the next step.

At TrueAccord itself, I bolted recommender systems onto the email campaigns and onto the prioritization of phone calls. The impact was clear and, thanks to A/B testing, could be measured quite well in dollars. Since there had been no real machine learning before I arrived, the impact of my work was decent. Again, improving on nothing is much easier than improving something that is already highly optimized.

After six months of working on these systems, I even had my base pay raised from $150k to $163k. In the Open Data Science (ODS) community there is a meme about $163k. This is where it comes from.

All of this was lovely, but it led nowhere, or rather it led somewhere, just not where I wanted to go.

I have a lot of respect for TrueAccord, both for the company and for the people I worked with there. I learned a lot from them, but I did not want to spend years working on recommender systems at a collection agency. It was time to step off this step in some direction. If not forward and up, then at least sideways.

What didn't I like?

  1. From a machine learning point of view, the tasks did not excite me. I wanted something fashionable and youthful, that is, Deep Learning, Computer Vision, something fairly close to science, or at least to alchemy.
  2. A startup, and a collection agency at that, has trouble hiring highly qualified people. As a startup, it can't pay much. And as a collection agency, it loses on status. Roughly speaking, if a girl on a date asks where you work, "Google" sounds way better than "collection agency". It bothered me a little that for my friends who work at Google or Facebook, unlike for me, doors opened like this: they get invited to conferences or meetups as speakers, or more interesting people write to them on LinkedIn offering to meet and talk over a cup of tea. I really enjoy meeting people I don't know in person. So if you live in San Francisco, feel free to write - we'll go for coffee and talk.
  3. Besides me, there were three Data Scientists at the company. I did machine learning, and they handled other data science tasks, of which any startup has an endless supply. As a result, they did not really know machine learning. And in order to grow, I need someone to talk to, to discuss papers and the latest developments with, and to ask for advice, after all.

What did I have?

  1. An education in physics, not computer science.
  2. The only programming language I knew was Python. I had a feeling that I ought to get on familiar terms with C++, but I never got around to it.
  3. A little over a year of industry experience. And at work I did neither Deep Learning nor Computer Vision.
  4. Not a single paper on Deep Learning / Computer Vision in my resume.
  5. A Kaggle Master title.

What did I want?

  1. A position where I would have to train a lot of networks, preferably close to computer vision.
  2. Preferably a big company like Google, Tesla, Facebook, Uber, LinkedIn, etc. Although in a pinch, a startup would do.
  3. I did not need to be the biggest machine learning expert on the team. I badly needed senior colleagues, mentors, and all kinds of communication that would speed up the learning process.
  4. After reading blog posts about how graduates with no industry experience get total compensation of $300-500k per year, I wanted to be in the same range. It's not that this infuriated me, but if people say this is common and I have less, then that is a signal.

The task looked quite solvable, though not in the sense that I could kick open the door of any company, but rather that if I ground away at it, everything would work out. That is, dozens or hundreds of attempts, with the pain of every failure and every rejection used to sharpen focus, improve memory, and stretch the day to 36 hours.

I tweaked my resume, started sending it out, and went to interviews. For the most part, I was rejected at the HR stage. Many positions required C++, which I didn't know, and I had a strong feeling that I wouldn't be very interested in positions that require C++.

It is worth noting that around the same time there was a phase transition in the type of competitions on Kaggle. Before 2017 there was a lot of tabular data and only rarely image data, but from 2017 on there were a lot of computer vision tasks.

Life settled into this routine:

  1. Work during the day.
  2. Time off whenever there is a tech screen / onsite.
  3. Evenings and weekends: Kaggle + papers / books / blog posts.

At the end of 2016 I joined the Open Data Science (ODS) community, which made a lot of things easier. The community has plenty of people with rich industry experience, which let me ask a lot of stupid questions and get a lot of smart answers. It also has very strong machine learning specialists of all stripes, which unexpectedly solved my need for regular, in-depth conversation about Data Science. To this day, when it comes to ML, ODS gives me many times more than I get at work.

And, as you would expect, ODS has plenty of specialists in competitions on Kaggle and other platforms. Solving problems in a team is more fun and more productive, so with jokes, obscenities, memes, and other nerdy entertainment, we started knocking out competitions one after another.

In March 2017, in a team with Seryoga Mushinsky, I took third place in Dstl Satellite Imagery Feature Detection. A gold medal on Kaggle + $20k for the two of us. On this task I leveled up in working with satellite images and in binary segmentation via UNet. There is a blog post on Habr on this topic.

That same March, I went to an interview at NVIDIA with the Self Driving team. I floundered badly on questions about Object Detection. I did not know enough.

Luckily, at the same time, an Object Detection competition on aerial imagery from the same DSTL began. It was a heaven-sent chance to work on the problem and level up. A month of evenings and weekends. I picked up the knowledge and finished second. The rules of this competition had an interesting nuance, which led to my being shown in Russia on federal and not-so-federal TV channels. I ended up on the front page of Lenta.ru and in a bunch of print and online publications. Mail Ru Group got a bit of positive PR at my expense and with its own money, and fundamental science in Russia was enriched by 12,000 pounds. As usual, a blog post on Habr was written on this topic. Go there for details.

At the same time, a Tesla recruiter contacted me and offered to talk about a Computer Vision position. I agreed. I breezed through the take-home, two tech screens, and the onsite interview, and had a very pleasant conversation with Andrej Karpathy, who had just been hired by Tesla as Director of AI. The next stage was the background check. After that, Elon Musk had to personally approve my application. Tesla has a strict Non Disclosure Agreement (NDA).
I didn't pass the background check. The recruiter said that I talk a lot online, violating the NDA. The only place where I had said anything about the Tesla interview was ODS, so the current hypothesis is that someone took a screenshot and sent it to HR at Tesla, and I was removed from the race to be on the safe side. At the time it was upsetting. Now I'm glad it didn't work out. My current position is much better, although it would have been very interesting to work with Andrej.

Right after that, I plunged into the Kaggle satellite imagery competition from Planet Labs - Understanding the Amazon from Space. The task was simple and extremely boring; no one wanted to solve it, but everyone wanted a free gold medal or prize money. So a team of 7 Kaggle Masters agreed that we would throw hardware at it. We trained 480 networks in 'fit_predict' mode and built a three-level ensemble out of them. We finished seventh. There is a blog post by Artur Kuzin describing the solution. By the way, Jeremy Howard, who is widely known as the creator of Fast AI, finished 23rd.

After the competition ended, I organized a Meetup at the AdRoll office through a friend who worked there. Representatives of Planet Labs talked about what organizing the competition and labeling the data looked like from their side. Wendy Kwan, who works at Kaggle and curated the competition, talked about how she saw it. I described our solution, tricks, and technical details. Two-thirds of the audience had worked on this problem, so the questions were to the point and overall everything went great. Jeremy Howard was there too. It turned out that he finished in 23rd place because he did not know how to stack models; he did not know about this method of building ensembles at all.

Machine learning meetups in the Valley are very different from meetups in Moscow. As a rule, meetups in the Valley are dismal. But ours turned out well. Unfortunately, the friend who was supposed to press the button and record everything did not press the button 🙁

After that, I was invited to interview for the Deep Learning Engineer position at that same Planet Labs, straight to the onsite. I didn't pass. The stated reason for the rejection was insufficient knowledge of Deep Learning.

I listed each competition as a project on LinkedIn. For the DSTL task, we wrote a preprint and uploaded it to arXiv. Not a paper, but better than nothing. I recommend that everyone else also beef up their LinkedIn profile with competitions, papers, skills, and so on. There is a positive correlation between how many keywords you have on your LinkedIn profile and how often recruiters write to you.

In winter and spring I had floundered badly on the technical side, but by August I had both the knowledge and the self-confidence.

At the end of July, a guy who worked as a Data Science manager at Lyft contacted me on LinkedIn and invited me for coffee to talk about life, about Lyft, about TrueAccord. We talked. He offered me an interview for a Data Scientist position on his team. I said it was a workable option, provided that it would be Computer Vision / Deep Learning from morning to night. He assured me that he had no objections.

I sent my resume, and he uploaded it to the internal Lyft portal. After that, a recruiter called me to go over my resume and find out more about me. From his very first words, it seemed that this was a formality for him, since it was obvious to him from the resume that I was "not Lyft material". I guess my resume ended up in the trash after that.

All this time, while I was interviewing, I discussed my failures and falls in ODS, and the guys gave me feedback and helped in every way with advice, although, as usual, there was plenty of friendly trolling too.

One of the ODS members offered to introduce me to a friend of his, the Director of Engineering at Lyft. No sooner said than done. I came to Lyft for lunch, and besides this friend there were also the Head of Data Science and a Product Manager who was a big fan of Deep Learning. Over lunch we talked about DL. Since I had been training networks 24/7 for half a year, reading literature by the cubic meter, and grinding through Kaggle tasks with more or less decent results, I could talk about Deep Learning for hours, both in terms of new papers and in terms of practical techniques.

After lunch, they looked at me and said: it's obvious right away that you're great, would you like to interview with us? They added that, as far as they understood, the take-home and tech screen could be skipped, and that I would be invited straight to the onsite. I agreed.

After that, the same recruiter called me to schedule the onsite interview, and he was displeased. He mumbled something about how one shouldn't go over people's heads.

I came to the onsite interview. Five hours of talking to different people. There was not a single question about Deep Learning, or, for that matter, about machine learning at all. And since there was no Deep Learning / Computer Vision, I wasn't interested. So the results of the interview were orthogonal to me.

The recruiter calls and says: congratulations, you've made it to the second onsite interview. This was all surprising. What second onsite? I had never heard of such a thing. I went. A couple of hours, this time all about traditional machine learning. That was better. But still not interesting.

The recruiter calls to congratulate me on making it to the third onsite interview and swears that this will be the last one. I went to that one too, and this time there was both DL and CV.

After all those months, I had a strong prior that there would be no offer. I would fail, if not on technical skills, then on soft skills. And if not on soft skills, then because the position would be closed, or because the company was not actually hiring yet and was simply probing the market and the level of candidates.

Mid-August. I'd had a good amount of beer. My thoughts were dark. Eight months and still no offer. Beer is good for creativity, especially the strange kind. An idea comes to mind. I share it with Alexey Shvets, who at that time was a postdoc at MIT.

What if we take the nearest DL/CV conference, look at the competitions held as part of it, train something, and make a submission? Since all the experts there build their careers on this and have been working on it for many months or even years, we have no chance. But that's not a problem. We make some kind of meaningful submission, land in last place, and then write a preprint or a paper about how we are not like everyone else and describe our solution. And the paper goes straight onto LinkedIn and into the resume.

That is, it would be more or less on topic and would put more of the right keywords in the resume, which should slightly increase the chances of getting to a tech screen. Code and submissions from me, text from Alexey. It's a wild idea, of course, but why not?

No sooner said than done. The nearest conference we googled was MICCAI, and it really did have competitions. We clicked on the first one. It was Gastrointestinal Image Analysis (GIANA). The task had 3 subtasks. The deadline was 8 days away. In the morning I sobered up, but I did not discard the idea. I took my pipelines from Kaggle and switched them from satellite data to medical data. 'fit_predict'. Alexey prepared a two-page description of the solution for each subtask, and we sent it off. Done. In theory, we could relax. But it turned out that the same workshop had another task (Robotic Instrument Segmentation) with three subtasks, and that its deadline had been pushed back by 4 days, that is, we could do 'fit_predict' there too and send it in. So we did.

Unlike Kaggle, these competitions had their own academic specifics:

  1. No leaderboard. Submissions are sent by email.
  2. You are disqualified if a team representative does not come to the conference to present the solution at the workshop.
  3. Your place on the leaderboard becomes known only during the conference. A sort of academic drama.

The MICCAI 2017 conference was held in Quebec City. To be honest, by September I was starting to burn out, so the idea of taking a week off from work and heading to Canada looked appealing.

I arrived at the conference. I came to the workshop, I don't know anyone, I'm sitting in a corner. Everyone knows each other; they chat among themselves and throw around clever medical terms. The review of the first competition begins. Participants get up and talk about their solutions. Clever stuff, done with flair. My turn. And I'm kind of ashamed. They solved the problem, worked on it, moved science forward, and we just did a 'fit_predict' on our past work, not for science but to pad the resume.

I went up, said that I was no expert in medicine at all, apologized for wasting their time, and showed a single slide with our solution. Then I went back to my seat.

They announce the first subtask - we are first, and by a margin.
They announce the second - third.
They announce the third - first again, and again by a margin.
Overall - first.

Official press release.

Some people in the audience are smiling and looking at me with respect. Others, those who were apparently considered experts in the field, had won grants for this task, and had been working on it for many years, had slightly twisted faces.

Next comes the second task, the one with three subtasks whose deadline was pushed back by four days.

Here I apologized as well and again showed our single slide.
The same story. Two firsts, one second, first overall.

I think this is probably the first time in history that a collection agency has won a medical imaging competition.

And so I'm standing on the stage, they hand me yet another diploma, and I'm fuming. How is this even possible, damn it? These academics spend taxpayer money and work on simplifying and improving the quality of doctors' work, that is, in theory, on my life expectancy, and some random guy tore this whole academic crowd to shreds in a few evenings.

On top of that, the graduate students on the other teams, who have been working on these tasks for many months, will have resumes that look good to HR, that is, they will easily make it to a tech screen. And I have a freshly received email in front of my eyes:

A Googler recently referred you for the Research Scientist, Google Brain (United States) role. We carefully reviewed your background and experience and decided not to proceed with your application at this time.

So, right from the stage, I ask the audience: "Does anyone know where I work?" One of the competition organizers knew - he had googled what TrueAccord was. The rest did not. I continue: "I work at a collection agency, and at work I do neither Computer Vision nor Deep Learning. And in many ways this is because the HR departments of Google Brain and DeepMind filter out my resume without giving me a chance to show my technical skills."

They handed me the diploma, and there was a break. A group of academics pulls me aside. It turned out to be the Health group from DeepMind. They were so impressed that they immediately wanted to talk to me about a Research Engineer vacancy on their team. (In the end, we did talk. The process dragged on for 6 months; I passed the take-home and the quiz but was cut at the tech screen. 6 months from first contact to the tech screen is a long time. A long wait leaves an aftertaste of being unwanted. Against the backdrop of TrueAccord, Research Engineer at DeepMind in London was a big step up, but against the backdrop of my current position it is a step down. Looking back from two years later, it's good that it didn't work out.)

Conclusion

Around the same time, I did receive an offer from Lyft, which I accepted.
The following were published based on the results of these two MICCAI competitions:

  1. Automatic instrument segmentation in robot-assisted surgery using deep learning
  2. Angiodysplasia detection and localization using deep convolutional neural networks
  3. 2017 Robotic instrument segmentation challenge

That is, for all the wildness of the idea, adding incremental papers and preprints through competitions works well. And in the years that followed, we pushed this even further.

I've been at Lyft for the last couple of years doing Computer Vision / Deep Learning for Self Driving cars. In other words, I got what I wanted: the tasks, a high-status company, strong colleagues, and all the other goodies.

During those months, I talked with large companies such as Google, Facebook, Uber, and LinkedIn, as well as with a sea of startups of various sizes.

All those months were painful. Every day the universe tells you something unpleasant. You get rejected regularly, you make mistakes regularly, and all of it is seasoned with a persistent sense of hopelessness. There are no guarantees that everything will work out, but there is a feeling that you are a fool. It strongly reminded me of how I tried to find a job right after university.

I think many people have looked for work in the Valley and had a much easier time. The trick, in my opinion, is this. If you are looking for a job in a field that you know well, have abundant experience in, and your resume says the same, there is no problem. You go and find one. There is a sea of vacancies.

But if you look for a job in a field that is new to you, that is, when you have no knowledge, no connections, and your resume says the wrong thing, that is when everything becomes extremely interesting.

Right now, recruiters regularly write to me and offer to have me do the same thing I do now, but at a different company. It is about time to change jobs. But there is no point in going somewhere to do what I already know how to do well. What for?

But for what I want, I again have neither the knowledge nor the lines in my resume. Let's see how this all ends. If it all works out, I will write the next part. 🙂

Source: habr.com
