Notes of a Data Scientist: where to start and is it necessary?

TL;DR: this is a post of questions and answers about Data Science and about how to enter the profession and grow in it. In the article I go through the basic principles and the FAQ, and I am ready to answer your specific questions: write in the comments (or in a personal message) and I will try to answer everything within a few days.

With the appearance of the "Data Satanist" series of notes, a lot of messages and comments arrived asking how to start and where to dig, so today we will go through the main skills and the questions that came up after those publications.

Nothing here claims to be the ultimate truth; it is the subjective opinion of the author. We will go through the things that seem most important along the way.

Why exactly is this needed

To make the goal more achievable and at least somewhat concrete: if you want to become a DS or Research Scientist at Facebook / Apple / Amazon / Netflix / Google, look at the requirements, languages and skills needed for that specific position. What does the hiring process look like? What does a normal day in such a role look like? What does the average profile of a person who works there look like?

Often a person does not really understand what exactly they want, and it is not clear how to prepare for such a vague picture, so it is worth having at least a rough plan of what exactly you want.

Make your current view of the goal concrete

Even if it changes along the way (and changing plans as you go is perfectly normal), it is worth having a goal in front of you and focusing on it, periodically evaluating and rethinking it.

Will it still be relevant

That is, by the time you actually get to the position.

Imagine that before getting your position you need to earn a PhD, work for 2-3 years in the industry and, on top of that, take monastic vows and meditate in a monastery. Won't the situation with Data Science turn out the same as it once did with economists and lawyers? Won't everything change beyond recognition in the area you want to work in?

Isn't there a good chance that everyone will rush there now, and we will end up with a wide layer of people trying to enter the profession and only a tiny number of entry-level positions?

It may be worth considering trends when choosing a path: not only the current state of the labor market, but also your idea of how it is changing and where it is heading.

For example, the author did not plan to become a "Data Satanist", but during the PhD worked on side projects whose skills overlapped heavily with DS and, at the end of graduate school, moved into the field naturally after seeing a good position.

If along the way it turns out that you need to go somewhere else, because that is where the most interesting things are happening now, then you go there in the same natural way.

Skill breakdown

These are rough categories of skills that seem to me to be key to full and effective work in DS. Separately, I will single out English: learn it whatever you do in CS. The key categories follow.

Programming/Scripting

What languages do you need to know? Python? Java? Shell scripting? Lua? SQL? C++?

What exactly you need to be able to do in terms of programming, and why, varies widely from position to position.

For example, I often have to implement complex logic, queries, models and analytics, and generally develop interpretable systems, but there are almost never any requirements on code speed beyond the most general and reasonable ones.

Therefore, my skill set is very different from that of the people who write the TensorFlow library and think about optimizing code for efficient use of the L1 cache and similar things, so look at what you need and choose the right way to learn accordingly.

For example, for Python people have even put together roadmaps for learning the language.

Surely there is already tried-and-tested advice for your needs, and good sources exist; you need to settle on a list and start working through it.

Understanding business processes

You cannot get anywhere without it: you need to understand why you are needed in this process, what you are doing and why. Often this is what saves you a lot of time, maximizes the benefit you bring and keeps you from wasting time and resources on bullshit.

I usually ask myself the following questions:

  • What exactly do I do in the company?
  • What for?
  • Who will use it and how?
  • What options do I have?
  • What are the limits of the parameters?

A little more about the parameters: you can often change the whole course of the work if you know that something can be sacrificed. It might be interpretability. Or, the other way round, a couple of percent of quality will not matter here, we have a very fast solution, and that is exactly what the client needs, because they pay for the time the pipeline runs in AWS.
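The trade-off described above can be sketched in code. This is only an illustration under stated assumptions, not a real benchmark: the candidate models, their accuracy and runtime figures, the hourly price and the `pick_model` helper are all hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy_pct: float    # validation accuracy in percent (made-up numbers)
    runtime_hours: float   # duration of one pipeline run (made-up numbers)
    interpretable: bool


def pick_model(candidates, max_loss_pct, need_interpretable=False):
    """Return the fastest candidate whose accuracy is within `max_loss_pct`
    percentage points of the best one, optionally requiring interpretability."""
    pool = [c for c in candidates if c.interpretable or not need_interpretable]
    if not pool:
        raise ValueError("no candidate satisfies the constraints")
    best = max(c.accuracy_pct for c in pool)
    good_enough = [c for c in pool if best - c.accuracy_pct <= max_loss_pct]
    return min(good_enough, key=lambda c: c.runtime_hours)


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    Candidate("gradient boosting", 91.0, 6.0, interpretable=False),
    Candidate("random forest", 90.0, 2.0, interpretable=False),
    Candidate("logistic regression", 88.0, 0.2, interpretable=True),
]

HOURLY_PRICE = 3.0  # assumed price of one compute hour, arbitrary units

if __name__ == "__main__":
    for loss in (0.0, 1.0, 3.0):
        choice = pick_model(CANDIDATES, max_loss_pct=loss)
        cost = choice.runtime_hours * HOURLY_PRICE
        print(f"tolerated loss {loss:.0f} pp: {choice.name}, cost per run {cost:.2f}")
```

The point of the sketch is that the answer changes with the constraints rather than with the models: once you know how much quality the client is willing to give up, or that interpretability is mandatory, a much cheaper option may become the right one.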

Mathematics

Here you already know everything yourselves: without knowledge of basic mathematics you are nothing more than a monkey with a grenade (sorry, with a Random Forest), so you need to understand at least the basics. If I were to make the most minimal list, it would include:

  • Linear algebra - a huge number of resources are easy to google, look for what suits you best;
  • Calculus (mathematical analysis) - at least the first two semesters' worth;
  • Probability theory - it is everywhere in machine learning;
  • Combinatorics - it essentially complements probability theory;
  • Graph theory - at least the basics;
  • Algorithms - at least the first two semesters' worth (see Cormen's recommendations in his book);
  • Mathematical logic - at least the basics.

Practical data analysis and visualization

One of the most fundamental things is not being afraid to get your hands dirty with data: to carry out a thorough analysis of a dataset or project and quickly throw together a data visualization.

Exploratory data analysis should become something natural, like all the other data transformations and the ability to throw together a simple pipeline from Unix utilities (see previous articles) or to write a readable and understandable notebook.
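As a rough idea of what a first pass over a dataset can look like, here is a minimal sketch that uses only the Python standard library. The tiny inline dataset and the helper names (`load_rows`, `summarize`, `text_bar`) are made up for illustration; in practice you would more likely reach for a dataframe library and a plotting library.

```python
import csv
import io
import statistics
from collections import Counter

# Tiny made-up dataset standing in for a real CSV file.
RAW = """age,city,income
34,Moscow,1200
29,Kazan,
41,Moscow,2500
,Kazan,900
38,Moscow,1800
"""


def load_rows(text):
    """Parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows):
    """Per column: count missing values, then describe numeric columns with
    min/median/max and all other columns with their most common values."""
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]
        info = {"missing": len(values) - len(present)}
        if present:
            try:
                numbers = [float(v) for v in present]
            except ValueError:
                # Not numeric: treat the column as categorical.
                info["top"] = Counter(present).most_common(2)
            else:
                info["min"] = min(numbers)
                info["median"] = statistics.median(numbers)
                info["max"] = max(numbers)
        summary[column] = info
    return summary


def text_bar(value, scale, width=30):
    """Render a crude horizontal bar for a quick look in the terminal."""
    return "#" * round(width * value / scale)


if __name__ == "__main__":
    rows = load_rows(RAW)
    for column, info in summarize(rows).items():
        print(column, info)
    incomes = [float(r["income"]) for r in rows if r["income"]]
    for value in incomes:
        print(f"{value:>7.0f} {text_bar(value, max(incomes))}")
```

Even a summary this crude answers the first questions you should ask of any dataset: what is missing, what the ranges are, and which categories dominate.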

Separately, I will mention visualization: it is better to see once than to hear a hundred times.

Showing a graph to a manager is a hundred times easier and clearer than a set of numbers, so matplotlib, seaborn and ggplot2 are your friends.

Soft skills

It is equally important to be able to communicate your ideas, results and concerns to others. Make sure you can clearly state the problem in both technical and business terms.

You should be able to explain to colleagues, managers, superiors, clients and anyone else who needs it what is happening, what kind of data you work with and what results you got.

Your charts and documentation should be readable without you. That is, nobody should have to come to you to understand what is written there.

You should be able to make a clear presentation that gets the point across and/or to document the project and your work.

You should be able to state your position in a reasoned and unemotional way, say "yes" or "no", and question or support a decision.

Training

There are many different places where you can learn all this. I will give a short list; I have tried everything on it and, to be honest, each item has its pros and cons. Try them and decide what suits you, but I highly recommend trying several options and not getting hung up on one.

  • Online courses: Coursera, Udacity, edX, etc;
  • New schools: online and offline - SkillFactory, ShAD, MADE;
  • Classical schools: university master's programs and advanced training courses;
  • Projects - you can simply pick tasks that interest you and work on them, posting the results on GitHub;
  • Internships - it is hard to suggest anything specific here, you have to look at what is available and find suitable options.

Is it necessary?

In conclusion, I will add three personal principles that I try to follow myself.

  • It should be interesting;
  • It should bring inner pleasure (or at least not cause suffering);
  • It should be "yours."

Why exactly them? It is difficult to imagine that you will be doing something day after day and you will not like it or will not be interested. Imagine that you are a doctor and hate to communicate with people - this can certainly work somehow, but you will be constantly uncomfortable with the flow of patients who want to ask you something. It doesn't work in the long run.

Why did I specifically mention inner pleasure? It seems to me that it is necessary for further development and for the learning process in general. I really enjoy it when I manage to finish some complex feature, build a model or calculate an important parameter. I enjoy it when my code is aesthetically pleasing and well written. That is why learning something new is interesting and does not require any special motivation.

"Being yours" is the feeling that this is exactly what you wanted to do. I have a little story. Since childhood I was fond of rock music (and metal - SALMON!) and, like many people, wanted to learn how to play and all that. It turned out that I had no ear for music and no voice. This did not bother me at all (and I must say it does not bother many performers on stage either), and as a schoolboy I got a guitar ... and it became clear that I did not really like sitting for hours and playing it. It was hard, and it always seemed to me that some kind of garbage was coming out. I did not enjoy it at all and only felt lousy, stupid and completely incapable. I literally had to force myself to sit down and practice, and in general none of it did me any good.

At the same time, I could quite calmly sit for hours developing some kind of game, using a script to animate something in Flash (or something else), and I was wildly motivated to finish the elements of the game or to figure out the mechanics of movement and/or how to connect third-party libraries, plugins and everything else.

And at some point I realized that playing the guitar is not mine and that what I really like is listening, not playing. And my eyes lit up when I wrote games and code (listening to all sorts of metal while doing it), and that is what I liked then, and that is what I should have been doing.

Do you have any other questions?

Of course, we could not go through all the topics and questions, so write in the comments or in a personal message. I am always happy to answer questions.

Source: habr.com
