What to read as a data scientist in 2020

In this post, we share a selection of useful sources of information about Data Science, compiled by the co-founder and CTO of DAGsHub, a community and web platform for data version control and collaboration between data scientists and machine learning engineers. The selection ranges from Twitter accounts to full-fledged engineering blogs, and it is aimed at readers who know exactly what they are looking for.

From the author:
You are what you eat, and as a knowledge worker, you need a good information diet. I want to share the sources of information about Data Science, artificial intelligence, and related technologies that I find most useful or engaging. I hope this helps you too!

Two Minute Papers

A YouTube channel that is well suited to keeping up with the latest developments. The channel is updated frequently, and the host brings infectious enthusiasm and positivity to every topic he covers. Expect coverage of interesting work not only in AI, but also in computer graphics and other visually appealing topics.

Yannic Kilcher

On his YouTube channel, Yannic explains significant deep learning research in technical detail. Instead of reading a paper on your own, it's often quicker and easier to watch one of his videos to gain a deeper understanding of an important paper. The explanations convey the essence of the papers without neglecting the math or getting lost in the weeds. Yannic also shares his views on how studies fit together, how seriously results should be taken, broader interpretations, and so on. These are insights that beginners (or non-academic practitioners) would have a hard time reaching on their own.

distill.pub

In their own words:

Machine learning research should be clear, dynamic, and vivid. Distill is here to help.

Distill is a unique machine learning research publication. Its articles are accompanied by stunning visualizations that give the reader a more intuitive understanding of the topics. Spatial thinking and imagination tend to work very well for understanding Machine Learning and Data Science topics. Traditional publication formats, on the other hand, tend to be rigid in structure, static and dry, and sometimes overly "mathy". Chris Olah, co-creator of Distill, also maintains an amazing personal blog on GitHub. It hasn't been updated for a long time, but it remains a collection of some of the best deep learning explanations ever written. In particular, his description of LSTMs helped me a lot!

Sebastian Ruder

Sebastian Ruder writes a very informative blog and newsletter, primarily about the intersection of neural networks and natural language processing. He also gives a lot of advice to researchers and conference speakers, which can be very helpful if you're in academia. Sebastian's articles tend to take the form of surveys, summarizing and explaining the state of the art in research and methods in a given area. This makes them extremely useful for practitioners who want to get their bearings quickly. Sebastian is also on Twitter.

Andrej Karpathy

Andrej Karpathy needs no introduction. In addition to being one of the most famous deep learning researchers on earth, he builds widely used tools such as arxiv sanity preserver as side projects. Countless people entered this field through his Stanford course, cs231n, and his recipe for training neural networks is well worth knowing. I also recommend watching his talk about the real problems Tesla must overcome when trying to apply machine learning at massive scale in the real world. The talk is informative, impressive, and sobering. Besides articles about ML itself, Andrej Karpathy gives good life advice for ambitious scientists. Follow Andrej on Twitter and GitHub.

Uber Engineering

The Uber engineering blog is really impressive in its scale and breadth, covering many topics, Artificial Intelligence in particular. What I especially like about Uber's engineering culture is their tendency to release very interesting and valuable open source projects at a breakneck pace. Here are some examples:

OpenAI Blog

Controversy aside, the OpenAI blog is undeniably great. From time to time, it publishes content and insights about deep learning that can only come from working at OpenAI's scale, such as the hypothesized phenomenon of deep double descent. The OpenAI team tends to post infrequently, but when they do, the content is significant.

Taboola Blog

The Taboola blog isn't as well known as some of the other sources in this post, but I think it's unique: the authors write about the very mundane, real problems of applying ML in production for a "normal" business. It is less about self-driving cars and RL agents beating world champions, and more about "how do I know if my model is now predicting things with false confidence?". These issues are relevant to almost everyone working in the field and receive less press coverage than the more popular AI topics, yet it still takes world-class talent to address them properly. Luckily, Taboola has both that talent and the willingness and ability to write about it so other people can learn too.

Reddit

Along with Twitter, there's nothing better than Reddit for stumbling upon research, tools, or the wisdom of the crowd.

State of AI

It is published only once a year, but it is very dense with information. Compared to other sources on this list, this one is more accessible to non-technical business people. What I love about the reports is that they try to give a more holistic view of where industry and research are heading, tying together advances in hardware, research, business, and even geopolitics from a bird's-eye view. Be sure to start at the end and read about the conflicts of interest.

Podcasts

Frankly, I think podcasts are ill-suited for learning about technical topics. After all, they use only sound to explain things, and data science is a very visual field. Podcasts work better as a prompt to explore a topic in more depth later, or for engaging philosophical discussions. That said, here are some recommendations:

  • Lex Fridman Podcast, when he talks to prominent researchers in the field of artificial intelligence. The episodes with François Chollet are especially good!
  • Data Engineering Podcast. Nice for hearing about new data infrastructure tools.

Awesome lists

These are less something to keep an eye on and more a set of resources that are helpful once you know what you're looking for:

Twitter

  • Matty Mariansky
    Matty finds beautiful, creative ways to use neural networks, and it's just fun to see his results in your Twitter feed. At the very least, take a look at this post.
  • Ori Cohen
    Ori is simply a blogging machine. He writes extensively about problems and solutions for data scientists. Be sure to subscribe to be notified when an article is published. His collection in particular is really impressive.
  • Jeremy Howard
    Co-founder of fast.ai, a comprehensive source of creativity and productivity.
  • Hamel Husain
    A staff ML engineer at GitHub, Hamel Husain is busy creating and reporting on many tools for coders in the data domain.
  • François Chollet
    Creator of Keras, now trying to update our understanding of what intelligence is and how to test it.
  • hardmaru
    Research scientist at Google Brain.

Conclusion

The original post may be updated as the author finds great sources of content that it would be a shame not to include in the list. Feel free to contact him on Twitter if you want to recommend a new source! DAGsHub is also hiring a Data Science Advocate, so if you create your own Data Science content, feel free to write to the author of the post.

Keep growing by reading the recommended sources, and with the promo code HORNBEAM you can get an additional 10% on top of the discount shown on the banner.

Source: habr.com