How to recognize a charlatan from Data Science?

You may have heard of analysts, machine learning specialists, and artificial intelligence experts, but have you heard of the ones who are undeservedly overpaid? Meet the data charlatan! These tricksters, lured in by lucrative jobs, give real data scientists a bad name. In this article we look at how to expose them.

Data charlatans are everywhere

Data charlatans are so good at hiding in plain sight that you could be one of them without even realizing it. Chances are your organization has been harboring these scammers for years, but the good news is that they are easy to identify if you know what to look for.
The first warning sign is failing to understand that analytics and statistics are very different disciplines. I will explain the difference below.

Different disciplines

Statisticians are trained to draw conclusions about what lies beyond their data; analysts are trained to explore the contents of a dataset. In other words, analysts draw conclusions about what is in their data, while statisticians draw conclusions about what is not in the data. Analysts help you ask good questions (generate hypotheses), and statisticians help you get good answers (test hypotheses).

There are also odd hybrid roles where one person tries to wear both hats... and why not? Because of a basic tenet of data science: if you are dealing with uncertainty, you cannot use the same data point for both generating a hypothesis and testing it. When data is limited, that uncertainty forces you to choose between statistics and analytics. The explanation follows.

Without statistics you are stuck, unable to tell whether the judgment you have just formed stands up to scrutiny; without analytics you are flying blind, with little chance of taming the unknown. It is a hard choice.

The charlatan's way out of this bind is to ignore it and then feign surprise at what the data suddenly reveal. The logic of statistical hypothesis testing boils down to asking whether the data surprise us enough to change our minds. How can the data surprise us if we have already seen them?

Whenever charlatans find a pattern that inspires them, they test the same data for that same pattern and publish the result, with a legitimate-looking p-value or two, alongside their theory. In doing so they are lying to you (and quite possibly to themselves). A p-value means nothing unless you committed to your hypothesis before looking at the data. Charlatans imitate the actions of analysts and statisticians without understanding the reasons behind them, and as a result the whole field of data science gets a bad reputation.
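To see why a p-value computed on the very data that suggested the hypothesis is worthless, here is a minimal sketch using nothing but random noise. It assumes numpy and scipy, and every name in it (noise, labels, best) is illustrative rather than something taken from the article.

```python
# A pattern "discovered" and tested on the same noise tends to look far more
# significant than it deserves; retesting on fresh noise usually dispels it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
noise = rng.normal(size=(100, 50))      # 100 samples, 50 meaningless "features"
labels = rng.integers(0, 2, size=100)   # random group labels

# "Exploration": pick the feature that separates the two groups best in this data.
gaps = [abs(noise[labels == 0, j].mean() - noise[labels == 1, j].mean())
        for j in range(noise.shape[1])]
best = int(np.argmax(gaps))

# Charlatan move: test the very data that suggested the pattern.
# The p-value is biased low, because the feature was chosen for being extreme.
print(stats.ttest_ind(noise[labels == 0, best], noise[labels == 1, best]).pvalue)

# Honest move: test the same feature on data nobody has looked at yet.
# On fresh noise the "effect" typically disappears.
fresh = rng.normal(size=(100, 50))
fresh_labels = rng.integers(0, 2, size=100)
print(stats.ttest_ind(fresh[fresh_labels == 0, best],
                      fresh[fresh_labels == 1, best]).pvalue)
```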

True statisticians always draw their own conclusions

Thanks to the almost mystical reputation of statisticians for rigorous reasoning, the amount of fake information in Data Science is at an all-time high. It's easy to cheat and not get caught, especially if the unsuspecting victim thinks it's all about equations and data. A dataset is a dataset, right? No. It matters how you use it.

Luckily, you only need one clue to catch a charlatan: they "discover America" after the fact, rediscovering phenomena they already know are present in the data.

Unlike charlatans, good analysts are open-minded and understand that inspirational ideas can have many different explanations. At the same time, good statisticians carefully define their conclusions before they draw them.

Analysts are off the hook... as long as they stay within their data. If they are tempted to claim something they have not actually seen, that is a different job: they should take off their analyst shoes and step into the shoes of a statistician. After all, whatever the official job title says, there is no rule against learning both trades if you want to. Just don't mix them up.

Being good at statistics does not mean you are good at analytics, and vice versa. If someone tries to tell you otherwise, be on your guard. If that person tells you it is fine to draw statistical inferences from data you have already explored, be doubly wary.

Bizarre Explanations

When you observe data charlatans in the wild, you will notice that they love to invent fanciful stories to "explain" the data they have observed. The more academic-sounding, the better. It does not bother them that these stories are fitted to the data after the fact.

When charlatans do this (to put it charitably) they are lying. No amount of equations or beautiful concepts makes up for the fact that they have offered zero evidence for their story. Don't be impressed by how exotic their explanations sound.

It is like demonstrating your "psychic" powers by first looking at the cards in your hand and then predicting... whatever it is you are holding. This is hindsight bias, and the data science profession is riddled with it.


Analysts say: "You just played the queen of diamonds." Statisticians say: "I wrote my hypotheses down on this piece of paper before we started. Let's play, gather some data, and see whether I was right." Charlatans say: "I knew all along you would play that queen of diamonds, because..."

Data splitting is the quick fix everyone needs.

When there is not much data, you have to choose between statistics and analytics, but when there is more than enough, you have a wonderful opportunity to use both analytics and statistics without cheating. Your perfect protection against charlatans is data splitting, which is, in my opinion, the most powerful idea in Data Science.

To protect yourself from charlatans, all you have to do is set some test data aside, keep it away from prying eyes, and then treat everything else as material for analytics. Whenever you run into a theory you are at risk of accepting, use that exploratory material to size up the situation, then bring out your secret test data to check that the theory is not nonsense. It's that simple!
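As a concrete illustration, here is a minimal sketch of such a split in Python, assuming pandas; the DataFrame name raw_df, the 50/50 ratio, and the helper's name are illustrative choices, not something prescribed by the article.

```python
import pandas as pd

def split_for_honesty(df: pd.DataFrame, test_fraction: float = 0.5, seed: int = 42):
    """Lock away a test set before anyone is allowed to explore anything."""
    test = df.sample(frac=test_fraction, random_state=seed)
    exploration = df.drop(test.index)
    return exploration, test

# exploration_df, test_df = split_for_honesty(raw_df)
# Analysts browse exploration_df freely and hunt for inspiration there;
# test_df stays untouched until a hypothesis is written down and ready to be checked.
```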

Make sure no one is allowed to look at the test data during the exploration phase; stick to the exploration data for that. The test data must not be used for analytics.

This is a big step up from what people are used to in the era of "small data", where you have to explain how you know what you know in order to finally convince people that you really know something.

Applying the same rules to ML/AI

Some charlatans posing as ML/AI experts are also easy to spot. You will catch them the same way you would catch any other bad engineer: the "solutions" they try to build constantly fail. An early warning sign is a lack of experience with industry-standard languages and programming libraries.

But what about the people who build systems that seem to work? How do you know whether something suspicious is going on? The same rule applies! The charlatan is the shady character who shows you how well the model performed… on the same data they used to build it.

If you have built an insanely complex machine learning system, how do you know how good it is? You won't know until you show that it works on new data it has never seen before.
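Here is a minimal sketch of that check, assuming scikit-learn; the synthetic dataset and the logistic regression model are just stand-ins for whatever system is being evaluated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The number charlatans like to show: performance on data the model has already seen.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))

# The number that actually matters: performance on data the model has never seen.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```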

If you have seen the data before making your prediction, it is hardly a prediction at all.

When you have enough data to split, you don't need to invoke the beauty of your formulas to justify a project (an old-fashioned habit I see everywhere, not just in science). You can simply say: "I know it works because I can take a dataset I have never seen before and predict exactly what will happen in it… and I will be right. Again and again."

Testing your model/theory against new data is the best basis for trust.

I have no patience for data charlatans. I don't care what clever tricks your opinion rests on. I am not impressed by the beauty of your explanations. Show me that your theory/model works (and keeps working) on a whole series of new data that you have never seen before. That is the real test of how sound your opinion is.

Appeal to data scientists

If you want to be taken seriously by anyone who gets the joke, stop hiding behind fancy equations to keep your personal biases alive. Show what you've got. If you want those who "get it" to see your theory/model as more than inspirational poetry, have the courage to stage a grand demonstration of how well it performs on a completely new dataset... in front of witnesses!

Appeal to leaders

Refuse to take any data-driven "insight" seriously until it has been tested on new data. Not willing to put in the effort? Then stick to analytics, but don't lean on those insights: they have not been checked for reliability. Also, when an organization has data in abundance, there is no downside to making data splitting the foundation of its data science and enforcing it at the infrastructure level by controlling access to the test data reserved for statistics. It is a great way to stop people from trying to fool you!

If you want to see more examples of charlatans up to no good, there is a great Twitter thread on the subject.

Summary

When a dataset is too small to split, only a charlatan tries to have it both ways: chasing inspiration, "discovering America" in hindsight by mathematically rediscovering phenomena they already know are in the data, and declaring the surprise statistically significant. That is what distinguishes them from the open-minded analyst who deals in inspiration and the meticulous statistician who offers evidence for their predictions.

When there is a lot of data, get into the habit of splitting it so you can have the best of both worlds! Just be sure to do your analytics and your statistics on separate subsets of the original pile of data.

  • Analysts offer you inspiration and perspective.
  • Statisticians offer you rigorous testing.
  • Charlatans offer you distorted hindsight that pretends to be analytics plus statistics.

Perhaps after reading this article you are left wondering, "Am I a charlatan?" That's fine. There are two ways to deal with the thought: first, look back at what you have done and ask whether your work with data has delivered real practical value; second, keep working on your skills (which certainly won't hurt), especially since we give our students the practical skills and knowledge to become real data scientists.


Source: habr.com
