How to open comments and not drown in spam


When your job is to create something beautiful, you don't have to say much about it: the result is there for everyone to see. But if your job is scrubbing graffiti off fences, nobody notices your work until the fences look decent, or until you scrub off the wrong thing.

Any service that lets users leave comments or reviews, send messages, or upload pictures sooner or later faces spam, fraud, and obscene language. This cannot be avoided entirely, but it has to be dealt with.

My name is Mikhail, and I work on the Antispam team that protects users of Yandex services from these problems. Our work is rarely noticed (and that's a good thing!), so today I'll talk about it in more detail. You will learn in which cases moderation is useless and why precision is not the only measure of its effectiveness. We will also talk about profanity, using cats and dogs as examples, and why it is sometimes useful to "think like a foul-mouthed user."

More and more Yandex services let users publish their own content. You can ask a question or write an answer on Yandex.Q, discuss neighborhood news on Yandex.Region, or share the traffic situation in conversations on Yandex.Maps. But as a service's audience grows, it becomes attractive to scammers and spammers. They arrive and flood the comments: offering easy money, advertising miracle cures, promising social benefits. Because of spammers, some users lose money, and others lose any desire to spend time on an unkempt service overgrown with spam.

And this is not the only problem. We strive not only to protect users from scammers but also to create a comfortable atmosphere for communication. If people run into swearing and insults in the comments, they are likely to leave and never come back. So we need to be able to deal with that, too.

Pure Web

As often happens with us, the first developments were born in Search, in the part that fights spam in search results. About ten years ago, we faced the task of filtering adult content for family search and for queries that did not call for answers from the 18+ category. This is how the first manually compiled dictionaries of pornography and profanity appeared; analysts kept them up to date. The main task was to classify queries into those where showing adult content is acceptable and those where it is not. For this task, labeled data was collected, heuristics were built, and models were trained. These were the first tools for filtering unwanted content.

Over time, UGC (user-generated content) began to appear at Yandex: messages written by the users themselves, which Yandex only publishes. For the reasons described above, many messages could not be published without review; moderation was required. So we decided to build a service that would protect all Yandex UGC products from spam and abuse, applying the best practices for filtering unwanted content developed in Search. The service was called "Clean Web."

New tasks and help from Toloka workers

At first, only simple automation worked for us: services sent us texts, and we ran them through profanity dictionaries, porn dictionaries, and regular expressions, all compiled manually by analysts. But over time, the service was adopted by more and more Yandex products, and we had to learn to deal with new problems.

Often, instead of a review, users post a meaningless string of letters to earn an achievement; sometimes they advertise their own company in reviews of a competitor; and sometimes they simply confuse organizations and write in a pet store's reviews: "Perfectly cooked fish!" Perhaps someday artificial intelligence will capture the meaning of any text perfectly, but for now automation sometimes does worse than a human.

It became clear that manual labeling was indispensable here, so we added a second stage to our pipeline: sending content for manual review by a person. This applied to published texts in which the classifier saw no problems. You can easily imagine the scale of such a task, so we did not rely on assessors alone but also tapped the "wisdom of the crowd" by turning to Toloka workers for help. They are the ones who catch what the machine has missed, and thereby teach it.

Smart caching and LSH hashing

Another problem we ran into when working with comments is spam, or rather, its volume and speed of spread. When the audience of Yandex.Region began to grow rapidly, spammers arrived. They learned to bypass regular expressions by slightly altering the text. The spam was still found and deleted, but at Yandex's scale, an unacceptable message published for even 5 minutes could be seen by hundreds of people.


Of course, this did not suit us, so we built smart caching of texts based on LSH (locality-sensitive hashing). It works like this: we normalize the text, throw out the links, and slice it into n-grams (sequences of n characters). Then hashes of the n-grams are computed, and the document's LSH vector is built from them. The point is that similar texts, even slightly modified ones, turn into similar vectors.
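A minimal sketch of this kind of fingerprinting, assuming character trigrams and a MinHash-style signature (the actual Yandex implementation details are not public):

```python
import re
import hashlib

def normalize(text: str) -> str:
    """Strip links and punctuation, lowercase, collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def char_ngrams(text: str, n: int = 3) -> set:
    """Slice text into character n-grams."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def minhash_signature(text: str, num_hashes: int = 16) -> list:
    """Build an LSH-style vector: one min-hash per seeded hash function."""
    grams = char_ngrams(normalize(text)) or {"_"}
    return [
        min(int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest(), 16)
            for g in grams)
        for seed in range(num_hashes)
    ]

def similarity(a: str, b: str) -> float:
    """Fraction of matching signature positions (estimates Jaccard similarity)."""
    sa, sb = minhash_signature(a), minhash_signature(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)
```

Slightly edited copies of the same spam normalize to near-identical n-gram sets, so their signatures agree in most positions, while unrelated texts agree in almost none.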

This solution made it possible to reuse the verdicts of classifiers and Toloka workers for similar texts. During a spam attack, as soon as the first message passed through the check and landed in the cache with a "spam" verdict, all new messages like it, even modified ones, received the same verdict and were deleted automatically. Later we learned to train and automatically retrain spam classifiers, but this "smart cache" has stayed with us and still often comes to the rescue.
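One way to sketch such a verdict cache is to key it by bands of the LSH signature, so a near-duplicate text hits the same bucket. The band and row counts below are illustrative assumptions, not the production parameters:

```python
import re
import hashlib

def _ngrams(text: str, n: int = 3) -> set:
    """Normalize and slice into character trigrams."""
    text = re.sub(r"\W+", " ", text.lower()).strip()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def _band_keys(text: str, num_hashes: int = 8, rows: int = 2) -> list:
    """MinHash signature split into bands; any shared band counts as a hit."""
    grams = _ngrams(text) or {"_"}
    sig = [min(int(hashlib.md5(f"{s}:{g}".encode()).hexdigest(), 16)
               for g in grams)
           for s in range(num_hashes)]
    return [tuple(sig[i:i + rows]) for i in range(0, num_hashes, rows)]

class VerdictCache:
    """Reuse moderation verdicts for texts similar to ones already checked."""

    def __init__(self):
        self.buckets = {}

    def lookup(self, text: str):
        for key in _band_keys(text):
            if key in self.buckets:
                return self.buckets[key]  # an earlier verdict applies
        return None

    def store(self, text: str, verdict: str):
        for key in _band_keys(text):
            self.buckets[key] = verdict
```

Once the first spam message is stored with its verdict, lightly reworded copies land in the same bands and are caught without another round of classification or crowd labeling.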

Good text classifier

Before we had time to catch our breath from fighting spam, we realized that 95% of our content was being moderated manually: classifiers respond only to violations, and most texts are fine. We were loading Toloka workers who, in 95 cases out of 100, gave the verdict "Everything is OK." So we took on an unusual job: building classifiers for good content, since enough labeled data had accumulated by then.

The first classifier looked like this: lemmatize the text (reduce words to their base form), throw out all auxiliary parts of speech, and apply a pre-built "dictionary of good lemmas." If all the words in a text are "good," the text as a whole contains no violations. On different services, this approach immediately automated 25% to 35% of the manual labeling. Of course, the approach is not perfect: it is easy to combine a few innocent words into a deeply offensive statement. But it let us quickly reach a decent level of automation and bought us time to train more complex models.
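As a toy illustration of the idea, assuming a trivial whitespace tokenizer and a tiny hand-made dictionary (the production system used real lemmatization and a dictionary mined from accumulated labels):

```python
# Hypothetical miniature "dictionary of good lemmas" for English reviews.
GOOD_LEMMAS = {"great", "food", "service", "place", "friendly", "staff",
               "tasty", "very", "the", "and", "is", "was"}

def is_definitely_good(text: str) -> bool:
    """Auto-publish only if every word is a known 'good' lemma;
    anything else goes to manual moderation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return all(w in GOOD_LEMMAS for w in words if w)
```

A single unknown word is enough to fall back to manual review, which is exactly why this classifier is safe to run first: it automates the obvious cases and never silently approves anything outside its dictionary.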

The next versions of the good-text classifiers included linear models, decision trees, and their combinations. To label rudeness and insults, for example, we are trying the BERT neural network. Here it is important to capture the meaning of a word in context and the relationships between words in different sentences, and BERT does a pretty good job of that. (By the way, colleagues from Yandex.News recently described how the technology is used for a non-standard task: finding errors in headlines.) As a result, we managed to automate up to 90% of the flow, depending on the service.

Precision, recall, and speed

To move forward, you need to understand what benefit particular automatic classifiers (and changes to them) bring, and whether the quality of manual checks is degrading. For this we use precision and recall.

Precision is the share of correct verdicts among all "bad content" verdicts. The higher the precision, the fewer false positives. If you ignore precision, you could in theory delete all the spam and swearing, and with them half of the good messages. On the other hand, if you optimize for precision alone, the best technology would be one that never catches anything at all. That's why there is also recall: the share of bad content caught out of the total amount of bad content. The two metrics balance each other.
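The computation itself is simple once assessor labels are in hand; a short sketch comparing machine verdicts against assessor labels on a sampled stream:

```python
def precision_recall(machine_flags, true_flags):
    """machine_flags: classifier's 'bad content' verdicts (1 = bad);
    true_flags: assessor labels on the same sampled items."""
    tp = sum(m and t for m, t in zip(machine_flags, true_flags))
    fp = sum(m and not t for m, t in zip(machine_flags, true_flags))
    fn = sum(not m and t for m, t in zip(machine_flags, true_flags))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

For example, if the machine flags 3 items of which 2 are truly bad, and misses 2 more bad items, precision is 2/3 and recall is 2/4.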

For measurement, we sample the entire incoming stream of each service and hand the samples to assessors for expert labeling, then compare their labels with the machine's decisions.

But there is another important indicator.

I wrote above that hundreds of people can see an unacceptable message within even 5 minutes. So we also count how many times we showed bad content to users before hiding it. This matters because it is not enough to work accurately; you also need to work fast. And when we were building defenses against profanity, we felt this to the fullest.

Fighting profanity, with cats and dogs as examples

A small lyrical digression. Some might say that obscene language and insults are not as dangerous as malicious links and not as annoying as spam. But we strive to maintain comfortable conditions for millions of users to communicate, and people don't like coming back to a place where they were insulted. It is no accident that a ban on swearing and insults is written into the rules of many communities, including Habr. But we digress.

Profanity dictionaries cannot cope with all the richness of the Russian language. Even though there are only four main obscene roots, you can derive from them a myriad of words that no regular expression will catch. On top of that, you can write part of a word in transliteration, replace letters with similar-looking combinations, rearrange letters, add asterisks, and so on. Sometimes, without context, it is simply impossible to tell that the user meant a swear word. We respect Habr's rules, so we will demonstrate this not with live examples but with cats and dogs.
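A minimal sketch of de-obfuscation before dictionary matching, assuming a small hand-made lookalike table (real systems use far larger tables plus transliteration rules); following the article's convention, "cat" stands in for a banned word:

```python
import re

# Hypothetical table of characters spammers substitute for letters.
LOOKALIKES = str.maketrans({"@": "a", "4": "a", "0": "o", "$": "s",
                            "1": "i", "3": "e", "7": "t"})

def contains_banned(text: str, banned=("cat",)) -> bool:
    """Undo common obfuscation (lookalike characters, asterisks, inserted
    separators), then search for dictionary words in the cleaned text."""
    cleaned = text.lower().translate(LOOKALIKES)
    cleaned = re.sub(r"[^a-z]", "", cleaned)  # drop spaces, stars, dots
    return any(w in cleaned for w in banned)
```

Note that this exact sketch reproduces the precision problem described below: gluing all characters together makes "dedicated" match the banned word "cat" as a substring, which is why naive fuzzy matching lowered precision in practice.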


"Laugh," said the cat. But we understand that the cat meant a different word...

We started thinking about "fuzzy matching" algorithms for our dictionary and smarter preprocessing: we normalized transliteration, glued together text split by spaces and punctuation, looked for recurring patterns, and wrote separate regular expressions for them. This approach brought results, but it often lowered precision without delivering the desired recall.

Then we decided to "think like the swearers." We started introducing noise into the data ourselves: rearranging letters, generating typos, replacing letters with similar-looking ones, and so on. The initial labels were obtained by applying profanity dictionaries to large text corpora. If you take one sentence and distort it in several different ways, you get many sentences, so the training sample can be grown tenfold. All that remained was to train a reasonably smart, context-aware model on the resulting pool.
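The augmentation step can be sketched as follows; the specific distortions (adjacent-letter swaps, lookalike substitutions, inserted asterisks) are illustrative assumptions about the kind of noise described above:

```python
import random

# Hypothetical lookalike substitutions an obfuscator might use.
LOOKALIKE = {"a": "@", "o": "0", "e": "3", "s": "$", "i": "1"}

def swap_adjacent(word: str, rng) -> str:
    """Rearrange two neighboring letters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitute_lookalike(word: str, rng) -> str:
    """Replace one letter with a similar-looking character."""
    candidates = [i for i, ch in enumerate(word) if ch in LOOKALIKE]
    if not candidates:
        return word
    i = rng.choice(candidates)
    return word[:i] + LOOKALIKE[word[i]] + word[i + 1:]

def insert_star(word: str, rng) -> str:
    """Insert an asterisk inside the word."""
    i = rng.randrange(1, max(len(word), 2))
    return word[:i] + "*" + word[i:]

def augment(word: str, n: int = 5, seed: int = 0) -> list:
    """Produce n distorted variants of a word for training data."""
    rng = random.Random(seed)
    ops = [swap_adjacent, substitute_lookalike, insert_star]
    return [rng.choice(ops)(word, rng) for _ in range(n)]
```

Applying a handful of such operators to each labeled sentence multiplies the training pool, and the resulting model sees obfuscated spellings at training time instead of meeting them for the first time in production.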


It is too early to talk about a final solution. We are still experimenting with approaches to this problem, but we can already see that a simple character-level convolutional network of a few layers significantly outperforms dictionaries and regular expressions: it manages to improve both precision and recall.

Of course, we understand that there will always be ways to get around even the most advanced automation, especially since it is such a tempting game: writing so that the dumb machine will not understand. Here, as in the fight against spam, our goal is not to eradicate the very possibility of writing something obscene; our goal is to make sure the game is not worth the candle.

Opening up the ability to share opinions, communicate, and comment is not hard. It is much harder to achieve safe, comfortable conditions and mutual respect. And without that, no community will grow.

Source: habr.com
