If the comment count under an article is rapidly approaching 1000, rest assured that whatever topic the author announced, a brawl is raging inside: political flashpoints ringed by armchair experts on every subject, remote psychiatric diagnoses based on avatar and nickname, personal attacks, sarcasm more caustic than xenomorph blood, and, of course, the signature dish of such occasions, mutual accusations that your opponent is arguing with you solely for pay or in the line of duty. A duty which, apparently, is dangerous and difficult and at first glance hardly visible, and thirty pieces of silver do not just lie around on the road.
The funny thing about this situation is that
Let's take one of
A real person cannot hold their own against a professional commentator on a subscription...
User (so-and-so) spends an unrealistic amount of time on comments...
Moreover, their activity does not show the patterns that are usually characteristic of an ordinary user... P.S. But this gave me the idea of writing a parser-analyzer for such commentators :) With activity broken down by hour, amount of time per day, per week, and so on... A good topic for an article :)
Okay, stop. What kind of patterns are “usually characteristic of an ordinary user”? The author of this phrase is, unfortunately, no longer available to ask in that thread, so we will have to guess.
The question I want to put before your clear eyes is this: is it even possible, using statistical methods, to reliably identify these patterns and build a formal classifier that distinguishes casual commentators from professional ones? Imagine: “according to the Habra-botometer, you are 76% likely to be a Kremlinbot.” That would be much cooler than karma raids on each other.
Unfortunately, I lack the expertise even to suggest where to start digging on such a problem. However, last night I hacked together a small, primitive parser which (fortunately, comment pages are open even to unauthenticated visitors) so far does two things: a) for a given username, collects statistics on all of their comments (for now just the timestamp) and stores them in a MySQL database; b) draws a time diagram marking the comment-posting events taken from that database. Even without any sophisticated analysis, the results turned out to be quite amusing. This is what my own comment chart looks like. Explanations are below. It is best viewed in a separate window at 100% scale or larger.
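Step (a) could be sketched roughly as follows. This is not the parser described above: it uses Python's built-in `sqlite3` as a stand-in for MySQL, the table layout and the function names `open_store`, `save_timestamps` and `load_timestamps` are hypothetical, and fetching and parsing the comment pages is left out entirely because the page markup is not described here. Timestamps without a time zone are assumed to be UTC.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) comments table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        " username TEXT NOT NULL,"
        " posted_at INTEGER NOT NULL,"  # Unix time in seconds, UTC
        " PRIMARY KEY (username, posted_at))"
    )
    return conn


def save_timestamps(conn, username, iso_timestamps):
    """Store ISO 8601 timestamps, skipping duplicates; return how many were new."""
    rows = []
    for text in iso_timestamps:
        moment = datetime.fromisoformat(text)
        if moment.tzinfo is None:
            # Assumption: a timestamp without an offset is treated as UTC.
            moment = moment.replace(tzinfo=timezone.utc)
        rows.append((username, int(moment.timestamp())))
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR IGNORE INTO comments VALUES (?, ?)", rows)
    return conn.total_changes - before


def load_timestamps(conn, username):
    """Return all stored Unix timestamps for one user, oldest first."""
    cursor = conn.execute(
        "SELECT posted_at FROM comments WHERE username = ? ORDER BY posted_at",
        (username,),
    )
    return [row[0] for row in cursor]
```

The composite primary key together with `INSERT OR IGNORE` means the collector can be re-run over the same pages without storing a comment twice.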
The horizontal axis is time of day: each pixel equals one minute, the gray divisions mark one hour each, and a full horizontal line is one day. Days run from bottom to top along the vertical axis, with tick marks every 365 days.
There is nothing particularly interesting in my diagram. It shows that I like to sleep 7-8 hours, often go to bed after midnight, and sometimes have hours-long commenting marathons, and that my activity over the past year roughly equals or exceeds that of the previous five years.
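The layout just described amounts to a simple coordinate mapping, sketched below. The function names `to_pixels` and `rasterize` are hypothetical, the time zone defaults to UTC as an assumption, and the result is a plain grid of counts rather than an actual image, so any drawing library could be put on top of it.

```python
from datetime import datetime, timezone

MINUTES_PER_DAY = 24 * 60  # chart width: one pixel per minute
HOUR_GRIDLINES = range(0, MINUTES_PER_DAY, 60)  # x positions of the hourly divisions


def to_pixels(unix_times, tz=timezone.utc):
    """Map Unix timestamps to (x, y) chart coordinates.

    x is the minute of the day (0..1439); y is the number of days since the
    earliest comment, so y grows upward as in the diagrams.
    """
    moments = [datetime.fromtimestamp(t, tz) for t in unix_times]
    if not moments:
        return []
    first_day = min(m.date() for m in moments)
    return [
        (m.hour * 60 + m.minute, (m.date() - first_day).days)
        for m in moments
    ]


def rasterize(points):
    """Build a grid of per-pixel comment counts; row 0 is the top of the image."""
    if not points:
        return []
    height = max(y for _, y in points) + 1
    grid = [[0] * MINUTES_PER_DAY for _ in range(height)]
    for x, y in points:
        # Flip vertically so the earliest day ends up on the bottom row.
        grid[height - 1 - y][x] += 1
    return grid
```

Several comments sent within the same minute land on the same pixel, which is why the grid stores counts instead of booleans.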
Or here's a comrade
The activity diagram of a typical habracommentator looks something like this (this is
A distinct “sleepy hollow” on the left, somewhere in the European night, and leisurely commenting during daylight hours, sometimes with breaks of half a year.
But not all diagrams are so boring! How about this, for example:
In just over two years, our colleague apparently shifted his biorhythms, evenly and gradually, from sleeping through the European night to sleeping somewhere over the Mid-Atlantic Ridge, and then spent another two years returning to the shores of Portugal. Did he walk? Swim? I cannot come up with a plausible explanation... For the first three hours after waking, comments fly like machine-gun fire, but by the end of the day it is more like dropping in once an hour to see what is going on, and that is it.
By the way, it was
And here's another riddle:
The colleague went four and a half years without a single comment; apparently he was training in secret monasteries somewhere to go without sleep for days, judging by how many comments land in the “sleepy hollow.”
But the most interesting thing here is the anomaly at the 16th hour, which persists for more than three years and gradually fades away in the last year. Smoke break? Walking the dog? Jogging? What else could tear a Habr resident away from the comment feed in the middle of the working day with such daily regularity? I am a slob and a lazy person; I cannot imagine the kind of self-discipline that the respected
Finally, one last diagram to think about:
There is no clearly defined “sleepy hollow” in it at all. One can only barely discern a slight excess of comments sent after noon over those sent before.
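Both observations, the presence or absence of a “sleepy hollow” and the afternoon excess, can be turned into numbers with an hourly histogram. The sketch below is a descriptive statistic only, not the classifier asked about earlier; the function names are hypothetical, the 7-hour window width is an assumption borrowed from the 7-8 hours of sleep mentioned above, and hours are counted in UTC unless another time zone is passed in.

```python
from datetime import datetime, timezone


def hourly_histogram(unix_times, tz=timezone.utc):
    """Count comments per hour of day (index 0..23)."""
    counts = [0] * 24
    for t in unix_times:
        counts[datetime.fromtimestamp(t, tz).hour] += 1
    return counts


def quietest_window(counts, width=7):
    """Find the block of `width` consecutive hours, wrapping past midnight,
    that contains the fewest comments.

    Returns (start_hour, share), where share is the fraction of all comments
    that fall inside the window. Ties go to the earliest start hour.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no comments to analyse")
    best_start, best_sum = 0, None
    for start in range(24):
        window = sum(counts[(start + k) % 24] for k in range(width))
        if best_sum is None or window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum / total


def afternoon_excess(counts):
    """Share of comments sent after noon minus the share sent before noon."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no comments to analyse")
    return (sum(counts[12:]) - sum(counts[:12])) / total
```

For perfectly uniform activity a 7-hour window would hold 7/24, about 29%, of all comments, so a share close to zero suggests a pronounced hollow, while a share near 29% matches a diagram with no hollow at all.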
With all Komsomol rigor I urge the respected
And finally, an insidious question: would anyone be interested enough in all this to want to develop the parser code, or to get a database dump or access to it, and so on? My own knowledge of data mining and data visualization hardly goes beyond general erudition, and I can hardly think of anything smarter or more interesting than these simple diagrams. If you are interested, write to me on Telegram (nickname in my profile).
Thank you for your attention!
UPD. Posted it
Source: habr.com