How we moderate ads


Every service whose users can create their own content (UGC, user-generated content) has to do more than solve business problems: it also has to keep that UGC in order. Poor or low-quality content moderation can eventually make the service less attractive to users and even bring it to an end.

Today we will tell you about the collaboration between Yula and Odnoklassniki that helps us moderate ads in Yula effectively.

Collaboration is a useful thing in general, and in today's world, where technologies and trends change very quickly, it can be a lifesaver. Why spend scarce resources and time inventing something that has already been invented and brought to you ready-made?

We thought the same when we faced the task of fully moderating user content: images, text and links. Our users upload millions of pieces of content to Yula every day, and without automatic processing it is simply impossible to moderate all of this data by hand.

So we used a ready-made moderation platform, which by that time our colleagues from Odnoklassniki had polished to a state of "almost perfect."

Why Odnoklassniki?

Every day tens of millions of users come to the social network and publish billions of pieces of content, from photos to videos and texts. The Odnoklassniki moderation platform helps check very large volumes of data and fight spammers and bots.

The OK moderation team has accumulated a lot of experience, since it has been improving its tooling for 12 years. Importantly, they could not only share ready-made solutions but also adapt the architecture of their platform to our specific tasks.


From here on, for brevity, we will simply call the OK moderation platform "the platform."

How it all works

Data exchange between Yula and Odnoklassniki is set up through Apache Kafka.

Why we chose this tool:

  • In Yula all ads are post-moderated, so a synchronous response was not required at first.
  • If disaster strikes and Yula or Odnoklassniki becomes unavailable, for example because of peak load, the data in Kafka will not be lost and can be read later.
  • The platform was already integrated with Kafka, so most of the security questions had already been settled.


For every ad that a user creates or edits in Yula, a JSON document with the ad data is generated and put into Kafka for subsequent moderation. From Kafka, ads are loaded into the platform, where decisions are made automatically or manually. Bad ads are blocked with a stated reason, and ads in which the platform found no violations are marked "good." All decisions are then sent back to Yula and applied in the service.

In the end, for Yula it all comes down to simple actions: send an ad to the Odnoklassniki platform and get back a resolution of "ok", or the reason why it is not "ok".
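The round trip described above can be sketched roughly as follows. This is only an illustration: the field names (`ad_id`, `verdict`, `reason` and so on) are assumptions rather than the real message schema, and the actual Kafka producer and consumer are left out so that the snippet stays self-contained.

```python
import json


def build_moderation_message(ad: dict) -> bytes:
    """Serialize an ad into a JSON payload that could be published to Kafka.

    All field names here are hypothetical.
    """
    payload = {
        "ad_id": ad["id"],
        "title": ad["title"],
        "description": ad["description"],
        "photos": ad.get("photos", []),
        "category": ad["category"],
        "price": ad["price"],
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def apply_decision(raw: bytes) -> str:
    """Turn a decision message read back from Kafka into an ad status."""
    decision = json.loads(raw)
    if decision["verdict"] == "ok":
        return "published"
    # A block always carries the reason it was issued for.
    return f"blocked: {decision['reason']}"
```

In a real deployment the bytes returned by `build_moderation_message` would be handed to a Kafka producer, and `apply_decision` would be called from a consumer loop on the topic that carries decisions.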

Automatic processing

What happens to an ad after it reaches the platform? Each ad is split into several entities:

  • title,
  • description,
  • photos,
  • the category and subcategory chosen by the user,
  • price.


The platform then clusters each entity to find duplicates. Texts and photos are clustered using different schemes.

Before clustering, texts are normalized to remove special characters, substituted letters and other junk. The resulting data is split into N-grams, each of which is hashed. The result is a set of unique hashes. Similarity between texts is determined by the Jaccard index between the two resulting sets. If the similarity exceeds a threshold, the texts are merged into one cluster. To speed up the search for similar clusters, MinHash and locality-sensitive hashing are used.
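A minimal sketch of this text pipeline, under several assumptions: word-level 3-grams, MD5 as the hash, 64 MinHash permutations and 16 LSH bands are all illustrative choices, and the normalization only lowercases and strips punctuation (the substituted-letter handling is not shown).

```python
import hashlib
import random
import re

_PRIME = (1 << 61) - 1  # modulus for the hash family used by MinHash


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def shingle_hashes(text: str, n: int = 3) -> set:
    """Split normalized text into word n-grams and hash each one."""
    words = normalize(text).split()
    if not words:
        return set()
    if len(words) < n:
        grams = [" ".join(words)]
    else:
        grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {
        int.from_bytes(hashlib.md5(g.encode("utf-8")).digest()[:8], "big")
        for g in grams
    }


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index of two hash sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(hashes: set, num_perm: int = 64, seed: int = 42) -> list:
    """Keep the minimum of each of num_perm random linear hash functions."""
    if not hashes:
        return []
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(num_perm)]
    return [min((a * h + b) % _PRIME for h in hashes) for a, b in coeffs]


def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """Share of matching positions, an estimate of the Jaccard index."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_keys(signature: list, bands: int = 16) -> list:
    """Split a signature into bands; texts sharing any key become candidates."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]
```

Texts whose signatures share at least one band key would be compared in full, so most unrelated pairs are never compared at all.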

For photos, various ways of gluing images together have been devised, from comparing pHash values of the pictures to searching for duplicates with a neural network.

The latter method is the most "heavy-duty." To train the model, triplets of images (N, A, P) were selected in which N is not similar to A and P is similar to A (it is a near-duplicate). The neural network then learned to place A and P as close together as possible, and A and N as far apart as possible. This produces fewer false positives than simply taking embeddings from a pre-trained network.
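The training objective described here is commonly written as a triplet loss. The sketch below shows one standard form of it on plain Python lists; the margin value and the use of Euclidean distance are assumptions, since the article does not say which variant was used.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it grows linearly.
    """
    d_ap = math.dist(anchor, positive)  # anchor to near-duplicate
    d_an = math.dist(anchor, negative)  # anchor to dissimilar image
    return max(0.0, d_ap - d_an + margin)
```

Minimizing this value over many triplets pulls A and P together and pushes A and N apart, which is the behavior the paragraph above describes.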

When the neural network receives images as input, it produces a 128-dimensional vector for each of them, and these vectors are queried to estimate how close the images are. A threshold is then computed at which close images are treated as duplicates.
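As a rough illustration of the thresholding step, the function below compares every pair of embeddings and reports the pairs that fall within a distance threshold. The brute-force pairwise loop, the Euclidean metric and the threshold value are assumptions made for brevity; a production system would more likely use an approximate nearest-neighbor index.

```python
import math
from itertools import combinations


def duplicate_pairs(embeddings: dict, threshold: float) -> list:
    """Return pairs of image ids whose embeddings are within `threshold`."""
    return [
        (id_a, id_b)
        for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2)
        if math.dist(vec_a, vec_b) <= threshold
    ]
```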

The model is able to catch spammers who deliberately photograph the same item from different angles in order to get around the pHash comparison.

Example of spam photos glued together as duplicates by the neural network.

At the final stage, duplicate ads are searched for by text and by image at the same time.

If two or more ads are glued into one cluster, the system starts automatic blocking, which uses certain algorithms to choose which duplicates to remove and which to keep. For example, if two users have the same photos in their ads, the system blocks the more recent ad.
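One possible reading of the "block the more recent ad" rule is sketched below. The `Ad` fields are hypothetical, and the decision to leave the original author's own later ads untouched is an assumption of this sketch, not something the article states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    ad_id: int
    user_id: int
    created_at: int  # Unix timestamp, hypothetical field


def ads_to_block(cluster: list) -> list:
    """Keep the earliest ad in a duplicate cluster, block copies by other users."""
    if not cluster:
        return []
    original = min(cluster, key=lambda ad: ad.created_at)
    # Assumption: only ads from other users count as stolen duplicates.
    return [ad for ad in cluster if ad.user_id != original.user_id]
```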

Once created, all clusters pass through a series of automatic filters. Each filter assigns the cluster a score: how likely it is to contain the threat that this filter detects.

For example, the system analyzes the description in an ad and selects the categories that could apply to it. It then takes the one with the highest probability and compares it with the category specified by the author of the ad. If they do not match, the ad is blocked for a wrong category. And since we are kind and honest, we tell the user directly which category to choose for the ad to pass moderation.

Notification of a block for a wrong category.
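The category check can be pictured like this. The classifier itself is out of scope: the sketch assumes it has already produced a mapping from category name to probability, and the shape of the returned dictionary is made up for illustration.

```python
def category_verdict(predicted: dict, declared: str) -> dict:
    """Compare the most probable predicted category with the declared one."""
    best = max(predicted, key=predicted.get)
    if best == declared:
        return {"blocked": False, "suggested_category": None}
    # On a mismatch, return the suggestion so it can be shown to the user.
    return {"blocked": True, "reason": "wrong_category", "suggested_category": best}
```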

Machine learning feels right at home on our platform. For example, we use it to search titles and descriptions for goods prohibited in the Russian Federation. Neural network models also scrutinize images to see whether they contain URLs, spam text, phone numbers and other "prohibited" information of that kind.

For cases where someone tries to sell a prohibited item disguised as something legal, with no text in the title or description, we use image tagging. Up to 11 thousand different tags describing what is in the picture can be attached to each image.

Trying to sell a hookah by passing it off as a samovar.

Simple filters work in parallel with the complex ones, solving obvious text-related problems:

  • profanity filter;
  • URL and phone number detector;
  • mentions of instant messengers and other contact details;
  • underpriced items;
  • ads in which nothing is actually for sale, and so on.
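Two of the simple filters in this list, the URL detector and the phone number detector, might look roughly like the sketch below. The regular expressions are deliberately naive assumptions (the phone pattern only covers Russian-style numbers starting with +7 or 8) and are not the platform's actual rules.

```python
import re

# Naive patterns for illustration only.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"(?:\+7|8)[\s\-()]*(?:\d[\s\-()]*){10}")


def text_flags(text: str) -> list:
    """Return the names of the simple filters that fire on the text."""
    flags = []
    if URL_RE.search(text):
        flags.append("url")
    if PHONE_RE.search(text):
        flags.append("phone")
    return flags
```

Each flag would then become one more score in the ad's metadata alongside the output of the heavier models.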

Today every ad passes through a fine sieve of more than 50 automatic filters that try to find something wrong with it.

If none of the detectors fires, a response is sent to Yula saying that the ad is "most likely" in perfect order. We use this response ourselves, and users who have subscribed to the seller receive a notification that a new item is available.

Notification that a seller has a new item.

As a result, every ad is "enriched" with metadata. Part of it is generated when the ad is created (the author's IP address, user agent, platform, geolocation and so on), and the rest is the score given by each filter.

Ad queues

When an ad reaches the platform, the system places it in one of the queues. Each queue is defined by a mathematical formula that combines the ad's metadata so as to detect some bad pattern.

For example, you can create a queue of ads in the "Cell phones" category from Yula users who claim to be in St. Petersburg but whose IP addresses are from Moscow or other cities.

Example of ads posted by one user in different cities.
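A queue formula of this kind can be thought of as a predicate over the ad's metadata. The sketch below encodes the city mismatch example; the metadata keys (`category`, `declared_city`, `ip_city`) are hypothetical names chosen for the illustration.

```python
def in_geo_mismatch_queue(ad: dict) -> bool:
    """True for cell phone ads whose declared city disagrees with the IP city."""
    return (
        ad["category"] == "Cell phones"
        and ad["declared_city"] == "St. Petersburg"
        and ad["ip_city"] != ad["declared_city"]
    )
```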

Or you can build queues from the scores that a neural network assigns to ads, sorting them in descending order.

Each queue, according to its formula, assigns a final score to the ad. From there you can proceed in different ways:

  • set a threshold at which an ad receives a certain type of block;
  • send all ads in the queue to moderators for manual review;
  • or combine the two options: set an automatic blocking threshold and send moderators the ads that did not reach it.


Why are these queues needed? Say a user uploads a photo of a firearm. The neural network gives it a score between 95 and 100 and determines with 99 percent accuracy that there is a weapon in the picture. But if the score is below 95, the model's accuracy starts to fall (this is a property of neural network models).

As a result, a queue is built on the scoring model, and ads that received a score between 95 and 100 are blocked automatically as "Prohibited goods." Ads with a score below 95 are sent to moderators for manual processing.

A chocolate Beretta with cartridges. Manual moderation only! 🙂
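The combined option, automatic blocking above a threshold and manual review below it, reduces to a very small routing function. The threshold of 95 and the block label come from the text; the function name and the return format are assumptions of this sketch.

```python
def route_by_score(score: float, auto_block_threshold: float = 95.0) -> tuple:
    """Route an ad in the weapons queue by its neural network score."""
    if score >= auto_block_threshold:
        # High-confidence range: block without human involvement.
        return ("auto_block", "Prohibited goods")
    # Below the threshold the model is less reliable, so a person decides.
    return ("manual_review", None)
```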

Manual moderation

At the beginning of 2019, about 94% of all ads in Yula were moderated automatically.


If the platform cannot decide on some ads, it sends them for manual moderation. Odnoklassniki has built its own tooling: tasks for moderators immediately show all the information needed to make a quick decision, either that the ad is fine or that it should be blocked, with the reason given.

To keep the quality of service from suffering during manual moderation, people's work is monitored constantly. For example, the task stream shows moderators "traps": ads for which a decision has already been prepared. If the moderator's decision does not match the prepared one, the moderator is credited with an error.
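Scoring a moderator against the traps could be as simple as the sketch below. Both arguments are assumed to be mappings from ad id to a decision string; how traps are actually stored and mixed into the stream is not described in the article.

```python
def trap_error_rate(decisions: dict, traps: dict) -> float:
    """Share of trap ads on which the moderator disagreed with the prepared answer."""
    checked = [ad_id for ad_id in decisions if ad_id in traps]
    if not checked:
        return 0.0
    errors = sum(1 for ad_id in checked if decisions[ad_id] != traps[ad_id])
    return errors / len(checked)
```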

On average, a moderator spends 10 seconds checking one ad, and the error rate is no more than 0.5% of all checked ads.

Crowd moderation

Colleagues from Odnoklassniki went even further and used "help from the audience": they wrote a game app for the social network, Odnoklassniki Moderator (https://ok.ru/app/moderator), in which you can quickly label large amounts of data by pointing out some bad feature. It is a good way to draw on the help of OK users who are trying to make the content more pleasant.

A game in which users mark photos that have a phone number on them.

Any ad queue in the platform can be redirected to the Odnoklassniki Moderator game. Everything that players mark is then sent to internal moderators for verification. This scheme makes it possible to block ads for which no filters have been created yet, and at the same time to build training sets.

Saving moderation results

We store all decisions made during moderation so that we do not reprocess ads on which a decision has already been made.

Millions of clusters are created from ads every day. Over time, each cluster is labeled "good" or "bad." Every new ad or revision that lands in a labeled cluster automatically receives the cluster's resolution. There are about 20 thousand such automatic resolutions per day.


If no new ads arrive in a cluster, it is removed from memory, and its hash and decision are written to Apache Cassandra.

When the platform receives a new ad, it first tries to find a similar cluster among those already created and take the decision from it. If there is no such cluster, the platform goes to Cassandra and looks there. Found it? Great: it applies the decision to the cluster and sends it to Yula. On average there are about 70 thousand such "repeat" decisions per day, 8% of the total.
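The two-level lookup can be sketched with plain dictionaries. Here `in_memory` stands for the active clusters and `archive` stands in for the Cassandra table; a real implementation would issue a query through a Cassandra driver instead of calling `dict.get`, and the function name is an assumption.

```python
from typing import Optional


def resolve_decision(cluster_hash: str, in_memory: dict, archive: dict) -> Optional[str]:
    """Look up a stored decision: active clusters first, then the archive."""
    if cluster_hash in in_memory:
        return in_memory[cluster_hash]
    decision = archive.get(cluster_hash)
    if decision is not None:
        # Apply the archived decision to the cluster so later ads hit memory.
        in_memory[cluster_hash] = decision
    return decision
```

A `None` result means the ad is genuinely new and has to go through the filters and queues described earlier.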

To sum up

We have been using the Odnoklassniki moderation platform for two and a half years. We like the results:

  • We moderate 94% of all ads automatically every day.
  • The cost of moderating one ad fell from 2 rubles to 7 kopecks.
  • Thanks to the ready-made tooling, we forgot about the problems of managing moderators.
  • We increased the number of manually processed ads 2.5 times with the same number of moderators and the same budget. The quality of manual moderation also rose thanks to automated monitoring, and the error rate hovers around 0.5%.
  • We quickly cover new types of spam with filters.
  • We quickly connect new "Yula Verticals" sections to moderation. Since 2017, Yula has added the Real Estate, Jobs and Auto verticals.

source: www.habr.com
