How to open up comments without drowning in spam

When your job is to create something beautiful, you don't need to say much about it, because the result is in front of everyone's eyes. But if you scrub graffiti off fences, nobody notices your work as long as the fences look decent, or until you scrub off the wrong thing.

Any service where you can leave a comment or a review, send a message, or upload pictures will sooner or later run into spam, fraud, and obscenity. This cannot be avoided, but it has to be dealt with.

My name is Mikhail, and I work on the Antispam team, which protects users of Yandex services from problems like these. Our work is rarely noticed (and that's a good thing!), so today I'll tell you more about it. You will learn when moderation is useless and why precision is not the only measure of its effectiveness. We will also talk about profanity using cats and dogs as an example, and about why it is sometimes useful to "think like a swearer."

More and more services are appearing at Yandex where users publish their own content. You can ask a question or write an answer in Yandex.Q, discuss neighborhood news in Yandex.District, or share traffic conditions in the conversations on Yandex.Maps. But as a service's audience grows, it becomes attractive to scammers and spammers. They come and fill the comments: they offer easy money, advertise miracle cures, and promise benefit payouts. Because of spammers, some users lose money, while others lose the desire to spend time on a neglected service overgrown with spam.

And that is not the only problem. We try not only to protect users from scammers, but also to create a comfortable atmosphere for communication. If people run into profanity and insults in the comments, they are likely to leave and never come back. That means we have to be able to deal with this too.

Clean Web

As often happens with us, the first groundwork was laid in Search, in the part that fights spam in search results. About ten years ago, a task appeared there: filtering adult content for family search and for queries that did not call for answers from the 18+ category. That is how the first hand-typed dictionaries of porn and profanity appeared, maintained and extended by analysts. The main task was to classify queries into those where showing adult content is acceptable and those where it is not. For this task, labeled data was collected, heuristics were built, and models were trained. That is how the first tools for filtering unwanted content came about.

Over time, UGC (user-generated content) began to appear at Yandex: messages that users write themselves and Yandex merely publishes. For the reasons described above, many messages could not be published without a check, so moderation was required. The decision was then made to create a service that would protect all of Yandex's UGC products from spam and attackers, and that would reuse the existing work on filtering unwanted content in Search. The service was named "Clean Web."

New tasks and help from tolokers

At first, only simple automation worked for us: services sent us texts, and we ran profanity dictionaries, porn dictionaries, and regular expressions over them. Analysts compiled everything by hand. But over time the service was used in more and more Yandex products, and we had to learn to handle new problems.
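As a rough illustration of that first stage, here is a minimal sketch of a dictionary-plus-regex check. The word lists, the pattern, and the function name `check_text` are all hypothetical placeholders, not the actual Clean Web rules.

```python
import re
from typing import List

# Hypothetical stand-ins for the hand-made dictionaries (harmless placeholder words).
PROFANITY_WORDS = {"meow", "woof"}
ADULT_WORDS = {"catnip"}
# Hypothetical hand-written pattern for "easy money" style spam.
SPAM_PATTERNS = [re.compile(r"easy\s+money", re.IGNORECASE)]

TOKEN_RE = re.compile(r"\w+")


def check_text(text: str) -> List[str]:
    """Return the names of the rules that fired for this text (empty list = no hit)."""
    tokens = {token.lower() for token in TOKEN_RE.findall(text)}
    reasons = []
    if tokens & PROFANITY_WORDS:
        reasons.append("profanity")
    if tokens & ADULT_WORDS:
        reasons.append("adult")
    if any(pattern.search(text) for pattern in SPAM_PATTERNS):
        reasons.append("spam")
    return reasons
```

A scheme like this is cheap and easy to audit, but it only matches what someone has already written down, which is why the later stages were needed.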

Often, instead of a review, users post a meaningless string of letters in an attempt to farm achievements. Sometimes they advertise their own company in reviews of a competitor, and sometimes they simply mix up organizations and write in a review of a pet shop: "Perfectly cooked fish!" Perhaps one day artificial intelligence will learn to grasp the meaning of any text perfectly, but for now automation sometimes does worse than a human.

It became clear that we could not manage without manual labeling, so we added a second stage to our pipeline: sending texts for manual review by a person. The published texts in which the classifier saw no problems went there. You can easily imagine the scale of such a task, so we did not rely on assessors alone but also used the "wisdom of the crowd," that is, we turned to tolokers for help. They are the ones who help us identify what the machine missed, and thereby train it.

Smart caching and LSH hashing

Another problem we ran into when working with comments was spam, or more precisely, its volume and the speed at which it spreads. When the Yandex.District audience began to grow quickly, spammers arrived. They learned to get around regular expressions by slightly altering the text. The spam was still found and removed, of course, but at Yandex's scale an unacceptable message that stays up for even 5 minutes can be seen by hundreds of people.

Naturally, this did not suit us, so we built smart text caching based on LSH (locality-sensitive hashing). It works like this: we normalized the text, removed links from it, and cut it into n-grams (sequences of n characters). Next, hashes of the n-grams were computed, and the document's LSH vector was built from them. The point is that similar texts, even slightly altered ones, turn into similar vectors.
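The article does not say which LSH scheme was used. The sketch below assumes SimHash over character trigrams as one possible instance of the idea: normalize, drop links, cut into n-grams, hash each n-gram, and combine the hashes into a 64-bit fingerprint. All names and parameters here are illustrative assumptions.

```python
import hashlib
import re
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^\w]+")
BITS = 64  # fingerprint width; matches the 8 hash bytes used below


def normalize(text: str) -> str:
    """Lowercase, drop links, and collapse punctuation and whitespace into single spaces."""
    text = URL_RE.sub(" ", text.lower())
    return NON_WORD_RE.sub(" ", text).strip()


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping character n-grams; a short text yields itself as the only n-gram."""
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


def simhash(text: str, n: int = 3) -> int:
    """SimHash: each n-gram hash votes +1/-1 per bit; the sign of each sum is the output bit."""
    counts = [0] * BITS
    for gram in char_ngrams(normalize(text), n):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two messages that differ only in their links, case, or punctuation normalize to the same string and get identical fingerprints. Small edits to the wording change only some n-grams, so the fingerprints tend to differ in few bits.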

This solution made it possible to reuse the verdicts of classifiers and tolokers for similar texts. During a spam attack, as soon as the first message passed the check and landed in the cache with a "spam" verdict, all new similar messages, even modified ones, received the same verdict and were removed automatically. Later we learned to train and automatically retrain spam classifiers, but this "smart cache" stayed with us and still helps us out.
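A minimal sketch of such a verdict cache, assuming each text has already been reduced to an integer fingerprint and that "similar" means a small Hamming distance. The class name, the threshold, and the linear scan are illustrative assumptions; a production system would need an index rather than a scan.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class VerdictCache:
    """Stores (fingerprint, verdict) pairs and answers lookups for near-identical fingerprints."""

    def __init__(self, max_distance: int = 3) -> None:
        self.max_distance = max_distance  # hypothetical similarity threshold, in bits
        self._entries: List[Tuple[int, str]] = []

    def put(self, fingerprint: int, verdict: str) -> None:
        self._entries.append((fingerprint, verdict))

    def get(self, fingerprint: int) -> Optional[str]:
        """Return the verdict of the closest cached fingerprint within the threshold, if any."""
        best: Optional[Tuple[int, str]] = None
        for known, verdict in self._entries:
            distance = hamming(known, fingerprint)
            if distance <= self.max_distance and (best is None or distance < best[0]):
                best = (distance, verdict)
        return best[1] if best is not None else None
```

Once the first message of an attack is cached as "spam", a slightly modified copy whose fingerprint differs by a bit or two gets the same verdict without going through a classifier or a human again.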

A classifier for good texts

Before we could catch our breath from fighting spam, we realized that 95% of our content was being moderated manually: classifiers react only to violations, and most texts are fine. We were loading tolokers who, in 95 cases out of 100, gave the verdict "Everything is OK." We had to take on an unusual job: building classifiers of good content. Fortunately, enough labeled data had accumulated by then.

The first classifier looked like this: we lemmatize the text (reduce words to their base form), throw out all function words, and apply a "dictionary of good lemmas" prepared in advance. If all the words in the text are "good," then the text as a whole contains no violations. On different services, this approach immediately automated 25 to 35% of the manual labeling. Of course, the approach is not ideal: it is easy to combine several innocent words into a very offensive statement. But it let us reach a good level of automation quickly and gave us time to train more complex models.
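The good-lemma check can be sketched as follows. A real system would use a proper lemmatizer; here the lemma map, the function-word list, and the dictionary of good lemmas are tiny hypothetical stand-ins so that the example stays self-contained.

```python
import re

# Hypothetical toy lemmatizer data; a real system would use a morphological analyzer.
LEMMAS = {"cats": "cat", "dogs": "dog", "sleeping": "sleep", "sleeps": "sleep",
          "is": "be", "are": "be"}
# Hypothetical function words to ignore.
FUNCTION_WORDS = {"the", "a", "an", "and", "in", "on", "be"}
# Hypothetical "dictionary of good lemmas".
GOOD_LEMMAS = {"cat", "dog", "sleep", "sofa"}


def lemmatize(word: str) -> str:
    """Map a word to its base form, falling back to the word itself."""
    return LEMMAS.get(word, word)


def is_clearly_good(text: str) -> bool:
    """True only if every content word is a known good lemma.

    False does not mean "bad"; it means the text still needs a classifier or a human.
    """
    lemmas = [lemmatize(word) for word in re.findall(r"\w+", text.lower())]
    content = [lemma for lemma in lemmas if lemma not in FUNCTION_WORDS]
    return bool(content) and all(lemma in GOOD_LEMMAS for lemma in content)
```

The check is deliberately one-sided: it can only approve a text, never reject it, so anything it does not recognize falls through to the rest of the pipeline.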

The next versions of the good-text classifiers already included linear models, decision trees, and combinations of the two. To label rudeness and insults, for example, we are trying the BERT neural network. Here it is important to capture the meaning of a word in context and the connection between words from different sentences, and BERT handles this fairly well. (By the way, colleagues from News recently described how the technology is used for a non-standard task: finding errors in headlines.) As a result, we managed to automate up to 90% of the flow, depending on the service.

Precision, recall, and speed

To move forward, you need to understand what benefit particular automatic classifiers bring, what changes to them achieve, and whether the quality of manual checks is degrading. For this we use the precision and recall metrics.

Precision is the share of correct verdicts among all verdicts that flag content as bad. The higher the precision, the fewer the false positives. If you do not watch precision, you can in theory remove all the spam and profanity, along with half of the good messages. On the other hand, if you rely on precision alone, the best technology will be the one that catches nobody at all. That is why there is also recall: the share of detected bad content out of all bad content. The two metrics balance each other.
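The two definitions above can be written down directly. In this sketch the function name and the handling of empty denominators are my own choices; in particular, returning a precision of 1.0 when nothing is flagged is a convention assumed here, picked because it shows why precision alone rewards a system that catches nobody.

```python
from typing import List, Tuple


def precision_recall(flagged: List[bool], truly_bad: List[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) from machine verdicts and assessor labels.

    flagged[i]   - the machine said item i is bad
    truly_bad[i] - assessors said item i is bad
    """
    true_pos = sum(1 for f, t in zip(flagged, truly_bad) if f and t)
    false_pos = sum(1 for f, t in zip(flagged, truly_bad) if f and not t)
    false_neg = sum(1 for f, t in zip(flagged, truly_bad) if not f and t)

    # Assumed convention: with no flagged items, precision is vacuously 1.0.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 1.0
    # Assumed convention: with no bad items at all, recall is vacuously 1.0.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 1.0
    return precision, recall
```

For example, a system that flags nothing on a sample containing bad content gets a precision of 1.0 under this convention but a recall of 0.0, which is exactly the degenerate case described above.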

To measure them, we sample the entire incoming stream for each service and hand the content samples to assessors for expert evaluation and comparison with the machine's decisions.

But there is one more important metric.

I wrote above that an unacceptable message can be seen by hundreds of people even within 5 minutes. So we count how many times we showed people bad content before we hid it. This matters because doing the job well is not enough: you also have to do it fast. And when we were building our defense against profanity, we felt this to the full.
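One way to compute such a metric is sketched below, assuming a per-message log of view times and the moment the message was hidden. The record layout and the names are hypothetical; the article does not describe how the count is actually collected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BadMessage:
    """A message that was eventually judged bad (hypothetical log record)."""
    view_times: List[float]  # seconds after publication at which someone saw it
    hidden_at: float         # seconds after publication at which it was hidden


def bad_impressions(messages: List[BadMessage]) -> int:
    """Count the views of bad content that happened before it was hidden."""
    return sum(
        sum(1 for t in message.view_times if t < message.hidden_at)
        for message in messages
    )
```

Unlike precision and recall, this number falls when the same verdict is reached sooner, so it captures the speed of moderation as well as its quality.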

Anti-profanity, using cats and dogs as an example

A small lyrical digression. Some might say that profanity and insults are not as dangerous as malicious links and not as annoying as spam. But we try to maintain comfortable conditions for millions of users to communicate, and people do not like returning to places where they are insulted. It is no accident that a ban on profanity and insults is written into the rules of many communities, including Habr. But we digress.

Profanity dictionaries cannot cope with the full richness of the Russian language. Although there are only four main swear roots, you can build from them a countless number of words that no regular expression will catch. On top of that, you can write part of a word in transliteration, replace letters with similar-looking combinations, rearrange letters, add asterisks, and so on. Sometimes, without context, it is impossible even in principle to tell that the user meant a swear word. We respect Habr's rules, so we will demonstrate this not with live examples but with cats and dogs.

"Law," said the cat. But we understand that the cat said a different word...

We started thinking about "fuzzy matching" algorithms for our dictionary and about smarter preprocessing: we converted transliteration, glued words together across spaces and punctuation, looked for patterns, and wrote separate regular expressions for them. This approach produced results, but it often reduced precision and did not give the recall we wanted.
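Here is a minimal sketch of that kind of preprocessing, using "meow" and "woof" as harmless stand-ins for dictionary entries. The look-alike table and function names are illustrative assumptions. The example also shows how precision suffers: once spaces are glued, an innocent phrase can suddenly contain a dictionary word.

```python
import re

# Hypothetical table of look-alike substitutions to undo.
LOOKALIKES = str.maketrans({"0": "o", "3": "e", "@": "a", "$": "s", "1": "i"})
# Harmless stand-ins for profanity dictionary entries.
BAD_WORDS = {"meow", "woof"}


def squash(text: str) -> str:
    """Lowercase, undo look-alike characters, and glue everything across spaces and punctuation."""
    text = text.lower().translate(LOOKALIKES)
    return re.sub(r"[\W_]+", "", text)


def contains_bad_word(text: str) -> bool:
    """Substring search over the squashed text."""
    squashed = squash(text)
    return any(word in squashed for word in BAD_WORDS)
```

With this sketch, `contains_bad_word("m-e-0-w")` is true, as intended, but so is `contains_bad_word("home owner")`, because the glued string "homeowner" happens to contain "meow". That is the kind of false positive that drags precision down.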

Then we decided to "think like swearers." We began introducing noise into the data ourselves: we rearranged letters, generated typos, replaced letters with similar-looking spellings, and so on. The initial labels for this were obtained by applying the profanity dictionaries to large text corpora. If you take one sentence and distort it in several ways, you end up with many sentences. This way you can enlarge the training set tens of times over. All that remained was to train, on the resulting pool, some more or less smart model that takes context into account.
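The augmentation step might look roughly like the sketch below. The three noise operations, the look-alike table, and the function names are illustrative assumptions, and "meow" again stands in for a dictionary word found in a sentence. Every generated variant inherits the label of the original sentence.

```python
import random
from typing import List

# Hypothetical look-alike replacements a "swearer" might use.
LOOKALIKES = {"o": "0", "e": "3", "a": "@", "i": "1", "s": "$"}


def swap_adjacent(word: str, rng: random.Random) -> str:
    """Swap two neighboring letters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def replace_lookalike(word: str, rng: random.Random) -> str:
    """Replace one letter with a similar-looking character."""
    positions = [i for i, ch in enumerate(word) if ch in LOOKALIKES]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + LOOKALIKES[word[i]] + word[i + 1:]


def insert_star(word: str, rng: random.Random) -> str:
    """Insert an asterisk inside the word."""
    i = rng.randrange(1, len(word)) if len(word) > 1 else 0
    return word[:i] + "*" + word[i:]


def augment(sentence: str, target: str, n_variants: int, seed: int = 0) -> List[str]:
    """Generate up to n_variants distinct noisy copies of sentence by distorting target."""
    rng = random.Random(seed)
    operations = [swap_adjacent, replace_lookalike, insert_star]
    variants = set()
    for _ in range(n_variants * 10):  # bounded number of attempts
        if len(variants) >= n_variants:
            break
        noisy = rng.choice(operations)(target, rng)
        if noisy != target:
            variants.add(sentence.replace(target, noisy))
    return sorted(variants)
```

For instance, `augment("the cat said meow", "meow", 5)` yields several distorted copies of the sentence, all of which can go into the training pool with the same "bad" label as the original.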

It is too early to talk about a final solution. We are still experimenting with approaches to this problem, but we can already see that a simple character-level network of a few layers significantly outperforms dictionaries and regular expressions: it improves both precision and recall.

Of course, we understand that there will always be ways around even the most advanced automation, especially when the challenge is so tempting: write it so that the stupid machine won't understand. Here, as in the fight against spam, our goal is not to eliminate the very possibility of writing something obscene. Our task is to make sure the game is not worth the candle.

Giving people the ability to share their opinions, communicate, and comment is not hard. It is much harder to achieve safe, comfortable conditions and respectful treatment of people. And without that, no community will grow.

Source: www.habr.com
