How we work on the quality and speed of recommendation selection

My name is Pavel Parkhomenko, and I am an ML developer. In this article I would like to describe how the Yandex.Zen service is built and share the technical improvements that allowed us to increase the quality of recommendations. From this post you will learn how to find the documents most relevant to a user among millions in just a few milliseconds; how to continuously decompose a large matrix (with millions of columns and tens of millions of rows) so that new documents get their vectors within tens of minutes; and how to reuse the user-article matrix decomposition to get a good vector representation for video.

Our recommendation database contains millions of documents in different formats: text articles created on our platform or taken from external sites, videos, narratives, and short posts. Developing such a service involves many technical challenges. Here are some of them:

  • Split the computational work: do all heavy operations offline, and in real time only apply models quickly, so that we can respond within 100-200 ms.
  • Take user actions into account quickly. This requires that every event be delivered to the recommender instantly and influence the models' output.
  • Make the feed adapt quickly to the behavior of new users. People who have just joined the system should feel that their feedback affects their recommendations.
  • Quickly understand whom to recommend a new article to.
  • Respond promptly to the constant flow of new content. Many thousands of articles are published every day, and many of them have a short lifespan (news, for example). This is what sets them apart from films, music, and other long-lived content that is expensive to produce.
  • Transfer knowledge from one domain to another. If the recommender system has well-trained models for text articles and we add video, we can reuse the existing models so that the new content type is ranked better.

I will tell you how we solved these problems.

Candidate selection

How can we shrink the set of documents under consideration by a factor of thousands within a few milliseconds, with practically no loss of ranking quality?

Suppose we have trained many ML models, generated features from them, and trained one more model that ranks documents for the user. All would be well, except that you cannot simply compute every feature for every document in real time when there are millions of documents and recommendations must be built within 100-200 ms. The task is to select from those millions a subset that will then be ranked for the user. This stage is usually called candidate selection, and it has several requirements. First, selection must be very fast, so that as much time as possible is left for the ranking itself. Second, having sharply reduced the number of documents to rank, we must keep as many of the documents relevant to the user as possible.

Our approach to candidate selection has evolved, and we have currently arrived at a multi-stage scheme:

First, all documents are divided into groups, and the most popular documents are taken from each group. Groups can be sites, topics, or clusters. For each user, the groups closest to them are chosen based on their history, and the best documents are taken from those groups. We also use a kNN index to select, in real time, the documents closest to the user. There are several ways to build a kNN index; HNSW (Hierarchical Navigable Small World graphs) worked best for us. It is a hierarchical model that finds the N vectors nearest to a user in a database of millions within a few milliseconds. We index our entire document database offline in advance. Because search in the index is fast, if there are several strong embeddings you can build several indexes (one per embedding) and query each of them in real time.
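The multi-stage selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Zen's code. Each document carries a hypothetical group label and popularity score, and the user's closeness to each group is given as a number. The HNSW index is replaced by an exact brute-force inner-product search, which returns the same kind of answer but scans every document. A production system would query a real HNSW implementation (for example, the hnswlib library) instead.

```python
import heapq
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def popular_by_group(docs, per_group):
    """docs: iterable of (doc_id, group, popularity); keep the most popular docs of each group."""
    groups = defaultdict(list)
    for doc_id, group, popularity in docs:
        groups[group].append((popularity, doc_id))
    return {g: [doc_id for _, doc_id in heapq.nlargest(per_group, items)]
            for g, items in groups.items()}


def group_candidates(group_affinity, top_by_group, n_groups):
    """Pick the user's closest groups (hypothetical affinity scores) and pool their top docs."""
    closest = heapq.nlargest(n_groups, group_affinity, key=group_affinity.get)
    return [doc_id for g in closest for doc_id in top_by_group.get(g, [])]


def knn_candidates(user_vec, doc_vecs, n):
    """Exact top-n by inner product: a brute-force stand-in for an HNSW index."""
    return heapq.nlargest(n, doc_vecs, key=lambda doc_id: dot(user_vec, doc_vecs[doc_id]))


def select_candidates(group_affinity, user_vecs, top_by_group, indexes, n_groups, n_knn):
    """Union of group-based candidates and one kNN lookup per embedding space."""
    pool = set(group_candidates(group_affinity, top_by_group, n_groups))
    for space, doc_vecs in indexes.items():
        pool.update(knn_candidates(user_vecs[space], doc_vecs, n_knn))
    return pool


docs = [("a", "sport", 10), ("b", "sport", 5), ("c", "tech", 8), ("d", "tech", 1)]
top = popular_by_group(docs, per_group=1)
index = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.5, 0.5]}
pool = select_candidates({"sport": 0.9, "tech": 0.1}, {"text": [1.0, 0.0]},
                         top, {"text": index}, n_groups=1, n_knn=2)
print(sorted(pool))  # → ['a', 'x', 'z']
```

Each embedding space gets its own lookup, which mirrors the one-index-per-embedding setup. The results are merged into a single candidate pool.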

That still leaves many thousands of documents for each user. This is too many to compute all the features for, so at this stage we apply light ranking: a lightweight version of the heavy ranking model with fewer features. Its task is to predict which documents the heavy model will put at the top. The documents with the highest predictions go on to the heavy model, that is, to the final ranking stage. This approach reduces the pool of documents considered for a user from millions to thousands within tens of milliseconds.
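The light-then-heavy cascade can be sketched the same way. The two linear scorers and their feature names are hypothetical placeholders for the real models. The point is only the control flow: the cheap model sees every candidate, and the expensive model sees only the shortlist.

```python
import heapq


def linear_model(weights):
    """Hypothetical linear scorer over a dict of named features."""
    return lambda feats: sum(w * feats.get(name, 0.0) for name, w in weights.items())


def cascade(candidates, features, light, heavy, keep):
    """Score every candidate with the cheap model, keep the `keep` best,
    and rerank only that shortlist with the expensive model."""
    shortlist = heapq.nlargest(keep, candidates, key=lambda d: light(features[d]))
    return sorted(shortlist, key=lambda d: heavy(features[d]), reverse=True)


features = {
    "a": {"ctr": 0.9, "topic": 0.1},
    "b": {"ctr": 0.5, "topic": 0.9},
    "c": {"ctr": 0.1, "topic": 1.0},
}
light = linear_model({"ctr": 1.0})                # few features: cheap to compute
heavy = linear_model({"ctr": 1.0, "topic": 1.0})  # more features: expensive in practice
print(cascade(features, features, light, heavy, keep=2))  # → ['b', 'a']
```

In this toy data, "c" would score 1.1 under the heavy model and beat "a", but the light model drops it before reranking. This is why the light model is trained to predict the heavy model's top rather than to rank well on its own.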

ALS step at runtime

How can a user's feedback be taken into account immediately after a click?

An important factor in recommendations is how quickly the system reacts to user feedback. This matters especially for new users: when someone has just started using the recommender system, they get a non-personalized feed of documents on various topics. As soon as they make their first click, we need to take it into account and adapt to their interests. If all factors are computed offline, a fast system response is impossible because of the delay, so user actions have to be processed in real time. For this purpose, we use an ALS step at runtime to build the user's vector representation.

Suppose we have a vector representation for every document. For example, we can build embeddings offline from the article text using ELMo, BERT, or other ML models. How can we obtain vector representations of users in the same space based on their interactions with the system?

General principle of building and decomposing the user-document matrix

Suppose we have m users and n documents. For some users, their attitude toward certain documents is known. This information can be represented as an m × n matrix: rows correspond to users and columns to documents. Since a person has not seen most documents, most cells of the matrix stay empty, while others are filled in. Each event (like, dislike, click) is assigned some value in the matrix, but let us consider a simplified model in which a like corresponds to 1 and a dislike to -1.

Let us decompose the matrix into two: P (m × d) and Q (d × n), where d is the dimension of the vector representation (usually a small number). Each object then corresponds to a d-dimensional vector: for a user, a row of P; for a document, a column of Q. These vectors are the embeddings of the corresponding objects. To predict whether a user will like a document, you can simply multiply their embeddings.
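Here is a minimal sketch of these data structures, assuming events arrive as hypothetical (user, document, event) triples. Only the observed cells of the m × n matrix are stored, and a prediction is the dot product of a row of P with a column of Q. Mapping only likes and dislikes follows the simplified model above. Letting a repeated event overwrite an earlier one is an assumption, and the numbers in P and Q are made up.

```python
# Simplified encoding from the text: like -> 1, dislike -> -1; other events are ignored here.
EVENT_VALUE = {"like": 1.0, "dislike": -1.0}


def build_matrix(events):
    """Sparse m x n user-document matrix: user -> {doc: value}, observed cells only."""
    r = {}
    for user, doc, event in events:
        if event in EVENT_VALUE:
            r.setdefault(user, {})[doc] = EVENT_VALUE[event]  # a later event overwrites an earlier one
    return r


def predict(p, q, u, i):
    """Dot product of row u of P (m x d) with column i of Q (d x n)."""
    return sum(p[u][k] * q[k][i] for k in range(len(q)))


events = [(0, 1, "like"), (0, 2, "dislike"), (1, 1, "click"), (1, 0, "like")]
R = build_matrix(events)
print(R)  # → {0: {1: 1.0, 2: -1.0}, 1: {0: 1.0}}

P = [[1.0, 0.0], [0.0, 2.0]]             # m = 2 users, d = 2
Q = [[0.5, 1.0, -1.0], [1.0, 0.0, 0.5]]  # d = 2, n = 3 documents
print(predict(P, Q, 1, 0))  # → 2.0
```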

One possible way to decompose the matrix is ALS (Alternating Least Squares). We will optimize the following loss function:

$$\sum_{(u,i)\,:\,r_{ui}\ \text{known}} \left(r_{ui} - q_i^\top p_u\right)^2$$

Here r_ui is the interaction of user u with document i, q_i is the vector of document i, and p_u is the vector of user u.
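The squared-error loss can be computed directly from the sparse matrix. In this sketch, user and document vectors are stored in dicts keyed by id, so Q is kept as its columns. The optional L2 penalty `lam` is an assumption that the text does not state. With `lam = 0` the function returns the plain sum of squared errors over the observed cells.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(r, p, q, lam=0.0):
    """Sum of (r_ui - q_i . p_u)^2 over the observed cells of r (user -> {doc: value}).

    p maps users to vectors and q maps documents to vectors (the columns of Q).
    lam > 0 adds an L2 penalty on every vector; this term is an assumption,
    not stated in the text.
    """
    err = sum((r_ui - dot(q[i], p[u])) ** 2
              for u, row in r.items() for i, r_ui in row.items())
    reg = sum(dot(v, v) for v in p.values()) + sum(dot(v, v) for v in q.values())
    return err + lam * reg


R = {0: {0: 1.0, 1: -1.0}}
P = {0: [1.0, 0.0]}
Q = {0: [1.0, 0.0], 1: [0.0, 1.0]}
print(loss(R, P, Q))           # → 1.0
print(loss(R, P, Q, lam=0.5))  # → 2.5
```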

With the document vectors fixed, the user vector that is optimal in terms of mean squared error can then be found analytically by solving the corresponding linear regression.
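The analytic solution can be written out explicitly: setting the gradient of the per-user loss to zero gives a d × d linear system. Here I_u denotes the set of documents user u has interacted with. The L2 penalty λ‖p_u‖² is an assumption added for illustration. It is a common choice in ALS that keeps the system invertible even when a user has fewer interactions than d. With λ = 0 it reduces to ordinary least squares.

```latex
\begin{aligned}
&\min_{p_u}\; \sum_{i \in I_u} \bigl(r_{ui} - q_i^\top p_u\bigr)^2 + \lambda \lVert p_u \rVert^2 \\
&\nabla_{p_u} = -2 \sum_{i \in I_u} \bigl(r_{ui} - q_i^\top p_u\bigr)\, q_i + 2\lambda\, p_u = 0 \\
&\Longrightarrow\; \Bigl(\sum_{i \in I_u} q_i q_i^\top + \lambda I_d\Bigr)\, p_u = \sum_{i \in I_u} r_{ui}\, q_i
\end{aligned}
```

Building the left-hand matrix costs O(|I_u| d²) and solving the system costs O(d³). Neither term depends on the total number of documents, which is what makes the step cheap.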

This is called the "ALS step". The ALS algorithm itself consists of alternately fixing one of the matrices (users or articles) and updating the other by finding its optimal solution.
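Below is a minimal, dependency-free sketch of the full alternating procedure. It assumes the regularized form of the ALS step with a hypothetical λ = 0.1, random initialization of the document vectors, and a fixed number of epochs instead of a convergence test. Each half-step solves the d × d system independently for every row of the matrix being updated.

```python
import random


def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def als_row(ratings, fixed, d, lam):
    """ALS step for one row: solve (sum_j f_j f_j^T + lam I) x = sum_j r_j f_j."""
    a = [[lam if k == c else 0.0 for c in range(d)] for k in range(d)]
    b = [0.0] * d
    for j, r in ratings.items():
        f = fixed[j]
        for k in range(d):
            b[k] += r * f[k]
            for c in range(d):
                a[k][c] += f[k] * f[c]
    return solve(a, b)


def transpose(r):
    """user -> {doc: value} becomes doc -> {user: value}."""
    t = {}
    for u, row in r.items():
        for i, val in row.items():
            t.setdefault(i, {})[u] = val
    return t


def objective(r, p, q, lam):
    """Squared error on observed cells plus lam times the squared norms of all vectors."""
    err = sum((v - sum(x * y for x, y in zip(p[u], q[i]))) ** 2
              for u, row in r.items() for i, v in row.items())
    reg = sum(x * x for vecs in (p, q) for vec in vecs.values() for x in vec)
    return err + lam * reg


def als(r, d=2, lam=0.1, epochs=10, seed=0):
    """Alternate: fix Q and solve every user row, then fix P and solve every document column."""
    rng = random.Random(seed)
    rt = transpose(r)
    q = {i: [rng.uniform(-0.5, 0.5) for _ in range(d)] for i in rt}
    p = {}
    for _ in range(epochs):
        p = {u: als_row(row, q, d, lam) for u, row in r.items()}
        q = {i: als_row(col, p, d, lam) for i, col in rt.items()}
    return p, q


R = {0: {0: 1.0, 1: -1.0}, 1: {0: 1.0, 2: -1.0}, 2: {1: 1.0, 2: 1.0}}
P, Q = als(R, d=2, lam=0.1, epochs=20)
print(round(objective(R, P, Q, 0.1), 4))
```

Every half-step exactly minimizes the regularized objective over one matrix while the other is held fixed. As a result, the loss never increases from one epoch to the next.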

Fortunately, finding a user's vector representation is a fairly fast operation that can be done at runtime using vector instructions. This trick lets us take user feedback into account in ranking immediately. The same embedding can also be used in the kNN index to improve candidate selection.
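A sketch of the runtime step follows. It assumes that document vectors were computed offline, that the user's feedback so far is encoded as like = 1 and dislike = -1, and a hypothetical regularization λ = 0.1. Retrieval is a brute-force dot-product scan standing in for the kNN index. Plain Python is used here, so no vector instructions are involved.

```python
def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def user_vector(clicks, doc_vecs, d, lam=0.1):
    """Runtime ALS step: (sum q q^T + lam I) p = sum r q over the documents the user reacted to."""
    a = [[lam if k == c else 0.0 for c in range(d)] for k in range(d)]
    b = [0.0] * d
    for doc, r in clicks.items():
        q = doc_vecs[doc]
        for k in range(d):
            b[k] += r * q[k]
            for c in range(d):
                a[k][c] += q[k] * q[c]
    return solve(a, b)


def recommend(clicks, doc_vecs, n, lam=0.1):
    """Fit the user vector, then take the top-n unseen documents by dot product
    (brute force here; in production this lookup would go to the kNN index)."""
    d = len(next(iter(doc_vecs.values())))
    p = user_vector(clicks, doc_vecs, d, lam)
    unseen = [doc for doc in doc_vecs if doc not in clicks]
    return sorted(unseen, key=lambda doc: sum(x * y for x, y in zip(p, doc_vecs[doc])),
                  reverse=True)[:n]


DOCS = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [-1.0, 0.0]}
print(recommend({"a": 1.0}, DOCS, n=2))  # one like on "a" → ['c', 'b']
```

The work per request is one small d × d solve over the handful of documents the user has reacted to. That is why it fits into the response budget.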

Distributed collaborative filtering

How can matrix factorization be made incremental, and how can vectors for new articles be found quickly?

Content is not the only source of signals. Another important source is collaborative information. Good features have traditionally been obtained from decomposing the user-document matrix. But when we tried to do such a decomposition, we ran into problems:

1. We have millions of documents and tens of millions of users. The matrix does not fit entirely on one machine, and decomposing it would take a very long time.
2. Most content in the system has a short lifespan: documents stay relevant for only a few hours. So their vector representations need to be built as quickly as possible.
3. If the decomposition is built right after a document is published, not enough users will have had time to rate it, so its vector representation will probably not be very good.
4. If a user likes or dislikes a document, we cannot take that into account in the decomposition right away.

To solve these problems, we implemented a distributed decomposition of the user-document matrix with frequent incremental updates. How exactly does it work?

Suppose we have a cluster of N machines (N is in the hundreds) and want to use it to decompose a matrix that does not fit on a single machine. The question is how to split the work so that, on the one hand, each machine has enough data and, on the other, the computations are independent.

We will use the ALS decomposition algorithm described above. Let us look at how to perform one ALS step in a distributed way; the other steps are analogous. Suppose the document matrix is fixed and we want to compute the user matrix. To do this, we split the user-document matrix by rows into N parts, each containing roughly the same number of rows. We send each machine the non-empty cells of its rows, plus the entire document embedding matrix. Since the embedding matrix is not very large and the user-document matrix is usually very sparse, this data fits on an ordinary machine.
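The row-sharded ALS step can be simulated in a single process. This sketch assumes contiguous shards of sorted user ids and a hypothetical λ = 0.1. The "machines" are ordinary function calls run one after another, standing in for real cluster workers. Each call receives only its shard's non-empty cells and the full document matrix, matching the split described above.

```python
def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def als_row(ratings, fixed, d, lam):
    """ALS step for one row: solve (sum_j f_j f_j^T + lam I) x = sum_j r_j f_j."""
    a = [[lam if k == c else 0.0 for c in range(d)] for k in range(d)]
    b = [0.0] * d
    for j, r in ratings.items():
        f = fixed[j]
        for k in range(d):
            b[k] += r * f[k]
            for c in range(d):
                a[k][c] += f[k] * f[c]
    return solve(a, b)


def shard_rows(r, n_machines):
    """Split user rows into contiguous shards with roughly equal numbers of rows."""
    users = sorted(r)
    if not users:
        return []
    size = -(-len(users) // n_machines)  # ceiling division
    return [{u: r[u] for u in users[k:k + size]} for k in range(0, len(users), size)]


def machine_step(shard, q, d, lam):
    """Work done on one machine: its non-empty cells plus the full document matrix q."""
    return {u: als_row(row, q, d, lam) for u, row in shard.items()}


def distributed_user_step(r, q, d, n_machines, lam=0.1):
    # Shards share no rows, so these calls are independent and could run on separate
    # machines; here they simply run one after another in a single process.
    parts = [machine_step(shard, q, d, lam) for shard in shard_rows(r, n_machines)]
    p = {}
    for part in parts:
        p.update(part)
    return p


R = {u: {0: 1.0, 1: -1.0 if u % 2 else 1.0} for u in range(5)}
Q = {0: [1.0, 0.2], 1: [0.3, 1.0]}
print([sorted(s) for s in shard_rows(R, 2)])  # → [[0, 1, 2], [3, 4]]
P = distributed_user_step(R, Q, d=2, n_machines=2)
```

Each user row is solved using only that row and Q, so the merged result is identical to a single-machine step, however the rows are sharded.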

This trick can be repeated for several epochs until the model converges, alternating which matrix is fixed. But even then, the matrix decomposition can take several hours. It also does not solve the problem that we need to get embeddings for new documents quickly, and to update the embeddings of documents we had little information about when the model was built.

What helped us was introducing fast incremental model updates. Suppose we have a currently trained model. Since it was trained, new articles have appeared that our users have interacted with, and some articles had few interactions during training. To get embeddings for such articles quickly, we take the user embeddings obtained during the first large training run and perform one ALS step to compute the document matrix with the user matrix fixed. This lets us get embeddings quickly, within a few minutes after a document is published, and update the embeddings of fresh documents often.
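Here is a sketch of the incremental update under assumptions. Document vectors are recomputed by a document-side ALS step with the user vectors fixed, using a hypothetical λ = 0.1. Only documents that were absent at training time, or had fewer interactions than a hypothetical `min_train` threshold, are refreshed. Users who appeared after the last full training have no fixed vector yet, so their interactions are skipped in this step.

```python
def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def als_row(ratings, fixed, d, lam):
    """ALS step for one row: solve (sum_j f_j f_j^T + lam I) x = sum_j r_j f_j."""
    a = [[lam if k == c else 0.0 for c in range(d)] for k in range(d)]
    b = [0.0] * d
    for j, r in ratings.items():
        f = fixed[j]
        for k in range(d):
            b[k] += r * f[k]
            for c in range(d):
                a[k][c] += f[k] * f[c]
    return solve(a, b)


def incremental_doc_update(doc_vecs, train_counts, interactions, user_vecs, d,
                           lam=0.1, min_train=5):
    """Recompute vectors of new or sparsely trained documents with the user matrix fixed.

    interactions: doc -> {user: value}, everything known now.
    train_counts: doc -> number of interactions seen at training time.
    min_train is a hypothetical threshold.
    """
    updated = dict(doc_vecs)
    for doc, ratings in interactions.items():
        if train_counts.get(doc, 0) >= min_train:
            continue  # well covered at training time: keep its vector
        known = {u: v for u, v in ratings.items() if u in user_vecs}  # users with a trained vector
        if known:
            updated[doc] = als_row(known, user_vecs, d, lam)
    return updated


USERS = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
DOCS = {"old": [0.5, 0.5]}
COUNTS = {"old": 100}
FRESH = {"old": {"u1": 1.0}, "new": {"u1": 1.0, "u2": -1.0, "u3": 1.0}}  # u3 joined after training
updated = incremental_doc_update(DOCS, COUNTS, FRESH, USERS, d=2)
print(updated["old"])  # → [0.5, 0.5]
```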

To make recommendations reflect people's actions immediately, we do not use the user embeddings computed offline at runtime. Instead, we perform an ALS step and get an up-to-date user vector.

Transfer to another domain

How can user feedback on text articles be used to build vector representations of videos?

Initially we recommended only text articles, so many of our algorithms are tailored to that content type. But when we added other types of content, we had to adapt the models. How did we solve this problem for video? One option is to retrain all models from scratch. But that takes a long time, and some algorithms need a large training sample. For a new content type, that sample does not yet exist in the required volume during its first moments on the service.

We took a different route and reused the text models for video. The same ALS trick helped us build vector representations of videos. We took the users' vector representations based on text articles and performed an ALS step using information about video views. This way we easily obtained a vector representation for each video. At runtime, we simply compute the closeness between the user vector obtained from text articles and the video vector.
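A sketch of this transfer follows. Video vectors come from one ALS step in which the text-based user vectors stay fixed. Closeness at runtime is taken to be the dot product. Treating each view as a +1 signal is a hypothetical encoding, because the text only says that view information was used. λ = 0.1 is also an assumption.

```python
def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def als_row(ratings, fixed, d, lam):
    """ALS step for one row: solve (sum_j f_j f_j^T + lam I) x = sum_j r_j f_j."""
    a = [[lam if k == c else 0.0 for c in range(d)] for k in range(d)]
    b = [0.0] * d
    for j, r in ratings.items():
        f = fixed[j]
        for k in range(d):
            b[k] += r * f[k]
            for c in range(d):
                a[k][c] += f[k] * f[c]
    return solve(a, b)


def video_vectors(views, user_vecs, d, lam=0.1):
    """ALS step with the text-based user vectors fixed; views: video -> {user: signal}."""
    return {v: als_row({u: s for u, s in viewers.items() if u in user_vecs}, user_vecs, d, lam)
            for v, viewers in views.items()}


def rank_videos(user_vec, vids):
    """Runtime: order videos by dot product with the user's text-based vector."""
    return sorted(vids, key=lambda v: sum(x * y for x, y in zip(user_vec, vids[v])),
                  reverse=True)


USERS = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}  # learned from text articles
VIEWS = {"v1": {"u1": 1.0}, "v2": {"u2": 1.0}}
vids = video_vectors(VIEWS, USERS, d=2)
print(rank_videos(USERS["u1"], vids))  # → ['v1', 'v2']
```

No video-specific model is trained here. The only new computation is one ALS half-step over the view data, which is why this works before much video feedback has accumulated.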

Conclusion

Developing the core of a real-time recommender system involves many tasks. You need to process data quickly and apply ML methods to use this data effectively. You need to build complex distributed systems that can process user signals and new units of content in minimal time. And there are many other tasks besides.

In the current system, whose design I have described, the quality of a user's recommendations grows with their activity and with how long they stay on the service. But of course, this is where the main difficulty lies: it is hard for the system to immediately understand the interests of someone who has interacted little with the content. Improving recommendations for new users is our key goal. We will keep improving the algorithms so that content relevant to a person reaches their feed faster, and irrelevant content is not shown.

Source: www.habr.com
