Hello everyone! My name is Sasha, and I am CTO & Co-Founder at LoyaltyLab. Two years ago my friends and I, like all poor students, went out one evening to buy beer at the store nearest to home. We were very annoyed that the retailer, knowing we would come for beer, did not offer a discount on chips or crackers, even though that would have been so logical! We did not understand why this happens, so we decided to start our own company. And, as a bonus, to give ourselves discounts every Friday on those same chips.
And it all got to the point where I am presenting material on the technical side of the product.
Introduction
Like everyone else at the start of the journey, we began with an overview of how recommender systems are built. The most popular architecture turned out to be the following type:
It consists of two parts:
- Sampling candidates for recommendations with a simple and fast model, usually a collaborative one.
- Ranking the candidates with a more complex and slower content model that takes into account all the features available in the data.
From here on I will use the following terms:
- candidate / recommendation candidate: a user-product pair that could potentially end up in the recommendations in production.
- candidate extraction method / extractor: a process or method for extracting "recommendation candidates" from the available data.
The first step usually involves different variations of collaborative filtering. The most popular -
Before I move on to describing our approach, it is important to note that for real-time recommendations, where data from the last 30 minutes has to be taken into account, there really are not many approaches that can run in the required time. In our case, however, we need to build recommendations no more than once a day, and in most cases once a week, which lets us use complex models and improve quality many times over.
Let's take as a baseline the metrics that ALS alone shows on the candidate extraction task. The key metrics we monitor are:
- Precision: the share of correctly selected candidates among those sampled.
- Recall: the share of candidates that actually occurred among all the pairs that fell into the target interval.
- F1-score: the F-measure computed from the two previous metrics.
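To make these three definitions concrete, here is a minimal sketch that treats candidates and actual purchases as sets of (user, product) pairs. The function name and the toy data are assumptions made for illustration, not part of our pipeline.

```python
def candidate_metrics(candidates, purchases):
    """Precision, recall and F1 for sets of (user, product) pairs."""
    candidates, purchases = set(candidates), set(purchases)
    hits = len(candidates & purchases)  # candidates that were really bought
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(purchases) if purchases else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy data, invented for the example.
candidates = {("u1", "beer"), ("u1", "chips"), ("u2", "milk"), ("u2", "bread")}
purchases = {("u1", "beer"), ("u2", "milk"), ("u2", "butter")}

precision, recall, f1 = candidate_metrics(candidates, purchases)
print(precision, recall, f1)
```

Here two of the four candidates were bought, and two of the three real purchases were covered, so precision is 0.5 and recall is 2/3.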
We will also look at the metrics of the final model, after training gradient boosting with additional content features. There are 3 main metrics here as well:
- precision@5: the average share of purchased products among the top 5 by probability for each customer.
- response-rate@5: the conversion of customers from a store visit into the purchase of at least one personal offer (one offer contains 5 products).
- avg roc-auc per user: the roc-auc computed for each customer and then averaged.
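The first two final metrics can be sketched as follows. This is an illustrative reading of the definitions above, assuming that `rankings` only contains customers who visited the store in the target period; all names and data are hypothetical.

```python
def precision_at_k(ranked, bought, k=5):
    """Share of the top-k ranked products that the customer bought."""
    return sum(1 for p in ranked[:k] if p in bought) / k


def response_at_k(ranked, bought, k=5):
    """1.0 if the customer bought at least one of the top-k products."""
    return 1.0 if any(p in bought for p in ranked[:k]) else 0.0


def mean_over_users(metric, rankings, purchases, k=5):
    """Average a per-user metric over all users present in `rankings`."""
    return sum(
        metric(ranked, purchases.get(user, set()), k)
        for user, ranked in rankings.items()
    ) / len(rankings)


# Toy data: products already sorted by predicted purchase probability.
rankings = {
    "u1": ["a", "b", "c", "d", "e", "f"],
    "u2": ["x", "y", "z", "w", "v"],
}
purchases = {"u1": {"a", "c", "f"}, "u2": {"q"}}

p_at_5 = mean_over_users(precision_at_k, rankings, purchases)
rr_at_5 = mean_over_users(response_at_k, rankings, purchases)
print(p_at_5, rr_at_5)
```

Note that product "f" is bought by u1 but sits at position 6, so it does not count toward either metric.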
It is important to note that all these metrics are measured on
Before we start describing our approach, let's first look at the baseline, which is a model trained with ALS.
Candidate extraction metrics:
Final metrics:
I treat every algorithm implementation as a kind of business hypothesis. Very roughly, any collaborative model can be seen as the hypothesis that "people tend to buy what people similar to them buy." As I have already said, we did not limit ourselves to that kind of semantics, and here are some hypotheses that work well on offline retail data:
- What I have already bought before.
- Similar to what I bought before.
- The period of a long-past purchase.
- Popular by category/brand.
- Alternating purchases of different goods from week to week (Markov chains).
- Products similar to the customer, according to features built by different models (Word2Vec, DSSM, etc.).
What did you buy before?
The most obvious heuristic, and one that works very well in grocery retail. Here we take all the goods that the loyalty card holder bought in the last K days (usually 1-3 weeks), or in the same K days a year ago. Applying only this method, we get the following metrics:
It is fairly obvious that the longer the period we take, the higher the recall and the lower the precision, and vice versa. On average across clients, the "last 2 weeks" gives the best results.
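A minimal sketch of this extractor, assuming transactions are available as (user, product, date) tuples. The function name, the tuple layout and the toy data are illustrative assumptions, and the "K days a year ago" variant is not covered.

```python
from datetime import date, timedelta


def bought_before(transactions, target_start, k_days=14):
    """Candidates = (user, product) pairs bought in the K days before target_start."""
    window_start = target_start - timedelta(days=k_days)
    return {
        (user, product)
        for user, product, day in transactions
        if window_start <= day < target_start
    }


# Toy transaction log, invented for the example.
transactions = [
    ("u1", "beer", date(2024, 3, 10)),
    ("u1", "chips", date(2024, 2, 1)),
    ("u2", "milk", date(2024, 3, 1)),
    ("u2", "milk", date(2024, 3, 14)),
    ("u2", "bread", date(2024, 3, 15)),  # already inside the target interval
]
target_start = date(2024, 3, 15)

narrow = bought_before(transactions, target_start, k_days=14)
wide = bought_before(transactions, target_start, k_days=60)
print(sorted(narrow))
print(sorted(wide))
```

The wider window always returns a superset of the narrower one, which is exactly why recall grows and precision drops as the period gets longer.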
Similar to what I bought before
It is not surprising that "what I bought before" works well for grocery retail, but extracting candidates only from what the user has already bought is not very exciting, because we are unlikely to surprise the customer with a new product. So we propose to improve this heuristic slightly using the same collaborative models. From the vectors obtained during ALS training, we can find products similar to those the user has already bought. The idea is very close to "similar videos" on video streaming services, but since we do not know what the user is eating or buying at a given moment, we can only look for items similar to what they have already bought, especially since we already know how well that works. Applying this method to user transactions over the last 2 weeks, we get the following metrics:
Here k is the number of similar products retrieved for each product the customer bought in the last 14 days.
This approach worked especially well for one of our clients, for whom it was critical not to recommend anything that was already in the user's purchase history.
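The idea can be sketched with plain cosine similarity over item vectors. In practice the vectors would come from the trained ALS model; here they are small hand-made lists, and the function names and the `exclude` parameter are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_candidates(item_vectors, recent_items, k=1, exclude=()):
    """For each recently bought item take its k nearest items by cosine."""
    candidates = set()
    for item in recent_items:
        scored = sorted(
            (
                (cosine(item_vectors[item], vec), other)
                for other, vec in item_vectors.items()
                if other != item
            ),
            reverse=True,
        )
        candidates.update(other for _, other in scored[:k])
    # Optionally drop everything the customer already has in their history.
    return candidates - set(exclude)


# Hand-made 2-d "embeddings", invented for the example.
item_vectors = {
    "beer": [1.0, 0.0],
    "lager": [0.9, 0.1],
    "chips": [0.7, 0.7],
    "milk": [0.0, 1.0],
    "kefir": [0.1, 0.9],
}

print(similar_candidates(item_vectors, ["beer", "milk"], k=1))
```

A brute-force scan like this is only reasonable for a toy catalogue; for real assortments an approximate nearest-neighbor index would normally replace the inner loop.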
Period of a long-past purchase
As we have already seen, because goods are bought so frequently, the first approach works well for our specifics. But what about goods like washing powder, shampoo and so on? That is, products that are unlikely to be needed every week or two and that the previous methods cannot extract. This leads to the following idea: compute the purchase period of each product, averaged over the customers who bought that product more than k times, and then extract what the customer has most likely already run out of. The computed periods can be checked by eye for plausibility:
Then we check whether the end of the product's period falls within the time interval in which the recommendations will be in production, and sample whatever falls in. The approach can be illustrated like this:
Here there are 2 main cases to consider:
- Whether to sample products for customers who bought the product fewer than K times.
- Whether to sample a product if the end of its period falls before the start of the target interval.
The following graph shows what results this method achieves with different hyperparameters:
ft: take only customers who bought the product at least K times (here K=5).
tm: take only candidates that fall within the target interval.
It is not surprising that the (0, 0) configuration has the highest recall and the lowest precision, since under this condition the largest number of candidates is retrieved. However, the best results are achieved when we do not sample products for customers who bought a given product fewer than k times, and do extract, among others, the goods whose period ended before the target interval.
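A rough sketch of this extractor with the two switches, ft and tm. It assumes the purchase history is stored as sorted date lists per (user, product) pair; the function names are hypothetical, and the toy example uses a threshold of 2 purchases instead of K=5 to keep the data small.

```python
from collections import defaultdict
from datetime import date, timedelta


def mean_periods(history, min_purchases=2):
    """Average gap in days between purchases of each product,
    over customers with at least `min_purchases` purchases of it."""
    gaps = defaultdict(list)
    for (_, product), days in history.items():
        if len(days) >= min_purchases:
            gaps[product] += [(b - a).days for a, b in zip(days, days[1:])]
    return {p: sum(g) / len(g) for p, g in gaps.items() if g}


def overdue_candidates(history, periods, start, end,
                       min_purchases=2, ft=True, tm=True):
    """ft: skip customers with too few purchases of the product.
    tm: keep only products whose period ends inside [start, end]."""
    result = set()
    for (user, product), days in history.items():
        if product not in periods:
            continue
        if ft and len(days) < min_purchases:
            continue
        due = days[-1] + timedelta(days=round(periods[product]))
        if due > end:
            continue  # not running out yet
        if tm and due < start:
            continue  # ran out before the target interval
        result.add((user, product))
    return result


# Toy history, invented for the example.
d0 = date(2023, 1, 1)
history = {
    ("u1", "shampoo"): [d0, d0 + timedelta(30), d0 + timedelta(60)],
    ("u2", "shampoo"): [d0 + timedelta(58)],
    ("u3", "shampoo"): [d0, d0 + timedelta(30)],
}
periods = mean_periods(history)
start, end = d0 + timedelta(85), d0 + timedelta(92)

strict = overdue_candidates(history, periods, start, end, ft=True, tm=True)
best = overdue_candidates(history, periods, start, end, ft=True, tm=False)
loose = overdue_candidates(history, periods, start, end, ft=False, tm=False)
print(periods, strict, best, loose)
```

With ft=False and tm=False every pair is returned, which mirrors the highest-recall, lowest-precision configuration described above.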
Popular by category
Another fairly obvious idea is to sample popular products across different categories or brands. Here, for each customer we compute the top-k "favorite" categories/brands and extract the "popular" items from each such category/brand. In our case, both "favorite" and "popular" are determined by the number of purchases of a product. An additional advantage of this approach is that it works in the cold-start case, that is, for customers who have made very few purchases, have not been to the store for a long time, or have only just been issued a loyalty card. For them it is easiest and best to throw in products that are popular among customers who do have a history. The resulting metrics are:
Here the number after the word "category" denotes the nesting level of the category.
Overall, it is also not surprising that narrower categories achieve better results, since they extract more precise "favorite" products for customers.
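A small sketch of this heuristic, with both "favorite" and "popular" defined by purchase counts as described above. The function name, the `category_of` mapping and the data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def popular_in_favorites(purchases, category_of, top_categories=1, top_products=2):
    """purchases: list of (user, product), one entry per purchase."""
    popularity = defaultdict(Counter)  # category -> product purchase counts
    favorites = defaultdict(Counter)   # user -> category purchase counts
    for user, product in purchases:
        category = category_of[product]
        popularity[category][product] += 1
        favorites[user][category] += 1

    candidates = set()
    for user, categories in favorites.items():
        for category, _ in categories.most_common(top_categories):
            for product, _ in popularity[category].most_common(top_products):
                candidates.add((user, product))
    return candidates


# Toy data, invented for the example.
category_of = {
    "lager": "beer", "stout": "beer", "ipa": "beer",
    "chips": "snacks", "nuts": "snacks",
}
purchases = [
    ("u1", "lager"), ("u1", "lager"), ("u1", "chips"),
    ("u2", "nuts"), ("u2", "nuts"), ("u2", "lager"),
    ("u3", "stout"), ("u3", "ipa"), ("u3", "stout"),
]

candidates = popular_in_favorites(purchases, category_of)
print(sorted(candidates))
```

Customer u1 has never bought stout, yet receives it as a candidate because beer is their favorite category and stout is popular there. For a cold-start customer with no history, one simple fallback would be the globally most popular products.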
Alternating purchases of different goods from week to week
An interesting approach that I have not seen in articles about recommender systems is a fairly simple yet effective statistical method based on Markov chains. Here we take 2 different weeks and, for each customer, build pairs of products [bought in week i]-[bought in week j], where j > i, and from these we compute for each product the probability of switching to another product next week. That is, for each pair (product_i, product_j) we count how many times it occurs among the pairs found and divide by the number of pairs in which product_i was in the first week. To extract candidates, we take the customer's last receipt and pull out the top-k most likely next products from the transition matrix we obtained. The process of building the transition matrix looks like this:
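The counting procedure just described can be sketched as follows, assuming the two weeks are given as per-customer sets of products. Function names and data are illustrative only.

```python
from collections import Counter, defaultdict


def transition_matrix(week_i, week_j):
    """week_i, week_j: {user: set of products bought in that week}."""
    pair_counts = Counter()   # (product_i, product_j) -> number of pairs
    first_counts = Counter()  # product_i -> number of pairs it starts
    for user, first_basket in week_i.items():
        for a in first_basket:
            for b in week_j.get(user, ()):
                pair_counts[(a, b)] += 1
                first_counts[a] += 1

    matrix = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        matrix[a][b] = n / first_counts[a]
    return matrix


def markov_candidates(matrix, last_receipt, k=1):
    """Top-k most likely next products for every product in the last receipt."""
    candidates = set()
    for a in last_receipt:
        ranked = sorted(matrix.get(a, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        candidates.update(b for b, _ in ranked[:k])
    return candidates


# Toy baskets, invented for the example.
week_i = {"u1": {"orange"}, "u2": {"orange"}, "u3": {"orange", "butter"}}
week_j = {"u1": {"tangerine"}, "u2": {"tangerine"}, "u3": {"orange", "butter"}}

matrix = transition_matrix(week_i, week_j)
print(dict(matrix))
print(markov_candidates(matrix, {"orange"}, k=1))
```

Each row of the matrix sums to 1, so the values can be read as the probability of moving from one product to another in the following week.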
From real examples in the transition probability matrix we see the following interesting phenomena:
Here you can notice interesting dependencies that show up in consumer behavior: for example, citrus lovers, or a brand of milk from which customers are likely to switch to another. It is also not surprising that products with a high frequency of repeat purchases, such as butter, end up here too.
The metrics for the Markov chain method are as follows:
k is the number of products retrieved for each product bought in the customer's last transaction.
As we can see, the best result is shown by the configuration with k=4. The spike in week 4 can be explained by seasonal behavior around the holidays.
Products similar to customers, according to features built by different models
Now we come to the most difficult and most interesting part: nearest neighbor search over customer and product vectors built by different models. In our work we use 3 such models:
- ALS
- Word2Vec (Item2Vec for such tasks)
- DSSM
We have already dealt with ALS; you can read about how it learns
Here Q is the query (the user's search query) and D[i] is a document (a web page). The model takes the features of the query and of the pages as input, respectively. Each input layer is followed by a number of fully connected layers (a multilayer perceptron). The model then learns to minimize the cosine distance between the vectors obtained in the last layers of the model.
Recommendation tasks use exactly the same architecture, except that there is a user instead of a query and products instead of pages. In our case, the architecture is transformed into the following:
Now, to check the results, it remains to cover one last point: while for ALS and DSSM the user vectors are defined explicitly, for Word2Vec we only have product vectors. To build a user vector here, we defined 3 main approaches:
- Simply add up the vectors; for cosine distance this means we have simply averaged the products in the purchase history.
- Sum the vectors with some time weighting.
- Weight the goods with a TF-IDF coefficient.
In the case of linear weighting of the customer vector, we start from the hypothesis that a product the user bought yesterday influences their behavior more than a product bought six months ago. So we take the customer's previous week with a coefficient of 1, and what happened further back with coefficients of ½, ⅓, etc.:
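A minimal sketch of the time-weighted sum with coefficients 1, ½, ⅓ and so on. It assumes each purchase is tagged with how many weeks ago it happened (0 for the most recent week); the function name and the vectors are made up for the example.

```python
def weighted_user_vector(purchases, item_vectors):
    """purchases: list of (product, weeks_ago); weeks_ago=0 is the latest week.
    Each product vector is added with weight 1 / (weeks_ago + 1)."""
    dim = len(next(iter(item_vectors.values())))
    user_vector = [0.0] * dim
    for product, weeks_ago in purchases:
        weight = 1.0 / (weeks_ago + 1)
        for i, x in enumerate(item_vectors[product]):
            user_vector[i] += weight * x
    return user_vector


# Hand-made product vectors, invented for the example.
item_vectors = {"beer": [1.0, 0.0], "milk": [0.0, 1.0]}
purchases = [("beer", 0), ("milk", 1), ("milk", 2)]

user_vector = weighted_user_vector(purchases, item_vectors)
print(user_vector)
```

Even though milk was bought twice and beer once, the single recent beer purchase outweighs the two older milk purchases (1.0 versus ½ + ⅓).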
For the TF-IDF coefficients we do exactly the same as in TF-IDF for texts, only we treat the customer as a document and the receipt as a sentence, so a word is a product. This way the user vector shifts more toward rare goods, while goods that are frequent and habitual for the customer change it very little. The approach can be illustrated like this:
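One possible reading of this weighting in code, treating each customer as a document and each product as a word. The exact TF and IDF formulas below (relative frequency and an unsmoothed logarithm) are an assumption, since there are several common TF-IDF variants; names and data are illustrative.

```python
import math
from collections import Counter


def tfidf_user_vectors(baskets, item_vectors):
    """baskets: {user: list of purchased products, repeats allowed}."""
    n_users = len(baskets)
    # Number of customers ("documents") that contain each product ("word").
    doc_freq = Counter(p for items in baskets.values() for p in set(items))
    dim = len(next(iter(item_vectors.values())))

    result = {}
    for user, items in baskets.items():
        vector = [0.0] * dim
        for product, count in Counter(items).items():
            tf = count / len(items)
            idf = math.log(n_users / doc_freq[product])
            for i, x in enumerate(item_vectors[product]):
                vector[i] += tf * idf * x
        result[user] = vector
    return result


# Toy data, invented for the example.
item_vectors = {"bread": [1.0, 0.0], "truffle": [0.0, 1.0], "milk": [1.0, 1.0]}
baskets = {
    "u1": ["bread", "bread", "truffle"],
    "u2": ["bread"],
    "u3": ["bread", "milk"],
}

user_vectors = tfidf_user_vectors(baskets, item_vectors)
print(user_vectors["u1"])
```

Bread is bought by everyone, so its IDF is zero and it does not move the vector of u1 at all, while the single rare truffle purchase defines it.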
Now let's look at the metrics. Here is what the ALS results look like:
Metrics for Item2Vec with different ways of building the customer vector:
In this case exactly the same model is used as in our baseline. The only difference is which k we use. To rely on collaborative models alone, you have to take around 50-70 nearest products for each customer.
And the metrics for DSSM:
How to combine all the methods?
Cool, you say, but what do you do with such a large set of candidate extraction tools? How do you choose the optimal configuration for your data? Here we have several problems:
- You need to somehow limit the hyperparameter search space for each method. It is, of course, discrete everywhere, but the number of possible points is very large.
- Given a small limited sample of specific methods with specific hyperparameters, how do you choose the best configuration for your metric?
We have not yet found a definitively correct answer to the first question, so we proceed as follows: for each method we write a limiter for the hyperparameter search space that depends on some statistics of the data we have. For example, knowing the average period between purchases, we can guess what period to use for the "what has already been bought" and "period of a long-past purchase" methods.
After we have gone through a reasonable number of variations of the different methods, we note the following: each implementation extracts a certain number of candidates and has a certain value of our key metric (recall). We want to end up with a certain total number of candidates, depending on the computing power we can afford, with the highest possible metric. Here the problem neatly collapses into the knapsack problem.
Here the number of candidates is the weight of the ingot, and the method's recall is its value. However, there are 2 more points to take into account when implementing the algorithm:
- Methods can overlap in the candidates they retrieve.
- In some cases it is correct to take one method twice with different parameters, when the candidate output of the first is not a subset of the second.
For example, if we take the "what I have already bought" method with different extraction intervals, the candidate sets will be nested in one another. At the same time, different parameters for "periodic purchases" do not produce fully overlapping output. So we divide the sampling approaches with different parameters into blocks, such that from each block we want to take at most one extraction approach with specific hyperparameters. This requires being a little clever in the implementation of the knapsack problem, but the asymptotics and the result do not change.
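The block constraint turns the task into what is usually called a multiple-choice knapsack. Below is a simplified dynamic-programming sketch, not our production code: it assumes candidate counts are integers in some unit (for example, thousands), and it simply adds up recall values, which ignores the overlap between methods mentioned above. All names and numbers are invented for illustration.

```python
def choose_extractors(blocks, budget):
    """blocks: {block_name: [(config_name, n_candidates, recall), ...]}.
    Picks at most one config per block with total n_candidates <= budget,
    maximizing the (approximate) sum of recall values."""
    # best[c] = (total recall, chosen configs) using at most c candidates
    best = [(0.0, [])] * (budget + 1)
    for name, options in blocks.items():
        new_best = list(best)  # start from "skip this block"
        for config, weight, value in options:
            for c in range(weight, budget + 1):
                total = best[c - weight][0] + value
                if total > new_best[c][0]:
                    new_best[c] = (total, best[c - weight][1] + [(name, config)])
        best = new_best
    return best[budget]


# Hypothetical measurements: (config, candidates in thousands, recall).
blocks = {
    "bought_before": [("7d", 3, 0.30), ("14d", 5, 0.42), ("21d", 8, 0.45)],
    "overdue_period": [("ft=1,tm=0", 4, 0.20)],
    "markov": [("k=2", 2, 0.10), ("k=4", 4, 0.15)],
}

total_recall, chosen = choose_extractors(blocks, budget=10)
print(total_recall, chosen)
```

Because `new_best` is always built from the table of the previous blocks only, two configurations from the same block can never be combined, which is exactly the "at most one per block" requirement. The running time stays proportional to the budget times the total number of configurations, as in the ordinary knapsack.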
This clever combination allows us to get the following metrics in comparison with purely collaborative models:
On the final metrics we see the following picture:
However, you may notice that one point about recommendations that are useful for the business is still uncovered. So far we have only learned to do a good job of predicting what the user will buy, say, next week. But simply giving a discount on something they would buy anyway is not very cool. What is cool is to maximize the expected value of, for example, the following metrics:
- Margin/turnover based on personal recommendations.
- Average customer receipt.
- Visit frequency.
So we multiply the obtained probabilities by different coefficients and re-rank them so that products that affect the metrics above rise to the top. There is no ready-made answer as to which approach is best. We even experiment with such coefficients directly in production. But here are some interesting techniques that most often give us the best results:
- Multiply by the price/margin of the product.
- Multiply by the average receipt in which the product occurs. This brings up goods with which people usually take something else as well.
- Multiply by the average visit frequency of the buyers of this product, based on the hypothesis that this product provokes people to come back for it more often.
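The re-ranking step itself is simple; a sketch with margin as the coefficient is shown below. The function name and all numbers are hypothetical, and any of the coefficients listed above could be passed in the same way.

```python
def rerank(probabilities, coefficients, top_n=5):
    """Multiply purchase probabilities by a business coefficient and re-rank.
    Products without a coefficient keep their original probability."""
    scores = {
        product: prob * coefficients.get(product, 1.0)
        for product, prob in probabilities.items()
    }
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical model output and per-product margins.
probabilities = {"bread": 0.9, "cheese": 0.5, "wine": 0.3}
margin = {"bread": 0.1, "cheese": 0.4, "wine": 1.0}

print(rerank(probabilities, margin, top_n=2))
```

Bread is the most likely purchase but has the lowest margin, so after re-ranking it drops out of the top 2 in favor of wine and cheese.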
After running experiments with the coefficients, we got the following metrics in production:
Here overall product conversion is the share of purchased products among all products in the recommendations we generated.
An attentive reader will notice a significant difference between the offline and online metrics. This is explained by the fact that not all dynamic filters on the products that may be recommended can be taken into account when training the model. For us it is a normal story when half of the extracted candidates get filtered out; this is typical for our industry.
In terms of revenue, the picture is as follows: it is clear that after the launch of recommendations the revenue of the test group grows strongly, and the average revenue increase from our recommendations is now 3-4%:
In conclusion, I want to say that if you need non-real-time recommendations, a very large gain in quality can be found in experiments with extracting recommendation candidates. Having plenty of time to generate them makes it possible to combine many good methods, which together give great results for the business.
I will be glad to chat in the comments with anyone who finds the material interesting. You can ask me questions personally at
Source: www.habr.com