Every day, businesses of all sizes come up with ideas, and hold meetings, about which other processes could be automated. But building a model can take a lot of time, and beyond that you also have to spend time evaluating it and checking that the result it produces is not a fluke. Once deployed, any model must be monitored and periodically re-checked.
Every company has to go through all of these stages, whatever its size. At Sberbank's scale and with its legacy, the amount of fine-tuning grows dramatically. By the end of 2019, Sber already had more than 2,000 models in use. Developing a model is not enough: it has to be integrated with industrial systems, data marts have to be built for model development, and its operation on the cluster has to be kept under control.
Our team develops the Sber.DS platform. It lets you solve machine learning problems, speeds up hypothesis testing, generally simplifies model development and validation, and lets you monitor a model's output in production (PROM).
So as not to disappoint you, I should say up front that this post is an introduction: below the cut, as a start, we describe what is under the hood of the Sber.DS platform. The story of a model's life cycle, from creation to deployment, will be told separately.
Sber.DS consists of many components; the key ones are the library, the development system, and the model execution system.
The library manages a model's life cycle from the moment the idea to develop it appears until it is deployed in PROM, monitored, and decommissioned. Many of the library's capabilities are dictated by regulatory rules, for example reporting and the storage of training and validation samples. In effect, it is a registry of all our models.
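To make the idea of a life-cycle registry concrete, here is a minimal sketch in Python. The stage names, the allowed transitions, and the rule that training and validation samples must be stored before deployment are all assumptions chosen for illustration; `ModelRegistry`, `Stage`, and `REQUIRED_SAMPLES` are hypothetical names, not the actual Sber.DS library API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical life-cycle stages of a model."""
    IDEA = "idea"
    DEVELOPMENT = "development"
    VALIDATION = "validation"
    PRODUCTION = "production"        # deployed in PROM and monitored
    DECOMMISSIONED = "decommissioned"


# Allowed transitions (assumed); any other move is rejected by the registry.
TRANSITIONS = {
    Stage.IDEA: {Stage.DEVELOPMENT, Stage.DECOMMISSIONED},
    Stage.DEVELOPMENT: {Stage.VALIDATION, Stage.DECOMMISSIONED},
    Stage.VALIDATION: {Stage.DEVELOPMENT, Stage.PRODUCTION, Stage.DECOMMISSIONED},
    Stage.PRODUCTION: {Stage.DECOMMISSIONED},
    Stage.DECOMMISSIONED: set(),
}

# Hypothetical regulatory rule: samples that must be stored before deployment.
REQUIRED_SAMPLES = {"train", "validation"}


@dataclass
class ModelRecord:
    name: str
    stage: Stage = Stage.IDEA
    samples: dict = field(default_factory=dict)   # sample kind -> storage URI
    history: list = field(default_factory=lambda: [Stage.IDEA])


class ModelRegistry:
    def __init__(self):
        self._models = {}

    def register(self, name):
        if name in self._models:
            raise ValueError(f"model {name!r} is already registered")
        self._models[name] = ModelRecord(name)
        return self._models[name]

    def attach_sample(self, name, kind, uri):
        self._models[name].samples[kind] = uri

    def move(self, name, target):
        record = self._models[name]
        if target not in TRANSITIONS[record.stage]:
            raise ValueError(f"{record.stage.value} -> {target.value} is not allowed")
        # Block deployment until the required samples are stored.
        if target is Stage.PRODUCTION and not REQUIRED_SAMPLES.issubset(record.samples):
            raise ValueError("training and validation samples must be stored first")
        record.stage = target
        record.history.append(target)
        return record
```

Keeping the transition table as data, rather than scattering checks across the code, makes it easy to see and audit which moves a model may make, which fits the regulatory flavor of the library.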
The development system is designed for visual development of models and validation techniques. Developed models undergo initial validation and are delivered to the execution system, where they perform their business functions. At runtime, a model can also be put under monitoring so that validation techniques are launched periodically to check how it is performing.
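The post does not say which metrics the periodic validation computes. As an illustration only, assume a check based on the Gini coefficient (2 * AUC - 1), which is commonly used for scoring models. The sketch computes AUC with the rank-based Mann-Whitney statistic and flags the model as degraded when Gini drops by more than a threshold; `periodic_check` and the default `max_drop=0.05` are hypothetical.

```python
def auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1   # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed to compute AUC")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini(labels, scores):
    return 2 * auc(labels, scores) - 1


def periodic_check(labels, scores, baseline_gini, max_drop=0.05):
    """Compare the current Gini with the value recorded at validation time."""
    current = gini(labels, scores)
    status = "ok" if baseline_gini - current <= max_drop else "degraded"
    return current, status
```

A scheduler would call `periodic_check` on fresh labelled data and store the result alongside the model, so that a degraded status can trigger re-validation.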
The system has many types of nodes. Some connect to various data sources; others transform the source data and enrich it (markup). There are many nodes for building different models, and nodes for validating them. A developer can load data from any source, transform it, filter it, visualize intermediate data, and split it into parts.
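One way to picture such a node graph is as a directed acyclic graph whose nodes run in dependency order. The sketch below uses the standard library's `graphlib.TopologicalSorter` to execute hypothetical source, enrichment, filter, and split nodes; the node names, sample rows, and the 75/25 split are assumptions for illustration, not the platform's actual node types.

```python
from graphlib import TopologicalSorter


def run_pipeline(nodes, edges):
    """nodes: name -> callable; edges: name -> list of upstream node names.
    Each node receives its upstream outputs positionally, in the listed order."""
    graph = {name: edges.get(name, []) for name in nodes}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():   # raises CycleError on cycles
        upstream = [outputs[dep] for dep in graph[name]]
        outputs[name] = nodes[name](*upstream)
    return outputs


# Hypothetical example data and nodes mirroring the node types described above.
ROWS = [
    {"id": 1, "amount": 250.0, "segment": "retail"},
    {"id": 2, "amount": 40.0, "segment": "corporate"},
    {"id": 3, "amount": 90.0, "segment": "retail"},
    {"id": 4, "amount": 310.0, "segment": "retail"},
    {"id": 5, "amount": 15.0, "segment": "retail"},
]


def load():                          # source node
    return [dict(r) for r in ROWS]


def enrich(rows):                    # transformation / markup node
    return [dict(r, is_large=r["amount"] > 100) for r in rows]


def keep_retail(rows):               # filter node
    return [r for r in rows if r["segment"] == "retail"]


def split(rows, train_share=0.75):   # split node: (train, test)
    cut = int(len(rows) * train_share)
    return rows[:cut], rows[cut:]


NODES = {"load": load, "enrich": enrich, "filter": keep_retail, "split": split}
EDGES = {"enrich": ["load"], "filter": ["enrich"], "split": ["filter"]}
```

Calling `run_pipeline(NODES, EDGES)` returns every node's output, which is also what lets an editor show intermediate data for any node.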
The platform also contains ready-made modules that can be dragged and dropped onto the design area. All actions are performed through a visual interface. In fact, you can solve a problem without writing a single line of code.
If the built-in capabilities are not enough, the system lets you quickly create your own modules. We built an integrated development mode based on
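How custom modules plug into the platform is not described here. As a loose illustration, assume a module is a function registered together with its declared input and output ports, so a visual editor can wire it up and check it. The `module` decorator, `run_module`, and the `drop_nulls` example are hypothetical names.

```python
MODULES = {}


def module(name, inputs=(), outputs=()):
    """Decorator registering a user-defined module and its declared ports."""
    def register(fn):
        if name in MODULES:
            raise ValueError(f"module {name!r} already exists")
        MODULES[name] = {"fn": fn, "inputs": tuple(inputs), "outputs": tuple(outputs)}
        return fn
    return register


def run_module(name, **ports):
    """Check the declared inputs and outputs around a call."""
    spec = MODULES[name]
    missing = [p for p in spec["inputs"] if p not in ports]
    if missing:
        raise ValueError(f"{name}: missing inputs {missing}")
    result = spec["fn"](**{p: ports[p] for p in spec["inputs"]})
    if set(result) != set(spec["outputs"]):
        raise ValueError(f"{name}: expected outputs {spec['outputs']}, got {sorted(result)}")
    return result


@module("drop_nulls", inputs=("rows",), outputs=("rows", "dropped"))
def drop_nulls(rows):
    """Example custom module: remove rows that contain any None value."""
    kept = [r for r in rows if all(v is not None for v in r.values())]
    return {"rows": kept, "dropped": len(rows) - len(kept)}
```

Declaring ports up front is what would let the editor validate connections before anything runs.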
The Sber.DS architecture is built on microservices. There are many opinions about what counts as a microservice. Some people think it is enough to split monolithic code into parts, even if those parts still go to the same database. Our microservices may communicate with other microservices only through a REST API. There are no workarounds for accessing the database directly.
We try to keep services from becoming too large and heavy: a single instance should not consume more than 4-8 GB of RAM, and a service must be able to scale horizontally to handle more requests by launching new instances. Each service communicates with the others only through the REST API (
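The two rules above, "talk to another service only over its REST API" and "scale by adding instances", can be sketched with the standard library alone. In this hypothetical example a service owns its data privately and exposes `GET /models/<name>`; a client reaches it only over HTTP and rotates round-robin across instances. The endpoint path, the stored record, and the round-robin policy are assumptions; the real platform is written in Java with Spring, not Python.

```python
import itertools
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ModelInfoHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint GET /models/<name>; the data stays private to the service."""
    _store = {"churn": {"name": "churn", "stage": "production"}}

    def do_GET(self):
        self.server.hits += 1  # per-instance counter, only to show load spreading
        name = self.path.rstrip("/").rsplit("/", 1)[-1]
        record = self._store.get(name)
        body = json.dumps(record or {"error": "not found"}).encode()
        self.send_response(200 if record else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def start_instance():
    """Start one more instance on a free local port (horizontal scaling)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ModelInfoHandler)
    server.hits = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class RoundRobinClient:
    """Reaches the other service only over HTTP, rotating across its instances."""

    def __init__(self, base_urls):
        self._urls = itertools.cycle(base_urls)
        # Empty ProxyHandler: ignore proxy environment variables for local calls.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def get_model(self, name):
        url = f"{next(self._urls)}/models/{name}"
        with self._opener.open(url, timeout=5) as resp:
            return json.loads(resp.read())
```

Because the client depends only on URLs, adding capacity means starting another instance and adding its address to the list; nothing outside the service touches its storage.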
The core of the application is written in Java using the Spring Framework. The solution was designed from the start for rapid deployment in cloud infrastructure, so the application is built on a containerization system.
One feature of our platform is that code developed in the visual interface can run on any of Sberbank's model execution systems. There are already two: one on Hadoop, the other on OpenShift (Docker). We are not stopping there: we are building integration modules to run code on any infrastructure, both on-premise and in the cloud. To integrate effectively into the Sberbank ecosystem, we also plan to support existing execution environments. In the future, the solution could be flexibly integrated "out of the box" into any organization's landscape.
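Running the same model code on several execution systems suggests a common interface with one integration module per backend. The sketch below shows that adapter idea only; `ExecutionBackend`, the `hadoop` and `openshift` classes, the job fields, and the default image name `model-runtime:latest` are hypothetical, and the `submit` methods return descriptions instead of calling any real cluster API.

```python
from abc import ABC, abstractmethod


class ExecutionBackend(ABC):
    """Hypothetical common interface every integration module implements."""

    name = "abstract"

    @abstractmethod
    def submit(self, job: dict) -> dict:
        """Translate a platform job into a backend-specific submission."""


class HadoopBackend(ExecutionBackend):
    name = "hadoop"

    def submit(self, job):
        # Placeholder: a real module would hand the job to the cluster scheduler.
        return {"backend": self.name, "artifact": job["artifact"],
                "queue": job.get("queue", "default")}


class OpenShiftBackend(ExecutionBackend):
    name = "openshift"

    def submit(self, job):
        # Placeholder: a real module would create a container workload from an image.
        return {"backend": self.name, "artifact": job["artifact"],
                "image": job.get("image", "model-runtime:latest")}


BACKENDS = {}


def register_backend(backend):
    BACKENDS[backend.name] = backend


def run(job, target):
    """Dispatch the same job description to whichever backend is requested."""
    if target not in BACKENDS:
        raise KeyError(f"no integration module for {target!r}")
    return BACKENDS[target].submit(job)


register_backend(HadoopBackend())
register_backend(OpenShiftBackend())
```

Supporting a new environment then means writing one more `ExecutionBackend` subclass and registering it, without touching the code produced in the visual interface.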
Anyone who has tried to support a solution that runs Python on Hadoop in PROM knows that preparing a Python user environment and delivering it to every datanode is not enough. The many C/C++ machine learning libraries that Python modules depend on will not let you rest easy. You have to remember to update packages when adding new libraries or servers, while keeping backward compatibility with model code that is already deployed.
There are several ways to do this. For example, you can prepare a set of frequently used libraries in advance and deploy them to PROM. In Cloudera's Hadoop distribution, people usually use
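The backward-compatibility concern can be made concrete with a small check: before rolling out an updated shared environment, compare it with the exact package versions each deployed model was pinned to. This is a minimal sketch assuming models declare exact `name==version` pins; `parse_pins`, `check_environment`, and the package versions in the usage example are illustrative only.

```python
def parse_pins(lines):
    """Parse exact 'name==version' pins, ignoring blank lines and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"only exact pins are supported: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def check_environment(env, deployed):
    """env: package -> version of the proposed shared environment.
    deployed: model -> pins. Returns model -> [(package, required, available)]."""
    conflicts = {}
    for model, pins in deployed.items():
        bad = [(pkg, ver, env.get(pkg))
               for pkg, ver in sorted(pins.items()) if env.get(pkg) != ver]
        if bad:
            conflicts[model] = bad
    return conflicts
```

For instance, with a proposed environment of `numpy==1.18.1` and `pandas==1.0.3`, a model pinned to `numpy==1.16.4` and `xgboost==0.90` would be reported with two conflicts, one version mismatch and one missing package, before anything reaches the datanodes.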
The bank takes the security of running third-party code very seriously, so we make extensive use of newer Linux kernel features that let a process run in an isolated environment.
This year we plan to complete an MVP for running models written in Python, R, and Java on Hadoop. We have set ourselves the ambitious goal of learning to run any custom environment on Hadoop, so that we do not restrict the users of our platform in any way.
In addition, as it turned out, many DS specialists are excellent at mathematics and statistics and build great models, but are not very familiar with large-scale data transformations and need help from our data engineers to prepare training samples. We decided to help our colleagues by building convenient modules for standard transformations and feature preparation on the Spark engine. This lets them spend more time developing models instead of waiting for data engineers to prepare a new dataset.
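The idea behind such modules is that a data scientist declares which standard transformations to apply to which columns, and the module does the work. The sketch below shows that declarative idea in plain Python rather than Spark, to stay self-contained; the transformation names, `prepare_features`, and the spec format are hypothetical.

```python
import math


def fill_missing(values, value):
    """Replace None with a constant."""
    return [value if v is None else v for v in values]


def log1p(values):
    """Compress heavy-tailed values such as amounts."""
    return [math.log1p(v) for v in values]


def clip(values, low, high):
    """Bound values to [low, high]."""
    return [min(max(v, low), high) for v in values]


TRANSFORMS = {"fill_missing": fill_missing, "log1p": log1p, "clip": clip}


def prepare_features(table, spec):
    """table: column -> list of values; spec: column -> list of (transform, kwargs).
    Steps run in order; columns not in the spec pass through unchanged."""
    out = dict(table)
    for column, steps in spec.items():
        values = table[column]
        for name, kwargs in steps:
            values = TRANSFORMS[name](values, **kwargs)
        out[column] = values
    return out
```

On Spark the same spec would be translated into column expressions instead of Python loops, but the user-facing contract, a list of named steps per column, could stay the same.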
We hire people with experience in a variety of areas: Linux and DevOps, Hadoop and Spark, Java and Spring, Scala and Akka, OpenShift and Kubernetes. Next time we will talk about the model library, how a model moves through its life cycle inside the company, and how validation and deployment work.
Source: www.habr.com