Using low-code in analytical platforms

Dear readers, good day!

Sooner or later, the task of building IT platforms for collecting and analyzing data arises at any company whose business is based on an intellectually demanding service-delivery model or on creating technically complex products. Building analytical platforms is a complex and time-consuming job. However, any job can be simplified. In this article I want to share my experience of using low-code tools to help build analytical solutions. This experience was gained while implementing a number of projects in the Big Data Solutions practice at Neoflex. Since 2005, this practice has been building data warehouses and data lakes, solving problems of optimizing data processing speed, and working on a data quality management methodology.

Nobody will be able to avoid the deliberate accumulation of weakly and/or strongly structured data, perhaps not even small businesses. After all, when scaling a business, a promising entrepreneur will face the challenge of developing a loyalty program, will want to analyze the efficiency of points of sale, will think about targeted advertising, and will be puzzled by the demand for accompanying products. As a first approximation, the problem can be solved with a quick improvised solution. But as the business grows, arriving at an analytical platform is still inevitable.

However, at what point can data analysis tasks grow into "Rocket Science"-class problems? Perhaps at the moment when we are talking about really big data.
To make Rocket Science easier, you can eat the elephant piece by piece.

The more discrete and autonomous your applications/services/microservices are, the easier it will be for you, your colleagues, and the whole business to digest the elephant.

Almost all of our clients have arrived at this premise, having rebuilt their landscape around the engineering practices of DevOps teams.

But even with a "separate, elephant-sized" diet, we have a good chance of "oversaturating" the IT landscape. At this point it is worth stopping, exhaling, and taking a sideways look at a low-code engineering platform.

Many developers are frightened by the prospect of a career dead end when they move from writing code directly to "dragging" arrows in the UI of low-code systems. But the arrival of machine tools did not make engineers disappear; it took their work to a new level!

Let's figure out why.

Data analysis in logistics, telecom, media research, and the financial sector always comes with the following questions:

  • Speed of automated analysis;
  • The ability to run experiments without affecting the main production data flow;
  • Reliability of the prepared data;
  • Change tracking and versioning;
  • Data provenance, data lineage, CDC;
  • Fast delivery of new features to the production environment;
  • And the notorious one: the cost of development and support.

In other words, engineers have a huge number of high-level tasks, which can be completed efficiently enough only if their minds are freed from low-level development tasks.

The prerequisites for developers moving to a new level were the evolution and digitalization of business. The value of a developer is also changing: there is a significant shortage of developers who can immerse themselves in the concepts of the business being automated.

Let's draw an analogy with low-level and high-level programming languages. The transition from low-level languages to high-level ones is a transition from writing "direct directives in the language of hardware" to "directives in the language of people". That is, it adds a layer of abstraction. In the same way, the transition from high-level programming languages to low-code platforms is a transition from "directives in the language of people" to "directives in the language of business". If there are developers who are saddened by this fact, then they have probably been sad ever since JavaScript was born with its built-in sorting functions. And these functions, of course, have a software implementation under the hood, written by other means of the same high-level programming.

So low-code is just the appearance of one more level of abstraction.

Applied experience of using low-code

The topic of low-code is quite broad, but here I would like to talk about the practical application of "low-code concepts" using one of our projects as an example.

The Big Data Solutions division of Neoflex specializes mostly in the financial sector, building data warehouses and data lakes and automating various kinds of reporting. In this niche, the use of low-code has long been a standard. Among other low-code tools, we can mention tools for organizing ETL processes: Informatica PowerCenter, IBM DataStage, Pentaho Data Integration. Or Oracle APEX, which serves as an environment for rapid development of interfaces for accessing and editing data. However, using low-code development tools does not always mean building narrowly targeted applications on a commercial technology stack with an obvious dependency on the vendor.

With low-code platforms, you can also orchestrate data flows, build data science platforms or, for example, data quality checking modules.

One applied example of experience with low-code development tools is the collaboration between Neoflex and Mediascope, one of the leaders of the Russian media research market. One of this company's business objectives is producing the data on the basis of which advertisers, internet platforms, TV channels, radio stations, advertising agencies, and brands make decisions about buying advertising and plan their marketing communications.

Media research is a technology-heavy area of business. Recognizing video sequences, collecting data from devices that analyze viewing, measuring activity on web resources: all of this means that the company has a large IT staff and enormous experience in building analytical solutions. But the exponential growth in the volume of information and in the number and variety of its sources forces the IT data industry to progress constantly. The simplest way to scale Mediascope's already functioning analytical platform would be to increase the IT staff. But a much more efficient solution is to speed up the development process. One of the steps in this direction may be the use of low-code platforms.

At the time the project started, the company already had a functioning product solution. However, the implementation of the solution on MSSQL could not fully meet expectations for scaling functionality while keeping the cost of development acceptable.

The task before us was truly ambitious: Neoflex and Mediascope had to build an industrial-grade solution in less than a year, with an MVP released within the first quarter after the start date.

The Hadoop technology stack was chosen as the foundation for building a new data platform based on low-code computing. HDFS became the standard for data storage, using Parquet files. To access the data in the platform, Hive was used, in which all available data marts are exposed as external tables. Loading data into the storage was implemented using Kafka and Apache NiFi.

The low-code tool in this concept was used to optimize the most labor-intensive task in building an analytical platform: the task of data calculation.

The low-code tool Datagram was chosen as the main mechanism for data mapping. Neoflex Datagram is a tool for developing transformations and data flows.
Using this tool, you can do without writing Scala code by hand. The Scala code is generated automatically using the Model Driven Architecture approach.

An obvious advantage of this approach is faster development. However, besides speed, there are also the following advantages:

  • Viewing the content and structure of sources/targets;
  • Tracing the origin of data flow objects down to individual fields (lineage);
  • Partial execution of transformations with viewing of intermediate results;
  • Reviewing the source code and adjusting it before execution;
  • Automatic validation of transformations;
  • Automatic 1-to-1 data loading.

The entry barrier to low-code solutions for generating transformations is quite low: a developer needs to know SQL and have experience with ETL tools. It is worth noting that code-driven transformation generators are not ETL tools in the broad sense of the word. Low-code tools may not have their own code execution environment. That is, the generated code will be executed in the environment that existed on the cluster even before the low-code solution was installed. And this is perhaps one more plus for low-code's karma, because a "classic" team can work in parallel with the low-code team and implement functionality in, for example, pure Scala code. Bringing the work of both teams into production will be simple and seamless.

It is worth noting that besides low-code there are also no-code solutions. And at their core, these are different things. Low-code allows the developer to intervene more in the generated code. In the case of Datagram, it is possible to view and edit the generated Scala code; no-code may not provide such an opportunity. This difference is very significant not only in terms of solution flexibility, but also in terms of comfort and motivation in the work of data engineers.

Solution architecture

Let's try to figure out exactly how a low-code tool helps solve the problem of speeding up the development of data calculation functionality. First, let's look at the functional architecture of the system. The example in this case is the data production model for media research.

The data sources in our case are very heterogeneous and diverse:

  • People meters (TV meters) are software and hardware devices that read the user behavior of television panel respondents: who watched which TV channel, and when, in a household participating in the study. The information supplied is a stream of broadcast viewing intervals linked to a media package and a media product. At the stage of loading into the Data Lake, the data can be enriched with demographic attributes, geostratification, time zone, and other information needed to analyze television viewing of a particular media product. The measurements taken can be used to analyze or plan advertising campaigns, to assess audience activity and preferences, and to compile the broadcast schedule;
  • Data can come from monitoring systems for streaming television broadcasts and from measuring the viewing of video content on the internet;
  • Measurement tools in the web environment, including both site-centric and user-centric meters. The data provider for the Data Lake can be a research-bar browser add-on or a mobile application with a built-in VPN;
  • Data can also come from sites that consolidate the results of online questionnaires and the results of telephone interviews in the company's surveys;
  • Additional enrichment of the data lake can occur by loading information from the logs of partner companies.

The as-is load from source systems into the primary staging area for raw data can be organized in various ways. If low-code is used for this purpose, loading scripts can be generated automatically from metadata. In this case, there is no need to go down to the level of developing source-to-target mappings. To implement automatic loading, we need to establish a connection to the source and then define, in the loading interface, the list of entities to be loaded. The directory structure in HDFS will be created automatically and will correspond to the data storage structure in the source system.
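
To make the idea of metadata-driven loading a bit more concrete, here is a minimal Python sketch. It is not how Datagram does it internally; the names `SourceEntity` and `build_load_plan` and the `/data/raw` root directory are hypothetical and only illustrate how a target directory layout mirroring the source structure could be derived from a list of entities.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SourceEntity:
    """One entity (table) selected for loading; hypothetical metadata record."""
    schema: str
    table: str


def build_load_plan(source_name: str,
                    entities: List[SourceEntity],
                    raw_root: str = "/data/raw") -> List[Dict[str, str]]:
    """Derive one as-is load step per entity.

    The target directory mirrors the source layout:
    <raw_root>/<source>/<schema>/<table>, all lower-cased.
    """
    plan = []
    for entity in entities:
        target_dir = f"{raw_root}/{source_name}/{entity.schema}/{entity.table}".lower()
        plan.append({
            "source": source_name,
            "entity": f"{entity.schema}.{entity.table}",
            "target_dir": target_dir,
            "format": "parquet",  # assumption: raw data is stored as Parquet files
        })
    return plan
```

A real generator would then turn each step of such a plan into an executable loading job; here the plan is just a list of dictionaries.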

However, in the context of this project, we decided not to use this feature of the low-code platform, because Mediascope had already independently started building a similar service using the NiFi + Kafka combination.

It is worth pointing out right away that these tools are not interchangeable but rather complementary. NiFi and Kafka can work both in a direct (NiFi -> Kafka) and in a reverse (Kafka -> NiFi) connection. For the media research platform, the first variant was used.

In our case, NiFi needed to process various types of data from source systems and send them to the Kafka broker. Messages were sent to a specific Kafka topic using the PublishKafka NiFi processors. Orchestration and maintenance of these pipelines is done in a visual interface. The NiFi tool and the NiFi + Kafka combination can also be called a low-code approach to development, one that has a low entry barrier to Big Data technologies and speeds up application development.

The next stage of the project was bringing the detailed data to a single semantic layer format. If an entity has historical attributes, the calculation is performed in the context of the partition in question. If the entity is not historical, it is optionally possible either to recalculate the entire content of the object or to skip recalculating the object altogether (because nothing has changed). At this stage, keys are generated for all entities. The keys are stored in HBase directories corresponding to the master objects, which hold the mapping between keys in the analytical platform and keys from the source systems. Consolidation of atomic entities is accompanied by enrichment with the results of preliminary calculation of analytical data. The framework for data calculation was Spark. The described functionality of bringing data to a single semantics was also implemented on the basis of mappings from the low-code tool Datagram.
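
The key-generation step can be pictured with a small Python sketch. This is an assumption-laden illustration, not the project's code: an in-memory dictionary stands in for the HBase directory, and the class name `KeyRegistry` and its method are hypothetical.

```python
from typing import Dict, Tuple


class KeyRegistry:
    """Maps (entity, source system, source key) to a platform-wide surrogate key.

    An in-memory dict is used here purely for illustration; in the project
    described above this mapping lives in HBase.
    """

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, str, str], int] = {}
        self._next_key = 1

    def get_or_create(self, entity: str, source_system: str, source_key: str) -> int:
        """Return the existing platform key, or allocate a new one on first sight."""
        lookup = (entity, source_system, source_key)
        if lookup not in self._keys:
            self._keys[lookup] = self._next_key
            self._next_key += 1
        return self._keys[lookup]
```

The point of the design is idempotence: recalculating the same partition must hand out the same platform key for the same source key, while identical source keys coming from different source systems stay distinct.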

The target architecture required SQL access to the data for business users. Hive was used for this. Objects are registered in Hive automatically when you enable the "Registr Hive Table" option in the low-code tool.

Calculation flow management

Datagram has an interface for designing workflows. Mappings can be launched using the Oozie scheduler. In the flow developer interface, it is possible to create schemes of parallel, sequential, or execution-dependent data transformations. There is support for shell scripts and Java programs. It is also possible to use an Apache Livy server. Apache Livy is used to run applications directly from the development environment.

If the company already has its own process orchestrator, it is possible to use the REST API to embed mappings into an existing flow. For example, we have had quite successful experience of embedding Scala mappings into orchestrators written in PL/SQL and Kotlin. The REST API of the low-code tool includes operations such as generating an executable jar based on the mapping design, invoking a mapping, invoking a sequence of mappings, and, of course, passing parameters in the URL to launch mappings.

Along with Oozie, it is possible to organize the calculation flow using Airflow. I won't dwell on a comparison between Oozie and Airflow, but will simply say that in the context of the media research project, the choice fell on Airflow. The main arguments this time were a more active community developing the product and a more developed interface + API.

Airflow is also good because it uses the much-loved Python to describe calculation processes. And in general, there are not that many open source workflow management platforms. Launching and monitoring process execution (including a Gantt chart) only add points to Airflow's karma.

The configuration file format for launching the low-code solution's mappings became spark-submit. This happened for two reasons. First, spark-submit lets you run a jar file directly from the console. Second, it can contain all the information needed to configure the workflow (which makes it easier to write scripts that generate the DAG).
The most common element of the Airflow workflow in our case was the SparkSubmitOperator.

SparkSubmitOperator lets you run jars: packaged Datagram mappings with input parameters pre-generated for them.
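
As a rough illustration of why a spark-submit style configuration is convenient for generating workflows, here is a small Python sketch that turns one configuration record into a spark-submit command line. The record layout (`name`, `main_class`, `jar`, `conf`, `params`) and the `key=value` form of the application arguments are assumptions made for this example; only the spark-submit flags themselves (`--name`, `--class`, `--master`, `--deploy-mode`, `--conf`) are standard.

```python
import shlex
from typing import Any, Dict, List


def build_spark_submit(cfg: Dict[str, Any]) -> List[str]:
    """Build the argument list for launching one packaged mapping."""
    cmd = [
        "spark-submit",
        "--name", cfg["name"],
        "--class", cfg["main_class"],
        "--master", cfg.get("master", "yarn"),
        "--deploy-mode", cfg.get("deploy_mode", "cluster"),
    ]
    # Spark properties, sorted so the generated command is reproducible.
    for key, value in sorted(cfg.get("conf", {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application jar comes after all spark-submit options ...
    cmd.append(cfg["jar"])
    # ... followed by the application's own arguments (hypothetical key=value form).
    for key, value in cfg.get("params", {}).items():
        cmd.append(f"{key}={value}")
    return cmd


def to_command_line(cfg: Dict[str, Any]) -> str:
    """Render the same launch as a shell-quoted string for logs or the console."""
    return shlex.join(build_spark_submit(cfg))
```

Because each mapping is described by the same kind of record, a DAG-generating script can loop over the records and create one task per mapping instead of hand-writing every task.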

It is worth mentioning that each Airflow task runs in a separate thread and knows nothing about other tasks. Therefore, interaction between tasks is implemented using control operators, such as DummyOperator or BranchPythonOperator.

Taken together, using the Datagram low-code solution in combination with universal configuration files (from which the DAG is formed) led to a significant acceleration and simplification of the development of data loading flows.

Calculating data marts

Perhaps the most intellectually demanding stage in producing analytical data is the step of building data marts. In the context of one of the research company's data calculation flows, at this stage the data is reduced to the reference broadcast, taking into account the correction for time zones, and linked to the broadcast grid. It is also possible to adjust for the local broadcast network (local news and advertising). Among other things, this step breaks down intervals of continuous viewing of media products based on an analysis of viewing intervals. The viewing values are immediately "weighted" based on information about their significance (calculation of a correction factor).

A separate step in preparing data marts is data validation. The validation algorithm involves a number of mathematical models. However, using a low-code platform lets you break a complex algorithm into a number of separate, visually readable mappings. Each mapping performs a narrow task. As a result, intermediate debugging, logging, and visualization of the data preparation stages become possible.

It was decided to split the validation algorithm into the following sub-stages:

  • Building regressions of the dependence of a TV network's viewing in a region on the viewing of all networks in the region over 60 days.
  • Calculating studentized residuals (deviations of actual values from those predicted by the regression model) for all regression points and for the calculation day.
  • Selecting anomalous region-TV network pairs, where the studentized residual for the calculation day exceeds the norm (set in the operation settings).
  • Recalculating the adjusted studentized residual for anomalous region-TV network pairs for each respondent who watched the network in the region, determining the contribution of this respondent (the amount of change in the studentized residual) when this respondent's viewing is excluded from the sample.
  • Searching for candidates whose exclusion brings the studentized residual of the calculation day back to normal.
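
The first three sub-stages can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the project's actual model: it fits an ordinary least-squares line with a single predictor (total viewing of all networks in the region) and uses internally studentized residuals; the function names and the default threshold of 2.0 are hypothetical.

```python
import math
from typing import List, Sequence


def studentized_residuals(total_viewing: Sequence[float],
                          network_viewing: Sequence[float]) -> List[float]:
    """Internally studentized residuals of a one-predictor least-squares fit."""
    n = len(total_viewing)
    if n != len(network_viewing) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(total_viewing) / n
    mean_y = sum(network_viewing) / n
    sxx = sum((x - mean_x) ** 2 for x in total_viewing)
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(total_viewing, network_viewing)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(total_viewing, network_viewing)]
    # Residual standard error with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    result = []
    for x, e in zip(total_viewing, residuals):
        # Leverage of the point in simple linear regression.
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        scale = sigma * math.sqrt(max(0.0, 1.0 - leverage))
        result.append(e / scale if scale > 0 else 0.0)
    return result


def anomalous_points(total_viewing: Sequence[float],
                     network_viewing: Sequence[float],
                     threshold: float = 2.0) -> List[int]:
    """Indices of days whose studentized residual exceeds the threshold in absolute value."""
    residuals = studentized_residuals(total_viewing, network_viewing)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]
```

In the low-code tool each of these steps would be a separate mapping, which is what makes intermediate results easy to inspect.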

The example above confirms the hypothesis that a data engineer already has too much on their mind... And if this really is an "engineer" and not a "coder", then the fear of professional degradation when using low-code tools should finally recede.

What else can low-code do?

The scope of application of a low-code tool for batch and streaming data processing without writing Scala code by hand does not end there.

The use of low-code in developing data lakes has already become a standard for us. We can probably say that solutions on the Hadoop stack follow the development path of classic RDBMS-based DWHs. Low-code tools on the Hadoop stack can handle both data processing tasks and the task of building the final BI interfaces. Moreover, it should be noted that BI can mean not only presenting data but also letting business users edit it. We often use this functionality when building analytical platforms for the financial sector.

Among other things, using low-code and, in particular, Datagram, it is possible to solve the problem of tracing the origin of data flow objects with atomicity down to individual fields (lineage). To do this, the low-code tool implements integration with Apache Atlas and Cloudera Navigator. Essentially, the developer needs to register a set of objects in the Atlas dictionaries and reference the registered objects when building mappings. The mechanism for tracing data origin or analyzing object dependencies saves a lot of time when changes need to be made to calculation algorithms. For example, when preparing financial statements, this feature lets you get through a period of legislative changes more comfortably. After all, the better we understand inter-form dependencies in terms of the objects of the detailed layer, the less often we will run into "sudden" defects and the less rework there will be.

Data Quality & Low-code

Another task implemented with the low-code tool on the Mediascope project was a Data Quality class task. A special feature of the data validation pipeline for the research company's project was that it had no impact on the performance and speed of the main data calculation flow. To orchestrate independent data validation flows, the already familiar Apache Airflow was used. As each step of data production became ready, a separate part of the DQ pipeline was launched in parallel.

It is considered good practice to monitor data quality from the moment the data enters the analytical platform. Having information about the metadata, we can check compliance with basic conditions from the moment the information reaches the primary layer: not null, constraints, foreign keys. This functionality is implemented on the basis of automatically generated mappings of the data quality family in Datagram. Code generation in this case is also based on model metadata. On the Mediascope project, the interface was built with the metadata of the Enterprise Architect product.

By pairing the low-code tool with Enterprise Architect, the following checks were generated automatically:

  • Checking for "null" values in fields with the "not null" modifier;
  • Checking for duplicates of the primary key;
  • Checking the foreign key of an entity;
  • Checking the uniqueness of a row based on a set of fields.
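
To show what generating such checks from metadata can look like, here is a minimal Python sketch that turns a metadata record into SQL check queries, each returning a count of violations. This illustrates the idea only; it is not the code Datagram generates. The metadata layout (`table`, `not_null`, `primary_key`, `foreign_keys`, `unique_sets`) and all function names are assumptions made for this example.

```python
from typing import Any, Dict, List


def not_null_check(table: str, column: str) -> str:
    """Count rows where a mandatory column is empty."""
    return f"SELECT COUNT(*) AS violations FROM {table} WHERE {column} IS NULL"


def duplicate_check(table: str, columns: List[str]) -> str:
    """Count groups of rows that share the same values in the given columns."""
    cols = ", ".join(columns)
    return (f"SELECT COUNT(*) AS violations FROM "
            f"(SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1) d")


def foreign_key_check(table: str, column: str, ref_table: str, ref_column: str) -> str:
    """Count rows whose non-null reference has no matching parent row."""
    return (f"SELECT COUNT(*) AS violations FROM {table} c "
            f"LEFT JOIN {ref_table} p ON c.{column} = p.{ref_column} "
            f"WHERE c.{column} IS NOT NULL AND p.{ref_column} IS NULL")


def generate_checks(entity: Dict[str, Any]) -> Dict[str, str]:
    """Produce a named SQL query for every rule described in the metadata."""
    table = entity["table"]
    checks: Dict[str, str] = {}
    for column in entity.get("not_null", []):
        checks[f"not_null:{column}"] = not_null_check(table, column)
    if entity.get("primary_key"):
        checks["primary_key"] = duplicate_check(table, entity["primary_key"])
    for fk in entity.get("foreign_keys", []):
        checks[f"foreign_key:{fk['column']}"] = foreign_key_check(
            table, fk["column"], fk["ref_table"], fk["ref_column"])
    for columns in entity.get("unique_sets", []):
        checks[f"unique:{'+'.join(columns)}"] = duplicate_check(table, columns)
    return checks
```

The queries use only plain SQL constructs, so the same text can be run by any SQL engine that supports them; a result of zero violations means the check passed.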

For more complex checks of data availability and reliability, a mapping was created with a Scala Expression, which takes as input external Spark SQL check code prepared by analysts in Zeppelin.

Of course, automatic generation of checks has to be reached gradually. Within the described project, this was preceded by the following steps:

  • DQ implemented in Zeppelin notebooks;
  • DQ built into a mapping;
  • DQ in the form of separate massive mappings containing a whole set of checks for a separate entity;
  • Universal parameterized DQ mappings that accept information about metadata and business checks as input.

Perhaps the main advantage of creating a parameterized check service is the reduction in the time it takes to deliver functionality to the production environment. New quality checks can bypass the classic pattern of delivering code indirectly through development and testing environments:

  • All metadata checks are generated automatically when the model is changed in EA;
  • Data availability checks (determining the presence of any data at a point in time) can be generated based on a directory that stores the expected time of arrival of the next portion of data for each object;
  • Business data validation checks are created by analysts in Zeppelin notebooks. From there they are sent directly to the configuration tables of the DQ module in the production environment.
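
A data availability check of the kind described in the second item can be sketched as follows. This is a hypothetical Python illustration, assuming the directory of expected arrival times is available as a dictionary keyed by object name; the function name `overdue_objects` and both dictionary layouts are made up for the example.

```python
from datetime import datetime
from typing import Dict, List


def overdue_objects(expected_by: Dict[str, datetime],
                    arrived_at: Dict[str, datetime],
                    now: datetime) -> List[str]:
    """Return the objects whose next data portion is late.

    An object is overdue when its expected arrival time has passed
    and no portion has been registered for it yet.
    """
    overdue = []
    for name, deadline in sorted(expected_by.items()):
        if name in arrived_at:
            continue  # the portion is already there, nothing to report
        if now > deadline:
            overdue.append(name)
    return overdue
```

Because the expected times live in a directory rather than in code, adding a new object to the availability check is a data change, not a deployment.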

Shipping scripts directly to production carries practically no risk. Even with a syntax error, the worst that can happen is the failure of a single check, because the data calculation flow and the flow that launches quality checks are separated from each other.

In essence, the DQ service runs permanently in the production environment and is ready to start its work the moment the next portion of data appears.

In lieu of a conclusion

The advantage of using low-code is obvious. Developers do not need to build an application from scratch. And a programmer freed from additional tasks delivers results faster. Speed, in turn, frees up additional time for solving optimization problems. So in this case you can count on a better and faster solution.

Of course, low-code is not a panacea, and magic won't happen by itself:

  • The low-code industry is going through a "maturing" stage, and there are no uniform industry standards yet;
  • Many low-code solutions are not free, and purchasing them should be a deliberate step, taken with full confidence in the financial benefit of using them;
  • Many low-code solutions do not always work well with Git/SVN, or these are inconvenient to use if the generated code is hidden;
  • When expanding the architecture, the low-code solution may need to be refined, which in turn provokes the effect of "attachment to and dependence on" the supplier of the low-code solution;
  • An adequate level of security is possible, but it is very labor-intensive and difficult to implement in low-code system engines. Low-code platforms should not be chosen solely on the principle of seeking benefit from their use. When choosing, it is worth asking about the availability of access control functionality and about delegating/escalating identification data to the level of the organization's entire IT landscape.

However, if you know all the shortcomings of the chosen system, and the benefits of using it still outweigh them, then move to low-code without fear. Moreover, the transition to it is inevitable, just as any evolution is inevitable.

If one developer on a low-code platform does their job faster than two developers without low-code, this gives the company a head start in every respect. The entry threshold to low-code solutions is lower than to "traditional" technologies, and this has a positive effect on the issue of staff shortages. With low-code tools, it is possible to speed up interaction between functional teams and to decide faster whether the chosen path of data science research is correct. Low-code platforms can drive the digital transformation of an organization, because the solutions produced can be understood by non-technical specialists (in particular, business users).

If you have tight deadlines, heavy business logic, a shortage of technological expertise, and a need to shorten your time to market, then low-code is one way to meet your needs.

There is no denying the importance of traditional development tools, but in many cases using low-code solutions is the best way to make the work at hand more efficient.

Source: www.habr.com
