What is special about Cloudera and how to cook it

The market for distributed computing and big data is, according to statistics, growing by 18-19% a year. That means the question of choosing software for these purposes remains relevant. In this post we start with why distributed computing is needed, go into more detail on choosing software, talk about running Hadoop with Cloudera, and finally discuss choosing hardware and the different ways it affects performance.

Why does an ordinary business need distributed computing? Everything here is simple and complicated at the same time. Simple, because in most cases we perform fairly simple calculations per unit of information. Complicated, because there is a lot of that information. A lot. As a result, terabytes of data have to be processed in 1000 threads. The use cases are therefore quite universal: such calculations can be applied wherever a large number of metrics has to be taken into account across an even larger data set.

One recent example: the pizzeria chain Dodo Pizza determined, based on an analysis of its customer order database, that when choosing a pizza with arbitrary toppings, users usually work with only six basic sets of ingredients plus a couple of random ones. The chain adjusted its purchasing accordingly. It was also able to better recommend the additional products offered to users at the ordering stage, which increased profits.

Another example: analysis of product items allowed H&M to reduce the assortment in individual stores by 40% while maintaining sales levels. This was achieved by excluding poorly selling items, with seasonality taken into account in the calculations.

Choosing a tool

The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr itself has many detailed articles on the topic) that comes with a whole set of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system itself will distribute them across the available computing capacity. Moreover, that capacity can be increased or switched off at any time: horizontal scalability in action.

In 2017, the influential consulting company Gartner concluded that Hadoop would soon become obsolete. The reason is fairly mundane: analysts believe that companies will migrate en masse to the cloud, where they can pay for computing power as they use it. The second important factor that can supposedly "bury" Hadoop is speed, because options such as Apache Spark or Google Cloud DataFlow are faster than MapReduce, which underlies Hadoop.

Hadoop rests on several pillars, the most notable of which are the MapReduce technology (a system for distributing data for computation across servers) and the HDFS file system. The latter is designed specifically for storing information distributed across the nodes of a cluster: each fixed-size block can be placed on several nodes, and thanks to replication the system tolerates the failure of individual nodes. Instead of a file table, a special server called the NameNode is used.
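To make the block-and-replica idea more concrete, here is a minimal toy sketch in Python. It is not how HDFS is implemented: the function names, the round-robin placement, and the node names are all assumptions made for illustration (real HDFS placement is rack-aware and considerably more involved). It only shows a file being cut into fixed-size blocks, each block being stored on several nodes, and the data staying readable when a single node fails.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file is cut into (the last one may be shorter)."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Toy stand-in for NameNode metadata: block index -> nodes holding a replica.

    Assumption: simple round-robin placement, chosen only for illustration.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical numbers: a 1000-unit file, 128-unit blocks, four nodes, three replicas.
blocks = split_into_blocks(file_size=1000, block_size=128)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)

print(len(blocks), "blocks:", blocks)
print("block 0 lives on:", placement[0])
print("survives loss of node2:", readable_after_failure(placement, {"node2"}))
```

In this toy model, losing all three nodes that hold the replicas of one block would make that block unreadable, which is why the replication factor matters.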

The figure below shows how MapReduce works. In the first stage, the data is split according to a certain criterion; in the second, it is distributed across the computing capacity; in the third, the computation takes place.
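The three stages can be sketched in a few lines of plain Python. This is a single-process illustration of the idea, not Hadoop code: the function names, the sample order lines, and the simple byte-sum partitioner are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Stage 1: split the input into (key, value) pairs by some criterion (here: the word)."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs, num_workers):
    """Stage 2: distribute the pairs across workers so that equal keys land together."""
    partitions = [defaultdict(list) for _ in range(num_workers)]
    for key, value in pairs:
        # Assumption: a toy deterministic partitioner based on the key's bytes.
        worker = sum(key.encode()) % num_workers
        partitions[worker][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Stage 3: each worker computes the result for the keys it received."""
    result = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = sum(values)
    return result


# Hypothetical input: one line of ingredients per order.
orders = ["pizza cheese ham", "pizza cheese", "pizza mushrooms"]
counts = reduce_phase(shuffle_phase(map_phase(orders), num_workers=3))
print(counts)
```

In a real cluster each partition would be handled by a different machine, which is where the disk and network traffic discussed later comes from.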

MapReduce was originally created by Google for its search needs. MapReduce then made its way into free code, and Apache took over the project. Google, meanwhile, gradually migrated to other solutions. An interesting tidbit: Google currently has a project called Google Cloud Dataflow, positioned as the next step after Hadoop and a faster replacement for it.

A closer look shows that Google Cloud Dataflow is based on a variation of Apache Beam, while Apache Beam includes the well-documented Apache Spark framework, which lets us speak of almost the same execution speed for solutions. And of course, Apache Spark works perfectly well on top of the HDFS file system, which allows it to be deployed on Hadoop servers.

Add to this the volume of documentation and ready-made solutions for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run, for Hadoop or for Spark, depending on the task, their experience, and their qualifications.

Cloud or local server

The trend toward a general move to the cloud has even given rise to an interesting term: Hadoop-as-a-service. In such a scenario, administration of the connected servers becomes very important. Because, alas, despite its popularity, pure Hadoop is a rather difficult tool to configure, since a lot has to be done by hand: configuring servers individually, monitoring their performance, and carefully tuning many parameters. In short, it is not a job for everyone, and there is a good chance of messing something up somewhere or missing something.

That is why various distributions, which come equipped with convenient deployment and administration tools from the start, have become very popular. One of the most popular distributions that supports Spark and makes everything easier is Cloudera. It has both paid and free versions, and in the latter all the basic functionality is available, with no limit on the number of nodes.

During setup, Cloudera Manager will connect to your servers via SSH. An interesting point: when installing, it is better to specify that the installation be carried out with so-called parcels: special packages, each containing all the necessary components already configured to work with one another. In essence, this is an improved version of a package manager.

After installation we get a cluster management console, where you can see cluster telemetry and the installed services, add or remove resources, and edit the cluster configuration.

As a result, the cabin of the rocket that will take you to the bright future of BigData appears before you. But before we say “let's go,” let's take a look under the hood.

Hardware requirements

On its website, Cloudera describes various possible configurations. The general principles on which they are built are shown in the figure:

MapReduce can blur this optimistic picture. If you look again at the diagram from the previous section, it becomes clear that in almost all cases a MapReduce job can hit a bottleneck when reading data from disk or from the network. This is also noted in the Cloudera blog. As a result, for any fast computation, including through Spark, which is often used for real-time calculations, I/O speed is very important. So when using Hadoop, it is very important that the cluster consist of balanced and fast machines, which, to put it mildly, is not always guaranteed in cloud infrastructure.
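A rough back-of-the-envelope estimate shows why I/O dominates. The sketch below computes a lower bound on the time needed to read a data set once when it is spread evenly across the nodes. Every figure in it (10 TB of data, 12 disks per node, 150 MB/s per disk, a 1250 MB/s network link) is a hypothetical assumption, not a measurement from the article; only the count of nine data nodes echoes the configuration discussed below.

```python
def scan_time_seconds(data_bytes, nodes, disks_per_node, disk_mb_per_s, net_mb_per_s=None):
    """Lower bound on the time to read the data once, spread evenly across nodes.

    If net_mb_per_s is given, each node's data is assumed to also cross the
    network, so the slower of the two paths limits the read rate.
    """
    per_node_bytes = data_bytes / nodes
    disk_rate = disks_per_node * disk_mb_per_s * 1e6  # bytes per second from local disks
    if net_mb_per_s is None:
        rate = disk_rate
    else:
        rate = min(disk_rate, net_mb_per_s * 1e6)
    return per_node_bytes / rate


TB = 1e12  # decimal terabyte

# Hypothetical cluster: all numbers are illustrative assumptions.
local_read = scan_time_seconds(10 * TB, nodes=9, disks_per_node=12, disk_mb_per_s=150)
remote_read = scan_time_seconds(
    10 * TB, nodes=9, disks_per_node=12, disk_mb_per_s=150, net_mb_per_s=1250
)

print(f"local disks only: {local_read / 60:.1f} min")
print(f"through the network: {remote_read / 60:.1f} min")
```

Even in this idealized model, no amount of CPU shortens the read; only faster disks, more disks per node, or a faster network do, which is the balance the text refers to.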

Balance in load distribution is achieved by using OpenStack virtualization on servers with powerful multi-core CPUs. Data nodes are allocated their own processor resources and specific disks. In our solution, the Atos Codex Data Lake Engine, extensive virtualization is achieved, which is why we gain both in performance (the impact of the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).

When using BullSequana S200 servers, we get a very uniform load with no particular bottlenecks. The minimum configuration includes 3 BullSequana S200 servers, each with two JBODs, plus optionally connected additional S200s containing four data nodes each. Here is an example of the load in the TeraGen test:

Tests with different data volumes and replication values show the same results in terms of load distribution across the cluster nodes. Below is a graph of the distribution of disk access across the performance tests.

The calculations are based on the minimum configuration of 3 BullSequana S200 servers. It includes 9 data nodes, as well as reserved virtual machines in case protection based on OpenStack Virtualization is deployed. TeraSort test result: with a block size of 3 MB, a replication factor of three, and encryption, 512 minutes.

How can the system be expanded? Several kinds of extensions are available for the Data Lake Engine:

  • Data nodes: for every 40 TB of usable space
  • Analytics nodes with the option of installing a GPU
  • Other options depending on business needs (for example, if you need Kafka and the like)
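As a small worked example of the first option, the sketch below estimates how many data-node extensions would be needed for a target amount of usable space, taking the 40 TB step from the list above. The function name, the base capacity, and the target figures are hypothetical; actual sizing would need to be confirmed with the vendor.

```python
import math


def data_node_extensions(required_usable_tb, base_usable_tb, tb_per_extension=40):
    """Number of data-node extensions needed to reach the required usable space.

    Assumption: each extension adds tb_per_extension TB of usable space, and
    base_usable_tb is what the starting configuration already provides.
    """
    missing = max(0, required_usable_tb - base_usable_tb)
    return math.ceil(missing / tb_per_extension)


# Hypothetical sizing: 250 TB needed, 120 TB already available.
extensions = data_node_extensions(required_usable_tb=250, base_usable_tb=120)
print("extensions needed:", extensions)
```

Rounding up matters here: a shortfall of 130 TB does not fit into three 40 TB steps, so a fourth extension is required.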

The Atos Codex Data Lake Engine includes both the servers themselves and pre-installed software: a licensed Cloudera kit; Hadoop itself; OpenStack with virtual machines based on the RedHat Enterprise Linux kernel; and data replication and backup systems (including the use of a backup node and Cloudera BDR, Backup and Disaster Recovery). The Atos Codex Data Lake Engine became the first virtualization solution to be certified by Cloudera.

If you are interested in the details, we will be happy to answer your questions in the comments.

Source: www.habr.com
