The market for distributed computing and big data, according to
Why does an ordinary business need distributed computing? The answer is simple and complicated at the same time. Simple, because in most cases we perform fairly basic calculations on each unit of information. Complicated, because there is a lot of that information. A great deal. As a result, it is necessary
One recent example: the pizzeria chain Dodo Pizza
Another example:
Choosing a tool
The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr itself offers many detailed articles on the topic) that comes with a whole set of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system itself will distribute them across the available computing capacity. Moreover, that capacity can be increased or switched off at any time: horizontal scalability in action.
In 2017, the influential consulting company Gartner
Hadoop rests on several pillars, the most notable of which are MapReduce (a system for distributing data for computation among servers) and the HDFS file system. The latter is designed specifically for storing information distributed across the nodes of a cluster: each fixed-size block can be placed on several nodes, and thanks to replication the system tolerates the failure of individual nodes. Instead of a file table, a special server called the NameNode is used.
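To make the block and replication idea concrete, here is a minimal sketch in Python. It is a toy model, not HDFS code: the function names, the node names, and the round-robin placement are assumptions made for illustration (real HDFS uses its own placement policy), and the `block_map` dictionary only stands in for the kind of metadata the NameNode keeps.

```python
def split_into_blocks(file_size_mb, block_size_mb):
    """Return the sizes of the fixed-size blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Toy round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(block_map, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in block_map.values()
    )


# Hypothetical file and cluster, chosen only for the example.
blocks = split_into_blocks(file_size_mb=300, block_size_mb=128)
block_map = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"], replication=3)
survives = readable_after_failure(block_map, "node-b")
```

With three replicas per block, losing any single node in this model still leaves every block readable, which is the property the paragraph above describes.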
The illustration below shows how MapReduce works. In the first stage, the data is split according to a certain criterion; in the second, it is distributed across the computing capacity; in the third, the computation takes place.
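The same flow can be sketched in a few lines of Python. This is a single-process illustration of the map, shuffle, and reduce steps using a word count, not Hadoop's API; the function names and the sample input are assumptions made for the example, and a real job would run each phase across many servers.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as if routing each key to one worker."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in order on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records.
counts = run_job(["pizza order pizza", "order delivered"])
```

Each phase only needs its own slice of the data, which is what allows the work to be spread over a cluster.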
MapReduce was originally created by Google for the needs of its search engine. MapReduce then made its way into open source, and Apache took up the project. Google, meanwhile, gradually migrated to other solutions. An interesting tidbit: Google currently has a project called Google Cloud Dataflow, which is positioned as the next step after Hadoop and as a quick replacement for it.
A closer look shows that Google Cloud Dataflow is based on a variation of Apache Beam, while Apache Beam in turn includes the well-documented Apache Spark framework, which lets us talk about almost the same execution speed for solutions. And of course, Apache Spark works perfectly well on the HDFS file system, which allows it to be deployed on Hadoop servers.
Add to this the volume of documentation and ready-made solutions for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run, for Hadoop or for Spark, based on the task, their experience, and their qualifications.
Cloud or local server
The general trend of moving to the cloud has even produced an interesting term: Hadoop-as-a-service. In this scenario, administration of the connected servers has become very important. Because, alas, despite its popularity, pure Hadoop is a rather difficult tool to configure, since a lot has to be done by hand. For example, you have to configure servers individually, monitor their performance, and carefully tune many parameters. In general, this is a job for enthusiasts, and there is a high chance of messing something up somewhere or missing something.
This is why various distributions, which come equipped with convenient deployment and administration tools from the start, have become very popular. One of the most popular distributions that supports Spark and makes everything simpler is Cloudera. It has both paid and free versions, and in the free one all the basic functionality is available, with no limit on the number of nodes.
During setup, Cloudera Manager will connect to your servers over SSH. An interesting point: when installing, it is better to specify that the installation be carried out with so-called parcels: special packages, each of which contains all the necessary components configured to work with one another. In essence, this is an improved version of a package manager.
After installation, we get a cluster management console where you can see cluster telemetry and the installed services, add or remove resources, and edit the cluster configuration.
As a result, the cabin of the rocket that will take you to the bright future of BigData appears before you. But before we say "let's go", let's take a look under the hood.
Hardware requirements
On its website, Cloudera describes various possible configurations. The general principles on which they are built are shown in the illustration:
MapReduce can blur this optimistic picture. If you look again at the diagram from the previous section, it becomes clear that in almost all cases a MapReduce job can hit a bottleneck when reading data from disk or from the network. This is also noted in the Cloudera blog. As a result, for any fast computation, including computation through Spark, which is often used for real-time calculations, I/O speed is very important. Therefore, when using Hadoop, it is very important that the cluster consists of balanced and fast machines, which, to put it mildly, is not always guaranteed in cloud infrastructure.
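A rough way to reason about this is to estimate how long each resource would need to move or process the same volume of data and see which one dominates. The sketch below does exactly that; the function names and all throughput figures are hypothetical placeholders rather than measurements, and the model ignores overlap between stages.

```python
def stage_times(data_gb, disk_mb_s, network_mb_s, cpu_mb_s):
    """Seconds each resource needs to handle the data at the given throughput."""
    data_mb = data_gb * 1024
    return {
        "disk": data_mb / disk_mb_s,
        "network": data_mb / network_mb_s,
        "cpu": data_mb / cpu_mb_s,
    }


def bottleneck(times):
    """Name of the resource with the longest estimated time."""
    return max(times, key=times.get)


# Hypothetical node: slow disks, fast network, reasonably fast CPU.
times = stage_times(data_gb=100, disk_mb_s=200, network_mb_s=1000, cpu_mb_s=800)
slowest = bottleneck(times)
```

In this made-up configuration the disk is the limiting factor, so adding CPU cores would not shorten the job; only faster or more parallel storage would.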
Balanced load distribution is achieved through Openstack virtualization on servers with powerful multi-core CPUs. Data nodes are allocated their own processor resources and specific disks. Our solution, the Atos Codex Data Lake Engine, achieves broad virtualization, which is why we gain both in performance (the impact of the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).
With BullSequana S200 servers, we get a very uniform load, free of certain bottlenecks. The minimum configuration includes 3 BullSequana S200 servers, each with two JBODs, and additional S200s containing four data nodes each can optionally be connected. Here is an example of the load in the TeraGen test:
Tests with different data volumes and replication values show the same results in terms of load distribution among the cluster nodes. Below is a graph of the distribution of disk access across the performance tests.
The calculations are based on a minimum configuration of 3 BullSequana S200 servers. It includes 9 data nodes, as well as reserved virtual machines in case protection based on OpenStack Virtualization is deployed. TeraSort test result: with a block size of 3 MB, a replication factor of three, and encryption, the run takes 512 minutes.
How can the system be expanded? Several types of extensions are available for the Data Lake Engine:
- Data nodes: for every 40 TB of usable space
- Analytics nodes with the option of installing a GPU
- Other options depending on business needs (for example, if you need Kafka and the like)
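As a back-of-the-envelope aid for the first item, the sketch below counts how many data-node extensions a given amount of usable space would call for. It assumes that one extension corresponds to 40 TB of usable space, which is an interpretation of the list above rather than a documented rule, and the raw-capacity estimate (usable space multiplied by the replication factor) is a deliberate simplification that ignores any other overhead. All names are hypothetical.

```python
import math

# Assumed reading of the list above: one data-node extension per 40 TB usable.
USABLE_TB_PER_EXTENSION = 40


def extensions_needed(required_usable_tb):
    """Number of data-node extensions covering the requested usable space."""
    if required_usable_tb <= 0:
        return 0
    return math.ceil(required_usable_tb / USABLE_TB_PER_EXTENSION)


def rough_raw_tb(usable_tb, replication_factor=3):
    """Very rough raw capacity: every usable TB is stored `replication_factor` times."""
    return usable_tb * replication_factor


# Hypothetical requirement of 100 TB of usable space.
steps = extensions_needed(100)
raw = rough_raw_tb(100)
```

For a requirement of 100 TB, this estimate rounds up to three extensions, since two would cover only 80 TB.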
The Atos Codex Data Lake Engine includes both the servers themselves and pre-installed software, including a licensed Cloudera kit, Hadoop itself, OpenStack with virtual machines based on the RedHat Enterprise Linux kernel, and data replication and backup systems (including the use of a backup node and Cloudera BDR, Backup and Disaster Recovery). The Atos Codex Data Lake Engine became the first virtualization solution to be certified
If you are interested in the details, we will be happy to answer your questions in the comments.
Source: www.habr.com