The market for distributed computing and big data, according to
Why does an ordinary business need distributed computing? The answer is simple and complicated at the same time. Simple, because in most cases we perform fairly simple calculations per unit of information. Complicated, because there is a lot of that information. A great deal of it. As a result, one has to
One recent example: the pizzeria chain Dodo Pizza
Another example:
Choosing a tool
The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr itself offers many detailed articles on the subject), accompanied by a whole set of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system will distribute them across the available computing capacity by itself. Moreover, that capacity can be increased or switched off at any time - horizontal scalability in action.
In 2017, the influential consulting company Gartner
Hadoop rests on several pillars, the most notable of which are MapReduce (a system for distributing data for computation across servers) and the HDFS file system. The latter is designed specifically to store information spread across the nodes of a cluster: each fixed-size block can be placed on several nodes, and thanks to replication the system tolerates the failure of individual nodes. Instead of a file table, a dedicated server called the NameNode is used.
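To make the block-and-replica idea concrete, here is a minimal sketch in plain Python. It is not HDFS code: the function names are hypothetical, and the round-robin placement is an assumption made for brevity (real HDFS uses a rack-aware placement policy). The `placement` dictionary plays the role of the metadata that the NameNode keeps.

```python
def split_into_blocks(file_size_mb, block_size_mb):
    """Split a file into fixed-size blocks; the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin placement (an assumption for this sketch):
    replica r of block b goes to node (b + r) modulo the node count.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """A file stays readable if every block has at least one live replica."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(file_size_mb=1100, block_size_mb=512)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
print(blocks)
print(readable_after_failure(placement, {"dn1", "dn2"}))
```

With a replication factor of three, losing any two of the four hypothetical nodes still leaves one live copy of every block, which is the property the paragraph above describes.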
The illustration below shows how MapReduce works. At the first stage the data is split according to some criterion, at the second stage it is distributed across the computing capacity, and at the third stage the calculation itself takes place.
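The three stages can be sketched in a few lines of plain Python using the classic word-count task. This is a single-process illustration under simplifying assumptions, not Hadoop's API: the function names are made up for the example, and in a real cluster each stage would run in parallel on different servers.

```python
from collections import defaultdict


def map_phase(records):
    """Stage 1: split the input by a criterion, here one (word, 1) pair per word."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Stage 2: group pairs by key, as if routing each key to one worker."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Stage 3: perform the actual calculation for every key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


print(word_count(["big data", "big cluster"]))
```

The grouping step is where data moves between servers in a real deployment, which is why network and disk speed matter so much for this model.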
MapReduce was originally created by Google for its own search needs. An open-source implementation followed, and Apache took over the project. Google, for its part, gradually migrated to other solutions. An interesting detail: Google currently has a project called Google Cloud Dataflow, positioned as the next step after Hadoop and as a quick replacement for it.
A closer look shows that Google Cloud Dataflow is based on a variation of Apache Beam, while Apache Beam in turn supports the well-documented Apache Spark framework as an execution engine, which lets us talk about nearly the same execution speed for both solutions. And Apache Spark works perfectly well on top of the HDFS file system, which allows it to be deployed on Hadoop servers.
Add to this the volume of documentation and ready-made solutions for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run, for Hadoop or for Spark, depending on the task, their experience and their qualifications.
Cloud or local server
The general trend of moving to the cloud has even given rise to an interesting term: Hadoop-as-a-service. In this scenario, administering the connected servers becomes very important. Alas, despite its popularity, pure Hadoop is a rather difficult tool to configure, because a lot has to be done by hand: configuring servers individually, monitoring their performance, and carefully tuning many parameters. In short, it is work for enthusiasts, and there is a good chance of making a mistake somewhere or overlooking something.
That is why various distributions, which come equipped with convenient deployment and administration tools from the start, have become so popular. One of the more popular distributions that supports Spark and simplifies everything is Cloudera. It has both paid and free versions, and in the latter all the basic functionality is available, with no limit on the number of nodes.
During setup, Cloudera Manager connects to your servers over SSH. An interesting point: when installing, it is better to specify that the installation be performed with so-called parcels, special packages each of which contains all the necessary components configured to work with one another. In essence, this is an improved version of a package manager.
After installation we get a cluster management console, where you can see cluster telemetry and the installed services, add or remove resources, and edit the cluster configuration.
As a result, the cockpit of the rocket that will carry you into the bright future of BigData appears before you. But before we say "let's go," let's take a look under the hood.
Hardware requirements
On its website, Cloudera lists various possible configurations. The general principles on which they are built are shown in the illustration:
MapReduce can cloud this optimistic picture. If you look again at the diagram from the previous section, it becomes clear that in almost all cases a MapReduce job can hit a bottleneck when reading data from disk or from the network. This is also noted in the Cloudera blog. Consequently, for any fast computation, including Spark, which is often used for real-time calculations, I/O speed is very important. So when using Hadoop, it is essential that the cluster consists of balanced and fast machines, which, to put it mildly, is not always guaranteed in a cloud infrastructure.
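A rough back-of-the-envelope model shows why an unbalanced machine hurts. The sketch below is only an illustration: the function names, the throughput figures and the share of data read over the network are all made-up assumptions, not measurements from Cloudera or from any real cluster.

```python
def stage_times(data_gb, nodes, disk_mb_s, net_mb_s, cpu_mb_s, remote_fraction):
    """Estimate seconds spent per node on disk reads, network transfer and CPU.

    All throughput figures are hypothetical inputs. `remote_fraction` is the
    assumed share of each node's data that has to be fetched over the network.
    """
    per_node_mb = data_gb * 1024 / nodes
    return {
        "disk": per_node_mb / disk_mb_s,
        "network": per_node_mb * remote_fraction / net_mb_s,
        "cpu": per_node_mb / cpu_mb_s,
    }


def bottleneck(times):
    """The slowest resource bounds the job, however fast the others are."""
    return max(times, key=times.get)


# Hypothetical cluster: 1 TB of input, 8 nodes, fast CPUs, modest disks.
times = stage_times(data_gb=1024, nodes=8, disk_mb_s=200,
                    net_mb_s=1000, cpu_mb_s=800, remote_fraction=0.25)
print(times)
print(bottleneck(times))
```

With these invented numbers the disk is the limiting resource, so adding CPU power would not shorten the job. That is the sense in which a cluster needs balanced machines.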
Balanced load distribution is achieved by using OpenStack virtualization on servers with powerful multi-core CPUs. Data nodes are allocated their own processor resources and specific disks. In our Atos Codex Data Lake Engine solution, extensive virtualization is achieved, which is why we gain both in performance (the influence of the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).
With BullSequana S200 servers we get a very even load, free of bottlenecks. The minimum configuration includes 3 BullSequana S200 servers, each with two JBODs, and additional S200s containing four data nodes each can optionally be connected. Here is an example of the load in the TeraGen test:
Tests with different data volumes and replication values show the same results in terms of load distribution across the cluster nodes. Below is a graph of the distribution of disk access across the performance tests.
The calculations are based on the minimum configuration of 3 BullSequana S200 servers. It includes 9 data nodes and 3 master nodes, as well as reserved virtual machines in case protection based on OpenStack Virtualization is deployed. TeraSort test result: with a block size of 512 MB, a replication factor of three and encryption, the run takes 23.1 minutes.
How can the system be expanded? Several types of extensions are available for the Data Lake Engine:
- Data nodes: for every 40 TB of usable space
- Analytic nodes with the option of installing a GPU
- Other options depending on business needs (for example, if you need Kafka and the like)
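As a sizing aid, the data-node rule can be turned into a small calculation. The sketch assumes one reading of the list above, namely that one data-node extension is added for every 40 TB of usable space. The function name and the `base_usable_tb` parameter (usable capacity already present in the installation) are hypothetical.

```python
import math

TB_PER_DATA_NODE_EXTENSION = 40  # figure taken from the extension list


def data_node_extensions(required_usable_tb, base_usable_tb=0):
    """Number of data-node extensions needed to reach the required capacity.

    Assumes one extension per 40 TB of usable space; `base_usable_tb` is a
    hypothetical parameter for capacity that is already installed.
    """
    extra_tb = max(0, required_usable_tb - base_usable_tb)
    return math.ceil(extra_tb / TB_PER_DATA_NODE_EXTENSION)


print(data_node_extensions(100))
print(data_node_extensions(100, base_usable_tb=60))
```

Rounding up matters here: a requirement that exceeds a multiple of 40 TB by even a little calls for one more extension.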
The Atos Codex Data Lake Engine includes both the servers themselves and pre-installed software: a licensed Cloudera kit, Hadoop itself, OpenStack with virtual machines based on the RedHat Enterprise Linux kernel, and data replication and backup systems (including a backup node and Cloudera BDR, Backup and Disaster Recovery). The Atos Codex Data Lake Engine became the first virtualization-based solution to be certified
If you are interested in the details, we will be happy to answer your questions in the comments.
Source: www.habr.com