What's special about Cloudera and how to cook it

According to the statistics, the market for distributed computing and big data grows by 18-19% a year, so choosing software for these purposes remains a relevant question. In this post we start with why distributed computing is needed, go into more detail on choosing software, talk about running Hadoop with Cloudera, and finally discuss choosing hardware and the different ways it affects performance.

Why would an ordinary business need distributed computing? The answer is simple and complicated at the same time. Simple, because in most cases we perform relatively simple calculations on each unit of information. Complicated, because there is a lot of such information. A great deal of it. As a result, you have to process terabytes of data in 1000 threads. The use cases are therefore quite universal: these calculations apply wherever you need to account for a large number of metrics across an even larger body of data.

One recent example: by analyzing its database of customer orders, the pizza chain Dodo Pizza found that when building a pizza with custom toppings, customers usually work with just six basic sets of ingredients plus a couple of random extras. The chain adjusted its purchasing accordingly. It was also able to recommend add-on products to customers more effectively at the ordering stage, which increased profit.
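The article does not say how Dodo Pizza found these recurring ingredient sets. One common technique for this kind of question is frequent-itemset counting. Below is a minimal sketch of its first pass, which counts ingredient pairs that appear together in many orders. The orders, topping names and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(orders, min_support):
    """Return ingredient pairs that appear together in at least min_support orders.

    This is the pair-counting pass of frequent-itemset mining (as in Apriori).
    """
    counts = Counter()
    for order in orders:
        # Sorting gives every pair one canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical orders: each list holds the toppings one customer chose.
orders = [
    ["mozzarella", "tomato", "ham"],
    ["mozzarella", "tomato", "ham"],
    ["mozzarella", "mushrooms", "chicken"],
    ["mozzarella", "tomato", "ham", "jalapeno"],
    ["mozzarella", "mushrooms", "chicken"],
    ["mozzarella", "tomato", "ham"],
]

for pair, n in sorted(frequent_pairs(orders, 3).items()):
    print(pair, n)
```

A further pass can extend the search to triples and larger sets, keeping only sets whose subsets are already frequent. That would surface full "base sets" like the six mentioned above. On millions of orders, this counting is exactly the kind of job that gets spread across a cluster.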

Another example: product analysis allowed H&M to cut the assortment in individual stores by 40% while keeping sales at the same level. This was achieved by dropping poorly selling items, with seasonality taken into account in the calculations.
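H&M's actual method is not described here. The sketch below only illustrates why seasonality matters when deciding what "sells poorly". It averages an item's sales over its own selling weeks rather than the whole year. The items, weeks and threshold are hypothetical.

```python
def year_round_average(weekly, weeks_in_year=52):
    """Naive metric: average weekly sales over the whole year."""
    return sum(weekly.values()) / weeks_in_year

def items_to_keep(sales, seasons, min_weekly_units):
    """Keep items whose average weekly sales within their own season reach the threshold.

    sales:   item -> {week number: units sold}
    seasons: item -> set of week numbers when the item is on sale (assumed non-empty)
    """
    keep = []
    for item, weekly in sales.items():
        weeks = seasons[item]
        # Weeks with no recorded sales count as zero, but only in-season weeks are averaged.
        in_season = [weekly.get(week, 0) for week in weeks]
        if sum(in_season) / len(in_season) >= min_weekly_units:
            keep.append(item)
    return keep

# Hypothetical sales history.
sales = {
    "swimwear": {w: 50 for w in range(22, 35)},  # summer only, strong seller
    "socks": {w: 20 for w in range(1, 53)},      # steady all year
    "scarf": {w: 2 for w in range(1, 11)},       # winter item that barely sells
}
seasons = {
    "swimwear": set(range(22, 35)),
    "socks": set(range(1, 53)),
    "scarf": set(range(1, 11)),
}

print(year_round_average(sales["swimwear"]))  # 12.5: would look like a weak item
print(items_to_keep(sales, seasons, 15))
```

With a threshold of 15 units a week, the year-round average would wrongly flag swimwear as a poor seller. The in-season average keeps it and drops only the scarf.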

Choosing a tool

The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr alone has many detailed articles on the subject), accompanied by a whole set of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system itself will distribute them across the available computing resources. Moreover, those same resources can be added or removed at any time: that is horizontal scalability in action.

In 2017, the influential consulting firm Gartner concluded that Hadoop would soon become obsolete. The reason is rather banal: analysts believe that companies will migrate to the cloud en masse, because there they can pay for computing power as they use it. The second important factor that could supposedly "bury" Hadoop is its speed, since options such as Apache Spark or Google Cloud DataFlow are faster than MapReduce, which underlies Hadoop.

Hadoop rests on several pillars, the most notable of which are MapReduce (a technology for distributing data for computation across servers) and the HDFS file system. HDFS is designed specifically to store information distributed across cluster nodes: each fixed-size block can be placed on several nodes, and thanks to this replication the system can withstand the failure of individual nodes. Instead of a file table, a special server called the NameNode is used.
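To make the block-and-replica idea concrete, here is a toy model in which a dictionary plays the NameNode's role. It records which nodes hold each block. This is a simplification: real HDFS uses a rack-aware placement policy, not the round-robin placement assumed here. The node names and sizes (in MB) are hypothetical.

```python
import math

def place_blocks(file_size_mb, block_size_mb, nodes, replication):
    """Toy NameNode metadata: block index -> list of nodes holding a replica.

    Replicas are spread round-robin over distinct nodes (an assumption for
    illustration; HDFS itself places replicas with rack awareness).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }

def survives_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(any(n != failed_node for n in replicas) for replicas in placement.values())

nodes = ["dn1", "dn2", "dn3", "dn4"]
meta = place_blocks(file_size_mb=1300, block_size_mb=512, nodes=nodes, replication=3)
print(meta)                               # 3 blocks, each on 3 different nodes
print(survives_failure(meta, "dn2"))      # True: other replicas remain
single = place_blocks(1300, 512, nodes, replication=1)
print(survives_failure(single, "dn2"))    # False: block 1 lived only on dn2
```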

The illustration below shows how MapReduce works. In the first stage, the data is split according to some criterion; in the second, it is distributed across computing resources; and in the third, the computation takes place.
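The classic illustration of these stages is a word count. The sketch below simulates the stages in a single process. The input is already split into chunks, each chunk is mapped to (word, 1) pairs, a shuffle step routes each word to a reducer by hash, and the reducers sum the counts. On a real cluster the map and reduce calls would run on different machines. The function names and sample text are hypothetical.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(mapped, n_reducers):
    """Shuffle: send each key to one reducer by hash, grouping its values."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            partitions[hash(key) % n_reducers][key].append(value)
    return partitions

def reduce_phase(partition):
    """Reduce: sum the grouped values for each key."""
    return {key: sum(values) for key, values in partition.items()}

def word_count(splits, n_reducers=2):
    mapped = [map_phase(s) for s in splits]  # on a cluster: one map task per split
    result = {}
    for partition in shuffle_phase(mapped, n_reducers):
        result.update(reduce_phase(partition))  # keys never overlap between reducers
    return result

splits = ["big data big cluster", "data node data"]
print(word_count(splits))
```

Because the shuffle sends every occurrence of a word to the same reducer, the reducers can work independently and their outputs can simply be merged.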

MapReduce was originally created by Google for its search needs. The approach was later taken up by the open-source community, and its implementation became an Apache project. Google, meanwhile, gradually moved on to other solutions. An interesting detail: Google now has a project called Google Cloud Dataflow, positioned as the next step after Hadoop and as a faster replacement for it.

A closer look shows that Google Cloud Dataflow is based on a variant of Apache Beam, and Apache Beam can in turn run on the well-documented Apache Spark framework. This lets us speak of roughly the same execution speed. Apache Spark, for its part, works perfectly well on top of HDFS, so it can be deployed on Hadoop servers.

Add to this the sheer volume of documentation and ready-made solutions for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run, Hadoop or Spark, depending on the task, their experience and their skills.

Cloud or on-premises servers

The general trend toward moving to the cloud has even produced such an interesting term as Hadoop-as-a-service. In this setting, administering the connected servers has become very important. Alas, despite its popularity, vanilla Hadoop is a rather difficult tool to set up, because a lot has to be done by hand. For example, you have to configure servers individually, monitor their performance and carefully tune many parameters. In short, it is work not everyone enjoys, and there is a high chance of breaking something somewhere or overlooking something.

That is why distributions that come equipped from the start with suitable deployment and administration tools have become very popular. One of the most popular distributions that supports Spark and makes everything easy is Cloudera. It comes in both paid and free versions, and the free version offers all the basic functionality with no limit on the number of nodes.

During installation, Cloudera Manager connects to your servers over SSH. An interesting point: when installing, it is better to have it done via so-called parcels. These are special packages, each containing all the necessary components, already configured to work together. In essence, this is an improved version of a package manager.

After installation we get a cluster management console, where you can view cluster telemetry and installed services, add or remove resources, and edit the cluster configuration.

As a result, the cockpit of the rocket that will carry you into the bright future of BigData appears before you. But before we say "Let's go!", let's take a look under the hood.

Hardware requirements

On its website, Cloudera describes several possible configurations. The general principles behind them are shown in this illustration:

MapReduce can spoil this optimistic picture. If you look again at the diagram from the previous section, it becomes clear that in almost all cases a MapReduce job can hit a bottleneck when reading data from disk or over the network. The Cloudera blog points this out as well. As a result, I/O speed is critical for any fast computation, including Spark, which is often used for real-time calculations. So when using Hadoop, it is very important that the cluster consists of balanced, fast machines, which, to put it mildly, is not always guaranteed in cloud infrastructure.
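A back-of-the-envelope sketch shows why balance matters. It assumes the data is split evenly across nodes and each node reads its share in parallel, so the slowest node sets the pace. It ignores caching, compression, data locality and Hadoop's speculative execution, which can re-launch slow tasks elsewhere. All throughput figures are hypothetical.

```python
def scan_seconds(data_mb, node_mb_per_s):
    """Seconds to read data_mb split evenly across nodes with the given read rates.

    Every node reads its equal share in parallel, so the job finishes only
    when the slowest node does.
    """
    share = data_mb / len(node_mb_per_s)
    return max(share / rate for rate in node_mb_per_s)

balanced = [1000] * 10           # ten nodes reading at 1,000 MB/s each (hypothetical)
one_slow = [1000] * 9 + [250]    # the same cluster with one slow node

print(scan_seconds(1_000_000, balanced))  # 100.0
print(scan_seconds(1_000_000, one_slow))  # 400.0
```

A single node at a quarter of the speed makes the whole scan four times slower, even though the cluster's total bandwidth dropped by less than 8%.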

Balanced load distribution is achieved by using OpenStack virtualization on servers with powerful multi-core CPUs. Data nodes are allocated their own processor resources and dedicated disks. Our solution, Atos Codex Data Lake Engine, uses virtualization extensively, so we gain both in performance (the impact of the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).

With BullSequana S200 servers we get a very uniform load with no bottlenecks. The minimum configuration includes 3 BullSequana S200 servers, each with two JBODs, plus optional additional S200s, each containing four data nodes. Here is an example of the load during the TeraGen test:

Tests with different data volumes and replication factors show the same results in terms of load distribution across cluster nodes. Below is a graph of how disk access was distributed during the performance tests.
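The article shows load balance only as graphs. To put a number on "uniform load", two simple metrics can be computed from per-node measurements: the coefficient of variation, where 0 means perfectly even, and the ratio of the busiest node to the average. The sample values below are hypothetical, not measurements from the BullSequana tests.

```python
from statistics import mean, pstdev

def load_imbalance(per_node_load):
    """Coefficient of variation of per-node load: 0.0 means perfectly even."""
    avg = mean(per_node_load)
    return pstdev(per_node_load) / avg if avg else 0.0

def max_over_mean(per_node_load):
    """How far the busiest node is above the average (1.0 means no hotspot)."""
    return max(per_node_load) / mean(per_node_load)

even = [100, 102, 98, 100, 101, 99, 100, 100, 100]      # hypothetical MB/s on 9 data nodes
skewed = [100, 100, 100, 100, 100, 100, 100, 100, 400]  # one hot node

print(round(load_imbalance(even), 3), round(max_over_mean(even), 2))
print(round(load_imbalance(skewed), 3), round(max_over_mean(skewed), 2))
```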

The calculations were performed on the minimum configuration of 3 BullSequana S200 servers. It includes 9 data nodes and 3 master nodes, plus reserved virtual machines in case protection based on OpenStack virtualization is deployed. TeraSort test result: with a 512 MB block size, a replication factor of three and encryption enabled, the run took 23.1 minutes.
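The block size and replication factor above determine how data is laid out on disk. The sketch below computes the number of logical blocks, the number of stored block replicas and the raw disk consumed. The 1 TiB input size is a hypothetical example, since the article does not state the TeraSort data volume. The function name is also illustrative.

```python
import math

def hdfs_footprint(data_mib, block_mib=512, replication=3):
    """Return (logical blocks, stored block replicas, raw MiB consumed).

    HDFS does not pad the last block to full size, so raw usage is the data
    size times the replication factor, not blocks * block size.
    """
    blocks = math.ceil(data_mib / block_mib)
    return blocks, blocks * replication, data_mib * replication

print(hdfs_footprint(1024 * 1024))  # 1 TiB input -> (2048, 6144, 3145728)
```

Across 9 data nodes, those 6144 replicas come to roughly 680 per node, which is why even placement matters as much as raw capacity.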

How can the system be expanded? Several types of extensions are available for the Data Lake Engine:

  • Data nodes: one for every 40 TB of usable space
  • Analytics nodes with the option to install GPUs
  • Other options depending on business needs (for example, if you need Kafka and the like)

The Atos Codex Data Lake Engine includes both the servers themselves and preinstalled software: a licensed Cloudera kit; Hadoop itself; OpenStack with virtual machines based on the RedHat Enterprise Linux kernel; and data replication and backup systems (including a backup node and Cloudera BDR, Backup and Disaster Recovery). The Atos Codex Data Lake Engine became the first virtualization solution to be certified by Cloudera.

If you are interested in the details, we will be happy to answer your questions in the comments.

Source: www.habr.com
