What is special about Cloudera and how to cook it

According to statistics, the market for distributed computing and big data is growing by 18-19% per year. That means the question of which software to choose for these tasks stays relevant. In this post we start with why distributed computing is needed, go into more detail on choosing software, talk about running Hadoop with Cloudera, and finally discuss choosing hardware and the different ways it affects performance.

Why would an ordinary business need distributed computing? The answer is simple and complicated at the same time. Simple, because in most cases we perform fairly simple calculations per unit of information. Complicated, because there is a lot of that information. A lot. As a result, you end up processing terabytes of data in 1000 threads. The use cases are therefore quite universal: these calculations apply wherever a large number of metrics has to be taken into account over an even larger body of data.

One recent example: the pizzeria chain Dodo Pizza determined, from an analysis of its customer order database, that when choosing a pizza with custom toppings, users usually work with only six basic sets of ingredients plus a couple of random ones. The chain adjusted its purchasing accordingly. It was also able to make better recommendations of additional products offered to users at the ordering stage, which increased profit.

Another example: product analysis allowed H&M to cut the assortment in individual stores by 40% while maintaining sales levels. This was achieved by dropping poorly selling items, with seasonality taken into account in the calculations.

Choosing a tool

The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr itself has many detailed articles on the topic) that comes with a whole set of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system itself distributes them across the available computing capacity. What is more, that capacity can be increased or switched off at any time: horizontal scalability in action.

In 2017 the influential consulting company Gartner concluded that Hadoop would soon become obsolete. The reason is rather banal: the analysts believe companies will migrate to the cloud en masse, since there they can pay for computing power as they use it. The second important factor that could supposedly "bury" Hadoop is speed, because options such as Apache Spark or Google Cloud DataFlow are faster than MapReduce, which underlies Hadoop.

Hadoop rests on several pillars, the best known of which are the MapReduce technology (a system for distributing data for computation across servers) and the HDFS file system. The latter is designed specifically for storing information distributed across cluster nodes: each block of a fixed size can be placed on several nodes, and thanks to replication the system tolerates the failure of individual nodes. Instead of a file table, a dedicated server called the NameNode is used.
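
To make the block-and-replica idea more concrete, here is a minimal sketch in plain Python. It is not HDFS code: the function names are hypothetical, the round-robin placement is a simplifying assumption (real HDFS placement is more elaborate), and an ordinary dictionary stands in for the block map that the NameNode keeps.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the fixed-size blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, nodes, replication):
    """Assign every block to `replication` distinct nodes (round-robin).

    The returned dict plays the role of the NameNode's block map
    in this toy model: block id -> list of nodes holding a replica.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )
```

With a replication factor of three, losing any single node in this model leaves every block readable; with a factor of one, losing the node that holds a block makes the file unreadable.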

The illustration below shows how MapReduce works. In the first stage the data is split by some attribute, in the second stage it is distributed across the computing capacity, and in the third stage the calculation takes place.
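
The three stages can be sketched as a word count in plain Python. This is an illustration of the idea only, not Hadoop's API: the function names are hypothetical, the "workers" are just in-memory partitions, and the byte-sum partitioner is an arbitrary deterministic choice made for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Stage 1: split each record by an attribute (here: the word)."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs, num_workers):
    """Stage 2: distribute the pairs across workers, grouping by key.

    All values for one key land on the same worker.
    """
    partitions = [defaultdict(list) for _ in range(num_workers)]
    for key, value in pairs:
        worker = sum(key.encode("utf-8")) % num_workers
        partitions[worker][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Stage 3: each worker computes the result for the keys it holds."""
    totals = {}
    for partition in partitions:
        for key, values in partition.items():
            totals[key] = sum(values)
    return totals


def word_count(records, num_workers=3):
    return reduce_phase(shuffle_phase(map_phase(records), num_workers))
```

Calling `word_count(["big data big cluster", "data node"])` counts each word across both records, regardless of how many workers the pairs are spread over.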

MapReduce was originally created by Google for the needs of its search engine. MapReduce then went open source, and Apache took over the project. Google, meanwhile, gradually migrated to other solutions. An interesting detail: Google currently has a project called Google Cloud Dataflow, positioned as the next step after Hadoop and a quick replacement for it.

A closer look shows that Google Cloud Dataflow is built on a variant of Apache Beam, while Apache Beam in turn includes support for the well-documented Apache Spark framework, which lets us speak of almost the same execution speed for the two solutions. And Apache Spark works perfectly well on top of the HDFS file system, so it can be deployed on Hadoop servers.

Add to this the volume of documentation and ready-made solutions available for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run, Hadoop or Spark, depending on the task, their experience and their skills.

Cloud or local server

The trend towards a general move to the cloud has even given rise to an interesting term: Hadoop-as-a-service. In that scenario, administration of the connected servers becomes very important. Because, alas, despite its popularity, pure Hadoop is a rather difficult tool to configure, since a lot has to be done by hand: setting up servers individually, monitoring their performance, and carefully tuning many parameters. In general, it is work for those who enjoy that sort of thing, and there is a good chance of making a mistake somewhere or missing something.

This is why various distributions that come with convenient deployment and administration tools out of the box have become so popular. One of the most popular distributions that supports Spark and makes everything easier is Cloudera. It has both a paid and a free version, and the free one includes all the core functionality, with no limit on the number of nodes.

During setup, Cloudera Manager connects to your servers over SSH. An interesting point: when installing, it is better to specify that the installation be carried out with so-called parcels, special packages each of which contains all the required components, already configured to work together. In essence, this is an improved take on a package manager.

After installation we get a cluster management console, where you can see cluster telemetry and the installed services, add or remove resources, and edit the cluster configuration.

As a result, you find yourself in the cockpit of the rocket that will carry you into the bright future of BigData. But before we say "let's go", let's take a look under the hood.

Hardware requirements

On its website, Cloudera describes various possible configurations. The general principles on which they are built are shown in the illustration:

MapReduce can spoil this optimistic picture. If you look again at the diagram from the previous section, it becomes clear that in almost all cases a MapReduce job can hit a bottleneck when reading data from disk or from the network. This is also noted in the Cloudera blog. As a result, for any fast computation, including through Spark, which is often used for real-time calculations, I/O speed is very important. So when using Hadoop it matters a great deal that the cluster consists of balanced and fast machines, which, to put it mildly, is not always guaranteed in cloud infrastructure.
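
A rough back-of-the-envelope model shows why balance matters. The sketch below assumes the data is split evenly between nodes, that the job is purely I/O-bound, and that it finishes only when the slowest node has read its share; the function name and the throughput figures are hypothetical and are not measurements from any real cluster.

```python
def job_time_seconds(data_mb, node_throughputs_mb_s):
    """Estimate the read time of an I/O-bound job.

    Each node gets an equal share of the data; the job is done
    when the slowest node has finished reading its share.
    """
    if not node_throughputs_mb_s:
        raise ValueError("at least one node is required")
    share_mb = data_mb / len(node_throughputs_mb_s)
    return max(share_mb / throughput for throughput in node_throughputs_mb_s)


# Three equally fast nodes versus the same cluster with one slow node.
balanced = job_time_seconds(9000, [300, 300, 300])
unbalanced = job_time_seconds(9000, [300, 300, 100])
```

In this simplified model a single node with a third of the throughput makes the whole job take three times longer, even though the other two nodes are unchanged.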

Balanced load distribution is achieved by using OpenStack virtualization on servers with powerful multi-core CPUs. Data nodes are allocated their own processor resources and specific disks. Our Atos Codex Data Lake Engine solution achieves broad virtualization, which is why we gain both in performance (the impact of the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).

With BullSequana S200 servers we get a very uniform load, free of some of the bottlenecks. The minimum configuration includes 3 BullSequana S200 servers, each with two JBODs; additional S200s containing four data nodes can optionally be connected. Here is an example of the load in the TeraGen test:

Tests with different data volumes and replication values show the same results in terms of load distribution across the cluster nodes. Below is a graph of the distribution of disk access across the performance tests.

The calculations are based on the minimum configuration of 3 BullSequana S200 servers. It includes 9 data nodes and 3 master nodes, as well as reserved virtual machines in case protection based on OpenStack Virtualization is deployed. TeraSort test result: with a block size of 512 MB, a replication factor of three and encryption, the run takes 23.1 minutes.

How can the system be expanded? Several types of extensions are available for the Data Lake Engine:

  • Data nodes: one for every 40 TB of usable space
  • Analytical nodes with the option of installing a GPU
  • Other options depending on business needs (for example, if you need Kafka and the like)
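
The data node rule above lends itself to a quick sizing calculation. The sketch below takes the 40 TB figure from the list and assumes it is applied by rounding up to whole nodes; the function and constant names are hypothetical, and a real sizing exercise would involve more than this one number.

```python
import math

# From the list above: one data node per 40 TB of usable space.
USABLE_TB_PER_DATA_NODE = 40


def extension_data_nodes(extra_usable_tb):
    """Number of extra data nodes for a given amount of usable space.

    Assumption: partial nodes are not possible, so the result is
    rounded up to the next whole node.
    """
    if extra_usable_tb < 0:
        raise ValueError("usable space cannot be negative")
    return math.ceil(extra_usable_tb / USABLE_TB_PER_DATA_NODE)
```

For example, `extension_data_nodes(100)` rounds 2.5 up to three data nodes.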

The Atos Codex Data Lake Engine includes both the servers themselves and pre-installed software: a licensed Cloudera kit, Hadoop itself, OpenStack with virtual machines based on the RedHat Enterprise Linux kernel, and data replication and backup systems (including the use of a backup node and Cloudera BDR, Backup and Disaster Recovery). The Atos Codex Data Lake Engine became the first virtualization solution to be certified by Cloudera.

If you are interested in the details, we will be happy to answer your questions in the comments.

Source: www.habr.com
