Kev ua lag luam rau kev faib xam thiab cov ntaub ntawv loj, raws li
Vim li cas peb thiaj xav tau kev faib xam hauv kev lag luam zoo tib yam? Txhua yam yog yooj yim thiab nyuaj tib lub sijhawm. Yooj yim - vim tias feem ntau peb ua cov lej yooj yim rau ib chav tsev ntawm cov ntaub ntawv. Nyuaj - vim tias muaj ntau cov ntaub ntawv zoo li no. Ntau heev. Raws li qhov tshwm sim, nws yuav tsum
Ib qho piv txwv tsis ntev los no: Dodo Pizza
Lwm cov piv txwv:
Kev xaiv cuab yeej
Cov txheej txheem kev lag luam rau hom kev suav no yog Hadoop. Vim li cas? Vim Hadoop yog ib qho zoo heev, cov ntaub ntawv zoo heev (tib yam Habr muab ntau cov ncauj lus ntxaws ntxaws ntawm cov ncauj lus no), uas yog nrog rau tag nrho cov khoom siv hluav taws xob thiab cov tsev qiv ntawv. Koj tuaj yeem xa cov ntaub ntawv loj loj ntawm ob qho tib si tsim thiab tsis tsim cov ntaub ntawv raws li cov tswv yim, thiab lub kaw lus nws tus kheej yuav faib rau lawv ntawm kev siv hluav taws xob. Ntxiv mus, cov peev txheej tib yam no tuaj yeem nce lossis xiam oob qhab txhua lub sijhawm - uas tib txoj kab rov tav scalability hauv kev nqis tes ua.
Nyob rau hauv 2017, lub influential pab tswv yim tuam txhab Gartner
Hadoop so ntawm ob peb tus ncej, qhov tseem ceeb tshaj plaws ntawm uas yog MapReduce technologies (ib qho system rau faib cov ntaub ntawv rau kev suav ntawm cov servers) thiab HDFS cov ntaub ntawv kaw lus. Cov tom kawg yog tsim tshwj xeeb los khaws cov ntaub ntawv faib tawm ntawm pawg nodes: txhua qhov thaiv ntawm qhov loj me tuaj yeem muab tso rau ntawm ob peb lub nodes, thiab ua tsaug rau kev rov ua dua, lub kaw lus tiv thaiv kev ua tsis tiav ntawm tus neeg nodes. Hloov chaw ntawm cov ntaub ntawv, ib lub server tshwj xeeb hu ua NameNode yog siv.
Cov duab hauv qab no qhia tau tias MapReduce ua haujlwm li cas. Nyob rau hauv thawj theem, cov ntaub ntawv yog muab faib raws li ib tug tej yam cwj pwm, nyob rau hauv lub thib ob theem nws yog faib los ntawm xam lub hwj chim, nyob rau hauv lub thib peb theem kev xam yuav siv qhov chaw.
MapReduce yog thawj zaug tsim los ntawm Google rau kev xav tau ntawm nws txoj kev tshawb nrhiav. Tom qab ntawd MapReduce tau nkag mus rau hauv cov lej pub dawb, thiab Apache tau hla qhov project. Zoo, Google maj mam tsiv mus rau lwm cov kev daws teeb meem. Ib qho nthuav nuance: tam sim no, Google muaj ib qhov project hu ua Google Cloud Dataflow, positioned as the next step after Hadoop, as its quick change.
Kev saib ze dua qhia tau tias Google Cloud Dataflow yog raws li kev hloov pauv ntawm Apache Beam, thaum Apache Beam suav nrog cov ntaub ntawv zoo Apache Spark lub moj khaum, uas tso cai rau peb tham txog yuav luag tib yam kev daws teeb meem. Zoo, Apache Spark ua haujlwm zoo ntawm HDFS cov ntaub ntawv, uas tso cai rau koj siv nws ntawm Hadoop servers.
Ntxiv ntawm no qhov ntim ntawm cov ntaub ntawv thiab npaj cov kev daws teeb meem rau Hadoop thiab Spark tawm tsam Google Cloud Dataflow, thiab kev xaiv cov cuab yeej ua pom tseeb. Ntxiv mus, engineers tuaj yeem txiav txim siab rau lawv tus kheej qhov chaws twg - nyob rau hauv Hadoop lossis Spark - lawv yuav ua tiav, tsom mus rau txoj haujlwm, kev paub dhau los thiab kev tsim nyog.
Huab lossis hauv zos server
Qhov sib txawv ntawm kev hloov pauv mus rau huab tau txawm tias tau nce mus rau lub sijhawm zoo li Hadoop-as-a-service. Hauv qhov xwm txheej zoo li no, kev tswj hwm ntawm cov servers txuas nrog tau dhau los ua qhov tseem ceeb heev. Vim tias, alas, txawm tias nws muaj koob meej, Hadoop ntshiab yog ib qho cuab yeej nyuaj rau kev teeb tsa, txij li koj yuav tsum ua ntau ntawm tes. Piv txwv li, koj tuaj yeem teeb tsa cov servers ib tus zuj zus, saib xyuas lawv cov kev ua tau zoo, thiab kho ntau yam tsis zoo. Feem ntau, ua hauj lwm rau ib tug amateur thiab muaj lub caij nyoog loj los mus rau qhov chaw los yog nco ib yam dab tsi.
Yog li ntawd, ntau yam kev faib khoom tau dhau los ua neeg nyiam heev, uas yog pib nrog kev xa khoom yooj yim thiab kev tswj hwm cov cuab yeej. Ib qho ntawm cov khoom lag luam nrov tshaj plaws uas txhawb nqa Spark thiab ua kom yooj yim yog Cloudera. Nws muaj ob qho tib si them thiab dawb versions - thiab nyob rau tom kawg, tag nrho cov haujlwm tseem ceeb yog muaj, thiab tsis txwv tus lej ntawm cov nodes.
Thaum lub sijhawm teeb tsa, Cloudera Manager yuav txuas ntawm SSH rau koj cov servers. Ib qho nthuav taw tes: thaum txhim kho, nws yog qhov zoo dua los qhia meej tias nws yuav tsum tau nqa tawm los ntawm qhov hu ua parcel: pob khoom tshwj xeeb, txhua qhov uas muaj tag nrho cov khoom tsim nyog tau teeb tsa los ua haujlwm nrog ib leeg. Nyob rau hauv qhov tseeb, qhov no yog xws li ib tug txhim kho version ntawm lub pob manager.
Tom qab kev teeb tsa, peb tau txais lub console tswj pawg, qhov twg koj tuaj yeem pom telemetry rau pawg, cov kev pabcuam tau teeb tsa, ntxiv rau koj tuaj yeem ntxiv / tshem tawm cov peev txheej thiab hloov kho pawg.
Yog li ntawd, qhov kev txiav ntawm lub foob pob hluav taws tshwm nyob rau hauv pem hauv ntej ntawm koj, uas yuav coj koj mus rau lub neej yav tom ntej ntawm BigData. Tab sis ua ntej peb hais tias "cia mus", cia peb nrawm nrawm rau hauv qab lub hood.
hardware xav tau
Ntawm lawv lub vev xaib, Cloudera hais txog kev teeb tsa sib txawv. Cov ntsiab cai dav dav uas lawv tsim tau muaj nyob hauv cov lus piav qhia:
MapReduce tuaj yeem plam daim duab zoo li no. Saib dua ntawm daim duab hauv ntu dhau los, nws pom tseeb tias yuav luag txhua qhov xwm txheej, MapReduce txoj haujlwm tuaj yeem cuam tshuam lub fwj thaum nyeem cov ntaub ntawv los ntawm disk lossis network. Qhov no kuj tau sau tseg rau ntawm Cloudera blog. Yog li ntawd, rau txhua qhov kev suav ceev, suav nrog los ntawm Spark, uas feem ntau yog siv rau kev xam lub sijhawm tiag tiag, I / O ceev yog qhov tseem ceeb heev. Yog li ntawd, thaum siv Hadoop, nws yog ib qho tseem ceeb heev uas sib npaug thiab ceev cov cav tov nkag mus rau hauv pawg, uas, yuav tsum tau muab nws me me, tsis yog ib txwm muab rau hauv huab infrastructure.
Qhov sib npaug hauv kev faib khoom yog ua tiav los ntawm kev siv Openstack virtualization ntawm cov servers nrog cov tub ntxhais muaj zog ntau CPUs. Cov ntaub ntawv nodes tau faib lawv tus kheej cov peev txheej processor thiab qee cov disks. Hauv peb qhov kev daws teeb meem Atos Codex Data Lake Cav dav virtualization tau ua tiav, uas yog vim li cas peb yeej ob qho tib si ntawm kev ua tau zoo (qhov cuam tshuam ntawm lub network infrastructure yog txo qis) thiab TCO (cov servers ntxiv lub cev raug tshem tawm).
Nyob rau hauv cov ntaub ntawv ntawm kev siv BullSequana S200 servers, peb tau txais ib qho kev thauj khoom zoo heev, tsis muaj qee yam ntawm cov fwj. Qhov kev teeb tsa yam tsawg kawg nkaus suav nrog 3 BullSequana S200 servers, txhua tus muaj ob JBODs, ntxiv rau S200s ntxiv uas muaj plaub cov ntaub ntawv nodes yog xaiv tau txuas nrog. Nov yog ib qho piv txwv load hauv TeraGen xeem:
Kev ntsuam xyuas nrog cov ntaub ntawv sib txawv thiab cov txiaj ntsig rov ua dua qhia cov txiaj ntsig zoo ib yam hauv cov ntsiab lus ntawm kev faib khoom thoob plaws pawg pawg. Hauv qab no yog ib daim duab ntawm kev faib cov disk nkag los ntawm kev ntsuas kev ua tau zoo.
Kev suav yog ua raws li qhov tsawg kawg nkaus ntawm 3 BullSequana S200 servers. Nws suav nrog 9 cov ntaub ntawv nodes thiab 3 tus tswv nodes, nrog rau cov tshuab virtual tshwj xeeb nyob rau hauv rooj plaub ntawm kev xa tawm kev tiv thaiv raws li OpenStack Virtualization. TeraSort cov txiaj ntsig kev xeem: 512 MB thaiv qhov loj ntawm qhov rov ua dua ntawm peb nrog kev encryption yog 23,1 feeb.
Yuav ua li cas thiaj li yuav nthuav dav? Ntau hom kev txuas ntxiv muaj rau Data Lake Cav:
- Cov ntaub ntawv nodes: rau txhua 40 TB ntawm qhov chaw siv tau
- Analytic nodes nrog lub peev xwm los nruab GPU
- Lwm cov kev xaiv nyob ntawm kev xav tau kev lag luam (piv txwv li, yog tias koj xav tau Kafka thiab lwm yam)
Lub Atos Codex Data Lake Engine complex suav nrog cov servers lawv tus kheej thiab cov software ua ntej, suav nrog cov khoom siv Cloudera nrog daim ntawv tso cai; Hadoop nws tus kheej, OpenStack nrog cov tshuab virtual raws li RedHat Enterprise Linux ntsiav, cov ntaub ntawv rov ua dua thiab cov txheej txheem thaub qab (xws li siv lub thaub qab ntawm thiab Cloudera BDR - Thaub qab thiab Kev Puas Tsuaj Rov Qab). Atos Codex Data Lake Cav yog thawj qhov kev daws teeb meem virtualization kom tau txais ntawv pov thawj
Yog tias koj txaus siab rau cov ntsiab lus, peb yuav zoo siab los teb peb cov lus nug hauv cov lus.
Tau qhov twg los: www.hab.com