What makes Cloudera special and how to set it up

According to industry statistics, the market for distributed computing and big data is growing by 18-19% a year. That means the question of which software to choose for these workloads remains relevant. In this post we start with why distributed computing is needed at all, go into detail on choosing software, talk about running Hadoop with Cloudera, and finally discuss hardware selection and how it affects performance in different ways.

Why would an ordinary business need distributed computing? The answer is simple and complicated at the same time. Simple, because in most cases we perform fairly simple calculations on each unit of information. Complicated, because there is a great deal of that information. A very great deal. As a result, terabytes of data have to be processed in 1000 threads. That is why the use cases are so universal: distributed computing applies wherever a large number of metrics must be computed over even larger volumes of data.

One recent example: by analysing its customer order database, the pizzeria chain Dodo Pizza found that when customers build a pizza with toppings of their choice, they usually work with only six basic sets of ingredients plus a couple of random extras. The chain adjusted its purchasing accordingly. It was also able to recommend add-on products more accurately at the ordering stage, which increased profit.

Another example: analysing product sales allowed H&M to cut the assortment in individual stores by 40% while keeping sales at the same level. This was achieved by dropping poorly selling items, with seasonality taken into account in the calculations.

Choosing a tool

The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr alone has plenty of detailed articles on it), backed by a whole ecosystem of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system distributes them across the available computing resources by itself. Moreover, those resources can be added or removed at any time: horizontal scalability in action.

In 2017, the influential analyst firm Gartner concluded that Hadoop would soon become obsolete. The reasoning is fairly mundane: the analysts expect companies to move en masse to the cloud, where they can pay only for the computing power they actually use. The second factor that could supposedly "bury" Hadoop is its speed, since options such as Apache Spark or Google Cloud Dataflow are faster than MapReduce, which underpins Hadoop.

Hadoop rests on several pillars, the best known of which are MapReduce (a technique for distributing data for computation across servers) and the HDFS file system. HDFS is designed specifically to store data spread across cluster nodes: each fixed-size block can be placed on several nodes, and thanks to replication the system survives the failure of individual nodes. Instead of a file table, a dedicated server called the NameNode keeps track of where the blocks are.
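To make the block-and-replica idea concrete, here is a minimal Python sketch of a NameNode-style map recording where each block of a file lives. It assumes a 128 MB block size and a replication factor of three, and it places replicas round-robin on consecutive nodes; real HDFS uses a rack-aware placement policy instead. The function names `place_blocks` and `readable_after_failure` are hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                  # assumed replication factor


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes, round-robin.

    Returns a NameNode-style map: block index -> nodes holding a replica.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    n_blocks = math.ceil(file_size / block_size)
    block_map = {}
    for b in range(n_blocks):
        # Start at a different node for each block; consecutive offsets
        # guarantee that the replicas of one block land on distinct nodes.
        block_map[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return block_map


def readable_after_failure(block_map, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in block_map.values()
    )


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    block_map = place_blocks(300 * 1024 * 1024, nodes)  # 300 MB -> 3 blocks
    print(block_map)
    print(readable_after_failure(block_map, {"dn2", "dn3"}))         # True
    print(readable_after_failure(block_map, {"dn1", "dn2", "dn3"}))  # False: block 0 lost
```

With three replicas, any two nodes can fail and every block is still readable; losing three nodes at once can make some blocks unavailable.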

The illustration below shows how MapReduce works. At the first stage the data is split according to a particular attribute, at the second it is distributed across the available computing power, and at the third the actual computation takes place.
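The three stages roughly correspond to map, shuffle and reduce in the classic word-count job. The sketch below is plain single-process Python, not the Hadoop MapReduce API: `map_phase` emits key/value pairs, `shuffle_phase` routes each key to one of several hypothetical reducers by a deterministic hash and groups its values, and `reduce_phase` aggregates them. In a real cluster each phase runs on many nodes, and intermediate data is written to disk and sent over the network.

```python
import zlib
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs, n_reducers):
    """Shuffle: send each key to exactly one reducer (CRC32 hash, so the
    routing is stable across runs) and group the values of each key."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        idx = zlib.crc32(key.encode("utf-8")) % n_reducers
        partitions[idx][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Reduce: each reducer sums the values for the keys it owns."""
    result = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = sum(values)
    return result


def word_count(lines, n_reducers=3):
    return reduce_phase(shuffle_phase(map_phase(lines), n_reducers))


if __name__ == "__main__":
    docs = ["pizza cheese tomato", "pizza ham", "cheese pizza"]
    print(sorted(word_count(docs).items()))
    # [('cheese', 2), ('ham', 1), ('pizza', 3), ('tomato', 1)]
```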

MapReduce was originally developed by Google for its own needs. Google later published the approach, and Apache took it up as an open-source project. Google itself, meanwhile, gradually moved on to other solutions. An interesting twist: Google now has a project called Google Cloud Dataflow, which is positioned as the next step after Hadoop, a faster replacement for it.

A closer look shows that Google Cloud Dataflow runs pipelines written with Apache Beam, and Beam pipelines can also run on the well-documented Apache Spark framework, which lets us speak of roughly comparable execution speed. Apache Spark works well with HDFS, so it can be deployed on Hadoop servers.

Add to this the far larger body of documentation and ready-made solutions for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Engineers can also decide for themselves which code to run, Hadoop or Spark, depending on the task, their experience and their skills.

Cloud or on-premises servers

The shift to the cloud has even produced an interesting term: Hadoop-as-a-service. In this setting, administering the connected servers has become very important. Because, alas, for all its popularity, plain Hadoop is quite difficult to configure, and a lot has to be done by hand: configuring servers one by one, monitoring their performance, and carefully tuning many parameters. In short, it is not a job for everyone, and there is a high chance of misconfiguring or overlooking something.
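As an illustration of that manual work: Hadoop settings live in XML files such as `hdfs-site.xml`, which with plain Hadoop have to be kept consistent across nodes by hand. The sketch below renders such a file from a Python dict. `dfs.replication` and `dfs.blocksize` are real HDFS property names, but the values are only examples, and `render_hadoop_config` and `check_replication` are hypothetical helpers of the kind a management tool automates.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(props):
    """Render a dict of Hadoop properties in the <configuration>/<property>
    layout used by files such as hdfs-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def check_replication(props, n_datanodes):
    """Hypothetical sanity check: the replication factor should not exceed
    the number of DataNodes (3 is the HDFS default if unset)."""
    return int(props.get("dfs.replication", 3)) <= n_datanodes


if __name__ == "__main__":
    # Example values only; 536870912 bytes = 512 MB.
    hdfs_site = {"dfs.replication": 3, "dfs.blocksize": 536870912}
    print(render_hadoop_config(hdfs_site))
    print(check_replication(hdfs_site, n_datanodes=9))
```

Generating the file from one source of truth, rather than editing it on each server, is exactly the kind of chore a distribution's management layer takes over.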

That is why distributions that ship with convenient deployment and management tools out of the box have become so popular. One of the most popular distributions that supports Spark and makes everything easy is Cloudera. It comes in paid and free versions, and the free one offers all the core functionality without limiting the number of nodes.

During installation, Cloudera Manager connects to your servers over SSH. An interesting point: when installing, it is better to choose installation via so-called parcels, special packages that contain all the required components, already configured to work with one another. Essentially, this is an improved take on a package manager.
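Because Cloudera Manager reaches the hosts over SSH, it can help to confirm beforehand that every host accepts connections on the SSH port. The following is a hypothetical pre-flight helper, not part of Cloudera Manager: it only checks TCP reachability of port 22 (by default) and says nothing about credentials or sudo rights.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def preflight(hosts, port=22):
    """Map each host to whether its SSH port accepts connections."""
    return {host: ssh_port_open(host, port) for host in hosts}


if __name__ == "__main__":
    # Hypothetical host names; replace them with your own cluster nodes.
    for host, ok in preflight(["node1.example", "node2.example"]).items():
        print(f"{host}: {'reachable' if ok else 'NOT reachable'}")
```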

After installation you get a cluster management console where you can view cluster telemetry and installed services, add or remove resources, and edit the cluster configuration.

The result is a rocket cockpit in front of you, ready to carry you into the bright future of BigData. But before we say "let's go", let's take a look under the hood.

Hardware requirements

On its website, Cloudera lists a range of possible configurations. The general principles behind them are shown in the illustration:

MapReduce can spoil this optimistic picture. If you look again at the diagram from the previous section, it is clear that in almost every case a MapReduce job can hit a bottleneck when reading data from disk or over the network. The Cloudera blog makes the same point. Consequently, for any fast computation, including with Spark, which is often used for real-time calculations, I/O speed is critical. So when running Hadoop, it is essential that the cluster consist of balanced, fast machines, which, to put it mildly, is not always guaranteed in cloud infrastructure.
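A back-of-the-envelope calculation shows why balanced hardware matters: a parallel scan can go no faster than the slowest link in each node. The sketch below assumes I/O is the only bottleneck and caps each node's disk bandwidth by its network interface when data is read over the network. The function name and all example figures (9 nodes, 12 disks at 150 MB/s, a 10 Gbit/s NIC, 10 TB of data) are hypothetical.

```python
def scan_time_seconds(dataset_tb, nodes, disks_per_node, mb_per_s_per_disk,
                      nic_gbit_per_node=None):
    """Estimate the time for a full parallel scan of `dataset_tb` terabytes,
    assuming the job is purely I/O-bound and data is spread evenly."""
    per_node = disks_per_node * mb_per_s_per_disk  # MB/s from local disks
    if nic_gbit_per_node is not None:
        # Data read over the network cannot exceed the NIC: Gbit/s -> MB/s.
        per_node = min(per_node, nic_gbit_per_node * 1000 / 8)
    cluster_mb_per_s = per_node * nodes
    dataset_mb = dataset_tb * 1_000_000  # decimal TB -> MB
    return dataset_mb / cluster_mb_per_s


if __name__ == "__main__":
    local = scan_time_seconds(10, nodes=9, disks_per_node=12, mb_per_s_per_disk=150)
    remote = scan_time_seconds(10, 9, 12, 150, nic_gbit_per_node=10)
    print(f"disk-bound: {local:.0f} s, network-bound: {remote:.0f} s")
```

With those figures the disks alone would allow about 617 s, while the 10 Gbit/s cap stretches the same scan to about 889 s, so a fast CPU cannot compensate for a slow link.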

Balanced load distribution is achieved with OpenStack virtualization on servers with powerful multi-core CPUs. Data nodes get their own processor resources and their own disks. In our solution, Atos Codex Data Lake Engine, extensive virtualization pays off both in performance (the impact of the network infrastructure is minimized) and in TCO (unnecessary physical servers are eliminated).

With BullSequana S200 servers we get a very even load, free of some of the usual bottlenecks. The minimum configuration consists of 3 BullSequana S200 servers, each with two JBODs; additional S200 servers with four data nodes each can be connected as an option. Here is an example of the load in the TeraGen test:

Tests with different data volumes and replication factors show the same picture: the load is distributed evenly across the cluster nodes. Below is a graph of the disk access distribution in the performance tests.
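One way to put a number on "even distribution" is to compare each node's load with the mean. The helper below is a hypothetical sketch that computes the coefficient of variation and the peak-to-mean ratio; the article does not publish raw per-node values, so any inputs you try are your own measurements, and the sample figures here are made up.

```python
from statistics import mean, pstdev


def load_balance_stats(per_node_load):
    """Return (coefficient of variation, peak-to-mean ratio) for per-node
    load samples, e.g. disk reads per second on each data node.
    A perfectly even load gives (0.0, 1.0)."""
    avg = mean(per_node_load)
    if avg == 0:
        raise ValueError("mean load is zero")
    cv = pstdev(per_node_load) / avg
    peak_ratio = max(per_node_load) / avg
    return cv, peak_ratio


if __name__ == "__main__":
    even = [410, 395, 402, 398, 405, 400, 397, 403, 390]  # made-up samples
    skewed = [150, 160, 900, 140, 155, 150, 145, 150, 150]
    print(load_balance_stats(even))
    print(load_balance_stats(skewed))
```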

The measurements were taken on the minimum configuration of 3 BullSequana S200 servers. It comprises 9 data nodes and 3 master nodes, plus reserved virtual machines for deploying protection based on OpenStack virtualization. TeraSort result: with a 512 MB block size, a replication factor of three and encryption enabled, the run took 23.1 minutes.
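The block size and replication factor from this test translate directly into how much HDFS has to write. The sketch below counts blocks and replicas: the 512 MB block size, replication factor of three and 9 data nodes come from the test description, while the 1 TB dataset size is an assumption, since the article does not state it. Raw usage is computed as dataset size times replication because HDFS does not pad the last block of a file to full size. The function name `hdfs_footprint` is hypothetical.

```python
import math

MiB = 1024 ** 2
TiB = 1024 ** 4


def hdfs_footprint(dataset_bytes, block_size_bytes, replication, datanodes):
    """Return (logical blocks, block replicas, raw bytes stored,
    average replicas per DataNode assuming an even spread)."""
    blocks = math.ceil(dataset_bytes / block_size_bytes)
    replicas = blocks * replication
    raw_bytes = dataset_bytes * replication  # last block is not padded
    per_node = replicas / datanodes
    return blocks, replicas, raw_bytes, per_node


if __name__ == "__main__":
    # Assumed 1 TB dataset; 512 MB blocks, replication 3 and 9 nodes from the test.
    blocks, replicas, raw, per_node = hdfs_footprint(1 * TiB, 512 * MiB, 3, 9)
    print(blocks, replicas, raw / TiB, round(per_node, 1))
```

Under these assumptions each of the 9 data nodes holds roughly 683 block replicas, which is why an even spread of disk I/O across nodes matters so much for the runtime.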

How can the system be scaled out? Several types of extension are available for Data Lake Engine:

  • Data nodes: one for every 40 TB of usable space
  • Analytical nodes, with the option of installing a GPU
  • Other options depending on business needs (for example, if you need Kafka and the like)

Atos Codex Data Lake Engine includes both the servers themselves and preinstalled software: a licensed Cloudera stack, Hadoop itself, OpenStack with virtual machines based on the Red Hat Enterprise Linux kernel, and data replication and backup systems (including a backup node and Cloudera BDR, Backup and Disaster Recovery). Atos Codex Data Lake Engine became the first solution of its kind to be certified by Cloudera.

If you would like more details, we will be happy to answer your questions in the comments.

Source: www.habr.com
