The market for distributed computing and big data, according to
Why does an ordinary business need this kind of computing? Everything here is simple and complicated at the same time. Simple, because in most cases we perform fairly simple calculations on each unit of information. Complicated, because there are a great many such units. A great many. Therefore, it is necessary
One recent example: the pizzeria chain Dodo Pizza
Another example:
Choosing tools
The industry standard for this kind of computing is Hadoop. Why? Because Hadoop is an excellent, well-documented framework (Habr itself offers many detailed articles on the topic) that comes with a whole set of utilities and libraries. You can feed it huge sets of both structured and unstructured data, and the system itself will distribute them across the available computing capacity. Moreover, that capacity can be increased or switched off at any time: this is horizontal scalability in action.
In 2017, the influential consulting company Gartner
Hadoop rests on several pillars, the best known of which are MapReduce (a system for distributing data across servers for computation) and the HDFS file system. The latter is designed specifically to store information distributed across the nodes of a cluster: each fixed-size block can be placed on several nodes, and thanks to replication the system tolerates failures of individual nodes. Instead of a file table, a special server called the NameNode is used.
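To make the block-and-replica idea more concrete, here is a minimal sketch in Python. It is not how HDFS is implemented: real HDFS uses a rack-aware placement policy, while this toy version simply assigns replicas round-robin. The function names, node names and sizes are all hypothetical, and the returned dictionary only stands in for the kind of block-to-node mapping the NameNode keeps.

```python
import math

def place_blocks(file_size, block_size, nodes, replication=3):
    """Return {block_index: [node, ...]}, a toy stand-in for NameNode metadata."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    n_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(n_blocks):
        # Simplified round-robin: replicas of one block land on distinct nodes.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_after_failure(placement, failed_node):
    """The file stays readable if every block keeps at least one live replica."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )

# Hypothetical cluster of four data nodes and a 1000-unit file in 256-unit blocks.
nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_blocks(file_size=1000, block_size=256, nodes=nodes, replication=3)
survives = readable_after_failure(placement, "dn2")
```

With three replicas per block, losing any single node still leaves two copies of every block, which is the property the paragraph above describes.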
The illustration below shows how MapReduce works. In the first stage, the data is split according to a certain criterion; in the second, it is distributed according to the available computing capacity; and in the third, the computation itself takes place.
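The same three stages can be sketched in a few lines of plain Python. This is only a single-process illustration of the idea, using the conventional names map, shuffle and reduce and the classic word-count task; the function names and the sample input are assumptions, and nothing here uses the actual Hadoop API.

```python
from collections import defaultdict

def map_phase(records):
    """Emit a (key, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Group values by key, as a framework would before handing keys to reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine the values collected for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: two lines of text standing in for a large data set.
lines = ["pizza order pizza", "order delivered"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

In a real cluster each phase runs on many servers at once, and the shuffle step is where data moves across the network.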
MapReduce was originally created by Google for its own needs. Later MapReduce went open source, and Apache took over the project. Google, meanwhile, gradually migrated to other solutions. An interesting detail: Google currently has a project called Google Cloud Dataflow, positioned as the next step after Hadoop, as a replacement for it.
A closer look shows that Google Cloud Dataflow is based on a variant of Apache Beam, while Apache Beam includes the well-documented Apache Spark framework, which lets us speak of roughly the same execution speed. Apache Spark, in turn, works well on top of the HDFS file system, which allows it to be deployed on Hadoop servers.
Add to this the volume of documentation and ready-made solutions available for Hadoop and Spark compared with Google Cloud Dataflow, and the choice of tool becomes obvious. Moreover, engineers can decide for themselves which code to run, for Hadoop or for Spark, depending on the task, their experience and their qualifications.
Cloud or local server
The general shift to the cloud has even given rise to an interesting term: Hadoop-as-a-service. In such a scenario, administration of the connected servers becomes very important. Alas, despite its popularity, pure Hadoop is a rather difficult tool to configure, since a lot has to be done by hand: configuring servers individually, monitoring their performance, and carefully tuning many parameters. In general, it is a job for an enthusiast, and there is a good chance of breaking something or missing something.
Therefore, various distributions that come with convenient deployment and administration tools from the start have become very popular. One of the better-known distributions that supports Spark and makes everything simpler is Cloudera. It has both paid and free versions, and in the latter all the core functionality is available, with no limit on the number of nodes.
During setup, Cloudera Manager connects to your servers via SSH. An interesting point: when installing, it is better to specify that the installation be carried out using so-called parcels: special packages, each of which contains all the necessary components configured to work with one another. In essence, this is an improved version of a package manager.
After installation we get a cluster management console, where you can view cluster telemetry and the installed services, add or remove resources, and edit the cluster configuration.
As a result, the cabin of the rocket that will carry you into the bright future of BigData appears before you. But before we say "let's go," let's look under the hood.
Hardware requirements
On its website, Cloudera lists various possible configurations. The general principles on which they are built are shown in the illustration:
MapReduce can spoil this optimistic picture. If you look again at the diagram from the previous section, it becomes clear that in almost all cases a MapReduce job can hit a bottleneck when reading data from disk or from the network. This is also noted in the Cloudera blog. As a result, for any fast computation, including via Spark, which is often used for real-time computation, I/O speed is very important. Therefore, when using Hadoop, it is essential that the cluster consist of balanced and fast machines, which, to put it mildly, is not always guaranteed in cloud infrastructure.
Balanced load distribution is achieved by using OpenStack virtualization on servers with powerful multi-core CPUs. Data nodes are allocated their own processor resources and dedicated disks. In our solution, Atos Codex Data Lake Engine, a high degree of virtualization is achieved, so we gain both in performance (the impact of the network infrastructure is minimized) and in TCO (extra physical servers are eliminated).
When using BullSequana S200 servers, we get a very even load, free of some of these bottlenecks. The minimum configuration includes 3 BullSequana S200 servers, each with two JBODs; additional S200s with four data nodes each can optionally be connected. Here is an example of the load in the TeraGen test:
Tests with different data volumes and replication values show the same results in terms of load distribution among the cluster nodes. Below is a graph of the distribution of disk access across the performance tests.
The calculations were performed on the minimum configuration of 3 BullSequana S200 servers. It includes 9 data nodes and 3 master nodes, as well as reserved virtual machines in case protection based on OpenStack Virtualization is deployed. TeraSort test result: with a block size of 512 MB, a replication factor of three, and encryption, the run takes 23.1 minutes.
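To get a feel for what those test parameters mean in storage terms, the arithmetic can be sketched as follows. The block size, replication factor and number of data nodes come from the description above; the size of the data set is not given in the text, so the 1 TiB figure below is purely a hypothetical input, and the function name is ours.

```python
import math

BLOCK_SIZE_MB = 512      # from the test description
REPLICATION_FACTOR = 3   # from the test description
DATA_NODES = 9           # from the minimum configuration

def hdfs_footprint(dataset_mb, block_mb=BLOCK_SIZE_MB, replication=REPLICATION_FACTOR):
    """Return (logical blocks, stored block replicas, raw storage in MB)."""
    blocks = math.ceil(dataset_mb / block_mb)
    replicas = blocks * replication
    raw_mb = dataset_mb * replication
    return blocks, replicas, raw_mb

dataset_mb = 1024 * 1024  # hypothetical 1 TiB input, NOT a figure from the test
blocks, replicas, raw_mb = hdfs_footprint(dataset_mb)
avg_replicas_per_node = replicas / DATA_NODES  # average if spread evenly
```

Under that assumption the data set splits into 2048 blocks, stored as 6144 replicas occupying 3 TiB of raw disk space.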
How can the system be expanded? Several types of extensions are available for Data Lake Engine:
- Data nodes: for every 40 TB of usable space
- Analytical nodes that can be fitted with a GPU
- Other options depending on business needs (for example, if you need Kafka and the like)
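As a rough illustration of the first item in the list above, the number of data-node extensions for a target capacity can be estimated like this. We assume the rule means one additional data node per 40 TB of usable space; the function name and the capacity figures in the example are hypothetical.

```python
import math

TB_PER_DATA_NODE = 40  # from the list above; read as one data node per 40 TB usable

def extra_data_nodes(required_usable_tb, current_usable_tb):
    """Estimate how many data-node extensions cover the capacity shortfall."""
    shortfall = max(0, required_usable_tb - current_usable_tb)
    return math.ceil(shortfall / TB_PER_DATA_NODE)

# Hypothetical case: 360 TB usable today, 500 TB needed.
needed = extra_data_nodes(required_usable_tb=500, current_usable_tb=360)
```

Rounding up matters here: a shortfall of 140 TB does not fit into three extensions, so the estimate is four.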
Atos Codex Data Lake Engine includes both the servers themselves and pre-installed software: a licensed Cloudera kit; Hadoop itself; OpenStack with virtual machines based on the RedHat Enterprise Linux kernel; and data replication and backup systems (including the use of a backup node and Cloudera BDR, Backup and Disaster Recovery). Atos Codex Data Lake Engine became the first well-known solution to be certified
If you would like more details, we will be happy to answer your questions in the comments.
Source: www.habr.com