I analyzed data engineer job listings as they stood in January 2020 to understand which technical skills are most in demand. I then compared the results with the statistics for data scientist job listings, and some interesting differences emerged.
Without further ado, here are the top ten technologies mentioned most often in job postings:
Technology mentions in data engineer job listings, 2020
Data engineer responsibilities
Today, the work data engineers do is critical to organizations: these are the people responsible for storing information and delivering it in a form that other employees can work with. Data engineers build pipelines to stream or batch data from many sources. The pipelines then perform extract, transform, and load operations (in other words, ETL processes), which make the data more suitable for further use. After that, the data is handed to analysts and data scientists for deeper processing. Finally, the data ends its journey in dashboards, reports, and machine learning models.
I was looking for information that would let me draw conclusions about which technologies are currently most in demand in data engineering work.
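To make the extract, transform, and load steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the two inline CSV strings stand in for real upstream sources, the `payments` table and the function names are hypothetical, and an in-memory SQLite database plays the role of a warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for two upstream sources.
SOURCE_A = "user_id,amount\n1,10.50\n2,n/a\n"
SOURCE_B = "user_id,amount\n3,7.25\n1,4.00\n"


def extract(*sources):
    """Extract: yield raw rows (dicts of strings) from every CSV source."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: convert types and skip rows whose amount is not a number."""
    for row in rows:
        try:
            yield int(row["user_id"]), float(row["amount"])
        except ValueError:
            continue  # e.g. the "n/a" amount in SOURCE_A


def load(records, conn):
    """Load: write cleaned records into a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline():
    """Chain the three steps against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(SOURCE_A, SOURCE_B)), conn)
    return conn


conn = run_pipeline()
# A downstream consumer (analyst, dashboard) would run queries like this one.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

Real pipelines usually add scheduling, retries, and monitoring on top of this shape, but the three stages stay recognizable.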
Methods
I collected data from three job search sites.
For each keyword, I calculated the percentage of hits out of the total number of listings on each site separately, and then averaged the result across the three sources.
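The calculation described above can be sketched as follows. The site names and all counts are hypothetical placeholders, not the figures collected for this article; the point is only the order of operations: a percentage per site first, then a plain average of those percentages.

```python
from statistics import mean

# Hypothetical counts: site -> (total listings, {keyword: listings mentioning it}).
SITE_COUNTS = {
    "site_a": (4000, {"SQL": 2800, "Python": 2600}),
    "site_b": (5000, {"SQL": 3500, "Python": 3500}),
    "site_c": (2000, {"SQL": 1300, "Python": 1400}),
}


def keyword_share(hits, total):
    """Percentage of a site's listings that mention the keyword."""
    return 100.0 * hits / total


def average_share(site_counts, keyword):
    """Unweighted mean of the per-site percentages for one keyword."""
    shares = [
        keyword_share(hits.get(keyword, 0), total)
        for total, hits in site_counts.values()
    ]
    return mean(shares)


for kw in ("SQL", "Python"):
    print(kw, round(average_share(SITE_COUNTS, kw), 1))
```

Averaging the percentages gives each site equal weight regardless of its size. Pooling all listings and dividing once would instead weight larger sites more heavily, so the two approaches can give slightly different numbers.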
Results
Below are the thirty data engineering technology terms with the highest scores across all three job sites.
And here are the same numbers, presented as a table:
Let's go through them in order.
Review of the results
Both SQL and Python appear in more than two thirds of the job listings reviewed. These are the two technologies that make sense to learn first.
Spark is mentioned in roughly half of the listings.
AWS appears in about 45% of job postings. It is the cloud computing platform built by Amazon, and it has the largest market share of all cloud platforms.
Next come Java and Hadoop, at slightly over 40% each.
It is like taking a ride in a time machine
Then we see Hive, Scala, Kafka, and NoSQL: each of these technologies is mentioned in about a quarter of the listings. Apache Hive is data warehouse software that "facilitates reading, writing, and managing large datasets residing in distributed storage using SQL."
Comparison with terms in data science job listings
Here are the thirty most common technology terms among employers hiring data scientists. I obtained this list in the same way as described above for data engineering.
Technology mentions in data scientist job listings, 2020
In terms of total volume, there were 28% more data engineer listings than data scientist listings (12,013 in the larger set). Let's see which technologies are less common in data scientist listings than in data engineer listings.
More popular in data engineering
The chart below shows the keywords with an average difference greater than 10% or less than -10%.
Largest differences in keyword frequency between data engineer and data scientist listings
AWS shows the most significant increase: in data engineering listings it appears 25 percentage points more often than in data science listings (roughly 45% and 20% of all listings, respectively). The difference is noticeable!
Here is the same data in a slightly different presentation: in the chart, the results for each keyword in data engineer and data scientist listings are placed side by side.
Largest differences in keyword frequency between data engineer and data scientist listings
The next biggest jump I noticed was in Spark: a data engineer often has to work with big data.
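The filtering behind this comparison can be sketched in a few lines. Only the AWS figures (about 45% and 20%) come from this article; `ExampleTechA` and `ExampleTechB` are hypothetical placeholders added so that the threshold has something to exclude and a negative case to show.

```python
# Average share of listings, in percent. Only the AWS values are from the
# article; the ExampleTech entries are hypothetical placeholders.
DATA_ENGINEER = {"AWS": 45.0, "ExampleTechA": 30.0, "ExampleTechB": 12.0}
DATA_SCIENTIST = {"AWS": 20.0, "ExampleTechA": 27.0, "ExampleTechB": 40.0}


def large_differences(engineer, scientist, threshold=10.0):
    """Return keywords whose share differs by more than `threshold`
    percentage points, sorted from largest increase to largest decrease."""
    diffs = {}
    for keyword in engineer.keys() & scientist.keys():
        diff = engineer[keyword] - scientist[keyword]
        if abs(diff) > threshold:
            diffs[keyword] = diff
    return dict(sorted(diffs.items(), key=lambda item: item[1], reverse=True))


result = large_differences(DATA_ENGINEER, DATA_SCIENTIST)
print(result)
```

A positive value means the keyword is more common in data engineer listings, and a negative value means it is more common in data scientist listings. `ExampleTechA` is dropped because its 3-point gap falls inside the threshold.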
Less popular in data engineering
Now let's see which technologies are less popular in data engineer listings.
The sharpest drop compared with the data science field occurred in
In demand in both data engineering and data science
It is worth noting that eight of the top ten positions are the same in both sets. SQL, Python, Spark, AWS, Java, Hadoop, Hive, and Scala made the top ten in both data engineering and data science. In the chart below you can see the fifteen technologies most popular among employers hiring data engineers, with their share in data scientist listings shown next to them.
Recommendations
If you want to get into data engineering, I would advise mastering the following technologies. I list them in rough order of priority.
Learn SQL. I lean toward PostgreSQL because it is open source, very popular in the community, and still growing. You can learn the language from my book Memorable SQL; a preview version is available.
Master Python, even if not at the most advanced level. My book Memorable Python is designed specifically for beginners. It is available for purchase.
Once you are comfortable with Python, move on to pandas, the Python library used for cleaning and processing data. If you aim to work at a company that requires Python skills (and that is most of them), you can be sure that knowledge of pandas will be assumed by default. I am currently finishing an introductory guide to working with pandas.
Master AWS. If you want to be a data engineer, you cannot do without a cloud platform in your toolkit, and AWS is the most popular of them. Courses helped me a great deal here.
If you have already worked through this whole list and want to keep growing in the eyes of employers as a data engineer, I suggest adding Apache Spark for working with big data. Although my research on data science listings showed a decline in interest, among data engineers it still appears in almost every second listing.
In conclusion
I hope you found this overview of the most in-demand technologies for data engineers useful. If you are wondering how things stand with analyst jobs, I have written about that separately.
Source: www.habr.com