The most in-demand skills for data engineers

According to 2019 statistics, data engineer is currently the profession for which demand is growing faster than for any other. A data engineer plays a critical role in an organization: building and maintaining the pipelines and databases used to process, transform, and store data. Which skills do people in this profession need first? Does the list differ from what is required of data scientists? You will find out all of this in my article.

I analyzed data engineer job listings as they stood in January 2020 to understand which technical skills are most popular. I then compared the results with the statistics for data scientist listings, and some interesting differences emerged.

Without further ado, here are the top ten technologies mentioned most often in job listings:

[Figure: Technologies mentioned in data engineer job listings, 2020]

Let's dig in.

What a data engineer is responsible for

Today the work data engineers do is extremely important to organizations: these are the people responsible for storing information and delivering it in a form that other employees can work with. Data engineers build pipelines to stream or batch data from many sources. The pipelines then perform extract, transform, and load operations (in other words, ETL processes), making the data better suited to further use. After that, the data is handed to analysts and data scientists for deeper processing. Finally, the data ends its journey in dashboards, reports, and machine learning models.
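To make the extract, transform, and load steps concrete, here is a minimal sketch using only the Python standard library. The CSV content, the `payments` table, and the function names are hypothetical and chosen purely for illustration; a real pipeline would read from files, APIs, or message queues and write to a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, standing in for a file or API response.
RAW_CSV = """user_id,amount,currency
1, 10.50 ,usd
2,,usd
3,7.25,USD
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, fix types and casing."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["user_id"]), float(amount), row["currency"].strip().upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write cleaned rows into a table that others can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions makes each one easy to test and replace on its own, which is roughly how larger pipelines are organized as well.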

I was looking for information that would let me draw conclusions about which technologies are most in demand in data engineering right now.

Methods

I collected information from three job search sites, SimplyHired, Indeed, and Monster, and looked at which keywords appear together with "data engineer" in the text of listings aimed at US residents. For this task I used two Python libraries, Requests and Beautiful Soup. The keywords included both those from my earlier analysis of data scientist listings and those I picked by hand while reading data engineer job postings. LinkedIn was not among the sources, because I was banned there after my last attempt at collecting data.
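The author's scraping code is not shown, so the following is only a sketch of the keyword-matching step. It assumes the listing HTML has already been downloaded, and it swaps in the standard library's `html.parser` for Beautiful Soup so the example stays self-contained. The sample HTML, the keyword list, and the function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def listing_text(html):
    """Return the visible text of one job listing."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def keywords_in(html, keywords):
    """Return the subset of keywords that occur as whole terms in the listing."""
    text = listing_text(html).lower()
    found = set()
    for kw in keywords:
        # Lookarounds keep "Java" from matching inside "JavaScript"
        # and "SQL" from matching only as part of "NoSQL".
        pattern = r"(?<![a-z0-9])" + re.escape(kw.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.add(kw)
    return found


# Hypothetical listing fragment.
SAMPLE = (
    "<div><h2>Data Engineer</h2>"
    "<p>Experience with <b>Python</b>, SQL and NoSQL stores.</p></div>"
)
KEYWORDS = ["Python", "SQL", "NoSQL", "Java", "Spark"]
print(sorted(keywords_in(SAMPLE, KEYWORDS)))
```

Very short keywords need extra care with this kind of matching, since a single letter can appear in a listing for unrelated reasons.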

For each keyword, I calculated the percentage of hits out of the total number of listings on each site separately, and then took the average across the three sources.
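The calculation described above can be sketched as follows. The counts are hypothetical and exist only to show the arithmetic; the function names are also made up for this example.

```python
from statistics import mean


def hit_percentages(hits_by_site, totals_by_site):
    """Percentage of listings mentioning a keyword, computed per site."""
    return {
        site: 100.0 * hits_by_site[site] / totals_by_site[site]
        for site in totals_by_site
    }


def average_share(hits_by_site, totals_by_site):
    """Unweighted mean of the per-site percentages."""
    return mean(list(hit_percentages(hits_by_site, totals_by_site).values()))


# Hypothetical counts for one keyword, not the author's data.
totals = {"SimplyHired": 200, "Indeed": 400, "Monster": 100}
hits = {"SimplyHired": 100, "Indeed": 300, "Monster": 25}

print(hit_percentages(hits, totals))
print(average_share(hits, totals))
```

Averaging the per-site percentages gives each site equal weight regardless of how many listings it has. Pooling all listings first would give a different number (425 of 700, about 60.7%, for the counts above, against an average of 50%).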

Results

Below are the thirty technical terms from data engineering with the highest scores across all three job sites.

And here are the same numbers, presented as a table:

Let's go through them in order.

Discussion of the results

Both SQL and Python appear in more than two thirds of the listings reviewed. These are the two technologies it makes sense to learn first. Python is a very popular programming language used for working with data, building websites, and writing scripts. SQL stands for Structured Query Language; it comprises a standard implemented by a family of languages and is used to retrieve data from relational databases. It has been around for a long time and has proven very resilient.
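As a small illustration of how the two fit together, the sketch below runs a SQL query from Python using the standard library's `sqlite3` module. The `listings` table and its rows are hypothetical; SQLite is used here only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, site TEXT, mentions_sql INTEGER)"
)
# Hypothetical rows: 1 means the listing mentions SQL, 0 means it does not.
conn.executemany(
    "INSERT INTO listings (site, mentions_sql) VALUES (?, ?)",
    [("A", 1), ("A", 0), ("B", 1), ("B", 1)],
)

# Share of listings per site that mention SQL, computed by the database.
rows = conn.execute(
    """
    SELECT site, 100.0 * SUM(mentions_sql) / COUNT(*) AS pct
    FROM listings
    GROUP BY site
    ORDER BY site
    """
).fetchall()
print(rows)
```

The same query text would work with minor changes against other relational databases, which is a large part of why SQL has lasted so long.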

Spark is mentioned in roughly half of the listings. Apache Spark is "a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing." It is especially popular among people who work with large datasets.

AWS appears in about 45% of the listings. It is the cloud computing platform built by Amazon, and it has the largest market share of any cloud platform.

Next come Java and Hadoop, at a little over 40% each. Java is a widely used, battle-tested language, which the 2019 Stack Overflow Developer Survey ranked tenth among the languages developers dread most. Python, by contrast, was the second most loved language. The Java language is run by Oracle, and everything you need to know about that can be gathered from this screenshot of the official page from January 2020.

[Figure: screenshot of the official Java page, January 2020. Caption: Like riding in a time machine]

Apache Hadoop uses the MapReduce programming model with clusters of servers to process big data. This model is now increasingly being abandoned.
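For readers who have not met MapReduce, here is a toy, single-process sketch of the idea in plain Python: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. This is not Hadoop's API; the function names are made up, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["spark hadoop spark", "hadoop hive"])
print(counts)
```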

Next we see Hive, Scala, Kafka, and NoSQL: each of these technologies is mentioned in about a quarter of the listings. Apache Hive is data warehouse software that "facilitates reading, writing, and managing large datasets residing in distributed storage using SQL." Scala is a programming language that is heavily used for working with big data. In particular, Spark was written in Scala. In the ranking of dreaded languages mentioned earlier, Scala sits in eleventh place. Apache Kafka is a distributed platform for processing streams of messages. It is very popular as a way to stream data.

NoSQL databases position themselves in contrast to SQL. They differ in being non-relational, unstructured, and horizontally scalable. NoSQL has gained a certain popularity, but the enthusiasm for this approach, which went as far as predictions that it would replace SQL as the dominant storage paradigm, seems to have faded.

Comparison with terms in data scientist listings

Here are the thirty technology terms most common among employers hiring data scientists. I obtained this list in the same way as described above for data engineering.

[Figure: Technologies mentioned in data scientist job listings, 2020]

In terms of total volume, there were 28% more data scientist listings than the data engineer listings considered earlier (12,013 versus a figure missing from this text). Let's see which technologies are less common in data scientist listings than in data engineer listings.

More popular in data engineering

The graph below shows the keywords whose average difference is greater than 10% or less than -10%.
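The selection rule can be sketched like this. The AWS and R figures are the approximate percentages quoted in this article; `example_tool` is a hypothetical keyword added only to show an entry that falls inside the band and is therefore dropped. The function name is also made up.

```python
def notable_differences(engineer_pct, scientist_pct, threshold=10.0):
    """Keep keywords whose share differs by more than `threshold` points.

    The difference is engineer minus scientist, so positive values mean
    the keyword is more common in data engineer listings.
    """
    diffs = {}
    for kw in engineer_pct.keys() & scientist_pct.keys():
        diff = engineer_pct[kw] - scientist_pct[kw]
        if diff > threshold or diff < -threshold:
            diffs[kw] = diff
    # Largest positive difference first, largest negative last.
    return dict(sorted(diffs.items(), key=lambda item: item[1], reverse=True))


# AWS and R: approximate values from the article. example_tool: hypothetical.
engineer = {"AWS": 45.0, "R": 17.0, "example_tool": 30.0}
scientist = {"AWS": 20.0, "R": 56.0, "example_tool": 28.0}

result = notable_differences(engineer, scientist)
print(result)
```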

[Figure: The largest differences in keyword frequency between data engineer and data scientist listings]

AWS shows the largest increase: it appears 25 percentage points more often in data engineering than in data science (about 45% and 20% of all listings, respectively). A noticeable difference!

Here is the same data in a slightly different presentation: in this graph, the results for the same keyword in data engineer and data scientist listings are placed side by side.

[Figure: The largest differences in keyword frequency between data engineer and data scientist listings]

The next biggest jump I noticed was for Spark: a data engineer often has to work with big data. Kafka was also about 20 percentage points higher, which is almost four times its share in data scientist listings. Moving data is one of a data engineer's key responsibilities. Finally, mentions of Java, NoSQL, Redshift, SQL, and Hadoop were about 15 percentage points higher in data engineering.

Less popular in data engineering

Now let's see which technologies are less popular in data engineer listings.

The sharpest drop compared with data science was for R: there it appeared in about 56% of listings, here in only 17%. Impressive. R is a programming language favored by scientists and statisticians, and it is the eighth most dreaded language in the world.

SAS also turns up considerably less often in data engineer listings: the difference is 14%. SAS is a proprietary language built for working with statistics and data. An interesting point: judging by the results of my research on data scientist job openings, it has lost a lot of ground recently, more than any other technology.

In demand in both data engineering and data science

It is worth noting that eight of the top ten positions are the same in both sets. SQL, Python, Spark, AWS, Java, Hadoop, Hive, and Scala made the top ten in both data engineering and data science. In the graph below you can see the fifteen technologies most popular among employers hiring data engineers, and next to them their share in data scientist listings.

Recommendations

If you want to get into data engineering, I would advise you to master the following technologies. I list them roughly in order of priority.

Learn SQL. I lean toward PostgreSQL because it is open source, very popular in the community, and in a growth phase. You can learn to use the language from the book My Memorable SQL; a trial version is available here.

Master Python, even if not at a very advanced level. My Memorable Python is designed specifically for beginners. You can buy it on Amazon as an electronic or physical copy, whichever you prefer, or download it in pdf or epub format from that website.

Once you are comfortable with Python, move on to pandas, a Python library used for cleaning and processing data. If you are aiming to work at a company that requires Python skills (and that is most of them), you can be sure that knowledge of pandas will be assumed by default. I am currently finishing an introductory guide to working with pandas; you can sign up so you don't miss the release.

Master AWS. If you want to become a data engineer, you cannot do without a cloud platform in your toolkit, and AWS is the most popular of them. The Linux Academy courses helped me a lot when I was studying data engineering on Google Cloud; I expect they have good material on AWS as well.

If you have already worked through this whole list and want to keep growing in employers' eyes as a data engineer, I suggest adding Apache Spark for working with big data. Although my research on data scientist listings showed declining interest in it, among data engineers it still appears in nearly every second listing.

In conclusion

I hope you found this overview of the most in-demand data engineering technologies useful. If you are wondering how things stand with analyst jobs, read my other article. Happy engineering!

Source: www.habr.com
