The most in-demand skills for data engineers

According to 2019 statistics, data engineer is the job title whose demand is growing faster than any other. A data engineer plays a vital role in an organization: building and maintaining the pipelines and databases used to process, transform, and store data. What skills do employers want from candidates for this role? Is the list different from the one for data scientists? You'll find out about all of this in my article.

I analyzed data engineer job listings as of January 2020 to find out which technology skills are most in demand. I then compared the results with the figures for data scientist listings, and some interesting differences emerged.

Without further ado, here are the ten technologies mentioned most often in job postings:

Mentions of technologies in data engineer job listings, 2020

Let's take a closer look.

The data engineer's role

Today, the work data engineers do is critical for organizations: they are responsible for storing data and getting it into a shape that other employees can work with. Data engineers build pipelines for streaming or batch data from many sources. The pipelines then perform extract, transform, and load operations (in other words, ETL processes), making the data fit for further use. After that, the data goes to analysts and data scientists for deeper processing. Finally, the data ends its journey in dashboards, reports, and machine learning models.
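To make the extract, transform, load sequence concrete, here is a minimal sketch in Python using only the standard library. The CSV content, the `orders` table, and its column names are hypothetical; a real pipeline would read from actual source systems and would usually run under a scheduler or orchestration tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (input to the extract step).
RAW_CSV = """order_id,customer,amount_usd
1, Alice ,19.99
2,bob,
3,Carol,5.50
"""


def extract(text: str) -> list[dict]:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: trim and normalize names, drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount_usd"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].strip().title(), float(amount)))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, amount_usd FROM orders ORDER BY order_id").fetchall())
```

Keeping the three steps as separate functions makes each one testable on its own, which is the main reason pipelines are usually structured this way.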

I wanted information that would let me draw conclusions about which technologies are currently most in demand for data engineers.

Method

I collected data from three job search sites (SimplyHired, Indeed, and Monster) and checked which keywords appeared together with "data engineer" in job listings aimed at US residents. For this I used two Python libraries: Requests and Beautiful Soup. My keywords included both those from my earlier analysis of data scientist listings and ones I picked by hand while reading data engineer job descriptions. LinkedIn was not among the sources, since I was banned there after my last attempt to scrape data.

For each keyword, I calculated the percentage of listings that mentioned it on each site separately, and then averaged those percentages across the three sources.
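The scoring rule fits in a few lines. In this sketch the hit counts and listing totals are made-up placeholders, not the study's data. Averaging the per-site percentages, rather than pooling all listings, gives each site equal weight no matter how many listings it returned.

```python
def keyword_score(hits_by_site: dict[str, int], totals_by_site: dict[str, int]) -> float:
    """Average, across sites, of the share (in %) of listings that mention a keyword.

    Each site's percentage is computed separately and the per-site
    percentages are then averaged, so every site carries equal weight.
    """
    percentages = [100 * hits_by_site[site] / totals_by_site[site] for site in totals_by_site]
    return sum(percentages) / len(percentages)


# Hypothetical counts for one keyword on three sites.
totals = {"SimplyHired": 4000, "Indeed": 5000, "Monster": 1000}
hits = {"SimplyHired": 2000, "Indeed": 3000, "Monster": 400}
print(keyword_score(hits, totals))  # per-site shares 50%, 60%, 40% -> 50.0
```

Pooling the same hypothetical 10,000 listings instead would give 54%, because the large Indeed sample would dominate.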

Results

Below are the thirty data engineering technology keywords with the highest average scores across all three job sites.


And here are the same numbers in table form:


Let's go through them in order.

Discussion of the results

Both SQL and Python appear in more than two thirds of the listings analyzed, which makes them the two technologies most worth learning first. Python is a very popular programming language used for working with data, building websites, and writing scripts. SQL stands for Structured Query Language; it is a standard implemented by a family of languages and is used to retrieve data from relational databases. It has been around for a long time and has proven remarkably durable.
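Because the two skills usually go together, here is a minimal sketch of Python sending a SQL query to a relational database. It uses the standard library's `sqlite3` module so that it runs without a database server; the `jobs` table and its rows are hypothetical. A similar query would work, with minor driver differences, against PostgreSQL or another SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (title TEXT, site TEXT, mentions_sql INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("Data Engineer", "Indeed", 1),
        ("Data Engineer", "Indeed", 0),
        ("Data Engineer", "Monster", 1),
        ("Data Scientist", "Monster", 1),
    ],
)

# SQL does the filtering and aggregation; Python passes parameters and reads the rows.
query = """
    SELECT site, AVG(mentions_sql) * 100 AS pct_sql
    FROM jobs
    WHERE title = ?
    GROUP BY site
    ORDER BY site
"""
for site, pct in conn.execute(query, ("Data Engineer",)):
    print(site, pct)
```

Passing the job title as a `?` parameter instead of formatting it into the string lets the driver handle quoting and protects against SQL injection.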

Spark is mentioned in about half of the listings. Apache Spark is a "unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing." It is especially popular among people who work with very large datasets.

AWS appears in about 45% of job postings. It is the cloud computing platform developed by Amazon and has the largest market share of all cloud platforms.

Next come Java and Hadoop, each at just over 40%. Java is a widely used, battle-tested language; in the 2019 Stack Overflow Developer Survey it ranked tenth among the languages developers dread most. Python, by contrast, was the second most loved language. Java is stewarded by Oracle, and everything you need to know about it can be gleaned from this screenshot of its official page from January 2020.

Like taking a ride in a time machine

Apache Hadoop uses the MapReduce programming model together with clusters of servers to process big data. This model is increasingly falling out of favor.
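To show what the MapReduce model means, here is a single-process sketch of its three phases (map, shuffle, reduce) on the classic word-count example. It only illustrates the programming model in plain Python; Hadoop itself runs these phases in parallel across a cluster through its own Java API, which is not shown here.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine all values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["Spark and Hadoop", "Hadoop uses MapReduce", "spark"]
# On a cluster, each split could be mapped on a different node.
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(sorted(counts.items()))
```

Because map calls are independent and each reduce call sees only one key's values, both phases parallelize naturally; the shuffle is where data moves between machines and usually dominates the cost.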

Next we see Hive, Scala, Kafka, and NoSQL, each mentioned in about a quarter of the listings. Apache Hive is data warehouse software that "facilitates reading, writing, and managing large datasets residing in distributed storage using SQL." Scala is a programming language widely used for big data work; notably, Spark is written in Scala. In the dreaded-languages ranking mentioned above, Scala is in eleventh place. Apache Kafka is a distributed platform for processing streaming messages. It is very popular as a means of moving data.
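Kafka's central abstraction is a topic split into partitions, each an append-only log whose records are identified by offsets; records with the same key go to the same partition, and each consumer keeps track of how far it has read. The toy in-memory class below illustrates only those ideas. It is not Kafka's client API, and the `ToyTopic` class and its methods are hypothetical; real Kafka uses its own partitioning hash, replicates partitions across brokers, and persists them to disk.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a partitioned, append-only log (not the Kafka API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset); equal keys share a partition."""
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition: int, offset: int, max_records: int = 10):
        """Read records starting at offset and return them with the next offset to read."""
        records = self.partitions[partition][offset : offset + max_records]
        return records, offset + len(records)


topic = ToyTopic(num_partitions=3)
for event in ["click", "view", "purchase"]:
    p, off = topic.produce("user-42", event)

records, next_offset = topic.consume(p, 0)
print(records, next_offset)
```

Routing by key preserves the order of one user's events within a partition, while different keys can spread across partitions and be consumed in parallel.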

NoSQL databases define themselves in contrast to SQL ones: they are non-relational, unstructured, and horizontally scalable. NoSQL gained popularity, but the hype around the approach, which at one point went as far as predictions that it would replace SQL as the dominant storage paradigm, seems to have died down.

Comparison with keywords in data scientist listings

Here are the thirty most common technology keywords in data scientist job listings. I obtained this list the same way as described above for data engineers.

Mentions of technologies in data scientist job listings, 2020

In terms of total numbers, there were 28% more data engineer listings than in my earlier analysis of data scientist listings. Let's see which technologies appear more often in data engineer listings than in data scientist ones, and vice versa.

More popular in data engineering

The chart below shows the keywords whose average scores differ by more than 10 percentage points in either direction (above +10 or below -10).

Largest differences in keyword frequency between data engineer and data scientist listings

AWS shows the most notable increase: it appears in data engineer listings 25 percentage points more often than in data scientist listings (in roughly 45% and 20% of all listings, respectively). That is a striking difference!
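Because the scores are themselves percentages, a gap between them is most naturally read in percentage points, which is how the ±10 cut-off works. The sketch below takes the approximate AWS shares quoted here and the R shares discussed further on, and reports both the point difference and the ratio; the threshold filter mirrors the chart's cut-off.

```python
# Approximate shares (% of listings) quoted in the article:
# (data engineer listings, data scientist listings).
shares = {"AWS": (45, 20), "R": (17, 56)}

THRESHOLD_POINTS = 10  # the chart's cut-off: gaps beyond +/-10 percentage points


def gap(engineer: float, scientist: float) -> tuple[float, float]:
    """Return (difference in percentage points, ratio of engineer share to scientist share)."""
    return engineer - scientist, engineer / scientist


for keyword, (engineer, scientist) in shares.items():
    points, ratio = gap(engineer, scientist)
    if abs(points) > THRESHOLD_POINTS:
        print(f"{keyword}: {points:+} points, {ratio:.2f}x the data scientist rate")
```

A 25-point gap on a base of 20% means AWS shows up about 2.25 times as often in data engineer listings; the point difference and the ratio answer different questions, so it helps to say which one is meant.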

Here is the same data presented slightly differently: in this chart, each keyword's scores for data engineer and data scientist listings are shown side by side.

Largest differences in keyword frequency between data engineer and data scientist listings

The next biggest jump I noticed was for Spark, since data engineers often have to work with big data. Kafka also rose by about 20 percentage points, which is almost four times its share in data scientist listings. Moving data is one of a data engineer's key tasks. Finally, Java, NoSQL, Redshift, SQL, and Hadoop were each mentioned about 15 percentage points more often in data engineer listings.

Less popular in data engineering

Now let's see which technologies are much less popular in data engineer listings.

The biggest drop relative to data science was for R: it appeared in about 56% of data scientist listings, but in only 17% of data engineer listings. Surprising. R is a programming language popular with scientists and statisticians, and it ranks eighth among the most dreaded languages.

SAS also appears much less often in data engineer listings, with a difference of 14 percentage points. SAS is a proprietary language designed for statistics and data work. An interesting point: judging by my research on data scientist listings, SAS has recently lost popularity faster than any other skill.

In demand in both data engineering and data science

It's worth noting that eight of the top ten technologies are the same in both sets. The chart below shows the fifteen technologies most popular with employers of data engineers, with each one's score in data scientist listings shown alongside.


Recommendations

If you want to get into data engineering, I recommend mastering the following technologies, listed in the order I suggest learning them.

Learn SQL. I lean toward PostgreSQL because it is open source, popular in the community, and growing. You can learn the language from my book Memorable SQL; a pilot version is available here.

Master Python, even if not at an advanced level. My book Memorable Python is designed specifically for beginners. You can buy it on Amazon as an e-book or in print, whichever you prefer, or download it in PDF or EPUB format from this site.

Once you're comfortable with Python, move on to pandas, a Python library for cleaning and manipulating data. If you're aiming for a company that expects you to write Python (and that's most of them), you can be sure knowledge of pandas will be assumed by default. I'm currently finishing a beginner's guide to pandas; you can subscribe so you don't miss the release.
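For a taste of what cleaning and manipulating data with pandas looks like, here is a minimal sketch. The DataFrame, its column names, and its values are hypothetical, chosen to show a few common problems in scraped data.

```python
import pandas as pd

# Hypothetical raw scrape results with typical problems: stray whitespace,
# inconsistent capitalization, a duplicate row, and a missing keyword.
raw = pd.DataFrame(
    {
        "site": ["Indeed", "indeed ", "Monster", "Monster", "SimplyHired"],
        "keyword": ["SQL", "SQL", "Python", "Python", None],
        "hits": [120, 120, 80, 80, 50],
    }
)

clean = (
    raw.assign(site=raw["site"].str.strip().str.lower())  # normalize site names
    .dropna(subset=["keyword"])  # drop rows without a keyword
    .drop_duplicates()  # remove rows that are now exact duplicates
    .reset_index(drop=True)
)

# Total hits per keyword.
print(clean.groupby("keyword")["hits"].sum())
```

Normalizing the site names before calling `drop_duplicates` is what lets the two Indeed rows collapse into one; in the opposite order the duplicate would survive.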

Master AWS. If you want to be a data engineer, you can't do without a cloud platform in your toolkit, and AWS is the most popular one. Linux Academy courses helped me a lot when I was studying data engineering on Google Cloud, and I expect they have good material on AWS as well.

If you've already worked through this whole list and want to look even stronger to employers as a data engineer, I recommend adding Apache Spark for big data work. Although my research on data scientist listings showed declining demand for it, it still appears in about every other data engineer listing.

Wrapping up

I hope you found this overview of the most in-demand data engineering technologies useful. If you're curious how analyst jobs are trending, read my other article. Happy engineering!

Source: www.habr.com
