The professions of Data Scientist and Data Engineer are often confused. Every company has its own way of working with data, its own goals for analysis, and its own ideas about which specialist should handle which part of the work, so each one sets its own requirements.
We look at how these specialists differ, which business problems they solve, what skills they have, and how much they earn. The material turned out to be large, so we split it into two articles.
In the first article, Elena Gerasimova, a department head at Netology, explains the difference between a Data Scientist and a Data Engineer and which tools they work with.
How the roles of engineers and scientists differ
A data engineer is a specialist who, on the one hand, develops, tests, and maintains data infrastructure: databases, storage, and large-scale processing systems. On the other hand, the engineer is the one who cleans and "combs" data for use by analysts and data scientists, that is, builds data processing pipelines.
A Data Scientist builds and trains predictive (and other) models using machine learning algorithms and neural networks, helping businesses find hidden patterns, forecast future developments, and optimize key business processes.
The main difference between a Data Scientist and a Data Engineer is that they usually have different goals. Both work to make sure data is accessible and of high quality. But a Data Scientist looks for answers to questions and tests hypotheses inside a data ecosystem (for example, Hadoop), while a Data Engineer builds the pipeline that serves a machine learning algorithm written by the data scientist, running on a Spark cluster inside that same ecosystem.
A data engineer brings value to the business by working in a team. The job is to act as an important link between different participants, from developers to business consumers of reporting, and to raise the productivity of analysts, from marketing and product to BI.
A Data Scientist, by contrast, takes an active part in the company's strategy and in extracting insights, making decisions, implementing automation algorithms, modeling, and generating value from data.

Working with data follows the GIGO principle (garbage in - garbage out): if analysts and data scientists work with unprepared and potentially incorrect data, the results will be wrong even with the most sophisticated analysis algorithms.
Data engineers solve this problem by building pipelines that process, clean, and transform data, so that the data scientist can work with high-quality data.
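To make the idea of a cleaning pipeline concrete, here is a minimal sketch in plain Python. The field names (`user_id`, `amount`) and the individual steps are hypothetical and chosen only for illustration; a real pipeline would depend on the company's data and tooling.

```python
from typing import Callable, Iterable, List, Optional

Record = dict
# A step returns a cleaned record, or None to drop the record entirely.
Step = Callable[[Record], Optional[Record]]


def strip_strings(record: Record) -> Optional[Record]:
    """Remove surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_missing_user(record: Record) -> Optional[Record]:
    """Drop records that have no (hypothetical) user_id."""
    return record if record.get("user_id") else None


def parse_amount(record: Record) -> Optional[Record]:
    """Convert the (hypothetical) amount field to float; drop unparseable rows."""
    try:
        return {**record, "amount": float(record["amount"])}
    except (KeyError, TypeError, ValueError):
        return None


def run_pipeline(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Apply the steps in order; keep only records that survive all of them."""
    cleaned = []
    for record in records:
        for step in steps:
            record = step(record)
            if record is None:
                break  # garbage in: stop processing this record
        else:
            cleaned.append(record)
    return cleaned


raw = [
    {"user_id": " u1 ", "amount": "19.90"},
    {"user_id": "", "amount": "5"},
    {"user_id": "u3", "amount": "n/a"},
]
clean = run_pipeline(raw, [strip_strings, drop_missing_user, parse_amount])
print(clean)
```

Only the first record survives: the second has no user and the third has an amount that cannot be parsed, so neither reaches the analysts.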
There are many data tools on the market that cover every stage, from the moment data appears to its output on a boardroom dashboard. It is important that the decision to use them is made by the engineer, not because a tool is fashionable, but because it will genuinely help the other participants do their work.
Roughly speaking: if a company needs to connect BI and ETL, that is, loading data and refreshing reports, this is the typical legacy foundation a Data Engineer will have to deal with (it helps if the team also includes an architect).
Data Engineer Responsibilities
- Developing, building, and maintaining data processing infrastructure.
- Handling errors and building reliable data processing pipelines.
- Bringing unstructured data from various dynamic sources into the form analysts need for their work.
- Making recommendations to improve data consistency and quality.
- Providing and maintaining the data architecture used by data scientists and data analysts.
- Processing and storing data consistently and efficiently in a distributed cluster of tens or hundreds of servers.
- Evaluating the technical tradeoffs of tools to create simple but robust architectures that can survive failures.
- Controlling and supporting data flows and related systems (setting up monitoring and alerts).
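Two of the responsibilities above, error handling and alerting, can be sketched together. The snippet below is an illustrative assumption, not a prescribed approach: `run_with_retries` and `flaky_load` are hypothetical names, and the alert is just a callable that a team could point at its own notification channel.

```python
import logging
import time

logger = logging.getLogger("pipeline")


def run_with_retries(task, attempts=3, delay_seconds=0.0, alert=logger.error):
    """Run task(); retry on failure and call alert() once if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                alert(f"task failed after {attempts} attempts: {exc}")
                raise  # surface the failure instead of hiding it
            time.sleep(delay_seconds)


# Hypothetical task that fails twice before succeeding.
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source unavailable")
    return "loaded"


alerts = []  # stand-in for a real alerting channel
result = run_with_retries(flaky_load, attempts=3, alert=alerts.append)
print(result, calls["count"], alerts)
```

Re-raising after the final attempt keeps the failure visible to whatever schedules the job, while the alert hook notifies people only when retries are exhausted.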
There is another specialization within the Data Engineer track: the ML engineer. In short, these engineers specialize in bringing machine learning models into industrial deployment and use. Often a model received from a data scientist is part of a research effort and may not work under production conditions.
Data Scientist Responsibilities
- Extracting features from data so that machine learning algorithms can be applied.
- Using various machine learning tools to predict and classify patterns in data.
- Improving the performance and accuracy of machine learning algorithms by fine-tuning and optimizing them.
- Forming "strong" hypotheses, in line with the company's strategy, that need to be tested.
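As a toy illustration of the first and third points (feature extraction and tuning), the sketch below derives one feature from raw records and picks the best decision threshold by a small grid search. Everything here is an assumption made for the example: the `Order` record, the "average price per item" feature, and the threshold rule stand in for a real model and a real machine learning library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    amount: float
    items: int
    returned: bool  # label we want to predict (hypothetical)


def extract_feature(order: Order) -> float:
    """Hypothetical feature: average price per item."""
    return order.amount / max(order.items, 1)


def accuracy(threshold: float, features: List[float], labels: List[bool]) -> float:
    """Share of orders where the rule 'feature > threshold' matches the label."""
    predictions = [f > threshold for f in features]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def tune_threshold(features, labels, candidates):
    """Grid search: return the candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy(t, features, labels))


orders = [
    Order(20, 2, False), Order(36, 3, False), Order(15, 1, False),
    Order(40, 1, True), Order(110, 2, True), Order(60, 1, True),
]
features = [extract_feature(o) for o in orders]
labels = [o.returned for o in orders]

best = tune_threshold(features, labels, candidates=[5, 20, 50])
print(best, accuracy(best, features, labels))
```

In practice the tuning would be done on held-out validation data rather than on the same records used to choose the rule.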
Both the Data Engineer and the Data Scientist make a tangible contribution to developing a data culture that can help a company earn additional profit or reduce costs.
Which languages and tools do engineers and scientists work with?
Today, expectations of data specialists have changed. Previously, engineers assembled large SQL queries, wrote MapReduce jobs by hand, and processed data with tools such as Informatica ETL, Pentaho ETL, and Talend.
In 2020, a specialist cannot get by without knowing Python and modern workflow tools (for example, Airflow), and without understanding the principles of working with cloud platforms (using them to save on hardware while observing security principles).
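The core idea behind workflow tools of this kind is running tasks in dependency order. The sketch below shows that idea using only the Python standard library (`graphlib`, available from Python 3.9); it is not the Airflow API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task after all of its upstream tasks; return the execution order.

    dependencies maps a task name to the set of tasks it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
# transform needs extract; load needs transform.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order = run_workflow(tasks, dependencies)
print(order)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is why such tools are usually preferred over hand-written runners.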
SAP, Oracle, MySQL, and Redis are traditional data engineering tools in large companies. They are good, but where commercial licenses are required the cost is so high that learning to work with them makes sense only on industrial projects. At the same time, there is a free alternative in Postgres: it costs nothing and is suitable for more than just training.

Historically there has been strong demand for Java and Scala, although as technologies and approaches evolve, these languages are fading into the background.
However, hardcore BigData (Hadoop, Spark, and the rest of the zoo) is no longer a prerequisite for a data engineer, but rather a set of tools for solving problems that traditional ETL cannot solve.
The trend is toward services that let you use tools without knowing the language they are written in (for example, Hadoop without knowing Java), as well as ready-made services for processing streaming data (voice recognition or image recognition in video).
Industrial solutions from SAS and SPSS are popular, while Tableau, Rapidminer, Stata, and Julia are also widely used by data scientists for local tasks.

The ability to build pipelines on their own has only been available to analysts and data scientists for a few years: for example, it is now possible to send data to PostgreSQL-based storage using fairly simple scripts.
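A script of that kind might look like the sketch below. To keep it runnable without a database server, it uses the standard library's `sqlite3` as a stand-in rather than PostgreSQL; the table name `events` and its columns are hypothetical. The function only relies on the generic Python DB-API, so with a PostgreSQL driver such as psycopg2 the same code should work with a connection from that driver and the placeholder set to `%s`.

```python
import sqlite3


def load_rows(connection, rows, placeholder="?"):
    """Insert (user_id, amount) rows into a hypothetical events table.

    placeholder is the driver's parameter marker: "?" for sqlite3,
    "%s" for psycopg2.
    """
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, amount REAL)")
    cursor.executemany(
        f"INSERT INTO events (user_id, amount) VALUES ({placeholder}, {placeholder})",
        rows,
    )
    connection.commit()


# In-memory SQLite database used here only as a stand-in for PostgreSQL.
connection = sqlite3.connect(":memory:")
load_rows(connection, [("u1", 19.9), ("u2", 5.0)])

stored = connection.execute(
    "SELECT user_id, amount FROM events ORDER BY user_id"
).fetchall()
print(stored)
```

Values are passed as parameters rather than formatted into the SQL string, which avoids injection problems when the data comes from outside sources.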
Traditionally, pipelines and integrated data structures remain the responsibility of data engineers. But today the trend toward T-shaped specialists, people with broad competence in adjacent fields, is stronger than ever, because tools keep getting simpler.
How Data Engineers and Data Scientists Work Together
By working closely with engineers, Data Scientists can focus on the research side of things, creating production-ready machine learning algorithms.
Engineers, in turn, need to focus on scalability, data reuse, and making sure that the input and output pipelines of each individual project fit the global architecture.
This division of responsibilities ensures consistency between teams working on different machine learning projects.
Collaboration helps create new products efficiently. Speed and quality come from a balance between building a service for everyone (global storage or dashboard integration) and implementing each specific need or project (a highly specialized pipeline, connecting external sources).
Working closely with data scientists and analysts helps engineers develop the analytical and research skills needed to write better code. Knowledge sharing between users of the data warehouse and the data lake improves, which makes projects more agile and delivers more sustainable long-term results.
In companies that aim to develop a data culture and build business processes on top of it, the Data Scientist and the Data Engineer complement each other and create a complete data analysis system.
In the next article, we will talk about what kind of education Data Engineers and Data Scientists should have, which skills they need to develop, and how the market is structured.
From the editors of Netology
If you are considering a career as a Data Engineer or Data Scientist, we invite you to look at our course programs:
- Profession "".
- Profession "".
Source: www.habr.com
