Data Engineer and Data Scientist: What's the Difference?

The Data Scientist and Data Engineer professions are often confused. Every company has its own specifics of working with data, its own goals for analysis, and its own idea of which specialist should handle which part of the work, so each company sets its own requirements.

Let's look at how these specialists differ, which business problems they solve, what skills they have, and how much they earn. The material turned out to be large, so we split it into two articles.

In the first article, Elena Gerasimova, head of the "Data Science and Analytics" department at Netology, explains how a Data Scientist differs from a Data Engineer and which tools they work with.

How the roles of engineers and scientists differ

A data engineer is a specialist who, on the one hand, develops, tests, and maintains data infrastructure: databases, storage, and large-scale processing systems. On the other hand, this is the person who cleans and "combs" data for use by analysts and data scientists, that is, who builds data processing pipelines.

A Data Scientist builds and trains predictive (and other) models using machine learning algorithms and neural networks, helping businesses find hidden patterns, forecast developments, and optimize key business processes.

The main difference between a Data Scientist and a Data Engineer is that they usually have different goals. Both work to make sure that data is accessible and of high quality. But the Data Scientist finds answers to their questions and tests hypotheses within a data ecosystem (for example, one based on Hadoop), while the Data Engineer builds a pipeline that serves the machine learning algorithm written by the data scientist on a Spark cluster within that same ecosystem.

A data engineer brings value to the business by working as part of a team. Their task is to act as an important link between the different participants, from developers to business consumers of reporting, and to increase the productivity of analysts, from marketing and product analysts to BI.

A Data Scientist, by contrast, takes an active part in the company's strategy and in extracting insights, making decisions, automating algorithms, modeling, and generating value from data.

Working with data follows the GIGO (garbage in, garbage out) principle: if analysts and data scientists deal with unprepared and potentially incorrect data, the results will be wrong even when the most sophisticated analysis algorithms are used.

Data engineers solve this problem by building pipelines that process, clean, and transform data, which lets data scientists work with high-quality data.
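As a rough illustration of the cleaning step in such a pipeline, here is a minimal sketch in plain Python. The records, the field names (`user_id`, `amount`, `country`), and the validation rules are all assumptions made up for this example; a real pipeline would take its rules from the actual data sources.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical raw records; the field names are illustrative only.
RAW_ROWS = [
    {"user_id": " 42 ", "amount": "19.90", "country": "de"},
    {"user_id": "", "amount": "5.00", "country": "US"},    # missing id
    {"user_id": "7", "amount": "n/a", "country": "us"},    # unparsable amount
    {"user_id": "8", "amount": "3", "country": " Fr "},
]


def clean_row(row: dict) -> Optional[dict]:
    """Return a normalized row, or None if the row cannot be trusted."""
    user_id = row.get("user_id", "").strip()
    if not user_id.isdigit():
        return None
    try:
        amount = float(row.get("amount", ""))
    except ValueError:
        return None
    return {
        "user_id": int(user_id),
        "amount": amount,
        "country": row.get("country", "").strip().upper(),
    }


def clean(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield only the rows that pass validation."""
    for row in rows:
        cleaned = clean_row(row)
        if cleaned is not None:
            yield cleaned


CLEAN_ROWS = list(clean(RAW_ROWS))
```

Dropping bad rows silently is only one possible policy; depending on the project, rejected rows might instead be counted, logged, or routed to a separate table for review.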

There are many tools on the market for working with data, covering every stage: from the moment data appears to its output on a dashboard for the board of directors. It is important that the decision to use a tool is made by the engineer, not because the tool is fashionable, but because it will genuinely help the other participants in the process do their work.

For example: if a company needs to connect BI and ETL, loading data and updating reports, this is a typical legacy foundation that the Data Engineer will have to deal with (it helps if the team also has an architect).

Responsibilities of a Data Engineer

  • Developing, building, and maintaining data processing infrastructure.
  • Handling errors and building reliable data processing pipelines.
  • Bringing unstructured data from a variety of dynamic sources into the form that analysts need for their work.
  • Making recommendations to improve data consistency and quality.
  • Providing and maintaining the data architecture used by data scientists and data analysts.
  • Processing and storing data consistently and efficiently in a distributed cluster of tens or hundreds of servers.
  • Evaluating the technical trade-offs of tools in order to build simple yet robust architectures that can survive failures.
  • Monitoring and supporting data flows and related systems (setting up monitoring and alerts).
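Two of the items above, error handling and alerting, can be sketched together. The following is a minimal example under stated assumptions: the helper `run_with_retries`, the step `flaky_extract`, and the simulated failure are hypothetical, and the "alert" is only a log message standing in for whatever notification channel a team actually uses.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


def run_with_retries(step: Callable[[], T], attempts: int = 3,
                     base_delay: float = 0.01) -> T:
    """Run a pipeline step, retrying with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Assumption: a real system would send an alert here.
                log.error("step failed permanently; alerting on-call")
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical step that fails twice before succeeding.
calls = {"n": 0}


def flaky_extract() -> list:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return [1, 2, 3]


result = run_with_retries(flaky_extract)
```

Retrying every exception type is a simplification; in practice it is usually safer to retry only errors that are known to be transient.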

There is one more specialization within the Data Engineer track: the ML engineer. In short, these engineers specialize in bringing machine learning models to industrial deployment and use. Often a model received from a data scientist is part of a research study and may not work under production conditions.

Responsibilities of a Data Scientist

  • Extracting features from data so that machine learning algorithms can be applied.
  • Using various machine learning tools to predict and classify patterns in data.
  • Improving the performance and accuracy of machine learning algorithms by fine-tuning and optimizing them.
  • Forming "strong" hypotheses, in line with the company's strategy, that need to be tested.
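To make the first and third items more concrete, here is a deliberately tiny sketch in plain Python: it aggregates raw events into per-customer features and then tunes a single threshold. The events, the labels, and the one-feature "model" are invented for illustration; real work would normally use a machine learning library and far more data.

```python
from collections import defaultdict

# Hypothetical purchase events: (customer_id, amount).
EVENTS = [(1, 10.0), (1, 30.0), (2, 5.0), (3, 50.0),
          (3, 70.0), (3, 60.0), (4, 8.0)]
# Hypothetical labels: did the customer come back the next month?
RETURNED = {1: True, 2: False, 3: True, 4: False}


def extract_features(events):
    """Aggregate raw events into per-customer features."""
    totals, counts = defaultdict(float), defaultdict(int)
    for customer_id, amount in events:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return {
        cid: {"orders": counts[cid], "avg_amount": totals[cid] / counts[cid]}
        for cid in totals
    }


def accuracy(features, labels, threshold):
    """Share of customers classified correctly by 'avg_amount >= threshold'."""
    hits = sum(
        (features[cid]["avg_amount"] >= threshold) == labels[cid]
        for cid in labels
    )
    return hits / len(labels)


FEATURES = extract_features(EVENTS)
# Tune the single parameter by trying a few candidate values.
BEST_THRESHOLD = max(
    [5.0, 15.0, 40.0],
    key=lambda t: accuracy(FEATURES, RETURNED, t),
)
```

Here the threshold is chosen on the same data it is scored on, which is acceptable only in a toy example; a realistic evaluation would measure accuracy on held-out data.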

Both the Data Engineer and the Data Scientist make a tangible contribution to developing a data culture, through which a company can earn additional profit or reduce costs.

Which languages and tools do engineers and scientists work with?

Today, expectations of data specialists have changed. Previously, engineers assembled large SQL queries, wrote MapReduce by hand, and processed data with tools such as Informatica ETL, Pentaho ETL, and Talend.

In 2020, a specialist cannot do without knowledge of Python and modern tools for running computations (for example, Airflow), or without understanding the principles of working with cloud platforms (using them to save on hardware while observing security principles).
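Tools such as Airflow describe a workflow as a graph of tasks and run them in dependency order. The sketch below does not use Airflow's API; it relies only on Python's standard `graphlib` module (Python 3.9+) to illustrate that idea. The task names and the `make_task` helper are hypothetical.

```python
from graphlib import TopologicalSorter

RUN_LOG = []


def make_task(name):
    """Build a placeholder task that only records that it ran."""
    def task():
        RUN_LOG.append(name)
    return task


TASKS = {name: make_task(name)
         for name in ("extract", "clean", "load", "report")}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "clean": {"extract"},
    "load": {"clean"},
    "report": {"load"},
}

# static_order() yields every task after all of its dependencies.
for task_name in TopologicalSorter(DEPENDENCIES).static_order():
    TASKS[task_name]()
```

A real orchestrator adds what this sketch leaves out: scheduling, retries, parallel execution, and a record of past runs.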

SAP, Oracle, MySQL, and Redis are the traditional tools of a data engineer in large companies. They are good, but license costs are so high that learning to work with them only makes sense in industrial projects. At the same time, there is a free alternative in Postgres, and it is suitable for more than just learning.

Historically, demand for Java and Scala has been common, although as technologies and approaches evolve, these languages are fading into the background.

However, hardcore BigData (Hadoop, Spark, and the rest of the zoo) is no longer a prerequisite for a data engineer. It is rather a set of tools for solving problems that traditional ETL cannot solve.

The trend is toward services that let you use tools without knowing the language they are written in (for example, Hadoop without knowing Java), and toward ready-made services for processing streaming data (voice recognition, or image recognition in video).

Industrial solutions from SAS and SPSS are popular, while Tableau, Rapidminer, Stata, and Julia are also widely used by data scientists for local tasks.

Analysts and data scientists gained the ability to build pipelines themselves only a few years ago: for example, it is now possible to send data to PostgreSQL-based storage using relatively simple scripts.
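A script of that kind can be quite short. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` as a stand-in so that it runs without a database server, and the `payments` table and its rows are invented. Against PostgreSQL one would connect through a driver such as `psycopg2` instead, where query placeholders are written `%s` rather than `?`.

```python
import sqlite3

# Hypothetical cleaned rows: (user_id, country, amount).
ROWS = [(42, "DE", 19.9), (8, "FR", 3.0)]

# In-memory database as a stand-in for a real PostgreSQL storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    "user_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
)

# The connection as a context manager commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    conn.executemany(
        "INSERT INTO payments (user_id, country, amount) VALUES (?, ?, ?)",
        ROWS,
    )

ROW_COUNT, AMOUNT_SUM = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM payments"
).fetchone()
```

Using parameterized queries rather than string formatting keeps the script safe when the loaded values contain quotes or other special characters.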

In general, the use of pipelines and integrated data structures remains the responsibility of data engineers. But today the trend toward T-shaped specialists, with broad competencies in related fields, is stronger than ever, because tools are constantly being simplified.

How Data Engineers and Data Scientists Work Together

By working closely with engineers, Data Scientists can focus on the research side, creating production-ready machine learning algorithms.
Engineers, in turn, should focus on scalability, data reuse, and making sure that the data input and output pipelines in each individual project conform to the global architecture.

This separation of responsibilities ensures consistency across teams working on different machine learning projects.

Collaboration helps create new products efficiently. Speed and quality are achieved through a balance between building a service for everyone (a global storage or dashboard integration) and implementing each specific need or project (a highly specialized pipeline, connecting external sources).

Working closely with data scientists and analysts helps engineers develop analytical and research skills and write better code. Knowledge sharing among users of data warehouses and data lakes improves, which makes projects more agile and delivers more sustainable long-term results.

In companies that aim to develop a culture of working with data and to build business processes on top of it, the Data Scientist and the Data Engineer complement each other and create a complete data analysis system.

In the next article we will talk about what kind of education Data Engineers and Data Scientists should have, which skills they need to develop, and how the job market works.

From the editors of Netology

If you are considering a career as a Data Engineer or Data Scientist, we invite you to explore our course programs:

Source: www.habr.com
