Data Engineer and Data Scientist: What Is the Difference?

The Data Scientist and Data Engineer roles are often confused. Every company has its own specifics of working with data, its own analytical goals and its own idea of which specialist should handle which part of the work, so each one sets its own requirements.

Let's look at how these specialists differ, which business problems they solve, which skills they have and how much they earn. The material turned out to be long, so we split it into two publications.

In the first article, Elena Gerasimova, head of the "Data Science and Analytics" faculty at Netology, explains how the Data Scientist and the Data Engineer differ and which tools they work with.

How the roles of engineers and scientists differ

A data engineer is a specialist who, on the one hand, develops, tests and maintains data infrastructure: databases, storage and bulk processing systems. On the other hand, this is the person who cleans and "combs" data so that analysts and data scientists can use it; in other words, the engineer builds data processing pipelines.

A Data Scientist builds and trains predictive (and other) models using machine learning algorithms and neural networks, helping businesses find hidden patterns, forecast developments and optimize key business processes.

The main difference between a Data Scientist and a Data Engineer is that they usually have different goals. Both work to make sure that data is accessible and of high quality. But the Data Scientist looks for answers to questions and tests hypotheses inside a data ecosystem (for example, one based on Hadoop), while the Data Engineer builds the pipeline that serves the machine learning algorithm written by the data scientist, running it on a Spark cluster inside that same ecosystem.

A data engineer brings value to the business by working as part of a team. The engineer's job is to act as an important link between different participants, from developers to the business consumers of reports, and to raise the productivity of analysts across the company, from marketing and product to BI.

A Data Scientist, by contrast, takes an active part in the company's strategy: extracting insights, supporting decisions, implementing automation algorithms, building models and generating value from data.

Working with data follows the GIGO principle (garbage in, garbage out): if analysts and data scientists work with unprepared and potentially incorrect data, the results will be wrong even when the most sophisticated analysis algorithms are used.

Data engineers solve this problem by building pipelines that process, clean and transform data, so that data scientists can work with high-quality data.
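As a rough illustration of such a pipeline, here is a minimal Python sketch. The record fields (`user_id`, `order_id`, `amount`), the step names and the sample rows are all assumptions made up for this example; a real pipeline would read from actual sources and apply rules agreed with the analysts.

```python
def drop_incomplete(rows):
    """Cleaning: discard rows with a missing user id or amount."""
    return (r for r in rows if r.get("user_id") and r.get("amount") not in (None, ""))


def normalize_amount(rows):
    """Transformation: parse amounts such as '1,5' or ' 20 ' into floats."""
    for r in rows:
        try:
            amount = float(str(r["amount"]).strip().replace(",", "."))
        except ValueError:
            continue  # unparseable value: treat it as garbage and skip the row
        yield {**r, "amount": amount}


def deduplicate(rows):
    """Cleaning: keep only the first row seen for each (user_id, order_id)."""
    seen = set()
    for r in rows:
        key = (r["user_id"], r.get("order_id"))
        if key not in seen:
            seen.add(key)
            yield r


def run_pipeline(rows, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        rows = step(rows)
    return list(rows)


# Hypothetical raw input with typical quality problems.
RAW = [
    {"user_id": "u1", "order_id": 1, "amount": "10.5"},
    {"user_id": "u1", "order_id": 1, "amount": "10.5"},  # duplicate
    {"user_id": "", "order_id": 2, "amount": "7"},       # missing user
    {"user_id": "u2", "order_id": 3, "amount": "1,5"},   # decimal comma
    {"user_id": "u3", "order_id": 4, "amount": "n/a"},   # garbage value
]

CLEAN = run_pipeline(RAW, [drop_incomplete, normalize_amount, deduplicate])
```

Of the five raw rows only two survive, which is the point of GIGO: it is cheaper to reject bad rows at the pipeline stage than to explain a wrong model later.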

The market offers many tools for working with data, covering every stage from the moment data appears to its output on a dashboard for the board of directors. It is important that the decision to adopt a tool is made by the engineer, and not because the tool is fashionable, but because it will genuinely help the other participants in the process do their work.

Roughly speaking: if a company needs to make BI and ETL work together (loading data and refreshing reports), that is the foundation the engineer will have to deal with (it helps if the team also has an architect).

Responsibilities of a Data Engineer

  • Developing, building and maintaining data processing infrastructure.
  • Handling errors and building reliable data processing pipelines.
  • Bringing unstructured data from various dynamic sources into the form analysts need for their work.
  • Recommending ways to improve data consistency and quality.
  • Providing and maintaining the data architecture used by data scientists and data analysts.
  • Processing and storing data consistently and efficiently in a distributed cluster of tens or hundreds of servers.
  • Evaluating the technical trade-offs between tools in order to build simple but robust architectures that can survive failures.
  • Controlling and supporting data flows and related systems (setting up monitoring and alerts).
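Two of these duties, error handling and alerting, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_with_retries` and `FlakySource` are hypothetical names invented here, and the `alert` callback stands in for whatever notification channel a team actually uses.

```python
import logging
import time

logger = logging.getLogger("pipeline")


def run_with_retries(step, *, attempts=3, delay_seconds=0.0, alert=logger.error):
    """Run `step()`; retry on failure and call `alert` if every attempt fails."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay_seconds)
    alert(f"step failed after {attempts} attempts: {last_error}")
    raise last_error


class FlakySource:
    """Hypothetical data source that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("source temporarily unavailable")
        return [{"id": 1}, {"id": 2}]


# Two transient failures are absorbed by the retries.
source = FlakySource(failures=2)
rows = run_with_retries(source.fetch, attempts=3)

# A source that keeps failing triggers exactly one alert and re-raises.
alerts = []
try:
    run_with_retries(FlakySource(failures=5).fetch, attempts=2, alert=alerts.append)
except ConnectionError:
    pass
```

Passing the alert function as a parameter keeps the retry logic independent of the monitoring system, so the same wrapper can log locally in tests and page someone in production.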

There is one more specialization within the Data Engineer track: the ML engineer. In short, these engineers specialize in bringing machine learning models to industrial deployment and use. A model received from a data scientist is often part of a research study and may not work in production conditions.

Responsibilities of a Data Scientist

  • Extracting features from data so that machine learning algorithms can be applied.
  • Using various machine learning tools to predict and classify patterns in data.
  • Improving the performance and accuracy of machine learning algorithms by fine-tuning and optimizing them.
  • Forming "strong" hypotheses, in line with the company's strategy, that need to be tested.
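The first three items can be shown in miniature with plain Python. Everything in this sketch is an assumption made for illustration: the order data, the labels and the one-feature threshold rule, which stands in for a real machine learning model.

```python
from collections import defaultdict

# Hypothetical raw events: (customer_id, order_amount)
ORDERS = [
    ("a", 120.0), ("a", 80.0), ("a", 100.0),
    ("b", 15.0),
    ("c", 60.0), ("c", 70.0),
    ("d", 10.0), ("d", 5.0),
]
# Hypothetical labels: did the customer buy again next month?
RETURNED = {"a": True, "b": False, "c": True, "d": False}


def extract_features(orders):
    """Feature extraction: aggregate raw events into per-customer features."""
    totals, counts = defaultdict(float), defaultdict(int)
    for customer, amount in orders:
        totals[customer] += amount
        counts[customer] += 1
    return {
        c: {"total": totals[c], "orders": counts[c], "avg": totals[c] / counts[c]}
        for c in totals
    }


def accuracy(features, labels, threshold):
    """Score the rule 'customer returns if total spend >= threshold'."""
    hits = sum((features[c]["total"] >= threshold) == labels[c] for c in labels)
    return hits / len(labels)


def tune_threshold(features, labels, candidates):
    """Tuning: pick the first candidate threshold with the best accuracy."""
    return max(candidates, key=lambda t: accuracy(features, labels, t))


FEATURES = extract_features(ORDERS)
BEST = tune_threshold(FEATURES, RETURNED, candidates=[10, 50, 100, 200])
```

The sketch tunes and scores on the same four customers purely to stay short; real work would evaluate on held-out data and use a proper model rather than a single threshold.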

Both the Data Engineer and the Data Scientist make a tangible contribution to building a data culture, which lets a company generate additional profit or reduce costs.

Which languages and tools do engineers and scientists work with?

Expectations of data specialists have changed. In the past, engineers assembled large SQL queries, wrote MapReduce jobs by hand and processed data with tools such as Informatica ETL, Pentaho ETL and Talend.

In 2020, a specialist cannot get by without Python and modern tools for running computations (for example, Airflow), or without understanding how to work with cloud platforms (using them to save on hardware while following security principles).
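The core idea behind an orchestration tool such as Airflow is to run tasks in an order that respects their dependencies. The sketch below shows only that idea, using `graphlib` from the Python standard library; it is not Airflow's API, and the task names are assumptions chosen for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

executed = []  # records the order in which tasks actually ran


def extract():
    executed.append("extract")


def clean():
    executed.append("clean")


def load():
    executed.append("load")


def refresh_report():
    executed.append("refresh_report")


TASKS = {
    "extract": extract,
    "clean": clean,
    "load": load,
    "refresh_report": refresh_report,
}
# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "clean": {"extract"},
    "load": {"clean"},
    "refresh_report": {"load"},
}


def run_dag(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()


run_dag(TASKS, DEPENDENCIES)
```

A real orchestrator adds scheduling, retries, parallel execution and a UI on top of this ordering, which is why such tools are used instead of hand-written scripts.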

SAP, Oracle, MySQL and Redis are the traditional tools of data engineers in large companies. They are good, but licenses are so expensive that learning to work with them only makes sense on industrial projects. At the same time there is a free alternative, Postgres, which is suitable for far more than just learning.

Historically, job requirements have often included Java and Scala, although these languages are fading into the background as technologies and approaches evolve.

However, hardcore BigData (Hadoop, Spark and the rest of the zoo) is no longer a prerequisite for a data engineer. It is rather a set of tools for problems that traditional ETL cannot solve.

The trend is toward services that let you use a tool without knowing the language it is written in (for example, Hadoop without knowing Java), as well as ready-made services for processing streaming data (speech recognition or image recognition in video).

Industrial solutions from SAS and SPSS are popular, while Tableau, Rapidminer, Stata and Julia are also widely used by data scientists for local tasks.

Analysts and data scientists gained the ability to build pipelines themselves only a few years ago: for example, it is already possible to send data to PostgreSQL-based storage using relatively simple scripts.
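A script of this kind might look like the sketch below. To keep it runnable without a database server, it uses `sqlite3` from the Python standard library as a stand-in for PostgreSQL; the `payments` table, its columns and the CSV content are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical export; in practice this would be a file or an API response.
CSV_EXPORT = """user_id,amount
u1,10.5
u2,1.5
u1,4.0
"""


def load_rows(conn, csv_text):
    """Create the target table if needed and insert the CSV rows into it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["user_id"], float(r["amount"])) for r in reader]
    with conn:  # commits on success, rolls back on error
        conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id TEXT, amount REAL)")
        conn.executemany("INSERT INTO payments (user_id, amount) VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
inserted = load_rows(conn, CSV_EXPORT)

# Quick sanity check that the data arrived as expected.
totals = dict(conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
))
```

With PostgreSQL the shape stays the same, since Python database drivers follow the same DB-API conventions; a driver such as psycopg2 would replace `sqlite3`, and its placeholders are written `%s` instead of `?`.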

As a rule, working with pipelines and integrated data structures remains the responsibility of data engineers. But the trend toward T-shaped specialists, with broad competence in adjacent fields, is stronger than ever, because the tools keep getting simpler.

Why Data Engineers and Data Scientists Work Together

By working closely with engineers, Data Scientists can concentrate on the research side and create machine learning algorithms that are ready for production.
Engineers, in turn, can concentrate on scalability, data reuse and making sure that the input and output pipelines of each individual project fit the global architecture.

This separation of responsibilities keeps teams consistent when they work on different machine learning projects.

Collaboration helps create new products efficiently. Speed and quality come from a balance between building a service for everyone (global storage or dashboard integration) and meeting each specific need or project (highly specialized pipelines, connecting external sources).

Working closely with data scientists and analysts helps engineers develop analytical and research skills and write better code. Knowledge sharing among users of data warehouses and data lakes improves, which makes projects more agile and their results more durable.

In companies that aim to develop a culture of working with data and to build business processes on it, the Data Scientist and the Data Engineer complement each other and together create a complete data analysis system.

In the next article we will discuss what education Data Engineers and Data Scientists should have, which skills they need to develop and how the job market works.

From the editors of Netology

If you are considering a career as a Data Engineer or Data Scientist, we invite you to take a look at our courses.

Source: www.habr.com
