Data Engineer and Data Scientist: What's the Difference?

The roles of Data Scientist and Data Engineer are often confused. Every company has its own specifics of working with data, its own analytics goals and its own view of which specialist should handle which part of the work, so each one sets its own requirements.

Let's figure out how these specialists differ, which business problems they solve, what skills they have and how much they earn. The material turned out to be large, so we split it into two publications.

In the first article, Elena Gerasimova, head of the "Data Science and Analytics" faculty at Netology, explains how a Data Scientist differs from a Data Engineer and which tools each of them works with.

How the roles of engineers and scientists differ

A data engineer is a specialist who, on the one hand, develops, tests and maintains the data infrastructure: databases, storage and bulk-processing systems. On the other hand, it is the data engineer who cleans and "combs" the data so that analysts and data scientists can use it, that is, builds data processing pipelines.

A Data Scientist creates and trains predictive (and other) models using machine learning algorithms and neural networks, helping businesses find hidden patterns, forecast developments and optimize key business processes.

The main difference between a Data Scientist and a Data Engineer is that they usually have different goals. Both work to make sure the data is accessible and of high quality. But a Data Scientist looks for answers to their questions and tests hypotheses within the data ecosystem (for example, one built on Hadoop), while a Data Engineer builds the pipeline that serves a machine learning algorithm, written by the data scientist, on a Spark cluster within that same ecosystem.

A data engineer brings value to the business by working as part of a team. Their job is to act as the key link between different participants, from developers to the business consumers of reports, and to raise the productivity of analysts, from marketing and product to BI.

A Data Scientist, by contrast, takes an active part in shaping company strategy: extracting insights, making decisions, applying automation algorithms and modeling, and generating value from data.

Working with data is subject to the GIGO principle (garbage in, garbage out): if analysts and data scientists deal with unprepared and possibly incorrect data, the results will be wrong even with the most sophisticated analysis algorithms.

Data engineers solve this problem by building pipelines that process, clean and transform data, allowing data scientists to work with high-quality data.
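As a minimal sketch of one such cleaning step, assume raw records arrive as Python dictionaries with hypothetical `user_id` and `amount` fields. The function drops rows without an ID or with an unparseable amount, normalizes decimal commas, and removes exact duplicates. Treating identical rows as duplicates is itself an assumption; a real pipeline would deduplicate on a proper business key.

```python
from statistics import mean

# Hypothetical raw export: a duplicate, a missing ID, a non-numeric amount,
# and an amount written with a decimal comma and stray whitespace.
RAW = [
    {"user_id": "u1", "amount": "10.5"},
    {"user_id": "u1", "amount": "10.5"},   # exact duplicate
    {"user_id": None, "amount": "99"},     # missing key field
    {"user_id": "u2", "amount": "abc"},    # unparseable amount
    {"user_id": "u3", "amount": " 4,5 "},  # decimal comma, whitespace
]


def clean(records):
    """Drop invalid rows, normalize amounts to float, and remove duplicates."""
    seen = set()
    result = []
    for rec in records:
        user = rec.get("user_id")
        if not user:
            continue  # a row without an ID cannot be joined downstream
        raw_amount = str(rec.get("amount", "")).strip().replace(",", ".")
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # reject values that are not numbers
        key = (user, amount)
        if key in seen:
            continue  # skip rows identical after normalization
        seen.add(key)
        result.append({"user_id": user, "amount": amount})
    return result


cleaned = clean(RAW)
print(cleaned)
print(mean(r["amount"] for r in cleaned))
```

Only the two valid, distinct rows survive, so the average (7.5) is computed over 10.5 and 4.5 rather than over the garbage rows.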

There are many tools on the market for working with data that cover every stage, from data ingestion to the output on the board of directors' dashboard. It is important that the decision to use them is made by the engineer: not because a tool is fashionable, but because it will genuinely help the work of the other participants in the process.

A typical case: a company needs to link BI and ETL, that is, load data and refresh reports. This is the typical legacy foundation a Data Engineer will have to deal with (ideally with an architect on the team as well).
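A toy version of that BI/ETL link might look like the following, using Python's built-in `sqlite3` module as a stand-in for the warehouse. The CSV content and the `orders` and `sales_by_region` tables are hypothetical; the point is only the sequence: extract raw rows, cast them to proper types, load them, then rebuild the summary a dashboard would read.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice it would come from an operational system.
SOURCE_CSV = """order_id,region,amount
1,north,100
2,south,250
3,north,50
"""


def extract(text):
    """Read the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cast types so the warehouse receives clean numeric values."""
    return [(int(r["order_id"]), r["region"], float(r["amount"])) for r in rows]


def load(conn, rows):
    """Load the facts, then rebuild the aggregate table a BI dashboard reads."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    # Refresh the report: rebuild the summary table from the current facts.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall())
```

Because `INSERT OR REPLACE` is keyed on `order_id`, rerunning the load with the same file replaces rows instead of double-counting the totals.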

Data Engineer Responsibilities

  • Developing, building and maintaining the data processing infrastructure.
  • Handling errors and building reliable data processing pipelines.
  • Bringing unstructured data from various dynamic sources into the form analysts need for their work.
  • Recommending ways to improve data consistency and quality.
  • Providing and maintaining the data architecture used by data scientists and data analysts.
  • Processing and storing data consistently and efficiently on a distributed cluster of tens or hundreds of servers.
  • Weighing the technical trade-offs of tools to create simple but robust architectures that can survive failures.
  • Controlling and supporting data flows and related systems (setting up monitoring and alerts).
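Two of the duties above, error handling for reliable pipelines and alerting, can be sketched in a few lines. The wrapper below is an illustrative pattern, not any particular framework's API: it retries a failing step with exponential backoff and calls a hypothetical `alert` hook (here simply `print`) once every attempt has failed.

```python
import time


def run_with_retries(step, retries=3, base_delay=0.1, alert=print):
    """Run a pipeline step, retrying with exponential backoff; alert if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as exc:  # a real pipeline would catch narrower, transient errors
            if attempt == retries:
                alert(f"step failed after {retries} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # wait 1x, 2x, 4x ... base_delay


# Hypothetical flaky source: fails twice, then succeeds.
calls = {"n": 0}


def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source unavailable")
    return ["row1", "row2"]


print(run_with_retries(flaky_extract, base_delay=0.01))
```

In production the `alert` hook would post to a monitoring or paging system rather than print.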

Within the Data Engineer track there is another specialization: the ML engineer. In short, these engineers specialize in bringing machine learning models into industrial deployment and use. A model received from a data scientist is often part of a research study and may not work under real production conditions.
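A small illustration of that gap: `research_model` below stands in for a hypothetical model handed over by a data scientist, which assumes every feature is present and numeric. The `ServingModel` wrapper adds the kind of guards an ML engineer might put around it (required fields, defaults, type coercion, output clipping); the specific checks are illustrative assumptions, not a prescribed checklist.

```python
from dataclasses import dataclass, field


# Hypothetical research model: works on clean notebook data, trusts its input blindly.
def research_model(features):
    return 0.3 * features["age"] / 100 + 0.7 * features["visits"] / 50


@dataclass
class ServingModel:
    """Production wrapper that validates input before calling the research model."""
    required: tuple = ("age", "visits")
    defaults: dict = field(default_factory=dict)

    def predict(self, payload):
        features = {}
        for name in self.required:
            value = payload.get(name, self.defaults.get(name))
            if value is None:
                raise ValueError(f"missing feature: {name}")
            features[name] = float(value)  # coerce strings like "40", fail fast otherwise
        score = research_model(features)
        return min(max(score, 0.0), 1.0)  # keep output in the range consumers expect


model = ServingModel(defaults={"visits": 0})
print(model.predict({"age": "40"}))
```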

Data Scientist Responsibilities

  • Extracting features from data in order to apply machine learning algorithms.
  • Using various machine learning tools to predict and classify patterns in data.
  • Improving the performance and accuracy of machine learning algorithms by fine-tuning and optimizing them.
  • Formulating "strong" hypotheses, aligned with company strategy, that need to be tested.
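A deliberately small illustration of the first three duties: raw per-user session logs (hypothetical data) are turned into numeric features, a one-rule classifier predicts churn, and a grid search tunes its single parameter. Real work would use an ML library and a held-out validation set; tuning and scoring on the same six users, as here, overfits by design.

```python
# Hypothetical raw data: per-user session durations (minutes) and a churn label.
USERS = {
    "a": ([30, 45, 20], 0),
    "b": ([5], 1),
    "c": ([10, 12], 0),
    "d": ([2, 3], 1),
    "e": ([50], 0),
    "f": ([4, 1, 2], 1),
}


def extract_features(sessions):
    """Turn raw session durations into model-ready numeric features."""
    return {"n_sessions": len(sessions), "avg_minutes": sum(sessions) / len(sessions)}


def predict(features, threshold):
    """Toy classifier: predict churn (1) when average session time is below the threshold."""
    return 1 if features["avg_minutes"] < threshold else 0


def tune_threshold(data, candidates):
    """Grid search: return the candidate threshold with the best accuracy on `data`."""
    def accuracy(t):
        hits = sum(predict(extract_features(s), t) == y for s, y in data.values())
        return hits / len(data)
    return max(candidates, key=accuracy)


print(tune_threshold(USERS, [1, 3, 6, 15, 40]))
```

With these data, any threshold above 5 and up to 11 minutes separates the classes perfectly, and the grid search picks 6. The toy classifier uses only `avg_minutes`; `n_sessions` shows that feature extraction usually produces more candidates than one model ends up using.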

Both the Data Engineer and the Data Scientist make a tangible contribution to developing a data culture in which the company can generate additional profit or reduce costs.

Which languages and tools do engineers and scientists work with?

Today, the expectations placed on data engineers have changed. Previously, engineers assembled large SQL queries, wrote MapReduce jobs by hand and processed data with tools such as Informatica ETL, Pentaho ETL and Talend.
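To make "writing MapReduce by hand" concrete, here is the classic word count split into map, shuffle and reduce functions in plain Python. It runs in a single process, so it only mirrors the programming model, not Hadoop's distributed execution or its Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


lines = ["big data big pipelines", "data engineers build pipelines"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```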

In 2020, a specialist cannot do without Python and modern workflow orchestration tools (for example, Airflow), or without understanding how to work with cloud platforms (using them to save on hardware while observing security principles).
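Airflow describes a pipeline as a directed acyclic graph (DAG) of tasks and runs each task only after its upstream tasks have finished. The sketch below reproduces just that ordering idea with the standard library's `graphlib` (Python 3.9+); it is not Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical daily pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_users": set(),
    "clean": {"extract_orders", "extract_users"},
    "build_features": {"clean"},
    "refresh_dashboard": {"clean"},
}


def run(pipeline):
    """Return the tasks in an order that respects every dependency."""
    executed = []
    for task in TopologicalSorter(pipeline).static_order():
        executed.append(task)  # a real orchestrator would launch the task here
    return executed


print(run(PIPELINE))
```

`static_order()` raises `graphlib.CycleError` if the dependencies form a cycle, which is why orchestrators insist that pipelines be acyclic.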

SAP, Oracle, MySQL and Redis are the traditional tools of data engineers in large companies. They are good, but license costs are so high that learning to work with them only makes sense on industrial projects. At the same time, there is a free alternative in the form of Postgres: it costs nothing and is suitable for more than just learning.

Historically, there has often been demand for Java and Scala, although as technologies and approaches develop, these languages are fading into the background.

That said, hardcore BigData (Hadoop, Spark and the rest of the zoo) is no longer a mandatory requirement for a data engineer, but rather a class of tools for problems that traditional ETL cannot solve.

The trend is toward services that let you use tools without knowing the language they are written in (for example, Hadoop without knowing Java), as well as ready-made services for processing streaming data (speech recognition or image recognition in video).

Industrial solutions from SAS and SPSS are popular, while Tableau, Rapidminer, Stata and Julia are also widely used by data scientists for local tasks.

Over the past few years, analysts and data scientists have gained the ability to build pipelines on their own: for example, it is already possible to send data to PostgreSQL-based storage with relatively simple scripts.
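A sketch of such a "simple script": it pushes daily metrics with an idempotent upsert, so rerunning it corrects values instead of duplicating rows. To stay self-contained it runs against an in-memory SQLite database. The `INSERT ... ON CONFLICT ... DO UPDATE` statement is also valid PostgreSQL (9.5+), but a PostgreSQL driver such as psycopg2 would use `%s` placeholders instead of `?`. The `daily_metrics` table is hypothetical.

```python
import sqlite3

# Idempotent upsert: a second run updates existing rows instead of inserting duplicates.
# ON CONFLICT ... DO UPDATE works in PostgreSQL and in SQLite 3.24+.
UPSERT = """
INSERT INTO daily_metrics (day, metric, value)
VALUES (?, ?, ?)
ON CONFLICT (day, metric) DO UPDATE SET value = excluded.value
"""


def push_metrics(conn, rows):
    """Create the target table if needed, then upsert a batch of (day, metric, value) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_metrics ("
        " day TEXT, metric TEXT, value REAL, PRIMARY KEY (day, metric))"
    )
    conn.executemany(UPSERT, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
push_metrics(conn, [("2020-06-01", "orders", 120), ("2020-06-01", "revenue", 5400.0)])
push_metrics(conn, [("2020-06-01", "orders", 125)])  # corrected value for the same day
print(conn.execute("SELECT * FROM daily_metrics ORDER BY metric").fetchall())
```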

In general, operating pipelines and integrated data structures remains the responsibility of data engineers. But today the trend toward T-shaped specialists with broad competencies in adjacent fields is stronger than ever, because tools keep getting simpler.

Why Data Engineers and Data Scientists Work Together

By working closely with engineers, Data Scientists can focus on the research side and create machine learning algorithms that are ready for production.
Engineers, in turn, can focus on scalability, data reuse, and making sure that the input and output data pipelines of each individual project comply with the global architecture.

This separation of responsibilities ensures consistency across all teams working on different machine learning projects.

Collaboration helps create new products efficiently. Speed and quality come from balancing two things: building services for everyone (global storage or integrated dashboards) and implementing each specific need or project (a highly specialized pipeline that connects external sources).

Working closely with data scientists and analysts helps engineers develop analytical and research skills and write better code. Knowledge sharing between users of data warehouses and data lakes improves, which makes projects faster and their results more sustainable over the long term.

In companies that aim to develop a data culture and build business processes on data, the Data Scientist and the Data Engineer complement each other and together form a complete data analysis system.

In the next article, we will discuss what education a Data Engineer and a Data Scientist should have, which skills they need to develop and how the job market works.

Source: www.habr.com
