Data Engineer and Data Scientist: What's the Difference?

The professions of Data Scientist and Data Engineer are often confused. Every company has its own way of working with data, its own goals for analyzing it and its own view of which specialist should handle which part of the work, so each company has its own requirements.

Let's look at how these specialists differ, what business problems they solve, what skills they have and how much they earn. The topic turned out to be large, so we split it into two publications.

In the first article, Elena Gerasimova, head of the "Data Science and Analytics" faculty at Netology, explains the difference between a Data Scientist and a Data Engineer and which tools each of them works with.

How the roles of engineers and scientists differ

A data engineer is a specialist who, on the one hand, develops, tests and maintains data infrastructure: databases, storage and large-scale processing systems. On the other hand, this is the person who cleans and "combs" the data for use by analysts and data scientists, that is, builds data processing pipelines.

A Data Scientist creates and trains predictive (and other) models using machine learning algorithms and neural networks, helping businesses find hidden patterns, forecast trends and optimize key business processes.

The main difference between a Data Scientist and a Data Engineer is that they usually have different goals. Both work to make sure data is accessible and of high quality. But a Data Scientist looks for answers to their questions and tests hypotheses within the data ecosystem (for example, one built on Hadoop), while a Data Engineer builds a pipeline that serves the machine learning model written by the data scientist on a Spark cluster within the same ecosystem.

A data engineer brings value to the business by working as part of a team. Their job is to be an important link between different participants, from developers to the business consumers of reports, and to raise the productivity of analysts, from marketing and product to BI.

A Data Scientist, by contrast, takes an active part in shaping company strategy: extracting insights, making decisions, implementing automation algorithms, modeling and generating value from data.

Working with data is subject to the GIGO principle (garbage in, garbage out): if analysts and data scientists work with unprepared and potentially incorrect data, the results will be wrong even with the most sophisticated analysis algorithms.

Data engineers solve this problem by building pipelines that process, clean and transform data, so that data scientists can work with high-quality data.
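To make the GIGO point concrete, here is a minimal Python sketch of a single cleaning-and-transformation step. It assumes a hypothetical feed of raw order records; the `Order` type, the field names and the validity rules are illustrative and not taken from the article. Unparseable, negative and duplicate rows are dropped before aggregation, so the analyst downstream only sees typed, deduplicated data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Order:
    # Hypothetical schema for one cleaned record.
    order_id: int
    customer: str
    amount: float
    day: date


def parse(raw: dict) -> Optional[Order]:
    """Turn one raw record into a typed Order, or None if it is unusable."""
    try:
        return Order(
            order_id=int(raw["order_id"]),
            customer=raw["customer"].strip().lower(),  # normalize names
            amount=float(raw["amount"]),
            day=date.fromisoformat(raw["day"]),
        )
    except (KeyError, ValueError, TypeError, AttributeError):
        return None  # missing field, bad number, bad date or null value


def clean(raw_records: Iterable[dict]) -> Iterator[Order]:
    """Drop garbage and duplicates instead of passing them downstream."""
    seen = set()
    for raw in raw_records:
        order = parse(raw)
        if order is None or order.amount < 0:
            continue
        if order.order_id in seen:
            continue
        seen.add(order.order_id)
        yield order


def daily_revenue(orders: Iterable[Order]) -> Dict[date, float]:
    """Transformation step: aggregate clean orders into revenue per day."""
    totals: Dict[date, float] = {}
    for o in orders:
        totals[o.day] = totals.get(o.day, 0.0) + o.amount
    return totals


sample = [
    {"order_id": "1", "customer": " Alice ", "amount": "10.5", "day": "2020-03-01"},
    {"order_id": "2", "customer": "Bob", "amount": "oops", "day": "2020-03-01"},  # bad amount
    {"order_id": "1", "customer": "Alice", "amount": "10.5", "day": "2020-03-01"},  # duplicate
    {"order_id": "3", "customer": "Carol", "amount": "4.5", "day": "2020-03-02"},
    {"order_id": "4", "customer": "Dan", "amount": "-1", "day": "2020-03-02"},  # negative
]
print(daily_revenue(clean(sample)))
# prints {datetime.date(2020, 3, 1): 10.5, datetime.date(2020, 3, 2): 4.5}
```

Keeping parsing, filtering and aggregation as separate functions makes each rule easy to test and to change when the analysts' requirements change.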

There are many tools on the market for working with data, covering every stage: from the moment data appears to its output on the board of directors' dashboard. And it is important that the decision to use them is made by an engineer, not because a tool is fashionable, but because it will genuinely help the work of the other participants in the process.

A typical case: if a company needs to connect BI and ETL (loading data and refreshing reports), the Data Engineer will usually have to deal with a legacy foundation (ideally with an architect on the team as well).

Data Engineer responsibilities

  • Developing, building and maintaining data infrastructure.
  • Handling errors and building reliable data processing pipelines.
  • Bringing unstructured data from various dynamic sources into the form analysts need for their work.
  • Recommending ways to improve data consistency and quality.
  • Providing and maintaining the data architecture used by data scientists and data analysts.
  • Processing and storing data consistently and efficiently on a distributed cluster of tens or hundreds of servers.
  • Evaluating the technical trade-offs of tools to build a simple but robust architecture that can survive failures.
  • Controlling and supporting data flows and related systems (setting up monitoring and alerts).
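Two items in this list, error handling and monitoring with alerts, can be illustrated together. The following Python sketch wraps a pipeline step in retries with linear backoff and calls an alert hook only after every attempt has failed. The names `run_with_retries`, `flaky_extract` and the alert callback are hypothetical, chosen for illustration; orchestrators such as Airflow have their own built-in retry settings, and this only shows the underlying idea.

```python
import logging
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


def run_with_retries(
    step: Callable[[], T],
    name: str,
    attempts: int = 3,
    delay_s: float = 0.0,
    alert: Optional[Callable[[str], None]] = None,
) -> T:
    """Run a pipeline step, retrying on failure; alert only when all attempts fail."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("step %s failed on attempt %d/%d: %s", name, attempt, attempts, exc)
            if attempt == attempts:
                # Hypothetical alert hook: could post to chat, page on-call, etc.
                (alert or log.error)(f"step {name} failed after {attempts} attempts: {exc}")
                raise
            time.sleep(delay_s * attempt)  # wait a little longer after each failure
    raise AssertionError("unreachable")


# Usage: a source that is unavailable twice and then responds.
calls = {"n": 0}


def flaky_extract() -> List[str]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source unavailable")
    return ["row1", "row2"]


alerts: List[str] = []
rows = run_with_retries(flaky_extract, "extract_orders", attempts=3, alert=alerts.append)
print(rows, calls["n"], alerts)
```

Transient failures are absorbed silently (apart from warnings in the log), while a step that keeps failing still reaches a human through the alert hook and re-raises, so the pipeline does not continue on missing data.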

There is one more specialization within the Data Engineer track: the ML engineer. In short, these engineers specialize in bringing machine learning models into industrial deployment and use. A model produced by a data scientist is often part of a research study and may not work in production conditions.

Data Scientist responsibilities

  • Extracting features from data in order to apply machine learning algorithms.
  • Using various machine learning tools to predict and classify patterns in data.
  • Improving the performance and accuracy of machine learning algorithms through fine-tuning and optimization.
  • Formulating "strong" hypotheses, aligned with company strategy, that need to be tested.
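For the last item, testing hypotheses, a permutation test is one common way to check whether a difference between two groups could plausibly be due to chance. The Python sketch below is a generic illustration, not a method described in the article; the group names and numbers are made up.

```python
import random
from math import fsum
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    # fsum is exactly rounded, so the result does not depend on element order.
    return fsum(xs) / len(xs)


def permutation_test(
    a: Sequence[float], b: Sequence[float], n_permutations: int = 10_000, seed: int = 0
) -> float:
    """Two-sided p-value for the difference in means between groups a and b."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical metric values for a new variant and the current version.
variant = [12.1, 11.8, 12.5, 13.0, 12.7]
control = [10.2, 10.9, 10.5, 11.1, 10.4]
p_value = permutation_test(variant, control)
print(f"p = {p_value:.4f}")
```

A small p-value means that random relabeling rarely produces a gap as large as the observed one, which supports (but does not prove) the hypothesis that the variant really differs.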

Both the Data Engineer and the Data Scientist make a tangible contribution to building a data culture, through which the company can generate additional profit or reduce costs.

What languages and tools do engineers and scientists work with?

Today, expectations of data engineers have changed. In the past, engineers assembled large SQL queries, wrote MapReduce jobs by hand and processed data with tools such as Informatica ETL, Pentaho ETL and Talend.

In 2020, a specialist cannot do without knowledge of Python and modern computation tools (for example, Airflow), or without understanding how to work with cloud platforms (using them to save on hardware while following security principles).

SAP, Oracle, MySQL and Redis are the traditional tools of data engineers in large companies. They are good, but licenses are so expensive that learning to work with them only makes sense on enterprise projects. Meanwhile, there is a free alternative in Postgres: it costs nothing and is suitable for more than just training.

Historically, there has been frequent demand for Java and Scala, although as technologies and practices evolve, these languages are fading into the background.

Even so, hardcore BigData (Hadoop, Spark and the rest of the zoo) is no longer a prerequisite for a data engineer, but rather a set of tools for solving problems that traditional ETL cannot handle.

The trend is toward services that let you use tools without knowing the language they are written in (for example, Hadoop without knowing Java), and toward ready-made services for processing streaming data (speech recognition or image recognition in video).

Enterprise solutions from SAS and SPSS are popular, while Tableau, Rapidminer, Stata and Julia are also widely used by data scientists for local tasks.

Analysts and data scientists gained the ability to build pipelines themselves only a few years ago: for example, it is now possible to send data to PostgreSQL-based storage with fairly simple scripts.
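As an illustration of how small such a script can be, here is a hedged Python sketch that loads a CSV export into a SQL table. To stay self-contained it uses the standard-library `sqlite3` module as a stand-in for a PostgreSQL-based store; with PostgreSQL you would connect through a driver such as psycopg2, use `%s` placeholders and write the upsert as `INSERT ... ON CONFLICT`. The table name `daily_sales` and the CSV columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export, e.g. produced by an analyst's notebook.
CSV_EXPORT = """day,region,revenue
2020-03-01,north,120.0
2020-03-01,south,80.5
2020-03-02,north,99.5
"""


def load_csv(conn: sqlite3.Connection, text: str) -> int:
    """Create the target table if needed and upsert every CSV row; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "day TEXT NOT NULL, region TEXT NOT NULL, revenue REAL NOT NULL, "
        "PRIMARY KEY (day, region))"
    )
    rows = [
        (r["day"], r["region"], float(r["revenue"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    with conn:  # commits on success, rolls back if any insert fails
        # The primary key makes re-running the script idempotent.
        conn.executemany(
            "INSERT OR REPLACE INTO daily_sales (day, region, revenue) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a real database connection
loaded = load_csv(conn, CSV_EXPORT)
total = conn.execute("SELECT SUM(revenue) FROM daily_sales").fetchone()[0]
print(loaded, total)  # prints 3 300.0
```

The upsert on a primary key is what keeps a hand-written script like this safe to re-run, which is exactly the kind of reliability concern that otherwise falls to the data engineer.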

Traditionally, running pipelines and integrated data structures remains the responsibility of data engineers. But today the trend toward T-shaped specialists with broad skills in adjacent fields is stronger than ever, because tools keep getting simpler.

Why Data Engineers and Data Scientists work together

By working closely with engineers, Data Scientists can focus on the research side and create production-ready machine learning algorithms.
Engineers, in turn, can focus on scalability, data reuse, and making sure that the data input and output pipelines of each individual project fit the global architecture.

This division of responsibilities ensures consistency across teams working on different machine learning projects.

Collaboration helps create new products efficiently. Speed and quality come from balancing services built for everyone (global storage or dashboard integration) against the implementation of each specific need or project (a specialized pipeline connecting external sources).

Working closely with data scientists and analysts helps engineers develop analytical and research skills and write better code. Knowledge sharing between data warehouse users and data lake users improves, which makes projects more agile and delivers more stable long-term results.

In companies that aim to develop a culture of working with data and to build business processes on it, the Data Scientist and the Data Engineer complement each other and together form a complete data analysis system.

In the next article, we will discuss what education Data Engineers and Data Scientists should have, which skills they need to develop and how the job market works.

From the Netology editors

If you are considering a career as a Data Engineer or Data Scientist, we invite you to explore our course programs:

Source: www.habr.com
