Who are data engineers, and how do you become one?

Hello again! The title of the article speaks for itself. Ahead of the start of the Data Engineer course, we suggest getting to know who data engineers are. The article contains many useful links. Enjoy the read.


A simple guide to catching the Data Engineering wave without letting it pull you under.

It seems like everyone wants to be a Data Scientist these days. But what about Data Engineering? In essence, it is a kind of hybrid of a data analyst and a data scientist: a data engineer is usually responsible for managing workflows, processing pipelines, and ETL processes. Because these functions are so important, it is currently one of the most popular tech specialties, and it keeps gaining momentum.

High salaries and huge demand are only a small part of what makes this job so attractive! If you want to join the ranks of the heroes, it's never too late to start learning. In this post, I've gathered all the information you need to take your first steps.

So, let's get started!

What is Data Engineering?

Honestly, there is no better explanation than this:

“A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him.”

- Gordon Lindsay Glegg

So the role of a data engineer is a very important one.

As the name suggests, data engineering deals with data: its delivery, storage, and processing. Accordingly, the main task of data engineers is to provide a reliable infrastructure for data. If we look at the AI hierarchy of needs, data engineering covers the first 2-3 stages: collection, movement and storage, and data preparation.


What does a data engineer do?

With the advent of big data, the scope of responsibilities has changed dramatically. Whereas these specialists used to write large SQL queries and refine data with tools such as Informatica ETL, Pentaho ETL, and Talend, the requirements for data engineers have now grown.

Most companies with open data engineering positions list the following requirements:

  • Excellent knowledge of SQL and Python.
  • Experience with cloud platforms, especially Amazon Web Services.
  • Knowledge of Java/Scala is preferred.
  • A good understanding of SQL and NoSQL databases (data modeling, data warehousing).

Keep in mind that these are only the essentials. From this list, you can infer that data engineers are specialists in software development and backend.
For example, if a company starts generating large amounts of data from various sources, your job as a data engineer is to organize the collection of that information, its processing, and its storage.

The set of tools used in this case can vary; it all depends on the volume of the data, how fast it arrives, and how heterogeneous it is. Most companies don't deal with big data at all, so for the central repository, the so-called data warehouse, you can use an SQL database (PostgreSQL, MySQL, etc.).
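
Heterogeneity is easiest to see in code. Below is a minimal sketch, using only Python's standard library, of mapping two differently shaped sources onto one common schema before loading them into a warehouse table. The field names (`amount_usd`, `amount_cents`, `epoch`, etc.), the sample records, and the assumption that the CSV timestamps are already UTC are all invented for illustration.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical raw inputs: the same kind of event arrives as CSV from one system
# and as JSON (with different field names and units) from another.
CSV_SOURCE = "user_id,amount_usd,ts\n42,19.99,2024-01-05T10:00:00\n7,5.00,2024-01-05T11:30:00\n"
JSON_SOURCE = '[{"uid": "42", "amount_cents": 1250, "epoch": 1704456000}]'

def from_csv(text):
    """Map CSV rows onto the common schema (user_id, amount, ts_utc)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "user_id": int(row["user_id"]),
            "amount": float(row["amount_usd"]),
            # Assumption: naive timestamps in this source are already UTC
            "ts_utc": datetime.fromisoformat(row["ts"]).replace(tzinfo=timezone.utc),
        }

def from_json(text):
    """Map JSON records onto the same schema, converting cents to dollars."""
    for rec in json.loads(text):
        yield {
            "user_id": int(rec["uid"]),
            "amount": rec["amount_cents"] / 100,
            "ts_utc": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
        }

def unified_records():
    # One stream in one shape, ready to be inserted into the warehouse table
    return list(from_csv(CSV_SOURCE)) + list(from_json(JSON_SOURCE))

for r in unified_records():
    print(r["user_id"], r["amount"], r["ts_utc"].isoformat())
```

In a real setup the unified rows would then be written to PostgreSQL, MySQL, or whichever database plays the warehouse role.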

IT giants such as Google, Amazon, Facebook, or Dropbox have higher requirements:

  • Knowledge of Python, Java, or Scala.
  • Experience with big data: Hadoop, Spark, Kafka.
  • Knowledge of algorithms and data structures.
  • Understanding of the fundamentals of distributed systems.
  • Experience with data visualization tools such as Tableau or ElasticSearch is a plus.

That is, there is a clear shift toward big data, and specifically toward processing it under high load. These companies have raised their requirements for system fault tolerance.
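
Fault tolerance is a property of the whole system, but one of its everyday building blocks is retrying transient failures with exponential backoff and jitter. Below is a minimal sketch; `retry` and `flaky_fetch` are hypothetical names, and which exceptions count as transient (`ConnectionError` and `TimeoutError` here) is an assumption you would adapt to the client library you actually use.

```python
import random
import time

def retry(fn, attempts=5, base_delay=0.1, max_delay=2.0,
          retry_on=(ConnectionError, TimeoutError)):
    """Call fn(); on a transient error, wait with exponential backoff plus jitter and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep
            time.sleep(delay * random.uniform(0.5, 1.0))

# Hypothetical flaky dependency: fails twice, then succeeds
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("node unavailable")
    return "payload"

print(retry(flaky_fetch, base_delay=0.01))
```

Retries like this are only safe for idempotent operations; otherwise a request that actually succeeded but timed out on the way back could be applied twice.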

Data Engineers vs. Data Scientists

Okay, that was a simple and funny comparison (nothing personal), but in reality things are much more complicated.

First, you should know that there is a lot of ambiguity in how the roles and skills of a data scientist and a data engineer are defined. That is, you can easily get confused about which skills you need to be a successful data engineer. Of course, some skills overlap between the two roles. But there are also a number of diametrically opposed ones.

Data science is serious business, but we are moving toward a world of functional data science in which practitioners can do their own analytics. To enable data pipelines and integrated data structures, you need data engineers, not data scientists.

Is a data engineer in greater demand than a data scientist?

- Yes, because before you can bake a carrot cake, you first have to harvest, clean, and stock up on carrots!

A data engineer understands programming better than any data scientist, but when it comes to statistics, the opposite is true.

But here is the data engineer's advantage:

Without one, the value of a prototype model (usually a chunk of poor-quality code in a Python file, handed over by a data scientist, that somehow produces a result) tends toward zero.

Without a data engineer, this code will never become a project, and no business problem will be solved effectively. A data engineer works to turn all of this into a product.

The basics a data engineer should know


So if this work sparks something in you and you're enthusiastic, you can learn it, master all the necessary skills, and become a real rock star in data engineering. And yes, you can pull this off even without programming skills or other technical knowledge. It's hard, but possible!

What are the first steps?

You need a general idea of what's what.

First of all, Data Engineering is rooted in computer science. More specifically, you need to understand efficient algorithms and data structures. Second, since data engineers work with data, you need to understand how databases work and the structures that underlie them.

For example, conventional SQL databases are built on the B-tree data structure, while modern distributed stores rely on LSM-trees and various modifications of hash tables.
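
To make the B-tree vs. LSM-tree contrast a bit more tangible, here is a toy sketch of the LSM idea in Python: buffer writes in memory, flush them as immutable sorted runs, and read from newest to oldest. The class name `TinyLSM` and the tiny flush threshold are illustrative assumptions; real engines (RocksDB, or Cassandra's storage engine, for example) add a write-ahead log, background compaction, tombstones for deletes, and Bloom filters.

```python
import bisect

class TinyLSM:
    """Toy LSM-tree: writes land in an in-memory table (memtable); when it fills up,
    it is flushed as an immutable sorted run. Reads check the newest data first.
    No write-ahead log, compaction, tombstones, or Bloom filters."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # immutable sorted runs stored as (keys, values), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Writing a whole sorted run at once is the sequential write LSM-trees are built around
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # newer runs shadow older ones
            i = bisect.bisect_left(keys, key)  # binary search inside one sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable full: flushed as run 1
db.put("a", 10)
db.put("c", 3)   # flushed as run 2, which holds the newer value of "a"
db.put("d", 4)   # still in the memtable
print(db.get("a"), db.get("b"), db.get("d"), db.get("z"))  # 10 2 4 None
```

A B-tree, by contrast, updates pages in place, which favors reads; the LSM approach trades some read work (checking several runs) for cheap sequential writes.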

*These steps are based on a great article by Adil Khashtamov. So if you know Russian, support this author and read his post.

1. Algorithms and data structures

Using the right data structure can dramatically improve an algorithm's performance. Ideally, we would all learn about data structures and algorithms at school, but this is rarely covered. In any case, it's never too late to get acquainted.
So here are my favorite free courses for learning data structures and algorithms:

And don't forget Thomas Cormen's classic work on algorithms, Introduction to Algorithms. It's the perfect reference when you need to refresh your memory.

  • To sharpen your skills, use LeetCode.
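
A quick way to feel how much the choice of data structure matters: the sketch below counts how many IDs from one list appear in another, once with a plain list and once with a set. The data is synthetic and absolute timings depend on your machine; the point is that list membership is a linear scan, while set membership is an average O(1) hash lookup.

```python
import timeit

ids = list(range(2_000))             # synthetic "known" IDs
lookups = list(range(0, 4_000, 2))   # 2,000 probes, half of them present

def count_hits_list(haystack, needles):
    # `x in list` scans element by element: O(len(haystack)) per probe
    return sum(1 for n in needles if n in haystack)

def count_hits_set(haystack, needles):
    # Build a hash set once; each probe is then O(1) on average
    seen = set(haystack)
    return sum(1 for n in needles if n in seen)

t_list = timeit.timeit(lambda: count_hits_list(ids, lookups), number=3)
t_set = timeit.timeit(lambda: count_hits_set(ids, lookups), number=3)
print(f"list: {t_list:.4f}s, set: {t_set:.4f}s")
```

Both functions return the same answer; only the cost differs, and the gap widens as the inputs grow.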

You can dive into the world of databases with the excellent Carnegie Mellon University videos on YouTube:

2. Learn SQL

Our whole life is data. And to extract that data from a database, you need to "speak" its language.

SQL (Structured Query Language) is the lingua franca of the data domain. Whatever anyone says, SQL has lived, is alive, and will live for a very long time.

If you've been in development for a while, you've probably noticed that rumors of SQL's imminent death pop up every now and then. The language was developed in the early 1970s and is still hugely popular among analysts, developers, and hobbyists alike.
Without SQL there is nothing to do in data engineering, since you will inevitably have to write queries to retrieve data. All modern big data warehouses support SQL:

  • Amazon Redshift
  • HP Vertica
  • Oracle
  • SQL Server

...and many others.

To analyze large volumes of data stored in distributed systems such as HDFS, SQL engines were invented: Apache Hive, Impala, etc. See, SQL isn't going anywhere.

How do you learn SQL? Just practice.

For this, I'd recommend an excellent tutorial (free, by the way) from Mode Analytics.

  1. Intermediate SQL
  2. Joining Data in SQL

What makes these courses special is their interactive environment, where you can write and run SQL queries right in your browser. The Modern SQL resource is also worth a look. And you can apply this knowledge to the LeetCode problems in the Database section.
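
Since JOINs are the part of SQL you will write most often as a data engineer, here is a small self-contained example to practice on. It uses Python's built-in `sqlite3` module with an in-memory database, so it runs anywhere; the `users`/`orders` tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), amount REAL);
    INSERT INTO users  VALUES (1, 'DE'), (2, 'US'), (3, 'US');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 5.0), (12, 2, 7.5), (13, 3, 12.5);
""")

# Order count and revenue per country: JOIN the two tables, then aggregate
query = """
    SELECT u.country, COUNT(o.id) AS orders, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN users  AS u ON u.id = o.user_id
    GROUP BY u.country
    ORDER BY revenue DESC
"""
for row in conn.execute(query):
    print(row)  # ('US', 3, 25.0) then ('DE', 1, 20.0)
```

Making `users` the left table of a `LEFT JOIN` would also keep users who have no orders, which is a common variation worth trying.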

3. Programming in Python and Java/Scala

I have already explained why you should learn Python in my article Python vs R. Choosing the Best Tool for AI, ML & Data Science. As for Java and Scala, most of the tools for storing and processing huge amounts of data are written in these languages. For example:

  • Apache Kafka (Scala)
  • Hadoop, HDFS (Java)
  • Apache Spark (Scala)
  • Apache Cassandra (Java)
  • HBase (Java)
  • Apache Hive (Java)

To understand how these tools work, you need to know the languages they are written in. Scala's functional approach lets you solve parallel data processing problems efficiently. Python, unfortunately, can't boast the same speed and parallelism. In general, knowing several languages and programming paradigms broadens the range of approaches you can take to a problem.
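
The functional style boils down to pure functions plus an associative merge, which is exactly what lets an engine split work across partitions. Here is a minimal Python sketch of that map/reduce shape: a word count over hypothetical, already-partitioned text. In standard CPython the thread pool mainly illustrates the structure; because of the GIL it does not speed up CPU-bound work, which matches the remark about Python's parallelism. Spark would run the same shape across executors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical input, already split into partitions the way a cluster would shard it
partitions = [
    "spark kafka spark",
    "hadoop spark",
    "kafka kafka hive",
]

def map_partition(text):
    # Pure function: the result depends only on the input partition,
    # so partitions can be processed independently and in any order
    return Counter(text.split())

def merge(a, b):
    # Associative merge: partial results can be combined in any grouping
    return a + b

with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_partition, partitions))

totals = reduce(merge, partial_counts, Counter())
print(totals.most_common())
```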

To dive into Scala, you can read Programming in Scala by the language's creator. Twitter has also published a good introductory guide, Scala School.

As for Python, I consider Fluent Python the best intermediate-level book.

4. Tools for working with big data

Here is a list of the most popular tools in the big data world:

  • Apache Spark
  • Apache Kafka
  • Apache Hadoop (HDFS, HBase, Hive)
  • Apache Cassandra

You can learn more about the building blocks of big data in this amazing interactive environment. The most popular tools are Spark and Kafka. They are definitely worth studying, and it's advisable to understand how they work under the hood. Jay Kreps (co-author of Kafka) published a monumental work in 2013, The Log: What every software engineer should know about real-time data's unifying abstraction. By the way, the core ideas from this hefty piece were used to create Apache Kafka.
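
The core idea behind the log can be sketched in a few lines: an append-only, ordered sequence of records addressed by offsets, with each reader tracking its own position. The classes below (`AppendOnlyLog`, `Consumer`) are hypothetical illustrations, not Kafka's API; real Kafka adds partitioning, replication, retention, and durable on-disk storage.

```python
class AppendOnlyLog:
    """Toy version of the log abstraction: an ordered, append-only sequence of
    records, each addressed by a monotonically increasing offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=10):
        # Reading never removes data; each consumer tracks its own position
        return self._records[offset:offset + max_records]

class Consumer:
    """Hypothetical reader that remembers its own offset into the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # independent of every other consumer's offset

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

log = AppendOnlyLog()
for event in ["signup:42", "click:42", "purchase:7"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
print(fast.poll())               # all three events
print(slow.poll(max_records=1))  # only the first; `slow` keeps its own position
```

Because records are never consumed destructively, any number of downstream systems can replay the same stream at their own pace, which is what makes the log useful as a data integration backbone.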

5. Cloud platforms


Knowledge of at least one cloud platform is among the basic requirements for data engineering candidates. Employers prefer Amazon Web Services, with Google Cloud Platform in second place and Microsoft Azure rounding out the top three.

You should have a good working knowledge of Amazon EC2, AWS Lambda, Amazon S3, and DynamoDB.
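
As a small taste of AWS Lambda and S3 working together, here is a sketch of a Python Lambda handler that reacts to an S3 "object created" notification and lists which objects arrived. The `Records[].s3.bucket.name` / `Records[].s3.object.key` layout follows the S3 event notification format; the handler name `lambda_handler` is just a common convention, and the bucket name, key, and trimmed sample event are made up for a local run. Reading the object itself (for example with boto3) is deliberately left out so the snippet runs without AWS credentials.

```python
import json
import urllib.parse

def lambda_handler(event, context):
    """Sketch of a handler for S3 "object created" notifications.
    It only extracts which objects arrived; a real function would go on to read
    them and load the results somewhere, such as DynamoDB or a warehouse."""
    arrived = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        arrived.append(f"s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps(arrived)}

# Local test with a hand-written, trimmed-down event (real events carry many more fields)
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-events"},
                "object": {"key": "2024/01/05/click+log.json"}}}
    ]
}
print(lambda_handler(sample_event, context=None))
```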

6. Distributed systems

Working with big data implies clusters of computers that operate independently and communicate over a network. The larger the cluster, the higher the probability that some of its member nodes will fail. To become a great data specialist, you need to understand the problems of distributed systems and the existing solutions to them. This field is old and complex.
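
A back-of-the-envelope calculation shows why failures become routine as clusters grow. Assume, as a simplification, that each of n nodes fails independently with probability p within some time window (real failures are often correlated, so treat this as a rough estimate):

```latex
P(\text{at least one node fails}) = 1 - (1 - p)^{n}

p = 0.001:\quad n = 10 \;\Rightarrow\; 1 - 0.999^{10} \approx 0.01,
\qquad n = 1000 \;\Rightarrow\; 1 - 0.999^{1000} \approx 1 - e^{-1} \approx 0.63
```

So an event that is a 1-in-1000 occurrence for a single machine becomes more likely than not across a thousand machines, which is why distributed systems are designed around replication and retries.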

Andrew Tanenbaum is considered a pioneer in this field. For those who aren't afraid of theory, I recommend his book Distributed Systems. It may seem difficult for beginners, but it will really help you sharpen your skills.

I consider Designing Data-Intensive Applications by Martin Kleppmann the best introductory book. By the way, Martin has a wonderful blog. His work will help you systematize your knowledge of how modern infrastructure for storing and processing big data is built.
For those who like watching videos, there is a course on YouTube, Distributed Computer Systems.

7. Data pipelines


Data pipelines are something you can't live without as a data engineer.

Most of the time, a data engineer builds a so-called data pipeline, that is, a process for delivering data from one place to another. These can be custom scripts that call an external service's API or run an SQL query, enrich the data, and put it into centralized storage (a data warehouse) or storage for unstructured data (a data lake).
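
In its simplest form, a pipeline like this is extract, transform (clean and enrich), load. The sketch below strings those three steps together in plain Python. Assumptions: `extract()` returns hard-coded records instead of calling a real external API, the `USER_SEGMENTS` lookup is invented enrichment data, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

def extract():
    """Stand-in for a call to an external service's API (hypothetical data, no network)."""
    return [
        {"order_id": 1, "user_id": 42, "amount": "19.99"},
        {"order_id": 2, "user_id": 7, "amount": "5.00"},
        {"order_id": 3, "user_id": 42, "amount": None},  # bad record from upstream
    ]

USER_SEGMENTS = {42: "premium", 7: "free"}  # hypothetical enrichment lookup

def transform(rows):
    """Clean and enrich: drop records without an amount, cast types, add the user's segment."""
    for r in rows:
        if r["amount"] is None:
            continue
        yield (r["order_id"], r["user_id"], float(r["amount"]),
               USER_SEGMENTS.get(r["user_id"], "unknown"))

def load(conn, rows):
    """Write into a central table; sqlite3 stands in for a real warehouse here."""
    conn.execute("""CREATE TABLE IF NOT EXISTS orders_enriched (
                        order_id INTEGER PRIMARY KEY, user_id INTEGER,
                        amount REAL, segment TEXT)""")
    # INSERT OR REPLACE keeps re-runs idempotent: an existing order_id is overwritten, not duplicated
    conn.executemany("INSERT OR REPLACE INTO orders_enriched VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT segment, SUM(amount) FROM orders_enriched GROUP BY segment").fetchall())
```

Making the load step idempotent is a deliberate choice: pipelines get re-run after failures, and a re-run should not double-count data.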

Summary: the data engineer's basic checklist


To sum up, you need a good understanding of the following:

  • Information systems;
  • Software development (Agile, DevOps, design techniques, SOA);
  • Distributed systems and parallel programming;
  • Database fundamentals: planning, design, operation, and troubleshooting;
  • Design of experiments: A/B tests to prove concepts, determine reliability and system performance, and develop reliable ways to deliver good solutions quickly.
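
For the A/B-testing item, the workhorse calculation is often a two-proportion z-test on conversion rates. The sketch below implements it with the standard library only; the conversion counts are made-up numbers, and the test assumes independent users and samples large enough for the normal approximation.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.
    Returns (z, p_value); assumes independent samples and a valid normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 10.0% vs 12.5% conversion with 2,000 users per arm
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these numbers z comes out at about 2.5 and the two-sided p-value at roughly 0.012, so at a 5% significance level the difference would be called significant.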

These are just a few of the requirements for becoming a data engineer, so study and understand data systems, information systems, continuous delivery/deployment/integration, programming languages, and other computer science topics (not every subject area, though).

And finally, one last but very important thing I want to say.

The path to becoming a Data Engineer is not as easy as it may seem. It is unforgiving and frustrating, and you have to be prepared for that. Some moments on this journey may push you to give up. But this is real work and a real learning process.

Just don't sugarcoat it from the start. The whole point of the journey is to learn as much as possible and be ready for new challenges.
Here is a great picture I came across that illustrates this point well:


And yes, remember to avoid burnout and to rest. This is just as important. Good luck!

What do you think of the article, friends? We invite you to a free webinar that will take place today at 20:00. During the webinar, we'll discuss how to build an efficient, scalable data processing system for a small company or startup at minimal cost. As usual, we'll get acquainted with Google Cloud's data processing tools. See you there!

Source: www.habr.com
