Hello again! The title of the article speaks for itself. Ahead of the start of the course
A simple guide on how to catch the data engineering wave and not let it drag you into the abyss.
It seems like everyone wants to be a Data Scientist these days. But what about Data Engineering? In essence, it is a kind of hybrid of a data analyst and a data scientist; a data engineer is typically responsible for managing workflows, processing pipelines, and ETL processes. Because of the importance of these functions, it is currently another popular professional buzzword that is gaining momentum.
High salaries and huge demand are just a small part of what makes this job so attractive! If you want to join the ranks of heroes, it's never too late to start learning. In this post, I have gathered all the information you need to take your first steps.
What Is Data Engineering?
Honestly, there is no better explanation than this:
"A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him."
- Gordon Lindsay Glegg
So, the role of the data engineer is really important.
As the name suggests, data engineering deals with data, namely its delivery, storage, and processing. Accordingly, the main task of an engineer is to provide a reliable infrastructure for data. If we look at the AI hierarchy of needs, data engineering occupies the first 2-3 stages: collection, movement and storage, and data preparation.
What does a data engineer do?
With the advent of big data, the scope of responsibility has changed dramatically. Whereas these specialists used to write large SQL queries and move data with tools such as Informatica ETL, Pentaho ETL, and Talend, the requirements for data engineers have now risen.
Most companies with open data engineer positions have the following requirements:
- Excellent knowledge of SQL and Python.
- Experience with cloud platforms, in particular Amazon Web Services.
- Knowledge of Java/Scala is preferred.
- A good understanding of SQL and NoSQL databases (data modeling, data warehousing).
Keep in mind, these are only the essentials. From this list, one can assume that data engineers are specialists in software engineering and backend development.
For example, if a company starts generating a large amount of data from various sources, your task as a data engineer is to organize the collection of that information, its processing, and its storage.
The list of tools used in this case may vary; it all depends on the volume of the data, the speed at which it arrives, and its heterogeneity. Most companies don't deal with big data at all, so as a centralized repository, the so-called data warehouse, you can use an SQL database (PostgreSQL, MySQL, etc.) with a small set of scripts that feed data into the warehouse.
IT giants like Google, Amazon, Facebook, or Dropbox have higher requirements:
- Knowledge of Python, Java, or Scala.
- Experience with big data: Hadoop, Spark, Kafka.
- Knowledge of algorithms and data structures.
- An understanding of the fundamentals of distributed systems.
- Experience with data visualization tools like Tableau or ElasticSearch will be a plus.
That is, there is a clear shift toward big data, namely toward processing it under high loads. These companies have increased requirements for system fault tolerance.
Data Engineers vs. Data Scientists
Okay, that was a simple and funny comparison (nothing personal), but in reality it is much more complicated.
First, you should know that there is a lot of ambiguity in delineating the roles and skills of a data scientist and a data engineer. That is, you can easily be confused about which skills are needed to be a successful data engineer. Of course, certain skills overlap between the two roles. But there are also a number of diametrically opposite skills.
Data science is a serious business, but we are moving toward a world of functional data science where practitioners can do their own analytics. To enable data pipelines and integrated data structures, you need data engineers, not data scientists.
Is a data engineer in greater demand than a data scientist?
- Yes, because before you can make a carrot cake, you first need to harvest, peel, and stock up on carrots!
A data engineer understands programming better than any data scientist, but when it comes to statistics, the opposite is true.
But here is the data engineer's advantage:
Without him, the value of a prototype model, which most often consists of a piece of terrible-quality code in a Python file, obtained from a data scientist and somehow producing a result, tends to zero.
Without a data engineer, this code will never become a project, and no business problem will be effectively solved. The data engineer tries to turn all of this into a product.
The basic knowledge a data engineer should have
So, if this job sparks a light in you and you are enthusiastic, you can learn it, master all the necessary skills, and become a real rock star in the field of data engineering. And, yes, you can pull this off even without programming skills or other technical knowledge. It's hard, but possible!
What are the first steps?
You should have a general idea of what's what.
First of all, Data Engineering belongs to computer science. Specifically, you need to understand efficient algorithms and data structures. Second, since data engineers work with data, you need to understand how databases work and the structures that underlie them.
For example, conventional B-tree SQL databases are based on the B-Tree data structure, while modern distributed stores rely on the LSM-Tree and other hash table modifications.
*These steps are based on an excellent article
1. Algorithms and data structures
Using the right data structure can drastically improve the performance of an algorithm. Ideally, we should all be learning about data structures and algorithms at school, but this is rarely covered. In any case, it's never too late to get acquainted.
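As a rough illustration of this point, here is a minimal Python sketch that keeps only the event IDs that also appear in a list of known IDs. The function names and the sample data are made up for this example. The only general claim is that a hash-based set gives constant-time membership checks on average, while a list has to be scanned element by element.

```python
def common_ids_list(events, known_ids):
    """Membership test against a list: every lookup scans the list, O(n)."""
    return [e for e in events if e in known_ids]


def common_ids_set(events, known_ids):
    """Same result, but lookups go through a set, O(1) on average."""
    known = set(known_ids)
    return [e for e in events if e in known]


# Hypothetical sample data, only to show that both versions agree.
events = [3, 7, 42, 7, 100]
known_ids = [7, 42, 99]

print(common_ids_list(events, known_ids))
print(common_ids_set(events, known_ids))
```

On five elements the difference is invisible; with millions of events and a long list of known IDs, the list version does work proportional to the product of the two sizes, while the set version stays roughly linear.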
So here are my favorite free courses for learning data structures and algorithms:
- Easy to Advanced Data Structures (Udemy)
- Algorithms, Part I (Coursera)
- Algorithms, Part II (Coursera)
And don't forget Thomas Cormen's classic work on algorithms.
To sharpen your skills, use Leetcode.
You can also dive into the world of databases with the amazing videos from Carnegie Mellon University on YouTube:
2. Learn SQL
Our whole life is data. And to extract this data from a database, you need to "speak" the same language as it does.
SQL (Structured Query Language) is the language for communicating with a database. No matter what anyone says, SQL has lived, is alive, and will live for a very long time.
If you have been in development for a long time, you have probably noticed that rumors about the imminent death of SQL appear periodically. The language was developed in the early 70s and is still wildly popular among analysts, developers, and enthusiasts.
Without knowledge of SQL there is nothing to do in data engineering, since you will inevitably have to write queries to retrieve data. All modern big data warehouses support SQL:
- Amazon Redshift
- HP Vertica
- Oracle
- SQL Server
... and many others.
To analyze large volumes of data stored in distributed systems like HDFS, SQL engines were invented: Apache Hive, Impala, etc. See, it isn't going anywhere.
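To make "writing queries to retrieve data" concrete, here is a small sketch that runs a typical aggregation query from Python using the standard-library sqlite3 module. The orders table, its columns, and the sample rows are invented for illustration; a real warehouse such as Redshift or Vertica would need its own driver, though the SQL would look much the same.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Total amount per customer, largest total first.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

for customer, total in rows:
    print(customer, total)

conn.close()
```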
How do you learn SQL? Just do it in practice.
For this, I would recommend checking out an excellent tutorial, which, by the way, is free, from
What makes these courses special is that they have an interactive environment where you can write and run SQL queries right in your browser. The resource
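3. Programming in Python and Java/Scala
I have already written about why you should learn the Python programming language in the article
As for Java and Scala, many tools for storing and processing large volumes of data are written in these languages, for example: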
- Apache Kafka (Scala)
- Hadoop, HDFS (Java)
- Apache Spark (Scala)
- Apache Cassandra (Java)
- HBase (Java)
- Apache Hive (Java)
To understand how these tools work, you need to know the languages they are written in. Scala's functional approach lets you solve parallel data processing problems efficiently. Python, unfortunately, cannot boast of speed or parallel processing. In general, knowing several languages and programming paradigms broadens the range of approaches you can bring to a problem.
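The functional idea mentioned here is not tied to Scala. The following is a minimal sketch of it in Python: a word count split into a "map" step and a "reduce" step built from pure functions. The chunk contents and function names are assumptions made for the example, and the code runs sequentially; the point is only that steps without shared state are the ones a framework could safely distribute across workers.

```python
from collections import Counter
from functools import reduce


def count_words(chunk):
    """Map step: count the words in one chunk, touching no shared state."""
    return Counter(chunk.lower().split())


def merge(left, right):
    """Reduce step: combine two partial counts into a new Counter."""
    return left + right


# Hypothetical input, already split into independent chunks.
chunks = ["spark kafka spark", "kafka hive", "spark"]

partial_counts = map(count_words, chunks)
total = reduce(merge, partial_counts, Counter())
print(total.most_common())
```

Because `count_words` depends only on its argument, each chunk could be processed on a different machine and the partial results merged in any order.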
To dive into Scala, you can read
As for Python, I believe
4. Tools for working with big data
Here is a list of the most popular tools in the big data world:
- Apache Spark
- Apache Kafka
- Apache Hadoop (HDFS, HBase, Hive)
- Apache Cassandra
You can find more information about the building blocks of big data in this amazing
- A good introduction to Hadoop is A Complete Guide to Mastering Hadoop (free).
- For me, the most comprehensive guide to Apache Spark is Spark: The Definitive Guide.
5. Cloud platforms
Knowledge of at least one cloud platform is on the list of basic requirements for data engineer applicants. Employers prefer Amazon Web Services, with Google's cloud platform in second place and Microsoft Azure rounding out the top three.
You should be well versed in Amazon EC2, AWS Lambda, Amazon S3, and DynamoDB.
6. Distributed systems
Working with big data implies clusters of independently operating computers that communicate over a network. The larger the cluster, the greater the likelihood that some of its member nodes will fail. To become a great data engineer, you need to understand the problems of distributed systems and the existing solutions to them. This area is old and complex.
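The link between cluster size and failures can be shown with a back-of-the-envelope calculation. The sketch below assumes that nodes fail independently and that each has the same failure probability over some period; both are simplifying assumptions, and the 0.001 figure is invented for illustration. Under them, the chance that at least one of n nodes fails is 1 - (1 - p)^n.

```python
def p_any_node_fails(n_nodes, p_node):
    """Probability that at least one of n independent nodes fails."""
    return 1 - (1 - p_node) ** n_nodes


# Illustrative per-node failure probability; not a measured value.
for n in (1, 10, 100, 1000):
    print(n, round(p_any_node_fails(n, 0.001), 4))
```

The probability grows quickly with the number of nodes, which is why large systems are designed on the assumption that something is always broken.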
Andrew Tanenbaum is considered a pioneer in this field. For those who are not afraid of theory, I recommend his book
I think
For those who like watching videos, there is a course on YouTube
7. Data pipelines
Data pipelines are something you cannot live without as a data engineer.
Most of the time, a data engineer builds a so-called data pipeline, that is, a process for delivering data from one place to another. These can be custom scripts that call an external service's API or run an SQL query, enrich the data, and put it into a centralized store (a data warehouse) or an unstructured data store (a data lake).
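The description above maps onto three small functions. This is only a sketch under stated assumptions: `extract` returns hard-coded records in place of a real API call or source query, the payments table and its columns are invented, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def extract():
    """Stand-in for a call to an external API or a source SQL query."""
    return [
        {"customer": "alice", "amount_cents": 1250},
        {"customer": "bob", "amount_cents": 399},
    ]


def transform(records):
    """Enrich each record: convert cents to units and add a size label."""
    for r in records:
        amount = r["amount_cents"] / 100
        label = "large" if amount >= 10 else "small"
        yield (r["customer"], amount, label)


def load(conn, rows):
    """Write the enriched rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(customer TEXT, amount REAL, bucket TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(conn, transform(extract()))

loaded = conn.execute(
    "SELECT customer, amount, bucket FROM payments ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps separate makes it easy to swap the source or the destination without touching the enrichment logic.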
Summary: the basic data engineer checklist
To sum up, a good understanding of the following is required:
- Information systems;
- Software development (Agile, DevOps, Design Techniques, SOA);
- Distributed systems and parallel programming;
- Database fundamentals: planning, design, operation, and troubleshooting;
- Design of experiments: A/B tests to prove concepts, determine reliability and system performance, and develop reliable ways to deliver good solutions quickly.
These are just a few of the requirements for becoming a data engineer, so learn and understand data systems, information systems, continuous delivery/deployment/integration, programming languages, and other computer science topics (not all subject areas).
And finally, the last but very important thing I want to say.
The path to becoming a data engineer is not as simple as it may seem. It is unforgiving and frustrating, and you have to be prepared for that. Some moments on this journey may push you to give up. But this is real work and a learning process.
Just don't sugarcoat it from the start. The whole point of the journey is to learn as much as possible and be ready for new challenges.
Here is a nice picture I came across that illustrates this point well:
And yes, remember to avoid burnout and to rest. That is also very important. Good luck!
What do you think of the article, friends? We invite you to
Source: www.habr.com