Hello again! The title of this article speaks for itself. In anticipation of the start of the course
A simple guide to catching the Data Engineering wave without letting it drag you into the abyss.
It seems like everyone wants to be a Data Scientist these days. But what about Data Engineering? Essentially, it is a hybrid of a data analyst and a data scientist; a data engineer is usually responsible for managing workflows, processing pipelines, and ETL processes. Because these functions are so important, it is currently another popular professional buzzword that is actively gaining momentum.
High salaries and huge demand are only a small part of what makes this job so attractive! If you want to join the ranks of the heroes, it is never too late to start learning. In this post, I have gathered all the information you need to take your first steps.
What is Data Engineering?
Honestly, there is no better explanation than this:
"A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him."
- Gordon Lindsay Glegg
So, the role of the data engineer is an important one.
As the name suggests, data engineering deals with data, namely its delivery, storage, and processing. Accordingly, the main task of engineers is to provide reliable infrastructure for data. If we look at the AI hierarchy of needs, data engineering occupies the first 2-3 stages: collection, movement and storage, and data preparation.
What does a data engineer do?
With the advent of big data, the scope of responsibility has changed dramatically. Where these specialists used to write large SQL queries and move data with tools such as Informatica ETL, Pentaho ETL, and Talend, the requirements for data engineers have now grown.
Most companies with open data engineer positions have the following requirements:
- Excellent knowledge of SQL and Python.
- Experience with cloud platforms, in particular Amazon Web Services.
- Knowledge of Java / Scala is preferred.
- Good understanding of SQL and NoSQL databases (data modeling, data warehousing).
Keep in mind that these are only the essentials. From this list, it can be assumed that data engineers are specialists in software engineering and backend development.
For example, if a company starts generating a large amount of data from different sources, your task as a data engineer is to organize the collection of that information, its processing, and its storage.
The list of tools used in this case may vary; it all depends on the volume of the data, the speed at which it arrives, and its heterogeneity. Most companies do not deal with big data at all, so as a centralized repository, the so-called data warehouse, you can use an SQL database (PostgreSQL, MySQL, etc.) with a small set of scripts that feed data into the warehouse.
IT giants such as Google, Amazon, Facebook or Dropbox have higher requirements:
- Knowledge of Python, Java or Scala.
- Experience with big data: Hadoop, Spark, Kafka.
- Knowledge of algorithms and data structures.
- Understanding of the fundamentals of distributed systems.
- Experience with data visualization tools such as Tableau or ElasticSearch will be a plus.
That is, there is a clear shift toward big data, namely toward processing it under heavy load. These companies have increased requirements for system fault tolerance.
Data Engineers vs. Data Scientists
Okay, that was a simple and funny comparison (nothing personal), but in reality things are much more complicated.
First, you should know that there is a lot of ambiguity in how the roles and skills of a data scientist and a data engineer are delineated. That is, you can easily get confused about which skills are needed to be a successful data engineer. Of course, some skills overlap between the two roles. But there is also a whole set of diametrically opposed skills.
Data science is a serious business, but we are moving toward a world of functional data science where practitioners can do their own analytics. To enable data pipelines and integrated data structures, you need data engineers, not data scientists.
Is a data engineer more in demand than a data scientist?
- Yes, because before you can make a carrot cake, you first need to harvest, peel and stock up on carrots!
A data engineer understands programming better than any data scientist, but when it comes to statistics, the opposite is true.
But here is the advantage of the data engineer:
Without them, the value of a prototype model, which most often consists of a piece of terrible-quality code in a Python file, obtained from a data scientist and somehow producing a result, tends to zero.
Without a data engineer, this code will never become a project, and no business problem will be solved effectively. The data engineer tries to turn all of this into a product.
Basic information a data engineer should know
So, if this job sparks a light in you and you are enthusiastic about it, you can learn it, master all the necessary skills, and become a real rock star in the field of data engineering. And yes, you can pull this off even without programming skills or other technical knowledge. It is hard, but possible!
What are the first steps?
You should have a general idea of what is what.
First of all, Data Engineering belongs to computer science. More specifically, you must understand efficient algorithms and data structures. Second, since data engineers work with data, you need to understand the principles of databases and the structures that underlie them.
For example, conventional SQL databases are built on the B-Tree data structure, while modern distributed repositories use the LSM-Tree and other modifications of hash tables.
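To make the LSM-Tree idea a little more concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: the class name `TinyLSM`, its `memtable_limit` parameter and the in-memory "runs" are made up for this example, and the things a real storage engine needs (write-ahead log, compaction, files on disk) are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # most recent writes, unsorted
        self.runs = []                        # flushed runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A full memtable is written out as one sorted, immutable run.
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.runs.append((keys, values))
        self.memtable = {}

    def get(self, key, default=None):
        if key in self.memtable:
            return self.memtable[key]
        # Check the newest run first so the latest value for a key wins.
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)  # binary search in a sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return default


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable is full, so it is flushed as a sorted run
db.put("a", 10)   # the newer value for "a" stays in the memtable
print(db.get("a"), db.get("b"), db.get("z"))  # prints: 10 2 None
```

The point of the sketch is the write path: writes only touch the memtable and are later flushed in sorted batches, whereas a B-Tree updates its pages in place.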
*These steps are based on a great article
1. Algorithms and data structures
Using the right data structure can drastically improve the performance of an algorithm. Ideally, we should all be learning data structures and algorithms in school, but this is rarely covered. In any case, it is never too late to get acquainted.
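As a small illustration of that first claim, the sketch below runs the same membership check against a Python `list` and a `set`. The function name `count_known` and the data sizes are arbitrary choices for this example, and the timings will differ from machine to machine.

```python
import timeit


def count_known(ids, known):
    """Count how many ids are present in the `known` collection."""
    return sum(1 for i in ids if i in known)


known_list = list(range(20_000))       # membership test scans the list: O(n)
known_set = set(known_list)            # membership test is a hash lookup: O(1) on average
queries = list(range(0, 40_000, 40))   # 1000 lookups, half of them missing

list_time = timeit.timeit(lambda: count_known(queries, known_list), number=1)
set_time = timeit.timeit(lambda: count_known(queries, known_set), number=1)

print(f"list: {list_time:.4f}s, set: {set_time:.4f}s")
```

The algorithm is identical in both runs; only the data structure behind the `in` operator changes.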
So here are my favorite free courses for learning data structures and algorithms:
- Easy to Advanced Data Structures (Udemy)
- Algorithms, Part I (Coursera)
- Algorithms, Part II (Coursera)
And don't forget Thomas Cormen's classic work on algorithms.
To improve your skills, use Leetcode.
You can also dive into the world of databases with the awesome videos from Carnegie Mellon University on YouTube.
2. Learn SQL
Our whole life is data. And to extract this data from a database, you need to "speak" the same language as it does.
SQL (Structured Query Language) is the language of communication in the data domain. Regardless of what anyone says, SQL has lived, is alive, and will live for a very long time.
If you have been in development for a long time, you have probably noticed that rumors of the imminent death of SQL appear periodically. The language was developed in the early 70s and is still wildly popular among analysts, developers, and enthusiasts.
Without knowledge of SQL there is nothing to do in data engineering, since you will inevitably have to write queries to extract data. All modern big data warehouses support SQL:
- Amazon RedShift
- HP Vertica
- Oracle
- SQL Server
... and many others.
To analyze large volumes of data stored in distributed systems such as HDFS, SQL engines were invented: Apache Hive, Impala, etc. See, it is not going anywhere.
How do you learn SQL? Just practice.
To do this, I would recommend checking out an excellent tutorial, which, by the way, is free, from
What makes these courses special is that they have an interactive environment where you can write and run SQL queries right in your browser.
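If you would rather practice locally, one low-friction option is the `sqlite3` module from Python's standard library, which runs an SQL database in memory with no setup. The sketch below is just a practice playground; the `orders` table and its rows are invented for this example.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# Total spend per customer, keeping only customers above a threshold.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
    """
).fetchall()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

SQLite's dialect differs in places from what warehouses such as Amazon Redshift accept, but `SELECT`, `GROUP BY`, `HAVING` and `ORDER BY` carry over.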
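3. Programming in Python and Java / Scala
I have already written about why you should learn the Python programming language in the article
As for Java and Scala, these are the languages the following tools are written in: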
- Apache Kafka (Scala)
- Hadoop, HDFS (Java)
- Apache Spark (Scala)
- Apache Cassandra (Java)
- HBase (Java)
- Apache Hive (Java)
To understand how these tools work, you need to know the languages they are written in. Scala's functional approach lets you solve parallel data processing problems efficiently. Python, unfortunately, cannot boast of speed or parallel processing. In general, knowing several languages and programming paradigms broadens the range of approaches you can bring to a problem.
To dive into the Scala language, you can read
As for Python, I believe
4. Tools for working with big data
Here is a list of the most popular tools in the big data world:
- Apache Spark
- Apache Kafka
- Apache Hadoop (HDFS, HBase, Hive)
- Apache Cassandra
You can find more information about the building blocks of big data in this amazing
- An introduction to Hadoop can be The Complete Guide to Mastering Hadoop (free).
- The most comprehensive guide to Apache Spark, for me, is Spark: The Complete Guide.
5. Cloud platforms
Knowledge of at least one cloud platform is on the list of basic requirements for data engineer applicants. Employers prefer Amazon Web Services, with Google's Cloud Platform in second place and Microsoft Azure rounding out the top three.
You should have a good knowledge of Amazon EC2, AWS Lambda, Amazon S3, and DynamoDB.
6. Distributed systems
Working with big data implies clusters of independently operating computers that communicate with each other over a network. The larger the cluster, the higher the likelihood that some of its member nodes will fail. To become a great data scientist, you need to understand the problems of distributed systems and the existing solutions to them. This area is old and complex.
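The claim about cluster size can be checked with a little arithmetic. Under the simplifying assumption that each node fails independently with the same probability p over some period, the chance that at least one of n nodes fails is 1 - (1 - p)^n. The function name and the 1% figure below are illustrative, not measurements.

```python
def prob_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails.

    Assumes every node fails independently with probability p_node.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be between 0 and 1")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    return 1.0 - (1.0 - p_node) ** n_nodes


for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failure(0.01, n), 3))
```

With a 1% failure chance per node, a 100-node cluster already has roughly a 63% chance of seeing at least one failure, which is why fault tolerance stops being optional at scale.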
Andrew Tanenbaum is considered a pioneer in this field. For those who are not afraid of theory, I recommend his book
For those who prefer watching videos, there is a course on YouTube
7. Data pipelines
Data pipelines are something you cannot live without as a data engineer.
Most of the time, a data engineer builds a so-called data pipeline, that is, creates a process for delivering data from one place to another. These can be custom scripts that call an external service API or run an SQL query, enrich the data, and put it into centralized storage (a data warehouse) or unstructured data storage (a data lake).
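As a rough sketch of such a script, the example below splits a pipeline into extract, transform and load steps. Everything in it is an assumption made for illustration: `extract` returns hard-coded records in place of a real API call, the `payments` table is invented, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def extract():
    """Stand-in for calling an external service API or a source SQL query."""
    return [
        {"user_id": 1, "country": "de", "amount_cents": 1250},
        {"user_id": 2, "country": "us", "amount_cents": 990},
        {"user_id": 3, "country": None, "amount_cents": 400},
    ]


def transform(records):
    """Drop incomplete records and normalize the rest."""
    for record in records:
        if record["country"] is None:
            continue  # skip rows that cannot be attributed to a country
        yield (
            record["user_id"],
            record["country"].upper(),
            record["amount_cents"] / 100,  # cents to currency units
        )


def load(rows, conn):
    """Write the prepared rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    load(transform(extract()), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
result = conn.execute(
    "SELECT user_id, country, amount FROM payments ORDER BY user_id"
).fetchall()
print(result)
```

Keeping the three steps as separate functions makes each one easy to test and to swap out, for example replacing `extract` with a real API client later.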
To summarize: the basic checklist of a data engineer
To summarize, you need a good understanding of the following:
- Information systems;
- Software development (Agile, DevOps, Design Techniques, SOA);
- Distributed systems and parallel programming;
- Database basics: planning, design, operation, and troubleshooting;
- Design of experiments: A/B tests to prove concepts, determine reliability and system performance, and develop reliable ways to deliver good solutions quickly.
These are just a few of the requirements for becoming a data engineer, so learn and understand data systems, information systems, continuous delivery / deployment / integration, programming languages, and other computer science topics (not all subject areas).
And finally, the last but very important thing I want to say.
The path to becoming a Data Engineer is not as simple as it may seem. It is unforgiving and frustrating, and you have to be ready for that. Some moments on this journey may push you to give up. But this is real work and a real learning process.
Just don't sugarcoat it from the start. The whole point of the journey is to learn as much as possible and be ready for new challenges.
Here is a great picture I came across that illustrates this point well:
And yes, remember to avoid burnout and to rest. This is also very important. Good luck!
What do you think of the article, friends? We invite you
Source: www.habr.com