Who are data engineers, and how do you become one?

Hello again! The title of the article speaks for itself. Ahead of the start of the Data Engineer course, we would like you to understand who data engineers are. The article contains many useful links. Happy reading.


A simple guide on how to catch the Data Engineering wave and not let it drag you into the abyss.

It seems like everyone wants to become a Data Scientist these days. But what about Data Engineering? In essence, it is a kind of hybrid of a data analyst and a data scientist; a data engineer is typically responsible for managing workflows, processing pipelines, and ETL processes. Because these functions matter so much, "data engineer" is now another piece of professional jargon that is actively gaining momentum.

High salaries and huge demand are only a small part of what makes this job so attractive! If you want to join the ranks of the heroes, it is never too late to start learning. In this post, I have gathered all the necessary information to help you take your first steps.

So, let's get started!

What is Data Engineering?

Honestly, there is no better explanation than this:

“A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him.”

- Gordon Lindsay Glegg

So, the role of a data engineer is quite significant.

As the name suggests, data engineering is concerned with data, namely its delivery, storage, and processing. Accordingly, the main task of engineers is to provide reliable data. If we look at the AI hierarchy of needs, data engineering occupies the first 2-3 levels: collection, movement and storage, and data preparation.


What does a data engineer do?

With the advent of big data, the scope of responsibility has changed dramatically. Whereas these specialists used to write large SQL queries and move data with tools such as Informatica ETL, Pentaho ETL, and Talend, the requirements for data engineers have now risen.

Most companies with open positions for a data engineer have the following requirements:

  • Excellent knowledge of SQL and Python.
  • Experience with cloud platforms, in particular Amazon Web Services.
  • Knowledge of Java/Scala is preferred.
  • A good understanding of SQL and NoSQL databases (data modeling, data warehousing).

Keep in mind, these are only the essentials. From this list, one can assume that data engineers are specialists in software engineering and backend development.
For example, if a company starts generating a large amount of data from various sources, your task as a data engineer is to organize the collection of that information, its processing, and its storage.

The list of tools used in this case may vary; it all depends on the volume of the data, the speed at which it arrives, and its heterogeneity. Most companies do not deal with big data at all, so for the central repository, the so-called data warehouse, you can use an SQL database (PostgreSQL, MySQL, etc.) with a small set of scripts that feed data into the warehouse.
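To give a feel for what such a "small script that feeds the warehouse" might look like, here is a minimal sketch. It is an illustration under assumptions, not a production loader: the `orders` table, the CSV payload, and the `load_orders` helper are all made up for this example, and an in-memory SQLite database stands in for PostgreSQL or MySQL.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system; in practice this
# would be a file on disk or an API response.
RAW_CSV = """order_id,customer,amount
1,alice,19.90
2,bob,5.00
3,alice,42.10
"""


def load_orders(conn: sqlite3.Connection, raw_csv: str) -> int:
    """Parse CSV text and upsert the rows into the `orders` table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS orders (
            order_id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            amount   REAL NOT NULL
        )
        """
    )
    rows = [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(raw_csv))
    ]
    # INSERT OR REPLACE keeps the load idempotent:
    # re-running it does not duplicate rows.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse database
loaded = load_orders(conn, RAW_CSV)
load_orders(conn, RAW_CSV)  # second run with the same data
total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(loaded, total_rows)
```

Keying the table on `order_id` and upserting is one simple way to make a load safe to re-run after a failure, which tends to matter more than raw speed for small scripts like this.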

IT giants such as Google, Amazon, Facebook, or Dropbox have higher requirements:

  • Knowledge of Python, Java, or Scala.
  • Experience with big data: Hadoop, Spark, Kafka.
  • Knowledge of algorithms and data structures.
  • An understanding of the fundamentals of distributed systems.
  • Experience with data visualization tools such as Tableau or ElasticSearch is a plus.

That is, there is a clear shift toward big data, namely toward processing it under high loads. These companies have heightened requirements for fault tolerance.

Data Engineers vs. Data Scientists

[Image]
Okay, that was a simple and funny comparison (nothing personal), but in reality things are more complicated.

First, you should know that there is a lot of ambiguity in how the roles and skills of a data scientist and a data engineer are delineated. That is, you can easily get confused about which skills a successful data engineer needs. Of course, some skills overlap between the two roles. But there are also a number of diametrically opposed skills.

Data science is serious business, but we are moving toward a world of functional data science, where practitioners are able to do their own analytics. To enable data pipelines and integrated data structures, you need data engineers, not data scientists.

Is a data engineer more in demand than a data scientist?

- Yes, because before you can bake a carrot cake, you first have to harvest, peel, and prepare the carrots!

A data engineer understands programming better than a data scientist does, but when it comes to statistics, the opposite is true.

But here is the advantage of a data engineer:

Without a data engineer, the value of a prototype model, which most often consists of a piece of poor-quality code in a Python file, obtained from a data scientist and somehow producing a result, tends to zero.

Without a data engineer, this code will never become a project, and no business problem will be solved effectively. The data engineer tries to turn all of this into a product.

Essential information a data engineer should know


So, if this job sparks a light in you and you are enthusiastic, you can learn it, master all the necessary skills, and become a real rock star in the field of data engineering. And yes, you can pull this off even without programming skills or other technical knowledge. It is hard, but possible!

What are the first steps?

You should have a general idea of what is what.

First of all, Data Engineering is part of computer science. More specifically, you should have a good understanding of efficient algorithms and data structures. Second, since data engineers work with data, you need to understand the principles of how databases work and the structures that underlie them.

For example, conventional SQL databases are based on the B-Tree data structure, while modern distributed repositories use LSM-Tree and other modifications of hash tables.
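To make the LSM-Tree idea a little less abstract, here is a toy sketch of its core write and read path. It is only an illustration under assumptions: the `TinyLSM` class is invented for this example, and it leaves out the things a real storage engine needs, such as a write-ahead log, on-disk segments, compaction, and deletes.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: writes go to an in-memory table, which is flushed
    to immutable sorted segments once it reaches a size limit."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # most recent writes, unsorted
        self.segments = []   # flushed segments, oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sorting on flush is what makes segments binary-searchable.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key, default=None):
        if key in self.memtable:
            return self.memtable[key]
        # The newest segment wins, so scan segments from newest to oldest.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return default


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable is full: flushed as the first segment
db.put("a", 10)   # newer value for "a"
db.put("c", 3)    # flushed as the second segment
print(db.get("a"), db.get("b"), db.get("z"))
```

The point of the design is that writes are always cheap appends, while reads pay for it by checking several places, newest first.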

* These steps are based on an excellent article by Adil Khashtamov. So, if you know Russian, support the author and read his post.

1. Algorithms and data structures

Using the right data structure can drastically improve the performance of an algorithm. Ideally, we would all learn data structures and algorithms at school, but this is rarely the case. In any case, it is never too late to catch up.
So here are my favorite free courses for learning data structures and algorithms:

Plus, don't forget Thomas Cormen's classic work on algorithms, Introduction to Algorithms. It is the perfect reference when you need to refresh your memory.

  • To improve your skills, use Leetcode.
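As a small illustration of the claim that the right data structure changes performance, the sketch below runs the same membership checks against a list and a set. The data and the helper names are made up for the example, and the timings will differ from machine to machine.

```python
import timeit

ids = list(range(100_000))
ids_set = set(ids)
lookups = [99_999, -1, 50_000]


def found_in_list():
    # Each `in` on a list is a linear scan: O(n) per lookup.
    return [x in ids for x in lookups]


def found_in_set():
    # Each `in` on a set is a hash lookup: O(1) on average.
    return [x in ids_set for x in lookups]


list_time = timeit.timeit(found_in_list, number=20)
set_time = timeit.timeit(found_in_set, number=20)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Both functions return the same answers; only the cost of getting them differs, and the gap grows with the size of the collection.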

You can also dive into the world of databases with the excellent videos from Carnegie Mellon University on Youtube:

2. Learn SQL

Our whole life is data. And in order to extract this data from a database, you need to "speak" the same language as the database.

SQL (Structured Query Language) is the lingua franca of the data world. No matter what anyone says, SQL has lived, is alive, and will live for a very long time.

If you have been in development for a long time, you have probably noticed that rumors of SQL's death pop up from time to time. The language was created in the early 70s and is still hugely popular among analysts, developers, and plain enthusiasts.
Without knowledge of SQL there is nothing to do in data engineering, since you will inevitably have to build queries to retrieve data. All modern big data warehouses support SQL:

  • Amazon Redshift
  • HP Vertica
  • Oracle
  • SQL Server

... and many others.

To analyze large volumes of data stored in distributed systems such as HDFS, SQL engines were created: Apache Hive, Impala, and so on. See, SQL isn't going anywhere.

How do you learn SQL? Just practice it.

For that, I recommend checking out an excellent tutorial from Mode Analytics, which, by the way, is free.

  1. Intermediate SQL
  2. Joining data in SQL

What makes these courses special is that they have an interactive environment where you can write and run SQL queries right in your browser. The Modern SQL resource is also worth a look. And you can apply this knowledge to the Leetcode tasks in the Databases section.
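If you would rather practice locally, the following sketch shows the kind of join-and-aggregate query those tutorials cover. The `customers` and `orders` tables and their contents are invented for the example, and SQLite (bundled with Python) is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 5.0);
    """
)

# LEFT JOIN keeps customers without orders;
# COALESCE turns their NULL total into 0.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC, c.name
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Swapping `LEFT JOIN` for `JOIN` is a quick experiment worth trying: the customer with no orders drops out of the report.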

3. Programming in Python and Java/Scala

I have already written about why you should learn the Python programming language in the article Python vs R. Choosing the best tool for AI, ML and Data Science. As for Java and Scala, most of the tools for storing and processing huge amounts of data are written in these languages. For example:

  • Apache Kafka (Scala)
  • Hadoop, HDFS (Java)
  • Apache Spark (Scala)
  • Apache Cassandra (Java)
  • HBase (Java)
  • Apache Hive (Java)

To understand how these tools work, you need to know the languages they are written in. Scala's functional approach lets you solve parallel data processing problems effectively. Python, unfortunately, cannot boast of speed or parallel processing. In general, knowing several languages and programming paradigms broadens the range of approaches you can bring to a problem.
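The functional idea carries over to other languages too. The sketch below is an illustration in Python, not Scala: the work is split into a pure "map" step and an associative "reduce" step, which is what makes it safe to spread across workers. The chunks and function names are made up for the example. Note also that in standard CPython builds, threads do not run Python bytecode in parallel, so this shows the shape of the pattern rather than a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk: str) -> Counter:
    """Map step: a pure function of its input, so chunks can be processed in any order."""
    return Counter(chunk.lower().split())


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce step: adding counts is associative, so partial results combine safely."""
    return left + right


chunks = ["spark kafka spark", "hive kafka", "spark hive hive"]

with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, chunks))

totals = reduce(merge, partial_counts, Counter())
print(totals)
```

Because neither function touches shared state, the same two functions could be handed to a process pool or a cluster framework without changing their logic.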

To dive into Scala, you can read Programming in Scala by the author of the language. Twitter has also published a good introductory guide, Scala School.

As for Python, I consider Fluent Python the best intermediate-level book.

4. Tools for working with big data

Here is a list of the most popular tools in the world of big data:

  • Apache Spark
  • Apache Kafka
  • Apache Hadoop (HDFS, HBase, Hive)
  • Apache Cassandra

You can find more information about big data building blocks in this amazing interactive environment. The most popular tools are Spark and Kafka. They are definitely worth studying, and it is advisable to understand how they work from the inside. In 2013, Jay Kreps (co-author of Kafka) published a monumental work, The Log: What Every Software Engineer Should Know About Real-Time Data's Unifying Abstraction. By the way, the core ideas from this tome were used to create Apache Kafka.
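The central idea of that essay, an append-only log read by independent consumers, fits in a few lines. The sketch below is a toy model for intuition only: the `Log` and `Consumer` classes are invented here and are not Kafka's API, and they ignore persistence, partitioning, and replication.

```python
class Log:
    """Toy append-only log: records are never modified, only appended."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset, so readers do not affect one another."""

    def __init__(self, log: Log):
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["signup", "login", "purchase"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
first_batch = fast.poll()               # reads everything written so far
slow_batch = slow.poll(max_records=1)   # reads only the first record
log.append("logout")
second_batch = fast.poll()              # sees only the new record
print(first_batch, slow_batch, second_batch, slow.offset)
```

Since the log itself never changes, a slow consumer can catch up later simply by continuing from its own offset.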

5. Cloud platforms


Knowledge of at least one cloud platform is on the list of basic requirements for data engineer applicants. Employers prefer Amazon Web Services, with Google's cloud platform in second place and Microsoft Azure rounding out the top three.

You should have a good working knowledge of Amazon EC2, AWS Lambda, Amazon S3, and DynamoDB.

6. Distributed systems

Working with big data implies clusters of independently operating computers that communicate over a network. The larger the cluster, the greater the likelihood that some of its member nodes will fail. To become a great data specialist, you need to understand the problems of distributed systems and the existing solutions to them. This field is old and complex.
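A quick back-of-the-envelope calculation shows why cluster size matters. The sketch below assumes, purely for illustration, that nodes fail independently and with the same probability; the 1% figure is made up, and real failures are often correlated.

```python
def p_any_failure(nodes: int, p_node: float) -> float:
    """Probability that at least one of `nodes` independent nodes fails,
    i.e. 1 - (1 - p_node) ** nodes."""
    return 1.0 - (1.0 - p_node) ** nodes


for n in (1, 10, 100, 1000):
    print(n, round(p_any_failure(n, 0.01), 3))
```

Even with a small per-node failure probability, the chance that something in the cluster is broken climbs quickly with the number of nodes, which is why large systems are designed to expect failure rather than avoid it.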

Andrew Tanenbaum is considered a pioneer in this field. For those who are not afraid of theory, I recommend his book "Distributed Systems"; it may seem daunting for beginners, but it will really help you hone your skills.

I consider Designing Data-Intensive Applications by Martin Kleppmann the best introductory book. By the way, Martin has a wonderful blog. His work will help you systematize your knowledge of building modern infrastructure for storing and processing big data.
For those who prefer watching videos, there is a Distributed Computer Systems course on Youtube.

7. Data pipelines


Data pipelines are something you cannot live without as a data engineer.

Most of the time, a data engineer builds a so-called data pipeline, that is, creates the process of delivering data from one place to another. These can be custom scripts that call an external service API or run an SQL query, enrich the data, and put it into a centralized store (data warehouse) or an unstructured data store (data lake).
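The three stages described above can be sketched as a chain of small functions. This is a minimal illustration under assumptions: `extract` fakes the external API with hard-coded records, the field names are invented, and a temporary directory with a JSON Lines file stands in for a data lake.

```python
import json
import tempfile
from pathlib import Path


def extract():
    """Stand-in for a call to an external service API (hypothetical payload)."""
    yield {"user": "alice", "amount_cents": 1990}
    yield {"user": "bob", "amount_cents": 500}


def enrich(records):
    """Transform step: add a derived field to every record."""
    for record in records:
        yield {**record, "amount": record["amount_cents"] / 100}


def load(records, path: Path) -> int:
    """Load step: append records to a JSON Lines file, one object per line."""
    count = 0
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


lake_dir = Path(tempfile.mkdtemp())   # stand-in for a data lake location
target = lake_dir / "payments.jsonl"
written = load(enrich(extract()), target)
stored = [json.loads(line) for line in target.read_text(encoding="utf-8").splitlines()]
print(written, stored[0])
```

Writing the stages as generators means records flow through one at a time, so the same structure keeps working when the input no longer fits in memory.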

To sum up: a basic checklist for a data engineer


To summarize, you need a good understanding of the following:

  • Information systems;
  • Software development (Agile, DevOps, Design Techniques, SOA);
  • Distributed systems and parallel programming;
  • Database fundamentals: planning, design, operation, and troubleshooting;
  • Experiment design: A/B tests to prove concepts, determine reliability and system performance, and develop reliable ways to deliver good solutions quickly.

These are just some of the requirements for becoming a data engineer, so learn and understand data systems, information systems, continuous delivery/deployment/integration, programming languages, and other computer science topics (though not every subject area).

And finally, one last but very important thing I want to say.

The path to Data Engineering is not as simple as it may seem. It is unforgiving, it is frustrating, and you have to be ready for that. Some moments on this journey may push you to give up. But this is real work and a learning process.

Just don't sugarcoat it from the start. The whole point of the journey is to learn as much as possible and to be ready for new challenges.
Here is a great picture I came across that illustrates this well:

[Image]

And yes, remember to avoid burnout and to rest. This is also very important. Good luck!

What do you think of the article, friends? We invite you to a free webinar, which will take place today at 20.00. During the webinar, we will discuss how to build an efficient and scalable data processing system for a small company or startup at minimal cost. For practice, we will get acquainted with Google Cloud's data processing tools. See you there!

Source: www.hab.com
