Who are data engineers, and how do you become one?

Hello again! The title of the article speaks for itself. Ahead of the start of the Data Engineer course, we suggest you figure out who data engineers are. The article contains many useful links. Happy reading.

A simple guide on how to catch the Data Engineering wave without letting it drag you into the abyss.

It seems that everyone wants to become a Data Scientist these days. But what about Data Engineering? In essence, it is a hybrid of a data analyst and a data scientist; a data engineer is usually responsible for managing workflows, processing pipelines, and ETL processes. Because of the importance of these functions, it is yet another popular piece of professional jargon that is actively gaining momentum.

High salaries and huge demand are only a small part of what makes this job so attractive! If you want to join the ranks of heroes, it is never too late to start learning. In this post, I have gathered all the information you need to take the first steps.

So, let's get started!

What is Data Engineering?

Honestly, there is no better explanation than this:

"A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him."

- Gordon Lindsay Glegg

So, the role of a data engineer is quite significant.

As the name suggests, data engineering deals with data, namely its delivery, storage, and processing. Accordingly, the main task of engineers is to provide a reliable infrastructure for data. If we look at the AI hierarchy of needs, data engineering occupies the first 2-3 stages: collection, movement and storage, and data preparation.

What does a data engineer do?

With the advent of big data, the scope of responsibility has changed dramatically. Whereas these experts previously wrote large SQL queries and moved data using tools such as Informatica ETL, Pentaho ETL, and Talend, the requirements for data engineers have now increased.

Most companies with open data engineer positions have the following requirements:

  • Excellent knowledge of SQL and Python.
  • Experience with cloud platforms, in particular Amazon Web Services.
  • Knowledge of Java/Scala is preferred.
  • Good understanding of SQL and NoSQL databases (data modeling, data warehousing).

Keep in mind, these are only the essentials. From this list, one can infer that data engineers are specialists in software engineering and backend development.
For example, if a company starts generating a large amount of data from various sources, your task as a data engineer is to organize the collection of that information, its processing, and its storage.

The list of tools used in this case may vary; it all depends on the volume of the data, the speed at which it arrives, and its heterogeneity. Most companies do not deal with big data at all, so as a centralized repository, the so-called data warehouse, you can use a SQL database (PostgreSQL, MySQL, etc.) with a small set of scripts that feed data into the warehouse.
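
As a rough illustration of such a "small script", here is a minimal sketch that loads a CSV export into a SQL table. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL or MySQL; the table name, column names, and sample rows are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export from one of the company's data sources.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,3.20
3,alice,7.00
"""

def load_orders(conn, csv_text):
    """Create the target table if needed and load the CSV rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["order_id"]), r["customer"], float(r["amount"])) for r in reader]
    # INSERT OR REPLACE keeps the script safe to re-run on the same file.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
loaded = load_orders(conn, RAW_CSV)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

In a real setup the script would read from a file or an upstream system and connect to the actual warehouse, but the shape (parse, normalize types, upsert) stays roughly the same.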

IT giants such as Google, Amazon, Facebook, or Dropbox have higher requirements:

  • Knowledge of Python, Java, or Scala.
  • Experience with big data: Hadoop, Spark, Kafka.
  • Knowledge of algorithms and data structures.
  • Understanding of the fundamentals of distributed systems.
  • Experience with data visualization tools such as Tableau or ElasticSearch is a plus.

That is, there is a clear shift toward big data, namely toward processing it under high loads. These companies place increased demands on system fault tolerance.

Data Engineers vs. Data Scientists

Okay, that was a simple and funny comparison (nothing personal), but in reality it is much more complicated.

First, you should know that there is considerable ambiguity in delineating the roles and skills of a data scientist and a data engineer. That is, you can easily get confused about which skills are needed to be a successful data engineer. Of course, certain skills overlap between both roles. But there are also a number of diametrically opposed skills.

Data science is serious business, but we are moving toward a world of functional data science where practitioners are able to do their own analytics. To enable data pipelines and integrated data structures, you need data engineers, not data scientists.

Is a data engineer more in demand than a data scientist?

- Yes, because before you can make a carrot cake, you first have to harvest, clean, and store the carrots!

A data engineer understands programming better than any data scientist, but when it comes to statistics, the opposite is true.

But here is the advantage of a data engineer:

Without him or her, the value of a prototype model, which most often consists of a piece of terrible-quality code in a Python file, obtained from a data scientist and somehow producing a result, tends toward zero.

Without a data engineer, this code will never become a project, and no business problem will be solved effectively. The data engineer tries to turn all of this into a product.

Basic information a data engineer should know

So, if this job sparks a light in you and you are full of enthusiasm, you can learn it, master all the necessary skills, and become a real rock star in data engineering. And yes, you can pull it off even without programming skills or other technical knowledge. It is hard, but possible!

What are the first steps?

You should have a general idea of what is what.

First of all, Data Engineering belongs to computer science. More specifically, you need to understand efficient algorithms and data structures. Second, since data engineers work with data, you need to understand the principles of how databases work and the structures that underlie them.

For example, conventional B-tree SQL databases are based on the B-Tree data structure, while modern distributed repositories are based on the LSM-Tree and other hash table modifications.
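
To make the LSM-Tree idea a little more concrete, here is a toy sketch under heavy simplifying assumptions: everything lives in memory, there is no compaction, no deletes, and no write-ahead log. The class name `TinyLSM` and its parameters are hypothetical and not taken from any real storage engine.

```python
import bisect

class TinyLSM:
    """Toy LSM-tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest run last; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: a run is written once, in sorted order, and never modified.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs from newest to oldest so the latest write wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
db.put("a", 1)
db.put("b", 2)   # memtable reaches its limit and is flushed to a sorted run
db.put("a", 3)   # newer value for "a" shadows the flushed one
```

The contrast with a B-Tree is that writes here never update data in place: they are buffered and flushed as sorted runs, and reads pay for that by checking several places.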

*These steps are based on a great article by Adil Khashtamov. So, if you know Russian, support the author and read his post.

1. Algorithms and data structures

Using the right data structure can significantly improve the performance of an algorithm. Ideally, we should all learn about data structures and algorithms in school, but this is rarely covered. In any case, it is never too late to get acquainted.
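
A small sketch of what "the right data structure" can mean in practice: the two functions below compute the same result, but the second one swaps a list for a set to avoid rescanning on every lookup. The function names and sample data are illustrative only.

```python
def common_ids_list(a, b):
    """O(len(a) * len(b)): each `in` on a list scans it linearly."""
    return [x for x in a if x in b]

def common_ids_set(a, b):
    """O(len(a) + len(b)) on average: a set gives constant-time lookups."""
    b_set = set(b)
    return [x for x in a if x in b_set]

a = list(range(0, 2000, 2))  # multiples of 2
b = list(range(0, 2000, 3))  # multiples of 3

same_result = common_ids_list(a, b) == common_ids_set(a, b)
```

On inputs this small the difference is hard to notice, but on millions of ids the list version quickly becomes impractical.
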
So here are my favorite free courses for learning data structures and algorithms:

Also, don't forget about Thomas Cormen's classic work on algorithms, Introduction to Algorithms. It is the perfect reference when you need to refresh your memory.

  • To improve your skills, use Leetcode.

You can also dive into the world of databases with the amazing videos from Carnegie Mellon University on Youtube:

2. Learn SQL

Our whole life is data. And in order to extract this data from a database, you need to "speak" the same language as it does.

SQL (Structured Query Language) is the lingua franca of the data domain. No matter what anyone says, SQL has lived, is alive, and will live for a very long time.

If you have been in development for a long time, you have probably noticed that rumors about the imminent death of SQL come up periodically. The language was developed in the early 70s and is still wildly popular among analysts, developers, and enthusiasts.
Without knowledge of SQL there is nothing to do in data engineering, since you will inevitably have to build queries to retrieve data. All modern big data warehouses support SQL:

  • Amazon Redshift
  • HP Vertica
  • Oracle
  • SQL Server

... and many others.

To analyze large volumes of data stored in distributed systems such as HDFS, SQL engines were invented: Apache Hive, Impala, etc. You see, it is not going anywhere.

How do you learn SQL? Just practice it.

To do this, I recommend checking out an excellent tutorial, which, by the way, is free, from Mode Analytics.

  1. Intermediate SQL
  2. Joining Data in SQL

What makes these courses special is that they have an interactive environment where you can write and run SQL queries right in your browser. The Modern SQL resource is also worth a look. And you can apply this knowledge to Leetcode problems in the Databases section.
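
If you would rather practice locally, a minimal sketch like the one below is enough to experiment with joins and aggregation. It uses Python's built-in sqlite3 module; the `users` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# LEFT JOIN keeps users without orders; COALESCE turns their NULL sum into 0.
query = """
SELECT u.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
FROM users AS u
LEFT JOIN orders AS o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY total DESC, u.name
"""
rows = conn.execute(query).fetchall()
for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Changing `LEFT JOIN` to `JOIN` and watching the user with no orders disappear is a quick way to get a feel for the difference between join types.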

3. Programming in Python and Java/Scala

I have already written about why it is worth learning the Python programming language in the article Python vs R. Choosing the Best Tool for AI, ML and Data Science. As for Java and Scala, most of the tools for storing and processing huge amounts of data are written in these languages. For example:

  • Apache Kafka (Scala)
  • Hadoop, HDFS (Java)
  • Apache Spark (Scala)
  • Apache Cassandra (Java)
  • HBase (Java)
  • Apache Hive (Java)

To understand how these tools work, you need to know the languages they are written in. Scala's functional approach lets you solve parallel data processing problems effectively. Python, unfortunately, cannot boast of speed or parallel processing. In general, knowing several languages and programming paradigms broadens the range of approaches you can bring to a problem.
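
The functional idea behind parallel data processing can be sketched in any language: a pure "map" step over independent partitions, followed by an associative "reduce" step. The example below is written in Python only to keep this article's examples in one language; the partition contents are invented, and the thread pool is there to show the structure, not to demonstrate a speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_words(partition):
    """Pure 'map' step: the result depends only on the input partition."""
    return Counter(word for line in partition for word in line.split())

def merge(left, right):
    """Associative 'reduce' step: the merge order does not change the result."""
    return left + right

# Hypothetical data, already split into independent partitions.
partitions = [
    ["spark kafka", "kafka hive"],
    ["spark spark", "hive"],
]

with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, partitions))

word_counts = reduce(merge, partial_counts, Counter())
```

Because `count_words` has no side effects and `merge` is associative, the partitions could be processed on different machines and combined in any order, which is roughly the property that engines such as Spark rely on.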

To dive into the Scala language, you can read Programming in Scala by the author of the language. Twitter has also published a good introductory guide, Scala School.

As for Python, I consider Fluent Python to be the best intermediate-level book.

4. Tools for working with big data

Here is a list of the most popular tools in the world of big data:

  • Apache Spark
  • Apache Kafka
  • Apache Hadoop (HDFS, HBase, Hive)
  • Apache Cassandra

You can find more information about the building blocks of big data in this amazing interactive environment. The most popular tools are Spark and Kafka. They are definitely worth studying, and it is advisable to understand how they work from the inside. In 2013, Jay Kreps (co-author of Kafka) published the monumental work The Log: What every software engineer should know about real-time data's unifying abstraction. By the way, the main ideas from this Talmud were used to create Apache Kafka.
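
The central abstraction of that essay, an append-only log read by independent consumers, can be sketched in a few lines. This is a toy in-memory model for intuition only: the `Log` and `Consumer` classes are hypothetical and are not Kafka's API.

```python
class Log:
    """Append-only sequence of records addressed by a growing offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own position, so readers never block each other."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["signup:alice", "order:alice", "signup:bob"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
fast_batch = fast.poll()               # reads everything available
slow_batch = slow.poll(max_records=1)  # reads one record and can catch up later
```

Since records are never modified and each reader only stores an offset, a slow consumer can fall behind and replay from where it stopped without affecting anyone else.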

5. Cloud platforms

Knowledge of at least one cloud platform is on the list of basic requirements for applicants for a data engineer position. Employers prefer Amazon Web Services, with Google's cloud platform in second place and Microsoft Azure rounding out the top three.

You should be well versed in Amazon EC2, AWS Lambda, Amazon S3, and DynamoDB.

6. Distributed systems

Working with big data implies clusters of independently operating computers that communicate over a network. The larger the cluster, the greater the likelihood that some of its member nodes will fail. To become a great data specialist, you need to understand the problems of distributed systems and the existing solutions to them. This area is old and complex.
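
A back-of-the-envelope sketch shows why cluster size matters. It assumes that nodes fail independently and with the same probability over some time window, which real clusters do not strictly satisfy; the 1% figure is an arbitrary illustration, not a measured value.

```python
def p_any_failure(n_nodes, p_node):
    """Probability that at least one of n independent nodes fails."""
    return 1.0 - (1.0 - p_node) ** n_nodes

# Assumed per-node failure probability within the window: 1%.
for n in (1, 10, 100, 1000):
    print(n, round(p_any_failure(n, 0.01), 3))
```

Even with a small per-node probability, the chance that something in the cluster is broken grows quickly with the number of nodes, which is why fault tolerance has to be designed in rather than treated as an exception.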

Andrew Tanenbaum is considered a pioneer in this field. For those who are not afraid of theory, I recommend his book "Distributed Systems"; it may seem difficult for beginners, but it will really help you sharpen your skills.

I consider Designing Data-Intensive Applications by Martin Kleppmann to be the best introductory book. By the way, Martin has a wonderful blog. His work will help you systematize your knowledge about building a modern infrastructure for storing and processing big data.
For those who like to watch videos, there is a course on Youtube, Distributed Computer Systems.

7. Data pipelines

Data pipelines are something you cannot live without as a data engineer.

Most of the time, a data engineer builds a so-called data pipeline, that is, creates a process for delivering data from one place to another. These can be custom scripts that call an external service's API or run a SQL query, enrich the data, and put it into a centralized store (data warehouse) or an unstructured data store (data lake).
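
The extract, enrich, and load steps described above can be sketched as three small functions. Everything specific here is an assumption made for illustration: `extract` returns a hard-coded payload instead of calling a real API, and the "data lake" is just a newline-delimited JSON file in a temporary directory.

```python
import json
import tempfile
from pathlib import Path

def extract():
    """Stand-in for a call to an external service's API (hypothetical payload)."""
    return [
        {"user": "alice", "amount_cents": 1050},
        {"user": "bob", "amount_cents": 320},
    ]

def enrich(records):
    """Add a derived field to each record without mutating the input."""
    for record in records:
        yield {**record, "amount": record["amount_cents"] / 100}

def load(records, lake_dir):
    """Write records as newline-delimited JSON, one simple data lake layout."""
    path = Path(lake_dir) / "payments.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path

with tempfile.TemporaryDirectory() as lake_dir:
    out_path = load(enrich(extract()), lake_dir)
    lines = out_path.read_text(encoding="utf-8").splitlines()
    loaded = [json.loads(line) for line in lines]
```

Keeping each stage as a separate function makes it easier to test the stages in isolation and, later, to hand them to a scheduler or workflow tool.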

Summary: a basic checklist for a data engineer

To summarize, a good understanding of the following is required:

  • Information systems;
  • Software development (Agile, DevOps, Design Techniques, SOA);
  • Distributed systems and parallel programming;
  • Database fundamentals: planning, design, operation, and troubleshooting;
  • Design of experiments: A/B tests to prove concepts, determine reliability and system performance, and develop reliable ways to deliver good solutions quickly.

These are just some of the requirements for becoming a data engineer, so learn and understand data systems, information systems, continuous delivery/deployment/integration, programming languages, and other computer science topics (though not every subject area).

And finally, the last but very important thing I want to say.

The path to becoming a data engineer is not as simple as it may seem. It is unforgiving, it frustrates, and you have to be ready for that. Some moments on this journey may push you to give up. But this is real work and a learning process.

Just don't sugarcoat it from the very beginning. The whole point of the journey is to learn as much as possible and be ready for new challenges.
Here is a great picture I came across that illustrates this point well:

And yes, remember to avoid burnout and to rest. This is also very important. Good luck!

What do you think of the article, friends? We invite you to a free webinar, which will take place today at 20.00. During the webinar, we will discuss how to build an efficient and scalable data processing system for a small company or startup at minimal cost. As practice, we will get acquainted with Google Cloud data processing tools. See you!

Source: www.habr.com
