Delta: Data Synchronization and Enrichment Platform

In anticipation of the launch of a new cohort of the "Data Engineer" course, we have prepared a translation of an interesting article.


Overview

We will talk about a fairly common pattern in which applications use multiple data stores, each serving its own purpose: for example, storing the canonical form of the data (MySQL, etc.), providing advanced search capabilities (ElasticSearch, etc.), caching (Memcached, etc.), and so on. Typically, when multiple data stores are used, one of them acts as the primary store and the others as derived stores. The challenge then becomes how to keep these data stores in sync.

We looked at several different patterns that try to solve the problem of synchronizing multiple stores, such as dual writes, distributed transactions, and so on. However, these approaches have significant limitations in terms of real-world usage, reliability, and maintenance. Beyond data synchronization, some applications also need to enrich their data by calling external services.

Delta was developed to solve these problems. Delta is an eventually consistent, event-driven platform for data synchronization and enrichment.

Existing Solutions

Dual Writes

To keep two data stores in sync, you can use dual writes: write to one store and then write to the other immediately afterwards. The first write can be retried, and the second can be aborted if the first still fails once the retries are exhausted. However, the two data stores can become inconsistent if the write to the second store fails. This problem is usually addressed by building a repair procedure that periodically re-copies data from the first store to the second, or does so only when differences in the data are detected.
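The dual-write pattern and its failure mode can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical in-memory stores (`FlakyStore`) and a naive `repair` pass; it is not code from Delta or any Netflix system.

```python
class FlakyStore:
    """Hypothetical in-memory data store that can be switched off to simulate an outage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}
        self.available = True

    def put(self, key: str, value: str) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


def dual_write(primary: FlakyStore, secondary: FlakyStore,
               key: str, value: str, retries: int = 3) -> bool:
    """Write to `primary` (with retries), then once to `secondary`.

    Returns True if both writes succeeded, False if only the primary write did.
    Raises ConnectionError if every primary attempt failed; in that case the
    secondary write is never attempted (it is "aborted").
    """
    for attempt in range(retries):
        try:
            primary.put(key, value)
            break
        except ConnectionError:
            if attempt == retries - 1:
                raise
    try:
        secondary.put(key, value)
        return True
    except ConnectionError:
        return False  # the stores now disagree until a repair pass runs


def repair(primary: FlakyStore, secondary: FlakyStore) -> int:
    """Naive repair pass: copy every entry that differs from primary to secondary.

    Deletes are ignored for brevity. Note that it scans the whole primary store.
    """
    fixed = 0
    for key, value in primary.data.items():
        if secondary.data.get(key) != value:
            secondary.put(key, value)
            fixed += 1
    return fixed
```

Because `repair` has to read the entire primary store to find differences, it also illustrates why such a procedure adds load to the primary data source.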

Issues:

Implementing the repair procedure is custom work that cannot be reused. In addition, the data in the stores stays out of sync until the repair procedure runs. The solution becomes more complex when more than two data stores are involved. Finally, the repair procedure can add load to the primary data source.

Change Log Table

When changes are made to a set of tables (such as inserting, updating, and deleting records), change entries are added to a log table as part of the same transaction. Another thread or process continuously reads events from the log table and writes them to one or more data stores, and, if necessary, removes the events from the log table once the write has been confirmed by all stores.

Issues:

This pattern has to be implemented as a library, ideally without changing the code of the application that uses it. In a polyglot environment, such a library must exist in every language in use, and ensuring consistent functionality and behavior across all of those languages is difficult.

Another problem lies in capturing schema changes in systems that do not support transactional schema changes [1][2], such as MySQL. As a result, the pattern of making a change (for example, a schema change) and transactionally recording it in the change log table does not always work.

Distributed Transactions

Distributed transactions can be used to span a transaction across multiple heterogeneous data stores, so that an operation is either committed to all of the data stores involved or to none of them.

Issues:

Distributed transactions are a very hard problem for heterogeneous data stores. By their nature, they can only rely on the lowest common denominator of the systems involved. For example, XA transactions block execution if the application process fails during the prepare phase. Moreover, XA provides no deadlock detection and does not support optimistic concurrency-control schemes. In addition, some systems, such as ElasticSearch, do not support XA or any other heterogeneous transaction model. Thus, ensuring write atomicity across different data storage technologies remains a very difficult task for applications [3].
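To make the blocking behavior concrete, here is a toy two-phase commit (the protocol that XA is built on) in Python. It is a simplified model, assuming in-memory participants and a coordinator that may crash right after the prepare phase; it does not use any real XA API, and all names are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """Hypothetical resource manager (one per data store) in a two-phase commit."""

    def __init__(self, name: str, can_prepare: bool = True) -> None:
        self.name = name
        self.can_prepare = can_prepare
        self.state = State.IDLE
        self.staged = None
        self.data: dict = {}

    def prepare(self, key, value) -> bool:
        """Phase 1: stage the write and vote yes, or vote no."""
        if not self.can_prepare:
            self.state = State.ABORTED
            return False
        self.staged = (key, value)  # a real store would also hold locks here
        self.state = State.PREPARED
        return True

    def commit(self) -> None:
        key, value = self.staged
        self.data[key] = value
        self.staged = None
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.staged = None
        self.state = State.ABORTED


def two_phase_commit(participants, key, value, crash_after_prepare=False) -> str:
    """Coordinator: phase 2 commits everywhere only if every participant voted yes."""
    votes = [p.prepare(key, value) for p in participants]
    if crash_after_prepare:
        # Nobody tells the prepared participants the outcome: they stay blocked,
        # holding their staged write, until the coordinator recovers.
        return "in-doubt"
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        if p.state is State.PREPARED:
            p.abort()
    return "aborted"
```

A store that cannot implement `prepare` at all cannot take part in the protocol, which is the situation described above for ElasticSearch.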

Delta

Delta was designed to address the limitations of existing data synchronization solutions, and it also enables on-the-fly data enrichment. Our goal was to abstract all of these complexities away from application developers so that they could focus entirely on implementing business functionality. Next, we describe "Movie Search", a real use case of Delta at Netflix.

Netflix makes heavy use of a microservice architecture, and each microservice typically serves one type of data. Basic information about a movie lives in a microservice called Movie Service, while related data, such as information about deals, talent, vendors, and so on, is managed by several other microservices (Deal Service, Talent Service, and Vendor Service).
Business users at Netflix Studios often need to search for movies by various criteria, so it is important for them to be able to search across all of the data related to a movie.

Before Delta, the movie search team had to pull data from multiple microservices before indexing the movie data. In addition, the team had to build a system that periodically updated the search index by requesting changes from the other microservices, even when there were no changes at all. This system quickly became complex and hard to maintain.

Figure 1. Polling system prior to Delta
After adopting Delta, the system was simplified into an event-driven system, as shown in the next figure. CDC (Change-Data-Capture) events are sent to Keystone Kafka topics by Delta-Connector. A Delta application built with the Delta Stream Processing Framework (based on Flink) consumes the CDC events from the topic, enriches them by calling other microservices, and finally writes the enriched data to the search index in Elasticsearch. The whole process happens in near real time: as soon as changes are committed to the data store, the search indexes are updated.

Figure 2. Data pipeline using Delta
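The flow in Figure 2 (a CDC event comes in, related services are called, and an enriched document goes to the index) can be sketched as a small consumer loop. This is only an illustration, assuming a hypothetical `CdcEvent` shape, with the microservices modeled as plain functions and the Elasticsearch index as a dictionary; real Delta applications run this logic inside Flink.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CdcEvent:
    """Hypothetical CDC event for a row of the movie table."""
    op: str                      # "upsert" or "delete"
    movie_id: int
    row: Optional[dict] = None


def enrich(event: CdcEvent, services: Dict[str, Callable[[int], object]]) -> dict:
    """Build a search document by calling each related service for the movie."""
    doc = dict(event.row or {})
    for field, lookup in services.items():
        doc[field] = lookup(event.movie_id)  # e.g. deals, talent, vendors
    return doc


def process(events, services, index: dict) -> None:
    """Consume events in order and keep the search index in sync."""
    for event in events:
        if event.op == "delete":
            index.pop(event.movie_id, None)
        else:
            index[event.movie_id] = enrich(event, services)
```

Unlike the polling design, nothing runs unless a change event arrives.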
In the following sections, we describe Delta-Connector, which connects to a data store and publishes CDC events to the transport layer, a real-time data transportation infrastructure that routes CDC events to Kafka topics. Finally, we describe the Delta stream processing framework, which application developers can use to build their data processing and enrichment logic.

CDC (Change-Data-Capture)

We developed a CDC service called Delta-Connector, which captures committed changes from a data store in real time and writes them to a stream. Real-time changes are taken from the transaction log and from storage dumps. Dumps are used because transaction logs usually do not retain the full history of changes. Changes are typically serialized as Delta events, so consumers do not need to worry about where a change came from.
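The idea that consumers should not care whether a change came from the log or from a dump can be shown with a small normalization step. This sketch assumes hypothetical input shapes and a hypothetical `DeltaEvent` structure; the real Delta event format is not described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaEvent:
    """Hypothetical source-agnostic change event (field names are assumptions)."""
    table: str
    key: int
    op: str                  # "upsert" or "delete"
    row: Optional[dict]


_LOG_OPS = {"INSERT": "upsert", "UPDATE": "upsert", "DELETE": "delete"}


def from_log_entry(entry: dict) -> DeltaEvent:
    """Normalize a (hypothetical) transaction-log entry."""
    return DeltaEvent(entry["table"], entry["pk"], _LOG_OPS[entry["type"]], entry.get("after"))


def from_dump_row(table: str, row: dict, pk_column: str = "id") -> DeltaEvent:
    """Normalize a row read by a dump; a dump only ever yields current rows."""
    return DeltaEvent(table, row[pk_column], "upsert", dict(row))
```

An update read from the log and the same row read from a dump produce equal `DeltaEvent` values, so downstream code handles both identically.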

Delta-Connector supports several additional features, such as:

  • The ability to write custom outputs beyond Kafka.
  • The ability to trigger manual dumps at any time for all tables, a specific table, or specific primary keys.
  • Dumps can be taken in chunks, so there is no need to start over in case of failure.
  • There is no need to place locks on tables, which is important to ensure that database write traffic is never blocked by our service.
  • High availability, thanks to standby instances across AWS Availability Zones.
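Resumable, chunked dumps can be illustrated with keyset pagination over the primary key plus a checkpoint. This is a minimal sketch, assuming SQLite and a hypothetical `movies` table with an integer `id` key; it does not show how Delta-Connector reconciles dump rows with concurrent log events.

```python
import sqlite3
from typing import Callable, Optional


def dump_in_chunks(conn: sqlite3.Connection, emit: Callable[[tuple], None],
                   checkpoint: dict, chunk_size: int = 1000,
                   max_chunks: Optional[int] = None) -> bool:
    """Dump the (hypothetical) `movies` table in primary-key order, one chunk at a time.

    Progress is recorded in checkpoint["last_pk"] after each chunk, so an
    interrupted dump resumes where it stopped instead of starting over.
    Each chunk is a short, bounded query rather than one long table scan.
    Returns True when the table is exhausted, False if stopped by `max_chunks`.
    """
    done = 0
    while max_chunks is None or done < max_chunks:
        last_pk = checkpoint.get("last_pk")
        if last_pk is None:
            rows = conn.execute(
                "SELECT id, title FROM movies ORDER BY id LIMIT ?",
                (chunk_size,)).fetchall()
        else:
            rows = conn.execute(
                "SELECT id, title FROM movies WHERE id > ? ORDER BY id LIMIT ?",
                (last_pk, chunk_size)).fetchall()
        if not rows:
            return True
        for row in rows:
            emit(row)
        checkpoint["last_pk"] = rows[-1][0]  # in practice, persist this durably
        done += 1
    return False
```

Here `max_chunks` only exists to simulate an interruption; calling the function again with the same `checkpoint` continues from the last completed chunk.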

We currently support MySQL and Postgres, including deployments on AWS RDS and Aurora. We also support Cassandra (multi-master). You can find more details about Delta-Connector in this blog post.

Kafka and the Transport Layer

Delta's event transport layer is built on top of the messaging service of the Keystone platform.

Historically, message publishing at Netflix has been optimized for availability rather than durability (see a previous article). The trade-off was potential broker data inconsistencies in various edge cases. For example, unclean leader election can cause consumers to receive duplicated or lost events.

For Delta, we wanted stronger durability guarantees to ensure the delivery of CDC events to the derived stores. For this purpose, we offered a specially configured Kafka cluster as a first-class citizen. You can see some of the broker settings in the table below:


In Keystone Kafka clusters, unclean leader election is usually enabled to ensure publisher availability. This can result in message loss if an out-of-sync replica is elected as leader. For the new high-durability Kafka cluster, unclean leader election is disabled to prevent message loss.

We also increased the replication factor from 2 to 3 and the minimum in-sync replicas from 1 to 2. Producers writing to this cluster require acks from all in-sync replicas, which guarantees that 2 out of 3 replicas have the latest messages written by the producers.
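The settings described above map onto standard Kafka configuration keys. The sketch below lists them (the full Delta cluster configuration is not given in the text, so treat this listing as an assumption) and models the arithmetic of the `acks=all` / `min.insync.replicas` rule with two hypothetical helper functions.

```python
# Standard Kafka configuration keys; values as described in the text.
BROKER_CONFIG = {
    "unclean.leader.election.enable": "false",  # never elect an out-of-sync replica
    "default.replication.factor": "3",          # default for auto-created topics; previously 2
    "min.insync.replicas": "2",                 # previously 1
}
PRODUCER_CONFIG = {"acks": "all"}               # wait for all in-sync replicas


def write_accepted(isr_size: int, min_insync: int) -> bool:
    """With acks=all, the leader rejects a write when the ISR is smaller than min.insync.replicas."""
    return isr_size >= min_insync


def replicas_down_while_writable(replication_factor: int, min_insync: int) -> int:
    """How many replicas of a partition can be unavailable while writes are still accepted."""
    return replication_factor - min_insync
```

Both the old settings (replication factor 2, minimum 1) and the new ones (3, 2) keep accepting writes with one replica down. The difference is that under the new settings every acknowledged message is stored on at least two brokers, so, with unclean leader election disabled, losing any single broker cannot lose it.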

When a broker instance terminates, a new instance replaces it. However, the new broker has to catch up on the out-of-sync replicas, which can take several hours. To reduce the recovery time in this scenario, we started using block storage (Amazon Elastic Block Store) instead of local broker disks. When a new instance replaces a terminated broker instance, it attaches the EBS volume of the terminated instance and starts catching up on new messages. This reduces the catch-up time from hours to minutes, because the new instance no longer has to replicate from an empty state. Overall, separating the storage and broker lifecycles significantly reduces the impact of broker replacement.

To further strengthen the data delivery guarantee, we used a message tracing system to detect any message loss under extreme conditions (for example, clock drift on the partition leader).
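The article does not describe how Netflix's message tracing system works. As a generic illustration only, the sketch below assumes that each producer stamps messages with its own increasing sequence number, and that producers and consumers report the (producer, sequence) pairs they handled to an auditor; missing pairs indicate loss and repeated pairs indicate duplicates. The `audit` function is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Msg = Tuple[str, int]  # (producer_id, sequence number): a hypothetical tracing key


def audit(produced: Iterable[Msg],
          consumed: Iterable[Msg]) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Compare what producers reported sending with what consumers reported seeing.

    Returns (lost, duplicated): per producer, the sorted sequence numbers that
    were never consumed, and those consumed more than once.
    """
    sent = defaultdict(set)
    for pid, seq in produced:
        sent[pid].add(seq)
    seen = Counter(consumed)  # missing keys count as 0
    lost: Dict[str, List[int]] = {}
    for pid, seqs in sent.items():
        missing = sorted(s for s in seqs if seen[(pid, s)] == 0)
        if missing:
            lost[pid] = missing
    duplicated: Dict[str, List[int]] = {}
    for (pid, seq), count in seen.items():
        if count > 1:
            duplicated.setdefault(pid, []).append(seq)
    for seqs in duplicated.values():
        seqs.sort()
    return lost, duplicated
```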

Stream Processing Framework

Delta's processing layer is built on top of the Netflix SPaaS platform, which provides Apache Flink integration with the Netflix ecosystem. The platform provides a user interface that manages the deployment of Flink jobs and the orchestration of Flink clusters on top of the Titus container management platform. The interface also manages job configurations and lets users change configurations dynamically without having to recompile their Flink jobs.

Delta provides a stream processing framework built on Flink and SPaaS that uses an annotation-based DSL (Domain Specific Language) to abstract away technical details. For example, to define a step in which events are enriched by calling external services, users only need to write the following DSL, and the framework builds a model from it that is then executed by Flink.

Figure 3. Example of an enrichment step in the Delta DSL
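The exact syntax of the Delta DSL is not given in the text. The sketch below only illustrates the general idea of a declarative, annotation-style enrichment step, using Python decorators as a stand-in for annotations: the user declares which service to call and where to put the result, and the framework turns the declarations into executable code. `enrich_from`, `build_pipeline`, and the service name are hypothetical.

```python
from typing import Callable, Dict, List

# Registry filled by the decorator below; a stand-in for annotation scanning.
STEPS: List[dict] = []


def enrich_from(service: str, key_field: str, into: str):
    """Hypothetical decorator declaring an enrichment step that calls `service`."""
    def register(fn: Callable):
        STEPS.append({"service": service, "key_field": key_field, "into": into, "fn": fn})
        return fn
    return register


# User code: only the business logic of the step.
@enrich_from(service="talent-service", key_field="movie_id", into="talent")
def talent_names(client: Callable, movie_id: int) -> list:
    return sorted(client(movie_id))


# Framework code: turn the declared steps into an executable function.
def build_pipeline(clients: Dict[str, Callable]) -> Callable[[dict], dict]:
    def run(event: dict) -> dict:
        doc = dict(event)
        for step in STEPS:
            client = clients[step["service"]]
            doc[step["into"]] = step["fn"](client, doc[step["key_field"]])
        return doc
    return run
```

Usage: `build_pipeline({"talent-service": lambda mid: ["Zed", "Ann"]})({"movie_id": 7})` adds a sorted `talent` list to the event. The user function never deals with clients, retries, or wiring, which is the separation the DSL is meant to provide.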

The processing framework not only flattens the learning curve, but also provides common stream processing features such as deduplication and schematization, as well as the flexibility and resiliency needed to address common operational problems.
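As an example of such a common feature, a deduplication operator can be sketched as follows. This is a hypothetical illustration, not Delta's implementation: it assumes every event carries a per-key version that only grows (for instance, a position in the source transaction log).

```python
from typing import Dict, Hashable


class Deduplicator:
    """Drop events whose version is not newer than the last one seen for the same key.

    Assumes each event carries a monotonically increasing per-key version.
    In a Flink job this map would live in keyed state, usually with a TTL
    so that it does not grow without bound.
    """

    def __init__(self) -> None:
        self._latest: Dict[Hashable, int] = {}

    def accept(self, key: Hashable, version: int) -> bool:
        previous = self._latest.get(key)
        if previous is not None and version <= previous:
            return False  # duplicate or stale (out-of-order) event
        self._latest[key] = version
        return True
```

Dropping stale versions as well as exact duplicates keeps a derived store from being overwritten by an older change that arrives late.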

The Delta Stream Processing Framework consists of two key modules: the DSL & API module and the Runtime module. The DSL & API module provides the DSL and UDF (User-Defined-Function) APIs so that users can write their own processing logic (such as filtering or transformations). The Runtime module provides a DSL parser implementation that builds an internal representation of the processing steps as DAG models. The Execution component interprets the DAG models to initialize the actual Flink operators and eventually runs the Flink application. The architecture of the framework is shown in the following figure.

Figure 4. Delta Stream Processing Framework architecture
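The parse, plan, and execute stages described above can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is a minimal sketch under stated assumptions: step declarations are plain dictionaries with hypothetical `name`/`after` fields, and operators are ordinary functions rather than Flink operators.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Optional


def build_dag(spec: List[dict]) -> Dict[str, set]:
    """Turn parsed step declarations into a DAG: {step: set of upstream steps}."""
    return {step["name"]: set(step.get("after", [])) for step in spec}


def plan(dag: Dict[str, set]) -> List[str]:
    """Order steps so each runs after its upstream steps; raises CycleError on cycles."""
    return list(TopologicalSorter(dag).static_order())


def execute(order: List[str], operators: Dict[str, Callable],
            record: dict) -> Optional[dict]:
    """Run one record through the planned steps as a single chain.

    An operator that returns None drops the record (like a filter).
    """
    for name in order:
        record = operators[name](record)
        if record is None:
            return None
    return record
```

Because the DAG is an internal model, the framework can validate it (for example, reject cycles) and change how it is executed without touching user code.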

This approach has several advantages:

  • Users can focus on their business logic without digging into the specifics of Flink or the SPaaS framework.
  • Optimizations can be made transparently to users, and bugs can be fixed without requiring any changes to user code (UDFs).
  • Operating a Delta application is simpler for users, because the platform provides flexibility and resiliency out of the box and collects a variety of detailed metrics that can be used for alerting.

Production Usage

Delta has been in production for over a year and plays a key role in many Netflix Studio applications. It has helped teams implement use cases such as search indexing, data warehousing, and event-driven workflows. Below is a high-level overview of the Delta platform architecture.

Figure 5. Delta's high-level architecture

Acknowledgments

We would like to thank the following people who were involved in building and developing Delta at Netflix: Allen Wang, Charles Zhao, Jaebin Yoon, Josh Snyder, Kasturi Chatterjee, Mark Cho, Olof Johansson, Piyush Goyal, Prashanth Ramdas, Raghuram Onti Srinivasan, Sandeep Gupta, Steven Wu, Tharanga Gamaethige, Yun Wang, and Zhenzhong Xu.

Sources

  1. dev.mysql.com/doc/refman/5.7/en/implicit-commit.html
  2. dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.html
  3. Martin Kleppmann, Alastair R. Beresford, Boerge Svingen: Online event processing. Commun. ACM 62(5): 43–49 (2019). DOI: doi.org/10.1145/3312527


Source: www.habr.com
