Delta: A Data Synchronization and Enrichment Platform

In anticipation of the launch of a new cohort of the "Data Engineer" course, we have prepared a translation of an interesting article.

Overview

A commonly observed pattern is for applications to use multiple datastores, each serving its own purpose: for example, storing the canonical form of the data (MySQL, etc.), providing advanced search capabilities (ElasticSearch, etc.), caching (Memcached, etc.), and so on. Typically, when several datastores are used, one of them acts as the primary store and the others as derived stores. The challenge then becomes keeping these datastores in sync.

We have looked at a number of patterns that try to solve multi-datastore synchronization, such as dual writes, distributed transactions, and so on. However, these approaches have significant limitations in terms of real-world feasibility, robustness, and maintenance. Beyond data synchronization, some applications also need to enrich their data by calling external services.

We developed Delta to address these challenges. Delta is an eventually consistent, event-driven platform for data synchronization and enrichment.

Existing solutions

Dual writes

To keep two datastores in sync, you can perform a dual write: write to one store, then immediately write to the other. The first write can be retried, and the second can be aborted if the first still fails after the retries are exhausted. However, the two stores can drift out of sync if the write to the second store fails. This is usually addressed by building a repair routine that periodically re-applies data from the first store to the second, or does so only when differences in the data are detected.
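A minimal sketch of a dual write followed by a repair routine, assuming plain Python dicts as stand-ins for the two datastores. `dual_write`, `repair`, and `WriteFailed` are hypothetical names chosen for illustration; a real repair routine would scan the primary store in batches rather than in memory.

```python
from typing import Callable, Dict

Store = Dict[str, str]


class WriteFailed(Exception):
    """Raised by a (hypothetical) store client when a write does not go through."""


def dual_write(primary: Store, secondary: Store, key: str, value: str,
               write: Callable[[Store, str, str], None],
               max_retries: int = 3) -> bool:
    """Write to the primary with retries, then once to the secondary.

    Returns True only if both writes succeeded.
    """
    for attempt in range(1, max_retries + 1):
        try:
            write(primary, key, value)
            break
        except WriteFailed:
            if attempt == max_retries:
                # Retries exhausted: abort; the second write is never attempted.
                return False
    try:
        write(secondary, key, value)
        return True
    except WriteFailed:
        # The stores are now out of sync until the repair routine runs.
        return False


def repair(primary: Store, secondary: Store) -> int:
    """Make the secondary match the primary; returns the number of fixes."""
    fixed = 0
    for key, value in primary.items():
        if secondary.get(key) != value:
            secondary[key] = value
            fixed += 1
    for key in list(secondary):
        if key not in primary:  # deleted in the primary, still in the secondary
            del secondary[key]
            fixed += 1
    return fixed
```

The repair step is exactly the kind of bespoke, store-specific code discussed under the issues: it has to be rewritten for every pair of stores, and it reads the whole primary store each time it runs.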

Issues:

Implementing the repair routine is typically a bespoke effort that cannot be reused. In addition, data between the stores remains out of sync until the repair routine runs. The solution becomes increasingly complicated when more than two datastores are involved. Finally, the repair routine can put substantial extra load on the primary data source while it runs.

Change log table

When mutations (such as inserts, updates, and deletes) occur on a set of tables, entries describing the changes are added to a log table as part of the same transaction. Another thread or process continuously polls events from the log table and writes them to one or more datastores, optionally deleting the events from the log table once all datastores have acknowledged them.
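A minimal sketch of the change log table pattern, assuming an in-memory SQLite database and Python dicts standing in for the derived stores (for example a search index and a cache). The table names, `upsert_movie`, `delete_movie`, and `poll_once` are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    payload TEXT
);
""")


def upsert_movie(movie_id: int, title: str) -> None:
    # The mutation and its log entry commit (or roll back) together.
    with conn:
        conn.execute("INSERT OR REPLACE INTO movies (id, title) VALUES (?, ?)",
                     (movie_id, title))
        conn.execute("INSERT INTO change_log (op, row_id, payload) VALUES (?, ?, ?)",
                     ("upsert", movie_id, json.dumps({"title": title})))


def delete_movie(movie_id: int) -> None:
    with conn:
        conn.execute("DELETE FROM movies WHERE id = ?", (movie_id,))
        conn.execute("INSERT INTO change_log (op, row_id, payload) VALUES (?, ?, NULL)",
                     ("delete", movie_id))


def poll_once(derived_stores, batch_size: int = 100) -> int:
    """Apply up to `batch_size` logged changes, in order, to every derived store."""
    rows = conn.execute(
        "SELECT seq, op, row_id, payload FROM change_log ORDER BY seq LIMIT ?",
        (batch_size,)).fetchall()
    for seq, op, row_id, payload in rows:
        for store in derived_stores:
            if op == "upsert":
                store[row_id] = json.loads(payload)
            else:
                store.pop(row_id, None)
        # Every derived store has applied the event: remove it from the log.
        with conn:
            conn.execute("DELETE FROM change_log WHERE seq = ?", (seq,))
    return len(rows)
```

Note that `upsert_movie` and `delete_movie` are the application code that must change to use the pattern, which is why it ends up packaged as a per-language library.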

Issues:

This pattern has to be implemented as a library, ideally one that requires no code changes in the application using it. In a polyglot environment, the library must be reimplemented for every supported language, and keeping features and behavior consistent across languages is very challenging.

Another issue is capturing schema changes, since some systems, such as MySQL, do not support transactional schema changes [1] [2]. Therefore, the pattern of executing a change (for example, a schema change) and transactionally recording it in the change log table does not always work.

Distributed transactions

Distributed transactions can be used to span a transaction across multiple heterogeneous datastores, so that a write is either committed to all involved stores or to none of them.

Issues:

Distributed transactions have proven to be problematic across heterogeneous datastores. By their nature, they can only rely on the lowest common denominator of the participating systems. For example, XA transactions block execution if the application process fails during the prepare phase. In addition, XA provides no deadlock detection and no support for optimistic concurrency-control schemes. Also, some systems, such as ElasticSearch, support neither XA nor any other heterogeneous transaction model. Thus, guaranteeing the atomicity of writes across different storage technologies remains a challenging problem for applications [3].
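The blocking behavior comes from two-phase commit, the protocol underlying XA. A simplified in-memory model, not a real XA implementation: `Participant`, `two_phase_commit`, and the `coordinator_crashes` flag are hypothetical, and a store without a prepare phase (as with ElasticSearch) is modeled as one that can never vote yes.

```python
from enum import Enum
from typing import List


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"    # voted yes; must hold its locks until told the outcome
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    def __init__(self, name: str, supports_prepare: bool = True) -> None:
        self.name = name
        self.supports_prepare = supports_prepare
        self.state = State.IDLE
        self.locked = False

    def prepare(self) -> bool:
        # A store without a prepare phase cannot promise to commit later.
        if not self.supports_prepare:
            return False
        self.state, self.locked = State.PREPARED, True
        return True

    def finish(self, commit: bool) -> None:
        self.state = State.COMMITTED if commit else State.ABORTED
        self.locked = False


def two_phase_commit(participants: List[Participant],
                     coordinator_crashes: bool = False) -> str:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    if coordinator_crashes:
        # No decision is ever sent: prepared participants stay "in doubt",
        # still holding their locks, until the coordinator recovers.
        return "in-doubt"
    # Phase 2: commit only if everyone voted yes, otherwise abort everywhere.
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return "committed" if decision else "aborted"
```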

Delta

Delta was developed to address the limitations of existing data synchronization solutions, and it also allows data to be enriched on the fly. Our goal was to abstract these complexities away from application developers so they can focus on implementing business features. Next, we describe "Movie Search", a real use case within Netflix that relies on Delta.

Netflix makes heavy use of a microservice architecture, and each microservice typically serves one type of data. Core information about a movie resides in a microservice called Movie Service, while related data, such as information about deals, talent, and vendors, is managed by several other microservices (namely Deal Service, Talent Service, and Vendor Service).
Business users at Netflix Studios often need to search for movies by various criteria to keep track of productions, so it is crucial for them to be able to search across all data related to a movie.

Before Delta, the movie search team had to fetch data from several other microservices before indexing a movie's data. In addition, the team had to build a system that periodically updated the search index by querying the other microservices for changes, even when nothing had changed at all. That system quickly became complex and difficult to maintain.

Figure 1. Polling system prior to Delta
After onboarding to Delta, the system was simplified into an event-driven system, as shown in the following diagram. CDC (Change-Data-Capture) events are sent to Keystone Kafka topics by the Delta-Connector. A Delta application built with the Delta Stream Processing Framework (based on Flink) consumes the CDC events from the topic, enriches each of them by calling other microservices, and finally writes the enriched data to the search index in Elasticsearch. The whole process runs in near real time: as soon as changes are committed to the datastore, the search indexes are updated.

Figure 2. Data pipeline using Delta
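The per-event logic of the Movie Search pipeline can be sketched as follows. This is a minimal single-process illustration, assuming in-memory dicts in place of the Kafka topic, the Deal/Talent/Vendor microservices, and the Elasticsearch index; the event shape (`op`, `key`, `after`) and all names are hypothetical. In Delta itself this logic runs inside a Flink job.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Index = Dict[int, Dict[str, Any]]

# Hypothetical in-memory stand-ins for Deal, Talent and Vendor Service responses.
DEALS: Dict[int, List[str]] = {42: ["deal-1"]}
TALENT: Dict[int, List[str]] = {42: ["Director A"]}
VENDORS: Dict[int, List[str]] = {42: ["Vendor X"]}

# One "remote call" per related microservice, keyed by the field it fills in.
ENRICHERS: Dict[str, Callable[[int], List[str]]] = {
    "deals": lambda movie_id: DEALS.get(movie_id, []),
    "talent": lambda movie_id: TALENT.get(movie_id, []),
    "vendors": lambda movie_id: VENDORS.get(movie_id, []),
}


def process(event: Event, index: Index) -> None:
    """Handle one CDC event for the movie table and update the search index."""
    movie_id = event["key"]
    if event["op"] == "delete":
        index.pop(movie_id, None)        # remove the document from the index
        return
    doc = dict(event["after"])           # movie fields from Movie Service's store
    for field, fetch in ENRICHERS.items():
        doc[field] = fetch(movie_id)     # enrich by "calling" other services
    index[movie_id] = doc                # sink: upsert into the search index
```

Because the index is updated only when a change event arrives, there is no periodic polling of the other services when nothing has changed.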
In the following sections, we describe the Delta-Connector, which connects to a datastore and publishes CDC events to the transport layer, a real-time data transport infrastructure that routes CDC events to Kafka topics. Finally, we describe the Delta Stream Processing Framework, which application developers can use to build their data processing and enrichment logic.

CDC (Change-Data-Capture)

We have developed a CDC service called Delta-Connector, which captures committed changes from a datastore in real time and writes them to a stream. Real-time changes are captured from the datastore's transaction log and from dumps. Dumps are taken because transaction logs typically do not contain the full history of changes. Changes are typically serialized as Delta events, so a consumer does not need to care whether a change originated from the transaction log or from a dump.
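To illustrate why a common event format helps, here is a sketch in which log entries and dump rows are both normalized into one event type, so the consumer never branches on where a change came from. The `DeltaEvent` fields and the input shapes of `from_binlog` and `from_dump` are assumptions for illustration only; the actual Delta event schema is not described in this article.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DeltaEvent:
    # Hypothetical schema, not the real Delta event format.
    table: str
    key: Any
    op: str                            # "upsert" or "delete"
    after: Optional[Dict[str, Any]]    # row image after the change, if any
    source: str                        # "log" or "dump"; informational only


def from_binlog(entry: Dict[str, Any]) -> DeltaEvent:
    op = "delete" if entry["type"] == "DELETE" else "upsert"
    return DeltaEvent(entry["table"], entry["pk"], op,
                      None if op == "delete" else entry["row"], "log")


def from_dump(table: str, row: Dict[str, Any], pk_column: str = "id") -> DeltaEvent:
    # A dump only sees rows that currently exist, so every row is an upsert.
    return DeltaEvent(table, row[pk_column], "upsert", row, "dump")


def apply(event: DeltaEvent, store: Dict[Any, Dict[str, Any]]) -> None:
    # The consumer handles both origins identically.
    if event.op == "delete":
        store.pop(event.key, None)
    else:
        store[event.key] = event.after
```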

Delta-Connector offers several additional features, such as:

  • The ability to write to custom outputs beyond Kafka.
  • The ability to trigger manual dumps at any time, for all tables, a specific table, or specific primary keys.
  • Dumps can be taken in chunks, so there is no need to start over from scratch after a failure.
  • No locks need to be acquired on tables, which is essential to ensure that write traffic to the database is never blocked by our service.
  • High availability through redundant instances across AWS Availability Zones.
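One simple way to get chunked, resumable dumps without table locks is keyset pagination over the primary key. The sketch below assumes a SQLite table `movies` and a caller-supplied `emit` callback; it is an illustration of the general idea, not the Delta-Connector algorithm, and it omits the reconciliation with concurrent transaction-log events that a real CDC service needs.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]


def dump_in_chunks(conn: sqlite3.Connection,
                   emit: Callable[[List[Row]], None],
                   checkpoint: Dict[str, int],
                   chunk_size: int = 1000) -> None:
    """Dump the hypothetical `movies` table in primary-key order, chunk by chunk.

    `checkpoint["last_pk"]` advances only after a chunk has been emitted, so a
    failed dump can be resumed by calling this function again with the same
    checkpoint instead of starting over.
    """
    while True:
        last_pk = checkpoint.get("last_pk")
        if last_pk is None:
            rows = conn.execute(
                "SELECT id, title FROM movies ORDER BY id LIMIT ?",
                (chunk_size,)).fetchall()
        else:
            # Keyset pagination: each chunk is a short, independent read,
            # so no long-lived table lock or transaction is held.
            rows = conn.execute(
                "SELECT id, title FROM movies WHERE id > ? ORDER BY id LIMIT ?",
                (last_pk, chunk_size)).fetchall()
        if not rows:
            return
        emit(rows)                           # may raise; checkpoint stays put
        checkpoint["last_pk"] = rows[-1][0]
```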

We currently support MySQL and Postgres, including deployments on AWS RDS and Aurora. We also support Cassandra (multi-master). You can learn more about Delta-Connector in a separate blog post.

Kafka and the transport layer

The transport layer for Delta events is built on top of the messaging service in our Keystone platform.

Historically, message publishing at Netflix has been optimized for availability rather than durability (see the previous article). The trade-off is potential broker data inconsistency in various edge cases. For example, unclean leader election can cause a consumer to receive duplicated or lost events.

For Delta, we wanted stronger durability guarantees to ensure that CDC events are delivered to the derived stores. To achieve this, we provided a purpose-built Kafka cluster as a first-class citizen. Some of the broker settings are shown in the table below:

[Table: Kafka broker settings]

In Keystone Kafka clusters, unclean leader election is usually enabled to favor producer availability. This can cause message loss when an out-of-sync replica is elected leader. For the new high-durability Kafka cluster, unclean leader election is disabled to prevent such message loss.

We also increased the replication factor from 2 to 3 and the minimum number of in-sync replicas from 1 to 2. Producers writing to this cluster require acknowledgments from all in-sync replicas (acks=all), which guarantees that at least 2 of the 3 replicas have the latest messages written by the producers.
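The settings described above correspond to standard Apache Kafka configuration properties. A small sketch that records them as Python dicts and derives the resulting guarantees; the helper functions and their arithmetic are our own illustration, not part of the source.

```python
# Keystone defaults and Delta overrides as described in the text, using
# standard Apache Kafka property names (values are strings, as in a .properties file).
KEYSTONE_BROKER = {
    "unclean.leader.election.enable": "true",
    "default.replication.factor": "2",
    "min.insync.replicas": "1",
}
DELTA_BROKER = {
    "unclean.leader.election.enable": "false",  # never elect an out-of-sync leader
    "default.replication.factor": "3",
    "min.insync.replicas": "2",
}
DELTA_PRODUCER = {"acks": "all"}  # wait for all in-sync replicas


def replica_failures_tolerated_for_writes(broker: dict) -> int:
    """With acks=all, a partition keeps accepting writes while at least
    min.insync.replicas replicas are in sync."""
    return (int(broker["default.replication.factor"])
            - int(broker["min.insync.replicas"]))


def acked_copies_guaranteed(broker: dict, producer: dict) -> int:
    """Minimum number of replicas holding a message the producer saw acknowledged."""
    acks = producer["acks"]
    if acks in ("all", "-1"):
        return int(broker["min.insync.replicas"])
    return int(acks)  # "1": leader only; "0": no acknowledgment at all
```

With the Delta values, one replica can be lost while writes continue, and every acknowledged message exists on at least two brokers.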

When a broker instance is terminated, a new instance replaces it. However, the new broker has to catch up on out-of-sync replicas, which can take hours. To reduce recovery time in this scenario, we started using block storage volumes (Amazon Elastic Block Store) instead of local broker disks. When a new instance replaces a terminated broker, it attaches the EBS volume that the terminated instance had and starts catching up on new messages. This reduces catch-up time from hours to minutes, because the new instance no longer has to replicate from a blank state. Overall, giving storage and brokers separate life cycles greatly reduces the impact of broker replacement.

To further strengthen our delivery guarantee, we used a message tracing system to detect any message loss under extreme conditions (for example, clock drift on the partition leader).
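At its core, loss detection through tracing compares what producers reported as acknowledged with what consumers actually received. A minimal sketch, assuming each traced message carries a unique ID and a partition number; `find_lost` is hypothetical and does not describe the internals of the Netflix tracing system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Trace = Tuple[str, int]  # (message_id, partition)


def find_lost(produced: Iterable[Trace],
              consumed: Iterable[Trace]) -> Dict[int, List[str]]:
    """Return, per partition, IDs that producers saw acknowledged but no
    consumer ever received."""
    seen: Set[str] = {message_id for message_id, _ in consumed}
    lost: Dict[int, List[str]] = defaultdict(list)
    for message_id, partition in produced:
        if message_id not in seen:
            lost[partition].append(message_id)
    return dict(lost)
```

In practice such a check would only consider messages older than some grace period, so that messages still in flight are not reported as lost.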

Stream Processing Framework

The Delta processing layer is built on top of the Netflix SPaaS platform, which integrates Apache Flink with the Netflix ecosystem. The platform provides a self-service UI that manages Flink job deployments and Flink cluster orchestration on top of Titus, our container management platform. The interface also manages job configurations and lets users make dynamic configuration changes without recompiling their Flink jobs.

Delta provides a stream processing framework built on Flink and SPaaS that uses an annotation-based DSL (Domain Specific Language) to abstract away technical details. For example, to define a step that enriches events by calling external services, users only need to write the following DSL, and the framework translates it into a model that is executed by Flink.

Figure 3. Example of an enrichment DSL in a Delta application
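To illustrate the idea of an annotation-based DSL, here is a minimal sketch that uses Python decorators as a stand-in for annotations. It is not Delta's actual syntax (that is shown in Figure 3); `enrich`, `run`, and `talent_service` are hypothetical. The user declares what to fetch and how to shape it, and the "framework" turns the declaration into an executable step.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
_STEPS: List[Callable[[Event], Event]] = []  # steps registered by the DSL


def enrich(field: str, service: Callable[[Any], Any], key: str = "id"):
    """Hypothetical decorator that declares an enrichment step.

    The decorated function only shapes the service response; calling the
    service and merging the result into the event is left to the framework.
    """
    def register(shape: Callable[[Any], Any]) -> Callable[[Any], Any]:
        def step(event: Event) -> Event:
            response = service(event[key])
            return {**event, field: shape(response)}
        _STEPS.append(step)
        return shape
    return register


def run(event: Event) -> Event:
    # The "framework": execute the registered steps in declaration order.
    for step in _STEPS:
        event = step(event)
    return event


# User code:
def talent_service(movie_id: int) -> List[Dict[str, str]]:  # hypothetical stub
    return [{"name": "Director A", "role": "director"}]


@enrich(field="talent", service=talent_service)
def talent_names(response: List[Dict[str, str]]) -> List[str]:
    return [t["name"] for t in response]
```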

The processing framework not only flattens the learning curve but also provides common stream processing functionality, such as deduplication and schematization, as well as resilience and fault tolerance to address general operational concerns.
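As an example of such built-in functionality, a deduplication step can be sketched as a bounded window of recently seen event IDs. `Deduplicator` is hypothetical and only illustrates the technique; a Flink job would more likely keep this in keyed state with a time-to-live.

```python
from collections import OrderedDict
from typing import Hashable


class Deduplicator:
    """Drops events whose ID is among the `capacity` most recently seen IDs."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()

    def is_new(self, event_id: Hashable) -> bool:
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh: seen again recently
            return False
        self._seen[event_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)    # evict the least recently seen ID
        return True
```

The bounded window keeps memory constant at the cost of letting through a duplicate that arrives after its ID has been evicted.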

The Delta Stream Processing Framework consists of two key modules: the DSL & API module and the Runtime module. The DSL & API module provides the annotation-based DSL and UDF (User-Defined-Function) APIs so that users can write custom processing logic (such as filtering or transformations). The Runtime module provides a DSL parser implementation that builds an internal representation of the processing steps as DAG models. The Execution component interprets the DAG models to initialize the actual Flink operators and eventually run the Flink application. The architecture of the framework is shown in the following diagram.

Figure 4. Architecture of the Delta Stream Processing Framework
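The parse-then-execute split can be sketched with the standard library's `graphlib` (Python 3.9+). Under our assumptions, a step declaration is a name mapped to its upstream steps and a plain function; `build_dag` plays the role of the parser and `execute` the role of the Execution component, with ordinary function calls standing in for Flink operators and a single event instead of a stream. All names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Step name -> (names of upstream steps, function taking their outputs).
Spec = Dict[str, Tuple[List[str], Callable[..., Any]]]


def build_dag(spec: Spec) -> TopologicalSorter:
    """'Parser': turn step declarations into a DAG of dependencies."""
    return TopologicalSorter({name: deps for name, (deps, _) in spec.items()})


def execute(spec: Spec, source: Any) -> Dict[str, Any]:
    """'Execution': run each step once all of its inputs are available.

    Steps without upstream dependencies read the source event directly.
    Raises graphlib.CycleError if the declarations contain a cycle.
    """
    results: Dict[str, Any] = {}
    for name in build_dag(spec).static_order():
        deps, fn = spec[name]
        inputs = [results[d] for d in deps] if deps else [source]
        results[name] = fn(*inputs)
    return results
```

Keeping the DAG as an intermediate model is what lets the framework optimize or fix the execution side without touching user-defined steps.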

This approach has several benefits:

  • Users can focus on their business logic without having to learn the specifics of Flink or the SPaaS framework.
  • Optimizations can be made transparently to users, and bugs can be fixed without requiring any changes to user code (UDFs).
  • Operating Delta applications is simple for users, because the platform provides resilience and fault tolerance out of the box and collects a variety of granular metrics that can be used for alerting.

Production usage

Delta has been running in production for over a year and plays a crucial role in many Netflix Studio applications. It has helped teams implement use cases such as search indexing, data warehousing, and event-driven workflows. Below is an overview of the high-level architecture of the Delta platform.

Figure 5. High-level architecture of Delta

Acknowledgments

We would like to thank the following people, who contributed to the design and development of Delta at Netflix: Allen Wang, Charles Zhao, Jaebin Yoon, Josh Snyder, Kasturi Chatterjee, Mark Cho, Olof Johansson, Piyush Goyal, Prashanth Ramdas, Raghuram Onti Srinivasan, Sandeep Gupta, Steven Wu, Tharanga Gamaethige, Yun Wang, and Zhenzhong Xu.

References

  1. dev.mysql.com/doc/refman/5.7/en/implicit-commit.html
  2. dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.html
  3. Martin Kleppmann, Alastair R. Beresford, Boerge Svingen: Online Event Processing. Commun. ACM 62(5): 43–49 (2019). DOI: doi.org/10.1145/3312527

Source: www.habr.com
