Delta: A Data Synchronization and Enrichment Platform

Ahead of the launch of a new cohort of the Data Engineer course, we have prepared a translation of an interesting article.

Overview

We will discuss a fairly common pattern in which applications use multiple data stores, each serving its own purpose: for example, storing the canonical form of the data (MySQL, etc.), providing advanced search capabilities (ElasticSearch, etc.), caching (Memcached, etc.), and so on. Typically, when multiple data stores are used, one of them acts as the primary store and the others as derived stores. The challenge is how to keep these data stores in sync.

We have looked at a number of different patterns that try to solve multi-store synchronization, such as dual writes, distributed transactions, and so on. However, these approaches have significant limitations in terms of real-world feasibility, reliability, and maintenance. Besides data synchronization, some applications also need to enrich their data by calling external services.

Delta was developed to address these problems. Delta is an eventually consistent, event-driven platform for data synchronization and enrichment.

Existing Solutions

Dual Writes

To keep two data stores in sync, you can use a dual write: a write to one store followed immediately by a write to the other. The first write can be retried, and the second can be aborted if the first still fails after the retries are exhausted. However, the two stores can get out of sync if the write to the second store fails. This problem is usually addressed by building a repair routine that periodically re-applies data from the first store to the second, or does so only when differences are detected.
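
The dual-write pattern and its repair routine can be sketched roughly as follows. This is a minimal illustration, not code from Delta: both stores are plain Python dictionaries, and the helper names (`write_with_retries`, `dual_write`, `repair`) are hypothetical.

```python
from typing import Callable, Dict


def write_with_retries(write: Callable[[], None], attempts: int = 3) -> bool:
    """Run write() up to `attempts` times and report whether it succeeded."""
    for _ in range(attempts):
        try:
            write()
            return True
        except IOError:
            continue
    return False


def dual_write(primary: Dict[str, str],
               derived_put: Callable[[str, str], None],
               key: str, value: str) -> bool:
    """Write to the primary store, then to the derived store.

    Returns True only if both writes succeeded.
    """
    def put_primary() -> None:
        primary[key] = value

    if not write_with_retries(put_primary):
        return False  # first write failed: the second write is aborted
    return write_with_retries(lambda: derived_put(key, value))


def repair(primary: Dict[str, str], derived: Dict[str, str]) -> int:
    """Re-apply entries that differ; returns the number of repaired keys.

    Deletions are ignored here to keep the sketch short.
    """
    repaired = 0
    for key, value in primary.items():
        if derived.get(key) != value:
            derived[key] = value
            repaired += 1
    return repaired


def unavailable_put(key: str, value: str) -> None:
    """Stand-in for a derived store that is currently down."""
    raise IOError("derived store unavailable")


primary_store: Dict[str, str] = {}
derived_store: Dict[str, str] = {}
both_written = dual_write(primary_store, unavailable_put, "movie:1", "Roma")
```

After the failed second write, `primary_store` holds the record while `derived_store` does not, and the two only converge once `repair(primary_store, derived_store)` runs.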

Issues:

Building the repair routine is bespoke work that may not be reusable. In addition, data between the stores stays out of sync until the repair routine runs. The solution becomes more complicated when more than two data stores are involved. Finally, the repair routine can add load to the primary data source.

Change Log Table

When changes occur to a set of tables (such as inserts, updates, and deletes), entries describing the changes are added to a log table as part of the same transaction. Another thread or process continuously polls events from the log table and writes them to one or more data stores, optionally removing the events from the log table once all stores have acknowledged them.
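
As a rough illustration of this pattern, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names (`movie`, `change_log`), the helper functions, and the dictionary standing in for a derived store are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE change_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT, movie_id INTEGER, title TEXT
    );
""")


def insert_movie(movie_id: int, title: str) -> None:
    """Insert a movie and its change-log entry in one transaction."""
    with conn:  # commits both rows together, or rolls both back on error
        conn.execute("INSERT INTO movie VALUES (?, ?)", (movie_id, title))
        conn.execute(
            "INSERT INTO change_log (op, movie_id, title) VALUES ('insert', ?, ?)",
            (movie_id, title),
        )


def drain_change_log(derived: dict) -> int:
    """Poll once: apply pending log entries to the derived store, then delete them."""
    rows = conn.execute(
        "SELECT seq, op, movie_id, title FROM change_log ORDER BY seq"
    ).fetchall()
    for seq, op, movie_id, title in rows:
        if op == "insert":
            derived[movie_id] = title
        # The entry is removed only after the derived store has accepted it.
        with conn:
            conn.execute("DELETE FROM change_log WHERE seq = ?", (seq,))
    return len(rows)


search_index: dict = {}
insert_movie(1, "Roma")
insert_movie(2, "Okja")
applied = drain_change_log(search_index)
```

A real poller would call `drain_change_log` in a loop from a separate thread or process; a single call is enough here to show the flow.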

Issues:

This pattern has to be implemented as a library, ideally without requiring changes to the code of the application that uses it. In a polyglot environment, such a library must be implemented in every supported language, and ensuring consistent features and behavior across languages is very difficult.

Another problem lies in capturing schema changes in systems that do not support transactional schema changes [1][2], such as MySQL. As a result, the pattern of making a change (for example, a schema change) and transactionally recording it in the change log table will not always work.

Distributed Transactions

Distributed transactions can be used to span a transaction across heterogeneous data stores, so that an operation is either committed to all of the stores involved or to none of them.

Issues:

Distributed transactions are a major problem for heterogeneous data stores. By their nature, they can only rely on the lowest common denominator of the participating systems. For example, XA transactions block execution if the application process fails during the prepare phase. In addition, XA provides no deadlock detection and no support for optimistic concurrency control schemes. Moreover, some systems, such as ElasticSearch, support neither XA nor any other heterogeneous transaction model. Ensuring the atomicity of writes across different storage technologies therefore remains a very hard problem for applications [3].
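
To make the all-or-nothing idea concrete, here is a toy two-phase commit coordinator. It is not XA and not part of Delta. The `Participant` class and its `can_prepare` flag are hypothetical stand-ins for real stores, and the flag simulates a store that cannot take part in the protocol.

```python
from typing import Dict, List, Optional, Tuple


class Participant:
    """A toy data store that can stage a write before committing it."""

    def __init__(self, name: str, can_prepare: bool = True) -> None:
        self.name = name
        self.can_prepare = can_prepare
        self.staged: Optional[Tuple[str, str]] = None
        self.committed: Dict[str, str] = {}

    def prepare(self, key: str, value: str) -> bool:
        if not self.can_prepare:
            return False
        self.staged = (key, value)
        return True

    def commit(self) -> None:
        key, value = self.staged
        self.committed[key] = value
        self.staged = None

    def rollback(self) -> None:
        self.staged = None


def two_phase_commit(participants: List[Participant], key: str, value: str) -> bool:
    # Phase 1: every participant must vote yes.
    prepared: List[Participant] = []
    for participant in participants:
        if participant.prepare(key, value):
            prepared.append(participant)
        else:
            # One "no" vote undoes the staged writes everywhere.
            for done in prepared:
                done.rollback()
            return False
    # Phase 2: all voted yes, so every participant commits.
    for participant in participants:
        participant.commit()
    return True


mysql = Participant("mysql")
search = Participant("search", can_prepare=False)
committed_everywhere = two_phase_commit([mysql, search], "movie:1", "Roma")
```

If the coordinator crashed between the two phases, the prepared participants would be left holding staged writes, which is the blocking behavior described above.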

Delta

Delta was designed to address the limitations of existing data synchronization solutions, and it also allows data to be enriched on the fly. Our goal was to abstract all of this complexity away from application developers so that they can focus entirely on implementing business functionality. Below we describe "Movie Search", an actual use case of Delta at Netflix.

Netflix widely uses a microservice architecture, and each microservice typically serves a single type of data. The core information about a movie lives in a microservice called Movie Service, while related data, such as information about producers, actors, vendors, and so on, is managed by several other microservices (namely Deal Service, Talent Service, and Vendor Service).
Business users at Netflix Studios often need to search across various movie criteria, which is why it is very important for them to be able to search all movie-related data.

Before Delta, the movie search team had to pull data from several microservices before indexing the movie data. In addition, the team had to build a system that periodically updated the search index by querying other microservices for changes, even when there were no changes at all. This system quickly became complex and hard to maintain.

Figure 1. Polling system prior to Delta
After adopting Delta, the system was simplified into an event-driven one, as shown in the following figure. CDC (Change-Data-Capture) events are sent to Keystone Kafka topics by the Delta-Connector. A Delta application built with the Delta Stream Processing Framework (based on Flink) consumes the CDC events from a topic, enriches them by calling other microservices, and finally writes the enriched data to the search index in Elasticsearch. The whole process happens in near real time: as soon as changes are committed to the data store, the search index is updated.

Figure 2. Data pipeline using Delta
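
The event-driven flow described above (consume a CDC event, enrich it, write it to the search index) can be sketched in a few lines. Everything here is an in-memory stand-in: the dictionaries playing the role of Talent Service and Deal Service, the event fields, and the sample values are assumptions for illustration, not Netflix APIs or data.

```python
from typing import Dict, Iterable

# Hypothetical stand-ins for calls to other microservices.
TALENT_SERVICE: Dict[int, list] = {1: ["Actor A", "Actor B"]}
DEAL_SERVICE: Dict[int, str] = {1: "deal-001"}


def enrich(event: Dict) -> Dict:
    """Attach related data to a CDC event by looking it up elsewhere."""
    movie_id = event["movie_id"]
    return {
        **event,
        "talent": TALENT_SERVICE.get(movie_id, []),
        "deal": DEAL_SERVICE.get(movie_id),
    }


def run_pipeline(cdc_events: Iterable[Dict], search_index: Dict[int, Dict]) -> None:
    """Apply each CDC event to the search index as it arrives."""
    for event in cdc_events:
        if event["op"] == "delete":
            search_index.pop(event["movie_id"], None)
        else:
            search_index[event["movie_id"]] = enrich(event)


index: Dict[int, Dict] = {}
run_pipeline([{"op": "insert", "movie_id": 1, "title": "Roma"}], index)
```

Nothing polls for changes in this flow: the index is touched only when an event arrives, which is what removes the periodic querying of the old design.
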
In the following sections, we describe the Delta-Connector, which connects to a data store and publishes CDC events to the transport layer, a real-time data transport infrastructure that routes CDC events to Kafka topics. Finally, we discuss the Delta Stream Processing Framework, which application developers can use to build their data processing and enrichment logic.

CDC (Change-Data-Capture)

We developed a CDC service called Delta-Connector, which captures committed changes from a data store in real time and writes them to a stream. Real-time changes are captured from the store's transaction log and from dumps. Dumps are used because transaction logs typically do not retain the full history of changes. Changes are generally serialized as Delta events, so a consumer does not have to care where a change came from.

Delta-Connector supports several additional features, such as:

  • The ability to write to custom outputs other than Kafka.
  • The ability to trigger manual dumps at any time for all tables, a specific table, or specific primary keys.
  • Dumps can be taken in chunks, so there is no need to start over from scratch after a failure.
  • No need to acquire locks on tables, which is essential to ensure that our service never blocks write traffic to the database.
  • High availability through standby instances across AWS Availability Zones.
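
One way a chunked, resumable, lock-free dump could work is keyset pagination over the primary key, sketched below with `sqlite3`. This is an assumption about the general technique, not a description of how Delta-Connector implements its dumps; the `movie` table and the `dump_in_chunks` helper are hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?)",
    [(i, f"movie-{i}") for i in range(1, 6)],
)
conn.commit()


def dump_in_chunks(
    db: sqlite3.Connection, chunk_size: int, after_id: int = 0
) -> Iterator[Tuple[int, List[tuple]]]:
    """Yield (checkpoint, rows) chunks ordered by primary key.

    `after_id` is the last primary key already dumped, so a failed dump
    can resume from its checkpoint instead of starting over.
    """
    while True:
        rows = db.execute(
            "SELECT id, title FROM movie WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        after_id = rows[-1][0]
        yield after_id, rows


chunks = list(dump_in_chunks(conn, chunk_size=2))
# Resuming after a failure that happened once the first chunk (ids 1-2) was done:
resumed = list(dump_in_chunks(conn, chunk_size=2, after_id=2))
```

Each chunk is a short, independent read, so no table lock is held across the whole dump.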

We currently support MySQL and Postgres, including deployments on AWS RDS and Aurora. We also support Cassandra (multi-master). More details about the Delta-Connector are available in a separate blog post.

Kafka and the Transport Layer

Delta's event transport layer is built on top of the messaging service of the Keystone platform.

Historically, message publishing at Netflix has been optimized for availability rather than durability (see the previous article). The trade-off was potential broker data inconsistency in various edge cases. For example, unclean leader election can cause a consumer to receive duplicate events or lose events.

With Delta, we wanted stronger durability guarantees to ensure that CDC events are delivered to the derived stores. To that end, we offered a purpose-built Kafka cluster as a first-class citizen. Some of its broker settings are described below:

In Keystone Kafka clusters, unclean leader election is usually enabled to favor producer availability. This can lead to message loss if an out-of-sync replica is elected leader. For the new high-durability Kafka cluster, unclean leader election is disabled to prevent message loss.

We also increased the replication factor from 2 to 3 and the minimum in-sync replicas from 1 to 2. Producers writing to this cluster require acks from all in-sync replicas, which guarantees that 2 out of 3 replicas have the latest messages sent by the producer.
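
The settings described above map onto standard Kafka configuration keys, shown here as Python dictionaries. The key names are real Kafka options, the values follow the text, and the rest of the cluster configuration is not known. The `write_accepted` helper is a hypothetical model of the `acks=all` rule, not a Kafka API.

```python
# Broker-side settings for the high-durability cluster, as described in the text.
HIGH_DURABILITY_BROKER = {
    "unclean.leader.election.enable": "false",  # never elect an out-of-sync replica
    "default.replication.factor": "3",          # raised from 2
    "min.insync.replicas": "2",                 # raised from 1
}

# Producer-side setting: wait for acknowledgement from all in-sync replicas.
PRODUCER = {"acks": "all"}


def write_accepted(in_sync_replicas: int, min_insync: int) -> bool:
    """With acks=all, a write succeeds only if enough replicas are in sync."""
    return in_sync_replicas >= min_insync


MIN_ISR = int(HIGH_DURABILITY_BROKER["min.insync.replicas"])
one_broker_down = write_accepted(in_sync_replicas=2, min_insync=MIN_ISR)
two_brokers_down = write_accepted(in_sync_replicas=1, min_insync=MIN_ISR)
```

With three replicas, the cluster keeps accepting writes when one broker is down, but rejects them when only one replica remains in sync, trading some availability for durability.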

When a broker instance is terminated, a new instance replaces the old one. However, the new broker has to catch up on out-of-sync replicas, which can take several hours. To reduce recovery time in this scenario, we started using block storage volumes (Amazon Elastic Block Store) instead of local disks on the brokers. When a new instance replaces a terminated broker instance, it attaches the EBS volume that the terminated instance had and starts catching up on new messages. This reduces the catch-up time from hours to minutes, because the new instance no longer has to replicate from a blank state. In general, separating the life cycles of storage and broker greatly reduces the impact of broker replacement.

To further strengthen the delivery guarantee, we used a message tracing system to detect any message loss under extreme conditions (for example, clock drift on the partition leader).

Stream Processing Framework

Delta's processing layer is built on top of the Netflix SPaaS platform, which integrates Apache Flink with the Netflix ecosystem. The platform provides a user interface that manages the deployment of Flink jobs and the orchestration of Flink clusters on top of our Titus container management platform. The interface also manages job configurations and lets users make configuration changes dynamically without recompiling their Flink jobs.

Delta provides a stream processing framework on top of Flink and SPaaS that uses an annotation-based DSL (Domain-Specific Language) to abstract away technical details. For example, to define a step that enriches events by calling external services, users only need to write the following DSL, and the framework creates a model from it that Flink then executes.

Figure 3. Enrichment DSL example in Delta

The processing framework not only flattens the learning curve, but also provides common stream processing features such as deduplication, schematization, and resilience and fault tolerance to address general operational concerns.
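
As an example of one such feature, deduplication of a stream by event identifier can be sketched as below. The `event_id` field and the unbounded in-memory set are simplifying assumptions; a production stream processor would keep this state in managed, expiring storage.

```python
from typing import Dict, Iterable, Iterator


def deduplicate(events: Iterable[Dict], key: str = "event_id") -> Iterator[Dict]:
    """Yield each event once, dropping later events with an already-seen key."""
    seen = set()
    for event in events:
        if event[key] in seen:
            continue
        seen.add(event[key])
        yield event


incoming = [{"event_id": 1}, {"event_id": 1}, {"event_id": 2}]
unique_events = list(deduplicate(incoming))
```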

The Delta Stream Processing Framework consists of two key modules: the DSL & API module and the Runtime module. The DSL & API module provides the DSL and the UDF (User-Defined Function) APIs so that users can write their own processing logic (such as filtering or transformation). The Runtime module provides an implementation of the DSL parser, which builds an internal representation of the processing steps as DAG models. The Execution component interprets the DAG models to initialize the actual Flink operators and ultimately run the Flink application. The architecture of the framework is illustrated in the following figure.

Figure 4. Architecture of the Delta Stream Processing Framework
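
The split between an annotation-style DSL, a DAG model, and an executor can be imitated in plain Python. This sketch is purely illustrative: the `@step` decorator, the step names, and the `run` function are hypothetical and do not reflect Delta's actual DSL or any Flink API. Decorators stand in for annotations, and the standard-library `graphlib` module (Python 3.9+) orders the steps.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Optional, Set, Tuple

# Registry filled in by the decorator: step name -> (function, upstream step names).
STEPS: Dict[str, Tuple[Callable, Tuple[str, ...]]] = {}


def step(name: str, after: Tuple[str, ...] = ()):
    """Hypothetical annotation: register a user function as a processing step."""
    def register(func: Callable) -> Callable:
        STEPS[name] = (func, tuple(after))
        return func
    return register


@step("filter")
def keep_movies(event: dict) -> Optional[dict]:
    return event if event.get("type") == "movie" else None


@step("enrich", after=("filter",))
def add_talent(event: dict) -> dict:
    return {**event, "talent": ["Actor A"]}


def build_dag(steps) -> Dict[str, Set[str]]:
    """'Parser' stage: turn the registered steps into a dependency graph."""
    return {name: set(deps) for name, (_, deps) in steps.items()}


def run(event: Optional[dict], steps=STEPS) -> Optional[dict]:
    """'Execution' stage: run the steps in dependency order on one event."""
    for name in TopologicalSorter(build_dag(steps)).static_order():
        func, _ = steps[name]
        event = func(event)
        if event is None:  # a step filtered the event out
            return None
    return event


enriched = run({"type": "movie", "id": 1})
dropped = run({"type": "deal", "id": 2})
```

User code here consists only of the two decorated functions; ordering and execution live in the framework half, which is the separation the architecture above aims for.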

This approach has several advantages:

  • Users can focus on their business logic without having to dig into the specifics of Flink or the SPaaS framework.
  • Optimizations can be made in a way that is transparent to users, and bugs can be fixed without requiring any changes to user code (UDFs).
  • Operating Delta applications is simplified for users, because the platform provides resilience and fault tolerance out of the box and collects many granular metrics that can be used for alerting.

Production Usage

Delta has been running in production for over a year and plays a key role in many Netflix Studio applications. It has helped teams implement use cases such as search indexing, data warehousing, and event-driven workflows. Below is an overview of the high-level architecture of the Delta platform.

Figure 5. High-level architecture of Delta

Acknowledgements

We would like to thank the following people who were involved in building and developing Delta at Netflix: Allen Wang, Charles Zhao, Jaebin Yoon, Josh Snyder, Kasturi Chatterjee, Mark Cho, Olof Johansson, Piyush Goyal, Prashanth Ramdas, Raghuram Onti Srinivasan, Sandeep Gupta, Steven Wu, Tharanga Gamaethige, Yun Wang, and Zhenzhong Xu.

Sources

  1. dev.mysql.com/doc/refman/5.7/en/implicit-commit.html
  2. dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.html
  3. Martin Kleppmann, Alastair R. Beresford, Boerge Svingen: Online event processing. Commun. ACM 62(5): 43-49 (2019). DOI: doi.org/10.1145/3312527

source: www.habr.com
