Open Source DataHub: LinkedIn's Metadata Search and Discovery Platform
Finding the right data quickly is critical for any company that relies on large volumes of data to make data-driven decisions. This affects not only the productivity of data users (including analysts, machine learning developers, data scientists, and data engineers), but also has a direct impact on end products that depend on a machine learning (ML) pipeline. In addition, the trend toward adopting or building machine learning platforms naturally raises the question: what is your method for internal discovery of features, models, metrics, datasets, and so on?
In this article, we describe how we released DataHub, our metadata search and discovery platform, under an open source license.
WhereHows is now DataHub!
LinkedIn's metadata team previously introduced DataHub, the successor to WhereHows.
Open source approaches
WhereHows, LinkedIn's original portal for data discovery and lineage, started as an internal project; the metadata team then open sourced it.
First try: "Open source first"
We initially followed an "open source first" development model, in which most development happens in the open source repository and changes are pulled in for internal deployment. The problem with this approach is that code is always pushed to GitHub before it has been fully vetted internally. Until changes are pulled from the open source repository and a new internal deployment is made, we do not discover any production issues. When a deployment went bad, it was also very hard to identify the culprit because changes were pulled in batches.
In addition, this model reduced the team's productivity when developing new features that required rapid iteration, because it forced every change to be pushed to the open source repository first and then pulled into the internal repository. To shorten the turnaround time, a needed fix or change could be made in the internal repository first, but this became a major pain point when merging those changes back into the open source repository, because the two repositories had drifted out of sync.
This model is much easier to apply to shared platforms, libraries, or infrastructure projects than to full-featured web applications. It is also well suited to projects that start out as open source from day one, but WhereHows was built as an entirely internal web application. It was very difficult to abstract away all internal dependencies completely, so we had to keep an internal fork, and keeping an internal fork while developing mostly in open source did not work out.
Second try: "Internal first"
As a second attempt, we moved to an "internal first" development model, in which most development happens internally and changes are pushed to the open source code on a regular basis. Although this model suits our use case best, it has inherent problems. Pushing all the diffs directly to the open source repository and then resolving merge conflicts later is an option, but it is time-consuming. In most cases developers try not to do this every time they check in their code. As a result, it happens much less often, in batches, which makes the merge conflicts even harder to resolve later.
The third time it worked!
The two failed attempts described above left the WhereHows GitHub repository out of date for a long time. The team kept improving the product's features and architecture, so LinkedIn's internal version of WhereHows became more and more advanced than the open source version. It even got a new name: DataHub. Based on the earlier failed attempts, the team decided to develop a scalable, long-term solution.
For any new open source project, LinkedIn's open source team recommends and supports a development model in which the project's modules are developed entirely in open source. Versioned artifacts are published to a public repository and then checked back into LinkedIn's internal artifact repository.
However, a mature back-end application such as DataHub needs a significant amount of time to reach that state. This also rules out open sourcing a fully working implementation before all internal dependencies have been fully abstracted away. That is why we developed tooling that helps us make open source contributions faster and with much less pain. This solution benefits both the metadata team (the developers of DataHub) and the open source community. The following sections discuss this new approach.
Open source publishing automation
The metadata team's latest approach to open sourcing DataHub is to develop a tool that automatically syncs the internal codebase with the open source repository. High-level features of this toolkit include:
- Sync LinkedIn code to and from open source, similar to rsync.
- License header generation, similar to Apache Rat.
- Automatic generation of open source commit logs from internal commit logs.
- Prevention of internal changes that break the open source build, through dependency testing.
The following subsections look more closely at those features above that pose interesting problems.
Source code sync
Unlike the open source version of DataHub, which is a single GitHub repository, the LinkedIn version of DataHub is a combination of multiple repositories.
Figure 1: Syncing between LinkedIn DataHub repositories and the single open source DataHub repository
To support automated build, push, and pull workflows, our new tool automatically creates a file-level mapping for each source file. However, the toolkit requires initial configuration, and users must provide a high-level module mapping as shown below.
{
"datahub-dao": [
"${datahub-frontend}/datahub-dao"
],
"gms/impl": [
"${dataset-gms}/impl",
"${user-gms}/impl"
],
"metadata-dao": [
"${metadata-models}/metadata-dao"
],
"metadata-builders": [
"${metadata-models}/metadata-builders"
]
}
The module-level mapping is a simple JSON document whose keys are the target modules in the open source repository and whose values are lists of source modules in the LinkedIn repositories. Any target module in the open source repository can be fed by any number of source modules. To indicate internal repository names in source modules, use the ${repository-name} placeholder syntax shown in the example above.
{
"${metadata-models}/metadata-builders/src/main/java/com/linkedin/Foo.java":
"metadata-builders/src/main/java/com/linkedin/Foo.java",
"${metadata-models}/metadata-builders/src/main/java/com/linkedin/Bar.java":
"metadata-builders/src/main/java/com/linkedin/Bar.java",
"${metadata-models}/metadata-builders/build.gradle": null
}
The file-level mapping is created automatically by the tools; however, it can also be updated manually by the user. It is a 1:1 mapping from a LinkedIn source file to a file in the open source repository. Several rules apply to this automatic creation of file associations:
- When a target module in open source has multiple source modules, conflicts can arise, for example the same FQCN (fully qualified class name) existing in more than one source module. As a conflict resolution strategy, our tools default to the "last one wins" option.
- "null" means that the source file is not part of the open source repository.
- After each open source push or pull, this mapping is updated automatically and a snapshot is created. This is necessary to identify additions to and deletions from the source code since the last action.
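The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolkit's actual implementation: the function names build_file_mapping and diff_snapshots, the repo_files input (a listing of files per internal repository), and the choice to map the losing file of a conflict to null are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches "${repo-name}/module/path" as used in the module-level mapping.
PLACEHOLDER = re.compile(r"^\$\{([^}]+)\}/?(.*)$")


def build_file_mapping(
    module_mapping: Dict[str, List[str]],
    repo_files: Dict[str, List[str]],
) -> Dict[str, Optional[str]]:
    """Expand a module-level mapping into a file-level mapping.

    module_mapping: open source target module -> list of "${repo}/module" sources.
    repo_files: repo name -> relative file paths (assumed input; a real tool
    would scan the repository checkouts on disk).
    """
    target_owner: Dict[str, str] = {}  # target path -> source currently mapped to it
    file_mapping: Dict[str, Optional[str]] = {}
    for target_module, sources in module_mapping.items():
        for source in sources:
            match = PLACEHOLDER.match(source)
            if match is None:
                raise ValueError(f"source module lacks a ${{repo}} prefix: {source}")
            repo, module = match.group(1), match.group(2)
            prefix = module + "/" if module else ""
            for path in repo_files.get(repo, []):
                if not path.startswith(prefix):
                    continue
                source_path = f"${{{repo}}}/{path}"
                target_path = f"{target_module}/{path[len(prefix):]}"
                previous = target_owner.get(target_path)
                if previous is not None:
                    # "Last one wins": the earlier source is excluded (assumption:
                    # excluded files are recorded as null, i.e. None).
                    file_mapping[previous] = None
                target_owner[target_path] = source_path
                file_mapping[source_path] = target_path
    return file_mapping


def diff_snapshots(
    old: Dict[str, Optional[str]], new: Dict[str, Optional[str]]
) -> Tuple[List[str], List[str]]:
    """Return (added, removed) open source paths between two mapping snapshots."""
    old_targets = {t for t in old.values() if t is not None}
    new_targets = {t for t in new.values() if t is not None}
    return sorted(new_targets - old_targets), sorted(old_targets - new_targets)


if __name__ == "__main__":
    mapping = build_file_mapping(
        {"gms/impl": ["${dataset-gms}/impl", "${user-gms}/impl"]},
        {
            "dataset-gms": ["impl/src/Foo.java", "impl/src/Shared.java"],
            "user-gms": ["impl/src/Bar.java", "impl/src/Shared.java"],
        },
    )
    for source_path, target_path in mapping.items():
        print(source_path, "->", target_path)
```

Comparing target paths rather than source paths in diff_snapshots keeps the result expressed in terms of the open source repository, which is where additions and deletions have to be applied.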
Commit log generation
Commit logs for open source commits are also generated automatically, by merging the commit logs of the internal repositories. Below is a sample commit log showing the structure of a log generated by our tool. A commit clearly indicates which versions of the source repositories are packaged in that commit and provides a summary of the commit log.
metadata-models 29.0.0 -> 30.0.0
Added aspect model foo
Fixed issue bar
dataset-gms 2.3.0 -> 2.3.4
Added rest.li API to serve foo aspect
MP_VERSION=dataset-gms:2.3.4
MP_VERSION=metadata-models:30.0.0
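A minimal sketch of how such a log could be assembled is shown below. The RepoUpdate type and render_commit_log function are hypothetical names introduced for illustration, and ordering the MP_VERSION trailer lines by repository name is an assumption inferred from the sample, not a documented rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepoUpdate:
    """One internal repository's contribution to an open source commit (hypothetical type)."""
    name: str
    old_version: str
    new_version: str
    messages: List[str]


def render_commit_log(updates: List[RepoUpdate]) -> str:
    """Merge per-repository commit messages into one open source commit log."""
    lines: List[str] = []
    for update in updates:
        # Header line: which repository moved from which version to which.
        lines.append(f"{update.name} {update.old_version} -> {update.new_version}")
        lines.extend(update.messages)
    # Trailer lines pin the exact internal versions packaged in this commit.
    # Sorting by repository name is an assumption based on the sample above.
    for update in sorted(updates, key=lambda u: u.name):
        lines.append(f"MP_VERSION={update.name}:{update.new_version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_commit_log(
            [
                RepoUpdate("metadata-models", "29.0.0", "30.0.0",
                           ["Added aspect model foo", "Fixed issue bar"]),
                RepoUpdate("dataset-gms", "2.3.0", "2.3.4",
                           ["Added rest.li API to serve foo aspect"]),
            ]
        )
    )
```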
Dependency testing
LinkedIn has a dependency testing infrastructure. It is a useful mechanism that helps prevent any internal commit that would break the open source build, by detecting the breakage at commit time. Without it, it would be quite difficult to determine which internal commit caused the open source repository build to fail, because we batch internal changes into the open source DataHub repository.
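The gating idea can be illustrated with a small sketch. It assumes the module-level mapping shown earlier is available, and it treats the open source build as an opaque callable; affected_targets, gate_commit, and build_open_source are hypothetical names, and LinkedIn's actual infrastructure is not described in enough detail here to reproduce.

```python
from typing import Callable, Dict, List

# Module-level mapping copied from the example earlier in the article.
MODULE_MAPPING: Dict[str, List[str]] = {
    "datahub-dao": ["${datahub-frontend}/datahub-dao"],
    "gms/impl": ["${dataset-gms}/impl", "${user-gms}/impl"],
    "metadata-dao": ["${metadata-models}/metadata-dao"],
    "metadata-builders": ["${metadata-models}/metadata-builders"],
}


def affected_targets(module_mapping: Dict[str, List[str]], changed_repo: str) -> List[str]:
    """List the open source modules fed by the internal repository that changed."""
    placeholder = "${" + changed_repo + "}"
    return sorted(
        target
        for target, sources in module_mapping.items()
        if any(s == placeholder or s.startswith(placeholder + "/") for s in sources)
    )


def gate_commit(
    module_mapping: Dict[str, List[str]],
    changed_repo: str,
    build_open_source: Callable[[List[str]], bool],
) -> bool:
    """Accept an internal commit only if the open source build still passes.

    build_open_source is a stand-in for whatever runs the open source build;
    it receives the affected target modules and returns True on success.
    """
    targets = affected_targets(module_mapping, changed_repo)
    if not targets:
        return True  # the change cannot affect the open source build
    return build_open_source(targets)
```

Checking each commit individually is what makes the culprit obvious: a failure is attributed to the commit being gated rather than to a whole batch.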
Differences between open source DataHub and our production version
So far we have discussed our solution for syncing the two versions of the DataHub repositories, but we have not yet explained why we need two separate development streams in the first place. In this section, we list the differences between the public version of DataHub and the production version running on LinkedIn's servers, and explain the reasons for these differences.
One source of divergence is that our production version depends on code that is not yet open sourced, such as LinkedIn's Offspring (LinkedIn's internal dependency injection framework). Offspring is widely used in internal codebases because it is the preferred method for managing dynamic configuration. But it is not open source, so we needed to find open source alternatives for open source DataHub.
There are other reasons as well. As we build extensions to the metadata model for LinkedIn's needs, these extensions tend to be very specific to LinkedIn and may not apply directly to other environments. For example, we have very specific labels for member IDs and other kinds of compliance metadata. So, for now, we have left these extensions out of the open source DataHub metadata model. As we engage with the community and understand its needs, we will work on generalized open source versions of these extensions where they are needed.
Ease of use and easier adoption by the open source community also motivated some of the differences between the two versions of DataHub. Differences in stream processing infrastructure are a good example. Although our internal version uses a managed stream processing framework, we chose embedded (standalone) stream processing for the open source version because it avoids creating yet another infrastructure dependency.
Another example is having a single GMS (Generalized Metadata Store) in the open source implementation instead of multiple GMSs. GMA (Generalized Metadata Architecture) is the name of DataHub's back-end architecture, and GMS is the metadata store in the context of GMA. GMA is a very flexible architecture that lets you distribute each data construct (for example, datasets, users, and so on) to its own metadata store, or keep multiple data constructs in a single metadata store, as long as the registry that maps data constructs to GMSs is kept up to date. For ease of use, we chose a single GMS instance that stores all the different data constructs in open source DataHub.
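To make the registry idea concrete, here is a minimal sketch. The dictionary layout, the resolve_gms function, and the example.internal URLs are all hypothetical; the article does not describe the real registry format.

```python
from typing import Dict

# Distributed layout: each data construct has its own metadata store.
# All URLs below are placeholders, not real endpoints.
DISTRIBUTED_GMS_REGISTRY: Dict[str, str] = {
    "dataset": "http://dataset-gms.example.internal",
    "user": "http://user-gms.example.internal",
}

# Single-GMS layout: every construct resolves to the same store.
SINGLE_GMS_REGISTRY: Dict[str, str] = {
    "dataset": "http://datahub-gms.example.internal",
    "user": "http://datahub-gms.example.internal",
}


def resolve_gms(registry: Dict[str, str], construct: str) -> str:
    """Look up which metadata store serves a given data construct."""
    try:
        return registry[construct]
    except KeyError:
        raise LookupError(f"no GMS registered for data construct {construct!r}") from None


if __name__ == "__main__":
    for name in ("dataset", "user"):
        print(name, "->", resolve_gms(SINGLE_GMS_REGISTRY, name))
```

Callers only ever ask the registry where a construct lives, so moving from one store to several (or back) is a change to the registry contents rather than to the calling code.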
A complete list of differences between the two implementations is given in the table below.
Product features | LinkedIn DataHub | Open source DataHub
--- | --- | ---
Supported data constructs | 1) Datasets 2) Users 3) Metrics 4) ML features 5) Charts 6) Dashboards | 1) Datasets 2) Users
Supported metadata sources for datasets | | Hive, Kafka, RDBMS
Pub-sub | | Confluent Kafka
Stream processing | Managed | Embedded (standalone)
Dependency injection and dynamic configuration | LinkedIn Offspring |
Build tooling | Ligradle (LinkedIn's internal Gradle wrapper) |
CI/CD | CRT (LinkedIn's internal CI/CD) |
Metadata stores | Distributed multiple GMS: 1) Dataset GMS 2) User GMS 3) Metric GMS 4) Feature GMS 5) Chart/Dashboard GMS | Single GMS for: 1) Datasets 2) Users
Microservices in Docker containers
Figure 2: Architecture of open source DataHub
You can see the high-level architecture of DataHub in the figure above. Apart from the infrastructure components, it has four different Docker containers:
datahub-gms: metadata store service
datahub-frontend: application
datahub-mce-consumer: application
datahub-mae-consumer: application
The open source repository documentation describes these services in more detail.
CI/CD for open source DataHub
The open source DataHub repository uses Docker Hub for continuous deployment.
With every commit to the open source DataHub repository, all Docker images are built automatically and published to Docker Hub with the "latest" tag.
Using DataHub
- Clone the open source repository and run all Docker containers with docker-compose, using the provided docker-compose script for a quick start.
- Ingest the sample data provided in the repository using the command-line tool that is also provided.
- Browse DataHub in your browser.
The project is actively monitored.
Future plans
Currently, every infrastructure component and microservice of open source DataHub is built as a Docker container, and the whole system is orchestrated using docker-compose.
We also plan to provide a turnkey solution for deploying DataHub on a public cloud service.
Last but not least, thanks to all the early adopters of DataHub in the open source community who evaluated the DataHub alphas and helped us identify issues and improve the documentation.
Source: www.habr.com