Open Sourcing DataHub: LinkedIn's Metadata Search and Discovery Platform
Finding the data you need quickly is essential for any company that relies on large amounts of data to make data-driven decisions. This affects not only the productivity of data users (including analysts, machine learning developers, data scientists, and data engineers), but also the end products that depend on a high-quality machine learning (ML) pipeline. In addition, the trend toward adopting or building machine learning platforms raises a question: what is your internal method for discovering features, models, metrics, datasets, and so on?
In this article, we describe how we released DataHub under an open source license.
WhereHows is now DataHub!
LinkedIn's metadata team previously presented DataHub, the successor to WhereHows.
Open Source Approaches
WhereHows, LinkedIn's original portal for data discovery and lineage, started as an internal project; the metadata team later open-sourced it.
First attempt: "Open source first"
We initially followed an "open source first" development model, in which most development happens in the open source repository and changes are then pulled in for internal deployment. The problem with this approach is that code is always pushed to GitHub before it has been fully vetted internally. Until changes were pulled from the open source repository and a new internal deployment was made, we would not discover any production issues. When a deployment went badly, it was also hard to identify the culprit, because changes were pulled in batches.
In addition, this model reduced the team's productivity when developing new features that required rapid iteration, since it forced every change to be pushed to the open source repository first and only then to the internal repository. To shorten turnaround time, a required fix or change could be made in the internal repository first, but this became a major problem when merging those changes back into the open source repository, because the two repositories had drifted out of sync.
This model is much easier to apply to shared platforms, libraries, or infrastructure projects than to full-featured custom web applications. It is also well suited to projects that are open source from day one, but WhereHows was built as a purely internal web application. It was very difficult to abstract away all the internal dependencies, so we had to keep an internal fork, and maintaining an internal fork while developing mostly in open source did not really work.
Second attempt: "Internal first"
As a second attempt, we moved to an "internal first" development model, in which most development happens in-house and changes are pushed to the open source code periodically. Although this model suits our use case best, it has inherent problems. Pushing every diff directly to the open source repository and resolving merge conflicts afterwards is an option, but it is time-consuming. In most cases developers try not to do this every time they check in code. As a result, it happens much less often, in batches, which makes the merge conflicts harder to resolve later.
The third time worked!
The two failed attempts described above left the WhereHows GitHub repository out of date for a long time. The team kept improving the product's features and architecture, so the internal version of WhereHows at LinkedIn became more advanced than the open source version. It even got a new name: DataHub. Learning from the earlier failed attempts, the team decided to build a scalable, long-term solution.
For any new open source project, LinkedIn's open source team recommends and supports a development model in which the project's modules are developed entirely in open source. Versioned artifacts are published to a public repository and then brought back into LinkedIn's internal artifact repository.
However, a large back-end application such as DataHub needs a significant amount of time to reach that state. It also rules out open sourcing a fully working implementation before all internal dependencies have been completely abstracted away. That is why we built tooling that helps us make open source contributions faster and with far less pain. This solution benefits both the metadata team (the DataHub developers) and the open source community. The following sections discuss this new approach.
Open Source Publishing Automation
The metadata team's latest approach to open sourcing DataHub is to build a tool that automatically synchronizes the internal codebase and the open source repository. The high-level features of this toolkit include:
- Syncing LinkedIn code to and from open source, similar to rsync.
- Generating license headers, similar to Apache Rat.
- Automatically generating open source commit logs from internal commit logs.
- Preventing internal changes that break the open source build, by means of dependency testing.
The following subsections look more closely at those of the features above that pose interesting problems.
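One of the features listed above, license header generation, can be illustrated with a minimal Python sketch. It is not the team's tool and not how Apache Rat works internally; the header text and the function names are hypothetical, chosen only to show the idea of stamping files idempotently before they are published.

```python
# Hypothetical one-line header; a real project would use its full license notice.
LICENSE_HEADER = "// Licensed under the Apache License, Version 2.0\n"


def has_license_header(source: str) -> bool:
    """Return True if the file content already starts with the header."""
    return source.startswith(LICENSE_HEADER)


def add_license_header(source: str) -> str:
    """Prepend the header unless it is already present (idempotent)."""
    if has_license_header(source):
        return source
    return LICENSE_HEADER + source


java_source = "package com.linkedin;\n\npublic class Foo {}\n"
stamped = add_license_header(java_source)
```

Because the function is idempotent, running it on every sync does not stack duplicate headers on files that were already stamped.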
Source code synchronization
Unlike the open source version of DataHub, which is a single GitHub repository, the LinkedIn version of DataHub is a combination of multiple internal repositories.
Figure 1: Synchronization between the LinkedIn DataHub repositories and the single open source DataHub repository
To support automated build, push, and pull workflows, our new tool automatically creates a file-level mapping for every source file. The toolkit does, however, need some initial configuration: users must provide a high-level module mapping like the one shown below.
{
  "datahub-dao": [
    "${datahub-frontend}/datahub-dao"
  ],
  "gms/impl": [
    "${dataset-gms}/impl",
    "${user-gms}/impl"
  ],
  "metadata-dao": [
    "${metadata-models}/metadata-dao"
  ],
  "metadata-builders": [
    "${metadata-models}/metadata-builders"
  ]
}
The module-level mapping is a simple JSON document whose keys are the target modules in the open source repository and whose values are lists of source modules in the LinkedIn repositories. Any target module in the open source repository can be fed by any number of source modules. To refer to internal repository names in source modules, use ${...} interpolation, as in ${metadata-models}. From the module-level mapping, the tool derives a file-level mapping such as the following:
{
  "${metadata-models}/metadata-builders/src/main/java/com/linkedin/Foo.java":
    "metadata-builders/src/main/java/com/linkedin/Foo.java",
  "${metadata-models}/metadata-builders/src/main/java/com/linkedin/Bar.java":
    "metadata-builders/src/main/java/com/linkedin/Bar.java",
  "${metadata-models}/metadata-builders/build.gradle": null
}
The file-level mapping is generated automatically by the tools; however, the user can also update it manually. It is a 1:1 mapping from a LinkedIn source file to a file in the open source repository. Several rules govern this automatic creation of file associations:
- When a target module in open source has multiple source modules, conflicts can arise, for example the same FQCN existing in more than one source module. As a conflict resolution strategy, our tools default to the "last one wins" option.
- "null" means that the source file is not part of the open source repository.
- After every open source push or pull, this mapping is updated automatically and a snapshot is created. This is needed to detect additions to and deletions from the source code since the last action.
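The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual toolkit: the function names, the `source_files` input (relative file paths per source module), and the `excluded` set are all hypothetical, and mapping the losing file of a conflict to null is one possible reading of "last one wins" that keeps the mapping 1:1.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def build_file_mapping(
    module_mapping: Dict[str, List[str]],
    source_files: Dict[str, List[str]],
    excluded: FrozenSet[str] = frozenset(),
) -> Dict[str, Optional[str]]:
    """Derive a source-file -> target-file mapping from a module-level mapping.

    Files whose relative path is in `excluded` map to None (JSON null).
    """
    mapping: Dict[str, Optional[str]] = {}
    claimed: Dict[str, str] = {}  # target path -> source path currently winning
    for target_module, source_modules in module_mapping.items():
        for source_module in source_modules:
            for rel_path in source_files.get(source_module, []):
                source_path = f"{source_module}/{rel_path}"
                if rel_path in excluded:
                    mapping[source_path] = None
                    continue
                target_path = f"{target_module}/{rel_path}"
                previous = claimed.get(target_path)
                if previous is not None:
                    # Conflict: the later source module wins (assumption:
                    # the earlier file is then treated as not published).
                    mapping[previous] = None
                claimed[target_path] = source_path
                mapping[source_path] = target_path
    return mapping


def diff_snapshots(
    previous: Dict[str, Optional[str]], current: Dict[str, Optional[str]]
) -> Tuple[List[str], List[str]]:
    """Return (added, removed) source files between two mapping snapshots."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return added, removed


module_mapping = {"gms/impl": ["${dataset-gms}/impl", "${user-gms}/impl"]}
source_files = {
    "${dataset-gms}/impl": ["src/Foo.java", "build.gradle"],
    "${user-gms}/impl": ["src/Foo.java"],
}
result = build_file_mapping(
    module_mapping, source_files, excluded=frozenset({"build.gradle"})
)
```

In this example both source modules contain src/Foo.java, so the copy from ${user-gms}/impl, listed last, is the one mapped into gms/impl. Comparing the new mapping with the previous snapshot via diff_snapshots gives the files added or deleted since the last sync.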
Commit log generation
Commit logs for open source commits are also generated automatically, by merging the commit logs of the internal repositories. Below is a sample commit log that shows the structure of the log our tool generates. A commit clearly states which versions of the source repositories are packaged in it and gives a summary of their commit logs.
metadata-models 29.0.0 -> 30.0.0
    Added aspect model foo
    Fixed issue bar

dataset-gms 2.3.0 -> 2.3.4
    Added rest.li API to serve foo aspect

MP_VERSION=dataset-gms:2.3.4
MP_VERSION=metadata-models:30.0.0
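A log in this shape could be assembled roughly as follows. This is a hedged sketch rather than the real generator: the RepoUpdate type and format_commit_log function are hypothetical, and both the indentation of the summary lines and the alphabetical ordering of the MP_VERSION trailer lines are assumptions inferred from the sample above.

```python
from typing import List, NamedTuple


class RepoUpdate(NamedTuple):
    """One internal repository's version bump and its commit messages."""
    name: str
    old_version: str
    new_version: str
    messages: List[str]


def format_commit_log(updates: List[RepoUpdate]) -> str:
    """Merge per-repository logs into a single open source commit message."""
    lines: List[str] = []
    for update in updates:
        lines.append(f"{update.name} {update.old_version} -> {update.new_version}")
        lines.extend(f"    {message}" for message in update.messages)
        lines.append("")  # blank line between sections
    # Trailer lines record exactly which internal versions were packaged.
    for update in sorted(updates, key=lambda u: u.name):
        lines.append(f"MP_VERSION={update.name}:{update.new_version}")
    return "\n".join(lines)


log = format_commit_log([
    RepoUpdate("metadata-models", "29.0.0", "30.0.0",
               ["Added aspect model foo", "Fixed issue bar"]),
    RepoUpdate("dataset-gms", "2.3.0", "2.3.4",
               ["Added rest.li API to serve foo aspect"]),
])
```

Keeping the version trailer machine-readable means a later sync can tell which internal versions the open source repository already contains.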
Dependency testing
LinkedIn has a dependency testing infrastructure.
This is a useful mechanism: it helps prevent any internal commit from breaking the open source build and detects the breakage at commit time. Without it, it would be hard to determine which internal commit caused the open source build to fail, because we push internal changes to the DataHub open source repository in batches.
Differences between open source DataHub and our production version
So far we have discussed our solution for synchronizing the two versions of the DataHub repositories, but we have not yet explained why we need two separate development streams in the first place. In this section, we list the differences between the public version of DataHub and the production version running on LinkedIn servers, and explain the reasons for them.
One source of divergence is that our production version depends on code that is not yet open source, such as LinkedIn's Offspring (LinkedIn's internal dependency injection framework). Offspring is widely used in internal codebases because it is the preferred way to manage dynamic configuration. But it is not open source, so we had to find open source alternatives for the open source DataHub.
There are other reasons as well. When we create extensions to the metadata model for LinkedIn's needs, those extensions are usually very specific to LinkedIn and may not apply directly to other environments. For example, we have very specific labels for participant IDs and other kinds of related metadata. We have therefore excluded these extensions from DataHub's open source metadata model. As we engage with the community and understand its needs, we will work on generic open source versions of these extensions where they are needed.
Ease of use and easier adoption by the open source community also motivated some of the differences between the two versions of DataHub. The stream processing infrastructure is a good example. Although our internal version uses a managed stream processing framework, we chose embedded (standalone) stream processing for the open source version because it avoids creating yet another infrastructure dependency.
Another example is having a single GMS (Generalized Metadata Store) in the open source implementation rather than multiple GMSs. GMA (Generalized Metadata Architecture) is the name of DataHub's back-end architecture, and GMS is the metadata store in the context of GMA. GMA is a very flexible architecture: it lets you distribute each data construct (e.g. datasets, users, etc.) into its own metadata store, or keep multiple data constructs in a single metadata store, as long as the registry that maps data constructs to GMSs is kept up to date. For ease of use, we chose a single GMS instance that stores all the different data constructs in the open source DataHub.
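The role of the registry can be illustrated with a small Python sketch. The GmsRegistry class is hypothetical and says nothing about how GMA actually implements its registry; it only shows that the same lookup supports both a distributed layout and a single-GMS layout. The service names reuse identifiers that appear elsewhere in this article.

```python
from typing import Dict


class GmsRegistry:
    """Maps each data construct (e.g. "dataset") to the GMS that stores it."""

    def __init__(self, construct_to_gms: Dict[str, str]) -> None:
        self._construct_to_gms = dict(construct_to_gms)

    def resolve(self, construct: str) -> str:
        """Return the GMS responsible for the given data construct."""
        try:
            return self._construct_to_gms[construct]
        except KeyError:
            raise LookupError(
                f"no GMS registered for construct {construct!r}"
            ) from None


# Distributed layout: every construct has its own metadata store.
distributed = GmsRegistry({"dataset": "dataset-gms", "user": "user-gms"})

# Single-GMS layout: all constructs share one metadata store.
single = GmsRegistry({"dataset": "datahub-gms", "user": "datahub-gms"})
```

Callers only ever ask the registry where a construct lives, so switching between the two layouts is a configuration change rather than a code change.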
The full list of differences between the two implementations is given in the table below.
Product Features | LinkedIn DataHub | Open Source DataHub
Supported Data Constructs | 1) Datasets 2) Users 3) Metrics 4) ML Features 5) Charts 6) Dashboards | 1) Datasets 2) Users
Supported Metadata Sources for Datasets | | Hive, Kafka, RDBMS
Pub-sub | | Confluent Kafka
Stream Processing | Managed | Embedded (standalone)
Dependency Injection & Dynamic Configuration | LinkedIn Offspring |
Build Tooling | Ligradle (LinkedIn's internal Gradle wrapper) |
CI/CD | CRT (LinkedIn's internal CI/CD) |
Metadata Stores | Distributed multiple GMS: 1) Dataset GMS 2) User GMS 3) Metric GMS 4) Feature GMS 5) Chart/Dashboard GMS | Single GMS for: 1) Datasets 2) Users
Microservices in Docker containers
Figure 2: Architecture of open source DataHub
The figure above shows the high-level architecture of DataHub. Apart from the infrastructure components, it has four different Docker containers:
datahub-gms: metadata store service
datahub-frontend: application
datahub-mce-consumer: application
datahub-mae-consumer: application
More details are available in the open source repository documentation.
CI/CD for open source DataHub
The open source DataHub repository uses continuous integration. With every commit to the DataHub open source repository, all Docker images are built automatically and pushed to Docker Hub with the "latest" tag.
Using DataHub
- Clone the open source repository and start all the Docker containers with docker-compose, using the provided docker-compose script for a quick start.
- Load the sample data provided in the repository using the command-line tool that is also provided.
- Browse DataHub in your browser.
Plans for the future
Currently, every infrastructure component and microservice of open source DataHub is built as a Docker container, and the whole system is orchestrated with docker-compose.
We also plan to provide a turnkey solution for deploying DataHub on public cloud services.
Last but not least, we thank all the early adopters of DataHub in the open source community who evaluated the DataHub alphas and helped us identify issues and improve the documentation.
Source: www.habr.com