Open Source DataHub: LinkedIn's Metadata Search and Discovery Platform
Finding the right data quickly is critical for any company that relies on large volumes of data to make data-driven decisions. This affects not only the productivity of data users (including analysts, machine learning developers, data scientists, and data engineers), but also the end products that depend on a high-quality machine learning (ML) pipeline. In addition, the trend toward adopting or building machine learning platforms raises a natural question: what is your method for internally discovering features, models, metrics, datasets, and so on?
In this article we describe how we open sourced DataHub, our metadata search and discovery platform, starting from the project's early days as WhereHows. LinkedIn maintains its own version of DataHub separately from the open source version. We start by explaining why we need two separate development environments, then discuss our early approaches to open sourcing WhereHows and compare our internal (production) version of DataHub with the version on GitHub. We also share details about our new automated solution for pushing and pulling open source updates to keep both repositories in sync. Finally, we give instructions on how to get started with the open source DataHub and briefly discuss its architecture.
WhereHows is now DataHub!
LinkedIn's metadata team previously introduced DataHub (the successor to WhereHows), LinkedIn's metadata search and discovery platform, and shared plans to open source it. Shortly after that announcement, we released an alpha version of DataHub and shared it with the community. Since then, we have kept contributing to the repository and working with interested users to add the most requested features and resolve issues. We are now happy to announce the official release of DataHub on GitHub.
Open source approaches
WhereHows, LinkedIn's original portal for finding data and where it comes from, started as an internal project; the metadata team open sourced it in 2016. Since then, the team has maintained two separate codebases, one for open source and one for internal use at LinkedIn, because not every feature built for LinkedIn use cases was applicable to a broader audience. In addition, WhereHows has internal dependencies (infrastructure, libraries, and so on) that are not open source. In the years that followed, WhereHows went through many iterations and development cycles, which made keeping the two codebases in sync a major challenge. Over the years, the metadata team has tried several approaches to keep internal and open source development synchronized.
First attempt: "Open source first"
We initially followed an "open source first" development model, in which most development happens in the open source repository and changes are then pulled in for internal deployment. The problem with this approach is that code is always pushed to GitHub before it has been fully reviewed internally. Until changes were pulled from the open source repository and a new internal deployment was made, we could not discover production issues. When a deployment went badly, it was also hard to identify the culprit, because changes were pulled in batches.
The metadata team's most recent approach to open sourcing DataHub is a tool that automatically syncs the internal codebase with the open source repository. The capabilities of this tooling discussed below are source code synchronization, commit log generation, and dependency testing.
Unlike the open source version of DataHub, which is a single GitHub repository, LinkedIn's version of DataHub is a combination of several repositories (known internally as multiproducts). The DataHub frontend, the metadata models library, the metadata store backend service, and the streaming jobs live in separate repositories at LinkedIn. To make things easier for open source users, however, we keep a single repository for the open source version of DataHub.
Figure 1: Synchronization between the LinkedIn DataHub repositories and the single open source DataHub repository
To support automated build, push, and pull workflows, our new tooling automatically creates a file-level mapping that covers every source file. The tooling does require some initial configuration, though: users must provide a high-level module mapping, described next.
The module-level mapping is a simple JSON document whose keys are the target modules in the open source repository and whose values are lists of source modules in the LinkedIn repositories. Any target module in the open source repository can be fed by any number of source modules. Internal repository names inside source module paths are written using Bash-style string interpolation. From the module-level mapping file, the tooling creates a file-level mapping file by scanning all files in the associated directories.
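As a rough illustration of the module-level mapping, the sketch below parses a small JSON mapping and expands the Bash-style `${repo-name}` placeholders. The module names (`gms/impl`, `metadata-builders`), the `user-gms` repository name, the `/src/...` checkout paths, and the helper functions are all hypothetical and chosen only for the example; this is not the actual LinkedIn tooling.

```python
import json
import re
from typing import Dict, List

# Hypothetical module-level mapping: open source target module -> internal source modules.
MODULE_MAPPING_JSON = """
{
  "gms/impl": ["${dataset-gms}/impl", "${user-gms}/impl"],
  "metadata-builders": ["${metadata-models}/metadata-builders"]
}
"""

# Hypothetical local checkout locations of the internal repositories.
REPO_ROOTS = {
    "dataset-gms": "/src/dataset-gms",
    "user-gms": "/src/user-gms",
    "metadata-models": "/src/metadata-models",
}

# Matches Bash-style placeholders such as ${dataset-gms}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(path: str, roots: Dict[str, str]) -> str:
    """Replace each ${repo-name} placeholder with that repository's root path."""
    return _PLACEHOLDER.sub(lambda match: roots[match.group(1)], path)


def resolve_module_mapping(raw_json: str, roots: Dict[str, str]) -> Dict[str, List[str]]:
    """Parse the JSON mapping and expand placeholders in every source module path."""
    mapping = json.loads(raw_json)
    return {
        target: [expand(source, roots) for source in sources]
        for target, sources in mapping.items()
    }


resolved = resolve_module_mapping(MODULE_MAPPING_JSON, REPO_ROOTS)
print(resolved["gms/impl"])
```

A regular expression is used here instead of `string.Template` because repository names containing hyphens are not valid `Template` identifiers.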
The file-level mapping is created automatically by the tooling, but users can also update it manually. It is a 1:1 mapping from a LinkedIn source file to a file in the open source repository. A few rules apply to the automatic creation of these file associations:
When several source modules feed one target module in open source, conflicts can arise, for example the same FQCN existing in more than one source module. As a conflict resolution strategy, our tooling defaults to "last one wins".
"null" means that the source file is not part of the open source repository.
Commit logs for open source commits are also generated automatically, by merging the commit logs of the internal repositories. Below is a sample commit log showing the structure produced by our tool. A commit clearly indicates which versions of the source repositories are packaged in it and provides a summary of their commit logs. See this commit for a real example of a commit log generated by our tooling.
metadata-models 29.0.0 -> 30.0.0
Added aspect model foo
Fixed issue bar
dataset-gms 2.3.0 -> 2.3.4
Added rest.li API to serve foo aspect
MP_VERSION=dataset-gms:2.3.4
MP_VERSION=metadata-models:30.0.0
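A log with this shape could be assembled along the following lines. This is only a sketch: the `RepoUpdate` type and `render_commit_log` function are hypothetical names, and the alphabetical ordering of the `MP_VERSION` trailers is inferred from the sample above rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepoUpdate:
    """One internal repository whose changes are included in the open source commit."""
    name: str
    old_version: str
    new_version: str
    messages: List[str]


def render_commit_log(updates: List[RepoUpdate]) -> str:
    """Merge per-repository commit messages into one open source commit log."""
    lines: List[str] = []
    for update in updates:
        # Header line with the version range, followed by that repository's messages.
        lines.append(f"{update.name} {update.old_version} -> {update.new_version}")
        lines.extend(update.messages)
    # Trailer lines recording which version of each repository was packaged.
    for update in sorted(updates, key=lambda u: u.name):
        lines.append(f"MP_VERSION={update.name}:{update.new_version}")
    return "\n".join(lines)


log = render_commit_log([
    RepoUpdate("metadata-models", "29.0.0", "30.0.0",
               ["Added aspect model foo", "Fixed issue bar"]),
    RepoUpdate("dataset-gms", "2.3.0", "2.3.4",
               ["Added rest.li API to serve foo aspect"]),
])
print(log)
```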
Dependency testing
LinkedIn has a dependency testing infrastructure that helps ensure changes to an internal multiproduct do not break the build of dependent multiproducts. The open source DataHub repository is not a multiproduct and cannot be a direct dependency of any multiproduct, but with the help of a wrapper multiproduct that fetches the open source DataHub source code, we can still use this dependency testing system. Any change (that might later be exposed) to one of the multiproducts feeding the open source DataHub repository therefore triggers a build event in the wrapper multiproduct. As a result, any change that fails to build the wrapper multiproduct fails the tests before the original multiproduct is committed, and is reverted.
This is a useful mechanism: it prevents any internal commit that breaks the open source build and detects it at commit time. Without it, it would be quite difficult to determine which internal commit caused the open source repository build to fail, because we batch internal changes into the DataHub open source repository.
Differences between open source DataHub and our production version
So far we have discussed our solution for syncing the two versions of the DataHub repositories, but we have not yet explained why we need two separate development streams in the first place. In this section, we list the differences between the public version of DataHub and the production version on LinkedIn's servers, and explain the reasons behind them.
One source of discrepancy is that our production version depends on code that is not open source, such as LinkedIn's Offspring (LinkedIn's internal dependency injection framework). Offspring is widely used in internal codebases because it is the preferred method for managing dynamic configuration. But it is not open source, so we needed to find open source alternatives for the open source DataHub.
There are other reasons as well. When we extend the metadata model for LinkedIn's needs, those extensions tend to be very specific to LinkedIn and may not apply directly to other environments. For example, we have very specific labels for participant IDs and other types of matching metadata. For now, we have therefore excluded these extensions from DataHub's open source metadata model. As we engage with the community and understand its needs, we will work on common open source versions of these extensions where needed.
Ease of use and easier adoption by the open source community also motivated some of the differences between the two versions of DataHub. The stream processing infrastructure is a good example. While our internal version uses a managed stream processing framework, we chose embedded (standalone) stream processing for the open source version because it avoids creating yet another infrastructure dependency.
Another example of the difference is having a single GMS (Generalized Metadata Store) in the open source implementation rather than multiple GMSs. GMA (Generalized Metadata Architecture) is the name of the backend architecture for DataHub, and GMS is the metadata store in the context of GMA. GMA is a very flexible architecture that lets you distribute each data construct (for example datasets, users, and so on) into its own metadata store, or keep several data constructs in a single metadata store, as long as the registry that maps data constructs to GMS instances is kept up to date. For ease of use, we chose a single GMS instance that stores all the different data constructs in the open source DataHub.
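The role of the registry can be pictured with a small sketch. The `GmsRegistry` class, the entity type names, and the `.example` URLs are hypothetical and exist only to contrast the two layouts; the source does not describe how the registry is actually implemented in GMA.

```python
from typing import Dict


class GmsRegistry:
    """Maps each data construct (entity type) to the GMS instance that stores it."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self._routes = dict(routes)

    def gms_for(self, entity_type: str) -> str:
        """Return the GMS endpoint for an entity type, or fail if none is registered."""
        try:
            return self._routes[entity_type]
        except KeyError:
            raise LookupError(f"no GMS registered for {entity_type!r}") from None


# Distributed layout: one GMS per data construct (hypothetical endpoints).
distributed = GmsRegistry({
    "dataset": "http://dataset-gms.example",
    "user": "http://user-gms.example",
})

# Single-GMS layout: every data construct is routed to the same store.
single = GmsRegistry({
    "dataset": "http://datahub-gms.example",
    "user": "http://datahub-gms.example",
})

print(distributed.gms_for("dataset"), single.gms_for("dataset"))
```

In this picture, moving between the two layouts only changes the registry contents, which is the flexibility the paragraph above attributes to GMA.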
A complete list of the differences between the two implementations is given in the table below.
| Product Features | LinkedIn DataHub | Open Source DataHub |
| Supported Data Constructs | 1) Datasets 2) Users 3) Metrics 4) ML Features 5) Charts 6) Dashboards | 1) Datasets 2) Users |
| Supported Metadata Sources for Datasets | 1) Ambry 2) 3) Dalids 4) Espresso 5) HDFS 6) Hive 7) Kafka 8) MongoDB 9) MySQL 10) Oracle 11) Pinot 12) Presto 13) Seas 14) Teradata 15) Vector 16) Venice | Hive, Kafka, RDBMS |
| Metadata Stores | Multiple distributed GMSs: 1) Dataset GMS 2) User GMS 3) Metric GMS 4) Feature GMS 5) Chart/Dashboard GMS | Single GMS for: 1) Datasets 2) Users |
Microservices in Docker containers
Docker simplifies application deployment and distribution through containerization. Every service component in the open source DataHub, including infrastructure components such as Kafka, Elasticsearch, Neo4j, and MySQL, has its own Docker image. To orchestrate the Docker containers, we use Docker Compose.
Figure 2: Architecture of open source DataHub
The figure above shows the high-level architecture of DataHub. Apart from the infrastructure components, it has four Docker containers:
datahub-gms: the metadata store service.
datahub-frontend: a Play application that serves the DataHub interface.
datahub-mce-consumer: a Kafka Streams application that consumes the metadata change event (MCE) stream and updates the metadata store.
datahub-mae-consumer: a Kafka Streams application that consumes the metadata audit event (MAE) stream and builds the search index and graph database.
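The flow between these containers can be pictured with an in-memory sketch. It uses plain Python queues and dictionaries instead of Kafka Streams, a relational store, or a real search index, and every name in it (`propose`, `write_to_store`, the event fields, the example URN) is hypothetical. In particular, emitting an audit event on every store write is an assumption made for the sketch.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple

mce_stream: Deque[dict] = deque()  # stands in for the MCE topic
mae_stream: Deque[dict] = deque()  # stands in for the MAE topic
metadata_store: Dict[Tuple[str, str], Any] = {}  # (urn, aspect) -> latest value
search_index: Dict[str, Dict[str, Any]] = {}     # urn -> indexed aspects


def propose(urn: str, aspect: str, value: Any) -> None:
    """A producer proposes a metadata change by publishing an MCE."""
    mce_stream.append({"urn": urn, "aspect": aspect, "value": value})


def write_to_store(urn: str, aspect: str, value: Any) -> None:
    """Persist the new value and record the change as an audit event (assumed behavior)."""
    old = metadata_store.get((urn, aspect))
    metadata_store[(urn, aspect)] = value
    mae_stream.append({"urn": urn, "aspect": aspect, "old": old, "new": value})


def run_mce_consumer() -> None:
    """Role of datahub-mce-consumer: apply proposed changes to the metadata store."""
    while mce_stream:
        event = mce_stream.popleft()
        write_to_store(event["urn"], event["aspect"], event["value"])


def run_mae_consumer() -> None:
    """Role of datahub-mae-consumer: keep the search index in step with the store."""
    while mae_stream:
        event = mae_stream.popleft()
        search_index.setdefault(event["urn"], {})[event["aspect"]] = event["new"]


propose("urn:li:dataset:foo", "ownership", {"owners": ["alice"]})
run_mce_consumer()
run_mae_consumer()
print(search_index)
```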
The open source DataHub repository uses Travis CI for continuous integration and Docker Hub for continuous deployment. Both integrate well with GitHub and are easy to set up. For most open source infrastructure developed by the community or by private companies (for example Confluent), Docker images are built and published to Docker Hub for the community to use. Any Docker image available on Docker Hub can be used with a simple docker pull command.
With every commit to the DataHub open source repository, all Docker images are automatically built and published to Docker Hub with the "latest" tag. If Docker Hub is configured with regular expressions for naming branches, all tags in the open source repository are also released with corresponding tag names on Docker Hub.
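The tagging behavior described above can be approximated with a short sketch. The `master` branch name, the `v<major>.<minor>.<patch>` tag pattern, and the `docker_tag_for` function are assumptions made for illustration; they are not the project's actual Docker Hub configuration.

```python
import re
from typing import Optional

# Hypothetical rule: only Git tags shaped like v1.2.3 are published.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")
DEFAULT_BRANCH = "master"  # assumed name of the main branch


def docker_tag_for(git_ref: str, is_tag: bool) -> Optional[str]:
    """Return the Docker image tag to publish for a Git ref, or None to skip it."""
    if not is_tag:
        # Commits on the main branch are published under the "latest" tag.
        return "latest" if git_ref == DEFAULT_BRANCH else None
    # Matching Git tags are published under the same name.
    return git_ref if RELEASE_TAG.match(git_ref) else None


print(docker_tag_for("master", is_tag=False))
print(docker_tag_for("v0.1.0", is_tag=True))
```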
We also plan to offer a turnkey solution for deploying DataHub on a public cloud service such as Azure, AWS, or Google Cloud. Given the recent announcement of LinkedIn's migration to Azure, this aligns with the metadata team's internal priorities.
Finally, thank you to all the early adopters of DataHub in the open source community who evaluated the DataHub alphas and helped us identify issues and improve the documentation.