Open Source DataHub: LinkedIn's Metadata Search and Discovery Platform
Finding the data you need quickly is essential for any company that relies on large amounts of data to make data-driven decisions. This affects not only the productivity of data users (including analysts, machine learning engineers, data scientists, and data engineers) but also, directly, the end products that depend on a high-quality machine learning (ML) pipeline. In addition, the trend toward adopting or building machine learning platforms naturally raises the question: what is your approach to internally discovering features, models, metrics, datasets, and so on?
In this article, we describe how we open-sourced DataHub, publishing its code under an open license.
WhereHows is now DataHub!
LinkedIn's metadata team previously introduced DataHub (the successor to WhereHows), LinkedIn's metadata search and discovery platform.
Open source approaches
WhereHows, LinkedIn's original portal for data discovery and lineage, started as an internal project; the metadata team later open-sourced it.
First attempt: "open source first"
We initially followed an "open source first" development model, in which most development happens in the open source repository and changes are then pulled in for internal deployment. The problem with this approach is that code is always pushed to GitHub first, before it has been fully reviewed internally. Until changes land in the open source repository and a new internal deployment is made, we cannot discover any production problems. When a deployment went badly, it was very hard to pin down the culprit, because changes arrived in batches.
This model also reduced the team's productivity when building new features that needed quick iteration, because every change had to go to the open source repository first and only then be pushed to the internal repository. To shorten turnaround time, a needed fix or change could be made in the internal repository first, but merging those changes back into the open source repository then became a major problem, because the two repositories were out of sync.
This model is much easier to apply to shared platforms, libraries, or infrastructure projects than to full-featured custom web applications. It is also ideal for projects that are open source from day one, whereas WhereHows was built as an entirely internal web application. Abstracting away all of its internal dependencies was genuinely hard, so we had to keep an internal fork, and maintaining an internal fork while developing mostly in open source did not work.
Second attempt: "inner first"
As a second attempt, we switched to an "inner first" development model, in which most development happens in-house and changes are regularly pushed to the open source code. Although this model suits our use case better, it has inherent problems. Pushing every diff directly to the open source repository and resolving merge conflicts later is possible, but time-consuming. In most cases, developers try to avoid doing this every time they check in code. As a result, it happens much less often and in batches, which makes resolving merge conflicts later much harder.
Third time's the charm!
The two failed attempts described above left the WhereHows GitHub repository out of date for a long time. The team kept improving the product's features and architecture, so LinkedIn's internal version of WhereHows became far more advanced than the open source version. It even got a new name: DataHub. Having learned from these failed attempts, the team decided to develop a scalable, long-term solution.
For any new open source project, LinkedIn's open source team recommends and supports a development model in which the project's modules are developed entirely in open source. Versioned artifacts are published to a public repository and then checked back into LinkedIn's internal artifact repository.
However, a mature backend application like DataHub would need considerable time to reach that state. This also rules out open-sourcing a fully working implementation before all internal dependencies have been completely abstracted away. That is why we developed tools that help us make open source contributions faster and with much less pain. This solution benefits both the metadata team (the developer of DataHub) and the open source community. The following sections discuss this new approach.
Open source publishing automation
The Metadata team's latest approach to open-sourcing DataHub is a tool that automatically synchronizes the internal codebase with the open source repository. High-level features of this toolkit include:
- Syncing LinkedIn code to and from open source, similar to rsync.
- Generating license headers, similar to Apache Rat.
- Automatically generating open source commit logs from internal commit logs.
- Preventing internal changes that break the open source build, through dependency testing.
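As an illustration of the license-header feature, the sketch below prepends a header to source files that lack one, choosing the comment syntax by file extension. Everything in it is an assumption: the header text is a placeholder, the extension table is illustrative, and `add_missing_headers` is a hypothetical name, not the team's tool or Apache Rat's API.

```python
import os
from typing import Dict, List, Tuple

# Placeholder license text (hypothetical); the article does not show the real header.
LICENSE_LINES = ["Copyright (c) Example Owner", "Licensed under the Example License"]

# Comment syntax per file extension: (opening line, line prefix, closing line).
COMMENT_STYLES: Dict[str, Tuple[str, str, str]] = {
    ".java": ("/*", " * ", " */"),
    ".gradle": ("/*", " * ", " */"),
    ".py": ("", "# ", ""),
}


def render_header(ext: str) -> str:
    """Build the license header in the comment syntax of the given extension."""
    start, prefix, end = COMMENT_STYLES[ext]
    lines = [start] + [prefix + text for text in LICENSE_LINES] + [end]
    # Empty opening/closing lines (e.g. for Python) are dropped.
    return "\n".join(line for line in lines if line) + "\n"


def add_missing_headers(files: Dict[str, str]) -> List[str]:
    """Prepend the header to every supported file that lacks it; return the changed paths."""
    changed = []
    for path, text in list(files.items()):
        ext = os.path.splitext(path)[1]
        if ext not in COMMENT_STYLES:
            continue  # unknown file type: leave it untouched
        header = render_header(ext)
        if not text.startswith(header):
            files[path] = header + text
            changed.append(path)
    return changed
```

Running it twice is harmless: files that already start with the header are skipped, so the second run reports no changes.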
The following subsections take a closer look at those functions that involve interesting problems.
Source code synchronization
Unlike the open source version of DataHub, which is a single GitHub repository, LinkedIn's version of DataHub is a combination of multiple repositories (internally called multiproducts).
Figure 1: Synchronization between LinkedIn's DataHub repositories and the single open source DataHub repository
To support automated build, push, and pull workflows, our new tool automatically creates a file-level mapping for each source file. However, the toolkit needs some initial configuration: users must provide a high-level module mapping, as shown below.
{
"datahub-dao": [
"${datahub-frontend}/datahub-dao"
],
"gms/impl": [
"${dataset-gms}/impl",
"${user-gms}/impl"
],
"metadata-dao": [
"${metadata-models}/metadata-dao"
],
"metadata-builders": [
"${metadata-models}/metadata-builders"
]
}
The module-level mapping is a simple JSON object whose keys are the target modules in the open source repository and whose values are lists of source modules in LinkedIn's repositories. Any target module in the open source repository can be fed by any number of source modules. To refer to the internal repository names in source modules, use ${...} string interpolation, as in the example above.
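Resolving the `${...}` placeholders has to happen before any files can be scanned. The sketch below assumes the tool knows where each internal repository is checked out locally; the `repo_roots` dict and `resolve_module_map` are hypothetical names. Because repository names such as `metadata-models` contain hyphens, it uses a plain regular expression instead of Python's `string.Template`, whose default identifier pattern does not allow hyphens.

```python
import re
from typing import Dict, List

# Matches ${name} placeholders; repository names may contain hyphens.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_module_map(
    module_map: Dict[str, List[str]], repo_roots: Dict[str, str]
) -> Dict[str, List[str]]:
    """Replace each ${repo} placeholder with that repository's local checkout path."""

    def substitute(match) -> str:
        name = match.group(1)
        if name not in repo_roots:
            # Fail loudly rather than silently producing a bogus path.
            raise KeyError(f"unknown internal repository: {name}")
        return repo_roots[name]

    return {
        target: [PLACEHOLDER.sub(substitute, source) for source in sources]
        for target, sources in module_map.items()
    }
```

For example, with `repo_roots = {"dataset-gms": "/work/dataset-gms", "user-gms": "/work/user-gms"}`, the `"gms/impl"` entry resolves to `["/work/dataset-gms/impl", "/work/user-gms/impl"]`.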
{
"${metadata-models}/metadata-builders/src/main/java/com/linkedin/Foo.java":
"metadata-builders/src/main/java/com/linkedin/Foo.java",
"${metadata-models}/metadata-builders/src/main/java/com/linkedin/Bar.java":
"metadata-builders/src/main/java/com/linkedin/Bar.java",
"${metadata-models}/metadata-builders/build.gradle": null
}
The file-level mapping is generated automatically by the tools, but the user can also edit it manually. It is a 1:1 mapping from a LinkedIn source file to a file in the open source repository. Several rules govern this automatic generation of file associations:
- When a target module in the open source repository has several source modules, conflicts can arise, e.g. the same FQCN (fully qualified class name) existing in more than one source module. As a conflict resolution strategy, our tools default to "last one wins".
- "null" means the source file is not part of the open source repository.
- After each open source push or release, this mapping is updated automatically and a snapshot is taken. This is needed to identify additions to and deletions from the source code since the last push.
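The rules above can be sketched as follows, assuming the module mapping has already been resolved to concrete paths and that the tool can list the files under each source module (passed in as the hypothetical `list_files` callback). `build_file_map` applies "last one wins" when two source modules produce the same open source path and records `None` for files kept internal; `diff_snapshots` compares two saved mappings to find added and removed open source files. The names are illustrative, not the team's actual tool.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Internal source path -> open source path, or None ("null") if the file stays internal.
FileMap = Dict[str, Optional[str]]


def build_file_map(
    module_map: Dict[str, List[str]],
    list_files: Callable[[str], List[str]],
    excluded: Optional[Set[str]] = None,
) -> FileMap:
    """Expand a resolved module-level mapping into a 1:1 file-level mapping."""
    excluded = excluded or set()
    file_map: FileMap = {}
    owner: Dict[str, str] = {}  # open source path -> internal path that currently wins it
    for target, sources in module_map.items():
        for source in sources:
            for rel in list_files(source):
                internal = f"{source}/{rel}"
                if internal in excluded:
                    file_map[internal] = None  # not part of the open source repository
                else:
                    # A later source module overwrites an earlier one that produced the
                    # same open source path (e.g. the same FQCN): "last one wins".
                    owner[f"{target}/{rel}"] = internal
    for public, internal in owner.items():
        file_map[internal] = public
    return file_map


def diff_snapshots(old: FileMap, new: FileMap) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) open source paths between two saved snapshots."""
    old_public = {p for p in old.values() if p is not None}
    new_public = {p for p in new.values() if p is not None}
    return new_public - old_public, old_public - new_public
```

Files that lose a conflict simply do not appear in the result, which keeps the mapping strictly one-to-one.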
Generating commit logs
Commit logs for open source commits are also generated automatically by merging the commit logs of the internal repositories. Below is a sample commit log entry showing the structure of a commit log produced by our tool. The commit shows clearly which versions of the source repositories are packaged in it and summarizes their commit logs.
metadata-models 29.0.0 -> 30.0.0
Added aspect model foo
Fixed issue bar
dataset-gms 2.3.0 -> 2.3.4
Added rest.li API to serve foo aspect
MP_VERSION=dataset-gms:2.3.4
MP_VERSION=metadata-models:30.0.0
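The sample entry can be reproduced by a small formatter. This is a sketch under assumptions: the `RepoUpdate` record and `render_commit_log` are hypothetical names, the per-repository summaries are assumed to have already been extracted from the internal commit logs, and the ordering of the `MP_VERSION` lines (sorted by repository name) is inferred from this single sample.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepoUpdate:
    """One internal repository's version bump and its summarized commit messages."""
    name: str
    old_version: str
    new_version: str
    messages: List[str] = field(default_factory=list)


def render_commit_log(updates: List[RepoUpdate]) -> str:
    lines: List[str] = []
    # One header line per repository, followed by its summarized messages.
    for u in updates:
        lines.append(f"{u.name} {u.old_version} -> {u.new_version}")
        lines.extend(u.messages)
    # Trailer lines recording the exact versions packaged in this commit.
    for u in sorted(updates, key=lambda u: u.name):
        lines.append(f"MP_VERSION={u.name}:{u.new_version}")
    return "\n".join(lines)
```

The `MP_VERSION` trailers make the packaged versions machine-readable, so a later run can tell where the previous sync stopped.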
Dependency testing
LinkedIn has a dependency testing infrastructure.
This is a useful mechanism: it helps block any internal commit that breaks the open source build and catches it at commit time. Without it, it would be very hard to determine which internal commit caused the open source repository build to fail, because we merge internal changes into the open source DataHub repository in batches.
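A commit-time gate of this kind might look like the sketch below. It is hypothetical: `gate_internal_commit` and the `build_open_source` callback are illustrative names, not LinkedIn's dependency testing API, and it assumes the file-level mapping (internal path to open source path, `None` for internal-only files) is available when a commit is checked. What it illustrates is that testing each commit on its own makes a failure attributable to that one commit instead of to a batch.

```python
from typing import Callable, Dict, Iterable, List, Optional


def gate_internal_commit(
    changed_files: Iterable[str],
    file_map: Dict[str, Optional[str]],
    build_open_source: Callable[[List[str]], bool],
) -> bool:
    """Return True if an internal commit may land, False if it breaks the open source build."""
    # Translate internal paths to open source paths; unmapped or null files are skipped.
    affected = [file_map[f] for f in changed_files if file_map.get(f) is not None]
    if not affected:
        # Simplifying assumption: a commit that touches no mapped file cannot
        # break the open source build, so no build is run.
        return True
    # Build the open source tree with only this commit applied, so a failure
    # points at exactly one internal commit.
    return build_open_source(affected)
```

In a real setup the callback would run the open source build; here it is injected so the gate logic stays independent of any particular build system.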
Differences between open source DataHub and our production version
So far, we have discussed our solution for synchronizing the two versions of the DataHub repositories, but we have not yet explained why we need two separate development streams in the first place. In this section, we list the differences between the public version of DataHub and the production version running on LinkedIn's servers, and explain the reasons for these differences.
One source of differences is that our production version depends on code that is not yet open source, such as LinkedIn's Offspring (LinkedIn's internal dependency injection framework). Offspring is widely used in internal codebases because it is the preferred way to manage dynamic configuration. But it is not open source, so we needed to find open source alternatives for open source DataHub.
There are other reasons as well. As we build extensions to the metadata model for LinkedIn's needs, these extensions tend to be very LinkedIn-specific and may not apply directly to other environments. For example, we have very specific labels for member IDs and other similar kinds of metadata. We have therefore excluded these extensions from the open source DataHub metadata model for now. As we engage with the community and understand its needs, we will work on common open source versions of these extensions where needed.
Ease of use and easier adoption by the open source community also motivated some of the differences between the two versions of DataHub. The difference in stream processing infrastructure is a good example. Although our internal version uses a managed stream processing framework, we chose built-in (standalone) stream processing for the open source version because it avoids adding another infrastructure dependency.
Another example is having a single GMS (Generalized Metadata Store) in the open source implementation rather than multiple GMSs. GMA (Generalized Metadata Architecture) is the name of DataHub's backend architecture, and GMS is the metadata store in the context of GMA. GMA is a very flexible architecture: it lets you distribute each data construct (e.g. datasets, users) to its own metadata store, or keep several data constructs in a single metadata store, as long as the registry that maps data constructs to GMS instances is updated accordingly. For ease of use, we chose a single GMS instance that stores all the different data constructs in open source DataHub.
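The registry idea can be sketched in a few lines. The classes below are hypothetical in-memory stand-ins, not DataHub's GMA code: `Registry` maps each entity type to a `MetadataStore`, so moving from one GMS per entity type (as at LinkedIn) to a single GMS (as in open source DataHub) changes only the mapping, not the code that writes metadata. The URN strings are illustrative.

```python
from typing import Dict


class MetadataStore:
    """Hypothetical in-memory stand-in for one GMS instance."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, dict] = {}

    def put(self, urn: str, aspect: dict) -> None:
        self.records.setdefault(urn, {}).update(aspect)


class Registry:
    """Maps each entity type to the GMS instance that stores it."""

    def __init__(self, routes: Dict[str, MetadataStore]) -> None:
        self.routes = routes

    def write(self, entity_type: str, urn: str, aspect: dict) -> MetadataStore:
        store = self.routes.get(entity_type)
        if store is None:
            raise KeyError(f"no GMS registered for entity type {entity_type!r}")
        store.put(urn, aspect)
        return store


# Distributed layout: one GMS per entity type.
distributed = Registry({
    "dataset": MetadataStore("dataset-gms"),
    "user": MetadataStore("user-gms"),
})

# Single-GMS layout: every entity type routes to the same store.
single_gms = MetadataStore("gms")
single = Registry({"dataset": single_gms, "user": single_gms})
```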
The full list of differences between the two implementations is given in the table below.
| Product feature | LinkedIn DataHub | Open source DataHub |
|---|---|---|
| Supported data entities | 1) Datasets 2) Users 3) Metrics 4) ML features 5) Charts 6) Dashboards | 1) Datasets 2) Users |
| Supported metadata sources for datasets | 1) … | Hive, Kafka, RDBMS |
| Pub-sub | … | Confluent Kafka |
| Stream processing | Managed | Embedded (standalone) |
| Dependency injection & dynamic configuration | LinkedIn Offspring | … |
| Build tools | Ligradle (LinkedIn's internal Gradle wrapper) | … |
| CI/CD | CRT (LinkedIn's internal CI/CD) | … |
| Metadata stores | Multiple distributed GMSs: 1) Dataset GMS 2) User GMS 3) Metric GMS 4) Feature GMS 5) Chart/Dashboard GMS | Single GMS for: 1) Datasets 2) Users |
Microservices in Docker containers
Figure 2: Open source DataHub architecture
The figure above shows the high-level architecture of DataHub. Apart from the infrastructure components, it has four different Docker containers:
- datahub-gms: metadata storage service
- datahub-frontend: frontend application
- datahub-mce-consumer: consumer application for metadata change events (MCE)
- datahub-mae-consumer: consumer application for metadata audit events (MAE)
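The division of labor among the backend containers can be pictured with a small in-process sketch. It is a simplified illustration, not DataHub code: deques stand in for the Kafka topics, a dict stands in for the search index, datahub-frontend is not modeled, and the names `Gms`, `run_mce_consumer`, and `run_mae_consumer` are hypothetical. It assumes the flow DataHub used at the time: the MCE consumer writes incoming change events to GMS, GMS emits an audit event carrying the old and new values for each write, and the MAE consumer uses those audit events to update downstream indexes.

```python
from collections import deque
from typing import Deque, Dict, Tuple

ChangeEvent = Tuple[str, dict]       # (urn, aspect) proposed by a producer
AuditEvent = Tuple[str, dict, dict]  # (urn, old value, new value) emitted by GMS


class Gms:
    """Stand-in for datahub-gms: stores metadata and emits an audit event per write."""

    def __init__(self, mae_topic: Deque[AuditEvent]) -> None:
        self.store: Dict[str, dict] = {}
        self.mae_topic = mae_topic

    def write(self, urn: str, aspect: dict) -> None:
        old = dict(self.store.get(urn, {}))
        new = {**old, **aspect}
        self.store[urn] = new
        self.mae_topic.append((urn, old, new))


def run_mce_consumer(mce_topic: Deque[ChangeEvent], gms: Gms) -> None:
    """Stand-in for datahub-mce-consumer: drain change events into GMS."""
    while mce_topic:
        urn, aspect = mce_topic.popleft()
        gms.write(urn, aspect)


def run_mae_consumer(mae_topic: Deque[AuditEvent], search_index: Dict[str, dict]) -> None:
    """Stand-in for datahub-mae-consumer: keep a search index in sync with GMS."""
    while mae_topic:
        urn, _old, new = mae_topic.popleft()
        search_index[urn] = new
```

Because producers only emit events and never write to the index directly, each consumer can be scaled or restarted independently, which is the main reason for splitting them into separate containers.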
See the documentation in the open source repository for more details.
CI/CD in open source DataHub
The open source DataHub repository uses …
With every commit to the open source DataHub repository, all Docker images are automatically built and pushed to Docker Hub with the "latest" tag. If Docker Hub is configured with other …
Using DataHub
- Clone the open source repository and start all Docker containers with docker-compose, using the provided docker-compose script, for a quick start.
- Load the sample data provided in the repository, using the command-line tool that is also provided.
- Browse DataHub in your web browser.
Actively monitored …
Future plans
Currently, every infrastructure component and microservice of open source DataHub is built as a Docker container, and the whole system is orchestrated with docker-compose.
We also plan to provide a turnkey solution for deploying DataHub on a public cloud service such as …
Finally, we would like to thank all the early adopters of DataHub in the open source community who tried the DataHub alphas and helped us identify problems and improve the documentation.
Source: www.habr.com