This article is about a simple and fast data discovery tool whose output you can see in the lead image. Interestingly, whale is designed to be hosted on a remote git server. Details below the cut.
How Airbnb's data discovery tool changed my life
In my career I have been lucky enough to work on fun problems: I studied mathematics while doing my degree at MIT, worked on uplift models and an open source project at Wayfair, and implemented new homepage targeting models and CUPED improvements at Airbnb. But this work was never all fun. In fact, I often spent most of my time searching for, researching, and validating data. Although this was a constant at work, it never occurred to me that it was a problem until I got to Airbnb, where it had been solved with a data discovery tool, Dataportal.
Where can I find {{data}}? Dataportal.
What does this column mean? Dataportal.
How is {{metric}} doing today? Dataportal.
What is the meaning of life? In Dataportal, probably.
Okay, you get the picture. Finding data and understanding what it means, how it was created, and how to use it took just a few minutes, not hours. I could spend my time drawing simple conclusions or building new algorithms (... or answering ad hoc questions about data), rather than digging through notes, writing repetitive SQL queries, and pinging colleagues on Slack to try to reconstruct context.
So what's the problem?
I realized that most of my friends did not have access to such a tool. Few companies are willing to dedicate significant resources to building and maintaining a platform tool like Dataportal. And while a few open source solutions exist, they tend to be designed for scale, which makes them hard to set up and maintain without a dedicated DevOps engineer. So I decided to build something new.
Whale: a stupidly simple data discovery tool

And yes, by stupidly simple I mean stupidly simple. Whale has only two components:
- A Python library that collects metadata and formats it as Markdown.
- A Rust command-line interface for searching this data.
In terms of internal infrastructure to maintain, there is only a set of text files and a program that updates them. That's it, so hosting on a git server such as Github is trivial. There is no new query language to learn, no management infrastructure, and no backups. Everyone knows Git, so synchronization and collaboration come for free. Let's take a closer look at the functionality.
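To make the "metadata in, Markdown text files out" idea concrete, here is a minimal sketch. It is not whale's actual library code: the `TableMetadata` and `Column` classes, the function names, and the stub layout are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class Column:
    name: str
    type: str
    description: str = ""


@dataclass
class TableMetadata:
    # Hypothetical container; a real collector would fill this from a warehouse.
    schema: str
    name: str
    description: str = ""
    columns: list[Column] = field(default_factory=list)


def format_table_stub(table: TableMetadata) -> str:
    """Render one table's metadata as a Markdown document."""
    lines = [f"# {table.schema}.{table.name}", ""]
    if table.description:
        lines += [table.description, ""]
    lines += ["## Columns", ""]
    for col in table.columns:
        suffix = f": {col.description}" if col.description else ""
        lines.append(f"- `{col.name}` ({col.type}){suffix}")
    return "\n".join(lines) + "\n"


def write_stub(table: TableMetadata, root: Path) -> Path:
    """Write the stub to <root>/<schema>.<name>.md and return its path."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{table.schema}.{table.name}.md"
    path.write_text(format_table_stub(table), encoding="utf-8")
    return path


orders = TableMetadata(
    schema="shop",
    name="orders",
    description="One row per customer order.",
    columns=[Column("id", "INTEGER", "Primary key"), Column("total", "REAL")],
)
# A temporary directory stands in for the git repository that would hold the stubs.
stub_path = write_stub(orders, Path(tempfile.mkdtemp()))
stub_text = stub_path.read_text(encoding="utf-8")
print(stub_text)
```

Because each table ends up as one plain text file, committing the output directory to a git repository is all the "storage layer" this approach needs.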
A full-featured git-based GUI
Whale is designed to swim in the ocean of a remote git server. It is easy to configure: define some connections, copy the Github Actions script (or write one for your CI/CD platform of choice), and you will have a web-based data discovery tool right away. You will be able to search, view, document, and share your tables directly on Github.

An example of a table stub generated using Github Actions. A full working demo is available.
Lightning-fast CLI search for your repository
Whale lives and breathes on the command line, providing powerful, millisecond search across your tables. Even with millions of tables, we managed to make whale incredibly performant by using some clever caching mechanisms and by rebuilding the backend in Rust. You will not notice any search delay [hello, Google DS].
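Why searching a pile of text files can be fast is easier to see with a toy example. The sketch below builds an in-memory inverted index over stub texts and answers queries by set intersection. It is an illustration of the general technique only: whale's real CLI is written in Rust, and nothing here reflects its actual caching or search code. All names and sample stubs are hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into word-like tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def build_index(stubs: dict[str, str]) -> dict[str, set[str]]:
    """Map every token to the set of stub names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in stubs.items():
        for token in tokenize(name) | tokenize(text):
            index[token].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the stubs that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*matches))


# Hypothetical stub contents, keyed by table name.
stubs = {
    "shop.orders": "One row per customer order. Columns: id, total",
    "shop.customers": "One row per customer. Columns: id, email",
    "web.sessions": "Browser sessions. Columns: session_id, customer_id",
}
index = build_index(stubs)
print(search(index, "customer id"))
print(search(index, "email"))
```

The index is built once and each query only touches the token sets it names, which is the basic reason lookups stay cheap as the number of stubs grows.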

Whale demo: searching across a million tables.
Automatic calculation of metrics [in beta]
One of my least favorite things as a data scientist is running the same queries over and over just to check the quality of the data I am using. Whale lets you define metrics in plain SQL, and they are scheduled to run along with your metadata scraping pipelines. Define a YAML metrics block inside the table stub, and whale will run on a schedule and execute the queries nested under metrics.
```metrics
metric-name:
  sql: |
    select count(*) from table
```
Combined with Github, this approach means that whale can serve as a simple source of truth for metric definitions. Whale even saves the values along with a timestamp in "~/.whale/metrics" in case you want to build a chart or do deeper analysis.
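The flow described above (find the metrics block in a stub, run each query, record the value with a timestamp) can be sketched roughly as follows. This is an assumption-laden illustration, not whale's implementation: the tiny parser handles only the exact block shape shown above (a real tool would use a YAML parser), the `orders` table and `row-count` metric are made up, and an in-memory SQLite database stands in for a warehouse connection.

```python
import re
import sqlite3
from datetime import datetime, timezone

FENCE = "`" * 3  # built at runtime so this example contains no literal fence line

# Hypothetical table stub containing one metrics block.
STUB = "\n".join([
    "# shop.orders",
    "",
    FENCE + "metrics",
    "row-count:",
    "  sql: |",
    "    select count(*) from orders",
    FENCE,
])


def extract_metrics(stub: str) -> dict[str, str]:
    """Return {metric name: SQL} from the first metrics block in a stub."""
    match = re.search(FENCE + r"metrics\n(.*?)\n" + FENCE, stub, re.DOTALL)
    if match is None:
        return {}
    metrics: dict[str, list[str]] = {}
    current = None
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):        # unindented key: a metric name
            current = line.rstrip().rstrip(":")
            metrics[current] = []
        elif line.strip() == "sql: |":      # marker that opens the SQL text
            continue
        elif current is not None:           # indented line: part of the SQL
            metrics[current].append(line.strip())
    return {name: " ".join(parts) for name, parts in metrics.items()}


def run_metrics(conn: sqlite3.Connection, metrics: dict[str, str]) -> list[tuple]:
    """Run each metric query and return (timestamp, name, value) records."""
    now = datetime.now(timezone.utc).isoformat()
    return [(now, name, conn.execute(sql).fetchone()[0])
            for name, sql in metrics.items()]


conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, total real)")
conn.executemany("insert into orders values (?, ?)",
                 [(1, 9.5), (2, 20.0), (3, 5.25)])

metrics = extract_metrics(STUB)
records = run_metrics(conn, metrics)
print(records)
```

Keeping the timestamp next to every value is what makes later charting or deeper analysis possible, since each scheduled run appends one more point per metric.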
The future
After talking with users of our pre-release versions of whale, we realized that people wanted more functionality. Why only a table search tool? Why not a metrics search tool? Why not monitoring? Why not a tool for running SQL queries? While whale v1 was originally conceived as a simple CLI companion to Dataportal/Amundsen, it has already evolved into a full-featured standalone platform, and we hope it will become an integral part of the data scientist's toolkit.
If there is something you would like to see during development, join our community, open an Issue, or contact us directly. We already have a number of cool features (Jinja templates, bookmarks, search filters, Slack alerts, Jupyter integration, and even a CLI dashboard for metrics), but we would love your input.
Conclusion
Whale is developed and maintained by Dataframe, a startup I recently had the pleasure of founding with a few other people. While whale is made for data scientists, Dataframe is made for data teams. If you would like to collaborate more closely, feel free to reach out and we will add you to the waitlist.
Source: www.habr.com
