How to search data quickly and easily with Whale

This article covers the simplest and fastest data discovery tool, which you can see in action in the KDPV. Interestingly, whale is designed to be hosted on a remote git server. Details under the cut.

How Airbnb's Data Discovery Tool Changed My Life

In my career I have been lucky enough to work on some fun problems: I studied mathematics while doing my degree at MIT, worked on uplift models and the open source project pylift at Wayfair, and implemented new homepage targeting models and CUPED improvements at Airbnb. But not all of this work was glamorous; in fact, I often spent most of my time searching for, exploring, and validating data. Although this was a constant state of affairs at work, it did not occur to me that it was a problem until I arrived at Airbnb, where it had been solved with a data discovery tool, Dataportal.

Where can I find {{data}}? Dataportal.
What does this column mean? Dataportal.
How is {{metric}} doing today? Dataportal.
What is the meaning of life? In Dataportal, surely.

OK, you get the picture. Finding data and understanding what it means, how it was created, and how to use it took only minutes, not hours. I could spend my time drawing simple conclusions or building new algorithms (... or answering random questions about the data), rather than digging through notes, writing repetitive SQL queries, and pinging colleagues on Slack to try to reconstruct the context.

What is the problem?

I realized that most of my friends did not have access to such a tool. Few companies are willing to devote significant resources to building and maintaining a platform tool like Dataportal. And while there are a few open source solutions, they tend to be designed for scale, which makes them hard to set up and maintain without a dedicated DevOps engineer. So I decided to create something new.

Whale: A stupidly simple data discovery tool

And yes, by stupidly simple I mean stupidly simple. Whale has only two components:

  1. A Python library that collects metadata and formats it as Markdown.
  2. A Rust command-line interface for searching through this data.
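
To make the first component concrete, here is a minimal sketch of turning table metadata into a Markdown stub. The `Column` and `TableMetadata` types, the `render_stub` function, and the stub layout are all hypothetical names chosen for illustration; they are not whale's actual API or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str


@dataclass
class TableMetadata:
    # Hypothetical container for illustration; not whale's real data model.
    schema: str
    name: str
    columns: list[Column] = field(default_factory=list)


def render_stub(table: TableMetadata) -> str:
    """Format one table's metadata as a small Markdown document."""
    lines = [f"# {table.schema}.{table.name}", "", "## Columns", ""]
    for col in table.columns:
        lines.append(f"* `{col.name}` ({col.dtype})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    orders = TableMetadata(
        schema="analytics",
        name="orders",
        columns=[Column("order_id", "bigint"), Column("created_at", "timestamp")],
    )
    print(render_stub(orders))
```

Because the output is plain text, one file per table can be committed to a git repository and diffed like any other source file.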

In terms of internal infrastructure to maintain, there is just a bunch of text files and a program that updates them. That's it, so hosting on a git server such as Github is trivial. There is no new query language to learn, no management infrastructure, no backups. Everyone knows Git, so synchronization and collaboration come for free. Let's take a closer look at the functionality of Whale v1.0.

A full-featured git-based GUI

Whale is designed to swim in the ocean of a remote git server. It is very easy to set up: define some connections, copy the Github Actions script (or write one for the CI/CD platform of your choice), and you immediately have a web-based data discovery tool. You can search, view, document, and share your tables directly on Github.

An example of a table stub generated using Github Actions. See the full working demo in this section.

Lightning-fast CLI search over your repository

Whale lives and breathes on the command line, offering powerful, millisecond lookups across your tables. Even with millions of tables, we managed to make whale remarkably fast by using some clever caching mechanisms and by rebuilding the backend in Rust. You will not notice any search latency [hello, Google DS].
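
The general idea of caching an index and searching it in memory can be sketched as follows. This is only an illustration in Python, not whale's Rust implementation; the function names and the assumption that each table is stored as one `.md` file named after the table are hypothetical.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_table_names(stub_dir: str) -> tuple[str, ...]:
    """Read stub file names once; later calls reuse the cached tuple."""
    return tuple(sorted(p.stem for p in Path(stub_dir).glob("*.md")))


def search(stub_dir: str, query: str) -> list[str]:
    """Return table names that contain every whitespace-separated term."""
    terms = query.lower().split()
    return [
        name
        for name in load_table_names(stub_dir)
        if all(term in name.lower() for term in terms)
    ]
```

Calling `search("stubs", "analytics orders")` would scan the directory on the first call only. The cached list goes stale when files change, so a real tool would need to invalidate it, for example with `load_table_names.cache_clear()`.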

Whale demo, searching a million tables.

Automatic metric computation [in beta]

One of my least favorite things as a data scientist is running the same queries over and over just to check the quality of the data I am using. Whale lets you define metrics in plain SQL, which are scheduled to run alongside the metadata collection pipeline. Define a YAML metrics block in the table stub, and Whale will automatically run the queries nested under metrics on a schedule.

```metrics
metric-name:
  sql: |
    select count(*) from table
```

Combined with Github, this approach means whale can serve as a simple source of truth for metric definitions. Whale even saves the values along with a timestamp in "~/.whale/metrics" in case you want to do some charting or deeper analysis.
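
As a rough illustration of the idea of running a metric query and storing its value with a timestamp, here is a minimal sketch. It uses an in-memory SQLite database and appends rows to a CSV file; the `record_metric` function, the CSV layout, and the output directory are assumptions for this example, not whale's actual storage format.

```python
import csv
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_metric(conn: sqlite3.Connection, name: str, sql: str, out_dir: Path):
    """Run a single-value SQL query and append (timestamp, value) to a CSV file."""
    value = conn.execute(sql).fetchone()[0]
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / f"{name}.csv", "a", newline="") as f:
        csv.writer(f).writerow([datetime.now(timezone.utc).isoformat(), value])
    return value


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (id integer)")
    conn.executemany("insert into orders values (?)", [(1,), (2,), (3,)])
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical metric name and query, mirroring the YAML block above.
        print(record_metric(conn, "row-count", "select count(*) from orders", Path(tmp)))
```

Appending one row per run keeps the full history of a metric, which is what makes later charting or analysis possible.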

The future

After talking with users of our pre-release versions of whale, we realized that people need more functionality. Why only a table search tool? Why not a metric search tool? Why not monitoring? Why not a SQL execution tool? While whale v1 was originally conceived as a simple CLI tool in the spirit of Dataportal/Amundsen, it has already evolved into a full-fledged standalone platform, and we hope it will become an essential part of the data scientist's toolkit.

If there is something you would like to see on the roadmap, join our Slack community, open an issue on Github, or even reach out directly on LinkedIn. We already have a lot of nice features: Jinja templates, bookmarks, filters, Slack alerts, Jupyter integration, and even a CLI dashboard for metrics. But we would love your input.

Conclusion

Whale is developed and maintained by Dataframe, a startup I recently had the pleasure of co-founding with a few other people. While whale is made for data scientists, Dataframe is made for data teams. For those of you who want to collaborate more closely, feel free to contact us and we will add you to the waitlist.

And with the promo code HABR, you can get an additional 10% on top of the discount shown on the banner.

source: www.habr.com