Hello, Habr! Enrollment has just opened for a new course at OTUS
Every day, more than a hundred million people visit Twitter to find out what is happening in the world and to talk about it. Every tweet and every other user action generates an event that is made available for Twitter's internal data analysis. Hundreds of employees analyze and visualize this data, and improving their experience is a top priority for the Twitter Data Platform team.
We believe that users with a wide range of technical skills should be able to discover data and have access to well-performing SQL-based analysis and visualization tools. This would allow a whole new group of less technical users, including data analysts and product managers, to extract insights from data, helping them better understand and use Twitter's capabilities. This is how we are democratizing data analytics at Twitter.
As our tools and internal data analytics capabilities have improved, we have seen Twitter improve. However, there is still room for improvement. Current tools such as Scalding require programming experience. SQL-based analysis tools such as Presto and Vertica have performance problems at scale. We also have the problem of data being spread across multiple systems without consistent access to it.
Last year we announced
- BigQuery: an enterprise data warehouse with a SQL engine built on Dremel, known for its speed, ease of use, and machine learning capabilities.
- Data Studio: a big data visualization tool with Google Docs-like collaboration features.
In this article, you will learn about our experience with these tools: what we did, what we learned, and what we will do next. For now we focus on batch and interactive analytics. We will discuss real-time analytics in the next article.
History of Twitter Data Warehousing
Before diving into BigQuery, it is worth briefly recounting the history of data warehousing at Twitter. In 2011, Twitter data analysis was performed in Vertica and Hadoop. We used Pig to create MapReduce Hadoop jobs. In 2012, we replaced Pig with Scalding, which had a Scala API with benefits such as the ability to build complex pipelines and ease of testing. However, for many data analysts and product managers who were more comfortable working with SQL, it had a fairly steep learning curve. Around 2016, we started using Presto as a SQL interface to Hadoop data. Spark offered a Python interface, which makes it a good choice for ad hoc data science and machine learning.
Since 2018, we have used the following tools for data analysis and visualization:
- Scalding for production pipelines
- Scalding and Spark for ad hoc data analysis and machine learning
- Vertica and Presto for ad hoc and interactive SQL analysis
- Druid for interactive, exploratory, low-latency access to time series metrics
- Tableau, Zeppelin, and Pivot for data visualization
We found that while these tools offer powerful capabilities, we struggled to make those capabilities available to a wider audience at Twitter. By extending our platform with Google Cloud, we are focusing on simplifying our analytics tools for all of Twitter.
Google's BigQuery Data Warehouse
Several teams at Twitter had already incorporated BigQuery into some of their production pipelines. Drawing on their experience, we began evaluating BigQuery's capabilities against all Twitter use cases. Our goal was to offer BigQuery to the whole company and to standardize and support it within the Data Platform toolset. This was difficult for many reasons. We needed to build infrastructure to reliably ingest large volumes of data, support company-wide data management, ensure proper access controls, and protect customer privacy. We also had to create systems for resource allocation, monitoring, and chargebacks so that teams could use BigQuery effectively.
In November 2018, we released a company-wide alpha of BigQuery and Data Studio. We offered Twitter employees some of our most frequently used tables with cleaned data. BigQuery was used by more than 250 users from a variety of teams, including engineering, finance, and marketing. Most recently, they were running about 250k queries, processing about 8 PB per month, not counting scheduled queries. After receiving very positive feedback, we decided to move forward and offer BigQuery as the primary resource for interacting with data at Twitter.
Here is a high-level diagram of our Google BigQuery data warehouse architecture.
We copy data from on-premises Hadoop clusters to Google Cloud Storage (GCS) using the internal Cloud Replicator tool. We then use Apache Airflow to build pipelines that use "
In the following sections, we discuss our approach and experience in the areas of ease of use, performance, data management, system health, and cost.
Ease of Use
We found that it was easy for users to get started with BigQuery because it required no software installation and users could access it through an intuitive web interface. However, users did need to become familiar with some GCP features and concepts, including resources such as projects, datasets, and tables. We developed training materials and tutorials to help users get started. With that basic understanding in place, users found it easy to navigate datasets, view schemas and table data, run simple queries, and visualize the results in Data Studio.
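To make the project, dataset, and table hierarchy concrete, here is a minimal Python sketch of how a fully qualified table name of the form project.dataset.table breaks down into those three resources. The `TableRef` class and the identifiers in it are hypothetical and used only for illustration. The sketch also assumes that none of the three parts contains a dot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed as project.dataset.table (hypothetical helper)."""
    project: str
    dataset: str
    table: str

    @classmethod
    def parse(cls, table_id: str) -> "TableRef":
        # Assumption: each part is non-empty and contains no dots.
        parts = table_id.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected project.dataset.table, got {table_id!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.project}.{self.dataset}.{self.table}"


# Hypothetical identifiers, not real Twitter resources.
ref = TableRef.parse("example-project.analytics.daily_events")
print(ref.project, ref.dataset, ref.table)
```

A project groups datasets, and a dataset groups tables, so knowing which level a name refers to is usually the first thing a new user has to learn.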
Our goal for ingesting data into BigQuery was to enable seamless loading of HDFS or GCS datasets with a single click. We considered
To transform data in BigQuery, users create simple SQL data pipelines using scheduled queries. For more complex multi-stage pipelines with dependencies, we plan to use either our own Airflow framework or Cloud Composer.
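What separates a multi-stage pipeline with dependencies from a single scheduled query is that each step may run only after the steps it depends on. The sketch below illustrates that ordering problem with Python's standard `graphlib` module. It is not Airflow or Cloud Composer code, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of
# steps that must finish before it can start.
pipeline = {
    "load_raw_events": set(),
    "clean_events": {"load_raw_events"},
    "daily_aggregates": {"clean_events"},
    "user_dimensions": {"load_raw_events"},
    "dashboard_table": {"daily_aggregates", "user_dimensions"},
}


def execution_order(steps):
    """Return one valid order in which every step follows its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(steps).static_order())


order = execution_order(pipeline)
print(order)
```

A workflow scheduler solves the same ordering problem and adds scheduling, retries, and monitoring on top, which is why a scheduled query alone stops being enough once steps depend on each other.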
Performance
BigQuery is designed for general-purpose SQL queries that process large amounts of data. It is not intended for the low-latency, high-throughput queries required of a transactional database, or for low-latency time series analysis.
We analyzed more than 800 queries, each processing approximately 1 TB of data, and found that the average execution time was 30 seconds. We also learned that performance depends heavily on our slot usage across different projects and jobs. We had to clearly separate our production and ad hoc slot reservations to maintain performance for production use cases and online analysis. This strongly influenced our design for slot reservations and project hierarchy.
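As a rough back-of-the-envelope reading of those figures, the snippet below computes the average processing rate implied by about 1 TB in about 30 seconds. It assumes a decimal terabyte (10^12 bytes) and treats both quoted figures as exact, which they are not, so the result is only an order-of-magnitude estimate.

```python
# Figures quoted in the text; both are approximate.
BYTES_PER_QUERY = 1 * 10**12  # about 1 TB, taken as a decimal terabyte (assumption)
AVG_SECONDS = 30              # reported average execution time


def scan_rate_gb_per_s(total_bytes: int, seconds: float) -> float:
    """Average bytes processed per second, expressed in decimal gigabytes."""
    return total_bytes / seconds / 10**9


rate = scan_rate_gb_per_s(BYTES_PER_QUERY, AVG_SECONDS)
print(f"{rate:.1f} GB/s")
```

Under these assumptions, an average query processes roughly 33 GB of data per second. That figure makes it easier to see why the number of slots available to a project has such a strong effect on query latency.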
We will cover data management, system health, and cost in the second part of this translation in the coming days.
Source: www.habr.com