Data management
Strong data governance is a core tenet of Twitter Engineering. As we onboard BigQuery to our platform, we focus on data discovery, access control, security, and privacy.
To discover and manage data, we extended our Data Access Layer.
BigQuery makes it easy to share and access data, but we needed some control over this to prevent data exfiltration. Among other tools, we chose the following two features:
- Domain restricted sharing: a beta feature that prevents users from sharing BigQuery datasets with users outside Twitter.
- VPC service controls: a control that prevents data exfiltration and requires users to access BigQuery from known IP address ranges.
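To make the second control concrete, here is a minimal Python sketch of the kind of check an IP-range restriction implies. It does not show how VPC service controls are implemented or configured. The CIDR ranges (reserved documentation blocks) and the name `is_known_source` are hypothetical and chosen only for illustration.

```python
import ipaddress

# Hypothetical "known" ranges; these are reserved documentation blocks,
# not addresses taken from the article.
KNOWN_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_source(ip: str) -> bool:
    """Return True if the caller's IP address lies inside a known range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in KNOWN_RANGES)
```

A request from `192.0.2.17` would pass this check, while one from `203.0.113.5` would be rejected.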
We implemented the authentication, authorization, and auditing (AAA) requirements for security as follows:
- Authentication: we used GCP user accounts for ad hoc requests and service accounts for production requests.
- Authorization: each dataset must have an owner service account and a reader group.
- Auditing: we exported BigQuery Stackdriver logs, which contain detailed query execution information, to a BigQuery dataset for easy analysis.
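The authorization rule above (one owner service account and one reader group per dataset) lends itself to an automated check. The sketch below is an illustration only: the `AccessEntry` class is a hypothetical stand-in for a dataset's access list, and recognising service accounts by the `.gserviceaccount.com` e-mail suffix is an assumption, not something the article describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEntry:
    """Hypothetical, simplified view of one entry in a dataset's access list."""
    role: str         # e.g. "OWNER" or "READER"
    entity_type: str  # e.g. "userByEmail" or "groupByEmail"
    entity_id: str    # e-mail address of the account or group


def policy_violations(entries: list[AccessEntry]) -> list[str]:
    """List which parts of the owner/reader convention a dataset is missing."""
    has_owner_service_account = any(
        e.role == "OWNER"
        and e.entity_type == "userByEmail"
        # Assumption: service accounts are identified by their e-mail suffix.
        and e.entity_id.endswith(".gserviceaccount.com")
        for e in entries
    )
    has_reader_group = any(
        e.role == "READER" and e.entity_type == "groupByEmail" for e in entries
    )
    problems = []
    if not has_owner_service_account:
        problems.append("missing owner service account")
    if not has_reader_group:
        problems.append("missing reader group")
    return problems
```

An empty result means the dataset follows the convention; anything else names what still has to be granted.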
To ensure that the personal data of Twitter users is handled properly, we must register all BigQuery datasets, annotate personal data, maintain proper storage, and delete (scrub) data that users have deleted.
We looked at Google ...
At Twitter, we created four privacy categories for datasets in BigQuery, listed here in descending order of sensitivity:
- Highly sensitive datasets are made available on an as-needed basis, following the principle of least privilege. Each dataset has a separate reader group, and we track usage by individual accounts.
- Medium sensitivity datasets (one-way pseudonymized using salted hashing) contain no personally identifiable information (PII) and are accessible to a larger group of employees. This is a good balance between privacy concerns and data utility. It lets employees perform analysis tasks, such as computing the number of users who used a feature, without knowing who the real users are.
- Low sensitivity datasets have all user-identifying information removed. This is a good approach from a privacy perspective, but such datasets cannot be used for user-level analysis.
- Public datasets (released outside Twitter) are available to all Twitter employees.
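The medium sensitivity category relies on one-way pseudonymization with a salted hash. The following Python sketch shows the idea under stated assumptions: it uses HMAC-SHA-256 keyed with a secret salt as the salted hash, which is a choice made for this example and not necessarily what Twitter uses, and the function names and sample events are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, salt: bytes) -> str:
    """One-way pseudonym: HMAC-SHA-256 of the user ID, keyed with the salt."""
    return hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_events(events, salt: bytes):
    """Replace raw user IDs with pseudonyms before the data is shared."""
    return [(pseudonymize(user_id, salt), feature) for user_id, feature in events]


def count_feature_users(pseudonymized_events, feature: str) -> int:
    """Count distinct pseudonyms that used a feature; raw IDs are never seen."""
    return len({pid for pid, used in pseudonymized_events if used == feature})
```

Because the same user always maps to the same pseudonym under a given salt, distinct-user counts stay correct, while an analyst who only sees the pseudonymized events cannot read the original IDs off the data.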
For registration, we used scheduled tasks to enumerate BigQuery datasets and register them with the Data Access Layer.
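A scheduled registration task of this kind boils down to an idempotent sync between the list of datasets and the registry. The sketch below is a simplified illustration: the registry is a plain dictionary standing in for the Data Access Layer, whose real interface the article does not show, and the `annotated` flag is a hypothetical field. In practice the dataset IDs could come from a listing call such as `Client.list_datasets()` in the google-cloud-bigquery library.

```python
def register_new_datasets(dataset_ids, registry: dict) -> list:
    """Add any dataset not yet in the registry; return the newly added IDs."""
    added = []
    for dataset_id in sorted(set(dataset_ids)):
        if dataset_id not in registry:
            # Assumption: new datasets start unannotated and are
            # annotated with privacy information in a later step.
            registry[dataset_id] = {"annotated": False}
            added.append(dataset_id)
    return added
```

Running the task twice over the same listing registers nothing the second time, so it is safe to schedule it repeatedly.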
System functionality
Because BigQuery is a managed service, Twitter's SRE team did not need to be involved in systems management or on-call duties. Provisioning more capacity for both storage and compute was easy. We could change slot reservations by creating tickets with Google support. We identified areas for improvement, such as self-service slot allocation and better monitoring dashboards, and submitted those requests to Google.
Cost
Our preliminary analysis showed that query costs for BigQuery and Presto were at the same level. We purchased slots.
Storing data in BigQuery incurs costs in addition to GCS costs. Tools like Scalding require datasets in GCS, and to access them from BigQuery we had to load the same datasets into BigQuery's format.
In the rare cases that required infrequent queries over tens of petabytes, we decided that storing the datasets in BigQuery was not cost-effective and used Presto to access the datasets in GCS directly. For such cases, we are looking into BigQuery External Data Sources.
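The cost-effectiveness argument can be illustrated with simple arithmetic: a second copy of a dataset has a roughly fixed monthly storage cost, so the fewer queries it serves, the more each query effectively costs. The sketch below is only that arithmetic. The function name is hypothetical, and the price used in the usage note is a made-up placeholder, not an actual BigQuery or GCS price.

```python
def storage_cost_per_query(size_tb: float,
                           price_per_tb_month: float,
                           queries_per_month: float) -> float:
    """Monthly cost of keeping a second copy, spread over the queries it serves."""
    if queries_per_month <= 0:
        raise ValueError("queries_per_month must be positive")
    return size_tb * price_per_tb_month / queries_per_month
```

With a placeholder price of 20 currency units per TB per month, a 10,000 TB (10 PB) copy queried once a month carries 200,000 units of storage cost per query, while the same copy queried 1,000 times a month carries 200 units per query.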
Next steps
We have seen a lot of interest in BigQuery since the alpha release. We are adding more datasets and onboarding more teams to BigQuery. We are developing connectors for data analytics tools such as Scalding to read from and write to BigQuery storage. We are looking into tools such as Looker and Apache Zeppelin for creating enterprise-quality reports and notes using BigQuery datasets.
Our collaboration with Google has been very productive, and we are excited to continue and grow this partnership. We worked with Google on our own implementation.
Here are some of our high-priority feature requests for Google:
- Tools for convenient data ingestion and support for the LZO-Thrift format.
- Hourly partitioning.
- Access control improvements such as table-, row-, and column-level permissions.
- BigQuery External Data Sources with Hive Metastore integration and support for the LZO-Thrift format.
- Improved data catalog integration in the BigQuery user interface.
- Self-service slot allocation and monitoring.
Conclusion
Democratizing data analysis, visualization, and machine learning in a secure way is a top priority for the Data Platform team. We identified Google BigQuery and Data Studio as tools that could help us reach this goal, and last year we released BigQuery Alpha to the entire company.
We found querying in BigQuery to be simple and efficient. For data ingestion and transformation, we used Google tools for simple pipelines, but for complex pipelines we had to build our own Airflow framework. In data management, BigQuery's services for authentication, authorization, and auditing met our needs. Managing metadata and maintaining privacy required considerable flexibility, and we had to build our own systems. As a managed service, BigQuery was easy to operate. Query costs were similar to those of our existing tools. Storing data in BigQuery incurs costs in addition to GCS costs.
Overall, BigQuery works well for general-purpose SQL analysis. We are seeing a lot of interest in BigQuery, and we are working to migrate more datasets, onboard more teams, and build more pipelines with BigQuery. Twitter uses a wide variety of data, which requires a combination of tools such as Scalding, Spark, Presto, and Druid. We intend to keep strengthening our data analytics tools and to give our users clear guidance on how to make the best use of our offerings.
Acknowledgements
I would like to thank my co-authors and teammates, Anju Jha and Will Pascucci, for their great collaboration and hard work on this project. I would also like to thank the engineers and managers from several teams at Twitter and Google who helped us, and the BigQuery users at Twitter who provided valuable feedback.
If you are interested in working on these problems, take a look here.
Source: habr.com