Hello, Habr! OTUS has opened enrollment for a new course stream. Ahead of the course start, we continue to share useful material with you.

Data governance
Strong data governance is a core principle of Twitter Engineering. As we integrated BigQuery into our platform, we focused on data discovery, access control, security, and privacy.
For data discovery and management, we extended our Data Access Layer (DAL) to provide tools that cover both on-premises and Google Cloud data, giving users a single interface and API. As Google's corresponding service moves toward general availability, we plan to incorporate it into our projects to give users features such as column search.
BigQuery makes data easy to share and access, but we needed some control over this to prevent data exfiltration. Among other tools, we chose the following two features:
- A beta feature that prevents users from sharing BigQuery datasets with users outside Twitter.
- A control that prevents data exfiltration by requiring users to access BigQuery from known IP ranges.
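The second control is enforced by the cloud platform itself, not by user code. Purely to illustrate the idea of admitting requests only from known IP ranges, here is a minimal Python sketch. The `KNOWN_RANGES` list and the `is_from_known_range` function are hypothetical names, and the ranges are reserved documentation addresses, not real ones.

```python
import ipaddress

# Hypothetical allowlist. These CIDR blocks are reserved for documentation
# (RFC 5737) and stand in for an organisation's real egress ranges.
KNOWN_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_from_known_range(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in KNOWN_RANGES)
```

A request from `192.0.2.17` would pass this check, while one from an address outside both blocks would be rejected.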
To ensure security, we implemented the following authentication, authorization, and auditing (AAA) requirements:
- Authentication: we used GCP user accounts for ad hoc queries and service accounts for production queries.
- Authorization: we required each dataset to have an owner service account and a reader group.
- Auditing: we exported the BigQuery Stackdriver logs, which contain detailed query execution information, into a BigQuery dataset so they can be analyzed easily.
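The authorization rule above (every dataset needs an owner service account and a reader group) is easy to check mechanically. The following is a minimal sketch under assumed names: `DatasetAccess` and `authorization_problems` are hypothetical and do not correspond to any Twitter or Google API.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetAccess:
    """Hypothetical, simplified view of who may own and read a dataset."""
    name: str
    owner_service_accounts: list[str] = field(default_factory=list)
    reader_groups: list[str] = field(default_factory=list)


def authorization_problems(dataset: DatasetAccess) -> list[str]:
    """List every way the dataset violates the owner/reader rule."""
    problems = []
    if not dataset.owner_service_accounts:
        problems.append(f"{dataset.name}: no owner service account")
    if not dataset.reader_groups:
        problems.append(f"{dataset.name}: no reader group")
    return problems
```

A check like this could run as part of a periodic audit, flagging datasets that were created without the required owner or reader group.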
To handle the personal data of Twitter users properly, we must register all BigQuery datasets, annotate personal data, maintain appropriate retention, and scrub (purge) data that users have deleted.
We evaluated Google's Data Loss Prevention API, which uses machine learning to classify and redact sensitive data, but decided to annotate datasets manually for the sake of accuracy. We plan to use the Data Loss Prevention API to supplement the custom annotations.
At Twitter, we created four privacy categories for BigQuery datasets, listed here in descending order of sensitivity:
- Highly sensitive datasets are made available on an as-needed basis, following the principle of least privilege. Each dataset has its own reader group, and we track usage by individual accounts.
- Medium-sensitivity datasets (one-way pseudonymized with salted hashing) contain no personally identifiable information (PII) and are available to a larger group of employees. This strikes a good balance between privacy concerns and data utility: employees can perform analysis tasks, such as counting the users who used a feature, without knowing who the actual users are.
- Low-sensitivity datasets have all user-identifying information removed. This is a good approach from a privacy standpoint, but the data cannot be used for user-level analysis.
- Public datasets (released outside Twitter) are available to all Twitter employees.
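To make the medium-sensitivity category concrete, here is a minimal Python sketch of one-way pseudonymization with a salted hash. It uses HMAC-SHA-256 with a secret salt as the key, which is one common construction; the article does not say which construction Twitter used, and the function names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, salt: bytes) -> str:
    """Map a user ID to a stable pseudonym that cannot be reversed
    without the secret salt (assumed construction: HMAC-SHA-256)."""
    return hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def count_feature_users(events, salt: bytes) -> dict:
    """Count distinct users per feature while keeping only pseudonyms.

    `events` is an iterable of (user_id, feature) pairs.
    """
    users_by_feature = {}
    for user_id, feature in events:
        pseudonym = pseudonymize(user_id, salt)
        users_by_feature.setdefault(feature, set()).add(pseudonym)
    return {feature: len(users) for feature, users in users_by_feature.items()}
```

Because the same user ID always maps to the same pseudonym under a given salt, distinct-user counts stay correct even though the analyst never sees a real identifier.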
For registration, we use scheduled tasks to enumerate BigQuery datasets and register them with the Data Access Layer (DAL), Twitter's metadata repository. Users annotate datasets with privacy information and specify a retention period. For scrubbing, we are evaluating the performance and cost of two options: (1) scrubbing the datasets in GCS with tools such as Scalding and then loading them into BigQuery, and (2) using BigQuery DML statements. We will probably use a combination of both methods to meet the requirements of different teams and data.
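As an illustration of the second scrubbing option, the sketch below builds a parameterized BigQuery `DELETE` statement that removes rows belonging to deleted users. The function name, the table and column names, and the `@deleted_user_ids` array parameter are assumptions made for the example; the sketch only produces the SQL text and does not run it.

```python
import re

# Conservative patterns for identifiers, so that untrusted input can never
# be spliced into the statement text.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")
_COLUMN_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_scrub_statement(table: str, user_column: str) -> str:
    """Return a DML statement that deletes rows of deleted users.

    `table` is a fully qualified `project.dataset.table` name. The user IDs
    themselves are passed separately as the array query parameter
    `@deleted_user_ids`, never embedded in the SQL text.
    """
    if not _TABLE_RE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _COLUMN_RE.match(user_column):
        raise ValueError(f"unexpected column name: {user_column!r}")
    return (
        f"DELETE FROM `{table}` "
        f"WHERE {user_column} IN UNNEST(@deleted_user_ids)"
    )
```

Keeping the IDs in a query parameter keeps the statement text constant across runs, which makes it easier to audit and to schedule.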
System operations
Because BigQuery is a managed service, we did not need to involve Twitter's SRE team in system management or on-call duties. Provisioning more capacity for both storage and compute was easy. We could change the slot reservation by filing a ticket with Google support. We identified areas for improvement, such as self-service slot allocation and better monitoring dashboards, and passed those requests to Google.
Cost
Our preliminary analysis showed that query costs for BigQuery and Presto were in the same range. We purchased slots so that we pay a stable monthly cost instead of paying per TB of data processed. This decision was also based on feedback from users who did not want to think about cost before issuing every query.
Storing data in BigQuery incurs costs on top of the GCS costs. Tools such as Scalding require datasets to be in GCS, and to access the data from BigQuery we had to load the same datasets into BigQuery's own format. We are working on connecting Scalding to BigQuery datasets, which will remove the need to store datasets in both GCS and BigQuery.
For the rare cases that required infrequent queries over tens of petabytes, we decided that storing the datasets in BigQuery was not cost-effective and used Presto to access the datasets in GCS directly. To this end, we are looking into BigQuery external data sources.
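The trade-off between a fixed monthly slot purchase and per-TB billing comes down to a break-even volume. The sketch below shows the arithmetic; the function names are hypothetical, and the prices in the usage note are made-up placeholders, not Google's actual rates or Twitter's figures.

```python
def per_tb_cost(tb_processed: float, price_per_tb: float) -> float:
    """Monthly bill when paying for each TB of data processed."""
    return tb_processed * price_per_tb


def break_even_tb(monthly_slot_cost: float, price_per_tb: float) -> float:
    """Monthly volume (in TB) above which fixed slots become cheaper."""
    if price_per_tb <= 0:
        raise ValueError("price_per_tb must be positive")
    return monthly_slot_cost / price_per_tb


def cheaper_option(tb_processed: float, monthly_slot_cost: float,
                   price_per_tb: float) -> str:
    """Return 'slots' or 'per-tb', whichever costs less this month."""
    if per_tb_cost(tb_processed, price_per_tb) > monthly_slot_cost:
        return "slots"
    return "per-tb"
```

With placeholder inputs of 10,000 per month for slots and 5 per TB, `break_even_tb(10000.0, 5.0)` gives 2000.0 TB, so a team processing 3,000 TB a month would be better off with slots. Beyond price, the fixed option also gives the budget predictability that users asked for.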
Next steps
BigQuery has attracted a lot of interest since the alpha release. We are adding more datasets and more teams to BigQuery. We are developing connectors that let data analysis tools such as Scalding read from and write to BigQuery storage. We are also looking into tools such as Looker and Apache Zeppelin for creating enterprise-quality reports and notes from BigQuery datasets.
Our collaboration with Google has been very productive, and we are glad to continue and grow this partnership. We worked with Google so that we can submit our requests to Google directly. Some of them, such as the BigQuery Parquet loader, have already been implemented by Google.
Here are some of our high-priority feature requests for Google:
- 䟿å©ãªããŒã¿åä¿¡ãš LZO-Thrift 圢åŒã®ãµããŒãã®ããã®ããŒã«ã
- æéå¥ã»ã°ã¡ã³ããŒã·ã§ã³
- ããŒãã«ãè¡ãåã¬ãã«ã®æš©éãªã©ã®ã¢ã¯ã»ã¹å¶åŸ¡ã®æ¹åã
- ããã°ã¯ãšãªãŒ Hive Metastore çµ±åãš LZO-Thrift 圢åŒã®ãµããŒããåããŠããŸãã
- BigQuery UI ã§ã®ããŒã¿ã«ã¿ãã°çµ±åã®æ¹å
- ã¹ãããã®å²ãåœãŠãšç£èŠã®ããã®ã»ã«ããµãŒãã¹ã
Conclusion
Democratizing data analysis, visualization, and machine learning in a secure way is a top priority for the Data Platform team. We identified Google BigQuery and Data Studio as tools that could help us reach this goal, and last year we released BigQuery Alpha company-wide.
We found querying in BigQuery to be simple and efficient. For data ingestion and transformation, we used Google tools for simple pipelines, but for complex pipelines we had to build our own Airflow infrastructure. In data governance, BigQuery's authentication, authorization, and auditing services met our needs. Managing metadata and maintaining privacy required more flexibility, so we had to build our own systems. As a managed service, BigQuery was easy to operate. Query costs were similar to those of our existing tools. Storing data in BigQuery incurs costs on top of the GCS costs.
Overall, BigQuery works well for general-purpose SQL analysis. We are seeing a lot of interest in BigQuery, and we are working on migrating more datasets, onboarding more teams, and building more pipelines with BigQuery. Twitter uses a wide variety of data, which requires a mix of tools such as Scalding, Spark, Presto, and Druid. We intend to keep building out our data analysis tools and to give our users clear guidance on how to get the most out of our offerings.
Acknowledgements
I would like to thank my co-authors and teammates, Anju Jha and Will Pascucci, for their great collaboration and hard work on this project. I would also like to thank the engineers and managers from several teams at Twitter and Google who supported us, and the BigQuery users at Twitter who gave us valuable feedback.
If you are interested in working on these challenges, consider joining the Data Platform team.
Source: habr.com
