I'll share from personal experience what came in handy, where, and when. This is an overview organized by topic, so that it is clear what to dig into and how deep. I only cover my own subjective experience here, so things may well look quite different for you.
Why is it important to know query languages and be able to use them? Data science essentially consists of several important stages, and the very first and most important one (without it, nothing works!) is obtaining or extracting the data. Most often the data sits somewhere in some form and has to be pulled out of there.
Query languages are exactly what lets you extract that data. Today I'll talk about the query languages that have been useful to me, and I'll explain and show where, how, and why they are needed.
The article covers three main blocks of data query types:
- "Standard" query languages: what people usually mean when they talk about query languages, such as relational algebra or SQL.
- Scripting query languages: for example, the Python tools pandas and numpy, or shell scripting.
- Query languages for knowledge graphs and graph databases.
Everything written here is simply personal experience: what came in handy, with a description of the situation and of why it was needed. Anyone can picture themselves in a similar situation and prepare in advance by getting to know these languages before having to use them (urgently) on a project, or before even joining a project where they are needed.
"Standard" query languages
Standard query languages are exactly what we usually think of when we talk about queries.
Relational algebra
Why is relational algebra needed today? To properly understand why query languages are designed the way they are, and to use them consciously, you need to understand the core that underlies them.
What is relational algebra?
The formal definition: relational algebra is a closed system of operations on relations in the relational data model. Put a bit more humanly, it is a system of operations on tables in which the result is always a table.
All the relational operations are covered in an article on Habr. Here I only describe why you need to know them and where they come in handy.
Why?
You start to understand what query languages are made of and which operations stand behind the expressions of a specific query language. This often gives a deeper understanding of what works in a query language and how.

(Image taken from this article.) An example of an operation is the join, which combines tables.
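The point that every operation on tables yields another table can be sketched in plain Python. This is only an illustration under simple assumptions: a table is modeled as a list of dicts, and the function names (`select`, `project`, `natural_join`) and the sample data are made up for this sketch, not taken from the article.

```python
# A table is modeled as a list of dicts. Every operation returns a new table,
# which is what allows relational operations to be chained freely.

def select(table, predicate):
    """Selection: keep the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Projection: keep only the given columns and drop duplicate rows."""
    seen, result = set(), []
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared column names."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                result.append({**l_row, **r_row})
    return result


# Hypothetical sample data
people = [
    {"name": "Ann", "city_id": 1},
    {"name": "Bob", "city_id": 2},
    {"name": "Eve", "city_id": 1},
]
cities = [
    {"city_id": 1, "city": "Riga"},
    {"city_id": 2, "city": "Oslo"},
]

# Because each result is again a table, the operations compose.
riga_names = project(
    select(natural_join(people, cities), lambda row: row["city"] == "Riga"),
    ["name"],
)
print(riga_names)
```

The nested-loop join is deliberately naive. The closure property is what matters here: the output of one operation can always be fed into the next.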
Study materials:
There is a lot of material on relational algebra and relational theory in general: Coursera, Udacity, and so on. There is also a huge amount of material online, including good courses. My personal advice: you need to understand relational algebra really well. It is the foundation of foundations.
SQL

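(Image taken from this article.)
SQL is essentially an implementation of relational algebra, with one important caveat: SQL is declarative. When you write a query in the language of relational algebra, you are actually saying how to compute the result, whereas in SQL you specify what you want to extract, and the DBMS generates (efficient) relational algebra expressions for it (their equivalence is known as Codd's theorem).

(Image taken from this article.)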
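The difference between "what" and "how" can be sketched with Python's built-in `sqlite3` module. The tables and values below are hypothetical, and the hand-written plan is just one of many possible plans, not what any particular DBMS actually does internally.

```python
import sqlite3

# Hypothetical tables, created in memory only for this illustration
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, city_id INTEGER);
    CREATE TABLE cities (city_id INTEGER, city TEXT);
    INSERT INTO people VALUES ('Ann', 1), ('Bob', 2), ('Eve', 1);
    INSERT INTO cities VALUES (1, 'Riga'), (2, 'Oslo');
""")

# Declarative: we state WHAT we want; the engine decides how to compute it.
declarative = conn.execute("""
    SELECT p.name
    FROM people AS p
    JOIN cities AS c ON p.city_id = c.city_id
    WHERE c.city = 'Riga'
    ORDER BY p.name
""").fetchall()

# Procedural, in the spirit of relational algebra: we spell out HOW,
# as one possible plan: join, then selection, then projection.
people = conn.execute("SELECT name, city_id FROM people").fetchall()
cities = conn.execute("SELECT city_id, city FROM cities").fetchall()
joined = [(name, city) for name, pid in people for cid, city in cities if pid == cid]
selected = [row for row in joined if row[1] == "Riga"]
procedural = sorted((name,) for name, _ in selected)

print(declarative)
print(procedural)
```

In SQLite, prefixing a query with `EXPLAIN QUERY PLAN` shows which plan the engine chose for it.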
Why?
RDBMSs such as Oracle, Postgres, and SQL Server are still practically everywhere, and the chances that you will have to interact with them are very high. This means you will have to either read SQL (very likely) or write it (also not unlikely).
What to read and study
The same links as above (on relational algebra) lead to an incredible amount of material.
By the way, what is NoSQL?
"It is worth stressing once again that the term 'NoSQL' arose entirely spontaneously and has no generally accepted definition or scientific institution behind it." (From the corresponding article on Habr.)
Essentially, people realized that a full relational model is not needed for many problems, especially those where performance is critical and a few simple aggregation queries dominate. There it is important to compute metrics quickly and write them to the database, and most relational features turned out to be not just unnecessary but harmful. Why normalize anything if doing so ruins the thing that matters most to us (for a specific task), namely performance?
Flexible schemas are also often needed instead of the fixed mathematical schemas of the classical relational model. This greatly simplifies application development when it is important to get the system up and running quickly and start processing results, or when the schema and the types of the stored data are not that important.
For example, say we are building an expert system and want to store information about a specific domain together with some meta-information. We may not know all the fields in advance and simply store a JSON document for each record. This gives a very flexible environment for extending the data model and iterating quickly, so in this case NoSQL is even preferable and more readable. An example record (from one of my projects where NoSQL was exactly where it was needed):
{"en_wikipedia_url":"https://en.wikipedia.org/wiki/Johnny_Cash",
"ru_wikipedia_url":"https://ru.wikipedia.org/wiki/?curid=301643",
"ru_wiki_pagecount":149616,
"entity":[42775,"Джонни Кэш","ru"],
"en_wiki_pagecount":2338861}
There is much more to read about NoSQL.
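As a rough sketch of what "just store the JSON for each record" means in practice, the snippet below reads records whose sets of fields differ. The first record reuses two fields from the example above; the second record and its `tags` field are hypothetical.

```python
import json

# Records with different sets of fields, one JSON document per line.
# The second record is hypothetical and only illustrates a differing schema.
raw_lines = [
    '{"en_wikipedia_url": "https://en.wikipedia.org/wiki/Johnny_Cash", "en_wiki_pagecount": 2338861}',
    '{"en_wikipedia_url": "https://en.wikipedia.org/wiki/Example", "tags": ["hypothetical"]}',
]
records = [json.loads(line) for line in raw_lines]

# No schema is declared up front: the set of fields is discovered from the data.
all_fields = sorted(set().union(*records))

# A missing field is handled at read time instead of being a NULL column.
pagecounts = {r["en_wikipedia_url"]: r.get("en_wiki_pagecount") for r in records}

print(all_fields)
print(pagecounts)
```

Adding a new field to future records requires no migration; the cost is that every reader has to cope with fields that may be absent.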
äœãå匷ããã°ããã§ããïŒ
ããã§ã¯ãã¿ã¹ã¯ãããåæããã¿ã¹ã¯ãã©ã®ãããªç¹æ§ãæã£ãŠãããããã®èª¬æã«é©åãã NoSQL ã·ã¹ãã ã¯äœãã調ã¹ãŠããããã®ã·ã¹ãã ã®èª¿æ»ãéå§ããå¿ èŠããããŸãã
ã¹ã¯ãªããã¯ãšãªèšèª
äžèŠãããšãPython ãšã¯äœã®é¢ä¿ãããã®ãââãšæãããŸããPython ã¯ããã°ã©ãã³ã°èšèªã§ãããã¯ãšãªãšã¯ãŸã£ããé¢ä¿ãããŸããã

- Pandas is the real Swiss Army knife of data science: a huge amount of data transformation, aggregation, and so on happens in it.
- Numpy: vector computations, matrices, and linear algebra.
- Scipy: a lot of mathematics lives in this package, statistics in particular.
- Jupyter lab: a lot of exploratory data analysis fits well into notebooks, so it is useful to know.
- Requests: working with the network.
- Pyspark: very popular among data engineers. You will most likely have to interact with it or with Spark simply because of their popularity.
- *Selenium: very useful for collecting data from websites and other resources; sometimes there is simply no other way to get the data.
My main advice: learn Python!
Pandas
Let's take the following code as an example:
import pandas as pd

df = pd.read_csv("data/dataset.csv")

# Keep return trips, group them by station pair, then aggregate and rename
all_together = (df[df["trip_type"] == "return"]
    .groupby(["start_station_name", "end_station_name"])["trip_duration_seconds"]
    .agg(["size", "mean", "min", "max"])
    .rename(columns={"size": "num_trips",
                     "mean": "avg_duration_seconds",
                     "min": "min_duration_seconds",
                     "max": "max_duration_seconds"}))
Essentially, we can see that the code fits the classic SQL pattern.
SELECT start_station_name, end_station_name, count(trip_duration_seconds) AS size, ...
FROM dataset
WHERE trip_type = 'return'
GROUP BY start_station_name, end_station_name
But what matters is that this code is part of a script and a pipeline; in effect, we are embedding queries into a Python pipeline. In this situation the query language comes to us from libraries such as Pandas or pySpark.
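To see how the WHERE / GROUP BY / SELECT pattern maps onto pipeline steps, here is a minimal sketch using only the standard library. The sample trips are hypothetical, and plain tuples stand in for the dataframe rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: (start_station, end_station, trip_type, duration_seconds)
trips = [
    ("A", "B", "return", 600),
    ("A", "B", "return", 900),
    ("A", "B", "one_way", 300),
    ("B", "C", "return", 1200),
]

# WHERE trip_type = 'return'
returns = [t for t in trips if t[2] == "return"]

# GROUP BY start_station_name, end_station_name
groups = defaultdict(list)
for start, end, _, duration in returns:
    groups[(start, end)].append(duration)

# SELECT count(...), avg(...), min(...), max(...)
summary = {
    key: {
        "num_trips": len(durations),
        "avg_duration_seconds": mean(durations),
        "min_duration_seconds": min(durations),
        "max_duration_seconds": max(durations),
    }
    for key, durations in groups.items()
}
print(summary)
```

Each step corresponds to one clause of the SQL query and to one link of the pandas method chain, which is why the same "query" can live inside an ordinary Python script.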
In general, in pySpark we see a similar kind of data transformation through a query language, in the spirit of:
(df.filter(df.trip_type == "return")
   .groupby("day")
   .agg({"duration": "mean"})
   .sort("day"))  # the sort key is assumed here
Where and what to read
For Python in general, finding study materials is not a problem. There is a huge number of tutorials and courses online. The material here googles really well, and if I had to pick a single package to focus on, it would of course be pandas. There is also plenty of material on the DS + Python combination.
Shell as a query language
Quite a few of the data processing and analysis projects I have worked on are, in essence, shell scripts that call Python code, Java code, and shell commands themselves. So in general you can think of bash/zsh pipelines as a kind of high-level query (you can of course put loops in there, but that is not typical for DS code in shell languages). Here is a simple example: I needed to map Wikidata QIDs to full links to the Russian and English wikis. For this I wrote a simple query out of bash commands plus a simple Python script for the output, and put them together like this:
pv "data/latest-all.json.gz" |
unpigz -c |
jq --stream "$JQ_QUERY" |
python3 scripts/post_process.py "output.csv"
where
JQ_QUERY='select((.[0][1] == "sitelinks" and (.[0][2]=="enwiki" or .[0][2] =="ruwiki") and .[0][3] =="title") or .[0][1] == "id")'
This was, in fact, the whole pipeline that built the required mapping. As we can see, everything worked in streaming mode:
- pv filepath shows a progress bar based on the file size and passes the file's contents on
- unpigz -c reads a part of the archive and passes it to jq
- jq with the --stream flag produces results immediately and passes them to the postprocessor, which is written in Python (just like in the first example)
- internally, the postprocessor is a simple state machine that formats the output
As a result, we get a complex pipeline that works in streaming mode on big data (0.5 TB) without significant resources, built from a simple pipeline and a couple of tools.
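The actual scripts/post_process.py is not shown, so the following is only a guess at what "a simple state machine that formats the output" could look like. It assumes each filtered jq event arrives as one compact JSON line of the form `[path, value]` (for example, with jq run in compact-output mode); the function names and the sample events are hypothetical.

```python
import json
from urllib.parse import quote


def wiki_url(lang, title):
    """Build a full Wikipedia link from a language code and a page title."""
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


def rows_from_events(lines):
    """Group streamed [path, value] events into [qid, en_url, ru_url] rows.

    The state is the entity currently being assembled: an "id" event flushes
    the previous entity and starts a new one; sitelink events fill in links.
    """
    current = None
    for line in lines:
        event = json.loads(line)
        if len(event) != 2:  # skip closing events that carry no value
            continue
        path, value = event
        if path[1] == "id":
            if current is not None:
                yield current
            current = [value, "", ""]
        elif path[1] == "sitelinks" and current is not None:
            if path[2] == "enwiki":
                current[1] = wiki_url("en", value)
            elif path[2] == "ruwiki":
                current[2] = wiki_url("ru", value)
    if current is not None:
        yield current


# Hypothetical sample of already filtered events
events = [
    '[[0,"id"],"Q42"]',
    '[[0,"sitelinks","enwiki","title"],"Douglas Adams"]',
    '[[1,"id"],"Q1"]',
]
rows = list(rows_from_events(events))
print(rows)
```

Because the function is a generator over an iterable of lines, it could be fed from standard input line by line and would never need to hold more than one entity in memory, which fits the streaming setup described above.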
Another important piece of advice: learn to work well and efficiently in the terminal and to write bash/zsh and the like.
Where will it come in handy? Pretty much everywhere. Again, there is a LOT of study material online, including my previous article.
R scripting
Once again the reader may exclaim: but that is a whole programming language! And of course they will be right. However, I have usually come across R in contexts where it was, in effect, very similar to a query language.
R is a statistical computing environment and a language for statistical computing and visualization.

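(Image taken from here. By the way, I recommend it: good material.)
Why does a data scientist need to know R? At the very least because there is a huge layer of non-IT people who do data analysis in R. I have met them in the following places:
- The pharmaceutical sector.
- Biologists.
- The financial sector.
- People with a purely mathematical education who work with statistics.
- Specialized statistical and machine learning models (which can often be found only in the author's version, as an R package).
Why is it actually a query language? In the form in which you often meet it, it really is a request to build a model, including reading the data and fixing the parameters of the query (model), as well as visualizing data with packages such as ggplot2, which is also a form of writing queries.
Example of a query for visualization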
ggplot(data = beav,
       aes(x = id, y = temp,
           group = activ, color = activ)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c("red", "blue"))
In general, many ideas from R, such as dataframes and data vectorization, have migrated into Python packages like pandas, numpy, and scipy, so a lot in R will feel familiar and convenient.
There are many sources to study from.
Knowledge graphs
Here my experience is a bit unusual, because I quite often have to work with knowledge graphs and graph query languages. This part is somewhat more exotic, so let's briefly go over the basics.
In classical relational databases the schema is fixed. Here the schema is flexible: every predicate is effectively a "column", and even more than that.
Imagine that you are modeling a person and want to describe the key facts. For example, take a specific person, Douglas Adams, and use his description as a basis.

If we used a relational database, we would have to create a huge table, or several tables, with an enormous number of columns, most of which would be NULL or filled with some default False value. For example, it is unlikely that many of us have an entry in the National Library of Korea. Of course, we could move such columns into separate tables, but in the end that would be an attempt to model a flexible logical schema of predicates with a fixed relational one.

So imagine that all the data is stored as a graph, or as binary and unary Boolean expressions.
Where can you even run into this? First of all, when working with Wikidata, and with any graph databases or linked data.
Below are the main query languages that I have used and worked with.
SPARQL
Wiki:
SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language) is a query language for data represented in the RDF model, as well as a protocol for transmitting these queries and responding to them. SPARQL is a W3C recommendation and one of the Semantic Web technologies.
In practice, it is a query language for logical unary and binary predicates. You simply state what is fixed in a Boolean expression and what is not (very simplified).
The RDF (Resource Description Framework) database itself, over which SPARQL queries are executed, is a set of triples (object, predicate, subject), and a query selects the required triples according to the given constraints, in the spirit of "find X such that p_55(X, q_33) is true". Here, of course, p_55 is some relation with ID 55 and q_33 is an object with ID 33 (that is the whole story, again leaving out all sorts of details).
Example of data representation:

(The pictures and the example with countries are from here.)
Basic query example

In effect, we want to find the values of the variable ?country such that the predicate member_of(?country, q458) is true, where q458 is the ID of the European Union.
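The "find the values for which the predicate holds" idea can be sketched as pattern matching over a list of triples. This is a toy illustration, not how a real SPARQL engine works: the `match` function is made up for this sketch, and apart from q458 (taken from the text above) all identifiers are hypothetical.

```python
# Hypothetical triples in (subject, predicate, object) order.
# Only q458 comes from the text; every other ID is made up.
triples = [
    ("q1001", "member_of", "q458"),
    ("q1002", "member_of", "q458"),
    ("q1003", "member_of", "q2000"),
    ("q1001", "instance_of", "q3000"),
]


def match(pattern, store):
    """Yield variable bindings for a single triple pattern.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    for triple in store:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # The same variable must bind to the same value everywhere.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly: find ?country such that member_of(?country, q458) is true
eu_members = sorted(
    b["?country"] for b in match(("?country", "member_of", "q458"), triples)
)
print(eu_members)
```

Fixed terms act as constraints and variables as the "holes" to be filled, which is the simplified reading of a query given above.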
An example of a real SPARQL query inside a Python engine:

As a rule, I have had to read SPARQL rather than write it. In that situation, understanding the language at least at a basic level is likely to be a useful skill, so that you understand exactly how the data is being retrieved.
There is a lot of study material online. I usually just google specific constructs and examples, and so far that has been enough.
Logical query languages
You can read more on this topic in my article. Here we will only briefly look at why logical languages are well suited for writing queries. Essentially, RDF is just a set of logical statements of the form p(X) and h(X,Y), and a logical query looks like this:
output(X) :- country(X), member_of(X,"EU").
Here we are talking about creating a new predicate output/1 (/1 means unary), provided that country(X) is true for X, i.e. X is a country, and that member_of(X,"EU") holds as well.
That is, in this case both the data and the rules are represented in the same way, which makes modeling problems very easy and convenient.
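To show how such a rule is evaluated, here is a small Python sketch that treats each predicate as a set of tuples and the rule as a comprehension over them. The facts are sample data chosen for illustration, and this is only a hand-rolled analogy, not how a logic programming system actually evaluates rules.

```python
# Facts, stored as sets of tuples: country(X) and member_of(X, Y).
# The facts are sample data chosen only for this illustration.
country = {("france",), ("germany",), ("norway",)}
member_of = {("france", "EU"), ("germany", "EU"), ("norway", "EFTA")}

# output(X) :- country(X), member_of(X, "EU").
# X belongs to the output if both body atoms hold for the same X.
output = {(x,) for (x,) in country if (x, "EU") in member_of}

print(sorted(output))
```

Note that the derived predicate `output` has exactly the same shape as the input facts, a set of tuples, which mirrors the point that data and rules live in one representation.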
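Where have I met this in industry? A whole large project with a company that writes queries in such a language, and also a current project where it sits at the core of the system. It may seem like a rather exotic thing, but you do run into it sometimes.
An example of a code fragment in a logical language that processes Wikidata: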

Materials: as a starting point, look into Answer Set Programming, a modern logic programming language. I recommend studying it.
Source: habr.com
