An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths

Hello everyone. We are developing a product for offline traffic analysis. One task in the project involves statistical analysis of visitor routes across areas.

As part of this task, users can ask the system queries of the following kinds:

  • how many visitors passed from area "A" to area "B";
  • how many visitors passed from area "A" to area "B" via area "C" and then through area "D";
  • how long it took a visitor of a given type to get from area "A" to area "B".

There are many similar analytical queries as well.
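To make the first two query types concrete, here is a minimal Python sketch. It assumes (this layout is an illustration, not the project's data model) that each visitor's track is available as a list of area names in visiting order. The names `passed_in_order` and `count_visitors` are made up for the example.

```python
def passed_in_order(track, zones):
    """Return True if `zones` occur in `track` in the given order.

    The zones do not have to be adjacent: other areas may lie between them.
    """
    it = iter(track)
    # `z in it` advances the iterator, so each zone is searched for
    # only after the position where the previous one was found.
    return all(z in it for z in zones)


def count_visitors(tracks, zones):
    """Count visitors whose track passes through `zones` in order."""
    return sum(passed_in_order(track, zones) for track in tracks.values())


# Hypothetical sample data: visitor id -> ordered list of visited areas.
tracks = {
    1: ["A", "C", "B"],
    2: ["B", "A"],
    3: ["A", "B", "D"],
}
print(count_visitors(tracks, ["A", "B"]))       # visitors going from A to B
print(count_visitors(tracks, ["A", "C", "B"]))  # from A to B via C
```

The rest of the article is about getting a database to answer the same question over tens of thousands of tracks.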

The movement of visitors between areas is a directed graph. Reading around on the Internet, I found that graph DBMSs are also used for analytical reports, and I wanted to see how a graph DBMS would cope with such queries (TL;DR: poorly).

๋‚˜๋Š” DBMS๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ๋กœ ๊ฒฐ์ •ํ–ˆ๋‹ค. ์•ผ๋ˆ„์Šค ๊ทธ๋ž˜ํ”„, (์ œ ์ƒ๊ฐ์—๋Š”) ์ ์ ˆํ•œ ์šด์˜ ํŠน์„ฑ์„ ์ œ๊ณตํ•ด์•ผ ํ•˜๋Š” ์„ฑ์ˆ™ํ•œ ๊ธฐ์ˆ  ์Šคํƒ์— ์˜์กดํ•˜๋Š” ๊ทธ๋ž˜ํ”„ ์˜คํ”ˆ ์†Œ์Šค DBMS์˜ ๋›ฐ์–ด๋‚œ ๋Œ€ํ‘œ์ž์ž…๋‹ˆ๋‹ค.

  • BerkeleyDB ์Šคํ† ๋ฆฌ์ง€ ๋ฐฑ์—”๋“œ, Apache Cassandra, Scylla;
  • ๋ณต์žกํ•œ ์ธ๋ฑ์Šค๋Š” Lucene, Elasticsearch, Solr์— ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

The authors of JanusGraph write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are used often in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but that is probably due to transaction requirements. In any case, for scalable production use the recommended backend is Cassandra or Scylla.

I did not consider Neo4j, because clustering requires the commercial edition, which means the product is not open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew a graph laid out exactly according to the canons of graph DBMSs:

[Image: graph schema with the Zone, ZoneStep, Area, ZoneTrack and Person entities]

There is a Zone entity, which represents an area. If a ZoneStep belongs to a Zone, it references it. Ignore the Area, ZoneTrack and Person entities: they belong to the domain and are not part of the test. All in all, a chain search query over this graph structure looks like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain language: find the Zone with ID=0, take all the vertices that have an edge going to it (the ZoneSteps), walk forward without turning back until you reach ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.

I do not claim to know all the intricacies of graph search, but this query was written based on this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).
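For readers less familiar with Gremlin, the counting logic described above can be sketched in plain Python over an in-memory copy of the graph. This only illustrates the intended semantics under simplifying assumptions: `goes` maps a step to its successor steps, `belongs` maps a step to its zone id, tracks contain no cycles, and the function name `count_chains` is hypothetical. It says nothing about how JanusGraph actually executes the traversal.

```python
from collections import deque


def count_chains(goes, belongs, start_zone, end_zone):
    """Count walks from a step in `start_zone` to a step in `end_zone`.

    Mirrors the repeat(...).until(...) shape of the query: move along a
    'goes' edge first, then check whether the new step is in `end_zone`.
    """
    count = 0
    # Steps that belong to the start zone (the in_() part of the query).
    frontier = deque(s for s, z in belongs.items() if z == start_zone)
    while frontier:
        step = frontier.popleft()
        for nxt in goes.get(step, ()):
            if belongs[nxt] == end_zone:
                count += 1            # condition met: this walker stops here
            else:
                frontier.append(nxt)  # keep walking forward
    return count


# Hypothetical toy graph: three tracks, two of which reach zone 19.
belongs = {1: 0, 2: 5, 3: 19, 4: 0, 5: 19, 6: 0, 7: 3}
goes = {1: [2], 2: [3], 4: [5], 6: [7]}
print(count_chains(goes, belongs, start_zone=0, end_zone=19))
```

Note that every walker has to follow its chain step by step, so the cost grows with the total length of all tracks that start in the first zone.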

I loaded 50 thousand tracks, each 3 to 20 points long, into a JanusGraph database with the BerkeleyDB backend and created indexes according to the manual.

The Python loading script:


from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)

I used a VM with 4 cores and 16 GB of RAM on SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this setup, the data and the indexes used for exact-match lookups are stored in BerkeleyDB. Running the query given earlier took several tens of seconds.

By running the 4 scripts above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stack traces in the Docker logs (and we all love reading Java stack traces).

After some thought, I decided to simplify the graph schema to the following:

[Image: simplified graph schema]

I reasoned that searching by entity attributes would be faster than searching by edges. As a result, my query turned into the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain language: find a ZoneStep with ID=0, walk forward without turning back until you find a ZoneStep with ID=19, and count the number of such chains.

I also simplified the loading script given above so that it does not create unnecessary edges and limits itself to attributes.
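The simplified script is not shown. The following is a rough sketch of what such a loader could look like, not the script that was actually run. It assumes `g` is a gremlin_python traversal source like the one imported in the loader above, that the zone number is stored in the `id` property the simplified query filters on, and that intermediate zones are drawn from 1..18 so that only the first step of a track has id 0. The names `make_track` and `load_track` are hypothetical.

```python
from random import Random
from time import time

MAX_ZONE = 19  # assumption: same zone range as in the original loader


def make_track(rng, p_end=0.3):
    """Generate one track as a list of zone ids: starts at 0, ends at MAX_ZONE."""
    zones = [0]
    while rng.random() >= p_end:
        zones.append(rng.randrange(1, MAX_ZONE))  # intermediate zones 1..18
    zones.append(MAX_ZONE)
    return zones


def load_track(g, zones):
    """Store a track as ZoneStep vertices chained by 'goes' edges.

    The zone is kept as a vertex property, so no Zone vertices or
    'belongs' edges are created.
    """
    prev = None
    for zone_id in zones:
        cur = (g.addV('ZoneStep')
                .property('id', zone_id)
                .property('time', int(time()))
                .next())
        if prev is not None:
            g.V(prev).addE('goes').to(cur).iterate()
        prev = cur


if __name__ == '__main__':
    # Only the generator runs here; load_track needs a live Gremlin server.
    print(make_track(Random(42)))
```

Keeping track generation separate from loading makes it possible to feed the same synthetic tracks to more than one database.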

The query still took several seconds to complete, which was completely unacceptable for our task, since it is not suitable for any kind of ad hoc querying.

I tried deploying JanusGraph with Scylla, as the fastest Cassandra implementation, but that did not lead to any significant change in performance either.

So, even though it "looks like a graph", I could not get the graph DBMS to process it quickly. I fully accept that there is something I do not know and that JanusGraph can be made to perform this search in a fraction of a second, but I was unable to do it.

Since the problem still had to be solved, I started thinking about JOINs and pivot tables, which did not inspire optimism in terms of elegance but could be a perfectly workable option in practice.

Our project already uses ClickHouse, so I decided to test the idea on this analytical DBMS.

I deployed ClickHouse using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max - 2)) + 1,
                'person': r
            })

    data.append({
            'area': 0,
            'zone': max - 1,
            'person': r
        })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since the inserts arrive in batches, loading was much faster than with JanusGraph.
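The script above sends one small INSERT per visitor. If loading speed mattered more, rows could be buffered and flushed in larger batches. A minimal sketch follows; `BatchWriter` and the batch size are illustrative assumptions, and `flush` is any callable that performs the actual insert.

```python
class BatchWriter:
    """Buffer rows and hand them to `flush` in batches of at least `batch_size`."""

    def __init__(self, flush, batch_size=10000):
        self._flush = flush
        self._batch_size = batch_size
        self._rows = []

    def add(self, rows):
        """Queue rows; write them out once the buffer is large enough."""
        self._rows.extend(rows)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write out whatever is buffered (call once more at the end)."""
        if self._rows:
            self._flush(self._rows)
            self._rows = []


# Demonstration with an in-memory sink instead of a database.
written = []
writer = BatchWriter(written.append, batch_size=3)
writer.add([{'area': 0, 'zone': 0, 'person': 1}, {'area': 0, 'zone': 19, 'person': 1}])
writer.add([{'area': 0, 'zone': 0, 'person': 2}, {'area': 0, 'zone': 19, 'person': 2}])
writer.flush()
print(len(written))  # number of batches actually written
```

With the client from the script above, the sink could be `lambda rows: client.execute('INSERT INTO steps (area, zone, person) VALUES', rows)`.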

I built two queries using JOIN. For passage from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

For passage through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

The queries look rather scary, of course; for real use you would need to build a programmatic generator harness around them. However, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query for passage through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
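The generator harness mentioned above could start out as something like the following sketch. It builds the same nested ANY INNER JOIN shape for an arbitrary list of zones. The function names and the `w0`, `w1`, ... column aliases are assumptions made for the example, and the generated SQL has not been run against a ClickHouse server here, so treat it as a starting point rather than a tested implementation.

```python
def build_path_query(zones, area=0):
    """Build a query selecting visitors that pass through `zones` in order.

    Each additional zone wraps the previous query in one more
    ANY INNER JOIN on `person`, as in the hand-written queries above.
    """
    zones = [int(z) for z in zones]  # int() keeps arbitrary text out of the SQL
    area = int(area)
    if len(zones) < 2:
        raise ValueError("at least two zones are required")

    def leg(i):
        # One visit to zone i, with its timestamp exposed as w<i>.
        return ("(SELECT person, `when` AS w{i} FROM steps "
                "WHERE (area = {area}) AND (zone = {zone})) AS s{i}"
                ).format(i=i, area=area, zone=zones[i])

    query = None
    for i in range(1, len(zones)):
        left = leg(0) if query is None else "({}) AS p{}".format(query, i - 1)
        cols = ", ".join("w{}".format(j) for j in range(i + 1))
        query = ("SELECT person, {cols} FROM {left} ANY INNER JOIN {right} "
                 "USING (person) WHERE w{a} <= w{b}"
                 ).format(cols=cols, left=left, right=leg(i), a=i - 1, b=i)
    return query


def build_count_query(zones, area=0):
    """Wrap the path query in a count(*)."""
    return "SELECT count(*) FROM ({})".format(build_path_query(zones, area))


print(build_count_query([0, 3, 19]))
```

The sketch keeps the ANY strictness of the original queries, so it inherits their behaviour when a visitor enters the same zone more than once.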

A note on IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 across the parallel data-loading threads), and IOWAIT was quite high as well. ClickHouse, by contrast, put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further with materialized views and parallelization, pre-processing the event stream with Apache Flink before loading it into ClickHouse.

Performance is so good that we probably will not even need to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS ended in failure. I did not find that JanusGraph has a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured in the traditional Java way, which will make people unfamiliar with Java weep tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I even managed to accidentally bring down the BerkeleyDB version of JanusGraph.

The documentation is rather convoluted when it comes to indexes, because managing indexes requires some fairly strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, incidentally, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above compares apples with oranges. If you think about it, a graph DBMS performs different operations to arrive at the same results. However, as part of the tests I also ran an experiment with a query like this:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

This reflects walking distance. However, even on such data the graph DBMS took more than a few seconds... This is, of course, because there were paths of the form 0 -> X -> Y ... -> 1, which the graph engine checked as well.

Even for a query like:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I could not get a productive response with a processing time of under a second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which the ClickHouse example demonstrates with far greater efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it looks well suited to modeling in their paradigm.

Source: habr.com
