An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths

Hello everyone. We are developing a product for offline traffic analysis. One of the project's tasks is the statistical analysis of visitor routes across areas.

As part of this task, users can ask the system queries of the following kind:

  • how many visitors passed from area "A" to area "B";
  • how many visitors passed from area "A" to area "B" via area "C" and then via area "D";
  • how long it took a visitor of a certain type to get from area "A" to area "B".

and a number of similar analytical queries.
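To make the three query types concrete, here is a minimal in-memory sketch in Python. The sample tracks, the zone names, and the helper names (`passes_through`, `count_visitors`, `travel_time`) are illustrative assumptions, not part of the project described here.

```python
from datetime import datetime, timedelta

# Hypothetical sample: each visitor's track is a time-ordered list of (zone, timestamp).
t0 = datetime(2020, 1, 1, 12, 0, 0)
tracks = {
    1: [("A", t0), ("C", t0 + timedelta(minutes=2)),
        ("D", t0 + timedelta(minutes=5)), ("B", t0 + timedelta(minutes=9))],
    2: [("A", t0), ("B", t0 + timedelta(minutes=4))],
    3: [("B", t0), ("A", t0 + timedelta(minutes=3))],
}


def passes_through(track, zones):
    """True if the track visits `zones` in this order (other zones may lie in between)."""
    remaining = iter(zone for zone, _ in track)
    # Each any() consumes the iterator up to the next match, which enforces the order.
    return all(any(zone == wanted for zone in remaining) for wanted in zones)


def count_visitors(tracks, zones):
    """Number of visitors whose track passes through `zones` in order."""
    return sum(passes_through(track, zones) for track in tracks.values())


def travel_time(track, start, end):
    """Time from the first visit of `start` to the first later visit of `end`, else None."""
    started = None
    for zone, ts in track:
        if started is None and zone == start:
            started = ts
        elif started is not None and zone == end:
            return ts - started
    return None


a_to_b = count_visitors(tracks, ["A", "B"])
a_to_b_via_c_d = count_visitors(tracks, ["A", "C", "D", "B"])
visitor_1_time = travel_time(tracks[1], "A", "B")
```

With this sample data, two visitors go from "A" to "B", one of them via "C" and then "D", and visitor 1 needs nine minutes for the trip.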

The movement of a visitor across areas is a directed graph. After some reading on the Internet, I found that graph DBMSs are also used for analytical reports. I wanted to see how graph DBMSs would cope with queries like these (TL;DR: poorly).

I chose JanusGraph as a prominent representative of open-source graph DBMSs. It relies on a stack of mature technologies that (in my opinion) should give it decent operational characteristics:

  • storage backends: BerkeleyDB, Apache Cassandra, Scylla;
  • complex indexes can be stored in Lucene, Elasticsearch, Solr.

The authors of JanusGraph write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla, and ES, and these products are often used in our systems, so I was optimistic about testing this graph DBMS. The choice of BerkeleyDB over RocksDB seemed odd to me, but it is probably due to transaction requirements. In any case, for scalable production use, a Cassandra or Scylla backend is recommended.

I did not consider Neo4j because clustering requires the commercial edition, which means the product is not open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew a graph built exactly according to the canons of graph DBMSs:

[Figure: graph schema with the Zone, ZoneStep, Area, ZoneTrack, and Person entities]

There is a Zone entity that represents an area. If a ZoneStep belongs to this Zone, it points to it. Ignore the Area, ZoneTrack, and Person entities: they belong to the domain and are not considered as part of the test. Overall, a chain-search query for this graph structure would look like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain language, this means roughly: find the Zone with ID = 0, take all vertices that have an edge pointing to it (ZoneStep), walk forward without going back until you reach ZoneSteps that have an edge to the Zone with ID = 19, and count the number of such chains.
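The idea behind this traversal can be illustrated with a small plain-Python sketch. It is a simplified model under stated assumptions, not a reproduction of Gremlin's exact semantics: the toy `belongs` and `goes` mappings and the `count_chains` helper are hypothetical, and only "goes" edges are followed. Since tracks are chains, no cycle guard is needed here.

```python
# Hypothetical toy data: the Zone each ZoneStep "belongs" to,
# and the "goes" edges that link consecutive ZoneSteps of a track.
belongs = {"s1": 0, "s2": 5, "s3": 19, "s4": 0, "s5": 19, "s6": 0, "s7": 7}
goes = {"s1": ["s2"], "s2": ["s3"], "s4": ["s5"], "s6": ["s7"]}


def count_chains(belongs, goes, start_zone, end_zone):
    """Count chains that start at a step in `start_zone` and reach a step in `end_zone`."""
    # Equivalent of "find the Zone, take all ZoneSteps pointing to it".
    stack = [step for step, zone in belongs.items() if zone == start_zone]
    found = 0
    while stack:
        step = stack.pop()
        for nxt in goes.get(step, []):
            if belongs[nxt] == end_zone:
                found += 1          # chain reached the target zone, stop walking it
            else:
                stack.append(nxt)   # keep walking forward along the track
    return found


chains = count_chains(belongs, goes, start_zone=0, end_zone=19)
```

Even in this toy form, the work done is proportional to the number of steps walked, not to the number of matching chains, which hints at why such a query can be expensive on larger data.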

I do not claim to know all the intricacies of searching graphs, but this query was composed on the basis of this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks, each 3 to 20 points long, into the JanusGraph graph database using the BerkeleyDB backend, and created indexes according to the guide.

Python loading script:


from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)

We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed using this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this case, the data and the indexes used for exact-match lookups are stored in BerkeleyDB. Running the query given above, I got an execution time on the order of several tens of seconds.

By running four of the above scripts in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stack traces (and we all love reading Java stack traces) in the Docker logs.

After some thought, I decided to simplify the graph schema to the following:

[Figure: simplified graph schema]

The reasoning was that searching by entity attributes would be faster than searching by edges. As a result, my query became the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain language, this means roughly: find the ZoneStep with ID = 0, walk forward without going back until you reach a ZoneStep with ID = 19, and count the number of such chains.

I also simplified the loading script given above so that it does not create unnecessary edges and limits itself to attributes.
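The simplified loader itself is not shown in the article. As a hedged sketch of the data it would have to produce, the function below generates one track as a list of zone ids, mirroring the random logic of the original loading script (start in zone 0, stop with probability 0.3, finish in the last zone). The name `generate_track` and its parameters are assumptions.

```python
from random import Random


def generate_track(rng, max_zones=19, stop_probability=0.3):
    """Return one track as a list of zone ids, from zone 0 to zone `max_zones`."""
    track = [0]
    while True:
        if rng.random() < stop_probability:
            track.append(max_zones)  # the chain always ends in the last zone
            return track
        # Intermediate steps land in a random zone in the range 0 .. max_zones - 1.
        track.append(int(rng.random() * max_zones))


rng = Random(42)  # fixed seed so that the sketch is reproducible
sample_tracks = [generate_track(rng) for _ in range(1000)]
```

In the simplified schema, each zone id in a track would presumably become the `id` property of a ZoneStep vertex, with a single edge between consecutive steps, so no separate Zone vertices or "belongs" edges are created.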

The query still took several seconds to complete, which was completely unacceptable for our task, since it ruled out ad hoc queries of any kind.

I tried deploying JanusGraph with Scylla as the fastest Cassandra implementation, but this did not lead to any significant change in performance either.

So even though it "looks like a graph", I could not get the graph DBMS to process it quickly. I fully admit that there is something I do not know and that JanusGraph can be made to perform this search in a fraction of a second; however, I was unable to do it.

Since the problem still had to be solved, I began thinking about JOINs and pivots of tables. That did not inspire optimism in terms of elegance, but it could be a perfectly workable option in practice.

Our project already uses ClickHouse, so I decided to test my idea on this analytical DBMS.

I deployed ClickHouse using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max - 2)) + 1,
                'person': r
            })

    data.append({
            'area': 0,
            'zone': max - 1,
            'person': r
        })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since inserts come in batches, filling the table was much faster than for JanusGraph.
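The script above sends one batch per visitor. If larger batches were wanted, rows from many visitors could be grouped before each insert. The helper below is a minimal sketch of that idea; the name `batched` and the batch size are assumptions, and the insert call in the comment reuses the `client.execute` form from the script above.

```python
def batched(rows, batch_size):
    """Yield lists of at most `batch_size` rows from any iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


# Hypothetical usage with the client from the script above:
#   for chunk in batched(all_rows, 10000):
#       client.execute('INSERT INTO steps (area, zone, person) VALUES', chunk)
example = list(batched(range(5), 2))
```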

I wrote two queries using JOIN. To get from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

To pass through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when
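Both queries follow the same shape: each additional waypoint wraps the previous query in one more ANY INNER JOIN with a time-ordering condition. That regularity means the SQL can be produced programmatically. The function below is a rough sketch of such a generator. The name `build_path_query`, its parameters, and the exact SQL layout are assumptions, and the generated text has not been verified against a ClickHouse server.

```python
def build_path_query(zones, area=0, table="steps"):
    """Build a nested ANY INNER JOIN query counting persons who visit `zones` in order."""
    if len(zones) < 2:
        raise ValueError("need at least two zones")
    # int() keeps arbitrary text out of the generated SQL.
    area = int(area)
    zones = [int(zone) for zone in zones]

    def leg(zone):
        # One subquery per waypoint: who was in this zone, and when.
        return (f"(SELECT person, `when` FROM {table} "
                f"WHERE area = {area} AND zone = {zone})")

    query = f"SELECT person, `when` AS w FROM {leg(zones[0])}"
    for zone in zones[1:]:
        # Wrap the query built so far and join the next waypoint, later in time.
        query = (
            f"SELECT p.person AS person, s.`when` AS w FROM ({query}) AS p "
            f"ANY INNER JOIN {leg(zone)} AS s USING (person) "
            f"WHERE p.w <= s.`when`"
        )
    return f"SELECT count(*) FROM ({query})"


three_point_query = build_path_query([0, 3, 19])
```

Calling `build_path_query([0, 3, 19])` yields a count query with two nested joins, analogous in structure to the hand-written three-point query above.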

The queries, of course, look scary; for real use you would need to build a software harness that generates them. However, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query passing through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
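The figures in this status line can be cross-checked with simple arithmetic. The sketch below recomputes throughput from the rounded values printed above, so small differences from the reported 3.69 million rows/s and 117.98 MB/s are expected, since the client presumably uses the unrounded elapsed time. Reading "MB" as decimal megabytes is an assumption.

```python
rows = 250.03e3     # "Processed 250.03 thousand rows"
elapsed = 0.068     # "Elapsed: 0.068 sec." (rounded in the output)
megabytes = 8.00    # "8.00 MB", assumed to be decimal megabytes

rows_per_sec = rows / elapsed
mb_per_sec = megabytes / elapsed
bytes_per_row = megabytes * 1e6 / rows

print(f"{rows_per_sec / 1e6:.2f} million rows/s")  # 3.68
print(f"{mb_per_sec:.2f} MB/s")                    # 117.65
print(f"{bytes_per_row:.1f} bytes per row")        # 32.0
```

About 32 bytes per row is consistent with the subqueries reading all four columns of the table via SELECT *, each of which is 8 bytes wide.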

A note on IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four data-loading threads), and IOWAIT was quite high. At the same time, ClickHouse put only minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further using materialized views and parallelization, by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

The performance is so good that we probably will not even have to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured in the traditional Java way, which will make people unfamiliar with Java cry tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I managed to accidentally "bring down" the BerkeleyDB version of JanusGraph.

The documentation is rather patchy when it comes to indexes, since managing indexes requires some rather strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above compares apples and oranges. If you think about it, a graph DBMS performs different operations to obtain the same results. However, as part of the tests, I also ran an experiment with a query like this:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

which reflects walking-distance reachability. However, even on such data, the graph DBMS showed results that took longer than a few seconds... This, of course, is because there were paths like 0 -> X -> Y ... -> 1, which the graph engine checked as well.
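The effect of those longer paths can be shown on a toy example. In the sketch below, each track is a list of zone ids; the tracks, the `walk_until` helper, and the counters are illustrative assumptions, not measurements from JanusGraph.

```python
# Hypothetical tracks, each a sequence of zone ids starting in zone 0.
tracks = [
    [0, 1, 19],          # direct hop 0 -> 1
    [0, 7, 3, 1, 19],    # indirect path 0 -> 7 -> 3 -> 1
    [0, 5, 9, 12, 19],   # never reaches zone 1
]
target = 1


def walk_until(track, target):
    """Walk forward from the first step; return (found, vertices_visited)."""
    visited = 0
    for zone in track[1:]:
        visited += 1
        if zone == target:
            return True, visited
    return False, visited


results = [walk_until(track, target) for track in tracks]
matches = sum(found for found, _ in results)
visited_total = sum(visited for _, visited in results)
direct_hops = sum(1 for track in tracks if len(track) > 1 and track[1] == target)
```

Here the open-ended walk visits eight vertices to find two matches, while only one track contains a direct 0 -> 1 hop. The walk also pays the full cost of the track that never reaches the target.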

Even for a query like this:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I could not get a usable response with a processing time under a second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which the ClickHouse example achieves with much greater efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it looks well suited to modeling in their paradigm.

Source: www.habr.com
