An experimental test of the JanusGraph graph DBMS for finding suitable paths


Hi everyone. We are developing a product for offline traffic analytics. The project includes a task involving statistical analysis of visitor routes across zones.

As part of this task, users can send the system queries of the following kind:

  • how many visitors went from zone "A" to zone "B";
  • how many visitors went from zone "A" to zone "B" via zone "C" and then via zone "D";
  • how long it took a visitor of a certain type to get from zone "A" to zone "B".

and a number of similar analytical queries.
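The following minimal in-memory sketch in Python makes the meaning of these queries concrete. It assumes each visitor's track is a time-ordered list of (zone, timestamp) pairs. The data and the function names `passes_through`, `count_visitors` and `travel_times` are hypothetical. "Via C and then via D" is read as "visits these zones in this order, possibly with other zones in between".

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data model: person id -> time-ordered list of (zone, timestamp) steps.
Track = List[Tuple[int, float]]


def passes_through(track: Track, zones: Sequence[int]) -> bool:
    """True if the track visits `zones` in this order (other zones may appear in between)."""
    i = 0
    for zone, _ in track:
        if i < len(zones) and zone == zones[i]:
            i += 1
    return i == len(zones)


def count_visitors(tracks: Dict[int, Track], zones: Sequence[int]) -> int:
    """How many visitors went through `zones` in order, e.g. [A, B] or [A, C, D, B]."""
    return sum(passes_through(t, zones) for t in tracks.values())


def travel_times(tracks: Dict[int, Track], a: int, b: int) -> List[float]:
    """Per visitor: time from the first visit to zone `a` to the next later visit to zone `b`."""
    result = []
    for track in tracks.values():
        start = None
        for zone, ts in track:
            if start is None and zone == a:
                start = ts
            elif start is not None and zone == b:
                result.append(ts - start)
                break
    return result


tracks = {
    1: [(0, 0.0), (3, 5.0), (7, 9.0), (19, 12.0)],
    2: [(0, 1.0), (5, 4.0), (19, 6.0)],
    3: [(2, 0.0), (19, 3.0)],
}
print(count_visitors(tracks, [0, 19]))        # 2
print(count_visitors(tracks, [0, 3, 7, 19]))  # 1
print(travel_times(tracks, 0, 19))            # [12.0, 5.0]
```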

A visitor's movement across zones forms a directed graph. After some reading online, I found that graph DBMSs are also used for analytical reporting. I wanted to see how graph DBMSs cope with queries like these (TL;DR: poorly).

I chose JanusGraph as an outstanding representative of open-source graph DBMSs. It relies on a stack of mature technologies which, in my opinion, should give it decent performance:

  • storage backends: BerkeleyDB, Apache Cassandra, Scylla;
  • complex indexes can be stored in Lucene, Elasticsearch or Solr.

The JanusGraph authors state that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and we use these products often in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but that is probably due to transaction requirements. In any case, for scalable production use, the recommended backend is Cassandra or Scylla.

I did not consider Neo4j, because clustering requires the commercial edition; that part of the product is not open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew a schema, strictly following the canons of graph DBMSs:

[Figure: initial graph schema]

There is a Zone entity representing a zone. If a ZoneStep belongs to this Zone, it points to it. Ignore the Area, ZoneTrack and Person entities: they belong to the domain and were not part of the test. In total, a chain-search query for this graph structure looks like this:

from gremlin_python.process.graph_traversal import __
(g.V().hasLabel('Zone').has('id', 0).in_()
    .repeat(__.out()).until(__.out().hasLabel('Zone').has('id', 19))
    .count().next())

In plain words: find the Zone with id=0 and take all vertices with an edge into it (the ZoneSteps). Then walk forward, never going back, until you reach ZoneSteps with an edge into the Zone with id=19. Finally, count the number of such chains.
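For readers new to Gremlin, the traversal can be mimicked in plain Python over an in-memory copy of the first schema. This only sketches what the query computes. It assumes ZoneSteps are linked by 'goes' edges and point to their Zone through 'belongs' edges, as in the loading script below. The dictionaries and helper names are hypothetical.

Because `until()` comes after `repeat()`, the check runs after each step (do/while semantics). Traversers that `out()` moves onto a Zone vertex die, since Zones have no outgoing edges, so only 'goes' edges are followed here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-memory copy of the first schema:
#   belongs[step] -> zone id the ZoneStep points to ('belongs' edge)
#   goes[step]    -> ZoneSteps reachable through one 'goes' edge
belongs: Dict[int, int] = {}
goes: Dict[int, List[int]] = defaultdict(list)


def add_track(first_id: int, zones: List[int]) -> int:
    """Create one chain of ZoneSteps visiting `zones`; return the next free step id."""
    for offset, zone in enumerate(zones):
        step = first_id + offset
        belongs[step] = zone
        if offset > 0:
            goes[step - 1].append(step)  # 'goes' edge from the previous step
    return first_id + len(zones)


def count_chains(start_zone: int, end_zone: int) -> int:
    """Mirror of Zone(start).in_().repeat(out()).until(out() is Zone(end)).count()."""
    # in_() from the start Zone: every ZoneStep that belongs to it
    frontier = [s for s, z in belongs.items() if z == start_zone]
    found = 0
    while frontier:
        nxt = []
        for step in frontier:
            for child in goes[step]:            # one repeat(__.out()) iteration
                if belongs[child] == end_zone:  # until(...) satisfied: emit
                    found += 1
                else:
                    nxt.append(child)           # keep walking
        frontier = nxt
    return found


next_id = add_track(0, [0, 4, 19])
next_id = add_track(next_id, [0, 7, 0, 19])
next_id = add_track(next_id, [0, 5])
print(count_chains(0, 19))  # 3: the second track is counted once per step in zone 0
```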

I don't claim to know all the intricacies of graph search, but this query was built following this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks, each 3 to 20 points long, into JanusGraph with the BerkeleyDB backend. The indexes were created following the manual.

Python loading script:


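from random import random
from time import time

# init is a local helper module (not shown) that connects to JanusGraph
# and exposes the traversal source g and the graph object
from init import g, graph

if __name__ == '__main__':

    max_zones = 19
    # Create the Zone vertices 0..19 once and cache them for linking
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    # Each run generates 10,000 tracks
    for i in range(0, 10000):

        if not i % 100:
            print(i)

        # A track always starts with a ZoneStep that belongs to Zone 0
        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                # With probability 0.3 the step belongs to Zone 19 and the track ends
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                # Otherwise the step belongs to a random zone in 0..18
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    # Total number of vertices in the graph
    count = g.V().count().next()
    print(count)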
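Here is a rough check on the shape of the generated data. In the script above, each new ZoneStep ends the track with probability 0.3, so the number of steps after the start step is geometrically distributed. The arithmetic below is an analysis sketch, not part of the original code.

Under that model, the mean track length is 1 + 1/0.3 ≈ 4.33 ZoneSteps. Tracks longer than 20 steps occur with probability 0.7^19, roughly 0.1%.

```python
# Track-length model for the JanusGraph loading script (analysis sketch only):
# after the start step, every new ZoneStep ends the track with probability P_END.
P_END = 0.3


def prob_steps_after_start(k: int, p: float = P_END) -> float:
    """P(exactly k ZoneSteps follow the start step), for k >= 1 (geometric distribution)."""
    return (1 - p) ** (k - 1) * p


def expected_track_length(p: float = P_END) -> float:
    """Mean number of ZoneSteps per track including the start step: 1 + 1/p."""
    return 1 + 1 / p


def prob_longer_than(n: int, p: float = P_END) -> float:
    """P(a track has more than n ZoneSteps) = P(more than n - 1 steps after the start)."""
    return (1 - p) ** (n - 1)


print(round(expected_track_length(), 2))  # 4.33
print(prob_longer_than(20))
```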

We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this setup, both the data and the indexes used for exact-match lookups are stored in BerkeleyDB. Running the query shown earlier took several tens of seconds.

By running 4 copies of the script above in parallel, I managed to turn the DBMS into a pumpkin. The Docker logs filled with a cheerful stream of Java stack traces (and we all love reading Java stack traces).

After some thought, I decided to simplify the graph schema to the following:

[Figure: simplified graph schema]

I reasoned that searching by entity attributes would be faster than traversing edges. As a result, my query turned into the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain words: find the ZoneStep with id=0, walk forward without going back until you reach a ZoneStep with id=19, and count the number of such chains.

I also simplified the loading script above so that it does not create unnecessary edges, relying on attributes instead.

The query still took several seconds to complete. That was completely unacceptable for our task, because it rules out arbitrary ad hoc queries.

I tried deploying JanusGraph on Scylla, the fastest Cassandra implementation, but that did not change performance significantly either.

So even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully admit that there may be something I don't know, and that JanusGraph can be made to run this search in a fraction of a second; however, I did not manage to do it.

Since the problem still needed solving, I started thinking about JOINs and table pivots. They did not promise much elegance, but could be a perfectly workable option in practice.

Our project already uses Apache ClickHouse, so I decided to run my experiment on this analytical DBMS.

I deployed ClickHouse using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -p 9000:9000 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE DATABASE IF NOT EXISTS db;

CREATE TABLE db.steps
(
    `area` Int64,
    `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(),
    `zone` Int64,
    `person` Int64
)
ENGINE = MergeTree()
ORDER BY (area, zone, person)
SETTINGS index_granularity = 8192;
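The filters in the queries below are cheap for a structural reason. A MergeTree table keeps its data sorted by the ORDER BY key (area, zone, person) and has a sparse primary index with one mark per index_granularity rows. A condition like area = 0 AND zone = 19 therefore maps to a contiguous key range.

The Python below is only an analogy of that range lookup, using binary search over sorted tuples with `bisect`. It is not a model of ClickHouse's storage format.

```python
import bisect
from typing import List, Tuple

Row = Tuple[int, int, int]  # (area, zone, person), mirroring ORDER BY (area, zone, person)


def zone_range(rows: List[Row], area: int, zone: int) -> List[Row]:
    """Rows with the given (area, zone) key prefix, located by binary search on sorted rows."""
    lo = bisect.bisect_left(rows, (area, zone))      # first row with this prefix
    hi = bisect.bisect_left(rows, (area, zone + 1))  # first row past this prefix
    return rows[lo:hi]


rows = sorted([(0, 19, 7), (0, 0, 7), (0, 3, 7), (0, 0, 8), (0, 19, 8), (1, 0, 9)])
print(zone_range(rows, 0, 0))   # [(0, 0, 7), (0, 0, 8)]
print(zone_range(rows, 0, 19))  # [(0, 19, 7), (0, 19, 8)]
```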

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

zones = 20  # zones 0..19: 0 is the start, 19 is the end, 1..18 are intermediate

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    # Every visitor r starts in zone 0
    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    # Add intermediate steps until the 30% stop condition fires
    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (zones - 2)) + 1,
                'person': r
            })

    # Every track ends in zone 19
    data.append({
            'area': 0,
            'zone': zones - 1,
            'person': r
        })

    # One INSERT per visitor; `when` is filled by DEFAULT now64()
    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since the inserts are batched, loading is much faster than with JanusGraph.
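The difference comes from batching. The JanusGraph script above makes a separate remote call for every vertex and edge, while the ClickHouse script sends one INSERT per visitor. A natural next step is to accumulate rows from many visitors and send them in larger batches.

Here is a minimal sketch. It assumes a hypothetical `send` callable that performs one INSERT per batch.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def insert_in_batches(rows: Iterable[T], send: Callable[[List[T]], None],
                      batch_size: int = 10_000) -> int:
    """Group rows into lists of at most batch_size and call send() once per list.

    Returns the number of send() calls, i.e. round trips to the server.
    """
    batch: List[T] = []
    calls = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            send(batch)
            calls += 1
            batch = []  # start a fresh list so the sent batch is not mutated
    if batch:  # flush the remainder
        send(batch)
        calls += 1
    return calls


sent: List[List[dict]] = []
rows = ({"area": 0, "zone": z % 20, "person": z // 5} for z in range(25))
print(insert_in_batches(rows, sent.append, batch_size=10))  # 3 round trips: 10 + 10 + 5 rows
```

With the client from the script above, `send` could be `lambda batch: client.execute('INSERT INTO steps (area, zone, person) VALUES', batch)`. The default batch size of 10,000 is an arbitrary illustrative value.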

I wrote two queries using JOIN. To go from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

To go through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

The queries look rather scary, of course; for real use you would need a program that generates them. However, they work, and they work fast: both queries completed in under 0.1 seconds. Here is an example of the execution time of a count(*) over visitors passing through three points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
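As mentioned above, for real use these nested joins need to be generated by code. Below is a minimal Python sketch of such a generator for an arbitrary ordered list of zones. The function names are hypothetical.

It keeps the `steps` table and the ANY INNER JOIN ... USING (person) pattern from the queries above. It therefore also shares their behaviour when a visitor enters the same zone more than once, since ANY picks a single matching row. Treat the generated SQL as a starting point and check it against your ClickHouse version.

```python
from typing import Sequence


def zone_filter(area: int, zone: int) -> str:
    # int() keeps arbitrary text from being injected into the SQL string
    return f"(area = {int(area)}) AND (zone = {int(zone)})"


def path_count_sql(zones: Sequence[int], area: int = 0) -> str:
    """ClickHouse SQL counting visitors who passed through `zones` in this order.

    Each level joins the previous result to the next zone's rows and keeps only
    the latest visit time w<i>, since the next join only compares against it.
    """
    if len(zones) < 2:
        raise ValueError("a path needs at least two zones")
    query = f"SELECT person, `when` AS w0 FROM steps WHERE {zone_filter(area, zones[0])}"
    for i in range(1, len(zones)):
        query = (
            f"SELECT person, s{i}.`when` AS w{i} "
            f"FROM ({query}) AS p{i} "
            f"ANY INNER JOIN (SELECT person, `when` FROM steps "
            f"WHERE {zone_filter(area, zones[i])}) AS s{i} "
            f"USING (person) "
            f"WHERE p{i}.w{i - 1} <= s{i}.`when`"
        )
    return f"SELECT count(*) FROM ({query})"


print(path_count_sql([0, 3, 19]))
```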

A note on IOPS. When loading data, JanusGraph generated a high number of IOPS (1000-1300 with four loader threads), and IOWAIT was quite high. ClickHouse, by contrast, put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this kind of query. We can always optimize the queries further with materialized views. We can also parallelize the work by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

Performance is so good that we probably won't even need to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS failed. I did not find that JanusGraph has a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured the traditional Java way, which will make anyone unfamiliar with Java weep tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I also managed to accidentally crash the BerkeleyDB version of JanusGraph.

The documentation on indexes is rather convoluted, since managing indexes requires some strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console, which, by the way, does not work out of the box. From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above compares apples and oranges: if you think about it, a graph DBMS performs different operations to obtain the same results. However, as part of the tests, I also ran a query like this:

(g.V().hasLabel('ZoneStep').has('id', 0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id', 1))
    .count().next())

This query reflects walking distance. Yet even on such data, the graph DBMS took more than a few seconds. This is, of course, because there are paths like 0 -> X -> Y ... -> 1, which the graph engine also checks.

Even for a query like:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I could not get a usable result with an execution time under one second.

The moral of the story: a beautiful idea and paradigmatic modelling did not deliver the desired result, while ClickHouse delivered it with far greater efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it seems a natural fit for their modelling paradigm.

Source: www.habr.com
