An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths

Hello, everyone. We are developing a product for offline traffic analysis. One of the project's tasks involves statistical analysis of visitor routes across zones.

As part of this task, users can ask the system queries of the following kinds:

  • how many visitors passed from zone "A" to zone "B";
  • how many visitors passed from zone "A" to zone "B" via zone "C" and then through zone "D";
  • how long it took a certain type of visitor to get from zone "A" to zone "B".

and a number of similar analytical queries.
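To make these query types concrete, here is a minimal in-memory sketch in Python. The `Step` record, its field names, and the sample data are assumptions made purely for illustration; they are not taken from the product.

```python
from collections import defaultdict
from typing import NamedTuple


class Step(NamedTuple):
    """Hypothetical record: a person was seen in a zone at a given time."""
    person: int
    zone: str
    when: int  # seconds since an arbitrary start


def tracks_by_person(steps):
    """Group steps per person, ordered by time."""
    tracks = defaultdict(list)
    for step in sorted(steps, key=lambda s: s.when):
        tracks[step.person].append(step)
    return tracks


def count_passed(steps, zones):
    """Count people whose track visits `zones` in order (not necessarily adjacently)."""
    count = 0
    for track in tracks_by_person(steps).values():
        remaining = iter(track)
        # any() consumes the shared iterator, so each zone must occur after the previous one.
        if all(any(s.zone == z for s in remaining) for z in zones):
            count += 1
    return count


def travel_times(steps, src, dst):
    """Per person: seconds from the first visit to `src` to the first later visit to `dst`."""
    times = {}
    for person, track in tracks_by_person(steps).items():
        start = next((s.when for s in track if s.zone == src), None)
        if start is None:
            continue
        end = next((s.when for s in track if s.zone == dst and s.when >= start), None)
        if end is not None:
            times[person] = end - start
    return times


sample = [
    Step(1, "A", 0), Step(1, "C", 10), Step(1, "B", 25),
    Step(2, "A", 5), Step(2, "B", 9),
    Step(3, "B", 1), Step(3, "A", 7),
]
print(count_passed(sample, ["A", "B"]))
print(count_passed(sample, ["A", "C", "B"]))
print(travel_times(sample, "A", "B"))
```

This only pins down what the queries mean; it says nothing about how a DBMS should execute them at scale.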

Visitor movement across zones is a directed graph. After reading around the Internet, I found that graph DBMSs are also used for analytical reports. I wanted to see how a graph DBMS would cope with such queries (TL;DR: poorly).

I chose the JanusGraph DBMS as a prominent representative of open-source graph DBMSs. It relies on a stack of mature technologies that (in my opinion) should give it decent operational characteristics:

  • storage backends: BerkeleyDB, Apache Cassandra, Scylla;
  • complex indexes can be stored in Lucene, Elasticsearch, Solr.

The authors of JanusGraph write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are often used in our systems, so I was optimistic about testing this graph DBMS. I found it odd to choose BerkeleyDB over RocksDB, but that is probably due to transaction requirements. In any case, for scalable production use, a Cassandra or Scylla backend is recommended.

I did not consider Neo4j because clustering requires the commercial version, that is, the product is not fully open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew a graph schema, made exactly according to the canons of graph DBMSs:

[Figure: graph schema with Zone, ZoneStep, Area, ZoneTrack and Person entities]

There is a Zone entity, which represents a zone. If a ZoneStep belongs to a Zone, it refers to it. Pay no attention to the Area, ZoneTrack and Person entities: they belong to the domain and are not considered part of the test. Altogether, a chain search query for such a graph structure would look like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain language, this reads roughly as follows: find the Zone with ID=0, take all vertices that have an edge going to it (ZoneStep), walk forward without going back until you find those ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
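The traversal described above can be emulated on a tiny in-memory graph to see what the engine has to enumerate. The Python sketch below is an illustration only: the `goes` and `belongs` dictionaries and the `count_chains` helper are hypothetical, it does not use JanusGraph or Gremlin, and it assumes the tracks are acyclic chains (as the loader produces), otherwise the loop would not terminate.

```python
def count_chains(goes, belongs, start_zone, end_zone):
    """Count walks that start at a step in `start_zone` and stop at the
    first later step belonging to `end_zone`.

    goes:    step -> list of next steps ('goes' edges)
    belongs: step -> zone id ('belongs' edges)
    Returns (number of matching chains, number of 'goes' edges followed).
    """
    # Steps attached to the start zone: the in_() part of the query.
    frontier = [step for step, zone in belongs.items() if zone == start_zone]
    found = 0
    edges_followed = 0
    while frontier:
        next_frontier = []
        for step in frontier:
            for nxt in goes.get(step, []):
                edges_followed += 1
                if belongs[nxt] == end_zone:
                    found += 1            # until(...) matched: this walk ends here
                else:
                    next_frontier.append(nxt)
        frontier = next_frontier
    return found, edges_followed


# Three toy tracks: 0 -> 5 -> 19, 0 -> 19, and 0 -> 7 (never reaches zone 19).
goes = {"s1": ["s2"], "s2": ["s3"], "s4": ["s5"], "s6": ["s7"]}
belongs = {"s1": 0, "s2": 5, "s3": 19, "s4": 0, "s5": 19, "s6": 0, "s7": 7}
print(count_chains(goes, belongs, 0, 19))
```

Even in this simplified form, every walk starting in zone 0 has to be followed edge by edge, including the ones that never reach the target zone.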

I don't claim to know all the intricacies of searching on graphs, but this query was written based on this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks of varying length (3 points and longer) into the JanusGraph database using the BerkeleyDB backend, and created indexes according to the guide.

Python loading script:


from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)

We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed using this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this case, the data and the indexes used for exact-match lookups are stored in BerkeleyDB. Running the query given earlier, I got a response time of several tens of seconds.

By running 4 of the scripts above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stacktraces (and we all love reading Java stacktraces) in the Docker logs.

After some thought, I decided to simplify the graph schema to the following:

[Figure: simplified graph schema]

I reasoned that searching by entity attributes would be faster than searching by edges. As a result, my query became the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain language, this reads roughly as follows: find the ZoneStep with ID=0, walk forward without going back until you find a ZoneStep with ID=19, and count the number of such chains.

I also simplified the loading script given above so that it does not create unnecessary edges, limiting myself to attributes.
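The simplified script itself is not shown, so the following is only a sketch of what it might generate: each ZoneStep carries the zone number in its own `id` property, so no `belongs` edges are needed. The `make_track` and `track_edges` helpers are hypothetical and only build plain Python lists; writing them to JanusGraph would still go through `addV` and `addE` calls as in the original loader.

```python
from random import Random

MAX_ZONES = 19  # same zone range as in the original loader


def make_track(rng, stop_probability=0.3):
    """Return the zone ids of one track: it starts in zone 0 and ends in zone MAX_ZONES."""
    track = [0]
    while True:
        if rng.random() < stop_probability:
            track.append(MAX_ZONES)
            return track
        # Intermediate zones are drawn from 0..MAX_ZONES-1, as in the original script.
        track.append(int(rng.random() * MAX_ZONES))


def track_edges(track):
    """Index pairs of consecutive steps; each pair would become one 'goes' edge."""
    return list(zip(range(len(track) - 1), range(1, len(track))))


rng = Random(42)  # fixed seed only to make the example repeatable
track = make_track(rng)
print(track)
print(track_edges(track))
```

With this layout a track of N steps needs N vertices and N-1 edges, instead of N vertices and 2N-1 edges in the first schema.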

The query still took several seconds to complete, which was completely unacceptable for our task, since that duration does not suit ad hoc queries of any kind.

I tried deploying JanusGraph with Scylla, as the fastest Cassandra implementation, but this did not lead to any significant performance change either.

So even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully accept that there may be something I don't know and that JanusGraph can be made to perform this search in a fraction of a second; however, I was unable to do so.

Since the problem still had to be solved, I started thinking about JOINs and pivots of tables, which did not inspire optimism in terms of elegance, but could be a perfectly workable option in practice.

Our project already uses ClickHouse, so I decided to test my idea on this analytical DBMS.

ClickHouse was deployed using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max - 2)) + 1,
                'person': r
            })

    data.append({
            'area': 0,
            'zone': max - 1,
            'person': r
        })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since the inserts arrive in batches, loading was much faster than for JanusGraph.
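The script above sends one INSERT per track. If loading speed mattered more, several tracks could be accumulated and sent in one call. A minimal sketch of such batching follows; the `chunked` helper and the sample rows are hypothetical, and the call to `client.execute` is left as a comment so that the block runs without a server.

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Ten illustrative rows in the same shape as the loader's dictionaries.
rows = ({'area': 0, 'zone': n % 20, 'person': n // 3} for n in range(10))
for batch in chunked(rows, 4):
    # client.execute('INSERT INTO steps (area, zone, person) VALUES', batch)
    print(len(batch))
```

The batch size here is arbitrary; a suitable value would have to be found by measurement.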

I built two queries using JOIN. To get from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

To pass through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

The queries, of course, look rather scary; for real use you need to write a query generator harness. However, they work, and they work fast. Both the first and the second query complete in less than 0.1 seconds. Here is an example of the execution time of a count(*) query passing through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
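The generator harness mentioned above could be sketched roughly as follows. This is not the author's implementation: `build_path_query` is a hypothetical helper that nests one `ANY INNER JOIN` per additional zone in the same way as the hand-written queries, but keeps only `person` and the latest timestamp at each level. It only builds a string and has not been run against a ClickHouse server.

```python
def build_path_query(zones, area=0, table="steps", count_only=False):
    """Build a ClickHouse query for people who visited `zones` in the given order."""
    if len(zones) < 2:
        raise ValueError("need at least two zones")

    def zone_rows(zone):
        # All rows of one zone; int() guards against injecting arbitrary text.
        return (f"SELECT person, `when` FROM {table} "
                f"WHERE (area = {int(area)}) AND (zone = {int(zone)})")

    # Innermost level: the first zone, with its timestamp exposed as w0.
    query = (f"SELECT person, `when` AS w0 FROM {table} "
             f"WHERE (area = {int(area)}) AND (zone = {int(zones[0])})")

    # Each further zone wraps the previous level in one more join.
    for i, zone in enumerate(zones[1:], start=1):
        query = (
            f"SELECT person, s{i}.`when` AS w{i} "
            f"FROM ({query}) AS p{i} "
            f"ANY INNER JOIN ({zone_rows(zone)}) AS s{i} USING (person) "
            f"WHERE p{i}.w{i - 1} <= s{i}.`when`"
        )

    if count_only:
        query = f"SELECT count(*) FROM ({query})"
    return query


print(build_path_query([0, 3, 19], count_only=True))
```

The returned string could then be passed to `client.execute` from `clickhouse_driver`, as in the loading script.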

A note about IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four data-loading threads), and IOWAIT was quite high. At the same time, ClickHouse put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further using materialized views and parallelization, pre-processing the event stream with Apache Flink before loading it into ClickHouse.

The performance is so good that we probably won't even have to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica via an export to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. At the same time, the server is configured in the traditional Java way, which will make people unfamiliar with Java cry tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I managed to accidentally "bring down" the BerkeleyDB version of JanusGraph.

The documentation is rather crooked when it comes to indexes, since managing indexes requires some rather strange shamanism in Groovy. For example, creating an index must be done by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above is a comparison of apples and oranges. If you think about it, a graph DBMS performs different operations to obtain the same results. However, as part of the tests, I also ran an experiment with a query like:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

which reflects walking distance. However, even on such data, the graph DBMS showed response times of more than a few seconds. This, of course, is because there were paths like 0 -> X -> Y ... -> 1, which the graph engine also checked.

Even for a query like:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I was unable to get a useful response with a processing time of less than a second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which the ClickHouse example achieves with much higher efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it seems well suited to modeling in their paradigm.

Source: www.habr.com
