An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths

Hello everyone. We are developing a product for offline traffic analysis. One of the project's tasks is the statistical analysis of visitor routes across zones.

As part of this task, users can ask the system queries of the following kinds:

  • how many visitors went from zone "A" to zone "B";
  • how many visitors went from zone "A" to zone "B" via zone "C" and then via zone "D";
  • how long it took a given type of visitor to get from zone "A" to zone "B".

and a number of similar analytical queries.
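
To make these query types concrete, here is a minimal in-memory sketch in Python. It assumes each visitor's route is already available as a time-ordered list of (zone, timestamp) pairs. The helper names `passes_through` and `travel_time` and the sample data are purely illustrative and are not part of our product.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Track = List[Tuple[str, int]]  # (zone, unix timestamp), ordered by time


def passes_through(track: Track, zones: Sequence[str]) -> bool:
    """True if the track visits `zones` in the given order (gaps allowed)."""
    it = iter(zone for zone, _ in track)
    # `z in it` consumes the iterator up to the first match, which enforces the order.
    return all(z in it for z in zones)


def travel_time(track: Track, start: str, end: str) -> Optional[int]:
    """Seconds between the first visit to `start` and the next visit to `end`."""
    t_start = None
    for zone, ts in track:
        if t_start is None and zone == start:
            t_start = ts
        elif t_start is not None and zone == end:
            return ts - t_start
    return None


# Hypothetical sample data: visitor id -> route.
tracks: Dict[int, Track] = {
    1: [("A", 0), ("C", 10), ("D", 25), ("B", 40)],
    2: [("A", 5), ("B", 20)],
    3: [("B", 0), ("A", 30)],
}

a_to_b = sum(passes_through(t, ["A", "B"]) for t in tracks.values())
a_c_d_b = sum(passes_through(t, ["A", "C", "D", "B"]) for t in tracks.values())
print(a_to_b, a_c_d_b, travel_time(tracks[1], "A", "B"))
```

This only pins down what the queries mean on toy data; the rest of the article is about answering them at scale inside a DBMS.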

Visitor movement across zones is a directed graph. After reading around on the Internet, I found that graph DBMSs are also used for analytical reports. I wanted to see how graph DBMSs would cope with queries like these (TL;DR: poorly).

I chose JanusGraph as a prominent representative of open-source graph DBMSs. It relies on a stack of mature technologies which (in my opinion) should give it decent operational characteristics:

  • BerkeleyDB, Apache Cassandra, or Scylla as storage backends;
  • complex indexes can be stored in Lucene, Elasticsearch, or Solr.

The JanusGraph authors write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are often used in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but that is probably down to transaction requirements. In any case, for scalable production use, a backend on Cassandra or Scylla is recommended.

I did not consider Neo4j because clustering requires the commercial edition, that is, the product is not open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" - beautiful!

First, I drew a graph, built strictly according to the canons of graph DBMSs:

[Figure: graph schema with the Zone, ZoneStep, Area, ZoneTrack and Person entities]

There is a Zone entity, which represents a zone. If a ZoneStep belongs to this Zone, it refers to it. Ignore the Area, ZoneTrack and Person entities: they belong to the domain and are not considered part of the test. All in all, a chain search query for this graph structure would look like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain language, this reads roughly: find the Zone with ID=0, take all the vertices that have an edge going to it (ZoneStep), walk forward without going back until you find the ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
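
The traversal can be emulated on a tiny in-memory graph to show what gets counted. This is only a sketch of my reading of the `repeat(out()).until(...)` semantics, not of how JanusGraph actually executes it. The vertex names and the `count_chains` helper are made up for illustration.

```python
# Tiny in-memory stand-in for the graph: vertex name -> list of out-neighbours.
out_edges = {}


def add_edge(src, dst):
    out_edges.setdefault(src, []).append(dst)


# Two hypothetical tracks: s1 -> s2 -> s3 and s4 -> s5.
for step, zone in [("s1", "zone0"), ("s2", "zone5"), ("s3", "zone19"),
                   ("s4", "zone0"), ("s5", "zone19")]:
    add_edge(step, zone)   # 'belongs' edge: ZoneStep -> Zone
add_edge("s1", "s2")       # 'goes' edges: ZoneStep -> next ZoneStep
add_edge("s2", "s3")
add_edge("s4", "s5")


def count_chains(start_zone, end_zone):
    """Count traversers that reach a vertex with an edge to `end_zone`.

    Assumes the graph is acyclic, as the generated tracks are.
    """
    # in_(): every vertex that has an edge pointing at the start zone.
    frontier = [v for v, outs in out_edges.items() if start_zone in outs]
    found = 0
    while frontier:
        next_frontier = []
        for v in frontier:
            for w in out_edges.get(v, []):            # repeat(out())
                if end_zone in out_edges.get(w, []):  # until(out() is the end zone)
                    found += 1
                else:
                    next_frontier.append(w)           # keep walking
        frontier = next_frontier
    return found


print(count_chains("zone0", "zone19"))
```

Note that `out()` without an edge label also follows the 'belongs' edges into Zone vertices. Those traversers die out at once, but they are still extra work.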

I do not claim to know all the intricacies of searching on graphs, but this query was put together based on this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks, from 3 to 20 points long, into a JanusGraph graph database using the BerkeleyDB backend, and created indexes according to the guide.

Python loading script:

from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    # Create the Zone vertices (ids 0..max_zones) and cache them for edge creation.
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        # Progress report every 100 tracks.
        if not i % 100:
            print(i)

        # Each track starts with a ZoneStep that belongs to the first zone.
        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                # With probability 0.3 the track ends in the last zone.
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                # Otherwise attach the step to a random zone in 0..max_zones - 1.
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            # The new step becomes the tail of the chain.
            start = pt

    count = g.V().count().next()
    print(count)

We used a VM with 4 cores and 16 GB RAM on an SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this case, the data and the indexes used for exact-match lookups are stored in BerkeleyDB. When I ran the query given earlier, it took several tens of seconds.

By running 4 of the scripts above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stacktraces (and we all love reading Java stacktraces) in the Docker logs.

After some thought, I decided to simplify the graph design to the following:

[Figure: simplified graph schema]

I decided that searching by entity attributes would be faster than searching by edges. As a result, my query turned into the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain language, this reads roughly: find the ZoneStep with ID=0, walk forward without going back until you find a ZoneStep with ID=19, and count the number of such chains.

I also simplified the loading script given above so that it would not create unnecessary edges, limiting myself to attributes.

The query still took several seconds to complete, which was completely unacceptable for our purposes, since it does not suit ad hoc queries of any kind.

I tried running JanusGraph with Scylla as the fastest Cassandra implementation, but this did not lead to any significant change in performance either.

So even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully accept that there is something I do not know and that JanusGraph can perform this search in a fraction of a second. However, I was unable to make it do so.

Since the problem still had to be solved, I started thinking about JOINs and pivot tables, which did not inspire optimism in terms of elegance but could be a perfectly workable option in practice.

Our project already uses ClickHouse, so I decided to test my ideas on this analytical DBMS.

ClickHouse was deployed using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I filled it with data using the following script:

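from random import random
from time import time

from clickhouse_driver import Client

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

# Zones are numbered 0..max_zones - 1.
max_zones = 20

for r in range(0, 100000):

    # Progress report every 1000 visitors.
    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    # Every track starts in zone 0.
    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    # Add intermediate zones (1..max_zones - 2); stop with probability 0.3 at each step.
    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max_zones - 2)) + 1,
                'person': r
            })

    # Every track ends in the last zone.
    data.append({
            'area': 0,
            'zone': max_zones - 1,
            'person': r
        })

    # One batched INSERT per visitor; `when` is filled in by the column DEFAULT.
    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )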

Since the inserts go in batches, filling the table was much faster than for JanusGraph.
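
The script above sends one INSERT per visitor. A possible refinement, sketched below with only the standard library, collects rows from many visitors and yields them in larger batches. `make_track` and `batched_rows` are hypothetical helper names, the batch size is an arbitrary assumption, and the actual `client.execute` call is left as a comment, so nothing here was measured.

```python
import random
from typing import Dict, Iterator, List

Row = Dict[str, int]


def make_track(person: int, max_zones: int, rng: random.Random) -> List[Row]:
    """Rows for one visitor: zone 0, some random middle zones, then the last zone."""
    rows = [{"area": 0, "zone": 0, "person": person}]
    # Keep adding middle zones (1..max_zones - 2) with probability 0.7 per step.
    while rng.random() >= 0.3:
        rows.append({"area": 0, "zone": rng.randrange(1, max_zones - 1), "person": person})
    rows.append({"area": 0, "zone": max_zones - 1, "person": person})
    return rows


def batched_rows(people: int, batch_size: int,
                 max_zones: int = 20, seed: int = 0) -> Iterator[List[Row]]:
    """Yield lists of roughly `batch_size` rows; a track is never split across batches."""
    rng = random.Random(seed)
    batch: List[Row] = []
    for person in range(people):
        batch.extend(make_track(person, max_zones, rng))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


total = 0
for rows in batched_rows(people=1000, batch_size=5000):
    # With clickhouse_driver this would be something like:
    # client.execute('INSERT INTO steps (area, zone, person) VALUES', rows)
    total += len(rows)
print(total)
```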

I built two queries using JOIN. To go from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

To pass through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

The queries, of course, look quite scary; for real use you would need to build a software harness that generates them. However, they work, and they work fast. Both the first and the second query complete within 0.1 seconds. Here is an example of the execution time for a count(*) query passing through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
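
The two- and three-point queries above follow a regular pattern, so they lend themselves to generation. Below is a minimal sketch of such a generator in Python. The function names and the `last_when` alias are my own assumptions. The generated SQL mirrors the hand-written queries, but I have not run this particular output against a ClickHouse server.

```python
from typing import Sequence


def zone_filter(area: int, zone: int) -> str:
    """Subquery selecting all steps recorded in one zone of one area."""
    return f"(SELECT * FROM steps WHERE (area = {int(area)}) AND (zone = {int(zone)}))"


def path_count_query(zones: Sequence[int], area: int = 0) -> str:
    """Build a count(*) query for visitors passing through `zones` in order."""
    if len(zones) < 2:
        raise ValueError("need at least two zones")
    # First level: join the first two zones on person, keep the later timestamp.
    query = (
        "SELECT s1.person AS person, s2.when AS last_when "
        f"FROM {zone_filter(area, zones[0])} AS s1 "
        f"ANY INNER JOIN {zone_filter(area, zones[1])} AS s2 USING (person) "
        "WHERE s1.when <= s2.when"
    )
    # Each further zone wraps the previous query and joins one more subquery.
    for i, zone in enumerate(zones[2:], start=3):
        query = (
            f"SELECT p.person AS person, s{i}.when AS last_when "
            f"FROM ({query}) AS p "
            f"ANY INNER JOIN {zone_filter(area, zone)} AS s{i} USING (person) "
            f"WHERE p.last_when <= s{i}.when"
        )
    return f"SELECT count(*) FROM ({query})"


print(path_count_query([0, 3, 19]))
```

Casting the zone and area values with `int()` keeps arbitrary strings out of the generated SQL. A production harness would more likely use the driver's parameter substitution.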

A note on IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four data-loading threads), and IOWAIT was quite high. At the same time, ClickHouse put a minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further with materialized views and parallelization, by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

Performance is so good that we probably will not even have to think about pivoting tables programmatically. Previously we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS did not succeed. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured in the traditional Java way, which will make people unfamiliar with Java cry tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I managed to "bring down" the BerkeleyDB version of JanusGraph.

The documentation is rather convoluted when it comes to indexes, because managing indexes requires some fairly strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above compares apples and oranges. If you think about it, a graph DBMS performs different operations to arrive at the same results. However, as part of the tests, I also ran an experiment with a query like this:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

which reflects walking distance. However, even on this data the graph DBMS took more than a few seconds... This, of course, is because there were also paths of the form 0 -> X -> Y ... -> 1, which the graph engine checked as well.
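
A rough way to see the extra work is to emulate the walk on plain Python lists. In this sketch each track is a list of zone ids, and `walk_until` mimics my reading of `repeat(out()).until(id == target)` started at every vertex whose id equals `start`. The sample tracks and both helper names are hypothetical, and a real engine's cost model is far more involved.

```python
def walk_until(tracks, start, target):
    """Return (matches, hops) for a walk started at every step whose zone is `start`."""
    matches = hops = 0
    for track in tracks:
        for i, zone in enumerate(track):
            if zone != start:
                continue
            # Follow the chain forward until the target zone or the end of the track.
            for nxt in track[i + 1:]:
                hops += 1
                if nxt == target:
                    matches += 1
                    break
    return matches, hops


def direct_only(tracks, start, target):
    """Count only direct start -> target transitions (a single out() hop)."""
    return sum(1 for t in tracks for a, b in zip(t, t[1:]) if a == start and b == target)


tracks = [[0, 1, 19], [0, 5, 7, 1, 19], [0, 4, 6, 19]]
print(walk_until(tracks, 0, 1), direct_only(tracks, 0, 1))
```

Even on three toy tracks, the repeat/until walk makes several times more hops than there are direct 0 -> 1 transitions. Tracks that never reach zone 1 are walked to the end.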

Even for a query like this:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I could not get a usable response with a processing time under a second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which the ClickHouse example achieves far more efficiently. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it looks well suited to modeling in their paradigm.

Source: will.com
