An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths


Hello everyone. We are building a product for offline traffic analysis. One of the project's tasks involves statistical analysis of visitor routes across areas.

As part of this task, users can ask the system queries of the following kind:

  • how many visitors went from area "A" to area "B";
  • how many visitors went from area "A" to area "B" via area "C" and then via area "D";
  • how long it took a certain type of visitor to get from area "A" to area "B";

and a number of similar analytical queries.
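To make these questions concrete, here is a minimal sketch in plain Python over a handful of in-memory tracks. The data layout (a track as an ordered list of `(zone, timestamp)` pairs) and the function names are assumptions made for illustration only; they are not part of the product described here.

```python
from typing import List, Optional, Tuple

Track = List[Tuple[str, int]]  # ordered (zone, unix_time) visits of one visitor


def visits_in_order(track: Track, zones: List[str]) -> bool:
    """True if the track passes through `zones` in the given order."""
    pos = 0
    for zone, _ in track:
        if pos < len(zones) and zone == zones[pos]:
            pos += 1
    return pos == len(zones)


def count_visitors(tracks: List[Track], zones: List[str]) -> int:
    """Number of tracks that pass through all `zones` in order."""
    return sum(visits_in_order(t, zones) for t in tracks)


def travel_time(track: Track, src: str, dst: str) -> Optional[int]:
    """Seconds from the first visit to `src` to the next visit to `dst`."""
    start = None
    for zone, ts in track:
        if start is None and zone == src:
            start = ts
        elif start is not None and zone == dst:
            return ts - start
    return None


tracks = [
    [("A", 0), ("C", 10), ("D", 25), ("B", 40)],
    [("A", 5), ("B", 12)],
    [("B", 1), ("A", 9)],
]
print(count_visitors(tracks, ["A", "B"]))            # 2
print(count_visitors(tracks, ["A", "C", "D", "B"]))  # 1
print(travel_time(tracks[0], "A", "B"))              # 40
```

This brute-force scan only pins down what the queries mean; the rest of the article is about getting a DBMS to answer them quickly.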

A visitor's movement across areas is a directed graph. After reading around the Internet, I found that graph DBMSs are also used for analytical reports. I wanted to see how graph DBMSs would cope with such queries (TL;DR: poorly).

I chose JanusGraph as a prominent representative of open-source graph DBMSs. It relies on a stack of mature technologies which (in my opinion) should give it decent operational characteristics:

  • storage backends: BerkeleyDB, Apache Cassandra, Scylla;
  • complex indexes can be stored in Lucene, Elasticsearch, Solr.

The authors of JanusGraph write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are used often in our systems, so I decided to try this graph DBMS. Choosing BerkeleyDB over RocksDB seemed strange to me, but it is probably due to transaction requirements. In any case, for scalable production use, a Cassandra or Scylla backend is suggested.

I did not consider Neo4j because clustering requires the commercial edition, that is, the product is not fully open.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew a graph schema, made strictly according to the canons of graph DBMSs:

[Figure: graph schema with the Zone, ZoneStep, Area, ZoneTrack and Person entities]

There is a Zone entity, which represents an area. If a ZoneStep belongs to a Zone, it points to it. Ignore the Area, ZoneTrack and Person entities: they belong to the domain and are not part of the test. In total, a chain search query for this graph structure would look like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain language, this reads roughly as follows: find the Zone with ID=0, take all vertices that have an edge pointing to it (ZoneStep), walk forward without turning back until you find the ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
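The traversal just described can be emulated on a toy in-memory graph. This is a minimal sketch in plain Python, not Gremlin: the vertex names (`Z0`, `S1`, ...), the `add_edge` helper, and the assumption that the edges never form a cycle are all illustrative.

```python
from collections import defaultdict

out_edges = defaultdict(list)  # vertex -> vertices it points to
in_edges = defaultdict(list)   # vertex -> vertices pointing to it


def add_edge(src: str, dst: str) -> None:
    out_edges[src].append(dst)
    in_edges[dst].append(src)


def add_track(steps):
    """steps: list of (step_vertex, zone_vertex) in visiting order."""
    for step, zone in steps:
        add_edge(step, zone)           # 'belongs' edge: ZoneStep -> Zone
    for (a, _), (b, _) in zip(steps, steps[1:]):
        add_edge(a, b)                 # 'goes' edge: ZoneStep -> next ZoneStep


def count_chains(start_zone: str, end_zone: str) -> int:
    frontier = list(in_edges[start_zone])          # .in_()
    found = 0
    while frontier:
        next_frontier = []
        for vertex in frontier:
            for nxt in out_edges[vertex]:          # repeat(out())
                if end_zone in out_edges[nxt]:     # until(out() is the end zone)
                    found += 1
                else:
                    next_frontier.append(nxt)
        frontier = next_frontier
    return found


add_track([("S1", "Z0"), ("S2", "Z5"), ("S3", "Z19")])
add_track([("S4", "Z0"), ("S5", "Z19")])
add_track([("S6", "Z0"), ("S7", "Z5")])   # never reaches Z19
print(count_chains("Z0", "Z19"))          # 2
```

Note that, as in the Gremlin query, the walk follows every outgoing edge, so it also steps onto Zone vertices, where it immediately dies out.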

I do not claim to know all the intricacies of searching in graphs, but this query was built on the basis of this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks, from 3 to 20 points long, into a JanusGraph graph database with the BerkeleyDB backend, and created the indexes according to the guide.

Python loading script:


from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)
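To inspect the shape of the generated tracks without a running JanusGraph instance, the random logic of the loading script can be reproduced in memory. This is a hedged sketch: `generate_track` and the fixed seed are assumptions introduced here, and the function only mirrors the stop probability (0.3) and the zone selection of the script above.

```python
import random
from typing import List


def generate_track(rng: random.Random, max_zones: int = 19,
                   stop_prob: float = 0.3) -> List[int]:
    """Zone ids of one track: starts in zone 0, ends in zone `max_zones`."""
    track = [0]
    while True:
        if rng.random() < stop_prob:
            track.append(max_zones)   # the chain ends in the final zone
            return track
        # intermediate zone in the range 0 .. max_zones - 1
        track.append(int(rng.random() * max_zones))


rng = random.Random(42)  # fixed seed so that runs are repeatable
tracks = [generate_track(rng) for _ in range(1000)]
lengths = [len(t) for t in tracks]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

With a stop probability of 0.3 the number of intermediate points follows a geometric distribution, so the expected track length under these parameters is about 2 + 0.7 / 0.3 ≈ 4.3 points.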

We used a VM with 4 cores and 16 GB RAM on an SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this case, the data and the indexes used for exact-match searches are stored in BerkeleyDB. After running the query given earlier, I got a time of several tens of seconds.

By running 4 of the above scripts in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stacktraces (and we all love reading Java stacktraces) in the Docker logs.

After some thought, I decided to simplify the graph schema to the following:

[Figure: simplified graph schema]

I assumed that searching by entity attributes would be faster than searching by edges. As a result, my query turned into the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain language, this reads roughly as follows: find the ZoneStep with ID=0, walk forward without turning back until you find a ZoneStep with ID=19, and count the number of such chains.
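The simplified query can be emulated in the same way. The sketch below is plain Python under stated assumptions: vertices are integers, `zone_of` plays the role of the `id` property on ZoneStep, and the recursive `walk` helper stands in for `repeat(out().simplePath()).until(...)`. None of these names come from JanusGraph.

```python
from typing import Dict, FrozenSet, List


def count_simple_paths(out_edges: Dict[int, List[int]],
                       zone_of: Dict[int, int],
                       start_zone: int, end_zone: int) -> int:
    """Count walks from any start_zone step to the first end_zone step reached."""

    def walk(vertex: int, path: FrozenSet[int]) -> int:
        total = 0
        for nxt in out_edges.get(vertex, ()):
            if nxt in path:                   # simplePath(): never revisit a vertex
                continue
            if zone_of[nxt] == end_zone:      # until(...): stop at the target
                total += 1
            else:
                total += walk(nxt, path | {nxt})
        return total

    return sum(walk(v, frozenset([v]))
               for v, zone in zone_of.items() if zone == start_zone)


# Three toy tracks; the extra edge 2 -> 1 forms a cycle that simplePath must ignore.
out_edges = {1: [2], 2: [3, 1], 4: [5], 6: [7]}
zone_of = {1: 0, 2: 7, 3: 19, 4: 0, 5: 19, 6: 0, 7: 3}
print(count_simple_paths(out_edges, zone_of, 0, 19))  # 2
```

Even in this toy form, every branch leaving a start vertex has to be walked to its end before it can be counted or discarded.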

I also simplified the loading script given above so that it does not create unnecessary edges, limiting myself to attributes.

The query still took several seconds to complete, which was completely unacceptable for our task, since that is not at all suitable for ad hoc queries of arbitrary shape.

I tried running JanusGraph with Scylla, as the fastest Cassandra implementation, but this did not lead to any significant performance change either.

So, even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully accept that there is something I do not know and that JanusGraph can be made to perform this search in a fraction of a second, but I was not able to do it.

Since the problem still had to be solved, I started thinking about JOINs and pivots of tables. This did not inspire optimism in terms of elegance, but it could well be a perfectly workable option in practice.

Our project already uses ClickHouse, so I decided to test my idea on this analytical DBMS.

ClickHouse was deployed using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max - 2)) + 1,
                'person': r
            })

    data.append({
            'area': 0,
            'zone': max - 1,
            'person': r
        })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since the inserts arrive in batches, filling the table was much faster than for JanusGraph.
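The script above sends one INSERT per visitor. ClickHouse generally favors fewer, larger inserts, so rows could be accumulated across visitors before each call. The sketch below is an assumption-laden illustration: the `batched_rows` helper and the batch sizes are hypothetical, and the commented-out call reuses `client.execute` exactly as it appears in the script above.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, int]


def batched_rows(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `batch_size` rows, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                 # flush the final, possibly shorter, batch
        yield batch


rows = [{"area": 0, "zone": z, "person": p} for p in range(3) for z in (0, 19)]
batches = list(batched_rows(rows, batch_size=4))
print([len(b) for b in batches])  # [4, 2]

# With a live server, each batch could be sent as in the script above:
# for batch in batched_rows(rows, 10_000):
#     client.execute('INSERT INTO steps (area, zone, person) VALUES', batch)
```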

I built two queries using JOIN. To move from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when
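What this JOIN computes can be spelled out as a small hash join in plain Python. This is only a sketch of the logic, under the simplifying assumption that each person has at most one row per zone (so the choice of matching row made by ANY does not matter); the `a_to_b` function and the tuple layout are illustrative names, not ClickHouse internals.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, int, float]  # (person, zone, when)


def a_to_b(steps: List[Step], zone_a: int,
           zone_b: int) -> List[Tuple[int, float, float]]:
    """Return (person, when_a, when_b) for people seen in zone_a no later than in zone_b."""
    # Build side: one row per person for zone_b (the right-hand subquery s2).
    seen_b: Dict[int, float] = {}
    for person, zone, when in steps:
        if zone == zone_b:
            seen_b.setdefault(person, when)

    # Probe side: rows for zone_a (subquery s1), joined USING person,
    # then filtered by WHERE s1.when <= s2.when.
    result = []
    for person, zone, when in steps:
        if zone == zone_a and person in seen_b and when <= seen_b[person]:
            result.append((person, when, seen_b[person]))
    return result


steps = [
    (1, 0, 10.0), (1, 19, 20.0),  # person 1: zone 0, then zone 19
    (2, 19, 5.0), (2, 0, 8.0),    # person 2: zone 19 before zone 0
    (3, 0, 1.0),                  # person 3: never reaches zone 19
]
print(a_to_b(steps, 0, 19))  # [(1, 10.0, 20.0)]
```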

To pass through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

The queries, of course, look rather scary; for real use you would need to build a query generator harness. However, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query passing through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
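The generator harness mentioned above might start out as a function that nests one JOIN per additional zone. The following is a minimal sketch and has not been run against a live ClickHouse server: the function names and the `w0`, `w1`, ... aliases are assumptions introduced here, while the table and column names follow the `steps` table defined earlier.

```python
from typing import Sequence


def zone_filter(area: int, zone: int, when_alias: str) -> str:
    """Subquery text selecting one zone's rows, exposing `when` under a unique alias."""
    return (
        f"SELECT person, `when` AS {when_alias} FROM steps "
        f"WHERE (area = {int(area)}) AND (zone = {int(zone)})"
    )


def build_path_count_query(zones: Sequence[int], area: int = 0) -> str:
    """Build a count(*) query for visitors passing through `zones` in order."""
    if len(zones) < 2:
        raise ValueError("need at least two zones")
    query = zone_filter(area, zones[0], "w0")
    for i, zone in enumerate(zones[1:], start=1):
        prev_alias, alias = f"w{i - 1}", f"w{i}"
        right = zone_filter(area, zone, alias)
        # Wrap the query built so far and join the next zone on person,
        # keeping only rows where the zones were visited in order.
        query = (
            f"SELECT person, {alias} FROM ({query}) AS p{i} "
            f"ANY INNER JOIN ({right}) AS s{i} "
            f"USING person WHERE {prev_alias} <= {alias}"
        )
    return f"SELECT count(*) FROM ({query})"


sql = build_path_count_query([0, 3, 19])
print(sql)
```

Casting the inputs with `int()` keeps arbitrary strings out of the generated SQL; a production harness would more likely use the driver's parameter substitution.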

A note about IOPS. When loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four data loading threads), and IOWAIT was quite high. At the same time, ClickHouse put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further using materialized views and parallelization, by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

The performance is so good that we probably will not even have to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find that JanusGraph has a friendly ecosystem that makes it easy to get up to speed with the product. At the same time, the server is configured in the traditional Java way, which will make people unfamiliar with Java cry tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I managed to "take down" the BerkeleyDB version of JanusGraph.

The documentation is rather crooked when it comes to indexes, because managing indexes requires some rather strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above is a comparison of warm with soft, that is, of apples with oranges. If you think about it, a graph DBMS performs different operations to obtain the same results. However, as part of the tests, I also ran an experiment with a query like:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

which reflects walking-distance reachability. However, even on such data, the graph DBMS showed results that went beyond a few seconds... This, of course, is because there were paths like 0 -> X -> Y ... -> 1, which the graph engine also checked.

Even for a query like:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I could not get a productive response with a processing time of under a second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which is achieved far more efficiently with ClickHouse, as shown here. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it looks suitable for modeling in their paradigm.

Source: www.habr.com
