An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths.


Hello everyone. We are developing a product for offline traffic analytics. The project includes a task related to statistical analysis of visitor routes across zones.

As part of this task, users can run system queries of the following kind:

  • how many visitors went from zone "A" to zone "B";
  • how many visitors went from zone "A" to zone "B" through zone "C" and then through zone "D";
  • how long it took a visitor of a certain type to get from zone "A" to zone "B";

and many similar analytical queries.
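The query types above can be made concrete with a minimal in-memory sketch. It assumes each visitor's track is a time-ordered list of zone names (or of (zone, timestamp) pairs); the function names and sample data are hypothetical and only show what such analytics has to compute, not how the product does it.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def visits_in_order(track: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the zones in `pattern` occur in `track` in this order;
    other zones may appear in between (a subsequence check)."""
    it = iter(track)
    # `zone in it` advances the shared iterator, so the order is enforced
    return all(zone in it for zone in pattern)


def count_visitors(tracks: Dict[int, List[str]], pattern: Sequence[str]) -> int:
    """How many visitors passed through the zones of `pattern` in order."""
    return sum(1 for track in tracks.values() if visits_in_order(track, pattern))


def travel_time(track: List[Tuple[str, float]], src: str, dst: str) -> Optional[float]:
    """Time from the first entry into `src` to the next entry into `dst`;
    `track` is a list of (zone, timestamp) pairs sorted by timestamp."""
    start = None
    for zone, ts in track:
        if start is None and zone == src:
            start = ts
        elif start is not None and zone == dst:
            return ts - start
    return None


# Hypothetical sample data: person id -> zones in visiting order
tracks = {1: ["A", "C", "D", "B"], 2: ["A", "B"], 3: ["B", "A"]}
print(count_visitors(tracks, ["A", "B"]))            # → 2
print(count_visitors(tracks, ["A", "C", "D", "B"]))  # → 1
print(travel_time([("A", 0.0), ("C", 5.0), ("B", 12.0)], "A", "B"))  # → 12.0
```

Visitor 3 is not counted for A → B because it reached B before A.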

A visitor's movement through zones is a directed graph. After some reading on the Internet, I found that graph DBMSs are also used for analytical reports. I was keen to see how graph DBMSs would cope with such queries (TL;DR: poorly).

I chose JanusGraph as a prominent representative of open-source graph DBMSs, built on a stack of mature technologies that (in my opinion) should give it decent operational characteristics:

  • BerkeleyDB, Apache Cassandra and Scylla storage backends;
  • complex indexes can be stored in Lucene, Elasticsearch or Solr.

The authors of JanusGraph write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are widely used in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but that is probably due to transaction requirements. In any case, for production use a Cassandra or Scylla backend is recommended.

I did not consider Neo4j because clustering requires the commercial edition, i.e. that part of the product is not open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew the graph, designed strictly according to the canons of graph DBMSs:

[Figure: the original graph schema]

There is a Zone entity, which stands for a territory. If a ZoneStep belongs to a given Zone, it references that Zone. Ignore the Area, ZoneTrack and Person entities: they belong to the domain model and are not part of the test. Overall, a chain-search query for such a graph structure would look like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain words: find the Zone with ID=0, take all the vertices from which an edge leads into it (ZoneStep), walk forward without going back until you reach ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.

I don't claim to know all the intricacies of graph search, but this query was written following this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks, each 3 to 20 points long, into the JanusGraph graph database using the BerkeleyDB backend, and created indexes following the guide.

Python loading script:


from random import random
from time import time

from init import g, graph  # author's own module providing a connected traversal source `g`

if __name__ == '__main__':

    max_zones = 19
    # Create the Zone vertices 0..19 and cache them by id
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    # Each iteration creates one track that starts in Zone 0 and ends in Zone 19
    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                # With probability 0.3 the step lands in the final zone and the track ends
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                # Otherwise the step lands in a random zone 0..18
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)

We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this setup, the data and the indexes used for exact-match lookups are stored in BerkeleyDB. Running the query given earlier took several tens of seconds.

By running 4 copies of the script above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stacktraces (and we all love reading Java stacktraces) in the Docker logs.

After some thought, I decided to simplify the graph schema to the following:

[Figure: the simplified graph schema]

I decided that searching by entity attributes would be faster than searching by edges. As a result, my query became:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain words: find the ZoneStep with ID=0, walk forward without going back until you reach the ZoneStep with ID=19, and count the number of such chains.
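The counting semantics of this traversal can be reproduced in a few lines of plain Python. This is an illustration of what repeat(out().simplePath()).until(...).count() has to enumerate, not of how JanusGraph executes it; count_simple_paths and complete_dag are hypothetical helpers, and the graph is assumed to be a plain adjacency dict. It shows why such a query can get slow: every distinct path is a separate traverser, and with branching the number of simple paths grows exponentially.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def count_simple_paths(graph: Graph, start: Hashable, target: Hashable) -> int:
    """Count paths start -> ... -> target that never revisit a vertex and
    stop at the first arrival at `target`, mimicking
    repeat(out().simplePath()).until(<vertex is target>).count()."""

    def walk(v: Hashable, visited: Set[Hashable]) -> int:
        total = 0
        for w in graph.get(v, []):
            if w in visited:   # simplePath(): discard paths that loop back
                continue
            if w == target:    # until(): this traverser stops and is counted
                total += 1
            else:              # repeat(): keep walking along out-edges
                visited.add(w)
                total += walk(w, visited)
                visited.remove(w)
        return total

    return walk(start, {start})


def complete_dag(n: int) -> Graph:
    """Every vertex i has an edge to every j > i (maximal branching)."""
    return {i: list(range(i + 1, n)) for i in range(n)}


# A single linear chain has exactly one path ...
print(count_simple_paths({0: [1], 1: [2], 2: []}, 0, 2))  # → 1
# ... but with full branching the count doubles with each extra vertex
for n in (5, 10, 15):
    print(n, count_simple_paths(complete_dag(n), 0, n - 1))  # 2 ** (n - 2)
```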

I also simplified the loading script above so that it no longer creates unnecessary edges, limiting myself to attributes.

The query still took several seconds to complete, which was completely unacceptable for our task, since it made the approach unsuitable for AdHoc queries of any kind.

I tried deploying JanusGraph with Scylla, as the fastest implementation of Cassandra, but this did not lead to any significant performance change either.

So although "it looks like a graph", I could not make the graph DBMS process it quickly. I fully admit that there is something I don't know and that JanusGraph can be made to run this search in a fraction of a second; however, I did not manage to do it.

Since the problem still had to be solved, I started thinking about JOINs and table pivots, which did not inspire optimism in terms of elegance but could be a perfectly workable option in practice.

Our project already uses Apache ClickHouse, so I decided to test my ideas on this analytical DBMS.

ClickHouse was deployed using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it as follows:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I populated it with data using the following script:

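from random import random
from time import time

from clickhouse_driver import Client

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max_zones = 20  # zones 0..19: 0 is the start zone, 19 the end zone

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    # Every track (person r) starts in zone 0
    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    # ... then visits random intermediate zones 1..18 while random() >= 0.3
    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max_zones - 2)) + 1,
                'person': r
            })

    # ... and always ends in zone 19
    data.append({
            'area': 0,
            'zone': max_zones - 1,
            'person': r
        })

    # One INSERT per track; `when` is filled by the column DEFAULT now64()
    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )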

Since inserts arrive in batches, loading was faster than with JanusGraph.
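Batching could be pushed further than one INSERT per track. A minimal sketch, assuming only the client.execute(query, rows) call used in the script above; BatchWriter, FakeClient and the batch size are hypothetical names for illustration, and the fake client just stands in for clickhouse_driver.Client so the logic can be run without a server.

```python
from typing import Any, Dict, List


class BatchWriter:
    """Hypothetical helper: buffers rows and sends them as one INSERT
    once at least `batch_size` rows have accumulated."""

    def __init__(self, client: Any, query: str, batch_size: int = 10_000) -> None:
        self.client = client
        self.query = query
        self.batch_size = batch_size
        self.buffer: List[Dict[str, int]] = []

    def add(self, rows: List[Dict[str, int]]) -> None:
        self.buffer.extend(rows)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; call once more after the last track."""
        if self.buffer:
            self.client.execute(self.query, self.buffer)
            self.buffer = []


class FakeClient:
    """Stand-in for clickhouse_driver.Client that only records batch sizes."""

    def __init__(self) -> None:
        self.calls: List[int] = []

    def execute(self, query: str, rows: List[Dict[str, int]]) -> None:
        self.calls.append(len(rows))


fake = FakeClient()
writer = BatchWriter(fake, 'INSERT INTO steps (area, zone, person) VALUES', batch_size=5)
for person in range(4):
    # each minimal track: start zone 0, end zone 19
    writer.add([{'area': 0, 'zone': 0, 'person': person},
                {'area': 0, 'zone': 19, 'person': person}])
writer.flush()
print(fake.calls)  # → [6, 2]
```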

I wrote two queries using JOIN. From point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

Through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

Of course, the queries look rather scary; for real use you would need to write a programmatic query generator. However, they work, and they work fast: both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
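Real use calls for generating these JOIN chains programmatically. A minimal sketch of such a generator, assuming the db.steps schema created earlier; chain_count_query is a hypothetical helper, and unlike the hand-written queries it projects only person and one timestamp per level. Treat its output as a starting point to check against the queries above, not as a benchmarked replacement.

```python
from typing import Sequence


def leaf(area: int, zone: int, i: int) -> str:
    """Steps of one zone, exposing the timestamp as w<i>."""
    return (f"SELECT person, `when` AS w{i} FROM steps "
            f"WHERE area = {int(area)} AND zone = {int(zone)}")


def chain_count_query(zones: Sequence[int], area: int = 0) -> str:
    """Count persons who visited `zones` in this order, by nesting one
    ANY INNER JOIN per additional zone (same shape as the manual queries)."""
    if len(zones) < 2:
        raise ValueError("need at least two zones")
    query = leaf(area, zones[0], 0)
    for i in range(1, len(zones)):
        query = (f"SELECT person, w{i} FROM ({query}) AS p{i - 1} "
                 f"ANY INNER JOIN ({leaf(area, zones[i], i)}) AS t{i} "
                 f"USING (person) WHERE w{i - 1} <= w{i}")
    return f"SELECT count() FROM ({query})"


print(chain_count_query([0, 3, 19]))
```

For [0, 3, 19] this yields the same two-join structure as the count(*) query above, with the ordering conditions w0 <= w1 and w1 <= w2.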

A note on IOPS. While loading data, JanusGraph generated a very high number of IOPS (1000-1300 for four data-loading threads), and IOWAIT was very high. ClickHouse, meanwhile, put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further with materialized views, and with parallelization by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

Performance is so good that we probably won't even have to think about pivoting tables programmatically. Previously we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find that JanusGraph has a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured the traditional Java way, which will make people unfamiliar with Java weep tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I managed to accidentally crash the BerkeleyDB version of JanusGraph.

The documentation on indexes is rather clumsy, since managing indexes requires some weird shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()
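Since the Gremlin console did not work out of the box, one possible workaround (not something the author reports trying) is to send the same management script to Gremlin Server from Python. This is a sketch under assumptions: the server binds the JanusGraph instance to a script variable (`graph` by default here, but configurable, since the configuration above names its graphs differently), the JanusGraph plugin makes ManagementSystem and SchemaAction available to scripts, the property key already exists, and gremlinpython is installed. composite_index_script and submit_script are hypothetical helpers.

```python
from typing import List


def composite_index_script(index_name: str, key: str, graph_var: str = "graph") -> str:
    """Build a Groovy script that creates a composite vertex index on an
    existing property key, waits for it and reindexes existing data
    (the same steps as the documentation snippet above)."""
    g = graph_var
    lines: List[str] = [
        f"{g}.tx().rollback()",  # never create indexes inside an open transaction
        f"mgmt = {g}.openManagement()",
        f"key = mgmt.getPropertyKey('{key}')",
        f"mgmt.buildIndex('{index_name}', Vertex.class).addKey(key).buildCompositeIndex()",
        "mgmt.commit()",
        f"ManagementSystem.awaitGraphIndexStatus({g}, '{index_name}').call()",
        f"mgmt = {g}.openManagement()",
        f"mgmt.updateIndex(mgmt.getGraphIndex('{index_name}'), SchemaAction.REINDEX).get()",
        "mgmt.commit()",
    ]
    return "\n".join(lines)


def submit_script(script: str, url: str = "ws://localhost:8182/gremlin") -> list:
    """Send a Groovy script to Gremlin Server via gremlinpython."""
    from gremlin_python.driver import client  # requires the gremlinpython package

    conn = client.Client(url, "g")
    try:
        return conn.submit(script).all().result()
    finally:
        conn.close()


print(composite_index_script("byTime", "time"))
# submit_script(composite_index_script("byTime", "time"))  # needs a running server
```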

Afterword

In a sense, the experiment above compares apples and oranges: if you think about it, a graph DBMS performs different operations to arrive at the same results. Nevertheless, as part of the tests I also ran an experiment with a query like:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

which reflects a short walking distance. However, even on such data the graph DBMS took more than several seconds... This is, of course, because there were also paths like 0 -> X -> Y ... -> 1, which the graph engine checked as well.

And even for a query like:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I could not get a response time under one second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not guarantee the desired result, which ClickHouse delivered far more efficiently. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it seems well suited to modeling in their paradigm.

Source: www.habr.com
