An experiment testing whether the JanusGraph graph DBMS can solve the problem of finding suitable paths


Hello everyone. We are developing a product for offline traffic analytics. The project includes a task related to the statistical analysis of visitor routes across zones.

As part of this task, users can send the system queries of the following kind:

  • how many visitors went from zone "A" to zone "B";
  • how many visitors went from zone "A" to zone "B" via zone "C" and then zone "D";
  • how long it took a visitor of a given type to get from zone "A" to zone "B".

and a number of similar analytical queries.

A visitor's movement through the zones is a directed graph. After reading around online, I found that graph DBMSs are also used for analytical reports. I wanted to see how graph DBMSs would cope with queries like these (TL;DR: poorly).
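A minimal sketch of that directed-graph view, assuming each visitor's route is already available as a time-ordered list of zones (the `tracks` data and the `transition_graph` helper are hypothetical, made up for illustration): every pair of consecutive zones becomes a directed edge, and the edge weight counts how often visitors made that move.

```python
from collections import Counter


def transition_graph(tracks):
    """Build a weighted directed graph {(from_zone, to_zone): count}
    from time-ordered zone sequences, one sequence per visitor."""
    edges = Counter()
    for zones in tracks.values():
        # zip() pairs each zone with the next one on the visitor's route
        for src, dst in zip(zones, zones[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical sample data: visitor id -> zones in visiting order
tracks = {
    1: ["A", "C", "D", "B"],
    2: ["A", "B"],
    3: ["C", "A", "B"],
}

graph = transition_graph(tracks)
print(graph[("A", "B")])  # 2
```

Aggregated edge weights alone cannot answer "from A to B via C and then D", because they no longer record which visitor made which move; queries like the ones listed above need each visitor's whole chain.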

I chose JanusGraph as a prominent representative of open-source graph DBMSs. It relies on a stack of mature technologies which (in my view) should give it decent operational characteristics:

  • storage backends: BerkeleyDB, Apache Cassandra, Scylla;
  • complex indexes can be stored in Lucene, Elasticsearch, Solr.

The authors of JanusGraph write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are used a lot in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but presumably there are reasons for that choice. In any case, for scalable production use, the Cassandra or Scylla backend is recommended.

I did not consider Neo4j, because clustering requires the commercial edition, so the clustered product is not open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First, I drew a graph, designed exactly according to the canons of graph DBMSs:

[Figure: the initial graph schema]

There is a Zone entity that represents a zone. If a ZoneStep belongs to this Zone, it references it. Ignore the Area, ZoneTrack and Person entities: they belong to the domain and were not considered part of the test. All in all, a chain search query over such a graph structure looks like this:

(g.V().hasLabel('Zone').has('id', 0).in_()
    .repeat(__.out()).until(__.out().hasLabel('Zone').has('id', 19))
    .count().next())

In plain words: find the Zone with id=0, take all the vertices from which an edge leads into it (the ZoneSteps), walk forward without going back until you find ZoneSteps that have an edge to the Zone with id=19, and count the number of such chains.

I don't claim to know all the intricacies of searching graphs, but this query was built following this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).
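To make the semantics of the first query explicit, here is a plain-Python sketch on a tiny hand-built graph. It is an illustration of what the traversal asks for, not how JanusGraph executes it, and the helpers `add_track` and `count_chains` are hypothetical. It follows the Gremlin steps literally: `in_()` collects the ZoneSteps that point at Zone 0, `repeat(out())` advances every traverser one hop along all of its outgoing edges (including the `belongs` edge to its own Zone, which leads nowhere), and because `until()` comes after `repeat()`, the condition is checked only after each hop.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the first schema:
#   'belongs' edges: ZoneStep -> Zone, 'goes' edges: ZoneStep -> next ZoneStep
out_edges = defaultdict(list)  # vertex -> vertices it has edges to
in_edges = defaultdict(list)   # vertex -> vertices that have edges to it


def add_edge(src, dst):
    out_edges[src].append(dst)
    in_edges[dst].append(src)


def add_track(track_id, zones):
    """Add one visitor chain: each step 'belongs' to its Zone and 'goes' to the next step."""
    prev = None
    for i, zone_id in enumerate(zones):
        step = ("ZoneStep", track_id, i)
        add_edge(step, ("Zone", zone_id))  # belongs
        if prev is not None:
            add_edge(prev, step)           # goes
        prev = step


def count_chains(start_zone, end_zone):
    """Mimic g.V().hasLabel('Zone').has('id', start_zone).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id', end_zone)).count()"""
    target = ("Zone", end_zone)
    frontier = list(in_edges[("Zone", start_zone)])  # in_(): steps of the start zone
    found = 0
    while frontier:
        next_frontier = []
        for vertex in frontier:
            for neighbour in out_edges[vertex]:      # one repeat(out()) hop
                if target in out_edges[neighbour]:   # until(): neighbour belongs to end zone
                    found += 1
                else:
                    next_frontier.append(neighbour)  # keep walking (Zone vertices dead-end)
        frontier = next_frontier
    return found


add_track(1, [0, 5, 19])
add_track(2, [0, 0, 7, 19])
add_track(3, [3, 19])
print(count_chains(0, 19))  # 3
```

Every intermediate step of every chain is visited, and a chain that passes zone 0 twice is walked twice, so the work grows with the total length of the chains rather than with the number of answers.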

I loaded 50 thousand tracks, 3 to 20 points long, into a JanusGraph database using the BerkeleyDB backend, and created indexes as described in the guide.

Python loading script:


from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)

We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this setup, the data and the indexes used for match lookups are stored in BerkeleyDB. Running the query given earlier, I got an execution time of several seconds.

By running 4 instances of the script above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stacktraces (and we all love reading Java stacktraces) in the Docker logs.

After some thought, I decided to simplify the graph schema to the following:

[Figure: the simplified graph schema]

I decided that searching by entity attributes would be faster than searching by edges. As a result, my query turned into the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain words: find the ZoneStep with id=0, walk forward without going back until you find a ZoneStep with id=19, and count the number of such chains.
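A similar sketch for the simplified schema, where each ZoneStep carries the zone number in its `id` attribute and only the `goes` edges between steps remain. Again this is an illustrative model with a hypothetical `count_simple_paths` helper and made-up data: `simplePath()` discards any traverser that would revisit a vertex already on its own path, which is what keeps a cyclic graph (here `a -> b -> a`) from being walked forever. The function also counts the edge hops it examines, to show the work done per answer.

```python
def count_simple_paths(succ, zone, start_zone, end_zone):
    """Mimic g.V().hasLabel('ZoneStep').has('id', start_zone)
       .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id', end_zone)).count()

    succ: vertex -> list of successor vertices ('goes' edges)
    zone: vertex -> zone number (the 'id' attribute in the article)
    Returns (number of matching paths, number of edge hops examined)."""
    found = hops = 0
    # Each traverser carries its own path, because simplePath() needs it
    stack = [(v, (v,)) for v, z in zone.items() if z == start_zone]
    while stack:
        vertex, path = stack.pop()
        for nxt in succ.get(vertex, []):
            hops += 1
            if nxt in path:            # simplePath(): drop traversers that revisit a vertex
                continue
            if zone[nxt] == end_zone:  # until(): stop here and count the path
                found += 1
            else:
                stack.append((nxt, path + (nxt,)))
    return found, hops


# Made-up data with a cycle a -> b -> a; without simplePath() the walk would never end
succ = {"a": ["b"], "b": ["a", "c"], "d": ["c"]}
zone = {"a": 0, "b": 4, "c": 19, "d": 0}
print(count_simple_paths(succ, zone, 0, 19))  # (2, 4)
```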

I also simplified the loading script given above so that it no longer creates unnecessary links, relying on attributes instead.

The query still took several seconds to complete, which was unacceptable for our task, since that is completely unsuitable for ad hoc queries of any kind.

I tried deploying JanusGraph with Scylla as the fastest Cassandra implementation, but that did not produce any significant change in performance either.

So even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully admit that I am missing something and that JanusGraph can be made to perform this search in a fraction of a second; however, I did not manage to do it.

Since the problem still had to be solved, I started thinking about JOINs and table pivots. This did not inspire much optimism in terms of elegance, but it could be a perfectly workable option in practice.

Our project already uses Apache ClickHouse, so I decided to test my approach on this analytical DBMS.

Deploying ClickHouse with a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192
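Why this layout suits the queries: MergeTree stores rows sorted by the `ORDER BY (area, zone, person)` key and keeps a sparse index with one mark per `index_granularity` rows, so a filter like `area = 0 AND zone = 19` only has to read the granules whose key range can contain matches. The sketch below is a simplified model of that idea, not ClickHouse's actual implementation; `build_marks` and `scan_zone` are hypothetical helpers, and the granularity of 4 is chosen only to keep the example small.

```python
from bisect import bisect_left


def build_marks(rows, granularity):
    """One mark per granule: the (area, zone, person) key of the granule's first row."""
    return [row[:3] for row in rows[::granularity]]


def scan_zone(rows, marks, granularity, area, zone):
    """Return rows with the given (area, zone) and the number of granules read."""
    # Matches can start in the granule just before the first mark >= (area, zone)
    first = max(bisect_left(marks, (area, zone)) - 1, 0)
    # Granules whose first key is >= (area, zone + 1) cannot contain matches
    last = bisect_left(marks, (area, zone + 1))
    candidates = rows[first * granularity:last * granularity]
    hits = [r for r in candidates if r[0] == area and r[1] == zone]
    return hits, last - first


# Made-up rows (area, zone, person, when), sorted like ORDER BY (area, zone, person)
rows = sorted((0, z, p, 0) for z in range(20) for p in range(3))
marks = build_marks(rows, granularity=4)
hits, granules_read = scan_zone(rows, marks, 4, area=0, zone=19)
print(len(hits), granules_read, len(marks))  # 3 1 15
```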

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max - 2)) + 1,
                'person': r
            })

    data.append({
            'area': 0,
            'zone': max - 1,
            'person': r
        })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since the inserts arrive in batches, loading was much faster than with JanusGraph.
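The speed-up comes from sending many rows per INSERT. The script above sends one batch per visitor; a minimal sketch of a buffer that accumulates rows across visitors and flushes every `batch_size` rows is shown below. `RowBuffer` is a hypothetical helper, and `sink` stands in for a call such as the `client.execute('INSERT INTO steps (area, zone, person) VALUES', rows)` used in the script.

```python
class RowBuffer:
    """Accumulate rows and hand them to `sink` in batches of `batch_size`."""

    def __init__(self, sink, batch_size=10_000):
        self.sink = sink
        self.batch_size = batch_size
        self.rows = []

    def add(self, rows):
        self.rows.extend(rows)
        # Emit full batches as soon as they are available
        while len(self.rows) >= self.batch_size:
            batch, self.rows = self.rows[:self.batch_size], self.rows[self.batch_size:]
            self.sink(batch)

    def flush(self):
        # Send whatever is left (a final, possibly smaller batch)
        if self.rows:
            self.sink(self.rows)
            self.rows = []


batches = []  # collects batches instead of talking to a real server
buf = RowBuffer(batches.append, batch_size=5)
for person in range(4):
    buf.add([{'area': 0, 'zone': z, 'person': person} for z in (0, 7, 19)])
buf.flush()
print([len(b) for b in batches])  # [5, 5, 2]
```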

I wrote two queries using JOIN. From point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

Passing through three points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when
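The chained self-joins express one idea: a person counts if they visited each listed zone in that order, with non-decreasing timestamps. Below is a plain-Python sketch of that predicate, assuming the `steps` rows are available in memory as `(person, zone, when)` tuples (made-up sample data; `passed_in_order` is a hypothetical helper). It uses greedy earliest matching and is meant to show the logic, not to reproduce ClickHouse exactly: `ANY` keeps only one matching right-hand row per key, so the results can differ when a person visits the same zone several times. The last line also answers the "how long did it take" question from the task list.

```python
from collections import defaultdict


def passed_in_order(steps, route):
    """Return {person: [visit times]} for persons who visited every zone of `route`
    in that order with non-decreasing timestamps (greedy earliest match)."""
    visits = defaultdict(list)
    for person, zone, when in steps:
        visits[person].append((when, zone))
    result = {}
    for person, track in visits.items():
        track.sort()  # time order; equal timestamps are ordered by zone number
        times, i = [], 0
        for when, zone in track:
            if i < len(route) and zone == route[i]:
                times.append(when)
                i += 1
        if i == len(route):
            result[person] = times
    return result


# Made-up rows from the steps table: (person, zone, when)
steps = [
    (1, 0, 10), (1, 3, 20), (1, 19, 30),  # 0 -> 3 -> 19
    (2, 0, 10), (2, 19, 15), (2, 3, 20),  # reaches zone 3 only after zone 19
    (3, 3, 5), (3, 0, 10), (3, 19, 40),   # visits zone 3 before zone 0
]
two = passed_in_order(steps, [0, 19])
three = passed_in_order(steps, [0, 3, 19])
print(len(two), len(three))                       # 3 1
print({p: t[-1] - t[0] for p, t in two.items()})  # {1: 20, 2: 5, 3: 30}
```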

The SQL queries above look scary, of course; for real use you would need to build a query generator. However, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time for a count(*) query through three points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
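As mentioned above, real use calls for a query generator. A minimal sketch of one, assuming the `steps` table defined earlier: `chain_count_query` is a hypothetical helper that nests one `ANY INNER JOIN` per additional zone, in the same shape as the hand-written queries, and aliases the `when` column so that the keyword-like name only ever appears quoted. The generated SQL has not been run against a live ClickHouse server here.

```python
def chain_count_query(zones, area=0):
    """Build a ClickHouse query that counts persons who passed `zones` in order,
    nesting one ANY INNER JOIN per additional zone."""
    if len(zones) < 2:
        raise ValueError("need at least two zones")
    area = int(area)  # ints only, so nothing can be injected into the SQL text
    zones = [int(z) for z in zones]
    # Innermost subquery: visits to the first zone; w0 is the visit time
    query = (f"SELECT person, `when` AS w0 FROM steps "
             f"WHERE (area = {area}) AND (zone = {zones[0]})")
    for i, zone in enumerate(zones[1:], start=1):
        prev_cols = ", ".join(f"w{j}" for j in range(i))
        # Join the next zone's visits and keep only those not earlier than the previous one
        query = (f"SELECT person, {prev_cols}, s.t AS w{i} "
                 f"FROM ({query}) AS p ANY INNER JOIN "
                 f"(SELECT person, `when` AS t FROM steps "
                 f"WHERE (area = {area}) AND (zone = {zone})) AS s USING (person) "
                 f"WHERE p.w{i - 1} <= s.t")
    return f"SELECT count(*) FROM ({query}) AS c"


print(chain_count_query([0, 3, 19]))
```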

A note on IOPS. When loading data, JanusGraph generated a rather high number of IOPS (1000-1300 with four data-loading threads), and IOWAIT was quite high. At the same time, ClickHouse put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further with materialized views and parallelization, by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

The performance is so good that we probably won't even need to think about pivoting tables programmatically. Previously, we had to pivot data pulled from Vertica via export to Apache Parquet.

Unfortunately, another attempt to use a graph DBMS did not succeed. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured in the traditional Java way, which will make people unfamiliar with Java weep tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I also managed to accidentally crash the BerkeleyDB version of JanusGraph.

The documentation is rather muddled on the subject of indexes, because managing indexes requires some rather strange shamanism in Groovy. For example, creating an index has to be done by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above compares apples with oranges. If you think about it, a graph DBMS performs different operations to produce the same results. Nevertheless, as part of the tests, I also ran an experiment with a query like this:

(g.V().hasLabel('ZoneStep').has('id', 0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id', 1))
    .count().next())

which corresponds to a short walking distance. Yet even on such data, the graph DBMS took more than a few seconds to return results... This is, of course, because there were paths like 0 -> X -> Y ... -> 1, which the graph engine also checked.

Even for a query like:

g.V().hasLabel('ZoneStep').has('id', 0).out().has('id', 1).count().next()

I could not get a response with a processing time under one second.

The moral of the story is that an elegant idea and a paradigmatically correct model do not guarantee the desired result, as the ClickHouse example demonstrates with far higher efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it seems well suited to modeling in their paradigm.

Source: www.habr.com
