An experiment testing the applicability of the JanusGraph graph DBMS to the problem of finding suitable paths


Hi everyone. We are developing a product for offline traffic analysis. One of the project's tasks is the statistical analysis of visitor routes across areas.

As part of this task, users can ask the system questions of the following kind:

  • how many visitors went from area "A" to area "B";
  • how many visitors went from area "A" to area "B" via area "C" and then via area "D";
  • how long it took a certain type of visitor to get from area "A" to area "B".

and a number of similar analytical queries.
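To make these question types concrete, here is a minimal in-memory sketch in Python. The data layout (a dict mapping a visitor id to a time-ordered list of (zone, timestamp) pairs) and all function names are assumptions made for illustration only; they are not the product's actual data model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: visitor id -> time-ordered list of (zone, timestamp).
Tracks = Dict[int, List[Tuple[str, int]]]


def visits_in_order(track: List[Tuple[str, int]], zones: List[str]) -> Optional[List[int]]:
    """Return the timestamps at which `zones` were visited in order, or None."""
    times: List[int] = []
    pos = 0
    for zone, ts in track:
        # Greedily match the next expected zone at its earliest occurrence.
        if pos < len(zones) and zone == zones[pos]:
            times.append(ts)
            pos += 1
    return times if pos == len(zones) else None


def count_visitors(tracks: Tracks, zones: List[str]) -> int:
    """Count visitors whose track passes through all `zones` in the given order."""
    return sum(1 for t in tracks.values() if visits_in_order(t, zones) is not None)


def travel_times(tracks: Tracks, src: str, dst: str) -> List[int]:
    """Time from the first visit to `src` to the next visit to `dst`, per visitor."""
    result = []
    for t in tracks.values():
        times = visits_in_order(t, [src, dst])
        if times is not None:
            result.append(times[1] - times[0])
    return result


tracks: Tracks = {
    1: [("A", 0), ("C", 5), ("D", 9), ("B", 12)],
    2: [("A", 1), ("B", 4)],
    3: [("B", 2), ("A", 3)],  # visits B before A, so it does not count as A -> B
}

print(count_visitors(tracks, ["A", "B"]))            # 2
print(count_visitors(tracks, ["A", "C", "D", "B"]))  # 1
print(travel_times(tracks, "A", "B"))                # [12, 3]
```

The rest of the article is about answering the same questions when the tracks live in a database rather than in memory.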

A visitor's movement across areas is a directed graph. After reading around on the Internet, I found that graph DBMSs are also used for analytical reports. I wanted to see how graph DBMSs would cope with such queries (TL;DR: poorly).

I chose JanusGraph as a prominent representative of open-source graph DBMSs. It relies on a stack of mature technologies which (in my opinion) should give it decent operational characteristics:

  • storage backends: BerkeleyDB, Apache Cassandra, Scylla;
  • complex indexes can be stored in Lucene, Elasticsearch, Solr.

The JanusGraph authors write that it is suitable for both OLTP and OLAP.

I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are often used in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but that is probably due to transaction requirements. In any case, for scalable production use a Cassandra or Scylla backend is recommended.

I did not consider Neo4j, because clustering requires the commercial version, that is, the product is not fully open source.

Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!

First I drew a graph designed exactly according to the canons of graph DBMSs:

[Figure: diagram of the graph schema with the entities Zone, ZoneStep, Area, ZoneTrack and Person]

There is a Zone entity. If a ZoneStep belongs to a Zone, it refers to that Zone. Ignore the Area, ZoneTrack and Person entities: they belong to the domain and are not considered part of the test. All told, a chain search query for such a graph structure would look like this:

g.V().hasLabel('Zone').has('id',0).in_()
       .repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()

In plain language it reads roughly as follows: find the Zone with ID=0, take all vertices that have an edge going to it (ZoneStep), keep walking forward without going back until you find the ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
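The walk described above can be approximated in plain Python. This is only a simplified sketch of what the traversal has to do, not a reimplementation of Gremlin semantics: the `goes` and `belongs` dictionaries are hypothetical stand-ins for the 'goes' and 'belongs' edges, and the sketch assumes the 'goes' edges form acyclic chains, as they do in the generated test data.

```python
from typing import Dict, List


def count_chains(goes: Dict[int, List[int]],
                 belongs: Dict[int, int],
                 start_zone: int,
                 end_zone: int) -> int:
    """Count chains that start at a step in start_zone and reach a step in end_zone."""
    count = 0
    # Equivalent of taking every ZoneStep that points at the start Zone.
    stack = [step for step, zone in belongs.items() if zone == start_zone]
    while stack:
        step = stack.pop()
        # Equivalent of one repeat(out()) iteration: move to the next steps first...
        for nxt in goes.get(step, []):
            # ...then apply the until(...) condition to the step we arrived at.
            if belongs.get(nxt) == end_zone:
                count += 1
            else:
                stack.append(nxt)
    return count


# Two tiny tracks: 1 -> 2 -> 3 and 4 -> 5 (step ids are arbitrary).
goes = {1: [2], 2: [3], 4: [5]}
belongs = {1: 0, 2: 7, 3: 19, 4: 0, 5: 19}

print(count_chains(goes, belongs, start_zone=0, end_zone=19))  # 2
```

Even in this toy form, the cost is proportional to the number of steps walked across all chains, which hints at why the query is expensive on tens of thousands of tracks.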

I don't pretend to know all the intricacies of searching graphs, but this query was put together based on this book (https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html).

I loaded 50 thousand tracks, each 3 to 20 points long, into a JanusGraph graph database using the BerkeleyDB backend, with indexes created according to the guide.

Python loading script:


from random import random
from time import time

# `init` is the author's local module (not shown); it provides the traversal source `g`.
from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    # Create the Zone vertices once and cache them by id.
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    # Each iteration builds one track: a chain of ZoneStep vertices from zone 0 to zone 19.
    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                # With probability 0.3, finish the track in the end zone.
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                # Otherwise attach the step to a random zone with id 0..18.
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)

We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed with this command:

docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest

In this case the data and the indexes used for exact-match lookups are stored in BerkeleyDB. Running the query given earlier, I got a time of several tens of seconds.

By running 4 of the scripts above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stack traces (and we all love reading Java stack traces) in the Docker logs.

After some thought I decided to simplify the graph schema to the following:

[Figure: diagram of the simplified graph schema]

I assumed that searching by entity attributes would be faster than searching by edges. As a result, my query turned into the following:

g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()

In plain language it reads roughly as follows: find the ZoneStep with ID=0, keep walking forward without going back until you find a ZoneStep with ID=19, and count the number of such chains.

I also simplified the loading script above so that it does not create unnecessary edges and limits itself to attributes.

The query still took several seconds to complete, which was completely unacceptable for our task, since that is far too slow for arbitrary ad hoc queries.

I tried deploying JanusGraph on Scylla, as the fastest Cassandra implementation, but this did not lead to any significant performance change either.

So, despite the fact that "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully accept that there is something I don't know and that JanusGraph can be made to perform this search in a fraction of a second, but I was unable to do it.

Since the problem still needed solving, I began to think about JOINs and pivot tables. They did not inspire optimism in terms of elegance, but they could be a perfectly workable option in practice.

Our project already uses ClickHouse, so I decided to test my search on this analytical DBMS.

I deployed ClickHouse using a simple recipe:

sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server

I created a database and a table in it like this:

CREATE TABLE 
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64) 
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192

I filled it with data using the following script:

from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
            'area': 0,
            'zone': 0,
            'person': r
        }]

    while True:
        if random() < 0.3:
            break

        data.append({
                'area': 0,
                'zone': int(random() * (max - 2)) + 1,
                'person': r
            })

    data.append({
            'area': 0,
            'zone': max - 1,
            'person': r
        })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )

Since the inserts arrive in batches, loading was much faster than with JanusGraph.
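The script above sends one INSERT per visitor track. As a hedged illustration of the batching idea, the sketch below groups rows into larger chunks before sending them. The helper names `batched` and `insert_in_batches` and the `send` callable are hypothetical; in real use `send` could wrap the same `client.execute('INSERT INTO steps (area, zone, person) VALUES', batch)` call used in the script.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def insert_in_batches(rows: Iterable[Row], size: int,
                      send: Callable[[List[Row]], None]) -> int:
    """Send rows in batches through `send`; return the number of batches sent."""
    sent = 0
    for batch in batched(rows, size):
        send(batch)
        sent += 1
    return sent


# Demonstration with an in-memory sink instead of a database connection.
rows = [{'area': 0, 'zone': i % 20, 'person': i} for i in range(25)]
collected: List[List[Row]] = []
n_batches = insert_in_batches(rows, 10, collected.append)
print(n_batches, [len(b) for b in collected])  # 3 [10, 10, 5]
```

Fewer, larger inserts generally suit a MergeTree table better than many small ones, though the right batch size depends on the workload.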

I built two queries using JOIN. Moving from point A to point B:

SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when

Passing through 3 points:

SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when

The queries do, of course, look rather scary; for real use you need to write a wrapper that generates them programmatically. But they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query passing through 3 points:

SELECT count(*)
FROM 
(
    SELECT 
        s1.person AS person, 
        s1.zone AS s1z, 
        s1.when AS s1w, 
        s2.zone AS s2z, 
        s2.when AS s2w
    FROM 
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN 
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN 
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when

┌─count()─┐
│   11592 │
└─────────┘

1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
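As noted above, real use calls for a wrapper that generates these nested JOIN queries. A minimal sketch of such a generator follows. The function name `build_path_count_query`, its parameters and the `last_when` alias are assumptions made for illustration; the generated SQL follows the pattern of the hand-written queries but has not been verified against a ClickHouse server.

```python
from typing import Sequence


def build_path_count_query(zones: Sequence[int], area: int = 0,
                           table: str = "steps") -> str:
    """Build a count(*) query for visitors passing through `zones` in order.

    `table` is interpolated as-is, so it must come from trusted code.
    """
    if len(zones) < 2:
        raise ValueError("need at least two zones")
    # Coerce to int so that only numeric literals reach the SQL text.
    area = int(area)
    zone_ids = [int(z) for z in zones]

    def zone_rows(zone: int) -> str:
        return (f"SELECT person, `when` FROM {table} "
                f"WHERE (area = {area}) AND (zone = {zone})")

    # Innermost level: visits to the first zone.
    query = (f"SELECT person, `when` AS last_when FROM {table} "
             f"WHERE (area = {area}) AND (zone = {zone_ids[0]})")
    # Each further zone wraps the previous level in one more join.
    for zone in zone_ids[1:]:
        query = (
            f"SELECT p.person AS person, s.`when` AS last_when "
            f"FROM ({query}) AS p "
            f"ANY INNER JOIN ({zone_rows(zone)}) AS s USING (person) "
            f"WHERE p.last_when <= s.`when`"
        )
    return f"SELECT count(*) FROM ({query})"


query = build_path_count_query([0, 3, 19])
print(query)
```

The resulting string could be passed to `client.execute(query)` with the clickhouse_driver client used in the loading script. Each extra waypoint adds one nesting level, so the query text grows linearly with the path length.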

A note on IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four loading threads), and IOWAIT was quite high. ClickHouse, by contrast, put minimal load on the disk subsystem.

Conclusion

We decided to use ClickHouse to serve this type of query. We can always optimize the queries further with materialized views, and parallelize by pre-processing the event stream with Apache Flink before loading it into ClickHouse.

The performance is so good that we probably won't even have to think about pivoting tables programmatically. Previously we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.

Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured the traditional Java way, which will make people unfamiliar with Java cry tears of blood:

host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer

graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}

scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}

serializers:
# GraphBinary is here to replace Gryo and Graphson
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
  # Gryo and Graphson, latest versions
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}

processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}

metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

I managed to accidentally "bring down" the BerkeleyDB version of JanusGraph.

The documentation is rather muddled when it comes to indexes, since managing indexes requires you to perform some fairly strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:

graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()

//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

Afterword

In a sense, the experiment above compares warm with soft (apples with oranges). If you think about it, a graph DBMS performs different operations to obtain the same results. However, as part of the tests I also ran an experiment with a query like:

g.V().hasLabel('ZoneStep').has('id',0)
    .repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()

which reflects walking proximity. However, even on such data the graph DBMS showed times that went beyond a couple of seconds. This is of course because there were paths like 0 -> X -> Y ... -> 1, which the graph engine also checked.

Even for a query like:

g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()

I was unable to get a usable response with a processing time of less than a second.

The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which ClickHouse delivers here with much higher efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it looks well suited to modeling in their paradigm.

Source: www.habr.com