Hello everyone. We are building a product for offline traffic analysis. One of the project's tasks is the statistical analysis of visitor routes across areas.
As part of this task, users can ask the system queries of the following kind:
- how many visitors passed from area "A" to area "B";
- how many visitors passed from area "A" to area "B" through area "C" and then through area "D";
- how long it took a certain type of visitor to get from area "A" to area "B".
and a number of similar analytical queries.
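To make these questions concrete, here is a minimal in-memory sketch in Python. It is not part of the original system: the `Track` type and the helper names `passed_through`, `count_visitors` and `travel_time` are hypothetical, and a track is assumed to be a time-ordered list of (zone, timestamp) pairs per visitor.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: one time-ordered list of (zone, timestamp) pairs per visitor.
Track = List[Tuple[str, int]]


def passed_through(track: Track, zones: Sequence[str]) -> bool:
    """True if the zones occur in the track in the given order (gaps allowed)."""
    visited = iter(zone for zone, _ in track)
    # `wanted in visited` consumes the iterator up to the first match,
    # so each next zone is only searched for after the previous one.
    return all(wanted in visited for wanted in zones)


def count_visitors(tracks: Dict[int, Track], zones: Sequence[str]) -> int:
    """Number of visitors whose track passes through the zones in order."""
    return sum(passed_through(track, zones) for track in tracks.values())


def travel_time(track: Track, a: str, b: str) -> Optional[int]:
    """Time between the first visit to `a` and the first later visit to `b`."""
    start = None
    for zone, ts in track:
        if start is None and zone == a:
            start = ts
        elif start is not None and zone == b:
            return ts - start
    return None


tracks = {
    1: [("A", 0), ("C", 5), ("B", 9)],
    2: [("B", 0), ("A", 3)],
    3: [("A", 1), ("B", 2), ("C", 3), ("D", 8)],
}
print(count_visitors(tracks, ["A", "B"]))   # visitors who went from A to B
print(travel_time(tracks[1], "A", "B"))     # travel time for visitor 1
```

This works for a handful of tracks in memory; the rest of the article is about answering the same questions over a database.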
A visitor's movement across areas is a directed graph. After reading around on the Internet, I found that graph DBMSs are also used for analytical reports. I wanted to see how graph DBMSs would cope with such queries (TL;DR: poorly).
I chose to use the JanusGraph DBMS:
- storage backends: BerkeleyDB, Apache Cassandra, Scylla;
- complex indexes can be stored in Lucene, Elasticsearch, Solr.
The authors of JanusGraph write that it is suitable for both OLTP and OLAP.
I have worked with BerkeleyDB, Apache Cassandra, Scylla and ES, and these products are used often in our systems, so I was optimistic about testing this graph DBMS. I found it odd that BerkeleyDB was chosen over RocksDB, but that is probably due to transaction requirements. In any case, for scalable, production use, a backend on Cassandra or Scylla is recommended.
I did not consider Neo4j because clustering requires the commercial edition, that is, the product is not open source.
Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!
First, I drew a graph schema, made exactly according to the canon of graph DBMSs:
There is an entity Zone, which is responsible for an area. If a ZoneStep belongs to this Zone, it refers to it. Ignore the entities Area, ZoneTrack and Person: they belong to the domain and are not considered part of the test. All in all, a chain search query for such a graph structure would look like this:
g.V().hasLabel('Zone').has('id',0).in_()
.repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()
In plain language this reads roughly as follows: find the Zone with ID=0, take all vertices that have an edge going to it (ZoneStep), keep stepping forward without going back until you find the ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
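The traversal described above can be imitated on plain Python dictionaries to make its semantics easier to follow. This is only an illustrative sketch, not how JanusGraph executes the query: `build_graph` and `count_chains` are hypothetical helpers, vertices are assumed to be `(label, id)` tuples, and the graph is assumed to be acyclic (with cycles the loop would not terminate, as the query above has no `simplePath()` guard).

```python
from typing import Dict, List, Tuple

Vertex = Tuple[str, int]           # e.g. ("Zone", 19) or ("ZoneStep", 42)
Edges = Dict[Vertex, List[Vertex]]


def build_graph(tracks: List[List[int]]) -> Tuple[Edges, Edges]:
    """Build out/in adjacency lists from tracks given as lists of zone ids."""
    out_edges: Edges = {}
    in_edges: Edges = {}

    def add_edge(src: Vertex, dst: Vertex) -> None:
        out_edges.setdefault(src, []).append(dst)
        in_edges.setdefault(dst, []).append(src)

    step_id = 0
    for zones in tracks:
        previous = None
        for zone in zones:
            step = ("ZoneStep", step_id)
            step_id += 1
            add_edge(step, ("Zone", zone))      # 'belongs' edge
            if previous is not None:
                add_edge(previous, step)        # 'goes' edge
            previous = step
    return out_edges, in_edges


def count_chains(out_edges: Edges, in_edges: Edges,
                 start_zone: int, end_zone: int) -> int:
    target = ("Zone", end_zone)
    # g.V().hasLabel('Zone').has('id', start_zone).in_()
    frontier = list(in_edges.get(("Zone", start_zone), []))
    found = 0
    while frontier:
        next_frontier = []
        for vertex in frontier:
            # repeat(__.out()): follow every outgoing edge
            for neighbour in out_edges.get(vertex, []):
                # until(__.out().hasLabel('Zone').has('id', end_zone))
                if target in out_edges.get(neighbour, []):
                    found += 1                  # this traverser leaves the loop
                else:
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return found


out_e, in_e = build_graph([[0, 5, 19], [0, 7], [0, 19]])
print(count_chains(out_e, in_e, 0, 19))
```

Note that `out()` also follows the 'belongs' edges into Zone vertices; those traversers simply die out because Zone vertices have no outgoing edges.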
I do not pretend to know all the intricacies of searching graphs, but this query was built on the basis of this book (
I loaded 50 thousand tracks, from 3 to 20 points long, into the JanusGraph graph database using the BerkeleyDB backend, and created indexes according to
Python loading script:
from random import random
from time import time

from init import g, graph

if __name__ == '__main__':

    points = []
    max_zones = 19
    zcache = dict()
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()

    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):

        if not i % 100:
            print(i)

        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()

        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()

            start = pt

    count = g.V().count().next()
    print(count)
We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed with this command:
docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest
In this case, the data and the indexes used for exact-match searches are stored in BerkeleyDB. After running the query given earlier, I got a time of several tens of seconds.
By running the 4 scripts above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stacktraces (and we all love reading Java stacktraces) in the Docker logs.
After some thought, I decided to simplify the graph schema to the following:
I reasoned that searching by entity attributes would be faster than searching by edges. As a result, my query turned into the following:
g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()
In plain language this reads roughly as follows: find the ZoneStep with ID=0, keep stepping forward without going back until you find a ZoneStep with ID=19, and count the number of such chains.
I also simplified the loading script given above so that it does not create unnecessary edges and limits itself to attributes.
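The simplified loader itself is not shown in the article. As a rough sketch of the data it would produce, the function below generates the zone ids of one track with the same random logic as the original script (start in zone 0, stop with probability 0.3 at each step, finish in the last zone). The name `generate_track` and its parameters are hypothetical, and the assumption is that each generated id would be stored as the `id` attribute of a ZoneStep vertex, with consecutive steps linked by an edge.

```python
import random
from typing import List


def generate_track(rng: random.Random, max_zones: int = 19,
                   stop_probability: float = 0.3) -> List[int]:
    """Zone ids of one synthetic track: 0, random intermediate zones, max_zones."""
    track = [0]
    # At every step the chain ends with probability `stop_probability`;
    # otherwise a random intermediate zone in [0, max_zones) is appended.
    while rng.random() >= stop_probability:
        track.append(rng.randrange(max_zones))
    track.append(max_zones)
    return track


rng = random.Random(42)          # fixed seed only to make runs repeatable
for _ in range(3):
    print(generate_track(rng))
```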
The query still took several seconds to complete, which was completely unacceptable for our task, since it made it unsuitable for ad hoc queries of any kind.
I tried running JanusGraph with Scylla as the fastest Cassandra implementation, but this did not lead to any significant change in performance either.
So despite the fact that "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully admit that there is something I do not know and that JanusGraph can be made to perform this search in a fraction of a second; however, I did not manage to do it.
Since the problem still needed solving, I started thinking about JOINs and pivot tables, which did not inspire optimism in terms of elegance, but could be a perfectly workable option in practice.
Our project already uses ClickHouse, so I decided to test my ideas on this analytical DBMS.
ClickHouse was deployed using a simple recipe:
sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server
I created a database and a table in it like this:
CREATE TABLE
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64)
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192
I filled it with data using the following script:
from time import time

from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):

    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    data = [{
        'area': 0,
        'zone': 0,
        'person': r
    }]

    while True:
        if random() < 0.3:
            break

        data.append({
            'area': 0,
            'zone': int(random() * (max - 2)) + 1,
            'person': r
        })

    data.append({
        'area': 0,
        'zone': max - 1,
        'person': r
    })

    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )
Since the inserts go in batches, loading was much faster than with JanusGraph.
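The script above sends one INSERT per visitor. If loading speed mattered more, rows for many visitors could be accumulated and sent in larger batches. The helper below is a hypothetical sketch of that idea (`chunked` is not part of `clickhouse_driver`); the batch size is an arbitrary assumption.

```python
from typing import Dict, Iterable, Iterator, List


def chunked(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:                      # flush the incomplete last batch
        yield batch


rows = [{'area': 0, 'zone': z % 20, 'person': z // 3} for z in range(25)]
for batch in chunked(rows, 10):
    # With the client from the script above this would be, for example:
    # client.execute('INSERT INTO steps (area, zone, person) VALUES', batch)
    print(len(batch))
```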
I built two queries using JOIN. To get from point A to point B:
SELECT s1.person AS person,
s1.zone,
s1.when,
s2.zone,
s2.when
FROM
(SELECT *
FROM steps
WHERE (area = 0)
AND (zone = 0)) AS s1 ANY INNER JOIN
(SELECT *
FROM steps AS s2
WHERE (area = 0)
AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when
To pass through 3 points:
SELECT s3.person,
s1z,
s1w,
s2z,
s2w,
s3.zone,
s3.when
FROM
(SELECT s1.person AS person,
s1.zone AS s1z,
s1.when AS s1w,
s2.zone AS s2z,
s2.when AS s2w
FROM
(SELECT *
FROM steps
WHERE (area = 0)
AND (zone = 0)) AS s1 ANY INNER JOIN
(SELECT *
FROM steps AS s2
WHERE (area = 0)
AND (zone = 3)) AS s2 USING person
WHERE s1.when <= s2.when) p ANY INNER JOIN
(SELECT *
FROM steps
WHERE (area = 0)
AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when
The queries, of course, look rather scary; for real use you need to build a software generator harness. However, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query passing through 3 points:
SELECT count(*)
FROM
(
SELECT
s1.person AS person,
s1.zone AS s1z,
s1.when AS s1w,
s2.zone AS s2z,
s2.when AS s2w
FROM
(
SELECT *
FROM steps
WHERE (area = 0) AND (zone = 0)
) AS s1
ANY INNER JOIN
(
SELECT *
FROM steps AS s2
WHERE (area = 0) AND (zone = 3)
) AS s2 USING (person)
WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN
(
SELECT *
FROM steps
WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when
┌─count()─┐
│ 11592 │
└─────────┘
1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
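The generator harness mentioned above could look roughly like the following sketch. It is an assumption about how such a generator might be written, not the one used in the project: `build_path_query` is a hypothetical name, the generated SQL follows the same nested ANY INNER JOIN shape as the hand-written queries but selects only the columns needed for counting, and it has not been run against a ClickHouse server.

```python
from typing import Sequence


def build_path_query(zones: Sequence[int], area: int = 0,
                     table: str = "steps") -> str:
    """Build a count(*) query for visitors passing the zones in order.

    `table` is interpolated as-is, so it must come from trusted code.
    """
    if len(zones) < 2:
        raise ValueError("a path needs at least two zones")

    def visits(zone: int, idx: int) -> str:
        # One sub-select per zone; the visit time gets a unique alias w<idx>.
        return (f"SELECT person, `when` AS w{idx} FROM {table} "
                f"WHERE (area = {int(area)}) AND (zone = {int(zone)})")

    query = visits(zones[0], 1)
    for idx, zone in enumerate(zones[1:], start=2):
        # Wrap the path built so far and join the next zone onto it,
        # keeping only rows where the visits happen in order.
        query = (f"SELECT person, w{idx} FROM ({query}) AS p{idx - 1} "
                 f"ANY INNER JOIN ({visits(zone, idx)}) AS s{idx} USING (person) "
                 f"WHERE w{idx - 1} <= w{idx}")
    return f"SELECT count(*) FROM ({query})"


print(build_path_query([0, 3, 19]))
```

Each extra zone adds one more level of nesting, so a path of N zones produces N-1 joins.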
A note about IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four data-loading threads), and IOWAIT was quite high. At the same time, ClickHouse put a minimal load on the disk subsystem.
Conclusion
We decided to use ClickHouse to serve this type of query. We can always optimize the queries further using materialized views and parallelization, by pre-processing the event stream with Apache Flink before loading it into ClickHouse.
Performance is so good that we will probably not even have to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.
Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find that JanusGraph has a friendly ecosystem that makes it easy to get up to speed with the product. On top of that, the server is configured the traditional Java way, which will make people unfamiliar with Java cry tears of blood:
host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
airlines: conf/airlines.properties
}
scriptEngines: {
gremlin-groovy: {
plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}
serializers:
# GraphBinary is here to replace Gryo and Graphson
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
# Gryo and Graphson, latest versions
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
# Older serialization versions for backwards compatibility:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
- { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
consoleReporter: {enabled: false, interval: 180000},
csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
jmxReporter: {enabled: false},
slf4jReporter: {enabled: true, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false}
I managed to accidentally "bring down" the BerkeleyDB version of JanusGraph.
The documentation is rather crooked when it comes to indexes, since managing indexes requires some rather strange shamanism in Groovy. For example, creating an index must be done by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:
graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()
Afterword
In a sense, the experiment above is a comparison of warm with soft, that is, of apples with oranges. If you think about it, a graph DBMS performs different operations to obtain the same results. However, as part of the tests, I also ran an experiment with a query like this:
g.V().hasLabel('ZoneStep').has('id',0)
.repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()
which reflects walking distance. However, even on such data the graph DBMS showed results that went beyond a few seconds... This, of course, is because there were paths like 0 -> X -> Y ... -> 1, which the graph engine also checked.
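To see why such paths add work, the sketch below counts how many hops a walk of this kind makes compared with how many chains it finally reports. It is a simplified model under stated assumptions, not a measurement of JanusGraph: each track is assumed to be a linear chain of zone ids, and `walk_cost` is a hypothetical helper.

```python
from typing import List, Tuple


def walk_cost(tracks: List[List[int]], start_zone: int,
              end_zone: int) -> Tuple[int, int]:
    """Return (matching chains, hops walked) for a repeat/until style walk."""
    matches = 0
    hops = 0
    for zones in tracks:
        for pos, zone in enumerate(zones):
            if zone != start_zone:
                continue
            # Walk forward from this start step until the end zone is hit
            # or the chain runs out; every step taken costs one hop.
            for nxt in zones[pos + 1:]:
                hops += 1
                if nxt == end_zone:
                    matches += 1
                    break
    return matches, hops


tracks = [[0, 1], [0, 5, 7, 1], [0, 5, 7, 19]]
matches, hops = walk_cost(tracks, 0, 1)
print(matches, hops)
```

In this toy data only one chain is the direct 0 -> 1 step, yet the walk also has to follow the longer chains to their end, including the one that never reaches zone 1.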
Even for a query like:
g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()
I was unable to get a response with a processing time of under a second.
The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which ClickHouse demonstrates here with far greater efficiency. The use case presented in this article is a clear anti-pattern for graph DBMSs, although it looks well suited to modeling in their paradigm.
Source: www.habr.com