Hello everyone. We are developing a product for offline traffic analysis. One of the project's tasks is the statistical analysis of visitor routes across zones.
As part of this task, users can ask the system queries such as:
- how many visitors went from zone "A" to zone "B";
- how many visitors went from zone "A" to zone "B" via zone "C" and then via zone "D";
- how long it took a certain type of visitor to get from zone "A" to zone "B".
and a number of similar analytical queries.
A visitor's movement between zones is a directed graph. After some reading online, I found that graph DBMSs are also used for analytical reports, and I wanted to see how they would cope with queries like these (TL;DR: poorly).
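To make the three kinds of question concrete, here is a minimal in-memory sketch in Python. It is only an illustration: the sample tracks, the function names, and the "first match in time order" rule are assumptions of this sketch, not part of the product.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data: visitor id -> steps as (zone, unix_time), in time order.
Track = List[Tuple[str, int]]
TRACKS: Dict[int, Track] = {
    1: [("A", 100), ("C", 160), ("B", 220)],
    2: [("A", 100), ("B", 130)],
    3: [("B", 100), ("A", 150)],  # B comes before A, so this is not an A -> B trip
    4: [("A", 100), ("C", 110), ("D", 120), ("B", 200)],
}


def match_route(track: Track, route: List[str]) -> Optional[List[int]]:
    """Greedily match `route` as an ordered subsequence of `track`.

    Returns the timestamps of the matched steps, or None if the zones of
    `route` do not occur in that order.
    """
    times: List[int] = []
    for zone, ts in track:
        if len(times) < len(route) and zone == route[len(times)]:
            times.append(ts)
    return times if len(times) == len(route) else None


def count_visitors(tracks: Dict[int, Track], route: List[str]) -> int:
    """How many visitors passed through the zones of `route` in order."""
    return sum(1 for t in tracks.values() if match_route(t, route) is not None)


def trip_durations(tracks: Dict[int, Track], route: List[str]) -> Dict[int, int]:
    """Seconds between the first and the last matched step, per visitor."""
    result: Dict[int, int] = {}
    for visitor, track in tracks.items():
        times = match_route(track, route)
        if times is not None:
            result[visitor] = times[-1] - times[0]
    return result


print(count_visitors(TRACKS, ["A", "B"]))            # 3
print(count_visitors(TRACKS, ["A", "C", "D", "B"]))  # 1
print(trip_durations(TRACKS, ["A", "B"]))            # {1: 120, 2: 30, 4: 100}
```

This works for a handful of tracks held in memory; the rest of the article is about getting the same answers from a database at scale.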
I chose the JanusGraph DBMS, which offers:
- BerkeleyDB, Apache Cassandra, and Scylla as storage backends;
- complex indexes that can be stored in Lucene, Elasticsearch, or Solr.
The authors of JanusGraph write that it is suitable for both OLTP and OLAP.
I have worked with BerkeleyDB, Apache Cassandra, Scylla, and ES, and these products are used often in our systems, so I was optimistic about testing this graph DBMS. Choosing BerkeleyDB over RocksDB struck me as odd, but that is probably down to transaction requirements. In any case, for scalable production use, a backend on Cassandra or Scylla is suggested.
I did not consider Neo4j because clustering requires the commercial edition, that is, the product is not fully open source.
Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!
First, I drew a graph designed according to the canons of graph DBMSs.
There is a Zone entity, responsible for a zone. If a ZoneStep belongs to that Zone, it refers to it. Ignore the Area, ZoneTrack, and Person entities: they belong to the domain and are not part of the test. Overall, a chain-search query for such a graph structure looks like this:
g.V().hasLabel('Zone').has('id',0).in_()
.repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()
In plain language this reads roughly as follows: find the Zone with ID=0, take all the vertices that have an edge to it (ZoneStep), walk forward without going back until you find ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
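The same walk can be sketched outside the database in plain Python. This is a simplified model, not what JanusGraph does internally: the `belongs` and `goes` dictionaries and the step names are hypothetical, the `goes` edges are assumed to be acyclic, and the sketch ignores that `out()` in the real query also follows `belongs` edges into Zone vertices, which are dead ends.

```python
from typing import Dict, List

# Hypothetical toy graph: each ZoneStep belongs to one zone and may lead to next steps.
belongs: Dict[str, int] = {
    "s1": 0, "s2": 5, "s3": 19,   # track 0 -> 5 -> 19
    "s4": 0, "s5": 19,            # track 0 -> 19
    "s6": 0, "s7": 3,             # track 0 -> 3, never reaches zone 19
}
goes: Dict[str, List[str]] = {"s1": ["s2"], "s2": ["s3"], "s4": ["s5"], "s6": ["s7"]}


def count_chains(start_zone: int, end_zone: int) -> int:
    """Count chains that start in `start_zone` and reach a step in `end_zone`."""
    # g.V().hasLabel('Zone').has('id', start).in_(): steps that belong to the start zone
    frontier = [step for step, zone in belongs.items() if zone == start_zone]
    found = 0
    while frontier:
        next_frontier: List[str] = []
        for step in frontier:
            for succ in goes.get(step, []):       # repeat(out()): follow 'goes' edges
                if belongs.get(succ) == end_zone:  # until(...): the step is in the end zone
                    found += 1
                else:
                    next_frontier.append(succ)
        frontier = next_frontier
    return found


print(count_chains(0, 19))  # 2
```

Every traverser has to be carried hop by hop until it either reaches the end zone or runs out of edges, which hints at where the time goes on large data.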
I don't claim to know all the subtleties of graph search, but I built this query following a book on the subject.
I loaded 50 thousand tracks, each 3 to 20 points long, into a JanusGraph graph database using the BerkeleyDB backend, and created indexes.
Python loading script:
from random import random
from time import time

from init import g, graph  # assumed to be a local module that provides the traversal source

if __name__ == '__main__':
    points = []
    max_zones = 19
    zcache = dict()
    # Create the Zone vertices (ids 0..19) and cache them
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()
    startZ = zcache[0]
    endZ = zcache[max_zones]

    for i in range(0, 10000):
        if not i % 100:
            print(i)
        # Every track starts with a step that belongs to Zone 0
        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()
        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                # With probability 0.3 the track ends in the last Zone
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                # Otherwise the step lands in a random Zone and the track continues
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                start = pt

    count = g.V().count().next()
    print(count)
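To get a feel for the data this script produces, the track-building rule can be reproduced without a database. The sketch below is an assumption-labelled stand-in: `make_track` and its parameters are names invented here, and it records only zone ids instead of creating vertices. Since each iteration ends the track with probability 0.3, the loop runs about 1/0.3 ≈ 3.3 times on average, so a track has about 4.3 points on average.

```python
import random
from typing import List


def make_track(rng: random.Random, max_zones: int = 19,
               stop_probability: float = 0.3) -> List[int]:
    """Zone ids of one track, built with the same rule as the loading script."""
    track = [0]  # every track starts in zone 0
    while True:
        if rng.random() < stop_probability:
            track.append(max_zones)  # the final step is always in the last zone
            return track
        track.append(int(rng.random() * max_zones))  # random zone in 0..max_zones-1


rng = random.Random(42)  # fixed seed so the run is repeatable
tracks = [make_track(rng) for _ in range(10000)]
mean_length = sum(len(t) for t in tracks) / len(tracks)
print(round(mean_length, 2), min(len(t) for t in tracks), max(len(t) for t in tracks))
```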
We used a VM with 4 cores and 16 GB of RAM on an SSD. JanusGraph was deployed with this command:
docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest
In this case, the data and the indexes used for exact-match searches are stored in BerkeleyDB. Running the query shown earlier took several tens of seconds.
By running four copies of the script above in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stack traces (and we all love reading Java stack traces) in the Docker logs.
After some thought, I decided to simplify the graph schema, reasoning that searching by entity attributes would be faster than searching by edges. As a result, my query became:
g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()
In plain language this reads roughly as follows: find the ZoneSteps with ID=0, walk forward without going back until you reach a ZoneStep with ID=19, and count the number of such chains.
I also simplified the loading script above so that it no longer creates unnecessary edges and relies on attributes only.
The query still took several seconds to complete, which was unacceptable for our task, since it rules out ad hoc queries of any kind.
I also tried running JanusGraph on Scylla, as the fastest Cassandra implementation, but this did not change performance significantly either.
So even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully accept that there may be something I don't know and that JanusGraph can be made to run this search in a fraction of a second, but I did not manage to do so.
Since the problem still had to be solved, I started thinking about JOINs and pivot tables, which did not inspire optimism in terms of elegance but could well be workable in practice.
Our project already uses ClickHouse, so I decided to test my idea on this analytical DBMS.
I deployed ClickHouse using a simple recipe:
sudo docker run -d --name clickhouse_1 \
     --ulimit nofile=262144:262144 \
     -v /opt/clickhouse/log:/var/log/clickhouse-server \
     -v /opt/clickhouse/data:/var/lib/clickhouse \
     yandex/clickhouse-server
I created a database and a table in it like this:
CREATE TABLE
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64)
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192
I filled it with data using the following script:
from time import time
from clickhouse_driver import Client
from random import random

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max_zones = 20  # renamed from `max` to avoid shadowing the builtin

for r in range(0, 100000):
    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))

    # Every track starts in zone 0
    data = [{
        'area': 0,
        'zone': 0,
        'person': r
    }]

    # Add random intermediate zones (1..18); stop with probability 0.3 per step
    while True:
        if random() < 0.3:
            break
        data.append({
            'area': 0,
            'zone': int(random() * (max_zones - 2)) + 1,
            'person': r
        })

    # Every track ends in zone 19
    data.append({
        'area': 0,
        'zone': max_zones - 1,
        'person': r
    })

    # One INSERT per track: all of its steps go in as a single batch
    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )
Because the inserts go in batches, loading was faster than with JanusGraph.
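If one wanted to push batching further than one INSERT per track, rows could be accumulated across tracks first. The helper below is a hypothetical sketch, not something the experiment used: `insert_in_batches` and the `insert` callback are names invented here, and the batch size is arbitrary.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]


def insert_in_batches(rows: Iterable[Row],
                      insert: Callable[[List[Row]], None],
                      batch_size: int = 10000) -> int:
    """Send rows to `insert` in chunks of at most `batch_size`; return the call count."""
    batch: List[Row] = []
    calls = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            insert(batch)
            calls += 1
            batch = []
    if batch:  # flush the incomplete last batch
        insert(batch)
        calls += 1
    return calls


# Demo with a fake sink that only records batch sizes.
sizes: List[int] = []
rows = ({'area': 0, 'zone': 0, 'person': p} for p in range(25))
calls = insert_in_batches(rows, lambda b: sizes.append(len(b)), batch_size=10)
print(calls, sizes)  # 3 [10, 10, 5]
```

With the driver from the script above, the callback could be something like `lambda b: client.execute('INSERT INTO steps (area, zone, person) VALUES', b)`.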
I built two queries using JOIN. To go from point A to point B:
SELECT s1.person AS person,
       s1.zone,
       s1.when,
       s2.zone,
       s2.when
FROM
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 0)) AS s1 ANY INNER JOIN
  (SELECT *
   FROM steps AS s2
   WHERE (area = 0)
     AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when
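The intent of this join can be sketched in plain Python: a person qualifies if they have a step in zone A that is not later than one of their steps in zone B. This is a sketch of the intent only. The function name and sample rows are invented here, and as far as I understand ANY joins, the SQL keeps a single matching right-hand row per left-hand row, so it may give a different answer when a person visits the same zone several times.

```python
from typing import Dict, Iterable, List

Step = Dict[str, int]  # keys: area, zone, person, when


def persons_a_then_b(steps: Iterable[Step], zone_a: int, zone_b: int,
                     area: int = 0) -> List[int]:
    """Persons with some step in zone_a at or before some step in zone_b."""
    earliest_a: Dict[int, int] = {}
    latest_b: Dict[int, int] = {}
    for s in steps:
        if s["area"] != area:
            continue
        person, when = s["person"], s["when"]
        if s["zone"] == zone_a:
            earliest_a[person] = min(when, earliest_a.get(person, when))
        if s["zone"] == zone_b:
            latest_b[person] = max(when, latest_b.get(person, when))
    # A pair (a-step <= b-step) exists exactly when min(a times) <= max(b times).
    return sorted(p for p, t in earliest_a.items()
                  if p in latest_b and t <= latest_b[p])


# Hypothetical sample rows.
sample = [
    {"area": 0, "zone": 0, "person": 1, "when": 1},
    {"area": 0, "zone": 19, "person": 1, "when": 5},
    {"area": 0, "zone": 19, "person": 2, "when": 1},  # reached 19 before 0
    {"area": 0, "zone": 0, "person": 2, "when": 5},
    {"area": 0, "zone": 0, "person": 3, "when": 2},   # never reached 19
    {"area": 0, "zone": 0, "person": 4, "when": 3},
    {"area": 0, "zone": 19, "person": 4, "when": 3},  # same timestamp still counts
]
print(persons_a_then_b(sample, 0, 19))  # [1, 4]
```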
To pass through 3 points:
SELECT s3.person,
       s1z,
       s1w,
       s2z,
       s2w,
       s3.zone,
       s3.when
FROM
  (SELECT s1.person AS person,
          s1.zone AS s1z,
          s1.when AS s1w,
          s2.zone AS s2z,
          s2.when AS s2w
   FROM
     (SELECT *
      FROM steps
      WHERE (area = 0)
        AND (zone = 0)) AS s1 ANY INNER JOIN
     (SELECT *
      FROM steps AS s2
      WHERE (area = 0)
        AND (zone = 3)) AS s2 USING person
   WHERE s1.when <= s2.when) p ANY INNER JOIN
  (SELECT *
   FROM steps
   WHERE (area = 0)
     AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when
The queries look quite scary, of course; for real use you would need a software generator to build them. However, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time of a count(*) query passing through 3 points:
SELECT count(*)
FROM
(
    SELECT
        s1.person AS person,
        s1.zone AS s1z,
        s1.when AS s1w,
        s2.zone AS s2z,
        s2.when AS s2w
    FROM
    (
        SELECT *
        FROM steps
        WHERE (area = 0) AND (zone = 0)
    ) AS s1
    ANY INNER JOIN
    (
        SELECT *
        FROM steps AS s2
        WHERE (area = 0) AND (zone = 3)
    ) AS s2 USING (person)
    WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN
(
    SELECT *
    FROM steps
    WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when
┌─count()─┐
│ 11592 │
└─────────┘
1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
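The generator mentioned above might start out as something like the following sketch. It is an assumption-heavy illustration: `build_route_count_query`, `zone_subquery`, and the `last_when` and `pN` aliases are names invented here, the text it produces follows the shape of the hand-written queries but was not run against a ClickHouse server, and zone ids are forced through `int()` so that nothing untrusted is spliced into the SQL.

```python
from typing import Sequence


def zone_subquery(area: int, zone: int, alias: str) -> str:
    """One filtered scan of the steps table, aliased for use in a join."""
    return (
        f"(SELECT * FROM steps WHERE (area = {int(area)}) "
        f"AND (zone = {int(zone)})) AS {alias}"
    )


def build_route_count_query(zones: Sequence[int], area: int = 0) -> str:
    """Nest one ANY INNER JOIN per additional zone, as in the manual queries."""
    if len(zones) < 2:
        raise ValueError("a route needs at least two zones")

    first = zone_subquery(area, zones[0], "s1")
    second = zone_subquery(area, zones[1], "s2")
    query = (
        "SELECT s1.person AS person, s2.when AS last_when "
        f"FROM {first} ANY INNER JOIN {second} USING (person) "
        "WHERE s1.when <= s2.when"
    )
    # Each further zone wraps the query so far and joins one more scan to it.
    for n, zone in enumerate(zones[2:], start=3):
        nxt = zone_subquery(area, zone, f"s{n}")
        query = (
            f"SELECT s{n}.person AS person, s{n}.when AS last_when "
            f"FROM ({query}) AS p{n} ANY INNER JOIN {nxt} USING (person) "
            f"WHERE p{n}.last_when <= s{n}.when"
        )
    return f"SELECT count(*) FROM ({query})"


sql = build_route_count_query([0, 3, 19])
print(sql)
```

Carrying only `person` and the time of the last matched step keeps every nesting level the same shape, so the route can be of any length.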
A note about IOPS. While loading data, JanusGraph generated a fairly high number of IOPS (1000-1300 for four data-loading threads), and IOWAIT was quite high. At the same time, ClickHouse put minimal load on the disk subsystem.
Conclusion
We decided to use ClickHouse to serve this type of query. We can always optimize the queries further with views and parallelization, by pre-processing the event stream with Apache Flink before loading it into ClickHouse.
Performance is so good that we probably won't even need to think about pivoting tables programmatically. Previously, we had to pivot data retrieved from Vertica by exporting it to Apache Parquet.
Unfortunately, yet another attempt to use a graph DBMS was unsuccessful. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. Moreover, the server is configured in the traditional Java way, which will make people unfamiliar with Java cry tears of blood:
host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
  airlines: conf/airlines.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}
serializers:
# GraphBinary is here to replace Gryo and Graphson
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
# Gryo and Graphson, latest versions
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
# Older serialization versions for backwards compatibility:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
- { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}
I managed to "bring down" the BerkeleyDB version of JanusGraph.
The documentation is rather crooked when it comes to indexes, since managing them requires some fairly strange shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:
graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()
Afterword
In a sense, the experiment above compares apples and oranges. If you think about it, a graph DBMS performs different operations to arrive at the same results. Still, as part of the tests I also ran an experiment with a query like this:
g.V().hasLabel('ZoneStep').has('id',0)
.repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()
which reflects walking distance. However, even on such data the graph DBMS took more than a few seconds to return results... This, of course, is because there were paths like 0 -> X -> Y ... -> 1, which the graph engine also checked.
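The extra work can be illustrated with a small Python sketch that counts how many vertices a walk-until-found search touches, compared with a check of direct neighbours only. It is a toy model under stated assumptions: tracks are plain chains of zone ids invented for the example, so every vertex has at most one successor and the simplePath() condition holds trivially. It says nothing about JanusGraph's actual execution plan.

```python
from typing import List, Tuple

# Hypothetical tracks as chains of zone ids.
TRACKS: List[List[int]] = [
    [0, 1, 19],
    [0, 5, 1, 19],
    [0, 7, 8, 19],
    [0, 0, 1, 19],
]


def walk_until(tracks: List[List[int]], start: int, end: int) -> Tuple[int, int]:
    """Model repeat(out()).until(id == end): return (matches, vertices visited)."""
    matches = 0
    visited = 0
    for track in tracks:
        for i, zone in enumerate(track):
            if zone != start:
                continue
            for nxt in track[i + 1:]:  # keep walking forward along the chain
                visited += 1
                if nxt == end:
                    matches += 1
                    break
    return matches, visited


def direct_neighbours(tracks: List[List[int]], start: int, end: int) -> int:
    """Model out().has('id', end): only the immediate successor is checked."""
    return sum(1 for t in tracks for a, b in zip(t, t[1:]) if a == start and b == end)


print(walk_until(TRACKS, 0, 1))         # (4, 9)
print(direct_neighbours(TRACKS, 0, 1))  # 2
```

Even in this tiny example the open-ended walk touches more vertices than it returns matches, because every start vertex has to be followed until the target is found or the chain ends.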
Even for a query like this:
g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()
I could not get a usable response with a processing time under one second.
The moral of the story is that a beautiful idea and paradigmatic modeling do not necessarily lead to the desired result, which ClickHouse delivered far more efficiently here. The use case described in this article is a clear anti-pattern for graph DBMSs, even though it looks like a natural fit for modeling in their paradigm.
Source: will.com