Hello everyone. We are building a product for outdoor transport monitoring. One of the project's tasks involves statistical analysis of visitor routes through a set of zones.
As part of this task, users can ask the system questions like:
- how many visitors went from zone "A" to zone "B";
- how many visitors went from zone "A" to zone "B" via zone "C" and then via zone "D";
- how long it takes a given type of visitor to get from zone "A" to zone "B";
and a number of similar analytical questions.
A visitor's movement across zones is a directed graph. After some reading on the Web, I learned that graph DBMSs are also used for analytical reporting, and I wanted to see how graph DBMSs would cope with such queries (TL;DR: badly).
I chose to use the graph DBMS JanusGraph, which offers:
- the BerkeleyDB storage backend, as well as Apache Cassandra and Scylla;
- complex indexes that can be stored in Lucene, Elasticsearch, or Solr.
JanusGraph's authors claim it is suitable for both OLTP and OLAP.
I had worked with BerkeleyDB, Apache Cassandra, Scylla, and ES before, and these products are heavily used in our system, so I decided to give this graph DBMS a try. I found the choice of BerkeleyDB over RocksDB strange, but that is probably down to licensing terms. In any case, for a scalable production setup the recommendation is to use a Cassandra- or Scylla-based backend.
I did not consider Neo4j, because clustering requires the commercial edition, i.e. the product is not fully open source.
Graph DBMSs say: "If it looks like a graph, treat it like a graph!" Beautiful!
First, I sketched a graph modeled strictly by the canons of graph DBMSs:
There is a Zone entity responsible for an area. If a ZoneStep belongs to this Zone, it references it. As for the Area, ZoneTrack, and Person entities, pay them no attention: they belong to the domain and are not part of the test. All in all, a chain-search query for such a graph model looks like this:
g.V().hasLabel('Zone').has('id',0).in_()
.repeat(__.out()).until(__.out().hasLabel('Zone').has('id',19)).count().next()
In plain English: find the Zone with ID=0, take every vertex with an edge into it (ZoneStep), traverse without backtracking until you hit ZoneSteps that have an edge to the Zone with ID=19, and count the number of such chains.
I do not claim to grasp all the intricacies of searching over graphs, but this query was built based on this book.
I loaded 50 thousand tracks, from 3 to 20 points long, into a JanusGraph database using the BerkeleyDB backend, having created the indexes according to the manual.

Python loading script:
from random import random
from time import time

from init import g, graph  # local helper module (not shown in the article) that opens the Gremlin connection

if __name__ == '__main__':
    points = []
    max_zones = 19
    zcache = dict()
    # Create the Zone vertices once and cache them
    for i in range(0, max_zones + 1):
        zcache[i] = g.addV('Zone').property('id', i).next()
    startZ = zcache[0]
    endZ = zcache[max_zones]
    for i in range(0, 10000):
        if not i % 100:
            print(i)
        # Every track starts in zone 0
        start = g.addV('ZoneStep').property('time', int(time())).next()
        g.V(start).addE('belongs').to(startZ).iterate()
        while True:
            pt = g.addV('ZoneStep').property('time', int(time())).next()
            end_chain = random()
            if end_chain < 0.3:
                # ~30% chance per step to finish the track in the last zone
                g.V(pt).addE('belongs').to(endZ).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                break
            else:
                # Otherwise hop to a random intermediate zone
                zone_id = int(random() * max_zones)
                g.V(pt).addE('belongs').to(zcache[zone_id]).iterate()
                g.V(start).addE('goes').to(pt).iterate()
                start = pt
    count = g.V().count().next()
    print(count)
A VM with 4 cores and 16 GB of RAM on an SSD was used. JanusGraph was deployed with this command:
docker run --name janusgraph -p8182:8182 janusgraph/janusgraph:latest
In this setup the data, as well as the indexes used for exact-match lookups, are stored in BerkeleyDB. After running the query given earlier, I got times of several tens of seconds.
By running four copies of the above script in parallel, I managed to turn the DBMS into a pumpkin, with a cheerful stream of Java stack traces (and we all love reading Java stack traces) in the Docker logs.
After some thought, I decided to simplify the graph model to the following:
Searching by entity attributes is faster than searching by edges. As a result, my query turned into this:
g.V().hasLabel('ZoneStep').has('id',0).repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',19)).count().next()
In plain English: find a ZoneStep with ID=0, traverse without backtracking until you find a ZoneStep with ID=19, and count the number of such chains.
I also simplified the loading script given above so as not to create unnecessary relationships, limiting myself to attributes.
The query still took several seconds to complete, which was unacceptable for our task and entirely unsuitable for ad-hoc queries of arbitrary shape.
I also tried running JanusGraph on Scylla as the fastest Cassandra implementation, but that did not lead to any significant performance change either.
So, even though "it looks like a graph", I could not get the graph DBMS to process it quickly. I fully allow that there is something I don't know, and that JanusGraph can be made to perform this search in a fraction of a second; however, I did not manage it.
Since the problem still had to be solved, I started thinking about JOINs and table pivots, which did not inspire optimism in terms of elegance, but could be a perfectly workable option in practice.
Our project already uses ClickHouse, so I decided to test my research on this analytical DBMS.
I deployed ClickHouse using a simple recipe:
sudo docker run -d --name clickhouse_1 \
    --ulimit nofile=262144:262144 \
    -v /opt/clickhouse/log:/var/log/clickhouse-server \
    -v /opt/clickhouse/data:/var/lib/clickhouse \
    yandex/clickhouse-server
In it I created a database and a table, like this:
CREATE TABLE
db.steps (`area` Int64, `when` DateTime64(1, 'Europe/Moscow') DEFAULT now64(), `zone` Int64, `person` Int64)
ENGINE = MergeTree() ORDER BY (area, zone, person) SETTINGS index_granularity = 8192
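The ORDER BY clause defines the MergeTree sort key, which is what makes the (area, zone) filters in the queries below cheap: ClickHouse can skip whole granules of sorted data instead of scanning every row. A toy Python illustration of the idea (a sketch of sorted-key pruning, not ClickHouse's actual mechanism):

```python
import bisect

# Toy model of a MergeTree sort key: rows kept sorted by (area, zone, person).
rows = sorted([(0, z, p) for p in range(100) for z in (0, 5, 19)])

def range_for(area, zone):
    # Binary-search the sorted key space instead of scanning every row,
    # roughly how the sort key lets ClickHouse skip granules.
    lo = bisect.bisect_left(rows, (area, zone, -1))
    hi = bisect.bisect_right(rows, (area, zone, float('inf')))
    return rows[lo:hi]

print(len(range_for(0, 19)))  # 100: only the zone-19 slice is touched
```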
I filled it with data using the following script:
from time import time
from random import random

from clickhouse_driver import Client

client = Client('vm-12c2c34c-df68-4a98-b1e5-a4d1cef1acff.domain',
                database='db',
                password='secret')

max = 20

for r in range(0, 100000):
    if r % 1000 == 0:
        print("CNT: {}, TS: {}".format(r, time()))
    # Every person starts in zone 0
    data = [{
        'area': 0,
        'zone': 0,
        'person': r
    }]
    # Random walk through intermediate zones, ~30% chance to stop per step
    while True:
        if random() < 0.3:
            break
        data.append({
            'area': 0,
            'zone': int(random() * (max - 2)) + 1,
            'person': r
        })
    # Every person ends in the last zone
    data.append({
        'area': 0,
        'zone': max - 1,
        'person': r
    })
    client.execute(
        'INSERT INTO steps (area, zone, person) VALUES',
        data
    )
Since the inserts arrive in batches, filling went much faster than with JanusGraph.
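The loading script above ends each track with probability 0.3 per step, so the number of rows per person follows a geometric pattern. A quick simulation (my own sanity check, not from the article) of the expected rows per person:

```python
from random import random, seed

def rows_for_one_person():
    # Mirror the generator: one zone-0 row, a geometric number of
    # intermediate rows (continue with probability 0.7), one final row.
    n = 2  # the start (zone 0) and end (zone max - 1) rows
    while random() >= 0.3:
        n += 1
    return n

seed(42)
samples = [rows_for_one_person() for _ in range(100000)]
mean = sum(samples) / len(samples)
# Expected value: 2 + 0.7/0.3, i.e. about 4.33 rows per person
print(round(mean, 2))
```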
I built two queries using JOIN. For going from point A to point B:
SELECT s1.person AS person,
s1.zone,
s1.when,
s2.zone,
s2.when
FROM
(SELECT *
FROM steps
WHERE (area = 0)
AND (zone = 0)) AS s1 ANY INNER JOIN
(SELECT *
FROM steps AS s2
WHERE (area = 0)
AND (zone = 19)) AS s2 USING person
WHERE s1.when <= s2.when
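To make the join semantics concrete, here is a plain-Python sketch of what this query computes, on hypothetical toy rows (not the article's dataset); ANY INNER JOIN keeps at most one match per person:

```python
# Toy rows: (person, zone, when) - hypothetical data for illustration.
steps = [
    (1, 0, 10), (1, 5, 12), (1, 19, 15),   # person 1 went 0 -> ... -> 19
    (2, 0, 20), (2, 7, 22),                # person 2 never reached zone 19
    (3, 19, 5), (3, 0, 8),                 # person 3 visited 19 BEFORE 0
]

def went_from_to(rows, zone_a, zone_b):
    """Persons with a zone_a visit followed in time by a zone_b visit,
    mirroring ANY INNER JOIN ... USING person WHERE s1.when <= s2.when."""
    s1 = {p: w for (p, z, w) in rows if z == zone_a}  # one row per person ("ANY")
    s2 = {p: w for (p, z, w) in rows if z == zone_b}
    return sorted(p for p in s1 if p in s2 and s1[p] <= s2[p])

print(went_from_to(steps, 0, 19))  # [1]
```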
For going through 3 points:
SELECT s3.person,
s1z,
s1w,
s2z,
s2w,
s3.zone,
s3.when
FROM
(SELECT s1.person AS person,
s1.zone AS s1z,
s1.when AS s1w,
s2.zone AS s2z,
s2.when AS s2w
FROM
(SELECT *
FROM steps
WHERE (area = 0)
AND (zone = 0)) AS s1 ANY INNER JOIN
(SELECT *
FROM steps AS s2
WHERE (area = 0)
AND (zone = 3)) AS s2 USING person
WHERE s1.when <= s2.when) p ANY INNER JOIN
(SELECT *
FROM steps
WHERE (area = 0)
AND (zone = 19)) AS s3 USING person
WHERE p.s2w <= s3.when
The queries are, of course, rather scary; for real use you would want to build a software generator harness. Still, they work, and they work fast. Both the first and the second query complete in under 0.1 seconds. Here is an example of the execution time for a count(*) query over a route through 3 points:
SELECT count(*)
FROM
(
SELECT
s1.person AS person,
s1.zone AS s1z,
s1.when AS s1w,
s2.zone AS s2z,
s2.when AS s2w
FROM
(
SELECT *
FROM steps
WHERE (area = 0) AND (zone = 0)
) AS s1
ANY INNER JOIN
(
SELECT *
FROM steps AS s2
WHERE (area = 0) AND (zone = 3)
) AS s2 USING (person)
WHERE s1.when <= s2.when
) AS p
ANY INNER JOIN
(
SELECT *
FROM steps
WHERE (area = 0) AND (zone = 19)
) AS s3 USING (person)
WHERE p.s2w <= s3.when
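The three-point query chains the same pattern: the result of the first join becomes the left side of the second. A plain-Python sketch of that chaining, again on hypothetical toy rows:

```python
# Toy rows: (person, zone, when) - hypothetical data for illustration.
steps = [
    (1, 0, 10), (1, 3, 12), (1, 19, 15),   # visited 0, 3, 19 in time order
    (2, 0, 20), (2, 19, 21), (2, 3, 25),   # visited zone 3 only AFTER 19
]

def chain_count(rows, zones):
    """Count persons who visited the given zones in chronological order,
    like the nested ANY INNER JOINs in the count(*) query above."""
    # last_when[p] = time of p's visit to the previous zone in the chain
    last_when = {p: w for (p, z, w) in rows if z == zones[0]}
    for zone in zones[1:]:
        visits = {p: w for (p, z, w) in rows if z == zone}
        last_when = {p: visits[p] for p in last_when
                     if p in visits and last_when[p] <= visits[p]}
    return len(last_when)

print(chain_count(steps, [0, 3, 19]))  # 1
```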
┌─count()─┐
│ 11592 │
└─────────┘
1 rows in set. Elapsed: 0.068 sec. Processed 250.03 thousand rows, 8.00 MB (3.69 million rows/s., 117.98 MB/s.)
A note on IOPS. While populating the data, JanusGraph generated quite a high number of IOPS (1000-1300 with four population threads) and IOWAIT was quite high. At the same time, ClickHouse put minimal load on the disk subsystem.
Conclusion
We decided to use ClickHouse to serve this kind of request. We can optimize the queries further with materialized views and parallelization, after pre-processing the event stream with Apache Flink before loading it into ClickHouse.
Performance is so good that we probably won't even need to think about pivoting tables programmatically. Previously, we had to pivot data pulled from Vertica by exporting it to Apache Parquet.
Unfortunately, another attempt to use a graph DBMS was unsuccessful. I did not find JanusGraph to have a friendly ecosystem that makes it easy to get up to speed with the product. At the same time, the server is configured in the traditional Java way, the kind that makes people unfamiliar with Java weep tears of blood:
host: 0.0.0.0
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
ConfigurationManagementGraph: conf/janusgraph-cql-configurationgraph.properties,
airlines: conf/airlines.properties
}
scriptEngines: {
gremlin-groovy: {
plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/airline-sample.groovy]}}}}
serializers:
# GraphBinary is here to replace Gryo and Graphson
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
# Gryo and Graphson, latest versions
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
# Older serialization versions for backwards compatibility:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
- { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
consoleReporter: {enabled: false, interval: 180000},
csvReporter: {enabled: false, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
jmxReporter: {enabled: false},
slf4jReporter: {enabled: true, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false}
I did accidentally manage to "bring down" the BerkeleyDB version of JanusGraph.
The documentation is rather crooked when it comes to indexes, since managing them requires performing some fairly peculiar shamanism in Groovy. For example, an index has to be created by writing code in the Gremlin console (which, by the way, does not work out of the box). From the official JanusGraph documentation:
graph.tx().rollback() //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()
Afterword
In a sense, the experiment above is a comparison of warm and soft. If you think about it, a graph DBMS performs other operations to obtain the same results. However, as part of the experiments I also ran a test with a query like:
g.V().hasLabel('ZoneStep').has('id',0)
.repeat(__.out().simplePath()).until(__.hasLabel('ZoneStep').has('id',1)).count().next()
which reflects a one-hop walk. Yet even on such data the graph DBMS produced a result that took more than a few seconds... The reason, of course, is that there were also paths of the form 0 -> X -> Y ... -> 1, which the graph engine checked as well.
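The combinatorial blow-up behind this is easy to reproduce: in a dense graph the number of simple paths between two vertices grows factorially, and a repeat(...).simplePath() traversal may have to enumerate all of them. A self-contained sketch (a toy complete graph, not the article's data):

```python
def count_simple_paths(adj, src, dst):
    """DFS enumerating every simple (non-revisiting) path src -> dst,
    essentially what repeat(out().simplePath()).until(...) must explore."""
    def dfs(v, visited):
        if v == dst:
            return 1
        total = 0
        for w in adj[v]:
            if w not in visited:
                total += dfs(w, visited | {w})
        return total
    return dfs(src, {src})

# Complete graph on 10 vertices: even between DIRECT neighbours there are
# over a hundred thousand distinct simple paths to check.
n = 10
adj = {v: [w for w in range(n) if w != v] for v in range(n)}
print(count_simple_paths(adj, 0, 1))  # 109601
```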
Even for a query like:
g.V().hasLabel('ZoneStep').has('id',0).out().has('id',1).count().next()
I was unable to get a response with a processing time under a second.
The moral of the story is that a beautiful idea and paradigmatic modeling did not lead to the desired result, which was demonstrated with far greater efficiency using ClickHouse. The use case presented in this article is a clear anti-pattern for graph DBMSs, even though it looks perfectly suitable for modeling in their paradigm.
Source: www.habr.com