What's going on with RDF repositories now?

The Semantic Web and Linked Data are like outer space: there is no life there. As for going there for any length of time... I don't know what you were told as a child in response to "I want to become an astronaut." But you can watch what is happening from Earth; becoming an amateur astronomer, or even a professional one, is much easier.

This article covers recent trends, no more than a few months old, from the world of RDF storage. The metaphor in the first paragraph was inspired by an epic promotional image shown below the cut.


epic picture

I. GraphQL for RDF Access

They say that GraphQL aspires to be a universal database access language. So what about using GraphQL to access RDF?

Some stores provide this capability out of the box.

If a repository does not provide this capability, you can implement it yourself by writing the appropriate resolvers. This was done, for example, in the French project DATAtourisme. Or you can write nothing at all and simply take HyperGraphQL.
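To make the word "resolver" less abstract, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions: the in-memory triple list, the `ex:` names, the `FIELD_TO_PREDICATE` mapping and both functions are hypothetical and are not taken from any of the projects mentioned above or from any GraphQL library.

```python
# Toy triple store: (subject, predicate, object) tuples. All names are hypothetical.
TRIPLES = [
    ("ex:paris", "ex:name", "Paris"),
    ("ex:louvre", "ex:name", "Louvre"),
    ("ex:louvre", "ex:locatedIn", "ex:paris"),
]

# Assumed mapping from GraphQL field names to RDF predicates.
FIELD_TO_PREDICATE = {"name": "ex:name", "locatedIn": "ex:locatedIn"}


def resolve_field(subject, field):
    """Resolver: return the objects of all triples (subject, predicate, ?o)."""
    predicate = FIELD_TO_PREDICATE[field]
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def execute(subject, selection):
    """Evaluate a nested selection ({field: sub-selection or None}) for one subject."""
    result = {}
    for field, sub_selection in selection.items():
        values = resolve_field(subject, field)
        if sub_selection is None:
            # Leaf field: return the values as they are.
            result[field] = values
        else:
            # Nested field: treat every value as a resource and recurse into it.
            result[field] = [execute(value, sub_selection) for value in values]
    return result


# Roughly corresponds to the GraphQL selection { name locatedIn { name } }.
response = execute("ex:louvre", {"name": None, "locatedIn": {"name": None}})
print(response)
```

A real resolver would translate each field into a triple pattern sent to the store rather than scan a Python list, but the shape of the recursion stays the same.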

From the point of view of an orthodox adherent of the Semantic Web and Linked Data, all this is, of course, sad, since it seems designed for integrations built around yet another data silo rather than around suitable platforms (RDF stores, of course).

Comparing GraphQL with SPARQL leaves a twofold impression.

  • On the one hand, GraphQL looks like a distant relative of SPARQL: it solves the over-fetching and multiple-request problems typical of REST, and without that it probably could not be considered a query language at all, at least for the web;
  • On the other hand, GraphQL's rigid schema is disappointing. Accordingly, its "introspection" looks very limited next to the full reflexivity of RDF. There is also no analogue of property paths, so it is not even very clear why it is called "Graph-".
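For readers who have not met property paths, the Python sketch below emulates what a SPARQL 1.1 pattern such as `:a :partOf+ ?x` asks for: everything reachable over one or more `:partOf` edges, to any depth, without knowing that depth in advance. The function name `one_or_more` and the sample data are my own assumptions, and a real engine would of course evaluate this very differently.

```python
def one_or_more(triples, start, predicate):
    """Emulate `start predicate+ ?x`: nodes reachable via one or more edges."""
    reached = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            # Follow only edges of the requested predicate; skip visited nodes,
            # which also keeps the loop finite on cyclic data.
            if s == node and p == predicate and o not in reached:
                reached.add(o)
                frontier.append(o)
    return reached


# Hypothetical data: a chain :a -> :b -> :c -> :d plus one unrelated edge.
TRIPLES = [
    (":a", ":partOf", ":b"),
    (":b", ":partOf", ":c"),
    (":c", ":partOf", ":d"),
    (":x", ":partOf", ":a"),
]

print(sorted(one_or_more(TRIPLES, ":a", ":partOf")))
```

In GraphQL the nesting depth of a selection is fixed by the query text, so a traversal of unknown depth like this one cannot be written as a single query.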

II. Adapters for MongoDB

A trend complementary to the previous one.

  • In Stardog you can now configure, in particular with the same GraphQL, a mapping of MongoDB data to virtual RDF graphs;
  • Ontotext GraphDB has recently started to allow MongoDB Query fragments to be embedded in SPARQL.

Speaking more broadly of adapters to JSON sources, which make it possible to represent the JSON stored in those sources as RDF more or less "on the fly", one can also recall the long-established SPARQL-Generate, which can be plugged into, for example, Apache Jena.
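What "representing JSON as RDF on the fly" boils down to can be shown with a small Python sketch. It is not how any of the tools named above work; the mapping rules (keys become predicates in an assumed `ex:` vocabulary, nested objects become blank nodes, list elements become repeated triples) and the function name `json_to_triples` are my own simplifying assumptions.

```python
import itertools
import json


def json_to_triples(doc, subject, vocab="ex:"):
    """Flatten a JSON object into (subject, predicate, object) tuples."""
    blank_ids = itertools.count()
    triples = []

    def walk(obj, node):
        for key, value in obj.items():
            predicate = vocab + key
            # A list yields one triple per element; a single value yields one triple.
            elements = value if isinstance(value, list) else [value]
            for element in elements:
                if isinstance(element, dict):
                    # Nested object: introduce a blank node and describe it recursively.
                    blank = "_:b%d" % next(blank_ids)
                    triples.append((node, predicate, blank))
                    walk(element, blank)
                else:
                    triples.append((node, predicate, element))

    walk(doc, subject)
    return triples


doc = json.loads('{"name": "Alice", "address": {"city": "Sofia"}, "tags": ["a", "b"]}')
triples = json_to_triples(doc, "ex:alice")
for triple in triples:
    print(triple)
```

Real adapters let you declare such mappings instead of hard-coding them, and they push query conditions down to the source rather than materializing every triple.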

Summarizing the first two trends, we can say that RDF repositories demonstrate full readiness for integration and for operating under "multiple storage" (polyglot persistence). It is known, however, that the latter has long been out of fashion and is being replaced by multi-modeling. So what about multi-modeling in the world of RDF storage?

In short, there is none. I would like to devote a separate article to multi-model DBMSs, but for now it can be noted that there are currently no multi-model DBMSs "based" on the graph model (of which RDF can be considered a variety). A small degree of multi-modeling, namely support by RDF stores for the alternative LPG graph model, is discussed in Section V.

III. OLTP vs. OLAP

However, Gartner writes that multi-modeling is a sine qua non primarily for operational DBMSs. This is understandable: with "multiple storage", the main problems arise with transactionality.

But where on the OLTP-OLAP scale are RDF repositories? I would answer: neither here nor there. To indicate what they are intended for, some third abbreviation is needed. As one option, I would suggest OLIP: Online Intellectual Processing.

And yet:

  • the MongoDB integration mechanisms implemented in GraphDB are intended, not least, to work around write performance issues;
  • Stardog goes even further and is completely rewriting its engine, again with the goal of improving write performance.

And now let me introduce a new player on the market: AnzoGraph™, from the creators of IBM Netezza and Amazon Redshift. A picture from an advertisement for a product based on it appears at the beginning of the article. AnzoGraph positions itself as a GOLAP solution. How do you like SPARQL with window functions?

SELECT ?month (COUNT(?event) OVER (PARTITION BY ?month) AS ?events) WHERE {  …  }
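For those who know window functions only from SQL, the following Python sketch emulates what `COUNT(...) OVER (PARTITION BY ...)` computes. Unlike `GROUP BY`, it does not collapse rows: every solution is kept and simply receives the count for its partition. The sample rows and the function name `count_over_partition` are my assumptions, and the sketch says nothing about how AnzoGraph actually evaluates such queries.

```python
from collections import Counter

# Hypothetical query solutions: one row per (?month, ?event) binding.
rows = [
    {"month": "2019-01", "event": ":e1"},
    {"month": "2019-01", "event": ":e2"},
    {"month": "2019-02", "event": ":e3"},
]


def count_over_partition(rows, partition_key, counted_key, alias):
    """Add to every row the number of bound `counted_key` values in its partition."""
    counts = Counter(
        row[partition_key] for row in rows if row.get(counted_key) is not None
    )
    # Every input row survives; it is only extended with the aggregate value.
    return [{**row, alias: counts[row[partition_key]]} for row in rows]


result = count_over_partition(rows, "month", "event", "events")
for row in result:
    print(row["month"], row["events"])
```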

IV. RocksDB

The Stardog 7 Beta announcement, referred to above in connection with the engine rewrite, said that Stardog was going to use RocksDB, a key-value store that Facebook forked from Google's LevelDB, as its underlying storage system. Why is it worth speaking of a trend here?

First, judging by the Wikipedia article, RDF repositories are not the only ones being "transplanted" onto RocksDB. There are projects to use RocksDB as a storage engine in ArangoDB, MongoDB, MySQL and MariaDB, and Cassandra.

Second, projects (as opposed to products) in this subject area are being built on RocksDB.

For example, eBay uses RocksDB in the platform for its "knowledge graph". By the way, it is amusing to read: "the query language started as a home grown format, but more recently it has been transitioning to be much more like SPARQL". As in the joke: no matter how we build a knowledge graph, we still end up with RDF.

Another example is the Wikidata History Query Service, which appeared a few months ago. Before it was introduced, Wikidata's historical information had to be accessed through MWAPI calls to the standard MediaWiki API. Now a lot can be done in pure SPARQL. RocksDB is "under the hood" here as well. By the way, WDHQS appears to have been built by the person who was involved in importing Freebase into the Google Knowledge Graph.

V. LPG support

Let me recall the main difference between LPG graphs and RDF graphs.

In LPG, scalar properties can be attached to edge instances, whereas in RDF properties can be attached only to edge "types" (though not just scalar properties, but ordinary links as well). This limitation of RDF relative to LPG can be overcome with one modeling technique or another. The limitations of LPG relative to RDF are harder to overcome, but LPG graphs look more like the pictures in Harary's textbook than RDF graphs do, so people want them.

Obviously, the task of "supporting LPG" falls into two parts:

  1. making changes to the RDF model that make it possible to simulate LPG constructs in it;
  2. making changes to the RDF query language that make it possible to access data in this modified model, or implementing the ability to query this model in popular LPG query languages.

V.1. Data Model

There are several possible approaches here.

V.1.1. Singleton Property

The most literal approach to reconciling RDF and LPG is probably the singleton property:

  • Instead of, for example, the predicate :isMarriedTo, the predicates :isMarriedTo1, :isMarriedTo2, etc. are used.
  • These predicates then become subjects of new triples: :isMarriedTo1 :since "2013-09-13"^^xsd:date, etc.
  • The connection between these predicate instances and the common predicate is established by triples of the form :isMarriedTo1 rdf:singletonPropertyOf :isMarriedTo.
  • Obviously, rdf:singletonPropertyOf rdfs:subPropertyOf rdf:type holds, but think about why you should not simply write :isMarriedTo1 rdf:type :isMarriedTo.

Here the task of "LPG support" is solved at the RDFS level. Such a solution would need to be included in the relevant standard. RDF repositories that support inference may require some changes, but for now the singleton property can be regarded as just another modeling technique.
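The technique is easy to try out on plain tuples. In the Python sketch below the second marriage (`:carol`, `:dave`) and the helper `edge_properties` are hypothetical additions of mine; the first marriage repeats the example from the list above, with the date written as a plain string for brevity.

```python
# Each marriage gets its own predicate, which is then described like any resource.
TRIPLES = {
    (":bob", ":isMarriedTo1", ":alice"),
    (":isMarriedTo1", "rdf:singletonPropertyOf", ":isMarriedTo"),
    (":isMarriedTo1", ":since", "2013-09-13"),
    (":carol", ":isMarriedTo2", ":dave"),  # hypothetical second edge
    (":isMarriedTo2", "rdf:singletonPropertyOf", ":isMarriedTo"),
    (":isMarriedTo2", ":since", "2001-05-01"),
}


def edge_properties(triples, subject, generic_predicate, obj):
    """Return (property, value) pairs attached to the edge subject -> obj."""
    # 1. All singleton predicates declared for the generic predicate.
    singletons = {
        s for s, p, o in triples
        if p == "rdf:singletonPropertyOf" and o == generic_predicate
    }
    # 2. Those of them that actually connect the given subject and object.
    used = {
        p for s, p, o in triples
        if s == subject and o == obj and p in singletons
    }
    # 3. Everything said about these predicates, except the declaration itself.
    return {
        (p, o) for s, p, o in triples
        if s in used and p != "rdf:singletonPropertyOf"
    }


print(edge_properties(TRIPLES, ":bob", ":isMarriedTo", ":alice"))
```

Note the price: a query for "all marriages" can no longer match `:isMarriedTo` directly and has to go through step 1 every time, unless the store infers the generic triples.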

V.1.2. Reification Done Right

Less naive approaches stem from the realization that property instances are perfectly well identified by triples. If we are able to make statements about triples, we are able to make statements about property instances.

The most solid of these approaches is RDF*, also known as RDR, which was born in the depths of Blazegraph. AnzoGraph also chose it from the very start. The approach is solid in that corresponding changes to RDF Semantics have been proposed within its framework. The idea, however, is extremely simple. In the Turtle serialization of RDF, you can now write something like this:

<<:bob :isMarriedTo :alice>> :since "2013-09-13"^^xsd:date .

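The idea of a triple standing in the subject position can be imitated in Python with nested tuples. This is only a toy model under my own assumptions: it ignores the formal semantics entirely, it stores the base triple as a separate entry by my choice, and the helper name `annotations` is hypothetical.

```python
# The base triple is an ordinary tuple...
BOB_MARRIED_ALICE = (":bob", ":isMarriedTo", ":alice")

TRIPLES = [
    BOB_MARRIED_ALICE,
    # ...and the same tuple is reused as the subject of another triple.
    (BOB_MARRIED_ALICE, ":since", "2013-09-13"),
]


def annotations(triples, subject, predicate, annotation):
    """Find (object, value) pairs matching <<subject predicate ?o>> annotation ?value."""
    results = []
    for s, p, o in triples:
        # Only triples whose subject is itself a triple can match.
        if isinstance(s, tuple) and p == annotation:
            inner_subject, inner_predicate, inner_object = s
            if inner_subject == subject and inner_predicate == predicate:
                results.append((inner_object, o))
    return results


print(annotations(TRIPLES, ":bob", ":isMarriedTo", ":since"))
```

Compared with the singleton property, no auxiliary predicates are needed: the edge is identified by the triple itself.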
V.1.3. Other approaches

One can skip formal semantics altogether and simply assume that triples have identifiers, which are, of course, URIs, and build new triples with these URIs. All that remains is to provide access to these URIs in SPARQL. This is what Stardog does.

AllegroGraph took an intermediate path. Triple identifiers are known to exist in AllegroGraph, but the implementation of triple attributes does not expose them. Formal semantics, however, is still a long way off. Notably, triple attributes are not URIs, and the values of these attributes can only be literals. LPG adherents get exactly what they wanted. In the specially invented NQX format, an example similar to the one above for RDF* looks like this:

:bob :marriedTo :alice {"since" : "2013-09-13"}
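In data-structure terms this variant is simply a dictionary hanging off each triple. The Python sketch below is my own toy model, not AllegroGraph's API: the class `AttributedTripleStore` and its methods are hypothetical, and plain strings stand in for literals.

```python
class AttributedTripleStore:
    """Toy store in which every triple carries a dict of literal-valued attributes."""

    def __init__(self):
        self._attributes = {}

    def add(self, s, p, o, attributes=None):
        attributes = dict(attributes or {})
        for name, value in attributes.items():
            # Attribute names are not URIs and values are literals only;
            # in this sketch both are modeled as plain strings.
            if not isinstance(name, str) or not isinstance(value, str):
                raise TypeError("attribute names and values must be plain strings")
        self._attributes[(s, p, o)] = attributes

    def find(self, s, p, attribute):
        """Return (object, value) pairs for triples (s, p, ?o) carrying the attribute."""
        return [
            (o, attrs[attribute])
            for (ts, tp, o), attrs in self._attributes.items()
            if ts == s and tp == p and attribute in attrs
        ]


store = AttributedTripleStore()
store.add(":bob", ":marriedTo", ":alice", {"since": "2013-09-13"})
matches = store.find(":bob", ":marriedTo", "since")
print(matches)
```

Since the attributes live outside the graph, nothing can link to them or from them, which is exactly the LPG restriction mentioned above.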

V.2. Query languages

Having supported LPG in one way or another at the model level, you need to make it possible to query data in such a model.

  • For RDF* queries, Blazegraph supports SPARQL* and Gremlin. A SPARQL* query looks like this:

 SELECT * { <<:bob :isMarriedTo ?wife>> :since ?since }

  • AnzoGraph also supports SPARQL* and plans to support Cypher, the query language of Neo4j.
  • Stardog supports its own extension of SPARQL and, again, Gremlin. The URI of a triple and its "meta-information" can be obtained in SPARQL with something like this:

SELECT * {
    BIND (stardog:identifier(:bob, :isMarriedTo, ?wife) AS ?id)
    ?id :since ?since
}

  • AllegroGraph also supports its own extension of SPARQL:

 SELECT * { ("since" ?since)  franz:attributesNameValue  ( :bob :marriedTo ?wife ) }

Incidentally, GraphDB at one time supported TinkerPop/Gremlin without supporting LPG, but that support ended in version 8.0 or 8.1.

VI. Tightening licenses

There have been no recent additions to the intersection of the "triplestore of choice" and "open source triplestore" sets. New open source RDF stores are far from being a good choice for everyday use, and the source code of the new triple stores that I would like to use (for example, AnzoGraph) is closed. If anything, one can speak of losses...

Of course, what used to be open source has not been closed, but some open source repositories are gradually ceasing to be seen as a worthy choice. Virtuoso, which has an open source edition, is in my opinion drowning in bugs. Blazegraph was bought by AWS and formed the basis of Amazon Neptune; it is now unclear whether there will be even one more release. Only Jena remains...

If open source is not that important and you simply want to try things out, the situation is also less rosy than before. For example:

  • Stardog has stopped distributing the free version (although the trial period of the regular version has doubled);
  • in GraphDB Cloud, where you could previously choose the free basic plan, new user registration has been suspended.

In general, space is becoming less and less accessible to the ordinary IT person; exploring it is becoming the preserve of corporations.

Source: habr.com
