Introduction to SQL Server 2017 Graph Databases

Before the start of the course "MS SQL Server Developer" We have prepared another useful translation for you.

Graph databases are an important technology for database professionals. I try to follow the innovations and new technologies in this area and after working with relational and NoSQL databases, I see that the role of graph databases is becoming more and more. When working with complex hierarchical data, not only traditional databases, but also NoSQL are ineffective. Often, with an increase in the number of link levels and the size of the database, there is a decrease in performance. And as the relationships become more complex, the number of JOINs also increases.

Of course, in the relational model there are solutions for working with hierarchies (for example, using recursive CTEs), but these are still workarounds. At the same time, the functionality of SQL Server graph databases makes it easy to handle multiple levels of hierarchy. Both the data model and queries are simplified, and therefore their efficiency is increased. Significantly reduces the amount of code.

Graph databases are an expressive language for representing complex systems. This technology is already quite widely used in the IT industry in areas such as social networks, anti-fraud systems, IT network analysis, social recommendations, product and content recommendations.

The graph database functionality in SQL Server is suitable for scenarios where the data is highly interconnected and has well-defined relationships.

Graph data model

A graph is a set of vertices (nodes, node) and edges (relationships, edge). Vertices represent entities, and edges represent links whose attributes can contain information.

A graph database models entities as a graph as defined in graph theory. Data structures are vertices and edges. Attributes are properties of vertices and edges. A connection is a connection of vertices.

Unlike other data models, graph databases prioritize relationships between entities. Therefore, there is no need to calculate relationships using foreign keys or in some other way. It is possible to create complex data models using only vertex and edge abstractions.

In today's world, modeling relationships requires more and more sophisticated techniques. To model relationships, SQL Server 2017 offers graph database capabilities. The vertices and edges of the graph are represented as new types of tables: NODE and EDGE. Graph queries use a new T-SQL function called MATCH(). Since this functionality is built into SQL Server 2017, it can be used in your existing databases without the need for any database conversion.

Benefits of the Graph Model

Nowadays, businesses and users demand applications that work with more and more data, while expecting high performance and reliability. The graph representation of data offers a convenient means for handling complex relationships. This approach solves many problems and helps to get results within a given context.

It looks like many applications will benefit from graph databases in the future.

Data Modeling: From Relational to Graph Modeling

Introduction to SQL Server 2017 Graph Databases
Example

Let's look at an example of an organizational structure with a hierarchy of employees: an employee reports to a manager, a manager reports to a senior manager, and so on. Depending on the specific company, there can be any number of levels in this hierarchy. But as the number of levels increases, calculating relationships in a relational database becomes more and more difficult. It is quite difficult to imagine the hierarchy of employees, the hierarchy in marketing or social media connections. Let's see how SQL Graph can solve the problem of handling different levels of hierarchy.

For this example, let's make a simple data model. Create a table of employees EMP with id EMPNO and column M.G.R.A that points to the ID of the head (manager) of the employee. All information about the hierarchy is stored in this table and can be queried using columns EMPNO ΠΈ M.G.R..

Introduction to SQL Server 2017 Graph Databases
The following diagram also shows the same org chart model with four levels of nesting in a more familiar form. Employees are the vertices of the graph from the table EMP. The entity "employee" is linked to itself by the relationship "submits" (ReportsTo). In graph terms, a link is an edge (EDGE) that connects nodes (NODEs) of employees.

Introduction to SQL Server 2017 Graph Databases

Let's create a regular table EMP and add values ​​there according to the diagram above.

CREATE TABLE EMP
(EMPNO INT NOT NULL,
ENAME VARCHAR(20),
JOB VARCHAR(10),
MGR INT,
JOINDATE DATETIME,
SALARY DECIMAL(7, 2),
COMMISIION DECIMAL(7, 2),
DNO INT)
 
INSERT INTO EMP VALUES
(7369, 'SMITH', 'CLERK', 7902, '02-MAR-1970', 8000, NULL, 2),
(7499, 'ALLEN', 'SALESMAN', 7698, '20-MAR-1971', 1600, 3000, 3),
(7521, 'WARD', 'SALESMAN', 7698, '07-FEB-1983', 1250, 5000, 3),
(7566, 'JONES', 'MANAGER', 7839, '02-JUN-1961', 2975, 50000, 2),
(7654, 'MARTIN', 'SALESMAN', 7698, '28-FEB-1971', 1250, 14000, 3),
(7698, 'BLAKE', 'MANAGER', 7839, '01-JAN-1988', 2850, 12000, 3),
(7782, 'CLARK', 'MANAGER', 7839, '09-APR-1971', 2450, 13000, 1),
(7788, 'SCOTT', 'ANALYST', 7566, '09-DEC-1982', 3000, 1200, 2),
(7839, 'KING', 'PRESIDENT', NULL, '17-JUL-1971', 5000, 1456, 1),
(7844, 'TURNER', 'SALESMAN', 7698, '08-AUG-1971', 1500, 0, 3),
(7876, 'ADAMS', 'CLERK', 7788, '12-MAR-1973', 1100, 0, 2),
(7900, 'JAMES', 'CLERK', 7698, '03-NOV-1971', 950, 0, 3),
(7902, 'FORD', 'ANALYST', 7566, '04-MAR-1961', 3000, 0, 2),
(7934, 'MILLER', 'CLERK', 7782, '21-JAN-1972', 1300, 0, 1)

The figure below shows the employees:

  • employee with EMPNO 7369 is subordinate to 7902;
  • employee with EMPNO 7902 subordinate to 7566
  • employee with EMPNO 7566 subordinate to 7839

Introduction to SQL Server 2017 Graph Databases
Now let's look at the representation of the same data in the form of a graph. The EMPLOYEE node has several attributes and is linked to itself by a "submits" relationship (EmplReportsTo). EmplReportsTo is the name of the relationship.

The edge table (EDGE) may also have attributes.

Introduction to SQL Server 2017 Graph Databases
Create a node table EmpNode

The syntax for creating a node is quite simple: to the expression CREATE TABLE added to the end AS NODE.

CREATE TABLE dbo.EmpNode(
ID Int Identity(1,1),
EMPNO NUMERIC(4) NOT NULL,
ENAME VARCHAR(10),
MGR NUMERIC(4),
DNO INT
) AS NODE;

Now let's convert the data from a regular table to a graph one. Next INSERT inserts data from a relational table EMP.

INSERT INTO EmpNode(EMPNO,ENAME,MGR,DNO) select empno,ename,MGR,dno from emp

Introduction to SQL Server 2017 Graph Databases
In the node table in a special column $node_id_* the host ID is stored as JSON. The remaining columns of this table contain the attributes of the node.

Create Edges (EDGE)

Creating an edge table is very similar to creating a node table, except that the keyword AS EDGE.

CREATE TABLE empReportsTo(Deptno int) AS EDGE

Introduction to SQL Server 2017 Graph Databases

Now let's define relationships between employees using columns EMPNO ΠΈ M.G.R.. The org chart clearly shows how to write INSERT.

INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 1),
   	(SELECT $node_id FROM EmpNode WHERE id = 13),20);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 2),
   	(SELECT $node_id FROM EmpNode WHERE id = 6),10);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 3),
   	(SELECT $node_id FROM EmpNode WHERE id = 6),10)
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 4),
   	(SELECT $node_id FROM EmpNode WHERE id = 9),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 5),
   	(SELECT $node_id FROM EmpNode WHERE id = 6),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 6),
   	(SELECT $node_id FROM EmpNode WHERE id = 9),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 7),
   	(SELECT $node_id FROM EmpNode WHERE id = 9),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 8),
   	(SELECT $node_id FROM EmpNode WHERE id = 4),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 9),
   	(SELECT $node_id FROM EmpNode WHERE id = 9),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 10),
   	(SELECT $node_id FROM EmpNode WHERE id = 6),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 11),
   	(SELECT $node_id FROM EmpNode WHERE id = 8),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 12),
   	(SELECT $node_id FROM EmpNode WHERE id = 6),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 13),
   	(SELECT $node_id FROM EmpNode WHERE id = 4),30);
INSERT INTO empReportsTo  VALUES ((SELECT $node_id FROM EmpNode WHERE ID = 14),
   	(SELECT $node_id FROM EmpNode WHERE id = 7),30);

The edge table has three columns by default. First, $edge_id β€” edge identifier in the form of JSON. The other two ($from_id ΠΈ $to_id) represent the relationship between nodes. In addition, edges can have additional properties. In our case, this Deptno.

System Views

In system view sys.tables There are two new columns:

  1. is_edge
  2. is_node

SELECT t.is_edge,t.is_node,*
FROM sys.tables t
WHERE name like 'emp%'

Introduction to SQL Server 2017 Graph Databases

ssms

Objects related to graphs are located in the Graph Tables folder. The node table icon is marked with a dot, and the edge table icon is marked with two connected circles (which looks a bit like glasses).

Introduction to SQL Server 2017 Graph Databases

MATCH expression

Expression MATCH taken from CQL (Cypher Query Language). This is an efficient way to query the properties of a graph. CQL starts with an expression MATCH.

Syntax

MATCH (<graph_search_pattern>)
 
<graph_search_pattern>::=
    {<node_alias> {
                 	{ <-( <edge_alias> )- }
               	| { -( <edge_alias> )-> }
             	<node_alias>
             	}
 	}
 	[ { AND } { ( <graph_search_pattern> ) } ]
 	[ ,...n ]
 
<node_alias> ::=
    node_table_name | node_alias
 
<edge_alias> ::=
    edge_table_name | edge_alias

Examples

Let's look at a few examples.

The query below displays the employees that Smith and his manager report to.

SELECT
E.EMPNO,E.ENAME,E.MGR,E1.EMPNO,E1.ENAME,E1.MGR
FROM
    empnode e, empnode e1, empReportsTo m
WHERE
    MATCH(e-(m)->e1)
and e.ENAME='SMITH'

Introduction to SQL Server 2017 Graph Databases
The following query is for finding employees and second level managers for Smith. If you remove the offer WHERE, then all employees will be displayed as a result.

SELECT
E.EMPNO,E.ENAME,E.MGR,E1.EMPNO,E1.ENAME,E1.MGR,E2.EMPNO,e2.ENAME,E2.MGR
FROM
    empnode e, empnode e1, empReportsTo m ,empReportsTo m1, empnode e2
WHERE
    MATCH(e-(m)->e1-(m1)->e2)
and e.ENAME='SMITH'

Introduction to SQL Server 2017 Graph Databases
And finally, a request for third-level employees and managers.

SELECT
E.EMPNO,E.ENAME,E.MGR,E1.EMPNO,E1.ENAME,E1.MGR,E2.EMPNO,e2.ENAME,E2.MGR,E3.EMPNO,e3.ENAME,E3.MGR
FROM
    empnode e, empnode e1, empReportsTo m ,empReportsTo m1, empnode e2, empReportsTo M2, empnode e3
WHERE
    MATCH(e-(m)->e1-(m1)->e2-(m2)->e3)
and e.ENAME='SMITH'

Introduction to SQL Server 2017 Graph Databases
Now let's change direction to get Smith's bosses.

SELECT
E.EMPNO,E.ENAME,E.MGR,E1.EMPNO,E1.ENAME,E1.MGR,E2.EMPNO,e2.ENAME,E2.MGR,E3.EMPNO,e3.ENAME,E3.MGR
FROM
    empnode e, empnode e1, empReportsTo m ,empReportsTo m1, empnode e2, empReportsTo M2, empnode e3
WHERE
    MATCH(e<-(m)-e1<-(m1)-e2<-(m2)-e3)

Introduction to SQL Server 2017 Graph Databases

Conclusion

SQL Server 2017 has established itself as a complete enterprise solution for a variety of business IT needs. The first version of SQL Graph is very promising. Even despite some limitations, there is already enough functionality to explore the possibilities of graphs.

SQL Graph functionality is fully integrated into SQL Engine. However, as already mentioned, SQL Server 2017 has the following limitations:

No support for polymorphism.

  • Only unidirectional links are supported.
  • Edges cannot update their $from_id and $to_id columns via UPDATE.
  • Transitive closures are not supported, but they can be obtained using CTEs.
  • Limited support for In-Memory OLTP objects.
  • Temporal tables (System-Versioned Temporal Table), temporary local and global tables are not supported.
  • Table types and table variables cannot be declared as NODE or EDGE.
  • Cross-database queries are not supported.
  • There is no direct way or some kind of wizard (wizard) to convert regular tables to graph tables.
  • There is no GUI to display graphs, but Power BI can be used.

Introduction to SQL Server 2017 Graph Databases

Read more:

Source: habr.com

Add a comment