Functional DBMS

The database world has long been taken over by relational DBMSs that use the SQL language. So much so that emerging varieties are called NoSQL. They managed to win a certain place for themselves in this market, but relational DBMS are not going to die, and continue to be actively used for their own purposes.

In this article, I want to describe the concept of a functional database. For a better understanding, I will do this by comparing with the classical relational model. As examples, tasks from various SQL tests found on the Internet will be used.

Introduction

Relational databases operate on tables and fields. In a functional database, classes and functions will be used instead, respectively. A field in a table with N keys will be represented as a function of N parameters. Instead of links between tables, functions will be used that return objects of the class to which the link goes. Function composition will be used instead of JOIN.

Before proceeding directly to the tasks, I will describe the task of domain logic. For DDL, I will use PostgreSQL syntax. For functional its own syntax.

Tables and fields

A simple Sku object with name and price fields:

relational

CREATE TABLE Sku
(
    id bigint NOT NULL,
    name character varying(100),
    price numeric(10,5),
    CONSTRAINT id_pkey PRIMARY KEY (id)
)

Functional

CLASS Sku;
name = DATA STRING[100] (Sku);
price = DATA NUMERIC[10,5] (Sku);

We announce two Features, which take one Sku parameter as input and return a primitive type.

It is assumed that in a functional DBMS, each object will have some internal code that is automatically generated and can be accessed if necessary.

Let's set the price for the product / store / supplier. It can change over time, so let's add a time field to the table. I will skip the declaration of tables for directories in a relational database to shorten the code:

relational

CREATE TABLE prices
(
    skuId bigint NOT NULL,
    storeId bigint NOT NULL,
    supplierId bigint NOT NULL,
    dateTime timestamp without time zone,
    price numeric(10,5),
    CONSTRAINT prices_pkey PRIMARY KEY (skuId, storeId, supplierId)
)

Functional

CLASS Sku;
CLASS Store;
CLASS Supplier;
dateTime = DATA DATETIME (Sku, Store, Supplier);
price = DATA NUMERIC[10,5] (Sku, Store, Supplier);

Indexes

For the last example, let's build an index on all keys and date so that we can quickly find the price for a certain time.

relational

CREATE INDEX prices_date
    ON prices
    (skuId, storeId, supplierId, dateTime)

Functional

INDEX Sku sk, Store st, Supplier sp, dateTime(sk, st, sp);

Tasks

Let's start with relatively simple problems taken from the corresponding Articles on Habr.

First, let's declare the domain logic (for a relational database, this is done directly in the above article).

CLASS Department;
name = DATA STRING[100] (Department);

CLASS Employee;
department = DATA Department (Employee);
chief = DATA Employee (Employee);
name = DATA STRING[100] (Employee);
salary = DATA NUMERIC[14,2] (Employee);

Task 1.1

Display a list of employees who receive wages greater than those of the immediate supervisor.

relational

select a.*
from   employee a, employee b
where  b.id = a.chief_id
and    a.salary > b.salary

Functional

SELECT name(Employee a) WHERE salary(a) > salary(chief(a));

Task 1.2

Display a list of employees earning the highest salary in their department

relational

select a.*
from   employee a
where  a.salary = ( select max(salary) from employee b
                    where  b.department_id = a.department_id )

Functional

maxSalary 'Максимальная Π·Π°Ρ€ΠΏΠ»Π°Ρ‚Π°' (Department s) = 
    GROUP MAX salary(Employee e) IF department(e) = s;
SELECT name(Employee a) WHERE salary(a) = maxSalary(department(a));

// ΠΈΠ»ΠΈ Π΅ΡΠ»ΠΈ "Π·Π°ΠΈΠ½Π»Π°ΠΉΠ½ΠΈΡ‚ΡŒ"
SELECT name(Employee a) WHERE 
    salary(a) = maxSalary(GROUP MAX salary(Employee e) IF department(e) = department(a));

Both implementations are equivalent. For the first case in the relational database, you can use CREATE VIEW, which in the same way will first calculate the maximum salary for a specific department in it. In the future, for clarity, I will use the first case, since it better reflects the solution.

Task 1.3

Display a list of department IDs, the number of employees in which does not exceed 3 people.

relational

select department_id
from   employee
group  by department_id
having count(*) <= 3

Functional

countEmployees 'ΠšΠΎΠ»ΠΈΡ‡Π΅ΡΡ‚Π²ΠΎ ΡΠΎΡ‚Ρ€ΡƒΠ΄Π½ΠΈΠΊΠΎΠ²' (Department d) = 
    GROUP SUM 1 IF department(Employee e) = d;
SELECT Department d WHERE countEmployees(d) <= 3;

Task 1.4

Display a list of employees who do not have an assigned manager working in the same department.

relational

select a.*
from   employee a
left   join employee b on (b.id = a.chief_id and b.department_id = a.department_id)
where  b.id is null

Functional

SELECT name(Employee a) WHERE NOT (department(chief(a)) = department(a));

Task 1.5

Find the list of department IDs with the maximum total employee salary.

relational

with sum_salary as
  ( select department_id, sum(salary) salary
    from   employee
    group  by department_id )
select department_id
from   sum_salary a       
where  a.salary = ( select max(salary) from sum_salary )

Functional

salarySum 'Максимальная Π·Π°Ρ€ΠΏΠ»Π°Ρ‚Π°' (Department d) = 
    GROUP SUM salary(Employee e) IF department(e) = d;
maxSalarySum 'Максимальная Π·Π°Ρ€ΠΏΠ»Π°Ρ‚Π° ΠΎΡ‚Π΄Π΅Π»ΠΎΠ²' () = 
    GROUP MAX salarySum(Department d);
SELECT Department d WHERE salarySum(d) = maxSalarySum();

Let's move on to more complex tasks from another Articles. It contains a detailed analysis of how to implement this task in MS SQL.

Task 2.1

What sellers sold more than 1997 pieces of item #30 in 1?

Domain logic (as before, we skip the declaration on the RDBMS):

CLASS Employee 'ΠŸΡ€ΠΎΠ΄Π°Π²Π΅Ρ†';
lastName 'Ѐамилия' = DATA STRING[100] (Employee);

CLASS Product 'ΠŸΡ€ΠΎΠ΄ΡƒΠΊΡ‚';
id = DATA INTEGER (Product);
name = DATA STRING[100] (Product);

CLASS Order 'Π—Π°ΠΊΠ°Π·';
date = DATA DATE (Order);
employee = DATA Employee (Order);

CLASS Detail 'Π‘Ρ‚Ρ€ΠΎΠΊΠ° Π·Π°ΠΊΠ°Π·Π°';

order = DATA Order (Detail);
product = DATA Product (Detail);
quantity = DATA NUMERIC[10,5] (Detail);

relational

select LastName
from Employees as e
where (
  select sum(od.Quantity)
  from [Order Details] as od
  where od.ProductID = 1 and od.OrderID in (
    select o.OrderID
    from Orders as o
    where year(o.OrderDate) = 1997 and e.EmployeeID = o.EmployeeID)
) > 30

Functional

sold (Employee e, INTEGER productId, INTEGER year) = 
    GROUP SUM quantity(OrderDetail d) IF 
        employee(order(d)) = e AND 
        id(product(d)) = productId AND 
        extractYear(date(order(d))) = year;
SELECT lastName(Employee e) WHERE sold(e, 1, 1997) > 30;

Task 2.2

For each customer (first name, last name), find the two items (name) on which the customer spent the most money in 1997.

Extending the domain logic from the previous example:

CLASS Customer 'ΠšΠ»ΠΈΠ΅Π½Ρ‚';
contactName 'ЀИО' = DATA STRING[100] (Customer);

customer = DATA Customer (Order);

unitPrice = DATA NUMERIC[14,2] (Detail);
discount = DATA NUMERIC[6,2] (Detail);

relational

SELECT ContactName, ProductName FROM (
SELECT c.ContactName, p.ProductName
, ROW_NUMBER() OVER (
    PARTITION BY c.ContactName
    ORDER BY SUM(od.Quantity * od.UnitPrice * (1 - od.Discount)) DESC
) AS RatingByAmt
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
JOIN [Order Details] od ON od.OrderID = o.OrderID
JOIN Products p ON p.ProductID = od.ProductID
WHERE YEAR(o.OrderDate) = 1997
GROUP BY c.ContactName, p.ProductName
) t
WHERE RatingByAmt < 3

Functional

sum (Detail d) = quantity(d) * unitPrice(d) * (1 - discount(d));
bought 'ΠšΡƒΠΏΠΈΠ»' (Customer c, Product p, INTEGER y) = 
    GROUP SUM sum(Detail d) IF 
        customer(order(d)) = c AND 
        product(d) = p AND 
        extractYear(date(order(d))) = y;
rating 'Π Π΅ΠΉΡ‚ΠΈΠ½Π³' (Customer c, Product p, INTEGER y) = 
    PARTITION SUM 1 ORDER DESC bought(c, p, y), p BY c, y;
SELECT contactName(Customer c), name(Product p) WHERE rating(c, p, 1997) < 3;

The PARTITION operator works according to the following principle: it sums the expression specified after SUM (here 1) within the specified groups (here Customer and Year, but can be any expression), sorting within the groups according to the expressions specified in ORDER (here bought, and if are equal, then by the internal product code).

Task 2.3

How many goods need to be ordered from suppliers to fulfill current orders.

Let's extend the domain logic again:

CLASS Supplier 'ΠŸΠΎΡΡ‚Π°Π²Ρ‰ΠΈΠΊ';
companyName = DATA STRING[100] (Supplier);

supplier = DATA Supplier (Product);

unitsInStock 'ΠžΡΡ‚Π°Ρ‚ΠΎΠΊ Π½Π° ΡΠΊΠ»Π°Π΄Π΅' = DATA NUMERIC[10,3] (Product);
reorderLevel 'Норма ΠΏΡ€ΠΎΠ΄Π°ΠΆΠΈ' = DATA NUMERIC[10,3] (Product);

relational

select s.CompanyName, p.ProductName, sum(od.Quantity) + p.ReorderLevel β€” p.UnitsInStock as ToOrder
from Orders o
join [Order Details] od on o.OrderID = od.OrderID
join Products p on od.ProductID = p.ProductID
join Suppliers s on p.SupplierID = s.SupplierID
where o.ShippedDate is null
group by s.CompanyName, p.ProductName, p.UnitsInStock, p.ReorderLevel
having p.UnitsInStock < sum(od.Quantity) + p.ReorderLevel

Functional

orderedNotShipped 'Π—Π°ΠΊΠ°Π·Π°Π½ΠΎ, Π½ΠΎ Π½Π΅ ΠΎΡ‚Π³Ρ€ΡƒΠΆΠ΅Π½ΠΎ' (Product p) = 
    GROUP SUM quantity(OrderDetail d) IF product(d) = p;
toOrder 'К Π·Π°ΠΊΠ°Π·Ρƒ' (Product p) = orderedNotShipped(p) + reorderLevel(p) - unitsInStock(p);
SELECT companyName(supplier(Product p)), name(p), toOrder(p) WHERE toOrder(p) > 0;

Task with an asterisk

And the last example is from me personally. There is the logic of a social network. People can be friends with each other and like each other. From a functional database perspective, this would look like this:

CLASS Person;
likes = DATA BOOLEAN (Person, Person);
friends = DATA BOOLEAN (Person, Person);

It is necessary to find possible candidates for friendship. More formally, you need to find all people A, B, C such that A is friends with B, and B is friends with C, A likes C, but A is not friends with C.
From a functional database point of view, the query would look like this:

SELECT Person a, Person b, Person c WHERE 
    likes(a, c) AND NOT friends(a, c) AND 
    friends(a, b) AND friends(b, c);

The reader is invited to independently solve this problem in SQL. It is assumed that there are far fewer friends than those who like. Therefore, they are in separate tables. In case of a successful solution, there is also a problem with two asterisks. Her friendship is not symmetrical. On a functional database it would look like this:

SELECT Person a, Person b, Person c WHERE 
    likes(a, c) AND NOT friends(a, c) AND 
    (friends(a, b) OR friends(b, a)) AND 
    (friends(b, c) OR friends(c, b));

UPD: solution of the problem with the first and second asterisk from dss_kalika:

SELECT 
   pl.PersonAID
  ,pf.PersonAID
  ,pff.PersonAID
FROM Persons                 AS p
--Π›Π°ΠΉΠΊΠΈ                      
JOIN PersonRelationShip      AS pl ON pl.PersonAID = p.PersonID
                                  AND pl.Relation  = 'Like'
--Π”Ρ€ΡƒΠ·ΡŒΡ                     
JOIN PersonRelationShip      AS pf ON pf.PersonAID = p.PersonID 
                                  AND pf.Relation = 'Friend'
--Π”Ρ€ΡƒΠ·ΡŒΡ Π”Ρ€ΡƒΠ·Π΅ΠΉ              
JOIN PersonRelationShip      AS pff ON pff.PersonAID = pf.PersonBID
                                   AND pff.PersonBID = pl.PersonBID
                                   AND pff.Relation = 'Friend'
--Π•Ρ‰Ρ‘ Π½Π΅ Π΄Ρ€ΡƒΠΆΠ°Ρ‚         
LEFT JOIN PersonRelationShip AS pnf ON pnf.PersonAID = p.PersonID
                                   AND pnf.PersonBID = pff.PersonBID
                                   AND pnf.Relation = 'Friend'
WHERE pnf.PersonAID IS NULL 

;WITH PersonRelationShipCollapsed AS (
  SELECT pl.PersonAID
        ,pl.PersonBID
        ,pl.Relation 
  FROM #PersonRelationShip      AS pl 
  
  UNION 

  SELECT pl.PersonBID AS PersonAID
        ,pl.PersonAID AS PersonBID
        ,pl.Relation
  FROM #PersonRelationShip      AS pl 
)
SELECT 
   pl.PersonAID
  ,pf.PersonBID
  ,pff.PersonBID
FROM #Persons                      AS p
--Π›Π°ΠΉΠΊΠΈ                      
JOIN PersonRelationShipCollapsed  AS pl ON pl.PersonAID = p.PersonID
                                 AND pl.Relation  = 'Like'                                  
--Π”Ρ€ΡƒΠ·ΡŒΡ                          
JOIN PersonRelationShipCollapsed  AS pf ON pf.PersonAID = p.PersonID 
                                 AND pf.Relation = 'Friend'
--Π”Ρ€ΡƒΠ·ΡŒΡ Π”Ρ€ΡƒΠ·Π΅ΠΉ                   
JOIN PersonRelationShipCollapsed  AS pff ON pff.PersonAID = pf.PersonBID
                                 AND pff.PersonBID = pl.PersonBID
                                 AND pff.Relation = 'Friend'
--Π•Ρ‰Ρ‘ Π½Π΅ Π΄Ρ€ΡƒΠΆΠ°Ρ‚                   
LEFT JOIN PersonRelationShipCollapsed AS pnf ON pnf.PersonAID = p.PersonID
                                   AND pnf.PersonBID = pff.PersonBID
                                   AND pnf.Relation = 'Friend'
WHERE pnf.[PersonAID] IS NULL 

Conclusion

It should be noted that the above syntax of the language is just one of the options for implementing the above concept. It was SQL that was taken as the basis, and the goal was to make it as similar as possible to it. Of course, someone may not like the names of keywords, case of words, and so on. The main thing here is the concept itself. If desired, you can make both C ++ and Python similar syntax.

The described database concept, in my opinion, has the following advantages:

  • Ease. This is a relatively subjective indicator that is not obvious in simple cases. But if you look at more complex cases (for example, tasks with asterisks), then, in my opinion, writing such queries is much easier.
  • Encapsulation. In some examples, I declared intermediate functions (for example, sold, bought etc.), from which subsequent functions were built. This allows you to change the logic of certain functions, if necessary, without changing the logic of those that depend on them. For example, you can make sales sold were calculated from completely different objects, while the rest of the logic will not change. Yes, in RDBMS this can be done using CREATE VIEW. But if you write all the logic in this way, then it will not look very readable.
  • No Semantic Gap. Such a database operates with functions and classes (instead of tables and fields). In the same way as in classical programming (assuming that a method is a function with the first parameter in the form of a class to which it belongs). Accordingly, it should be much easier to β€œmake friends” with universal programming languages. In addition, this concept allows you to implement much more complex functions. For example, you can embed statements like this in the database:

    CONSTRAINT sold(Employee e, 1, 2019) > 100 IF name(e) = 'ΠŸΠ΅Ρ‚Ρ' MESSAGE  'Π§Ρ‚ΠΎ-Ρ‚ΠΎ ΠŸΠ΅Ρ‚я ΠΏΡ€ΠΎΠ΄Π°Π΅Ρ‚ ΡΠ»ΠΈΡˆΠΊΠΎΠΌ ΠΌΠ½ΠΎΠ³ΠΎ ΠΎΠ΄Π½ΠΎΠ³ΠΎ Ρ‚ΠΎΠ²Π°Ρ€Π° Π² 2019 Π³ΠΎΠ΄Ρƒ';

  • Inheritance and polymorphism. In a functional database, you can introduce multiple inheritance through the CLASS ClassP: Class1, Class2 constructs and implement multiple polymorphism. How exactly, perhaps I will write in the following articles.

Even though this is just a concept, we already have some implementation in Java that translates all functional logic into relational logic. Plus, the logic of representations and a lot of other things are beautifully screwed to it, thanks to which we get a whole platform. Essentially, we are using an RDBMS (only PostgreSQL so far) as a "virtual machine". This translation sometimes causes problems because the RDBMS query optimizer does not know certain statistics that the FDBMS does. In theory, it is possible to implement a database management system that will use a certain structure as storage, adapted specifically for functional logic.

Source: habr.com

Add a comment