The database world has long been taken over by relational DBMSs that use the SQL language. So much so that emerging varieties are called NoSQL. They managed to win a certain place for themselves in this market, but relational DBMS are not going to die, and continue to be actively used for their own purposes.
In this article, I want to describe the concept of a functional database. For a better understanding, I will do this by comparing with the classical relational model. As examples, tasks from various SQL tests found on the Internet will be used.
Introduction
Relational databases operate on tables and fields. In a functional database, classes and functions will be used instead, respectively. A field in a table with N keys will be represented as a function of N parameters. Instead of links between tables, functions will be used that return objects of the class to which the link goes. Function composition will be used instead of JOIN.
Before proceeding directly to the tasks, I will describe the task of domain logic. For DDL, I will use PostgreSQL syntax. For functional its own syntax.
Tables and fields
A simple Sku object with name and price fields:
relational
CREATE TABLE Sku
(
id bigint NOT NULL,
name character varying(100),
price numeric(10,5),
CONSTRAINT id_pkey PRIMARY KEY (id)
)
Functional
CLASS Sku;
name = DATA STRING[100] (Sku);
price = DATA NUMERIC[10,5] (Sku);
We announce two Features, which take one Sku parameter as input and return a primitive type.
It is assumed that in a functional DBMS, each object will have some internal code that is automatically generated and can be accessed if necessary.
Let's set the price for the product / store / supplier. It can change over time, so let's add a time field to the table. I will skip the declaration of tables for directories in a relational database to shorten the code:
relational
CREATE TABLE prices
(
skuId bigint NOT NULL,
storeId bigint NOT NULL,
supplierId bigint NOT NULL,
dateTime timestamp without time zone,
price numeric(10,5),
CONSTRAINT prices_pkey PRIMARY KEY (skuId, storeId, supplierId)
)
Functional
CLASS Sku;
CLASS Store;
CLASS Supplier;
dateTime = DATA DATETIME (Sku, Store, Supplier);
price = DATA NUMERIC[10,5] (Sku, Store, Supplier);
Indexes
For the last example, let's build an index on all keys and date so that we can quickly find the price for a certain time.
relational
CREATE INDEX prices_date
ON prices
(skuId, storeId, supplierId, dateTime)
Functional
INDEX Sku sk, Store st, Supplier sp, dateTime(sk, st, sp);
Tasks
Let's start with relatively simple problems taken from the corresponding
First, let's declare the domain logic (for a relational database, this is done directly in the above article).
CLASS Department;
name = DATA STRING[100] (Department);
CLASS Employee;
department = DATA Department (Employee);
chief = DATA Employee (Employee);
name = DATA STRING[100] (Employee);
salary = DATA NUMERIC[14,2] (Employee);
Task 1.1
Display a list of employees who receive wages greater than those of the immediate supervisor.
relational
select a.*
from employee a, employee b
where b.id = a.chief_id
and a.salary > b.salary
Functional
SELECT name(Employee a) WHERE salary(a) > salary(chief(a));
Task 1.2
Display a list of employees earning the highest salary in their department
relational
select a.*
from employee a
where a.salary = ( select max(salary) from employee b
where b.department_id = a.department_id )
Functional
maxSalary 'ΠΠ°ΠΊΡΠΈΠΌΠ°Π»ΡΠ½Π°Ρ Π·Π°ΡΠΏΠ»Π°ΡΠ°' (Department s) =
GROUP MAX salary(Employee e) IF department(e) = s;
SELECT name(Employee a) WHERE salary(a) = maxSalary(department(a));
// ΠΈΠ»ΠΈ Π΅ΡΠ»ΠΈ "Π·Π°ΠΈΠ½Π»Π°ΠΉΠ½ΠΈΡΡ"
SELECT name(Employee a) WHERE
salary(a) = maxSalary(GROUP MAX salary(Employee e) IF department(e) = department(a));
Both implementations are equivalent. For the first case in the relational database, you can use CREATE VIEW, which in the same way will first calculate the maximum salary for a specific department in it. In the future, for clarity, I will use the first case, since it better reflects the solution.
Task 1.3
Display a list of department IDs, the number of employees in which does not exceed 3 people.
relational
select department_id
from employee
group by department_id
having count(*) <= 3
Functional
countEmployees 'ΠΠΎΠ»ΠΈΡΠ΅ΡΡΠ²ΠΎ ΡΠΎΡΡΡΠ΄Π½ΠΈΠΊΠΎΠ²' (Department d) =
GROUP SUM 1 IF department(Employee e) = d;
SELECT Department d WHERE countEmployees(d) <= 3;
Task 1.4
Display a list of employees who do not have an assigned manager working in the same department.
relational
select a.*
from employee a
left join employee b on (b.id = a.chief_id and b.department_id = a.department_id)
where b.id is null
Functional
SELECT name(Employee a) WHERE NOT (department(chief(a)) = department(a));
Task 1.5
Find the list of department IDs with the maximum total employee salary.
relational
with sum_salary as
( select department_id, sum(salary) salary
from employee
group by department_id )
select department_id
from sum_salary a
where a.salary = ( select max(salary) from sum_salary )
Functional
salarySum 'ΠΠ°ΠΊΡΠΈΠΌΠ°Π»ΡΠ½Π°Ρ Π·Π°ΡΠΏΠ»Π°ΡΠ°' (Department d) =
GROUP SUM salary(Employee e) IF department(e) = d;
maxSalarySum 'ΠΠ°ΠΊΡΠΈΠΌΠ°Π»ΡΠ½Π°Ρ Π·Π°ΡΠΏΠ»Π°ΡΠ° ΠΎΡΠ΄Π΅Π»ΠΎΠ²' () =
GROUP MAX salarySum(Department d);
SELECT Department d WHERE salarySum(d) = maxSalarySum();
Let's move on to more complex tasks from another
Task 2.1
What sellers sold more than 1997 pieces of item #30 in 1?
Domain logic (as before, we skip the declaration on the RDBMS):
CLASS Employee 'ΠΡΠΎΠ΄Π°Π²Π΅Ρ';
lastName 'Π€Π°ΠΌΠΈΠ»ΠΈΡ' = DATA STRING[100] (Employee);
CLASS Product 'ΠΡΠΎΠ΄ΡΠΊΡ';
id = DATA INTEGER (Product);
name = DATA STRING[100] (Product);
CLASS Order 'ΠΠ°ΠΊΠ°Π·';
date = DATA DATE (Order);
employee = DATA Employee (Order);
CLASS Detail 'Π‘ΡΡΠΎΠΊΠ° Π·Π°ΠΊΠ°Π·Π°';
order = DATA Order (Detail);
product = DATA Product (Detail);
quantity = DATA NUMERIC[10,5] (Detail);
relational
select LastName
from Employees as e
where (
select sum(od.Quantity)
from [Order Details] as od
where od.ProductID = 1 and od.OrderID in (
select o.OrderID
from Orders as o
where year(o.OrderDate) = 1997 and e.EmployeeID = o.EmployeeID)
) > 30
Functional
sold (Employee e, INTEGER productId, INTEGER year) =
GROUP SUM quantity(OrderDetail d) IF
employee(order(d)) = e AND
id(product(d)) = productId AND
extractYear(date(order(d))) = year;
SELECT lastName(Employee e) WHERE sold(e, 1, 1997) > 30;
Task 2.2
For each customer (first name, last name), find the two items (name) on which the customer spent the most money in 1997.
Extending the domain logic from the previous example:
CLASS Customer 'ΠΠ»ΠΈΠ΅Π½Ρ';
contactName 'Π€ΠΠ' = DATA STRING[100] (Customer);
customer = DATA Customer (Order);
unitPrice = DATA NUMERIC[14,2] (Detail);
discount = DATA NUMERIC[6,2] (Detail);
relational
SELECT ContactName, ProductName FROM (
SELECT c.ContactName, p.ProductName
, ROW_NUMBER() OVER (
PARTITION BY c.ContactName
ORDER BY SUM(od.Quantity * od.UnitPrice * (1 - od.Discount)) DESC
) AS RatingByAmt
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
JOIN [Order Details] od ON od.OrderID = o.OrderID
JOIN Products p ON p.ProductID = od.ProductID
WHERE YEAR(o.OrderDate) = 1997
GROUP BY c.ContactName, p.ProductName
) t
WHERE RatingByAmt < 3
Functional
sum (Detail d) = quantity(d) * unitPrice(d) * (1 - discount(d));
bought 'ΠΡΠΏΠΈΠ»' (Customer c, Product p, INTEGER y) =
GROUP SUM sum(Detail d) IF
customer(order(d)) = c AND
product(d) = p AND
extractYear(date(order(d))) = y;
rating 'Π Π΅ΠΉΡΠΈΠ½Π³' (Customer c, Product p, INTEGER y) =
PARTITION SUM 1 ORDER DESC bought(c, p, y), p BY c, y;
SELECT contactName(Customer c), name(Product p) WHERE rating(c, p, 1997) < 3;
The PARTITION operator works according to the following principle: it sums the expression specified after SUM (here 1) within the specified groups (here Customer and Year, but can be any expression), sorting within the groups according to the expressions specified in ORDER (here bought, and if are equal, then by the internal product code).
Task 2.3
How many goods need to be ordered from suppliers to fulfill current orders.
Let's extend the domain logic again:
CLASS Supplier 'ΠΠΎΡΡΠ°Π²ΡΠΈΠΊ';
companyName = DATA STRING[100] (Supplier);
supplier = DATA Supplier (Product);
unitsInStock 'ΠΡΡΠ°ΡΠΎΠΊ Π½Π° ΡΠΊΠ»Π°Π΄Π΅' = DATA NUMERIC[10,3] (Product);
reorderLevel 'ΠΠΎΡΠΌΠ° ΠΏΡΠΎΠ΄Π°ΠΆΠΈ' = DATA NUMERIC[10,3] (Product);
relational
select s.CompanyName, p.ProductName, sum(od.Quantity) + p.ReorderLevel β p.UnitsInStock as ToOrder
from Orders o
join [Order Details] od on o.OrderID = od.OrderID
join Products p on od.ProductID = p.ProductID
join Suppliers s on p.SupplierID = s.SupplierID
where o.ShippedDate is null
group by s.CompanyName, p.ProductName, p.UnitsInStock, p.ReorderLevel
having p.UnitsInStock < sum(od.Quantity) + p.ReorderLevel
Functional
orderedNotShipped 'ΠΠ°ΠΊΠ°Π·Π°Π½ΠΎ, Π½ΠΎ Π½Π΅ ΠΎΡΠ³ΡΡΠΆΠ΅Π½ΠΎ' (Product p) =
GROUP SUM quantity(OrderDetail d) IF product(d) = p;
toOrder 'Π Π·Π°ΠΊΠ°Π·Ρ' (Product p) = orderedNotShipped(p) + reorderLevel(p) - unitsInStock(p);
SELECT companyName(supplier(Product p)), name(p), toOrder(p) WHERE toOrder(p) > 0;
Task with an asterisk
And the last example is from me personally. There is the logic of a social network. People can be friends with each other and like each other. From a functional database perspective, this would look like this:
CLASS Person;
likes = DATA BOOLEAN (Person, Person);
friends = DATA BOOLEAN (Person, Person);
It is necessary to find possible candidates for friendship. More formally, you need to find all people A, B, C such that A is friends with B, and B is friends with C, A likes C, but A is not friends with C.
From a functional database point of view, the query would look like this:
SELECT Person a, Person b, Person c WHERE
likes(a, c) AND NOT friends(a, c) AND
friends(a, b) AND friends(b, c);
The reader is invited to independently solve this problem in SQL. It is assumed that there are far fewer friends than those who like. Therefore, they are in separate tables. In case of a successful solution, there is also a problem with two asterisks. Her friendship is not symmetrical. On a functional database it would look like this:
SELECT Person a, Person b, Person c WHERE
likes(a, c) AND NOT friends(a, c) AND
(friends(a, b) OR friends(b, a)) AND
(friends(b, c) OR friends(c, b));
UPD: solution of the problem with the first and second asterisk from
SELECT
pl.PersonAID
,pf.PersonAID
,pff.PersonAID
FROM Persons AS p
--ΠΠ°ΠΉΠΊΠΈ
JOIN PersonRelationShip AS pl ON pl.PersonAID = p.PersonID
AND pl.Relation = 'Like'
--ΠΡΡΠ·ΡΡ
JOIN PersonRelationShip AS pf ON pf.PersonAID = p.PersonID
AND pf.Relation = 'Friend'
--ΠΡΡΠ·ΡΡ ΠΡΡΠ·Π΅ΠΉ
JOIN PersonRelationShip AS pff ON pff.PersonAID = pf.PersonBID
AND pff.PersonBID = pl.PersonBID
AND pff.Relation = 'Friend'
--ΠΡΡ Π½Π΅ Π΄ΡΡΠΆΠ°Ρ
LEFT JOIN PersonRelationShip AS pnf ON pnf.PersonAID = p.PersonID
AND pnf.PersonBID = pff.PersonBID
AND pnf.Relation = 'Friend'
WHERE pnf.PersonAID IS NULL
;WITH PersonRelationShipCollapsed AS (
SELECT pl.PersonAID
,pl.PersonBID
,pl.Relation
FROM #PersonRelationShip AS pl
UNION
SELECT pl.PersonBID AS PersonAID
,pl.PersonAID AS PersonBID
,pl.Relation
FROM #PersonRelationShip AS pl
)
SELECT
pl.PersonAID
,pf.PersonBID
,pff.PersonBID
FROM #Persons AS p
--ΠΠ°ΠΉΠΊΠΈ
JOIN PersonRelationShipCollapsed AS pl ON pl.PersonAID = p.PersonID
AND pl.Relation = 'Like'
--ΠΡΡΠ·ΡΡ
JOIN PersonRelationShipCollapsed AS pf ON pf.PersonAID = p.PersonID
AND pf.Relation = 'Friend'
--ΠΡΡΠ·ΡΡ ΠΡΡΠ·Π΅ΠΉ
JOIN PersonRelationShipCollapsed AS pff ON pff.PersonAID = pf.PersonBID
AND pff.PersonBID = pl.PersonBID
AND pff.Relation = 'Friend'
--ΠΡΡ Π½Π΅ Π΄ΡΡΠΆΠ°Ρ
LEFT JOIN PersonRelationShipCollapsed AS pnf ON pnf.PersonAID = p.PersonID
AND pnf.PersonBID = pff.PersonBID
AND pnf.Relation = 'Friend'
WHERE pnf.[PersonAID] IS NULL
Conclusion
It should be noted that the above syntax of the language is just one of the options for implementing the above concept. It was SQL that was taken as the basis, and the goal was to make it as similar as possible to it. Of course, someone may not like the names of keywords, case of words, and so on. The main thing here is the concept itself. If desired, you can make both C ++ and Python similar syntax.
The described database concept, in my opinion, has the following advantages:
- Ease. This is a relatively subjective indicator that is not obvious in simple cases. But if you look at more complex cases (for example, tasks with asterisks), then, in my opinion, writing such queries is much easier.
- Encapsulation. In some examples, I declared intermediate functions (for example, sold, bought etc.), from which subsequent functions were built. This allows you to change the logic of certain functions, if necessary, without changing the logic of those that depend on them. For example, you can make sales sold were calculated from completely different objects, while the rest of the logic will not change. Yes, in RDBMS this can be done using CREATE VIEW. But if you write all the logic in this way, then it will not look very readable.
- No Semantic Gap. Such a database operates with functions and classes (instead of tables and fields). In the same way as in classical programming (assuming that a method is a function with the first parameter in the form of a class to which it belongs). Accordingly, it should be much easier to βmake friendsβ with universal programming languages. In addition, this concept allows you to implement much more complex functions. For example, you can embed statements like this in the database:
CONSTRAINT sold(Employee e, 1, 2019) > 100 IF name(e) = 'ΠΠ΅ΡΡ' MESSAGE 'Π§ΡΠΎ-ΡΠΎ ΠΠ΅ΡΡ ΠΏΡΠΎΠ΄Π°Π΅Ρ ΡΠ»ΠΈΡΠΊΠΎΠΌ ΠΌΠ½ΠΎΠ³ΠΎ ΠΎΠ΄Π½ΠΎΠ³ΠΎ ΡΠΎΠ²Π°ΡΠ° Π² 2019 Π³ΠΎΠ΄Ρ';
- Inheritance and polymorphism. In a functional database, you can introduce multiple inheritance through the CLASS ClassP: Class1, Class2 constructs and implement multiple polymorphism. How exactly, perhaps I will write in the following articles.
Even though this is just a concept, we already have some implementation in Java that translates all functional logic into relational logic. Plus, the logic of representations and a lot of other things are beautifully screwed to it, thanks to which we get a whole
Source: habr.com