SQL, what could be easier? Any of us can write a simple query: type select, list the required columns, then from and the table name, a few conditions in where - and that's it, useful data is in our pocket, (almost) regardless of which DBMS is under the hood at the time.
And let's dive right in with
Object relational mapping
Proponents of ORM traditionally value speed and ease of development, independence from the DBMS, and clean code. For many of us, the code for working with the database (and often the database itself) usually looks something like this:
@Entity
@Table(name = "stock", catalog = "maindb", uniqueConstraints = {
        @UniqueConstraint(columnNames = "STOCK_NAME"),
        @UniqueConstraint(columnNames = "STOCK_CODE") })
public class Stock implements java.io.Serializable {

    @Id
    @GeneratedValue(strategy = IDENTITY)
    @Column(name = "STOCK_ID", unique = true, nullable = false)
    public Integer getStockId() {
        return this.stockId;
    }
    ...
The model is wrapped in clever annotations, and somewhere behind the scenes a valiant ORM generates and executes tons of SQL. Notably, developers try with all their might to wall themselves off from their own database behind kilometers of abstractions, which in itself is telling.
On the other side of the barricades, adherents of pure handwritten SQL point to the ability to squeeze all the juice out of their DBMS without extra layers and abstractions. As a result, "data-centric" projects appear, where specially trained people deal with the database (the "database guys", "DBAs", and so on), and developers only need to "pull" ready-made views and stored procedures, without going into details.
And what if we took the best of both worlds? Here is how it's done in a wonderful tool with the life-affirming name Yesql.
Clojure is a great language for building DSLs - but SQL is itself a great DSL, and we don't need another one. S-expressions are wonderful, but here they add nothing new. The result is brackets for the sake of brackets. Don't agree? Then just wait for the moment when the abstraction over the database starts to leak, and you find yourself fighting it with a (raw-sql) escape hatch.
So what do we do? Let's leave SQL as plain SQL - one file per query:
-- name: users-by-country
select *
from users
where country_code = :country_code
...and then read that file, turning it into a normal Clojure function:
(defqueries "some/where/users_by_country.sql"
{:connection db-spec})
;;; A function with the name `users-by-country` has been created.
;;; Let's use it:
(users-by-country {:country_code "GB"})
;=> ({:name "Kris" :country_code "GB" ...} ...)
By adhering to the "SQL separate, Clojure separate" principle, you get:
- No syntactic surprises. Your database (like any other) does not comply 100% with the SQL standard - but Yesql doesn't care. You will never waste time hunting for the function equivalent of some SQL syntax. You'll never have to fall back to (raw-sql "some('funky'::SYNTAX)").
- Better editor support. Your editor already has excellent SQL support. By storing SQL as SQL you can just use it.
- Team Compatibility. Your DBAs can read and write the SQL you use in your Clojure project.
- Easier performance tuning. Need to build a plan for a problematic query? This is not a problem when your query is regular SQL.
- Reusing queries. Drag those same SQL files to other projects because it's just good old SQL - just share it.
In my opinion, the idea is very cool and at the same time very simple, which is why the project has gained a lot of popularity.
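For illustration only - Yesql itself is Clojure - the core trick is small enough to sketch in a few lines of Python. The function names below are mine, not Yesql's, and sqlite3 stands in for a real DBMS (it conveniently supports the same :named placeholders):

```python
import re
import sqlite3

def defqueries(sql_text, connection):
    """Toy imitation of Yesql's idea (NOT the real library):
    turn '-- name:'-annotated SQL into a dict of callables."""
    queries = {}
    # Split the file on '-- name: <query-name>' markers;
    # the capturing group keeps the names in the result list.
    parts = re.split(r"--\s*name:\s*(\S+)\s*\n", sql_text)
    # parts = ['', name1, body1, name2, body2, ...]
    for name, body in zip(parts[1::2], parts[2::2]):
        def run(params=None, _sql=body):
            return connection.execute(_sql, params or {}).fetchall()
        queries[name] = run
    return queries

# Demo on an in-memory SQLite database
db = sqlite3.connect(":memory:")
db.execute("create table users (name text, country_code text)")
db.execute("insert into users values ('Kris', 'GB'), ('Ann', 'DE')")

sql_file = """
-- name: users-by-country
select name, country_code
from users
where country_code = :country_code
"""
q = defqueries(sql_file, db)
print(q["users-by-country"]({"country_code": "GB"}))  # [('Kris', 'GB')]
```

The SQL stays plain SQL in its own text, and the host language only does the boring plumbing - which is exactly the division of labor Yesql argues for.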
IDE & DB managers
Let's start with a simple everyday task. We often need to find some object in the database - for example, locate a table in a schema and study its structure (columns, keys, indexes, constraints, and so on). This is the first thing we expect from any graphical IDE or lightweight DB manager: that it be fast, that we not have to wait half an hour for a window with the needed information to render (especially over a slow connection to a remote database), and that the information be fresh and relevant rather than cached junk. And the larger and more complex the databases - and the more of them there are - the harder this is to do.
But I usually toss the mouse aside and just write code. Say you want to know which tables (and with which properties) the "HR" schema contains. In most DBMSs, the desired result can be achieved with a simple query against information_schema:
select table_name
, ...
from information_schema.tables
where table_schema = 'HR'
From DBMS to DBMS, the contents of such catalog views vary with the capabilities of each system. For MySQL, for example, the same catalog yields table parameters specific to that DBMS:
select table_name
     , engine     -- storage engine in use ("MyISAM", "InnoDB" etc)
     , row_format -- row format ("Fixed", "Dynamic" etc)
     , ...
from information_schema.tables
where table_schema = 'HR'
Oracle does not know information_schema, but it has its own data dictionary views:
select table_name
     , pct_free      -- minimum free space in a data block (%)
     , pct_used      -- minimum used space in a data block (%)
     , last_analyzed -- date statistics were last gathered
     , ...
from all_tables
where owner = 'HR'
ClickHouse is no exception:
select name
, engine -- storage engine in use ("MergeTree", "Dictionary" etc)
, ...
from system.tables
where database = 'HR'
Something similar can be done in Cassandra (which has columnfamilies instead of tables and keyspaces instead of schemas):
select columnfamily_name
, compaction_strategy_class -- compaction (garbage collection) strategy
, gc_grace_seconds          -- tombstone lifetime
, ...
from system.schema_columnfamilies
where keyspace_name = 'HR'
For most other databases you can come up with similar queries (even Mongo has its own commands for listing collections and their properties).
Of course, this way you can get information not only about tables but about any object at all. From time to time kind people share such code for various databases, as, for example, in the habr article series "Functions for documenting PostgreSQL databases".
As a result, this way of navigating and searching for objects is much more flexible, saves a lot of time, and lets you get exactly the information you need, in exactly the form you need it right now.
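The idea isn't limited to the big servers, either. As a minimal runnable illustration (SQLite here purely because it ships with Python), the same "navigate by query" approach works against SQLite's catalog, the sqlite_master table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table persons (id integer primary key, name text)")
db.execute("create index persons_name_idx on persons(name)")

# SQLite's catalog is a plain queryable table:
# one row per table, index, view or trigger.
rows = db.execute(
    "select type, name, tbl_name from sqlite_master order by name"
).fetchall()
for type_, name, tbl in rows:
    print(type_, name, tbl)

# Column details come from a pragma rather than a view:
cols = db.execute("pragma table_info(persons)").fetchall()
print([c[1] for c in cols])  # ['id', 'name']
```

Same pattern, different catalog names - which is exactly why keeping these queries as editable SQL beats clicking through a fixed tree widget.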
Operations with objects
After we have found and studied the necessary objects, it's time to do something useful with them - naturally, again without taking our fingers off the keyboard.
It's no secret that simply dropping a table looks the same in almost any database:
drop table hr.persons
But creating a table is already more interesting. Almost any DBMS (including many NoSQL ones) can "create table" in one form or another, and the core of it (name, list of columns, data types) will differ little, but the remaining details can differ dramatically, depending on the internals and capabilities of the particular DBMS. My favorite example: the "bare" BNF diagrams for the "create table" syntax in the Oracle documentation alone go on for pages.
Also, many DBMSs have their own specific object types that other DBMSs lack. Moreover, we can operate not only on database objects but on the DBMS itself: "kill" a process, free a memory area, enable tracing, switch to "read only" mode, and much more.
And now let's draw a little
One of the most common tasks is to draw a diagram of database objects, to see the objects and the relationships between them in one pretty picture. Almost any graphical IDE, various command-line utilities, and specialized graphical modeling tools can do this. They will draw something for you "as best they can", and you can influence the process only slightly, through a few parameters in a config file or checkboxes in the UI.
But this problem can be solved much more simply, flexibly, and elegantly - and of course with code. To build diagrams of any complexity we have several specialized markup languages at once (DOT, GraphML etc), and for them a whole pile of applications (GraphViz, PlantUML, Mermaid) that read such instructions and render them in every imaginable format. And we already know how to get information about objects and the relationships between them.
Here is a small example of what this could look like using PlantUML and PostgreSQL:
select '@startuml'||chr(10)||'hide methods'||chr(10)||'hide stereotypes' union all
select distinct ccu.table_name || ' --|> ' ||
       tc.table_name as val
from information_schema.table_constraints as tc
join information_schema.key_column_usage as kcu
  on tc.constraint_name = kcu.constraint_name
join information_schema.constraint_column_usage as ccu
  on ccu.constraint_name = tc.constraint_name
where tc.constraint_type = 'FOREIGN KEY'
  and tc.table_name ~ '.*' union all
select '@enduml'
And with a little extra effort, you can build something much richer on the same foundation. The SQL query is a bit more complicated:
-- Header
select '@startuml
!define Table(name,desc) class name as "desc" << (T,#FFAAAA) >>
!define primary_key(x) <b>x</b>
!define unique(x) <color:green>x</color>
!define not_null(x) <u>x</u>
hide methods
hide stereotypes'
union all
-- Tables
select format('Table(%s, "%s\ninformation about %s") {', table_name, table_name, table_name) || chr(10) ||
       (select string_agg(column_name || ' ' || upper(udt_name), chr(10))
        from information_schema.columns
        where table_schema = 'public'
          and table_name = t.table_name) || chr(10) || '}'
from information_schema.tables t
where table_schema = 'public'
union all
-- Relationships between tables
select distinct ccu.table_name || ' "1" --> "0..N" ' || tc.table_name || format(' : "A %s may have many %s"', ccu.table_name, tc.table_name)
from information_schema.table_constraints as tc
join information_schema.key_column_usage as kcu on tc.constraint_name = kcu.constraint_name
join information_schema.constraint_column_usage as ccu on ccu.constraint_name = tc.constraint_name
where tc.constraint_type = 'FOREIGN KEY'
  and ccu.constraint_schema = 'public'
  and tc.table_name ~ '.*'
union all
-- Footer
select '@enduml'
If you look closely, many visualization tools use similar queries under the hood. True, those queries are usually buried deep inside the tools' code, where you can't easily adjust them.
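There is no magic here, and no heavyweight tool is required. As a toy end-to-end illustration (SQLite via Python this time; the pragma is SQLite's, the rest of the names are mine), the same "@startuml ... @enduml" text can be assembled from foreign-key metadata in a dozen lines:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table countries (code text primary key);
create table users (
    name text,
    country_code text references countries(code)
);
""")

lines = ["@startuml", "hide methods", "hide stereotypes"]
tables = [r[0] for r in db.execute(
    "select name from sqlite_master where type = 'table'")]
for table in tables:
    # pragma foreign_key_list returns one row per FK column:
    # (id, seq, referenced_table, from_col, to_col, ...)
    for fk in db.execute(f"pragma foreign_key_list({table})"):
        # Referenced table points at the referencing one,
        # matching the ccu.table_name --|> tc.table_name convention above.
        lines.append(f"{fk[2]} --|> {table}")
lines.append("@enduml")
print("\n".join(lines))
```

Feed the printed text to PlantUML and you get the relationship diagram; changing the picture means changing a query, not a tool.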
Metrics and monitoring
Let's move on to the traditionally difficult topic of database performance monitoring. I recall a little true story told to me by "one of my friends". On a certain project there lived a mighty DBA; few of the developers knew him personally or had ever laid eyes on him (even though, rumor had it, he worked somewhere in the next building). At hour "X", when a large retailer's production system once again started "feeling poorly", he would silently send screenshots of charts from Oracle Enterprise Manager, with the critical spots carefully circled in red marker "for clarity" (which, to put it mildly, didn't help much). And that "photograph" was what you had to treat. Meanwhile, no one else had access to the precious (in both senses of the word) Enterprise Manager, because the system is complex and expensive, and "the developers might poke at something and break everything". So the developers found the place and cause of the slowness empirically and rolled out a patch. If no new scary letter from the DBA arrived in the near future, everyone breathed a sigh of relief and went back to their current tasks (until the next Letter).
But the monitoring process can look much friendlier and more fun - and, most importantly, be accessible and transparent to everyone. At least its basic part, as a supplement to the main monitoring systems (which are certainly useful and in many cases irreplaceable). Any DBMS is happy to share information about its current state and performance, completely free of charge. In that same "bloody" Oracle DB, almost any performance information can be pulled from system views, from processes and sessions down to the state of the buffer cache.
So, armed with some metrics collector (Telegraf, Metricbeat, Collectd) that can run custom SQL queries, a store for those metrics (InfluxDB, Elasticsearch, TimescaleDB), and a visualizer (Grafana, Kibana), you get a fairly light and flexible monitoring system, tightly integrated with other system-wide metrics (gathered, for example, from the application server, the OS, and so on). This is how it's done, for example, in pgwatch2, which uses the InfluxDB + Grafana bundle plus a set of queries against system views, which you can also extend with your own.
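Under the hood, such a collector boils down to "run a SQL query on a timer and ship the numbers somewhere". A bare-bones sketch of that loop (the metric query and all names here are illustrative; sqlite3 stands in for a real DBMS, where you would query pg_stat_activity, v$session and friends instead):

```python
import sqlite3
import time

# Illustrative "metric" query; a real setup would poll the
# DBMS's own performance views instead.
METRIC_SQL = "select count(*) from sqlite_master"

def collect_once(db):
    """Run the metric query and return a (timestamp, value) sample."""
    value = db.execute(METRIC_SQL).fetchone()[0]
    return (time.time(), value)

def collector_loop(db, sink, interval_s=10, iterations=None):
    """Poll the database and push each sample into `sink`.
    A stand-in for what Telegraf/Metricbeat do with custom SQL inputs."""
    n = 0
    while iterations is None or n < iterations:
        sink.append(collect_once(db))
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)

db = sqlite3.connect(":memory:")
db.execute("create table t (x int)")
samples = []
collector_loop(db, samples, interval_s=0, iterations=3)
print(len(samples), samples[0][1])  # 3 samples, each with value 1
```

In a real deployment the `sink` would be a write to InfluxDB or Elasticsearch, and Grafana would take it from there - the SQL part stays exactly this simple.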
Summing up
And this is only a rough list of what can be done with our database using plain SQL code. I'm sure you can find many more applications - share them in the comments. And we'll talk about how (and, most importantly, why) to automate all of this and bake it into your CI/CD pipeline next time.
Source: habr.com