"Database as Code" Experience

"Database as Code" Experience

SQL, what could be easier? Any of us can write a simple query: type select, list the required columns, then from, the table name, a few conditions in where, and that's it - useful data is in our pocket, (almost) regardless of which DBMS is under the hood at the time (or whether it's a DBMS at all). As a result, working with almost any data source (relational or not quite) can be treated as ordinary code, with all the consequences: version control, code review, static analysis, autotests and so on. And this applies not only to the data itself, schemas and migrations, but to the entire life of the storage in general. In this article we'll talk about the everyday tasks and problems of working with various databases through the lens of "database as code".
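For instance, that whole "useful data in our pocket" scenario fits in a few lines (a toy sketch; the users table here is purely illustrative):

-- The "what could be easier" query spelled out; table and columns are made up
select name
     , country_code
  from users
 where country_code = 'GB'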

And let's start right with ORM. The first "SQL vs ORM" battles were fought back in time immemorial.

Object-relational mapping

Proponents of ORM traditionally value speed and ease of development, independence from a particular DBMS, and clean code. For many of us, the code for working with the database (and often the database itself) usually looks something like this...

import javax.persistence.*;
import static javax.persistence.GenerationType.IDENTITY;

@Entity
@Table(name = "stock", catalog = "maindb", uniqueConstraints = {
        @UniqueConstraint(columnNames = "STOCK_NAME"),
        @UniqueConstraint(columnNames = "STOCK_CODE") })
public class Stock implements java.io.Serializable {

    private Integer stockId;

    @Id
    @GeneratedValue(strategy = IDENTITY)
    @Column(name = "STOCK_ID", unique = true, nullable = false)
    public Integer getStockId() {
        return this.stockId;
    }
  ...

The model is wrapped in clever annotations, and somewhere behind the scenes a valiant ORM generates and executes tons of SQL code. By the way, developers try with all their might to isolate themselves from their database behind kilometers of abstractions, which hints at a certain "dislike of SQL".
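Just to make those "tons of SQL" tangible: for the entity above, an ORM might emit something like the following behind the scenes (an illustrative sketch; the real statements depend on the ORM, dialect and mapping settings):

-- A sketch of ORM-generated SQL for the Stock entity above;
-- the actual statements vary by ORM and configuration
insert into maindb.stock (stock_name, stock_code)
values (?, ?);

select s.stock_id, s.stock_name, s.stock_code
  from maindb.stock s
 where s.stock_id = ?;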

On the other side of the barricades, adherents of pure "handmade" SQL point out the ability to squeeze all the juice out of their DBMS without extra layers and abstractions. As a result, "data-centric" projects appear, where the database is handled by specially trained people (the "database folk", "DBAs", and so on), and developers only have to "pull" ready-made views and stored procedures without going into details.

And what if we take the best of both worlds? That's how it's done in a wonderful tool with the life-affirming name Yesql. I'll quote a couple of lines from its general concept in my free translation; you can get acquainted with it in more detail here.

Clojure is a cool language for building DSLs, but SQL is itself a cool DSL, and we don't need another one. S-expressions are great, but they add nothing new here. As a result, we get brackets for the sake of brackets. Don't agree? Then wait for the moment when the abstraction over the database starts leaking and you begin fighting with the (raw-sql) function.

So what to do? Let's leave SQL as normal SQL - one file per query:

-- name: users-by-country
select *
  from users
 where country_code = :country_code

...and then read that file, turning it into a normal Clojure function:

(defqueries "some/where/users_by_country.sql"
   {:connection db-spec})

;;; A function with the name `users-by-country` has been created.
;;; Let's use it:
(users-by-country {:country_code "GB"})
;=> ({:name "Kris" :country_code "GB" ...} ...)

By adhering to the "SQL separate, Clojure separate" principle, you get:

  • No syntactic surprises. Your database (like any other) is not 100% SQL compliant, but Yesql doesn't care. You will never waste time hunting for functions with SQL-equivalent syntax. You never have to revert to (raw-sql "some('funky'::SYNTAX)").
  • Better editor support. Your editor already has excellent SQL support. By storing SQL as SQL you can just use it.
  • Team compatibility. Your DBAs can read and write the SQL you use in your Clojure project.
  • Easier performance tuning. Need to build a plan for a problematic query? This is not a problem when your query is regular SQL.
  • Reusing queries. Drag those same SQL files to other projects because it's just good old SQL - just share it.

In my opinion, the idea is very cool and at the same time very simple, which is why the project has gained many followers in a variety of languages. Next, we'll try to apply a similar philosophy of separating SQL code from everything else far beyond ORM.

IDE & DB managers

Let's start with a simple everyday task. We often have to look for objects in the database, for example, find a table in a schema and study its structure (which columns, keys, indexes, constraints, etc. it uses). And that is the first thing we expect from any graphical IDE or small DB manager: that it is fast, that we don't have to wait half an hour for a window with the necessary information to render (especially over a slow connection to a remote database), and that the information received is fresh and relevant rather than cached junk. Moreover, the more complex and larger the databases, and the greater their number, the harder this is to achieve.

But usually I throw the mouse far away and just write code. Let's say you want to know which tables (and with which properties) are contained in the "HR" schema. In most DBMSs, the desired result can be achieved with a simple query against information_schema:

select table_name
     , ...
  from information_schema.tables
 where table_schema = 'HR'

From database to database, the contents of such lookup tables vary with the capabilities of each DBMS. For example, in MySQL, from the same catalog you can get table parameters specific to this DBMS:

select table_name
     , engine      -- Storage engine ("MyISAM", "InnoDB" etc)
     , row_format  -- Row format ("Fixed", "Dynamic" etc)
     , ...
  from information_schema.tables
 where table_schema = 'HR'

Oracle doesn't know information_schema, but it has Oracle metadata, so there are no big problems:

select table_name
     , pct_free       -- Minimum free space in a data block (%)
     , pct_used       -- Minimum used space in a data block (%)
     , last_analyzed  -- Date statistics were last gathered
     , ...
  from all_tables
 where owner = 'HR'

ClickHouse is no exception:

select name
     , engine -- Storage engine ("MergeTree", "Dictionary" etc)
     , ...
  from system.tables
 where database = 'HR'

Something similar can be done in Cassandra (which has column families instead of tables and keyspaces instead of schemas):

select columnfamily_name
     , compaction_strategy_class  -- Compaction strategy
     , gc_grace_seconds           -- Tombstone garbage lifetime (seconds)
     , ...
  from system.schema_columnfamilies
 where keyspace_name = 'HR'

For most other databases, you can also come up with similar queries (even Mongo has a special system collection that contains information about all collections in the system).

Of course, this way you can get information not only about tables, but about any object in general. From time to time kind people share such code for different databases, as, for example, in the habr article series "Functions for documenting PostgreSQL databases" (parts one, two, three). Of course, keeping this whole mountain of queries in your head and constantly retyping them is a dubious pleasure, so in my favorite IDE/editor I keep a pre-prepared set of snippets for frequently used queries, and all that remains is to type the object names into the template.
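For example, one of these snippets might look like this (a PostgreSQL/information_schema sketch of my own; the schema and table names are just placeholders to be filled into the template):

-- Describe the columns of a given table; fill in the placeholders
select column_name
     , data_type
     , is_nullable
     , column_default
  from information_schema.columns
 where table_schema = 'hr'       -- <- schema name goes here
   and table_name   = 'persons'  -- <- table name goes here
 order by ordinal_position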

As a result, this way of navigating and searching for objects is much more flexible, saves a lot of time, and lets you get exactly the information you need, in exactly the form you need it right now (as described, for example, in the post "Export data from a database in any format: what IDEs on the IntelliJ platform can do").

Operations with objects

After we have found and studied the necessary objects, it's time to do something useful with them. Naturally, still without taking our fingers off the keyboard.

It's no secret that simply dropping a table looks the same in almost all databases:

drop table hr.persons

But creating a table is already more interesting. Almost any DBMS (including many NoSQL ones) can "create table" in one form or another, and the main part will differ little (name, list of columns, data types), but the remaining details can differ dramatically and depend on the internal design and capabilities of the particular DBMS. My favorite example: in the Oracle documentation, the "bare" BNF diagrams for the "create table" syntax alone occupy 31 pages. Other DBMSs have more modest capabilities, but each of them also offers plenty of interesting and unique table-creation features (postgres, mysql, cockroach, cassandra). It is unlikely that a graphical "wizard" in yet another IDE (especially a universal one) can fully cover all these abilities, and if it can, the result will not be a sight for the faint of heart. At the same time, a correctly and timely written create table statement lets you easily use all of them, making the storage of and access to your data reliable, optimal and as comfortable as possible.
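As a tiny illustration, here is a sketch of a PostgreSQL-flavored create table with a couple of engine-specific knobs (the table and the parameter values are mine, purely for illustration):

-- Illustrative only: names and parameter values are made up
create table hr.persons (
    person_id  integer generated always as identity primary key,
    full_name  text        not null,
    hired_at   timestamptz not null default now()
)
with (fillfactor = 90,                        -- keep free space in each page for updates
      autovacuum_vacuum_scale_factor = 0.05)  -- vacuum this table more aggressively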

Also, many DBMSs have their own specific object types that are not available in other DBMSs. Moreover, we can perform operations not only on database objects, but on the DBMS itself: "kill" a process, free a memory area, enable tracing, switch to "read only" mode, and much more.
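A sketch of what such an operation on the DBMS itself might look like (PostgreSQL; the one-hour threshold is an arbitrary illustrative choice):

-- Terminate sessions stuck "idle in transaction" for over an hour
select pg_terminate_backend(pid)
  from pg_stat_activity
 where state = 'idle in transaction'
   and state_change < now() - interval '1 hour'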

And now let's draw a little

One of the most common tasks is to build a diagram of database objects, to see the objects and the relationships between them in one beautiful picture. Almost any graphical IDE, separate command-line utilities, and specialized graphical tools and modelers can do this. They will draw something for you "as best they can", and you can influence the process only slightly, via a few parameters in a configuration file or checkboxes in the interface.

But this problem can be solved in a much simpler, more flexible and elegant way, and of course with the help of code. To build diagrams of any complexity, we have several specialized markup languages at once (DOT, GraphML etc), and for them a whole bunch of applications (GraphViz, PlantUML, Mermaid) that can read such instructions and render them in a variety of formats. And we already know how to get information about objects and the relationships between them.

Here is a small example of what this could look like, using PlantUML and the demo database for PostgreSQL. The SQL query below generates the required instructions for PlantUML, which then renders them into a diagram:

"Database as Code" Experience

select '@startuml'||chr(10)||'hide methods'||chr(10)||'hide stereotypes' union all
select distinct ccu.table_name || ' --|> ' ||
       tc.table_name as val
  from information_schema.table_constraints as tc
  join information_schema.key_column_usage as kcu
    on tc.constraint_name = kcu.constraint_name
  join information_schema.constraint_column_usage as ccu
    on ccu.constraint_name = tc.constraint_name
 where tc.constraint_type = 'FOREIGN KEY'
   and tc.table_name ~ '.*' union all
select '@enduml'

And with a little more effort, based on the ER template for PlantUML, you can get something very similar to a real ER diagram:

The SQL query is a little more complicated:

-- Header
select '@startuml
        !define Table(name,desc) class name as "desc" << (T,#FFAAAA) >>
        !define primary_key(x) <b>x</b>
        !define unique(x) <color:green>x</color>
        !define not_null(x) <u>x</u>
        hide methods
        hide stereotypes'
 union all
-- Tables
select format('Table(%s, "%s\ninformation about %s") {'||chr(10), table_name, table_name, table_name) ||
       (select string_agg(column_name || ' ' || upper(udt_name), chr(10))
          from information_schema.columns
         where table_schema = 'public'
           and table_name = t.table_name) || chr(10) || '}'
  from information_schema.tables t
 where table_schema = 'public'
 union all
-- Relationships between tables
select distinct ccu.table_name || ' "1" --> "0..N" ' || tc.table_name || format(' : "A %s may have many %s"', ccu.table_name, tc.table_name)
  from information_schema.table_constraints as tc
  join information_schema.key_column_usage as kcu on tc.constraint_name = kcu.constraint_name
  join information_schema.constraint_column_usage as ccu on ccu.constraint_name = tc.constraint_name
 where tc.constraint_type = 'FOREIGN KEY'
   and ccu.constraint_schema = 'public'
   and tc.table_name ~ '.*'
 union all
-- Footer
select '@enduml'

"Database as Code" Experience

If you look closely, many visualization tools use similar queries under the hood. True, these queries are usually "hardwired" deep into the application code and are difficult to understand, not to mention modify.

Metrics and monitoring

Let's move on to a traditionally difficult topic: database performance monitoring. I remember a little true story told to me by "one of my friends". On one project there lived a certain mighty DBA, and few of the developers knew him personally or had ever laid eyes on him (even though, according to rumors, he worked somewhere in a neighboring building). At hour "X", when the production system of a large retailer once again started to "feel bad", he would silently send screenshots of graphs from Oracle Enterprise Manager, on which he had carefully highlighted the critical spots with a red marker for "clarity" (which, to put it mildly, didn't help much). And this "photo" was what everyone had to work from. Meanwhile, nobody else had access to the precious (in both senses of the word) Enterprise Manager, because the system is complex and expensive, and "the developers might stumble into something and break everything". So the developers would empirically locate the place and cause of the slowdown and release a patch. If no new terrible letter from the DBA arrived in the near future, everyone breathed a sigh of relief and went back to their current tasks (until the next Letter).

But the monitoring process can look much more fun and friendly, and most importantly, accessible and transparent to everyone. At least its basic part, as an addition to the main monitoring systems (which are certainly useful and in many cases irreplaceable). Any DBMS is freely and absolutely free of charge ready to share information about its current state and performance. In the same "bloody enterprise" Oracle DB, almost any performance information can be obtained from system views, from processes and sessions down to the state of the buffer cache (for example, DBA Scripts, the "Monitoring" section). PostgreSQL also has a whole bunch of system views for database monitoring, in particular ones indispensable in the daily life of any DBA, such as pg_stat_activity, pg_stat_database and pg_stat_bgwriter. MySQL even has a separate schema for this, performance_schema. And in Mongo, the built-in profiler aggregates performance data into the system collection system.profile.
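For instance, a quick look at the longest-running active queries costs one select against pg_stat_activity (a PostgreSQL sketch; the column list is a matter of taste):

-- Show the longest-running non-idle sessions right now
select pid
     , now() - query_start as duration
     , state
     , left(query, 60)     as query_head
  from pg_stat_activity
 where state <> 'idle'
 order by duration desc nulls last
 limit 10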

Thus, armed with some kind of metrics collector (Telegraf, Metricbeat, Collectd) that can execute custom SQL queries, a storage for these metrics (InfluxDB, Elasticsearch, TimescaleDB) and a visualizer (Grafana, Kibana), you can get a fairly lightweight and flexible monitoring system that is tightly integrated with other system-wide metrics (obtained, for example, from the application server, the OS, etc.). This is how it's done, for example, in pgwatch2, which uses the InfluxDB + Grafana bundle and a set of queries against system views, to which you can also add custom queries.
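The kind of custom query you might register in such a collector could look like this (a sketch against pg_stat_database; the cache-hit metric is a common illustrative choice, not something pgwatch2 prescribes):

-- Buffer cache hit ratio and transaction counters per database
select datname
     , round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as cache_hit_pct
     , xact_commit    -- committed transactions
     , xact_rollback  -- rolled-back transactions
  from pg_stat_database
 where datname not like 'template%'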

Summing up

And this is only a rough list of what can be done with our database using ordinary SQL code. I'm sure you can find many more uses; write them in the comments. And we'll talk about how (and most importantly, why) to automate all of this and include it in your CI/CD pipeline next time.

Source: habr.com
