Security and DBMS: what to remember when choosing protection tools

My name is Denis Rozhkov, I am the head of software development at Gazinformservice, on the Jatoba product team. Legislation and corporate regulations impose certain requirements on the security of data storage. Nobody wants third parties to gain access to confidential information, so the following issues matter for any project: identification and authentication, data access control, ensuring the integrity of information in the system, and logging of security events. That is why I want to talk about some interesting points concerning DBMS security.

The article was prepared after a talk at @DatabasesMeetup, organized by Mail.ru Cloud Solutions. If you prefer not to read, you can watch:


The article will have three parts:

  • How to secure connections.
  • What auditing of actions is, and how to record what happens on the side of the database and of connections to it.
  • How to protect data in the database itself and what technologies are available for this.

The three pillars of DBMS security: connection protection, activity auditing, and data protection

Connection security

You can connect to the database either directly or indirectly, through web applications. As a rule, the user from the business side, that is, the person who works with the DBMS, interacts with it indirectly.

Before talking about securing connections, you need to answer important questions that determine how security measures will be built:

  • whether one business user is equivalent to one DBMS user;
  • whether access to DBMS data is provided only through an API that you control, or whether there is direct access to tables;
  • whether the DBMS is separated into a separate protected segment, who interacts with it and how;
  • whether connection pooling, proxying, or other intermediate layers are used that can obscure how the connection is established and who is actually using the database.

Now let's see what tools can be used to secure connections:

  1. Use database firewall class solutions. At a minimum, this additional layer of protection will increase the transparency of what is happening in the DBMS; at a maximum, it will let you provide additional data protection.
  2. Use password policies. How you apply them depends on how your architecture is built. In any case, a single password in the configuration file of a web application that connects to the DBMS is not enough for protection. There are a number of DBMS tools that let you enforce that users and passwords are updated.

    You can read more about user assessment features here; you can also learn about MS SQL Vulnerability Assessment here.

  3. Enrich the session context with the right information. If the session is opaque and you do not understand who is working in the DBMS within it, you can add information about who is doing what and why as part of the operation being performed; this information can then be seen in the audit (see the sketch after this list).
  4. Configure SSL if the DBMS is not separated from end users at the network level, for example if it is not in a separate VLAN. In such cases, it is imperative to protect the channel between the consumer and the DBMS itself. Security tools for this exist, including open source ones.
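
For PostgreSQL, item 3 can be approached roughly like the sketch below: the application tags the session with the real business user and a request identifier through application_name and custom parameters. The parameter names app.business_user and app.request_id, and the values, are illustrative assumptions, not a standard.

-- set by the application once per session (or per transaction with SET LOCAL)
SET application_name = 'billing-ui';                        -- visible in pg_stat_activity and in logs via %a
SELECT set_config('app.business_user', 'i.ivanov', false);  -- custom parameter, name is illustrative
SELECT set_config('app.request_id', 'req-42', false);

-- read back later, for example in audit queries or triggers
SELECT current_setting('app.business_user', true) AS business_user,
       current_setting('app.request_id', true)   AS request_id;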

How will this affect the performance of the DBMS?

Let's look at a PostgreSQL example: how SSL affects CPU load, how much it increases timings and decreases TPS, and whether it consumes too many resources when enabled.

We load PostgreSQL using pgbench, a simple program for running benchmarks. It executes the same sequence of commands repeatedly, possibly in parallel database sessions, and then calculates the average transaction rate.
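
For reference, a hedged sketch of the server-side settings a test like this might rely on; file names and the subnet are illustrative, and on PostgreSQL 12+ clientcert takes verify-ca or verify-full instead of 1.

# postgresql.conf
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file  = 'server.key'
ssl_ca_file   = 'rootCA.crt'   # CA used to verify client certificates

# pg_hba.conf: require TLS plus a valid client certificate for this subnet
hostssl  taskdb  postgres  192.168.220.0/24  md5  clientcert=1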

Test 1 without SSL and using SSL - the connection is established with each transaction:

pgbench.exe --connect -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres sslmode=require 
sslrootcert=rootCA.crt sslcert=client.crt sslkey=client.key"

vs

pgbench.exe --connect -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres"

Test 2 without SSL and using SSL - all transactions are performed in one connection:

pgbench.exe -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres sslmode=require
sslrootcert=rootCA.crt sslcert=client.crt sslkey=client.key"

vs

pgbench.exe -c 10 -t 5000 "host=192.168.220.129 dbname=taskdb user=postgres"

Other settings:

scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 5000
number of transactions actually processed: 50000/50000

Test results:

 
                                          NO SSL         SSL

A connection is established on every transaction
latency average                           171.915 ms     187.695 ms
tps including connections establishing    58.168112      53.278062
tps excluding connections establishing    64.084546      58.725846
CPU                                       24%            28%

All transactions are performed in one connection
latency average                           6.722 ms       6.342 ms
tps including connections establishing    1587.657278    1576.792883
tps excluding connections establishing    1588.380574    1577.694766
CPU                                       17%            21%

At low loads, the influence of SSL is comparable to the measurement error. If the amount of data being transferred is very large, the situation may be different. If we establish one connection per transaction (which is rare, usually the connection is shared between users) and there are a large number of connections and disconnections, the impact may be somewhat larger. That is, there may be a risk of performance degradation, but the difference is not so large that it justifies going without protection.

Please note that there is a significant difference between the modes of operation: working within one session versus separate ones. This is understandable: resources are spent on creating each connection.

We had a case when we connected Zabbix in trust mode, that is, we did not check md5 and no authentication was needed. Then the customer asked to enable md5 authentication. This put a heavy load on the CPU, and performance dipped. We began to look for ways to optimize. One possible solution is to implement a network restriction: create separate VLANs for the DBMS, add settings so that it is clear who connects from where, and remove authentication (see the sketch below). You can also tune authentication settings to reduce the cost of enabling authentication, but in general, different authentication methods affect performance differently, and these factors need to be taken into account when sizing the server hardware for a DBMS.
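
A hedged pg_hba.conf sketch of that trade-off (database names, roles, and addresses are illustrative): the monitoring account connects without a password check, but only from a single tightly controlled address, while everyone else still authenticates.

# pg_hba.conf
# monitoring host in a dedicated VLAN: no password check, access limited by network
host  zabbix_db  zabbix  10.10.50.10/32    trust

# all other clients authenticate (md5 here; scram-sha-256 where supported)
host  all        all     192.168.220.0/24  md5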

Conclusion: in a number of solutions, even small nuances of authentication can greatly affect a project, and it is bad when this only becomes clear after deployment to production.

Action audit

Auditing is not limited to the DBMS. An audit means collecting information about what is happening in different segments: this can also include a database firewall and the operating system on which the DBMS runs.

In commercial enterprise-level DBMS, auditing is well covered; in open source, not always. Here is what PostgreSQL offers:

  • default log - built-in logging;
  • extensions: pgaudit - if default logging is not enough for you, it adds separate settings that solve some of these problems.

Addition to the report in the video:

“Basic statement logging can be provided by the standard logger with log_statement = all.

This is acceptable for monitoring and other uses, but does not provide the level of detail normally required for auditing.

It is not enough to have a list of all operations performed on the database.

It should also be possible to find specific statements that are of interest to the auditor.

The standard logger shows what the user requested, while pgAudit focuses on the details of what happened when the database executed the request.

For example, an auditor may want to verify that a particular table was created in a documented maintenance window.

This might seem like a simple task for basic auditing and grep, but what if you were presented with something like this (intentionally confusing) example:

DO $$
BEGIN
EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END$$;

Standard logging will give you this:

LOG: statement: DO $$
BEGIN
EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END$$;

It appears that finding the table of interest may require some code knowledge in cases where tables are created dynamically.

This is not ideal as it would be preferable to just search by table name.

This is where pgAudit comes in handy.

For the same input, it will give this output in the log:

AUDIT: SESSION,33,1,FUNCTION,DO,,,"DO $$
BEGIN
EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END$$;"
AUDIT: SESSION,33,2,DDL,CREATE TABLE,TABLE,public.important_table,CREATE TABLE important_table (id INT)

Not only the DO block is logged, but also the full text of the CREATE TABLE with the statement type, object type, and fully qualified name, which makes it easier to search.

When logging SELECT and DML statements, pgAudit can be configured to log a separate entry for each relation referenced in a statement.

No parsing is required to find all statements that refer to a particular table(*). "
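
To get AUDIT entries like the ones above, the extension has to be preloaded and configured; a minimal sketch follows (the chosen statement classes are just an example, not a recommendation).

# postgresql.conf (requires a server restart)
shared_preload_libraries = 'pgaudit'
pgaudit.log = 'ddl, write'     # statement classes to audit: READ, WRITE, FUNCTION, ROLE, DDL, MISC, ALL
pgaudit.log_relation = on      # log a separate entry for each relation referenced by a statement

-- then, in the target database:
CREATE EXTENSION pgaudit;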

How will this affect the performance of the DBMS?

Let's run tests with full auditing enabled and see what happens to PostgreSQL performance. We enable maximum database logging for all parameters.

We change almost nothing in the configuration file; the important part is turning on debug5 mode to get the maximum amount of information.

postgresql.conf

log_destination = 'stderr'
logging_collector = on
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
log_min_messages = debug5
log_min_error_statement = debug5
log_min_duration_statement = 0
debug_print_parse = on
debug_print_rewritten = on
debug_print_plan = on
debug_pretty_print = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = on
log_hostname = on
log_lock_wait = on
log_replication_commands = on
log_temp_files = 0
log_timezone = 'Europe/Moscow'

On a PostgreSQL DBMS with 1 CPU at 2.8 GHz, 2 GB RAM, and a 40 GB HDD, we perform three load tests using the commands:

$ pgbench -p 3389 -U postgres -i -s 150 benchmark
$ pgbench -p 3389 -U postgres -c 50 -j 2 -P 60 -T 600 benchmark
$ pgbench -p 3389 -U postgres -c 150 -j 2 -P 60 -T 600 benchmark

Test results:

                                    No logging    With logging

Total database filling time         43.74 sec     53.23 sec
RAM                                 24%           40%
CPU                                 72%           91%

Test 1 (50 connections)
Number of transactions in 10 min    74169         32445
Transactions/sec                    123           54
Average latency                     405 ms        925 ms

Test 2 (150 connections with 100 possible)
Number of transactions in 10 min    81727         31429
Transactions/sec                    136           52
Average latency                     550 ms        1432 ms

Sizes
Database size                       2251 MB       2262 MB
Database log size                   0 MB          4587 MB

The takeaway: a full audit is not a great option. The audit data can end up as large as the data in the database itself, or even larger. The amount of logging generated when working with a DBMS is a common problem in production.

Let's look at the other metrics:

  • The filling speed does not change much: 43.74 seconds without logging versus 53.23 seconds with logging.
  • RAM and CPU usage go up, since an audit file has to be generated. This is also noticeable in throughput.

With an increase in the number of connections, of course, the performance will deteriorate slightly.

For corporations, auditing is even more difficult:

  • there is a lot of data;
  • the audit is needed not only via syslog to a SIEM, but also in files: if something happens to syslog, there must be a file close to the database where the data is preserved (see the sketch after this list);
  • the audit needs a separate storage shelf so that disk I/O does not degrade, since it takes up a lot of space;
  • it happens that information security staff require GOST compliance everywhere, including GOST-compliant identification.
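
A hedged postgresql.conf sketch of such duplication (the path and syslog facility are illustrative): stderr is captured to files by the logging collector, while a copy goes to syslog and on to the SIEM.

log_destination = 'stderr, syslog'
logging_collector = on
log_directory = '/var/log/postgresql_audit'   # ideally a separate volume so audit I/O does not compete with data
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'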

Data Access Restriction

Let's look at the technologies used to protect data and access to it in commercial and open source DBMS.

What can be used in general:

  1. Encryption and obfuscation of procedures and functions (wrapping): separate tools and utilities that turn readable code into unreadable code. True, it can then be neither changed nor refactored back. This approach is sometimes required, at least on the DBMS side: licensing-restriction logic or authorization logic is encrypted at the level of procedures and functions.
  2. Restricting the visibility of data by row (RLS): different users see the same table, but the set of rows in it differs, that is, certain rows cannot be shown to certain users (see the sketch after this list).
  3. Masking the displayed data: in a given column of the table, users see either the data or only asterisks, that is, the information is hidden from some users. The technology determines what to show to which user based on their access level.
  4. Security DBA / Application DBA / DBA access control is more about restricting access to the DBMS itself: information security employees can be separated from database administrators and application administrators. In open source there are few such technologies; commercial DBMS have plenty. They are needed when there are many users with access to the servers themselves.
  5. Restricting access to files at the file system level. You can grant rights and directory access privileges so that each administrator gets access only to the data they need.
  6. Mandatory access control and memory cleanup: these technologies are rarely used.
  7. End-to-end encryption directly by the DBMS: client-side encryption with key management on the server side.
  8. Data encryption. For example, column encryption: using a mechanism that encrypts a single column of the database.
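
To make items 2 and 3 concrete, here is a hedged PostgreSQL sketch (table, column, and role names are illustrative): row-level security so that each manager sees only their own rows, plus a simple masking view on top. Note that RLS does not restrict the table owner or superusers by default.

CREATE TABLE orders (id int, manager text, amount numeric);

-- RLS: each DBMS user sees only the rows where they are the manager
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY orders_by_manager ON orders
    USING (manager = current_user);

-- Masking: the amount is shown only to a couple of privileged logins, others see asterisks
CREATE VIEW orders_masked AS
SELECT id,
       manager,
       CASE WHEN current_user IN ('finance', 'security_dba')
            THEN amount::text
            ELSE '***'
       END AS amount
FROM orders;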

How does this affect the performance of the DBMS?

Let's look at an example of column encryption in PostgreSQL. There is a pgcrypto module that allows you to store selected fields in encrypted form. This is useful when only part of the data is valuable. To read the encrypted fields, the client passes a decryption key, the server decrypts the data and returns it to the client. Without the key, no one can do anything with your data.

Let's test with pgcrypto. We create one table with encrypted data and one with regular data. Below are the commands for creating the tables; the very first line is the useful command that creates the extension itself and registers it in the DBMS:

CREATE EXTENSION pgcrypto;
CREATE TABLE t1 (id integer, text1 text, text2 text);
CREATE TABLE t2 (id integer, text1 bytea, text2 bytea);
INSERT INTO t1 (id, text1, text2)
VALUES (generate_series(1,10000000), generate_series(1,10000000)::text, generate_series(1,10000000)::text);
INSERT INTO t2 (id, text1, text2) VALUES (
generate_series(1,10000000),
encrypt(convert_to(generate_series(1,10000000)::text, 'UTF8'), 'key'::bytea, 'bf'),
encrypt(convert_to(generate_series(1,10000000)::text, 'UTF8'), 'key'::bytea, 'bf'));

Next, let's try to select data from each table and look at the execution timings.

Selecting from a table without an encryption function:

psql -c "timing" -c "select * from t1 limit 1000;" "host=192.168.220.129 dbname=taskdb
user=postgres sslmode=disable" > 1.txt

Timing is on.

  id  | text1 | text2
------+-------+-------
    1 | 1     | 1
    2 | 2     | 2
    3 | 3     | 3
...
  997 | 997   | 997
  998 | 998   | 998
  999 | 999   | 999
 1000 | 1000  | 1000
(1000 rows)

Time: 1.386 ms

Selection from a table with an encryption function:

psql -c "timing" -c "select id, decrypt(text1, 'key'::bytea, 'bf'),
decrypt(text2, 'key'::bytea, 'bf') from t2 limit 1000;"
"host=192.168.220.129 dbname=taskdb user=postgres sslmode=disable" > 2.txt

Timing is on.

  id  |  decrypt   |  decrypt
------+------------+------------
    1 | \x31       | \x31
    2 | \x32       | \x32
    3 | \x33       | \x33
...
  999 | \x393939   | \x393939
 1000 | \x31303030 | \x31303030
(1000 rows)

Time: 50.203 ms

Test results:

 
                   Without encryption    pgcrypto (decrypt)

Fetch 1000 rows    1.386 ms              50.203 ms
CPU                15%                   35%
RAM                                      +5%

Encryption greatly affects performance. You can see that the timing has increased, since decrypting encrypted data (and the decryption is usually also wrapped in your own logic) requires significant resources. That is, the idea of encrypting every column that contains any data at all is fraught with performance degradation.

At the same time, encryption is not a silver bullet that solves every issue. During decryption and data transmission, the decrypted data and the decryption key are located on the server. Therefore, the keys can be intercepted by someone with full access to the database server, such as a system administrator.

When there is one key for an entire column shared by all users (or even by a limited set of clients), this is not always good or correct. That is why end-to-end encryption appeared, DBMS vendors began to consider options for encrypting data on the client and server sides, and key-vault storage emerged: separate products that provide key management for the DBMS.

An example of such encryption in MongoDB

Security tools in commercial and open source DBMS

DBMS         Type         Password     Environmental       Protection of the source      RLS    Encryption
                          policy       compliance audit    code of procedures
                                                           and functions

Oracle       commercial   +            +                   +                             +      +
MsSQL        commercial   +            +                   +                             +      +
Jatoba       commercial   +            +                   +                             +      extensions
PostgreSQL   free         extensions   extensions          -                             +      extensions
MongoDB      free         -            +                   -                             -      available only in
                                                                                                MongoDB Enterprise

The table is far from complete, but the situation is as follows: in commercial products, security problems were solved long ago; in open source, as a rule, add-ons are used for security, many functions are missing, and sometimes you have to add something yourself. For example, password policies: PostgreSQL has many different extensions (1, 2, 3, 4, 5) that implement password policies, but, in my opinion, none of them cover all the needs of the domestic corporate segment.

What if none of the options has what you need? For example, you want to use a particular DBMS that lacks the functions the customer requires.

Then you can use third-party solutions that work with different DBMS, for example, "Crypto DB" or "Garda DB". If we are talking about solutions from the domestic segment, they know about GOSTs better than open source does.

The second option is to write what you need yourself: implement data access and encryption in the application, at the procedure level. True, this will be harder to do in a GOST-compliant way. But in general, you can hide the data as needed, put it into the DBMS, then retrieve and decrypt it right at the application level. At the same time, think immediately about how you will protect these algorithms in the application. In our opinion, this should be done at the DBMS level, because it will work faster that way (a rough sketch follows below).
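
As a rough illustration of that DBMS-level option, here is a hedged sketch reusing pgcrypto from the example above. The function names are illustrative, and the key handling is deliberately simplified; it does not satisfy GOST requirements.

-- wrap encryption in DBMS-side functions so the application never builds the SQL itself
CREATE OR REPLACE FUNCTION insert_secret(p_id int, p_value text, p_key text)
RETURNS void AS $$
    INSERT INTO t2 (id, text1)
    VALUES (p_id, encrypt(convert_to(p_value, 'UTF8'), convert_to(p_key, 'UTF8'), 'bf'));
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION read_secret(p_id int, p_key text)
RETURNS text AS $$
    SELECT convert_from(decrypt(text1, convert_to(p_key, 'UTF8'), 'bf'), 'UTF8')
    FROM t2 WHERE id = p_id;
$$ LANGUAGE sql;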

This report was first presented at @Databases Meetup by Mail.ru Cloud Solutions. Watch videos of other talks and subscribe to event announcements in the Telegram channel Around Kubernetes in Mail.ru Group.

What else to read on the topic:

  1. More Than Ceph: Cloud Block Storage MCS.
  2. How to select a database for a project so as not to select again.

Source: habr.com
