TL; DR: JSONB can greatly simplify database schema development without sacrificing query performance.
Introduction
Here is a classic example of probably one of the oldest use cases in the world of relational databases (database): we have an entity, and we need to save certain properties (attributes) of this entity. But not all instances may have the same set of properties, moreover, in the future, the possible addition of more properties.
The easiest way to solve this problem is to create a column in the database table for each property value, and just fill in the ones that are needed for a particular entity instance. Great! Problem solved...until your table has millions of records and you need to add a new record.
Consider the EAV pattern () is quite common. One table contains entities (records), another table contains property names (attributes), and a third table associates entities with their attributes and contains the value of these attributes for the current entity. This gives you the ability to have different sets of properties for different objects, as well as add properties on the fly without changing the database structure.
However, I wouldn't be writing this post if there weren't some flaws in the EVA approach. So, for example, to get one or more entities that have 1 attribute each, 2 joins (joins) are required in the query: the first is a join with an attribute table, the second is a join with a value table. If an entity has 2 attributes, then 4 joins are needed! Also, all attributes are usually stored as strings, which results in type casting for both the result and the WHERE clause. If you write a lot of queries, then this is quite wasteful in terms of resource usage.
Despite these obvious shortcomings, EAV has long been used to solve these kinds of problems. These were inevitable drawbacks, and there was simply no better alternative.
But then a new “technology” appeared in PostgreSQL…
Starting with PostgreSQL 9.4, the JSONB data type was added to store binary JSON data. While storing JSON in this format typically takes a bit more space and time than plain text JSON, it is much faster to perform operations on it. JSONB also supports indexing, which makes querying them even faster.
The JSONB data type allows us to replace the cumbersome EAV pattern by adding just one JSONB column to our entity table, which greatly simplifies database design. But many argue that this should be accompanied by a decrease in performance ... That's why I came up with this article.
Setting up a test database
For this comparison, I created the database on a fresh install of PostgreSQL 9.5 on an $80 build. Ubuntu 14.04. After setting some options in postgresql.conf I ran script with psql. The following tables have been created to present the data in EAV format:
CREATE TABLE entity (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT
);
CREATE TABLE entity_attribute (
id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE entity_attribute_value (
id SERIAL PRIMARY KEY,
entity_id INT REFERENCES entity(id),
entity_attribute_id INT REFERENCES entity_attribute(id),
value TEXT
);
Below is a table where the same data will be stored, but with attributes in a JSONB type column − properties.
CREATE TABLE entity_jsonb (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT,
properties JSONB
);
Looks much easier, doesn't it? Then it was added to entity tables (entity & entity_jsonb) 10 million records, and accordingly, were filled with the same table data where the EAV pattern is used and the JSONB column approach - entity_jsonb.properties. Thus, we received several different data types among the entire set of properties. Sample data:
{
id: 1
name: "Entity1"
description: "Test entity no. 1"
properties: {
color: "red"
lenght: 120
width: 3.1882420
hassomething: true
country: "Belgium"
}
}So, now we have the same data for the two options. Let's start comparing implementations at work!
Design Simplification
As mentioned earlier, the database design has been greatly simplified: one table, by using a JSONB column for properties, instead of using three tables for EAV. But how is this reflected in requests? Updating a single entity property looks like this:
-- EAV
UPDATE entity_attribute_value
SET value = 'blue'
WHERE entity_attribute_id = 1
AND entity_id = 120;
-- JSONB
UPDATE entity_jsonb
SET properties = jsonb_set(properties, '{"color"}', '"blue"')
WHERE id = 120;
As you can see, the last query does not look simpler. To update the value of a property in a JSONB object, we must use the function , and should pass our new value as a JSONB object. However, we don't need to know any identifier beforehand. Looking at the EAV example, we need to know both entity_id and entity_attribute_id in order to update. If you want to update a property in a JSONB column based on the name of an object, it's all done in one simple line.
Now let's select the entity we just updated based on its new color:
-- EAV
SELECT e.name
FROM entity e
INNER JOIN entity_attribute_value eav ON e.id = eav.entity_id
INNER JOIN entity_attribute ea ON eav.entity_attribute_id = ea.id
WHERE ea.name = 'color' AND eav.value = 'blue';
-- JSONB
SELECT name
FROM entity_jsonb
WHERE properties ->> 'color' = 'blue';
I think we can agree that the second one is shorter (no join!), and therefore more readable. JSONB wins here! We use the JSON ->> operator to get the color as a text value from a JSONB object. There is also a second way to achieve the same result in a JSONB model using the @> operator:
-- JSONB
SELECT name
FROM entity_jsonb
WHERE properties @> '{"color": "blue"}';
It's a little more complicated: we're checking to see if the JSON object in the property column contains the object to the right of the @> operator. Less readable, more performant (see below).
Let's make it even easier to use JSONB when you need to select multiple properties at the same time. This is where the JSONB approach really comes in: we just select properties as additional columns in our result set without the need for joins:
-- JSONB
SELECT name
, properties ->> 'color'
, properties ->> 'country'
FROM entity_jsonb
WHERE id = 120;
With EAV you will need 2 joins for each property you want to query. In my opinion, the queries above show a great simplification in database design. See more examples of how to write JSONB queries, also see post.
Now it's time to talk about performance.
Performance
To compare performance I used in requests, to calculate the execution time. Each query was run at least three times because the query planner takes longer the first time. First I ran queries without any indexes. Obviously, this served as an advantage to JSONB, since the joins needed for EAV could not use indexes (foreign key fields were not indexed). After that I created an index on the 2 foreign key columns of the EAV value table and also the index for a JSONB column.
Refreshing the data showed the following results in terms of time (in ms). Note that the scale is logarithmic:

We see that JSONB is much (> 50000-x) faster than EAV if no indexes are used, for the reason stated above. When we index columns with primary keys, the difference almost disappears, but JSONB is still 1,3 times faster than EAV. Note that the index on the JSONB column has no effect here, as we are not using a property column in the evaluation criteria.
For selecting data based on property value, we get the following results (normal scale):

You can see that JSONB is again faster than EAV without indexes, but when EAV is with indexes it is still faster than JSONB. But then I saw that the timing for the JSONB requests was the same, which led me to the fact that the GIN index was not firing. Apparently, when you use a GIN index on a column with populated properties, it only takes effect when using the include operator @>. I used this in a new test, which had a huge impact on the time: only 0,153ms! This is 15000 times faster than EAV and 25000 times faster than the ->> operator.
I think it was fast enough!
DB table size
Let's compare the table sizes for both approaches. In psql we can show the size of all tables and indexes with the command dti+

For the EAV approach, table sizes are around 3068MB and indexes are up to 3427MB for a total of 6,43GB. The JSONB approach uses 1817 MB for the table and 318 MB for the indexes, which is 2,08 GB. It turns out 3 times less! This fact surprised me a little because we store property names in every JSONB object.
But still, the numbers speak for themselves: in EAV, we store 2 integer foreign keys per attribute value, resulting in 8 bytes of additional data. Also, in EAV, all property values are stored as text, while JSONB will use numeric and boolean values internally where possible, resulting in a smaller footprint.
Conclusion
In general, I think storing entity properties in JSONB format can greatly simplify the design and maintenance of your database. If you are doing a lot of queries, then everything stored in the same table as the entity will actually work more efficiently. And the fact that this simplifies the interaction between data is already a plus, but the resulting database is 3 times smaller in volume.
Also, according to the tests done, we can conclude that the performance loss is very small. In some cases, JSONB is even faster than EAV, which makes it even better. However, this benchmark does not of course cover all aspects (e.g. entities with a very large number of properties, a significant increase in the number of properties of existing data,…), so if you have any suggestions on how to improve them, please feel free to leave in the comments!
Source: habr.com
