Cassandra. How not to die if you only know Oracle

Hey Habr.

My name is Misha Butrimov, and I'd like to tell you a little about Cassandra. My story will be useful to those who have never dealt with NoSQL databases: Cassandra has a lot of implementation quirks and pitfalls you need to know about. And if you have never seen anything besides Oracle or some other relational database, these things will save your life.

Why is Cassandra good? It is a NoSQL database designed without a single point of failure, and it scales well. Need to add a couple of terabytes to a database? Just add nodes to the ring. Expand it to another data center? Add nodes to the cluster. Increase the RPS you can handle? Add nodes to the cluster. It works in the opposite direction too.

What else is it good at? Handling lots of requests. But how many is a lot? 10, 20, 30, 40 thousand requests per second is not a lot for it. Neither is 100 thousand writes per second. There are companies that claim to handle 2 million requests per second; we'll just have to take their word for it.

And Cassandra has one fundamental difference from relational databases: it is nothing like them at all. This is very important to remember.

Not everything that looks the same works the same

Once a colleague came to me and asked: “Cassandra has CQL, a query language, and it has a select statement, it has where, it has and. I write my query and it doesn't work. Why?” Because if you treat Cassandra like a relational database, it is a perfect way to commit violent suicide. And I'm not promoting that, it's banned in Russia. You will simply design something wrong.

For example, a customer comes to us and says: “Let's build a database for TV series, or a database for a recipe directory. It will hold dishes, or a list of shows and actors.” We say joyfully: “Let's go!” It's just two bytes to send, a couple of tables, and everything is ready; everything will work very quickly and reliably. And everything is fine until the customers come back and say that housewives also solve the inverse problem: they have a list of ingredients, and they want to know which dish they can cook from them. You are dead.

This is because Cassandra is a hybrid database: it is both a key-value store and a wide-column store. In Java or Kotlin terms, it could be described like this:

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

That is, a map whose values are themselves sorted maps. The first key, the key of the outer map, is the Row key or Partition key. The second key, the key of the inner, sorted map, is the Clustering key.

To illustrate how the database is distributed, let's draw three nodes. Now we need to understand how to spread the data across them. If we shove everything onto one node (by the way, there can be a thousand of them, two thousand, five thousand, as many as you like), there is not much distribution in that. So we need a mathematical function that returns a number, just a number, a long int, that falls into some range. One node will be responsible for one range, the second for another, the n-th for the n-th.

This number is produced by a hash function applied to exactly what we call the Partition key. This is the column specified in the Primary key directive, and it is the first, most fundamental map key. It determines which node gets which data. A table is created in Cassandra with almost the same syntax as in SQL:

CREATE TABLE users (
	user_id uuid,
	name text,
	year int,
	salary float,
	PRIMARY KEY (user_id)
);

The Primary key in this case consists of one column, and it is also the partitioning key.

How will our users be distributed? Some will go to one node, some to another, and some to a third. The result is an ordinary hash table, also known as a map, also known as a dictionary in Python: a simple key-value structure from which we can read all values, or read and write values by key.
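
As an illustration, you can ask Cassandra directly which token, and therefore which position on the ring, a given partition key hashes to, using the built-in token() function. A minimal sketch; the uuid below is a made-up value:

-- Shows the ring position for a given partition key.
-- With the default Murmur3Partitioner the token is a signed 64-bit integer.
SELECT token(user_id), name
FROM users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;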

Select: when allow filtering turns into a full scan, or how not to do it

Let's write a select statement: select * from users where user_id = …. It looks just like Oracle: we write select, specify the conditions, and it works, we get our users. But if we select, say, users with a certain year of birth, Cassandra complains that it cannot execute the query. Because it knows nothing at all about how our data is distributed by year of birth: the key contains only one column. Then it says: “Okay, I can still execute this query. Add allow filtering.” We add the directive, and everything works. And at that moment something terrible happens.
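
Here is a sketch of how that plays out on our users table; the uuid and the year are made-up values:

-- Works: the query is restricted by the full partition key.
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Refused: year is not part of the key, so Cassandra has no idea where these rows live.
SELECT * FROM users WHERE year = 1985;

-- "Works", but collects the data from every node first and filters it afterwards.
SELECT * FROM users WHERE year = 1985 ALLOW FILTERING;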

While we are playing with test data, everything is fine. But when you execute the query in production, where we have, say, 4 million records, things go badly for us. Because allow filtering is a directive that lets Cassandra collect all the data in this table from all the nodes, in all the data centers (if the cluster spans several), and only then filter it. This is the analogue of a Full Scan, and hardly anyone is delighted with that.

If we only ever needed users by ID, that would be enough. But sometimes we need to write other queries and impose other restrictions on the selection. So we remember: what we have is a map with a partition key, and inside it sits a sorted map.

That sorted map has its own key, which we call the Clustering key. It consists of columns that we choose, and with its help Cassandra understands how its data is physically sorted and where it will lie on each node. That is, for a given Partition key, the Clustering key tells you exactly where to put the data in that tree and what place it will occupy there.

It really is a tree: a comparator is simply invoked, to which we pass a certain set of columns as an object, and that set is specified as an enumeration of columns.

CREATE TABLE users_by_year_salary_id (
	user_id uuid,
	name text,
	year int,
	salary float,
	PRIMARY KEY ((year), salary, user_id)
);
Pay attention to the Primary key directive: its first argument (in our case, year) is always the Partition key. It can consist of one column or several, that doesn't matter, but if there are several columns, they must be wrapped in parentheses again so that the language preprocessor understands where the Partition key ends; every column after it belongs to the Clustering key. The clustering columns are passed to the comparator in the order they are listed: the first column is the most significant, the second less so, and so on. It is like the comparison logic we write for data classes: we enumerate the fields, and we specify which of them are more significant and which are less. Relatively speaking, in Cassandra these are the fields of a data class to which the comparator written for it will be applied.
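
One visible consequence of this column order: within a single partition, Cassandra can return rows in the declared clustering order or in its exact reverse, but nothing else. A sketch; the year is a made-up value:

-- Rows come back sorted by the clustering columns.
-- Reversing the whole order (salary DESC, user_id DESC) is the only other option.
SELECT * FROM users_by_year_salary_id
WHERE year = 1985
ORDER BY salary DESC, user_id DESC;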

Set sorting, impose restrictions

It must be remembered that the sort order (descending or ascending, whichever) is set at the moment the key is created, and it cannot be changed later. It physically determines how the data will be sorted and how it will lie on disk. If you need to change the Clustering key or the sort order, you will have to create a new table and pour the data into it. It will not work with an existing one.
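
As a sketch (the table name here is hypothetical), the order is declared once, at creation time:

-- The clustering order is fixed at table creation and cannot be altered afterwards.
CREATE TABLE users_by_year_salary_desc (
	user_id uuid,
	name text,
	year int,
	salary float,
	PRIMARY KEY ((year), salary, user_id)
) WITH CLUSTERING ORDER BY (salary DESC, user_id ASC);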

We filled our table with users and saw that they spread around the ring first by year of birth, and then, on each node, sorted by salary and user ID. Now we can select with restrictions.

Our familiar where and and appear again, we get our users, and everything is fine once more. But if we try to use only part of the Clustering key, and a less significant part at that, Cassandra immediately complains that it cannot find the place in our map where this object lies: the object whose comparator fields are null for the columns we skipped and set only for the ones we supplied. It would have to pick up all the data from this node again and filter it. And that is the analogue of a Full Scan within a node, which is bad.
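
A sketch of what is and is not allowed on this table; all the concrete values are made up:

-- Fine: full partition key, then a range on the most significant clustering column.
SELECT * FROM users_by_year_salary_id
WHERE year = 1985 AND salary > 50000;

-- Fine: equality on salary lets us restrict the next clustering column.
SELECT * FROM users_by_year_salary_id
WHERE year = 1985 AND salary = 50000
AND user_id > 123e4567-e89b-12d3-a456-426614174000;

-- Refused: salary is skipped, so Cassandra cannot locate the rows in the sorted map.
SELECT * FROM users_by_year_salary_id
WHERE year = 1985 AND user_id = 123e4567-e89b-12d3-a456-426614174000;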

In any incomprehensible situation, create a new table

If we want to be able to get users by ID, or by age, or by salary, what should we do? Nothing special. Just use two tables. If you need to get users in three different ways, there will be three tables. Gone are the days when we saved disk space. Disk is the cheapest resource. It costs much less than response time, which can hurt the user. It is much more pleasant for the user to get something in a second than in 10 minutes.
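
A minimal sketch of what that looks like in practice: the application writes each user into every table it reads from. The values are made up, and a logged batch is just one way to group the writes, not the only one:

-- Write the same user into both tables so each read pattern stays cheap.
BEGIN BATCH
	INSERT INTO users (user_id, name, year, salary)
	VALUES (123e4567-e89b-12d3-a456-426614174000, 'Misha', 1985, 50000);
	INSERT INTO users_by_year_salary_id (user_id, name, year, salary)
	VALUES (123e4567-e89b-12d3-a456-426614174000, 'Misha', 1985, 50000);
APPLY BATCH;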

We trade overhead and denormalized data for the ability to scale well and work reliably. And indeed, a cluster of three data centers, each with five nodes, with an acceptable level of data durability (when nothing is lost for sure), can survive the complete death of one data center, plus two more nodes in each of the two remaining ones. Only after that do the problems begin. This is pretty good redundancy, and it costs a couple of extra SSDs and processors. So, to use Cassandra, which has nothing of SQL in it, no relations and no foreign keys, you need to know a few simple rules.

We design everything from the query. What matters is not the data itself, but how the application is going to work with it. If it needs to get different data in different ways, or the same data in different ways, we must lay it out in a way that is convenient for the application. Otherwise we end up in a Full Scan, and Cassandra gives us no advantage.

Denormalizing data is the norm. Forget normal forms; we no longer have a relational database. If we store something 100 times, it will lie there 100 times. That is still cheaper than being slow.

We choose partition keys so that their values are evenly distributed. We don't want the hashes of our keys to fall into one narrow range. That is, year of birth in the example above is a poor choice. Or rather, it is fine if our users are evenly distributed across years of birth, and bad if we are talking about fifth-graders: they will not be partitioned well at all.

Sorting is chosen once, at the stage of creating the Clustering key. If it needs to change, we will have to re-pour our table with a different key.

And most importantly, if we need to get the same data in 100 different ways, then we will have 100 different tables.

Source: habr.com
