How to look Cassandra in the eye without losing data, stability, and faith in NoSQL


They say everything in life is worth trying at least once. If you are used to working with relational DBMSs, then getting to know NoSQL in practice is worth it, if only to broaden your horizons. The rapid development of this technology has produced a lot of conflicting opinions and heated debates, which only stirs up interest further.
If you dig into the essence of these disputes, you can see that most of them stem from the wrong approach. Those who use NoSQL databases exactly where they are needed are satisfied and get all the advantages this solution offers. And experimenters who treat the technology as a panacea and apply it where it does not fit at all are disappointed: they lose the strengths of relational databases without gaining any significant benefit.

I will describe our experience implementing a solution based on the Cassandra DBMS: what we ran into, how we got out of difficult situations, whether we managed to benefit from NoSQL, and where we had to invest additional effort and money.
The initial task was to build a system that records calls into some kind of storage.

The system works as follows. Files with a specific structure describing a call arrive at the input. The application then makes sure this structure is stored in the appropriate columns. Later, the saved calls are used to display traffic-consumption information for subscribers (charges, calls, balance history).


Why Cassandra was chosen is easy to understand: it writes like a machine gun, scales easily, and is fault-tolerant.

So, here is what experience taught us

Yes, a failed node is not a tragedy; that is the essence of Cassandra's fault tolerance. But a node can be alive and still start to sag in performance, and as it turned out, this immediately affects the performance of the entire cluster.

Cassandra will not protect you where Oracle saved you with its constraints. And if the application author did not realize this in advance, then for Cassandra a duplicate that arrives is no worse than the original: it arrived, so it gets inserted.
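To make the point concrete, here is a minimal sketch using a DataStax Java driver 3.x-style API; the contact point, keyspace, table, and column names are invented for illustration. A repeated INSERT with the same primary key quietly overwrites the earlier row, and the only database-side guard is a lightweight transaction, which is much more expensive than a plain write.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class UpsertDemo {
    public static void main(String[] args) {
        // Hypothetical contact point and schema, for illustration only.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("calls_ks")) {

            // A plain INSERT is really an upsert: a second insert with the same
            // primary key silently overwrites the first one, no constraint fires.
            session.execute(
                "INSERT INTO calls (subscriber_id, call_ts, duration) VALUES (42, '2019-01-01 10:00:00', 60)");
            session.execute(
                "INSERT INTO calls (subscriber_id, call_ts, duration) VALUES (42, '2019-01-01 10:00:00', 999)");

            // The only database-side guard is a lightweight transaction (Paxos),
            // which is noticeably more expensive than a normal write.
            ResultSet rs = session.execute(
                "INSERT INTO calls (subscriber_id, call_ts, duration) VALUES (42, '2019-01-01 10:00:00', 60) IF NOT EXISTS");
            System.out.println("applied = " + rs.one().getBool("[applied]"));
        }
    }
}
```

In practice, that cost is exactly why deduplication usually ends up in the application code rather than in the database.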

The information security team took an instant dislike to free "out of the box" Cassandra: there is no logging of user actions and no differentiation of access rights either. Call records are personal data, which means that every attempt to request or change them must be logged for a possible later audit. You also need to be aware of the need to separate rights at different levels for different users. An ordinary operations engineer and a superadmin who can freely drop the entire keyspace are different roles with different responsibilities and competencies. Without such differentiation of access rights, the value and integrity of the data will be called into question even faster than with the ANY consistency level.

We did not take into account that calls require both serious analytics and periodic sampling by a wide variety of conditions. Since the selected records are then supposed to be deleted and rewritten (as part of the task, we must support a correction loop for data that initially reached us in the wrong form), Cassandra is not our friend here. Cassandra is like a piggy bank: it is convenient to put things in, but you cannot count what is inside.

We also faced the problem of transferring data to test zones (5 nodes in test versus 20 in production). A plain dump cannot be used in this case.

Then there is the problem of updating the data schema of the application that writes to Cassandra. A rollback will spawn a great many tombstones, which can eat away at our performance in unpredictable ways. Cassandra is optimized for writes and does not think much before writing; any operation on existing data is also a write. That is, by removing the superfluous we simply spawn even more records, and only some of them will be marked with tombstones.

Timeouts on insertion. Cassandra is beautiful at writing, but sometimes the incoming stream can seriously puzzle it. This happens when the application starts endlessly retrying records that cannot be inserted for whatever reason. And then we need a very real DBA who will watch gc.log, the system and debug logs for slow queries, and the metrics for pending compactions.
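On the application side, "not circling records forever" can look something like the bounded-retry sketch below; this is an illustration against the DataStax Java driver 3.x API, not our actual code, and the limits are arbitrary.

```java
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class BoundedRetryWriter {

    static void insertWithRetry(Session session, Statement insert) throws InterruptedException {
        int maxAttempts = 3;     // give up instead of circling the record forever
        long backoffMs = 200;
        for (int attempt = 1; ; attempt++) {
            try {
                session.execute(insert);
                return;
            } catch (WriteTimeoutException e) {
                if (attempt >= maxAttempts) {
                    // Hand the record off to an error table / dead-letter queue
                    // instead of adding it back to the write stream.
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;  // exponential backoff, so an overloaded cluster is not hammered
            }
        }
    }
}
```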

Multiple data centers in one cluster. Where to read and where to write?
Perhaps split them into a read DC and a write DC? And if so, should the writing DC or the reading DC be closer to the application? And won't we end up with a real split brain if we choose the wrong consistency level? There are a lot of questions, a lot of unexplored settings, and a lot of knobs you really want to turn.

How we solved it

To keep a node from slowly sinking, we disabled swap. Now, when memory runs out, the node should go down rather than produce long GC pauses.

We no longer rely on logic in the database. Application developers are retraining and beginning to actively protect themselves in their own code. The result is a clean separation of data storage and data processing.

We bought support from DataStax. Boxed Cassandra had, at that point, stopped being developed (the last commit was in February 2018), while DataStax offers excellent service and a large number of modified and adapted solutions for existing information systems.

I also want to note that Cassandra is not very convenient for select queries. Of course, CQL is a big step towards users (compared to Thrift). But if you have entire departments accustomed to convenient joins, free filtering by any field and query optimization, and these departments work on closing claims and incidents, the Cassandra solution looks hostile and stupid to them. So we began deciding how our colleagues would make their selections.

We considered two options. In the first, we write calls not only to C*, but also to an archive Oracle database. Unlike C*, this database stores calls only for the current month (a call-storage depth sufficient for overcharge cases). Here we immediately saw the following problem: if we write synchronously, we lose all the advantages of C* associated with fast insertion; if asynchronously, there is no guarantee that all the necessary calls reached Oracle at all. There was one plus, but a big one: operations keeps the same familiar PL/SQL Developer, that is, we practically implement the "Facade" pattern. The alternative option: we implement a mechanism that unloads calls from C*, pulls some data for enrichment from the corresponding Oracle tables, joins the resulting selections and returns the result, which we then use one way or another (roll back, repeat, analyze, admire). Cons: the process is quite multi-step, and there is no interface for the operations staff.

As a result, we settled on the second option. Apache Spark was used to make selections from the different storages. The essence of the mechanism comes down to Java code that, using the given keys (subscriber and call time, the partition keys), pulls data from C*, as well as the data needed for enrichment from any other database. It then joins them in memory and writes the result to a resulting table. A web front end was drawn on top of Spark, and it turned out to be quite usable.
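A rough sketch of what such a Spark job might look like in Java, assuming the spark-cassandra-connector and an Oracle JDBC driver are on the classpath; all hosts, keyspace, table, and column names here are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class CallEnrichmentJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("call-enrichment")
                .config("spark.cassandra.connection.host", "10.0.0.1") // hypothetical contact point
                .getOrCreate();

        // Calls are read from Cassandra restricted by partition key (subscriber + call date),
        // so the connector can push the predicate down instead of scanning the table.
        Dataset<Row> calls = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "calls_ks")   // hypothetical keyspace/table names
                .option("table", "calls")
                .load()
                .filter(col("subscriber_id").equalTo(42).and(col("call_date").equalTo("2019-01-01")));

        // Reference data for enrichment comes from Oracle over JDBC.
        Dataset<Row> tariffs = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//oracle-host:1521/BILLING") // hypothetical DSN
                .option("dbtable", "BILLING.TARIFFS")
                .option("user", "report_ro")
                .option("password", "***")
                .load();

        // Join in Spark memory and materialize the result for the web front end.
        calls.join(tariffs, "tariff_id")
             .write()
             .mode("overwrite")
             .saveAsTable("result_calls_enriched");

        spark.stop();
    }
}
```

The important detail is that the Cassandra side is filtered by partition key, so the selection stays cheap regardless of how large the call table grows.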


When solving the problem of refreshing test data from production, we again considered several solutions: transferring via sstableloader, and an option where the test-zone cluster is split into two halves, each of which in turn joins the same cluster as production and is thus fed from it; when the test is updated, the halves are swapped: the one that was serving the test is cleared and joined to production, and the other starts serving the test with the data it holds. However, after thinking it over again, we assessed more soberly which data actually needs to be transferred and realized that the calls themselves are an inconsistent entity for tests, quickly generated when needed, and it is precisely the production call dataset that has no value for the test. There are a few storage objects worth moving, but that is literally a couple of tables, and not very heavy ones. So Spark again came to the rescue: with it we wrote and began actively using a script that transfers data between the production and test tables.
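A sketch of such a transfer script, assuming a connector version that accepts spark.cassandra.connection.host as a per-read/per-write option (recent 2.x versions do); the hosts, keyspace, and table names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ProdToTestCopy {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("prod-to-test-copy").getOrCreate();

        // Read the (small) reference table from the production cluster...
        Dataset<Row> src = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("spark.cassandra.connection.host", "prod-node-1") // hypothetical hosts
                .option("keyspace", "calls_ks")
                .option("table", "reference_data")
                .load();

        // ...and append it into the same table on the test cluster.
        src.write()
                .format("org.apache.spark.sql.cassandra")
                .option("spark.cassandra.connection.host", "test-node-1")
                .option("keyspace", "calls_ks")
                .option("table", "reference_data")
                .mode("append")
                .save();

        spark.stop();
    }
}
```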

Our current deployment policy lets us work without rollbacks. Before going to production, a release must be rolled out to the test zone, where a mistake is not so expensive. In case of failure, you can always drop the keyspace and roll the whole schema out from scratch.

To ensure continuous availability of Cassandra, you need more than just a DBA. Everyone who works with the application must understand where and how to look at the current situation and how to diagnose problems in time. For this we actively use DataStax OpsCenter (administration and monitoring of workloads) and the Cassandra driver's system metrics (number of write timeouts in C*, number of read timeouts from C*, maximum latency, and so on), and we monitor the application that works with Cassandra itself.
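For the driver-side metrics, something along these lines can be scraped and shipped to the monitoring system; this is a sketch against the DataStax Java driver 3.x API, which exposes Dropwizard metrics by default, and the contact point is hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Metrics;

public class DriverMetricsProbe {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            cluster.init();
            Metrics m = cluster.getMetrics();

            // Error counters mentioned above: write/read timeouts as seen by the driver.
            long writeTimeouts = m.getErrorMetrics().getWriteTimeouts().getCount();
            long readTimeouts  = m.getErrorMetrics().getReadTimeouts().getCount();

            // Request latency from the Dropwizard timer (snapshot values are in nanoseconds).
            double maxLatencyMs = m.getRequestsTimer().getSnapshot().getMax() / 1_000_000.0;

            System.out.printf("writeTimeouts=%d readTimeouts=%d maxLatencyMs=%.1f%n",
                    writeTimeouts, readTimeouts, maxLatencyMs);
        }
    }
}
```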

Thinking about the previous question, we realized where our main risk lies: data-display forms that show data coming from several independent requests to the storage. That way we can end up with rather inconsistent information. But this problem would be just as relevant if we worked with a single data center, so the most reasonable thing is, of course, to implement a batch read function in a third-party application that guarantees the data is fetched within a single time window. As for splitting reads and writes across DCs for performance, here we were stopped by the risk that, with some loss of connectivity between the DCs, we could end up with two completely inconsistent clusters.

In the end, for now we have settled on the EACH_QUORUM consistency level for writes and LOCAL_QUORUM for reads.
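In the driver, that split looks roughly like this (DataStax Java driver 3.x API; the keyspace, table, and columns are hypothetical): the consistency level is set per statement, or once for all requests via QueryOptions.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ConsistencyLevels {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("calls_ks")) {   // hypothetical keyspace

            // Writes must be acknowledged by a quorum in every data center.
            SimpleStatement insert = new SimpleStatement(
                    "INSERT INTO calls (subscriber_id, call_ts, duration) VALUES (42, toTimestamp(now()), 60)");
            insert.setConsistencyLevel(ConsistencyLevel.EACH_QUORUM);
            session.execute(insert);

            // Reads only need a quorum in the local data center, keeping latency low.
            SimpleStatement select = new SimpleStatement(
                    "SELECT * FROM calls WHERE subscriber_id = 42");
            select.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            session.execute(select);
        }
    }
}
```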

Brief impressions and conclusions

In order to evaluate the resulting solution in terms of operational support and prospects for further development, we decided to think about where else this development could be applied.

Off the top of our heads: scoring data for programs like "Pay when it's convenient" (loading information into C*, calculations in Spark scripts), tracking claims with aggregation by direction, storing roles and calculating user access rights from a role matrix.

As you can see, the repertoire is wide and varied. And if we have to choose between the camps of NoSQL supporters and opponents, we will join the supporters, since we got our benefits, and exactly where we expected them.

Even the out-of-the-box version of Cassandra allows horizontal scaling in real time, painlessly solving the problem of growing data volumes. We managed to move a very heavily loaded mechanism for calculating call aggregates into a separate circuit, and to separate the schema from the application logic, getting rid of the vicious practice of writing custom jobs and objects inside the database itself. We gained the ability to choose and tune, for speed, which DCs we calculate on and which ones we write to, and we insured ourselves against failures of individual nodes and of a DC as a whole.

Applying this architecture to new projects, and already having some experience, I would like to take the nuances described above into account right away, avoid some mistakes, and smooth out some of the sharp corners that could not be avoided the first time.

For example, keep track of updates to Cassandra itself in good time, because quite a few of the issues we hit were already known and fixed.

Do not put both the database itself and Spark on the same nodes (or strictly limit how many resources each may use), since Spark can eat more RAM than it should, and we will quickly run into problem number one on our list.

Build up monitoring and operational competence already at the project-testing stage. From the start, take into account as many potential consumers of the solution as possible, because the structure of the database will ultimately depend on this.

Turn the resulting schema over several times looking for possible optimizations. Decide which fields can be serialized. Understand which additional tables we should create in order to account for the data most correctly and optimally and then serve the required information on request (for example, by storing the same data in different tables with different breakdowns by different criteria, you can save a lot of CPU time on read requests).
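For example, a sketch of the "same data, different breakdowns" idea: two tables holding the same call records, one partitioned by subscriber and one by day; the keyspace, table, and column names are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DenormalizedSchema {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("calls_ks")) {   // hypothetical keyspace

            // 1. History of a single subscriber, newest calls first.
            session.execute(
                "CREATE TABLE IF NOT EXISTS calls_by_subscriber ("
              + "  subscriber_id bigint, call_ts timestamp, duration int, charge decimal,"
              + "  PRIMARY KEY (subscriber_id, call_ts)"
              + ") WITH CLUSTERING ORDER BY (call_ts DESC)");

            // 2. All calls for a given day, partitioned by date for daily aggregates.
            session.execute(
                "CREATE TABLE IF NOT EXISTS calls_by_day ("
              + "  call_date date, subscriber_id bigint, call_ts timestamp, duration int, charge decimal,"
              + "  PRIMARY KEY (call_date, subscriber_id, call_ts)"
              + ")");
        }
    }
}
```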

Right away, provide for setting TTLs and cleaning up outdated data.
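A sketch of the TTL options, continuing with the hypothetical tables above: a table-level default plus a per-insert override.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class TtlExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("calls_ks")) {   // hypothetical keyspace

            // Table-level default: every row expires 90 days after it is written.
            session.execute(
                "ALTER TABLE calls_by_day WITH default_time_to_live = 7776000");

            // Per-insert TTL, if a particular record should live for a shorter time.
            session.execute(
                "INSERT INTO calls_by_day (call_date, subscriber_id, call_ts, duration, charge)"
              + " VALUES ('2019-01-01', 42, toTimestamp(now()), 60, 1.5) USING TTL 2592000");
        }
    }
}
```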

When unloading data from Cassandra, the application logic should work on a fetch-by-pages principle, so that not all rows are loaded into memory at once but are selected in batches.
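A paging sketch with the DataStax Java driver 3.x API: the fetch size caps how many rows are pulled per round trip, and iteration fetches the next page on demand; the keyspace and table names are the hypothetical ones from above.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PagedRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("calls_ks")) {   // hypothetical keyspace

            SimpleStatement select = new SimpleStatement(
                    "SELECT * FROM calls_by_subscriber WHERE subscriber_id = 42");
            select.setFetchSize(500); // the driver pulls rows from the server 500 at a time

            ResultSet rs = session.execute(select);
            for (Row row : rs) {
                // Iteration transparently requests the next page when the current one is exhausted,
                // so memory usage stays bounded regardless of how many calls the subscriber has.
                process(row);
            }
        }
    }

    static void process(Row row) { /* handle a single call record */ }
}
```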

Before moving a project to the described solution, it is desirable to test the fault tolerance of the system with a series of crash tests, such as losing data in one data center, restoring corrupted data for a certain period, or a network drawdown between data centers. Such tests not only let you evaluate the pros and cons of the proposed architecture, they also give good practice to the engineers running them, and the acquired skill will be far from superfluous if the failures ever repeat themselves in production.

If we work with critical information (such as billing data or subscriber debt calculations), then we should also pay attention to tools that reduce the risks arising from the peculiarities of this DBMS. For example, the nodesync utility (DataStax), for which we should work out an optimal usage strategy so that, for the sake of consistency, we do not create excessive load on Cassandra and use it only for certain tables in certain periods.

So how do things stand after six months of living with Cassandra? On the whole, there are no unresolved problems. Nor have we allowed any serious incidents or data loss. Yes, we had to think about compensating for some problems that had never come up before, but in the end this did not overshadow our architectural decision. If you are willing and not afraid to try something new, and at the same time do not want to be badly disappointed, then be prepared for the fact that nothing comes for free. You will have to understand things, dig into the documentation, and step on your own individual rakes more often than with an old legacy solution, and no theory will tell you in advance which rakes are waiting for you.

Source: habr.com
