Was MongoDB ever the right choice?

I recently found out that Red Hat is removing MongoDB support from Satellite (reportedly because of license changes). It got me thinking: over the last few years I have seen a stream of articles about how terrible MongoDB is and how no one should ever use it. Yet over the same period MongoDB has become a much more mature product. What happened? Is all the hatred really down to early missteps in how the new DBMS was marketed? Or are people just using MongoDB in the wrong places?

If you start to feel that I'm defending MongoDB, please read the disclaimer at the end of the article.

New trend

I've been in the software industry for more years than I care to admit, yet I have witnessed only a fraction of the trends that have hit our industry. I have seen the rise of 4GL, AOP, Agile, SOA, Web 2.0, AJAX, blockchain… the list is endless. Every year brings new trends. Some fade quickly, while others fundamentally change the way software is developed.

Every new trend generates a certain amount of general excitement: people either jump on board themselves, or see the noise others are making and follow the crowd. Gartner codified this process as the hype cycle. Although debatable, that curve roughly describes what technologies go through before they eventually become useful.

But from time to time an innovation arrives (or, as in this case, makes a second coming) whose hype is driven by a single implementation. In the case of NoSQL, the hype was driven largely by the arrival and meteoric rise of MongoDB. MongoDB did not start the trend: large Internet companies had begun to struggle with processing very large amounts of data, and that led to the return of non-relational databases. The movement started with projects such as Google's Bigtable and Facebook's Cassandra, but it was MongoDB that became the best-known NoSQL implementation and the one most developers could actually get their hands on.

Note: You might think that I'm confusing document databases with column databases, key/value stores, or any of the many other types of data stores that fall under the general definition of NoSQL. And you are right. But at the time, chaos reigned. Everyone was obsessed with NoSQL; it had become an absolute must-have, even though many people did not see the differences between the technologies. For many, MongoDB became synonymous with NoSQL.

And developers jumped on it. The idea of a schemaless database that magically scales to solve any problem was pretty tempting. Around 2014, it seemed that everywhere a relational database such as MySQL, Postgres, or SQL Server had been used a year earlier, MongoDB was being deployed instead. When asked why, you could get answers ranging from the banal "it's web scale" to the more thoughtful "my data is very loosely structured and fits well in a database without a schema."

It's important to remember that MongoDB, and document databases in general, solve a number of problems with traditional relational databases:

  • Strict schema: with a relational database, if you have dynamically generated data, you are forced to create a bunch of random "miscellaneous" columns, push data blobs into them, or use an EAV (entity-attribute-value) configuration. All of these have significant drawbacks.
  • Difficulty of scaling: when there was so much data that it no longer fit on one server, MongoDB offered mechanisms to scale out across multiple machines.
  • Complex schema modifications: no migrations! In a relational database, changing the structure of the database can be a huge problem (especially when there is a lot of data). MongoDB greatly simplified the process, and made it so easy that you could just update the schema on the go and keep moving fast.
  • Write performance: MongoDB performance was good, especially when properly tuned. Even the out-of-the-box configuration of MongoDB, for which it was often criticized, showed some impressive performance figures.
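
To make the first point concrete, here is a minimal sketch of the EAV workaround next to a document-shaped record. It uses SQLite from Python's standard library as a stand-in relational database, and a plain dict with a JSON round trip as a stand-in for a document store. The table name `product_attrs` and all helper functions are invented for illustration.

```python
import json
import sqlite3


def make_eav_db():
    """Create an in-memory database with one generic attribute table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_attrs ("
        " entity_id INTEGER NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (entity_id, attribute))"
    )
    return conn


def store_eav(conn, entity_id, attrs):
    """Store each attribute as its own row; every value is flattened to text."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO product_attrs (entity_id, attribute, value) VALUES (?, ?, ?)",
            [(entity_id, key, str(value)) for key, value in attrs.items()],
        )


def load_eav(conn, entity_id):
    """Reassemble one entity from its attribute rows."""
    rows = conn.execute(
        "SELECT attribute, value FROM product_attrs WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(rows)


def roundtrip_document(doc):
    """Stand-in for a document store: serialize to JSON and read it back."""
    return json.loads(json.dumps(doc))
```

In this sketch the EAV version hands every value back as text and has no natural place for nested data, while the document keeps its types and structure. That is the kind of drawback that made document databases attractive.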

All risks are on you

The potential benefits of MongoDB were enormous, especially for certain classes of problems. If you read the list above without context or experience, you might get the impression that MongoDB is a truly revolutionary DBMS. The only problem is that those benefits came with a number of caveats, some of which are listed below.

To be fair, no one at 10gen/MongoDB Inc. would claim that the following is untrue; these are simply trade-offs.

  • Loss of transactions: Transactions are a core feature of many relational databases (not all, but most). Transactional means that you can perform multiple operations atomically and be sure that the data stays consistent. Of course, with a NoSQL database you can have transactionality within a single document, or you can use two-phase commits to get transactional semantics. But you will have to implement this functionality yourself, which can be difficult and time-consuming. Often you don't notice the problem until you see data in the database ending up in invalid states because the atomicity of the operations cannot be guaranteed. Note: many people have told me that transactions were introduced in MongoDB 4.0 last year, but with some limitations. The conclusion of the article remains the same: assess how well the technology fits your needs.
  • Loss of relational integrity (foreign keys): if your data has relationships, you will have to enforce them in the application. A database that enforces these relationships takes a lot of work off the application and therefore off your programmers.
  • Inability to enforce data structure: Strict schemas can sometimes be a big problem, but used wisely they are also a powerful mechanism for structuring data well. Document databases like MongoDB provide incredible schema flexibility, but that flexibility shifts the responsibility for keeping the data clean onto the application. If you don't take care of it, you will end up writing a lot of application code to account for data that is not stored in the form you expect. As we often say at our company, Simple Thread: the application will be rewritten someday, but the data will live forever. Note: MongoDB supports schema validation, which is useful but does not provide the same guarantees as a relational database. First of all, adding or changing schema validation does not affect data that already exists in the collection. You have to make sure yourself that the data is updated to match the new schema. Decide for yourself whether this is enough for your needs.
  • Own query language / loss of the tool ecosystem: The advent of SQL was a genuine revolution, and its position has not changed since. It is an incredibly powerful language, but also quite a complex one. People with SQL experience see having to construct database queries in a new language made of JSON fragments as a big step backwards. There is a whole universe of tools that interact with SQL databases, from IDEs to reporting tools. Moving to a database that doesn't support SQL means you can't use most of these tools, or you have to convert the data to SQL in order to use them, which can be harder than you think.
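
The first two trade-offs are easier to appreciate with a small example of what a relational database does for you. This is only a sketch, using SQLite from Python's standard library as a stand-in relational database. The tables and the helpers `make_db`, `transfer`, `add_order`, and `balances` are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two accounts and one customer."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE accounts (
            id INTEGER PRIMARY KEY,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id)
        );
        INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 0);
        INSERT INTO customers (id, name) VALUES (1, 'Alice');
        """
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one atomic unit."""
    try:
        with conn:  # commits if both updates succeed, rolls back otherwise
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def add_order(conn, customer_id):
    """Insert an order; the database rejects unknown customers."""
    try:
        with conn:
            conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

When the debit fails, the credit that already ran is rolled back with it, and an order for a customer that does not exist never reaches the table. Without transactions and foreign keys, both checks become the application's job.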

Many developers who turned to MongoDB didn't really understand the trade-offs, and often dived headfirst into setting it up as their primary data store. After that, it was often incredibly difficult to go back.
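
As a rough illustration of the code that schema flexibility can push into the application, the sketch below normalizes user documents written by different versions of a hypothetical application. The field names and both document shapes are assumptions made up for this example.

```python
def normalize_user(doc):
    """Return a user record in the current shape, whichever version wrote it."""
    name = doc.get("name")
    if name is None:
        # Hypothetical older shape: separate "first" and "last" fields.
        parts = (doc.get("first"), doc.get("last"))
        name = " ".join(p for p in parts if p)

    emails = doc.get("emails")
    if emails is None:
        # Hypothetical older shape: a single "email" string, possibly missing.
        email = doc.get("email")
        emails = [email] if email else []

    return {"name": name, "emails": list(emails)}
```

Every shape the data has ever had needs a branch like this somewhere, and nothing in the database will tell you when you have missed one.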

What could have been done differently?

Not everyone jumped in head first and hit the bottom. But many projects adopted MongoDB where it simply did not fit, and they will have to live with it for many years to come. If these organizations had taken some time to consider their technology choices methodically, many would have chosen differently.

How do you choose the right technology? There have been several attempts to create a systematic framework for technology assessment, such as "Framework for the implementation of technologies in software organizations" and "Framework for evaluating software technologies", but it seems to me that this is unnecessary complexity.

Many technologies can be evaluated sensibly by asking just two basic questions. The hard part is finding people who can answer them responsibly, who take the time to find the answers, and who do so without bias.

Question 1: What problems am I trying to solve?

If you don't face some problem, you don't need a new tool. Period. There is no need to look for a solution and then come up with a problem. Unless you are facing a problem that the new technology solves significantly better than your existing technology, there is nothing to discuss. If you are considering a technology because you have seen others use it, think about the problems they are facing and ask whether you have those problems too. It is easy to adopt a technology because others are using it; the difficulty is knowing whether you face the same issues.

Question 2: What am I missing?

This is certainly a more difficult question, because you have to dig and understand both the old and the new technology well. Sometimes you can't really understand a new one until you've built something with it or have a colleague with that experience.

If you have neither, it makes sense to work out the smallest possible investment that would tell you what the tool is worth. And once you have made that investment, how hard will it be to reverse the decision?

People always ruin everything

In trying to answer these questions as impartially as possible, remember one thing: you have to fight human nature. There are a number of cognitive biases that must be overcome in order to effectively evaluate technology. Here are just a few:

  • Bandwagon effect: everyone knows about it, but it is still hard to fight. Just make sure the technology really suits your actual needs.
  • Novelty effect: many developers tend to underestimate technologies they have worked with for a long time and overestimate the benefits of a new technology. This is not limited to programmers; everyone is subject to this cognitive bias.
  • Feature-positive effect: we tend to see what is there and lose sight of what is not. Combined with the novelty effect, this can lead to chaos, because you not only overvalue the new technology but also ignore its shortcomings.

Objective assessment is not easy, but understanding the underlying cognitive biases will help you make more rational decisions.

Summary

When an innovation emerges, two questions need to be answered with great care:

  • Does this tool solve a real problem?
  • Do we understand the trade-offs well?

If you can't confidently answer these two questions, take a few steps back and think.

So was MongoDB ever the right choice? Of course it was; as with most things in engineering, it depends on many factors. Among those who answered these two questions, many have benefited from MongoDB and continue to do so. For those of you who didn't, I hope you have learned a valuable and not too painful lesson about moving through the hype cycle.

Disclaimer

I want to clarify that I neither love nor hate MongoDB. We just didn't have the kind of problems that MongoDB is best suited to solve. I know 10gen/MongoDB Inc. acted very boldly at first, setting insecure defaults and promoting MongoDB everywhere (especially at hackathons) as a one-stop solution for working with any data. It was probably a bad decision. But it confirms the approach described here: these problems could be detected very quickly even with a superficial assessment of the technology.

Source: habr.com
