14 things I wish I knew before getting started with MongoDB

The translation of the article was prepared on the eve of the start of the course "Non-relational databases".

Highlights:

  • It is extremely important to develop a schema, even though it is optional in MongoDB.
  • Likewise, indexes must match your schema and access patterns.
  • Avoid using large objects and large arrays.
  • Be careful with MongoDB settings, especially when it comes to security and reliability.
  • MongoDB does not have a query optimizer, so you must take care with the order of query operations.

I have been working with databases for a very long time, but only recently discovered MongoDB. There are a few things I wish I had known before starting to work with it. Anyone who already has experience in a field brings preconceived notions about what databases are and what they do. In the hope of making things easier for others, here is a list of common mistakes.

Creating a MongoDB Server Without Authentication

Unfortunately, MongoDB ships with authentication disabled by default. For a workstation accessed only locally, this is acceptable. But since MongoDB is a multi-user system that likes to use large amounts of memory, it is best to put it on the server with the most RAM available in your environment, even if you only intend to use it for development. Installing it on a server on the default port can be problematic, especially if arbitrary JavaScript can be executed within a query (for example, $where as an injection vector).

There are several authentication methods, but the easiest is to set up a user ID and password. Use this while you think about fancier authentication based on LDAP. In terms of security, MongoDB should be kept up to date, and the logs should be checked regularly for signs of unauthorized access. I also prefer to choose a port other than the default one.
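
As a minimal sketch of the user ID and password approach, the Python snippet below builds a connection string that carries credentials and a non-default port. The host name, user name, password, port 27018 and the build_uri helper are illustrative assumptions rather than anything from the original article, and authSource=admin assumes the user was created in the admin database.

```python
from urllib.parse import quote_plus


def build_uri(user: str, password: str, host: str, port: int) -> str:
    """Return a MongoDB connection string with percent-encoded credentials."""
    # Credentials are percent-encoded because characters such as '@' or '/'
    # would otherwise be read as part of the URI syntax.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource=admin"
    )


# Hypothetical values, for illustration only.
uri = build_uri("app_user", "p@ss/word", "db.example.internal", 27018)
```

The resulting string could then be handed to a driver; real credentials would normally come from the environment or a secrets store rather than from source code.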

Forgetting to reduce MongoDB's attack surface

The MongoDB Security Checklist contains good advice for reducing the risk of network penetration and data leakage. It is easy to brush it aside and say that a development server does not need a high level of security. However, things are not so simple, and this applies to all MongoDB servers. In particular, unless there is a good reason to use mapReduce, group or $where, you should disable the execution of arbitrary JavaScript by setting javascriptEnabled: false in the security section of the configuration file. Since data files are not encrypted in standard MongoDB, it makes sense to run MongoDB under a dedicated user account that has full access to the data files, with access restricted to that account alone, so that the operating system's own file access controls protect them.
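
The hardening settings mentioned here can be sketched as follows. The snippet keeps a few mongod.conf options in a nested Python dict and renders them as YAML-style text. The option names (net.port, net.bindIp, security.authorization, security.javascriptEnabled) are real configuration keys, while the chosen values and the to_yaml helper are assumptions made for this illustration.

```python
# Hardening-related mongod.conf options; the values are assumed examples.
HARDENED = {
    "net": {"port": 27018, "bindIp": "127.0.0.1"},
    "security": {
        "authorization": "enabled",   # require authentication
        "javascriptEnabled": False,   # no server-side JavaScript ($where etc.)
    },
}


def to_yaml(settings: dict, indent: int = 0) -> str:
    """Render a nested dict of scalar values as YAML-style text."""
    pad = "  " * indent
    lines = []
    for key, value in settings.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            if isinstance(value, bool):
                value = "true" if value else "false"
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


config_text = to_yaml(HARDENED)
```

This helper only handles nested dicts of scalars, which is enough for the handful of options shown; a real deployment would more likely edit the configuration file directly.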

Schema Design Error

MongoDB does not enforce a schema. But this does not mean that a schema is not needed. If you just store documents without any agreed schema, saving them can be quick and easy, but retrieving them later can be damn hard.

The classic article "6 Rules of Thumb for Designing MongoDB Schemas" is worth a read, and features like Schema Explorer in the third-party tool Studio 3T are worth using for regular schema checks.

Forgetting about collation (sort order)

Forgetting about collation can cause more frustration and wasted time than any other misconfiguration. By default, MongoDB uses a binary collation, which is unlikely to be useful to anyone. Case-sensitive, accent-sensitive, binary collations were already considered curious anachronisms, along with beads, kaftans and curly mustaches, back in the 1980s. Now their use is unforgivable. In real life, "motorcycle" is the same as "Motorcycle", and "Britain" and "britain" are the same place. A lowercase letter is simply another form of its capital letter. And don't get me started on sorting diacritics. When creating a database in MongoDB, use an accent-insensitive and case-insensitive collation that matches the language and culture of the system's users. This will greatly simplify searching through string data.
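
A minimal sketch of such a collation follows. It assumes a pymongo-style database object whose create_collection method accepts a collation keyword; the collection name "people", the locale "en" and the helper name are hypothetical choices for illustration.

```python
# Collation document: locale "en" is an assumption. Strength 1 compares base
# characters only, so differences in case and diacritics are ignored.
CASE_AND_ACCENT_INSENSITIVE = {"locale": "en", "strength": 1}


def create_people_collection(db):
    """Create a collection whose default collation ignores case and accents.

    `db` is expected to behave like a pymongo Database; "people" is a
    hypothetical collection name.
    """
    return db.create_collection(
        "people", collation=CASE_AND_ACCENT_INSENSITIVE
    )
```

Setting the collation when the collection is created means that queries and indexes on it inherit the same comparison rules unless they specify their own.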

Creating collections with large documents

MongoDB is happy to host documents of up to 16 MB in collections, and GridFS is designed for files larger than 16 MB. But just because large documents can be stored does not make storing them a good idea. MongoDB works best if you keep individual documents to a few kilobytes in size, treating them more like rows in a wide SQL table. Large documents will be a source of performance problems.

Creating Documents with Large Arrays

Documents can contain arrays. It is best if the number of elements in an array stays well below four digits. If elements are added to an array frequently, the array will outgrow its containing document, and the document will need to be moved, which means the indexes have to be updated as well. When a document with a large array is re-indexed, the indexes will often be rewritten extensively, since there is a separate index entry for each array element. This re-indexing also occurs when such a document is inserted or deleted.

MongoDB has something called a "padding factor", which leaves room for documents to grow in order to minimize this problem.

You might think that you can do without indexing arrays. Unfortunately, the lack of indexes can cause other problems. Since documents are scanned from start to finish, finding elements at the end of an array takes longer, and most operations on such a document will be slow.

Forgetting that the order of stages in an aggregation matters

In a database system with a query optimizer, the queries you write are descriptions of what you want to get, not of how to get it. It works much like ordering in a restaurant: usually you just order a dish and do not give the chef detailed instructions.

In MongoDB, you instruct the cook. For example, you need to make sure that the data is reduced as early as possible in the pipeline with $match and $project, that sorting occurs only after this reduction, and that lookups happen in exactly the order you want. A query optimizer that gets rid of unnecessary work, arranges steps optimally, and chooses the join type can spoil you. With MongoDB, you get more control at the cost of convenience.

Tools such as Studio 3T simplify the construction of aggregation queries in MongoDB. The Aggregation Editor feature will allow you to apply pipeline operators one step at a time, and validate the input and output of each step for easier debugging.
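
The stage ordering described in this section can be sketched as a plain Python list of stages, which is how a driver such as pymongo would receive a pipeline. The collection and field names (customers, status, customer_id, total) are hypothetical.

```python
def build_pipeline(status: str) -> list:
    """Return stages ordered so that data is reduced before costly work."""
    return [
        # 1. Filter first, so later stages see as few documents as possible.
        {"$match": {"status": status}},
        # 2. Keep only the fields the later stages need.
        {"$project": {"customer_id": 1, "total": 1}},
        # 3. Sort the already reduced set.
        {"$sort": {"total": -1}},
        # 4. Join last, against the smallest possible input.
        {"$lookup": {
            "from": "customers",
            "localField": "customer_id",
            "foreignField": "_id",
            "as": "customer",
        }},
    ]


pipeline = build_pipeline("shipped")
# Names of the stages in execution order.
stage_order = [next(iter(stage)) for stage in pipeline]
```

Swapping the $match and $lookup stages would return the same documents here but would force the join to run against every document in the collection.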

Using fast writes

Never configure MongoDB writes for high speed at the cost of reliability. This "fire-and-forget" mode seems fast because the command returns before the write has actually been performed. If the system crashes before the data is written to disk, the data will be lost and the database left in an inconsistent state. Luckily, 64-bit MongoDB has journaling enabled.

The MMAPv1 and WiredTiger storage engines use journaling to prevent this, although WiredTiger can recover to the last consistent checkpoint if journaling is disabled.

Journaling ensures that the database is in a consistent state after recovery and preserves all data up to the last write to the journal. The interval between journal writes is configured with the commitIntervalMs parameter.

To be sure of your writes, make sure that journaling is enabled in the configuration file (storage.journal.enabled) and that the interval between journal writes corresponds to the amount of data you can afford to lose.
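
On the client side, the difference between fire-and-forget and journaled writes shows up in the write concern document. The sketch below uses the standard w and j fields; the constant names and the is_durable helper are hypothetical and exist only to make the contrast explicit.

```python
FIRE_AND_FORGET = {"w": 0}        # unacknowledged: returns at once, unsafe
JOURNALED = {"w": 1, "j": True}   # acknowledged after the journal write


def is_durable(write_concern: dict) -> bool:
    """Return True if the write is both acknowledged and journaled."""
    acknowledged = write_concern.get("w", 1) != 0
    journaled = write_concern.get("j") is True
    return acknowledged and journaled
```

A check like this could be run against application settings at startup to catch an accidental w: 0 before it reaches production.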

Sorting without an index

Searches and aggregations often need to sort data. Ideally, this is done at one of the final stages, after the result has been filtered, in order to reduce the amount of data to be sorted. Even then, sorting needs an index. You can use a single-field or compound index.

If there is no suitable index, MongoDB will sort without one. There is a 32 MB memory limit on the total size of all documents in a sort operation, and if MongoDB reaches that limit, it will either throw an error or return an empty record set.
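
One way to keep a sort within reach of an index is to create a compound index whose keys mirror the sort specification. The sketch below assumes a pymongo-style collection object with a create_index method; the helper name and the field names in the docstring are hypothetical.

```python
def ensure_sort_index(collection, sort_spec):
    """Create a compound index whose keys mirror the sort specification.

    `collection` is expected to behave like a pymongo Collection, and
    `sort_spec` is a list of (field, direction) pairs such as
    [("status", 1), ("created", -1)].
    """
    # Index keys use the same fields, order and directions as the sort.
    keys = [(field, direction) for field, direction in sort_spec]
    return collection.create_index(keys)
```

The same list of pairs can then be passed to the sort itself, so that the index and the sort cannot drift apart.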

Lookups without index support

Lookups perform a function similar to the JOIN operation in SQL. To work well, they need an index on the key value used as the foreign key. This is not obvious, since the usage is not reported in explain(). Such indexes are in addition to the index reported in explain(), which in turn is used by the $match and $sort pipeline operators when they occur at the beginning of the pipeline. Indexes can now cover any stage of an aggregation pipeline.
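
A minimal sketch of pairing a lookup with an index on its foreign key follows. It assumes a pymongo-style database object that returns collections by name via db[name]; the helper names and the example collection and field names are hypothetical. The _id field is always indexed, so the example joins on a different field.

```python
def lookup_stage(from_coll, local_field, foreign_field, as_field):
    """Build a $lookup stage joining on local_field == foreign_field."""
    return {"$lookup": {
        "from": from_coll,
        "localField": local_field,
        "foreignField": foreign_field,
        "as": as_field,
    }}


def ensure_lookup_index(db, stage):
    """Index the foreign-key field of the collection being joined."""
    spec = stage["$lookup"]
    return db[spec["from"]].create_index([(spec["foreignField"], 1)])


stage = lookup_stage("customers", "customer_ref", "customer_code", "customer")
```

Deriving the index from the stage itself keeps the two in step if the join field is later renamed.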

Not using multi-updates

The db.collection.update() method is used to modify part of an existing document or the entire document, up to a complete replacement, depending on the update parameter you specify. What is less obvious is that it will not process all the documents in the collection unless you set the multi option, which updates all documents matching the query criteria.
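
In the Python driver the same distinction appears as two separate methods, update_one and update_many. The sketch below assumes a pymongo-style collection object; the helper name and the year and archived fields are hypothetical.

```python
def archive_old_orders(collection, cutoff_year: int):
    """Mark every order older than cutoff_year as archived.

    update_many changes all matching documents, which corresponds to
    setting the multi option in the shell; update_one would stop after
    the first match.
    """
    query = {"year": {"$lt": cutoff_year}}
    change = {"$set": {"archived": True}}
    return collection.update_many(query, change)
```

Using $set here modifies only the named field; passing a plain document instead of an update operator would be treated as a replacement.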

Forgetting the importance of key order in a hash object

In JSON, an object consists of an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.

Unfortunately, BSON attaches great importance to order when searching. In MongoDB, the order of keys inside embedded objects matters, i.e. { firstname: "Phil", surname: "factor" } is not the same as { surname: "factor", firstname: "Phil" }. That is, you must preserve the order of name/value pairs in documents if you want to be sure of finding them.
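
The effect can be imitated in plain Python. In the sketch below, OrderedDict is only a stand-in for an ordered BSON embedded document, because its equality test is order-sensitive in the same way. The dotted_query helper and the parent field name "name" are hypothetical; they illustrate dot notation, which matches embedded fields one by one and therefore does not depend on stored key order.

```python
from collections import OrderedDict

a = OrderedDict([("firstname", "Phil"), ("surname", "factor")])
b = OrderedDict([("surname", "factor"), ("firstname", "Phil")])

# Ordered comparison, analogous to an exact embedded-document match.
order_sensitive_equal = (a == b)
# Plain dicts ignore key order, which is what JSON semantics suggest.
order_insensitive_equal = (dict(a) == dict(b))


def dotted_query(prefix: str, fields: dict) -> dict:
    """Build a filter that matches embedded fields individually."""
    return {f"{prefix}.{key}": value for key, value in fields.items()}


query = dotted_query("name", b)
```

A filter built this way finds the document whichever way round the two keys were stored.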

Confusing "null" and "undefined"

The value "undefined" has never been valid in JSON according to the official JSON standard (ECMA-404, Section 5), even though it is used in JavaScript. Moreover, it is deprecated in BSON and is converted to $null, which is not always a good solution. Avoid using "undefined" in MongoDB.

Using $limit() without $sort()

Very often when developing with MongoDB, it is helpful to see just a sample of the result that a query or aggregation will return. $limit() serves this purpose, but it should never appear in the final version of the code unless $sort comes before it. Without a sort, the order of the result is not guaranteed, so you cannot reliably page through the data: the entries at the top of the result depend on the sort order. To work reliably, queries and aggregations must be deterministic, that is, produce the same results each time they are executed. Code that has $limit() but no $sort is not deterministic and may later cause bugs that are difficult to track down.
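
A deterministic version of "give me the top n" might look like the sketch below. The top_n helper and the score field are hypothetical; adding _id as a secondary sort key is an assumption made here so that documents with equal scores always come back in the same order.

```python
def top_n(sort_field: str, n: int) -> list:
    """Return aggregation stages that yield the same n documents every run."""
    return [
        # _id is unique, so it breaks ties between equal sort_field values.
        {"$sort": {sort_field: -1, "_id": 1}},
        {"$limit": n},
    ]


stages = top_n("score", 5)
```

The two stages can be appended to any pipeline that ends with a sample of its output.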

Conclusion

The only way to be disappointed with MongoDB is to compare it directly to another type of database, such as an RDBMS, or to come to it with particular expectations. It is like comparing an orange to a fork. Database systems serve specific purposes. It is best simply to understand and appreciate these differences. It would be a shame to pressure the MongoDB developers down a path that forced them into the RDBMS way of doing things. I want to see new and interesting ways of solving old problems, such as ensuring data integrity and building data systems that are resilient to failure and malicious attack.

MongoDB 4.0's introduction of ACID transactions is a good example of bringing in important improvements in an innovative way. Multi-document and multi-statement transactions are now atomic. It has also become possible to adjust the time allowed for acquiring locks, to expire hung transactions, and to change the isolation level.


Source: habr.com