This translation was prepared ahead of the start of the course
Highlights:
- It is extremely important to develop a schema, even though it is optional in MongoDB.
- Likewise, indexes must match your schema and access patterns.
- Avoid large documents and large arrays.
- Be careful with MongoDB settings, especially when it comes to security and reliability.
- MongoDB has no query optimizer of the kind found in relational databases, so you must be careful about how you write queries and aggregations.
I have been working with databases for a very long time, but I discovered MongoDB only recently. There are a few things I wish I had known before I started working with it. People who already have experience in a field come with preconceived notions about what databases are and what they do. In the hope of making things easier for others, here is a list of common mistakes.
Creating a MongoDB Server Without Authentication
Unfortunately, MongoDB ships with authentication disabled by default. That is fine for a workstation that is only accessed locally. But MongoDB is a multi-user system that likes large amounts of memory, so it is best to put it on a server with as much RAM as your environment allows, even if you only intend to use it for development. Installing it on a server on the default port can be risky, especially if arbitrary JavaScript can be executed inside a query (for example, through $where), which opens the door to injection attacks.
There are several authentication methods, but the easiest is to set up a user ID and password. Use this approach while you think about fancier authentication based on
Don't forget to limit MongoDB's attack surface.
. Since data files are not encrypted in standard MongoDB, it makes sense to start MongoDB with
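The Python below is a minimal sketch that only builds data and runs nothing. It produces an argument list for starting `mongod` with access control (`--auth`) and an explicit network binding (`--bind_ip`), plus a `createUser` command document for a password-based user. The data path, port, user name, password, database and role are placeholder assumptions. In a configuration file, the equivalent of `--auth` is `security.authorization: enabled`.

```python
# Builds (but does not run) a mongod command line with authentication on.
# All concrete values passed in are placeholders chosen for illustration.
def mongod_args(dbpath: str, port: int, bind_ip: str = "127.0.0.1") -> list[str]:
    return [
        "mongod",
        "--auth",               # require clients to authenticate
        "--bind_ip", bind_ip,   # listen only on the given interface(s)
        "--port", str(port),    # a non-default port is one more small hurdle
        "--dbpath", dbpath,
    ]

# createUser command document; run it against the "admin" database from the
# shell or a driver. The user, password and role below are placeholders.
create_user_cmd = {
    "createUser": "appUser",
    "pwd": "change-me",
    "roles": [{"role": "readWrite", "db": "appdb"}],
}

args = mongod_args("/data/db", 27117)
```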
Schema Design Mistakes
MongoDB does not enforce a schema. But that does not mean a schema is unnecessary. If you just want to store documents without any agreed schema, saving them can be quick and easy, but retrieving them later can be
Classic article "
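A schema can also be enforced on the server. The sketch below assumes a hypothetical `customers` collection and builds a `create` command whose `$jsonSchema` validator requires `name` and `email`. It also caps the `orders` array with `maxItems`, an arbitrary limit chosen for illustration. The small helper repeats only the `required` rule locally. It is not the server's validator.

```python
# create command with a $jsonSchema validator for a hypothetical collection.
create_customers_cmd = {
    "create": "customers",
    "validator": {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "email"],
            "properties": {
                "name": {"bsonType": "string"},
                "email": {"bsonType": "string"},
                "orders": {"bsonType": "array", "maxItems": 1000},  # illustrative cap
            },
        }
    },
    "validationLevel": "strict",   # validate all inserts and updates
    "validationAction": "error",   # reject invalid documents instead of just logging
}

def missing_required(doc: dict) -> list[str]:
    """Local pre-check mirroring only the 'required' rule of the validator above."""
    required = create_customers_cmd["validator"]["$jsonSchema"]["required"]
    return [field for field in required if field not in doc]
```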
Don't forget about sort order (collation)
Forgetting about sort order (collation) can cost you more frustration and time than any other misconfiguration. By default, MongoDB uses
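Without a collation, MongoDB compares strings by simple binary comparison. The Python sketch below imitates the effect locally. Python's default string sort compares code points, so every upper-case ASCII letter sorts before any lower-case one. Sorting with `str.casefold` is only a rough stand-in for a case-insensitive collation, not ICU. The `collation` dict shows the document shape you would pass to queries, aggregations or index creation. Strength 2 ignores case but not accents.

```python
names = ["apple", "Banana", "cherry"]

# Code-point order, similar in effect to a binary comparison.
binary_order = sorted(names)

# Approximation of a case-insensitive collation (not the real ICU rules).
case_insensitive_order = sorted(names, key=str.casefold)

# Collation document shape accepted by find/aggregate/createIndexes.
collation = {"locale": "en", "strength": 2}

print(binary_order)            # ['Banana', 'apple', 'cherry']
print(case_insensitive_order)  # ['apple', 'Banana', 'cherry']
```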
Creating collections with large documents
MongoDB will happily store documents of up to 16 MB in a collection, and
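A cheap guard is to estimate document size before writing. The sketch below approximates size from compact JSON. Real BSON size differs because of type tags and length prefixes, so treat the result as an estimate only. The 10% warning threshold is a hypothetical choice, not a MongoDB rule. The 16 MiB constant is the server's maximum BSON document size.

```python
import json

MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # server limit: 16 MiB

def approx_size_bytes(doc: dict) -> int:
    """Rough size estimate via compact JSON; actual BSON size will differ."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

def size_warning(doc: dict, budget_fraction: float = 0.1) -> bool:
    """True if the estimate exceeds budget_fraction of the limit (hypothetical threshold)."""
    return approx_size_bytes(doc) > MAX_BSON_DOCUMENT_SIZE * budget_fraction
```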
Creating Documents with Large Arrays
Documents can contain arrays. It is best to keep the number of elements in an array well below four digits. If elements are added to an array frequently, it will outgrow its containing document and will need to be
MongoDB has something called
You might think you can get by without indexing the array. Unfortunately, the lack of an index brings other problems. Because a document is scanned from start to finish, finding elements at the end of the array takes longer, and most operations on such a document will be
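One way to stop an array from growing without bound is an update that combines `$push` with `$each` and a negative `$slice`. This appends new items and keeps only the last N. The sketch builds that update document and simulates its effect locally. The `readings` field name and the cap of 1000 are illustrative assumptions.

```python
def capped_push(field: str, new_items: list, cap: int) -> dict:
    """Update document that appends new_items and keeps only the last `cap` elements."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return {"$push": {field: {"$each": new_items, "$slice": -cap}}}

def apply_capped_push(array: list, new_items: list, cap: int) -> list:
    """Local simulation of what the server does with the update above."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return (array + new_items)[-cap:]

update = capped_push("readings", [42], cap=1000)
```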
Don't forget that the order of the stages in an aggregation matters.
In a database system with a query optimizer, the queries you write describe what you want to get, not how to get it. It works like ordering in a restaurant: you usually just name the dish rather than give the chef detailed instructions.
In MongoDB, you instruct the cook. For example, you need to make sure the data is reduced as early as possible in the pipeline with $match and $project, that sorting happens only after the data has been reduced, and that lookups happen in exactly the order you want. A query optimizer that eliminates unnecessary work, arranges the steps optimally and chooses the join type can spoil you. MongoDB gives you more control at the cost of convenience.
Tools such as
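The ordering advice above can be sketched as a pipeline, written here as a Python list of stage documents. It assumes hypothetical `orders` and `customers` collections and field names. Filtering and projecting come first, the `$lookup` runs only on the surviving documents, and the sort works on the reduced result.

```python
from datetime import datetime

pipeline = [
    # 1. Filter first so later stages see as few documents as possible.
    {"$match": {"status": "shipped", "orderDate": {"$gte": datetime(2020, 1, 1)}}},
    # 2. Keep only the fields later stages need.
    {"$project": {"customerId": 1, "total": 1, "orderDate": 1}},
    # 3. Join after reducing, so the lookup runs once per surviving document.
    {"$lookup": {"from": "customers", "localField": "customerId",
                 "foreignField": "_id", "as": "customer"}},
    # 4. Sort the already reduced result.
    {"$sort": {"total": -1}},
]

def stage_names(p: list[dict]) -> list[str]:
    """Return the operator name of each stage, in order."""
    return [next(iter(stage)) for stage in p]
```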
Using fast writes
Never configure MongoDB write options for high speed at the expense of reliability. This "fire-and-forget" mode seems fast because the command returns before the write has actually been done. If the system crashes before the data reaches disk, the data is lost and the database may be left in an inconsistent state. Fortunately, journaling is enabled by default in 64-bit MongoDB.
The MMAPv1 and WiredTiger storage engines use journaling to prevent this, although WiredTiger can recover to the last consistent
Journaling ensures that the database is in a consistent state after recovery and retains all data up to the last write to the journal. How often the journal is written is configured with the parameter
To be sure your writes survive a crash, make sure journaling is enabled in the configuration file and that the journal commit interval matches the amount of data you can afford to lose.
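In the drivers and the shell, how safe a write is depends on its write concern. The sketch below contrasts two write concern documents. `{"w": 0}` is fire-and-forget and waits for no acknowledgement. `{"w": "majority", "j": True}` waits until a majority of replica-set members have recorded the write in their on-disk journal. The 5000 ms `wtimeout` and the helper function are illustrative assumptions, not part of any driver API.

```python
# Fire-and-forget: the client does not wait for any acknowledgement.
FIRE_AND_FORGET = {"w": 0}

# Acknowledged by a majority of members, with the write in the on-disk journal.
DURABLE = {"w": "majority", "j": True, "wtimeout": 5000}  # timeout is illustrative

def is_unacknowledged(write_concern: dict) -> bool:
    """Hypothetical lint check: flag write concerns that return before acknowledgement.

    A missing "w" is treated as acknowledged (the server default is never w: 0).
    """
    return write_concern.get("w", 1) == 0
```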
Sorting without an index
Searches and aggregations often need to sort data. Ideally, this happens at one of the final stages, after the result has been filtered to reduce the amount of data to sort. Even then, sorting needs
If there is no suitable index, MongoDB will sort without one. There is a 32 MB memory limit on the total size of all documents in
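Whether an index can deliver a sort can be checked with a simplified rule. The sort keys must be a prefix of the index keys, with either all the same directions or all of them reversed. The sketch below implements only that rule. It ignores the case where equality conditions on leading index fields allow other sorts, and the index is a hypothetical example. When no index fits and an aggregation must sort a lot of data, the `allowDiskUse` option lets the sort spill to disk instead of failing.

```python
def index_supports_sort(index_keys: list[tuple[str, int]],
                        sort_keys: list[tuple[str, int]]) -> bool:
    """Simplified rule: sort keys must be an index prefix, same or fully reversed directions."""
    if len(sort_keys) > len(index_keys):
        return False
    prefix = index_keys[: len(sort_keys)]
    if [f for f, _ in prefix] != [f for f, _ in sort_keys]:
        return False
    same = all(i == s for (_, i), (_, s) in zip(prefix, sort_keys))
    reversed_dirs = all(i == -s for (_, i), (_, s) in zip(prefix, sort_keys))
    return same or reversed_dirs

# Hypothetical compound index {lastName: 1, firstName: -1}.
index = [("lastName", 1), ("firstName", -1)]
```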
Lookups without index support
Lookups ($lookup) perform a function similar to a JOIN in SQL. To perform well, they need an index on the key used as the foreign key. This is not obvious, because that index usage is not reflected in explain(). Such indexes are in addition to the ones reported by explain(), which are used by the $match and $sort pipeline operators when they occur at the beginning of the pipeline. Indexes can now cover any stage
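For a `$lookup`, the index that matters is on the foreign collection's `foreignField`. The sketch below derives a `createIndexes` command from a lookup stage. The collection and field names are hypothetical. The foreign field is deliberately not `_id`, because `_id` is always indexed.

```python
# $lookup from a hypothetical "orders" collection into "customers".
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customerId",
        "foreignField": "customerRef",  # not _id, so it needs its own index
        "as": "customer",
    }
}

def supporting_index_command(stage: dict) -> dict:
    """Build a createIndexes command for the join key in the foreign collection."""
    spec = stage["$lookup"]
    field = spec["foreignField"]
    return {
        "createIndexes": spec["from"],
        "indexes": [{"key": {field: 1}, "name": f"{field}_1"}],
    }
```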
Not using multi-updates
The update method is used to change part of an existing document, or the whole document up to a complete replacement, depending on the update parameter you specify. What is less obvious is that it modifies only the first matching document unless you set the option that tells it to update all documents matching the query criteria.
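At the command level, this option is the `multi` flag in each entry of an `update` command's `updates` array. In the drivers and shell it corresponds to choosing `update_many`/`updateMany` over `update_one`/`updateOne`. The sketch builds such a command and adds a local simulation, using equality-only matching and `$set`-style changes, that shows the difference. The collection and field names are hypothetical.

```python
def update_command(collection: str, query: dict, change: dict, multi: bool) -> dict:
    """update command document; without multi=True only the first match is modified."""
    return {"update": collection, "updates": [{"q": query, "u": change, "multi": multi}]}

def simulate_update(docs: list[dict], query: dict, set_fields: dict, multi: bool) -> int:
    """Local stand-in: equality matching, $set-style change; returns documents updated."""
    updated = 0
    for doc in docs:
        if all(doc.get(k) == v for k, v in query.items()):
            doc.update(set_fields)
            updated += 1
            if not multi:
                break  # mirrors the single-document default
    return updated

cmd = update_command("orders", {"status": "pending"},
                     {"$set": {"status": "cancelled"}}, multi=True)
```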
Don't forget that the order of keys in embedded objects matters
In JSON, an object is an unordered collection of zero or more name/value pairs, where the name is a string and the value is a string, number, boolean, null, object or array.
Unfortunately, BSON attaches great importance to key order. In MongoDB, the embedded object { firstname: "Phil", surname: "Factor" } is not the same as { surname: "Factor", firstname: "Phil" }. That is, you must preserve the order of name/value pairs in your documents if you want to be sure of finding them.
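The difference shows up in queries. An exact match on an embedded document, such as `{"name": {...}}`, compares the fields in order. Dot-notation conditions, such as `{"name.firstname": ..., "name.surname": ...}`, do not depend on order. The Python below is a local simulation of those two behaviours, not the server's matcher. Python dicts compare equal regardless of order, so the sketch compares ordered key/value pairs.

```python
stored = {"name": {"firstname": "Phil", "surname": "Factor"}}

def exact_embedded_match(doc: dict, field: str, value: dict) -> bool:
    # dict == ignores order, so compare the ordered (key, value) pairs instead.
    return list(doc.get(field, {}).items()) == list(value.items())

def dotted_match(doc: dict, conditions: dict) -> bool:
    # Each "outer.inner" condition is checked on its own, so order is irrelevant.
    for path, expected in conditions.items():
        outer, inner = path.split(".", 1)
        if doc.get(outer, {}).get(inner) != expected:
            return False
    return True
```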
Do not confuse null and undefined
The value undefined has never been valid in JSON: the JSON standard (ECMA-404) does not define it. In BSON it is deprecated and tends to be converted to $null, which is not always a good solution.
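A related trap is that an equality query on null, such as `{"middleName": None}` in Python, matches documents where the field is null and also documents where it is missing. `{"middleName": {"$exists": False}}` matches only missing fields, and `{"middleName": {"$type": "null"}}` matches only explicit nulls. The sketch simulates these three filters locally on hypothetical documents.

```python
docs = [
    {"_id": 1, "middleName": None},
    {"_id": 2},
    {"_id": 3, "middleName": "J."},
]

def matches_null_equality(doc: dict, field: str) -> bool:
    # Like {field: None}: null OR missing.
    return doc.get(field) is None

def matches_exists_false(doc: dict, field: str) -> bool:
    # Like {field: {"$exists": False}}: missing only.
    return field not in doc

def matches_type_null(doc: dict, field: str) -> bool:
    # Like {field: {"$type": "null"}}: explicit null only.
    return field in doc and doc[field] is None
```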
Using $limit() without $sort()
When developing with MongoDB, it is often helpful to look at just a sample of the results a query or aggregation will return. $limit() is handy for this, but it should never remain in the final version of your code unless you use $sort before it. Without a sort, you cannot guarantee the order of the result, so you cannot reliably inspect the data: the entries at the top of the result will vary with the order in which documents happen to be returned. To be reliable, queries and aggregations must be deterministic, that is, produce the same results every time they run. Code that has $limit() but no $sort is not deterministic and may later cause bugs that are hard to track down.
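A deterministic sample sorts before it limits. It also adds a unique tie-breaker such as `_id`, because sorting only on a non-unique field can still return ties in varying order. The sketch shows such a pipeline for a hypothetical `orders` collection. It also includes a hypothetical lint helper that flags a `$limit` with no earlier `$sort`.

```python
sample_pipeline = [
    {"$match": {"status": "shipped"}},
    {"$sort": {"orderDate": -1, "_id": 1}},  # _id breaks ties so the order is total
    {"$limit": 10},
]

def limit_without_sort(pipeline: list[dict]) -> bool:
    """Hypothetical lint check: True if a $limit appears before any $sort."""
    seen_sort = False
    for stage in pipeline:
        name = next(iter(stage))
        if name == "$sort":
            seen_sort = True
        elif name == "$limit" and not seen_sort:
            return True
    return False
```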
Conclusion
The only way to get frustrated with MongoDB is to compare it directly with another type of database, such as an RDBMS, or to come to it with particular expectations. It is like comparing an orange with a fork. Database systems serve specific purposes, and it is best simply to understand and appreciate these differences for yourself. It would be a shame to pressure the MongoDB developers into going down the RDBMS path. I want to see new and interesting ways of solving old problems, such as ensuring data integrity and building data systems that are resilient to failure and malicious attacks.
MongoDB 4.0's implementation of ACID transactions is a good example of introducing important improvements in an innovative way. Multi-document, multi-statement transactions are now atomic. It also became possible to configure how long to wait to acquire locks, to terminate hung transactions and to change the isolation level.
Source: habr.com