This database is on fire...

Let me tell a technical story.

Many years ago, I built an application with collaboration features baked in. It was a handy experimental stack that made full use of early React and CouchDB, and it synchronized data in real time using JSON OT (operational transformation). We used it internally at the company, but its broad applicability and potential in other areas were obvious.

While trying to sell this technology to potential customers, we ran into an unexpected hurdle. In the demo video, our technology looked and worked great; no problems there. The video showed exactly how it worked, and nothing in it was simulated. We had come up with a realistic usage scenario and coded it for real.

That, in fact, turned out to be the problem. Our demo worked exactly the way everyone else faked theirs. Specifically, information traveled instantly from A to B, even when it was large media files. After logging in, each user saw the new entries. Different users could work together seamlessly on the same projects, even when the Internet connection dropped out somewhere in a village. All of this is implicitly promised in any product video cut in After Effects.

Although everyone knew what the Refresh button was for, nobody really grasped that the web apps they asked us to build are usually subject to its limitations, or that once it is no longer needed, the user experience becomes completely different. Mostly they noticed that you could "chat" by leaving notes for the other participants, so they wondered how this was different from, say, Slack. Oof!

Everyday sync design

If you have any experience in software development, it must grate on you to remember that most people can't just look at a picture of an interface and work out what it will do when they interact with it, let alone what happens inside the program itself. Knowing what can happen is largely the result of knowing what cannot and should not happen. That requires a mental model of not only what the software does, but also how its individual parts are coordinated and communicate with each other.

A classic example is a user staring at a spinner.gif, wondering when the work will finally be done. A developer would realize that the process has probably hung and that the gif will never go away. The animation simulates work being done but is not tied to its actual state. In cases like this, some techies like to roll their eyes, marveling at how confused users can be. But notice which of them is pointing at the spinning clock and saying that it is actually standing still.

This is the essence of the value of real time. These days, real-time databases are still rarely used and are viewed with suspicion by many. Most of them lean heavily towards the NoSQL style, which is why people usually end up with Mongo-based solutions, which are best forgotten. For me, though, it means the comfort of working with CouchDB, and learning to design structures that someone other than a bureaucrat can fill with data. I think that is a better use of my time.

But the real topic of this post is what I use today. Not by choice, but because of an indifferent, blindly applied corporate policy. So I'm going to give you a Totally Fair and Unbiased comparison of two closely related Google real-time database products.

Both have the word Fire in their names. One I remember fondly. The other, for me, is a different kind of fire. I'm in no hurry to say their names, because as soon as I do, we will hit the first big problem: the names.

The first one is called Firebase Realtime Database, and the second Firebase Cloud Firestore. Both are products in Google's Firebase suite. Their APIs are named firebase.database(…) and firebase.firestore(…), respectively.

This happened because the Realtime Database is simply the original Firebase, from before Google acquired it in 2014. Google then decided to build a copy of Firebase as a parallel product, based on big data technology, and named it Firestore, with a Cloud. I hope you are not confused yet. If you are, don't worry: I rewrote this part of the article ten times myself.

After all, you have to say Firebase in a question about Firebase, and Firestore in a question about Firebase, at least if you wanted to be understood on Stack Overflow a few years ago.

If there was an award for the worst naming of software products, then this case would definitely become one of the contenders. The Hamming distance between these names is so small that it confuses even experienced engineers whose fingers type one name while their head thinks of another. These are miserably failed plans, devised with the best of intentions; they fulfilled the prophecy that the database would be on fire. And I'm not kidding. The person who came up with this naming scheme caused blood, sweat and tears.

Pyrrhic victory

One might think that Firestore is a replacement for Firebase, its next-generation descendant, but that would be misleading. Firestore is guaranteed not to be a replacement for Firebase. It looks as if someone cut everything interesting out of it and muddled most of what was left in various ways.

However, a quick look at the two products can be confusing: they seem to do the same thing, through largely the same APIs, and even in the same database session. The differences are subtle and only come to light through a careful side-by-side study of the lengthy documentation, or when you try to port code that works perfectly on Firebase over to Firestore. Even then, you discover that the database interface catches fire as soon as you attempt real-time drag and drop. I repeat, I'm not kidding.

The Firebase client is polite in the sense that it buffers changes and automatically retries updates, giving priority to the last write. Firestore, however, has a limit of 1 document write per user per second, and the server enforces it. It is up to you to find a way around it and implement an update rate limiter, even when you are just trying to get your application built. In other words, Firestore is a real-time database without a real-time client, one that masquerades as such through its API.
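
To make the "update rate limiter" idea concrete, here is a minimal sketch of a client-side buffer that keeps only the latest pending value and lets at most one write through per interval. It is not how either Firebase client is implemented; the class name CoalescingWriter, the send callback and the one-second default are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Optional


class CoalescingWriter:
    """Keeps only the latest pending value and sends at most one write per interval."""

    def __init__(
        self,
        send: Callable[[Any], None],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send                  # hypothetical function that performs the real write
        self._min_interval = min_interval  # assumed limit: one write per second
        self._clock = clock                # injectable so the behaviour can be tested
        self._last_sent: Optional[float] = None
        self._pending: Any = None
        self._has_pending = False

    def write(self, value: Any) -> None:
        """Record a new value; an older unsent value is overwritten (last write wins)."""
        self._pending = value
        self._has_pending = True
        self.flush()

    def flush(self) -> bool:
        """Send the pending value if the interval has elapsed. Returns True if it was sent."""
        if not self._has_pending:
            return False
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False  # too soon: keep the value buffered
        self._send(self._pending)
        self._last_sent = now
        self._pending = None
        self._has_pending = False
        return True
```

During a drag, every mouse move would call write(), and a timer would call flush() periodically so that the final position is not left sitting in the buffer.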

This is where we begin to see the first signs of Firestore's raison d'être. I may be wrong, but I suspect that someone high up in Google's leadership took a look at Firebase after the acquisition and simply said, "No, my god, no. This is unacceptable. Not on my watch."

He came from his chambers and proclaimed:

"One big JSON document? No. You will split the data into separate documents, each of which will be no more than 1 megabyte in size."

It seems that such a limitation will not survive its first encounter with any sufficiently motivated user base. You know it's true. At work, for example, we have more than one and a half thousand presentations, and this is Perfectly Normal.

With this limitation, you will have to come to terms with the fact that one "document" in the database will not look like any object that the user might call a document.

"Arrays of arrays that can contain other elements recursively? No. Arrays will only contain fixed length objects or numbers as intended by the Lord."

So if you were hoping to put GeoJSON in your Firestore, you will find that this is not possible. Nothing non-one-dimensional is allowed. I hope you love Base64 and/or JSON within JSON.
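
Here is a rough sketch of what "JSON within JSON" can look like in practice: detect values in which an array directly contains another array, and store those as a JSON string instead. The wrapper key "__json__" and the function names are my own invention for this example, not anything Firestore defines.

```python
import json
from typing import Any

WRAPPER_KEY = "__json__"  # hypothetical marker key, chosen for this sketch only


def has_nested_array(value: Any, inside_array: bool = False) -> bool:
    """Return True if any list directly contains another list."""
    if isinstance(value, list):
        if inside_array:
            return True
        return any(has_nested_array(item, True) for item in value)
    if isinstance(value, dict):
        # A dictionary between two lists breaks the direct nesting.
        return any(has_nested_array(item, False) for item in value.values())
    return False


def encode_field(value: Any) -> Any:
    """Wrap values with nested arrays in a JSON string; leave everything else alone."""
    if has_nested_array(value):
        return {WRAPPER_KEY: json.dumps(value)}
    return value


def decode_field(stored: Any) -> Any:
    """Undo encode_field for values that were wrapped."""
    if (
        isinstance(stored, dict)
        and set(stored) == {WRAPPER_KEY}
        and isinstance(stored[WRAPPER_KEY], str)
    ):
        return json.loads(stored[WRAPPER_KEY])
    return stored


# A GeoJSON-style line: "coordinates" is an array of arrays.
line = {"type": "LineString", "coordinates": [[30.0, 10.0], [10.0, 30.0]]}
stored = {key: encode_field(value) for key, value in line.items()}
restored = {key: decode_field(value) for key, value in stored.items()}
```

The price is that the wrapped field becomes opaque to the database: it can no longer be queried or partially updated, only read and written whole.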

"Import and export JSON over HTTP, command line tools or the admin panel? No. You will only be able to export and import data to Google Cloud Storage. That's what it's called now, I think. And when I say 'you' I'm only referring to those who have Project Owner authority. Everyone else can go and create tickets."

As you can see, the Firebase data model is easy to describe. It contains one huge JSON document that maps JSON keys to URL paths. If you write the following to the root / of a Firebase database with HTTP PUT:

{
  "hello": "world"
}

then GET /hello will return "world". Basically, it works exactly as you would expect. A collection of Firebase objects at /my-collection/:id is equivalent to a JSON dictionary {"my-collection": {...}} at the root, whose contents are available at /my-collection:

{
  "id1": {...object},
  "id2": {...object},
  "id3": {...object},
  // ...
}

This works fine as long as each insert has a non-colliding ID, for which the system has a standard solution.
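
The mapping between URL paths and JSON keys can be sketched in a few lines. This is a toy in-memory model for illustration only, with made-up function names put and get; it says nothing about how the real service stores data.

```python
from typing import Any, Dict, List


def split_path(path: str) -> List[str]:
    """Turn "/my-collection/id1" into ["my-collection", "id1"]."""
    return [part for part in path.split("/") if part]


def put(tree: Dict[str, Any], path: str, value: Any) -> None:
    """Replace the value at path, creating intermediate dictionaries as needed."""
    parts = split_path(path)
    if not parts:
        # Writing to the root replaces the entire document.
        if not isinstance(value, dict):
            raise ValueError("the root must be a JSON object in this sketch")
        tree.clear()
        tree.update(value)
        return
    node = tree
    for key in parts[:-1]:
        child = node.get(key)
        if not isinstance(child, dict):
            child = {}
            node[key] = child
        node = child
    node[parts[-1]] = value


def get(tree: Dict[str, Any], path: str) -> Any:
    """Return the value at path, or None if nothing is stored there."""
    node: Any = tree
    for key in split_path(path):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


db: Dict[str, Any] = {}
put(db, "/", {"hello": "world"})
put(db, "/my-collection/id1", {"title": "first"})
put(db, "/my-collection/id2", {"title": "second"})
```

With this model, get(db, "/hello") yields "world", and get(db, "/my-collection") yields the whole dictionary of objects keyed by ID.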

In other words, the database is 100% JSON(*) compliant and works great over HTTP, just like CouchDB. But you mostly use it through a real-time API that abstracts away websockets, authorization, and subscriptions. The admin panel offers both, allowing live editing as well as JSON import/export. If you stick to the same approach in your own code, you'll be surprised how much specialized code becomes unnecessary once you realize that JSON patch and diff can handle 90% of the routine work of managing persistent state.
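
For a sense of how little code JSON diff and patch need, here is a minimal sketch over plain dictionaries. It is my own simplification, not a standard format: changes are a flat {path: value} map, None is assumed to mean "remove this key", keys are assumed not to contain "/", and arrays are treated as opaque values.

```python
import copy
from typing import Any, Dict


def diff(old: Any, new: Any, prefix: str = "") -> Dict[str, Any]:
    """Return {path: value} for everything that changed; None marks a removed key.

    Both top-level arguments are expected to be dictionaries.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        changes: Dict[str, Any] = {}
        for key in old.keys() | new.keys():
            path = f"{prefix}/{key}"
            if key not in new:
                changes[path] = None
            elif key not in old:
                changes[path] = new[key]
            else:
                changes.update(diff(old[key], new[key], path))
        return changes
    return {} if old == new else {prefix: new}


def patch(tree: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a diff to a copy of tree and return the result; tree is not modified."""
    result = copy.deepcopy(tree)
    for path, value in changes.items():
        parts = [part for part in path.split("/") if part]
        node = result
        for key in parts[:-1]:
            node = node.setdefault(key, {})
        if value is None:
            node.pop(parts[-1], None)
        else:
            node[parts[-1]] = copy.deepcopy(value)
    return result


old = {"slides": {"s1": {"title": "Intro", "x": 10}, "s2": {"title": "End"}}}
new = {"slides": {"s1": {"title": "Intro", "x": 25}, "s3": {"title": "Extra"}}}
changes = diff(old, new)
```

Here changes contains one entry for the moved slide, one for the removed slide and one for the added slide, and patch(old, changes) reproduces new.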

The Firestore data model is similar to JSON, but differs from it in some critical ways. I already mentioned the lack of arrays within arrays. Sub-collections are modeled as first-class concepts, separate from the JSON document that contains them. Since there is no out-of-the-box serialization for this, reading and writing data requires a specialized code path. To process your own collections, you need to write your own scripts and tools. The admin panel only lets you make small changes one field at a time and has no import/export capabilities.

They took a real-time NoSQL database and turned it into a slow non-SQL with auto-join and a separate non-JSON column. Something like GraftQL.

Hot Java

If Firestore was supposed to be more reliable and scalable, then the irony is that the average developer will end up with a less reliable solution than if they had chosen Firebase out of the box. The kind of software that the Grumpy DBA wants requires a level of effort and caliber that is simply unrealistic for the niche the product is supposed to be good at. It's like how HTML5 Canvas is no replacement for Flash at all if you don't have the development tools and a player. Moreover, Firestore is mired in the pursuit of data cleanliness and sterile validation, which just doesn't fit how the average business user likes to work: for them everything is optional, because until the very end everything is a draft.

The main disadvantage of Firebase is that its client was created several years ahead of its time, before most web developers knew about immutability. Because of this, Firebase assumes that you will be mutating the data and therefore does not take advantage of user-provided immutability. In addition, it does not reuse data in the snapshots it sends to the user, which makes diffing much more difficult. For large documents, its transaction mechanism based on mutable diffs is simply inadequate. Guys, we already have WeakMap in JavaScript. It's comfortable.
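
To illustrate why reusing data between snapshots matters, here is a small sketch of immutable updates with structural sharing: each update copies only the dictionaries along the changed path, so untouched subtrees remain the very same objects and a diff can skip them with an identity check. The function names set_in and changed_paths are assumptions for this example and do not correspond to any Firebase API.

```python
from typing import Any, Dict, List, Tuple

Path = Tuple[str, ...]


def set_in(tree: Dict[str, Any], path: Path, value: Any) -> Dict[str, Any]:
    """Return a new tree with value at path; untouched subtrees are shared, not copied."""
    if not path:
        raise ValueError("path must not be empty")
    key, rest = path[0], path[1:]
    updated = dict(tree)  # shallow copy of this level only
    if rest:
        child = tree.get(key)
        updated[key] = set_in(child if isinstance(child, dict) else {}, rest, value)
    else:
        updated[key] = value
    return updated


def changed_paths(old: Any, new: Any, prefix: Path = ()) -> List[Path]:
    """List the paths that differ, skipping any subtree that is the same object."""
    if old is new:
        return []  # identity check: nothing below here can have changed
    if isinstance(old, dict) and isinstance(new, dict):
        paths: List[Path] = []
        for key in sorted(old.keys() | new.keys()):
            if key not in old or key not in new:
                paths.append(prefix + (key,))
            else:
                paths.extend(changed_paths(old[key], new[key], prefix + (key,)))
        return paths
    return [] if old == new else [prefix]


v1 = {
    "doc": {"title": "Draft", "body": {"text": "hello"}},
    "users": {"u1": {"name": "Ann"}},
}
v2 = set_in(v1, ("doc", "title"), "Final")
```

In this example v2["users"] is the same object as v1["users"], so the comparison never descends into it, however large that subtree is.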

If you give the data the right shape and don't let the trees grow too large, you can work around this problem. But I wonder whether Firebase would have been far more interesting had its developers released a really good client API built on immutability, coupled with some serious practical advice on database design. Instead, they seem to have tried to fix what wasn't broken, and that made it worse.

I don't know all the logic behind the creation of Firestore. Speculating about the motives at work inside the black box is part of the entertainment. Such a contrast between two extremely similar but incomparable databases is quite rare. It's as if someone thought, "Firebase is just a feature that we can emulate on Google Cloud", but had yet to discover the concept of identifying real-world requirements or building useful solutions that meet all of them. "Let the developers worry about that. Just make the UI look pretty… can you add more fire?"

I understand a thing or two about data structures. I can definitely see that the "everything in one big JSON tree" concept is an attempt to abstract away any sense of large-scale structure from the database. Expecting software to simply cope with any dubious fractal of a data structure is crazy. I don't even have to imagine how bad things can get: I've done rigorous code audits, and I've seen things you people never dreamed of. But I also know what good structures look like, how to use them, and why you should. I can imagine a world where Firestore would seem quite logical and the people who built it would feel they had done a good job. But we don't live in that world.

Firebase's query support is bad by any standard; it is practically non-existent. It definitely needs improvement, or at least an overhaul. But Firestore isn't much better, as it is limited to the same one-dimensional indexes found in plain SQL. If you want the kind of queries people run on messy data, you need full-text search, multiple range filters, and arbitrary user-specified ordering. On closer inspection, plain SQL's features are themselves too limited. Besides, the only SQL queries people can run in production are fast ones. You will need a specialized indexing solution with well-thought-out data structures. For everything else, there should at least be incremental map-reduce or something similar.

If you search the Google docs for this, you will hopefully be pointed towards something like BigTable and BigQuery. However, all these solutions come wrapped in such a volume of thick corporate sales jargon that you will quickly turn back and start looking for something else.

The last thing you need in a real-time database is something built by and for people on a management pay scale.

(*) This is a joke, there is no such thing as 100% JSON compatible.

Source: habr.com