How to scale from 1 to 100,000 users

Many startups have gone through this: hordes of new users register every day, and the development team struggles to keep the service running.

This is a nice problem to have, but there is little clear information on the web about how to carefully scale a web application from zero to hundreds of thousands of users. Usually the advice is either about putting out fires or about eliminating bottlenecks prematurely (and often both). So people end up using fairly formulaic tricks to scale their hobby project into something really serious.

Let's try to filter out the noise and write down the basic recipe. We are going to scale our new photo-sharing site, Graminsta, step by step from 1 to 100,000 users.

Let's write down what specific actions need to be taken as the audience grows to 10, 100, 1,000, 10,000 and 100,000 people.

1 user: 1 machine

Almost every application, be it a website or a mobile app, has three key components:

  • API
  • database
  • client (mobile app or website itself)

The database stores persistent data. The API serves requests that read and write this data. The client presents the data to the user.

I came to the conclusion that it is much easier to talk about scaling an application if, from an architectural point of view, the client and API entities are completely separated.

When we first start building an application, all three components can be run on the same server. In a way, this is similar to our development environment: one engineer runs the database, API, and client on the same machine.
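As a very rough sketch of this single-machine stage (Flask and SQLite are my illustrative choices here, and the routes and table are made up, not something prescribed by the article), all three components can literally live in one process:

```python
# Minimal single-server sketch: client, API, and database in one process.
# Assumes a local SQLite file with a "users" table already created.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "graminsta.db"  # hypothetical local database file

def get_db():
    return sqlite3.connect(DB_PATH)

@app.route("/")
def client():
    # "Client": a page rendered and served by the same machine.
    return "<html><body><h1>Graminsta</h1></body></html>"

@app.route("/api/users/<int:user_id>")
def get_user(user_id):
    # "API": reads persistent data from the local database.
    row = get_db().execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return jsonify(error="not found"), 404
    return jsonify(id=row[0], name=row[1])

if __name__ == "__main__":
    app.run()  # everything runs on one box
```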

Theoretically, we could deploy it in the cloud on a single DigitalOcean Droplet or AWS EC2 instance, as shown below:
[Diagram: the client, API, and database all running on a single server]
That said, if the site is going to have more than one user, it almost always makes sense to separate out the database layer.

10 users: move the database to its own tier

Moving the database to a managed service like Amazon RDS or DigitalOcean Managed Databases will serve us well for a long time to come. It's a little more expensive than self-hosting on a single machine or EC2 instance, but these services give you a lot of useful features out of the box that pay off later: multi-region redundancy, read replicas, automatic backups, and more.
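In practice, the change for our code is small: the API simply connects to a remote host instead of a local database. A minimal sketch, assuming PostgreSQL and a connection string supplied through the environment (the hostname and credentials below are placeholders):

```python
# The API now talks to a managed PostgreSQL instance over the network.
# DATABASE_URL comes from the environment, e.g. something like
# postgres://user:password@your-instance.abc123.us-east-1.rds.amazonaws.com:5432/graminsta
import os
import psycopg2

def get_db():
    # Backups, replication, and failover are handled by the managed service,
    # not by our application code.
    return psycopg2.connect(os.environ["DATABASE_URL"])

with get_db() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM users WHERE id = %s", (1,))
        print(cur.fetchone())
```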

Here's what the system looks like now:
[Diagram: the application server with the database moved to a separate managed instance]

100 users: move the client to its own tier

Luckily, our first users loved the app. Traffic is becoming more stable, so it's time to move the client to a separate tier. It's worth noting that separating entities is a key aspect of building a scalable application: when one part of the system gets more traffic, we can split it out and scale it based on its own traffic patterns.

That's why I like to think of the client as separate from the API. This makes it very easy to talk about developing for multiple platforms: web, mobile web, iOS, Android, desktop apps, third party services, etc. They are all just clients using the same API.

For example, the feature our users now request most often is a mobile app. If the client and API are separate entities, this becomes much easier to deliver.
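The practical consequence is that the API only returns data, for example as JSON, and every client decides how to display it. A hypothetical endpoint (the route and response shape are illustrative, not from the original article) that the web page, the iOS app and the Android app would all call in exactly the same way:

```python
# One API, many clients: the API returns JSON and knows nothing about how
# the web page or the mobile apps will render it.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/photos/<int:photo_id>")
def get_photo(photo_id):
    # Hypothetical response shape; in a real service this would come
    # from the database.
    return jsonify(
        id=photo_id,
        url=f"https://example.com/photos/{photo_id}.jpg",
        likes=42,
    )
```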

Here's what such a system looks like:

[Diagram: the web client and API as separate tiers, both in front of the database]

1,000 users: add a load balancer

Things are going well. Graminsta users are uploading more and more photos. The number of registrations is also growing. Our lone API server is having a hard time keeping up with all the traffic. We need more hardware!

The load balancer is a very powerful concept. The key idea is that we put a balancer in front of the API, and it distributes traffic to individual service instances. This is how horizontal scaling is done, that is, we add more servers with the same code, increasing the number of requests that we can handle.

We are going to place separate load balancers in front of the web client and in front of the API. This means we can launch multiple instances running the API code and the web client code, and the load balancer will direct each request to whichever server is least loaded.
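One small practical detail: the balancer needs a way to tell which instances are healthy. A common convention (not tied to any particular provider) is a lightweight health-check endpoint that the balancer polls, something like:

```python
# A health-check endpoint the load balancer can poll. If an instance stops
# answering 200 here, the balancer stops sending it traffic.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Could also verify the database connection or other dependencies here.
    return jsonify(status="ok"), 200
```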

Here we get another important advantage: redundancy. If one instance fails (for example, it gets overloaded or crashes), the others keep responding to incoming requests. With only a single instance, the entire system would go down with it.

The load balancer also makes automatic scaling possible. We can configure it to add instances ahead of peak load and remove them when all the users are asleep.
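On AWS, for example, this kind of rule can be expressed as a target-tracking policy on an Auto Scaling group. A sketch using boto3; the group name and the 50% CPU target are made up for illustration:

```python
# Sketch of a target-tracking scaling policy on AWS: add instances when
# average CPU across the group rises above the target, remove them when it falls.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="graminsta-api",      # hypothetical group name
    PolicyName="keep-cpu-around-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
```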

With a load balancer, the API layer can be scaled almost indefinitely by simply adding new instances as the number of requests increases.

[Diagram: load balancers in front of multiple web client and API instances]

Note. At the moment, our system is very similar to what PaaS companies like Heroku or AWS Elastic Beanstalk offer out of the box (which is why they are so popular). Heroku puts the database on a separate host, manages an autoscaling load balancer, and lets you host the web client separately from the API. This is a great reason to use Heroku for early-stage projects and startups - you get all the basic services out of the box.

10,000 users: CDN

Perhaps we should have done this from the very beginning. Handling requests and accepting new photos is starting to put too much load on our servers.

At this stage, we should use a cloud object-storage service for static content - images, videos, and so on (AWS S3 or DigitalOcean Spaces). In general, our API should avoid tasks like serving images and handling image uploads.
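One common way to keep the API out of the file-transfer path is to have it hand the client a short-lived presigned URL and let the client upload the photo straight to object storage. A sketch with boto3 (the bucket name and key are hypothetical):

```python
# The API issues a presigned URL; the client then PUTs the image bytes
# directly to S3, so the photo never passes through our API servers.
import boto3

s3 = boto3.client("s3")

def presigned_upload_url(photo_key: str) -> str:
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "graminsta-photos", "Key": photo_key},  # bucket name is made up
        ExpiresIn=300,  # URL is valid for 5 minutes
    )

print(presigned_upload_url("uploads/mobrik/selfie.jpg"))
```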

Another advantage of cloud storage is a CDN (on AWS this add-on is called CloudFront, but many cloud storage providers offer one out of the box). The CDN automatically caches our images in data centers around the world.

Although our main data center may be located in Ohio, if someone requests an image from Japan, the cloud provider will make a copy and store it in their Japanese data center. The next person to request this image in Japan will receive it much faster. This is important when we work with large files, like photos or videos, which take a long time to download and transfer across the planet.
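From the application's point of view, using the CDN mostly means serving links that point at the CDN domain instead of at the bucket itself. A minimal sketch (the distribution domain below is a placeholder):

```python
# Serve image links through the CDN domain instead of hitting the origin
# bucket directly; the CDN caches the file in edge locations worldwide.
CDN_DOMAIN = "https://d1234abcd.cloudfront.net"  # hypothetical distribution domain

def photo_url(photo_key: str) -> str:
    return f"{CDN_DOMAIN}/{photo_key}"

print(photo_url("uploads/mobrik/selfie.jpg"))
```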

[Diagram: static content served from object storage through the CDN, alongside the API]

100,000 users: scale the data tier

The CDN helped a lot: traffic is growing at full speed. The famous vlogger Mavid Mobrick just signed up and posted his "story", as they say. Thanks to the load balancer, CPU and memory usage on the API servers stays low (we have ten API instances running), but we're starting to get a lot of request timeouts… where are these delays coming from?

After a little digging into the metrics, we see that the CPU on the database server is 80-90% loaded. We're on the edge.

Scaling the data tier is probably the hardest part of the equation. API servers handle stateless requests, so we can simply add more API instances. But most databases can't do that. Here we are talking about the popular relational database management systems (PostgreSQL, MySQL, and so on).

Caching

One of the easiest ways to increase the performance of our database is to introduce a new component: the cache layer. The most common approach is an in-memory key-value store, such as Redis or Memcached. Most clouds offer a managed version of these services: ElastiCache on AWS and Memorystore on Google Cloud.

A cache is useful when a service makes many repeated calls to the database for the same information. In effect, we hit the database only once, store the result in the cache, and don't touch the database again.

For example, in our Graminsta service, every time someone visits the profile page of the star Mobrik, the API server queries the database for information from his profile. It happens over and over again. Because Mobrik's profile information doesn't change with every request, it's great for caching.

We will cache the database results in Redis under the key user:id with an expiration time of 30 seconds. Now, when someone visits Mobrik's profile, we first check Redis, and if the data is there, we serve it straight from Redis. Requests for the most popular profile on the site now barely touch our database.
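A minimal sketch of that cache-aside pattern (the key format follows the text above; the helper names are mine):

```python
# Cache-aside read of a user profile: check Redis first, fall back to the
# database on a miss, and store the result with a 30-second expiration.
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_profile(user_id, load_from_db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit: database untouched
    profile = load_from_db(user_id)             # cache miss: one database query
    cache.set(key, json.dumps(profile), ex=30)  # expire after 30 seconds
    return profile
```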

Another advantage of most caching services is that they are easier to scale than database servers. Redis has a built-in Redis Cluster mode. Like a load balancer [1], it allows you to distribute the Redis cache across multiple machines (thousands of servers if needed).
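With recent versions of the redis-py client, pointing the application at a cluster instead of a single node is mostly a one-line change (the endpoint below is a placeholder):

```python
# Connecting to a Redis Cluster instead of a single Redis node; the client
# discovers the other nodes and routes each key to the right shard itself.
from redis.cluster import RedisCluster

cache = RedisCluster(host="redis-cluster.internal", port=6379)  # hypothetical endpoint
cache.set("user:42", "{...}", ex=30)
```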

Almost all large-scale applications use caching; it is an absolutely essential part of a fast API. Faster query processing and more performant code both matter, but without a cache it's almost impossible to scale a service to millions of users.

Read replicas

When the number of queries to the database grows a lot, there is one more thing we can do: add read replicas in the database management system. With the managed services described above, this can be done in one click. A read replica stays up to date with the main database and can be used for SELECT statements.
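At the application level this usually looks like two connections: writes go to the primary, reads go to the replica. A simplified sketch with placeholder connection strings supplied through the environment:

```python
# Route writes to the primary and reads to the read replica.
import os
import psycopg2

primary = psycopg2.connect(os.environ["PRIMARY_DATABASE_URL"])
replica = psycopg2.connect(os.environ["REPLICA_DATABASE_URL"])

def create_photo(user_id, url):
    with primary, primary.cursor() as cur:   # writes go to the primary
        cur.execute(
            "INSERT INTO photos (user_id, url) VALUES (%s, %s)", (user_id, url)
        )

def list_photos(user_id):
    with replica.cursor() as cur:            # SELECTs go to the replica
        cur.execute("SELECT url FROM photos WHERE user_id = %s", (user_id,))
        return [row[0] for row in cur.fetchall()]
```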

Here is our system now:

[Diagram: load-balanced client and API instances, a cache layer, and the database with read replicas]

Next steps

As the application keeps growing, we will keep splitting out services so they can scale independently. For example, if we start using WebSockets, it makes sense to pull the WebSocket-handling code into a separate service. We can host it on new instances behind its own load balancer, which can scale up and down based on the number of open WebSocket connections, regardless of the number of HTTP requests.
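As a rough illustration (the Python websockets library is my assumed choice here, not something the article specifies), the dedicated service is just another small process that we deploy behind its own balancer:

```python
# A standalone WebSocket service, deployed and scaled separately from the
# HTTP API; its instances sit behind their own load balancer.
import asyncio
import websockets

async def handler(ws, path=None):
    # Echo for illustration; a real service would push new likes, comments, etc.
    async for message in ws:
        await ws.send(f"ack: {message}")

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```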

We will also keep running into limits at the database level. This is the stage where it's time to dig into database partitioning and sharding. Both approaches add overhead, but they let you scale the database almost indefinitely.
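The core idea of sharding fits in a few lines: pick which database a row lives on from a deterministic function of its key. This is a deliberately simplified sketch with made-up connection strings; real sharding also has to deal with rebalancing, cross-shard queries, and so on:

```python
# Simplified sharding: choose the database shard from a hash of the user id,
# so each user's data always lands on the same server.
import hashlib

SHARDS = [
    "postgres://shard0.internal/graminsta",  # placeholder connection strings
    "postgres://shard1.internal/graminsta",
    "postgres://shard2.internal/graminsta",
]

def shard_for(user_id: int) -> str:
    digest = hashlib.sha1(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42))  # every call for user 42 returns the same shard
```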

We also want to install a monitoring and analytics service like New Relic or Datadog. This allows you to identify slow queries and understand where improvement is needed. As we scale, we want to focus on finding bottlenecks and fixing them, often using some of the ideas from previous sections.

Sources

This post was inspired by one of my favorite posts on High Scalability. I wanted to make the article a bit more specific to the early stages of a project and decouple it from a single vendor. Be sure to read it if you are interested in this topic.

Footnotes

  1. While similar in terms of load balancing across multiple instances, the underlying implementation of a Redis cluster is very different from a load balancer. [return]


Source: habr.com
