We accept 10 events in Yandex.Cloud. Part 1

Hello everyone, friends!

* This article is based on the REBRAIN & Yandex.Cloud open workshop. If you prefer watching videos, you can find it at this link - https://youtu.be/cZLezUm0ekE

Recently, we had the opportunity to try out Yandex.Cloud hands-on. Since we wanted to poke at it long and thoroughly, we immediately abandoned the idea of starting a simple wordpress blog with a cloud database - it's too boring. After some deliberation, we decided to deploy something resembling the production architecture of a service for receiving and analyzing events in near real time.

I am absolutely sure that the vast majority of online (and not only online) businesses collect a mountain of information about their users and their actions in one way or another. At a minimum, this is needed to make certain decisions - for example, if you run an online game, you can see statistics on which level users most often get stuck and uninstall your game. Or understand why users leave your site without buying anything (hello, Yandex.Metrica).

So, our story: how we wrote an application in golang, tested kafka vs rabbitmq vs yqs, set up data streaming into a Clickhouse cluster and visualized the data using yandex datalens. Naturally, all this was seasoned with infrastructure delights in the form of docker, terraform, gitlab ci and, of course, prometheus. Let's go!

Let me say right away that we won't be able to set everything up in one sitting - this will take a series of several articles. A little about the structure:

Part 1 (you are reading it). We will define the terms of reference and the architecture of the solution, as well as write an application in golang.
Part 2. We roll our application out to production, make it scalable and test it under load.
Part 3. Let's try to figure out why we need to store messages in a buffer rather than in files, and also compare kafka, rabbitmq and yandex queue service with each other.
Part 4. We will deploy a Clickhouse cluster, write a streaming service to transfer data there from the buffer, and set up visualization in datalens.
Part 5. Let's bring the entire infrastructure into proper shape - set up ci/cd using gitlab ci, connect monitoring and service discovery using prometheus and consul.

Terms of reference

First, let's formulate the terms of reference - what exactly we want to get as a result.

  1. We want to have an endpoint like events.kis.im (kis.im is the test domain we will be using throughout the articles) that should accept events using HTTPS.
  2. Events are a simple json of the form: {"event": "view", "os": "linux", "browser": "chrome"}. At the final stage we will add a few more fields, but that will not play a big role. If you wish, you can switch to protobuf. (A possible Go struct for such an event is sketched right after this list.)
  3. The service must be able to process 10 events per second.
  4. It should be possible to scale horizontally by simply adding new instances to our solution. And it will be nice if we can move the front-end to different geolocations to reduce latency for client requests.
  5. Fault tolerance. The solution must be stable enough to survive the failure of some of its parts (up to a certain limit, of course).
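
To make point 2 a bit more concrete, here is a possible Go representation of such an event. This is only a sketch - the field set matches the json above, but the actual types used in the repository may differ:

package main

import (
	"encoding/json"
	"fmt"
)

// Event mirrors the json payload described above.
type Event struct {
	Event   string `json:"event"`
	OS      string `json:"os"`
	Browser string `json:"browser"`
}

func main() {
	raw := []byte(`{"event": "view", "os": "linux", "browser": "chrome"}`)

	var e Event
	if err := json.Unmarshal(raw, &e); err != nil {
		fmt.Println("invalid json:", err)
		return
	}
	fmt.Printf("parsed event: %+v\n", e)
}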

Architecture

In general, for this type of task, classical architectures have long been invented that allow for efficient scaling. The figure shows an example of our solution.

[Figure: general architecture of the solution]

So what do we have:

1. On the left are our devices, which generate various events, whether it is a player completing a level in a mobile game or an order being created in an online store through a regular browser. The event, as specified in the TOR, is a simple json that is sent to our endpoint - events.kis.im.

2. The first two servers are simple balancers, their main tasks are:

  • Be constantly available. To do this, you can use, for example, keepalived, which will switch the virtual IP between nodes in case of problems.
  • Terminate TLS. Yes, we will terminate TLS on them. Firstly, so that our solution complies with the TOR, and secondly, in order to remove the burden of establishing an encrypted connection from our backend servers.
  • Balance incoming requests across available backend servers. The key word here is available. It follows that the load balancers must be able to monitor our application servers and stop sending traffic to failed nodes.

3. Behind the balancers we have application servers running a fairly simple application. It should be able to accept incoming requests via HTTP, validate the sent json and buffer the data.

4. The diagram shows kafka as a buffer, although, of course, other similar services can be used at this level. We will compare Kafka, rabbitmq and yqs in the third article.

5. The penultimate point of our architecture is Clickhouse - a columnar database that allows you to store and process a huge amount of data. At this level, we need to transfer the data from the buffer into the storage system itself (more on this in article 4).

This scheme allows us to scale each layer horizontally and independently. If the backend servers can't cope, we add more - since the application is stateless, this can even be done automatically. If the kafka buffer can't keep up, we add more servers and move some of our topic's partitions to them. Clickhouse can't cope? Impossible 🙂 But in fact, we would also add more servers and shard the data.

By the way, if you want to implement an optional part of our technical task and scale in different geolocations, then there is nothing easier:

[Figure: architecture with multiple geolocations]

At each geolocation we deploy a load balancer with application servers and kafka. In general, 2 application servers, 3 kafka nodes and a cloud balancer - cloudflare, for example - are enough; it will check the availability of the application nodes and route requests by geolocation based on the client's source IP address. This way, data sent by a US client lands on US servers, and data from Africa on African ones.

Then everything is quite simple - we use the MirrorMaker tool that ships with Kafka to copy all the data from every location into our central data center located in Russia. There we parse the data and write it to Clickhouse for subsequent visualization.

So, we've figured out the architecture - time to start poking at Yandex.Cloud!

Writing an application

Before we get to the Cloud, we'll have to be patient a little longer and write a fairly simple service for processing incoming events. We will use golang, since it has proven itself very well as a language for writing network applications.

After spending an hour of time (maybe a couple of hours), we get something like this: https://github.com/RebrainMe/yandex-cloud-events/blob/master/app/main.go.

What are the main points to be noted here:

1. When starting the application, you can specify two flags. One is responsible for the port on which we will listen for incoming http requests (-addr). The second is for the address of the kafka server where we will write our events (-kafka):

addr     = flag.String("addr", ":8080", "TCP address to listen to")
kafka    = flag.String("kafka", "127.0.0.1:9092", "Kafka endpoints")

2. The application uses the sarama library (github.com/Shopify/sarama) to send messages to the kafka cluster. We immediately set the settings focused on maximum processing speed:

config := sarama.NewConfig()
config.Producer.RequiredAcks = sarama.WaitForLocal       // wait for the ack from the leader replica only
config.Producer.Compression = sarama.CompressionSnappy   // compress messages with snappy
config.Producer.Return.Successes = true                  // report successful writes back to the producer
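
For reference, a minimal sketch of how a producer with these settings might be created and used. The broker address and the topic name here are assumptions for illustration - the real application takes them from the flags described above:

package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	config := sarama.NewConfig()
	config.Producer.RequiredAcks = sarama.WaitForLocal
	config.Producer.Compression = sarama.CompressionSnappy
	config.Producer.Return.Successes = true

	// The broker address and topic name are placeholders for this sketch.
	producer, err := sarama.NewSyncProducer([]string{"127.0.0.1:9092"}, config)
	if err != nil {
		log.Fatalf("failed to create kafka producer: %v", err)
	}
	defer producer.Close()

	// SendMessage returns the partition and offset the message landed in.
	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "events",
		Value: sarama.StringEncoder(`{"event":"view","os":"linux","browser":"chrome"}`),
	})
	if err != nil {
		log.Fatalf("failed to send message: %v", err)
	}
	log.Printf("stored in partition %d at offset %d", partition, offset)
}

Note that Return.Successes = true is required by sarama's SyncProducer - without it the producer refuses to start.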

3. The prometheus client is also built into our application; it collects various metrics (a sketch of how they might be declared appears after this list), such as:

  • the number of requests to our application;
  • the number of errors when executing the request (it is impossible to read the post request, broken json, it is impossible to write to kafka);
  • processing time of one request from the client, including the time of writing the message to kafka.
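
A sketch of how such metrics might be declared with the prometheus Go client. The metric names and labels are made up for illustration and do not have to match the ones used in the repository:

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Total number of requests received by the application.
	requestsTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "app_requests_total",
		Help: "Total number of incoming requests.",
	})
	// Errors split by reason: unreadable body, broken json, kafka write failure.
	requestErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "app_request_errors_total",
		Help: "Number of failed requests by reason.",
	}, []string{"reason"})
	// Time to handle one client request, including the kafka write.
	requestDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "app_request_duration_seconds",
		Help:    "Request processing time in seconds.",
		Buckets: prometheus.DefBuckets,
	})
)

func main() {
	prometheus.MustRegister(requestsTotal, requestErrors, requestDuration)

	// /metrics is served by the standard promhttp handler.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}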

4. Three endpoints that our application processes:

  • /status - just return ok to show that we are alive. Although you can add some checks, such as the availability of a kafka cluster.
  • /metrics β€” using this url, the prometheus client will return the metrics it has collected.
  • /post - the main endpoint, which receives POST requests with json inside. Our application checks the json for validity and, if everything is OK, writes the data to the kafka cluster (a sketch of such a handler is shown below).
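
To tie the pieces together, here is a sketch of what the /post handler might look like. It assumes the sarama producer and the prometheus metrics from the sketches above are available as package-level variables, plus the standard encoding/json, fmt, io, net/http and time imports; it is an illustration, not the exact code from main.go:

func postHandler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	requestsTotal.Inc()

	body, err := io.ReadAll(r.Body)
	if err != nil {
		requestErrors.WithLabelValues("read_body").Inc()
		http.Error(w, "cannot read request body", http.StatusBadRequest)
		return
	}

	// Validate that the payload is syntactically correct json.
	if !json.Valid(body) {
		requestErrors.WithLabelValues("invalid_json").Inc()
		http.Error(w, "invalid json", http.StatusBadRequest)
		return
	}

	// Buffer the event in kafka and report where it landed.
	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "events",
		Value: sarama.ByteEncoder(body),
	})
	if err != nil {
		requestErrors.WithLabelValues("kafka_write").Inc()
		http.Error(w, "cannot write to kafka", http.StatusInternalServerError)
		return
	}

	requestDuration.Observe(time.Since(start).Seconds())
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprintf(w, `{"status":"ok","partition":%d,"Offset":%d}`, partition, offset)
}

The response format here matches the one we will see later when testing the application with curl.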

Let me note right away that the code is not perfect - it can (and should!) be improved. For example, you could drop the built-in net/http in favor of the faster fasthttp. Or save processing time and CPU resources by moving the json validity check to a later stage - when the data is transferred from the buffer to the clickhouse cluster.

In addition to the development side, we immediately thought about our future infrastructure and decided to deploy the application via docker. The final Dockerfile for building the application is https://github.com/RebrainMe/yandex-cloud-events/blob/master/app/Dockerfile. In general, it is quite simple; the only point I want to draw attention to is the multi-stage build, which lets us shrink the final container image.

First steps in the cloud

First, register at cloud.yandex.ru. After filling in all the required fields, an account will be created for you and a grant issued for a certain amount of money that can be used to test cloud services. If you want to repeat all the steps from this article, this grant should be enough for you.

After registration, a separate cloud and a default folder will be created for you, in which you can start creating cloud resources. In general, the relationship between resources in Yandex.Cloud looks like this:

[Figure: resource hierarchy in Yandex.Cloud]

You can create multiple clouds per account, and inside a cloud you can make different folders for different projects of the company. You can read more about this in the documentation - https://cloud.yandex.ru/docs/resource-manager/concepts/resources-hierarchy. By the way, I will refer to it often below. When I was setting up the entire infrastructure from scratch, the documentation helped me out more than once, so I advise you to study it.

To manage the cloud, you can use both the web interface and the console utility yc. Installation is done with one command (for Linux and macOS):

curl https://storage.yandexcloud.net/yandexcloud-yc/install.sh | bash

If your inner security specialist is outraged by running scripts from the Internet, then, firstly, you can open the script and read it, and secondly, we run it under our own user - without root rights.

If you want to install the client for Windows, you can use the instructions from the documentation and then run yc init to configure it:

vozerov@mba:~ $ yc init
Welcome! This command will take you through the configuration process.
Please go to https://oauth.yandex.ru/authorize?response_type=token&client_id= in order to obtain OAuth token.

Please enter OAuth token:
Please select cloud to use:
 [1] cloud-b1gv67ihgfu3bp (id = b1gv67ihgfu3bpt24o0q)
 [2] fevlake-cloud (id = b1g6bvup3toribomnh30)
Please enter your numeric choice: 2
Your current cloud has been set to 'fevlake-cloud' (id = b1g6bvup3toribomnh30).
Please choose folder to use:
 [1] default (id = b1g5r6h11knotfr8vjp7)
 [2] Create a new folder
Please enter your numeric choice: 1
Your current folder has been set to 'default' (id = b1g5r6h11knotfr8vjp7).
Do you want to configure a default Compute zone? [Y/n]
Which zone do you want to use as a profile default?
 [1] ru-central1-a
 [2] ru-central1-b
 [3] ru-central1-c
 [4] Don't set default zone
Please enter your numeric choice: 1
Your profile default Compute zone has been set to 'ru-central1-a'.
vozerov@mba:~ $

In principle, the process is simple - first you need to get an oauth token to manage the cloud, select the cloud and the folder that you will use.

If you have several accounts or folders within the same cloud, you can create additional profiles with separate settings via yc config profile create and switch between them.

In addition to the above methods, the Yandex.Cloud team wrote a very good plugin for terraform to manage cloud resources. For my part, I prepared a git repository, where I described all the resources that will be created as part of the article - https://github.com/rebrainme/yandex-cloud-events/. We are interested in the master branch, let's clone it locally:


vozerov@mba:~ $ git clone https://github.com/rebrainme/yandex-cloud-events/ events
Cloning into 'events'...
remote: Enumerating objects: 100, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (68/68), done.
remote: Total 100 (delta 37), reused 89 (delta 26), pack-reused 0
Receiving objects: 100% (100/100), 25.65 KiB | 168.00 KiB/s, done.
Resolving deltas: 100% (37/37), done.
vozerov@mba:~ $ cd events/terraform/

All the main variables used by terraform are declared in the main.tf file. To get started, create a private.auto.tfvars file in the terraform folder with the following content:

# Yandex Cloud Oauth token
yc_token = ""
# Yandex Cloud ID
yc_cloud_id = ""
# Yandex Cloud folder ID
yc_folder_id = ""
# Default Yandex Cloud Region
yc_region = "ru-central1-a"
# Cloudflare email
cf_email = ""
# Cloudflare token
cf_token = ""
# Cloudflare zone id
cf_zone_id = ""

All these values can be taken from the output of yc config list, since we have already configured the console utility. I advise you to add private.auto.tfvars to .gitignore right away so as not to inadvertently publish private data.

In private.auto.tfvars, we also specified data from Cloudflare - to create dns records and proxy the main events.kis.im domain to our servers. If you don't want to use cloudflare, then remove the initialization of the cloudflare provider in main.tf and the dns.tf file, which is responsible for creating the necessary dns records.

In our work, we will combine all three methods - the web interface, the console utility, and terraform.

Virtual networks

To be honest, this step could have been skipped, because when you create a new cloud, a separate network and 3 subnets - one for each availability zone - are created automatically. But we would still like a separate network for our project with its own addressing. The general scheme of networking in Yandex.Cloud is shown in the figure below (honestly taken from https://cloud.yandex.ru/docs/vpc/concepts/).

[Figure: network scheme in Yandex.Cloud]

So, you create a common network within which resources can communicate with each other. For each availability zone, a subnet with its own addressing is created and connected to the common network. As a result, all cloud resources inside it can communicate even if they are in different availability zones. Resources connected to different cloud networks can only see each other through external addresses. By the way, how this magic works on the inside was well described on Habré.

Network creation is described in the network.tf file from the repository. There we create one common private network, internal, and connect three subnets to it in different availability zones - internal-a (172.16.1.0/24), internal-b (172.16.2.0/24), internal-c (172.16.3.0/24).

Initialize terraform and create networks:

vozerov@mba:~/events/terraform (master) $ terraform init
... skipped ..

vozerov@mba:~/events/terraform (master) $ terraform apply -target yandex_vpc_subnet.internal-a -target yandex_vpc_subnet.internal-b -target yandex_vpc_subnet.internal-c

... skipped ...

Plan: 4 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

yandex_vpc_network.internal: Creating...
yandex_vpc_network.internal: Creation complete after 3s [id=enp2g2rhile7gbqlbrkr]
yandex_vpc_subnet.internal-a: Creating...
yandex_vpc_subnet.internal-b: Creating...
yandex_vpc_subnet.internal-c: Creating...
yandex_vpc_subnet.internal-a: Creation complete after 6s [id=e9b1dad6mgoj2v4funog]
yandex_vpc_subnet.internal-b: Creation complete after 7s [id=e2liv5i4amu52p64ac9p]
yandex_vpc_subnet.internal-c: Still creating... [10s elapsed]
yandex_vpc_subnet.internal-c: Creation complete after 10s [id=b0c2qhsj2vranoc9vhcq]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Great! We have made our network and are now ready to create our internal services.

Creating virtual machines

To test the application, two virtual machines will be enough - the first to build and run the application, the second to run kafka, which we will use to store incoming messages. And we will create one more machine where we will set up prometheus to monitor the application.

The virtual machines will be configured using ansible, so before running terraform make sure you have a reasonably recent version of ansible installed. And install the necessary roles with ansible-galaxy:

vozerov@mba:~/events/terraform (master) $ cd ../ansible/
vozerov@mba:~/events/ansible (master) $ ansible-galaxy install -r requirements.yml
- cloudalchemy-prometheus (master) is already installed, skipping.
- cloudalchemy-grafana (master) is already installed, skipping.
- sansible.kafka (master) is already installed, skipping.
- sansible.zookeeper (master) is already installed, skipping.
- geerlingguy.docker (master) is already installed, skipping.
vozerov@mba:~/events/ansible (master) $

Inside the ansible folder there is an example .ansible.cfg configuration file that I use - it may come in handy.

Before creating the virtual machines, make sure ssh-agent is running and your ssh key is added, otherwise terraform will not be able to connect to the created machines. I, of course, ran into a bug in OS X: https://github.com/ansible/ansible/issues/32499#issuecomment-341578864. To avoid a repeat of that story, add a small variable to the environment before starting terraform:

vozerov@mba:~/events/terraform (master) $ export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

In the terraform folder, we create the necessary resources:

vozerov@mba:~/events/terraform (master) $ terraform apply -target yandex_compute_instance.build -target yandex_compute_instance.monitoring -target yandex_compute_instance.kafka
yandex_vpc_network.internal: Refreshing state... [id=enp2g2rhile7gbqlbrkr]
data.yandex_compute_image.ubuntu_image: Refreshing state...
yandex_vpc_subnet.internal-a: Refreshing state... [id=e9b1dad6mgoj2v4funog]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

... skipped ...

Plan: 3 to add, 0 to change, 0 to destroy.

... skipped ...

If everything ended successfully (as it should), then we will have three virtual machines:

  1. build - a machine for testing and building the application. Docker was installed on it automatically by ansible.
  2. monitoring - a monitoring machine with prometheus & grafana installed. The standard login/password: admin / admin.
  3. kafka - a small machine with kafka installed, available on port 9092.

Let's make sure they're all in place:

vozerov@mba:~/events (master) $ yc compute instance list
+----------------------+------------+---------------+---------+---------------+-------------+
|          ID          |    NAME    |    ZONE ID    | STATUS  |  EXTERNAL IP  | INTERNAL IP |
+----------------------+------------+---------------+---------+---------------+-------------+
| fhm081u8bkbqf1pa5kgj | monitoring | ru-central1-a | RUNNING | 84.201.159.71 | 172.16.1.35 |
| fhmf37k03oobgu9jmd7p | kafka      | ru-central1-a | RUNNING | 84.201.173.41 | 172.16.1.31 |
| fhmt9pl1i8sf7ga6flgp | build      | ru-central1-a | RUNNING | 84.201.132.3  | 172.16.1.26 |
+----------------------+------------+---------------+---------+---------------+-------------+

The resources are in place, and from here we can pull their ip addresses. Everywhere below I will use the ip addresses to connect via ssh and test the application. If you have a cloudflare account connected to terraform, feel free to use the freshly created DNS names.
By the way, when a virtual machine is created, it gets an internal ip and an internal DNS name, so you can reach servers within the network by name:

ubuntu@build:~$ ping kafka.ru-central1.internal
PING kafka.ru-central1.internal (172.16.1.31) 56(84) bytes of data.
64 bytes from kafka.ru-central1.internal (172.16.1.31): icmp_seq=1 ttl=63 time=1.23 ms
64 bytes from kafka.ru-central1.internal (172.16.1.31): icmp_seq=2 ttl=63 time=0.625 ms
^C
--- kafka.ru-central1.internal ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.625/0.931/1.238/0.308 ms

This will come in handy when passing the kafka endpoint to the application.

Building the application

Great, we have the servers and we have the application - all that remains is to build and publish it. For the build we will use the usual docker build, and as an image store we will take a Yandex service - container registry. But first things first.

We copy the application to the build machine, go via ssh and build the image:

vozerov@mba:~/events/terraform (master) $ cd ..
vozerov@mba:~/events (master) $ rsync -av app/ ubuntu@84.201.132.3:app/

... skipped ...

sent 3849 bytes  received 70 bytes  7838.00 bytes/sec
total size is 3644  speedup is 0.93

vozerov@mba:~/events (master) $ ssh 84.201.132.3 -l ubuntu
ubuntu@build:~$ cd app
ubuntu@build:~/app$ sudo docker build -t app .
Sending build context to Docker daemon  6.144kB
Step 1/9 : FROM golang:latest AS build
... skipped ...

Successfully built 9760afd8ef65
Successfully tagged app:latest

Half the job is done - now we can test the performance of our application by running it and pointing it to kafka:

ubuntu@build:~/app$ sudo docker run --name app -d -p 8080:8080 app /app/app -kafka=kafka.ru-central1.internal:9092

From the local machine, we can send a test event and look at the response:

vozerov@mba:~/events (master) $ curl -D - -s -X POST -d '{"key1":"data1"}' http://84.201.132.3:8080/post
HTTP/1.1 200 OK
Content-Type: application/json
Date: Mon, 13 Apr 2020 13:53:54 GMT
Content-Length: 41

{"status":"ok","partition":0,"Offset":0}
vozerov@mba:~/events (master) $

The application responded with a success status and the id of the partition and the offset into which the message fell. All that remains is to create a registry in Yandex.Cloud and upload our image there (how to do this in three lines is described in the registry.tf file). Let's create the registry:

vozerov@mba:~/events/terraform (master) $ terraform apply -target yandex_container_registry.events

... skipped ...

Plan: 1 to add, 0 to change, 0 to destroy.

... skipped ...

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

There are several ways to authenticate to the container registry - with an oauth token, an iam token, or a service account key. More details about these methods can be found in the documentation: https://cloud.yandex.ru/docs/container-registry/operations/authentication. We will use a service account key, so we create a service account:

vozerov@mba:~/events/terraform (master) $ terraform apply -target yandex_iam_service_account.docker -target yandex_resourcemanager_folder_iam_binding.puller -target yandex_resourcemanager_folder_iam_binding.pusher

... skipped ...

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Now it remains to make a key for it:

vozerov@mba:~/events/terraform (master) $ yc iam key create --service-account-name docker -o key.json
id: ajej8a06kdfbehbrh91p
service_account_id: ajep6d38k895srp9osij
created_at: "2020-04-13T14:00:30Z"
key_algorithm: RSA_2048

We transfer the key to the build machine and log in to the registry:

vozerov@mba:~/events/terraform (master) $ scp key.json ubuntu@84.201.132.3:
key.json                                                                                                                    100% 2392   215.1KB/s   00:00

vozerov@mba:~/events/terraform (master) $ ssh 84.201.132.3 -l ubuntu

ubuntu@build:~$ cat key.json | sudo docker login --username json_key --password-stdin cr.yandex
WARNING! Your password will be stored unencrypted in /home/ubuntu/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
ubuntu@build:~$

To upload the image to the registry, we need the container registry ID, we take it from the yc utility:

vozerov@mba:~ $ yc container registry get events
id: crpdgj6c9umdhgaqjfmm
folder_id:
name: events
status: ACTIVE
created_at: "2020-04-13T13:56:41.914Z"

After that, we tag our image with the new name and push it:

ubuntu@build:~$ sudo docker tag app cr.yandex/crpdgj6c9umdhgaqjfmm/events:v1
ubuntu@build:~$ sudo docker push cr.yandex/crpdgj6c9umdhgaqjfmm/events:v1
The push refers to repository [cr.yandex/crpdgj6c9umdhgaqjfmm/events]
8c286e154c6e: Pushed
477c318b05cb: Pushed
beee9f30bc1f: Pushed
v1: digest: sha256:1dd5aaa9dbdde2f60d833be0bed1c352724be3ea3158bcac3cdee41d47c5e380 size: 946

We can verify that the image loaded successfully:

vozerov@mba:~/events/terraform (master) $ yc container repository list
+----------------------+-----------------------------+
|          ID          |            NAME             |
+----------------------+-----------------------------+
| crpe8mqtrgmuq07accvn | crpdgj6c9umdhgaqjfmm/events |
+----------------------+-----------------------------+

By the way, if you install the yc utility on a linux machine, you can use the command

yc container registry configure-docker

to configure docker.

Conclusion

We have done a lot of hard work and as a result:

  1. We came up with the architecture of our future service.
  2. We wrote an application in golang that implements our business logic.
  3. We built it and pushed it to a private container registry.

In the next part we'll move on to the fun stuff - we'll roll our application out to production and finally put it under load. Stay tuned!

This material is also available as a video of the REBRAIN & Yandex.Cloud open workshop: We accept 10 requests per second to Yandex Cloud - https://youtu.be/cZLezUm0ekE

If you are interested in attending such events online and asking questions in real time, join the DevOps by REBRAIN channel.

We want to say special thanks to Yandex.Cloud for the opportunity to hold such an event. Link to them - https://cloud.yandex.ru/prices

If you need to move to the cloud or have questions about your infrastructure, feel free to submit a request.

PS We have 2 free audits per month, perhaps your project will be one of them.

Source: habr.com
