How Yandex.Cloud works with Virtual Private Cloud and how our users help us implement useful features

Hello, my name is Kostya Kramlikh, and I'm the lead developer of the Virtual Private Cloud division at Yandex.Cloud. I work on virtual networks, so, as you might guess, in this article I'll talk about how Virtual Private Cloud (VPC) is built in general and about the virtual network in particular. You will also find out why we, the developers of the service, value feedback from our users. But first things first.

What is VPC?

Nowadays, there are many options for deploying services. I'm sure somebody still keeps a server under the administrator's desk, although I hope such stories are becoming rarer.

More and more services are moving to public clouds, and this is where they encounter VPC. VPC is the part of a public cloud that ties user, infrastructure, platform, and other resources together, wherever they are: in our Cloud or outside it. At the same time, VPC lets you avoid exposing these resources to the Internet unnecessarily; they stay within your isolated network.

What does a virtual network look like from the outside?

By VPC, we primarily mean an overlay network and network services such as VPNaaS, NATaaS, LBaaS, and so on. All of this runs on top of a fault-tolerant network infrastructure, which has already been the subject of a great article here on Habr.

Let's take a closer look at the virtual network and how it is built.

Consider two availability zones. We provide a virtual network, which is what we call a VPC. In essence, it defines the scope within which your "gray" (private) addresses must be unique. Within each virtual network, you have complete control over the address space you can assign to compute resources.

The network is global. At the same time, it is projected onto each availability zone as an entity called a Subnet. For each Subnet, you assign a CIDR block of /16 or smaller. There can be more than one Subnet in each availability zone, and traffic is always routed transparently between them. This means that all your resources within the same VPC can "talk" to each other, even if they are in different availability zones. They communicate without going through the Internet, over our internal channels, "thinking" that they are within the same private network.

The diagram above shows a typical situation: two VPCs whose address ranges partially overlap. Both can be yours, for example, one for development and the other for testing. Or they may belong to different users; in this case it does not matter. One virtual machine is plugged into each VPC.
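
To make the address rules concrete, here is a minimal Python sketch under stated assumptions. It models a VPC as the scope in which private addresses must be unique: a Subnet's prefix must be /16 or smaller, and, as we read "space of uniqueness", two Subnets of the same VPC may not overlap, while two different VPCs are free to reuse the same ranges. The `Vpc` class, subnet names, zone names, and CIDRs are hypothetical; this is not the Yandex.Cloud API.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Per the article, a Subnet gets a CIDR block of /16 or smaller,
# i.e. a prefix length of at least 16.
MIN_PREFIX_LEN = 16


@dataclass
class Vpc:
    """Toy model of a VPC: the scope in which private addresses must be unique."""
    name: str
    # subnet name -> (availability zone, CIDR block)
    subnets: Dict[str, Tuple[str, ipaddress.IPv4Network]] = field(default_factory=dict)

    def add_subnet(self, name: str, zone: str, cidr: str) -> None:
        net = ipaddress.IPv4Network(cidr)  # raises ValueError on malformed input
        if net.prefixlen < MIN_PREFIX_LEN:
            raise ValueError(f"{cidr}: a Subnet must be /{MIN_PREFIX_LEN} or smaller")
        # Assumption: Subnets of one VPC, in any zone, must not overlap.
        for other_name, (_, other_net) in self.subnets.items():
            if net.overlaps(other_net):
                raise ValueError(f"{cidr} overlaps {other_name} ({other_net})")
        self.subnets[name] = (zone, net)


# Two VPCs may reuse the same private ranges: each check only sees its own Subnets.
dev = Vpc("dev")
dev.add_subnet("dev-a", "zone-a", "10.0.0.0/16")
dev.add_subnet("dev-b", "zone-b", "10.1.0.0/24")
test = Vpc("test")
test.add_subnet("test-a", "zone-a", "10.0.0.0/24")  # same range as dev-a, other VPC
```

Keeping the overlap check per VPC is what lets the development and testing networks from the diagram coexist with intersecting addresses.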

Let's complicate the setup. You can plug one virtual machine into several Subnets at once, and not just any Subnets, but Subnets in different virtual networks.

If you need to expose machines to the Internet, you can do so through the API or the UI. To do this, you configure NAT translation of your "gray" internal address to a "white" public one. You cannot choose the "white" address; it is assigned at random from our address pool. As soon as you stop using the external IP, it is returned to the pool. You pay only for the time you use the "white" address.
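
The lifecycle of a public address can be sketched as a small pool: the address is picked at random rather than by the user, it maps one-to-one onto the VM's private address, and billing runs only while it is held. This is a hypothetical Python model; `PublicAddressPool`, the 203.0.113.0/29 documentation range, and per-second billing are assumptions, not how the Cloud actually tracks addresses.

```python
import ipaddress
import random
from typing import Dict, Optional, Tuple


class PublicAddressPool:
    """Hypothetical pool of public ("white") addresses used for one-to-one NAT."""

    def __init__(self, cidr: str, seed: Optional[int] = None) -> None:
        self._free = {str(ip) for ip in ipaddress.ip_network(cidr).hosts()}
        self._leases: Dict[str, Tuple[str, float]] = {}  # public -> (private, start)
        self._rng = random.Random(seed)

    def allocate(self, private_ip: str, now: float) -> str:
        """Pick a random free address; the caller cannot choose which one."""
        if not self._free:
            raise RuntimeError("public address pool exhausted")
        # sorted() makes the choice reproducible for a given seed
        public_ip = self._rng.choice(sorted(self._free))
        self._free.remove(public_ip)
        self._leases[public_ip] = (private_ip, now)
        return public_ip

    def translate(self, public_ip: str) -> str:
        """Inbound NAT: map a public address back to the VM's private address."""
        return self._leases[public_ip][0]

    def release(self, public_ip: str, now: float) -> float:
        """Return the address to the pool and report the billable seconds."""
        _, start = self._leases.pop(public_ip)
        self._free.add(public_ip)
        return now - start

    def available(self) -> int:
        return len(self._free)


pool = PublicAddressPool("203.0.113.0/29", seed=42)  # 6 usable host addresses
ip = pool.allocate("10.0.0.5", now=100.0)
print(ip, "->", pool.translate(ip))
print("billable seconds:", pool.release(ip, now=160.0))  # billable seconds: 60.0
```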

It is also possible to give a machine Internet access through a NAT instance. You can route traffic to the instance using a static route table. We provided for this case because we know that users sometimes need it. Accordingly, our image catalog contains a specially configured NAT image.
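
Routing through a NAT instance boils down to a default route whose next hop is the instance. The sketch below assumes a simple longest-prefix-match table; `StaticRouteTable`, the `local` marker, and the 10.1.0.10 address of the NAT instance are hypothetical names chosen for illustration.

```python
import ipaddress
from typing import List, Optional, Tuple


class StaticRouteTable:
    """Toy static route table with longest-prefix-match lookup."""

    def __init__(self) -> None:
        self._routes: List[Tuple[ipaddress.IPv4Network, str]] = []

    def add(self, prefix: str, next_hop: str) -> None:
        self._routes.append((ipaddress.IPv4Network(prefix), next_hop))

    def lookup(self, destination: str) -> Optional[str]:
        addr = ipaddress.IPv4Address(destination)
        matches = [(net, hop) for net, hop in self._routes if addr in net]
        if not matches:
            return None
        # The most specific (longest) prefix wins.
        return max(matches, key=lambda m: m[0].prefixlen)[1]


NAT_INSTANCE_IP = "10.1.0.10"  # hypothetical private address of the NAT instance

table = StaticRouteTable()
table.add("10.0.0.0/8", "local")         # hypothetical: private traffic stays inside
table.add("0.0.0.0/0", NAT_INSTANCE_IP)  # default route: everything else via NAT
print(table.lookup("10.2.3.4"))  # local
print(table.lookup("8.8.8.8"))   # 10.1.0.10
```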

But even with a ready-made NAT image, setup can be tricky. We realized that this is not the most convenient option for some users, so we eventually made it possible to enable NAT for a chosen Subnet in one click. This feature is still in closed preview, where it is being tested with the help of community members.

How the virtual network is arranged from the inside

How does the user interact with the virtual network? The network exposes an API to the outside world. The user comes to the API and works with the target state. Through the API, the user sees how everything should be arranged and configured, and also sees the status: how much the actual state differs from the desired one. That is the user's picture. What's going on inside?
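
The "status" the user sees can be thought of as a diff between the desired and the actual state. A minimal sketch, assuming both states are flattened into key/value dictionaries (the keys below are hypothetical); an empty diff means the system has converged.

```python
from typing import Any, Dict, Tuple


def state_diff(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (desired value, actual value)} for every key that differs.

    A key missing from one side is reported as None on that side.
    """
    keys = sorted(desired.keys() | actual.keys())
    return {
        key: (desired.get(key), actual.get(key))
        for key in keys
        if desired.get(key) != actual.get(key)
    }


desired = {"subnet/dev-a/cidr": "10.0.0.0/24", "vm/web/public_ip": True}
actual = {"subnet/dev-a/cidr": "10.0.0.0/24"}
print(state_diff(desired, actual))  # {'vm/web/public_ip': (True, None)}
print(state_diff(desired, desired) == {})  # True: in sync
```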

We write the desired state to Yandex Database and then configure the various parts of our VPC. The overlay network in Yandex.Cloud is based on selected components of OpenContrail, which has recently been renamed Tungsten Fabric. Network services are implemented on a single platform, CloudGate. In CloudGate, we also use a number of open source components: GoBGP to exchange control information, and VPP to implement a software router that runs on top of DPDK in the data path.

Tungsten Fabric communicates with CloudGate via GoBGP, telling it what is going on in the overlay network. CloudGate, in turn, connects overlay networks to each other and to the Internet.

Now let's see how the virtual network solves scaling and availability problems. Consider a simple case: there is one availability zone, and two VPCs are created in it. We have deployed one Tungsten Fabric instance, and it handles several tens of thousands of networks. The networks communicate with CloudGate, which, as we have already said, connects them to each other and to the Internet.

Now suppose a second availability zone is added. It must be able to fail completely independently of the first. Therefore, we install a separate Tungsten Fabric instance in the second zone. This is a separate system that handles the overlay and knows little about the first one. The appearance of a single global virtual network is actually created by our VPC API. That is its job.

VPC1 is materialized in availability zone B if there are resources in zone B that are attached to VPC1. If there are no resources from VPC2 in zone B, we do not materialize VPC2 in that zone. Likewise, since resources from VPC3 exist only in zone B, VPC3 does not exist in zone A. Everything is simple and logical.
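
The materialization rule fits in a few lines: a VPC appears in a zone only if at least one of its resources lives there. In this hypothetical Python sketch, each resource is reduced to a `(vpc, zone)` pair, and the result says which VPCs each zone's Tungsten Fabric instance needs to know about; the names mirror the example above and are not real identifiers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def materialize(resources: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each zone to the VPCs its Tungsten Fabric instance must know about.

    `resources` holds one (vpc, zone) pair per resource attached to a VPC.
    """
    per_zone: Dict[str, Set[str]] = defaultdict(set)
    for vpc, zone in resources:
        per_zone[zone].add(vpc)
    return dict(per_zone)


# The situation described above: VPC1 spans both zones, VPC2 lives only in A,
# VPC3 only in B.
resources = [
    ("vpc1", "A"), ("vpc1", "B"),
    ("vpc2", "A"), ("vpc2", "A"),
    ("vpc3", "B"),
]
print(materialize(resources))  # zone A: vpc1, vpc2; zone B: vpc1, vpc3
```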

Let's go a little deeper and see how a particular host in Yandex.Cloud works. The main thing I want to note is that all hosts are arranged in the same way. We keep only the necessary minimum of services running on bare metal; everything else runs in virtual machines. We build higher-level services on top of basic infrastructure services, and we also use the Cloud to solve some of our own engineering problems, for example, for Continuous Integration.

If we look at a specific host, we can see three components running in the host OS:

  • Compute - the part responsible for allocating computing resources on the host.
  • VRouter - the part of Tungsten Fabric that builds the overlay, that is, tunnels packets through the underlay.
  • VDisks - the storage virtualization component.

In addition, virtual machines run Cloud infrastructure services, platform services, and customer workloads. Customer workloads and platform services always reach the overlay through VRouter.

Infrastructure services can attach to the overlay, but mostly they want to work in the underlay. They attach to the underlay using SR-IOV: we split the physical network card into virtual network cards (virtual functions) and pass them into infrastructure virtual machines so as not to lose performance. For example, CloudGate itself runs as one of these infrastructure virtual machines.

Now that we have described the global tasks of the virtual network and the structure of the basic components of the cloud, let's see how exactly the different parts of the virtual network interact with each other.

We distinguish three layers in our system:

  • Config Plane - sets the target state of the system. This is what the user configures via the API.
  • Control Plane - implements the user-defined semantics, that is, brings the Data Plane state in line with what the user described in the Config Plane.
  • Data Plane - directly processes user packets.

As I said above, it all starts when the user or an internal platform service comes to the API and describes a target state.

This state is immediately written to Yandex Database, the API returns an asynchronous operation ID, and our internal machinery starts bringing the system to the state the user wanted. Configuration tasks go to the SDN controller and tell Tungsten Fabric what to do in the overlay, for example, reserve ports, virtual networks, and so on.
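
The write path above can be sketched as follows, assuming an in-memory dictionary in place of Yandex Database and a plain callback in place of the SDN controller. `VpcApiModel`, the `RUNNING`/`DONE` status strings, and `run_pending` are hypothetical; a real worker would presumably run in the background and handle retries.

```python
import uuid
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class VpcApiModel:
    """Toy write path: persist desired state, return an operation ID,
    and let a worker push the state to the SDN controller later."""

    def __init__(self, sdn_controller: Callable[[str, Dict[str, Any]], None]) -> None:
        self.db: Dict[str, Dict[str, Any]] = {}  # stand-in for Yandex Database
        self.operations: Dict[str, str] = {}     # operation ID -> status
        self._queue: Deque[Tuple[str, str]] = deque()
        self._sdn = sdn_controller

    def update(self, resource_id: str, spec: Dict[str, Any]) -> str:
        self.db[resource_id] = spec              # 1. persist the target state first
        op_id = str(uuid.uuid4())
        self.operations[op_id] = "RUNNING"
        self._queue.append((op_id, resource_id))
        return op_id                             # 2. answer the caller immediately

    def run_pending(self) -> None:
        """What a worker would do: apply queued changes to the controller."""
        while self._queue:
            op_id, resource_id = self._queue.popleft()
            try:
                # 3. always push the latest stored spec, not a stale copy
                self._sdn(resource_id, self.db[resource_id])
                self.operations[op_id] = "DONE"
            except Exception as exc:  # keep the error on the operation for the user
                self.operations[op_id] = f"ERROR: {exc}"


applied = []
api = VpcApiModel(lambda rid, spec: applied.append((rid, spec)))
op = api.update("network-1", {"subnets": ["10.0.0.0/24"]})
print(api.operations[op])  # RUNNING
api.run_pending()
print(api.operations[op], applied)
```

Reading the spec from the store at apply time, rather than from the queue entry, means that two quick updates converge on the latest desired state.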

The Config Plane in Tungsten Fabric sends the required state to the Control Plane. Through the Control Plane, the Config Plane communicates with the hosts, telling them what exactly will soon be running on them.

Now let's see what the system looks like on the hosts. A virtual machine has a network adapter plugged into VRouter. VRouter is a Tungsten Fabric kernel module that inspects packets. If a flow already exists for a packet, the module processes the packet itself. If there is no flow, the module does so-called punting, that is, it sends the packet to a user-mode process. The process parses the packet and either responds to it itself, as with DHCP and DNS, or tells VRouter what to do with it. After that, VRouter can process the packet.
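
The fast path and punting logic can be modelled as a flow cache in front of a user-space agent. This is a toy Python sketch, not vRouter code: the `FlowKey` 5-tuple, the `VRouterModel` class, and the agent's `("reply", ...)` / `("install", ...)` answers are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# (src ip, dst ip, protocol, src port, dst port)
FlowKey = Tuple[str, str, str, int, int]
# The user-space agent answers ("reply", payload) or ("install", action).
Agent = Callable[[FlowKey], Tuple[str, str]]


class VRouterModel:
    """Toy flow-cache model: fast path on a hit, punt to the agent on a miss."""

    def __init__(self, agent: Agent) -> None:
        self.flows: Dict[FlowKey, str] = {}
        self.agent = agent

    def handle(self, key: FlowKey) -> Tuple[str, str]:
        action = self.flows.get(key)
        if action is not None:
            return ("fast_path", action)     # flow exists: handled in the module
        verdict, value = self.agent(key)     # no flow: punt to user space
        if verdict == "reply":
            return ("agent_reply", value)    # e.g. DHCP/DNS answered by the agent
        self.flows[key] = value              # agent tells the module what to do
        return ("flow_installed", value)


def toy_agent(key: FlowKey) -> Tuple[str, str]:
    _, _, proto, _, dst_port = key
    if proto == "udp" and dst_port in (53, 67):  # DNS and DHCP server ports
        return ("reply", f"answered udp/{dst_port} locally")
    return ("install", "forward")


vrouter = VRouterModel(toy_agent)
web = ("10.0.0.5", "10.0.0.6", "tcp", 40000, 443)
print(vrouter.handle(web))  # ('flow_installed', 'forward')
print(vrouter.handle(web))  # ('fast_path', 'forward')
```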

Traffic between virtual machines within the same virtual network passes transparently and is not sent to CloudGate. The hosts on which the virtual machines run communicate with each other directly: they tunnel the traffic and forward it to each other through the underlay.
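
Because address ranges may overlap between VPCs, the forwarding decision has to be keyed by the pair (VPC, address), not by the address alone. A hypothetical sketch: if the destination is a VM in the same VPC, the packet is tunneled straight to that VM's host; otherwise it goes to CloudGate. `VmPort`, `forward`, and the host names are illustrative, and the actual encapsulation is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

CLOUDGATE = "cloudgate"  # hypothetical underlay address of CloudGate


@dataclass(frozen=True)
class VmPort:
    vpc: str
    ip: str
    host: str  # underlay address of the hypervisor running the VM


def forward(src: VmPort, dst_ip: str, ports: Dict[Tuple[str, str], VmPort]) -> Tuple[str, str]:
    """Decide where a packet from `src` to `dst_ip` is tunneled.

    `ports` is keyed by (vpc, ip) because private ranges may overlap between VPCs.
    """
    dst = ports.get((src.vpc, dst_ip))
    if dst is None:
        return ("tunnel", CLOUDGATE)  # leaves the VPC: other VPCs or the Internet
    return ("tunnel", dst.host)       # host-to-host through the underlay


a = VmPort("vpc1", "10.0.0.5", "hv-1")
b = VmPort("vpc1", "10.0.0.6", "hv-2")
c = VmPort("vpc2", "10.0.0.6", "hv-3")  # same address as b, different VPC
ports = {(p.vpc, p.ip): p for p in (a, b, c)}
print(forward(a, "10.0.0.6", ports))  # ('tunnel', 'hv-2'): resolves to b, not c
print(forward(a, "8.8.8.8", ports))   # ('tunnel', 'cloudgate')
```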

Control Planes in different availability zones communicate with each other via BGP, just as they would with any other router. They announce which machines are running where, so that VMs in one zone can communicate directly with VMs in another.
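
A toy picture of this exchange: each zone's control plane announces VMs as they come up and withdraws them when they go away, and its peers update their tables. This Python sketch is only loosely modelled on BGP announce/withdraw updates and is not a BGP implementation; `ZoneControlPlane`, `peer`, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Route = Tuple[str, str]  # (vpc, vm ip)


class ZoneControlPlane:
    """Toy per-zone control plane exchanging announce/withdraw messages."""

    def __init__(self, zone: str) -> None:
        self.zone = zone
        self.routes: Dict[Route, Tuple[str, str]] = {}  # route -> (zone, host)
        self.peers: List["ZoneControlPlane"] = []

    def vm_up(self, vpc: str, ip: str, host: str) -> None:
        self.routes[(vpc, ip)] = (self.zone, host)
        for peer in self.peers:
            peer.on_announce((vpc, ip), self.zone, host)

    def vm_down(self, vpc: str, ip: str) -> None:
        self.routes.pop((vpc, ip), None)
        for peer in self.peers:
            peer.on_withdraw((vpc, ip))

    def on_announce(self, route: Route, zone: str, host: str) -> None:
        self.routes[route] = (zone, host)

    def on_withdraw(self, route: Route) -> None:
        self.routes.pop(route, None)


def peer(a: ZoneControlPlane, b: ZoneControlPlane) -> None:
    a.peers.append(b)
    b.peers.append(a)


zone_a, zone_b = ZoneControlPlane("zone-a"), ZoneControlPlane("zone-b")
peer(zone_a, zone_b)
zone_a.vm_up("vpc1", "10.0.0.5", "hv-a1")
print(zone_b.routes)  # {('vpc1', '10.0.0.5'): ('zone-a', 'hv-a1')}
zone_a.vm_down("vpc1", "10.0.0.5")
print(zone_b.routes)  # {}
```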

The Control Plane also communicates with CloudGate. In the same way, it reports which virtual machines are running where and what addresses they have. This makes it possible to direct external traffic and traffic from load balancers to them.

Traffic that leaves the VPC arrives at the CloudGate data path, where VPP with our plugins processes it quickly. The traffic is then sent either to other VPCs or outside, to border routers, which are configured through CloudGate's own Control Plane.

Plans for the near future

To summarize all of the above in a few sentences, VPC in Yandex.Cloud solves two important tasks:

  • Provides isolation between different clients.
  • Combines resources, infrastructure, platform services, other clouds, and on-premises infrastructure into a single network.

To solve these tasks well, the internal architecture has to be scalable and fault-tolerant, and VPC's architecture is.

VPC is gradually gaining features: we implement new capabilities and try to make things more convenient for users. Some ideas are suggested by members of our community and make it onto the priority list thanks to them.

We currently have the following list of plans for the near future:

  • VPN as a service.
  • Private DNS instances are images for quickly setting up virtual machines with a pre-configured DNS server.
  • DNS as a service.
  • Internal load balancer.
  • Adding a "white" IP address without recreating the virtual machine.

The internal load balancer and the ability to change the IP address of an existing virtual machine made this list at users' request. To be honest, without explicit feedback we would have tackled these features a little later. As it is, we are already working on the address problem.

Initially, a "white" IP address could only be added when creating a machine. If the user forgot to do this, the virtual machine had to be recreated. The same was true for removing an external IP. Soon it will be possible to turn the public IP on and off without recreating the machine.

Feel free to share your ideas and support other users' suggestions. You help us make the Cloud better and get important, useful features faster!

Source: habr.com
