Compiled distributed system configuration

I would like to tell you about an interesting mechanism for working with the configuration of a distributed system. The configuration is represented directly in a compiled language (Scala) using safe types. In this post we analyze an example of such a configuration and consider various aspects of introducing a compiled configuration into the overall development process.


Introduction

Building a reliable distributed system implies that all nodes use the correct configuration, synchronized with the other nodes. DevOps technologies (Terraform, Ansible, or something similar) are usually used to automatically generate configuration files (often one per node). We also want to be sure that all communicating nodes use identical protocols (including the same version); otherwise our distributed system will suffer from incompatibility. In the JVM world, one consequence of this requirement is that the same version of the library containing the protocol messages must be used everywhere.

What about distributed system testing? Of course, we assume that unit tests are provided for all components before we move on to integration testing. (In order for us to extrapolate test results to runtime, we also need to ensure that we have an identical set of libraries in the test phase and at runtime.)

When working with integration tests, it is often easier to use the same classpath everywhere on all nodes. We just have to ensure that the same classpath is used at runtime as well. (While it's possible to run different nodes with different classpaths, this leads to complex configuration and difficulties with deployment and integration tests.) For the purposes of this post, we assume that all nodes will use the same classpath.

The configuration evolves together with the application, and we use versions to identify different stages of software evolution. It seems logical to also identify different versions of the configuration and to keep the configuration itself in the version control system. If there is only one production configuration, we can simply use the version number. If there are many production instances, then we need several configuration branches and an additional label besides the version (for example, the branch name). This way we can uniquely identify the exact configuration. Each configuration identifier uniquely corresponds to a particular combination of distributed nodes, ports, external resources, and library versions. For the purposes of this post, we will assume that there is only one branch, and that we can identify the configuration in the usual way with three numbers separated by dots (1.2.3).

In modern environments, configuration files are rarely created by hand. More often they are generated during deployment and never touched afterwards (so as not to break anything). A natural question arises: why do we still use a text format to store the configuration? A viable alternative seems to be the ability to use regular code for the configuration and benefit from compile-time checks.

In this post, we are just exploring the idea of presenting a configuration inside a compiled artifact.

Compiled Configuration

This section provides an example of a static compiled configuration. Two simple services are implemented, the echo service and the echo service client. Based on these two services, two variants of the system are assembled. In one variant, both services are located on the same node, in the other variant, they are located on different nodes.

Typically, a distributed system contains several nodes. You can identify nodes using values of some type NodeId:

sealed trait NodeId
case object Backend extends NodeId
case object Frontend extends NodeId

or

case class NodeId(hostName: String)

or

object Singleton
type NodeId = Singleton.type

The nodes perform different roles; they run services, and TCP/HTTP connections can be established between them.

To describe a TCP connection, we need at least a port number. We would also like to reflect which protocol is supported on this port, to ensure that both client and server are using the same protocol. We will describe the connection using the following class:

case class TcpEndPoint[Protocol](node: NodeId, port: Port[Protocol])

where Port is simply an Int with a restricted range of acceptable values:

type PortNumber = Refined[Int, Closed[_0, W.`65535`.T]]

Refined types

See the refined library and my talk on it. In short, this library lets us add compile-time constraints to types; here, the valid port numbers are 16-bit unsigned integers. Using the refined library is not mandatory for a compiled configuration, but it improves the compiler's ability to check the configuration.
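
For illustration, here is a minimal self-contained sketch of how such a refined type behaves at compile time (the object name is ours; the imports assume the refined and shapeless libraries):

  import eu.timepit.refined.W
  import eu.timepit.refined.api.Refined
  import eu.timepit.refined.auto._
  import eu.timepit.refined.numeric.Interval.Closed
  import shapeless.nat._0

  object PortNumberDemo {
    type PortNumber = Refined[Int, Closed[_0, W.`65535`.T]]

    val ok: PortNumber = 8080      // compiles: the literal lies within [0, 65535]
    // val bad: PortNumber = 99999 // rejected at compile time: the predicate fails
  }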

For HTTP (REST) protocols, in addition to the port number, we may also need the path to the service:

type UrlPathPrefix = Refined[String, MatchesRegex[W.`"[a-zA-Z_0-9/]*"`.T]]
case class PortWithPrefix[Protocol](portNumber: PortNumber, pathPrefix: UrlPathPrefix)

Phantom types

To identify the protocol at compile time, we use a type parameter that is not used inside the class. We do this because we do not need a protocol instance at runtime, but we do want the compiler to check protocol compatibility. With the protocol specified, we cannot accidentally pass the wrong service as a dependency.
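
A minimal self-contained sketch of the phantom-type technique (the names are illustrative and not from the project):

  object PhantomTypeDemo {
    sealed trait EchoProto
    sealed trait ChatProto

    // Protocol appears only at the type level and is never used at runtime
    final case class Port[Protocol](number: Int)

    def connect[P](port: Port[P]): Unit = ()

    val echoPort: Port[EchoProto] = Port(8081)
    connect[EchoProto](echoPort)    // compiles
    // connect[ChatProto](echoPort) // does not compile: Port[EchoProto] is not Port[ChatProto]
  }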

One of the common protocols is the REST API with Json serialization:

sealed trait JsonHttpRestProtocol[RequestMessage, ResponseMessage]

where RequestMessage is the request type and ResponseMessage is the response type.
Of course, other protocol descriptions can be used that provide the accuracy of description we require.

For the purposes of this post, we will use a simplified version of the protocol:

sealed trait SimpleHttpGetRest[RequestMessage, ResponseMessage]

Here the request is a string appended to the URL, and the response is the string returned in the body of the HTTP response.

The configuration of a service is described by the service name, ports, and dependencies. These elements can be represented in Scala in several ways (for example, with HLists or algebraic data types). For the purposes of this post, we will use the Cake Pattern and represent the modules with traits. (The Cake Pattern is not a required element of the described approach; it is just one possible implementation.)

Dependencies between services can be represented as methods that return the EndPoints of other nodes:

  type EchoProtocol[A] = SimpleHttpGetRest[A, A]

  trait EchoConfig[A] extends ServiceConfig {
    def portNumber: PortNumber = 8081
    def echoPort: PortWithPrefix[EchoProtocol[A]] = PortWithPrefix[EchoProtocol[A]](portNumber, "echo")
    def echoService: HttpSimpleGetEndPoint[NodeId, EchoProtocol[A]] = providedSimpleService(echoPort)
  }

To create an echo service, all we need is a port number and an indication that this port supports the echo protocol. We could have left the port unspecified, since traits allow methods to be declared without an implementation (abstract methods). In that case, when creating a concrete configuration, the compiler would require us to implement the abstract method and provide a port number. Since we have implemented the method, we may omit the port when creating a specific configuration, and the default value will be used.
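
To illustrate the difference between an abstract method and a default, here is a small sketch with made-up names (it assumes the PortNumber type and the refined.auto._ import from above):

  import eu.timepit.refined.auto._

  trait RequiredPortConfig { def portNumber: PortNumber }        // abstract: must be provided
  trait DefaultPortConfig  { def portNumber: PortNumber = 8081 } // default: may be overridden

  object UsesDefault extends DefaultPortConfig                   // compiles; port is 8081
  // object Broken extends RequiredPortConfig // does not compile: portNumber is not defined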

In the client configuration, we declare a dependency on the echo service:

  trait EchoClientConfig[A] {
    def testMessage: String = "test"
    def pollInterval: FiniteDuration
    def echoServiceDependency: HttpSimpleGetEndPoint[_, EchoProtocol[A]]
  }

The dependency is of the same type as the exported service echoService. In particular, in the echo client we require the same protocol. Therefore, when connecting two services, we can be sure that everything will work correctly.

Service Implementation

A function is required to start and stop the service. (The ability to stop a service is critical for testing.) Again, there are several ways to implement such a feature (for example, we could use type classes parameterized by the configuration type). For the purposes of this post, we will use the Cake Pattern. We represent a service with the class cats.Resource, because it already guarantees safe release of resources in case of problems. To get a resource, we need to provide a configuration and a ready runtime context. The service start function might look like this:

  type ResourceReader[F[_], Config, A] = Reader[Config, Resource[F, A]]

  trait ServiceImpl[F[_]] {
    type Config
    def resource(
      implicit
      resolver: AddressResolver[F],
      timer: Timer[F],
      contextShift: ContextShift[F],
      ec: ExecutionContext,
      applicative: Applicative[F]
    ): ResourceReader[F, Config, Unit]
  }

where

  • Config — the configuration type for this service
  • AddressResolver — a runtime object that lets us find out the addresses of other nodes (see below)

and the other types come from the cats library:

  • F[_] — the effect type (in the simplest case, F[A] could just be a function () => A; in this post we will use cats.IO)
  • Reader[A,B] — more or less a synonym for the function A => B
  • cats.Resource — a resource that can be acquired and released
  • Timer — a timer (allows us to sleep for a while and measure time intervals)
  • ContextShift — an analogue of ExecutionContext
  • Applicative — a type class for effects that allows combining individual effects (almost a monad). In more complex applications it seems better to use Monad/ConcurrentEffect.
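
As a small illustration of how Reader and cats.Resource fit together (a sketch with hypothetical names): the Reader supplies the configuration, and the Resource guarantees release.

  import cats.data.Reader
  import cats.effect.{IO, Resource}

  object ReaderResourceDemo {
    final case class GreeterConfig(name: String)

    // acquiring the resource prints "starting", releasing prints "stopping"
    val greeter: Reader[GreeterConfig, Resource[IO, Unit]] =
      Reader { cfg =>
        Resource.make(IO(println(s"starting ${cfg.name}")))(_ =>
          IO(println(s"stopping ${cfg.name}")))
      }

    // greeter.run(GreeterConfig("echo")).use(_ => IO(println("working"))).unsafeRunSync()
    // prints: starting echo / working / stopping echo
  }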

Using this function signature, we can implement several services. For example, a service that does nothing:

  trait ZeroServiceImpl[F[_]] extends ServiceImpl[F] {
    type Config <: Any
    // implement the abstract method, repeating the implicit parameter list
    def resource(implicit resolver: AddressResolver[F], timer: Timer[F],
        contextShift: ContextShift[F], ec: ExecutionContext, applicative: Applicative[F]
    ): ResourceReader[F, Config, Unit] =
      Reader(_ => Resource.pure[F, Unit](()))
  }

(See the source, where other services are implemented: the echo service, the echo client, and the lifetime controllers.)

A node is an object that can start several services (starting the resource chain is provided by the Cake Pattern):

object SingleNodeImpl extends ZeroServiceImpl[IO]
  with EchoServiceService
  with EchoClientService
  with FiniteDurationLifecycleServiceImpl
{
  type Config = EchoConfig[String] with EchoClientConfig[String] with FiniteDurationLifecycleConfig
}

Note that we are specifying the exact type of configuration that this node requires. If we forget to specify any of the configuration types required by a particular service, then there will be a compilation error. Also, we won't be able to start the node unless we provide some object of the right type with all the necessary data.

Host name resolution

To connect to a remote host, we need a real IP address. It is possible that the address will become known later than the rest of the configuration. Therefore, we need a function that maps a host ID to an address:

case class NodeAddress[NodeId](host: Uri.Host)
trait AddressResolver[F[_]] {
  def resolve[NodeId](nodeId: NodeId): F[NodeAddress[NodeId]]
}

There are several ways to implement this functionality:

  1. If the addresses become known to us before deployment, then we can generate Scala code with
    the addresses and then run the build. This will compile and run the tests.
    In this case, the function is known statically and can be represented in the code as a mapping Map[NodeId, NodeAddress].
  2. In some cases, the actual address only becomes known after the node has started.
    In this case, we can implement a "discovery service" that runs before the other nodes; all nodes register with this service and request the addresses of the other nodes.
  3. If we can modify /etc/hosts, then we can use predefined host names (like my-project-main-node and echo-backend) and simply link these names
    to IP addresses during deployment.

In this post, we will not consider these cases in more detail. For our toy example, all nodes will have the same IP address: 127.0.0.1.
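
For this toy case, the resolver can be as trivial as the following sketch (the helper name is ours; we assume http4s's Uri.RegName for constructing a Uri.Host):

  import cats.Applicative
  import cats.syntax.applicative._
  import org.http4s.Uri

  object LocalResolver {
    // every node resolves to localhost, which is enough for the toy example
    def localHostResolver[F[_]: Applicative]: AddressResolver[F] =
      new AddressResolver[F] {
        def resolve[NodeId](nodeId: NodeId): F[NodeAddress[NodeId]] =
          NodeAddress[NodeId](Uri.RegName("127.0.0.1")).pure[F]
      }
  }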

Next, consider two variants of a distributed system:

  1. Placing all services on one node.
  2. Placing the echo service and the echo client on different nodes.

Single node configuration

object SingleNodeConfig extends EchoConfig[String] 
  with EchoClientConfig[String] with FiniteDurationLifecycleConfig
{
  case object Singleton // identifier of the single node 
  // configuration of server
  type NodeId = Singleton.type
  def nodeId = Singleton

  /** Type safe service port specification. */
  override def portNumber: PortNumber = 8088

  // configuration of client

  /** We'll use the service provided by the same host. */
  def echoServiceDependency = echoService

  override def testMessage: String = "hello"

  def pollInterval: FiniteDuration = 1.second

  // lifecycle controller configuration
  def lifetime: FiniteDuration = 10500.milliseconds // additional 0.5 seconds so that there are 10 requests, not 9.
}

The object implements both the client and the server configuration. It also uses a lifetime configuration so that the program terminates after the lifetime interval elapses. (Ctrl-C also works and releases all resources correctly.)

The same set of configuration traits and implementations can be used to create a system consisting of two separate nodes:

Configuration for two nodes

  object NodeServerConfig extends EchoConfig[String] with SigTermLifecycleConfig
  {
    type NodeId = NodeIdImpl

    def nodeId = NodeServer

    override def portNumber: PortNumber = 8080
  }

  object NodeClientConfig extends EchoClientConfig[String] with FiniteDurationLifecycleConfig
  {
    // NB! dependency specification
    def echoServiceDependency = NodeServerConfig.echoService

    def pollInterval: FiniteDuration = 1.second

    def lifetime: FiniteDuration = 10500.milliseconds // additional 0.5 seconds so that there are 10 requests, not 9.

    def testMessage: String = "dolly"
  }

Important! Notice how the service binding is done. We specify the service implemented by one node as the implementation of another node's dependency method. The dependency type is checked by the compiler, because it contains the protocol type. When run, the dependency will contain the correct target node ID. Thanks to this scheme, we specify the port number exactly once and are always guaranteed to refer to the correct port.
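
As a hypothetical counter-example of what the compiler rules out, a client parameterized with a different message type cannot be bound to this server:

  object MismatchedClientConfig extends EchoClientConfig[Int] {
    def pollInterval: FiniteDuration = 1.second
    // def echoServiceDependency = NodeServerConfig.echoService
    //   ^ would not compile:
    //     found:    HttpSimpleGetEndPoint[_, EchoProtocol[String]]
    //     required: HttpSimpleGetEndPoint[_, EchoProtocol[Int]]
    def echoServiceDependency: HttpSimpleGetEndPoint[_, EchoProtocol[Int]] = ???
  }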

Implementation of two system nodes

For this configuration, we use the same service implementations without changes. The only difference is that we now have two objects that implement different sets of services:

  object TwoJvmNodeServerImpl extends ZeroServiceImpl[IO] with EchoServiceService with SigIntLifecycleServiceImpl {
    type Config = EchoConfig[String] with SigTermLifecycleConfig
  }

  object TwoJvmNodeClientImpl extends ZeroServiceImpl[IO] with EchoClientService with FiniteDurationLifecycleServiceImpl {
    type Config = EchoClientConfig[String] with FiniteDurationLifecycleConfig
  }

The first node implements the server and only needs the server configuration. The second node implements the client and uses another part of the configuration. Both nodes also need lifetime management: the server node runs indefinitely until stopped with SIGTERM, while the client node terminates after some time. See the launcher app.
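
One way such a launcher might look for the single-node variant is sketched below (this is a hypothetical sketch, not the actual launcher app; we assume cats-effect's IOApp and the resolver sketched earlier, and that the lifecycle service inside the resource delays its release until the configured lifetime elapses):

  import scala.concurrent.ExecutionContext
  import cats.effect.{ExitCode, IO, IOApp}

  object SingleNodeApp extends IOApp {
    implicit val ec: ExecutionContext = ExecutionContext.global
    implicit val resolver: AddressResolver[IO] = LocalResolver.localHostResolver[IO]

    def run(args: List[String]): IO[ExitCode] =
      SingleNodeImpl.resource
        .run(SingleNodeConfig)           // supply the configuration to the Reader
        .use(_ => IO.unit)               // acquire all services, then release them
        .map(_ => ExitCode.Success)
  }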

General development process

Let's see how this configuration approach affects the overall development process.

The configuration is compiled together with the rest of the code, and an artifact (.jar) is generated. It seems to make sense to put the configuration into a separate artifact, because we can have many configurations based on the same code. Again, it is possible to generate artifacts corresponding to different configuration branches. Dependencies on specific library versions are saved together with the configuration, and those versions are preserved forever, whenever we decide to deploy that version of the configuration.

Any configuration change becomes a code change, and therefore every change will be covered by the normal quality assurance process:

Ticket in the bug tracker -> PR -> review -> merge into the relevant branches -> integration -> deployment

The main consequences of implementing a compiled configuration:

  1. The configuration is consistent across all nodes of the distributed system, because all nodes receive the same configuration from a single source.

  2. It is problematic to change the configuration on just one of the nodes. Therefore, "configuration drift" is unlikely.

  3. It becomes more difficult to make small configuration changes.

  4. Most configuration changes will occur as part of the overall development process and will be reviewed.

Do we need a separate repository to store the production configuration? Such a configuration may contain passwords and other secrets that we would like to restrict access to, so it seems to make sense to store the final configuration in a separate repository. You can split the configuration into two parts: one containing publicly available settings and the other containing restricted-access settings. This allows most developers to access the common options. This separation is easy to achieve using intermediate traits containing default values, as in the sketch below.
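
A self-contained sketch of such a split (the names are illustrative):

  import scala.concurrent.duration._

  trait PublicSettings {
    def pollInterval: FiniteDuration = 1.second // public default, visible to all developers
    def apiToken: String                        // abstract: supplied in the restricted repository
  }

  // this object lives in the restricted-access repository:
  object ProductionConfig extends PublicSettings {
    def apiToken: String = "..." // hypothetical secret value
  }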

Possible variations

Let's try to compare the compiled configuration with some common alternatives:

  1. Text file on the target machine.
  2. Centralized key-value store (etcd/zookeeper).
  3. Process components that can be reconfigured/restarted without restarting the process.
  4. Storing configuration outside of artifact and version control.

Text files provide a lot of flexibility for small changes. A system administrator can log into the remote host, change the relevant files, and restart the service. For large systems, however, this flexibility may not be desirable. The changes leave no traces in other systems. Nobody reviews the changes. It is difficult to establish who exactly made a change and why. The changes are not tested. And if the system is distributed, the administrator may forget to make the corresponding change on the other nodes.

(It should also be noted that using a compiled configuration does not preclude using text files in the future. It is enough to add a parser and a validator that produce the same Config type, and text files become usable. It directly follows that the complexity of a system with a compiled configuration is somewhat lower than the complexity of a system using text files, because text files require additional code.)

A centralized key-value store is a good mechanism for distributing the meta-parameters of a distributed application. We need to decide where the boundary lies between configuration parameters and plain data. Suppose we have a function C => A => B, where the parameters C change rarely and the data A changes often. In this case we can say that C consists of configuration parameters and A of data. Configuration parameters seem to differ from data in that they generally change less frequently. Also, data usually comes from one source (the user) and configuration parameters from another (the system administrator).
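
A small sketch of this view (with illustrative names): the configuration C is applied once at startup, while the data A arrives with every request.

  object ConfigVsDataDemo {
    final case class GreetingConfig(template: String)     // C: rarely changes

    def greet(config: GreetingConfig): String => String = // C => (A => B)
      userName => config.template.format(userName)

    val greeter: String => String = greet(GreetingConfig("Hello, %s!"))
    // greeter("Alice") == "Hello, Alice!"
  }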

If rarely-changing parameters need to be updated without restarting the program, this often leads to a more complex program: we need to somehow deliver, store, parse, and check the parameters, and handle invalid values. So, from the point of view of reducing program complexity, it makes sense to reduce the number of parameters that can change while the program is running (or not to support such parameters at all).

For the purposes of this post, we will distinguish between static and dynamic parameters. If the logic of the service requires changing parameters while the program is running, we will call such parameters dynamic. Otherwise, the parameters are static and can be configured with a compiled configuration. For dynamic reconfiguration, we may need a mechanism for restarting parts of the program with new parameters, similar to how operating system processes are restarted. (In our opinion, it is desirable to avoid real-time reconfiguration, since it increases the complexity of the system. If possible, it is better to use the standard OS capabilities for restarting processes.)

One important aspect of static configuration that makes people consider dynamic reconfiguration is the time it takes the system to restart after a configuration update (downtime). Indeed, if we need to change the static configuration, we have to restart the system for the new values to take effect. The severity of the downtime problem varies from system to system. In some cases, the restart can be scheduled for a time when the load is at its lowest. If continuous service is required, one can implement connection draining (as in AWS ELB connection draining): when the system needs to be restarted, a parallel instance is launched, the load balancer is switched to it, and the old connections are given time to complete. After all the old connections have finished, the old instance is shut down.

Let us now consider the question of storing the configuration inside the artifact or outside it. If we store the configuration inside the artifact, then at least we have had the opportunity to verify its correctness while assembling the artifact. If the configuration lives outside the controlled artifact, it is difficult to trace who changed that file and why. How important is this? In our opinion, a stable and high-quality configuration is important for many production systems.

The version of an artifact allows us to determine when it was created, what values it contains, which features are enabled or disabled, and who is responsible for any configuration change. Of course, storing the configuration inside an artifact requires some effort, so this should be an informed decision.

Pros and cons

I would like to dwell on the pros and cons of the proposed technology.

Advantages

Below is a list of the main features of a compiled distributed system configuration:

  1. Static configuration checking. It lets us be sure that the configuration is correct.
  2. A rich configuration language. Typically, other configuration methods are limited at most to string variable substitution. With Scala, the whole range of language features is available to improve the configuration: traits for default values, objects for grouping parameters, and vals declared once (DRY) in the enclosing scope that can be referenced anywhere. Any classes can be instantiated directly inside the configuration (Seq, Map, custom classes).
  3. DSL. Scala has a number of language features that make it easy to create DSLs. It is possible to take advantage of them and implement a configuration language that is more convenient for the target user group, so that the configuration is at least readable by domain experts. The experts can, for example, participate in the configuration review process.
  4. Integrity and synchrony between nodes. One advantage of storing the configuration of an entire distributed system at a single point is that all values are declared exactly once and then reused wherever they are needed. Using phantom types to declare ports ensures that, in all correct system configurations, nodes use compatible protocols. Explicit mandatory dependencies between nodes ensure that all services are linked.
  5. High-quality changes. Making configuration changes through the overall development process makes high quality standards available for the configuration as well.
  6. Simultaneous configuration updates. Automatic deployment of the system after configuration changes ensures that all nodes are up to date.
  7. Simplification of the application. The application does not need to parse and check the configuration or handle invalid values. This reduces the complexity of the application. (Some of the configuration complexity seen in our example is not an attribute of the compiled configuration but only a conscious decision driven by the desire for more type safety.) It is easy enough to fall back to a regular configuration: just implement the missing parts. You can therefore start with a compiled configuration and postpone the implementation of unneeded parts until they are really required.
  8. Versioned configuration. Since configuration changes follow the usual fate of any other change, we end up with an artifact with a unique version. This allows us, for example, to revert to a previous configuration version if necessary. We can even use a year-old configuration and the system will work exactly the same. A stable configuration improves the predictability and reliability of a distributed system. Since the configuration is fixed at compile time, it is quite difficult to fake it in production.
  9. Modularity. The proposed framework is modular, and the modules can be combined in various ways to obtain different systems. In particular, it is possible to configure the system to run on one node in one variant and on multiple nodes in another. Multiple configurations can be created for production instances of the system.
  10. Testing. By replacing individual services with mock objects, you can get several versions of the system that are convenient for testing (see the sketch after this list).
  11. Integration testing. Having a single configuration for the entire distributed system makes it possible to run all components in a controlled environment as part of integration testing. It is easy to emulate, for example, a situation where some nodes become unavailable.
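
A hypothetical sketch of such a test assembly (MockEchoClientService is a made-up mock implementation, not part of the example project):

  object TestNodeImpl extends ZeroServiceImpl[IO]
    with EchoServiceService
    with MockEchoClientService          // hypothetical mock of the echo client
    with FiniteDurationLifecycleServiceImpl
  {
    type Config = EchoConfig[String] with EchoClientConfig[String] with FiniteDurationLifecycleConfig
  }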

Disadvantages and limitations

The compiled configuration differs from other configuration approaches and may not be suitable for some applications. Below are some disadvantages:

  1. Static configuration. Sometimes you need to quickly fix the configuration in production, bypassing all the protection mechanisms. With this approach, that can be more difficult: at the very least, compilation and automatic deployment will still be required. This is both a useful feature of the approach and a disadvantage in some cases.
  2. Configuration generation. If the configuration file is generated by an automatic tool, additional effort may be required to integrate it into the build script.
  3. Tools. The current utilities and techniques for working with configuration are based on text files. Not all of these utilities/techniques will be available with a compiled configuration.
  4. A change of mindset is required. Developers and DevOps engineers are used to text files. The very idea of compiling a configuration may be somewhat unexpected and unusual and may meet resistance.
  5. A high-quality development process is required. To use a compiled configuration comfortably, the build and deployment of the application must be fully automated (CI/CD). Otherwise it will be quite inconvenient.

Let us also dwell on a number of limitations of the considered example that are not related to the idea of a compiled configuration:

  1. If we provide extra configuration information that a node does not use, the compiler will not help us detect the missing implementation. This problem can be solved by abandoning the Cake Pattern and using more rigid types, for example HList or algebraic data types (case classes), to represent the configuration (see the sketch after this list).
  2. The configuration file contains lines that are not related to the configuration itself: package, import, and object declarations, as well as override defs for parameters with default values. This can be partially avoided by implementing a custom DSL. In addition, other kinds of configuration (such as XML) also impose certain restrictions on the file structure.
  3. Within the scope of this post, we do not consider dynamic reconfiguration of a cluster of similar nodes.
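
A sketch of the stricter representation mentioned in the first item (with illustrative names): in a case class, every field is mandatory, so unused extras are impossible by construction.

  import scala.concurrent.duration.FiniteDuration

  final case class StrictNodeConfig[A](
    echoPort: PortWithPrefix[EchoProtocol[A]],
    pollInterval: FiniteDuration,
    lifetime: FiniteDuration
  )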

Conclusion

In this post, we looked at the idea of representing the configuration in source code, using the rich capabilities of the Scala type system. This approach can be used in various applications as a replacement for traditional configuration methods based on XML or text files. Even though our example is implemented in Scala, the same ideas can be transferred to other compiled languages (such as Kotlin, C#, Swift, ...). You can try this approach in one of your next projects, and if it does not fit, move on to text files, adding the missing details.

Naturally, a compiled configuration requires a high quality development process. In return, high quality and reliability of configurations is ensured.

The considered approach can be extended:

  1. You can use macros to perform checks at compile time.
  2. You can implement a DSL to present the configuration in a way that is accessible to end users.
  3. You can implement dynamic resource management with automatic configuration adjustment. For example, changing the number of nodes in a cluster requires that (1) each node receive a slightly different configuration and (2) the cluster manager receive information about the new nodes.

Acknowledgements

I would like to thank Andrey Saxonov, Pavel Popov and Anton Nekhaev for their constructive criticism of the draft article.

Source: habr.com
