Run Keycloak in HA mode on Kubernetes

TL;DR: a description of Keycloak, an open-source access control system, a look at how it works internally, and configuration details.

Introduction and main ideas

In this article, we will see the main ideas to keep in mind when deploying a Keycloak cluster on top of Kubernetes.

If you want to know more about Keycloak, refer to the links at the end of the article. To immerse yourself more deeply in practice, you can study our repository with a module that implements the main ideas of this article (the launch guide is there; this article gives an overview of the architecture and settings; translator's note).

Keycloak is a complex system written in Java and built on top of the WildFly application server. In short, it is an authorization framework that gives applications user federation and SSO (single sign-on) capabilities.

We invite you to read the official documentation or Wikipedia for a detailed understanding.

Start Keycloak

Keycloak needs two persistent data sources to run:

  • A database used to store persistent data, such as information about users
  • A datagrid cache, used to cache data from the database and to store some short-lived and frequently changed metadata, such as user sessions. This role is filled by Infinispan, which is usually significantly faster than the database. In any case, the data saved in Infinispan is ephemeral and does not need to be persisted anywhere when the cluster is restarted.

Keycloak works in four different modes:

  • Normal - one and only one process, configured through the standalone.xml file
  • Regular cluster (the highly available option) - all processes must use the same configuration, which has to be synchronized manually. The settings are stored in the standalone-ha.xml file; in addition, you need shared access to the database and a load balancer.
  • Domain cluster - running a cluster in normal mode quickly becomes a routine and tedious task as the cluster grows, since every configuration change has to be applied on each node of the cluster. Domain mode solves this by using shared storage and publishing the configuration. The settings are stored in the domain.xml file.
  • Replication between data centers - for running Keycloak in a cluster of several data centers, most often in different geographic locations. In this option, each data center has its own cluster of Keycloak servers.

In this article we will take a closer look at the second option, the regular cluster, and also briefly touch on replication between data centers, since these are the two options that make sense to run in Kubernetes. Luckily, Kubernetes has no problem synchronizing the settings of multiple pods (Keycloak nodes), so a domain cluster would not be hard to build either.

Also note that for the rest of the article the word cluster refers only to a group of Keycloak nodes working together, not to a Kubernetes cluster.

Regular Keycloak Cluster

To run Keycloak in this mode, you need:

  • set up an external shared database
  • install a load balancer
  • have an internal network with IP multicast support

We will not analyze the configuration of the external database, since it is not the purpose of this article. Let's assume that somewhere there is a working database - and we have a connection point to it. We will simply add this data to the environment variables.
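For illustration, here is a minimal sketch of passing the connection point through environment variables, assuming the commonly used jboss/keycloak Docker image (variable names differ between images; the service and secret names here are hypothetical):

env:
  - name: DB_VENDOR                  # database type understood by the image
    value: "postgres"
  - name: DB_ADDR                    # hypothetical address of the external database
    value: "postgres.db.svc.cluster.local"
  - name: DB_PORT
    value: "5432"
  - name: DB_DATABASE
    value: "keycloak"
  - name: DB_USER
    valueFrom:
      secretKeyRef: { name: keycloak-db, key: username }
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef: { name: keycloak-db, key: password }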

To better understand how Keycloak works in a failover (HA) cluster, it is important to know how heavily it relies on WildFly's clustering capabilities.

WildFly uses several subsystems, some of which serve load balancing and some failover. Load balancing ensures application availability when a cluster node is overloaded, and failover ensures application availability even if some cluster nodes fail. Some of these subsystems are:

  • mod_cluster: works in conjunction with Apache as an HTTP load balancer and depends on TCP multicast for host discovery by default. It can be replaced by an external balancer.

  • infinispan: a distributed cache using JGroups channels as the transport layer. Optionally, it can use the HotRod protocol to communicate with an external Infinispan cluster to synchronize cache contents.

  • jgroups: provides group communication support for highly available services based on JGroups channels. Named channels let application instances in a cluster be joined into groups so that communication has properties such as reliability, ordering, and failure awareness.

Load balancer

When installing a balancer as an ingress controller in a Kubernetes cluster, it is important to keep the following things in mind:

Keycloak assumes that the remote address of the client connecting over HTTP to the authentication server is the real IP address of the client machine. The balancer and ingress settings must correctly set the X-Forwarded-For and X-Forwarded-Proto HTTP headers and preserve the original Host header. Recent versions of ingress-nginx (> 0.22.0) disable this by default.

Activating the proxy-address-forwarding flag by setting the environment variable PROXY_ADDRESS_FORWARDING to true tells Keycloak that it is running behind a proxy.
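A minimal sketch of both settings, assuming ingress-nginx is the ingress controller (use-forwarded-headers is its standard ConfigMap key for passing the original client headers through):

# ingress-nginx controller ConfigMap: keep the client's original headers
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-forwarded-headers: "true"
---
# Keycloak container: declare that it runs behind a proxy
env:
  - name: PROXY_ADDRESS_FORWARDING
    value: "true"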

You also need to enable sticky sessions in the ingress. Keycloak uses Infinispan's distributed cache to store data associated with the current authentication session and user session. Caches have a single owner by default; in other words, a particular session is stored on one cluster node, and other nodes must query it remotely if they need access to that session.

Specifically, and contrary to the documentation, session affinity via the cookie name AUTH_SESSION_ID did not work for us: Keycloak got stuck in a redirect loop, so we recommend choosing a different cookie name for the sticky session.
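With ingress-nginx, cookie-based affinity with a custom cookie name can be enabled with annotations; a sketch (the cookie name KC_ROUTE and the hostname are arbitrary examples):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "KC_ROUTE"   # anything but AUTH_SESSION_ID
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
spec:
  rules:
    - host: keycloak.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 8080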

Keycloak also appends the name of the node that responded first to AUTH_SESSION_ID, and since each node in the highly available setup uses the same database, each of them needs a separate and unique node ID for transaction management. It is recommended to put the parameters jboss.node.name and jboss.tx.node.id into JAVA_OPTS, unique for each node; for example, you can use the pod name. If you do, keep in mind the 23-character limit on jboss variables, so it is better to use a StatefulSet, not a Deployment.
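In a StatefulSet the pod names are stable and predictably short, so the name can be injected through the downward API and substituted into JAVA_OPTS; a sketch (note that with the jboss/keycloak image JAVA_OPTS replaces the image defaults, so in practice you would append these flags to your full option string):

env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name   # e.g. keycloak-0, keycloak-1, ...
  - name: JAVA_OPTS
    value: >-
      -Djboss.node.name=$(POD_NAME)
      -Djboss.tx.node.id=$(POD_NAME)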

Another pitfall: if a pod is deleted or restarted, its cache is lost. With this in mind, it is worth setting the number of cache owners to at least two for all caches, so that a copy of each cache entry survives. The solution is to run a script for WildFly when the pod starts by placing it in the /opt/jboss/startup-scripts directory in the container:

Script content

embed-server --server-config=standalone-ha.xml --std-out=echo
batch

echo * Setting CACHE_OWNERS to "${env.CACHE_OWNERS}" in all cache-containers

/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures:write-attribute(name=owners, value=${env.CACHE_OWNERS:1})

run-batch
stop-embedded-server

Then set the environment variable CACHE_OWNERS to the required value.
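One way to wire this up is to ship the script in a ConfigMap and mount it into the startup-scripts directory; a sketch (the ConfigMap name is illustrative):

volumes:
  - name: startup-scripts
    configMap:
      name: keycloak-startup             # contains the cache-owners script above
containers:
  - name: keycloak
    volumeMounts:
      - name: startup-scripts
        mountPath: /opt/jboss/startup-scripts
    env:
      - name: CACHE_OWNERS
        value: "2"                       # keep at least two copies of each cache entry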

Private network with IP multicast support

If you are using Weave Net as your CNI, multicast will work right away, and your Keycloak nodes will see each other as soon as they are up and running.

If you don't have IP multicast support in your Kubernetes cluster, you can configure JGroups to use other protocols for node discovery.

The first option is to use KUBE_DNS, which uses a headless service to find Keycloak nodes: you simply pass JGroups the name of the service that will be used to discover the nodes.
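A headless service for discovery might look like this (a sketch; 7600 is WildFly's default JGroups TCP port):

apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless
spec:
  clusterIP: None              # headless: DNS returns the pod IPs directly
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7600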

Another option is to use KUBE_PING, which works with the Kubernetes API to find nodes (you need to configure a serviceAccount with list and get permissions, and then configure the pods to use this serviceAccount).

How JGroups discovers nodes is configured by setting the environment variables JGROUPS_DISCOVERY_PROTOCOL and JGROUPS_DISCOVERY_PROPERTIES. For KUBE_PING you select pods by specifying a namespace and labels.
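For example, KUBE_PING discovery could be configured like this (a sketch, assuming an image such as jboss/keycloak that understands these two variables; the namespace and label values are illustrative):

env:
  - name: JGROUPS_DISCOVERY_PROTOCOL
    value: "kubernetes.KUBE_PING"
  - name: JGROUPS_DISCOVERY_PROPERTIES
    value: "namespace=keycloak,labels=app=keycloak"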

⚠️ If you use multicast and run two or more Keycloak clusters in the same Kubernetes cluster (say, one in the production namespace and a second in staging), nodes from one Keycloak cluster can join the other cluster. Be sure to use a unique multicast address for each cluster by setting the variables jboss.default.multicast.address and jboss.modcluster.multicast.address in JAVA_OPTS.
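A sketch of the corresponding JAVA_OPTS flags for the staging cluster (the addresses are example values; the only requirement is that each cluster gets its own):

env:
  - name: JAVA_OPTS
    value: >-
      -Djboss.default.multicast.address=230.0.0.5
      -Djboss.modcluster.multicast.address=230.0.0.6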

Replication between data centers

Communication

Keycloak uses several separate Infinispan cache clusters, one per data center hosting a Keycloak cluster made up of Keycloak nodes. At the same time, there is no difference between Keycloak nodes in different data centers.

Keycloak nodes use an external Java Data Grid (Infinispan servers) to communicate between data centers. Communication works over the Infinispan HotRod protocol.

Keycloak's Infinispan caches must be configured with the remoteStore attribute so that the data can also be stored in remote caches (in another data center; translator's note). The JDG servers form a separate Infinispan cluster of their own, so data stored on JDG1 at site1 is replicated to JDG2 at site2.

Finally, the receiving JDG server notifies the Keycloak servers of its cluster through client connections, which is a feature of the HotRod protocol. The Keycloak nodes at site2 update their Infinispan caches, and the particular user session becomes available on the Keycloak nodes at site2 as well.

It is also possible to leave some caches without backup and skip writing them through the Infinispan server entirely. To do this, remove the remote-store setting of the specific Infinispan cache (in the standalone-ha.xml file), after which the corresponding replicated-cache is no longer needed on the Infinispan server side either.

Setting up caches

There are two types of caches in Keycloak:

  • Local. It sits next to the database and serves to reduce the load on the database as well as to reduce response latency. This type of cache stores the realm, clients, roles, and user metadata and is not replicated, even if the cache is part of a Keycloak cluster. If an entry in the cache changes, an invalidation message is sent to the rest of the servers in the cluster, and the entry is evicted from their caches. See the description of the work cache below for a more detailed account of this procedure.

  • Replicated. Handles user sessions and offline tokens, and tracks login failures to detect password brute-force attempts and other attacks. The data stored in these caches is temporary and kept only in RAM, but it can be replicated across the cluster.

Infinispan Caches

Authentication sessions. In Keycloak, separate caches called authenticationSessions are used to store the data of specific users. Requests to these caches are usually made by the browser and the Keycloak servers, not by applications. This is where the dependence on sticky sessions shows up; such caches do not need to be replicated, even in Active-Active mode.

Action tokens. Another concept, usually used for scenarios where, for example, the user needs to confirm something asynchronously by mail. For example, during the forgot-password flow, the actionTokens cache is used to track metadata of the related tokens, such as whether a token has already been used, so it cannot be reused. This type of cache is typically replicated between data centers.

Caching and expiration of stored data exist to take load off the database. This caching improves performance but adds an obvious problem: if one Keycloak server updates the data, the rest of the servers must be notified so that they can update their caches. Keycloak uses the local caches realms, users, and authorization for caching data from the database.

There is also a separate work cache, which is replicated across all data centers. It does not itself store any data from the database; instead, it serves to send cache invalidation messages to cluster nodes across data centers. In other words, as soon as data is updated, a Keycloak node sends a message to the other nodes in its data center as well as to nodes in other data centers. Upon receiving such a message, each node evicts the corresponding data from its local caches.

User sessions. The caches named sessions, clientSessions, offlineSessions, and offlineClientSessions are usually replicated between data centers and store data about user sessions that are active while the user is active in the browser. These caches serve the application handling HTTP requests from end users, so they are tied to sticky sessions and must be replicated between data centers.

Brute force protection. The loginFailures cache is used to track login failure data, such as the number of times a user entered an incorrect password. Whether to replicate this cache is up to the administrator: for an accurate count, it is worth activating replication between data centers; on the other hand, not replicating this data improves performance, so if that trade-off comes up, replication can be left off.

When rolling out an Infinispan cluster, you need to add cache definitions to its configuration file:

<replicated-cache-configuration name="keycloak-sessions" mode="ASYNC" start="EAGER" batching="false">
</replicated-cache-configuration>

<replicated-cache name="work" configuration="keycloak-sessions" />
<replicated-cache name="sessions" configuration="keycloak-sessions" />
<replicated-cache name="offlineSessions" configuration="keycloak-sessions" />
<replicated-cache name="actionTokens" configuration="keycloak-sessions" />
<replicated-cache name="loginFailures" configuration="keycloak-sessions" />
<replicated-cache name="clientSessions" configuration="keycloak-sessions" />
<replicated-cache name="offlineClientSessions" configuration="keycloak-sessions" />

You must configure and start the Infinispan cluster before starting the Keycloak cluster.
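In Kubernetes, this ordering can be enforced with an init container on the Keycloak pods that waits for the Infinispan HotRod port to open; a sketch (the host is a hypothetical service name; 11222 is the default HotRod port used in the script below):

initContainers:
  - name: wait-for-infinispan
    image: busybox:1.36
    command:
      - sh
      - -c
      - until nc -z infinispan.cache.svc.cluster.local 11222; do echo waiting; sleep 2; done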

Then you need to configure remoteStore for the Keycloak caches. This is done with a script similar to the previous one used to set the CACHE_OWNERS variable; save it to a file and put it in the /opt/jboss/startup-scripts directory:

Script content

embed-server --server-config=standalone-ha.xml --std-out=echo
batch

echo *** Update infinispan subsystem ***
/subsystem=infinispan/cache-container=keycloak:write-attribute(name=module, value=org.keycloak.keycloak-model-infinispan)

echo ** Add remote socket binding to infinispan server **
/socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=remote-cache:add(host=${remote.cache.host:localhost}, port=${remote.cache.port:11222})

echo ** Update replicated-cache work element **
/subsystem=infinispan/cache-container=keycloak/replicated-cache=work/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    remote-servers=["remote-cache"], 
    cache=work, 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)

/subsystem=infinispan/cache-container=keycloak/replicated-cache=work:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache sessions element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    remote-servers=["remote-cache"], 
    cache=sessions, 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache offlineSessions element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    remote-servers=["remote-cache"], 
    cache=offlineSessions, 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache clientSessions element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    remote-servers=["remote-cache"], 
    cache=clientSessions, 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache offlineClientSessions element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    remote-servers=["remote-cache"], 
    cache=offlineClientSessions, 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache loginFailures element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    remote-servers=["remote-cache"], 
    cache=loginFailures, 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache actionTokens element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens/store=remote:add( 
    passivation=false, 
    fetch-state=false, 
    purge=false, 
    preload=false, 
    shared=true, 
    cache=actionTokens, 
    remote-servers=["remote-cache"], 
    properties={ 
        rawValues=true, 
        marshaller=org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory, 
        protocolVersion=${keycloak.connectionsInfinispan.hotrodProtocolVersion} 
    } 
)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=actionTokens:write-attribute(name=statistics-enabled,value=true)

echo ** Update distributed-cache authenticationSessions element **
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions:write-attribute(name=statistics-enabled,value=true)

echo *** Update undertow subsystem ***
/subsystem=undertow/server=default-server/http-listener=default:write-attribute(name=proxy-address-forwarding,value=true)

run-batch
stop-embedded-server

Don't forget to set JAVA_OPTS for the Keycloak nodes so that HotRod works: remote.cache.host, remote.cache.port, and the site name jboss.site.name.
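A sketch of those settings for a node at site1 (the host is a hypothetical service name):

env:
  - name: JAVA_OPTS
    value: >-
      -Dremote.cache.host=infinispan.cache.svc.cluster.local
      -Dremote.cache.port=11222
      -Djboss.site.name=site1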

Links and additional documentation

The article was translated and prepared for Habr by employees of the Slurm training center: intensives, video courses, and corporate training from practitioners (Kubernetes, DevOps, Docker, Ansible, Ceph, SRE).

Source: habr.com
