Loki - collecting logs using the Prometheus approach


This article is a brief introduction to Loki. The Loki project is backed by Grafana Labs and aims at the centralized collection of logs (from servers or containers).

The main inspiration for Loki was Prometheus, with the idea of applying its approaches to log management, as the example after this list illustrates:

  • using labels to store data
  • low resource consumption
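
In Loki, as in Prometheus, a log stream is identified by a set of labels rather than by the content of its lines. A minimal, illustrative LogQL selector (the label names are made up for the example):

{job="nginx", env="production"}

This returns all log lines belonging to the streams carrying these two labels, exactly like a PromQL selector returns time series.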

We will return to the principles of Prometheus and give some examples of their use in the context of Kubernetes.

A few words about Prometheus

To fully understand how Loki works, it's important to take a step back and revisit Prometheus a bit.

One of the distinguishing characteristics of Prometheus is that it pulls metrics from collection points (via exporters) and stores them in a TSDB (time series database), attaching metadata in the form of labels.
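
For example, a time series in Prometheus is identified by a metric name plus a set of labels. An illustrative PromQL selector (metric and label names are hypothetical):

http_requests_total{job="api-server", status="500"}

Loki applies the same labeling model to log streams.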

Why this is necessary

Recently, Prometheus has become the de facto standard in the world of containers and Kubernetes: its installation is very simple, and a Kubernetes cluster natively exposes an endpoint for Prometheus. Prometheus can also scrape metrics from applications deployed in containers while preserving the relevant labels. Application monitoring is therefore very easy to set up.

Unfortunately, there is still no turnkey solution for log management, and you have to find a solution for yourself:

  • a managed cloud service for centralizing logs (AWS, Azure or Google)
  • a "monitoring as a service" offering (for example, Datadog)
  • building your own log collection service

For the third option, I have traditionally used Elasticsearch, despite the fact that I was not always happy with it (especially its heaviness and complexity of setup).

Loki was designed to be easy to implement according to the following principles:

  • be easy to start
  • consume few resources
  • work independently without any special maintenance
  • serve as an add-on to Prometheus to help with bug investigations

However, this simplicity comes at the price of some compromises. One of them is that log content is not indexed. Full-text search is therefore neither very efficient nor very rich, and it does not let you compute statistics over the text content. But since Loki aims to be a grep equivalent and a complement to Prometheus, this is not a real drawback.
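
To make the grep analogy concrete: label selectors are resolved through the (small) label index, while filtering on log content happens at query time by scanning the matching streams. A hedged LogQL sketch (the label name is an assumption):

{container_name="nginx"} |= "error"

The selector in curly braces narrows down the streams, and the |= operator then scans their lines for the text "error", much like piping them through grep.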

Incident investigation

To better understand why Loki doesn't need indexing, let's go back to the incident investigation method used by the Loki developers:

[Diagram: 1 Alert → 2 Dashboard → 3 Ad hoc Query → 4 Log Aggregation → 5 Distributed Tracing → 6 Fix!]

The idea is that we get some kind of alert (Slack Notification, SMS, etc.) and after that:

  • look at Grafana dashboards
  • look at service metrics (for example, in Prometheus)
  • look at log entries (for example, in Elasticsearch)
  • perhaps take a look at distributed traces (Jaeger, Zipkin, etc.)
  • and finally fix the original problem.

Here, in the case of a Grafana + Prometheus + Elasticsearch + Zipkin stack, you have to juggle four different tools. To save time, it would be nice to be able to do all these steps in a single tool: Grafana. It is worth noting that this investigation approach has been available in Grafana since version 6, which makes it possible to query Prometheus data directly from Grafana.

[Screenshot: the Explore screen split between Prometheus and Loki]

From this screen, you can view the Loki logs related to your Prometheus metrics using the split-screen concept. Since version 6.5, Grafana can also parse the trace id in Loki log lines and follow it as a link into your favorite distributed tracing tool (Jaeger).

Loki local test

The easiest way to test Loki locally is with docker-compose. The docker-compose file lives in the Loki repository, which you can fetch with the following git command:

$ git clone https://github.com/grafana/loki.git

Then you need to change to the production directory:

$ cd production

After that, you can get the latest Docker images:

$ docker-compose pull

Finally, the Loki stack is started with the following command:

$ docker-compose up
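
For reference, the stack this file starts looks roughly like the sketch below (a simplified, illustrative compose file, not a verbatim copy of the one in the repository; image tags and configuration paths may differ):

version: "3"

services:
  loki:
    image: grafana/loki:latest
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"

  promtail:
    image: grafana/promtail:latest
    command: -config.file=/etc/promtail/config.yml
    volumes:
      - /var/log:/var/log

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"

Once up, Grafana is reachable at http://localhost:3000.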

Loki architecture

Here is a small diagram with Loki architecture:

[Diagram: Loki architecture principles]

Applications run on the servers; the Promtail agent collects their logs, attaches metadata in the form of labels, and ships them to Loki. Loki aggregates everything, and Grafana queries it to display the logs.
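
Under the hood, Promtail ships log entries to Loki over HTTP. A hedged sketch of a manual push against a local Loki (using the v1 push endpoint; the timestamp is in nanoseconds):

$ curl -X POST http://localhost:3100/loki/api/v1/push \
    -H "Content-Type: application/json" \
    -d '{"streams": [{"stream": {"job": "test"}, "values": [["'$(date +%s%N)'", "hello from curl"]]}]}'

Anything that can issue such a request can act as a log shipper, which is why alternatives to Promtail (such as fluent-bit) exist.
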
The Loki stack is now running. To view the available components, run the following command:

$ docker ps

In the case of a freshly installed Docker, the command should return the following result:

IMAGE               PORTS                  NAMES
grafana/promtail:                          production_promtail_1
grafana/grafana:m   0.0.0.0:3000->3000/tcp production_grafana_1
grafana/loki:late   80/tcp,0.0.0.0:3100... production_loki_1

We see the following components:

  • Promtail: agent responsible for centralizing logs
  • Grafana: the famous dashboard tool
  • Loki: data centralization daemon

As part of a classic infrastructure (for example, based on virtual machines), the Promtail agent must be deployed on each machine. Grafana and Loki can be installed on the same machine.

Deployment to Kubernetes

Installing the Loki components on Kubernetes involves:

  • a DaemonSet to deploy the Promtail agent on each machine of the cluster
  • a Deployment for Loki
  • and, finally, a Deployment for Grafana.

Luckily, Loki is available as a Helm package, making it easy to deploy.

Installation via Helm

You should already have Helm installed. It can be downloaded from the project's GitHub repository: extract the archive matching your architecture and add helm to your $PATH.

Note: Helm version 3.0.0 was released recently. Since it introduces many changes, the reader is advised to wait a bit before starting to use it.

Adding source for Helm

The first step is to add the "loki" repository with the following command:

$ helm repo add loki https://grafana.github.io/loki/charts

After that, you can search for packages named “loki”:

$ helm search loki

Result:

NAME             CHART VERSION   APP VERSION   DESCRIPTION
loki/loki        0.17.2          v0.4.0        Loki: like Prometheus, but for logs.
loki/loki-stack  0.19.1          v0.4.0        Loki: like Prometheus, but for logs.
loki/fluent-bit  0.0.2           v0.0.1        Uses fluent-bit Loki go plugin for...
loki/promtail    0.13.1          v0.4.0        Responsible for gathering logs and...

These packages have the following features:

  • the loki/loki package corresponds to the Loki server alone
  • the loki/fluent-bit package deploys a DaemonSet that uses fluent-bit instead of Promtail to collect logs
  • the loki/promtail package contains the log collection agent
  • the loki/loki-stack package deploys Loki together with Promtail in one go.

Installing Loki

To deploy Loki to Kubernetes, run the following command in the “monitoring” namespace:

$ helm upgrade --install loki loki/loki-stack --namespace monitoring

To persist the data to disk, add the option --set loki.persistence.enabled=true:

$ helm upgrade --install loki loki/loki-stack \
              --namespace monitoring \
              --set loki.persistence.enabled=true

Note: if you want to deploy Grafana at the same time, add the parameter --set grafana.enabled=true
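
Equivalently, the --set flags can be gathered in a values file (an illustrative sketch; the keys simply mirror the flags above):

# values.yaml
loki:
  persistence:
    enabled: true
grafana:
  enabled: true

$ helm upgrade --install loki loki/loki-stack \
              --namespace monitoring \
              -f values.yaml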

Running the command should produce output like the following:

LAST DEPLOYED: Tue Nov 19 15:56:54 2019
NAMESPACE: monitoring
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
loki-promtail-clusterrole 189d
…
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a datasource in Grafana.
See http://docs.grafana.org/features/datasources/loki/ for more details.

Looking at the state of the pods in the “monitoring” namespace, we can see that everything is deployed:

$ kubectl -n monitoring get pods -l release=loki

Result:

NAME                 READY  STATUS   RESTARTS  AGE
loki-0               1/1    Running  0         147m
loki-promtail-9zjvc  1/1    Running  0         3h25m
loki-promtail-f6brf  1/1    Running  0         11h
loki-promtail-hdcj7  1/1    Running  0         3h23m
loki-promtail-jbqhc  1/1    Running  0         11h
loki-promtail-mj642  1/1    Running  0         62m
loki-promtail-nm64g  1/1    Running  0         24m

All pods are running. Now it's time to do some tests!

Connecting to Grafana

To connect to Grafana running in Kubernetes, you need to open a tunnel to its pod. Here is the command to forward local port 3000 to the Grafana pod:

$ kubectl -n monitoring port-forward svc/loki-grafana 3000:80

Another important point is retrieving the Grafana administrator password. The password is stored in the loki-grafana secret, in the .data.admin-password field, encoded in base64.

To retrieve it, run the following command:

$ kubectl -n monitoring get secret loki-grafana \
    --template '{{index .data "admin-password" | base64decode}}'; echo
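
An equivalent one-liner using jsonpath and the base64 utility, if you prefer it over the template syntax:

$ kubectl -n monitoring get secret loki-grafana \
    -o jsonpath='{.data.admin-password}' | base64 --decode; echo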

Use this password in conjunction with the default administrator account (admin).

Loki data source definition in Grafana

First of all, make sure the Loki data source (Configuration / Datasource) is created.
Here's an example:

[Screenshot: an example of setting up the Loki data source]

By clicking on “Test” you can test the connection with Loki.
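
If you manage Grafana as code, the same data source can be declared through a provisioning file instead of the UI. A hedged sketch (the URL assumes the in-cluster Loki service is named loki and listens on port 3100, as in the loki-stack chart):

# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100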

Making requests to Loki

Now go to the “Explore” section of Grafana. When collecting logs from containers, Loki adds metadata coming from Kubernetes, which makes it possible to view the logs of a specific container.

For example, to select the logs of the promtail container, you can use the following query: {container_name="promtail"}.
Don't forget to select the Loki data source here as well.

This query will return container activity as follows:

[Screenshot: query result in Grafana]
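
Beyond exact matches, LogQL offers grep-style filter operators that can be chained after the selector. A few illustrative variations on the query above:

{container_name="promtail"} |= "level=error"    (keep lines containing the text)
{container_name="promtail"} != "debug"          (drop lines containing the text)
{container_name="promtail"} |~ "error|fatal"    (keep lines matching a regex)

All of this filtering happens at query time, which is precisely why Loki can get away without a full-text index.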

Adding to the dashboard

Starting with Grafana 6.4, log panels can be placed directly on dashboards, so the user can quickly jump from the number of requests on their site to the corresponding application logs.

Below is an example dashboard that implements this interaction:

[Screenshot: sample dashboard with Prometheus metrics and Loki logs]

The future of Loki

I started using Loki back in May/June with version 0.1. Today version 1 has been released, as well as 1.1 and 1.2.

It must be admitted that version 0.1 was not stable enough. But 0.3 already showed real signs of maturity, and the next versions (0.4, then 1.0) only strengthened this impression.

Since the 1.0.0 release, there is really no excuse left not to use this wonderful tool.

Further improvements should not be about Loki, but rather its integration with the excellent Grafana. In fact, Grafana 6.4 already has a nice integration with dashboards.

Grafana 6.5, which was released recently, further improves this integration by automatically recognizing the contents of logs in JSON format.

The video below shows a small example of this mechanism:

[Video: Loki log lines rendered in Grafana]

It becomes possible to use one of the JSON fields, for example, to:

  • link to an external tool
  • filter the log content

For example, you can click on traceId to go to Zipkin or Jaeger.

