We develop the world's most convenient* interface for viewing logs

If you have ever used web interfaces to view logs, you have probably noticed how cumbersome and, as a rule, unresponsive these interfaces are. Some you can get used to, some are downright terrible, but it seems to me that the root of all the problems is that we are approaching the task of viewing logs the wrong way: we keep trying to build a web interface where a CLI (command line interface) works best. I am personally very comfortable working with tail, grep, awk and friends, so for me the ideal interface for working with logs would be something similar to tail and grep, but one that could also read logs that came from many servers. That is, of course, read them from ClickHouse!

*according to the personal opinion of the user youROCK

Meet logscli

I never came up with a name for my interface and, to be honest, it exists rather as a prototype, but if you want to see the source code right away, then welcome: https://github.com/YuriyNasretdinov/logscli (350 lines of choice Go code).

Capabilities

My goal was to make an interface that would seem familiar to those who are used to tail/grep, that is, to support the following things:

  1. View all logs, with no filtering.
  2. Keep only the lines that contain a fixed substring (the -F flag in grep).
  3. Keep only the lines that match a regular expression (the -E flag in grep).
  4. Show logs in reverse chronological order by default, since the most recent logs are usually the most interesting.
  5. Show context around each line (the -A, -B and -C options in grep, which print N lines after, before, and around each matching line, respectively).
  6. View incoming logs in real time, with or without filtering (essentially tail -f | grep).
  7. Be compatible with less, head, tail and the rest: by default, results should be returned without any limit on their number; lines are printed in a stream for as long as the user wants to receive them; the SIGPIPE signal should silently stop the log streaming, just as tail, grep and other UNIX utilities do.

Implementation

I will assume that you already know, one way or another, how to deliver logs to ClickHouse. If not, I recommend trying LSD and kittenhouse, as well as this article about log delivery.

First you need to decide on the schema. Since you usually want to receive logs sorted by time, it seems logical to store them that way. If there are many log categories and they all have the same structure, you can make the log category the first column of the primary key: this lets you have one table instead of several, which is a big plus when inserting into ClickHouse (on servers with hard drives it is recommended to insert data no more than ~once per second for the entire server).

That is, we need something like the following table schema:

CREATE TABLE logs(
    category LowCardinality(String), -- log category (optional)
    time DateTime, -- event time
    millis UInt16, -- milliseconds (could also be microseconds, etc.): recommended if there are many events, to make it easier to tell them apart
    ..., -- your own fields, e.g. server name, log level, and so on
    message String -- message text
) ENGINE=MergeTree()
ORDER BY (category, time, millis)

Unfortunately, I could not quickly find any open datasets with realistic logs that could simply be downloaded, so as an example I took Amazon product reviews up to 2015 instead. Of course, their structure is not exactly the same as that of text logs, but for illustration purposes it does not matter.

Instructions for Uploading Amazon Reviews to ClickHouse

Let's create a table:

CREATE TABLE amazon(
   review_date Date,
   time DateTime DEFAULT toDateTime(toUInt32(review_date) * 86400 + rand() % 86400),
   millis UInt16 DEFAULT rand() % 1000,
   marketplace LowCardinality(String),
   customer_id Int64,
   review_id String,
   product_id LowCardinality(String),
   product_parent Int64,
   product_title String,
   product_category LowCardinality(String),
   star_rating UInt8,
   helpful_votes UInt32,
   total_votes UInt32,
   vine FixedString(1),
   verified_purchase FixedString(1),
   review_headline String,
   review_body String
)
ENGINE=MergeTree()
ORDER BY (time, millis)
SETTINGS index_granularity=8192

The Amazon dataset only has a date for each review, no exact time, so we fill in this data with random values.

You don't have to download all the TSV files: the first ~10-20 are enough to get a dataset that is large enough not to fit into 16 GB of RAM. To load the TSV files, I used the following command:

for i in *.tsv; do
    echo "$i";
    tail -n +2 "$i" | pv |
    clickhouse-client --input_format_allow_errors_ratio 0.5 --query='INSERT INTO amazon(marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date) FORMAT TabSeparated'
done

On a standard Persistent Disk (which is an HDD) in Google Cloud with a size of 1000 GB (I chose this size mainly so that the speed would be a bit higher, although an SSD of the required size might have come out cheaper), the load speed was approximately ~75 MB/s on 4 cores.

  • I should mention that I work at Google, but I used a personal account, and this article has nothing to do with my work at the company.

All further illustrations use this dataset, since it is all I had at hand.

Show data scan progress

Since we will be doing a full scan over the logs table in ClickHouse, and this operation can take a significant amount of time and yield no results for a long while if few matches are found, it is desirable to show query progress until the first rows of the result arrive. For this, the HTTP interface has a parameter that enables sending progress in HTTP headers: send_progress_in_http_headers=1. Unfortunately, Go's standard library cannot read headers as they arrive, but ClickHouse also supports HTTP 1.0 (don't confuse it with 1.1!), so you can open a raw TCP connection to ClickHouse, send GET /?query=... HTTP/1.0\n\n and get back the response headers and body without any escaping or chunked encoding, so in this case we don't even need the standard library's HTTP client.
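A minimal sketch of this approach (assuming ClickHouse listens on localhost:8123; the sample query is illustrative and error handling is reduced to panics; X-ClickHouse-Progress is the header ClickHouse actually uses for progress):

package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/url"
	"os"
	"strings"
)

func main() {
	conn, err := net.Dial("tcp", "localhost:8123")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	q := url.QueryEscape("SELECT count() FROM amazon WHERE review_body LIKE '%walmart%'")
	// HTTP/1.0: no chunked encoding, headers and body come back as-is
	fmt.Fprintf(conn, "GET /?send_progress_in_http_headers=1&query=%s HTTP/1.0\r\n\r\n", q)

	rd := bufio.NewReader(conn)
	for { // headers arrive one by one while the query runs
		line, err := rd.ReadString('\n')
		if err != nil {
			panic(err)
		}
		line = strings.TrimRight(line, "\r\n")
		if line == "" {
			break // empty line separates headers from the body
		}
		if strings.HasPrefix(line, "X-ClickHouse-Progress:") {
			fmt.Fprintln(os.Stderr, line) // JSON with read_rows, total_rows_to_read, etc.
		}
	}
	io.Copy(os.Stdout, rd) // stream the response body (the query result)
}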

Streaming logs from ClickHouse

ClickHouse has had an optimization for queries whose ORDER BY matches the table's sort key for a relatively long time (since 2019?), so a query like

SELECT time, millis, message
FROM logs
WHERE message LIKE '%something%'
ORDER BY time DESC, millis DESC

will immediately start returning the lines that have the "something" substring in message, without waiting for the scan to finish.
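As an illustration, the -F and -E flags might translate into a WHERE clause roughly like this (a hypothetical fragment, not necessarily how logscli itself builds queries; quoteString is an assumed helper that escapes and quotes a string for SQL):

// Hypothetical sketch: turn the -F (fixed substring) and -E (regexp)
// flags into a ClickHouse WHERE condition.
func buildWhere(fixed, regex string) string {
	var conds []string
	if fixed != "" {
		conds = append(conds, "message LIKE "+quoteString("%"+fixed+"%"))
	}
	if regex != "" {
		// ClickHouse's match() checks a string against an RE2 regexp
		conds = append(conds, "match(message, "+quoteString(regex)+")")
	}
	if len(conds) == 0 {
		return "1" // no filters: every row matches
	}
	return strings.Join(conds, " AND ")
}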

Also, it would be very convenient if ClickHouse itself canceled the query when the connection to it is closed, but this is not the default behavior. Automatic query cancellation can be enabled with the option cancel_http_readonly_queries_on_client_close=1.
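With the raw HTTP/1.0 approach from the previous section, this is just one more URL parameter (a sketch reusing conn and q from above):

fmt.Fprintf(conn,
	"GET /?send_progress_in_http_headers=1&cancel_http_readonly_queries_on_client_close=1&query=%s HTTP/1.0\r\n\r\n",
	q)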

Correct handling of SIGPIPE in Go

When you run, say, the command some_cmd | head -n 10, how exactly does some_cmd terminate once head has read 10 lines? The answer is simple: when head exits, the pipe is closed, and the stdout of some_cmd starts pointing, so to speak, "nowhere". When some_cmd tries to write to the closed pipe, it receives the SIGPIPE signal, which silently terminates the program by default.

This also happens by default in Go, except that on exit the Go runtime's SIGPIPE handler additionally prints "signal: SIGPIPE" or a similar message; to get rid of this message we just need to handle SIGPIPE ourselves the way we want, that is, exit silently:

import (
	"os"
	"os/signal"
	"syscall"
)

// signal.Notify requires a buffered channel
ch := make(chan os.Signal, 1)
signal.Notify(ch, syscall.SIGPIPE)
go func() {
	<-ch
	os.Exit(0) // exit silently, like tail and grep do
}()

Show message context

Often you want to see the context in which some error occurred (for example, which request caused a panic, or what related problems were visible before the crash), and in grep this is done with the -A, -B, and -C options, which show the specified number of lines after, before, and around the message, respectively.

Unfortunately, I did not find an easy way to do the same in ClickHouse, so to display the context an additional query is sent for each row of the result, roughly like this (the details depend on the sort order and on whether the context is shown before or after):

SELECT time, millis, review_body FROM amazon
WHERE (time = 'EVENT_TIME' AND millis < EVENT_MILLIS) OR (time < 'EVENT_TIME')
ORDER BY time DESC, millis DESC
LIMIT NUMBER_OF_CONTEXT_ROWS
SETTINGS max_threads=1
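For context after the event, the comparison operators and the sort direction are simply mirrored. A hypothetical Go sketch that builds either variant of the query above:

// Hypothetical: build the context query in either direction.
// eventTime must already be formatted as a ClickHouse DateTime literal.
func contextQuery(eventTime string, eventMillis uint16, limit int, after bool) string {
	op, ord := "<", "DESC" // context before the event
	if after {
		op, ord = ">", "ASC" // context after the event
	}
	return fmt.Sprintf(
		"SELECT time, millis, review_body FROM amazon "+
			"WHERE (time = '%s' AND millis %s %d) OR (time %s '%s') "+
			"ORDER BY time %s, millis %s LIMIT %d SETTINGS max_threads=1",
		eventTime, op, eventMillis, op, eventTime, ord, ord, limit)
}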

Since this query is sent almost immediately after ClickHouse has returned the corresponding row, the data lands in the cache, so on the whole the query executes quickly and uses little CPU (on my virtual machine it typically takes about ~6 ms).

Show new messages in real time

To show incoming messages in (almost) real time, we simply run the query once every few seconds, remembering the last timestamp we have seen so far.
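A minimal sketch of that polling loop, assuming a hypothetical helper fetchNewerThan() that selects rows strictly newer than (time, millis) in ascending order:

// tail -f mode: periodically poll for rows newer than the last one printed.
lastTime, lastMillis := startTime, startMillis // hypothetical starting point
for {
	for _, r := range fetchNewerThan(lastTime, lastMillis) { // hypothetical helper
		fmt.Println(r.Message)
		lastTime, lastMillis = r.Time, r.Millis
	}
	time.Sleep(3 * time.Second) // pause a few seconds between polls
}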

Command examples

What do typical logscli commands look like in practice?

If you downloaded the Amazon dataset I mentioned at the beginning of the article, you can run the following commands:

# Show lines containing the word walmart
$ logscli -F 'walmart' | less

# Show the 10 most recent lines containing "terrible"
$ logscli -F terrible -limit 10

# The same thing without -limit:
$ logscli -F terrible | head -n 10

# Show all lines matching /times [0-9]/ that were written for vine and have a high rating
$ logscli -E 'times [0-9]' -where="vine='Y' AND star_rating>4" | less

# Show all lines with the word "panic" and 3 lines of context around each
$ logscli -F 'panic' -C 3 | less

# Continuously show new lines containing "5-star"
$ logscli -F '5-star' -tailf

References

The utility's code (without documentation) is available on GitHub at https://github.com/YuriyNasretdinov/logscli. I'd love to hear your thoughts on my idea of a console interface for viewing logs based on ClickHouse.

Source: habr.com
