Not New Relic Alone: A Look at Datadog and Atatus

Among SRE and DevOps engineers, it surprises no one when a client (or a monitoring system) shows up one day and reports that “everything is down”: the site is unavailable, payments are not going through, life is falling apart… However much you want to help in such a situation, it can be very hard to do so without a simple, understandable tool. Often the problem is hidden in the code of the application itself, and you just need to localize it.

And in sorrow and in joy…

As it happens, we have been very fond of New Relic for a long time. It was and remains a great tool for monitoring application performance; it also lets you instrument a microservice architecture (using its agent) and much, much more. Everything would be great if it were not for the changes in the service's pricing policy: since 2013 its cost has more than tripled. In addition, since last year, getting a trial account requires talking to a personal manager, which makes it difficult to present the product to a potential customer.

The usual situation: New Relic is not needed on a permanent basis and is only remembered when problems begin. But you still have to pay regularly (140 USD per server per month), and in an automatically scaling cloud infrastructure the amounts become rather large. Although there is a “Pay-As-You-Go” option, enabling New Relic requires restarting the application, which can make the very problem you wanted to investigate disappear. Not so long ago, New Relic introduced a new plan, Essentials, which at first glance looks like a reasonable alternative to Professional... but on closer examination it turned out that some important features are missing (in particular, it lacks Key Transactions, Cross Application Tracing, and Distributed Tracing).

As a result, we started looking for a cheaper alternative and settled on two services: Datadog and Atatus. Why these two?

About competitors

Let me note right away that there are other solutions on the market. We even considered open source options, but not every client has spare capacity to host self-hosted solutions, and they also require additional maintenance. The pair we chose came closest to our needs:

  • built-in and advanced support for PHP applications (our clients' stacks are very diverse, but PHP is the clear leader among those that need an alternative to New Relic);
  • affordable cost (less than 100 USD per month per host);
  • automatic instrumentation;
  • integration with Kubernetes;
  • an interface similar to New Relic's, which is a noticeable plus (because our engineers are used to it).

At the initial selection stage, we therefore ruled out several other popular solutions, in particular:

  • Tideways, AppDynamics and Dynatrace, because of their cost;
  • Stackify, because it is blocked in the Russian Federation and shows too little data.

The rest of the article is structured as follows: first I briefly present the solutions considered, then I describe how we typically work with New Relic, and finally I share our experience and impressions of performing the same operations in the other services.

Presentation of selected competitors

Probably everyone has heard of New Relic. The service began its development more than 10 years ago, in 2008. We have been actively using it since 2012 and have had no problems integrating a really large number of applications in PHP, Ruby and Python; we also have experience integrating it with C# and Go. The authors of the service offer solutions for monitoring applications and infrastructure and for tracing microservice infrastructures, have created convenient applications for user devices, and much more.

However, the New Relic agent uses proprietary protocols and does not support OpenTracing. Advanced instrumentation requires code changes made specifically for New Relic. Finally, Kubernetes support is still experimental.

Launched in 2010, Datadog looks noticeably more interesting than New Relic precisely when it comes to Kubernetes environments. In particular, it supports integration with NGINX Ingress, log collection, and the statsd and OpenTracing protocols, which lets you follow a user request from the moment it connects until processing finishes, and find the logs for that request (both on the web server side and on the side of consumers).

When using Datadog, we found that it sometimes built the microservice map incorrectly, and we ran into some technical shortcomings. For example, it misidentified the type of a service (it took Django for a caching service) and caused 500 errors in a PHP application that uses the popular Predis library.

Atatus is the youngest of the tools; the service launched in 2014. Its marketing budget is clearly smaller than that of the competitors listed above, and it is mentioned much less often. Nevertheless, the tool itself is very similar to New Relic, not only in terms of features (APM, Browser monitoring, etc.), but also in appearance.

A significant drawback is that it only supports Node.js and PHP. On the other hand, that support is implemented noticeably better than Datadog's. Unlike the latter, Atatus does not require you to modify applications or add extra tags to the code.

How we work with New Relic

Now let's look at how we typically use New Relic. Suppose we have a problem that needs to be solved:

The spike on the chart is easy to see, so let's analyze it. In New Relic, web transactions are selected right away for a web application, all components are shown on the performance graph, there are error-rate and request-rate panels ... to the database section).

Since this example shows a surge in PHP activity, we click on that graph and are taken straight to Transactions:

The list of transactions, which are essentially controllers from the MVC model, is already sorted by Most time consuming, which is very convenient: we immediately see what the application is doing. There are also examples of long requests, which New Relic collects automatically. By switching the sort order, it is easy to find:

  • the most loaded application controller;
  • the most frequently requested controller;
  • the slowest of the controllers.
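
The three sort orders answer different questions, and a small sketch may make the difference concrete. The Python snippet below uses made-up controller names and numbers (none of them come from New Relic or from a real project) and assumes that "most time consuming" means the number of calls multiplied by the average duration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str      # controller name (hypothetical)
    calls: int     # number of requests in the selected period
    avg_ms: float  # average response time in milliseconds

    @property
    def total_ms(self) -> float:
        # Assumption: total time consumed = calls x average duration.
        return self.calls * self.avg_ms


# Made-up sample data for illustration only.
transactions = [
    Transaction("CatalogController::list", calls=12000, avg_ms=20.0),
    Transaction("CheckoutController::pay", calls=300, avg_ms=1500.0),
    Transaction("ReportController::export", calls=20, avg_ms=9000.0),
]

# The same table, viewed through three different sort keys.
most_time_consuming = max(transactions, key=lambda t: t.total_ms)
most_requested = max(transactions, key=lambda t: t.calls)
slowest = max(transactions, key=lambda t: t.avg_ms)

for label, t in [
    ("most time consuming", most_time_consuming),
    ("most requested", most_requested),
    ("slowest", slowest),
]:
    print(f"{label}: {t.name}")
```

With this sample data each sort order puts a different controller on top, which is why switching between them is useful during an investigation.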

In addition, you can expand each transaction and see what the application was doing at the time the code was executed:

Finally, sample traces of long requests (those that take more than 2 seconds to process) are saved. Here is the panel for a long transaction:

It can be seen that two methods take a lot of time; the panel also shows when the request was executed, along with its URI and domain. This often helps to find the request in the logs. In Trace details, you can see where these methods are called from:

And in Database queries, you can examine the database queries that were executed while the application was running:

Armed with this knowledge, we can assess the cause of the application slowdown and, together with the developer, work out a strategy for solving the problem. In reality, New Relic does not always give a clear picture, but it helps to choose the direction of the investigation:

  • a long PDO::Construct led us to strange behavior in pgpool;
  • unstable Memcache::Get timings pointed to a misconfigured virtual machine;
  • a suspiciously long template processing time led us to a nested loop that checked for the presence of 500 avatars in object storage;
  • and so on…
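
The avatar case is a typical example of what such a trace reveals: many small remote calls hidden inside a loop. The sketch below only illustrates the pattern in Python. `FakeObjectStorage` is a hypothetical in-memory stand-in, not a real storage client, and the single-listing variant is one possible alternative rather than the fix that was actually applied.

```python
class FakeObjectStorage:
    """In-memory stand-in for an object storage client (hypothetical API)."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.round_trips = 0  # counts simulated network requests

    def exists(self, key):
        self.round_trips += 1  # one request per checked key
        return key in self._keys

    def list_keys(self, prefix):
        self.round_trips += 1  # one request for the whole listing
        return {k for k in self._keys if k.startswith(prefix)}


user_ids = range(500)
# Assumption for the demo: every second user has an uploaded avatar.
storage = FakeObjectStorage(f"avatars/{i}.png" for i in range(0, 500, 2))

# Pattern from the trace: one storage check per avatar inside a loop.
has_avatar_slow = {i: storage.exists(f"avatars/{i}.png") for i in user_ids}
slow_round_trips = storage.round_trips

# Alternative: fetch the listing once, then check membership locally.
storage.round_trips = 0
existing = storage.list_keys("avatars/")
has_avatar_fast = {i: f"avatars/{i}.png" in existing for i in user_ids}
fast_round_trips = storage.round_trips

print(slow_round_trips, fast_round_trips)
```

Both variants produce the same answer, but the first one issues a request per avatar. Real object storage listings are usually paginated, so the saving in practice is smaller than in this toy model.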

It also happens that what grows on the main screen is not code execution time but something related to external data storage. Whether it is Redis or PostgreSQL does not matter: they are all found under the Databases tab.

You can select a specific database to investigate and sort the queries, much as in Transactions. By opening the tab for a query, you can see how much of it occurs in each of the application controllers, and estimate how often it is called. This is very convenient:

The External Services tab contains similar data: it covers requests to external HTTP services, such as accessing object storage, sending events to Sentry, and the like. In its content, the tab is entirely analogous to Databases:

Competitors: capabilities and impressions

Now for the most interesting part: comparing the capabilities of New Relic with what the competitors offer. Unfortunately, we were unable to test all three tools on the same version of a single production application. Nevertheless, we tried to compare situations and configurations that were as similar as possible.

1. Datadog

Datadog greets us with a panel with a wall of services:

It tries to break applications into components / microservices, so in the example Django application we see 2 connections to PostgreSQL (defaultdb and postgres), as well as Celery and Redis. Working with Datadog requires at least minimal knowledge of MVC principles: you need to understand where user requests come from in the first place. The service map usually helps here:

By the way, there is something similar in New Relic:

... and their map, in my opinion, is made simpler and clearer: it does not display the components of a single application (which would make it unnecessarily detailed, as in the case of Datadog), but only specific services or microservices.

Back in Datadog, the service map shows that user requests arrive at Django. Let's go to the Django service and finally see what we expected:

Unfortunately, there is no Web transaction time chart by default, like the one we see on the main panel of New Relic. However, it can be set up in place of the % of time spent graph. It is enough to switch it to Avg time per request by Type... and now the familiar graph is looking at us!

Why Datadog chose a different default chart is a mystery to us. It was also frustrating that the system does not remember the user's choice (unlike both competitors), so the only remedy is to create custom panels.
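
One possible reason the choice of chart matters: a percentage breakdown can stay flat while absolute response time grows. The Python sketch below uses invented per-minute totals and component names, and the two functions are only our reading of what the % of time spent and Avg time per request by Type views represent, not Datadog's actual formulas.

```python
# Hypothetical per-minute totals: milliseconds spent in each component type
# and the number of requests served during that minute.
buckets = [
    {"requests": 100,
     "spent_ms": {"django": 4000.0, "postgres": 5000.0, "redis": 1000.0}},
    {"requests": 100,
     "spent_ms": {"django": 16000.0, "postgres": 20000.0, "redis": 4000.0}},
]


def percent_of_time_spent(bucket):
    """Share of the total time taken by each component, in percent."""
    total = sum(bucket["spent_ms"].values())
    return {name: 100.0 * ms / total for name, ms in bucket["spent_ms"].items()}


def avg_time_per_request(bucket):
    """Average milliseconds per request contributed by each component."""
    return {name: ms / bucket["requests"] for name, ms in bucket["spent_ms"].items()}


for bucket in buckets:
    print(percent_of_time_spent(bucket), avg_time_per_request(bucket))
```

In the second minute every component is four times slower, yet the percentage view is identical for both minutes; only the per-request averages expose the spike.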

But I was pleased that Datadog lets you go from these graphs to the metrics of the related servers, read the logs, and evaluate the load on the web server workers (Gunicorn). Everything is almost like in New Relic ... and even a little more (logs)!

Below the charts are the transactions, just as in New Relic:

In Datadog, transactions are called resources. You can sort the controllers by the number of requests, by the average response time, or by the maximum time over the selected period.

A resource can be expanded to see everything we have already observed in New Relic:

There are resource statistics, a generalized list of internal calls, and examples of requests that can be sorted by response code ... By the way, our engineers really liked this sorting.

Any example resource in Datadog can be opened and examined:

It shows the request parameters, a summary chart of the time spent in each of the components, and a waterfall chart showing the sequence of calls. You can also switch the waterfall chart to a tree view:

And the most interesting thing is viewing the load of the host on which the request was executed, and viewing the request logs.

Great integration!

You might wonder where the Databases and External Services tabs are, as in New Relic. There are none: since Datadog breaks the application down into components, PostgreSQL is treated as a separate service, and instead of External Services you should look for aws.storage (the same goes for every other external service the application accesses).

And here is an example with postgres:

In fact, everything we wanted is there:

You can see from which "service" the request came.

It is worth recalling that Datadog integrates very well with NGINX Ingress and lets you perform end-to-end tracing from the moment a request arrives in the cluster; it can also receive statsd metrics and collect logs and host metrics.
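
For readers who have not used statsd, a metric is just a short text datagram sent over UDP. The Python sketch below builds such a datagram using only the standard library. The metric name, the tag, and the host are hypothetical; port 8125 is the conventional statsd port and is assumed here to be where the agent listens; the `|#` tag suffix is the DogStatsD extension to the plain statsd format.

```python
import socket


def format_statsd(name, value, metric_type="c", tags=None):
    """Build one datagram: <name>:<value>|<type>, plus optional DogStatsD tags."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # The |#key:value,... suffix is the DogStatsD tag extension.
        line += "|#" + ",".join(tags)
    return line


def send_statsd(line, host="127.0.0.1", port=8125):
    """Send the datagram over UDP (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical counter: one failed payment in the staging environment.
payload = format_statsd("shop.payments.failed", 1, "c", ["env:staging"])
print(payload)
```

Calling `send_statsd(payload)` would fire the datagram at a locally running agent. UDP is fire-and-forget, so the application is not slowed down when nothing is listening.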

A huge plus of Datadog is that its price is made up of separate components (infrastructure monitoring, APM, Log Management and Synthetics tests), so you can put together a plan flexibly.

2. Atatus

The Atatus team claims that their service is “just like New Relic, but better.” Let's see if this is actually the case.

The main panel does indeed look similar, but it failed to detect the Redis and memcached used in the application.

APM selects all transactions by default, although only the Web ones are of interest. As in Datadog, there is no way to go to the desired service from the main panel. Moreover, transactions come after errors in the list, which does not look very logical for an APM.

In transactions, Atatus is very similar to New Relic. The downside is that you cannot immediately see the dynamics for each of the controllers. You have to look for it in the controller table, sorting by Most Time Consumed:

The familiar list of controllers is available in the Explore tab:

In some ways, this table resembles the one in Datadog, and I like it more than its counterpart in New Relic.

You can expand each transaction and see what the application was doing:

The panel is also more like Datadog's: it shows the number of requests and an overall picture of the calls. The top bar provides an HTTP Failures tab with errors and a Session Traces tab with examples of slow requests:

If you open a transaction, you can see a sample trace, get the list of database queries, and view the request headers. Everything is similar to New Relic:

In general, Atatus pleased us with its detailed traces, without the lumping of calls into a single Remainder block that is typical of New Relic:

However, it lacks a filter that (as in New Relic) cuts off ultra-fast requests (<5ms). On the other hand, I liked the display of the final transaction response (successful or error).
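
To make the trade-off concrete, here is a minimal Python sketch of such a filter. It is an illustration of the idea only, not how either service implements it: the span names and durations are invented, and the 5 ms threshold is the one mentioned above.

```python
MIN_SPAN_MS = 5.0  # threshold mentioned in the text


def collapse_fast_spans(spans, min_ms=MIN_SPAN_MS):
    """Keep spans at or above min_ms; sum the rest into one remainder entry."""
    kept = [(name, ms) for name, ms in spans if ms >= min_ms]
    remainder = sum(ms for _, ms in spans if ms < min_ms)
    if remainder > 0:
        kept.append(("Remainder", remainder))
    return kept


# Invented trace: (call name, duration in milliseconds).
spans = [
    ("PDO::query", 120.0),
    ("Memcache::get", 0.5),
    ("strlen", 0.5),
    ("template render", 40.0),
]

filtered = collapse_fast_spans(spans)
print(filtered)
```

The filtered view is shorter and easier to scan, while the unfiltered view keeps every call visible, at the cost of noise from very fast ones.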

The Databases panel helps you study the queries that the application makes to external databases. As a reminder, Atatus detected only PostgreSQL and MySQL, although Redis and memcached are also used in the project.

Queries are sorted by the usual criteria: call frequency, average response time, and so on. The tab with the slowest queries deserves a separate mention, as it is very convenient. Moreover, the data in this tab for PostgreSQL matched the data from the pg_stat_statements extension, which is an excellent result!

The External Requests tab is completely identical to Databases.

Conclusions

Both tools performed well in the role of APM. Either of them can offer the necessary minimum. We can briefly summarize our impressions as follows:

Datadog

Pros:

  • convenient pricing (APM costs 31 USD per host);
  • performed well with Python;
  • integration with OpenTracing;
  • integration with Kubernetes;
  • integration with NGINX Ingress.

Cons:

  • the only APM that made the application unavailable because of an error in a module (predis);
  • weak PHP autoinstrumentation;
  • a somewhat strange way of identifying services and their purpose.

Atatus

Pros:

  • deep PHP instrumentation;
  • a user interface similar to New Relic's.

Cons:

  • does not work on older operating systems (Ubuntu 12.04, CentOS 5);
  • weak auto-instrumentation;
  • support for only two programming languages (Node.js and PHP);
  • slow interface.

Considering that Atatus costs 69 USD per server per month, we would rather use Datadog, which fits our needs well (web applications in K8s) and has many useful features.
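
The per-host prices quoted in this article can be put side by side with a few lines of Python. This is simple arithmetic on the article's figures only; the fleet size of 20 hosts is an arbitrary example, and the Datadog figure covers APM alone, since its other components are priced separately.

```python
# Per-host monthly prices in USD, as quoted in this article.
PRICE_PER_HOST = {
    "New Relic": 140,
    "Atatus": 69,
    "Datadog APM": 31,  # APM only; other Datadog components are priced separately
}


def monthly_cost(hosts):
    """Monthly bill per product for a fleet of the given size."""
    return {product: price * hosts for product, price in PRICE_PER_HOST.items()}


costs = monthly_cost(20)  # 20 hosts is an arbitrary example fleet size
for product, cost in costs.items():
    print(f"{product}: {cost} USD per month")
```

In an automatically scaling infrastructure the host count changes over time, so the real bill depends on how each vendor counts hosts, which this sketch does not model.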

Source: habr.com
