David O'Brien (Xirus): Metrics! Metrics! Metrics! Part 1

David O'Brien recently launched his own company, Xirus (https://xirus.com.au), focusing on Microsoft's Azure Stack cloud products. Azure Stack is designed to consistently build and run hybrid applications across data centers, edge locations, remote offices, and the cloud.

David educates individuals and companies on everything related to Microsoft Azure and Azure DevOps (formerly VSTS) and still does hands-on consulting and infrastructure coding. He has been a Microsoft MVP (Most Valuable Professional) for five years and recently received the Azure MVP award. As co-host of the Melbourne Microsoft Cloud and Datacentre Meetup, O'Brien is a regular speaker at international conferences, combining his interest in traveling the world with a passion for sharing IT stories with the community. David blogs at david-obrien.net and also publishes online training courses on Pluralsight.

This talk is about the importance of metrics for understanding what is going on in your environment and how your application is performing. Microsoft Azure offers a powerful and easy way to expose metrics for all kinds of workloads, and the talk covers how you can make use of them.

At 3 a.m. on a Sunday, while you are sleeping, you are suddenly woken up by a text message alert: "Supercritical app not responding again." What is going on? Where is the slowdown, and what is causing it? In this talk, you will learn about the services Microsoft Azure offers for collecting logs and, in particular, metrics for your cloud workloads. David explains which metrics you should care about when working on a cloud platform and how to get at them. You will learn about open-source tools and building dashboards, and as a result gain enough knowledge to create dashboards of your own.

And if you are woken up again at 3 a.m. by a message about a critical application going down, you will be able to quickly figure out the cause.

Good afternoon, today we will talk about metrics. My name is David O'Brien and I am the co-founder and owner of Xirus, a small consulting company in Australia. Thank you for coming to spend your time with me. So why are we here? To talk about metrics, or rather, for me to tell you about them. Before we do anything practical, let's start with some theory.

I'll cover what metrics are, what you can do with them, what you need to pay attention to, how to collect and enable metrics collection in Azure, and what metrics visualization is. I'll show you what these things look like in the Microsoft cloud and how to work with this cloud.

Before I begin, I would like to ask those who use Microsoft Azure to raise their hands. And who works with AWS? I see a few. And with Google? Alibaba Cloud? One person! Great. So what are metrics? The official definition from the US National Institute of Standards and Technology (NIST) is as follows: "A metric is a measurement standard that describes the conditions and rules for performing a measurement of a property and serves to understand the measurement results." What does that mean?

Consider, for example, a metric for the free disk space of a virtual machine. Suppose we are given the number 90, and this number means a percentage, that is, the amount of free disk space is 90%. (I should note that the full document defining metrics, about 40 pages of PDF, does not make for very interesting reading.)

However, the metric does not say how the measurement result was obtained; it only shows the result. So what do we do with metrics?

First we measure the value of something; then we use the result of that measurement.

For example, once we know the amount of free disk space, we can act on it. After we receive the result of a metric, we must interpret it. Say the metric returned the value 90. We need to know what this number means: the amount of free or used disk space, in percent or in gigabytes, a network latency of 90 ms, and so on. In other words, we need to interpret the meaning of the metric value.

For metrics to make sense at all, we need to go beyond interpreting a single value and make sure that many values are collected over time. This is very important, because many people do not realize the need to collect metrics. Microsoft has made it very easy to obtain metrics, but collecting them is up to you: these metrics are stored for only 41 days and disappear on the 42nd day. Therefore, depending on your external or internal requirements, you must take care of keeping metrics for longer than 41 days, for example in the form of logs. After collection, you should place them somewhere that lets you pull up the full history of metric values when needed. Once they are there, you can start working with them effectively.
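As a minimal sketch of pulling metric values out before the retention window expires (the resource ID and metric name here are placeholders, not from the talk), the Azure CLI can export a metric to a file that you then archive wherever you like:

```shell
# Hypothetical example: export the last batch of a platform metric as JSON
# so it can be kept beyond Azure Monitor's retention window.
# The resource ID and metric name are placeholders; substitute your own.
az monitor metrics list \
  --resource "/subscriptions/<sub-id>/resourceGroups/demo-rg/providers/Microsoft.Compute/virtualMachines/demo-vm" \
  --metric "Percentage CPU" \
  --interval PT1M \
  --output json > "metrics-$(date +%Y%m%d%H%M).json"
```

Running this on a schedule (cron, Azure Automation, or similar) gives you the long-term history the platform itself will not keep for you.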

Only after you obtain metric values, interpret them, and collect them can you create an SLA, a service-level agreement. This SLA may not matter much to your customers; it is more important to your colleagues and managers, the people who maintain the system and care about its functionality. A metric can measure the number of tickets, for example: you receive 5 tickets per day, and in this case it reflects how quickly you respond to user requests and resolve problems. A metric should not just say that your site loads in 20 ms or that the response time is 20 ms; a metric is more than a single technical number.

The goal of this talk, then, is to give you a detailed picture of what metrics are. A metric exists so that, by looking at it, you can get a complete picture of the process.

Once we have the metric, we can guarantee that the system is working 99% of the time, because this is not just a glance at a log file that says the system is up. A 99% uptime guarantee means, for example, that 99% of the time the API responds within the normal 30 ms. This is exactly what your users, colleagues, and managers are interested in. Many of our clients monitor web server logs, see no errors in them, and think everything is in order. For example, they see a network throughput of 200 Mb/s and think: "OK, everything is fine!" But users also need a response time of 30 milliseconds, and that is exactly the figure that is not measured and not collected in the log files. Meanwhile, users are surprised that the site loads very slowly; without the right metric, nobody knows the reason for this behavior.
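To make the uptime percentages concrete, here is a back-of-the-envelope calculation (my own illustration, not from the talk) of how much downtime a 99% uptime SLA actually permits in a 30-day month:

```shell
# Illustrative arithmetic: downtime budget of a 99% uptime SLA
# over a 30-day month.
minutes_per_month=$((30 * 24 * 60))            # 43200 minutes in 30 days
allowed_downtime=$((minutes_per_month / 100))  # the 1% that may be down
echo "${allowed_downtime} minutes of downtime allowed"  # 432 minutes, i.e. 7.2 hours
```

Seven hours of outage per month can still satisfy "99% uptime" on paper, which is exactly why the SLA needs to be backed by metrics that reflect what users actually experience.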

And when the SLA guarantees 100% uptime, customers start to complain, because the site is genuinely hard to use. So, to create an objective SLA, you need to see the full picture of the process painted by the collected metrics. This is the subject of my constant dispute with some providers who, when creating an SLA, do not understand what the term "uptime" means and in most cases do not explain to their customers how their API works.

If you have created a service, for example an API for a third party, you should understand what a resulting metric of 39.5 means: a response, a successful response, a response within 20 ms, or within 5 ms. It is up to you to align their SLA with your own SLA and your own metrics.

With all that out of the way, you can start building your awesome dashboard. Tell me, has anyone here already used the Grafana interactive visualization tool? Great! I am a big fan of this open-source tool, because it is free and easy to use.

If you haven't used Grafana yet, I'll show you how to work with it. Does anyone born in the 80s or 90s remember the Care Bears? I don't know how popular those bears were in Russia, but when it comes to metrics, we should act like those "care bears". As I said, you want the big picture of how the whole system works, and it should not be limited to your API, your website, or a service running in a virtual machine.

You must organize the collection of the metrics that most fully reflect the operation of the entire system. Most of you are software developers, so your life is constantly changing to accommodate new product requirements, and just as you care about your coding process, you should care about metrics. You need to know how a metric relates to each line of code you write. For example, suppose next week you launch a new marketing campaign and expect a large number of users to visit your site. To analyze this event you will need metrics, and perhaps an entire dashboard, to track the activity of these people. You will need metrics to understand how successful your marketing campaign actually is. They will also help you, for example, build an effective CRM, a customer relationship management system.

So let's get started with the Azure cloud. There it is very easy to find and organize the collection of metrics, because Azure has Azure Monitor. It centralizes the monitoring configuration of your system, and every Azure resource you deploy has many metrics enabled by default. Azure Monitor is free and works right out of the box: it requires no preliminary setup, and you do not need to write anything or bolt anything extra onto your system. We will verify this in the following demo.
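As a quick way to see those default metrics for yourself, a sketch like the following (resource ID is a placeholder, not from the talk) lists every metric definition Azure Monitor already exposes for a resource, with no setup at all:

```shell
# Hypothetical sketch: list the metric names Azure Monitor provides
# out of the box for a resource. The resource ID is a placeholder.
az monitor metrics list-definitions \
  --resource "/subscriptions/<sub-id>/resourceGroups/demo-rg/providers/Microsoft.Compute/virtualMachines/demo-vm" \
  --query "[].name.value" \
  --output tsv
```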

In addition, it is possible to send these metrics to third-party applications, such as the Splunk log storage and analysis system, the SumoLogic cloud log management application, the ELK log processing stack, or IBM QRadar. True, there are slight differences depending on the resources you use (virtual machines, network services, Azure SQL databases), that is, the use of metrics differs depending on the components of your production environment. I won't say these differences are serious, but unfortunately they do exist and should be taken into account. Enabling and forwarding metrics is possible in several ways: through the Portal, via the CLI/PowerShell, or using ARM templates.

Before starting the first demo, I will answer any questions you may have. If there are no questions, let's get started. The screen shows what the Azure Monitor page looks like. Can any of you say that this monitor is not working?

So now everything is in order, and you can see what the Monitor services look like. I can say that this is an excellent and very simple tool for everyday work. It can be used to monitor applications, networks, and infrastructure. Recently the monitoring interface has been improved: where services used to be scattered across different places, all information about them is now consolidated on the Monitor home page.

The metrics table is a tab under Home > Monitor > Metrics, where you can see all available metrics and select the ones you need. But if you need to enable the collection of additional metrics, go to Home > Monitor > Diagnostic settings and check the Enabled/Disabled status there. By default, almost all metrics are enabled, but if you need something extra, you will have to change the diagnostic status from Disabled to Enabled.

To do this, click on the row of the selected item and turn on diagnostics on the tab that opens. If you are going to analyze the selected metric, then after clicking the Turn on diagnostics link, check the Send to Log Analytics checkbox in the window that appears.

Log Analytics is a bit like Splunk, but it costs less. The service lets you collect all your metrics, logs, and whatever else you need and place them in your Log Analytics workspace. It uses a special query language called KQL, the Kusto Query Language, whose operation we will look at in the next demo. For now, I'll note that with its help you can run queries over metrics, logs, terms, trends, patterns, and so on, and build dashboards.
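As a taste of what such a query looks like, here is a hedged sketch (the workspace GUID is a placeholder, and I am assuming the workspace already receives the AzureMetrics table) that averages metric values in 5-minute bins from the command line:

```shell
# Sketch, assuming a Log Analytics workspace that already collects
# the AzureMetrics table. The workspace GUID is a placeholder.
# The KQL averages each metric over 5-minute bins for the last hour.
az monitor log-analytics query \
  --workspace "00000000-0000-0000-0000-000000000000" \
  --analytics-query "AzureMetrics | where TimeGenerated > ago(1h) | summarize avg(Average) by MetricName, bin(TimeGenerated, 5m)" \
  --output table
```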

So, we check the Send to Log Analytics checkbox, then on the LOG panel the DataPlaneRequests, MongoRequests, and QueryRuntimeStatistics checkboxes, and below, on the METRIC panel, the Requests checkbox. Then we assign a name and save the settings. On the command line, this is two lines of code. By the way, Azure Cloud Shell is similar in this sense to what Google offers, which also lets you use a command line in your web browser. AWS has nothing like that, so Azure is much more convenient in this respect.
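Those "two lines" might look roughly like the following sketch (resource and workspace IDs are placeholders, and I am assuming a Cosmos DB account to match the categories above):

```shell
# Hedged sketch of creating the same diagnostic setting from the CLI.
# All IDs are placeholders; the log/metric categories mirror the ones
# ticked in the portal above (Cosmos DB assumed).
az monitor diagnostic-settings create \
  --name "send-to-log-analytics" \
  --resource "/subscriptions/<sub-id>/resourceGroups/demo-rg/providers/Microsoft.DocumentDB/databaseAccounts/demo-cosmos" \
  --workspace "/subscriptions/<sub-id>/resourceGroups/demo-rg/providers/Microsoft.OperationalInsights/workspaces/demo-workspace" \
  --logs '[{"category":"DataPlaneRequests","enabled":true},{"category":"MongoRequests","enabled":true},{"category":"QueryRuntimeStatistics","enabled":true}]' \
  --metrics '[{"category":"Requests","enabled":true}]'
```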

For example, I can run a demo through the web interface without using any code on my laptop. To do this, I authenticate with my Azure account. Then I can use, for example, Terraform, if you already work with it, wait for the connection to the service, and get the Linux environment that Microsoft provides by default.

Next, I use the Bash shell built into Azure Cloud Shell. A very useful thing is the IDE built into the browser, a lightweight version of VS Code. There I can open my error-metrics template, modify it, and adapt it to my needs.

Once this template is configured to collect metrics, you can use it to roll out metrics collection across your entire infrastructure. After we have applied, collected, and stored the metrics, we need to visualize them.

Azure Monitor deals only with Azure metrics and does not provide an overview of the state of your whole system. You may have a number of other applications running outside the Azure environment. So if you need to monitor all processes by visualizing all the collected metrics in one place, Azure Monitor is not suitable for that.

To solve this problem, Microsoft offers Power BI, a comprehensive business analytics tool that includes visualization of a wide variety of data. It is a rather expensive product whose cost depends on the feature set you need. By default it offers 48 types of processed data and integrates with Azure SQL Data Warehouse, Azure Data Lake Storage, Azure Machine Learning Services, and Azure Databricks. Depending on scale, you can receive new data every 30 minutes. This may or may not be sufficient if you need real-time monitoring visualization; in that case, it is recommended to use applications such as the Grafana I mentioned. In addition, the Microsoft documentation describes the ability to send metrics, logs, and event tables via SIEM tools to visualization systems such as Splunk, SumoLogic, ELK, and IBM QRadar.

To be continued very soon...

Source: habr.com
