Why don't engineers care about application monitoring?

Happy Friday, everyone! Today we continue our series of publications for the course "DevOps practices and tools", since a new group on the course starts classes at the end of next week. So, let's begin!


Monitoring is simple. That's a well-known fact. Bring up Nagios, run NRPE on the remote system, point Nagios at NRPE's TCP port 5666, and you have monitoring.

It's so easy that it's boring. You now have the basic CPU, disk and RAM metrics that Nagios and NRPE provide by default. But that isn't really "monitoring" as such. It's only the beginning.
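To make the "basic metrics" concrete, here is a minimal sketch of a Nagios-style check written in Python. It relies only on the standard plugin convention (exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and optional performance data after a pipe character). The function names and the 80/90 percent thresholds are assumptions made up for illustration, not anything shipped with Nagios or NRPE.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a plugin state (thresholds are assumptions)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    used_percent = 100.0 * usage.used / usage.total
    state = classify(used_percent, warn, crit)
    # Text after the pipe is performance data: label=value[UOM];warn;crit
    line = (f"DISK {STATE_NAMES[state]} - {used_percent:.1f}% used on {path}"
            f" | used_pct={used_percent:.1f}%;{warn};{crit}")
    return state, line


def main():
    """Print the status line and return the exit code for the scheduler."""
    code, message = check_disk()
    print(message)
    return code
```

Run as a script, it would end with `sys.exit(main())`, so that NRPE can pass the exit code back to Nagios. Note what the check knows about: the disk. It knows nothing about what the application on that disk is doing.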

(I usually install PNP4Nagios, RRDtool and Thruk, set up notifications in Slack and head straight to Nagios Exchange, but let's skip that for now.)

Good monitoring is actually quite complex: you really need to know the internals of the application you are monitoring.

Is monitoring difficult?

Any server, be it Linux or Windows, by definition serves a purpose. Apache, Samba, Tomcat, file storage, LDAP: each of these is more or less unique in one or more respects. Each has its own function and its own characteristics. And each offers different ways to get the metrics and KPIs (key performance indicators) you care about when the server is under load.

Photo by luke chesser on Unsplash

(I wish my dashboards were neon blue... sighs dreamily... hmm...)

Any software that provides services must have a mechanism for collecting metrics. Apache has the mod_status module, which displays a server status page. Nginx has stub_status. Tomcat has JMX or custom web applications that show key metrics. MySQL has the SHOW GLOBAL STATUS command, and so on.
So why aren't developers building these mechanisms into the applications they create?
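Such a mechanism does not have to be big. Below is a minimal sketch, in Python, of an in-process counter registry that an application could use to expose its own numbers as plain "name value" lines, in the spirit of a status page. Everything here is hypothetical and invented for illustration: the `AppMetrics` class, the `handle_order` function and the counter names do not come from any real library or product.

```python
import threading
import time


class AppMetrics:
    """Tiny in-process metrics registry (hypothetical; not a real library)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}
        self._started = time.monotonic()

    def incr(self, name, amount=1):
        """Add `amount` to the named counter, creating it on first use."""
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + amount

    def snapshot(self):
        """Return a copy of all counters plus process uptime in seconds."""
        with self._lock:
            data = dict(self._counters)
        data["uptime_seconds"] = int(time.monotonic() - self._started)
        return data

    def render(self):
        """Render one 'name value' pair per line, sorted by name."""
        return "\n".join(f"{k} {v}" for k, v in sorted(self.snapshot().items()))


metrics = AppMetrics()


def handle_order(order):
    """Hypothetical business function instrumented with counters."""
    metrics.incr("orders_received")
    if not order.get("items"):
        metrics.incr("orders_rejected")
        return False
    metrics.incr("orders_accepted")
    return True
```

The text returned by `metrics.render()` could be served from an internal HTTP endpoint or written to a file that a check script reads. The point of the sketch is that the counters describe what the application does (orders accepted, orders rejected), which no CPU or disk check can tell you.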

Are only developers doing this?

A certain indifference to built-in metrics is not limited to developers. I've worked at companies that built applications on Tomcat and exposed no metrics of their own and no logs of service activity, nothing beyond the general Tomcat error logs. Some developers generate an abundance of logs that mean nothing to the system administrator unlucky enough to be reading them at 3:15 in the morning.

Photo by Tim Gow on Unsplash

The systems engineers who allow such products to be released must also bear some responsibility for the situation. Few systems engineers have the time or the inclination to extract meaningful metrics from logs when they lack the context for those metrics and the ability to interpret them in light of what the application is doing. Some don't see how metrics could benefit them at all, beyond indicators like "something is wrong right now (or soon will be)."

A change in mindset regarding the need for metrics needs to take place not only among developers, but also among systems engineers.

For any systems engineer who needs to not only respond to critical events but also ensure they don't happen, the lack of metrics is usually a barrier to doing so.

However, systems engineers are usually not the ones digging into the code that makes money for the company. They need lead developers who understand the importance of the systems engineer's responsibility for finding problems, raising awareness of performance issues, and the like.

This devops thing

The devops mentality describes the synergy of developer (dev) and operations (ops) thinking. Any company claiming to be "doing devops" must either:

  1. be saying something that probably doesn't mean what they think it means (a nod to the meme from The Princess Bride: "I do not think it means what you think it means!"), or
  2. encourage an attitude of continuous product improvement.

You cannot improve a product and know that it has been improved unless you know how it currently works. You can't know how a product works if you don't understand how its components work, the services it depends on, its major pain points and bottlenecks.
If you don't watch for potential bottlenecks, you won't be able to follow the Five Whys technique when writing a postmortem. You won't be able to put everything on one screen to see how the product is behaving, or what "normal and happy" looks like.

Shift left, LEFT, I SAID LEEEEEEE...

For me, one of the key principles of devops is "shift left". In this context, shifting left means moving the ability (not the responsibility, only the ability) to do the things systems engineers usually care about, such as creating performance metrics and using logs more effectively, to the left in the software delivery life cycle.

Photo by NESA by Makers on Unsplash

Software developers must know and be able to use the monitoring tools the company relies on, in all their forms: metrics, logging, monitoring interfaces. Most importantly, they must be able to see how their product behaves in production. You can't get developers to invest time and effort in monitoring until they can see the metrics and influence how they look, how the product owner presents them to the CTO at the next briefing, and so on.

In short

  1. Lead the horse to water. Show developers how many problems they can spare themselves, and help them identify the right KPIs and metrics for their applications, so that there is less yelling from the product owner, who in turn gets yelled at by the CTO. Bring them out into the light, gently and calmly. If that doesn't work, then bribe, threaten, and coax either them or the product owner into getting those metrics out of the apps as quickly as possible, and then graph them. This will be difficult, because it will not be seen as a priority and the product roadmap will be full of pending revenue-generating projects. You will therefore need a business case to justify the time and expense of building monitoring into the product.
  2. Help systems engineers sleep. Show them that a release checklist for every product being released is a good thing. Making sure all applications in production are covered by metrics helps them sleep soundly at night, because developers can see what is going wrong and where. However, a sure way to irritate and frustrate any developer, product owner, and CTO is to keep putting a spoke in the wheel and pushing back. That behavior will affect the release date of any product if you wait until the last minute, so once again, shift left and get these concerns into the project plan as early as possible. If necessary, sneak into product meetings. Wear a fake mustache and a felt hat or something like that; it never fails. Communicate your concerns, show clear benefits, and evangelize.
  3. Make sure that both development (dev) and operations (ops) understand the meaning and impact of product metrics going into the red zone. Don't leave operations as the sole guardian of product health; make sure the developers are involved too (#productsquads).
  4. Logs are great, but so are metrics. Combine them, and don't let your logs turn into garbage in a huge flaming ball of uselessness. Explain and show the developers why no one but them will understand their logs, and show them what it's like to stare at useless logs at 3:15 in the morning.
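As one possible illustration of point 4, here is a minimal sketch using only Python's standard `logging` module. It pairs a JSON formatter, which makes each log line self-describing, with a handler that counts records per level, which turns the log stream into a simple metric. The class names, the `build_logger` helper, the logger name and the `context` extra field are all assumptions made for this example, not an established convention.

```python
import json
import logging
from collections import Counter


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object that is easy to filter."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is a hypothetical extra field carrying business details.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        return json.dumps(payload, sort_keys=True)


class LevelCountingHandler(logging.Handler):
    """Count records per level, turning the log stream into a simple metric."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        self.counts[record.levelname] += 1


def build_logger(name="payments"):
    """Wire both handlers to one logger; the default name is an assumption."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    stream = logging.StreamHandler()
    stream.setFormatter(JsonFormatter())
    counter = LevelCountingHandler()
    logger.addHandler(stream)
    logger.addHandler(counter)
    return logger, counter
```

A call such as `logger.error("card declined", extra={"context": {"order_id": 42}})` then produces a line that says which order failed, while `counter.counts["ERROR"]` gives a number that can be graphed and alerted on. Whoever is awake at 3:15 in the morning gets both the trend and the detail.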

Photo by Marko Croatia on Unsplash

That's all. New material will be released next week. If you would like to learn more about the course, please come to the Open Day, which will take place on Monday. And now, as usual, we look forward to your comments.

Source: habr.com
