We save time, nerves and man-hours

Our projects are usually regional, and customers, as a rule, are ministries. But, in addition to the public sector, private organizations also use our systems. There are practically no problems with them.

So, the main projects are regional, and sometimes there are problems with them. For example, with performance, when in the regions there are from 20k of our precious users during the period of rolling out new functionality to product servers. It is a pain…

My name is Ruslan and I am engaged in maintenance of information systems "BARS Group" and developing a killer bot for brutal serial DBAs. The post is not for the faint of heart - a lot of letters and pictures.

We save time, nerves and man-hours

/awr

Some of our applications run on Oracle DBMS. There are also projects on the PostgreSQL DBMS. Oracle has a wonderful thing - collecting statistics on the load on the DBMS, which highlights the existing problems and even gives recommendations for elimination - Automatic Workload Repository (AWR). At one point (namely, at the moment of pain), the developers constantly asked to collect AWR reports for performance analysis. We honestly went to the DBMS server, collected reports, dragged them to ourselves and sent them to production for analysis. After the 5th time, it began to strain ... after the 10th - to cause irritation ...

One of my colleagues once suggested that everything that is done more than once should be automated. Until the moment of irritation, to be honest, I didn’t think about it and tried to automate everything that could be automated, but often this was not in demand and was more of a research than an applied nature.

And then I thought: “No admins needed to generate a report…”. After all, collecting a report means executing the sql script @$ORACLE_HOME/rdbms/admin/awrrpt.sql and taking the report from the server to yourself ... Oh, yes, we don’t let the development go for production.

Then I googled the necessary information, created a function from the article on a test base, pulled the script and a miracle - the report was assembled and it can be saved locally. Created functions where AWR reports were often needed, told developers how to use them.

Around this time, in my spare time, after talking with @BotFather, I created a Telegram bot for myself, just for fun. I screwed a simple functionality there - show the current time, exchange rates, weather, taught him to send compliments to my wife (then a girl) on schedule. Perhaps, at that time, sending compliments was the most requested functionality of my bot, my wife appreciated it.

So. Developers write to us in Telegram, we send a report to them in Telegram... But what if they write not to us, but to the bot? After all, it will be better for everyone, the report will be received faster, and most importantly, bypass us. This is how the idea of ​​the first requested functionality for my bot was born.

I started implementation. I did my best in PHP (actually our application is in PHP, I understand it more than in the same Python). The coder from me is so-so, so I won’t show my code 🙂

The bot lives in our corporate network, it has access to certain projects, including target databases. In order not to bother with the parameters in the command or with the menu, I attached this functionality to the group chat with monitoring notifications. So the bot immediately knows which database to collect the report on.

Having received a command like /awr N, where N is the number of full hours for which the report is needed (default is 1 hour), at least for a week, if the database has not been restarted, the bot immediately starts working, collects the report, publishes it as a web page and immediately (almost right there) provides a link to a much-needed report.

We follow the link and here it is, the AWR report:

We save time, nerves and man-hours

As expected, the developers coped with such report generation, someone even thanked.

Assessing the convenience of the team, project managers from other regions wanted the same, because they get the most people from the customer, they worry about the performance and availability of systems. Added a bot to the rest of the chats. They still use it and I'm happy about it.

Later, colleagues from the CIT found out about how we collect reports, and they also wanted to. I did not add them to our chats, I created a separate chat with the generation of reports on a schedule and on request.

/pgBadger

We also have other applications, in PHP in conjunction with PostgreSQL. Implemented pgBadger report collection for those in need according to the same principle - in group chats. At first they used it, but then they stopped. The functionality was cut out as unnecessary.

/duty

Our department has night shifts and, accordingly, there is a schedule. It is in google sheets. It is not always convenient to look for a link, open a chart, look for yourself... One of the former colleagues also played with his Telegram bot and introduced it into the chat of our department notifications about the beginning of the duty shift for the employees of the department. The bot parses the schedule, determines the person on duty by the current date and, according to the schedule or upon request, reports who is on duty today. It turned out great, comfortable. However, I did not really like the format of messages. Also, employees of another department (for example, BC "Medicine") do not really need information about those on duty in other areas, but they need to know who is on duty in "Medicine" in case of problems. I decided to "borrow" the functionality, but change what I didn't like. I made the format of messages convenient for myself and for others, removing unnecessary information.

/tnls

After the “pen test” of automation through the Telegram bot, many different ideas appeared, but I wanted to do strictly necessary things. Decided to lead request statistics. To access the projects of our customers, we have implemented the so-called "jump server" or forwarding server. VPN connections are raised on it, then through ssh, application ports, databases and other auxiliary forwarding are forwarded to our local network, for the convenience of accessing our employees' projects, without problems with VPN connections. It is enough to set up a VPN connection to our corporate network.

The statistics of calls suggested that often, after the fall of one of the tunnels (in case of network problems, due to a timeout, for example), they turn to restore access to the project. In most cases, simply restarting the connection is enough and everything is fine. And let's do it yourself. Here is the command:
We save time, nerves and man-hours

You “fall through” to the desired menu item, choose your project, wait a minute and everyone is happy and satisfied ...

When receiving a command, with a slight movement of the hand of bytes and bits, the bot connects to the forwarding server, knowing in advance which forwarding needs to be restarted, and does its job - restores the connection to the project. Wrote instructions to solve such issues on their own. And they contacted us only if the provided tool does not work ...

/ecp_to_pem

Further statistics showed that it is often necessary to convert EDS Crypto Pro in pem-format(Base64) for various integrations, and we have a lot of them. Task: you take a container, copy it to a Windows computer with the P12FromGostCSP utility installed (by the way, paid), convert it to pfx, and then convert pfx using OpenSSL (with support for GOST encryption) to pem. Not very convenient, but you want it at the click of your fingers.

Google has come to the rescue again. Found some kind person's utility. Collected, as it is written in README - earned. Taught the bot to work with the utility and got almost instant conversion.
We save time, nerves and man-hours

By the time of the final implementation, an order was issued to switch to a new encryption format - gost-2012. As far as I remember, the utility at that moment only worked with the old GOST (2001), perhaps it was generally another similar utility from another kind person, I don’t remember exactly.
After the transition to the new GOST, the functionality of the bot was removed for security reasons. Implemented it in a docker container.

Dockerfile, suddenly someone needs it:

FROM ubuntu:16.04                                                                                                                                                                        
RUN apt update && apt -y install git sudo wget unzip gcc g++ make &&                        
   cd /srv/ && git clone https://github.com/kov-serg/get-cpcert.git &&                     
   cd get-cpcert && chmod +x *.sh && ./prepare.sh && ./build.sh &&                         
   mkdir -p /srv/{in,out} &&                                                               
   echo '#!/bin/bash' > /srv/getpem.sh &&                                                  
   echo 'cd /srv/get-cpcert' >> /srv/getpem.sh &&                                          
   echo './get-cpcert /srv/in/$CONT.000 $PASS > /srv/out/$CONT.pem' >> /srv/getpem.sh &&   
   chmod +x /srv/getpem.sh                                                                  ENTRYPOINT /srv/getpem.sh

To convert, you need to place the source container (directory like xxx.000) in the /srv/in directory, and pick up the finished pem in /srv/out.

To convert do:

 docker run -t -i -e CONT='<имя директории с контейнером(без ".000")>' -e PASS='<пароль для контейнера>' -v /srv/in:/srv/in -v /srv/out:/srv/out --name ecptopem <адрес нашего репозитория>/med/ecptopem:latest 

/emstop and /emstart

One day, a very cool Oracle DBA got a job in our company, with a lot of experience in DBMS administration and development. And right off the bat he didn’t succeed with an ssh connection to the DBMS servers: either he doesn’t know where and how to connect, then the accesses are incomprehensibly described, then he can’t forward something necessary to himself. Well, we are happy to help, told how to connect, forwarded Enterprise Manager for him. But with ssh it still didn't work out. One of my colleagues explained it simply: DBA-thoroughbred 🙂 We decided that if we need to tweak something on the server, we will do it ourselves.

EM sometimes crashes under heavy load, and restarting it ... you need to connect via ssh and restart through the terminal. “Admins are good at it,” our new colleague decided. Large loads on the DBMS are not uncommon for us, and requests to restart EM are not uncommon. Then the same scenario follows: tension, irritation and the search for a solution to the problem. This is how the following commands appeared in the same group chats: /emstop and /emstart.

We save time, nerves and man-hours

/ kill

In the case of strong competition on the database, and this sometimes happens, it is necessary to quickly unload the database. The fastest way is to kill the problematic process... To do this, connect via ssh, kill -9... The bot will help!

We save time, nerves and man-hours

Alexey appreciated the team and gave it an affectionate name - "Kilyalka" or a gun.
One day, after watching how Aleksey tries and suffers, entering /kill xxx each time for each of the processes, I decided to add "multi-barrel" to our gun:

We save time, nerves and man-hours

That's better! All for you, Alexey, just work, dear!

Naturally, to such an important team was limited access by user_id - "fool protection". Seeing how Lesha deftly kills processes on the database server, several people tried to enter a command with a random process number, but you can’t fool my smart bot, he immediately refused.

/alertlog

Well, just in case, I made the command:
/alertlog <number of lines> — get the specified number of alertlog lines
The bot pulls the alertlog and sends it to our service, like pastebin, called pyste, and sends a link to the paste to the request chat.

/checks

Next came a request for monitoring the real performance of our application. Until now, the technical support of the project has been collecting this data by hand. Not the point! Our valiant testers have developed test cases for this. The resulting test log is not very convenient to read, an inexperienced user will take a long time to figure it out and it’s not a fact that he will highlight the necessary information. And we don't like to do with our hands what we can do with our hands… A new task for the bot!

We save time, nerves and man-hours

The /checks command gives a simple and unambiguous menu, this time our guys have learned how to use this command without instructions!

When you select the desired item, instead of a menu, a notification about the start of the test appears so that impatient users do not run our test 100500 times:

We save time, nerves and man-hours

Depending on the selected menu item, a certain test is launched from our network, namely from the machine where the bot lives (jmeter is pre-configured there, the necessary tests are placed ...) or immediately from the data center (from the prepared machine next to the application), in order to exclude network delays, well, or minimize them.

After the test is completed and the log is received, the bot parses it and gives the result in a “human-readable” form:

We save time, nerves and man-hours

Collection of metrics

The functionality has “entered” and willing project managers have received such a function for their regions. And one compassionate Project Manager said: “I want to have time statistics!” Someone from CIT suggested to her that it would be convenient to monitor all this in Zabbix. Zabbix, so Zabbix...

I thought that I needed to prepare for the need to replicate the solution ... I formalized the idea in a docker container. In the container, according to the schedule (once every 10 minutes), jmeter is launched, puts the log in a certain place, php parses it and displays the necessary data in the form of a web page. Zabbix gets this page using the web.page.get key, selects the necessary data for certain dependent elements regularly and builds a graph.

We save time, nerves and man-hours

Guess it didn't work out too bad. By observing the graph, we, firstly, see the approximate speed of the application, and if peaks are found on the graph, we know approximately where the “plug” is. Everything is simple. So far, it has turned out to be in demand only for one region, but I am already ready to replicate for those who wish.

Application development

Statistics on the same type of tasks not so long ago threw more ideas for simplifying and facilitating work. On some projects, on application servers, there is a need to install Crypto Pro key containers, there are many of them, and the EDS expires over time. Sometimes a day "arrives" on 2 tasks. But I considered it unsafe to use the bot for these purposes and decided that I would make the functionality directly in the application. Naturally with authorization and verification of access rights. If you have the necessary privileges, an additional menu item will be available for working with EDS, installing, deleting, viewing information, etc. At the moment, the functionality is under development. As it turned out, this is not very difficult, you just need to read the available instructions a little, look at code examples, ask around for more experienced colleagues in the development, well, do it. In the process of research, there were ideas for adding to the application. I will not build Napoleonic plans - there is a development, let everyone mind their own business. But while it's interesting - I'm doing it myself.

Plans

As I said, many different ideas for using our bot were born and not only - in general, let's say, ideas for "automation points", many of them were forgotten, since I did not have time to write them down. Now I try to write down everything that comes to mind, and I encourage others to do the same.

But Alexey does not forget to throw up his Wishlist. From the latest:
/kill_sql SQL_ID - kill all sessions with this SQL_ID query
/kill_block - kill root blocking session
/show_em - show EM performance picture
Cunning, wants to DBA from the phone =)

This is how we work for the good of the Motherland!

And how do you save yourself from routine and uninteresting tasks?

I hope the reading turned out to be interesting, and maybe even useful to someone, and I did not have time to tire the reader ... Good luck to all of us.

Source: habr.com

Add a comment