From outsourcing to development (Part 1)

Hello everyone, my name is Sergey Emelyanchik. I am the head of the Audit-Telecom company and the main developer and author of the Veliam system. I decided to write an article about how a friend and I created an outsourcing company, wrote software for our own needs and later began distributing it to everyone as a SaaS product, and about how categorically I did not believe that this was possible. The article contains not only the story but also technical details of how the Veliam product was created, including some pieces of source code. I will describe the mistakes that were made and how they were later corrected. There were doubts about whether to publish such an article. But I decided it was better to do it, get feedback and fix things, than not to publish it and keep wondering what would have happened if ...

Background

I worked as an employee in an IT company. The company was quite large, with an extensive network infrastructure. I will not dwell on my job duties; I will only say that they definitely did not include developing anything.

We already had monitoring, but purely out of academic interest I wanted to try writing my own, simplest one. The idea was this: I wanted it to be web-based, so that I could easily open it and see what was happening with the network from any device, including a mobile phone over Wi-Fi, without installing any clients. I also really wanted to understand quickly which room contained the equipment that had "gone crazy", because the response-time requirements for such problems were very strict. As a result, a plan was born in my head: write a simple web page with a JPEG of the network diagram as the background, cut the devices with their IP addresses out of that picture, and overlay dynamic content at the right coordinates in the form of a green or flashing red IP address. The task was set, let's get started.

I had previously programmed in Delphi, PHP, JS and, very superficially, C++. I am fairly good with networking: VLANs, routing (OSPF, EIGRP, BGP), NAT. This was enough to write a primitive monitoring prototype on my own.

I implemented the idea in PHP. The Apache and PHP server ran on Windows; Linux at that moment was something incomprehensible and very complicated for me. As it turned out later, I was very wrong, and in many respects Linux is much simpler than Windows, but that is a separate topic and we all know how many holy wars there are around it. The Windows Task Scheduler ran a PHP script at a short interval (I don't remember exactly, but something like once every three seconds) that polled all objects with a banal ping and saved their state to a file.

system("ping -n 3 -w 100 {$ip_address}");

Yes, yes, I had not yet mastered working with a database at that point either. I did not know that processes could be run in parallel, so going through all the network nodes took a long time, because it all happened in one thread. Problems arose especially when several nodes were unavailable, because each of them held up the script for 300 ms. On the client side there was a simple looped function that, every few seconds, fetched updated information from the server with an AJAX request and refreshed the interface. And then, after 3 failed pings in a row, if a web page with the monitoring was open on the computer, a cheerful tune played.
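
To give a sense of how primitive it was, here is a minimal sketch of that first single-threaded loop; the node list and the state file name are placeholders, this is not the original code:

<?php
// Minimal sketch of the first single-threaded poller (illustrative, not the original code).
$nodes = ['10.0.0.1', '10.0.0.2', '10.0.0.3'];

$state = [];
foreach ($nodes as $ip) {
    // Each unreachable node blocks the whole loop for the full timeout.
    exec("ping -n 3 -w 100 " . escapeshellarg($ip), $output, $code);
    $state[$ip] = ($code === 0) ? 'up' : 'down';
}

// The web page polled a file like this with AJAX every few seconds.
file_put_contents('state.json', json_encode($state));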

When it all worked, I was very inspired by the result and thought I could add more (thanks to my knowledge and capabilities). But I have always disliked systems with a million graphs, which I believed then, and still believe today, are unnecessary in most cases. I wanted to add only the things that would really help me in my work. That principle remains fundamental in the development of Veliam to this day. Next, I realized it would be very cool if I did not have to keep the monitoring open to learn about problems, but could open the page only when something happened, see where the problematic network node was, and decide what to do next. For some reason I did not read e-mail back then; I simply did not use it. I came across the fact that there are SMS gateways on the Internet to which you can send a GET or POST request, and they will send an SMS to my mobile phone with whatever text I write. I knew immediately that I really wanted this and began studying the documentation. After some time I succeeded, and now I was receiving SMS messages about problems on the network on my mobile phone, with the name of the "fallen" object. Although the system was primitive, I had written it myself, and the most important thing that motivated me to develop it was that it was an applied program that really helped me in my work.
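
The gateway and its parameters are not named here, so the snippet below is just a hedged illustration of the idea: a hypothetical HTTP API that takes the phone number and message text as GET parameters.

<?php
// Illustrative only: sms.example.com and its parameters are made up;
// every real gateway has its own API.
function sendSms(string $phone, string $text): void
{
    $url = 'https://sms.example.com/send?' . http_build_query([
        'apikey' => 'SECRET',
        'to'     => $phone,
        'text'   => $text,
    ]);
    @file_get_contents($url); // fire-and-forget GET request to the gateway
}

sendSms('+70000000000', 'Node 10.0.0.5 (core switch) is DOWN');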

And then came the day when one of the Internet channels at work went down, and my monitoring did not tell me about it, because Google's DNS kept responding to ping just fine. It was time to think about how to monitor whether a communication channel was alive. There were different ideas about how to do it. I did not have access to all the equipment, so I had to figure out how to tell which channel was alive without being able to see it on the network equipment itself. Then a colleague suggested that the route trace to public servers might differ depending on which communication channel was currently carrying the Internet traffic. I checked, and it turned out to be true: the routes differed when tracing.

system("tracert -d -w 500 8.8.8.8");

So another script appeared, or rather, for some reason the trace was added to the end of the same script that pinged all devices on the network. After all, it was another long process that ran in the same thread and slowed down the whole script, but back then that was not so obvious. One way or another it did its job: the expected trace for each channel was hard-coded into the script. This is how the system started working that already monitored (a big word, since there was no collection of any metrics, just ping) the network devices (routers, switches, Wi-Fi, etc.) and the communication channels with the outside world. SMS messages arrived regularly, and the diagram always showed clearly where the problem was.
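
Roughly, the idea looked like this (a sketch, not the original code; the "expected hop" addresses are invented for illustration): run tracert and check which channel's known hop shows up in the output.

<?php
// Illustration of the trace-based channel check (not the original code).
// The expected hop addresses below are made up.
$expectedHop = [
    'ISP-1' => '203.0.113.1',
    'ISP-2' => '198.51.100.1',
];

exec("tracert -d -w 500 8.8.8.8", $traceLines);
$trace = implode("\n", $traceLines);

foreach ($expectedHop as $channel => $hop) {
    if (strpos($trace, $hop) !== false) {
        echo "Traffic currently goes through {$channel}\n";
    }
}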

Later, in everyday work, I had to deal with cross-connects, and every time I had to log into the Cisco switches just to see which interface to use, I got tired of it. How cool it would be to click an object in the monitoring and see a list of its interfaces with descriptions. It would save me time. Moreover, with that scheme there would be no need to run PuTTY or SecureCRT and enter accounts and commands. I would just click in the monitoring, see what I needed and go do my job. I started looking into how to interact with the switches. Two options immediately came up: SNMP, or connecting to the switch via SSH, entering the commands I needed and parsing the result. SNMP was dismissed because of the complexity of implementation; I was itching to get a result. With SNMP you would have to dig through the MIB for a long time and build the interface data from it, while Cisco has a great command

show interface status

that shows exactly what I need for cross-connects. Why bother with SNMP when all I want is the output of this command, I thought. Some time later I implemented this feature. I clicked an object on the web page; that fired an event in which the client sent an AJAX request to the server, and the server in turn connected via SSH to the switch I needed (the credentials were hard-coded; there was no desire to polish it or make separate menus where accounts could be changed from the interface, I needed the result and quickly), entered the command above and returned the output to the browser. So I began to see interface information with a single mouse click. It was extremely convenient, especially when I had to look at this information on several switches at once.
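
The original handler is not shown in the article, so here is a hedged sketch of the idea using the PHP SSH2 extension that appears later in the text; the address and credentials are placeholders.

<?php
// Sketch of the "click → SSH → show interface status" handler (illustrative).
$switchIp = $_GET['ip'] ?? '192.168.1.2';   // placeholder
$login    = 'admin';                         // credentials were hard-coded back then
$password = 'secret';

$connection = ssh2_connect($switchIp, 22);
if (!$connection || !ssh2_auth_password($connection, $login, $password)) {
    http_response_code(500);
    exit('SSH connection failed');
}

$stream = ssh2_exec($connection, 'show interface status');
stream_set_blocking($stream, true);

// Return the raw command output; the page simply showed it as-is.
header('Content-Type: text/plain');
echo stream_get_contents($stream);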

Trace-based channel monitoring turned out not to be the best idea in the end, because sometimes maintenance was carried out on the network, the trace changed, and the monitoring started screaming at me that there were problems with a channel. After spending a lot of time on analysis I would find that all the channels were fine and my monitoring was deceiving me. As a result, I asked the colleagues who managed the channel-facing switches to simply send me a syslog message whenever the neighbor state changed. That was much simpler, faster and more truthful than tracing: a "neighbor lost" event arrives, and I immediately send a notification that the channel is down.
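
The article does not show how those syslog messages were consumed, so the following is only a rough sketch of the idea: a UDP listener on the syslog port that reacts to neighbor-state messages (the message matching is deliberately simplified and the wording differs per vendor and protocol).

<?php
// Illustrative syslog listener (not the original code): react to neighbor-state
// messages sent by the channel-facing switches.
$sock = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($sock, '0.0.0.0', 514); // standard syslog UDP port

while (true) {
    socket_recvfrom($sock, $message, 2048, 0, $senderIp, $senderPort);
    // Very crude filter, for illustration only.
    if (stripos($message, 'neighbor') !== false && stripos($message, 'down') !== false) {
        // Here the original system sent an SMS saying which channel went down.
        error_log("Channel problem reported by {$senderIp}: {$message}");
    }
}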

Then a few more command outputs were added on object click, SNMP was added to collect some metrics, and that was largely it. The system was not developed any further. It did everything I needed; it was a good tool. Many readers will probably tell me that there is already plenty of software on the Internet to solve these problems. But honestly, I did not google such free products at the time, I really wanted to develop my programming skills, and what could push me harder than a real applied task? At that point the first version of the monitoring was finished and no longer modified.

Founding the Audit-Telecom company

Time went on, and I began to moonlight for other companies, since my work schedule allowed it. When you work in different companies, your skills in various areas grow very quickly and your horizons broaden. There are companies where, as they say, you are a jack of all trades. On the one hand that is hard; on the other, if you are not lazy, you become a generalist, and that lets you solve problems faster and more efficiently, because you know how the adjacent areas work.

My friend Pavel (also an IT specialist) constantly tried to push me towards starting our own business. There were countless ideas, in countless variations. This went on for more than a year. And nothing should have come of it, because I am a skeptic and Pavel is a dreamer. Every time he came up with an idea, I did not believe in it and refused to take part. But we really wanted to open our own business.

Finally we found an option that suited us both: doing what we already knew how to do. In 2016 we decided to create an IT company that would help businesses solve IT problems: deploying IT systems (1C, terminal servers, mail servers, etc.), maintaining them, classic HelpDesk for users, and network administration.

Frankly, at the moment the company was founded I did not believe in it by about 99.9%. But somehow Pavel managed to get me to try, and, looking ahead, he was right. Pavel and I chipped in 300 rubles each, registered a new LLC, Audit-Telecom, rented a tiny office, printed cool business cards - in general, did everything that most inexperienced, novice businessmen probably do - and began looking for clients. Finding clients is a whole separate story; perhaps we will write a separate article on our corporate blog if anyone is interested. Cold calls, flyers and so on; none of it produced any results. As I now read in many business stories, one way or another a lot depends on luck. We were lucky: just a couple of weeks after the company was founded, my brother Vladimir approached us and brought us our first client. I will not bore you with the details of working with clients, the article is not about that; I will only say that we went to do an audit, identified the critical points, and those points broke down while the client was deciding whether to cooperate with us on a permanent basis as outsourcers. After that, a positive decision was made immediately.

After that, mostly by word of mouth through acquaintances, other companies to service began to appear. The HelpDesk was one system. Connections to network equipment and servers were in another, or rather several others: someone saved shortcuts, someone used RDP address books. Monitoring was yet another separate system. It is very inconvenient for a team to work in disparate systems; important information gets lost. For example, the client's terminal server becomes unavailable, and requests from that client's users immediately come in. The support engineer opens a ticket (received by phone). If incidents and tickets were registered in the same system, the engineer would immediately see what the user's problem is and tell him about it, while already connecting to the affected object to sort out the situation. Everyone would be aware of the tactical situation and work smoothly. We did not find a system that combined all of this. It became clear it was time to build our own product.

Continuing work on our own monitoring system

It was clear that the system written earlier did not fit the current tasks at all, neither in functionality nor in quality, and we decided to write the system from scratch. Graphically it had to look completely different. It had to be hierarchical, so that the right object of the right client could be opened quickly and conveniently. The diagram-over-a-picture approach of the first version made no sense now, because the clients were different and it did not matter at all which room the equipment was in; that information had already moved into the documentation.

So, the tasks:

  1. A hierarchical structure;
  2. Some kind of server component that can be placed at the client's site as a virtual machine, collect the metrics we need and send them to the central server, which aggregates all of it and shows it to us;
  3. Alerts that cannot be missed, because at that time it was not possible to have someone just sit and watch the monitor;
  4. A ticketing system, since clients had appeared for whom we serviced not only server and network equipment but also workstations;
  5. The ability to quickly connect to servers and equipment from the system.

The tasks were set, and we started writing, handling client requests along the way. By that time there were already four of us. We started writing both parts at once: the central server and the server to be installed at clients. By this point Linux was no longer foreign to us, and we decided that the virtual machines the clients would get would run Debian. There would be no installers; we would simply build the server part on one specific virtual machine and then just clone it to each new client. This was another mistake. It later became clear that with such a scheme the update mechanism had not been thought through at all: when we added a new feature, distributing it to all client servers was a whole problem. But we will come back to that later; everything in order.

We made the first prototype. It could ping the client's network devices and the servers we needed and send that data to our central server, which in turn updated it in the overall picture. Here I will write not only about how things worked, but also about the amateurish mistakes that were made and how I later had to pay for them with time. So, the entire object tree was stored in one single file as a serialized object. While only a few clients were connected to the system, everything was more or less fine, although sometimes there were artifacts that were completely inexplicable. But when we connected a dozen servers to the system, miracles began to happen: sometimes, for no apparent reason, all the objects in the system simply disappeared. It is important to note here that the servers located at the clients sent data to the central server every few seconds via a POST request. An attentive reader and an experienced programmer have already guessed that there was a problem of multiple simultaneous access, from different threads, to the very file in which the serialized object was stored. And exactly when that happened, the miracles with disappearing objects occurred: the file simply ended up empty. But all of this was discovered not right away, only after running with several servers. During this time, port scanning functionality was added (the servers sent the central server not only information about the availability of devices, but also about the ports open on them). This was done by calling the command:

$connection = @fsockopen($ip, $port, $errno, $errstr, 0.5);

The results were often incorrect, and scanning took a very long time. And I almost forgot about ping: it was done via fping:

system("fping -r 3 -t 100 {$this->ip}");

None of this was parallelized either, so the whole process took very long. Later, the entire list of IP addresses to check was passed to fping at once, and we got back a ready-made list of the ones that responded. Unlike us, fping could parallelize its work.
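
A sketch of that batched approach (illustrative, not the original code): hand fping the whole list at once and mark which hosts answered.

<?php
// Sketch of the batched fping call (illustrative, not the original code).
$ips = ['192.168.0.1', '192.168.0.2', '192.168.0.3'];

// -a prints only alive hosts; -r 3 retries; -t 100 is the per-ping timeout in ms.
$cmd = 'fping -a -r 3 -t 100 ' . implode(' ', array_map('escapeshellarg', $ips)) . ' 2>/dev/null';
exec($cmd, $aliveList);

$alive = array_fill_keys($aliveList, true);
foreach ($ips as $ip) {
    echo $ip . ' is ' . (isset($alive[$ip]) ? 'up' : 'down') . PHP_EOL;
}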

Another frequent routine task was configuring various services through the web. For example, the ECP of MS Exchange: basically, it is just a link. And we decided we needed the ability to add such links directly into the system, so as not to dig through documentation or bookmarks to find how to reach a particular client's ECP. This is how the concept of resource links appeared in the system; their functionality is available to this day and has hardly changed.

How resource links work in Veliam (screenshot)

Remote connections

Here is what it looks like in action in the current version of Veliam (screenshot)

One of the tasks was fast and convenient connection to servers, of which there were already many (more than a hundred), and clicking through a million pre-saved RDP shortcuts was extremely inconvenient. A tool was needed. There is software on the Internet that acts as a kind of address book for such RDP connections, but it is not integrated with a monitoring system, and credentials cannot be saved. Entering credentials for different clients every time is pure hell when you connect to different servers more than a dozen times a day. With SSH things are a bit better: there is a lot of good software that can sort such connections into folders and remember the credentials for them. But there are two problems. The first is that we did not find a single program that handled both RDP and SSH connections. The second is that if at some point I am not at my own computer and need to connect quickly, or I have just reinstalled the system, I have to go into the documentation to look up the credentials for that client. That is inconvenient and a waste of time.

The hierarchical structure we needed for client servers already existed in our internal product. We only had to figure out how to attach quick connections to the necessary equipment there - for starters, at least within our own network.

Given that the client of our system was a browser, which has no access to the local resources of the computer, the only way to simply launch the application we needed with some command was to do everything through a custom URL scheme in Windows. This is how a certain "plugin" for our system appeared, which simply bundled PuTTY and Remote Desktop Plus and, during installation, registered the URI schemes in Windows. Now, when we wanted to connect to an object via RDP or SSH, we clicked that action in our system and the custom URI fired: the standard mstsc.exe built into Windows, or PuTTY, which was part of the "plugin", was launched. I put the word plugin in quotation marks because it is not a browser plugin in the classic sense.

It was at least something - a convenient address book. Moreover, with PuTTY everything was just fine: it could take the IP address, login and password as input parameters. That is, we were already connecting to Linux servers on our network with one click, without entering passwords. But with RDP it is not so simple: you cannot pass credentials as parameters to the standard mstsc. Remote Desktop Plus came to the rescue: it allowed exactly that. We now manage without it, but for a long time it was a faithful assistant in our system. With HTTP(S) sites everything is simple: such objects just opened in the browser, and that's it. Convenient and practical. But this happiness existed only on the internal network.

Since we solved the vast majority of problems remotely from the office, the easiest thing was to set up VPNs to the clients. Then it was possible to connect to them from our system. But it was still a little uncomfortable: for each client, a bunch of saved VPN connections had to be kept on every computer, and before connecting anywhere you had to enable the corresponding VPN. We used that solution for quite a while. But the number of clients kept growing, the number of VPNs along with it, and all of this started to wear on us; something had to be done. Tears especially welled up in my eyes after reinstalling the operating system, when dozens of VPN connections had to be re-created in the fresh Windows profile. Stop tolerating this, I said, and began thinking about what could be done.

It so happened that all the clients had routers from the well-known company Mikrotik. They are very functional and convenient for almost any task. Among the downsides: they get hacked. We solved that problem simply by closing all access from the outside. But we still needed some way to access them without going to the client's site, because that takes too long. We simply set up tunnels to each such Mikrotik and allocated them to a separate pool, without any routing, so that our network was not joined to the clients' networks, nor their networks to each other.

The idea was born to make it so that when I click on the object I need in the system, the central monitoring server, knowing the SSH credentials of all the client Mikrotiks, connects to the right one and creates a forwarding rule to the right host and port. There are several caveats here. The solution is not universal: it works only for Mikrotik, since every vendor has its own command syntax. Also, such forwardings then had to be removed somehow, and the server part of our system had no way of tracking whether my RDP session had ended. And such a forwarding is a hole for the client. But we were not after universality, because the product was used only inside our company and there was not even a thought of releasing it to the public.
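
A hedged sketch of that mechanism (addresses, ports and credentials are placeholders, and the comment naming is only assumed from the cleanup script shown below): the central server connects to the client's Mikrotik over SSH and adds a dst-nat rule, restricted to our own external address.

<?php
// Illustrative sketch: create a temporary dst-nat (port forwarding) rule on a
// client's Mikrotik over SSH. All addresses, ports and credentials are placeholders.
$mikrotikIp   = '203.0.113.10';
$sshLogin     = 'admin';
$sshPassword  = 'secret';
$targetHost   = '192.168.88.20'; // server inside the client's network
$targetPort   = 3389;            // RDP
$externalPort = 50123;           // port opened on the Mikrotik
$ourOfficeIp  = '198.51.100.5';  // the only address allowed to use the forwarding

$connection = ssh2_connect($mikrotikIp, 22);
ssh2_auth_password($connection, $sshLogin, $sshPassword);

$cmd = "/ip firewall nat add chain=dstnat protocol=tcp"
     . " dst-port={$externalPort} src-address={$ourOfficeIp}"
     . " action=dst-nat to-addresses={$targetHost} to-ports={$targetPort}"
     . " comment=\"atmon_script_main_{$externalPort}\"";

$stream = ssh2_exec($connection, $cmd);
stream_set_blocking($stream, true);
stream_get_contents($stream); // wait until the command finishes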

Each of the problems was solved in its own way. When a rule was created, the forwarding was made available only to one specific external IP address (the one from which the connection was initiated). That closed the security hole. But with each such connection a rule was added to the Mikrotik's NAT table and never cleaned up. And everyone knows that the more rules there are, the more loaded the router's CPU is. Besides, I simply could not accept that one day I would log into some Mikrotik and find hundreds of stale rules that nobody needed.

Since our server could not track connection state, let the Mikrotik track it itself. So I wrote a script that ran on the router, constantly watched all forwarding rules with a specific comment and checked whether there was a matching TCP connection. If there had been none for some time, the connection was probably over and the forwarding could be deleted. It all worked out; the script did its job well.

By the way, here it is:

:global atmonrulecounter {"dontDelete"="dontDelete"}
:foreach i in=[/ip firewall nat find comment~"atmon_script_main"] do={
	:local dstport [/ip firewall nat get value-name="dst-port" $i]
	:local dstaddress [/ip firewall nat get value-name="dst-address" $i]
	:local dstaddrport "$dstaddress:$dstport"
	#:log warning message=$dstaddrport
	:local thereIsCon [/ip firewall connection find dst-address~"$dstaddrport"]
	:if ($thereIsCon = "") do={
		:set ($atmonrulecounter->$dstport) ($atmonrulecounter->$dstport + 1)
		#:log warning message=($atmonrulecounter->$dstport)
		:if (($atmonrulecounter->$dstport) > 5) do={
			#:log warning message="Removing nat rules added automatically by atmon_script"
			/ip firewall nat remove [/ip firewall nat find comment~"atmon_script_main_$dstport"]
			/ip firewall nat remove [/ip firewall nat find comment~"atmon_script_sub_$dstport"]
			:set ($atmonrulecounter->$dstport) 0
		}
	} else={
		:set ($atmonrulecounter->$dstport) 0
	}
}

Surely it could have been made prettier, faster and so on, but it worked, did not load the Mikrotiks and did its job perfectly. We could finally connect to our clients' servers and network equipment with a single click, without bringing up a VPN and without entering passwords. The system became really convenient to work with. Service times shrank, and we all spent our time on the actual work rather than on connecting to the right objects.

Mikrotik backup

We had configured backup of all Mikrotiks to FTP, and in general everything was fine - except that when you needed a backup, you had to open that FTP and search for it there. We already had a system where all the routers were registered and we could talk to the devices via SSH. Why not make the system itself take backups from all the Mikrotiks every day, I thought, and started implementing it: connect, make a backup, fetch it into storage.

PHP script for taking a backup from a Mikrotik:

<?php

	$IP = '0.0.0.0';
	$LOGIN = 'admin';
	$PASSWORD = '';
	$BACKUP_NAME = 'test';

    $connection = ssh2_connect($IP, 22);

    if (!ssh2_auth_password($connection, $LOGIN, $PASSWORD)) exit;

    $stream = ssh2_exec($connection, '/system backup save name="atmon" password="atmon"');
    stream_set_blocking($stream, true);
    stream_get_contents($stream); // wait for the command to complete
    $stream = ssh2_exec($connection, '/export file="atmon.rsc"');
    stream_set_blocking($stream, true);
    stream_get_contents($stream);
    sleep(40); // wait for the backup files to be written

    $sftp = ssh2_sftp($connection);

    // Download backup file
    $size = filesize("ssh2.sftp://$sftp/atmon.backup");
    $stream = fopen("ssh2.sftp://$sftp/atmon.backup", 'r');
    $contents = '';
    $read = 0;
    $len = $size;
    while ($read < $len && ($buf = fread($stream, $len - $read))) {
        $read += strlen($buf);
        $contents .= $buf;
    }
    file_put_contents($BACKUP_NAME . '.backup', $contents);
    @fclose($stream);

    sleep(3);
    // Download RSC file
    $size = filesize("ssh2.sftp://$sftp/atmon.rsc");
    $stream = fopen("ssh2.sftp://$sftp/atmon.rsc", 'r');
    $contents = '';
    $read = 0;
    $len = $size;
    while ($read < $len && ($buf = fread($stream, $len - $read))) {
        $read += strlen($buf);
        $contents .= $buf;
    }
    file_put_contents($BACKUP_NAME . '.rsc', $contents);
    @fclose($stream);

    ssh2_exec($connection, '/file remove atmon.backup');
    ssh2_exec($connection, '/file remove atmon.rsc');

?>

The backup is taken in two forms: a binary one and a text config. The binary helps to quickly restore the needed configuration, and the text one lets you understand what has to be done if the hardware had to be replaced and the binary cannot be uploaded to it. As a result, we got one more convenient piece of functionality in the system. Moreover, when adding new Mikrotiks there was nothing to configure: it was enough to add the object to the system and set an SSH account for it; the system then took care of the backups itself. This functionality is not yet available in the SaaS version of Veliam, but we will port it soon.

Screenshots of how it looked in the internal system

Transition to proper storage in a database

I already wrote above that artifacts appeared: sometimes the entire list of objects in the system simply disappeared, and sometimes when editing an object the information was not saved and the object had to be renamed three times. This annoyed everyone terribly. The disappearance of objects happened rarely and was easily fixed by restoring that very file, but failures when editing objects happened quite often. I probably did not use a database initially because I could not get my head around keeping a tree with all its links in a flat table. The table is flat, but the tree is hierarchical. Yet a DBMS is the right solution for concurrent access and, later (as the system grows more complex), for transactions. I am certainly not the first to run into this problem, so it was worth googling. It turned out that everything had been invented before me, and there are several algorithms for building a tree from a flat table. After looking at each one, I implemented one of them. But that was already a new version of the system, because a lot had to be rewritten because of it. The result was natural: the random behavior of the system disappeared.

Some might say that these errors (single-threaded scripts, storing data that was accessed simultaneously from different threads in a file, and so on) are quite amateurish for software development. Maybe so, but my main job was administration; programming was a side occupation for the soul, and I simply had no experience of working in a team of programmers where senior colleagues would have pointed out such elementary things to me. So I collected all these bumps on my own, but I learned the material very well. Besides, running your own business also means meetings with clients, activities aimed at promoting the company, a pile of administrative issues inside the company, and much, much more. One way or another, what we already had was in demand: the guys and I used the product in our daily work. There were frankly unsuccessful ideas and solutions that took time, but eventually it would become clear that a given tool did not work, nobody used it, and it did not make it into Veliam.
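
As an illustration of that kind of approach (the real schema is not shown in the article, so the table and column names here are assumptions), this is the classic adjacency-list technique: each row stores its parent's id, and one pass over the flat result set assembles the tree in memory.

<?php
// Sketch of building a hierarchy from a flat table (illustrative schema:
// objects(id, parent_id, name); parent_id is NULL for root nodes).
$pdo  = new PDO('mysql:host=localhost;dbname=monitoring', 'user', 'password');
$rows = $pdo->query('SELECT id, parent_id, name FROM objects')->fetchAll(PDO::FETCH_ASSOC);

// Index every node by id, then attach each node to its parent's children list.
$nodes = [];
foreach ($rows as $row) {
    $nodes[$row['id']] = $row + ['children' => []];
}

$tree = [];
foreach ($nodes as $id => &$node) {
    if ($node['parent_id'] === null) {
        $tree[] = &$node;                                   // root node
    } else {
        $nodes[$node['parent_id']]['children'][] = &$node;  // attach to parent
    }
}
unset($node);

print_r($tree);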

HelpDesk

It would not be superfluous to mention how the HelpDesk took shape. This is a completely different story, because in Veliam it is already the 3rd, completely new version, different from all the previous ones. Now it is a simple, intuitive system without unnecessary bells and whistles, able to integrate with a domain and to open the same user request from anywhere via a link from the e-mail. And most importantly, it is possible from anywhere (at home or in the office) to connect to the applicant via VNC directly from the ticket, without a VPN or port forwarding. I will tell you how we got there, what it was like before, and what terrible decisions we made.

We connected to users through the well-known TeamViewer. TeamViewer is installed on every computer whose users we serve. The first thing we did wrong, and later removed, was tying every HelpDesk client to hardware. How did a user log into the HelpDesk to leave a request? Besides TeamViewer, a special utility written in Lazarus was installed on every computer (many here will raise their eyebrows and maybe even google what that is, but of the compiled languages I knew Delphi best, and Lazarus is almost the same, only free). In short, the user launched a special batch file that started this utility, which read the HWID of the system, after which the browser opened and authorization took place.

Why was this done? In some companies the number of serviced users is counted per head, and the monthly service price is based on the number of people. That is understandable, you say, but why bind to hardware? Very simple: some individuals went home and made requests from their home laptops in the style of "make everything pretty for me here". Besides reading the system's HWID, the utility also pulled the current TeamViewer ID from the registry and passed it to us as well.

TeamViewer has an API for integration, and we built that integration. But there was one catch: through this API you cannot connect to a user's computer unless the user explicitly initiates the session, and after you try to connect, the user still has to click "confirm". At the time it seemed logical to us that nobody should connect without the user's request; since the person is at the computer, he will initiate the session and answer the remote connection request affirmatively. It turned out otherwise. Applicants forgot to start the session and had to be reminded over the phone. That wasted time and frayed nerves on both sides. Moreover, it is by no means rare that a person leaves a request but only allows the connection when he goes off to lunch, because the problem is not critical and he does not want his work interrupted; accordingly, he will not press any buttons to allow the connection.

This is how additional functionality appeared in the HelpDesk authorization: reading the TeamViewer ID. We knew the permanent password that was used when installing TeamViewer. More precisely, only the system knew it, since it was baked into the installer and into our system. Accordingly, the ticket had a connect button: clicking it meant you did not have to wait for anything, TeamViewer opened immediately and the connection was established. As a result there were two possible kinds of connection: through the official TeamViewer API and through our home-grown one. To my surprise, people stopped using the first one almost immediately, even though the instruction was to use our own method only in special cases and with the user's explicit go-ahead. Although nowadays it is all about security. But it turned out that the applicants did not need it: none of them mind at all being connected to without a confirmation button.

Transition to multithreading in Linux

The question of speeding up the network scanner's pass over a predefined list of ports, and the simple pinging of network objects, had been looming for a long time. The first solution that comes to mind here is, of course, multithreading. Since the main time spent on a ping is waiting for the packet to come back, and the next ping cannot start until the previous one returns, in companies with even 20+ servers plus network equipment this already worked quite slowly. The point is that a single packet may be lost, and that should not immediately notify the system administrator: such spam would very quickly stop being taken seriously. So each object has to be pinged more than once before a conclusion about its unavailability is drawn. Without going into details: this has to be parallelized, otherwise the administrator will most likely learn about the problem from the client rather than from the monitoring system.

PHP out of the box is not capable of multithreading. It can do multiprocessing; you can fork. But I already had the polling mechanism written, and I wanted to make it so that I read all the nodes I needed from the database once, pinged everything at once, waited for an answer from each, and only then wrote the data back in one go. That saves on the number of read queries. Multithreading fit this idea perfectly. For PHP there is a pthreads module that provides real multithreading; it took quite some effort to get it running on PHP 7.2, but it was done. Port scanning and pinging became fast: instead of, say, 15 seconds per cycle earlier, the process began to take 2 seconds. It was a good result.
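
A minimal sketch of what such a parallel ping looks like with pthreads (assuming a ZTS build of PHP 7.x with the pthreads extension; the class and names are illustrative, not the Veliam code):

<?php
// Illustrative parallel ping with pthreads (requires a thread-safe PHP build).
class PingTask extends Thread
{
    public $ip;
    public $alive = false;

    public function __construct(string $ip)
    {
        $this->ip = $ip;
    }

    public function run()
    {
        // One check with retries and a short timeout; exit code 0 means the host replied.
        exec("fping -r 3 -t 100 " . escapeshellarg($this->ip), $out, $code);
        $this->alive = ($code === 0);
    }
}

$ips = ['192.168.0.1', '192.168.0.10', '192.168.0.20'];

$tasks = [];
foreach ($ips as $ip) {
    $task = new PingTask($ip);
    $task->start();       // each host is checked in its own thread
    $tasks[] = $task;
}

$results = [];
foreach ($tasks as $task) {
    $task->join();        // wait for all threads, then read the results
    $results[$task->ip] = $task->alive;
}

print_r($results);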

Quick audit of new companies

How did the functionality for collecting various metrics and hardware characteristics appear? It is simple. We are sometimes asked to simply audit the current IT infrastructure, and the same thing is needed to speed up the audit of a new client. We needed something that would let us come to a medium-sized or large company and quickly find out what they have at all. Ping inside an internal network is blocked, in my opinion, only by those who want to make life harder for themselves, and in our experience there are few such people, though they do exist. Accordingly, a simple ping can quickly scan networks for devices. Then we can add them and scan for the open ports that interest us. In fact, this functionality already existed; it was only necessary to add a command from the central server to the slave so that it scanned the specified networks and added everything it found to the list. I forgot to mention: it was assumed that we already had a ready-made image with the configured system (a slave monitoring server) which we could simply deploy at the client during the audit and hook up to our cloud.

But an audit result usually includes a lot of different information, one item being what kinds of devices are on the network. Above all we were interested in Windows servers and Windows workstations that are part of a domain, since in medium and large companies the absence of a domain is probably the exception to the rule. To speak a common language: medium-sized, in my view, is 100+ people. We needed a way to collect data from all Windows machines and servers, knowing their IPs and a domain administrator account, but without installing any software on them. The WMI interface comes to the rescue: Windows Management Instrumentation (WMI) is one of the basic technologies for centralized management and monitoring of the various parts of a computer infrastructure running on the Windows platform (taken from the wiki). Next I had to tinker again to build wmic (a WMI client) for Debian. Once everything was ready, it only remained to poll the required nodes via wmic for the required information. Through WMI you can get almost any information from a Windows computer, and moreover you can also control the computer through it, for example send it to reboot. This is how the collection of information about Windows stations and servers appeared in our system. On top of that there was current information about the system load indicators: we request those more often, and the hardware information less often. After that, auditing became a little more pleasant.
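
A hedged sketch of such a poll (assuming the wmi-client build of wmic on Debian; the host, credentials and the chosen metric are placeholders, and this is not the original code):

<?php
// Illustrative WMI poll from Linux using the wmic CLI (wmi-client package).
// Host, domain and credentials are placeholders.
$host     = '192.168.10.15';
$user     = 'DOMAIN/administrator';
$password = 'secret';

// CPU load is an example of a "frequently requested" metric; Win32_Processor is a standard WMI class.
$query = 'SELECT Name, LoadPercentage FROM Win32_Processor';

$cmd = sprintf(
    'wmic -U %s //%s %s',
    escapeshellarg($user . '%' . $password),
    $host,
    escapeshellarg($query)
);

// wmic prints a CLASS header followed by delimiter-separated column names and data rows.
exec($cmd, $outputLines);
print_r($outputLines);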

The decision to distribute the software

We use the system ourselves every day, and it is always open on every technical employee's screen. And we thought we could share what we already have with others. The system was not yet ready for distribution: a lot had to be reworked for the local version to become SaaS. These are changes in various technical aspects of how the system works (remote connections, the support service), analysis of the modules for licensing, sharding of customer databases, scaling of each service, and the development of auto-update mechanisms for all the parts. But that will be the second part of the article.

Update: the second part

Source: habr.com
