Freeze or upgrade: what will we do over the holidays?

The New Year holidays are approaching, and on the eve of the holidays and days off it's time to answer the question: what will happen to the IT infrastructure while we are away? How will it survive without us all this time? Or should this time be spent upgrading the IT infrastructure so that “it all works by itself” for the year ahead?

The option in which the whole IT department takes a break along with everyone else (except for the administrators on duty, if any) requires a set of preparatory measures that can be summed up by the general term "freeze".

Planned work is the opposite option: you use the quiet period to calmly carry out any necessary work, for example upgrading network and/or server equipment.

"Freeze"

The basic principle of this strategy is "It works - don't touch it."

Starting from a certain point in time, a moratorium is declared on all work
related to development and improvement.

All questions on improvement and development are postponed to a later time.

Working services are thoroughly tested.

All identified problems are analyzed and divided into two types: easily fixable
and intractable.

Easily fixable problems are first analyzed with the question "what will happen
if?" in mind. Work to eliminate them is carried out only if no
potential difficulties are expected.

Intractable problems are recorded and documented, but fixing them is
postponed until the end of the moratorium.

Before the check, a plan is drawn up that lists the objects to be checked,
the parameters to check and the methods of verification.

For example, for Windows file servers: reading the Event logs, checking the status
of the RAID array, and so on.
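Such a plan can be kept as structured data rather than free text. The following is only a minimal sketch of that idea in Python: the `CheckItem` fields, the status names and the helper functions are assumptions made for illustration, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class CheckItem:
    """One line of the pre-holiday check plan (hypothetical structure)."""
    target: str               # object to be checked
    parameter: str            # what is being checked
    method: str               # how it is verified
    status: str = "pending"   # "pending", "ok" or "problem"
    note: str = ""


def record_result(item: CheckItem, ok: bool, note: str = "") -> None:
    """Store the outcome of a check on the plan item itself."""
    item.status = "ok" if ok else "problem"
    item.note = note


def summarize(plan: list[CheckItem]) -> dict[str, int]:
    """Count how many items are still pending, passed, or found a problem."""
    counts = {"pending": 0, "ok": 0, "problem": 0}
    for item in plan:
        counts[item.status] += 1
    return counts


def build_example_plan() -> list[CheckItem]:
    """Example entries taken from the file server case in the text."""
    return [
        CheckItem("Windows file server", "Event logs", "read the logs for errors"),
        CheckItem("Windows file server", "RAID array status", "check the array state"),
    ]


if __name__ == "__main__":
    plan = build_example_plan()
    record_result(plan[0], ok=True)
    print(summarize(plan))
```

Items that end up with the status "problem" are the ones to sort into easily fixable and intractable, as described above.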

Network infrastructure has its own reporting tools.

For equipment that supports the Zyxel Nebula cloud platform there are, in principle, no special problems: the system works and information is collected.

For firewalls, the role of such a data collector can be taken on by the
SecuReporter service.

The greatest danger to the normal course of events arises during the forced pause, when all the verification work has already ended and the holidays have not yet begun. With time on their hands, employees do not know what to do with themselves. It has been noticed that all the nightmarish problems that required a heap of pointless work to fix began with the words: "I'll just try ...".

Intensive documentation work is perfect for filling such gaps. The benefit is twofold: it keeps playful hands and shining eyes busy, and it reduces the time needed to resolve incidents if they do occur.

On weekends and holidays, employees are often unavailable, so if relevant information is stored only in someone's brilliant head, it's time to transfer it to paper or to a file.

By the way, about paper. Despite accusations of backwardness, hard copies of documents, for example listings of servers with IP and MAC addresses, a network diagram and various procedures, are very useful. This is especially true of power-on and power-off procedures, because a circular situation does occur, although not often: to start the IT infrastructure properly you need to read the documentation before turning on the equipment, and to read the documentation you need to turn on the equipment. A similar situation arises when most of the servers are shut down ahead of a power outage and the required document is stored on one of them. And of course, such situations arise at the most inopportune moment.
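A printable server listing does not need special software. As a rough sketch, assuming the inventory is available as simple (name, IP, MAC) tuples, a few lines of Python can turn it into a fixed-width text table that is easy to print. The function name and the sample host are hypothetical, and the addresses come from ranges reserved for documentation.

```python
def format_inventory(hosts):
    """Return a fixed-width text table for (name, ip, mac) tuples."""
    header = ("Host", "IP address", "MAC address")
    rows = [header] + [tuple(h) for h in hosts]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = []
    for n, row in enumerate(rows):
        cells = (cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
        if n == 0:
            # Separator line under the header.
            lines.append("  ".join("-" * w for w in widths))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data, not real equipment.
    sample = [("fs01", "192.0.2.10", "00:00:5e:00:53:01")]
    print(format_inventory(sample))
```

The output can be redirected to a file and printed together with the network diagram and the power-on and power-off procedures.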

So, all important technical details are documented. What else is there to take care of?

  • Check the video surveillance system and, if necessary, free up space on the
    video storage system.

  • Check alarms, both burglar and fire alarms.

  • Check that the bills for Internet access, domain names, website hosting and
    other cloud services have been paid.

  • Check the availability of spare parts, primarily hard drives and SSDs for replacement in
    RAID arrays.

  • Spare parts should be stored close to the equipment for which they are intended. A disk failing at a remote facility outside the city while the spares sit in the central office is not a pleasant scenario on New Year's Eve.

  • Update the contact list of useful employees, including the secretary (office manager), head of security, supply manager, storekeeper and other employees who are not directly related to the IT department, but may be needed in a critical situation.

IMPORTANT! All the necessary contacts should be available to every employee of the IT department. It’s one thing when people see each other in the office every day and the treasured file with phone numbers and addresses is always available on a shared resource, and quite another when an employee is trying to solve a problem remotely and there is no one in the office.
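Going back to the first item of the checklist, free space on a video storage volume is easy to check with a script. Below is a minimal sketch using Python's standard `shutil.disk_usage`; the 20% threshold and the function names are assumptions for illustration and should be adapted to the actual retention requirements.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the share of free space (0.0 to 1.0) on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_cleanup(path: str, min_free: float = 0.2) -> bool:
    """True if the free share is below `min_free` (an assumed threshold)."""
    return free_fraction(path) < min_free


if __name__ == "__main__":
    # "." stands in for the mount point of the video archive.
    path = "."
    print(f"{path}: {free_fraction(path):.0%} free, "
          f"cleanup needed: {needs_cleanup(path)}")
```

The same check can be run against any other volume that tends to fill up while nobody is watching.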

ATTENTION! If the equipment is located in the data center, you should take care of passes in advance for employees who are allowed access to the equipment on weekends and holidays.

The same applies when the server room is located in a rented building. You can easily run into a situation where, on the orders of the "highest authorities", access is restricted on weekends and holidays and the guards will not even let the system administrator into the building.

It is also worth making sure that remote access works. With servers everything is more or less clear: in extreme cases, if RDP or SSH does not respond, there is IPMI (for example, iLO for HP servers or IMM2 for IBM). With remote equipment it is not so easy.

Users of Zyxel Nebula in this case are in a better situation.

For example, if the Internet gateway was misconfigured during remote work, you can easily end up in the situation where “the key to the first-aid room is kept in the first-aid room.” Then there is only one thing left: to travel to the server room, the office, the data center, the remote facility, and so on.

Luckily for us, Nebula always warns of possible problems related to incorrect configuration.

Most importantly, cloud management uses an outgoing connection: the network device itself establishes a connection to the management environment. That is, you do not need to “dig holes” in the firewall, and there is less risk that a factory reset will close these “holes” again.

TIP. In Nebula, you can enter information about the location of the equipment and the most
important contacts as a note.
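For servers, it makes sense to confirm before leaving that the remote access ports mentioned above actually answer from the place where you will be working. The sketch below only tests whether a TCP connection can be opened; it does not log in or verify credentials. Ports 22 (SSH) and 3389 (RDP) are the standard defaults, while the host names in the comment are purely hypothetical.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_targets(targets, timeout: float = 3.0):
    """Check a list of (host, port) pairs and return {(host, port): bool}."""
    return {(host, port): is_port_open(host, port, timeout)
            for host, port in targets}


# Example call with hypothetical host names:
# check_targets([("fileserver.example.internal", 3389),
#                ("backup.example.internal", 22)])
```

An open port only shows that the service is listening, so a real test login is still worth doing at least once.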

Planned work

Only for ordinary employees are the New Year holidays an unconditional break from work. The IT department is often forced to use these free days as its only opportunity to put the infrastructure in order.

In many cases, instead of riding reindeer, you have to modernize and rebuild the IT infrastructure and fix old problems that you could not get to on ordinary days: re-patching the cross-connects, replacing network infrastructure elements, rebuilding the VLAN structure, adjusting the equipment configuration to improve security, and so on.

Let's briefly go over the main points to cover when preparing and carrying out planned work.

We answer the question: "Why?"

To be honest, technical work is sometimes carried out just to tick a box, because management wants it that way. In this case, it is better to return to the “Freeze” section and dress that process up as visible modernization. In the end, the documentation will have to be updated anyway.

Thoroughly documenting the system

Suppose there is a server, but no one knows what is running on it. Or there is an old NoName switch with configured VLANs, but nobody knows how to change or configure them.

First, we clarify and find out all the technical nuances of the IT infrastructure, and only then do we plan something.

Who is the owner of this process (resource, service, server, equipment, premises, and so on)?

The owner here is not the owner of the hardware but the owner of the process. For example, this switch is used by the CCTV department, and after the VLAN was reconfigured the cameras lost contact with the video storage server. That is not good at all, and a “workaround” should be provided if the change is really necessary. The excuse “Oh, we didn’t know this was your piece of hardware” should not happen at all.

As in the case of the “freeze”, we update the contact list “for all occasions”, in which we do not forget to add process owners.

We develop an action plan

If the plan exists only in people's heads, it's no good. If it's on paper, that's already a little better. If it has been carefully worked through with all the parties involved, including the head of security, who will have to issue the keys to locked offices if necessary, that is already something.

A plan signed by the various managers, at least on the principle of “Notified. Agreed”, will save you from problems of the kind: "But no one
warned us!" So be ready to prepare the relevant documents for signature at the very end.

We create backups for everything, everything, everything!

Backups here mean not only a copy of all business data, but also configuration files, snapshots (images) of system disks, and so on. We won't go into detail about backing up business data and information for quick recovery: whole separate manuals are devoted to the theory and practice of backup.

To back up the configuration of network equipment, you can use both the built-in options for saving configuration files and external services such as Zyxel Nebula or Zyxel SecuManager.
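For configuration files that have already been exported to disk, a timestamped copy with an integrity check is a simple safety net before any changes. The following is a minimal sketch, assuming the files are reachable as ordinary local files; the function names, the `.bak` suffix and the directory layout are illustrative assumptions and have nothing to do with how Nebula or SecuManager store configurations.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(src, dest_dir) -> Path:
    """Copy `src` into `dest_dir` under a timestamped name and verify the copy."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"backup of {src} does not match the original")
    return dest
```

A call such as `backup_file("switch.conf", "backups")` (hypothetical names) would leave a verified copy next to the working files; storing a second copy off the device being changed is still advisable.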

Working on alternate options

There will always be a situation where something goes wrong or, for some reason, you need to deviate from the main plan. For example, that same CCTV department changes its mind about changing the VLAN on its switch. You always need to have an answer to the question: “What if?”

And, finally, when everything has been worked out, labor costs have been estimated, man-hours have been calculated, and we have thought about how much time off and what bonuses to ask for, it is worth returning to the “Why?” question and critically reconsidering the whole idea once again.

We coordinate downtime and other aspects of work

A small warning. Management and other employees must clearly understand that something (or even everything) may not work for some time.

We must be prepared for the allowed downtime to be cut considerably and for some part of the
plan to be abandoned.

“What did you expect? You IT people only spend money and get in the way of work! Be glad that they agreed to at least this!” These are the kind of arguments you sometimes hear in response to any question about technical work and modernization.

Again we look at the point "Why?"

We think for a long time on the subject: “Why is all this necessary?” and “Is the game worth the candle?”

And only if, after all these stages, the plan is beyond doubt is it worth
proceeding with the work that has been planned, prepared and
agreed with all the authorities.

Of course, in such a short review it is impossible to describe every real-life situation. But we honestly tried to describe some of the most common ones. And of course, there will always be companies and divisions where all this is already taken into account and special documents are written and approved.

But it is not important. Something else is important.

The main thing is that everything goes smoothly and without failures. And may the New Year be successful for you!

Happy holiday, colleagues!

Useful links

  1. Our Telegram chat for networkers. We help, communicate, learn about all sorts of goodies from Zyxel.
  2. Nebula cloud network on Zyxel official website.
  3. Description of the Cloud CNM SecuReporter analytics service on the official Zyxel website.
  4. Description of the Cloud CNM SecuManager management and analytics software on the official
    Zyxel website.
  5. Useful resources at Zyxel Support Campus EMEA - Nebula.

Source: habr.com
