Simple monitoring of DFS Replication in Zabbix

Introduction

In a large, distributed infrastructure that uses DFS as a single data access point and DFSR to replicate data between the data center and branch servers, the question of how to monitor replication status inevitably arises.
As it happened, almost immediately after we started using DFSR we also began rolling out Zabbix to replace the existing zoo of assorted tools and to make infrastructure monitoring more informative, complete, and logical. This article describes how to use Zabbix to monitor DFS replication.

First, we need to decide which data about DFS replication to collect in order to monitor its status. The most relevant indicator is the backlog: the files that have not yet been synchronized with other members of the replication group. You can see its size with the dfsrdiag utility, which is installed with the DFSR role. When replication is healthy, the backlog size should tend to zero; accordingly, a large number of files in the backlog indicates replication problems.

Now about the practical side of the issue.

In order to monitor the backlog size through Zabbix Agent, we need:

  • A script that parses the output of dfsrdiag and returns the final backlog size to Zabbix,
  • A script that determines how many replication groups exist on the server, which folders they replicate, and which other servers they include (we don't want to enter all of this into Zabbix by hand for each server, right?),
  • Both scripts registered as UserParameter entries in the Zabbix agent configuration so that the monitoring server can call them,
  • The Zabbix agent service running as a user with rights to read the backlog,
  • A Zabbix template that configures group discovery, processes the received data, and raises alerts based on it.

Parser script

I wrote the parser in VBS, as the most universal language available in all versions of Windows Server. The logic of the script is simple: it receives the name of the replication group, the replicated folder, and the names of the sending and receiving servers on the command line. These parameters are then passed to dfsrdiag, and depending on its output the script returns:
  • the number of files, if dfsrdiag reports files in the backlog,
  • 0, if dfsrdiag reports that the backlog is empty ("No Backlog"),
  • -1, if dfsrdiag returns an error while executing the request ("[ERROR]").

get-Backlog.vbs

' Arguments: replication group, replicated folder, sending member, receiving member
strReplicationGroup = WScript.Arguments.Item(0)
strReplicatedFolder = WScript.Arguments.Item(1)
strSending = WScript.Arguments.Item(2)
strReceiving = WScript.Arguments.Item(3)

' Run dfsrdiag and collect its output line by line
Set WshShell = CreateObject("WScript.Shell")
Set objExec = WshShell.Exec("dfsrdiag.exe Backlog /RGName:""" & strReplicationGroup & """ /RFName:""" & strReplicatedFolder & """ /SendingMember:" & strSending & " /ReceivingMember:" & strReceiving)
strResult = ""
Do While Not objExec.StdOut.AtEndOfStream
	strResult = strResult & objExec.StdOut.ReadLine() & vbLf
Loop

If InStr(strResult, "No Backlog") > 0 Then
	' The members are in sync
	intBackLog = 0
ElseIf InStr(strResult, "[ERROR]") > 0 Then
	' dfsrdiag could not complete the request
	intBackLog = -1
Else
	' The file count is read from the second output line, after the first colon
	arrLines = Split(strResult, vbLf)
	arrResult = Split(arrLines(1), ":")
	intBackLog = Trim(arrResult(1))
End If

WScript.Echo intBackLog
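
The mapping from dfsrdiag output to the three return values can also be checked offline, without a DFSR server. The following is a minimal sketch in Python, not the script used on the servers. The function name parse_backlog and the sample strings are made up for illustration, and the "Backlog File Count:" wording is an assumption about the dfsrdiag output format. Unlike the VBS script, it searches for the count with a regular expression and falls back to -1 when no count is found.

```python
import re


def parse_backlog(output: str) -> int:
    """Map raw dfsrdiag backlog output to the value reported to Zabbix."""
    if "No Backlog" in output:
        return 0          # members are in sync
    if "[ERROR]" in output:
        return -1         # dfsrdiag failed to execute the request
    # Assumed output wording: "... Backlog File Count: <number>"
    match = re.search(r"Backlog File Count:\s*(\d+)", output)
    # Treat unrecognized output as an error (a choice made for this sketch)
    return int(match.group(1)) if match else -1


if __name__ == "__main__":
    # Illustrative sample, not captured from a real server
    sample = "\nMember <Server2> Backlog File Count: 17\n"
    print(parse_backlog(sample))
```

Returning -1 for unrecognized output keeps the item numeric, so an unexpected message shows up as an error value in Zabbix rather than as an unsupported item.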

Discovery script

For Zabbix to find all the replication groups present on a server, along with the parameters required for each request (folder name, names of neighboring servers), we first need to obtain this information and then present it in a format that Zabbix understands. The format expected by the discovery mechanism looks like this:

{
        "data":[
                {
                        "{#GROUP}":"Share1",
                        "{#FOLDER}":"Folder1",
                        "{#SENDING}":"Server1",
                        "{#RECEIVING}":"Server2"},

...

                {
                        "{#GROUP}":"ShareN",
                        "{#FOLDER}":"FolderN",
                        "{#SENDING}":"Server1",
                        "{#RECEIVING}":"ServerN"}]}
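
To make the structure concrete, here is a small Python sketch that builds the same document from a list of connections. It is only an illustration of the format; the function name build_discovery and the sample values are hypothetical, and a JSON library is used here instead of assembling the text by hand.

```python
import json


def build_discovery(connections):
    """Build discovery JSON from (group, folder, sending, receiving) tuples."""
    data = [
        {
            "{#GROUP}": group,
            "{#FOLDER}": folder,
            "{#SENDING}": sending,
            "{#RECEIVING}": receiving,
        }
        for group, folder, sending, receiving in connections
    ]
    # The whole list is wrapped in a single "data" key
    return json.dumps({"data": data}, indent=2)


if __name__ == "__main__":
    # Hypothetical values matching the example format
    print(build_discovery([
        ("Share1", "Folder1", "Server1", "Server2"),
        ("ShareN", "FolderN", "Server1", "ServerN"),
    ]))
```

Letting a library serialize the list takes care of the commas between entries and the closing brackets, which are the easiest parts to get wrong when the text is concatenated manually.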

The easiest way to get this information is through WMI, starting from the DfsrReplicationGroupConfig class and the related folder and connection classes. The result is a script that queries WMI and outputs the list of groups, their folders, and their servers in the required format.

DFSRDiscovery.vbs


Dim strComputer, strItems, strItem

Set wshNetwork = WScript.CreateObject("WScript.Network")
strComputer = wshNetwork.ComputerName

' Connect to the DFSR WMI namespace on the local server
Set oWMIService = GetObject("winmgmts:\\" & strComputer & "\root\MicrosoftDFS")
Set colRGroups = oWMIService.ExecQuery("SELECT * FROM DfsrReplicationGroupConfig")

strItems = ""
For Each oGroup In colRGroups
  Set colRGFolders = oWMIService.ExecQuery("SELECT * FROM DfsrReplicatedFolderConfig WHERE ReplicationGroupGUID='" & oGroup.ReplicationGroupGUID & "'")
  For Each oFolder In colRGFolders
    Set colRGConnections = oWMIService.ExecQuery("SELECT * FROM DfsrConnectionConfig WHERE ReplicationGroupGUID='" & oGroup.ReplicationGroupGUID & "'")
    For Each oConnection In colRGConnections
      ' Keep only enabled outbound connections: this server sends, the partner receives
      If oConnection.Enabled = True And oConnection.Inbound = False Then
        strItem = "                {" & vbCrLf & _
                  "                        ""{#GROUP}"":""" & oGroup.ReplicationGroupName & """," & vbCrLf & _
                  "                        ""{#FOLDER}"":""" & oFolder.ReplicatedFolderName & """," & vbCrLf & _
                  "                        ""{#SENDING}"":""" & strComputer & """," & vbCrLf & _
                  "                        ""{#RECEIVING}"":""" & oConnection.PartnerName & """}"
        ' Separate entries with a comma; the last entry gets none
        If strItems <> "" Then strItems = strItems & "," & vbCrLf
        strItems = strItems & strItem
      End If
    Next
  Next
Next

' Wrap the collected entries in the "data" array
WScript.Echo "{"
WScript.Echo "        ""data"":["
WScript.Echo strItems & "]}"

Admittedly, the script is not especially elegant and could certainly be simplified, but it does its main job: it reports the parameters of the replication groups in a format that Zabbix understands.

Adding scripts to the Zabbix agent configuration

Everything is extremely simple here. Add the following lines to the end of the agent configuration file:

UserParameter=check_dfsr[*],cscript /nologo "C:\Program Files\Zabbix Agent\get-Backlog.vbs" $1 $2 $3 $4
UserParameter=discovery_dfsr[*],cscript /nologo "C:\Program Files\Zabbix Agent\DFSRDiscovery.vbs"

Adjust the paths to wherever your scripts are located. I put them in the same folder where the agent is installed.

After making changes, restart the Zabbix agent service.
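
To illustrate how the [*] form of a UserParameter works, the following Python sketch mimics the substitution of $1 through $4 with the parameters of an item key. It is a deliberately simplified model, not the agent's actual parsing: parameters are split on commas and quoting is ignored. The function name expand_user_parameter and the sample key values are hypothetical.

```python
def expand_user_parameter(command: str, item_key: str) -> str:
    """Substitute $1..$9 in a UserParameter command with item key parameters.

    Simplified on purpose: parameters are split on commas, quoting is ignored.
    """
    start, end = item_key.index("["), item_key.rindex("]")
    params = [p.strip() for p in item_key[start + 1:end].split(",")]
    for position, value in enumerate(params[:9], start=1):
        command = command.replace(f"${position}", value)
    return command


if __name__ == "__main__":
    command = r'cscript /nologo "C:\Program Files\Zabbix Agent\get-Backlog.vbs" $1 $2 $3 $4'
    # Hypothetical item key as it might be produced from discovered values
    print(expand_user_parameter(command, "check_dfsr[Share1,Folder1,Server1,Server2]"))
```

The four key parameters arrive in the script as WScript.Arguments.Item(0) through Item(3), in the same order as they appear in the key.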

Changing the user under which the Zabbix Agent service is running

To obtain information through dfsrdiag, the utility must be run under an account that has administrative rights on both the sending and the receiving member of the replication group. The Zabbix agent service, which runs under the system account by default, cannot perform such a request. I created a separate domain account, gave it administrative rights on the required servers, and configured the service on these servers to run under it.

There is another way: since dfsrdiag essentially works through the same WMI, you can follow a description of how to give a domain account the right to use it without granting administrative rights. However, if there are many replication groups, granting rights on each group becomes laborious. On the other hand, if we want to monitor Domain System Volume replication on domain controllers, this may be the only acceptable option, since giving domain administrator rights to a monitoring service account is not the best idea.

Monitoring Template

Based on this data, I created a template that:

  • Runs automatic discovery of replication groups once per hour,
  • Checks the backlog size for each group once every 5 minutes,
  • Contains a trigger that raises an alert when the backlog size for any group stays above 100 for 30 minutes. The trigger is defined as a prototype that is automatically added to the discovered groups,
  • Plots the backlog size for each replication group.
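
The alert condition can be sketched in Python as follows. The actual trigger expression from the template is not reproduced in this article, so this is only an illustration under the assumption that "more than 100 for 30 minutes" means every value collected in the last 30 minutes exceeds 100. The function name backlog_alert and its parameters are hypothetical.

```python
def backlog_alert(samples, now, threshold=100, window=30 * 60):
    """Return True if every sample in the last `window` seconds exceeds `threshold`.

    samples: list of (unix_timestamp, backlog_size) pairs.
    """
    recent = [value for ts, value in samples if now - window < ts <= now]
    # No data in the window means there is nothing to alert on
    return bool(recent) and min(recent) > threshold


if __name__ == "__main__":
    # One value every 5 minutes, as in the template's check interval
    now = 3600
    samples = [(now - 300 * i, 150) for i in range(6)]
    print(backlog_alert(samples, now))
```

Requiring the whole window to stay above the threshold avoids alerts on short spikes, such as a batch of files that is copied and then replicated within a few minutes.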

You can download the template for Zabbix 2.2 here.

Conclusion

After importing the template into Zabbix and creating an account with the necessary rights, all that remains is to copy the scripts to the file servers we want to monitor for DFSR, add two lines to the agent configuration on each of them, and restart the Zabbix agent service, configured to run under the chosen account. No other manual configuration is required to monitor DFSR.

Source: habr.com
