Introduction
In a large, distributed infrastructure that uses DFS as a single point of access to data and DFSR to replicate data between the data center and branch servers, the question of how to monitor replication status inevitably arises.
Coincidentally, almost immediately after we started using DFSR, we began rolling out Zabbix to replace our existing zoo of assorted tools and make infrastructure monitoring more informative, complete, and logical. This article describes how we use Zabbix to monitor DFS replication.
First, we need to decide which DFS replication data to collect in order to monitor its status. The most relevant indicator is the backlog: the files that have not yet been synchronized with other members of the replication group. You can check its size with the dfsrdiag utility, which is installed with the DFSR role. When replication is healthy, the backlog size should tend toward zero, so a large number of files in the backlog indicates a replication problem.
Now for the practical side.
To monitor the backlog size through the Zabbix agent, we need:
- A script that parses the output of dfsrdiag and passes the final backlog size to Zabbix,
- A script that determines how many replication groups exist on the server, which folders they replicate, and which other servers they include (we don't want to enter all of this into Zabbix by hand for each server, right?),
- Both scripts registered as UserParameter entries in the Zabbix agent configuration so that the monitoring server can call them,
- The Zabbix agent service running as a user with rights to read the backlog,
- A Zabbix template that configures group discovery, processes the received data, and raises alerts based on it.
Script Parser
I wrote the parser in VBS, as the most universal language present in all versions of Windows Server. The logic of the script is simple: it receives the name of the replication group, the replicated folder, and the names of the sending and receiving servers on the command line. These parameters are then passed to dfsrdiag, and depending on its output the script returns:
Number of files - if dfsrdiag reports that there are files in the backlog,
0 - if dfsrdiag reports that there are no files in the backlog ("No Backlog"),
-1 - if dfsrdiag returns an error message while executing the request ("[ERROR]").
get-Backlog.vbs
' Arguments: replication group, replicated folder, sending member, receiving member
strReplicationGroup = WScript.Arguments.Item(0)
strReplicatedFolder = WScript.Arguments.Item(1)
strSending = WScript.Arguments.Item(2)
strReceiving = WScript.Arguments.Item(3)

Set WshShell = CreateObject("WScript.Shell")
Set objExec = WshShell.Exec("dfsrdiag.exe Backlog /RGName:""" & strReplicationGroup & """ /RFName:""" & strReplicatedFolder & """ /SendingMember:" & strSending & " /ReceivingMember:" & strReceiving)

' Collect the whole dfsrdiag output, using "\" as a line separator
strResult = ""
Do While Not objExec.StdOut.AtEndOfStream
    strResult = strResult & objExec.StdOut.ReadLine() & "\"
Loop

If InStr(strResult, "No Backlog") > 0 Then
    intBackLog = 0
ElseIf InStr(strResult, "[ERROR]") > 0 Then
    intBackLog = -1
Else
    ' The count is taken from the second output line, after the first colon
    arrLines = Split(strResult, "\")
    arrResult = Split(arrLines(1), ":")
    intBackLog = Trim(arrResult(1))
End If
WScript.Echo intBackLog
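For readers who want to check the decision logic away from a Windows server, here is a minimal Python sketch of the same three-way mapping. It is not part of the original setup. The function name `parse_backlog` and the sample text are assumptions for illustration, and the sample only approximates what dfsrdiag prints.

```python
def parse_backlog(output: str) -> int:
    """Map dfsrdiag output text to the value reported to Zabbix."""
    if "No Backlog" in output:
        return 0   # replication has caught up
    if "[ERROR]" in output:
        return -1  # dfsrdiag could not complete the query
    lines = output.splitlines()
    # Like the VBS script: read the number after the first colon on the second line.
    return int(lines[1].split(":")[1].strip())


# Assumed sample output, for illustration only (not captured from a real server).
sample = "\nMember <SERVER2> Backlog File Count: 15\nBacklog File Names (first 15 files)\n"
print(parse_backlog(sample))
```

The order of the checks matters: the markers "No Backlog" and "[ERROR]" are tested before any attempt to read a number, so unexpected text never reaches the integer conversion in those two cases.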
Discovery script
For Zabbix to find all the replication groups present on a server, along with every parameter the request needs (folder name, names of partner servers), we first have to collect this information and then present it in a format that Zabbix understands. The format expected by the discovery mechanism looks like this:
{
"data":[
{
"{#GROUP}":"Share1",
"{#FOLDER}":"Folder1",
"{#SENDING}":"Server1",
"{#RECEIVING}":"Server2"},
...
{
"{#GROUP}":"ShareN",
"{#FOLDER}":"FolderN",
"{#SENDING}":"Server1",
"{#RECEIVING}":"ServerN"}]}
The easiest way to get this information is through WMI, by querying DfsrReplicationGroupConfig and the related configuration classes. The result is a script that queries WMI and outputs the list of groups, their folders, and their servers in the required format.
DFSRDiscovery.vbs
dim strComputer, strLine, n, k, i
Set wshNetwork = WScript.CreateObject( "WScript.Network" )
strComputer = wshNetwork.ComputerName
Set oWMIService = GetObject("winmgmts:\\" & strComputer & "\root\MicrosoftDFS")
Set colRGroups = oWMIService.ExecQuery("SELECT * FROM DfsrReplicationGroupConfig")
' Collect every outbound connection as one JSON object and join the objects with commas,
' so the output stays valid even when the last connection is inbound or disabled.
strEntries = ""
For Each oGroup in colRGroups
    Set colRGFolders = oWMIService.ExecQuery("SELECT * FROM DfsrReplicatedFolderConfig WHERE ReplicationGroupGUID='" & oGroup.ReplicationGroupGUID & "'")
    For Each oFolder in colRGFolders
        Set colRGConnections = oWMIService.ExecQuery("SELECT * FROM DfsrConnectionConfig WHERE ReplicationGroupGUID='" & oGroup.ReplicationGroupGUID & "'")
        For Each oConnection in colRGConnections
            ' Only enabled outbound connections: this server sends, the partner receives
            If oConnection.Enabled = True And oConnection.Inbound = False Then
                strEntry = " {" & vbCrLf & _
                    " ""{#GROUP}"":""" & oGroup.ReplicationGroupName & """," & vbCrLf & _
                    " ""{#FOLDER}"":""" & oFolder.ReplicatedFolderName & """," & vbCrLf & _
                    " ""{#SENDING}"":""" & strComputer & """," & vbCrLf & _
                    " ""{#RECEIVING}"":""" & oConnection.PartnerName & """}"
                If strEntries <> "" Then strEntries = strEntries & "," & vbCrLf
                strEntries = strEntries & strEntry
            End If
        Next
    Next
Next
WScript.Echo "{"
WScript.Echo " ""data"":["
If strEntries <> "" Then WScript.Echo strEntries
WScript.Echo " ]}"
Admittedly, the code is not especially elegant and could certainly be simplified, but it does its main job: it reports the parameters of the replication groups in a format that Zabbix understands.
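The trickiest part of a discovery script written with plain string output is placing commas and closing brackets correctly. As an illustration only, and not the script used in this setup, the following Python sketch builds the same structure from a list of tuples and lets a JSON serializer handle the punctuation. The function name `build_discovery` and the example data are assumptions, and the WMI query itself is left out.

```python
import json


def build_discovery(connections):
    """Build Zabbix discovery JSON from (group, folder, sending, receiving) tuples."""
    data = [
        {"{#GROUP}": group, "{#FOLDER}": folder,
         "{#SENDING}": sending, "{#RECEIVING}": receiving}
        for group, folder, sending, receiving in connections
    ]
    # The serializer handles commas, quoting and escaping of special characters.
    return json.dumps({"data": data}, indent=1)


# Example values mirror the placeholder names used in the format description.
example = [
    ("Share1", "Folder1", "Server1", "Server2"),
    ("ShareN", "FolderN", "Server1", "ServerN"),
]
print(build_discovery(example))
```

An empty list of connections still produces valid JSON with an empty "data" array, which is the edge case that is easy to get wrong when commas are placed by hand.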
Adding scripts to the Zabbix agent configuration
Everything is extremely simple here. Add the following lines to the end of the agent configuration file:
UserParameter=check_dfsr[*],cscript /nologo "C:\Program Files\Zabbix Agent\get-Backlog.vbs" $1 $2 $3 $4
UserParameter=discovery_dfsr[*],cscript /nologo "C:\Program Files\Zabbix Agent\DFSRDiscovery.vbs"
Of course, adjust the paths to wherever your scripts are located. I put them in the same folder where the agent is installed.
After making changes, restart the Zabbix agent service.
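To see how the two keys fit together: Zabbix low-level discovery substitutes the discovered macro values into item prototype keys, and the agent then passes the four key parameters to the script as $1 to $4. The Python sketch below only imitates that substitution. The prototype key shown is a hypothetical example and not necessarily the one used in the template, and `expand_key` is an illustrative helper, not a Zabbix function.

```python
def expand_key(prototype: str, entry: dict) -> str:
    """Replace each {#MACRO} in the key prototype with its discovered value."""
    key = prototype
    for macro, value in entry.items():
        key = key.replace(macro, value)
    return key


# Hypothetical item prototype key (an assumption for illustration).
PROTOTYPE = "check_dfsr[{#GROUP},{#FOLDER},{#SENDING},{#RECEIVING}]"

entry = {
    "{#GROUP}": "Share1",
    "{#FOLDER}": "Folder1",
    "{#SENDING}": "Server1",
    "{#RECEIVING}": "Server2",
}
print(expand_key(PROTOTYPE, entry))
```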
Changing the user under which the Zabbix Agent service is running
To retrieve information through dfsrdiag, the utility must run under an account that has administrative rights on both the sending and the receiving member of the replication group. The Zabbix agent service, which runs under the system account by default, cannot make such a request. I created a separate domain account, gave it administrative rights on the required servers, and configured the service on those servers to run under it.
You can also go the other way: since dfsrdiag, in fact, works through the same WMI, you can use
Monitoring Template
Based on the data I received, I created a template that:
- Runs automatic discovery of replication groups once per hour,
- Checks the size of the backlog for each group every 5 minutes,
- Contains a trigger that issues an alert when the size of the backlog for any group is more than 100 for 30 minutes. The trigger is described as a prototype that is automatically added to the discovered groups,
- Plots the backlog size for each replication group.
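The alert condition can be read as "every sample collected during the last 30 minutes is above 100". The Python sketch below models that reading. It is an interpretation and not the trigger expression from the template, and the function name `should_alert` and its parameters are assumptions for illustration.

```python
def should_alert(samples, now, threshold=100, window=30 * 60):
    """Decide whether to alert.

    samples: list of (unix_time, backlog_size) pairs.
    now: current unix time in seconds.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    # With no data in the window there is nothing to judge, so stay quiet.
    return bool(recent) and min(recent) > threshold


# One sample every 5 minutes over the last 30 minutes, all above the threshold.
now = 1_000_000
steady = [(now - 300 * i, 150) for i in range(7)]
print(should_alert(steady, now))
```

Requiring the minimum over the window to exceed the threshold keeps a short burst of changes, which replication clears within a few polls, from raising an alert.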
You can download the template for Zabbix 2.2
Conclusion
After importing the template into Zabbix and creating an account with the necessary rights, all that remains is to copy the scripts to the file servers we want to monitor for DFSR, add two lines to the agent configuration on each of them, and restart the Zabbix agent service, configured to run under the new account. No other manual setup is required to monitor DFSR.
Source: habr.com