How to build a rocket booster for PowerCLI scripts 

Sooner or later, every VMware system administrator gets around to automating routine tasks. It all starts with the command line, then comes PowerShell or VMware PowerCLI.

Let's say you have mastered PowerShell a bit beyond launching ISE and using standard cmdlets from modules that work by "some kind of magic." Once you start counting virtual machines in the hundreds, you will find that the scripts that served you well on a small scale run noticeably slower on a large one.

In this situation, two tools come to the rescue:

  • PowerShell Runspaces – an approach that lets you parallelize work by running it in separate threads;
  • Get-View – a core PowerCLI cmdlet, the analogue of Get-WMIObject in Windows. It doesn't pull in related entity objects; instead, it returns the information as a simple object with simple data types. In many cases this turns out faster.

Next, I will briefly introduce each tool and show usage examples. We'll analyze specific scripts and see when one tool works better and when the other does. Let's go!

First Stage: Runspace

So, a runspace is designed for processing tasks in parallel, outside the main script. Of course, you could also start a separate process, which will eat up extra memory, CPU, and so on. If your script runs in a couple of minutes and consumes a gigabyte of memory, you most likely don't need runspaces. But for scripts that handle tens of thousands of objects, they are needed.

You can start learning here:
Beginning Use of PowerShell Runspaces: Part 1

What using Runspace gives you:

  • speed by limiting the list of executable commands,
  • parallel execution of tasks,
  • security.
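The first bullet can be made concrete: a runspace can be built from an InitialSessionState that contains only the commands you explicitly allow, which keeps the session small and limits what code inside it can do. The sketch below is not from the original article; it registers only Get-Date (an arbitrary choice) in an otherwise empty session state and shows that even Import-Module is unavailable there.

```powershell
# Sketch: a runspace that knows only the commands we register explicitly.
# Load the module that defines Get-Date, so its implementing type can be resolved below
Import-Module Microsoft.PowerShell.Utility

# An empty session state: no cmdlets, functions, or providers at all
$iss = [System.Management.Automation.Runspaces.InitialSessionState]::Create()

# Allow exactly one cmdlet (Get-Date is an arbitrary example)
$entry = [System.Management.Automation.Runspaces.SessionStateCmdletEntry]::new(
    'Get-Date', [Microsoft.PowerShell.Commands.GetDateCommand], $null)
$iss.Commands.Add($entry)

$ps = [powershell]::Create($iss)
try {
    # Allowed: Get-Date is part of this session state
    $allowed = $ps.AddCommand('Get-Date').Invoke()

    # Blocked: Import-Module was never registered, so the runspace cannot find it
    $ps.Commands.Clear()
    $blocked = $false
    try {
        $null = $ps.AddCommand('Import-Module').AddParameter('Name', 'Microsoft.PowerShell.Management').Invoke()
    }
    catch {
        $blocked = $true
    }
}
finally {
    $ps.Dispose()
}
```

The same InitialSessionState object can be passed to CreateRunspacePool, so every thread in a pool gets the same restricted command set.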

Here is an example from the internet where Runspace helps:

“Storage contention is one of the hardest metrics to track in vSphere. Inside vCenter, you can’t just go and see which VM consumes more storage resources. Luckily, you can collect this data in minutes with PowerShell.
I will share a script that will allow VMware system administrators to quickly search the entire vCenter and get a list of VMs with data on their average consumption.  
The script uses PowerShell runspaces so that each ESXi host collects information on the consumption of its own VMs in a separate Runspace and immediately reports the completion. This allows PowerShell to close jobs immediately, rather than iterating through hosts and waiting for each one to complete its request.”

Source: How to Show Virtual Machine I/O on an ESXi Dashboard

In the following case, however, Runspace is of no help:

“I'm trying to write a script that collects a lot of data from the VM and writes new data if necessary. The problem is that there are quite a lot of VMs, and it takes 5-8 seconds for one machine.” 

Source: Multithreading PowerCLI with RunspacePool

This is where Get-View is needed, so let's move on to it.

Second Stage: Get-View

To understand how Get-View is useful, it is worth remembering how cmdlets work in general. 

Cmdlets exist so that you can conveniently obtain information without studying API references and reinventing the wheel. What used to take a hundred or two lines of code, PowerShell lets you do with a single command. We pay for this convenience with speed. There is no magic inside the cmdlets themselves: they are the same kind of script, just lower-level, written by the skillful hands of a master from sunny India.

Now, for comparison with Get-View, let's take the Get-VM cmdlet: it accesses the virtual machine and returns a composite object, that is, it attaches other related objects to it: VMHost, Datastore, etc.  

Get-View, in contrast, does not bolt anything extra onto the returned object. Moreover, it lets you specify exactly which properties you need (via the -Property parameter), which makes the returned object lighter. In Windows Server in general and in Hyper-V in particular, the Get-WMIObject cmdlet is a direct analogue: the idea is exactly the same.

Get-View is inconvenient for routine operations on individual objects. But when it comes to thousands or tens of thousands of objects, it is priceless.

Read more on the VMware blog: Introduction to Get-View

Now let me show all of this on a real case.

Writing a script to export VM data

One day my colleague asked me to optimize his script. The task is the usual routine: find all VMs with a duplicate cloud.uuid parameter (yes, this is possible when cloning a VM in vCloud Director). 

The obvious solution that comes to mind is:

  1. Get a list of all VMs.
  2. Somehow parse the list.

The initial version was this simple script:

function Get-CloudUUID1 {
   # Get a list of all VMs
   $vms = Get-VM
   $report = @()

   # Process each object, taking only 2 properties from it: the VM name and the Cloud UUID.
   # Store the data in a new PS object with the fields VM and UUID
   foreach ($vm in $vms)
   {
       $table = "" | select VM,UUID

       $table.VM = $vm.name
       $table.UUID = ($vm | Get-AdvancedSetting -Name cloud.uuid).Value
          
       $report += $table
   }
   # Return all objects
   $report
}
# Next, we parse the result MANUALLY

Everything is extremely simple and clear. It was written in a couple of minutes, coffee break included. Bolt on a filter, and you're done.
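The filter mentioned here is the same grouping step that the later versions of the script use: group the rows by UUID and keep only the groups with more than one VM. A minimal sketch on made-up sample objects (the VM names and UUID values are hypothetical) shaped like the output of Get-CloudUUID1, so it runs without a vCenter connection:

```powershell
# Hypothetical sample data shaped like the output of Get-CloudUUID1 (fields VM and UUID)
$report = @(
    [pscustomobject]@{ VM = 'web01';       UUID = 'uuid-a' }
    [pscustomobject]@{ VM = 'web02';       UUID = 'uuid-b' }
    [pscustomobject]@{ VM = 'web01-clone'; UUID = 'uuid-a' }
    [pscustomobject]@{ VM = 'db01';        UUID = $null }   # VM without cloud.uuid
)

# Skip VMs that have no UUID, group the rest by UUID,
# and keep only the VMs whose UUID occurs more than once
$duplicates = $report |
    Where-Object { $_.UUID } |
    Group-Object -Property UUID |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group
```

Here $duplicates ends up holding web01 and web01-clone, the two VMs sharing uuid-a.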

But let's take a look at the time:

2 minutes 47 seconds to process almost 10k VMs. On top of that, there is no filtering, so the result has to be sorted through manually. Clearly, the script needs optimization.
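The article does not show how the time was measured. One simple way to take such a measurement yourself is Measure-Command, which runs a script block and returns a TimeSpan. A sketch under the assumption that the per-VM work can be simulated with Start-Sleep (the 20 ms cost and the VM names are made up; in practice you would put a call such as Get-CloudUUID1 inside the block):

```powershell
# Sketch: timing a sequential per-VM loop with Measure-Command.
# Each "VM" costs about 20 ms here, a hypothetical stand-in for one Get-AdvancedSetting round trip.
$vmNames = 1..25 | ForEach-Object { "vm$_" }

$elapsed = Measure-Command {
    foreach ($name in $vmNames) {
        Start-Sleep -Milliseconds 20   # simulated per-VM call
    }
}

# Sequential cost grows linearly with the number of VMs
"Processed {0} VMs in {1:N2} s" -f $vmNames.Count, $elapsed.TotalSeconds
```

With a real per-VM call costing far more than 20 ms, the same linear growth is what turns a few VMs into minutes at the scale of 10k.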

Runspaces are the first thing to reach for when you need to collect host metrics from vCenter all at once or process tens of thousands of objects. Let's see what this approach gives us.

Shifting into first gear: PowerShell Runspaces

The first idea for this script is to run the loop in parallel threads rather than sequentially, collect all the data into one object, and filter it.

But there is a problem: PowerCLI will not allow us to open many independent sessions to vCenter and will throw a funny error:

You have modified the global:DefaultVIServer and global:DefaultVIServers system variables. This is not allowed. Please reset them to $null and reconnect to the vSphere server.

To solve it, we first need to pass the session information into the thread ourselves. Recall that PowerShell works with objects, which can be passed as parameters to a function and to a ScriptBlock alike. So let's pass the session as such an object and bypass $global:DefaultVIServers (Connect-VIServer with the -NotDefault switch):

$ConnectionString = @()
foreach ($vCenter in $vCenters)
   {
       try {
           $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -AllLinked -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
       }
       catch {
           if ($_.Exception.Message -like "*not part of a linked mode*")
           {
               try {
                   $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
               }
               catch {
                   throw $_
               }
              
           }
           else {
               throw $_
           }
       }
   }
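Why passing the session explicitly matters: a new runspace starts with its own session state, so the global variables of the calling script (which is where PowerCLI keeps $global:DefaultVIServer and $global:DefaultVIServers) are not visible inside it, while anything handed over with AddArgument is. A minimal sketch with a plain object standing in for the Connect-VIServer result ($global:FakeSession and its properties are hypothetical):

```powershell
# Sketch: a runspace does not see the caller's global variables, but it does receive arguments.
# $global:FakeSession is a hypothetical stand-in for a session object such as $global:DefaultVIServer.
$global:FakeSession = [pscustomobject]@{ Name = 'vcenter01'; SessionId = 'abc123' }

$ps = [powershell]::Create()   # runs in a brand-new runspace, not the current one
$null = $ps.AddScript({
    param($Session)
    [pscustomobject]@{
        GlobalSeen   = $null -ne $global:FakeSession   # not defined in the new runspace
        ArgumentSeen = $Session.Name                   # passed in explicitly
    }
})
$null = $ps.AddArgument($global:FakeSession)
$result = $ps.Invoke()
$ps.Dispose()

$result   # GlobalSeen = False, ArgumentSeen = vcenter01
```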

Now we implement multithreading through Runspace Pools.  

The algorithm is as follows:

  1. We get a list of all VMs.
  2. In parallel threads, we get cloud.uuid.
  3. We collect the data from the threads into one object.
  4. We filter the object by grouping it by the value of the CloudUUID field: groups that contain more than one VM are the VMs we are looking for.

As a result, we get the script:


function Get-VMCloudUUID {
   param (
       [string[]]
       [ValidateNotNullOrEmpty()]
       $vCenters = @(),
       [int]$MaxThreads,
       [System.Management.Automation.PSCredential]
       [System.Management.Automation.Credential()]
       $Credential
   )

   $ConnectionString = @()

   # Create the connection objects that carry the session key
   foreach ($vCenter in $vCenters)
   {
       try {
           $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -AllLinked -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
       }
       catch {
           if ($_.Exception.Message -like "*not part of a linked mode*")
           {
               try {
                   $ConnectionString += Connect-VIServer -Server $vCenter -Credential $Credential -NotDefault -Force -ErrorAction stop -WarningAction SilentlyContinue -ErrorVariable er
               }
               catch {
                   throw $_
               }
              
           }
           else {
               throw $_
           }
       }
   }

   # Get a list of all VMs (a local variable is enough; no need for global scope)
   $AllVMs = Get-VM -Server $ConnectionString

   # Here we go!
   $ISS = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
   $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads, $ISS, $Host)
   $RunspacePool.ApartmentState = "MTA"
   $RunspacePool.Open()
   $Jobs = @()

# The ScriptBlock with the magic!
# This is what runs inside each thread
   $scriptblock = {
       Param (
       $ConnectionString,
       $VM
       )

       $Data = $VM | Get-AdvancedSetting -Name Cloud.uuid -Server $ConnectionString | Select-Object @{N="VMName";E={$_.Entity.Name}},@{N="CloudUUID";E={$_.Value}},@{N="PowerState";E={$_.Entity.PowerState}}

       return $Data
   }
# Spawn the threads

   foreach($VM in $AllVMs)
   {
       $PowershellThread = [PowerShell]::Create()
# Add the script
       $null = $PowershellThread.AddScript($scriptblock)
# and the objects that will be passed to the script as parameters
       $null = $PowershellThread.AddArgument($ConnectionString)
       $null = $PowershellThread.AddArgument($VM)
       $PowershellThread.RunspacePool = $RunspacePool
       $Handle = $PowershellThread.BeginInvoke()
       $Job = "" | Select-Object Handle, Thread, object
       $Job.Handle = $Handle
       $Job.Thread = $PowershellThread
       $Job.Object = $VM.ToString()
       $Jobs += $Job
   }

# Show a progress bar to track the jobs visually
# and clean up the finished jobs in the same loop
   While (@($Jobs | Where-Object {$_.Handle -ne $Null}).count -gt 0)
   {
       $Remaining = "$($($Jobs | Where-Object {$_.Handle.IsCompleted -eq $False}).object)"

       If ($Remaining.Length -gt 60) {
           $Remaining = $Remaining.Substring(0,60) + "..."
       }

       Write-Progress -Activity "Waiting for Jobs - $($MaxThreads - $($RunspacePool.GetAvailableRunspaces())) of $MaxThreads threads running" -PercentComplete (($Jobs.count - $($($Jobs | Where-Object {$_.Handle.IsCompleted -eq $False}).count)) / $Jobs.Count * 100) -Status "$(@($($Jobs | Where-Object {$_.Handle.IsCompleted -eq $False})).count) remaining - $remaining"

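       ForEach ($Job in $($Jobs | Where-Object {$_.Handle.IsCompleted -eq $True})){
           # EndInvoke returns the thread's output; emitting it here makes it the function's output
           $Job.Thread.EndInvoke($Job.Handle)
           $Job.Thread.Dispose()
           $Job.Thread = $Null
           $Job.Handle = $Null
       }
       # Poll a few times per second instead of spinning the CPU in a busy loop
       Start-Sleep -Milliseconds 200
   }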

   $RunspacePool.Close() | Out-Null
   $RunspacePool.Dispose() | Out-Null
}


function Get-CloudUUID2
{
   [CmdletBinding()]
   param(
   [string[]]
   [ValidateNotNullOrEmpty()]
   $vCenters = @(),
   [int]$MaxThreads = 50,
   [System.Management.Automation.PSCredential]
   [System.Management.Automation.Credential()]
   $Credential)

   if(!$Credential)
   {
       $Credential = Get-Credential -Message "Please enter vCenter credentials."
   }

   # Call Get-VMCloudUUID, which parallelizes the operation
   $AllCloudVMs = Get-VMCloudUUID -vCenters $vCenters -MaxThreads $MaxThreads -Credential $Credential
   # The objects have no Value property, so sort by CloudUUID before grouping
   $Result = $AllCloudVMs | Sort-Object CloudUUID | Group-Object -Property CloudUUID | Where-Object -FilterScript {$_.Count -gt 1} | Select-Object -ExpandProperty Group
   $Result
}

The beauty of this script is that it can be reused in similar cases by simply replacing the ScriptBlock and the parameters passed to the thread. Make use of it!
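To illustrate the "swap the ScriptBlock" idea, here is a stripped-down, generic version of the same RunspacePool pattern with the PowerCLI parts removed. Invoke-InParallel is a hypothetical name, not a PowerCLI or PowerShell cmdlet; it runs one ScriptBlock per input item, waits for all handles, and returns the collected output. It leaves out the progress bar and the error handling of the full script.

```powershell
function Invoke-InParallel {
    # Hypothetical helper: runs $ScriptBlock once per item of $InputItem in a RunspacePool.
    # The ScriptBlock sees only what it receives through its param block.
    param(
        [Parameter(Mandatory)] [scriptblock]$ScriptBlock,
        [Parameter(Mandatory)] [object[]]$InputItem,
        [int]$MaxThreads = 10
    )

    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads)
    $pool.Open()
    try {
        # Start every job; each one takes a free runspace from the pool when one is available
        $jobs = foreach ($item in $InputItem) {
            $ps = [powershell]::Create()
            $ps.RunspacePool = $pool
            $null = $ps.AddScript($ScriptBlock).AddArgument($item)
            [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
        }

        # EndInvoke blocks until that job is done and returns its output
        foreach ($job in $jobs) {
            $job.PowerShell.EndInvoke($job.Handle)
            $job.PowerShell.Dispose()
        }
    }
    finally {
        $pool.Close()
        $pool.Dispose()
    }
}

# Usage: 20 items with simulated 100 ms of work each, on up to 10 threads
$squares = Invoke-InParallel -MaxThreads 10 -InputItem (1..20) -ScriptBlock {
    param($n)
    Start-Sleep -Milliseconds 100
    $n * $n
}
```

Because EndInvoke is called in the order the jobs were started, the results come back in input order even though the jobs may finish in any order.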

We measure time:

55 seconds. Better already, but it can be faster still.

Shifting into second gear: Get-View

Let's find out what's slowing it down.
First, and most obviously, the Get-VM cmdlet takes a long time to complete.
Second, the Get-AdvancedSetting cmdlet takes even longer.
Let's deal with the second one first.

Get-AdvancedSetting is handy for a single VM object but very clumsy when dealing with many objects. We can get the same information from the virtual machine object (Get-VM) itself; it is just buried deep inside the ExtensionData object. Armed with filtering, we speed up retrieving the data we need.

With a flick of the wrist, this is:


$VM | Get-AdvancedSetting -Name Cloud.uuid -Server $ConnectionString | Select-Object @{N="VMName";E={$_.Entity.Name}},@{N="CloudUUID";E={$_.Value}},@{N="PowerState";E={$_.Entity.PowerState}}

Turns into this:


$VM | Where-Object {($_.ExtensionData.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value -ne $null} | Select-Object @{N="VMName";E={$_.Name}},@{N="CloudUUID";E={($_.ExtensionData.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value}},@{N="PowerState";E={$_.summary.runtime.powerstate}}

The output is the same as with Get-AdvancedSetting, but it is obtained much faster.
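ExtraConfig is an array of key/value option entries, which is why the filter above has to search it with Where-Object (and does so twice per VM: once in Where-Object, once in Select-Object). A minimal sketch of the same lookup done once per VM, as a variation rather than the article's code; Get-ExtraConfigValue and New-MockVM are hypothetical helpers, and the mock objects only imitate the Name / ExtensionData.Config.ExtraConfig shape so the code runs without vCenter:

```powershell
# Hypothetical helper: returns the value of one ExtraConfig key, or nothing if the key is absent
function Get-ExtraConfigValue {
    param([object[]]$ExtraConfig, [string]$Key)
    $match = $ExtraConfig | Where-Object { $_.Key -eq $Key } | Select-Object -First 1
    if ($match) { $match.Value }
}

# Hypothetical mock builder: imitates a VM with ExtensionData.Config.ExtraConfig (Key/Value entries)
function New-MockVM {
    param([string]$Name, [hashtable]$Options)
    $extraConfig = foreach ($key in $Options.Keys) {
        [pscustomobject]@{ Key = $key; Value = $Options[$key] }
    }
    [pscustomobject]@{
        Name          = $Name
        ExtensionData = [pscustomobject]@{
            Config = [pscustomobject]@{ ExtraConfig = @($extraConfig) }
        }
    }
}

$vms = @(
    New-MockVM -Name 'app01' -Options @{ 'cloud.uuid' = 'uuid-1'; 'svga.present' = 'TRUE' }
    New-MockVM -Name 'app02' -Options @{ 'svga.present' = 'TRUE' }
)

# Search ExtraConfig once per VM and keep only VMs that actually have cloud.uuid
$withUuid = foreach ($vm in $vms) {
    $uuid = Get-ExtraConfigValue -ExtraConfig $vm.ExtensionData.Config.ExtraConfig -Key 'cloud.uuid'
    if ($null -ne $uuid) {
        [pscustomobject]@{ VMName = $vm.Name; CloudUUID = $uuid }
    }
}
```

Only app01 survives the filter, since app02 has no cloud.uuid entry.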

Now to Get-VM. It is not fast, as it deals with complex objects. A logical question arises: why do we need extra information and a monstrous PSObject in this case, when we just need the name of the VM, its state and the value of the tricky attribute?  

Besides, the bottleneck, Get-AdvancedSetting, is now gone from the script. Runspace Pools now look like overkill: there is no slow task left to parallelize across threads with all the session-passing gymnastics. The tool is good, just not for this case.

Now look at what ExtensionData contains: it is nothing more than a Get-View object.

Let's call on the ancient technique of the PowerShell wizards: one line using filters, sorting and grouping. All the previous horror is elegantly collapsed into one line and executed in one session:


$AllVMs = Get-View -viewtype VirtualMachine -Property Name,Config.ExtraConfig,summary.runtime.powerstate | Where-Object {($_.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value -ne $null} | Select-Object @{N="VMName";E={$_.Name}},@{N="CloudUUID";E={($_.Config.ExtraConfig | Where-Object {$_.key -eq "cloud.uuid"}).Value}},@{N="PowerState";E={$_.summary.runtime.powerstate}} | Sort-Object CloudUUID | Group-Object -Property CloudUUID | Where-Object -FilterScript {$_.Count -gt 1} | Select-Object -ExpandProperty Group

We measure time:

9 seconds for almost 10k objects with filtering by the desired condition. Great!

Instead of a conclusion

An acceptable result directly depends on the choice of tool. It is often difficult to say for sure what exactly should be chosen to achieve it. Each of the above methods of accelerating scripts is good within the limits of its applicability. I hope this article will help you in the difficult task of understanding the basics of process automation and optimizing them in your infrastructure.

PS: The author thanks all members of the community for their help and support in preparing this article. Even those with paws. And even those without paws, like a boa constrictor.

Source: habr.com
