The Surly Admin

Father, husband, IT Pro, cancer survivor

Multithreading Powershell Scripts

In your scripting journey there will come a time when you have a script that is simply running too long. Perhaps you want to gather information hourly and the script takes two hours to run. Maybe you’re a consultant and need a discovery script to run as fast as possible so you can get out of there. Whatever the reason, at some point you’ll consider multithreading. Powershell has this capability baked right in with Powershell Jobs, but .NET has a way too, and initial testing shows it might be faster! Read on to see what I mean.

A Note about Jobs

Jobs are really the “Powershell” way of doing multithreading, at least until Workflow, which was only introduced with Powershell 3.0, picks up more steam. There are a couple of ways to run jobs: one is the Start-Job cmdlet, and the other is to look for cmdlets with the -AsJob parameter. Both put a scriptblock (or cmdlet function) into a background job and return the console to you immediately. One major problem with Jobs is that there’s no easy way to throttle them. Set up a loop with 1000 elements in it, each submitting a job into the background, and you’ll end up with 1000 jobs running on your computer and everything pretty much grinding to a halt. So to throttle this down you have to control the flow of background jobs yourself. This can be done using the Get-Job cmdlet, waiting until enough of the running jobs have finished before submitting the next one:

While (@(Get-Job -State Running).Count -ge 5) {
   Start-Sleep -Seconds 1
}

You have to force Get-Job into an array, by surrounding it with @( ), because if there are no matching jobs it returns $null (and if there’s only one, it returns a single job object rather than an array), and in Powershell 2.0 neither of those has a Count property. This works very effectively, but you essentially have to write it twice: you need to throttle the submission of jobs, and then after you’ve submitted everything you need to monitor the jobs to see when they’re all done. You then use Receive-Job to get any information returned by the job. After you’ve retrieved the information you have to dispose of the job to clean up memory using Remove-Job.
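Put together, the throttle / wait / receive / remove cycle might look something like the sketch below. $MaxJobs, $Work and the 1..10 input are made-up values for illustration. Two choices here are mine rather than the original approach. The jobs are tracked in their own array, so other jobs in your session don’t count against the limit. Wait-Job does the second “wait for them all” loop.

```powershell
# Sketch of the Start-Job pattern: throttle, wait, receive, remove.
# $MaxJobs, $Work and the 1..10 input are illustrative values, not from a real script.
$MaxJobs = 5
$Work = {
   Param ([int]$Number)
   Start-Sleep -Milliseconds 200
   $Number * 2
}

$MyJobs = @()
ForEach ($Item in 1..10) {
   # Throttle: hold off while $MaxJobs of our jobs are still running
   While (@($MyJobs | Where-Object { $_.State -eq 'Running' }).Count -ge $MaxJobs) {
      Start-Sleep -Milliseconds 250
   }
   $MyJobs += Start-Job -ScriptBlock $Work -ArgumentList $Item
}

# The "write it twice" part: wait for the last batch (Wait-Job does the polling)
$MyJobs | Wait-Job | Out-Null

# Collect the output, then remove the jobs to free their memory
$JobResults = $MyJobs | Receive-Job
$MyJobs | Remove-Job
```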

It turns out there is a lot of overhead with jobs, especially in creating them and retrieving their results. Each Start-Job launches a separate Powershell process, and whatever the job returns has to be serialized back to your session. If you create a lot of jobs, like DFS Monitor with History does, you could be leaving a lot of performance on the table.
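You can see that overhead for yourself with the rough sketch below. It times ten trivial jobs against the same trivial work pushed through a Runspace pool, which the next section explains. The scriptblock, the count of 10 and the pool size of 5 are arbitrary choices. Absolute numbers depend entirely on your machine, so no expected output is given here. The job figure typically comes out noticeably higher, because of the per-process startup.

```powershell
# Rough overhead comparison: 10 trivial Start-Job jobs vs. the same work in runspaces.
# The workload, count and pool size are arbitrary; timings vary by machine.
$Trivial = { Param ($N) $N }

$JobTime = Measure-Command {
   $Started = ForEach ($N in 1..10) { Start-Job -ScriptBlock $Trivial -ArgumentList $N }
   $Started | Wait-Job | Receive-Job | Out-Null
   $Started | Remove-Job
}

$RunspaceTime = Measure-Command {
   $Pool = [RunspaceFactory]::CreateRunspacePool(1, 5)
   $Pool.Open()
   $Pipes = ForEach ($N in 1..10) {
      $PS = [powershell]::Create().AddScript($Trivial).AddArgument($N)
      $PS.RunspacePool = $Pool
      New-Object PSObject -Property @{ Pipe = $PS; Handle = $PS.BeginInvoke() }
   }
   ForEach ($P in $Pipes) {
      $P.Pipe.EndInvoke($P.Handle) | Out-Null   # blocks until that pipeline is done
      $P.Pipe.Dispose()
   }
   $Pool.Close()
}

"Jobs:      {0:N2} s" -f $JobTime.TotalSeconds
"Runspaces: {0:N2} s" -f $RunspaceTime.TotalSeconds
```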

Runspaces

Runspaces are less a Powershell language feature than part of the .NET API that Powershell itself is built on (System.Management.Automation). Luckily, since Powershell is a .NET language, we have full access to them. I wish I could say I discovered them and wrote the upcoming code myself, but I didn’t. I found two great sources:

First is Boe Prox, whose blog post “Using Background Runspaces Instead of PSJobs For Better Performance” really turned me on to the possibilities. I had seen some of his other posts about Runspaces but never looked into them. This post shows the overhead that Jobs create and how Runspaces avoid it. Great stuff.

Next shoulder I’m standing on is Jon Boulineau, who wrote a very interesting Powershell module for submitting and using Runspaces: “psasync Module: Multithreaded PowerShell”. Another great read, and a blog you should definitely follow. Honestly, if you read no further and just used Jon’s module you’d be in great shape, and probably better off than with the code I’ll be showing you!

So why am I writing more on this subject? As good as those posts were, they left some information out, and I ended up spending a lot of time distilling what was there until I understood it. I’ve always said I learn with my fingers, and this was a great case of having to write it myself, in my own way, in order to understand it. I’m going to try to save you that effort and explain what the heck is going on, which is pretty straightforward. This is no deep dive, either. Don’t expect to come away knowing everything there is to know about Runspaces. What I do hope is that you’ll understand Runspaces well enough to put the code in your own scripts and run background jobs successfully.

Setting Up Runspaces

The first thing we need to do is set up a Runspace pool. This is where we set aside memory and resources for our background jobs (pipes, or pipelines). There’s really not a lot to it, but it gives us one of the best things about Runspaces: automatic throttling.

$MaxThreads = 5

$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
$RunspacePool.Open()

Now $RunspacePool holds our pool definition. The first argument to CreateRunspacePool() is the minimum number of runspaces and the second is the maximum, so we’ve capped it at 5 threads. You can change this value to whatever you want, but keep in mind there will be a point of diminishing returns. Have a PC with a single CPU, single core and no Hyper-threading (there are still a few of those out there, aren’t there?) and you probably don’t want to push that thread count too high. Got dual processors with 6 cores each? Yeah, go for it!
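If you’d rather not guess, one option is to base the cap on the machine’s logical processor count, [Environment]::ProcessorCount. As a rule of thumb, work that mostly waits on the network (like querying remote servers) can usually use more threads than you have cores. CPU-heavy work gains little past the core count. The “* 2” multiplier below is an assumption you should tune by testing, not a recommendation.

```powershell
# Sketch: size the pool from the logical processor count instead of hard-coding 5.
# The "* 2" multiplier is an arbitrary starting point to tune, not a rule.
$MaxThreads = [Environment]::ProcessorCount * 2

$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
$RunspacePool.Open()

# Confirm what the pool ended up with
"Pool state: {0}, max runspaces: {1}" -f $RunspacePool.RunspacePoolStateInfo.State, $RunspacePool.GetMaxRunspaces()
```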

The beauty here is you don’t have to worry about submitting too many jobs; the Runspace pool will manage that for you. Remember our loop above with 1000 elements? Go ahead and submit them all, and only up to $MaxThreads will run at a time.
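Here is a quick sketch that shows the throttling in action. It pushes six half-second scripts through a pool capped at 2, which is an arbitrary small cap chosen to make the effect visible. Submission finishes almost instantly, but the whole batch can’t finish in less than about 1.5 seconds, because only two can run at once.

```powershell
# Sketch: six half-second scripts through a pool capped at 2 runspaces.
# With at most 2 running at once this needs at least ~1.5 s (three rounds of two).
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 2)
$Pool.Open()

$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$Queued = ForEach ($N in 1..6) {
   $PS = [powershell]::Create().AddScript({ Start-Sleep -Milliseconds 500 })
   $PS.RunspacePool = $Pool
   # BeginInvoke() returns right away; the pool holds the work until a runspace is free
   New-Object PSObject -Property @{ Pipe = $PS; Handle = $PS.BeginInvoke() }
}
"All 6 submitted after {0} ms" -f $Timer.ElapsedMilliseconds

ForEach ($Q in $Queued) {
   $Q.Pipe.EndInvoke($Q.Handle) | Out-Null   # wait for each one in turn
   $Q.Pipe.Dispose()
}
$Timer.Stop()
"All 6 finished after {0} ms" -f $Timer.ElapsedMilliseconds
$Pool.Close()
```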

Now we need a script to run in the background, a variable to hold the handle we’ll use to track each background job, and something to hold a reference to the Powershell instance that runs it. I’ve seen a couple of different ways to do all of this, from a hashtable (which I’m not the biggest fan of) to a customized object. I like to keep things simple, so I’m just going to use my favorite object type, the PSObject.

$ScriptBlock = {
   Param (
      [int]$RunNumber
   )
   $RanNumber = Get-Random -Minimum 1 -Maximum 10
   Start-Sleep -Seconds $RanNumber
   $RunResult = New-Object PSObject -Property @{
      RunNumber = $RunNumber
      Sleep = $RanNumber
   }
   Return $RunResult
}

$Jobs = @()

Notice the Param section? Runspaces are like Powershell Jobs in that they are completely separated from the script and you have to pass arguments down to them. Now the meat:

$Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($argument1)
$Job.RunspacePool = $RunspacePool
$Jobs += New-Object PSObject -Property @{
   Pipe = $Job
   Result = $Job.BeginInvoke()
}

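First we create $Job as a Powershell instance. Then we use the AddScript() method to add our scriptblock to it, and another method, AddArgument(), to pass in our variable. Arguments are bound to the Param() block in the order you add them. Need to submit multiple arguments? Just keep chaining .AddArgument() on your line, or add more later through the job variable: [void]$Job.AddArgument($variable). The [void] is there because AddArgument() returns the Powershell object, which would otherwise end up in your output.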
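The sketch below shows both ways to hand two values to a runspace scriptblock. One uses chained AddArgument() calls, which bind by position. The other uses AddParameter(), which binds by name. $Server, $Port, 'fileserver01' and 445 are hypothetical names and values. Invoke() runs synchronously here, without a pool, just so you can check the binding quickly.

```powershell
# Sketch: two ways to pass two values into a runspace scriptblock.
# $Server/$Port and 'fileserver01'/445 are made-up names and values for illustration.
$Script = {
   Param ([string]$Server, [int]$Port)
   "{0}:{1}" -f $Server, $Port
}

# 1) Positional: chained AddArgument() calls bind in Param() order ($Server, then $Port)
$Job1 = [powershell]::Create().AddScript($Script).AddArgument('fileserver01').AddArgument(445)

# 2) By name: AddParameter() also returns the Powershell object, so [void] keeps it out of the output
$Job2 = [powershell]::Create().AddScript($Script)
[void]$Job2.AddParameter('Port', 445)
[void]$Job2.AddParameter('Server', 'fileserver01')

# Invoke() runs synchronously in a fresh runspace; handy for checking the binding
$Out1 = $Job1.Invoke()
$Out2 = $Job2.Invoke()
$Out1; $Out2
$Job1.Dispose(); $Job2.Dispose()
```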

After that we assign our pool to the job’s RunspacePool property. The last line uses a PSObject to store the relevant information. The Pipe property tracks the Powershell instance itself, and the Result property stores the handle returned by BeginInvoke(). BeginInvoke() returns immediately and queues the job in the pool, and the job starts running as soon as one of the pool’s threads is free.
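To see what BeginInvoke() actually hands back, try the small sketch below. The handle is a standard .NET IAsyncResult, which is where IsCompleted comes from. The Powershell instance also reports its own progress through InvocationStateInfo.State. The one-thread pool and the 300 ms sleep are arbitrary. The first line’s values depend on timing, so they aren’t predicted here.

```powershell
# Sketch: what BeginInvoke() hands back. The handle is a .NET IAsyncResult, and the
# Powershell instance reports its own progress through InvocationStateInfo.State.
$Pool = [RunspaceFactory]::CreateRunspacePool(1, 1)
$Pool.Open()

$Job = [powershell]::Create().AddScript({ Start-Sleep -Milliseconds 300; 'done' })
$Job.RunspacePool = $Pool
$Handle = $Job.BeginInvoke()

# Right after submission the job is usually still queued or running
"IsCompleted: {0}  State: {1}" -f $Handle.IsCompleted, $Job.InvocationStateInfo.State

$Output = $Job.EndInvoke($Handle)   # blocks until the script has finished
"IsCompleted: {0}  State: {1}" -f $Handle.IsCompleted, $Job.InvocationStateInfo.State
$Output

$Job.Dispose()
$Pool.Close()
```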

Watch It Go By

So we’ve defined a Runspace pool, we’ve defined our script in a scriptblock and we’ve submitted our job into the background. Now what? We need a mechanism to monitor the running jobs and see when they’ve completed. There’s a pretty easy way to do that: watch the IsCompleted property on each Runspace handle.

Write-Host "Waiting.." -NoNewline
Do {
   Write-Host "." -NoNewline
   Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false )
Write-Host "All jobs completed!"

We stored the Runspace handle in the Result property of each object in $Jobs, so that’s what we need to monitor. You could loop through the entire array of objects stored in $Jobs, or you can let the -contains operator go through the array for you. That lets us set up a simple Do loop that checks the IsCompleted property until every job reports $true. Note that $Jobs.Result.IsCompleted relies on member enumeration (reading a property across every item of an array), which only arrived in Powershell 3.0. In 2.0 the condition never sees the handles, so the loop won’t wait properly. I like to give a little feedback while it’s checking, too.
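If you need this to work on Powershell 2.0, one option is to pull IsCompleted out of each handle yourself before handing the list to -contains. The sketch below does that. The pool size of 3 and the five dummy jobs are only there to make it self-contained. The feedback dots are unchanged from the loop above.

```powershell
# Sketch: a wait loop that doesn't depend on Powershell 3.0 member enumeration.
# The pool size and the 1..5 dummy jobs only make the sketch self-contained.
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, 3)
$RunspacePool.Open()
$Jobs = @()
ForEach ($N in 1..5) {
   $Job = [powershell]::Create().AddScript({ Param ($N) Start-Sleep -Milliseconds 300; $N }).AddArgument($N)
   $Job.RunspacePool = $RunspacePool
   $Jobs += New-Object PSObject -Property @{ Pipe = $Job; Result = $Job.BeginInvoke() }
}

Write-Host "Waiting.." -NoNewline
# Pull IsCompleted out of each handle explicitly, so this also works on 2.0
While (@($Jobs | ForEach-Object { $_.Result.IsCompleted }) -contains $false) {
   Write-Host "." -NoNewline
   Start-Sleep -Milliseconds 250
}
Write-Host "All jobs completed!"
```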

Now that all of the background jobs are done, we need to get back the information they’ve collected. That’s why we kept the Powershell instance in the Pipe property of our $Jobs objects. It’s all in there; we just have to get it out.

$Results = @()
ForEach ($Job in $Jobs) {
   $Results += $Job.Pipe.EndInvoke($Job.Result)
}

We set up a loop to go through all of the elements in the $Jobs array of objects. For each one, the EndInvoke() method pulls the data out of the Runspace, and we add it to another variable, $Results. If a job hasn’t finished yet, EndInvoke() waits for it before returning.
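EndInvoke() gets the data out, but a couple of loose ends remain. First, non-terminating errors inside a runspace don’t stop anything and don’t show up on your console. They collect in each instance’s Streams.Error collection. Second, each Powershell instance and the pool itself should be released with Dispose() and Close() once you’re done. The sketch below handles both. Its dummy scriptblock writes an error for odd run numbers only so there is something to report, and the RunNum property is my own addition for labeling.

```powershell
# Sketch: collect results, surface errors, and release resources.
# The scriptblock is a dummy that writes an error for odd run numbers.
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, 2)
$RunspacePool.Open()
$Jobs = @()
ForEach ($N in 1..4) {
   $Job = [powershell]::Create().AddScript({
      Param ($N)
      If ($N % 2) { Write-Error "Run $N had a problem" }
      $N * 10
   }).AddArgument($N)
   $Job.RunspacePool = $RunspacePool
   $Jobs += New-Object PSObject -Property @{ RunNum = $N; Pipe = $Job; Result = $Job.BeginInvoke() }
}

$Results = @()
ForEach ($Job in $Jobs) {
   # EndInvoke() waits for this job if it is still running, then returns its output
   $Results += $Job.Pipe.EndInvoke($Job.Result)
   # Non-terminating errors don't throw here; they land in Streams.Error
   ForEach ($Err in $Job.Pipe.Streams.Error) {
      Write-Warning ("Run {0}: {1}" -f $Job.RunNum, $Err)
   }
   $Job.Pipe.Dispose()   # release the Powershell instance
}
$RunspacePool.Close()    # shut the pool down once everything is collected
$RunspacePool.Dispose()
```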

And that’s it. You’ve gone through the entire cycle of creating and running background jobs in Runspaces, the Surly way.

What about DFS Monitor?

Interesting you should bring that up. I, of course, immediately went to DFS Monitor to see if Runspaces would shave any time off of it, and they really didn’t! Hopefully you read Boe Prox’s post, Using Background Runspaces Instead of PSJobs For Better Performance, above. It shows that overall, Runspaces should improve your multithreading performance, but I actually saw several seconds added to my DFS Monitor run time! A lot of factors are involved, including how busy the server I’m querying is at the moment, and I didn’t have time to run extensive tests. I didn’t have time because I ran into a really bad problem!

Hashtables and Me

In DFS Monitor I use a hashtable to return multiple points of data from the background job to the primary script. This works just fine with a PS background job but did not work at all with a Runspace job! The hashtable came back as a weird PSCustomObject, and I had to use extra layers of dot notation to get to the information ($Result.Item.Status kinda stuff).

I’ll have to do some testing and whatnot to figure out what’s going on. Since DFS Monitor was one of my first Powershell scripts, it could very well be that I am not creating the hashtable correctly, and while a background job allows these rule breaks, .NET Runspaces don’t. Or it could be something else entirely. I’ll be doing some testing over the next couple of weeks to try to find out what’s happening, and I’ll report my results back once I have them.

Test Code

If you’re interested in trying out my test code, here it is.

cls
$Throttle = 5 #threads

$ScriptBlock = {
   Param (
      [int]$RunNumber
   )
   $RanNumber = Get-Random -Minimum 1 -Maximum 10
   Start-Sleep -Seconds $RanNumber
   $RunResult = New-Object PSObject -Property @{
      RunNumber = $RunNumber
      Sleep = $RanNumber
   }
   Return $RunResult
}

$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
$RunspacePool.Open()
$Jobs = @()

1..20 | % {
   #Start-Sleep -Seconds 1
   $Job = [powershell]::Create().AddScript($ScriptBlock).AddArgument($_)
   $Job.RunspacePool = $RunspacePool
   $Jobs += New-Object PSObject -Property @{
      RunNum = $_
      Pipe = $Job
      Result = $Job.BeginInvoke()
   }
}

Write-Host "Waiting.." -NoNewline
Do {
   Write-Host "." -NoNewline
   Start-Sleep -Seconds 1
} While ( $Jobs.Result.IsCompleted -contains $false)
Write-Host "All jobs completed!"

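$Results = @()
ForEach ($Job in $Jobs) {
   $Results += $Job.Pipe.EndInvoke($Job.Result)
   $Job.Pipe.Dispose()   # release the Powershell instance
}
$RunspacePool.Close()    # shut the pool down once everything is collected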

$Results | Out-GridView

Enjoy!

Follow-up: I made another post about multithreading the “Powershell” way, using Jobs.


February 11, 2013 - Posted by | PowerShell, Powershell - Performance

20 Comments »

  1. [...] wrote about multithreading using Runspace here, but I also wanted to talk about running them the Powershell way using jobs.  I want to make sure [...]

    Pingback by Multithreading Revisited – Using Jobs « The Surly Admin | March 4, 2013

  2. Thanks for the post!
    The formatting was hosed.
    Here are the ones that need to be fixed.
    $RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle) $RunspacePool.Open()
    Write-Host "All jobs completed!"

    ###### FIXED
    $RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $Throttle)
    $RunspacePool.Open()
    Write-Host “All jobs completed!”

    Comment by Derpo | May 4, 2013

    • Thanks for spotting that! Have updated the post.

      Comment by Martin9700 | May 4, 2013

  3. I don’t see how this is ever going to work:

    } While ( $Jobs.Result.IsCompleted -contains $false)

    $jobs is an array of disparate objects and you’re calling it like it’s a single object. This should be something like:

    } While ((($jobs | % { $_.result }) | Select -ExpandProperty IsCompleted) -contains $false)

    Comment by thepip3r | June 28, 2013

    • -Contains is a pretty cool feature (when you can get it to work) that will scan all the elements in an array without looping through the array.

      Comment by Martin9700 | July 8, 2013

      • that’s the point… $Jobs is an array of objects and you’re referencing an property on the entire array (which does not exist or work) and doesn’t work in the code you have posted either.

        Comment by thepip3r | July 8, 2013

      • It works in v3.. That’s one of the problems I had in v2.

        Comment by George | July 8, 2013

      • I see it now. My code is updated too, so I ran into the same problem just never updated the blog post. Here’s what I do now:

        While (@(Get-Job -State “Running”).count -gt 0)
        { Write-Debug “All threads submitted, waiting for them to finish…”
        Start-Sleep -Milliseconds 5000
        }

        Comment by Martin9700 | July 8, 2013

      • Ah… well, that explains it. I run in V2 always because that’s my environment.

        Comment by thepip3r | July 8, 2013

      • No, I think you were right. Besides, most people are still running 2.0 (at least most I’ve run into) so it’s good to update the code. I’ll put this in my todo list!

        Comment by Martin9700 | July 8, 2013

  4. Thank You! Works great with PowerShell v3, but not with v2… :-(

    Comment by George | July 8, 2013

    • Yah, I had some wonky results with v2 too. It did work, but data was coming back funny. It’s a tad slower, but I find myself sticking with Powershell Jobs instead. Very reliable and works across versions nicely.

      Comment by Martin9700 | July 8, 2013

  5. psasync module is recommended ?

    any good PS Jobs module ?

    Comment by kiquenet kiquenet | October 10, 2013

    • Don’t know of any, sorry!

      Comment by Martin9700 | October 10, 2013

  6. You aren’t properly disposing of your objects, which could lead to resource constraints. Also closing the pool is a best practice.

    $RunspacePool.Close() when finished using the pool.
    $Job.Dispose() after the async call has ended.

    Comment by Billy | January 26, 2014

  7. […] Another method of multithreading is runspaces. I haven’t had a chance to try them yet, but testing by others has shown they are faster than jobs, and they can pass variables between the job and the main script (presumably bypassing the deserialization concern). If you are interested in this, you can read more about it in Multithreading Powershell Scripts. […]

    Pingback by Weekend Scripter: PowerShell Speed Improvement Techniques - Hey, Scripting Guy! Blog - Site Home - TechNet Blogs | May 18, 2014

  8. I think this article is great. I have been trying to learn threading to better my it work and make it easier on myself. I do have one question. is it possible to pass two arguments to the Script block when you are creating $Job?

    Comment by zacharyshupp | June 26, 2014

    • If you’re using RunSpaces, just add an additional .AddArgument($myparam) to the [powershell]::Create line. If you’re using Jobs (and to be honest, that’s all I use these days, as I find them more reliable and easier to work with), just use -ArgumentList and separate your arguments with commas.

      Comment by Martin9700 | June 26, 2014

