Distributed Computing and PowerShell
A great little script request came across Spiceworks last week, something I've been looking forward to for a long time but never really thought I'd get a chance to do. Time to unlock the power of Remoting in PowerShell and dive into true Distributed Computing: not multi-threading, but Distributed Computing!
Scoping out a Script
It was a simple enough request. The user has a rendering program that uses distributed computing to split a picture into chunks and hand them to PCs that aren't busy. The problem is that the program isn't very good at keeping track of things and often leaves orphaned renders on the distributed PCs. We just need to visit those PCs, locate the render files and copy them back up to the server, where a manual "re-stitch" can be done.
At first glance, we could just take a list of the computer names, loop through them, use Get-ChildItem to scan the render directory on each one and copy what we find to the file server. That would work fine, but it would be pretty slow, especially since these files can be as large as 500 MB. How do we speed it up? Time to think about multi-threading. That would work too, but multi-threading big file copies doesn't save as much time as you'd hope: every byte still has to flow through the PC running the script, so that machine becomes the bottleneck. Faster, but still not as efficient as I'd like.
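For comparison, here is a minimal sketch of that sequential approach, assuming PowerShell 3.0 or later and that each PC's render folder is reachable through an admin share. The function name `Copy-RenderSequential`, its `-PathTemplate` parameter and the `\\{0}\C$\Renders` layout are hypothetical, not taken from the original script. Every file is read from the remote PC and written to the server by the machine running the script, which is exactly the bottleneck described above.

```powershell
# Hypothetical sequential baseline: visit each PC in turn and pull its render
# files back THROUGH the machine running this script.
function Copy-RenderSequential {
    param(
        [Parameter(Mandatory)] [string[]] $ComputerName,
        [Parameter(Mandatory)] [string]   $FileName,   # filter for render chunks, e.g. '*.png'
        [Parameter(Mandatory)] [string]   $CopyPath,   # destination folder/share on the server
        # Hypothetical admin-share layout; {0} is replaced by the computer name.
        [string] $PathTemplate = '\\{0}\C$\Renders'
    )
    foreach ($Computer in $ComputerName) {
        $Source = $PathTemplate -f $Computer
        if (-not (Test-Path -Path $Source)) {
            Write-Warning "Cannot reach $Source, skipping."
            continue
        }
        # One destination folder per PC so identically named chunks don't collide.
        $Destination = Join-Path -Path $CopyPath -ChildPath $Computer
        New-Item -Path $Destination -ItemType Directory -Force | Out-Null
        Get-ChildItem -Path $Source -Filter $FileName -Recurse -File |
            Copy-Item -Destination $Destination
    }
}
```

Each iteration waits for the previous PC's copy to finish, so the total time is roughly the sum of every transfer.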
The next thought, then, is: why have the PC running the script do all the work? Why not have each individual PC scan its own files and copy them straight up to the server? This seems much better! But it isn't without flaws either. We'd have some great distributed computing going on, but all of these PCs (as many as thirty!) would be copying their files at the same time. Now we've placed the burden on the render server, and unless it's truly designed for massive file serving, we're going to run into the same bottleneck the multi-threaded script would have.
That means we'll have to throttle how many distributed computers can work on this task at once. This will take some thinking.
I know I'm going to have to use Invoke-Command, since Remoting is how we'll accomplish this task, and it turns out that Invoke-Command has a handy parameter called -AsJob. With this parameter, PowerShell automatically turns the command into a PowerShell job and runs it in the background. This is the point where the whole script fell into place for me. I've done lots of multi-threading with jobs, including throttling how many background jobs can run at a time. By controlling how many jobs run, we control how many remote computers are copying files at the same time.
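A minimal sketch of that throttling idea, assuming PowerShell 3.0 or later. `Invoke-ThrottledJob` and its parameters are hypothetical names, not the code from the earlier multi-threading post. The helper only requires that the `-StartJob` scriptblock return a job object, so the same loop can drive `Start-Job` locally or `Invoke-Command -AsJob` remotely.

```powershell
# Hypothetical helper: run one job per input item, but never more than
# $MaxJobs at a time. It doesn't care HOW a job is started, only that the
# -StartJob scriptblock returns a job object.
function Invoke-ThrottledJob {
    param(
        [Parameter(Mandatory)] [object[]]    $InputObject,  # e.g. computer names
        [Parameter(Mandatory)] [scriptblock] $StartJob,     # gets one item, returns a job
        [int] $MaxJobs = 5                                  # concurrency cap
    )
    $Jobs = @()
    foreach ($Item in $InputObject) {
        # Wait for a free slot before submitting the next job.
        while (@($Jobs | Where-Object { $_.State -in 'NotStarted', 'Running' }).Count -ge $MaxJobs) {
            Start-Sleep -Milliseconds 250
        }
        $Jobs += & $StartJob $Item
    }
    # Wait for the remaining jobs, return everything they produced, then clean up.
    $Jobs | Wait-Job | Receive-Job
    $Jobs | Remove-Job
}

# Local usage: four tiny jobs, at most two running at once.
Invoke-ThrottledJob -InputObject 1, 2, 3, 4 -MaxJobs 2 -StartJob {
    param($n) Start-Job -ScriptBlock { param($x) $x * 10 } -ArgumentList $n
}

# Remote usage (hypothetical variables): same loop, but each job runs on the PC itself.
# Invoke-ThrottledJob -InputObject $ComputerNames -MaxJobs 5 -StartJob {
#     param($c) Invoke-Command -ComputerName $c -ScriptBlock $Scriptblock `
#         -ArgumentList $FileName, $SearchPath, $CopyPath -AsJob
# }
```

The only difference between the two usages is which cmdlet the `-StartJob` scriptblock calls. As an alternative design, Invoke-Command also accepts many computer names at once together with `-ThrottleLimit`, which caps concurrent connections inside a single job; the sketch keeps one job per PC instead, so each PC's outcome can be inspected on its own.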
But if we're going to use Remoting, we need to make sure our network is set up for it. First, all of the computers in question need PowerShell installed, and Remoting requires PowerShell 2.0 or later. Windows 7 and later are no problem because PowerShell 2.0 comes pre-installed, but XP and Vista will need it installed. Luckily, we can do that pretty easily with WSUS, Group Policy or logon scripts.
Next, we need to make sure Remoting is enabled on all of the machines, and luckily I had already written a "How-To" on Spiceworks covering exactly that (link here).
Last, we need the multi-threading code I've already talked about in an earlier post. As it turned out, I ended up changing ONE LINE of code to turn the script from a multi-threading script into a Distributed Computing one! One line.
The only change I had to make was replacing Start-Job (see the "Submit the Job" section of the Multi-Threading Revisited post) with Invoke-Command, like so:
Invoke-Command -ComputerName $Computer.Name -ScriptBlock $Scriptblock -ArgumentList $FileName,$SearchPath,$CopyPath -AsJob
Use the -ComputerName parameter to submit the job to that computer, while the -ScriptBlock and -ArgumentList parameters stay the same. Finally, add the -AsJob parameter to make the command a background job, and all of the other code we've used before to monitor and control background jobs works exactly as it did before. And you've done it. You're now using Distributed Computing with PowerShell, and it simply couldn't have been easier.
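The post doesn't show `$Scriptblock` itself, so here is a hypothetical sketch of what the remote half could look like, assuming the three arguments passed through `-ArgumentList` are a file filter, the local render folder and the server's share. Because it runs on each remote PC, the search is local and the copy goes straight to the server without passing through the machine that launched the job. One caveat: writing to a network share from inside a remote session runs into PowerShell's "second hop" credential restriction, so the copy may fail with access denied unless something like CredSSP authentication is configured.

```powershell
# Hypothetical $Scriptblock for the render job. It runs ON each remote PC, so
# the search is local to that machine and the copy goes directly to the server.
$Scriptblock = {
    param(
        [string] $FileName,    # filter for the orphaned render chunks, e.g. '*.png'
        [string] $SearchPath,  # local folder the renderer writes to
        [string] $CopyPath     # share on the render server
    )
    # One folder per PC on the server so identically named chunks don't collide.
    $Destination = Join-Path -Path $CopyPath -ChildPath ([System.Environment]::MachineName)
    New-Item -Path $Destination -ItemType Directory -Force | Out-Null
    Get-ChildItem -Path $SearchPath -Filter $FileName -Recurse -File | ForEach-Object {
        Copy-Item -LiteralPath $_.FullName -Destination $Destination
        $_.FullName   # report each copied file back to the calling job
    }
}
```

Because the scriptblock only writes file paths to its output, `Receive-Job` on the machine that launched the jobs returns a list of what each PC copied.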
When Can You Use It?
For me this was the biggest problem. I've been wanting to write a script like this ever since the potential of Remoting hit me several months ago. But to be honest, I've just never had a workload that lent itself to this kind of work. Sure, I could farm out Active Directory updates to several computers in the IT department, but it would be more show-and-tell than genuinely needed workload relief. I was so glad when this script request came along and I could finally give it a go. And discovering, as I wrote it, that I already had all of the control mechanisms written, and that they didn't even require any adaptation, was amazing!
If you come up with a great way to use Distributed Computing at your workplace, let me know. I'd love to hear about it!