ForEach Performance
It’s been a tough couple of weeks, let me tell you! I had a cold that eventually dropped into my lungs and became pneumonia. I don’t know about you, but I don’t have much interest in doing anything when I’m sick, not even PowerShell! Also, at work, we’re ramping up to migrate from our current ERP system to one from SAP, and I expect that will eat up a ton of time too. Not to mention the need to finish my Exchange 2010 migration, create a SharePoint 2010 test environment from our production one, and keep half a dozen other projects going at the same time. I hope to keep fitting my scripts in amongst all this, as I have to admit this is where my IT passion is right now.
New/Exiting User Automation
As must happen to nearly every Windows admin at some point, I’ve felt the need to automate the new user and termination processes, and the last week or so has been my time. Most of this work is so specific to the environment that not a lot is written about it out there, but I’d like to at least share some insights I’ve gained by doing it. I hope to get something written up in the next couple of weeks.
ForEach
What I really want to talk about today is ForEach. As I’ve mentioned in my “Getting Started” series, there are really two versions of ForEach in PowerShell: the ForEach-Object cmdlet and the ForEach statement. An interesting discussion came by recently about the relative performance of the two techniques. Here’s the link, and you have to read the comments section to really get into the nitty gritty of the argument. But I am who I am, and I have to do things myself to see the results. Time to revisit the Palindrome scripts and see what we get.
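For anyone who hasn’t seen the two forms side by side, here is a minimal sketch. The word list is made up for illustration and is not part of the test that follows.

```powershell
# Hypothetical sample data; any collection works.
$Words = 'level', 'apple', 'radar'

# ForEach-Object cmdlet: items arrive through the pipeline one at a time as $_.
$FromCmdlet = $Words | ForEach-Object { $_.ToUpper() }

# ForEach statement: the collection in the parentheses is evaluated first,
# then each item is bound to a named variable.
$FromStatement = ForEach ($Word in $Words) { $Word.ToUpper() }

$FromCmdlet
$FromStatement
```

Both produce the same upper-cased list; the difference is in how the items get to the script block.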
This turned out to be a pretty simple test: reuse the palindrome-finding scripts from my earlier performance series and adjust them to test ForEach both in the pipeline and as a statement.
    cls
    [regex]$Search = "^.{5}$"

    $Number = 0
    Measure-Command {
        Get-Content C:\Dropbox\dictionary.txt | Where { $_ -match $Search } | % {
            $Palindrome = $_ -split ""
            [array]::Reverse($Palindrome)
            $Palindrome = $Palindrome -join ""
            If ($_ -eq $Palindrome) {
                $Number ++
            }
        }
    }
    Write-Host "`n$Number palindromes found"

    $Number = 0
    Measure-Command {
        ForEach ($Word in (Get-Content C:\Dropbox\dictionary.txt | Where { $_ -match $Search })) {
            $Palindrome = $Word -split ""
            [array]::Reverse($Palindrome)
            $Palindrome = $Palindrome -join ""
            If ($Word -eq $Palindrome) {
                $Number ++
            }
        }
    }
    Write-Host "`n$Number of palindromes found"
The results were completely as expected, too. Using ForEach-Object in the pipeline, the script took 6.4 seconds, while the ForEach statement took just shy of 6 seconds. It’s not a huge difference in performance, but it’s definitely there, and the gap will get wider and wider as the dataset you need to work through grows.
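If you want to try the comparison without a dictionary file, something like the following sketch should work. It is not the test I ran: the in-memory range of numbers and the even-number check are stand-ins I made up so that disk reads don’t affect the timing, and the numbers you get will depend on your machine and PowerShell version.

```powershell
# Hypothetical in-memory data set, so reading a file is not part of the timing.
$Items = 1..200000

# Time the ForEach-Object cmdlet in a pipeline.
$CountPipeline = 0
$PipelineTime = Measure-Command {
    $Items | ForEach-Object { if ($_ % 2 -eq 0) { $CountPipeline++ } }
}

# Time the ForEach statement doing the same work.
$CountStatement = 0
$StatementTime = Measure-Command {
    ForEach ($Item in $Items) { if ($Item % 2 -eq 0) { $CountStatement++ } }
}

"ForEach-Object: {0:N0} ms ({1} matches)" -f $PipelineTime.TotalMilliseconds, $CountPipeline
"ForEach:        {0:N0} ms ({1} matches)" -f $StatementTime.TotalMilliseconds, $CountStatement
```

Printing the match counts next to the timings is a cheap sanity check that both loops did the same amount of work.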
Which do you use?
This is one of those decisions you have to make in PowerShell a lot. There are so many ways of doing things, and some are clearly faster than others, but should you always pick the fastest?
One of the beauties of PowerShell is its ability to create complex commands using a single line of code. You do this by using the pipeline and passing information down the line to the next cmdlet, and ForEach-Object is an integral part of that, so don’t forget that faster isn’t necessarily better. Also, how often are you going to be working with datasets where that performance is truly needed? Remember you can often shave far more time off your scripts by limiting the size of the dataset in the first place, which negates much of the advantage that the ForEach statement has.
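To illustrate limiting the dataset first, here is a small sketch. The word list is hypothetical, and I’ve swapped in ToCharArray() for the -split "" trick used in the test script, just to keep the example short.

```powershell
# Hypothetical word list standing in for the dictionary file.
$Words = 'level', 'apple', 'radar', 'palindrome', 'noon', 'civic'

# Filter first: only five-letter words ever reach the reversal step.
$Candidates = $Words | Where-Object { $_.Length -eq 5 }

# The more expensive work then runs on the smaller set.
$Palindromes = ForEach ($Word in $Candidates) {
    $Chars = $Word.ToCharArray()
    [array]::Reverse($Chars)
    if ($Word -eq (-join $Chars)) { $Word }
}

$Palindromes
```

The cheaper the filter and the more items it throws away, the less it matters which kind of loop does the remaining work.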
That said, I often like to use the ForEach statement because I find it easier to read and understand what’s going on. Using the pipeline and the $_ variable can often be confusing, especially if you have other people attempting to read your code (say you’re helping someone out on Spiceworks).
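Nested loops are where I find $_ gets hardest to follow. A minimal sketch, with server names and ports invented purely for the example:

```powershell
# Hypothetical server names and ports.
$Servers = 'web01', 'web02'
$Ports = 80, 443

# Pipeline version: the inner $_ hides the outer one, so the outer value
# has to be saved to a variable before the inner loop starts.
$FromPipeline = $Servers | ForEach-Object {
    $Server = $_
    $Ports | ForEach-Object { "${Server}:$_" }
}

# Statement version: every loop variable has its own name.
$FromStatement = ForEach ($Server in $Servers) {
    ForEach ($Port in $Ports) { "${Server}:$Port" }
}

$FromPipeline
$FromStatement
```

Both versions build the same list of server and port pairs, but in the second one you never have to work out which $_ you are looking at.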
In the long term, I think I will begin using the ForEach statement far more often, just to eke out that little bit of performance and to make my code that much more readable (win-win, really). But when trying to write a quick one-liner just to get something done, ForEach-Object (or its alias “%”) will still be my go-to.
What about you, what are your experiences?