The Surly Admin

Father, husband, IT Pro, cancer survivor

Powershell and String Searches

I read somewhere that Powershell was built for reliability, not for performance, and that’s really true.  I ran into this a lot on the DFS Monitor project, where running queries against 40,000 records in memory was taking 1.2 seconds or so.  But there are a few things we can do to improve performance if needed.  The simple fact is most chores you’ll be doing with Powershell will not run up against these performance limitations and you’ll be more than happy with your scripts.

But an interesting thread on Spiceworks came up and really gave me the opportunity to test some things, and I thought I’d talk about it here.

The original thread is here, and it includes pretty much the same information as in this post.  The original poster was looking for a way to have Powershell scan his DNS servers’ Event Logs for 2 particular events and notify him about them.  This turned out to be pretty easy and I got an opportunity to test out my RegEx skills.  Now, before you start emailing me on how to use RegEx, please let me inform you that I do not know it at all.  Well, maybe just a little bit.  Basically I know how to use the square brackets syntax and that’s about it!

The script ran great and the OP was pleased.  But it got me thinking.  I had read that defining RegEx in a variable first would actually speed up Powershell because it would compile the RegEx for later use, and I thought this would be a great way to test that statement.  And while I’m at it I can see how straight Powershell commands work too.  So I came up with a script to test the 3 ways you could search a large block of data.  The first way was using Powershell to do the comparing, the second was to use a RegEx search right in the Where clause, and the third way defined the RegEx before the search and then referenced that variable in the Where clause.  Here’s the script:

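cls
$DateLimit = (Get-Date).AddDays(-1)
# Anchored with ^ and $ so only 64 or 65 match, not 164 or 650
[regex]$Search = '^6[45]$'
# Load the last day of the Application log (timed separately from the searches)
Measure-Command { $Events = Get-EventLog -ComputerName mydomaincontroller -LogName Application -After $DateLimit }
# 1. Straight Powershell comparison
Measure-Command { $Events | Where { $_.eventID -eq "64" -or $_.eventID -eq "65"} | Select eventID,Message }
# 2. Inline RegEx (-match is the RegEx operator; -like only does wildcards)
Measure-Command { $Events | Where { $_.eventID -match '^6[45]$'} | Select eventID,Message }
# 3. RegEx defined ahead of time in $Search
Measure-Command { $Events | Where { $_.eventID -match $Search} | Select eventID,Message }

The $DateLimit and $Search lines set our variables.  Get-EventLog has the ability to only return the date range you want, so the $DateLimit variable is for that.  $Search is where we define the RegEx before the search.

About that RegEx, what is it doing?  Any letter or number you have in the string is an exact match, then the brackets define the next character and are essentially an “or”.  The ^ and $ pin the match to the start and end of the value; without them the pattern would also match any event ID that merely contains 64 or 65, such as 164 or 650.  That means this simple little RegEx is looking for exactly 64 or 65.  You can also use a range, meaning ^6[4-6]$ would match 64, 65 and 66.  Note that RegEx patterns go with the -match operator; -like uses simpler wildcard rules.  Of course, there’s a TON more to RegEx but I won’t be talking more about it until I start understanding it better myself!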
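To see the bracket syntax in action without touching an event log, here is a minimal sketch run against a handful of made-up IDs.  The sample values and variable names are assumptions for illustration only; the point is how an unanchored pattern, an anchored pattern, a range, and the wildcard-based -like operator each behave.

```powershell
# Made-up sample IDs, chosen to include near-misses like 164 and 650
$ids = 64, 65, 66, 164, 650

# With a collection on the left, -match and -like return the matching elements.
# Unanchored: matches any value that contains 64 or 65 somewhere
$loose = $ids -match '6[45]'

# Anchored: ^ is the start of the string and $ is the end
$exact = $ids -match '^6[45]$'

# A range inside the brackets: 64, 65 or 66
$range = $ids -match '^6[4-6]$'

# -like uses wildcard rules: brackets work, but the whole value must match
$wild = $ids -like '6[45]'

"Unanchored: $($loose -join ', ')"
"Anchored:   $($exact -join ', ')"
"Range:      $($range -join ', ')"
"Wildcard:   $($wild -join ', ')"
```

The unanchored pattern picks up 164 and 650 as well, which is why anchoring matters when you want exact event IDs.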

Measure-Command is a great little tool for measuring the length of time a scriptblock (the code between the curly brackets, or braces) takes to run.  Since the Eventlog loading is going to take a fixed amount of time no matter what search we do, I run that separately and measure its time, just for grins.  Then we do our Powershell search, then the “inline” RegEx and finally the RegEx search using the $Search variable.  The results were pretty interesting.

Log     Load     Powershell            Inline RegEx        Precast RegEx
1 Day   1m 43s   160.76 milliseconds   8.72 milliseconds   10.32 milliseconds
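For anyone wondering where millisecond figures like these come from: Measure-Command hands back a TimeSpan object, and its TotalMilliseconds property gives the duration as one number.  Here is a small sketch using a stand-in workload (the doubling loop and the variable names are just placeholders, not part of the original test):

```powershell
# Measure-Command returns a System.TimeSpan and discards the scriptblock's output
$elapsed = Measure-Command {
    # Stand-in workload: double 10,000 numbers
    $numbers = 1..10000 | ForEach-Object { $_ * 2 }
}

# TotalMilliseconds reports the whole duration as a single value
'{0:N2} milliseconds' -f $elapsed.TotalMilliseconds

# The scriptblock runs in the current scope, so $numbers is still here afterwards
$numbers.Count
```

That last point is what lets a script load $Events inside one Measure-Command and then search it in the next.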

First thing they show: even on a relatively light record load, straight Powershell incurs a pretty heavy performance burden when compared to RegEx.  That said, on this many records (only a few hundred) even 161 milliseconds isn’t that bad and you’d be happy with your script at the end of the day.  What was interesting was how the Precast RegEx was actually slower.  So much for compiling the RegEx and being faster.  I can’t help but wonder if this is partially overhead caused by Powershell having to retrieve the contents of $Search before it actually runs the RegEx?  One way to find that out is to run it on a bigger dataset and see what the results are.

To do that, we have to slightly modify the eventlog load to not limit it to a single day’s worth of data, but instead tell it to load everything.  Here’s how to do that:

Measure-Command { $Events = Get-EventLog -ComputerName mydomaincontroller -Log Application }

As you can see, we just remove the -After parameter and run the script again.  Here are the results:

Log              Load     Powershell                        Inline RegEx                  Precast RegEx
41,973 Records   1m 42s   1m 54s (114025.25 milliseconds)   4.8s (4873.32 milliseconds)   4.9s (4912.92 milliseconds)

As you can see, the log load time is pretty much the same.  What this suggests is that Get-EventLog is loading all of the records from the server and then filtering them before passing them to the variable.  So -After is not something you would use for performance, at least not for the Get-EventLog cmdlet itself.  However, it will very much affect your performance afterwards by limiting the amount of data you have to work with.
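If the load time itself is the bottleneck, one option that was not part of this test is Get-WinEvent with -FilterHashtable, which applies the filter while the events are being retrieved instead of afterwards.  The sketch below is untested against the server in this post; the function name Get-RecentAppEvent is hypothetical, and mydomaincontroller is the same placeholder used in the script.

```powershell
# Hypothetical helper, not from the original script
function Get-RecentAppEvent {
    param(
        [string]$ComputerName = 'mydomaincontroller',   # placeholder server name
        [int[]]$Id = @(64, 65),
        [datetime]$After = (Get-Date).AddDays(-1)
    )

    # LogName, Id and StartTime are filter keys that Get-WinEvent understands
    $Filter = @{
        LogName   = 'Application'
        Id        = $Id
        StartTime = $After
    }

    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $Filter |
        Select-Object Id, Message
}

# Usage (needs Windows and a reachable server):
# Get-RecentAppEvent -ComputerName mydomaincontroller
```

Whether this actually beats the 1m 42s load above would need its own Measure-Command run.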

The Powershell search was pretty scary too.  It actually took longer for Powershell to search those 42,000 records than it did for it to load those records from a remote server!

So Inline RegEx is still the performance champ, but the precast RegEx is starting to catch up: it was about 18% slower on the small dataset and less than 1% slower here.  That hints that if the dataset gets big enough the precast RegEx variable might pull ahead, but the overhead of Powershell using the variable is substantial.
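One thing worth knowing here: casting a string to [regex] builds a Regex object with no options set, so it does not ask .NET to compile the pattern.  If you want to experiment with a truly compiled RegEx, a sketch like the following is one way to do it.  The variable names and the 1..1000 sample data are assumptions, and I have not measured whether it is faster on the event log data from this post.

```powershell
# The plain cast: Options comes back as None
$Search = [regex]'^6[45]$'
$Search.Options

# Explicitly request compilation through the Regex constructor
$Compiled = New-Object System.Text.RegularExpressions.Regex -ArgumentList '^6[45]$', ([System.Text.RegularExpressions.RegexOptions]::Compiled)
$Compiled.Options

# Call the object's IsMatch method directly against made-up sample IDs
$hits = foreach ($id in 1..1000) {
    if ($Compiled.IsMatch([string]$id)) { $id }
}
$hits
```

Wrapping each variant in Measure-Command against the same $Events collection would show whether compilation makes a real difference.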

What’s the takeaway here?  If you’re working with more than a few hundred records and need to do a string search/match, you will want to consider using RegEx; in these tests it was significantly faster than native Powershell.  That said, RegEx is its own language and relies heavily on some pretty arcane methodology.  I’ve only begun to scratch the surface of it and it makes my head spin!
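If you want to try the comparison yourself without a domain controller handy, here is a rough sketch that runs the same two kinds of search against 40,000 synthetic records.  The fake records, their property names and the result variables are all assumptions for illustration, and timings will vary with your Powershell version and hardware, so don’t expect the numbers above.

```powershell
# Build 40,000 stand-in records with event IDs cycling from 0 to 99
$Events = foreach ($i in 1..40000) {
    [pscustomobject]@{ EventID = $i % 100; Message = "Synthetic event $i" }
}

# Straight Powershell comparison
$CompareTime = Measure-Command {
    $ByCompare = @($Events | Where-Object { $_.EventID -eq 64 -or $_.EventID -eq 65 })
}

# Inline RegEx, anchored so only 64 and 65 match
$RegexTime = Measure-Command {
    $ByRegex = @($Events | Where-Object { $_.EventID -match '^6[45]$' })
}

'Comparison: {0:N2} ms, {1} matches' -f $CompareTime.TotalMilliseconds, $ByCompare.Count
'RegEx:      {0:N2} ms, {1} matches' -f $RegexTime.TotalMilliseconds, $ByRegex.Count
```

Both searches should return the same number of matches; only the elapsed time should differ.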

And remember:  “If you have a problem you need to use RegEx to solve, you really have two problems!”


October 8, 2012 | Powershell - Performance


