The Surly Admin

Father, husband, IT Pro, cancer survivor

More on String Performance

Was reading another blog from Tome Tanasovski and he was doing a simple series on learning Powershell.  One of his suggestions was taking a large dictionary file and finding all of the five letter words and locating the Palindromes.  I thought this would be interesting, and another opportunity to look at performance and Powershell.

Locating 5 Letter Words

Assuming you got the dictionary text from Tome’s web page (here if you didn’t), how do we locate all of the 5 letter words?  A common property of any string is length and we can simply check that property against every word in the dictionary.  But as we discovered in this post, using Powershell to do a massive amount of string searches isn’t very efficient and I wondered if we could write a RegEx that would perform better?  My RegEx foo is not very strong, but I managed to come up with “^…..$” where the carot designates the beginning of the string, and the “.” means any character, repeated 5 times comes and then a “$” to tell RegEx to look no further.  I then fired up Expresso and discovered I could also notate it like so:  “^.{5}$”.

One other thing, when using the Get-Content cmdlet you don’t have to load the full results into a variable.  In fact, there will definitely be times when you don’t want to do it.  You just want to scroll through the file, get the data you want and get out again.  In Powershell you do this by piping the results of Get-Content into another cmdlet.  Powershell will now take an entry from Get-Content and send it down the line before loading the next line, and so on.

$Count = 0
Measure-Command {
Get-Content C:\Utils\dictionary.txt | Where { $_.Length -eq 5 } | % { $Count++} }
Write-Host "Number 5 letter words found: $Count"
$Count = 0
Measure-Command {
[regex]$Search = "^.{5}$"
Get-Content C:\Utils\dictionary.txt | Where { $_ -match $Search }| % { $Count++} }
Write-Host "Number 5 letter words found: $Count"

And the results are:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 502
Ticks             : 125025185
TotalDays         : 0.000144705075231481
TotalHours        : 0.00347292180555556
TotalMinutes      : 0.208375308333333
TotalSeconds      : 12.5025185
TotalMilliseconds : 12502.5185

Number 5 letter words found: 8636

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 63
Ticks             : 120636685
TotalDays         : 0.000139625792824074
TotalHours        : 0.00335101902777778
TotalMinutes      : 0.201061141666667
TotalSeconds      : 12.0636685
TotalMilliseconds : 12063.6685

Number 5 letter words found: 8636

There are 80,368 records in the dictionary folder, and it took both versions of the code about 12 seconds to go through them all and locate the 5 letter words.   Using RegEx I was able to shave off about half a second off of the search, so if we extrapolate that out to 1,000,000 records we’d end up saving about 25 seconds, not too bad.  Still, not a massive improvement and most scripts you run won’t have this many records so using the .Count property is perfectly valid.

Reverse the string

There are 2 ways to reverse the string, one is a simple loop where you loop through the string backwards and reconstruct it, and another is to use a .NET method called Reverse().  The problem with reverse is that it is a method that only works on arrays, not strings so we have to take our string and convert it to an array, reverse it and then convert it back.  Here’s an example:

$a = "This is a Test"
$arr = $a -split ""
[array]::Reverse($arr)
$a = $arr -join ""

But I wasn’t sure if that was really faster or not.  So time to use the dictionary.txt again and reverse my 8600 5 letter words and measure the results!   I decided to load that into a variable outside of the Measure-Command cmdlet, this will give us a clearer idea of how the variable reverse strategies work without adding the dictionary load into the mix.  Here’s the testing script I came up with.

[regex]$Search = "^.{5}$"
$dic = Get-Content C:\Utils\dictionary.txt | Where { $_ -match $Search }
$Number = 0
Measure-Command { $dic | % {
$Palindrome = $_ -split ""
[array]::Reverse($Palindrome)
$Palindrome = $Palindrome -join ""
If ($_ -eq $Palindrome)
{ $Number ++
}
}
}
Write-Host "`n$Number palindromes found"
$Number = 0
Measure-Command { $dic | % {
$Palindrome = ""
$Word = $_   $Word.Length..1 | % {
$Palindrome += $Word[$_ - 1]
}
If ($Word -eq $Palindrome)
{ $Number ++
}
}
}
Write-Host "`n$Number of palindromes found"

And here’s the results:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 813
Ticks             : 8134303
TotalDays         : 9.4147025462963E-06
TotalHours        : 0.000225952861111111
TotalMinutes      : 0.0135571716666667
TotalSeconds      : 0.8134303
TotalMilliseconds : 813.4303
24 palindromes found

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 3
Milliseconds      : 535
Ticks             : 35352974
TotalDays         : 4.09177939814815E-05
TotalHours        : 0.000982027055555556
TotalMinutes      : 0.0589216233333333
TotalSeconds      : 3.5352974
TotalMilliseconds : 3535.2974
24 of palindromes found

Pretty impressive results!  The [array]::Reverse() method managed to reverse those 8600 strings in about three-quarters of a second, while looping through the string and rebuilding it took a whopping 3 and a half seconds!

Fun little palindrome exercise, and got to learn a little bit more about RegEx and performance using .NET methods.

Advertisements

November 1, 2012 - Posted by | Powershell - Performance | , ,

3 Comments »

  1. Happy to inspire a fun diversion into perf. I’m fairly certain it’s the += that is adding the overhead, but I have not had the time to play with it myself.

    One note about your post: the code you have posted is losing the quotes “” – they’re coming up as "

    Comment by Tome | November 2, 2012 | Reply

    • sorry, writing & quot is showing &quot 🙂 I meant to say they’re coming up as & quot (without the space).

      Comment by Tome | November 2, 2012 | Reply

      • LOL, no problem. Yes, WordPress always does that to me and haven’t figured out a fix yet (support didn’t even bother responding). I usually have to go back and re-edit and completely forgot to! Looks fine in preview too, which is a pain!

        Comment by Martin9700 | November 2, 2012


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: