Objects and Hashtables
I expect, if you got the right group of people together, you could have a good old Mac vs PC style argument over the use of PowerShell objects (PSObjects) and hashtables. And I’ll be honest: while I’ve used hashtables a lot for splatting, I’ve used them very little for anything else. Time to look at the two and figure out which is better, once and for all!
What Brought This On?
You might be tempted to ask. The real trigger for using hashtables came with the Get-SAADManager function I posted about earlier this week. The problem I ran into was that I wanted to find out what depth a manager was at in the organization chart when the only information I had was who the manager of a particular user was. How do you determine who the top dog is? The methodology I came up with was that when I found a manager I would add 1 to their depth. I would then find their manager, add 1 to that manager’s depth, and continue up the tree until I found a manager who didn’t have a manager, which in theory should be the CEO or President of the company. It’s not a perfect system, because it breaks down if you don’t have a traditional organization chart (for instance, several people at the same top level), but for most cases the function will run into, it should work.
But how to move up the tree efficiently? If I store everything in a PSObject, as I normally would, searching for the next manager up can be pretty painful, since it means piping into the Where cmdlet. A good rule of thumb is to do your filtering as far to the left as you can, avoiding the pipeline if possible. There would be so much recursive searching of a PSObject (technically an array of objects) that performance was really going to suffer. But can hashtables solve this problem for us?
What’s the Difference?
To understand the performance issues we have to look deeper at what makes them different and how you use them. PSObjects are really more of a two-dimensional construct, in that you have an object with multiple, fixed properties and values for those properties. Hashtables are more one-dimensional, having only a “key” and a “value”. You reference elements in an array of PSObjects by using an index number, such as $Array[37], and you reference entries in a hashtable by using the key. But the key can be anything from a number to a name, such as $Hash["George"].
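To make the indexing difference concrete, here is a minimal sketch. The names and departments are made-up sample data, used only for illustration:

```powershell
# An array of PSObjects is addressed by position (index number)
$Array = @(
    (New-Object PSObject -Property @{ Name = "George"; Department = "Sales" }),
    (New-Object PSObject -Property @{ Name = "Martha"; Department = "IT" })
)
$Second = $Array[1]        # the object sitting at index 1

# A hashtable is addressed by key
$Hash = @{
    George = "Sales"
    Martha = "IT"
}
$Dept = $Hash["George"]    # the value stored under the key "George"
```

With the array you need to know where an item sits (or search for it); with the hashtable you only need to know its key.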
Building a Hashtable
You’ve probably done it quite a bit and may not even realize it. If you’ve used splatting, and I use it a lot, then you’ve already built hashtables.
    $Hash = @{
        To = "someone@somedomain.com"
        From = "martin9700@thesurlyadmin.com"
        Subject = "Just testing"
    }
Here we’ve created a hashtable with 3 keys and 3 values. $Hash["To"] would reference the value “someone@somedomain.com”, and $Hash["Subject"] would get you “Just testing”. But there’s another way to build a hashtable.
    $Hash = @{}
    $Hash.Add("To","someone@somedomain.com")
    $Hash.Add("From","martin9700@thesurlyadmin.com")
    $Hash.Add("Subject","Just testing")
Does the exact same thing. I happen to like the code from the first example a little better as it’s just prettier to look at. The problem is that with a hashtable you have a key and one value, but what if you want to store multiple values for each element–a not uncommon need?
PSObjects Are Better
This is where PSObjects really shine. Because a PSObject can hold many properties and values per element, you can easily store information in them for later manipulation, especially for reporting. The other thing that PSObjects can do, that hashtables can’t, is be used directly with other cmdlets. Want to take your PSObject and save it to CSV? Just pipe it directly into Export-CSV and you’re done. Now, I know I said hashtables can’t do it, and that’s not actually true. It is possible to do it with a hashtable, but it takes a little more work and a much better understanding of the techniques you need to use with hashtables. Which is why I don’t use them to store my data. Why convert when I can just use a PSObject and be done with it?
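For the curious, here is a rough sketch of the “little more work” involved. It reuses the splatting hashtable from earlier and shows two possible conversions; it uses ConvertTo-Csv so nothing is written to disk, on the assumption that you would swap in Export-Csv with a path of your choosing:

```powershell
$Hash = @{
    To      = "someone@somedomain.com"
    From    = "martin9700@thesurlyadmin.com"
    Subject = "Just testing"
}

# Option 1: enumerate the entries, giving one CSV row per key/value pair
$CsvByEntry = $Hash.GetEnumerator() |
    Select-Object Key, Value |
    ConvertTo-Csv -NoTypeInformation

# Option 2: turn the whole hashtable into a single PSObject,
# giving one CSV row with one column per key
$Obj = New-Object PSObject -Property $Hash
$CsvByObject = $Obj | ConvertTo-Csv -NoTypeInformation
```

Note that a plain hashtable does not promise any particular key order, so the rows in option 1 and the columns in option 2 may not come out in the order you typed them.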
Or Are They?
I would say that for most scripts you will not really need hashtables. Most of the time we’re gathering data, processing it sequentially and outputting it for consumption, or we just process it and finish. PSObjects shine in these operations. But what if we need to dynamically search through the data? A lot? Suddenly hashtables jump to the forefront, and it all comes back to the key and the value.
Remember how I said hashtables are one-dimensional? While that is true, that doesn’t mean the value of the hashtable has to be one-dimensional. What is this craziness of which you speak?! This goes back to the fact that PowerShell is very flexible and doesn’t much care what information you put in a variable, or in the value of a hashtable for that matter. What if you put a PSObject in the value of the hashtable? It would work, and suddenly you have the best of both worlds! And for Get-SAADManager that’s exactly what I needed.
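A minimal sketch of the idea, using two made-up users (the account names and property names here are illustrative assumptions, not the actual Get-SAADManager data):

```powershell
# Key = account name, Value = a PSObject carrying several properties
$Users = @{}
$Users["jsmith"] = New-Object PSObject -Property @{
    Name         = "John Smith"
    Manager      = "mjones"
    ManagerDepth = 0
}
$Users["mjones"] = New-Object PSObject -Property @{
    Name         = "Mary Jones"
    Manager      = $null
    ManagerDepth = 0
}

# Jump straight to a record by key, then work with its properties
$Users["mjones"].ManagerDepth++

# Follow the Manager property of one record to look up another record
$BossName = $Users[$Users["jsmith"].Manager].Name
```

The lookup is by key, so there is no pipeline and no Where involved, yet each record still has as many properties as you care to give it.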
But first, we must test some of this crazy stuff and see how much better (if at all) hashtables can do this work. For this test I’m going to break out our old friend dictionary.txt and pull off the first 1000 words from it and load that into a PSObject and a hashtable along with a number that corresponds with the location that word is in the dictionary. Then I’m going to do 50 random searches inside each object and see who can get the searching done faster, and by how much. Here’s the code:
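    # Load the word list and set up an empty array and an empty hashtable
    $dic = Get-Content c:\utils\dictionary.txt
    $TestObj = @()
    $TestHash = @{}

    # Build an array of 1000 PSObjects (word plus its position)
    $ObjBuildTime = Measure-Command {
        0..999 | % {
            $TestObj += New-Object PSObject -Property @{
                Key = $dic[$_]
                WordNumber = $_
            }
        }
    }

    # Build a hashtable of the same 1000 words (key = word, value = position)
    $HashBuildTime = Measure-Command {
        0..999 | % {
            $TestHash.Add($dic[$_],$_)
        }
    }

    # 50 random lookups against the array, filtering through the pipeline
    $RandomObjSearch = Measure-Command {
        1..50 | % {
            $Num = Get-Random -Minimum 0 -Maximum 999
            $Found = $TestObj | Where { $_.Key -eq $dic[$Num] }
        }
    }

    # 50 random lookups against the hashtable, directly by key
    $RandomHashSearch = Measure-Command {
        1..50 | % {
            $Num = Get-Random -Minimum 0 -Maximum 999
            $Found = $TestHash[$dic[$Num]]
        }
    }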
The results are pretty interesting.
    Action                    Time
    ------                    ----
    Object Build Time         831 ms
    Hash Build Time           151 ms
    Random Search: Objects    9 s 85 ms (9085 ms)
    Random Search: Hash       29 ms

As you can see, the hashtable outperforms the object in every way: not only in build times, by a wide margin (though honestly both are so fast that it doesn’t really matter), but above all in search times, where the hashtable was roughly 300 times faster (29 ms versus 9085 ms). In fact, the gap in search times is so gigantic in favor of hashtables that I was pretty surprised when I saw it, and I was expecting it!
Get Manager Depth
So let’s tie this up with a real-life scenario. With Get-SAADManager I wanted to know the Manager Depth, and that meant a lot of jumping around in the dataset: first I had to identify a manager and add 1 to their Manager Depth, then I had to jump to their manager and add 1, and work my way up to the CEO, adding all the way. The first step was loading a hashtable with everything I needed in it, so the key became the user’s SamAccountName and the value was a PSObject with all of the other data I needed in it, including Manager Depth.
So how do we identify a manager? Turns out it’s pretty easy. Get-ADUser can return a property called DirectReports, which is an array of every user that has that user set as their manager. So all I had to do was get a count off of that array and detect any user that had a DirectReportCount greater than 0.
    ForEach ($Manager in ($Result.Values | Where { $_.DirectReportCount -gt 0 })) {
        AddManagerDepth $Manager.SAN
    }
One thing about hashtables is you can use the .Values property to list all of the values and then simply use dot notation to reach a particular property of each value (since we have a PSObject in the value, it will have properties), in this case the DirectReportCount field I had built earlier in the script. I then call a function called AddManagerDepth and send it the manager’s SamAccountName (remember, that’s the key in our hashtable). Here’s the function:
    Function AddManagerDepth {
        Param (
            [string]$ManagerSAN
        )
        # Add 1 to this manager's depth
        $Script:Result[$ManagerSAN].ManagerDepth++
        # If this manager has a manager of their own, keep climbing
        If ($Script:Result[$ManagerSAN].Manager) {
            AddManagerDepth $Script:Result[$ManagerSAN].ManagerSAN
        }
    }
I decided to just manipulate the hashtable outside the scope of the function (using $script:) for a couple of reasons. First, it was just easier than passing the whole hashtable as a parameter, but I was also thinking about memory. On large Active Directory trees this hashtable could be pretty large and pretty deep, and we’re going to be calling this function recursively, so I didn’t want to risk ending up with a LOT of copies of the hashtable in memory. (Strictly speaking, a hashtable is a reference type, so passing it as a parameter hands the function a reference rather than a copy; the script-scope approach mainly keeps the function signature simple.)
Now, to the magic of hashtables. Since our key is the manager’s SamAccountName, and that’s what we passed to the function, it’s a simple matter to find the entry we want by referencing that key: $Result[$ManagerSAN]. And as easy as that, we’re on the correct entry. The “++” adds 1 to that field, then we check whether that user has something set in their Manager field, and if they do we call the AddManagerDepth function again, passing the SamAccountName of the user’s manager, and repeat. The function will crawl its way up the tree until that Manager field is empty, which in theory should be the CEO/President/Owner.
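To see the crawl in action without Active Directory, here is a self-contained sketch. The four users, the tiny org chart, and the function name Add-SampleManagerDepth are all hypothetical stand-ins; it is a simplified version of the approach, not the actual Get-SAADManager code:

```powershell
# Hypothetical org chart: alice is at the top, bob reports to alice,
# carol and dave report to bob
$SampleUsers = @(
    @{ SAN = "alice"; ManagerSAN = $null;   DirectReportCount = 1 },
    @{ SAN = "bob";   ManagerSAN = "alice"; DirectReportCount = 2 },
    @{ SAN = "carol"; ManagerSAN = "bob";   DirectReportCount = 0 },
    @{ SAN = "dave";  ManagerSAN = "bob";   DirectReportCount = 0 }
)

# Key = SamAccountName, Value = PSObject with everything else
$Script:Result = @{}
foreach ($Row in $SampleUsers) {
    $Row.ManagerDepth = 0
    $Script:Result[$Row.SAN] = New-Object PSObject -Property $Row
}

function Add-SampleManagerDepth {
    param ([string]$ManagerSAN)
    # Add 1 to this manager, looked up directly by key
    $Script:Result[$ManagerSAN].ManagerDepth++
    # Climb to the next manager up, if there is one
    $Next = $Script:Result[$ManagerSAN].ManagerSAN
    if ($Next) { Add-SampleManagerDepth $Next }
}

# Start a crawl from every user who has at least one direct report
$Managers = @($Script:Result.Values | Where-Object { $_.DirectReportCount -gt 0 })
foreach ($Manager in $Managers) {
    Add-SampleManagerDepth $Manager.SAN
}
```

With this sample data alice ends up with a ManagerDepth of 2 (once for herself, once when the crawl from bob reaches her), bob ends up with 1, and carol and dave stay at 0. Every step up the tree is a single key lookup rather than a search.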
Conclusion
Based on this, I’ll pretty much be using hashtables from now on, right? Probably not. I like my PSObjects, and the fact that I can use them with other cmdlets without having to manipulate anything is a huge boon. Also, I don’t have a lot of call for scripts that need to randomly access the dataset I’m building, one record at a time. Notice that when I originally wanted to find all of the managers in my Active Directory, even with the hashtable, I still had to use a Where cmdlet on the dataset so I could get more than 1 record returned, so the hashtable did nothing for me in that instance.
But I love the combination of the two: a hashtable with a PSObject as the value. Notice how I was able to access the property values of my PSObject from the hashtable? $Result[$key].ManagerDepth. Very cool, and it allowed me to use all the great things about PSObjects along with the speed and flexibility of the hashtable. When I was done, I was also able to easily use my hashtable just like a normal PSObject by using the .Values property.
$Result.Values | Out-Gridview
Would make that hashtable work exactly like a normal PSObject. Best of both worlds!
So, like just about everything in PowerShell, what’s best depends on you, your preferences and the exact circumstances you’re working with.
As we antiques used to say, “Cool!”. I use hash tables a lot. And, just like Martin saves PSObjects in the Value property, you can also save hash tables in there. I tend to use hash tables where the primary “audience” is the script and PSObject when the intended user is a human.
One enormous advantage to hash tables is that you can save them to a PowerShell formatted XML file (Export-Clixml) and then when you import them later, they will be fully restored as hash tables.
Art, good to hear from you! You can do the same with objects, they just transform from a PSObject to a PSCustomObject but still function identically. I do this with the DFS Replication Monitor.