The Surly Admin

Father, husband, IT Pro, cancer survivor

Additional Thoughts on Hashtables

I recently started looking at the subreddit for PowerShell, and I’ve seen a lot of people using hashtables for just about everything.  I’ve even seen a few people using them over at PowerShell.com, and in both cases the usage was dubious at best.  Here are my thoughts on it, and why hashtables are usually not the right choice.

I’ve written about this before here, so I won’t go into the technical aspects of why hashtables may or may not be better than an array of objects.  But one of the key things to keep in mind is that hashtables don’t pipe into other cmdlets the way you’d expect: the whole hashtable goes down the pipeline as a single object, so Sort-Object, Export-Csv and friends never see your individual entries.  This is huge.  That means in order to pass on your objects or to use any other cmdlet you must first convert your hashtable to an array of objects.  In most use cases, why not simply begin with the array and save yourself the hassle?
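To see the difference for yourself, here is a minimal sketch.  The server names and numbers are made up for illustration; the point is only that a hashtable is not enumerated by the pipeline unless you ask for it:

```powershell
# Hypothetical sample data: server name -> free disk space in GB
$Hash = @{ 'SERVER01' = 120; 'SERVER02' = 45; 'SERVER03' = 300 }

# The hashtable is passed along as ONE object, not three entries
$PipedCount = ($Hash | Measure-Object).Count                  # 1

# GetEnumerator() hands the entries to the pipeline one at a time
$EntryCount = ($Hash.GetEnumerator() | Measure-Object).Count  # 3

# An array of objects needs no conversion step at all
$Servers = @(
    New-Object PSObject -Property @{ Name = 'SERVER01'; FreeGB = 120 }
    New-Object PSObject -Property @{ Name = 'SERVER02'; FreeGB = 45 }
    New-Object PSObject -Property @{ Name = 'SERVER03'; FreeGB = 300 }
)
$Lowest = $Servers | Sort-Object FreeGB | Select-Object -First 1
$Lowest.Name   # SERVER02
```

The array version sorts, filters and exports directly, which is why it is usually the easier starting point for plain data gathering.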

Another technique I’ve seen is building a hashtable in a variable, only to turn it straight into an object that gets added to an array:


$Object = @()
$Hash = @{
    "Property1" = "value1"
    "Property2" = "value2"
}
$Object += New-Object PSObject -Property $Hash

This is a simplification, but it really made me scratch my head.  You’re creating a temporary hashtable and then simply adding it to an array of objects.  Why not bypass the extra step and add to your object directly?


$Object = @()
$Object += New-Object PSObject -Property @{
    "Property1" = "Value1"
    "Property2" = "Value2"
}


Now, technically both techniques are the same thing.  I’m simply putting a hashtable directly into the -Property parameter instead of storing the hashtable in a variable and then passing that to New-Object.  But, and I’ll say it again and again, why create a variable you only use once?  It makes no sense, and it complicates the script unnecessarily.

Another interesting thing I’ve seen is people using hashtables in place of an array of objects.  One poster was even doing hashtables inside hashtables!  Now, if you have a need for a multi-dimensional array (remember those from the days before we had objects with multiple properties?!) then this kind of construct makes some sense.  But for simple data gathering it’s a pure nightmare.  Getting your data back out of a hashtable in an object form that’s usable is hard, and because of the very construction of the hashtable it’s easy to lose data!  If you are using the key to hold data, say the name of a server, and you pull just the values back out (the Values property, or the Value of each entry from GetEnumerator()), the key doesn’t come along unless you copy it yourself.  Data lost.  My typical advice is to simply not use hashtables until you really understand their strengths and weaknesses.  Once you fully wrap your head around them they are truly powerful and you can do some amazing things with them!
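Here is a small sketch of how the key gets left behind, and one way to keep it.  The server names and properties are hypothetical, and this is only one of several ways to rebuild the objects:

```powershell
# Hypothetical data: the server name lives ONLY in the key
$Hash = @{
    'SERVER01' = New-Object PSObject -Property @{ OS = 'Windows Server 2008 R2'; FreeGB = 120 }
    'SERVER02' = New-Object PSObject -Property @{ OS = 'Windows Server 2012'; FreeGB = 45 }
}

# Pulling only the values gives usable objects, but the server names are gone
$ValuesOnly = @($Hash.Values)
$HasName = $null -ne $ValuesOnly[0].PSObject.Properties['Name']   # False

# To keep the name, copy the key into each object on the way out
$Report = foreach ($Entry in $Hash.GetEnumerator()) {
    New-Object PSObject -Property @{
        Name   = $Entry.Key
        OS     = $Entry.Value.OS
        FreeGB = $Entry.Value.FreeGB
    }
}
$Report | Sort-Object Name | Select-Object Name, OS, FreeGB
```

That rebuild loop is the extra hassle: had the name been a property on the object from the start, the report would be a one-liner.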

I Blame Me

So why the sudden rise?  It’s probably not that sudden; I just happened to be going to a place where people were using them a lot.  It’s also interesting to see problems come and go in batches on the forums.  It’s just the natural cycle of things.  But one of the issues is me.  And the other bloggers who talk about PowerShell, you know, the good ones.  For us, the progression of our articles is very sequential.  When I look back at some of the early posts I made a couple of years ago I cringe a little bit.  But for most of my visitors the articles are accessed randomly.  You do a Google search looking to solve a problem and you find a blog post and copy the code.  We all do it!  But as a visitor you’re often coming to a blog that has been going for quite some time, and that person may be using some very advanced techniques to solve a unique problem they encountered. Simply put, this might not be the best solution for your problem.

That’s one of the reasons I often will run through the sections of my script, to really thoroughly explain what’s happening and why I decided to solve it in that manner.  This helps me get a deeper understanding too, since I can’t write it too well if I don’t know what I’m talking about.  I can’t tell you how many times I’ve been writing an article about a script I had written and I end up re-writing whole sections of the script because as I tear it down I see the problems!

In the end, my best advice is to really understand hashtables before you use them, and my article comparing the two might help you a bit:  Objects and Hashtables

What are your thoughts?

 


April 8, 2014 | Powershell - Getting Started

6 Comments »

  1. So, since I’m one of the folks who use hashtables (even nested hashtables) a lot, I suppose I just have to comment. Truth is, Martin and I have had a friendly disagreement on this topic for a long time. It’d be really cool to have some more views on the issue. Please chime in!

    I often have to work with very large object arrays, sometimes with tens of thousands of entries, but often need to process just one of them. There’s no possible way to index a standard object array; finding a single entry by traversing the array is extremely slow. Hashtables, by definition, are indexed. So, you can use a hashtable to index a large array and find and process individual objects much, much quicker.
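The indexing idea in this comment can be sketched roughly as follows.  The computer objects are generated placeholders, and the sketch assumes names are unique so each key maps to exactly one object:

```powershell
# Hypothetical data set: 10,000 computer objects in a plain array
$Computers = foreach ($i in 1..10000) {
    New-Object PSObject -Property @{ Name = "PC$i"; Site = "Site$($i % 10)" }
}

# Without an index: every lookup walks the array looking for a match
$Slow = $Computers | Where-Object { $_.Name -eq 'PC9999' }

# Build the index once, keyed on Name (assumes names are unique)
$Index = @{}
foreach ($Computer in $Computers) {
    $Index[$Computer.Name] = $Computer
}

# With the index: each lookup is a single keyed access
$Fast = $Index['PC9999']
$Fast.Site   # Site9
```

Building the index costs one pass over the array, so it pays off when you need more than a handful of lookups.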

    The place I most use nested hashtables is when I have more than one object array that needs to be merged. For example, if you’re using the Quest AD Tools to find all the computer accounts in a Windows 2003 domain you don’t get the IPv4 or IPv6 addresses like you do with the Windows AD tools. If you traverse the array of computers doing an nslookup (or the .net equivalent) on an array of several thousand computers, you could wait all day. However, you can get a very quick dump of the A and AAAA records from DNS, make a hash table with the addresses and nest that under the computer name in a ‘master’ hashtable, then map that data into the first array in no time flat.
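A rough sketch of that merge, using made-up stand-ins for both sources.  The $Computers and $DnsRecords arrays, their property names and the addresses are all assumptions for illustration, not output from the Quest tools or from DNS:

```powershell
# Hypothetical stand-ins for the two sources described above
$Computers = @(
    New-Object PSObject -Property @{ Name = 'PC01'; OS = 'Windows 7' }
    New-Object PSObject -Property @{ Name = 'PC02'; OS = 'Windows XP' }
)
$DnsRecords = @(
    New-Object PSObject -Property @{ Name = 'PC01'; Type = 'A';    Address = '10.0.0.11' }
    New-Object PSObject -Property @{ Name = 'PC01'; Type = 'AAAA'; Address = 'fd00::11' }
    New-Object PSObject -Property @{ Name = 'PC02'; Type = 'A';    Address = '10.0.0.12' }
)

# Master hashtable: computer name -> nested hashtable of record type -> address
$Master = @{}
foreach ($Record in $DnsRecords) {
    if (-not $Master.ContainsKey($Record.Name)) { $Master[$Record.Name] = @{} }
    $Master[$Record.Name][$Record.Type] = $Record.Address
}

# Map the addresses onto the computers with keyed lookups, no per-host DNS query
$Merged = foreach ($Computer in $Computers) {
    $Addresses = $Master[$Computer.Name]
    if (-not $Addresses) { $Addresses = @{} }   # no DNS records for this name
    New-Object PSObject -Property @{
        Name = $Computer.Name
        OS   = $Computer.OS
        IPv4 = $Addresses['A']
        IPv6 = $Addresses['AAAA']
    }
}
$Merged | Select-Object Name, OS, IPv4, IPv6
```

The end result is a plain array of objects again, so it pipes straight into sorting and reporting cmdlets.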

    And there are places in PowerShell where you HAVE to use a hashtable: New-Object PSObject -Property <<>>.

    PowerShell even creates hashtables when you might not expect it. For example, this will actually create a hashtable:
    $Status = DATA { ConvertFrom-StringData -StringData @'
    1=Disabled
    2=Enabled
    3=Not Implemented
    4=Unknown
    '@ }
    I use hashtables like that all the time to translate WMI properties from the index number to the text representation (don’t have to do that as often with AD properties ’cause most of them have toString() methods that do the translation for you).
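A minimal sketch of that translation step.  The $Code value is a hypothetical stand-in for a numeric WMI property, and the table is built from joined strings rather than a here-string purely to keep the example compact:

```powershell
# Same kind of lookup table as above, built from newline-joined key=value pairs
$Pairs  = '1=Disabled', '2=Enabled', '3=Not Implemented', '4=Unknown'
$Status = ConvertFrom-StringData -StringData ($Pairs -join "`n")

# Hypothetical code, standing in for a numeric WMI property value
$Code = 2

# ConvertFrom-StringData produces STRING keys, so cast the number before the lookup
$Text = $Status[[string]$Code]
$Text   # Enabled

# Looking up with the integer itself finds nothing: 2 is not the same key as '2'
$Miss = $Status[$Code]
```

The cast is the one thing to watch for with this pattern; a lookup that silently returns nothing is easy to miss in a report.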

    Comment by Art Beane | April 8, 2014

    • Art, I just couldn’t disagree with you more! We don’t have a difference of opinion on this at all. You’re actually backing up exactly what I’m talking about! I could probably SAY it better, which is completely on me! The instances you described are exactly when you SHOULD use hashtables, and you don’t need me to tell you that, I know. For the first instance, tens of thousands of entries that you have to access randomly, hashtables are the only way to go. The iteration through your table where you need to index two data points is what I would call a multi-dimensional array, and nested hashtables are exactly how you’d do it.

      The point of the post is I have often seen people simply gathering data, plopping it into a hashtable and then trying to sequentially pull it out for reporting (Export-CSV, Sort, any of the Format cmdlets, etc) and really struggling with it. As I mentioned in the article, one guy was doing nested hashtables and trying to get the data out was a nightmare! Switched him over to a hashtable with a PSObject as the value and suddenly it was easy peasy (he could randomly access his data AND pull his reports with ease). It’s all about the right tool for the right job. If you’re doing something that your mind says should be simple and you’re really struggling with it you may very well be trying to use a plunger to hammer a nail!
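The "hashtable with a PSObject as the value" arrangement might look something like this sketch.  The inventory rows are hypothetical, and the name is deliberately stored both as the key and as a property so nothing is lost when the values are pulled out:

```powershell
# Hypothetical inventory rows
$Rows = @(
    @{ Name = 'SERVER02'; FreeGB = 45 }
    @{ Name = 'SERVER01'; FreeGB = 120 }
    @{ Name = 'SERVER03'; FreeGB = 300 }
)

# Key on the name, but keep the name inside the object as well
$Inventory = @{}
foreach ($Row in $Rows) {
    $Inventory[$Row.Name] = New-Object PSObject -Property $Row
}

# Random access by key
$Inventory['SERVER03'].FreeGB   # 300

# Sequential reporting: the values are complete objects, so they pipe cleanly
$Report = $Inventory.Values | Sort-Object Name | Select-Object Name, FreeGB
$Csv    = $Report | ConvertTo-Csv -NoTypeInformation
```

Swapping ConvertTo-Csv for Export-Csv with a path would write the same report to disk.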

      Comment by Martin9700 | April 8, 2014

  2. And I’d like to apologize for the lack of code examples. The GIST embedding function is still broken. I finally got a hold of someone at support yesterday so hopefully we’ll see a restoration of this functionality soon. You can click here to watch the support ticket:

    http://en.forums.wordpress.com/topic/code-embed-with-gist-not-working

    Comment by Martin9700 | April 8, 2014

    • Ok, looks like I have a workaround for this issue now. Had to go back and edit 15 or so articles, but all of the code embeds are working again!

      Comment by Martin9700 | April 8, 2014

  3. Maybe I misunderstood. Looks like we’re in violent agreement. Hashtables are key when you need to index an array, but redundant at best for arrays where you don’t.

    Comment by Art Beane | April 8, 2014

  4. The whole point of using a hash table is to provide an index. Functionally it’s equivalent to having a collection of objects with one property. If you need the same index on a different set of data, you need to create another hash table. If you need more than two hash tables with the same key, it’s probably time to create objects using the key and values as properties, then put them into a single hash table.

    You usually see hash tables of hash tables when you want to expand the granularity of your index across multiple properties. Many times this can be simplified to a single hash table using a delimited concatenation of the property values as the key. There may be a performance penalty but it probably won’t be significant unless you’re talking about very large numbers of operations. It might also offend some purists, but it can simplify your code.
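The delimited-key idea can be sketched side by side with the nested version.  The server records are hypothetical, and the flat version assumes the delimiter ('|' here) never appears in a site or role name:

```powershell
# Hypothetical records indexed on two properties: Site and Role
$Servers = @(
    New-Object PSObject -Property @{ Name = 'SERVER01'; Site = 'Boston'; Role = 'Web' }
    New-Object PSObject -Property @{ Name = 'SERVER02'; Site = 'Boston'; Role = 'SQL' }
    New-Object PSObject -Property @{ Name = 'SERVER03'; Site = 'Denver'; Role = 'Web' }
)

# Nested version: one hashtable per site, holding one entry per role
$Nested = @{}
foreach ($Server in $Servers) {
    if (-not $Nested.ContainsKey($Server.Site)) { $Nested[$Server.Site] = @{} }
    $Nested[$Server.Site][$Server.Role] = $Server
}

# Flat version: a single hashtable keyed on "Site|Role"
# (assumes '|' never appears in a site or role name)
$Flat = @{}
foreach ($Server in $Servers) {
    $Flat["$($Server.Site)|$($Server.Role)"] = $Server
}

$Nested['Boston']['SQL'].Name   # SERVER02
$Flat['Boston|SQL'].Name        # SERVER02
```

Both lookups land on the same object; the flat table is simply easier to enumerate when it is time to report.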

    IMHO

    Comment by mjolinor | April 18, 2014

