The Surly Admin

Father, husband, IT Pro, cancer survivor

Remove Empty Directories Recursively

An interesting problem came up at Spiceworks the other week, and it was all about deleting empty directories.  Locating empty folders and removing them is actually pretty easy, but the complication comes when you have nested folders, all of which are empty.  The most obvious scripting method doesn’t work in that case.  Let’s see how I accomplished this task.

The Obvious Route

If you read the Spiceworks thread here, you’ve already seen what I’m about to say.  The first route is simply to use Get-ChildItem and pipe it to Where to keep only the directories.  Then call the GetFileSystemInfos() method on each one, check the Count property (which will be 0 if the directory is empty) and remove the directory if it is.  This can actually be accomplished with a one-liner.

GCI $TargetFolder -Recurse | ? { $_.PSisContainer -and $_.GetFileSystemInfos().Count -eq 0 } | % { RM -Path $_.Fullname -Recurse -Force }

But there’s a problem with this approach.  Let’s say you have a directory structure like this:

C:\Test\Test1
C:\Test\Test1\Test2
C:\Test\Test3

Let’s pretend $TargetFolder is set to C:\Test.  All of these test folders are empty, but the code above will only remove Test2 and Test3, not Test1.  Why?  Well, the answer is actually disarmingly simple.  Test1 is not actually empty when it gets checked: it has 1 item in it, the Test2 folder.  I’ve seen some solutions that run nested Get-ChildItem searches, and that would work just fine, but I found the approach messy.  You’re doing a lot of GCI searches, and on large directory structures this could mean a whole bunch of extra work.
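If you want to see this for yourself, here’s a minimal sketch that builds the same three folders and runs the single top-down pass against them.  The temp location and the $Remaining variable are my own assumptions for the demo, not part of the original one-liner.

```powershell
# Build the example tree under a throwaway temp folder (location is an assumption for this demo)
$TargetFolder = Join-Path ([System.IO.Path]::GetTempPath()) ("EmptyDirDemo_" + [guid]::NewGuid().ToString("N"))
New-Item -ItemType Directory -Path (Join-Path (Join-Path $TargetFolder "Test1") "Test2") -Force | Out-Null
New-Item -ItemType Directory -Path (Join-Path $TargetFolder "Test3") -Force | Out-Null

# Single top-down pass, same logic as the one-liner
Get-ChildItem $TargetFolder -Recurse |
   Where-Object { $_.PSIsContainer -and $_.GetFileSystemInfos().Count -eq 0 } |
   ForEach-Object { Remove-Item -Path $_.FullName -Recurse -Force }

# Collect the names of whatever survived the pass
$Remaining = @(Get-ChildItem $TargetFolder -Recurse | ForEach-Object { $_.Name })
$Remaining

# Clean up the demo folder
Remove-Item -Path $TargetFolder -Recurse -Force
```

Test1 should be the only name left in $Remaining, because it still contained Test2 at the moment it was checked.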

So we need some way of locating all of the empty directories and getting rid of them, even if they’re nested, and we need to do it using as few Get-ChildItem searches as possible.  How to do this?

The first step is recognizing what the problem is and coming at it from that direction.  The problem with the above script is that it looks at Test1 first, which isn’t empty, then moves on to Test2, which is.  So how do we go back to Test1?  I decided to look at it from another angle.  Since we’re moving from the top down, what if we did our search in reverse, from the bottom up?  But how to figure out what’s on the bottom?  We can’t really use the length of the path.  Although that’s a decent indicator, you could have one directory that’s fifty characters long and 1 level deep, and another directory structure that’s only 10 characters long but 5 levels deep.  But we know the depth, don’t we?  Just count the pieces of the path between the backslashes.  PowerShell can do this easily too, then we just need to store the folder object and the depth it’s at.  Sort by depth in descending order and we have our backwards search.

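# Collect every folder under $TargetFolder along with how deep it sits
$Folders = @()
ForEach ($Folder in (Get-ChildItem -Path $TargetFolder -Recurse | Where { $_.PSisContainer }))
{
   $Folders += New-Object PSObject -Property @{
      Object = $Folder
      # Depth = number of path pieces between the backslashes
      Depth = ($Folder.FullName.Split("\")).Count
   }
}
# Deepest folders first, so children are processed before their parents
$Folders = $Folders | Sort Depth -Descending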
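To see why depth, and not path length, is the right thing to sort on, here’s a small sketch using two made-up paths.  Both paths and the $Result variable are hypothetical, and nothing here touches the disk.

```powershell
# Two hypothetical paths: a long name that is shallow, and short names that are deep
$Paths = "C:\Test\A-folder-with-a-very-long-descriptive-name-here",
         "C:\T\a\b\c\d"

$Result = @($Paths | ForEach-Object {
   New-Object PSObject -Property @{
      Path   = $_
      Length = $_.Length                  # characters in the path
      Depth  = ($_.Split("\")).Count      # pieces between the backslashes
   }
} | Sort-Object Depth -Descending | Select-Object Path, Length, Depth)

$Result
```

The short path sorts first because it has more pieces, even though the other path has far more characters.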

Yet another nice feature of the PSObject is that its properties can be almost anything, including other objects.  So we store the folder object and the depth count as properties of a PSObject.  Now we can loop our way through the $Folders variable, locate our empty directories and remove them.  And since we’re going backwards, Test2 will be the first directory we check and remove.  Then we’ll see Test1, which is now empty, and remove it.  Then Test3 comes next and we remove that.  No nested GCI searches, simply looping through objects already in memory, which is much more efficient.

$Deleted = @()
ForEach ($Folder in $Folders)
{
   # GetFileSystemInfos() reads the folder's current contents, so a parent
   # whose children were just removed now counts as empty
   If ($Folder.Object.GetFileSystemInfos().Count -eq 0)
   {
      # Record the details for the report before the folder is gone
      $Deleted += New-Object PSObject -Property @{
         Folder = $Folder.Object.FullName
         Deleted = (Get-Date -Format "hh:mm:ss tt")
         Created = $Folder.Object.CreationTime
         'Last Modified' = $Folder.Object.LastWriteTime
         Owner = (Get-Acl $Folder.Object.FullName).Owner
      }
      Remove-Item -Path $Folder.Object.FullName -Force
   }
}

That’s the loop, and since I want a nice comprehensive report about it, we create another PSObject to hold that.  Since we stored the file system object in $Folders, we don’t have to run Get-ItemProperty on each folder to get that information; we can simply reference the PSObject property (which I named Object) and then the property we want on that object: FullName, CreationTime, etc.
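Here’s a rough end-to-end sketch of the bottom-up approach, run against a throwaway copy of the Test1/Test2/Test3 tree.  The temp location, the $Sep separator lookup and the $RemovedOrder and $Remaining variables are my own assumptions for the demo, and the reporting properties are left out to keep it short.

```powershell
# Throwaway tree for the demo (temp location and folder names are assumptions)
$TargetFolder = Join-Path ([System.IO.Path]::GetTempPath()) ("EmptyDirDemo_" + [guid]::NewGuid().ToString("N"))
New-Item -ItemType Directory -Path (Join-Path (Join-Path $TargetFolder "Test1") "Test2") -Force | Out-Null
New-Item -ItemType Directory -Path (Join-Path $TargetFolder "Test3") -Force | Out-Null

# Collect every folder with its depth, deepest first
$Sep = [System.IO.Path]::DirectorySeparatorChar
$Folders = @(Get-ChildItem -Path $TargetFolder -Recurse | Where-Object { $_.PSIsContainer } |
   ForEach-Object {
      New-Object PSObject -Property @{ Object = $_; Depth = ($_.FullName.Split($Sep)).Count }
   } | Sort-Object Depth -Descending)

# Remove empty folders bottom-up and remember the order they went in
$RemovedOrder = @()
ForEach ($Folder in $Folders)
{
   If ($Folder.Object.GetFileSystemInfos().Count -eq 0)
   {
      $RemovedOrder += $Folder.Object.Name
      Remove-Item -Path $Folder.Object.FullName -Force
   }
}

$RemovedOrder

# Anything still under the target means the pass missed a folder
$Remaining = @(Get-ChildItem -Path $TargetFolder -Recurse)

# Clean up the demo folder
Remove-Item -Path $TargetFolder -Recurse -Force
```

Test2 goes first because it is the deepest.  Test1 and Test3 sit at the same depth, so I wouldn’t rely on which of those two is removed before the other.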

After we create our PSObject for reporting we simply remove the folder using Remove-Item.  Once we’ve looped through everything we can take $Deleted, sort it to our liking and pipe it into ConvertTo-HTML, Export-CSV, Out-GridView, or whatever display/reporting cmdlet you prefer.
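As a sketch of that last step, here’s one way the report could be shaped.  The sample rows are made up for illustration (only three of the properties are shown), and $Report, $Csv and $Html are names I picked for the demo.

```powershell
# Sample rows shaped like the $Deleted entries (all values are made up for illustration)
$Deleted = @(
   New-Object PSObject -Property @{ Folder = "C:\Test\Test1\Test2"; Deleted = "09:15:02 AM"; Owner = "BUILTIN\Administrators" }
   New-Object PSObject -Property @{ Folder = "C:\Test\Test3";       Deleted = "09:15:03 AM"; Owner = "BUILTIN\Administrators" }
   New-Object PSObject -Property @{ Folder = "C:\Test\Test1";       Deleted = "09:15:03 AM"; Owner = "BUILTIN\Administrators" }
)

# Sort the rows, and use Select-Object to pin the column order
# (properties built from a hashtable do not come out in a guaranteed order)
$Report = $Deleted | Sort-Object Folder | Select-Object Folder, Deleted, Owner

# In-memory versions of the report; swap in Export-Csv or Out-GridView as needed
$Csv  = @($Report | ConvertTo-Csv -NoTypeInformation)
$Html = $Report | ConvertTo-Html -Title "Deleted Empty Folders" | Out-String

$Csv
```

Piping through Select-Object before the reporting cmdlet is a small design choice: it keeps the columns in the same order no matter which cmdlet ends up displaying them.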

Performance and PowerShell 3.0

While I haven’t upgraded my rig at work, I have updated my laptop at home to PowerShell 3.0 and I really wanted to see the performance differences with Get-ChildItem.  With PowerShell 2.0, GCI has no parameter for returning only folders, so we have to pipe it into Where and check the PSisContainer property.  With PowerShell 3.0 we now have a switch in GCI called -Directory which allows this check to be done within GCI, which in theory should be much faster.  But is it actually?

I devised this little script to see the differences.

# Test 1: PowerShell 2.0 style, filter folders with Where
$Folders = @()
$TargetFolder = "c:\"
Measure-Command {
   ForEach ($Folder in (Get-ChildItem -Path $TargetFolder -Recurse | Where { $_.PSisContainer }))
   {  $Folders += New-Object PSObject -Property @{
         Object = $Folder
         Depth = ($Folder.FullName.Split("\")).Count
      }
   }
}

# Test 2: PowerShell 3.0 style, let GCI filter with -Directory
$Folders = @()
Measure-Command {
   ForEach ($Folder in (Get-ChildItem -Path $TargetFolder -Recurse -Directory))
   {  $Folders += New-Object PSObject -Property @{
         Object = $Folder
         Depth = ($Folder.FullName.Split("\")).Count
      }
   }
}

And here are the results of the test:

Search Type                               Time
-----------                               ----
GCI with Where                            1 Minute, 15 Seconds (75,522 Milliseconds)
GCI with -Directory                       51 Seconds (51,056 Milliseconds)

When searching my entire C: drive, we shaved about 24 seconds off of the search, roughly a third of the run time.  If you can run a simple upgrade and shave that much time off your scripts, then PowerShell 3.0 looks like a worthwhile upgrade!  So why haven’t I upgraded at work?  A couple of reasons.  One is that I believe most people are still using PowerShell 2.0, and I would like to continue writing scripts that support the broadest audience.  But more importantly, I read that Exchange 2010 is not yet compatible with PowerShell 3.0 and that the upgrade will in fact bork some of the functionality.  I didn’t get all the information, but I’m not touching my machine until I’ve researched it and know what I’m getting into!  If you know something about it feel free to leave a comment, I’d love to hear from you.
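If you want a script that works on both versions, one option is to check the engine version at run time and only use -Directory when it’s available.  This is just a sketch under that assumption: Get-FolderList is a hypothetical helper name of my own and is not part of the script discussed here.

```powershell
# Hypothetical helper: return only folders, using -Directory when the engine supports it
Function Get-FolderList
{
   Param ([string]$Path)

   If ($PSVersionTable.PSVersion.Major -ge 3)
   {
      # PowerShell 3.0 and later: let Get-ChildItem do the filtering
      Get-ChildItem -Path $Path -Recurse -Directory
   }
   Else
   {
      # PowerShell 2.0: filter on PSIsContainer after the fact
      Get-ChildItem -Path $Path -Recurse | Where-Object { $_.PSIsContainer }
   }
}
```

Parameters are bound when a command actually runs, so the -Directory branch should not cause trouble on 2.0 as long as it is never executed there.  Usage would look something like Get-FolderList -Path $TargetFolder in place of the Get-ChildItem call.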

If you’re interested in the script discussed, you can get it here.

January 7, 2013 - Posted by | PowerShell, Powershell - Performance
