The other day I mentioned I was having a really bad day that, after being recognized, turned into a really good day. The reason it was bad was that my shiny new firewall from Meraki was having all kinds of problems keeping a reliable site-to-site VPN with my co-host site. The remote firewall was a Cisco ASA 5510, so you would have thought this wouldn’t be a problem! But it was, and from it arose a need to create something that could monitor whether my multiple subnets were routing properly. The funny thing was I had recently helped another Spicehead with a similar script! Now I needed it!
My initial thought, as always, is to use the built-in cmdlets that are included with PowerShell, and the go-to cmdlet for ping testing is Test-Connection. But Test-Connection kind of has a fatal flaw. If it doesn’t successfully find the device you’re looking for, instead of returning an object with a StatusCode reporting the problem, it just fails and writes to the Error stream. In this scenario that’s completely useless unless I wrap the command in a Try/Catch block and parse out the error. Which just sucks.
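To make the complaint concrete, here is a minimal sketch of the Try/Catch wrapper that paragraph describes. It assumes Windows PowerShell 5.1, where a failed ping surfaces as an error; the target address is a hypothetical placeholder and none of this is taken from the finished script.

```powershell
# Hypothetical target address, used only for illustration.
$Target = '192.0.2.10'

try {
    # -ErrorAction Stop turns the ping failure into a terminating error we can catch.
    $Reply = Test-Connection -ComputerName $Target -Count 1 -ErrorAction Stop
    "Reply from $Target in $($Reply.ResponseTime) ms"
}
catch {
    # No StatusCode to inspect here; all we get is the error text.
    "Ping to $Target failed: $($_.Exception.Message)"
}
```

The catch branch only has an error message to work with, which is why parsing it is unappealing for a monitor.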
At this point I decided to dig a little deeper into the Test-Connection cmdlet and see if I could set where the ping test was coming from. A simple Get-Help Test-Connection -ShowWindow showed me that not only could I specify a source, I could specify multiple sources: notice that the -Source parameter accepts an array of strings, <String[]>. Not only that, but the -ComputerName parameter also accepts a string array, which meant I could monitor multiple destinations from multiple sources all from one command. I just had to submit the Test-Connection as a PowerShell job, monitor the job and retrieve it when it was done (see this post to see how to do that). And it all worked. Beautifully. I had the Ping Monitor I always wanted. So I set it loose and let it run, and within a couple of minutes it detected that one of my subnets was down! Awesome. Except for the whole subnet down thing, of course.
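As a rough illustration of that first design, the sketch below submits one Test-Connection call covering several sources and destinations as a job and collects the results. It assumes Windows PowerShell 5.1 (where Test-Connection has -Source and -AsJob); the host names, addresses and the 60 second timeout are hypothetical, and the original script may have submitted its job differently.

```powershell
# Hypothetical sources and destinations, for illustration only.
$Sources      = 'SERVER-A', 'SERVER-B'
$Destinations = '192.0.2.10', '192.0.2.20'

# One command, many sources, many destinations, run as a background job.
$Job = Test-Connection -Source $Sources -ComputerName $Destinations -Count 1 -AsJob

# Wait a bounded time instead of forever (timeout value is an assumption).
$null = Wait-Job -Job $Job -Timeout 60

if ($Job.State -eq 'Completed') {
    # Successful pings come back as objects; failures show up as errors.
    Receive-Job -Job $Job | Select-Object PSComputerName, Address, StatusCode, ResponseTime
}
else {
    # The job is still running after the timeout, so cancel it.
    Stop-Job -Job $Job
}

Remove-Job -Job $Job -Force
```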
Later that day…
After letting the script run for about 24 hours I noticed it got stuck. The job was submitted properly but it never finished; it just ran and ran and ran. So I put some loop control in there: if I wait for the submitted job too many times through my monitor loop, then just cancel the job, get rid of it and try again. This part worked great, but the very next job to be submitted just sat and ran forever too. I closed out all my PowerShell jobs, tried again, and it still hung up. Rebooted and it started running fine again. Left it overnight and it was hung up again. At this point I was starting to realize that going the way of the PowerShell job just wasn’t the right approach!
I ended up having to drop the entire Job approach and just use the WMI class Win32_PingStatus, which returns the same object that Test-Connection uses and works even if the connection fails, which is a requirement in this case!
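For reference, a single Win32_PingStatus query from a remote source can look like the sketch below. It assumes Windows PowerShell with Get-WmiObject available; the source name and destination address are hypothetical, and this is not lifted from the finished script.

```powershell
# Hypothetical source machine and destination address.
$Source      = 'SERVER-A'
$Destination = '192.0.2.10'

# Ask $Source to ping $Destination; an object comes back whether or not the ping succeeds.
$Ping = Get-WmiObject -Class Win32_PingStatus -ComputerName $Source -Filter "Address='$Destination'"

if ($Ping.StatusCode -eq 0) {
    # StatusCode 0 means the destination replied.
    '{0} -> {1}: reply in {2} ms' -f $Source, $Destination, $Ping.ResponseTime
}
else {
    # Any other value (or an empty StatusCode) is treated as a failed ping.
    '{0} -> {1}: failed (StatusCode {2})' -f $Source, $Destination, $Ping.StatusCode
}
```

Because the failure is reported in the returned object rather than on the Error stream, there is nothing to catch and parse.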
Haven’t done a good old-fashioned script breakdown in a while, so let’s break this one down. First let’s take a look at the Param section and get what we need for the script.
Nothing too amazing here: we have string arrays for Destinations and Sources, so we can hit multiple destinations from multiple sources. Typically I would think you’d only have one or two destinations here, but it’s really up to the individual network. Sources would likely be more numerous since you want to test from multiple locations to make sure all your routing/VPNs/WANs/whatever are working the way they’re supposed to.
From there we have the To, From and SMTPServer to get the email information we need. I decided to go with the simplified email selection here and not the more advanced version that allows credentials and so on. I can always add those later if needed, I suppose. Next is TimeStop, which was a leftover parameter from the Job days, essentially giving the script a shelf life. If PowerShell has a weakness–and let’s be honest, it does have a couple–memory management is one of them. Not just with jobs, either; I’ve seen other scripts that have run into issues because of memory. Because of that I try not to have a script run more than 24 hours, so I decided to leave this parameter in there and give the script a built-in lifespan.
Last parameter is Alert, which is how often you want to be alerted that there’s a problem. After 3 failed pings the script will email you, and I wanted it to keep emailing you every now and then, but not too many times because that can get annoying. I decided to go with every 500 failed pings after that. The question was how to determine that. One way would be to simply divide the number of failed pings by our Alert number and check whether the result -is [int]. Which honestly would have been the easier way of doing this, but I really wanted to work with the mod operator (%) because I was pretty sure it would do what I wanted, though having never used it I wasn’t sure. And doing ‘failed pings’ % 500 works great! Except for one little problem: the result is also 0 when ‘failed pings’ is equal to 0. So you have to test for that case as well.
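The modulus check with the zero guard can be sketched like this. The function name Test-AlertDue and its parameter names are hypothetical, made up for this illustration; only the idea (count % Alert, excluding 0) comes from the description above.

```powershell
# Hypothetical helper: is a repeat alert due for this failed-ping count?
function Test-AlertDue {
    param(
        [int]$FailedPings,
        [int]$Alert = 500   # repeat interval described in the text
    )

    # 0 % 500 is also 0, so a count of zero has to be ruled out explicitly.
    return (($FailedPings -gt 0) -and (($FailedPings % $Alert) -eq 0))
}

Test-AlertDue -FailedPings 0      # False
Test-AlertDue -FailedPings 499    # False
Test-AlertDue -FailedPings 500    # True
Test-AlertDue -FailedPings 1000   # True
```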
Works great but using the -is [int] is probably more efficient. Just a heck of a lot less cool. I’ll probably change it in a future version.
Now we need to validate sources. It doesn’t do us much good if one of the sources we want to use isn’t working (either it’s down, inaccessible or WMI is screwed up in some way), so we need to test each one. If you’ll notice, I try the Get-WMIObject cmdlet and if that fails I try again. The reason for this is that WMI is not a perfect system, and during testing I had one of my hosts actually fail the check when I knew it was fine. So we wait half a second and try again; if it fails twice we know we’ve got a bad one. In that case we report the error and do NOT update the $Keep array, which is where we are storing all of our successful sources.
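A minimal sketch of that try, wait, try again pattern follows. It assumes Windows PowerShell with Get-WmiObject; the source names and the choice of Win32_OperatingSystem as the probe class are assumptions for illustration, since the text only says Get-WMIObject is used. $Keep is the array named above.

```powershell
# Hypothetical source names, for illustration only.
$Sources = 'SERVER-A', 'SERVER-B'
$Keep    = @()

foreach ($Source in $Sources) {
    $Ok = $false

    foreach ($Attempt in 1..2) {
        try {
            # Any cheap WMI query will do as a probe (class choice is an assumption).
            $null = Get-WmiObject -Class Win32_OperatingSystem -ComputerName $Source -ErrorAction Stop
            $Ok = $true
            break
        }
        catch {
            # WMI can fail spuriously, so pause half a second before the second try.
            if ($Attempt -lt 2) { Start-Sleep -Milliseconds 500 }
        }
    }

    if ($Ok) {
        $Keep += $Source
    }
    else {
        Write-Warning "Source $Source failed WMI validation twice and will be skipped."
    }
}

if ($Keep.Count -eq 0) {
    Write-Warning 'No usable sources were found.'
}
```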
We have good sources–or not, which we report and exit out of the script–so now we have to load the data. This hearkens back to when the script was run from within a Job, meaning the data being returned might be out of order and I needed a fast way to go through my data, find my record and update it as quickly as possible. This called for a hashtable. I love hashtables for this kind of thing because they let you jump straight to a record by its key, lightning fast, and update as needed. By making the data within the hashtable a PowerShell object I get the best of both worlds: the multi-leveled data of an object and the speed and ease of indexing within a hashtable.
This code simply loops through our sources and destinations and loads each into its own record in the hashtable. To make the key I simply combine the Source/Destination into a single unique field–remember each key in a hashtable must be unique–and then preload the data with our PowerShell object.
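The loading step described above might look something like this sketch. The sample sources and destinations and the property names on the record (FailedPings, LastStatus) are hypothetical; only the combined Source/Destination key and the object-in-a-hashtable idea come from the text.

```powershell
# Hypothetical inputs, for illustration only.
$Sources      = 'SERVER-A', 'SERVER-B'
$Destinations = '192.0.2.10', '192.0.2.20'

$Data = @{}

foreach ($Source in $Sources) {
    foreach ($Destination in $Destinations) {
        # Combine source and destination into one unique key.
        $Key = "$Source/$Destination"

        # Preload a record for this pair (property names are assumptions).
        $Data[$Key] = [PSCustomObject]@{
            Source      = $Source
            Destination = $Destination
            FailedPings = 0
            LastStatus  = $null
        }
    }
}

# Jump straight to one record by key and update it.
$Data['SERVER-A/192.0.2.20'].FailedPings++

$Data.Count   # 4 records: 2 sources x 2 destinations
```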
After that, a simple loop to go through everything and create an array of objects with our data on it, which I will parse out later.
As I write this, I realize I will probably rewrite a lot of this, as it’s still very much a script structured around running in a job. Now that I’ve taken control of the Ping loop above, there’s really no reason to do this loop and then do a parsing loop (below). I can really just do it all at once. Still, it’s completely functional, so sometimes it’s hard to go back and “fix” what ain’t broken. It’s just a little messier than it should be.
Here’s the parsing loop:
Line 3 creates our Key, then it’s a simple matter of getting to the right record using that key and updating fields as needed. I also detect if a server goes from more than 3 ping errors (in which case an alert email was sent out) back to 0. I store this information in the $Back variable to gather everything together, then if there is data in $Back I format it and send the email out. This way you only get 1 email for restored connections instead of 1 email for every restored connection (which, if you’re monitoring a lot of connections, could get annoying).
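To show the shape of that logic without the rest of the script, here is a self-contained sketch using canned ping results. The sample data, the property names and the exact threshold of 3 are assumptions for illustration; $Back is the variable named above, and the email step is left as a comment so the sketch runs anywhere.

```powershell
# Hypothetical state: one pair has been failing, one has been healthy.
$Data = @{
    'SERVER-A/192.0.2.10' = [PSCustomObject]@{ Source = 'SERVER-A'; Destination = '192.0.2.10'; FailedPings = 5 }
    'SERVER-A/192.0.2.20' = [PSCustomObject]@{ Source = 'SERVER-A'; Destination = '192.0.2.20'; FailedPings = 0 }
}

# Canned results standing in for one pass of pings (0 = success, 11010 = timed out).
$Results = @(
    [PSCustomObject]@{ Source = 'SERVER-A'; Destination = '192.0.2.10'; StatusCode = 0 }
    [PSCustomObject]@{ Source = 'SERVER-A'; Destination = '192.0.2.20'; StatusCode = 11010 }
)

$Back = @()

foreach ($Result in $Results) {
    # Rebuild the key and jump to the matching record.
    $Key    = "$($Result.Source)/$($Result.Destination)"
    $Record = $Data[$Key]

    if ($Result.StatusCode -eq 0) {
        # An alert had gone out for this pair, so note that it is back.
        if ($Record.FailedPings -ge 3) { $Back += $Record }
        $Record.FailedPings = 0
    }
    else {
        $Record.FailedPings++
    }
}

if ($Back.Count -gt 0) {
    # One combined message for every restored connection.
    $Body = ($Back | ForEach-Object { "$($_.Source) -> $($_.Destination) is responding again" }) -join "`r`n"
    # The real script would email $Body here using its To, From and SMTPServer parameters.
    $Body
}
```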
The next little bit determines if we need to send any alert emails; notice the % mod operator in there.
Last bit here just creates another object for display on the screen and puts it out there. Sleep for 4 seconds and go back into the loop and do it all again. Below are some screen shots of the script in action.
Hope you enjoyed this breakdown of how this script was put together. As always, if you have any questions you can leave them in comments and I’ll help out if I can. Below you’ll find the link to the entire script (including some additional text, error checking, etc).