When I saw the topic for T-SQL Tuesday this time, I just had to get in. Maybe I’ve never mentioned it, but backups are one of my big things. Today I’d like to talk about two topics that get overlooked quite often: the “backups” to the backup, so to speak. First up: proper backup alerting. And second: missing backup recovery.
Traditional alerting falls short
Well, let’s begin with a story from my days as senior DBA. Years ago, one of the application groups messed something up in their database, and they needed a restore. “Sure thing,” I said. No problem. So I went to the backup drive, and there wasn’t anything that could even be vaguely considered a fresh backup. The last backup file on the drive was from about three months ago.
OOPS. Oh crap…so what do I tell the app team?
First, a little investigation. I had to find out why the backup alert didn’t kick off. Every box was set up to alert us when a backup job failed. I found the problem right away. The SQL Agent was turned off. And from the looks of things, it had been turned off for quite some time. And as you may realize, there’s just no way to alert on missing backups if the Agent is off and can’t fire the alert.
But that was just the first part of the problem. The SQL Agent couldn’t send the email, of course. But even if it could have, there was nothing to report: the job never actually failed, because it never started in the first place.
This is the crux of the issue: jobs that don’t start can’t fail. Alerting on failed backup jobs isn’t the way to go.
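To make that concrete, here is a minimal sketch of the opposite approach: instead of waiting for a job to report failure, look at how old each database’s newest backup is and flag anything missing or stale. It assumes you have already collected the latest backup time per database (on SQL Server, the backup history in msdb.dbo.backupset is one place to get it) and that the check runs somewhere that doesn’t depend on the server’s own Agent. The function name, the 24-hour threshold, and the sample databases are all made up for illustration.

```python
from datetime import datetime, timedelta


def find_stale_backups(last_backup_by_db, now, max_age=timedelta(hours=24)):
    """Return the databases whose newest backup is missing or older than max_age.

    last_backup_by_db maps a database name to the finish time of its most
    recent backup, or None if no backup has ever been recorded.
    """
    stale = []
    for db_name, last_backup in sorted(last_backup_by_db.items()):
        # A database with no backup at all is just as alarming as an old one.
        if last_backup is None or now - last_backup > max_age:
            stale.append(db_name)
    return stale


# Hypothetical sample data: one healthy database, one three months behind,
# and one that has never been backed up.
now = datetime(2024, 1, 10, 8, 0)
last_backups = {
    "Sales": datetime(2024, 1, 10, 1, 0),
    "HR": datetime(2023, 10, 9, 1, 0),
    "Archive": None,
}

print(find_stale_backups(last_backups, now))  # ['Archive', 'HR']
```

Notice that the check never asks whether a job ran or failed. It only asks whether a recent backup exists, so a job that never started gets caught the same way as one that blew up.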
“But it’s okay, we have…”
Hold on, I know what you’re thinking. You have service alerts through some other monitoring tool, so that could never happen to you! To a degree, you’re right. But let’s see what else can go wrong along those same lines:
Continue reading on MinionWare.net.