5 Tips for Friday: Database Corruption

5 Tips for Friday
5 Tips for Friday
This week’s 5 tips will be on a topic very near to my heart, database corruption. I want to do the 5 tips a little differently this time. The tips will not just be 5 tips in no particular order. These tips will be tips as well as steps on what to do starting with the point where you find that a corruption error (823, 824, or 825 error) has occurred.

Let’s start this off by assuming that an end user has reported that their query failed due to inability to read a page:

  1. Don’t panic: If you haven’t dealt with corruption before or have had only a very limited exposure to it, panicking seems like the right thing to do at the time. It’s not though. It’s the absolute wrong thing to do. Corruption is a reality for everyone, and it’s not your fault that corruption occurred. The only thing you are responsible for is recovering from it, and that’s why we prepare all the time.
  2. Don’t react immediately: Don’t immediately start messing with the database. First, think a moment about who is potentially affected by this. Send out an email to those appropriate for you to notify such as your manager, the PM or whoever is your point of contact for the business side. Quite often, this is the exact same group of people to whom you send notices about planned maintenance. Don’t think too hard about it, you know these answers.

    Send out an email stating that symptoms of corruption have been detected and you will be running an consistency check of the database. Commit to sending an update once you have determine the extent of the corruption and know how to proceed with fixing it.
  3. Run CHECKDB on the database: As stated above, before you start fixing it, you need to determine the extent of the corruption. So run a CHECKDB of the whole database. Use the NO_InfoMSgs option to limit the output to only errors. If you are running it in SSMS (and you probably are at this point), use the All_ErrorMSgs option to ensure that all error messages are returned. By default SSMS will limit the output to 1000 errors unless you include this option.

    Optional: I also like to use the TableResults option as it gives me the output in a nice, easy-to-read table instead of printing out in the messages pane as plain text.

    DBCC CHECKDB(<Database Name>)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;
  4. Double check the state of your backups: While CHECKDB is running, check the state of your backups. Not always, but in many cases, you will need to have backups to recover from. It's also important to note if you have log backups as well. If the database is in simple recovery model, there will not be log backups. Regardless of recovery model, if you don't have log backups, it restricts some of the recovery options you have. You will need to know if using log backups is an option when determining how best to recover.
  5. Interpret the CHECKDB results: Now that you have results, you need to figure out what objects are corrupted and what type of object is corrupted. The type of object will determine the optimal recovery option.

    If it is an allocation page (PFS, GAM, SGAM) or the header page (page 0), then you will need to restore the whole database from backup. If you have log backups, you can use the log backups along with a tail log backup (you're going to try to back up the current log before you start your recovery process) to recover it with little or no data loss. If you don't have log backups, you still have to restore from backup, but you will lost all data since the backup(s) you restore.

    If it is a clustered index page, you are still looking at restoring from backups, but you have more options. You can restore just the corrupted pages if you have log backups to apply to the database to bring the restored pages current with the rest of the database. This is including a tail log backup. If you don't have log backups, then you have to restore the whole database or resort to a repair option (repair options are absolute last resorts).

    If it is in a nonclustered index and the clustered index or heap is not corrupted, this is the easiest one to fix. Simply script out the index, drop it, and recreate it. Alternatively, you can disable the index and then rebuild it. Rebuilding it alone is not sufficient because it will attempt to use the existing index for the rebuild. You have to drop and recreate or disable it before you rebuild it. If it is disabled, the rebuild will use the clustered index or heap for the rebuild.

    For more details on this step, see my blog post on Introduction to Integrity: Interpreting DBCC CHECKDB Results.

Now that you have a plan, let everyone know what the plan is and put it into action.

54321
(1 vote. Average 1 of 5)