Over the last 10 years plus of being a DBA I’ve performed LUN manipulation tasks on dozens of occasions with no errors or problems. Other than adding new disks to Windows 2003 clusters I’ve never had to take SQL Server offline to perform these tasks either.
A simple request
I needed a new drive to be added to a server I had just taken over in order to perform backups of some large databases. This is something that the SAN engineers and Windows support teams have a lot of experience doing and so we weren’t concerned that there might be any problems.
The uh-oh moment
I was running a couple of scripts on the SQL instance hosted on that server when all of a sudden I had a corruption errors show up. The dreaded error 823 sat there in red staring me in the face. I had just taken over this machine, there were no backups and right as we were trying to add a LUN to be able to perform those things went boom.
My stomach hit the floor.
Remembering the words of Douglas Adams
The last thing you want to do as a DBA when there’s a problem is panic. Panic I did not. I took a deep breath and walked down to speak with the SAN and Windows admins to find out what might have happened. They reported no problems, showed that the new LUN was on there and a new drive created ready for me to start performing backups.
My nerves calmed somewhat I went back to my desk and started trying to figure out what the problem was. The error logs were showing that the data files did not exist. Very strange as after connecting to a remote desktop session on the server I was able to see the drive and files without any issue even though SQL could not.
I started digging through the Windows event logs to see if anything stood out. Right at the time that the errors started happening a single error showed in the system event log:
Event Type: Error Event Source: PlugPlayManager Event Category: None Event ID: 12 Description: The device ‘LUN SCSI Disk Device’ (SCSIDisk&Ven&Prod_LUN&Rev4&23432&3&2) disappeared from the system without first being prepared for removal.
The error stated that the LUN hosting the data files lost connectivity and came back online almost immediately.
Continue reading on SirSQL.net.