Every IT shop has its performance problems: some localized, and some that span a whole server, or even multiple servers. Technologists tend to treat these problems as isolated incidents – solving one, then another, and then another. This happens especially when a problem is recurring but intermittent. When a slowdown or error happens every so often, it’s far too easy to lose the big picture.
Some shops suffer from these issues for years without ever getting to the bottom of it all. So, how can you determine what really causes performance problems?
First, a story
A developer in your shop creates an SSIS package to move data from one server to another. He decides to pull the data from production using SELECT * FROM dbo.CustomerOrders. It works just fine in his development environment, it works fine in QA, and it works fine when he pushes it into production. The package runs on an hourly schedule, and all is well.
What he doesn’t realize is that there’s a VARCHAR(MAX) column in that table that holds 2GB of data in almost every row…in production.
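This is why pulling only the columns the package actually needs makes such a difference: a narrow column list skips the VARCHAR(MAX) payload entirely, instead of dragging it across the wire every hour. A minimal sketch of the safer query (the column names here are illustrative, not from the real table):

```sql
-- Instead of SELECT *, which hauls the huge VARCHAR(MAX) column
-- along on every run, name only the columns the destination uses.
-- Column names below are hypothetical examples.
SELECT OrderID,
       CustomerID,
       OrderDate,
       OrderTotal
FROM dbo.CustomerOrders;
```

Even if the wide column were needed someday, listing columns explicitly also protects the package from breaking (or silently ballooning) when someone adds another column to the production table.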
Things run just fine for a couple of months. Then without warning, one day things in production start to slow down. It’s subtle at first, but then it gets worse and worse. The team opens a downtime bridge, and a dozen IT guys get on to look at the problem. And they find it! An important query is getting the wrong execution plan from time to time. They naturally conclude that they need to manage statistics, put in a plan guide, or take whatever other corrective avenue seems right. All is well again.
A couple of days later, it happens again. And then again, and again. Then it stops. A couple of weeks later, they start seeing a lot of blocking. They put together another bridge, and diagnose and fix the issue. Then performance issues appear on another server that’s completely unrelated to the first. There’s another bridge line, and another run through the same process.
What’s missing here?
The team has been finding and fixing individual problems, but they haven’t gotten to the root of the issue: the SSIS package data pull is very expensive. It ran fine for a while, but once the data grew (or more processes or more users came onto the server), the system could no longer keep up with demand. The symptoms manifested differently every time. While they’re busy blaming conditions on the server, or the way the app was written, the real cause of the issues is that original data pull…
Continue reading on MinionWare.net.