5 Tips for Friday: Query Tuning

Time for another 5 Tips for Friday. Truth be told, I almost forgot that it was Friday so this week’s post is coming late int he day. This week’s fast, fun tips are all about tuning queries.

Look at current waits, not total waits: I see it asked time and time again in the forums about some top wait and what should they do about it. A lot of the time, the wait they’re asking about is a completely benign weight that is not a concern at all. Looking at total waits on a server can send you down a deep, dark hole that is totally unrelated to what your real problem is.
So how do you avoid this? Look at current waits. The DMVs sys.dm_os_waiting_tasks and sys.dm_exec_requests show currently running tasks and and information about their waits. The DMV sys.dm_os_wait_stats is a cumulative aggregation of all waits on the server since the last restart or reset of the wait stats. It collects waits all the time, not just when you have a problem.

Also, how do you know if a wait is benign or not? My answer is simple. Do you see any queries waiting on that wait type? If not, why do you care so much? Concentrate on the things you can see your queries waiting on.
Take out those lookups: A lookup (key or RID), becomes a costly process very quickly. When a query needs additional columns not in a nonclustered index it could use, it has two ways to solve that problem. It can either perform a scan or seek on the clustered index (or heap) or it can perform a lookup. A lookup performs a single page read from the base clustered index (key lookup) or the heap (RID lookup) to find the row a given row in the nonclustered index maps to to get the extra column(s). Each lookup is a separate page read. So if the seek of the nonclustered index yields 1000 rows, the lookup will loop through 1000 times performing a page lookup each time.
The sweet spot (also known as the tipping point) for when the optimizer chooses a cluster/heap scan over a lookup is really low. There is an algorithm that is used to calculate it, but it generally falls between 2% and 3% of the rows in the table. So the threshold for where a lookup exceeds the cost of a cluster/heap scan can be exceeded very quickly.

But there’s good news. When you have a situation where the query plan is performing a lookup and the row count is high, the optimizer either came up with a bad plan for some reason (like maybe statistics are very skewed) or it’s using a plan that was compiled with a parameter value that yields a much smaller row count (parameter sniffing). Either way, the fix is easy. If you create a covering index, the optimizer can skip the decision about doing a lookup or no and just go straight to using the covering index. A covering index (unless you’ve done something really bad like included a bunch of LOB columns) will be the better choice and will be chosen. Quite often, this is best resolved by simply adding the missing columns to the included list of the existing nonclustered index. Sometimes, it makes sense to create a new one. Use your best judgment here.
Follow the leader to find the cost: The cost threshold for parallelism is a setting that many people don’t know about. The number itself doesn’t have any user-readablevalue to it. It’s just an estimate of work that has meaning to SQL Server (if you see me in person at a conference, ask me to share the story with you about where it came from). The default cost threshold setting, in my opinion and the opinion of many others, is way too low. As a general rule, I crank it up to 50 by default on any new server, and then I adjust it as needed to tune the workload.
Plan Cost
Cost threshold for parallelism is the point at which the optimizer considers a parallel plan. Then it uses whichever plan has a lower cost. Just because a parallel plan is compiled, it does not mean a parallel plan will be selected. If I get a serial plan that I think should go parallel, the first thing I look at is the total cost for the plan. Luckily, it’s very easy to figure out.

When you look at the query plan, hover over the top-left most operator and it will show the cost of the entire plan. If it is above the cost threshold setting, it was considered for a parallel plan (assuming parallelism isn’t disabled). If it is not above it, then it was not. You may notice more than one cost listed. It is the estimated subtree cost that you want to look at. Subtree cost is the cost of the current operator and the entire tree that comes after it (or before it, depending on which way you look at it).
Query I/O Stats
I/O, I/O, less work is the way to go: Another easy fix I look for in the query plan for a query I’m trying to tune is to look for scans. Seeks are generally much better than scans (though not if the seek still reads as much or almost as much as the scan). You can see exactly how much I/O a scan or seek performs in an actual execution plan and you can see how much I/O it estimated in both an actual and an estimated query plan. Find those high I/O scans and see if you can provide a smaller nonclustered index that will perform much less I/O than the scan.
Take the hint: If are practicing the previous tip, and you want to measure or compare the difference between the new index you created and the original index, you can use an index hint to compare the same query with different indexes. If it is now using the new index, you can copy the query and add WITH(INDEX(1)) to force it to use the clustered index or some other index ID or index name to get it to use whatever index it used previously.

No, go. Tune. Tune like the wind!

5 Tips for Friday: Query Tuning

Related

One Comment

Best Practices Q & A

Robert L Davis

Jason Brimhall

Argenis Fernandez

MinionWare, LLC

Kenneth Fisher

David Klee

Ed Watson

Nic Cain

Chrissy LeMair