Using GROUP BY instead of DISTINCT

Recently, Aaron Bertrand (b/t) posted Performance Surprises and Assumptions : GROUP BY vs. DISTINCT. In it he says he prefers GROUP BY over DISTINCT. He discusses the fact that GROUP BY will, in fact, under certain circumstances, produce a faster query plan. I highly recommend taking the time to read it. In fact, if you haven’t already, go do it now. It’s ok. I’ll wait.

Done? Good. Interesting stuff wasn’t it? I agree with Aaron by the way. I prefer GROUP BY instead of DISTINCT. However, I was thinking that even before I realized the performance implications. In his post, Aaron said So why would I recommend using the wordier and less intuitive GROUP BY syntax over DISTINCT? I disagree with the less intiuitive part. Wordier yes, certainly, but I’ve found over the years that most people follow GROUP BY more easily. I can’t tell you how many times I’ve seen a query like this

SELECT DISTINCT FirstName, MiddleName, LastName, 
	AddressLine1, AddressLine2, City, StateProvinceName, PostalCode
FROM Sales.vIndividualCustomer
WHERE BusinessEntityID = 17298

SELECT DISTINCT FirstName, MiddleName, LastName,

AddressLine1, AddressLine2, City, StateProvinceName, PostalCode

FROM Sales.vIndividualCustomer

WHERE BusinessEntityID = 17298

And the user asks me why they ended up with two lines.

And so the conversation begins.

“I want to get one row for each person. Why do I have two rows for this guy?”
“Well, you have two different addresses.”
“But I used DISTINCT?”
“Yes but you are including the address and there are two of them.”
“Ok, so how do I fix it?”
“Don’t pull the address.”
“But I need the address.”

It’s even worse when it’s a SELECT DISTINCT * and you don’t have the columns listed explicitly. For some reason, for some people, it catches them by surprise that DISTINCT covers every column returned. In other words, every row is unique in the result set. I’m not sure why this confuses some people, and these are smart, skilled people, but it does.

Now, look at the same query with a group by.

SELECT FirstName, MiddleName, LastName, 
	AddressLine1, AddressLine2, City, StateProvinceName, PostalCode
FROM Sales.vIndividualCustomer
WHERE BusinessEntityID = 17298
GROUP BY FirstName, MiddleName, LastName

SELECT FirstName, MiddleName, LastName,

AddressLine1, AddressLine2, City, StateProvinceName, PostalCode

FROM Sales.vIndividualCustomer

WHERE BusinessEntityID = 17298

GROUP BY FirstName, MiddleName, LastName

Continue reading on SQLStudies.com.

Using GROUP BY instead of DISTINCT

Related

Best Practices Q & A

Robert L Davis

Jason Brimhall

Argenis Fernandez

MinionWare, LLC

Kenneth Fisher

David Klee

Ed Watson

Nic Cain

Chrissy LeMair