A note: I’m reposting this because I accidentally deleted it from WordPress. Because I’m an idiot.
I wrote a post about Overcoming my Fear of Pivot. With my newfound confidence, I decided to tackle dynamic pivots. This is a common scenario where you need to PIVOT, but you don’t know exactly what you’re going to end up with. Basically, you want to allow all of the possible column headers to come back with the aggregated data you need.
If you’re not familiar with PIVOT, go back and read the original post. If I’ve done my job properly, it should make sense. So, here’s what I did… I resisted the urge to hit Google to find a solution to the dynamic pivot problem. I opened SSMS and said, “Self, you’re under a deadline. Write it and see if you can get it to work all by your lonesome”. 45 minutes later, I had a working script that produced some cool real-world output, if I do say so myself.
Then, I hit Google. Then I saw Itzik Ben-Gan’s solution. My first response was, “Crap!” Actually, it was a much less ladylike expletive than that. The solution was… Beautiful. Elegant. Blew my method out of the water. You know how athletes have muscle memory? Well, developers have it too. We fall back to what’s comfortable and familiar. Sort of like our own version of T-SQL sweatpants and chocolate ice cream. Before I start in on the comparison of my solution and Itzik’s, let me say this: His is so much better than mine. Did I mention that it was elegant? And beautiful? But you know what? In a real development environment, with deadlines and giant to-do lists, I would have fallen back to my own comfort zone. I know this. I also know that next time I need to write a dynamic PIVOT, I’m going to know how to use his method.
Authors, when asked to give advice to aspiring writers, always say the same thing. “Write what you know.” For us IT Folk, there’s a corollary. “Write what you know. Hit the deadline. Then, go learn a better way.” Am I proud that I figured a solution out on my own? Yup. Am I a bit deflated that I didn’t come up with the same solution as Itzik Ben-Gan? Nope. Come on, it’s Itzik.
Personal note: I hate when I run across someone else’s T-SQL and ask them, “How does this work?”, and their response is, “I don’t know, I found it on a blog post/Google/forum.” Peeps, this is unacceptable. Don’t copy and paste until you understand what you’re seeing. Because someday you’re going to have to maintain that pilfered bit of code. If you don’t know what it does, then don’t use it. Comprehend your own code. We all borrow from the experts, but make sure you can explain it in 50 words or less. If you can’t, then back away from the Ctrl+V. Stretch your skills, learn new things, just don’t jeopardize a project by jumping the gun.
Okay, enough commentary. On to the solutions. The trick in a dyamic PIVOT is to create a string that has all of the column headers you need. This is where he and I diverged wildly. I fell back on a WHILE Loop over a set of rows contained in a table variable, he used the STUFF function with a FOR XML Path() query output. I wrote my solution to address the same example from BOL that I ranted about in my first post. I modified his solution to produce the same output, and to clean out some unused variables that were in the sample I found. I’ve also resisted the urge to make little tweaks to my script after doing some extra research. Truly, I want to make the point that there’s what works… and what works beautifully.
SET NOCOUNT ON;
DECLARE @vEmployeeIDTable as TABLE
EmployeeID varchar(20) NOT NULL
,ProcessedFlag bit NOT NULL DEFAULT(0)
DECLARE @vEmployeeID varchar(20)
DECLARE @vSQLString varchar(max) = ”
DECLARE @vEmployeeIDSELECT varchar(max) = ”
DECLARE @vEmployeeIDFOR varchar(max) = ”
DECLARE @vLoopCounter varchar(50) = 1
INSERT INTO @vEmployeeIDTable(EmployeeID)
SELECT DISTINCT EmployeeID
WHILE (SELECT count(ProcessedFlag) FROM @vEmployeeIDTable WHERE ProcessedFlag = 0) > 0
SET @vEmployeeIDSELECT = @vEmployeeIDSELECT + @vEmployeeID + ‘ as Emp’+@vLoopCounter+’,’
SET @vEmployeeIDFOR = @vEmployeeIDFOR + @vEmployeeID +’,’
SET ProcessedFlag = 1
WHERE EmployeeID = cast(substring(@vEmployeeID,2, LEN(@vEmployeeID)-2) as int)
SET @vLoopCounter = @vLoopCounter + 1
SET @vEmployeeIDSELECT = SUBSTRING(@vEmployeeIDSELECT,1, len(@vEmployeeIDSELECT)-1)
SET @vEmployeeIDFOR = SUBSTRING(@vEmployeeIDFOR,1, len(@vEmployeeIDFOR)-1)
SET @vSQLString = ‘
SELECT VendorID, ‘+@vEmployeeIDSELECT +’
(SELECT PurchaseOrderID, EmployeeID, VendorID
FROM Purchasing.PurchaseOrderHeader) p
FOR EmployeeID IN
) AS pvt
ORDER BY pvt.VendorID; ‘
So, a quick rundown of what I did:
1) Create a table variable (@vEmployeeIDTable). Populate it with DISTINCT EmployeeID’s from Purchasing.PurchaseOrderHeader.
2) Declare the following variables:
a) @vEmployeeID – holds the EmployeeID I’m concatenating into the string during the WHILE loop
b) @vEmployeeIDSELECT – holds the EmployeeID string that I’ll use in the SELECT clause of my PIVOT. I separate this one out because I want to concatenate the column aliases just as they were in the BOL example.
c) @vEmployeeIDFOR – holds the EmployeeID string that I use in the FOR clause of my PIVOT. I don’t need column aliases here.
d) @vLoopCounter – holds a counter as I loop through the string concatenation. I use it to help name my column aliases (Emp1, Emp2…). The 1 and 2 are coming from this variable
3) While I have unprocessed rows in my table variable, I loop through with a WHILE
a) Set @vEmployeeID to the minimum EmployeeID that hasn’t been processed. I also concatenate on the brackets I need since these will become column names. (Those brackets were a pain. I kept having to work around them. Another place where Ben-Gan’s method was more elegant)
b) Set @vEmployeeIDSELECT to itself plus the EmployeeID being processed (@vEmployeeID), and then set up the alias. (as ‘Emp’+@vLoopCounter). Important note: I initialized the variable as an empty string (”). This is so that I’m not trying concatenate a NULL value to a string on the first go-round.
c) Set @vEmployeeIDFor to itself plus the EmployeeID being processed
d) Update @vEmployeeIDTable to indicate that the EmployeeID has been added to the string variables
e) Update @vLoopCounter so that the next table alias will be the next number
4) Clean up the extra commas at the end of the string variables
5) Put the whole thing together in @vSQLString
a) Place the @vEmployeeIDSELECT variable where it needs to go
b) Place the @vEmployeeIDFOR variable where it needs to go
6) Execute the variable @vSQLString
This is the output:
Okay, not bad. Now, the elegant Itzik Ben-Gan solution:
@cols AS NVARCHAR(MAX),
@sql AS NVARCHAR(MAX);
SET @cols = STUFF(
(SELECT N’,’ + QUOTENAME(EmployeeID) AS [text()]
FROM (SELECT DISTINCT EmployeeID FROM Purchasing.PurchaseOrderHeader) AS Y
ORDER BY EmployeeID
FOR XML PATH(”)),
1, 1, N”);
SET @sql = N’SELECT ‘+@cols +’
FROM (SELECT VendorID, EmployeeID, PurchaseOrderID
FROM Purchasing.PurchaseOrderHeader) AS D
FOR EmployeeID IN(‘ + @cols + N’)) AS P
ORDER BY P.VendorID;’;
EXEC sp_executesql @sql;
I know, right? Elegant. So what did he do?
1) Declared a couple of variables
a) @cols – holds the string of column values for the PIVOT
b) @sql – holds the SQL statment that gets executed
2) Used a FOR XML PATH(”) command to concatenate the string. This is cool. The query pulls EmployeeID’s out of a derived table in the FROM Clause. He orders by EmployeeID (which is not required), and outputs the result of this query using FOR XML PATH(”). The FOR XML PATH(”) clause creates a single row that looks like this:
Wow, exactly what we need for the PIVOT. Well, almost. That’s what the STUFF function is for. Getting rid of “almost”.
3) Also, see how he used QUOTENAME to add the brackets he needed?
QUOTENAME(EmployeeID) AS [text()]
4) Then, since that leading comma (,) is not needed, he uses the STUFF command to strip it off. STUFF looks like this:
STUFF ( character_expression , start , length ,character_expression )
a) character_expression – the results of the query containing the FOR XML PATH(”) output
b) start – first character
c) length – how many characters to replace with what we’re “stuffing” in. In this case, a length of 1.
d) character_expression – an empty string, which is what’s’ “stuffed” into the first character expression, eliminating the comma.
Try this to illustrate it much more simply:
SELECT STUFF(‘abcdef’, 1, 1, ”);
Your result is: ‘bcdef’. The empty string he specified basically replaces the first character which is the comma we don’t want. Seriously, I had to run the baby STUFF to understand it properly. The beauty of STUFF over SUBSTRING is that SUBSTRING requires you to tell the function the length of the resulting string, which would require a LEN function over the entire subquery to get it right. It saves you having to execute that bad boy more than once.
5) Finally, he just puts the PIVOT query into @sql, concatenating in @cols where he needs to, and then executes it.
This is his output:
So he didn’t do pretty column aliases, but the important data is the same. And just take a look at the execution plans. That’s where I do feel just a bit deflated. Mine is monstrous. His? TWO queries. TWO! But that’s not the point. The point is, I had a blast figuring out how to write my own dynamic PIVOT. I had even more fun dissecting Itzik Ben-Gan’s method. (Yeah, I know. I’m a dork.) And, you can bet your sweet bippy that I’ll be working to make sure that FOR XML PATH, STUFF, and QUOTENAME all become part of my T-SQL muscle memory.