Register | Login 
View Article  

Current Articles | Categories | Search | Syndication

The PIVOT Operator - How SQL Continues to Impress Me

By Scott Klein on Saturday, December 09, 2006 :: 1949 Views :: 1 Comments :: :: SQL Server

SQL Server 2005 is awesome.  If you are using SQL Server 2005 and don’t know how cool a product it is, you need to dig deeper.  I can’t begin to tell you how many times it has saved my rear end and made my life much easier.  A task that seemed overwhelming in SQL Server 2000 is barely a drop of sweat in SQL Server 2005.  I can probably write a whole series of articles on the cool things I keep finding.  In fact, I probably will.

 

One of the first great new features, for example, are CTE’s (Common Table Expressions) which have made coding in T-SQL a shear delight.  They make for clean, easy-to-read T-SQL without the overhead of creating and populating temporary tables.  CTE’s automatically do that for you.

 

When I first got my hands on SQL Server 2005 I did a lot of reading about the new T-SQL enhancements that were made, and there were quite a few.  I had done some reading about the PIVOT and UNPIVOT operators that are new, but even when I was working in Microsoft Access and rarely had a use for them so I really didn’t pay much attention to these two new operators.

 

However, I have been working on a project recently that made me take a good look, and ultimately use, the PIVOT operator.  The project I am working on now is converting a large FoxPro application to SQL Server 2005.  The current FoxPro application used Crystal Reports for its reporting, and so all of their reports need to be converted to pull from SQL Server rather than FoxPro.  In one of these reports, the FoxPro code was pulling data from a table and manually pivoting the data to display it in a pivot format based on a specific column in the table.  I had been struggling for a few days trying to find the best and most efficient way to do this in SQL Server.  Lo and behold, I came across the PIVOT operator.  After doing some reading on the operator, I had coded what was needed for the report in very little time.  It took a little bit to get to know the PIVOT operator and understand how it was used, but once I got to know it I was able to write the report in no time at all compared to the many days prior to finding this great operator.

 

Therefore, this article will take a look at the PIVOT operator and how it can be used and save you potentially a lot of time.

 

The Pivot Operator

The PIVOT operator basically allows you to take unique row value information from one column in a table and turn those into a multiple column layout.  If you have used Microsoft Access, this is similar to the Cross-Tab queries you could create in Access.  It allows you to simply summarize data via unique values in a specified column. 

 

The PIVOT operator is a more-than-welcome solution over those nasty SELECT…CASE statements you used to write, or currently write, in SQL Server 2000.  You remember those?  The T-SQL in those could get pretty scary.

 

Luckily, the SQL Server 2005 came along and has been blessed with a much better solution.

Basic Syntax

What makes this operator simple is its syntax.  The syntax is as follows:

 

PIVOT

 

Yep, that’s it.  Just the word PIVOT.  The trick is to know where, and how, to use it.  As much as like SQL Server Books Online, it doesn’t do a very good job explaining what you need to do to get the PIVOT operator to work.  It gives a great example, but leaves most of it up to the reader to decipher the example.  So, this article will use the same example but delve into the code and explain the parts.

 

Here is the example:

 

SELECT VendorID, [164] AS Emp1, [198] AS Emp2, [223] AS Emp3, [231] AS Emp4, [233] AS Emp5

FROM

(SELECT PurchaseOrderID, EmployeeID, VendorID

FROM Purchasing.PurchaseOrderHeader) p

PIVOT

(

COUNT (PurchaseOrderID)

FOR EmployeeID IN

( [164], [198], [223], [231], [233] )

) AS pvt

ORDER BY VendorID

 

When you run this example, you will get some results that look like the following:

 

VendorID    Emp1        Emp2        Emp3        Emp4        Emp5

----------- ----------- ----------- ----------- ----------- ----

1           4           3           5           4           4

2           4           1           5           5           5

3           4           3           5           4           4

4           4           2           5           5           4

5           5           1           5           5           5

7           5           2           5           5           5

8           4           2           5           4           5

10          4           1           5           4           5

 

To really understand this example and how the results are returned as shown above, you need to take a look at the underlying table first.  Query the Purchasing.PurchaseOrder Header table and look at its data.  Open up SQL Server Management Studio, and then open a new query window.  Execute the following query against the AdventureWorks database

 

SELECT PurchaseOrderID, Status, EmployeeID, VendorID, ShipMethodID, SubTotal, TaxAmt

FROM    Purchasing.PurchaseOrderHeader

ORDER BY PurchaseOrderID

 

A sample of that data is shown below:

 

PurchaseOrderID Status EmployeeID  VendorID    ShipMethodID SubTotal              TaxAmt

--------------- ------ ----------- ----------- ------------ --------------------- -------

1               4      244         83          3            201.04                16.0832

2               1      231         32          5            272.1015              21.7681

3               4      241         38          2            8847.30               707.784

4               3      266         85          5            171.0765              13.6861

5               4      164         92          4            20397.30              1631.784

6               4      223         11          3            14628.075             1170.246

7               4      233         84          3            58685.55              4694.844

8               4      238         78          5            693.378               55.4702

9               4      261         74          5            694.1655              55.5332

10              4      274         13          5            1796.0355             143.6828

11              4      244         51          4            501.1965              40.0957

12              4      231         80          5            34644.225             2771.538

13              4      241         47          4            1839.5055             147.1604

14              3      266         81          5            146.286               11.7029

15              4      164         46          5            102.564               8.2051

16              4      223         21          5            150.7905              12.0632

17              4      233         27          5            13669.425             1093.554

18              4      238         86          5            16393.23              1311.4584

19              4      261         45          2            79204.125             6336.33

20              4      264         93          1            551.88                44.1504

21              4      274         64          1            6987.75               559.02

22              4      231         4           2            28072.275             2245.782

23              4      241         17          1            37312.275             2984.982

24              3      266         12          1            4215.75               337.26

25              4      164         44          2            28297.50              2263.80

 

Using the above information, let’s disect the query above that uses the PIVOT operator.  The important columns we want to look at in this query are the PurchaseOrderID, EmployeeID, and VendorID columns. 

 

We can start disecting this query by breaking it up into individual statements first.  You’ll notice that the first part of the query contains a subselect, as hightlighted below.

 

SELECT VendorID, [164] AS Emp1, [198] AS Emp2, [223] AS Emp3, [231] AS Emp4, [233] AS Emp5

FROM

(SELECT PurchaseOrderID, EmployeeID, VendorID

FROM Purchasing.PurchaseOrderHeader) p

 

If you were to run the SELECT statement by itself, it would return the following:

 

PurchaseOrderID EmployeeID  VendorID

--------------- ----------- -----------

1               244         83

2               231         32

3               241         38

4               266         85

5               164         92

6               223         11

7               233         84

8               238         78

9               261         74

10              274         13

 

Taking a close look at these results, we know that the data we want from our PIVOT query is contained within these results.  We know that because of the later part of the query:

 

COUNT (PurchaseOrderID)

FOR EmployeeID IN

( [164], [198], [223], [231], [233] )

) AS pvt

ORDER BY VendorID

 

This part of the query says “give me the count of PurchaseOrders for the following employeeID’s and order them by VendorID”.  Where does it get the data for this part of query from?  It gets the data from the subselect highlighted in the first part of the query. 

 

This is almost like running the following query:

 

SELECT COUNT(PurchaseOrderID)

FROM Purchasing.PurchaseOrderHeader

WHERE employeeid = 164 and vendorid = 1

 

You will see that the results from this query is a value of 4.  This query counts the number of PurchaseOrders where the VendorID = 1 and the EmployeeID = 164.  What you want to do is this exact same query for the other EmployeeID’s, but you want to do that with a single query.  You also want to display those results in individual columns for the given EmployeeID and all the vendors, ordered by the VendorID.

 

This is where the PIVOT operator comes in.  The top SELECT statement specifies the specific id’s from the EmployeeID column that will be used to form the columns in the result set.

 

SELECT VendorID, [164] AS Emp1, [198] AS Emp2, [223] AS Emp3, [231] AS Emp4, [233] AS Emp5

 

This part of the query says “I want to select the VendorID column and data from EmployeeIDs 164, 198, 223, 231, and 233”.

 

From there, the subselect is run from which the final data will be calculated. 

 

Now comes the important part, the PIVOT operator.  The PIVOT operator says “rotate the data from the following query and display them as columns in the columns I specified earlier in my initial SELECT statement”.

 

So the next step is to apply the PIVOT operator to the last part of the query which returns the count of PurchaseOrders for the given employees and order them by VendorID and return them in into the pivot table. 

 

PIVOT

(

COUNT (PurchaseOrderID)

FOR EmployeeID IN

( [164], [198], [223], [231], [233] )

) AS pvt

ORDER BY VendorID

 

You will notice that the syntax is a bit different here.  What this part of the query says is “count the PurchaseOrderID’s FOR the employees with EmployeeID found IN the following EmployeeID’s returned from the earlier subselect”.

 

When you put these parts together, you get the entire query shown on the first page.  The query as a whole looks pretty imtimidating, but when you separate out the parts and start to evaluate each piece individually, it starts to become much more simple to understand.

 

What is really cool is that you can also combine technologies.  Remember the CTE’s I discussed earlier?  I recently wrote a stored procedure that combined a CTE using the PIVOT operator.  I won’t give you the entire stored procedure, but a portion of it looks like the following:

 

WITH PriorBalance1 (HR180ID, clvbal1, clvbal2, clvbal3, clvbal4, clvbal5, clvbal6, clvbal7)

AS

  (

    SELECT pvt.HR180ID, [1], [2], [3], [4], [5], [6], [7] FROM

      (SELECT HR180.HR180ID, pr154a.ypriorbal, pr164.iprtorder         

       FROM        dbo.pr154a

       INNER JOIN  dbo.pr154 ON pr154.pr154id = pr154a.pr154id

       INNER JOIN  dbo.pr164 ON pr164.pr164id = pr154.pr164id

       INNER JOIN  dbo.hr180 ON pr154.hr180id = hr180.HR180ID

       WHERE       Pr154.lactive = 1

       AND         Pr164.lactive = 1

       AND         pr164.caccumsched IN ('2', '4', '6')

       AND         pr154a.dupdate BETWEEN @FisBegDate AND @FisEndDate

  ) p

  PIVOT

   (

     SUM(P.ypriorbal) FOR p.iprtorder IN ([1], [2], [3], [4], [5], [6], [7])

   ) AS pvt

  ),

 

This example follows the same principles as discussed in this article.  All the pieces of the query are the same, and you can actually disect this query the same way. The only difference is that the resulting query from the PIVOT query is returned in the form of a CTE (Common Table Expression) so that it can be used later on.

 

Maybe my next article will be on CTE’s.

Conclusion

The purpose of this article was to help you understand the mystery of the PIVOT operator.  For one little word, it can seem a bit to get used to but the intent of this article was to show you an easy way to look at it and how to use it.  SQL Server 2005 has arrived with an amazing amount of new features and enhancements that make coding in T-SQL much more fun and robust, and the PIVOT operator is certainly one of those.  The trick is to know how to use it.

 


Previous Page | Next Page

COMMENTS

How SQL Continues to Impress Me.
It depressed me. I wonder why the {pivot (for # in ( XXX ) ) cannot take a select statement here.
this is an implementation issue, and it seems like this is a big programming barrier, i would say improper software design. I am sorry this is the first time i got disapointed with sql. very disapointed.

posted @ Tuesday, January 16, 2007 6:33 AM by Hrubesh


Click here to post a comment

Copyright (c) 2008 GSP Developers
Walling Info Systems | Terms Of Use | Privacy Statement