There is a plethora of new T-SQL features and enhancements in SQL Server 2005 and you could probably write a small book talking about all of them. However, this article will focus on one of them, specifically, Common Table Expressions because personally they have come in handy on many occasions.
I have used a few of the other new features and enhancements and if you have not had a chance to delve into any of them, I would highly recommend familiarizing yourself with some of them, such as the new xml data type, improved error handling, query and event notifications, new DDL (Data Definition Language) triggers, ranking functions, TOP operator enhancements, the PIVOT and UNPIVOT operators, and the new APPLY operator. There are a few others and they all add tremendous value to T-SQL developers.
This article will introduce common table expressions and go as in-depth as space will allow, showing how CTE’s can be used to improve TSQL’s capabilities and how they can help better your T-SQL.
Common Table Expressions (CTE’s)
A common table expression is expressed as a temporary table or view defined within an executing statement such as SELECT INSERT, UPDATE, or DELETE. Think of a CTE as somewhat like a derived table in that a CTE only lasts throughout the duration of the executing statement. When that execution of the statement finishes, the CTE is gone. However, there are a few differences between a derived table and a CTE:
* A CTE can reference itself. A derived table cannot.
* A CTE can be referenced multiple times in the same query.
So what is the difference between a CTE and a true temporary table? The quick and easy answer is the overhead of having to create a temporary table and the performance consequence of using a temporary table. Think about what happens when you use a true temporary table. First you have to define and create the table then populate it, then run queries against it. When using CTE’s you can forgo the first step and the associated overhead and use the built in CTE which provides all of that functionality for you.
CTE Structure
The structure of CTE is fairly simple. It basically contains two main parts, the first part being the CTE followed by the second part which is the normal execution statement. The basic syntax for a CTE is as follows:
WITH common_table_expression_name [column(s)]
AS
(
query_definition
)
EXECUTION STATEMENT
As defined in the syntax above, a common table expression contains several arguments which are defined as follows:
common_table_expression_name: The identifier for the common table expression. This name must be different than any other table or view used within the CTE. It can, however, be named the same as a base table or view but when the common table expression is referenced in the normal execution statement the common table expression name must be used, not the base name.
column(s): The column names in the common table expression. These column names must be unique (meaning, no duplicate names allowed) and must match the same number of columns returned by the query_definition.
query_definition: The SELECT statement in which the results are used to populate the common table expression. This statement follows the same requirements that the normal execution statement follows.
A CTE expression cannot define another CTE expression.
The following example uses the syntax outlined above to create a very simple common table expression query. Open a query window in SQL Server Management Studio and execute the following query against the AdventureWorks database.
WITH CTE_ProdMod (ProductModelID, ProductModelCount) AS
(
SELECT ProductModelID, COUNT(*)
FROM Production.Product
WHERE ProductModelID IS NOT NULL
GROUP BY ProductModelID
)
SELECT ProductModelID, ProductModelCount
FROM CTE_ProdMod
ORDER BY ProductModelID
The example above uses a common table expression to count the number of products for each product model in the Production.Product table. A portion of the results of this query when executed are shown below.
ProductModelID ProductModelCount
-------------- -----------------
1 3
2 1
3 3
4 3
5 10
6 11
7 8
8 10
9 12
10 10
A simple example, yet not that impressive because it doesn’t quite showcase the true power and flexibility of what a common table expression can really do. So, the following example uses a common table expression to return the count of products for each product model. The count is returned in the CTE, then the CTE is used to filter the results even further based on what was returned in the CTE.
WITH CTE_TranHist (ProductID, OrderQuantity) AS
(
SELECT ProductID, COUNT(*)
FROM Production.TransactionHistory
WHERE TransactionDate > '05/01/2004'
GROUP BY ProductID
)
SELECT cte.ProductID, cte.OrderQuantity, pp.productmodelid, ProductSubcategoryID
FROM Production.Product pp
INNER JOIN CTE_TranHist AS cte ON pp.productID = cte.productid
WHERE cte.productid BETWEEN 500 AND 750
AND ProductModelID IS NOT NULL
ORDER BY cte.ProductID
The example above uses a common table expression to count the number of products for each product model in the Production.Product table. A portion of the results of this query when executed are shown below.
ProductID OrderQuantity ProductModelID ProductSubcategoryID
----------- ------------- -------------- --------------------
680 21 6 14
706 15 6 14
707 567 33 31
708 526 33 31
711 578 33 31
712 567 2 19
713 118 11 21
714 150 11 21
715 138 11 21
716 99 11 21
717 10 6 14
718 12 6 14
722 57 9 14
725 1 9 14
726 1 9 14
729 1 9 14
730 2 9 14
736 54 9 14
737 45 9 14
738 71 9 14
739 63 5 12
742 67 5 12
743 78 5 12
746 47 5 12
747 62 5 12
748 80 5 12
Imagine now that syntax you would have had to use to get the same results prior to SQL Server 2005. It would have looked something like this:
SELECT TranHist.ProductID, TranHist.OrderQuantity, pp.ProductModelID, pp.ProductSubcategoryID
FROM Production.Product pp
INNER JOIN (SELECT ProductID, COUNT(*)
FROM Production.TransactionHistory
WHERE TransactionDate > '05/01/2004'
GROUP BY ProductID) AS TranHist (ProductID, OrderQuantity)
ON pp.productID = TranHist.productid
WHERE TranHist.productid BETWEEN 500 AND 750
AND ProductModelID IS NOT NULL
Now, which would you rather read, let alone write? I thought so. Common table expressions have a number of advantages including better readability and make it much easier to maintain more complex queries.
Now that you have a general understanding of CTE’s, the true value of CTE’s appear when you have the need for recursive queries. This topic is discussed next.
Recursive Queries
If you’re not excited about CTE’s yet then you will be shortly because CTE’s provide the ability to do something that SQL developers have been praying for a very long time, and that is querying hierarchical data. Didn’t you just cringe when your boss asked you to create a company employee org chart? At long last the prayers of SQL developers have been answered.
In straightforward terms, a recursive CTE is a CTE that references itself. It is executed over and over until no more results are returned. Just like a normal CTE, a recursive CTE can be used with a normal SELECT, UPDATE, INSERT, AND DELETE statement, and even a CREATE VIEW statement.
The general syntax for create a recursive CTE is as follows:
WITH common_table_expression_name [column(s)]
AS
(
query_definition_anchor_member
[UNION ALL | UNION | EXCEPT | INTERSECT]
query_definition_recursive_member
)
EXECUTION STATEMENT
A recursive CTE, while similar to that of a normal of a normal CTE, contains three parts versus the one of a normal CTE. Those three parts are explained below.
common_table_expression_name: No different than that of a normal CTE, this is the identifier for the common table expression. This name must be different than any other table or view used within the CTE. It can, however, be named the same as a base table or view but when the common table expression is referenced in the normal execution statement the common table expression name must be used, not the base name.
query_definition_anchor_member: The anchor member is the query definition which forms the base result set. Anchor members must precede the recursive member definition joined together by a UNION ALL operator.
query_definition_recursive_member: The recursive query definition includes the query definitions joined to the anchor member by a UNION ALL operator.
A correctly coded recursive CTE must contain at least one anchor member query definition and one recursive member query definition.
A Simple Example
To understand the recursive CTE structure and they work, a simple sample is in order.
WITH CTE_Sample (Col1, Col2, Col3) AS
(
SELECT Column1, Column2, Column3
FROM Table1
WHERE Some_Where_Clause
UNION ALL
SELECT Column1, Column2, Column3
FROM Table1
INNER JOIN CTE_Sample ON Table1.Column1 = CTE_Sample.Column1
)
SELECT Col1, Col2, Col3
FROM CTE_Sample
Examining the example above you should see that a recursive CTE works as follows:
First, the anchor query definition returns a base result set. Next, the recursive query definition queries and returns information based on the anchor query definition, accomplished by joining the recursive query definition to the anchor query definition via a UNION ALL operator. The recursive query is executed again using the results returned from the first iteration as the base query (or input value to the next iteration of the recursive query). The recursive query is executed again and again until it finds no more results, at which point the entire result set is returned to the execution statement.
As a more detailed example, consider the following:
WITH CTE_Sample (MngrID, EmpID) AS
(
SELECT ManagerID, EmployeeID
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT ManagerID, EmployeeID
FROM HumanResources.Employee
INNER JOIN CTE_Sample ON Employee.ManagerID = CTE_Sample.EmpID
)
SELECT MngrID, EmpID
FROM CTE_Sample
This example, a scaled down version of the example found in the SQL Server 2005 BOL, follows the steps outlined in the previous paragraph. The anchor query returns the base result set, in this case, the big cheese since he has no ManagerID. The recursive query is then executed, joining on the anchor query to return all employees who has a ManagerID of the person found in the anchor query (in this case, the big cheese). The recursive query is then executed again using those employees recently found in the first iteration of the recursive query, using those as the base query or input value to the current recursive query.
As stated above, a recursive CTE is executed over and over until no more results are returned. It does this by providing an internal, implicit termination check. With each iteration of the CTE, it basically asks itself “hey, have we retrieved all the rows?” and if the results of that check are true then it says “OK, we’re done!” and exits the recursion.
During the 1960’s there was a popular TV show called “Lost in Space” in which an unnamed robot and his human counterpart, “Will Robinson” traveled through space tackling one adventure after another. Part of the responsibility of the robot was to “watch the back” of Will and warn him of any potentially dangerous situations. When a situation did arise that could be or indeed was dangerous, the robot would utter the phrase “Danger, Will Robinson!” and our hero would spring into action.
Why does this have to do with recursive CTE’s? Because as cool as recursive CTE’s are, there exists the possibility of mistakenly creating a recursive CTE that results in an infinite loop. Not good. “Danger, Will Robinson!”. This happens when a recursive member of the CTE returns the same value for both the parent and child columns. Bam, you’re in an infinite loop.
Summary
OK, there you have it. CTE’s in a nutshell. The intent of this article is to provide a helpful explanation and overview of common table expressions in SQL Server 2005 and show the true power and flexibility they provide with recursive query capability. We started out covering simple common table expressions and showed how much easier and clean they are to use over what you used to have to use in previous versions of SQL Server, and from there we discussed recursive CTE’s and how to use them to retrieve hierarchical data. Very cool stuff. Try not to drool when you start using common table expressions. They have that affect.