A Closer Look at OUTPUT
Lots of people have used the
OUTPUT clause in a
DELETE statement, but many may not know just how flexible it is, or what hidden gems are buried in the documentation. In this post we are going to look at two interesting and surprising use cases for the OUTPUT clause.
For those of you who might not know how
OUTPUT works, let’s walk through a quick demo:
In the script we create a test database, create a table
MyGuid to store some data, and also a log table
MyGuid_Log to store historical records of what was changed in our table. At this point you have a few options for getting the log table populated. You could implement a trigger on
MyGuid that inserts records into the log, but you could also use the OUTPUT clause.
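The original setup script isn't reproduced here, so the following is a minimal sketch; the table shapes, column names, and the use of sys.all_columns as a row source are assumptions based on the description above.

```sql
-- Sketch of the demo setup (assumed schema): a data table plus a log table.
CREATE TABLE dbo.MyGuid
(
    Id  INT IDENTITY(1,1) PRIMARY KEY,
    Val UNIQUEIDENTIFIER NOT NULL
);

CREATE TABLE dbo.MyGuid_Log
(
    Id       INT,
    Val      UNIQUEIDENTIFIER,
    LoggedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME()
);

-- Insert 10,000 rows; OUTPUT ... INTO writes each new row to the log table.
INSERT INTO dbo.MyGuid (Val)
OUTPUT Inserted.Id, Inserted.Val INTO dbo.MyGuid_Log (Id, Val)
SELECT TOP (10000) NEWID()
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Updated rows can be logged the same way.
UPDATE dbo.MyGuid
SET Val = NEWID()
OUTPUT Inserted.Id, Inserted.Val INTO dbo.MyGuid_Log (Id, Val)
WHERE Id <= 100;
```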
Based on this code, all 10,000 inserted rows, as well as the newly updated records, will also be written to the
MyGuid_Log table. We are accessing the new/updated records using the
Inserted pseudo-table provided by the
OUTPUT clause. This is similar to how you would interact with records in a trigger. When updating and deleting records you also have access to the
Deleted pseudo-table, allowing you to log both the current and previous state of a record if desired. It’s also important to note that you can insert into a table variable, or remove the
INTO portion of the statement completely and just return the output as a result set to be consumed.
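As a quick sketch of those two variations (the MyGuid column names here are assumptions):

```sql
-- OUTPUT into a table variable, capturing both the old and new values.
DECLARE @Changed TABLE (Id INT, OldVal UNIQUEIDENTIFIER, NewVal UNIQUEIDENTIFIER);

UPDATE dbo.MyGuid
SET Val = NEWID()
OUTPUT Deleted.Id, Deleted.Val, Inserted.Val INTO @Changed (Id, OldVal, NewVal)
WHERE Id <= 10;

-- Or omit INTO entirely and return the affected rows straight to the caller.
DELETE dbo.MyGuid
OUTPUT Deleted.Id, Deleted.Val
WHERE Id > 9990;
```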
A common command in the Linux world is the
tee command. What
tee allows you to do is pipe the output of a command to a file as well as the console. This same functionality can be implemented using multiple
OUTPUT clauses in a T-SQL statement. In this example we are going to update a few hundred records. When the update statement runs, it will not only update the MyGuid table but also write to a log table and return the result of the update. This is accomplished by using two OUTPUT clauses.
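A minimal sketch of the tee-style statement (table and column names follow the earlier demo and are assumptions):

```sql
-- The first OUTPUT ... INTO sends the updated rows to the log table;
-- the second OUTPUT returns the same rows to the caller as a result set.
UPDATE TOP (500) dbo.MyGuid
SET Val = NEWID()
OUTPUT Inserted.Id, Inserted.Val INTO dbo.MyGuid_Log (Id, Val)
OUTPUT Inserted.Id, Inserted.Val;
```

Note that when both forms are used together, the OUTPUT ... INTO clause comes first.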
This can be useful in cases where you need to do some logging but also need to know what records were just altered by a statement. In my example I am selecting
Inserted, but you could just as easily select only the ID column (or any columns you need) and return that to the app making the call.
SELECT FROM … UPDATE?
While the “tee” functionality is useful, this next behavior took me by surprise. In this example we are going to illustrate how to nest an
UPDATE statement within an
INSERT statement as a sub-query. This feature of the
OUTPUT clause is discussed in Microsoft Docs, but because they chose to illustrate the concept using a
MERGE statement, it isn’t immediately obvious what it’s doing. In this script we are going to be working with some product inventory information. We will create an
ItemQty table to store item quantity information, an
Item table that stores item details (just a name in this case), and an
Item_History table to store changes to an item.
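The original script isn't shown here, so this is a sketch of the three tables; the column names and types are assumptions based on the description.

```sql
CREATE TABLE dbo.Item
(
    ItemId   INT IDENTITY(1,1) PRIMARY KEY,
    ItemName VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.ItemQty
(
    ItemId INT NOT NULL PRIMARY KEY REFERENCES dbo.Item (ItemId),
    Qty    INT NOT NULL
);

-- History keeps the item name plus the old and new quantity values.
CREATE TABLE dbo.Item_History
(
    ItemName  VARCHAR(50),
    OldQty    INT,
    NewQty    INT,
    ChangedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME()
);
```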
Now we are going to populate our Item table with a few random string values (GUIDs), and then we will insert some item quantity data based on these items. We are going to use a “tally table” in this example just to generate some rows.
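Something along these lines, using a cross join over sys.all_columns as a stand-in tally table (the row count and quantity formula are arbitrary choices for this sketch):

```sql
-- Fake item names: GUIDs cast to strings.
INSERT INTO dbo.Item (ItemName)
SELECT TOP (1000) CAST(NEWID() AS VARCHAR(50))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- A random starting quantity for each item.
INSERT INTO dbo.ItemQty (ItemId, Qty)
SELECT ItemId, ABS(CHECKSUM(NEWID())) % 100
FROM dbo.Item;
```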
So far we have an
Item table full of some fake items, and an
ItemQty table full of item quantity values. Now we want to run a statement to update quantity, but we need to make sure that change gets logged in our history table. When we created the history table we included the
Item.ItemName column, as well as the old and new values for
ItemQty.Qty. Using a single T-SQL statement, and no triggers, we can update quantity while also inserting data from two different tables into a history table. Not only are we going to combine data from two tables, but we’ll also be making sure we are only inserting data when the quantity was actually changed. If the old and new quantity values are the same, nothing will be inserted.
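A sketch of that statement, assuming the table shapes described above (the random quantity tweak is just there so that some rows change and others don't):

```sql
-- The UPDATE runs as a sub-query; its OUTPUT rows become the derived
-- table Changes, which the outer INSERT filters and writes to history.
INSERT INTO dbo.Item_History (ItemName, OldQty, NewQty)
SELECT ItemName, OldQty, NewQty
FROM
(
    UPDATE q
    SET q.Qty = q.Qty + (ABS(CHECKSUM(NEWID())) % 3) - 1   -- -1, 0, or +1
    OUTPUT i.ItemName, Deleted.Qty, Inserted.Qty
    FROM dbo.ItemQty AS q
    JOIN dbo.Item AS i ON i.ItemId = q.ItemId
) AS Changes (ItemName, OldQty, NewQty)
WHERE OldQty <> NewQty;   -- only log rows whose quantity actually changed
```

One of the documented restrictions on this pattern is that the target of the outer INSERT cannot participate in a foreign-key relationship or have enabled triggers, which is why the history table above has neither.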
Since we are treating the output of the
UPDATE statement as a table, we can filter that output, making sure we only get records that have changed. If we look at our history table, we should now see records where the quantity was changed:
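For example, with a simple query against the (assumed) history table columns:

```sql
SELECT ItemName, OldQty, NewQty
FROM dbo.Item_History;
```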
One of the more interesting things about this method is the query plan. Based on some analysis using live query stats, the plan seems to stream the records into the history table as the updates occur. There are no intermediate objects created here, so unless your use case forces a merge join or another operation requiring sorting, your tempdb usage should be minimal. Here is the plan on my system (using SQL Operations Studio running against a SQL on Linux container):
Like anything else in SQL Server, you need to weigh the benefits of using any of the code above with how unfamiliar it might be to people that have to maintain it. With proper commenting you should be able to make it clear what you are trying to accomplish. There are a few restrictions when using
OUTPUT, so I would suggest reading through the full documentation to make sure you fully understand what you are getting into. Along with a full explanation of the restrictions, there are also a few other neat use cases proposed, like a queue implemented using
DELETE TOP(1) and the OUTPUT clause.