Compile Time Part 2

TL;DR – Square brackets do not save the optimizer any time.

If you didn’t already read Part 1, you might want to go back and do that.

I will easily admit that I can be rather obsessive about code formatting sometimes. Frankly, inconsistency annoys me. When it comes to Transact-SQL, this can translate into all sorts of little quirks in my code. One of the latest ones is overuse of [square brackets].

They have their place, of course. There’s a reason the QUOTENAME function exists. And there will always be some annoying fool that likes to create columns with names like “name” and “type” or – worse – put actual SPACES in them!

But I’ll admit it – I’ve been getting carried away. I blame Postgres, which won’t let me use Pascal casing without throwing quotes around every identifier. But I digress…

Anyway, this obsession had me thinking – does wrapping identifiers in square brackets save SQL Server any time? Does it say to the optimizer, “Hey, I PROMISE this whole thing inside these square brackets is an identifier. Cross my heart.” And the optimizer takes your code at its word and doesn’t look through its list of reserved keywords for one that matches AccountCreateDate or address_line_2?

The answer is… no. Throwing every identifier into square brackets doesn’t speed it up at all. Here’s the test:


IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j1'))
DROP PROCEDURE dbo.j1;
GO
CREATE PROCEDURE dbo.j1
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = rowguid FROM Person.Address WHERE AddressID = 1 OPTION (RECOMPILE);');
GO

IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j2'))
DROP PROCEDURE dbo.j2;
GO

CREATE PROCEDURE dbo.j2
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = [rowguid] FROM [Person].[Address] WHERE [AddressID] = 1 OPTION (RECOMPILE);');
GO

DBCC FREEPROCCACHE;
GO

DECLARE @i INT = 1;
WHILE (@i <= 100000)
BEGIN
EXEC dbo.j1
EXEC dbo.j2
SELECT @i = @i + 1;
END;
SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

Note that this was written to use AdventureWorks2014. Also note that I’m running ad hoc SQL inside of a pair of stored procedures so that I can measure a very quick and small bit of SQL over a lot of executions, forcing it to recompile every time yet keeping the plan cache from blowing up. See Part 1 if you don’t believe that my ad hoc SQL’s compile time is factored into total_worker_time.

Anyway, this code takes about a minute and a half to run on my system. The results? A dead heat. Sometimes, j1 wins, and sometimes, j2 wins. There’s no significant difference.

Compile Time Part 1

TL;DR – The sys.dm_exec_{query|procedure}_stats.*_worker_time fields do not include compile time. But the values in sys.dm_exec_procedure_stats.*_worker_time DO include compile time for ad hoc SQL within the procedures. Also, as a side note, a lot of indexed views can be a significant performance problem.

I recommend a blog post by Jonathan Kehayias entitled, “Identifying High Compile Time Statements from the Plan Cache“. In that, he’s concentrating on queries that have a high individual compile cost.

However, the object of my search was small queries being compiled and executed in high volume (more on why in Part 2 of this). In my case, I was looking for ad hoc SQL being generated and executed inside of stored procedures – the kind of queries that don’t make their way to the plan cache, often due to the use of OPTION (RECOMPILE) to keep them from spamming the cache.

What I found was that worker time needed to compile these queries is indistinguishable from that needed to execute them. To show this, let’s look at an example in AdventureWorks2014. In this example, I’m going to create and execute two similar procedures. I’m also going to create a number of indexed views.

Why indexed views? I want to increase compile time significantly for this exercise, and a large number of indexed views can do that. From MSDN: “The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution.” My thanks to oas_public on stackoverflow.com for that tip.

Anyway, let’s create a stored procedure that will use our indexed view:


CREATE PROCEDURE dbo.j1
AS
BEGIN
DECLARE @stmt NVARCHAR(4000);
SELECT @stmt = N'/* dbo.j1 */
SELECT TOP 1
SUM(UnitPrice * OrderQty * (1.00 - UnitPriceDiscount)) AS Rev,
OrderDate,
ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
AND ProductID BETWEEN 700 and 800
AND OrderDate >= CONVERT(datetime,''05/01/2002'',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
EXEC(@stmt);
END; --PROCEDURE
GO

By the way, this evolved from sample code on MSDN. Now here’s a second stored procedure that is ever-so-slightly different:


CREATE PROCEDURE dbo.j1
AS
BEGIN
DECLARE @stmt NVARCHAR(4000);
SELECT @stmt = N'/* dbo.j2 */
SELECT TOP 1
SUM(UnitPrice * OrderQty * (2.00 - UnitPriceDiscount)) AS Rev,
OrderDate,
ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
AND ProductID BETWEEN 700 and 800
AND OrderDate >= CONVERT(datetime,''05/01/2002'',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
EXEC(@stmt);
END; --PROCEDURE
GO

To make this a fair test, let’s clear the cache:


DBCC FREEPROCCACHE;
GO

And execute both of the procedures:


EXEC dbo.j1;
EXEC dbo.j2;

How long did they take?


SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);

On my local system, both run in about 85-105 ms. Obviously, YMMV.

Now let’s create an indexed view. I’m going to reuse this code a couple of times, so bear that in mind when you see how it’s constructed:


--Set the options to support indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

DECLARE @i INT = 1, @stmt NVARCHAR(4000), @t1 NVARCHAR(4000), @t2 NVARCHAR(4000), @t3 NVARCHAR(4000)

-- Create template for dropping the view.
SELECT @t1 = N'/* jtest */
IF OBJECT_ID(''Sales.vOrders_{1}'') IS NOT NULL
DROP VIEW Sales.vOrders_{1};';

-- Create template for creating the view.
SELECT @t2 = N'/* jtest */
CREATE VIEW Sales.vOrders_{1} WITH SCHEMABINDING
AS
SELECT
SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Revenue,
OrderDate,
ProductID,
COUNT_BIG(*) AS COUNT
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
GROUP BY OrderDate, ProductID;';

-- Create template for creating the index.
SELECT @t3 = N'/* jtest */
CREATE UNIQUE CLUSTERED INDEX IDX_V1_{1}
ON Sales.vOrders_{1} (OrderDate, ProductID);';

-- Drop and create the objects.
WHILE (@i <= 1)
BEGIN
SELECT @stmt = REPLACE(@t1, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @stmt = REPLACE(@t2, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @stmt = REPLACE(@t3, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @i = @i + 1;
END; --WHILE
GO

Note that the indexed view supports dbo.j1, but not dbo.j2. Let’s go back and run those again:


DBCC FREEPROCCACHE;
GO

EXEC dbo.j1;
EXEC dbo.j2;

SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

If your results are like mine, you’ll see that dbo.j1 has dropped to about a quarter of its previous worker time, but dbo.j2 hasn’t budged. So, our indexed view is doing what we want. Now let’s create a few more. How does 500 sound? In that WHILE loop for the indexed view script, change the constant from 1 to 500 and run it again. Bear in mind, this took about 90 seconds on my system, so be prepared to wait.

Once all 500 indexed views are created, execute the two stored procedures again and look at the results. Both worker time values will be much higher!

But the actual execution time hasn’t changed. We’re still going to get essentially the same plan for each query, which means SQL Server should be doing the same work to execute it. But now the compilation is much more costly because the optimizer is looking at all 500 views when generating the plan. And because that compilation is taking place inside of ad hoc SQL in a stored procedure, SQL Server is reporting it as part of the procedure’s worker time in sys.dm_exec_procedure_stats.

To verify this, you can run Jonathan’s code, which will break out the compile time for those ad hoc queries.

If you’ve been following along in your own database and want to gracefully clean up the mess, you’ll want to do two things. First, execute this:


DROP PROCEDURE dbo.j1;
DROP PROCEDURE dbo.j2;
GO

And second, go back to the indexed view script, comment or remove the last two EXEC lines, and run it again.

In conclusion, the *_worker_time fields in sys.dm_exec_query_stats and sys.dm_exec_procedure_stats do NOT include compile time. That was why Jonathan had to dig into the cached plan and tease that information out of the XML. But the compile time of ad hoc statements inside of a stored procedure WILL be included in the procedure’s execution time.

And too many indexed views can suck.

Stay tuned for Part 2, which explains why I chased this down in the first place.

NULL Pain

I had an issue today that resulted in a tweet to #sqlhelp:

#sqlhelp See Remarks section: http://ow.ly/Za6HX So how do you strip out 0x000? Suggestions?

As it turns out, when you have a character string in SQL Server that contains character 0x000, it really doesn’t know what to do with it most of the time, especially when you’re dealing with Unicode strings.

I did track down http://sqlsolace.blogspot.com/2014/07/function-dbostripunwantedcharacters.html, but I generally try to avoid calling UDF’s in my queries.

I ended up settling on this, which is based on the fact that REPLACE() works when the data is ASCII

CASE WHEN LEN(REPLACE(CAST([foo] AS VARCHAR(4000)), CHAR(0), '')) = LEN([foo]) THEN [foo] ELSE [dbo].[strip_char0]([foo], NCHAR(9)) END

My strip_char0 function was mostly like the one from sqlsolace, so I won’t repeat it here. But I did add in a replacement character option (in this example, a tab). And I’m only calling the UDF when I need to, which makes the performance geek in me happy.

My thanks to @SQLSoldier for being responsive on the #sqlhelp hashtag.

It Does Not Depend

A common refrain in the SQL Server community is the phrase, “It depends.” And, truthfully, that’s the correct answer to many questions. There are so many variables in any given situation that what works great 99 times will not work on the 100th, due to some slight difference.

But here’s one piece of advice that – even if it’s not perfectly 100% – has a high enough success rate that I would say “It does not depend.”


EXECUTE sp_configure 'Show Advanced Options', 1;
RECONFIGURE;
EXECUTE sp_configure 'Optimize for Ad hoc Workloads', 1;
RECONFIGURE;
GO

I have yet to come across a situation in which this shouldn’t be enabled. And I’ve heard others say the same.

By the way, in case you have the same question as an attendee at my SQL Saturday session today in Salt Lake City, no, it will not wipe the plan cache when you turn it on.

SELECT TOP 1 * FROM PASS.Summit2015;

I’ve been uncharacteristically quiet during this particular Summit. I picked up a cold on the way out there, which made the week a bit different than I was expecting. Still, I always take away quite a lot each time I go, both big and small, and this year was no exception. I’d like to share one of the small things here.

At some point, we’re all running ad hoc queries in a production or otherwise significant environment. We’re doing root cause analysis, looking for bad data, trying to figure out why something is slow, whatever. Our ad hoc queries get rather large sometimes, as we’re pulling in information from all sorts of places. Often, we have several stored away in a toolbox for these occasions, our “go-to” scripts when we get that inevitable call in the middle of the night.

Here’s one small change to those queries that will make one small difference. Admittedly, it’s not much, but it could add up, especially if you’re in a large shop with a lot of people who are doing the same thing. If we all changed our ad hoc to include this, we might actually have an impact. So, what’s the change? Throw in “OPTION (RECOMPILE)” at the end of all your ad hoc statements.

What does this buy us? Well, basically, it keeps those one-off ad hoc queries from bloating the plan cache. Does it really matter? Not much, in the grand scheme of things, except that I know some of the queries in my toolbox are quite lengthy. Why have SQL Server cache the plans for those?

By the way, you can also enable the Optimize for Ad Hoc Workloads server configuration option. This will save you nearly as much, at least until you hit F5 the second time.

Anyway, I might share some of the other stuff I picked up from this year’s Summit. Just not my cold, I hope. But the one thing I always share is my advice that you attend. If you’re a data-centric developer like me, definitely go. It’s excellent content by a thriving community in a beautiful city. I’m already looking forward to next year.

SELECT INTO

I just came across a stored procedure that does something like this:


DECLARE @stmt NVARCHAR(MAX);
CREATE TABLE #foo (col1 INT, col2 INT, col3 INT);
SELECT @stmt = N'SELECT col1, col2, col3 INTO #foo FROM bar;';
EXEC sp_executesql @stmt;

The incorrect assumption that the developer made was that the first #foo (the one created with CREATE TABLE) was the same as the second #foo (the one referenced in the INTO clause). But that’s not the case. To prove it, try this slight variation:


DECLARE @stmt NVARCHAR(MAX);
CREATE TABLE #foo (name NVARCHAR(2));
SELECT @stmt = N'SELECT TOP 5 name INTO #foo FROM sys.objects; SELECT * FROM #foo;';
EXEC (@stmt);

If the @stmt was really using the first #foo, then what would you see? The first two characters from name from 5 rows from sys.objects, right? No, actually you’d see Msg 8152, “String or binary data would be truncated.”. But that’s beside the point. The point is that it isn’t using that first #foo. It’s creating a second one. And it’s getting away with that because it’s running within EXEC. Take out the use of @stmt and SQL will complain to you when you try the “SELECT … INTO #foo” that #foo already exists.

Also note that the original used sp_executesql, but that my second query used only EXEC. That was to prove that it was EXEC alone, not sp_executesql, that was the differentiator.

SQL Saturday #402 – Indianapolis

I’m once again presenting a session at a SQL Saturday, this time in my own backyard. SQL Saturday #402 is here in Indianapolis this Saturday. If you’re in the Indy area and spend any significant amount of time with SQL Server on a regular basis, I really encourage you to go. You can learn a lot in a short amount of time, make some good contacts in the industry, learn about great products that can make your job easier – and, if you’re luckier than me, perhaps even win a prize or two. They also have Friday “pre-cons” with room still available, if you want to take it a step further.

My session, Playing in Traffic, is the same one I presented in Columbus. I’m still including the same topics, but I’ve taken the feedback and retooled it somewhat. If you do attend, I can tell you that all of the presenters appreciate your feedback, especially your criticism. We’re all trying to improve, and what you share now can make a difference later.

Also, while I’m not fortunate enough to be presenting in Seattle, I’m definitely attending the Summit this year. If you’re wondering whether or not to attend, just ask anyone who’s been. It’s definitely a valuable experience for SQL Server professionals.

@@ROWCOUNT Strikes Again

Just when I thought I’d been thwarted the last time by @@ROWCOUNT, along comes this gem:

begin try
select top 10 * from sys.objects;
end try
begin catch
select top 20 * from sys.objects;
end catch;
select @@rowcount;

Yup, it returns 0. It seems that even a try/catch block will stomp on the results. Maybe I’ll learn one of these days.

SQL Saturday #421 – Columbus

They say the only three numbers in computer science are 0, 1, and N. Well, I’ll be at the SQL Saturday in Columbus, Ohio, on July 11th, giving a session entitled Playing in Traffic. It’s really a talk about keeping the wheels turning as a SQL developer when N gets really big. It’s the last session of the day, so expect things to be a bit laid back and mildly entertaining, but hopefully still very educational. See you in Columbus!

I <3 SET STATISTICS IO ON

Once upon a time, a senior DBA at the Salesforce Marketing Cloud had pointed out a very small performance tweak to me. I was curious about whether or not it still held true in SQL Server 2014, so I wrote up the sample below. What’s interesting to me is that a cursory examination of the Execution Plan doesn’t reveal the difference. Of course, if you dig deep enough, you’ll find it there. But it’s not obvious.

On the other hand, by using SET STATISTICS IO ON, I can easily see the difference. Hopefully, this isn’t some new earth-shattering development for you so much as a simple reminder to make use of this statistical information.


IF NOT EXISTS (SELECT 1 FROM sys.indexes WHERE object_id = OBJECT_ID(N'Person.Person') AND name = N'IX_Person_PersonType_EmailPromotion')
CREATE NONCLUSTERED INDEX IX_Person_PersonType_EmailPromotion ON Person.Person (PersonType, EmailPromotion);

SET STATISTICS IO ON; SET NOCOUNT ON;
SELECT BusinessEntityID, EmailPromotion
FROM Person.Person
WHERE PersonType = N'SC'
AND EmailPromotion IN (1, 2);
SELECT BusinessEntityID, EmailPromotion
FROM Person.Person
WHERE PersonType = N'SC'
AND EmailPromotion BETWEEN 1 AND 2;
SET STATISTICS IO OFF;

DROP INDEX IX_Person_PersonType_EmailPromotion ON Person.Person;


Table 'Person'. Scan count 2, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Wondering what exactly these results mean? Look here: https://msdn.microsoft.com/en-us/library/ms184361.aspx

Wondering what the tweak was? Look closely at the IN versus the BETWEEN, and then at the number of scans and reads.