EF Core and nvarchar(max)

My opinion was once, “Entity Framework is slow.” As I dive more deeply into the nuances of Entity Framework Core, that opinion is changing, but only slightly. It’s becoming, “Entity Framework is slow out of the box.”

There are ways to make it better. I’ll describe one of those here, but first, let’s establish the environment:

– Entity Framework Core 7.0.5

– The “code-first” approach

– SQL Server database (any in-support version)

So what does EF Core do wrong?

In this situation, when you define a string property on a model, the underlying data type will be nvarchar(max). For those who don’t know, this is a variable-length Unicode string with a maximum storage size of 2^31-1 bytes (2 GB). This is a combination of three choices that EF Core is making:

1. nvarchar(max): The desired encoding (ASCII versus Unicode) is unknown. Therefore, EF Core defaults to the “safer” option, Unicode.

2. nvarchar(max): The preference between a fixed-length string and a variable-length string is unknown. Because fixed-length Unicode strings have a size limit of 4,000 characters, while variable-length strings can be larger (see below), EF Core defaults to a variable-length string.

3. nvarchar(max): The desired length is also unknown. Therefore, EF Core defaults to the maximum possible length, “max”.

What are the problems with those decisions?

1. ASCII strings consume half the memory of Unicode strings in SQL Server. If all you need is an ASCII string, then that’s the option to choose.

2. If you have extremely consistent data lengths, then fixed-length columns can be more efficient.

An example of both of these is a phone number. The necessary characters are all available in the ASCII character set. And phone numbers (even accounting for country codes and the nuances within each of those countries) are roughly consistent in length.

3. Even though the effective length of the data may be very small, SQL Server must assume that the column can hold a large amount of data. Therefore, it makes decisions about resources based on this assumption.

Using the phone number example again, even though every row in the table holds data in this field with a length somewhere between 10 and 20 characters, SQL Server must assume that there is the possibility that it will store 2^31-1 bytes (2 GB).

For an example of how this can affect performance, see Eddie Wuerch’s response in this thread on sqlservercentral.com.

4. Data types like nvarchar(max) are known as large value data types or large object (LOB) data types. Data in these columns is handled differently from other types. For example, that data may be stored on a separate type of page (SQL Server’s designation for an 8 KB block of storage) from the rest of the row. This is called a LOB page. And these columns can’t be used as index key columns like other types, because normal index keys have a size limit (see the sketch below).
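
To see that last limitation in action, here’s a minimal sketch in T-SQL. The table and column names are hypothetical; the point is that SQL Server refuses to use a MAX column as an index key, but accepts it once the column is a regular varchar.

IF OBJECT_ID('dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
GO

CREATE TABLE dbo.Customer
(
    CustomerId  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    PhoneNumber NVARCHAR(MAX) NULL  -- what EF Core generates by default
);
GO

-- Fails: a MAX column is not valid as a key column in an index.
CREATE INDEX IX_Customer_PhoneNumber ON dbo.Customer (PhoneNumber);
GO

-- Succeeds once the column is no longer a LOB type.
ALTER TABLE dbo.Customer ALTER COLUMN PhoneNumber VARCHAR(50) NULL;
GO
CREATE INDEX IX_Customer_PhoneNumber ON dbo.Customer (PhoneNumber);
GO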

Why does this matter?

Obviously, these are performance issues, and we should all keep performance in mind as we’re developing our solutions. But the bigger issue is that this behavior can go unnoticed in the early stages of developing an application. It’s only when the application is in production, with a sizable footprint, that we begin to see the harm that this causes. And as the application grows, these issues become harder and harder to fix.

So what do we do about it?

We solve this with two different annotations in our models. In EF Core 7, here is an example of each of these:

    [System.ComponentModel.DataAnnotations.Schema.Column(TypeName = "varchar")]
    [System.ComponentModel.DataAnnotations.MaxLength(50)]
    public string? MyStringField { get; set; }

As you can see, this indicates to EF Core that the string is still meant to hold variable-length data, but that it should use the ASCII-based varchar type and be limited to 50 characters.
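
For reference, here’s roughly what that change looks like on the SQL Server side. This is only a sketch; the table and key column are hypothetical, and your migration output may differ slightly.

-- Default mapping for a string property, with no annotations:
--     MyStringField NVARCHAR(MAX) NULL

-- With the Column and MaxLength annotations above, the generated table
-- ends up closer to this:
CREATE TABLE dbo.MyEntity
(
    Id            INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    MyStringField VARCHAR(50) NULL
);

If you prefer the fluent API over attributes, IsUnicode(false) combined with HasMaxLength(50) should produce roughly the same column.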

Summary

Once again, EF Core is not necessarily slow, but it is slow by default. And going along with the default behavior when it is the most inefficient option is a recipe for disaster. This is why many people, myself included, are inclined not to use it. But it is a popular ORM, so knowing how to make it better is vital.

The Case for Stored Procedures

In some parts of the C# community, I’ve noticed that there seems to be a desire to avoid writing SQL code. Some advocates for Entity Framework have even cited the ability to avoid writing SQL as one of its benefits. I admit that I am no EF expert. If I had to choose an ORM, my list stops at Dapper, and I normally roll my own data access layer.

But I’m not here to try to dissuade anyone from using Entity Framework. What I’d like to do, though, is lay out some of the benefits of embracing SQL in your solutions, specifically Transact-SQL and the use of stored procedures in SQL Server.

First, let me share my typical design for a new solution based on a relational database. I not only use a database-first approach, but I also build the database using scripts that are part of a repository. Everything after the CREATE DATABASE statement itself is in the repo. Each script is essentially idempotent. In other words, the script can be executed repeatedly without adverse effect. For tables, this means checking for existence before creating them. For changes to existing tables, additional logic is used to (for example) check for the existence of a new column before adding it. For other objects, this is mostly just using the “CREATE OR ALTER” syntax. I also take steps to maintain backward compatibility, but that’s for another post.
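
Here’s a small sketch of that pattern. The table, column, and view names are made up, but the shape is what matters: every piece checks before it acts, so the whole script can be run over and over.

-- Tables: check for existence before creating.
IF OBJECT_ID('dbo.Widget') IS NULL
BEGIN
    CREATE TABLE dbo.Widget
    (
        WidgetId   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        WidgetName VARCHAR(50) NOT NULL
    );
END;
GO

-- Changes to existing tables: check for the new column before adding it.
IF COL_LENGTH('dbo.Widget', 'CreatedUtc') IS NULL
BEGIN
    ALTER TABLE dbo.Widget
        ADD CreatedUtc DATETIME2(3) NOT NULL
            CONSTRAINT DF_Widget_CreatedUtc DEFAULT (SYSUTCDATETIME());
END;
GO

-- Other objects: CREATE OR ALTER handles repeated runs.
CREATE OR ALTER VIEW dbo.vWidget
AS
SELECT WidgetId, WidgetName, CreatedUtc
FROM dbo.Widget;
GO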

By the way, kudos to the attendee at my PASS Summit 2022 session who called me out for using the word idempotent. I’ve been throwing that word out for years, knowing that it’s not quite what I mean, but close enough. So, no, the scripts aren’t really idempotent, but I haven’t found a better word yet for, “I can F5 them to death.”

Anyway, I also include PowerShell scripts in the repo that execute the T-SQL scripts on a database, and incorporate these into a CI/CD pipeline. I won’t explain the whole pipeline, but one crucial part is that a merge of an approved PR into the dev branch will trigger a deployment to a reference development environment. The scripts are all executed against the database(s) in this environment, and then application code is deployed to the servers.

The point I’m trying to make is that the database code is treated as part of the solution. It is in the repo just like the C# code. It is deployed just like the C# code. I even include unit testing, just like the C# code.

I also keep all T-SQL in the database. There is no T-SQL code inside the C# code. Stored procedures are used for all functionality, even basic CRUD functions. In fact, probably 80–90% of the stored procedures in my systems are essentially CRUD functions. The only operation that doesn’t go through stored procedures is bulk insertion, and I use SqlBulkCopy for that.
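
For what it’s worth, here’s a sketch of what one of those basic CRUD procedures tends to look like, continuing the hypothetical Widget table from the script above.

CREATE OR ALTER PROCEDURE dbo.WidgetInsert
    @WidgetName VARCHAR(50),
    @WidgetId   INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Widget (WidgetName)
    VALUES (@WidgetName);

    -- Hand the new key back to the application.
    SET @WidgetId = SCOPE_IDENTITY();
END;
GO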

Here is an example of my approach, one that I put together for that Summit session: https://dev.azure.com/downshiftdata/_git/SearchOverflow

Why Do It This Way?

  1. Changes to the database are tracked in the repo just like changes to application code. So knowing who changed what, when they changed it, who reviewed it, when it got deployed, etc., is no more difficult to answer for database changes than for application code changes.
  2. The stored procedures act as an interface layer, keeping database code in the database and application code in the application. Radical changes to the database are possible, without any effect on application code and without incurring any downtime. Performance issues in the database can be addressed quickly, through hotfixes, without redeploying the application. And the database design can take advantage of this interface layer. For example, instead of relying on AFTER triggers, audit logging can be done through OUTPUT clauses inside the stored procedures (see the sketch after this list). This is because you can reasonably guarantee that all the writes are coming in through the procedures (and even support this assumption with appropriate authorization configuration).
  3. Stored procedures are compiled, with their plans in the procedure plan cache. Yes, this happens with ad hoc queries as well (and I’m an advocate for the “Optimize for Ad Hoc Workloads” option in SQL Server). But there are issues like parameter sniffing and cache bloat that need to be handled. Routing execution through stored procedures makes it easier to manage the cache and understand what is happening in the database.
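
Here’s the sketch promised in item 2: audit logging through an OUTPUT clause rather than an AFTER trigger. The audit table and procedure are hypothetical, continuing the Widget example from earlier.

IF OBJECT_ID('dbo.WidgetAudit') IS NULL
BEGIN
    CREATE TABLE dbo.WidgetAudit
    (
        WidgetAuditId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        WidgetId      INT NOT NULL,
        OldWidgetName VARCHAR(50) NOT NULL,
        NewWidgetName VARCHAR(50) NOT NULL,
        ModifiedUtc   DATETIME2(3) NOT NULL
            CONSTRAINT DF_WidgetAudit_ModifiedUtc DEFAULT (SYSUTCDATETIME())
    );
END;
GO

CREATE OR ALTER PROCEDURE dbo.WidgetUpdateName
    @WidgetId   INT,
    @WidgetName VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- The OUTPUT clause writes the before/after values to the audit table
    -- as part of the same statement, with no trigger involved.
    UPDATE dbo.Widget
    SET WidgetName = @WidgetName
    OUTPUT inserted.WidgetId, deleted.WidgetName, inserted.WidgetName
        INTO dbo.WidgetAudit (WidgetId, OldWidgetName, NewWidgetName)
    WHERE WidgetId = @WidgetId;
END;
GO

Because the application only ever calls the procedure, every name change lands in dbo.WidgetAudit, and there is no trigger to forget about or accidentally disable.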

The Other Side

I’ve seen quite a few arguments against approaches like this. Here are some of the more popular ones:

  1. If I use Entity Framework and take a code-first approach, then it does all the SQL for me and I don’t have to worry about it.

Once again, I’m not an EF expert, and don’t know the full scope of what it can and can’t do. But I’ve heard variations of this one quite a lot. And the argument falls apart like any other code-generation argument. It’ll work fine 90% of the time, and then you’ll be in trouble when you hit the other 10%. Now, in some situations, this may work out. You’ll always be inside that 90%. But to me, it’s a calculated risk, and one that I don’t have to take because it’s not the approach I use.

Where this one concerns me is in situations in which a DBA (or someone in a similar role) was essentially ignored. The DBA advocated for something less wizard-y, and had good reasons for it. But they’re often outnumbered and – frankly – not well respected, and decisions are made despite their arguments. And then the situation hits that 10% and the DBA’s list of options is limited. They often know the change they should make, but they can’t make it because that code is being generated in the application layer. Or the change they want to make will break application layer code.

  2. That’s the old way of doing things.

This makes me cringe every time. Yeah, we all wrote stored procedures twenty years ago, but that’s not how you do things today. This goes hand-in-hand with the idea that RDBMSs themselves are antiques and shouldn’t be used. I’m a firm believer in the KISS principle. And despite being ancient in the tech world, the RDBMS is a straightforward solution to a lot of problems. Would a big data store be better for a certain situation? Of course. Will your app ever reach a threshold that demands the advantages of that big data store? If not, then why add to the complexity of your solution? If your tech stack includes an RDBMS, then you have one product that can reasonably support a host of requirements.

  3. The database is the database and the application code is the application code and the two shouldn’t be mixed.

In my experience, those who make this case have had to suffer through thousand-line stored procedures that tried to do everything in the database. That is indeed a nightmare situation, and one that I’ve seen all too often.

But their response is to swing fully in the other direction. Don’t let any code reside in the database. But the truth is that there are scenarios in which it makes sense to go either direction. If you need to join several large tables, have good covering indexes, need a timely response, are faced with frequent cache invalidation, and only need a few rows in each call, it makes a mountain of sense to have that functionality in a stored procedure. Yes, that could be dynamically generated in the application layer. But there is definitely a non-zero chance of getting a bad execution plan in that scenario. And as I said before, your options then are much more limited.

Conclusion

Stored procedures form an interface layer between the database and the application code. This layer of abstraction is incredibly beneficial to application security, maintainability, and scalability. Not using it is a design decision that should be considered very carefully.

When I am in conversations with others in the database community, or with coworkers who share some of my experience, I wouldn’t dream of having to make these arguments. But now that I’m back in C# land more than I used to be, I’m shocked at how often my opinions are at odds with others in this community. So, I don’t know if I’ve put any of those arguments to rest – probably not – but maybe that anti-stored-procedure crowd is at least thinking a bit more about the situation.

T-SQL Tuesday #106

This is a response to T-SQL Tuesday #106, Trigger Headaches or Happiness, by Steve Jones.

I can only recall one time in the past several years (at least a decade) that I’ve found triggers to be useful. It involves data migration.

The problem: You have a massive, high-activity table. Let’s call this Table_A. You need to make significant changes to it. For example, the clustered index needs to change. How do you accomplish this?

The solution: Create a script that does the following:

  1. Create a second table, which we’ll call Table_B. This one will eventually become the new Table_A, so design it with your changes in mind.
  2. Create a third table, which we’ll call Table_C. This one is like Table_A as it is now, except that it includes an additional identity column, and is clustered on that column. Assuming there’s an existing clustered index on Table_A, recreate that as a non-clustered index on Table_C. Depending on how Table_A is updated, you may need additional columns to track what updates occur.
  3. Create a trigger on Table_A. This trigger duplicates all changes in Table_A to Table_C (a sketch of such a trigger follows this list).
  4. Looping via a suitable batch size for your environment, write all rows from Table_A to Table_B.
  5. Looping again, write all rows from Table_C to Table_B (taking into account the appropriate insert/update/delete logic for your situation). Note where you stopped with Table_C, the “high water mark” for that identity column.
  6. Call sp_rename to change Table_A to Table_D, then again to change Table_B to Table_A.
  7. From the high water mark, write from Table_C to the newly-renamed Table_A.
  8. My favorite part: Drop Table_C, Table_D, and the trigger.
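
Here’s the sketch I promised for Step 3. It assumes Table_A has just two columns, Id and SomeValue, and that Table_C adds an identity column plus a ChangeType flag; adjust for your real schema.

CREATE TRIGGER dbo.TR_Table_A_Migration
ON dbo.Table_A
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Inserts, plus the "new" version of updated rows.
    INSERT INTO dbo.Table_C (Id, SomeValue, ChangeType)
    SELECT i.Id, i.SomeValue, 'IU'
    FROM inserted AS i;

    -- Deletes only (rows in deleted that have no match in inserted).
    INSERT INTO dbo.Table_C (Id, SomeValue, ChangeType)
    SELECT d.Id, d.SomeValue, 'D'
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.Id = d.Id);
END;
GO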

There are caveats to this method, of course, but they have been acceptable for my situations. They are:

  • The table is unavailable (non-existent, really) between the two renames in Step 6. This is an extremely brief window, but it does need to occur. Also, in order to apply a trigger, the table needs to be briefly locked, which may present a problem.
  • Step 7 is present so that changes between Steps 5 and 6 are carried over to the new table. However, these can occur after the new table is active following Step 6, meaning that the following scenario is possible:
    • A row is updated in Table_A, and the change is carried over to Table_C.
    • The renames occur.
    • The same row is updated in the new Table_A.
    • The second change is overwritten with the first change.

If the table you wish to migrate has a considerable number of updates and deletes, then this solution may present a data integrity problem. However, for insert-heavy tables, it has proven to work very well. Also, a second trigger on the new table and some additional logic could circumvent the second issue I described, depending on your situation.

So there you go – a use for triggers.

Godspeed!

tsqllint

My PASS local group, IndyPASS, has its monthly meeting tonight. At my insistence, first-time presenter Nathan Boyd is showing off a SQL tool called tsqllint. Nathan, a coworker of mine at Salesforce, is the leading developer behind this GitHub project.

A lint (or linter), if you didn’t know, “analyzes source code to flag programming errors, bugs, stylistic errors, and suspicious constructs” (Wikipedia). This one is designed specifically for T-SQL, is highly configurable, and includes a Visual Studio Code extension. What more could you want, right? If you want cleaner T-SQL code out of your developers, with less hassle on the part of your reviewers, it’s definitely worth your time.

If you’re in the area, keep in mind there’s a location change tonight. While IndyPASS usually meets at Virtusa, 1401 North Meridian (formerly Apparatus), this month’s meeting is at Moser Consulting in Castleton. As usual, doors open at 5:30pm, and we’ll turn it over to Nathan by about 6:15pm.

T-SQL Tuesday, Microsoft Connect Edition

This is a bit of a deviation from my usual post. You can thank Brent Ozar for that. He decided to host Adam Machanic’s T-SQL Tuesday this month. Since Brent Ozar Unlimited’s newsletter is about the only one I read on a daily basis (and that even includes any racing-related ones I get!), I noticed it and thought I’d participate.

Brent’s topic is Microsoft Connect items. The one I chose is… well, let’s call it a pi-in-the-sky feature request. Ok, no more puns, I promise!

Ah, yes, SQL Server on Raspberry Pi… Once the Linux news came out, I think it’s safe to say it was only a matter of time before we saw a Connect suggestion like this. What I’d like to do is go through a little thought exercise on why it would indeed be a good idea, why Microsoft didn’t bake it in already, why it doesn’t work, and what needs to be done to get it there.

By the way, the Connect item also references R. I’m going to pretend for now that I didn’t see that. Maybe somebody like Ginger Grant will take a crack at it.

Once again, this isn’t my usual blog post. Typically, if I pose a question, I’ll give the answer. In this case, though, I don’t have all the answers. My hope here is simply to get a conversation going. So…

Is It A Good Idea?

This is basically a feature request. There are a lot of factors that go into the decision to add a feature to a product. One that is vital, but sometimes overlooked, is whether or not it fits within the overall theme of the product. It could be argued that small devices, in general, are not really a market for SQL Server. But I think Microsoft settled that argument when they announced Linux support. This is less about compact devices and more about running on one of the most popular Linux platforms.

Another factor is adoption. No one wants to add a feature that doesn’t get used. What would SQL Server on Pi even be used for? Truthfully, I really don’t know. But that’s the beauty of Raspberry Pi. The foundation’s initial goal was for use in education and developing countries, but it has found uses far beyond the original target markets. There are a lot of hobbyists out there, experimenting with Raspberry Pi. I’ll bet quite a few of them interact with SQL Server in their day job. So, rather than ask why Microsoft should add this feature, should we ask why not? I think this is a classic case of putting it out there and just waiting to see what happens.

Why Doesn’t Microsoft Support It Already?

I know only one answer to this question, and it drives the rest of this post. But first, let’s be clear – Microsoft put SQL Server on Linux to compete with Oracle. I don’t think anyone’s questioning that rationale. So, something like Raspberry Pi, while it may be a “nice to have,” is not on the v1 list of requirements.

But Raspberry Pi is just a platform for Linux, and SQL Server runs on Linux, so what’s the problem? Well, there may be others, but there is one absolute show-stopper. SQL Server’s minimum memory requirement on Linux is 3.25 GB. The Raspberry Pi 3 Model B comes with 1 GB. Which leads us to…

Why Doesn’t It Work?

Here we have an impasse. You can’t add memory to Pi. Well, you could, but it’s going to involve a soldering iron, hardware schematics, probably some assembly language skills… effectively, though, you can’t. You get what you get, and right now you get 1 GB at most.

As for SQL Server, it’s not an empty requirement. It won’t install on less than 3.25 GB. And once you install it, if you then lower the amount of memory, it won’t start. Ironically, I used an Oracle product, VirtualBox, to test this.

The reasons for this requirement are a bit cloudy to me. If you research memory use in SQL Server, you find two types of results. First, you have the question of just how much memory SQL Server can use effectively, on the high end. That’s not exactly our problem.

Second, though, you have the issue of how SQL Server uses memory. It has its own memory manager. When you configure it, you give it a constraint – how much memory to use – and it does its own thing inside this constraint. Incidentally, it’s surprising how many Google search results are along the lines of “Why is SQL Server using all of my memory?” Well, because you allowed it to!

So, approaching SQL Server’s memory usage from the angle of “How can I get it to use as little as possible?” is a bit outside the norm. On that barebones Ubuntu VM of mine, I can see an idle SQL Server consuming only about 700 MB. Yes, that’s while idle, and SQL Server will need room to do its job, but that’s a far cry from 3.25 GB. Incidentally, it’s also consuming about 2.7 GB of virtual memory.

You can see, to a degree, what SQL Server is doing with that memory. The most helpful DMV is probably sys.dm_os_memory_clerks. As the name implies, it shows how much memory is under the control of all of the active mid-level memory allocators. If you add up the pages_kb values, though, you don’t get the full memory footprint. Of course, the executing code needs space as well, and here’s where things get blurry. SQL Server stores its data in 8 KB pages, but it also uses those same pages to hold some system objects. So the question of what is used by the engine, by features, by cached data… it begs the usual PASS community response… it depends.
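
For reference, here’s roughly how I arrive at those numbers. These queries are only a sketch, and on the Linux builds of that era some of the process-level counters may behave a little differently.

-- Overall footprint of the process.
SELECT physical_memory_in_use_kb / 1024 AS physical_memory_mb,
       virtual_address_space_committed_kb / 1024 AS virtual_memory_mb
FROM sys.dm_os_process_memory;

-- The mid-level allocators, largest consumers first.
SELECT [type], [name], SUM(pages_kb) AS pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY [type], [name]
ORDER BY SUM(pages_kb) DESC;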

As a side note, what’s the least helpful DMV? That would be sys.dm_os_sys_memory. It doesn’t even work on Linux yet. That’s another Connect item.

What Needs To Be Done?

This is the question I really can’t answer. Could Microsoft give us the option to disable some features that don’t make sense on a compact system? How about full-text indexing? Would it really be necessary in this case? Ironically, even though you could disable it at one time, you no longer can. This leads me to believe there is the possibility that Microsoft has somehow made use of it elsewhere in the engine, which would kill it as a candidate. I wonder if that’s the case.

Another good candidate might be CLR. Once again, is this really necessary in a small system? And due to the inherent security vulnerabilities, this is a pretty isolated feature.

But wait! Before we go too far down that path, Microsoft has already figured it out! Check out the requirements for SQL Server 2016. Express Edition requires only 512 MB!

I’m at a loss at this point. Does Microsoft have any plans to trim the Linux memory requirement? What did they do to allow Express to run so light? Can that strategy be applied? And, should they remove this barrier, what else is holding this up?

The most important question to me, though, is this: If you had a Raspberry Pi with SQL Server on it, what would you do with it? Comments are welcome!

SQL Saturdays

Sharpening my technical skills has been an ever-present goal since I started assembling computers for a guy named Dale around 1990. Over the years, some methods have worked better than others. There’s definitely nothing that can compare to hands-on experience, but that’s often a Catch-22. You have to have the knowledge to do the work to get the knowledge. So it falls to other means. At one time, I used Microsoft certification exams to put my learning on a schedule. Sign up for the exam, and I was suddenly under the gun to study for it. I’ve also read quite a few books over the years (see my last post for a few of my favorites). Conferences have helped – a couple of TechEds back in the day, and now I’m a regular at the PASS Summit.

But one of the most effective has been SQL Saturdays. I present at them semi-regularly, and I always make a point to be an attendee there as well, not just hide away in the presenter’s room until my turn comes around. If you’re in a line of work that would benefit from better knowledge about SQL Server and related technologies, and you haven’t already been to one, seek them out. I think it’s well worth the time and definitely worth the minimal expense.

Thanks to Brent Ozar, from whom I get a lot of tidbits like this, I have one more thing to share: A reflection on ten years of SQL Saturdays by one of the founders.

Godspeed.

Fonts and Frustration

TL;DR – There are a couple of XML files at the end of this post. If you regularly present technical material using SSMS, download these.

I present technical sessions now and then – my local PASS group, SQL Saturdays, internal groups at my workplace, etc. I frequently find myself adjusting the fonts inside SQL Server Management Studio to make sure my material is readable on the big screen. I’ve also been in the audience plenty of times, watching with sympathy as one of my cohorts agonizingly navigates this problem.

Usually, it goes something like this. They first find the [100%] tucked away in the lower left corner of the text window, and blow that up to 150 or 200 percent. Then they run their query to find that the results are still at 100%. So then they eventually find the Options dialog under the Tools menu, find the Fonts and Colors branch of the tree, and then groan when they realize they have to figure out which three or four of the 30 different fonts they need to change. Sometimes, they’ll give up there and just go use ZoomIt (which any good technical presenter should have available anyway), but constantly bouncing around with ZoomIt will get old quickly over the course of an hour-long session.

But if they do manage to find the right fonts to change and take a good stab at what they ought to be, they get this wonderful message:

Font Frustration

Just the thing you want to see when you already have all your demo scripts loaded, right?

Oh, and don’t forget that – when the session is over – you now have to go through the same exercise to get SSMS back where you had it before the session.

So quite a while ago, I generated a couple of .reg files for myself, one called PresentationFonts.reg and one called NormalFonts.reg. You can imagine what these did when I applied them to the Windows Registry.

That worked great… until recently. The SQL Server Tools team has done some marvelous things with SSMS lately, and I’m very happy with the changes. But take a close look at one of those things they did:

Version Information

And where does this new shell keep its settings? Here’s a hint – it’s not in the registry. It’s actually in this file:


{LocalApplicationDataPath}\Microsoft\SQL Server Management Studio\13.0\ApplicationPrivateSettings

And this file is some bizarre hybrid of XML, JSON (with JSON inside of JSON, no less!), and I don’t know what else.

Fortunately, there is an option available. Under the Tools menu, there is “Import and Export Settings…”, which gives you a wizard for importing some or all settings from an XML file. So, with that in mind, here are my files that I use.

  • PresentationFonts.vssettings – This changes the font size to 16 for Text Editor, Execution Plan, Grid Results and Text Results.
  • NormalFonts.vssettings – This changes the font size to 9 for Text Editor, Execution Plan, Grid Results and Text Results.

NOTE: When you save these, save them with the .vssettings extension. Since I’m a cheapskate and use wordpress.com to host this blog, I’m prevented from using whatever extension I want. So they’ll show up as .doc files in your download dialog, but they really are just text XML. And the Import/Export wizard looks specifically for .vssettings files.

Obviously, you may not use the same settings I do, and you’ll have to customize them for your own uses. If you change the same four that I do, then all you have to do is fiddle with the sizes in the files. If you wish to change different fonts, you’ll want to export your settings with that wizard, change the font you want, export them again, and compare the files in order to figure out which GUID is which.

In any case, I strongly recommend having a pair of files like these parked right next to your demo scripts, where you’ll remember to run them as you prepare for your session.

I know this is a rather long post for a rather small tip, but I’m amazed at just how many of us fight this problem. If I had a dollar for every time I’ve seen a presenter struggle with font sizes, my Azure subscription would be paid for.

NULL Pain Revisited

In an earlier post, I covered an issue with special characters in Transact-SQL. I ended up creating a scalar function to handle it. Well, fast-forward a few months and now my query’s inability to go parallel is really starting to cause trouble. And, yes, scalar functions are absolutely poisonous to parallelism.

Anyway, I revisited the idea of stripping the null characters inline, rather than calling a function, and somewhere out there on the intarwebs I found this gem:


SELECT @var = REPLACE(@var COLLATE Latin1_General_BIN, NCHAR(0x00) COLLATE Latin1_General_BIN, NCHAR(9));

In my case, I’m replacing all instances with a tab character.
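
If you want to see it work, here’s a quick, self-contained demo. The variable and its contents are made up; the point is that with the binary collation, REPLACE actually finds the embedded null character.

DECLARE @var NVARCHAR(100) = N'before' + NCHAR(0x00) + N'after';

-- The embedded null counts toward the length, even though many tools stop
-- displaying the string at that point.
SELECT @var AS dirty, LEN(@var) AS len_before;

SELECT @var = REPLACE(@var COLLATE Latin1_General_BIN, NCHAR(0x00) COLLATE Latin1_General_BIN, NCHAR(9));

-- Same length, but the null is now a tab.
SELECT @var AS cleaned, LEN(@var) AS len_after;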

But I can’t find or remember where I got it! So, my sincere apologies to the real source, whoever you are. But even if I can’t claim ownership or identify the source, I still want to make sure this one’s out there for everyone else’s benefit.

Godspeed!

Compile Time Part 2

TL;DR – Square brackets do not save the optimizer any time.

If you didn’t already read Part 1, you might want to go back and do that.

I will easily admit that I can be rather obsessive about code formatting sometimes. Frankly, inconsistency annoys me. When it comes to Transact-SQL, this can translate into all sorts of little quirks in my code. One of the latest ones is overuse of [square brackets].

They have their place, of course. There’s a reason the QUOTENAME function exists. And there will always be some annoying fool who likes to create columns with names like “name” and “type” or – worse – put actual SPACES in them!

But I’ll admit it – I’ve been getting carried away. I blame Postgres, which won’t let me use Pascal casing without throwing quotes around every identifier. But I digress…

Anyway, this obsession had me thinking – does wrapping identifiers in square brackets save SQL Server any time? Does it say to the optimizer, “Hey, I PROMISE this whole thing inside these square brackets is an identifier. Cross my heart.” And the optimizer takes your code at its word and doesn’t look through its list of reserved keywords for one that matches AccountCreateDate or address_line_2?

The answer is… no. Throwing every identifier into square brackets doesn’t speed it up at all. Here’s the test:


IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j1'))
DROP PROCEDURE dbo.j1;
GO
CREATE PROCEDURE dbo.j1
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = rowguid FROM Person.Address WHERE AddressID = 1 OPTION (RECOMPILE);');
GO

IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j2'))
DROP PROCEDURE dbo.j2;
GO

CREATE PROCEDURE dbo.j2
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = [rowguid] FROM [Person].[Address] WHERE [AddressID] = 1 OPTION (RECOMPILE);');
GO

DBCC FREEPROCCACHE;
GO

DECLARE @i INT = 1;
WHILE (@i <= 100000)
BEGIN
EXEC dbo.j1
EXEC dbo.j2
SELECT @i = @i + 1;
END;
SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

Note that this was written to use AdventureWorks2014. Also note that I’m running ad hoc SQL inside of a pair of stored procedures so that I can measure a very quick and small bit of SQL over a lot of executions, forcing it to recompile every time yet keeping the plan cache from blowing up. See Part 1 if you don’t believe that my ad hoc SQL’s compile time is factored into total_worker_time.

Anyway, this code takes about a minute and a half to run on my system. The results? A dead heat. Sometimes, j1 wins, and sometimes, j2 wins. There’s no significant difference.

Compile Time Part 1

TL;DR – The sys.dm_exec_{query|procedure}_stats.*_worker_time fields do not include compile time. But the values in sys.dm_exec_procedure_stats.*_worker_time DO include compile time for ad hoc SQL within the procedures. Also, as a side note, a lot of indexed views can be a significant performance problem.

I recommend a blog post by Jonathan Kehayias entitled, “Identifying High Compile Time Statements from the Plan Cache”. In that post, he’s concentrating on queries that have a high individual compile cost.

However, the object of my search was small queries being compiled and executed in high volume (more on why in Part 2 of this). In my case, I was looking for ad hoc SQL being generated and executed inside of stored procedures – the kind of queries that don’t make their way to the plan cache, often due to the use of OPTION (RECOMPILE) to keep them from spamming the cache.

What I found was that worker time needed to compile these queries is indistinguishable from that needed to execute them. To show this, let’s look at an example in AdventureWorks2014. In this example, I’m going to create and execute two similar procedures. I’m also going to create a number of indexed views.

Why indexed views? I want to increase compile time significantly for this exercise, and a large number of indexed views can do that. From MSDN: “The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution.” My thanks to oas_public on stackoverflow.com for that tip.

Anyway, let’s create a stored procedure that will use our indexed view:


CREATE PROCEDURE dbo.j1
AS
BEGIN
DECLARE @stmt NVARCHAR(4000);
SELECT @stmt = N'/* dbo.j1 */
SELECT TOP 1
SUM(UnitPrice * OrderQty * (1.00 - UnitPriceDiscount)) AS Rev,
OrderDate,
ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
AND ProductID BETWEEN 700 and 800
AND OrderDate >= CONVERT(datetime,''05/01/2002'',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
EXEC(@stmt);
END; --PROCEDURE
GO

By the way, this evolved from sample code on MSDN. Now here’s a second stored procedure that is ever-so-slightly different:


CREATE PROCEDURE dbo.j2
AS
BEGIN
DECLARE @stmt NVARCHAR(4000);
SELECT @stmt = N'/* dbo.j2 */
SELECT TOP 1
SUM(UnitPrice * OrderQty * (2.00 - UnitPriceDiscount)) AS Rev,
OrderDate,
ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
AND ProductID BETWEEN 700 and 800
AND OrderDate >= CONVERT(datetime,''05/01/2002'',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
EXEC(@stmt);
END; --PROCEDURE
GO

To make this a fair test, let’s clear the cache:


DBCC FREEPROCCACHE;
GO

And execute both of the procedures:


EXEC dbo.j1;
EXEC dbo.j2;

How long did they take?


SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);

On my local system, both run in about 85-105 ms. Obviously, YMMV.

Now let’s create an indexed view. I’m going to reuse this code a couple of times, so bear that in mind when you see how it’s constructed:


--Set the options to support indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

DECLARE @i INT = 1, @stmt NVARCHAR(4000), @t1 NVARCHAR(4000), @t2 NVARCHAR(4000), @t3 NVARCHAR(4000)

-- Create template for dropping the view.
SELECT @t1 = N'/* jtest */
IF OBJECT_ID(''Sales.vOrders_{1}'') IS NOT NULL
DROP VIEW Sales.vOrders_{1};';

-- Create template for creating the view.
SELECT @t2 = N'/* jtest */
CREATE VIEW Sales.vOrders_{1} WITH SCHEMABINDING
AS
SELECT
SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Revenue,
OrderDate,
ProductID,
COUNT_BIG(*) AS COUNT
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
GROUP BY OrderDate, ProductID;';

-- Create template for creating the index.
SELECT @t3 = N'/* jtest */
CREATE UNIQUE CLUSTERED INDEX IDX_V1_{1}
ON Sales.vOrders_{1} (OrderDate, ProductID);';

-- Drop and create the objects.
WHILE (@i <= 1)
BEGIN
SELECT @stmt = REPLACE(@t1, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @stmt = REPLACE(@t2, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @stmt = REPLACE(@t3, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @i = @i + 1;
END; --WHILE
GO

Note that the indexed view supports dbo.j1, but not dbo.j2. Let’s go back and run those again:


DBCC FREEPROCCACHE;
GO

EXEC dbo.j1;
EXEC dbo.j2;

SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

If your results are like mine, you’ll see that dbo.j1 has dropped to about a quarter of its previous worker time, but dbo.j2 hasn’t budged. So, our indexed view is doing what we want. Now let’s create a few more. How does 500 sound? In that WHILE loop for the indexed view script, change the constant from 1 to 500 and run it again. Bear in mind, this took about 90 seconds on my system, so be prepared to wait.

Once all 500 indexed views are created, execute the two stored procedures again and look at the results. Both worker time values will be much higher!

But the actual execution time hasn’t changed. We’re still going to get essentially the same plan for each query, which means SQL Server should be doing the same work to execute it. But now the compilation is much more costly because the optimizer is looking at all 500 views when generating the plan. And because that compilation is taking place inside of ad hoc SQL in a stored procedure, SQL Server is reporting it as part of the procedure’s worker time in sys.dm_exec_procedure_stats.

To verify this, you can run Jonathan’s code, which will break out the compile time for those ad hoc queries.

If you’ve been following along in your own database and want to gracefully clean up the mess, you’ll want to do two things. First, execute this:


DROP PROCEDURE dbo.j1;
DROP PROCEDURE dbo.j2;
GO

And second, go back to the indexed view script, comment or remove the last two EXEC lines, and run it again.

In conclusion, the *_worker_time fields in sys.dm_exec_query_stats and sys.dm_exec_procedure_stats do NOT include compile time. That was why Jonathan had to dig into the cached plan and tease that information out of the XML. But the compile time of ad hoc statements inside of a stored procedure WILL be included in the procedure’s execution time.

And too many indexed views can suck.

Stay tuned for Part 2, which explains why I chased this down in the first place.