Threading

A while back, I wrote an app that spawned a collection of threads to run some work in parallel, using the resources in the System.Threading namespace of the .NET Framework. Some time after that, I worked on another app that also had a threading component. This second app was reviewed by another developer from outside my immediate circle. He asked, “Why didn’t you use the System.Threading.Tasks namespace?” Uhh… because I didn’t know it existed?

That namespace was introduced in .NET Framework 4 – not exactly recent history – but I had somehow missed it for quite a long time. There are a few causes for that, but the one I’d like to focus on here is a trap that I think catches many developers at one time or another: We think we have it all figured out. While we are, to some degree, practical mathematicians – professionals who assemble algorithms to meet requirements – we are also creators. Our code is our art. And oftentimes, we don’t have the humility necessary to accept the possibility that our art isn’t beautiful. So we shy away from having the right people check our work.

This reminds me of an old saying: If you’re the smartest person in the room, then you’re in the wrong room.*

Now, this is not a commentary on my current team. I work with some really smart people, and I’m very grateful for that. But while my teammate may be one of the best PHP or Node.js coders I know, that doesn’t necessarily translate to expertise with the .NET Framework. The true test is this – no matter how smart they are, if they’re not catching my mistakes, then I’m not being held accountable.

Lesson 1: Make sure someone’s catching your mistakes. If they’re not, then do you really think the reason is that you’re not making any?

So, back to the two apps… After the other developer’s feedback, I reworked the second one prior to release, and it passed its code reviews. The first app, meanwhile, developed some bad behavior in production. There was definitely a race condition of some sort, but I couldn’t seem to nail down where it was. I made a couple of adjustments to the code, but nothing seemed to bite. Of course, I couldn’t reproduce it in testing either.

Finally, I ripped out the threading code entirely and replaced it with nearly identical code based on System.Threading.Tasks. I was concerned about the risk of introducing more bugs, about the fact that I was still unable to reproduce the problem, and about how long it had been a problem, so I tried to remain as faithful to the original design as possible. And, yeah, honestly, I crossed my fingers.

Once this new version was released, the problem was gone.

Lesson 2: System.Threading.Tasks really is better than System.Threading.

I’ll never know exactly what fixed the problem. I could keep researching it, but at this point the cost isn’t worth the benefit. My takeaway was that the new stuff simply works better. Whether that’s because it’s easier to use the right way (and harder to use the wrong way), or its internals are less buggy, or some combination thereof, the end result is the same. I hope that’s old news to anyone reading this, but I wanted to share my experience just in case.

* I was unable to identify with certainty the source of this phrase. The leading candidate I found was 1962 Nobel Laureate James Watson.

T-SQL Tuesday, Microsoft Connect Edition

This is a bit of a deviation from my usual post. You can thank Brent Ozar for that. He decided to host Adam Machanic‘s T-SQL Tuesday this month. Since Brent Ozar Unlimited‘s newsletter is about the only one I read on a daily basis (and that even includes any racing-related ones I get!), I noticed it and thought I’d participate.

Brent’s topic is Microsoft Connect items. The one I chose is… well, let’s call it a pi-in-the-sky feature request. Ok, no more puns, I promise!

Ah, yes, SQL Server on Raspberry Pi… Once the Linux news came out, I think it’s safe to say it was only a matter of time before we saw a Connect suggestion like this. What I’d like to do is go through a little thought exercise on why it would indeed be a good idea, why Microsoft didn’t bake it in already, why it doesn’t work, and what needs to be done to get it there.

By the way, the Connect item also references R. I’m going to pretend for now that I didn’t see that. Maybe somebody like Ginger Grant will take a crack at it.

Once again, this isn’t my usual blog post. Typically, if I pose a question, I’ll give the answer. In this case, though, I don’t have all the answers. My hope here is simply to get a conversation going. So…

Is It A Good Idea?

This is basically a feature request. There are a lot of factors that go into the decision to add a feature to a product. One that is vital, but sometimes overlooked, is whether or not it fits within the overall theme of the product. It could be argued that small devices, in general, are not really a market for SQL Server. But I think Microsoft settled that argument when they announced Linux support. This is less about compact devices and more about running on one of the most popular Linux platforms.

Another factor is adoption. No one wants to add a feature that doesn’t get used. What would SQL Server on Pi even be used for? Truthfully, I really don’t know. But that’s the beauty of Raspberry Pi. The foundation’s initial goal was for use in education and developing countries, but it has found uses far beyond the original target markets. There are a lot of hobbyists out there, experimenting with Raspberry Pi. I’ll bet quite a few of them interact with SQL Server in their day job. So, rather than ask why Microsoft should add this feature, should we ask why not? I think this is a classic case of putting it out there and just waiting to see what happens.

Why Doesn’t Microsoft Support It Already?

I know only one answer to this question, and it drives the rest of this post. But first, let’s be clear – Microsoft put SQL Server on Linux to compete with Oracle. I don’t think anyone’s questioning that rationale. So, something like Raspberry Pi, while it may be a “nice to have,” is not on the v1 list of requirements.

But Raspberry Pi is just a platform for Linux, and SQL Server runs on Linux, so what’s the problem? Well, there may be others, but there is one absolute show-stopper. SQL Server’s minimum memory requirement on Linux is 3.25gb. The Raspberry Pi 3 Model B comes with 1gb. Which leads us to…

Why Doesn’t It Work?

Here we have an impasse. You can’t add memory to a Pi. Well, you could, but it’s going to involve a soldering iron, hardware schematics, and probably some assembly language skills… effectively, though, you can’t. You get what you get, and right now you get 1gb at most.

As for SQL Server, it’s not an empty requirement. It won’t install on less than 3.25gb. And once you install it, if you then lower the amount of memory, it won’t start. Ironically, I used an Oracle product, VirtualBox, to test this.

The reasons for this requirement are a bit cloudy to me. If you research memory use in SQL Server, you find two types of results. First, you have the question of just how much memory SQL Server can use effectively, on the high end. That’s not exactly our problem.

Second, though, you have the issue of how SQL Server uses memory. It has its own memory manager. When you configure it, you give it a constraint – how much memory to use – and it does its own thing inside that constraint. Incidentally, it’s surprising how many Google search results are along the lines of “Why is SQL Server using all of my memory?” Well, because you allowed it to!
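That constraint, for reference, is the server-level “max server memory” setting. A quick sketch of setting it in T-SQL (the 2048 is an arbitrary example, not a recommendation):

EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 2048;
RECONFIGURE;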

So, approaching SQL Server’s memory usage from the angle of “How can I get it to use as little as possible?” is a bit outside the norm. On that barebones Ubuntu VM of mine, I can see an idle SQL Server only consuming about 700mb. Yes, that’s while idle, and SQL Server will need room to do its job, but that’s a far cry from 3.25gb. Incidentally, it’s also consuming about 2.7gb of virtual memory.
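If you’d like to check numbers like those from inside the engine rather than from the OS, sys.dm_os_process_memory is one way to do it:

SELECT physical_memory_in_use_kb / 1024 AS physical_mb,
       virtual_address_space_committed_kb / 1024 AS virtual_mb
FROM sys.dm_os_process_memory;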

You can see, to a degree, what SQL Server is doing with that memory. The most helpful DMV is probably sys.dm_os_memory_clerks. As the name implies, it shows how much memory is under the control of each of the active mid-level memory allocators. If you add up the pages_kb values, though, you don’t get the full memory footprint. Of course, the executing code needs space as well, and here’s where things get blurry. SQL Server stores its data in 8k pages, but it also uses those same pages to hold some system objects. So the question of what is used by the engine, by features, by cached data… it draws the usual PASS community response… it depends.
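For example, a rollup like this (my own query, nothing official) shows where those pages are going:

SELECT TOP (10)
    [type],
    SUM(pages_kb) AS pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY SUM(pages_kb) DESC;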

As a side note, what’s the least helpful DMV? That would be sys.dm_os_sys_memory. It doesn’t even work on Linux yet. That’s another Connect item.

What Needs To Be Done?

This is the question I really can’t answer. Could Microsoft give us the option to disable some features that don’t make sense on a compact system? How about full-text indexing? Would it really be necessary in this case? Ironically, even though you could disable it at one time, you no longer can. That makes me suspect Microsoft now uses it somewhere else in the engine, which would kill it as a candidate.

Another good candidate might be CLR. Once again, is it really necessary in a small system? And because of the inherent security concerns, it’s already a pretty isolated feature.

But wait! Before we go too far down that path, Microsoft has already figured it out! Check out the requirements for SQL Server 2016. Express Edition requires only 512mb!

I’m at a loss at this point. Does Microsoft have any plans to trim the Linux memory requirement? What did they do to allow Express to run so light? Can that strategy be applied? And, should they remove this barrier, what else is holding this up?

The most important question to me, though, is this: If you had a Raspberry Pi with SQL Server on it, what would you do with it? Comments are welcome!

SQL Saturdays

Sharpening my technical skills has been an ever-present goal since I started assembling computers for a guy named Dale around 1990. Over the years, some approaches have worked better than others. There’s definitely nothing that can compare to hands-on experience, but that’s often a Catch-22: you have to have the knowledge to do the work to get the knowledge. So it falls to other means. At one time, I used Microsoft certification exams to put my learning on a schedule. Sign up for the exam, and I was suddenly under the gun to study for it. I’ve also read quite a few books over the years (see my last post for a few of my favorites). Conferences have helped – a couple of TechEds back in the day, and now I’m a regular at the PASS Summit.

But one of the most effective has been SQL Saturdays. I present at them semi-regularly, and I always make a point to be an attendee there as well, not just hide away in the presenter’s room until my turn comes around. If you’re in a line of work that would benefit from better knowledge about SQL Server and related technologies, and you haven’t already been to one, seek them out. I think it’s well worth the time and definitely worth the minimal expense.

Thanks to Brent Ozar’s newsletter, where I pick up a lot of tidbits like this, I have one more thing to share: a reflection on ten years of SQL Saturdays by one of the founders.

Godspeed.

A Christmas Wishlist

I finally had the chance to catch up on some reading last week, and it got me thinking about sharing this list. If you are in software development, then I would consider these four books to be required reading. I’m going to revisit this post in the future, because I’m sure there are more to add. But I wanted to start somewhere, and I’m confident about these four.

Fonts and Frustration

TL;DR – There are a couple of XML files at the end of this post. If you regularly present technical material using SSMS, download these.

I present technical sessions now and then – my local PASS group, SQL Saturdays, internal groups at my workplace, etc. I frequently find myself adjusting the fonts inside SQL Server Management Studio to make sure my material is readable on the big screen. I’ve also been in the audience plenty of times, watching with sympathy as one of my cohorts agonizingly navigates this problem.

Usually, it goes something like this. They first find the [100%] tucked away in the lower left corner of the text window, and blow that up to 150 or 200 percent. Then they run their query to find that the results are still at 100%. So then they eventually find the Options dialog under the Tools menu, find the Fonts and Colors branch of the tree, and then groan when they realize they have to figure out which three or four of the 30 different fonts they need to change. Sometimes, they’ll give up there and just go use ZoomIt (which any good technical presenter should have available anyway), but constantly bouncing around with ZoomIt will get old quickly over the course of an hour-long session.

But if they do manage to find the right fonts to change and take a good stab at what they ought to be, they get this wonderful message:

Font Frustration

Just the thing you want to see when you already have all your demo scripts loaded, right?

Oh, and don’t forget that – when the session is over – you now have to go through the same exercise to get SSMS back where you had it before the session.

So quite a while ago, I generated a couple of .reg files for myself, one called PresentationFonts.reg and one called NormalFonts.reg. You can imagine what these did when I applied them to the Windows Registry.

That worked great… until recently. The SQL Server Tools team has done some marvelous things with SSMS lately, and I’m very happy with the changes. But take a close look at one of those things they did:

Version Information

And where does this new shell keep its settings? Here’s a hint – it’s not in the registry. It’s actually in this file:


{LocalApplicationDataPath}\Microsoft\SQL Server Management Studio\13.0\ApplicationPrivateSettings

And this file is some bizarre hybrid of XML, JSON (with JSON inside of JSON, no less!), and I don’t know what else.

Fortunately, there is an option available. Under the Tools menu, there is “Import and Export Settings…”, which gives you a wizard for importing some or all settings from an XML file. So, with that in mind, here are my files that I use.

  • PresentationFonts.vssettings – This changes the font size to 16 for Text Editor, Execution Plan, Grid Results and Text Results.
  • NormalFonts.vssettings – This changes the font size to 9 for Text Editor, Execution Plan, Grid Results and Text Results.

NOTE: When you save these, save them with the .vssettings extension. Since I’m a cheapskate and use wordpress.com to host this blog, I’m prevented from using whatever extension I want. So they’ll show up as .doc files in your download dialog, but they really are just text XML. And the Import/Export wizard looks specifically for .vssettings files.

Obviously, you may not use the same settings I do, and you’ll have to customize them for your own uses. If you change the same four that I do, then all you have to do is fiddle with the sizes in the files. If you wish to change different fonts, you’ll want to export your settings with that wizard, change the font you want, export them again, and compare the files in order to figure out which GUID is which.

In any case, I strongly recommend having a pair of files like these parked right next to your demo scripts, where you’ll remember to run them as you prepare for your session.

I know this is a rather long post for a rather small tip, but I’m amazed at just how many of us fight this problem. If I had a dollar for every time I’ve seen a presenter struggle with font sizes, my Azure subscription would be paid for.

NULL Pain Revisited

In an earlier post, I covered an issue with special characters in Transact-SQL. I ended up creating a scalar function to handle it. Well, fast-forward a few months and now my query’s inability to go parallel is really starting to cause trouble. And, yes, scalar functions are absolutely poisonous to parallelism.
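If you want to see that poison at work, here’s a quick demonstration of my own against AdventureWorks (the function name is made up):

-- A trivial scalar UDF; merely referencing it in a query forces a serial plan
-- (at least prior to SQL Server 2019's UDF inlining).
CREATE FUNCTION dbo.ufn_PassThrough (@x INT)
RETURNS INT
AS
BEGIN
    RETURN @x;
END;
GO

-- Compare the two plans: the second should be serial, with a
-- NonParallelPlanReason attribute in its plan XML.
SELECT COUNT_BIG(*) FROM Sales.SalesOrderDetail;
SELECT COUNT_BIG(dbo.ufn_PassThrough(SalesOrderID)) FROM Sales.SalesOrderDetail;
GO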

Anyway, I revisited the idea of stripping the null characters inline, rather than calling a function, and somewhere out there on the intarwebs I found this gem:


SELECT @var = REPLACE(@var COLLATE Latin1_General_BIN, NCHAR(0x00) COLLATE Latin1_General_BIN, NCHAR(9));

In my case, I’m replacing all instances with a tab character.

But I can’t find or remember where I got it! So, my sincere apologies to the real source, whoever you are. But even if I can’t claim ownership or identify the source, I still want to make sure this one’s out there for everyone else’s benefit.
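Here’s a self-contained sketch of it in action (the variable and sample string are mine):

DECLARE @var NVARCHAR(50) = N'abc' + NCHAR(0x00) + N'def';

SELECT @var = REPLACE(@var COLLATE Latin1_General_BIN,
                      NCHAR(0x00) COLLATE Latin1_General_BIN,
                      NCHAR(9));

SELECT @var AS cleaned;  -- 'abc', a tab, then 'def' – the NUL is gone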

Godspeed!

Compile Time Part 2

TL;DR – Square brackets do not save the optimizer any time.

If you didn’t already read Part 1, you might want to go back and do that.

I will easily admit that I can be rather obsessive about code formatting sometimes. Frankly, inconsistency annoys me. When it comes to Transact-SQL, this can translate into all sorts of little quirks in my code. One of the latest ones is overuse of [square brackets].

They have their place, of course. There’s a reason the QUOTENAME function exists. And there will always be some annoying fool that likes to create columns with names like “name” and “type” or – worse – put actual SPACES in them!
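For example – note how it also doubles any closing bracket hiding inside the name:

SELECT QUOTENAME(N'type');         -- [type]
SELECT QUOTENAME(N'has a space');  -- [has a space]
SELECT QUOTENAME(N'end]bracket');  -- [end]]bracket]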

But I’ll admit it – I’ve been getting carried away. I blame Postgres, which won’t let me use Pascal casing without throwing quotes around every identifier. But I digress…

Anyway, this obsession had me thinking – does wrapping identifiers in square brackets save SQL Server any time? Does it say to the optimizer, “Hey, I PROMISE this whole thing inside these square brackets is an identifier. Cross my heart.” And the optimizer takes your code at its word and doesn’t look through its list of reserved keywords for one that matches AccountCreateDate or address_line_2?

The answer is… no. Throwing every identifier into square brackets doesn’t speed it up at all. Here’s the test:


IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j1'))
    DROP PROCEDURE dbo.j1;
GO

CREATE PROCEDURE dbo.j1
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = rowguid FROM Person.Address WHERE AddressID = 1 OPTION (RECOMPILE);');
GO

IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j2'))
    DROP PROCEDURE dbo.j2;
GO

CREATE PROCEDURE dbo.j2
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = [rowguid] FROM [Person].[Address] WHERE [AddressID] = 1 OPTION (RECOMPILE);');
GO

DBCC FREEPROCCACHE;
GO

DECLARE @i INT = 1;
WHILE (@i <= 100000)
BEGIN
    EXEC dbo.j1;
    EXEC dbo.j2;
    SELECT @i = @i + 1;
END;

SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

Note that this was written to use AdventureWorks2014. Also note that I’m running ad hoc SQL inside of a pair of stored procedures so that I can measure a very quick and small bit of SQL over a lot of executions, forcing it to recompile every time yet keeping the plan cache from blowing up. See Part 1 if you don’t believe that my ad hoc SQL’s compile time is factored into total_worker_time.

Anyway, this code takes about a minute and a half to run on my system. The results? A dead heat. Sometimes, j1 wins, and sometimes, j2 wins. There’s no significant difference.

Compile Time Part 1

TL;DR – The sys.dm_exec_{query|procedure}_stats.*_worker_time fields do not include compile time. But the values in sys.dm_exec_procedure_stats.*_worker_time DO include compile time for ad hoc SQL within the procedures. Also, as a side note, a lot of indexed views can be a significant performance problem.

I recommend a blog post by Jonathan Kehayias entitled “Identifying High Compile Time Statements from the Plan Cache”. In it, he concentrates on queries that have a high individual compile cost.

However, the object of my search was small queries being compiled and executed in high volume (more on why in Part 2 of this). In my case, I was looking for ad hoc SQL being generated and executed inside of stored procedures – the kind of queries that don’t make their way to the plan cache, often due to the use of OPTION (RECOMPILE) to keep them from spamming the cache.

What I found was that worker time needed to compile these queries is indistinguishable from that needed to execute them. To show this, let’s look at an example in AdventureWorks2014. In this example, I’m going to create and execute two similar procedures. I’m also going to create a number of indexed views.

Why indexed views? I want to increase compile time significantly for this exercise, and a large number of indexed views can do that. From MSDN: “The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution.” My thanks to oas_public on stackoverflow.com for that tip.

Anyway, let’s create a stored procedure that will use our indexed view:


CREATE PROCEDURE dbo.j1
AS
BEGIN
    DECLARE @stmt NVARCHAR(4000);
    SELECT @stmt = N'/* dbo.j1 */
SELECT TOP 1
    SUM(UnitPrice * OrderQty * (1.00 - UnitPriceDiscount)) AS Rev,
    OrderDate,
    ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
    ON od.SalesOrderID = o.SalesOrderID
    AND ProductID BETWEEN 700 AND 800
    AND OrderDate >= CONVERT(datetime, ''05/01/2002'', 101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
    EXEC(@stmt);
END; --PROCEDURE
GO

By the way, this evolved from sample code on MSDN. Now here’s a second stored procedure that is ever-so-slightly different:


CREATE PROCEDURE dbo.j2
AS
BEGIN
    DECLARE @stmt NVARCHAR(4000);
    SELECT @stmt = N'/* dbo.j2 */
SELECT TOP 1
    SUM(UnitPrice * OrderQty * (2.00 - UnitPriceDiscount)) AS Rev,
    OrderDate,
    ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
    ON od.SalesOrderID = o.SalesOrderID
    AND ProductID BETWEEN 700 AND 800
    AND OrderDate >= CONVERT(datetime, ''05/01/2002'', 101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
    EXEC(@stmt);
END; --PROCEDURE
GO

To make this a fair test, let’s clear the cache:


DBCC FREEPROCCACHE;
GO

And execute both of the procedures:


EXEC dbo.j1;
EXEC dbo.j2;

How long did they take?


SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);

On my local system, both run in about 85-105 ms. Obviously, YMMV.

Now let’s create an indexed view. I’m going to reuse this code a couple of times, so bear that in mind when you see how it’s constructed:


--Set the options to support indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

DECLARE @i INT = 1, @stmt NVARCHAR(4000), @t1 NVARCHAR(4000), @t2 NVARCHAR(4000), @t3 NVARCHAR(4000);

-- Create template for dropping the view.
SELECT @t1 = N'/* jtest */
IF OBJECT_ID(''Sales.vOrders_{1}'') IS NOT NULL
    DROP VIEW Sales.vOrders_{1};';

-- Create template for creating the view.
SELECT @t2 = N'/* jtest */
CREATE VIEW Sales.vOrders_{1} WITH SCHEMABINDING
AS
SELECT
    SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Revenue,
    OrderDate,
    ProductID,
    COUNT_BIG(*) AS COUNT
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
    ON od.SalesOrderID = o.SalesOrderID
GROUP BY OrderDate, ProductID;';

-- Create template for creating the index.
SELECT @t3 = N'/* jtest */
CREATE UNIQUE CLUSTERED INDEX IDX_V1_{1}
    ON Sales.vOrders_{1} (OrderDate, ProductID);';

-- Drop and create the objects.
WHILE (@i <= 1)
BEGIN
    SELECT @stmt = REPLACE(@t1, N'{1}', CAST(@i AS NVARCHAR(4000)));
    EXEC (@stmt);
    SELECT @stmt = REPLACE(@t2, N'{1}', CAST(@i AS NVARCHAR(4000)));
    EXEC (@stmt);
    SELECT @stmt = REPLACE(@t3, N'{1}', CAST(@i AS NVARCHAR(4000)));
    EXEC (@stmt);
    SELECT @i = @i + 1;
END; --WHILE
GO

Note that the indexed view supports dbo.j1, but not dbo.j2. Let’s go back and run those again:


DBCC FREEPROCCACHE;
GO

EXEC dbo.j1;
EXEC dbo.j2;

SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

If your results are like mine, you’ll see that dbo.j1 has dropped to about a quarter of its previous worker time, but dbo.j2 hasn’t budged. So, our indexed view is doing what we want. Now let’s create a few more. How does 500 sound? In that WHILE loop for the indexed view script, change the constant from 1 to 500 and run it again. Bear in mind, this took about 90 seconds on my system, so be prepared to wait.

Once all 500 indexed views are created, execute the two stored procedures again and look at the results. Both worker time values will be much higher!

But the actual execution time hasn’t changed. We’re still going to get essentially the same plan for each query, which means SQL Server should be doing the same work to execute it. But now the compilation is much more costly because the optimizer is looking at all 500 views when generating the plan. And because that compilation is taking place inside of ad hoc SQL in a stored procedure, SQL Server is reporting it as part of the procedure’s worker time in sys.dm_exec_procedure_stats.

To verify this, you can run Jonathan’s code, which will break out the compile time for those ad hoc queries.

If you’ve been following along in your own database and want to gracefully clean up the mess, you’ll want to do two things. First, execute this:


DROP PROCEDURE dbo.j1;
DROP PROCEDURE dbo.j2;
GO

And second, go back to the indexed view script, comment or remove the last two EXEC lines, and run it again.

In conclusion, the *_worker_time fields in sys.dm_exec_query_stats and sys.dm_exec_procedure_stats do NOT include compile time. That was why Jonathan had to dig into the cached plan and tease that information out of the XML. But the compile time of ad hoc statements inside of a stored procedure WILL be included in the procedure’s execution time.

And too many indexed views can suck.

Stay tuned for Part 2, which explains why I chased this down in the first place.

IndyPASS is going Back to School

Yes, I’ve been quiet on this blog forever. That’ll change soon. I’ve been working on a couple of topics that I think may be of use to people other than me.

But in the meantime, I’m presenting “Back to School: Query and Execution Plan Fundamentals” tomorrow night at the monthly IndyPASS meeting. Think of it like a SQL 101 for developers. Come join us for some free pizza and a little bit of learning.

NULL Pain

I had an issue today that resulted in a tweet to #sqlhelp:

#sqlhelp See Remarks section: http://ow.ly/Za6HX So how do you strip out 0x000? Suggestions?

As it turns out, when you have a character string in SQL Server that contains the NUL character (0x0000), it really doesn’t know what to do with it most of the time, especially when you’re dealing with Unicode strings.

I did track down http://sqlsolace.blogspot.com/2014/07/function-dbostripunwantedcharacters.html, but I generally try to avoid calling UDFs in my queries.

I ended up settling on this, which is based on the fact that REPLACE() works when the data is ASCII:

CASE
    WHEN LEN(REPLACE(CAST([foo] AS VARCHAR(4000)), CHAR(0), '')) = LEN([foo])
        THEN [foo]  -- no NULs found, so skip the UDF call entirely
    ELSE [dbo].[strip_char0]([foo], NCHAR(9))  -- NULs present; strip them via the UDF
END

My strip_char0 function was mostly like the one from sqlsolace, so I won’t repeat it here. But I did add in a replacement character option (in this example, a tab). And I’m only calling the UDF when I need to, which makes the performance geek in me happy.
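For the curious, a function along those lines might look something like this – a hypothetical reconstruction, not my actual code:

-- Hypothetical sketch of a NUL-stripping scalar UDF (not the original).
CREATE FUNCTION dbo.strip_char0 (@input NVARCHAR(4000), @replacement NCHAR(1))
RETURNS NVARCHAR(4000)
AS
BEGIN
    DECLARE @i INT = 1,
            @c NCHAR(1),
            @out NVARCHAR(4000) = N'';

    -- DATALENGTH / 2 counts characters without LEN's trailing-space trimming.
    WHILE @i <= DATALENGTH(@input) / 2
    BEGIN
        SET @c = SUBSTRING(@input, @i, 1);
        SET @out = @out + CASE WHEN UNICODE(@c) = 0 THEN @replacement ELSE @c END;
        SET @i = @i + 1;
    END;

    RETURN @out;
END;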

My thanks to @SQLSoldier for being responsive on the #sqlhelp hashtag.