EF Core and nvarchar(max)

My opinion was once, “Entity Framework is slow.” As I dive more deeply into the nuances of Entity Framework Core, that opinion is changing, but only slightly. It’s becoming, “Entity Framework is slow out of the box.”

There are ways to make it better. I’ll describe one of those here, but first, let’s establish the environment:

– Entity Framework Core 7.0.5

– The “code-first” approach

– SQL Server database (any in-support version)

So what does EF Core do wrong?

In this situation, when you define a string property on a model, the underlying data type will be nvarchar(max). For those who don’t know, this is a variable-length Unicode string with a maximum storage size of 2^31-1 bytes (2 GB). This is a combination of three choices that EF Core is making:

1. nvarchar(max): The desired encoding (ASCII versus Unicode) is unknown. Therefore, EF Core defaults to the “safer” option, Unicode.

2. nvarchar(max): The preference between a fixed-length string and a variable-length string is unknown. Because fixed-length Unicode strings have a size limit of 4,000 characters, while variable-length strings can be larger (see below), EF Core defaults to a variable-length string.

3. nvarchar(max): The desired length is also unknown. Therefore, EF Core defaults to the maximum possible length, “max”.

What are the problems with those decisions?

1. ASCII strings consume half the memory of Unicode strings in SQL Server. If all you need is an ASCII string, then that’s the option to choose.

2. If you have extremely consistent data lengths, then fixed-length columns can be more efficient.

An example of both of these is a phone number. The necessary characters are all available in the ASCII character set. And phone numbers (even accounting for country codes and the nuances within each of those countries) are roughly consistent in length.

3. Even though the effective length of the data may be very small, SQL Server must assume that the column can hold a large amount of data. Therefore, it makes decisions about resources based on this assumption.

Using the phone number example again, even though every row in the table holds data in this field with a length somewhere between 10 and 20 characters, SQL Server must assume that there is the possibility that it will store 2^31-1 bytes (2 GB).

For an example of how this can affect performance, see Eddie Wuerch’s response in this thread on sqlservercentral.com.

4. Data types like nvarchar(max) are known as large value data types or large object (LOB) data types. Data in these columns is handled differently from other types. For example, that data may be stored on a separate type of page (SQL Server’s designation for an 8 KB block of storage) from the rest of the row. This is called a LOB page. And these columns can’t be used as index key columns like other types, because normal index keys have a size limit (see the sketch below).
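
To see that last limitation in action, here’s a minimal sketch in T-SQL. The table and column names are hypothetical; the point is that SQL Server refuses to use a MAX column as an index key, but accepts it once the column is a regular varchar.

IF OBJECT_ID('dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
GO

CREATE TABLE dbo.Customer
(
    CustomerId  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    PhoneNumber NVARCHAR(MAX) NULL  -- what EF Core generates by default
);
GO

-- Fails: a MAX column is not valid as a key column in an index.
CREATE INDEX IX_Customer_PhoneNumber ON dbo.Customer (PhoneNumber);
GO

-- Succeeds once the column is no longer a LOB type.
ALTER TABLE dbo.Customer ALTER COLUMN PhoneNumber VARCHAR(50) NULL;
GO
CREATE INDEX IX_Customer_PhoneNumber ON dbo.Customer (PhoneNumber);
GO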

Why does this matter?

Obviously, these are performance issues, and we should all keep performance in mind as we’re developing our solutions. But the bigger issue is that this behavior can go unnoticed in the early stages of developing an application. It’s only when the application is in production, with a sizable footprint, that we begin to see the harm that this causes. And as the application grows, these issues become harder and harder to fix.

So what do we do about it?

We solve this with two different annotations in our models. In EF Core 7, here is an example of each of these:

    [System.ComponentModel.DataAnnotations.Schema.Column(TypeName = "varchar")]
    [System.ComponentModel.DataAnnotations.MaxLength(50)]
    public string? MyStringField { get; set; }

As you can see, this indicates to EF Core that the string is still meant to hold variable-length data, but that it should use the ASCII-based varchar type and be limited to 50 characters.
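
For reference, here’s roughly what that change looks like on the SQL Server side. This is only a sketch; the table and key column are hypothetical, and your migration output may differ slightly.

-- Default mapping for a string property, with no annotations:
--     MyStringField NVARCHAR(MAX) NULL

-- With the Column and MaxLength annotations above, the generated table
-- ends up closer to this:
CREATE TABLE dbo.MyEntity
(
    Id            INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    MyStringField VARCHAR(50) NULL
);

If you prefer the fluent API over attributes, IsUnicode(false) combined with HasMaxLength(50) should produce roughly the same column.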

Summary

Once again, EF Core is not necessarily slow, but it is slow by default. And going along with the default behavior when it is the most inefficient option is a recipe for disaster. This is why many people, myself included, are inclined not to use it. But it is a popular ORM, so knowing how to make it better is vital.

The Case for Stored Procedures

In some parts of the C# community, I’ve noticed that there seems to be a desire to avoid writing SQL code. Some advocates for Entity Framework have even cited the ability to avoid writing SQL as one of its benefits. I admit that I am no EF expert. If I had to choose an ORM, my list stops at Dapper, and I normally roll my own data access layer.

But I’m not here to try to dissuade anyone from using Entity Framework. What I’d like to do, though, is lay out some of the benefits of embracing SQL in your solutions, specifically Transact-SQL and the use of stored procedures in SQL Server.

First, let me share my typical design for a new solution based on a relational database. I not only use a database-first approach, but I also build the database using scripts that are part of a repository. Everything after the CREATE DATABASE statement itself is in the repo. Each script is essentially idempotent. In other words, the script can be executed repeatedly without adverse effect. For tables, this means checking for existence before creating them. For changes to existing tables, additional logic is used to (for example) check for the existence of a new column before adding it. For other objects, this is mostly just using the “CREATE OR ALTER” syntax. I also take steps to maintain backward compatibility, but that’s for another post.
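
Here’s a small sketch of that pattern. The table, column, and view names are made up, but the shape is what matters: every piece checks before it acts, so the whole script can be run over and over.

-- Tables: check for existence before creating.
IF OBJECT_ID('dbo.Widget') IS NULL
BEGIN
    CREATE TABLE dbo.Widget
    (
        WidgetId   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        WidgetName VARCHAR(50) NOT NULL
    );
END;
GO

-- Changes to existing tables: check for the new column before adding it.
IF COL_LENGTH('dbo.Widget', 'CreatedUtc') IS NULL
BEGIN
    ALTER TABLE dbo.Widget
        ADD CreatedUtc DATETIME2(3) NOT NULL
            CONSTRAINT DF_Widget_CreatedUtc DEFAULT (SYSUTCDATETIME());
END;
GO

-- Other objects: CREATE OR ALTER handles repeated runs.
CREATE OR ALTER VIEW dbo.vWidget
AS
SELECT WidgetId, WidgetName, CreatedUtc
FROM dbo.Widget;
GO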

By the way, kudos to the attendee at my PASS Summit 2022 session who called me out for using the word idempotent. I’ve been throwing that word out for years, knowing that it’s not quite what I mean, but close enough. So, no, the scripts aren’t really idempotent, but I haven’t found a better word yet for, “I can F5 them to death.”

Anyway, I also include PowerShell scripts in the repo that execute the T-SQL scripts on a database, and incorporate these into a CI/CD pipeline. I won’t explain the whole pipeline, but one crucial part is that a merge of an approved PR into the dev branch will trigger a deployment to a reference development environment. The scripts are all executed against the database(s) in this environment, and then application code is deployed to the servers.

The point I’m trying to make is that the database code is treated as part of the solution. It is in the repo just like the C# code. It is deployed just like the C# code. I even include unit testing, just like the C# code.

I also keep all T-SQL in the database. There is no T-SQL code inside the C# code. Stored procedures are used for all functionality, even basic CRUD functions. In fact, probably 80–90% of the stored procedures in my systems are essentially CRUD functions. The only operation that doesn’t go through stored procedures is bulk insertion, and I use SqlBulkCopy for that.
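
For what it’s worth, here’s a sketch of what one of those basic CRUD procedures tends to look like, continuing the hypothetical Widget table from the script above.

CREATE OR ALTER PROCEDURE dbo.WidgetInsert
    @WidgetName VARCHAR(50),
    @WidgetId   INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Widget (WidgetName)
    VALUES (@WidgetName);

    -- Hand the new key back to the application.
    SET @WidgetId = SCOPE_IDENTITY();
END;
GO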

Here is an example of my approach, one that I put together for that Summit session: https://dev.azure.com/downshiftdata/_git/SearchOverflow

Why Do It This Way?

  1. Changes to the database are tracked in the repo just like changes to application code. So knowing who changed what, when they changed it, who reviewed it, when it got deployed, etc., is no more difficult to answer for database changes than for application code changes.
  2. The stored procedures act as an interface layer, keeping database code in the database and application code in the application. Radical changes to the database are possible, without any effect on application code and without incurring any downtime. Performance issues in the database can be addressed quickly, through hotfixes, without redeploying the application. And the database design can take advantage of this interface layer. For example, instead of relying on AFTER triggers, audit logging can be done through OUTPUT clauses inside the stored procedures (see the sketch after this list). This is because you can reasonably guarantee that all the writes are coming in through the procedures (and even support this assumption with appropriate authorization configuration).
  3. Stored procedures are compiled, with their plans in the procedure plan cache. Yes, this happens with ad hoc queries as well (and I’m an advocate for the “Optimize for Ad Hoc Workloads” option in SQL Server). But there are issues like parameter sniffing and cache bloat that need to be handled. Routing execution through stored procedures makes it easier to manage the cache and understand what is happening in the database.
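
Here’s the sketch promised in item 2: audit logging through an OUTPUT clause rather than an AFTER trigger. The audit table and procedure are hypothetical, continuing the Widget example from earlier.

IF OBJECT_ID('dbo.WidgetAudit') IS NULL
BEGIN
    CREATE TABLE dbo.WidgetAudit
    (
        WidgetAuditId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        WidgetId      INT NOT NULL,
        OldWidgetName VARCHAR(50) NOT NULL,
        NewWidgetName VARCHAR(50) NOT NULL,
        ModifiedUtc   DATETIME2(3) NOT NULL
            CONSTRAINT DF_WidgetAudit_ModifiedUtc DEFAULT (SYSUTCDATETIME())
    );
END;
GO

CREATE OR ALTER PROCEDURE dbo.WidgetUpdateName
    @WidgetId   INT,
    @WidgetName VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- The OUTPUT clause writes the before/after values to the audit table
    -- as part of the same statement, with no trigger involved.
    UPDATE dbo.Widget
    SET WidgetName = @WidgetName
    OUTPUT inserted.WidgetId, deleted.WidgetName, inserted.WidgetName
        INTO dbo.WidgetAudit (WidgetId, OldWidgetName, NewWidgetName)
    WHERE WidgetId = @WidgetId;
END;
GO

Because the application only ever calls the procedure, every name change lands in dbo.WidgetAudit, and there is no trigger to forget about or accidentally disable.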

The Other Side

I’ve seen quite a few arguments against approaches like this. Here are some of the more popular ones:

  1. If I use Entity Framework and take a code-first approach, then it does all the SQL for me and I don’t have to worry about it.

Once again, I’m not an EF expert, and don’t know the full scope of what it can and can’t do. But I’ve heard variations of this one quite a lot. And the argument falls apart like any other code-generation argument. It’ll work fine 90% of the time, and then you’ll be in trouble when you hit the other 10%. Now, in some situations, this may work out. You’ll always be inside that 90%. But to me, it’s a calculated risk, and one that I don’t have to take because it’s not the approach I use.

Where this one concerns me is in situations in which a DBA (or someone in a similar role) was essentially ignored. The DBA advocated for something less wizard-y, and had good reasons for it. But they’re often outnumbered and – frankly – not well respected, and decisions are made despite their arguments. And then the situation hits that 10% and the DBA’s list of options is limited. They often know the change they should make, but they can’t make it because that code is being generated in the application layer. Or the change they want to make will break application layer code.

  2. That’s the old way of doing things.

This makes me cringe every time. Yeah, we all wrote stored procedures twenty years ago, but that’s not how you do things today. This goes hand-in-hand with the idea that RDBMSs themselves are antiques and shouldn’t be used. I’m a firm believer in the KISS principle. And despite being ancient in the tech world, the RDBMS is a straightforward solution to a lot of problems. Would a big data store be better for a certain situation? Of course. Will your app ever reach a threshold that demands the advantages of that big data store? If not, then why add to the complexity of your solution? If your tech stack includes an RDBMS, then you have one product that can reasonably support a host of requirements.

  3. The database is the database and the application code is the application code and the two shouldn’t be mixed.

In my experience, those who make this case have had to suffer through thousand-line stored procedures that tried to do everything in the database. That is indeed a nightmare situation, and one that I’ve seen all too often.

But their response is to swing fully in the other direction. Don’t let any code reside in the database. But the truth is that there are scenarios in which it makes sense to go either direction. If you need to join several large tables, have good covering indexes, need a timely response, are faced with frequent cache invalidation, and only need a few rows in each call, it makes a mountain of sense to have that functionality in a stored procedure. Yes, that could be dynamically generated in the application layer. But there is definitely a non-zero chance of getting a bad execution plan in that scenario. And as I said before, your options then are much more limited.

Conclusion

Stored procedures form an interface layer between the database and the application code. This layer of abstraction is incredibly beneficial to application security, maintainability, and scalability. Not using it is a design decision that should be considered very carefully.

When I am in conversations with others in the database community, or with coworkers who share some of my experience, I wouldn’t dream of having to make these arguments. But now that I’m back in C# land more than I used to be, I’m shocked at how often my opinions are at odds with others in this community. So, I don’t know if I’ve put any of those arguments to rest – probably not – but maybe that anti-stored-procedure crowd is at least thinking a bit more about the situation.

T-SQL Tuesday #106

This is a response to T-SQL Tuesday #106, Trigger Headaches or Happiness, by Steve Jones.

I can only recall one time in the past several years (at least a decade) that I’ve found triggers to be useful. It involves data migration.

The problem: You have a massive, high-activity table. Let’s call this Table_A. You need to make significant changes to it. For example, the clustered index needs to change. How do you accomplish this?

The solution: Create a script that does the following:

  1. Create a second table, which we’ll call Table_B. This one will eventually become the new Table_A, so design it with your changes in mind.
  2. Create a third table, which we’ll call Table_C. This one is like Table_A as it is now, except that it includes an additional identity column, and is clustered on that column. Assuming there’s an existing clustered index on Table_A, recreate that as a non-clustered index on Table_C. Depending on how Table_A is updated, you may need additional columns to track what updates occur.
  3. Create a trigger on Table_A. This trigger duplicates all changes in Table_A to Table_C (a sketch of such a trigger follows this list).
  4. Looping via a suitable batch size for your environment, write all rows from Table_A to Table_B.
  5. Looping again, write all rows from Table_C to Table_B (taking into account the appropriate insert/update/delete logic for your situation). Note where you stopped with Table_C, the “high water mark” for that identity column.
  6. Call sp_rename to change Table_A to Table_D, then again to change Table_B to Table_A.
  7. From the high water mark, write from Table_C to the newly-renamed Table_A.
  8. My favorite part: Drop Table_C, Table_D, and the trigger.
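
Here’s the sketch I promised for Step 3. It assumes Table_A has just two columns, Id and SomeValue, and that Table_C adds an identity column plus a ChangeType flag; adjust for your real schema.

CREATE TRIGGER dbo.TR_Table_A_Migration
ON dbo.Table_A
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Inserts, plus the "new" version of updated rows.
    INSERT INTO dbo.Table_C (Id, SomeValue, ChangeType)
    SELECT i.Id, i.SomeValue, 'IU'
    FROM inserted AS i;

    -- Deletes only (rows in deleted that have no match in inserted).
    INSERT INTO dbo.Table_C (Id, SomeValue, ChangeType)
    SELECT d.Id, d.SomeValue, 'D'
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.Id = d.Id);
END;
GO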

There are caveats to this method, of course, but they have been acceptable for my situations. They are:

  • The table is unavailable (non-existent, really) between the two renames in Step 6. This is an extremely brief window, but it does need to occur. Also, in order to apply a trigger, the table needs to be briefly locked, which may present a problem.
  • Step 7 is present so that changes between Steps 5 and 6 are carried over to the new table. However, these can occur after the new table is active following Step 6, meaning that the following scenario is possible:
    • A row is updated in Table_A, and the change is carried over to Table_C.
    • The renames occur.
    • The same row is updated in the new Table_A.
    • The second change is overwritten with the first change.

If the table you wish to migrate has a considerable number of updates and deletes, then this solution may present a data integrity problem. However, for insert-heavy tables, it has proven to work very well. Also, a second trigger on the new table and some additional logic could circumvent the second issue I described, depending on your situation.

So there you go – a use for triggers.

Godspeed!

tsqllint

My PASS local group, IndyPASS, has its monthly meeting tonight. At my insistence, first-time presenter Nathan Boyd is showing off a SQL tool called tsqllint. Nathan, a coworker of mine at Salesforce, is the leading developer behind this GitHub project.

A lint (or linter), if you didn’t know, “analyzes source code to flag programming errors, bugs, stylistic errors, and suspicious constructs” (Wikipedia). This one is designed specifically for T-SQL, is highly configurable, and includes a Visual Studio Code extension. What more could you want, right? If you want cleaner T-SQL code out of your developers, with less hassle on the part of your reviewers, it’s definitely worth your time.

If you’re in the area, keep in mind there’s a location change tonight. While IndyPASS usually meets at Virtusa, 1401 North Meridian (formerly Apparatus), this month’s meeting is at Moser Consulting in Castleton. As usual, doors open at 5:30pm, and we’ll turn it over to Nathan by about 6:15pm.

T-SQL Tuesday, Microsoft Connect Edition

This is a bit of a deviation from my usual post. You can thank Brent Ozar for that. He decided to host Adam Machanic’s T-SQL Tuesday this month. Since Brent Ozar Unlimited’s newsletter is about the only one I read on a daily basis (and that even includes any racing-related ones I get!), I noticed it and thought I’d participate.

Brent’s topic is Microsoft Connect items. The one I chose is… well, let’s call it a pi-in-the-sky feature request. Ok, no more puns, I promise!

Ah, yes, SQL Server on Raspberry Pi… Once the Linux news came out, I think it’s safe to say it was only a matter of time before we saw a Connect suggestion like this. What I’d like to do is go through a little thought exercise on why it would indeed be a good idea, why Microsoft didn’t bake it in already, why it doesn’t work, and what needs to be done to get it there.

By the way, the Connect item also references R. I’m going to pretend for now that I didn’t see that. Maybe somebody like Ginger Grant will take a crack at it.

Once again, this isn’t my usual blog post. Typically, if I pose a question, I’ll give the answer. In this case, though, I don’t have all the answers. My hope here is simply to get a conversation going. So…

Is It A Good Idea?

This is basically a feature request. There are a lot of factors that go into the decision to add a feature to a product. One that is vital, but sometimes overlooked, is whether or not it fits within the overall theme of the product. It could be argued that small devices, in general, are not really a market for SQL Server. But I think Microsoft settled that argument when they announced Linux support. This is less about compact devices and more about running on one of the most popular Linux platforms.

Another factor is adoption. No one wants to add a feature that doesn’t get used. What would SQL Server on Pi even be used for? Truthfully, I really don’t know. But that’s the beauty of Raspberry Pi. The foundation’s initial goal was for use in education and developing countries, but it has found uses far beyond the original target markets. There are a lot of hobbyists out there, experimenting with Raspberry Pi. I’ll bet quite a few of them interact with SQL Server in their day job. So, rather than ask why Microsoft should add this feature, should we ask why not? I think this is a classic case of putting it out there and just waiting to see what happens.

Why Doesn’t Microsoft Support It Already?

I know only one answer to this question, and it drives the rest of this post. But first, let’s be clear – Microsoft put SQL Server on Linux to compete with Oracle. I don’t think anyone’s questioning that rationale. So, something like Raspberry Pi, while it may be a “nice to have,” is not on the v1 list of requirements.

But Raspberry Pi is just a platform for Linux, and SQL Server runs on Linux, so what’s the problem? Well, there may be others, but there is one absolute show-stopper. SQL Server’s minimum memory requirement on Linux is 3.25 GB. The Raspberry Pi 3 Model B comes with 1 GB. Which leads us to…

Why Doesn’t It Work?

Here we have an impasse. You can’t add memory to Pi. Well, you could, but it’s going to involve a soldering iron, hardware schematics, probably some assembly language skills… effectively, though, you can’t. You get what you get, and right now you get 1 GB at most.

As for SQL Server, it’s not an empty requirement. It won’t install on less than 3.25 GB. And once you install it, if you then lower the amount of memory, it won’t start. Ironically, I used an Oracle product, VirtualBox, to test this.

The reasons for this requirement are a bit cloudy to me. If you research memory use in SQL Server, you find two types of results. First, you have the question of just how much memory SQL Server can use effectively, on the high end. That’s not exactly our problem.

Second, though, you have the issue of how SQL Server uses memory. It has its own memory manager. When you configure it, you give it a constraint – how much memory to use – and it does its own thing inside this constraint. Incidentally, it’s surprising how many Google search results are along the lines of “Why is SQL Server using all of my memory?” Well, because you allowed it to!

So, approaching SQL Server’s memory usage from the angle of “How can I get it to use as little as possible?” is a bit outside the norm. On that barebones Ubuntu VM of mine, I can see an idle SQL Server consuming only about 700 MB. Yes, that’s while idle, and SQL Server will need room to do its job, but that’s a far cry from 3.25 GB. Incidentally, it’s also consuming about 2.7 GB of virtual memory.

You can see, to a degree, what SQL Server is doing with that memory. The most helpful DMV is probably sys.dm_os_memory_clerks. As the name implies, it shows how much memory is under the control of all of the active mid-level memory allocators. If you add up the pages_kb values, though, you don’t get the full memory footprint. Of course, the executing code needs space as well, and here’s where things get blurry. SQL Server stores its data in 8 KB pages, but it also uses those same pages to hold some system objects. So the question of what is used by the engine, by features, by cached data… it begs the usual PASS community response… it depends.
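
For reference, here’s roughly how I arrive at those numbers. These queries are only a sketch, and on the Linux builds of that era some of the process-level counters may behave a little differently.

-- Overall footprint of the process.
SELECT physical_memory_in_use_kb / 1024 AS physical_memory_mb,
       virtual_address_space_committed_kb / 1024 AS virtual_memory_mb
FROM sys.dm_os_process_memory;

-- The mid-level allocators, largest consumers first.
SELECT [type], [name], SUM(pages_kb) AS pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY [type], [name]
ORDER BY SUM(pages_kb) DESC;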

As a side note, what’s the least helpful DMV? That would be sys.dm_os_sys_memory. It doesn’t even work on Linux yet. That’s another Connect item.

What Needs To Be Done?

This is the question I really can’t answer. Could Microsoft give us the option to disable some features that don’t make sense on a compact system? How about full-text indexing? Would it really be necessary in this case? Ironically, even though you could disable it at one time, you no longer can. This leads me to believe there is the possibility that Microsoft has somehow made use of it elsewhere in the engine, which would kill it as a candidate. I wonder if that’s the case.

Another good candidate might be CLR. Once again, is this really necessary in a small system? And due to the inherent security vulnerabilities, this is a pretty isolated feature.

But wait! Before we go too far down that path, Microsoft has already figured it out! Check out the requirements for SQL Server 2016. Express Edition requires only 512 MB!

I’m at a loss at this point. Does Microsoft have any plans to trim the Linux memory requirement? What did they do to allow Express to run so light? Can that strategy be applied? And, should they remove this barrier, what else is holding this up?

The most important question to me, though, is this: If you had a Raspberry Pi with SQL Server on it, what would you do with it? Comments are welcome!

SQL Saturdays

Sharpening my technical skills has been an ever-present goal since I started assembling computers for a guy named Dale around 1990. Over the years, some methods have worked better than others. There’s definitely nothing that can compare to hands-on experience, but that’s often a Catch-22. You have to have the knowledge to do the work to get the knowledge. So it falls to other means. At one time, I used Microsoft certification exams to put my learning on a schedule. Sign up for the exam, and I was suddenly under the gun to study for it. I’ve also read quite a few books over the years (see my last post for a few of my favorites). Conferences have helped – a couple of TechEds back in the day, and now I’m a regular at the PASS Summit.

But one of the most effective has been SQL Saturdays. I present at them semi-regularly, and I always make a point to be an attendee there as well, not just hide away in the presenter’s room until my turn comes around. If you’re in a line of work that would benefit from better knowledge about SQL Server and related technologies, and you haven’t already been to one, seek them out. I think it’s well worth the time and definitely worth the minimal expense.

Thanks to Brent Ozar, from whom I get a lot of tidbits like this, I have one more thing to share: A reflection on ten years of SQL Saturdays by one of the founders.

Godspeed.

Fonts and Frustration

TL;DR – There are a couple of XML files at the end of this post. If you regularly present technical material using SSMS, download these.

I present technical sessions now and then – my local PASS group, SQL Saturdays, internal groups at my workplace, etc. I frequently find myself adjusting the fonts inside SQL Server Management Studio to make sure my material is readable on the big screen. I’ve also been in the audience plenty of times, watching with sympathy as one of my cohorts agonizingly navigates this problem.

Usually, it goes something like this. They first find the [100%] tucked away in the lower left corner of the text window, and blow that up to 150 or 200 percent. Then they run their query to find that the results are still at 100%. So then they eventually find the Options dialog under the Tools menu, find the Fonts and Colors branch of the tree, and then groan when they realize they have to figure out which three or four of the 30 different fonts they need to change. Sometimes, they’ll give up there and just go use ZoomIt (which any good technical presenter should have available anyway), but constantly bouncing around with ZoomIt will get old quickly over the course of an hour-long session.

But if they do manage to find the right fonts to change and take a good stab at what they ought to be, they get this wonderful message:

Font Frustration

Just the thing you want to see when you already have all your demo scripts loaded, right?

Oh, and don’t forget that – when the session is over – you now have to go through the same exercise to get SSMS back where you had it before the session.

So quite a while ago, I generated a couple of .reg files for myself, one called PresentationFonts.reg and one called NormalFonts.reg. You can imagine what these did when I applied them to the Windows Registry.

That worked great… until recently. The SQL Server Tools team has done some marvelous things with SSMS lately, and I’m very happy with the changes. But take a close look at one of those things they did:

Version Information

And where does this new shell keep its settings? Here’s a hint – it’s not in the registry. It’s actually in this file:


{LocalApplicationDataPath}\Microsoft\SQL Server Management Studio\13.0\ApplicationPrivateSettings

And this file is some bizarre hybrid of XML, JSON (with JSON inside of JSON, no less!), and I don’t know what else.

Fortunately, there is an option available. Under the Tools menu, there is “Import and Export Settings…”, which gives you a wizard for importing some or all settings from an XML file. So, with that in mind, here are my files that I use.

  • PresentationFonts.vssettings – This changes the font size to 16 for Text Editor, Execution Plan, Grid Results and Text Results.
  • NormalFonts.vssettings – This changes the font size to 9 for Text Editor, Execution Plan, Grid Results and Text Results.

NOTE: When you save these, save them with the .vssettings extension. Since I’m a cheapskate and use wordpress.com to host this blog, I’m prevented from using whatever extension I want. So they’ll show up as .doc files in your download dialog, but they really are just text XML. And the Import/Export wizard looks specifically for .vssettings files.

Obviously, you may not use the same settings I do, and you’ll have to customize them for your own uses. If you change the same four that I do, then all you have to do is fiddle with the sizes in the files. If you wish to change different fonts, you’ll want to export your settings with that wizard, change the font you want, export them again, and compare the files in order to figure out which GUID is which.

In any case, I strongly recommend having a pair of files like these parked right next to your demo scripts, where you’ll remember to run them as you prepare for your session.

I know this is a rather long post for a rather small tip, but I’m amazed at just how many of us fight this problem. If I had a dollar for every time I’ve seen a presenter struggle with font sizes, my Azure subscription would be paid for.

NULL Pain Revisited

In an earlier post, I covered an issue with special characters in Transact-SQL. I ended up creating a scalar function to handle it. Well, fast-forward a few months and now my query’s inability to go parallel is really starting to cause trouble. And, yes, scalar functions are absolutely poisonous to parallelism.

Anyway, I revisited the idea of stripping the null characters inline, rather than calling a function, and somewhere out there on the intarwebs I found this gem:


SELECT @var = REPLACE(@var COLLATE Latin1_General_BIN, NCHAR(0x00) COLLATE Latin1_General_BIN, NCHAR(9));

In my case, I’m replacing all instances with a tab character.
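
If you want to see it work, here’s a quick, self-contained demo. The variable and its contents are made up; the point is that with the binary collation, REPLACE actually finds the embedded null character.

DECLARE @var NVARCHAR(100) = N'before' + NCHAR(0x00) + N'after';

-- The embedded null counts toward the length, even though many tools stop
-- displaying the string at that point.
SELECT @var AS dirty, LEN(@var) AS len_before;

SELECT @var = REPLACE(@var COLLATE Latin1_General_BIN, NCHAR(0x00) COLLATE Latin1_General_BIN, NCHAR(9));

-- Same length, but the null is now a tab.
SELECT @var AS cleaned, LEN(@var) AS len_after;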

But I can’t find or remember where I got it! So, my sincere apologies to the real source, whoever you are. But even if I can’t claim ownership or identify the source, I still want to make sure this one’s out there for everyone else’s benefit.

Godspeed!

Compile Time Part 2

TL;DR – Square brackets do not save the optimizer any time.

If you didn’t already read Part 1, you might want to go back and do that.

I will easily admit that I can be rather obsessive about code formatting sometimes. Frankly, inconsistency annoys me. When it comes to Transact-SQL, this can translate into all sorts of little quirks in my code. One of the latest ones is overuse of [square brackets].

They have their place, of course. There’s a reason the QUOTENAME function exists. And there will always be some annoying fool who likes to create columns with names like “name” and “type” or – worse – put actual SPACES in them!

But I’ll admit it – I’ve been getting carried away. I blame Postgres, which won’t let me use Pascal casing without throwing quotes around every identifier. But I digress…

Anyway, this obsession had me thinking – does wrapping identifiers in square brackets save SQL Server any time? Does it say to the optimizer, “Hey, I PROMISE this whole thing inside these square brackets is an identifier. Cross my heart.” And the optimizer takes your code at its word and doesn’t look through its list of reserved keywords for one that matches AccountCreateDate or address_line_2?

The answer is… no. Throwing every identifier into square brackets doesn’t speed it up at all. Here’s the test:


IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j1'))
DROP PROCEDURE dbo.j1;
GO
CREATE PROCEDURE dbo.j1
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = rowguid FROM Person.Address WHERE AddressID = 1 OPTION (RECOMPILE);');
GO

IF EXISTS (SELECT 1 FROM sys.procedures WHERE object_id = OBJECT_ID('dbo.j2'))
DROP PROCEDURE dbo.j2;
GO

CREATE PROCEDURE dbo.j2
AS
EXEC(N'DECLARE @rowguid UNIQUEIDENTIFIER; SELECT @rowguid = [rowguid] FROM [Person].[Address] WHERE [AddressID] = 1 OPTION (RECOMPILE);');
GO

DBCC FREEPROCCACHE;
GO

DECLARE @i INT = 1;
WHILE (@i <= 100000)
BEGIN
EXEC dbo.j1
EXEC dbo.j2
SELECT @i = @i + 1;
END;
SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

Note that this was written to use AdventureWorks2014. Also note that I’m running ad hoc SQL inside of a pair of stored procedures so that I can measure a very quick and small bit of SQL over a lot of executions, forcing it to recompile every time yet keeping the plan cache from blowing up. See Part 1 if you don’t believe that my ad hoc SQL’s compile time is factored into total_worker_time.

Anyway, this code takes about a minute and a half to run on my system. The results? A dead heat. Sometimes, j1 wins, and sometimes, j2 wins. There’s no significant difference.

Compile Time Part 1

TL;DR – The sys.dm_exec_{query|procedure}_stats.*_worker_time fields do not include compile time. But the values in sys.dm_exec_procedure_stats.*_worker_time DO include compile time for ad hoc SQL within the procedures. Also, as a side note, a lot of indexed views can be a significant performance problem.

I recommend a blog post by Jonathan Kehayias entitled, “Identifying High Compile Time Statements from the Plan Cache”. In that post, he’s concentrating on queries that have a high individual compile cost.

However, the object of my search was small queries being compiled and executed in high volume (more on why in Part 2 of this). In my case, I was looking for ad hoc SQL being generated and executed inside of stored procedures – the kind of queries that don’t make their way to the plan cache, often due to the use of OPTION (RECOMPILE) to keep them from spamming the cache.

What I found was that worker time needed to compile these queries is indistinguishable from that needed to execute them. To show this, let’s look at an example in AdventureWorks2014. In this example, I’m going to create and execute two similar procedures. I’m also going to create a number of indexed views.

Why indexed views? I want to increase compile time significantly for this exercise, and a large number of indexed views can do that. From MSDN: “The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution.” My thanks to oas_public on stackoverflow.com for that tip.

Anyway, let’s create a stored procedure that will use our indexed view:


CREATE PROCEDURE dbo.j1
AS
BEGIN
DECLARE @stmt NVARCHAR(4000);
SELECT @stmt = N'/* dbo.j1 */
SELECT TOP 1
SUM(UnitPrice * OrderQty * (1.00 - UnitPriceDiscount)) AS Rev,
OrderDate,
ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
AND ProductID BETWEEN 700 and 800
AND OrderDate >= CONVERT(datetime,''05/01/2002'',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
EXEC(@stmt);
END; --PROCEDURE
GO

By the way, this evolved from sample code on MSDN. Now here’s a second stored procedure that is ever-so-slightly different:


CREATE PROCEDURE dbo.j2
AS
BEGIN
DECLARE @stmt NVARCHAR(4000);
SELECT @stmt = N'/* dbo.j2 */
SELECT TOP 1
SUM(UnitPrice * OrderQty * (2.00 - UnitPriceDiscount)) AS Rev,
OrderDate,
ProductID
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
AND ProductID BETWEEN 700 and 800
AND OrderDate >= CONVERT(datetime,''05/01/2002'',101)
GROUP BY OrderDate, ProductID
ORDER BY Rev DESC;';
EXEC(@stmt);
END; --PROCEDURE
GO

To make this a fair test, let’s clear the cache:


DBCC FREEPROCCACHE;
GO

And execute both of the procedures:


EXEC dbo.j1;
EXEC dbo.j2;

How long did they take?


SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);

On my local system, both run in about 85-105 ms. Obviously, YMMV.

Now let’s create an indexed view. I’m going to reuse this code a couple of times, so bear that in mind when you see how it’s constructed:


--Set the options to support indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

DECLARE @i INT = 1, @stmt NVARCHAR(4000), @t1 NVARCHAR(4000), @t2 NVARCHAR(4000), @t3 NVARCHAR(4000)

-- Create template for dropping the view.
SELECT @t1 = N'/* jtest */
IF OBJECT_ID(''Sales.vOrders_{1}'') IS NOT NULL
DROP VIEW Sales.vOrders_{1};';

-- Create template for creating the view.
SELECT @t2 = N'/* jtest */
CREATE VIEW Sales.vOrders_{1} WITH SCHEMABINDING
AS
SELECT
SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Revenue,
OrderDate,
ProductID,
COUNT_BIG(*) AS COUNT
FROM Sales.SalesOrderDetail AS od
INNER JOIN Sales.SalesOrderHeader AS o
ON od.SalesOrderID = o.SalesOrderID
GROUP BY OrderDate, ProductID;';

-- Create template for creating the index.
SELECT @t3 = N'/* jtest */
CREATE UNIQUE CLUSTERED INDEX IDX_V1_{1}
ON Sales.vOrders_{1} (OrderDate, ProductID);';

-- Drop and create the objects.
WHILE (@i <= 1)
BEGIN
SELECT @stmt = REPLACE(@t1, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @stmt = REPLACE(@t2, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @stmt = REPLACE(@t3, N'{1}', CAST(@i AS NVARCHAR(4000)));
EXEC (@stmt);
SELECT @i = @i + 1;
END; --WHILE
GO

Note that the indexed view supports dbo.j1, but not dbo.j2. Let’s go back and run those again:


DBCC FREEPROCCACHE;
GO

EXEC dbo.j1;
EXEC dbo.j2;

SELECT OBJECT_NAME(object_id) AS 'procedure_name', total_worker_time / 1000 AS 'cpu_in_ms'
FROM sys.dm_exec_procedure_stats
WHERE object_id IN (OBJECT_ID('dbo.j1'), OBJECT_ID('dbo.j2'))
ORDER BY OBJECT_NAME(object_id);
GO

If your results are like mine, you’ll see that dbo.j1 has dropped to about a quarter of its previous worker time, but dbo.j2 hasn’t budged. So, our indexed view is doing what we want. Now let’s create a few more. How does 500 sound? In that WHILE loop for the indexed view script, change the constant from 1 to 500 and run it again. Bear in mind, this took about 90 seconds on my system, so be prepared to wait.

Once all 500 indexed views are created, execute the two stored procedures again and look at the results. Both worker time values will be much higher!

But the actual execution time hasn’t changed. We’re still going to get essentially the same plan for each query, which means SQL Server should be doing the same work to execute it. But now the compilation is much more costly because the optimizer is looking at all 500 views when generating the plan. And because that compilation is taking place inside of ad hoc SQL in a stored procedure, SQL Server is reporting it as part of the procedure’s worker time in sys.dm_exec_procedure_stats.

To verify this, you can run Jonathan’s code, which will break out the compile time for those ad hoc queries.

If you’ve been following along in your own database and want to gracefully clean up the mess, you’ll want to do two things. First, execute this:


DROP PROCEDURE dbo.j1;
DROP PROCEDURE dbo.j2;
GO

And second, go back to the indexed view script, comment or remove the last two EXEC lines, and run it again.

In conclusion, the *_worker_time fields in sys.dm_exec_query_stats and sys.dm_exec_procedure_stats do NOT include compile time. That was why Jonathan had to dig into the cached plan and tease that information out of the XML. But the compile time of ad hoc statements inside of a stored procedure WILL be included in the procedure’s execution time.

And too many indexed views can suck.

Stay tuned for Part 2, which explains why I chased this down in the first place.