Technical Interviews

Because it’s fresh in my mind, I’ll leave this here for candidates looking for software engineering roles. I typically interview for back-end C#/.Net/T-SQL developer roles, so what I have to say here is very much targeted to that. Anyway, in an interview, I generally ask two technical questions:

  1. What are the three methods for adding services in Dependency Injection?
  2. What’s the difference between a clustered and a non-clustered index?

The three methods are AddScoped(), AddSingleton(), and AddTransient(). They govern the scope/lifetime of resources created through DI. If you are capable of answering that, then it can easily lead to a discussion about their differences, when to use each, more general DI topics, etc.
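If it helps to picture it, here's a minimal sketch of the three registrations in an ASP.NET Core startup; the service and implementation names are purely illustrative, not from any real project:

    var builder = WebApplication.CreateBuilder(args);

    // Transient: a new instance every time the service is resolved.
    builder.Services.AddTransient<IEmailFormatter, EmailFormatter>();

    // Scoped: one instance per scope (one per HTTP request in ASP.NET Core).
    builder.Services.AddScoped<IOrderRepository, OrderRepository>();

    // Singleton: a single instance shared for the lifetime of the application.
    builder.Services.AddSingleton<IClock, SystemClock>();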

The difference between the indexes is that the clustered index is the table (and so there can be only one). It’s how data in the table is organized in storage and in memory. Non-clustered indexes are additional indexes that are stored separately, but have in their leaf nodes the keys of the clustered index. And while there is a limit, the number of them for any given table is more of a design/performance choice. Here, the discussion can head toward B-tree logic, branch versus leaf nodes, key look-up performance, scans versus seeks, etc.

If you’re billed as a senior API developer on the Microsoft stack, then I expect you to know the former. If a senior database developer, then I expect you to know the latter. In my experience, very few candidates know both (and surprisingly few candidates know either). But what really irks me is when someone tries to bluff their way through one of these.

What I would much rather see is someone turn the question around. “Well, I know indexes obviously apply to tables and their use affects performance, but I don’t know the difference. Can you explain it to me?” Show me that you understand your limits and are willing to learn. Tell me what you do know, and let’s work from there. Let’s see how well you expand your knowledge on the fly, under a little pressure. Don’t try to BS me.

Whatever you know about these topics, I guarantee I can find some corner where I know more than you. It might take me a minute to get there, and it may be some esoteric quirk that will almost never come up in the real world, but I’ll get there. My goal is not to show I’m smarter than you (and, in fact, I’m genuinely hoping you’re smarter than me). My goal is to find your edge and pick away at it and see how you react.

Example: I had a candidate recently who was primarily a database developer. He had some .Net skills, but that part of his resume was marginal. But the recruiter brought him to me as a potentially very good candidate for the role because of his database experience. He really had that highlighted on his resume. It was clear that he thought of himself as a go-to for database development.

So I asked the index question. And he BS’d his way through an answer. And then didn’t follow up. Didn’t inquire about the accuracy of his answer. Didn’t probe at all. The interview carried on for a while after that, because I want to be sure of my assessment and not just make a snap judgment. But nothing afterward convinced me that this first impression was incorrect, so we were effectively done right there.

Lesson 1: If you think you’re an expert in an area, then you’d better know what you don’t know. If you don’t think the rabbit hole goes any deeper, then you’re not an expert. I’m very much aware of how little I know of SQL Server in general, and this is coming from someone who is regularly called “the database guy”.

Lesson 2: Failure to answer the question is not failure. Failure to follow up is. Show me that you’re willing and able to learn. Work the problem, right there in the interview. In tech, we will be faced with problems and won’t know the answers. We will be expected to figure them out. So show me that you can figure stuff out. This ends up being more true of junior candidates. They’re early enough in their careers that everything is a learning experience. But what I see is senior candidates with too much swagger, thinking they know all they need to know, and then spending the interview trying to convince me of the same. And those are exactly the kinds of candidates I will not hire.

Lesson 3: If you’re interviewing for a back-end C#/.Net/T-SQL role, go figure out the answers to those two questions. Not just what I wrote, but go for a real understanding of them. Get to the point at which you can answer “Why?” questions about them. If you go through enough interviews, I guarantee they will both come up at some point. I know I’m not the only interviewer who asks them. I stole the former from a colleague, and I know of several others who treat the latter as their go-to.

And good luck in your interviews!

EF Core and nvarchar(max)

My opinion was once, “Entity Framework is slow.” As I dive more deeply into the nuances of Entity Framework Core, that opinion is changing, but only slightly. It’s becoming, “Entity Framework is slow out of the box.”

There are ways to make it better. I’ll describe one of those here, but first, let’s establish the environment:

– Entity Framework Core 7.0.5

– The “code-first” approach

– SQL Server database (any in-support version)

So what does EF Core do wrong?

In this situation, when you define a string property on a model, the underlying data type will be nvarchar(max). For those who don’t know, this is a variable-length Unicode string with a maximum storage size of 2^31-1 bytes (2 GB). This is a combination of three choices that EF Core is making (a minimal example follows the list):

1. The n in nvarchar(max): The desired encoding (ASCII versus Unicode) is unknown. Therefore, EF Core defaults to the “safer” option, Unicode.

2. The var in nvarchar(max): The preference between a fixed-length string and a variable-length string is unknown. Because fixed-length Unicode strings have a size limit of 4,000 characters, while variable-length strings can be larger (see below), EF Core defaults to a variable-length string.

3. The (max) in nvarchar(max): The desired length is also unknown. Therefore, EF Core defaults to the maximum possible length, “max”.
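
To make that concrete, here's a minimal code-first model (the class and property names are hypothetical) in which EF Core would map the string property to nvarchar(max):

    public class Customer
    {
        public int Id { get; set; }

        // No annotation and no fluent configuration, so EF Core
        // maps this property to nvarchar(max) in SQL Server.
        public string? PhoneNumber { get; set; }
    }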

What are the problems with those decisions?

1. ASCII strings consume half the memory of Unicode strings in SQL Server. If all you need is an ASCII string, then that’s the option to choose.

2. If you have extremely consistent data lengths, then fixed-length columns can be more efficient.

An example of both of these is a phone number. The necessary characters are all available in the ASCII character set. And phone numbers (even accounting for country codes and the nuances within each of those countries) are roughly equivalent in length.

3. Even though the effective length of the data may be very small, SQL Server must assume that the column can hold a large amount of data. Therefore, it makes decisions about resources based on this assumption.

Using the phone number example again, even though every row in the table holds data in this field with a length somewhere between 10 and 20 characters, SQL Server must assume the possibility that the column will store the full 2 GB.

For an example of how this can affect performance, see Eddie Wuerch’s response in this thread on sqlservercentral.com.

4. Data types like nvarchar(max) are known as large value data types or large object (LOB) data types. Data in these columns is handled differently than other types. For example, that data may be stored on a separate type of page (SQL Server’s designation for an 8k block of storage) than the rest of the row. This is called a LOB page. And this data can’t be indexed like other types, because normal indexes have a size limit.

Why does this matter?

Obviously, these are performance issues, and we should all keep performance in mind as we’re developing our solutions. But the bigger issue is that this behavior can go unnoticed in the early stages of developing an application. It’s only when the application is in production, with a sizable footprint, that we begin to see the harm that this causes. And as the application grows, these issues become harder and harder to fix.

So what do we do about it?

We solve this with two different annotations in our models. In EF Core 7, here is an example of each of these:

    [System.ComponentModel.DataAnnotations.Schema.Column(TypeName = "varchar")]
    [System.ComponentModel.DataAnnotations.MaxLength(50)]
    public string? MyStringField { get; set; }

As you can see, this indicates to EF Core that the string is still meant to hold variable-length data, but that it should use the single-byte varchar (ASCII) type and cap the length at 50 characters, so the generated column is varchar(50).
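
If you prefer EF Core's fluent API over data annotations, the same configuration can be expressed in OnModelCreating. Here's a sketch using the same hypothetical Customer model from earlier:

    // Inside your DbContext subclass:
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>()
            .Property(c => c.PhoneNumber)
            .IsUnicode(false)    // varchar instead of nvarchar
            .HasMaxLength(50);   // varchar(50) instead of varchar(max)
    }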

Summary

Once again, EF Core is not necessarily slow, but it is slow by default. And going along with the default behavior when it is the most inefficient option is a recipe for disaster. This is why many people, myself included, are inclined not to use it. But it is a popular ORM, so knowing how to make it better is vital.

T-SQL Tuesday #104

My thanks to Bert Wagner and his chosen topic for T-SQL Tuesday, Code You Would Hate To Live Without. It was just enough of an excuse to dust off the cobwebs here and get back to posting.

Anyway, since half of my time is spent in C#, I thought I’d venture into that world for my response. I’ll share a few common extension methods that I include in most of my projects. Extension methods, as the name implies, extend the functionality of existing types. Here is a code snippet with the ones I typically add:

namespace myproj.Extension
{
  public static class Extensions
  {
    public static bool In<T>(this T val, params T[] values) where T : struct
    {
      return ((System.Collections.Generic.IList<T>)values).Contains(val);
    }

    public static object ToDbNull(this object val)
    {
      return val ?? System.DBNull.Value;
    }

    public static object FromDbNull(this object val)
    {
      return val == System.DBNull.Value ? null : val;
    }
  }
}

The first method enables me to easily search enumerations for a given value. For example, if I’ve defined this enumeration:

namespace myRacingProject.Enum
{
  public enum Series
  {
    None = 0,
    Indycar = 1,
    IndyLights = 2,
    ProMazda = 3,
    Usf2000 = 4
  }
}

Then I could use the extension like this:

if (mySeries.In(Enum.Series.ProMazda, Enum.Series.Usf2000)) myChassis = "Tatuus";

As for the other two methods, well… When is a null not a null? When it’s a System.DBNull.Value, of course! SQL Server pros who have spent any time in the .NET Framework will recognize this awkwardness:

var p = new System.Data.SqlClient.SqlParameter("@myParam", System.Data.SqlDbType.Int);
p.Value = (object)myVar ?? System.DBNull.Value;

With the extension, the second line becomes:

p.Value = myVar.ToDbNull();

Similarly, when reading, this:

var myInt = (int?)(myDataRow[myIndex] == System.DBNull.Value ? null : myDataRow[myIndex]);

Becomes this:

var myInt = (int?)myDataRow[myIndex].FromDbNull();

They’re not earth-shattering improvements, but my real point is that extensions are an often-overlooked feature that can improve your quality of life as a developer. Anytime you find yourself writing the same bit of code over and over, especially if that bit is rather unsightly, you might consider making it an extension.

Want to know more? Here ya go: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/extension-methods

#PASS24HOP

In the past couple of years, I’ve had the opportunity to speak at a few SQLSaturdays and present to my local chapter, IndyPASS, a couple of times. I gained a new experience a few weeks ago, one that I had no idea would be so timely: I presented one of my sessions to YooperPASS, a new chapter in the Upper Peninsula of Michigan. It was my first online session for PASS. I’ve presented online before at Salesforce, but that was to an internal audience. This was a new session, presented for the first time, to a bunch of strangers. Fortunately, the YooperPASS folks are very welcoming, and I think it went very well.

I mentioned this was timely, though, didn’t I? Well, that’s because I’ll be giving my second-ever PASS virtual session on Thursday, the 20th. I’m a part of the 24 Hours of PASS: Summit Preview 2017! The same session I gave to YooperPASS, A Guided Tour of the SqlClient Namespace, will also be part of this webinar event. Because I’ll now be presenting it to a much wider audience, I’d like to expand on my approach to the session.

First, I call it a “Guided Tour” because I’m cherry-picking those areas that I think deserve some special attention. A more comprehensive session would be impractical. So I had to cut things off somewhere. What made the cut are four topics that have valuable specific takeaways. A good technical session should give attendees something today they can put into practice tomorrow. I’d like to think I accomplished that with these topics.

Second, my choice of topics also gives me a chance to get across a broader message about how to scale ADO.NET code. Performance is always a focus of mine, and my experience has taught me that there is frequently a trade-off between application code and the database. One piece of code may perform better than another under a microscope, but its effect on overall system performance may actually be worse. In this session, I try to draw attention to this trade-off.

Finally, this is my attempt at an introductory session on an often-overlooked topic. Let’s face it, ADO.NET has been around a while. Because of that, I think the information available on it is inconsistent. StackOverflow didn’t exist until long after ADO.NET was introduced. Yes, there are some fine older resources on ADO.NET, but enterprises have changed a lot since many of them were written. Most current resources focus on Entity Framework, which is arguably just an abstraction layer over ADO.NET. So my session is intended to close the gap.

Anyway, I’m looking forward to being a part of this 24 Hours of PASS. If you tune into my hour, I hope it’s worth your time.

Godspeed,

J

Threading

A while back, I wrote an app that spawned a collection of threads to run some work in parallel, using the resources in the System.Threading namespace of the .NET Framework. Some time after that, I worked on another app that also had a threading component. This second app was reviewed by another developer from outside my immediate circle. He asked, “Why didn’t you use the System.Threading.Tasks namespace?” Uhh… because I didn’t know it existed?

That namespace was introduced in .NET Framework 4 – not exactly recent history – but I had somehow missed it for quite a long time. There are a few causes for that, but the one I’d like to focus on here is a trap that I think catches many developers at one time or another: We think we have it all figured out. While we are, to some degree, practical mathematicians – professionals who assemble algorithms to meet requirements – we are also creators. Our code is our art. And oftentimes, we don’t have the humility necessary to accept the possibility that our art isn’t beautiful. So we shy away from having the right people check our work.

This reminds me of an old saying: If you’re the smartest person in the room, then you’re in the wrong room.*

Now, this is not a commentary on my current team. I work with some really smart people, and I’m very grateful for that. But while my teammate may be one of the best PHP or Node.js coders I know, that doesn’t necessarily translate to an expertise with the .NET Framework. The true test is this – no matter how smart they are, if they’re not catching my mistakes, then I’m not being held accountable.

Lesson 1: Make sure someone’s catching your mistakes. If they’re not, then do you really think the reason is that you’re not making any?

So, back to the two apps… After the other developer’s feedback, I reworked the second one prior to release, and it passed its code reviews. The first app, meanwhile, developed some bad behavior in production. There was definitely a race condition of some sort, but I couldn’t seem to nail down where it was. I made a couple of adjustments to the code, but nothing seemed to bite. Of course, I couldn’t reproduce it in testing either.

Finally, I ripped out the threading code entirely and replaced it with nearly identical code based on System.Threading.Tasks. I was concerned about the risk of introducing more bugs, about the fact that I was still unable to reproduce the problem, and about how long it had been a problem, so I tried to remain as faithful to the original design as possible. And, yeah, honestly, I crossed my fingers.
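
To give a rough idea of the kind of change involved (this is a simplified sketch, not the actual application code; DoWork and workItems are hypothetical):

    // Before: manually creating and joining threads with System.Threading.
    var threads = new System.Collections.Generic.List<System.Threading.Thread>();
    foreach (var item in workItems)
    {
        var thread = new System.Threading.Thread(() => DoWork(item));
        threads.Add(thread);
        thread.Start();
    }
    threads.ForEach(t => t.Join());

    // After: letting the thread pool schedule the work with System.Threading.Tasks.
    var tasks = new System.Collections.Generic.List<System.Threading.Tasks.Task>();
    foreach (var item in workItems)
    {
        tasks.Add(System.Threading.Tasks.Task.Run(() => DoWork(item)));
    }
    System.Threading.Tasks.Task.WaitAll(tasks.ToArray());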

Once this new version was released, the problem was gone.

Lesson 2: System.Threading.Tasks really is better than System.Threading.

I’ll never know what exactly fixed the problem. I could keep researching it, but the costs to me for that aren’t quite worth the benefits at this point. My takeaway was that the new stuff just simply works better. Whether that’s because it’s easier to use the right way (and harder to use the wrong way) or its internals are less buggy or some combination thereof, the end result is the same. I hope that’s old news to anyone reading this, but I wanted to share my experience just in case.

* I was unable to identify with certainty the source of this phrase. The leading candidate I found was 1962 Nobel Laureate James Watson.

Now versus UtcNow

This is a minor topic, but the difference was so drastic when I found out about it that it’s definitely worth sharing. Consider the following code:

    var begin = System.DateTime.UtcNow;
    for (var i = 0; i < 10000000; i++)
    {
        var foo = System.DateTime.Now;
    }
    System.Console.WriteLine(System.DateTime.UtcNow.Subtract(begin).TotalMilliseconds);
    System.Console.ReadLine();

The result on my machine was 9150 milliseconds. Now try this:

    var begin = System.DateTime.UtcNow;
    for (var i = 0; i < 10000000; i++)
    {
        var foo = System.DateTime.UtcNow;
    }
    System.Console.WriteLine(System.DateTime.UtcNow.Subtract(begin).TotalMilliseconds);
    System.Console.ReadLine();

The result was 152 milliseconds. Really. Not kidding. Yes, there are better ways of measuring, but that one worked well enough in this case. Apparently, the framework stores time in UTC natively and converts to local time only when it matters. DateTime.Now has to apply the local time-zone offset (including daylight saving rules) on every call, and that conversion comes with a huge cost!
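
As for those better ways of measuring, System.Diagnostics.Stopwatch is the usual choice. A minimal sketch of the same loop:

    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (var i = 0; i < 10000000; i++)
    {
        var foo = System.DateTime.Now;  // swap in UtcNow to compare
    }
    sw.Stop();
    System.Console.WriteLine(sw.Elapsed.TotalMilliseconds);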

Now you know. And knowing is half the battle.