My opinion was once, “Entity Framework is slow.” As I dive more deeply into the nuances of Entity Framework Core, that opinion is changing, but only slightly. It’s becoming, “Entity Framework is slow out of the box.”
There are ways to make it better. I’ll describe one of those here, but first, let’s establish the environment:
– Entity Framework Core 7.0.5
– The “code-first” approach
– SQL Server database (any in-support version)
So what does EF Core do wrong?
In this environment, when you define a string property on a model, the underlying data type will be nvarchar(max). For those who don’t know, this is a variable-length Unicode string with a maximum storage size of 2^31-1 bytes (2 GB), or roughly one billion characters. This is the combination of three choices that EF Core is making (a short sketch follows the list):
1. The “n” in nvarchar(max): The desired encoding (ASCII versus Unicode) is unknown. Therefore, EF Core defaults to the “safer” option, Unicode.
2. The “var” in nvarchar(max): The preference between a fixed-length string and a variable-length string is unknown. Because fixed-length Unicode strings have a size limit of 4,000 characters, while variable-length strings can be much larger (see below), EF Core defaults to a variable-length string.
3. The “(max)” in nvarchar(max): The desired length is also unknown. Therefore, EF Core defaults to the maximum possible length, “max”.
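To make the default concrete, here is a minimal sketch (the entity and property names are hypothetical), along with the column definition that a generated migration produces for it:

public class Contact
{
    public int Id { get; set; }

    // No configuration at all: the SQL Server provider maps this
    // property to a column of type nvarchar(max).
    public string? Name { get; set; }
}

// The scaffolded migration contains, in effect:
//   Name = table.Column<string>(type: "nvarchar(max)", nullable: true)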
What are the problems with those decisions?
1. ASCII strings consume half the storage of Unicode strings in SQL Server (varchar stores one byte per character, while nvarchar stores two). If all you need is an ASCII string, then that’s the option to choose.
2. If you have extremely consistent data lengths, then fixed-length columns can be more efficient.
An example of both of these is a phone number. The necessary characters are all available in the ASCII character set. And phone numbers (even accounting for country codes and the nuances within each of those countries) are roughly equivalent in length (a sketch follows this list).
3. Even though the effective length of the data may be very small, SQL Server must assume that the column can hold a large amount of data. Therefore, it makes decisions about resource allocation, such as the memory granted to a query for sorting and hashing, based on this assumption.
Using the phone number example again, even though every row in the table holds data in this field with a length somewhere between 10 and 20 characters, SQL Server must assume that there is the possibility that it will store 2^31-1 bytes of data.
For an example of how this can affect performance, see Eddie Wuerch’s response in this thread on sqlservercentral.com.
4. Data types like nvarchar(max) are known as large value data types or large object (LOB) data types. Data in these columns is handled differently than other types. For example, that data may be stored on a separate type of page (SQL Server’s designation for an 8 KB block of storage) than the rest of the row. This is called a LOB page. And this data can’t be indexed like other types, because index keys have a size limit (900 bytes for a clustered index, 1,700 bytes for a nonclustered index).
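Here is a sketch of the phone number example (the entity name and the choice of 20 characters are hypothetical). Specifying the full column type requests a fixed-length ASCII column:

public class Customer
{
    public int Id { get; set; }

    // A fixed-length ASCII column: char(20) covers a country code,
    // the number itself, and common separators.
    [System.ComponentModel.DataAnnotations.Schema.Column(TypeName = "char(20)")]
    public string? PhoneNumber { get; set; }
}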
Why does this matter?
Obviously, these are performance issues, and we should all keep performance in mind as we’re developing our solutions. But the bigger issue is that this behavior can go unnoticed in the early stages of developing an application. It’s only when the application is in production, with a sizable footprint, that we begin to see the harm that this causes. And as the application grows, these issues become harder and harder to fix.
So what do we do about it?
We solve this with two data annotations in our models. In EF Core 7, here is an example of each:
[System.ComponentModel.DataAnnotations.Schema.Column(TypeName = "varchar")]
[System.ComponentModel.DataAnnotations.MaxLength(50)]
public string? MyStringField { get; set; }
As you can see, this indicates to EF Core that the string is still meant to hold variable-length data, but that it uses the ASCII data type and has a limit of 50 characters. The resulting column is varchar(50) instead of nvarchar(max).
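If you prefer configuration over attributes, the Fluent API can express the same thing. Here is a sketch, assuming the property above lives on a hypothetical MyEntity class:

using Microsoft.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    public DbSet<MyEntity> MyEntities => Set<MyEntity>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<MyEntity>()
            .Property(e => e.MyStringField)
            .IsUnicode(false)   // varchar instead of nvarchar
            .HasMaxLength(50);  // varchar(50) instead of varchar(max)
    }
}

(EF Core 6 and later also offer a [Unicode(false)] attribute in the Microsoft.EntityFrameworkCore namespace as a shorthand for the encoding half of this.)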
Summary
Once again, EF Core is not necessarily slow, but it is slow by default. And going along with the default behavior when it is the most inefficient option is a recipe for disaster. This is why many people, myself included, are inclined not to use it. But it is a popular ORM, so knowing how to make it better is vital.