In some parts of the C# community, I’ve noticed that there seems to be a desire to avoid writing SQL code. Some advocates for Entity Framework have even cited the ability to avoid writing SQL as one of its benefits. I admit that I am no EF expert. If I had to choose an ORM, my list stops at Dapper, and I normally roll my own data access layer.
But I’m not here to try to dissuade anyone from using Entity Framework. What I’d like to do, though, is lay out some of the benefits of embracing SQL in your solutions, specifically Transact-SQL and the use of stored procedures in SQL Server.
First let me share my typical design for a new solution based on a relational database. I not only use a database-first approach, but I build the database using scripts that are part of a repository. Everything after the CREATE DATABASE statement itself is in the repo. Each script is essentially idempotent. In other words, the script can be executed repeatedly without adverse effect. For tables, this means checking for existence before creating them. For changes to existing tables, additional logic is used to (for example) check for the existence of a new column before adding it. For other objects, this is mostly just using the “CREATE OR ALTER” syntax. I also take steps to maintain backward compatibility, but that’s for another post.
By the way, kudos to the attendee at my PASS Summit 2022 session who called me out for using the word idempotent. I’ve been throwing that word out for years, knowing that it’s not quite what I mean, but close enough. So, no, the scripts aren’t really idempotent, but I haven’t found a better word yet for, “I can F5 them to death.”
Anwyay, I also include PowerShell scripts in the repo that execute the T-SQL scripts on a database, and incorporate these into a CI/CD pipeline. I won’t explain the whole pipeline, but one crucial part is that a merge of an approved PR into the dev branch will trigger a deployment to a reference development environment. The scripts are all executed against the database(s) in this environment, and then application code is deployed to the servers.
The point I’m trying to make is that the database code is treated as part of the solution. It is in the repos just like the C# code. It is deployed just like the C# code. I even include unit testing, just like the C# code.
I also keep all T-SQL in the database. There is no T-SQL code inside the C# code. Stored procedures are used for all functionality, even basic CRUD functions. In fact, probably 80–90% of the stored procedures in my systems are essentially CRUD functions. The only operation that doesn’t go through stored procedures is bulk insertion, and I use SqlBulkCopy for that.
Here is an example of my approach, one that I put together for that Summit session: https://dev.azure.com/downshiftdata/_git/SearchOverflow
Why Do It This Way?
- Changes to the database are tracked in the repo just like changes to application code. So knowing who changed what, when they changed it, who reviewed it, when it got deployed, etc, is no more difficult to answer for database changes than application code changes.
- The stored procedures act as an interface layer, keeping database code in the database and application code in the application. Radical changes to the database are possible, without any effect on application code and without incurring any downtime. Performance issues in the database can be addressed quickly, through hotfixes, without redeploying the application. And the database design can take advantage of this interface layer. For example, instead of relying on AFTER triggers, audit logging can be done through OUTPUT clauses inside the stored procedures. This is because you can reasonably guarantee that all the writes are coming in through the procedures (and even support this assumption with appropriate authorization configuration).
- Stored procedures are compiled, with their plans in the procedure plan cache. Yes, this happens with ad hoc queries as well (and I’m an advocate for the “Optimize for Ad Hoc Workloads” option in SQL Server). But there are issues like parameter sniffing and cache bloat that need to be handled. Routing execution through stored procedures makes it easier to manage the cache and understand what is happening in the database.
The Other Side
I’ve seen quite a few arguments against approaches like this. Here are some of the more popular ones:
- If I use Entity Framework and take a code-first approach, then it does all the SQL for me and I don’t have to worry about it.
Once again, I’m not an EF expert, and don’t know the full scope of what it can and can’t do. But I’ve heard variations of this one quite a lot. And the argument falls apart like any other code-generation argument. It’ll work fine 90% of the time, and then you’ll be in trouble when you hit the other 10%. Now, in some situations, this may work out. You’ll always be inside that 90%. But to me, it’s a calculated risk, and one that I don’t have to make because it’s not the approach I take.
Where this one concerns me is in situations in which a DBA (or someone in a similar role) was essentially ignored. The DBA advocated for something less wizard-y, and had good reasons for it. But they’re often outnumbered and – frankly – not well respected, and decisions are made despite their arguments. And then the situation hits that 10% and the DBA’s list of options is limited. They often know the change they should make, but they can’t make it because that code is being generated in the application layer. Or the change they want to make will break application layer code.
- That’s the old way of doing things.
This makes me cringe every time. Yeah, we all wrote stored procedures twenty years ago, but that’s not how you do things today. This goes hand-in-hand with the idea that RDBMSs themselves are antiques and shouldn’t be used. I’m a firm believer in the KISS principle. And despite being ancient in the tech world, the RDBMS is a straightforward solution to a lot of problems. Would a big data store be better for a certain situation? Of course. Will your app ever reach a threshold that demands the advantages of that big data store? If not, then why add to the complexity of your solution? If your tech stack includes an RDBMS, then you have one product that can reasonably support a host of requirements.
- The database is the database and the application code is the application code and the two shouldn’t be mixed.
In my experience, those who make this case have had to suffer through thousand-line stored procedures that tried to do everything in the database. That is indeed a nightmare situation, and one that I’ve seen all too often.
But their response is to swing fully in the other direction. Don’t let any code reside in the database. But the truth is that there are scenarios in which it makes sense to go either direction. If you need to join several large tables, have good covering indexes, need a timely response, are faced with frequent cache invalidation, and only need a few rows in each call, it makes a mountain of sense to have that functionality in a stored procedure. Yes, that could be dynamically generated in the application layer. But there is definitely a non-zero chance of getting a bad execution plan in that scenario. And as I said before, your options then are much more limited.
Conclusion
Stored procedures form an interface layer, between the database and the application code. This layer of abstraction is incredibly beneficial to application security, maintainability, and scalability. Not using it is a design decision that should be considered very carefully.
When I am in conversations with others in the database community, or with coworkers who share some of my experience, I wouldn’t dream of having to make these arguments. But now that I’m back in C# land more than I used to be, I’m shocked at how often my opinions are at odds with others in this community. So, I don’t know if I’ve put any of those arguments to rest – probably not – but maybe that anti-stored-procedure crowd is at least thinking a bit more about the situation.