I recently ran into a subtle issue with one of my SQL scripts. Unfortunately this issue caused it to delete all of the data in all of the tables it touched. Fortunately for me, we were still in DEV with the script so the impact was minimal. I spent quite a bit of time debugging the script and tracking down theories just to end up right where I started. Since this was a very simple script, I was feeling really dumb right about then. Here’s the issue and what I found was the problem. Hopefully you can learn the lesson I did without the pain involved.
Due to circumstances I won’t bore you with, I found myself in a situation where I had to identify some data that fell within a certain range of dates and delete all records in all tables that related to the identified records. I wrote the script, but found that instead of deleting only the records that fell within the date range specified, it deleted all records in the tables it touched. Not fun.
In the script I created a table variable (I don’t like temp tables) that would contain only one column. This column would contain the primary key of the main table to which most of the other tables related. Into the temp table I inserted all of the primary key values from the records in the main table that met a number of criteria. Then I used a DELETE statement to remove records from each related table where the record’s foreign key referenced a value that was in the table variable.
It turns out that the real problem was a simple typeographical error. In the DELETE statements, within the WHERE-IN statement’s sub-SELECT statement, I added an underscore (“_”) between “PK” and “Val”. The implications of this issue though were far reaching. Initial thoughts about this incorrectly named column might be that SQL Server Management Studio should notify you about the mistake, or at worst the sub-select statement wouldn’t return any data and therefore the DELETE statement wouldn’t delete anything.
These would be incorrect assumptions though. The problem is that the RelatedTableN contains a column called “PK_Val” with the underscore. Due to ANSI standards, SQL always checks the outer table BEFORE it checks the inner table. In this case it found a column called “PK_Val” in the outer table. Since the inner table contains at least 1 record, a match was found for every record and the result was that it deleted every record in each of the tables it touched.
The learning lesson for me in this case was several fold.
- First, it was good to know that outer tables are checked before inner tables. As with anything, knowing exactly how every part of something works helps you be much more proficient working with it. Take for example a car. If you know how everything in the engine works, you can generally deduce what is going on a lot easier from the symptoms you’re seeing.
- Second, I use table aliases religiously. However, this particular time I failed to do so and it bit me. If I had been more specific about telling the database engine exactly where I wanted it to look for that column, I wouldn’t have run into this issue. It’s a good reminder… always use aliases where multiple tables are involved. If nothing else, being more clear is always better than more vague.
- Always start with the simplest things and work your way to the more complex. A good example I always used as a systems administrator or network administrator is the OSI model. It’s a 7 layer model going from the physical layer (simplest) to the application layer (most complex). Don’t start uninstalling software, reconfiguring the system, etc. when you can’t communicate… check the network cable first. Then ensure the hardware is working (loopback & connectivity tests) and so on. This helps you systematically eliminate things that build the foundation for those above it and can often save you a lot of time. Of course this doesn’t mean to start checking cables when you have a virus or similarly unhelpful steps. Instead it’s a good rule-of-thumb. If in doubt start with the simple and work your way up.