Self-Validating Script in SQL
We all write SQL scripts to do database operations. When you are supporting an application that is running in Production it’s very critical when you are modifying the data by running SQL scripts. From web applications, some validation logics are written but when you are updating directly in the table, there is a risk that some wrong data gets modified by mistake.
There is another scenario where mistakes can happen. Generally, when we run an update script in a live application there is a standard process to test it in a lower environment first then move it to production. Sometimes you may not have access to insert/update/delete access to the production system. In that case, you prepare the script put it somewhere and another team like change control or DBA team gets the script and executes it. There is a high chance that data in the test and production environment is different and the same script behaves differently in test and production. So there is a high chance of making mistakes in production.
So what is the solution? Like the application validates the data before committing to the database, we can implement validation inside the script which will self-validate the script. It will check what it is supposed to do and then only commit. Let’s see how this can be implemented.
Suppose you want to update a table based on some condition. First, check how many number of data it is supposed to update. Let’s say 5. Now you do a check after the update query that is it updates 5 records like below.
Same thing applies in insert, update, and delete.
One thing needs to be kept in mind, always give the hardcoded number by checking the data beforehand. Never select the count of the query and immediately after update the data. If you select the count and update it immediately it will never fail. This is not proper validation. The select query or checking of the data needs to be done beforehand. The below query is not the correct way to validate the update.
Now there might be a scenario when you cannot tell the exact count of the record that needs to be updated as it’s a live system and the user keeps updating the records. The count might be different when you are making the script and executing it in production (as it might take time to test the script in a lower environment and approve it). In that case, you need to give an allowed threshold. Like you can allow a maximum 2% deviation of data (the percentage might change depending on the number of records to be updated). Let’s take an example.
Sometimes different environment has different numbers of records. We can create environment-specific count and put it in validation.
Lets see a complete sample code where self-validation is implemented.
We got this idea from Architect Lee Rosenfeld. We successfully implemented it in our day-to-day operation and this saves us a lot from bad data updates in the application.