We’ve all done it. We needed to pull data out of an SQL database, and the query we needed to pull out data was a complex query with multiple table joins, complex filter criteria, and sophisticated
And why not? SQL databases such as MySQL and PostgreSQL are really good at performing complex joins, filtering, and sorting in order to get exactly the data you need out of a query and nothing more. Let the database do the work! It sure is a lot easier than having to write that logic in the application code, right?
By performing these operations inside the database, what you are really doing is pushing application business logic down into the bowels of your database. You are taking this business logic out of your application code, and moving it into database logic instead.
The result? You end up using more database resources and fewer application server resources in order to achieve the data retrieval results you are looking for.
The same is true when implementing data insertion requirements. What happens when you want to insert data into your database? Rather than verifying data you are inserting is valid, conflict-free, and completely correct in the application before you perform data insertion, you let the database execute the validation.
You put that conflict criteria into the database schema requirements and let the database throw an error (if detected) in the data you are inserting. You might do this, for instance, by using
UNIQUE constraint criteria and indices.
Application resources are easier to scale
Now, before I get a lot of database engineers hounding me on Twitter saying, “That’s exactly what those features are for!” — or a lot of developers protesting that “some things have to be done that way” — let me be clear. I do understand there are cases where you need your database to do those sorts of validations and checks.
So I’m not saying don’t have your database do any application logic. What I’m saying is, you should have your database do as little application business logic as is possible, and do as much application business logic in the application code itself as you possibly can.
Why do I say this? Because, in general, it’s substantially easier to scale your application server resources than it is to scale your database server resources.
This is nearly a universal truth. For most web applications when traffic increases, usually you can easily add application servers to handle increased load. But additional application servers aren’t helpful unless your database can also scale to handle the increased load. And scaling a SQL database is substantially harder than adding application servers.
Take a look at the resources available to your application. You should consider your database as being composed of scarce computational resources, while your application code layer (including services) is composed of readily available computational resources that can be easily expanded.
Database resources are scarce. Computation resources are readily available.
Once you can wrap yourself around this mindset, you’ll realise that putting as much logic in your application layer as possible will facilitate easier scaling. Putting business logic in your database tier limits the scalability of that business logic substantially.
Let the database handle complex filtering
You can’t always put all the business logic in the application layer. Sometimes, you have to put certain queries in the database tier. One valid reason for this may be to control the quantity of results returned. Consider a query that has a complex filter criteria on a very large database table:
select * from mytable where <complex filter query>
The expected results, after executing the complex filter query on a large dataset, might be only a few rows from the table. The
* will return all the data from just those few rows. But if you instead want to execute the complex filter query in the application layer instead of the database, you typically have to retrieve all the data from the database first. You may end up with something like this:
select * from mytable
This would return all the data in mytable to the application layer. The application layer would then throw away any rows that did not pass the filter criteria it executes on the data. The problem with this is that the entire contents of mytable has to be transferred to the application layer to perform this request. For a large dataset, this is unacceptable.
But, in many cases, simply refactoring a query or series of queries can avoid problems such as this and allow you to execute more logic in the application without undue data traffic. One way to do this is to separate data filtering from data retrieval.
Separate filtering from retrieval
Very often we combine the concept of filtering results and retrieving results into a single query. Especially when we are looking at a big table with lots of data, it’s very convenient to write a query that performs all the filtering and specification we want to select which rows we need, then have the query return all the data we require from the selected rows. We do something like this:
select * from mytable where <complex filter query>
By itself, this is seemingly OK. But when the query involves complex joins or other complex operations, it can put an undo load on the database, straining database resources.
One way around this problem is to perform an initial query that simply returns only the fields required for the filter query from all the rows, then perform the filtering logic in the application. Let’s assume that field1 and field2 are the fields actually involved in the <complex filter query> above. So, let’s get only that data in an initial query:
select id,field1,field2 from my_table
Then, we can perform the complex filter query logic on field1 and field2 in your application code. This will result in a list of IDs for the rows in mytable that match the complex query. Once you have the list of matching IDs, you perform a follow-on query that gets the actual data from the prefiltered rows:
select * from my_table where id in (3,5,123,392,...)
Both queries are very simple queries to execute—there are no complex operations that need to be done in the database. The necessary business logic for selecting the data to return is executed in the application layer, yet there is very little extraneous data that needs to be transferred from the database to the application.
By splitting the query into separate filtering and data retrieval queries, you have refactored your request and allowed complex, resource-intensive, business logic to execute in the application rather than the database.
Avoid performing operations on returned results
Another easy way to move business logic out of the database and into the application layer is to avoid performing calculations on the returned results in the database; perform them in the application instead. So, rather than doing something like this:
select POWER(SQRT(field1)*SIN(field2),5) from my_table where ...
You can do this:
select field1,field2 from my_table where ...
And then perform the equivalent of
POWER(SQRT(field1)*SIN(field2),5) in the application with the returned results. The result is that all the computation required to perform the calculations make use of readily available application resources, rather than scarce database resources.
Move joins into the application layer
Complex joins are another area where extensive database resources can be required. Rather than joining data in the database, move as much of that joining logic as you can into the application layer. By refactoring your code in this manner, you can reduce the load on your database significantly, while increasing your scalability.
Break the chains
Of course, you can’t always refactor your queries in these ways. Sometimes, you just need to perform a complex query within the database itself. But, by removing as many of these complex queries as you can, you reduce your dependency on scarce database resources and increase your dependency on highly available application-level resources.
So, the next time you look at that big, long, gnarly query that uses multiple joins and complex filtering logic, don’t be proud of it! Instead, look at ways you can replace it with simpler queries, perhaps multiple queries, and business logic that executes in your application layer.
As your application scales, you will appreciate the improved flexibility.