Caching in Django with the Prefetch and Annotate Pattern
At some point you will add a property to a model that triggers an n+1 query. This is when an attribute of an associated table is requested. For each time this happens n additional queries are added to the request. What we need to do is request all of the joined tables as a single request and then reference the fetched data with the joined tables needed for the model’s property. Fortunately, Django makes caching extremely simple and also provides a built in test matcher that makes detecting n+1 queries a snap.
Our story begins with the addition of two innocent properties to a blog post model. For our blog posts we would like to include a count of reviews along with the average review score on the post_list
page for each post.
Functionally everything checks out and both properties can easily be referenced in the post index template. However, there is a severe performance hit that happens when we calculate averages or counts on a related table like review_set
. We can write a test that asserts that post_list
only results in one hit to the database.
This test fails with 11 fetched queries! This makes sense considering that we are constructing a listing of 5 posts. Both fields add 5 additional queries in addition to the original query because they reference data on the Review
model. This will eventually lead to some unacceptable performance bottlenecks as the list of posts grows and additional properties are added to Post.
Django provides two tools that can help us solve this problem - QuerySet.annotate
and QuerySet.prefetch
. We will start by creating a PostQuerySet
class and assign that as a manager to our Post model’s objects property.
The QuerySet.annotate
method is used to add a private property to each post with the calculated average review_rating and review count. Posts can be fetched with the detail records in the prefetch_detail
method. This method can be called when references to a Post’s average_rating
and number_of_reviews
are needed. The initial query might run slightly longer than a simple selection on posts. However, this will pay off when additional hits to the database no longer occur with the addition of the annotated calculations.
A simple guard condition can now be added to each property that checks for the presence of the annotated cached value before calling the calculation on Post.review_set
. With this update and the use of prefetch_detail
in the post_list
view method, the query count for the test returns back to a single query count and passes!
Conclusion
Testing view methods by using TestCase.assertNumQueries
is an easy way to detect n+1 performance issues in Django applications. Fixing n+1 queries can usually be solved with the addition of an annotation for the calculated or related data needed by the view requesting the data. Generally, a single reusable prefetch_detail
QuerySet method can be used in cases where a template or serializer needs data that should be calculated or included in advance. It is also up to the model to handle these annotations and use them when they exist.
Making these simple changes to an application can really help scale your Django application and possibly save money on the computation power needed for your database server.