Abusing Factory Boy sequences in Django unittests

About two weeks ago, I began using factory boy to generate models in Django unit tests. Here's what I've learned about factory boy, and about unit testing in general, since then.

This article won't go into the basics of how to use factory boy - the official documentation covers those very well. It will, however, discuss specific scenarios and opinions on best practices.

Why is factory boy useful?

Factory boy does a great job of eliminating the mental overhead that goes into writing tests against Django views. A common obstacle that prevents teams from incorporating tests of their Django views into the dev cycle is just how much work goes into creating the environment the view needs to function correctly.

In order to test a particular view, you might have to create a specific model first. No big deal, right? That's until you realize that model has 4 foreign keys to other models, each of which has 2 foreign keys to some other models, etc. Would you, as a developer, really want to spend time creating and relating all these model instances every time you needed to test a view? I certainly wouldn't.

An example factory

Let's examine an example factory.

Suppose we define our factories in a factories.py file within our application directory. This means factories.py would be adjacent to models.py in our application.

Suppose our Django app lets users bookmark articles on the project's website. Our factory for the Bookmark model may look like so:

(factories.py)

import factory
from . import models

class BookmarkFactory(factory.django.DjangoModelFactory):
    user = factory.SubFactory('account.factories.UserFactory')
    article = factory.SubFactory('article.factories.ArticleFactory')
    description = 'Article bookmark'

    class Meta:
        model = models.Bookmark

What's so awesome about factories is how easy they are to make and work with. In our Bookmark example, we need both an article object and a user object to have a bookmark association. If we were to call BookmarkFactory() in a test, factory boy would automatically create a User object and an Article object for the association. And perhaps the User and Article models contain foreign keys to other models! As long as their factories are defined correctly, we don't have to worry about any of that! We get our object with essentially no effort, but we also have the ability to override values as desired.

Sequences: Not to be abused

Factory boy allows you to define sequences for your fields. These sequences allow you to generate similar but technically unique values across all instances of the factory within your test. Consider this example from the factory boy docs:

class UserFactory(factory.Factory):
    email = factory.Sequence(lambda n: 'person{0}@example.com'.format(n))

    class Meta:
        model = models.User


>>> UserFactory().email
'person0@example.com'
>>> UserFactory().email
'person1@example.com'

Sequences are pretty handy! They're especially useful for preventing conflicts on columns with unique constraints.

However, one word of caution: Don't rely on the sequenced value!

Most of the time, the sequence value (the 0 in 'person0@example.com') will be identical to the instance's pk value, but not always! The most frustrating issue I experienced related to this was a testcase that would pass when I tested it in isolation, but would fail when tested along with the rest of the app. My tests looked similar to the following:

from django.test import TestCase
from users import factories as user_f
import json

class UserViewTestCase(TestCase):
    def test_get_all_users(self):
        # No users yet, shouldn't see anything
        resp = self.client.get('/users/')
        data = json.loads(resp.content)
        expected = {'users': []}
        self.assertDictEqual(expected, data)

        # Now create a user and ensure it shows up in the view response
        user = user_f.UserFactory()

        resp = self.client.get('/users/')
        data = json.loads(resp.content)
        expected = {
            'users':[
                {
                    'id': user.pk,
                    'email': 'person{0}@example.com'.format(user.pk)
                }
            ]
        }

        # This would fail when the entire project test suite was run, but
        # would work fine when just this test or testcase was run!
        self.assertDictEqual(expected, data)

This test worked well in isolation, and even with the rest of the tests in the testcase. But if there were other testcases that had UserFactory instantiations in their setUp() methods, the sequence counter would deviate from the pk, causing the assertion to fail.
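The drift can be imitated with a toy model. This is only a sketch under stated assumptions: ToyUserFactory and fake_test are hypothetical names, the code is not factory boy's implementation, and it assumes every test starts from an empty table while the factory's counter lives on for the whole test run.

```python
import itertools


class ToyUserFactory:
    """Hypothetical stand-in for a factory; not factory boy's real code."""

    # Class-level counter: shared by every caller for the whole test run.
    _sequence = itertools.count()

    @classmethod
    def create(cls, table):
        n = next(cls._sequence)
        pk = len(table) + 1  # toy pk: next free row in this test's table
        row = {'id': pk, 'email': 'person{0}@example.com'.format(n)}
        table.append(row)
        return row


def fake_test(setup_users):
    """Simulate one test: fresh table, some setUp() users, then one more."""
    table = []  # assumption: each test starts from an empty table
    for _ in range(setup_users):
        ToyUserFactory.create(table)
    return ToyUserFactory.create(table)


# Another testcase runs first and creates two users in its setUp().
fake_test(setup_users=2)

# Our test runs afterwards and creates a single user.
user = fake_test(setup_users=0)

# Guessing the email from the pk, as the test above does, no longer works.
guessed_email = 'person{0}@example.com'.format(user['id'])
print(user['email'], guessed_email)
```

In this toy model the pk and the counter come from two independent sources, so anything that advances one without the other breaks an assertion that assumes they match.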

I could not determine exactly why this was the case - however, as discussed in the next section, I'm actually glad this issue arose. Even though tests using this strategy were very flexible, they were ugly to look at because of the sheer number of string interpolations in the expected values. In short, I made my tests too DRY. Tests that are ugly to look at cost you a major advantage of writing tests in the first place - the ability to use tests as living documentation.

Tests as living documentation

I looked through the factory boy source to determine the cause of the sequencing deviation. After a few hours, I came to the conclusion that I didn't even like the approach of using the object pk as a way to verify the values of other fields.

First of all, tests became ugly to look at. This particular project I was working on was an internal API for a client. I wanted their front end developers to have the ability to look at the testcases and determine what the return values of various API endpoints looked like. The front end dev was greeted with tests looking like so:

def test_stuff(self):
    ...
    ...
    resp = self.client.get('/users/')
    data = json.loads(resp.content)
    expected = {
        'users':[
            {
                'id': user.pk,
                'email': 'person{0}@example.com'.format(user.pk),
                'name': 'name{0}@example.com'.format(user.pk),
                'phone': '512-555-555{0}'.format(user.pk),
            },
            {
                'id': user2.pk,
                'email': 'person{0}@example.com'.format(user2.pk),
                'name': 'name{0}@example.com'.format(user2.pk),
                'phone': '512-555-555{0}'.format(user2.pk),
            },
        ]
    }
    self.assertDictEqual(expected, data)
    ...
    ...

If I were to pass fields to the factories explicitly, I could write my tests to look like this instead:

def test_stuff(self):
    ...
    ...
    resp = self.client.get('/users/')
    data = json.loads(resp.content)
    expected = {
        'users':[
            {
                'id': user.pk,
                'name': 'John Doe',
                'email': 'john@example.com',
                'phone': '512-555-5555'
            },
            {
                'id': user2.pk,
                'name': 'Jane Smith',
                'email': 'jane@example.com',
                'phone': '903-555-5555'
            },
        ]
    }
    self.assertDictEqual(expected, data)
    ...
    ...

This results in a much cleaner, much easier to read test. People don't like to read cluttered or confusing looking documentation. Unit tests are no different! If you want your tests to serve as documentation, they need to be easy to read, otherwise developers will ignore them.

This is why I've decided to restrict sequences to fields whose values must be unique across instances. I also do not compare against these sequentially generated values.

Summary

I thoroughly enjoy using factory boy for testing Django applications. In my excitement, I went a little sequence crazy! Factory boy already saves you loads of time and effort with the SubFactory feature - spending a little extra time to explicitly define values for the fields you would like to compare against results in cleaner, more useful tests.