PreSeeding: A faster way to run tests

Published Jun 7, 2024

As Rippling has grown in size, we've expanded our product line. Accordingly, our tests have grown in number and execution time. Our backend developers reported that it was common to start running a test, switch to working on something else, and come back two or three minutes later to view the test result. So, we decided to dive into our test framework and find a solution for them. Here’s how we freed our developers from slow tests and boosted their productivity in writing and running tests.

Anatomy of a RipplingTestCase

At Rippling, we use the pytest test framework, and a majority of our tests inherit from unittest.TestCase. I’ll briefly explain how these tests work before moving on to how our test base adds features on top of this.

import unittest

class TestCalculator(unittest.TestCase):
    def setUp(self):
        self.calculator = Calculator()

    def tearDown(self):
        del self.calculator

    def test_add(self):
        self.assertEqual(self.calculator.add(1, 1), 2)

    def test_subtract(self):
        self.assertEqual(self.calculator.subtract(1, 1), 0)

Sample unittest.TestCase

In the example above, we have a test class with setup and teardown hooks and two test units. When running this test class, for each unit, we will:

  1. Run the setUp function.
  2. Run the test.
  3. Run the tearDown function.
  4. Repeat this process for the second unit.

Order of operations for TestCalculator with unittest.TestCase

Most of our tests inherit from a modified version of unittest.TestCase called RipplingTestCase. Over time, we’ve added lots of custom functionality to this test base to assist in writing tests, but two behaviors are most pertinent to test execution time:

  1. RipplingTestCase manages setting up and tearing down the test database. 
  2. RipplingTestCase has snapshot-and-restore behavior so that setUp is run only once per class rather than once per test case. It snapshots DB state as well as some global and class variable state.

These optimizations exist because most of these tests are actually integration tests and not unit tests. They rely heavily on interacting with the test database through our ORM and generate a lot of data as part of setUp. Without this snapshot behavior, running one of our data-dependent test classes would take significantly longer as we’d need to generate the same data before each unit. 

Here’s how the previous test class would run if it inherited from RipplingTestCase. With the snapshotting behavior, we only ever run setUp once, whether we have one test unit or 100 in this class.

Order of operations for TestCalculator with RipplingTestCase
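
For intuition, here’s a minimal sketch of how this kind of snapshot-and-restore behavior could be structured. The snapshot_state and restore_state helpers are hypothetical stand-ins; the real RipplingTestCase does considerably more bookkeeping.

import unittest

# Hypothetical helpers standing in for the real DB + global/class-state
# snapshotting; the names and signatures are illustrative only.
def snapshot_state(test_cls):
    ...

def restore_state(snapshot):
    ...

class SnapshottingTestCase(unittest.TestCase):
    _snapshot = None

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Run the (expensive) setUp exactly once per class, then capture
        # the resulting state.
        cls().setUp()
        cls._snapshot = snapshot_state(cls)

    def _callSetUp(self):
        # unittest calls this hook before every test method. Instead of
        # re-running setUp, restore the state captured in setUpClass.
        restore_state(self._snapshot)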

This worked well for developers for a long time, keeping the setUp time associated with any test class relatively constant. Over time, though, these test classes added more complexity and data generation to their setUp functions. The result was that running setUp could take an unbounded amount of time. 

Test life cycle with RipplingTestCase: setUp time can grow

In addition to growing setUp time, we also have a fairly large application startup time of ~60 seconds. This is how long it takes to load our monolith Django application to prepare for running a test. We have adopted solutions, such as optimizing our application startup time and hot module reloading, to tackle the initial 60 seconds.

But it’s no wonder our developers were feeling the pain of running tests. Even for their fastest tests, they would have to wait a minute. On average, they waited several minutes to run even a single test unit.

The data

At this point, we had a strong suspicion that optimizing setUp times would improve our developers' experience, but we wanted to explore how pervasive slow setUp times were. To do this, we recorded the setUp times of all ~10,000 test classes in our CI system and sorted them by the length of time.
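
As an illustration, per-class setUp time can be captured with a small hook like the one below; our actual CI instrumentation is separate and more involved, and the reporting here (a print) is a hypothetical stand-in.

import time
import unittest

class TimedSetUpMixin(unittest.TestCase):
    def _callSetUp(self):
        # Time the setUp that unittest runs before a test method and
        # report it (here we just print; CI would record it centrally).
        start = time.perf_counter()
        super()._callSetUp()
        elapsed = time.perf_counter() - start
        print(f"{type(self).__name__}.setUp took {elapsed:.2f}s")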

Distribution of setUp times across our 10,000 test classes

This was surprising: over 25% of our tests took more than 30 seconds to setUp, 10% of our tests took more than a minute, and one test took 15 minutes to setUp! 

This data confirmed that setUp times were pretty slow across the board. If we could find a way to optimize them, we’d be able to make tests faster for developers to run locally. Making test execution faster this way could also apply to our CI system and result in compute and cost savings.

Investigating slow setUp

So far, we had identified that slow setUp times were a pervasive problem with our tests, but we wanted to understand why. In our case, slow setUp time was correlated with test data generation: why was it so slow to generate data?

On Changes

Rippling is a horizontal software platform supported by multiple product offerings. As a result, we have a wide graph of product-specific models that are connected by some central HR models like Employee and Company. In order to maintain data consistency, models can register to receive synchronous events triggered by state changes on other models. We call these “On Changes,” and they’re a convenience for developers to make multiple synchronous DB queries in the same thread to keep their denormalized models in sync.

Where On Changes fit on the data consistency scale

Although On Changes offer convenience in keeping data consistent, many of our core HR models have become inundated with On Change listeners, leading to a synchronous fanout of updates. These updates can also cascade into subsequent updates, making writes on certain models very expensive. 
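
To make the pattern concrete, here’s a hypothetical sketch of an On Change registration and its synchronous fanout. The decorator and listener names are illustrative, not Rippling’s actual API.

from collections import defaultdict

_listeners = defaultdict(list)

def on_change(model_name):
    # Register a listener to run synchronously whenever the named model
    # changes state.
    def decorator(fn):
        _listeners[model_name].append(fn)
        return fn
    return decorator

@on_change("Employee")
def sync_payroll_profile(employee):
    # Keep a denormalized, product-specific model consistent via
    # synchronous DB queries in the same thread.
    ...

def save_employee(employee):
    # ...persist the employee, then fan out to every registered listener.
    # Each listener runs synchronously, and its own writes can trigger
    # further On Changes, which is how updates cascade.
    for listener in _listeners["Employee"]:
        listener(employee)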

Single threading

Our wide data model—combined with synchronous fanout of On Changes—makes generating data for even a single model pretty slow. Most tests generate, at minimum, a company and a few employees in the company before generating more data that is specific to that test. 

We profiled generating a single employee in a test and found that it took two seconds, with 95% of the time spent executing On Changes! The execution time was split across hundreds of fanned-out On Changes, each individually fast but quite slow in total. We also found that most of this work is CPU-bound, so we couldn’t easily multi-thread these tasks. Because of CPython’s Global Interpreter Lock (GIL), only IO-bound tasks really benefit from multithreading, not CPU-bound tasks.
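
A small, self-contained demonstration (not Rippling code) of the GIL effect: the threaded run of a CPU-bound task takes roughly as long as the serial run, because only one thread can execute Python bytecode at a time.

import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound_task(n):
    # Stands in for a small, CPU-bound On Change handler.
    return sum(i * i for i in range(n))

start = time.perf_counter()
for _ in range(8):
    cpu_bound_task(2_000_000)
print(f"serial:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(cpu_bound_task, [2_000_000] * 8))
# Roughly the same wall time as the serial loop under CPython's GIL.
print(f"threads: {time.perf_counter() - start:.2f}s")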

Our slow test setUp times were starting to make sense. Generating even a minimum set of test data through our wide data model meant running hundreds of small, CPU-bound tasks in a single thread. Generating test data for multiple products, or generating lots of data for performance testing, quickly racked up setUp execution time.

Solutions

There were a few approaches we could take to optimize setUp times from what we saw above:

  1. Decouple our wide application architecture into isolated services.
  2. Migrate eligible usages of On Change to asynchronous workflows.
  3. Leverage multiprocessing in our application.

These are all active efforts led by engineering teams here at Rippling, but they’re outside the scope of this blog post.

To ease our developers’ immediate pain related to tests, we wanted a “quick and easy” win on setUp times that didn’t require fundamentally changing our application architecture. We leveraged caching of setUp data to achieve this.

PreSeed Test Framework

Recall that by using RipplingTestCase instead of unittest.TestCase directly, we were able to go from executing setUp N times to executing it just once when running all the test units in a test class. Even with this optimization, developers were having to wait around to run their tests. With the PreSeed Test Framework, we were able to remove setUp execution time (almost) completely.

PreSeeding improves test execution time by skipping setUp

The idea was to reuse the same DB snapshotting concept implemented in RipplingTestCase but persist it across test runs. You would have to “PreSeed” your database once, and then the PreSeedTestCase would restore the contents of that “PreSeed” to the test database before running each test unit.

Order of operations for TestCalculator with PreSeedTestCase

We knew from RipplingTestCase that the time to restore a snapshot before a test unit was pretty fast. So, as long as you populated the “PreSeed” ahead of time, you’d be able to run the same test multiple times without ever paying the setUp cost. The initial “PreSeed” generation cost would be amortized over each of those test runs.
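
In sketch form, the core of the framework might look like the following. register_seed, load_preseed, and restore_state are hypothetical stand-ins for the framework’s internals; only the shape of the idea is accurate.

import unittest

_seed_registry = {}

def register_seed(seed_id):
    # Associate a data-generation function with a unique seed_id.
    def decorator(fn):
        _seed_registry[seed_id] = fn
        return fn
    return decorator

def load_preseed(seed_id):
    # Hypothetical: load the snapshot persisted by an earlier "PreSeed"
    # run of the registered seed function.
    ...

def restore_state(snapshot):
    # Hypothetical: restore the snapshot into the test database.
    ...

class PreSeedTestCase(unittest.TestCase):
    seed_id = None  # subclasses name the registered seed to restore

    def _callSetUp(self):
        # Before each test unit, restore the persisted snapshot, then run
        # the (now cheap) setUp, which only queries data and sets class
        # attributes.
        restore_state(load_preseed(self.seed_id))
        self.setUp()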

This model fits well with how we know developers write and debug tests. It’s rare that you materially change the data that’s generated during a test, especially when debugging an existing test or adding a new test unit to a large test class. So you can just setUp the data once and skip ahead to running the units as many times as you want without paying that setUp cost again.

What happens if you make a code change that does affect data generation? This might happen if, for example, you add or change a model before running your test. In this case, you’d have to know to rerun the data generation step before running tests again. A good heuristic: if you’re only changing test code, you shouldn’t have to regenerate the “PreSeed” data, but if you’re changing models or data-generation helpers, you should regenerate once before running the test.

Migrating tests

PreSeedTestCase is not a drop-in replacement for existing tests. You can’t simply change a test to inherit from PreSeedTestCase, the way you could switch to RipplingTestCase, to cache the setUp. Consider the following example of a RipplingTestCase, which generates some company and employee data in its setUp and then asserts against that data in the test unit:

class TestSuite(RipplingTestCase):
    def setUp(self):
        self.company = generateCompany()
        self.employees = []
        for i in range(10):
            self.employees.append(generateEmployee(self.company))

    def test_employees(self):
        self.assertEqual(10, len(self.employees))

A RipplingTestCase that generates and tests against data

You could snapshot the data resulting from the setUp. But if you tried to restore it directly and skip setUp altogether, you wouldn’t be able to reference class members like self.company or self.employees in your test units. RipplingTestCase gets around this problem by doing some magic to snapshot global and class members after setUp, but it’s not always reliable and can lead to some flakiness. We didn’t want to reintroduce this pattern in PreSeedTestCase, so we opted for a different option.

Developers could refactor their setUp functions to run in two parts: data generation and data querying. Data generation runs in isolation: it takes no input, returns nothing, and doesn’t mutate the test class attributes in any way. Once data generation has run, you can query for the generated data and set class attributes with the results.

def my_seed_function():
    company = generateCompany()
    for i in range(10):
        generateEmployee(company)

class TestSuite(RipplingTestCase):
    def setUp(self):
        # data generation
        my_seed_function()

        # query data & set class attributes
        self.company = Company.objects.get()
        self.employees = list(Employee.objects())

    def test_employees(self):
        self.assertEqual(10, len(self.employees))

RipplingTestCase refactored in preparation for PreSeeding

With a test class structured in this backward-compatible way, PreSeeding is a very simple change. Register your seed function with the framework, remove data generation from your setUp, and specify which unique seed_id the test class should restore before every test run. PreSeedTestCase will handle the rest.

# seed function is defined and registered
# with the framework with a unique seed_id
def my_seed_function():
    ...

class TestSuite(PreSeedTestCase):
    # the test class is responsible for restoring the data
    # generated by this registered seed function ahead of setUp
    seed_id = "my_seed_function"

    def setUp(self):
        # data generation is skipped entirely

        # query data & set class attributes
        self.company = Company.objects.get()
        self.employees = list(Employee.objects())

    def test_employees(self):
        self.assertEqual(10, len(self.employees))

A refactored test case that now leverages PreSeeding

The framework also comes with a CLI, which runs the seed generation step outside of test execution.
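
Sketched in the same hypothetical terms as above, that generation step could look something like this; the real CLI’s name, flags, and persistence layer are internal to Rippling.

import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Generate a PreSeed snapshot outside of test execution."
    )
    parser.add_argument("seed_id", help="registered seed function to run")
    args = parser.parse_args()

    # Look up the registered seed function, run it against a clean test
    # database, and persist the snapshot for later restores.
    seed_fn = _seed_registry[args.seed_id]  # registry from the sketch above
    seed_fn()
    persist_snapshot(args.seed_id)  # hypothetical helper

if __name__ == "__main__":
    main()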

Adoption

While developing this test framework, we wanted to ensure that developers could easily migrate their tests—and that doing so would actually improve their local testing workflow. To prove this, we tried to convert some product teams’ test classes to be PreSeeded on their behalf and collected feedback on whether it improved their test experience.

We targeted test “base” classes with high setUp times and many child test classes. These were good candidates for PreSeeding because one code change would immediately improve the setUp times for a large number of child test classes that share the same setUp. 

We found our candidate with SpendManagementTestBase—a test base for our Spend Management product that took about two minutes to setUp and was inherited by 350 test classes. With one code change that refactored and PreSeeded this test class, we were able to reduce the setUp time for all 350 test classes by about two minutes. The time remaining in the setUp for this test base was just restoring the PreSeed snapshot and querying data from it to set class attributes.

The Spend Management team were early adopters of this framework, and all of their backend tests are now PreSeeded, saving them about two minutes per test run in local development. Since sharing this more broadly within our engineering organization, we’ve had a lot of positive feedback from developers and rapid adoption of PreSeeding for their slowest-to-setUp test suites. Today, around 13% of our test classes are PreSeeded, and this number is growing over time. As a result, our setUp time distribution has meaningfully shifted.

Distribution of setUp times across our 10,000 test classes before and after PreSeeding

A framework for faster tests

Our developers expressed frustration with slow test execution in local development. We collected data to validate their reports and found that tests were slow to setUp across the board. We identified a number of reasons for those slow setUp times, but no silver bullet that would speed up every test. So, we developed the PreSeed Test Framework, which caches setUp data across multiple test runs and effectively amortizes test setUp time to zero. By recruiting some early adopters, we proved the framework's value early on. Since releasing the framework broadly, we’ve seen adoption rise and time spent waiting for tests drop.

I mentioned earlier that any improvements to test execution time in local development could also translate to less time and money spent running tests in our CI system. This was true in the case of the PreSeed Test Framework. We built a corresponding PreSeed CI component, which generated PreSeed cached artifacts shared between builds, handled automatic seed invalidation, and improved local development PreSeeding even further. This will be the subject of a future blog post. Stay tuned!

If you love solving meaningful engineering problems, check out our open roles here!


last edited: June 7, 2024

The Author

Arvind Saripalli

Sr. Software Engineer

Arvind works on Site Performance and Architecture problems at Rippling.