How to Boost Code Coverage with Functional Testing

November 1, 2022 | Lev Neiman, Venkataramanan Kuppuswamy, Carlos Herrera, James Lamine

In this blog post, we introduce a functional testing approach that does not need any manual setup and can be run like unit tests locally or in a Continuous Integration (CI) pipeline. Specifically, this approach does the following: 

  • Helps catch and reproduce more bugs during local development, greatly reducing debugging time while improving early bug detection and release confidence.
  • Accelerates internal refactorings by testing API contracts without getting involved in internal implementation details.
  • Provides greater code coverage than traditional unit tests.
  • Serves as concise and human-readable documentation for business logic.

Functional tests vs. Unit tests

Before we dive into the details, let’s provide a background on the similarities and differences between functional tests and unit tests.

The following properties demonstrate some of the similarities between functional tests and unit tests:

  • Automated - Both can be run on a developer's local machine without any manual setup and both can be run in a CI pipeline.
  • Hermetic - Both avoid network calls to external dependencies.
  • Deterministic - If nothing related to the code being tested changes, then test results from both should not change.
  • Behavioral - Both will codify expected behavior and are sensitive to changes in the behavior of the code.
  • Predictive - If all of the tests pass, then the code is suitable for production; both prevent regressions when making code changes and can catch bugs before any code changes go to production.
  • Debuggable - Both help when debugging by providing a quick way for developers to run their code locally with a reproducible state. They also give developers the opportunity to debug their code in an integrated development environment (IDE) by adding breakpoints, inspecting code, and controlling execution.

The following properties demonstrate some of the differences between functional tests and unit tests:

  1. Test Scope 
  2. Refactoring Sensitivity
  3. Self Documenting

Unit and functional tests differ in Test Scope because functional tests focus on public API endpoints while unit tests focus on the implementation details. Functional tests will only exercise the public API endpoints, which will in turn exercise all layers of code including making real calls to databases and downstream dependencies. Unit tests will test each layer of a system by short circuiting immediate dependencies through mocking. Mocking is a process by which dependencies are replaced with an implementation that is controlled by the test. Note that functional tests avoid mocking by instantiating compatible dependencies in place of the real downstream dependencies. A compatible dependency could be an embedded database, an embedded message queue, or an embedded service that is listening for actual HTTP and gRPC requests.
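
To make the contrast concrete, here is a minimal sketch of the two styles. The LocationClient interface is hypothetical and the mock uses the mockito-kotlin library purely for illustration; it is not code from our services.

import org.mockito.kotlin.mock
import org.mockito.kotlin.whenever

// Hypothetical client interface, used only to illustrate the difference.
interface LocationClient {
    fun getUserArea(userId: Long): String
}

// Unit-test style: the dependency is replaced by a mock whose behavior is
// scripted by the test, so no real client or network code actually runs.
fun mockedLocationClient(): LocationClient {
    val client = mock<LocationClient>()
    whenever(client.getUserArea(42L)).thenReturn("USA")
    return client
}

// Functional-test style: keep the real gRPC/HTTP client and point it at an
// embedded server (gRPC Mock, WireMock, etc.) that answers real requests, so
// serialization, error handling, and client configuration are exercised too.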

These types of tests also differ in Refactoring Sensitivity because functional tests typically do not need to be updated when changing or refactoring internal Classes and Methods. Unit tests and their mocks need to be rewritten when internal Classes and Methods change, which can be very tedious even for minor changes. Functional tests only need to be updated when the API or business logic changes, when the APIs of downstream dependencies change, or when the database schema changes.

The last difference between these types of tests is in being Self Documenting: functional tests resemble API contracts because they only exercise the public API endpoints, while unit tests do not. Understanding unit tests requires internal knowledge of every Class and Method being tested, and unit tests are usually interspersed with mocking logic.

The following summarizes the pros and cons of Functional testing vs Unit testing.

A good rule of thumb is to cover all expected API behavior with functional tests first. Then write unit tests for internal utilities and libraries that are mostly static and require minimal mocks.

How we implemented Functional Tests

At DoorDash, services are written in Kotlin and use Guice for dependency injection. The APIs are exposed using gRPC, and the database is either Postgres, CockroachDB, or Cassandra. Tests are written using JUnit. JUnit is a test framework that allows developers to automatically run unit tests on their machines as well as in CI.

A functional test typically has the following steps:

  • Test setup: Spins up an instance of the service, cleans up databases, and anything else that may carry state between tests.
  • Prepare block: Sets up any required state in the database and stubs downstream services to set up a given scenario.
  • Act block: Sends a network request to the service API.
  • Verify block: Asserts on the network response and side-effects such as database changes.

Test setup

The majority of the work in writing functional tests lies in its setup. Since functional tests do not mock internal Classes and Methods, we needed to figure out how to write and execute functional tests in JUnit by spinning up the service along with compatible implementations of all of its dependencies. These are the strategies we followed for the test setup.

Use real databases which we can freely wipe

We used Testcontainers to spin up any database we want. docker-compose can also be used to spin up a database, but it has to be set up manually before running the tests. Testcontainers, on the other hand, can be set up and torn down programmatically by JUnit.
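
As a minimal sketch (assuming the Testcontainers postgresql module is on the test classpath; the image tag matches the example later in this post), a disposable Postgres instance can be started like this:

import org.testcontainers.containers.PostgreSQLContainer
import org.testcontainers.utility.DockerImageName

// Testcontainers launches a throwaway Postgres container in Docker and can
// tear it down programmatically when the test run finishes.
val postgres = PostgreSQLContainer<Nothing>(DockerImageName.parse("postgres:15.0"))

fun startTestDatabase(): String {
    postgres.start()
    // The container is exposed on a random free host port, so tests never
    // collide with a locally installed Postgres.
    return postgres.jdbcUrl
}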

Stub network responses instead of mocking classes and methods

Services at DoorDash interact with other services either using gRPC or REST, and we wanted to test those interactions as much as possible. We made use of gRPCMock to handle interactions with gRPC services instead of mocking calls to gRPC clients. Similarly, we use WireMock or MockServer to handle interactions with REST services instead of mocking calls to HTTP clients. All of these libraries spin up real servers and allow tests to set up responses to requests. This way, we test not only the service’s code but also the gRPC/HTTP client code it interacts with, thereby covering a lot more code than unit tests do.
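
For example, a REST dependency can be stubbed with WireMock roughly as follows. This is a sketch; the endpoint path and payload are illustrative, not a real DoorDash API.

import com.github.tomakehurst.wiremock.WireMockServer
import com.github.tomakehurst.wiremock.client.WireMock.aResponse
import com.github.tomakehurst.wiremock.client.WireMock.get
import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo
import com.github.tomakehurst.wiremock.core.WireMockConfiguration.wireMockConfig

fun stubLocationOverHttp(): WireMockServer {
    // Start a real HTTP server on a random free port.
    val wireMock = WireMockServer(wireMockConfig().dynamicPort())
    wireMock.start()

    // Any GET to this (illustrative) path now returns a canned JSON body,
    // while the service's real HTTP client code still runs end to end.
    wireMock.stubFor(
        get(urlEqualTo("/v1/users/42/area"))
            .willReturn(
                aResponse()
                    .withHeader("Content-Type", "application/json")
                    .withBody("""{"area": "USA"}""")
            )
    )
    return wireMock
}

The service's HTTP client configuration is then pointed at localhost and the port returned by wireMock.port() during test setup.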

Start a live version of the service

As the final step, we spun up a live instance of the gRPC service and used it to test our APIs. Most services use Guice for dependency injection, and we used the same Guice injector in tests, with some overrides to allow it to run locally. Depending on the tests, the overrides could be something as simple as injecting local configuration into some of our dependencies or something more sophisticated such as a custom feature flagging implementation that allows per-test overrides. While Guice overrides are convenient, we try to use them sparingly as they increase the difference between local and production environments. In addition to Guice, a lot of our services also have their configuration exposed as environment variables, and we made use of System Rules to set them up.
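
System Rules is a JUnit 4 library, so this sketch uses a JUnit 4 rule to set an environment variable before the service starts; the variable name is illustrative.

import org.junit.Rule
import org.junit.Test
import org.junit.contrib.java.lang.system.EnvironmentVariables

class EnvironmentVariableConfigTest {
    // The rule restores the original environment after each test.
    @get:Rule
    val environmentVariables = EnvironmentVariables()

    @Test
    fun `service reads its config from the environment`() {
        // Illustrative variable name; a real service sets whatever keys it
        // reads at startup before the injector is created.
        environmentVariables.set("LOCATION_SERVICE_HOST", "localhost")
        // ... instantiate the service and assert on behavior ...
    }
}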

Create a gRPC Client to connect to the service

In order to test our gRPC service, we need to be able to make calls to it. We created a gRPC client and pointed it at the service spun up in the previous step.
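
A sketch of building such a client over a plaintext channel; the generated SubscriptionServiceCoroutineStub class is taken from the simplified example later in this post.

import io.grpc.ManagedChannelBuilder

// Build a channel to the locally running service; `port` comes from the
// test setup that started the gRPC server.
fun buildTestClient(port: Int): SubscriptionServiceGrpcKt.SubscriptionServiceCoroutineStub {
    val channel = ManagedChannelBuilder
        .forAddress("localhost", port)
        .usePlaintext() // no TLS needed for a local test server
        .build()
    return SubscriptionServiceGrpcKt.SubscriptionServiceCoroutineStub(channel)
}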

The test setup strategies above ensured that the service was tested in an environment as close to production as possible, giving us high confidence in how it would function in production.

Prepare Block

After the test setup process is complete, we set up state in the DB and stub responses from downstream services for the scenario being tested.

Act Block

In order to test a scenario or an API, we make API call(s) to the service using the gRPC client created in the test setup.

Verify Block

We assert that the response from the API call is as expected. If necessary, we also query the DB state to verify that it is as expected and check the downstream stubs to verify that calls were made to the downstream services with the right inputs.

A sample functional test

Let’s consider a simplified version of the DoorDash Subscription Service with an API that subscribes new users to a subscription plan. As input, the API takes the ID of the user subscribing and the ID of the plan being subscribed to. As output, the API will either return an error if the user is ineligible to subscribe to the plan, or a success result with custom text to be shown to the user.

In order to determine subscription eligibility, the Subscription Service will need to:

  • Query Location Service, an internal gRPC service, to check if the plan is available at the user’s location.
  • Query the Subscription Service Postgres database to see if the user already has an active subscription. There can only be one active subscription at a time!
The Subscribe API checks the user's eligibility, verifies the user is not already subscribed, and persists a new subscription for the user.

Like many other services in DoorDash, Subscription Service is written in Kotlin, exposes its APIs via gRPC, and uses Guice for dependency injection. We will use simplified pseudo-Kotlin code for this example.

Our service has a function that is an entry point to instantiate the Guice injector and start the gRPC server.

fun instantiateAndStartServer(guiceOverrides: Module): Injector {
    val guiceInjector = Guice.createInjector(
        Modules.override(SubscriptionGrpcServiceModule()).with(guiceOverrides)
    )
    val server = guiceInjector.getInstance(SubscriptionGrpcServer::class.java)
    server.start()
    return guiceInjector
}

Test setup using a base Class

As we already described above, we need a way to start up a database using Testcontainers, stub network responses using GrpcMock, start a live version of the Subscription Service, and create a gRPC Client to connect to the running service. To do so, we defined a base Class to make setting up, configuring, and reusing resources easier between tests.

open class AbstractFunctionalTestBase {
    companion object {
        // All fields in companion object are static
        // and can be reused between instances.

        val postgresContainer =
            PostgreSQLContainer(DockerImageName.parse("postgres:15.0"))

        val locationService = GrpcMockServer("location-service")

        val guiceOverrides = Module {
            // Use locally running Postgres instance.
            it.bind(PostgresConfig::class.java).toInstance(
                PostgresConfig(
                    host = postgresContainer.host,
                    port = postgresContainer.port,
                    user = "root",
                    password = ""
                )
            )

            it.bind(LocationServiceClientConfig::class.java).toInstance(
                LocationServiceClientConfig(
                    host = "localhost", port = locationService.port
                )
            )
        }

        val guiceInjector = instantiateAndStartServer(guiceOverrides)
    }

    val postgresClient: PostgresClient =
        guiceInjector.getInstance(PostgresClient::class.java)
    val subscriptionServiceGrpcClient =
        SubscriptionServiceGrpcKt.SubscriptionServiceCoroutineStub(host = "localhost")

    @BeforeEach
    fun beforeEachTest() {
        // Delete any rows written by previous tests.
        postgresClient.flushAllTables()
    }
}

Prepare, act and verify blocks for a happy path

Now that we've set up our base Class, we can write tests that use the same Postgres client as production to write data and that define response stubs for our requests to dependencies. Let's start with a test of the happy path: our test will create a test subscription plan, create a test user located in the USA, subscribe the user to the plan, and make sure that the subscription record created in the DB is as expected. A happy path test might look like this:

class SubscriptionFunctionalTests : AbstractFunctionalTestBase() {
    @Test
    fun `eligible user should subscribe to monthly plan`() {
        // Prepare: A DoorDash user that is eligible for a monthly plan.
        val monthlyPlan =
            postgresClient.writeNewPlan(type = "Monthly", area = "USA")
        val user = postgresClient.writeNewUser(email = "[email protected]")
        locationService.stubFor(
            method = GetUserAreaRequest,
            request = """
                {"user_id": ${user.id}}
            """,
            response = """
                {"area": "USA"}
            """
        )

        // Act: We call subscribe.
        subscriptionServiceGrpcClient.subscribe(
            """
                {"user_id": ${user.id}, "plan_id": ${monthlyPlan.id} }
            """
        )

        // Verify: User should be subscribed.
        val subscription = postgresClient.getSubscription(userId = user.id)
        assertEquals("active", subscription.status)
    }
}

Since we are reusing the same control flow as in production, code coverage for this test will include any internal components used to retrieve, write, and transform data, thus reducing the need for finer-grained unit tests, while providing a readable, high-level test that documents the happy path business logic of this endpoint.

Functional tests allow us to quickly reproduce and fix bugs

The functional testing approach allows us to quickly reproduce and fix bugs with the confidence that we can add a new functional test that will exercise all layers of the code at once. Imagine that DoorDash is launching a new type of plan in the US that renews annually. However, after launch, we get reports of users in Canada subscribing to this plan and discover that there is a bug in the location check logic in the Annual Plan implementation.

The fastest way to both reproduce and fix the bug is to write a functional test that reproduces the issue, then modify our code until this new functional test passes. Since we can quickly recompile the code and re-run the tests on our development machines, we can roll out a fix quickly!

The new functional test would look like this:

@Test
fun `users in canada should not be able to subscribe to annual plan`() {
    // Prepare: An annual plan which is available only in USA, and a user from Canada.
    val annualPlan = postgresClient.writeNewPlan(type = "Annual", area = "USA")
    val canadaUser =
        postgresClient.writeNewUser(email = "[email protected]")
    locationService.stubFor(
        method = GetUserAreaRequest,
        request = """
                {"user_id": ${canadaUser.id}}
            """,
        response = """
                {"area": "CA"}
            """
    )

    // Act: A user from Canada tries to subscribe to the annual plan.
    val response = subscriptionServiceGrpcClient.subscribe(
        """
                {"user_id": ${canadaUser.id}, "plan_id": ${annualPlan.id} }
            """
    )

    // Verify: We should get an error response back and not create any subscription.
    assertEquals("error", response.getError())
    assertNull(postgresClient.getSubscription(userId = canadaUser.id))
}

Challenges we faced and how we overcame them

Spinning up a real instance of our app, a real database, and stub gRPC servers takes significantly longer than just mocking them. To help with this, we made JUnit extensions that ensure we spin up only one instance of each resource, such as our app, the database, and gRPC Mock, and that any state left over from the previous test is cleaned up before the next test is run. The extensions look very similar to the “AbstractFunctionalTestBase” class in the example above, but with more bells and whistles for ease of use and for cleaning up application state.
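
A rough sketch of such an extension using JUnit 5's root store, which guarantees the startup code runs only once per test run. The startSharedResources function here is a placeholder standing in for starting the app, database, and stub servers.

import org.junit.jupiter.api.extension.BeforeAllCallback
import org.junit.jupiter.api.extension.ExtensionContext

class SharedResourcesExtension : BeforeAllCallback {
    override fun beforeAll(context: ExtensionContext) {
        // The root store is shared across every test class in the run, so
        // getOrComputeIfAbsent only invokes the lambda for the first class
        // that uses this extension; later classes reuse the same resources.
        context.root
            .getStore(ExtensionContext.Namespace.GLOBAL)
            .getOrComputeIfAbsent("shared-resources") { startSharedResources() }
    }

    // Placeholder for the real startup logic (Testcontainers database,
    // gRPC Mock servers, and the service itself).
    private fun startSharedResources(): Any = Any()
}

Test classes can then opt in with @ExtendWith(SharedResourcesExtension::class) rather than extending a base class.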

Another big challenge is the flakiness of tests due to sharing local databases and application instances. To address this, the following measures can be taken:

  • Clean up the state of the app and its dependencies before launching the application.
  • If the act block of the test spawns asynchronous jobs, such as new threads, workflows, etc., they should be joined before moving on to the verify block. We also used a library such as Awaitility to test such interactions; see the sketch after this list.
  • Tests need to be run sequentially and are not thread-safe. We are looking into how to safely run them in parallel.
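
Here is a sketch of waiting for an asynchronous side effect with Awaitility before asserting; PostgresClient and getSubscription follow the simplified Subscription Service example above.

import java.time.Duration
import org.awaitility.Awaitility.await
import org.junit.jupiter.api.Assertions.assertNotNull

// Poll until the background work has persisted the subscription, or fail
// after five seconds, instead of sleeping for a fixed amount of time.
fun waitForSubscription(postgresClient: PostgresClient, userId: Long) {
    await()
        .atMost(Duration.ofSeconds(5))
        .untilAsserted {
            assertNotNull(postgresClient.getSubscription(userId = userId))
        }
}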

Ideally, only network responses are stubbed, but sometimes that's not practical without a lot of extra work. In those cases, we might be tempted to use mocks. Instead, a good middle ground is to implement a fake class. For example, we found that it was easier to fake our internal experimentation library so we could change feature flags seamlessly between tests. In a lot of scenarios, it was also easier to inject fake configuration classes instead of setting up separate config files and/or environment variables. However, overrides and fakes are still discouraged, and we are working on eliminating the need for them, though that will require more time.
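
As an illustration, a fake for a feature-flag style dependency might look like the following sketch; the interface and method names are hypothetical and do not reflect our actual experimentation library.

// Hypothetical feature-flag interface that the service depends on.
interface FeatureFlagClient {
    fun isEnabled(flag: String): Boolean
}

// A hand-written fake: the service's real control flow is preserved, but
// each test can flip flags without any network calls or mocking framework.
class FakeFeatureFlagClient : FeatureFlagClient {
    private val overrides = mutableMapOf<String, Boolean>()

    fun enable(flag: String) { overrides[flag] = true }
    fun disable(flag: String) { overrides[flag] = false }
    fun reset() = overrides.clear()

    override fun isEnabled(flag: String): Boolean = overrides[flag] ?: false
}

The fake is bound in place of the real client through a Guice override, and each test calls enable()/reset() in its prepare block.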

Finally, in order to make it easy for teams and services to adopt this testing approach, we did the following:

  • Provided JUnit extensions that make it easy to write functional tests.
  • Provided extensive documentation on how to write functional tests. 
  • Built common helper functions to set up the state for tests including stubbing gRPC calls and populating database tables.

Results

Our development process and developer happiness improved a lot, as it is now possible to quickly set up a test scenario locally, and then keep running and re-running tests, which is very helpful when debugging. Developers also liked that they can use their IDE to debug and inspect an API execution path end to end.

Code coverage also went up significantly, by as much as 20% in some services, simply because functional tests cover a lot more code than unit tests do. This also boosted our confidence in releasing our code to production.

Most of the new tests are in a functional style and serve as living documentation of our API contracts. In many cases, we also do not have to write additional unit tests for internal implementation details as long as the functional tests cover all possible business scenarios.

Conclusion and future work

While it initially took considerable effort, we found that developers report higher velocity and happiness once functional tests are adopted. For the Subscription Service, more than 600 tests currently cover a myriad of business cases.

This approach has been adopted by several other services, and we plan to continue developing tooling and documentation to encourage further adoption. We are also working on approaches to run these tests in parallel and reduce the total time taken to run them sequentially.

We are also working on integrating Filibuster into our functional tests. This will provide further value by discovering and codifying the faults that our services should tolerate.

About the Authors

  • Lev Neiman

    Lev Neiman has been a Software Engineer at DoorDash since June 2021, working on the DashPass platform team, where he helps scale DashPass in volume and complexity.

  • Venkataramanan Kuppuswamy
  • Carlos Herrera

    Carlos Herrera has been a Software Engineer at DoorDash since December 2018, working on the Developer Platform team, a sub-team of the Infrastructure Engineering team. He has increased developer productivity by authoring onboarding bootcamps (including how to create a backend unit test), hosting weekly learning sessions on infrastructure topics such as AWS and Kubernetes, simplifying deployment strategies using Terraform, Helm, and Argo Rollouts, sunsetting legacy code, and making testing in production a reality. He holds a Bachelor of Science in Computer Science and Engineering from the University of California, Davis.

  • James Lamine
