
At DoorDash, providing a fast, on-demand logistics service would not be possible without a robust computing infrastructure to power our backend systems. After all, it doesn’t matter how good our algorithms and application code are if we don’t have the server capacity to run them. Recently, we realized that our existing Heroku-based infrastructure wasn’t meeting our needs and that we would need to upgrade to Amazon Web Services. With the help of an up-and-coming technology called Docker, we were able to make this transition with far less time and effort than would otherwise have been possible.

DoorDash was originally hosted on Heroku because it was a simple and convenient way to get our app up and running. Instead of worrying about the low-level complexities of server infrastructure, we could focus our time on developing product features. However, using a “platform-as-a-service” was not without tradeoffs, and as our traffic scaled up, we started to face some serious problems and limitations with Heroku.

  • Performance: The most pressing issue was the lackluster performance of Heroku server instances (aka “dynos”). Each dyno is extremely constrained in its CPU performance and memory resources (which is not surprising considering that Heroku runs multiple dynos on a single Amazon EC2 instance). Even after extensive performance tuning of our Django app, we were still forced to run a lot more dynos than we would have liked and didn’t see how this would continue to scale.
  • Cost Efficiency: Heroku dynos were very expensive for the computing resources we were getting. For roughly the same price as a Heroku “2x” dyno with 1GB RAM, we could have rented an Amazon c3.large EC2 instance with 3.75GB RAM.
  • Reliability: Surprisingly, we found that Heroku was plagued by reliability issues. An outage in Heroku’s deployment API would seem to pop up every week or two. One memorable half-day incident prevented us from pushing a critical hotfix and permanently eroded our trust in the platform.
  • Control: With Heroku, you lose fine-grained control and visibility over your servers. Installing custom software in your server stack is far from straightforward, and it’s not possible to SSH into a server to debug a CPU or memory issue.

In order to overcome these issues, we knew that we needed to move off of Heroku and find a new hosting provider. The logical upgrade choice was Amazon Web Services and its Elastic Compute Cloud (EC2) service. Amazon EC2 virtual server instances come in a wide variety of CPU and memory configurations, feature full root access, and offer much better performance-per-dollar than Heroku.

While AWS looked like an attractive solution, migrating from a high-level, managed platform provider like Heroku can be a daunting task. Simply put, there is a nontrivial amount of work needed to administer a cluster of servers and set up continuous code deployment. To automate the process of setting up our servers, we would normally need to adopt “configuration management” software such as Chef or Puppet. However, these tools tend to be clunky and require learning a domain-specific language. We would need to write scripts to perform tasks such as installing all of our app’s third-party dependencies or configuring all the software in our server stack. And to test that this code worked, we would need to set up a local development environment using Vagrant. All of this added up to a significant amount of complexity that did not look appealing.

Docker

We wondered if there was an easier way to get onto AWS, and had been hearing a lot about this relatively new virtualization technology called Docker (www.docker.com). Docker allows you to package an application, complete with all of its dependencies, into a “container” environment that can run in any Linux system. Because Docker containers are simply snapshots of a known and working system state, it’s finally possible to “build once, run anywhere”.

Docker containers encapsulate a virtual Linux filesystem, providing much of the portability and isolation offered by a virtual machine. The big difference is in the resource footprint. Whereas a virtual machine needs to run an entire operating system, containers share the host machine’s Linux kernel and only need to virtualize the application and its dependent libraries. As a result, containers are very lightweight and boot up in seconds instead of minutes. For a more technical explanation about containers, check out this article from RightScale.

Implementation

After learning what Docker was capable of, we knew that it could play a key role in accelerating our AWS migration. The plan was to deploy Docker containers running our Django app onto Amazon EC2 instances. Instead of spending effort configuring EC2 hosts to run our app, we could move this complexity into the Docker container environment.

Building the Docker Image

The first step was to define the Docker image that would house our app. (To clarify some terminology, a Docker “image” is basically a template for containers, and a container is a running instance of an image). This mostly involves writing a simple configuration file called a Dockerfile. In contrast to complex Chef or Puppet scripts written in a custom DSL, a Dockerfile closely resembles a series of Bash commands and is easy to understand.

Docker images are composed in layers and can be based on other images. As a starting point, you would typically use a stable Docker build of a popular Linux distro such as Ubuntu or CentOS. These “base images” are hosted on Docker Hub, a public repository of predefined Docker images.

In the case of our Django app, most of the work was figuring out how to get tricky third-party Python dependencies to install (particularly those with C-extensions) and setting up the more complex software components in our stack (such as our web server or a database connection pooler). Normally this process can be tedious and filled with trial and error, but being able to test everything in our local environment was a huge boon. It takes next to no time to spin up a new Docker container, allowing for a super fast development pace.
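
To give a concrete picture, here is a minimal Dockerfile sketch for a Django app like ours. The base image, package names, paths, and the gunicorn command are illustrative assumptions, not our actual production configuration.

Dockerfile

# Start from a stable Ubuntu base image hosted on Docker Hub
FROM ubuntu:14.04

# System packages needed to compile Python dependencies with C-extensions
RUN apt-get update && apt-get install -y \
    python python-dev python-pip \
    build-essential libpq-dev

# Install the app's third-party Python dependencies
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Copy in the application code
COPY . /app
WORKDIR /app

# Serve the Django app with gunicorn (illustrative module path and command)
EXPOSE 8000
CMD ["gunicorn", "app.wsgi:application", "--bind", "0.0.0.0:8000"]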

Preparing the Docker Environment

The second step was to set up a Docker runtime environment on EC2. To save time, we decided to use AWS OpsWorks, a service that comes with a built-in Chef layer to help manage EC2 instances. While we couldn’t avoid Chef entirely, we didn’t have to spend as much time wrangling with it because we had already defined the vast majority of our system configuration inside our Docker image.

Our code deployment flow was straightforward, and mostly consisted of building a Docker image off our latest codebase and distributing it to our server instances, with the help of a Docker image server. We only needed to write a few short Chef scripts to download the latest Docker image build and start up a new Docker container.

Our Docker container-based deployment flow.

Results

From conception to completion, our migration from Heroku to AWS took two engineers about one month. Amongst other things, this included learning Docker, AWS and Chef, integrating everything together, and testing (and more testing). What we appreciated most about Docker was that once we got a container working locally, we were confident that it would work in a production environment as well. Because of this, we were able to make the switch with zero glitches or problems.

In our new Docker + AWS environment, we achieved more than a 2x performance gain over our Heroku environment. DoorDash’s average API response time dropped from around 220ms to under 100ms, and our background task execution times were cut in half as well. With more robust EC2 hardware, we only needed to run half as many servers, cutting our hosting bill dramatically. The extra degree of control over our software stack also proved useful: it allowed us to install Nginx as a reverse proxy in front of our application servers and improve our web API throughput.

Final Thoughts

Needless to say, we were pretty happy about our decision to move to AWS, and Docker was a key reason why we were able to do it quickly and easily. We were able to greatly improve our server performance and gain much more control over our software stack, without incurring a ton of sysadmin overhead. Being able to develop and tweak a Docker container locally, knowing it will run exactly the same in any environment, is a huge win. Another big advantage is portability: rather than being too tied to our existing hosting provider, we can easily deploy to any other cloud infrastructure in the future.

It’s clear that Docker is a powerful technology that should empower developers to take control over how they deploy their apps, while shrinking the marginal benefit that managed PaaS solutions like Heroku provide. We’re definitely looking forward to seeing what else we can do with it.

– Alvin Chow

At DoorDash, we’re building more than just an app. We’re building a system of products to enable on-demand delivery for local cities.

People don’t use DoorDash because we have a pretty, easy-to-use app that allows you to order food. People use DoorDash because we provide the fastest and most reliable delivery service. At the end of the day, people aren’t purchasing pretty pixels but an amazing delivery experience: does food show up fast and on time, consistently? As a result, for the past year, most of our resources have actually been focused on developing the underlying logistics technology that fulfills deliveries in real time.

Building such a system can get pretty complicated, especially at scale. There are many sides involved in this delivery ecosystem: consumers, drivers, and merchants. Traditionally, most delivery companies have focused on only one of the three sides. For example, there are lead-generation companies, like Grubhub and Seamless, that merely pass delivery orders on to restaurants. On the other hand, there are many local courier services that only focus on providing drivers.

But we took a different approach. We started DoorDash because we wanted to build the best local delivery service. From first principles, the only thing that made sense was to build a full stack delivery service. By partnering with merchants, contracting our own fleet of drivers, and building our own logistics software, we were able to control the entire delivery experience to make it more efficient for everyone.

Taking such a “full stack” approach offers us several benefits:

  • Superior experience. Instead of being at the mercy of individual merchants, we can design the delivery experience the way we want it to be
  • Better pricing. Instead of putting the full burden on one side, we share the delivery cost: we can charge lower delivery fees for consumers and take lower commissions from the merchants
  • Increased efficiency. Since we have deep integration with merchants and drivers, we can fulfill the same number of deliveries in less time and with far fewer drivers, by eliminating inefficient transactions and driver downtime

System of Products

The challenge with a full stack approach is that we have to build for many audiences: a system of products that requires all sides to work together. At DoorDash, there are four sets of products:

Consumer

This is the side that most of you are familiar with: the web and mobile interfaces that allow consumers to place orders. This is the simplest and most straightforward product to build: press a button and food shows up.

Merchant

Once an order gets placed, we have to send that to the merchant. This requires us to build products around merchant integration: what is the fastest and cheapest way to send orders to restaurants (without a human manually calling in orders)? The challenge here is designing for a wide range of merchants: everything from a mom-and-pop shop up to national chains. There is no one solution that fits all.

Driver

Next, the order needs to be assigned to a driver. The main problem we have to solve here is how to use mobile to coordinate an on-demand workforce. We’ve built software that drivers can install on their phones, allowing them to accept orders whenever they have downtime. There are many interesting challenges around fleet coordination and supply matching.

Dispatch

This is the central brain that monitors and controls everything that goes on in our system: deciding when to send orders to merchants, which drivers to assign orders to, and alerting operations when things go wrong. Dispatch is by far the most complicated piece of technology we’ve built and forms the lifeblood of DoorDash.

Areas to Design For

Having spent the last year building out our system, there are several areas we’ve learned to design around:

Mobile

All three of our user-facing products are mobile. Even two years ago, something like DoorDash wouldn’t have been possible, because smartphone adoption wasn’t pervasive. Now everybody has a supercomputer in their pockets, and we can tap into that potential.

Mobile has enabled us to design an efficient delivery network that eliminates the need for heavy infrastructure and fixed costs. Everything becomes on-demand.

Communication

The complexity in our system arises because there are so many moving pieces involved in a delivery. It is just as important to think through the interactions and protocols between products as it is to design the individual products themselves.

For example, if a consumer changes the order, we need to alert the restaurant. If the restaurant experiences a kitchen holdup, we need to reassign the driver (rather than have them waste time sitting at the restaurant) and notify the consumer. If a delivery goes wrong, dispatch needs to figure out in real time what happened (did the driver spill the food, or did the kitchen miss an item?) and take the appropriate actions.

The key is that all the products are constantly talking to each other and working together to make a delivery happen. Now, multiply that by 100,000 deliveries at the same time, and you’ve got yourself an interesting operations problem with scale, automation and efficiency.

Modularity

Taking a system approach to designing DoorDash lends itself very well to modularity. This is an integrated system, yet it consists of independent products. If we wanted to deliver flowers tomorrow, we could simply swap out the consumer piece while still using the same dispatch and driver technologies.

DoorDash was never just a food company, and it certainly won’t be one moving forward. We designed it from day one to be a generalizable delivery network. We’re only scratching the surface of the potential this system can offer. As we continue our phenomenal growth, we’ll have to continuously evolve our system of products to keep up with scale. And hopefully, if we do our jobs right, this will become a system that will one day power urban logistics in cities all over the world.

– Stanley

Here at DoorDash, we’re tackling the problem of real-time delivery by integrating all the players involved into our logistics platform. By controlling the entire stack of the delivery process, we believe we’re positioned to provide the most consistent logistics service. In order to provide integrated logistics for our restaurant food delivery service, we’ve had to build out three different user-facing products: for customers (to order food), drivers (to deliver the food), and restaurants (to make the food).

What does that have to do with APIs? Well, having to build a back-end that communicates well with three different kinds of users simultaneously requires a disciplined approach to implementing our internal APIs.

I wanted to share a few lessons I’ve learned about how our engineering team at DoorDash has handled these challenges. We’re using Django as our web framework and Django REST framework to build out our internal APIs (mainly because its browsable web API makes testing so easy!). However, the principles laid out here are generalizable beyond our specific programming language choices.

Not tryna read the whole thing? We got you covered: tl;dr

Our First Approach

Since the actions our APIs enable are so closely related to our data models, we initially created a single resource for each model (Django REST framework provides a ModelSerializer class that made it easy for us to do this):

customers/resources.py

from rest_framework import serializers  # Django REST framework


class OrderResource(serializers.ModelSerializer):
    class Meta:
        model = Order
        fields = (
            'id',
            'subtotal',
            'commission',
            'tip_amount',
            'items',
            # ... etc. ...
        )

This was the obvious way to connect our data to our internal APIs, but we quickly ran into a serious issue: each of our apps needed different information from the same model. For instance, we didn’t want our delivery drivers to see the tip_amount, because we want our drivers to treat all customers equally. Another example is that we didn’t want to expose to the customer the commission that we’re getting from the restaurants. To account for these nuances, we started including all the information in the serialized order and making each app responsible for deciding what data to show. But such a solution required every app to know what data it should be accessing, which created a serious privacy concern, so we quickly abandoned this one-model-to-one-resource approach.

The Right Approach

We learned that in order for DoorDash to be able to continue to safely and effectively share data across different apps, we needed to enforce modularity of the API-accessible data between applications. Thus, for each model, we created a separate resource for each relevant application:

customers/resources.py

class CustomerOrderResource(serializers.ModelSerializer):
    class Meta:
        model = Order
        fields = (
            'id',
            'subtotal',
            'tip_amount',
            ...
        )

drivers/resources.py

class DriverOrderResource(serializers.ModelSerializer):
    class Meta:
        model = Order
        fields = (
            'id',
            'subtotal',
            ...
        )

restaurants/resources.py

class RestaurantOrderResource(serializers.ModelSerializer):
    class Meta:
        model = Order
        fields = (
            'id',
            'subtotal',
            'commission',
            ...
        )

It seemed a bit cumbersome at first to have three separate resource classes for the same model, but it paid off big time. Prefixing each resource with the application it’s meant to be used with (reinforced by Django’s application directory structure) made it easy for our engineering team to relate each resource to the correct app.

We also namespaced our API URLs to reinforce the modularity between applications even further. For instance, accessing an order from the customer-facing API uses api.doordash.com/customer/order/<id>/, whereas the driver-facing API uses api.doordash.com/driver/order/<id>/.
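
As a rough sketch of how this namespacing can be wired up (assuming a reasonably recent version of Django; the module paths and URL patterns here are illustrative, not our actual code), the top-level URL configuration might look something like this:

api/urls.py

from django.conf.urls import include, url

# Each application exposes its own routes (e.g. order/<id>/) backed by its
# own app-specific resources, and is mounted under its own URL prefix.
urlpatterns = [
    url(r'^customer/', include('customers.urls')),
    url(r'^driver/', include('drivers.urls')),
    url(r'^restaurant/', include('restaurants.urls')),
]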

With the separate API resources and the namespaced URLs, we’re now assured that all our apps are accessing the data that they’re supposed to be seeing. Plus, by embedding the data-access policies within the resource layer, privacy is automatically enforced by the API infrastructure.

We could have used fine-grained permissions logic to filter out the information we didn’t want to expose, but it would have created unnecessary complexity for our entire back-end. We’ve found that treating each application as its own module made our lives as developers a lot easier.
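
To illustrate how the pieces fit together, here is a minimal sketch of a driver-facing endpoint built with Django REST framework’s generic views; the view name, queryset, and import paths are illustrative assumptions rather than our actual code:

drivers/views.py

from rest_framework import generics

from orders.models import Order  # illustrative import path
from .resources import DriverOrderResource


class DriverOrderDetail(generics.RetrieveAPIView):
    # The serializer, not the client app, decides which Order fields a driver
    # can see, so the data-access policy is enforced at the resource layer.
    queryset = Order.objects.all()
    serializer_class = DriverOrderResource

The corresponding route in drivers/urls.py would then map order/<id>/ to this view, producing the api.doordash.com/driver/order/<id>/ URL described above.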

tl;dr

  • Don’t tie your API representation directly with your data models
  • Different applications have different information needs
  • Use a layer of abstraction (API resources) to hide your internal data representation from the users of your API
  • Use multiple API resources for the same model (or database table if you’re not using an ORM) to help enforce modularity between applications
  • Namespaced URLs make the modularity between applications explicit
  • Making data-access policy choices at the API resource layer allows privacy to be automatically enforced by the API infrastructure
  • Similarly, enforcing modularity eliminates the need for complex permissions logic
  • Treat your internal APIs as if they were external-facing — it will force you to stick to good RESTful design principles

We by no means know all the answers when it comes to RESTful API design and implementation, so please contact us if you have any feedback on this blog post.

Interested in helping build out RESTful APIs for the world’s first on-demand delivery company? We’re hiring!