2020 Hindsight: Building Reliability and Innovating at DoorDash

Throughout this exceptionally tough year, many more people came to appreciate ordering food through DoorDash, restaurants shifted from indoor dining to delivery, and Dashers, our term for drivers, found greater demand for their services. Our engineering organization worked tirelessly to improve our platform’s reliability and efficiency to serve these customers, launching new initiatives while continuing to build on our long-term strategy.

Highlights from this year include continued work on our microservices architecture, including migrating business logic out of our original codebase, a process begun in 2019 that improved our reliability metrics on a platform facilitating millions of deliveries per day. To support the many data-driven aspects of our business, we built new pipelines and found other ways to improve our data infrastructure’s speed, reliability, and usability.

One big project involved integrating the recently acquired Caviar into our platform, which challenged our engineers to figure out how to support two distinct brands on the same backend. We also launched new consumer-facing features, such as food pick-up, that required frontend innovation.

Our Data Science team pushed the envelope, finding ways to improve our forecasting abilities, ultimately leading to better delivery experiences not only for consumers, but also for Dashers and restaurants. Beyond delivery improvement, Data Science projects permeate DoorDash, improving efficiency company-wide.

Given that we have a lot more engineering work in front of us, we’re expanding our team. Most notably, we opened a new engineering office in Seattle to support specific business initiatives and offer new employment options for many talented engineers.

The challenges of 2020 have been unique, but our efforts have made DoorDash’s platform stronger and more reliable, ready to support our business for years to come. We accomplished much more than these highlights recount, and we detail much of it on the DoorDash Engineering blog.

Moving to microservices

The continued growth of DoorDash’s business led us to realize in 2019 that we needed to fundamentally re-architect our platform. Our original monolithic codebase was strained by the need to facilitate millions of deliveries per day, while a growing engineering organization meant hundreds of engineers were working to improve it. To support our scale, we began migrating from the original codebase to a microservices architecture, work that continued through 2020 and has improved both reliability and developer velocity.

As an example of the kind of work a project like this requires, we migrated the original APIs supporting DoorDash Drive, our white-label delivery service, to the new architecture. Through careful planning, we identified the APIs’ business logic and endpoints, then migrated them safely.

Our new architecture gave us many opportunities to improve our platform. As another example from DoorDash Drive, we implemented a new orchestration engine for asynchronous task retries. Moving our platform to a Kotlin-based stack let us upgrade our task processing and orchestration engine from Celery to Cadence, a more powerful engine that improved the platform’s reliability.
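One thing engines like Cadence provide is retry behavior declared as a policy rather than hand-coded into each task. The Python sketch below illustrates only that idea with an in-process exponential-backoff loop; `RetryPolicy`, its field names, and `run_with_retries` are hypothetical and are not Cadence’s API, which persists workflow state and schedules retries on the server side.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical field names, loosely modeled on the per-task retry
    # settings that workflow engines let you declare.
    initial_interval_s: float = 1.0
    backoff_coefficient: float = 2.0
    maximum_interval_s: float = 60.0
    maximum_attempts: int = 5

    def delays(self) -> List[float]:
        """Wait before each retry: exponential growth, capped at maximum_interval_s."""
        return [
            min(self.initial_interval_s * self.backoff_coefficient ** i,
                self.maximum_interval_s)
            for i in range(self.maximum_attempts - 1)
        ]


def run_with_retries(task: Callable[[], T], policy: RetryPolicy,
                     sleep: Callable[[float], None]) -> T:
    """Run task, retrying on any exception until attempts are exhausted.

    A real engine would persist this state durably and let you mark some
    errors as non-retryable; this in-process loop does neither.
    """
    if policy.maximum_attempts < 1:
        raise ValueError("maximum_attempts must be at least 1")
    delays = policy.delays()
    for attempt in range(policy.maximum_attempts):
        try:
            return task()
        except Exception:
            if attempt == policy.maximum_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Passing `time.sleep` as `sleep` waits for real; injecting it as a parameter keeps the loop testable without actually waiting.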

Building our data infrastructure

Data forms the functional bedrock of a large, very active platform such as DoorDash’s. Obvious data needs include restaurant menus and consumer addresses. Other types of data, such as how long it took a Dasher to make a delivery, let us analyze and improve our services. Some databases need to support large tables with limited updates, while others, containing quickly changing information, must support continual access. We’ve taken a thoughtful approach to building our data infrastructure, designing it to serve a variety of internal users, from data scientists to product engineers, and to scale with our business.

As an example of DoorDash’s unique needs when it comes to data, we store and make accessible images, including restaurant logos and photos of menu items. Setting up a tool to let restaurants update their logos was a challenge given our gRPC/Kotlin backend. Our solution involved building a new endpoint that could handle image data and communicate with REST APIs.

Working on DoorDash’s platform without disrupting production services is akin to replacing the fuel injectors on Bubba Wallace’s stock car while it’s on the track during the Daytona 500. When it comes to large tables in our infrastructure, adding a column or field carries risk and can be very time-consuming. To meet this need, we developed a way to perform production table updates quickly and safely, protecting the reliability of our platform.
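As a generic illustration, and not necessarily the approach we took, one common pattern for changing a large table with less risk is to add the column as nullable with no default (in PostgreSQL, for example, that does not rewrite the table), then backfill it in small batches keyed on the primary key so each transaction stays short. The sketch uses SQLite so it runs anywhere; the `deliveries` table, its columns, and `add_and_backfill` are hypothetical.

```python
import sqlite3


def add_and_backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Add a nullable column, then fill it in short primary-key-ranged batches.

    Returns the number of batches run. Table and column names are hypothetical.
    """
    # Step 1: schema change only; no default, so existing rows are not rewritten.
    conn.execute("ALTER TABLE deliveries ADD COLUMN duration_s INTEGER")
    conn.commit()

    batches = 0
    last_id = 0
    while True:
        # Upper bound of the next batch: the largest id among the next batch_size rows.
        (upper,) = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM deliveries WHERE id > ? ORDER BY id LIMIT ?)",
            (last_id, batch_size),
        ).fetchone()
        if upper is None:
            break  # no rows left past last_id
        with conn:  # one short transaction per batch; commits on success
            conn.execute(
                "UPDATE deliveries SET duration_s = dropoff_ts - pickup_ts "
                "WHERE id > ? AND id <= ? AND duration_s IS NULL",
                (last_id, upper),
            )
        last_id = upper
        batches += 1
    return batches
```

Keeping each batch small bounds how long any single transaction holds locks, at the cost of the backfill taking longer overall.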

Delivering for our customers

Consumers, restaurants, and Dashers interact with our services through the web and our iOS and Android apps. Serving these users requires a flexible frontend architecture that supports experimentation, scales, and enables personalization features. Leveraging our new microservices architecture, we created a concept we call Display Modules, frontend building blocks that we can iterate on quickly to deliver a delightful and usable experience.

Beyond this kind of foundational work, our engineers found plenty of opportunities to innovate around new consumer-facing features. For example, our new pick-up feature, where consumers place an order at a restaurant and pick it up themselves, required displaying a map of nearby restaurants, since distance becomes a more important factor in choosing where to order. Implementing location-based services on the web proved an interesting challenge for our engineers and taught us some valuable lessons.

The addition of Caviar to our platform increased delivery opportunities for Dashers and extended the brand’s reach to consumers and upscale restaurants in new cities. To achieve economies of scale, however, we needed to make a fundamental change: serving two brands, Caviar and DoorDash, from the same backend. Our engineers redesigned the two frontends using React components, which gave us the flexibility to shift the web experience depending on the consumer’s entry point.

Leveraging machine learning

Machine learning is essential for the data-driven decisions on which DoorDash builds its business. Modeling based on historical data enables everything from the very functional, such as how much consumer activity we can expect at a given time in a specific city, to the financial, such as where our marketing budget can earn the greatest return. Our team of data scientists continually innovates with an eye toward the practical needs of the business.

Some of our work involves solving general problems in data science, such as how to derive value from experiment results that show little difference between groups. In this case, we applied causal modeling, a way to estimate how strongly a product feature affects different groups of experiment participants. This method gives us greater insight into subpopulations when traditional A/B tests show flat results.
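Causal models can get sophisticated, but the core idea can be shown with a deliberately simple stratified estimator: compute the treatment-minus-control difference within each pre-defined segment instead of only across the whole experiment. This is not the model we used; the segments, data, and function names are hypothetical. With randomized assignment and segments defined before treatment, each per-segment difference estimates that segment’s average effect.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (segment, was_treated, outcome); segments must be defined before treatment.
Observation = Tuple[str, bool, float]


def overall_effect(rows: Iterable[Observation]) -> float:
    """Plain A/B readout: mean outcome of treatment minus control."""
    rows = list(rows)
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def segment_effects(rows: Iterable[Observation]) -> Dict[str, float]:
    """Treatment-minus-control difference in means within each segment."""
    groups: Dict[str, Dict[bool, List[float]]] = defaultdict(lambda: {True: [], False: []})
    for segment, treated, outcome in rows:
        groups[segment][treated].append(outcome)
    return {
        seg: mean(arms[True]) - mean(arms[False])
        for seg, arms in groups.items()
        if arms[True] and arms[False]  # skip segments missing an arm
    }
```

A flat overall readout can hide a positive effect in one segment offset by a negative effect in another, which is exactly the case the per-segment view surfaces.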

Machine learning implies intensive automation, but sometimes we still need more traditional solutions. For example, tagging the items in our vast food database was only practical with machine learning. However, the limits of that approach meant we had to find the optimal place in the workflow for human agents to step in and ensure the greatest accuracy.
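One common way to decide where people fit in such a workflow, assumed here purely for illustration, is to let the model auto-apply tags it is confident about and send low-confidence predictions to human reviewers. The `Prediction` type, `route`, and the 0.9 threshold are hypothetical; in practice the threshold would be tuned against labeled data to trade reviewer workload for accuracy.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Prediction:
    item_id: str
    tag: str
    confidence: float  # model's probability for the predicted tag


def route(predictions: Iterable[Prediction],
          threshold: float = 0.9) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (auto_accepted, needs_human_review)."""
    auto: List[Prediction] = []
    review: List[Prediction] = []
    for p in predictions:
        (auto if p.confidence >= threshold else review).append(p)
    return auto, review
```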

Optimizing efficiency

Along with building models and innovating on our methodologies, our platform must serve machine learning models in production quickly and efficiently. Our new microservices architecture, based on gRPC and Kotlin, showed significant network overhead in this area. Addressing it with client-side load balancing, payload compression, and transport-level request tracing reduced network overhead for model serving by 33 percent.
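Two of those techniques are easy to sketch in isolation. Below, a round-robin picker stands in for client-side load balancing (the caller spreads requests across known backends itself instead of pinning everything to one long-lived connection), and gzip stands in for payload compression. This is a hypothetical Python illustration; a gRPC service would normally rely on the framework’s built-in load-balancing policies and compression settings rather than hand-rolled code like this.

```python
import gzip
import itertools
import json
import threading
from typing import Any, Dict, Sequence


class RoundRobinBalancer:
    """Client-side load balancing: the caller picks the backend for each request."""

    def __init__(self, endpoints: Sequence[str]) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._cycle = itertools.cycle(list(endpoints))
        self._lock = threading.Lock()  # guard the shared cycle across threads

    def pick(self) -> str:
        with self._lock:
            return next(self._cycle)


def compress_payload(payload: Dict[str, Any]) -> bytes:
    """Serialize and gzip a request body; repetitive feature vectors shrink well."""
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def decompress_payload(blob: bytes) -> Dict[str, Any]:
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```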

As the demands on our platform grew, we found that our asynchronous task processing, handled by an implementation of Celery and RabbitMQ, needed an upgrade. Of the several solutions we considered, we chose Apache Kafka, along with a deployment strategy that allowed a fast, incremental rollout and let us tackle new problems one at a time. Moving to Kafka gave our platform greater reliability, scalability, and much-needed observability.
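A common way to make a rollout like this incremental, shown here as a generic sketch rather than our actual mechanism, is to route a stable percentage of tasks to the new path by hashing a task key. The same task always lands on the same side, and raising the percentage only ever adds tasks to the new path. `rollout_bucket`, `use_new_queue`, and the task keys are hypothetical.

```python
import hashlib


def rollout_bucket(task_key: str) -> int:
    """Map a task key to a stable bucket in [0, 100), identical in every process.

    SHA-256 is used because Python's built-in hash() is salted per process.
    """
    digest = hashlib.sha256(task_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_queue(task_key: str, rollout_percent: int) -> bool:
    """True if this task should be published to the new (e.g. Kafka) path.

    Raising rollout_percent only adds tasks to the new path and never moves
    one back, which keeps each rollout step easy to reason about.
    """
    return rollout_bucket(task_key) < rollout_percent
```

Rolling back is then a matter of lowering `rollout_percent`, without redeploying producers.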

Looking to the future

Our DoorDash Engineering blog only recounts a fraction of the wins achieved by our engineering team. Maintaining and improving our three-sided marketplace, and launching new business initiatives based on our logistics platform, involves continuous innovation. Frontend engineers may deliver DoorDash’s most recognizable experiences, but database and backend engineers ensure our platform operates at peak efficiency, while data scientists come up with novel means of improving our services.

The growing demand on our platform throughout 2020 made it clear that we will need many more engineers. Preparing for this expansion, we planned our newest engineering office in Seattle, which joins our engineering teams in the San Francisco Bay Area and New York City. Given the constraints of COVID-19, our Seattle office will remain virtual for the time being, but we hope our new engineers can convene there in the next year.

Interested in joining our dynamic team of engineers and data scientists? Take a look at the many roles open on our careers page!
