API Rearchitecture Series – The Juicy Details

In the previous post, my esteemed colleague and sometimes friend wrote about an epic quest that the API team is undertaking. I wanted to take a few moments to explain some problems we have had in our current system, what our new architecture will be, and how it will solve our problems. This blog post will talk mostly about what happens during the lifetime of a REST request and will ignore some of the dependencies for simplicity.

Current System

Right now, we use a very standard web application process flow. When a user makes a request via HTTP:

  1. The request hits our load balancer, which chooses an appropriate web server to handle it
  2. The web server is running Apache / PHP, so the request hits Apache and is passed to the PHP handler
  3. The code determines what the request is about and loads the correct code path
  4. Specific code to handle the request is executed
  5. Some sort of data is acted upon based on the request (CRUD operations)
  6. A response is returned to the user in JSON format

+-----------------------+
| Apache                |
|-----------------------|
|+---------------------+|
|| PHP Process         ||
||---------------------||
||                     ||
|| Parse Request       ||
||       +             ||
||       |             ||
||       v             ||
|| Specific Code       ||
||       +             ||
||       |             ||
||       v             ||
|| Retrieve Data       ||
||       +             ||
||       |             ||
||       v             ||
|| Respond To Request  ||
|+---------------------+|
+-----------------------+

This structure works very well, is common around the internet, and has been serving all of our needs.

UNTIL

That magic moment where the architecture hits a wall and doesn’t quite scale anymore. Many large companies have tech blogs talking about this moment. As always, it is hard to pin down the exact cause, but we have identified several factors that coincide, including:

  1. Increased user base – more users means more concurrent requests.
  2. Increased usage per user – each user is now making many more requests.
  3. Increased complexity in the product – a lot of business logic and “magic” leads to dirty and sometimes slow code.
  4. Increased data set – queries that once were awesome are now slow.
  5. All requests are created equal – the system can be overloaded by “complicated” requests. Right now we have rate limiting, but it is pretty generic.
  6. Oversized monolithic code base – it is hard to be agile.
  7. Non-event driven system – it is horribly expensive to hold a connection while waiting for data from downstream dependencies. As illustrated in the diagram, each request is allocated a single PHP process.
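
To make the last point concrete, here is a minimal sketch of the event-driven alternative. It is written in Python purely for illustration (it is not code from either the current or the new system), and the names `fetch_from_dependency`, `handle_request`, and `DEPENDENCY_LATENCY` are hypothetical. Each request spends its time waiting on a simulated downstream dependency, but the waits overlap on a single thread instead of each one tying up a dedicated process.

```python
import asyncio
import time

DEPENDENCY_LATENCY = 0.05  # hypothetical downstream wait, in seconds


async def fetch_from_dependency(request_id: int) -> dict:
    # Stand-in for a database or service call. While this awaits,
    # the event loop is free to make progress on other requests.
    await asyncio.sleep(DEPENDENCY_LATENCY)
    return {"request": request_id, "status": "ok"}


async def handle_request(request_id: int) -> dict:
    # A real handler would parse the request and build a response here.
    return await fetch_from_dependency(request_id)


async def serve(n_requests: int) -> list:
    # All requests wait on their dependencies concurrently on one thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))


def run_demo(n_requests: int = 20):
    # Returns the responses plus the wall-clock time taken to serve them all.
    start = time.perf_counter()
    responses = asyncio.run(serve(n_requests))
    elapsed = time.perf_counter() - start
    return responses, elapsed
```

Calling `run_demo(20)` serves twenty simulated requests. Because the waits overlap, the total time should be close to a single dependency round trip rather than twenty of them, which is the property a process-per-request model cannot offer.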

The Future

To properly handle requests now and in the future, we decided to change the whole flow and architecture. We need to break up our monolithic codebase so we can quickly add new functionality, protect our system from usage spikes, and serve 100% of our users’ needs. This requires our system to be event-driven and follow the reactive pattern, which is a strong reason why we chose Typesafe’s Play as our framework.

In the new system, right after a request hits the load balancer it will now follow this flow:

+-----------------------------------------+
|                Router                   |
+-----------------------------------------+
+-------------++---++---++---++---++---+
| application || a || a || a || a || a |  ...
+-------------++---++---++---++---++---+
+-----------------------------------------+
|               CRUD Layer                |
+-----------------------------------------+

The Router
A very thin layer that routes requests to the right application / endpoint based on the URL and the user. It is config-driven and allows new routes to be added easily.
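
As a rough illustration of what "config driven" could look like, the sketch below keeps the routing table as plain data and resolves a request from its URL and user. It is Python for illustration only, not the actual router: the route patterns, the application names (`campaigns`, `reports`), and the per-user override table are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical route config. In practice this would be loaded from a
# config file, so adding a route means adding an entry, not writing code.
ROUTES = [
    {"pattern": r"^/campaigns(/\d+)?$", "application": "campaigns"},
    {"pattern": r"^/reports(/\d+)?$", "application": "reports"},
]

# Hypothetical per-user overrides, e.g. sending one user to a newer
# version of an application.
USER_OVERRIDES = {
    ("beta-user", "campaigns"): "campaigns-v2",
}


def route(url: str, user: str) -> Optional[str]:
    """Return the name of the application that should handle the request."""
    for entry in ROUTES:
        if re.match(entry["pattern"], url):
            app = entry["application"]
            # Fall back to the default application when no override exists.
            return USER_OVERRIDES.get((user, app), app)
    return None  # no matching route
```

For example, `route("/campaigns/42", "alice")` resolves to `"campaigns"`, while the same URL for `"beta-user"` resolves to `"campaigns-v2"`.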

Applications
Business logic lives in these applications to handle specific requests. Most requests depend on data in our data store, so each application will interface directly with the CRUD layer. Right now we have over 170 services in our codebase. Imagine being able to add new applications or fix a specific application without having to release a whole monolithic application.

CRUD Layer
An easy-to-use, JSON-based abstraction on top of the database. It is built to replace the need for anyone to connect directly to our database to run queries. Since every query has to go through this layer, we can control both the quality and the quantity of queries. The CRUD layer can throttle, prioritize, and direct queries to the right databases as needed.
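
One way to picture throttling and prioritizing at a single choke point is the toy scheduler below. It is a sketch under stated assumptions, not the CRUD layer's actual design: the `QueryScheduler` class, the notion of a per-client cost budget, and the numeric priorities are all hypothetical, and it is written in Python only for illustration.

```python
import heapq
import itertools


class QueryScheduler:
    """Toy scheduler: rejects queries that exceed a per-client cost budget
    and hands out the accepted ones in priority order (lower runs first)."""

    def __init__(self, budget_per_client: int):
        self.budget_per_client = budget_per_client
        self.spent = {}  # client -> cost used so far (a real one would reset per time window)
        self.queue = []  # heap of (priority, sequence, query)
        self.counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, client: str, query: str, cost: int, priority: int) -> bool:
        used = self.spent.get(client, 0)
        if used + cost > self.budget_per_client:
            return False  # throttled: this client is over budget
        self.spent[client] = used + cost
        heapq.heappush(self.queue, (priority, next(self.counter), query))
        return True

    def next_query(self):
        """Pop the most urgent accepted query, or None if the queue is empty."""
        if not self.queue:
            return None
        _, _, query = heapq.heappop(self.queue)
        return query
```

Unlike a generic requests-per-minute limit, a cost-based budget lets many cheap queries through while holding back a client that keeps issuing expensive ones.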

For retrieving data, the user/application just passes in the object type to query along with any filters, sorts, and limits. The CRUD layer handles generating the query, running it on the correct database, casting values, and generating metadata. The metadata includes a “links” section with data on how to retrieve related objects.
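
The retrieval path described above can be sketched roughly as follows. This is an illustrative Python sketch, not the real CRUD layer: the `SCHEMA` table, the `campaign` object type, its columns, and the link template are hypothetical, and the SQL dialect is deliberately minimal. Values are passed as bound parameters, and column names are checked against the schema before being placed in the query.

```python
# Hypothetical schema: maps an object type to its table, its queryable
# columns, and templates for the "links" section of the metadata.
SCHEMA = {
    "campaign": {
        "table": "campaigns",
        "columns": {"id", "name", "budget"},
        "links": {"ads": "/ads?campaign_id={id}"},
    },
}


def build_query(object_type, filters=None, sorts=None, limit=None):
    """Turn an object type plus filters, sorts, and a limit into (sql, params)."""
    spec = SCHEMA[object_type]
    sql = f"SELECT * FROM {spec['table']}"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            if column not in spec["columns"]:
                raise ValueError(f"unknown column: {column}")
            clauses.append(f"{column} = ?")
            params.append(value)  # values are bound, never interpolated
        sql += " WHERE " + " AND ".join(clauses)
    if sorts:
        parts = []
        for column, direction in sorts:
            if column not in spec["columns"] or direction not in ("ASC", "DESC"):
                raise ValueError(f"invalid sort: {column} {direction}")
            parts.append(f"{column} {direction}")
        sql += " ORDER BY " + ", ".join(parts)
    if limit is not None:
        sql += " LIMIT ?"
        params.append(int(limit))
    return sql, params


def build_links(object_type, row):
    """Build the "links" metadata describing how to fetch related objects."""
    spec = SCHEMA[object_type]
    return {name: template.format(**row) for name, template in spec["links"].items()}
```

A caller never writes SQL: `build_query("campaign", {"name": "spring"}, [("budget", "DESC")], 10)` produces the statement and its bound parameters, and `build_links("campaign", {"id": 7})` produces the related-object links for a returned row.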

For updating and inserting data, the user/application passes in the JSON representation of the object along with any additional information about the object that is needed. The CRUD layer performs similar actions as it does for retrieval, but also enforces transactionality and validations. Being transactionally safe and having strong validations are extremely important for our downstream applications and our clients (imagine setting a $1 million budget for advertising).
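
To show why validation and transactions belong together, here is a small Python sketch using the standard library's `sqlite3` module as a stand-in database. The `campaigns` table, the validation rules, and the function names are hypothetical and are not the real CRUD layer. A batch of JSON objects is written inside one transaction, so a single invalid object rolls back the whole batch.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table used only for this example.
    conn.execute("CREATE TABLE campaigns (name TEXT NOT NULL, budget REAL NOT NULL)")


def validate_campaign(obj: dict) -> list:
    """Return a list of validation errors (empty when the object is valid)."""
    errors = []
    name = obj.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name must be a non-empty string")
    budget = obj.get("budget")
    if isinstance(budget, bool) or not isinstance(budget, (int, float)) or budget < 0:
        errors.append("budget must be a non-negative number")
    return errors


def insert_campaigns(conn: sqlite3.Connection, payload: str) -> int:
    """Insert a JSON array of campaigns atomically; return the number written."""
    objects = json.loads(payload)
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so either every object in the batch is stored or none of them are.
    with conn:
        for obj in objects:
            errors = validate_campaign(obj)
            if errors:
                raise ValueError("; ".join(errors))
            conn.execute(
                "INSERT INTO campaigns (name, budget) VALUES (?, ?)",
                (obj["name"], obj["budget"]),
            )
    return len(objects)
```

If the second object in a batch has a negative budget, the first object's insert is rolled back along with it, so downstream applications never see a half-applied update.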

What does this all mean? How do these layers work? Have we gone mad with power? For now we are just giving you the tip of the iceberg. In the subsequent blog posts, we will dive into each layer/application and talk in great detail about what it does and how it actually helps us scale.

3 Comments
  • Emily

    Nice explanation – it’s not my area, but I can totally understand you!

  • Sam Bessalah

    Nice to see that you are planning on using the Play Framework. But do you intend to use the Scala or the Java API? And why go with Play instead of Akka/Spray, for example?
    Nice writeup though.

  • Masa

    Is the CRUD layer itself a REST API?

    The “links” section in the metadata is suggestive of a hypermedia (link-driven) API.

    If so, did you choose one among several hypermedia formats like application/hal+json currently gaining popularity?