tech blog


Taming Big Data


The Data Platform Engineering Team at AppNexus has been running Hadoop in production for more than four years. Both the volume of data and the number of customer use cases supported by our Hadoop infrastructure have grown exponentially since we adopted the stack: in 2012 we were processing 10 terabytes of data per day; today we process over 170 terabytes per day.

We evaluated various commercial and open-source solutions to reduce our storage footprint, improve Hadoop utilization, and unlock YARN’s multi-tenancy promises.

Our talk covers:

  • Architecture of AppNexus Data Platform: Data ingestion, processing, and how our customers consume the data.
  • Complex use cases supported by MapReduce, Vertica, and Spark Streaming.
  • Overview of how we offer our Data Platform as a service to AppNexus’ business units, where teams can build and manage their own YARN application deployments.

x Eighty Swift


Understanding the low level behavior of our applications is one of the most useful skills for working on high throughput systems at AppNexus. We currently process around 4 million requests per second with extremely strict latency requirements, so being able to identify and correct inefficiencies at the instruction level can yield significant performance gains. More importantly, being able to work in assembly engenders feelings of supreme confidence and consummate pride (citation needed).

Here are some of our team’s favorite (and/or least favorite) x86 instructions described through comparisons to Taylor Swift songs.


Parquet: Columnar Storage for Hadoop Data



At AppNexus, over 2MM log events are ingested into our data pipeline every second. Log records are sent from upstream systems in the form of protobuf messages, and raw logs are compressed with Snappy when stored on HDFS. Even with compression, this leads to over 25TB of log data collected every day. On top of logs, we also have hundreds of MapReduce jobs that process and generate aggregated data. Collectively, we store petabytes of data in our primary Hadoop cluster.

Parquet is a columnar storage format for the Hadoop ecosystem. Compared to a traditional row-oriented format, it is much more efficient in storage and offers better query performance. Parquet is widely used for analytics workloads by many query engines: engines built on top of Hadoop, such as Hive and Impala, as well as systems that go beyond MapReduce to improve performance, such as Spark and Presto.

Parquet stores binary data in a column-oriented way, where the values of each column are organized so that they are all adjacent, enabling better compression. It is especially good for queries that read particular columns from a “wide” table (one with many columns), since only the needed columns are read and I/O is minimized. See the Parquet documentation for more details.
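The two wins can be illustrated in a few lines of plain Python. This is a toy sketch of the layout idea only, not Parquet’s actual encoding: a columnar arrangement puts each column’s values in one adjacent run, which compresses better and lets a query read one column without touching the rest.

```python
# Toy illustration (plain Python, not Parquet's actual encoding) of
# row-oriented vs. column-oriented layout.
import json
import zlib

rows = [{"ts": 1000 + i, "country": "US", "clicks": i % 3} for i in range(1000)]

# Row-oriented: whole records serialized one after another.
row_blob = json.dumps(rows).encode()

# Column-oriented: one adjacent run of values per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}
col_blob = json.dumps(columns).encode()

# Adjacent, repetitive values ("US" a thousand times, clicks cycling
# 0, 1, 2) compress far better than interleaved records.
print(len(zlib.compress(row_blob)), len(zlib.compress(col_blob)))

# Reading just one column from the columnar form skips everything else.
clicks = columns["clicks"]
```

Real columnar formats go much further (per-column encodings, dictionary compression, predicate pushdown), but the adjacency argument above is the core of it.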

an_message: Format Agnostic Data Transfer


Every distributed RESTful system has a communication problem. How does Service A communicate with Service B? Does it pass data via multipart/form-data? Does it pass individual fields on the query string? Does it POST a blob of JSON?

With the proliferation of “RESTful” services, the trend is decidedly toward JSON and away from XML. JSON is relatively compact and fast to parse (for most services, the bottleneck is not JSON parsing). This works well for most “wait-based” services (database lookups, file reads, etc.). However, there is a class of services in the ad-tech space (and elsewhere) with more stringent SLAs, for which JSON parsing is actually a significant portion of the runtime of a single request. For these services we can do better while still keeping the schema safety of JSON in place.
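The gap can be sketched with Python’s standard library. This is a generic illustration, not an_message’s actual wire format: a fixed binary layout skips per-field text parsing entirely, while JSON must be tokenized on every request.

```python
# Generic sketch (not an_message's actual format) of JSON vs. a fixed
# binary layout for passing a few fields between services.
import json
import struct

# Three hypothetical fields one service might pass to another.
auction_id, width, height = 123456789, 300, 250

# JSON: self-describing, but every request pays for lexing the text.
json_payload = json.dumps(
    {"auction_id": auction_id, "width": width, "height": height}
).encode()

# Fixed layout: one little-endian u64 and two u16s. Decoding is a
# bounds check plus a copy; the "schema" lives in the format string.
WIRE_FMT = "<QHH"
binary_payload = struct.pack(WIRE_FMT, auction_id, width, height)

assert struct.unpack(WIRE_FMT, binary_payload) == (auction_id, width, height)
print(len(json_payload), len(binary_payload))  # 54 vs. 12 bytes
```

The trade-off is the usual one: the binary form is smaller and nearly free to decode, but both sides must agree on the layout out of band, which is exactly the versioning problem a schema'd format has to solve.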

AngularJS blog series – Introduction


Greetings from your AppNexus Discovery Engineering team in San Francisco! In the spirit of AppNexians sharing, the team here is going to write a series of blog posts about the technologies we’ve used to create Twixt, a brand-new application for direct media buying.


Twixt is a single page web app, or SPA, built using a number of client and server technologies. The application client that runs in a user’s browser relies mainly on the JavaScript framework AngularJS. Through these blog posts, we are going to help you better understand what AngularJS (“Angular” for short) is and how we use it.

A single page app is a web app that loads mostly all at once, at a single URL. Well-organized JavaScript then runs as a single application within the page, making AJAX calls to the server to load information and perform tasks, but never changing the base URL (although the path after the # may change – more on this later). HTML fragments may be loaded from the server to provide layouts for the app, but these are assembled (along with data) into views and injected into the Document Object Model (DOM) by the application itself – everything is handled on the fly by JavaScript.

Single page web apps are popular these days as they allow for the smoothest “desktop app in a browser” experience. There is less flicker between pages and “persistent UI” is truly persistent. If you’ve used Gmail, Google Maps, or listened to SoundCloud, then you’ve used a single page app, and many other big industry players treat large sections of their sites as SPAs.

K-ary heapsort: more comparisons, less memory traffic


The impetus for this post was a max heap routine I had to write because libc, unlike the STL, does not support incremental or even partial sorting. After staring at the standard implicit binary heap for a while, I realised how to generalise it to arbitrary arity. The routine will be used for medium-size elements (a couple dozen bytes) with a trivial comparison function; in that situation, it makes sense to implement a high-arity heap and perform fewer swaps in return for additional comparisons. In fact, the trade-off is interesting enough that a heapsort based on this routine is competitive with the BSD and glibc sorts. This post presents the k-ary heap code and explores the impact of memory traffic on the performance of a few classical sort routines. The worst-performing sorts in BSD’s and GNU’s libc overlook swaps and focus on minimising comparisons; I argue this is rarely the correct choice, although our hands are partly tied by POSIX.
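The trade-off can be sketched in Python (the post’s actual routine is C; this follows the standard implicit layout, where the children of node i live at indices k·i + 1 through k·i + k):

```python
# Implicit k-ary max heap. Higher arity means a shallower tree (fewer
# levels, so fewer element moves) at the price of up to k comparisons
# per level to find the largest child.
def sift_down(a, end, i, k):
    """Restore the max-heap property for a[0:end], rooted at index i."""
    val = a[i]
    while True:
        first = k * i + 1
        if first >= end:
            break
        # Extra comparisons: scan up to k children for the largest.
        largest = first
        for c in range(first + 1, min(first + k, end)):
            if a[c] > a[largest]:
                largest = c
        if a[largest] <= val:
            break
        a[i] = a[largest]  # move the child up into the hole; no full swap
        i = largest
    a[i] = val

def kary_heapsort(a, k=4):
    n = len(a)
    # Heapify bottom-up, starting from the parent of the last element.
    for i in range((n - 2) // k, -1, -1):
        sift_down(a, n, i, k)
    # Pop the max into the tail, shrinking the heap each time.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, end, 0, k)
    return a
```

The “hole” technique in `sift_down` is what pays for the arity: each level costs one element move rather than a three-assignment swap, so for wide elements the shallower high-arity tree wins despite the extra comparisons.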

Enable Your Python Developers by Making “Code Investments”


Note: portions of this post appeared on my personal blog under the title “Supercharge Your Python Developers.”

I think it’s safe to say that a project’s inception is the best, indeed perhaps only, opportunity to influence the quality of its code for years to come. Many (most?) projects are started without much direction; code simply springs into being and is put under version control. Making a series of thoughtful, upfront “investments,” however, can pay large dividends. In this post, I’ll describe investments I made at the start of a project that allowed a Python novice to quickly write concise, idiomatic, and well-tested code.

Hash Set versus Dense Hash


During the development of the Concurrency Kit hash set and hash table, detailed microbenchmarks were used to measure the latency and variability of various operations relative to several open-source hash table implementations. For read-mostly workloads, the implementation was at least twice as fast as Google Dense Hash on reference machines, even though it provides stronger forward-progress guarantees for concurrent workloads: for example, it is lock-free for readers and wait-free for writers in single-writer, many-reader use cases. However, a recent use case required the hash table implementations to handle delete-heavy workloads. As with many open-addressing schemes, the implementation failed miserably in this workload due to tombstone accumulation. The strength of the collision resolution mechanism would very quickly eliminate probe sequence termination markers (empty slots) in favor of tombstones. Google Dense Hash performed stably under the delete-heavy workloads, at the cost of a higher operation completion floor for more favorable workloads (due to increased clumping from the quadratic probing it uses). The performance gap for delete-heavy workloads has since been closed with some trade-offs.
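The tombstone failure mode is easy to reproduce with a toy linear-probing table in Python (this is an illustration of the general open-addressing problem, not Concurrency Kit’s implementation): deletes leave markers that probes must walk past, so once tombstones crowd out empty slots, every unsuccessful lookup degenerates into a full-table scan.

```python
# Toy linear-probing hash set showing tombstone accumulation.
EMPTY, TOMB = object(), object()

class OpenTable:
    def __init__(self, size=16):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield the slot indices in this key's probe sequence."""
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def insert(self, key):  # toy: assumes key is not already present
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is TOMB:
                self.slots[i] = key
                return

    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return  # hit a termination marker: key was never here
            if self.slots[i] == key:
                # Can't mark EMPTY: that would truncate other keys'
                # probe sequences. A tombstone keeps probing correct.
                self.slots[i] = TOMB
                return

    def probe_len(self, key):
        """Slots a lookup must inspect before it can stop."""
        for n, i in enumerate(self._probe(key), 1):
            if self.slots[i] is EMPTY or self.slots[i] == key:
                return n
        return len(self.slots)

# Churn: insert and delete sixteen keys in a sixteen-slot table.
t = OpenTable()
for key in range(16):
    t.insert(key)
    t.delete(key)

# Zero live entries, yet a miss must now scan the whole table:
print(OpenTable().probe_len(99), t.probe_len(99))  # 1 vs. 16
```

Schemes like Dense Hash bound this cost by rehashing once tombstones pass a threshold, which is one way the trade-offs mentioned above show up in practice.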

API Rearchitecture Series - The Juicy Details


In the previous post, my esteemed colleague and sometimes friend wrote about an epic quest that the API team is undertaking. I wanted to take a few moments to explain some problems we have had in our current system, what our new architecture will be, and how it will solve our problems. This blog post will talk mostly about what happens during the lifetime of a REST request and will ignore some of the dependencies for simplicity.