tech blog

AppNexus is today’s most powerful, open, and customizable ad tech platform. Advertising’s largest and most innovative companies build their businesses on AppNexus.

Introduction to the Actor Model for Concurrent Computation


The Actor Model is a computational model for designing concurrent, distributed systems around the principle of self-contained Actors that operate by sending and receiving messages. While the idea has been around since the mid-to-late 1970s, it is now gaining more traction with frameworks and languages such as Akka and Erlang, which share many of the same principles.
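To make the idea concrete, here is a minimal sketch of a single actor in plain Java. It is not Akka code: the `CounterActor` class, its message types, and the `demo` helper are hypothetical names made up for illustration. The actor keeps private state, owns a mailbox, and handles one message at a time on its own thread, so many senders can talk to it without locks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical minimal actor (illustration only, not the Akka API).
final class CounterActor {
    // Messages are plain immutable objects.
    interface Message {}
    static final class Increment implements Message {}
    static final class GetCount implements Message {
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
    }
    static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // touched only by the actor's own thread
    private final Thread worker = new Thread(this::run, "counter-actor");

    static CounterActor start() {
        CounterActor actor = new CounterActor();
        actor.worker.start();
        return actor;
    }

    // The only way to interact with the actor: drop a message in its mailbox.
    void tell(Message message) {
        mailbox.add(message);
    }

    // Messages are processed strictly one at a time, in arrival order.
    private void run() {
        try {
            while (true) {
                Message message = mailbox.take();
                if (message instanceof Increment) {
                    count++;
                } else if (message instanceof GetCount) {
                    ((GetCount) message).reply.complete(count);
                } else if (message instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo helper: several threads send increments, then we ask for the total.
    static int demo(int senders, int perSender) {
        CounterActor actor = CounterActor.start();
        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perSender; j++) {
                    actor.tell(new Increment());
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            GetCount query = new GetCount();
            actor.tell(query);
            return query.reply.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            actor.tell(new Stop());
        }
    }
}
```

Calling `CounterActor.demo(4, 1000)` returns 4000: every increment is applied exactly once even though `count` is never locked, because only the actor's thread ever touches it.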

See John Murray, Senior Software Engineer at AppNexus, serve up an introduction to Actor Model principles and concepts and give a look at how we can construct parallel systems using the Akka framework.

Learn more about upcoming Tech Talks at AppNexus here

Securing our Big Data Platform Part 1



The video we posted on June 23rd, 2015 (link here) introduced our efforts to open up our big data platform to other teams within AppNexus. Data Platform as a Service (DPaaS) is our internal offering that allows other teams at AppNexus to run analytics on our wealth of data. Our users want to be confident that they’re using the platform safely and appropriately. They want to see only the jobs and resources that are relevant to them, and they do not want to affect other users or mainline production processes. As operators of the platform, we want to ensure the safety and stability of the system as a whole and reasonable isolation between our users.

This clearly points to requiring an AAA solution - authentication, authorization and accounting.

  • Authentication - identifying an entity acting upon your system
  • Authorization - allowing/disallowing that entity to perform actions
  • Accounting - keeping track of which entities have performed which actions
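As a rough illustration of how the three pieces fit together, here is a toy in-memory sketch in Java. It is not how our platform implements AAA: the `AaaGate` class, the token scheme, and the action names are all hypothetical, and a real system would rely on a proper identity and credential mechanism rather than plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy gatekeeper; every name here is illustrative only.
final class AaaGate {
    private final Map<String, String> tokenToUser = new HashMap<>();
    private final Map<String, Set<String>> allowedActions = new HashMap<>();
    private final List<String> auditLog = new ArrayList<>();

    void register(String user, String token, String... actions) {
        tokenToUser.put(token, user);
        allowedActions.put(user, new HashSet<>(Arrays.asList(actions)));
    }

    // Authentication: map a credential to an identity (null if unknown).
    String authenticate(String token) {
        return tokenToUser.get(token);
    }

    // Authorization: may this identity perform this action?
    boolean authorize(String user, String action) {
        if (user == null) {
            return false;
        }
        return allowedActions
                .getOrDefault(user, Collections.<String>emptySet())
                .contains(action);
    }

    // Accounting: record every attempt, whether it was allowed or denied.
    boolean perform(String token, String action) {
        String user = authenticate(token);
        boolean allowed = authorize(user, action);
        auditLog.add((user == null ? "unknown" : user) + " " + action + " "
                + (allowed ? "ALLOWED" : "DENIED"));
        return allowed;
    }

    List<String> auditLog() {
        return Collections.unmodifiableList(auditLog);
    }
}
```

Note the ordering: authorization only makes sense once an identity has been established, and accounting needs both the identity and the decision. That dependency is one way to read the "natural order" of the list above.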

Practically speaking, we will be tackling these one at a time in the natural order listed above. Each item has its own complexity and intricacies, and this post will discuss those around the first A - authentication.

Weird Android bug


The AppNexus Mobile Advertising SDK provides developers a fast and convenient way to monetize their apps. It’s well-documented, thoroughly tested open-source code with direct engineering support. While implementing the Android native ad solution, I ran across a puzzling issue, and I’d like to share my investigation in the hope of saving other Android developers some time in the future. For those who are not familiar with native advertising, the IAB has a very clear video.

The issue was that a registration call to bind a native ad response to an Android native view would fail silently, even though the debugging tool showed that the code was executed correctly.

To simplify, let’s pretend you’re building an app that turns a flashlight on and off, with two runnables, turnOnTheLight and turnOffTheLight. You would assume that if we post turnOffTheLight first and then post turnOnTheLight, the light should end up on. However, in the test run, the light is actually off after execution. I set breakpoints and stepped through the code; turnOffTheLight was indeed posted first. So what happened?

I downloaded the source code of the Android SDK, stepped into the post() method, and found this:

Android SDK source code from View.java
/**
 * <p>Causes the Runnable to be added to the message queue.
 * The runnable will be run on the user interface thread.</p>
 * @param action The Runnable that will be executed.
 * @return Returns true if the Runnable was successfully placed in to the
 *         message queue.  Returns false on failure, usually because the
 *         looper processing the message queue is exiting.
 * @see #postDelayed
 * @see #removeCallbacks
 */
public boolean post(Runnable action) {
    final AttachInfo attachInfo = mAttachInfo;
    if (attachInfo != null) {
        return attachInfo.mHandler.post(action);
    }
    // Assume that post will succeed later
    ViewRootImpl.getRunQueue().post(action);
    return true;
}

It turns out that if the view is not attached to a window, the runnable is put in the RunQueue of the view hierarchy: the run queue is used to enqueue pending work from Views when no Handler is attached. The work is executed during the next call to performTraversals on the thread.

Going back to the scenario above: when turnOffTheLight was posted, the view was not attached, but it was attached when turnOnTheLight was posted. Thus, turnOnTheLight is posted to the UI thread to be executed immediately, while turnOffTheLight is not executed until the next call to performTraversals. The solution is very simple: post both runnables to the UI thread directly using the following method:

AppNexus SDK source code
Handler handler = new Handler(Looper.getMainLooper());
handler.post(new Runnable() {
    public void run() {
        // code
    }
});

In the end, this is not actually a bug in the app’s code. It is more a case of a rare usage pattern exposing a silent, inconvenient behavior of an Android convenience method: calls to the API must be made in the proper sequence to get the correct result. I’m sharing it here for Android developers who might run into this weird situation too.
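For anyone who wants to see the ordering problem without an Android device, the flashlight scenario can be reproduced with a small plain-Java simulation. This is only a simplified model of the behavior described in this post, not Android code: `FakeView`, `FlashlightDemo`, and the queue handling are hypothetical stand-ins for View, the run queue, and the UI thread’s message queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Plain-Java stand-in for a View; a simplified model, NOT Android code.
final class FakeView {
    private final Queue<Runnable> uiQueue;                        // models the UI thread's queue
    private final Queue<Runnable> runQueue = new ArrayDeque<>();  // work deferred while detached
    private boolean attached = false;

    FakeView(Queue<Runnable> uiQueue) {
        this.uiQueue = uiQueue;
    }

    // Mirrors the shape of post(): attached views go straight to the UI
    // queue, detached views park the work and still report success.
    boolean post(Runnable action) {
        if (attached) {
            return uiQueue.add(action);
        }
        runQueue.add(action); // assume that post will succeed later
        return true;
    }

    void attach() {
        attached = true;
    }

    // Models the next traversal: deferred work is finally handed over.
    void performTraversals() {
        while (!runQueue.isEmpty()) {
            uiQueue.add(runQueue.poll());
        }
    }
}

final class FlashlightDemo {
    private static void drain(Queue<Runnable> queue) {
        Runnable next;
        while ((next = queue.poll()) != null) {
            next.run();
        }
    }

    // Returns whether the light is on at the end of the run.
    static boolean run(boolean useViewPost) {
        Queue<Runnable> uiQueue = new ArrayDeque<>();
        FakeView view = new FakeView(uiQueue);
        boolean[] lightOn = {false};
        Runnable turnOffTheLight = () -> lightOn[0] = false;
        Runnable turnOnTheLight = () -> lightOn[0] = true;

        if (useViewPost) {
            view.post(turnOffTheLight); // view detached: deferred
            view.attach();
            view.post(turnOnTheLight);  // view attached: queued right away
        } else {
            uiQueue.add(turnOffTheLight); // post directly, bypassing the view
            view.attach();
            uiQueue.add(turnOnTheLight);
        }

        drain(uiQueue);           // the UI thread runs what it already has
        view.performTraversals(); // deferred work arrives only now
        drain(uiQueue);
        return lightOn[0];
    }
}
```

With `FlashlightDemo.run(true)` the light ends up off, because the deferred turnOffTheLight runs last. With `FlashlightDemo.run(false)`, where both runnables go straight to the same queue, the light ends up on, as originally intended.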

Taming Big Data


The Data Platform Engineering Team at AppNexus has been using Hadoop in production for more than four years. Both our data volume and the number of customer use cases supported by our Hadoop infrastructure have grown exponentially since we adopted the Hadoop stack. More specifically, in 2012 we were processing 10 terabytes of data per day; today we process over 170 terabytes per day.

We evaluated various commercial and open source solutions to reduce our storage footprint, improve Hadoop utilization, and unlock YARN’s multi-tenancy promises.

Our talk covers:

  • Architecture of AppNexus Data Platform: Data ingestion, processing, and how our customers consume the data.
  • Complex use cases supported by MapReduce, Vertica and Spark streaming.
  • Overview of how we are offering our Data Platform As A Service to AppNexus’ business units where teams can build and manage their own YARN application deployments.

x Eighty Swift


Understanding the low level behavior of our applications is one of the most useful skills for working on high throughput systems at AppNexus. We currently process around 4 million requests per second with extremely strict latency requirements, so being able to identify and correct inefficiencies at the instruction level can yield significant performance gains. More importantly, being able to work in assembly engenders feelings of supreme confidence and consummate pride (citation needed).

Here are some of our team’s favorite (and/or least favorite) x86 instructions described through comparisons to Taylor Swift songs.


Parquet: Columnar Storage for Hadoop Data



At AppNexus, over 2 million log events are ingested into our data pipeline every second. Log records are sent from upstream systems in the form of protobuf messages. Raw logs are compressed with Snappy when stored on HDFS. Even with compression, this still leads to over 25TB of log data collected every day. On top of the logs, we also have hundreds of MapReduce jobs that process and generate aggregated data. Collectively, we store petabytes of data in our primary Hadoop cluster.

Parquet is a columnar storage format in the Hadoop ecosystem. Compared to a traditional row-oriented format, it is much more efficient in storage and offers better query performance. Parquet is widely used in the Hadoop world for analytics workloads by many query engines, including engines on top of Hadoop, such as Hive and Impala, and systems that go beyond MapReduce to improve performance, such as Spark and Presto.

Parquet stores binary data in a column-oriented way, where the values of each column are organized so that they are all adjacent, enabling better compression. It is especially good for queries which read particular columns from a “wide” (with many columns) table, since only needed columns are read and IO is minimized. Read this for more details on Parquet.
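The intuition behind column-oriented storage can be sketched in a few lines of Java. This is not the Parquet file format or its API: `ColumnarSketch` and its field names are hypothetical, and run-length encoding is used here only as one simple example of an encoding that benefits from adjacent, similar values.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of row vs. column layout; NOT the Parquet format or API.
final class ColumnarSketch {
    // Row layout: all fields of one record sit together.
    static final class Row {
        final String country;
        final int clicks;
        final String browser;

        Row(String country, int clicks, String browser) {
            this.country = country;
            this.clicks = clicks;
            this.browser = browser;
        }
    }

    // Column layout: all values of one field sit together.
    static final class Columns {
        final String[] country;
        final int[] clicks;
        final String[] browser;

        Columns(int size) {
            country = new String[size];
            clicks = new int[size];
            browser = new String[size];
        }
    }

    static Columns toColumns(List<Row> rows) {
        Columns columns = new Columns(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            Row row = rows.get(i);
            columns.country[i] = row.country;
            columns.clicks[i] = row.clicks;
            columns.browser[i] = row.browser;
        }
        return columns;
    }

    // A query on one column reads only that array; the others are never touched.
    static long sumClicks(Columns columns) {
        long total = 0;
        for (int value : columns.clicks) {
            total += value;
        }
        return total;
    }

    // Adjacent equal values collapse into "value x count" runs.
    static List<String> runLengthEncode(String[] values) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        while (start < values.length) {
            int end = start;
            while (end < values.length && values[end].equals(values[start])) {
                end++;
            }
            runs.add(values[start] + "x" + (end - start));
            start = end;
        }
        return runs;
    }
}
```

For four rows with countries US, US, US, DE, encoding the country column yields just two runs, `USx3` and `DEx1`. In a row layout those values would be interleaved with clicks and browser fields, so the same trick would find nothing to collapse.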

an_message: Format Agnostic Data Transfer


Every distributed RESTful system has a communication problem. How does Service A communicate with Service B? Does it pass data via multipart/form-data? Does it pass individual fields on the query string? Does it POST a blob of JSON?

With the proliferation of “RESTful” services, the trend is decidedly towards JSON and away from XML. JSON is relatively compact and fast to parse (at least for most services, the bottleneck is not parsing the JSON). This works well for most “wait based” services (database lookups, file reads, etc.). However, there is a class of services in the ad-tech space (and elsewhere) with more stringent SLAs, for which JSON parsing is actually a significant portion of the runtime of a single request. For these services we can do better while still keeping the schematic safety of JSON in place.

AngularJS blog series – Introduction


Greetings from your AppNexus Discovery Engineering team in San Francisco! In the spirit of AppNexians sharing, the team here is going to write a series of blog posts about technologies we’ve used to create Twixt, a brand-new application for direct media buying. (If you haven’t heard about Twixt, check out


Twixt is a single page web app, or SPA, built using a number of client and server technologies. The application client that runs in a user’s browser relies mainly on the JavaScript framework AngularJS. Through these blog posts, we are going to help you better understand what AngularJS (“Angular” for short) is and how we use it.

A single page app, or SPA, is a web app that loads itself mostly all at once at a single URL. Well-organized JavaScript then runs as a single application within the page, making AJAX calls to the server to load information and perform tasks, but never changing the base URL (although the path after the # may change – more on this later). HTML fragments may be loaded from the server to provide layouts for the app, but these are assembled (along with data) into views and injected into the Document Object Model (DOM) by the application itself – everything is handled on the fly by JavaScript.

Single page web apps are popular these days as they allow for the smoothest “desktop app in a browser” experience. There is less flicker between pages and “persistent UI” is truly persistent. If you’ve used Gmail, Google Maps, or listened to SoundCloud, then you’ve used a single page app, and many other big industry players treat large sections of their sites as SPAs.