The Data Platform Engineering Team at AppNexus has been utilizing Hadoop in production for the last 4+ years. Growth of data volume as well as the number of customer use cases supported by Hadoop’s infrastructure has grown exponentially since we adopted the Hadoop stack. More specifically, in 2012, we were processing 10 terabytes of data per day, today we process over 170 terabytes per day.
We evaluated various commercial and open source solutions to reduce our storage foot print, improve Hadoop utilization, and unlock YARN’s multi-tenancy promises.
Our talk covers:
- Architecture of AppNexus Data Platform: Data ingestion, processing, and how our customers consume the data.
- Complex use cases supported by MapReduce, Vertica and Spark streaming.
- Overview of how we are offering our Data Platform As A Service to AppNexus’ business units where teams can build and manage their own YARN application deployments.