In the previous post, my esteemed colleague and sometimes friend wrote about an epic quest that the API team is undertaking. I wanted to take a few moments to explain some of the problems in our current system, what our new architecture will look like, and how it will solve those problems. This post will focus mostly on what happens during the lifetime of a REST request and, for simplicity, will ignore some of the dependencies.
This month, the Web Services team is embarking on a revolutionary change: a rearchitecture of how our platform operates. Upon completion, our clients will have access to powerful new features that will enable them to derive even greater value from building on top of our platform.
We want to share the details and experience of our technical development process. We're currently working through the proof-of-concept phase and will soon begin developing our prototype. We believe the challenges we're solving and the roadblocks we're overcoming will be interesting to others as well.
On September 17, 2013, starting at 17:54 UTC (1:54 PM America/New_York), the AppNexus platform experienced a technical failure that initially halted ad serving entirely and later partially degraded it; the entire incident lasted approximately two and a half hours. We messed up and we apologize. Here is what happened and what we are doing to make sure it does not happen again.
I suppose every software engineer accumulates a few stories about unusually elusive bugs. These stories are fun to reminisce about because, often, a perfect set of circumstances led to the perfect bug. One of my projects while interning on the data team has been the Job Management Framework (JMF), an internal web service the data team uses to manage and monitor various data pipeline jobs such as aggregations, syncs, and purges. This bug-chasing story began a few weeks ago, when I was preparing JMF for its weekly deployment after implementing some routine bug fixes.
Dwight Merriman is a tech legend and entrepreneur extraordinaire. Dwight co-founded DoubleClick in 1995 and served as the company’s CTO for a decade. As CTO, Dwight designed the infrastructure for the DART ad serving technology that now drives Google’s profits. After selling DoubleClick in 2005, he and fellow executive Kevin Ryan left to start their own company. They ended up starting five, including Gilt Groupe, 10gen, and businessinsider.com. No big deal.
These days Dwight is focused on 10gen, the company behind MongoDB, a leading open source NoSQL database. At the June 12, 2013, installment of AppNexus Engineering@Scale, Dwight sat down with AppNexus CEO and Co-Founder Brian O’Kelley to talk scaling and the future of big data.
When DoubleClick launched, much of what now constitutes a tech stack didn’t exist. As a result, scaling in the early days of DoubleClick wasn’t about improving or expanding a tech stack but about creating one. Take geolocation software for example. In 1995 Dwight wrote his own geotargeting code because that critical tool for Internet ad tech hadn’t yet been invented. Even basic technology that did exist – like web browsers – had so many scaling limitations that Dwight and his team at DoubleClick developed their own homegrown solutions to meet the company’s scaling needs.
These days, as the former CEO and now Chairman of 10gen, Dwight has a full tech stack to scale. Within that tech stack, he believes that the data layer poses the most challenges for scalability – specifically horizontal scalability. Two things make it particularly hard to scale traditional databases horizontally: distributed joins and distributed transactions.