Dwight Merriman is a tech legend and entrepreneur extraordinaire. Dwight co-founded DoubleClick in 1995 and served as the company’s CTO for a decade. As CTO, Dwight designed the infrastructure for the DART ad serving technology that now drives Google’s profits. After selling DoubleClick in 2005, he and fellow executive Kevin Ryan left to start their own company. They ended up starting five, including Gilt Groupe, 10gen, and businessinsider.com. No big deal.
These days Dwight is focused on 10gen, the company behind MongoDB, a leading open source NoSQL database. At the June 12, 2013, installment of AppNexus Engineering@Scale, Dwight sat down with AppNexus CEO and Co-Founder Brian O’Kelley to talk scaling and the future of big data.
When DoubleClick launched, much of what now constitutes a tech stack didn’t exist. As a result, scaling in the early days of DoubleClick wasn’t about improving or expanding a tech stack but about creating one. Take geolocation software, for example. In 1995 Dwight wrote his own geotargeting code because that critical tool for Internet ad tech hadn’t yet been invented. Even basic technology that did exist, like web servers, had so many scaling limitations that Dwight and his team at DoubleClick developed their own homegrown solutions to meet the company’s scaling needs.
These days, as the former CEO and now Chairman of 10gen, Dwight has a full tech stack to scale. Within that tech stack, he believes that the data layer poses the most challenges for scalability – specifically horizontal scalability. Two things make it particularly hard to scale traditional databases horizontally: distributed joins and distributed transactions.
The MongoDB team is working to tackle both of these and has already gone through a few attempts at implementing distributed joins without any simplifying assumptions. Initially, they decided to scrap distributed joins altogether because they were such a tough roadblock. That meant MongoDB could not be relational, so a different data model, the document model, was chosen instead. The challenge now is figuring out how to create something with enough functionality to cover the majority of use cases. As Dwight points out, the main difficulty with distributed joins and distributed transactions, and with databases in general, is the engineering tradeoff between functionality on one hand and speed and scalability on the other.
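To make the join problem concrete, here is a minimal Python sketch, not MongoDB's actual implementation. It assumes two toy in-memory "shards" and hypothetical `customers` and `orders` tables, and counts how many shards a lookup has to touch when related data is split apart versus when it is embedded in a single document.

```python
# Two toy "shards", each holding part of two tables. All names are hypothetical.
shard_a = {"customers": {1: {"name": "Ada"}}, "orders": {}}
shard_b = {"customers": {}, "orders": {100: {"customer_id": 1, "total": 25}}}
shards = [shard_a, shard_b]


def join_order_with_customer(order_id):
    """Relational-style read: find the order, then find its customer.

    Returns the joined record and the number of shard lookups it took.
    """
    touched = 0
    order = None
    for shard in shards:
        touched += 1
        if order_id in shard["orders"]:
            order = shard["orders"][order_id]
            break
    if order is None:
        return None, touched

    customer = None
    for shard in shards:
        touched += 1
        if order["customer_id"] in shard["customers"]:
            customer = shard["customers"][order["customer_id"]]
            break
    return {**order, "customer": customer}, touched


# Document-style layout: the customer data is embedded in the order,
# so one read from one shard answers the same question.
doc_shard = {100: {"total": 25, "customer": {"id": 1, "name": "Ada"}}}


def read_embedded_order(order_id):
    """Document-style read: a single lookup on a single shard."""
    return doc_shard[order_id], 1


joined, join_lookups = join_order_with_customer(100)
embedded, embedded_lookups = read_embedded_order(100)
print(join_lookups, embedded_lookups)
```

In this toy setup the join has to visit more than one shard, while the embedded read stays on one. The price is the functionality tradeoff Dwight describes: embedded data is duplicated and cannot be recombined as freely as normalized tables.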
Looking to the future of big data, Dwight believes that there will be a number of successful NoSQL products to choose from. Dwight sees NoSQL databases as having two distinct advantages over SQL databases in terms of scalability. First, NoSQL databases offer a lot more automation. Traditional relational databases scale vertically far more readily than horizontally, so users are often forced to split data up themselves through manual sharding at the application layer. MongoDB, by contrast, was designed to provide auto-sharding, a big plus for horizontal scalability.
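For readers unfamiliar with manual sharding, the following Python sketch shows roughly what "splitting data up at the application layer" means. It is an illustration only: the shard names, the `shard_for` helper, and the `ManuallyShardedStore` class are all hypothetical, and the shards are plain in-memory dictionaries rather than real database connections.

```python
import hashlib

# Hypothetical shard names; a real application would hold connections instead.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key, shards=SHARDS):
    """Choose a shard by hashing the key.

    hashlib is used instead of the built-in hash() so that the result is
    stable across processes and restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


class ManuallyShardedStore:
    """Toy key-value store where the application does its own routing."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.data = {name: {} for name in self.shards}

    def put(self, key, value):
        self.data[shard_for(key, self.shards)][key] = value

    def get(self, key):
        return self.data[shard_for(key, self.shards)].get(key)


store = ManuallyShardedStore()
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))
```

Every read and write path in the application has to go through this routing logic, and adding a shard changes the modulo result for most keys, which means moving data by hand. Auto-sharding moves that bookkeeping into the database itself.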
Second, many NoSQL databases store JSON-style documents, which, according to Dwight, makes them significantly more flexible than relational databases and therefore more conducive to scaling. MongoDB, for example, uses BSON, a binary encoding of JSON-like documents that allows schemas to be much more agile without detrimental tradeoffs in functionality.
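A small Python sketch can show what schema flexibility looks like in practice. It uses the standard library's `json` module rather than BSON or any MongoDB driver, and the `users` collection and `cities` helper are hypothetical names chosen for illustration.

```python
import json

# A "collection" modeled as a plain list of documents. The two documents
# have different shapes, and no schema change was needed to add the
# extra fields to the second one.
users = [
    {"_id": 1, "name": "Ada"},
    {
        "_id": 2,
        "name": "Grace",
        "emails": ["grace@example.com"],
        "address": {"city": "New York"},
    },
]


def cities(collection):
    """Query a nested field, skipping documents that do not have it."""
    return [doc["address"]["city"] for doc in collection if "address" in doc]


# Documents survive a serialization round trip unchanged.
encoded = json.dumps(users)
decoded = json.loads(encoded)

print(cities(decoded))
```

In a relational table, adding `emails` or `address` would typically mean altering the schema or adding new tables; here the new fields simply appear on the documents that need them.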
While the staying power of traditional databases is impressive, Brian and Dwight agree that it’s time for a new iteration to meet the ever-evolving demands of scaling technology. Dwight is aiming to make that next iteration come from MongoDB.