tech blog

AppNexus is today’s most powerful, open, and customizable ad tech platform. Advertising’s largest and most innovative companies build their businesses on AppNexus.

Real Time Big Data: Tech Talks @ AppNexus

Another in our “Tech Talks” series @ AppNexus.

Lambda architectures for Big Data sprang out of the concept that stream processing is fast but inaccurate. So you needed to have a fast but less accurate streaming path along with a slower, accurate batch path. With the advent of technologies like Ratsoda and Samza, this is no longer the case.
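To make the two-path idea concrete, here is a minimal sketch in Python of how a Lambda-style query might combine a batch view with a streaming view. Everything in it (`batch_view`, `SpeedLayer`, `query`, the event shape) is a hypothetical name chosen for illustration, not anything from the AppNexus platform or the talk. For simplicity the streaming path here counts exactly; in a real Lambda setup it would typically be the approximate side.

```python
from collections import Counter


def batch_view(events):
    """Slow, accurate path: recompute exact counts from the full event log."""
    return Counter(event["key"] for event in events)


class SpeedLayer:
    """Fast path: incrementally counts events the batch job has not covered yet."""

    def __init__(self):
        self.counts = Counter()

    def ingest(self, event):
        self.counts[event["key"]] += 1

    def reset(self):
        # Called once a new batch view has absorbed these events.
        self.counts.clear()


def query(batch, speed, key):
    """Serve a query by merging the batch view with the recent streaming counts."""
    return batch[key] + speed.counts[key]


# Hypothetical event log already processed by the batch job.
log = [{"key": "ad-1"}, {"key": "ad-2"}, {"key": "ad-1"}]
batch = batch_view(log)

# One event has arrived since the last batch run.
speed = SpeedLayer()
speed.ingest({"key": "ad-1"})

total_ad1 = query(batch, speed, "ad-1")
total_ad2 = query(batch, speed, "ad-2")
```

The cost of this design is visible even in the sketch: the same counting logic lives in two places and has to be kept in agreement, which is the burden a sufficiently accurate streaming system removes.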

Featuring AppNexus Principal Engineer Riley Berton. Watch the video to find out how AppNexus does real-time streaming joins across our entire platform to the tune of 200+ billion rows per day with minimal latency on minimal hardware. Learn about upcoming AppNexus tech talks at http://www.meetup.com/TechTalks-AppNe…

Securing our Big Data Platform Part 2

Overview

When last we saw our heroes, they were battling Hadoop authentication in DPaaS. But our intrepid knights were able to overcome that crafty foe and move on to the next obstacle: authorization. And that’s where our story continues…

Authorization is the control of access to resources or actions for an entity that’s requesting them. Implementing a proper authorization layer within our system is what will give our users the peace of mind that they are in control of their data & processes and are properly isolated from one another. As the operators of the system, we gain the same peace of mind and more. We get more control over the use of our finite resources. Better yet, we can take ourselves out of the critical path of our users’ needs — more automatic security means more self-service. That’s a big win-win.

Ad Viewability and Feature Selection for Big Data

Data Science … Translating Theory into the Real World

In today’s post, I will unite the theory behind a machine learning technique with a real-world industry problem. I’ll describe one problem we needed to solve, how we solved it by building a new tool, and how you can also leverage it!

Hopefully after reading this, you’ll have a better idea how we use data at AppNexus. Or, maybe you can sympathize with me after encountering yet another memory issue, and this tool will help alleviate your pain!

Pyrobuf – A faster Python Protobuf library written in Cython

Introduction

Pyrobuf is an alternative to Google’s Python Protobuf library that is written in Cython and that offers better performance (roughly 1.5-2x faster), Python 3 support, and simple serialization/deserialization to JSON and native Python dictionaries. Since Pyrobuf’s only installation requirements are Cython, Jinja2, and setuptools, it’s also much easier to install than Google’s library, and should work as a drop-in replacement. Pyrobuf parses the same .proto specs as the Google library and generates Python modules (in .so form).

Pyrobuf started as just one part of a larger Python “port” of a C library that we use for serializing and deserializing data between a variety of formats. We plan to open-source this library in the future, but in the meantime we saw such good results with the Protobuf portion of the library that it seemed worthwhile to release it on its own.

Software Transactional Memory is Simple

Another installment of our ongoing series of tech talks has just hit the youtubes. Paul Khuong, previously of this fame, brings us “Software Transactional Memory is Simple”, which takes us into the innards of the AppNexus real-time platform and how we do configuration data updates in such a low-latency environment.

AppNexus’ real-time adserving stack is built on non-blocking concurrency control, which is how we achieve sub-1% timeout rates. In practice, it’s easy to misuse these techniques, which can result in crashes or angry clients. In this tech talk, AppNexus Principal Engineer Paul Khuong talks about software transactional memory and how to implement it (simply), and describes the specialized non-blocking STM we use here at AppNexus. Sign up to our MeetUp group to stay up to date with upcoming tech talks: http://www.meetup.com/TechTalks-AppNexus-NYC/
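For readers new to the idea, the core of software transactional memory can be sketched in a few dozen lines of Python. This is a toy illustration, not the STM from the talk: it validates read versions at commit time and retries on conflict, but it uses a single global lock for commits, so unlike the AppNexus implementation it is not non-blocking. The names `TVar`, `Transaction`, and `atomically` are hypothetical.

```python
import threading

# Assumption for brevity: one global lock serializes commits and snapshots.
_commit_lock = threading.Lock()


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed on first read
        self.writes = {}  # TVar -> value to publish on commit

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # buffered until commit

    def commit(self):
        with _commit_lock:
            # Abort if anything we read has changed since we read it.
            if any(tvar.version != seen for tvar, seen in self.reads.items()):
                return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(fn):
    """Run fn(tx) repeatedly until its transaction commits without conflict."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker():
    for _ in range(1000):
        atomically(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = counter.value
```

Writes are buffered and only published if every value the transaction read is still current, so a conflicting update forces a retry rather than a lost increment.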

pool_party – Transactional allocation for short-lived requests

Baby’s first memory allocator – background

Let me start off by saying that one really isn’t supposed to write custom memory allocators. Berger, Zorn, and McKinley thoroughly debunked the need for them by the vast majority of the population in [1]. If you’re looking for a speedy allocator, projects like jemalloc, tcmalloc, and dlmalloc have done such a great job improving on glibc without changing the malloc/free interface that you shouldn’t even have to think about your memory allocator unless you’re doing something hugely performance sensitive. Just throw jemalloc on the case and get a free performance improvement, then don’t bother with it again. That was our theory when we moved to jemalloc at AppNexus in August of 2011, and it continues to be the theory. But then we built our own allocator anyway. This is the story of why and how.

Stopping Invalid Traffic using Spark Streaming, Kafka, and Science!

The Signals Intelligence group (SIGINT) within AppNexus Data Science has the responsibility of identifying and stopping invalid traffic as quickly as possible. One technique they use to achieve this very quickly is collecting, aggregating and acting on streaming data using Kafka and Spark Streaming.

Watch this video to learn how AppNexus uses these systems, some of the data science findings, the challenges and tribulations they’ve had to overcome, and how you can put these techniques into practice yourself.

Optimize - Call for Speakers

An Open Call to Submit Papers for AppNexus Optimize NYC

We’ve always believed that creating a better Internet is a task far greater than the efforts of any given company or entity. That was the intention behind hosting our first-ever AppNexus Optimize which took place in London earlier this summer. Catherine Williams, our Chief Data Scientist, had this to say in her opening remarks for the event:

“We want to give you deeper insight into [AppNexus’] technology and tools, and share with you our learnings and best practices. But it’s not just a one-way street. We [also] want to hear from you… We want you to share with us your own learnings and best practices, your challenges and successes. And we’re hoping for you to learn from each other as well.”

Traditions tend to develop quickly here at AppNexus; we’re excited to announce that we’ll be hosting our second Optimize event, this time in New York City on November 3, 2015. We’ll be discussing everything from the impact of ad fraud across our industry to how AppNexus is bursting the video cost bubble with a set of new video buying capabilities. In addition to product announcements and technical breakouts, we’re offering the broader tech community an opportunity to “Razzle Dazzle” us with their own projects and solutions that power the tech that powers ad tech. We actively invite, challenge, welcome, and encourage all engineers, technologists, data scientists, and developers from outside AppNexus to submit projects that demonstrate the possibilities for advanced real-time data integrations of all kinds. Those who submit their papers by Friday, October 23 will have a chance to present their findings on the main stage at AppNexus Optimize NYC.

We look forward to hearing from you all!

Submit your paper

Introduction to the Actor Model for Concurrent Computation

The Actor Model is a computational model for designing concurrent, distributed systems around the principle of self-contained Actors which operate by sending and receiving messages. While the idea has been around since the mid-to-late 1970s, it is now gaining more traction with frameworks and languages such as Akka and Erlang, which share many similar principles.
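As a rough illustration of the idea, independent of Akka or Erlang, here is a minimal actor in plain Python: each actor owns its state, has a mailbox, and processes one message at a time on its own thread. The `Actor` and `CounterActor` classes are hypothetical names for this sketch and do not mirror any framework's API.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells an actor to shut down


class Actor:
    """Base actor: a private mailbox drained by a single dedicated thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """Asynchronously deliver a message; the caller never touches actor state."""
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to finish queued messages, then wait for it to exit."""
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class CounterActor(Actor):
    """Sums the numbers it receives; only its own thread mutates the total."""

    def __init__(self):
        self.total = 0
        super().__init__()  # start the thread only after state is initialized

    def receive(self, message):
        self.total += message


counter = CounterActor()
for n in range(1, 11):
    counter.send(n)
counter.stop()

total = counter.total
```

Because messages are handled one at a time and state is never shared, the actor needs no locks of its own; the mailbox is the only point of synchronization.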

See John Murray, Senior Software Engineer at AppNexus, serve up an introduction to Actor Model principles and concepts and give a look at how we can construct parallel systems using the Akka framework.

Learn more about upcoming Tech Talks at AppNexus here http://go.appnexus.com/TechTalks.html

In this tech talk, John Murray, Senior Software Engineer at AppNexus, serves up an introduction to Actor Model principles and concepts and gives a look at how we can construct parallel systems using the Akka framework. The Actor Model is a computational model for designing concurrent, distributed systems around the principle of self-contained Actors which operate by sending and receiving messages. While the idea has been around since the mid-to-late 1970s, it is now gaining more traction with frameworks such as Akka and Celluloid as well as languages such as Go, which share many similar principles. Learn more about events at AppNexus at www.appnexus.com/razzledazzle.