PyData 2012: Rapid Iteration With Python — Scaling AppNexus

At PyData NYC, Dave Himrod and Steve Kannan, respectively our Director and our Engineering Manager of Optimization and Analytics*, delivered the keynote presentation, sharing best practices and lessons learned when iterating and scaling with Python.

Python is versatile enough that their team uses it both for offline analytical tasks and for production system development, which is harder to do with more complicated programming languages. They dove into the benefits of rapid prototyping and the importance of tightly integrating research with production. In their presentation, they explored specific tools including pandas, NumPy, and IPython, and how these have enabled us to quickly data-mine across disparate data sources, explore new algorithms, and rapidly bring new processes into production.

Lucky for us, the presentation was recorded. Enjoy!

*Affectionately referred to as OCT: Optimization Crack Team.

Surge 2012: Mike Nolet presents “From Zero to 500k QPS in Three Years: Scaling AppNexus”

By popular request, a recording of CTO & Co-Founder Mike Nolet’s presentation at Surge 2012 is finally available! For those who don’t want to dig through thousands of cat videos to find it, we’ve posted it here for your convenience.

Hear our story of how we went from two guys on a couch who thought they could build a cloud-hosting company to more than 300 employees building the world leader in real-time advertising technology. Mike’s presentation gives an uncensored dive into how, over the course of one weekend, we used GSLB and KeepAlive to hack our servers into place, built a backend to support them, and freed ourselves from the grips of expensive LB vendors (you can learn even more about that here). How, because we’ve always invested in full-time resources for our fully automated deployment world, we managed to not hire a single test engineer until March 2012. And how a stupid mistake took down 3/4 of our global data centers for 12 hours, yet we still managed to maintain 99.5% uptime.

He explores our devops infrastructure, powered by our in-house built continuous deployment system; our load-balancing infrastructure and its many layers; and our real-time and data streaming infrastructure, built primarily in C and processing 12 terabytes of data every day.

This is our story of scaling from zero to 800k queries per second in three years.

Ad-ing Value: How to value tens of millions of ad opportunities per hour

[Editor's note: Be sure to catch Steve presenting "Rapid Iteration with Python: Scaling AppNexus" with Dave Himrod at PyData NYC on October 26, 2012]

AppNexus serves nearly 10 billion ads per day. On average our ad servers serve over 100k ads per second, and an order of magnitude more at peak. Each ad decision is determined by a real-time auction, so our system must determine the value of all eligible ad creatives every time an impression is available for bid. We leverage our vast data resources to determine these values, and we aim to do so as accurately as possible. However, this requires computationally expensive mathematical models that could not reasonably be evaluated in the milliseconds-long duration of an ad auction. So how is this possible?

Enter the AppNexus Optimization Engineering Team, which builds, among other things, scalable infrastructure for offline calculations relevant to ad valuation. Such calculations include modelling the effect of user frequency on click-through rate and determining the expected value of an ad creative that has never been served before. In the latter case, we have to pre-calculate values for each ad creative on every segment of inventory (modulo targeting restrictions), resulting in tens of millions of combinations. The Optimization Eng team has built a system designed for this use.

More specifically, we developed a system for splitting up large, CPU-intensive jobs into manageable, independent chunks and distributing them across a cluster of machines for processing. Continue reading
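To make the chunking idea concrete, here is a minimal sketch in Python. It is not the AppNexus system: the names `chunked`, `value_chunk`, and `value_all`, the scoring formula, and the chunk size are all hypothetical. The distribution step is left pluggable through a `map_fn` argument, so the same code can run locally with the built-in `map` or be handed a parallel map.

```python
from itertools import product


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def value_chunk(pairs):
    """Hypothetical stand-in for an expensive valuation model.

    Each chunk is independent: it needs only its own (creative, segment)
    pairs, so any worker can process it without talking to the others.
    """
    return [
        (creative, segment, (creative * 31 + segment * 17) % 100 / 100.0)
        for creative, segment in pairs
    ]


def value_all(creatives, segments, chunk_size=1000, map_fn=map):
    """Value every creative/segment pair, one chunk at a time.

    `map_fn` is any map-like callable. The built-in `map` runs the chunks
    serially; a pool's `map` method would spread them across workers.
    """
    pairs = list(product(creatives, segments))
    results = []
    for chunk_result in map_fn(value_chunk, chunked(pairs, chunk_size)):
        results.extend(chunk_result)
    return results
```

Because each chunk is self-contained, swapping in something like `concurrent.futures.ProcessPoolExecutor().map` for `map_fn` is one way to use several cores; spreading work across a cluster of machines would need a job queue or scheduler that this sketch does not attempt to model.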

Testing in Python: using Django

As promised! Let’s dig into how we run tests using Django.

We run a web server on Apache, with the code written in Django. On it we publish information about the state of the optimization worker jobs, the advertiser/publisher data upload systems, and the like for other engineers and analysts to use.

How to run Django test suites per application:
We run Django tests separately on our testing framework, with the command:

$ python path/to/manage.py test

Django tests are documented at https://docs.djangoproject.com/en/dev/topics/testing/. The documentation says that if you have a Django application, you can put your tests in a file named tests.py in the application’s directory. Let’s name the application app1.

What they say:

$ ls www
app1
$ ls app1/
tests.py
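For illustration, a tests.py in that layout might look like the sketch below. The post does not show the file's contents, so everything here is hypothetical, including the `summarize_job_states` helper, which in a real app would live in the application's own modules and be imported. The sketch uses the standard library's `unittest.TestCase`; tests that need Django's database handling would typically subclass `django.test.TestCase`, which itself derives from `unittest.TestCase`.

```python
# app1/tests.py (hypothetical contents, for illustration only)
import unittest


def summarize_job_states(states):
    """Hypothetical helper: count how many jobs are in each state."""
    counts = {}
    for state in states:
        counts[state] = counts.get(state, 0) + 1
    return counts


class SummarizeJobStatesTest(unittest.TestCase):
    def test_counts_each_state(self):
        result = summarize_job_states(["running", "done", "running"])
        self.assertEqual(result, {"running": 2, "done": 1})

    def test_empty_input(self):
        self.assertEqual(summarize_job_states([]), {})
```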

However, I don’t like this test organization. If we write many tests, the tests.py file will become huge. Also, I want each test filename to reflect which module is being tested.

What I want:
Continue reading

Testing in Python: using nose & mocks

Here on the AppNexus Optimization team, we write code to optimize algorithms for our customers. We want to write great code, so of course we test our code. We write and unit test our code in Python and run continuous integration using Jenkins. We have one Jenkins instance up that covers many of the different engineering teams. Our check-ins automatically trigger tests to run in Jenkins, so if we break pre-existing unit tests, we know right away.

The actual test command called by Jenkins makes use of the Python nose module. Nose walks the directory structure and picks up any modules, classes, and functions that look like tests for the code being run. It is mostly smart and very useful. We can also import nose into individual test modules and just run those files directly. This is by far our preferred approach because it makes debugging a whole lot easier.

Whenever we write tests in Python, we have choices on what to use for mocks and stubs. For development, we usually use the mock module. Mocking allows for “fake” objects upon which we can impose certain behaviors. This allows us to separate the function we are testing from the functionality called by the function we are testing.
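As a minimal sketch of that separation, the example below tests a function without touching the service it depends on. The function `fetch_click_through_rate` and the client's `get_stats` method are hypothetical names invented for this illustration. The sketch imports `unittest.mock`, the standard-library version of the mock module (available since Python 3.3); with the standalone mock package the import would be `import mock` instead.

```python
from unittest import mock


def fetch_click_through_rate(client, creative_id):
    """Hypothetical function under test: turn raw stats into a rate."""
    stats = client.get_stats(creative_id)
    if stats["impressions"] == 0:
        return 0.0
    return stats["clicks"] / stats["impressions"]


def test_fetch_click_through_rate():
    # The fake client stands in for whatever service really provides stats.
    client = mock.Mock()
    client.get_stats.return_value = {"clicks": 5, "impressions": 200}

    assert fetch_click_through_rate(client, 42) == 0.025
    # Check that the dependency was called the way we expect.
    client.get_stats.assert_called_once_with(42)


def test_fetch_click_through_rate_no_impressions():
    client = mock.Mock()
    client.get_stats.return_value = {"clicks": 0, "impressions": 0}

    assert fetch_click_through_rate(client, 7) == 0.0
```

The functions follow the `test_` naming convention, so a test runner that collects by name can find them without extra registration.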

Our team also runs a few internal websites using Django. In this entry I will go over the basics of Python test-writing using the nose and mock modules. Check back later this week for my post on how we run our tests using the Django framework.

Nose

Before I dive into how to test your code, let’s quickly discuss nose, a module we use for Python testing. Nose is a pretty clever test module: when you run it, it looks for all the test-like files, calls the matching functions in the code base, calculates their coverage, and so on. You can install it using pip; the documentation is at https://nose.readthedocs.org/en/latest/

Run the following (from a root directory):
Continue reading
