Few people are more familiar with website scalability problems than Theo Schlossnagle. Not only is Theo the founder and CEO of OmniTI, he is also the author of Scalable Internet Architectures, a book that draws on his 15 years of experience to provide developers with a blueprint for tackling the biggest obstacles to successful scaling. Theo shared his wisdom in a recent AppNexus Engineering@Scale talk.
Theo kicks the discussion off by breaking down the three biggest challenges of scaling systems: storing and accessing data, messaging, and caching.
When you face the issue of scaling, your first step is to decide whether to scale up (bigger boxes) or scale out (more boxes). It’s a simple question, but one that people often get wrong. Theo’s rule of thumb is that you should never scale out when you know you can scale up: if your projections show you growing at or below the pace of Moore’s Law, you should always just use bigger boxes. Scaling out means spending engineers’ time on a technical infrastructure problem when that time could be better spent elsewhere.
If you are unsure of whether or not you can scale up, Theo advises doing the following:
- Understand your problem: This seems obvious, but it is important to understand what exact problems you are trying to address.
- Project your possible needs and growth over the next 12 to 24 months: You do not want to deploy a solution that will be immediately obsolete because the size of the problem changed while you were building it.
- Deeply understand the technology at hand: Remember that “new” technology is often not thoroughly tested, well understood, or supported. The older technologies have been in production systems for years, are well understood and well supported, and have strong communities behind them. Just because there is a new hotness out there does not mean it is a good match for your needs.
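The rule of thumb about Moore’s Law can be turned into a quick back-of-the-envelope check. The sketch below is not from the talk: it assumes hardware capacity doubles roughly every 24 months (a common reading of Moore’s Law, and a parameter you should adjust), and the function names are hypothetical.

```python
# Back-of-the-envelope check: does projected growth stay at or below
# the pace of hardware improvement? All names here are illustrative.

def annual_growth_from_doubling(months_to_double: float) -> float:
    """Yearly growth factor implied by doubling every `months_to_double` months."""
    return 2 ** (12 / months_to_double)


def projected_load(current_load: float, annual_growth: float, months: float) -> float:
    """Compound the current load forward by `months` at `annual_growth` per year."""
    return current_load * annual_growth ** (months / 12)


def can_scale_up(annual_growth: float, hardware_doubling_months: float = 24.0) -> bool:
    """True when growth is at or below the assumed hardware pace.

    The 24-month default is an assumption, not a figure from the talk.
    """
    return annual_growth <= annual_growth_from_doubling(hardware_doubling_months)


if __name__ == "__main__":
    # Example: 1,000 requests/s today, growing 30% per year, projected 24 months out.
    print(projected_load(1000, 1.3, 24))
    print(can_scale_up(1.3))   # growth below the assumed hardware pace
    print(can_scale_up(2.0))   # doubling every year outpaces it
```

With these assumptions, a system growing 30% per year stays within what bigger boxes can absorb, while one that doubles every year does not.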
How your subsystems communicate with each other matters, and so does how you make your data available for consumption; the importance of both can’t be overemphasized. Some consumers actively need the data, but there are also passive listeners who might need access to it. Doing this well is what allows you to run the kind of problem-solving analytics that is essential to successful scaling.
Theo recommends using a producer:consumer paradigm for all messaging and highlights that the most common “unexpected” or passive consumers are monitoring systems – something that too few engineers take into consideration.
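To make the producer:consumer idea concrete, here is a minimal in-process sketch. It is an illustration only, not the messaging system discussed in the talk: the `MessageBus` class, the `orders` topic, and the handlers are all hypothetical. The point it shows is that a passive consumer such as a monitoring counter can subscribe without the producer knowing or changing.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Tiny in-process publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a consumer for a topic; producers never see this list."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a message to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
processed: List[int] = []
metrics = {"orders_seen": 0}


def fulfil_order(message: dict) -> None:
    # Active consumer: it needs the data to do its job.
    processed.append(message["order_id"])


def count_orders(message: dict) -> None:
    # Passive consumer: monitoring that only observes the stream.
    metrics["orders_seen"] += 1


bus.subscribe("orders", fulfil_order)
bus.subscribe("orders", count_orders)
delivered = bus.publish("orders", {"order_id": 42})
```

In a real deployment the bus would be a message broker rather than a Python object, but the shape is the same: the producer publishes once, and monitoring is just another subscriber.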
Without caching, the Internet does not work. Caching is complex and happens at so many levels that intelligent caching is absolutely essential.
No silver bullets
Theo makes it clear that there are no silver bullet solutions for these scaling challenges. He does, however, outline two key strategies that will significantly reduce the negative impact those three scaling problems can have.
Monitor what matters. Monitoring for the sake of monitoring is a waste of time. What matters most are the metrics that help you diagnose problems, help you plan for the future, and help you tangibly affect the success of your business. Don’t drag an engineer out of bed in the middle of the night to swap out a dead hard drive on a database machine that is not mission critical. Make sure your monitoring, metrics, and alerts are tied to business value so that you and your entire team will have rational, aligned priorities. Doing this turns IT into a profit center instead of a cost center.
Remember: You can’t scale what you can’t measure. Quantify your problem and get all of the stakeholders to agree.
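The dead-hard-drive example above can be read as an alert-routing rule. The sketch below is one possible interpretation, not a policy stated in the talk: the `Alert` fields, the `Action` values, and the host names are hypothetical, and a real policy would likely weigh more than a single flag.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page the on-call engineer now"
    TICKET = "file a ticket for business hours"


@dataclass(frozen=True)
class Alert:
    host: str
    problem: str
    mission_critical: bool  # assumption: set from an inventory of business-critical systems


def route(alert: Alert) -> Action:
    """Wake someone up only when the affected system is mission critical."""
    if alert.mission_critical:
        return Action.PAGE
    return Action.TICKET


# Hypothetical hosts: same failure, different business impact.
reporting_disk = Alert("reporting-db-3", "dead hard drive", mission_critical=False)
orders_disk = Alert("orders-db-1", "dead hard drive", mission_critical=True)

print(route(reporting_disk).value)
print(route(orders_disk).value)
```

The same hardware failure produces different responses because the decision is keyed to business impact, not to the failure itself.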
Decoupling correctly is the number one component of scalability. Typically, only one component in a system will have scalability issues at any given time, so isolate your problems, decouple them, and solve each in its own way. This gives you the freedom to find the right solution to scale each individual component without ever having to make massive changes to your entire system. If you decouple correctly, you’ll be agile and flexible.