The Twitter Fail Whale and Global Optimization
Monday, 22 February 2010 10:28
Blogger: Richard Watson
I was looking for a good example to explain global vs. local optimization, and lo, one fell right out of the twittersphere at me. It came from the Twitter engineering team themselves. Ed Ceaser (@asdf) and Nick Kallen (@nk) recently posted a blog entry entitled "The Anatomy of the Whale". The entry discusses the team’s efforts to track down a capacity problem that caused too many people to see the 'fail whale', Twitter's visual representation of an HTTP 503 Service Unavailable error. As the guys say:
It struck me then that finding a performance bottleneck is a fantastic example of a global process optimization problem. Any seasoned developer knows that finding performance issues is real detective work. In my years as a developer, I learned to recognize performance blind alleys and red herrings (to name but two clichés), such as investing time in optimizing one part of an end-to-end process without knowing whether that part is the real bottleneck. Ed and Nick offer the following great advice for any optimization effort: "Focus on the biggest contributor to the problem". Tracking down the biggest contributor to the problem is where the need for visibility comes in.

Visibility

Gaining visibility into any process has a number of aspects:
The Twitter team describes this measurement data problem as:
By visualizing the performance data, the Twitter team discovered that their problem was a decay, during peak loads, in the throughput of data delivered from their Memcached-based distributed caching subsystem. Armed with this information, they tackled the problem in two ways: reduce the volume of calls to Memcached (they found 7 out of 17 calls to Memcached were unnecessary), and beef up the Memcached cluster.

Transparency

There is a further point to make about the Twitter blog post: it is another example of the Twitter team doing their engineering in public. This transparency gives them credibility (Amazon and Salesforce.com, are you listening?) and positively affects Twitter’s relationship with their customers, potential paying customers, and investors. I have blogged about the work of the Twitter engineering team before. Their commitment to transparency continues to impress me. I'd rather hear my service providers say "look, we have a problem, here's how we measured it, and here are the steps we are taking to resolve it" than have them operate on a "Wizard-of-Oz-behind-the-magic-curtain" basis.

Posted: 2010-02-22 17:28:49 Author: Richard Watson




