
Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning

Authors: Daniel A. Menascé, Virgilio A. F. Almeida

ISBN: 0130863289


Calculating Scalability
by Alan Zeichick

If 750 customers are browsing your commerce site at a given time, with 10 percent of them in secure SSL-based checkout sessions and 20 percent running database queries, are you sure that all of them will see the sub-eight-second response time that’s the Holy Grail of Web usage models? How many additional users would it take to drive average response time up to 10 seconds? If half of the users who experience response times longer than eight seconds abandon your site after the fifth page view, what’s the cost of the lost business? Where is the performance bottleneck: the router, the application server, the Web server, the middleware, or the disk subsystem? Would saving some of that business justify the cost of rearchitecting the application or throwing more hardware at the Web site?

Those are the sorts of issues addressed by “Scaling for E-Business,” a pragmatic guide to determining whether a Web site can survive deployment on an intranet or on the public Internet. After all, current Web testing and application profiling techniques are still new and crude, and due to the characteristics of Web and application servers, it’s extremely difficult to predict how many transactions per second a particular solution can handle—and even harder to translate that prediction into practice.

The authors, who are professors of computer science at George Mason University and at Brazil’s Federal University of Minas Gerais, clearly understand both the technology and the business imperatives behind a fast Web site. The book begins by presenting a business model for e-commerce sites, with separate sections for business-to-consumer retail sites and the more complex business-to-business model. There’s nothing groundbreaking there—until they get to the numbers, demonstrating that a text-only auction site with 1,100 categories, 120,000 items, 45,000 daily auctions, 1.9 million users and 4.8 million daily page views would require a 120Mbps network connection were it to add multimedia capabilities to its product-description pages.
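The review doesn’t give the page sizes or peak-load assumptions behind the 120Mbps figure, so the following is only a back-of-the-envelope sketch of this kind of calculation, not the book’s own derivation. It converts daily page views into a link speed, and works backward from a link speed to the average page size it could carry. The `peak_to_average` factor and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_bandwidth_mbps(daily_page_views, avg_page_kbytes, peak_to_average=1.0):
    """Estimate the link bandwidth (Mbit/s) needed for a daily page-view volume.

    avg_page_kbytes: average bytes transferred per page view, in kilobytes (1 KB = 1,000 bytes)
    peak_to_average: hypothetical ratio of the peak traffic rate to the all-day average rate
    """
    views_per_second = daily_page_views / SECONDS_PER_DAY
    bits_per_view = avg_page_kbytes * 1_000 * 8
    return views_per_second * bits_per_view * peak_to_average / 1_000_000


def implied_page_kbytes(daily_page_views, link_mbps, peak_to_average=1.0):
    """Invert the estimate: the average page size (KB) a given link speed can sustain."""
    views_per_second = daily_page_views / SECONDS_PER_DAY * peak_to_average
    return link_mbps * 1_000_000 / views_per_second / 8 / 1_000
```

With the review’s figures, `implied_page_kbytes(4_800_000, 120)` comes to about 270 KB per page view if traffic were spread evenly across the day. Any peak-to-average ratio above 1 shrinks that figure proportionally, which is why real capacity plans size for the busiest hour rather than the daily mean.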

Because a Web site’s overall performance needs to satisfy customer expectations, the authors next dive into an analysis of online customer behavior using a technique they call Customer Behavior Model Graphs (CBMGs). They preach the need to model this behavior, not only to predict current site performance patterns but also to make formal what-if predictions about adding new site features or rearranging its pages.

The CBMG provides a very clear way of seeing exactly how a site is designed, which states a particular customer can be in while using it (such as login, browse, pay and exit), and how to calculate the computational cost of moving from one state to another. Once the model of the site is complete, the book shows how to use Web-site logs to populate a state matrix giving the transition probability for each pair of states—and therefore where to ensure that each transition is seamless and has sufficient computing resources.
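A minimal sketch of populating such a transition matrix, assuming the logs have already been split into sessions and each request mapped to a state name. That mapping step, the `Entry`/`Exit` pseudo-states and the function name are illustrative assumptions, not the book’s code. Each row is simply the observed transition counts out of one state, normalized to sum to 1.

```python
from collections import defaultdict


def transition_matrix(sessions):
    """Estimate CBMG-style transition probabilities from per-session state sequences.

    sessions: list of lists of state names, in the order each customer visited them
    Returns {from_state: {to_state: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in sessions:
        # Count every consecutive pair of states within a session.
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        matrix[src] = {dst: n / total for dst, n in row.items()}
    return matrix


# Hypothetical sessions reconstructed from a Web log.
EXAMPLE_SESSIONS = [
    ["Entry", "Home", "Search", "Exit"],
    ["Entry", "Home", "Browse", "Pay", "Exit"],
    ["Entry", "Search", "Exit"],
    ["Entry", "Search", "Browse", "Exit"],
]
```

For these sample sessions, two of the three transitions out of `Search` go straight to `Exit`, so `transition_matrix(EXAMPLE_SESSIONS)["Search"]["Exit"]` is 2/3, exactly the kind of pattern the review flags next.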

For example, if many users are transitioning from the search-engine page directly to the exit, that might imply that there’s a problem with the search engine, or that it’s hard to apply search results to other parts of the Web site. They also show how to map each state to a particular e-business function, and thereby put a financial value on each state transition.
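One way to attach money to a transition table is to compute how often each state is visited per session and multiply by a value per visit. The sketch below solves the visit-count balance equations V[j] = [j is the entry state] + Σ V[k]·P[k][j] by fixed-point iteration. The transition probabilities and the $50 value of a `Pay` visit are hypothetical, and this is one standard formulation rather than a transcription of the book’s.

```python
def expected_visits(P, entry="Entry", exit_state="Exit", iterations=1000, tol=1e-12):
    """Average number of visits to each state per session.

    Iterates V[j] = [j == entry] + sum_k V[k] * P[k][j] until it stops changing.
    Converges as long as every state can eventually reach exit_state.
    """
    states = set(P) | {dst for row in P.values() for dst in row}
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        new = {s: (1.0 if s == entry else 0.0) for s in states}
        for src, row in P.items():
            if src == exit_state:
                continue  # the session ends at exit, so nothing flows out of it
            for dst, p in row.items():
                new[dst] += V[src] * p
        converged = max(abs(new[s] - V[s]) for s in states) < tol
        V = new
        if converged:
            break
    return V


def revenue_per_session(V, value_per_visit):
    """Expected revenue per session, given a (hypothetical) value for visiting each state."""
    return sum(V.get(state, 0.0) * value for state, value in value_per_visit.items())


# Hypothetical transition probabilities; Search loops back on itself 30% of the time.
EXAMPLE_P = {
    "Entry": {"Home": 1.0},
    "Home": {"Search": 0.6, "Exit": 0.4},
    "Search": {"Search": 0.3, "Pay": 0.2, "Exit": 0.5},
    "Pay": {"Exit": 1.0},
}
```

Here each session reaches `Pay` about 0.17 times on average, so at $50 per payment a session is worth roughly $8.57. Lowering the `Search` to `Exit` probability and rerunning the calculation shows what fixing a weak search page might be worth.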

The authors provide a similar usage model for back-end processes. Their Client/Server Interaction Diagrams map the architecture of the e-business system, and provide a framework for calculating the average and peak delays at each stage in the system, such as from the client browser to the Web server to the application server to the database, back to the app server, Web server and the client. Based on these diagrams, it’s easy to see which resources bear the brunt of the load and should therefore be optimized and monitored regularly.
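To illustrate the kind of arithmetic such a diagram feeds, here is a sketch that uses a simple open queueing model in place of the book’s full treatment. Each resource along the request path has a service demand D (seconds of busy time per request); at arrival rate λ its utilization is U = λD and its residence time is D/(1 − U). The resource names and demand values are hypothetical, and network latency between tiers is ignored.

```python
def open_network_estimate(demands, arrival_rate):
    """Per-resource utilization and residence time in a simple open queueing model.

    demands: {resource: service demand, in seconds of busy time per request}
    arrival_rate: requests per second entering the system
    Returns (per-resource results, total response time in seconds, bottleneck name).
    """
    results = {}
    for name, demand in demands.items():
        utilization = arrival_rate * demand  # Utilization Law: U = X * D
        if utilization >= 1.0:
            raise ValueError(f"{name} is saturated (utilization {utilization:.2f})")
        # Residence time grows sharply as a resource approaches saturation.
        results[name] = {"utilization": utilization,
                         "residence": demand / (1.0 - utilization)}
    total = sum(r["residence"] for r in results.values())
    # The largest demand saturates first and caps throughput at 1 / D_max.
    bottleneck = max(demands, key=demands.get)
    return results, total, bottleneck


# Hypothetical service demands (seconds per request) along the request path.
DEMANDS = {"web_server": 0.005, "app_server": 0.020, "db_cpu": 0.010, "db_disk": 0.030}
```

At 20 requests per second the database disk is the bottleneck at 60 percent utilization, and no amount of Web-server tuning lifts throughput past 1/0.030, or about 33 requests per second. A what-if such as a 20 percent faster disk is modeled by dividing its demand by 1.2 and rerunning the estimate.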

Taking things a step further, the next set of models goes down into the hardware and infrastructure, looking at bandwidth, redundancy and mean time between failures. These calculations should be familiar to network managers and systems analysts, but should be examined by application development staff as well, because it’s essential when designing a business-critical system to know where the weak points are, so that they can be accommodated by the code.
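The standard availability arithmetic behind these calculations fits in a few lines. The sketch below adds mean time to repair (MTTR), which the review doesn’t mention, and assumes component failures are independent; the MTBF and MTTR figures in the usage note are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state fraction of time a component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*availabilities):
    """All components must be up (e.g. router -> Web server -> database)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """At least one redundant component must be up; failures assumed independent."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down
```

For example, a single Web server with an MTBF of 2,000 hours and an MTTR of 8 hours is up about 99.6 percent of the time, while two of them behind a load balancer reach roughly 99.998 percent. In series with a single router, though, the router becomes the weak point.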

It keeps getting better. Each chapter delves into a different area, providing readers with performance models and, in doing so, explaining how the technology works and scales. How do you accommodate proxies and caches for a commerce site? See page 114. How much delay does authentication with an X.509-compliant digital certificate add to the transaction? See pages 137 to 153. What about the Secure Electronic Transaction protocol? There’s a whole chapter.

The second half of the book contains the theory that backs up the general models described earlier. How many average and peak users can be accommodated by two servers connected via a 100Mbps LAN, and what would happen to performance if the speed of the database disk subsystem were increased by 20 percent? See section 8.2. How does Zipf’s Law, which states that a Web page’s frequency of access falls off as an inverse power of its popularity rank, relate to serving documents? It’s a fascinating analysis, all the better because the authors interweave the real world of log-file analysis and e-business models with theoretical computer-science mathematics.
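One practical consequence of Zipf’s Law for serving documents is that a small cache of the most popular pages absorbs a large share of requests. The sketch below assumes a fixed catalog, requests that follow a Zipf distribution with exponent `alpha` (1.0 in Zipf’s original form), and a cache that always holds exactly the top-ranked documents. All of these are simplifying assumptions, not the book’s model.

```python
def zipf_weights(n_docs, alpha=1.0):
    """Relative access frequency of the document at rank r, proportional to 1 / r**alpha."""
    return [1.0 / r ** alpha for r in range(1, n_docs + 1)]


def cache_hit_ratio(n_docs, cached, alpha=1.0):
    """Fraction of requests served if the `cached` most popular documents are cached."""
    weights = zipf_weights(n_docs, alpha)
    cached = max(0, min(cached, n_docs))
    return sum(weights[:cached]) / sum(weights)
```

Under these assumptions, caching just the top 1 percent of a 10,000-document site (100 documents) serves about 53 percent of requests, and each further doubling of the cache buys a smaller gain.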

That’s why I recommend this book to anyone who is developing a business-critical Web site—or wants to gain a better understanding of the diverse layers that affect Web performance.

Reprinted with permission from SD Times. Originally appeared in Issue 12, August 15, 2000.

