John R. Williams - Escaping legacy IT systems
Sophisticated computer model lets corporate IT managers evaluate upgrades and reconfigurations without risking the stability of their networks.
Larry Hardesty, MIT News Office
September 20, 2011
Today, big multinational companies depend heavily
on their information systems. If a company’s internal
servers go down for even a few hours, it can mean
millions of dollars in lost productivity. Information
technology (IT) managers are thus reluctant to make
changes that could have unforeseen consequences
for network stability; as a result, they often end up
saddled with obsolete software and inefficient network configurations.
At the Institute of Electrical and Electronics
Engineers’ (IEEE) Cluster conference from Sept.
26-30, researchers from MIT’s Department of Civil
and Environmental Engineering (CEE) will unveil
what may well be the most sophisticated computer
model of a corporate information infrastructure yet
built, which could help IT managers predict the
effects of changes to their networks. In a study
funded by Ford Motor Co., the researchers
compared their model’s predictions to data supplied
by Ford and found that, on average, its estimates of
response times for queries sent to company servers
were within 5 to 13 percent of the real times.
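The article does not say exactly how the 5 to 13 percent figure was computed. One common way to express this kind of agreement, assumed here purely for illustration, is the mean relative error over $N$ measured queries. In this formula $\hat{t}_i$ is the model's predicted response time for query $i$ and $t_i$ is the response time Ford actually observed:

```latex
\varepsilon = \frac{100\%}{N} \sum_{i=1}^{N} \frac{\lvert \hat{t}_i - t_i \rvert}{t_i}
```

Under this assumed definition, a predicted time of 1.1 seconds for a query that really took 1.0 second contributes a 10 percent error to the average.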
A server in a corporate data center can be a
complex device, with 32, 64 or even 128 separate
processors, all of which are connected to a bank of disk drives. Servers at a single data center are connected to
each other, and the data center as a whole is connected to other data centers around the world.
Researchers have previously modeled the performance of groups of servers within individual data centers, with
good results. But the MIT team — Sergio Herrero-Lopez, a graduate student in CEE; John R. Williams, a
professor of information engineering, civil and environmental engineering and engineering systems; and Abel
Sanchez, a research scientist in the Laboratory for Manufacturing and Productivity — instead modeled every
processor in every server, every connection between processors and disk drives, and every connection between
servers and between data centers. They also modeled the way in which processing tasks are distributed across
the network by software running on multiple servers.
“We take the software application and we break it into very basic operations, like logging in, saving files,
searching, opening, filtering — basically, all the classic things that people do when they are searching for
information,” Herrero-Lopez says.
Just one of those operations, Herrero-Lopez explains, could require dozens of communications between
computers on the network. Opening a file, for instance, might require identifying the most up-to-date copies of
particular pieces of data, which could be stored on different servers; requesting the data; and transferring it to the
requester. The researchers’ model represents each of these transactions in terms of the computational resources
it requires: “It takes this many cycles of the CPU, this bandwidth, and this memory,” Herrero-Lopez says.
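The idea of describing each transaction only by the resources it consumes can be sketched in a few lines. This is a hypothetical illustration, not the MIT team's simulator: the `Transaction` type, the three-step breakdown of opening a file, and every number (CPU cycles, bytes, clock rate, link bandwidth) are invented assumptions. It estimates an uncontended service time by adding compute time (cycles divided by clock rate) to transfer time (bits divided by bandwidth) for each step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One basic step of a user operation, described only by the resources it uses."""
    name: str
    cpu_cycles: int    # processor cycles consumed on the serving machine
    bytes_moved: int   # data sent over the network for this step
    memory_bytes: int  # working memory held while the step runs


def service_time(steps, clock_hz, bandwidth_bps):
    """Uncontended time for a sequence of steps: compute time plus transfer time.

    Ignores queueing, contention and parallelism, which a full simulator would model.
    """
    total = 0.0
    for step in steps:
        total += step.cpu_cycles / clock_hz              # seconds of CPU work
        total += step.bytes_moved * 8 / bandwidth_bps    # seconds on the wire
    return total


def peak_memory(steps):
    """Largest working set among the steps, assuming they run one at a time."""
    return max(step.memory_bytes for step in steps)


# Hypothetical breakdown of "open a file" into the transactions the article describes.
OPEN_FILE = [
    Transaction("locate latest copy", cpu_cycles=2_000_000, bytes_moved=4_000, memory_bytes=1_000_000),
    Transaction("request data", cpu_cycles=500_000, bytes_moved=1_000, memory_bytes=200_000),
    Transaction("transfer to requester", cpu_cycles=1_000_000, bytes_moved=5_000_000, memory_bytes=8_000_000),
]

if __name__ == "__main__":
    t = service_time(OPEN_FILE, clock_hz=2e9, bandwidth_bps=1e8)
    print(f"estimated time: {t:.5f} s, peak memory: {peak_memory(OPEN_FILE)} bytes")
```

With these invented numbers (a 2 GHz processor and a 100 Mbit/s link) the estimate comes to about 0.402 seconds, almost all of it spent moving the 5 MB of file data, which is the kind of breakdown that tells a planner whether bandwidth or CPU is the constraint.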
For a large network, the model can become remarkably complex. When the researchers were benchmarking it
against the data from Ford, “it took us two days to simulate one day” of Ford’s operations, Herrero-Lopez says.
(Of course, they were running the model on a single computer, which was less powerful than the average server
in a major corporate network.) But because the model can simulate the assignment of specific computational
tasks to specific processors, it can provide much more accurate information about the effects of software
upgrades or network reconfigurations than previous models could.
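Simulating the assignment of specific tasks to specific processors is, in general terms, a queueing simulation. The toy version below is an assumption-laden illustration rather than the researchers' model: the function name `simulate` and the dispatch policy are hypothetical. Tasks arrive at given times with a given processing demand, each one is handed to whichever processor becomes free earliest (first come, first served), and the response time, meaning waiting time plus service time, is recorded.

```python
import heapq


def simulate(tasks, num_processors):
    """Dispatch tasks to processors and return response times in arrival order.

    tasks: list of (arrival_time, service_time) pairs, in seconds.
    Each task goes to the processor that becomes free earliest.
    """
    # Min-heap of (time the processor becomes free, processor id).
    free_at = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(free_at)
    response_times = []
    for arrival, service in sorted(tasks):
        available, proc = heapq.heappop(free_at)
        start = max(arrival, available)  # wait if that processor is still busy
        finish = start + service
        response_times.append(finish - arrival)
        heapq.heappush(free_at, (finish, proc))
    return response_times


if __name__ == "__main__":
    workload = [(0.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
    print(simulate(workload, 2))  # third task must wait for a busy processor
    print(simulate(workload, 3))  # an extra processor removes the wait
```

Running the same invented workload against two and then three processors shows the third task's response time falling from 2.0 to 1.0 seconds, a miniature version of asking what an upgrade would buy before touching real hardware.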
Trust, but verify
To be at all practical, even so complex a model has to make some simplifying assumptions. It doesn’t require any
information about what an application running on the network does, but it does require information about the
resources the application consumes. To provide that information, Ford engineers ran a series of experiments on
servers at multiple data centers. They also supplied the MIT researchers with information about patterns of
network use: “At 9 a.m., this many engineers are logging in, or are saving files, in this geographical location.”
The researchers’ model is modular: There’s a standard bit of code that describes each element of the network,
and elements — a processor here, a high-bandwidth connection there — can be added or removed arbitrarily. So
the model can quickly be adapted to different companies’ networks, and to different hypothetical configurations of
those networks. “You can use the simulator for performance estimation and capacity planning, or to evaluate
hardware or software configurations or bottlenecks — what happens if a link between two data centers is broken?
— or for background processes or denial-of-service attacks,” Herrero-Lopez says. “It’s nonintrusive — you’re not
touching the system — it’s modular, and it’s cheap.”
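The modular approach, in which elements can be added or removed and a broken link between data centers can be tested, can be illustrated with a toy topology. In this sketch the `Network` class, the data center names `dc_a` through `dc_d`, and the latencies are all hypothetical. The network is a graph of links, and removing a link and recomputing the lowest-latency path with Dijkstra's algorithm shows how traffic would be rerouted, or whether a site becomes unreachable.

```python
import heapq


class Network:
    """Toy network model: named nodes joined by links with a latency in milliseconds."""

    def __init__(self):
        self.links = {}

    def add_link(self, a, b, latency_ms):
        # Links are bidirectional with the same latency in both directions.
        self.links.setdefault(a, {})[b] = latency_ms
        self.links.setdefault(b, {})[a] = latency_ms

    def remove_link(self, a, b):
        # Simulate a broken link; silently ignores links that do not exist.
        self.links.get(a, {}).pop(b, None)
        self.links.get(b, {}).pop(a, None)

    def latency(self, src, dst):
        """Lowest total latency from src to dst (Dijkstra), or None if unreachable."""
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                return d
            if d > dist.get(node, float("inf")):
                continue  # stale heap entry
            for nxt, w in self.links.get(node, {}).items():
                nd = d + w
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        return None


if __name__ == "__main__":
    net = Network()
    net.add_link("dc_a", "dc_b", 40)
    net.add_link("dc_b", "dc_c", 10)
    net.add_link("dc_a", "dc_d", 60)
    net.add_link("dc_d", "dc_c", 90)
    print(net.latency("dc_a", "dc_c"))  # normal route via dc_b
    net.remove_link("dc_a", "dc_b")
    print(net.latency("dc_a", "dc_c"))  # rerouted via dc_d
```

In this invented topology, cutting the link between `dc_a` and `dc_b` raises the best latency from `dc_a` to `dc_c` from 50 to 150 milliseconds; cutting the backup route as well leaves `dc_c` unreachable, which the method reports as `None`.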