Hot Stuff:

Why Heat is the Enemy of Server Farms and Sometimes Less Power Means More
By Robert X. Cringely
NOVEMBER 6, 2003

For a few years back in the early 1980s, I had in my cellar a Digital PDP-8 minicomputer. Didn't everyone? I bought the computer from a college for one dollar plus my labor to remove the thing from their computer room. Once set up in the cellar, I ran cable and put ADM-3A terminals in every room. This was years ago, but my clearest memory of that old PDP-8 is of toggling in the boot loader from memory IN THE DARK following several power outages and one earthquake. Oh, and the machine raised the ambient temperature in my house about five degrees. That box put out a LOT of heat, and heat is the topic of this column. It is the enemy of big server installations, the bane of blade servers, and there are times when heat turns computing economics on its head and makes it smart to use computers that are less powerful, not more.

The catalyst for this column comes from an announcement made this week by Intel, and observations from one reader in the UK about Google's data centers. Intel announced that it has come up with a new material for the insulating layers between semiconductors in its microprocessors. This new and mysterious stuff insulates better and thereby minimizes the leakage of electrons between chip layers. It is that electron leakage that causes heat that hurts performance and can eventually even damage the chip. Leaking electrons are bad.

The trend in microprocessor design has been to use more transistors in each generation. More transistors mean more heat. But these design changes are mapped against simultaneous photolithographic process changes that have taken the minimum feature size of a chip from 0.6 micron down to 90 nanometers (0.09 micron) over the course of a decade. These ever-thinner lines etched on the chip mean that power consumption goes down, which presumably would reduce heat, too. But the surface area of the chip goes down even faster while the number of transistors is going up, meaning we are typically trying to dissipate a little less heat from a LOT less chip, so the temperature on the die itself is going up, up, up. Run that sucker at a higher clock rate and the heat is even worse.
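
To see why, sketch the arithmetic. What the heat sink actually fights is watts per unit of die area, and if a process shrink cuts total power a little while cutting die area a lot, that number still climbs. The figures in the few lines of Python below are made-up, round numbers, not real chip specs; only the direction of the result matters.

    # Power density is what cooks the die: watts divided by die area.
    # The numbers below are invented, round figures for illustration only.
    def power_density(watts, die_area_mm2):
        return watts / die_area_mm2

    old_chip = power_density(watts=35, die_area_mm2=200)  # bigger process (assumed)
    new_chip = power_density(watts=30, die_area_mm2=100)  # shrunk die, slightly lower power (assumed)

    print(f"old process: {old_chip:.3f} W/mm^2")  # 0.175 W/mm^2
    print(f"new process: {new_chip:.3f} W/mm^2")  # 0.300 W/mm^2 -- less total heat, hotter die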

The cure for this problem can come through better heat sinks and fans, through deliberately choosing processors that produce less heat even though they may have less power, or from reducing electron leakage as Intel says it has done with its new material, which remains a secret in everything except hype. This latter technique is the hardest and most elegant, and fits best with Intel's marketing plan of getting us continually to trade up.

All chip makers are working on new insulating materials, not just Intel, but Intel has a bigger publicity budget.

There are lots of advantages to running microprocessors at lower temperatures. Overclockers have found, for example, that using liquid nitrogen to cool their CPUs allows them to run at vastly higher clock rates. If you want the fastest game PC in town, get yourself a Dewar flask of LN2.

While chip makers experiment with new dielectric materials, computer makers are limited to improving the cooling systems of their boxes. This means bigger heat sinks and more fans. Heat sinks are good because they just sit there, providing a highly thermoconductive material like aluminum (absolutely the wrong stuff to make insulating windows of, by the way) and a lot of surface area. Fans are bad because they can fail, leading to overheating, and because they consume power and add even more heat to the room.

As if things weren't bad enough, now we like to put our PCs in racks. Six feet of rack space holding up to 40 1U servers, each with two microprocessors, makes a total of 80 CPUs to cool, not to mention their memory and disk drives, which also make heat. And eight-foot racks are coming.

This drive to rack servers is based on looking cool and putting the most processing power possible per square foot of data center. Doubling the density of servers drops the real estate cost per server in half, the reasoning goes, and means you can wait that much longer before having to build another data center. So both capital and continuing costs are minimized.

This is part of the move to blade servers, which put the formerly separate machines into a common chassis. Blade servers allow even higher CPU densities, require a lot less wiring, and are well-suited to clustering. That's all good except they put an even greater thermal load on the total system.

Here is a nugget of information to keep in mind. The capability of a typical rack system to dissipate heat into a typical room is about 100 watts per square foot. This number can be raised by clever aerodynamic design and by adding bigger heat sinks and more fans. But eBay went looking for a new data center not long ago, and their requirement was for dissipating 500 watts per square foot, which is at the ragged edge of what's possible with air cooling.

Google faced this same problem (you thought I'd forgotten the part about Google, didn't you?) when designing its data centers and made a decision based partly on science and partly on economics. If there is a practical limit to heat dissipation per square foot, and you can design systems that produce just about any amount of heat you like by varying the computing power (remember, there are miniATX systems that require no fans at all), then there has to be a sweet spot that produces the most CPU cycles per dollar per square foot. For Google, that sweet spot meant filling its data centers with mid-range Pentium III machines, not screaming Pentium 4s or Itaniums.
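
To see the shape of that trade-off, here is a toy calculation: fix the heat budget per square foot, then compare how many of each kind of box you can pack into the same footprint and what that buys you. The wattages, prices, relative speeds, and footprint below are invented for illustration; they are not Google's numbers.

    # Toy sweet-spot calculation: most CPU cycles for the money within a fixed
    # heat budget per square foot. All machine figures below are assumptions.
    HEAT_BUDGET_W_PER_SQFT = 100   # roughly what a typical room can shed (from the column)
    SQFT = 10                      # assumed footprint for one rack, aisle space included

    machines = {
        # name:            (watts per box, dollars per box, relative cycles per box)
        "mid-range P-III": (90,  1000, 1.0),
        "screaming P4":    (180, 2000, 1.6),
    }

    for name, (watts, dollars, cycles) in machines.items():
        boxes = (HEAT_BUDGET_W_PER_SQFT * SQFT) // watts   # how many fit under the heat cap
        print(f"{name}: {boxes} boxes, {boxes * cycles:.1f} aggregate cycles, "
              f"${boxes * dollars:,} in the same {SQFT} sq ft")
    # With these made-up numbers, the slower boxes deliver more aggregate cycles
    # for roughly the same money in the same floor space -- which is the point.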

So sometimes less is more.

The mainframe guys used chilled-water cooling, and that's an alternative, but an expensive one that turns a bunch of PCs into, well, a mainframe, complete with that architecture's advantages and disadvantages. So short of drenching our motherboards in liquid coolant like they did in the old Cray supercomputers, what can we do to get the most computing bang for our thermal bucks?

First, look at what makes heat. Disk storage generates lots of heat, especially the managed storage systems. From a reliability standpoint, hard disk drives are usually the least reliable components in a PC. When you have hundreds of 1U or 2U servers, you also have hundreds of disk drives. Think about it. Mean time between failures may be 30,000 hours, but if you have a thousand disk drives, that means one will die every 30 hours. Now put 15,000 servers in a data center with two drives per server and you are losing one drive per hour, which takes a full-time person just to replace. Something is always breaking. This is usually what drives people to look at managed storage, but that costs more money and doesn't make less heat.
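
Here is that failure arithmetic in a few lines of Python, just to show the pattern. The 30,000-hour MTBF and the drive counts come straight from the paragraph above; the simplification that failures are independent and arrive at a constant rate is mine.

    # With N identical drives, the fleet sees roughly one failure every MTBF / N
    # hours. Assumes independent failures at a constant rate -- a simplification.
    MTBF_HOURS = 30_000

    for drives in (1_000, 30_000):   # 30,000 drives = 15,000 servers x 2 drives each
        print(f"{drives:>6,} drives -> one failure about every "
              f"{MTBF_HOURS / drives:g} hours")
    #  1,000 drives -> one failure about every 30 hours
    # 30,000 drives -> one failure about every 1 hour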

One alternative is a Storage Area Network or SAN, which at least schlepps the disks into another room with its own thermal absorption capacity. The problem with SAN is it's expensive and very quirky to get up and running. SAN is a long way from plug-and-play. I think a more interesting route will be serial IDE and serial SCSI for internal storage, and storage over IP (maybe iSCSI) for external storage. The big plus is it will leverage the low-cost technology that is available in the LAN world. Certainly, Cisco is counting on IP storage as a huge profit center in the next decade.

The next least reliable part is the power supply. I've seen people put in two power supplies for redundancy, but one supply doesn't have enough power to run the system. Lose either supply and you lose the system.
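
A quick sketch of why that setup is worse than no redundancy at all: when both supplies have to work, their failure probabilities stack. The two-percent annual failure chance below is an assumption pulled out of the air for illustration, not a vendor figure.

    # Annual failure probability under three arrangements. The per-supply
    # failure chance is an assumed, illustrative number.
    p = 0.02                          # chance one supply fails in a year (assumed)

    one_supply      = p               # a single adequate supply
    both_required   = 1 - (1 - p)**2  # two undersized supplies, both must work
    either_suffices = p**2            # two full-size supplies, either can carry the load

    print(f"one adequate supply:        {one_supply:.4f}")       # 0.0200
    print(f"two, both required:         {both_required:.4f}")    # 0.0396 -- nearly double
    print(f"two, either one sufficient: {either_suffices:.6f}")  # 0.000400 -- real redundancy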

This brings us back to blade servers, where the big selling point is the idea of one chassis. Under the universal IT labor model, you can count it as one system instead of, say, eight. Blade servers are marketed as opportunities for server consolidation and reduced support costs. But if you replace eight servers with one, will you reduce your IT support staff by 87 percent?

No.

Ideally, a blade server should be able to provide the same CPU capacity at a cost lower than that of the same number of individual servers. This is achieved by sharing power supplies and other support electronics. The reality is there is probably not much cost difference. There is more new custom electronics in a blade server than in a normal server, and that costs more money. As with all new designs, there will probably be some design shake-out, too.

And processors are becoming a reliability problem, too. Anything with moving parts will be less reliable. CPUs now have fans. Anything that uses lots of power will be less reliable.

As Google has learned, there is another course. The more mature and slower processors are more than sufficient to handle most application needs. The problem is newer, faster, better is easier to sell. The IT industry lives by quickly making its stuff obsolete and selling us more. That means it is only a matter of time until Google can't get those old P3s.

I think the best way to deal with heat and improve performance is by adding memory. If you could put your entire Oracle index in RAM, your database would be a thousand times faster with almost any CPU. But it is not enough to just throw RAM at the problem because few operating systems and applications use extra memory efficiently. A lot of memory continues to be wasted by programming bugs.
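
The arithmetic behind that claim fits in a few lines. The latencies below are rough, assumed round numbers for circa-2003 hardware, not measurements; the exact ratio matters less than its sheer size.

    # Rough time to probe an index 1,000 times, on disk versus in RAM.
    # Latencies are assumed round numbers, not measurements.
    DISK_SEEK_MS  = 10.0      # one random seek on a circa-2003 drive (assumed)
    RAM_ACCESS_MS = 0.0001    # one main-memory access (assumed)

    lookups = 1_000
    print(f"index on disk: {lookups * DISK_SEEK_MS:,.0f} ms")   # 10,000 ms
    print(f"index in RAM:  {lookups * RAM_ACCESS_MS:.1f} ms")   # 0.1 ms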

So the practical answer to heat is first to wish Intel, AMD and the others luck in finding superior insulating materials to lower electron leakage and heat production. After that, we should demand that our operating system and application vendors adopt large memory models and find ways to use cheap RAM as a substitute for hot CPUs.

http://www.pbs.org/cringely/pulpit/pulpit20031106.html
