Coherence?? Incoherence?? SCI??

The ANSI/IEEE Std Scalable Coherent Interface (#1596-1992) is often barely visible through the thick clouds of Fear, Uncertainty, and Doubt that seem to be stirred up by that central word, "Coherent".

Responding to this situation, it is tempting to downplay the Coherence by pointing out that it doesn't cost you anything if you don't need or use it.

(Analogy: the fact that your computer can do symbolic 4-D relativistic tensor algebra, by running Mathematica software, neither inhibits you from using it as a word processor nor motivates you to trade it in for a weaker one--you know your needs might change in the future.)

But hiding the Coherence of SCI is a mistake. Most of those who fear it today are going to want and need it in the future, and then they'll be glad that the SCI they put in today because of its performance and cost-effectiveness is already compatible with coherence.

Why?

As machines get more tightly coupled by networks and people become more dependent on networking, bandwidth requirements go up. When bandwidth goes up but performance doesn't get better, people discover they are software limited. As software becomes more efficient, communication will become more intimate.

The old concept of "I/O" becomes increasingly inapt--almost everything is just communication of information among processors, networks, memories, and (incidentally) disk drives and displays. As the amount of information becomes larger, it becomes increasingly shared--you don't want to copy all of it just to look at the bits and pieces that interest you.

What does not change during this evolution is the speed of light (and signals): it takes more time to get data from places that are farther away.

There are two strategies for handling this:

Both strategies rely on local caches.

How far away do data have to reside to be considered "remote"? That depends on the time scale of interest. Usually the relevant measure of time is the number of processor cycles that are wasted while waiting for the data to arrive.

This measure means that a given physical distance becomes ever more remote as processor speeds increase.
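As a rough illustration of that measure, the sketch below counts the processor cycles spent on one round trip over a fixed distance at several clock rates. The signal speed (about two-thirds the speed of light), the 10 cm distance, and the clock rates are all assumptions chosen for illustration, and only propagation delay is counted.

```python
# Illustrative numbers only: the signal speed and clock rates are assumptions.
SIGNAL_SPEED_M_PER_S = 2.0e8  # assumed, roughly 2/3 the speed of light


def cycles_wasted(distance_m: float, clock_hz: float) -> float:
    """Processor cycles spent waiting on one round trip (propagation delay only)."""
    round_trip_s = 2.0 * distance_m / SIGNAL_SPEED_M_PER_S
    return round_trip_s * clock_hz


# The same 10 cm costs more cycles as the (hypothetical) clock rate rises.
for clock_hz in (100e6, 1e9, 4e9):
    cycles = cycles_wasted(0.10, clock_hz)
    print(f"{clock_hz / 1e6:6.0f} MHz: {cycles:.1f} cycles for a 10 cm round trip")
```

Real access latency also includes switching, queuing, and memory access time, so actual waits are longer than this lower bound.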

Today data only a few centimeters away is often remote enough in time to cause processor designers to include fast cache memories right on the processor chip, or (HP/Intel) on adjacent chips.

So caching is destined to become more and more important over time, and everyone is destined to care more and more about caches.

But caches, by their mere existence, create a serious logical problem: they hold duplicate copies of data.

(To be more precise, there's no problem if the system has only one cache and all data movement goes through that one, as might happen in a single-processor system that does only program-driven I/O. The hazard only arises when there are other devices that might access memory without going through that one cache, like a DMA controller.)

Sooner or later the data will probably change (that's what data processors do!), and when that happens all the duplicate copies are suddenly wrong. If these incorrect copies continue to be supplied to their local processors (or I/O devices!), logical inconsistencies result that can wreak havoc (though many applications can tolerate limited inconsistency for a little while).

The trivial example of this problem is two processors sharing a variable, which they both have copied to their local cache. One spins, testing over and over, waiting for the other to change the variable (e.g., let me know when you're through with the printer). When the other changes the variable, it changes its own cached copy and (depending on the system design) likely updates the original memory as well. But without a coherence mechanism, the first processor continues to see its now-incorrect, or "stale", copy from its local cache. The system waits forever, deadlocked. There are countless subtler examples.
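The printer example can be mimicked with a toy model. This is a minimal sketch, not a description of any real hardware: `Memory`, `IncoherentCache`, `spin_until_free`, and the `printer_busy` flag are hypothetical names, and the caches are assumed to be write-through with no coherence mechanism at all.

```python
class Memory:
    """Shared main memory: a plain address-to-value mapping (hypothetical model)."""

    def __init__(self):
        self.cells = {}


class IncoherentCache:
    """Private write-through cache with NO coherence mechanism."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch a copy from memory
            self.lines[addr] = self.memory.cells[addr]
        return self.lines[addr]              # hit: memory is never re-checked

    def write(self, addr, value):
        self.lines[addr] = value             # update this cache's own copy
        self.memory.cells[addr] = value      # and write through to memory


def spin_until_free(cache, addr, max_spins):
    """Return how many spins it took to see 0, or None if we gave up."""
    for spin in range(max_spins):
        if cache.read(addr) == 0:
            return spin
    return None


memory = Memory()
memory.cells["printer_busy"] = 1
waiter, owner = IncoherentCache(memory), IncoherentCache(memory)

waiter.read("printer_busy")        # the waiter caches the value 1
owner.write("printer_busy", 0)     # the owner releases the printer
result = spin_until_free(waiter, "printer_busy", max_spins=1000)
print(result, memory.cells["printer_busy"])
```

Memory holds the new value, yet the waiter keeps reading its stale copy until the spin limit runs out. Without that limit, which exists here only so the example terminates, it would wait forever.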

So as caching becomes increasingly used to compensate for the increasing remoteness of data we share, keeping these caches consistent becomes increasingly important. That's what SCI's "coherence" does.

SCI tells how to send signals ("Interface") in a very application- and technology-independent way ("Scalable"). It also explains how to keep track of duplicate cached copies of data so that all the stale copies can get refreshed when someone changes the data, and how to do this in a system of unspecified size and shape, with an unspecified number of processors and I/O controllers performing unspecified applications, in a way that's simple enough so that the hardware can take care of it for you.
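The idea of keeping track of duplicate copies can be sketched in a few lines. This is a deliberately simplified illustration and not the SCI protocol: it assumes a single central record of which caches hold each address and invalidates the other copies on every write, whereas the real mechanism is defined by the standard and done in hardware. `Directory` and `CoherentCache` are hypothetical names.

```python
class Directory:
    """Memory plus, per address, the set of caches holding a copy (hypothetical)."""

    def __init__(self):
        self.cells = {}
        self.sharers = {}

    def fetch(self, addr, cache):
        # Remember that this cache now holds a copy of addr.
        self.sharers.setdefault(addr, set()).add(cache)
        return self.cells[addr]

    def store(self, addr, value, writer):
        self.cells[addr] = value
        # Tell every other holder that its copy is now stale.
        for cache in self.sharers.get(addr, set()) - {writer}:
            cache.invalidate(addr)
        self.sharers[addr] = {writer}


class CoherentCache:
    """Private cache that registers its copies and accepts invalidations."""

    def __init__(self, directory):
        self.directory = directory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:           # miss, or copy was invalidated
            self.lines[addr] = self.directory.fetch(addr, self)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.directory.store(addr, value, self)

    def invalidate(self, addr):
        self.lines.pop(addr, None)           # drop the stale copy


directory = Directory()
directory.cells["printer_busy"] = 1
spinner, releaser = CoherentCache(directory), CoherentCache(directory)

before = spinner.read("printer_busy")    # spinner caches 1
releaser.write("printer_busy", 0)        # spinner's copy is invalidated
after = spinner.read("printer_busy")     # miss, so the fresh value is fetched
print(before, after)
```

Because the write removes the spinner's stale copy, its next read misses and picks up the new value, so the spin loop from the printer example would terminate.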

How does coherence work?

SCI is a system interconnection standard that understands the needs of software in an interconnected, multiple-processor world of intensive communication and data sharing, a world that combines aspects of what we used to think of as I/O, processor/memory, or networking. The boundaries are blurring; the old categories are fuzzy and now barely useful.

To do a good job on tomorrow's machines, we have to stop hyperoptimizing our old disjointed incoherent subsystems and think about the problem we're really trying to solve. If we don't, the software will become impossibly difficult to get either correct or efficient, and we'll end up using all our resources working around needless problems. (The real problems are hard enough!)

Dave Gustavson

p.s. I was pleased to read in Byte that Intel suggests interfacing PCI systems via SCI. An unexpected blessing... (Intel ought to hate SCI, because prices can be kept much higher on products that use proprietary "open" standards instead of real open standards, if you're dominant enough in the market. Surely Intel is dominant enough, if anyone is!)

p.p.s. It was amusing at the January HPCA conference in Raleigh to hear one paper explaining the Convex machine, and how it maintains coherence over a distributed system in hardware, followed later by a paper that said nobody had ever managed to do distributed shared memory in hardware because it is too hard and so here's how to do it in software! Of course, various software schemes have been used, but they are happy to get milliseconds of latency compared to SCI's microsecond or so. That makes an enormous difference in the kind of applications you can run!

--David B. Gustavson            phone 650/961-0305 fax 650/961-3530
SCI (ANSI/IEEE Std 1596 Scalable Coherent Interface) chairman
Exec. Director, SCIzzL: Assoc. SCI Local-Area MultiProcessor Users
1946 Fallen Leaf Lane, Los Altos, CA 94024-7206 dbg@SCIzzL.com