HPCwire interviews Gustavson on SCI

DAVE GUSTAVSON ANSWERS QUESTIONS ABOUT SCI: PART I (and II and III)  10.04.96
by Alan Beck, editor in chief                                         HPCwire
=============================================================================

  Los Altos, Calif. -- Controversy surrounding a recent interview about
networking technologies (See HPCwire article 10181, "EXPERT PREDICTS
MULTIHEADED FUTURE FOR NETWORKING", 09.20.96) led HPCwire to contact one of
SCI's (Scalable Coherent Interface) principal developers, David B. Gustavson,
who is IEEE CS/MMSC Microprocessor Standards Committee chairman, executive
director of SCIzzL (Scalable Coherent Interface Local Area MultiProcessing
Users, Developers, and Manufacturers Association), and research professor at
Santa Clara University, in order to obtain a full account of his views about
the character and potential of this technology for HPC. Following are the
questions asked and his complete answers.

  (Because of the detailed character of Gustavson's replies this article
has been divided into three parts, to be published in as many consecutive
editions.)

  (Note for SCIzzL Website version: Those three parts have been combined
here for your convenience, and with permission from HPCwire, but are
available from HPCwire as articles 10249, 10282, and 10316. An error that
appeared in the first edition of the second section was corrected in later
editions and is correct in this one. I.e., an SCI interface needs about 500
bytes of packet storage, not 500 KBytes as was misstated initially.

  Sun's high-availability SCI cluster product had not been announced when
this interview took place, but was announced during the course of
publication of the series. See article 10288 for details. In that article
SCI is called the Scalable Cluster Interface, but Dolphin's press release
on the subject made it clear that this really is the Scalable Coherent
Interface.)


---

  HPCwire: What was your role in developing SCI, and what is your current
connection with it?

  GUSTAVSON: "I was the chairman of the IEEE P1596 working group that
developed SCI. During that development I was a member of the Computation
Research Group of the Stanford Linear Accelerator Center at Stanford
University, funded by the U.S. Department of Energy, and previously I had
worked at SLAC as an experimental elementary particle (or "high energy")
physicist.

  "I had gotten increasingly involved in computing as a result of the data
acquisition and analysis problems encountered in experimental physics, then
learned about computer buses while trying to make an S-100 bus in my MITS
Altair (serial number 9) work reliably. I became involved in IEEE Standards
as a result (we fixed the S-100 bus while standardizing it as IEEE Std 696).

  "There were many bus standards, all with serious deficiencies, so in the
late '70s some of us began work on a preemptive strike, Futurebus, trying to
prepare a high quality general purpose 32-bit bus before the industry would
really need it, by developing the technology within the standardization
process.

  "We also started a Serial Bus project about that time. These went through
several generations before completing (IEEE Std 896.x Futurebus+ in 1992,
too little and too late to find a market window that would justify another
modular backplane bus in addition to VME; IEEE Std 1394 SerialBus in 1995,
finding a good consumer market for digital video and desktop I/O--its
earlier generations suffered from the lack of a clear mission).

  "In 1987 it became clear that microprocessor power was increasing so
rapidly that soon even Futurebus+ would not be able to support
multiprocessing usefully. Paul Sweazey of National Semi, the Futurebus cache
coherence chairman, started a study group to consider this problem.

  "We figured out what a solution would have to look like if one were
possible, namely a large number of independent cables carrying packets to
and from switches, with protocols that provide bus-like services without
using any bus-like physics. The hard problems were figuring out how to
allow ring connections in addition to switches (necessary for getting entry
level system costs down so volumes could be high), and how to keep cache
memories consistent in a system that has no single bus where all the data
transfers can be observed.

  "Sweazey decided this task was much bigger than Futurebus's development,
which took well over a decade, so he left to work on a simpler problem that
might be solved in less than a lifetime. That became QuickRing (developed
at Apple by Sweazey, then picked up by National Semi, where Sweazey now
works once again).

  "I became chairman of the P1596 SCI project then, in mid-1988. We had really
extraordinary luck in having several key people available essentially full
time: in particular, David James of Hewlett-Packard (now at Apple), I/O
architect on the PA-RISC project, who became our chief architect; some
capable industry partners from Dolphin Server Technology; some other
outstanding contributions from John Moussouris and Craig Hansen (MIPS and
MicroUnity Systems), Prof. Jim Goodman of U Wisconsin, a cache coherence
expert, and several others.

  "The result was that things moved fast, and the technical design was
complete early in 1991. Testing and polishing continued for some time, but
SCI became an approved standard in 1992.

  "Just to show how hard it is to predict the future, there were working SCI
chips before there were working QuickRing chips! Something no one would
have predicted; quite amazing, really.

  "I left Stanford in 1994, taking advantage of a very attractive payoff for
voluntary departures, and started to work with Prof. Qiang Li (pron. "chung
lee") in the Santa Clara University Computer Engineering Department,
organizing a nonprofit body to promote the development and use of standards
based on SCI, for Local Area MultiProcessing. I named the organization
"SCIzzL" as a condensation of a very long acronym for Scalable Coherent
Interface Local Area MultiProcessing Users, Developers, and Manufacturers
Association. It's pronounced "sizzle", which also has some nice connotations
of speed, hot stuff, etc.

  "SCIzzL raises money through corporate memberships (sponsors at $25k/year,
executive sponsors at $50k/year) and conference activities, meeting fees,
etc. At present, most of the members are pursuing SyncLink, IEEE P1596.7, a
fast RAM chip interface, rather than SCI itself, but primarily-SCI members
include Apple Computer, Cray Research, Bit 3 Computer, Credence Systems;
primarily-SyncLink members include Hyundai, Texas Instruments, Mitsubishi,
Micron Technology, Fujitsu, Samsung, MOSAID, Hewlett-Packard, Nippon Steel,
Oki, Toshiba, and NEC; several other companies are currently in the process
of joining."

  HPCwire: Briefly describe how SCI operates and contrast this with the
other principal networking technologies, such as ATM, HIPPI, etc.

  GUSTAVSON: "SCI acts like a processor/memory/I-O bus, which uses a single
address space to specify data as well as its source and destination when
being transported.

  "That is one major factor in SCI's performance compared to ATM, HIPPI,
FibreChannel, SSA, SuperHIPPI, etc. (the other major factor being raw
speed).

  "For example, when two tasks share data using SCI, the data remains stored
in ordinary variables, with ordinary memory addresses, at all times. Thus
processor instructions like Load and Store suffice to access data for doing
computation with it. Load and store are highly optimized in all processors,
and the underlying SCI transport mechanism is transparent to the user,
performing all the network protocol effectively as a fraction of one
instruction.
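A rough single-host analogy, assuming POSIX-style shared memory stands in for SCI's shared address space: once two tasks map the same memory, passing an integer is one store on one side and one load on the other. The function name `store_then_load` is hypothetical, and nothing here models SCI's transport or coherence.

```python
from multiprocessing import shared_memory


def store_then_load(value: int) -> int:
    """Store an integer through one mapping and load it through another.

    Hypothetical illustration: both handles live in one process here, but a
    second task would attach to the segment by name in exactly the same way.
    """
    seg = shared_memory.SharedMemory(create=True, size=8)
    peer = shared_memory.SharedMemory(name=seg.name)  # second view of the same memory
    writer = seg.buf.cast("q")   # view the bytes as signed 64-bit integers
    reader = peer.buf.cast("q")
    try:
        writer[0] = value        # "Store": no buffer copy, no library call, no parsing
        return reader[0]         # "Load": the other side simply reads the variable
    finally:
        # Views must be released before the segment can be closed.
        writer.release()
        reader.release()
        peer.close()
        seg.close()
        seg.unlink()
```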

  "The other schemes (ATM etc) move data as streams: the user removes data
from its home variables, stores it in buffers, calls a library routine to
hand the buffers to an I/O interface for transporting as a byte stream. On
the receiving end, bytes arrive and fill buffers, filled buffers are handed
to the operating system (with an interrupt to get its attention), the
operating system hands the buffers to the waiting user task, the user task
parses the buffers to find the data, copies the data into variables for use
in computation again.
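For contrast, a minimal sketch of the stream path just described, using a local socket pair as a stand-in for the network (an assumption; real stacks add interrupts, kernel buffering, and protocol headers). The sender copies its variables into a buffer with a count header; the receiver accumulates bytes, parses them, and copies the values back into variables. `stream_transfer` is a hypothetical name, and it is meant for small messages only, since sending and receiving happen in one thread.

```python
import socket
import struct
from typing import List


def stream_transfer(values: List[int]) -> List[int]:
    a, b = socket.socketpair()
    try:
        # Sender: copy data out of its variables into a buffer, prefixed by a count.
        payload = struct.pack(f"<I{len(values)}q", len(values), *values)
        a.sendall(payload)

        # Receiver: bytes arrive and fill a buffer until the header is complete...
        buf = bytearray()
        while len(buf) < 4:
            buf += b.recv(4096)
        (count,) = struct.unpack_from("<I", buf, 0)

        # ...then until the whole message is present.
        need = 4 + 8 * count
        while len(buf) < need:
            buf += b.recv(4096)

        # Parse the buffer and copy the data back into variables.
        return list(struct.unpack_from(f"<{count}q", buf, 4))
    finally:
        a.close()
        b.close()
```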

  "When you see all these details involved in the data transfer, it's not
surprising that communication delays for these stream-based channels or
networks are typically 1000 microseconds (pushing downward toward a few
hundred, someday maybe 10 by using "active message" techniques), because of
the many instructions required at each end.

  "SCI moves the data from one computational context directly into another,
without unlabeling the data en route, and thus holds the communication delay
typically under a microsecond. (Assuming a reasonable implementation!
Some publicly reported experiments used a severely limited
SCI interface (only one outstanding transaction at a time, and only 1/10
the standard link speed) through an early-model Sbus port. That resulted in
4 microseconds delay for passing an integer from one task to another in a
different workstation, and another 4 microseconds to pass it back. But this
is far slower than standard SCI operations.)

  "Of course, one can use the memory model with some software queues to mimic
any streaming I/O or networking model, so SCI can be used in that way also.
This is very useful for porting old software to new systems, for example.
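One way to layer a stream on the memory model is a single-producer, single-consumer ring of fixed-size records. The sketch assumes the buffer and the two counters would sit in shared memory visible to both tasks; here an ordinary `bytearray` stands in for it, and `RingQueue` is a hypothetical name. A real cross-node version would also need memory-ordering guarantees so the consumer never sees the tail advance before the record data has landed.

```python
from typing import Optional


class RingQueue:
    """Single-producer/single-consumer queue of fixed-size records in one flat buffer."""

    def __init__(self, slots: int, record_size: int) -> None:
        self.slots = slots
        self.record_size = record_size
        self.mem = bytearray(slots * record_size)  # stand-in for a shared region
        self.head = 0  # count of records consumed (advanced only by the consumer)
        self.tail = 0  # count of records produced (advanced only by the producer)

    def put(self, record: bytes) -> bool:
        if len(record) != self.record_size:
            raise ValueError("record has the wrong size")
        if self.tail - self.head == self.slots:
            return False  # queue full
        off = (self.tail % self.slots) * self.record_size
        self.mem[off:off + self.record_size] = record
        self.tail += 1  # publish only after the data is in place
        return True

    def get(self) -> Optional[bytes]:
        if self.head == self.tail:
            return None  # queue empty
        off = (self.head % self.slots) * self.record_size
        record = bytes(self.mem[off:off + self.record_size])
        self.head += 1  # free the slot only after copying the data out
        return record
```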

  "Conversely, one can simulate shared memory using I/O channels for
transport. However, this is three or four orders of magnitude too slow to
be of much practical interest, though it has been done in research projects
etc.

  "The second big difference between SCI and the other connections is how one
hides the latency of accessing remote data. Every technique we know of for
hiding this latency involves using caches to keep copies of data near its
users. For example, a smart compiler might prerequest data that it knows
will be needed soon; that data will be kept in a cache until the actual
instructions needing it are finally executed. Or, the usual uses of caches:
data used multiple times in close succession (temporal locality), and
prefetching data that is stored near referenced data (spatial locality).

  "Caches have proven very useful, and there is no reason to think this will
change in the near future. But cached copies of data are duplicates of that
data, and when the data is modified the other copies are incorrect. Those
incorrect copies have to be found and discarded or updated, or various kinds
of disaster will result, ranging from deadlocks to incorrect answers.

  "Of all these interconnects, only SCI handles this problem, with SCI's
distributed cache coherence mechanism. This mechanism only adds a bit or so
to the command field in SCI's packets. Otherwise it costs nothing except
where it is needed. I.e., accessing unshared cached data is not slowed at
all by the coherence mechanisms. When shared cached data is modified, the
coherence protocol quickly locates all the other copies so they can be
discarded (and fresh copies fetched if those processors still want copies).
The distributed protocol does not add to the traffic at main memory, and it
turned out to increase (not decrease) the robustness of a system!
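A toy model of the distributed-list idea, assuming the general outline of SCI's scheme: memory records only the head of a per-line list of sharers, each cache holds forward and backward pointers, new readers join at the head, and a writer becomes the head and purges the rest of the list in order. Transient states, races, the actual packets, and data values are all omitted, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class SharingList:
    """Sharers of one cache line, kept as a doubly linked list spread across caches."""

    def __init__(self) -> None:
        self.head: Optional[int] = None             # the only state kept at memory
        self.fwd: Dict[int, Optional[int]] = {}     # per-cache forward pointer
        self.bwd: Dict[int, Optional[int]] = {}     # per-cache backward pointer

    def read_miss(self, node: int) -> None:
        """A new reader swaps itself in as head; memory just hands over the old head."""
        if node in self.fwd:
            return  # already holds a copy
        old = self.head
        self.head = node
        self.fwd[node] = old
        self.bwd[node] = None
        if old is not None:
            self.bwd[old] = node

    def _unlink(self, node: int) -> None:
        prev, nxt = self.bwd.pop(node), self.fwd.pop(node)
        if prev is None:
            self.head = nxt
        else:
            self.fwd[prev] = nxt
        if nxt is not None:
            self.bwd[nxt] = prev

    def write(self, node: int) -> List[int]:
        """Make node the sole holder; return the purged sharers in list order."""
        if node in self.fwd and self.head != node:
            self._unlink(node)  # leave the middle of the list...
        self.read_miss(node)    # ...and re-enter at the head
        purged = []
        victim = self.fwd[node]
        while victim is not None:  # walk the forward pointers, discarding copies
            purged.append(victim)
            nxt = self.fwd.pop(victim)
            self.bwd.pop(victim)
            victim = nxt
        self.fwd[node] = None
        return purged

    def sharers(self) -> List[int]:
        out, node = [], self.head
        while node is not None:
            out.append(node)
            node = self.fwd[node]
        return out
```

Writing to a line nobody else shares walks an empty list, which mirrors the point that unshared data pays nothing extra.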

  "The SCI coherence protocols also are completely invisible to switches and
bridges, an enormous improvement over the situation we faced with bridged
buses that use the usual snooping coherence. The entire protocol is carried
out using ordinary packets such as are used for reads and writes. Only the
command bits and the interpretation of some fields is different, but that
does not affect packet transport.
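A sketch of why switches need not care, assuming a switch that routes on the destination node ID alone. The field names and command strings are illustrative, not the standard's encodings.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Packet:
    target_id: int      # destination node ID (the only field the switch reads)
    source_id: int      # requester node ID
    command: str        # hypothetical labels, e.g. "read64" or a coherent variant
    payload: bytes = b""


def route(packet: Packet, port_for_node: Dict[int, int]) -> int:
    """Choose an output port from the target ID; the command is never inspected."""
    return port_for_node[packet.target_id]
```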

  "And finally, the SCI protocols were optimized for simplicity, in order to
keep the speed high. (SCI's turf was defined as 1 GByte/s per processor, to
keep us from interfering in the Futurebus market. We were a bit surprised
to find it was possible!) IBM demonstrated SCI in BiCMOS running at 1
GByte/s (that's two-byte-wide links signaling at 2 ns per bit per signal,
using differential signals) in 1994, and is rumored to be able to do this
in CMOS now. Others have gone to slower speeds in order to use common CMOS,
or have made the fast parts of the interface in GaAs. Convex/HP used GaAs
at about 600 MBytes/s initially in the Exemplar series, and got the
bandwidth they needed by using four SCI links in parallel, an easy way to speed up
systems.
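The IBM link figure works out as follows, taking "two-byte-wide" to mean 16 data signals, each carrying one bit every 2 ns. The second line is only the raw sum for four links in parallel at the approximate Exemplar rate, ignoring any protocol overhead.

```latex
\[
16 \times \frac{1\,\text{bit}}{2\,\text{ns}} = 16 \times 500\,\text{Mbit/s} = 8\,\text{Gbit/s} = 1\,\text{GByte/s}
\]
\[
4 \times 600\,\text{MByte/s} = 2400\,\text{MByte/s} \approx 2.4\,\text{GByte/s}
\]
```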

  "(There was a funny story in one of the electronics mags reporting on the
ISSCC, 1995 I think it was, where there were two IBM chips described. One
was a single chip FibreChannel interface running at the 1 Gbit/s rate,
which was described with real excitement as a great achievement (which it
was, of course). The other was a single chip SCI interface running at 1
GByte/s, which was only of slight interest because the writer described it
as "approaching a Gbit/s" -- perhaps because the individual signals were 500
Mbit/s? It was pretty clear the author knew that FibreChannel had to be the
fastest thing around, and adjusted the stories to make the relative ranking
fit his expectation!)"

---
End of Part I. For more info on SCI see http://www.SCIzzL.com
---

  HPCwire: What companies presently offer SCI, and how does its cost compare
with competing technologies?

  GUSTAVSON: "At present, the highest-end users make their own SCI interfaces
and don't offer them to the public. That is because SCI was designed to be
integrated into the corner of an ASIC, transceivers and all, so companies
that are using ASICs do not use separate SCI chips, and so there is no
separate SCI chip they could sell to random customers who want to use SCI.

  "There is one high-end SCI chip on the market, a GaAs SCI interface from
Vitesse Semiconductor. Vitesse is supplying Sequent Computer, which is using
SCI to interconnect Pentium Pro quad boards together to form a cache
coherent very large SMP. (Sequent was the champion in bus-based SMPs,
topping out at about 30 processors and certainly getting good speedup into
the 20s. They really understand SMPs and what limits them, so it's a
significant milestone for SCI to be going into a high-end production
machine in this superserver class. They also contributed to the SCI design,
by the way, particularly to the diagnostic capabilities that were included
in the standard.)

  "The best known supplier of SCI interfaces is a descendant of Dolphin
Server Technology, namely Dolphin Interconnect Solutions. They reduce the
speed in order to use widely available CMOS fabs, with current speeds around
200 MBytes/s and rumored speeds approaching 500 MBytes/s. When they worked with us to
develop SCI, they planned to make processors; after a few years that industry
became untenable, and they became SCI chip designers/suppliers. (They
designed the GaAs chip that was the first working SCI interface, demonstrated
in 1993, a descendant of which became Convex/HP's Exemplar SCI interface.)
More recently, however, they have decided to become interface board suppliers
instead. This has left other potential interface board manufacturers, who are
essential for making SCI available in the general marketplace, without an
independent non-competing supplier, a very difficult situation. Dolphin is
growing rapidly, but the potential market is far too large for one company to
meet all the needs.

  "Data General is building a new family of machines linked by SCI, using
Dolphin chips initially, which may be the first to make SCI workstations
generally available. These are also (see Sequent above) based on quad Pentium
Pro boards that have SCI interfaces added. Data General has expressed the
intention of using SCI as an open standard third-party interconnect, the first
computer manufacturer to apply SCI to open systems as the standard intended.

  "Another supplier of SCI interface designs is about to come online,
Interconnect Systems Solution in Mission Viejo, Calif., which should improve
the supply situation. (Its founder, Khan Kibria, has designed SCI interfaces
before.) However, the interfaces still won't be widely available until
someone invests in a production run of a versatile interface chip. Buying the
chip design isn't the way the typical consumer or small startup wants to
operate...

  "Meanwhile, we're taking an additional path to get to market: the
SerialBus, IEEE Std 1394-1995, has gained support in the consumer marketplace
for digital video (Sony) and desktop computer I/O. That's an exciting market
with potential high volume, but it needs an expansion path that can handle
building-size distances and far higher bandwidth, in order to meet the needs
of a home entertainment network or a digital-video movie studio.

  "It turns out that SerialBus shares a lot of architecture with SCI (and
Futurebus), intentionally (there was a lot of cross fertilization due to
common membership in the working groups).

  "The differences evolved as driven by the different goals: SCI was to
enable supercomputing-class interconnection (and Networks of Workstations,
Clusters, LAMPs), while SerialBus was to optimize for low cost and high
volume without at all considering the needs of bridges or switches, which
were considered futures and low volume.

  "The result is that SerialBus has no concurrency (every cable carries the
same information at the same time, unlike SCI where every cable carries
different information and adds to the system bandwidth), uses a global
arbitration scheme (slow, especially compared to SCI's zero-delay
distributed arbitration), and then uses long packets to amortize the
arbitration overhead (deadly for systems with bridges or switches, because
it makes the packet buffer requirements correspondingly large, i.e.
expensive, and also greatly increases the latency caused by cross traffic).
On the plus side, SerialBus generated an explicit mechanism for carrying
"isochronous" traffic, convenient for digitized sound or video which has to
be delivered at a certain pace. (SCI required users to plan for such
applications themselves, and to discover on their own that they would want
extra queues in the bridges etc.)
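Two back-of-the-envelope quantities behind this comparison, under simplifying assumptions: on a shared medium every cable carries the same traffic, so aggregate bandwidth stays at one link's rate, while independent links can at best add up; and a packet that arrives just behind a long one must wait for it to finish. The parameter values in the usage note are hypothetical round numbers, not measured figures for either standard.

```python
def aggregate_bandwidth(link_mbytes_per_s: int, n_links: int, shared_medium: bool) -> int:
    """Upper bound on total bandwidth (MByte/s) across n links."""
    if shared_medium:
        return link_mbytes_per_s          # every cable repeats the same traffic
    return link_mbytes_per_s * n_links    # each link carries different traffic


def head_of_line_delay_ns(packet_bytes: int, link_mbytes_per_s: int) -> float:
    """Time (ns) to finish sending one packet ahead of you.

    1 MByte/s is 1 byte per 1000 ns.
    """
    return packet_bytes * 1000 / link_mbytes_per_s
```

With these hypothetical values, waiting behind a 2048-byte packet on a 50 MByte/s link costs `head_of_line_delay_ns(2048, 50)`, or 40960 ns, versus 80 ns behind an 80-byte packet at 1000 MByte/s; every bridge queue sized for the long packet also needs correspondingly more storage.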

  "So a project to solve these problems was started, P1394.2 Serial Express,
chaired by Bill Van Loo of Sun, with SCI's chief architect David James as
technical editor. The idea is to take all our experience with SCI as a
starting point, change some optimizations to make interface chips even
simpler, and add explicit support for isochronous traffic. These goals are
quite simple, because of the similarity of the high-level architectures of
SerialBus and SCI, so this can be done rapidly.

  "Meanwhile, Intel got on board because it needed an upgrade path from its
USB, and offered its expertise on the serial physical layer, virtually
guaranteeing high volume and thus low cost, and finally placing the interface
directly on the processor/memory bus where it needs to be for high
performance, in the "North Bridge".

  "So we have put all the inputs saved up for second-generation SCI into this
project, and see it as an alternate path to a high volume existence. For
better market acceptance, we don't talk about cache coherence (which
frightens most people--it was a big strategic error that SCI included the
coherence specs in the same standard as the signaling and packet layers).
As soon as the spec is done, we will begin work on defining parallel links
for higher performance versions, and cache coherence (the few needed hooks
are in place).

  "Recently, this picture has been clouded somewhat, however. Apparently
Intel began marketing Serial Express as a competitor to SerialBus instead of
a follow-on upgrade path, encouraging use of its own USB for low-end
applications and going directly to Serial Express for higher performance
applications. That antagonized the SerialBus supporters enormously, who
then added to the SerialBus Supplement project P1394A a goal of generating
Gbit/s speeds as an upgrade path for SerialBus that would be directly
compatible, so bridges would not need to do any protocol conversion.

  "This sounds very desirable to all the marketing departments involved, as
well as to the consumers, so it has enormous sex appeal -- even though, for
the reasons explained above, it isn't a reasonable approach from a technical
point of view. Unfortunately, marketing usually wins these battles, which
could derail Serial Express and leave us in a preposterous situation a year
or so downstream, where the same engineers who are currently trying to head
off these technical problems will probably be the very technical experts
called in by top management to make SerialBus run fast instead. Sigh.

  "Cost: The present market situation makes it hard to quote prices in any
meaningful way. Obviously people who need performance find SCI to be cost
effective, or they wouldn't be heading in that direction.

  "What has been hidden in all this is that SCI is simple, and if you don't
demand full speed it will be cheap. Designers tell me it only takes a bit
over 20k gates to do the SCI protocols, in addition to which an interface
needs at least 500 bytes of queue storage for packets. The transceivers
fit in the same package. A bridge is little more than two node interfaces
back to back -- mainly the address recognizer has to be generalized compared
to a simple node.
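A minimal sketch of that difference in address recognition. It assumes SCI's 64-bit addressing, in which the top 16 bits name the node and the remaining 48 bits are an offset within it; the contiguous ID range used for the bridge is a hypothetical policy chosen for brevity, since a real bridge might use a table instead.

```python
NODE_ID_SHIFT = 48        # low 48 bits: offset within the node
NODE_ID_MASK = 0xFFFF     # high 16 bits: node ID


def node_id(address: int) -> int:
    return (address >> NODE_ID_SHIFT) & NODE_ID_MASK


def node_accepts(address: int, my_id: int) -> bool:
    """A simple node recognizes exactly one ID: its own."""
    return node_id(address) == my_id


def bridge_accepts(address: int, far_lo: int, far_hi: int) -> bool:
    """A bridge side recognizes every ID reachable through its other side."""
    return far_lo <= node_id(address) <= far_hi
```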

  "So given comparable production volumes, SCI should be less expensive than
most any bus interface and enormously less expensive than a high-performance
bus interface. I recall seeing one bus interface with far lower performance
that used 130k gates and 16 packages, most of which were for transceivers.

  "In fact, I think SCI is less complex than SerialBus (disregarding cache
coherence, of course, which doesn't affect the SCI interface but does
affect the cache/memory controller in a coherent system)."

  HPCwire: Do you feel SCI receives enough technical-media attention? If not,
why?

  GUSTAVSON: "No, of course I don't think it does! It's very hard to keep the
media enthused about an abstraction, and that's what SCI remains until people
can actually buy SCI-interface products. Until then, it's of interest only to
a rather small audience: academic computer scientists, computer company
architects, and people with extreme bandwidth requirements.

  "Our decision to develop this technology as a standards project made it
anathema to most academics -- everyone knows, and most experience teaches,
that standards are born obsolete and are the direct opposite of innovation.
Certain conferences for which SCI should have been the leading attraction
rejected every paper mentioning SCI, for years. However, gradually the word
seems to be getting out, that new academic proposals ought at a minimum to
outperform this "obsolete" standard if they're to be considered
interesting...

  "In one conference early in 1995, on high performance parallel computation,
one paper explained why it would never be feasible to do cache coherence in
hardware, so here's how we did it in software; and another explained how
Convex did it in hardware, using SCI! It's been fun, seen from a little
distance; a bit frustrating while in the midst of it, though.

  "And of course now that SCI's becoming more respected in academe, it's
getting harder to explain why papers about a 1992 standard are still leading
edge, and worth publication.

  "So SCI's been pretty quiet, overall. That's not good for the industry,
because investors make decisions based on the level of media attention
various technologies get, and so instead of investing in the
technologically best answer they invest in the best hyped technology, and
enormous capital flows down the drain into technology that's almost
certainly doomed. Nobody wins long term when investments are wasted."

  HPCwire: For what uses is SCI most and least suitable?

  GUSTAVSON: "SCI is most useful for clustering over local-area distances or
less. I.e., kilometers; and, of course, the shorter the better. Millimeters
are great, e.g. interchip links within a box.

  "It's least suitable over long distances -- each interface chip can only
handle a certain number of concurrently active packets, which are in flight
awaiting confirmed delivery to the destination or an intermediate bridge
queue. When that number is less than the number of packets that can be in
flight in the cables, efficiency drops.
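A rough model of that efficiency limit, assuming a packet's slot is freed one round trip after it is sent and ignoring serialization and switch delays. The defaults (1 byte/ns, about 5 ns of propagation per meter of cable, 80-byte packets) are illustrative assumptions, not figures from the standard.

```python
def packets_in_flight(distance_m: float, bytes_per_ns: float = 1.0,
                      packet_bytes: int = 80, ns_per_m: float = 5.0) -> float:
    """Packets needed to keep the link busy for one round trip (bandwidth x delay)."""
    round_trip_ns = 2 * distance_m * ns_per_m
    return round_trip_ns * bytes_per_ns / packet_bytes


def link_efficiency(outstanding: int, distance_m: float, **kw) -> float:
    """Fraction of link bandwidth usable with a fixed number of outstanding packets."""
    needed = packets_in_flight(distance_m, **kw)
    return 1.0 if needed <= outstanding else outstanding / needed
```

In this model, eight outstanding packets keep a 10 m link fully busy, but at 1 km about 125 packets would be in flight and the same interface would use only about 6 percent of the link.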

  "To some degree one can compensate by adding additional SCI node chips in
series, to provide more queue storage, but at some point it makes sense for
most applications to switch over to a wide area networking approach, such as
ATM. There are probably a few exceptions, where the value of keeping the same
model everywhere is more important than the efficiency of the link
utilization."

---
End of Part II. For more info on SCI see http://www.SCIzzL.com
--------

  HPCwire: What can be done to enhance SCI's performance when long blocks of
data and long distances are involved?

  GUSTAVSON: "Long blocks of data must be moved in short packets, so the
overhead is pretty much constant once the data exceed a few packetloads. SCI
does include the definition of 256-byte-payload packets, at which point the
overhead is only a few percent -- pretty hard to make a rational argument for
improving on that, given the enormous negative impact longer packets would
have on the system's overall performance. However, there's not much
motivation to actually implement the 256-byte packets -- most systems will
probably stop at 64, which is the likely optimal cache line size for the
next generations when all the tradeoffs are balanced. Since fast queue
memory is a very significant fraction of the interface already at 64 bytes,
it will completely dominate the cost at 256.
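A minimal sketch of moving a long block as fixed-size packet payloads, each addressed at its own offset. It assumes a 64-byte payload and simply lets the final packet be short; the standard instead defines a small set of payload sizes, and how a real interface handles a partial tail is not modeled here. `segment` is a hypothetical name.

```python
from typing import List, Tuple


def segment(base_address: int, data: bytes, payload_size: int = 64) -> List[Tuple[int, bytes]]:
    """Split a block into (address, payload) pairs of at most payload_size bytes."""
    return [
        (base_address + offset, data[offset:offset + payload_size])
        for offset in range(0, len(data), payload_size)
    ]
```

Each packet carries the same fixed header cost, so once a block spans a few packets the overhead fraction stops improving with block length.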

  "So, as long as the interface cost is not negligible, people will opt to
use 64-byte packets (with about 65 percent efficiency) at full speed rather
than dropping to half speed (to get a cheaper denser technology) in order to
use 256-byte packets (with about 95 percent efficiency). Such arguments were
often invoked during the SCI design -- it's better to waste a little
bandwidth and run much faster than to be more efficient and have to run
slower because of the associated complexity. We included the 256 option
because we had enough coding space for it, and it is useful for removing some
of the irrational objections to SCI."

  HPCwire: What if a single node requires high bandwidth?

  GUSTAVSON: "If a single node requires high bandwidth, the best answer
depends on what the limiting factor is. Usually I'd expect the best answer to
be what Convex/HP does, i.e., add more SCI interfaces in (independent)
parallel and divide the load among them.
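One simple way to divide the load, assuming transfers are spread by cache-line address so consecutive 64-byte lines rotate across the interfaces. This interleaving rule is a common technique chosen for illustration; the interview does not say how Convex/HP actually assigns traffic.

```python
from typing import Iterable, List

LINE_BYTES = 64


def interface_for(address: int, n_interfaces: int) -> int:
    """Pick an interface by interleaving on cache-line number."""
    return (address // LINE_BYTES) % n_interfaces


def split_load(addresses: Iterable[int], n_interfaces: int) -> List[List[int]]:
    """Group line addresses by the interface that will carry them."""
    queues: List[List[int]] = [[] for _ in range(n_interfaces)]
    for addr in addresses:
        queues[interface_for(addr, n_interfaces)].append(addr)
    return queues
```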

  "If the limiting factor is the queue space in the node interface, however,
or the number of concurrent outstanding transactions, the best answer might
be to add more SCI interfaces in (independent) series, so they each
contribute traffic to the same link cable, in a (small) ring configuration."

  HPCwire: What can be done to facilitate end-users' understanding of I/O and
networking as memory transfers, as SCI requires?

  GUSTAVSON: "The basic ideas are easy to convey, as everyone can imagine
moving data among tasks in one machine via memory. The people that write
device drivers and network interface drivers need a deeper understanding in
order to use shared memory efficiently, but those are smart people and that
doesn't take them long.

  "The biggest problem is the initial shock that hits when programmers used
to protecting their interfaces from naive user code begin to get the idea,
and then are horrified by visions of everybody on the net writing into their
entire memory.

  "The next step is to realize that nobody proposes making such an open
unprotected interface, but that takes a few minutes to explain. In fact, of
course, each processor will map its memory space to/from SCI's physical
addresses using the normal virtual memory schemes with protection on a page
by page basis, and since SCI tells the processor interface who the party is
that requests access, these protections can be quite explicit. Thus, certain
other processors will be allowed access to certain of my pages, etc.
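A minimal sketch of that page-by-page check, assuming 4 KByte pages and that each incoming request carries the requester's node ID, as described above. The `PageGuard` class and its methods are hypothetical names; a real system would keep these permissions in its memory-management hardware rather than in a dictionary.

```python
from typing import Dict, Set

PAGE_SIZE = 4096


class PageGuard:
    """Per-page set of remote node IDs allowed to access local memory."""

    def __init__(self) -> None:
        self.allowed: Dict[int, Set[int]] = {}

    def grant(self, page: int, requester_id: int) -> None:
        self.allowed.setdefault(page, set()).add(requester_id)

    def revoke(self, page: int, requester_id: int) -> None:
        self.allowed.get(page, set()).discard(requester_id)

    def check(self, address: int, requester_id: int) -> bool:
        """Permit the access only if this requester was granted the page."""
        return requester_id in self.allowed.get(address // PAGE_SIZE, set())
```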

  "The best, most desperately needed, solution is to make inexpensive
SCI-coupled clusters broadly available so that many users get exposed to
these concepts and can experiment with them. What makes this take so long is
that every machine you can buy today has made only very slow I/O ports
accessible, rather than the fast processor/memory connection one wants.

  "The biggest gains in parallel processing will probably result from the
enormous increase in software expertise and effort made possible by broad
availability of parallel machines."

  HPCwire: Please give a realistic assessment of SCI's future.

  GUSTAVSON: "SCI has made steady gains in acceptance. The laws of physics
haven't changed, and the predicted limits of processor buses have been
reached. There are only two competitive choices right now -- design a
proprietary switch-based interconnect in order to survive another generation
(SGI, for example), or use SCI (or something very like it).

  "The economics that favor using a standard are overwhelming -- it's an
enormous task to design a proprietary system, prove its correctness, develop
manufacturing, testing, and field support equipment for it, interface it to
every relevant other kind of interconnect, etc. SCI has already done the
hardest parts (I doubt there has ever been another cache coherence protocol
so thoroughly tested, for example!)

  "The first tough competitor is NIH (Not Invented Here). Every engineer
knows he/she can design a better interconnect, and probably every one has to
design at least two in their career. But competition is eliminating fat
profits, and realistic accounting shows that designing a custom
interconnect is extremely costly. In some cases, there are more rational
reasons -- for example, Cray Research had good reasons why they couldn't meet
certain deadlines unless they used certain technology, which meant they had
to double the width of the links to get the bandwidth, and once one is off
the precise standard it's hard to argue against optimizations for
particular applications. But they also recognize the disadvantages of going
it alone, and have expressed an intention to migrate to the next true
standard when the time is right (skip a generation). So I count them as
reasonable people, not the usual NIH mentality. Most NIH types just don't
understand the whole picture, and don't foresee all the interactions their
design decisions will have down the road. Competition will eventually weed
out most of these.

  "The second tough competitor is Serial Express. If all proceeds as
envisioned, Serial Express will be priced for high volume production, and
thus will eventually blow away all alternatives, including 1596 SCI, on
price and even on cost-effectiveness (it might be cheaper to use several
Serial Express interfaces in parallel to get the bandwidth needed, if
prices are low enough).

  "My solution to the second competitor is to declare that a victory for
SCI -- think of Serial Express as SCI-2.

  "Of course, all rational expectations can be dashed by a tidal wave of
marketing emotion, reinforced by massive misinvestment. If an inferior
product gets produced in sufficient volume, it can become so cheap it can't
be displaced and we're stuck with it forever.

  "However, that extra factor of a hundred that shared memory gives by
eliminating the software overheads of streaming interconnects, is probably
big enough to ensure that shared memory systems will be built. And once
there is shared memory there is another big factor to be gained by adding
hardware cache coherence.

  "And so far, even after five years, there's no other standard in sight that
comes within orders of magnitude of SCI. Just in raw bandwidth, using
comparable technology, it takes ten FibreChannel Gbit links, or sixty-four
155 Mbit ATM links, to equal one SCI link; and raw bandwidth is orders of
magnitude away from being the whole story. The whole story has to handle
multiprocessor synchronization, locks, barriers, shared data structures,
deadlock hazards, starvation, big- vs little-endianness, hot spots, etc.
None of the other standards address these issues.

  "So, I'm convinced that SCI will succeed, gaining gradually, unless someone
makes an enormous investment to deliver something else that's comparable.
And if that works correctly, I'll buy it -- I own no stock in SCI, and SCIzzL
is nonprofit. I'd just like to see the industry take advantage of this free
technological gem... such things don't come along every year, or even every
decade. It took a lot of good luck to get the right ingredients together
for this to coalesce, and it's not likely to happen this way again. (I sure
wouldn't do things the same way again--next time I wouldn't go the
nonprofit route!)"

  Additional information about SCI can be obtained from http://www.SCIzzL.com

--------------------

Alan Beck is editor in chief of HPCwire. Comments are always welcome and
should be directed to editor@hpcwire.tgc.com

**************************************************************************
                      H P C w i r e   S P O N S O R S

       Product specifications and company information in this section are
             available to both subscribers and non-subscribers.

 936) Sony                  905) Maximum Strategy    937) Digital Equipment
 934) HP/Convex Tech. Center 930) HNSX Supercomputers 932) Portland Group
 921) Cray Research Inc.    902) IBM Corp.           915) Genias Software
 909) Fujitsu                                        935) Silicon Graphics

****************************************************************************
Copyright 1996 HPCwire. Redistribution of this article is forbidden by
law without the expressed written consent of the publisher. For a free trial
subscription to HPCwire, send e-mail to trial@hpcwire.tgc.com.