Welcome to the world of high-risk
technologies. You may have noticed that they seem to be multiplying,
and it is true. As our technology expands, as our wars multiply, and
as we invade more and more of nature, we create systems -- organizations,
and the organization of organizations -- that increase the risks for the operators,
passengers, innocent bystanders, and future generations. In this book
we will review some of these systems -- nuclear power plants, chemical
plants, aircraft and air traffic control, ships, dams, nuclear weapons,
space missions, and genetic engineering. Most of these risky
enterprises have catastrophic potential, the ability to take the lives of
hundreds of people in one blow, or to shorten or cripple the lives of thousands
or millions more. Every year there are more such systems.
The good news is that if we can understand
the nature of risky enterprises better, we may be able to reduce or even
remove these dangers. I have to present a lot of the bad news here
in order to reach the good news, but it is the possibility of managing
high-risk technologies better than we are doing now that motivates this
inquiry. There are many improvements we can make that I will not dwell on,
because they are fairly obvious -- such as better operator training, safer
designs, more quality control, and more effective regulation.
Experts are working on these solutions in both government and
industry. I am not too sanguine about these efforts, since the risks
seem to appear faster than the reduction of risks, but that is not the
topic of this book.
Rather, I will dwell upon characteristics
of high-risk technologies that suggest that no matter how effective
conventional safety devices are, there is a form of accident that is
inevitable. This is not good news for systems that have high
catastrophic potential, such as nuclear power plants, nuclear weapons
systems, recombinant DNA production, or even ships carrying highly toxic
or explosive cargoes. It suggests, for example, that the probability
of a nuclear plant meltdown with dispersion of radioactive materials to
the atmosphere is not one chance in a million a year, but more likely one
chance in the next decade.
Most high-risk systems have some special
characteristics, beyond their toxic or explosive or genetic dangers, that
make accidents in them inevitable, even "normal." This has
to do with the way failures can interact and the way the system is tied
together. It is possible to analyze these special characteristics
and in doing so gain a much better understanding of why accidents occur in
these systems, and why they always will. If we know that, then we
are in a better position to argue that certain technologies should be
abandoned, and others, which we cannot abandon because we have built much of
our society around them, should be modified. Risk will never be
eliminated from high-risk systems, and we will never eliminate more than a
few systems at best. At the very least, however, we might stop
blaming the wrong people and the wrong factors, and stop trying to fix the
systems in a way that will only make them riskier.
The argument is basically very
simple. We start with a plant, airplane, ship, biology laboratory,
or other setting with a lot of components (parts, procedures,
operators). Then we need two or more failures among components that
interact in some unexpected way. No one dreamed that when X failed,
Y would also be out of order and the two failures would interact so as to
both start a fire and silence the fire alarm. Furthermore, no one
can figure out the interaction at the time and thus know what to do.
The problem is just something that never occurred to the designers.
Next time they will put in an extra alarm system and a fire suppressor,
but who knows, that might just allow three more unexpected interactions
among the inevitable failures. This interacting tendency is a
characteristic of a system, not of a part or an operator; we will call it
the "interactive complexity" of the system.
For some systems that have this kind of
complexity, such as universities or research and development labs, the
accident will not spread and be serious because there is a lot of slack
available, and time to spare, and other ways to get things done. But
suppose the system is also "tightly coupled," that is,
processes happen very fast and can't be turned off, the failed parts
cannot be isolated from other parts, or there is no other way to keep the
production going safely. Then recovery from the initial disturbance
is not possible; it will spread quickly and irretrievably for at least
some time. Indeed, operator action or the safety systems may make it
worse, since for a time it is not known what the problem really is.