Mike Ellims wrote:
> ............ we could/should compile a
> basic reading list of books and papers every safety engineer should have
> read. ..............
> [I propose the following] 1. anyone can submit up to five entries, either books or papers.
A very worthy goal, Mike. Indeed, I have been thinking about this issue
continually for some years now. I suspect that everyone who has to teach a
course on basic system safety for PESs does so. The problem is, as usual,
even reducing the basic literature down to some list which one person could
be expected to read over a few years.
Although I am curious as to the result of your proposed procedure, I am
sceptical that it can bring what we want. For everything that will turn up
on your list must be regarded by at least one person as amongst the five
most important documents in the field.
For example, there is an argument that, whatever one's personal favorites,
at least five of those should turn out to be standards (Mil Std 882,
IEC 61508, Def Stan 00-56, DO-178B, DO-254). For how can you work
professionally without knowing what will be formally required of you?
But not every standard will be on everybody's list.
I suggest first a classification into categories. There are more than five
categories, even at a first cut. (This gives another reason for being
sceptical that your scheme will succeed.)
Restricting myself to books and book-length contributions, and trying to
choose just one, in the case that there is at least one which suffices, I
come up with the following, using my category classification.
The list is obviously and inevitably incomplete, because a substantial part
of the literature only exists as papers, and no one has yet put a selection
of readings together. It may also be incomplete because I have forgotten
stuff.
A. What to watch out for.
- Peter Neumann, Computer-Related Risks
- Charles Perrow, Normal Accidents
- Andrew Hopkins, Safety, Culture and Risk
- Andrew Hopkins, Lessons from Longford
- Nancy Leveson, Safeware.
None of these tell you how you should go about putting together a
safety-related software-based system at the level of tools. So
B. Design and Programming Tool books
- John Barnes, High Integrity Software (the SPARK book)
- Les Hatton, Safer C (for those who feel they must use C)
- Alan Burns and Andy Wellings, Real-Time Systems and Their Programming Languages
- Nobody, A Usable Definitive Specification-Technique Book
There are thousands of books on specification, and I have my favorites,
but I don't seem to be able to choose just one.
C. Algorithms and Techniques
- Nobody, Design Diversity, Fault Tolerance, and so on
I don't know any single good book on all the important techniques.
None of these books tell you how to talk to other PES safety professionals.
We need a common vocabulary. Notoriously, there isn't one (better said,
there isn't one that is both coherent and used). So the best we can do is
to know what people might be meaning.
C. Concepts and Definitions
- Meine van der Meulen, Definitions for Hardware and Software Safety Engineers
- Jean-Claude Laprie, Dependability: Basic Concepts
- (also Leveson, op. cit., chapter 9)
Then there is analysis. What do we do to ensure that what we have done is
appropriate?
First of all, we must know what we *must* do. And there are different ways
of construing this. A wise engineer will familiarise him- or herself with
more than one. I cannot claim intimate familiarity with all of these, and I
doubt that anyone can. But there are at least four, with whose principles I
believe that all PES safety engineers should be familiar. Unfortunately,
there is no book or series of books discussing those principles.
D. Standards
- U.S. Mil Std 882 (C and D) for its statement of the "system safety" position
- IEC 61508 for its general international applicability
- U.K. MoD Def. Stan 00-56 for its provenance incorporating formal methods
- RTCA DO-178B as the first of its kind for software and its international applicability
Then we need books on specific techniques. I pick one for each.
E. Analysis (at any level): Risk Analysis, Hazard Analysis, Safety
Requirements Analysis, System Design and Implementation Analysis, and so on
- Leveson, op. cit. again for a general survey
- E. Lloyd and W. Tye, Systematic Safety (for traditional methods, more detailed but less complete than Leveson)
- Redmill, Chudleigh, Catmur, HAZOP and Software HAZOP
- James Reason, Managing the Risks of Organisational Accidents
- Tim Bedford and Roger Cooke, Probabilistic Risk Analysis
- Jens Braband, Risikoanalyse (for qualitative risk analysis, unfortunately only available in German)
- U.S. NUREG Fault Tree Handbook
- Peter Ladkin, Causal System Analysis
- Gerard Holzmann, The SPIN Book (for checking your designs)
- Nobody: An FMEA book
Then there is the human stuff that we need to know about, even if we are
primarily digital-system-component people. It is hard to pick one book for
each aspect.
Here is a selection of books, each of which addresses aspects which the
others don't.
For most PES safety engineers, these are read-once-then-keep-handy items,
so there can be more of them.
F. Human and Organisational Factors
- Don Norman, The Psychology of Everyday Things
- James Reason, Human Error
- Neil Johnston, Rob Lee, Dan Maurino and James Reason, Beyond Aviation Human Factors
- Diane Vaughan, The Challenger Launch Decision
- Scott Snook, Friendly Fire
- Reason, Managing op. cit.
- Hopkins, op. cit.
- Perrow, op. cit.
Then there is how to analyse when things go wrong. This is usually
performed by specialist teams that work with their own
organisation-internal methods. But there are a couple of books.
G. Failure Analysis Methods
- Chris Johnson, Failure in Safety-Critical Systems
- Peter Ladkin, Causal System Analysis
- Reason, Managing op. cit. (describes TRIPOD)
Finally, there are the general technical basics. Engineers should all be
familiar with programming, rigorous specification, probability and decision
theory, and management of safety-critical projects.
H. Basic Techniques
- 1. Programming
* Barnes, op. cit.
* Burns & Wellings, op. cit.
- 2. Rigorous Specification
* Nobody: A Usable Definitive Specification-Technique Book
One must look at usable favorites. People are going to say "what about
UML?", but in its current state UML does not come close to sufficing for
Correct-by-Construction techniques, for example. My personal favorite:
* Leslie Lamport, Specifying Systems
Others' favorites:
* Somebody, Some Book on Z
- 3. Verification Techniques
I would claim that a safety-critical PES engineer should also know
generally *at least* about pre- and post-condition, Hoare-logic techniques.
There are a couple of recent books by John Fitzgerald and Peter Gorm Larsen
which I don't know, but I do know the oldie but goodie
* Cliff Jones, Systematic Software Development Using VDM
I myself prefer TLA to Hoare-logic techniques, because it works at the
algorithm level, and there are those who prefer PVS or PROMELA, but there
are no books on verification with those (PROMELA is in any case intended
for model-checking, not verification, and Holzmann, op. cit., describes
that application).
- 4. Testing
I don't know any single book, or pair of books, on which I would rely.
- 5. Probability and Decision Theory
Here there are lots. An oldie but goodie, which is my personal first
source, is
- William Feller, An Introduction to Probability Theory
But this does not go into the principles of how to use it. So
- Ian Hacking, An Introduction to Probability and Inductive Logic
For decision theory, I like
- Michael Resnik, Choices
- Brian Skyrms, Choice and Chance
and for the specific application, PRA,
- Bedford and Cooke, op. cit.
- 6. Management of safety-critical projects
Maybe somebody else could suggest some? Here, I like to mix and match and
no one book recommends itself to me as *the* book.
I emphasise again that these are just books and book-length literature.
There are some fundamental topics missing, such as testing and the limits
of testing. There are lots of books on testing. Maybe someone could suggest
a primary one. The limits of testing are of course contained in two papers
which I often cite.
PBL
--
Peter B. Ladkin, Professor of Computer Networks and Distributed Systems,
Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319 www.rvs.uni-bielefeld.de
Received on Fri 20 Oct 2006 - 08:11:17 BST