PES Safety Literature (Was: Diverse Software Development)




From: Peter B. Ladkin <ladkin_at_xxxxxx>
Date: Fri, 20 Oct 2006 09:16:03 +0200
Message-ID: <453877B3.4090400@xxxxxx>
Mike Ellims wrote:
> ............  we could/should compile a
> basic reading list of books and papers every safety engineer should have
> read. .............. 
> [I propose the following] 1. anyone can submit up to five entries, either books or papers.

A very worthy goal, Mike. Indeed, I have been thinking about this issue
continually for some years now. I suspect that everyone who has to teach a
course on basic system safety for PESs does so. The problem is, as usual,
reducing even the basic literature down to a list which one person could be
expected to read over a few years.

Although I am curious as to the result of your proposed procedure, I am
sceptical that it can bring what we want. For everything that will turn up on
your list must be regarded by at least one person as amongst the five most
important documents in the field.

For example, there is an argument that, whatever one's personal favorites, at
least five of those should turn out to be standards (Mil Std 882, IEC 61508,
Def Stan 00-56, DO-178B, DO-254). For how can you work professionally without
knowing what will be formally required of you? But not every standard will be
on everybody's list.

I suggest first a classification into categories. There are more than five
categories, even at a first cut. (This gives another reason for being
sceptical that your scheme will succeed.)

Restricting myself to books and book-length contributions, and trying to
choose just one, in the case that there is at least one which suffices, I come
up with the following, using my category classification.

The list is obviously and inevitably incomplete, because a substantial part of
the literature only exists as papers, and no one has yet put a selection of
readings together. It may also be incomplete because I have forgotten stuff.

A. What to watch out for.
- Peter Neumann, Computer-Related Risks
- Charles Perrow,  Normal Accidents
- Andrew Hopkins, Safety, Culture and Risk
- Andrew Hopkins, Lessons from Longford
- Nancy Leveson, Safeware.

None of these tells you how you should go about putting together a
safety-related software-based system at the level of tools. So

B. Design and Programming Tool books
- John Barnes, High Integrity Software (the SPARK book)
- Les Hatton, Safer C (for those who feel they must use C)
- Alan Burns and Andy Wellings, Real-Time Systems and Their Programming Languages
- Nobody, A Usable Definitive Specification-Technique Book
   There are thousands of books on specification, and I have my favorites,
   but I don't seem to be able to choose just one.

C. Algorithms and Techniques
- Nobody, Design Diversity, Fault Tolerance, and so on
   I don't know any single good book on all the important techniques.

None of these books tells you how to talk to other PES safety professionals.
We need a common vocabulary. Notoriously, there isn't one (better said, there
isn't one that is both coherent and used). So the best we can do is to know
what people might mean.

C. Concepts and Definitions
- Meine van der Meulen, Definitions for Hardware and Software Safety Engineers
- Jean-Claude Laprie, Dependability: Basic Concepts
- (also Leveson, op. cit., chapter 9)

Then there is analysis. What do we do to ensure that what we have done is
appropriate? First of all, we must know what we *must* do. And there are
different ways of construing this. A wise engineer will familiarise him- or
herself with more than one. I cannot claim intimate familiarity with all of
these, and I doubt that anyone can. But there are at least four, with whose
principles I believe that all PES safety engineers should be familiar.
Unfortunately, there is no book or series of books discussing those
principles.

D. Standards
- U.S. Mil Std 882 (C and D) for its statement of the "system safety" position
- IEC 61508 for its general international applicability
- U.K. MoD Def. Stan 00-56 for its provenance incorporating formal methods
- RTCA DO-178B as the first of its kind for software and its international applicability

Then we need books on specific techniques. I pick one for each.

E. Analysis (at any level): Risk Analysis, Hazard Analysis,
     Safety Requirements Analysis, System Design and Implementation Analysis,
     and so on
- Leveson, op. cit. again for a general survey
- E. Lloyd and W. Tye, Systematic Safety (for traditional methods, more detailed but less complete than Leveson)
- Redmill, Chudleigh, Catmur, HAZOP and Software HAZOP
- James Reason, Managing the Risks of Organisational Accidents
- Tim Bedford and Roger Cooke, Probabilistic Risk Analysis
- Jens Braband, Risikoanalyse (for qualitative risk analysis, unfortunately only available in German)
- U.S. NUREG Fault Tree Handbook
- Peter Ladkin, Causal System Analysis
- Gerard Holzmann, The SPIN Book (for checking your designs)
- Nobody: An FMEA book

Then there is the human stuff that we need to know about, even if we are
primarily digital-system-component people. It is hard to pick one book for
each aspect. Here is a selection of books, each of which addresses aspects
which the others don't. For most PES safety engineers, these are
read-once-then-keep-handy items, so there can be more of them.

F. Human and Organisational Factors
- Don Norman, The Psychology of Everyday Things
- James Reason, Human Error
- Neil Johnston, Rob Lee, Dan Maurino and James Reason, Beyond Aviation Human Factors
- Diane Vaughan, The Challenger Launch Decision
- Scott Snook, Friendly Fire
- Reason, Managing op. cit.
- Hopkins, op. cit.
- Perrow, op. cit.

Then there is how to analyse when things go wrong. This is usually performed
by specialist teams that work with their own organisation-internal methods.
But there are a couple of books.

G. Failure Analysis Methods
- Chris Johnson, Failure in Safety-Critical Systems
- Peter Ladkin, Causal System Analysis
- Reason, Managing op. cit. (describes TRIPOD)

Finally, there are the general technical basics. Engineers should all be
familiar with programming, rigorous specification, verification, testing,
probability and decision theory, and the management of safety-critical
projects.

H. Basic Techniques
- 1. Programming
        * Barnes, op. cit.
        * Burns & Wellings, op. cit.
- 2. Rigorous Specification
        * Nobody: A Usable Definitive Specification-Technique Book
        One must look at usable favorites. People are going to say "what about
        UML?", but in its current state UML does not come close to sufficing
        for Correct-by-Construction techniques, for example. My personal
        favorite:
        * Leslie Lamport, Specifying Systems
        Others' favorites:
        * Somebody, Some Book on Z
- 3. Verification Techniques
        I would claim that a safety-critical PES engineer should also know
        generally *at least* about pre- and post-condition, Hoare-logic
        techniques. There are a couple of recent books by John Fitzgerald and
        Peter Gorm Larsen which I don't know, but I do know the oldie but
        goodie
        * Cliff Jones, Systematic Software Development with VDM
        I myself prefer TLA to Hoare-logic techniques, because it works at the
        algorithm level, and there are those who prefer PVS or PROMELA, but
        there are no books on verification with those (PROMELA is in any case
        intended for model-checking, not verification, and Holzmann, op. cit.,
        describes that application).
- 4. Testing
        I don't know any single book, or pair of books, on which I would rely.
- 5. Probability and Decision Theory
        Here there are lots. An oldie but goodie, which is my personal first
        source, is
        * William Feller, An Introduction to Probability Theory
        But this does not go into the principles of how to use it. So
        * Ian Hacking, An Introduction to Probability and Inductive Logic
        For decision theory, I like
        * Michael Resnik, Choices
        * Brian Skyrms, Choice and Chance
        and for the specific application, PRA,
        * Bedford and Cooke, op. cit.
- 6. Management of safety-critical projects
        Maybe somebody else could suggest some? Here, I like to mix and match
        and no one book recommends itself to me as *the* book.

I emphasise again that these are just books and book-length literature. There
are some fundamental topics missing, such as testing and the limits of
testing. There are lots of books on testing. Maybe someone could suggest a
primary one. The limits of testing are of course contained in two papers which
I often cite.

PBL

-- 
Peter B. Ladkin, Professor of Computer Networks and Distributed Systems,
Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de
Received on Fri 20 Oct 2006 - 08:11:17 BST