> quantitative probabilistic assessment of software failures that could have predicted, for example, Ariane 501, based on the data available before the flight?
Sorry to disagree.
Many experts and contributors to this list (myself included) hold a
radically opposite opinion regarding the cause and nature of the failure
of the Ariane 5/501 maiden flight: that failure has *no* relationship at
all with software.
Aerospatiale and CNES had all the data needed to avoid that failure with
*absolute* certainty, for the simple reason that it was *they* who made
the choice (among others) to commission a satellite launcher that would
sustain horizontal accelerations (from nominal alignment) up to 5 times
those specified for Ariane 4.
Such a choice has to do with satellite launcher system engineering
(obviously), pertaining to the very first lifecycle phase, the
requirements capture phase. Once such a choice had been made (factor 5),
had "good" system engineering processes been followed for the subsequent
phases, the Ariane 5 contractors responsible for the on-board
command-and-control computer-based system (2 SRI and 2 OBC computers +
software) would have known right away that they had to provide for 3
extra bits of memory for storing the value of integer BH (biais
horizontal, i.e. horizontal bias). Trivially, an integer that is 15 bits
long (Ariane 4) needs 18 bits of storage when multiplied by 5.
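The bit arithmetic above can be checked directly. This is an illustrative sketch only: the widths (15-bit magnitude on Ariane 4, factor-5 growth on Ariane 5) come from the discussion above, and the variable names are mine, not from the flight software.

```python
# Largest magnitude representable in 15 bits (Ariane 4 assumption)
max_ariane4 = 2**15 - 1

# Worst case after the factor-5 requirement (Ariane 5 assumption)
max_ariane5 = 5 * max_ariane4

print(max_ariane4.bit_length())  # 15
print(max_ariane5.bit_length())  # 18 -> three extra bits are needed
```

Since 2**3 = 8 is the smallest power of two at or above 5, multiplying any value by 5 can grow its representation by at most three bits, which is exactly the margin the contractors would have had to provide.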
That was totally overlooked -- obviously, a (computer-based) system
engineering fault -- leading to an overflow of the 16-bit register (sign
included) containing BH, 37 seconds after lift-off, and this is why
flight 501 blew up.
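The failure mode just described can be sketched in a few lines. The 16-bit limit (32767) is the real register width discussed above; the sample magnitudes are invented for illustration and are not Ariane telemetry, and the function name is mine.

```python
INT16_MAX = 2**15 - 1  # 32767; the 16 bits include the sign bit

def to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer, failing loudly on overflow."""
    result = int(value)
    if not -INT16_MAX - 1 <= result <= INT16_MAX:
        raise OverflowError(f"{result} does not fit in a 16-bit register")
    return result

print(to_int16(20000.0))    # an Ariane 4-like magnitude fits
try:
    to_int16(5 * 20000.0)   # a factor-5 magnitude overflows the register
except OverflowError as e:
    print("overflow:", e)
```

Note that the conversion itself computes the correct value; it is the storage that is too small -- which is precisely the author's point 3) below.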
As far as I know:
1) The concept of horizontal velocity of a satellite launcher is not a
software concept;
2) Labelling "software fault" an erroneous dimensioning of a
buffer/register/memory cell (which dimensioning depends solely on the
external physical world) leads to the interesting conclusion that
everything that is "in touch" with software is software; computers for
example; hence, a failure due to having selected a computer that is too
slow is a software fault; or a failure due to the overflow of a buffer
meant to store incoming requests, under overloads (the concept of
overloads is akin to the concept of waiting queues, and as far as I
know, the skills required in queuing theory have little to do with the
skills required in the software domain),
3) A posteriori analysis revealed that the code of the conversion
procedure (which converts horizontal velocity (floating point) into
integer BH) is fault free; it computed the correct value for BH at
T0+37, but found not enough bits to store it; why should we blame "the
software" when a code is correct?
4) Factor 5 being ignored, the choice of a particular implementation
technology for instantiating the conversion procedure (which converts
horizontal velocity (floating point) into integer BH) is irrelevant
vis-à-vis the analysis of the cause; had the conversion procedure been
implemented in optoelectronics or mechanics or tupperware (thank you
Peter!), Ariane 5/501 would have exploded as well; one should not
confuse causes (factor 5 ignored, hence buffer overflow) and
consequences (exception raised by "the software", upon detection of the
overflow).
If interested, you can find detailed reports on this topic published
circa 1999, including contributions to this list.
PS: The contents of the Inquiry Board report include the following
recommendation: "(next time) conduct complete software inspection and
simulation". This
is rather astonishing, for at least two reasons:
* Acquiring the necessary "factor 5" knowledge requires no software
inspection, nor any sort of simulation; moreover, as long as the "factor
5" knowledge is ignored, against which specification would software --
fully inspected -- be declared "correct"?
* This is yet another example of the biased view according to which one
can build correct systems simply by conducting a posteriori verification
of software programs; such verifications can be conducted in reference
to specifications, notably "high-level" specifications; questions that
are almost never asked:
-- Where do such specifications come from?
-- How can we tell whether specification S (to be implemented in a
verifiable manner) is a good/correct specification of a sub-problem
which is provably raised by "my" overall composite (real world)
requirements specification? Tony Hoare himself has recently pointed out
the fact that this is now the weakest link (in applied and theoretical
computer science).
> Dear sirs,
> I'm willing to learn. In your previous post, you may be referring to John D. Musa's work (RIP), but in his work, the quality of the probabilistic assessment depends heavily on the operational profiles selected, as you are alluding to. In other words, on the quality of the requirements.
> So I repeat my questions:
> Do you have a precise and documented reference, within the 61508 standard, for the quantitative probabilistic assessment of software failures?
> Are there references, in any standard, for quantitative probabilistic assessment of software failures that could have predicted, for example, Ariane 501, based on the data available before the flight?
> Are there references, in any standard, for the quantitative probabilistic assessment of common mode failures of hardware?
> Best regards,
> +33 (0)6 80 44 57 92
> -----Original Message-----
> From: safety-critical-request@xxxxxx [mailto:safety-critical-request@xxxxxx] On Behalf Of Prof. Dr. Peter Bernard Ladkin
> Sent: Thursday, 5 November 2009 20:35
> To: safety-critical@xxxxxx
> Subject: Re: [sc] Could anybody advise .....
> On Nov 5, 2009, at 8:06 PM, <Thierry.Coq@xxxxxx> wrote:
>>Let's go back to the reference document.
> Good idea.
>>§3.6.5 : Random Hardware Failure, Note 2:
>>NOTE 2 - A major distinguishing feature between random hardware
>>failures and systematic failures (see 3.6.6), is that system failure
>>rates (or other appropriate measures), arising from random hardware
>>failures, can be predicted with reasonable accuracy but systematic
>>failures, by their very nature, cannot be accurately predicted. That
>>is, system failure rates arising from random hardware failures can
>>be quantified with reasonable accuracy but those arising from
>>systematic failures cannot be accurately statistically quantified
>>because the events leading to them cannot easily be predicted
> Since software failures are regarded in 61508 as systematic failures,
> it follows that
>>>A major distinguishing feature between random hardware failures and
>>>software failures ...... is that system failure rates ... arising
>>>from random hardware failures, can be predicted with reasonable
>>>accuracy but software failures, by their very nature, cannot be
>>>accurately predicted. That is, system failure rates arising from
>>>random hardware failures can be quantified with reasonable accuracy
>>>but those arising from software failures cannot be accurately
>>>statistically quantified because the events leading to them cannot
>>>easily be predicted
> I cite from an e-mail from perhaps the world's leading expert on
> software reliability:
>>Systematic failures, particularly software failures, are
>>"systematic" only in the sense that when exactly the same conditions
>>(external and internal) apply, they are reproducible. If a
>>particular input causes failure once, it will always cause failure.
>>But this is a *very* uninteresting notion of systematic. We are
>>interested in the failure behaviour of software. There is
>>uncertainty about which inputs cause failure, and when they will
>>occur. That's why you need a probabilistic treatment, and why
>>notions of "failure rate", "probability of failure on demand" are
>>needed. ......... All the theory has been around for over thirty
>>years.
> It follows that the contrast in §3.6.5 is spurious.
> Peter Bernard Ladkin, Professor for Computer Networks and Distributed Systems,
> University of Bielefeld, 33594 Bielefeld, Germany
> www.rvs.uni-bielefeld.de +49 521 880 73 19
Received on Fri 06 Nov 2009 - 18:08:49 GMT