Ariane 501




Tom Anderson (Tom.Anderson(at)newcastle.ac.uk)
Mon, 29 Jul 1996 14:44:35 +0100


The moderator of this list invited me to circulate an earlier version of the following, so, after updating it, here it is.

-----------------------------------------

The report from the "Inquiry Board" into the failure of the maiden flight of the Ariane 5 launch vehicle is available on the Web at URL

http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

The report makes very interesting reading and clearly identifies the (rather astonishing) software inadequacies to which the failure is confidently attributed. These inadequacies suggest that the software "engineering" for this project fell somewhat short of the highest standards, and that the overall system engineering did not take full account of software issues.

Anyone who is interested in this matter should certainly read the very well written report (my printout is only 10 pages long), but here is my (casual) summary of the causality, as presented by the report.

There was a software kludge in Ariane IV which kept a pre-launch alignment process running after lift-off, beyond its normal time limit. This was in case the countdown was held up, because restarting the pre-launch process had a very high time overhead; after lift-off it could have been stopped, but stopping it wasn't necessary in Ariane IV, so they didn't bother. In Ariane V the kludge was no longer necessary, but no-one thought to clean it out, or to test whether it might cause a problem (because it had always been OK previously). [Perhaps this was a singularly unwise application of the "if it ain't broke don't fix it" principle.] Unfortunately, it turned out to be absolutely essential that this process didn't keep running after lift-off of Ariane V, because the values it then had to work with deviated too far from the conditions it was designed for, and this raised a run-time exception in the 'active inertial reference system'.
[Lower level details: the process was busy calculating pre-launch alignment figures (irrelevant after lift-off); it was kept running (on an unwarranted "just in case I'm needed" basis) for about 40 seconds into any flight; Ariane V is designed to take up a higher horizontal velocity earlier in flight than Ariane IV, so the pre-launch process receives larger values to work with; the processing involves a conversion from a 64-bit floating-point number to a 16-bit signed integer; and this conversion raised a run-time exception on the large value presented 36 seconds into the flight, because this particular conversion was not dynamically checked (other conversions were checked, but it was "known" to be unnecessary to check this one).]

The designed response to such an exception is to shut down the active primary inertial reference processor (because there is a live back-up running in parallel) and transfer control to the back-up secondary processor. Unfortunately, the secondary contains identical software, and it had already shut down on the previous cycle, 72 ms before, having hit the same exception fractionally earlier. So the main on-board computer had to stick with the primary inertial reference processor, which by then was presenting its diagnostic bit pattern. The on-board computer interpreted this bit pattern as flight data indicating that full nozzle deflection was required, so that's what happened; the resulting aerodynamic loads ripped off the boosters, which correctly triggered a full auto-destruct.

End of summary.

Turning to the report itself, I found it incredible (almost literally unbelievable) that the commission should find it necessary to make the observations quoted below. It *does* seem to be the case, on the basis of the report, that the observations are completely justified and appropriate; but for this to be so in 1996, for such a high-profile, high-technology project, with such an enormous cost of failure (in money and prestige), I'm astounded (and therefore, perhaps, naive).
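As an illustration only, the two mechanisms in the summary (a narrowing conversion that raises on an out-of-range value, and a back-up that runs identical software) can be mimicked in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the exception class, the truncating conversion and the input values are all hypothetical and are not taken from the report or from the flight software.

```python
# Hypothetical sketch: names and values are illustrative, not from the Ariane software.
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1  # range of a 16-bit signed integer


class ConversionError(Exception):
    """Stands in for the run-time exception raised by an out-of-range conversion."""


def to_int16(x: float) -> int:
    """Convert a 64-bit float to a 16-bit signed integer, raising if it does not fit.

    Python ints are unbounded, so the 16-bit limit has to be checked explicitly.
    Truncation toward zero is an assumption; the real rounding rule is not modelled.
    """
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise ConversionError(f"{x!r} does not fit in a 16-bit signed integer")
    return n


def run_redundant_channels(value: float, channels: int = 2) -> list[str]:
    """Run the same conversion on every redundant channel and record each outcome.

    A channel that raises is shut down, as the specification required. Every
    channel runs identical software on the same input (a simplification: the
    real units measured the same flight), so a design error shuts all of them
    down and switching to the back-up cannot help.
    """
    outcomes = []
    for _ in range(channels):
        try:
            outcomes.append(f"ok: {to_int16(value)}")
        except ConversionError:
            outcomes.append("shut down")
    return outcomes


if __name__ == "__main__":
    print(run_redundant_channels(20_000.0))  # ['ok: 20000', 'ok: 20000']
    print(run_redundant_channels(40_000.0))  # ['shut down', 'shut down']
```

Nothing in the sketch fails at random: the same input produces the same exception on every channel, which is exactly the distinction the Board draws between design errors and random failure.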
These are the last 3 paragraphs of section 2.2 of the report, except for the bits in [ ] which I've added.

--------------------

Returning to the software error, the Board wishes to point out that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system. Furthermore software is flexible and expressive and thus encourages highly demanding requirements, which in turn lead to complex implementations which are difficult to assess.

An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure. The supplier of the SRI [Inertial Reference System] was only following the specification given to it, which stipulated that in the event of any detected exception the processor was to be stopped. The exception which occurred was not due to random failure but a design error. The exception was detected, but inappropriately handled because the view had been taken that software should be considered correct until it is shown to be at fault. The Board has reason to believe that this view is also accepted in other areas of Ariane 5 software design. The Board is in favour of the opposite view, that software should be assumed to be faulty until applying the currently accepted best practice methods can demonstrate that it is correct. [I would have preferred "demonstrate that it appears to be correct"; my son observed that a better, and shorter, statement is simply "Software should be assumed to be faulty."]

This means that critical software - in the sense that failure of the software puts the mission at risk - must be identified at a very detailed level, that exceptional behaviour must be confined, and that a reasonable back-up policy must take software failures into account.
--------------------

It seems that some key parts of the Ariane development team did not realise that software "does not fail in the same sense as a mechanical system"; had a profound "bias towards the mitigation of random failure"; thought that "software should be considered correct until it is shown to be at fault"; and did not appreciate that software could be mission critical (or safety critical), or that a "back-up policy must take software failures into account".

No further comment seems necessary.

Tom Anderson

