THE JULY 2001 TRACEFEST

ART Memo 24 (Version 0, 12 September 2001)

Colin Runciman

INTRODUCTION

On 9-10 July we held a Tracing Fest, with Amanda Clare and John O'Donnell as invited guests and the ART Team, including Thorsten and Phil, as the local participants. The Fest followed the now-established pattern:

* The initial sessions were used for introductions from each participant to the application program they were contributing (and for explanations of how the tracing tools worked).

* Then for each application we agreed a team of two breakers, a fixer and a recorder/assistant, and proceeded to break, fix and record for some time!

* We finished with a general discussion of what we had learned.

The three following sections of this memo are about each of these aspects of the meeting. The purpose is to provide some record of what we did, and in particular to note various ideas and lessons for future work that emerged -- at all levels from minor tool-tweaks to the whole tracing framework.

TEST PROGRAMS

The application programs we used for the tracing exercises were as follows:

Adder (John, ~350 lines, one module) is a simulation of digital circuits for adding binary numbers, represented as lists of binary digits. The program includes a series of adders, defined with increasing sophistication; the final version is a log-time carry look-ahead adder.

Adjoxo (Colin, ~100 lines, one module) is a program that adjudicates given noughts-and-crosses positions using a recursive minimax analysis of the game tree. (The same program was used in a previous tracing fest.)

Avl (Phil, ~125 lines, one module) is a program that carries out various operations over AVL-balanced trees, including summing integer labels, traversing to obtain a list, etc.

HaskellInterface (Malcolm, over 1000 lines, several modules) is part of the nhc98 front-end, including the lex and parse phases. The program generates auxiliary information (.hx files) and pretty-printed annotated source.
The focus for the exercise was on a cluster of three modules, with a total of ~900 lines of code.

HierLearn (Amanda, ~600 lines, several modules) is an inductive learning program. Given logical clauses representing background knowledge, a series of examples, and mode and type information, the program generates predicates for characteristic patterns of the examples, each with calculated support and confidence values.

Pretty (Olaf, ~300 lines, several modules) is a simple application of Olaf's pretty-printing scheme based on lazy dequeues, used here to format nested conditional expressions. The pretty-printer is linear-time but depends on laziness in subtle ways. The Queue module was trusted for the exercise.

Reduction (Thorsten, ~200 lines, one module) is a program for parsing and evaluating terms under a first-order tree-rewriting system. The evaluator supports two different evaluation strategies -- `parallel' or `leftmost' reduction.

SPECIFIC ISSUES FROM USING HAT

Rather than incorporate the recorded notes of the fixers and breakers verbatim, in this section I try to distill their many comments about shortcomings in the current Hat tools, together with some suggestions for how the tools could be improved.

Concerning hat-detect:

* Some questions about very large equations are too hard to answer, and ?n is of doubtful value (eg. ?y is less work if the fault is in fact elsewhere).

* Some children of CAF computations are not reached, but they are if the offending value is `decaffed' by the addition of an argument.

* Marking a CAF computation incorrect sometimes causes detection to terminate inappropriately with a diagnosis too high in the EDT (eg. see insert and tree1 in the Avl program).

* Suggestion: an option to see an LHS outline for all child equations and to consider them in any order.

* Sometimes the same question is repeated (a CAF problem?).

* Sometimes `funny' questions are asked, for example of the form:

    f arg = _ ?    or    f = f ?
* It sometimes seems necessary to `lie' about an equation in order to reach a part of the trace that seems relevant.

Concerning hat-observe:

* It would be useful to make available a menu of the functions for which there are observed applications in the trace (perhaps with simple application counts).

* To reduce problems with large expressions, there could be an option to sort listed observations by size.

* There are problems observing functions passed as arguments to a trusted higher-order function.

* Processing large .hat files can be too expensive. In one case it took 4-5 minutes for hat-observe to report 1% progress, and even after 100% is shown there can be a long wait for the first equation.

* Some equations were ill-typed -- perhaps an incorrect section of the trace was extracted for the result?

Concerning hat-trail:

* Long strings cause problems: displaying them in full is inconvenient for the user, and really long strings cause hat-trail to crash.

* It is not possible to run more than one hat-trail at once.

* Conditional/case/guard expressions involving `within' seem the wrong way round to some users; should the order of `within' chains be reversed?

* There is a conflict between the parent trace and the source position for the results of projections.

* Would it be better to cut off expression display at a uniform constructor type rather than at a fixed depth?

* The source window needs a module/file name and line numbers -- maybe even a feature to open an editor at a selected position. Also, the selected source position is sometimes one line below the visible part of the source.

* The white spaces representing function application are too small, and needed brackets round arguments are sometimes missing -- eg. (\) Const "zero" when the single argument is (Const "zero").

* If all the output shown is correct, but some is missing, where should the user click? (A variation of the empty-output problem.)

* If the computation is interrupted, the (partial) output is not shown, so the user can only trace from the interruption point, which can be frustrating.
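Before moving on: the `decaffing' workaround noted under hat-detect can be illustrated with a minimal sketch. The code below is hypothetical (it is not the actual Avl module, and uses plain unbalanced insertion rather than AVL rebalancing), but it shows the shape of the change:

```haskell
-- Hypothetical sketch of `decaffing' a CAF; not the actual Avl code.
data Tree = Leaf | Node Tree Int Tree deriving (Eq, Show)

-- Plain binary-search-tree insertion (no AVL rebalancing).
insert :: Int -> Tree -> Tree
insert x Leaf = Node Leaf x Leaf
insert x (Node l y r)
  | x < y     = Node (insert x l) y r
  | otherwise = Node l y (insert x r)

-- A CAF: a top-level value with no arguments. Its computation is
-- shared, and some of its children may be unreachable in hat-detect.
tree1 :: Tree
tree1 = foldr insert Leaf [2,1,3]

-- The `decaffed' version: a dummy argument makes it an ordinary
-- function, so its child computations have an ordinary parent.
tree1D :: () -> Tree
tree1D () = foldr insert Leaf [2,1,3]

main :: IO ()
main = print (tree1 == tree1D ())  -- the two versions build the same tree
```

The dummy argument changes nothing about the value computed; it only prevents the definition from being a constant applicative form.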
Concerning the Hat toolset as a whole:

* Printing and display of expressions needs to be improved (eg. better formatting of large expressions, and better control over format -- ideally hooks for application-specific formatting) and made more consistent between the different tools.

* Support for working at a finer level than complete function definitions could be improved (eg. the hat-trail conditionals scheme can be useful, but some find it confusing; hat-observe and hat-detect avoid confusion only by denying access to traces of locally defined functions).

* Applications of `invisible ink' or `desugared' functions not in the source program still occur: for example, fromInteger can occur, and list comprehensions are still a problem.

* Some kinds of value are incorrectly displayed: for example, Double, Integer and packed strings.

* Compositions are not well supported, as one cannot see the intermediate results. An example that occurred in one program:

    (return . foldr (/) . map read . lines) input

* Interrupting a computation can result in a bad .hat file.

GENERAL CONCLUSIONS

The conclusions in this section were drawn in our closing discussion. They are about the whole enterprise of tracing, and the way we develop and evaluate tools in future. Though everyone seemed to think the TraceFest had been a useful and instructive exercise, I will concentrate here on the criticisms and ideas for improvement.

Future development:

* We need to discover and document productive strategies for using the various Hat tools in combination. For example, one strategy is to begin with interleaved (ie. mutually informed) use of hat-trail and hat-observe to `close in' on a faulty section of the computation, which can then be examined systematically using hat-detect.

* We should address the problem of how to avoid, or retreat from, `blind alleys' where the user may be faced with very large expressions (or small but seemingly obscure expressions) and can see no way forward.
* All the tools could benefit from stronger links to program sources, and we should also try to support some inverted links *from* program sources.

* When a program is re-executed (possibly after modification), the programmer is likely to want to examine the same places (or the corresponding/equivalent places) in the trace as in a previous execution. How can this be supported?

* Users need a stronger sense of an overall integrated framework, in which they can control options and other stateful information, and gain linked access to all the kinds of information they need.

* Although GHood was not used in this TraceFest, it is good for evaluation-order faults, which our current strictifying Hat tools do not reveal.

Future evaluation exercises:

* It can be hard to work with an unfamiliar program to the extent of finding and fixing faults: besides getting used to the tools, there are difficulties in understanding application concepts. So perhaps we should get authors to fix their own programs, after they are broken by others.

* Another idea is for authors to set comprehension exercises rather than debugging problems. This would also serve to test the usefulness of tracing tools for tasks other than fault-finding.

* We still need more rigorous ways of recording and analysing the results of our tracing exercises.
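Finally, a footnote on the composition problem noted under the toolset comments. Until compositions are better supported, one workaround is to name the pipeline stages at the top level, so that each intermediate result is separately observable (eg. with hat-observe). A minimal sketch with hypothetical names, not taken from any of the test programs:

```haskell
-- Point-free pipeline: the intermediate results of the composition
-- cannot be seen in the trace.
processPF :: String -> Int
processPF = sum . map read . lines

-- Named top-level stages: each step can be observed on its own,
-- at the cost of a little plumbing. (Local where-bindings would not
-- help, since hat-observe and hat-detect deny access to traces of
-- locally defined functions.)
readNums :: String -> [Int]
readNums = map read . lines

process :: String -> Int
process input = sum (readNums input)

main :: IO ()
main = print (processPF "1\n2\n3", process "1\n2\n3")  -- prints (6,6)
```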