A BETTER HAT FILE FORMAT ART Memo 19 (Version 0) Olaf Chitil, 9 April 2001 INTRODUCTION The Hat trace file, as described in version 0 of Memo 14, contains a lot of redundant information. Hence the trace file is unnecessarily large and takes unnecessarily long to write. On the other hand there are a few small bits of desirable information missing in the file. 1. A SOURCE REFERENCE FOR AN INDIRECTION At the moment an indirection consists of two traces. The meaning is: this trace is identical with the second trace, expect that the parent is not the one given in the second trace, but the first trace given here (is `indirection' a good name for this?). An indirection enables Hat to record the information, that for e.g. main = last [True,False] (with `last' suspected), the parent of the result `False' is `last [False]'. However, when you ask for the source position of `False' (its use), then you get the `False' on the right hand side of `main'. Instead you should get the `x' on the right hand side of the equation `last [x] = x'. The general rule should be: The use position of an expression should always be on the right hand side of the parent redex. Hence an indirection should additionally have a source reference. Then its meaning is: this trace is identical with the second trace, expect that the parent is not the one given in the second trace, but the first trace given here, and the source reference is not the one given in the second trace, but the one given here. 2. COMBINING APP+SAT AND NM+SAT TO DESRIBE REDUCTIONS The redex trail browser shows only redexes, but hat-observe and hat-detect show reductions. A redex is an APP node in the trace, or, in the case of a caf, simply a Nm node. The latter tools currently assume that the result of a redex is stored right after the redex in a Sat node. To save space and time a redex in strict context doesn't have a Sat node. Then the result is the result of the parent. We have the invariant, the a Sat is always preceded by an App or a Nm to which it belongs. Combining Apps and Nms with their Sats into a single node would make this invariant explicit in the trace structure and impossible to accidentally break. Furthermore, such integration would not only make it possible to get from an App or Nm to its Sat, but also conversely to get from a Sat to its App or Nm. This feature would make implementing hat-detect much more simple (when you have the outermost application of the right-hand-side of a redex, you can quickly get all other child nodes of the redex). So let's introduce nodes Red(A/B/C): Red(A/B/C): arity (trace of Red) (trace of fn) (trace of arg1) ... (trace of argn) (trace of result) srcref Special case for Caf: Red(A/B/C): 0 (trace of Red) (trace of Nm) (trace of result) srcref (trace of Nm) and (trace of result) are always the same and could be shared. Needs some further thought. Note that the trace of the result is only valid for a RedC. For RedA and RedB there is only a dummy value. (Actually it might be a good idea to have a valid result trace also for a RedB, to facilitate the implementation of hat-detect for incomplete computations.) Note that when we are interested in a parent and it is a RedX, then we want to see the application. When we a interested in an argument and it is a RedC, then we want to see the result. For the above-mentioned optimisation that the result of a redex in strict context can be obtained from its parent, the old App and Nm nodes without integrated Sat are still required. 3. REORGANISING NAMES FOR A SMALLER TRACE Nearly 50% of the space of the trace file is used for Nm nodes. However, most of these Nm nodes are at least partially redundant. For every reduction of a suspected redex, these nodes are created for all names occurring on the right hand side (for trusted redexes `only' names of constructors and literals). So every time the same function is entered, all these Nm nodes are created anew. However, they only (possibly) differ with respect to the parent. And the parent of a Nm is most times identical with the parent of the surrounding application. So propose to replace Nm and all NmType nodes by: NmDef: position + identifier (string) (contains position of definition) NmOcc: position + value for Int,Char,Integer,Rational,Float,Double, Fun,Case,Lambda,Dummy,CString,If,Guard,Container NmOccDef: position + pointer to NmDef (contains position on rhs) NmUse: parent pointer + position + value for Int,Char,... NmUseDef: parent pointer + position + pointer to NmDef The latter two are actually abbreviations for indirections to NmOccs and NmOccDefs, respectively. NmDef itself is not a trace, but all others are traces. The `Occ' versions are used when the parent can be obtained from the surrounding application. They are only created once for a right hand side, no matter how often the function is used. 4. INLINE SOURCE REFERENCES Proposal 3 removes many duplicate source references. Certainly for NmDefs, but also for NmOccs and NmOccDefs it seems reasonable to inline the source reference instead of using a pointer. 5. NO SOURCE REFERENCES FOR TRUSTED CODE No source reference should point to trusted code. The only possible exception seems to be the definition position of a trusted identifier. At least you want to know in which trusted module it is defined. Instead of wasting space with many null source references and pointers to source references, we could have a trusted version of each node that does not contain (a pointer to) a source reference.