A BETTER HAT FILE FORMAT 
ART Memo 19 (Version 0) 
Olaf Chitil, 9 April 2001


INTRODUCTION

The Hat trace file, as described in version 0 of Memo 14, contains a
lot of redundant information. Hence the trace file is  unnecessarily
large and takes unnecessarily long to write. On the other hand there
are a few small bits of desirable information missing in the file.


1. A SOURCE REFERENCE FOR AN INDIRECTION

At the moment an indirection consists of two traces. The meaning is:
this trace is identical with the second trace, expect that the parent
is not the one given in the second trace, but the first trace given here
(is `indirection' a good name for this?).

An indirection enables Hat to record the information, that for e.g.

main = last [True,False]

(with `last' suspected), the parent of the result `False' is `last
[False]'.  However, when you ask for the source position of `False' (its
use), then you get the `False' on the right hand side of `main'. Instead
you should get the `x' on the right hand side of the equation 
`last [x] = x'.

The general rule should be: The use position of an expression should
always be on the right hand side of the parent redex.

Hence an indirection should additionally have a source reference. Then
its meaning is: this trace is identical with the second trace, expect
that the parent is not the one given in the second trace, but the first
trace given here, and the source reference is not the one given in the
second trace, but the one given here.


2. COMBINING APP+SAT AND NM+SAT TO DESRIBE REDUCTIONS

The redex trail browser shows only redexes, but hat-observe and hat-detect
show reductions. A redex is an APP node in the trace, or, in the case
of a caf, simply a Nm node. The latter tools currently assume that the
result of a redex is stored right after the redex in a Sat node. To save
space and time a redex in strict context doesn't have a Sat node. Then
the result is the result of the parent.

We have the invariant, the a Sat is always preceded by an App or a
Nm to which it belongs. Combining Apps and Nms with their Sats into a
single node would make this invariant explicit in the trace structure
and impossible to accidentally break. Furthermore, such integration
would not only make it possible to get from an App or Nm to its Sat,
but also conversely to get from a Sat to its App or Nm. This feature
would make implementing hat-detect much more simple (when you have the
outermost application of the right-hand-side of a redex, you can quickly
get all other child nodes of the redex).

So let's introduce nodes Red(A/B/C):

Red(A/B/C):  arity (trace of Red) (trace of fn) 
             (trace of arg1) ... (trace of argn) (trace of result) srcref

Special case for Caf: 
Red(A/B/C):  0  (trace of Red) (trace of Nm)
             (trace of result) srcref

(trace of Nm) and (trace of result) are always the same and could be
shared. Needs some further thought.

Note that the trace of the result is only valid for a RedC. For RedA
and RedB there is only a dummy value. (Actually it might be a good
idea to have a valid result trace also for a RedB, to facilitate the
implementation of hat-detect for incomplete computations.)

Note that when we are interested in a parent and it is a RedX, then we
want to see the application. When we a interested in an argument and it
is a RedC, then we want to see the result.

For the above-mentioned optimisation that the result of a redex in strict
context can be obtained from its parent, the old App and Nm nodes without
integrated Sat are still required.


3. REORGANISING NAMES FOR A SMALLER TRACE

Nearly 50% of the space of the trace file is used for Nm nodes. However,
most of these Nm nodes are at least partially redundant. For every
reduction of a suspected redex, these nodes are created for all names
occurring on the right hand side (for trusted redexes `only' names of
constructors and literals). So every time the same function is entered,
all these Nm nodes are created anew. However, they only (possibly)
differ with respect to the parent. And the parent of a Nm is most times
identical with the parent of the surrounding application.

So propose to replace Nm and all NmType nodes by:

NmDef: position + identifier (string)   (contains position of definition)

NmOcc: position + value  for Int,Char,Integer,Rational,Float,Double,
                             Fun,Case,Lambda,Dummy,CString,If,Guard,Container
NmOccDef: position + pointer to NmDef   (contains position on rhs)

NmUse: parent pointer + position + value  for Int,Char,...  
NmUseDef: parent pointer + position + pointer to NmDef

The latter two are actually abbreviations for indirections to NmOccs
and NmOccDefs, respectively.

NmDef itself is not a trace, but all others are traces. The `Occ'
versions are used when the parent can be obtained from the surrounding
application. They are only created once for a right hand side, no matter
how often the function is used.


4. INLINE SOURCE REFERENCES

Proposal 3 removes many duplicate source references. Certainly for
NmDefs, but also for NmOccs and NmOccDefs it seems reasonable to inline
the source reference instead of using a pointer.


5. NO SOURCE REFERENCES FOR TRUSTED CODE

No source reference should point to trusted code. The only possible
exception seems to be the definition position of a trusted identifier. At
least you want to know in which trusted module it is defined. Instead of
wasting space with many null source references and pointers to source
references, we could have a trusted version of each node that does not
contain (a pointer to) a source reference.