PROFILING HAT TRACE FILES USING HAT-CHECK

ART Memo N (Version 1)
Colin Runciman, 23 March 2001

INTRODUCTION

We are now recording redex trails in binary files, using the format
described in Memo 14.  Constructing traces as independent archives is one
of the main goals of ART, and there are many advantages in having the new
.hat files.  However, trail-generating computations now take even longer
than was the case for heap-based trails (50x untraced time is the rule of
thumb at the time of writing), and some of the files are alarmingly big
(tens or even hundreds of megabytes).

Approximating to whole powers of two, tracing increases run-times by an
exponent of six (ie. six factors of two = 64x).  For acceptance by our
intended users, this exponent must be reduced to about three -- or less!
That is, we need to find a factor-of-two improvement three times over -- a
factor of two for each of us in the official ART team!

So how to obtain `my' factor of two?  There is a folk-theorem that careful
application of any reasonable profiling scheme generates improvements
worth a factor of two.  Phil Hassall is working to restore basic
time-profiling to nhc, but meanwhile why not apply space-profiling to the
trace files?  Generally speaking, smaller files take less time to write.

HAT-CHECK

Hat-check is a small program written in C that reads and checks the new
.hat files.  I have recently extended it with an option to produce
statistical summaries of file contents.

  usage: hat-check [-a] [-s] [-v] file-name
    -a   print ascii text version of hat file
    -s   print statistics about frequency and size of nodes
    -v   verify tag types of pointer destinations

EXAMPLE PROFILES

Here are two examples of statistical profiles reported by hat-check.  The
first is for the 44Mb trace file generated by Adjoxo with the
`edgecorner1' position as input.  The second is for the 0.75Mb trace
generated by PsaCompiler with a simple GCD routine as input.

   Number  Description               Bytes  % Space
        1  header                       16
   766883  TR application nodes   16178230     36.4
  1718602  TR name nodes          22341826     50.2
    64431  TR indirection nodes     579879      1.3
    10085  TR hidden nodes           50425      0.1
    31612  TR type A SAT nodes      158060      0.4
        5  TR type B SAT nodes          25      0.0
   871794  TR type C SAT nodes     4358970      9.8
       41  MD nodes                    777      0.0
   163376  NT nodes                 816562      1.8
      253  SR nodes                   2277      0.0
  ------------------------------------------------
           whole trace file       44487047    100.0

   Number  Description               Bytes  % Space
        1  header                       16
     6632  TR application nodes     136424     17.9
    35384  TR name nodes            459992     60.4
     1845  TR indirection nodes      16605      2.2
     1345  TR hidden nodes            6725      0.9
      845  TR type A SAT nodes        4225      0.6
        5  TR type B SAT nodes          25      0.0
    19945  TR type C SAT nodes       99725     13.1
       87  MD nodes                   1762      0.2
     7170  NT nodes                  29586      3.9
      727  SR nodes                   6543      0.9
  ------------------------------------------------
           whole trace file         761628    100.0

Another way to classify the contents of .hat files is to distinguish
pointers from other data.  For the two trace files (the PsaCompiler trace
first, the Adjoxo trace second) the figures are:

   Number  Description               Bytes  % Space
   163739  pointers (all kinds)     654956     86.0
  9859615  pointers (all kinds)   39438460     88.7

PRELIMINARY ASSESSMENT

0. In the current representation almost 9 out of 10 bytes are used for
pointers, so pointer representation is the key to saving space.  The
standard resource-saving representation for jumps in machine code is to
introduce relative as well as absolute jumps.  Most pointers in .hat files
point backwards (all but those in type C SATs), and a natural choice of
size for relative pointers would be half the size of an absolute pointer
-- factor-of-two thinking!

One bit is needed to tag each pointer as local or absolute, so the
potential half-size pointers are those that point backwards less than
2^15 bytes.  (Null pointers can also be included.)  How much would that
save?

   Number  Description                 Bytes  % Space
    76978  point back < 2^15 bytes    307912     40.4
  5044761  point back < 2^15 bytes  20179044     45.4

So halving the size of these pointers would save over 20%.  And this is an
under-estimate, because reducing the size of pointers would shorten the
distances between nodes, making more pointers local.  The actual saving
might be more like 25%.

One disadvantage is that the range of absolute pointer addresses is halved
-- from something over 4Gb to something over 2Gb.  But (a) do people
really want 4Gb traces?  And (b) the maximum size of trace that can be
represented is not reduced by as much as a half, because the
representations themselves become smaller.
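To make the half-size scheme concrete, here is a minimal sketch in C of
how a trace writer might choose between the two pointer forms.  The
tag-bit convention, the field widths and the put_byte helper are
illustrative assumptions of mine, not part of the Memo 14 format.

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical helper: append one byte to the trace file. */
  static void put_byte(FILE *f, uint8_t b) { fputc(b, f); }

  /* Write a reference to the node at file offset 'target' from the node
     currently being written at file offset 'here'.  A null reference, or
     a backward reference of less than 2^15 bytes, takes two bytes;
     anything else takes four.  The top bit of the first byte is assumed
     to be the local/absolute tag, which is why absolute offsets are
     limited to 31 bits (something over 2Gb). */
  static void put_pointer(FILE *f, uint32_t here, uint32_t target)
  {
      uint32_t back = here - target;          /* distance back to target */
      if (target == 0 || (target < here && back < 0x8000)) {
          /* local form: tag bit 0, 15-bit backward offset, 0 meaning null */
          uint16_t local = (target == 0) ? 0 : (uint16_t)back;
          put_byte(f, (uint8_t)(local >> 8));
          put_byte(f, (uint8_t)(local & 0xff));
      } else {
          /* absolute form: tag bit 1, 31-bit file offset */
          put_byte(f, (uint8_t)(0x80 | (target >> 24)));
          put_byte(f, (uint8_t)(target >> 16));
          put_byte(f, (uint8_t)(target >> 8));
          put_byte(f, (uint8_t)(target & 0xff));
      }
  }

The reader does the reverse: it inspects the tag bit of the first byte and
either reconstructs target = here - offset from the two-byte local form or
reads a four-byte absolute offset.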
1. Turning now to specific kinds of node, TR names are the obvious target
for further improvements.  Recall that after the tag-byte these nodes
contain exactly three pointers: the parent TR, the NT entry for the name,
and the SR node giving the source reference (this pointer may be zero).
Notice that the total number of NT entries is only a fraction of the
TR-name count, that the number of SR nodes is far smaller still, and that
there can only be one name (though perhaps many expressions) associated
with any SR.  This suggests that many different TR name nodes share the
same pair of NT and SR pointers.

I suggest cutting the number of pointers in TR names from three to two as
follows.  Make (NT,SR) pairs into first-class nodes, and point to these
nodes from TR name nodes with source-references.  Better still, avoiding
the need for a fifth kind of node, make two variants of SR, those with an
associated NT and those without, and point directly to NT-bearing SRs from
TR names.  For TR names without source-references, define a variant that
points directly to an NT entry.  It is hard to predict how much this would
save in addition to the local-pointer scheme, but perhaps 10%-15%.  I have
a dim memory that the heap-based trails used a similar scheme, perhaps
informed by heap-profiling, but that we changed the representation for the
file-based version!

Another possible improvement might be to separate numbers from names,
allowing all NT nodes to occur at the start of the file.  All NT
references could then use short absolute pointers.  Non-source numbers of
up to 32 bits could instead be encoded directly in place of the NT pointer
in a TR node variant; bigger numbers could go into a new kind of number
node.  All these changes, however, would take quite a lot of effort for
fairly minor gains.
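As an illustration of the two-pointer scheme suggested above, here is a
rough sketch in C of the node layouts involved.  The struct and tag names,
and the assumption of plain 4-byte pointers throughout, are mine; the real
Memo 14 layouts differ in detail.

  #include <stdint.h>

  typedef uint32_t FileOffset;      /* a pointer into the .hat file */

  /* Current layout of a TR name node, after the tag byte:
     three pointers, 12 bytes. */
  typedef struct {
      FileOffset parent;            /* parent TR                    */
      FileOffset nt;                /* NT entry for the name        */
      FileOffset sr;                /* source reference, may be 0   */
  } TrNameCurrent;

  /* Proposed layout: two pointers, 8 bytes.  The TR name tag byte would
     distinguish two variants: one whose second pointer refers to an
     NT-bearing SR node, and one (for names with no source reference)
     whose second pointer refers to the NT entry directly. */
  enum { SR_PLAIN, SR_WITH_NT };    /* hypothetical SR variant tags */

  typedef struct {
      uint8_t    tag;               /* SR_PLAIN or SR_WITH_NT        */
      /* ... existing source-reference fields ...                    */
      FileOffset nt;                /* present only when SR_WITH_NT  */
  } SrNode;

  typedef struct {
      FileOffset parent;            /* parent TR                     */
      FileOffset srOrNt;            /* NT-bearing SR, or NT directly */
  } TrNameProposed;

For the Adjoxo trace, saving one 4-byte pointer in each of the 1718602 TR
name nodes would recover roughly 6.9Mb, about 15% of the 44Mb file, which
is consistent with the estimate above (before allowing for overlap with
the local-pointer saving).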
2. TR application nodes are the other big space hogs.  After the tag-byte
these contain an arity byte, a parent TR pointer for the application,
arity+1 TR pointers for the function and each argument, and an SR pointer
referring to the source of the application.  The average size of an
application node in both examples is 20-21 bytes, indicating an average of
between one and two arguments.  (After the tag and arity bytes there are
arity+3 pointers, so an application of arity a occupies 2 + 4*(a+3) =
14 + 4a bytes; an average of 21 bytes therefore corresponds to a little
under two arguments.)  There is no clear way to save further space in
these nodes -- though they are prime beneficiaries of the local-pointer
scheme.  Often the application and the function share the same parent, and
a variant of Ap nodes could save one TR pointer in this case, but the
gains may be hardly worth the trouble.

3. TR type C SATs are also quite prominent.  They will be all the more
prominent with the introduction of local backward pointers elsewhere: a
type C SAT often (always?) contains a forward pointer that must continue
to occupy a full four bytes.  We know that around 80% of SATs were shorted
out at each garbage collection in the heap-based system.  We also know
that updating type A SATs to type C can be time-consuming.  Is there some
scope for holding back type A SATs in a `pending pool' so that many of
them make the B-C transition and fewer SATs are written to the .hat file?
This would save about another 10%.

SUMMARY OF PROPOSED IMPROVEMENTS

* Introduce half-size pointers for nulls and for backward-going references
  to nodes less than 32k bytes back.  Estimated saving: 25%.

* Instead of an SR pointer and an NT pointer in every TR name node, have
  variants with either an SR pointer (with the associated NT in the
  referent) or an NT pointer (when there is no source-reference anyway).
  Estimated saving: 15%.

* Hold back SATs in a pending pool so that many of them can be cancelled
  before they ever reach the .hat file.  Estimated saving: 10%.

That's 25% + 15% + 10% = 50%.  Maybe!  :-)

FURTHER WORK

A more detailed break-down of the statistics (eg. TR names with and
without source references; NT entries for names, small numbers, big
numbers) will allow a more confident prediction of improvements.

More significantly, there may be substantial parts of currently recorded
traces that are unreachable or unwanted, particularly in connection with
trusted functions.  I am working on a further extension of hat-check to
allow these dispensable parts to be identified and measured.
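One possible shape for that extension, sketched in C: a marking pass over
the file that follows pointers from the nodes of interest and totals the
bytes it reaches.  The two helpers assumed here (node_size and
node_pointers) are hypothetical names for functionality hat-check needs
anyway in order to parse nodes, and the real code would also have to
decide which roots count as wanted.

  #include <stdint.h>

  /* Assumed helpers: the size in bytes of the node at offset 'off', and
     the offsets that node points to (returning how many were written). */
  extern uint32_t node_size(uint32_t off);
  extern int node_pointers(uint32_t off, uint32_t out[]);

  /* Mark every node reachable from 'root' and return the number of bytes
     marked.  'marked' has one byte per file offset; 'stack' is assumed
     large enough for this sketch. */
  static uint64_t mark_from(uint32_t root, uint8_t *marked, uint32_t *stack)
  {
      uint64_t bytes = 0;
      int top = 0;
      stack[top++] = root;
      while (top > 0) {
          uint32_t off = stack[--top];
          uint32_t kids[260];          /* enough for the largest node */
          int i, n;
          if (off == 0 || marked[off]) continue;
          marked[off] = 1;
          bytes += node_size(off);
          n = node_pointers(off, kids);
          for (i = 0; i < n; i++) stack[top++] = kids[i];
      }
      return bytes;
  }

Subtracting the marked total from the whole-file size would then give a
measure of how much of the trace is dispensable.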