=================================
REDUCERON MEMO 28
Mark-Compact in O(survivors) time
Matthew N, 6 August 2009
=================================
One of the big advantages of copying collectors is that they take time
linear in size of the live graph. Most collectors take time linear in
the size of the heap. In functional language implementations, the
live graph is typically small compared to the heap, due to many
short-lived allocations. This makes a copying collector very
attractive in a functional setting.
On the other hand, one of the big disadvantages of copying collectors
is that they require the heap to be split into two sub-spaces: a
"from-space" where the live graph and the garbage resides and a
"to-space" where the live graph is written to. The result is that
maximum residency of the live graph is halved, wasting precious
memory.
This memo simply points out that a one-space compacting collector that
takes time linear in the size of the live graph is possible. The size
of the live graph is sometimes referred to as the number of survivors,
hence the time-complexity of the algorithm is O(survivors). The
algorithm is a mild variation of the Two-Finger Mark-Compact
algorithm.
The Two-Finger Mark-Compact algorithm
-------------------------------------
Jones and Lins define Edwards's Two-Finger algorithm as follows.
numSurvivors = mark();
relocate();
updatePointers(numSurvivors);
heapPointer = numSurvivors + 1;
The "mark" phase marks the live cells in the heap, and takes time
proportional to the size of the live graph.
The "relocate" phase makes use of a left pointer and a right pointer.
The left pointer starts at the left end of the heap and advances from
left to right. The right pointer starts at the right end of the heap
and advances from right to left. On each iteration, the left pointer
is advanced to point to the next free cell, and the right pointer is
advanced to point to the next live cell; the leftmost free cell is
overwritten with the rightmost live cell, and the righmost live cell
is overwritten its new address, as pointed to by the left pointer.
The "relocation" phase finishes when the left pointer is larger than
the right pointer.
Finally, the "updatePointers" phase iterates through the compacted
live graph and updates any pointers to relocated cells with their new
addresses.
The algorithm is linear in the heap size because every cell on the
heap will have, at some stage during the algorithm, been pointed to by
either the left or right pointer.
Mild variation
--------------
Let us replace the right pointer with a traversal pointer that
advances along the live graph, beginning at the root. Now on every
iteration, advance the traversal pointer to point the to next live cell
whose address is larger than "numSurvivors". Relocate the live cell
to address pointed to by the left pointer, as above.
This variant of the algorithm traverses "numSurvivors" cells three
times: once in the mark phase, once when advancing the left pointer
from address 0 to address numSurvivors, and once when advancing the
traversal pointer along the live graph.
Other considerations
--------------------
It is simple to implement a copying collector, and even simpler to do
so with without using stack! However, the mark phase in Mark-Compact
does require either a stack or the pointer-reversal technique, and
these are both somewhat complicated to implement.