================================
REDUCERON MEMO 12
An algorithm for arity-reduction
Matthew N, 7 January 2009
================================

The Reduceron reads the top eight stack elements into a register file
on every clock cycle.  This register file has eight read ports,
allowing fast block-instantiation of function bodies.  The problem is
that functions must take no more than eight arguments because then a
single block-read from stack memory is no longer sufficient.  Instead,
multiple reads would be needed, and block-reads would be of little
help when instantiating an application that refers to stack elements
eight or more addresses apart.  Things become more complicated and
less efficient without the arity limitation.

In Memo 7, I described a solution using an eight-port stack --- any
eight elements can be read in parallel, not just the top eight.
Indeed, this requires that we perform multiple reads, one for each
block of template nodes to be instantiated.  Although these reads can
be pipelined (reading of the next block of needed arguments can be
done in parallel with instantiation of the current block of template
nodes), there remains the one-cycle overhead of priming the pipeline
when instantiating the first application in the body.  There is no
such overhead in the original method because we always know exactly
what stack elements we'll need, namely the top eight, so we can always
pre-fetch the arguments a clock-cycle in advance.

So it seems that the current approach works quite well assuming that,
a lot of the time, functions take eight or fewer arguments anyway.
Still, the arity limitation must be solved.  To do so, we will
transform programs so that no function takes more than eight
arguments.  This memo defines such a transformation, called
arity-reduction.

Turner's abstraction algorithm
------------------------------

It is first useful to observe that arity-reduction is indeed possible.
Given a function of 'n' arguments, Turner's abstraction algorithm
produces one of 'n-1' arguments.  In the process, auxiliary functions
such as 'S' and 'K' might be introduced, but these functions
themselves take only three and two arguments respectively.  So
Turner's algorithm can be applied repeatedly until all functions in a
program take no more than eight arguments.

The only problem is that this may result in a lot of fine-grained
reductions of 'S' and 'K' being performed when executing functions of
more than eight arguments.  The algorithm below aims to be more
efficient; the introduced combinators do not need to be so
fine-grained, and they do not need to be the same for all programs.

Dijkstra's abstraction algorithm
--------------------------------

In reaction to Turner's work on combinators, Dijkstra developed his
own abstraction algorithm.  I find Dijkstra's algorithm clearer than
Turner's, and more amenable to tweaking.  So our algorithm is based on
Dijkstra's.

Dijkstra's main idea is to annotate the interior nodes of an
applicative expression with strings of combinators, rather than
introduce combinators into the expression by additional applicative
structure --- i.e. new identifiers and application nodes.  Dijkstra's
syntax for expressions is

> data Exp = Id String | App [Com] Exp Exp | Empty
>   deriving Show

An identifier is a variable (an argument to a function) or a function
name. A Com (combinator) is an S, B or C.

> data Com = S | B | C
>   deriving Show

An application containing an empty sequence of combinators is just a
standard binary application.  An application with a sequence of n > 0
combinators and sub-expressions e_0 and e_1 can be viewed as a lambda
expression of the form

  \v_1 .. v_n -> (e_0 vs) (e_1 ws)

where vs and ws are sequences containing zero or more of the variables
v_1 .. v_n as defined by the sequence of combinators.  More
specifically, if the i^th combinator in the sequence is S then v_i is
passed down to both e_0 and e_1; if it is B then v_i is passed down
only to e_1; if it is C then v_i is passed down only to e_0.

EXAMPLE 1. For some expressions e_0 and e_1, the expression

  App [S,B,C] e_0 e_1

can be viewed as the lambda expression

  \x y z -> (e_0 x z) (e_1 x y)

EXAMPLE 2. An expression can also be 'Empty'.  For some expression e,
the expression

  App [S] e Empty

can be viewed as the lambda expression

  \x -> (e x) x

'Empty' expressions are introduced during the abstraction process.

Here is Dijkstra's abstraction algorithm, defined in Haskell.

> dijkstra :: String -> Exp -> Exp
> dijkstra x (Id y) | x == y = Empty
> dijkstra x (App coms e0 e1)
>   | x `occursIn` e0 && x `occursIn` e1 =
>       App (S:coms) (dijkstra x e0) (dijkstra x e1)
>   | x `occursIn` e0 = App (C:coms) (dijkstra x e0) e1
>   | x `occursIn` e1 = App (B:coms) e0 (dijkstra x e1)

The auxiliary function 'occursIn' is defined as you'd expect.

> occursIn :: String -> Exp -> Bool
> x `occursIn` Empty = False
> x `occursIn` (Id y) = x == y
> x `occursIn` (App coms e0 e1) = x `occursIn` e0 || x `occursIn` e1

Dijkstra assumes that the variable to be abstracted from an expression
does indeed occur in that expression.  We will later remove this
assumption.

Converting Dijkstra's Expressions to Combinator Expressions
-----------------------------------------------------------

Dijkstra's expressions aren't supported by standard functional
evaluators such as the Reduceron, so we'd like to convert them to
expressions which are.  To do this, we move the combinator annotations
to identifiers occurring in the applicative structure.  So Dijkstra's
algorithm, composed with the conversion function below, has a similar
function to Turner's algorithm.

We assume that the identifiers I, B, C, and S are defined by the
following equations.

  I x = x
  B k f g x = k f (g x)
  C k f g x = k (f x) g
  S k f g x = k (f x) (g x)
 
With a left-associative operator for function application,

> infixl @@
> (@@) :: Exp -> Exp -> Exp
> e0 @@ e1 = App [] e0 e1

the conversion is defined as follows.

> convert :: Exp -> Exp
> convert Empty = Id "I"
> convert (Id x) = Id x
> convert (App [] e0 e1) = e0 @@ e1
> convert (App cs e0 e1) = 
>   foldr (@@) (Id "I") (map (Id . show) cs) @@ convert e0 @@ convert e1

(The third equation is unnecessary, but saves symbols.)

EXAMPLE 3.  The value of the expression

> example3 = convert (App [S,B,C] (Id "e0") (Id "e1"))

is 'S (B (C I)) e0 e1' (pretty-printed).

Vectored applications
---------------------

We now modify Dijkstra's algorithm to work on vectored applications,
rather than just binary ones.  This makes combinators more
coarse-grained because they don't just say "pass left" or "pass
right", but "pass to the i^th sub-expression".

First of all, the syntax is changed as follows.

> data Exp2 = Id2 String | App2 [Dir] [Exp2]
>   deriving Show

Strings of combinators are no longer represented as lists of S, B, or
C combinators, but as lists of director strings.

> type Dir = [Bool]

The i^th element of a director string states whether or not an
argument is passed to the i^th element of the application.

EXAMPLE 4. The application

  App [S,B] e_0 e_1

is now represented by

  App2 [[True,True],[False,True]] [e_0, e_1]

That is, S is equivalent to the director string [True, True] (pass the
argument to both sub-expressions) and B is equivalent to [False, True]
(pass the argument to the right-hand sub-expression only).

Note that an all-False director string is possible, e.g. [False,
False].  This allows us to abstract a variable from an expression in
which that variable does not occur.

We no longer have an explicit constructor for empty expressions; these
can be sensibly represented as nullary applications.

> empty :: Exp2
> empty = App2 [] []

The modified abstraction algorithm is as follows.

> dijkstra2 :: String -> Exp2 -> Exp2
> dijkstra2 x (Id2 y)
>   | x == y = empty
>   | otherwise = App2 [[False]] [Id2 y]
> dijkstra2 x (App2 dirs es) = App2 (bs:dirs) es'
>   where bs = map (x `occursIn2`) es
>         es' = [if b then dijkstra2 x e else e | (e, b) <- zip es bs]

This function can indeed abstract a variable from an expression in
which that variable does not occur.

The auxiliary function 'occursIn2' is defined as you'd expect.

> occursIn2 :: String -> Exp2 -> Bool
> x `occursIn2` (Id2 y) = x == y
> x `occursIn2` (App2 dirs es) = any (x `occursIn2`) es

The Modified Conversion Algorithm
---------------------------------

We must now modify the conversion algorithm for vectored applications.
In a further effort to increase the granularity of the combinators, we
attempt to introduce a single combinator for each sequence of director
strings rather than one for each director string.  As there is a limit
on the number of arguments that a function can take, this is not
always possible.  So the algorithm below takes the arity-limit as an
argument and introduces combinators that encode chunks of director
strings, where the chunks are as large as possible without breaking
the arity-limit.

> convert2 :: Int -> Exp2 -> Exp2
> convert2 n (Id2 x) = Id2 x
> convert2 n (App2 [] []) = Id2 "I"
> convert2 n (App2 [] es) = App2 [] es
> convert2 n (App2 ds es) =
>     App2 [] (foldr apply (Id2 "I") combs : map (convert2 n) es)
>   where
>     m = n - length es - 1
>     combs = map (\ds -> App2 ds []) (groupN m ds)
>     apply a b = App2 [] [a, b]

This function assumes that variable 'm' is never less than or equal to
zero.  Equivalently, vectored applications must be shorter than the
arity-limit minus one.  This condition can be easily established in an
initial application-splitting pass.

Above, we use the syntax 'App2 ds []' to represent a combinator,
specifically the combinator defined by 'comb ds'.

> comb :: [Dir] -> (Int, Exp2)
> comb (d:ds) = (1 + n + m, body)
>   where
>     n = length d
>     m = length (d:ds)
>     body = App2 [] (Id2 "0" : args)
>     args = zipWith apply [1..n] is
>     is = map (\bs -> [i | (b,i) <- zip bs [n+1..], b]) (transpose (d:ds))
>     apply n is = App2 [] (map (Id2 . show) (n:is))
 
This function takes a sequence of director strings and generates a
combinator definition in the form of a pair whose first element is the
arity of the combinator and whose second element is the body of the
combinator.
 
EXAMPLE 5.  The expression

> example5 = comb [[True,False,True],[True,False,False]]

generates the following combinator definition (pretty-printed).

  \v0 v1 v2 v3 v4 v5 -> v0 (v1 v4 v5) v2 (v3 v4)

Remaining Questions
-------------------

How do we deal with let expressions?

Auxiliary functions
------------------

> transpose :: [[a]] -> [[a]]
> transpose [] = []
> transpose [x] = map (: []) x
> transpose (x:xs) = zipWith (:) x (transpose xs)

> groupN :: Int -> [a] -> [[a]]
> groupN n [] = []
> groupN n xs = take n xs : groupN n (drop n xs)