======================================== REDUCERON MEMO 50 Finding and increasing PRS candidates Colin Runciman (post-implementation version) 23 June 2010 ======================================== When PRS was first proposed, in Memo 8, the closing paragraphs suggested that PRS candidates could be determined statically. The idea was first to determine by iterative analysis which arguments are always passed as primitive values, then to use that information to identify PRS expressions. However, the current PRS implementation detects primitive redexes dynamically. A dynamic solution seemed more in keeping with the Reduceron approach, and it avoided the complication of a compile-time analysis. The dynamic scheme does work well in many respects, but it has drawbacks. Expressions still have to be identified at compile-time as *possible* PRS candidates. For such candidates, PRS register slots are reserved. But then at run-time, some candidate primitive applications may turn out not to be redexes after all: so these slots are wasted, and the fall-back mechanism to construct the primitive application for normal evaluation is quite costly. In this memo, I pick up the thread from memo 8 and propose a static analysis of programs to determine *guaranteed* PRS instances. The proposed method uses an abstraction for the degree of evaluation of algebraic data, so that PRS opportunities can often be detected even when argument values are components of larger structures. The analysis procedure also increases the opportunities for PRS by *cloning* functions of which some but not all applications give rise to PRS opportunities. However, the method set out here does not extend to higher-order programs; I plan to address that extension in a subsequent memo. An Example ---------- Consider the following F-lite program. 01 sumRange m n = sum (range m n) ; 03 range m n = case (<=) m n of { 04 True -> Cons m (range ((+) m 1) n) ; 05 False -> Nil ; 06 } ; 08 sum xs = sumInto 0 xs ; 10 sumInto a xs = case xs of { 11 Nil -> a ; 12 Cons y ys -> sumInto ((+) a y) ys ; 13 } ; Assume that the program as a whole is identified with the sumRange function, and that literal values for m and n are supplied as input. Here's how we might reason informally about whether the primitive applications of (<=) on line 03, and of (+) on lines 04 and 12, can be evaluated by PRS. - In the call of range on line 01, m and n are values. Now if m is a value, then under PRS the (+) m 1 application on line 04 can be evaluated speculatively, providing a value result as the first argument in the recursive call of range. So under PRS *all* calls of range have value arguments. So the applications of (<=) and (+) in its body can both be evaluated by PRS. - The *result* of range is a list of integers. The only Cons application used to build this list, on line 04, has m as its first argument. As we have seen, m is always bound to a value. So every application of range is a "valuable" list: if it is lazily evaluated to Cons y ys then y is a value and ys is valuable. - So the argument to sum is valuable. When sumInto is called from sum, a is a value and xs is valuable. Therefore, in the Cons alternative on line 12, both a and y are values, and (+) a y is a PRS candidate. That means the first argument in the recursive call of sumInto is a value and the second is valuable, and *every* instance of (+) a y will be a PRS candidate. So it turns out that in this example *all three* primitive applications, on lines 03, 04 and 12, can be marked for PRS evaluation. Formulating the analysis ------------------------ Let's now formulate the analysis we have just conducted informally for the sumRange example. For now, I shall assume *first-order* programs with non-recursive local definitions.So the syntax of programs is as follows: p ::= d* -- 1st d is the main function; d ::= f v* = e -- single-equation definition; e ::= n -- number; | v -- variable; | p e1 e2 -- saturated primitive binary application; | c e* -- saturated construction; | f e* -- saturated application of named function; | case e of a* -- case with exhaustive alternatives; | let b in e -- local definition with no shadowing or recursion; a ::= c v* -> e -- alternative with flat pattern and no shadowing; b ::= v = e -- non-recursive local binding. type oracles and $ annotations ------------------------------ In some places the analysis needs to know whether an expression is definitely of some (unspecified) algebraic data type. I assume a *type oracle* in the form of a predicate isAlg. (See appendix for one way to provide this oracle, without doing full type inference.) I shall use a $ annotation to indicate *valuable* expressions $e or variables $v. These $-annotations are introduced to represent the results of analysis. An annotation $e asserts that e is *valuable*: (1) if e is integer then under PRS every instance of e during a run of the program is an integer value; (2) if isAlg e then every instance of e when lazily evaluated to an outermost construction has only valuable arguments. Valuable annotations are idempotent: $($e) = $e. It will be convenient to have a predicate isVal that is true of a variable or expression exactly if it is $-annotated. When a program defines a function f, during analysis there may be distinct variants of f with argument variables differently $-annotated. Looking at the definitions, the variation is immediately apparent: plus $m $n = ... plus m n = ... When referring to plus itself, where convenient the pattern of variable annotations will be added as a suffix, with ? indicating no annotation. So the two plus variants above may be referred to as plus$$ and plus??. Outline of the analysis process ------------------------------- The analysis applies to a *pool* of $-annotated variants of definitions in the program. The pool initially contains just one definition. It is the first definition in the program -- the main function -- with each argument variable $-annotated. The process of analysis introduces (1) additional $-annotations and (2) additional definitions. The analysis involves the application of three distinct "passes": * The F pass propagates $-annotations Forwards from binding to applied occurrences of variables. * The B pass propagates $-annotations Backwards from expressions to bindings, and also outwards from subexpressions to containing expressions; another output from the B pass is a set of requests for additional definitions to be added to the pool. * The D pass is so-called because it is applied to strong components of the Dependency graph between definitions in the pool (in non-descending order) -- its purpose is to detect and exploit valuable algebraic *results* of defined functions. As we shall see the D "pass" is a bit of a misnomer, because it may recursively invoke the entire analysis! The F pass ---------- The F pass is fairly straightforward. It takes a pool of definitions as input and returns the same definitions, but with applied occurrences $-annotated to match binding occurrences. The auxiliary parameter V is a set of variables to be $-annotated. F [ds] = F* [ds] F [f vs = e] = f vs = F_V [e] where V = {v | v in vs, isVal v} F_V [n] = n F_V [v] = if v in V then $v else v F_V [p e1 e2] = p (F_V [e1]) (F_V [e2]) F_V [c es] = c (F_V* [es]) F_V [f es] = f (F_V* [es]) F_V [case e of as] = case F_V[e] of F_V* [as] F_V [let v = e1 in e2] = let v = F_V[e1] in F_V'[e2] where V' = V union {v | isVal v} F_V [c vs -> e] = c vs -> F_V' [e] where V' = V union {v | v in vs, isVal v} The B pass ---------- The B pass also takes a pool of definitions as input, but its output is a pair of things: (1) the same pool of definitions, but with $-annotations propagated where the rules allow, and (2) a set of applications for which no matching definition was found in the pool. In support of (2), the pool of definitions is made an explicit parameter. An auxiliary maybeDefnOf ds f es searches the pool of definitions ds for a variant of function f with $-annotations matching those of the argument list es. The result may be Nothing or Just a matching definition. An auxiliary reqMerge combines two lists of applications for which new declarations are requested, to avoid duplicate requests. B [ds] = ( ds' , rqs ) where (ds',rqss) = unzip ((B_ds)* [ds]) rqs = foldr reqMerge empty rqss B_ds [f vs = e] = ( f vs = e' , rqs ) where (e',rqs) = B_ds [e] B_ds [n] = ( $n , [] ) B_ds [v] = ( v , []) B_ds [p e1 e2] = ( if isVal e1' and isVal e2' then $e' else e' , reqMerge rqs1 rqs2 ) where (e1',rqs1) = B_ds [e1] (e2',rqs2) = B_ds [e2] e' = p e1' e2' B_ds [c es] = ( if all isVal es' then $(c es') else c es' , rqs ) where (es',rqss) = unzip ((B_ds)* [es]) rqs = foldr reqMerge empty rqss B_ds [f es] = case md of Nothing -> (e', e':rqs) Just [f args body] -> if isAlg body && isVal body then ($e', rqs) else (e', rqs) where (es',rqss) = unzip ((B_ds)* [es]) rqs = foldr reqMerge empty rqss md = maybeDefnOf ds f es' e' = f es' B_ds [case e1 of as] = ( if all isVal (rhs* as') && isAlg e' then $e' else e' , rqs ) where (e1',rqs1) = B_ds [e1] (as',rqss) = unzip (B_ds_(isVal e1')* [as]) e' = case e1' of as' rqs = foldr reqMerge rqs1 rqss B_ds [let v = e1 in e2] = ( passVal e2' e' , union rqs1 rqs2 ) where (e1',rqs1) = B_ds [e1] (e2',rqs2) = B_ds [e2] v' = passVal e1' v e' = let v' = e1' in e2' B_ds_q [c vs -> e] = ( c vs' -> e' , rqs ) where vs' = if q then $* vs else vs (e',rqs) = B ds [e] The D pass ---------- The D pass is applied to the list of strong-components in the dependency graph of function definitions in a pool P, where components are listed in non-descending order. (1) If the first component C contains only a single non-recursive definition, OR if every algebraically-typed body of a definition in C is already annotated, OR if every algebraically-typed application in C of a definition also in C is already $-annotated THEN proceed to consider the next component (if any). Otherwise: (2) Make a temporary pool P' containing a copy of all definitions in C or components below it in the dependency ordering. In P' $-annotate every algebraically-typed application of a function defined in C. (3) Apply the entire analysis method to the pool P'. (4) If *all* the definitions of functions originally defined in C, with algebraic bodies, now have $-annotated bodies in P': (4a) add P' to P, over-riding any existing definitions of the same functions; (4b) apply the analysis method to the amended pool P. Otherwise: (5) Consider the next component, if any. Analysis functions defined -------------------------- PRSanalyse :: [Def] -> [Def] PRSanalyse prog = valuable prog [f ($* vs) = e] where (f vs = e) = head prog valuable :: [Def] -> [Def] -> [Def] valuable prog pool = passD prog pool' where (pool',[]) = fixFrom (forwardAndBack prog) (pool,[]) forwardAndBack :: [Def] -> ([Def],[Req]) -> ([Def],[Req]) forwardAndBack prog (pool,reqs) = passB (passF pool') where defs = map (variant prog) reqs pool' = defs ++ pool Example revisited ----------------- Recalling the sumRange example, the initial pool contains just: sumRange $m $n = sum (range m n) ; Apply forwardAndBack. Only forward propagation has any effect on the single definition in the pool sumRange $m $n = sum (range $m $n) ; but backward propagation issues requests for range$$ and sum?. After a further iteration of forwardAndBack the pool contains sumRange $m $n = sum (range $m $n) ; range $m $n = case $((<=) $m $n) of { True -> Cons $m (range $((+) $m $1) $n) ; False -> $Nil ; } ; sum xs = sumInto $0 xs ; with both primitive applications in range$$ $-annotated, indicating that they can be evaluated by PRS. There is a request for sumInto$?. In a further two applications of forwardAndBack, the pool is extended by the addition of both sumInto$? and sumInto??. sumInto $a xs = case xs of { Nil -> $a ; Cons y ys -> sumInto ((+) $a y) ys ; } ; sumInto a xs = case xs of { Nil -> a ; Cons y ys -> sumInto ((+) a y) ys ; } ; A fixpoint of forwardAndBack has been reached. The call-graph of the definitions in the pool is as follows: sumRange$$ | | sum? range$$ -- | | | sumInto$? ---- | sumInto??-- | | ---- Applying the D pass, the first component {sumInto??} fails the condition at step (4), as does {sumInto$?}. The component {sum?} is skipped at step (1). However, for the component {range$$} the following refined definition is obtained. range $m $n = $(case $((<=) $m $n) of { True -> $(Cons $m $(range $((+) $m $1) $n)) ; False -> $Nil ; }) ; The temporary pool P' contains just this definition. It is unchanged by application of valuable, and it now replaces the previous version of range$$ in the pool P. Reapplying the analysis to the amended pool P, forwardAndBack yields the following refinement of sumRange$$ sumRange $m $n = sum $(range $m $n) ; and a request for a new variant sum$. Applying forwardAndBack once more the sum$$ definition becomes sum $xs = sumInto $0 $xs ; with a request for sumInto$$. Finally forwardAndBack refines sumInto$$ as follows sumInto $a $xs = case $xs of { Nil -> $a ; Cons $y $ys -> sumInto $((+) $a $y) $ys ; } ; and so the third and final primitive application in the program is $-annotated for evaluation by PRS. A fixpoint is reached and the analysis concludes. The three definitions sum?, sumInto$? and sumInto?? are no longer reachable from sumRange$$, so they can be discarded. Conclusions and Further Work ---------------------------- * I described this analysis problem, and a sketch of the possible analysis method, to John Hughes. He pointed out that a type-based analysis (cf. boxed and unboxed integers) would give more accurate results. He may be right. However, the property of being a valuable integer does not transfer from the body of f to an application of f. It might also be harder to extract the variations, and the type-distinctions might be *too* refined, generating too many variations. * It remains to deal with higher-order functions and with recursive lets. One approach to higher-order functions is to specialize them to first-order variants wherever that is straightforward, and to approximate wildly where it isn't. Recursive lets could be tamed by a simplified version of the D pass. * Concerning efficiency, I claim only that the method terminates -- essentially because there are finitely many $-annotations and variant definitions. By maintaining dependency information, one could do better than re-application of passes, or even the complete analysis procedure, to an entire pool of definitions. * No strictness analysis has been used, but strictness information might be a useful way to improve the results obtained. For example, even with no prior value annotation on variables m and n the primitive subtractions in case (<=) m n of { False -> (-) m n ; True -> (-) n m ; } might be marked for PRS evaluation. (If each alternative becomes a function over arguments m and n, both must already be evaluated in view of the strict context of the (<=) comparison.) * Concerning the application to the Reduceron, there are three issues that may need further thought. (1) The analysis places no bound on the number of PRS-evaluated expressions in a function body, and it assumes that every $-annotated primitive application will be evaluated by PRS. (2) There is no distinction between "unboxed" integer values, directly available on the stack or in registers, and "boxed" values that are at the other end of a heap reference. (3) Is there a way to support the in-principle extension to apply PRS on the basis of a strict evaluation context for arguments? APPENDIX: a poor-man's type oracle ---------------------------------- Let the @ annotation of an expression e mean that e is known to be of algebraic type. Here are two passes for introducing @ annotations, to be applied iteratively until a fixpoint is reached. The predicate isAlg e is true exactly if e is @-annotated. The A-pass (for Ana-pass) propagates annotations upwards and outwards, from expressions to parents and from applied occurrences to bindings. The C-pass (for Cata-pass) propagates information downwards and inwards from expressions to children and from bindings to applied occurrences. The A-pass rules assume an auxiliaries liftAlg e v that @-annotates v if any applied occurrence of it in e is @-annotated. liftAlg e v = if anyAlg e v then @v else v anyAlg [n] v = False anyAlg [v'] v = v==v' && isAlg v' anyAlg [p e1 e2] v = anyAlg e1 v || anyAlg e2 v anyAlg [c es] v = any (flip anyAlg v) es anyAlg [f es] v = any (flip anyAlg v) es anyAlg [case e of as] v = anyAlg e v || any (flip anyAlg v . rhs) as anyAlg [let v' = e1 in e2] v = anyAlg e1 v || anyAlg e2 v The auxiliary passAlg is also useful: passAlg e1 e2 = if isAlg e1 then @e2 else e2 A [ds] = A* [ds] A [f vs = e] = f vs' = e' where vs' = (liftAlg e')* vs e' = A [e] A [n] = n A [v] = v A [p e1 e2] = p (A [e1]) (A [e2]) A [c es] = c (A* [es]) A [f es] = f (A* [es]) A [case e of as] = if any isAlg (rhs* as') then @e' else e' where as' = A* [as] e' = case @(A [e]) of as' A [let v = e1 in e2] = passAlg e2' e' where e1' = A [e1] e2' = A [e2] v' = if isAlg e1' then @v else liftAlg e2' v e' = let v' = e1' in e2' A [c vs -> e] = c vs' -> e' where e' = A [e] vs' = (liftAlg e')* vs The C pass uses as auxiliary information the top-level definitions and the variables in scope. It assumes an auxiliary definitionOf to lookup the definition of a named function. Here are the rules. C [ds] = C_ds* [ds] C_ds [f vs = e] = f vs = C_ds_V [e] where V = {v | v in vs, isAlg v} C_ds_V [n] = n C_ds_V [v] = if v in V then @v else v C_ds_V [p e1 e2] = p (C_ds_V [e1]) (C_ds_V [e2]) C_ds_V [c es] = @(c (C_ds_V* [es])) C_ds_V [f es] = if isAlg rhs df then @e' else e' where (f vs = rhs) = definitionOf f ds e' = f ((C_ds_V* . zipWith bindAlg vs) [es]) C_ds_V (e=[case e1 of as]) = case @(C_ds_V [e1]) of as' where as' = C_ds_V_e* as C_ds_V (e=[let v = e1 in e2]) = let v = C_ds_V' [e1] in C_ds_V' (passAlg e e2) where V' = V union {v | isAlg v} C_ds_V_e0 [c vs -> e1] = c vs -> C_ds_V' (passAlg e0 e1) where V' = V union {v | v in vs, isAlg v} The approach here is conservative so far as polymorphism is concerned. A polymorphic function may *sometimes* be applied to an argument of algebraic type, and it may *sometimes* return a data structure. But @-annotation indicates that *every* instance of the expression will be of algebraic type. So the above rules propagate @-annotations from function definitions to applied occurrences, but not the other way round. An alternative approach would be to generate variants of function definitions for distinct @-annotations of arguments, in much the same way that variants are generated for distinct $-annotations. But that would make programs larger, and the usefulness of the variations would be less immediately apparent.