TRACING LIST COMPREHENSIONS

ART Memo 5 (Version 0, Thursday 15 June)
Colin Runciman

INTRODUCTION

This memo is about source-level tracing for programs involving list comprehensions. The transformation to generate redex trails is not applied in nhc until after de-sugaring, so current traces of comprehension-based programs do not include expressions that look like comprehensions. Instead there are applications of `foldr' with strange-looking lambda expressions as arguments: these are what the de-sugaring rules in nhc produce. Forcing the programmer to think of comprehensions as standing for these ugly expressions defeats the object `to think bigger thoughts in one go' [Turner].

There are at least three different ways of looking at this problem, each suggesting a different route to a solution.

1. Re-sugaring: the problem is to invert the effects of de-sugaring when the trace is examined by the programmer. Start from the de-sugared expressions and devise translation rules.

2. Semantics: the problem is to provide an independent semantics for comprehensions, not a compiler translation. Start with operational rules for evaluating comprehensions.

3. User-interface design: the problem is to design the trace to show what programmers expect, based on intuitive understanding of comprehensions. How would a tutor explain where results come from? Start from the browser interface and specify idealised responses.

The views are complementary, not exclusive, and each has something to offer. In this initial version of the memo I shall just say a bit about each of the views in turn. In a subsequent version I hope to take each of them further, and also to reconcile the views and show how they merge into a solution. There are other possible starting points, such as the solutions used by others, or a new representation for comprehensions in the Trace type.

RE-SUGARING

The first thing to realise is that there are various alternative rules for de-sugaring list comprehensions.
The rules given in the Haskell report are:

1. [e | ]            ==> [e]

2. [e | b, Q]        ==> if b then [e | Q] else []

3. [e | p <- xs, Q]  ==> let ok p = [e | Q]
                             ok _ = []
                         in concat (map ok xs)

   or, if p is a simple variable,

                     ==> concat (map (\p -> [e | Q]) xs)

So, for example, the recursive equation in the following definition of a permutation function

    pm [] = [[]]
    pm xs = [x:p | x <- xs, p <- pm (xs \\ [x])]

translates to the desugared equivalent

    pm xs = concat (map (\x -> concat (map (\p -> [x:p]) (pm (xs \\ [x])))) xs)

It is easy to see why the comprehension version might be preferred. Even so, a programmer might have written the concat-map-lambda version, and in that case the tracer should not force sugary comprehensions down the programmer's throat! For this reason, distinguishable clones of conditionals, concat and map should be introduced for use in desugared translations.

SEMANTICS

I am not aware of a standard semantics for comprehensions, other than the `transformational semantics' given by translation rules and an (assumed) denotational semantics for the core language that is the target for translation. The two most widely used Haskell textbooks are the ones by Bird and by Thompson. Both authors discuss comprehensions, and give some rules for calculation, but their rules do not provide a complete semantics, and they are not always sufficiently lazy.

In a Plasma talk last year, I gave the following semantics for a large subset of comprehensions -- restricting patterns in generators to be simple variables. Some notation:

    A => B    means `A computes to the normal form B'
    A =?      means `A diverges' (ie bottom, undefined)
    A{e/v}    means `A with free occurrences of v replaced by e'

First the basics of list construction: cons and nil constructions are already normal forms.

    [] => []        (NIL)
    x:xs => x:xs    (CONS)

I am going to need list concatenation, but don't want any special primitives outside the semantics.
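As a quick check that the sugared and de-sugared versions of pm coincide, here is a small runnable sketch. The paren placement in the de-sugared version follows translation rule 3 (short form, since both patterns are simple variables); note that nhc's actual foldr-based translation would differ in detail.

```haskell
import Data.List ((\\))

-- pm written with a comprehension, as in the memo
pm :: Eq a => [a] -> [[a]]
pm [] = [[]]
pm xs = [x:p | x <- xs, p <- pm (xs \\ [x])]

-- the same equation after applying translation rule 3 twice
pmD :: Eq a => [a] -> [[a]]
pmD [] = [[]]
pmD xs = concat (map (\x -> concat (map (\p -> [x:p]) (pmD (xs \\ [x])))) xs)
```

For example, both versions give pm "abc" == ["abc","acb","bac","bca","cab","cba"].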
So here are the rules for reducing ++; there are three rules as the prefix list may be empty, non-empty or undefined.

    xs => [], ys => ys'
    -------------------  (NIL-CAT)
      xs ++ ys => ys'

    xs => (x':xs')
    --------------------------  (CONS-CAT)
    xs ++ ys => x':(xs' ++ ys)

    xs =?
    -----------  (DIV-CAT)
    xs ++ ys =?

Using the same decomposition into cases as the translation rules, first there is a rule for empty qualifiers:

    ------------
    [e |] => [e]

Now the rules for boolean qualifiers.

    b => False
    ----------------  (FALSE-QUAL)
    [e | b, Q] => []

    b => True, [e | Q] => xs
    ------------------------  (TRUE-QUAL)
    [e | b, Q] => xs

    b =?
    -------------  (DIV-QUAL)
    [e | b, Q] =?

Finally the rules for generators. The rules for empty and divergent generators are easy, but the rule for non-empty generators poses a more interesting problem.

    xs => []
    ----------------------  (NIL-GEN)
    [e | v <- xs, Q] => []

    xs =?
    -------------------  (DIV-GEN)
    [e | v <- xs, Q] =?

It is the CONS-GEN rule that introduces ++, and expresses the substitution that is `the magic of comprehensions'.

    xs => (x':xs'), [e | Q]{x'/v} ++ [e | v <- xs', Q] => ys
    --------------------------------------------------------  (CONS-GEN)
    [e | v <- xs, Q] => ys

USER-INTERFACE DESIGN

Back in 1998, Jan Sparud and I spent half a day filling a large whiteboard with imaginary traces for list comprehensions. We looked at a series of five small examples, considering all possible selections in a trace browser. We asked ourselves what we would like/expect the tracer to display to answer the intuitive question `Where did that come from?'. Here are the first two of those examples. The remarks are those that occur to me now -- I didn't record any at the time.

Example 1

Suppose f x = [x | x > 0]. Trace (a) f 0 and (b) f 1.

    (a)  []
         --      * [..|False]

    (b)  1 : []
         ------  * [..|True]

Selecting the entire result, one asks `Why is the list empty/non-empty?'
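The operational rules above can be animated in Haskell. The sketch below is my own encoding, not part of the memo: the substitution {x'/v} is modelled by extending an environment rather than by textual replacement, values are restricted to Int for simplicity, and Haskell's own lazy (++) supplies the behaviour of NIL-CAT, CONS-CAT and DIV-CAT (the divergence rules hold for free because evaluation is lazy).

```haskell
import Data.Maybe (fromMaybe)

-- Variables bound by generators (environment-based stand-in for
-- the memo's substitution {x'/v}).
type Env = [(String, Int)]

-- A qualifier is a boolean guard or a generator v <- xs.
data Qual = Guard (Env -> Bool) | Gen String (Env -> [Int])

var :: String -> Env -> Int
var v env = fromMaybe (error ("unbound variable " ++ v)) (lookup v env)

-- eval e qs env computes [e | qs] by the rules above.
eval :: (Env -> Int) -> [Qual] -> Env -> [Int]
eval e [] env = [e env]                            -- [e |] => [e]
eval e (Guard b : qs) env
  | b env     = eval e qs env                      -- TRUE-QUAL
  | otherwise = []                                 -- FALSE-QUAL
eval e (Gen v g : qs) env =
  case g env of
    []      -> []                                  -- NIL-GEN
    x : xs' ->                                     -- CONS-GEN:
      eval e qs ((v, x) : env)                     --   [e | Q]{x'/v}
        ++ eval e (Gen v (const xs') : qs) env     --   ++ [e | v <- xs', Q]
```

For example, evaluating [x*x | x <- [1,2,3], odd x] as eval (\env -> let x = var "x" env in x*x) [Gen "x" (const [1,2,3]), Guard (odd . var "x")] [] yields [1,9].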
It is enough to see the False/True value of the qualifier, for which the trail could be followed as usual. The item expression is elided. The parent redex for the comprehension is just the application of f.

Example 2

Suppose f n = [sqr x | x <- [1..n], odd x]. Trace f 3.

    1 : 9 : []
    ----------  (A)
    -           (B)
        ------  (C)

    (A)  | 1 <- .., True  IN  [.. <- 1:<> ..]
                    ----  * odd 1

The question here is: why do we get this non-empty list of results? That is, how did we come to apply the outermost cons in the result? The answer is that in the comprehension, the first qualifier generated a value that satisfied the second and only other qualifier. Further selection would allow scrutiny of computations in each qualifier.

    (B)  sqr 1
             -  * <- [1..3]

But why does the first item in the result list have this particular value? It is the square of an item in the generator list.

    (C)  | 3 <- .., True  IN  [.. 2:3:<> ..]
            / \
           /   \
        | 2 <- .., False
                  -----  * odd 2

And why is the tail of the result as it is? The answer has a positive and a negative part: why is 9 included, and why is 4 not included. The answer is presented in a similar style to a trace for an application of a function defined by guarded equations.
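For reference, Example 2 can be run directly to confirm the shape of the traced result. The definition of sqr is an assumption (squaring); the memo does not give it.

```haskell
sqr :: Int -> Int
sqr x = x * x       -- assumed definition; the memo leaves sqr unspecified

f :: Int -> [Int]
f n = [sqr x | x <- [1..n], odd x]
-- f 3 evaluates to 1 : 9 : [], the list traced above;
-- 4 (= sqr 2) is excluded because odd 2 is False
```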