TRACING LIST COMPREHENSIONS

ART Memo 5 (Version 0, Thursday 15 June)
Colin Runciman

INTRODUCTION

This memo is about source-level tracing for programs involving list comprehensions. The transformation to generate redex trails is not applied in nhc until after de-sugaring, so current traces of comprehension-based programs do not include expressions that look like comprehensions. Instead there are applications of `foldr' with strange-looking lambda expressions as arguments: these are what the de-sugaring rules in nhc produce. Forcing the programmer to think of comprehensions as standing for these ugly expressions defeats the object `to think bigger thoughts in one go' [Turner].

There are at least three different ways of looking at this problem, each suggesting a different route to a solution.

1. Re-sugaring: the problem is to invert the effects of de-sugaring when the trace is examined by the programmer. Start from the de-sugared expressions and devise translation rules.

2. Semantics: the problem is to provide an independent semantics for comprehensions, not a compiler translation. Start with operational rules for evaluating comprehensions.

3. User-interface design: the problem is to design the trace to show what programmers expect, based on intuitive understanding of comprehensions. How would a tutor explain where results come from? Start from the browser interface and specify idealised responses.

The views are complementary, not exclusive, and each has something to offer. In this initial version of the memo I shall just say a bit about each of the views in turn. In a subsequent version I hope to take each of them further, and also to reconcile the views and show how they merge into a solution. There are other possible starting points, such as the solutions used by others, or a new representation for comprehensions in the Trace type.

RE-SUGARING

The first thing to realise is that there are various alternative rules for de-sugaring list comprehensions.
The rules given in the Haskell report are:

1. [e | ]            ==> [e]

2. [e | b, Q]        ==> if b then [e | Q] else []

3. [e | p <- xs, Q]  ==> let ok p = [e | Q]
                             ok _ = []
                         in concat (map ok xs)

   or, if p is a simple variable,

                     ==> concat (map (\p -> [e | Q]) xs)

So, for example, the recursive equation in the following definition of a permutation function

    pm [] = [[]]
    pm xs = [x:p | x <- xs, p <- pm (xs \\ [x])]

translates to the desugared equivalent

    pm xs = concat (map (\x -> concat (map (\p -> [x:p]) (pm (xs \\ [x])))) xs)

It is easy to see why the comprehension version might be preferred. Even so, a programmer might have written the concat-map-lambda version, and in that case the tracer should not force sugary comprehensions down the programmer's throat! For this reason, distinguishable clones of conditionals, concat and map should be introduced for use in desugared translations.

SEMANTICS

I am not aware of a standard semantics for comprehensions, other than the `transformational semantics' given by translation rules and an (assumed) denotational semantics for the core language that is the target for translation. The two most widely used Haskell textbooks are the ones by Bird and by Thompson. Both authors discuss comprehensions, and give some rules for calculation, but their rules do not provide a complete semantics, and they are not always sufficiently lazy.

In a Plasma talk last year, I gave the following semantics for a large subset of comprehensions -- restricting patterns in generators to be simple variables. Some notation:

    A => B    means `A computes to the normal form B'
    A =?      means `A diverges' (ie bottom, undefined)
    A{e/v}    means `A with free occurrences of v replaced by e'

First the basics of list construction: cons and nil constructions are already normal forms.

    [] => []        (NIL)
    x:xs => x:xs    (CONS)

I am going to need list concatenation, but don't want any special primitives outside the semantics.
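As a quick check that the sugared and de-sugared versions of pm coincide, here is a small runnable sketch. The paren placement in the de-sugared version follows translation rule 3 (short form, since both patterns are simple variables); note that nhc's actual foldr-based translation would differ in detail.

```haskell
import Data.List ((\\))

-- pm written with a comprehension, as in the memo
pm :: Eq a => [a] -> [[a]]
pm [] = [[]]
pm xs = [x:p | x <- xs, p <- pm (xs \\ [x])]

-- the same equation after applying translation rule 3 twice
pmD :: Eq a => [a] -> [[a]]
pmD [] = [[]]
pmD xs = concat (map (\x -> concat (map (\p -> [x:p]) (pmD (xs \\ [x])))) xs)
```

For example, both versions give pm "abc" == ["abc","acb","bac","bca","cab","cba"].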
So here are the rules for reducing ++; there are three rules as the prefix list may be empty, non-empty or undefined.

    xs => [], ys => ys'
    -------------------  (NIL-CAT)
      xs ++ ys => ys'

    xs => (x':xs')
    --------------------------  (CONS-CAT)
    xs ++ ys => x':(xs' ++ ys)

    xs =?
    -----------  (DIV-CAT)
    xs ++ ys =?

Using the same decomposition into cases as the translation rules, first there is a rule for empty qualifiers:

    ------------
    [e |] => [e]

Now the rules for boolean qualifiers.

    b => False
    ----------------  (FALSE-QUAL)
    [e | b, Q] => []

    b => True, [e | Q] => xs
    ------------------------  (TRUE-QUAL)
    [e | b, Q] => xs

    b =?
    -------------  (DIV-QUAL)
    [e | b, Q] =?

Finally the rules for generators. The rules for empty and divergent generators are easy, but the rule for non-empty generators poses a more interesting problem.

    xs => []
    ----------------------  (NIL-GEN)
    [e | v <- xs, Q] => []

    xs =?
    -------------------  (DIV-GEN)
    [e | v <- xs, Q] =?

It is the CONS-GEN rule that introduces ++, and expresses the substitution that is `the magic of comprehensions'.

    xs => (x':xs'), [e | Q]{x'/v} ++ [e | v <- xs', Q] => ys
    --------------------------------------------------------  (CONS-GEN)
    [e | v <- xs, Q] => ys

USER-INTERFACE DESIGN

Back in 1998, Jan Sparud and I spent half a day filling a large whiteboard with imaginary traces for list comprehensions. We looked at a series of five small examples, considering all possible selections in a trace browser. We asked ourselves what we would like/expect the tracer to display to answer the intuitive question `Where did that come from?'. Here are the first two of those examples. The remarks are those that occur to me now -- I didn't record any at the time.

Example 1

Suppose f x = [x | x > 0]. Trace (a) f 0 and (b) f 1.

    (a)  []
         --      * [..|False]

    (b)  1 : []
         ------  * [..|True]

Selecting the entire result, one asks `Why is the list empty/non-empty?'
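The operational rules above can be animated in Haskell. The sketch below is my own encoding, not part of the memo: the substitution {x'/v} is modelled by extending an environment rather than by textual replacement, values are restricted to Int for simplicity, and Haskell's own lazy (++) supplies the behaviour of NIL-CAT, CONS-CAT and DIV-CAT (the divergence rules hold for free because evaluation is lazy).

```haskell
import Data.Maybe (fromMaybe)

-- Variables bound by generators (environment-based stand-in for
-- the memo's substitution {x'/v}).
type Env = [(String, Int)]

-- A qualifier is a boolean guard or a generator v <- xs.
data Qual = Guard (Env -> Bool) | Gen String (Env -> [Int])

var :: String -> Env -> Int
var v env = fromMaybe (error ("unbound variable " ++ v)) (lookup v env)

-- eval e qs env computes [e | qs] by the rules above.
eval :: (Env -> Int) -> [Qual] -> Env -> [Int]
eval e [] env = [e env]                            -- [e |] => [e]
eval e (Guard b : qs) env
  | b env     = eval e qs env                      -- TRUE-QUAL
  | otherwise = []                                 -- FALSE-QUAL
eval e (Gen v g : qs) env =
  case g env of
    []      -> []                                  -- NIL-GEN
    x : xs' ->                                     -- CONS-GEN:
      eval e qs ((v, x) : env)                     --   [e | Q]{x'/v}
        ++ eval e (Gen v (const xs') : qs) env     --   ++ [e | v <- xs', Q]
```

For example, evaluating [x*x | x <- [1,2,3], odd x] as eval (\env -> let x = var "x" env in x*x) [Gen "x" (const [1,2,3]), Guard (odd . var "x")] [] yields [1,9].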
It is enough to see the False/True value of the qualifier, for which the trail could be followed as usual. The item expression is elided. The parent redex for the comprehension is just the application of f.

Example 2

Suppose f n = [sqr x | x <- [1..n], odd x]. Trace f 3.

    1 : 9 : []
    ----------  (A)
    -           (B)
        ------  (C)

    (A)  | 1 <- .., True  IN  [.. <- 1:<> ..]
                    ----  * odd 1

The question here is: why do we get this non-empty list of results? That is, how did we come to apply the outermost cons in the result? The answer is that in the comprehension, the first qualifier generated a value that satisfied the second and only other qualifier. Further selection would allow scrutiny of computations in each qualifier.

    (B)  sqr 1
             -  * <- [1..3]

But why does the first item in the result list have this particular value? It is the square of an item in the generator list.

    (C)  | 3 <- .., True  IN  [.. 2:3:<> ..]
            / \
           /   \
        | 2 <- .., False
                  -----  * odd 2

And why is the tail of the result as it is? The answer has a positive and a negative part: why is 9 included, and why is 4 not included. The answer is presented in a similar style to a trace for an application of a function defined by guarded equations.
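For reference, Example 2 can be run directly to confirm the shape of the traced result. The definition of sqr is an assumption (squaring); the memo does not give it.

```haskell
sqr :: Int -> Int
sqr x = x * x       -- assumed definition; the memo leaves sqr unspecified

f :: Int -> [Int]
f n = [sqr x | x <- [1..n], odd x]
-- f 3 evaluates to 1 : 9 : [], the list traced above;
-- 4 (= sqr 2) is excluded because odd 2 is False
```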