Green card: a foreign-language interface for Haskell

Thomas Nordin, Simon Peyton Jones, Alastair Reid, Malcolm Wallace

**** Note that this document describes GreenCard as of November 1997 - in particular, it supersedes the Haskell Workshop 97 paper. There are significant syntax changes, some simplifications, and some new features to extend the power of DISs. ****

1 Motivation

A foreign-language interface provides a way for software components written in one language to interact with components written in another. Programming languages that lack foreign-language interfaces die a lingering death.

This document describes GreenCard, a foreign-language interface for the non-strict, purely functional language Haskell. We assume some knowledge of Haskell and C.

1.1 Goals and non-goals

Our goals are limited. We do not set out to solve the foreign-language interface in general; rather we intend to profit from others' work in this area. Specifically, we aim to provide the following, in priority order:

1. A convenient way to call C procedures from Haskell.
2. A convenient way to write COM(1) software components in Haskell, and to call COM components from Haskell.

The ability to call C from Haskell is an essential foundation. Through it we can access operating system services and mountains of other software libraries.

In the other direction, should we be able to write a Haskell library that a C program can use? Yes indeed, but this paper does not address the question directly. (Some implementations of GreenCard, e.g. for nhc13, have provided a limited mechanism to allow this.)

Should we support languages other than C? The trite answer is that pretty much everything available as a library is available as a C library. For other languages the right thing to do is to interface to a language-independent software component architecture, rather than to a raft of specific languages. For the moment we choose COM, but CORBA(2) might be another sensible choice. (Note also that there is some current research focussed on using IDL to specify generalised foreign language interfaces for Haskell.)

While we do not here propose a mechanism to call Haskell from C, it does make sense to think of writing COM software components in Haskell that are used by clients. For example, one might write an animated component that sits in a Web page.

This document, however, describes only the first of these goals, the C interface mechanism.

2 Foreign language interfaces are harder than they look

Even after the scope is restricted to designing a foreign-language interface from Haskell to C, the task remains surprisingly tricky. At first, one might think that one could take the C header file describing a C procedure, and generate suitable interface code to make the procedure callable from Haskell.

Alas, there are numerous tiresome details that are simply not expressed by the C procedure prototype in the header file. For example, consider calling a C procedure that opens a file, passing a character string as argument. The C prototype might look like this:

  int open( char *filename )

Our goal is to generate code that implements a Haskell procedure with type

  open :: String -> IO FileDescriptor

First there is the question of data representation. One has to decide either to alter the Haskell language implementation, so that its string representation is identical to that of C, or to translate the string from one representation to another at run time. This translation is conventionally called marshalling. Since Haskell is lazy, the second approach is required. (In general, it is tremendously constraining to try to keep common representations between two languages. For example, precisely how are structures laid out in C?)
Next come questions of allocation and lifetime. Where should we put the translated string? In a static piece of storage? (But how large a block should we allocate? Is it safe to re-use the same block on the next call?) Or in Haskell's heap? (But what if the called procedure does something that triggers garbage collection, and the transformed string is moved? Can the called procedure hold on to the string after it returns?) Or in C's `malloc''d heap? (But how will it get deallocated? And `malloc' is expensive.)
C procedures often accept pointer parameters (such as strings) that can be `NULL'. How is that to be reflected on the host-language side of the interface? For example, if the documentation for `open' told us that it would do something sensible when called with a `NULL' string, we might like the Haskell type for `open' to be
```
  open :: Maybe String -> IO FileDescriptor
```
so that we can model `NULL' by `Nothing'.
The desired return type, `FileDescriptor', will presumably have a Haskell definition such as this:
```
  newtype FileDescriptor = FD Int
```
The file descriptor returned by `open' is just an integer, but Haskell programmers often use `newtype' declarations to create new distinct types isomorphic to existing ones. Now the type system will prevent, say, an attempt to add one to a `FileDescriptor'. Needless to say, the Haskell result type is not going to be described in the C header file.
The file-open procedure might fail; sometimes details of the failure are stored in some global variable, `errno'. Somehow this failure and the details of what went wrong must be reflected into Haskell's `IO' monad.
The `open' procedure causes a side effect, so it is appropriate for its type to be in Haskell's `IO' monad. Some C functions really are functions (that is, they have no side effects), and in this case it makes sense to give them a "pure" Haskell type. For example, the C function `sin' should appear to the Haskell programmer as a function with type
```
  sin :: Float -> Float
```
C procedure specifications are not explicit about which parameters are `in' parameters, which `out' and which `in out'.

None of these details are mentioned in the C header file. Instead, many of them are in the manual page for the procedure, while others (such as parameter lifetimes) may not even be written down at all.
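
Several of these concerns can be made concrete with a small sketch. It does not use GreenCard: it assumes GHC and the standard `Foreign.C' marshalling modules from the `base' library, which postdate this document. The wrapper names `cLength' and `lookupEnvC', and the environment variable name, are hypothetical and chosen only for illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.String (CString, withCString, peekCString)
import Foreign.C.Types (CSize(..))
import Foreign.Marshal.Utils (maybePeek)

-- Raw imports: these types mirror the C prototypes and say nothing
-- about NULL, lifetimes, or failure.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

foreign import ccall unsafe "stdlib.h getenv"
  c_getenv :: CString -> IO CString

-- Data representation and lifetime: the Haskell String is copied into a
-- temporary NUL-terminated buffer that is released when the callback
-- returns. This is safe only because strlen does not keep the pointer.
cLength :: String -> IO Int
cLength s = withCString s $ \p -> fromIntegral <$> c_strlen p

-- NULL handling: a NULL result from C is reflected as Nothing.
lookupEnvC :: String -> IO (Maybe String)
lookupEnvC name = withCString name $ \p -> do
  r <- c_getenv p
  maybePeek peekCString r

main :: IO ()
main = do
  n <- cLength "hello"
  print n
  -- Prints Nothing unless this (hypothetical) variable happens to be set.
  v <- lookupEnvC "GREENCARD_SKETCH_UNSET_VARIABLE"
  print v
```

Every decision encoded in the two wrappers (copying the string, freeing it on return, treating `NULL' as `Nothing') comes from knowledge of the C functions' documented behaviour, not from their prototypes.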

3 Overview of GreenCard

The previous section bodes ill for an automatic system that attempts to take C header files and automatically generate the "right" Haskell functions; C header files simply do not contain enough information.

The rest of this paper describes how we approach the problem. The general idea is to start from the Haskell type definition for the foreign function, rather than the C prototype. The Haskell type contains quite a bit more information; indeed, it is often enough to generate correct interface code. Sometimes, however, it is not, in which case we provide a way for the programmer to express more details of the interface. All of this is embodied in a program called "GreenCard".

GreenCard is a Haskell pre-processor. It takes a Haskell module as input, and scans it for GreenCard directives (which are lines prefixed by `%'). It produces a new Haskell module as output, and (in some implementations) a C module as well (see Figure 1).

Figure 1: The big picture

GreenCard's output depends on the particular Haskell implementation that is going to compile it. For the Glasgow Haskell Compiler (GHC), GreenCard generates Haskell code that uses GHC's primitive `ccall'/`casm' construct to call C. All of the argument marshalling is done in Haskell. For Hugs, GreenCard generates a C module to do most of the argument marshalling, while the generated Haskell code uses Hugs's `prim' construct to access the generated C stubs. For nhc13, GreenCard generates a C module to do part of the argument marshalling, although the majority of it is done in the generated Haskell code.

For example, consider the following Haskell module:

  module M where

  %fun sin :: Float -> Float

  sin2 :: Float -> Float
  sin2 x = sin (sin x)

Everything is standard Haskell except the `%fun' line, which asks GreenCard to generate an interface to a (pure) C function `sin'. After the GHC-targeted version of GreenCard processes the file, it looks like this(3): (Only GHC aficionados will understand this code. The whole point of GreenCard is that Joe Programmer should not have to learn how to write this stuff!)

  module M where
        
  sin :: Float -> Float
  sin f = unsafePerformPrimIO (
            case f of { F# f# ->
            _casm_ "%r = sin(%0)" f#  `thenPrimIO` \ r# ->
            returnPrimIO (F# r#)})

  sin2 :: Float -> Float
  sin2 x = sin (sin x)

The `%fun' line has been expanded to a blob of gruesome boilerplate, while the rest of the module comes through unchanged.
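
For readers more familiar with present-day Haskell, the following sketch shows roughly the same binding written by hand. It is not what GreenCard emits; it assumes GHC with the standard foreign function interface and the C library function `sinf', and the wrapper name `sinC' is hypothetical (chosen to avoid clashing with the Prelude's `sin').

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CFloat(..))

-- Pure import: there is no IO in the type, so we are promising that
-- the C function has no side effects.
foreign import ccall unsafe "math.h sinf"
  c_sinf :: CFloat -> CFloat

-- Marshal Float to CFloat, call C, and unmarshal the result.
sinC :: Float -> Float
sinC = realToFrac . c_sinf . realToFrac

sin2 :: Float -> Float
sin2 x = sinC (sinC x)

main :: IO ()
main = print (sin2 0)
```

The shape is the same as in the generated code above: unwrap the argument, call C, and wrap the result.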

If Hugs is the target, the Haskell source file remains unchanged, but the Hugs variant of GreenCard generates output that uses Hugs's primitive mechanisms for calling C. For the nhc13 target, GreenCard generates something different again. Much of the GreenCard implementation is, however, shared between all variants.

4 GreenCard directives

GreenCard pays attention only to GreenCard directives, each of which starts with a `%' at the beginning of a line. All other lines are passed through to the output Haskell file unchanged.

The syntax of GreenCard directives is given in Figure 2. The syntax for the dis production is given later (Figure 3).

Program	idl	->	decl_1 ... decl_n	n >= 1
Declaration	decl	->	proc
		\|	`%const` var `[`const_1`,` ... `,`const_n`]`	Constants, n >= 1
		\|	`%dis` var var_1 ... var_n `=` dis	n >= 0
		\|	`%prefix` var	Prefix to strip from Haskell function names
		\|	`%C` var	entire line is passed (stripped) to C
		\|	`%-` var	entire line is passed verbatim to C
Procedure	proc	->	sig [call] [ccode] [result]
Signature	sig	->	`%fun` var `::` type	Name and type
Type	type	->	var	simple type
		\|	var type	type application
		\|	type `->` type	function type
		\|	`(`type_1`,` ... `,`type_n`)`	tuple types, n >= 0
		\|	`[`type`]`	list type
Call	call	->	`%call` dis_1 ... dis_n
Result	result	->	`%fail` cexp cexp [result]	In I/O monad
		\|	`%result` dis
Constant	const	->	cv
		\|	var `=` cv
C Expression	cexp	->	`"` var `"`	string excludes " character
C Code	ccode	->	`%code` var

Figure 2: Grammar for GreenCard

A general principle we have followed is to define a single, explicit (and hence long-winded) general mechanism, that should deal with just about anything, and then define convenient abbreviations that save the programmer from writing out the general mechanism in many common cases. We have erred on the conservative side in defining such abbreviations; that is, we have only defined an abbreviation where doing without it seemed unreasonably long-winded, and where there seemed to be a systematic way of defining an abbreviation.

GreenCard understands the following directives:

`%fun' begins a procedure specification, which describes the interface to a single C procedure (Section 5 Procedure specifications).
`%dis' allows the programmer to describe a new Data Interface Scheme (DIS). A DIS describes how to translate, or marshall, data from Haskell to C and back again (Section 6 Data Interface Schemes).
`%const' makes it easy to generate a collection of new Haskell constants derived from C constants. This can be done with `%fun', but `%const' is much more concise (Section 5.6 Constants).
`%prefix' makes it easy to remove standard prefixes from the Haskell function name. Such prefixes are usually not needed, since Haskell allows qualified imports (Section 5.7 Prefixes).
`%C' allows one to write fragments of C code which sit outside any procedure specifications. (We shall see later how to include fragments of C code within procedures.) The text following this directive on the same line is copied to the generated C module, with leading and trailing whitespace trimmed.

Following a GreenCard directive, subsequent leading or trailing whitespace is in general ignored or trimmed. This applies even to the `%C' directive. Because there are occasions when it can be desirable to preserve whitespace in the C code, some implementations of GreenCard (currently only for nhc13) allow a special form `%-' which is exactly like `%C' except that it preserves all whitespace.

All directives (except `%C' and `%-') can span more than one line, but the continuation lines must each start with a `%' followed by some whitespace. Haskell-style comments are permitted in GreenCard directives (except, for obvious reasons, `%C' and `%-'). For example:

  %fun draw :: Int              -- Length in pixels
  %         -> Maybe Int        -- Width in pixels
  %         -> IO ()

In later sections, we shall encounter the specification of short fragments of literal C code (and indeed, literal Haskell code) deep within a GreenCard directive. On such occasions, the literal C code is enclosed within double-quote marks (and the literal Haskell code is also denoted syntactically). However, within these fragments one sometimes wishes to make use of the value of a name bound by a GreenCard DIS macro, rather than the name itself. Hence, a name used within double-quotes can be escaped by prefixing it with the `%' character. When the literal code is generated, these escaped names will be replaced by the value bound to that name in the current environment. See Section 7.2 for examples.

5 Procedure specifications

The most common GreenCard directive is a procedure specification. It describes the interface to a C procedure. A procedure specification has four parts:

Type signature: `%fun': (Section 5.1 Type signature). The `%fun' statement starts a new procedure specification, giving the name and Haskell type of the function.
Parameter marshalling: `%call': (Section 5.2 Parameter marshalling). The `%call' statement tells GreenCard how to translate the Haskell parameters into their C representations.
The body: `%code': (Section 5.3 The body). The `%code' statement gives the body and it can contain arbitrary C code. Sometimes the body consists of a simple procedure call, but it may also include variable declarations, multiple calls, loops, and so on.
Result marshalling: `%result', `%fail': (Section 5.4 Result marshalling). The result-marshalling statements tell GreenCard how to translate the result(s) of the call back into Haskell values.

Any of these parts may be omitted except the type signature. If any part is missing, GreenCard will fill in a suitable statement based on the type signature given in the `%fun' statement. For example, consider this procedure specification:

  %fun sin :: Float -> Float

GreenCard fills in the missing statements like this(4):

  %fun sin :: Float -> Float
  %call (float arg1)
  %code res1 = sin(arg1);
  %result (float res1)

The rules that guide this automatic fill-in are described in Section 5.5 Automatic fill-in.

A procedure specification can define a procedure with no input parameters, or even a constant (a "procedure" with no input parameters and no side effects). In the following example, `printBang' is an example of the former, while `grey' is an example of the latter(5):

  %fun printBang :: IO ()
  %code printf( "!" );

  %fun grey :: Colour
  %code r = GREY;
  %result (colour r)

All the C variables bound in the `%call' statement or mentioned in the `%result' statement are declared by GreenCard and are in scope throughout the body. In the examples above, GreenCard would have declared `arg1', `res1' and `r'.

5.1 Type signature

The `%fun' statement starts a new procedure specification.

GreenCard supports two sorts of C procedures: ones that may cause side effects (including I/O), and ones that are guaranteed to be pure functions. The two are distinguished by their type signatures. Side-effecting functions have the result type `IO t' for some type `t'. If the programmer specifies any result type other than `IO t', GreenCard takes this as a promise that the C function is indeed pure, and will generate code that assumes such.

The procedure specification will expand to the definition of a Haskell function, whose name is that given in the `%fun' statement, with two changes: the longest matching prefix specified with a `%prefix' statement (Section 5.7 Prefixes) is removed from the name, and the first letter of the remaining function name is changed to lower case. Haskell requires all function names to start with a lower-case letter (upper case would indicate a data constructor), but when the C procedure name begins with an upper case letter it is convenient to still be able to make use of GreenCard's automatic fill-in facilities. For example:

  %fun OpenWindow :: Int -> IO Window

would expand to a Haskell function `openWindow' that is implemented by calling the C procedure `OpenWindow'.

  %prefix Win32
  %fun Win32OpenWindow :: Int -> IO Window

would expand to a Haskell function `openWindow' that is implemented by calling the C procedure `Win32OpenWindow'.
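
The naming rule can be sketched as a small Haskell function. This is an illustrative model, not GreenCard's implementation; the names `stripLongest', `lowerFirst' and `haskellName' are hypothetical, and the sketch assumes that prefix matching is case-sensitive.

```haskell
module Main where

import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Remove the longest declared prefix that matches, if any.
stripLongest :: [String] -> String -> String
stripLongest prefixes name =
  case filter (`isPrefixOf` name) prefixes of
    [] -> name
    ms -> drop (maximum (map length ms)) name

-- Lower-case the first letter so the result is a legal function name.
lowerFirst :: String -> String
lowerFirst []       = []
lowerFirst (c : cs) = toLower c : cs

-- The Haskell name derived from a C procedure name.
haskellName :: [String] -> String -> String
haskellName prefixes = lowerFirst . stripLongest prefixes

main :: IO ()
main = mapM_ (putStrLn . haskellName ["Win32", "OpenGL", "gl"])
             ["OpenWindow", "Win32OpenWindow", "OpenGLInit", "glSphere"]
-- prints openWindow, openWindow, init, sphere (one per line)
```

The sketch does not say what should happen when a prefix matches the entire name, since the text above leaves that case unspecified.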

5.2 Parameter marshalling

The `%call' statement tells GreenCard how to translate the Haskell parameters into C values. Its syntax is designed to look rather like Haskell pattern matching, and consists of a sequence of zero or more Data Interface Schemes (DISs), one for each (curried) argument in the type signature. For example:

  %fun foo :: Float -> (Int,Int) -> String -> IO ()
  %call (float x) (int y, int z) (string s)
  ...

This `%call' statement binds the C variables `x', `y', `z', and `s', in much the same way that Haskell's pattern matching binds variables to (parts of) a function's arguments. These bindings are in scope throughout the body and result-marshalling statements.

In the `%call' statement, `float', `int', and `string' are the names of the DISs that are used to translate between Haskell and C. The names of these DISs are deliberately chosen to be the same as the corresponding Haskell types (apart from changing the initial letter to lower case) so that in many cases, including `foo' above, GreenCard can generate the `%call' line by itself (Section 5.5 Automatic fill-in). In fact there is a fourth DIS hiding in this example, the `(_,_)' pairing DIS. DISs are discussed in detail in Section 6 Data Interface Schemes.
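
For comparison, here is the ordinary Haskell pattern match that the `%call' line for `foo' is modelled on. The function body is a placeholder, purely for illustration; only the shape of the argument patterns matters.

```haskell
module Main where

-- One pattern per curried argument; the tuple pattern binds two
-- variables, just as the pairing DIS (int y, int z) does.
foo :: Float -> (Int, Int) -> String -> IO ()
foo x (y, z) s = putStrLn (unwords [show x, show y, show z, s])

main :: IO ()
main = foo 1.5 (2, 3) "label"
-- prints: 1.5 2 3 label
```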

5.3 The body

The body consists of arbitrary C code, beginning with `%code'. The reason for allowing arbitrary C is that C procedures sometimes have complicated interfaces. They may return results through parameters passed by address, deposit error codes in global variables, require `#include''d constants to be passed as parameters, and so on. The body of a GreenCard procedure specification allows the programmer to say exactly how to call the procedure, in its native language.

The C code starts a block, and may thus start with declarations that create local variables. For example:

  %code int x, y;
  %     x = foo( &y, GREY );

Here, `x' and `y' are declared as local variables. The local C variables declared at the start of the block scope over the rest of the body and the result-marshalling statements.

(The C code may also mention values from included C header files, such as `GREY' above, or use global variables or structures declared earlier by GreenCard `%C' (or `%-') directives.)

5.4 Result marshalling

Functions return their results using a `%result' statement. Side-effecting functions -- ones whose result type is `IO t' -- can also use `%fail' to specify the failure value.

5.4.1 Pure functions

The `%result' statement takes a single DIS that describes how to translate one or more C values back into a single Haskell value. For example:

  %fun sin :: Float -> Float
  %call (float x)
  %code ans = sin(x);
  %result (float ans)

As in the case of the `%call' statement, the `float' in the `%result' statement is the name of a DIS, chosen as before to coincide with the name of the type. A single DIS, `float', is used to denote both the translation from Haskell to C and that from C to Haskell, just as a data constructor can be used both to construct a value and to take one apart (in pattern matching).

All the C variables bound in the `%call' statement, the `%result' statement, and all those bound in declarations at the start of the body, scope over all the result-marshalling statements (i.e. `%result' and `%fail').

5.4.2 Arbitrary C results

In a result-marshalling statement an almost arbitrary C expression, enclosed in double quotes, can be used in place of a C variable name. The above example could be written more briefly like this(6):

  %fun sin :: Float -> Float
  %call (float x)
  %result (float "sin(x)")

5.4.3 Side effecting functions

A side effecting function returns a result of type `IO t' for some type `t'. The `IO' monad supports exceptions, so GreenCard allows them to be raised.

The result-marshalling statements for a side-effecting call consist of zero or more `%fail' statements, each of which conditionally raises an exception in the `IO' monad, followed by a single `%result' statement that returns successfully in the `IO' monad. Just as in Section 5.4 Result marshalling, the `%result' statement gives a single DIS that describes how to construct the result Haskell value, following successful completion of a side-effecting operation. For example:

  %fun windowSize :: Window -> IO (Int,Int)
  %call (window w)
  %code struct WindowInfo wi;
  %     GetWindowInfo( w, &wi );
  %result (int "wi.x", int "wi.y")

Here, a pairing DIS is used, with two `int' DISs inside it. The arguments to the `int' DISs are C record selections, enclosed in double quotes; they extract the relevant information from the `WindowInfo' structure that was filled in by the `GetWindowInfo' call(7).

The `%fail' statement has two fields, each of which is either a C variable or a C expression, enclosed in double quotes. The first field is a boolean-valued expression that indicates when the call should fail; the second is a `(char *)'-value that indicates what sort of failure occurred. If the boolean is true (i.e. non-zero) then the call fails with a `UserError' in the `IO' monad containing the specified string.

For example:

  %fun fopen :: String -> IO FileHandle
  %call (string s)
  %code f = fopen( s );
  %fail "f == NULL" "errstring(errno)"
  %result (fileHandle f)

The assumption here is that `fopen' puts its error code in the global variable `errno', and `errstring' converts that error number to a string.

`UserError's can be caught with `catch', but exactly which error occurred must be encoded in the string, and parsed by the error-handling code. This is rather slow, but errors are meant to be exceptional.
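
The behaviour of `%fail' followed by `%result' can be modelled in plain Haskell. The sketch below is an assumption about the shape of the generated code, not a copy of it: `failOrReturn', `fopenSketch' and `attempt' are hypothetical names, the C call is simulated, and the handler uses `catchIOError' from `System.IO.Error' in present-day libraries.

```haskell
module Main where

import System.IO.Error (catchIOError, ioeGetErrorString)

-- Each pair models one %fail line: (failure condition, message).
-- The first condition that holds raises a user error; otherwise the
-- %result value is returned.
failOrReturn :: [(Bool, String)] -> a -> IO a
failOrReturn checks result =
  case [msg | (True, msg) <- checks] of
    (msg : _) -> ioError (userError msg)
    []        -> return result

-- Simulated stand-in for the fopen example: an empty path "fails".
fopenSketch :: String -> IO Int
fopenSketch path = failOrReturn [(null path, "no such file")] 3

-- The caller recovers the failure as a string and must inspect it.
attempt :: IO a -> IO (Either String a)
attempt act =
  (Right <$> act) `catchIOError` (\e -> return (Left (ioeGetErrorString e)))

main :: IO ()
main = do
  attempt (fopenSketch "notes.txt") >>= print  -- Right 3
  attempt (fopenSketch "") >>= print           -- Left "no such file"
```

Because the only information carried by the exception is the message, a handler that wants to distinguish failure causes has to examine the string, as noted above.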

5.5 Automatic fill-in

Any or all of the parameter-marshalling, body, and result-marshalling statements may be omitted. If they are omitted, GreenCard will "fill in" plausible statements instead, guided by the function's type signature. The rules by which GreenCard does this filling in are as follows:

A missing `%call' statement is filled in with a DIS for each curried argument. Each DIS is constructed from the corresponding argument type as follows:
- A tuple argument type generates a tuple DIS, with the same algorithm applied to the components.
- All other types generate a DIS macro application (Section 6.1 Forms of DISs). The DIS macro name is derived from the type of the corresponding argument, except that the first letter of the type is changed to lower case. The DIS macro is applied to as many argument variables as required by the arity of the DIS macro.
- The automatically-generated argument variables are named left-to-right as `arg1', `arg2', `arg3', and so on.
If the body is missing, GreenCard fills in a body of the form:
```
  r = f(a_1,a_2,...a_n);
```
where
- `f' is the function name given in the type signature.
- `a_1,...,a_n' are the argument names extracted from the `%call' statement.
- `r' is the variable name for the variable used in the `%result' statement. (There should only be one such variable if the body is automatically filled in.)
A missing `%result' statement is filled in by a `%result' with a DIS constructed from the result type in the same way as for a `%call'. The result variables are named `res1', `res2', `res3', and so on.
GreenCard never fills in `%fail' statements.
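
The fill-in rules can be sketched as a small Haskell program over a toy model of types and DISs. This is an illustration under stated assumptions, not GreenCard's code: the data types and every function name are hypothetical, DIS macro arities are supplied by the caller, and only pure (non-`IO') result types are handled.

```haskell
module Main where

import Data.Char (toLower)
import Data.List (intercalate)

-- A deliberately tiny model of Haskell types and DISs.
data Type = TCon String | TTuple [Type]
data DIS  = Macro String [String] | Tuple [DIS]

-- Arity of a DIS macro, looked up by name (an assumed helper).
type Arity = String -> Int

lowerFirst :: String -> String
lowerFirst []       = []
lowerFirst (c : cs) = toLower c : cs

-- Build the DIS for one type; n is the next unused variable index.
fillType :: String -> Arity -> Int -> Type -> (DIS, Int)
fillType stem arity n (TCon t) =
  let name = lowerFirst t
      k    = arity name
  in (Macro name [stem ++ show i | i <- [n .. n + k - 1]], n + k)
fillType stem arity n (TTuple ts) =
  let (ds, n') = fillTypes stem arity n ts in (Tuple ds, n')

-- Work left to right over several types, threading the index.
fillTypes :: String -> Arity -> Int -> [Type] -> ([DIS], Int)
fillTypes _ _ n [] = ([], n)
fillTypes stem arity n (t : ts) =
  let (d, n1)  = fillType stem arity n t
      (ds, n2) = fillTypes stem arity n1 ts
  in (d : ds, n2)

varsOf :: DIS -> [String]
varsOf (Macro _ vs) = vs
varsOf (Tuple ds)   = concatMap varsOf ds

render :: DIS -> String
render d = "(" ++ inner d ++ ")"
  where
    inner (Macro m vs) = unwords (m : vs)
    inner (Tuple ds)   = intercalate ", " (map comp ds)
    comp t@(Tuple _)   = render t
    comp m             = inner m

-- Fill in %call, %code and %result for a pure function f. The body is
-- generated only when there is exactly one result variable.
fillIn :: Arity -> String -> [Type] -> Type -> [String]
fillIn arity f argTys resTy =
  [ unwords ("%call" : map render callDs) ]
  ++ [ "%code " ++ r ++ " = " ++ f ++ "(" ++ intercalate ", " args ++ ");"
     | [r] <- [varsOf resD] ]
  ++ [ "%result " ++ render resD ]
  where
    (callDs, _) = fillTypes "arg" arity 1 argTys
    (resD, _)   = fillType "res" arity 1 resTy
    args        = concatMap varsOf callDs

main :: IO ()
main = mapM_ putStrLn (fillIn (const 1) "sin" [TCon "Float"] (TCon "Float"))
```

Run on the `sin' signature with every macro given arity one, the sketch reproduces the three filled-in lines shown in Section 5.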

5.6 Constants

Some C header files define a large number of constants of a particular type. The `%const' statement provides a convenient abbreviation to allow these constants to be imported into Haskell. For example:

  %const PosixError [EACCES, ENOENT]

This statement is equivalent to the following `%fun' statements:

  %fun EACCES :: PosixError
  %fun ENOENT :: PosixError

After the automatic fill-in has taken place we would obtain:

  %fun EACCES :: PosixError
  %result (posixError "EACCES")

  %fun ENOENT :: PosixError
  %result (posixError "ENOENT")

Each constant is made available as a Haskell value of the specified type, converted into Haskell by the DIS macro for that type. (It is up to the programmer to write a `%dis' definition for the macro -- see Section 6.2 DIS macros.)
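
The expansion of the simple form of `%const' is mechanical enough to sketch in a few lines of Haskell. The function `expandConst' is hypothetical and models only the form shown above, in which each constant is named by its C identifier.

```haskell
module Main where

import Data.Char (toLower)

lowerFirst :: String -> String
lowerFirst []       = []
lowerFirst (c : cs) = toLower c : cs

-- One %fun/%result pair per constant; the DIS macro name is the type
-- name with its first letter lower-cased.
expandConst :: String -> [String] -> [String]
expandConst ty = concatMap one
  where
    one c = [ "%fun " ++ c ++ " :: " ++ ty
            , "%result (" ++ lowerFirst ty ++ " \"" ++ c ++ "\")" ]

main :: IO ()
main = mapM_ putStrLn (expandConst "PosixError" ["EACCES", "ENOENT"])
```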

There are variant ways of declaring constants within the `%const' directive. Firstly, the type-name can be replaced by a DIS-name if you wish. Secondly, you may find the Haskell constant names `eACCES' and `eNOENT' somewhat ugly, so you may associate a different Haskell name with each C constant name.

  %const PosixError [
  %   errAccess = "EACCES", 
  %   errNoEnt  = "ENOENT"
  % ]

5.7 Prefixes

In C it is common practice to give all function names in a library the same prefix, to minimize the impact on the common namespace. In Haskell we use qualified imports to achieve the same result. To simplify the conversion of C-style namespace management to Haskell, the `%prefix' statement specifies which prefixes to remove from the Haskell function names.

  module OpenGL where
  
  %prefix OpenGL
  %prefix gl

  %fun OpenGLInit :: Int -> IO Window
  %fun glSphere :: Coord -> Int -> IO Object

This would define the two procedures `init' and `sphere', which would be implemented by calling `OpenGLInit' and `glSphere' respectively.

5.8 Arbitrary C inclusions

It is often useful to be able to write arbitrary lines of C code outside any procedure specification, for instance to include a header file, define the layout of a C structure, or declare a C global variable. The `%C' directive (with its whitespace-preserving variant `%-') is provided expressly for this purpose.

For example, either of

    %C   #include <header.h>

    %-#include <header.h>

tells GreenCard to arrange that a specified C header file will be included by the C code it generates.

As another example, for simple convenience one might wish to add data or type declarations directly to the generated C module, rather than in a separate header file. Thus:

    %-struct _iocb {
    %-   int fd;
    %-   void *buf;
    %-   int pos;
    %-   unsigned flags;
    %-};
    %-typedef struct _iocb *FILE;

6 Data Interface Schemes

A Data Interface Scheme, or DIS, tells GreenCard how to translate from a Haskell data type to a C data type, and vice versa.

DIS	dis	->	var arg_1 ... arg_n	Macro application
		\|	`(`dis_1`,` ... `,`dis_n`)`	Tuple, n >= 0
		\|	Cons arg_1 ... arg_n	Constructor, n >= 0
		\|	Cons `{` field_1 `=` arg_1 `,` ... `,` field_n `=` arg_n `}`	Named fields, n >= 1
		\|	`<`var`/`var`>` arg_1 ... arg_n	User-defined functions, n >= 1
		\|	`%%`Var cv	Base DIS
		\|	`declare` cexp cv `in` dis	Type-cast DIS
Argument	arg	->	dis
		\|	cv
Variable / C Expression	cv	->	cexp
		\|	var	Variable bound in `%dis`

Figure 3: Grammar of DISs

6.1 Forms of DISs

The syntax of DISs is given in Figure 3. It is designed to be similar to the syntax of Haskell patterns. A DIS takes one of the following forms:

The application of a DIS macro to zero or more arguments. Like Haskell functions, a DIS macro starts with a lower-case letter. DIS macros are described in Section 6.2 DIS macros. Some standard DIS macros include `int', `float', `double'; the full set is given in Section 7 Standard DISs. For example:
```
  %fun foo :: This -> Int -> That
  %call (this x y) (int z)
  %code r = c_foo( x, y, z );
  %result (that r)
```
In this example `this' and `that' are DIS macros defined elsewhere.
The application of a Haskell data constructor to zero or more DISs. For example:
```
  newtype Age = Age Int
  %fun foo :: (Age,Age) -> Age
  %call (Age (int x), Age (int y))
  %code r = foo(x,y);
  %result (Age (int r))
```
As the `%call' line of this example illustrates, tuples are understood as data constructors, including their special syntax. Haskell named-field syntax is also supported. For example:
```
  data Point = Point { px,py::Int }

  %fun foo :: Point -> Point
  %call (Point { px = int x, py = int y })
  ...
```
GreenCard does not attempt to perform type inference; it simply assumes that any DIS starting with an upper case letter is a data constructor, and that the number of argument DISs matches the arity of the constructor.
The application of a user function to one or more DISs. This form allows one to use an arbitrary data transformation written in Haskell, usually to simplify a value of some complicated type down to a collection of values of types that C can understand. Noting that DISs can be used bi-directionally, it is necessary to provide the names of two Haskell functions; one for marshalling, the other for unmarshalling. For example:
```
  data T = Zero | Succ T
  from_t :: T -> Int
  to_t   :: Int -> T

  %fun square :: T -> T
  %call (<from_t/to_t> (int x))
  %code r = square( x );
  %result (<from_t/to_t> (int r))
```
Here, the function `from_t' is applied to `square''s argument (converting it to an integer) before it crosses the fence into C. Likewise, the result from C is converted back to type `T' by the function `to_t' after being returned to the Haskell world. (The reason for giving both function names in the `%call' and `%result' lines when only one will be used in either case, is that the DIS may be hidden inside a macro, and of course the same macro can be used in either position.)
The whole `<../..>' construct is treated by analogy to the two uses of constructors: on the one hand for pattern-matching (taking apart a value), and on the other for constructing a value.
The user functions can have any name at all: in fact, the `<../..>' syntax simply encloses two fragments of arbitrary Haskell to be applied to the succeeding arguments. One may specify a partially applied function, or anything else that does not use the `/' and `>' symbols (so, unfortunately, lambda abstractions are not possible). The user-defined DIS may of course also bind more than one parameter (in which case, to preserve symmetry of marshalling and unmarshalling, the functions are always treated as uncurried). For example:
```
  data Polar = P Dist Vector
  %dis polar a b = <polar_to_cart/cart_to_polar> (int a) (int b)

  polar_to_cart :: Polar -> (Int, Int)
  cart_to_polar :: (Int, Int) -> Polar
```
Notice that all the example marshalling functions have pure types (e.g. `from_t' has type `T -> Int' rather than `T -> IO Int'). Sometimes one wants to write a marshalling function that is internally stateful. For example, it might pack a `[Char]' into a `ByteArray', by allocating a `MutableByteArray' and filling it in with the characters one at a time. This can be done using `runST', or even `unsafePerformIO'. (This is a GHC-specific comment; so far as GreenCard is concerned it is simply up to the programmer to supply suitably-typed marshalling functions.)
A C type cast. Occasionally one wishes to declare and use a C variable at a type which differs slightly from the type produced by a standard DIS, although it shares the same machine representation. The `declare "ctype" var in dis' form of declaration can be used to perform the necessary type-conversion in C. Examples:
```
  %fun foo :: Int -> IO ()
  %call (declare "unsigned" x in int x)
  ...

  data T = MkT Int
  %fun baz :: T -> IO ()
  %call (declare "c_t" x in MkT (int x))
  ...
```
The application of a base DIS to exactly one variable. This is the primitive form of a DIS - the way all values actually get passed across the Haskell-C boundary. Base DISs denote a fixed set of primitive types known to both C and Haskell (such as `int' and `Int' respectively), and consist of the Haskell typename prefixed by `%%' (e.g. `%%Int'). Because the exact set of base DISs may vary slightly between compilers, it is recommended that programmers use the standard DIS macros listed in Section 7 in preference to the base DISs. The base form is noted here primarily for completeness.
As an example, here is the fully expanded DIS for floats in GHC, which also deals with unboxing. (Note that other compilers do not treat unboxing in this way, hence the recommendation to use the standard DIS.)
```
  %fun sin :: Float -> Float
  %call (declare "float" x in (F# (%%Float x)))
  %code r = sin(x);
  %result (declare "float" r in (F# (%%Float r)))
```

6.2 DIS macros

It would be unbearably tedious to have to write out complete DISs in every procedure specification, so GreenCard supports DIS macros in much the same way that Haskell provides functions. (The big difference is that DIS macros can be used in "patterns" -- such as `%call' statements -- whereas Haskell functions cannot.)

DIS macros allow the programmer to define abbreviations for commonly-occurring DISs. For example:

  data This = MkThis Int (Float, Float)
  %dis this x y z = MkThis (int x) (float y, float z)

Along with the `data' declaration the programmer can write a `%dis' declaration that defines the DIS macro `this' in the obvious manner. (The type is declared with `data' rather than `newtype' because a `newtype' constructor may have only one field.)

DIS macros are simply expanded out by GreenCard before it generates code. So for example, if we write:

  %fun f :: This -> This
  %call (this p q r)
  ...

GreenCard will expand the call to `this':

  %fun f :: This -> This
  %call (MkThis (int p) (float q, float r))
  ...

(In fact, `int' and `float' are also DIS macros defined in GreenCard's standard prelude, so the `%call' line is further expanded to something like:

  %fun f :: This -> This
  %call (MkThis (declare "int" p in I# (%%Int p))
  %             (declare "float" q in F# (%%Float q),
  %              declare "float" r in F# (%%Float r)))
  ...

The fully expanded calls describe the marshalling code in full detail; you can see why it would be inconvenient to write them out literally on each occasion!)

Notice that DIS macros are automatically bidirectional; that is, they can be used to convert Haskell values to C and vice versa. For example, we can write:

  %fun f :: This -> This
  %call (this p q r)
  %code f( p, q, r, &a, &b, &c);
  %result (this a b c)

The form of DIS macro definitions, given in Figure 3, is very simple. The formal parameters can only be variables (not patterns), and the right hand side is simply another DIS. Only first-order DIS macros are permitted.

Note however that the quoting/escape mechanism for literal code enables one to use the value of a macro variable within a fragment of C code (or Haskell code). This feature is very powerful, as shown in Section 6.2.0 Marshalling complex structures.

6.2.0 Marshalling complex structures

The full power of DIS macros becomes apparent when one attempts to map between a structured Haskell type and a structured C type. For example, let us study a Haskell `ColourPoint' type:

  data ColourPoint = CP Int Int Colour
  data Colour = Red | Green | Blue | ...

for which we happen to want a representation in C as a `struct colourpoint':

  struct colourpoint {
      int x;
      int y;
      enum colour c;
  };

It requires just two small DIS macros to capture the mapping:

  %dis colourPoint cp =
  %    declare "struct colourpoint" cp in
  %    CP (int "%cp.x") (int "%cp.y") (colour "%cp.c")
  %dis colour x =
  %    declare "enum colour" x in
  %    <fromEnum/toEnum> (int x)

Using these, it is then very easy to implement the required interfaces to foreign functions which manipulate coloured points:

  %fun translate :: Int -> Int -> ColourPoint -> IO ColourPoint
  %call (int xrel) (int yrel) (colourPoint p)
  %code p.x += xrel;
  %     p.y += yrel;
  %     render(&p);
  %result (colourPoint "p")

Note that in this example, the return value is actually the same structure as the argument value (destructively updated). It is for this reason that the final `p' is quoted as a C literal - it prevents the `declare' clause of the DIS macro from generating a second (overlapping) declaration of the variable in C. Here is a different example where it is more obvious that the literal-C argument to the `colourPoint' DIS should not generate a variable declaration:

  %fun nullPoint :: ColourPoint
  %result (colourPoint "{0,0,RED}")
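
The `colour' macro above relies on `fromEnum' and `toEnum', so it assumes that the C `enum colour' numbers its members in the same order as the Haskell constructors. Since `toEnum' raises an error on an out-of-range integer, a more defensive unmarshalling function may be preferable. The sketch below shows one possibility; the three-constructor `Colour' type (the original elides the rest) and the names `colourToInt' and `intToColour' are assumptions for illustration.

```haskell
data Colour = Red | Green | Blue
  deriving (Eq, Show, Enum, Bounded)

-- Haskell-to-C direction: the constructor's position, starting from 0.
colourToInt :: Colour -> Int
colourToInt = fromEnum

-- C-to-Haskell direction, checked: Nothing for integers that name no constructor.
intToColour :: Int -> Maybe Colour
intToColour n
  | n >= fromEnum (minBound :: Colour) && n <= fromEnum (maxBound :: Colour) = Just (toEnum n)
  | otherwise = Nothing
```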

6.3 Semantics of DISs

How does GreenCard use these DISs to convert between Haskell values and C values? We give an informal algorithm here, although most programmers should be able to manage without knowing the details.

To convert from Haskell values to C values, guided by a DIS, GreenCard does the following:

First, GreenCard recursively rewrites all DIS macro applications, replacing left hand side by right hand side, with actual variables substituted for formals.
Next, GreenCard works from outside in, as follows:
- For a data-constructor DIS (in either positional or named-field form), GreenCard generates a Haskell pattern-match to take the value apart.
- For a user-defined DIS, GreenCard generates a call to the DIS's `from_t' function.
- For a type-cast DIS, GreenCard pushes the type declaration inwards towards the use of the variable it declares.
- For a base DIS, GreenCard does no translation.
All variables remaining in the final expression must lie inside a base DIS. If this is not the case, then an error has occurred (probably the omission of a macro definition).
Finally, any variable used in the expanded DIS expression (and which has a C type-declaration clause attached) generates the appropriate declaration in C, and the variable is initialised with the value provided by Haskell. (In GHC, the value is unboxed and available directly. In Hugs or nhc13, the value is extracted from the stack.)

Much the same happens in the other direction, except that GreenCard calls the `to_t' function when inside a user-defined DIS, and builds a value with a data constructor, rather than taking it apart. Again, C variables are declared of the appropriate types, although of course a literal C expression in a result does not generate a declaration.
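
The macro-expansion step and the final variable check can be modelled in a few lines of Haskell. This is a toy model written for this document, not GreenCard's implementation: the `DIS' syntax tree, its constructor names, and the functions `subst', `expand' and `strayVars' are all hypothetical, and the model assumes that macros are not recursive.

```haskell
-- A toy syntax tree for DISs; all names here are illustrative only.
data DIS
  = Var String              -- a bare variable, e.g. p
  | Apply String [DIS]      -- macro application, e.g. int p
  | Con String [DIS]        -- data constructor, e.g. MkThis a b
  | Tuple [DIS]             -- tuple, e.g. (a, b)
  | Declare String DIS DIS  -- declare "ctype" var in dis
  | Base String DIS         -- base DIS, e.g. %%Int p
  deriving (Eq, Show)

-- Macro name, formal parameters, right-hand side.
type Macros = [(String, ([String], DIS))]

-- Substitute actual arguments for formal parameters.
subst :: [(String, DIS)] -> DIS -> DIS
subst env d = case d of
  Var v         -> maybe d id (lookup v env)
  Apply f as    -> Apply f (map (subst env) as)
  Con c as      -> Con c (map (subst env) as)
  Tuple as      -> Tuple (map (subst env) as)
  Declare t v b -> Declare t (subst env v) (subst env b)
  Base t v      -> Base t (subst env v)

-- Rewrite macro applications recursively; unknown macros are left in place.
expand :: Macros -> DIS -> DIS
expand ms d = case d of
  Apply f as -> case lookup f ms of
    Just (formals, rhs)
      | length formals == length as -> expand ms (subst (zip formals as) rhs)
    _ -> Apply f (map (expand ms) as)
  Con c as      -> Con c (map (expand ms) as)
  Tuple as      -> Tuple (map (expand ms) as)
  Declare t v b -> Declare t v (expand ms b)
  _             -> d

-- Names left outside any base DIS after expansion: bare variables and the
-- names of unexpanded macros. An empty list means the check passes.
strayVars :: DIS -> [String]
strayVars d = case d of
  Var v         -> [v]
  Apply f as    -> f : concatMap strayVars as
  Con _ as      -> concatMap strayVars as
  Tuple as      -> concatMap strayVars as
  Declare _ _ b -> strayVars b
  Base _ _      -> []

-- The macros from Section 6.2, written in the toy syntax.
macros :: Macros
macros =
  [ ("int",   (["x"], Declare "int"   (Var "x") (Con "I#" [Base "Int"   (Var "x")])))
  , ("float", (["x"], Declare "float" (Var "x") (Con "F#" [Base "Float" (Var "x")])))
  , ("this",  (["x", "y", "z"],
       Con "MkThis" [Apply "int" [Var "x"],
                     Tuple [Apply "float" [Var "y"], Apply "float" [Var "z"]]]))
  ]
```

With these definitions, `expand macros (Apply "this" [Var "p", Var "q", Var "r"])' yields a tree in which every variable sits inside a `Base' node, so `strayVars' returns the empty list; applying an undefined macro leaves names behind for `strayVars' to report.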

7 Standard DISs

Figure 4 gives the DIS macros that GreenCard provides as a "standard prelude".

standard DIS    Haskell type    C type      comments
int i           Int             `int`
char c          Char            `char`
bool b          Bool            `int`       0 for False, 1 for True
float f         Float           `float`
double d        Double          `double`
string s        String          `char*`     Persistence not required in either direction
addr a          Addr            `void*`     An immovable C address
foreign f r     ForeignObj      `void*`     r is the finalisation routine
stable s        a               `int`       int is just an index into the stable pointer table

Figure 4: Standard DISs

7.1 Haskell type extensions

Several of the provided DISs involve types that go beyond standard Haskell:

`Addr' is a type large enough to contain a machine address. The Haskell garbage collector treats it as a non-pointer, however.
`ForeignObj' is a type designed to contain a reference to a foreign resource of some kind: a `malloc''d structure, a file descriptor, an X-windows graphic context, or some such. The size of this reference is assumed to be that of a machine address. When the Haskell garbage collector decides that a value of type `ForeignObj' is unreachable, it calls the object's finalisation routine, which was given as an address in the argument of the DIS. The finalisation routine is passed the object reference as its only argument.
The `stable' DIS maps a value of any type onto a C `int'. The `int' is actually an index into the stable pointer table, which is treated as a source of roots by the garbage collector. Thus the C procedure can effectively get a reference into the Haskell heap. When `stable' is used to map from C to Haskell, the process is reversed.
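
For readers more familiar with present-day GHC, the stable-pointer idea can be tried out with the `Foreign.StablePtr' module from the `base' library. The sketch below is only an analogy: it is not GreenCard's `stable' DIS, it hands out an opaque `StablePtr' rather than an `int' index, and the function name `roundTrip' is hypothetical.

```haskell
import Foreign.StablePtr (newStablePtr, deRefStablePtr, freeStablePtr)

-- Register a Haskell value as a garbage-collection root, recover it
-- through the stable pointer, then release the table entry.
roundTrip :: a -> IO a
roundTrip x = do
  sp <- newStablePtr x      -- the value is now kept alive by the runtime
  y  <- deRefStablePtr sp   -- what foreign code would hand back later
  freeStablePtr sp          -- without this the value is never collected
  return y
```

The explicit `freeStablePtr' matters: as long as an entry stays in the table, the collector treats the value as reachable.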

7.2 Maybe

Almost all DISs work on single-constructor data types. It is much less obvious how to translate values of multi-constructor data types to and from C. In fact, the right way to do it is through user-defined DISs. We illustrate how with a DIS for the `Maybe' type.

The definition of a `maybeInt' DIS is:

  %dis maybeInt default x = <fromMaybe %default/toMaybe %default> (int x)
  fromMaybe def (Nothing) = def
  fromMaybe def (Just x)  = x
  toMaybe def x
    | def == x  = Nothing
    | otherwise = Just x

where `default' is a Haskell expression which represents the `Nothing' value. Note how we use the `%' character to unquote the bound variable `default' within a context where it would otherwise be treated as literal Haskell.

In the following example, the function `foo' takes an argument of type `Maybe Int'. If the argument value is `Nothing' it will bind `x' to `0'; if it is `Just a' it will bind `x' to the value of `a'. The return value will be `Just r' unless `r == -1' in which case it will be `Nothing'.

  %fun foo :: Maybe Int -> Maybe Int
  %call (maybeInt 0 x)
  %code r = foo(x);
  %result (maybeInt -1 r)
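
The behaviour of the two helper functions can be checked in plain Haskell. The sketch below repeats their definitions from above with type signatures added (the signatures are an assumption; the original gives none) and lists a few evaluations in comments.

```haskell
-- Haskell-to-C direction: Nothing is sent as the sentinel value.
fromMaybe :: a -> Maybe a -> a
fromMaybe def Nothing  = def
fromMaybe _   (Just x) = x

-- C-to-Haskell direction: the sentinel value is read back as Nothing.
toMaybe :: Eq a => a -> a -> Maybe a
toMaybe def x
  | def == x  = Nothing
  | otherwise = Just x

-- fromMaybe 0 Nothing               evaluates to 0
-- fromMaybe 0 (Just 7)              evaluates to 7
-- toMaybe (-1) (-1)                 evaluates to Nothing
-- toMaybe (-1) 42                   evaluates to Just 42
-- toMaybe 0 (fromMaybe 0 (Just 0))  evaluates to Nothing
```

The last line shows the cost of the encoding: a `Just' value equal to the sentinel cannot be distinguished from `Nothing', so the sentinel should be a value that the C procedure never treats as ordinary data.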

8 Imports

GreenCard "connects" with code in other modules in two ways:

GreenCard reads the source code of any modules imported (recursively) by the module being processed. It extracts `%dis' function definitions (only) from these modules. This provides an easy mechanism for GreenCard to import DIS macros defined elsewhere. (Note however that GreenCard does not provide any namespace management, so it is up to the programmer to ensure that DIS macros from different modules do not share the same name. Note also that if a DIS macro uses a data constructor, that constructor must be exported/imported correctly.)
It is often important to arrange that a C header file is `#include'd when the C code fragments in GreenCard directives are compiled. The `%C' directive makes this possible.

9 Invoking GreenCard

Most Haskell compilers invoke GreenCard automatically when they are given a source file with the extension `.gc'. However, the general syntax for invoking GreenCard as a stand-alone program is:

    greencard [options] [filename]

GreenCard reads from standard input if no filename is given. The options can be any of these:

-tTARGET --target TARGET: Generate code for a particular Haskell compiler. Possible values of TARGET are currently `ghc', `Hugs', and `nhc'.
--version: Print the version number, then exit successfully.
-h --help: Print a usage message listing all available options, then exit successfully.
-v --verbose: Print more information while processing the input.
-d --debug: Print even more information while processing the input.
-iDIRS -PDIRS --include-dir DIRS: Search the directories named in the colon (`:') separated list for imported files. The directories will be searched in a left to right order, after the current directory.
-g --fgc-safe: Generates code that can use callbacks to Haskell. This makes the generated code slower. (Only meaningful for GHC.)

10 Related Work

A Portable C Interface for Standard ML of New Jersey, by Lorenz Huelsbergen, describes the implementation of a general interface to C for SML/NJ.
Simplified Wrapper and Interface Generator (SWIG) generates interfaces from (extended) ANSI C/C++ function and variable declarations. It can generate output for Tcl/Tk, Python, Perl5, Perl4 and Guile-iii. SWIG lives at http://www.cs.utah.edu/~beazley/SWIG/
Foreign Function Interface GENerator (FFIGEN) is a tool that parses C header files and presents an intermediate data representation suitable for writing backends. FFIGEN lives at http://www.cs.uoregon.edu/~lth/ffigen/
Header2Scheme is a program which reads C++ header files and compiles them into C++ code. This code implements the back end for a Scheme interface to the classes defined by these header files. Header2Scheme can be found at http://www-white.media.mit.edu/~kbrussel/Header2Scheme/

11 Alternative design choices and avenues for improvement

Here we summarise aspects of GreenCard that are less than ideal, and indicate possible improvements.

Automatic DIS generation: Pretty much every `newtype' or single-constructor declaration that is involved in a foreign language call needs a corresponding `%dis' definition. Maybe this `%dis' definition should be automated. On the other hand, there are many fewer data types than procedures, so perhaps it isn't too big a burden to define a `%dis' for each.
Error handling: The error handling provided by `%fail' is fairly rudimentary. It isn't obvious how to improve it in a systematic manner.

Footnotes

(1)

Microsoft's Component Object Model (COM) is a language-independent software component architecture. It allows objects written in one language to create objects written in another, and to call their methods. The two objects may be in the same address space, in different address spaces on the same machine, or on separate machines connected by a network. OLE is a set of conventions for building components on top of COM.
