Data.Derive: A User Manual

by Neil Mitchell & Stefan O'Rear

Data.Derive is a library and a tool for deriving instances for Haskell programs. It is designed to work with custom derivations, SYB and Template Haskell mechanisms. The tool requires GHC, but the generated code is portable to all compilers. We see this tool as a competitor to DrIFT.

This document proceeds as follows:

  1. Obtaining and Installing Data.Derive
  2. Supported Derivations
  3. Using the Derive Program
  4. Using Template Haskell Derivations
  5. Writing a New Derivation

Acknowledgements

Thanks to everyone who has submitted patches and given assistance, including: Twan van Laarhoven, Spencer Janssen, Andrea Vezzosi, Samuel Bronson, Joel Raymont, Benedikt Huber.

Obtaining and Installing Data.Derive

Data.Derive is available using darcs:

darcs get --partial http://www.cs.york.ac.uk/fp/darcs/derive

Install the program using the standard sequence of Cabal magic:

runhaskell Setup configure
runhaskell Setup build
runhaskell Setup install

Supported Derivations

Data.Derive is not limited to any prebuilt set of derivations; see "Writing a New Derivation" for how to add your own. Out of the box, we provide instances for the following libraries.

Prelude

These are the standard classes defined in the Haskell Report, some of which the compiler's built-in deriving mechanism already supports.

Base

These are instances from the base libraries that are not in the Haskell 98 Report.

Query

DrIFT defines a number of useful query functions, which are technically not instances, but can be derived in a similar manner. We support some of these as they appear in DrIFT, some with modifications, and some that are brand new:

Generics

We support the two classes from the first Scrap Your Boilerplate paper, and the classes from the Play library:

Binary

We support the new Binary library, and the BinaryDefer library.

Testing

We support both QuickCheck and the SmallCheck library:

Classhacking

From the HList library:

Missing

These derivations are in DrIFT, but not in Derive. If you need them, let us know and we'll implement them.

Using the Derive Program

Let's imagine we've defined a data type:

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Eq, Show)

Now we wish to derive Binary as well, and to have Eq derived by our library rather than by the compiler. To do this we simply move Eq into a special comment inside the deriving clause and add Binary alongside it.

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show {-! Eq, Binary !-})

Now running derive on the program containing this code will generate appropriate instances. How do you combine these instances back into the code? There are various mechanisms supported.
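To give a feel for what such an instance amounts to, here is a hand-written Eq instance for Color. It is only an illustrative sketch of the equivalent behaviour, not the literal text that derive emits, which may be laid out differently.

```haskell
module Main where

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show)

-- Hand-written for illustration; NOT the literal output of derive.
-- Two values are equal when they use the same constructor and all
-- corresponding fields are equal.
instance Eq Color where
    RGB r1 g1 b1     == RGB r2 g2 b2     = r1 == r2 && g1 == g2 && b1 == b2
    CMYK c1 m1 y1 k1 == CMYK c2 m2 y2 k2 = c1 == c2 && m1 == m2 && y1 == y2 && k1 == k2
    _                == _                = False

main :: IO ()
main = do
    print (RGB 1 2 3 == RGB 1 2 3)     -- same constructor, same fields
    print (RGB 1 2 3 == CMYK 1 2 3 0)  -- different constructors
```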

Appending to the module

One way is to append the text to the bottom of the module, which can be done by passing the --append flag. Derive will then generate the required instances and place them at the bottom of the file, along with a checksum. Do not modify these instances.

Using CPP

Another way is to use CPP. Ensure your compiler is set up for compiling with the C preprocessor. For example:

{-# OPTIONS_GHC -cpp #-}
{-# OPTIONS_DERIVE --output=file.h #-}

module ModuleName where

#include "file.h"

Side-by-side Modules

If you had Colour.Type, and wished to place the Binary instance in Colour.Binary, this can be done with:

{-# OPTIONS_DERIVE --output=Binary.hs --module=Colour.Binary --import #-}

Here you ask for the output to go to a particular file, give a specific module name and import this module. This will only work if the data structure is exported non-abstractly.

Using Template Haskell Derivations

One of Derive's major advantages over DrIFT is support for the Template Haskell (henceforth abbreviated "TH") system. This allows Derive to be invoked automatically during the compilation process, and (because it occurs with full access to the renamer tables) transparently supports deriving across module boundaries. The main disadvantage of TH-based deriving is that it is only portable to compilers that support TH; currently that is GHC only.

To use the TH deriving system, with the same example as before:

import Data.DeriveTH
import Data.Derive.Eq
import Data.Derive.Binary

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show)

$( derive makeEq ''Color )
$( derive makeBinary ''Color )

Note two things. First, we need to import the derivations. By convention, a derivation for a class FooBar is located in module Data.Derive.FooBar (note that this module need not be in the "derive" package) and is exported with the name makeFooBar. Second, we need to tell the compiler to insert the instance using the TH splice construct, $( ... ) (the spaces are optional). The splice causes the compiler to run the function derive (exported from Data.DeriveTH), passing it the derivation (makeEq or makeBinary above) and ''Color. The second argument deserves more explanation: it is a quoted name, somewhat like a quoted symbol in Lisp and with deliberately similar syntax. (Two apostrophes specify that the name is to be resolved as a type constructor; a single apostrophe, as in 'Color, would look for a data constructor named Color.)
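The difference between the two quote forms can be seen in a small standalone sketch that uses only the template-haskell library and no Derive imports. The type Shade below is hypothetical and, unlike Color in the running example, deliberately shares its name with its constructor so that both quotes resolve.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH (Name, nameBase)

-- Hypothetical type whose data constructor has the same name as the type.
data Shade = Shade Int

-- Two apostrophes: resolved as the type constructor Shade.
typeName :: Name
typeName = ''Shade

-- One apostrophe: resolved as the data constructor Shade.
ctorName :: Name
ctorName = 'Shade

main :: IO ()
main = do
    putStrLn (nameBase typeName)   -- unqualified part of the type's name
    putStrLn (nameBase ctorName)   -- unqualified part of the constructor's name
    -- The names live in different namespaces, so they are distinct values.
    print (typeName == ctorName)
```

Both names have the same base string, yet they compare as different, which is why derive must be given the two-apostrophe form to find the type.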

Writing a New Derivation

There are two methods for writing a new derivation: guessing or coding. The guessing method is substantially easier if it will work for you, but is limited to derivations with the following properties:

If, however, your instance does meet these properties, you can use derivation by guess. Many instances do meet these conditions: Eq, Ord, Data, Serial, etc.

Derivation by Guess

This is a unique feature of this library. You simply give an instance, and the program guesses what your instance derivation code should look like, and returns it. You paste the code in, and you have written an instance without learning any of the types or functions required to construct the abstract syntax. For example, let's take the Data instance. I recommend reading through the source in Data.Derive.Data first, then matching it to this description.

First copy the Data file, changing all the obvious bits (makeData etc) to whatever name you want. Next change the example to match your requirements. You basically define an instance for DataName which is defined as:

data DataName a = CtorZero
                | CtorOne  a
                | CtorTwo  a a
                | CtorTwo' a a

Try to make your declaration as inductive as possible. Use x1 etc. for variable names within a constructor match. Place all the constructors in the correct order. If you would be unable to see an obvious pattern, then the guesser won't either. Once we have written our sample instance, we load the module in GHCi with GUESS defined and run the guesser:

> ghci Data.Derive.Data -DGUESS
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
Ok, modules loaded: Data.Derive.Data, Data.DeriveGuess, Language.Haskell.TH.All,
 Language.Haskell.TH.SYB, Language.Haskell.TH.Data, Language.Haskell.TH.FixedPpr
, Language.Haskell.TH.Helper, Language.Haskell.TH.Peephole.
*Data.Derive.Data> guess example

makeData = Derivation data' "Data"
data' dat = [instance_context ["Data","Typeable"] "Data" dat [(FunD (mkName
    "gfoldl") ((map (\(ctorInd,ctor) -> (Clause [(VarP (mkName "k")),(VarP (
    mkName "r")),(ConP (mkName ("" ++ ctorName ctor)) ((map (\field -> (VarP (
    mkName ("x" ++ show field)))) (id [1..ctorArity ctor]))++[]))] (NormalB (
    foldr1With (VarE (mkName "k")) ((map (\field -> (VarE (mkName ("x" ++ show
    field)))) (reverse [1..ctorArity ctor]))++[(AppE (VarE (mkName "r")) (ConE
    (mkName ("" ++ ctorName ctor))))]++[]))) [])) (id (zip [0..] (dataCtors dat
    ))))++[]))]]

And that's it. The block of code spewed out will generate Data instances; we just paste it back into the file.

There is lots of clever stuff, induction hypotheses etc going on behind all this. If you have an instance which you think should be inferable, but isn't, then let me know.
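As an illustration of the inductive style recommended above, a sample instance for Eq (one of the classes listed as suitable for guessing) might be written as in the sketch below. This is plain Haskell written for this manual; how the sample is wrapped up for the guesser is not shown here, so consult the source of an existing derivation for that. The trailing "&& True" is a deliberate choice in this sketch: it makes every constructor case follow one uniform pattern, including the zero-field case.

```haskell
module Main where

data DataName a = CtorZero
                | CtorOne  a
                | CtorTwo  a a
                | CtorTwo' a a

-- Sample instance written so the pattern across arities is obvious:
-- each field contributes one "xN == yN &&", and the chain ends in True.
instance Eq a => Eq (DataName a) where
    CtorZero       == CtorZero       = True
    CtorOne  x1    == CtorOne  y1    = x1 == y1 && True
    CtorTwo  x1 x2 == CtorTwo  y1 y2 = x1 == y1 && x2 == y2 && True
    CtorTwo' x1 x2 == CtorTwo' y1 y2 = x1 == y1 && x2 == y2 && True
    _              == _              = False

main :: IO ()
main = do
    print ((CtorZero :: DataName Int) == CtorZero)
    print (CtorTwo (1 :: Int) 2 == CtorTwo 1 2)
    print (CtorTwo (1 :: Int) 2 == CtorTwo' 1 2)
```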

Derivation by Coding

We use the Template Haskell data types extensively; for examples, take a look at Binary and Functor. It's not particularly hard, but it is harder than just having them guessed.
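For readers who have not built Template Haskell syntax trees by hand before, the following minimal sketch constructs a one-clause function declaration from the same constructors that appear in the guessed output above (FunD, Clause, VarP, VarE, NormalB, mkName). It uses only the template-haskell library; the function name "double" is a hypothetical example and no Derive helper functions are involved.

```haskell
module Main where

import Language.Haskell.TH

-- Abstract syntax for the declaration:  double x = x + x
doubleDec :: Dec
doubleDec =
    FunD (mkName "double")
        [ Clause
            [VarP x]                                   -- one argument pattern: x
            (NormalB (InfixE (Just (VarE x))           -- left operand
                             (VarE (mkName "+"))       -- operator
                             (Just (VarE x))))         -- right operand
            []                                         -- no where-bindings
        ]
  where
    x = mkName "x"

main :: IO ()
main = putStrLn (pprint doubleDec)   -- pretty-print the declaration as source text
```

A coded derivation does the same kind of construction, but generates the clauses from the constructors of the data type being derived rather than writing them out by hand.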