Haskell module namespaces proposal


This is an announcement of a new mailing list, and a proposal for three things, set out below as Proposals 1, 2 and 3.

A formatted version of this proposal appears on the web at http://www.cs.york.ac.uk/fp/libraries/

The new mailing list is for the discussion of these proposals. Please subscribe if you are interested. Follow-ups set accordingly.

Mailing list details

libraries@haskell.org

The purpose of this new list is to:

(a) discuss an extension to Haskell to provide a richer module namespace,
(b) discuss how to partition this namespace and populate it with libraries,
(c) discuss how to provide a consistent set of libraries for all compilers, and the setting up of a common library repository.

To subscribe: http://haskell.org/mailman/listinfo/libraries/

Introduction

Everyone agrees that Haskell needs good, useful, libraries: lots of them, well-specified, well-implemented, well-documented. A problem is that the current "Standard Libraries" defined by the Haskell'98 Report number only about a dozen. But there are actually many more libraries out there: some are in GHC's hslibs collection, others are linked from haskell.org, even more are used only by their original author and have no public distribution.

What is more, there is no longer a Haskell Committee. There is no-one to decide which candidate libraries are worthy to be added to the "Standard" set. This stifles the possible distribution of great libraries, because no-one knows how to get a library "accepted".

Furthermore, the existing libraries that people distribute from their own websites often run into problems when used alongside other people's libraries. A library usually consists of several modules, but often the constituent modules have simple names that can easily clash with modules from another library package. This leads people to ad hoc solutions such as prefixing all their modules with a cryptic identifier e.g.

        HsParse
        XmlParse
        HOGLParse
        THIHParse

Just counting the libraries currently available from GHC's hslibs, and haskell.org's links, there are currently over 200 separate modules in semi-"standard" use. As more libraries are written, the possibility of clashes can only increase.

Related to this problem, although not identical, is the difficulty of finding a library that provides exactly the functionality you need to help you write a specific application program. How do you go about searching through 200+ modules for interesting-looking datatypes and signatures, starting only from the module names?

My View

My view is that many of these problems are rooted in Haskell's restriction to a flat module namespace. If we can address that issue adequately, then I believe that many of the difficulties surrounding the provision of good libraries for Haskell will simply fall away.

Proposal 1

Introduce nested namespaces for modules. The key concept here is to map the module namespace into a hierarchical directory-like structure. I propose using the dot as a separator, analogous to Java's usage for namespaces.

So for instance, the four example module names above using cryptic prefixes could perhaps be more clearly named

    Haskell.Language.Parse
    Text.Xml.Parse
    Graphics.Drawing.HOpenGL.ConfigFile.Parse
    TypeSystem.Parse

Naming proceeds from the most general category on the left, through more specific subdivisions towards the right.

For most compilers and interpreters, this extended module namespace maps directly to a directory/file structure in which the modules are stored. Storing unrelated modules in separate directories (and related modules in the same directory) is a useful and common practice when engineering large systems.

(But note that, just as Haskell'98 does not insist that modules live in files of the same name, this proposal does not insist on it either. However, we expect most tools to use the close correspondence to their advantage.)
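
As a rough illustration of the correspondence between module names and files, here is a minimal sketch in Haskell. The helper names splitModuleName and moduleToPath are hypothetical, and the choice of '/' as directory separator and ".hs" as file extension is an assumption; a real tool may well do this differently.

```haskell
import Data.List (intercalate)

-- Sketch only: 'splitModuleName' and 'moduleToPath' are hypothetical
-- helpers, not part of any compiler or of hmake.

-- Split a dotted module name into its components,
-- e.g. "Text.Xml.Parse" becomes ["Text", "Xml", "Parse"].
splitModuleName :: String -> [String]
splitModuleName name = case break (== '.') name of
  (component, [])       -> [component]
  (component, _ : rest) -> component : splitModuleName rest

-- Build a relative source path from a module name, assuming '/' as
-- the directory separator and ".hs" as the source extension.
moduleToPath :: String -> FilePath
moduleToPath name = intercalate "/" (splitModuleName name) ++ ".hs"
```

Under these assumptions, moduleToPath "Haskell.Language.Parse" gives "Haskell/Language/Parse.hs".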

There are several issues arising from the particular proposal here.

Further down this document, I give more motivation and a rationale for this proposal of nested namespaces. But first, two other proposals which rest on the first one.

Proposal 2

Adopt a standardised namespace layout to help those looking for or writing libraries, and a "Std." namespace prefix for genuinely standard libraries. (These are two different things.)

The hslibs collection of modules is a great starting place for finding common libraries that could become standards. I propose that we adopt a "standardised" namespace hierarchy, based on the current hslibs layout, into which Haskell programmers can plug their own libraries relatively easily (whether they intend to release them or not). The aim is to make it clear where to place a new module, and where to search for a possible existing module.

For instance, in ASCII art, here is a small part of a suggested tree.

    + Data + Structures + Trees + AVL
    |      |            |       + RedBlack
    |      |            |
    |      |            + Queue + Bankers
    |      |                    + FIFO
    |      + Encoding + Binary
    |                 + MD5
    |
    + Graphics + UI + Gtk + Widget
    |          |    |     + Pane
    |          |    |     + Text
    |          |    | 
    |          |    + FranTk
    |          |
    |          + Drawing + HOpenGL + ....
    |          |         + Vector
    |          |
    |          + Format + Jpeg
    |                   + PPM
    + Haskell + ....
    |

A fuller proposed layout appears on the web at http://www.cs.york.ac.uk/fp/libraries/layout.html.
Simon Marlow suggests an alternative layout at http://www.cs.york.ac.uk/fp/libraries/layoutSM.html
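
To show how such a tree relates to the dotted module names of Proposal 1, here is a small Haskell sketch. The type Node and the names moduleNames and sample are hypothetical, and the sketch assumes for simplicity that only the leaves of the tree are modules (the proposal itself also allows an interior node to be a module).

```haskell
import Data.List (intercalate)

-- Sketch only: 'Node', 'moduleNames' and 'sample' are hypothetical names.

-- A namespace tree: each node has a name and zero or more children.
data Node = Node String [Node]

-- List the dotted name of every leaf, joining the names on the path
-- from the root with '.'. Interior nodes are treated as categories only.
moduleNames :: Node -> [String]
moduleNames = go []
  where
    go prefix (Node name [])       = [dotted (prefix ++ [name])]
    go prefix (Node name children) = concatMap (go (prefix ++ [name])) children
    dotted = intercalate "."

-- A fragment of the suggested tree.
sample :: [Node]
sample =
  [ Node "Data"
      [ Node "Structures"
          [ Node "Trees" [Node "AVL" [], Node "RedBlack" []]
          , Node "Queue" [Node "Bankers" [], Node "FIFO" []]
          ]
      , Node "Encoding" [Node "Binary" [], Node "MD5" []]
      ]
  ]
```

With this fragment, concatMap moduleNames sample begins with "Data.Structures.Trees.AVL" and ends with "Data.Encoding.MD5".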

In addition to a standardised hierarchy layout, I propose a truly Standard-with-a-capital-S namespace. A separate discussion is needed on what exactly would constitute "Standard" quality, but by analogy with Java where everything beginning "java." is sanctioned by Sun, I propose that every module name beginning "Std." is in some sense sanctioned by the whole Haskell community.

So for instance, an experimental, or not-quite-complete, library could be called

    Text.Xml

but only a guaranteed-to-be-stable, complete, library could be called

    Std.Text.Xml

The implication of the Std. namespace is that all such "standard" libraries will be distributed with all Haskell systems. In other words, you can rely on a standard library always being there, and always having the same interface on all systems.

Proposal 3

Develop a process by which candidate libraries can be proposed to enter the Std. namespace.

Since Haskell'98 is fixed, and there is no longer a Haskell Committee, there is no official body capable of deciding new standards for libraries. However, we do have a Haskell community which will use or not use libraries, depending on their quality. So libraries will become standards by a de facto process, rather than de jure.

Apart from the Haskell compiler implementers, we wanted a means to encourage the whole community to be involved in recognising de facto "standard" libraries. The mailing list 'libraries@haskell.org' is one contribution. We hope this will work on the same model as the FFI mailing list, which has been pretty successful at allowing a community of designers and implementers to explore their FFI needs and solidify a design that is common across at least three Haskell systems.

On top of this discussion however, some final decisions will have to be made on which libraries achieve entry to the "Std." namespace. The Haskell implementers have collectively proposed a ruling troika, one representing each of the three main Haskell systems (Hugs, ghc, nhc98). These are Simon Marlow, representing ghc, and current keeper of the hslibs collection; Malcolm Wallace, representing nhc98; and Andy Gill, representing Hugs users.

Some obvious criteria for entry to the "Std." namespace would be:

These suggested criteria need some discussion and improvement.

After the initial period of deciding what belongs in the "Std." namespace, I would expect any further candidate libraries that are proposed for standardisation to spend some time in another part of the namespace hierarchy whilst they gain stability and common acceptance, before being moved to "Std.".

Rationale and Motivation for Proposal 1 (nested namespaces)

Scenario 1

Imagine you have just written a new library of, say, pretty-printing combinators. You want to release it to the Haskell public. So what do you call it?

    module Pretty       -- already taken (several times)
    module UU_Pretty    -- also taken
    module PrettyLib    -- already exists as well

Ok, so lacking any further inspiration, you end up deciding to call it

    module MyPretty     -- !

Surely there must be a better solution. Of course there is - namespaces. Let's classify libraries that do similar jobs together:

    module Text.PrettyPrinter.Hughes      -- the original Hughes design
    module Text.PrettyPrinter.HughesPJ    -- later modified by Simon PJ
    module Text.PrettyPrinter.UU          -- the Utrecht design
    module Text.PrettyPrinter.Chitil      -- Olaf's new design

These are exactly the same Pretty libs as before, but named more sensibly. It is still clear that each is a pretty-printing library, but it is also clear that they are different.

Incidentally, have you ever tried to write your own module called Pretty? You may have discovered with GHC (which has a Pretty already in the hslibs collection), that you get strange errors. This is because sometimes the compiler can be confused into reading one Pretty.hi interface file (i.e. yours), yet linking the other Pretty.o object file (i.e. from hslibs), ending in a core dump. With proper module namespaces, this confusion should never happen again.

Scenario 2

You are writing a complex library that has a couple of layers of abstraction. For some users, you want to expose just a small high-level set of types and functions. Other users will need more detailed access to lower-level stuff.

With namespaces, you can use the directory-like structure to make these kinds of access explicit. For instance, imagine a socket library:

    module Network.Socket

It exports an abstract type Socket for ordinary users - they only need to know its name. More advanced hackers however can play with the details of the type, because you also have:

    module Network.Socket.Types

which exports the Socket type non-abstractly i.e. Socket(..). And of course this abstraction is easy for the library-writer to manage, because the implementation of the more abstract layer simply imports and re-exports a careful selection of the more detailed layers.

Don't forget that, in terms of the actual filesystem layout, it is perfectly OK to have e.g.

    file  Network/Socket.hs
    dir   Network/Socket
    file  Network/Socket/Types.hs

Scenario 3

You are managing a software engineering project. Several people are working more-or-less independently on different sections of the program. To avoid mistakes with files, you give each one a separate directory to place their code in. But in Haskell'98 this is not enough to ensure that they invent module names that do not clash with other developers' modules. So you insist that everyone also uses a prefix-naming scheme for each appropriate sub-task.

For instance, here is a sketch of the layout of the Galois Connection team's entry in the ICFP 2000 programming contest:

    dir  CSG			-- constructive solid geometry
    file CSG/CSG.hs
    file CSG/CSGConstruct.hs
    file CSG/CSGGeometry.hs
    file CSG/CSGInterval.hs
    dir  Fran			-- Fran-style animation
    file Fran/FranLite.hs
    file Fran/FranCSG.hs
    dir  GML			-- interpreter for little language
    file GML/GMLData.hs
    file GML/GMLParse.hs
    file GML/GMLPrimitives.hs

So now the problem is that to actually build the software, you need to write a Makefile that descends into these directories. Or maybe you use 'hmake' like so:

    hmake examples/chess.hs -ICSG -IFran -IGML -IRayTrace -package text

Note how many sub-directories you must remember to add to the command line (this applies equally for compiler options in Makefiles). Note also the inconsistency between how my own modules are found and linked (via -I directories) and how a "standard" hslibs module is found and linked (via -package text).

Isn't there a simpler way? Yes. Namespaces. Prefix naming is no longer needed inside directories, because the directory name is part of the module name:

    file CSG.hs			-- re-exports everything from the CSG dir
    dir  CSG
    file CSG/Construct.hs
    file CSG/Geometry.hs
    file CSG/Interval.hs
    dir  Fran
    file Fran/Lite.hs
    file Fran/CSG.hs		-- does not conflict with top-level CSG.hs
    dir  GML
    file GML/Data.hs
    file GML/Parse.hs
    file GML/Primitives.hs

And now, the commandline to 'hmake' (or compiler options in a Makefile) becomes simply:

    hmake examples/chess.hs -I.

You only need to specify the root of the module tree (-I.), and all modules in all subdirectories can be found via their full namespace path as used in the source files. Note also that, whereas previously we needed to specify a package for whatever hslibs modules were used, now the compiler/hmake already knows the root of the installed hslibs tree and can use the same mechanism to find and link "standard" modules as for user modules.
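
The search described here can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions: candidatePaths is a hypothetical helper, not the actual algorithm used by hmake or any compiler, and it assumes '/' as the directory separator and ".hs" as the source extension.

```haskell
-- Sketch only: 'candidatePaths' is a hypothetical helper; real tools
-- may search differently.

-- For each root of a module tree, give the file in which the named
-- module would be expected to live. A tool would try the candidates
-- in order and use the first file that exists.
candidatePaths :: [FilePath] -> String -> [FilePath]
candidatePaths roots name = [ root ++ "/" ++ relative | root <- roots ]
  where
    -- Turn "CSG.Geometry" into "CSG/Geometry.hs".
    relative = map dotToSlash name ++ ".hs"
    dotToSlash c = if c == '.' then '/' else c
```

For example, candidatePaths ["."] "CSG.Geometry" gives ["./CSG/Geometry.hs"]. Adding a second root for the installed library tree (whatever its location on a given system) lets the same function cover user modules and "standard" modules alike.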

From this example it should be clear that the use of module namespaces is of benefit to ordinary programs that may never become public, quite aside from any benefits we expect to derive in managing publicly-distributed library code.

What now?

Ok, so that's my proposal. The implementers of some of the main Haskell systems have seen a presentation of these ideas, and seemed to like them. Namespaces are already implemented in nhc98 (v1.02) and hmake (v2.02) if you want to play with them. I expect some discussion to refine this proposal on the 'libraries@haskell.org' list, to which everyone interested is invited.

Once we have nailed down the precise design, we need to get matching implementations in all systems. I have rashly volunteered to implement the lexical/parsing/module-search changes in any Haskell system that no-one else volunteers for (probably ghc, Hugs, possibly hbc).

But after that we will still have many more decisions to take about individual libraries, precise naming, build systems, and so on, not to mention actually writing the libraries. Get involved. Contribute.

Malcolm.Wallace@cs.york.ac.uk