C2HS example: To save other people frustration

Written by Jeff Heard on July 10th, 2009

c2hs is a wonderful little tool.  It generates a lot of the boilerplate code for binding C libraries to Haskell and saves wrists and frustration.  However, the documentation I’ve managed to find on it has at times been buggy and incomplete.  I can’t guarantee the following tutorial is idiomatic c2hs (actually, I can guarantee that it isn’t), but I can guarantee that it works.

Today I bound libshapefile to Haskell using c2hs.  The following code is not at all Haskell-ish, but I find that it is best to write a true-to-C interface to C code first and then write Haskellish interfaces on top of that.  The following bindings are generated, along with getters and setters for the items in the SHPObject structure:

open :: String -> String -> IO (SHPHandle)
getInfo :: SHPHandle -> IO (Int, Int, [Double], [Double])
readObject :: SHPHandle -> Int -> IO (SHPObject)
close :: SHPHandle -> IO ()
create :: String -> Int -> IO (SHPHandle)
createSimpleObject :: Int -> Int -> [Double] -> [Double] -> [Double] -> IO (SHPObject)
createObject :: Int -> Int -> Int -> [Int] -> [Int] -> Int -> [Double] -> [Double] -> [Double] -> [Double] -> IO (SHPObject)
computeExtents :: SHPObject -> IO ()
writeObject :: SHPHandle -> Int -> SHPObject -> IO (Int)
destroyObject :: SHPObject -> IO ()
rewindObject :: SHPHandle -> SHPObject -> IO (Int)

I’m going to intersperse the c2hs code that I wrote with some explanatory text in hopes that someone finds it helpful.  If the authors of c2hs or someone who is more of an expert in this than me has comments on the code, I’ll be happy to include them in the tutorial.

{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE TypeSynonymInstances #-}
module Gis.Shapefile.Internal where

#include <shapefil.h>

import C2HS
import Foreign.Ptr
import System.IO.Unsafe
import Foreign.C
import Control.Monad
import Control.Applicative ((<$>))

The library that you’re binding to needs to be named in a {#context #} hook first.  There are other parameters that can go into context, but these weren’t needed for what I was doing, and don’t seem to be needed very often.  The C2HS import can be found in $HOME/.cabal/share if you’ve installed c2hs with Cabal.  Note that the #include is left bare up top.  That’s correct even though it’s invalid Haskell code.

For many enumerations, c2hs provides the handy-dandy {#enum #} construct, but for #define-d constants, we have to wrap our own.  I could make these instances of Enum, but it seems like too much work for an internal interface.

{#context lib="shapefile" #}

-- these are defined constants, not an enum, so we can't just use the #enum hook
shptNull = 0
shptPoint = 1
shptArc = 3
shptPolygon = 5
shptMultipoint = 8
shptPointZ = 11
shptArcZ = 13
shptPolygonZ = 15
shptMultipointZ = 18
shptPointM = 21
shptArcM = 23
shptPolygonM = 25
shptMultipointM = 28
shptMultiPatch = 35

data SHPType =
    TNull
  | TPoint
  | TArc
  | TPolygon
  | TMultipoint
  | TPointZ
  | TArcZ
  | TPolygonZ
  | TMultipointZ
  | TPointM
  | TArcM
  | TPolygonM
  | TMultipointM
  | TMultipatch

shpTypeToIntegral :: (Integral a) => SHPType -> a
shpTypeToIntegral TNull        = 0
shpTypeToIntegral TPoint       = 1
shpTypeToIntegral TArc         = 3
shpTypeToIntegral TPolygon     = 5
shpTypeToIntegral TMultipoint  = 8
shpTypeToIntegral TPointZ      = 11
shpTypeToIntegral TArcZ        = 13
shpTypeToIntegral TPolygonZ    = 15
shpTypeToIntegral TMultipointZ = 18
shpTypeToIntegral TPointM      = 21
shpTypeToIntegral TArcM        = 23
shpTypeToIntegral TPolygonM    = 25
shpTypeToIntegral TMultipointM = 28
shpTypeToIntegral TMultipatch  = 35

integralToSHPType :: (Integral a) => a -> SHPType
integralToSHPType 0  = TNull
integralToSHPType 1  = TPoint
integralToSHPType 3  = TArc
integralToSHPType 5  = TPolygon
integralToSHPType 8  = TMultipoint
integralToSHPType 11 = TPointZ
integralToSHPType 13 = TArcZ
integralToSHPType 15 = TPolygonZ
integralToSHPType 18 = TMultipointZ
integralToSHPType 21 = TPointM
integralToSHPType 23 = TArcM
integralToSHPType 25 = TPolygonM
integralToSHPType 28 = TMultipointM
integralToSHPType 35 = TMultipatch
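
For anyone who does want the Enum instance mentioned above, here is a minimal sketch in plain Haskell (no c2hs needed).  It is not what the binding above uses.  The names shpTypeCodes and safeToSHPType are my own inventions for illustration; the idea is simply to keep the code/constructor pairs in one table so the two directions can’t drift apart.

```haskell
import Data.Maybe (fromMaybe)
import Data.Tuple (swap)

data SHPType
  = TNull | TPoint | TArc | TPolygon | TMultipoint
  | TPointZ | TArcZ | TPolygonZ | TMultipointZ
  | TPointM | TArcM | TPolygonM | TMultipointM
  | TMultipatch
  deriving (Eq, Show)

-- Hypothetical helper: the single table both conversions are derived from.
shpTypeCodes :: [(SHPType, Int)]
shpTypeCodes =
  [ (TNull, 0), (TPoint, 1), (TArc, 3), (TPolygon, 5), (TMultipoint, 8)
  , (TPointZ, 11), (TArcZ, 13), (TPolygonZ, 15), (TMultipointZ, 18)
  , (TPointM, 21), (TArcM, 23), (TPolygonM, 25), (TMultipointM, 28)
  , (TMultipatch, 35)
  ]

-- The codes are not consecutive, so the default succ, pred and [a ..]
-- will hit the error case in toEnum; only use fromEnum/toEnum here.
instance Enum SHPType where
  fromEnum t =
    fromMaybe (error "shpTypeCodes is missing a constructor")
              (lookup t shpTypeCodes)
  toEnum n =
    fromMaybe (error ("unknown shape type code " ++ show n))
              (lookup n (map swap shpTypeCodes))

-- Hypothetical total variant: Nothing for codes the table doesn't know.
safeToSHPType :: Integral a => a -> Maybe SHPType
safeToSHPType n = lookup (fromIntegral n) (map swap shpTypeCodes)
```

The Maybe-returning version is probably the one to expose from a Haskellish layer, since a shapefile on disk can contain any integer in that field.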

Note that I declare opaque types two different ways below.  I couldn’t get c2hs to understand the SHPHandle declaration, possibly because it is itself an opaque type in C; I never pinned down the exact cause.  So I wrote out the opaque type by hand.  Yes, it looks recursive, but GHC handles it just fine, and if you use the -XEmptyDataDecls extension this is what the code actually reduces to.

newtype SHPHandle = SHPHandle (Ptr (SHPHandle))
{#pointer *SHPObject newtype #}
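
For comparison, here is a rough sketch of the non-recursive way to spell an opaque handle, using an empty data declaration as the pointer’s tag.  This is plain Haskell and not what the binding above uses.  The tag name SHPInfo and the helper isNullHandle are assumptions made for the sketch; the null check assumes the C library signals a failed open with a NULL handle, which is worth verifying against the header.

```haskell
{-# LANGUAGE EmptyDataDecls #-}
import Foreign.Ptr

-- Uninhabited tag type standing in for the C struct behind the handle.
-- (The name is an assumption for this sketch.)
data SHPInfo

-- The handle is just a typed pointer we never dereference from Haskell.
newtype SHPHandle = SHPHandle (Ptr SHPInfo)

-- Wrap and unwrap, retagging the pointer as needed.
toSHPHandle :: Ptr a -> SHPHandle
toSHPHandle = SHPHandle . castPtr

fromSHPHandle :: SHPHandle -> Ptr a
fromSHPHandle (SHPHandle p) = castPtr p

-- Hypothetical helper: lets a higher-level wrapper detect a NULL handle.
isNullHandle :: SHPHandle -> Bool
isNullHandle (SHPHandle p) = p == nullPtr
```

Either spelling gives you a distinct Haskell type per C handle, so the type checker stops you from passing one kind of handle where another is expected.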

Now to define a bunch of getters and setters for the SHPObject type.  The c2hs documentation doesn’t seem to require that you unwrap the opaque type yourself into the Ptr type, but I had to.  It compiles fine if you don’t, but then you have to unwrap it in your other code, and I like to keep all my foreign data in one module where possible.

getSHPType (SHPObject x) = integralToSHPType <$> {#get SHPObject->nSHPType #} x
getShapeId (SHPObject x) = {#get SHPObject->nShapeId #} x
getParts (SHPObject x) = {#get SHPObject->nParts #} x
getPartStart (SHPObject x) = {#get SHPObject->panPartStart #} x
getPartType (SHPObject x) = {#get SHPObject->panPartType #} x
getVertices (SHPObject x) = {#get SHPObject->nVertices #} x
getX (SHPObject x) = {#get SHPObject->padfX #} x
getY (SHPObject x) = {#get SHPObject->padfY #} x
getZ (SHPObject x) = {#get SHPObject->padfZ #} x
getM (SHPObject x) = {#get SHPObject->padfM #} x
getXMin (SHPObject x) = {#get SHPObject->dfXMin #} x
getYMin (SHPObject x) = {#get SHPObject->dfYMin #} x
getZMin (SHPObject x) = {#get SHPObject->dfZMin #} x
getMMin (SHPObject x) = {#get SHPObject->dfMMin #} x
getXMax (SHPObject x) = {#get SHPObject->dfXMax #} x
getYMax (SHPObject x) = {#get SHPObject->dfYMax #} x
getZMax (SHPObject x) = {#get SHPObject->dfZMax #} x
getMMax (SHPObject x) = {#get SHPObject->dfMMax #} x

setSHPType (SHPObject p) v = {#set SHPObject->nSHPType #} p (shpTypeToIntegral v)
setShapeId (SHPObject p) = {#set SHPObject->nShapeId #} p
setParts (SHPObject p) = {#set SHPObject->nParts #} p

In the next line we see some Haskell code being obviously mixed with c2hs markup.  This is standard practice and it’s safer than it looks.  c2hs parenthesizes most things and seems to handle Haskell syntax perfectly.  Here we have to marshal a list of doubles to a pointer.  The newArray is correct here, because the array needs to live on the C heap and not be garbage collected.  The new- as opposed to alloca- means that it is our responsibility to deallocate it.  Fortunately, destroyObject (bound later) releases that memory for us.

setPartStart (SHPObject p) v = newArray v >>= {#set SHPObject->panPartStart #} p
setPartType (SHPObject p) v = newArray v >>= {#set SHPObject->panPartType #} p
setVertices (SHPObject p) = {#set SHPObject->nVertices #} p
setX (SHPObject p) v = newArray v >>= {#set SHPObject->padfX #} p
setY (SHPObject p) v = newArray v >>= {#set SHPObject->padfY #} p
setZ (SHPObject p) v = newArray v >>= {#set SHPObject->padfZ #} p
setM (SHPObject p) v = newArray v >>= {#set SHPObject->padfM #} p
setXMin (SHPObject p) = {#set SHPObject->dfXMin #} p
setYMin (SHPObject p) = {#set SHPObject->dfYMin #} p
setZMin (SHPObject p) = {#set SHPObject->dfZMin #} p
setMMin (SHPObject p) = {#set SHPObject->dfMMin #} p
setXMax (SHPObject p) = {#set SHPObject->dfXMax #} p
setYMax (SHPObject p) = {#set SHPObject->dfYMax #} p
setZMax (SHPObject p) = {#set SHPObject->dfZMax #} p
setMMax (SHPObject p) = {#set SHPObject->dfMMax #} p
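
To make the new- versus scoped allocation distinction concrete, here is a small self-contained sketch using only Foreign.Marshal from base.  The function names heapCopy, roundTrip and sumScoped are made up for the example and have nothing to do with the shapefile binding.

```haskell
import Foreign.C.Types (CDouble)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (newArray, peekArray, withArray)
import Foreign.Ptr (Ptr)

-- Heap allocation: the pointer outlives this call, so somebody
-- (here the caller, in the binding the C library) must free it.
heapCopy :: [Double] -> IO (Ptr CDouble)
heapCopy = newArray . map realToFrac

-- Allocate, read back, and free explicitly.
roundTrip :: [Double] -> IO [Double]
roundTrip xs = do
  p  <- heapCopy xs
  ys <- peekArray (length xs) p
  free p
  return (map realToFrac ys)

-- Scoped allocation: the array is valid only inside the callback
-- and is released automatically when the callback returns.
sumScoped :: [Double] -> IO Double
sumScoped xs =
  withArray (map realToFrac xs :: [CDouble]) $ \p -> do
    ys <- peekArray (length xs) p
    return (realToFrac (sum ys))
```

The scoped form is the right choice when the C function only reads the array during the call.  The setters above store the pointer inside the struct, which is why they need the heap version.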

Now for a few utility functions: toSHPHandle and fromSHPHandle, which marshal and unmarshal a bare pointer; arrDouble and arrInt, which copy a list into a freshly allocated C array; allocate4 and peek4; and peekInt.  We’ll use these soon in our {#fun #} bindings.  It appears from working with c2hs that you can’t curry arguments into marshallers or use lambda expressions; I’m not sure if this is me or if this is c2hs, but the solution of writing utility functions works.

toSHPHandle = SHPHandle . castPtr
fromSHPHandle (SHPHandle x) = castPtr x

arrDouble x = ((newArray . map realToFrac $ x) >>=)
arrInt x = ((newArray . map fromIntegral $ x) >>=)

allocate4 = allocaArray 4 

peek4 d = do
    lst <- (peekArray 4 d :: IO [CDouble])
    return . map cFloatConv $ lst

peekInt i = peek i >>= return . cIntConv    

{#fun SHPOpen as open
    { `String'
    , `String'
    } -> `SHPHandle' toSHPHandle #}

Right.  So I’ll explain the marshallers in the {#fun #} hooks on either side of this paragraph.  Argument and return types are written in c2hs as a backtick, followed by the type, followed by an apostrophe.  That is not a formatting mistake.  To the left of the arrow are the arguments, and to the right of the arrow is the return type.  On the left side of each type can be an “in-marshaller” and on the right an “out-marshaller”; these translate between Haskell types and C types.

There are also two signifiers that can come at the end of a marshaller: - and *.  The - on an in-marshaller signifies that the argument is handled entirely within the function definition and is not passed in as a parameter.  The * signifier says that the marshaller runs in the IO monad and has to be handled specially.  A good rule seems to be that if you’re working with pointers, you’ll need it.  Out-marshallers in the argument list imply that the parameters are “out” parameters, meant to be returned as part of the function’s result.  c2hs generally handles this as a tuple.  So the following function will have type SHPHandle -> IO (Int,Int,[Double],[Double]).

-- void SHPGetInfo(SHPHandle, int*, int*, double*, double*)
{#fun SHPGetInfo as getInfo
    { fromSHPHandle `SHPHandle'
    , alloca- `Int' peekInt*
    , alloca- `Int' peekInt*
    , allocate4- `[Double]' peek4*
    , allocate4- `[Double]' peek4*
    } -> `()' #}
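
If the alloca-/peek* pairing feels like magic, it may help to see the same out-parameter pattern written by hand.  The sketch below is only an approximation of the shape of code such a hook stands for, not c2hs’s actual output.  To keep it runnable without the C library, fakeGetInfo is a stand-in written in Haskell that pokes made-up values into its pointers the way a C function would, and the handle argument is left out.

```haskell
import Foreign.C.Types (CDouble, CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Stand-in for the C function: writes its results through the pointers.
-- The values are invented for the example.
fakeGetInfo :: Ptr CInt -> Ptr CInt -> Ptr CDouble -> Ptr CDouble -> IO ()
fakeGetInfo pCount pType pMin pMax = do
  poke pCount 2
  poke pType 5
  pokeArray pMin [0, 0, 0, 0]
  pokeArray pMax [10, 20, 0, 0]

-- Each "alloca-" becomes a scoped allocation the caller never sees;
-- each "peek*" becomes a read after the call, collected into a tuple.
getInfoByHand :: IO (Int, Int, [Double], [Double])
getInfoByHand =
  alloca $ \pCount ->
  alloca $ \pType ->
  allocaArray 4 $ \pMin ->
  allocaArray 4 $ \pMax -> do
    fakeGetInfo pCount pType pMin pMax
    n    <- peek pCount
    t    <- peek pType
    mins <- peekArray 4 pMin
    maxs <- peekArray 4 pMax
    return ( fromIntegral n, fromIntegral t
           , map realToFrac mins, map realToFrac maxs )
```

Seen this way, the - is what removes the four pointer arguments from the Haskell signature, and the out-marshallers are what put the four values into the result.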

Note the use of “id” as a marshaller below.  It seemed like c2hs should have a default marshaller for opaque types, but compiling it told me otherwise, so I added it and it worked.  Experimentation, experimentation!

{#fun SHPReadObject as readObject
    { fromSHPHandle `SHPHandle'
    , `Int'
    } -> `SHPObject' id #}

{#fun SHPClose as close
    { fromSHPHandle `SHPHandle' } -> `()' #}

{#fun SHPCreate as create
    { `String'
    , `Int'
    } -> `SHPHandle' toSHPHandle  #}

{#fun SHPCreateSimpleObject as createSimpleObject
    { `Int'
    , `Int'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    } -> `SHPObject' id #}

{#fun SHPCreateObject as createObject
    { `Int'
    , `Int'
    , `Int'
    , arrInt* `[Int]'
    , arrInt* `[Int]'
    , `Int'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    } -> `SHPObject' id #}

{#fun SHPComputeExtents as computeExtents
    { id `SHPObject' } -> `()' #}

{#fun SHPWriteObject as writeObject
    { fromSHPHandle `SHPHandle'
    , `Int'
    , id `SHPObject' } -> `Int' #}

{#fun SHPDestroyObject as destroyObject
    { id `SHPObject' } -> `()' #}

{#fun SHPRewindObject as rewindObject
    { fromSHPHandle `SHPHandle'
    , id `SHPObject' } -> `Int'  #}

This compiled and ran on my machine.  Note that you do have to have libshp and shapefil.h installed on your machine to compile this example.
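
As a taste of the Haskellish layer that could sit on top of a true-to-C interface like this one, here is a sketch of the usual bracket pattern for pairing an open with a close.  So that it runs without the C library, fakeOpen and fakeClose are stand-ins that only log their calls; FakeHandle, withShapefile and demo are likewise names invented for the example.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Stand-in for the opaque handle; it only remembers the path.
newtype FakeHandle = FakeHandle FilePath

-- Stand-ins for the generated open and close: they just record the call.
fakeOpen :: IORef [String] -> FilePath -> String -> IO FakeHandle
fakeOpen logRef path mode = do
  modifyIORef logRef (++ ["open " ++ path ++ " " ++ mode])
  return (FakeHandle path)

fakeClose :: IORef [String] -> FakeHandle -> IO ()
fakeClose logRef (FakeHandle path) =
  modifyIORef logRef (++ ["close " ++ path])

-- bracket guarantees the close runs even if the action throws.
withShapefile
  :: IORef [String] -> FilePath -> String -> (FakeHandle -> IO a) -> IO a
withShapefile logRef path mode =
  bracket (fakeOpen logRef path mode) (fakeClose logRef)

-- Runs an empty action and returns the recorded calls.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  withShapefile logRef "roads" "rb" (\_ -> return ())
  readIORef logRef
```

With the real bindings the wrapper would presumably shrink to bracket (open path mode) close, with no logging argument.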

Anyway, I hope this helps someone.  My first c2hs wrapper was a heck of an experience, but the code seems easy to maintain, and the result is not much different from what I would have written by hand except that it’s far shorter.


3 Comments so far ↓

  1. -- these are defined constants, not an enum, so we can’t just use the #enum hook

    This isn’t quite true. If you look in section 3.3 of the paper at http://www.cse.unsw.edu.au/~chak/papers/Cha99b.html
    there is a better way to use the #define’d constants, using something like
    {#enum define Wrapping {GL_CLAMP as Clamp, GL_CLAMP_TO_EDGE as ClampToEdge, GL_REPEAT as Repeat} #}
    as the example they give. This avoids the manually entered constants you use and also takes advantage of the type system, so you have a Wrapping type instead of a bunch of constant functions of type Int.

  2. Actually I didn’t at first notice the SHPType stuff so you do have a separate type, but using the #enum define would avoid all that boilerplate code converting from Int.

  3. Maurício says:

    You say that you “find that it is best to write a true-to-C interface to C code first and then write Haskellish interfaces on top of that”.

    I wrote a FFI package with that in mind. If you ever have time you want to spend on this, would you mind giving it a quick look and maybe offer some criticism on what needs improvement?


    Thanks. Best,
