Haskell

...now browsing by tag

 
 

HDR imaging library for Haskell based on pfsutils

Friday, August 28th, 2009

PFS stands for Portable Floating-point Streams, and it’s a framework for reading and writing HDR images in a variety of formats and also tonemapping them. The HDRUtils library is an Haskell library for reading these streams and formatting them into Haskeller friendly 2D C-compatible arrays of 32bit floats.On top of these are a simple index/assign bijection that uses Russell O’Connor’s Data.Colour package to deal with colorspaces in a human-perception centric way.  Most tonemapping algorithms are based loosely on human perception models, so Haskell is ideally suited to developing new tonemapping algorithms based on roconnor’s model.

The general procedure for working with the streams is something like:

import Graphics.Image.PFS
import Graphics.Image.PixelMap

main = getFrame >>= myToneMappingProcedure . toPixelMap >>= putFrame . fromPixelMap

Where myToneMappingProcedure is something that takes pixels and rewrites them algorithmically.  I intend soon to add a more functional layer to this, using immutable arrays or STArrays, but I haven’t decided on the best way to do it yet. This at least makes the library available.  I’d like to see GdkPixbuf or DevIL images map to PixelMaps, eventually, and to see frame-aligning algorithms, convolutions, and other things layer on top of this library as applicative combinators.  In the meantime, however, have fun with what I have, and if you have a good idea for a combinator architecture, let me know and we’ll work it out.

This project is Open Sourced on github

Followup to my earlier post on Hilbert curve timeseries plots

Monday, August 24th, 2009

Note that the Hilbert code is not mine.  Rather it is from a Haskell Golf Challenge post in a pastebin and appropriated because I wanted a quick Hilbert curve without having to think through a hazy minded Friday evening.  Can’t find the original post now, actually.  That’ll teach me to bookmark things that are important.  Here’s the other person’s code.  If anyone can help me claim it, I will post your name as author.  Again, apologies in advance for that :-(.

h :: [Bool] -> ([Bool], [Bool])
h = l

go :: Bool -> Bool -> ([Bool], [Bool]) -> ([Bool], [Bool])
go x y (xs,ys) = (x:xs, y:ys)

l, r :: [Bool] -> ([Bool], [Bool])
l (False:False:ns) = go False False $ right (r ns)
l (False:True:ns) = go False True $ l ns
l (True:False:ns) = go True True $ l ns
l (True:True:ns) = go True False $ left (r ns)
l _ = ([], [])

r (False:False:ns) = go False True $ left (l ns)
r (False:True:ns) = go False False $ r ns
r (True:False:ns) = go True False $ r ns
r (True:True:ns) = go True True $ right (l ns)
r _ = ([], [])

left, right :: ([Bool], [Bool]) -> ([Bool], [Bool])
left (False:xs, False:ys) = go False True $ left (xs,ys)
left (False:xs, True:ys) = go True True $ left (xs,ys)
left (True:xs, False:ys) = go False False $ left (xs,ys)
left (True:xs, True:ys) = go True False $ left (xs,ys)
left _ = ([], [])

right (False:xs, True:ys) = go False False $ right (xs,ys)
right (True:xs, True:ys) = go False True $ right (xs,ys)
right (False:xs, False:ys) = go True False $ right (xs,ys)
right (True:xs, False:ys) = go True True $ right (xs,ys)
right _ = ([], [])

-- Infrastructure for testing:
bits :: Int -> Int -> [Bool]
bits n k = go n k [] where
  go 0 k = id
  go n k = go (n-1) (k `div` 2) . (odd k:)

num :: [Bool] -> Double
num (False:xs) = num xs / 2
num (True:xs) = (num xs + 1) / 2
num [] = 0

hilbert :: Int -> Int -> (Double, Double)
hilbert n k = (\(x,y) -> (num x, num y)) (h (bits n k))

--

Here begins my own code.  To use the Data.Colour.blend function, I need to normalize all the values.  Here is that function, which could be made considerably more efficient with a minimax instead of independently calling minimum and maximum, but again, the point here is illustration of a technique, not the most beautiful code.

normalize :: [Double] -> [Double]
normalize values = [(val-minval) / (maxval-minval) | val <- values]
    where minval = minimum values
          maxval = maximum values

Following that, we have a function and its helper for creating a hilbert plot of the data.  Note the use of the constant 64.  The Hilbert code above keeps everything within a unit vector of the origin, so we scale out for the resolution.  The resolution should properly be ceiling . log2 of the number of items in the list, which could be calculated efficiently, but it would clutter the code.

vis :: Int -> [Double] -> BaseVisual
vis n values = concat $ zipWith vis'
                [(64*x,64*y) | (x,y) <- (map (hilbert n) [0..2^n-1])]
                (normalize values)

Finally here is the visualization of a single point whose x,y vector is now the Hilbert point for its position in the timeseries.  We blend between two colors, blue for 0 and orange for 1. This could just as easily be a more complicated colourmap, but this is the code that generated the colormap from the previous post.

vis' :: (Double,Double) -> Double -> BaseVisual
vis' (x,y) val = fillcolour (blend val c0 c1)
               . filled True
               . outlined False
               $ arc{ center = Point x y, radius=0.5 }
    where c0 = opaque blue
          c1 = opaque orange
--

And finally the main program.  All we do here is take the filename from the arguments, read in the lines as Double values, and map those values to colored hilbert points using our vis function.

main = do
  name <- (!!0) <$> getArgs
  k <- map read . tail . words <$> readFile name
  let visualization = vis degree k
      degree = (ceiling $ log (realToFrac . length $ k) / log 2)
  renderToSVG (name ++ ".svg") 128 128 visualization

C2HS example: To save other people frustration

Friday, July 10th, 2009

C2Hs is a wonderful little tool.  It generates a lot of the boilerplate code for binding C libraries to Haskell and saves wrists and frustration. However, the documentation I’ve manage to find on it has at times been buggy and incomplete.  I can’t guarantee the following tutorial is idiomatic c2hs — actually I can guarantee that it isn’t — but I can guarantee that it works.

Today I bound libshapefile to Haskell using c2hs.  The following code is not at all Haskell-ish, but I find that it is best to write a true-to-C interface to C code first and then write Haskellish interfaces on top of that.  The following bindings are generated, along with getters and setters for the items in the SHPObject structure:

open :: String -> String -> IO (SHPHandle)
getInfo :: SHPHandle -> IO (Int, Int, [Double], [Double])
readObject :: SHPHandle -> Int -> IO (SHPObject)
close :: SHPHandle -> IO ()
create :: String -> Int -> IO (SHPHandle)
createSimpleObject :: Int -> Int -> [Double] -> [Double] -> [Double] -> IO (SHPObject)
createObject :: Int -> Int -> Int -> [Int] -> [Int] -> Int -> [Double] -> [Double] -> [Double] -> [Double] -> IO (SHPObject)
computeExtents :: SHPObject -> IO ()
writeObject :: SHPHandle -> Int -> SHPObject -> IO (Int)
destroyObject :: SHPObject -> IO ()
rewindObject :: SHPHandle -> SHPObject -> IO (Int)

I’m going to intersperse the c2hs code that I wrote with some explanatory text in hopes that someone finds it helpful.  If the authors of c2hs or someone who is more of an expert in this than me has comments on the code, I’ll be happy to include them in the tutorial.

{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE TypeSynonymInstances #-}
module Gis.Shapefile.Internal where

#include <shapefil.h>

import C2HS
import Foreign.Ptr
import System.IO.Unsafe
import Foreign.C
import Control.Monad
import Control.Applicative ((<$>))

The library that you’re binding to needs to be enclosed in a {#context #} tag first.  There are other parameters that can go into context, but these weren’t needed for what I was doing, and don’t seem to be needed super-commonly.  The C2HS import can be found in $HOME/.cabal/share if you’ve installed c2hs with Cabal.  Noe that the #include is left bare up top.  That’s correct even though it’s invalid Haskell code.

For many enumerations, c2hs provides the handy-dandy {#enum #} construct, but for #define-d constants, we have to wrap our own.  I could make these instances of Enum, but it seems like too much work for an internal interface.

{#context lib="shapefile" #}

-- these are defined constants, not an enum, so we can't just use the #enum hook
shptNull = 0
shptPoint = 1
shptArc = 3
shptPolygon = 5
shptMultipoint = 8
shptPointZ = 11
shptArcZ = 13
shptPolygonZ = 15
shptMultipointZ = 18
shptPointM = 21
shptArcM = 23
shptPolygonM = 25
shptMultipointM = 28
shptMultiPatch = 35

data SHPType =
    TNull
  | TPoint
  | TArc
  | TPolygon
  | TMultipoint
  | TPointZ
  | TArcZ
  | TPolygonZ
  | TMultipointZ
  | TPointM
  | TArcM
  | TPolygonM
  | TMultipointM
  | TMultipatch

shpTypeToIntegral TNull = 0
shpTypeToIntegral TPoint= 1
shpTypeToIntegral TArc=3
shpTypeToIntegral TPolygon=5
shpTypeToIntegral TMultipoint=8
shpTypeToIntegral TPointZ=11
shpTypeToIntegral TArcZ=13
shpTypeToIntegral TPolygonZ=15
shpTypeToIntegral TMultipointZ=18
shpTypeToIntegral TPointM=21
shpTypeToIntegral TArcM=23
shpTypeToIntegral TPolygonM=25
shpTypeToIntegral TMultipointM=28
shpTypeToIntegral TMultipatch=35

integralToSHPType :: (Integral a) => a -> SHPType
integralToSHPType 0=TNull
integralToSHPType 1=TPoint
integralToSHPType 3=TArc
integralToSHPType 5=TPolygon
integralToSHPType 8=TMultipoint
integralToSHPType 11=TPointZ
integralToSHPType 13=TArcZ
integralToSHPType 15=TPolygonZ
integralToSHPType 18=TMultipointZ
integralToSHPType 21=TPointM
integralToSHPType 23=TArcM
integralToSHPType 25=TPolygonM
integralToSHPType 28=TMultipointM
integralToSHPType 35=TMultipatch

Note that I declare opaque types two different ways below.  I couldn’t get c2hs to understand the SHPHandle declaration, because it was itself an opaque type in C.  Well, because of something anyway.  Anyway, because of that, I wrote out the opaque type by hand.  Yes it looks recursive, but GHC handles it just fine, and if you use the -XEmptyDataDecls extension this is what the code actually reduces to.

newtype SHPHandle = SHPHandle (Ptr (SHPHandle))
{#pointer *SHPObject newtype #}

Now to define a bunch of getters and setters for the SHPObject type.  The c2hs documentation doesn’t seem to require that you unwrap the opaque type yourself into the Ptr type, but I had to.  It compiles fine if you don’t, but then you have to unwrap it in your other code, and I like to keep all my foreign data in one module where possible.

getSHPType (SHPObject x) = integralToSHPType <$> {#get SHPObject->nSHPType #} x
getShapeId (SHPObject x) = {#get SHPObject->nShapeId #} x
getParts (SHPObject x) = {#get SHPObject->nParts #} x
getPartStart (SHPObject x) = {#get SHPObject->panPartStart #} x
getPartType (SHPObject x) = {#get SHPObject->panPartType #} x
getVertices (SHPObject x) = {#get SHPObject->nVertices #} x
getX (SHPObject x) = {#get SHPObject->padfX #} x
getY (SHPObject x) = {#get SHPObject->padfY #} x
getZ (SHPObject x) = {#get SHPObject->padfZ #} x
getM (SHPObject x) = {#get SHPObject->padfM #} x
getXMin (SHPObject x) = {#get SHPObject->dfXMin #} x
getYMin (SHPObject x) = {#get SHPObject->dfYMin #} x
getZMin (SHPObject x) = {#get SHPObject->dfZMin #} x
getMMin (SHPObject x) = {#get SHPObject->dfMMin #} x
getXMax (SHPObject x) = {#get SHPObject->dfXMax #} x
getYMax (SHPObject x) = {#get SHPObject->dfYMax #} x
getZMax (SHPObject x) = {#get SHPObject->dfZMax #} x
getMMax (SHPObject x) = {#get SHPObject->dfMMax #} x

setSHPType (SHPObject p) v = {#set SHPObject->nSHPType #} p (shpTypeToIntegral v)
setShapeId (SHPObject p) = {#set SHPObject->nShapeId #} p
setParts (SHPObject p) = {#set SHPObject->nParts #} p

In the next line we see some Haskell code being obviously mixed with c2hs markup.  This is standard practice and it’s safer than it seems like it might be.  c2hs parenthesizes most things and seems to handle Haskell syntax perfectly.  Here we have to marshal an list of doubles to a pointer.  The newArray is correct here, because the array needs to live on the C heap and not be garbage collected.  The new- as opposed to alloca- means that it is our responsibility to deallocate it.  Fortunately, destroyObject (bound later) releases that memory for us.

setPartStart (SHPObject p) v = newArray v >>= {#set SHPObject->panPartStart #} p
setPartType (SHPObject p) v = newArray v >>= {#set SHPObject->panPartType #} p
setVertices (SHPObject p) = {#set SHPObject->nVertices #} p
setX (SHPObject p) v = newArray v >>= {#set SHPObject->padfX #} p
setY (SHPObject p) v = newArray v >>= {#set SHPObject->padfY #} p
setZ (SHPObject p) v = newArray v >>= {#set SHPObject->padfZ #} p
setM (SHPObject p) v = newArray v >>= {#set SHPObject->padfM #} p
setXMin (SHPObject p) = {#set SHPObject->dfXMin #} p
setYMin (SHPObject p) = {#set SHPObject->dfYMin #} p
setZMin (SHPObject p) = {#set SHPObject->dfZMin #} p
setMMin (SHPObject p) = {#set SHPObject->dfMMin #} p
setXMax (SHPObject p) = {#set SHPObject->dfXMax #} p
setYMax (SHPObject p) = {#set SHPObject->dfYMax #} p
setZMax (SHPObject p) = {#set SHPObject->dfZMax #} p
setMMax (SHPObject p) = {#set SHPObject->dfMMax #} p

Now for a few utility functions: to/fromSHPHandle, which marshals and unmarshals a bare pointer, allocate/peek4, and peekInt.  We’ll use these soon in our {#fun #} bindings, but it appears from working with c2hs that you can’t curry arguments into marshallers or use lambda expressions; I’m not sure if this is me or if this is c2hs, but the solution of writing utility functions works.

toSHPHandle = SHPHandle . castPtr
fromSHPHandle (SHPHandle x) = castPtr x

arrDouble x = ((newArray . map realToFrac $ x) >>=)
arrInt x = ((newArray . map fromIntegral $ x) >>=)

allocate4 = allocaArray 4 

peek4 d = do
    lst <- (peekArray 4 d :: IO [CDouble])
    return . map cFloatConv $ lst

peekInt i = peek i >>= return . cIntConv    

{#fun SHPOpen as open
    { `String'
    , `String'
    } -> `SHPHandle' toSHPHandle #}

Right.  So I’ll explain the two marshallers on either side of the paragraph here.  Argument and return types are outlined in c2hs using a backtick followed by the type followed by a forward tick.  That is not a formatting mistake.  To the left of the arrow are the arguments, and to the right of the arrow is the return type. On the left side of each type can be an “in-marshaller” and on the right, an “out-marshaller”, which translates between Haskell types and C types. There are also two signifiers that can come at the end of the marshaller: - and *.  the ‘-’ on the in-marshaller signifies that the argument is to be handled entirely within the function def and not passed in as a parameter.  The * signifier says that the result will be within the IO monad and to handle it specially.  A good rule seems to be that if you’re working with pointers, you’ll need it.  Out marshallers in the arg list imply that the parameters are “out” parameters, meant to be returned as part of the function return.  c2hs generally handles this as a tuple.  So the following function will have type SHPHandle -> IO (Int,Int,[Double],[Double]).

-- void SHPGetInfo(SHPHandle, int*, int*, double*, double*)
{#fun SHPGetInfo as getInfo
    { fromSHPHandle `SHPHandle'
    , alloca- `Int' peekInt*
    , alloca- `Int' peekInt*
    , allocate4- `[Double]' peek4*
    , allocate4- `[Double]' peek4*
    } -> `()' #}

Note the use of “id” as a marshaller below.  It seemed like c2hs should have a default marshaller for opaque types, but compiling it told me otherwise, so I added it and it worked.  Experimentation, experimentation!

{#fun SHPReadObject as readObject
    { fromSHPHandle `SHPHandle'
    , `Int'
    } -> `SHPObject' id #}

{#fun SHPClose as close
    { fromSHPHandle `SHPHandle' } -> `()' #}

{#fun SHPCreate as create
    { `String'
    , `Int'
    } -> `SHPHandle' toSHPHandle  #}

{#fun SHPCreateSimpleObject as createSimpleObject
    { `Int'
    , `Int'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    } -> `SHPObject' id #}

{#fun SHPCreateObject as createObject
    { `Int'
    , `Int'
    , `Int'
    , arrInt* `[Int]'
    , arrInt* `[Int]'
    , `Int'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    } -> `SHPObject' id #}

{#fun SHPComputeExtents as computeExtents
    { id `SHPObject' } -> `()' #}

{#fun SHPWriteObject as writeObject
    { fromSHPHandle `SHPHandle'
    , `Int'
    , id `SHPObject' } -> `Int' #}

{#fun SHPDestroyObject as destroyObject
    { id `SHPObject' } -> `()' #}

{#fun SHPRewindObject as rewindObject
    { fromSHPHandle `SHPHandle'
    , id `SHPObject' } -> `Int'  #}

This compiled and ran on my machine. Note that you do have to have libshp and shapfil.h installed on your machine to compile this example.

Anyway, I hope this helps someone.  My first c2hs wrapper was a heck of an experience, but the code seems easy to maintain and the results of it not much different than what I would have written by hand except that it’s incredibly shorter.


				

An ugly force-directed layout implementation

Thursday, July 9th, 2009

Updated 7.9.2009: I’ve added another video showing the effect of attractors on the layout

Unalbe to show flash video

I’ve been working on visualizations silently of late.  Today, though, I threw together a quick FDL (force-directed layout) algorithm based on POV-Ray’s static-field isosurface or “blob” equation, which is (sorry, but I’m using plaintext)

density = strength * (1 - (distance / radius)^2)^2

The documented equation has a problem, in that if distance > radius, the density approaches infinity fast, so I change it to be:

density = strength * (1 - (min(distance, radius) / radius)^2)^2

That solves the infinity problem, and now all there is to the algorithm is to apply it:

adjust :: Double -> Double -> Double -> Double -> Double -> Double -> (Double,Double)
adjust sx sy ss sr dx dy = if isNaN vx || isNaN vy then (dx,dy) else (dx+force * vx, dy+force * vy)
    where force = density ss sr dist
          dist = sqrt ((dx-sx)^^2 + (dy-sy)^^2)
          vx = (dx-sx) / dist
          vy = (dy-sy) / dist

Note that it is possible for dist to be NaN for a single point-to-point interaction, so we correct for that.  Basically, this function is a single adjustment from point set to a point set that fits our force constraint a little better.  In the following code, we take a static vector, <sx,sy> and a movable vector <dx,dy> and take the distance.  We apply the density function to the distance using ss as our strength and sr as the radius or distance at which the density function falls off to zero.  We return the translated point P’ which is translated along the normal vector <vx,vy> by the variable “force”. Simple and straightforward.

Now I know… by my title, you’re thinking, “What’s ugly about that?” Well, when I code fast, I use shortcuts.  The following code, which uses this adjustment function as a kernel for the actual force-directed layout algorithm is imperative.  I’ve tried FDL and MDS (multi-dimensional scaling) algorithms before using lists and tries and always ran into enough overhead that it significantly diminished the number of points that are viable to run a single iteration in real-time.  I’m sure there’s a way to do it, but I solicit the community’s help in suggesting a faster way than this.  Yes, I could have used STUArray and avoided IO, but that doesn’t really eliminate the imperative nature of things.

layoutIteration :: Double -> [Attractor] -> (Arr Int Double) -> (Arr Int Double) -> (Arr (Int,Int) Double) -> IO ()
layoutIteration alpha attractors xsArray ysArray radii = do
    bounds <- A.getBounds radii
    forM_ (A.range bounds) $ \(r,c) -> when (r /= c) $ do
        radius <- radii -| (r,c)
        x0 <- xsArray -| c
        y0 <- ysArray -| c
        x1 <- xsArray -| r
        y1 <- ysArray -| r
        let (x',y') = adjust x0 y0 alpha radius x1 y1

        xsArray =| r $ x'
        ysArray =| r $ y'

    forM_ attractors $ \(CAttractor ax ay arArray) -> do
        bounds <- A.getBounds arArray
        forM_ (A.range bounds) $ \ix -> do
            x0 <- xsArray -| ix
            y0 <- ysArray -| ix
            ar <- arArray -| ix
            let (x',y') = adjust ax ay (-alpha) ar x0 y0
            xsArray =| ix $ x'
            ysArray =| ix $ y'

As you can see, this is pretty ugly, even with the aliased (-|) readArray and (=|) writeArray binary operators cleaning things up a bit.  It’s straightforward, and random access to the points is O(1).  There are algorithms for which this is more important than this one, and I can see a list comprehension version coalescing even as I write this, but this kind of code is O(N^2). The constants are sometimes (depending on the adjustment function) quite high, so what I really need is the fastest version possible, probably taking a lot of advantage of fusion in the compiler.  The other problem, which is more subtle,  is that points are updated multiple times in one pass and we always want to use the latest point.  Any functional solution would have to take this into account (and storing all the update deltas and taking their centroid won’t work, because each update wouldn’t then be based on the point’s most current position).

This one seems to handle up to about 250 points quite well, which is decent for an FDL algorithm (keep in mind, that’s 10000 distance calculations per iteration, all of which have to happen before sending any points to the video card and other such matters — doing this 30-50 times a second along with other functionality is harder than it sounds).  Handling more than that would require one of two modifications to the algorithm: either select a random N moves to make per iteration or subdivide the point set into sqrt(N) size blocks, perform the FDL on each of those, and then on their centroid, translating them all by the delta.

So essentially, there were two purposes to this exercise.  One was coding an FDL algorithm in Haskell, but the other was trying out an FDL I hadn’t seen tried before.  This one in particular does not result in the usual circular pattern arising and also handles negative forces as easily as positive forces with a simple, low-overhead function. Oh, and of course the other was showing that I could get realtime performance of such an algorithm (that previously people insisted had to be done in C to be done properly) with a relatively naive implementation.

Force directed layout algorithms are useful for tasks like graph layouts and other dense datasets as well as for performing a sort of “visual” clustering.   Here’s an example showing the starting state, with all points clustered in a square in the middle of the screen:

screenshot-testing-force-directed-layouts-1

And here is one after the layout has run for a few hundred iterations:

screenshot-testing-force-directed-layoutsThe evenness reflected here is because the ideal distances I selected are random.  If there is a system to the distances, there will be a system to the layout.  The neat thing about this algorithm is that given any set of input points and distances, the result of the algorithm at any point is deterministic. Follows is the code that I used to generate the above pictures and a short movie of the forces in action.

This is the code for the algorithm and a test program in Hieroglyph/Buster

A video of the algorithm in action

Unalbe to show flash video

ANN: Buster, the not quite entirely unlike FRP library.

Wednesday, April 1st, 2009

Note: I just released an update to 0.99.2.  Now included are Widgets for getting environment vars and command line args and the EData type has been extracted from Event to be separate.

People seem to like the idea so far, so I’m releasing the thing I posted about yesterday on Hackage and we’ll see who uses it.  I’m calling it Buster for now, because I’m bad at naming my own projects.  Haddock documentation is here. The following is a brief example of how to use the library:

module Main where

import App.Widgets.GtkMouseKeyboard
import App.Behaviours.PrintEvents
import App.EventBus

import Control.Concurrent
import qualified Graphics.UI.Gtk as Gtk
import qualified System.Glib.MainLoop as Gtk

main = do
    Gtk.unsafeInitGUIForThreadedRTS -- buster is multithreaded, so you will need this.
    window <- Gtk.windowNew -- create a window to bind the mouse keyboard widget to
    Gtk.windowSetDefaultSize window 800 600
    Gtk.widgetShowAll window

    let mk = bindMouseKeyboardWidget (Gtk.castToWidget window) -- bind the widget

    -- bus is our main loop
    bus [mk]                                                   -- add the widget to the bus
        (Gtk.mainContextIteration Gtk.mainContextDefault True) -- the bus executes this thunk to collect data from widgets, if necessary (to support the various main loops out there, like GLUT and Gtk)
        printEventsBehaviour                                   -- add the behaviour to the bus

Compiling this after install buster will get you a single Gtk window which tracks your mouse position, scroll wheel direction, button press releases and keyboard and outputs it via the printEventsBehaviour, which is pretty useful overall for debugging. Output looks like this:

name:     Position
source:   GtkWindow.KeyboardMouseWidget
group:    Mouse
ttl:      Iterations 1
emitTime: 20:10:26
580.0
85.0

By inspecting the source of GtkMouseKeyboard and PrintEvents you can see in a pretty straightforward manner (I think anyway) how to code your own behaviours and widgets.  I reccommend sticking to the various default datatypes and not depending on the “a” of “Event a” to be anything in particular, because that means that your widgets will be compatible with everyone else’s.  I intend the “a” to be determined by the actual application developer to do special purpose things.  If people are interested, I’ll open  up a space on here where you can upload your own widgets (as cabalized source dists) and I will compile them into one library called buster-widgets-community which I’ll update whenever other people add widgets.  If not, there will be some basic support for GL and XML-RPC webservices that i’ll add as I need it for The Big Board.

Almost, but not quite entirely unlike FRP.

Tuesday, March 31st, 2009

Yes, it’s to solve a particular problem.  And yes, this is a rough draft of an explanation of how it works.  I’ve not even really solidified the vocabulary yet, but I have this module, EventBus, which couches a large, abstract, interactive (both with the user and the system), multicomponent application in terms of a bus, inputs, behaviours, and events.

  • Time is continuous and infinite.
  • An event is a static, discrete item associated with a particular time.
  • The bus is the discrete view of event in time at an instant.
  • A widget is an IO action that assigns events to a particular time based only upon sampling the outside world (other events and behaviours are irrelevant to it).  e.g. a Gtk Button is a widget, a readable network socket is an widget, the mouse is an widget, the keyboard is an widget, a multitouch gesture engine is a widget.
  • A behaviour is a continuous item — it exists for the entire program and for all times — which maps events on the bus to other events on the bus.  It is an IO action as well — where widgets only sample the outside world and are in a sense read only, behaviours encapsulate reading and writing.

In a sense a behaviour is an application by itself. Examples of behaviours are: writing a file (the file itself is an Event), opening and reading a file, rendering data to the screen, and so on.  Behaviours also encompass more mundane tasks: translating mouse button up and down events that are close together in time to click, double click, and triple click events.  Behaviours are composable and with the presence of a passthrough behaviour form a monoid.  An application is a composition of 1 or more behaviours and a “running” application are those behaviours applied to a sample of time (as we have defined it earlier) and the real world.

Behaviours are attached to the bus.  They are composed by relating their sample of time, their bus, to the sample that other behaviours receive. Thus we define three combinators that relate events: behind, infront, and beside (given by operators (<~<, >~>, and |~| respectively)

If behaviour A is “behind” behaviour B, then behaviour A will see the time on the bus right after behaviour B.  Behaviour B’s filtering of events happens before behaviour A’s sample of time.

If behaviour A is “in front of” behaviour B, then behaviour A will see the time on the bus right before behaviour B.  Behaviour A’s filtering will happen before behavour B’s sample of time.

If behaviour A is “beside” behaviour B, then their order is undefined, and they may filter the bus concurrently.  This leads to a well defined merge of the two filtered buses as defined in the famous paper that inspired Unix diff(1).

So what’s the type of behaviour?

type Behaviour a = Bus a -> Future [Diff a]
data Diff a = Insertion (Event a) | Deletion (Event a)

Future encapsulates the notion that the behaviour doesn’t define where in time the events it introduces are, only that they exist in some time after the time the Behaviour samples.  Diffs encode the notion that time after the Behaviour’s sample can be described by insertions or deletions of events that exist in time before the Behaviour’s sample.

Now let’s enumerate what exists in an event.

data Event a = Event {
    name :: String   -- a name that is unique within the group of events
  , group :: String  -- together with name and src, part of the fully qualified name of the event.
  , src :: String    -- the source of the event
  , eventdata :: [EData a]  -- the data associated with the event
  , timespan :: TimeSpan    -- how long after the event appears on the bus
  , time :: UTCTime }       -- the time that the event appeared on the bus

data TimeSpan = Persistent | Time DiffTime | Iterations Int
data EData a = EString String | EChar Char | EStringL [String] | EInt Int | EIntL [Int] | EDouble Double | EDoubleL [Double] | EOther a

The various names give the behaviours various ways of sampling events on the bus.  The event data defines some basic types and allows for a user defined type as well, and the data itself is a collection (I chose a list, but in fact the ordering doesn’t have to be defined).  The timespan defines the width on the timeline that the event is visible from its start.  Some events are persistent until deletion.  Some events, like mouse clicks, can be defined using Iteration so that they disappear after it’s clear no behaviour will consume them.  Other events exist for some discrete time over the continuous time stream.

Here’s a diagram of how the system works as a whole:

bus1

If people don’t recoil in terror over this post, I might even release it to the wild.  So far I’ve incorporated a behaviour that encapsulates the Gtk mouse/keyboard event gamut and one that handles Renci’s own multitouch library, and I plan to integrate it with other things as I continue developing the Big Board application.  It should be relatively straightforward, even to write a function that generically wraps any simple Gtk Widget and several of the more complex ones into the bus.  Then other things like network collections, web service clients, and the like could become behaviours as well.

Anyway, let me know what you think…

Heiroglyph: Interactively graph a spreadsheet in 99 lines of code.

Thursday, February 5th, 2009

screenshot-sparklines-for-breast-cancer-wisconsin-conttab-1

99 lines of code, even with imports. The app is simple, but it builds upon the sparkline example of the other day. In this app, we import an entire spreadsheet of tabular data and create sparklines for each column of data where the data is floating point. The user can sort the data by a column by clicking on the left side of a sparkline. This doesn’t do a lot of error checking, and only works on spreadsheets where the columnar data type is homogenous, but that’s a large class of datasets, and it simplifies our code considerably. First, our boring imports:

> module Main where
>
> import Graphics.Rendering.Hieroglyph
> import Graphics.Rendering.Hieroglyph.Scaffolds.Interactive
> import qualified Data.Map as Map
> import Data.Map (Map)
> import qualified Data.Set as Set
> import Data.Set (Set)
> import qualified Data.ByteString.Lazy.Char8 as C
> import Data.List (foldl',transpose,mapAccumL,sort)
> import System.Environment (getArgs)
> import Data.Maybe (fromJust)
> import qualified Data.IntMap as IntMap

Now we setup a bit of color for our visualization: attribute sets for describing thin black lines (the sparklines), thicker black lines (for the text), and the background color.

> blackstroke = plain{ strokeRGBA=black, linewidth=0.5 }
> blacktext = plain{ strokeRGBA=black }
> whitefill = plain{ fillRGBA=white, filled=True, outlined=False }

Then we setup our datatype that we’re visualizing. Note that we’re using the interactive scaffolding, so this will be an interactive application.

> data MyData = MyData {
>     interactiveScaffolding :: Scaffolding
>   , spreadsheet :: Map String [Double]
>   , ncolumns :: Int
>   }

Next we setup the data so that it’s responsive to the user input.

> instance Interactive MyData where
>     getInteractiveScaffolding = interactiveScaffolding
>     setInteractiveScaffolding a b = a{ interactiveScaffolding = b }
>
>     interactionFold dat geom

Yes, the 80 is arbitrary. It seems to work well, and to some extent, this is a draft application and isn’t meant to be perfect. Here, if the user clicks on the left side of the application, we check to see which sparkline the user is over and sort that column in ascending order.

>         | xcoord < 80 && getMouseLeftButtonDown dat = dat { spreadsheet = sortSpreadsheet dat }

Then we ignore all other inputs.

>         | otherwise = dat
>                 where Point xcoord ycoord = getMousePosition dat

Next is our familiar sparkline function from my blog post the other day. As a review, we remap all the values in the line to be inside the height of a sparkline and we take those and create a path out of them.

> sparkline width height (Point startx starty) values = path{ begin=point0 , segments=map Line points , attribs=blackstroke }
>    where (point0:points) = zipWith Point xvals yvals
>          xvals = iterate (+(width/n)) startx
>          yvals = map (remap mn mx starty (starty+height)) values
>          (mx,mn,_,_,_,n) = stats values

To visualize a whole spreadsheet, we start at the origin and work our way down using Map.mapAccum to create a visualization that is the same structure as our spreadsheet. We could simply pull the columns out of the spreadsheet and return a list of primitives instead of a map from name to primitives, but this way, we could filter the output from this function based on the structure of the original data. To me, this is closer to the intent of visualization; it’s less about the drawing aspect and more about the data, and manipulations on visualizations are based on the data, not the geometry or the screen.

> visSpreadsheet width height (Point startx starty) mp = snd . Map.mapAccum stackSparklines starty $ mp
>     where stackSparklines starty' values = (starty'+height, sparkline width height (Point startx starty') values)

This function places the names of the columns to the left of the sparklines.

> visSpreadsheetNames height starty names = snd . mapAccumL stackNames starty $ names
>     where stackNames starty' name = (starty'+height, text{ str=name, bottomleft = Point 5 starty', attribs=blacktext })

This merely defines our background rectangle

> background w h = rectangle{ width = w , height = h , attribs=whitefill }

And finally, vis calls our vis functions and also sets up the occlusion order, putting the names and sparklines on top of the background. The height of each sparkline is the number of sparklines divided by the height of the window.

> vis dat = visSpreadsheetNames (sy/ncf) 10 names
>           `over` visSpreadsheet (sx-offset) (sy/ncf) (Point offset 0) sheet
>           `over` background sx sy
>     where sx = getSizeX dat
>           sy = getSizeY dat
>           nc = ncolumns dat
>           ncf = fromIntegral nc
>           sheet = spreadsheet dat
>           Point xcoord ycoord = getMousePosition dat
>           offset = 6 * (fromIntegral . maximum . map length $ names)
>           names = Map.keys sheet

This is a simple function to read a spreadsheet and filter out columns that aren’t numeric. Note that it only checks the first item of each column, so it really doesn’t do sufficient error checking, however it’s faster than loading the entire spreadsheet to check to make sure each is numeric. Yes, we could have used Parsec here instead of defining our own isNumeric, but I wanted to keep down the number of libraries I was importing for tutorial’s sake.

> readSpreadsheet name = do
>      sheet <- (transpose . map (C.split '\t') . C.lines) `fmap` C.readFile name
>      return $ foldl' go Map.empty sheet
>   where go m (x:xs) | isNumeric xs = Map.insert (C.unpack x) (map (read . C.unpack) xs) m
>                     | otherwise = m
>         isNumeric = C.all isNumeral . head
>         isNumeral = (flip Set.member) nums
>         nums = Set.fromList "0123456789."

Remap a range of values to a different range. This is used in the sparkline function, and since our origin is at the top left and values go down, the seeming reversal of the first two arguments of the function is correct for our purposes.

> remap mx mn mn' mx' a = (mx'-mn') * (a-mn) / (mx-mn) + mn'

Once again, you should recognize this stats function from the blog post from the other day.

> stats (x:xs) = finish . foldl' stats' (x,x,x,x*x,1) $ xs
>     where stats' (mx,mn,s,ss,n) x = ( max x mx , min x mn , s + x , ss + x*x , n+1 )
>
> finish (mx,mn,s,ss,n) = (mx,mn,av,va,stdev,n)
>    where av = s/n
>          va = ss/(n-1) - n*av*av/(n-1)
>          stdev = sqrt va

This sorts the spreadsheet based on which sparkline the mouse pointer is on using the mapsort algorithm below.

> sortSpreadsheet dat = mapsort sheet (names !! item)
>     where sy = getSizeY dat
>           nc = ncolumns dat
>           ncf = fromIntegral nc
>           item = round $ (ycoord / sy) * ncf
>           Point _ ycoord = getMousePosition dat
>           names = Map.keys sheet
>           sheet = spreadsheet dat

This function is a little more elegant than it looks. It’s not terribly useful to do things this way here, but this is a lazy version of sorting the entire map. The more straightforward way to sort the map uses transpose to turn columns into rows, but also makes the whole thing head strict on the first element retrieved from the map. This is only head strict per column.

> mapsort mp key = Map.map reorder mp
>     where sort' vs = sort . zip vs $ indices
>           ordering = map snd . sort' $ (mp Map.! key)
>           nitems = (length ordering :: Int)
>           indices = iterate (+1) (0::Int)
>           reorder vs = (let arr = IntMap.fromList . zip indices $ vs in map (arr IntMap.!) ordering)

Now finally our very simple main. Take the argument from the command line that is our spreadsheet and create the visualization from it. Note that this GUI is not motion sensitive, as it really doesn’t need to be.

> main = do
>     [fname] <- getArgs
>     sheet <- readSpreadsheet fname
>     let dat = MyData scaffold sheet (Map.size sheet)
>     simpleGui dat vis ("Sparklines for " ++ fname)

Snippet: sorting the elements of a map

Thursday, February 5th, 2009

So in an application where you have a lot of data columns indexed by name, it’s often necessary to sort that data based on one or more of the columns.  You might be tempted to do this the straightforward way of

  1. Extract the data from the map using toList
  2. unzip and transpose to create rows instead of columns
  3. sort the resultant list using Data.List.sort or Data.List.SortBy
  4. transpose, zip, and put the data into a new map.

Now that’s a reasonable way to do things, but it raises one minor issue, which might not be so minor if you’ve got a really large dataset.  It’s “head-strict”.  The second you access one of the elements of the map, the entire sort procedure takes place for all the data in the map.  If you’re only needing to access a column or two of a large multicolumn dataset once it’s sorted and on top of that, are only needing a few elements of those columns, this is SERIOUSLY overkill.  Here’s a better way:

module MapSort where

import qualified Data.Map as Map
import qualified Data.List as List
import qualified Data.IntMap as IntMap

sort mp key = Map.map reorder mp
    where sort' vs = List.sort . zip vs $ indices
          ordering = map snd . sort' $ (mp Map.! key)
          indices = enumFrom (0::Int)
          reorder vs = IntMap.toAscList . IntMap.fromList $ zip ordering vs

This solution is head-strict per column instead of for the whole map.  How did we achieve that?  We created an ordering out of the sort key column and then applied that ordering on each one of the lists stored in the map individually.  Simple and efficient.  There are probably even more efficiencies to be had here, but this one works for me quite well.  From here, it should be straightforward to implement sortBy and even use a list of keys instead of a single key as the sorting key for the map.

Hieroglyph Haddock docs

Wednesday, February 4th, 2009

Since Hackage is building with GHC 6.10 now and we’re still waiting on a final release of Gtk for 6.10, it’s not generating the Haddock docs for Hieroglyph.  I’m posting them here for the time being. That’s all!

Introducing Hieroglyph. Purely functional 2D visualization on top of Cairo.

Tuesday, February 3rd, 2009

As a followup to my tutorial last year on Beautiful Code, Compelling Evidence, the tutorial on doing infovis and visual analytics in Haskell, I set about trying to do a pure-functional wrapper to Cairo.  After a month or so of banging my and Conal’s head against it, I am ready to (with many MANY thanks to Conal Elliott for his help in getting me to think about this in a semantic way) release a first pass at purely functional 2D drawing, called Hieroglyph.

My original goal was to come up with something that simply produced no IO when creating a scene and was fast enough to create interactive visualizations.  What I was eventually directed toward was semantic design and trying to figure out what a 2D visualization means. Hieroglyph is a first cut at that — a compromise between strict semantic design and staying close enough to Cairo to not make things easy in Hieroglyph that create performance problems in Cairo.  Future versions of Hieroglyph or forks will be less based on Cairo and more based on semantic design, but for now, I’m satisfied with what I have.

So the first question is: What is a visualization?  My answer is that it’s a function from data to a visual.  A visual is an unstructured collection (bag) of primitives.  The data can be any data that is interesting to the user doing the visualizing.

To encapsulate the Visual, we have:

type BaseVisual = [Primitive] 

class Visual a where
  primitives :: a -> BaseVisual

In Graphics.Rendering.Hieroglyph.Visual, I define instances on each of the basic collection types in the Haskell standard library, Visual v =>  Data.Set.Set v, Data.IntMap.IntMap v, Data.Map.Map a v,  [v], and v itself.  This allows the programmer to be unconcerned with how the collection of visual primitives is structured, allowing in many cases a straightforward map from data to primitives with the same structure.  In addition, there are two relationships between Visuals that are important to describe.  One is occlusion.  To describe the occlusion characteristics of two visuals, I have defined the combinator “a `over` b” to describe a composed visual in which a occludes b.  The other is specificity.  There is a concept in visualization of level-of-detail, and this is encapsulated by the combinator “a `moreSpecific` b”, which declares that a is more specific than b.  Both over and moreSpecific describe partial orderings over primitives, which can be used for filtering, or used by the renderer to order and constrain drawing.

Now for the primitives.  A primitive is nothing more than a declaration of attributed geometry.  Our primitives in Hieroglyph are Arc, Path, Rectangle, Text, Union, Image, and Hidden.  I chose to implement primitives as “labeled” functions using the Haskell record modification syntax.  While you can declare a primitive using the type constructor, it is easier and more expedient to use the lower case version of each of the primitive constructors and modify them using the record modification syntax.  The Haddock documentation describes what each record field is for every primitive.

Over the next few days, I’ll be giving examples of Hieroglyph and talking a little bit more about my design process as I do.  In the meantime, enjoy Hieroglyph and let me know about any bugs!