Visualizing iBiblio Traffic Redux

Written by Jeff Heard on February 28th, 2010

So I was asked to pull together an artistic infovis to give as a gift for the director of iBiblio’s keynote speech the other day at code4lib and this is what I came up with.  I thought, “Sure, why not?”  iBiblio has approximately 8TB of web-logs stored on our machines from the beginning of time to the present, and I took those and pulled down all the unique IP addresses for each web server.

I broke all these addresses up into subnets and assigned a unique position in a sphere to each subnet.  Longitude is the first octet, mapped from 0,255 to (-180,180)  Latitude is the second octet, mapped from 0,255 to (-90,90), and the radius out from the center is the third octet.  The subnets are visualized by a PovRay blob component or a sphere in the case of more than 10% of active internet subnets have hit the site.  The radius of the sphere or the strength of the static field behind the blob is the number of IP addresses within the subnet that have touched the site.  The results are thus:


Durham Bike Co-op

The final visualization, a poster, is ordered by the number of subnets that have accessed the site over the years (apologies for the size of this one, but I can’t shrink it much more without it being too small to understand.  Click for the full sized pic (31″x17″) @ 300dpi, be warned!


You’ll notice some color variation in the individual globes.  I scattered three different colored light sources over three-space in POVRay to make for some visual variation in the otherwise dense fields of the visualizations.  If I hadn’t done this, the result would have been that the whole thing would have had a kind of murky gray cast that made individual blobs more difficult to distinguish.  Although the color here is otherwise meaningless, it does give the eye something to separate one bit of the image from another.

SPDE, Semi-functional programming for Processing

Written by Jeff Heard on January 15th, 2010

I’ve been using Processing for awhile now as a tool for prototyping visualizations and getting visualizations on the web.  The main problem with Processing is all the hideous programming habits it encourages; I feel at the end of a long bout of Processing like I’m my 5 year old self back at the keyboard of my Commodore 64, typing 10 PRINT “Hello World!” The only thing it doesn’t have is a goto statement.  So imagine my pleasant surprise when I found you could code in Processing without having to use their awful P4 subdialect of Java.  Instead, I can code in Scala, which while not as purely functional as Haskell, does allow functional programming and encourages more sensible programming. The environment is called SPDE and at its most basic level it consists of The Graft, a specially constructed catch-all package that contains the scala compiler, the simple-build-tool, and the Processing runtime and bells and whistles.

Unlike Processing, SPDE is command-line oriented, but all the familiar things (except one niggling detail, and that is the creation of fonts, which still has to be done in the Processing tool) are there, including applet creation, one-step running, etcetera.  Actually applets are a little less work in SPDE, as typing applet in the tool will cause a browser window to pop up with the created applet, instead of merely exporting the directory and showing it in the Finder (I use a Mac).

To get going, download the graft jarfile and execute it in your operating system’s favorite fashion (e.g. java -jar graft-setup-0.2.2.jar).  You should get a fresh new directory with a Sketch.spde, several directories, and an sbt.bat or sbt file (depending on your OS).  If everything’s worked perfectly, it’ll even pop up a sketch that looks like an epilepsy-triggering barcode.

Now to start doing your own thing.  sbt is the Simple Build Tool.  Sketch.spde is the file you’ll start modifying.  Any file with an .spde extension will be compiled by the compiler into a single sketch (note these are not really modules and you don’t get separate namespaces for each file, unlike Java, so if you have name clashes, they will produce esoteric error messages that e.e. cummings would have been proud to have written).  SPDE and Processing both follow the same model of faking a toplevel by enclosing whatever you type in the runtime class, so you get a lot for free, and although you’re still writing Scala, it’s a little freer than the straight-up Scala you’d write normally.  Here’s an applet that i created in SPDE. Let’s look at some of the code that produced it:

Click to continue »

Radial Bar Charts

Written by Jeff Heard on December 15th, 2009

The other day, I was presented with a problem of visualizing a small 3-D Excel spreadsheet in a way that allowed the viewer to:

  • Compare columns for balance.
  • Compare the third dimension alternates (professors vs. grad students).
  • Compare the rows for balance.
  • Compare the overall sums.

What I came up with was this:


Along the ring are the names of the columns of data in the spreadsheet.  On each spoke are the rows in the two layers of the spreadsheet visualized as bar charts.  Stretching counterclockwise are the bars for senior faculty.  Stretching clockwise are the bars for graduate research assistants.  In the center, visualized as bubbles of varying radii are the totals from both bar charts along the spoke.  Note the light, thin rings connecting each bar. These are designed to draw a viewer’s eye around the chart, connecting bar to bar visually to inform the viewer that comparison is relevant.

This technique is appropriate for smaller charts.  I have a feeling that too many columns would get to be too visually busy, although I suspect the chart’s behaviour as rows increase may well be more robust, especially if the bar chart was changed to an area chart.

HDR imaging library for Haskell based on pfsutils

Written by Jeff Heard on August 28th, 2009

PFS stands for Portable Floating-point Streams, and it’s a framework for reading and writing HDR images in a variety of formats and also tonemapping them. The HDRUtils library is an Haskell library for reading these streams and formatting them into Haskeller friendly 2D C-compatible arrays of 32bit floats.On top of these are a simple index/assign bijection that uses Russell O’Connor’s Data.Colour package to deal with colorspaces in a human-perception centric way.  Most tonemapping algorithms are based loosely on human perception models, so Haskell is ideally suited to developing new tonemapping algorithms based on roconnor’s model.

The general procedure for working with the streams is something like:

import Graphics.Image.PFS
import Graphics.Image.PixelMap

main = getFrame >>= myToneMappingProcedure . toPixelMap >>= putFrame . fromPixelMap

Where myToneMappingProcedure is something that takes pixels and rewrites them algorithmically.  I intend soon to add a more functional layer to this, using immutable arrays or STArrays, but I haven’t decided on the best way to do it yet. This at least makes the library available.  I’d like to see GdkPixbuf or DevIL images map to PixelMaps, eventually, and to see frame-aligning algorithms, convolutions, and other things layer on top of this library as applicative combinators.  In the meantime, however, have fun with what I have, and if you have a good idea for a combinator architecture, let me know and we’ll work it out.

This project is Open Sourced on github

Open Sourcing The Big Board

Written by Jeff Heard on August 26th, 2009

The Big Board from Renaissaince Computing Institute on Vimeo.

The Big Board is a real-time collaborative environment for mapping.  Users open “conference rooms” on a shared map, and join conversations in these conference rooms, much like in a regular teleconferencing application.  However, instead of sharing faces and powerpoints and speech over the wire, users draw on and add content to a shared map or very large image.

The platform is somewhat similar to Google Maps, ArcInfo, or Microsoft VirtualEarth, however the similarity stops where real-time collaboration begins.  Instead of a static map or a remote feed of content, the users of The Big Board have full interactive control of the surface they are working on.  Adding content to the map will cause that content to show up to all participants in the teleconference.

Content is not merely a polygon or line nor even a bit of text attached to that line or polygon.  Content is dynamic and rich, using a PmWiki based backend to allow users to create their own hypertext content on the fly and anchor it to polygons, lines, points, rectangles, or circles on the map.  Users can also anchor geometry out to the greater world-wide-web.  Clicking on the anchor (that is, the piece of geometry overlayed onto the map) will open a browser to the content.

The Big Board with content in Firefox

The Big Board with content in Firefox

In addition to dynamic, user-generated content, the Big Board will also handle arbitrary geometry overlays (provided by the server), and web-services which can add content as a sort of “feed” much in the same way as Google Earth or Maps can take syndicated feeds of data.

The intent of the program is that it be used by emergency managers as a part of a distributed or virtual Emergency Operations Center (EOC).  Instead of all emergency management coordinators needing to be in a room or having to communicate location over text or speech, content can be directly attached to a point on an orthophotographic map.  In addition, the intent is that self-locating resources such as field-officers with cellphones can be see on the map and can add content such as pictures of a disaster scene, info on people that need further assistance, mark static resources, or survey an area.

Renci Open-Source

Today, we are open-sourcing The Big Board and opening it up to the larger community.  We hope this will lead to a more useful and wider-used application.  My own vision for the application is that it remain fairly close in feature-set to the way it is now, with the addition of a few specific abilities, such as the ability to warp and overlay an image and support GeoRSS formatted content feeds. It is not intended to ever become a full featured GIS client nor is it intended to be overly similar to Google Maps.

The Big Board is written in Haskell.  In addition to other goals, I intend this application to show that large, complex applications can be written quickly and the code can be more understandable than the same application written in Java or C++.  Anyone with questions about the code structure or layout can contact me directly via the blog or at jeff at renci dot org.  I will put a Windows downloadable executable on here tomorrow.

Stable source snapshot (.tar.gz)

The Big Board on github

(I’m working on it, folks - never used subversion before so if it doesn’t work, tell me)

User and Administrator’s Guide

Followup to my earlier post on Hilbert curve timeseries plots

Written by Jeff Heard on August 24th, 2009

Note that the Hilbert code is not mine.  Rather it is from a Haskell Golf Challenge post in a pastebin and appropriated because I wanted a quick Hilbert curve without having to think through a hazy minded Friday evening.  Can’t find the original post now, actually.  That’ll teach me to bookmark things that are important.  Here’s the other person’s code.  If anyone can help me claim it, I will post your name as author.  Again, apologies in advance for that :-(.

h :: [Bool] -> ([Bool], [Bool])
h = l

go :: Bool -> Bool -> ([Bool], [Bool]) -> ([Bool], [Bool])
go x y (xs,ys) = (x:xs, y:ys)

l, r :: [Bool] -> ([Bool], [Bool])
l (False:False:ns) = go False False $ right (r ns)
l (False:True:ns) = go False True $ l ns
l (True:False:ns) = go True True $ l ns
l (True:True:ns) = go True False $ left (r ns)
l _ = ([], [])

r (False:False:ns) = go False True $ left (l ns)
r (False:True:ns) = go False False $ r ns
r (True:False:ns) = go True False $ r ns
r (True:True:ns) = go True True $ right (l ns)
r _ = ([], [])

left, right :: ([Bool], [Bool]) -> ([Bool], [Bool])
left (False:xs, False:ys) = go False True $ left (xs,ys)
left (False:xs, True:ys) = go True True $ left (xs,ys)
left (True:xs, False:ys) = go False False $ left (xs,ys)
left (True:xs, True:ys) = go True False $ left (xs,ys)
left _ = ([], [])

right (False:xs, True:ys) = go False False $ right (xs,ys)
right (True:xs, True:ys) = go False True $ right (xs,ys)
right (False:xs, False:ys) = go True False $ right (xs,ys)
right (True:xs, False:ys) = go True True $ right (xs,ys)
right _ = ([], [])

-- Infrastructure for testing:
bits :: Int -> Int -> [Bool]
bits n k = go n k [] where
  go 0 k = id
  go n k = go (n-1) (k `div` 2) . (odd k:)

num :: [Bool] -> Double
num (False:xs) = num xs / 2
num (True:xs) = (num xs + 1) / 2
num [] = 0

hilbert :: Int -> Int -> (Double, Double)
hilbert n k = (\(x,y) -> (num x, num y)) (h (bits n k))


Here begins my own code.  To use the Data.Colour.blend function, I need to normalize all the values.  Here is that function, which could be made considerably more efficient with a minimax instead of independently calling minimum and maximum, but again, the point here is illustration of a technique, not the most beautiful code.

normalize :: [Double] -> [Double]
normalize values = [(val-minval) / (maxval-minval) | val <- values]
    where minval = minimum values
          maxval = maximum values

Following that, we have a function and its helper for creating a hilbert plot of the data.  Note the use of the constant 64.  The Hilbert code above keeps everything within a unit vector of the origin, so we scale out for the resolution.  The resolution should properly be ceiling . log2 of the number of items in the list, which could be calculated efficiently, but it would clutter the code.

vis :: Int -> [Double] -> BaseVisual
vis n values = concat $ zipWith vis'
                [(64*x,64*y) | (x,y) <- (map (hilbert n) [0..2^n-1])]
                (normalize values)

Finally here is the visualization of a single point whose x,y vector is now the Hilbert point for its position in the timeseries.  We blend between two colors, blue for 0 and orange for 1. This could just as easily be a more complicated colourmap, but this is the code that generated the colormap from the previous post.

vis' :: (Double,Double) -> Double -> BaseVisual
vis' (x,y) val = fillcolour (blend val c0 c1)
               . filled True
               . outlined False
               $ arc{ center = Point x y, radius=0.5 }
    where c0 = opaque blue
          c1 = opaque orange

And finally the main program.  All we do here is take the filename from the arguments, read in the lines as Double values, and map those values to colored hilbert points using our vis function.

main = do
  name <- (!!0) <$> getArgs
  k <- map read . tail . words <$> readFile name
  let visualization = vis degree k
      degree = (ceiling $ log (realToFrac . length $ k) / log 2)
  renderToSVG (name ++ ".svg") 128 128 visualization

Plotting timeseries in space filling curves

Written by Jeff Heard on August 24th, 2009

Fitting many timeseries in the same area and formatting them for quick comparison is challenging and an important problem.  For example, say that you want to pick stocks in the S&P 500 that perform similarly or which respond similarly to certain numerical transformations (say they have the same trend lines or the same noise frequency).  Line graphs are inappropriate for this because they’re too visually noisy and because you can only compare easily up to down, not left to right, visually, limiting the number of graphs on the page at the same time.

For certain comparisons, we aren’t interested in the individual values of a timeseries nor its trend along time so much as we are how it relates to other timeseries in the same dataset.  For this purpose over the weekend, I created these:


These are timeseries simply plotted along a Hilbert curve.  The 8 you see here are the first 8 stocks in alphabetical order of the S&P 500, plotted with one segment per day for all trading days in the last decade.  Blues are lower values.  Oranges are higher values.  They have each been normalized internally for their minimum and maximum prices, so the shape of the curve is more important than the maxima or minima.

In this figure, we see two glyphs that are similar: the first and the next to last.  These correspond to Agilent and Adobe, respectively.  Let’s look at their 5 year charts to see how similar they in fact are:


Adobe (ADBE). Source: Google Finance

Source: Google Finance

Agilent (A). Source: Google Finance

Now, despite some surface differences, you’ll note that the shapes of the graphs are indeed similar, and that when Agilent goes up, Adobe tends to go up, and when Agilent goes down, Adobe tends to as well.  Now, these two companies are unrelated in terms of market sector, so can we really say that these similarities matter?  If it were a short trend, maybe not, but this is a 5 year graph that we’re comparing here, and they’re remarkably similar to each other.  It’s worth delving deeper into.  They could be majority held by the same people, they could track for reasons that aren’t immediately obvious, they could have the many of the same people on their board of directors, or it could in the end be complete coincidence (but paradoxically predictive coincidence).

Here, by the way, for comparison is Apple, the third glyph from the Hilbert graph, shown as an area chart on the same timescale as A and ADBE:


Apple (AAPL). Source: Google Finance

One might expect, considering that the majority of Adobe’s projects run on Apple computers and a majority of Apple computers run Adobe products, that when Apple does well, Adobe does well, and vice versa.  We can see that this is not the case.  Much as the two Hilbert graphs are unrelated, AAPL and ADBE’s performance are unrelated.

Noe also that I managed to pack 8 timeseries in less space total using the Hilbert glyphs than the Google finance graphs.

And yes, by the way, this was written in Haskell using Hieroglyph.  I don’t have the code on me right now, but I’ll post it later on today when I’m in front of my personal computer.

C2HS example: To save other people frustration

Written by Jeff Heard on July 10th, 2009

C2Hs is a wonderful little tool.  It generates a lot of the boilerplate code for binding C libraries to Haskell and saves wrists and frustration. However, the documentation I’ve manage to find on it has at times been buggy and incomplete.  I can’t guarantee the following tutorial is idiomatic c2hs — actually I can guarantee that it isn’t — but I can guarantee that it works.

Today I bound libshapefile to Haskell using c2hs.  The following code is not at all Haskell-ish, but I find that it is best to write a true-to-C interface to C code first and then write Haskellish interfaces on top of that.  The following bindings are generated, along with getters and setters for the items in the SHPObject structure:

open :: String -> String -> IO (SHPHandle)
getInfo :: SHPHandle -> IO (Int, Int, [Double], [Double])
readObject :: SHPHandle -> Int -> IO (SHPObject)
close :: SHPHandle -> IO ()
create :: String -> Int -> IO (SHPHandle)
createSimpleObject :: Int -> Int -> [Double] -> [Double] -> [Double] -> IO (SHPObject)
createObject :: Int -> Int -> Int -> [Int] -> [Int] -> Int -> [Double] -> [Double] -> [Double] -> [Double] -> IO (SHPObject)
computeExtents :: SHPObject -> IO ()
writeObject :: SHPHandle -> Int -> SHPObject -> IO (Int)
destroyObject :: SHPObject -> IO ()
rewindObject :: SHPHandle -> SHPObject -> IO (Int)

I’m going to intersperse the c2hs code that I wrote with some explanatory text in hopes that someone finds it helpful.  If the authors of c2hs or someone who is more of an expert in this than me has comments on the code, I’ll be happy to include them in the tutorial.

{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE TypeSynonymInstances #-}
module Gis.Shapefile.Internal where

#include <shapefil.h>

import C2HS
import Foreign.Ptr
import System.IO.Unsafe
import Foreign.C
import Control.Monad
import Control.Applicative ((<$>))

The library that you’re binding to needs to be enclosed in a {#context #} tag first.  There are other parameters that can go into context, but these weren’t needed for what I was doing, and don’t seem to be needed super-commonly.  The C2HS import can be found in $HOME/.cabal/share if you’ve installed c2hs with Cabal.  Noe that the #include is left bare up top.  That’s correct even though it’s invalid Haskell code.

For many enumerations, c2hs provides the handy-dandy {#enum #} construct, but for #define-d constants, we have to wrap our own.  I could make these instances of Enum, but it seems like too much work for an internal interface.

{#context lib="shapefile" #}

-- these are defined constants, not an enum, so we can't just use the #enum hook
shptNull = 0
shptPoint = 1
shptArc = 3
shptPolygon = 5
shptMultipoint = 8
shptPointZ = 11
shptArcZ = 13
shptPolygonZ = 15
shptMultipointZ = 18
shptPointM = 21
shptArcM = 23
shptPolygonM = 25
shptMultipointM = 28
shptMultiPatch = 35

data SHPType =
  | TPoint
  | TArc
  | TPolygon
  | TMultipoint
  | TPointZ
  | TArcZ
  | TPolygonZ
  | TMultipointZ
  | TPointM
  | TArcM
  | TPolygonM
  | TMultipointM
  | TMultipatch

shpTypeToIntegral TNull = 0
shpTypeToIntegral TPoint= 1
shpTypeToIntegral TArc=3
shpTypeToIntegral TPolygon=5
shpTypeToIntegral TMultipoint=8
shpTypeToIntegral TPointZ=11
shpTypeToIntegral TArcZ=13
shpTypeToIntegral TPolygonZ=15
shpTypeToIntegral TMultipointZ=18
shpTypeToIntegral TPointM=21
shpTypeToIntegral TArcM=23
shpTypeToIntegral TPolygonM=25
shpTypeToIntegral TMultipointM=28
shpTypeToIntegral TMultipatch=35

integralToSHPType :: (Integral a) => a -> SHPType
integralToSHPType 0=TNull
integralToSHPType 1=TPoint
integralToSHPType 3=TArc
integralToSHPType 5=TPolygon
integralToSHPType 8=TMultipoint
integralToSHPType 11=TPointZ
integralToSHPType 13=TArcZ
integralToSHPType 15=TPolygonZ
integralToSHPType 18=TMultipointZ
integralToSHPType 21=TPointM
integralToSHPType 23=TArcM
integralToSHPType 25=TPolygonM
integralToSHPType 28=TMultipointM
integralToSHPType 35=TMultipatch

Note that I declare opaque types two different ways below.  I couldn’t get c2hs to understand the SHPHandle declaration, because it was itself an opaque type in C.  Well, because of something anyway.  Anyway, because of that, I wrote out the opaque type by hand.  Yes it looks recursive, but GHC handles it just fine, and if you use the -XEmptyDataDecls extension this is what the code actually reduces to.

newtype SHPHandle = SHPHandle (Ptr (SHPHandle))
{#pointer *SHPObject newtype #}

Now to define a bunch of getters and setters for the SHPObject type.  The c2hs documentation doesn’t seem to require that you unwrap the opaque type yourself into the Ptr type, but I had to.  It compiles fine if you don’t, but then you have to unwrap it in your other code, and I like to keep all my foreign data in one module where possible.

getSHPType (SHPObject x) = integralToSHPType <$> {#get SHPObject->nSHPType #} x
getShapeId (SHPObject x) = {#get SHPObject->nShapeId #} x
getParts (SHPObject x) = {#get SHPObject->nParts #} x
getPartStart (SHPObject x) = {#get SHPObject->panPartStart #} x
getPartType (SHPObject x) = {#get SHPObject->panPartType #} x
getVertices (SHPObject x) = {#get SHPObject->nVertices #} x
getX (SHPObject x) = {#get SHPObject->padfX #} x
getY (SHPObject x) = {#get SHPObject->padfY #} x
getZ (SHPObject x) = {#get SHPObject->padfZ #} x
getM (SHPObject x) = {#get SHPObject->padfM #} x
getXMin (SHPObject x) = {#get SHPObject->dfXMin #} x
getYMin (SHPObject x) = {#get SHPObject->dfYMin #} x
getZMin (SHPObject x) = {#get SHPObject->dfZMin #} x
getMMin (SHPObject x) = {#get SHPObject->dfMMin #} x
getXMax (SHPObject x) = {#get SHPObject->dfXMax #} x
getYMax (SHPObject x) = {#get SHPObject->dfYMax #} x
getZMax (SHPObject x) = {#get SHPObject->dfZMax #} x
getMMax (SHPObject x) = {#get SHPObject->dfMMax #} x

setSHPType (SHPObject p) v = {#set SHPObject->nSHPType #} p (shpTypeToIntegral v)
setShapeId (SHPObject p) = {#set SHPObject->nShapeId #} p
setParts (SHPObject p) = {#set SHPObject->nParts #} p

In the next line we see some Haskell code being obviously mixed with c2hs markup.  This is standard practice and it’s safer than it seems like it might be.  c2hs parenthesizes most things and seems to handle Haskell syntax perfectly.  Here we have to marshal an list of doubles to a pointer.  The newArray is correct here, because the array needs to live on the C heap and not be garbage collected.  The new- as opposed to alloca- means that it is our responsibility to deallocate it.  Fortunately, destroyObject (bound later) releases that memory for us.

setPartStart (SHPObject p) v = newArray v >>= {#set SHPObject->panPartStart #} p
setPartType (SHPObject p) v = newArray v >>= {#set SHPObject->panPartType #} p
setVertices (SHPObject p) = {#set SHPObject->nVertices #} p
setX (SHPObject p) v = newArray v >>= {#set SHPObject->padfX #} p
setY (SHPObject p) v = newArray v >>= {#set SHPObject->padfY #} p
setZ (SHPObject p) v = newArray v >>= {#set SHPObject->padfZ #} p
setM (SHPObject p) v = newArray v >>= {#set SHPObject->padfM #} p
setXMin (SHPObject p) = {#set SHPObject->dfXMin #} p
setYMin (SHPObject p) = {#set SHPObject->dfYMin #} p
setZMin (SHPObject p) = {#set SHPObject->dfZMin #} p
setMMin (SHPObject p) = {#set SHPObject->dfMMin #} p
setXMax (SHPObject p) = {#set SHPObject->dfXMax #} p
setYMax (SHPObject p) = {#set SHPObject->dfYMax #} p
setZMax (SHPObject p) = {#set SHPObject->dfZMax #} p
setMMax (SHPObject p) = {#set SHPObject->dfMMax #} p

Now for a few utility functions: to/fromSHPHandle, which marshals and unmarshals a bare pointer, allocate/peek4, and peekInt.  We’ll use these soon in our {#fun #} bindings, but it appears from working with c2hs that you can’t curry arguments into marshallers or use lambda expressions; I’m not sure if this is me or if this is c2hs, but the solution of writing utility functions works.

toSHPHandle = SHPHandle . castPtr
fromSHPHandle (SHPHandle x) = castPtr x

arrDouble x = ((newArray . map realToFrac $ x) >>=)
arrInt x = ((newArray . map fromIntegral $ x) >>=)

allocate4 = allocaArray 4 

peek4 d = do
    lst <- (peekArray 4 d :: IO [CDouble])
    return . map cFloatConv $ lst

peekInt i = peek i >>= return . cIntConv    

{#fun SHPOpen as open
    { `String'
    , `String'
    } -> `SHPHandle' toSHPHandle #}

Right.  So I’ll explain the two marshallers on either side of the paragraph here.  Argument and return types are outlined in c2hs using a backtick followed by the type followed by a forward tick.  That is not a formatting mistake.  To the left of the arrow are the arguments, and to the right of the arrow is the return type. On the left side of each type can be an “in-marshaller” and on the right, an “out-marshaller”, which translates between Haskell types and C types. There are also two signifiers that can come at the end of the marshaller: - and *.  the ‘-’ on the in-marshaller signifies that the argument is to be handled entirely within the function def and not passed in as a parameter.  The * signifier says that the result will be within the IO monad and to handle it specially.  A good rule seems to be that if you’re working with pointers, you’ll need it.  Out marshallers in the arg list imply that the parameters are “out” parameters, meant to be returned as part of the function return.  c2hs generally handles this as a tuple.  So the following function will have type SHPHandle -> IO (Int,Int,[Double],[Double]).

-- void SHPGetInfo(SHPHandle, int*, int*, double*, double*)
{#fun SHPGetInfo as getInfo
    { fromSHPHandle `SHPHandle'
    , alloca- `Int' peekInt*
    , alloca- `Int' peekInt*
    , allocate4- `[Double]' peek4*
    , allocate4- `[Double]' peek4*
    } -> `()' #}

Note the use of “id” as a marshaller below.  It seemed like c2hs should have a default marshaller for opaque types, but compiling it told me otherwise, so I added it and it worked.  Experimentation, experimentation!

{#fun SHPReadObject as readObject
    { fromSHPHandle `SHPHandle'
    , `Int'
    } -> `SHPObject' id #}

{#fun SHPClose as close
    { fromSHPHandle `SHPHandle' } -> `()' #}

{#fun SHPCreate as create
    { `String'
    , `Int'
    } -> `SHPHandle' toSHPHandle  #}

{#fun SHPCreateSimpleObject as createSimpleObject
    { `Int'
    , `Int'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    } -> `SHPObject' id #}

{#fun SHPCreateObject as createObject
    { `Int'
    , `Int'
    , `Int'
    , arrInt* `[Int]'
    , arrInt* `[Int]'
    , `Int'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    , arrDouble* `[Double]'
    } -> `SHPObject' id #}

{#fun SHPComputeExtents as computeExtents
    { id `SHPObject' } -> `()' #}

{#fun SHPWriteObject as writeObject
    { fromSHPHandle `SHPHandle'
    , `Int'
    , id `SHPObject' } -> `Int' #}

{#fun SHPDestroyObject as destroyObject
    { id `SHPObject' } -> `()' #}

{#fun SHPRewindObject as rewindObject
    { fromSHPHandle `SHPHandle'
    , id `SHPObject' } -> `Int'  #}

This compiled and ran on my machine. Note that you do have to have libshp and shapfil.h installed on your machine to compile this example.

Anyway, I hope this helps someone.  My first c2hs wrapper was a heck of an experience, but the code seems easy to maintain and the results of it not much different than what I would have written by hand except that it’s incredibly shorter.


A quick buster example

Written by Jeff Heard on July 9th, 2009

Some people have asked of late for examples of Buster.  What follows is the (annotated) code that I used to generate the FDL video of the particles swarming around the lambda in the last post:

module Main where

import qualified Data.Array.IO as A
import Graphics.Rendering.Hieroglyph
import Graphics.Rendering.Hieroglyph.OpenGL
import App.EventBus
import IsoFDL
import Random
import Data.Colour
import Data.Colour.Names
import System.Random
import Control.Concurrent

dim = 480
minSeedCoord = fromIntegral $ quot dim 2 - 80
maxSeedCoord = fromIntegral $ quot dim 2 + 80
minIdealDist = (-100)
maxIdealDist = 100
npts = 100
basestrength = 4 

randomRIOs :: Double -> Double -> IO [Double]
randomRIOs a b = do
    g <- getStdGen
    return $ randomRs (a, b) g

This is the only behaviour we define, adjustLayout, and it runs a single layout iteration over the data.  We take the old locales of points, compute an FDL iteration, take the new locales of points, and produce the event, which is “Visible” (this is important, as Hieroglyph looks for all items in the Visible group, assumes they are o the class Visual, and attempts to render them.  We also produce a re-rendering request, and return the list of rerender and our layout as a future to be used at the appropriate time by the buster framework.  One final thing to note about our event is that it should live for 10 iterations.  Often geometry is persistent, but this was used to give the “contrail” effect to the particles as they move, effectively layering slightly transparent geometry over the top of older slightly transparent geometry to create the effect.

The Hieroglyph code is also fairly straightforward.  We apply linewidth and line color combinators (using the colours in Russell O’Connor’s excellent Data.Colour library) to a list of paths which are slightly exaggerated (that’s the +/-(x1-x0) and +/-(y1-y0) operations on the Points in the path) vectors along the path that was most recently traveled.

adjustLayout :: [Attractor] -> (Arr (Int,Int) Double) -> (Arr Int Double) -> (Arr Int Double) -> Behaviour [EData HieroglyphGLRuntime]
adjustLayout poles dists xs ys b = do
    pts <- externalizeLayout xs ys
    layoutIteration basestrength poles xs ys dists
    pts' <- externalizeLayout xs ys
    event' <- produce "Visible" "" (show . head $ pts') (Iterations 10)
                . (:[])
                . EOther
                . Geometry
                . linewidth 1
                . strokecolour (dissolve 0.3 (opaque darkblue))
                . name "points"
                $ zipWith (\(x0,y0) (x1,y1) -> path{ begin=Point (x0-(x1-x0)) (y0-(y1-y0)), segments=[Line $ Point (x1+(x1-x0)) (y1+(y1-y0))] }) pts pts'
    rerender <- produce "Hieroglyph" "" "Rerender" once []
    future b . return $ [event',rerender]

The next function, behaviour would normally be pure, but because we have some data that is internal to the adjustLayout behaviour, it’s got a do.  One of the things that I will change in the near future about Hieroglyph and OpenGL is to make the events it can respond to more polymorphic.  Currently I require that to use Hieroglyph in OpenGL that the event data type is [EData HieroglyphGLRuntime].  That’s pretty inane if I do say so myself, and in the future, there will be a constraint more along the lines of (HieroglyphInteractiveEventData a) => a rather than a static type to be determined at compile time.  However, with the current state of things, this is our behaviour, and it’s still pretty straightforward.  renderBehaviour depends on the result of adjustLayout.  That’s it.  The <~< establishes the dependency relationship and buster takes care of the rest.  The “poles” are a series of attractors (defined in IsoFDL.hs), which are static fields attached to a fixed point in space, attracting or repelling depending on the constant given them.

behaviour = do
    randomDistances <- randomRIOs minIdealDist maxIdealDist >>= A.newListArray ((0,0),(npts-1,npts-1)) . take (npts^2) . drop (2*npts)
    randomXs <- randomRIOs minSeedCoord maxSeedCoord >>= A.newListArray (0,npts-1) . take npts
    randomYs <- randomRIOs minSeedCoord maxSeedCoord >>= A.newListArray (0,npts-1) . take npts . drop npts

    poles <- mapM (\(x,y) -> compileAttractor . Attractor x y . take npts $ repeat 150)
        [(100,(440-100)), (170,(440-160)), (220,(440-220)), (160,(440-280)),
         (100,(440-320)), (280,(440-280)), (350,(440-320)), (380,(440-300))]

    return $ renderBehaviour <~< adjustLayout poles randomDistances randomXs randomYs

main = behaviour >>= boilerplateOpenGLMain  [initializeBus "Testing Force Directed layouts" dim dim]

Finally, our main function is incredibly simple.  We simply bind the behaviour to the boilerplate buster OpenGL main.  The other argument to boilerplateOpenGLMain is a list of “widgets” to be bound to the bus at the beginning of the application.

An ugly force-directed layout implementation

Written by Jeff Heard on July 9th, 2009

Updated 7.9.2009: I’ve added another video showing the effect of attractors on the layout

Unalbe to show flash video

I’ve been working on visualizations silently of late.  Today, though, I threw together a quick FDL (force-directed layout) algorithm based on POV-Ray’s static-field isosurface or “blob” equation, which is (sorry, but I’m using plaintext)

density = strength * (1 - (distance / radius)^2)^2

The documented equation has a problem, in that if distance > radius, the density approaches infinity fast, so I change it to be:

density = strength * (1 - (min(distance, radius) / radius)^2)^2

That solves the infinity problem, and now all there is to the algorithm is to apply it:

adjust :: Double -> Double -> Double -> Double -> Double -> Double -> (Double,Double)
adjust sx sy ss sr dx dy = if isNaN vx || isNaN vy then (dx,dy) else (dx+force * vx, dy+force * vy)
    where force = density ss sr dist
          dist = sqrt ((dx-sx)^^2 + (dy-sy)^^2)
          vx = (dx-sx) / dist
          vy = (dy-sy) / dist

Note that it is possible for dist to be NaN for a single point-to-point interaction, so we correct for that.  Basically, this function is a single adjustment from point set to a point set that fits our force constraint a little better.  In the following code, we take a static vector, <sx,sy> and a movable vector <dx,dy> and take the distance.  We apply the density function to the distance using ss as our strength and sr as the radius or distance at which the density function falls off to zero.  We return the translated point P’ which is translated along the normal vector <vx,vy> by the variable “force”. Simple and straightforward.

Now I know… by my title, you’re thinking, “What’s ugly about that?” Well, when I code fast, I use shortcuts.  The following code, which uses this adjustment function as a kernel for the actual force-directed layout algorithm is imperative.  I’ve tried FDL and MDS (multi-dimensional scaling) algorithms before using lists and tries and always ran into enough overhead that it significantly diminished the number of points that are viable to run a single iteration in real-time.  I’m sure there’s a way to do it, but I solicit the community’s help in suggesting a faster way than this.  Yes, I could have used STUArray and avoided IO, but that doesn’t really eliminate the imperative nature of things.

layoutIteration :: Double -> [Attractor] -> (Arr Int Double) -> (Arr Int Double) -> (Arr (Int,Int) Double) -> IO ()
layoutIteration alpha attractors xsArray ysArray radii = do
    bounds <- A.getBounds radii
    forM_ (A.range bounds) $ \(r,c) -> when (r /= c) $ do
        radius <- radii -| (r,c)
        x0 <- xsArray -| c
        y0 <- ysArray -| c
        x1 <- xsArray -| r
        y1 <- ysArray -| r
        let (x',y') = adjust x0 y0 alpha radius x1 y1

        xsArray =| r $ x'
        ysArray =| r $ y'

    forM_ attractors $ \(CAttractor ax ay arArray) -> do
        bounds <- A.getBounds arArray
        forM_ (A.range bounds) $ \ix -> do
            x0 <- xsArray -| ix
            y0 <- ysArray -| ix
            ar <- arArray -| ix
            let (x',y') = adjust ax ay (-alpha) ar x0 y0
            xsArray =| ix $ x'
            ysArray =| ix $ y'

As you can see, this is pretty ugly, even with the aliased (-|) readArray and (=|) writeArray binary operators cleaning things up a bit.  It’s straightforward, and random access to the points is O(1).  There are algorithms for which this is more important than this one, and I can see a list comprehension version coalescing even as I write this, but this kind of code is O(N^2). The constants are sometimes (depending on the adjustment function) quite high, so what I really need is the fastest version possible, probably taking a lot of advantage of fusion in the compiler.  The other problem, which is more subtle,  is that points are updated multiple times in one pass and we always want to use the latest point.  Any functional solution would have to take this into account (and storing all the update deltas and taking their centroid won’t work, because each update wouldn’t then be based on the point’s most current position).

This one seems to handle up to about 250 points quite well, which is decent for an FDL algorithm (keep in mind, that’s 10000 distance calculations per iteration, all of which have to happen before sending any points to the video card and other such matters — doing this 30-50 times a second along with other functionality is harder than it sounds).  Handling more than that would require one of two modifications to the algorithm: either select a random N moves to make per iteration or subdivide the point set into sqrt(N) size blocks, perform the FDL on each of those, and then on their centroid, translating them all by the delta.

So essentially, there were two purposes to this exercise.  One was coding an FDL algorithm in Haskell, but the other was trying out an FDL I hadn’t seen tried before.  This one in particular does not result in the usual circular pattern arising and also handles negative forces as easily as positive forces with a simple, low-overhead function. Oh, and of course the other was showing that I could get realtime performance of such an algorithm (that previously people insisted had to be done in C to be done properly) with a relatively naive implementation.

Force directed layout algorithms are useful for tasks like graph layouts and other dense datasets as well as for performing a sort of “visual” clustering.   Here’s an example showing the starting state, with all points clustered in a square in the middle of the screen:


And here is one after the layout has run for a few hundred iterations:

screenshot-testing-force-directed-layoutsThe evenness reflected here is because the ideal distances I selected are random.  If there is a system to the distances, there will be a system to the layout.  The neat thing about this algorithm is that given any set of input points and distances, the result of the algorithm at any point is deterministic. Follows is the code that I used to generate the above pictures and a short movie of the forces in action.

This is the code for the algorithm and a test program in Hieroglyph/Buster

A video of the algorithm in action

Unalbe to show flash video