...now browsing by category


Visualizing iBiblio Traffic Redux

Sunday, February 28th, 2010

So I was asked to pull together an artistic infovis to give as a gift for the director of iBiblio’s keynote speech the other day at code4lib and this is what I came up with.  I thought, “Sure, why not?”  iBiblio has approximately 8TB of web-logs stored on our machines from the beginning of time to the present, and I took those and pulled down all the unique IP addresses for each web server.

I broke all these addresses up into subnets and assigned a unique position in a sphere to each subnet.  Longitude is the first octet, mapped from 0,255 to (-180,180)  Latitude is the second octet, mapped from 0,255 to (-90,90), and the radius out from the center is the third octet.  The subnets are visualized by a PovRay blob component or a sphere in the case of more than 10% of active internet subnets have hit the site.  The radius of the sphere or the strength of the static field behind the blob is the number of IP addresses within the subnet that have touched the site.  The results are thus:


Durham Bike Co-op

The final visualization, a poster, is ordered by the number of subnets that have accessed the site over the years (apologies for the size of this one, but I can’t shrink it much more without it being too small to understand.  Click for the full sized pic (31″x17″) @ 300dpi, be warned!


You’ll notice some color variation in the individual globes.  I scattered three different colored light sources over three-space in POVRay to make for some visual variation in the otherwise dense fields of the visualizations.  If I hadn’t done this, the result would have been that the whole thing would have had a kind of murky gray cast that made individual blobs more difficult to distinguish.  Although the color here is otherwise meaningless, it does give the eye something to separate one bit of the image from another.

SPDE, Semi-functional programming for Processing

Friday, January 15th, 2010

I’ve been using Processing for awhile now as a tool for prototyping visualizations and getting visualizations on the web.  The main problem with Processing is all the hideous programming habits it encourages; I feel at the end of a long bout of Processing like I’m my 5 year old self back at the keyboard of my Commodore 64, typing 10 PRINT “Hello World!” The only thing it doesn’t have is a goto statement.  So imagine my pleasant surprise when I found you could code in Processing without having to use their awful P4 subdialect of Java.  Instead, I can code in Scala, which while not as purely functional as Haskell, does allow functional programming and encourages more sensible programming. The environment is called SPDE and at its most basic level it consists of The Graft, a specially constructed catch-all package that contains the scala compiler, the simple-build-tool, and the Processing runtime and bells and whistles.

Unlike Processing, SPDE is command-line oriented, but all the familiar things (except one niggling detail, and that is the creation of fonts, which still has to be done in the Processing tool) are there, including applet creation, one-step running, etcetera.  Actually applets are a little less work in SPDE, as typing applet in the tool will cause a browser window to pop up with the created applet, instead of merely exporting the directory and showing it in the Finder (I use a Mac).

To get going, download the graft jarfile and execute it in your operating system’s favorite fashion (e.g. java -jar graft-setup-0.2.2.jar).  You should get a fresh new directory with a Sketch.spde, several directories, and an sbt.bat or sbt file (depending on your OS).  If everything’s worked perfectly, it’ll even pop up a sketch that looks like an epilepsy-triggering barcode.

Now to start doing your own thing.  sbt is the Simple Build Tool.  Sketch.spde is the file you’ll start modifying.  Any file with an .spde extension will be compiled by the compiler into a single sketch (note these are not really modules and you don’t get separate namespaces for each file, unlike Java, so if you have name clashes, they will produce esoteric error messages that e.e. cummings would have been proud to have written).  SPDE and Processing both follow the same model of faking a toplevel by enclosing whatever you type in the runtime class, so you get a lot for free, and although you’re still writing Scala, it’s a little freer than the straight-up Scala you’d write normally.  Here’s an applet that i created in SPDE. Let’s look at some of the code that produced it:

Click to continue »

Radial Bar Charts

Tuesday, December 15th, 2009

The other day, I was presented with a problem of visualizing a small 3-D Excel spreadsheet in a way that allowed the viewer to:

  • Compare columns for balance.
  • Compare the third dimension alternates (professors vs. grad students).
  • Compare the rows for balance.
  • Compare the overall sums.

What I came up with was this:


Along the ring are the names of the columns of data in the spreadsheet.  On each spoke are the rows in the two layers of the spreadsheet visualized as bar charts.  Stretching counterclockwise are the bars for senior faculty.  Stretching clockwise are the bars for graduate research assistants.  In the center, visualized as bubbles of varying radii are the totals from both bar charts along the spoke.  Note the light, thin rings connecting each bar. These are designed to draw a viewer’s eye around the chart, connecting bar to bar visually to inform the viewer that comparison is relevant.

This technique is appropriate for smaller charts.  I have a feeling that too many columns would get to be too visually busy, although I suspect the chart’s behaviour as rows increase may well be more robust, especially if the bar chart was changed to an area chart.

A quick buster example

Thursday, July 9th, 2009

Some people have asked of late for examples of Buster.  What follows is the (annotated) code that I used to generate the FDL video of the particles swarming around the lambda in the last post:

module Main where

import qualified Data.Array.IO as A
import Graphics.Rendering.Hieroglyph
import Graphics.Rendering.Hieroglyph.OpenGL
import App.EventBus
import IsoFDL
import Random
import Data.Colour
import Data.Colour.Names
import System.Random
import Control.Concurrent

dim = 480
minSeedCoord = fromIntegral $ quot dim 2 - 80
maxSeedCoord = fromIntegral $ quot dim 2 + 80
minIdealDist = (-100)
maxIdealDist = 100
npts = 100
basestrength = 4 

randomRIOs :: Double -> Double -> IO [Double]
randomRIOs a b = do
    g <- getStdGen
    return $ randomRs (a, b) g

This is the only behaviour we define, adjustLayout, and it runs a single layout iteration over the data.  We take the old locales of points, compute an FDL iteration, take the new locales of points, and produce the event, which is “Visible” (this is important, as Hieroglyph looks for all items in the Visible group, assumes they are o the class Visual, and attempts to render them.  We also produce a re-rendering request, and return the list of rerender and our layout as a future to be used at the appropriate time by the buster framework.  One final thing to note about our event is that it should live for 10 iterations.  Often geometry is persistent, but this was used to give the “contrail” effect to the particles as they move, effectively layering slightly transparent geometry over the top of older slightly transparent geometry to create the effect.

The Hieroglyph code is also fairly straightforward.  We apply linewidth and line color combinators (using the colours in Russell O’Connor’s excellent Data.Colour library) to a list of paths which are slightly exaggerated (that’s the +/-(x1-x0) and +/-(y1-y0) operations on the Points in the path) vectors along the path that was most recently traveled.

adjustLayout :: [Attractor] -> (Arr (Int,Int) Double) -> (Arr Int Double) -> (Arr Int Double) -> Behaviour [EData HieroglyphGLRuntime]
adjustLayout poles dists xs ys b = do
    pts <- externalizeLayout xs ys
    layoutIteration basestrength poles xs ys dists
    pts' <- externalizeLayout xs ys
    event' <- produce "Visible" "" (show . head $ pts') (Iterations 10)
                . (:[])
                . EOther
                . Geometry
                . linewidth 1
                . strokecolour (dissolve 0.3 (opaque darkblue))
                . name "points"
                $ zipWith (\(x0,y0) (x1,y1) -> path{ begin=Point (x0-(x1-x0)) (y0-(y1-y0)), segments=[Line $ Point (x1+(x1-x0)) (y1+(y1-y0))] }) pts pts'
    rerender <- produce "Hieroglyph" "" "Rerender" once []
    future b . return $ [event',rerender]

The next function, behaviour would normally be pure, but because we have some data that is internal to the adjustLayout behaviour, it’s got a do.  One of the things that I will change in the near future about Hieroglyph and OpenGL is to make the events it can respond to more polymorphic.  Currently I require that to use Hieroglyph in OpenGL that the event data type is [EData HieroglyphGLRuntime].  That’s pretty inane if I do say so myself, and in the future, there will be a constraint more along the lines of (HieroglyphInteractiveEventData a) => a rather than a static type to be determined at compile time.  However, with the current state of things, this is our behaviour, and it’s still pretty straightforward.  renderBehaviour depends on the result of adjustLayout.  That’s it.  The <~< establishes the dependency relationship and buster takes care of the rest.  The “poles” are a series of attractors (defined in IsoFDL.hs), which are static fields attached to a fixed point in space, attracting or repelling depending on the constant given them.

behaviour = do
    randomDistances <- randomRIOs minIdealDist maxIdealDist >>= A.newListArray ((0,0),(npts-1,npts-1)) . take (npts^2) . drop (2*npts)
    randomXs <- randomRIOs minSeedCoord maxSeedCoord >>= A.newListArray (0,npts-1) . take npts
    randomYs <- randomRIOs minSeedCoord maxSeedCoord >>= A.newListArray (0,npts-1) . take npts . drop npts

    poles <- mapM (\(x,y) -> compileAttractor . Attractor x y . take npts $ repeat 150)
        [(100,(440-100)), (170,(440-160)), (220,(440-220)), (160,(440-280)),
         (100,(440-320)), (280,(440-280)), (350,(440-320)), (380,(440-300))]

    return $ renderBehaviour <~< adjustLayout poles randomDistances randomXs randomYs

main = behaviour >>= boilerplateOpenGLMain  [initializeBus "Testing Force Directed layouts" dim dim]

Finally, our main function is incredibly simple.  We simply bind the behaviour to the boilerplate buster OpenGL main.  The other argument to boilerplateOpenGLMain is a list of “widgets” to be bound to the bus at the beginning of the application.

An ugly force-directed layout implementation

Thursday, July 9th, 2009

Updated 7.9.2009: I’ve added another video showing the effect of attractors on the layout

Unalbe to show flash video

I’ve been working on visualizations silently of late.  Today, though, I threw together a quick FDL (force-directed layout) algorithm based on POV-Ray’s static-field isosurface or “blob” equation, which is (sorry, but I’m using plaintext)

density = strength * (1 - (distance / radius)^2)^2

The documented equation has a problem, in that if distance > radius, the density approaches infinity fast, so I change it to be:

density = strength * (1 - (min(distance, radius) / radius)^2)^2

That solves the infinity problem, and now all there is to the algorithm is to apply it:

adjust :: Double -> Double -> Double -> Double -> Double -> Double -> (Double,Double)
adjust sx sy ss sr dx dy = if isNaN vx || isNaN vy then (dx,dy) else (dx+force * vx, dy+force * vy)
    where force = density ss sr dist
          dist = sqrt ((dx-sx)^^2 + (dy-sy)^^2)
          vx = (dx-sx) / dist
          vy = (dy-sy) / dist

Note that it is possible for dist to be NaN for a single point-to-point interaction, so we correct for that.  Basically, this function is a single adjustment from point set to a point set that fits our force constraint a little better.  In the following code, we take a static vector, <sx,sy> and a movable vector <dx,dy> and take the distance.  We apply the density function to the distance using ss as our strength and sr as the radius or distance at which the density function falls off to zero.  We return the translated point P’ which is translated along the normal vector <vx,vy> by the variable “force”. Simple and straightforward.

Now I know… by my title, you’re thinking, “What’s ugly about that?” Well, when I code fast, I use shortcuts.  The following code, which uses this adjustment function as a kernel for the actual force-directed layout algorithm is imperative.  I’ve tried FDL and MDS (multi-dimensional scaling) algorithms before using lists and tries and always ran into enough overhead that it significantly diminished the number of points that are viable to run a single iteration in real-time.  I’m sure there’s a way to do it, but I solicit the community’s help in suggesting a faster way than this.  Yes, I could have used STUArray and avoided IO, but that doesn’t really eliminate the imperative nature of things.

layoutIteration :: Double -> [Attractor] -> (Arr Int Double) -> (Arr Int Double) -> (Arr (Int,Int) Double) -> IO ()
layoutIteration alpha attractors xsArray ysArray radii = do
    bounds <- A.getBounds radii
    forM_ (A.range bounds) $ \(r,c) -> when (r /= c) $ do
        radius <- radii -| (r,c)
        x0 <- xsArray -| c
        y0 <- ysArray -| c
        x1 <- xsArray -| r
        y1 <- ysArray -| r
        let (x',y') = adjust x0 y0 alpha radius x1 y1

        xsArray =| r $ x'
        ysArray =| r $ y'

    forM_ attractors $ \(CAttractor ax ay arArray) -> do
        bounds <- A.getBounds arArray
        forM_ (A.range bounds) $ \ix -> do
            x0 <- xsArray -| ix
            y0 <- ysArray -| ix
            ar <- arArray -| ix
            let (x',y') = adjust ax ay (-alpha) ar x0 y0
            xsArray =| ix $ x'
            ysArray =| ix $ y'

As you can see, this is pretty ugly, even with the aliased (-|) readArray and (=|) writeArray binary operators cleaning things up a bit.  It’s straightforward, and random access to the points is O(1).  There are algorithms for which this is more important than this one, and I can see a list comprehension version coalescing even as I write this, but this kind of code is O(N^2). The constants are sometimes (depending on the adjustment function) quite high, so what I really need is the fastest version possible, probably taking a lot of advantage of fusion in the compiler.  The other problem, which is more subtle,  is that points are updated multiple times in one pass and we always want to use the latest point.  Any functional solution would have to take this into account (and storing all the update deltas and taking their centroid won’t work, because each update wouldn’t then be based on the point’s most current position).

This one seems to handle up to about 250 points quite well, which is decent for an FDL algorithm (keep in mind, that’s 10000 distance calculations per iteration, all of which have to happen before sending any points to the video card and other such matters — doing this 30-50 times a second along with other functionality is harder than it sounds).  Handling more than that would require one of two modifications to the algorithm: either select a random N moves to make per iteration or subdivide the point set into sqrt(N) size blocks, perform the FDL on each of those, and then on their centroid, translating them all by the delta.

So essentially, there were two purposes to this exercise.  One was coding an FDL algorithm in Haskell, but the other was trying out an FDL I hadn’t seen tried before.  This one in particular does not result in the usual circular pattern arising and also handles negative forces as easily as positive forces with a simple, low-overhead function. Oh, and of course the other was showing that I could get realtime performance of such an algorithm (that previously people insisted had to be done in C to be done properly) with a relatively naive implementation.

Force directed layout algorithms are useful for tasks like graph layouts and other dense datasets as well as for performing a sort of “visual” clustering.   Here’s an example showing the starting state, with all points clustered in a square in the middle of the screen:


And here is one after the layout has run for a few hundred iterations:

screenshot-testing-force-directed-layoutsThe evenness reflected here is because the ideal distances I selected are random.  If there is a system to the distances, there will be a system to the layout.  The neat thing about this algorithm is that given any set of input points and distances, the result of the algorithm at any point is deterministic. Follows is the code that I used to generate the above pictures and a short movie of the forces in action.

This is the code for the algorithm and a test program in Hieroglyph/Buster

A video of the algorithm in action

Unalbe to show flash video

Buster 2.2 - Application Orchestration redux

Thursday, May 28th, 2009

I’ve been working heavily on the Big Board lately, and now I’m finally basically done with it.  As a result, I’ve modified, updated, and upgraded buster considerably since the first release, and it is now a pretty mature, high performance, and powerful engine for application orchestration.  There are a few things about it that bug me; mainly that there tends to be a lot more IO code than I usually like to see, but on the other hand, this tends to be in the behaviour part and the actual orchestration in terms of events and behaviour relations are purely functional and their semantics are well exposed.  One thing I am thinking of changing is the way that events are named, to an EventName type rather than relying on group, source, and name, since some of these seem to be irrelevant in certain cases.

One thing I am particularly satisfied with is that behaviours seem to be self-segmenting and I’ve not had to wedge anything into the form of a behaviour or worry about having to wedge something non-behaviour like into the framework because of poor forethought.   The metaphor seems to work well.  I have found that since I use strings to name events rather than enumerations, it’s helpful to have a module in the code somewhere that ties constants to the names to encourage consistency and type-safety.

Anyway, in a nutshell, here’s how application orchestration via Buster works by example:

iwidth = 1024
iheight = 1024

programBehaviour =
   >~> selectionBehaviour
   >~> zoomAndPanBehaviour
   >~> (uiResponses
        |~| translateVisibles
        |~| mouseCoordinateAdjustmentBehaviour)
   >~> dialogueRaisingBehaviour
   >~> dialogueResponseBehaviour
   >~> annotationCreationBehaviour
   >~> annotationFinishingBehaviour
   >~> annotationsTouchedBehaviour
   >~> loadMapMetadataBehaviour
   >~> (loadTilesBehaviour |~| loadAnnotationsBehaviour)
   >~> renderBehaviour

widgets = [
  initializeBus "The Big Board, by the Renaissance Computing Institute. v3.0a" iwidth iheight
, configFileWidget "thebigboard.ini"
, paletteGeometryWidget]

main = boilerplateGLMain widgets programBehaviour

Yes.  That plus the imports is the entire main module of The Big Board.  In point of fact the entirey of the Big Board, despite its complexity, is just a little under 800 lines of application-specific Haskell code.  I’ve highlighted in blue all the behaviours and widgets that are part of Hieroglyph and in green the ones that are part of buster.  Here we declare the dependency structure of the component program behaviours in the function programBehaviour.  The >~> operator designates that the right operand depends on the left operand.  The |~| operator declares that the two operands are independent and are thus functionally concurrent.   So we perform mouse-based selection, select items, and go through the normal chain of UI behaviours and then finally hit the render behaviour, which detects if anything’s changed in the scene and tells Hieroglyph to rerender if it has.  Main is a single line that calls the boilerplate Hieroglyph main, which initializes the Gtk and OpenGL, creates a window (which can be accessed later and modified for those who want more than a single main window in their application), and starts the bus a’-running.

Now let’s delve into the anatomy of a behaviour…

mouseCoordinateAdjustmentBehaviour b =
 pollFullyQualifiedEventWith b "Mouse" "Hieroglyph.KeyboardMouseWidget" "Position" $ \evt -> do
    let [x,y] = fromEDoubleL . fromJust . lookup "coords" . fromEAssocL . head . eventdata $ evt
        [ox,oy] = fromEDoubleL . fromJust . lookup "origin" metaL
        scale = fromEDouble . fromJust . lookup "scale" metaDataL
        metaL = maybe ([("origin",EDoubleL [0,0]),("scale",EDouble 1)])
                      (fromEAssocL . head . eventdata)
                      (eventForQName "Environment" "Map" "Metadata" b)
    glcoords <- reverseMouseCoords b x y
    let Point x' y' = reprojectPoint (Point ox oy) scale glcoords
    rerender <- produce "Hieroglyph" "Hieroglyph" "Rerender" Persistent []
    coords <- produce "Visible" "Hieroglyph" "coords text" Persistent [EOther (Geometry (coordsText x' y')]
    return [rerender, coords]

Again, the buster and Hieroglyph highlighting rules apply.  Here, we poll events named (Mouse,Hieroglyph.KeyboardMouseWidget,Position) that sit on the bus and apply the associated behavioural thunk to them.  Polling means that the event will stay on the bus even after the behaviour is applied.  We then reverse the mouse coords to fit GL coordinates and  offset it from the origin by the proper amount.  The coordinates are then displayed in a box (the coordsText function maps the x and y values to a visible representation and this visible representation is put on the bus as an event in the group “Visible”, which indicates to Hieroglyph that it should be rendered.

There are times where the poll/consume event helper functions aren’t appropriate.  In this case, instead of returning a list of bus differences, we have to return a “future” of a list of bus differences.  This is done by replacing the last line of the behaviour with:

future bus . return $ [list of Insertion evt | Deletion evt]

From there, most everything else about buster can be figured out based on this and the previous posts on the buster model and the Haddock documentation.

New version of Hieroglyph released.

Tuesday, February 24th, 2009

I’ve changed colors so that they’re now handled by Russell O’Connor’s excellent Data.Colour package, and changed Interactive up a bit to make it a bit easier to get at the canvas and data structures, point being to allow you to use Hieroglyph inside of a larger Gtk GUI more easily.  There shouldn’t be any code changes required at this time, though.  Can’t guarantee I won’t break the interface a bit in the future, but for now, you should see slightly better performance and have a lot more control over how color is displayed.  Thanks!

Hieroglyph Part II, basic mouse/keyboard interactivity

Tuesday, February 3rd, 2009
> module Main where
> import Graphics.Rendering.Hieroglyph
> import Graphics.Rendering.Hieroglyph.Scaffolds.Interactive

This is a very very simple demonstration of how to use Hieroglyph in an interactive setting. We draw a red circle under the mouse point, a blue triangle which will cover the circle if the mouse goes over it, a line from the origin to the mouse, and a bit of text showing us where the mouse is at the moment.

Heiroglyph uses records to simulate labeled functions, where parameters can be given in any order or omitted with a sensible default. Here, we change attributes from the Cairo default, called “plain”, which is imported from Graphics.Rendering.Hieroglyph to give us some sensible stroke and fill colors.

The first is black and sets the outline color only

> blackstroke = plain{ strokeRGBA = black }

Next is blue, where the blue is slightly transparent (faded).

> bluefill    = plain{ fillRGBA = fade 0.8 blue, filled=True, outlined=False }

Next is red, and sets the fill color only

> redfill     = plain{ fillRGBA = red, filled=True, outlined=False }

Finally white, for our background rectangle, and sets our fill color only.

> whitefill   = plain{ fillRGBA = white, filled=True, outlined=False }

The next bit is our visualization. A visualization in Hieroglyph is a function that maps data to a collection of primitives. In this case, the data is simply the interactive scaffolding, an element from an infinite list of user interactions. The visualization function is then folded over the input data. More complicated data can be used by implementing the class Graphics.Rendering.Hieroglyph.Scaffolds.Interactive.Interactive, but for our purposes, we don’t care to track anything else but the mouse Position.

This is our type signature. Note that I go ahead and generalize the vis to all Interactive types rather than the Scaffolding, even though we’re using it. This is more portable, and more correct, because really the vis only depends on the mouse position, which can be gotten from any instance of Interactive.

Note that I use BaseVisual as the return type. This is the return type of “over,” which takes any two Visuals and returns a BaseVisual where the left operand is declared to be possibly occluding the right operand.

> vis :: Interactive a => a -> BaseVisual
> vis a =

Note that at no time do we draw anything in this function, nor does the order in which I declare these items actually matter. The only thing that matters here to order is the combinator `over`, which takes two Visuals and returns a new Visual that contains both operands, but the left operand occludes the right where there is overlap.

First, our line from the origin:

>     [path{ segments = [Line (getMousePosition a)], attribs = blackstroke }

Next our text, and you note that we don’t declare the over-ness here, meaning that the occlusion between these two primitives is undefined.

>     ,text{ str = "mousePosition" ++ (show . getMousePosition $ a), bottomleft = Point 10 (getSizeY a-40), attribs = blackstroke }]

Now we declare that these two primitives occlude the next primitive, a blue triangle.

>     `over` polygon{ begin = (Point 100 100), segments = [Line (Point 200 300), Line (Point 300 100)], attribs = bluefill}

And then we declare a mouse pointer tracking red dot that is below the triangle when the triangle and dot interact with each other.

>     `over` arc{ center = getMousePosition a, radius = 4, attribs = redfill}

And finally, we declare the background rectangle.

>     `over` rectangle{ width = getSizeX a, height = getSizeY a, attribs = whitefill }

Main is very simple in Hieroglyph, generally. Since we’re not reading any input files, all we do is create a basic 800 by 600 motion sensitive GUI. Note that the motion-sensitive part does hurt performance a little, so if you don’t need it, use the simpleGui function instead. Both take initial input data, the scene function from the data to the scene, and a title for the window.

> main = simpleMotionSensitiveGui scaffold vis "sample interactive app"

Voila, it’s an app! Just to recap, here’s the program without all the commentary in between:

module Main where

import Graphics.Rendering.Hieroglyph
import Graphics.Rendering.Hieroglyph.Scaffolds.Interactive 

blackstroke = plain{ strokeRGBA = black }
bluefill    = plain{ fillRGBA = fade 0.8 blue, filled=True, outlined=False }
redfill     = plain{ fillRGBA = red, filled=True, outlined=False }
whitefill   = plain{ fillRGBA = white, filled=True, outlined=False }

vis :: Interactive a => a -> BaseVisual
vis a =
     [path{ segments = [Line (getMousePosition a)], attribs = blackstroke }
     ,text{ str = "mousePosition" ++ (show . getMousePosition $ a), bottomleft = Point 10 (getSizeY a-40), attribs = blackstroke }]
     `over` polygon{ begin = (Point 100 100), segments = [Line (Point 200 300), Line (Point 300 100)], attribs = bluefill}
     `over` arc{ center = getMousePosition a, radius = 4, attribs = redfill}
     `over` rectangle{ width = getSizeX a, height = getSizeY a, attribs = whitefill }

main = simpleMotionSensitiveGui scaffold vis "sample interactive app"

Hieroglyph HOWTO, part I: Sparklines

Tuesday, February 3rd, 2009
> module Main where
> import Data.List
> import Graphics.Rendering.Hieroglyph
> import System.Environment (getArgs)
> import qualified Data.ByteString.Lazy.Char8 as BStr

This is a simple demonstration of Hieroglyph for a non-interactive application.  The drawing is output directly to a PNG file of the specified width and height.

Formally, a visualization is a function from data to a Visual, where a Visual is an unstructured collection of Primitives.  These Primitives are documented in the module Graphics.Rendering.Hieroglyph.Primitives.  A purely functional framework such as this affords the programmer several advantages. Number one is that the drawing closely resembles the data — it is a straightforward
transformation, not unlike a stylesheet, and entirely without side effects or dependencies outside the drawing itself. Another advantage is that the drawing itself can be transformed and read by code. For example, the Visual can be indexed in an R-Tree or P-Tree and collision detection (mouse detection) becomes straightforward to implement. Another advantage is by knowing that
the drawing algorithm is a straightforward breadth first search without frills, it is easy to reason about why your drawing didn’t come out the way you expected it to. A final advantage is that you can be lazy about what you draw. Being declarative instead of imperative about drawing means that you don’t have to make breaks out of your code or write separate code to change level of detail, or to do off-screen culling. These become just a series of data transformations or a matter of how much of your Visual you pass to the renderer.

Alright. If I haven’t convinced you by now you ought to be doing your 2D
drawing in a purely functional way, I’m not going to.

To create a visulization in Hieroglyph, we write a function or functions from our data to a Visual. Then we use a transfer operation to take the Visual and put it to the page in main.

Our first simple visualization will be a sparkline. Sparklines are inline unscaled line graphs meant to replace or accent a table in a block of text. Let’s see how we do that. First of all, here’s the type signature of our sparkline, so you can see that while we’re defining a drawing, since there’s no IO, we aren’t actually doing the drawing yet.

> sparkline :: Point -> Double -> Double -> [Double] -> Primitive

Note that Point is a 2D point defined as two Doubles x and y by Hieroglyph. Primitive is the instance of Visual we’re using, since a sparkline can be represented by a single path. Now, on to the code.

> sparkline (Point startx starty) width height values = path{ begin=point0, segments=map Line points }

That one line defines the drawing itself, a line beginning at the leftmost x whose segments are compressed horizontally into the space we define by “width” and whose y values are compressed to height and scaled according to the values we passed in. A sparkline. It took me four lines of text, but it takes Hieroglyph and Haskell one line of code, with a bit of addenda.

>    where (point0:points) = zipWith Point xvals yvals
>          xvals = iterate (+(width/n)) startx
>          yvals = map (remap mx mn starty (starty+height)) values
>          (mx,mn,_,_,_,n) = stats values

Here we define our points in terms of x values and y values. The x values are in turn defined by a list of doubles starting with the leftmost point, startx and continuing on by increments of our defined width divided by the number of data points we’re trying to compress into that space. The y values are defined by re-mapping the range of values from its natural range, defined by mx and mn, to the geometry of our drawing, defined by height and starty. We define the stats function shortly and the remap function only a little further down. Note briefly, though, the underscores. Those mean that stats returns more values that we’re actually going to use, and we’re ignoring them. Now, due to the magic of laziness in Haskell, those values and their dependents are never actually calculated, saving us potentially a fair bit of performance.

I’m going to go ahead and define main for you before I define anything else, because then you can see how we go from a Visual, which is an abstract and purely functional entity to drawing on paper, which is not. Here goes.

> main = do

This line grabs the program arguments in the order we expect them. It’s not generally condoned to do things this way, but it prevents us from littering our example with bulky parameter checking code and from littering our import list with other modules for you to find and compile.

>   [dataset,widthAsc,heightAsc,outputfile] <- getArgs

There’s a lot going on in the next line. We map the function to the left of `fmap` over the values on the right, which are of type “IO BStr.ByteString” (appropriately, since they come from disc). Reading from right to left, the left side splits the string into lines, unpacks each line from our lazy representation to indivdual Haskell Strings, and then reads those Strings into Doubles.

>   values <- (map (read . BStr.unpack) . BStr.lines) `fmap` BStr.readFile dataset

Now, we briefly read our PNG width and height from the arguments, and then progress to the visualization.

>   let width = read widthAsc
>       height = read heightAsc
>       visualization = (sparkline origin width height values){ attribs = whiteStroke }
>       whiteStroke = plain{ strokeRGBA=white }

Here. These two short lines create our sparkline and set it to be black when we show it.

The next line is the main action of the program and transfers the drawing we’ve done to the physical drawing on disc.

>   renderToPNG outputfile (round width) (round height) visualization

And that’s it for main. Now to define the stats function as a function from a list of doubles to a tuple of crunched math:

> stats :: [Double] -> (Double,Double,Double,Double,Double,Double)

First define the finished product. The function foldl’ takes an initial value and folds the list into it, like one might fold egg-whites into a cake batter. The (x,x,x,x*x,1) is the initial value, defined by the first value in the list of data.

> stats (x:xs) = finish . foldl' stats' (x,x,x,x*x,1) $ xs

Now we define stats’, our folding function, which calculates the global extrema, list length, sum, and sum of squares all at once. Why, when Haskell defines for us so conveniently functions for maximum, minimum, sum, and length, would we want to do this? Simple. This is done in one list traversal instead of several. One for loop as opposed to five.

> stats' (mx,mn,s,ss,n) x = ( max x mx
>                           , min x mn
>                           , s + x
>                           , ss + x*x
>                           , n+1 )

Now we get to the finish. We’ve calculated the sum and sum of squares, but not the mean or standard deviation. We replace those now, and the task of computing summary stats is finished.

> finish (mx,mn,s,ss,n) = (mx,mn,av,va,stdev,n)
>    where av = s/n
>          va = ss/(n-1) - n*av*av/(n-1)
>          stdev = sqrt va

One last function to define before we’re done. Remap takes a value in the range [a,b] and remaps it to the range [c,d]

> remap :: Double -> Double -> Double -> Double -> Double -> Double
> remap mx mn mx' mn' x = (x-mn) / (mx-mn) * (mx'-mn') + mn'

That simple, and in a very few lines of code, and with no modules imported other than Hieroglyph, we’ve created a sparkline!

Hieroglyph contains primitives for lines, splines, arcs, quads, text, and loading and placing images. These can be used as we’ve seen in this module, in static images, or they can be used in interactive, dynamic visualizations by adding the Interactive scaffolding module.

Visualizing groundwater nitrate concentration

Wednesday, January 21st, 2009

This one brings back memories.  This is a snapshot from my first ever Haskell visualization, which I sadly no longer have the full code for.  I actually used Haskell here to calculate the data points and output textures and height-fields that were then imported into POV-Ray and rendered to give you what you see here.  It was my first experience with Control.Parallel.Strategies and my first experience with arrays in Haskell.  The thing I’m particularly proud of are the “calligraphic” contour lines that I got.

The contour lines are created by first finding every point where a contour threshold is crossed and then along with the point, calculating the gradient of the heightfield slope perpendicular to that point.  The flatter the slope, the wider the line.  The more tilted the slope, the narrower the line.  The boundaries of the lines mark where the height has changed from the threshold by at least a meter. The lighter side of the line shows the “upside” of the slope.  The darker side of the line shows the “downside” of the slope.

The actual data itself is an inverse-square interpolated set of points on a regular grid; each point represents a sensor installed in a 200 square yard field in eastern North Carolina.  These sensors were measured twice a month for several years.  The sensors measure two things that are shown on this visualization:  groundwater depth and groundwater nitrate concentration.  Depth is shown by the actual surface.  Nitrate concentration is shown by the colormap.  The blues you see are where nitrates are within EPA regulations.  The yellows and reds are outside EPA regulations.  Nitrate concentrations above EPA regulations can cause “blue baby syndrome” among other health problems, and safe levels of the toxin are not known for wildlife.  Pollution is most often caused by leaky hog-waste cesspools.

Groundwater nitrate concentrations in a test field in eastern NC

Groundwater nitrate concentrations in a test field in eastern NC