Heiroglyph: Interactively graph a spreadsheet in 99 lines of code.

Written by Jeff Heard on February 5th, 2009

screenshot-sparklines-for-breast-cancer-wisconsin-conttab-1

99 lines of code, even with imports. The app is simple, but it builds upon the sparkline example of the other day. In this app, we import an entire spreadsheet of tabular data and create sparklines for each column of data where the data is floating point. The user can sort the data by a column by clicking on the left side of a sparkline. This doesn’t do a lot of error checking, and only works on spreadsheets where the columnar data type is homogenous, but that’s a large class of datasets, and it simplifies our code considerably. First, our boring imports:

> module Main where
>
> import Graphics.Rendering.Hieroglyph
> import Graphics.Rendering.Hieroglyph.Scaffolds.Interactive
> import qualified Data.Map as Map
> import Data.Map (Map)
> import qualified Data.Set as Set
> import Data.Set (Set)
> import qualified Data.ByteString.Lazy.Char8 as C
> import Data.List (foldl',transpose,mapAccumL,sort)
> import System.Environment (getArgs)
> import Data.Maybe (fromJust)
> import qualified Data.IntMap as IntMap

Now we setup a bit of color for our visualization: attribute sets for describing thin black lines (the sparklines), thicker black lines (for the text), and the background color.

> blackstroke = plain{ strokeRGBA=black, linewidth=0.5 }
> blacktext = plain{ strokeRGBA=black }
> whitefill = plain{ fillRGBA=white, filled=True, outlined=False }

Then we setup our datatype that we’re visualizing. Note that we’re using the interactive scaffolding, so this will be an interactive application.

> data MyData = MyData {
>     interactiveScaffolding :: Scaffolding
>   , spreadsheet :: Map String [Double]
>   , ncolumns :: Int
>   }

Next we setup the data so that it’s responsive to the user input.

> instance Interactive MyData where
>     getInteractiveScaffolding = interactiveScaffolding
>     setInteractiveScaffolding a b = a{ interactiveScaffolding = b }
>
>     interactionFold dat geom

Yes, the 80 is arbitrary. It seems to work well, and to some extent, this is a draft application and isn’t meant to be perfect. Here, if the user clicks on the left side of the application, we check to see which sparkline the user is over and sort that column in ascending order.

>         | xcoord < 80 && getMouseLeftButtonDown dat = dat { spreadsheet = sortSpreadsheet dat }

Then we ignore all other inputs.

>         | otherwise = dat
>                 where Point xcoord ycoord = getMousePosition dat

Next is our familiar sparkline function from my blog post the other day. As a review, we remap all the values in the line to be inside the height of a sparkline and we take those and create a path out of them.

> sparkline width height (Point startx starty) values = path{ begin=point0 , segments=map Line points , attribs=blackstroke }
>    where (point0:points) = zipWith Point xvals yvals
>          xvals = iterate (+(width/n)) startx
>          yvals = map (remap mn mx starty (starty+height)) values
>          (mx,mn,_,_,_,n) = stats values

To visualize a whole spreadsheet, we start at the origin and work our way down using Map.mapAccum to create a visualization that is the same structure as our spreadsheet. We could simply pull the columns out of the spreadsheet and return a list of primitives instead of a map from name to primitives, but this way, we could filter the output from this function based on the structure of the original data. To me, this is closer to the intent of visualization; it’s less about the drawing aspect and more about the data, and manipulations on visualizations are based on the data, not the geometry or the screen.

> visSpreadsheet width height (Point startx starty) mp = snd . Map.mapAccum stackSparklines starty $ mp
>     where stackSparklines starty' values = (starty'+height, sparkline width height (Point startx starty') values)

This function places the names of the columns to the left of the sparklines.

> visSpreadsheetNames height starty names = snd . mapAccumL stackNames starty $ names
>     where stackNames starty' name = (starty'+height, text{ str=name, bottomleft = Point 5 starty', attribs=blacktext })

This merely defines our background rectangle

> background w h = rectangle{ width = w , height = h , attribs=whitefill }

And finally, vis calls our vis functions and also sets up the occlusion order, putting the names and sparklines on top of the background. The height of each sparkline is the number of sparklines divided by the height of the window.

> vis dat = visSpreadsheetNames (sy/ncf) 10 names
>           `over` visSpreadsheet (sx-offset) (sy/ncf) (Point offset 0) sheet
>           `over` background sx sy
>     where sx = getSizeX dat
>           sy = getSizeY dat
>           nc = ncolumns dat
>           ncf = fromIntegral nc
>           sheet = spreadsheet dat
>           Point xcoord ycoord = getMousePosition dat
>           offset = 6 * (fromIntegral . maximum . map length $ names)
>           names = Map.keys sheet

This is a simple function to read a spreadsheet and filter out columns that aren’t numeric. Note that it only checks the first item of each column, so it really doesn’t do sufficient error checking, however it’s faster than loading the entire spreadsheet to check to make sure each is numeric. Yes, we could have used Parsec here instead of defining our own isNumeric, but I wanted to keep down the number of libraries I was importing for tutorial’s sake.

> readSpreadsheet name = do
>      sheet <- (transpose . map (C.split '\t') . C.lines) `fmap` C.readFile name
>      return $ foldl' go Map.empty sheet
>   where go m (x:xs) | isNumeric xs = Map.insert (C.unpack x) (map (read . C.unpack) xs) m
>                     | otherwise = m
>         isNumeric = C.all isNumeral . head
>         isNumeral = (flip Set.member) nums
>         nums = Set.fromList "0123456789."

Remap a range of values to a different range. This is used in the sparkline function, and since our origin is at the top left and values go down, the seeming reversal of the first two arguments of the function is correct for our purposes.

> remap mx mn mn' mx' a = (mx'-mn') * (a-mn) / (mx-mn) + mn'

Once again, you should recognize this stats function from the blog post from the other day.

> stats (x:xs) = finish . foldl' stats' (x,x,x,x*x,1) $ xs
>     where stats' (mx,mn,s,ss,n) x = ( max x mx , min x mn , s + x , ss + x*x , n+1 )
>
> finish (mx,mn,s,ss,n) = (mx,mn,av,va,stdev,n)
>    where av = s/n
>          va = ss/(n-1) - n*av*av/(n-1)
>          stdev = sqrt va

This sorts the spreadsheet based on which sparkline the mouse pointer is on using the mapsort algorithm below.

> sortSpreadsheet dat = mapsort sheet (names !! item)
>     where sy = getSizeY dat
>           nc = ncolumns dat
>           ncf = fromIntegral nc
>           item = round $ (ycoord / sy) * ncf
>           Point _ ycoord = getMousePosition dat
>           names = Map.keys sheet
>           sheet = spreadsheet dat

This function is a little more elegant than it looks. It’s not terribly useful to do things this way here, but this is a lazy version of sorting the entire map. The more straightforward way to sort the map uses transpose to turn columns into rows, but also makes the whole thing head strict on the first element retrieved from the map. This is only head strict per column.

> mapsort mp key = Map.map reorder mp
>     where sort' vs = sort . zip vs $ indices
>           ordering = map snd . sort' $ (mp Map.! key)
>           nitems = (length ordering :: Int)
>           indices = iterate (+1) (0::Int)
>           reorder vs = (let arr = IntMap.fromList . zip indices $ vs in map (arr IntMap.!) ordering)

Now finally our very simple main. Take the argument from the command line that is our spreadsheet and create the visualization from it. Note that this GUI is not motion sensitive, as it really doesn’t need to be.

> main = do
>     [fname] <- getArgs
>     sheet <- readSpreadsheet fname
>     let dat = MyData scaffold sheet (Map.size sheet)
>     simpleGui dat vis ("Sparklines for " ++ fname)

1 Comments so far ↓

  1. Ben says:

    Just wanted to say that this is a fantastic project and I hope to be able to use it soon. (Time is a bit short these days!)

Leave a Comment