laziness browsing by tag


Snippet: sorting the elements of a map

Thursday, February 5th, 2009

So in an application where you have a lot of data columns indexed by name, it’s often necessary to sort that data based on one or more of the columns.  You might be tempted to do this the straightforward way of

  1. Extract the data from the map using toList
  2. unzip and transpose to create rows instead of columns
  3. sort the resultant list using Data.List.sort or Data.List.SortBy
  4. transpose, zip, and put the data into a new map.

Now that’s a reasonable way to do things, but it raises one minor issue, which might not be so minor if you’ve got a really large dataset.  It’s “head-strict”.  The second you access one of the elements of the map, the entire sort procedure takes place for all the data in the map.  If you’re only needing to access a column or two of a large multicolumn dataset once it’s sorted and on top of that, are only needing a few elements of those columns, this is SERIOUSLY overkill.  Here’s a better way:

module MapSort where

import qualified Data.Map as Map
import qualified Data.List as List
import qualified Data.IntMap as IntMap

sort mp key = reorder mp
    where sort' vs = List.sort . zip vs $ indices
          ordering = map snd . sort' $ (mp Map.! key)
          indices = enumFrom (0::Int)
          reorder vs = IntMap.toAscList . IntMap.fromList $ zip ordering vs

This solution is head-strict per column instead of for the whole map.  How did we achieve that?  We created an ordering out of the sort key column and then applied that ordering on each one of the lists stored in the map individually.  Simple and efficient.  There are probably even more efficiencies to be had here, but this one works for me quite well.  From here, it should be straightforward to implement sortBy and even use a list of keys instead of a single key as the sorting key for the map.