Snippet: sorting the elements of a map

Written by Jeff Heard on February 5th, 2009

So in an application where you have a lot of data columns indexed by name, it’s often necessary to sort that data based on one or more of the columns.  You might be tempted to do this the straightforward way of

  1. Extract the data from the map using toList
  2. unzip and transpose to create rows instead of columns
  3. sort the resultant list using Data.List.sort or Data.List.SortBy
  4. transpose, zip, and put the data into a new map.

Now that’s a reasonable way to do things, but it raises one minor issue, which might not be so minor if you’ve got a really large dataset.  It’s “head-strict”.  The second you access one of the elements of the map, the entire sort procedure takes place for all the data in the map.  If you’re only needing to access a column or two of a large multicolumn dataset once it’s sorted and on top of that, are only needing a few elements of those columns, this is SERIOUSLY overkill.  Here’s a better way:

module MapSort where

import qualified Data.Map as Map
import qualified Data.List as List
import qualified Data.IntMap as IntMap

sort mp key = reorder mp
    where sort' vs = List.sort . zip vs $ indices
          ordering = map snd . sort' $ (mp Map.! key)
          indices = enumFrom (0::Int)
          reorder vs = IntMap.toAscList . IntMap.fromList $ zip ordering vs

This solution is head-strict per column instead of for the whole map.  How did we achieve that?  We created an ordering out of the sort key column and then applied that ordering on each one of the lists stored in the map individually.  Simple and efficient.  There are probably even more efficiencies to be had here, but this one works for me quite well.  From here, it should be straightforward to implement sortBy and even use a list of keys instead of a single key as the sorting key for the map.

6 Comments so far ↓

  1. Eyal Lotem says:

    indices = [0..]
    nitems is unused.

  2. Jeff Heard says:

    Indeed you’re right that nitems is unused. Removed. As for the [0..], I’m not fond of that syntax and don’t personally use it. On top of that, it tends to create Integers unless you’re careful, which turn out to be a lot slower than Ints. For consistency’s sake, I prefer iterate to [0..].

  3. lilac says:

    … or use DSU (decorate-sort-undecorate):

    sort mp key = reorder mp
    where reorder vs = map snd . List.sort $ zip vs (mp Map.! key)

  4. Cale Gibbard says:

    [0..] and iterate (+1) 0 should always result in a list of Int/Integer in the same circumstances. Defaulting gives rise to a list of Integer in either case.

  5. BMeph says:

    If you prefer iterate, then you might as well just cut to the chase, and use enumFrom (0::Int) instead.

    Why try to specialize a general function, when there’s a specialized function at hand? ;)

  6. Jeff Heard says:

    Ah ha! Thanks. Couldn’t remember the name of that one.

Leave a Comment