So in an application where you have a lot of data columns indexed by name, it’s often necessary to sort that data based on one or more of the columns. You might be tempted to do this the straightforward way of
- Extract the data from the map using toList
- unzip and transpose to create rows instead of columns
- sort the resultant list using Data.List.sort or Data.List.SortBy
- transpose, zip, and put the data into a new map.
Now that’s a reasonable way to do things, but it raises one minor issue, which might not be so minor if you’ve got a really large dataset. It’s “head-strict”. The second you access one of the elements of the map, the entire sort procedure takes place for all the data in the map. If you’re only needing to access a column or two of a large multicolumn dataset once it’s sorted and on top of that, are only needing a few elements of those columns, this is SERIOUSLY overkill. Here’s a better way:
module MapSort where import qualified Data.Map as Map import qualified Data.List as List import qualified Data.IntMap as IntMap sort mp key = Map.map reorder mp where sort' vs = List.sort . zip vs $ indices ordering = map snd . sort' $ (mp Map.! key) indices = enumFrom (0::Int) reorder vs = IntMap.toAscList . IntMap.fromList $ zip ordering vs
This solution is head-strict per column instead of for the whole map. How did we achieve that? We created an ordering out of the sort key column and then applied that ordering on each one of the lists stored in the map individually. Simple and efficient. There are probably even more efficiencies to be had here, but this one works for me quite well. From here, it should be straightforward to implement sortBy and even use a list of keys instead of a single key as the sorting key for the map.
indices = [0..]
nitems is unused.
Indeed you’re right that nitems is unused. Removed. As for the [0..], I’m not fond of that syntax and don’t personally use it. On top of that, it tends to create Integers unless you’re careful, which turn out to be a lot slower than Ints. For consistency’s sake, I prefer iterate to [0..].
… or use DSU (decorate-sort-undecorate):
sort mp key = Map.map reorder mp
where reorder vs = map snd . List.sort $ zip vs (mp Map.! key)
[0..] and iterate (+1) 0 should always result in a list of Int/Integer in the same circumstances. Defaulting gives rise to a list of Integer in either case.
If you prefer iterate, then you might as well just cut to the chase, and use enumFrom (0::Int) instead.
Why try to specialize a general function, when there’s a specialized function at hand?
Ah ha! Thanks. Couldn’t remember the name of that one.