What is the purpose of the ArgMin and ArgMax type synonyms in Data.Semigroup?
I suppose it's one of those things that exist in Haskell because the theoretical concept exists. I'm not sure if these types have much practical use, but they do illustrate just how extensive the concepts of semigroups and monoids are in relation to programming.
Imagine, for example, that you need to pick the longest of two names, name1
and name2
, both of them String
values. You can use the Semigroup
instance of ArgMax
for that:
Prelude Data.Semigroup> Max (Arg (length name1) name1) <> Max (Arg (length name2) name2)
Max {getMax = Arg 5 "Alice"}
After that, it's just a question of unwrapping "Alice"
from its container.
As Willem Van Onsem points out in the comments, you can use ArgMax
and ArgMin
to pick the maximum or minimum item, according to some attribute of the item, but still keeping the original item around.
The purpose of them is to implement things like minimumOn
:
minimumOn :: (Ord b, Foldable f) => (a -> b) -> f a -> Maybe a
minimumOn f = fmap (getArg . getMin)
. getOption
. foldMap (Option . Just . Min . (Arg =<< f))
-- ^^^^^^^^^^
-- ArgMin
where
getArg (Arg _ x) = x
While this implementation might look a little convoluted, it's often helpful to implement things using general concepts like monoids. For instance, in this case, it is straightforward to adapt the above code to compute the min and max in a single pass.
I reach for ArgMin
/ ArgMax
when:
I want to compute (a function of) the minimum/maximum of some values according to a comparison function
The comparison is costly or unwieldy to recompute, so I want to cache its result; and/or
I want to do it monoidally with
foldMap
instead of with an explicit/specialisedminimumBy
/maximumBy
orsortOn
, to leave it flexible to changes in the future such as a different monoid or parallelisation
Here’s an adaptation of a recent real-world example from my job, findNextWorkerQueue
, which takes a map from workers to tasks and finds the worker with the earliest first task, e.g. given this input:
Worker 1:
- Time 10: Task A
- Time 12: Task B
- Time 14: Task C
Worker 2:
- Time 5: Task D
- Time 10: Task E
- Time 15: Task F
Worker 3:
- Time 22: Task G
- Time 44: Task H
It would produce a start time of 5, and a work queue describing worker 2, with a first task of D, and subsequent tasks of E & F.
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Map (Map)
import Data.Semigroup (Arg(..), Min(..), Option(..))
import Data.Sequence (Seq(Empty, (:<|)))
import qualified Data.Map as Map
-- An enumeration of computation units for running tasks.
data WorkerId = …
-- The timestamp at which a task runs.
type Time = Int
-- Some kind of task scheduled at a timestamp.
data Scheduled task = Scheduled
{ schedAt :: !Time
, schedItem :: !task
}
-- A non-empty sequence of work assigned to a worker.
data WorkQueue task = WorkQueue
{ wqId :: !WorkerId
, wqFirst :: !(Scheduled task)
, wqRest :: !(Seq (Scheduled task))
}
-- | Find the lowest worker ID with the first scheduled task,
-- if any, and return its scheduled time and work queue.
findNextWorkerQueue
:: forall task
. Map WorkerId (Seq (Scheduled task))
-> Maybe (Time, WorkerQueue task)
findNextWorkerQueue
= fmap getTimeAndQueue . getOption
. foldMap (uncurry minWorkerTask) . Map.assocs
where
minWorkerTask
:: WorkerId
-> Seq (Scheduled task)
-> Option (Min (Arg (Time, WorkerId) (WorkQueue task)))
minWorkerTask wid tasks = Option $ case tasks of
Empty -> Nothing
t :<| ts -> Just $ Min $ Arg
(schedTime t, wid)
WorkQueue { wqId = wid, wqFirst = t, wqRest = ts }
getTimeAndQueue
:: Min (Arg (Time, WorkerId) (WorkQueue task))
-> (Time, WorkQueue task)
getTimeAndQueue (Min (Arg (time, _) queue))
= (time, queue)
(Note that this is using Option
to support GHC 8.6; in GHC ≥8.8, Maybe
has an improved Monoid
instance depending on Semigroup
instead of Monoid
, so we can use it with Min
without imposing a Bounded
constraint. The time signatures are just for clarity here.)