Scala simple histogram

How about this:

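// a is the input collection (e.g. an Array[Int] or a List[Double])
val num_bins = 20
val mx = a.max.toDouble
val mn = a.min.toDouble
val counts = a
    .map(x => (((x.toDouble - mn) / (mx - mn)) * num_bins).floor.toInt.min(num_bins - 1)) // put the maximum in the last bin
    .groupBy(identity)
    .map { case (bin, xs) => bin -> xs.size }
// one count per bin, in bin order, with 0 for empty bins
val hist = (0 until num_bins).map(i => counts.getOrElse(i, 0))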

This prepares the values in much the same way as @om-nom-nom's answer, but keeps the histogram method quite small by using partition:

case class Distribution(nBins: Int, data: List[Double]) {
  require(data.length > nBins)

  val Epsilon = 0.000001
  val (max,min) = (data.max,data.min)
  val binWidth = (max - min) / nBins + Epsilon
  val bounds = (1 to nBins).map { x => min + binWidth * x }.toList

  def histo(bounds: List[Double], data: List[Double]): List[List[Double]] =
    bounds match {
      case Nil      => Nil
      case _ :: Nil => List(data) // the last bin takes all remaining values
      case h :: t   => val (l, r) = data.partition(_ < h); l :: histo(t, r)
    }

  val histogram = histo(bounds, data)
}

Then for

val data = Array.tabulate(100){ _ => scala.util.Random.nextDouble() * 10 }
val h = Distribution(5, data.toList).histogram

and the number of values in each bin is then

val tabulated = h.map {_.size}
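To see which range each count belongs to, the bin edges can be paired with the per-bin counts. This is a minimal sketch that is independent of the `Distribution` class above: it reuses the same partition idea, but lets the last bin take everything that remains instead of adding an `Epsilon` to the bin width. The object and method names (`LabelledHistogram`, `labelledHistogram`) are hypothetical, and it assumes non-empty data and `nBins > 0`.

```scala
object LabelledHistogram {
  // Hypothetical helper: returns (lowerEdge, upperEdge, count) for each bin.
  // Assumes data is non-empty and nBins > 0.
  def labelledHistogram(data: List[Double], nBins: Int): List[(Double, Double, Int)] = {
    val (min, max) = (data.min, data.max)
    val width = (max - min) / nBins
    // Upper edges of every bin except the last; the last bin takes whatever is left,
    // so the maximum value is included without an Epsilon adjustment.
    val innerBounds = (1 until nBins).map(i => min + width * i).toList

    // Peel off the values below each bound in turn, as histo does above.
    def split(bounds: List[Double], rest: List[Double]): List[List[Double]] =
      bounds match {
        case Nil    => List(rest)
        case h :: t =>
          val (below, above) = rest.partition(_ < h)
          below :: split(t, above)
      }

    val bins = split(innerBounds, data)
    val lowers = min :: innerBounds
    val uppers = innerBounds :+ max
    lowers.zip(uppers).zip(bins).map { case ((lo, hi), b) => (lo, hi, b.size) }
  }
}

// Usage: print each bin's range next to its count
val labelled = LabelledHistogram.labelledHistogram(List(0.0, 1.0, 2.5, 4.0, 9.9, 10.0), 5)
labelled.foreach { case (lo, hi, n) => println(f"$lo%.1f to $hi%.1f: $n") }
```

With five bins over 0.0 to 10.0 the edges fall at 2, 4, 6 and 8, so the counts come out as 2, 1, 1, 0, 2.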

How about this?

object Hist {

    type Bins = Map[Double, List[Double]]
    // artificially increasing bucket length to overcome last-point issue 
    private val Epsilon = 0.000001

    def histogram(data: List[Double], binsCount: Int) = {
        require(data.length > binsCount)
        val sorted = data.sorted
        val min = sorted.head
        val max = sorted.last
        val binLength = (max - min) / binsCount + Epsilon

        val bins = Map.empty[Double, List[Double]].withDefaultValue(Nil)

        scatterToBins(sorted, min + binLength, binLength, bins)
    }

    @annotation.tailrec
    private def scatterToBins(xs: List[Double], upperBound: Double, binLength: Double, bins: Bins): Bins = xs match {
        case Nil         => bins
        case point::tail =>
            // move the upper bound forward past any empty buckets until the point fits
            val bound = Iterator.iterate(upperBound)(_ + binLength).dropWhile(point >= _).next()
            val currentBin = bins(bound)
            val newBin = point::currentBin
            scatterToBins(tail, bound, binLength, bins + (bound -> newBin))
    }

    // now let's test this out
    val data = Array.tabulate(100){ _ => scala.util.Random.nextDouble() * 10 }

    val result = histogram(data.toList, 5)

    val pointsPerBucket = result.values.map(xs => xs.length)
}

This yields output like the following (the data is random, so the numbers differ on each run):

scala> Hist.result
// res14: Hist.Bins = Map(4.043605797342332 -> List(4.031739029821568, 3.826704675600351, 3.7661438110766166, 3.680326808626887, 3.6788463836133767, 3.5442867825350266, 3.5156167603774904, 3.464310876575163, 3.3796397333178216, 3.33851670739545, 3.1702423754536504, 3.1681320879333708, 2.9520859637868204, 2.885027245987456, 2.8091011617711024, 2.745475619527371, 2.520275275070399, 2.3720116613386546, 2.2909255324112374, 2.229522549904405, 2.0693233045454895), 6.0237846547671845 -> List(5.957572654029027, 5.6887311125180675, 5.356707271645041, 5.3155138169898475, 5.285634121992783, 5.2823949256676865, 5.159891625116016, 5.152024494453849, 5.063625430476634, 4.903706519410671, 4.891005992072018, 4.857168214245934, 4.845526801893324, 4.845452341208768, 4.8205059750156, 4.799306005256147, 4.751...
scala> Hist.pointsPerBucket
// res15: Iterable[Int] = List(21, 23, 15, 22, 19)
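Note that `result.values` iterates in the Map's internal order. An immutable Map with more than four entries is a hash map, so the counts are not guaranteed to be listed from the lowest bucket to the highest; in the output above the bucket keyed 4.04... is printed first even though a lower bucket exists. A minimal sketch of getting the counts in bin order by sorting on the upper-bound keys; `OrderedBuckets` and `bucketsInOrder` are hypothetical names, and the sample map is made up for illustration.

```scala
object OrderedBuckets {
  type Bins = Map[Double, List[Double]]

  // Hypothetical helper: point counts ordered by each bucket's upper bound (the Map key).
  def bucketsInOrder(bins: Bins): Seq[(Double, Int)] =
    bins.toSeq.sortBy(_._1).map { case (upper, points) => upper -> points.length }
}

// Made-up sample with five buckets, so the Map is a hash map with no key ordering
val sample: OrderedBuckets.Bins = Map(
  8.0 -> List(7.5),
  2.0 -> List(0.5, 1.5),
  10.0 -> List(9.0, 9.5, 9.9),
  4.0 -> List(3.0),
  6.0 -> List(5.0)
)
println(OrderedBuckets.bucketsInOrder(sample).map(_._2)) // counts in bin order: 2, 1, 1, 1, 3
```

Applied to the answer above, `OrderedBuckets.bucketsInOrder(Hist.result)` would list the buckets from the lowest upper bound to the highest.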

I've cheated a bit by using Lists instead of Arrays, but I hope that's okay for you.
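If you do want to stay with Arrays, a single pass that increments a fixed-size `Array[Int]` of counters avoids the intermediate Map and List allocations. This is a minimal sketch, not taken from any of the answers above: it uses the same equal-width bins as the first answer, clamps the maximum into the last bin, and assumes a non-empty `Array[Double]` and `numBins > 0`. The names `ArrayHistogram` and `countsArray` are hypothetical.

```scala
object ArrayHistogram {
  // Hypothetical helper: one pass over the data, incrementing a counter per bin.
  // Assumes data is non-empty and numBins > 0.
  def countsArray(data: Array[Double], numBins: Int): Array[Int] = {
    val mn = data.min
    val mx = data.max
    val width = (mx - mn) / numBins
    val counts = new Array[Int](numBins) // zero-initialised
    for (x <- data) {
      // Clamp so that x == mx lands in the last bin; all values go to bin 0 if mx == mn
      val bin = if (width == 0.0) 0 else math.min(((x - mn) / width).toInt, numBins - 1)
      counts(bin) += 1
    }
    counts
  }
}

val data = Array.tabulate(100)(_ => scala.util.Random.nextDouble() * 10)
println(ArrayHistogram.countsArray(data, 5).mkString(", "))
```

Empty bins simply keep a count of 0, so the result always has exactly `numBins` entries in bin order.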
