SI prefixes in ggplot2 axis labels

I used library("sos"); findFn("{SI prefix}") to find the sitools package.

Construct data:

bytes <- 2^seq(0,20) + rnorm(21, 4, 2)
time <- bytes/(1e4 + rnorm(21, 100, 3)) + 8
my_data <- data.frame(time, bytes)

Load packages:

library("sitools")
library("ggplot2")

Create the plot:

(p <- ggplot(data=my_data, aes(x=bytes, y=time)) +
     geom_point() +
     geom_line() +
     scale_x_log10("Message Size [Byte]", labels=f2si) +
     scale_y_continuous("Round-Trip-Time [us]"))

I'm not sure how this compares to your function, but at least someone else went to the trouble of writing it ...

I modified your code style a little bit -- semicolons at the ends of lines are harmless but are generally the sign of a MATLAB or C coder ...

edit: I initially defined a generic formatting function

si_format <- function(...) {
    function(x) f2si(x,...)
}

following the format of (e.g.) scales::comma_format, but that seems unnecessary in this case: as far as I can tell, the labels argument accepts any function that maps the breaks to labels, so f2si can be passed directly.
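As a rough illustration of why the wrapper is optional: the labels argument of a ggplot2 scale accepts a function that takes the vector of breaks and returns the labels, so a factory like si_format only pays off when extra arguments need to be bound in advance. The sketch below uses base R only; make_suffix_labeller and plain_labeller are hypothetical names, not part of ggplot2, scales, or sitools.

```r
# Hypothetical factory: binds `suffix` now and returns a labelling function,
# in the same spirit as si_format above.
make_suffix_labeller <- function(suffix) {
  function(x) paste0(x, suffix)
}

label_b <- make_suffix_labeller(" B")
label_b(c(1, 10, 100))

# A plain function of the breaks does the same job when nothing needs binding.
plain_labeller <- function(x) paste0(x, " B")
identical(label_b(c(1, 10, 100)), plain_labeller(c(1, 10, 100)))
```

Either label_b or plain_labeller could be handed to labels= as is; the factory form is only worth it when the same formatter is reused with different arguments.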

The OP's code gives what seems to me to be not quite the right answer: the rightmost axis tick is "1000K" rather than "1M" -- this can be fixed by changing the >1e6 test to >=1e6. On the other hand, f2si uses lower-case k -- I don't know whether K is wanted (wrapping the results in toupper() could fix this).
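The boundary problem can be reproduced without either package. The sketch below is a hypothetical base-R formatter (si_label is an invented name; it is neither the OP's si_vec, which is not shown here, nor sitools::f2si) that uses >= at each threshold, so that exactly 1e6 is labelled "1M" instead of falling through to the kilo branch.

```r
# Hypothetical formatter for non-negative values; not the OP's si_vec
# and not sitools::f2si.
si_label <- function(x) {
  vapply(x, function(val) {
    if (is.na(val)) return("")
    # `>=` (not `>`) so that a value sitting exactly on a threshold
    # gets the larger prefix.
    if (val >= 1e9) return(paste0(format(val / 1e9, scientific = FALSE), "G"))
    if (val >= 1e6) return(paste0(format(val / 1e6, scientific = FALSE), "M"))
    if (val >= 1e3) return(paste0(format(val / 1e3, scientific = FALSE), "k"))
    format(val, scientific = FALSE)
  }, character(1))
}

si_label(c(1, 1000, 1e6))

# Upper-case "K" if that is what is wanted
toupper(si_label(1000))
```

One caveat on toupper(): with a formatter that also emits sub-unit prefixes, such as m for milli, upper-casing the whole label would turn m into M, so it is probably only safe when all values are 1 or larger.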

OP results (si_vec):

[Plot: x-axis labelled with the OP's si_vec function]

My results (f2si):

[Plot: x-axis labelled with sitools::f2si]


Update: Recent versions of the scales package include functionality to print readable labels.

In this case, label_bytes can be used:

library(ggplot2)
library(scales)

bytes <- 2^seq(0,20) + rnorm(21, 4, 2)

my_data <- data.frame(
    bytes=as.integer(bytes),
    time=bytes / (1e4 + rnorm(21, 100, 3)) + 8
)

ggplot(data=my_data, aes(x=bytes, y=time)) +
    geom_point() +
    geom_line() +
    scale_x_log10("Message Size [Byte]", labels=label_bytes()) +
    scale_y_continuous("Round-Trip-Time [us]")

[Plot: x-axis with SI byte labels from scales::label_bytes]

Or, if you prefer to have IEC units (KiB = 2^10, MiB = 2^20, ...), specify labels=label_bytes(units = "auto_binary"). The result looks very similar to the second plot in the original answer below.
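To compare the two unit systems outside of a plot, the labellers can be called directly on a vector of values. This is a small sketch assuming a version of scales recent enough to provide label_bytes; the variable names are only for illustration.

```r
library(scales)

# label_bytes() returns a function; the default uses SI units,
# units = "auto_binary" switches to IEC units.
si_labeller  <- label_bytes()
iec_labeller <- label_bytes(units = "auto_binary")

# Powers of 1024 are round numbers in IEC units but not in SI units.
values <- 1024^(0:2)

labels_si  <- si_labeller(values)
labels_iec <- iec_labeller(values)

data.frame(bytes = values, si = labels_si, iec = labels_iec)
```

Printing both columns side by side makes it easier to decide which convention suits the axis before wiring it into scale_x_log10.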


Original answer

For bytes there is gdata::humanReadable. humanReadable supports both SI prefixes (1000 Byte = 1 KB) as well as the binary prefixes defined by the IEC (1024 Byte = 1 KiB).

The function humanReadableLabs below passes custom parameters on to humanReadable and takes care of NA values:

humanReadableLabs <- function(...) {
    function(x) {
        sapply(x, function(val) {
            if (is.na(val)) {
                return("")
            } else {
                return(
                    humanReadable(val, ...)
                )
            }
        })
    }
}

Now it is straightforward to change the labels to use SI prefixes and "byte" as the unit:

library(ggplot2)
library(gdata)

bytes <- 2^seq(0,20) + rnorm(21, 4, 2)

my_data <- data.frame(
    bytes=as.integer(bytes),
    time=bytes / (1e4 + rnorm(21, 100, 3)) + 8
)

# humanReadableLabs() as defined above

ggplot(data=my_data, aes(x=bytes, y=time)) +
    geom_point() +
    geom_line() +
    scale_x_log10("Message Size [Byte]", labels=humanReadableLabs(standard="SI")) +
    scale_y_continuous("Round-Trip-Time [us]")

[Plot: x-axis with SI prefixes from humanReadableLabs(standard="SI")]

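IEC prefixes are plotted by omitting standard="SI". Note that the breaks should then be specified explicitly as well (for example at powers of 2) to get easily legible values, because the default log10 breaks fall on powers of 10, which are not round numbers in IEC units.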

ggplot(data=my_data, aes(x=bytes, y=time)) +
    geom_point() +
    geom_line() +
    scale_x_log10("Message Size [Byte]", labels=humanReadableLabs()) +
    scale_y_continuous("Round-Trip-Time [us]")

[Plot: x-axis with IEC prefixes from humanReadableLabs()]
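One possible way to get round IEC values is to place the breaks at powers of 2 that span the data. The helper below is only a sketch in base R: iec_breaks and its step argument are hypothetical, and it assumes strictly positive values.

```r
# Hypothetical helper: breaks at 2^(multiples of `step`) covering the
# range of x. Assumes all values in x are > 0.
iec_breaks <- function(x, step = 5) {
  rng <- range(x, na.rm = TRUE)
  lo <- floor(log2(rng[1]) / step) * step
  hi <- ceiling(log2(rng[2]) / step) * step
  2^seq(lo, hi, by = step)
}

iec_breaks(c(5, 2e6))

# Sketch of how it might be used with the plot above:
# scale_x_log10("Message Size [Byte]",
#               breaks = iec_breaks(my_data$bytes),
#               labels = humanReadableLabs())
```

With step = 5 every second break lands on a whole KiB or MiB boundary (2^10, 2^20); step = 10 would keep only those. Breaks that fall outside the plotted range may reach the labeller as NA, which is the case the NA check in humanReadableLabs is there for.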
