How to format a complex table for rmarkdown PDF output
This is very simple to do using the add_header_above command from the kableExtra package. You can add as many column groupings as you want. Here is what I would do:
library(knitr)
library(kableExtra)
d <- mtcars[1:5, 1:5]
kable(d, longtable = TRUE, booktabs = TRUE) %>%
  add_header_above(c(" ", "Group 1" = 2, "Group 2" = 3)) %>%
  add_header_above(c("", "Groups" = 5))
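If the spans should not be typed by hand, the vector passed to add_header_above() can be derived from one group label per data column. A minimal sketch; header_from_groups() is a hypothetical helper, and it assumes that columns belonging to the same group are adjacent:

```r
# Hypothetical helper: turn one group label per data column into the named
# colspan vector (label = number of columns) used by add_header_above().
# Assumes columns of the same group are adjacent.
header_from_groups <- function(groups, rownames_col = TRUE) {
  runs <- rle(as.character(groups))            # consecutive runs of equal labels
  spans <- setNames(runs$lengths, runs$values)
  if (rownames_col) spans <- c(" " = 1, spans) # blank cell above the row names
  spans
}

grp <- c("Group 1", "Group 1", "Group 2", "Group 2", "Group 2")
header_from_groups(grp)
# gives c(" " = 1, "Group 1" = 2, "Group 2" = 3)

# Usage with kableExtra (not run here):
# kable(d, booktabs = TRUE) %>% add_header_above(header_from_groups(grp))
```

This should reproduce the hard-coded first group row; the group labels can then come from the data instead of the chunk.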
Quoting this comment:
I'm looking for a way to do this programmatically from within the rmarkdown document without having to hard-code the formatting, so that it's reproducible and flexible.
The following solution uses a hard-coded "template", but the template can be filled with any data (provided it has the same 2x8 structure).
The generated table looks like this:
[Figure: the rendered table]
The full code is at the end of this answer.
The final table consists of 9 columns (one for the row labels plus 8 data columns), so the basic LaTeX structure is:
\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
% rest of table
\end{tabular}
However, it is convenient to fix the width of the cells. This is possible with the custom column type C (taken from this TeX.SE answer: https://tex.stackexchange.com/a/12712/37118), which allows centered content with a fixed width. Together with the more compact *{n}{...} syntax for repeating column types, this gives:
\begin{tabular}{|c *{8}{|C{1cm}}|}
% rest of table
\end{tabular}
(The first column is centered with flexible width, followed by 8 centered columns, each 1cm wide.)
Cells spanning multiple columns are created with \multicolumn. These cells should also have a fixed width so that long captions can break onto two lines. Note that it is a fallacy to assume that a cell spanning two 1cm columns should be 2cm wide: the two spanned columns have additional padding (\tabcolsep) on both sides of the vertical rule between them. Some measurement revealed that about 2.436cm delivers good results.
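The measured value is consistent with the standard LaTeX defaults (\tabcolsep = 6pt, \arrayrulewidth = 0.4pt): a cell spanning two columns covers both column widths, the right padding of the first column, the left padding of the second and the rule between them, i.e. 2cm + 2*6pt + 0.4pt. A small R sketch of that arithmetic; span_width_cm() is a hypothetical helper, and the defaults are assumptions that no longer hold if the document changes \tabcolsep or \arrayrulewidth:

```r
# Width of a \multicolumn cell spanning n fixed-width columns that are
# separated by vertical rules. Assumes the LaTeX defaults
# \tabcolsep = 6pt and \arrayrulewidth = 0.4pt.
pt_per_cm <- 72.27 / 2.54  # TeX points per centimetre

span_width_cm <- function(n, col_cm = 1, tabcolsep_pt = 6, rule_pt = 0.4) {
  # each of the n - 1 inner boundaries adds padding on both sides plus a rule
  inner_pt <- (n - 1) * (2 * tabcolsep_pt + rule_pt)
  n * col_cm + inner_pt / pt_per_cm
}

round(span_width_cm(2), 3)
#> [1] 2.436
```

The same function gives the width for cells spanning more than two columns, so the header does not need to be re-measured when the grouping changes.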
Remark on the first column: although \multicolumn{1}{...}{...} looks useless at first sight, it is useful for changing the column type (including the left/right borders) of a single cell. I used it to drop the leftmost vertical line in the first two rows.
\cline{x-y} draws a horizontal line that spans only columns x to y.
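When there are several column groups, the matching \cline ranges can be computed from the group spans instead of being typed. A sketch with a hypothetical helper, cline_for_groups(); first_col = 2 assumes the row-name column comes first, as in this table:

```r
# Hypothetical helper: one \cline per column group, given the number of
# columns each group spans and the column where the first group starts.
cline_for_groups <- function(spans, first_col = 2) {
  ends <- first_col - 1 + cumsum(spans)  # last column of each group
  starts <- ends - spans + 1             # first column of each group
  paste(sprintf("\\cline{%d-%d}", starts, ends), collapse = " ")
}

cat(cline_for_groups(8), "\n")             # the single line under "Predicted"
cat(cline_for_groups(c(2, 2, 2, 2)), "\n") # one line under each of the four groups
```

With a single span of 8 this reproduces the \cline{2-9} used in the header; the string is printed with cat() so that the doubled backslashes of the R literal appear as single ones.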
Taking these pieces together gives:
\begin{tabular}{|c *{8}{|C{1cm}}|} \cline{2-9}
\multicolumn{1}{c|}{} & \multicolumn{8}{c|}{\textbf{Predicted}} \\ \cline{2-9}
\multicolumn{1}{c|}{} & \multicolumn{2}{c|}{\textbf{Count}} & \multicolumn{2}{C{2.436cm}|}{\textbf{Overall Percent}} & \multicolumn{2}{C{2.436cm}|}{\textbf{Row \newline Percent}} & \multicolumn{2}{C{2.436cm}|}{\textbf{Column Percent}} \\ \hline
% rest of table
\end{tabular}
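The \multicolumn cells of the second header row can likewise be generated from vectors of labels and spans. A sketch with a hypothetical helper, multicol_cells(); unlike the hand-written header above, it gives every group a C{...} cell of the same width (the hand-written version uses a plain c| for Count), and the default width assumes each group spans two 1cm columns:

```r
# Hypothetical helper: build one header row of \multicolumn cells from group
# labels and their column spans. Backslashes are doubled because these are
# R string literals.
multicol_cells <- function(labels, spans, width = "2.436cm") {
  cells <- sprintf("\\multicolumn{%d}{C{%s}|}{\\textbf{%s}}",
                   as.integer(spans), width, labels)
  # empty first cell without a left border, above the row names
  paste(c("\\multicolumn{1}{c|}{}", cells), collapse = " & ")
}

row2 <- multicol_cells(c("Count", "Overall Percent", "Row \\newline Percent",
                         "Column Percent"), spans = rep(2, 4))
cat(row2, "\\\\ \\hline\n")
```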
Regarding the data, I dropped the last line of the code that generated the sample data in the question, which gives:
> x <- structure(c(34L, 6L, 9L, 35L), .Dim = c(2L, 2L), .Dimnames = structure(list(Actual = c("Fail", "Pass"), Predicted = c("Fail", "Pass")), .Names = c("Actual", "Predicted")), class = "table")
> x <- cbind(x, prop.table(x), prop.table(x, 1), prop.table(x,2))
> x[, -c(1,2)] <- sapply(x[,-c(1,2)], function(i) paste0(sprintf("%1.1f", i*100),"%"))
> x
Fail Pass Fail Pass Fail Pass Fail Pass
Fail "34" "9" "40.5%" "10.7%" "79.1%" "20.9%" "85.0%" "20.5%"
Pass "6" "35" "7.1%" "41.7%" "14.6%" "85.4%" "15.0%" "79.5%"
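Since the goal is reproducibility, the same steps can be wrapped in a function that accepts any 2x2 table of counts (for example table(actual, predicted)). A sketch; confusion_cells() is a hypothetical name, and it uses a vectorised sprintf() instead of sapply(), which should give the same strings as the console session above:

```r
# Hypothetical helper: build the 2x8 character matrix (counts, overall %,
# row %, column %) from a 2x2 table of counts.
confusion_cells <- function(tab) {
  stopifnot(all(dim(tab) == c(2, 2)))
  props <- cbind(prop.table(tab), prop.table(tab, 1), prop.table(tab, 2))
  # sprintf() is vectorised; matrix() restores the 2x6 shape
  # (both work column by column)
  pct <- matrix(paste0(sprintf("%1.1f", 100 * props), "%"),
                nrow = nrow(props), dimnames = dimnames(props))
  cbind(unclass(tab), pct)  # the integer counts are coerced to character here
}

tab <- as.table(matrix(c(34L, 6L, 9L, 35L), nrow = 2,
                       dimnames = list(Actual = c("Fail", "Pass"),
                                       Predicted = c("Fail", "Pass"))))
confusion_cells(tab)
```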
To set the column and row names in italics, apply
colnames(x) <- sprintf("\\emph{%s}", colnames(x)) # highlight colnames
rownames(x) <- sprintf("\\emph{%s}", rownames(x)) # highlight rownames
Then the following xtable code can be used:
print(xtable(x),
only.contents = TRUE,
comment = FALSE,
sanitize.colnames.function = identity,
sanitize.rownames.function = identity,
hline.after = 0:2)
The argument only.contents suppresses the enclosing tabular environment. Assigning the identity function to sanitize.colnames.function and sanitize.rownames.function means "don't sanitize". We need this because the column and row names contain LaTeX commands (\emph) whose special characters must not be escaped. hline.after = 0:2 adds a horizontal line after the column-name row and after each of the two data rows.
This output replaces the % rest of table placeholder from above.
Conceptually, the code uses xtable to generate only the column-name row and the data rows, while the two group-header rows are written by hand, because that is much easier than generating them.
Although the whole table header is "hard-coded", the data can be changed as required.
When writing LaTeX inside R strings, don't forget to escape every \ with a second \! Also, the following must be added to the LaTeX preamble through the header file (header.tex):
\usepackage{array}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} % https://tex.stackexchange.com/a/12712/37118
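Because the doubled backslashes are easy to get wrong, header.tex can also be written from R. A sketch; write_latex_header() is a hypothetical helper, and it is meant to be run once (e.g. from the console) before rendering, since I have not checked whether writing the file from a chunk of the same document is picked up in time:

```r
# Hypothetical helper: write the preamble additions to header.tex.
# Every LaTeX backslash is doubled because these are R string literals.
write_latex_header <- function(path = "header.tex") {
  writeLines(c(
    "\\usepackage{array}",
    # C column type from https://tex.stackexchange.com/a/12712/37118
    "\\newcolumntype{C}[1]{>{\\centering\\let\\newline\\\\\\arraybackslash\\hspace{0pt}}m{#1}}"
  ), path)
  invisible(path)
}

write_latex_header()
```

Reading the file back with readLines() is a quick way to confirm that each line matches the LaTeX shown above, with single backslashes.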
I wrapped all the elements outlined above in a function, PrintConfusionMatrix, that can be reused with any 2x8 matrix (or data frame) providing the data and the column/row names.
Full code:
---
output:
  pdf_document:
    keep_tex: yes
    includes:
      in_header: header.tex
---
```{r, echo = FALSE}
library(xtable)
# Sample data from question
x <- structure(c(34L, 6L, 9L, 35L), .Dim = c(2L, 2L), .Dimnames = structure(list(Actual = c("Fail", "Pass"), Predicted = c("Fail", "Pass")), .Names = c("Actual", "Predicted")), class = "table")
x <- cbind(x, prop.table(x), prop.table(x, 1), prop.table(x,2))
x[, -c(1,2)] <- sapply(x[,-c(1,2)], function(i) paste0(sprintf("%1.1f", i*100),"%"))
#x <- cbind(Actual=rownames(x), x) # dropped; better not to add row names to data
PrintConfusionMatrix <- function(data, ...) {
  # data: 2x8 matrix or data frame; further arguments are passed to print.xtable()
  stopifnot(all(dim(data) == c(2, 8)))
  colnames(data) <- sprintf("\\emph{%s}", colnames(data)) # highlight colnames
  rownames(data) <- sprintf("\\emph{%s}", rownames(data)) # highlight rownames
  # hand-written group headers; xtable supplies the column names and data rows
  cat('\\begin{tabular}{|c *{8}{|C{1cm}}|} \\cline{2-9}
\\multicolumn{1}{c|}{} & \\multicolumn{8}{c|}{\\textbf{Predicted}} \\\\ \\cline{2-9}
\\multicolumn{1}{c|}{} & \\multicolumn{2}{c|}{\\textbf{Count}} & \\multicolumn{2}{C{2.436cm}|}{\\textbf{Overall Percent}} & \\multicolumn{2}{C{2.436cm}|}{\\textbf{Row \\newline Percent}} & \\multicolumn{2}{C{2.436cm}|}{\\textbf{Column Percent}} \\\\ \\hline
\\textbf{Actual} ')
  print(xtable(data),
        only.contents = TRUE,
        comment = FALSE,
        sanitize.colnames.function = identity,
        sanitize.rownames.function = identity,
        hline.after = 0:2,
        ...)
  cat("\\end{tabular}")
}
```
```{r, results='asis'}
PrintConfusionMatrix(x)
```