Chapter 2 Figures

Tips for using ggplot to generate publication-quality graphs

2.1 Plot margins

2.1.1 Removing white space around the plot

To remove the white space around the plot, set all plot margins to 0. The arguments of margin() are ordered top, right, bottom, left. Theme settings are added to a plot with +.

figure + 
  theme(plot.margin = margin(0, 0, 0, 0))

Controlling plot margins is also useful when building multi-panel figures with the cowplot package. cowplot::plot_grid often overlays panel labels on top of the figures, so you can add space at the top of each panel:

plot_grid(g1 + theme(plot.margin = margin(t = 15)),
          g2 + theme(plot.margin = margin(t = 15)),
          nrow = 2)
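
As a minimal, self-contained sketch of the same idea: the plots g1 and g2 (built from the built-in mtcars data) and the labels "A" and "B" are illustrative assumptions, not part of the original example. The extra 15 points at the top leave room for the panel labels.

```r
library(ggplot2)
library(cowplot)

# Illustrative plots; any two ggplot objects would do.
g1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
g2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# margin(t = 15) adds space above the plot; the other sides default to 0.
top_space <- theme(plot.margin = margin(t = 15))

# Stack the two panels and let cowplot draw the labels in the freed space.
panels <- plot_grid(g1 + top_space,
                    g2 + top_space,
                    labels = c("A", "B"),
                    nrow = 2)
```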

2.1.2 Removing white space between axes and plot

Sometimes it’s useful to reduce the distance between the plot and axis text. You can do this by reducing the top margin of the x-axis text and the right margin of the y-axis text:

figure + 
  theme(axis.text.x = element_text(margin = margin(t = -5, r = 0, b = 0, l = 0)),
        axis.text.y = element_text(margin = margin(t = 0, r = -5, b = 0, l = 0)))

2.2 Axis labels

2.2.1 Removing axis labels

When removing axis labels (for example, when dealing with dates on the x-axis), use labs(x = NULL) rather than labs(x = ""), as this eliminates the extra white space.

figure + 
  labs(x = NULL)

2.3 Legends

2.3.1 Removing the legend title

Legend titles are usually redundant: if the figure is included in a paper or report, the figure title and axis labels give enough information; if the figure is in a presentation, the slide title and description, along with the axis labels, provide enough information to understand what is being plotted.

figure + 
  theme(legend.title = element_blank())

2.3.2 Removing black boxes around legend keys

Legend keys look much better without the black border surrounding them. This should be a standard for any figure.

figure + 
  theme(legend.key = element_blank())

2.3.3 Putting the legend inside the figure

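This is useful when there is a lot of blank space in the figure. The two numbers are relative x and y coordinates, where c(0, 0) is the bottom-left corner and c(1, 1) is the top-right corner.

figure + 
  theme(legend.position = c(0.75, 0.85))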
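
Recent ggplot2 releases (3.5.0 and later, to the best of our knowledge) deprecate the numeric form of legend.position in favor of legend.position = "inside" combined with legend.position.inside. The sketch below picks the form that matches the installed version; the example plot built from mtcars and the object names are assumptions made for illustration.

```r
library(ggplot2)

# Illustrative plot with a colour legend.
figure <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()

# Choose the theme syntax according to the installed ggplot2 version.
inside_legend <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside",
        legend.position.inside = c(0.75, 0.85))
} else {
  theme(legend.position = c(0.75, 0.85))
}

figure_inside <- figure + inside_legend
```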

2.4 Number formatting functions

You can save these functions in a script called number_functions.R and import them in each script where they’re needed, e.g.:

source(here("scripts", "programs", "number_functions.R"))

2.4.1 Calculating the mean, median and standard deviation of a variable

# Each helper takes a data frame, a column name as a string,
# and the number of digits to round to.

# Mean
num_mean <- function(df, variable, dig = 1) {
  df[[variable]] %>% 
    mean(na.rm = TRUE) %>% 
    round(digits = dig)
}

# Median
num_median <- function(df, variable, dig = 1) {
  df[[variable]] %>% 
    median(na.rm = TRUE) %>% 
    round(digits = dig)
}

# Standard deviation
num_sd <- function(df, variable, dig = 1) {
  df[[variable]] %>% 
    sd(na.rm = TRUE) %>% 
    round(digits = dig)
}

You can call these functions the following way:

df %>% num_mean("number_employees")
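
For a concrete run, here is a small sketch with a made-up data frame. The firms data and its values are hypothetical, and the three helpers are repeated in compact form only so that the example runs on its own.

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
firms <- tibble(number_employees = c(10, 20, NA, 45, 25))

# Compact stand-ins for the helpers defined above.
num_mean <- function(df, variable, dig = 1) {
  round(mean(df[[variable]], na.rm = TRUE), digits = dig)
}
num_median <- function(df, variable, dig = 1) {
  round(median(df[[variable]], na.rm = TRUE), digits = dig)
}
num_sd <- function(df, variable, dig = 1) {
  round(sd(df[[variable]], na.rm = TRUE), digits = dig)
}

firms %>% num_mean("number_employees")         # 25
firms %>% num_median("number_employees")       # 22.5
firms %>% num_sd("number_employees", dig = 2)  # 14.72
```

The missing value is dropped by na.rm = TRUE, so all three statistics are computed on the four observed values.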

2.4.2 Checking if a number is an integer

This is used in the functions that print numbers to .tex files, since no decimals should be added after integers.

num_int <- function(x) {
  x == round(x)
}
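
A few calls show how the check behaves. Because it relies on exact equality, a value that is an integer only up to floating-point error is reported as a non-integer. The tolerance-based variant num_int_tol below is a hypothetical alternative, not one of the original functions.

```r
num_int <- function(x) {
  x == round(x)
}

# Hypothetical variant that accepts values within a small tolerance of an integer.
num_int_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

num_int(3)              # TRUE
num_int(3.5)            # FALSE
num_int(sqrt(2)^2)      # FALSE: the result differs from 2 by floating-point error
num_int_tol(sqrt(2)^2)  # TRUE
```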

2.4.3 Number formatting function

This function formats numbers with a standard number of digits and commas to present large and small numbers in a more readable format.

How does this function work? First, it calculates how many digits the number should have to the right of the decimal point: three for numbers from 1 up to (but not including) 10, two for numbers from 10 up to 100, one for numbers from 100 up to 1,000, and none for numbers equal to or larger than 1,000. The maximum number of right digits is 3, so 0.0001 displays as 0.000. Zero and negative inputs are treated as 1 when the digits are counted, so they also get three right digits. Once the number of right digits is defined, the number is rounded and formatted. Numbers smaller than 1 are padded if necessary to ensure that there are 3 right digits (e.g., 0.25 is formatted as 0.250). The default number of right digits can be overridden with the option override_right_digits. The function expects a single number, so apply it element by element when formatting a vector.
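
The right-digit rule can be checked in isolation. The helper right_digits_for below is hypothetical: it only mirrors the first step of the function under the rules described here, and is not part of the original code.

```r
# Hypothetical helper: number of digits to the right of the decimal point.
right_digits_for <- function(x) {
  # Zero and negative inputs are treated as 1, as in the full function.
  num <- if (x <= 0) 1 else x
  # 3 minus the order of magnitude, clamped to the range 0 to 3.
  min(3, max(0, 3 - floor(log10(abs(num)))))
}

sapply(c(0.0001, 0.25, 5, 45.968, 250, 1949), right_digits_for)
## [1] 3 3 3 2 1 0
```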

This function has the same name as scales::comma_format. However, scales::comma_format has been superseded by scales::label_comma, so the user-defined function taking priority over scales::comma_format should rarely cause problems.
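
If the scales formatter is still needed in the same script, one option is to call it with an explicit namespace prefix, which is unaffected by any user-defined comma_format. This is a minimal sketch that assumes the scales package is installed; the name scales_comma is chosen for illustration.

```r
# label_comma() returns a formatting function; the scales:: prefix
# avoids any clash with a user-defined comma_format.
scales_comma <- scales::label_comma()

scales_comma(c(1949, 1234567))
## [1] "1,949"     "1,234,567"
```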

# Requires stringr (str_replace_all, str_pad, fixed); x must be a single number
comma_format <- function(x, override_right_digits = NA) {
  # Calculate number of right digits (zero and negative inputs are treated as 1)
  if (x <= 0) {num <- 1} else {num <- x}
  right_digits <- 3 - floor(log10(abs(num)))
  if (right_digits < 0) {right_digits <- 0}
  if (right_digits > 3) {right_digits <- 3}
  if (!is.na(override_right_digits)) {right_digits <- override_right_digits}
  # Calculate number of left digits
  left_digits <- 4 + floor(log10(abs(num)))
  if (left_digits <= 0) {left_digits <- 1}
  # Format number
  proc_num <- format(round(x, right_digits), nsmall = right_digits, digits = left_digits, big.mark = ",")
  # Pad numbers below 1 with trailing zeros. str_replace_all removes every
  # thousands separator, so as.numeric() also works for values of 1,000,000 or more
  if (proc_num != "0" && as.numeric(str_replace_all(proc_num, fixed(","), "")) < 1) {
    proc_num <- str_pad(proc_num, right_digits + 2, "right", "0")
  }
  return(proc_num)
}

Here are a few examples of the output of this function:

pacman::p_load(tidyverse)

lapply(c(0.098, 0.11, 3.1233, 45.968, 1949), comma_format) %>% unlist()
## [1] "0.098" "0.110" "3.123" "45.97" "1,949"

2.5 Multiplot figures

2.5.1 Auxiliary functions to remove figure components: axes, legend

Functions:

remove_axis_x <- function(plot) {
  plot + 
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank())
}
remove_axis_y <- function(plot) {
  plot + 
    theme(axis.title.y = element_blank(),
          axis.text.y = element_blank())
}
remove_legend <- function(plot) {
  plot +
    theme(legend.position = "none")
}

Usage: Let g1, g2 and g3 be three figures with the same x-axis. We can plot the three figures in one multi-panel figure, removing the legend and x-axis from all but the bottom figure. Because the panels are stacked in a single column, they are aligned vertically along their left and right axes.

plot_grid(g1 %>% remove_legend() %>% remove_axis_x(), 
          g2 %>% remove_legend() %>% remove_axis_x(), 
          g3, 
          rel_heights = c(1, 1, 1),
          nrow = 3,
          axis = "lr", 
          align = "v")
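
For a version that runs end to end, the sketch below builds three illustrative figures from the built-in mtcars data (the data, the variables and the bottom legend position are assumptions) and repeats two of the helpers so that it is self-contained. It stacks the panels in one column and aligns them vertically with align = "v" and axis = "lr", a common choice for single-column layouts in cowplot.

```r
library(ggplot2)
library(cowplot)

# Helpers repeated from above so the example runs on its own.
remove_axis_x <- function(plot) {
  plot + theme(axis.title.x = element_blank(),
               axis.text.x = element_blank())
}
remove_legend <- function(plot) {
  plot + theme(legend.position = "none")
}

# Illustrative figures that share the same x-axis variable.
base <- ggplot(mtcars, aes(x = wt, colour = factor(cyl)))
g1 <- base + geom_point(aes(y = mpg))
g2 <- base + geom_point(aes(y = hp))
g3 <- base + geom_point(aes(y = qsec)) + theme(legend.position = "bottom")

# Only the bottom panel keeps its x-axis and legend.
multi <- plot_grid(remove_axis_x(remove_legend(g1)),
                   remove_axis_x(remove_legend(g2)),
                   g3,
                   ncol = 1,
                   align = "v",
                   axis = "lr")
```

Since the bottom panel also carries the axis text and the legend, it may help to give it a larger share of the height through rel_heights.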