Chapter 2 Figures
Tips for using ggplot to generate publication-quality graphs
2.1 Plot margins
2.1.1 Removing white space around the plot
To remove the white space around the plot, set all plot margins to 0. The arguments of margin() are given in the order top, right, bottom, left.
figure + theme(plot.margin = margin(0, 0, 0, 0))
This setting is useful when working with the cowplot package to generate multi-panel figures. cowplot::plot_grid can draw panel labels (set with its labels argument) on top of the figures, so you may want to add space to the top of each figure:
plot_grid(g1 + theme(plot.margin = margin(t = 15)),
          g2 + theme(plot.margin = margin(t = 15)),
          nrow = 2)
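Here is a self-contained sketch of the same idea. The two plots are illustrative and built from the built-in mtcars data. labels = "AUTO" asks plot_grid to draw the panel labels A and B, and those labels are why the extra top margin is needed. Because margin(t = 15) leaves the other three sides at their default of 0, this also removes the right, bottom and left margins.

```r
library(ggplot2)
library(cowplot)

# Two illustrative plots from the built-in mtcars data set
g1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
g2 <- ggplot(mtcars, aes(x = wt, y = hp)) + geom_point()

# Add 15 pt of space above each panel so that the "A" and "B" labels
# drawn by plot_grid do not overlap the plot contents
fig <- plot_grid(g1 + theme(plot.margin = margin(t = 15)),
                 g2 + theme(plot.margin = margin(t = 15)),
                 labels = "AUTO",
                 nrow = 2)
# Display with print(fig) or save with ggsave()
```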
2.1.2 Removing white space between axes and plot
Sometimes it’s useful to reduce the distance between the plot and axis text. You can do this by reducing the top margin of the x-axis text and the right margin of the y-axis text:
figure + theme(axis.text.x = element_text(margin = margin(t = -5, r = 0, b = 0, l = 0)),
               axis.text.y = element_text(margin = margin(t = 0, r = -5, b = 0, l = 0)))
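If the same adjustment is applied to many figures, one option is to wrap it in a small function with the gap as a parameter. tighten_axis_text is a hypothetical helper, not part of ggplot2. It applies the same margins as the snippet above, because margin() defaults the unspecified sides to 0.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): move the axis tick labels
# towards the panel by `gap` points; negative values shrink the space
tighten_axis_text <- function(plot, gap = -5) {
  plot +
    theme(axis.text.x = element_text(margin = margin(t = gap)),
          axis.text.y = element_text(margin = margin(r = gap)))
}

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p_tight <- tighten_axis_text(p, gap = -3)
```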
2.3 Legends
2.3.1 Removing the legend title
Legend titles are always redundant. If the figure is included in a paper or report, the figure title and axis labels give enough information. If the figure is in a presentation, the slide title and description, along with the axis labels, are enough to understand what is being plotted.
figure + theme(legend.title = element_blank())
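theme(legend.title = element_blank()) removes the title of every legend in the plot. When a plot has several legends and only one title is redundant, you can switch the title off for a single aesthetic with guides() instead. The sketch below assumes a plot that maps both colour and shape, built from the built-in mtcars data.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(cyl), shape = factor(am))) +
  geom_point() +
  labs(shape = "Transmission") +
  # Hide the title of the colour legend only; the shape legend keeps its title
  guides(colour = guide_legend(title = NULL))
```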
2.4 Number formatting functions
You can save these functions in a script called number_functions.R and load them with source() in each script where they are needed, e.g.:
library(here)
source(here("scripts", "programs", "number_functions.R"))
2.4.1 Calculating the mean, median and standard deviation of a variable
library(dplyr)  # provides %>% and pull()
# Mean
num_mean <- function(df, variable, dig = 1) {
  df %>%
    pull(eval(as.name(variable))) %>%
    mean(na.rm = TRUE) %>%
    round(digits = dig)
}
# Median
num_median <- function(df, variable, dig = 1) {
  df %>%
    pull(eval(as.name(variable))) %>%
    median(na.rm = TRUE) %>%
    round(digits = dig)
}
# Standard deviation
num_sd <- function(df, variable, dig = 1) {
  df %>%
    pull(eval(as.name(variable))) %>%
    sd(na.rm = TRUE) %>%
    round(digits = dig)
}
You can call these functions the following way:
df %>% num_mean("number_employees")
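The three functions differ only in the summary function they apply. One possible refactoring, shown here as a sketch, passes that function as an argument. num_stat is a hypothetical name, and df[[variable]] is used in place of pull() to extract the column by its name. The small data set is illustrative.

```r
# Hypothetical generalisation of num_mean / num_median / num_sd:
# `fun` is any summary function that accepts an na.rm argument
num_stat <- function(df, variable, fun, dig = 1) {
  round(fun(df[[variable]], na.rm = TRUE), digits = dig)
}

# Illustrative data with one missing value
df <- data.frame(number_employees = c(10, 20, NA, 45))

num_stat(df, "number_employees", mean)    # 25
num_stat(df, "number_employees", median)  # 20
num_stat(df, "number_employees", sd)      # 18
```

Since df is the first argument, the helper can also be used in a pipe, e.g. df %>% num_stat("number_employees", mean).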
2.4.2 Checking if a number is an integer
This check is used by the functions that print numbers to .tex files, since integers should be printed without decimals.
num_int <- function(x) {
  x == round(x)
}
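x == round(x) is an exact comparison. A value that should be a whole number but carries floating-point error therefore fails the check; the classic R example is sqrt(2)^2. If that matters for your numbers, a tolerance-based variant is one option. num_int_tol and its default tolerance are assumptions, not part of the original script.

```r
# Exact check, as above
num_int <- function(x) {
  x == round(x)
}

# Hypothetical tolerance-based variant: treats x as an integer when it lies
# within `tol` of the nearest whole number
num_int_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

num_int(sqrt(2)^2)      # FALSE: sqrt(2)^2 is stored as 2.0000000000000004
num_int_tol(sqrt(2)^2)  # TRUE
num_int_tol(c(3, 3.5))  # TRUE FALSE
```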
2.4.3 Number formatting function
This function formats numbers with a standard number of digits and commas to present large and small numbers in a more readable format.
How does this function work? First, the function calculates how many digits to show to the right of the decimal point: three digits for numbers from 1 up to (but not including) 10, two for numbers below 100, one for numbers below 1,000, and none for numbers of 1,000 or more. The maximum number of right digits is 3, so numbers smaller than 1 also get three right digits; for example, 0.0001 is displayed as 0.000. Zero and negative numbers are treated like 1 when counting digits. Once the number of right digits is defined, the number is formatted. Numbers smaller than 1 are padded with trailing zeros if necessary to ensure three right digits (e.g., 0.25 is formatted as 0.250). The default number of right digits can be overridden with the argument override_right_digits.
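The rule can be written compactly as follows. Here x̃ = 1 if x ≤ 0 and x̃ = x otherwise, as in the code of comma_format. d_right is the number of digits after the decimal point, unless override_right_digits is supplied. d_left is passed to format() as its digits argument.

```latex
d_{\text{right}} = \min\bigl(3,\ \max\bigl(0,\ 3 - \lfloor \log_{10} \tilde{x} \rfloor\bigr)\bigr),
\qquad
d_{\text{left}} = \max\bigl(1,\ 4 + \lfloor \log_{10} \tilde{x} \rfloor\bigr)
```

For example, for x = 45.968, ⌊log10 45.968⌋ = 1. This gives d_right = 2 and d_left = 5, so the number is shown as 45.97.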
This function has the same name as scales::comma_format. However, scales::comma_format has been superseded by scales::label_comma, so masking it with this user-defined function should not cause problems; code that needs the scales version can still call it explicitly as scales::comma_format.
library(stringr)  # provides str_remove_all(), fixed() and str_pad()
comma_format <- function(x, override_right_digits = NA) {
  # Calculate number of right digits (zero and negative numbers are treated like 1)
  if (x <= 0) {num <- 1} else {num <- x}
  right_digits <- 3 - floor(log10(abs(num)))
  if (right_digits < 0) {right_digits <- 0}
  if (right_digits > 3) {right_digits <- 3}
  if (!is.na(override_right_digits)) {right_digits <- override_right_digits}
  # Calculate number of left digits (passed to format() as significant digits)
  left_digits <- 4 + floor(log10(abs(num)))
  if (left_digits <= 0) {left_digits <- 1}
  # Format number with a comma as the thousands separator
  proc_num <- format(round(x, right_digits), nsmall = right_digits, digits = left_digits, big.mark = ",")
  # Pad numbers below 1 with trailing zeros; all commas are removed before the numeric comparison
  if (proc_num != "0" && as.numeric(str_remove_all(proc_num, fixed(","))) < 1) {
    proc_num <- str_pad(proc_num, right_digits + 2, "right", "0")
  }
  return(proc_num)
}
Here are a few examples of the output of this function:
pacman::p_load(tidyverse)
lapply(c(0.098, 0.11, 3.1233, 45.968, 1949), comma_format) %>% unlist()
## [1] "0.098" "0.110" "3.123" "45.97" "1,949"
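comma_format handles one number at a time, because its if (x <= 0) test needs a single value. That is why the example maps it over the vector with lapply. The sketch below wraps it with vapply so that it returns a character vector directly. It also uses num_int to drop the decimals of whole numbers, as the .tex-writing functions mentioned in 2.4.2 are described as doing. format_numbers is a hypothetical name, and the integer rule is an assumption about how those functions combine the two helpers. comma_format is repeated in condensed form so the block runs on its own.

```r
library(stringr)

# Condensed copy of comma_format (same digit rules as above)
comma_format <- function(x, override_right_digits = NA) {
  num <- if (x <= 0) 1 else x
  right_digits <- min(max(3 - floor(log10(abs(num))), 0), 3)
  if (!is.na(override_right_digits)) right_digits <- override_right_digits
  left_digits <- max(4 + floor(log10(abs(num))), 1)
  proc_num <- format(round(x, right_digits), nsmall = right_digits,
                     digits = left_digits, big.mark = ",")
  # Remove every comma so numbers >= 1,000,000 convert cleanly
  if (proc_num != "0" && as.numeric(str_remove_all(proc_num, fixed(","))) < 1) {
    proc_num <- str_pad(proc_num, right_digits + 2, "right", "0")
  }
  proc_num
}

num_int <- function(x) {
  x == round(x)
}

# Hypothetical vectorised wrapper: whole numbers are printed without decimals
format_numbers <- function(x) {
  vapply(x, function(v) {
    if (num_int(v)) comma_format(v, override_right_digits = 0) else comma_format(v)
  }, character(1))
}

format_numbers(c(0.11, 45, 45.968, 1949))
# "0.110" "45" "45.97" "1,949"
```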
2.5 Multiplot figures
2.5.1 Auxiliary functions to remove figure components: axes, legend
Functions:
remove_axis_x <- function(plot) {
  plot +
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank())
}
remove_axis_y <- function(plot) {
  plot +
    theme(axis.title.y = element_blank(),
          axis.text.y = element_blank())
}
remove_legend <- function(plot) {
  plot + theme(legend.position = "none")
}
Usage:
Let g1, g2 and g3 be three figures with the same x-axis. We can combine them into one multi-panel figure, removing the legend and x-axis from all but the bottom figure.
plot_grid(g1 %>% remove_legend() %>% remove_axis_x(),
          g2 %>% remove_legend() %>% remove_axis_x(),
          g3,
          rel_heights = c(1, 1, 1),
          nrow = 3,
          # stacked panels are aligned vertically, matching their left and right edges
          axis = "lr",
          align = "v")
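Here is a self-contained version of the usage above. The three plots are illustrative: they are built from mtcars and share the x variable wt. Two choices in this sketch are assumptions rather than part of the original. First, the bottom plot's legend is moved below the panel, so all three panels keep the same width. Second, it uses align = "v" with axis = "lr", which lines up the left and right edges of panels stacked in a single column even when their y-axis labels differ in width.

```r
library(ggplot2)
library(cowplot)
library(magrittr)  # provides %>%

remove_axis_x <- function(plot) {
  plot +
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank())
}
remove_legend <- function(plot) {
  plot + theme(legend.position = "none")
}

# Three illustrative plots sharing the x variable (car weight)
base <- ggplot(mtcars, aes(x = wt, colour = factor(cyl))) +
  labs(colour = "Cylinders")
g1 <- base + geom_point(aes(y = mpg))
g2 <- base + geom_point(aes(y = hp))
g3 <- base + geom_point(aes(y = qsec))

fig <- plot_grid(g1 %>% remove_legend() %>% remove_axis_x(),
                 g2 %>% remove_legend() %>% remove_axis_x(),
                 g3 + theme(legend.position = "bottom"),
                 nrow = 3,
                 align = "v",
                 axis = "lr")
```

With the legend at the bottom, the lowest panel is slightly shorter than the others. rel_heights can be adjusted to compensate.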