I decided to write this post in English, as I need to practice in writing more.

As I currently do some meta-analytic stuff, I needed to get a proper plot of the results of analysis. The existing solutions are hard to customize, so I decided to do something by myself. Basically, the forest plot is a errorbar plot of effect sizes, plus a list of studies, plus some other stuff, usually a list of effect sizes.

Here is the example code from metafor package.

Basic plot:


library(metafor)
### load BCG vaccine data
data(dat.bcg)

res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, 
    data = dat.bcg, measure = "RR", slab = paste(author, 
        year, sep = ", "), method = "REML")
par(bg = "white")
forest(res, slab = paste(dat.bcg$author, dat.bcg$year, 
    sep = ", "), xlim = c(-16, 6), at = log(c(0.05, 
    0.25, 1, 4)), atransf = exp, ilab = cbind(dat.bcg$tpos, 
    dat.bcg$tneg, dat.bcg$cpos, dat.bcg$cneg), ilab.xpos = c(-9.5, 
    -8, -6, -4.5), cex = 0.75, ylim = c(-1, 27), order = order(dat.bcg$alloc), 
    rows = c(3:4, 9:15, 20:23), mlab = "RE Model for All Studies")

Forest plot from metafor without annotations

Plot with grouping:

par(bg = "white")
forest(res, slab = paste(dat.bcg$author, dat.bcg$year, 
    sep = ", "), xlim = c(-16, 6), at = log(c(0.05, 
    0.25, 1, 4)), atransf = exp, ilab = cbind(dat.bcg$tpos, 
    dat.bcg$tneg, dat.bcg$cpos, dat.bcg$cneg), ilab.xpos = c(-9.5, 
    -8, -6, -4.5), cex = 0.75, ylim = c(-1, 27), order = order(dat.bcg$alloc), 
    rows = c(3:4, 9:15, 20:23), mlab = "RE Model for All Studies")

op <- par(cex = 0.75, font = 4)
text(-16, c(24, 16, 5), c("Systematic Allcoation", 
    "Random Allocation", "Alternate Allocation"), pos = 4)
par(font = 2)
text(c(-9.5, -8, -6, -4.5), 26, c("TB+", "TB-", "TB+", 
    "TB-"))
text(c(-8.75, -5.25), 27, c("Vaccinated", "Control"))
text(-16, 26, "Author(s) and Year", pos = 4)
text(6, 26, "Relative Risk [95% CI]", pos = 2)
par(op)
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, 
    data = dat.bcg, measure = "RR", subset = (alloc == 
        "systematic"), method = "REML")
addpoly(res, row = 18.5, cex = 0.75, atransf = exp, 
    mlab = "RE Model for Subgroup")
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, 
    data = dat.bcg, measure = "RR", subset = (alloc == 
        "random"), method = "REML")
addpoly(res, row = 7.5, cex = 0.75, atransf = exp, 
    mlab = "RE Model for Subgroup")
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, 
    data = dat.bcg, measure = "RR", subset = (alloc == 
        "alternate"), method = "REML")
addpoly(res, row = 1.5, cex = 0.75, atransf = exp, 
    mlab = "RE Model for Subgroup")

Forest plot from metafor with annotations

I see two problems with this code. First, many things are done manually. You have to set all coordinates for groups by hand. It is OK, if you have three groups, but what if there are more? Second, I don't like basic plot function for its lack of flexibility.

So the following coide is aimed at reproduction of this plot using ggplot2. It won't be an exact reproduction, but it is quite easy to add neccessary parts once you've got the idea.

First, I load neccessary libraries:

library(extrafont)  #Open Sans font for plot

loadfonts("postscript", quiet = T)

library(metafor)  #To compute weights and resulting effect sizes
library(xtable)  #xtable is used for creating HTML tables in this post

library(devtools)  #To use latest version of ggplot2
dev_mode(T)

library(ggplot2)
library(gridExtra)  #to set up plot grid
library(stringr)  #string formatting functions
library(divisors)  #this one will be used to set scale range and breaks
library(plyr)  #rbind.fill function

library(reshape2)  #transformation of tables

Now the main part begins. I start with computing ESs, corresponding SEs and confidence intervals.

data(dat.bcg)

### calculate log relative risks and
### corresponding sampling variances
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, 
    ci = cpos, di = cneg, data = dat.bcg, append = TRUE)
dat$se <- sqrt(dat$vi)
dat <- transform(dat, ci.lower = yi - 1.96 * se, ci.upper = yi + 
    1.96 * se)

print(xtable(dat[, c("author", "year", "alloc", "yi", 
    "vi", "se", "ci.lower", "ci.upper")]), html.table.attributes = NULL, 
    type = "HTML")

author year alloc yi vi se ci.lower ci.upper
1 Aronson 1948 random -0.89 0.33 0.57 -2.01 0.23
2 Ferguson &amp Simes 1949 random -1.59 0.19 0.44 -2.45 -0.72
3 Rosenthal et al 1960 random -1.35 0.42 0.64 -2.61 -0.08
4 Hart &amp Sutherland 1977 random -1.44 0.02 0.14 -1.72 -1.16
5 Frimodt-Moller et al 1973 alternate -0.22 0.05 0.23 -0.66 0.23
6 Stein &amp Aronson 1953 alternate -0.79 0.01 0.08 -0.95 -0.62
7 Vandiviere et al 1973 random -1.62 0.22 0.47 -2.55 -0.70
8 TPT Madras 1980 random 0.01 0.00 0.06 -0.11 0.14
9 Coetzee &amp Berjak 1968 random -0.47 0.06 0.24 -0.94 -0.00
10 Rosenthal et al 1961 systematic -1.37 0.07 0.27 -1.90 -0.84
11 Comstock et al 1974 systematic -0.34 0.01 0.11 -0.56 -0.12
12 Comstock &amp Webster 1969 systematic 0.45 0.53 0.73 -0.98 1.88
13 Comstock et al 1976 systematic -0.02 0.07 0.27 -0.54 0.51

Now the trick that will be used for grouping. I make an additional row in our data frame for each group:

for (i in unique(dat$alloc)) {
    dat <- rbind.fill(dat, data.frame(year = 0, alloc = i, 
        stringsAsFactors = F))
}

print(xtable(dat[, c("author", "year", "alloc", "yi", 
    "vi", "se", "ci.lower", "ci.upper")]), html.table.attributes = NULL, 
    type = "HTML")

author year alloc yi vi se ci.lower ci.upper
1 Aronson 1948.00 random -0.89 0.33 0.57 -2.01 0.23
2 Ferguson &amp Simes 1949.00 random -1.59 0.19 0.44 -2.45 -0.72
3 Rosenthal et al 1960.00 random -1.35 0.42 0.64 -2.61 -0.08
4 Hart &amp Sutherland 1977.00 random -1.44 0.02 0.14 -1.72 -1.16
5 Frimodt-Moller et al 1973.00 alternate -0.22 0.05 0.23 -0.66 0.23
6 Stein &amp Aronson 1953.00 alternate -0.79 0.01 0.08 -0.95 -0.62
7 Vandiviere et al 1973.00 random -1.62 0.22 0.47 -2.55 -0.70
8 TPT Madras 1980.00 random 0.01 0.00 0.06 -0.11 0.14
9 Coetzee &amp Berjak 1968.00 random -0.47 0.06 0.24 -0.94 -0.00
10 Rosenthal et al 1961.00 systematic -1.37 0.07 0.27 -1.90 -0.84
11 Comstock et al 1974.00 systematic -0.34 0.01 0.11 -0.56 -0.12
12 Comstock &amp Webster 1969.00 systematic 0.45 0.53 0.73 -0.98 1.88
13 Comstock et al 1976.00 systematic -0.02 0.07 0.27 -0.54 0.51
14 0.00 random
15 0.00 alternate
16 0.00 systematic

Then a title factor is created, which will be used for labels. Also I make a numeric version of this factor, which will be used for positioning of rows in the plot.

dat$title <- factor(paste(dat$alloc, dat$year, sep = ", "))
dat$titleN <- as.numeric(dat$title)

Allocation is converted to factor with alternate being the reference level.

dat$alloc <- factor(dat$alloc, levels = c("alternate", 
    "random", "systematic"), labels = c("Alternate", 
    "Random", "Systematic"))

Next I create regression model to get summary effects and extract weights for each study. These weights are in turn appended to the dat data frame.

res.rdat.noi <- rma(data = dat[!is.na(dat$yi), ], yi = yi, 
    vi = vi, mods = ~alloc - 1, slab = title, method = "REML")
res.rdat.dat <- data.frame(b = res.rdat.noi$b, SE.b = res.rdat.noi$se, 
    alloc = levels(dat$alloc))

res.rdat.w <- data.frame(weights(res.rdat.noi))
names(res.rdat.w) <- "w"
res.rdat.w$title <- rownames(res.rdat.w)

dat <- merge(dat, res.rdat.w, by = "title", all = T)
print(xtable(dat[, c("title", "author", "year", "alloc", 
    "yi", "vi", "se", "ci.lower", "ci.upper", "w")]), 
    type = "HTML")

Now I compute effect sizes scale range and a nice number of breaks (from 8 to 12):

rng <- range(na.omit(c(dat$ci.lower, dat$ci.upper)))

floor.dec <- function(x, digits) {
    r <- round(x, digits)
    r <- ifelse(r > x, r - 10^(-digits), r)
    r
}
ceiling.dec <- function(x, digits) {
    r <- round(x, digits)
    r <- ifelse(r < x, r + 10^(-digits), r)
    r
}
rng[1] <- floor.dec(rng[1], 1)
rng[2] <- ceiling.dec(rng[2], 1)

scale.rng <- diff(rng * 10)

divs = divisors(scale.rng)$divs

while (!sum(ifelse(divs > 6 & divs < 14, 1, 0))) {
    scale.rng = scale.rng + 1
    divs = divisors(scale.rng)$divs
}

div <- divs[divs > 6 & divs < 14][1]
scale.rng <- scale.rng/10

xstart=-(max(dat$titleN)+2+length(levels(res.rdat.dat$alloc)))
xend=4

Finally, the plotting part begins. I need almost empty plot, with the exception of y axis. Red grid is used for debugging purposes, it is disabled with the next statement.

plotSettings <- theme(legend.position = "none", axis.line.y = element_blank(), 
    axis.ticks.y = element_blank(), plot.margin = unit(c(0, 
        0, 0, 0), "npc"), panel.margin = unit(c(0, 
        0, 0, 0), "npc"), rect = element_blank(), axis.line = element_line(colour = "black", 
        size = 0.3, linetype = 1), axis.title.x = element_blank(), 
    axis.title.y = element_blank(), panel.grid.major = element_line(color = "red"), 
    axis.text.y = element_blank())

# you can comment the next string for debug
plotSettings <- plotSettings + theme(panel.grid = element_blank(), 
    axis.text.y = element_blank(), axis.ticks.y = element_blank(), 
    panel.grid.major = element_blank())

# I set position_dodge here, but it is
# unneccessary, if you don't use grouping factor
# as in this example. For the same reason there
# are two values for each scale in the following
# plot.
pd <- position_dodge(width = 0.7)

Main part of the plot. The inverted titleN scale is used.


mainPart<-ggplot(dat, aes(x=-titleN,y=yi, ymin=ci.lower, ymax=ci.upper, group=1)) +
  scale_y_continuous(name=NULL, breaks=seq(rng[1], rng[1]+scale.rng, scale.rng/div), limits=c(rng[1], rng[1]+scale.rng), expand=c(0,0)) + ylab(NULL) +
  geom_segment(aes(x=xstart, xend=0, y=0, yend=0), linetype=3, alpha=0.01) +
  geom_vline(x= c(xstart+length(levels(res.rdat.dat$alloc))+1,0), alpha=0.5, linetype=5) + 
  geom_linerange(aes(linetype="1"),position=pd) +
  geom_point(aes(size=w, shape="1"), fill="white",position=pd) +
  geom_linerange( data=res.rdat.dat, aes(x=xstart+length(levels(res.rdat.dat$alloc)):1, y=b, linetype="1", ymin=b-1.96*SE.b, ymax=b+1.96*SE.b), position=pd) +
  geom_point( data=res.rdat.dat, aes(x=xstart+length(levels(res.rdat.dat$alloc)):1, y=b, shape="1", ymin=b-1.96*SE.b, ymax=b+1.96*SE.b), position=pd,  size=2, fill="white") +
  coord_flip() +
  scale_size_continuous(range=c(2,4)) +
  scale_shape_manual(values=c(22,21))+
  scale_linetype_manual(values=c(1,2))+
  plotSettings + 
  scale_x_continuous(limits=c(xstart,xend), expand=c(0,0))+xlab(NULL)+
  annotate("text", x=1, y=0, label='paste("Effect size (",italic("d"),") with 95% CI")', parse=T, family="Open Sans Semibold")+
  guides(shape=guide_legend(title=NULL,keyheight=4), linetype=guide_legend(title=NULL), size="none")+
  #theme(legend.position=c(-rng[1]/diff(rng),0.93),legend.direction="horizontal", legend.justification="center", legend.text=element_text(family="Open Sans Semibold", size=12))  +
  #geom_text(aes(label=d, color=obj..fam),position=pd) +
  theme()

mainPart
Main part of the forest plot

Main part of the forest plot

Now I create study list table:

plotSettings2 <- theme(axis.ticks.x = element_blank(), 
    axis.line.x = element_blank(), axis.text.x = element_text(color = "white"), 
    axis.title.x = element_text(color = "white"))

# ystart & yend are arbitrary. [0, 1] is
# convinient for setting relative coordinates of
# columns

ystart = 0
yend = 1
update_geom_defaults("text", list(size = 4, hjust = 0.5, 
    family = "Open Sans"))

p1 <- ggplot(dat, aes(x = -titleN, y = 0)) + coord_flip() + 
    scale_y_continuous(limits = c(ystart, yend)) + 
    plotSettings + plotSettings2 + scale_x_continuous(limits = c(xstart, 
    xend), expand = c(0, 0)) + xlab(NULL) + ylab(NULL)

studyList<-p1 + with(unique(dat[is.na(dat$author)|dat$author=="",c("titleN","alloc")]), annotate("text",label=alloc, x=-titleN,y=0, family="Open Sans Semibold", hjust=0)) +
  with(dat[!is.na(dat$author)&!duplicated(dat$title),],annotate("text",label=paste(author, year,sep=','),x=-titleN,y=0.02, hjust=0)) +
  annotate("text",x=c(1,xstart+length(levels(res.rdat.dat$alloc)):1),y=0, hjust=0,label=c("Study", levels(res.rdat.dat$alloc)),fontface=2) + geom_vline(x=c(xstart+length(levels(res.rdat.dat$alloc))+1,0),alpha=0.5,linetype=5)

studyList
Plot of the list of studies for forest plot

Plot of the list of studies for forest plot

Now I make an effect size table:

f <- function(x) sprintf("% 0.2f", x)

dat$f.se <- sprintf("% 00.2f", dat$se)
dat$f.vi <- f(dat$vi)

data <- melt(subset(dat, !is.na(vi), c("title", "titleN", 
    "f.vi", "f.se")), measure.vars = c(3:4))

effectSizes1<-p1+annotate("text",x=-data$titleN, y=ifelse(data$variable=="f.vi",0.25,0.75),label=data$value)+
  annotate("text",x=-c(-1,-1), y=rep(c(0.25,0.75)), label=c("italic(d)","italic(SE[d])"),parse=T,fontface=2) +
  #annotate("text",x=-c(-2.5), y=c(0.5), label=c("Top label"),fontface=2) +
  with(res.rdat.dat, annotate("text",x=rep(xstart+length(levels(res.rdat.dat$alloc)):1, 2), y=rep(c(0.25,0.75),each=length(levels(res.rdat.dat$alloc))), label=c(f(b),sprintf("% 00.2f",SE.b)))) +  geom_vline(x=-c(max(dat$titleN+1),0),alpha=0.5,linetype=5)

effectSizes1
Tableplot of effect sizes for forest plot

Tableplot of effect sizes for forest plot

Putting it all together:

grid.arrange(ggplotGrob(studyList), ggplotGrob(mainPart), 
    ggplotGrob(effectSizes1), ncol = 3, widths = unit(c(0.25, 
        0.45, 0.3), "npc"))
Grouped forest plot created with ggplot2

Grouped forest plot created with ggplot2

I didn't create overall result, as it is meaningless in presence of three groups, in my opinion. I also didn't bother to create table with number of subjects, mainly because it is non-informative, but you could easily add it just like the effect sizes.

Several comments:

  1. I used Open Sans font, loaded with extrafont package. You would have to change it to some other font, probably.
  2. This plot was build mainly by trial and error, so there are parts that can be made a lot better.
  3. In some places there are things like linetype=“1”. They are there as a suggestion that you may add some within-study factor here, e.g. to compare treatments.
  4. I use dev version of ggplot2, that has support for element_ stuff and theme function. Not sure, whether it will work with current stable version.

Credit goes to Learning R blog for the idea on how to set up text annotations.

I also used Carl Boettiger's (1, 2) posts on how to set up knitr to WordPress posting. The syntax highlight script and corresponding css are borrowed from RStudio.

UPD: I've made some changes in the code that make the plot scalable, depending on the number of rows in res.rdat.dat. So if you would like to add the "total" effect to the plot, below is the code changes that you need. However, in my opinion, showing "total" effect in presence of moderators is meaningless (see the explanation in comments).

#after res.rdat.dat setting
res.all<-rma(data=dat[!is.na(dat$yi),], yi=yi, vi=vi, slab=title, method="REML")
res.rdat.dat<-rbind(res.rdat.dat, data.frame(b=res.all$b, SE.b=res.all$b, alloc="Total"))

With this addition plot looks like this:

Forest plot with total ES

Forest plot with total ES

UPD1: If you'd like to use fixed effects instead of mixed effects model, you'll need to change "REML" in the regression function (rma) calls to "FE". In this case, the resulting effects have much smaller confidence intervals on this data set.

Forest plot for fixed effects model

Forest plot for fixed effects model