I decided to write this post in English, as I need to practice in writing more.
As I currently do some meta-analytic stuff, I needed to get a proper plot of the results of analysis. The existing solutions are hard to customize, so I decided to do something by myself. Basically, the forest plot is a errorbar plot of effect sizes, plus a list of studies, plus some other stuff, usually a list of effect sizes.
Here is the example code from metafor package.
Basic plot:
library(metafor)
### load BCG vaccine data
data(dat.bcg)
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg,
data = dat.bcg, measure = "RR", slab = paste(author,
year, sep = ", "), method = "REML")
par(bg = "white")
forest(res, slab = paste(dat.bcg$author, dat.bcg$year,
sep = ", "), xlim = c(-16, 6), at = log(c(0.05,
0.25, 1, 4)), atransf = exp, ilab = cbind(dat.bcg$tpos,
dat.bcg$tneg, dat.bcg$cpos, dat.bcg$cneg), ilab.xpos = c(-9.5,
-8, -6, -4.5), cex = 0.75, ylim = c(-1, 27), order = order(dat.bcg$alloc),
rows = c(3:4, 9:15, 20:23), mlab = "RE Model for All Studies")
Plot with grouping:
par(bg = "white")
forest(res, slab = paste(dat.bcg$author, dat.bcg$year,
sep = ", "), xlim = c(-16, 6), at = log(c(0.05,
0.25, 1, 4)), atransf = exp, ilab = cbind(dat.bcg$tpos,
dat.bcg$tneg, dat.bcg$cpos, dat.bcg$cneg), ilab.xpos = c(-9.5,
-8, -6, -4.5), cex = 0.75, ylim = c(-1, 27), order = order(dat.bcg$alloc),
rows = c(3:4, 9:15, 20:23), mlab = "RE Model for All Studies")
op <- par(cex = 0.75, font = 4)
text(-16, c(24, 16, 5), c("Systematic Allcoation",
"Random Allocation", "Alternate Allocation"), pos = 4)
par(font = 2)
text(c(-9.5, -8, -6, -4.5), 26, c("TB+", "TB-", "TB+",
"TB-"))
text(c(-8.75, -5.25), 27, c("Vaccinated", "Control"))
text(-16, 26, "Author(s) and Year", pos = 4)
text(6, 26, "Relative Risk [95% CI]", pos = 2)
par(op)
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg,
data = dat.bcg, measure = "RR", subset = (alloc ==
"systematic"), method = "REML")
addpoly(res, row = 18.5, cex = 0.75, atransf = exp,
mlab = "RE Model for Subgroup")
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg,
data = dat.bcg, measure = "RR", subset = (alloc ==
"random"), method = "REML")
addpoly(res, row = 7.5, cex = 0.75, atransf = exp,
mlab = "RE Model for Subgroup")
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg,
data = dat.bcg, measure = "RR", subset = (alloc ==
"alternate"), method = "REML")
addpoly(res, row = 1.5, cex = 0.75, atransf = exp,
mlab = "RE Model for Subgroup")
I see two problems with this code. First, many things are done manually. You have to set all coordinates for groups by hand. It is OK, if you have three groups, but what if there are more? Second, I don't like basic plot function for its lack of flexibility.
So the following coide is aimed at reproduction of this plot using ggplot2. It won't be an exact reproduction, but it is quite easy to add neccessary parts once you've got the idea.
First, I load neccessary libraries:
library(extrafont) #Open Sans font for plot
loadfonts("postscript", quiet = T)
library(metafor) #To compute weights and resulting effect sizes
library(xtable) #xtable is used for creating HTML tables in this post
library(devtools) #To use latest version of ggplot2
dev_mode(T)
library(ggplot2)
library(gridExtra) #to set up plot grid
library(stringr) #string formatting functions
library(divisors) #this one will be used to set scale range and breaks
library(plyr) #rbind.fill function
library(reshape2) #transformation of tables
Now the main part begins. I start with computing ESs, corresponding SEs and confidence intervals.
data(dat.bcg)
### calculate log relative risks and
### corresponding sampling variances
dat <- escalc(measure = "RR", ai = tpos, bi = tneg,
ci = cpos, di = cneg, data = dat.bcg, append = TRUE)
dat$se <- sqrt(dat$vi)
dat <- transform(dat, ci.lower = yi - 1.96 * se, ci.upper = yi +
1.96 * se)
print(xtable(dat[, c("author", "year", "alloc", "yi",
"vi", "se", "ci.lower", "ci.upper")]), html.table.attributes = NULL,
type = "HTML")
| author | year | alloc | yi | vi | se | ci.lower | ci.upper | |
|---|---|---|---|---|---|---|---|---|
| 1 | Aronson | 1948 | random | -0.89 | 0.33 | 0.57 | -2.01 | 0.23 |
| 2 | Ferguson & Simes | 1949 | random | -1.59 | 0.19 | 0.44 | -2.45 | -0.72 |
| 3 | Rosenthal et al | 1960 | random | -1.35 | 0.42 | 0.64 | -2.61 | -0.08 |
| 4 | Hart & Sutherland | 1977 | random | -1.44 | 0.02 | 0.14 | -1.72 | -1.16 |
| 5 | Frimodt-Moller et al | 1973 | alternate | -0.22 | 0.05 | 0.23 | -0.66 | 0.23 |
| 6 | Stein & Aronson | 1953 | alternate | -0.79 | 0.01 | 0.08 | -0.95 | -0.62 |
| 7 | Vandiviere et al | 1973 | random | -1.62 | 0.22 | 0.47 | -2.55 | -0.70 |
| 8 | TPT Madras | 1980 | random | 0.01 | 0.00 | 0.06 | -0.11 | 0.14 |
| 9 | Coetzee & Berjak | 1968 | random | -0.47 | 0.06 | 0.24 | -0.94 | -0.00 |
| 10 | Rosenthal et al | 1961 | systematic | -1.37 | 0.07 | 0.27 | -1.90 | -0.84 |
| 11 | Comstock et al | 1974 | systematic | -0.34 | 0.01 | 0.11 | -0.56 | -0.12 |
| 12 | Comstock & Webster | 1969 | systematic | 0.45 | 0.53 | 0.73 | -0.98 | 1.88 |
| 13 | Comstock et al | 1976 | systematic | -0.02 | 0.07 | 0.27 | -0.54 | 0.51 |
Now the trick that will be used for grouping. I make an additional row in our data frame for each group:
for (i in unique(dat$alloc)) {
dat <- rbind.fill(dat, data.frame(year = 0, alloc = i,
stringsAsFactors = F))
}
print(xtable(dat[, c("author", "year", "alloc", "yi",
"vi", "se", "ci.lower", "ci.upper")]), html.table.attributes = NULL,
type = "HTML")
| author | year | alloc | yi | vi | se | ci.lower | ci.upper | |
|---|---|---|---|---|---|---|---|---|
| 1 | Aronson | 1948.00 | random | -0.89 | 0.33 | 0.57 | -2.01 | 0.23 |
| 2 | Ferguson & Simes | 1949.00 | random | -1.59 | 0.19 | 0.44 | -2.45 | -0.72 |
| 3 | Rosenthal et al | 1960.00 | random | -1.35 | 0.42 | 0.64 | -2.61 | -0.08 |
| 4 | Hart & Sutherland | 1977.00 | random | -1.44 | 0.02 | 0.14 | -1.72 | -1.16 |
| 5 | Frimodt-Moller et al | 1973.00 | alternate | -0.22 | 0.05 | 0.23 | -0.66 | 0.23 |
| 6 | Stein & Aronson | 1953.00 | alternate | -0.79 | 0.01 | 0.08 | -0.95 | -0.62 |
| 7 | Vandiviere et al | 1973.00 | random | -1.62 | 0.22 | 0.47 | -2.55 | -0.70 |
| 8 | TPT Madras | 1980.00 | random | 0.01 | 0.00 | 0.06 | -0.11 | 0.14 |
| 9 | Coetzee & Berjak | 1968.00 | random | -0.47 | 0.06 | 0.24 | -0.94 | -0.00 |
| 10 | Rosenthal et al | 1961.00 | systematic | -1.37 | 0.07 | 0.27 | -1.90 | -0.84 |
| 11 | Comstock et al | 1974.00 | systematic | -0.34 | 0.01 | 0.11 | -0.56 | -0.12 |
| 12 | Comstock & Webster | 1969.00 | systematic | 0.45 | 0.53 | 0.73 | -0.98 | 1.88 |
| 13 | Comstock et al | 1976.00 | systematic | -0.02 | 0.07 | 0.27 | -0.54 | 0.51 |
| 14 | 0.00 | random | ||||||
| 15 | 0.00 | alternate | ||||||
| 16 | 0.00 | systematic |
Then a title factor is created, which will be used for labels. Also I make a numeric version of this factor, which will be used for positioning of rows in the plot.
dat$title <- factor(paste(dat$alloc, dat$year, sep = ", "))
dat$titleN <- as.numeric(dat$title)
Allocation is converted to factor with alternate being the reference level.
dat$alloc <- factor(dat$alloc, levels = c("alternate",
"random", "systematic"), labels = c("Alternate",
"Random", "Systematic"))
Next I create regression model to get summary effects and extract weights for each study. These weights are in turn appended to the dat data frame.
res.rdat.noi <- rma(data = dat[!is.na(dat$yi), ], yi = yi,
vi = vi, mods = ~alloc - 1, slab = title, method = "REML")
res.rdat.dat <- data.frame(b = res.rdat.noi$b, SE.b = res.rdat.noi$se,
alloc = levels(dat$alloc))
res.rdat.w <- data.frame(weights(res.rdat.noi))
names(res.rdat.w) <- "w"
res.rdat.w$title <- rownames(res.rdat.w)
dat <- merge(dat, res.rdat.w, by = "title", all = T)
print(xtable(dat[, c("title", "author", "year", "alloc",
"yi", "vi", "se", "ci.lower", "ci.upper", "w")]),
type = "HTML")
Now I compute effect sizes scale range and a nice number of breaks (from 8 to 12):
rng <- range(na.omit(c(dat$ci.lower, dat$ci.upper)))
floor.dec <- function(x, digits) {
r <- round(x, digits)
r <- ifelse(r > x, r - 10^(-digits), r)
r
}
ceiling.dec <- function(x, digits) {
r <- round(x, digits)
r <- ifelse(r < x, r + 10^(-digits), r)
r
}
rng[1] <- floor.dec(rng[1], 1)
rng[2] <- ceiling.dec(rng[2], 1)
scale.rng <- diff(rng * 10)
divs = divisors(scale.rng)$divs
while (!sum(ifelse(divs > 6 & divs < 14, 1, 0))) {
scale.rng = scale.rng + 1
divs = divisors(scale.rng)$divs
}
div <- divs[divs > 6 & divs < 14][1]
scale.rng <- scale.rng/10
xstart=-(max(dat$titleN)+2+length(levels(res.rdat.dat$alloc)))
xend=4
Finally, the plotting part begins. I need almost empty plot, with the exception of y axis. Red grid is used for debugging purposes, it is disabled with the next statement.
plotSettings <- theme(legend.position = "none", axis.line.y = element_blank(),
axis.ticks.y = element_blank(), plot.margin = unit(c(0,
0, 0, 0), "npc"), panel.margin = unit(c(0,
0, 0, 0), "npc"), rect = element_blank(), axis.line = element_line(colour = "black",
size = 0.3, linetype = 1), axis.title.x = element_blank(),
axis.title.y = element_blank(), panel.grid.major = element_line(color = "red"),
axis.text.y = element_blank())
# you can comment the next string for debug
plotSettings <- plotSettings + theme(panel.grid = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
panel.grid.major = element_blank())
# I set position_dodge here, but it is
# unneccessary, if you don't use grouping factor
# as in this example. For the same reason there
# are two values for each scale in the following
# plot.
pd <- position_dodge(width = 0.7)
Main part of the plot. The inverted titleN scale is used.
mainPart<-ggplot(dat, aes(x=-titleN,y=yi, ymin=ci.lower, ymax=ci.upper, group=1)) +
scale_y_continuous(name=NULL, breaks=seq(rng[1], rng[1]+scale.rng, scale.rng/div), limits=c(rng[1], rng[1]+scale.rng), expand=c(0,0)) + ylab(NULL) +
geom_segment(aes(x=xstart, xend=0, y=0, yend=0), linetype=3, alpha=0.01) +
geom_vline(x= c(xstart+length(levels(res.rdat.dat$alloc))+1,0), alpha=0.5, linetype=5) +
geom_linerange(aes(linetype="1"),position=pd) +
geom_point(aes(size=w, shape="1"), fill="white",position=pd) +
geom_linerange( data=res.rdat.dat, aes(x=xstart+length(levels(res.rdat.dat$alloc)):1, y=b, linetype="1", ymin=b-1.96*SE.b, ymax=b+1.96*SE.b), position=pd) +
geom_point( data=res.rdat.dat, aes(x=xstart+length(levels(res.rdat.dat$alloc)):1, y=b, shape="1", ymin=b-1.96*SE.b, ymax=b+1.96*SE.b), position=pd, size=2, fill="white") +
coord_flip() +
scale_size_continuous(range=c(2,4)) +
scale_shape_manual(values=c(22,21))+
scale_linetype_manual(values=c(1,2))+
plotSettings +
scale_x_continuous(limits=c(xstart,xend), expand=c(0,0))+xlab(NULL)+
annotate("text", x=1, y=0, label='paste("Effect size (",italic("d"),") with 95% CI")', parse=T, family="Open Sans Semibold")+
guides(shape=guide_legend(title=NULL,keyheight=4), linetype=guide_legend(title=NULL), size="none")+
#theme(legend.position=c(-rng[1]/diff(rng),0.93),legend.direction="horizontal", legend.justification="center", legend.text=element_text(family="Open Sans Semibold", size=12)) +
#geom_text(aes(label=d, color=obj..fam),position=pd) +
theme()
mainPart
Now I create study list table:
plotSettings2 <- theme(axis.ticks.x = element_blank(),
axis.line.x = element_blank(), axis.text.x = element_text(color = "white"),
axis.title.x = element_text(color = "white"))
# ystart & yend are arbitrary. [0, 1] is
# convinient for setting relative coordinates of
# columns
ystart = 0
yend = 1
update_geom_defaults("text", list(size = 4, hjust = 0.5,
family = "Open Sans"))
p1 <- ggplot(dat, aes(x = -titleN, y = 0)) + coord_flip() +
scale_y_continuous(limits = c(ystart, yend)) +
plotSettings + plotSettings2 + scale_x_continuous(limits = c(xstart,
xend), expand = c(0, 0)) + xlab(NULL) + ylab(NULL)
studyList<-p1 + with(unique(dat[is.na(dat$author)|dat$author=="",c("titleN","alloc")]), annotate("text",label=alloc, x=-titleN,y=0, family="Open Sans Semibold", hjust=0)) +
with(dat[!is.na(dat$author)&!duplicated(dat$title),],annotate("text",label=paste(author, year,sep=','),x=-titleN,y=0.02, hjust=0)) +
annotate("text",x=c(1,xstart+length(levels(res.rdat.dat$alloc)):1),y=0, hjust=0,label=c("Study", levels(res.rdat.dat$alloc)),fontface=2) + geom_vline(x=c(xstart+length(levels(res.rdat.dat$alloc))+1,0),alpha=0.5,linetype=5)
studyList
Now I make an effect size table:
f <- function(x) sprintf("% 0.2f", x)
dat$f.se <- sprintf("% 00.2f", dat$se)
dat$f.vi <- f(dat$vi)
data <- melt(subset(dat, !is.na(vi), c("title", "titleN",
"f.vi", "f.se")), measure.vars = c(3:4))
effectSizes1<-p1+annotate("text",x=-data$titleN, y=ifelse(data$variable=="f.vi",0.25,0.75),label=data$value)+
annotate("text",x=-c(-1,-1), y=rep(c(0.25,0.75)), label=c("italic(d)","italic(SE[d])"),parse=T,fontface=2) +
#annotate("text",x=-c(-2.5), y=c(0.5), label=c("Top label"),fontface=2) +
with(res.rdat.dat, annotate("text",x=rep(xstart+length(levels(res.rdat.dat$alloc)):1, 2), y=rep(c(0.25,0.75),each=length(levels(res.rdat.dat$alloc))), label=c(f(b),sprintf("% 00.2f",SE.b)))) + geom_vline(x=-c(max(dat$titleN+1),0),alpha=0.5,linetype=5)
effectSizes1
Putting it all together:
grid.arrange(ggplotGrob(studyList), ggplotGrob(mainPart),
ggplotGrob(effectSizes1), ncol = 3, widths = unit(c(0.25,
0.45, 0.3), "npc"))
I didn't create overall result, as it is meaningless in presence of three groups, in my opinion. I also didn't bother to create table with number of subjects, mainly because it is non-informative, but you could easily add it just like the effect sizes.
Several comments:
- I used Open Sans font, loaded with extrafont package. You would have to change it to some other font, probably.
- This plot was build mainly by trial and error, so there are parts that can be made a lot better.
- In some places there are things like linetype=“1”. They are there as a suggestion that you may add some within-study factor here, e.g. to compare treatments.
- I use dev version of ggplot2, that has support for element_ stuff and theme function. Not sure, whether it will work with current stable version.
Credit goes to Learning R blog for the idea on how to set up text annotations.
I also used Carl Boettiger's (1, 2) posts on how to set up knitr to WordPress posting. The syntax highlight script and corresponding css are borrowed from RStudio.
UPD: I've made some changes in the code that make the plot scalable, depending on the number of rows in res.rdat.dat. So if you would like to add the "total" effect to the plot, below is the code changes that you need. However, in my opinion, showing "total" effect in presence of moderators is meaningless (see the explanation in comments).
#after res.rdat.dat setting
res.all<-rma(data=dat[!is.na(dat$yi),], yi=yi, vi=vi, slab=title, method="REML")
res.rdat.dat<-rbind(res.rdat.dat, data.frame(b=res.all$b, SE.b=res.all$b, alloc="Total"))
With this addition plot looks like this:
UPD1: If you'd like to use fixed effects instead of mixed effects model, you'll need to change "REML" in the regression function (rma) calls to "FE". In this case, the resulting effects have much smaller confidence intervals on this data set.










RSS
Livejournal
Twitter
Facebook