3.2.Plot with R
本章我们介绍如何使用 R 进行数据可视化,我们将提供两种方案:
  • 0a) 在自己电脑使用 Rstudio 来画图(基于 .Rmd 文件),优点是使用方便,交互性强;
  • 0b) 在 Docker 容器中用命令行的方式来画图,优点是无需额外的安装和配置,docker images的下载链接如附表所示。
推荐使用ggsci来完成科研绘图及配色。

0a) Method 1: Use Rstudio

本方案需要在自己电脑上安装软件和配置。
  • (5) 打开 .Rmd 文件
用Rstudio打开all.Rmd文件, 即可阅读教程,并执行相关代码。
tips: 如果你更喜欢每个文件仅包含一节的内容(一种 plot 类型),可以先打开index.Rmd,安装需要的 packages,然后依次打开每一节对应的 .Rmd 文件(动画展了第1、2小节对应的 1.box-plots.Rmd2.violin-plots.Rmd

0b) Method 2: Use R in Docker

如果你在使用方案一时遇到了问题,也可以用我们提供的 Docker(里面已经预装好了 R 语言和需要的 packages)。

(1) Use R in a Docker container

首先进入容器:
1
docker exec -it bioinfo_tsinghua bash
Copied!
本章的操作均在 /home/test/plot/ 下进行:
1
cd /home/test/plot/
Copied!
进入容器后,用以下命令进入 R 语言环境:
1
R
Copied!
现在就可以运行 R 代码了,这里我们展示了计算 1, 2, ..., 10 的平均数。
1
mean(1:10)
Copied!
在实际画图时,依次将下文给出的 R 代码复制到 Terminal 中运行。
运行完毕之后,用以下命令退出(按完 Enter 后,按 n 和 Enter),返回到容器:
1
q()
Copied!

(2) load data, install & library packages

Prepare output directory

1
dir.create('output')
Copied!

Load the data

1
# Read the input files
2
# “header=T” means that the data has a title, and sep="\t" is used as the separator
3
data <-read.table("input/box_plots_mtcars.txt",header=T,sep="\t")
4
# The function c(,,) means create the vector type data
5
df <- data[, c("mpg", "cyl", "wt")]
6
7
df2 <-read.table("input/histogram_plots.txt",header=T,sep="\t")
8
9
df3 <- read.table("input/volcano_plots.txt", header=T)
10
11
df4 <- read.table("input/manhattan_plots_gwasResults.txt",header=T,sep="\t")
12
13
df5 <-read.table("input/heatmaps.txt",header=T,sep="\t")
14
15
# Covert data into matrix format
16
# nrow(df5) and ncol(df5) return the number of rows and columns of matrix df5 respectively.
17
dm <- data.matrix(df5[1:nrow(df5),2:ncol(df5)])
18
19
# Get the row names
20
row.names(dm) <- df5[,1]
21
22
df6 <- read.table("input/ballon_plots_GO.txt", header=T, sep="\t")
23
24
df7 <- read.table("input/box_plots_David_GO.txt",header=T,sep="\t")
25
df7 <- df7[1:10,]
Copied!

Install R packages

Docker 中已经装好所需要的 R 包,如果你是在自己电脑上运行,则需要安装 ggplot2, qqman, gplots, pheatmap, scales, reshape2, RColorBrewer 和 plotrix(使用 install.packages(), 如 install.packages('ggplot2'))。

library R packages

1
library(ggplot2)
2
library(qqman)
3
library(gplots)
4
library(pheatmap)
5
library(scales)
6
library(reshape2)
7
library(RColorBrewer)
8
library(plyr)
9
library(plotrix)
Copied!

(3) Save & view the plot

If you want to save the plot, please use either pdf() + dev.off() or ggsave(). The second one is specific for the ggplot2 package (i.e., if the code for plot starts with ggplot, then you can use the second one).
Let's see an example:
  • pdf() + dev.off()
1
# Begin to plot
2
# Output as pdf
3
pdf("output/1.1.Basic_boxplot.pdf", height = 3, width = 3)
4
# Mapping the X and Y
5
# Components are constructed by using "+"
6
ggplot(df, aes(x=cyl, y=mpg))+
7
# draw the boxplot and fill it with gray
8
geom_boxplot(fill="gray")+
9
# Use the labs function to set the title and modify x and y
10
labs(title="Plot of mpg per cyl",x="Cyl", y = "Mpg")+
11
# Set the theme style
12
theme_classic()
13
14
# Save the plot
15
dev.off()
Copied!
  • ggsave()
1
# Begin to plot
2
p <- ggplot(df, aes(x=cyl, y=mpg)) +
3
geom_boxplot(fill="gray")+
4
labs(title="Plot of mpg per cyl",x="Cyl", y = "Mpg")+
5
theme_classic()
6
# Sava as pdf
7
ggsave("output/1.1.Basic_boxplot.pdf", plot=p, height = 3, width = 3)
Copied!
If you want to view the produced file, you need to copy the file to /home/test/share, then open the bioinfo_tsinghua_share folder on the Desktop of host machine.
the following code is executed in Terminal, i.e., you need to quit R.
1
cp output/1.1.Basic_boxplot.pdf /home/test/share/
Copied!
Here we only show one plot, in real use, you should replace the code for plot and change output file name to do more plots.

(4) Animation of above 3 steps

为了更清楚地展示方案二,我们制作了一个完整的动画:
动画中演示的是虚拟机中的操作步骤,我们首先用浏览器打开这一章,然后将一些代码复制到 Terminal 中去运行,最后查看生成的 plot。(On linux, you can use Ctrl + Insert to paste text in the clipboard to the terminal.)
正如你所看到的,方案二使用起来还是比较不方便的,所以如果没有特别的原因,我们还是推荐优先考虑方案一。
For the following sections, you can find all code in /home/test/plot/Rscripts/ or here (a file per chapter), and demo output in /home/test/plot/success/output/.

1) Box plots

(1) Basic box plot

1
df$cyl <- as.factor(df$cyl)
2
head(df)
Copied!
1
### mpg cyl wt
2
### Mazda RX4 21.0 6 2.620
3
### Mazda RX4 Wag 21.0 6 2.875
4
### Datsun 710 22.8 4 2.320
5
### Hornet 4 Drive 21.4 6 3.215
6
### Hornet Sportabout 18.7 8 3.440
7
### Valiant 18.1 6 3.460
Copied!
1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_boxplot(fill="gray")+
3
labs(title="Plot of mpg per cyl",x="Cyl", y = "Mpg")+
4
theme_classic()
Copied!

(2) Change continuous color by groups

1
ggplot(df, aes(x=cyl, y=mpg, fill=cyl)) +
2
geom_boxplot()+
3
labs(title="Plot of mpg per cyl",x="Cyl", y = "Mpg") +
4
scale_fill_brewer(palette="Blues") +
5
theme_bw()
Copied!

(3) Grouped boxplots

1
#Read the data table
2
data=read.csv("boxplot_example.csv")
3
###################
4
#I.Prepare the data
5
#1.Normalize the data, etc
6
for (i in 12:17){
7
data[,i]=log(data[,i]+1e-3) # log some expression values
8
}
9
for (i in 9:17) {
10
maxValue=max(data[,i]) #scale the data into 0-1
11
minValue=min(data[,i])
12
range=maxValue-minValue
13
data[,i]=(data[,i]-minValue)/range
14
}
15
data$X8.Identity=data$X8.Identity/100
16
17
#2.Make the new matrix for boxplot: cleaning the data table
18
library("reshape")
19
m=melt(data[,c(2,7:12,14:17)], id=1)# remove some columns not to show and reshape the matrix into 3 columns for boxplot drawing in bwplot
20
colnames(m)=c("Type","Feature","Normalized_Value") #define the new column names
21
22
#3.Clean the names of each type and each feature
23
#Merge sub-types of different elements
24
m[,1]=sub ("ncRNA_selected","RNAI", m[,1])
25
m[,1]=sub ("ncRNA_3019","RNAII", m[,1])
26
m[,1]=sub ("exon_CCDS","CDS", m[,1])
27
m[,1]=sub ("five_prime_UTR","UTR", m[,1])
28
m[,1]=sub ("three_prime_UTR","UTR", m[,1])
29
m[,1]=sub ("ancestral_repeat","AP", m[,1])
30
#Rename the feature
31
m[,2]=sub('X7.GC','01.GC Content',m[,2])
32
m[,2]=sub('X8.Identity','02.DNA Conservation',m[,2])
33
m[,2]=sub('X9.z_score','03.RNA Struc. Free Energy',m[,2])
34
m[,2]=sub('X10.SCI','04.RNA Struc. Cons.',m[,2])
35
m[,2]=sub('X11.tblastx_score','05.Protein Conservation',m[,2])
36
m[,2]=sub('X12.polyA_RNAseq_MAX','06.PolyA+ RNA-seq',m[,2])
37
m[,2]=sub('X14.small_RNAseq_MAX','07.Small RNA-seq',m[,2])
38
m[,2]=sub('X15.Array_totalRNA_MAX','08.Total RNA Array',m[,2])
39
m[,2]=sub('X16.Array_polyA_MAX','09.PolyA+ RNA Array',m[,2])
40
m[,2]=sub('X17.Array_nonpolyA_MAX','10.PolyA- RNA Array',m[,2])
41
42
###########################
43
#Making Boxplot
44
library("lattice")
45
png("boxplot.png",width=1500,height=500) # pdf is recommended for most cases, or png for figure with huge amount of data points
46
#pdf("boxplot.pdf")
47
attach(m)
48
bwplot(Normalized_Value ~ Type|Feature,fill=c("green","red","yellow","blue","light blue"),layout=c(10,1))
49
dev.off()
Copied!

(4) Boxplot with statistical test

1
data(iris)
2
print(levels(iris$Species))
3
comparisons <- list(c("versicolor","setosa"),c("virginica","versicolor"),c("virginica","setosa"))
4
ggplot(iris,aes(x=Species,y=Sepal.Length,fill=Species))+geom_boxplot(alpha = 1, size = 1, position = position_dodge(1.1),outlier.size=0,outlier.alpha = 0)+
5
geom_point(size = 1, position = position_jitterdodge(dodge.width=0.3,jitter.width = 0.3))+
6
scale_fill_brewer(palette="Blues") +
7
theme_bw()+
8
theme(#legend.position="right",
9
legend.position="right",
10
panel.grid=element_blank(),
11
panel.border=element_blank(),
12
axis.line = element_line(size=1, colour = "black"),
13
legend.title = element_text(face="bold", color="black",family = "Arial", size=24),
14
legend.text= element_text(face="bold", color="black",family = "Arial", size=24),
15
plot.title = element_text(hjust = 0.5,size=24,face="bold"),
16
axis.text.x = element_text(face="bold", color="black", size=24,angle = 45,hjust = 1),
17
axis.text.y = element_text(face="bold", color="black", size=24),
18
axis.title.x = element_text(face="bold", color="black", size=24),
19
axis.title.y = element_text(face="bold",color="black", size=24))+
20
stat_compare_means(comparisons = comparisons,
21
method = "wilcox.test",
22
method.args = list(alternative = "greater"),
23
label = "p.signif"
24
)+labs(x="",title="Boxplot and statistical test", face="bold")
Copied!

2) Violin plots

(1) Basic violin plot

1
df$cyl <- as.factor(df$cyl)
2
head(df)
Copied!
1
### mpg cyl wt
2
### Mazda RX4 21.0 6 2.620
3
### Mazda RX4 Wag 21.0 6 2.875
4
### Datsun 710 22.8 4 2.320
5
### Hornet 4 Drive 21.4 6 3.215
6
### Hornet Sportabout 18.7 8 3.440
7
### Valiant 18.1 6 3.460
Copied!
1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_violin(trim=FALSE) +
3
labs(title="Plot of mpg per cyl", x="Cyl", y = "Mpg")
Copied!

(2) Add summary statistics on a violin plot

(2.1) Add median and quartile

1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_violin(trim=FALSE) +
3
labs(title="Plot of mpg per cyl", x="Cyl", y = "Mpg") +
4
stat_summary(fun.y=mean, geom="point", shape=23, size=2, color="red")
Copied!
or
1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_violin(trim=FALSE) +
3
labs(title="Plot of mpg per cyl", x="Cyl", y = "Mpg") +
4
geom_boxplot(width=0.1)
Copied!

(2.2) Add mean and standard deviation

1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_violin(trim=FALSE) +
3
labs(title="Plot of mpg per cyl", x="Cyl", y = "Mpg") +
4
stat_summary(fun.data="mean_sdl", fun.args = list(mult = 1), geom="crossbar", width=0.1 )
Copied!
or
1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_violin(trim=FALSE) +
3
labs(title="Plot of mpg per cyl", x="Cyl", y = "Mpg") +
4
stat_summary(fun.data=mean_sdl, fun.args = list(mult = 1), geom="pointrange", color="red")
Copied!

(3) Change violin plot fill colors

1
ggplot(df, aes(x=cyl, y=mpg, fill=cyl)) +
2
geom_violin(trim=FALSE) +
3
geom_boxplot(width=0.1, fill="white") +
4
labs(title="Plot of mpg per cyl", x="Cyl", y = "Mpg") +
5
scale_fill_brewer(palette="Blues") +
6
theme_classic()
Copied!

3) Histogram plots

(1) Basic histogram plot

1
head(df2)
Copied!
1
### sex weight
2
### 1 F 49
3
### 2 F 56
4
### 3 F 60
5
### 4 F 43
6
### 5 F 57
7
### 6 F 58
Copied!
1
ggplot(df2, aes(x=weight)) + geom_histogram(binwidth=1)
Copied!

(2) Add mean line on a histogram plot

1
ggplot(df2, aes(x=weight)) +
2
geom_histogram(binwidth=1, color="black", fill="white") +
3
geom_vline(aes(xintercept=mean(weight)),color="black", linetype="dashed", size=0.5)
Copied!

(3) Change histogram plot fill colors

1
##Use the plyr package to calculate the average weight of each group :
2
mu <- ddply(df2, "sex", summarise, grp.mean=mean(weight))
3
head(mu)
Copied!
1
### sex grp.mean
2
### 1 F 54.70
3
### 2 M 65.36
Copied!
1
##draw the plot
2
ggplot(df2, aes(x=weight, color=sex)) +
3
geom_histogram(binwidth=1, fill="white", position="dodge")+
4
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex), linetype="dashed") +
5
scale_color_brewer(palette="Paired") +
6
theme_classic()+
7
theme(legend.position="top")
Copied!

4) Density plots

(1) Basic density

1
head(df2)
Copied!
1
### sex weight
2
### 1 F 49
3
### 2 F 56
4
### 3 F 60
5
### 4 F 43
6
### 5 F 57
7
### 6 F 58
Copied!
1
ggplot(df2, aes(x=weight)) +
2
geom_density()
Copied!

(2) Add mean line on a density plot

1
ggplot(df2, aes(x=weight)) +
2
geom_density() +
3
geom_vline(aes(xintercept=mean(weight)), color="black", linetype="dashed", size=0.5)
Copied!

(3) Change density plot fill colors

1
##Use the plyr package plyr to calculate the average weight of each group :
2
mu <- ddply(df2, "sex", summarise, grp.mean=mean(weight))
3
head(mu)
Copied!
1
### sex grp.mean
2
### 1 F 54.70
3
### 2 M 65.36
Copied!
draw the plot

(4) Change fill colors

1
ggplot(df2, aes(x=weight, fill=sex)) +
2
geom_density(alpha=0.7)+
3
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex), linetype="dashed")+
4
labs(title="Weight density curve",x="Weight(kg)", y = "Density") +
5
scale_color_brewer(palette="Paired") +
6
scale_fill_brewer(palette="Blues") +
7
theme_classic()
Copied!

(5) Change line colors

1
ggplot(df2, aes(x=weight, color=sex)) +
2
geom_density()+
3
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex), linetype="dashed")+
4
labs(title="Weight density curve",x="Weight(kg)", y = "Density") +
5
scale_color_brewer(palette="Paired") +
6
theme_classic()
Copied!

(6) Combine histogram and density plots

1
ggplot(df2, aes(x=weight, color=sex, fill=sex)) +
2
geom_histogram(binwidth=1, aes(y=..density..), alpha=0.5, position="identity") +
3
geom_density(alpha=.2) +
4
labs(title="Weight density curve",x="Weight(kg)", y = "Density") +
5
scale_color_brewer(palette="Paired") +
6
scale_fill_brewer(palette="Blues") +
7
theme_classic()
Copied!

5) Dot plots

(1) Basic dot plots

1
df$cyl <- as.factor(df$cyl)
2
head(df)
Copied!
1
### mpg cyl wt
2
### Mazda RX4 21.0 6 2.620
3
### Mazda RX4 Wag 21.0 6 2.875
4
### Datsun 710 22.8 4 2.320
5
### Hornet 4 Drive 21.4 6 3.215
6
### Hornet Sportabout 18.7 8 3.440
7
### Valiant 18.1 6 3.460
Copied!
1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_dotplot(binaxis='y', stackdir='center', binwidth=1)
Copied!

(2) Add mean and standard deviation

1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_dotplot(binaxis='y', stackdir='center', binwidth=1) +
3
stat_summary(fun.data="mean_sdl", fun.args = list(mult=1), geom="crossbar", width=0.5)
Copied!
or
1
ggplot(df, aes(x=cyl, y=mpg)) +
2
geom_dotplot(binaxis='y', stackdir='center', binwidth=1) +
3
stat_summary(fun.data="mean_sdl", fun.args = list(mult=1), geom="pointrange", color="red")
Copied!

(3) Change dot colors

1
ggplot(df, aes(x=cyl, y=mpg, fill=cyl, shape=cyl)) +
2
geom_dotplot(binaxis='y', stackdir='center', binwidth=1, dotsize=0.8) +
3
labs(title="Plot of mpg per cyl",x="Cyl", y = "Mpg") +
4
#stat_summary(fun.data="mean_sdl", fun.args = list(mult=1), geom="crossbar", width=0.5) +
5
scale_fill_brewer(palette="Blues") +
6
#scale_color_brewer(palette="Blues") +
7
theme_classic()
Copied!

(4) Change dot colors, shapes and align types

1
ggplot(df, aes(x=cyl, y=mpg, color=cyl, shape=cyl)) +
2
geom_jitter(position=position_jitter(0.1), cex=2)+
3
labs(title="Plot of mpg per cyl",x="Cyl", y = "Mpg") +
4
scale_color_brewer(palette="Blues") +
5
theme_classic()
Copied!

6) Scatter plots

(1) Basic scatter plots

1
df$cyl <- as.factor(df$cyl)
2
head(df)
Copied!
1
### mpg cyl wt
2
### Mazda RX4 21.0 6 2.620
3
### Mazda RX4 Wag 21.0 6 2.875
4
### Datsun 710 22.8 4 2.320
5
### Hornet 4 Drive 21.4 6 3.215
6
### Hornet Sportabout 18.7 8 3.440
7
### Valiant 18.1 6 3.460
Copied!
1
ggplot(df, aes(x=wt, y=mpg)) +
2
geom_point(size=1.5)
Copied!

(2) Add regression lines and change the point colors, shapes and sizes

1
ggplot(df, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
2
geom_point(size=1.5) +
3
geom_smooth(method=lm, se=FALSE, fullrange=TRUE) +
4
theme_classic()
Copied!

(3) Scatter plot with statistical test

1
data(cars)
2
ggscatter(cars, x = "speed", y = "dist",
3
add = "reg.line", conf.int = TRUE,
4
cor.coef = TRUE, cor.coeff.args = list(method = "spearman", label.x = 15, label.y = 0.05,label.sep = "\n",size = 8))+
5
theme(#legend.position="right",
6
legend.position="right",
7
panel.grid=element_blank(),
8
legend.title = element_text(face="bold", color="black",family = "Arial", size=20),
9
legend.text= element_text(face="bold", color="black",family = "Arial", size=20),
10
plot.title = element_text(hjust = 0.5,size=24,face="bold"),
11
axis.text.x = element_text(face="bold", color="black", size=20),
12
axis.text.y = element_text(face="bold", color="black", size=20),
13
axis.title.x = element_text(face="bold", color="black", size=24),
14
axis.title.y = element_text(face="bold",color="black", size=24))
Copied!

(4) Multiple correlation plot

1
data(iris)
2
library(Hmisc)
3
library(corrplot)
4
res2 <- rcorr(as.matrix(iris[c("Sepal.Width","Petal.Length","Petal.Width")]))
5
corrplot(corr = res2$r,tl.col="black",type="lower", order="original",tl.pos = "d",tl.cex=1.2,
6
p.mat = res2$P, sig.level = 0.05,insig = "blank")
Copied!

7) Volcano plots

1
head(df3)
Copied!
1
### Gene log2FoldChange pvalue padj
2
### 1 DOK6 0.5100 1.861e-08 0.0003053
3
### 2 TBX5 -2.1290 5.655e-08 0.0004191
4
### 3 SLC32A1 0.9003 7.664e-08 0.0004191
5
### 4 IFITM1 -1.6870 3.735e-06 0.0068090
6
### 5 NUP93 0.3659 3.373e-06 0.0068090
7
### 6 EMILIN2 1.5340 2.976e-06 0.0068090
Copied!
1
df3$threshold <- as.factor(ifelse(df3$padj < 0.05 & abs(df3$log2FoldChange) >=1,ifelse(df3$log2FoldChange > 1 ,'Up','Down'),'Not'))
2
ggplot(data=df3, aes(x=log2FoldChange, y =-log10(padj), color=threshold,fill=threshold)) +
3
scale_color_manual(values=c("blue", "grey","red"))+
4
geom_point(size=1) +
5
xlim(c(-3, 3)) +
6
theme_bw(base_size = 12, base_family = "Times") +
7
geom_vline(xintercept=c(-1,1),lty=4,col="grey",lwd=0.6)+
8
geom_hline(yintercept = -log10(0.05),lty=4,col="grey",lwd=0.6)+
9
theme(legend.position="right",
10
panel.grid=element_blank(),
11
legend.title = element_blank(),
12
legend.text= element_text(face="bold", color="black",family = "Times", size=8),
13
plot.title = element_text(hjust = 0.5),
14
axis.text.x = element_text(face="bold", color="black", size=12),
15
axis.text.y = element_text(face="bold", color="black", size=12),
16
axis.title.x = element_text(face="bold", color="black", size=12),
17
axis.title.y = element_text(face="bold",color="black", size=12))+
18
labs(x="log2FoldChange",y="-log10 (adjusted p-value)",title="Volcano plot of DEG", face="bold")
Copied!

8) Manhattan plots

1
head(df4)
Copied!
1
### SNP CHR BP P
2
### 1 rs1 1 1 0.9148060
3
### 2 rs2 1 2 0.9370754
4
### 3 rs3 1 3 0.2861395
5
### 4 rs4 1 4 0.8304476
6
### 5 rs5 1 5 0.6417455
7
### 6 rs6 1 6 0.5190959
Copied!
1
manhattan(df4, main = "GWAS results", ylim = c(0, 8),
2
cex = 0.5, cex.axis=0.8, col=c("dodgerblue4","deepskyblue"),
3
#suggestiveline = F, genomewideline = F, #remove the suggestive and genome-wide significance lines
4
chrlabs = as.character(c(1:22)))
Copied!

9) Heatmaps

(1) Draw the heatmap with the gplots package, heatmap.2() function

1
head(dm)
Copied!
1
### Control1 Tumor2 Control3 Tumor4 Control5 Tumor1
2
### Gene1 3.646058 -0.98990248 2.210404 -0.2063050 2.859744 1.3304284
3
### Gene2 4.271172 -1.16217765 2.734119 -2.4782173 3.752013 0.0255639
4
### Gene3 3.530448 1.11451101 1.635485 -0.4241215 3.701427 1.2263312
5
### Gene4 3.061122 -1.18791027 4.331229 0.8733314 2.349352 0.4825479
6
### Gene5 1.956817 0.25431042 1.984438 1.2713845 1.685917 1.4554739
7
### Gene6 2.000919 0.06015972 4.480901 0.9780682 3.063475 -0.4222994
8
### Control2 Tumor3 Control4 Tumor5
9
### Gene1 2.690376 0.6135943 2.470413 0.5158246
10
### Gene2 4.471795 1.6516242 2.735508 -0.5837784
11
### Gene3 3.588787 -0.6349656 1.999844 0.1417349
12
### Gene4 1.854433 -1.2237684 1.154377 -0.9301261
13
### Gene5 2.445830 0.3316909 2.715163 0.1866400
14
### Gene6 3.585366 1.0689000 2.563422 1.3465830
Copied!
1
##to draw high expression value in red, we use colorRampPalette instead of redblue in heatmap.2
2
##colorRampPalette is a function in the RColorBrewer package
3
cr <- colorRampPalette(c("blue","white","red"))
4
heatmap.2(dm,
5
scale="row", #scale the rows, scale each gene's expression value
6
key=T, keysize=1.1,
7
cexCol=0.9,cexRow=0.8,
8
col=cr(1000),
9
ColSideColors=c(rep(c("blue","red"),5)),
10
density.info="none",trace="none",
11
#dendrogram='none', #if you want to remove dendrogram
12
Colv = T,Rowv = T) #clusters by both row and col
Copied!

(2) Draw the heatmap with the pheatmap package, pheatmap function

1
##add column and row annotations
2
annotation_col = data.frame(CellType = factor(rep(c("Control", "Tumor"), 5)), Time = 1:5)
3
rownames(annotation_col) = colnames(dm)
4
annotation_row = data.frame(GeneClass = factor(rep(c("Path1", "Path2", "Path3"), c(10, 4, 6))))
5
rownames(annotation_row) = paste("Gene", 1:20, sep = "")
6
##set colors of each group
7
ann_colors = list(Time = c("white", "springgreen4"),
8
CellType = c(Control = "#7FBC41", Tumor = "#DE77AE"),
9
GeneClass = c(Path1 = "#807DBA", Path2 = "#9E9AC8", Path3 = "#BCBDDC"))
10
##draw the heatmap
11
pheatmap(dm,
12
cutree_col = 2, cutree_row = 3, #break up the heatmap by clusters you define
13
cluster_rows=TRUE, show_rownames=TRUE, cluster_cols=TRUE, #by default, pheatmap clusters by both row and col
14
annotation_col = annotation_col, annotation_row = annotation_row,annotation_colors = ann_colors)
Copied!

(3) Draw the heatmap with the ggplot2 package

1
##9.3.1.cluster by row and col
2
##cluster and re-order rows
3
rowclust = hclust(dist(dm))
4
reordered = dm[rowclust$order,]
5
##cluster and re-order columns
6
colclust = hclust(dist(t(dm)))
7
##9.3.2.scale each row value in [0,1]
8
dm.reordered = reordered[, colclust$order]
9
dm.reordered=apply(dm.reordered,1,rescale) #rescale is a function in the scales package
10
dm.reordered=t(dm.reordered) #transposed matrix
11
##9.3.3.save col and row names before changing the matrix format
12
col_name=colnames(dm.reordered)
13
row_name=rownames(dm.reordered)
14
##9.3.4.change data format for geom_title
15
colnames(dm.reordered)=1:ncol(dm.reordered)
16
rownames(dm.reordered)=1:nrow(dm.reordered)
17
dm.reordered=melt(dm.reordered) #melt is a function in the reshape2 package
18
head(dm.reordered)
19
##9.3.5.draw the heatmap
20
ggplot(dm.reordered, aes(Var2, Var1)) +
21
geom_tile(aes(fill = value), color = "white") +
22
scale_fill_gradient(low = "white", high = "steelblue") +
23
theme_grey(base_size = 10) +
24
labs(x = "", y = "") +
25
scale_x_continuous(expand = c(0, 0),labels=col_name,breaks=1:length(col_name)) +
26
scale_y_continuous(expand = c(0, 0),labels=row_name,breaks=1:length(row_name))
Copied!

10) Ballon plots

(1) basic ballon plots

1
head(df6)
Copied!
1
### Biological.process Fold.enrichment X.log10.Pvalue. col
2
### 1 Small molecule metabolic process 1.0 16 1
3
### 2 Single-organism catabolic process 1.5 12 1
4
### 3 Oxoacid metabolic process 2.0 23 1
5
### 4 Small molecule biosynthetic process 2.5 6 1
6
### 5 Carboxylic acid metabolic process 2.7 24 1
7
### 6 Organic acid metabolic process 2.7 25 1
Copied!
1
ggplot(df6, aes(x=Fold.enrichment, y=Biological.process)) +
2
geom_point(aes(size = X.log10.Pvalue.)) +
3
scale_x_continuous(limits=c(0,7),breaks=0:7) +
4
scale_size(breaks=c(1,5,10,15,20,25)) +
5
theme_light() +
6
theme(panel.border=element_rect(fill='transparent', color='black', size=1),
7
plot.title = element_text(color="black", size=14, hjust=0.5, face="bold", lineheight=1),
8
axis.title.x = element_text(color="black", size=12, face="bold"),
9
axis.title.y = element_text(color="black", size=12, vjust=1.5, face="bold"),
10
axis.text.x = element_text(size=12,color="black",face="bold"),
11
axis.text.y = element_text(size=12,color="black",face="bold"),
12
legend.text = element_text(color="black", size=10, hjust=-2),
13
legend.position="bottom") +
14
labs(x="Fold Enrichment",y="Biological Process",size="-log10(Pvalue)", title="GO Enrichment",face="bold")
Copied!

(2) change the dot colors

1
ggplot(df6, aes(x=col, y=Biological.process,color=X.log10.Pvalue.)) +
2
geom_point(aes(size = Fold.enrichment)) +
3
scale_x_discrete(limits=c("1")) +
4
scale_size(breaks=c(1,2,4,6)) +
5
scale_color_gradient(low="#fcbba1", high="#a50f15") +
6
theme_classic() +
7
theme(panel.border=element_rect(fill='transparent', color='black', size=1),
8
plot.title = element_text(color="black", size=14, hjust=0.5, face="bold", lineheight=1),
9
axis.title.x = element_blank(),
10
axis.title.y = element_text(color="black", size=12, face=