Appendix I. Keep Learning

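In learning and applying bioinformatics, the most important and most useful basic tools and skills have always been, and I believe will remain for a long time to come: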

  1. Google

  2. Wikipedia

  3. Forums (Zhihu, SEQanswers, Biostars, etc.)

⭐: must-read ✨: recommended

📖 1) Recommended Books

📖 (1) Reference Books - General

Desk references for selective reading

  • ✨《Computational Biology》by Manolis Kellis @ MIT (PDF)

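  • 《生物信息学》 (Bioinformatics), chief editor 樊龙江 (Fan Longjiang)

  • 《生物信息学》 (Bioinformatics), edited by 李霞 (Li Xia), 雷健波 (Lei Jianbo), 李亦学 (Li Yixue), et al.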

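📖 (2) Reference Books - Tool Manuals

Read and practice as needed

It is better to learn and practice three basic techniques (meeting either one of these requirements is enough: 1. a program of more than 1,000 lines; 2. a recognized certificate, e.g. an official one from an online course)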

  1. R (or MATLAB)

  2. Python (or Perl)

  3. Linux (Editor (e.g. VIM) and Shell Script (e.g. bash))

  1. ⭐ Quick-R (online) OR 《R语言实战》 (R in Action)

  2. ⭐ 《笨办法学 Python》 (Learn Python the Hard Way) OR Beginning Perl for Bioinformatics

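Recommended Linux chapters:

  • Chapter 5: 5.3.1 man page; Chapter 6: 6.1 Users and groups; 6.2 Linux file permission concepts; 6.3 Linux directory layout

  • Chapter 7: 7.1 Directories and paths; 7.2 File and directory management; 7.3 Viewing file contents; 7.5 Searching for commands and files; 7.6 How permissions relate to commands; Chapter 8: 8.2 Basic file system operations

  • Chapter 9: 9.1 Purpose and techniques of file compression; 9.2 Common compression commands on Linux; 9.3 The archiving command: tar

  • Chapter 10: The vim editor

  • Chapter 11: Getting to know and learning bash; Chapter 12: Regular expressions and file formatting; Chapter 13: Learning shell scripts

  • Chapter 25: Linux backup strategies: 25.2.2 Differential backups of full backups; 25.3 鸟哥 (VBird)'s backup strategy; 25.4 Disaster recovery considerations; 25.5 Key points review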

Linux topics to focus on:

  1. Editor (e.g. VIM)

  2. Shell Script (e.g. bash)

📖 (3) Reference Books - Statistics

📖 (4) Reference Books - Advanced

For readers with more than six months of experience

🎓 2) Recommended Online Courses

💡 3) Recommended Tips

4) Computational Biology Primers

This is a list of explanatory papers that have appeared as primers in the Computational Biology section of the journal Nature Biotechnology, in reverse chronological order. (Last addition November 2013 / checked March 2016).

Nature Biotechnology

PDFs

(1) Basics

The anatomy of successful computational biology software

(Stephen Altschul, Barry Demchak, Richard Durbin, Robert Gentleman, Martin Krzywinski, Heng Li, Anton Nekrutenko, James Robinson, Wayne Rasband, James Taylor & Cole Trapnell)

October 2013, Vol 31, No 10; pp 894 - 897

doi: 10.1038/nbt.2721 (google)

Understanding genome browsing

(Melissa S Cline & W James Kent)

February 2009, Vol 27, No 2; pp 153 - 155

doi: 10.1038/nbt0209-153 (google)

(2) Basic Statistics

How does multiple testing correction work?

(William S Noble)

December 2009, Vol 27, No 12; pp 1135 - 1137

doi: 10.1038/nbt1209-1135 (google)

What is Bayesian statistics?

(Sean R Eddy)

September 2004, Volume 22, No 9; pp 1177 - 1178

doi: 10.1038/nbt0904-1177 (google)

(3) Basic Algorithms

How to map billions of short reads onto genomes

(Cole Trapnell & Steven L Salzberg)

May 2009, Vol 27, No 5; pp 455 - 457

doi: 10.1038/nbt0509-455 (google)

Where did the BLOSUM62 alignment score matrix come from?

(Sean R Eddy)

August 2004, Volume 22, No 8; pp 1035 - 1036

doi: 10.1038/nbt0804-1035 (google)

What is dynamic programming?

(Sean R Eddy)

July 2004, Volume 22, No 7; pp 909 - 910

doi: 10.1038/nbt0704-909 (google)

How do RNA folding algorithms work?

(Sean R Eddy)

November 2004, Volume 22, No 11; pp 1457 - 1458

doi: 10.1038/nbt1104-1457 (google)

(4) Machine Learning

What is a hidden Markov model?

(Sean R Eddy)

October 2004, Volume 22, No 10; pp 1315 - 1316

doi: 10.1038/nbt1004-1315 (google)

What is the expectation maximization algorithm?

(Chuong B Do & Serafim Batzoglou)

August 2008, Volume 26, No 8; pp 897 - 899

doi: 10.1038/nbt1406 (google)

What are decision trees?

(Carl Kingsford & Steven L Salzberg)

September 2008, Volume 26, No 9; pp 1011 - 1013

doi: 10.1038/nbt0908-1011 (google)

What is a support vector machine?

(William S Noble)

December 2006, Volume 24, No 12; pp 1565 - 1567

doi: 10.1038/nbt1206-1565 (google)

Inference in Bayesian networks

(Chris J Needham, James R Bradford, Andrew J Bulpitt & David R Westhead)

January 2006, Volume 24, No 1; pp 51 - 53

doi: 10.1038/nbt0106-51 (google)

What are artificial neural networks?

(Anders Krogh)

February 2008, Volume 26, No 2; pp 195 - 197

doi: 10.1038/nbt1386 (google)

How does gene expression clustering work?

(Patrik D'haeseleer)

December 2005, Volume 23, No 12; pp 1499 - 1501

doi: 10.1038/nbt1205-1499 (google)

What is principal component analysis?

(Markus Ringnér)

March 2008, Volume 26, No 3; pp 303 - 304

doi: 10.1038/nbt0308-303 (google)

(5) Others

What are DNA sequence motifs?

(Patrik D'haeseleer)

April 2006, Volume 24, No 4; pp 423 - 425

doi: 10.1038/nbt0406-423 (google)

How does DNA sequence motif discovery work?

(Patrik D'haeseleer)

August 2006, Volume 24, No 8; pp 959 - 961

doi: 10.1038/nbt0806-959 (google)

How to apply de Bruijn graphs to genome assembly

(Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler)

November 2011, Vol 29, No 11; pp 987 - 991

doi: 10.1038/nbt.2023 (google)

How does eukaryotic gene prediction work?

(Michael R Brent)

August 2007, Volume 25, No 8; pp 883 - 885

doi: 10.1038/nbt0807-883 (google)

Analyzing 'omics data using hierarchical models

(Hongkai Ji & X Shirley Liu)

April 2010, Vol 28, No 4; pp 337 - 340

doi: 10.1038/nbt.1619 (google)

What is flux balance analysis?

(Jeffrey D Orth, Ines Thiele & Bernhard Ø Palsson)

March 2010, Vol 28, No 3; pp 245 - 248

doi: 10.1038/nbt.1614 (google)

How to visually interpret biological data using networks

(Daniele Merico, David Gfeller & Gary D Bader)

October 2009, Vol 27, No 10; pp 921 - 924

doi: 10.1038/nbt.1567 (google)

SNP imputation in association studies

(Eran Halperin & Dietrich A Stephan)

April 2009, Vol 27, No 4; pp 349 - 351

doi: 10.1038/nbt0409-349 (google)

Maximizing power in association studies

(Eran Halperin & Dietrich A Stephan)

March 2009, Vol 27, No 3; pp 255 - 256

doi: 10.1038/nbt0309-255 (google)

How do shotgun proteomics algorithms identify proteins?

(Edward M Marcotte)

July 2007, Volume 25, No 7; pp 755 - 757

doi: 10.1038/nbt0707-755 (google)

5) Getting Started in ...

Several captions have been used to indicate educationally relevant papers in PLoS Computational Biology. Here we have collected some other papers. (Source: PLoS Computational Biology)

PDFs

Getting Started in Computational Immunology.

(Kleinstein SH)

PLoS Comput Biol (2008) 4(8): e1000128;

doi: 10.1371/journal.pcbi.1000128 (google)

(1) Basics

Getting Started in Gene Orthology and Functional Analysis

(Fang G, Bhardwaj N, Robilotto R, Gerstein MB)

PLoS Comput Biol (2010) 6(3): e1000703;

doi: 10.1371/journal.pcbi.1000703 (google)

Getting Started in Biological Pathway Construction and Analysis.

(Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC)

PLoS Comput Biol (2008) 4(2): e16;

doi: 10.1371/journal.pcbi.0040016 (google)

Getting Started in Structural Phylogenomics

(Sjölander K)

PLoS Comput Biol (2010) 6(1): e1000621;

doi: 10.1371/journal.pcbi.1000621 (google)

(2) Advanced

Getting Started in Text Mining

(Cohen KB, Hunter L)

PLoS Comput Biol (2008) 4(1): e20;

doi: 10.1371/journal.pcbi.0040020 (google)

Getting Started in Text Mining: Part Two.

(Rzhetsky A, Seringhaus M, Gerstein MB)

PLoS Comput Biol (2009) 5(7): e1000411;

doi: 10.1371/journal.pcbi.1000411 (google)

Getting Started in Probabilistic Graphical Models.

(Airoldi EM)

PLoS Comput Biol (2007) 3(12): e252;

doi: 10.1371/journal.pcbi.0030252 (google)

(3) MS and Array

Getting Started in Computational Mass Spectrometry-Based Proteomics.

(Vitek O)

PLoS Comput Biol (2009) 5(5): e1000366;

doi: 10.1371/journal.pcbi.1000366 (google)

Getting Started in Gene Expression Microarray Analysis

(Slonim DK, Yanai I)

PLoS Comput Biol (2009) 5(10): e1000543;

doi: 10.1371/journal.pcbi.1000543 (google)

Getting Started in Tiling Microarray Analysis

(Liu XS)

PLoS Comput Biol (2007) 3(10): e183;

doi: 10.1371/journal.pcbi.0030183 (google)