edgeR



© Mark D. Robinson

Because of rendering issues on this platform, please read this post on the blog.
GitPage link for this post

edgeR: empirical analysis of DGE in R

cite: Mark D. Robinson, Davis J. McCarthy, Gordon K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics, Volume 26, Issue 1, 1 January 2010, Pages 139–140, https://doi.org/10.1093/bioinformatics/btp616

An overdispersed Poisson model is used to account for both biological and technical variability.
Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated.

Why edgeR

For microarrays, the abundance of a particular transcript is measured as a fluorescence intensity, which is effectively a continuous response.
For digital gene expression (DGE) data, the abundance is observed as a count.
Therefore, procedures that are successful for microarray data are not directly applicable to DGE data.
edgeR is designed for the analysis of replicated count-based expression data and is an implementation of methodology developed by Robinson and Smyth[1][2].
It was initially developed for serial analysis of gene expression (SAGE).
Because it models counts, edgeR may also be useful in other experiments that generate counts, such as ChIP-seq, proteomics experiments where spectral counts are used to summarize peptide abundance[4], or barcoding experiments where several species are counted[3].

Digital gene expression: Digital gene expression (DGE) is a sequence-based approach to gene expression analysis that generates a digital output at an unparalleled level of sensitivity[5].

Serial analysis of gene expression (SAGE): Serial analysis of gene expression, or SAGE, is an experimental technique designed to gain a direct and quantitative measure of gene expression. The SAGE method is based on the isolation of unique sequence tags (9-10 bp in length) from individual mRNAs and concatenation of tags serially into long DNA molecules for a lump-sum sequencing[6].


Method

In limma (Smyth, 2004), an empirical Bayes model is used to moderate the probe-wise variances.
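
As a rough illustration of what "moderating" means here, the sketch below applies the posterior-variance formula from Smyth (2004), $\tilde{s}_g^2 = (d_0 s_0^2 + d_g s_g^2)/(d_0 + d_g)$, to three made-up probe variances. The function name and every number (prior variance, prior and residual degrees of freedom) are assumptions for illustration only; limma estimates $d_0$ and $s_0^2$ from the data rather than taking them as input.

```python
def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a probe-wise sample variance toward a prior variance.

    Weighted average of the prior variance and the sample variance,
    with the degrees of freedom acting as weights.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Hypothetical values, not taken from any real data set.
prior_var, prior_df = 0.05, 4.0
residual_df = 2.0
sample_vars = [0.001, 0.05, 0.9]

moderated = [
    moderated_variance(s2, residual_df, prior_var, prior_df)
    for s2 in sample_vars
]

for s2, m in zip(sample_vars, moderated):
    print(f"sample variance {s2:.3f} -> moderated variance {m:.4f}")
```

Very small and very large sample variances are both pulled toward the prior value, which is what stabilizes inference when there are only a few replicates.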

In edgeR:
We assume the data can be summarized into a table of counts
We model the data as negative binomial (NB) distributed

$Y_{gi} \sim \mathrm{NB}(M_i p_{gj}, \phi_g)$

For gene $g$ and sample $i$:
$M_i$: the library size (total number of reads),
$\phi_g$: the dispersion,
$p_{gj}$: the relative abundance of gene $g$ in experimental group $j$ to which sample $i$ belongs.
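
To make the notation concrete, here is a minimal sketch that starts from a small table of counts and derives the library size of each sample, a relative abundance for each gene and group, and the resulting expected count (library size times relative abundance). The gene names, counts, and group labels are invented, and the pooled estimate of relative abundance is a simple choice made for this example, not necessarily the estimator edgeR uses.

```python
# Hypothetical count table: rows are genes, columns are samples.
counts = {
    "geneA": [10, 12, 30, 28],
    "geneB": [200, 180, 190, 210],
    "geneC": [0, 1, 5, 4],
}
# Experimental group of each sample (same order as the columns).
groups = ["control", "control", "treated", "treated"]

n_samples = len(groups)

# Library size of a sample: total number of reads across all genes.
library_sizes = [
    sum(row[i] for row in counts.values()) for i in range(n_samples)
]


def pooled_abundance(gene_counts, group):
    """Pooled relative abundance: group counts / group library sizes."""
    idx = [i for i, g in enumerate(groups) if g == group]
    return sum(gene_counts[i] for i in idx) / sum(library_sizes[i] for i in idx)


rel_abundance = {
    gene: {g: pooled_abundance(row, g) for g in sorted(set(groups))}
    for gene, row in counts.items()
}

# Expected count of geneA in the first sample: library size * abundance.
mu_a0 = library_sizes[0] * rel_abundance["geneA"]["control"]

print("library sizes:", library_sizes)
print("relative abundance of geneA:", rel_abundance["geneA"])
print("expected geneA count in sample 0:", round(mu_a0, 2))
```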

We use the NB parameterization where:

the mean is $\mu_{gi} = M_i p_{gj}$
the variance is $\mu_{gi}(1 + \mu_{gi} \phi_g)$
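
The mean and variance of this parameterization can be checked numerically. The sketch below writes the NB probability mass function in terms of a mean and a dispersion (using size = 1/dispersion, the standard conversion), sums it over a long range of counts, and compares the resulting moments with the closed forms. The library size, relative abundance, and dispersion are hypothetical values.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under an NB with mean mu, dispersion phi."""
    size = 1.0 / phi  # the usual NB "size" parameter
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def nb_moments(mu, phi, y_max=2000):
    """Total probability, mean and variance from summing the pmf on 0..y_max."""
    probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var


library_size = 1_000_000  # hypothetical library size
rel_abundance = 2e-5      # hypothetical relative abundance
dispersion = 0.1          # hypothetical dispersion

mu = library_size * rel_abundance
total, mean, var = nb_moments(mu, dispersion)

print("total probability:", total)
print("numerical mean:", mean, "closed form:", mu)
print("numerical variance:", var, "closed form:", mu * (1 + mu * dispersion))
```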

For differential expression analysis:

the parameters of interest are $p_{gj}$.

The NB distribution reduces to the Poisson distribution when $\phi_g = 0$.
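
The Poisson limit is easy to see numerically. This sketch compares the NB probability mass function with the Poisson one at the same mean, for a sequence of shrinking dispersions; the mean and the dispersion values are arbitrary choices for illustration.

```python
import math


def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and dispersion phi > 0."""
    size = 1.0 / phi
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


mu = 5.0  # arbitrary mean
dispersions = [1.0, 0.1, 0.01, 0.001]

# Largest pointwise gap between the two pmfs over counts 0..59.
max_gaps = [
    max(abs(nb_pmf(y, mu, phi) - poisson_pmf(y, mu)) for y in range(60))
    for phi in dispersions
]

for phi, gap in zip(dispersions, max_gaps):
    print(f"dispersion {phi:<6} max |NB - Poisson| = {gap:.6f}")
```

The gap shrinks steadily as the dispersion approaches zero, which is the numerical counterpart of the statement above.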

In some DGE applications, technical variation can be treated as Poisson.
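In general, $\phi_g$ represents the squared coefficient of variation of the biological variation between the samples: dividing the NB variance $\mu_{gi}(1 + \mu_{gi}\phi_g)$ by $\mu_{gi}^2$ gives $1/\mu_{gi} + \phi_g$, where $1/\mu_{gi}$ is the Poisson (technical) part. In this way, the model is able to separate biological from technical variation.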
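
To see the two sources of variation side by side, the sketch below evaluates the squared coefficient of variation implied by the NB variance, $\mathrm{CV}^2 = 1/\mu + \phi$, for a range of expected counts. The dispersion value is hypothetical. As the expected count grows, the technical (Poisson) part vanishes and the total CV settles at $\sqrt{\phi}$.

```python
import math


def squared_cv(mu, phi):
    """Squared coefficient of variation implied by the NB variance."""
    variance = mu * (1.0 + mu * phi)
    return variance / mu**2


phi = 0.04  # hypothetical dispersion, i.e. a biological CV of 0.2

for mu in [1, 10, 100, 10_000]:
    technical = 1.0 / mu  # Poisson (technical) contribution to CV^2
    total_cv = math.sqrt(squared_cv(mu, phi))
    print(
        f"mu={mu:>6}  technical CV^2={technical:.4f}  "
        f"biological CV^2={phi:.4f}  total CV={total_cv:.4f}"
    )
```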

limma: dispersion estimates -> topTags: tabulate the top differentially expressed genes -> plotSmear: MA plot
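
For readers unfamiliar with MA plots, here is a minimal sketch of the two quantities such a plot shows for one gene: M, the log2 ratio of the relative abundances in two libraries, and A, their average log2 abundance. This is the generic definition only; the function name and the numbers are made up, and plotSmear may treat the data differently (for example, how it handles zero counts).

```python
import math


def ma_values(count_a, lib_a, count_b, lib_b):
    """Return (M, A) for one gene observed in two libraries.

    Assumes both counts are positive; zero counts would need special
    handling that is deliberately left out of this sketch.
    """
    prop_a = count_a / lib_a
    prop_b = count_b / lib_b
    m = math.log2(prop_b / prop_a)                        # log fold change
    a = 0.5 * (math.log2(prop_a) + math.log2(prop_b))     # mean log abundance
    return m, a


# Hypothetical gene: 100 reads in library A, 200 reads in library B,
# both libraries with one million reads.
m, a = ma_values(100, 1_000_000, 200, 1_000_000)
print(f"M = {m:.3f}, A = {a:.3f}")
```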

There are a few terms and algorithms I do not understand. So, I’ll update this page later.

Enjoy~

This post is updated automatically by a Python script (GitHub/Yuque).


GitHub: Karobben
Blog:Karobben
BiliBili:史上最不正經的生物狗

[1] Robinson MD, Smyth GK. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 2007, vol. 23 (pg. 2881-2887).
[2] Robinson MD, Smyth GK. Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 2008, vol. 9 (pg. 321-332).
[3] Andersson AF, et al. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE, 2008, vol. 3 pg. e2836.
[4] Wong JWH, et al. Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments. Brief. Bioinform., 2008, vol. 9 (pg. 156-165).
[5] Rodríguez-Esteban G, González-Sastre A, Rojo-Laguna JI, et al. Digital gene expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea. BMC Genomics 16, 361 (2015). https://doi.org/10.1186/s12864-015-1533-1
[6] Yamamoto M, Wakatsuki T, Hada A, Ryo A. Use of serial analysis of gene expression (SAGE) technology. J Immunol Methods. 2001 Apr;250(1-2):45-66. doi: 10.1016/s0022-1759(01)00305-2. PMID: 11251221.
