© Mark D. Robinson
Because of rendering problems that affect readability, please read this post on the blog~
GitPage address of this post
edgeR: empirical analysis of DGE in R
- An overdispersed Poisson model is used to account for both biological and technical variability.
- Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
- The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated.
Why edgeR
- For microarrays, the abundance of a particular transcript is measured as a fluorescence intensity, effectively a continuous response.
- For digital gene expression (DGE) data, the abundance is observed as a count.
- Therefore, procedures that are successful for microarray data are not directly applicable to DGE data.
- edgeR is designed for the analysis of replicated count-based expression data and is an implementation of methodology developed by Robinson and Smyth[1][2].
- It was initially developed for serial analysis of gene expression (SAGE) data.
As a result, edgeR may also be useful in other experiments that generate counts, such as ChIP-seq, in proteomics experiments where spectral counts are used to summarize the peptide abundance[3] or in barcoding experiments where several species are counted [4].
Digital gene expression: Digital gene expression (DGE) is a sequence-based approach to gene expression analysis that generates a digital output at an unparalleled level of sensitivity[5].
Serial analysis of gene expression (SAGE): Serial analysis of gene expression, or SAGE, is an experimental technique designed to gain a direct and quantitative measure of gene expression. The SAGE method is based on the isolation of unique sequence tags (9-10 bp in length) from individual mRNAs and concatenation of tags serially into long DNA molecules for a lump-sum sequencing[6].
Method
In limma (Smyth, 2004), an empirical Bayes model is used to moderate the probe-wise variances.
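To make "moderation" concrete: in limma's empirical Bayes model, the moderated variance of a probe is a weighted average of that probe's sample variance and a prior variance, with the degrees of freedom as weights. The sketch below assumes the prior variance `s0_sq` and the prior degrees of freedom `d0` are already known. limma estimates both from all probes, and the numbers here are made up for illustration.

```python
def moderated_variance(s_sq, d, s0_sq, d0):
    """Shrink a probe-wise sample variance s_sq (d degrees of freedom)
    toward a prior variance s0_sq (d0 prior degrees of freedom)."""
    return (d0 * s0_sq + d * s_sq) / (d0 + d)


# Hypothetical values: three probes, each with 2 residual degrees of freedom.
sample_variances = [0.01, 0.20, 1.50]
moderated = [
    moderated_variance(s_sq, d=2, s0_sq=0.20, d0=4)  # s0_sq and d0 are assumed
    for s_sq in sample_variances
]
```

Very small and very large sample variances are both pulled toward the prior value, which is what stabilizes inference when each probe has only a few replicates.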
In edgeR:
We assume the data can be summarized into a table of counts
We model the data as negative binomial (NB) distributed:
$$Y_{gi} \sim NB(M_i p_{gj}, \phi_g)$$
For gene $g$ and sample $i$:
$M_i$: the library size (total number of reads),
$\phi_g$: the dispersion,
$p_{gj}$: the relative abundance of gene $g$ in experimental group $j$, to which sample $i$ belongs.
We use the NB parameterization where:
- the mean is $\mu_{gi} = M_i p_{gj}$
- the variance is $\mu_{gi}(1 + \mu_{gi} \phi_g)$
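As a quick numeric check of this parameterization, the sketch below computes the NB mean and variance from a library size, a relative abundance and a dispersion. The function name and the input values are hypothetical and chosen only for illustration.

```python
def nb_mean_variance(library_size, rel_abundance, dispersion):
    """Mean and variance of the NB model: mean = M * p, var = mean * (1 + mean * phi)."""
    mean = library_size * rel_abundance
    variance = mean * (1.0 + mean * dispersion)
    return mean, variance


# Hypothetical gene: 1 read in 10,000 belongs to it, library of one million reads.
mean, var_nb = nb_mean_variance(1_000_000, 1e-4, dispersion=0.1)

# With zero dispersion the variance equals the mean, as for a Poisson count.
_, var_poisson = nb_mean_variance(1_000_000, 1e-4, dispersion=0.0)
```

With these inputs the mean is about 100 reads, while the NB variance is roughly eleven times the Poisson variance, which is the overdispersion the model is meant to capture.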
For differential expression analysis:
- the parameters of interest are $p_{gj}$.
The NB distribution reduces to the Poisson distribution when $\phi_g = 0$.
In some DGE applications, technical variation can be treated as Poisson.
In general, $\phi_g$ represents the squared coefficient of variation of the biological variation between the samples, so $\sqrt{\phi_g}$ is the biological coefficient of variation. In this way, our model is able to separate biological from technical variation.
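The separation can be seen by dividing the NB variance $\mu(1 + \mu\phi)$ by the squared mean, which gives a squared coefficient of variation of $1/\mu + \phi$. The first term is the Poisson (technical) part and shrinks as the expected count grows. The second term is the biological part and does not depend on sequencing depth. The sketch below is only an illustration of that arithmetic, with a hypothetical function name and made-up values.

```python
import math


def squared_cv_parts(mean, dispersion):
    """Split the NB squared CV (variance / mean**2) into its two terms."""
    technical = 1.0 / mean   # Poisson part, decreases with the expected count
    biological = dispersion  # stays the same however deep the sequencing is
    return technical, biological


# Hypothetical dispersion of 0.04, i.e. a biological CV of about 20%.
for expected_count in (10, 100, 10_000):
    technical, biological = squared_cv_parts(expected_count, 0.04)
    total_cv = math.sqrt(technical + biological)
    print(expected_count, round(technical, 4), biological, round(total_cv, 3))
```

As the expected count increases, the total CV approaches the biological CV, so deeper sequencing cannot remove the variation between biological replicates.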
limma
The analysis proceeds in three steps:
- estimate the dispersions
- topTags: tabulate the top differentially expressed genes
- plotSmear: MA plot
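An MA plot places each gene by its average log abundance (A, on the x-axis) and its log fold change between two conditions (M, on the y-axis). The sketch below is a generic illustration and not necessarily how plotSmear computes its axes. It assumes the two input vectors are already scaled for library size, and the pseudo-count of 0.5 that avoids taking the log of zero is also an assumption.

```python
import math


def ma_values(abundance_a, abundance_b, pseudo=0.5):
    """Return one (A, M) pair per gene for two conditions.

    A is the mean of the two log2 abundances, M is the log2 fold change (b over a).
    The pseudo-count is an illustrative choice to keep log2 defined at zero.
    """
    points = []
    for a, b in zip(abundance_a, abundance_b):
        log_a = math.log2(a + pseudo)
        log_b = math.log2(b + pseudo)
        points.append((0.5 * (log_a + log_b), log_b - log_a))
    return points


# Hypothetical abundances for three genes in two conditions.
points = ma_values([7.5, 100.0, 0.0], [31.5, 100.0, 12.0])
```

Genes that do not change between conditions sit near M = 0, so differentially expressed genes stand out as points far from that horizontal line.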
More
There are a few terms and algorithms I do not understand yet, so I'll update this page later.
Enjoy~
GitHub: Karobben
Blog: Karobben
BiliBili: 史上最不正經的生物狗
References
[1] Robinson MD, Smyth GK. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 2007, vol. 23 (pg. 2881-2887).
[2] Robinson MD, Smyth GK. Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 2008, vol. 9 (pg. 321-332).
[3] Wong JWH, et al. Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments. Brief. Bioinform., 2008, vol. 9 (pg. 156-165).
[4] Andersson AF, et al. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE, 2008, vol. 3 (pg. e2836).
[5] Rodríguez-Esteban G, González-Sastre A, Rojo-Laguna JI, et al. Digital gene expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea. BMC Genomics, 2015, vol. 16, 361. https://doi.org/10.1186/s12864-015-1533-1
[6] Yamamoto M, Wakatsuki T, Hada A, Ryo A. Use of serial analysis of gene expression (SAGE) technology. J Immunol Methods, 2001, vol. 250(1-2) (pg. 45-66). doi: 10.1016/s0022-1759(01)00305-2. PMID: 11251221.