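A lot of the explanations out there are simply wrong. Some WeChat public accounts, Jianshu blogs, and courses also get it wrong, and some of them even charge money for it... which is misleading.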
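Somatic mutations
Our somatic cells mutate all the time, but the mutation rate is generally very low and cells repair most of the damage; cells that cannot be repaired are destroyed by the immune system. Within a single healthy individual, the genetic material in all cells should therefore be essentially the same. Most cancers arise because mutations accumulate in the cells of one local tissue and the immune system fails to remove them. These variants are not inherited, because they occur only in individual cells; most of the rest of the body, including the germ cells, is normal. Somatic mutations are therefore differences between cells.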
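Germline variations
This one is also easy to understand. If we all descend from a small number of ancestors, yet we descendants differ from one another in countless ways, it is because variations arose during evolution and were passed down. Germline variations are therefore differences between individuals, and between populations.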
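Bioinformatics analysis
Accordingly, the two kinds of variants are analysed differently. For germline variants (including SNPs), you only need to compare the sequence against the reference genome. The reference genome can be thought of as another person (it is actually assembled from the genomes of many people), so this is a person-to-person comparison. For somatic variants, you need to compare the cancer tissue and the normal tissue of the same person.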
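The two comparisons can be sketched in a few lines of Python. This is a toy illustration only, not a real variant caller: each genome is reduced to a dictionary of one base per position (ignoring diploidy, read depth, and sequencing error), and every name and data value below is made up for the example.

```python
# Toy model: a "genome" maps (chromosome, position) to a single base.
# All data here are invented for illustration.

def call_variants(sample, reference):
    """Return (chrom, pos, ref_base, alt_base) wherever sample differs from reference."""
    return {
        (chrom, pos, reference[(chrom, pos)], base)
        for (chrom, pos), base in sample.items()
        if (chrom, pos) in reference and base != reference[(chrom, pos)]
    }

reference = {("chr1", 100): "C", ("chr1", 200): "G", ("chr2", 50): "T"}
normal = {("chr1", 100): "A", ("chr1", 200): "G", ("chr2", 50): "T"}
tumor = {("chr1", 100): "A", ("chr1", 200): "G", ("chr2", 50): "C"}

# Germline: the person's normal tissue compared with the reference genome.
germline = call_variants(normal, reference)

# Somatic: variants seen in the tumor but absent from the matched normal tissue.
somatic = call_variants(tumor, reference) - germline

print("germline:", sorted(germline))
print("somatic:", sorted(somatic))
```

The variant at chr1:100 appears in both tissues, so it is treated as germline; the one at chr2:50 appears only in the tumor, so it is the somatic candidate.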
The main topic: SNP and SNV
Wikipedia's explanation: https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism
A single-nucleotide polymorphism (SNP; /snɪp/; plural /snɪps/) is a substitution of a single nucleotide that occurs at a specific position in the genome, where each variation is present at a level of more than 1% in the population.[1]
For example, at a specific base position in the human genome, the C nucleotide may appear in most individuals, but in a minority of individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations – C or A – are said to be alleles for this position.
SNPs underlie differences in our susceptibility to a wide range of diseases (e.g. – sickle-cell anemia, β-thalassemia and cystic fibrosis result from SNPs).[2][3][4] The severity of illness and the way the body responds to treatments are also manifestations of genetic variations. For example, a single-base mutation in the APOE (apolipoprotein E) gene is associated with a lower risk for Alzheimer’s disease.[5]
A single-nucleotide variant (SNV) is a variation in a single nucleotide without any limitations of frequency and may arise in somatic cells. A somatic single-nucleotide variation (e.g., caused by cancer) may also be called a single-nucleotide alteration.
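So a SNP is a germline variation: it is heritable, and it is present in more than 1% of the population. An SNV has no frequency requirement and may arise in somatic cells; in cancer analysis the term usually refers to somatic variants.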
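The distinction in the quoted definition can be written as a small rule. The sketch below is only an illustration of that definition: the function name `label_variant`, its parameters, and the example frequencies are hypothetical, and the 1% cut-off is the one stated in the Wikipedia text.

```python
def label_variant(population_frequency, is_somatic, threshold=0.01):
    """Label a single-nucleotide change following the definition quoted above.

    population_frequency: fraction of the population carrying the allele (0.0 to 1.0).
    is_somatic: True if the change was found only in somatic (e.g. tumor) cells.
    """
    # A SNP is a germline variant present at MORE than 1% in the population.
    if not is_somatic and population_frequency > threshold:
        return "SNP"
    # Anything else is still a single-nucleotide variant: no frequency limit applies.
    return "SNV"

# Made-up example values.
print(label_variant(0.15, is_somatic=False))   # common germline variant
print(label_variant(0.001, is_somatic=False))  # rare germline variant
print(label_variant(0.0, is_somatic=True))     # tumor-only change
```

Strictly speaking every SNP is also an SNV; the function simply returns the more specific label when the frequency condition is met.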
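In the TCGA database
TCGA contains both kinds of data, but the SNP data are not open; you have to apply for access. These data were measured with the Affymetrix SNP 6.0 array. Copy number variation was measured with the same array, and those data are open.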
The SNV data are open, and come as VCF files and annotated MAF files. See the [TCGA pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).
These data were obtained by whole exome sequencing (WXS).
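As a rough idea of what working with a MAF file looks like, here is a minimal sketch using only the Python standard library. The column names (`Hugo_Symbol`, `Variant_Type`, `Reference_Allele`, `Tumor_Seq_Allele2`) follow the MAF format; the rows themselves are invented, and the gene names are placeholders. Note that the MAF `Variant_Type` value for a single-base change is spelled `SNP`, even though the variant is somatic.

```python
import csv
import io

# A made-up MAF-like snippet: tab-separated, with a leading "#" comment line.
MAF_TEXT = (
    "#version made-up-example\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tVariant_Type\t"
    "Reference_Allele\tTumor_Seq_Allele2\n"
    "GENE_A\tchr1\t1000\tSNP\tC\tT\n"
    "GENE_B\tchr2\t2000\tDEL\tAG\t-\n"
    "GENE_A\tchr1\t1500\tSNP\tG\tA\n"
)

def read_snvs(handle):
    """Return the rows of a MAF file that describe single-nucleotide changes."""
    # Skip comment lines so the first remaining line is the column header.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return [row for row in reader if row["Variant_Type"] == "SNP"]

snvs = read_snvs(io.StringIO(MAF_TEXT))

# Count single-nucleotide changes per gene.
counts = {}
for row in snvs:
    counts[row["Hugo_Symbol"]] = counts.get(row["Hugo_Symbol"], 0) + 1

for gene, n in sorted(counts.items()):
    print(gene, n)
```

For a real file, `io.StringIO(MAF_TEXT)` would be replaced by an opened file handle, for example `open(path, newline="")` with a path of your own.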
Finally
If you spot any mistakes, corrections are welcome.