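PyClone is a statistical model for inferring the clonal population structure of cancers. It is a Bayesian clustering method that groups sets of deeply sequenced somatic mutations into putative clonal clusters, while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.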

The input data for PyClone consists of a set of read counts from a deep sequencing experiment, the copy number of the genomic region containing each mutation, and an estimate of tumour content.
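As a rough illustration of the inputs listed above, the sketch below writes a tab-separated file with one row per mutation. The column names (mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn) are an assumption based on the format described in the PyClone wiki and should be checked against the installed version. The example mutations are invented, and the helper write_pyclone_input is hypothetical. The script uses only the standard library and is written for Python 3, so it does not need to run inside the PyClone environment.

```python
import csv

# Assumed column layout of a PyClone input file (verify against the PyClone wiki).
COLUMNS = ["mutation_id", "ref_counts", "var_counts",
           "normal_cn", "minor_cn", "major_cn"]

# Invented example data: read counts plus copy-number state per mutation.
EXAMPLE_MUTATIONS = [
    {"mutation_id": "chr1:1000", "ref_counts": 120, "var_counts": 40,
     "normal_cn": 2, "minor_cn": 1, "major_cn": 1},
    {"mutation_id": "chr7:5500", "ref_counts": 200, "var_counts": 95,
     "normal_cn": 2, "minor_cn": 0, "major_cn": 3},
]


def write_pyclone_input(path, mutations):
    """Write mutations to a tab-separated file, one row per mutation."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in mutations:
            # The minor allele copy number cannot exceed the major one.
            if row["minor_cn"] > row["major_cn"]:
                raise ValueError("minor_cn > major_cn for %s" % row["mutation_id"])
            writer.writerow(row)
```

The tumour content estimate is not part of this file; it is supplied separately when the analysis is configured.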

Quick installation

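The official recommendation is to install PyClone with MiniConda. Because PyClone is based on Python 2.7, it is worth creating a dedicated environment for it to keep things stable. Here we use Anaconda3 (conda 4.5.11) to install PyClone.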

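  # Create a dedicated Python 2.7 environment named pyclone
  conda create --name pyclone python=2.7
  # Activate the pyclone environment
  source activate pyclone
  # Install PyClone (run this while the pyclone environment is active)
  conda install pyclone -c aroth85
  # Leave the pyclone environment when you are done
  source deactivate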

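After installing PyClone under Anaconda3 and activating the environment, running PyClone -h prints a series of RuntimeWarning messages. Importing the pandas module inside the pyclone environment triggers the same warnings: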

  (pyclone) shenweiyan@ecs-steven 13:38:25 /home/shenweiyan
  $ PyClone -h
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/pandas/_libs/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/pandas/__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  ......
  from pandas._libs import algos, lib, writers as libwriters
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/statsmodels/nonparametric/kde.py:22: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .linbin import fast_linbin
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/statsmodels/nonparametric/smoothers_lowess.py:11: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._smoothers_lowess import lowess as _lowess
  usage: PyClone [-h] [--version]
  {setup_analysis,run_analysis,run_analysis_pipeline,build_mutations_file,plot_clusters,plot_loci,build_table}
  ...
  positional arguments:
  {setup_analysis,run_analysis,run_analysis_pipeline,build_mutations_file,plot_clusters,plot_loci,build_table}
  setup_analysis Setup a config file and mutations files for a PyClone
  analysis.
  run_analysis Run an MCMC sampler to sample from the posterior of
  the PyClone model.
  run_analysis_pipeline
  Run a full PyClone analysis.
  build_mutations_file
  Build a YAML format file with mutation data and states
  prior to be used for PyClone analysis.
  plot_clusters Plot features of the clusters.
  plot_loci Plot features of the loci.
  build_table Build results table which contains cluster ids and
  (mean) cellular prevalence estimates.
  optional arguments:
  -h, --help show this help message and exit
  --version show program's version number and exit
  (pyclone) shenweiyan@ecs-steven 14:47:17 /home/shenweiyan
  $ python
  Python 2.7.15 | packaged by conda-forge | (default, Oct 12 2018, 14:10:50)
  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import pandas
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/pandas/_libs/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/pandas/__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (hashtable as _hashtable,
  ......
  /usr/local/software/anaconda3/envs/pyclone/lib/python2.7/site-packages/pandas/io/pytables.py:50: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos, lib, writers as libwriters
  >>> pandas.__version__
  u'0.23.4'

Cause and fix (see anaconda-issues #6678 and numpy issues #11628):

pandas was built against a different version of numpy than the one installed locally. We need to rebuild pandas against the local numpy, or install a numpy version that matches the pandas build.

  # Option 1 (slow): rebuild pandas against the local numpy
  pip install --no-binary pandas -I pandas
  # Option 2 (recommended): install a matching numpy version
  conda install numpy==1.14.5 --yes
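To confirm that either fix took effect, one option is to import the affected module in a fresh interpreter and look for RuntimeWarning lines on stderr. The following is a minimal sketch, not part of PyClone. The helper names are hypothetical, and it sticks to subprocess.Popen so that it should run under both Python 2.7 and Python 3.

```python
import subprocess
import sys


def extract_runtime_warnings(stderr_text):
    """Return the stderr lines that report a RuntimeWarning."""
    return [line for line in stderr_text.splitlines() if "RuntimeWarning" in line]


def runtime_warnings_on_import(module_name):
    """Import module_name in a fresh interpreter and return its RuntimeWarning lines."""
    # "-W always" makes sure no warning is hidden by the default filters.
    cmd = [sys.executable, "-W", "always", "-c", "import " + module_name]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, stderr = proc.communicate()
    text = stderr.decode("utf-8", "replace")
    if proc.returncode != 0:
        # The import itself failed, so there is nothing meaningful to report.
        raise ImportError("could not import %s:\n%s" % (module_name, text))
    return extract_runtime_warnings(text)
```

Run with the interpreter of the pyclone environment, runtime_warnings_on_import("pandas") should return an empty list once numpy and pandas are binary compatible again.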

Manual installation

To install PyClone manually, first make sure the required libraries (listed below) are installed. PyClone can then be installed like any other Python package, with python setup.py install.

PyClone requires the following packages:

  1. PyDP >= 0.2.3
  2. PyYAML >= 3.10
  3. matplotlib >= 1.2.0 - Required for plotting.
  4. numpy >= 1.6.2 - Required for plotting and clustering.
  5. pandas >= 0.11 - Required for multi sample plotting.
  6. scipy >= 0.11 - Required for plotting and clustering.
  7. seaborn >= 0.6.0
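Before running setup.py, it can help to check which of these requirements the current interpreter already satisfies. The sketch below is one way to do that. It assumes PyDP is importable as pydp and PyYAML as yaml, and that each package exposes a __version__ attribute, which may not hold for every release. The helper names are hypothetical, and the version comparison is deliberately simple (first three numeric components only).

```python
import importlib
import re

# Minimum versions from the list above, keyed by assumed import name.
MINIMUM_VERSIONS = {
    "pydp": "0.2.3",
    "yaml": "3.10",
    "matplotlib": "1.2.0",
    "numpy": "1.6.2",
    "pandas": "0.11",
    "scipy": "0.11",
    "seaborn": "0.6.0",
}


def parse_version(text):
    """Turn '0.23.4' into (0, 23, 4), padding short versions with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a {module_name: status} report for every required module."""
    report = {}
    for module_name, minimum in sorted(minimums.items()):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[module_name] = "installed, version unknown"
        elif meets_minimum(installed, minimum):
            report[module_name] = "ok (%s)" % installed
        else:
            report[module_name] = "too old (%s < %s)" % (installed, minimum)
    return report
```

Calling check_dependencies() inside the target environment returns a dictionary that can be printed to see which packages still need to be installed or upgraded.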

Installing PyClone manually:

  $ git clone https://github.com/aroth85/pyclone.git
  $ cd pyclone
  $ python setup.py install
  running install
  running bdist_egg
  running egg_info
  creating PyClone.egg-info
  writing PyClone.egg-info/PKG-INFO
  ......
  Installed /usr/local/software/python2.7/pyclone/lib/python2.7/site-packages/PyClone-0.13.1-py2.7.egg
  Processing dependencies for PyClone==0.13.1
  Finished processing dependencies for PyClone==0.13.1

PyClone is now installed. For details on how to use the software, see PyClone -h or the Usage page of the PyClone wiki.


References: