SPAdes

SPAdes is an assembly toolkit that bundles several assembly pipelines. It is designed mainly for small genomes: bacterial (both single-cell MDA and standard isolates), fungal, and others of similar size. It is not intended for larger genomes (e.g. mammalian-size genomes).

Options:

    --isolate # this flag is highly recommended for high-coverage isolate and multi-cell data
    --sc # this flag is required for MDA (single-cell) data
    --meta # this flag is required for metagenomic data
    -o <output_dir> # directory to store all the resulting files (required)
    -k <int> [<int> ...] # list of k-mer sizes (must be odd and less than 128)

K-mer sizes must be odd so that no k-mer can be identical to its own reverse complement, which would make the forward and reverse strands indistinguishable. SPAdes assembles the genome with several k-mer sizes in turn; the default list for this parameter is 21 33 55.
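The constraints on -k can be checked before launching a long run. The following is a minimal sketch, not part of SPAdes: validate_kmers is a hypothetical helper name, and it tests only the two rules stated above (odd, less than 128).

```shell
# Hypothetical helper (not part of SPAdes): check a k-mer list against the
# two constraints stated above before passing it to `spades.py -k`.
validate_kmers() {
    for k in "$@"; do
        case "$k" in
            # Reject empty strings, leading zeros, non-digits and 4+ digit values.
            ''|0*|*[!0-9]*|????*)
                echo "not a plausible k-mer size: $k" >&2
                return 1 ;;
        esac
        if [ "$k" -ge 128 ] || [ $((k % 2)) -eq 0 ]; then
            echo "invalid k-mer size: $k (must be odd and < 128)" >&2
            return 1
        fi
    done
    return 0
}

# The default list passes the check.
validate_kmers 21 33 55 && echo "k-mer list OK"
```

A list such as 21 32 55 fails the check because 32 is even.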

Input data:

    --12 <filename> # file with interlaced forward and reverse paired-end reads
    -1 <filename> # file with forward paired-end reads
    -2 <filename> # file with reverse paired-end reads
    -s <filename> # file with unpaired reads

Genome sequences assembled by SPAdes may come out in reverse-complement orientation relative to the reference.
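If a contig needs to be flipped to match the reference strand, a reverse complement can be computed with standard tools. This is a minimal sketch for a single sequence string; revcomp is a hypothetical helper name, and characters other than A, C, G, T (such as N) are left unchanged.

```shell
# Hypothetical helper: reverse-complement one sequence given as an argument.
revcomp() {
    printf '%s\n' "$1" |
        tr 'ACGTacgt' 'TGCAtgca' |   # complement each base
        awk '{
            out = ""
            # Walk the line backwards to reverse it.
            for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
            print out
        }'
}

revcomp AACGT   # prints ACGTT
```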

Pipeline:

Subsample reads.

    $ sambamba view -h -s 0.001 ~/SARS_CoV_2/sortedbam/P3-VERO-P3-9-vero_L4_sort.bam -o ~/SARS_CoV_2/subsample/P3-VERO-P3-9-0.001.bam
    $ samtools sort -@ 4 -l 9 ~/SARS_CoV_2/subsample/P3-VERO-P3-9-0.001.bam -o ~/SARS_CoV_2/subsample/P3-VERO-P3-9-sort0.001.bam
    $ samtools index ~/SARS_CoV_2/subsample/P3-VERO-P3-9-sort0.001.bam ~/SARS_CoV_2/subsample/P3-VERO-P3-9-sort0.001.bam.bai
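The -s 0.001 argument keeps roughly 0.1% of the reads. One way to choose such a fraction is to work backwards from a target coverage: fraction = (coverage × genome size) / (read count × read length). The sketch below is only a rough guide. subsample_fraction is a hypothetical helper, the numbers in the example are illustrative rather than taken from this project, and the formula assumes every read in the BAM maps to the target genome.

```shell
# Hypothetical helper: fraction of reads to keep for a target coverage.
# Arguments: target coverage, genome size (bp), number of reads, read length (bp).
# Assumes all reads belong to the target genome, so the result is approximate.
subsample_fraction() {
    awk -v c="$1" -v g="$2" -v n="$3" -v l="$4" 'BEGIN {
        f = (c * g) / (n * l)
        if (f > 1) f = 1          # cannot keep more than all reads
        printf "%.6f\n", f
    }'
}

# Illustrative numbers: 100x of a 30 kb genome from 20 million 150 bp reads.
subsample_fraction 100 30000 20000000 150   # prints 0.001000
```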

Convert the BAM to FASTQ.

    $ samtools fastq ~/SARS_CoV_2/subsample/P3-VERO-P3-9-sort0.001.bam > ~/SARS_CoV_2/fastq/P3-VERO-P3-9_sort0.001.fq
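The --12 option expects each forward read to be followed directly by its mate. samtools fastq writes reads in BAM order, so with a coordinate-sorted BAM the mates are generally not adjacent; the samtools documentation recommends collating or name-sorting the input first (samtools collate or samtools sort -n). The sketch below checks a FASTQ file before assembly. is_interleaved is a hypothetical helper, and it assumes four-line FASTQ records whose names may carry a trailing /1 or /2.

```shell
# Hypothetical helper: succeed only if a FASTQ file is strictly interleaved,
# i.e. records 1+2, 3+4, ... share a read name once a trailing /1 or /2 is removed.
# Assumes four lines per record (no wrapped sequence lines).
is_interleaved() {
    awk '
        NR % 4 == 1 {
            name = $1                    # ignore any comment after whitespace
            sub(/\/[12]$/, "", name)     # drop the mate suffix
            n++
            if (n % 2 == 1) first = name
            else if (name != first) { bad = 1; exit }
        }
        # Fail on a name mismatch or on an odd number of records.
        END { exit ((bad || n % 2 == 1) ? 1 : 0) }
    ' "$1"
}
```

A typical call would be `is_interleaved reads.fq && echo "interleaved"`, where reads.fq stands for the FASTQ file produced in the step above.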

Assemble.

    $ spades.py --meta --12 ~/SARS_CoV_2/fastq/P3-VERO-P3-1-vero_L4_006.fq -o ~/SARS_CoV_2/genome/P3-VERO-P3-1-006_meta.fasta

Result:

[Figure 1: Assembly (SPAdes)]

The full contents of <output_dir> are listed below:

  • scaffolds.fasta – resulting scaffolds (recommended for use as resulting sequences)
  • contigs.fasta – resulting contigs
    In this project, the assembly results in **scaffolds.fasta** and **contigs.fasta** did not differ.
  • assembly_graph.fastg – assembly graph
  • contigs.paths – contigs paths in the assembly graph
  • scaffolds.paths – scaffolds paths in the assembly graph
  • before_rr.fasta – contigs before repeat resolution
  • corrected/ – files from read error correction
    • configs/ – configuration files for read error correction
    • corrected.yaml – internal configuration file
    • Output files with corrected reads
  • params.txt – information about SPAdes parameters in this run
  • spades.log – SPAdes log
  • dataset.info – internal configuration file
  • input_dataset.yaml – internal YAML data set file
  • K<##>/ – directory containing intermediate files from the run with K=<##>. These files should not be used as assembly results; use resulting contigs/scaffolds in files mentioned above.
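For a quick look at the result, the record headers in scaffolds.fasta or contigs.fasta can be tabulated. SPAdes names records in the form NODE_<id>_length_<bp>_cov_<coverage>; the sketch below relies on that naming, and contig_summary is a hypothetical helper, not a SPAdes command.

```shell
# Hypothetical helper: print "name<TAB>length<TAB>coverage" for each record,
# parsed from headers of the form >NODE_<id>_length_<bp>_cov_<coverage>.
contig_summary() {
    awk -F'_' '/^>NODE_/ {
        # $1 is ">NODE", $2 the id, $4 the length, $6 the coverage.
        printf "%s_%s\t%s\t%s\n", substr($1, 2), $2, $4, $6
    }' "$1"
}
```

It would be called as `contig_summary <output_dir>/scaffolds.fasta`, with <output_dir> being the directory passed to -o.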

More information:

To display the built-in help, run spades.py without arguments:

    $ conda activate covid
    $ spades.py

https://github.com/ablab/spades#sec1