Pilon is a software tool which can be used to:

  • Automatically improve draft assemblies.
  • Find variation among strains, including large event detection.

Pilon uses read alignment analysis to identify inconsistencies between the input genome and the evidence in the reads. It then attempts to make improvements to the input genome, including:

  • Single base differences
  • Small indels
  • Larger indel or block substitution events
  • Gap filling
  • Identification of local misassemblies, including optional opening of new gaps

    Pipeline

    Build index

    bwa index -p P3-${i}_meta P3-${i}_meta.fasta
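The commands in this pipeline use a shell variable `i` that is never defined on this page. A minimal sketch, assuming `i` is a sample number set by an enclosing loop; the sample numbers `7 8 9` and the helper name `sample_files` are hypothetical (only sample 9 appears in the log on this page):

```shell
# Hypothetical sample numbers; only 9 is confirmed by the log on this page.
samples="7 8 9"

# Print the file names that the pipeline derives from one sample number.
sample_files() {
    i="$1"
    echo "P3-${i}_meta.fasta P3-${i}_meta.bam P3-${i}_markdup.bam P3-${i}_filter.bam"
}

# Loop over the samples; the real pipeline commands would go in this loop body.
for i in $samples; do
    sample_files "$i"
done
```

Printing the names first is a cheap way to catch a typo in the naming pattern before starting a long alignment.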

    Align and sort

    bwa mem -t 20 P3-${i}_meta \
        ~/SARS_CoV_2/clean_data/P3-VERO-P3-${i}-vero_L4_1.fq.gz \
        ~/SARS_CoV_2/clean_data/P3-VERO-P3-${i}-vero_L4_2.fq.gz \
        | samtools sort -@ 10 -O bam -o P3-${i}_meta.bam

    Build index for output .bam files

    .bam files must be sorted in coordinate order and indexed.

    samtools index -@ 10 P3-${i}_meta.bam

Mark duplicates

If your paired-end data was sequenced from a PCR-free library, you can omit this step.

    sambamba markdup -t 20 --hash-table-size=1000000 --overflow-list-size=1000000 P3-${i}_meta.bam P3-${i}_markdup.bam

  • Options:

    --hash-table-size      # default is 262144 reads
    --overflow-list-size   # default is 200000 reads

    If you leave these two options unset, or set them too low for your data, sambamba may fail with a message like the following:

    finding positions of the duplicate reads in the file...
    sambamba-markdup: Cannot open file `/tmp/sambamba-pid213077-markdup-nzko/PairedEndsInfobcfa3' in mode `w+' (Too many open files)
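"Too many open files" means the process hit the operating system's per-process limit on open files while sambamba was writing temporary files. The check below is a general shell sketch, not a step from the original pipeline, and it only reports the limits of the current shell:

```shell
# Soft limit: the value currently enforced for processes started from this shell.
soft_limit=$(ulimit -n)

# Hard limit: the ceiling up to which the soft limit may be raised.
hard_limit=$(ulimit -H -n)

echo "open files: soft=${soft_limit} hard=${hard_limit}"
```

If the soft limit is low, `ulimit -n <value>` raises it for the current shell, up to the hard limit. Treat this as a complement to the two sambamba options above, not a replacement for them.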

    Keep only reads with a mapping quality (MAPQ) of at least 30

    samtools view -@ 10 -q 30 -b -h P3-${i}_markdup.bam > P3-${i}_filter.bam
    samtools index -@ 10 P3-${i}_filter.bam
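The `-q 30` option of `samtools view` skips alignments whose MAPQ, the fifth column of a SAM record, is below 30. The awk sketch below mimics that rule on three made-up records, truncated to their first six columns for illustration; it is not a substitute for samtools on real BAM files:

```shell
# Three made-up, tab-separated SAM records; column 5 is MAPQ.
sam_records=$(printf 'r1\t99\tctg\t100\t60\t50M\nr2\t99\tctg\t200\t12\t50M\nr3\t99\tctg\t300\t30\t50M\n')

# Keep records with MAPQ >= 30, the same threshold as `samtools view -q 30`.
kept=$(printf '%s\n' "$sam_records" | awk -F'\t' '$5 >= 30 { print $1 }')

echo "$kept"
```

Here `r2` (MAPQ 12) is dropped, while `r1` (MAPQ 60) and `r3` (MAPQ exactly 30) are kept, because the threshold is inclusive.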

    Polish

    java -Xmx16G -jar ~/miniconda3/envs/covid/share/pilon-1.23-3/pilon-1.23.jar \
        --genome P3-${i}_meta.fasta \
        --frags P3-${i}_filter.bam \
        --changes --fix all,breaks \
        --output pilon_polished_${i} \
        --vcf &> pilon${i}.log
  • Options:

    -Xmx${MEMORY}G
      # Memory to allocate to the JVM. The amount required depends on the genome, the read data, and how many fixes Pilon needs to make.
    -jar <pilon.jar>

    INPUTS:
    --genome <genome.fasta>
      # The input genome we are trying to improve.
    --frags <frags.bam>
      # A bam file consisting of fragment paired-end alignments.
    --jumps <jumps.bam>
      # A bam file consisting of jump (mate pair) paired-end alignments.
    --unpaired <unpaired.bam>
      # A bam file consisting of unpaired alignments.
    --bam <any.bam>
      # A bam file of unknown type; Pilon will scan it and attempt to classify it as one of the above bam types.

    CONTROL:
    --fix fixlist
      # A comma-separated list of categories of issues to try to fix:
      "snps": try to fix individual base errors;
      "indels": try to fix small indels;
      "gaps": try to fill gaps;
      "local": try to detect and fix local misassemblies;
      "all": all of the above (default);
      "bases": shorthand for "snps" and "indels" (for back compatibility);
      "none": none of the above; new fasta file will not be written.

    OUTPUTS:
    --output <prefix>
    --changes
      # If specified, a file listing changes in the <output>.fasta will be generated.
    --vcf
      # If specified, a vcf file will be generated.
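With `--changes` and `--vcf` set, the log further down this page shows Pilon writing `<prefix>.vcf`, `<prefix>.changes`, and `<prefix>.fasta`. A small sketch for confirming that all three exist after a run; the helper names `expected_outputs` and `missing_outputs` are hypothetical, and the list of extensions assumes exactly the options used in the command above:

```shell
# List the output files expected for one prefix, given --changes and --vcf.
expected_outputs() {
    prefix="$1"
    for ext in fasta changes vcf; do
        echo "${prefix}.${ext}"
    done
}

# Print every expected output file that is missing from the current directory.
missing_outputs() {
    for f in $(expected_outputs "$1"); do
        [ -e "$f" ] || echo "$f"
    done
}

# Prints nothing when all three files for sample 9 are present.
missing_outputs "pilon_polished_9"
```

An empty result means every expected file is present; any printed name points at an output that the run did not produce.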

    Result

    [Figure 1: Pilon]
    pilon9.log:

    Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
    Warning: experimental fix option breaks
    Genome: P3-9_meta.fasta
    Fixing breaks, indels, gaps, local, snps
    Input genome size: 29924
    Scanning BAMs
    P3-9_filter.bam: 550450330 reads, 539413686 filtered, 11036644 mapped, 9518254 proper, 979354 stray, FR 93% 211+/-144, max 1143
    Processing NODE_1_length_29924_cov_1434.380629:1-29924
    frags P3-9_filter.bam: coverage 31191
    Total Reads: 11036644, Coverage: 31191, minDepth: 3119
    Confirmed 29839 of 29924 bases (99.72%)
    Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
    # Attempting to fix local continuity breaks
    NODE_1_length_29924_cov_1434.380629:1-29924 log:
    Finished processing NODE_1_length_29924_cov_1434.380629:1-29924
    Writing NODE_1_length_29924_cov_1434.380629:1-29924 VCF to pilon_polished_9.vcf
    Writing NODE_1_length_29924_cov_1434.380629:1-29924 changes to pilon_polished_9.changes
    Writing updated NODE_1_length_29924_cov_1434.380629_pilon to pilon_polished_9.fasta
    Mean frags coverage: 31191
    Mean total coverage: 31191

    Pilon confirmed 29839 of 29924 bases (99.72%) and made no corrections to the input assembly.
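That conclusion can be read off the "Confirmed" and "Corrected" lines of the log. A minimal sketch that extracts the numbers with awk, using two lines copied from the log above; the field positions are an assumption based on this one log and may differ in other Pilon versions:

```shell
# Two lines copied from the log above.
log_text='Confirmed 29839 of 29924 bases (99.72%)
Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases'

# Pull the confirmed and total base counts out of the "Confirmed" line.
confirmed=$(printf '%s\n' "$log_text" | awk '/^Confirmed/ { print $2 }')
total=$(printf '%s\n' "$log_text" | awk '/^Confirmed/ { print $4 }')

# Pull the SNP count out of the "Corrected" line.
snps=$(printf '%s\n' "$log_text" | awk '/^Corrected/ { print $2 }')

echo "confirmed=${confirmed} total=${total} unconfirmed=$((total - confirmed)) snps=${snps}"
# prints: confirmed=29839 total=29924 unconfirmed=85 snps=0
```

To run this against a real log, the `printf` pipelines could be replaced by awk reading the log file directly.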

    More information

  • Pilon documentation

https://github.com/broadinstitute/pilon/wiki