Pilon is a software tool which can be used to:
- Automatically improve draft assemblies.
- Find variation among strains, including large event detection.
Pilon uses read alignment analysis to identify inconsistencies between the input genome and the evidence in the reads. It then attempts to make improvements to the input genome, including:
- Single base differences
- Small indels
- Larger indel or block substitution events
- Gap filling
- Identification of local misassemblies, including optional opening of new gaps
Pipeline
Build index
bwa index -p P3-${i}_meta P3-${i}_meta.fasta
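Re-indexing the same assembly on every run is unnecessary. The sketch below assumes the five files that `bwa index -p PREFIX` writes (`PREFIX.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) and checks whether they all exist before indexing; the function name `bwa_index_exists` is hypothetical.

```bash
# Hypothetical helper: succeed (exit 0) only if every file that
# `bwa index -p PREFIX` produces exists and is non-empty.
bwa_index_exists() {
    local prefix=$1 ext
    for ext in amb ann bwt pac sa; do
        [[ -s "${prefix}.${ext}" ]] || return 1
    done
    return 0
}
```

Usage: `bwa_index_exists "P3-${i}_meta" || bwa index -p "P3-${i}_meta" "P3-${i}_meta.fasta"`.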
Align and sort
bwa mem -t 20 P3-${i}_meta \
    ~/SARS_CoV_2/clean_data/P3-VERO-P3-${i}-vero_L4_1.fq.gz \
    ~/SARS_CoV_2/clean_data/P3-VERO-P3-${i}-vero_L4_2.fq.gz \
    | samtools sort -@ 10 -O bam -o P3-${i}_meta.bam
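In `bwa mem ... | samtools sort ...`, bash reports only the exit status of the last command by default, so a failed `bwa mem` can go unnoticed while `samtools sort` still writes a (truncated) BAM. Enabling bash's `pipefail` option makes the whole pipeline fail when any stage fails. The demonstration below uses `false` and `cat` as stand-ins for the two tools; `pipeline_status` is a hypothetical name.

```bash
# Demonstration only: `false` stands in for a failing `bwa mem`, `cat` for
# `samtools sort`. Prints the exit status the pipeline reports.
pipeline_status() {
    local status=0
    (
        # Run in a subshell so the option change does not leak out.
        if [[ $1 == on ]]; then set -o pipefail; else set +o pipefail; fi
        false | cat
    ) || status=$?
    echo "$status"
}
```

`pipeline_status off` prints 0 and `pipeline_status on` prints 1. In a real script, put `set -o pipefail` (or `set -euo pipefail`) near the top, before the alignment step.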
Index the sorted BAM
The BAM files given to Pilon must be sorted in coordinate order and indexed.
samtools index -@ 10 P3-${i}_meta.bam
Mark duplicates
If your paired-end data come from a PCR-free library, you can skip this step.
sambamba markdup -t 20 --hash-table-size=1000000 --overflow-list-size=1000000 P3-${i}_meta.bam P3-${i}_markdup.bam
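By default `sambamba markdup` only marks duplicates by setting the 0x400 (1024) FLAG bit; it removes them only with `-r`/`--remove-duplicates`. The marked reads therefore stay in `P3-${i}_markdup.bam`, and the later `samtools view -q 30` step does not drop them. In practice `samtools view -c -f 1024 P3-${i}_markdup.bam` counts them; the loop below, with a hypothetical function name, only illustrates that flag test on SAM text and would be slow on a full BAM.

```bash
# Illustration: count SAM records (read on stdin) whose FLAG (column 2)
# has the duplicate bit 0x400 set. Header lines starting with @ are skipped.
count_flagged_duplicates() {
    local n=0 qname flag rest
    while IFS=$'\t' read -r qname flag rest; do
        [[ $qname == @* ]] && continue
        if (( flag & 0x400 )); then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Usage: `samtools view P3-${i}_markdup.bam | count_flagged_duplicates`.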
Options:
--hash-table-size      # default is 262144 reads
--overflow-list-size   # default is 200000 reads
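According to sambamba's `markdup` help text (worth re-checking against your installed version), the hash-table size is rounded down to the nearest power of two and should exceed (average coverage) × (insert size) for good performance. The arithmetic sketch below, with hypothetical function names, shows what `--hash-table-size=1000000` effectively becomes and what that rule of thumb gives.

```bash
# Hypothetical helpers for sizing sambamba markdup's --hash-table-size.

# Largest power of two that is <= $1 (assumes $1 >= 1).
floor_pow2() {
    local n=$1 p=1
    while (( p * 2 <= n )); do
        p=$((p * 2))
    done
    echo "$p"
}

# Rule-of-thumb lower bound: average coverage * insert size.
suggested_hash_table_size() {
    echo $(( $1 * $2 ))
}
```

`floor_pow2 1000000` prints 524288 (2^19). Using the coverage (31191) and mean insert size (211) that Pilon reports in the Result section, `suggested_hash_table_size 31191 211` prints 6581301. Pilon measures coverage on the filtered BAM, so treat this only as a rough guide for the unfiltered markdup input.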
If you leave these two options at their defaults, or set them too small, you may get the following error:
finding positions of the duplicate reads in the file...
sambamba-markdup: Cannot open file `/tmp/sambamba-pid213077-markdup-nzko/PairedEndsInfobcfa3' in mode `w+' (Too many open files)
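"Too many open files" means the process hit its limit on open file descriptors while sambamba was writing temporary files. Besides enlarging the two buffers, you can check that limit and, within the hard limit, raise the soft limit for the current shell before running `sambamba markdup`. A minimal sketch using bash's `ulimit` builtin; the function name and any target value you pass are hypothetical choices.

```bash
# Hypothetical helper: raise this shell's soft limit on open files toward a
# target without exceeding the hard limit, then print the resulting soft limit.
# Call it directly (not inside $(...)), or the change is lost with the subshell.
raise_nofile_limit() {
    local target=$1 hard soft
    hard=$(ulimit -Hn)
    soft=$(ulimit -Sn)
    # Cap the target at the hard limit, which an unprivileged user cannot raise.
    if [[ $hard != unlimited ]] && (( target > hard )); then
        target=$hard
    fi
    # Only raise the soft limit, never lower it.
    if [[ $soft != unlimited ]] && (( soft < target )); then
        ulimit -Sn "$target"
    fi
    ulimit -Sn
}
```

For example, run `raise_nofile_limit 4096` in the same shell just before `sambamba markdup ...`. If the hard limit itself is too low, it has to be raised by an administrator.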
Filter reads by mapping quality
Keep only alignments with a mapping quality (MAPQ) of at least 30, then index the filtered BAM.
samtools view -@ 10 -q 30 -b -h P3-${i}_markdup.bam > P3-${i}_filter.bam
samtools index -@ 10 P3-${i}_filter.bam
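`samtools view -q 30` keeps an alignment only if its MAPQ (the fifth SAM column) is at least 30, and the BAM header is carried over. The awk filter below only illustrates that rule on SAM text; it is not a replacement for samtools (it ignores BAM input, FLAG bits, and every other option). The function name is hypothetical.

```bash
# Illustration only: mimic the MAPQ rule of `samtools view -q MINQ` on SAM text.
# Header lines (starting with @) pass through; records need MAPQ ($5) >= MINQ.
filter_sam_by_mapq() {
    local minq=$1
    awk -v minq="$minq" 'BEGIN { FS = "\t" } /^@/ || $5 >= minq'
}
```

A record with MAPQ 30 is kept and one with MAPQ 29 is dropped.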
Polish
java -Xmx16G -jar ~/miniconda3/envs/covid/share/pilon-1.23-3/pilon-1.23.jar \
    --genome P3-${i}_meta.fasta \
    --frags P3-${i}_filter.bam \
    --changes --fix all,breaks \
    --output pilon_polished_${i} \
    --vcf &> pilon${i}.log
Options:
-Xmx${MEMORY}G              # Memory to allocate to the JVM. The amount required depends on the genome, the read data, and how many fixes Pilon needs to make.
-jar <pilon.jar>            # Path to the Pilon jar file.
INPUTS:
--genome <genome.fasta>     # The input genome we are trying to improve.
--frags <frags.bam>         # A BAM file of fragment paired-end alignments.
--jumps <jumps.bam>         # A BAM file of jump (mate-pair) paired-end alignments.
--unpaired <unpaired.bam>   # A BAM file of unpaired alignments.
--bam <any.bam>             # A BAM file of unknown type; Pilon will scan it and attempt to classify it as one of the above types.
CONTROL:
--fix <fixlist>             # A comma-separated list of categories of issues to try to fix:
    "snps": try to fix individual base errors;
    "indels": try to fix small indels;
    "gaps": try to fill gaps;
    "local": try to detect and fix local misassemblies;
    "all": all of the above (default);
    "bases": shorthand for "snps" and "indels" (for back compatibility);
    "none": none of the above; no new FASTA file will be written;
    "breaks" (experimental, not included in "all"): allow local reassembly to open new gaps (used with "local").
OUTPUTS:
--output <prefix>           # Prefix for output files.
--changes                   # If specified, a file listing changes in <prefix>.fasta will be generated.
--vcf                       # If specified, a VCF file will be generated.
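The steps above can be chained into one script per sample. The sketch below reuses the commands and file names from this pipeline and wraps each one in a hypothetical `run` function that only prints the command when `DRY_RUN=1`, so the sequence can be checked before anything runs. The sample IDs passed on the command line and the `PILON_JAR`/`READS` variables are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: run the pipeline for each sample ID given on the command line.
# Usage: ./polish.sh 9 10        (DRY_RUN=1 ./polish.sh 9 only prints the commands)
set -euo pipefail

# Placeholder locations; adjust to your own environment.
PILON_JAR=${PILON_JAR:-~/miniconda3/envs/covid/share/pilon-1.23-3/pilon-1.23.jar}
READS=${READS:-~/SARS_CoV_2/clean_data}

# Print the command (dry run) or execute it. eval is used so that the pipe and
# redirections inside each command string take effect; pass trusted strings only.
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "$1"
    else
        eval "$1"
    fi
}

process_sample() {
    local i=$1
    run "bwa index -p P3-${i}_meta P3-${i}_meta.fasta"
    run "bwa mem -t 20 P3-${i}_meta ${READS}/P3-VERO-P3-${i}-vero_L4_1.fq.gz ${READS}/P3-VERO-P3-${i}-vero_L4_2.fq.gz | samtools sort -@ 10 -O bam -o P3-${i}_meta.bam"
    run "samtools index -@ 10 P3-${i}_meta.bam"
    # For PCR-free libraries this step can be skipped; then filter P3-${i}_meta.bam instead.
    run "sambamba markdup -t 20 --hash-table-size=1000000 --overflow-list-size=1000000 P3-${i}_meta.bam P3-${i}_markdup.bam"
    run "samtools view -@ 10 -q 30 -b -h P3-${i}_markdup.bam > P3-${i}_filter.bam"
    run "samtools index -@ 10 P3-${i}_filter.bam"
    run "java -Xmx16G -jar ${PILON_JAR} --genome P3-${i}_meta.fasta --frags P3-${i}_filter.bam --changes --fix all,breaks --output pilon_polished_${i} --vcf &> pilon${i}.log"
}

for sample in "$@"; do
    process_sample "$sample"
done
```

Because of `set -euo pipefail`, the script stops at the first failing step for a sample instead of passing a broken BAM on to Pilon.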
Result

pilon.log:
Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Warning: experimental fix option breaks
Genome: P3-9_meta.fasta
Fixing breaks, indels, gaps, local, snps
Input genome size: 29924
Scanning BAMs
P3-9_filter.bam: 550450330 reads, 539413686 filtered, 11036644 mapped, 9518254 proper, 979354 stray, FR 93% 211+/-144, max 1143
Processing NODE_1_length_29924_cov_1434.380629:1-29924
frags P3-9_filter.bam: coverage 31191
Total Reads: 11036644, Coverage: 31191, minDepth: 3119
Confirmed 29839 of 29924 bases (99.72%)
Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
# Attempting to fix local continuity breaks
NODE_1_length_29924_cov_1434.380629:1-29924 log:
Finished processing NODE_1_length_29924_cov_1434.380629:1-29924
Writing NODE_1_length_29924_cov_1434.380629:1-29924 VCF to pilon_polished_9.vcf
Writing NODE_1_length_29924_cov_1434.380629:1-29924 changes to pilon_polished_9.changes
Writing updated NODE_1_length_29924_cov_1434.380629_pilon to pilon_polished_9.fasta
Mean frags coverage: 31191
Mean total coverage: 31191
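To pull the key figures out of a Pilon log like the one above, a short awk filter is enough. The sketch below, with a hypothetical function name, prints the confirmed-bases percentage and the number of corrected SNPs. It also checks that `minDepth` equals 10% of the coverage, assuming Pilon's default `--mindepth 0.1`; here 0.1 × 31191 = 3119.1, which matches the reported minDepth of 3119.

```bash
# Hypothetical helper: summarize a Pilon log written by `&> pilon${i}.log`.
# Prints one line per matched log line; multi-contig genomes give one set per contig.
summarize_pilon_log() {
    awk '
        /^Total Reads:/ {
            # Fields: "Total Reads: N, Coverage: C, minDepth: D"
            cov = $5; sub(/,/, "", cov); md = $7
            print "minDepth_is_10pct=" (md == int(cov * 0.1) ? "yes" : "no")
        }
        /^Confirmed [0-9]+ of [0-9]+ bases/ {
            pct = $NF; gsub(/[()]/, "", pct); print "confirmed=" pct
        }
        /^Corrected [0-9]+ snps/ { print "snps=" $2 }
    ' "$1"
}
```

Usage: `summarize_pilon_log pilon9.log`.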
Pilon confirmed 99.72% of the bases and corrected no SNPs or small indels, so it made no changes to the input assembly.
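One way to double-check that the polished FASTA is unchanged is to compare the sequences while ignoring header lines, since Pilon renames the contig with a `_pilon` suffix (as the log above shows). The hypothetical helper below does this with standard text tools; it concatenates all contigs, which is fine for a single-contig assembly like this one.

```bash
# Hypothetical helper: succeed if two FASTA files contain the same sequence
# once header lines are dropped and line wrapping is removed.
same_fasta_sequence() {
    local a b
    a=$(grep -v '^>' "$1" | tr -d '\n\r')
    b=$(grep -v '^>' "$2" | tr -d '\n\r')
    [[ $a == "$b" ]]
}
```

Usage: `same_fasta_sequence P3-9_meta.fasta pilon_polished_9.fasta && echo "no changes"`. With `--changes`, the `pilon_polished_9.changes` file should likewise be empty when nothing was corrected.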
More information
Pilon documentation
