9. Pilon, part 1: Improving the assembly - 《The English version of COVID-19 Genome Assembly》

1. Download and install
2. Prepare the data
3. Alignment
4. Experimental design
5. Interpretation of results:
- FASTA files after different polishing strategies

1. Download and install

wget https://github.com/broadinstitute/pilon/releases/download/v1.23/pilon-1.23.jar
java -Xmx10G -jar pilon-1.23.jar
java -Xmx10G -jar pilon-1.22.jar

2. Prepare the data

1. Genome file: Scaffold.fa, assembled from third-generation sequencing data.
2. Sorted BAM files, built by aligning reads against the assembled FASTA as the index. There are four kinds of BAM files:

Reads	BAM file
Subsampled paired-end reads (subfq1, subfq2), aligned and sorted	alignsort1.bam
Reads corrected by SPAdes and used for its assembly	meta-cor1.bam
Unpaired reads from the SPAdes run	unpaired1.bam
BAM with PCR duplicates marked (markdup)	cor-markdup1.bam

3. Alignment

Build the index

bwa index -p meta1 meta1.fasta
bwa index -p meta1-cor meta1-cor.fasta

Align and sort

bwa mem -t 16 index subv1-1.fq subv1-2.fq | samtools sort -@ 10 -O bam -o alignsort1.bam -
bwa mem -t 16 index meta1-cor1.fq meta1-cor2.fq | samtools sort -@ 10 -O bam -o meta-cor1.bam -
(Here index stands for the -p prefix chosen above, meta1 or meta1-cor.)

Index the BAM files and mark PCR duplicates

samtools index -@ 10 alignsort1.bam

sambamba markdup -t 10 align.bam align_markdup.bam
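The indexing, alignment and duplicate-marking steps above can be collected into one small script. This is a sketch, not the author's pipeline: it is a dry run by default (it prints each command instead of running it), and the DRY_RUN and THREADS variables, the helper names and the file unpaired.fq are assumptions for illustration. Which draft each BAM is aligned to follows the strategies in section 4; adjust the prefixes if your setup differs.

```shell
#!/bin/sh
# Sketch only: index, align, sort and duplicate-mark the BAMs from section 2.
# DRY_RUN=1 (the default here, a hypothetical switch) just prints each command;
# DRY_RUN=0 runs it and then needs bwa, samtools, sambamba and bash on PATH.
set -eu

DRY_RUN="${DRY_RUN:-1}"
THREADS="${THREADS:-10}"

# Print or execute one command line; pipes are allowed, so execution goes
# through bash with pipefail so a failing bwa is not hidden by samtools.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

# build_index FASTA PREFIX
build_index() {
  run "bwa index -p $2 $1"
}

# map_sort INDEX_PREFIX OUT.bam READS_1 [READS_2]
# Two FASTQ files give a paired-end alignment; one gives single-end (unpaired reads).
map_sort() {
  local index=$1 out=$2
  shift 2
  run "bwa mem -t $THREADS $index $* | samtools sort -@ $THREADS -O bam -o $out -"
  run "samtools index -@ $THREADS $out"
}

# mark_dup IN.bam OUT.bam: flag (not remove) PCR duplicates, then index the result.
mark_dup() {
  run "sambamba markdup -t $THREADS $1 $2"
  run "samtools index -@ $THREADS $2"
}

# Example calls (unpaired.fq is a hypothetical file name):
#   build_index meta1.fasta meta1
#   map_sort meta1 alignsort1.bam subv1-1.fq subv1-2.fq
#   map_sort meta1-cor meta-cor1.bam meta1-cor1.fq meta1-cor2.fq
#   map_sort meta1 unpaired1.bam unpaired.fq
#   mark_dup meta-cor1.bam cor-markdup1.bam
```

sambamba markdup only flags duplicates unless -r is given, which is what Pilon's --duplicates option later refers to.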

Key parameters for improving the assembly: use --fix (not --vcf), and note the --unpaired parameter.

java -Xmx${x}G -jar pilon-1.23.jar --genome meta1.fasta --frags align_markdup1.bam --fix snps,indels --output pilonpolished1 --outdir pilon &> pilon2.log

Usage of the key parameters

-Xmx${x}G: set the Java heap size according to the genome size.
--bam: a BAM of unknown type; Pilon scans it and classifies it automatically.
--frags: paired-end alignments, e.g. of the subsampled reads (subfq) or of the SPAdes-corrected reads.
--duplicates: also use reads marked as duplicates (Pilon ignores them by default); relevant when the input is the markdup BAM.
--unpaired: alignments of unpaired reads, e.g. the unpaired reads in the SPAdes-corrected output.
--fix: which issues to fix: snps (single-base errors), indels (small indels), amb (replace ambiguous bases in the FASTA output), breaks (allow local reassembly to open new gaps).
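As a sketch of how these flags combine, the function below only prints a Pilon 1.23 command line built from the flags listed above. build_pilon_cmd and PILON_MEM are hypothetical names; the defaults (10G heap, --fix snps,indels, --outdir pilon) are copied from the commands in this section.

```shell
#!/bin/sh
set -eu

# build_pilon_cmd GENOME OUT_PREFIX FRAGS_BAM [UNPAIRED_BAM]
# Prints (does not run) a Pilon command; PILON_MEM is a hypothetical -Xmx override.
build_pilon_cmd() {
  local genome=$1 prefix=$2 frags=$3 unpaired=${4:-}
  local cmd="java -Xmx${PILON_MEM:-10G} -jar pilon-1.23.jar --genome $genome --frags $frags"
  if [ -n "$unpaired" ]; then
    # Single-end alignments, e.g. the SPAdes unpaired reads.
    cmd="$cmd --unpaired $unpaired"
  fi
  # Fix base errors and small indels only; no --vcf, so no VCF is written.
  cmd="$cmd --fix snps,indels --output $prefix --outdir pilon"
  printf '%s\n' "$cmd"
}
```

Running the printed line (for example with eval) needs Java and pilon-1.23.jar in the working directory; Pilon writes the polished sequence to pilon/<prefix>.fasta.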

4. Experimental design

Strategy 1:

Align the subsampled reads (subfq) to the index built from meta1.fasta, the assembly from the first SPAdes run, and use the resulting BAM (alignsort1.bam).

Strategy 2:

Align the corrected reads to the FASTA from the second SPAdes run and use the resulting BAM (meta-cor1.bam).

Strategy 3:

--frags: the BAM with PCR duplicates marked (markdup BAM); --unpaired: the BAM built from the unpaired reads.

Strategy 4:

--frags: the markdup BAM from strategy 3; no unpaired BAM is supplied.

Strategy 5:

--frags: the BAM from strategy 2; --unpaired: the BAM built from the unpaired reads.
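The five strategies differ only in which BAM files are passed as --frags and --unpaired. A minimal sketch that records this as data, with file names taken from the BAM table in section 2; the markdup BAM is assumed to be cor-markdup1.bam (the Pilon command in section 3 calls it align_markdup1.bam). The --genome argument is left out because it differs between strategies (first or second SPAdes assembly).

```shell
#!/bin/sh
set -eu

# strategy_args N: print the Pilon read inputs for strategy N (1-5).
strategy_args() {
  case $1 in
    1) echo "--frags alignsort1.bam" ;;                          # subsampled reads, first SPAdes draft
    2) echo "--frags meta-cor1.bam" ;;                           # corrected reads, second SPAdes draft
    3) echo "--frags cor-markdup1.bam --unpaired unpaired1.bam" ;;
    4) echo "--frags cor-markdup1.bam" ;;                        # strategy 3 without --unpaired
    5) echo "--frags meta-cor1.bam --unpaired unpaired1.bam" ;;  # strategy 2 plus --unpaired
    *) echo "unknown strategy: $1" >&2; return 1 ;;
  esac
}

for s in 1 2 3 4 5; do
  printf 'strategy %s: %s\n' "$s" "$(strategy_args "$s")"
done
```

Each printed fragment takes the place of the --frags argument in the Pilon command shown earlier.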
QUAST is then used to evaluate the polished assemblies from all strategies.
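QUAST can evaluate all polished FASTA files in one run. The sketch below only prints such a command; the output prefixes pilon_s1 to pilon_s5, the reference path and the thread count are placeholders, not values from the original run. A reference genome (-r) is needed for QUAST to report genome fraction and the per-100-kbp mismatch rates.

```shell
#!/bin/sh
set -eu

# quast_cmd REF: print a QUAST command comparing the unpolished draft with the
# five Pilon outputs (Pilon writes <outdir>/<--output prefix>.fasta; the
# prefixes pilon_s1..pilon_s5 are hypothetical).
quast_cmd() {
  local labels=unpolished files=meta1.fasta s
  for s in 1 2 3 4 5; do
    labels="$labels,strategy$s"
    files="$files pilon/pilon_s$s.fasta"
  done
  # -l gives human-readable column labels in the QUAST report.
  printf 'quast.py -t 8 -r %s -l %s -o quast_polish %s\n' "$1" "$labels" "$files"
}

quast_cmd ref.fasta
```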

5. Interpretation of results:

FASTA files after different polishing strategies

The comparison shows the following:
1. If SPAdes is run a second time (re-assembling only the corrected reads), genome coverage decreases and the number of mismatches per 100 kbp increases.
2. This loss is not recovered by additionally supplying the unpaired-read BAM from the first SPAdes pass alongside the corrected reads (compared with the FASTA polished with the first-pass unpaired reads).
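To pull out only the metrics discussed above from a QUAST run, a small awk filter over report.tsv is enough. This sketch assumes QUAST's tab-separated report layout (metric names in the first column, one column per assembly, an "Assembly" header row); quast_metrics is a hypothetical helper name.

```shell
#!/bin/sh
set -eu

# quast_metrics REPORT_TSV: keep the header row, "Genome fraction (%)" and every
# "... per 100 kbp" row (mismatches, indels, N's) from a QUAST report.tsv.
quast_metrics() {
  awk -F '\t' 'NR == 1 || $1 == "Genome fraction (%)" || $1 ~ /per 100 kbp$/' "$1"
}

# Usage: quast_metrics quast_polish/report.tsv
```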

Conclusion: a single SPAdes assembly performs better than re-assembling the corrected reads.
Question: can the BAM of unpaired reads from the first SPAdes run improve the FASTA produced by that first assembly?

The figure shows the Pilon run report: 1193 reads from the unpaired BAM are used during polishing, so the unpaired BAM appears to be useful, and this is explored further.
Comparison, from left to right: 1. unpolished; 2. polished with --frags only; 3. polished with --frags and --unpaired; 4. polished with corrected.bam and unpaired.bam.
Conclusion: the FASTA assembled without a second SPAdes run is the best. Using the first SPAdes assembly and polishing it with the corrected reads plus the unpaired reads gives better results than the second assembly.