Introduction

https://github.com/Nextomics/NextPolish2

conda install nextpolish -c bioconda

[General]
job_type = local
job_prefix = nextPolish
task = best
rewrite = yes
rerun = 3
parallel_jobs = 6
multithread_jobs = 5
genome = ./raw.genome.fasta #genome file
genome_size = auto
workdir = ./01_rundir
polish_options = -p {multithread_jobs}
[lgs_option]
lgs_fofn = ./lgs.fofn
lgs_options = -min_read_len 1k -max_depth 100
lgs_minimap2_options = -x map-ont
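
The `lgs_fofn` entry above points to a plain-text file that lists the long-read files, one path per line. Below is a minimal sketch of building that list, assuming the reads sit in a single directory and use common FASTQ/FASTA extensions. The helper name `make_fofn`, the example paths, and the config file name `run.cfg` are illustrative and not part of NextPolish.

```shell
# make_fofn DIR OUT
# Write the absolute path of every FASTA/FASTQ file (optionally gzipped)
# found directly in DIR to OUT, one per line. Fails if nothing was found.
make_fofn() {
    fofn_dir=$(cd "$1" && pwd) || return 1
    find "$fofn_dir" -maxdepth 1 -type f \
        \( -name '*.fq' -o -name '*.fq.gz' -o -name '*.fa' -o -name '*.fa.gz' \) \
        | sort > "$2"
    [ -s "$2" ]   # non-empty output means at least one read file was listed
}

# Example (hypothetical paths):
#   make_fofn ./long_reads ./lgs.fofn
#   nextPolish run.cfg
```

Absolute paths keep the list valid no matter which working directory the pipeline runs from.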

threads=20
genome=input.genome.fa # assembled genome
lgsreads=input.lgs.reads.fq.gz # third-generation long reads
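# map-pb is minimap2's PacBio CLR preset; for HiFi reads (-r hifi below), map-hifi (minimap2 >= 2.19) is the matching preset
minimap2 \
  -ax map-pb \
  -t ${threads} \
  ${genome} \
  ${lgsreads} | \
  samtools sort - -m 2g --threads ${threads} -o genome.lgs.bam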
samtools index genome.lgs.bam
ls `pwd`/genome.lgs.bam > pb.map.bam.fofn
python NextPolish/lib/nextpolish2.py \
  -g ${genome} \
  -l pb.map.bam.fofn \
  -r hifi \
  -p ${threads} \
  -a \
  -o genome.lgspolish.fa
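
The minimap2 preset should match the read type passed to `-r`. Below is a small sketch of that mapping. The function name `minimap2_preset` is hypothetical, and the presets are the ones the NextPolish tutorial uses as far as I can tell (`asm20` for HiFi), so check them against your minimap2 version.

```shell
# minimap2_preset READ_TYPE
# Print a minimap2 preset for a nextpolish2.py read type (clr, hifi, ont).
minimap2_preset() {
    case "$1" in
        clr)  echo "map-pb" ;;
        hifi) echo "asm20" ;;      # assumption: map-hifi is an alternative on minimap2 >= 2.19
        ont)  echo "map-ont" ;;
        *)    echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Example:
#   read_type=hifi
#   minimap2 -ax "$(minimap2_preset "$read_type")" -t 20 input.genome.fa input.lgs.reads.fq.gz
```

Driving both the aligner preset and `-r` from one `read_type` variable keeps the two from drifting apart.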

usage: nextpolish2.py [-h] -g FILE -l FILE -r {clr,hifi,ont} [-b FILE] [-i BLOCK_INDEX] [-o FILE] [-p INT] [-u] [-w STR] [-a] [-sp] [-id FLOAT] [-as FLOAT]
nextpolish2.py:
        correct structural & base errors in the genome with long reads using multi-processor.
exmples:
        nextpolish2.py -g genome.fa -l lgs.sort.bam.list -r ont -p 10
options:
  -h, --help            show this help message and exit
  -g FILE, --genome FILE
                        genome file, the reference of bam alignments. (default: None)
  -l FILE, --bam_list FILE
                        sorted bam file list of long reads, one file one line, require index file. (default: None)
  -r {clr,hifi,ont}, --read_type {clr,hifi,ont}
                        reads type, clr=PacBio continuous long read, hifi=PacBio highly accurate long reads, ont=NanoPore 1D reads (default: None)
  -b FILE, --block FILE
                        genome block file, each line includes [seq_id, index]. (default: None)
  -i BLOCK_INDEX, --block_index BLOCK_INDEX
                        index of seqs need to be corrected in genome block file. (default: all)
  -o FILE, --out FILE   output file, corrected seqs in output file will be skipped. (default: stdout)
  -p INT, --process INT
                        number of processes used for correcting. (default: 10)
  -u, --uppercase       output uppercase sequences. (default: False)
  -w STR, --window STR  size of window (>=5M) to split super-long contigs, shorter size requires less memory and more CPU time. (default: 5M)
  -a, --auto            automatically adjust window size (-w) and processes (-p). (default: True)
  -sp, --split          split the corrected contig with un-corrected regions. (default: True)
  -id FLOAT, --alignment_identity_ratio FLOAT
                        split the corrected contig if alignment_identity/median_alignment_identity < $identity_ratio, co-use with --split. (default: 0.8)
  -as FLOAT, --alignment_score_ratio FLOAT
                        split the corrected contig if alignment_score/max_alignment_score < $alignment_score_ratio, co-use with --split. (default: 0.8)
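
Per the help text, `-l` expects a list of sorted BAM files, one per line, each with an index file. Below is a minimal pre-flight check along those lines, assuming the index sits next to the BAM as `.bai` or `.csi` (the usual samtools naming). `check_bam_list` is a hypothetical helper, not part of NextPolish.

```shell
# check_bam_list FOFN
# Return 0 only if every non-empty line in FOFN names an existing file
# that has a .bai or .csi index next to it; report problems on stderr.
check_bam_list() {
    status=0
    while IFS= read -r bam || [ -n "$bam" ]; do
        [ -n "$bam" ] || continue            # skip blank lines
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam" >&2; status=1
        elif [ ! -f "$bam.bai" ] && [ ! -f "$bam.csi" ]; then
            echo "missing index for: $bam" >&2; status=1
        fi
    done < "$1"
    return "$status"
}

# Example:
#   check_bam_list pb.map.bam.fofn && echo "BAM list looks usable"
```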

/proj/nobackup/hpc2nstor2024-021/shwzhao/bin/miniconda3/share/nextpolish-1.4.1/lib/nextpolish2.py

References

GitHub: https://github.com/Nextomics/NextPolish
WeChat official account | 生信媛 | Polishing third-generation assemblies with NextPolish (v1.2.2)
