Introduction

https://github.com/Nextomics/NextPolish

https://github.com/Nextomics/NextPolish2

Install from bioconda:

    conda install nextpolish -c bioconda

Example configuration file for polishing with long reads:

    [General]
    job_type = local
    job_prefix = nextPolish
    task = best
    rewrite = yes
    rerun = 3
    parallel_jobs = 6
    multithread_jobs = 5
    genome = ./raw.genome.fasta #genome file
    genome_size = auto
    workdir = ./01_rundir
    polish_options = -p {multithread_jobs}

    [lgs_option]
    lgs_fofn = ./lgs.fofn
    lgs_options = -min_read_len 1k -max_depth 100
    lgs_minimap2_options = -x map-ont
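
The `lgs_fofn` entry in the config points at a file of filenames that lists the long-read files, one path per line. A minimal sketch of building it, assuming the reads sit in a single directory. The function name `make_fofn` and the list of file suffixes are illustrative assumptions, not part of NextPolish:

```shell
#!/usr/bin/env bash
# make_fofn DIR OUT
# Hypothetical helper: write the absolute path of every read file in DIR
# to OUT, one path per line. The suffix list below is an assumption.
make_fofn() {
    local dir=$1 out=$2 abs f
    abs=$(cd "$dir" && pwd) || return 1
    : > "$out"    # start from an empty list
    for f in "$abs"/*.fq "$abs"/*.fq.gz "$abs"/*.fastq "$abs"/*.fastq.gz \
             "$abs"/*.fa "$abs"/*.fa.gz "$abs"/*.fasta "$abs"/*.fasta.gz; do
        # Unmatched globs stay literal, so keep only paths that really exist.
        if [ -f "$f" ]; then
            printf '%s\n' "$f" >> "$out"
        fi
    done
    # Return non-zero when no read file was found.
    [ -s "$out" ]
}
```

For example, `make_fofn ./reads ./lgs.fofn` (with `./reads` as a placeholder directory) would produce the file named in the config above. The pipeline is then normally started by passing the config file to the `nextPolish` command.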
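
Polishing with long reads only, calling nextpolish2.py directly:

    threads=20
    genome=input.genome.fa            # assembled genome
    lgsreads=input.lgs.reads.fq.gz    # third-generation long reads

    # Align the long reads to the assembly, then sort and index the alignments.
    # Note: map-pb is minimap2's PacBio CLR preset. Match the preset to the read
    # type: minimap2 >= 2.19 offers map-hifi for HiFi reads, and map-ont covers ONT.
    minimap2 \
        -ax map-pb \
        -t ${threads} \
        ${genome} \
        ${lgsreads} | \
        samtools sort - -m 2g --threads ${threads} -o genome.lgs.bam
    samtools index genome.lgs.bam

    # nextpolish2.py reads a list of sorted, indexed BAM files, one per line.
    ls `pwd`/genome.lgs.bam > pb.map.bam.fofn

    python NextPolish/lib/nextpolish2.py \
        -g ${genome} \
        -l pb.map.bam.fofn \
        -r hifi \
        -p ${threads} \
        -a \
        -o genome.lgspolish.fa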
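
The preset given to minimap2 and the `-r` value given to nextpolish2.py describe the same reads, so it can help to derive one from the other instead of setting them separately. A minimal sketch: `minimap2_preset` is a hypothetical helper, and the mapping assumes minimap2 2.19 or later, where the `map-hifi` preset is available:

```shell
#!/usr/bin/env bash
# minimap2_preset READ_TYPE
# Hypothetical helper: print the minimap2 -x/-ax preset for a
# nextpolish2.py read type (clr, hifi or ont).
minimap2_preset() {
    case "$1" in
        clr)  echo "map-pb" ;;      # PacBio continuous long reads
        hifi) echo "map-hifi" ;;    # PacBio HiFi reads (minimap2 >= 2.19)
        ont)  echo "map-ont" ;;     # Oxford Nanopore reads
        *)
            echo "unknown read type: $1 (expected clr, hifi or ont)" >&2
            return 1
            ;;
    esac
}
```

With a variable such as `read_type=hifi`, the alignment step could then use `-ax "$(minimap2_preset "$read_type")"` and the polishing step `-r "$read_type"`, which keeps the two settings in agreement.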

Help output of nextpolish2.py:

    usage: nextpolish2.py [-h] -g FILE -l FILE -r {clr,hifi,ont} [-b FILE] [-i BLOCK_INDEX] [-o FILE] [-p INT] [-u] [-w STR] [-a] [-sp] [-id FLOAT] [-as FLOAT]

    nextpolish2.py:
        correct structural & base errors in the genome with long reads using multi-processor.

    exmples:
        nextpolish2.py -g genome.fa -l lgs.sort.bam.list -r ont -p 10

    options:
      -h, --help            show this help message and exit
      -g FILE, --genome FILE
                            genome file, the reference of bam alignments. (default: None)
      -l FILE, --bam_list FILE
                            sorted bam file list of long reads, one file one line, require index file. (default: None)
      -r {clr,hifi,ont}, --read_type {clr,hifi,ont}
                            reads type, clr=PacBio continuous long read, hifi=PacBio highly accurate long reads, ont=NanoPore 1D reads (default: None)
      -b FILE, --block FILE
                            genome block file, each line includes [seq_id, index]. (default: None)
      -i BLOCK_INDEX, --block_index BLOCK_INDEX
                            index of seqs need to be corrected in genome block file. (default: all)
      -o FILE, --out FILE   output file, corrected seqs in output file will be skipped. (default: stdout)
      -p INT, --process INT
                            number of processes used for correcting. (default: 10)
      -u, --uppercase       output uppercase sequences. (default: False)
      -w STR, --window STR  size of window (>=5M) to split super-long contigs, shorter size requires less memory and more CPU time. (default: 5M)
      -a, --auto            automatically adjust window size (-w) and processes (-p). (default: True)
      -sp, --split          split the corrected contig with un-corrected regions. (default: True)
      -id FLOAT, --alignment_identity_ratio FLOAT
                            split the corrected contig if alignment_identity/median_alignment_identity < $identity_ratio, co-use with --split. (default: 0.8)
      -as FLOAT, --alignment_score_ratio FLOAT
                            split the corrected contig if alignment_score/max_alignment_score < $alignment_score_ratio, co-use with --split. (default: 0.8)
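
The help text says `-l` expects a list of sorted BAM files, one per line, each with an index. A minimal pre-flight sketch that checks the list before a long run starts. `check_bam_list` is a hypothetical helper; it assumes the index sits next to the BAM as `<file>.bai` or `<file>.csi`, the names `samtools index` writes by default, and it does not verify that the BAM is actually sorted:

```shell
#!/usr/bin/env bash
# check_bam_list LIST
# Hypothetical helper: verify that every BAM named in LIST exists and has
# an index file beside it. Returns non-zero if anything is missing.
check_bam_list() {
    local list=$1 status=0 bam
    if [ ! -s "$list" ]; then
        echo "empty or missing list: $list" >&2
        return 1
    fi
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r bam || [ -n "$bam" ]; do
        if [ -z "$bam" ]; then
            continue    # skip blank lines
        fi
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam" >&2
            status=1
        elif [ ! -f "$bam.bai" ] && [ ! -f "$bam.csi" ]; then
            echo "missing index for: $bam" >&2
            status=1
        fi
    done < "$list"
    return "$status"
}
```

A call such as `check_bam_list pb.map.bam.fofn` placed before the nextpolish2.py command would stop early on a missing index rather than partway through polishing.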

Location of nextpolish2.py in a conda installation:

    /proj/nobackup/hpc2nstor2024-021/shwzhao/bin/miniconda3/share/nextpolish-1.4.1/lib/nextpolish2.py
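
With a conda installation, the script sits under the environment's `share/nextpolish-<version>/lib/` directory, as the path above shows. A minimal sketch for locating it without hard-coding the version. `find_nextpolish2` is a hypothetical helper, and it assumes either that a prefix is passed explicitly or that the conda environment is active so `CONDA_PREFIX` is set:

```shell
#!/usr/bin/env bash
# find_nextpolish2 [PREFIX]
# Hypothetical helper: print the path of nextpolish2.py below PREFIX/share.
# PREFIX defaults to $CONDA_PREFIX. If several versions are installed, the
# lexicographically last path is printed, which may not be the newest one.
find_nextpolish2() {
    local prefix=${1:-${CONDA_PREFIX:-}} script
    if [ -z "$prefix" ]; then
        echo "no prefix given and CONDA_PREFIX is unset" >&2
        return 1
    fi
    script=$(find "$prefix/share" -type f -name nextpolish2.py \
                 -path '*/lib/*' 2>/dev/null | sort | tail -n 1 || true)
    if [ -z "$script" ]; then
        echo "nextpolish2.py not found under $prefix/share" >&2
        return 1
    fi
    printf '%s\n' "$script"
}
```

The result can be used in place of a fixed path, for example `python "$(find_nextpolish2)" -h`.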

References

GitHub: https://github.com/Nextomics/NextPolish
WeChat official account | 生信媛 | Polishing third-generation assemblies with NextPolish (v1.2.2)