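Introduction

LTR_retriever (https://github.com/oushujun/LTR_retriever) identifies LTR retrotransposons (LTR-RTs) from the outputs of LTRharvest and LTR_FINDER and generates a non-redundant LTR-RT library for genome annotation.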


Download

Usage

  LTR_retriever -h
  ##
  ## ##########################
  ## ### LTR_retriever v2.9.0 ###
  ## ##########################
  ##
  ## A program for accurate identification of LTR-RTs from outputs of LTRharvest and
  ## LTR_FINDER, generates non-redundant LTR-RT library for genome annotations.
  ##
  ## Shujun Ou (shujun.ou.1@gmail.com) 03/26/2019
  ##
  ## Usage: LTR_retriever -genome genomefile -inharvest LTRharvest_input [options]

Input options

  ## 【Input Options】
  ## -genome [File] Specify the genome sequence file (FASTA)
  ## -inharvest [File] LTR-RT candidates from LTRharvest
  ## -infinder [File] LTR-RT candidates from LTR_FINDER
  ## -inmgescan [File] LTR-RT candidates from MGEScan_LTR
  ## -nonTGCA [File] Non-canonical LTR-RT candidates from LTRharvest

Output options

  ## 【Output options】
  ## -verbose/-v Retain intermediate outputs (developer mode)
  ## -noanno Disable whole genome LTR-RT annotation (no GFF3 output)

Filter options

  ## 【Filter options】
  ## -misschar [CHR] Specify the ambiguous character (default N)
  ## -Nscreen Disable filtering ambiguous sequence in candidates
  ## -missmax [INT] Maximum number of ambiguous bp allowed in a candidate (default 10)
  ## -missrate [0-1] Maximum percentage of ambiguous bp allowed in a candidate (default 0.8)
  ## -minlen [INT] Minimum bp of the LTR region (default 100)
  ## -max_ratio [FLOAT] Maximum length ratio of internal region/LTR region (default 50)
  ## -minscore [INT] Minimum alignment length (INT/2) to identify tandem repeats (default 1000)
  ## -flankmiss [1-60] Maximum ambiguous length (bp) allowed in 60bp-flanking sequences (default 25)
  ## -flanksim [0-100] Minimum percentage of identity for flanking sequence alignment (default 60)
  ## -flankaln [0-1] Maximum alignment portion allowed for 60bp-flanking sequences (default 0.6)
  ## -motif [[STRING]] Specify non-canonical motifs to search for
  ##     (default -motif [TCCA TGCT TACA TACT TGGA TATA TGTA TGCA])
  ## -notrunc Discard truncated LTR-RTs and nested LTR-RTs (will dampen sensitivity)
  ## -procovTE [0-1] Maximum portion of allowed for cumulated DNA TE database and LINE database
  ##     alignments (default 0.7)
  ## -procovPL [0-1] Maximum portion allowed for cumulated plant protein database alignments (default 0.7)
  ## -prolensig [INT] Minimum alignment length (aa) for LINE/DNA transposase/plant protein alignment (default 30)
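
To show how several filter flags sit together on one command line, the sketch below assembles an LTR_retriever call and prints it instead of running it. The flag names come from the help text above. The function name `build_retriever_cmd`, the environment variable names, and the file names are hypothetical, and the fallback values simply restate the documented defaults.

```shell
# Hypothetical helper: print (do not run) an LTR_retriever command line
# with explicit filter settings. Each value can be overridden through an
# environment variable; otherwise the default from the help text is used.
build_retriever_cmd() {
    # $1 = genome FASTA, $2 = candidate file (both names are placeholders)
    echo "LTR_retriever -genome $1 -inharvest $2" \
        "-missmax ${MISSMAX:-10}" \
        "-missrate ${MISSRATE:-0.8}" \
        "-minlen ${MINLEN:-100}" \
        "-max_ratio ${MAX_RATIO:-50}" \
        "-flanksim ${FLANKSIM:-60}"
}

# Print the command with default filter values.
build_retriever_cmd genome.fa genome.fa.rawLTR.scn
```

Printing the command first makes it easy to review the chosen thresholds before committing to a long run.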

Library options

  ## 【Library options】
  ## -blastclust [[STRING]] Trigger to use blastclust and customize parameters
  ##     (default -blastclust [-L .9 -b T -S 80])
  ## -cdhit [[STRING]] Trigger to use cd-hit-est (default) and customize parameters
  ##     (default -cdhit [-c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0])
  ## -linelib [FASTA] Provide LINE transposase database for LINE TE exclusion
  ##     (default /database/Tpases020812LINE)
  ## -dnalib [FASTA] Provide DNA TE transposase database for DNA TE exclusion
  ##     (default /database/Tpases020812DNA)
  ## -plantprolib [FASTA] Provide plant protein database for coding sequence exclusion
  ##     (default /database/alluniRefprexp082813)
  ## -TEhmm [Pfam] Provide Pfam database for TE identification
  ##     (default /database/TEfam.hmm)

Dependencies

  ## 【Dependencies】
  ## -repeatmasker [path] Path to the RepeatMasker program. (default: find from ENV)
  ## -blastplus [path] Path to the BLAST+ program. (default: find from ENV)
  ## -blast [path] Path to the BLAST program. Required if -blastclust is used. (default: find from ENV)
  ## -cdhit_path [path] Path to the CD-HIT program. Required if -cdhit is used. (default: find from ENV)
  ## -hmmer [path] Path to the HMMER program. (default: find from ENV)
  ## -trf_path [path] Path to the trf program. (default: find from ENV)

These options set the path to each required program. By default, each program is located through the environment variables.
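
Because the defaults rely on the environment, it can help to confirm that the programs are actually reachable before starting. The sketch below is a minimal check using the POSIX `command -v` builtin. The function name `check_deps` is hypothetical, and the executable names in the usage note are assumptions about a typical installation, not something LTR_retriever prescribes.

```shell
# Hypothetical helper: report which of the given executables cannot be
# found on PATH. Returns 0 if all are present, 1 otherwise.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A call might look like `check_deps RepeatMasker blastn makeblastdb cd-hit-est hmmsearch trf`. Any program reported as missing can then be supplied through the matching path option from the list above.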

Miscellaneous

  ## 【Miscellaneous】
  ## -u [FLOAT] Neutral mutation rate (per bp per ya) (default 1.3e-8 (from rice))
  ## -step [STRING] Restart the program from a particular step. Existing outputs will be overwritten. Options:
  ##     Init (default, from the beginning);
  ##     Major (Tandem repeat cleanup finished, structural analyses next)
  ##     Trunc (Structural analyses finished, truncated LTR recycle next)
  ##     Promask (Truncated LTR recycle finished, protein contamination cleanup next)
  ##     Library (Protein contamination cleanup finished, initial library construction next)
  ##     Next (Initial library construction finished, non-TGCA analyses next)
  ## -threads [INT] Number of threads (≤ total available threads, default 4)
  ## -help/-h Display this help information
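
The neutral mutation rate given with -u matters because it scales the insertion-time estimate of each LTR-RT. The sketch below illustrates the commonly used relation T = K / (2μ), with K taken as the Jukes-Cantor corrected divergence between the two LTRs. This is an illustration under that assumption, not code taken from LTR_retriever, and the function name `ltr_age` is hypothetical.

```shell
# Hypothetical helper: estimate LTR-RT insertion time in years.
#   $1 = identity between the two LTRs as a fraction (e.g. 0.99)
#   $2 = neutral mutation rate per bp per year (default 1.3e-8, as for -u)
ltr_age() {
    awk -v id="$1" -v mu="${2:-1.3e-8}" 'BEGIN {
        d = 1 - id                              # observed divergence
        K = 0.75 * log(1 / (1 - 4 * d / 3))     # Jukes-Cantor: -3/4 * ln(1 - 4d/3)
        printf "%.0f\n", K / (2 * mu)           # T = K / (2 * mu)
    }'
}

# Two LTRs that are 99% identical, using the default rice rate.
ltr_age 0.99
```

With the default rate, 99% identity corresponds to an age on the order of a few hundred thousand years. Doubling the rate halves the estimate, so the value of -u should suit the species being analysed.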

Running

  convert_ltrdetector.pl
  convert_ltr_struc.pl
  convert_MGEScan3.0.pl

  # Build the suffix array index for LTRharvest
  gt suffixerator \
    -db genome.fa \
    -indexname genome.fa \
    -tis -suf -lcp -des -ssp -sds -dna

  # Find LTR-RT candidates with LTRharvest
  gt ltrharvest \
    -index genome.fa \
    -minlenltr 100 \
    -maxlenltr 7000 \
    -mintsd 4 \
    -maxtsd 6 \
    -motif TGCA \
    -motifmis 1 \
    -similar 85 \
    -vic 10 \
    -seed 20 \
    -seqids yes > genome.fa.harvest.scn

  # Find LTR-RT candidates with LTR_FINDER_parallel (LTRharvest-style output)
  LTR_FINDER_parallel \
    -seq genome.fa \
    -threads 10 \
    -harvest_out \
    -size 1000000 \
    -time 300

  # Combine the candidates from both programs
  cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
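
Before passing the combined file on, a quick sanity check is to count how many candidates each file contains. The sketch below assumes that header and comment lines in the `.scn` files begin with `#` and that every other non-empty line is one candidate. The function name `count_candidates` is hypothetical.

```shell
# Hypothetical helper: count candidate lines in an LTRharvest-style .scn file.
# Assumes comment lines start with '#' and blank lines carry no data.
count_candidates() {
    grep -v '^#' "$1" | grep -c '[^[:space:]]'
}
```

Running `count_candidates genome.fa.rawLTR.scn` should give the sum of the counts for `genome.fa.harvest.scn` and `genome.fa.finder.combine.scn`. A count of zero indicates that one of the upstream steps produced no candidates.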

  # Run LTR_retriever on the combined candidates
  LTR_retriever \
    -genome genome.fa \
    -inharvest genome.fa.rawLTR.scn \
    -threads 10 [options]
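
The LAI command in this section reads `genome.fa.pass.list` and `genome.fa.out`, so it is worth confirming that both files exist and are non-empty before starting it. The sketch below assumes these files are written by the preceding LTR_retriever run. The function name `require_nonempty` is hypothetical.

```shell
# Hypothetical helper: succeed only if every given file exists and is non-empty.
require_nonempty() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `require_nonempty genome.fa.pass.list genome.fa.out` can be chained with `&&` in front of the LAI command so that LAI starts only when both inputs are present.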

  # Calculate the LTR Assembly Index (LAI) from the LTR_retriever outputs
  LAI \
    -genome genome.fa \
    -intact genome.fa.pass.list \
    -all genome.fa.out [options]

References

  1. GitHub repository: https://github.com/oushujun/LTR_retriever
  2. xuzhougeng | LTR_retriever: a more accurate tool for integrated LTR analysis
  3. Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018 Feb;176(2):1410-1422. doi: 10.1104/pp.17.01310