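Introduction
LTR_retriever (https://github.com/oushujun/LTR_retriever) identifies LTR retrotransposons (LTR-RTs) accurately from the candidates reported by LTRharvest and LTR_FINDER. It then generates a non-redundant LTR-RT library for genome annotation.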

Download
Usage

    LTR_retriever -h
    ##
    ## ############################
    ## ### LTR_retriever v2.9.0 ###
    ## ############################
    ##
    ## A program for accurate identification of LTR-RTs from outputs of LTRharvest and
    ## LTR_FINDER, generates non-redundant LTR-RT library for genome annotations.
    ##
    ## Shujun Ou (shujun.ou.1@gmail.com) 03/26/2019
    ##
    ## Usage: LTR_retriever -genome genomefile -inharvest LTRharvest_input [options]

Input options

    ## 【Input Options】
    ## -genome [File] Specify the genome sequence file (FASTA)
    ## -inharvest [File] LTR-RT candidates from LTRharvest
    ## -infinder [File] LTR-RT candidates from LTR_FINDER
    ## -inmgescan [File] LTR-RT candidates from MGEScan_LTR
    ## -nonTGCA [File] Non-canonical LTR-RT candidates from LTRharvest

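- `-genome`: the genome sequence file (FASTA).
- `-inharvest`: LTR-RT candidates from LTRharvest. The commands at the end of this post merge the LTRharvest and LTR_FINDER candidates into one file and pass it here.
- `-infinder`: LTR-RT candidates from LTR_FINDER.
- `-inmgescan`: LTR-RT candidates from MGEScan_LTR.
- `-nonTGCA`: non-canonical (non-TGCA motif) LTR-RT candidates from LTRharvest.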
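
Checking the inputs before a long run can save time. The function below is a hypothetical pre-flight check, not part of LTR_retriever. It confirms only two things: the `-genome` file starts with a FASTA header (`>`), and every candidate file you plan to pass (`-inharvest`, `-infinder`, `-inmgescan`, `-nonTGCA`) exists and is non-empty. It does not validate the candidate file format.

```bash
# Hypothetical pre-flight check (not part of LTR_retriever).
# Usage: check_inputs GENOME [CANDIDATE_FILE...]
check_inputs() {
  local genome=$1; shift
  # The genome must be non-empty and its first byte must be '>' (FASTA header)
  if [[ ! -s $genome ]] || [[ $(head -c 1 "$genome") != ">" ]]; then
    echo "not a FASTA file: $genome" >&2
    return 1
  fi
  local f
  for f in "$@"; do
    # Each candidate file must exist and be non-empty
    if [[ ! -s $f ]]; then
      echo "missing or empty candidate file: $f" >&2
      return 1
    fi
  done
  echo "ok"
}
```

For example: `check_inputs genome.fa genome.fa.rawLTR.scn && LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10`.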
Output options

    ## 【Output options】
    ## -verbose/-v Retain intermediate outputs (developer mode)
    ## -noanno Disable whole genome LTR-RT annotation (no GFF3 output)

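- `-verbose`/`-v`: keep intermediate output files (developer mode).
- `-noanno`: skip whole-genome LTR-RT annotation, so no GFF3 annotation is written.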
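
`-noanno` matters if you plan to compute LAI later, because the LAI command at the end of this post reads both genome.fa.pass.list and genome.fa.out. The helper below is a hypothetical sketch that checks both files exist for a given prefix. It assumes that a run with `-noanno` does not produce genome.fa.out.

```bash
# Hypothetical helper (not part of LTR_retriever): check that the two files the
# LAI command reads, PREFIX.pass.list and PREFIX.out, are present and non-empty.
# ASSUMPTION: PREFIX.out is only produced when -noanno was NOT used.
check_lai_inputs() {
  local prefix=$1 missing=0 f
  for f in "$prefix.pass.list" "$prefix.out"; do
    if [[ ! -s $f ]]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use: `check_lai_inputs genome.fa || echo "rerun LTR_retriever without -noanno"`.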
Filter options

    ## 【Filter options】
    ## -misschar [CHR] Specify the ambiguous character (default N)
    ## -Nscreen Disable filtering ambiguous sequence in candidates
    ## -missmax [INT] Maximum number of ambiguous bp allowed in a candidate (default 10)
    ## -missrate [0-1] Maximum percentage of ambiguous bp allowed in a candidate (default 0.8)
    ## -minlen [INT] Minimum bp of the LTR region (default 100)
    ## -max_ratio [FLOAT] Maximum length ratio of internal region/LTR region (default 50)
    ## -minscore [INT] Minimum alignment length (INT/2) to identify tandem repeats (default 1000)
    ## -flankmiss [1-60] Maximum ambiguous length (bp) allowed in 60bp-flanking sequences (default 25)
    ## -flanksim [0-100] Minimum percentage of identity for flanking sequence alignment (default 60)
    ## -flankaln [0-1] Maximum alignment portion allowed for 60bp-flanking sequences (default 0.6)
    ## -motif [[STRING]] Specify non-canonical motifs to search for
    ##     (default -motif [TCCA TGCT TACA TACT TGGA TATA TGTA TGCA])
    ## -notrunc Discard truncated LTR-RTs and nested LTR-RTs (will dampen sensitivity)
    ## -procovTE [0-1] Maximum portion of allowed for cumulated DNA TE database and LINE database
    ##     alignments (default 0.7)
    ## -procovPL [0-1] Maximum portion allowed for cumulated plant protein database alignments (default 0.7)
    ## -prolensig [INT] Minimum alignment length (aa) for LINE/DNA transposase/plant protein alignment (default 30)

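- `-misschar`: the character treated as ambiguous (default N).
- `-Nscreen`: turn off filtering of candidates that contain ambiguous sequence.
- `-missmax`: maximum number of ambiguous bases allowed in a candidate (default 10).
- `-missrate`: maximum fraction (0-1) of ambiguous bases allowed in a candidate (default 0.8).
- `-minlen`: minimum length of the LTR region in bp (default 100).
- `-max_ratio`: maximum ratio of internal-region length to LTR length (default 50).
- `-minscore`: minimum alignment length (INT/2) used to identify tandem repeats (default 1000).
- `-flankmiss`: maximum ambiguous length (bp) allowed in the 60 bp flanking sequences (1-60, default 25).
- `-flanksim`: minimum identity (%) for the flanking-sequence alignment (0-100, default 60).
- `-flankaln`: maximum alignment portion allowed for the 60 bp flanking sequences (0-1, default 0.6).
- `-motif`: non-canonical motifs to search for (default TCCA TGCT TACA TACT TGGA TATA TGTA TGCA).
- `-notrunc`: discard truncated and nested LTR-RTs. This lowers sensitivity.
- `-procovTE`: maximum cumulative alignment portion allowed against the DNA TE and LINE transposase databases (0-1, default 0.7).
- `-procovPL`: maximum cumulative alignment portion allowed against the plant protein database (0-1, default 0.7).
- `-prolensig`: minimum alignment length (aa) for LINE/DNA transposase/plant protein alignments (default 30).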
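
The toy filter below shows how `-missmax` and `-missrate` interact. It illustrates the two thresholds and is not LTR_retriever's implementation. It assumes a candidate is rejected when it exceeds either limit, and it counts only the exact, case-sensitive `-misschar` character.

```bash
# Toy illustration of the -missmax / -missrate thresholds (NOT LTR_retriever's code).
# ASSUMPTION: a sequence fails if it exceeds either threshold.
# Usage: passes_ambiguity_filter SEQUENCE [MISSMAX] [MISSRATE] [MISSCHAR]
passes_ambiguity_filter() {
  local seq=$1 max=${2:-10} rate=${3:-0.8} ch=${4:-N}
  local len=${#seq} n
  # Count occurrences of the ambiguous character (case-sensitive)
  n=$(printf '%s' "$seq" | tr -cd "$ch" | wc -c)
  n=$((n))  # strip padding that some wc implementations add
  # Exit status 0 (pass) only if both the count and the fraction are within limits
  awk -v n="$n" -v len="$len" -v max="$max" -v rate="$rate" \
    'BEGIN { exit !(len > 0 && n <= max && n / len <= rate) }'
}
```

With the defaults, a 100 bp sequence with 5 Ns passes, and one with 20 Ns fails on the count limit even though 20% is below the 0.8 fraction limit.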
Library options

    ## 【Library options】
    ## -blastclust [[STRING]] Trigger to use blastclust and customize parameters
    ##     (default -blastclust [-L .9 -b T -S 80])
    ## -cdhit [[STRING]] Trigger to use cd-hit-est (default) and customize parameters
    ##     (default -cdhit [-c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0])
    ## -linelib [FASTA] Provide LINE transposase database for LINE TE exclusion
    ##     (default /database/Tpases020812LINE)
    ## -dnalib [FASTA] Provide DNA TE transposase database for DNA TE exclusion
    ##     (default /database/Tpases020812DNA)
    ## -plantprolib [FASTA] Provide plant protein database for coding sequence exclusion
    ##     (default /database/alluniRefprexp082813)
    ## -TEhmm [Pfam] Provide Pfam database for TE identification
    ##     (default /database/TEfam.hmm)

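- `-blastclust`: cluster the library with blastclust, optionally with custom parameters (default -L .9 -b T -S 80).
- `-cdhit`: cluster the library with cd-hit-est (the default), optionally with custom parameters (default -c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0).
- `-linelib`: LINE transposase database (FASTA), used to exclude LINEs.
- `-dnalib`: DNA TE transposase database (FASTA), used to exclude DNA TEs.
- `-plantprolib`: plant protein database (FASTA), used to exclude coding sequences.
- `-TEhmm`: Pfam HMM database used for TE identification.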
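
If you keep the default databases, you can confirm they are in place. The help text prints the defaults as `/database/...`. This hypothetical sketch assumes those files live in a `database/` folder inside the LTR_retriever installation directory, which you pass in as INSTALL_DIR. Adjust the path if your installation differs.

```bash
# Hypothetical check (not part of LTR_retriever) for the default databases above.
# ASSUMPTION: defaults are INSTALL_DIR/database/<name>.
# Usage: check_default_dbs INSTALL_DIR
check_default_dbs() {
  local dir=$1 status=0 db
  for db in Tpases020812LINE Tpases020812DNA alluniRefprexp082813 TEfam.hmm; do
    if [[ ! -e $dir/database/$db ]]; then
      echo "not found: $dir/database/$db" >&2
      status=1
    fi
  done
  return "$status"
}
```

With GNU coreutils, one possible call is `check_default_dbs "$(dirname "$(readlink -f "$(command -v LTR_retriever)")")"`. Whether that resolves to the install directory depends on how LTR_retriever was installed.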
Dependencies

    ## 【Dependencies】
    ## -repeatmasker [path] Path to the RepeatMasker program. (default: find from ENV)
    ## -blastplus [path] Path to the BLAST+ program. (default: find from ENV)
    ## -blast [path] Path to the BLAST program. Required if -blastclust is used. (default: find from ENV)
    ## -cdhit_path [path] Path to the CD-HIT program. Required if -cdhit is used. (default: find from ENV)
    ## -hmmer [path] Path to the HMMER program. (default: find from ENV)
    ## -trf_path [path] Path to the trf program. (default: find from ENV)

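These options set the path to each required program. If an option is omitted, LTR_retriever looks for the program in the environment. `-blast` is only required with `-blastclust`, and `-cdhit_path` only with `-cdhit`.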
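
To see which dependencies you would need to set explicitly, check what is already on PATH. The executable names in this sketch are assumptions about a typical installation: RepeatMasker, blastn for BLAST+, cd-hit-est for CD-HIT, hmmsearch for HMMER, and trf. Legacy BLAST (`-blast`) is left out because it is only required with `-blastclust`.

```bash
# Sketch: print each dependency executable that is NOT found on PATH.
# Executable names are assumptions about a typical installation.
missing_deps() {
  local tool
  for tool in RepeatMasker blastn cd-hit-est hmmsearch trf; do
    # command -v succeeds only if the name resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Empty output means every program was found. Any name printed is a candidate for the matching option, for example `-trf_path`.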
Miscellaneous

    ## 【Miscellaneous】
    ## -u [FLOAT] Neutral mutation rate (per bp per ya) (default 1.3e-8 (from rice))
    ## -step [STRING] Restart the program from a particular step. Existing outputs will be overwritten. Options:
    ##     Init (default, from the beginning);
    ##     Major (Tandem repeat cleanup finished, structrual analyses next)
    ##     Trunc (Structural analyses finished, truncated LTR recycle next)
    ##     Promask (Truncated LTR recycle finished, protein contamination cleanup next)
    ##     Library (Protein contamination cleanup finished, initial library construction next)
    ##     Next (Initial library construction finished, non-TGCA analyses next)
    ## -threads [INT] Number of threads (≤ total available threads, default 4)
    ## -help/-h Display this help information

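- `-u`: neutral mutation rate per bp per year, used to estimate insertion times (default 1.3e-8, the rate for rice).
- `-step`: restart from a given step (Init, Major, Trunc, Promask, Library, Next). Outputs from that step onward are overwritten.
- `-threads`: number of threads (default 4). It must not exceed the number of available threads.
- `-help`/`-h`: display the help message.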
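
`-u` matters because it converts LTR divergence into an insertion age. The two LTRs of an element are identical when it inserts, and afterwards each accumulates mutations independently at rate μ. Their divergence K therefore grows at about 2μ per year, which gives the standard molecular-clock estimate below. This is stated as the general relation; see the LTR_retriever paper for the exact divergence model it uses. K = 0.026 is an illustrative value only.

$$
T = \frac{K}{2\mu}, \qquad \mu = 1.3 \times 10^{-8} \ \text{(default of -u)}
$$

$$
T = \frac{0.026}{2 \times 1.3 \times 10^{-8}} = \frac{0.026}{2.6 \times 10^{-8}} = 1.0 \times 10^{6}\ \text{years}
$$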
Running
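Besides LTRharvest and LTR_FINDER, LTR_retriever ships conversion scripts for the output of other LTR finders:

- convert_ltrdetector.pl (LTRdetector)
- convert_ltr_struc.pl (LTR_STRUC)
- convert_MGEScan3.0.pl (MGEScan 3.0)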
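First, find candidate LTR-RTs with LTRharvest (GenomeTools) and LTR_FINDER_parallel, then merge the two candidate files:

    # Build the suffix array index that LTRharvest needs
    gt suffixerator \
      -db genome.fa \
      -indexname genome.fa \
      -tis -suf -lcp -des -ssp -sds -dna
    # Predict LTR-RT candidates with LTRharvest
    gt ltrharvest \
      -index genome.fa \
      -minlenltr 100 \
      -maxlenltr 7000 \
      -mintsd 4 \
      -maxtsd 6 \
      -motif TGCA \
      -motifmis 1 \
      -similar 85 \
      -vic 10 \
      -seed 20 \
      -seqids yes > genome.fa.harvest.scn
    # Predict candidates with LTR_FINDER_parallel; -harvest_out writes LTRharvest-format output
    LTR_FINDER_parallel \
      -seq genome.fa \
      -threads 10 \
      -harvest_out \
      -size 1000000 \
      -time 300
    # Merge both candidate sets into one input for LTR_retriever
    cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
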
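
A quick sanity check after the merge is to count candidates, since the merged file should contain those of both inputs. The sketch assumes LTRharvest-format .scn files, where comment lines start with `#` and every other non-empty line is one candidate.

```bash
# Sketch: count candidate lines in an LTRharvest-format .scn file.
# ASSUMPTION: comment lines start with '#'; each other non-empty line is one candidate.
count_candidates() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

For example, `for f in genome.fa.harvest.scn genome.fa.finder.combine.scn genome.fa.rawLTR.scn; do echo "$f $(count_candidates "$f")"; done`. The last count should equal the sum of the first two.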
Then run LTR_retriever on the merged candidates:

    LTR_retriever \
      -genome genome.fa \
      -inharvest genome.fa.rawLTR.scn \
      -threads 10 [options]

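
genome.fa.pass.list lists the intact LTR-RTs that LTR_retriever keeps, and tallying them per sequence is a quick first look at the result. This sketch assumes header lines start with `#` and that the first column (LTR_loc) has the form `seqname:start..end`. Sequence names that themselves contain `:` would be cut short.

```bash
# Sketch: number of intact LTR-RTs per sequence in PREFIX.pass.list.
# ASSUMPTIONS: '#' marks header lines; column 1 looks like "seqname:start..end".
intact_per_seq() {
  awk '!/^#/ && NF { split($1, a, ":"); n[a[1]]++ }
       END { for (s in n) print s "\t" n[s] }' "$1" | sort
}
```

Usage: `intact_per_seq genome.fa.pass.list` prints one tab-separated `sequence count` line per sequence.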
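Optionally, compute the LTR Assembly Index (LAI) with the LAI program bundled with LTR_retriever. It takes the intact LTR-RT list (genome.fa.pass.list) and the whole-genome LTR-RT annotation (genome.fa.out) written by LTR_retriever:

    LAI \
      -genome genome.fa \
      -intact genome.fa.pass.list \
      -all genome.fa.out [options]
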
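
The core quantity behind LAI is the raw LAI from the LAI paper: the share of LTR-RT sequence that belongs to intact elements, raw LAI = (intact LTR-RT length / total LTR-RT length) × 100. The function below computes only this ratio from two lengths in bp. The LAI program additionally corrects for LTR identity, which this sketch omits.

```bash
# Sketch: raw LAI = intact LTR-RT length / total LTR-RT length * 100.
# Does NOT apply the identity correction that the LAI program performs.
# Usage: raw_lai INTACT_BP TOTAL_BP
raw_lai() {
  awk -v intact="$1" -v total="$2" 'BEGIN {
    if (total <= 0) exit 1          # avoid division by zero
    printf "%.2f\n", intact / total * 100
  }'
}
```

For example, `raw_lai 25000000 100000000` prints 25.00.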
References
- GitHub: https://github.com/oushujun/LTR_retriever
- xuzhougeng. LTR_retriever: 一个更加准的LTR整合分析工具 (LTR_retriever: a more accurate integrated LTR analysis tool; blog post in Chinese).
- Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018 Feb;176(2):1410-1422. doi: 10.1104/pp.17.01310
