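Introduction
LTR_retriever (https://github.com/oushujun/LTR_retriever) is a program for accurate identification of LTR retrotransposons (LTR-RTs) from the outputs of LTRharvest and LTR_FINDER. It generates a non-redundant LTR-RT library for genome annotation.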

Download
Usage
LTR_retriever -h
##
## ##########################
## ### LTR_retriever v2.9.0 ###
## ##########################
##
## A program for accurate identification of LTR-RTs from outputs of LTRharvest and
## LTR_FINDER, generates non-redundant LTR-RT library for genome annotations.
##
## Shujun Ou (shujun.ou.1@gmail.com) 03/26/2019
##
## Usage: LTR_retriever -genome genomefile -inharvest LTRharvest_input [options]
Input options
## 【Input Options】
## -genome [File] Specify the genome sequence file (FASTA)
## -inharvest [File] LTR-RT candidates from LTRharvest
## -infinder [File] LTR-RT candidates from LTR_FINDER
## -inmgescan [File] LTR-RT candidates from MGEScan_LTR
## -nonTGCA [File] Non-canonical LTR-RT candidates from LTRharvest
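- -genome: genome sequence file in FASTA format.
- -inharvest: LTR-RT candidates produced by LTRharvest.
- -infinder: LTR-RT candidates produced by LTR_FINDER.
- -inmgescan: LTR-RT candidates produced by MGEScan_LTR.
- -nonTGCA: non-canonical (non-TGCA) LTR-RT candidates produced by LTRharvest.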
Output options
## 【Output options】
## -verbose/-v Retain intermediate outputs (developer mode)
## -noanno Disable whole genome LTR-RT annotation (no GFF3 output)
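- -verbose/-v: keep intermediate outputs (developer mode).
- -noanno: skip whole-genome LTR-RT annotation; no GFF3 file is written.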
Filter options
## 【Filter options】
## -misschar [CHR] Specify the ambiguous character (default N)
## -Nscreen Disable filtering ambiguous sequence in candidates
## -missmax [INT] Maximum number of ambiguous bp allowed in a candidate (default 10)
## -missrate [0-1] Maximum percentage of ambiguous bp allowed in a candidate (default 0.8)
## -minlen [INT] Minimum bp of the LTR region (default 100)
## -max_ratio [FLOAT] Maximum length ratio of internal region/LTR region (default 50)
## -minscore [INT] Minimum alignment length (INT/2) to identify tandem repeats (default 1000)
## -flankmiss [1-60] Maximum ambiguous length (bp) allowed in 60bp-flanking sequences (default 25)
## -flanksim [0-100] Minimum percentage of identity for flanking sequence alignment (default 60)
## -flankaln [0-1] Maximum alignment portion allowed for 60bp-flanking sequences (default 0.6)
## -motif [[STRING]] Specify non-canonical motifs to search for
## (default -motif [TCCA TGCT TACA TACT TGGA TATA TGTA TGCA])
## -notrunc Discard truncated LTR-RTs and nested LTR-RTs (will dampen sensitivity)
## -procovTE [0-1] Maximum portion of allowed for cumulated DNA TE database and LINE database
## alignments (default 0.7)
## -procovPL [0-1] Maximum portion allowed for cumulated plant protein database alignments (default 0.7)
## -prolensig [INT] Minimum alignment length (aa) for LINE/DNA transposase/plant protein alignment (default 30)
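- -misschar: character that marks ambiguous bases (default N).
- -Nscreen: turn off filtering of ambiguous sequence in candidates.
- -missmax: maximum number of ambiguous bp allowed in a candidate (default 10).
- -missrate: maximum fraction of ambiguous bp allowed in a candidate (default 0.8).
- -minlen: minimum length (bp) of the LTR region (default 100).
- -max_ratio: maximum length ratio of internal region to LTR region (default 50).
- -minscore: minimum alignment length (INT/2) used to identify tandem repeats (default 1000).
- -flankmiss: maximum ambiguous length (bp) allowed in the 60-bp flanking sequences (default 25).
- -flanksim: minimum percent identity for the flanking-sequence alignment (default 60).
- -flankaln: maximum aligned portion allowed for the 60-bp flanking sequences (default 0.6).
- -motif: non-canonical motifs to search for (default TCCA TGCT TACA TACT TGGA TATA TGTA TGCA).
- -notrunc: discard truncated and nested LTR-RTs; this lowers sensitivity.
- -procovTE: maximum portion allowed for cumulated DNA TE and LINE database alignments (default 0.7).
- -procovPL: maximum portion allowed for cumulated plant protein database alignments (default 0.7).
- -prolensig: minimum alignment length (aa) for LINE/DNA transposase/plant protein alignments (default 30).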
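To get a feel for the -misschar, -missmax and -missrate thresholds, the sketch below counts ambiguous characters in a single candidate sequence and prints the count and the fraction. The function name count_ambiguous is hypothetical, and how LTR_retriever combines the two limits internally is not stated in the help text, so the sketch only reports the two numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LTR_retriever): count ambiguous characters
# in one sequence string and print "<count> <fraction>".
count_ambiguous() {
    local seq=$1 char=${2:-N}   # second argument mirrors -misschar (default N)
    LC_ALL=C awk -v s="$seq" -v c="$char" 'BEGIN {
        total = length(s)
        n = gsub(c, "", s)      # gsub returns the number of characters removed
        printf "%d %.3f\n", n, (total > 0 ? n / total : 0)
    }'
}

# A 20-bp toy sequence with 8 ambiguous bases
count_ambiguous "ACGTNNNNACGTNNNNACGT"
```

The printed count can be compared with -missmax and the fraction with -missrate when deciding whether the defaults suit a gap-rich assembly.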
Library options
## 【Library options】
## -blastclust [[STRING]] Trigger to use blastclust and customize parameters
## (default -blastclust [-L .9 -b T -S 80])
## -cdhit [[STRING]] Trigger to use cd-hit-est (default) and customize parameters
## (default -cdhit [-c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0])
## -linelib [FASTA] Provide LINE transposase database for LINE TE exclusion
## (default /database/Tpases020812LINE)
## -dnalib [FASTA] Provide DNA TE transposase database for DNA TE exclusion
## (default /database/Tpases020812DNA)
## -plantprolib [FASTA] Provide plant protein database for coding sequence exclusion
## (default /database/alluniRefprexp082813)
## -TEhmm [Pfam] Provide Pfam database for TE identification
## (default /database/TEfam.hmm)
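- -blastclust: cluster with blastclust and optionally customize its parameters (default -L .9 -b T -S 80).
- -cdhit: cluster with cd-hit-est (the default) and optionally customize its parameters (default -c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0).
- -linelib: LINE transposase database (FASTA) used to exclude LINE TEs.
- -dnalib: DNA TE transposase database (FASTA) used to exclude DNA TEs.
- -plantprolib: plant protein database (FASTA) used to exclude coding sequences.
- -TEhmm: Pfam database used for TE identification.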
Dependencies
## 【Dependencies】
## -repeatmasker [path] Path to the RepeatMasker program. (default: find from ENV)
## -blastplus [path] Path to the BLAST+ program. (default: find from ENV)
## -blast [path] Path to the BLAST program. Required if -blastclust is used. (default: find from ENV)
## -cdhit_path [path] Path to the CD-HIT program. Required if -cdhit is used. (default: find from ENV)
## -hmmer [path] Path to the HMMER program. (default: find from ENV)
## -trf_path [path] Path to the trf program. (default: find from ENV)
These options set the path to each required program; by default each one is located through the environment (PATH).
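Because the defaults rely on the environment, it can save a failed run to check which programs are visible on PATH first. This is a minimal sketch: missing_tools is a hypothetical helper, and the executable names passed to it are assumptions about a typical installation, so adjust them to match yours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named executable that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for RepeatMasker, BLAST+, CD-HIT, HMMER and trf.
missing_tools RepeatMasker blastn cd-hit-est hmmsearch trf
```

Any name printed here would need the matching option (for example -repeatmasker or -trf_path) or a PATH update.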
Miscellaneous
## 【Miscellaneous】
## -u [FLOAT] Neutral mutation rate (per bp per ya) (default 1.3e-8 (from rice))
## -step [STRING] Restart the program from a particular step. Existing outputs will be overwritten. Options:
## Init (default, from the beginning);
## Major (Tandem repeat cleanup finished, structrual analyses next)
## Trunc (Structural analyses finished, truncated LTR recycle next)
## Promask (Truncated LTR recycle finished, protein contamination cleanup next)
## Library (Protein contamination cleanup finished, initial library construction next)
## Next (Initial library construction finished, non-TGCA analyses next)
## -threads [INT] Number of threads (≤ total available threads, default 4)
## -help/-h Display this help information
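- -u: neutral mutation rate per bp per year (default 1.3e-8, from rice).
- -step: restart from a given step (Init, Major, Trunc, Promask, Library, Next); existing outputs are overwritten.
- -threads: number of threads, at most the number available (default 4).
- -help/-h: print the help message.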
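The -u rate matters because it scales any insertion-time estimate. The sketch below assumes the relation T = K / (2u), where K is the Jukes-Cantor-corrected divergence between the two LTRs, K = -3/4 ln(1 - 4d/3), and d = 1 - identity. The help text does not state this relation, so treat it as an assumption; the function name insertion_time is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate insertion time (years) from LTR identity (0-1)
# and a neutral mutation rate, assuming T = K / (2u) with a JC69 correction.
insertion_time() {
    local identity=$1 mu=${2:-1.3e-8}   # default rate mirrors -u (rice)
    LC_ALL=C awk -v id="$identity" -v mu="$mu" 'BEGIN {
        d = 1 - id                       # raw divergence between the two LTRs
        if (d <= 0) { print 0; exit }    # identical LTRs: no measurable age
        K = -0.75 * log(1 - 4 * d / 3)   # Jukes-Cantor corrected divergence
        printf "%.0f\n", K / (2 * mu)
    }'
}

insertion_time 0.98           # with the default rice rate
insertion_time 0.98 7e-9      # a slower, illustrative rate gives an older estimate
```

Under this relation the estimate is inversely proportional to the rate, so halving -u doubles every reported age.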
Running
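Conversion scripts (judging by their names, for the output of LTR_detector, LTR_STRUC and MGEScan 3.0):
convert_ltrdetector.pl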
convert_ltr_struc.pl
convert_MGEScan3.0.pl
# Build the suffix array index that LTRharvest reads
gt suffixerator \
-db genome.fa \
-indexname genome.fa \
-tis -suf -lcp -des -ssp -sds -dna
# Find LTR-RT candidates with LTRharvest
gt ltrharvest \
-index genome.fa \
-minlenltr 100 \
-maxlenltr 7000 \
-mintsd 4 \
-maxtsd 6 \
-motif TGCA \
-motifmis 1 \
-similar 85 \
-vic 10 \
-seed 20 \
-seqids yes > genome.fa.harvest.scn
# Find LTR-RT candidates with LTR_FINDER_parallel, written in LTRharvest format (-harvest_out)
LTR_FINDER_parallel \
-seq genome.fa \
-threads 10 \
-harvest_out \
-size 1000000 \
-time 300
# Merge both candidate sets into one input file
cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
# Run LTR_retriever on the merged candidates
LTR_retriever \
-genome genome.fa \
-inharvest genome.fa.rawLTR.scn \
-threads 10 [options]
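The help text asks for a -threads value no larger than the threads available. A small sketch of capping the request before launching follows; cap_threads is a hypothetical helper, and the final line only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the smaller of the requested and available thread counts.
cap_threads() {
    local requested=$1 available=$2
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# nproc comes with GNU coreutils; getconf is used as a fallback, then 1.
available=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
threads=$(cap_threads 10 "$available")
echo "would run: LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads $threads"
```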
# Compute the LTR Assembly Index (LAI) from the LTR_retriever outputs
LAI \
-genome genome.fa \
-intact genome.fa.pass.list \
-all genome.fa.out [options]
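LAI reads genome.fa.pass.list and genome.fa.out, so it is worth confirming that both exist and are non-empty before calling it. A minimal sketch, in which missing_inputs is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every file that is missing or empty.
missing_inputs() {
    local f
    for f in "$@"; do
        [ -s "$f" ] || printf '%s\n' "$f"   # -s: file exists and has size > 0
    done
}

missing=$(missing_inputs genome.fa genome.fa.pass.list genome.fa.out)
if [ -n "$missing" ]; then
    printf 'not ready for LAI, missing or empty:\n%s\n' "$missing" >&2
fi
```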
References
- GitHub: https://github.com/oushujun/LTR_retriever
- xuzhougeng | LTR_retriever: 一个更加准的LTR整合分析工具 (LTR_retriever: a more accurate integrated LTR analysis tool)
- Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018 Feb;176(2):1410-1422. doi: 10.1104/pp.17.01310