Introduction
LTR_retriever(https://github.com/oushujun/LTR_retriever)
Download
Usage
LTR_retriever -h
##
## ##########################
## ### LTR_retriever v2.9.0 ###
## ##########################
##
## A program for accurate identification of LTR-RTs from outputs of LTRharvest and
## LTR_FINDER, generates non-redundant LTR-RT library for genome annotations.
##
## Shujun Ou (shujun.ou.1@gmail.com) 03/26/2019
##
## Usage: LTR_retriever -genome genomefile -inharvest LTRharvest_input [options]
Input options
## 【Input Options】
## -genome [File] Specify the genome sequence file (FASTA)
## -inharvest [File] LTR-RT candidates from LTRharvest
## -infinder [File] LTR-RT candidates from LTR_FINDER
## -inmgescan [File] LTR-RT candidates from MGEScan_LTR
## -nonTGCA [File] Non-canonical LTR-RT candidates from LTRharvest
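-genome: the genome sequence file in FASTA format.
-inharvest: LTR-RT candidates produced by LTRharvest.
-infinder: LTR-RT candidates produced by LTR_FINDER.
-inmgescan: LTR-RT candidates produced by MGEScan_LTR.
-nonTGCA: non-canonical (non-TGCA) LTR-RT candidates produced by LTRharvest.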
Output options
## 【Output options】
## -verbose/-v Retain intermediate outputs (developer mode)
## -noanno Disable whole genome LTR-RT annotation (no GFF3 output)
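-verbose/-v: keep intermediate output files (developer mode).
-noanno: skip whole-genome LTR-RT annotation, so no GFF3 file is written.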
Filter options
## 【Filter options】
## -misschar [CHR] Specify the ambiguous character (default N)
## -Nscreen Disable filtering ambiguous sequence in candidates
## -missmax [INT] Maximum number of ambiguous bp allowed in a candidate (default 10)
## -missrate [0-1] Maximum percentage of ambiguous bp allowed in a candidate (default 0.8)
## -minlen [INT] Minimum bp of the LTR region (default 100)
## -max_ratio [FLOAT] Maximum length ratio of internal region/LTR region (default 50)
## -minscore [INT] Minimum alignment length (INT/2) to identify tandem repeats (default 1000)
## -flankmiss [1-60] Maximum ambiguous length (bp) allowed in 60bp-flanking sequences (default 25)
## -flanksim [0-100] Minimum percentage of identity for flanking sequence alignment (default 60)
## -flankaln [0-1] Maximum alignment portion allowed for 60bp-flanking sequences (default 0.6)
## -motif [[STRING]] Specify non-canonical motifs to search for
## (default -motif [TCCA TGCT TACA TACT TGGA TATA TGTA TGCA])
## -notrunc Discard truncated LTR-RTs and nested LTR-RTs (will dampen sensitivity)
## -procovTE [0-1] Maximum portion allowed for cumulated DNA TE database and LINE database
## alignments (default 0.7)
## -procovPL [0-1] Maximum portion allowed for cumulated plant protein database alignments (default 0.7)
## -prolensig [INT] Minimum alignment length (aa) for LINE/DNA transposase/plant protein alignment (default 30)
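-misschar: the character that marks ambiguous bases (default N).
-Nscreen: turn off filtering of candidates that contain ambiguous sequence.
-missmax: maximum number of ambiguous bp allowed in a candidate (default 10).
-missrate: maximum fraction (0-1) of ambiguous bp allowed in a candidate (default 0.8).
-minlen: minimum length of the LTR region in bp (default 100).
-max_ratio: maximum length ratio of internal region to LTR region (default 50).
-minscore: minimum alignment length (INT/2) used to identify tandem repeats (default 1000).
-flankmiss: maximum ambiguous length in bp allowed in the 60-bp flanking sequences (default 25).
-flanksim: minimum percent identity for the flanking sequence alignment (default 60).
-flankaln: maximum aligned portion allowed for the 60-bp flanking sequences (default 0.6).
-motif: non-canonical motifs to search for (default TCCA TGCT TACA TACT TGGA TATA TGTA TGCA).
-notrunc: discard truncated and nested LTR-RTs, at the cost of sensitivity.
-procovTE: maximum portion allowed for cumulated DNA TE and LINE database alignments (default 0.7).
-procovPL: maximum portion allowed for cumulated plant protein database alignments (default 0.7).
-prolensig: minimum alignment length in aa for LINE, DNA transposase, and plant protein alignments (default 30).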
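To make the ambiguity filters concrete, the sketch below counts the ambiguous characters in a sequence and reports the fraction they make up, which are the two quantities that -missmax and -missrate put a ceiling on. The helper names count_ambiguous and ambiguous_fraction are hypothetical and this is not how LTR_retriever itself is implemented; it only illustrates the quantities involved.

```shell
# Count ambiguous characters in a sequence string.
# Illustrative helper only, not part of LTR_retriever.
# $1 = sequence, $2 = ambiguous character (defaults to N, like -misschar)
# The match is case-sensitive, so a lowercase "n" is not counted by default.
count_ambiguous() {
    printf '%s' "$1" | tr -cd "${2:-N}" | wc -c | tr -d ' '
}

# Fraction of the sequence that is ambiguous, printed with two decimals.
ambiguous_fraction() {
    n=$(count_ambiguous "$1" "${2:-N}")
    awk -v n="$n" -v len="${#1}" 'BEGIN {
        if (len == 0) { print "0.00" } else { printf "%.2f\n", n / len }
    }'
}
```

For example, `count_ambiguous ACGTNNAAAA` prints 2 and `ambiguous_fraction ACGTNNAAAA` prints 0.20, so this toy sequence would sit below both default thresholds.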
Library options
## 【Library options】
## -blastclust [[STRING]] Trigger to use blastclust and customize parameters
## (default -blastclust [-L .9 -b T -S 80])
## -cdhit [[STRING]] Trigger to use cd-hit-est (default) and customize parameters
## (default -cdhit [-c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0])
## -linelib [FASTA] Provide LINE transposase database for LINE TE exclusion
## (default /database/Tpases020812LINE)
## -dnalib [FASTA] Provide DNA TE transposase database for DNA TE exclusion
## (default /database/Tpases020812DNA)
## -plantprolib [FASTA] Provide plant protein database for coding sequence exclusion
## (default /database/alluniRefprexp082813)
## -TEhmm [Pfam] Provide Pfam database for TE identification
## (default /database/TEfam.hmm)
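-blastclust: use blastclust for clustering and customize its parameters (default -L .9 -b T -S 80).
-cdhit: use cd-hit-est for clustering (the default) and customize its parameters (default -c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0).
-linelib: LINE transposase database (FASTA) used to exclude LINE TEs.
-dnalib: DNA TE transposase database (FASTA) used to exclude DNA TEs.
-plantprolib: plant protein database (FASTA) used to exclude coding sequences.
-TEhmm: Pfam database used for TE identification.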
Dependencies
## 【Dependencies】
## -repeatmasker [path] Path to the RepeatMasker program. (default: find from ENV)
## -blastplus [path] Path to the BLAST+ program. (default: find from ENV)
## -blast [path] Path to the BLAST program. Required if -blastclust is used. (default: find from ENV)
## -cdhit_path [path] Path to the CD-HIT program. Required if -cdhit is used. (default: find from ENV)
## -hmmer [path] Path to the HMMER program. (default: find from ENV)
## -trf_path [path] Path to the trf program. (default: find from ENV)
These options set the path to each dependency. By default, every program is looked up through the environment.
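Because the defaults rely on the environment, it can help to confirm that the dependencies are actually on PATH before starting a long run. The sketch below does that with `command -v`. The function name check_deps is hypothetical, and the executable names in the example call are assumptions about a typical installation, so adjust them to match your setup.

```shell
# Report which of the given executables cannot be found on PATH.
# Returns 0 when all of them are found, 1 otherwise.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumed, not taken from the help text):
# check_deps RepeatMasker blastn makeblastdb cd-hit-est hmmsearch trf
```

Any tool reported as missing can then be supplied explicitly through the matching path option listed above.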
Miscellaneous
## 【Miscellaneous】
## -u [FLOAT] Neutral mutation rate (per bp per ya) (default 1.3e-8 (from rice))
## -step [STRING] Restart the program from a particular step. Existing outputs will be overwritten. Options:
## Init (default, from the beginning);
## Major (Tandem repeat cleanup finished, structrual analyses next)
## Trunc (Structural analyses finished, truncated LTR recycle next)
## Promask (Truncated LTR recycle finished, protein contamination cleanup next)
## Library (Protein contamination cleanup finished, initial library construction next)
## Next (Initial library construction finished, non-TGCA analyses next)
## -threads [INT] Number of threads (≤ total available threads, default 4)
## -help/-h Display this help information
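-u: neutral mutation rate per bp per year (default 1.3e-8, taken from rice).
-step: restart the program from a given step (Init, Major, Trunc, Promask, Library, or Next); existing outputs are overwritten.
-threads: number of threads to use (default 4).
-help/-h: print the help message.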
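The mutation rate given with -u matters because it scales the estimated insertion time of each element. A minimal sketch, assuming the commonly used relation T = K / (2u), where K is the divergence between the two LTRs of one element (substitutions per site) and u is the neutral mutation rate. The function name insertion_time is hypothetical, and the exact distance correction LTR_retriever applies to obtain K is not shown here.

```shell
# Estimate insertion time T = K / (2 * u), in years.
# $1 = K, divergence between the two LTRs (substitutions per site)
# $2 = u, neutral mutation rate per bp per year (defaults to 1.3e-8, the -u default)
insertion_time() {
    awk -v k="$1" -v u="${2:-1.3e-8}" 'BEGIN { printf "%.0f\n", k / (2 * u) }'
}

insertion_time 0.026    # prints 1000000 with the default rice rate
```

A species with a faster neutral rate gives a younger estimate for the same divergence, so `insertion_time 0.026 2.6e-8` prints 500000.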
Running
Conversion scripts for the output of other LTR finders (LtrDetector, LTR_STRUC, MGEScan 3.0):
convert_ltrdetector.pl
convert_ltr_struc.pl
convert_MGEScan3.0.pl
gt suffixerator \
-db genome.fa \
-indexname genome.fa \
-tis -suf -lcp -des -ssp -sds -dna
gt ltrharvest \
-index genome.fa \
-minlenltr 100 \
-maxlenltr 7000 \
-mintsd 4 \
-maxtsd 6 \
-motif TGCA \
-motifmis 1 \
-similar 85 \
-vic 10 \
-seed 20 \
-seqids yes > genome.fa.harvest.scn
LTR_FINDER_parallel \
-seq genome.fa \
-threads 10 \
-harvest_out \
-size 1000000 \
-time 300
cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
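Since the merged candidate file is the only input LTR_retriever receives in the next step, a quick sanity check on it can save a wasted run. The sketch below verifies that the merged file is non-empty and has as many lines as its parts combined. The function name check_merged is hypothetical.

```shell
# Check that a merged file is non-empty and has as many lines as its parts combined.
# $1 = merged file, remaining arguments = the files that were concatenated
check_merged() {
    merged=$1
    shift
    expected=$(cat "$@" | wc -l | tr -d ' ')
    actual=$(wc -l < "$merged" | tr -d ' ')
    [ "$actual" -gt 0 ] && [ "$expected" -eq "$actual" ]
}

# Example with the file names used in this walkthrough:
# check_merged genome.fa.rawLTR.scn genome.fa.harvest.scn genome.fa.finder.combine.scn
```

The function returns a non-zero status when either condition fails, so it can be chained in front of the next command with `&&`.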
LTR_retriever \
-genome genome.fa \
-inharvest genome.fa.rawLTR.scn \
-threads 10 [options]
LAI \
-genome genome.fa \
-intact genome.fa.pass.list \
-all genome.fa.out [options]
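For orientation on what the LAI step reports, here is a rough sketch of the raw index, assuming the published definition: raw LAI = intact LTR-RT length / total LTR sequence length × 100. The function name raw_lai is hypothetical, the input lengths are made up for illustration, and the identity-based correction that the LAI program applies on top of the raw value is not reproduced here.

```shell
# raw LAI = intact LTR-RT length / total LTR sequence length * 100
# $1 = total length of intact LTR-RTs (bp)
# $2 = total length of all LTR sequence (bp)
raw_lai() {
    awk -v intact="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "NA"; exit }
        printf "%.2f\n", intact / total * 100
    }'
}

raw_lai 20 100    # prints 20.00
```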
References
- GitHub repository: https://github.com/oushujun/LTR_retriever
- xuzhougeng | LTR_retriever: 一个更加准的LTR整合分析工具 (LTR_retriever: a more accurate tool for integrated LTR analysis)
- Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018 Feb;176(2):1410-1422. doi: 10.1104/pp.17.01310