Introduction

LTR_retriever (https://github.com/oushujun/LTR_retriever)


Download

Usage

LTR_retriever -h
## 
## ##########################
## ### LTR_retriever v2.9.0 ###
## ##########################
## 
## A program for accurate identification of LTR-RTs from outputs of LTRharvest and
##         LTR_FINDER, generates non-redundant LTR-RT library for genome annotations.
## 
## Shujun Ou (shujun.ou.1@gmail.com) 03/26/2019
## 
## Usage: LTR_retriever -genome genomefile -inharvest LTRharvest_input [options]

Input options

## 【Input Options】
## -genome      [File]     Specify the genome sequence file (FASTA)
## -inharvest   [File]     LTR-RT candidates from LTRharvest
## -infinder    [File]     LTR-RT candidates from LTR_FINDER
## -inmgescan   [File]     LTR-RT candidates from MGEScan_LTR
## -nonTGCA     [File]     Non-canonical LTR-RT candidates from LTRharvest

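-genome: the genome sequence file in FASTA format.
-inharvest: LTR-RT candidates produced by LTRharvest.
-infinder: LTR-RT candidates produced by LTR_FINDER.
-inmgescan: LTR-RT candidates produced by MGEScan_LTR.
-nonTGCA: non-canonical (non-TGCA) LTR-RT candidates produced by LTRharvest.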

Output options

## 【Output options】
## -verbose/-v             Retain intermediate outputs (developer mode)
## -noanno                 Disable whole genome LTR-RT annotation (no GFF3 output)

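-verbose/-v: keep intermediate output files (developer mode).
-noanno: skip whole-genome LTR-RT annotation; no GFF3 file is written.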

Filter options

## 【Filter options】
## -misschar    [CHR]      Specify the ambiguous character (default N)
## -Nscreen                Disable filtering ambiguous sequence in candidates
## -missmax     [INT]      Maximum number of ambiguous bp allowed in a candidate (default 10)
## -missrate    [0-1]      Maximum percentage of ambiguous bp allowed in a candidate (default 0.8)
## -minlen      [INT]      Minimum bp of the LTR region (default 100)
## -max_ratio   [FLOAT]    Maximum length ratio of internal region/LTR region (default 50)
## -minscore    [INT]      Minimum alignment length (INT/2) to identify tandem repeats (default 1000)
## -flankmiss   [1-60]     Maximum ambiguous length (bp) allowed in 60bp-flanking sequences (default 25)
## -flanksim    [0-100]    Minimum percentage of identity for flanking sequence alignment (default 60)
## -flankaln    [0-1]      Maximum alignment portion allowed for 60bp-flanking sequences (default 0.6)
## -motif       [[STRING]] Specify non-canonical motifs to search for
##                         (default -motif [TCCA TGCT TACA TACT TGGA TATA TGTA TGCA])
## -notrunc                Discard truncated LTR-RTs and nested LTR-RTs (will dampen sensitivity)
## -procovTE    [0-1]      Maximum portion allowed for cumulated DNA TE database and LINE database
##                         alignments (default 0.7)
## -procovPL    [0-1]      Maximum portion allowed for cumulated plant protein database alignments (default 0.7)
## -prolensig   [INT]      Minimum alignment length (aa) for LINE/DNA transposase/plant protein alignment (default 30)

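-misschar: the character treated as ambiguous (default N).
-Nscreen: turn off the filtering of ambiguous sequence in candidates.
-missmax: maximum number of ambiguous bp allowed in a candidate (default 10).
-missrate: maximum fraction of ambiguous bp allowed in a candidate (default 0.8).
-minlen: minimum length (bp) of the LTR region (default 100).
-max_ratio: maximum length ratio of internal region to LTR region (default 50).
-minscore: minimum alignment length (INT/2) used to identify tandem repeats (default 1000).
-flankmiss: maximum ambiguous length (bp) allowed in the 60-bp flanking sequences (default 25).
-flanksim: minimum percent identity for the flanking sequence alignment (default 60).
-flankaln: maximum aligned portion allowed for the 60-bp flanking sequences (default 0.6).
-motif: non-canonical motifs to search for (default TCCA TGCT TACA TACT TGGA TATA TGTA TGCA).
-notrunc: discard truncated and nested LTR-RTs; this lowers sensitivity.
-procovTE: maximum portion allowed for cumulated alignments to the DNA TE and LINE databases (default 0.7).
-procovPL: maximum portion allowed for cumulated alignments to the plant protein database (default 0.7).
-prolensig: minimum alignment length (aa) for LINE, DNA transposase, and plant protein alignments (default 30).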
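
The -misschar, -missmax and -missrate filters all depend on how many ambiguous characters a candidate contains. The sketch below shows one way to compute that count and the corresponding fraction for a sequence string. The helper names `count_ambiguous` and `ambiguous_fraction` are hypothetical and are not part of LTR_retriever, and the help text does not say how the program combines the two thresholds, so this only illustrates the quantities involved.

```shell
# Hypothetical helper (not part of LTR_retriever): count how many times the
# ambiguous character (default N) occurs in a sequence string.
count_ambiguous() {
  seq=$1
  char=${2:-N}
  # Delete every character except the ambiguous one, then count what is left.
  printf '%s' "$seq" | tr -cd "$char" | wc -c | tr -d ' '
}

# Hypothetical helper: fraction of the sequence made up of the ambiguous character.
ambiguous_fraction() {
  seq=$1
  char=${2:-N}
  n=$(count_ambiguous "$seq" "$char")
  awk -v n="$n" -v len="${#seq}" \
    'BEGIN { if (len == 0) print 0; else printf "%.3f\n", n / len }'
}
```

For example, `count_ambiguous ACGTNNNA` counts the three N characters, and the result can be compared by hand with the -missmax and -missrate values in use.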

Library options

## 【Library options】
## -blastclust  [[STRING]] Trigger to use blastclust and customize parameters
##                         (default -blastclust [-L .9 -b T -S 80])
## -cdhit       [[STRING]] Trigger to use cd-hit-est (default) and customize parameters
##                         (default -cdhit [-c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0])
## -linelib     [FASTA]    Provide LINE transposase database for LINE TE exclusion
##                         (default /database/Tpases020812LINE)
## -dnalib      [FASTA]    Provide DNA TE transposase database for DNA TE exclusion
##                         (default /database/Tpases020812DNA)
## -plantprolib [FASTA]    Provide plant protein database for coding sequence exclusion
##                         (default /database/alluniRefprexp082813)
## -TEhmm       [Pfam]     Provide Pfam database for TE identification
##                         (default /database/TEfam.hmm)

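-blastclust: cluster with blastclust, optionally with custom parameters (default -L .9 -b T -S 80).
-cdhit: cluster with cd-hit-est (the default), optionally with custom parameters (default -c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0).
-linelib: LINE transposase database used to exclude LINE TEs.
-dnalib: DNA TE transposase database used to exclude DNA TEs.
-plantprolib: plant protein database used to exclude coding sequences.
-TEhmm: Pfam database used for TE identification.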

Dependencies

## 【Dependencies】
## -repeatmasker [path]    Path to the RepeatMasker program. (default: find from ENV)
## -blastplus   [path]     Path to the BLAST+ program. (default: find from ENV)
## -blast       [path]     Path to the BLAST program. Required if -blastclust is used. (default: find from ENV)
## -cdhit_path  [path]     Path to the CD-HIT program. Required if -cdhit is used. (default: find from ENV)
## -hmmer       [path]     Path to the HMMER program. (default: find from ENV)
## -trf_path    [path]     Path to the trf program. (default: find from ENV)

These options set the path to each dependency; by default the programs are located through the environment variables.
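
Since the default is to look the programs up in the environment, it can help to confirm that they are on PATH before starting a long run. The following is a minimal sketch of such a check. The function `check_deps` is hypothetical, and the executable names in the usage comment are assumptions that may differ between installations.

```shell
# Hypothetical helper: report which of the given executables can be found on PATH.
# Returns 0 if all of them were found and 1 if at least one is missing.
check_deps() {
  missing=0
  for prog in "$@"; do
    if command -v "$prog" >/dev/null 2>&1; then
      echo "found:   $prog"
    else
      echo "missing: $prog"
      missing=1
    fi
  done
  return "$missing"
}

# Executable names below are assumptions; adjust them to your installation.
# check_deps RepeatMasker makeblastdb cd-hit-est hmmsearch trf
```

If a program is reported as missing, its location can be passed explicitly with the matching option from the help text above (for example -trf_path).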

Miscellaneous

## 【Miscellaneous】
## -u           [FLOAT]    Neutral mutation rate (per bp per ya) (default 1.3e-8 (from rice))
## -step        [STRING]   Restart the program from a particular step. Existing outputs will be overwritten. Options:
##                                 Init (default, from the beginning);
##                                 Major (Tandem repeat cleanup finished, structural analyses next)
##                                 Trunc (Structural analyses finished, truncated LTR recycle next)
##                                 Promask (Truncated LTR recycle finished, protein contamination cleanup next)
##                                 Library (Protein contamination cleanup finished, initial library construction next)
##                                 Next (Initial library construction finished, non-TGCA analyses next)
## -threads     [INT]      Number of threads (≤ total available threads, default 4)
## -help/-h                Display this help information

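-u: neutral mutation rate per bp per year (default 1.3e-8, from rice).
-step: restart from a given step (Init, Major, Trunc, Promask, Library, or Next); existing outputs are overwritten.
-threads: number of threads (default 4).
-help/-h: print the help message.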
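
The mutation rate given with -u matters when insertion times of intact LTR-RTs are estimated. A common formulation is T = K / (2μ), where K is the Jukes-Cantor corrected divergence between the two LTRs of an element and μ is the neutral mutation rate. The sketch below assumes this formulation and is not taken from the LTR_retriever code; the function name `insertion_time` is hypothetical.

```shell
# Sketch: estimate the insertion time (in years) of an LTR-RT from the identity
# between its two LTRs, assuming T = K / (2 * mu) with a Jukes-Cantor correction.
# Usage: insertion_time IDENTITY [MU]   (IDENTITY as a fraction, e.g. 0.98)
insertion_time() {
  awk -v id="$1" -v mu="${2:-1.3e-8}" 'BEGIN {
    d = 1 - id                            # raw divergence between the two LTRs
    K = 0.75 * log(1 / (1 - 4 * d / 3))   # Jukes-Cantor corrected distance
    printf "%.0f\n", K / (2 * mu)         # years since insertion
  }'
}
```

With the rice default of 1.3e-8, `insertion_time 0.98` gives an estimate of a little under 0.8 million years. A faster mutation rate passed as the second argument shortens the estimate proportionally.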

Running

convert_ltrdetector.pl
convert_ltr_struc.pl
convert_MGEScan3.0.pl

gt suffixerator \
  -db genome.fa \
  -indexname genome.fa \
  -tis -suf -lcp -des -ssp -sds -dna
gt ltrharvest \
  -index genome.fa \
  -minlenltr 100 \
  -maxlenltr 7000 \
  -mintsd 4 \
  -maxtsd 6 \
  -motif TGCA \
  -motifmis 1 \
  -similar 85 \
  -vic 10 \
  -seed 20 \
  -seqids yes > genome.fa.harvest.scn
LTR_FINDER_parallel \
  -seq genome.fa \
  -threads 10 \
  -harvest_out \
  -size 1000000 \
  -time 300
cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
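
Before passing the combined file to LTR_retriever, it can be useful to check how many candidates each input contributed. The sketch below assumes that header and comment lines in the .scn files start with `#` and that every other non-empty line describes one candidate. The function name `count_candidates` is hypothetical.

```shell
# Hypothetical helper: count candidate lines in a .scn file, assuming that
# lines starting with '#' are headers/comments and that every other
# non-empty line is one LTR-RT candidate.
count_candidates() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Example usage with the files produced above:
# count_candidates genome.fa.harvest.scn
# count_candidates genome.fa.finder.combine.scn
# count_candidates genome.fa.rawLTR.scn
```

Under that assumption, the count for genome.fa.rawLTR.scn should equal the sum of the two input counts.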

LTR_retriever \
  -genome genome.fa \
  -inharvest genome.fa.rawLTR.scn \
  -threads 10 [options]

LAI \
  -genome genome.fa \
  -intact genome.fa.pass.list \
  -all genome.fa.out [options]

References

GitHub repository: https://github.com/oushujun/LTR_retriever
xuzhougeng | LTR_retriever: a more accurate tool for integrated LTR analysis
Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018 Feb;176(2):1410-1422. doi: 10.1104/pp.17.01310