Introduction

EDTA (The Extensive de novo TE Annotator) is a pipeline for de novo annotation of transposable elements (TEs) in a genome assembly.

[Figure 1: the EDTA transposable element annotation pipeline]

Usage

Downloading the software

  cd /home/shwzhao/bin/SingularityDir/
  SINGULARITY_CACHEDIR=./
  export SINGULARITY_CACHEDIR
  singularity pull EDTA.sif docker://oushujun/edta:2.0.0
  singularity exec /home/shwzhao/bin/SingularityDir/EDTA.sif EDTA.pl -h
  ## INFO: Converting SIF file to temporary sandbox...
  ##
  ## ########################################################
  ## ##### Extensive de-novo TE Annotator (EDTA) v2.0.0 ####
  ## ##### Shujun Ou (shujun.ou.1@gmail.com) ####
  ## ########################################################
  ## .......
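
To avoid retyping the full `singularity exec` prefix, the call above can be wrapped in a small shell function. This is only a sketch: the function name `edta` and the `EDTA_SIF` variable are hypothetical conveniences, not part of EDTA, and the default path is simply the one used above.

```shell
# Path to the image pulled above; override by exporting EDTA_SIF beforehand.
EDTA_SIF="${EDTA_SIF:-/home/shwzhao/bin/SingularityDir/EDTA.sif}"

# edta: hypothetical helper that forwards all of its arguments to EDTA.pl
# inside the container, after checking that the prerequisites are present.
edta() {
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found in PATH" >&2
        return 127
    fi
    if [ ! -f "$EDTA_SIF" ]; then
        echo "image not found: $EDTA_SIF" >&2
        return 1
    fi
    singularity exec "$EDTA_SIF" EDTA.pl "$@"
}
```

With this defined, `edta -h` should print the same help text as the full command.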

Parameters

See the references for details.

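  • EDTA.pl

--curatedlib: a curated TE library (FASTA), used to keep the naming and classification of known TEs consistent.

--sensitive: set to 1 to also run RepeatModeler to find remaining TEs; slower (default 0).

--rmout: your own homology-based TE annotation in RepeatMasker .out format, used for masking instead of the EDTA library.

--exclude: regions (BED) to exclude from TE masking in the MAKER.masked output; requires --anno 1.

--u: neutral mutation rate used to estimate the age of intact LTR elements.

  • EDTA_raw.pl

--type: which TE type to search for (for example ltr, tir, helitron, or all).

  • lib-test.pl

Evaluates annotation performance.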
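
The optional inputs listed above only make sense when the corresponding file exists, so one way to handle them is to build the argument list conditionally. The sketch below assumes a hypothetical helper `build_edta_args`; it only prints the arguments (one per line) and does not run EDTA. The flags themselves are the ones named in this section.

```shell
# build_edta_args: hypothetical helper, not part of EDTA.
#   $1  genome FASTA (required)
#   $2  curated TE library (optional, may be empty)
#   $3  BED file of regions to exclude from masking (optional, may be empty)
# The body runs in a subshell, so its variables do not leak into the caller.
build_edta_args() (
    genome=$1
    curatedlib=${2:-}
    exclude_bed=${3:-}

    # --anno 1 is always set here because --exclude depends on it.
    set -- --genome "$genome" --sensitive 1 --anno 1
    if [ -n "$curatedlib" ]; then
        set -- "$@" --curatedlib "$curatedlib"
    fi
    if [ -n "$exclude_bed" ]; then
        set -- "$@" --exclude "$exclude_bed"
    fi
    printf '%s\n' "$@"
)
```

For example, `build_edta_args genome.fa curated.fa` prints the base options followed by `--curatedlib` and `curated.fa`; the file names here are placeholders.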

Running

  mkdir test
  cd test
  cp /home/train/public_data/genome/Arabidopsis_thaliana/Arabidopsis_thaliana.genome.fa .
  singularity exec /home/shwzhao/bin/SingularityDir/EDTA.sif EDTA.pl --genome ./Arabidopsis_thaliana.genome.fa --overwrite 0 --sensitive 1 --anno 1 --threads 20

A second run on a polished assembly, calling EDTA.pl directly and redirecting stdout and stderr to log files, followed by a listing of the working directory after the run:

  genome=/pfs/proj/nobackup/fs/projnb10/hpc2nstor2024-021/shwzhao/01_research/03_assemble2/6_polish/genome.all.polished.fa
  EDTA.pl \
  --genome "${genome}" \
  --overwrite 1 \
  --sensitive 1 \
  --anno 1 \
  --threads 50 \
  1> edta.out \
  2> edta.err
  ls -lh
  ## total 3.2G
  ## -rw-rw----+ 1 shwzhao ps30521 9.4K Oct 23 10:06 edta.err
  ## -rw-rw----+ 1 shwzhao ps30521 2.0K Oct 23 11:43 edta.out
  ## lrwxrwxrwx 1 shwzhao ps30521 120 Oct 20 11:39 genome.all.polished.fa -> /pfs/proj/nobackup/fs/projnb10/hpc2nstor2024-021/shwzhao/01_research/03_assemble2/6_polish/genome.all.polished.fa
  ## -rw-rw----+ 1 shwzhao ps30521 752M Oct 20 11:39 genome.all.polished.fa.mod
  ## drwxrws---+ 2 shwzhao ps30521 4.0K Oct 23 11:43 genome.all.polished.fa.mod.EDTA.anno
  ## drwxrws---+ 2 shwzhao ps30521 20K Oct 21 00:38 genome.all.polished.fa.mod.EDTA.combine
  ## drwxrws---+ 3 shwzhao ps30521 12K Oct 23 10:06 genome.all.polished.fa.mod.EDTA.final
  ## -rw-rw----+ 1 shwzhao ps30521 3.9M Oct 23 10:06 genome.all.polished.fa.mod.EDTA.intact.gff3
  ## drwxrws---+ 5 shwzhao ps30521 4.0K Oct 21 00:23 genome.all.polished.fa.mod.EDTA.raw
  ## -rw-rw----+ 1 shwzhao ps30521 121M Oct 23 11:43 genome.all.polished.fa.mod.EDTA.TEanno.gff3
  ## -rw-rw----+ 1 shwzhao ps30521 288K Oct 23 11:43 genome.all.polished.fa.mod.EDTA.TEanno.sum
  ## -rw-rw----+ 1 shwzhao ps30521 8.5M Oct 23 10:03 genome.all.polished.fa.mod.EDTA.TElib.fa
  ## -rw-rw----+ 1 shwzhao ps30521 752M Oct 23 11:43 genome.all.polished.fa.mod.MAKER.masked
  ## -rw-rw----+ 1 shwzhao ps30521 440 Oct 20 11:29 run.sh
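
Because a run like this takes days, a quick check that the main result files exist and are non-empty can be useful before moving on. The following is a minimal sketch; `check_edta_outputs` is a hypothetical helper, and the file suffixes are copied from the listing above rather than from any official specification of EDTA's outputs.

```shell
# check_edta_outputs: hypothetical helper, not part of EDTA.
#   $1  genome file name as it appears in the working directory
#       (for the listing above: genome.all.polished.fa)
# Prints one status line per expected file and exits non-zero if any is
# missing or empty. The body runs in a subshell, so "exit" only ends the check.
check_edta_outputs() (
    prefix="$1.mod"
    missing=0
    for suffix in EDTA.TElib.fa EDTA.intact.gff3 EDTA.TEanno.gff3 EDTA.TEanno.sum MAKER.masked; do
        f="$prefix.$suffix"
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    exit "$missing"
)
```

Run it from the output directory, for example `check_edta_outputs genome.all.polished.fa`.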

Visualization

To be added later. For now, the following extracts the annotation columns into a TSV file:

  grep -v '^#' Arabidopsis_thaliana.genome.fa.mod.EDTA.TEanno.gff3 | cut -f 1-5,7 | sed '1i seqid\tsource\tsequence_ontology\tstart\tend\tstrand' > ara.edta.tsv
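
As a first look at the table before plotting, the features can be tallied per TE type. This is a sketch under the assumption that the input has the six columns and header line produced above; `count_by_type` is a hypothetical helper name. The third output column sums feature lengths, so overlapping or nested features are counted more than once and the value is not a genome coverage estimate.

```shell
# count_by_type: hypothetical helper, not part of EDTA.
#   $1  TSV with a header line and the columns
#       seqid, source, sequence_ontology, start, end, strand
# Prints: type <TAB> number of features <TAB> summed feature length (bp)
count_by_type() {
    awk -F '\t' '
        NR > 1 { n[$3]++; bp[$3] += $5 - $4 + 1 }
        END    { for (t in n) printf "%s\t%d\t%d\n", t, n[t], bp[t] }
    ' "$1" | LC_ALL=C sort
}
```

For the file written above this would be called as `count_by_type ara.edta.tsv`.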

References

GitHub: https://github.com/oushujun/EDTA

Paper: Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline (Ou et al., Genome Biology, 2019)

bilibili | CGM session 79: talk by Dr. Shujun Ou

xuzhougeng | With this tool, genome repeat annotation is done in one go