References

https://ftp.ncbi.nih.gov/pub/taxonomy/

Tools | R | taxonomizr

Tools | R | taxize

https://cran.r-project.org/web/packages/myTAI/vignettes/Taxonomy.html

SQL

1. Introduction

taxonomizr provides some simple functions to parse NCBI taxonomy files and accession dumps and efficiently use them to assign taxonomy to accession numbers or taxonomic IDs. This is useful, for example, for assigning taxonomy to BLAST results. Everything runs locally once the appropriate files have been downloaded from NCBI using the included functions (see below).

2. Commands

2.1 Main functions

  • prepareDatabase: download data from NCBI and prepare SQLite database
  • accessionToTaxa: convert accession numbers to taxonomic IDs
  • getTaxonomy: convert taxonomic IDs to taxonomy
  library(taxonomizr)
  # Note: processing all the data from NCBI requires a lot of hard drive space, bandwidth and time
  # prepareDatabase('accessionTaxa.sql')
  blastAccessions <- c("Z17430.1","Z17429.1","X62402.1")
  ids <- accessionToTaxa(blastAccessions,'/home/xmyan/TPS_SynNet/20221031/TPS_identified/NR/Species/accessionTaxa.sql')
  getTaxonomy(ids,'/home/xmyan/TPS_SynNet/20221031/TPS_identified/NR/Species/accessionTaxa.sql')

2.2 Other functions

  • getId: convert a biological name to taxonomic ID
  • getRawTaxonomy: find all taxonomic ranks for a taxonomic ID
  • normalizeTaxa: combine raw taxonomies with different taxonomic ranks
  • condenseTaxa: condense a set of taxa to their most specific common branch
  • makeNewick: generate a Newick formatted tree from taxonomic output
  • getAccessions: find accessions for a given taxonomic ID

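# `a` is not defined in this post; it is assumed to be a data frame whose column V1 holds the biological names to look up
taxaId <- getId(a$V1,'/home/xmyan/TPS_SynNet/20221031/TPS_identified/NR/Species/accessionTaxa.sql')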
print(taxaId)
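
Of the functions listed above, condenseTaxa and makeNewick work on an ordinary taxonomy matrix (one row per taxon, one column per rank), so they can be tried without the NCBI database. The following is a minimal sketch under that assumption: the matrix is typed in by hand for illustration rather than returned by getTaxonomy, and the variable names (taxa, condensed, newick) are made up for this example.

```r
library(taxonomizr)

# Hand-typed taxonomy matrix standing in for getTaxonomy() output
# (assumption: rows are taxa, columns are ranks from broad to specific)
taxa <- matrix(
  c("Mammalia", "Primates", "Hominidae", "Homo", "Homo sapiens",
    "Mammalia", "Primates", "Hominidae", "Pan",  "Pan troglodytes"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("class", "order", "family", "genus", "species"))
)

# Condense both rows to their most specific common branch;
# ranks below the first disagreement are expected to become NA
condensed <- condenseTaxa(taxa)
print(condensed)

# Build a Newick-formatted tree string from the same matrix
newick <- makeNewick(taxa)
print(newick)
```

With a real database, the same two calls could be applied directly to the matrix returned by getTaxonomy(ids, ...) from section 2.1.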