Any TAB-delimited file that records chromosome positions can be indexed with tabix, e.g. VCF, GFF, BED, SAM …

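The input file must be compressed with bgzip (plain gzip will not work) and sorted by chromosome, then by start position.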
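
Compression alone is not enough: tabix refuses to index a file whose data lines are not grouped by chromosome and ordered by position within each chromosome. A minimal sketch of that sorting step for a VCF-like file, assuming header lines start with '#', the sequence name is in column 1 and the position in column 2; the function name sort_for_tabix is hypothetical:

```shell
# Hypothetical helper: reorder a TAB-delimited file for tabix.
# Assumes '#' header lines, sequence name in column 1, position in column 2.
sort_for_tabix() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # stdin is read twice, so buffer it
    grep '^#' "$tmp"                  # 1) header lines first, original order
    # 2) data lines: column 1 lexically (any chromosome order is fine for
    #    tabix as long as each chromosome is contiguous), then column 2
    #    numerically.
    grep -v '^#' "$tmp" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
    rm -f "$tmp"
}
```

Usage: `sort_for_tabix < input.vcf | bgzip -@ 4 -c > input.vcf.gz`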

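  bgzip -@ 4 input.vcf                 # compress with 4 threads; replaces input.vcf with input.vcf.gz
  tabix input.vcf.gz                   # build the index (input.vcf.gz.tbi)
  tabix -l input.vcf.gz                # list chromosomes
  tabix input.vcf.gz X                 # show one chromosome
  tabix input.vcf.gz 2:1000-1000       # show a single position
  tabix input.vcf.gz 2:1000-2000       # show an interval (1-based, inclusive)
  tabix -T target.bed input.vcf.gz     # regions from a file; streams through the whole file
  tabix -R region.bed input.vcf.gz     # regions from a file; jumps to them via the index

With both -T and -R, a region file whose name ends in .bed or .bed.gz is read as BED (0-based start, half-open); other region files are read as 1-based and inclusive (CHROM POS or CHROM BEG END).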
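
Region strings on the command line, such as 2:1000-2000, are 1-based and inclusive, whereas BED intervals are 0-based and half-open, which makes off-by-one errors easy. A minimal sketch of the conversion, assuming a TAB-separated BED file with at least three columns; the function name bed_to_regions is hypothetical:

```shell
# Hypothetical helper: BED lines (0-based, half-open) -> tabix region
# strings (1-based, inclusive), i.e. chrom:(start+1)-end.
bed_to_regions() {
    awk -F '\t' '
        /^(#|track|browser)/ { next }                 # skip BED header-like lines
        NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }
    '
}
```

Usage: `tabix input.vcf.gz $(bed_to_regions < region.bed)`. For many regions, passing the file with -R avoids a very long command line.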

Arbitrary TSV files: describe the columns with the indexing options.

  $ tabix
  Version: 1.9
  Usage:   tabix [OPTIONS] [FILE] [REGION [...]]

  Indexing Options:
     -0, --zero-based           coordinates are zero-based
     -b, --begin INT            column number for region start [4]
     -c, --comment CHAR         skip comment lines starting with CHAR [null]
     -C, --csi                  generate CSI index for VCF (default is TBI)
     -e, --end INT              column number for region end (if no end, set INT to -b) [5]
     -f, --force                overwrite existing index without asking
     -m, --min-shift INT        set minimal interval size for CSI indices to 2^INT [14]
     -p, --preset STR           gff, bed, sam, vcf
     -s, --sequence INT         column number for sequence names (suppressed by -p) [1]
     -S, --skip-lines INT       skip first INT lines [0]

  Querying and other options:
     -h, --print-header         print also the header lines
     -H, --only-header          print only the header lines
     -l, --list-chroms          list chromosome names
     -r, --reheader FILE        replace the header with the content of FILE
     -R, --regions FILE         restrict to regions listed in the file
     -T, --targets FILE         similar to -R but streams rather than index-jumps
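
With -s and -b you tell tabix which columns hold the sequence name and the start position, and the data must be sorted by exactly those columns: each sequence in one contiguous block, starts non-decreasing within it. A hypothetical pre-flight check (check_sorted is not part of tabix), assuming '#' comment lines and 1-based column numbers as with -s/-b:

```shell
# Hypothetical check: is a TAB-delimited file ordered the way tabix needs?
# Usage: check_sorted SEQCOL BEGCOL < file.tsv   (exit status 0 = OK)
check_sorted() {
    awk -F '\t' -v s="$1" -v b="$2" '
        /^#/ { next }                          # assume "#" comment lines
        {
            chrom = $s ""; pos = $b + 0        # force string / number
            if (chrom != prev) {
                if (chrom in seen) {           # sequence reappears later
                    printf "line %d: %s is not contiguous\n", NR, chrom > "/dev/stderr"
                    bad = 1; exit
                }
                seen[chrom] = 1; prev = chrom; last = pos
            } else if (pos < last) {           # start went backwards
                printf "line %d: %s:%d after %d\n", NR, chrom, pos, last > "/dev/stderr"
                bad = 1; exit
            } else {
                last = pos
            }
        }
        END { exit bad }
    '
}
```

Usage: `check_sorted 1 2 < input.tsv && bgzip -c input.tsv > input.tsv.gz`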

Interval files (e.g. BED files, UCSC database tables)

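  #chrom start end name score
  1 1 2 aaa 1
  tabix -0 -c '#' -s 1 -b 2 -e 3 input.tsv.gz

Columns are separated by TABs. The end column is set with -e (-c is the comment character). BED and UCSC tables use 0-based start coordinates, hence -0: the row above is found by the query 1:2-2. For a standard BED file, -p bed sets the same options in one go.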

Single-position files (e.g. VCF, ANNOVAR annotation output)

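  chrom pos anno
  1 111 hello
  tabix -S 1 -s 1 -b 2 -e 2 input.tsv.gz

Giving -e the same column as -b marks every line as a single position. -S 1 skips the one header line. -c only looks at the first character of its argument: -c 'chrom' behaves like -c c and would also drop data lines on chromosomes named like chr1.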
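
A 1-based position p is the same base as the BED interval from p-1 to p, so a position file can also be turned into BED and indexed with the bed preset. A minimal awk sketch, assuming exactly one header line and chrom/pos in columns 1 and 2; the function name pos_to_bed is hypothetical:

```shell
# Hypothetical helper: 1-based position file -> BED (0-based start, end = pos).
pos_to_bed() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { next }                                   # drop the header line
        {
            rest = ""
            for (i = 3; i <= NF; i++) rest = rest OFS $i  # carry extra columns
            print $1, $2 - 1, $2 rest                     # start = pos - 1, end = pos
        }
    '
}
```

Usage: `pos_to_bed < input.tsv | bgzip -c > input.bed.gz && tabix -p bed input.bed.gz`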