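Any TAB-delimited file that carries chromosome positions can be indexed, e.g. VCF, GFF, BED, SAM ...
The input must be compressed with bgzip (plain gzip will not work) and sorted by chromosome and start position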
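bgzip -@ 4 input.vcf # compress with 4 threads, producing input.vcf.gz
tabix input.vcf.gz # build the index (input.vcf.gz.tbi)
tabix -l input.vcf.gz # list chromosome names
tabix input.vcf.gz X # query a whole chromosome
tabix input.vcf.gz 2:1000-1000 # query a single position
tabix input.vcf.gz 2:1000-2000 # query an interval (1-based, inclusive)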
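tabix -T target.bed input.vcf.gz # regions from a file; streams through the whole input instead of jumping via the index
tabix -R region.bed input.vcf.gz # regions from a file; jumps via the index
# For both -R and -T: a file named *.bed is read as 0-based, half-open BED; any other TAB-delimited CHROM, POS[, END] file is read as 1-based, inclusive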
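BED files use 0-based, half-open coordinates, while REGION arguments such as 2:1000-2000 are 1-based and inclusive, so the two are easy to mix up. The sketch below shows the conversion. The function name bed_to_regions is hypothetical, not part of tabix, and the input is assumed to be a plain TAB-delimited BED stream.

```shell
# bed_to_regions (hypothetical helper): read BED lines on stdin and print
# 1-based, inclusive region strings (chrom:start-end) for use as tabix REGION arguments.
bed_to_regions() {
    # BED start is 0-based, so add 1. BED end is exclusive, which is the same
    # number as the 1-based inclusive end, so it is printed unchanged.
    awk 'BEGIN { FS = "\t" }
         /^#/ || /^track/ || /^browser/ || NF < 3 { next }
         { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# The BED interval "2 999 2000" covers 1-based positions 1000..2000.
printf '2\t999\t2000\n' | bed_to_regions   # prints 2:1000-2000
```

Each printed string can be passed as a REGION argument, e.g. tabix input.vcf.gz 2:1000-2000.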
Arbitrary TSV files
$ tabix
Version: 1.9
Usage: tabix [OPTIONS] [FILE] [REGION [...]]
Indexing Options:
-0, --zero-based coordinates are zero-based
-b, --begin INT column number for region start [4]
-c, --comment CHAR skip comment lines starting with CHAR [null]
-C, --csi generate CSI index for VCF (default is TBI)
-e, --end INT column number for region end (if no end, set INT to -b) [5]
-f, --force overwrite existing index without asking
-m, --min-shift INT set minimal interval size for CSI indices to 2^INT [14]
-p, --preset STR gff, bed, sam, vcf
-s, --sequence INT column number for sequence names (suppressed by -p) [1]
-S, --skip-lines INT skip first INT lines [0]
Querying and other options:
-h, --print-header print also the header lines
-H, --only-header print only the header lines
-l, --list-chroms list chromosome names
-r, --reheader FILE replace the header with the content of FILE
-R, --regions FILE restrict to regions listed in the file
-T, --targets FILE similar to -R but streams rather than index-jumps
Interval files (e.g. BED files, UCSC database tables)
#chrom start end name score
1 1 2 aaa 1
tabix -c '#' -s 1 -b 2 -e 3 input.tsv.gz # add -0 if start is 0-based, as in BED and UCSC tables
Position files (e.g. VCF, ANNOVAR annotation output)
chrom pos anno
1 111 hello
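tabix -S 1 -s 1 -b 2 -e 2 input.tsv.gz # -S 1 skips the header line; -c takes a single character, so -c 'chrom' would skip every line starting with 'c', including chr1 records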
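tabix expects records to be sorted by sequence name and start position before compression. For a TSV with a single header line, like the example above, one way to prepare it is sketched below. The function name sort_tsv_keep_header is hypothetical, and the layout is assumed to be column 1 = chromosome, column 2 = position.

```shell
# sort_tsv_keep_header (hypothetical helper): print the first line of a TSV
# unchanged, then the remaining lines sorted by column 1 (text) and column 2 (numeric).
sort_tsv_keep_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Small demonstration on a temporary file with unsorted records.
tmp=$(mktemp)
printf 'chrom\tpos\tanno\n2\t5\tb\n1\t111\thello\n1\t20\ta\n' > "$tmp"
sort_tsv_keep_header "$tmp"
rm -f "$tmp"
```

The sorted stream can then be compressed with sort_tsv_keep_header input.tsv | bgzip -c > input.tsv.gz and indexed with tabix -S 1 -s 1 -b 2 -e 2 input.tsv.gz.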