Object Detection - Common Dataset Formats for Object Detection - AICV Lab (《AICV实验室》)

Pascal VOC
- Directory structure
- Annotation format
COCO
Pascal VOC to COCO
COCO to Pascal VOC

Pascal VOC

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar

Directory structure

The VOC dataset comes in two main releases, VOC2007 and VOC2012, both covering the same 20 object classes:

VOC_CLASSES = (  # always index 0
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')
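Training code usually turns these names into integer labels. The sketch below assumes zero-based labels in the order listed, with no background entry; some frameworks instead reserve 0 for background and shift every class up by one, so check what your training code expects. `class_to_ind` and `ind_to_class` are illustrative names, not part of any VOC toolkit.

```python
# The 20 Pascal VOC classes, in the conventional order
VOC_CLASSES = (
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')

# Hypothetical mapping: class name -> zero-based integer label (no background)
class_to_ind = {name: i for i, name in enumerate(VOC_CLASSES)}
# Reverse mapping, useful for turning predicted labels back into names
ind_to_class = {i: name for name, i in class_to_ind.items()}

print(class_to_ind['dog'])  # 11
```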

Taking VOC2007 as an example, the overall file layout is:

VOC2007/
├── Annotations
│   ├── 000001.xml
│   ├── 000002.xml
│   ├── 000003.xml
│   └── ...
│
├── ImageSets
│   ├── Layout
│   │   ├── test.txt
│   │   ├── train.txt
│   │   ├── trainval.txt
│   │   └── val.txt
│   ├── Main
│   │   ├── test.txt
│   │   ├── train.txt
│   │   ├── trainval.txt
│   │   ├── val.txt
│   │   └── ...  (per-class lists such as aeroplane_train.txt)
│   └── Segmentation
│       ├── test.txt
│       ├── train.txt
│       ├── trainval.txt
│       └── val.txt
│
├── JPEGImages
│   ├── 000001.jpg
│   ├── 000002.jpg
│   ├── 000003.jpg
│   └── ...
│
├── SegmentationClass
└── SegmentationObject

Annotation format

Annotations: this folder holds the annotation files produced by the labeling process, one XML file per image.

Taking 000001.xml as an example:

<annotation>
    <folder>VOC2007</folder>
    <filename>000001.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>341012865</flickrid>
    </source>
    <owner>
        <flickrid>Fried Camels</flickrid>
        <name>Jinky the Fruit Bat</name>
    </owner>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dog</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>8</xmin>
            <ymin>12</ymin>
            <xmax>352</xmax>
            <ymax>498</ymax>
        </bndbox>
    </object>
</annotation>
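To make the structure above concrete, here is a minimal parsing sketch using Python's standard `xml.etree.ElementTree`. It takes the XML as a string so it can be tried directly; for a file on disk, `ET.parse(path).getroot()` gives the same root element. The function name `parse_voc_xml` and the shape of the returned dict are our own choices, not part of any VOC toolkit.

```python
import xml.etree.ElementTree as ET


def _int_or_zero(node):
    # Be lenient if a file omits <truncated>/<difficult>: treat a missing tag as 0
    return 0 if node is None else int(node.text)


def parse_voc_xml(xml_text):
    """Parse one VOC annotation (XML text) into a plain dict.

    Boxes are returned exactly as stored: [xmin, ymin, xmax, ymax] in pixels.
    """
    root = ET.fromstring(xml_text)
    size = root.find('size')
    objects = []
    for obj in root.findall('object'):
        bb = obj.find('bndbox')
        objects.append({
            'name': obj.find('name').text,
            'truncated': _int_or_zero(obj.find('truncated')),
            'difficult': _int_or_zero(obj.find('difficult')),
            # float() first so that values written as "48.0" are accepted too
            'bbox': [int(float(bb.find(tag).text))
                     for tag in ('xmin', 'ymin', 'xmax', 'ymax')],
        })
    return {
        'filename': root.find('filename').text,
        'width': int(size.find('width').text),
        'height': int(size.find('height').text),
        'objects': objects,
    }


# Usage on a trimmed copy of 000001.xml (only the first object kept)
sample = """<annotation>
    <filename>000001.jpg</filename>
    <size><width>353</width><height>500</height><depth>3</depth></size>
    <object>
        <name>dog</name><truncated>1</truncated><difficult>0</difficult>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
</annotation>"""
ann = parse_voc_xml(sample)
print(ann['objects'][0]['bbox'])  # [48, 240, 195, 371]
```

The `difficult` flag is commonly used to leave hard objects out of VOC-style evaluation, which is why it is worth keeping alongside the box.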

JPEGImages: this folder holds the original images.

(Note: the Annotations and JPEGImages folders contain the same number of files, and the file names correspond one to one; only the extensions differ.)
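Because images and annotations are paired only by file name, a quick check for orphans can save debugging time, especially on custom VOC-style datasets. A sketch with hypothetical helper names; `voc_root` is assumed to be the folder that contains `JPEGImages` and `Annotations` (typically `VOCdevkit/VOC2007` after extracting the archives above).

```python
from pathlib import Path


def unmatched(image_files, xml_files):
    """Compare two collections of file names by stem (name without suffix).

    Returns (stems with an image but no XML, stems with an XML but no image).
    """
    img_stems = {Path(f).stem for f in image_files}
    xml_stems = {Path(f).stem for f in xml_files}
    return sorted(img_stems - xml_stems), sorted(xml_stems - img_stems)


def check_voc_pairs(voc_root):
    """Run unmatched() on the JPEGImages and Annotations folders of a VOC root."""
    root = Path(voc_root)
    return unmatched((p.name for p in (root / 'JPEGImages').glob('*.jpg')),
                     (p.name for p in (root / 'Annotations').glob('*.xml')))
```

On an intact VOC2007 download, `check_voc_pairs('VOCdevkit/VOC2007')` should return two empty lists (assuming that extraction path).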

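ImageSets: this folder holds plain-text lists of image IDs. For detection, the part that matters is ImageSets/Main, which contains train.txt, val.txt, trainval.txt and test.txt (the training, validation, train+val and test splits) plus per-class lists. Each line is an image file name without its extension; open one of the files to see.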

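Taking train.txt as an example, each line holds one six-digit image ID; an ID such as 000001 refers to JPEGImages/000001.jpg and Annotations/000001.xml.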
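Reading a split list takes only a few lines. The sketch below (hypothetical helper names; `voc_root` is assumed to point at the VOC2007 folder) returns the IDs from a split file and maps each ID to its image and annotation paths.

```python
from pathlib import Path


def parse_split(text):
    """Return the image IDs listed in the text of an ImageSets/Main file.

    Only the first column is kept, so this also works for per-class lists such as
    aeroplane_train.txt, whose lines carry a second column with a 1/0/-1 flag.
    """
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def read_split(voc_root, split='train'):
    """Read <voc_root>/ImageSets/Main/<split>.txt and return its image IDs."""
    path = Path(voc_root) / 'ImageSets' / 'Main' / f'{split}.txt'
    return parse_split(path.read_text())


def sample_paths(voc_root, image_id):
    """Map one ID to its (image, annotation) paths in the VOC layout."""
    root = Path(voc_root)
    return (root / 'JPEGImages' / f'{image_id}.jpg',
            root / 'Annotations' / f'{image_id}.xml')
```

For example, `read_split('VOCdevkit/VOC2007', 'train')` should return the training IDs, assuming that extraction path.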

SegmentationClass and SegmentationObject: folders for the segmentation task; they are not used for object detection.

VOC2007 contains 9,963 images in total: 2,501 for training, 2,510 for validation (together the 5,011-image trainval set) and 4,952 for testing.

COCO

About the dataset

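Dataset name: COCO (Common Objects in Context), a large-scale image dataset
Publisher: Microsoft
Official site: http://cocodataset.org/#
Download: https://hyper.ai/datasets/4909
Overview: COCO is a large-scale image dataset designed for computer vision tasks such as object detection, segmentation, person keypoint detection, stuff segmentation and image captioning. It targets scene understanding: the images are mostly taken from complex everyday scenes, and objects are localized with precise segmentation masks.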

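Key features of COCO: object segmentation, recognition in context, superpixel stuff segmentation, 330K images (more than 200K of them labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, and 250,000 people with keypoints.
Microsoft released COCO in 2014, and it has since become a standard benchmark for image captioning.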

Directory structure

COCO/
├── annotations
│   ├── captions_train2014.json
│   ├── captions_val2014.json
│   ├── image_info_test2014.json
│   ├── instances_minival2014.json
│   ├── instances_train2014.json
│   ├── instances_val2014.json
│   ├── instances_valminusminival2014.json
│   ├── person_keypoints_minival2014.json
│   ├── person_keypoints_train2014.json
│   ├── person_keypoints_val2014.json
│   └── person_keypoints_valminusminival2014.json
│
├── train2014
│   ├── COCO_train2014_000000000009.jpg
│   ├── COCO_train2014_000000000025.jpg
│   ├── COCO_train2014_000000000030.jpg
│   ├── COCO_train2014_000000000034.jpg
│   ├── COCO_train2014_000000000036.jpg
│   ├── COCO_train2014_000000000049.jpg
│   ├── COCO_train2014_000000000061.jpg
│   └── ...
│
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   ├── COCO_val2014_000000000073.jpg
│   ├── COCO_val2014_000000000074.jpg
│   ├── COCO_val2014_000000000133.jpg
│   ├── COCO_val2014_000000000136.jpg
│   ├── COCO_val2014_000000000139.jpg
│   └── ...
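The listing suggests a fixed naming scheme for the 2014 images: `COCO_<split>_` followed by the image ID zero-padded to 12 digits, which matches the `file_name`/`id` pair in the `images` example below ("COCO_val2014_000000397133.jpg" with id 397133). A small sketch converting in both directions; the function names are illustrative, and the scheme is inferred from these examples rather than from an official specification.

```python
def coco2014_file_name(split, image_id):
    """Build a COCO 2014 file name: COCO_<split>_<12-digit zero-padded id>.jpg."""
    return f'COCO_{split}_{image_id:012d}.jpg'


def coco2014_image_id(file_name):
    """Inverse: recover the integer image id from a COCO 2014 file name."""
    stem = file_name.rsplit('.', 1)[0]     # drop the ".jpg" extension
    return int(stem.rsplit('_', 1)[1])     # last "_"-separated field is the id


print(coco2014_file_name('val2014', 397133))  # COCO_val2014_000000397133.jpg
```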

Annotation format

Taking instances_minival2014.json as an example, the main fields are:

info

"info": 
  {
      "description": "This is stable 1.0 version of the 2014 MS COCO dataset.",
      "url": "http:\/\/mscoco.org",
      "version": "1.0",
      "year": 2014,
      "contributor": "Microsoft COCO group"
  }

licenses

"licenses": [
  {
  "url" : str,   # license URL
  "id" : int,   # license ID
  "name" : str,   # license name
  },
  ....
  {
  "url": "http:\/\/creativecommons.org\/licenses\/by-nc-sa\/2.0\/",
  "id": 1,
  "name": "Attribution-NonCommercial-ShareAlike License"
  }
]

images

"images": [
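  {
      "license" : int,   # ID of the license this image is released under
      "url" : str,   # image URL (a Flickr link in this release)
      "file_name" : str,   # file name
      "height" : int,  # image height
      "width" : int,   # image width
      "date_captured" : datetime, # date the photo was taken
      "id" : int,   # image ID (must be unique; may start from 0)
  },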
  {
      "license": 4,
      "url": "http:\/\/farm7.staticflickr.com\/6116\/6255196340_da26cf2c9e_z.jpg",
      "file_name": "COCO_val2014_000000397133.jpg",
      "height": 427,
      "width": 640,
      "date_captured": "2013-11-14 17:02:52",
      "id": 397133
  },
  ...
]

type

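annotations

"annotations": [
  {
      "segmentation" : RLE or [polygon],  # segmentation data: RLE, or a list of polygons
      "area" : float,  # area of the segmentation mask, in pixels
      "iscrowd" : 0 or 1,  # 1 = one region covering a crowd of objects (segmentation as RLE); 0 = a single object (polygons); usually 0
      "image_id" : int,  # ID of the image this annotation belongs to
      "bbox" : [x,y,width,height],  # bounding box: top-left corner (x, y) plus width and height, in pixels
      "category_id" : int,  # category ID
      "id" : int,  # annotation ID
  },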
  },
  {
      "segmentation": [[510.66,
                        423.01,
                        ......
                        510.03,
                        423.01,
                        510.45,
                        423.01]],
      "area": 702.10575,
      "iscrowd": 0,
      "image_id": 289343,
      "bbox": [473.07,
               395.93,
               38.65,
               28.67],
      "category_id": 18,
      "id": 1768
  },
  ....
]
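COCO stores boxes as `[x, y, width, height]`, while VOC stores corner coordinates. A minimal conversion sketch (hypothetical helper names), using the continuous-coordinate convention, i.e. without the `+1` that some VOC tools, including the converter further down, add to widths and heights:

```python
def coco_to_xyxy(bbox):
    """COCO [x, y, width, height] -> corner form [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_coco(box):
    """Corner form [xmin, ymin, xmax, ymax] -> COCO [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# The bbox of the sample annotation above (id 1768)
print(coco_to_xyxy([473.07, 395.93, 38.65, 28.67]))
# The dog box from VOC 000001.xml, converted without the +1 convention
print(xyxy_to_coco([48, 240, 195, 371]))  # [48, 240, 147, 131]
```

The sample annotation also shows that `area` is not simply the box area: 38.65 × 28.67 ≈ 1108.1, whereas its `area` field is 702.1, the area of the segmentation mask.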

categories

Information about each category:

"categories":[
    {
        "supercategory" : str, # parent category, e.g. a pug and a Pomeranian both belong to the broader "dog" category
        "id" : int,  # category ID
        "name" : str,  # category name
    },
    .....
    {
        "supercategory": "person",
        "id": 1,
        "name": "person"
    }
]
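The three lists are linked by IDs: each annotation's `image_id` points into `images`, and its `category_id` points into `categories`. The sketch below builds the usual lookup tables with the standard library only; pycocotools' `COCO` class builds similar indexes (`imgs`, `cats`, `imgToAnns`) if you prefer a library. `index_coco` and the toy data are illustrative.

```python
from collections import defaultdict


def index_coco(dataset):
    """Build lookup tables for a COCO-style dict loaded with json.load().

    Returns (image id -> image record, category id -> category name,
             image id -> list of that image's annotations).
    """
    images = {img['id']: img for img in dataset['images']}
    cat_names = {cat['id']: cat['name'] for cat in dataset['categories']}
    anns_by_image = defaultdict(list)
    for ann in dataset['annotations']:
        anns_by_image[ann['image_id']].append(ann)
    return images, cat_names, anns_by_image


# Toy dataset: the image and category records mirror the examples above;
# the annotation (id 1 and its bbox/area values) is made up for illustration
toy = {
    'images': [{'id': 397133, 'file_name': 'COCO_val2014_000000397133.jpg',
                'height': 427, 'width': 640}],
    'categories': [{'supercategory': 'person', 'id': 1, 'name': 'person'}],
    'annotations': [{'id': 1, 'image_id': 397133, 'category_id': 1,
                     'bbox': [10.0, 20.0, 30.0, 40.0], 'area': 900.0,
                     'iscrowd': 0, 'segmentation': []}],
}
images, cat_names, anns_by_image = index_coco(toy)
for ann in anns_by_image[397133]:
    print(images[397133]['file_name'], cat_names[ann['category_id']], ann['bbox'])
```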

Pascal VOC to COCO

# Converts Pascal VOC XML annotations into a single COCO-style JSON file.
# Needs Pillow and tqdm (pip install pillow tqdm). The original version saved with
# mmcv.dump; mmcv 2.x moved that helper to mmengine, so the standard json module is used.
import json
import os
import os.path as osp
import xml.etree.ElementTree as ET
from glob import glob

from PIL import Image
from tqdm import tqdm


# Modify this to match your own dataset
def underwater_classes():
    return ['holothurian', 'echinus', 'scallop', 'starfish']


# Category IDs start at 1
label_ids = {name: i + 1 for i, name in enumerate(underwater_classes())}


def get_segmentation(points):
    """Turn a box [x, y, w, h] into COCO polygon format: a list holding one
    polygon made of the box's four corners (x1, y1, x2, y2, ...)."""
    x, y, w, h = points
    return [[x, y, x + w, y, x + w, y + h, x, y + h]]


def parse_xml(xml_path, img_id, anno_id):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    annotation = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        if name == 'waterweeds':  # class deliberately skipped in this dataset
            continue
        category_id = label_ids[name]
        bnd_box = obj.find('bndbox')
        # int(float(...)) also accepts coordinates written as "48.0"
        xmin = int(float(bnd_box.find('xmin').text))
        ymin = int(float(bnd_box.find('ymin').text))
        xmax = int(float(bnd_box.find('xmax').text))
        ymax = int(float(bnd_box.find('ymax').text))
        # VOC boxes are inclusive pixel ranges, hence the +1
        w = xmax - xmin + 1
        h = ymax - ymin + 1
        annotation.append({
            "segmentation": get_segmentation([xmin, ymin, w, h]),
            "area": w * h,
            "iscrowd": 0,
            "image_id": img_id,
            "bbox": [xmin, ymin, w, h],
            "category_id": category_id,
            "id": anno_id,
            "ignore": 0})
        anno_id += 1
    return annotation, anno_id


def cvt_annotations(img_dir, xml_dir, out_file):
    images = []
    annotations = []
    img_id = 1
    anno_id = 1
    # sorted() keeps image/annotation IDs reproducible across runs
    for img_path in tqdm(sorted(glob(osp.join(img_dir, '*.jpg')))):
        with Image.open(img_path) as im:
            w, h = im.size
        img_name = osp.basename(img_path)
        images.append({"file_name": img_name, "height": h, "width": w, "id": img_id})
        # splitext keeps file names that contain extra dots intact
        xml_file_path = osp.join(xml_dir, osp.splitext(img_name)[0] + '.xml')
        annos, anno_id = parse_xml(xml_file_path, img_id, anno_id)
        annotations.extend(annos)
        img_id += 1
    categories = [{"name": k, "id": v} for k, v in label_ids.items()]
    final_result = {"images": images, "annotations": annotations, "categories": categories}
    os.makedirs(osp.dirname(out_file) or '.', exist_ok=True)
    with open(out_file, 'w') as f:
        json.dump(final_result, f)
    return annotations


def main():
    xml_path = 'F:/jupyter/Underwater_detection/data/train/box'    # folder containing the XML files
    img_path = 'F:/jupyter/Underwater_detection/data/train/image'  # folder containing the images
    print('processing {} ...'.format("xml format annotations"))
    # the last argument is the output JSON file
    cvt_annotations(img_path, xml_path, 'F:/jupyter/Underwater_detection/data/train/annotations/train.json')
    print('Done!')


if __name__ == '__main__':
    main()
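After a conversion it is worth checking that the output is internally consistent before training. A hedged sketch that flags duplicate IDs, dangling `image_id`/`category_id` references and degenerate boxes; it does not validate segmentations, and the helper names are hypothetical.

```python
import json


def check_coco(dataset):
    """Return a list of consistency problems in a COCO-style dict (empty = none found)."""
    problems = []
    image_id_list = [img['id'] for img in dataset['images']]
    if len(image_id_list) != len(set(image_id_list)):
        problems.append('duplicate image ids')
    image_ids = set(image_id_list)
    cat_ids = {cat['id'] for cat in dataset['categories']}
    seen_ann_ids = set()
    for ann in dataset['annotations']:
        if ann['id'] in seen_ann_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_ann_ids.add(ann['id'])
        if ann['image_id'] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann['category_id'] not in cat_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        _, _, w, h = ann['bbox']
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size {w}x{h}")
    return problems


def check_coco_file(path):
    """Load a JSON file written by the converter and run check_coco() on it."""
    with open(path) as f:
        return check_coco(json.load(f))
```

For example, `check_coco_file('F:/jupyter/Underwater_detection/data/train/annotations/train.json')` should return an empty list if the converter's output is consistent.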
