Pascal VOC

    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
    tar xvf VOCtrainval_06-Nov-2007.tar
    tar xvf VOCtest_06-Nov-2007.tar
    tar xvf VOCdevkit_08-Jun-2007.tar

Directory structure

The VOC dataset has two commonly used releases, VOC2007 and VOC2012, and covers 20 object classes:

    VOC_CLASSES = (  # always index 0
        'aeroplane', 'bicycle', 'bird', 'boat',
        'bottle', 'bus', 'car', 'cat', 'chair',
        'cow', 'diningtable', 'dog', 'horse',
        'motorbike', 'person', 'pottedplant',
        'sheep', 'sofa', 'train', 'tvmonitor')
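
Detection code usually works with integer labels rather than class names. The following is a minimal sketch of such a mapping, assuming that label 0 is reserved for the background (a common convention, not something the dataset mandates). `CLASS_TO_INDEX` and `INDEX_TO_CLASS` are illustrative names.

```python
VOC_CLASSES = (
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')

# Assumption: label 0 is kept for background, so real classes start at 1.
CLASS_TO_INDEX = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}
INDEX_TO_CLASS = {i: name for name, i in CLASS_TO_INDEX.items()}

print(len(VOC_CLASSES))        # 20
print(CLASS_TO_INDEX['dog'])   # 12
print(INDEX_TO_CLASS[15])      # person
```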

Taking VOC2007 as an example, the overall file layout is:

    VOC2007/
    ├── Annotations
    │   ├── 000001.xml
    │   ├── 000002.xml
    │   ├── 000003.xml
    │   └── ...
    ├── ImageSets
    │   ├── Layout
    │   │   ├── test.txt
    │   │   ├── train.txt
    │   │   ├── trainval.txt
    │   │   └── val.txt
    │   ├── Main
    │   │   ├── test.txt
    │   │   ├── train.txt
    │   │   └── val.txt
    │   └── Segmentation
    │       ├── test.txt
    │       ├── train.txt
    │       ├── trainval.txt
    │       └── val.txt
    ├── JPEGImages
    │   ├── 000001.jpg
    │   ├── 000002.jpg
    │   ├── 000003.jpg
    │   └── ...
    ├── SegmentationClass
    └── SegmentationObject

Annotation format

  • Annotations: this folder holds the XML files produced by annotation, one per image.

Taking 000001.xml as an example:

    <annotation>
        <folder>VOC2007</folder>
        <filename>000001.jpg</filename>
        <source>
            <database>The VOC2007 Database</database>
            <annotation>PASCAL VOC2007</annotation>
            <image>flickr</image>
            <flickrid>341012865</flickrid>
        </source>
        <owner>
            <flickrid>Fried Camels</flickrid>
            <name>Jinky the Fruit Bat</name>
        </owner>
        <size>
            <width>353</width>
            <height>500</height>
            <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
            <name>dog</name>
            <pose>Left</pose>
            <truncated>1</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>48</xmin>
                <ymin>240</ymin>
                <xmax>195</xmax>
                <ymax>371</ymax>
            </bndbox>
        </object>
        <object>
            <name>person</name>
            <pose>Left</pose>
            <truncated>1</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>8</xmin>
                <ymin>12</ymin>
                <xmax>352</xmax>
                <ymax>498</ymax>
            </bndbox>
        </object>
    </annotation>
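
An annotation file like the one above can be read with the standard library alone. The sketch below parses an abridged copy of 000001.xml (only the fields it uses are kept). `parse_voc_xml` is an illustrative helper name, not part of any VOC toolkit.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Return (width, height, objects) parsed from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        objects.append({
            'name': obj.find('name').text,
            'difficult': int(obj.find('difficult').text),
            # Box corners in the order xmin, ymin, xmax, ymax
            'bbox': tuple(int(box.find(k).text)
                          for k in ('xmin', 'ymin', 'xmax', 'ymax')),
        })
    return width, height, objects


# Abridged version of the 000001.xml example
SAMPLE = """<annotation>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>0</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

width, height, objects = parse_voc_xml(SAMPLE)
print(width, height)
for o in objects:
    print(o['name'], o['bbox'])
```

To read a file from disk instead, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.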
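  • JPEGImages: this folder holds the original image files.

(Note: the Annotations and JPEGImages folders contain the same number of files, and the file names correspond one to one; only the extensions differ.)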
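
The one-to-one correspondence is easy to verify before training. The sketch below compares the file stems of the two folders. `unmatched_stems` is an illustrative name, and the demo runs on a temporary directory with hypothetical file names rather than on the real dataset.

```python
import os
import tempfile


def unmatched_stems(annotations_dir, images_dir):
    """Return (stems with XML but no JPEG, stems with JPEG but no XML)."""
    xml = {os.path.splitext(f)[0] for f in os.listdir(annotations_dir)
           if f.endswith('.xml')}
    jpg = {os.path.splitext(f)[0] for f in os.listdir(images_dir)
           if f.endswith('.jpg')}
    return sorted(xml - jpg), sorted(jpg - xml)


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    ann_dir = os.path.join(root, 'Annotations')
    img_dir = os.path.join(root, 'JPEGImages')
    os.makedirs(ann_dir)
    os.makedirs(img_dir)
    for name in ('000001.xml', '000002.xml'):
        open(os.path.join(ann_dir, name), 'w').close()
    for name in ('000001.jpg', '000003.jpg'):
        open(os.path.join(img_dir, name), 'w').close()
    missing_jpg, missing_xml = unmatched_stems(ann_dir, img_dir)

print(missing_jpg)  # ['000002']
print(missing_xml)  # ['000003']
```

On an intact dataset both lists should be empty.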

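  • ImageSets: this folder holds txt files that define the dataset splits, including per-class lists. The files that matter here are train.txt, test.txt and val.txt under ImageSets/Main, which divide the data into training, test and validation sets. Each file lists image file names without the extension.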

In train.txt, for example, each line holds one six-digit image ID without the file extension.
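
A split file can be turned into image and annotation paths as sketched below. `read_split` and `sample_paths` are illustrative names, and the IDs in the demo are hypothetical. Only the first token of each line is kept, because the per-class lists in ImageSets/Main carry an extra label column after the ID.

```python
import os


def read_split(lines):
    """Parse the lines of an ImageSets/Main split file into image IDs."""
    return [line.split()[0] for line in lines if line.strip()]


def sample_paths(voc_root, image_id):
    """Return the (image path, annotation path) pair for one image ID."""
    return (os.path.join(voc_root, 'JPEGImages', image_id + '.jpg'),
            os.path.join(voc_root, 'Annotations', image_id + '.xml'))


# Hypothetical file content; with a real file use:
#   with open(os.path.join(voc_root, 'ImageSets', 'Main', 'train.txt')) as f:
#       ids = read_split(f)
ids = read_split(['000012\n', '000017 -1\n', '\n'])
print(ids)
for image_id in ids:
    print(sample_paths('VOC2007', image_id))
```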

  • SegmentationClass and SegmentationObject: segmentation-related folders, not used for object detection.

VOC2007 contains 9,963 images in total: 2,501 for training, 2,510 for validation and 4,952 for testing.

COCO

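About the dataset

  • Dataset name: COCO large-scale image dataset
  • Publisher: Microsoft
  • Homepage: http://cocodataset.org/#
  • Download: https://hyper.ai/datasets/4909
  • Overview: COCO is a large-scale image dataset designed for object detection, segmentation, person keypoint detection, stuff segmentation and caption generation in computer vision. It targets scene understanding: the images are drawn mainly from complex everyday scenes, and objects are localized through precise segmentation.

COCO has the following features: object segmentation, recognition in context, superpixel stuff segmentation, 330,000 images (more than 200,000 labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, and 250,000 people with keypoints.
Microsoft released COCO in 2014, and it has since become a standard benchmark for image captioning.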

Directory structure

    COCO/
    ├── annotations
    │   ├── captions_train2014.json
    │   ├── captions_val2014.json
    │   ├── image_info_test2014.json
    │   ├── instances_minival2014.json
    │   ├── instances_train2014.json
    │   ├── instances_val2014.json
    │   ├── instances_valminusminival2014.json
    │   ├── person_keypoints_minival2014.json
    │   ├── person_keypoints_train2014.json
    │   ├── person_keypoints_val2014.json
    │   └── person_keypoints_valminusminival2014.json
    ├── train2014
    │   ├── COCO_train2014_000000000009.jpg
    │   ├── COCO_train2014_000000000025.jpg
    │   ├── COCO_train2014_000000000030.jpg
    │   ├── COCO_train2014_000000000034.jpg
    │   ├── COCO_train2014_000000000036.jpg
    │   ├── COCO_train2014_000000000049.jpg
    │   ├── COCO_train2014_000000000061.jpg
    │   └── ...
    └── val2014
        ├── COCO_val2014_000000000042.jpg
        ├── COCO_val2014_000000000073.jpg
        ├── COCO_val2014_000000000074.jpg
        ├── COCO_val2014_000000000133.jpg
        ├── COCO_val2014_000000000136.jpg
        ├── COCO_val2014_000000000139.jpg
        └── ...
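
The file names in this listing follow a visible pattern: the split name followed by the image ID zero-padded to 12 digits. The sketch below converts between ID and file name, assuming that the pattern holds for every file in the 2014 release. Both function names are illustrative.

```python
def coco2014_file_name(split, image_id):
    """Build a file name such as COCO_val2014_000000000042.jpg."""
    return 'COCO_{}_{:012d}.jpg'.format(split, image_id)


def coco2014_image_id(file_name):
    """Recover the integer image ID from a 2014-style file name."""
    stem = file_name.rsplit('.', 1)[0]
    return int(stem.rsplit('_', 1)[1])


print(coco2014_file_name('val2014', 42))                    # COCO_val2014_000000000042.jpg
print(coco2014_image_id('COCO_train2014_000000000009.jpg')) # 9
```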

Annotation format

Taking instances_minival2014.json as an example.
Main fields:

  • info

    "info":
    {
        "description": "This is stable 1.0 version of the 2014 MS COCO dataset.",
        "url": "http:\/\/mscoco.org",
        "version": "1.0",
        "year": 2014,
        "contributor": "Microsoft COCO group"
    }
  • licenses

    "licenses": [
        {
            "url": str,   # license URL
            "id": int,    # license ID
            "name": str   # license name
        },
        ...
        {
            "url": "http:\/\/creativecommons.org\/licenses\/by-nc-sa\/2.0\/",
            "id": 1,
            "name": "Attribution-NonCommercial-ShareAlike License"
        }
    ]
  • images

    "images": [
        {
            "license": int,             # ID of the license that applies
            "url": str,                 # URL of the COCO image
            "file_name": str,           # file name
            "height": int,              # image height
            "width": int,               # image width
            "date_captured": datetime,  # capture date
            "id": int                   # image ID; may start from 0
        },
        {
            "license": 4,
            "url": "http:\/\/farm7.staticflickr.com\/6116\/6255196340_da26cf2c9e_z.jpg",
            "file_name": "COCO_val2014_000000397133.jpg",
            "height": 427,
            "width": 640,
            "date_captured": "2013-11-14 17:02:52",
            "id": 397133
        },
        ...
    ]
  • type

  • annotations

    "annotations": [
        {
            "segmentation": RLE or [polygon],  # segmentation data
            "area": float,                     # area of the annotated region, in pixels
            "iscrowd": 0 or 1,                 # 1 if the annotation covers a crowd of objects (stored as RLE), otherwise 0
            "image_id": int,                   # image ID
            "bbox": [x, y, width, height],     # bounding box: top-left corner plus width and height
            "category_id": int,                # category ID
            "id": int                          # annotation ID
        },
        {
            "segmentation": [[510.66, 423.01, ..., 510.03, 423.01, 510.45, 423.01]],
            "area": 702.10575,
            "iscrowd": 0,
            "image_id": 289343,
            "bbox": [473.07, 395.93, 38.65, 28.67],
            "category_id": 18,
            "id": 1768
        },
        ...
    ]
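
Many detection pipelines expect corner coordinates rather than the `[x, y, width, height]` layout used here. The sketch below converts in both directions, using the bbox from the example annotation. It treats coordinates as continuous values and therefore applies no one-pixel offset. Both function names are illustrative.

```python
def xywh_to_xyxy(bbox):
    """[x, y, width, height] -> [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """[xmin, ymin, xmax, ymax] -> [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


bbox = [473.07, 395.93, 38.65, 28.67]   # from the example annotation
corners = xywh_to_xyxy(bbox)
print([round(v, 2) for v in corners])
print([round(v, 2) for v in xyxy_to_xywh(corners)])
```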
  • categories

Information about each category

    "categories": [
        {
            "supercategory": str,  # parent category; e.g. pug and spitz would both fall under a dog supercategory
            "id": int,             # category ID
            "name": str            # category name
        },
        ...
        {
            "supercategory": "person",
            "id": 1,
            "name": "person"
        }
    ]
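
Once loaded, these fields are usually indexed by ID so that the annotations of one image can be looked up directly. The sketch below does this with the standard `json` module on a tiny in-memory document. The image and category records come from the examples above, while the annotation is hypothetical. `index_coco` is an illustrative name. The last mapping renumbers category IDs to 0..N-1, which helps when the IDs in a file are not contiguous.

```python
import json
from collections import defaultdict


def index_coco(coco):
    """Return (images by id, category names by id, annotations per image id)."""
    images = {img['id']: img for img in coco['images']}
    names = {cat['id']: cat['name'] for cat in coco['categories']}
    per_image = defaultdict(list)
    for ann in coco['annotations']:
        per_image[ann['image_id']].append(ann)
    return images, names, per_image


# With a real file: coco = json.load(open('instances_minival2014.json'))
text = json.dumps({
    'images': [{'id': 397133, 'file_name': 'COCO_val2014_000000397133.jpg',
                'height': 427, 'width': 640}],
    'categories': [{'supercategory': 'person', 'id': 1, 'name': 'person'}],
    # Hypothetical annotation, for illustration only
    'annotations': [{'id': 1, 'image_id': 397133, 'category_id': 1,
                     'bbox': [10.0, 20.0, 30.0, 40.0], 'iscrowd': 0}],
})
coco = json.loads(text)
images, names, per_image = index_coco(coco)

for ann in per_image[397133]:
    print(images[397133]['file_name'], names[ann['category_id']], ann['bbox'])

# Map category IDs to contiguous training labels 0..N-1
cat_id_to_label = {cid: i for i, cid in enumerate(sorted(names))}
print(cat_id_to_label)
```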

Pascal VOC to COCO

    # pip install mmcv
    # Note: mmcv.dump is provided by mmcv 1.x; mmcv 2.x moved file I/O to mmengine (mmengine.dump).
    import os.path as osp
    import xml.etree.ElementTree as ET
    from glob import glob

    import mmcv
    from PIL import Image
    from tqdm import tqdm


    # Adjust to your own data
    def underwater_classes():
        return ['holothurian', 'echinus', 'scallop', 'starfish']


    # Category IDs start at 1
    label_ids = {name: i + 1 for i, name in enumerate(underwater_classes())}


    def get_segmentation(points):
        # points is [xmin, ymin, w, h]; return the four box corners as a flat polygon
        x, y, w, h = points
        return [x, y, x + w, y, x + w, y + h, x, y + h]


    def parse_xml(xml_path, img_id, anno_id):
        tree = ET.parse(xml_path)
        root = tree.getroot()
        annotation = []
        for obj in root.findall('object'):
            name = obj.find('name').text
            if name == 'waterweeds':  # skip this class
                continue
            category_id = label_ids[name]
            bnd_box = obj.find('bndbox')
            xmin = int(bnd_box.find('xmin').text)
            ymin = int(bnd_box.find('ymin').text)
            xmax = int(bnd_box.find('xmax').text)
            ymax = int(bnd_box.find('ymax').text)
            # VOC corners are inclusive pixel coordinates, hence the +1
            w = xmax - xmin + 1
            h = ymax - ymin + 1
            area = w * h
            segmentation = get_segmentation([xmin, ymin, w, h])
            annotation.append({
                # COCO polygons are a list of polygons, so wrap the single box polygon
                "segmentation": [segmentation],
                "area": area,
                "iscrowd": 0,
                "image_id": img_id,
                "bbox": [xmin, ymin, w, h],
                "category_id": category_id,
                "id": anno_id,
                "ignore": 0})
            anno_id += 1
        return annotation, anno_id


    def cvt_annotations(img_path, xml_path, out_file):
        images = []
        annotations = []
        img_id = 1
        anno_id = 1
        # sorted() keeps image IDs stable between runs
        for img_file in tqdm(sorted(glob(img_path + '/*.jpg'))):
            w, h = Image.open(img_file).size
            img_name = osp.basename(img_file)
            img = {"file_name": img_name, "height": int(h), "width": int(w), "id": img_id}
            images.append(img)
            # splitext also handles file names that contain extra dots
            xml_file_name = osp.splitext(img_name)[0] + '.xml'
            xml_file_path = osp.join(xml_path, xml_file_name)
            annos, anno_id = parse_xml(xml_file_path, img_id, anno_id)
            annotations.extend(annos)
            img_id += 1
        categories = []
        for k, v in label_ids.items():
            categories.append({"name": k, "id": v})
        final_result = {"images": images, "annotations": annotations, "categories": categories}
        mmcv.dump(final_result, out_file)
        return annotations


    def main():
        xml_path = 'F:/jupyter/Underwater_detection/data/train/box'  # folder containing the XML files
        img_path = 'F:/jupyter/Underwater_detection/data/train/image'  # folder containing the images
        print('processing {} ...'.format("xml format annotations"))
        cvt_annotations(img_path, xml_path, 'F:/jupyter/Underwater_detection/data/train/annotations/train.json')  # output file
        print('Done!')


    if __name__ == '__main__':
        main()
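
After conversion it is worth sanity-checking the result before handing it to a training framework. The sketch below looks for duplicate annotation IDs, references to unknown images or categories, and boxes that leave the image. `check_coco` is an illustrative name, and the one-pixel tolerance is an assumption made to accommodate the `+1` width convention used by the script above. The demo runs on small hand-written dictionaries, not on real converter output.

```python
def check_coco(coco, tol=1):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    sizes = {img['id']: (img['width'], img['height']) for img in coco['images']}
    cat_ids = {cat['id'] for cat in coco['categories']}
    seen = set()
    for ann in coco['annotations']:
        if ann['id'] in seen:
            problems.append('duplicate annotation id {}'.format(ann['id']))
        seen.add(ann['id'])
        if ann['image_id'] not in sizes:
            problems.append('unknown image_id {}'.format(ann['image_id']))
            continue  # cannot check the box without the image size
        if ann['category_id'] not in cat_ids:
            problems.append('unknown category_id {}'.format(ann['category_id']))
        x, y, w, h = ann['bbox']
        img_w, img_h = sizes[ann['image_id']]
        if w <= 0 or h <= 0 or x < 0 or y < 0 \
                or x + w > img_w + tol or y + h > img_h + tol:
            problems.append('bad bbox {} in annotation {}'.format(ann['bbox'], ann['id']))
    return problems


good = {
    'images': [{'id': 1, 'file_name': 'a.jpg', 'width': 353, 'height': 500}],
    'categories': [{'id': 1, 'name': 'dog'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1,
                     'bbox': [48, 240, 148, 132]}],
}
bad = dict(good, annotations=[
    {'id': 1, 'image_id': 2, 'category_id': 1, 'bbox': [0, 0, 10, 10]},
    {'id': 1, 'image_id': 1, 'category_id': 5, 'bbox': [300, 0, 100, 10]},
])

print(check_coco(good))
for problem in check_coco(bad):
    print(problem)
```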

COCO to Pascal VOC
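
A minimal sketch of the reverse direction, building one VOC-style XML document from a COCO image record and its annotations. The assumptions: images have three channels, every object is written with `difficult` set to 0, and the box corners mirror the `+1` width convention of the VOC to COCO script (so `xmax = x + w - 1`). `coco_to_voc_xml` and the `folder` default are illustrative, and fields such as `pose` and `truncated` are omitted because COCO has no direct equivalent.

```python
import xml.etree.ElementTree as ET


def coco_to_voc_xml(image, annotations, category_names, folder='VOC'):
    """Return a VOC-style annotation XML string for one COCO image."""
    root = ET.Element('annotation')
    ET.SubElement(root, 'folder').text = folder
    ET.SubElement(root, 'filename').text = image['file_name']
    size = ET.SubElement(root, 'size')
    ET.SubElement(size, 'width').text = str(image['width'])
    ET.SubElement(size, 'height').text = str(image['height'])
    ET.SubElement(size, 'depth').text = '3'  # assumption: RGB images
    for ann in annotations:
        x, y, w, h = ann['bbox']
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = category_names[ann['category_id']]
        ET.SubElement(obj, 'difficult').text = '0'  # assumption: nothing is marked difficult
        box = ET.SubElement(obj, 'bndbox')
        ET.SubElement(box, 'xmin').text = str(int(round(x)))
        ET.SubElement(box, 'ymin').text = str(int(round(y)))
        # Inclusive corner, mirroring w = xmax - xmin + 1
        ET.SubElement(box, 'xmax').text = str(int(round(x + w - 1)))
        ET.SubElement(box, 'ymax').text = str(int(round(y + h - 1)))
    return ET.tostring(root, encoding='unicode')


# Demo data: the dog box from 000001.xml expressed in COCO form
image = {'id': 1, 'file_name': '000001.jpg', 'width': 353, 'height': 500}
annotations = [{'id': 1, 'image_id': 1, 'category_id': 1,
                'bbox': [48, 240, 148, 132]}]
xml_text = coco_to_voc_xml(image, annotations, {1: 'dog'})
print(xml_text)
```

For a whole dataset, the annotations would first be grouped by `image_id`, and one XML file would be written per image under a name derived from `file_name`.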