Use an appropriate regular expression to do the processing.


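re.sub(r're', '', input) replaces every match of the pattern in the input text with the given replacement string; with an empty replacement, as here, the matches are simply removed.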
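As a minimal sketch of this call, assuming the goal is to strip punctuation before further processing (the pattern, the helper name `strip_punctuation`, and the sample sentence are illustrative and not from the original notes):

```python
import re


def strip_punctuation(text: str) -> str:
    """Delete every character that is neither a word character nor whitespace."""
    # An empty replacement string removes each match instead of substituting it.
    return re.sub(r"[^\w\s]", "", text)


cleaned = strip_punctuation("Hello, world!")
print(cleaned)
```

Passing a non-empty string as the second argument would substitute each match with that string rather than deleting it.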

Text and labels separated by \t can be split into two small lists and then combined into tuples with zip.
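One way this could look, assuming each line holds the text first and the label second, separated by a single tab (the helper name `parse_tsv_lines` and the sample lines are hypothetical):

```python
def parse_tsv_lines(raw_lines):
    """Split tab-separated lines into (text, label) tuples."""
    texts, labels = [], []
    for line in raw_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the label are preserved.
        text, label = line.split("\t", 1)
        texts.append(text)
        labels.append(label)
    # zip pairs the i-th text with the i-th label.
    return list(zip(texts, labels))


samples = parse_tsv_lines(["good movie\t1\n", "boring plot\t0\n"])
print(samples)
```

The labels stay as strings here; convert them with `int(label)` if numeric labels are needed.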

    import os
    import random

    def load_dataset(self):
        # Walk images_background/<alphabet>/<character>/<image>.
        train_path = os.path.join(self.dataset_path, 'images_background')
        for alphabet in os.listdir(train_path):
            alphabet_path = os.path.join(train_path, alphabet)
            for character in os.listdir(alphabet_path):
                character_path = os.path.join(alphabet_path, character)
                for image in os.listdir(character_path):
                    self.train_lines.append(os.path.join(character_path, image))
                    self.train_labels.append(self.types)
                # Each character folder is one class, so advance the label once per folder.
                self.types += 1

    # When sampling a batch: pick a random class, then take every path with that label.
    # The boolean mask assumes lines and labels are NumPy arrays.
    c = random.randint(0, self.types - 1)
    selected_path = lines[labels == c]

Here, all the image paths that belong to the randomly chosen label are selected.
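The selection relies on NumPy boolean masking. A small self-contained sketch, assuming `lines` and `labels` are NumPy arrays of equal length (the sample paths and the helper `paths_for_label` are made up for illustration):

```python
import random

import numpy as np

# Hypothetical data: five image paths belonging to two classes.
lines = np.array(["a/1.png", "a/2.png", "b/1.png", "b/2.png", "b/3.png"])
labels = np.array([0, 0, 1, 1, 1])


def paths_for_label(lines, labels, c):
    """Return every path whose label equals c."""
    # labels == c is an element-wise comparison that yields a boolean mask;
    # indexing with the mask keeps only the positions where it is True.
    return lines[labels == c]


num_types = int(labels.max()) + 1
c = random.randint(0, num_types - 1)  # randint includes both endpoints
selected_path = paths_for_label(lines, labels, c)
print(c, selected_path)
```

With plain Python lists, `labels == c` evaluates to a single `False` instead of a mask, so the data has to be converted with `np.array` first.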

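From these paths, three images are sampled: two form the similar pair, and the third is paired with an image from a different class: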

    import random
    import numpy as np

    # Sampling 3 indices requires the chosen class to contain at least 3 images.
    image_indexes = random.sample(range(0, len(selected_path)), 3)
    # Take two similar images (same class)
    batch_images_path.append(selected_path[image_indexes[0]])
    batch_images_path.append(selected_path[image_indexes[1]])
    # Take two dissimilar images: the first comes from the current class
    batch_images_path.append(selected_path[image_indexes[2]])
    # Pick a class different from the current one
    different_c = list(range(self.types))
    different_c.pop(c)
    different_c_index = np.random.choice(range(0, self.types - 1), 1)
    current_c = different_c[different_c_index[0]]
    selected_path = lines[labels == current_c]
    # Redraw while the chosen class has no images
    while len(selected_path) < 1:
        different_c_index = np.random.choice(range(0, self.types - 1), 1)
        current_c = different_c[different_c_index[0]]
        selected_path = lines[labels == current_c]
    # The second dissimilar image comes from the other class
    image_indexes = random.sample(range(0, len(selected_path)), 1)
    batch_images_path.append(selected_path[image_indexes[0]])