Scenarios that call for weight adjustment

  1. Misclassification costs are asymmetric. For example, when classifying legitimate versus illegitimate users, mislabeling an illegitimate user as legitimate is very costly. We would rather flag some legitimate users as illegitimate, since those cases can still be screened manually, than let illegitimate users pass as legitimate.
  2. The samples are highly imbalanced. For example, suppose we have 10,000 binary samples of legitimate and illegitimate users, of which 9,995 are legitimate and only 5 are illegitimate. If we ignore weights, we can simply predict every test sample as legitimate: the accuracy is theoretically 99.95%, yet the model is completely useless.
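The accuracy paradox in scenario 2 is easy to verify with a few lines of plain Python, using the same 9,995/5 split as the example above:

```python
# A "classifier" that always predicts "legitimate" on the imbalanced
# sample described above: 9,995 legitimate vs. 5 illegitimate users.
labels = [1] * 9995 + [0] * 5        # 1 = legitimate, 0 = illegitimate
predictions = [1] * len(labels)      # majority-class baseline

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# Recall on the illegitimate class: how many of the 5 were caught.
recall_illegitimate = sum(
    p == 0 for p, y in zip(predictions, labels) if y == 0
) / 5

print(accuracy)             # 0.9995
print(recall_illegitimate)  # 0.0 -- not a single illegitimate user caught
```

Accuracy is 99.95%, but recall on the class we actually care about is zero, which is exactly why weighting is needed.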


How to handle this in PyTorch

Machine learning classification frameworks (scikit-learn, for example) generally support both class weights and sample weights to cover the scenarios above. In PyTorch, the only built-in mechanism I have found is sample weighting, via WeightedRandomSampler.
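For reference, scikit-learn's `class_weight='balanced'` option computes each class weight as `n_samples / (n_classes * count[c])`. The sketch below replicates that heuristic in plain Python (the function name is mine, not part of scikit-learn):

```python
from collections import Counter

def balanced_class_weights(y):
    """Replicate scikit-learn's class_weight='balanced' heuristic:
    weight[c] = n_samples / (n_classes * count[c])."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# The 9,995 / 5 split from the example above.
y = [1] * 9995 + [0] * 5
weights = balanced_class_weights(y)
# The rare class (0) gets weight 1000.0; the majority class (1) gets ~0.5,
# so each class contributes equally to the weighted loss overall.
```

The PyTorch sampler weights below follow the same idea, just without dividing by the number of classes.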

```python
# Weight each sample inversely to the size of its class
def make_weights_for_balanced_classes(images, nclasses):
    count = [0] * nclasses
    for item in images:
        count[item[1]] += 1          # item is a (path, class_index) tuple
    weight_per_class = [0.] * nclasses
    N = float(sum(count))
    for i in range(nclasses):
        weight_per_class[i] = N / float(count[i])
    weight = [0] * len(images)
    for idx, val in enumerate(images):
        weight[idx] = weight_per_class[val[1]]
    return weight
```
```python
import torch as t
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

train_set = ImageFolder(opt.train_data_root, train_tf, loader=pil_loader)
valid_set = ImageFolder(opt.val_data_root, valid_tf, loader=pil_loader)
weights = make_weights_for_balanced_classes(train_set.imgs, len(train_set.classes))
# Alternatively, assign fixed per-class weights by hand:
# weights = [2 if label == 1 else 1 for data, label in train_set]
weights = t.DoubleTensor(weights)
sampler = t.utils.data.sampler.WeightedRandomSampler(weights, len(weights))
train_dataloader = DataLoader(train_set, opt.batch_size, sampler=sampler,
                              num_workers=opt.num_workers)
```
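To see what WeightedRandomSampler effectively does, here is a framework-free sketch: it draws indices with replacement, with probability proportional to each sample's weight. `random.choices` stands in for the sampler, and the toy label counts are made up for illustration:

```python
import random

random.seed(0)
# Toy imbalanced dataset: 95 samples of class 0, only 5 of class 1.
labels = [0] * 95 + [1] * 5
counts = {0: 95, 1: 5}
# Per-sample weight = N / count(class), as in make_weights_for_balanced_classes.
weights = [len(labels) / counts[y] for y in labels]

# WeightedRandomSampler draws indices with replacement, with probability
# proportional to each sample's weight; random.choices mimics that here.
drawn = random.choices(range(len(labels)), weights=weights, k=10000)
frac_minority = sum(labels[i] == 1 for i in drawn) / len(drawn)
# frac_minority comes out close to 0.5, even though class 1 is
# only 5% of the underlying data.
```

In other words, batches produced by the weighted DataLoader above are approximately class-balanced, regardless of how skewed the dataset on disk is.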