
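These are assignment notes for Stanford's Spring 2021 course CS231N: Convolutional Neural Networks for Visual Recognition. The assignments revolve around vision tasks and require implementing a large set of classic machine learning algorithms and neural network models from scratch. This post is part eight of the assignment notes and covers self-supervised learning and a summary of the whole course.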

Self-Supervised Learning


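Self-supervised learning has been a very active research direction in machine learning and deep learning in recent years. It learns good data representations without any labels. This matters in practice because many real-world datasets are unlabeled or expensive to label, and the representations learned without labels can still be reused in downstream tasks.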


Contrastive Learning


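So how do we obtain two similar images and two dissimilar ones? Self-supervised learning uses unlabeled data, so labels cannot tell us whether two samples are similar. For images, the answer is to generate two similar images ourselves by applying various transformations to a single image. Such a pair is called a positive pair. Training then treats the high similarity of the two images as prior knowledge: we want the representations of the two images to be as similar as possible. The figure below illustrates this process: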

[Figure 1 of "Stanford-CS231N-Assignment Notes 8: Self-Supervised Learning and Course Summary"]
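The idea of a positive pair can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the helper names `random_crop`, `random_flip` and `make_positive_pair` are hypothetical, the "image" is a random array, and only cropping and flipping are used instead of the full augmentation set of the assignment.

```python
import numpy as np


def random_crop(img, size, rng):
    """Cut a random size x size window out of an (H, W, C) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]


def random_flip(img, rng):
    """Mirror the image along the width axis with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img


def make_positive_pair(img, size, rng):
    """Two independent random views of the same image form a positive pair."""
    view_1 = random_flip(random_crop(img, size, rng), rng)
    view_2 = random_flip(random_crop(img, size, rng), rng)
    return view_1, view_2


rng = np.random.default_rng(0)
image = rng.random((40, 40, 3))  # stand-in for a real image
view_1, view_2 = make_positive_pair(image, size=32, rng=rng)
print(view_1.shape, view_2.shape)
```

Both views come from the same source image, so no label is needed to know that they should map to similar representations.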





    import random

    import torch
    from torchvision import transforms


    def compute_train_transform(seed=123456):
        """Build the random augmentation pipeline used to create the two views."""
        random.seed(seed)
        torch.random.manual_seed(seed)

        # Color jitter with brightness=0.4, contrast=0.4, saturation=0.4, and hue=0.1.
        color_jitter = transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)

        train_transform = transforms.Compose([
            # Step 1: Randomly resize and crop to 32x32.
            transforms.RandomResizedCrop(32),
            # Step 2: Horizontally flip the image with probability 0.5.
            transforms.RandomHorizontalFlip(p=0.5),
            # Step 3: With a probability of 0.8, apply color jitter
            # (using "color_jitter" defined above).
            transforms.RandomApply([color_jitter], p=0.8),
            # Step 4: With a probability of 0.2, convert the image to grayscale.
            transforms.RandomGrayscale(p=0.2),
            # Convert to a tensor and normalize each channel with a fixed mean and std.
            transforms.ToTensor(),
            transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        ])
        return train_transform
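  • As the assignment requires, each image goes through the following operations in order: a random resized crop, a horizontal flip with probability 0.5, color jitter with probability 0.8, and grayscale conversion with probability 0.2, followed by conversion to a tensor and normalization.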
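To make "apply a step with a given probability, in a fixed order" concrete, here is a small pure-Python sketch. It mirrors the idea behind composing random steps, not torchvision's actual implementation; `random_apply`, `compose` and the toy operations on a list of pixel rows are hypothetical names introduced only for this illustration.

```python
import random


def random_apply(fn, p, rng):
    """Wrap fn so that it runs with probability p and is skipped otherwise."""
    def wrapped(x):
        return fn(x) if rng.random() < p else x
    return wrapped


def compose(steps):
    """Chain steps so that each one receives the output of the previous one."""
    def pipeline(x):
        for step in steps:
            x = step(x)
        return x
    return pipeline


def flip_rows(img):
    # Toy horizontal flip: reverse every row of pixel values.
    return [row[::-1] for row in img]


def darken(img):
    # Toy color change: halve every pixel value.
    return [[v / 2 for v in row] for row in img]


rng = random.Random(123456)
augment = compose([
    random_apply(flip_rows, 0.5, rng),
    random_apply(darken, 0.8, rng),
])

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
views = [augment(image) for _ in range(4)]  # four independent random views
```

Because every call draws fresh random numbers, calling `augment` twice on the same image generally yields two different views, which is what the training pipeline relies on.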



    import torch


    def sim(z_i, z_j):
        """Cosine similarity between two 1-D vectors."""
        norm_dot_product = torch.dot(z_i, z_j) / (torch.linalg.norm(z_i) * torch.linalg.norm(z_j))
        return norm_dot_product


    def simclr_loss_naive(out_left, out_right, tau):
        N = out_left.shape[0]  # total number of training examples
        # Concatenate out_left and out_right into a 2*N x D tensor.
        out = torch.cat([out_left, out_right], dim=0)  # [2*N, D]
        total_loss = 0
        for k in range(N):  # loop through each positive pair (k, k+N)
            z_k, z_k_N = out[k], out[k + N]
            exp_sum1, exp_sum2 = 0, 0
            for i in range(2 * N):
                if i != k:
                    exp_sum1 += torch.exp(sim(z_k, out[i]) / tau)
                if i != k + N:
                    exp_sum2 += torch.exp(sim(z_k_N, out[i]) / tau)
            total_loss += -torch.log(torch.exp(sim(z_k, z_k_N) / tau) / exp_sum1)
            total_loss += -torch.log(torch.exp(sim(z_k_N, z_k) / tau) / exp_sum2)
        # In the end, we need to divide the total loss by 2N, the number of samples in the batch.
        total_loss = total_loss / (2 * N)
        return total_loss


    def sim_positive_pairs(out_left, out_right):
        """Cosine similarity of each positive pair, returned as an [N, 1] tensor."""
        left_norm = out_left / torch.linalg.norm(out_left, dim=-1, keepdim=True)
        right_norm = out_right / torch.linalg.norm(out_right, dim=-1, keepdim=True)
        # The diagonal of the N x N product holds the similarity of row k with row k.
        mul = torch.mm(left_norm, right_norm.T)
        pos_pairs = torch.diag(mul).view(-1, 1)
        return pos_pairs


    def compute_sim_matrix(out):
        """Pairwise cosine similarities between all rows of out."""
        out_norm = out / torch.linalg.norm(out, dim=1, keepdim=True)
        sim_matrix = torch.mm(out_norm, out_norm.T)
        return sim_matrix


    def simclr_loss_vectorized(out_left, out_right, tau, device='cuda'):
        N = out_left.shape[0]
        # Concatenate out_left and out_right into a 2*N x D tensor.
        out = torch.cat([out_left, out_right], dim=0)  # [2*N, D]
        # Compute similarity matrix between all pairs of augmented examples in the batch.
        sim_matrix = compute_sim_matrix(out)  # [2*N, 2*N]

        # Step 1: Use sim_matrix to compute the denominator value for all augmented samples.
        # Compute e^{sim / tau}, which has shape 2N x 2N.
        exponential = torch.exp(sim_matrix / tau)
        # This boolean mask is False on the diagonal, i.e. it drops terms where k=i.
        mask = ~torch.eye(2 * N, dtype=torch.bool, device=device)
        # Apply the mask; each row keeps its 2N-1 off-diagonal entries.
        exponential = exponential.masked_select(mask).view(2 * N, -1)  # [2*N, 2*N-1]
        # Denominator values for all augmented samples, a 2N x 1 vector.
        denom = torch.sum(exponential, dim=1, keepdim=True)

        # Step 2: Compute similarity between positive pairs.
        # This can be done in two ways:
        # Option 1: Extract the corresponding indices from sim_matrix.
        # Option 2: Use sim_positive_pairs().
        sim_pairs = sim_positive_pairs(out_left, out_right)
        # Each pair appears twice: once for (k, k+N) and once for (k+N, k).
        sim_pairs = torch.cat([sim_pairs, sim_pairs], dim=0)

        # Step 3: Compute the numerator value for all augmented samples.
        numerator = torch.exp(sim_pairs / tau)

        # Step 4: With numerator and denominator for all augmented samples, compute the total loss.
        loss = torch.mean(-torch.log(numerator / denom))
        return loss
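Read off from the code above, the quantity being computed for a batch of N positive pairs (2N augmented samples, pair partners at indices k and k+N) can be written as:

$$
\ell(i, j) = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1,\, k \neq i}^{2N} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)},
\qquad
\mathrm{sim}(z_i, z_j) = \frac{z_i^\top z_j}{\lVert z_i \rVert \, \lVert z_j \rVert}
$$

$$
L = \frac{1}{2N} \sum_{k=1}^{N} \left[ \ell(k, k+N) + \ell(k+N, k) \right]
$$

As a sanity check, the following NumPy sketch recomputes the same loss on a tiny hand-made batch. It is not part of the assignment code: the function name `simclr_loss_numpy` and the toy inputs are assumptions for illustration, and it takes the "Option 1" route from the comments above by reading the positive-pair similarities directly out of the similarity matrix.

```python
import numpy as np


def simclr_loss_numpy(out_left, out_right, tau):
    """SimCLR-style loss for N positive pairs given as two [N, D] arrays."""
    n = out_left.shape[0]
    out = np.concatenate([out_left, out_right], axis=0)      # [2N, D]
    out = out / np.linalg.norm(out, axis=1, keepdims=True)   # unit-length rows
    sim = out @ out.T                                        # cosine similarities, [2N, 2N]

    exp_sim = np.exp(sim / tau)
    np.fill_diagonal(exp_sim, 0.0)                           # drop the k == i terms
    denom = exp_sim.sum(axis=1)                              # [2N]

    idx = np.arange(2 * n)
    partner = (idx + n) % (2 * n)                            # k <-> k + N
    numer = np.exp(sim[idx, partner] / tau)                  # [2N]
    return float(np.mean(-np.log(numer / denom)))


eye = np.eye(2)
swapped = eye[::-1].copy()

# Each left vector is paired with an identical right vector.
aligned = simclr_loss_numpy(eye, eye, tau=0.5)
# Each left vector is paired with an orthogonal right vector.
mismatched = simclr_loss_numpy(eye, swapped, tau=0.5)
print(aligned, mismatched)
```

For the aligned toy batch every row sees one partner with similarity 1 and two other samples with similarity 0, so the loss reduces to log(1 + 2e^{-1/τ}); the mismatched batch gives a larger value, which is the behavior the loss is meant to have.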


