Implements Faster R-CNN.
    The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, with values in the 0-1 range. Different images can have different sizes.
    The behavior of the model changes depending on whether it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets dictionary, containing:

    - boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H
    - labels (Tensor[N]): the class label for each ground-truth box

    The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN.
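    For instance, a minimal training-mode call could look like the following sketch (fasterrcnn_resnet50_fpn is used here only as a convenient pre-built Faster R-CNN; the boxes and labels are dummy values)::

        import torch
        import torchvision

        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
        model.train()
        images = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
        # one target dict per image, with ground-truth boxes in [x0, y0, x1, y1]
        targets = [
            {"boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),
             "labels": torch.tensor([1])},
            {"boxes": torch.tensor([[30.0, 40.0, 150.0, 180.0]]),
             "labels": torch.tensor([2])},
        ]
        # returns e.g. loss_classifier, loss_box_reg, loss_objectness, loss_rpn_box_reg
        loss_dict = model(images, targets)
        total_loss = sum(loss_dict.values())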
    During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:

    - boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with x values between 0 and W and y values between 0 and H
    - labels (Tensor[N]): the predicted labels for each image
    - scores (Tensor[N]): the scores of each prediction
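    As a sketch of consuming these fields (the 0.8 score threshold is an arbitrary illustrative value; predictions refers to the output of a call like the one in the example below)::

        # one dict per input image
        for pred in predictions:
            keep = pred["scores"] > 0.8     # keep only confident detections
            boxes = pred["boxes"][keep]     # Tensor[K, 4] in [x0, y0, x1, y1]
            labels = pred["labels"][keep]   # Tensor[K]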

    Arguments:
    backbone (nn.Module): the network used to compute the features for the model.
    It should contain an out_channels attribute, which indicates the number of output channels that each feature map has (and it should be the same for all feature maps).
    The backbone should return a single Tensor or an OrderedDict[Tensor].

    num_classes (int): number of output classes of the model (including the background).
    If box_predictor is specified, num_classes should be None.
    min_size (int): minimum size of the image to be rescaled before feeding it to the backbone
    max_size (int): maximum size of the image to be rescaled before feeding it to the backbone
    image_mean (Tuple[float, float, float]): mean values used for input normalization.
    They are generally the mean values of the dataset on which the backbone has been trained
    image_std (Tuple[float, float, float]): std values used for input normalization.
    They are generally the std values of the dataset on which the backbone has been trained
    rpn_anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature maps.
    rpn_head (nn.Module): module that computes the objectness and regression deltas from the RPN
    rpn_pre_nms_top_n_train (int): number of proposals to keep before applying NMS during training
    rpn_pre_nms_top_n_test (int): number of proposals to keep before applying NMS during testing
    rpn_post_nms_top_n_train (int): number of proposals to keep after applying NMS during training
    rpn_post_nms_top_n_test (int): number of proposals to keep after applying NMS during testing
    rpn_nms_thresh (float): NMS threshold used for postprocessing the RPN proposals
    rpn_fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be considered as positive during training of the RPN.
    rpn_bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be considered as negative during training of the RPN.
    rpn_batch_size_per_image (int): number of anchors that are sampled during training of the RPN for computing the loss
    rpn_positive_fraction (float): proportion of positive anchors in a mini-batch during training of the RPN
    box_roi_pool (MultiScaleRoIAlign): the module which crops and resizes the feature maps in the locations indicated by the bounding boxes
    box_head (nn.Module): module that takes the cropped feature maps as input
    box_predictor (nn.Module): module that takes the output of box_head and returns the classification logits and box regression deltas.
    box_score_thresh (float): during inference, only return proposals with a classification score greater than box_score_thresh
    box_nms_thresh (float): NMS threshold for the prediction head. Used during inference
    box_detections_per_img (int): maximum number of detections per image, for all classes.
    box_fg_iou_thresh (float): minimum IoU between the proposals and the GT box so that they can be considered as positive during training of the classification head
    box_bg_iou_thresh (float): maximum IoU between the proposals and the GT box so that they can be considered as negative during training of the classification head
    box_batch_size_per_image (int): number of proposals that are sampled during training of the classification head
    box_positive_fraction (float): proportion of positive proposals in a mini-batch during training of the classification head
    bbox_reg_weights (Tuple[float, float, float, float]): weights for the encoding/decoding of the bounding boxes
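    For reference, these weights scale the standard Faster R-CNN box-regression targets, so larger weights make the corresponding deltas larger. A minimal sketch of the encoding (mirroring the usual parameterization; the function name here is illustrative)::

        import torch

        def encode_boxes(gt, anchors, weights=(1.0, 1.0, 1.0, 1.0)):
            # boxes are [x0, y0, x1, y1]; compute widths, heights and centers
            wx, wy, ww, wh = weights
            aw = anchors[:, 2] - anchors[:, 0]
            ah = anchors[:, 3] - anchors[:, 1]
            ax = anchors[:, 0] + 0.5 * aw
            ay = anchors[:, 1] + 0.5 * ah
            gw = gt[:, 2] - gt[:, 0]
            gh = gt[:, 3] - gt[:, 1]
            gx = gt[:, 0] + 0.5 * gw
            gy = gt[:, 1] + 0.5 * gh
            # weighted regression deltas
            dx = wx * (gx - ax) / aw
            dy = wy * (gy - ay) / ah
            dw = ww * torch.log(gw / aw)
            dh = wh * torch.log(gh / ah)
            return torch.stack((dx, dy, dw, dh), dim=1)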
    Example::

        import torch
        import torchvision
        from torchvision.models.detection import FasterRCNN
        from torchvision.models.detection.rpn import AnchorGenerator

        # load a pre-trained model for classification and return
        # only the features
        backbone = torchvision.models.mobilenet_v2(pretrained=True).features
        # FasterRCNN needs to know the number of
        # output channels in a backbone. For mobilenet_v2, it's 1280
        # so we need to add it here
        backbone.out_channels = 1280

        # let's make the RPN generate 5 x 3 anchors per spatial
        # location, with 5 different sizes and 3 different aspect
        # ratios. We have a Tuple[Tuple[int]] because each feature
        # map could potentially have different sizes and
        # aspect ratios
        anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                           aspect_ratios=((0.5, 1.0, 2.0),))

        # let's define what are the feature maps that we will
        # use to perform the region of interest cropping, as well as
        # the size of the crop after rescaling.
        # if your backbone returns a Tensor, featmap_names is expected to
        # be [0]. More generally, the backbone should return an
        # OrderedDict[Tensor], and in featmap_names you can choose which
        # feature maps to use.
        roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                        output_size=7,
                                                        sampling_ratio=2)

        # put the pieces together inside a FasterRCNN model
        model = FasterRCNN(backbone,
                           num_classes=2,
                           rpn_anchor_generator=anchor_generator,
                           box_roi_pool=roi_pooler)
        model.eval()
        x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
        predictions = model(x)