Facial emotions can be divided into seven categories: happy, sad, fearful, angry, surprised, disgusted and neutral.

The first step in facial expression recognition is to preprocess the collected images, and then carry out feature extraction and classification.

Many scholars tend to use convolutional neural networks to extract image features.

For classification, the support vector machine (SVM) method is also widely used.
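As a rough illustration of this pipeline (preprocessing, feature extraction, classification with an SVM), the minimal NumPy/scikit-learn sketch below uses toy data; the 48x48 image size, the trivial flatten-based feature extractor and the random labels are assumptions for illustration only and do not reproduce the methods of the cited works.

```python
import numpy as np
from sklearn.svm import SVC

def preprocess(image):
    # Hypothetical preprocessing: scale 8-bit pixel values to [0, 1].
    # A real system would also do face detection, cropping, alignment, etc.
    return image.astype(np.float32) / 255.0

def extract_features(image):
    # Stand-in for the feature-extraction stage (a CNN in the text):
    # here the preprocessed pixels are simply flattened into a vector.
    return preprocess(image).ravel()

# Toy data standing in for collected face images and emotion labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(20, 48, 48))  # 20 images, 48x48 (assumed size)
labels = rng.integers(0, 7, size=20)              # 7 emotion categories

X = np.stack([extract_features(im) for im in images])
clf = SVC(kernel="linear").fit(X, labels)         # SVM classification stage
print(clf.predict(X[:3]))
```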

An analysis of the above literature shows that the facial emotion features extracted by these methods tend to lose part of the original emotional information.

In addition, the generalization ability and robustness of these network models are poor, and the accuracy of facial expression recognition is not high.

Convolution (Hasebe & Ueda, 2021) is widely used in image processing, for example in filtering, edge detection and image sharpening.
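As a small example of such filtering operations, the sketch below applies a sharpening kernel and a Sobel edge kernel to a toy image with scipy.signal.convolve2d; the image and kernel values are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

# Toy 6x6 grayscale "image" with a bright vertical stripe.
image = np.zeros((6, 6))
image[:, 2:4] = 1.0

sharpen = np.array([[ 0, -1,  0],   # classic sharpening kernel
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)
sobel_x = np.array([[-1, 0, 1],     # Sobel kernel responding to vertical edges
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

sharpened = convolve2d(image, sharpen, mode="same", boundary="symm")
edges = convolve2d(image, sobel_x, mode="same", boundary="symm")
print(edges)   # strong responses along the stripe boundaries
```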

In a convolutional neural network, features in the image can be extracted by the convolution operation (Sahani & Dash, 2021).

The lower convolutional layers can extract low-level features of the image such as edges, lines and angles (Guttery, 2021; Satapathy, 2021; Wang, 2021a, 2021b).

The higher convolutional layers can learn more complex features from the lower convolutional layers, so as to realize image classification and recognition.

Mathematically, convolution is an operator that generates a third function h from two functions f and g:

h(x) = (f * g)(x) = ∫_{-∞}^{+∞} f(τ) g(x - τ) dτ

Here h(x) is the integral of the product of f and g after one of them is flipped and shifted; it measures the overlap between the two functions as one slides over the other.

    The physical meaning of convolution is the weighted superposition of one function over another.

    The output of the system is the result of the superposition of multiple inputs.
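In discrete form, this weighted superposition can be illustrated directly with NumPy; the signal f and the weights g below are arbitrary example values.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # input values (the original function)
g = np.array([0.5, 0.25])            # weights (the "action" function)

# Discrete counterpart of h(x) = sum over tau of f(tau) * g(x - tau):
h = np.convolve(f, g)
print(h)                              # [0.5  1.25 2.   2.75 1.  ]
```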

In image analysis, f(x) corresponds to the original pixel values and g(x) to the action points (weights). All the action points together form the convolution kernel. After every action point of the kernel has been applied to the original pixels in turn, the final convolution output is obtained.

The element that performs the convolution operation in the convolutional layer is called the convolution kernel, and its parameters need to be learned (Bister et al., 2021; Ganguly et al., 2021).

    The size of the convolution kernel should be smaller than the size of the input image.

During the convolution process, each kernel is convolved with the input image to calculate a feature map (Satapathy & Wu, 2020; Wang, 2021a, 2021b; Zhang, Nayak, Zhang & Wang, 2020).

In other words, the convolution kernel slides over the input image and computes the dot product between the input and the kernel at each spatial position. Then, the feature maps produced by the different kernels are stacked along the depth dimension to obtain the output of the convolutional layer.

In short, convolution multiplies and adds the corresponding elements of the input matrix and the convolution kernel; the whole input matrix is traversed to obtain the result matrix.

The convolution operation can be seen as computing, at each location, the degree of similarity between the local input and the kernel's pattern, or how strongly that pattern is present at that location.
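The following minimal NumPy sketch of this sliding-window operation (stride 1, no padding, a single kernel) is for illustration only; the random 3x3 kernel is a stand-in for a learned kernel, and, as in most deep-learning frameworks, it is applied without flipping (strictly, cross-correlation). Stacking the outputs of several such kernels along the depth dimension gives the convolutional layer's output described above.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; at each position take the dot
    # product of the kernel with the overlapping patch (valid padding,
    # stride 1), giving one value of the feature map.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # multiply corresponding elements, then add
    return out

rng = np.random.default_rng(1)
image = rng.random((6, 6))        # toy 6x6 input "image"
kernel = rng.random((3, 3))       # stand-in for a learned 3x3 kernel
feature_map = conv2d(image, kernel)
print(feature_map.shape)          # (4, 4): smaller than the input, as noted above
```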

    Pooling

After obtaining feature maps through the convolutional layers, the next step is to integrate and classify these features.

Theoretically, all features extracted by convolution could be fed into a classifier, such as the Softmax classifier (Ashiquzzaman et al., 2020; Satapathy & Zhu, 2020), but doing so involves a large amount of computation.

At this point, a pooling layer is used to reduce the feature dimension.

The pooling layer is also called the down-sampling layer; it aims to reduce the size of the matrix generated by a convolutional layer.

On the one hand, pooling reduces the number of features and parameters, thus simplifying the computation of the convolutional network. Pooling also prevents overfitting to a certain extent and makes optimization more convenient. On the other hand, it preserves some invariance of the features, such as invariance to rotation, translation and scaling.

Pooling increases the network's invariance to translation, which is critical for improving its generalization ability.

Pooling makes the model pay more attention to whether certain features exist rather than to their exact location.

Another use of pooling is to increase the receptive field, that is, the region of the original image to which one feature-map pixel corresponds, which improves the capability of the model.

    There are two main pooling operations, Average Pooling and Max Pooling.
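A small sketch of the two operations is given below, assuming non-overlapping 2x2 windows (stride 2); the feature-map values are arbitrary example numbers.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    # Non-overlapping size x size windows (stride equal to the window size).
    h, w = feature_map.shape
    blocks = feature_map[:h // size * size, :w // size * size]
    blocks = blocks.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # keep the strongest response per window
    return blocks.mean(axis=(1, 3))      # average pooling

fm = np.array([[1.0, 3.0, 2.0, 0.0],
               [4.0, 6.0, 1.0, 2.0],
               [0.0, 2.0, 5.0, 4.0],
               [1.0, 1.0, 3.0, 7.0]])

print(pool2d(fm, mode="max"))   # [[6. 2.] [2. 7.]]
print(pool2d(fm, mode="avg"))   # [[3.5  1.25] [1.   4.75]]
```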

The error of feature extraction mainly comes from two sources. The first is an increase in the variance of the estimate caused by the limited neighborhood size.

The second is a deviation of the estimated mean caused by errors in the convolutional layer parameters.

    Generally speaking, Max Pooling is more efficient and is a common method in image processing.

It is somewhat like feature selection: Max Pooling picks out features with good classification and recognition performance and has a nonlinear character.

In addition, Max Pooling can reduce the second error mentioned above and retains texture features well.