Basic Machine Learning Concepts

Learning Rate

The learning rate is a hyperparameter that controls, in gradient descent, how far each update moves the network weights along the gradient of the loss function.
If the learning rate is too large, an update may overshoot the optimum; the loss then stays large, oscillates, or becomes NaN.
If the learning rate is too small, the loss decreases very slowly, which greatly increases the number of iterations needed to converge, and the optimizer is more easily trapped in a local minimum or at a saddle point.
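The effect can be seen in a tiny gradient-descent sketch on f(w) = w², whose gradient is 2w; the function, step counts, and learning rates here are illustrative, not values from the project:

```python
def gradient_descent(lr, steps=50, w=5.0):
    """Minimize f(w) = w**2 by following the gradient f'(w) = 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w  # weight update: w <- w - lr * gradient
    return w

# A moderate learning rate converges toward the optimum w = 0.
w_good = gradient_descent(lr=0.1)
# A too-large learning rate makes |w| grow each step instead of shrink (divergence).
w_large = gradient_descent(lr=1.5)
# A too-small learning rate barely moves in the same number of steps.
w_small = gradient_descent(lr=0.0001)
```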

Overall Design

Overall Network Structure

The network contains six modules in total. The first module is a single 7×7 convolution layer for preliminary feature extraction, and the four middle modules each contain two 3×3 convolution layers. The resulting features are average-pooled, the pooled features are fed into a fully connected layer for classification, and the classification result is output.
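A minimal PyTorch sketch of this layout (the channel widths, strides, and the 10-class output are my assumptions for illustration, not the exact project values):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """One middle module: two 3x3 convolutions with ReLU activations."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),  # halve the spatial size between modules
    )

model = nn.Sequential(
    # Module 1: a single 7x7 convolution for preliminary feature extraction.
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
    # Modules 2-5: two 3x3 convolutions each.
    conv_block(64, 64),
    conv_block(64, 128),
    conv_block(128, 256),
    conv_block(256, 512),
    # Module 6: average pooling, then a fully connected classifier.
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, 10),
)

out = model(torch.randn(1, 1, 224, 224))  # one 224x224 grayscale image
```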

Why training performance is better than testing performance

Training is carried out on the training set: the training process of a neural network essentially fits the training data with a nonlinear combination of multiple polynomials, so the fit on the training set is naturally better. Testing is carried out on held-out data that is only similar to, not identical to, the training data, so the fit there is not as good as on the training set.

What is a convolutional neural network

A convolutional neural network (CNN) is a neural network designed for feature extraction from image data. It aims to solve three problems that traditional fully connected networks have with images: (1) the parameter count and computational cost are too large, making training inefficient; (2) flattening an image directly into a one-dimensional vector loses spatial information; (3) the excessive number of parameters leads to overfitting and poor generalization.
A CNN is composed of four main kinds of modules: convolution layers, activation layers, pooling layers, and fully connected layers.
The convolution layer performs feature extraction: a fixed-size kernel with learnable parameters is convolved over the input image, and the computed responses form a new feature map that contains a set of image features.
The pooling layer reduces the dimensionality of the feature map, keeping the information with the most salient features and discarding the rest, which lowers the network's computational cost.
The activation layer applies a nonlinear mapping to the output of a convolution or pooling layer; this makes the otherwise linear convolution operation nonlinear, so the network can fit nonlinear problems.
The fully connected layer is a traditional fully connected network; it classifies (or regresses on) the features extracted by the convolution and pooling layers to produce the final output, which is why it usually forms the last layer of a CNN.

Which dependency libraries were used

How it was packaged

I wrote an interface on the back end and packaged it into an .exe file with PyInstaller so that the front end can call it.

Where transfer learning comes in

I first trained my model on MNIST to obtain a good set of initial parameters.
Then I used the dataset I collected myself to continue training on top of this model.
It may be easier to understand with a short piece of code.
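A hedged sketch of the idea (the stand-in model, checkpoint name, and learning rate are illustrative, not the project's exact code): train on MNIST, save the weights, then load them and fine-tune on my own data.

```python
import os
import tempfile
import torch
import torch.nn as nn

# A stand-in model; the real project uses the ResNet-10 described elsewhere.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

# --- Stage 1: pretrain on MNIST (training loop omitted), then save the weights.
ckpt = os.path.join(tempfile.mkdtemp(), "mnist_pretrained.pth")
torch.save(model.state_dict(), ckpt)

# --- Stage 2: start a fresh model from the MNIST weights and fine-tune on my own dataset.
finetune_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
finetune_model.load_state_dict(torch.load(ckpt))
# A smaller learning rate is typical when fine-tuning pretrained weights.
optimizer = torch.optim.SGD(finetune_model.parameters(), lr=0.001)
```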

Why C#

My mentor gave me a basic .NET Framework program written in C#, so I had to make some changes to it. I added a separate process for the connection between the front-end display page and my model: every time a digit is entered with the stylus or mouse, the program creates a command that invokes the main Python file together with my saved model. We only have to pay attention to the three important parameters.

Why PyTorch

PyTorch is a Python-based scientific computing package that provides two high-level features: (1) tensor computation (like NumPy) with powerful GPU acceleration, and (2) deep neural networks built on an automatic differentiation system.
It is also easy to use.
Take data processing as an example: during preprocessing I mainly use the transforms module from the torchvision package in PyTorch.

Why ResNet

The Residual Neural Network (ResNet) was proposed by Kaiming He of Microsoft Research.
ResNet is the most widely used and most influential network in computer vision; with a very simple structure it effectively solves the vanishing-gradient problem that arises when training deep networks.
The basic idea is to add the feature map produced by an earlier layer to the feature map computed by the current layer's convolution (a shortcut connection).
It is commonly believed that the deeper a CNN is, the stronger its feature extraction ability, which shows up in detection results as higher accuracy on the dataset. Later research found, however, that when the network becomes too deep (beyond roughly 50 layers), classification accuracy actually drops, and by a large margin. There are two main reasons. First, very deep networks suffer from vanishing gradients during training, so the parameters of the later layers are difficult or impossible to optimize. Second, a deep plain network cannot easily realize the identity mapping, so its capacity for linear transformations degrades badly: to achieve nonlinearity, CNNs usually add an activation layer after every convolution and pooling layer, which is fine in shallow networks, but in very deep ones it makes the nonlinear capacity excessive while sharply reducing the linear capacity. Under these two constraints, the depth of CNNs was severely limited. ResNet was designed to solve exactly these two problems by introducing the concept of "residual learning" into the traditional CNN.
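The residual idea in code, as a minimal sketch (assuming equal input and output channel counts, so the shortcut is a plain addition):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """out = ReLU(F(x) + x): the input is added back onto the conv output."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip connection: the identity mapping is always available

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 28, 28))
```

Because the identity path passes gradients through unchanged, even very deep stacks of such blocks remain trainable.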
In the handwritten digit recognition task, digit images carry relatively little semantic content, so feature extraction does not require a very deep network. On the contrary, an over-deep network tends to overfit the training data. I therefore modified the ResNet architecture, cutting ResNet18 down to 10 layers; I call the result ResNet-10, and this helped a lot.
However, the results showed little difference between ResNet-10 and ResNet-18, so I decided to put more effort into innovation to raise the model's accuracy. I tried to learn the attention mechanism, but the theory behind it is complicated and hard for me to understand completely. I asked my supervisor for help, and he told me I only needed to use and implement it, which would be easy; the following work was indeed easier than before.
So I added a spatial attention module and a channel attention module between the first and second convolution layers, which improved the model's accuracy to a certain extent.

Why the attention mechanism

Because at first I wanted to make some innovation on top of ResNet, but found the results were not ideal.
Since ResNet-10 and ResNet-18 performed almost identically, I turned to the attention mechanism; on my supervisor's advice I focused on implementing it rather than fully deriving its theory, and added a spatial attention module and a channel attention module between the first and second convolution layers, which improved accuracy to a certain extent.

This (channel attention) module first applies max pooling and average pooling to the input features over the spatial dimensions, then linearly transforms each pooled vector through a fully connected layer. The two branches share the same set of fully connected weights; the mapping produces two sets of weights, which are summed to obtain the attention weights over the channel dimension.
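A sketch of that channel attention module (CBAM-style; the reduction ratio of 8 and the final sigmoid are my assumptions for illustration):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # The shared fully connected layers, implemented as a small MLP.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Max pooling and average pooling over the spatial dimensions.
        max_feat = x.amax(dim=(2, 3))
        avg_feat = x.mean(dim=(2, 3))
        # Both branches share the same FC weights; their outputs are summed.
        weights = torch.sigmoid(self.mlp(max_feat) + self.mlp(avg_feat))
        return x * weights.view(b, c, 1, 1)  # reweight each channel

att = ChannelAttention(64)
y = att(torch.randn(2, 64, 28, 28))
```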

For example, to recognize a cat in a picture we only need to attend to the regions containing the cat's face and body, not the other background information in the picture. Channel attention strengthens the feature channels that respond to the cat's face and limbs while suppressing the other channels.

Dataset-related Questions

How the data is read in

I collected all the pictures myself, with help from my friends and parents; they wrote with a stylus in different styles, which was really useful.
I created ten folders, one for each digit, and sorted every picture into its correct folder.
Then I used a Python file named get_datalist to process and annotate the pictures.
In get_datalist I first wrote the image paths into two txt files: test.txt and train.txt.
To collect the paths, I started with an empty list and looped over the ten digit folders from 0 to 9. The standard library function os.listdir(), one of the commonly used file-operation functions, returns a list of all files and subfolder names in a given folder; inside the loop I appended each image's label and path to the list.
Then I used random.shuffle to randomize the order of the pictures.
Finally, I wrote the pictures into train.txt and test.txt in a ratio of three to one, which is a very common ratio when splitting a dataset into training and test samples.
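A runnable sketch of get_datalist's logic; the directory layout and file names below are made up for the demonstration, not the project's real paths:

```python
import os
import random
import tempfile

# Fake dataset layout for the demo: one folder per digit, a few images each.
root = tempfile.mkdtemp()
for digit in range(10):
    os.makedirs(os.path.join(root, str(digit)))
    for i in range(4):
        open(os.path.join(root, str(digit), f"{i}.png"), "w").close()

samples = []  # start from an empty list
for digit in range(10):  # loop over the ten digit folders, 0-9
    folder = os.path.join(root, str(digit))
    for name in os.listdir(folder):  # all files in this folder
        # append "path label" pairs; the label comes from the folder name
        samples.append(f"{os.path.join(folder, name)} {digit}")

random.shuffle(samples)  # randomize the order of the pictures

split = len(samples) * 3 // 4  # 3:1 train/test ratio
with open(os.path.join(root, "train.txt"), "w") as f:
    f.write("\n".join(samples[:split]))
with open(os.path.join(root, "test.txt"), "w") as f:
    f.write("\n".join(samples[split:]))
```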

Image size and parameters

The input to the network is a 224×224 image.
The extracted feature is a 512-dimensional vector.
I use cv_imread to load the picture and use Image to convert it from array format, then pass the img, the device, and the loaded model (state) to the detect function.
This function is used for picture detection.
We first preprocess the received image. The pictures we receive are always grayscale, with each pixel taking a value from 0 to 255, so we binarize the image here. We then resize the image, convert it to tensor format, and assign the result to the variable img.
During this process I mainly use the transforms module from the torchvision package in PyTorch.
The transforms.ToTensor() operation is a key step: it converts a PIL Image into a torch.FloatTensor.
The subsequent Normalize transform is applied to the tensor; it normalizes the data of each channel with the given mean and standard deviation.
After this series of transforms, the image data has changed first in its dimension order and second in its value range.

MNIST

The picture on the left is the famous MNIST dataset.
It is composed of handwritten digit images (0-9) and their labels: 60,000 training samples and 10,000 test samples, each a 28×28-pixel grayscale handwritten digit picture.

My own dataset

Next, I collected almost 5,000 pictures with the tablet and stylus to build my own dataset. My supervisor advised me to collect 10,000, so I then applied data augmentation methods such as flipping, rotation, scaling, cropping, and translation to increase the amount of data, which finally expanded the dataset to 10,400 pictures.
The ratio of training set to test set is 3 to 1, because this is a very common and reasonable ratio in machine learning.
The split into training and test sets can be seen in the get_datalist Python file.

Training Process

How the neural network is trained
How the model parameters were determined
The model's hyperparameters are based on general tuning experience plus some trials, after which the best values were kept.
Training first loads the network, then defines an optimizer using the SGD strategy.
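A minimal sketch of that setup plus one training step (the toy model, learning rate, momentum, and random data are illustrative):

```python
import torch
import torch.nn as nn

# Load/define the network (a toy stand-in here), then define an SGD optimizer.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 1, 28, 28)       # a fake mini-batch of grayscale digits
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()                    # clear old gradients
loss = criterion(model(images), labels)  # forward pass + loss
loss.backward()                          # backpropagate
optimizer.step()                         # SGD weight update
loss_value = loss.item()
```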

Model accuracy

Demo

General Questions

Innovations

There are two main innovations. First, since handwritten digit images carry little semantic content and an over-deep network overfits the training data, I reduced ResNet18 to a 10-layer variant, ResNet-10. Second, because ResNet-10 and ResNet-18 turned out to perform almost the same, I added a spatial attention module and a channel attention module between the first and second convolution layers, which further improved the model's accuracy.

References (books and courses consulted)

Design approach

What I did (and put in) for this project

What grade do you want (85)

Summarize your project in one sentence

Biggest challenge