Introduction
In this section we use the GIN-based graph representation learning network introduced earlier, together with the dataset we defined ourselves in the previous section, to predict a quantum property of molecular graphs.
1. Principle Review
The update formula of GIN is:
$$h_v^{(k)} = \mathrm{MLP}^{(k)}\left((1 + \epsilon^{(k)}) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\right)$$
- An MLP can approximate arbitrary functions, so it can learn an injective function, whereas the single-layer perceptrons used in GraphSAGE and GCN cannot.
- If the input features are constrained to be one-hot, the sum in the first iteration is already injective, so no MLP preprocessing is needed before it.
- Since $h_v^{(0)}$ is countable, by the theorem in the paper the new features $h_v^{(k)}$ obtained after $k$ iterations remain countable; after the transformation $f(x)$, the next iteration still satisfies the injectivity condition (the relevant result is restated below).
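For reference, the result being invoked here (Corollary 6 of the GIN paper, paraphrased, so treat the wording as approximate) states that if $\mathcal{X}$ is countable, there exists a mapping $f: \mathcal{X} \to \mathbb{R}^n$ such that, for suitable choices of $\epsilon$ (the paper shows this holds for infinitely many $\epsilon$, including all irrational numbers),

$$h(c, X) = (1 + \epsilon)\cdot f(c) + \sum_{x \in X} f(x)$$

is unique for every pair $(c, X)$, where $c \in \mathcal{X}$ and $X \subset \mathcal{X}$ is a finite multiset; moreover, any function $g$ over such pairs can be decomposed as $g(c, X) = \varphi(h(c, X))$ for some function $\varphi$. The MLP in GIN is there to approximate such $f$ and $\varphi$ jointly.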
For graph classification, once every node has a representation vector, the node representations are pooled into a graph representation:
$$h_G = \mathrm{CONCAT}\left(\mathrm{sum}\left(\{h_v^{(k)} \mid v \in G\}\right),\ k = 0, 1, \dots, K\right)$$
In other words, GIN uses a concat+sum readout: at every iteration $k$ it sums the features of all nodes to obtain a graph-level feature, and then concatenates the features from all iterations into the final graph representation vector.
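To make the readout concrete, here is a minimal sketch of the concat+sum readout (illustrative names; it assumes a list `h_list` holding the node features of every iteration and PyG's `global_add_pool`):

```python
import torch
from torch_geometric.nn import global_add_pool


def graph_readout(h_list, batch):
    # h_list[k]: node features after iteration k, shape [num_nodes, emb_dim]
    # batch:     graph index of every node, shape [num_nodes]
    # sum the node features within each graph at every iteration, then concatenate
    per_layer = [global_add_pool(h, batch) for h in h_list]  # each: [num_graphs, emb_dim]
    return torch.cat(per_layer, dim=-1)                      # [num_graphs, (K + 1) * emb_dim]
```

Note that the code analyzed below does not use this concat variant: it either keeps the last layer or sums over layers (the `JK` option) and then applies a single graph pooling.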
2. Code Analysis
Let us first look at the overall structure of the code, shown in the figure below.
The files are:
- `gin_conv.py`: the graph isomorphism convolution layer
- `gin_graph.py`: produces the graph representation vector
- `gin_node.py`: produces the node representation vectors
- `mol_encoder.py`: produces the layer-0 representations of nodes and edges
- `pcqm4m_data.py`: the large dataset we defined ourselves in the previous section
- `main.py`: the main entry point
Each part is examined in detail below.
2.1 gin_conv
```python
import torch
from torch import nn
from torch_geometric.nn import MessagePassing
import torch.nn.functional as F
from ogb.graphproppred.mol_encoder import BondEncoder


### GIN convolution along the graph structure
class GINConv(MessagePassing):
    def __init__(self, emb_dim):
        # emb_dim (int): node embedding dimensionality
        # set the aggregation scheme to "add"
        super(GINConv, self).__init__(aggr="add")

        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, emb_dim),
            nn.BatchNorm1d(emb_dim),
            nn.ReLU(),
            nn.Linear(emb_dim, emb_dim))
        self.eps = nn.Parameter(torch.Tensor([0]))
        self.bond_encoder = BondEncoder(emb_dim=emb_dim)

    def forward(self, x, edge_index, edge_attr):
        # first convert the categorical edge attributes into edge embeddings
        edge_embedding = self.bond_encoder(edge_attr)
        out = self.mlp((1 + self.eps) * x + self.propagate(edge_index, x=x, edge_attr=edge_embedding))
        return out

    def message(self, x_j, edge_attr):
        return F.relu(x_j + edge_attr)

    def update(self, aggr_out):
        return aggr_out
```
This layer implements the formula introduced in the principle review, namely

$$h_v^{(k)} = \mathrm{MLP}^{(k)}\left((1 + \epsilon^{(k)}) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\right)$$
Message propagation starts when `self.propagate()` is called; it receives the three arguments `edge_index`, `x`, and `edge_attr`. `edge_index` is a `Tensor` of shape `[2, num_edges]`; during message passing, the node features `x` are gathered along its two rows into the tensors `x_j` and `x_i`, where `x_j` holds the representations of the source nodes and `x_i` those of the target nodes. Next, the `message()` method is called; it defines the message sent from each source node to its target node, which here is the `relu()` of the sum of the source node representation and the edge representation. Then comes the update step: `update()` simply returns the aggregated input `aggr_out` unchanged. Finally, in `forward()`, the line `out = self.mlp((1 + self.eps) * x + self.propagate(edge_index, x=x, edge_attr=edge_embedding))` performs the final update of the node representations, which is exactly the code form of the formula above.
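As a quick sanity check, a toy forward pass through this layer might look like the sketch below (the graph and features are made up; the edge attributes are random integers drawn from the OGB bond-feature vocabularies):

```python
import torch
from ogb.utils.features import get_bond_feature_dims
from gin_conv import GINConv

emb_dim = 64
conv = GINConv(emb_dim)

# a 3-node path graph; both directions of each edge are listed explicitly
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, emb_dim)  # stand-in for the layer-(k-1) node representations

# random categorical bond features, one column per OGB bond-feature dimension
bond_dims = get_bond_feature_dims()
edge_attr = torch.stack([torch.randint(0, d, (edge_index.size(1),)) for d in bond_dims], dim=1)

out = conv(x, edge_index, edge_attr)
print(out.shape)  # torch.Size([3, 64]): one updated representation per node
```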
2.2 gin_graph
```python
import torch
from torch import nn
from torch_geometric.nn import global_add_pool, global_mean_pool, global_max_pool, GlobalAttention, Set2Set

from gin_node import GINNodeEmbedding


class GINGraphPooling(nn.Module):

    def __init__(self, num_tasks=1, num_layers=5, emb_dim=300, residual=False, drop_ratio=0, JK="last", graph_pooling="sum"):
        """
        GIN Graph Pooling Module

        Args:
            num_tasks (int, optional): number of labels to be predicted (also the dimension of the graph-level output). Defaults to 1.
            num_layers (int, optional): number of GINConv layers. Defaults to 5.
            emb_dim (int, optional): dimension of node embedding. Defaults to 300.
            residual (bool, optional): adding residual connection or not. Defaults to False.
            drop_ratio (float, optional): dropout rate. Defaults to 0.
            JK (str, optional): "last" keeps only the node embeddings of the last layer; "sum" sums the node embeddings of all layers. Defaults to "last".
            graph_pooling (str, optional): pooling method over node embeddings; one of "sum", "mean", "max", "attention", "set2set". Defaults to "sum".

        Out:
            graph representation
        """
        super(GINGraphPooling, self).__init__()

        self.num_layers = num_layers
        self.drop_ratio = drop_ratio
        self.JK = JK
        self.emb_dim = emb_dim
        self.num_tasks = num_tasks

        if self.num_layers < 2:
            raise ValueError("Number of GNN layers must be greater than 1.")

        self.gnn_node = GINNodeEmbedding(num_layers, emb_dim, JK=JK, drop_ratio=drop_ratio, residual=residual)

        # Pooling function to generate whole-graph embeddings
        if graph_pooling == "sum":
            self.pool = global_add_pool
        elif graph_pooling == "mean":
            self.pool = global_mean_pool
        elif graph_pooling == "max":
            self.pool = global_max_pool
        elif graph_pooling == "attention":
            self.pool = GlobalAttention(gate_nn=nn.Sequential(
                nn.Linear(emb_dim, emb_dim), nn.BatchNorm1d(emb_dim), nn.ReLU(), nn.Linear(emb_dim, 1)))
        elif graph_pooling == "set2set":
            self.pool = Set2Set(emb_dim, processing_steps=2)
        else:
            raise ValueError("Invalid graph pooling type.")

        if graph_pooling == "set2set":
            self.graph_pred_linear = nn.Linear(2 * self.emb_dim, self.num_tasks)
        else:
            self.graph_pred_linear = nn.Linear(self.emb_dim, self.num_tasks)

    def forward(self, batched_data):
        h_node = self.gnn_node(batched_data)
        h_graph = self.pool(h_node, batched_data.batch)
        output = self.graph_pred_linear(h_graph)

        if self.training:
            return output
        else:
            # At inference time, clamp the output to [0, 50] so predictions stay non-negative and bounded
            return torch.clamp(output, min=0, max=50)
```
This module first uses the GINNodeEmbedding module to compute a representation for every node in the graph, then pools the node representations into a graph representation vector, and finally applies one linear layer to produce the final graph representation.
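A hedged usage sketch of this module: build a small batch of molecule-like graphs with random OGB-style categorical atom/bond features (placeholders for real data) and run a forward pass:

```python
import torch
from torch_geometric.data import Batch, Data
from ogb.utils.features import get_atom_feature_dims, get_bond_feature_dims
from gin_graph import GINGraphPooling

atom_dims, bond_dims = get_atom_feature_dims(), get_bond_feature_dims()


def random_mol_graph(num_nodes, num_edges):
    # random categorical features, one column per OGB atom/bond feature dimension
    x = torch.stack([torch.randint(0, d, (num_nodes,)) for d in atom_dims], dim=1)
    edge_index = torch.randint(0, num_nodes, (2, num_edges))
    edge_attr = torch.stack([torch.randint(0, d, (num_edges,)) for d in bond_dims], dim=1)
    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)


batch = Batch.from_data_list([random_mol_graph(6, 10), random_mol_graph(4, 6)])
model = GINGraphPooling(num_tasks=1, num_layers=5, emb_dim=64)
print(model(batch).shape)  # torch.Size([2, 1]): one prediction per graph
```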
2.3 gin_node
```python
import torch
import torch.nn.functional as F

from mol_encoder import AtomEncoder
from gin_conv import GINConv


# GNN to generate node embedding
class GINNodeEmbedding(torch.nn.Module):
    """
    Output:
        node representations
    """

    def __init__(self, num_layers, emb_dim, drop_ratio=0.5, JK="last", residual=False):
        """
        GIN Node Embedding Module
        Computes node embeddings with a stack of GINConv layers.
        """
        super(GINNodeEmbedding, self).__init__()

        self.num_layers = num_layers
        self.drop_ratio = drop_ratio
        self.JK = JK
        # add residual connection or not
        self.residual = residual

        if self.num_layers < 2:
            raise ValueError("Number of GNN layers must be greater than 1.")

        self.atom_encoder = AtomEncoder(emb_dim)

        # List of GNNs
        self.convs = torch.nn.ModuleList()
        self.batch_norms = torch.nn.ModuleList()

        for layer in range(num_layers):
            self.convs.append(GINConv(emb_dim))
            self.batch_norms.append(torch.nn.BatchNorm1d(emb_dim))

    def forward(self, batched_data):
        x, edge_index, edge_attr = batched_data.x, batched_data.edge_index, batched_data.edge_attr

        # computing input node embedding
        h_list = [self.atom_encoder(x)]  # first convert the categorical atom attributes into atom embeddings

        for layer in range(self.num_layers):
            h = self.convs[layer](h_list[layer], edge_index, edge_attr)
            h = self.batch_norms[layer](h)
            if layer == self.num_layers - 1:
                # remove relu for the last layer
                h = F.dropout(h, self.drop_ratio, training=self.training)
            else:
                h = F.dropout(F.relu(h), self.drop_ratio, training=self.training)

            if self.residual:
                h += h_list[layer]

            h_list.append(h)

        # Different implementations of JK-concat
        if self.JK == "last":
            node_representation = h_list[-1]
        elif self.JK == "sum":
            node_representation = 0
            for layer in range(self.num_layers + 1):
                node_representation += h_list[layer]

        return node_representation
```
This module first uses `AtomEncoder()` to convert the categorical node attributes into the layer-0 node representations, and then computes node representations layer by layer, from layer 1 up to layer `num_layers`; each layer takes the previous layer's node representations `h_list[layer]`, the edges `edge_index`, and the edge attributes `edge_attr` as input. The more `GINConv` layers there are, the larger the receptive field of this node embedding module: the representation of node `i` can capture information from nodes up to `num_layers` hops away from `i`.
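To make the receptive-field point concrete, here is a small sketch that embeds a single real molecule with this module, using ogb's `smiles2graph` to build the input (the SMILES string is just an example):

```python
import torch
from ogb.utils import smiles2graph
from torch_geometric.data import Data
from gin_node import GINNodeEmbedding

graph = smiles2graph('CCO')  # ethanol: 3 heavy atoms, 2 bonds
data = Data(
    x=torch.from_numpy(graph['node_feat']).long(),
    edge_index=torch.from_numpy(graph['edge_index']).long(),
    edge_attr=torch.from_numpy(graph['edge_feat']).long())

gnn = GINNodeEmbedding(num_layers=5, emb_dim=64, drop_ratio=0.0, JK="last")
gnn.eval()  # disable dropout; BatchNorm uses its running statistics
with torch.no_grad():
    h = gnn(data)
print(h.shape)  # torch.Size([3, 64]): one representation per atom
```

With `num_layers=5`, each of these three atom representations could in principle aggregate information from nodes up to 5 hops away, although this tiny molecule is exhausted after 2 hops.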
2.4 mol_encoder
```python
import torch
from ogb.utils.features import get_atom_feature_dims, get_bond_feature_dims

full_atom_feature_dims = get_atom_feature_dims()
full_bond_feature_dims = get_bond_feature_dims()


class AtomEncoder(torch.nn.Module):
    """Embeds categorical atom attributes.

    Let `N` be the number of atom attributes, so an atom is described by `[x1, x2, ..., xi, ..., xN]`,
    where every dimension `xi` is categorical. `full_atom_feature_dims[i]` stores the number of
    categories of `xi`. This class maps any `[x1, x2, ..., xi, ..., xN]` to an atom embedding
    `x_embedding` of dimension `emb_dim`.
    """

    def __init__(self, emb_dim):
        super(AtomEncoder, self).__init__()

        self.atom_embedding_list = torch.nn.ModuleList()

        for i, dim in enumerate(full_atom_feature_dims):
            emb = torch.nn.Embedding(dim, emb_dim)  # a separate embedding table for each attribute dimension
            torch.nn.init.xavier_uniform_(emb.weight.data)
            self.atom_embedding_list.append(emb)

    def forward(self, x):
        x_embedding = 0
        for i in range(x.shape[1]):
            x_embedding += self.atom_embedding_list[i](x[:, i])
        return x_embedding


class BondEncoder(torch.nn.Module):

    def __init__(self, emb_dim):
        super(BondEncoder, self).__init__()

        self.bond_embedding_list = torch.nn.ModuleList()

        for i, dim in enumerate(full_bond_feature_dims):
            emb = torch.nn.Embedding(dim, emb_dim)
            torch.nn.init.xavier_uniform_(emb.weight.data)
            self.bond_embedding_list.append(emb)

    def forward(self, edge_attr):
        bond_embedding = 0
        for i in range(edge_attr.shape[1]):
            bond_embedding += self.bond_embedding_list[i](edge_attr[:, i])
        return bond_embedding
```
This module produces the layer-0 representation vectors of nodes and edges.
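A short sketch of how the encoder behaves: every categorical column has its own embedding table, and the final embedding is the sum of the per-column embeddings (the all-zero features below are just convenient valid indices):

```python
import torch
from mol_encoder import AtomEncoder

enc = AtomEncoder(emb_dim=8)

# two atoms, each described by one categorical value per atom-feature dimension
x = torch.zeros(2, len(enc.atom_embedding_list), dtype=torch.long)
emb = enc(x)
print(emb.shape)  # torch.Size([2, 8])

# the output equals the sum of the per-column embeddings
manual = sum(enc.atom_embedding_list[i](x[:, i]) for i in range(x.shape[1]))
print(torch.allclose(emb, manual))  # True
```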
2.5 pcqm4m_data
```python
import os
import os.path as osp
import pandas as pd
import torch
from ogb.utils import smiles2graph
from ogb.utils.torch_util import replace_numpy_with_torchtensor
from ogb.utils.url import download_url, extract_zip
from rdkit import RDLogger
from torch_geometric.data import Data, Dataset
import shutil
import ssl

ssl._create_default_https_context = ssl._create_unverified_context
RDLogger.DisableLog('rdApp.*')


class MyPCQM4MDataset(Dataset):

    def __init__(self, root):
        self.url = 'https://dgl-data.s3-accelerate.amazonaws.com/dataset/OGB-LSC/pcqm4m_kddcup2021.zip'
        super(MyPCQM4MDataset, self).__init__(root)

        filepath = osp.join(root, 'raw/data.csv.gz')
        data_df = pd.read_csv(filepath)
        self.smiles_list = data_df['smiles']
        self.homolumogap_list = data_df['homolumogap']

    @property
    def raw_file_names(self):
        return 'data.csv.gz'

    def download(self):
        path = download_url(self.url, self.root)
        extract_zip(path, self.root)
        os.unlink(path)
        shutil.move(osp.join(self.root, 'pcqm4m_kddcup2021/raw/data.csv.gz'), osp.join(self.root, 'raw/data.csv.gz'))

    def len(self):
        return len(self.smiles_list)

    def get(self, idx):
        smiles, homolumogap = self.smiles_list[idx], self.homolumogap_list[idx]
        graph = smiles2graph(smiles)
        assert (len(graph['edge_feat']) == graph['edge_index'].shape[1])
        assert (len(graph['node_feat']) == graph['num_nodes'])

        x = torch.from_numpy(graph['node_feat']).to(torch.int64)
        edge_index = torch.from_numpy(graph['edge_index']).to(torch.int64)
        edge_attr = torch.from_numpy(graph['edge_feat']).to(torch.int64)
        y = torch.Tensor([homolumogap])
        num_nodes = int(graph['num_nodes'])
        data = Data(x, edge_index, edge_attr, y, num_nodes=num_nodes)
        return data

    def get_idx_split(self):
        split_dict = replace_numpy_with_torchtensor(torch.load(osp.join(self.root, 'pcqm4m_kddcup2021/split_dict.pt')))
        return split_dict


if __name__ == "__main__":
    dataset = MyPCQM4MDataset('D://Dataset/MyDataset')
    from torch_geometric.data import DataLoader
    from tqdm import tqdm

    dataloader = DataLoader(dataset, batch_size=256, shuffle=True)
    for batch in tqdm(dataloader):
        pass
```
This module is the large dataset we defined ourselves in the previous section.
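A brief usage sketch (assuming the raw data is downloaded to `dataset/`, which happens automatically on first use): load the dataset, fetch the official split, and inspect one training graph.

```python
from pcqm4m_data import MyPCQM4MDataset

dataset = MyPCQM4MDataset('dataset')
split_idx = dataset.get_idx_split()      # dict with 'train' / 'valid' / 'test' index tensors
train_data = dataset[split_idx['train']]

data = train_data[0]
print(data)           # e.g. Data(x=[N, 9], edge_index=[2, E], edge_attr=[E, 3], y=[1], num_nodes=N)
print(data.y.item())  # HOMO-LUMO gap label of this molecule
```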
2.6 main
```python
import os
import torch
import argparse
from tqdm import tqdm
from ogb.lsc import PCQM4MEvaluator
from torch_geometric.data import DataLoader
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from pcqm4m_data import MyPCQM4MDataset
from gin_graph import GINGraphPooling
from torch.utils.tensorboard import SummaryWriter


# model and training hyper-parameters
def parse_args():
    parser = argparse.ArgumentParser(description='Graph data mining with GNN')
    parser.add_argument('--task_name', type=str, default='GINGraphPooling',
                        help='task name')
    parser.add_argument('--device', type=int, default=0,
                        help='which gpu to use if any (default: 0)')
    parser.add_argument('--num_layers', type=int, default=5,
                        help='number of GNN message passing layers (default: 5)')
    parser.add_argument('--graph_pooling', type=str, default='sum',
                        help='graph pooling strategy mean or sum (default: sum)')
    parser.add_argument('--emb_dim', type=int, default=256,
                        help='dimensionality of hidden units in GNNs (default: 256)')
    parser.add_argument('--drop_ratio', type=float, default=0.,
                        help='dropout ratio (default: 0.)')
    parser.add_argument('--save_test', action='store_true')
    parser.add_argument('--batch_size', type=int, default=512,
                        help='input batch size for training (default: 512)')
    parser.add_argument('--epochs', type=int, default=100,
                        help='number of epochs to train (default: 100)')
    parser.add_argument('--weight_decay', type=float, default=0.00001,
                        help='weight decay')
    parser.add_argument('--early_stop', type=int, default=10,
                        help='early stop (default: 10)')
    parser.add_argument('--num_workers', type=int, default=4,
                        help='number of workers (default: 4)')
    parser.add_argument('--dataset_root', type=str, default="dataset",
                        help='dataset root')
    args = parser.parse_args()

    return args


# prepare the output directory, device and log file
def prepartion(args):
    save_dir = os.path.join('saves', args.task_name)
    if os.path.exists(save_dir):
        for idx in range(1000):
            if not os.path.exists(save_dir + '=' + str(idx)):
                save_dir = save_dir + '=' + str(idx)
                break

    args.save_dir = save_dir
    os.makedirs(args.save_dir, exist_ok=True)
    args.device = torch.device("cuda:" + str(args.device)) if torch.cuda.is_available() else torch.device("cpu")
    args.output_file = open(os.path.join(args.save_dir, 'output'), 'a')
    print(args, file=args.output_file, flush=True)


def train(model, device, loader, optimizer, criterion_fn):
    model.train()
    loss_accum = 0

    for step, batch in enumerate(tqdm(loader)):
        batch = batch.to(device)
        pred = model(batch).view(-1,)
        optimizer.zero_grad()
        loss = criterion_fn(pred, batch.y)
        loss.backward()
        optimizer.step()
        loss_accum += loss.detach().cpu().item()

    return loss_accum / (step + 1)


def eval(model, device, loader, evaluator):
    model.eval()
    y_true = []
    y_pred = []

    with torch.no_grad():
        for _, batch in enumerate(tqdm(loader)):
            batch = batch.to(device)
            pred = model(batch).view(-1,)
            y_true.append(batch.y.view(pred.shape).detach().cpu())
            y_pred.append(pred.detach().cpu())

    y_true = torch.cat(y_true, dim=0)
    y_pred = torch.cat(y_pred, dim=0)
    input_dict = {"y_true": y_true, "y_pred": y_pred}
    return evaluator.eval(input_dict)["mae"]


def test(model, device, loader):
    model.eval()
    y_pred = []

    with torch.no_grad():
        for _, batch in enumerate(loader):
            batch = batch.to(device)
            pred = model(batch).view(-1,)
            y_pred.append(pred.detach().cpu())

    y_pred = torch.cat(y_pred, dim=0)
    return y_pred


def main(args):
    prepartion(args)
    nn_params = {
        'num_layers': args.num_layers,
        'emb_dim': args.emb_dim,
        'drop_ratio': args.drop_ratio,
        'graph_pooling': args.graph_pooling
    }

    # automatic dataloading and splitting
    dataset = MyPCQM4MDataset(root=args.dataset_root)
    split_idx = dataset.get_idx_split()
    train_data = dataset[split_idx['train']]
    valid_data = dataset[split_idx['valid']]
    test_data = dataset[split_idx['test']]
    train_loader = DataLoader(train_data, batch_size=args.batch_size, shuffle=True, num_workers=args.num_workers)
    valid_loader = DataLoader(valid_data, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers)
    test_loader = DataLoader(test_data, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers)

    # automatic evaluator. takes dataset name as input
    evaluator = PCQM4MEvaluator()
    criterion_fn = torch.nn.MSELoss()

    device = args.device

    model = GINGraphPooling(**nn_params).to(device)

    num_params = sum(p.numel() for p in model.parameters())
    print(f'#Params: {num_params}', file=args.output_file, flush=True)
    print(model, file=args.output_file, flush=True)

    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=args.weight_decay)
    scheduler = StepLR(optimizer, step_size=30, gamma=0.25)

    writer = SummaryWriter(log_dir=args.save_dir)
    not_improved = 0
    best_valid_mae = 9999

    for epoch in range(1, args.epochs + 1):
        print("=====Epoch {}".format(epoch), file=args.output_file, flush=True)
        print('Training...', file=args.output_file, flush=True)
        train_mae = train(model, device, train_loader, optimizer, criterion_fn)

        print('Evaluating...', file=args.output_file, flush=True)
        valid_mae = eval(model, device, valid_loader, evaluator)

        print({'Train': train_mae, 'Validation': valid_mae}, file=args.output_file, flush=True)
        writer.add_scalar('valid/mae', valid_mae, epoch)
        writer.add_scalar('train/mae', train_mae, epoch)

        if valid_mae < best_valid_mae:
            best_valid_mae = valid_mae
            if args.save_test:
                print('Saving checkpoint...', file=args.output_file, flush=True)
                checkpoint = {
                    'epoch': epoch, 'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict(),
                    'scheduler_state_dict': scheduler.state_dict(), 'best_val_mae': best_valid_mae, 'num_params': num_params
                }
                torch.save(checkpoint, os.path.join(args.save_dir, 'checkpoint.pt'))
                print('Predicting on test data...', file=args.output_file, flush=True)
                y_pred = test(model, device, test_loader)
                print('Saving test submission file...', file=args.output_file, flush=True)
                evaluator.save_test_submission({'y_pred': y_pred}, args.save_dir)

            not_improved = 0
        else:
            not_improved += 1
            if not_improved == args.early_stop:
                print(f"Have not improved for {not_improved} epochs.", file=args.output_file, flush=True)
                break

        scheduler.step()
        print(f'Best validation MAE so far: {best_valid_mae}', file=args.output_file, flush=True)

    writer.close()
    args.output_file.close()


if __name__ == "__main__":
    args = parse_args()
    main(args)
```
3. Finding Hyper-parameters through Experiments
A single experiment can be launched with the following command:
```sh
#!/bin/sh
# --task_name      name of this experiment
# --num_layers     number of GINConv layers
# --graph_pooling  graph readout method
# --emb_dim        node embedding dimensionality
# --save_test      predict on the test set and keep the predictions
# --early_stop     stop training after `early_stop` epochs without improvement on the validation set
# --dataset_root   root directory of the dataset
python main.py --task_name GINGraphPooling \
    --device 0 \
    --num_layers 5 \
    --graph_pooling sum \
    --emb_dim 256 \
    --drop_ratio 0. \
    --save_test \
    --batch_size 512 \
    --epochs 100 \
    --weight_decay 0.00001 \
    --early_stop 10 \
    --num_workers 4 \
    --dataset_root dataset
```
Once an experiment starts, the program creates a folder under the `saves` directory, named after the `task_name` argument, to record the run; if a folder with that name already exists under `saves`, a suffix is appended to `task_name` to form the folder name. During the run, all `print` output is written to the `output` file inside the experiment folder, and the information recorded by `tensorboard.SummaryWriter` is stored in files in the same folder. The output of a run is shown below:
By changing the command-line arguments and running again, you can try different hyper-parameters; the progress and results of all experiments are stored under the `saves` folder. Start a TensorBoard session and point it at the `saves` folder to inspect the progress and results of every experiment.
The steps to start a TensorBoard session are:
- First locate the directory where the training logs were saved, as shown in the figure below.
- Then go to the parent directory of the log folder, open a terminal (cmd), and run `tensorboard --logdir=GINGraphPooling=2` to start TensorBoard. Note: if you get the error "'tensorboard' is not recognized as an internal or external command", add TensorBoard to your PATH environment variable, as shown in the figure below.
- Copy the URL printed by the command into a browser to open TensorBoard.
- View the results in TensorBoard.
From the curves we can see that after 55 epochs the mean absolute error (MAE) on the training set is 0.05566 and on the validation set it is 0.1913.
4. Conclusion
Through this hands-on graph prediction task, we have:
- learned how to create a large dataset;
- gained a deeper understanding of the GIN code;
- learned how to use TensorBoard.
5. Exercise
- Run experiments with different hyper-parameter settings, compare the progress and result information across runs, and analyze how each hyper-parameter affects the graph prediction task.
Here I chose to switch the graph pooling method to mean and test its effect, with the following parameter settings:
```
--task_name GINGraphPooling
--device 0
--num_layers 5
--graph_pooling mean
--emb_dim 256
--drop_ratio 0.
--batch_size 512
--epochs 100
--weight_decay 0.00001
--early_stop 10
--num_workers 4
--dataset_root dataset
```
I pasted these parameters into PyCharm's Edit Configuration dialog, as shown below.
The results are as follows.
As before, the run can also be inspected in TensorBoard, as shown in the figure below.
We can see that after 42 epochs the training MAE is 0.05646 and the validation MAE is 0.1868. Comparing with the earlier run that used sum pooling and otherwise identical parameters, mean pooling yields a larger MAE on the training set but a smaller MAE on the validation set. A possible explanation is that the sum-pooling model overfits slightly: it fits the training set better but generalizes a bit worse.