Meeting Minutes

1. Pre-training achieves results comparable to Prosit.
2. The CNN is less expressive than the Transformer and the stacked LSTM, while the model capacity of the Transformer is similar to that of the stacked LSTM.
3. The Prosit dataset is saturated, as the PCC is already about 0.99.
4. The aim of using large-scale data is to leverage the abundant data that contains no phospho-modified peptides.
5. Jeff's pre-training did not help the CNN, and the CNN overfits.
6. We could potentially reduce the overfitting by decreasing the number of CNN blocks.
7. We should perhaps try a 64-layer Transformer to push the training performance to 0.99, assuming the gap between training and validation stays around 0.02.
8. I am supposed to maintain two types of models, differing in whether they are pre-trained on the Prosit data.
9. We should compare with pDeep2 fairly: we predict 12 ion types, while pDeep2 predicts only 8 (see the masking sketch after this list).
10. Do a statistical analysis for each ion type to examine the variance of the PCCs (see the per-ion-type sketch after this list).
11. We should use MSE rather than RMSE.
12. Since many ions have a normalized intensity of exactly 1, which a plain sigmoid can hardly reach, it is more rational to multiply the sigmoid output by 1.1 (see the output-head sketch after this list).
13. When discussing with iHuman, I should leave out some technical details and instead pinpoint the information they care about.
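
A minimal sketch of the fair comparison in item 9: restrict our 12-ion predictions to the 8 ion types that pDeep2 also predicts before computing the PCC. The column layout and SHARED_ION_IDX are assumptions, since the real indices depend on our model's ion ordering.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical layout: our model outputs 12 ion-type columns per fragment
# position. Which 8 columns overlap with pDeep2's ion types depends on our
# actual output ordering, so SHARED_ION_IDX below is an assumption.
SHARED_ION_IDX = [0, 1, 2, 3, 4, 5, 6, 7]

def pcc_on_shared_ions(pred, true, ion_idx=SHARED_ION_IDX):
    """Per-peptide PCC restricted to the ion types both models predict.

    pred, true: arrays of shape (n_fragments, 12) for one peptide.
    """
    p = np.asarray(pred)[:, ion_idx].ravel()
    t = np.asarray(true)[:, ion_idx].ravel()
    return pearsonr(p, t)[0]
```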
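For the per-ion-type analysis in item 10, a sketch that collects per-peptide PCCs for each ion type and reports their mean and variance; the 12-column layout and the placeholder ion names are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

ION_TYPES = [f"ion_{k}" for k in range(12)]  # placeholder names, assumption

def per_ion_pcc_stats(preds, trues):
    """Mean and variance of per-peptide PCCs for each ion type.

    preds, trues: lists of (n_fragments, 12) arrays, one pair per peptide.
    """
    stats = {}
    for k, name in enumerate(ION_TYPES):
        pccs = []
        for p, t in zip(preds, trues):
            col_p, col_t = p[:, k], t[:, k]
            # PCC is undefined when either column is constant; skip those.
            if col_p.std() > 0 and col_t.std() > 0:
                pccs.append(pearsonr(col_p, col_t)[0])
        pccs = np.asarray(pccs)
        stats[name] = {"mean": pccs.mean(), "var": pccs.var()}
    return stats
```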
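For items 11 and 12, a sketch of the proposed output head and loss, assuming a PyTorch model: the sigmoid output is multiplied by 1.1 so a normalized intensity of 1 falls inside the reachable range, and training optimizes plain MSE rather than RMSE.

```python
import torch
import torch.nn as nn

class ScaledSigmoid(nn.Module):
    """Output activation from item 12: 1.1 * sigmoid(x).

    A plain sigmoid only approaches 1 asymptotically, so the many ions
    whose normalized intensity is exactly 1 are hard to fit; scaling by
    1.1 places 1 strictly inside the output range.
    """
    def forward(self, x):
        return 1.1 * torch.sigmoid(x)

# Item 11: optimize plain MSE, not its square root.
loss_fn = nn.MSELoss()
```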

ToDo

1. Compare with pDeep2 fairly by using 8 ion types.
2. Try a 64-layer Transformer.
3. Use MSE rather than RMSE.
4. Multiply the sigmoid output by 1.1.
5. Do a statistical analysis for each ion type to examine the variance of the PCCs.