Each curve in the figure traces the coefficient of one predictor as the penalty changes: the vertical axis is the coefficient value, the lower horizontal axis is λ, and the upper horizontal axis is the number of nonzero coefficients in the model at that λ.
The smaller λ is, the weaker the penalty and the more variables are selected (i.e. survive with nonzero coefficients); conversely, for the LASSO, the larger λ and hence the heavier the penalty, the fewer variables with β ≠ 0 remain. Reading the plot from left to right, λ shrinks while the L1 norm, which is the sum of the |β_j|, grows. λ and the L1 penalty enter the objective as:

β̂ = argmin_β ( RSS(β) + λ‖β‖₁ ),  where ‖β‖₁ = Σ_j |β_j|
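To make the effect of the penalty weight concrete, here is a minimal sketch using Python's scikit-learn rather than the R glmnet package discussed in the thread (an assumption on my part; note that sklearn calls the penalty weight `alpha` and scales it slightly differently than glmnet's lambda, but the qualitative behavior is the same):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Toy data: 3 informative predictors out of 5 (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([3.0, -2.0, 1.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)

# With a tiny penalty, the LASSO solution is nearly identical to OLS.
ols = LinearRegression().fit(X, y)
lasso_small = Lasso(alpha=1e-4, max_iter=10_000).fit(X, y)
print(np.allclose(ols.coef_, lasso_small.coef_, atol=0.01))  # True

# With a heavy penalty, coefficients are driven to exactly zero,
# so fewer variables remain in the model.
lasso_big = Lasso(alpha=5.0).fit(X, y)
print(np.count_nonzero(lasso_big.coef_))
```

The second fit illustrates the sentence above: a large penalty leaves fewer coefficients with β ≠ 0 than the (all-nonzero) OLS solution.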
This thread explains it quite clearly; see it for details: r - Interpretting LASSO variable trace plots
Note, however, that the LASSO is a biased estimator, so if your goal is to fit a final model, the coefficients the LASSO produces are not the best; the elastic net or the adaptive LASSO would serve you better.
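The elastic net suggested above can be sketched as follows, again using scikit-learn as an assumed stand-in for glmnet (glmnet itself also fits elastic nets via its `alpha` mixing parameter, which sklearn calls `l1_ratio`):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data with 3 true predictors out of 6.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = X @ np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)

# l1_ratio mixes the L1 and L2 penalties (1.0 = pure LASSO, 0 = ridge);
# cross-validate over both the mixing ratio and the penalty strength.
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 1.0], cv=5).fit(X, y)
print("chosen l1_ratio:", model.l1_ratio_, "alpha:", model.alpha_)
```

Blending in an L2 component keeps correlated predictors in the model together, which is one reason the elastic net often fits better than the pure LASSO.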
The linked thread is essentially an English version of this question; it is summarized below:
Mayou asks: I am new to the glmnet package, and I am still unsure of how to interpret the results. Could anyone please help me read the following trace plot?
David Marx answers: In both plots, each colored line represents the value taken by a different coefficient in your model. Lambda is the weight given to the regularization term (the L1 norm), so as lambda approaches zero, the loss function of your model approaches the OLS loss function. Here’s one way you could specify the LASSO loss function to make this concrete:

L(β) = RSS(β) + λ‖β‖₁
Therefore, when lambda is very small, the LASSO solution should be very close to the OLS solution, and all of your coefficients are in the model. As lambda grows, the regularization term has greater effect and you will see fewer variables in your model (because more and more coefficients will be zero valued).
As I mentioned above, the L1 norm is the regularization term for LASSO. Perhaps a better way to look at it is that the x-axis is the maximum permissible value the L1 norm can take. So when you have a small L1 norm, you have a lot of regularization. Therefore, an L1 norm of zero gives an empty model, and as you increase the L1 norm, variables will “enter” the model as their coefficients take non-zero values.
The plot on the left and the plot on the right are basically showing you the same thing, just on different scales.
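The behavior described above, where fewer variables remain as the penalty grows, is exactly what the trace plot draws. It can be reproduced numerically with scikit-learn's `lasso_path` (an assumed Python analogue of glmnet's coefficient path):

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Synthetic data: 4 informative predictors out of 8.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 8))
beta_true = np.array([4.0, -3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=150)

# lasso_path returns coefficients along a descending grid of penalty weights;
# each column of `coefs` is one vertical slice of the trace plot.
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)

# Count how many coefficients are nonzero at each penalty level.
n_nonzero = np.count_nonzero(coefs, axis=0)
for a, k in zip(alphas[::10], n_nonzero[::10]):
    print(f"alpha={a:.4f}  nonzero coefficients={k}")
```

Because the grid is descending, the first entries (heavy penalty) have few or no nonzero coefficients, and variables "enter" the model as the penalty relaxes.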
Mayou asks: Very neat answer, thanks! Is it possible to deduce the “best predictors” from the graphs above, i.e. a final model?
JAW answers: No, you’ll need to use cross-validation or some other validation procedure for that; it’ll tell you which value of the L1 norm (or equivalently, which log(lambda)) yields the model with the best predictive ability.
David Marx answers: If you’re trying to determine your strongest predictors, you could interpret the plot as evidence that variables that enter the model early are the most predictive and variables that enter the model later are less important. If you want the “best model,” generally this is found via cross validation. A common method for attaining this using the glmnet package was suggested to you here: stats.stackexchange.com/a/68350/8451 . I strongly recommend you read the short Lasso chapter in ESLII (3.4.2 and 3.4.3), which is free to download: www-stat.stanford.edu/~tibs/ElemStatLearn
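The cross-validation step recommended in both answers can be sketched with scikit-learn's `LassoCV`, which plays roughly the role of glmnet's `cv.glmnet` (an assumed analogue, not the thread's actual code):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: the first 3 of 10 predictors carry signal.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(scale=0.5, size=200)

# 5-fold cross-validation over an automatically chosen grid of penalties;
# the model is then refit at the penalty with the best CV error.
model = LassoCV(cv=5, random_state=0).fit(X, y)
print("best alpha:", model.alpha_)
print("selected predictors:", np.flatnonzero(model.coef_))
```

With a clear signal like this, the cross-validated penalty keeps the informative predictors while zeroing out most of the pure-noise ones, which is exactly the "final model" the questioner was after.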
Author: 解琪琪
Link: https://www.jianshu.com/u/bcb81276c29d
Source: 简书 (Jianshu)
Further reading: LASSO回歸在生物醫學資料中的簡單實例 (a simple worked example of LASSO regression on biomedical data)
Copyright of this 简书 article belongs to the author; for any form of reprint, please contact the author for authorization and credit the source.
