What to Do Next - Prioritizing What to Work On - Machine Learning

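1. Agile approach: quickly implement a simple algorithm, define a metric and validate it on the cross validation set, then manually analyze the error cases to decide whether adding features or adding data should take priority.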
The recommended approach to solving machine learning problems is to:

Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
Plot learning curves to decide if more data, more features, etc. are likely to help.
Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.
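
The learning-curve step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic (a quadratic target with noise), the model is a one-feature straight line, and the helper names `fit_line`, `half_mse`, `learning_curve`, and `sample` are made up for this note.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b for one feature (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # identical x values: fall back to predicting the mean
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x

def half_mse(params, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def learning_curve(train, cv, sizes):
    """Return a list of (m, J_train, J_cv), one entry per training-set size m."""
    cv_x, cv_y = zip(*cv)
    rows = []
    for m in sizes:
        xs, ys = zip(*train[:m])          # train on the first m examples only
        params = fit_line(xs, ys)
        rows.append((m, half_mse(params, xs, ys), half_mse(params, cv_x, cv_y)))
    return rows

# Synthetic data (an assumption for illustration): a quadratic target that a
# straight line cannot fit, so the model is deliberately high-bias.
rng = random.Random(0)

def sample(n):
    pts = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        pts.append((x, x * x + rng.gauss(0, 0.5)))
    return pts

train, cv = sample(200), sample(100)
curve = learning_curve(train, cv, sizes=[2, 5, 10, 50, 100, 200])
for m, j_train, j_cv in curve:
    print(f"m={m:4d}  J_train={j_train:8.3f}  J_cv={j_cv:8.3f}")
```

How to read the output: if J_train and J_cv level off at a similar, high value as m grows, the model is likely high-bias and more data alone is unlikely to help, so more features are the better bet. If J_cv stays well above J_train, the model is likely high-variance and more data is more likely to help.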

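2. To avoid being fooled by an algorithm, refine the evaluation from the cost alone to precision and recall.
This way the results can still be evaluated well on data with an extremely skewed class distribution.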
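
A small sketch of why precision and recall matter on skewed classes. The 1%-positive data set and the "always predict 0" classifier are made-up assumptions for illustration, and `precision_recall` and `accuracy` are hypothetical helper names.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction equals the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN), with y = 1 as the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # define as 0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Assumed skewed data set: 10 positives out of 1000 examples.
y_true = [1] * 10 + [0] * 990

# A "cheating" classifier that ignores its input and always predicts 0.
always_zero = [0] * len(y_true)

acc = accuracy(y_true, always_zero)
prec, rec = precision_recall(y_true, always_zero)
print(f"accuracy={acc:.2f}  precision={prec:.2f}  recall={rec:.2f}")
```

The always-zero classifier looks excellent on accuracy but finds none of the positive examples, which recall exposes immediately.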

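3. Ways to balance precision and recall: depending on the goal, set the decision threshold on h above or below 0.5; use the F1 score to evaluate the trade-off, so that the choice can be made automatically.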
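
A minimal sketch of picking the threshold by F1 score, where F1 = 2PR / (P + R). The labels `y_cv`, the predicted probabilities `h_cv`, and the candidate thresholds are invented values for illustration, and `prf_at_threshold` is a hypothetical helper name.

```python
def prf_at_threshold(y_true, probs, threshold):
    """Predict 1 when h(x) >= threshold, then return (precision, recall, F1)."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Assumed cross validation labels and classifier outputs h(x).
y_cv = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
h_cv = [0.95, 0.80, 0.60, 0.35, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]

thresholds = [0.3, 0.5, 0.7, 0.9]
results = {t: prf_at_threshold(y_cv, h_cv, t) for t in thresholds}

for t in thresholds:
    p, r, f1 = results[t]
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}  F1={f1:.2f}")

# Pick the threshold with the highest F1 on the cross validation set.
best = max(thresholds, key=lambda t: results[t][2])
print("best threshold by F1:", best)
```

On this toy data, raising the threshold trades recall for precision, and lowering it does the opposite. F1 collapses the two numbers into one, so the comparison does not have to be made by eye.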

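4. When more data helps: the features contain enough useful information (test: could a human expert make the judgment from this information alone?), and the model has many features/parameters with which to fit the data (a low-bias model).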