1.敏捷方法:快速实现一个简单算法、定义一个指标在cross validation数据集上验证、人工分析case判断优先级增加特征还是增加数据
The recommended approach to solving machine learning problems is to:
- Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
- Plot learning curves to decide if more data, more features, etc. are likely to help.
- Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.


2.避免算法欺骗的手段从只用cost细化为准确率和召回率
这样针对极端分布的样本也可以很好的评估效果
3.平衡准确率与召回率的方式:根据诉求设置不同的h临界值高于0.5或低于0.5、使用F1 score来评估权衡算法自动选择

4.什么情况下更多的数据是有效的:数据包含的特征是有足够有效信息的(假设一个人类专家是否可以基于这些信息做出判断)、模型有很多特征参数来识别。
