Decompose
Converting a timestamp such as 2014-09-20T20:45:40Z into categorical attributes like hour_of_the_day, part_of_the_day, etc.
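As a minimal sketch of this decomposition, the snippet below parses the timestamp with Python's standard library and derives the two attributes named above. The function name decompose_timestamp and the hour boundaries for each part of the day are assumptions made for illustration, not something these notes prescribe.

```python
from datetime import datetime, timezone

def decompose_timestamp(ts: str) -> dict:
    """Split a UTC timestamp like 2014-09-20T20:45:40Z into categorical features."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    hour = dt.hour
    # Assumed bucket boundaries; pick ones that suit the problem at hand.
    if 5 <= hour < 12:
        part = "morning"
    elif 12 <= hour < 17:
        part = "afternoon"
    elif 17 <= hour < 21:
        part = "evening"
    else:
        part = "night"
    return {"hour_of_the_day": hour, "part_of_the_day": part}

features = decompose_timestamp("2014-09-20T20:45:40Z")
print(features)
```

Other attributes (day of week, month, is_weekend) could be read off the same datetime object in the same way.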
Discretization
Continuous Features
Typically, data is discretized into K partitions of equal width (equal intervals) or into partitions that each contain k% of the total data (equal frequencies).
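The two strategies can be sketched in plain Python as follows. The helper names equal_width_bins and equal_frequency_bins and the sample ages list are hypothetical. The equal-frequency version assigns bins by rank, so tied values may land in different bins; a library implementation may treat ties differently.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # min(..., k - 1) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_bins = equal_width_bins(ages, 2)
freq_bins = equal_frequency_bins(ages, 2)
print(width_bins)
print(freq_bins)
```

With an outlier such as 100, equal-width binning puts almost every sample in the first bin, while equal-frequency binning splits the samples evenly. That difference is usually what decides between the two.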
Categorical Features
Values of categorical features may be combined, particularly when there are few samples for some categories.
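One common way to do this is to fold every category below a count threshold into a single catch-all value. The sketch below assumes a hypothetical helper combine_rare, a threshold min_count, and the label "other"; all three are illustrative choices.

```python
from collections import Counter

def combine_rare(values, min_count, other_label="other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

colors = ["red", "red", "red", "blue", "blue", "teal", "mauve"]
combined = combine_rare(colors, min_count=2)
print(combined)
```

Grouping by domain knowledge (for example, merging related categories by hand) is an alternative when a generic "other" bucket would hide useful distinctions.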
Reframe Numerical Quantities
Changing from grams to kg, and losing detail, might be both wanted and efficient for calculation.
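A small sketch of the grams-to-kg example, assuming one decimal place is enough precision for the task. The function name grams_to_kg, the decimals parameter, and the sample weights are made up for illustration.

```python
def grams_to_kg(grams, decimals=1):
    """Convert grams to kilograms, deliberately dropping fine-grained detail."""
    return round(grams / 1000, decimals)

weights_g = [1234, 1789, 87654]
weights_kg = [grams_to_kg(g) for g in weights_g]
print(weights_kg)
```

The rounding is the lossy step: values that differed by a few grams now share the same feature value.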
Crossing
Creating new features as a combination of existing features. This could mean multiplying numerical features or combining categorical variables. It is a good way to add domain expertise to the dataset.
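Both kinds of cross can be sketched in a few lines. The helper names, the "_x_" separator, and the sample columns (width, height, city, part_of_the_day) are assumptions chosen for illustration.

```python
def cross_numeric(a, b):
    """Cross two numerical features by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]

def cross_categorical(a, b, sep="_x_"):
    """Cross two categorical features by joining their values into one label."""
    return [f"{x}{sep}{y}" for x, y in zip(a, b)]

width = [2.0, 3.0]
height = [5.0, 4.0]
area = cross_numeric(width, height)

city = ["paris", "oslo"]
part_of_the_day = ["morning", "night"]
city_time = cross_categorical(city, part_of_the_day)

print(area)
print(city_time)
```

A categorical cross multiplies the number of distinct values, so it is often paired with the rare-category grouping described under Categorical Features.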