🚀 Original link: https://xgboost.readthedocs.io/en/latest/treemethod.html
For training boosted tree models, there are 2 parameters used for choosing algorithms, namely updater and tree_method. XGBoost has 4 builtin tree methods, namely exact, approx, hist and gpu_hist. Along with these tree methods, there are also some free standing updaters, including grow_local_histmaker, refresh, prune and sync. The parameter updater is more primitive than tree_method, as the latter is just a pre-configuration of the former. The difference is mostly due to historical reasons: each updater requires some specific configurations and might have missing features. As we are moving forward, the gap between them is becoming more and more irrelevant. We will collectively document them under tree methods.
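As a minimal illustration of the two parameters (a sketch with synthetic data; the updater string shown is roughly what tree_method pre-configures for the exact method in recent versions, but that mapping is an internal detail and may change):

```python
import numpy as np
import xgboost as xgb

# Synthetic regression data, just for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] + rng.normal(size=500)
dtrain = xgb.DMatrix(X, label=y)

# High level: pick an algorithm via tree_method.
xgb.train({"tree_method": "hist", "objective": "reg:squarederror"},
          dtrain, num_boost_round=10)

# Low level: spell out the updater sequence directly.
xgb.train({"updater": "grow_colmaker,prune", "objective": "reg:squarederror"},
          dtrain, num_boost_round=10)
```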
1、Exact Solution
Exact means XGBoost considers all candidates from data for tree splitting, but the underlying objective is still interpreted as a Taylor expansion.

exact: The vanilla tree boosting algorithm described in the reference paper. During each split finding procedure, it iterates over every entry of the input data. It’s more accurate (among other greedy methods) but slow in computation performance. Also, it doesn’t support distributed training, as XGBoost employs row-based data splitting while the exact tree method works on a sorted column format. This tree method can be used by setting the parameter tree_method to exact.
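For reference, the Taylor expansion mentioned above is the standard second-order one from the XGBoost paper: at boosting iteration t, the objective is approximated as

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[\,l(y_i,\hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i)\Big] + \Omega(f_t)
```

where g_i and h_i are the first and second derivatives (gradient and hessian) of the loss l with respect to the previous prediction, f_t is the new tree and Ω is the regularization term. These hessians h_i are what the approximated methods below use as sketching weights.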
2、Approximated Solutions
As the exact tree method is slow in performance and not scalable, we often employ approximated training algorithms. These algorithms build a gradient histogram for each node and iterate through the histogram instead of the real dataset. Here we introduce the implementations in XGBoost, with a small comparison sketch after the list.

grow_local_histmaker updater: An approximation tree method described in the reference paper. This updater is rarely used in practice, so it’s still an updater rather than a tree method. During split finding, it first runs a weighted GK sketch over the data points belonging to the current node to find split candidates, using the hessian as weights. The histogram is built upon this per-node sketch. It’s faster than exact in some applications, but still slow in computation.

approx tree method: An approximation tree method described in the reference paper. Different from grow_local_histmaker, it runs sketching before building each tree, using all the rows (rows belonging to the root) instead of the per-node dataset. Similar to the grow_local_histmaker updater, the hessian is used as weights during the sketch. The algorithm can be accessed by setting tree_method to approx.

hist tree method: An approximation tree method also used in LightGBM, with slight differences in implementation. It runs sketching before training, using only user-provided weights instead of the hessian. The subsequent per-node histograms are built upon this global sketch. This is the fastest algorithm as it runs sketching only once. The algorithm can be accessed by setting tree_method to hist.

gpu_hist tree method: The gpu_hist tree method is a GPU implementation of hist, with additional support for gradient-based sampling. The algorithm can be accessed by setting tree_method to gpu_hist.
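The comparison sketch promised above (synthetic data; timings are machine-dependent, and gpu_hist is left out of the loop since it requires a CUDA-enabled build):

```python
import time

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))
y = X @ rng.normal(size=20) + rng.normal(size=100_000)
dtrain = xgb.DMatrix(X, label=y)

# "exact" is included as the slow baseline the approximations are compared to.
for method in ["exact", "approx", "hist"]:  # add "gpu_hist" on a CUDA build
    start = time.perf_counter()
    xgb.train({"tree_method": method, "objective": "reg:squarederror"},
              dtrain, num_boost_round=20)
    print(f"{method}: {time.perf_counter() - start:.2f}s")
```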
3、Implications
Some objectives like reg:squarederror have a constant hessian. In this case, hist or gpu_hist should be preferred, as weighted sketching doesn’t make sense with constant weights. When using objectives with a non-constant hessian, approx sometimes yields better accuracy, but with slower computation performance. Most of the time, using (gpu)_hist with a higher max_bin can achieve similar or even superior accuracy while maintaining good performance, as sketched below. However, as XGBoost is largely driven by community effort, the actual implementations have some differences from the pure math description. Results might differ slightly from expectation, which we are currently trying to overcome.
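A sketch of the max_bin trade-off described above (the values are illustrative, not tuned recommendations; 256 is the default number of bins):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=10_000)
dtrain = xgb.DMatrix(X, label=y)

# Finer global sketches (higher max_bin) move hist closer to exact split
# quality, at some extra histogram-building cost.
for bins in [32, 256, 1024]:
    xgb.train({"tree_method": "hist", "max_bin": bins,
               "objective": "reg:squarederror"},
              dtrain, num_boost_round=20,
              evals=[(dtrain, f"train(max_bin={bins})")])
```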
4、Other Updaters

Pruner: It prunes the built tree by the gamma parameter. pruner is usually used as part of other tree methods.

Refresh: Refreshes the statistics of built trees on a new training dataset, as sketched below.

Sync: Synchronizes the trees among workers when running distributed training.
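As a sketch of the refresh updater in use (process_type="update" keeps the existing tree structure and only recomputes node statistics; refresh_leaf=True additionally updates leaf values):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_old = rng.normal(size=(1_000, 10))
y_old = X_old[:, 0] + rng.normal(size=1_000)
model = xgb.train({"objective": "reg:squarederror"},
                  xgb.DMatrix(X_old, label=y_old), num_boost_round=10)

# New data from the same process: keep the trees, refresh their statistics.
X_new = rng.normal(size=(1_000, 10))
y_new = X_new[:, 0] + rng.normal(size=1_000)
refreshed = xgb.train({"process_type": "update", "updater": "refresh",
                       "refresh_leaf": True, "objective": "reg:squarederror"},
                      xgb.DMatrix(X_new, label=y_new),
                      num_boost_round=10,  # must not exceed the trees in `model`
                      xgb_model=model)
```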
5、Removed Updaters
Two updaters were removed during development due to maintainability concerns. We describe them here solely for the interest of documentation. The first one is the distributed colmaker, which was a distributed version of the exact tree method. It required specialization for a column-based splitting strategy and a different prediction procedure. As the exact tree method is slow by itself and scaling is even less efficient, we removed it entirely. The second one is skmaker. The per-node weighted sketching employed by grow_local_histmaker is slow, and skmaker was an unmaintained workaround that tried to eliminate the histogram creation step and use sketch values directly during split evaluation. It was never tested and contained some unknown bugs, so we decided to remove it and focus our resources on more promising algorithms instead. For accuracy, most of the time approx, hist and gpu_hist are enough with some parameter tuning, so removing them doesn’t have any real practical impact.