Original link: https://xgboost.readthedocs.io/en/latest/prediction.html
XGBoost has a number of prediction functions with various parameters. This document attempts to clarify some of the confusion around prediction, with a focus on the Python binding.
1. Prediction Options
There are a number of different prediction options for the xgboost.Booster.predict() method, ranging from pred_contribs to pred_leaf. The output shape depends on the type of prediction. Also, for multi-class classification problems, XGBoost builds one tree for each class, and the trees for each class are called a "group" of trees, so the output dimension may change depending on the model used. Starting with the 1.4 release there is a new parameter called strict_shape; set it to True to indicate that a more restricted output is desired. Assuming you are using xgboost.Booster, here is a list of possible returns:
- When using normal prediction with strict_shape set to True:
  Output is a 2-dim array with rows as the first dimension and groups as the second. For regression/survival/ranking/binary classification this is equivalent to a column vector with shape[1] == 1. But for multi-class with multi:softprob the number of columns equals the number of classes. If strict_shape is set to False then XGBoost might output a 1 or 2 dim array.
- When using output_margin to avoid transformation and strict_shape is set to True:
  Similar to the previous case, output is a 2-dim array, except that multi:softmax has the same output shape as multi:softprob because the transformation is dropped. If strict_shape is set to False then the output can have 1 or 2 dims depending on the model used.
- When using pred_contribs with strict_shape set to True:
  Output is a 3-dim array with shape (rows, groups, columns + 1). Whether approx_contribs is used does not change the output shape. If strict_shape is not set, the output can be a 2 or 3 dim array depending on whether a multi-class model is being used.
- When using pred_interactions with strict_shape set to True:
  Output is a 4-dim array with shape (rows, groups, columns + 1, columns + 1). As in the pred_contribs case, whether approx_contribs is used does not change the output shape. If strict_shape is set to False, the output can have 3 or 4 dims depending on the underlying model.
- When using pred_leaf with strict_shape set to True:
  Output is a 4-dim array with shape (n_samples, n_iterations, n_classes, n_trees_in_forest). n_trees_in_forest is specified by num_parallel_tree during training. When strict_shape is set to False, output is a 2-dim array with the last 3 dims concatenated into 1. When using the apply method in the scikit-learn interface, strict_shape is set to False by default.
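The shape rules in the list above can be summarised in a small sketch. The helper below is hypothetical (expected_strict_shape and its argument names are not part of XGBoost); it only encodes the strict_shape=True cases listed above and does not call the library. The softmax branch assumes that normal prediction with multi:softmax returns one class label per row, which is why only its margin output matches multi:softprob.

```python
def expected_strict_shape(kind, n_rows, n_groups, n_features,
                          n_iterations=1, n_trees_in_forest=1,
                          softmax=False):
    """Expected output shape with strict_shape=True (illustrative only).

    kind is one of: "normal", "output_margin", "pred_contribs",
    "pred_interactions", "pred_leaf". n_groups is 1 for regression,
    ranking, survival and binary classification, and the number of
    classes for multi-class models.
    """
    if kind == "normal":
        # Assumption: multi:softmax yields a single label column.
        return (n_rows, 1 if softmax else n_groups)
    if kind == "output_margin":
        # No transformation, so softmax and softprob look the same.
        return (n_rows, n_groups)
    if kind == "pred_contribs":
        # One extra column beyond the features.
        return (n_rows, n_groups, n_features + 1)
    if kind == "pred_interactions":
        return (n_rows, n_groups, n_features + 1, n_features + 1)
    if kind == "pred_leaf":
        return (n_rows, n_iterations, n_groups, n_trees_in_forest)
    raise ValueError(f"unknown prediction kind: {kind!r}")


# Example: 100 rows, 3 classes, 5 features, 2 boosting iterations,
# num_parallel_tree=4.
for kind in ("normal", "output_margin", "pred_contribs",
             "pred_interactions", "pred_leaf"):
    print(kind, expected_strict_shape(kind, 100, 3, 5,
                                      n_iterations=2,
                                      n_trees_in_forest=4))
```

One way to use this is as a sanity check: compare its result with the .shape of the array returned by xgboost.Booster.predict() called with strict_shape=True and the matching option.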
Other than these prediction types, there is also a parameter called iteration_range, which is similar to model slicing. But instead of actually splitting the model into multiple stacks, it simply returns the prediction formed by the trees within the range. The number of trees created in each iteration equals $trees_i = num\_class \times num\_parallel\_tree$. So if you are training a boosted random forest of size 4 on a 3-class classification dataset and want to use the first 2 iterations of trees for prediction, you need to provide iteration_range=(0, 2). Then the first $2 \times 3 \times 4$ trees will be used in this prediction.
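The arithmetic in the example above can be checked with a short sketch. The function name trees_in_range is hypothetical and only reproduces the counting rule from the text; it does not inspect a real model.

```python
def trees_in_range(iteration_range, num_class, num_parallel_tree):
    """Count the trees covered by iteration_range=(begin, end).

    Each boosting iteration adds num_class * num_parallel_tree trees,
    and the range is half-open: begin is included, end is excluded.
    """
    begin, end = iteration_range
    trees_per_iteration = num_class * num_parallel_tree
    return (end - begin) * trees_per_iteration


# Boosted random forest of size 4 on a 3-class problem,
# using the first 2 iterations.
n_trees = trees_in_range((0, 2), num_class=3, num_parallel_tree=4)
print(n_trees)  # 24
```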