🚀 Original article 📑 Notebook

This notebook shows how the SHAP interaction values for a very simple function are computed. We start with a simple linear function, and then add an interaction term to see how it changes the SHAP values and the SHAP interaction values.

```python
import xgboost as xgb
import numpy as np
import shap
```

1. Explain a linear function with no interactions

```python
# simulate some binary data and a linear outcome with no interaction term
# note we make the features in X perfectly independent of each other to make
# it easy to solve for the exact SHAP values
N = 2000
X = np.zeros((N, 5))
X[:1000, 0] = 1

X[:500, 1] = 1
X[1000:1500, 1] = 1

X[:250, 2] = 1
X[500:750, 2] = 1
X[1000:1250, 2] = 1
X[1500:1750, 2] = 1

X[:, 0:3] -= 0.5
y = 2 * X[:, 0] - 3 * X[:, 1]

# ensure the variables are independent
np.cov(X.T)

# and mean centered
X.mean(0)
```

The `np.cov` function computes the covariance matrix (here confirming the features are uncorrelated); if the input is just a single 1-D array, it simply returns that array's variance.
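As a quick illustration (the toy arrays below are made up for this note, not part of the notebook's data), `np.cov` treats each row of its input as one variable:

```python
import numpy as np

# two toy variables: b is just a scaled copy of a, so they are perfectly correlated
a = np.array([1.0, 2.0, 3.0, 4.0])
b = 2 * a

# stacking them row-wise gives a 2x2 covariance matrix
print(np.cov(np.vstack([a, b])))

# a single 1-D input just returns that array's variance (ddof=1 by default)
print(np.cov(a))
```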

```python
# train a model with single tree
Xd = xgb.DMatrix(X, label=y)
model = xgb.train({
    'eta': 1,
    'max_depth': 3,
    'base_score': 0,
    'lambda': 0
}, Xd, 1)
print("Model error =", np.linalg.norm(y - model.predict(Xd)))
print(model.get_dump(with_stats=True)[0])

# Model error = 0.0
# 0:[f1<0] yes=1,no=2,missing=1,gain=4500,cover=2000
#   1:[f0<0] yes=3,no=4,missing=3,gain=1000,cover=1000
#     3:leaf=0.5,cover=500
#     4:leaf=2.5,cover=500
#   2:[f0<0] yes=5,no=6,missing=5,gain=1000,cover=1000
#     5:leaf=-2.5,cover=500
#     6:leaf=-0.5,cover=500
```

:::info
Notes on the code above:
:::
:::success

- In the XGBoost parameters, `base_score` is the global bias added to every prediction (for a classifier, the initial score assigned to all samples). If the number of boosting rounds is large enough, changing it does not affect the final result; setting `base_score` to #(positive samples) / #(all samples) can reduce the number of iterations needed.
- `np.linalg.norm` computes a norm of the input array; by default it is the 2-norm (the Frobenius norm for matrices): $\|x\|_2 = \sqrt{\sum_i x_i^2}$.
- `model.get_dump` returns a text dump showing how each tree splits on the features; setting `with_stats=True` includes split statistics (gain and cover) in the output.
:::
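For example (a minimal check with a made-up vector), the default norm is just the square root of the sum of squared entries:

```python
import numpy as np

v = np.array([3.0, 4.0])
# default ord=None gives the 2-norm: sqrt(3^2 + 4^2) = 5.0
print(np.linalg.norm(v))
print(np.sqrt((v ** 2).sum()))
```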
```python
# make sure the SHAP values add up to marginal predictions
pred = model.predict(Xd, output_margin=True)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xd)
np.abs(shap_values.sum(1) + explainer.expected_value - pred).max()  # 0
```


:::info
Notes on the code above:
:::
:::success

- The `output_margin` parameter of `model.predict`: when set to `True`, the method returns each sample's raw margin (the sum of the leaf values across all of the boosted trees) rather than the transformed probability.
- `explainer.expected_value` is the expected value of the model output over the data, i.e. the baseline against which the SHAP values are measured.
:::
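As a quick sanity check (a sketch using the `model`, `Xd`, and `explainer` objects defined above; not part of the original notebook), the baseline should be close to the mean raw-margin prediction over the dataset:

```python
# the SHAP baseline should (approximately) equal the average raw-margin prediction
pred = model.predict(Xd, output_margin=True)
print(explainer.expected_value, pred.mean())
```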

If we build a summary plot we see that only features 1 and 2 have any effect, and that their effects only have two possible magnitudes (one for -0.5 and one for 0.5).

```python
shap.summary_plot(shap_values, X)
```

*(SHAP summary plot)*
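We can confirm the two-magnitude claim directly (a small check sketched here, not part of the original notebook):

```python
# each of the first two features should contribute exactly two distinct SHAP values,
# and the remaining features should contribute nothing
for j in range(X.shape[1]):
    print(j, np.unique(shap_values[:, j].round(4)))
```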

```python
# train a linear model
from sklearn import linear_model

lr = linear_model.LinearRegression()
lr.fit(X, y)
lr_pred = lr.predict(X)
lr.coef_.round(2)  # array([ 2., -3.,  0.,  0.,  0.])

# Make sure the computed SHAP values match the true SHAP values
# (we can compute the true SHAP values directly for this simple case)
main_effect_shap_values = lr.coef_ * (X - X.mean(0))
np.linalg.norm(shap_values - main_effect_shap_values)  # 1.654243349044798e-13
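This works because, for a linear model with independent features, the exact SHAP value of feature $i$ is just its coefficient times the feature's deviation from its mean, which is exactly what `lr.coef_ * (X - X.mean(0))` computes:

$$
\phi_i(x) = \beta_i\left(x_i - \mathbb{E}[x_i]\right)
$$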

1.1 SHAP Interaction Values

Note that when there are no interactions present the SHAP interaction values are just a diagonal matrix with the SHAP values on the diagonal.

```python
shap_interaction_values = explainer.shap_interaction_values(Xd)
shap_interaction_values[0]
# array([[ 1. ,  0. ,  0. ,  0. ,  0. ],
#        [ 0. , -1.5,  0. ,  0. ,  0. ],
#        [ 0. ,  0. ,  0. ,  0. ,  0. ],
#        [ 0. ,  0. ,  0. ,  0. ,  0. ],
#        [ 0. ,  0. ,  0. ,  0. ,  0. ]], dtype=float32)

# ensure the SHAP interaction values sum to the marginal predictions
np.abs(shap_interaction_values.sum((1, 2)) + explainer.expected_value - pred).max()  # 0

# ensure the main effects from the SHAP interaction values match those from a linear model
dinds = np.diag_indices(shap_interaction_values.shape[1])
total = 0
for i in range(N):
    for j in range(5):
        total += np.abs(shap_interaction_values[i, j, j] - main_effect_shap_values[i, j])
total  # 1.0533118387982904e-11
```

:::info
Notes:
:::
:::success

- `np.diag_indices` returns a set of index arrays that can be used to access the main diagonal of an N x N array.
:::
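To see that this holds for every sample, not just the first one, we can check that all off-diagonal entries are (numerically) zero and that summing each row of the interaction matrix recovers the ordinary SHAP values (a small check sketched here, not in the original notebook):

```python
# zero out the diagonal and confirm nothing is left: no interactions anywhere
off_diag = shap_interaction_values.copy()
off_diag[:, dinds[0], dinds[1]] = 0
print(np.abs(off_diag).max())

# summing each row of the interaction matrix should recover the SHAP values
print(np.abs(shap_interaction_values.sum(2) - shap_values).max())
```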

2. Explain a linear model with one interaction

```python
# simulate some binary data and a linear outcome with an interaction term
# note we make the features in X perfectly independent of each other to make
# it easy to solve for the exact SHAP values
N = 2000
X = np.zeros((N, 5))
X[:1000, 0] = 1

X[:500, 1] = 1
X[1000:1500, 1] = 1

X[:250, 2] = 1
X[500:750, 2] = 1
X[1000:1250, 2] = 1
X[1500:1750, 2] = 1

X[:125, 3] = 1
X[250:375, 3] = 1
X[500:625, 3] = 1
X[750:875, 3] = 1
X[1000:1125, 3] = 1
X[1250:1375, 3] = 1
X[1500:1625, 3] = 1
X[1750:1875, 3] = 1
X[:, :4] -= 0.4999  # we can't exactly mean center the data or XGBoost has trouble finding the splits
y = 2 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 1] * X[:, 2]

X.mean(0)  # array([1.e-04, 1.e-04, 1.e-04, 1.e-04, 0.e+00])

# train a model with single tree
Xd = xgb.DMatrix(X, label=y)
model = xgb.train({'eta': 1, 'max_depth': 4, 'base_score': 0, 'lambda': 0}, Xd, 1)
print("Model error =", np.linalg.norm(y - model.predict(Xd)))
print(model.get_dump(with_stats=True)[0])

# make sure the SHAP values add up to marginal predictions
pred = model.predict(Xd, output_margin=True)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xd)
np.abs(shap_values.sum(1) + explainer.expected_value - pred).max()  # 4.7683716e-07
```

Output:

```
Model error = 1.7365037830677591e-06
0:[f1<0.000100001693] yes=1,no=2,missing=1,gain=4499.40039,cover=2000
	1:[f0<0.000100001693] yes=3,no=4,missing=3,gain=999.999756,cover=1000
		3:[f2<0.000100001693] yes=7,no=8,missing=7,gain=124.949997,cover=500
			7:leaf=0.99970001,cover=250
			8:leaf=-9.99800031e-05,cover=250
		4:[f2<0.000100001693] yes=9,no=10,missing=9,gain=124.949951,cover=500
			9:leaf=2.99970007,cover=250
			10:leaf=1.99989998,cover=250
	2:[f0<0.000100001693] yes=5,no=6,missing=5,gain=999.999756,cover=1000
		5:[f2<0.000100001693] yes=11,no=12,missing=11,gain=125.050049,cover=500
			11:leaf=-3.0000999,cover=250
			12:leaf=-1.99989998,cover=250
		6:[f2<0.000100001693] yes=13,no=14,missing=13,gain=125.050018,cover=500
			13:leaf=-1.00010002,cover=250
			14:leaf=0.000100019999,cover=250
```

If we build a summary plot we see that now only features 3 and 4 don't matter, and that feature 1 can have four possible effect sizes due to interactions.

```python
shap.summary_plot(shap_values, X)
```

*(SHAP summary plot)*
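We can verify those observations numerically (a small check, not part of the original notebook): features 3 and 4 should get zero attribution everywhere, while feature 1 should now take four distinct SHAP values because its effect depends on feature 2.

```python
# count the distinct (rounded) SHAP values each feature receives
for j in range(X.shape[1]):
    print(j, np.unique(shap_values[:, j].round(2)))
```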

```python
# train a linear model
lr = linear_model.LinearRegression()
lr.fit(X, y)
lr_pred = lr.predict(X)
lr.coef_.round(2)  # array([ 2., -3.,  0.,  0.,  0.])

# Note that the SHAP values no longer match the main effects because they now include interaction effects
main_effect_shap_values = lr.coef_ * (X - X.mean(0))
np.linalg.norm(shap_values - main_effect_shap_values)  # 15.811389093449788
```
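Because the features are independent, we can still write down the exact SHAP values by hand: each feature keeps its main effect, and the 2·x₁·x₂ interaction term is split evenly between features 1 and 2. The sketch below (my own check, not in the original notebook) rebuilds these values and compares them with TreeExplainer's output; the difference should be small, limited to the tree's approximation error.

```python
mu = X.mean(0)

# exact SHAP values for f(x) = 2*x0 - 3*x1 + 2*x1*x2 with independent features:
# main effects plus the interaction term shared equally by features 1 and 2
true_shap = np.zeros_like(shap_values)
true_shap[:, 0] = 2 * (X[:, 0] - mu[0])
true_shap[:, 1] = -3 * (X[:, 1] - mu[1]) + (X[:, 1] - mu[1]) * (X[:, 2] + mu[2])
true_shap[:, 2] = (X[:, 2] - mu[2]) * (X[:, 1] + mu[1])

print(np.linalg.norm(shap_values - true_shap))  # small: tree approximation error only
```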

2.1 SHAP interaction values

```python
# SHAP interaction contributions:
shap_interaction_values = explainer.shap_interaction_values(Xd)
shap_interaction_values[0].round(2)
# array([[ 1.  ,  0.  ,  0.  ,  0.  ,  0.  ],
#        [ 0.  , -1.5 ,  0.25,  0.  ,  0.  ],
#        [ 0.  ,  0.25,  0.  ,  0.  ,  0.  ],
#        [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
#        [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ]], dtype=float32)

# ensure the SHAP interaction values sum to the marginal predictions
np.abs(shap_interaction_values.sum((1, 2)) + explainer.expected_value - pred).max()
# 4.7683716e-07

# ensure the main effects from the SHAP interaction values match those from a linear model.
# while the main effects no longer match the SHAP values when interactions are present, they do match
# the main effects on the diagonal of the SHAP interaction value matrix
dinds = np.diag_indices(shap_interaction_values.shape[1])
total = 0
for i in range(N):
    for j in range(5):
        total += np.abs(shap_interaction_values[i, j, j] - main_effect_shap_values[i, j])
total  # 0.4000067711135082
```
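The off-diagonal 0.25 entries are the two halves of the feature 1 / feature 2 interaction: SHAP interaction values are symmetric, the interaction effect is split evenly between the [1, 2] and [2, 1] cells, and each row still sums to that feature's ordinary SHAP value. A quick check (sketched here, not in the original notebook):

```python
# the interaction matrix is symmetric and its rows sum back to the SHAP values
print(np.abs(shap_interaction_values - shap_interaction_values.transpose(0, 2, 1)).max())
print(np.abs(shap_interaction_values.sum(2) - shap_values).max())

# the only non-zero off-diagonal pair sits between features 1 and 2
print(shap_interaction_values[:, 1, 2].round(2).min(), shap_interaction_values[:, 1, 2].round(2).max())
```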

If we build a dependence plot for feature 0 we see that it only takes two values and that these values are entirely dependent on the value of the feature (the value of feature 0 entirely determines its effect because it has no interactions with other features).

```python
shap.dependence_plot(0, shap_values, X)
```

*(SHAP dependence plot for feature 0)*
In contrast, if we build a dependence plot for feature 2 we see that it takes 4 possible values and they are not entirely determined by the value of feature 2; instead they also depend on the value of feature 1. This vertical spread in a dependence plot represents the effects of non-linear interactions.

```python
shap.dependence_plot(2, shap_values, X)
```

*(SHAP dependence plot for feature 2)*
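To make the vertical spread concrete (a small check, not in the original notebook), we can group feature 2's SHAP values by the signs of features 1 and 2: for a fixed value of feature 2, the attribution flips depending on feature 1, which is exactly the interaction the dependence plot colors by.

```python
# feature 2's SHAP value depends on feature 1: group by the signs of both features
for s1 in (X[:, 1] < 0, X[:, 1] > 0):
    for s2 in (X[:, 2] < 0, X[:, 2] > 0):
        mask = s1 & s2
        print(np.unique(shap_values[mask, 2].round(4)))
```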