Interpreting a machine learning model has been a longstanding challenge, and many different approaches to the problem have been proposed over the last few years. I recently came across an article that uses SHAP (based on Shapley values), first introduced in 2017 in this paper.
SHAP (SHapley Additive exPlanations) is a method for explaining the output of a machine learning model using game theory. The basic idea is to treat each feature as a player in a cooperative game and use the Shapley value, a fair-allocation rule from cooperative game theory, to distribute credit for the model's output among the features.
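To make the fair-allocation idea concrete, here is a small, self-contained sketch (not from the article) that computes exact Shapley values for a toy two-player game by brute force; the function names and the payout numbers are purely illustrative:

from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    # Exact Shapley values for a small cooperative game.
    # value_fn maps a coalition (set of players) to the payout that coalition earns.
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                # Weight = probability of this coalition arriving before player i
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                # Marginal contribution of player i to this coalition
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Toy game: the two players together earn 10, while alone they earn 4 and 6.
payouts = {frozenset(): 0, frozenset({"a"}): 4, frozenset({"b"}): 6, frozenset({"a", "b"}): 10}
print(shapley_values(["a", "b"], lambda s: payouts[frozenset(s)]))  # {'a': 4.0, 'b': 6.0}

SHAP uses this same allocation rule, with the features as players and the model's expected output on a subset of features as the payout.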
I will be using a sample data set to briefly explain how we can use this technique to our advantage.
First, we run a logistic regression on the sample dataset:
import statsmodels.api as sm

def get_stats():
    # data3, x_columns and k (the position of the target column) are defined earlier
    X = data3[x_columns]
    X_test = X.iloc[1:550, :]
    Y_test = data3.iloc[1:550, k - 1]
    # Fit a logit model with an intercept and print the coefficient table
    logit_model = sm.Logit(Y_test, sm.add_constant(X_test)).fit()
    print(logit_model.summary())

get_stats()
We will now apply step-wise regression to find the best model using some or all of the features of the dataset above; one way to do this is sketched below.
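The article does not show the step-wise code itself, so here is a minimal sketch of a forward step-wise search, assuming scikit-learn's SequentialFeatureSelector, that X_train / Y_train come from the train/test split used later, and that X_train is a pandas DataFrame:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select="auto",
    tol=1e-3,                 # keep adding features while the CV score improves by at least tol
    direction="forward",
    scoring="roc_auc",
    cv=5,
)
selector.fit(X_train, Y_train)
best_features = X_train.columns[selector.get_support()]
print(best_features)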
Next, we employ SHAP on our logistic regression model to identify the most important features.
import shap
from sklearn.linear_model import LogisticRegression

# An independent masker over the test set serves as the background distribution
masker = shap.maskers.Independent(data=X_test)
model = LogisticRegression(random_state=1)
model.fit(X_train, Y_train)
# LinearExplainer computes exact SHAP values for linear models
explainer = shap.LinearExplainer(model, masker=masker)
shap_values = explainer(X_test)
There are various ways to visualize the output of the SHAP method.
# Waterfall plot: how each feature pushes one prediction away from the baseline
shap.plots.waterfall(shap_values[0])
# Beeswarm plot: distribution of SHAP values for every feature across the test set
shap.plots.beeswarm(shap_values)
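If a single ranked view of overall importance is enough, the library's plotting API also offers a bar plot of the mean absolute SHAP value per feature:

# Global importance: mean |SHAP value| per feature
shap.plots.bar(shap_values)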
As we can see, the set of important features identified by SHAP closely matches the one chosen by the best-subset model. In summary, SHAP is a useful tool with many more interesting applications; this article explores the library's wider possibilities in more detail.
Happy Exploring!!