SHAP in Python
Interpreting machine learning models has been a longstanding challenge, and many alternative approaches to the problem have been proposed over the last few years. I recently came across an article that uses SHAP (Shapley values), first introduced in 2017 in this paper.
SHAP, or SHapley Additive exPlanations, is a method that uses game theory to explain the output of a machine learning model. The basic idea behind SHAP is to use the fair-allocation results from cooperative game theory to distribute credit for a model's output among its input features.
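To make the fair-allocation idea concrete, here is a minimal sketch of exact Shapley values for a toy cooperative game (the players, payoffs, and helper function are all hypothetical, purely for illustration): each player's value is their marginal contribution averaged over every order in which the coalition could form.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over all join orders of the coalition."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orders) for p, t in totals.items()}

# Hypothetical 2-player game: A alone is worth 10, B alone 20, together 50.
payoffs = {frozenset(): 0, frozenset("A"): 10,
           frozenset("B"): 20, frozenset("AB"): 50}
print(shapley_values(["A", "B"], payoffs.__getitem__))
# → {'A': 20.0, 'B': 30.0}  (the surplus of 20 is split evenly)
```

SHAP applies this same allocation scheme to a model's features: the "payoff" is the model's prediction, and each feature's Shapley value is its fair share of the deviation from the average prediction.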
I will be using a sample data set to briefly explain how we can use this technique to our advantage.
First, we fit a logistic regression on the sample dataset:
import statsmodels.api as sm

X = data3[x_columns]
X_test = X.iloc[1:550, :]
Y_test = data3.iloc[1:550, k - 1]

# Fit a logistic regression with an intercept term
logit_model = sm.Logit(Y_test, sm.add_constant(X_test)).fit()
We will now apply step-wise regression to find the best model using some or all of the features of the above dataset.
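The article's step-wise selection code is not shown, but a minimal sketch using scikit-learn's `SequentialFeatureSelector` (on a synthetic stand-in dataset, since the original data is not reproduced here) looks like this:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the article's dataset (which is not shown).
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=1)

# Forward step-wise selection: greedily add the feature that most
# improves cross-validated score until 3 features are chosen.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3, direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected columns
```

The selected mask can then be used to refit the final model on just those columns.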
We will now employ SHAP on our logistic regression model to figure out the most important features.
from sklearn.linear_model import LogisticRegression
import shap

model = LogisticRegression(random_state=1)
model.fit(X_train, Y_train)

# Perturb features independently when computing attributions
masker = shap.maskers.Independent(data=X_test)
explainer = shap.LinearExplainer(model, masker=masker)
shap_values = explainer(X_test)
There are various ways to visualize the output of the SHAP method.
As we can see, the features SHAP identifies as most important closely match those chosen by the best-subset model. In summary, SHAP is a useful tool with many more interesting applications. This article explores the vast possibilities of the library in more detail.
Connect with me on LinkedIn.
Some of my other articles can be found here, including earlier pieces on quasi-experiment designs such as Difference-in-Differences (DiD).