This tutorial introduces two popular techniques for interpreting machine learning models: SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations).
By the end of this tutorial, you will:
- Understand what SHAP and LIME are
- Know how to interpret a machine learning model using SHAP and LIME
- Have practical experience applying SHAP and LIME in Python
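A basic understanding of Python and machine learning models, plus familiarity with Python's data science stack (pandas, NumPy, scikit-learn), is required. The examples also use the shap and lime packages, which can be installed with pip install shap lime. Prior exposure to Jupyter Notebooks is helpful, since LIME's show_in_notebook method renders its output there.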
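Both code examples in this tutorial read a file named data.csv, which is not provided here. They assume it contains numeric feature columns and a column named target, and the LIME example further assumes a binary 0/1 target. If you do not have such a file, the sketch below writes a hypothetical synthetic stand-in using scikit-learn's make_classification. The contents (500 rows, six features named feature_0 to feature_5) are arbitrary assumptions chosen only so the examples can run.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Hypothetical synthetic dataset: 500 rows, 6 numeric features, binary 0/1 label
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_redundant=0, random_state=42)

# Assumed layout: one column per feature plus a 'target' column
data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
data["target"] = y

# Write it where the tutorial's examples expect to find it
data.to_csv("data.csv", index=False)
```

A binary target matches the class_names=['0', '1'] used in the LIME example; the SHAP example fits a regressor to the same 0/1 column, which still runs.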
SHAP connects optimal credit allocation with local explanations using the classic Shapley values from cooperative game theory and their related extensions.
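For each instance, it assigns every feature a contribution (its SHAP value), and these contributions add up to the difference between the model's prediction for that instance and the model's average (expected) prediction over a background dataset.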
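To make Shapley credit allocation concrete, here is a minimal brute-force sketch. A feature's Shapley value is its marginal contribution v(S ∪ {i}) - v(S), averaged over every coalition S of the other features with weight |S|!(n - |S| - 1)!/n!. The toy model, the input x, and the all-zeros baseline are hypothetical. As a simplification, features outside the coalition are set to a single baseline point, whereas SHAP averages over a background dataset. Enumerating all subsets takes exponential time, which is why the shap library relies on model-specific algorithms such as TreeExplainer or on sampling approximations.

```python
import itertools
import math

import numpy as np


def model(z):
    # Hypothetical "black box": linear terms plus an interaction of features 1 and 2
    return 2.0 * z[0] + 3.0 * z[1] * z[2] - 1.0 * z[3]


def value(coalition, x, baseline):
    # v(S): features in S take their values from x, all others from the baseline
    z = baseline.copy()
    idx = np.array(coalition, dtype=int)
    z[idx] = x[idx]
    return model(z)


def shapley_values(x, baseline):
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for S in itertools.combinations(others, size):
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (value(S + (i,), x, baseline) - value(S, x, baseline))
    return phi


x = np.array([1.0, 2.0, 3.0, 4.0])
baseline = np.zeros(4)
phi = shapley_values(x, baseline)
# phi is approximately [2, 9, 9, -4] and sums to model(x) - model(baseline) = 16
print(phi, phi.sum())
```

Because the interaction term 3·z1·z2 is symmetric, its credit of 18 is split equally between features 1 and 2, and the four values add up exactly to the gap between the prediction and the baseline output, illustrating SHAP's additivity property.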
LIME explains predictions of any classifier or regressor in a faithful way by approximating it locally with an interpretable model.
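It works by generating perturbed samples around the instance being explained, obtaining the model's predictions for those samples, and fitting a simple model (for example, a sparse linear model) to them, with each sample weighted by its proximity to the original instance.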
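The procedure can be sketched from scratch in a few lines. This is a simplified illustration, not the lime library's implementation: it samples Gaussian noise directly around the instance, whereas LimeTabularExplainer by default samples using training-data statistics and discretizes continuous features. The black_box function, noise scale, and kernel width are hypothetical choices. The proximity kernel follows the exponential form sqrt(exp(-d²/width²)) used by LIME, and the surrogate is scikit-learn's Ridge regression.

```python
import numpy as np
from sklearn.linear_model import Ridge


def black_box(X):
    # Hypothetical nonlinear model standing in for, e.g., a random forest
    return np.sin(X[:, 0]) + X[:, 1] ** 2


def lime_sketch(x, predict_fn, num_samples=5000, kernel_width=0.75, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb: draw samples around x with Gaussian noise (simplification)
    Z = x + rng.normal(0.0, scale, size=(num_samples, x.shape[0]))
    # 2. Query the black-box model on the perturbed samples
    y = predict_fn(Z)
    # 3. Weight each sample by proximity to x (LIME-style exponential kernel)
    d = np.linalg.norm(Z - x, axis=1)
    w = np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))
    # 4. Fit a weighted linear surrogate; its coefficients are the explanation
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z - x, y, sample_weight=w)
    return surrogate.coef_


x0 = np.array([0.0, 1.0])
coefs = lime_sketch(x0, black_box)
print(coefs)  # local feature weights around x0
```

Near x0 = (0, 1) the true local slopes are cos(0) = 1 and 2·1 = 2, and the fitted coefficients should come out close to these; the first is typically somewhat below 1 because the kernel still admits points where sin is visibly curved. For a classifier, predict_fn would return the probability of one class.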
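The following example trains a random forest regressor and explains its test-set predictions with SHAP:

```python
import shap
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np

# load dataset (assumes numeric features plus a column named 'target')
data = pd.read_csv('data.csv')
X = data.drop('target', axis=1)
y = data['target']

# split data: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# train model (fixed seed so results are reproducible)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# create an explainer specialised for tree ensembles
explainer = shap.TreeExplainer(model)

# calculate SHAP values: one row per test instance, one column per feature
shap_values = explainer.shap_values(X_test)

# plot the distribution of each feature's SHAP values across the test set
shap.summary_plot(shap_values, X_test)
```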
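The next example trains a random forest classifier and uses LIME to explain a single test prediction:

```python
import lime
from lime.lime_tabular import LimeTabularExplainer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

# load dataset (assumes numeric features plus a binary 'target' column)
data = pd.read_csv('data.csv')
X = data.drop('target', axis=1)
y = data['target']

# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# LIME passes NumPy arrays to the prediction function; wrap them in a
# DataFrame so the model sees the feature names it was trained with
def predict_proba(arr):
    return model.predict_proba(pd.DataFrame(arr, columns=X_train.columns))

# create explainer
explainer = LimeTabularExplainer(X_train.values,
                                 feature_names=X_train.columns.tolist(),
                                 class_names=[str(c) for c in model.classes_],
                                 verbose=True,
                                 mode='classification')

# explain a prediction using the 5 most influential features
exp = explainer.explain_instance(X_test.values[0], predict_proba, num_features=5)

# render the explanation inside a Jupyter notebook
exp.show_in_notebook(show_all=False)
# outside a notebook, print the (feature condition, weight) pairs instead
print(exp.as_list())
```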
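In this tutorial, we've covered how SHAP and LIME explain individual predictions of machine learning models, and how to apply them in Python: SHAP's TreeExplainer to a random forest regressor, and LIME's LimeTabularExplainer to a random forest classifier.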