ShapML.shap - Function
shap(explain::DataFrame,
reference::Union{DataFrame, Nothing} = nothing,
model,
predict_function::Function,
target_features::Union{Vector, Nothing} = nothing,
sample_size::Integer = 60,
parallel::Symbol = [:none, :samples, :features, :both],
seed::Integer = 1,
precision::Union{Integer, Nothing} = nothing,
chunk::Bool = true,
reconcile_instance::Bool = false
)
Compute stochastic feature-level Shapley values for any ML model.
Arguments
explain::DataFrame
: A DataFrame of model features with 1 or more instances to be explained using Shapley values.
reference
: Optional. A DataFrame with the same format as explain, which serves as a reference group against which the Shapley value deviations from explain are compared (i.e., the model intercept).
model
: A trained ML model that is passed into predict_function.
predict_function
: A wrapper function that takes 2 required positional arguments: (1) the trained model from model and (2) a DataFrame of instances with the same format as explain. The function should return a 1-column DataFrame of model predictions; the column name does not matter.
target_features
: Optional. An Array{String, 1} of model features that is a subset of feature names in explain for which Shapley values will be computed. For high-dimensional models, selecting a subset of features may dramatically speed up computation time. The default behavior is to return Shapley values for all instances and features in explain.
sample_size::Integer
: The number of Monte Carlo samples used to compute the stochastic Shapley values for each feature.
parallel::Union{Symbol, Nothing}
: One of [:none, :samples, :features, :both]. Whether to perform the calculation serially (:none), in parallel over Monte Carlo samples with pmap() (:samples), multi-threaded over target features with @threads (:features), or both (:both).
seed::Integer
: A number passed to Random.seed!() to get reproducible results.
precision::Union{Integer, Nothing}
: The number of digits to round() results to in the output (to reduce the size of the returned DataFrame).
chunk::Bool
: Default true. Increases speed on data with many instances and/or features. Calls the predict() function once per sample in sample_size instead of once per call to ShapML.shap().
reconcile_instance
: EXPERIMENTAL. For each instance in explain, the stochastic feature-level Shapley values are adjusted so that their sum equals the model prediction. The adjustments are based on feature-level sampling variances and are typically small compared to the model prediction.
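The predict_function contract described in the arguments above (model first, then a DataFrame of instances, returning a 1-column DataFrame) can be sketched as follows. This is a minimal illustration, not code from ShapML: toy_model and its fields are hypothetical stand-ins for a real trained model, and the column name y_pred is arbitrary.

```julia
using DataFrames

# Hypothetical stand-in for a trained model: coefficients of a linear predictor.
toy_model = (intercept = 1.0, coefs = [2.0, -1.0])

# Wrapper with the two required positional arguments: the model, then a
# DataFrame of instances. It returns a 1-column DataFrame of predictions.
function predict_function(model, data::DataFrame)
    X = Matrix{Float64}(data)                    # features in column order
    y_pred = model.intercept .+ X * model.coefs  # one prediction per row
    return DataFrame(y_pred = y_pred)
end

# Instances in the same format the model was "trained" on (two features).
explain = DataFrame(x1 = [0.0, 1.0, 2.0], x2 = [1.0, 1.0, 0.0])
predictions = predict_function(toy_model, explain)
```

A wrapper of this shape, together with the model it wraps, is what the model and predict_function arguments expect; for a real model the body would typically call that package's own prediction routine.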
Return
- A size(explain, 1) * length(target_features) row by 6 column DataFrame.
index
: An instance in explain.
feature_name
: Model feature.
feature_value
: Feature value.
shap_effect
: The average Shapley value across Monte Carlo samples.
shap_effect_sd
: The standard deviation of Shapley values across Monte Carlo samples.
intercept
: The average model prediction from explain or reference.
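To make the shap_effect, shap_effect_sd, and intercept columns more concrete, the sketch below estimates Shapley values by Monte Carlo sampling for a single instance, using only the Julia standard library. It follows the common permutation-sampling approach and is an assumption about the general technique, not ShapML's actual implementation; toy_predict and mc_shapley are hypothetical names, plain matrices replace DataFrames, and a local MersenneTwister replaces the global Random.seed!() call.

```julia
using Random, Statistics

# Hypothetical toy model, used only for illustration: linear in 3 features.
toy_predict(x::AbstractVector) = 2.0 * x[1] - 1.0 * x[2] + 0.5 * x[3]

# Monte Carlo Shapley estimate for feature j of instance x (illustrative sketch).
function mc_shapley(f, x::Vector{Float64}, reference::Matrix{Float64}, j::Integer;
                    sample_size::Integer = 60, seed::Integer = 1)
    rng = MersenneTwister(seed)                  # local RNG for reproducibility
    n_features = length(x)
    contributions = Vector{Float64}(undef, sample_size)
    for m in 1:sample_size
        z = reference[rand(rng, 1:size(reference, 1)), :]  # random reference instance
        order = randperm(rng, n_features)                   # random feature ordering
        upto_j = order[1:findfirst(==(j), order)]           # features up to and including j
        with_j = copy(z)
        with_j[upto_j] .= x[upto_j]                         # these features come from x
        without_j = copy(with_j)
        without_j[j] = z[j]                                 # feature j reverts to the reference
        contributions[m] = f(with_j) - f(without_j)
    end
    # Mean and standard deviation across the Monte Carlo samples.
    return (shap_effect = mean(contributions), shap_effect_sd = std(contributions))
end

reference = randn(MersenneTwister(42), 200, 3)   # synthetic reference group
x = [1.0, 2.0, 3.0]                              # instance to explain

# Intercept: the average model prediction over the reference group.
intercept = mean(toy_predict(reference[i, :]) for i in 1:size(reference, 1))
effects = [mc_shapley(toy_predict, x, reference, j; sample_size = 60, seed = 1) for j in 1:3]
```

Because the toy model is linear, each sampled contribution reduces to the coefficient times (x[j] - z[j]), so shap_effect_sd here reflects only the spread of the reference values. Increasing sample_size tightens the estimate at the cost of more model evaluations.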