Explain

Helpers around feature importance, SHAP, and early-stopping configuration.

from catboost_utils.explain import feature_importance, shap_values, check_early_stopping

fi = feature_importance(model, X)
sv = shap_values(model, X)
check_early_stopping(model, eval_set=(X_val, y_val))

feature_importance returns a sorted DataFrame with named features. Names are resolved in order: from the data, then the Pool, then model.feature_names_, with a positional fallback. DataFrame inputs are auto-promoted to Pool using model.get_cat_feature_indices(), so categorical columns don't cause the call to fail.
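The name-resolution order can be pictured as a plain fallback chain. The following is a minimal sketch, not the library's implementation: `resolve_feature_names` is a hypothetical helper, and it assumes duck-typed inputs (a `columns` attribute on DataFrame-like data, `get_feature_names()` on Pool-like data, and `feature_names_` on the model).

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def resolve_feature_names(model: Any, data: Optional[Any], n_features: int) -> List[str]:
    """Hypothetical sketch of the chain: data -> Pool -> model -> positional."""
    # 1. DataFrame-like input: use its column labels.
    columns = getattr(data, "columns", None)
    if columns is not None:
        return [str(c) for c in columns]

    # 2. Pool-like input: ask it for its feature names.
    get_names = getattr(data, "get_feature_names", None)
    if callable(get_names):
        names = get_names()
        if names is not None and len(names) > 0:
            return [str(n) for n in names]

    # 3. Names recorded on the fitted model.
    model_names = getattr(model, "feature_names_", None)
    if model_names is not None and len(model_names) > 0:
        return [str(n) for n in model_names]

    # 4. Positional fallback (the "feature_<i>" naming is an assumption).
    return [f"feature_{i}" for i in range(n_features)]


# Stand-in objects instead of a real model, so the sketch runs without CatBoost.
fitted = SimpleNamespace(feature_names_=["age", "income"])
print(resolve_feature_names(fitted, None, 2))             # ['age', 'income']
print(resolve_feature_names(SimpleNamespace(), None, 2))  # ['feature_0', 'feature_1']
```

Each step only runs when the previous source has nothing to offer, so names given explicitly with the data take precedence over names remembered by the model.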

shap_values returns a DataFrame with one column per feature plus expected_value. It pre-emptively detects the "non-zero approx for zero-weight leaf" misconfiguration and raises a clear CBXError. Multiclass SHAP is not yet supported; use the raw API for that.
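To make the output layout concrete: for non-multiclass models, CatBoost's raw SHAP output has one value per feature in each row, followed by the expected value in the last position. The sketch below labels such a matrix. `label_shap_matrix` is a hypothetical helper, the numbers are made up for illustration, and a column-oriented dict stands in for the DataFrame so the example needs no third-party packages.

```python
from typing import Dict, List, Sequence


def label_shap_matrix(
    raw: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> Dict[str, List[float]]:
    """Hypothetical sketch: name the columns of an (n_rows, n_features + 1) matrix."""
    columns = list(feature_names) + ["expected_value"]
    for row in raw:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values per row, got {len(row)}")
    # Column-oriented mapping; pd.DataFrame(result) would turn it into a named table.
    return {name: [float(row[i]) for row in raw] for i, name in enumerate(columns)}


# Illustrative values only, not output from a real model.
raw = [[0.25, -0.5, 1.0], [-0.25, 0.5, 1.0]]
labelled = label_shap_matrix(raw, ["age", "income"])

# Per-row contributions plus expected_value add up to the row's raw prediction.
row_totals = [sum(col[i] for col in labelled.values()) for i in range(len(raw))]
print(list(labelled))  # ['age', 'income', 'expected_value']
print(row_totals)      # [0.75, 1.25]
```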

check_early_stopping raises CBXError when any of od_type, od_wait, od_pval, or early_stopping_rounds is set without an eval_set.
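The rule can be sketched as a check over a parameter dictionary. This is an illustration under stated assumptions rather than the library's code: `check_params` is a hypothetical helper, the local `CBXError` class is a stand-in so the example runs on its own, and the parameters are assumed to arrive as a plain mapping (for instance, what a model's `get_params()` returns).

```python
from typing import Any, Mapping, Optional

EARLY_STOPPING_PARAMS = ("od_type", "od_wait", "od_pval", "early_stopping_rounds")


class CBXError(Exception):
    """Stand-in for the library's error type, defined here so the sketch runs alone."""


def check_params(params: Mapping[str, Any], eval_set: Optional[Any] = None) -> None:
    """Hypothetical sketch: reject early-stopping settings that have no eval_set."""
    configured = [name for name in EARLY_STOPPING_PARAMS if params.get(name) is not None]
    if configured and eval_set is None:
        raise CBXError(
            f"{', '.join(configured)} set without an eval_set; "
            "early stopping needs validation data to monitor."
        )


check_params({"iterations": 500})  # no early-stopping parameters: passes silently

try:
    check_params({"od_type": "Iter", "od_wait": 20})
except CBXError as exc:
    message = str(exc)
    print(message)
```

Running the check before fit() surfaces the problem as a configuration error up front instead of a model that silently trains for the full number of iterations.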

catboost_utils.explain.importance.feature_importance

feature_importance(
    model: Any,
    data: Any | None = None,
    *,
    type: ImportanceType = "PredictionValuesChange",
    prettified: bool = True,
) -> pd.DataFrame | npt.NDArray[Any]

Return feature importances with feature names attached.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Any | a fitted CatBoost model (or CBXClassifier/CBXRegressor). | required |
| data | Any \| None | pd.DataFrame, catboost.Pool, or None (for types that don't require it). "ShapValues", "LossFunctionChange", "Interaction" require non-None data and will raise CBXError if missing. | None |
| type | ImportanceType | CatBoost importance type. | 'PredictionValuesChange' |
| prettified | bool | when True (default), return a sorted pd.DataFrame with columns ["feature", "importance"]. When False, return the raw ndarray from CatBoost. | True |

Returns:

| Type | Description |
| --- | --- |
| DataFrame \| NDArray[Any] | DataFrame or ndarray depending on prettified. |
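The difference between the two return shapes can be sketched as follows. `prettify_importances` is a hypothetical helper, a list of row dicts stands in for the DataFrame, and the descending sort order is an assumption, since the text above only says the result is sorted.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, float]]


def prettify_importances(values: Sequence[float], names: Sequence[str]) -> List[Row]:
    """Hypothetical sketch: pair raw importances with names and sort them."""
    if len(values) != len(names):
        raise ValueError("values and names must have the same length")
    rows: List[Row] = [
        {"feature": str(name), "importance": float(value)}
        for name, value in zip(names, values)
    ]
    # Assumption: most important feature first.
    return sorted(rows, key=lambda row: row["importance"], reverse=True)


# Illustrative raw array, in feature order (what prettified=False would keep as-is).
raw = [12.5, 60.0, 27.5]
rows = prettify_importances(raw, ["age", "income", "tenure"])
print([row["feature"] for row in rows])  # ['income', 'tenure', 'age']
```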

Raises:

| Type | Description |
| --- | --- |
| CBXError | when data is required but not provided, or when the underlying CatBoost call fails for a reason we recognize. |
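The "data is required" case can be sketched as a guard that runs before any CatBoost call. `require_data` is a hypothetical helper and the local `CBXError` is a stand-in for the library's exception. The set of data-dependent types is taken from the parameter description for data above.

```python
from typing import Any, Optional

# Importance types that the data parameter description lists as needing data.
DATA_REQUIRED_TYPES = frozenset({"ShapValues", "LossFunctionChange", "Interaction"})


class CBXError(Exception):
    """Stand-in for the library's error type, defined here so the sketch runs alone."""


def require_data(importance_type: str, data: Optional[Any]) -> None:
    """Hypothetical sketch: fail early when a data-dependent type gets no data."""
    if importance_type in DATA_REQUIRED_TYPES and data is None:
        raise CBXError(
            f"type={importance_type!r} needs data (a DataFrame or Pool), got None"
        )


require_data("PredictionValuesChange", None)  # fine: this type works without data

try:
    require_data("LossFunctionChange", None)
except CBXError as exc:
    error_text = str(exc)
    print(error_text)
```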

catboost_utils.explain.shap.shap_values

shap_values(model: Any, data: Any) -> pd.DataFrame

Compute SHAP values and return them as a named DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Any | a fitted CatBoost / CBX model. | required |
| data | Any | pd.DataFrame or catboost.Pool: the rows to explain. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with one column per feature plus expected_value (the model bias / base value) as the last column. Rows correspond to the input rows. |

Raises:

| Type | Description |
| --- | --- |
| CBXError | if data is None, if the model is multiclass (not supported in v0.1), or if CatBoost reports the well-known "non-zero approx for zero-weight leaf" misconfiguration. |

catboost_utils.explain.early_stopping.check_early_stopping

check_early_stopping(
    model: Any, eval_set: Any | None = None
) -> None

Validate early-stopping configuration before fit().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Any | a CatBoost / CBX model instance. | required |
| eval_set | Any \| None | the eval_set you intend to pass to fit(). | None |

Raises:

| Type | Description |
| --- | --- |
| CBXError | when any early-stopping parameter is set but eval_set is missing. |