Explain

Helpers around feature importance, SHAP, and early-stopping configuration.

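```python
from catboost_utils.explain import feature_importance, shap_values, check_early_stopping

# model is a fitted CatBoost / CBX model; X is a pd.DataFrame or catboost.Pool.
fi = feature_importance(model, X)   # sorted DataFrame of named importances
sv = shap_values(model, X)          # per-feature SHAP columns plus expected_value
# Intended to run before fit(): raises CBXError if early stopping is set without an eval_set.
check_early_stopping(model, eval_set=(X_val, y_val))
```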

`feature_importance` returns a sorted DataFrame with named features. Names are resolved in order: data → Pool → `model.feature_names_` → positional fallback. DataFrame inputs are automatically converted to a `Pool` using `model.get_cat_feature_indices()`, so categorical columns do not cause the CatBoost call to fail.
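The name-resolution order described above can be sketched as a simple fallback chain. This is only an illustration under stated assumptions, not the library's actual code: the function name `resolve_feature_names`, the `f{i}` positional naming scheme, and the stand-in classes are all hypothetical. It relies only on duck typing (`columns` for DataFrame-like inputs, `get_feature_names()` for Pool-like inputs, `feature_names_` on the model).

```python
from typing import Any


def resolve_feature_names(model: Any, data: Any, n_features: int) -> list[str]:
    """Hypothetical fallback chain: data -> Pool -> model -> positional."""
    # 1. DataFrame-like input: use its column labels.
    columns = getattr(data, "columns", None)
    if columns is not None:
        return [str(c) for c in columns]

    # 2. Pool-like input: ask it for its feature names.
    get_names = getattr(data, "get_feature_names", None)
    if callable(get_names):
        pool_names = get_names()
        if pool_names:
            return [str(n) for n in pool_names]

    # 3. Fitted model: use the names recorded at training time.
    model_names = getattr(model, "feature_names_", None)
    if model_names:
        return [str(n) for n in model_names]

    # 4. Positional fallback (the "f{i}" scheme is an assumption).
    return [f"f{i}" for i in range(n_features)]


# Minimal stand-ins so the sketch runs without CatBoost or pandas installed.
class FakeFrame:
    columns = ["age", "city"]


class FakePool:
    def get_feature_names(self) -> list[str]:
        return ["p0", "p1"]


class FakeModel:
    feature_names_ = ["m0", "m1"]


class BareModel:
    pass


print(resolve_feature_names(FakeModel(), FakeFrame(), 2))
print(resolve_feature_names(FakeModel(), FakePool(), 2))
print(resolve_feature_names(FakeModel(), None, 2))
print(resolve_feature_names(BareModel(), None, 3))
```

Each source is tried only when the previous one yields nothing, so names supplied with the data take precedence over names stored on the model.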

`shap_values` returns a DataFrame with one column per feature plus `expected_value`. It pre-emptively detects the "non-zero approx for zero-weight leaf" misconfiguration and raises a clear `CBXError`. Multiclass SHAP is not yet supported; use the raw CatBoost API for that.
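To make the returned layout concrete, the sketch below labels raw SHAP rows the same way, using plain lists and dicts so it runs without CatBoost or pandas. The helper names `label_shap_rows` and `raw_prediction` and the sample numbers are hypothetical. It assumes the raw matrix has one value per feature followed by the expected value, and uses the additive property of SHAP values: the contributions plus the expected value sum to the model's raw (untransformed) prediction for that row.

```python
from typing import Sequence


def label_shap_rows(
    raw_rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
) -> list[dict[str, float]]:
    """Attach names to raw SHAP rows; the last value is the expected value."""
    columns = [*feature_names, "expected_value"]
    labelled = []
    for i, row in enumerate(raw_rows):
        if len(row) != len(columns):
            raise ValueError(
                f"row {i} has {len(row)} values, expected {len(columns)}"
            )
        labelled.append(dict(zip(columns, row)))
    return labelled


def raw_prediction(record: dict[str, float]) -> float:
    """Sum of all contributions plus expected_value (additivity of SHAP)."""
    return sum(record.values())


# Made-up numbers for two rows and two features.
rows = label_shap_rows(
    [[0.5, -0.25, 1.0], [-0.5, 0.25, 1.0]],
    ["age", "income"],
)
print(rows[0])
print(raw_prediction(rows[0]), raw_prediction(rows[1]))
```

For classifiers the reconstructed value is on the raw scale (for example log-odds), not a probability.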

`check_early_stopping` raises `CBXError` when any of `od_type`, `od_wait`, `od_pval`, or `early_stopping_rounds` is set without an `eval_set`.
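The rule above amounts to a small parameter check. The following is a minimal sketch of that idea, not the library's implementation: `ConfigError` stands in for `CBXError`, and `find_early_stopping_params` and `check_params` are hypothetical names. It takes a plain parameter dict; with a real CatBoost model, `model.get_params()` would be one way to obtain it.

```python
from typing import Any, Mapping, Optional

# The four parameters named in the description above.
EARLY_STOPPING_KEYS = ("od_type", "od_wait", "od_pval", "early_stopping_rounds")


class ConfigError(ValueError):
    """Stand-in for CBXError so the sketch is self-contained."""


def find_early_stopping_params(params: Mapping[str, Any]) -> list[str]:
    """Return the early-stopping parameters that are explicitly set."""
    return [k for k in EARLY_STOPPING_KEYS if params.get(k) is not None]


def check_params(params: Mapping[str, Any], eval_set: Optional[Any] = None) -> None:
    """Raise if early stopping is configured but there is no eval_set."""
    offending = find_early_stopping_params(params)
    if offending and eval_set is None:
        raise ConfigError(
            f"{', '.join(offending)} set without an eval_set; "
            "early stopping needs validation data to monitor."
        )


check_params({"iterations": 100})                           # nothing set: passes
check_params({"od_wait": 20}, eval_set=([[1.0]], [0]))      # eval_set given: passes
try:
    check_params({"od_wait": 20, "od_type": "Iter"})
except ConfigError as exc:
    print(exc)
```

Failing before `fit()` surfaces the problem immediately instead of after a training run in which the early-stopping settings never had validation data to act on.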

catboost_utils.explain.importance.feature_importance

```python
feature_importance(
    model: Any,
    data: Any | None = None,
    *,
    type: ImportanceType = "PredictionValuesChange",
    prettified: bool = True,
) -> pd.DataFrame | npt.NDArray[Any]
```

Return feature importances with feature names attached.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Any` | A fitted CatBoost model (or `CBXClassifier`/`CBXRegressor`). | required |
| `data` | `Any \| None` | `pd.DataFrame`, `catboost.Pool`, or `None` (for types that don't require it). The `"ShapValues"`, `"LossFunctionChange"`, and `"Interaction"` types require non-`None` data; `CBXError` is raised if it is missing. | `None` |
| `type` | `ImportanceType` | CatBoost importance type. | `'PredictionValuesChange'` |
| `prettified` | `bool` | When `True` (default), return a sorted `pd.DataFrame` with columns `["feature", "importance"]`. When `False`, return the raw `ndarray` from CatBoost. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame \| NDArray[Any]` | A `pd.DataFrame` when `prettified` is `True`, otherwise the raw `ndarray`. |

Raises:

| Type | Description |
| --- | --- |
| `CBXError` | When `data` is required but not provided, or when the underlying CatBoost call fails for a reason we recognize. |
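The `data` requirement and the prettified output shape can be illustrated with a short, dependency-free sketch. The names `DataRequiredError` (a stand-in for `CBXError`), `require_data`, and `prettify` are hypothetical, and sorting in descending order of importance is an assumption, since the reference only says the result is sorted.

```python
from typing import Any, Optional, Sequence

# Importance types that the parameter table says need non-None data.
DATA_REQUIRED_TYPES = frozenset({"ShapValues", "LossFunctionChange", "Interaction"})


class DataRequiredError(ValueError):
    """Stand-in for CBXError so the sketch is self-contained."""


def require_data(importance_type: str, data: Optional[Any]) -> None:
    """Raise early when the chosen importance type cannot work without data."""
    if importance_type in DATA_REQUIRED_TYPES and data is None:
        raise DataRequiredError(f"type={importance_type!r} requires non-None data")


def prettify(names: Sequence[str], importances: Sequence[float]) -> list[dict[str, Any]]:
    """Pair names with importances and sort (descending order is an assumption)."""
    rows = [{"feature": n, "importance": v} for n, v in zip(names, importances)]
    rows.sort(key=lambda r: r["importance"], reverse=True)
    return rows


require_data("PredictionValuesChange", None)  # allowed without data
table = prettify(["age", "city", "income"], [12.5, 50.0, 37.5])
for row in table:
    print(row["feature"], row["importance"])
```

Each dict in `table` corresponds to one row of the `["feature", "importance"]` frame described for `prettified=True`.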

catboost_utils.explain.shap.shap_values

```python
shap_values(model: Any, data: Any) -> pd.DataFrame
```

Compute SHAP values and return them as a named DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Any` | A fitted CatBoost / CBX model. | required |
| `data` | `Any` | `pd.DataFrame` or `catboost.Pool` containing the rows to explain. | required |

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | DataFrame with one column per feature plus `expected_value` (the model bias / base value) as the last column. Rows correspond to the input rows. |

Raises:

| Type | Description |
| --- | --- |
| `CBXError` | If `data` is `None`, if the model is multiclass (not supported in v0.1), or if CatBoost reports the well-known "non-zero approx for zero-weight leaf" misconfiguration. |

catboost_utils.explain.early_stopping.check_early_stopping

```python
check_early_stopping(
    model: Any, eval_set: Any | None = None
) -> None
```

Validate early-stopping configuration before fit().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Any` | A CatBoost / CBX model instance. | required |
| `eval_set` | `Any \| None` | The `eval_set` you intend to pass to `fit()`. | `None` |

Raises:

| Type | Description |
| --- | --- |
| `CBXError` | When any early-stopping parameter is set but `eval_set` is missing. |