Validation

catboost_utils.validation runs cheap pre-flight checks against your training data, returning a structured report you can inspect or assert on.

from catboost_utils import validate

report = validate(
    X, y,
    cat_features=["city"],
    eval_set=(X_val, y_val),
    model_params={"task_type": "GPU"},
)
report.ok           # bool
report.issues       # blocking
report.warnings     # non-blocking
report.raise_if_failed()

Blocking issues

  • empty DataFrame
  • target with NaN or only one unique value
  • categorical features with float dtype
  • NaN in object-dtype columns not declared as categorical
  • inf / -inf in numerical columns
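
The blocking conditions above map onto a handful of pandas checks. The following is a minimal sketch, not the catboost_utils implementation: find_blocking_issues and its message strings are hypothetical names chosen for illustration, and only standard pandas / NumPy calls are used.

```python
import numpy as np
import pandas as pd


def find_blocking_issues(X, y=None, cat_features=None):
    """Return messages for the blocking conditions listed above.

    Hypothetical stand-in for illustration; not catboost_utils' own code.
    """
    cat_features = list(cat_features or [])
    issues = []

    if X.empty:
        # Nothing else is worth checking on an empty frame.
        return ["empty DataFrame"]

    if y is not None:
        if y.isna().any():
            issues.append("target contains NaN")
        if y.nunique(dropna=True) <= 1:
            issues.append("target has only one unique value")

    # Declared categorical columns must not be float-typed.
    for col in cat_features:
        if pd.api.types.is_float_dtype(X[col]):
            issues.append(f"categorical feature {col!r} has float dtype")

    # Object columns with NaN are only acceptable when declared categorical.
    for col in X.columns:
        if col in cat_features:
            continue
        if pd.api.types.is_object_dtype(X[col]) and X[col].isna().any():
            issues.append(f"object column {col!r} has NaN but is not declared categorical")

    # inf / -inf in numerical columns.
    numeric = X.select_dtypes(include="number")
    for col in numeric.columns:
        if np.isinf(numeric[col].to_numpy(dtype="float64")).any():
            issues.append(f"numeric column {col!r} contains inf or -inf")

    return issues


X = pd.DataFrame({
    "city": pd.Series(["Oslo", None, "Rome"], dtype=object),
    "zone": [1.0, 2.0, 3.0],
    "price": [10.0, np.inf, 12.0],
})
y = pd.Series([1, 1, 1])
issues = find_blocking_issues(X, y, cat_features=["zone"])
```

With this toy data the sketch reports four problems: a single-valued target, a float-typed categorical feature, an undeclared object column with NaN, and an infinite value in price.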

Warnings

  • undeclared object/category/string/bool columns
  • low-cardinality int columns (likely categorical)
  • NaN ratio above 50%
  • constant columns
  • datetime / timedelta dtype
  • column mismatch between train and eval
  • both class_weights and auto_class_weights set
  • task_type='GPU' or thread_count > 1 (results may not be bitwise reproducible)
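
Several of the warnings above can be approximated in a few lines of pandas. This is a rough sketch covering only a subset of the list (low-cardinality ints, NaN ratio, constant columns, train/eval column mismatch, and the two model_params checks). find_warnings is a hypothetical helper, and the thresholds max_int_cardinality=10 and nan_ratio_threshold=0.5 are assumed values, not the library's documented defaults.

```python
import pandas as pd


def find_warnings(X, cat_features=None, eval_X=None, model_params=None,
                  max_int_cardinality=10, nan_ratio_threshold=0.5):
    """Collect a subset of the non-blocking warnings listed above.

    Hypothetical helper; both thresholds are assumed values.
    """
    declared = set(cat_features or [])
    params = model_params or {}
    found = []

    for col in X.columns:
        series = X[col]
        # Integer columns with few distinct values are often categories in disguise.
        if (pd.api.types.is_integer_dtype(series) and col not in declared
                and series.nunique() <= max_int_cardinality):
            found.append(f"{col}: low-cardinality int column, possibly categorical")
        # isna().mean() is the fraction of missing values in the column.
        if series.isna().mean() > nan_ratio_threshold:
            found.append(f"{col}: NaN ratio above {nan_ratio_threshold:.0%}")
        # A single distinct value (counting NaN as a value) carries no signal.
        if series.nunique(dropna=False) <= 1:
            found.append(f"{col}: constant column")

    if eval_X is not None and list(eval_X.columns) != list(X.columns):
        found.append("column mismatch between train and eval")

    if (params.get("class_weights") is not None
            and params.get("auto_class_weights") is not None):
        found.append("both class_weights and auto_class_weights set")

    threads = params.get("thread_count") or 1
    if params.get("task_type") == "GPU" or threads > 1:
        found.append("GPU or multi-threaded training may not be bitwise reproducible")

    return found


X = pd.DataFrame({
    "rooms": [1, 2, 3, 2, 1, 3],
    "area": [50.5, None, None, None, None, 71.0],
    "country": ["NO"] * 6,
    "price": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
})
found = find_warnings(
    X,
    eval_X=X.drop(columns=["price"]),
    model_params={"task_type": "GPU"},
)
```

None of these conditions stops training, which is why they belong in a separate list from the blocking issues: the caller decides whether to act on them.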

catboost_utils.validation.runner.validate

validate(
    X: DataFrame,
    y: Series | None = None,
    *,
    cat_features: list[str | int] | None = None,
    eval_set: tuple[DataFrame, Any] | None = None,
    model_params: dict[str, Any] | None = None,
) -> ValidationReport

Run pre-flight checks against a training DataFrame.

Parameters:

Name          Type                          Default   Description
X             DataFrame                     required  Feature DataFrame (a catboost.Pool raises NotImplementedError).
y             Series | None                 None      Target series. Optional; when None, target checks are skipped.
cat_features  list[str | int] | None        None      List of categorical column names or positional indices.
eval_set      tuple[DataFrame, Any] | None  None      Optional (X_eval, y_eval) tuple, used to check column alignment.
model_params  dict[str, Any] | None         None      Optional dict of CatBoost params for cross-cutting warnings (class_weights conflict, GPU / multi-thread reproducibility).
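
Because cat_features accepts both column names and positional indices, a caller-side helper can normalise the list to names before comparing it with the DataFrame. The sketch below shows one way to do that. resolve_cat_features is a hypothetical function, it assumes string column names, and its error behaviour (IndexError / KeyError on bad entries) is an illustrative choice rather than documented library behaviour.

```python
def resolve_cat_features(columns, cat_features):
    """Map a mix of column names and positional indices to column names.

    Hypothetical helper; assumes the columns themselves are named with strings.
    """
    columns = list(columns)
    resolved = []
    for item in cat_features or []:
        if isinstance(item, int):
            # Positional index: must point at an existing column.
            if not 0 <= item < len(columns):
                raise IndexError(f"cat_features index {item} is out of range")
            name = columns[item]
        elif item in columns:
            name = item
        else:
            raise KeyError(f"cat_features column {item!r} not found")
        # Keep first occurrence only, preserving the caller's order.
        if name not in resolved:
            resolved.append(name)
    return resolved


names = resolve_cat_features(["city", "age", "plan"], ["city", 2, "plan"])
```

Here names comes out as ["city", "plan"], since index 2 and "plan" refer to the same column.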

Returns:

Type              Description
ValidationReport  ValidationReport with ok, issues, warnings.

Raises:

Type                 Description
NotImplementedError  When X is a catboost.Pool.
TypeError            When X is not a DataFrame.

catboost_utils.validation.models.ValidationReport

Bases: BaseModel

Full result of validate().

raise_if_failed

raise_if_failed() -> None

Raise ValidationError if there are any blocking issues.
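
To make the contract of raise_if_failed concrete, here is a small stand-in built on dataclasses rather than the real pydantic models. Report and the local ValidationError class are hypothetical. The sketch also assumes that ok simply means "no blocking issues", which the reference above implies but does not state.

```python
from dataclasses import dataclass, field


class ValidationError(Exception):
    """Local stand-in for the error raised when blocking issues exist."""


@dataclass
class Report:
    """Simplified, hypothetical stand-in for ValidationReport."""

    issues: list[str] = field(default_factory=list)    # blocking
    warnings: list[str] = field(default_factory=list)  # non-blocking

    @property
    def ok(self) -> bool:
        # Assumption: warnings alone never make a report fail.
        return not self.issues

    def raise_if_failed(self) -> None:
        if self.issues:
            raise ValidationError("; ".join(self.issues))


clean = Report(warnings=["country: constant column"])
clean.raise_if_failed()  # no exception: warnings do not block

failed = Report(issues=["empty DataFrame"])
try:
    failed.raise_if_failed()
except ValidationError as exc:
    message = str(exc)
```

Calling raise_if_failed() right after validation works as a one-line guard in a training script, while report.warnings can be logged and reviewed separately.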

catboost_utils.validation.models.ValidationIssue

Bases: BaseModel

Blocking problem — training will (almost certainly) fail.

catboost_utils.validation.models.ValidationWarning

Bases: BaseModel

Non-blocking concern — training proceeds but the user should be aware.