CvModelContainer¶

CvModelContainer(
    model_config: BaseMlModelConfig[T, U, V, W],
    k_fold: _BaseKFold | SplittedDatasetsIndices,
)

Cross-validation model container for machine learning models.

The container takes a model configuration and a cross-validation splitter, and provides methods for training models, making predictions, and evaluating them.

Parameters:

Name	Type	Description	Default
`model_config`	`BaseMlModelConfig`	The model configuration, which includes the learner, predictor, training configuration, and prediction configuration.	required
`k_fold`	`_BaseKFold or SplittedDatasetsIndices`	The cross-validation splitter, which can be either a scikit-learn _BaseKFold object or a SplittedDatasetsIndices object. If _BaseKFold is specified, the same indices will be used for both validation and test data. To specify custom indices without such constraints, use SplittedDatasetsIndices.	required
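
To make the difference between the two splitter options concrete, the sketch below builds fold indices in plain Python. It is illustrative only: the `Fold` tuple and both helper functions are hypothetical names introduced here, and they do not show how `SplittedDatasetsIndices` is actually constructed in the library.

```python
from typing import NamedTuple


class Fold(NamedTuple):
    """Hypothetical container for one split (not a factrainer type)."""

    train: list[int]
    val: list[int]
    test: list[int]


def shared_holdout_folds(n_samples: int, n_splits: int) -> list[Fold]:
    """Mimic the _BaseKFold behaviour described above: the held-out
    fold serves as both validation and test data."""
    indices = list(range(n_samples))
    folds = []
    for k in range(n_splits):
        held_out = indices[k::n_splits]
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        folds.append(Fold(train=train, val=held_out, test=held_out))
    return folds


def separate_val_test_folds(n_samples: int, n_splits: int) -> list[Fold]:
    """Custom splits: validate on one fold and test on the next one.
    Needs n_splits >= 3 so that some training data remains."""
    indices = list(range(n_samples))
    parts = [indices[k::n_splits] for k in range(n_splits)]
    folds = []
    for k in range(n_splits):
        val = parts[k]
        test = parts[(k + 1) % n_splits]
        excluded = set(val) | set(test)
        train = [i for i in indices if i not in excluded]
        folds.append(Fold(train=train, val=val, test=test))
    return folds


shared = shared_holdout_folds(n_samples=8, n_splits=4)
custom = separate_val_test_folds(n_samples=8, n_splits=4)
```

In the first layout the validation and test indices of every fold are identical, which is the constraint the `_BaseKFold` option imposes. The second layout keeps them disjoint, which is the kind of split that custom indices make possible.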

Examples:

>>> import lightgbm as lgb
>>> from sklearn.datasets import make_regression
>>> from sklearn.metrics import r2_score
>>> from sklearn.model_selection import KFold
>>> from factrainer.core import CvModelContainer, EvalMode
>>> from factrainer.lightgbm import LgbDataset, LgbModelConfig, LgbTrainConfig
>>>
>>> # Load data
>>> X, y = make_regression()
>>> dataset = LgbDataset(dataset=lgb.Dataset(X, label=y))
>>>
>>> # Configure model
>>> config = LgbModelConfig.create(
...     train_config=LgbTrainConfig(
...         params={"objective": "regression", "verbose": -1},
...         callbacks=[lgb.early_stopping(100, verbose=False)],
...     ),
... )
>>>
>>> # Set up cross-validation
>>> k_fold = KFold(n_splits=4, shuffle=True, random_state=1)
>>>
>>> # Create and train model
>>> model = CvModelContainer(config, k_fold)
>>> model.train(dataset, n_jobs=4)
>>>
>>> # Get OOF predictions
>>> y_pred = model.predict(dataset, n_jobs=4)
>>>
>>> # Evaluate predictions
>>> metric = model.evaluate(y, y_pred, r2_score)
>>>
>>> # Or get per-fold metrics
>>> metrics = model.evaluate(y, y_pred, r2_score, eval_mode=EvalMode.FOLD_WISE)

Attributes¶

raw_model `property` ¶

raw_model: RawModels[U]

Get the raw models from cross-validation.

Returns:

Type	Description
`RawModels[U]`	The raw models as a RawModels object.

train_config `property` `writable` ¶

train_config: V

Get the training configuration.

Returns:

Type	Description
`V`	The training configuration.

pred_config `property` `writable` ¶

pred_config: W

Get the prediction configuration.

Returns:

Type	Description
`W`	The prediction configuration.

cv_indices `property` ¶

cv_indices: SplittedDatasetsIndices

Get the cross-validation split indices after training.

This property returns the cross-validation split indices that are stored in the instance after the train method is executed.

Returns:

Type	Description
`SplittedDatasetsIndices`	The cross-validation split indices.

k_fold `property` ¶

k_fold: _BaseKFold | SplittedDatasetsIndices

Get the cross-validation splitter.

Returns:

Type	Description
`_BaseKFold or SplittedDatasetsIndices`	The cross-validation splitter.

Functions¶

train ¶

train(train_dataset: T, n_jobs: int | None = None) -> None

Train the model using cross-validation.

Training follows the splits defined by the cross-validation splitter. The trained models can be accessed through the raw_model property.

Parameters:

Name	Type	Description	Default
`train_dataset`	`T`	The training dataset.	required
`n_jobs`	`int or None`	The number of jobs to run in parallel. If -1, all CPUs are used. If None, no parallel processing is used. Default is None.	`None`

Returns:

Type	Description
`None`

predict ¶

predict(
    pred_dataset: T,
    n_jobs: int | None = None,
    mode: PredMode = OOF_PRED,
) -> Prediction

Make predictions using the trained models.

Two prediction modes are supported:

- Out-of-fold (OOF) predictions: predictions for the training data, each made by a model trained on the other folds.
- Averaging ensemble predictions: predictions averaged over all trained models. Use this mode ONLY for unseen data (test data); applying it to training data would lead to data leakage.
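
The two modes can be illustrated with a toy example in plain Python. This is a conceptual sketch, not the library's implementation: the "model" here is simply the mean of its training targets, and `fit_mean_model` and the fold layout are hypothetical.

```python
from statistics import mean


def fit_mean_model(targets: list[float]) -> float:
    """Toy 'model': it always predicts the mean of its training targets."""
    return mean(targets)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Two folds, each given as (train_indices, held_out_indices).
folds = [([3, 4, 5], [0, 1, 2]), ([0, 1, 2], [3, 4, 5])]

# One model per fold, trained only on that fold's training rows.
models = [fit_mean_model([y[i] for i in train]) for train, _ in folds]

# Out-of-fold: each training row is predicted by the model that never saw it.
oof_pred = [0.0] * len(y)
for model, (_, held_out) in zip(models, folds):
    for i in held_out:
        oof_pred[i] = model

# Averaging ensemble: every model predicts each unseen row and the
# predictions are averaged. The toy model ignores features, so the
# result is the same constant for every unseen row.
n_unseen = 2
ensemble_pred = [mean(models) for _ in range(n_unseen)]
```

Averaging all models on a training row would include the model that was fitted on that row, which is why the ensemble mode is reserved for unseen data.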

Parameters:

Name	Type	Description	Default
`pred_dataset`	`T`	The dataset to make predictions for.	required
`n_jobs`	`int or None`	The number of jobs to run in parallel. If -1, all CPUs are used. If None, no parallel processing is used. Default is None.	`None`
`mode`	`PredMode`	The prediction mode. Can be either PredMode.OOF_PRED for out-of-fold predictions or PredMode.AVG_ENSEMBLE for averaging ensemble predictions. Default is PredMode.OOF_PRED.	`OOF_PRED`

Returns:

Type	Description
`Prediction`	The predictions as a NumPy array.

Raises:

Type	Description
`ValueError`	If the prediction mode is invalid.

evaluate ¶

evaluate(
    y_true: Target,
    y_pred: Prediction,
    eval_func: EvalFunc[X],
    eval_mode: Literal[POOLING] = POOLING,
) -> X

evaluate(
    y_true: Target,
    y_pred: Prediction,
    eval_func: EvalFunc[X],
    eval_mode: Literal[FOLD_WISE],
) -> Sequence[X]

evaluate(
    y_true: Target,
    y_pred: Prediction,
    eval_func: EvalFunc[X],
    eval_mode: EvalMode = POOLING,
) -> X | Sequence[X]

Evaluate the model's predictions against true values.

This method evaluates predictions from cross-validation models. The predictions can be either out-of-fold (OOF) predictions or predictions on unseen data (held-out test set).

Parameters:

Name	Type	Description	Default
`y_true`	`Target`	The true target values as a NumPy array.	required
`y_pred`	`Prediction`	The predicted values as a NumPy array. Must have the same shape as y_true. These can be either out-of-fold predictions from predict(mode=PredMode.OOF_PRED) or predictions on unseen data from predict(mode=PredMode.AVG_ENSEMBLE).	required
`eval_func`	`EvalFunc[X]`	The evaluation function that takes (y_true, y_pred) and returns a metric. Common examples include sklearn.metrics functions such as r2_score and mean_absolute_error.	required
`eval_mode`	`EvalMode`	The evaluation mode. EvalMode.POOLING computes a single metric across all predictions (standard for both OOF evaluation and held-out test set evaluation). EvalMode.FOLD_WISE computes a metric for each fold separately (useful for analyzing per-fold performance in OOF predictions).	`EvalMode.POOLING`

Returns:

Type	Description
`X \| Sequence[X]`	If eval_mode is POOLING, returns a single evaluation score of type X. If eval_mode is FOLD_WISE, returns a list of evaluation scores per fold.
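
The difference between the two evaluation modes can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: the local `mean_absolute_error` stands in for any `eval_func`, and `fold_indices` is a hypothetical list of held-out indices per fold.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Stand-in for an eval_func taking (y_true, y_pred)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 3.0, 3.0, 8.0]

# Hypothetical held-out indices for each of two folds.
fold_indices = [[0, 1], [2, 3]]

# Pooling: one metric computed over all predictions at once.
pooled = mean_absolute_error(y_true, y_pred)

# Fold-wise: one metric per fold, computed on that fold's rows only.
fold_wise = [
    mean_absolute_error([y_true[i] for i in idx], [y_pred[i] for i in idx])
    for idx in fold_indices
]
```

Here the pooled score hides the fact that the second fold is much worse than the first, which is the kind of variation a per-fold breakdown is meant to expose.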

Raises:

Type	Description
`ValueError`	If y_true or y_pred are not NumPy arrays.
