SklearnDataset

Wrapper for scikit-learn-compatible data that provides a factrainer-compatible interface.

This class wraps a feature matrix and an optional target vector to provide a consistent interface for cross-validation and data manipulation within the factrainer framework. It supports multiple data formats: numpy arrays, pandas DataFrames/Series, and polars DataFrames/Series. When using scikit-learn models, pass this dataset to the train and predict methods of SingleModelContainer and CvModelContainer.
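One reason a wrapper like this is useful for cross-validation is that different containers select rows differently. The sketch below illustrates that idea only; `take_rows` is a hypothetical helper and is not part of factrainer, whose internal row-selection logic is not described here. It assumes numpy and pandas are installed.

```python
import numpy as np
import pandas as pd


def take_rows(data, indices):
    """Return the rows of `data` at the given integer positions.

    Hypothetical helper for illustration; not a factrainer function.
    """
    if isinstance(data, (pd.DataFrame, pd.Series)):
        # pandas objects need positional selection via .iloc
        return data.iloc[indices]
    # numpy arrays accept an integer index array directly
    return np.asarray(data)[indices]


# A 10-row feature matrix (numpy) and a matching target (pandas Series)
X = np.arange(20).reshape(10, 2)
y = pd.Series(np.arange(10), name="target")

# Positions of the rows assigned to one training fold
train_idx = np.array([0, 2, 4])

X_train = take_rows(X, train_idx)
y_train = take_rows(y, train_idx)
```

The same call works for both inputs, which is the kind of format-independent access a dataset wrapper can offer to code that splits data into folds.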

Attributes:

Name    Type                 Description
X       MatrixLike           Feature matrix. Can be a numpy array, pandas DataFrame, or polars DataFrame.
y       VectorLike | None    Target vector. Can be a numpy array, pandas Series, polars Series, or None.

Examples:

>>> import numpy as np
>>> from factrainer.sklearn import SklearnDataset
>>> # Create from numpy arrays
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> dataset = SklearnDataset(X=X, y=y)
>>> # Create without target
>>> dataset = SklearnDataset(X=X)
>>> # Create from pandas
>>> import pandas as pd
>>> df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])
>>> target = pd.Series(y, name="target")
>>> dataset = SklearnDataset(X=df, y=target)

Attributes

X instance-attribute

X: MatrixLike

y class-attribute instance-attribute

y: VectorLike | None = None
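
The two declarations above say that X is required and that y is optional, defaulting to None. The following is a minimal stand-in sketch of that shape, not factrainer's actual implementation: `DatasetSketch` and `has_target` are hypothetical names, and `Any` stands in for the MatrixLike and VectorLike types.

```python
from dataclasses import dataclass
from typing import Any, Optional

import numpy as np


@dataclass
class DatasetSketch:
    """Hypothetical stand-in mirroring the two declared fields."""

    X: Any                   # stands in for MatrixLike (required)
    y: Optional[Any] = None  # stands in for VectorLike | None (optional)

    @property
    def has_target(self) -> bool:
        # True only when a target vector was supplied
        return self.y is not None


X = np.zeros((5, 3))

# With a target, e.g. for training
labelled = DatasetSketch(X=X, y=np.ones(5))

# Without a target, e.g. for prediction on unseen data
unlabelled = DatasetSketch(X=X)
```

Omitting y corresponds to the "Create without target" case in the examples, where only features are available.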