sksurv.preprocessing.OneHotEncoder

class sksurv.preprocessing.OneHotEncoder(*, allow_drop=True)

Encode categorical features using a one-hot scheme.

Accepts pandas.DataFrame and polars.DataFrame inputs. The following column dtypes are treated as categorical features:

  • pandas: category or object

  • polars: polars.Categorical, polars.Enum, or polars.String

Each categorical feature is encoded with a one-hot (or dummy) scheme, which creates a binary column for each category. By default, one category per feature is dropped, so a column with M categories is encoded as M - 1 integer columns.
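To make the M - 1 scheme concrete, here is a minimal pure-Python sketch of dummy encoding for a single feature. It does not reproduce the library's implementation: the helper name `dummy_encode` is hypothetical, and the choice to drop the first category in sorted order is an assumption made for illustration only.

```python
def dummy_encode(values):
    """Encode a list of category labels as M - 1 binary columns.

    Assumption for this sketch: the first category in sorted order is the
    one that gets dropped. The library may choose differently.
    """
    categories = sorted(set(values))  # the M distinct categories
    kept = categories[1:]             # keep M - 1 of them
    # One binary column per kept category: 1 where the row has that category.
    return {cat: [1 if v == cat else 0 for v in values] for cat in kept}


encoded = dummy_encode(["red", "green", "blue", "green"])
print(encoded)  # {'green': [0, 1, 0, 1], 'red': [1, 0, 0, 0]}
```

A row whose indicator columns are all zero (the third row here) belongs to the dropped category, so no information is lost by omitting its column.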

The order of non-categorical columns is preserved. Encoded columns are inserted in place of the original column. The output dataframe library matches the input.

fit and transform must be called with the same dataframe library. Passing a pandas input to one and a polars input to the other raises TypeError.

Parameters:

allow_drop (bool, optional, default: True) – Whether to allow dropping categorical columns that consist of only a single category.

feature_names_

Names of categorical features that were encoded.

Type:

pandas.Index

categories_

A dictionary mapping each categorical feature name to a pandas.Index of categories.

Type:

dict

encoded_columns_

The full list of feature names in the transformed output.

Type:

pandas.Index

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray, shape = (n_features_in_,)

__init__(*, allow_drop=True)

Methods

__init__(*[, allow_drop])

fit(X[, y]) – Determine which features are categorical and should be one-hot encoded.

fit_transform(X[, y]) – Fit to data, then transform it.

get_feature_names_out([input_features]) – Get output feature names for transformation.

get_metadata_routing() – Get metadata routing of this object.

get_params([deep]) – Get parameters for this estimator.

set_output(*[, transform]) – Set output container.

set_params(**params) – Set the parameters of this estimator.

transform(X) – Transform X by one-hot encoding categorical features.

fit(X, y=None)

Determine which features are categorical and should be one-hot encoded.

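Parameters:

X (pandas.DataFrame or polars.DataFrame) – The data from which to determine the categorical features.

y (None) – Ignored.

Returns: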

self – Returns the instance itself.

Return type:

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X by identifying categorical features and then returns a transformed version of X with categorical features one-hot encoded.

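Parameters:

X (pandas.DataFrame or polars.DataFrame) – The data to fit and transform.

y (None) – Ignored.

**fit_params (dict) – Ignored.

Returns: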

Xt – The transformed data. The output dataframe library matches the input.

Return type:

pandas.DataFrame or polars.DataFrame

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features (array-like of str or None, default: None) –

Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in.

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out – Transformed feature names.

Return type:

ndarray of str objects

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on metadata routing for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_output(*, transform=None)

Set output container.

Refer to the scikit-learn user guide for more details, and to the scikit-learn example "Introducing the set_output API" for how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in scikit-learn version 1.4: the "polars" option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transform(X)

Transform X by one-hot encoding categorical features.

Parameters:

X (pandas.DataFrame or polars.DataFrame) – The data to transform.

Returns:

Xt – The transformed data. The output dataframe library matches the input.

Return type:

pandas.DataFrame or polars.DataFrame