sksurv.datasets.get_x_y

sksurv.datasets.get_x_y(data_frame, attr_labels, pos_label=None, survival=True, competing_risks=False)

Split data frame into features and labels.

Parameters:

data_frame (pandas.DataFrame or polars.DataFrame, shape = (n_samples, n_columns)) – A data frame.
attr_labels (sequence of str or None) – The names of one or more columns that make up the label. If survival is True, attr_labels must have exactly two elements, in this order: 1) the name of the column denoting the event indicator, and 2) the name of the column denoting the survival time. If the sequence contains None, labels are not retrieved and only a data frame with features is returned.
pos_label (any, optional) – Which value of the event indicator column denotes that a patient experienced an event. This value is ignored if survival is False.
survival (bool, optional, default: True) – Whether to return y that can be used for survival analysis.
competing_risks (bool, optional, default: False) – Whether y refers to a competing risks situation. Only used if survival is True.

Returns:

X (pandas.DataFrame or polars.DataFrame, shape = (n_samples, n_features)) – Data frame containing features. It comes from the same library (pandas or polars) as the input data frame.
y (structured array, Series, DataFrame, or None) – If survival is True, a structured array of shape (n_samples,) with two fields. The first field is a boolean where True indicates an event and False indicates right-censoring. The second field is a float with the time of event or time of censoring.

If survival is False and attr_labels is a single column name, y is a Series from the same library as the input data frame; if attr_labels is a sequence of column names, y is a DataFrame with those columns.

If survival is False and attr_labels is None, y is set to None.