sksurv.datasets.get_x_y
- sksurv.datasets.get_x_y(data_frame, attr_labels, pos_label=None, survival=True, competing_risks=False)
Split data frame into features and labels.
- Parameters:
data_frame (pandas.DataFrame, shape = (n_samples, n_columns)) – A data frame.
attr_labels (sequence of str or None) – A list of one or more columns that are considered the label. If survival is True, then attr_labels has two elements: 1) the name of the column denoting the event indicator, and 2) the name of the column denoting the survival time. If the sequence contains None, then labels are not retrieved and only a data frame with features is returned.
pos_label (any, optional) – Which value of the event indicator column denotes that a patient experienced an event. This value is ignored if survival is False.
survival (bool, optional, default: True) – Whether to return y that can be used for survival analysis.
competing_risks (bool, optional, default: False) – Whether y refers to competing risks situation. Only used if survival is True.
- Returns:
X (pandas.DataFrame, shape = (n_samples, n_columns - len(attr_labels))) – Data frame containing features.
y (structured array, shape = (n_samples,), or pandas.DataFrame, shape = (n_samples, len(attr_labels)), or None) – If survival is True, a structured array with two fields. The first field is a boolean where True indicates an event and False indicates right-censoring. The second field is a float with the time of event or time of censoring. If survival is False and attr_labels is not None, a pandas.DataFrame with columns specified by attr_labels. If survival is False and attr_labels is None, y is set to None.