sksurv.datasets.get_x_y

sksurv.datasets.get_x_y(data_frame, attr_labels, pos_label=None, survival=True)

Split data frame into features and labels.

Parameters:
  • data_frame (pandas.DataFrame, shape = (n_samples, n_columns)) – A data frame.
  • attr_labels (sequence of str or None) – The names of one or more columns to treat as labels. If survival is True, attr_labels must have exactly two elements, in this order: 1) the name of the column denoting the event indicator, and 2) the name of the column denoting the survival time. If the sequence contains None, labels are not retrieved and only a data frame with features is returned.
  • pos_label (any, optional) – The value of the event indicator column that denotes that a patient experienced an event. Must be specified if survival is True; ignored if survival is False.
  • survival (bool, optional, default: True) – Whether to return y that can be used for survival analysis.
Returns:
  • X (pandas.DataFrame, shape = (n_samples, n_columns - len(attr_labels))) – Data frame containing features.
  • y (None, pandas.DataFrame, or structured numpy.ndarray, shape = (n_samples, len(attr_labels))) – The supervised information. If survival is True, y is a structured array with two fields named after the columns in attr_labels: the event indicator as boolean and the survival time as float. If survival is False, y is a data frame containing the label columns. If attr_labels contains None, y is None.