sksurv.datasets.load_arff_files_standardized

sksurv.datasets.load_arff_files_standardized(path_training, attr_labels, pos_label=None, path_testing=None, survival=True, standardize_numeric=True, to_numeric=True)

Load a dataset in ARFF format, optionally standardizing numeric columns and converting categorical columns to numeric values.

Parameters:
path_training : str

Path to ARFF file containing training data.

attr_labels : sequence of str

Names of attributes denoting dependent variables. If survival is set, it must be a sequence with two items: the name of the event indicator and the name of the survival/censoring time.

pos_label : any type, optional

Value corresponding to an event in survival analysis. Only considered if survival is True.

path_testing : str, optional

Path to ARFF file containing hold-out data. Only columns that are available in both training and testing are considered (excluding dependent variables). If standardize_numeric is set, data is normalized by considering both training and testing data.

survival : bool, optional, default: True

Whether the dependent variables denote event indicator and survival/censoring time.

standardize_numeric : bool, optional, default: True

Whether to standardize data to zero mean and unit variance. See sksurv.column.standardize().

to_numeric : bool, optional, default: True

Whether to convert categorical variables to numeric values. See sksurv.column.categorical_to_numeric().
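
The path_testing description implies two steps: feature columns are restricted to those present in both files, and standardization statistics are computed over training and testing rows together. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The helper names `shared_columns` and `standardize_pooled` are hypothetical, and the use of the population standard deviation is an assumption, since this page does not say which variant sksurv.column.standardize() uses.

```python
from statistics import mean, pstdev


def shared_columns(train, test, label_names):
    """Return feature columns found in both tables, excluding dependent variables."""
    return [c for c in train if c in test and c not in label_names]


def standardize_pooled(train, test, columns):
    """Standardize each column with the mean and std of training and testing rows combined."""
    out_train, out_test = {}, {}
    for col in columns:
        pooled = train[col] + test[col]
        # Assumption: population standard deviation (the library may differ).
        mu, sigma = mean(pooled), pstdev(pooled)
        out_train[col] = [(v - mu) / sigma for v in train[col]]
        out_test[col] = [(v - mu) / sigma for v in test[col]]
    return out_train, out_test


# Toy tables as column -> values mappings; "bp" and "weight" each exist in one file only.
train = {"age": [50.0, 60.0], "bp": [120.0, 140.0], "event": [1, 0], "time": [5.0, 8.0]}
test = {"age": [70.0, 80.0], "weight": [70.0, 90.0], "event": [0, 1], "time": [3.0, 9.0]}

cols = shared_columns(train, test, label_names=("event", "time"))
z_train, z_test = standardize_pooled(train, test, cols)
```

Because the statistics are pooled, the training features depend on the hold-out data. If that is undesirable, one option is to load with standardize_numeric=False and scale afterwards using training data only.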

Returns:
x_train : pandas.DataFrame, shape = (n_train, n_features)

Training data.

y_train : pandas.DataFrame, shape = (n_train, n_labels)

Dependent variables of training data.

x_test : None or pandas.DataFrame, shape = (n_test, n_features)

Testing data if path_testing was provided.

y_test : None or pandas.DataFrame, shape = (n_test, n_labels)

Dependent variables of testing data if path_testing was provided.
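
As an illustration, the sketch below writes two small ARFF files and wraps a call to the function using the signature shown above. The file contents, the attribute names (`age`, `blood_pressure`, `event`, `time`), and the helpers `write_arff` and `load_example` are made up for this example. Passing pos_label as the string "1" assumes that values of a nominal attribute are compared as strings.

```python
import tempfile
from pathlib import Path

# Hypothetical toy schema: two numeric features, a nominal event indicator,
# and a numeric survival/censoring time.
ARFF_HEADER = """\
@relation toy_survival
@attribute age numeric
@attribute blood_pressure numeric
@attribute event {0,1}
@attribute time numeric
@data
"""


def write_arff(path, rows):
    """Write rows of (age, blood_pressure, event, time) as a small ARFF file."""
    lines = [",".join(str(value) for value in row) for row in rows]
    Path(path).write_text(ARFF_HEADER + "\n".join(lines) + "\n")


def load_example(train_path, test_path):
    """Load both files; returns (x_train, y_train, x_test, y_test)."""
    # Imported here so that only this function needs scikit-survival installed.
    from sksurv.datasets import load_arff_files_standardized

    return load_arff_files_standardized(
        str(train_path),
        attr_labels=["event", "time"],  # event indicator first, then time
        pos_label="1",                  # assumption: nominal values compare as strings
        path_testing=str(test_path),
        survival=True,
        standardize_numeric=True,
        to_numeric=True,
    )


tmp_dir = Path(tempfile.mkdtemp())
train_path = tmp_dir / "train.arff"
test_path = tmp_dir / "test.arff"
write_arff(train_path, [(50, 120, 1, 5.0), (60, 140, 0, 8.0), (55, 130, 1, 2.5)])
write_arff(test_path, [(70, 150, 0, 3.0), (65, 125, 1, 9.0)])
```

With scikit-survival installed, `x_train, y_train, x_test, y_test = load_example(train_path, test_path)` should yield the four objects described under Returns. When path_testing is omitted, x_test and y_test are None.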