sksurv.datasets.load_arff_files_standardized

sksurv.datasets.load_arff_files_standardized(path_training, attr_labels, pos_label=None, path_testing=None, survival=True, standardize_numeric=True, to_numeric=True)

Load dataset in ARFF format.
Parameters:
- path_training (str) – Path to ARFF file containing training data.
- attr_labels (sequence of str) – Names of attributes denoting dependent variables. If survival is set, it must be a sequence with two items: the name of the event indicator and the name of the survival/censoring time.
- pos_label (any type, optional) – Value corresponding to an event in survival analysis. Only considered if survival is True.
- path_testing (str, optional) – Path to ARFF file containing hold-out data. Only columns that are available in both training and testing are considered (excluding dependent variables). If standardize_numeric is set, data is normalized by considering both training and testing data.
- survival (bool, optional, default: True) – Whether the dependent variables denote event indicator and survival/censoring time.
- standardize_numeric (bool, optional, default: True) – Whether to standardize data to zero mean and unit variance. See sksurv.column.standardize().
- to_numeric (bool, optional, default: True) – Whether to convert categorical variables to numeric values. See sksurv.column.categorical_to_numeric().
Returns:
- x_train (pandas.DataFrame, shape = (n_train, n_features)) – Training data.
- y_train (pandas.DataFrame, shape = (n_train, n_labels)) – Dependent variables of training data.
- x_test (None or pandas.DataFrame, shape = (n_test, n_features)) – Testing data if path_testing was provided.
- y_test (None or pandas.DataFrame, shape = (n_test, n_labels)) – Dependent variables of testing data if path_testing was provided.
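
The path_testing and standardize_numeric descriptions above combine two steps: keeping only the feature columns shared by both files, and scaling them with statistics computed over training and testing rows together. The following is a minimal sketch of that idea using only the Python standard library. It is not sksurv's implementation: the helper names common_columns and standardize_pooled, the toy column names, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def common_columns(train, test, attr_labels):
    """Feature columns present in both tables, excluding dependent variables.

    Hypothetical helper, not part of sksurv.
    """
    return [c for c in train if c in test and c not in attr_labels]


def standardize_pooled(train, test, columns):
    """Scale each column to zero mean and unit variance.

    Mean and standard deviation are computed over training and testing
    rows together (assumption: sample standard deviation, ddof=1).
    Hypothetical helper, not part of sksurv.
    """
    out_train, out_test = {}, {}
    for c in columns:
        pooled = train[c] + test[c]
        mu, sigma = mean(pooled), stdev(pooled)
        out_train[c] = [(v - mu) / sigma for v in train[c]]
        out_test[c] = [(v - mu) / sigma for v in test[c]]
    return out_train, out_test


# Toy tables stored as column-name -> list of values (made-up data).
train = {"age": [50.0, 60.0], "bmi": [22.0, 30.0],
         "event": [1, 0], "time": [5.0, 9.0]}
test = {"age": [70.0, 80.0], "weight": [70.0, 90.0],
        "event": [0, 1], "time": [3.0, 7.0]}
attr_labels = ["event", "time"]  # event indicator, survival/censoring time

cols = common_columns(train, test, attr_labels)  # "bmi" and "weight" are dropped
x_train, x_test = standardize_pooled(train, test, cols)
```

With the real function, the equivalent call would presumably look like load_arff_files_standardized("train.arff", ["event", "time"], pos_label="1", path_testing="test.arff"), where the file names, attribute names, and pos_label value are hypothetical and must match the contents of your own ARFF files.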