sksurv.meta.EnsembleSelectionRegressor#
- class sksurv.meta.EnsembleSelectionRegressor(base_estimators, *, scorer=None, n_estimators=0.2, min_score=0.66, correlation='pearson', min_correlation=0.6, cv=None, n_jobs=1, verbose=0)[source]#
Ensemble selection for regression that accounts for the accuracy and correlation of errors.
The ensemble is pruned during training according to the estimators’ accuracy and the correlation between their prediction errors per sample. The accuracy of the i-th estimator is defined as \(\frac{ \min_{j=1,\ldots, n}(error_j) }{ error_i }\), where n is the number of estimators. In addition to the accuracy, models are selected based on the correlation between residuals of different models (diversity). The diversity of the i-th estimator is defined as \(\frac{n-count}{n}\), where count is the number of estimators whose residuals have a correlation with those of the i-th estimator that exceeds min_correlation.
Hill climbing is based on cross-validation, which avoids the need to create a separate validation set.
See [1], [2], [3] for further description.
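The two scores defined above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function names `accuracy_scores` and `diversity_scores` are hypothetical, the input values are made up, and excluding an estimator's correlation with itself from count is an assumption.

```python
import numpy as np

def accuracy_scores(errors):
    """Relative accuracy: smallest error divided by each estimator's error."""
    errors = np.asarray(errors, dtype=float)
    return errors.min() / errors

def diversity_scores(residuals, min_correlation=0.6):
    """Diversity (n - count) / n from an (n_estimators, n_samples) residual matrix.

    Assumption: an estimator's correlation with itself is not counted.
    """
    corr = np.corrcoef(residuals)        # Pearson correlation between rows
    np.fill_diagonal(corr, 0.0)          # ignore self-correlation (assumption)
    count = (corr > min_correlation).sum(axis=1)
    n = residuals.shape[0]
    return (n - count) / n

# Hypothetical cross-validated errors of three estimators
errors = [2.0, 4.0, 8.0]
acc = accuracy_scores(errors)

# Hypothetical residuals: estimators 0 and 1 err in the same way, 2 differs
residuals = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
])
div = diversity_scores(residuals)
```

In this toy input the best estimator receives an accuracy of 1, and the estimator whose residuals are uncorrelated with the others receives the highest diversity.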
- Parameters:
base_estimators (list) – List of (name, estimator) tuples (implementing fit/predict) that are part of the ensemble.
scorer (callable) – Function with signature func(estimator, X_test, y_test, **test_predict_params) that evaluates the error of the prediction on the test data. The function should return a scalar value. Smaller values of the score are assumed to be better.
n_estimators (float or int, optional, default: 0.2) – If a float, the fraction of estimators in the ensemble to retain; if an int, the absolute number of estimators to retain.
min_score (float, optional, default: 0.66) – Threshold for pruning estimators based on scoring metric. After fit, only estimators with an accuracy above min_score are retained.
min_correlation (float, optional, default: 0.6) – Threshold for Pearson’s correlation coefficient that determines when residuals of two estimators are significantly correlated.
cv (int, a cv generator instance, or None, optional) – The input specifying which cv generator to use. It can be an integer, in which case it is the number of folds in a KFold; None, in which case 3-fold cross-validation is used; or another object, which will then be used as a cv generator. The generator has to ensure that each sample is used only once for testing.
n_jobs (int, optional, default: 1) – Number of jobs to run in parallel.
verbose (int, optional, default: 0) – Controls the verbosity: the higher, the more messages.
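A scorer matching the signature described for the scorer parameter might look like the following sketch. The function name `rmse_scorer` and the stand-in class `MeanPredictor` are hypothetical, and forwarding `**test_predict_params` to `predict` as extra keyword arguments is an assumption about how those parameters are meant to be used.

```python
import numpy as np

def rmse_scorer(estimator, X_test, y_test, **test_predict_params):
    """Return the root mean squared error; smaller values are better."""
    prediction = estimator.predict(X_test, **test_predict_params)
    return float(np.sqrt(np.mean((np.asarray(y_test) - prediction) ** 2)))

class MeanPredictor:
    """Hypothetical stand-in estimator that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# Made-up data to exercise the scorer
X = np.arange(8, dtype=float).reshape(4, 2)
y = np.array([1.0, 2.0, 3.0, 4.0])
error = rmse_scorer(MeanPredictor().fit(X, y), X, y)
```

Because the function returns an error rather than a score to maximize, it follows the convention stated above that smaller values are better.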
- scores_#
Array of scores (relative to best performing estimator)
- Type:
ndarray, shape = (n_base_estimators,)
- fitted_models_#
Selected models during training based on scorer.
- Type:
ndarray
- n_features_in_#
Number of features seen during fit.
- Type:
int
- feature_names_in_#
Names of features seen during fit. Defined only when X has feature names that are all strings.
- Type:
ndarray, shape = (n_features_in_,)
References
- __init__(base_estimators, *, scorer=None, n_estimators=0.2, min_score=0.66, correlation='pearson', min_correlation=0.6, cv=None, n_jobs=1, verbose=0)[source]#
Methods
__init__(base_estimators, *[, scorer, ...])
fit(X[, y]) – Fit ensemble of models.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get the parameters of an estimator from the ensemble.
predict(X) – Perform prediction.
predict_cumulative_hazard_function(X[, ...]) – Perform prediction.
predict_log_proba(X) – Perform prediction.
predict_proba(X) – Perform prediction.
predict_survival_function(X[, return_array]) – Perform prediction.
score(X, y) – Returns the concordance index of the prediction.
set_params(**params) – Set the parameters of an estimator from the ensemble.
Attributes
steps
unique_times_
- fit(X, y=None, **fit_params)[source]#
Fit ensemble of models.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Training data.
y (array-like, shape = (n_samples,), optional) – Target data if base estimators are supervised.
**fit_params (dict) – Parameters passed to the fit method of each base estimator.
- Return type:
self
- get_metadata_routing()#
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)[source]#
Get the parameters of an estimator from the ensemble.
Returns the parameters given in the constructor as well as the estimators contained within the estimators parameter.
- Parameters:
deep (bool, default=True) – Setting it to True gets the various estimators and the parameters of the estimators as well.
- Returns:
params – Parameter and estimator names mapped to their values or parameter names mapped to their values.
- Return type:
dict
- predict(X)#
Perform prediction.
Only available if the meta estimator has a predict method.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Data with samples to predict.
- Returns:
prediction – Prediction of meta estimator that combines predictions of base estimators. n_dim depends on the return value of meta estimator’s predict method.
- Return type:
ndarray, shape = (n_samples, n_dim)
- predict_cumulative_hazard_function(X, return_array=False)#
Perform prediction.
Only available if the meta estimator has a predict_cumulative_hazard_function method.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Data with samples to predict.
return_array (bool, default: False) – Whether to return a single array of cumulative hazard values or a list of step functions.
If False, a list of sksurv.functions.StepFunction objects is returned.
If True, a 2d-array of shape (n_samples, n_unique_times) is returned, where n_unique_times is the number of unique event times in the training data. Each row represents the cumulative hazard function of an individual evaluated at unique_times_.
- Returns:
cum_hazard – If return_array is False, an array of n_samples sksurv.functions.StepFunction instances is returned.
If return_array is True, a numeric array of shape (n_samples, n_unique_times) is returned.
- Return type:
ndarray
- predict_log_proba(X)#
Perform prediction.
Only available if the meta estimator has a predict_log_proba method.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Data with samples to predict.
- Returns:
prediction – Prediction of meta estimator that combines predictions of base estimators. n_dim depends on the return value of meta estimator’s predict method.
- Return type:
ndarray, shape = (n_samples, n_dim)
- predict_proba(X)#
Perform prediction.
Only available if the meta estimator has a predict_proba method.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Data with samples to predict.
- Returns:
prediction – Prediction of meta estimator that combines predictions of base estimators. n_dim depends on the return value of meta estimator’s predict method.
- Return type:
ndarray, shape = (n_samples, n_dim)
- predict_survival_function(X, return_array=False)#
Perform prediction.
Only available if the meta estimator has a predict_survival_function method.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Data with samples to predict.
return_array (bool, default: False) – Whether to return a single array of survival probabilities or a list of step functions.
If False, a list of sksurv.functions.StepFunction objects is returned.
If True, a 2d-array of shape (n_samples, n_unique_times) is returned, where n_unique_times is the number of unique event times in the training data. Each row represents the survival function of an individual evaluated at unique_times_.
- Returns:
survival – If return_array is False, an array of n_samples sksurv.functions.StepFunction instances is returned.
If return_array is True, a numeric array of shape (n_samples, n_unique_times) is returned.
- Return type:
ndarray
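The relationship between the two return formats can be illustrated with plain NumPy. The helper `evaluate_step` below is a hypothetical stand-in for a right-continuous step function, not the sksurv.functions.StepFunction class itself, and the times and probabilities are made up.

```python
import numpy as np

def evaluate_step(x, y, times):
    """Evaluate a right-continuous step function with knots x and values y."""
    idx = np.searchsorted(x, times, side="right") - 1
    return np.asarray(y)[idx]

unique_times = np.array([1.0, 3.0, 7.0])   # hypothetical unique event times

# One (knots, values) pair per sample, standing in for a list of step functions
step_functions = [
    (unique_times, np.array([0.9, 0.6, 0.2])),
    (unique_times, np.array([0.8, 0.5, 0.1])),
]

# Evaluate each sample's function at the unique times and stack the rows,
# giving an array of shape (n_samples, n_unique_times)
surv_array = np.vstack(
    [evaluate_step(x, y, unique_times) for x, y in step_functions]
)

# Between knots the function keeps the value of the preceding knot
value_at_5 = evaluate_step(unique_times, [0.9, 0.6, 0.2], np.array([5.0]))[0]
```

The list form is convenient for evaluating individual curves at arbitrary times, while the array form suits vectorized computations over all samples.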
- score(X, y)[source]#
Returns the concordance index of the prediction.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
- Returns:
cindex – Estimated concordance index.
- Return type:
float
See also
sksurv.metrics.concordance_index_censored – Computes the concordance index.
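The structured array expected for y can be built directly with NumPy. The sketch below also computes a simplified concordance index by brute force to show what the score measures. It is not the implementation behind sksurv.metrics.concordance_index_censored: the function name `naive_concordance` is hypothetical, the field names `event` and `time` are arbitrary choices, tied times are ignored, and it assumes that a higher predicted value means higher risk.

```python
import numpy as np

# Event indicator as first field, time of event or censoring as second field
y = np.array(
    [(True, 2.0), (False, 4.0), (True, 5.0), (True, 8.0)],
    dtype=[("event", bool), ("time", float)],
)

def naive_concordance(event, time, risk):
    """Fraction of comparable pairs ordered correctly (ties in risk count 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                  # a censored sample cannot anchor a pair
        for j in range(len(time)):
            if time[i] < time[j]:     # i had its event before j's observed time
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

risk = np.array([3.0, 1.0, 0.2, 0.5])  # hypothetical predicted risk scores
cindex = naive_concordance(y["event"], y["time"], risk)
```

Here four pairs are comparable and three of them are ranked correctly, so the sketch yields 0.75.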
- set_params(**params)[source]#
Set the parameters of an estimator from the ensemble.
Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in estimators.
- Parameters:
**params (keyword arguments) – Specific parameters using e.g. set_params(parameter_name=new_value). In addition to setting the parameters of the estimator, the individual estimators contained in estimators can also be set, or can be removed by setting them to ‘drop’.
- Returns:
self – Estimator instance.
- Return type:
object