sksurv.meta.EnsembleSelectionRegressor

class sksurv.meta.EnsembleSelectionRegressor(base_estimators, scorer=None, n_estimators=0.2, min_score=0.66, correlation='pearson', min_correlation=0.6, cv=None, n_jobs=1, verbose=0)
Ensemble selection for regression that accounts for the accuracy and correlation of errors.
The ensemble is pruned during training according to the estimators’ accuracy and the correlation between their per-sample prediction errors. The accuracy of the i-th estimator is defined as \(\frac{ \min_{j=1,\ldots,n}(error_j) }{ error_i }\), where n is the number of estimators, so the best estimator has an accuracy of 1. In addition to accuracy, models are selected based on the correlation between the residuals of different models (diversity). The diversity of the i-th estimator is defined as \(\frac{n - count}{n}\), where count is the number of estimators for which the correlation of residuals exceeds min_correlation.
The hill-climbing is based on cross-validation to avoid having to create a separate validation set.
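The two scores defined above can be illustrated with a short NumPy sketch. This is not the library's implementation: the function names, the sample data, and the choice to exclude an estimator's correlation with itself from count are assumptions made for illustration.

```python
import numpy as np


def accuracy_scores(errors):
    """Relative accuracy: smallest error divided by each estimator's error."""
    errors = np.asarray(errors, dtype=float)
    return errors.min() / errors


def diversity_scores(residuals, min_correlation=0.6):
    """Diversity (n - count) / n for each estimator.

    residuals has shape (n_samples, n_estimators). count is the number of
    other estimators whose residuals have a Pearson correlation above
    min_correlation (excluding the estimator itself is an assumption).
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.shape[1]
    corr = np.corrcoef(residuals, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore self-correlation
    count = (corr > min_correlation).sum(axis=0)
    return (n - count) / n


# Hypothetical errors of three estimators (smaller is better).
errors = [2.0, 4.0, 8.0]
# Hypothetical residuals: columns 0 and 1 are strongly correlated,
# column 2 is only weakly correlated with either of them.
residuals = np.array([
    [1.0, 2.0, -1.0],
    [2.0, 4.1, 1.0],
    [3.0, 5.9, -1.0],
    [4.0, 8.0, 1.0],
])

accuracy = accuracy_scores(errors)
diversity = diversity_scores(residuals)
```

Here the first estimator has the lowest error and therefore an accuracy of 1, while the third estimator is the most diverse because its residuals do not track those of the other two.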
Parameters:  base_estimators : list
List of (name, estimator) tuples (implementing fit/predict) that are part of the ensemble.
 scorer : callable
Function with signature
func(estimator, X_test, y_test, **test_predict_params)
that evaluates the error of the prediction on the test data. The function should return a scalar value. Smaller values of the score are assumed to be better.
 n_estimators : float or int, optional, default: 0.2
If a float, the fraction of estimators in the ensemble to retain; if an int, the absolute number of estimators to retain.
 min_score : float, optional, default: 0.66
Threshold for pruning estimators based on the scoring metric. After fit, only estimators with an accuracy above min_score are retained.
 min_correlation : float, optional, default: 0.6
Threshold for Pearson’s correlation coefficient that determines when residuals of two estimators are significantly correlated.
 cv : int, a cv generator instance, or None, optional
The input specifying which cv generator to use. It can be an integer, in which case it is the number of folds in a KFold; None, in which case 3-fold cross-validation is used; or another object, which is then used as a cv generator. The generator has to ensure that each sample is used only once for testing.
 n_jobs : int, optional, default: 1
Number of jobs to run in parallel.
 verbose : int, optional, default: 0
Controls the verbosity: the higher, the more messages.
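As an illustration of the scorer parameter, the following is a minimal sketch of an error function with the documented signature. The name rmse_scorer, the choice of root mean squared error, and forwarding test_predict_params to predict are assumptions; ConstantModel is a hypothetical stand-in used only to exercise the function.

```python
import numpy as np


def rmse_scorer(estimator, X_test, y_test, **test_predict_params):
    """Root mean squared error on held-out data; smaller is better."""
    # Assumption: extra keyword arguments are passed through to predict.
    y_pred = estimator.predict(X_test, **test_predict_params)
    residuals = np.asarray(y_test, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))


class ConstantModel:
    """Hypothetical estimator that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


X_demo = np.zeros((4, 2))
y_demo = np.array([1.0, 1.0, 3.0, 3.0])
score = rmse_scorer(ConstantModel(2.0), X_demo, y_demo)
```

A function of this shape could then be passed as scorer=rmse_scorer when constructing the ensemble.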
References
[1] Pölsterl, S., Gupta, P., Wang, L., Conjeti, S., Katouzian, A., and Navab, N., “Heterogeneous ensembles for predicting survival of metastatic, castrate-resistant prostate cancer patients”. F1000Research, vol. 5, no. 2676, 2016
[2] Caruana, R., Munson, A., Niculescu-Mizil, A. “Getting the most out of ensemble selection”. 6th IEEE International Conference on Data Mining, 828-833, 2006
[3] Rooney, N., Patterson, D., Anand, S., Tsymbal, A. “Dynamic integration of regression models”. International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 3181, 164-173, 2004
Attributes:  scores_ : ndarray, shape = (n_base_estimators,)
Array of scores (relative to best performing estimator)
 fitted_models_ : ndarray
Selected models during training based on scorer.

__init__(base_estimators, scorer=None, n_estimators=0.2, min_score=0.66, correlation='pearson', min_correlation=0.6, cv=None, n_jobs=1, verbose=0)
Methods
__init__(base_estimators[, scorer, …])
fit(X[, y])  Fit ensemble of models
get_params([deep])
fit(X, y=None, **fit_params)
Fit ensemble of models
Parameters:  X : array-like, shape = (n_samples, n_features)
Training data.
 y : array-like, optional
Target data if base estimators are supervised.
Returns:  self