sksurv.meta.EnsembleSelection

class sksurv.meta.EnsembleSelection(base_estimators, scorer=None, n_estimators=0.2, min_score=0.2, correlation='pearson', min_correlation=0.6, cv=None, n_jobs=1, verbose=0)[source]

Ensemble selection for survival analysis that accounts for a score and correlations between predictions.

During training, the ensemble is pruned only according to the specified score (accuracy). At prediction time, it is additionally pruned according to the correlation between predictions (diversity).

Hill climbing is based on cross-validation, which avoids the need for a separate validation set.

See [1], [2], [3] for further description.
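
The two pruning criteria can be illustrated with a small, library-independent sketch. It is not the code used by EnsembleSelection. The helper names (pearson, resolve_n_estimators, prune_by_score, prune_by_correlation), the model names, and all numbers are hypothetical, and the rounding applied to a fractional n_estimators is an assumption.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def resolve_n_estimators(n_estimators, n_total):
    """Treat an int as an absolute count and a float as a fraction (assumed rounding)."""
    if isinstance(n_estimators, int):
        return min(n_estimators, n_total)
    return max(1, int(n_estimators * n_total))


def prune_by_score(cv_scores, n_estimators, min_score):
    """Accuracy step: keep the best-scoring models whose score exceeds min_score."""
    keep = resolve_n_estimators(n_estimators, len(cv_scores))
    ranked = sorted(cv_scores, key=cv_scores.get, reverse=True)
    return [name for name in ranked if cv_scores[name] > min_score][:keep]


def prune_by_correlation(names, predictions, min_correlation):
    """Diversity step: drop a model that correlates strongly with one already kept."""
    kept = []
    for name in names:
        if all(
            pearson(predictions[name], predictions[other]) < min_correlation
            for other in kept
        ):
            kept.append(name)
    return kept


# Hypothetical cross-validated scores and predictions of four base models.
cv_scores = {"cox": 0.71, "svm": 0.69, "forest": 0.55, "boost": 0.15}
predictions = {
    "cox": [1.0, 2.0, 3.0, 4.0],
    "svm": [1.1, 2.1, 2.9, 4.2],     # almost the same ranking as "cox"
    "forest": [4.0, 1.0, 3.0, 2.0],  # clearly different from "cox"
}

selected = prune_by_score(cv_scores, n_estimators=0.75, min_score=0.2)
diverse = prune_by_correlation(selected, predictions, min_correlation=0.6)
print(selected)  # ['cox', 'svm', 'forest']
print(diverse)   # ['cox', 'forest']
```

In this toy run, "boost" is removed by the score threshold, and "svm" is removed by the diversity step because its predictions are nearly identical to those of "cox".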

Parameters:
  • base_estimators (list) – List of (name, estimator) tuples (implementing fit/predict) that are part of the ensemble.
  • scorer (callable) – Function with signature func(estimator, X_test, y_test, **test_predict_params) that evaluates the quality of the prediction on the test data. The function should return a scalar value. Larger values of the score are assumed to be better.
  • n_estimators (float or int, optional, default: 0.2) – If a float, the fraction of estimators in the ensemble to retain; if an int, the absolute number of estimators to retain.
  • min_score (float, optional, default: 0.2) – Threshold for pruning estimators based on the scoring metric. After fit, only estimators with a score above min_score are retained.
  • correlation (str, optional, default: 'pearson') – The kind of correlation coefficient used to compare the predictions of two estimators.
  • min_correlation (float, optional, default: 0.6) – Threshold for Pearson’s correlation coefficient that determines when predictions of two estimators are significantly correlated.
  • cv (int, a cv generator instance, or None, optional) – Specifies which cross-validation generator to use. If an integer, it is the number of folds in a KFold. If None, 3-fold cross-validation is used. Any other object is used directly as the cv generator. The generator must ensure that each sample is used for testing only once.
  • n_jobs (int, optional, default: 1) – Number of jobs to run in parallel.
  • verbose (int, optional, default: 0) – Controls the verbosity: the higher, the more messages.
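
A scorer compatible with the signature described for the scorer parameter might look like the following sketch. It uses a simplified concordance index written in plain Python so that it runs without any dependencies. The assumptions are that y_test is a sequence of (event, time) pairs and that predict returns one risk score per sample; NegatedFeatureRisk is a hypothetical stand-in estimator, not part of the library.

```python
def concordance_scorer(estimator, X_test, y_test, **test_predict_params):
    """Return a simplified concordance index; larger values are better.

    Assumes y_test is a sequence of (event, time) pairs and that
    estimator.predict returns one risk score per sample.
    """
    risk = estimator.predict(X_test, **test_predict_params)
    concordant = 0.0
    comparable = 0
    for i, (event_i, time_i) in enumerate(y_test):
        if not event_i:
            continue  # a censored sample cannot be the earlier member of a pair
        for j, (_, time_j) in enumerate(y_test):
            if time_i < time_j:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable if comparable else 0.0


class NegatedFeatureRisk:
    """Hypothetical estimator: the risk score is the negated first feature."""

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [-row[0] for row in X]


# Toy data: the single feature equals the observed time, the third sample is censored.
X_test = [[5.0], [3.0], [1.0], [4.0]]
y_test = [(True, 5.0), (True, 3.0), (False, 1.0), (True, 4.0)]

model = NegatedFeatureRisk().fit(X_test, y_test)
score = concordance_scorer(model, X_test, y_test)
print(score)  # 1.0
```

With real survival data, the body of such a scorer would more likely delegate to an existing metric such as sksurv.metrics.concordance_index_censored than reimplement the pairwise comparison.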

scores_

Array of scores (relative to the best performing estimator).

Type: ndarray, shape = (n_base_estimators,)

fitted_models_

Models selected during training based on the scorer.

Type: ndarray

References

[1] Pölsterl, S., Gupta, P., Wang, L., Conjeti, S., Katouzian, A., and Navab, N. “Heterogeneous ensembles for predicting survival of metastatic, castrate-resistant prostate cancer patients”. F1000Research, vol. 5, no. 2676, 2016
[2] Caruana, R., Munson, A., and Niculescu-Mizil, A. “Getting the most out of ensemble selection”. 6th IEEE International Conference on Data Mining, 828-833, 2006
[3] Rooney, N., Patterson, D., Anand, S., and Tsymbal, A. “Dynamic integration of regression models”. International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 3181, 164-173, 2004

Methods

__init__(base_estimators[, scorer, …]) Initialize self.
fit(X[, y]) Fit ensemble of models
get_params([deep])

Attributes

predict
predict_log_proba
predict_proba
fit(X, y=None, **fit_params)[source]

Fit ensemble of models

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Training data.
  • y (array-like, optional) – Target data if base estimators are supervised.
Returns: self