sksurv.ensemble.ComponentwiseGradientBoostingSurvivalAnalysis
- class sksurv.ensemble.ComponentwiseGradientBoostingSurvivalAnalysis(*, loss='coxph', learning_rate=0.1, n_estimators=100, subsample=1.0, dropout_rate=0, random_state=None, verbose=0)
Gradient boosting with component-wise least squares as base learner.
See the User Guide and [1] for further description.
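To make the idea of a component-wise least squares base learner concrete, here is a minimal NumPy sketch. It is not scikit-survival's implementation: it assumes a plain squared loss, centered features, and no intercept, and the helper names `componentwise_ls_step` and `boost` are hypothetical. In each iteration, one univariate least squares fit per feature is compared and only the best-fitting feature's coefficient is updated, shrunk by the learning rate.

```python
import numpy as np

def componentwise_ls_step(X, residuals):
    """Fit one univariate least-squares learner per feature; return the best."""
    # Slope of a no-intercept univariate fit, computed for every column at once.
    slopes = X.T @ residuals / (X ** 2).sum(axis=0)
    # Residual sum of squares that each single-feature fit would leave behind.
    rss = ((residuals[:, None] - X * slopes) ** 2).sum(axis=0)
    best = int(np.argmin(rss))
    return best, slopes[best]

def boost(X, y, n_estimators=100, learning_rate=0.1):
    """Aggregate the selected univariate learners into one coefficient vector."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_estimators):
        residuals = y - X @ coef          # negative gradient of the squared loss
        j, slope = componentwise_ls_step(X, residuals)
        coef[j] += learning_rate * slope  # shrink the selected learner
    return coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)                       # center the features
y = 2.0 * X[:, 1] - 1.0 * X[:, 3]         # only features 1 and 3 matter
coef = boost(X, y, n_estimators=500, learning_rate=0.1)
```

Because every base learner is linear in a single feature, the ensemble collapses into one coefficient vector, which is why the fitted model can expose its result as `coef_`.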
- Parameters
loss ({'coxph', 'squared', 'ipcwls'}, optional, default: 'coxph') – loss function to be optimized. ‘coxph’ refers to partial likelihood loss of Cox’s proportional hazards model. The loss ‘squared’ minimizes a squared regression loss that ignores predictions beyond the time of censoring, and ‘ipcwls’ refers to inverse-probability of censoring weighted least squares error.
learning_rate (float, optional, default: 0.1) – Shrinks the contribution of each base learner by this factor. There is a trade-off between learning_rate and n_estimators: a smaller learning rate typically requires more boosting stages. Values must be in the range [0.0, inf).
n_estimators (int, default: 100) – The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance. Values must be in the range [1, inf).
subsample (float, optional, default: 1.0) – The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias. Values must be in the range (0.0, 1.0].
dropout_rate (float, optional, default: 0.0) – If larger than zero, the residuals at each iteration are only computed from a random subset of base learners. The value corresponds to the percentage of base learners that are dropped. In each iteration, at least one base learner is dropped. This is an alternative regularization to shrinkage, i.e., setting learning_rate < 1.0. Values must be in the range [0.0, 1.0).
random_state (int seed, RandomState instance, or None, default: None) – The seed of the pseudo random number generator to use when shuffling the data.
verbose (int, default: 0) – Enable verbose output. If 1 then it prints progress and performance once in a while. Values must be in the range [0, inf).
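The effect of `subsample` can be pictured with a short NumPy sketch. How the library actually draws the samples is an assumption here; the sketch only shows the general idea of stochastic gradient boosting: each iteration fits its base learner on a random in-bag subset, and the remaining out-of-bag samples are available for evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, subsample = 10, 0.5

# Draw a fresh in-bag mask for one boosting iteration (illustrative scheme).
n_inbag = max(1, int(subsample * n_samples))
in_bag = np.zeros(n_samples, dtype=bool)
in_bag[rng.choice(n_samples, size=n_inbag, replace=False)] = True

# Samples not drawn in this iteration form the out-of-bag set.
out_of_bag = ~in_bag
```

With `subsample=1.0` every sample is in-bag and no out-of-bag set exists.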
- coef_
The aggregated coefficients. The first element coef_[0] corresponds to the intercept. If loss is coxph, the intercept will always be zero.
- Type
array, shape = (n_features + 1,)
- loss_
The concrete LossFunction object.
- Type
LossFunction
- estimators_
The collection of fitted sub-estimators.
- Type
list of base learners
- train_score_
The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1, this is the deviance on the training data.
- Type
array, shape = (n_estimators,)
- oob_improvement_
The improvement in loss (= deviance) on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator.
- Type
array, shape = (n_estimators,)
- n_features_in_
Number of features seen during fit.
- Type
int
- feature_names_in_
Names of features seen during fit. Defined only when X has feature names that are all strings.
- Type
ndarray of shape (n_features_in_,)
- event_times_
Unique time points where events occurred.
- Type
array, shape = (n_event_times,)
References
[1] Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., van der Laan, M. J., “Survival ensembles”, Biostatistics, 7(3), 355-373, 2006.
- __init__(*, loss='coxph', learning_rate=0.1, n_estimators=100, subsample=1.0, dropout_rate=0, random_state=None, verbose=0)
Methods
__init__(*[, loss, learning_rate, ...])
fit(X, y[, sample_weight]) – Fit estimator.
get_params([deep]) – Get parameters for this estimator.
predict(X) – Predict risk scores.
predict_cumulative_hazard_function(X[, ...]) – Predict cumulative hazard function.
predict_survival_function(X[, return_array]) – Predict survival function.
score(X, y) – Returns the concordance index of the prediction.
set_params(**params) – Set the parameters of this estimator.
Attributes
base_estimator_ – Estimator used to grow the ensemble.
feature_importances_
- property base_estimator_
Estimator used to grow the ensemble.
- fit(X, y, sample_weight=None)
Fit estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
sample_weight (array-like, shape = (n_samples,), optional) – Weights given to each sample. If omitted, all samples have weight 1.
- Return type
self
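The structured array expected for `y` can be built directly with NumPy. The sketch below is illustrative: the field names `event` and `time` are arbitrary choices, and what matters according to the description above is the order of the fields (event indicator first, time second).

```python
import numpy as np

# Each record is (event indicator, time of event or censoring).
# Field names are illustrative; the field order follows the fit() description.
y = np.array(
    [(True, 12.0), (False, 30.5), (True, 7.2)],
    dtype=[("event", bool), ("time", float)],
)

n_events = int(y["event"].sum())   # samples with an observed event
longest = float(y["time"].max())   # longest follow-up time
```

scikit-survival also provides the helper `sksurv.util.Surv.from_arrays` for creating such arrays from separate event and time arrays.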
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- predict(X)
Predict risk scores.
If loss='coxph', predictions can be interpreted as the log hazard ratio corresponding to the linear predictor of a Cox proportional hazards model. If loss='squared' or loss='ipcwls', predictions are the time to event.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
- Returns
risk_score – Predicted risk scores.
- Return type
array, shape = (n_samples,)
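For loss='coxph', the interpretation as a linear predictor can be sketched with NumPy. The coefficient values below are hypothetical, and the sketch assumes the prediction is the intercept `coef_[0]` plus the weighted sum of the features, as the description of `coef_` suggests. The difference between two predictions is then a log hazard ratio.

```python
import numpy as np

# Hypothetical coef_: intercept first, then one weight per feature.
coef = np.array([0.0, 0.5, -0.25])
X = np.array([[1.0, 2.0],
              [3.0, 0.0]])

# Linear predictor: intercept plus weighted sum of features.
risk = coef[0] + X @ coef[1:]

# Hazard of the second sample relative to the first.
hazard_ratio = np.exp(risk[1] - risk[0])
```

A larger risk score corresponds to a higher hazard, so in this example the second sample is predicted to experience the event sooner.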
- predict_cumulative_hazard_function(X, return_array=False)
Predict cumulative hazard function.
Only available if fit() has been called with loss='coxph'.
The cumulative hazard function for an individual with feature vector \(x\) is defined as
\[H(t \mid x) = \exp(f(x)) H_0(t) ,\]
where \(f(\cdot)\) is the additive ensemble of base learners, and \(H_0(t)\) is the baseline cumulative hazard function, estimated by Breslow’s estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
return_array (boolean, default: False) – If set, return an array with the cumulative hazard rate for each self.event_times_, otherwise an array of sksurv.functions.StepFunction.
- Returns
cum_hazard – If return_array is set, an array with the cumulative hazard rate for each self.event_times_, otherwise an array of length n_samples of sksurv.functions.StepFunction instances will be returned.
- Return type
ndarray
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_whas500
>>> from sksurv.ensemble import ComponentwiseGradientBoostingSurvivalAnalysis
Load the data.
>>> X, y = load_whas500()
>>> X = X.astype(float)
Fit the model.
>>> estimator = ComponentwiseGradientBoostingSurvivalAnalysis(loss="coxph").fit(X, y)
Estimate the cumulative hazard function for the first 10 samples.
>>> chf_funcs = estimator.predict_cumulative_hazard_function(X.iloc[:10])
Plot the estimated cumulative hazard functions.
>>> for fn in chf_funcs:
...     plt.step(fn.x, fn(fn.x), where="post")
...
>>> plt.ylim(0, 1)
>>> plt.show()
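The formula for \(H(t \mid x)\) can also be traced by hand. The following NumPy sketch is an illustration rather than the library's code: the function name `breslow_baseline` and the toy data are hypothetical. It computes Breslow's estimate of the baseline cumulative hazard from training risk scores and then scales it by \(\exp(f(x))\) for new samples.

```python
import numpy as np

def breslow_baseline(event, time, risk_score):
    """Breslow estimate of the baseline cumulative hazard at each unique event time."""
    event_times = np.unique(time[event])
    exp_risk = np.exp(risk_score)
    increments = np.empty(len(event_times))
    for k, t in enumerate(event_times):
        n_events = np.sum(event & (time == t))   # events observed exactly at t
        at_risk = exp_risk[time >= t].sum()      # weighted size of the risk set at t
        increments[k] = n_events / at_risk
    return event_times, np.cumsum(increments)

event = np.array([True, False, True, True])
time = np.array([2.0, 3.0, 5.0, 5.0])
f_train = np.zeros(4)                    # all-zero scores, so exp(f) = 1 for everyone
times, H0 = breslow_baseline(event, time, f_train)

f_new = np.array([0.0, np.log(2.0)])     # hypothetical predictions for two new samples
H = np.exp(f_new)[:, None] * H0          # H(t | x) = exp(f(x)) * H_0(t)
```

Each row of `H` is one sample's cumulative hazard evaluated at the unique event times, which mirrors the layout described for `return_array=True`.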
- predict_survival_function(X, return_array=False)
Predict survival function.
Only available if fit() has been called with loss='coxph'.
The survival function for an individual with feature vector \(x\) is defined as
\[S(t \mid x) = S_0(t)^{\exp(f(x))} ,\]
where \(f(\cdot)\) is the additive ensemble of base learners, and \(S_0(t)\) is the baseline survival function, estimated by Breslow’s estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
return_array (boolean, default: False) – If set, return an array with the probability of survival for each self.event_times_, otherwise an array of sksurv.functions.StepFunction.
- Returns
survival – If return_array is set, an array with the probability of survival for each self.event_times_, otherwise an array of length n_samples of sksurv.functions.StepFunction instances will be returned.
- Return type
ndarray
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_whas500
>>> from sksurv.ensemble import ComponentwiseGradientBoostingSurvivalAnalysis
Load the data.
>>> X, y = load_whas500()
>>> X = X.astype(float)
Fit the model.
>>> estimator = ComponentwiseGradientBoostingSurvivalAnalysis(loss="coxph").fit(X, y)
Estimate the survival function for the first 10 samples.
>>> surv_funcs = estimator.predict_survival_function(X.iloc[:10])
Plot the estimated survival functions.
>>> for fn in surv_funcs:
...     plt.step(fn.x, fn(fn.x), where="post")
...
>>> plt.ylim(0, 1)
>>> plt.show()
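As a numeric illustration of the formula for \(S(t \mid x)\), the sketch below uses made-up values for the baseline cumulative hazard and the predictions. It relies on the standard identity \(S_0(t) = \exp(-H_0(t))\); none of the numbers come from a fitted model.

```python
import numpy as np

H0 = np.array([0.1, 0.3, 0.7])           # hypothetical baseline cumulative hazard
S0 = np.exp(-H0)                          # baseline survival function
f_x = np.array([-0.5, 0.0, 1.0])          # hypothetical ensemble predictions f(x)

# S(t | x) = S_0(t) ** exp(f(x)); one row per sample, one column per time point.
S = S0[None, :] ** np.exp(f_x)[:, None]
```

A sample with \(f(x) = 0\) follows the baseline survival curve exactly, while a larger \(f(x)\) pulls the curve down faster.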
- score(X, y)
Returns the concordance index of the prediction.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
- Returns
cindex – Estimated concordance index.
- Return type
float
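For intuition about what the concordance index measures, here is a simplified pure-Python sketch of Harrell's estimator. It is an assumption-laden illustration and not the library's routine: it ignores tied event times, and the function name and toy data are hypothetical. A pair is comparable when the sample with the shorter time had an event, and it is concordant when that sample also has the higher risk score.

```python
def concordance_index(event, time, risk):
    """Harrell's C: share of comparable pairs that are ordered correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is anchored on the sample that experienced the event
        for j in range(n):
            if time[i] < time[j]:          # i failed before j was last observed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5      # ties in risk count half
    return concordant / comparable

event = [True, True, False, True]
time = [1.0, 3.0, 4.0, 6.0]
risk = [2.5, 0.5, 1.0, 0.1]
cindex = concordance_index(event, time, risk)
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering.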
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
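The `<component>__<parameter>` naming convention can be illustrated with a few lines of plain Python. This is only a sketch of the naming scheme, not how `set_params` is implemented, and both the helper `split_nested_params` and the component name `model` are hypothetical.

```python
def split_nested_params(params):
    """Group `<component>__<parameter>` keys by component; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        # Split at the first double underscore only.
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped

grouped = split_nested_params(
    {"verbose": 1, "model__learning_rate": 0.05, "model__n_estimators": 200}
)
```

Here `learning_rate` and `n_estimators` would be routed to the component named `model`, while `verbose` applies to the outer object.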