sksurv.ensemble.GradientBoostingSurvivalAnalysis
- class sksurv.ensemble.GradientBoostingSurvivalAnalysis(loss='coxph', learning_rate=0.1, n_estimators=100, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, random_state=None, max_features=None, max_leaf_nodes=None, subsample=1.0, dropout_rate=0.0, verbose=0, ccp_alpha=0.0)
Gradient-boosted Cox proportional hazard loss with regression trees as base learner.
In each stage, a regression tree is fit on the negative gradient of the loss function.
For more details on gradient boosting see [1] and [2]. If loss='coxph', the partial likelihood of the proportional hazards model is optimized as described in [3]. If loss='ipcwls', the accelerated failure time model with inverse-probability of censoring weighted least squares error is optimized as described in [4]. When using a non-zero dropout_rate, regularization is applied during training following [5].
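For loss='coxph', "a regression tree is fit on the negative gradient of the loss" can be made concrete. The sketch below assumes Breslow's handling of tied times and computes, with NumPy, the negative log partial likelihood and its negative gradient with respect to the current ensemble prediction f. The function names are hypothetical; this illustrates the mathematics of [3], not sksurv's internal code. The result for sample k is a martingale-type residual: its event indicator minus exp(f_k) times the accumulated risk-set increments up to its time.

```python
import numpy as np


def cox_negative_log_partial_likelihood(event, time, f):
    """Negative log partial likelihood of a Cox model (Breslow handling of ties)."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    f = np.asarray(f, dtype=float)
    # at_risk[i, j] == 1 if sample j is still at risk at time[i]
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    log_risk_sum = np.log(at_risk @ np.exp(f))
    return -np.sum(f[event] - log_risk_sum[event])


def coxph_negative_gradient(event, time, f):
    """Pseudo-residuals that a regression tree would be fit to in one boosting stage."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    exp_f = np.exp(np.asarray(f, dtype=float))
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    risk_sum = at_risk @ exp_f
    # Each observed event at time[i] contributes 1 / (sum of exp(f) over its risk set)
    increments = np.where(event, 1.0 / risk_sum, 0.0)
    # Sample k accumulates the increments of all events i with time[i] <= time[k]
    cum_increments = at_risk.T @ increments
    return event.astype(float) - exp_f * cum_increments


# Three uncensored samples and an all-zero initial prediction
residuals = coxph_negative_gradient([True, True, True], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

With f = 0 the residuals are 2/3, 1/6 and -5/6, and they sum to zero, as martingale residuals of a Cox model do. The pairwise risk-set matrix keeps the sketch short but uses O(n²) memory.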
See the User Guide for examples.
- Parameters
loss ({'coxph', 'squared', 'ipcwls'}, optional, default: 'coxph') – loss function to be optimized. ‘coxph’ refers to partial likelihood loss of Cox’s proportional hazards model. The loss ‘squared’ minimizes a squared regression loss that ignores predictions beyond the time of censoring, and ‘ipcwls’ refers to inverse-probability of censoring weighted least squares error.
learning_rate (float, optional, default: 0.1) – learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
n_estimators (int, default: 100) – The number of regression trees to create. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
criterion (string, optional, default: 'friedman_mse') – The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error. The default value of “friedman_mse” is generally the best as it can provide a better approximation in some cases.
min_samples_split (integer, optional, default: 2) – The minimum number of samples required to split an internal node.
min_samples_leaf (integer, optional, default: 1) – The minimum number of samples required to be at a leaf node.
min_weight_fraction_leaf (float, optional, default: 0.) – The minimum weighted fraction of the input samples required to be at a leaf node.
max_depth (integer, optional, default: 3) – maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. Ignored if max_leaf_nodes is not None.
min_impurity_decrease (float, optional, default: 0.) –
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
random_state (int seed, RandomState instance, or None, default: None) – The seed of the pseudo random number generator to use when shuffling the data.
max_features (int, float, string or None, optional, default: None) –
- The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
Choosing max_features < n_features leads to a reduction of variance and an increase in bias.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
max_leaf_nodes (int or None, optional, default: None) – Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, the number of leaf nodes is unlimited.
subsample (float, optional, default: 1.0) – The fraction of samples to be used for fitting the individual regression trees. If smaller than 1.0, this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
dropout_rate (float, optional, default: 0.0) – If larger than zero, the residuals at each iteration are only computed from a random subset of base learners. The value corresponds to the fraction of base learners that are dropped. In each iteration, at least one base learner is dropped. This is an alternative regularization to shrinkage, i.e., setting learning_rate < 1.0.
verbose (int, default: 0) – Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree.
ccp_alpha (non-negative float, optional, default: 0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed.
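The dropout_rate behaviour described above follows DART [5]. The sketch below assumes each existing base learner is dropped independently with probability dropout_rate, forces at least one drop as the parameter description requires, and applies the DART normalization from [5]: the new tree gets weight 1/(k+1) and each of the k dropped trees is rescaled by k/(k+1). The helper names and the per-tree output array are hypothetical, learning-rate shrinkage is omitted, and this is not the estimator's internal code.

```python
import numpy as np


def sample_dropped_learners(dropout_rate, n_learners, rng):
    """Boolean mask of base learners to drop; at least one is always dropped."""
    drop = rng.random(n_learners) < dropout_rate
    if not drop.any():
        # Force one dropped learner so every iteration uses a strict subset
        drop[rng.integers(n_learners)] = True
    return drop


def dart_scales(n_dropped):
    """Return (weight of the new tree, rescaling factor for each dropped tree)."""
    return 1.0 / (n_dropped + 1.0), n_dropped / (n_dropped + 1.0)


rng = np.random.default_rng(0)
# Hypothetical outputs of 5 already fitted trees on 4 samples (rows: trees)
tree_outputs = rng.normal(size=(5, 4))
dropped = sample_dropped_learners(0.1, len(tree_outputs), rng)
# Ensemble prediction without the dropped trees; the next tree is fit to the gradient here
partial_prediction = tree_outputs[~dropped].sum(axis=0)
new_weight, dropped_rescale = dart_scales(int(dropped.sum()))
```

Dropping trees keeps later trees from only correcting tiny residuals left by the first few, which is why DART is presented as an alternative to shrinkage.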
- n_estimators_
The number of estimators as selected by early stopping (if n_iter_no_change is specified). Otherwise it is set to n_estimators.
- Type
int
- feature_importances_
The feature importances (the higher, the more important the feature).
- Type
ndarray, shape = (n_features,)
- estimators_
The collection of fitted sub-estimators.
- Type
ndarray of DecisionTreeRegressor, shape = (n_estimators, 1)
- train_score_
The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1, this is the deviance on the training data.
- Type
ndarray, shape = (n_estimators,)
- oob_improvement_
The improvement in loss (= deviance) on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator.
- Type
ndarray, shape = (n_estimators,)
- n_features_in_
Number of features seen during fit.
- Type
int
- feature_names_in_
Names of features seen during fit. Defined only when X has feature names that are all strings.
- Type
ndarray of shape (n_features_in_,)
- event_times_
Unique time points where events occurred.
- Type
array of shape = (n_event_times,)
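Because oob_improvement_ records the per-stage change in out-of-bag loss, its cumulative sum traces how much each additional stage helped. The sketch below picks the number of stages at which that cumulative improvement peaks. The helper name and the input values are hypothetical; out-of-bag samples only exist when subsample < 1.0.

```python
import numpy as np


def best_n_iterations(oob_improvement):
    """Number of boosting stages that maximizes the cumulative out-of-bag improvement."""
    cumulative = np.cumsum(oob_improvement)
    # argmax is a 0-based stage index, so add 1 to get a count of stages
    return int(np.argmax(cumulative)) + 1


# Hypothetical oob_improvement_ values: later stages start to hurt held-out loss
example_oob = np.array([0.5, 0.3, 0.1, -0.05, -0.2])
n_best = best_n_iterations(example_oob)
```

Here the cumulative improvement peaks after the third stage, so n_best is 3; a model refit with n_estimators=n_best would be one way to use it.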
References
[1] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” The Annals of Statistics, 29(5), 1189–1232, 2001.
[2] J. H. Friedman, “Stochastic gradient boosting,” Computational Statistics & Data Analysis, 38(4), 367–378, 2002.
[3] G. Ridgeway, “The state of boosting,” Computing Science and Statistics, 172–181, 1999.
[4] T. Hothorn, P. Bühlmann, S. Dudoit, A. Molinaro, M. J. van der Laan, “Survival ensembles,” Biostatistics, 7(3), 355–373, 2006.
[5] K. V. Rashmi and R. Gilad-Bachrach, “DART: Dropouts meet multiple additive regression trees,” in 18th International Conference on Artificial Intelligence and Statistics, 2015, 489–497.
- __init__(loss='coxph', learning_rate=0.1, n_estimators=100, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, random_state=None, max_features=None, max_leaf_nodes=None, subsample=1.0, dropout_rate=0.0, verbose=0, ccp_alpha=0.0)
Methods
__init__([loss, learning_rate, ...])
apply(X) – Apply trees in the ensemble to X, return leaf indices.
fit(X, y[, sample_weight, monitor]) – Fit the gradient boosting model.
get_params([deep]) – Get parameters for this estimator.
predict(X) – Predict risk scores.
predict_cumulative_hazard_function(X[, ...]) – Predict cumulative hazard function.
predict_survival_function(X[, return_array]) – Predict survival function.
score(X, y) – Returns the concordance index of the prediction.
set_params(**params) – Set the parameters of this estimator.
staged_predict(X) – Predict risk scores at each stage for X.
Attributes
loss_ – Attribute loss_ was deprecated in version 1.1 and will be removed in 1.3.
n_features_ – Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2.
- apply(X)
Apply trees in the ensemble to X, return leaf indices.
New in version 0.17.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted to a sparse csr_matrix.
- Returns
X_leaves – For each datapoint x in X and for each tree in the ensemble, return the index of the leaf x ends up in for each estimator. Since this model fits one regression tree per stage (estimators_ has shape (n_estimators, 1)), n_classes is 1.
- Return type
array-like of shape (n_samples, n_estimators, n_classes)
- fit(X, y, sample_weight=None, monitor=None)
Fit the gradient boosting model.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
sample_weight (array-like, shape = (n_samples,), optional) – Weights given to each sample. If omitted, all samples have weight 1.
monitor (callable, optional) – The monitor is called after each iteration with the current iteration, a reference to the estimator and the local variables of _fit_stages as keyword arguments callable(i, self, locals()). If the callable returns True, the fitting procedure is stopped. The monitor can be used for various things such as computing held-out estimates, early stopping, model introspection, and snapshotting.
- Returns
self – Returns self.
- Return type
object
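A monitor for early stopping can be any callable with the signature callable(i, self, locals()) that returns True to stop fitting. The class below is a hypothetical sketch: it assumes subsample < 1.0 so that oob_improvement_ is populated, and stops once the mean out-of-bag improvement over the last window stages has stayed at or below tol for patience consecutive stages. A SimpleNamespace stands in for a fitted estimator so the logic can be checked without fitting a model.

```python
from types import SimpleNamespace

import numpy as np


class OOBEarlyStopping:
    """Hypothetical monitor that stops boosting when out-of-bag improvement stalls."""

    def __init__(self, window=10, patience=5, tol=1e-6):
        self.window = window
        self.patience = patience
        self.tol = tol
        self._stale = 0  # consecutive stages without sufficient improvement

    def __call__(self, i, estimator, local_vars):
        # Keep fitting until a full window of stages is available
        if i + 1 < self.window:
            return False
        recent = estimator.oob_improvement_[i + 1 - self.window : i + 1]
        if np.mean(recent) > self.tol:
            self._stale = 0
            return False
        self._stale += 1
        # Returning True asks fit() to stop adding stages
        return self._stale >= self.patience


# Possible usage (not executed here):
#   est = GradientBoostingSurvivalAnalysis(subsample=0.5, n_estimators=1000)
#   est.fit(X, y, monitor=OOBEarlyStopping())

# Stand-in estimator: only oob_improvement_ is read by the monitor
fake_estimator = SimpleNamespace(oob_improvement_=np.array([1.0] * 10 + [0.0] * 20))
monitor = OOBEarlyStopping(window=3, patience=2)
decisions = [monitor(i, fake_estimator, {}) for i in range(30)]
```

Averaging over a window smooths the noisy per-stage out-of-bag estimates, and patience avoids stopping on a single flat stretch.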
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- property loss_
Attribute loss_ was deprecated in version 1.1 and will be removed in 1.3.
- Type
DEPRECATED
- property n_features_
Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
- Type
DEPRECATED
- predict(X)
Predict risk scores.
If loss='coxph', predictions can be interpreted as log hazard ratios, similar to the linear predictor of a Cox proportional hazards model. If loss='squared' or loss='ipcwls', predictions are the time to event.
- Parameters
X (array-like, shape = (n_samples, n_features)) – The input samples.
- Returns
y – The risk scores.
- Return type
ndarray, shape = (n_samples,)
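With loss='coxph', the difference of two predictions is a log hazard ratio, so exponentiating it gives the relative hazard of one sample versus another; the baseline hazard cancels. The helper name and scores below are hypothetical, and this reading does not apply to loss='squared' or loss='ipcwls', whose predictions are times.

```python
import numpy as np


def relative_hazard(risk_a, risk_b):
    """Hazard ratio of a versus b from coxph risk scores (log hazard ratios)."""
    return np.exp(np.asarray(risk_a, dtype=float) - np.asarray(risk_b, dtype=float))


# Hypothetical risk scores such as predict() might return with loss="coxph"
scores = np.array([0.2, 0.2 + np.log(2.0), -0.5])
# Hazard of every sample relative to the first one
ratios = relative_hazard(scores, scores[0])
```

The second sample has twice the hazard of the first at every time point, and the third has exp(-0.7), about half.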
- predict_cumulative_hazard_function(X, return_array=False)
Predict cumulative hazard function.
Only available if fit() has been called with loss = “coxph”.
The cumulative hazard function for an individual with feature vector \(x\) is defined as
\[H(t \mid x) = \exp(f(x)) H_0(t) ,\]
where \(f(\cdot)\) is the additive ensemble of base learners, and \(H_0(t)\) is the baseline cumulative hazard function, estimated by Breslow’s estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
return_array (boolean, default: False) – If set, return an array with the cumulative hazard rate for each self.event_times_, otherwise an array of sksurv.functions.StepFunction instances.
- Returns
cum_hazard – If return_array is set, an array with the cumulative hazard rate for each self.event_times_, otherwise an array of length n_samples of sksurv.functions.StepFunction instances will be returned.
- Return type
ndarray
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_whas500
>>> from sksurv.ensemble import GradientBoostingSurvivalAnalysis
Load the data.
>>> X, y = load_whas500()
>>> X = X.astype(float)
Fit the model.
>>> estimator = GradientBoostingSurvivalAnalysis(loss="coxph").fit(X, y)
Estimate the cumulative hazard function for the first 10 samples.
>>> chf_funcs = estimator.predict_cumulative_hazard_function(X.iloc[:10])
Plot the estimated cumulative hazard functions.
>>> for fn in chf_funcs:
...     plt.step(fn.x, fn(fn.x), where="post")
...
>>> plt.ylim(0, 1)
>>> plt.show()
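The formula H(t | x) = exp(f(x)) H_0(t) relies on Breslow's estimator of the baseline cumulative hazard. The sketch below is a minimal NumPy version evaluated at the unique event times (mirroring event_times_): at each event time t, it adds the number of events at t divided by the sum of exp(f) over samples still at risk. Function names are hypothetical; this is not sksurv's implementation.

```python
import numpy as np


def breslow_cumulative_baseline_hazard(event, time, risk_score):
    """Breslow estimate of H_0(t) at the unique event times."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    exp_f = np.exp(np.asarray(risk_score, dtype=float))
    event_times = np.unique(time[event])
    increments = []
    for t in event_times:
        n_events = np.sum(event & (time == t))  # number of events at time t
        risk_set = exp_f[time >= t].sum()  # sum of exp(f) over samples at risk at t
        increments.append(n_events / risk_set)
    return event_times, np.cumsum(increments)


def cumulative_hazard(H0, risk_score):
    """H(t | x) = exp(f(x)) * H_0(t); rows are samples, columns are event times."""
    exp_f = np.exp(np.asarray(risk_score, dtype=float))
    return exp_f[:, None] * np.asarray(H0, dtype=float)[None, :]


# Three samples, the second censored, with all-zero risk scores
times, H0 = breslow_cumulative_baseline_hazard([True, False, True], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
chf = cumulative_hazard(H0, [0.0, np.log(2.0)])
```

With all risk scores at zero, Breslow's estimator reduces to the Nelson–Aalen estimator: here H_0 is 1/3 at t = 1 and 1/3 + 1 = 4/3 at t = 3, and the second row of chf is twice the first.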
- predict_survival_function(X, return_array=False)
Predict survival function.
Only available if fit() has been called with loss = “coxph”.
The survival function for an individual with feature vector \(x\) is defined as
\[S(t \mid x) = S_0(t)^{\exp(f(x))} ,\]
where \(f(\cdot)\) is the additive ensemble of base learners, and \(S_0(t)\) is the baseline survival function, estimated by Breslow’s estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
return_array (boolean, default: False) – If set, return an array with the probability of survival for each self.event_times_, otherwise an array of sksurv.functions.StepFunction instances.
- Returns
survival – If return_array is set, an array with the probability of survival for each self.event_times_, otherwise an array of length n_samples of sksurv.functions.StepFunction instances will be returned.
- Return type
ndarray
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_whas500
>>> from sksurv.ensemble import GradientBoostingSurvivalAnalysis
Load the data.
>>> X, y = load_whas500()
>>> X = X.astype(float)
Fit the model.
>>> estimator = GradientBoostingSurvivalAnalysis(loss="coxph").fit(X, y)
Estimate the survival function for the first 10 samples.
>>> surv_funcs = estimator.predict_survival_function(X.iloc[:10])
Plot the estimated survival functions.
>>> for fn in surv_funcs:
...     plt.step(fn.x, fn(fn.x), where="post")
...
>>> plt.ylim(0, 1)
>>> plt.show()
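Because S_0(t) = exp(-H_0(t)), the formula S(t | x) = S_0(t)^{exp(f(x))} is the same as exp(-exp(f(x)) H_0(t)), i.e. exp(-H(t | x)). The sketch below computes survival curves from a hypothetical baseline cumulative hazard and risk scores with NumPy broadcasting; the helper name and values are illustrative only.

```python
import numpy as np


def survival_from_baseline(H0, risk_score):
    """S(t | x) = S_0(t) ** exp(f(x)); rows are samples, columns are event times."""
    S0 = np.exp(-np.asarray(H0, dtype=float))  # baseline survival S_0(t) = exp(-H_0(t))
    exp_f = np.exp(np.asarray(risk_score, dtype=float))
    return S0[None, :] ** exp_f[:, None]


# Hypothetical baseline cumulative hazard at three event times
H0 = np.array([1 / 3, 5 / 6, 11 / 6])
surv = survival_from_baseline(H0, np.array([0.0, np.log(2.0)]))
```

A sample with risk score log 2 has survival S_0(t)², which lies below the baseline curve at every time point.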
- score(X, y)
Returns the concordance index of the prediction.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
- Returns
cindex – Estimated concordance index.
- Return type
float
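The concordance index measures how often, among comparable pairs, the sample that experienced the event earlier was assigned the higher risk score. The sketch below is a simplified Harrell's C in plain NumPy: it assumes higher scores mean higher risk (as with loss='coxph'), skips pairs with tied times, and counts tied risk scores as half-concordant. It is illustrative only, not the function used by score().

```python
import numpy as np


def harrell_c(event, time, risk_score):
    """Simplified Harrell's concordance index; pairs with tied times are skipped."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    risk = np.asarray(risk_score, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # the earlier time of a comparable pair must be an observed event
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


c_perfect = harrell_c([True, True, True, True], [1, 2, 3, 4], [4, 3, 2, 1])
c_reversed = harrell_c([True, True, True, True], [1, 2, 3, 4], [1, 2, 3, 4])
```

A value of 1.0 means perfect ranking, 0.5 is what random scores achieve on average, and 0.0 means the ranking is exactly reversed.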
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- staged_predict(X)
Predict risk scores at each stage for X.
This method allows monitoring (i.e., determining the error on a test set) after each stage.
If loss='coxph', predictions can be interpreted as log hazard ratios, similar to the linear predictor of a Cox proportional hazards model. If loss='squared' or loss='ipcwls', predictions are the time to event.
- Parameters
X (array-like, shape = (n_samples, n_features)) – The input samples.
- Returns
y – The predicted value of the input samples.
- Return type
generator of array of shape = (n_samples,)
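staged_predict yields one prediction array per stage because a boosted model is additive: after stage m the prediction is F_m = F_(m-1) + learning_rate * h_m. The generator below mimics that update on hypothetical per-stage tree outputs. It is a sketch of the additive structure only; the initial estimator and any dropout scaling used by the real model are omitted.

```python
import numpy as np


def staged_additive_predictions(init, stage_outputs, learning_rate):
    """Yield the ensemble prediction after each stage: F_m = F_(m-1) + learning_rate * h_m."""
    pred = np.asarray(init, dtype=float)
    for h in stage_outputs:
        # Builds a new array, so previously yielded stages are not modified
        pred = pred + learning_rate * np.asarray(h, dtype=float)
        yield pred


# Hypothetical outputs of three trees on two samples
stages = list(staged_additive_predictions(np.zeros(2), [[1.0, 2.0], [1.0, -1.0], [0.0, 4.0]], 0.1))
```

Evaluating a test-set metric on each yielded array shows where additional stages stop helping, without refitting the model for every candidate n_estimators.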