sksurv.ensemble.GradientBoostingSurvivalAnalysis

class sksurv.ensemble.GradientBoostingSurvivalAnalysis(loss='coxph', learning_rate=0.1, n_estimators=100, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_split=None, min_impurity_decrease=0.0, random_state=None, max_features=None, max_leaf_nodes=None, presort='auto', subsample=1.0, dropout_rate=0.0, verbose=0)

Gradient-boosted Cox proportional hazards loss with regression trees as base learners.

In each stage, a regression tree is fit on the negative gradient of the loss function.
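
As an illustration of what "negative gradient of the loss" means for loss='coxph', the following is a minimal sketch of the textbook negative gradient of the Cox partial likelihood loss, using plain risk sets with no special treatment of tied event times. The function name cox_negative_gradient is hypothetical, and this is not necessarily how the library computes it internally.

```python
import math


def cox_negative_gradient(time, event, pred):
    """Negative gradient of the negative Cox partial log-likelihood.

    For sample k this is: event[k] minus the sum, over all observed events i
    with time[i] <= time[k], of exp(pred[k]) / sum_{j at risk at time[i]} exp(pred[j]).
    """
    n = len(time)
    exp_pred = [math.exp(p) for p in pred]
    grad = [float(e) for e in event]  # start from the event indicator
    for i in range(n):
        if not event[i]:
            continue  # censored samples do not define a risk-set term
        at_risk = [j for j in range(n) if time[j] >= time[i]]
        denom = sum(exp_pred[j] for j in at_risk)
        for k in at_risk:
            grad[k] -= exp_pred[k] / denom
    return grad


# Toy data: three samples, the second one is censored.
time = [1.0, 2.0, 3.0]
event = [True, False, True]
residuals = cox_negative_gradient(time, event, pred=[0.0, 0.0, 0.0])
print(residuals)
```

In a boosting stage, a regression tree would be fit to values like these, and its output, scaled by learning_rate, added to the current predictions.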

Parameters:
  • loss ({'coxph', 'squared', 'ipcwls'}, optional, default: 'coxph') – loss function to be optimized. ‘coxph’ refers to partial likelihood loss of Cox’s proportional hazards model. The loss ‘squared’ minimizes a squared regression loss that ignores predictions beyond the time of censoring, and ‘ipcwls’ refers to inverse-probability of censoring weighted least squares error.
  • learning_rate (float, optional, default: 0.1) – learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
  • n_estimators (int, default: 100) – The number of regression trees to create. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.
  • criterion (string, optional, default: 'friedman_mse') – The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error. The default value of “friedman_mse” is generally the best as it can provide a better approximation in some cases.
  • min_samples_split (integer, optional, default: 2) – The minimum number of samples required to split an internal node.
  • min_samples_leaf (integer, optional, default: 1) – The minimum number of samples required to be at a leaf node.
  • min_weight_fraction_leaf (float, optional, default: 0.) – The minimum weighted fraction of the input samples required to be at a leaf node.
  • max_depth (integer, optional, default: 3) – maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables. Ignored if max_leaf_nodes is not None.
  • min_impurity_split (float, optional, default: None) – Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
  • min_impurity_decrease (float, optional, default: 0.) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

  • random_state (int seed, RandomState instance, or None, default: None) – The seed of the pseudo random number generator to use when shuffling the data.
  • max_features (int, float, string or None, optional, default: None) –
    The number of features to consider when looking for the best split:
    • If int, then consider max_features features at each split.
    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
    • If “auto”, then max_features=n_features.
    • If “sqrt”, then max_features=sqrt(n_features).
    • If “log2”, then max_features=log2(n_features).
    • If None, then max_features=n_features.

    Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

  • max_leaf_nodes (int or None, optional, default: None) – Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
  • presort (bool or 'auto', optional, default: 'auto') – Whether to presort the data to speed up the finding of best splits in fitting. Auto mode by default will use presorting on dense data and default to normal sorting on sparse data. Setting presort to true on sparse data will raise an error.
  • subsample (float, optional, default: 1.0) – The fraction of samples to be used for fitting the individual regression trees. If smaller than 1.0, this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
  • dropout_rate (float, optional, default: 0.0) – If larger than zero, the residuals at each iteration are only computed from a random subset of base learners. The value corresponds to the fraction of base learners that are dropped. In each iteration, at least one base learner is dropped. This is an alternative regularization to shrinkage, i.e., setting learning_rate < 1.0.
  • verbose (int, default: 0) – Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree.
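
The weighted impurity decrease described under min_impurity_decrease can be written out directly. The sketch below is only a transcription of that formula into a hypothetical helper, weighted_impurity_decrease, with made-up node counts and impurities; it is not taken from the library.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """Evaluate N_t / N * (impurity - N_t_R / N_t * right_impurity
                                    - N_t_L / N_t * left_impurity)."""
    return n_t / n * (impurity
                      - n_t_r / n_t * right_impurity
                      - n_t_l / n_t * left_impurity)


# Illustrative numbers: 100 samples overall, 50 at the node,
# split into 30 (left child) and 20 (right child).
decrease = weighted_impurity_decrease(
    n=100, n_t=50, n_t_l=30, n_t_r=20,
    impurity=0.5, left_impurity=0.2, right_impurity=0.3,
)

# The node would be split only if decrease >= min_impurity_decrease.
min_impurity_decrease = 0.1
would_split = decrease >= min_impurity_decrease
print(decrease, would_split)
```

If sample_weight is passed, the counts above would be weighted sums rather than plain sample counts.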
n_estimators_

The number of estimators as selected by early stopping (if n_iter_no_change is specified). Otherwise it is set to n_estimators.

Type: int
feature_importances_

The feature importances (the higher, the more important the feature).

Type: ndarray, shape = (n_features,)
estimators_

The collection of fitted sub-estimators.

Type: ndarray of DecisionTreeRegressor, shape = (n_estimators, 1)
train_score_

The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.

Type: ndarray, shape = (n_estimators,)
oob_improvement_

The improvement in loss (= deviance) on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator.

Type: ndarray, shape = (n_estimators,)

Methods

__init__([loss, learning_rate, …]) Initialize self.
fit(X, y[, sample_weight, monitor]) Fit the gradient boosting model.
predict(X) Predict risk scores.
score(X, y) Returns the concordance index of the prediction.
staged_predict(X) Predict risk scores at each stage for X.
fit(X, y, sample_weight=None, monitor=None)

Fit the gradient boosting model.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Data matrix
  • y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
  • sample_weight (array-like, shape = (n_samples,), optional) – Weights given to each sample. If omitted, all samples have weight 1.
  • monitor (callable, optional) – The monitor is called after each iteration with the current iteration, a reference to the estimator, and the local variables of _fit_stages, i.e., as callable(i, self, locals()). If the callable returns True, the fitting procedure is stopped. The monitor can be used for various things such as computing held-out estimates, early stopping, model introspection, and snapshotting.
Returns:

self – Returns self.

Return type:

object
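
The two less obvious arguments of fit are the structured array y and the monitor callable. The following sketch shows one way to build them with NumPy. The field names "event" and "time", the helper stop_after, and the assumption that the iteration index i counts from zero are illustrative choices, not requirements stated above.

```python
import numpy as np

# Event indicator first, observed time (event or censoring) second.
event = [True, False, True, True]
time = [5.0, 8.0, 3.0, 10.0]

y = np.empty(len(event), dtype=[("event", bool), ("time", float)])
y["event"] = event
y["time"] = time


def stop_after(max_iter):
    """Build a monitor that requests a stop once max_iter iterations ran."""
    def monitor(i, estimator, local_vars):
        # Called as monitor(i, self, locals()); returning True stops fitting.
        return i + 1 >= max_iter
    return monitor


monitor = stop_after(10)
print(y.dtype.names, monitor(3, None, {}), monitor(9, None, {}))

# Hypothetical use with an estimator instance named model:
# model.fit(X, y, monitor=monitor)
```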

predict(X)

Predict risk scores.

If loss='coxph', predictions can be interpreted as log hazard ratios, similar to the linear predictor of a Cox proportional hazards model. If loss='squared' or loss='ipcwls', predictions are the time to event.

Parameters: X (array-like, shape = (n_samples, n_features)) – The input samples.
Returns: y – The risk scores.
Return type: ndarray, shape = (n_samples,)
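
To make the log hazard ratio interpretation for loss='coxph' concrete: under that reading, the difference between two predicted scores is the log of the ratio of the two samples' hazards. The scores below are made-up numbers, not output of a fitted model.

```python
import math

# Hypothetical risk scores, as might be returned by predict with loss='coxph'.
score_a = 1.2
score_b = 0.5

# Exponentiating the score difference gives the estimated hazard ratio
# of sample A relative to sample B.
hazard_ratio = math.exp(score_a - score_b)
print(f"hazard ratio A vs. B: {hazard_ratio:.2f}")
```

This reading does not apply to loss='squared' or loss='ipcwls', where predictions are the time to event.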
score(X, y)

Returns the concordance index of the prediction.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Test samples.
  • y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
Returns:

cindex – Estimated concordance index.

Return type:

float
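
For intuition about the value returned by score, here is a simplified pure-Python sketch of a concordance index. It assumes that a higher score means higher risk (shorter survival), counts only pairs whose earlier time is an observed event, and gives ties in the score half credit. The library's own estimator may treat ties and edge cases differently.

```python
def concordance_index(time, event, risk):
    """Fraction of comparable pairs ordered correctly by the risk score."""
    concordant = tied = comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored sample cannot be the earlier one of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable


time = [1.0, 2.0, 3.0, 4.0]
event = [True, True, False, True]

perfect = concordance_index(time, event, risk=[4.0, 3.0, 2.0, 1.0])
reversed_order = concordance_index(time, event, risk=[1.0, 2.0, 3.0, 4.0])
constant = concordance_index(time, event, risk=[0.0, 0.0, 0.0, 0.0])
print(perfect, reversed_order, constant)
```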

staged_predict(X)

Predict risk scores at each stage for X.

This method allows monitoring (e.g., determining the error on a test set) after each stage.

If loss='coxph', predictions can be interpreted as log hazard ratios, similar to the linear predictor of a Cox proportional hazards model. If loss='squared' or loss='ipcwls', predictions are the time to event.

Parameters: X (array-like, shape = (n_samples, n_features)) – The input samples.
Returns: y – The predicted value of the input samples.
Return type: generator of array of shape = (n_samples,)
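
The per-stage monitoring described for staged_predict can be sketched as a loop over the generator that keeps the stage with the lowest error. In the sketch below, best_stage, fake_stages, and the squared-error criterion are hypothetical stand-ins so that the example runs without a fitted model; in practice the iterable would come from staged_predict(X_test) and the error function from whatever held-out metric is appropriate.

```python
def best_stage(staged_predictions, error_fn):
    """Return (index, error) of the stage with the smallest error."""
    best_index, best_error = None, float("inf")
    for index, prediction in enumerate(staged_predictions):
        error = error_fn(prediction)
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


def fake_stages():
    """Stand-in for a generator yielding one prediction array per stage."""
    yield [0.0, 0.0]
    yield [0.5, 1.0]
    yield [1.0, 2.1]
    yield [1.5, 2.5]


target = [1.0, 2.0]


def mean_squared_error(prediction):
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


stage, error = best_stage(fake_stages(), mean_squared_error)
print(stage, error)
```

Because the predictions are consumed one stage at a time, only a single array of shape (n_samples,) needs to be held in memory at once.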