sksurv.linear_model.CoxnetSurvivalAnalysis
Cox’s proportional hazards model with elastic net penalty.
See the User Guide and [1] for further description.
n_alphas (int, optional, default: 100) – Number of alphas along the regularization path.
alphas (array-like or None, optional) – List of alphas at which to compute the models. If None, alphas are set automatically.
alpha_min_ratio (float or { "auto" }, optional, default: "auto") –
Determines the minimum alpha of the regularization path if alphas is None. The smallest alpha is computed as this fraction of the data-derived maximum alpha (i.e. the smallest value for which all coefficients are zero).
If set to "auto", the value depends on the sample size relative to the number of features: if n_samples > n_features, the default value is 0.0001; if n_samples <= n_features, the default value is 0.01.
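The "auto" rule and the resulting path of alphas can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `resolve_alpha_min_ratio` and `alpha_path` are hypothetical, and the log-spaced grid between the maximum and minimum alpha is an assumption borrowed from the usual glmnet-style convention.

```python
import numpy as np


def resolve_alpha_min_ratio(alpha_min_ratio, n_samples, n_features):
    """Apply the documented "auto" rule; explicit floats pass through."""
    if isinstance(alpha_min_ratio, str) and alpha_min_ratio == "auto":
        return 0.0001 if n_samples > n_features else 0.01
    return float(alpha_min_ratio)


def alpha_path(alpha_max, alpha_min_ratio, n_alphas=100):
    """Decreasing sequence from alpha_max to alpha_max * alpha_min_ratio.

    Assumption: the grid is log-spaced (geometric), which this page
    does not state explicitly.
    """
    return np.geomspace(alpha_max, alpha_max * alpha_min_ratio, num=n_alphas)


ratio = resolve_alpha_min_ratio("auto", n_samples=200, n_features=10)
path = alpha_path(alpha_max=1.0, alpha_min_ratio=ratio, n_alphas=5)
```

With more samples than features the ratio resolves to 0.0001, so the path spans four orders of magnitude below the maximum alpha.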
l1_ratio (float, optional, default: 0.5) – The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. For l1_ratio = 1 the penalty is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2, and it approaches a pure L2 penalty as l1_ratio approaches 0.
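To make the mixing concrete, the sketch below evaluates an elastic net penalty for a given coefficient vector. The function name `elastic_net_penalty` is hypothetical, and the factor 1/2 on the L2 term is an assumption taken from the glmnet convention; this page does not state the exact weighting.

```python
import numpy as np


def elastic_net_penalty(beta, alpha, l1_ratio):
    """alpha * (l1_ratio * ||beta||_1 + (1 - l1_ratio) / 2 * ||beta||_2^2).

    Assumption: glmnet-style weighting with a factor 1/2 on the L2 term.
    """
    beta = np.asarray(beta, dtype=float)
    l1_term = np.abs(beta).sum()
    l2_term = 0.5 * np.square(beta).sum()
    return alpha * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)


beta = [1.0, -2.0]
pure_l1 = elastic_net_penalty(beta, alpha=0.1, l1_ratio=1.0)  # L1 only
mixed = elastic_net_penalty(beta, alpha=0.1, l1_ratio=0.5)    # half L1, half L2
```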
penalty_factor (array-like or None, optional) – Separate penalty factors can be applied to each coefficient. This is a number that multiplies alpha to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables. Note: the penalty factors are internally rescaled to sum to n_features, and the alphas sequence will reflect this change.
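The rescaling described in the note can be illustrated with a few lines of NumPy. This is a sketch of the stated rule (factors rescaled to sum to n_features), with a hypothetical helper name; it is not taken from the library's source.

```python
import numpy as np


def rescale_penalty_factor(penalty_factor):
    """Rescale penalty factors so that they sum to n_features."""
    pf = np.asarray(penalty_factor, dtype=float)
    return pf * pf.shape[0] / pf.sum()


# The first variable is unpenalized (factor 0), so it always stays in the model.
pf = rescale_penalty_factor([0.0, 1.0, 1.0])
```

A factor of 0 stays 0 after rescaling, while the remaining factors grow so that the total equals the number of features.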
normalize (boolean, optional, default: False) – If True, the features X will be normalized before optimization by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
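The difference between normalize=True and standardization can be seen in a small sketch. The helper below is hypothetical and assumes that the l2-norm is taken over the centered columns; the guard for constant columns is also an assumption added for the example.

```python
import numpy as np


def center_and_l2_normalize(X):
    """Subtract the column means, then divide each column by its l2-norm."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    X_centered = X - mean
    norm = np.linalg.norm(X_centered, axis=0)
    norm[norm == 0] = 1.0  # assumption: leave constant columns at zero
    return X_centered / norm, mean, norm


X = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
X_norm, mean, norm = center_and_l2_normalize(X)
```

Each non-constant column ends up with unit l2-norm rather than unit variance. For a centered column the l2-norm equals sqrt(n_samples) times the population standard deviation, so the two scalings differ by that factor.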
copy_X (boolean, optional, default: True) – If True, X will be copied; else, it may be overwritten.
tol (float, optional, default: 1e-7) – The tolerance for the optimization: optimization continues until all updates are smaller than tol.
max_iter (int, optional, default: 100000) – The maximum number of iterations.
verbose (bool, optional, default: False) – Whether to print additional information during optimization.
fit_baseline_model (bool, optional, default: False) – Whether to estimate the baseline survival function and the baseline cumulative hazard function for each alpha. If enabled, predict_cumulative_hazard_function() and predict_survival_function() can be used to obtain the predicted cumulative hazard function and survival function.
alphas_
The actual sequence of alpha values used.
ndarray, shape=(n_alphas,)
alpha_min_ratio_
The inferred value of alpha_min_ratio.
float
penalty_factor_
The actual penalty factors used.
ndarray, shape=(n_features,)
coef_
Matrix of coefficients.
ndarray, shape=(n_features, n_alphas)
offset_
Bias term to account for non-centered features.
deviance_ratio_
The fraction of (null) deviance explained.
References
[1] Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths for Cox’s proportional hazards model via coordinate descent. Journal of Statistical Software. 2011 Mar;39(5):1.
__init__
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__([n_alphas, alphas, …])
Initialize self.
fit(X, y)
Fit estimator.
predict(X[, alpha])
The linear predictor of the model.
predict_cumulative_hazard_function(X[, alpha])
Predict cumulative hazard function.
predict_survival_function(X[, alpha])
Predict survival function.
score(X, y)
Returns the concordance index of the prediction.
X (array-like, shape = (n_samples, n_features)) – Data matrix
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
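A structured array of this form can be built with plain NumPy. The field names "event" and "time" and the toy values below are illustrative assumptions; only the field order (event indicator first, time second) comes from the description above.

```python
import numpy as np

# Toy data: True marks an observed event, False marks censoring.
event = np.array([True, False, True, True])
time = np.array([5.0, 12.0, 3.5, 8.0])

# First field: binary event indicator; second field: time of event or censoring.
y = np.empty(event.shape[0], dtype=[("event", bool), ("time", float)])
y["event"] = event
y["time"] = time
```

scikit-survival also provides sksurv.util.Surv.from_arrays(event, time), which should produce an equivalent structured array without spelling out the dtype by hand.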
self
X (array-like, shape = (n_samples, n_features)) – Test data for which to compute the linear predictor.
alpha (float, optional) – Constant that multiplies the penalty terms. If the same alpha was used during training, exact coefficients are used, otherwise coefficients are interpolated from the closest alpha values that were used during training. If set to None, the last alpha in the solution path is used.
T – The predicted decision function
array, shape = (n_samples,)
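The interpolation between neighbouring alphas might look like the sketch below. It assumes plain linear interpolation of each coefficient along the path, which this page does not confirm; the helper name `interpolate_coef` and the toy coefficient matrix are hypothetical.

```python
import numpy as np


def interpolate_coef(alphas, coef, alpha):
    """Linearly interpolate coefficients at a requested alpha.

    alphas: shape (n_alphas,), as stored in alphas_ (decreasing).
    coef:   shape (n_features, n_alphas), as stored in coef_.
    Assumption: linear interpolation; values outside the path are clamped.
    """
    alphas = np.asarray(alphas, dtype=float)
    coef = np.asarray(coef, dtype=float)
    order = np.argsort(alphas)  # np.interp requires increasing x values
    return np.array(
        [np.interp(alpha, alphas[order], row[order]) for row in coef]
    )


alphas = [1.0, 0.5, 0.1]
coef = [[0.0, 0.2, 0.6], [0.0, -0.1, -0.3]]
exact = interpolate_coef(alphas, coef, alpha=0.5)     # alpha on the path
between = interpolate_coef(alphas, coef, alpha=0.75)  # halfway between 1.0 and 0.5
```

The linear predictor for new data would then be the product of X with the selected or interpolated coefficient vector.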
Only available if fit() has been called with fit_baseline_model = True.
The cumulative hazard function for an individual with feature vector \(x_\alpha\) is defined as
\[H(t \mid x_\alpha) = \exp(x_\alpha^\top \beta) H_0(t),\]
where \(H_0(t)\) is the baseline cumulative hazard function, estimated by Breslow’s estimator.
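For intuition, Breslow's estimator adds, at every observed event time, the number of events divided by the sum of exp(linear predictor) over all subjects still at risk. The sketch below is a simplified NumPy illustration with hypothetical function names and toy data, not the library's implementation.

```python
import numpy as np


def breslow_cumulative_hazard(event, time, risk_score):
    """Return the unique event times and the baseline cumulative hazard there."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    exp_score = np.exp(np.asarray(risk_score, dtype=float))

    event_times = np.unique(time[event])
    increments = np.empty_like(event_times)
    for k, t in enumerate(event_times):
        n_events = np.sum(event & (time == t))  # events observed at time t
        at_risk = exp_score[time >= t].sum()    # sum over the risk set at t
        increments[k] = n_events / at_risk
    return event_times, np.cumsum(increments)


def cumulative_hazard(t, event_times, h0, linear_predictor):
    """Evaluate exp(linear_predictor) * H_0(t) as a right-continuous step function."""
    idx = np.searchsorted(event_times, t, side="right") - 1
    baseline = h0[idx] if idx >= 0 else 0.0
    return np.exp(linear_predictor) * baseline


# Toy data with all risk scores equal to zero (every exp(score) equals one).
times, h0 = breslow_cumulative_hazard(
    event=[True, False, True, True], time=[2.0, 3.0, 1.0, 4.0],
    risk_score=[0.0, 0.0, 0.0, 0.0],
)
```

With all risk scores equal to zero, the estimate reduces to the Nelson-Aalen estimator, which makes the toy result easy to check by hand.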
X (array-like, shape = (n_samples, n_features)) – Data matrix.
alpha (float, optional) – Constant that multiplies the penalty terms. The same alpha as used during training must be specified. If set to None, the last alpha in the solution path is used.
cum_hazard – Predicted cumulative hazard functions.
ndarray of sksurv.functions.StepFunction, shape = (n_samples,)
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_breast_cancer
>>> from sksurv.preprocessing import OneHotEncoder
>>> from sksurv.linear_model import CoxnetSurvivalAnalysis
Load and prepare the data.
>>> X, y = load_breast_cancer()
>>> X = OneHotEncoder().fit_transform(X)
Fit the model.
>>> estimator = CoxnetSurvivalAnalysis(l1_ratio=0.99, fit_baseline_model=True)
>>> estimator.fit(X, y)
Estimate the cumulative hazard function for one sample and the five highest alphas.
>>> chf_funcs = {}
>>> for alpha in estimator.alphas_[:5]:
...     chf_funcs[alpha] = estimator.predict_cumulative_hazard_function(
...         X.iloc[:1], alpha=alpha)
...
Plot the estimated cumulative hazard functions.
>>> for alpha, chf_alpha in chf_funcs.items():
...     for fn in chf_alpha:
...         plt.step(fn.x, fn(fn.x), where="post",
...                  label="alpha = {:.3f}".format(alpha))
...
>>> plt.ylim(0, 1)
>>> plt.legend()
>>> plt.show()
The survival function for an individual with feature vector \(x_\alpha\) is defined as
\[S(t \mid x_\alpha) = S_0(t)^{\exp(x_\alpha^\top \beta)},\]
where \(S_0(t)\) is the baseline survival function, estimated by Breslow’s estimator.
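The step from a baseline cumulative hazard to an individual survival curve can be sketched in a few lines. The sketch assumes the usual relationship \(S_0(t) = \exp(-H_0(t))\) between the baseline survival and baseline cumulative hazard functions; the function name and the toy values are hypothetical.

```python
import numpy as np


def survival_from_baseline(baseline_cum_hazard, linear_predictor):
    """Return S_0(t) ** exp(linear_predictor), with S_0(t) = exp(-H_0(t))."""
    s0 = np.exp(-np.asarray(baseline_cum_hazard, dtype=float))
    return s0 ** np.exp(linear_predictor)


h0 = np.array([0.25, 0.5])  # toy baseline cumulative hazard at two time points
baseline = survival_from_baseline(h0, linear_predictor=0.0)
high_risk = survival_from_baseline(h0, linear_predictor=np.log(2.0))
```

A linear predictor of log(2) doubles the cumulative hazard, so the survival probability at each time point is the square of the baseline value.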
survival – Predicted survival functions.
Estimate the survival function for one sample and the five highest alphas.
>>> surv_funcs = {}
>>> for alpha in estimator.alphas_[:5]:
...     surv_funcs[alpha] = estimator.predict_survival_function(
...         X.iloc[:1], alpha=alpha)
...
Plot the estimated survival functions.
>>> for alpha, surv_alpha in surv_funcs.items():
...     for fn in surv_alpha:
...         plt.step(fn.x, fn(fn.x), where="post",
...                  label="alpha = {:.3f}".format(alpha))
...
>>> plt.ylim(0, 1)
>>> plt.legend()
>>> plt.show()
X (array-like, shape = (n_samples, n_features)) – Test samples.
cindex – Estimated concordance index.
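As a rough illustration of what the concordance index measures, the sketch below counts, among comparable pairs, how often the subject with the earlier observed event also has the higher predicted risk. It is a simplified quadratic-time version with a hypothetical function name and toy data, it ignores ties in time, and it is not the routine the library uses.

```python
import numpy as np


def harrell_c_index(event, time, risk_score):
    """Simplified concordance index; ties in risk score count as one half."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    risk = np.asarray(risk_score, dtype=float)

    concordant = tied = comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable


event = [True, True, False, True]
time = [1.0, 2.0, 3.0, 4.0]
perfect = harrell_c_index(event, time, risk_score=[4.0, 3.0, 2.0, 1.0])
partial = harrell_c_index(event, time, risk_score=[4.0, 1.0, 2.0, 3.0])
```

A value of 1 means every comparable pair is ordered correctly and 0.5 corresponds to random ordering. For real evaluations, sksurv.metrics.concordance_index_censored is the more appropriate tool.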