sksurv.linear_model.CoxnetSurvivalAnalysis
- class sksurv.linear_model.CoxnetSurvivalAnalysis(*, n_alphas=100, alphas=None, alpha_min_ratio='auto', l1_ratio=0.5, penalty_factor=None, normalize=False, copy_X=True, tol=1e-07, max_iter=100000, verbose=False, fit_baseline_model=False)
Cox’s proportional hazards model with elastic net penalty.
See the User Guide and [1] for further description.
- Parameters
n_alphas (int, optional, default: 100) – Number of alphas along the regularization path.
alphas (array-like or None, optional) – List of alphas where to compute the models. If None, alphas are set automatically.
alpha_min_ratio (float or { "auto" }, optional, default: "auto") – Determines the minimum alpha of the regularization path if alphas is None. The smallest value for alpha is computed as a fraction of the data-derived maximum alpha (i.e. the smallest value for which all coefficients are zero).
If set to “auto”, the value depends on the sample size relative to the number of features. If n_samples > n_features, the default value is 0.0001. If n_samples <= n_features, the default value is 0.01.
l1_ratio (float, optional, default: 0.5) – The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. For l1_ratio = 1 the penalty is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. The limiting case l1_ratio = 0 would be a pure L2 penalty, but it lies outside the allowed range.
penalty_factor (array-like or None, optional) – Separate penalty factors can be applied to each coefficient. Each factor multiplies alpha to allow differential shrinkage. A factor can be 0 for some variables, which implies no shrinkage, so that variable is always included in the model. Default is 1 for all variables.
Note: the penalty factors are internally rescaled to sum to n_features, and the alphas sequence will reflect this change.
normalize (boolean, optional, default: False) – If True, the features X will be normalized before optimization by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
copy_X (boolean, optional, default: True) – If True, X will be copied; else, it may be overwritten.
tol (float, optional, default: 1e-7) – The tolerance for the optimization: optimization continues until all updates are smaller than tol.
max_iter (int, optional, default: 100000) – The maximum number of iterations.
verbose (bool, optional, default: False) – Whether to print additional information during optimization.
fit_baseline_model (bool, optional, default: False) – Whether to estimate the baseline survival function and baseline cumulative hazard function for each alpha. If enabled, predict_cumulative_hazard_function() and predict_survival_function() can be used to obtain the predicted cumulative hazard function and survival function.
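The interaction between alpha, l1_ratio and penalty_factor can be sketched in plain Python. This is an illustration, not the library's implementation: the function names are hypothetical, and the factor 0.5 on the L2 term follows the common glmnet-style convention, which is an assumption here because the text above does not spell out the exact penalty formula.

```python
def rescale_penalty_factors(penalty_factor):
    """Rescale the factors so that they sum to the number of features."""
    n_features = len(penalty_factor)
    total = sum(penalty_factor)
    if total <= 0:
        raise ValueError("at least one penalty factor must be positive")
    return [pf * n_features / total for pf in penalty_factor]


def elastic_net_penalty(coef, alpha, l1_ratio, penalty_factor):
    """alpha * sum_j pf_j * (l1_ratio * |b_j| + 0.5 * (1 - l1_ratio) * b_j**2).

    The 0.5 on the L2 term is an assumed (glmnet-style) convention.
    """
    if not 0 < l1_ratio <= 1:
        raise ValueError("l1_ratio must satisfy 0 < l1_ratio <= 1")
    return alpha * sum(
        pf * (l1_ratio * abs(b) + 0.5 * (1.0 - l1_ratio) * b * b)
        for b, pf in zip(coef, penalty_factor)
    )


# A factor of 0 leaves the first coefficient unpenalized.
factors = rescale_penalty_factors([0.0, 1.0, 1.0])
penalty = elastic_net_penalty(
    [2.0, -1.0, 0.5], alpha=0.1, l1_ratio=0.5, penalty_factor=factors
)
```

Because the rescaled factors sum to n_features, the two penalized coefficients each receive a factor of 1.5, while the first coefficient contributes nothing to the penalty regardless of its size.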
- alphas_
The actual sequence of alpha values used.
- Type
ndarray, shape=(n_alphas,)
- alpha_min_ratio_
The inferred value of alpha_min_ratio.
- Type
float
- penalty_factor_
The actual penalty factors used.
- Type
ndarray, shape=(n_features,)
- coef_
Matrix of coefficients.
- Type
ndarray, shape=(n_features, n_alphas)
- offset_
Bias term to account for non-centered features.
- Type
ndarray, shape=(n_alphas,)
- deviance_ratio_
The fraction of (null) deviance explained.
- Type
ndarray, shape=(n_alphas,)
- n_features_in_
Number of features seen during fit.
- Type
int
- feature_names_in_
Names of features seen during fit. Defined only when X has feature names that are all strings.
- Type
ndarray of shape (n_features_in_,)
- event_times_
Unique time points where events occurred.
- Type
array of shape = (n_event_times,)
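To make the relationship between alphas_ and alpha_min_ratio_ concrete, the following sketch builds a regularization path from a maximum alpha. It assumes a log-spaced grid, as is common for glmnet-style solvers; the spacing is not stated in this reference, the function names are hypothetical, and alpha_max is a made-up number (the estimator derives it from the data).

```python
import math


def auto_alpha_min_ratio(n_samples, n_features):
    """The "auto" rule described for alpha_min_ratio."""
    return 0.0001 if n_samples > n_features else 0.01


def alpha_path(alpha_max, alpha_min_ratio, n_alphas):
    """Decreasing, log-spaced grid (assumed spacing) from alpha_max
    down to alpha_max * alpha_min_ratio."""
    if n_alphas == 1:
        return [alpha_max]
    log_ratio = math.log(alpha_min_ratio)
    return [
        alpha_max * math.exp(log_ratio * i / (n_alphas - 1))
        for i in range(n_alphas)
    ]


ratio = auto_alpha_min_ratio(n_samples=198, n_features=80)
alphas = alpha_path(alpha_max=0.5, alpha_min_ratio=ratio, n_alphas=5)
```

The first entry is the largest alpha and the last entry is the smallest, which is consistent with the examples further down that use alphas_[:5] for the five highest alphas.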
References
[1] Simon N, Friedman J, Hastie T, Tibshirani R. Regularization paths for Cox’s proportional hazards model via coordinate descent. Journal of Statistical Software. 2011 Mar;39(5):1.
- __init__(*, n_alphas=100, alphas=None, alpha_min_ratio='auto', l1_ratio=0.5, penalty_factor=None, normalize=False, copy_X=True, tol=1e-07, max_iter=100000, verbose=False, fit_baseline_model=False)
Methods
__init__(*[, n_alphas, alphas, ...])
fit(X, y) – Fit estimator.
get_params([deep]) – Get parameters for this estimator.
predict(X[, alpha]) – The linear predictor of the model.
predict_cumulative_hazard_function(X[, ...]) – Predict cumulative hazard function.
predict_survival_function(X[, alpha, ...]) – Predict survival function.
score(X, y) – Returns the concordance index of the prediction.
set_params(**params) – Set the parameters of this estimator.
- fit(X, y)
Fit estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
- Return type
self
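The structured array expected for y can be built directly with NumPy. The sketch below is a minimal illustration; the field names "event" and "time" are hypothetical, since only the field order (event indicator first, time second) is described above.

```python
import numpy as np

# Hypothetical field names: the first field holds the binary event
# indicator, the second the time of event or time of censoring.
y = np.array(
    [(True, 12.0), (False, 30.5), (True, 7.25)],
    dtype=[("event", bool), ("time", float)],
)

events = y["event"]  # boolean array: True where the event was observed
times = y["time"]    # float array: observed or censored time
```

Each record corresponds to one sample, so y has shape (n_samples,) even though it carries two values per sample.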
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- predict(X, alpha=None)
The linear predictor of the model.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Test data for which to compute the linear predictor.
alpha (float, optional) – Constant that multiplies the penalty terms. If the same alpha was used during training, exact coefficients are used, otherwise coefficients are interpolated from the closest alpha values that were used during training. If set to None, the last alpha in the solution path is used.
- Returns
T – The predicted decision function
- Return type
array, shape = (n_samples,)
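The interpolation described for alpha can be pictured with a small pure-Python sketch. It assumes linear interpolation between the two neighboring alphas on the path, which is an assumption: the text above only says that coefficients are interpolated from the closest alpha values. The function names are hypothetical, the coefficients are stored as one list per alpha for readability (coef_ itself has shape (n_features, n_alphas)), and the offset_ term is left out.

```python
def interpolate_coef(alphas, coefs, alpha):
    """alphas: strictly decreasing path; coefs: one coefficient list per alpha."""
    # Exact match: use the fitted coefficients as they are.
    for a, c in zip(alphas, coefs):
        if a == alpha:
            return list(c)
    if not alphas[-1] < alpha < alphas[0]:
        raise ValueError("alpha lies outside the fitted path")
    # Otherwise blend the two neighboring solutions linearly (assumed scheme).
    for i in range(len(alphas) - 1):
        hi, lo = alphas[i], alphas[i + 1]
        if lo < alpha < hi:
            w = (alpha - lo) / (hi - lo)  # weight of the larger alpha
            return [
                w * c_hi + (1.0 - w) * c_lo
                for c_hi, c_lo in zip(coefs[i], coefs[i + 1])
            ]


def linear_predictor(x, coef):
    """Dot product of one feature vector with the coefficient vector."""
    return sum(xi * bi for xi, bi in zip(x, coef))


path_alphas = [1.0, 0.5, 0.25]
path_coefs = [[0.0, 0.0], [0.2, 0.0], [0.4, -0.1]]  # made-up values
coef = interpolate_coef(path_alphas, path_coefs, alpha=0.375)
risk = linear_predictor([1.0, 2.0], coef)
```

Here 0.375 lies halfway between 0.5 and 0.25, so the interpolated coefficients are the average of the two neighboring solutions.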
- predict_cumulative_hazard_function(X, alpha=None, return_array=False)
Predict cumulative hazard function.
Only available if fit() has been called on an estimator with fit_baseline_model = True.
The cumulative hazard function for an individual with feature vector \(x_\alpha\) is defined as
\[H(t \mid x_\alpha) = \exp(x_\alpha^\top \beta) H_0(t) ,\]
where \(H_0(t)\) is the cumulative baseline hazard function, estimated by Breslow’s estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
alpha (float, optional) – Constant that multiplies the penalty terms. The same alpha as used during training must be specified. If set to None, the last alpha in the solution path is used.
return_array (boolean, default: False) – If set, return an array with the cumulative hazard rate for each self.event_times_, otherwise an array of sksurv.functions.StepFunction.
- Returns
cum_hazard – If return_array is set, an array with the cumulative hazard rate for each self.event_times_, otherwise an array of length n_samples of sksurv.functions.StepFunction instances will be returned.
- Return type
ndarray
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_breast_cancer
>>> from sksurv.preprocessing import OneHotEncoder
>>> from sksurv.linear_model import CoxnetSurvivalAnalysis
Load and prepare the data.
>>> X, y = load_breast_cancer()
>>> X = OneHotEncoder().fit_transform(X)
Fit the model.
>>> estimator = CoxnetSurvivalAnalysis(l1_ratio=0.99, fit_baseline_model=True)
>>> estimator.fit(X, y)
Estimate the cumulative hazard function for one sample and the five highest alphas.
>>> chf_funcs = {}
>>> for alpha in estimator.alphas_[:5]:
...     chf_funcs[alpha] = estimator.predict_cumulative_hazard_function(
...         X.iloc[:1], alpha=alpha)
...
Plot the estimated cumulative hazard functions.
>>> for alpha, chf_alpha in chf_funcs.items():
...     for fn in chf_alpha:
...         plt.step(fn.x, fn(fn.x), where="post",
...                  label="alpha = {:.3f}".format(alpha))
...
>>> plt.ylim(0, 1)
>>> plt.legend()
>>> plt.show()
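For readers who want to see what Breslow’s estimator computes, here is a minimal pure-Python sketch. It is a textbook version written for clarity, not the library’s implementation; the function names and the toy data are hypothetical.

```python
import math


def breslow_cumulative_baseline_hazard(times, events, risk_scores):
    """Return (event_times, H0), with H0 evaluated at each unique event time."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative, h0 = 0.0, []
    for t in event_times:
        # Number of events observed exactly at time t.
        n_events = sum(1 for tj, ej in zip(times, events) if ej and tj == t)
        # Risk set: every sample still under observation at time t.
        at_risk = sum(math.exp(r) for tj, r in zip(times, risk_scores) if tj >= t)
        cumulative += n_events / at_risk
        h0.append(cumulative)
    return event_times, h0


def cumulative_hazard(h0, risk_score):
    """H(t | x) = exp(x' beta) * H0(t), evaluated at each event time."""
    return [math.exp(risk_score) * h for h in h0]


times = [2.0, 3.0, 3.0, 5.0]
events = [True, True, False, True]
risk_scores = [0.0, 0.0, 0.0, 0.0]  # x' beta for each training sample
event_times, h0 = breslow_cumulative_baseline_hazard(times, events, risk_scores)
chf = cumulative_hazard(h0, risk_score=math.log(2.0))
```

With all risk scores equal to zero, the baseline estimate reduces to the Nelson-Aalen estimator, and a sample with risk score log(2) has twice the baseline cumulative hazard at every event time.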
- predict_survival_function(X, alpha=None, return_array=False)
Predict survival function.
Only available if fit() has been called on an estimator with fit_baseline_model = True.
The survival function for an individual with feature vector \(x_\alpha\) is defined as
\[S(t \mid x_\alpha) = S_0(t)^{\exp(x_\alpha^\top \beta)} ,\]
where \(S_0(t)\) is the baseline survival function, estimated by Breslow’s estimator.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Data matrix.
alpha (float, optional) – Constant that multiplies the penalty terms. The same alpha as used during training must be specified. If set to None, the last alpha in the solution path is used.
return_array (boolean, default: False) – If set, return an array with the probability of survival for each self.event_times_, otherwise an array of sksurv.functions.StepFunction.
- Returns
survival – If return_array is set, an array with the probability of survival for each self.event_times_, otherwise an array of length n_samples of sksurv.functions.StepFunction instances will be returned.
- Return type
ndarray
Examples
>>> import matplotlib.pyplot as plt
>>> from sksurv.datasets import load_breast_cancer
>>> from sksurv.preprocessing import OneHotEncoder
>>> from sksurv.linear_model import CoxnetSurvivalAnalysis
Load and prepare the data.
>>> X, y = load_breast_cancer()
>>> X = OneHotEncoder().fit_transform(X)
Fit the model.
>>> estimator = CoxnetSurvivalAnalysis(l1_ratio=0.99, fit_baseline_model=True)
>>> estimator.fit(X, y)
Estimate the survival function for one sample and the five highest alphas.
>>> surv_funcs = {}
>>> for alpha in estimator.alphas_[:5]:
...     surv_funcs[alpha] = estimator.predict_survival_function(
...         X.iloc[:1], alpha=alpha)
...
Plot the estimated survival functions.
>>> for alpha, surv_alpha in surv_funcs.items():
...     for fn in surv_alpha:
...         plt.step(fn.x, fn(fn.x), where="post",
...                  label="alpha = {:.3f}".format(alpha))
...
>>> plt.ylim(0, 1)
>>> plt.legend()
>>> plt.show()
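The formula for the survival function, and the step-wise shape that the plot above draws with where="post", can be reproduced in a few lines of plain Python. This is a sketch under stated assumptions: the function names are hypothetical, the baseline survival values are made up, and returning 1.0 before the first event time is a choice of this sketch, not documented behavior of sksurv.functions.StepFunction.

```python
import bisect
import math


def survival_values(baseline_survival, risk_score):
    """S(t | x) = S0(t) ** exp(x' beta), evaluated at each event time."""
    factor = math.exp(risk_score)
    return [s0 ** factor for s0 in baseline_survival]


def evaluate_step(event_times, values, t, value_before_first=1.0):
    """Right-continuous step function: the value at the last event time <= t."""
    idx = bisect.bisect_right(event_times, t)
    return value_before_first if idx == 0 else values[idx - 1]


event_times = [2.0, 3.0, 5.0]
baseline_survival = [0.9, 0.7, 0.4]  # made-up values of S0 at the event times
surv = survival_values(baseline_survival, risk_score=math.log(2.0))
s_at_4 = evaluate_step(event_times, surv, 4.0)
```

A risk score of log(2) squares the baseline survival probabilities, so the individual’s curve drops faster than the baseline while changing value at the same event times.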
- score(X, y)
Returns the concordance index of the prediction.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
- Returns
cindex – Estimated concordance index.
- Return type
float
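As a rough illustration of what a concordance index measures, the sketch below implements a simplified Harrell-style estimate in pure Python. It is not the library’s implementation: the function name is hypothetical, ties in the predicted risk count as half-concordant, and pairs with tied times are simply ignored.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly."""
    concordant, tied, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1  # shorter survival, higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable


times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risk_scores = [3.0, 1.0, 2.0, 0.5]
cindex = concordance_index(times, events, risk_scores)
```

In this toy data five pairs are comparable and four of them are ordered correctly, so the index is 0.8; a value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.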
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
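The <component>__<parameter> naming scheme can be illustrated with a toy helper that splits such a name at the first double underscore. This is a hypothetical sketch for intuition only, not scikit-learn’s code, and the step name "coxnet" is made up.

```python
def split_param_name(name):
    """Split "component__parameter" into its two parts.

    Returns (None, name) for a plain parameter without a component prefix.
    """
    component, sep, parameter = name.partition("__")
    if not sep:
        return None, name
    return component, parameter


nested = split_param_name("coxnet__l1_ratio")  # parameter of a nested step
plain = split_param_name("tol")                # parameter of the estimator itself
```

In a pipeline whose step is named "coxnet", a call such as set_params(coxnet__l1_ratio=0.9) would therefore address the l1_ratio parameter of that step.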