sksurv.svm.FastSurvivalSVM
- class sksurv.svm.FastSurvivalSVM(alpha=1, *, rank_ratio=1.0, fit_intercept=False, max_iter=20, verbose=False, tol=None, optimizer=None, random_state=None, timeit=False)
Implements an efficient linear Support Vector Machine for survival analysis, capable of optimizing both ranking and regression objectives.
Training data consists of n triplets \((\mathbf{x}_i, y_i, \delta_i)\), where \(\mathbf{x}_i\) is a d-dimensional feature vector, \(y_i > 0\) the survival time or time of censoring, and \(\delta_i \in \{0,1\}\) the binary event indicator. Using the training data, the objective is to minimize the following function:
\[ \arg \min_{\mathbf{w}, b} \frac{1}{2} \mathbf{w}^\top \mathbf{w} + \frac{\alpha}{2} \left[ r \sum_{(i,j) \in \mathcal{P}} \max(0, 1 - (\mathbf{w}^\top \mathbf{x}_i - \mathbf{w}^\top \mathbf{x}_j))^2 + (1 - r) \sum_{i=1}^n \left( \zeta_{\mathbf{w}, b} (y_i, \mathbf{x}_i, \delta_i) \right)^2 \right] \]
\[ \zeta_{\mathbf{w},b} (y_i, \mathbf{x}_i, \delta_i) = \begin{cases} \max(0, y_i - \mathbf{w}^\top \mathbf{x}_i - b) & \text{if } \delta_i = 0, \\ y_i - \mathbf{w}^\top \mathbf{x}_i - b & \text{if } \delta_i = 1, \end{cases} \]
\[ \mathcal{P} = \{ (i, j) \mid y_i > y_j \land \delta_j = 1 \}_{i,j=1,\dots,n} \]
The hyper-parameter \(\alpha > 0\) determines the amount of regularization: a smaller value increases regularization, and a larger value reduces it. The hyper-parameter \(r \in [0, 1]\) (the rank_ratio parameter) determines the trade-off between the ranking objective and the regression objective. If \(r = 1\), the problem reduces to the ranking objective; if \(r = 0\), it reduces to the regression objective. If the regression objective is used, survival/censoring times are log-transformed and therefore must be strictly positive.
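To make the objective concrete, the following NumPy sketch evaluates it for a fixed \(\mathbf{w}\) and \(b\) on toy data. It only transcribes the formula term by term; it is not the optimizer FastSurvivalSVM uses. The function name survival_svm_objective is hypothetical, and it assumes strictly positive times, which it log-transforms before computing both terms (this leaves the ranking pairs in \(\mathcal{P}\) unchanged because the logarithm is monotone).

```python
import numpy as np


def survival_svm_objective(w, b, X, time, event, alpha=1.0, r=1.0):
    """Evaluate the survival SVM objective for fixed w and b (illustrative sketch).

    Assumes strictly positive times, which are log-transformed first.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    event = np.asarray(event, dtype=bool)
    # log is monotone, so the set of ranking pairs is unchanged
    y = np.log(np.asarray(time, dtype=float))
    pred = X @ w

    # Ranking term over P = {(i, j) : y_i > y_j and delta_j = 1}
    rank_loss = 0.0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] > y[j] and event[j]:
                rank_loss += max(0.0, 1.0 - (pred[i] - pred[j])) ** 2

    # Regression term: events use the plain residual; censored samples are
    # penalized only when the prediction falls below the censoring time.
    resid = y - pred - b
    zeta = np.where(event, resid, np.maximum(0.0, resid))
    reg_loss = float(np.sum(zeta ** 2))

    return 0.5 * float(w @ w) + 0.5 * alpha * (r * rank_loss + (1.0 - r) * reg_loss)
```

With \(\mathbf{w} = 0\), every pair in \(\mathcal{P}\) contributes exactly 1 to the ranking term, so the ranking-only objective equals \(\tfrac{\alpha}{2} |\mathcal{P}|\); a weight vector that separates all pairs by a margin of at least 1 leaves only the regularizer \(\tfrac{1}{2}\mathbf{w}^\top\mathbf{w}\).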
See the User Guide and [1] for further description.
- Parameters:
alpha (float, default: 1) – Weight of penalizing the squared hinge loss in the objective function. Must be greater than 0.
rank_ratio (float, optional, default: 1.0) – Mixing parameter between regression and ranking objectives, with 0 <= rank_ratio <= 1. If rank_ratio = 1, only ranking is performed. If rank_ratio = 0, only regression is performed. A rank_ratio less than 1.0 (i.e., including a regression objective) is only supported if the optimizer is 'avltree', 'rbtree', or 'direct-count'.
fit_intercept (bool, optional, default: False) – Whether to calculate an intercept for the regression model. If set to False, no intercept will be calculated. This parameter has no effect if rank_ratio = 1, i.e., only ranking is performed.
max_iter (int, optional, default: 20) – Maximum number of iterations to perform in Newton optimization.
verbose (bool, optional, default: False) – If True, print messages during optimization.
tol (float or None, optional, default: None) – Tolerance for termination. If None, the solver's default tolerance is used. See scipy.optimize.minimize().
optimizer ({'avltree', 'direct-count', 'PRSVM', 'rbtree', 'simple'}, optional, default: 'avltree') – Specifies which optimizer to use.
random_state (int, numpy.random.RandomState instance, or None, optional, default: None) – Used to resolve ties in survival times. Pass an int for reproducible output across multiple fit() calls.
timeit (bool, int, or None, optional, default: False) – If True or a non-zero integer, the time taken for optimization is measured. If an integer is provided, the optimization is repeated that many times. Results can be accessed from the optimizer_result_ attribute.
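The documented constraints linking alpha, rank_ratio, and optimizer can be summarized as a small validation routine. This is a hypothetical helper written for illustration, not part of scikit-survival; it assumes that optimizer=None resolves to the documented default 'avltree'.

```python
# Optimizers documented as supporting rank_ratio < 1 (a regression component)
REGRESSION_CAPABLE = {"avltree", "rbtree", "direct-count"}
# Full set of documented optimizer names
ALL_OPTIMIZERS = REGRESSION_CAPABLE | {"PRSVM", "simple"}


def check_survival_svm_params(alpha=1.0, rank_ratio=1.0, optimizer=None):
    """Hypothetical validator mirroring the documented parameter constraints.

    Returns the resolved optimizer name or raises ValueError.
    """
    if optimizer is None:
        optimizer = "avltree"  # assumption: None selects the documented default
    if not alpha > 0:
        raise ValueError("alpha must be greater than 0")
    if not 0.0 <= rank_ratio <= 1.0:
        raise ValueError("rank_ratio must satisfy 0 <= rank_ratio <= 1")
    if optimizer not in ALL_OPTIMIZERS:
        raise ValueError(f"unknown optimizer: {optimizer!r}")
    if rank_ratio < 1.0 and optimizer not in REGRESSION_CAPABLE:
        raise ValueError(
            f"rank_ratio < 1 requires one of {sorted(REGRESSION_CAPABLE)}, got {optimizer!r}"
        )
    return optimizer
```

For example, check_survival_svm_params(rank_ratio=0.5, optimizer="rbtree") passes, while the same rank_ratio with optimizer="PRSVM" is rejected, matching the restriction stated for rank_ratio.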
- coef_
Coefficients of the features in the decision function.
- Type:
ndarray, shape = (n_features,), dtype = float
- optimizer_result_
Stats returned by the optimizer. See scipy.optimize.OptimizeResult.
- n_features_in_
Number of features seen during fit.
- Type:
int
- feature_names_in_
Names of features seen during fit. Defined only when X has feature names that are all strings.
- Type:
ndarray, shape = (n_features_in_,), dtype = object
- n_iter_
Number of iterations run by the optimization routine to fit the model.
- Type:
int
See also
FastKernelSurvivalSVM – Fast implementation for arbitrary kernel functions.
References
- __init__(alpha=1, *, rank_ratio=1.0, fit_intercept=False, max_iter=20, verbose=False, tol=None, optimizer=None, random_state=None, timeit=False)
Methods
__init__([alpha, rank_ratio, fit_intercept, ...])
fit(X, y) – Build a survival support vector machine model from training data.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
predict(X) – Predict risk scores or transformed survival times.
score(X, y) – Returns the concordance index of the prediction.
set_params(**params) – Set the parameters of this estimator.
- fit(X, y)
Build a survival support vector machine model from training data.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Data matrix.
y (structured array, shape = (n_samples,)) – A structured array with two fields. The first field is a boolean where True indicates an event and False indicates right-censoring. The second field is a float with the time of event or time of censoring.
- Return type:
self
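The target y can be assembled with plain NumPy. The sketch below packs event indicators and times into a two-field structured array in the documented order (boolean event first, float time second). The helper name make_survival_target and the field names "event" and "time" are illustrative assumptions; scikit-survival also provides sksurv.util.Surv.from_arrays for building such arrays.

```python
import numpy as np


def make_survival_target(event, time):
    """Hypothetical helper: pack event indicators and times for fit(X, y)."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    if event.ndim != 1 or event.shape != time.shape:
        raise ValueError("event and time must be 1-D arrays of equal length")
    # Field names are illustrative; the documented contract is the field order:
    # boolean event indicator first, float time of event/censoring second.
    y = np.empty(event.shape[0], dtype=[("event", bool), ("time", float)])
    y["event"] = event
    y["time"] = time
    return y
```

Remember that if rank_ratio < 1, all times must be strictly positive because they are log-transformed.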
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- predict(X)
Predict risk scores or transformed survival times.
If the model has been fit only considering the ranking objective (rank_ratio = 1), predictions are risk scores (i.e. higher values indicate an increased risk of experiencing an event). The scores have no unit and are only meaningful to rank samples by their risk of experiencing an event.
If the regression objective has been used (rank_ratio < 1), predictions are transformed survival times. Lower scores indicate shorter survival, higher scores longer survival.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – The input samples.
- Returns:
y – Risk scores (if rank_ratio = 1), or transformed survival times (if rank_ratio < 1).
- Return type:
ndarray, shape = (n_samples,), dtype=float
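The sign convention above follows from the objective: the ranking loss pushes \(\mathbf{w}^\top \mathbf{x}_i\) above \(\mathbf{w}^\top \mathbf{x}_j\) when sample i survives longer, so a large linear predictor means long survival, and turning it into a risk score requires flipping the sign. The sketch below shows this, assuming a fitted coefficient vector like coef_ and an optional intercept. The function linear_survival_scores is hypothetical; whether the library further rescales the regression output (for example, back from log time) is not stated here, so the sketch returns the raw linear predictor in that case.

```python
import numpy as np


def linear_survival_scores(X, coef, intercept=0.0, rank_ratio=1.0):
    """Hypothetical sketch of turning linear coefficients into predictions."""
    X = np.asarray(X, dtype=float)
    lin = X @ np.asarray(coef, dtype=float) + intercept
    if rank_ratio == 1.0:
        # Ranking objective: a larger linear predictor means longer survival,
        # so negate it to obtain risk scores (higher = riskier).
        return -lin
    # Regression objective: the linear predictor targets log survival time,
    # so larger values mean longer survival (assumption: no further rescaling).
    return lin
```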
- score(X, y)
Returns the concordance index of the prediction.
- Parameters:
X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.
- Returns:
cindex – Estimated concordance index.
- Return type:
float
See also
sksurv.metrics.concordance_index_censored – Computes the concordance index.
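For intuition about what score() reports, here is a naive quadratic-time version of Harrell's concordance index computed from risk scores. It is a sketch, not sksurv.metrics.concordance_index_censored: pairs with equal observed times are simply treated as not comparable, and the library's handling of such ties may differ. The function name harrell_cindex is hypothetical.

```python
import numpy as np


def harrell_cindex(event, time, risk):
    """Naive O(n^2) Harrell's concordance index on risk scores (sketch)."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[j] > time[i]:  # j outlived i, so i should have the higher risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is reversed.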
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance