sksurv.svm.FastSurvivalSVM#

class sksurv.svm.FastSurvivalSVM(alpha=1, *, rank_ratio=1.0, fit_intercept=False, max_iter=20, verbose=False, tol=None, optimizer=None, random_state=None, timeit=False)[source]#

Implements an efficient linear Support Vector Machine for survival analysis, capable of optimizing both ranking and regression objectives.

Training data consists of n triplets $(\mathbf{x}_i, y_i, \delta_i)$, where $\mathbf{x}_i$ is a d-dimensional feature vector, $y_i > 0$ the survival time or time of censoring, and $\delta_i \in \{0,1\}$ the binary event indicator. Using the training data, the objective is to minimize the following function:

\[ \begin{aligned} &\arg \min_{\mathbf{w}, b} \frac{1}{2} \mathbf{w}^\top \mathbf{w} + \frac{\alpha}{2} \left[ r \sum_{(i,j) \in \mathcal{P}} \max(0, 1 - (\mathbf{w}^\top \mathbf{x}_i - \mathbf{w}^\top \mathbf{x}_j))^2 + (1 - r) \sum_{i=1}^n \left( \zeta_{\mathbf{w}, b} (y_i, \mathbf{x}_i, \delta_i) \right)^2 \right] \\ &\zeta_{\mathbf{w},b} (y_i, \mathbf{x}_i, \delta_i) = \begin{cases} \max(0, y_i - \mathbf{w}^\top \mathbf{x}_i - b) & \text{if $\delta_i = 0$,} \\ y_i - \mathbf{w}^\top \mathbf{x}_i - b & \text{if $\delta_i = 1$,} \end{cases} \\ &\mathcal{P} = \{ (i, j) \mid y_i > y_j \land \delta_j = 1 \}_{i,j=1,\dots,n} \end{aligned} \]

The hyper-parameter $\alpha > 0$ determines the amount of regularization to apply: a smaller value increases the amount of regularization and a larger value reduces it. The hyper-parameter $r \in [0, 1]$ determines the trade-off between the ranking objective and the regression objective. If $r = 1$, the objective reduces to the ranking objective, and if $r = 0$, to the regression objective. If the regression objective is used, survival/censoring times are log-transformed and thus cannot be zero or negative.
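To make the roles of $\alpha$ and $r$ concrete, the following is a minimal brute-force sketch that evaluates the objective above for a fixed $\mathbf{w}$ and $b$ with NumPy. The function name `survival_svm_objective` and its arguments are hypothetical, chosen for illustration; this is not the library's implementation, which relies on specialized optimizers. The sketch uses `y` exactly as passed, so to mimic the regression case described above one would pass log-transformed times.

```python
import numpy as np


def survival_svm_objective(w, b, X, y, event, alpha=1.0, rank_ratio=1.0):
    """Evaluate the objective by brute force, enumerating all pairs (O(n^2))."""
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    event = np.asarray(event, dtype=bool)
    pred = X @ w  # w^T x_i for every sample

    # Ranking term: pairs (i, j) in P, i.e. y_i > y_j and delta_j = 1.
    in_p = (y[:, None] > y[None, :]) & event[None, :]
    margins = 1.0 - (pred[:, None] - pred[None, :])
    rank_loss = np.sum(np.maximum(0.0, margins[in_p]) ** 2)

    # Regression term: zeta is one-sided for censored samples (delta_i = 0).
    resid = y - pred - b
    zeta = np.where(event, resid, np.maximum(0.0, resid))
    regr_loss = np.sum(zeta ** 2)

    penalty = 0.5 * float(w @ w)
    return penalty + 0.5 * alpha * (
        rank_ratio * rank_loss + (1.0 - rank_ratio) * regr_loss
    )


# Toy data: three samples, one feature; the second sample is censored.
X_toy = [[0.0], [1.0], [2.0]]
y_toy = [1.0, 2.0, 3.0]
event_toy = [True, False, True]

val_rank = survival_svm_objective([0.5], 0.0, X_toy, y_toy, event_toy, rank_ratio=1.0)
val_regr = survival_svm_objective([0.5], 0.0, X_toy, y_toy, event_toy, rank_ratio=0.0)
print(val_rank, val_regr)
```

With `rank_ratio=1.0` only the pairwise squared hinge term contributes, and with `rank_ratio=0.0` only the squared residuals do. Intermediate values blend the two.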

See the User Guide and [1] for further description.

Parameters:

alpha (float, default: 1) – Weight of the squared hinge loss penalty in the objective function. Must be greater than 0.
rank_ratio (float, optional, default: 1.0) – Mixing parameter between regression and ranking objectives, with 0 <= rank_ratio <= 1. If rank_ratio = 1, only ranking is performed. If rank_ratio = 0, only regression is performed. A rank_ratio less than 1.0 (i.e., including a regression objective) is only supported if the optimizer is ‘avltree’, ‘rbtree’, or ‘direct-count’.
fit_intercept (bool, optional, default: False) – Whether to calculate an intercept for the regression model. If set to False, no intercept will be calculated. This parameter has no effect if rank_ratio = 1, i.e., only ranking is performed.
max_iter (int, optional, default: 20) – Maximum number of iterations to perform in Newton optimization.
verbose (bool, optional, default: False) – If True, print messages during optimization.
tol (float or None, optional, default: None) – Tolerance for termination. If None, the solver’s default tolerance is used. See scipy.optimize.minimize().
optimizer ({'avltree', 'direct-count', 'PRSVM', 'rbtree', 'simple'}, optional, default: 'avltree') – Specifies which optimizer to use. If None (the value in the signature), 'avltree' is used.
random_state (int, numpy.random.RandomState instance, or None, optional, default: None) – Used to resolve ties in survival times. Pass an int for reproducible output across multiple fit() calls.
timeit (bool, int, or None, optional, default: False) – If True or a non-zero integer, the time taken for optimization is measured. If an integer is provided, the optimization is repeated that many times. Results can be accessed from the optimizer_result_ attribute.

coef_#

Coefficients of the features in the decision function.

Type: ndarray, shape = (n_features,), dtype = float

optimizer_result_#

Stats returned by the optimizer. See scipy.optimize.OptimizeResult.

Type: scipy.optimize.OptimizeResult

n_features_in_#

Number of features seen during fit.

Type: int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type: ndarray, shape = (n_features_in_,), dtype = object

n_iter_#

Number of iterations run by the optimization routine to fit the model.

Type: int

See also

FastKernelSurvivalSVM: Fast implementation for arbitrary kernel functions.

References

__init__(alpha=1, *, rank_ratio=1.0, fit_intercept=False, max_iter=20, verbose=False, tol=None, optimizer=None, random_state=None, timeit=False)[source]#

Methods

`__init__`([alpha, rank_ratio, fit_intercept, ...])
`fit`(X, y)	Build a survival support vector machine model from training data.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict risk scores or transformed survival times.
`score`(X, y)	Returns the concordance index of the prediction.
`set_params`(**params)	Set the parameters of this estimator.

Attributes

n_iter_

fit(X, y)[source]#

Build a survival support vector machine model from training data.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Data matrix.
y (structured array, shape = (n_samples,)) – A structured array with two fields. The first field is a boolean where True indicates an event and False indicates right-censoring. The second field is a float with the time of event or time of censoring.

Return type:

self
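The structured array expected for `y` can be built directly with NumPy, as in this minimal sketch. The field names `"event"` and `"time"` are illustrative assumptions; the description above only fixes the order and types of the two fields. scikit-survival also provides `sksurv.util.Surv.from_arrays` as a convenience for the same purpose.

```python
import numpy as np

event = [True, False, True, True]  # False marks a right-censored sample
time = [12.0, 30.5, 7.2, 18.0]     # time of event or time of censoring

# First field: boolean event indicator; second field: float time.
y = np.empty(len(time), dtype=[("event", bool), ("time", float)])
y["event"] = event
y["time"] = time

print(y.dtype.names)
print(y[1])
```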

get_metadata_routing()#

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

predict(X)[source]#

Predict risk scores or transformed survival times.

If the model has been fit only considering the ranking objective (rank_ratio = 1), predictions are risk scores (i.e. higher values indicate an increased risk of experiencing an event). The scores have no unit and are only meaningful to rank samples by their risk of experiencing an event.

If the regression objective has been used (rank_ratio < 1), predictions are transformed survival times. Lower scores indicate shorter survival, higher scores longer survival.

Parameters: X (array-like, shape = (n_samples, n_features)) – The input samples.
Returns: y – Risk scores (if rank_ratio = 1), or transformed survival times (if rank_ratio < 1).
Return type: ndarray, shape = (n_samples,), dtype=float

score(X, y)[source]#

Returns the concordance index of the prediction.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

Returns:

cindex – Estimated concordance index.

Return type:

float
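As a rough illustration of what a concordance index measures, the sketch below computes Harrell's c for risk scores in plain Python. The function `concordance_index` is a simplified, hypothetical helper, not the routine the library calls; scikit-survival's own implementation (see `sksurv.metrics.concordance_index_censored`) may treat tied survival times differently.

```python
def concordance_index(event, time, risk):
    """Harrell's c for risk scores: higher risk should mean shorter survival."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has the higher risk score
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


event = [True, True, False, True]
time = [2.0, 4.0, 5.0, 6.0]
risk = [0.9, 0.3, 0.5, 0.1]
cindex = concordance_index(event, time, risk)
print(cindex)  # 0.8: four of the five comparable pairs are ordered correctly
```

A value of 1.0 means every comparable pair is ranked correctly, while 0.5 corresponds to random ordering.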