sksurv.svm.NaiveSurvivalSVM

class sksurv.svm.NaiveSurvivalSVM(penalty='l2', loss='squared_hinge', dual=False, tol=0.0001, alpha=1.0, verbose=0, random_state=None, max_iter=1000)

Naive version of linear Survival Support Vector Machine.

Uses a regular linear support vector classifier (liblinear). A new set of samples is created by taking the difference between any two feature vectors in the original data; this version therefore requires O(n_samples^2) space.

See sksurv.svm.HingeLossSurvivalSVM for the kernel version of the naive survival SVM.

\[ \begin{aligned} \min_{\mathbf{w}}\quad & \frac{1}{2} \lVert \mathbf{w} \rVert_2^2 + \gamma \sum_{(i, j) \in \mathcal{P}} \xi_{ij} \\ \text{subject to}\quad & \mathbf{w}^\top \mathbf{x}_i - \mathbf{w}^\top \mathbf{x}_j \geq 1 - \xi_{ij},\quad \forall (i, j) \in \mathcal{P}, \\ & \xi_{ij} \geq 0,\quad \forall (i, j) \in \mathcal{P}, \\ \text{where}\quad & \mathcal{P} = \{ (i, j) \mid y_i > y_j \land \delta_j = 1 \}_{i,j=1,\dots,n}. \end{aligned} \]

See [1], [2] for further description.
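The pair set P and the difference vectors described above can be illustrated with a short sketch. This is not the library's internal code; the helper name `pairwise_differences` and the toy data are assumptions made for illustration, and the actual implementation may build or label the pairs differently.

```python
import numpy as np


def pairwise_differences(X, event, time):
    """Hypothetical helper: build P = {(i, j) | time[i] > time[j] and event[j]}
    and the difference vectors x_i - x_j for every pair in P."""
    X = np.asarray(X, dtype=float)
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    n = len(time)
    # Sample j must have an observed event and a shorter time than sample i.
    pairs = [
        (i, j)
        for i in range(n)
        for j in range(n)
        if time[i] > time[j] and event[j]
    ]
    if not pairs:
        return pairs, np.empty((0, X.shape[1]))
    idx_i, idx_j = zip(*pairs)
    diffs = X[list(idx_i)] - X[list(idx_j)]
    return pairs, diffs


# Toy data (assumed): three samples, two features.
X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
event = np.array([True, False, True])
time = np.array([2.0, 5.0, 4.0])

pairs, diffs = pairwise_differences(X, event, time)
```

The number of pairs can grow quadratically with the number of samples, which is where the O(n_samples^2) space requirement comes from.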

Parameters
  • alpha (float, positive, default: 1.0) – Weight of penalizing the loss (hinge or squared hinge, see loss) in the objective function.

  • loss (string, 'hinge' or 'squared_hinge', default: 'squared_hinge') – Specifies the loss function. ‘hinge’ is the standard SVM loss (used e.g. by the SVC class) while ‘squared_hinge’ is the square of the hinge loss.

  • penalty ('l1' | 'l2', default: 'l2') – Specifies the norm used in the penalization. The ‘l2’ penalty is the standard used in SVC. The ‘l1’ penalty leads to sparse coef_ vectors.

  • dual (bool, default: False) – Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.

  • tol (float, optional, default: 1e-4) – Tolerance for stopping criteria.

  • verbose (int, default: 0) – Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.

  • random_state (int seed, RandomState instance, or None, default: None) – The seed of the pseudo random number generator to use when shuffling the data.

  • max_iter (int, default: 1000) – The maximum number of iterations to be run.
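To make the two options for the loss parameter concrete, here is a minimal sketch of both functions evaluated on a pairwise margin, i.e. the value of w^T x_i - w^T x_j for one pair. The function names and the sample margins are assumptions chosen for illustration.

```python
def hinge(margin):
    """Standard hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def squared_hinge(margin):
    """Square of the hinge loss: penalizes large violations more strongly."""
    return hinge(margin) ** 2


# Assumed example margins for four pairs.
margins = [2.0, 1.0, 0.5, -1.0]
hinge_values = [hinge(m) for m in margins]
squared_values = [squared_hinge(m) for m in margins]
```

For a margin of -1.0 the hinge loss is 2.0 while the squared hinge loss is 4.0, so the squared variant weighs badly ordered pairs more heavily.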

See also

sksurv.svm.FastSurvivalSVM

Alternative implementation with reduced time complexity for training.

References

[1] Van Belle, V., Pelckmans, K., Suykens, J. A., & Van Huffel, S., “Support Vector Machines for Survival Analysis”, In Proc. of the 3rd Int. Conf. on Computational Intelligence in Medicine and Healthcare (CIMED), 1-8, 2007.

[2] Evers, L., Messow, C.M., “Sparse kernel methods for high-dimensional survival data”, Bioinformatics 24(14), 1632-8, 2008.

__init__(penalty='l2', loss='squared_hinge', dual=False, tol=0.0001, alpha=1.0, verbose=0, random_state=None, max_iter=1000)

Methods

__init__([penalty, loss, dual, tol, alpha, ...])

decision_function(X)

Predict confidence scores for samples.

densify()

Convert coefficient matrix to dense array format.

fit(X, y[, sample_weight])

Build a survival support vector machine model from training data.

get_params([deep])

Get parameters for this estimator.

predict(X)

Rank samples according to survival times.

score(X, y)

Returns the concordance index of the prediction.

set_params(**params)

Set the parameters of this estimator.

sparsify()

Convert coefficient matrix to sparse format.

decision_function(X)

Predict confidence scores for samples.

The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data matrix for which we want to get the confidence scores.

Returns

scores – Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.

Return type

ndarray of shape (n_samples,) or (n_samples, n_classes)

densify()

Convert coefficient matrix to dense array format.

Converts the coef_ member (back) to a numpy.ndarray. This is the default format of coef_ and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns

Fitted estimator.

Return type

self

fit(X, y, sample_weight=None)

Build a survival support vector machine model from training data.

Parameters
  • X (array-like, shape = (n_samples, n_features)) – Data matrix.

  • y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

  • sample_weight (array-like, shape = (n_samples,), optional) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Return type

self
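The structured array expected for y can be built with plain NumPy. The sketch below assumes the field names "event" and "time"; the description above only fixes the order of the fields (event indicator first, time second), so the names are illustrative.

```python
import numpy as np

# Assumed toy data: event indicator and observed time for three samples.
event = [True, False, True]
time = [2.0, 5.0, 4.0]

# One record per sample: (event indicator, time of event or censoring).
y = np.array(
    list(zip(event, time)),
    dtype=[("event", bool), ("time", float)],
)
```

Here y has shape (n_samples,), and y["event"] and y["time"] give access to the two fields.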

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

predict(X)

Rank samples according to survival times.

Lower ranks indicate shorter survival, higher ranks longer survival.

Parameters

X (array-like, shape = (n_samples, n_features)) – The input samples.

Returns

y – Predicted ranks.

Return type

ndarray, shape = (n_samples,)

score(X, y)

Returns the concordance index of the prediction.

Parameters
  • X (array-like, shape = (n_samples, n_features)) – Test samples.

  • y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

Returns

cindex – Estimated concordance index.

Return type

float
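As a rough illustration of what a concordance index measures, the following is a simplified O(n^2) sketch. It is not the library's implementation: the function name, the argument `risk` (assumed to be a score where a higher value means shorter expected survival), and the treatment of ties are assumptions, and the library may handle tied times and the sign convention differently.

```python
def concordance_index(event, time, risk):
    """Simplified concordance index (hypothetical helper).

    A pair (i, j) is comparable if sample i had an observed event and
    time[i] < time[j]. It is concordant if risk[i] > risk[j]; tied risk
    scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored samples cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Assumed toy data: the risk scores order the samples perfectly.
event = [True, False, True]
time = [2.0, 5.0, 4.0]
risk = [3.0, 1.0, 2.0]
cindex = concordance_index(event, time, risk)
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0.0 means every comparable pair is ordered the wrong way round.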

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

sparsify()

Convert coefficient matrix to sparse format.

Converts the coef_ member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation.

The intercept_ member is not converted.

Returns

Fitted estimator.

Return type

self

Notes

For non-sparse models, i.e. when there are not many zeros in coef_, this may actually increase memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with (coef_ == 0).sum(), must be more than 50% for this to provide significant benefits.

After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.
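The rule of thumb from the notes can be checked before calling sparsify. The sketch below uses a made-up coefficient array in place of a fitted model's coef_; the helper name `zero_fraction` is an assumption.

```python
import numpy as np


def zero_fraction(coef):
    """Hypothetical helper: share of exactly-zero entries in a coefficient array."""
    coef = np.asarray(coef)
    return float((coef == 0).sum()) / coef.size


# Stand-in for a fitted model's coef_ (assumed values).
coef = np.array([[0.0, 1.5, 0.0, 0.0, -0.2]])

frac = zero_fraction(coef)
worth_sparsifying = frac > 0.5  # rule of thumb: more than 50% zeros
```

With three of five coefficients equal to zero, the fraction is 0.6, which is above the 50% threshold.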