sksurv.svm.HingeLossSurvivalSVM

class sksurv.svm.HingeLossSurvivalSVM(alpha=1.0, *, solver='ecos', kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, pairs='all', verbose=False, timeit=None, max_iter=None)

Naive implementation of a kernel survival support vector machine.

This implementation creates a new set of samples by taking the difference between the feature vectors of pairs of samples in the original data. This approach requires \(O(\text{n\_samples}^4)\) space and \(O(\text{n\_samples}^6 \cdot \text{n\_features})\) time, which makes it computationally demanding for large datasets.
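The pair construction can be sketched in plain NumPy. `pairwise_differences` is a hypothetical helper, not the library's internal code; it keeps only comparable pairs (\(y_i > y_j\) with sample \(j\) uncensored), matching the set \(\mathcal{P}\) used in the optimization problem. With no censoring and distinct times there are \(n(n - 1)/2\) such rows, so a kernel matrix between difference vectors has on the order of \(n^4\) entries, which is consistent with the space bound stated above.

```python
import numpy as np

def pairwise_differences(X, event, time):
    """Stack x_i - x_j for every comparable pair: y_i > y_j and sample j had an event."""
    n = len(time)
    rows = [X[i] - X[j]
            for j in range(n) if event[j]            # j must be uncensored
            for i in range(n) if time[i] > time[j]]  # i must have a longer observed time
    return np.array(rows).reshape(-1, X.shape[1])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
event = np.array([True, False, True, True, False])
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
D = pairwise_differences(X, event, time)
print(D.shape)  # (7, 3)

# Worst case: no censoring and distinct times gives n * (n - 1) / 2 difference vectors.
n = 100
D_full = pairwise_differences(rng.normal(size=(n, 3)), np.ones(n, dtype=bool), np.arange(n, dtype=float))
print(D_full.shape)  # (4950, 3)
```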

The optimization problem is formulated as:

\[ \begin{aligned} \min_{\mathbf{w}, \boldsymbol{\xi}}\quad & \frac{1}{2} \lVert \mathbf{w} \rVert_2^2 + \alpha \sum_{(i, j) \in \mathcal{P}} \xi_{ij} \\ \text{subject to}\quad & \mathbf{w}^\top \phi(\mathbf{x}_i) - \mathbf{w}^\top \phi(\mathbf{x}_j) \geq 1 - \xi_{ij},\quad \forall (i, j) \in \mathcal{P}, \\ & \xi_{ij} \geq 0,\quad \forall (i, j) \in \mathcal{P}, \end{aligned} \]

where

\[ \mathcal{P} = \{ (i, j) \mid y_i > y_j \land \delta_j = 1 \}_{i,j=1,\dots,n}, \]

\(y_i\) is the observed time (time of event or of censoring) of sample \(i\), \(\delta_j\) is the event indicator of sample \(j\) (1 if an event was observed, 0 if censored), and \(\alpha\) is the alpha parameter.

See [1], [2], [3] for further description.
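To make the objective concrete, the sketch below evaluates it for a given weight vector with a linear kernel (\(\phi(\mathbf{x}) = \mathbf{x}\)), taking the weight on the slack sum to be the alpha parameter (assumed from alpha's description as the weight penalizing the hinge loss). For fixed \(\mathbf{w}\), the smallest slack satisfying both constraints is \(\xi_{ij} = \max(0, 1 - \mathbf{w}^\top(\mathbf{x}_i - \mathbf{x}_j))\), i.e. the hinge loss of the pair. `hinge_objective` is a hypothetical helper for illustration; it does not solve the quadratic program.

```python
import numpy as np

def hinge_objective(w, X, event, time, alpha=1.0):
    """Primal objective with a linear kernel (phi(x) = x) and each slack at its optimum."""
    value = 0.5 * float(w @ w)
    n = len(time)
    for j in range(n):
        if not event[j]:
            continue  # censored samples cannot take the role of j
        for i in range(n):
            if time[i] > time[j]:
                margin = float(w @ (X[i] - X[j]))
                # smallest xi_ij satisfying both constraints: max(0, 1 - margin)
                value += alpha * max(0.0, 1.0 - margin)
    return value

event = np.array([True, False, True, True, False])
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = time.reshape(-1, 1)  # a single feature equal to the observed time

print(hinge_objective(np.zeros(1), X, event, time, alpha=2.0))  # 14.0: each of the 7 pairs has slack 1
print(hinge_objective(np.ones(1), X, event, time))  # 0.5: all margins >= 1, only the norm term remains
```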

Parameters:

alpha (float, optional, default: 1) – Weight of penalizing the hinge loss in the objective function. Must be greater than 0.
solver ({'ecos', 'osqp'}, optional, default: 'ecos') – Which quadratic program solver to use.
kernel (str or callable, optional, default: 'linear') – Kernel mapping used internally. This parameter is passed directly to sklearn.metrics.pairwise.pairwise_kernels(). If kernel is a string, it must be one of the metrics in sklearn.metrics.pairwise.PAIRWISE_KERNEL_FUNCTIONS or 'precomputed'. If kernel is 'precomputed', X is assumed to be a kernel matrix. Alternatively, if kernel is a callable, it is called on each pair of instances (rows) and the resulting value is recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma (float or None, optional, default: None) – Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree (int, optional, default: 3) – Degree of the polynomial kernel. Ignored by other kernels.
coef0 (float, optional, default: 1) – Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params (dict or None, optional, default: None) – Additional parameters (keyword arguments) for kernel function passed as callable object.
pairs ({'all', 'nearest', 'next'}, optional, default: 'all') –
Which constraints to use in the optimization problem.
- all: Use all comparable pairs. Scales quadratically in the number of samples.
- nearest: Only consider comparable pairs \((i, j)\) where \(j\) is the uncensored sample with the highest survival time smaller than \(y_i\). Scales linearly in the number of samples (cf. sksurv.svm.MinlipSurvivalAnalysis).
- next: Only compare against the direct nearest neighbor according to observed time, disregarding its censoring status. Scales linearly in the number of samples.
verbose (bool, optional, default: False) – If True, enable verbose output of the solver.
timeit (bool, int, or None, optional, default: None) – If True or a non-zero integer, the time taken for optimization is measured. If an integer is provided, the optimization is repeated that many times. Results can be accessed from the timings_ attribute.
max_iter (int or None, optional, default: None) – The maximum number of iterations taken for the solvers to converge. If None, use solver’s default value.
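The kernel and pairs parameters can be illustrated with plain NumPy. The first part sketches a callable of the kind the kernel parameter accepts (it receives two rows and returns one number); `rbf_row` and its `gamma` keyword are hypothetical, and with the estimator such extra keywords would be supplied through kernel_params. The second part enumerates constraint pairs for pairs='all' and pairs='nearest' following the definitions above; `all_pairs` and `nearest_pairs` are hypothetical helpers rather than the library's internal code, and 'next' is left out because its exact rule is not fully specified here.

```python
import numpy as np

# Part 1: a row-wise kernel callable (hypothetical example).
def rbf_row(a, b, gamma=1.0):
    """RBF kernel between two 1-d feature vectors; returns a single float."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * diff.dot(diff)))

def kernel_matrix(X, kernel, **kernel_params):
    """Call `kernel` on every pair of rows, as described for callable kernels."""
    n = X.shape[0]
    K = np.empty((n, n))
    for r in range(n):
        for c in range(n):
            K[r, c] = kernel(X[r], X[c], **kernel_params)
    return K

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = kernel_matrix(X, rbf_row, gamma=0.5)
# Same result as the vectorized RBF formula exp(-gamma * ||a - b||^2).
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(K, np.exp(-0.5 * sq_dist)))  # True

# Part 2: constraint pairs (i, j) with y_i > y_j and delta_j = 1.
def all_pairs(event, time):
    n = len(time)
    return [(i, j) for j in range(n) if event[j]
            for i in range(n) if time[i] > time[j]]

def nearest_pairs(event, time):
    pairs = []
    for i in range(len(time)):
        # uncensored samples whose survival time is strictly smaller than y_i
        candidates = [j for j in range(len(time)) if event[j] and time[j] < time[i]]
        if candidates:
            # keep only the one with the highest such survival time
            pairs.append((i, max(candidates, key=lambda j: time[j])))
    return pairs

event = np.array([True, False, True, True, False])
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(len(all_pairs(event, time)))  # 7
print(nearest_pairs(event, time))   # [(1, 0), (2, 0), (3, 2), (4, 3)]
```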

Attributes:

X_fit_

Training data.

Type:: ndarray, shape = (n_samples, n_features_in_)

coef_

Coefficients of the decision function, one per training sample (the decision function is expressed through kernel evaluations against X_fit_).

Type:: ndarray, shape = (n_samples,), dtype = float

n_features_in_

Number of features seen during fit.

Type:: int

feature_names_in_

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:: ndarray, shape = (n_features_in_,), dtype = object

n_iter_

Number of iterations run by the optimization routine to fit the model.

Type:: int

See also

sksurv.svm.NaiveSurvivalSVM: The linear naive survival SVM based on liblinear.

References

__init__(alpha=1.0, *, solver='ecos', kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, pairs='all', verbose=False, timeit=None, max_iter=None)

Methods

`__init__`([alpha, solver, kernel, gamma, ...])
`fit`(X, y)	Build a hinge-loss survival SVM from training data.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict risk score of experiencing an event.
`score`(X, y)	Returns the concordance index of the prediction.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y)

Build a hinge-loss survival SVM from training data.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Data matrix.
y (structured array, shape = (n_samples,)) – A structured array with two fields. The first field is a boolean where True indicates an event and False indicates right-censoring. The second field is a float with the time of event or time of censoring.
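A minimal sketch of building y with plain NumPy. The field names event and time are arbitrary choices here, since the description above identifies the fields by position (event indicator first, time second). scikit-survival also offers sksurv.util.Surv.from_arrays as a convenience for the same construction.

```python
import numpy as np

# First field: event indicator (True = event, False = right-censored).
# Second field: time of event or time of censoring.
y = np.array(
    [(True, 5.0), (False, 8.5), (True, 2.0)],
    dtype=[("event", bool), ("time", float)],
)
print(y["event"])
print(y["time"])
```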

Returns:

self – The fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:: routing – A MetadataRequest encapsulating routing information.
Return type:: MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:: params – Parameter names mapped to their values.
Return type:: dict

predict(X)

Predict risk score of experiencing an event.

Higher values indicate an increased risk of experiencing an event, lower values a decreased risk of experiencing an event. The scores have no unit and are only meaningful to rank samples by their risk of experiencing an event.

Parameters:: X (array-like, shape = (n_samples, n_features)) – The input samples.
Returns:: y – Predicted risk.
Return type:: ndarray, shape = (n_samples,)
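The following is a minimal sketch of how a risk score could be obtained from the fitted attributes, assuming the decision function is a kernel expansion \(f(\mathbf{x}) = \sum_k c_k\, k(\mathbf{x}, \mathbf{x}_k)\) over X_fit_ with weights coef_ (as the shapes of those attributes suggest) and that the risk score is \(-f(\mathbf{x})\). The sign follows from the constraints, which push \(f(\mathbf{x}_i)\) above \(f(\mathbf{x}_j)\) whenever \(y_i > y_j\), so a larger \(f\) means longer predicted survival. `risk_scores` and `linear_kernel` are hypothetical helpers, and the coefficients are illustrative values, not a fitted model.

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def risk_scores(X, X_fit, coef, kernel=linear_kernel):
    """Assumed form: f(x) = sum_k coef[k] * kernel(x, x_k); the risk score is -f(x)."""
    f = kernel(np.atleast_2d(X), X_fit) @ coef
    return -f  # larger f means longer predicted survival, hence lower risk

X_fit = np.array([[1.0], [2.0], [3.0]])
coef = np.array([-1.0, 0.0, 1.0])  # illustrative values only
X_new = np.array([[0.0], [1.0], [2.0]])
risk = risk_scores(X_new, X_fit, coef)
print(risk)
print(np.argsort(-risk))  # sample indices ordered from highest to lowest risk
```

Only the ordering of these scores carries meaning, so any positive rescaling of coef leaves the ranking unchanged.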

score(X, y)

Returns the concordance index of the prediction.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Test samples.
y (structured array, shape = (n_samples,)) – A structured array containing the binary event indicator as first field, and time of event or time of censoring as second field.

Returns:

cindex – Estimated concordance index.

Return type:

float
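A minimal sketch of Harrell's concordance index, assuming this is the quantity score reports. It uses the same notion of comparable pair as the training constraints (the shorter observed time must be an event) and counts tied predictions as half-concordant; unlike a production implementation, it does not treat tied observed times specially. `harrell_cindex` is a hypothetical helper.

```python
import numpy as np

def harrell_cindex(event, time, risk):
    """Share of comparable pairs whose predicted risks are ordered like the observed times."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for j in range(n):
        if not event[j]:
            continue  # the shorter time of a comparable pair must be an event
        for i in range(n):
            if time[i] > time[j]:
                comparable += 1
                if risk[j] > risk[i]:
                    concordant += 1.0  # earlier event received the higher risk
                elif risk[j] == risk[i]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

event = np.array([True, False, True, True, False])
time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(harrell_cindex(event, time, np.array([5.0, 4.0, 3.0, 2.0, 1.0])))  # 1.0
print(harrell_cindex(event, time, np.array([1.0, 2.0, 3.0, 4.0, 5.0])))  # 0.0
print(harrell_cindex(event, time, np.zeros(5)))  # 0.5
```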