sksurv.datasets.load_cgvhd

sksurv.datasets.load_cgvhd()

Load and return data from a multicentre randomized clinical trial initiated for patients with a myeloid malignancy who were to undergo an allogeneic bone marrow transplant.

The dataset is a 100-patient subsample of the full data set. See [2] for further details.

| Index | Name     | Description                                                     | Encoding                                                                        |
|-------|----------|-----------------------------------------------------------------|---------------------------------------------------------------------------------|
| 1     | dx       | Diagnosis                                                       | AML=acute myeloid leukaemia, CML=chronic myeloid leukaemia                      |
| 2     | tx       | Randomized treatment                                            | BM=cell harvested from the bone marrow, PB=cell harvested from peripheral blood |
| 3     | extent   | Extent of disease                                               | L=limited, E=extensive                                                          |
| 4     | agvhdgd  | Grade of acute GVHD                                             |                                                                                 |
| 5     | age      | Age                                                             | Years                                                                           |
| 6     | survtime | Time from date of transplant to death or last follow-up         | Years                                                                           |
| 7     | reltime  | Time from date of transplant to relapse or last follow-up       | Years                                                                           |
| 8     | agvhtime | Time from date of transplant to acute GVHD or last follow-up    | Years                                                                           |
| 9     | cgvhtime | Time from date of transplant to chronic GVHD or last follow-up  | Years                                                                           |
| 10    | stat     | Status                                                          | 1=Dead, 0=Alive                                                                 |
| 11    | rcens    | Relapse                                                         | 1=Yes, 0=No                                                                     |
| 12    | agvh     | Acute GVHD                                                      | 1=Yes, 0=No                                                                     |
| 13    | cgvh     | Chronic GVHD                                                    | 1=Yes, 0=No                                                                     |
| 14    | stnum    | Patient ID                                                      |                                                                                 |

Columns 6, 7 and 9 contain the time to death, relapse and CGVHD in years (survtime, reltime, cgvhtime); the respective indicator variables are in columns 10, 11 and 13 (stat, rcens, cgvh). The earliest time at which any of these events happened is calculated by taking the minimum of the observed times. The censoring variable cens (returned as the status field) is coded as 0 when no event was observed, 1 if CGVHD was observed as the first event, 2 if a relapse was observed as the first event, and 3 if death occurred before either of these events. The endpoint (status) is therefore defined as:

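| Value | Description                               | Count (%)       |
|-------|-------------------------------------------|-----------------|
| 0     | Survival (Right-censored data)            | 4 patients (4%) |
| 1     | Chronic graft versus host disease (CGVHD) | 86 events (86%) |
| 2     | Relapse (TRM)                             | 5 events (5%)   |
| 3     | Death                                     | 5 events (5%)   |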
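The coding rule described above can be illustrated with a small sketch. This is not the library's implementation: the four rows in `toy` are made-up values, and the tie-breaking order (CGVHD, then relapse, then death when two observed events share the same time) is an assumption, since the description does not say how ties are resolved.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using the documented column names; values are invented.
toy = pd.DataFrame(
    {
        "survtime": [2.0, 3.0, 0.8, 4.0],
        "reltime": [2.0, 1.0, 0.8, 4.0],
        "cgvhtime": [0.5, 1.0, 0.8, 4.0],
        "stat": [0, 1, 1, 0],
        "rcens": [0, 1, 0, 0],
        "cgvh": [1, 0, 0, 0],
    }
)

# Column order fixes the endpoint codes: 1 = CGVHD, 2 = relapse, 3 = death.
time_cols = ["cgvhtime", "reltime", "survtime"]
event_cols = ["cgvh", "rcens", "stat"]

times = toy[time_cols].to_numpy(dtype=float)
observed = toy[event_cols].to_numpy() == 1

# Follow-up time: the minimum of the three recorded times.
ftime = times.min(axis=1)

# Ignore times whose event was not observed, then find the earliest event.
# Assumption: ties go to the first column in `time_cols`.
event_times = np.where(observed, times, np.inf)
first_event = event_times.argmin(axis=1)

# 0 when no event was observed, otherwise the code of the first event.
status = np.where(observed.any(axis=1), first_event + 1, 0)
```

In this toy example the first patient develops CGVHD, the second relapses before dying, the third dies without another event, and the fourth is right-censored.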

The dataset has been obtained from [1].

Returns:

  • x (pandas.DataFrame) – The measurements for each patient.

  • y (structured array with 2 fields) – status: Integer indicating the endpoint: 0: right censored data; 1: CGVHD; 2: relapse; 3: death.

    ftime: total length of follow-up or time of event.
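As a rough illustration of working with the returned `y`, the sketch below builds a small structured array with the two documented fields and tallies the endpoints. The array here is a hypothetical stand-in with invented values; with scikit-survival installed, `x, y = load_cgvhd()` would supply the real one. The `labels` mapping is just a convenience name introduced for this example.

```python
import numpy as np

# Hypothetical stand-in for the `y` returned by load_cgvhd(); values are invented.
y = np.array(
    [(1, 0.5), (2, 1.0), (3, 0.8), (0, 4.0), (1, 0.25)],
    dtype=[("status", int), ("ftime", float)],
)

# Endpoint codes as documented for the `status` field.
labels = {0: "right-censored", 1: "CGVHD", 2: "relapse", 3: "death"}

# Count how often each endpoint occurs.
codes, counts = np.unique(y["status"], return_counts=True)
endpoint_counts = {labels[int(c)]: int(n) for c, n in zip(codes, counts)}

# Fields are accessed by name; a boolean mask selects one endpoint.
cgvhd_times = y["ftime"][y["status"] == 1]
```

Because `status` takes more than two values, it describes competing risks rather than a single event indicator, so a mask such as `y["status"] == 1` is needed before passing the data to methods that expect a binary event.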

References