sksurv.io.writearff

sksurv.io.writearff(data, filename, relation_name=None, index=True)

Write a DataFrame to an ARFF file.

Parameters:
  • data (pandas.DataFrame or polars.DataFrame) – Data to write. Polars input is converted to pandas internally; pl.Enum columns keep their declared categories (including labels that do not occur in the data) in the ARFF header.

  • filename (str or file-like object) – Path to the ARFF file, or a file-like object to write to. A file-like object is closed by this function before it returns.

  • relation_name (str, optional, default: 'pandas') – Name of relation in ARFF file.

  • index (boolean, optional, default: True) – Write row names (index). Only relevant for pandas input; other dataframe libraries have no row-index concept, so the value is ignored.

See also

loadarff

Function to read ARFF files.

Examples

>>> import tempfile
>>> from pathlib import Path
>>> import numpy as np
>>> import pandas as pd
>>> from sksurv.io import writearff
>>>
>>> # Create a dummy DataFrame
>>> data = pd.DataFrame({
...     'feature1': [1.0, 3.0, 5.0],
...     'feature2': [2.0, np.nan, 6.0],
...     'class': ['A', 'B', 'C']
... }, index=['One', 'Two', 'Three'])
>>>
>>> # Write to a temporary directory so the CWD stays clean.
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     path = Path(tmpdir) / "data.arff"
...     writearff(data, str(path), relation_name='test_data')
...     print(path.read_text())
@relation test_data

@attribute index        {One,Three,Two}
@attribute feature1     real
@attribute feature2     real
@attribute class        {A,B,C}

@data
One,1.0,2.0,A
Two,3.0,?,B
Three,5.0,6.0,C
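
To make the layout of the output above easier to follow, here is a minimal standard-library sketch that reads it back. It is an illustration only: `parse_simple_arff` is a hypothetical helper, not part of sksurv and not how `loadarff` works, and it assumes unquoted, comma-separated values with no spaces in attribute names.

```python
# Minimal ARFF reader sketch (illustration only, not sksurv's loadarff).
ARFF_TEXT = """\
@relation test_data

@attribute index        {One,Three,Two}
@attribute feature1     real
@attribute feature2     real
@attribute class        {A,B,C}

@data
One,1.0,2.0,A
Two,3.0,?,B
Three,5.0,6.0,C
"""


def parse_simple_arff(text):
    """Parse a small, unquoted ARFF document into (attributes, rows)."""
    attributes = []  # (name, type_spec) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            # '?' is the ARFF marker for a missing value
            rows.append([None if v == "?" else v for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec))
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_simple_arff(ARFF_TEXT)
print(attributes[0])  # the row index, written as a nominal attribute
print(rows[1])        # the NaN in feature2 comes back as None
```

Two details are worth noting: the row index is written as an ordinary nominal attribute named `index`, and the `np.nan` in `feature2` appears as `?` in the data section.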

Polars input is accepted as well. pl.Enum columns preserve their declared category list (including labels absent from the data) in the resulting ARFF header.

>>> import polars as pl
>>> data_pl = pl.DataFrame({
...     'feature1': [1.0, 3.0, 5.0],
...     'class': pl.Series(['A', 'B', 'C'], dtype=pl.Enum(['A', 'B', 'C'])),
... })
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     path = Path(tmpdir) / "data.arff"
...     writearff(data_pl, str(path), relation_name='test_data')
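
The difference between declared and observed categories can be sketched without any dataframe library. The helper `nominal_attribute` below is hypothetical and only mimics the shape of a nominal `@attribute` line; the exact whitespace written by `writearff` may differ.

```python
def nominal_attribute(name, categories):
    """Format one nominal @attribute line from a list of category labels."""
    return "@attribute {} {{{}}}".format(name, ",".join(categories))


declared = ["A", "B", "C"]  # categories declared on the column (as with pl.Enum)
values = ["A", "B", "A"]    # 'C' never occurs in the data
observed = sorted(set(values))

header_declared = nominal_attribute("class", declared)
header_observed = nominal_attribute("class", observed)
print(header_declared)  # keeps the unseen label 'C'
print(header_observed)  # would lose 'C' if only observed values were used
```

Keeping the declared list means a reader of the file sees the full label set even when some labels have no rows.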