NaN handling
Multimodal data frequently contains missing values. A sample may be present in one modality but absent or incomplete in another. mllabiome provides per-modality NaN handling configured through ModalityConfig.nan_handling.
Available strategies
All strategies are members of mll.NaNHandlingStrategy:
| Strategy | Behaviour |
|---|---|
| DROP_SAMPLES | Remove any sample that contains at least one NaN in this modality. This is the default. |
| FILL_ZERO | Replace NaN with 0. |
| FILL_MEAN | Replace NaN with the column mean (numeric) or mode (categorical). |
| FILL_MEDIAN | Replace NaN with the column median (numeric) or mode (categorical). |
| FILL_MODE | Replace NaN with the most frequent value. |
| FILL_CONSTANT | Replace NaN with the value specified by nan_fill_value. |
| KNN_IMPUTE | K-nearest-neighbours imputation. The number of neighbours is controlled by nan_knn_neighbors (default 5). |
| FORWARD_FILL | Forward-fill (useful for time-series data). |
| BACKWARD_FILL | Backward-fill (useful for time-series data). |
| INTERPOLATE | Linear interpolation between existing values. |
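
To make the differences between the strategies concrete, the following is a minimal pure-Python sketch of what several of them do to a small column of values. It is an illustration only, not mllabiome's implementation: the helper names (`is_nan`, `drop_samples`, `fill_column`, `forward_fill`, `interpolate`) and the lowercase strategy strings are hypothetical, and details such as how leading NaN are treated by forward-fill or interpolation are assumptions that the library may handle differently.

```python
import math
from statistics import mean, median, mode

NAN = math.nan


def is_nan(value):
    """True only for float NaN; any other value (including strings) counts as present."""
    return isinstance(value, float) and math.isnan(value)


def drop_samples(rows):
    """DROP_SAMPLES: keep only the rows (samples) that contain no NaN."""
    return [row for row in rows if not any(is_nan(v) for v in row)]


def fill_column(column, strategy, fill_value=None):
    """Column-wise fills: zero, mean, median, mode, or a user-supplied constant."""
    observed = [v for v in column if not is_nan(v)]
    # Lambdas defer the computation so that, e.g., mean() is never called
    # on a categorical column when the mode was requested.
    replacement = {
        "fill_zero": lambda: 0,
        "fill_mean": lambda: mean(observed),
        "fill_median": lambda: median(observed),
        "fill_mode": lambda: mode(observed),
        "fill_constant": lambda: fill_value,
    }[strategy]()
    return [replacement if is_nan(v) else v for v in column]


def forward_fill(column):
    """FORWARD_FILL: carry the last observed value forward (leading NaN stay NaN)."""
    filled, last = [], NAN
    for v in column:
        if not is_nan(v):
            last = v
        filled.append(last)
    return filled


def interpolate(column):
    """INTERPOLATE: linear interpolation between neighbouring observed values."""
    known = [i for i, v in enumerate(column) if not is_nan(v)]
    result = list(column)
    for left, right in zip(known, known[1:]):
        step = (column[right] - column[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = column[left] + step * (i - left)
    return result


column = [1.0, NAN, 3.0, NAN, NAN, 9.0]
median_filled = fill_column(column, "fill_median")
forward_filled = forward_fill(column)
# Backward-fill is forward-fill applied to the reversed column.
backward_filled = forward_fill(column[::-1])[::-1]
interpolated = interpolate(column)
```

In this sketch the median fill replaces every gap with the same value (3.0), whereas forward-fill and interpolation depend on row order, which is why the latter are mainly suited to ordered data such as time series.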
Configuration
NaN handling is set on each ModalityConfig individually, allowing different strategies per modality:
import mllabiome as mll

modality_a = mll.ModalityConfig(
    name="modality_a",
    file="modality_a_data.tsv",
    sample_id_column="Sample",
    nan_handling=mll.NaNHandlingStrategy.FILL_MEDIAN,
    verbose_nan_handling=True,
)

modality_b = mll.ModalityConfig(
    name="modality_b",
    file="modality_b_data.tsv",
    sample_id_column="Sample",
    nan_handling=mll.NaNHandlingStrategy.DROP_SAMPLES,
    verbose_nan_handling=True,
)

Verbose output
When verbose_nan_handling=True, a summary is printed during evaluation:
NaN handling [modality_a]: fill_median
Found 26 NaN values in 6 samples
Imputed 26 values -> 0 NaN remaining

Strategy-specific parameters
| Parameter | Used by | Default | Description |
|---|---|---|---|
| nan_fill_value | FILL_CONSTANT | None | The constant value used to replace NaN. Required when using FILL_CONSTANT. |
| nan_knn_neighbors | KNN_IMPUTE | 5 | Number of neighbours for KNN imputation. |
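
To illustrate how the neighbour count influences KNN imputation, here is a minimal pure-Python sketch of one common variant of the technique: each missing value is replaced by the mean of that feature over the nearest rows that have it observed, with distance measured only over features present in both rows. This is an assumption about the general method, not a description of mllabiome's internals; the functions `nan_distance` and `knn_impute` are hypothetical, and the `n_neighbors` argument merely plays the role that nan_knn_neighbors plays in the table above.

```python
import math

NAN = math.nan


def is_nan(value):
    """True only for float NaN."""
    return isinstance(value, float) and math.isnan(value)


def nan_distance(a, b):
    """Euclidean distance over the features observed in both rows (inf if none)."""
    shared = [(x, y) for x, y in zip(a, b) if not is_nan(x) and not is_nan(y)]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, n_neighbors=5):
    """Fill each NaN with the mean of that feature over the nearest donor rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not is_nan(value):
                continue
            # Donors are the other rows with feature j observed, nearest first.
            donors = sorted(
                (nan_distance(row, other), other[j])
                for k, other in enumerate(rows)
                if k != i and not is_nan(other[j])
            )
            nearest = [v for d, v in donors[:n_neighbors] if d != math.inf]
            if nearest:  # leave the NaN in place if no usable donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, NAN],   # second feature is missing for this sample
    [10.0, 100.0],
]
one_neighbour = knn_impute(rows, n_neighbors=1)[2][1]
two_neighbours = knn_impute(rows, n_neighbors=2)[2][1]
all_neighbours = knn_impute(rows)[2][1]  # default of 5 exceeds the 3 donors available
```

With a small neighbour count the imputed value tracks the most similar samples; with a large count it drifts toward the column mean, so in this sketch the distant fourth sample pulls the default-setting result well above the two-neighbour result.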