Modality configuration
Each secondary data source is described by a ModalityConfig. The primary modality is configured through the top-level ExperimentConfiguration fields (primary_data_file, taxonomic_configs, etc.). Secondary modalities are passed as a dictionary to MultimodalConfig.modality_configs.
ModalityConfig
import mllabiome as mll

modality = mll.ModalityConfig(
    name="modality_a",
    file="path/to/modality_a_data.tsv",
    sample_id_column="Sample",
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | required | Unique identifier for this modality. Used in the results directory structure. |
| file | str | required | Path to the TSV feature matrix. |
| sample_id_column | str | None | Column containing the sample identifier. Must match the primary modality and metadata. |
| exclude_columns | list[str] | None | Columns to drop before treating the remaining columns as features. |
| enable_categorical_encoding | bool | True | Automatically encode categorical columns via label encoding. |
| enable_numeric_scaling | bool | False | Apply StandardScaler to numeric columns. |
| categorical_threshold | int | 20 | A column with at most this many unique values is treated as categorical (when auto-detection is enabled). |
| force_categorical_columns | list[str] | None | Columns to treat as categorical regardless of auto-detection. |
| force_numeric_columns | list[str] | None | Columns to treat as numeric regardless of auto-detection. |
| nan_handling | NaNHandlingStrategy | DROP_SAMPLES | Strategy for missing values. See NaN handling. |
| nan_fill_value | float | None | Fill value when nan_handling=FILL_CONSTANT. |
| nan_knn_neighbors | int | 5 | Number of neighbours when nan_handling=KNN_IMPUTE. |
| verbose_nan_handling | bool | True | Print a summary of NaN handling to stdout. |
| model_configs | list[ModelSpec] | None | Per-modality model list for late fusion. When None, the primary modality’s models are used. |
| case_sensitive | bool | True | When False, sample IDs are normalised to lowercase before alignment. |
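The interaction between categorical_threshold, force_categorical_columns and force_numeric_columns can be illustrated with a small standalone sketch. This is not mllabiome's implementation: the function name classify_columns, the rule that non-numeric text is always categorical, and the order in which the checks run are assumptions made for illustration only.

```python
def _is_number(value):
    """Return True if the raw string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_columns(rows, categorical_threshold=20,
                     force_categorical=(), force_numeric=()):
    """Split column names into (categorical, numeric) lists.

    rows is a list of dicts mapping column name -> raw string value.
    Hypothetical helper: it mirrors the table above, not the library code.
    """
    categorical, numeric = [], []
    for col in rows[0]:
        values = [row[col] for row in rows if row[col] != ""]
        if col in force_categorical:
            categorical.append(col)      # explicit override wins
        elif col in force_numeric:
            numeric.append(col)          # explicit override wins
        elif not all(_is_number(v) for v in values):
            categorical.append(col)      # assumption: text is categorical
        elif len(set(values)) <= categorical_threshold:
            categorical.append(col)      # few unique values
        else:
            numeric.append(col)
    return categorical, numeric


rows = [
    {"site": "gut", "ph": "6.1", "batch": "1"},
    {"site": "skin", "ph": "6.8", "batch": "2"},
    {"site": "gut", "ph": "7.2", "batch": "1"},
    {"site": "oral", "ph": "5.9", "batch": "2"},
]

auto = classify_columns(rows, categorical_threshold=2)
forced = classify_columns(rows, categorical_threshold=2, force_numeric=["batch"])
```

Under these assumptions, the integer-coded batch column is detected as categorical unless it is listed in force_numeric. A threshold of 20 applied to a table with fewer than 20 samples would classify every column as categorical under the same rule, which is one situation where the force_* lists are likely to be useful.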
MultimodalConfig
multimodal_config = mll.MultimodalConfig(
    modality_configs={"modality_a": modality},
    fusion_configs=[...],
    exclude_incomplete_samples=mll.SampleExclusionStrategy.TASK_LEVEL,
    run_siso=True,
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| modality_configs | dict[str, ModalityConfig] | required | Maps a modality name to its configuration. |
| fusion_configs | list[FusionConfig] | required | Fusion strategies to evaluate. See Fusion strategies. |
| exclude_incomplete_samples | SampleExclusionStrategy | TASK_LEVEL | How to handle samples missing from one modality. |
| run_siso | bool | True | Also run single-modality baselines for the primary modality. |
Sample exclusion strategies
| SampleExclusionStrategy | Behaviour |
|---|---|
| TASK_LEVEL | Each target task uses all samples available for that task, even if some are absent from one modality. This is the recommended default. |
| EXPERIMENT_LEVEL | Only samples present in every modality are used across all tasks. |
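To preview which samples EXPERIMENT_LEVEL would retain, the sample IDs of each modality can be intersected by hand. The sketch below uses plain Python sets and is not part of mllabiome. The helper name complete_samples is hypothetical, and its case_sensitive flag only imitates the lowercase normalisation described for ModalityConfig.case_sensitive.

```python
def complete_samples(sample_ids_by_modality, case_sensitive=True):
    """Return (shared, dropped) for a mapping of modality name -> sample IDs.

    shared  : sorted IDs present in every modality
    dropped : per modality, the sorted IDs that are missing elsewhere
    Hypothetical helper for inspection only, not the library's alignment code.
    """
    normalised = {
        name: {s if case_sensitive else s.lower() for s in ids}
        for name, ids in sample_ids_by_modality.items()
    }
    # Intersection across all modalities (requires at least one modality).
    shared = set.intersection(*normalised.values())
    dropped = {name: sorted(ids - shared) for name, ids in normalised.items()}
    return sorted(shared), dropped


ids = {
    "primary_modality": ["S1", "S2", "S3"],
    "modality_a": ["s1", "S2", "S4"],
}

strict_shared, strict_dropped = complete_samples(ids, case_sensitive=True)
relaxed_shared, relaxed_dropped = complete_samples(ids, case_sensitive=False)
```

In this toy input, "S1" and "s1" only align when IDs are lowercased, so the case-insensitive run keeps two shared samples instead of one. A check like this before launching an experiment can show whether an unexpectedly small overlap comes from ID formatting rather than from genuinely missing samples.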
Per-modality models (late fusion)
In late fusion, each modality is trained independently. By default, every modality uses the models specified in primary_modality_models. To override this for individual modalities, set model_configs either on the ModalityConfig (a list of models for that modality) or on the FusionConfig (a dictionary mapping each modality name to its list of models):
# Option 1: via ModalityConfig
modality = mll.ModalityConfig(
    name="modality_a",
    file="modality_a_data.tsv",
    model_configs=[
        mll.LogisticRegression(max_iter=100),
        mll.SVC(),
    ],
)

# Option 2: via FusionConfig (overrides per strategy)
from mllabiome.ai_space.multimodal import FusionConfig, FusionStrategy, LateFusionType

fusion = FusionConfig(
    strategy=FusionStrategy.LATE,
    fusion_type=LateFusionType.STACKING,
    model_configs={
        "primary_modality": [
            mll.RandomForestClassifier(n_estimators=100),
            mll.XGBoost(n_estimators=100),
        ],
        "modality_a": [mll.LogisticRegression(max_iter=100)],
    },
)
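One way to picture how the two override options and the default could combine is a simple lookup chain. The sketch below is an assumption, not documented behaviour: it supposes that a FusionConfig entry takes precedence over ModalityConfig.model_configs, which in turn takes precedence over the primary modality's models. The function resolve_models and the string model names are hypothetical stand-ins.

```python
def resolve_models(modality_name, primary_models,
                   modality_models=None, fusion_models=None):
    """Pick the model list for one modality in one late-fusion strategy.

    Assumed precedence (not confirmed by the documentation):
      1. fusion_models[modality_name]  (FusionConfig.model_configs)
      2. modality_models               (ModalityConfig.model_configs)
      3. primary_models                (primary_modality_models)
    """
    if fusion_models and modality_name in fusion_models:
        return fusion_models[modality_name]
    if modality_models is not None:
        return modality_models
    return primary_models


primary = ["random_forest", "xgboost"]

# No overrides: fall back to the primary modality's models.
default_choice = resolve_models("modality_a", primary)

# Modality-level override only.
modality_choice = resolve_models("modality_a", primary,
                                 modality_models=["logreg", "svc"])

# Strategy-level override present for this modality.
fusion_choice = resolve_models("modality_a", primary,
                               modality_models=["logreg", "svc"],
                               fusion_models={"modality_a": ["logreg"]})
```

If the library resolves overrides in a different order, only the order of the two if-branches in this sketch would change. The fallback to the primary modality's models when nothing is set matches the description of model_configs in the ModalityConfig parameter table.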