Experiment configuration

This page walks through every section of example/IBD/ibd_franzosa_multimodal.py. Sections that are identical to the single-modality tutorials link back to them rather than repeating the explanation.

Imports


```python
from pathlib import Path

import mllabiome as mll
from mllabiome.ai_space.multimodal import (
    FusionConfig,
    FusionStrategy,
    EarlyFusionType,
    LateFusionType,
)
```

The multimodal-specific types (FusionConfig, FusionStrategy, EarlyFusionType, LateFusionType) are imported from mllabiome.ai_space.multimodal. Everything else comes from the top-level mllabiome namespace.

File paths


```python
MICROBIOME_FILE_PATH = Path(
    "example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv"
)
METABOLOMICS_FILE_PATH = Path("example/IBD/data/metabolomics_data.tsv")
METADATA_FILE_PATH = Path("example/IBD/data/metadata.tsv")
EXPERIMENT_DIR = Path("results/ibd_multimodal")
SAMPLE_ID_COLUMN_NAME = "Sample"
TARGET_COLUMN_NAME = "Study.Group"
TASK_TYPE = mll.TaskType.MULTICLASS
```

Compared with the single-modality tutorials, one extra data file is needed: the prepared metabolomics matrix sits alongside the hierarchical microbiome profiles. Both files, like the metadata, identify samples through the Sample column.
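Because the modalities are matched by sample ID, it can help to check how far the files overlap before launching the experiment. The sketch below is a hypothetical pre-flight check using only the standard library and tiny in-memory tables; the feature names and values are invented, and it assumes each file holds one row per sample keyed by a `Sample` column, which may not match the real file layouts.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two feature files.
# Assumption: one row per sample, identified by a "Sample" column.
MICROBIOME_TSV = "Sample\tg__Bacteroides\tg__Faecalibacterium\nS1\t0.4\t0.1\nS2\t0.2\t0.3\nS3\t0.5\t0.0\n"
METABOLOMICS_TSV = "Sample\tbutyrate\tbile_acid\nS1\t1.2\t0.7\nS3\t0.9\t1.1\nS4\t1.5\t0.2\n"


def read_by_sample(tsv_text, id_column="Sample"):
    """Return {sample_id: row_dict} for a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: row for row in reader}


microbiome = read_by_sample(MICROBIOME_TSV)
metabolomics = read_by_sample(METABOLOMICS_TSV)

# Set operations on the sample IDs show which samples each modality covers.
shared = sorted(microbiome.keys() & metabolomics.keys())
only_microbiome = sorted(microbiome.keys() - metabolomics.keys())
only_metabolomics = sorted(metabolomics.keys() - microbiome.keys())

print("shared:", shared)                        # shared: ['S1', 'S3']
print("microbiome only:", only_microbiome)      # microbiome only: ['S2']
print("metabolomics only:", only_metabolomics)  # metabolomics only: ['S4']
```

Samples outside the shared set are the ones a multimodal run has to treat as incomplete in some way.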

Evaluation protocol


```python
NESTED_CV_CONFIG = mll.NestedCVConfig(
    outer_folds=5,
    inner_folds=3,
    repeats=2,
    random_state=42,
    stratify=True,
    stratify_columns=[TARGET_COLUMN_NAME],
)

EVALUATION_THRESHOLDS = mll.EvaluationThresholds(
    inner_val_performance_threshold=0.51,
    inner_val_single_fold_performance_threshold=0.51,
)
```

These settings are identical to the single-modality tutorials. See Evaluation protocol for a description.
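For a rough sense of cost, the arithmetic below counts the train/evaluate cycles these settings imply. It assumes that `repeats` re-runs the whole outer loop with a fresh shuffle and that every outer training split is divided again into `inner_folds` folds for model selection; check the Evaluation protocol page for how mllabiome actually applies these parameters.

```python
# Settings copied from NESTED_CV_CONFIG above.
outer_folds = 5
inner_folds = 3
repeats = 2

# Assumption: each repeat runs the full outer loop once.
outer_fits = repeats * outer_folds     # fits scored on a held-out outer fold
# Assumption: every outer training split is re-split into inner folds.
inner_fits = outer_fits * inner_folds  # inner fits per candidate configuration

print(outer_fits, inner_fits)  # 10 30
```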

Primary modality settings

The primary modality (microbiome) uses the same configuration types as the microbiome tutorial, narrowed down for speed:


```python
TAXONOMIC_RESOLUTIONS_CONFIGS = [
    mll.TaxonomicProcessingConfig.filter_exact(
        level=mll.TaxonomicLevel.GENUS,
    ),
]

TRANSFORMS_CONFIGS = [
    mll.TransformationConfig(
        transform_type=mll.TransformationType.NONE,
        normalize_to_relative=False,
    ),
]

MODEL_CONFIGS = [
    mll.XGBoost(n_estimators=100),
    mll.RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=91),
]
```

A single taxonomic resolution (genus), one transformation (pass-through), and two learners keep the sweep small while still demonstrating the multimodal workflow.
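The size of the sweep follows from multiplying the list lengths, assuming every combination of resolution, transformation, and learner is evaluated. The sketch uses hypothetical string labels in place of the config objects so it runs without mllabiome installed.

```python
from itertools import product

# Hypothetical labels standing in for the config objects above.
taxonomic_levels = ["genus"]
transforms = ["none"]
models = ["XGBoost", "RandomForestClassifier"]

# Assumption: the experiment evaluates the full Cartesian product.
pipelines = list(product(taxonomic_levels, transforms, models))
for pipeline in pipelines:
    print(pipeline)
print(len(pipelines), "primary-modality pipelines")  # 2 primary-modality pipelines
```

Adding a second taxonomic level or transformation would double the count, which is why the tutorial keeps each list short.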

Secondary modality: metabolomics


```python
METABOLOMICS_MODALITY = mll.ModalityConfig(
    name="metabolomics",
    file=str(METABOLOMICS_FILE_PATH),
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    nan_handling=mll.NaNHandlingStrategy.DROP_SAMPLES,
    verbose_nan_handling=True,
)
```

| Parameter | Value | Purpose |
|---|---|---|
| `name` | `"metabolomics"` | Unique identifier for this modality. Appears in the results directory structure. |
| `file` | Path to `metabolomics_data.tsv` | The prepared feature matrix. |
| `sample_id_column` | `"Sample"` | Must match the sample identifier used by the primary modality and the metadata. |
| `nan_handling` | `DROP_SAMPLES` | Removes any sample that contains missing values in this modality. |
| `verbose_nan_handling` | `True` | Prints a summary of dropped or imputed samples. |
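As an illustration of what a drop-samples policy means, the pure-Python sketch below removes every sample with at least one missing value and, when `verbose` is set, prints a summary in the spirit of `verbose_nan_handling=True`. The function `drop_samples_with_nan` and the data are hypothetical; this is not mllabiome's implementation.

```python
import math


def drop_samples_with_nan(rows, verbose=True):
    """Keep only samples whose feature values contain no NaN or None.

    `rows` maps sample IDs to lists of feature values (hypothetical layout).
    """
    kept, dropped = {}, []
    for sample_id, values in rows.items():
        has_missing = any(
            v is None or (isinstance(v, float) and math.isnan(v)) for v in values
        )
        if has_missing:
            dropped.append(sample_id)
        else:
            kept[sample_id] = values
    if verbose:
        print(f"dropped {len(dropped)} of {len(rows)} samples: {dropped}")
    return kept


metabolomics = {
    "S1": [1.2, 0.7],
    "S2": [float("nan"), 0.4],
    "S3": [0.9, None],
    "S4": [1.5, 0.2],
}
clean = drop_samples_with_nan(metabolomics)
# prints: dropped 2 of 4 samples: ['S2', 'S3']
```

Dropping whole samples keeps the feature set intact at the price of sample size; imputation strategies make the opposite trade.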

For the full set of ModalityConfig parameters, see Modality configuration.

Fusion strategies


```python
FUSION_CONFIGS = [
    FusionConfig(
        strategy=FusionStrategy.EARLY,
        fusion_type=EarlyFusionType.CONCAT,
    ),
    FusionConfig(
        strategy=FusionStrategy.LATE,
        fusion_type=LateFusionType.MEAN,
    ),
]
```

Each FusionConfig pairs a FusionStrategy (early, late, intermediate, or hybrid) with a fusion type defined for that strategy, such as EarlyFusionType.CONCAT or LateFusionType.MEAN.

Early / CONCAT: concatenates the microbiome and metabolomics feature columns into one matrix, then trains a single model on the combined features.
Late / MEAN: trains one model per modality independently, then averages their predicted probabilities to produce the final output.

The experiment evaluates both strategies so results can be compared directly.
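The two strategies differ in where the modalities meet: before training (early) or after prediction (late). The toy sketch below illustrates both ideas on hand-written vectors. The helpers `early_concat` and `late_mean` are hypothetical and stand for the techniques, not for mllabiome's code.

```python
# Toy per-sample feature vectors for two modalities (hypothetical values).
microbiome = {"S1": [0.4, 0.1], "S3": [0.5, 0.0]}
metabolomics = {"S1": [1.2, 0.7], "S3": [0.9, 1.1]}


def early_concat(*modalities):
    """Early fusion: join each shared sample's feature vectors into one row."""
    shared = set.intersection(*(set(m) for m in modalities))
    return {s: [x for m in modalities for x in m[s]] for s in sorted(shared)}


def late_mean(*probabilities):
    """Late fusion: average per-class probabilities from per-modality models."""
    n = len(probabilities)
    n_classes = len(probabilities[0])
    return [sum(p[k] for p in probabilities) / n for k in range(n_classes)]


fused = early_concat(microbiome, metabolomics)
print(fused["S1"])  # [0.4, 0.1, 1.2, 0.7]

# Class probabilities for one sample from two modality-specific models
# (a hypothetical three-class example).
p_microbiome = [0.6, 0.3, 0.1]
p_metabolomics = [0.2, 0.5, 0.3]
print(late_mean(p_microbiome, p_metabolomics))
```

With early fusion a single model sees both feature sets at once; with late fusion each model only sees its own modality, and the averaged probabilities still sum to one.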

Multimodal configuration


```python
MULTIMODAL_CONFIG = mll.MultimodalConfig(
    modality_configs={"metabolomics": METABOLOMICS_MODALITY},
    fusion_configs=FUSION_CONFIGS,
    exclude_incomplete_samples=mll.SampleExclusionStrategy.TASK_LEVEL,
    run_siso=False,
)
```

| Parameter | Value | Purpose |
|---|---|---|
| `modality_configs` | `{"metabolomics": ...}` | Dictionary mapping modality names to their `ModalityConfig`. |
| `fusion_configs` | List of `FusionConfig` | Fusion strategies to evaluate. |
| `exclude_incomplete_samples` | `TASK_LEVEL` | Each target task uses all samples available for that task, even if some are missing from one modality. |
| `run_siso` | `False` | Skip single-input single-output baselines. Set to `True` to also run microbiome-only baselines for comparison. |

Assembling the configuration

The difference from a single-modality microbiome experiment is marked with a comment; `...` stands for arguments omitted from this excerpt:


```python
config = mll.ExperimentConfiguration(
    microbiome_file=MICROBIOME_FILE_PATH,
    metadata_file=METADATA_FILE_PATH,
    experiment_dir=EXPERIMENT_DIR,
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    taxonomic_configs=TAXONOMIC_RESOLUTIONS_CONFIGS,
    transform_configs=TRANSFORMS_CONFIGS,
    target_configs=[
        mll.TargetConfig(
            column=TARGET_COLUMN_NAME,
            task_type=TASK_TYPE,
        )
    ],
    primary_modality_models=MODEL_CONFIGS,
    multimodal_config=MULTIMODAL_CONFIG,  # NEW: multimodal configuration
    nested_cv_config=NESTED_CV_CONFIG,
    hyperopt_config=mll.HyperoptConfig(enabled=False, ...),  # other arguments omitted
    evaluation_thresholds=EVALUATION_THRESHOLDS,
    execution_config=mll.ExperimentExecutionConfig(...),  # arguments omitted
)
```

The only structural difference is multimodal_config=MULTIMODAL_CONFIG instead of multimodal_config=None. All other parameters remain the same as in a microbiome-only experiment.