
Experiment configuration

This page walks through every section of example/IBD/ibd_franzosa_multimodal.py. Sections that are identical to the single-modality tutorials link back rather than repeat the explanation.

Imports

```python
from pathlib import Path

import mllabiome as mll
from mllabiome.ai_space.multimodal import (
    FusionConfig,
    FusionStrategy,
    EarlyFusionType,
    LateFusionType,
)
```

The multimodal-specific types (FusionConfig, FusionStrategy, EarlyFusionType, LateFusionType) are imported from mllabiome.ai_space.multimodal. Everything else comes from the top-level mllabiome namespace.

File paths

```python
MICROBIOME_FILE_PATH = Path(
    "example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv"
)
METABOLOMICS_FILE_PATH = Path("example/IBD/data/metabolomics_data.tsv")
METADATA_FILE_PATH = Path("example/IBD/data/metadata.tsv")
EXPERIMENT_DIR = Path("results/ibd_multimodal")
SAMPLE_ID_COLUMN_NAME = "Sample"
TARGET_COLUMN_NAME = "Study.Group"
TASK_TYPE = mll.TaskType.MULTICLASS
```

Two feature files are needed instead of one: the hierarchical microbiome profiles and the prepared metabolomics matrix. Both files, like the metadata file, use the Sample column as the sample identifier.
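Because samples are matched across files by their Sample value, only IDs present in both feature tables can be combined. The stdlib sketch below illustrates that alignment. It assumes a samples-as-rows TSV layout with made-up sample IDs and feature names; the real hierarchical profile file may be laid out differently, and mllabiome's own loader is not shown here.

```python
import csv
import io

# Hypothetical miniature inputs (samples as rows, first column "Sample").
MICROBIOME_TSV = "Sample\tBacteroides\tFaecalibacterium\nS1\t0.40\t0.10\nS2\t0.25\t0.30\nS3\t0.10\t0.05\n"
METABOLOMICS_TSV = "Sample\tbutyrate\tcholate\nS2\t1.8\t0.2\nS3\t0.9\t1.1\nS4\t2.2\t0.4\n"


def read_by_sample(text, sample_col="Sample"):
    """Parse a TSV into {sample_id: {feature: value}}, keyed on the shared ID column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        sample_id = row.pop(sample_col)
        table[sample_id] = {name: float(value) for name, value in row.items()}
    return table


microbiome = read_by_sample(MICROBIOME_TSV)
metabolomics = read_by_sample(METABOLOMICS_TSV)

# Only samples whose "Sample" value appears in both tables can be paired.
shared = sorted(microbiome.keys() & metabolomics.keys())
print(shared)  # ['S2', 'S3']
```

In this toy example S1 has no metabolomics row and S4 has no microbiome row, so neither can be paired across modalities.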

Evaluation protocol

```python
NESTED_CV_CONFIG = mll.NestedCVConfig(
    outer_folds=5,
    inner_folds=3,
    repeats=2,
    random_state=42,
    stratify=True,
    stratify_columns=[TARGET_COLUMN_NAME],
)

EVALUATION_THRESHOLDS = mll.EvaluationThresholds(
    inner_val_performance_threshold=0.51,
    inner_val_single_fold_performance_threshold=0.51,
)
```

These settings are identical to the single-modality tutorials. See Evaluation protocol for a description.
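As a rough illustration of what outer_folds=5, inner_folds=3, repeats=2 and stratify=True imply, the sketch below builds stratified folds with a simple round-robin rule and counts the resulting splits. The splitter, the seeding scheme (random_state + repeat), and the class sizes are assumptions for illustration; mllabiome's own splitter may differ (scikit-learn's StratifiedKFold is a common implementation of the same idea).

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds, seed):
    """Split sample indices into n_folds folds, keeping class proportions similar in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin across folds; carrying the offset over
        # between classes keeps overall fold sizes balanced.
        for i, idx in enumerate(members):
            folds[(offset + i) % n_folds].append(idx)
        offset += len(members)
    return folds


# Hypothetical class sizes for a three-group target such as Study.Group.
labels = ["CD"] * 20 + ["UC"] * 15 + ["Control"] * 10
outer_folds, inner_folds, repeats, random_state = 5, 3, 2, 42

outer_evaluations = inner_splits = 0
for repeat in range(repeats):
    # Assumed seeding scheme: a different shuffle per repeat derived from random_state.
    outer = stratified_folds(labels, outer_folds, seed=random_state + repeat)
    for test_idx in outer:
        held_out = set(test_idx)
        train_labels = [lab for i, lab in enumerate(labels) if i not in held_out]
        # Inner folds come from the outer training portion only, so the
        # outer test fold never influences model selection.
        inner = stratified_folds(train_labels, inner_folds, seed=random_state + repeat)
        inner_splits += len(inner)
        outer_evaluations += 1

print(outer_evaluations, inner_splits)  # 10 30
print(Counter(labels[i] for i in outer[0]))  # 4 CD, 3 UC, 2 Control
```

Under these assumptions each candidate pipeline is scored on 10 outer test folds and validated on 30 inner splits; the exact number of fits mllabiome performs depends on details this page does not cover.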

Primary modality settings

The primary modality (microbiome) uses the same configuration types as the microbiome tutorial, narrowed down for speed:

```python
TAXONOMIC_RESOLUTIONS_CONFIGS = [
    mll.TaxonomicProcessingConfig.filter_exact(
        level=mll.TaxonomicLevel.GENUS,
    ),
]

TRANSFORMS_CONFIGS = [
    mll.TransformationConfig(
        transform_type=mll.TransformationType.NONE,
        normalize_to_relative=False,
    ),
]

MODEL_CONFIGS = [
    mll.XGBoost(n_estimators=100),
    mll.RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=91),
]
```

A single taxonomic resolution (genus), one transformation (pass-through), and two learners keep the sweep small while still demonstrating the multimodal workflow.
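To see why this keeps the sweep small, assume each combination of taxonomic resolution, transformation, and learner becomes one candidate pipeline, i.e. a Cartesian product. That enumeration rule is an assumption, not something this page specifies; the sketch uses plain strings as stand-ins for the configuration objects.

```python
from itertools import product

# Hypothetical stand-ins for the configuration lists above (labels only, not mllabiome objects).
taxonomic_levels = ["GENUS"]
transforms = ["NONE"]
models = ["XGBoost", "RandomForestClassifier"]

# One candidate pipeline per (level, transform, model) combination.
pipelines = list(product(taxonomic_levels, transforms, models))
for level, transform, model in pipelines:
    print(f"{level} / {transform} / {model}")
print(len(pipelines))  # 2
```

Adding a second resolution or transformation would double the count, which is why the lists are trimmed to one entry each here.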

Secondary modality: metabolomics

```python
METABOLOMICS_MODALITY = mll.ModalityConfig(
    name="metabolomics",
    file=str(METABOLOMICS_FILE_PATH),
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    nan_handling=mll.NaNHandlingStrategy.DROP_SAMPLES,
    verbose_nan_handling=True,
)
```
| Parameter | Value | Purpose |
|---|---|---|
| `name` | `"metabolomics"` | Unique identifier for this modality. Appears in the results directory structure. |
| `file` | Path to `metabolomics_data.tsv` | The prepared feature matrix. |
| `sample_id_column` | `"Sample"` | Must match the sample identifier used by the primary modality and the metadata. |
| `nan_handling` | `DROP_SAMPLES` | Removes any sample that contains missing values in this modality. |
| `verbose_nan_handling` | `True` | Prints a summary of dropped or imputed samples. |

For the full set of ModalityConfig parameters, see Modality configuration.
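DROP_SAMPLES removes a sample as soon as any of its metabolomics features is missing. The sketch below reproduces that rule on a hypothetical in-memory table; the function name and the printed summary format are illustrative only and are not mllabiome's.

```python
import math

# Hypothetical metabolomics rows keyed by sample ID; float("nan") marks a missing value.
metabolomics = {
    "S1": {"butyrate": 1.2, "cholate": 0.3},
    "S2": {"butyrate": float("nan"), "cholate": 0.8},
    "S3": {"butyrate": 0.7, "cholate": 1.5},
}


def drop_samples_with_nan(table, verbose=True):
    """Keep only samples whose features are all present (a DROP_SAMPLES-style rule)."""
    kept, dropped = {}, []
    for sample_id, features in table.items():
        if any(math.isnan(value) for value in features.values()):
            dropped.append(sample_id)
        else:
            kept[sample_id] = features
    if verbose:
        # Rough analogue of verbose_nan_handling=True: report what was removed.
        print(f"Dropped {len(dropped)} of {len(table)} samples with missing values: {dropped}")
    return kept


clean = drop_samples_with_nan(metabolomics)
print(sorted(clean))  # ['S1', 'S3']
```

Dropping whole samples keeps every remaining value observed, at the cost of shrinking the cohort; an imputation strategy would make the opposite trade.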

Fusion strategies

```python
FUSION_CONFIGS = [
    FusionConfig(
        strategy=FusionStrategy.EARLY,
        fusion_type=EarlyFusionType.CONCAT,
    ),
    FusionConfig(
        strategy=FusionStrategy.LATE,
        fusion_type=LateFusionType.MEAN,
    ),
]
```

Each FusionConfig pairs a FusionStrategy (early, late, intermediate, or hybrid) with a concrete fusion type.

  • Early / CONCAT: concatenates the microbiome and metabolomics feature columns into one matrix, then trains a single model on the combined features.
  • Late / MEAN: trains one model per modality independently, then averages their predicted probabilities to produce the final output.

The experiment evaluates both strategies so results can be compared directly.
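The two strategies differ in where the modalities meet. The sketch below shows only the data-level operations, using hypothetical feature vectors and class probabilities; model training is omitted, and this is not mllabiome's implementation. Early fusion builds one combined row per sample (here restricted, by assumption, to samples present in every modality), while late fusion averages the per-modality probability vectors.

```python
# Hypothetical per-sample feature vectors for the two modalities.
microbiome_features = {"S1": [0.40, 0.10], "S2": [0.25, 0.30]}
metabolomics_features = {"S1": [1.2, 0.3], "S2": [1.8, 0.2]}


def early_concat(*modalities):
    """Early fusion: join each sample's feature vectors into one row for a single model."""
    shared = set.intersection(*(set(m) for m in modalities))
    return {s: [x for m in modalities for x in m[s]] for s in sorted(shared)}


def late_mean(*probabilities):
    """Late fusion: average class-probability vectors from independent per-modality models."""
    n = len(probabilities)
    return [sum(p[k] for p in probabilities) / n for k in range(len(probabilities[0]))]


print(early_concat(microbiome_features, metabolomics_features))
# {'S1': [0.4, 0.1, 1.2, 0.3], 'S2': [0.25, 0.3, 1.8, 0.2]}

# Hypothetical probabilities for one sample over three classes, one vector per modality model.
print(late_mean([0.5, 0.375, 0.125], [0.25, 0.25, 0.5]))
# [0.375, 0.3125, 0.3125]
```

With early fusion a single model sees the feature interactions across modalities; with late fusion each model stays modality-specific and only their outputs are combined.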

Multimodal configuration

```python
MULTIMODAL_CONFIG = mll.MultimodalConfig(
    modality_configs={"metabolomics": METABOLOMICS_MODALITY},
    fusion_configs=FUSION_CONFIGS,
    exclude_incomplete_samples=mll.SampleExclusionStrategy.TASK_LEVEL,
    run_siso=False,
)
```
| Parameter | Value | Purpose |
|---|---|---|
| `modality_configs` | `{"metabolomics": ...}` | Dictionary mapping modality names to their `ModalityConfig`. |
| `fusion_configs` | List of `FusionConfig` | Fusion strategies to evaluate. |
| `exclude_incomplete_samples` | `TASK_LEVEL` | Each target task uses all samples available for that task, even if some are missing from one modality. |
| `run_siso` | `False` | Skip single-input single-output (SISO) baselines. Set to `True` to also run microbiome-only baselines for comparison. |

Assembling the configuration

The one difference from a single-modality microbiome experiment is marked with a comment:

```python
config = mll.ExperimentConfiguration(
    microbiome_file=MICROBIOME_FILE_PATH,
    metadata_file=METADATA_FILE_PATH,
    experiment_dir=EXPERIMENT_DIR,
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    taxonomic_configs=TAXONOMIC_RESOLUTIONS_CONFIGS,
    transform_configs=TRANSFORMS_CONFIGS,
    target_configs=[
        mll.TargetConfig(
            column=TARGET_COLUMN_NAME,
            task_type=TASK_TYPE,
        )
    ],
    primary_modality_models=MODEL_CONFIGS,
    multimodal_config=MULTIMODAL_CONFIG,  # NEW: multimodal configuration
    nested_cv_config=NESTED_CV_CONFIG,
    hyperopt_config=mll.HyperoptConfig(enabled=False, ...),
    evaluation_thresholds=EVALUATION_THRESHOLDS,
    execution_config=mll.ExperimentExecutionConfig(...),
)
```

The only structural difference is multimodal_config=MULTIMODAL_CONFIG instead of multimodal_config=None. All other parameters remain the same as in a microbiome-only experiment.
