Experiment configuration
This page walks through every section of example/IBD/ibd_franzosa_multimodal.py. Sections that are identical to the single-modality tutorials link back rather than repeat the explanation.
Imports
from pathlib import Path
import mllabiome as mll
from mllabiome.ai_space.multimodal import (
    FusionConfig,
    FusionStrategy,
    EarlyFusionType,
    LateFusionType,
)

The multimodal-specific types (FusionConfig, FusionStrategy, EarlyFusionType, LateFusionType) are imported from mllabiome.ai_space.multimodal. Everything else comes from the top-level mllabiome namespace.
File paths
MICROBIOME_FILE_PATH = Path(
    "example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv"
)
METABOLOMICS_FILE_PATH = Path("example/IBD/data/metabolomics_data.tsv")
METADATA_FILE_PATH = Path("example/IBD/data/metadata.tsv")
EXPERIMENT_DIR = Path("results/ibd_multimodal")
SAMPLE_ID_COLUMN_NAME = "Sample"
TARGET_COLUMN_NAME = "Study.Group"
TASK_TYPE = mll.TaskType.MULTICLASS

Two data files are needed instead of one: the hierarchical microbiome profiles and the prepared metabolomics matrix. Both share the Sample column as the sample identifier.
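Because the two tables are matched on the Sample column, it can be worth checking how many identifiers they have in common before launching the experiment. The sketch below uses only the standard library and does not call mllabiome. The helper names read_sample_ids and shared_sample_ids are hypothetical, the table contents are made up, and it assumes tab-separated files with a header row and one sample per row.

```python
import csv
import io


def read_sample_ids(handle, id_column="Sample"):
    """Return the set of sample identifiers found in a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column] for row in reader}


def shared_sample_ids(*id_sets):
    """Return the identifiers present in every table."""
    return set.intersection(*id_sets)


# Tiny in-memory stand-ins for the real TSV files (made-up values).
microbiome_tsv = io.StringIO(
    "Sample\tg__A\tg__B\nS1\t0.2\t0.8\nS2\t0.5\t0.5\nS3\t0.9\t0.1\n"
)
metabolomics_tsv = io.StringIO("Sample\tm1\nS2\t1.4\nS3\t0.7\nS4\t2.1\n")

common = shared_sample_ids(
    read_sample_ids(microbiome_tsv),
    read_sample_ids(metabolomics_tsv),
)
print(sorted(common))  # ['S2', 'S3']
```

To run the same check on the real files, replace the StringIO objects with open(MICROBIOME_FILE_PATH, newline="") and open(METABOLOMICS_FILE_PATH, newline="").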
Evaluation protocol
NESTED_CV_CONFIG = mll.NestedCVConfig(
    outer_folds=5,
    inner_folds=3,
    repeats=2,
    random_state=42,
    stratify=True,
    stratify_columns=[TARGET_COLUMN_NAME],
)
EVALUATION_THRESHOLDS = mll.EvaluationThresholds(
    inner_val_performance_threshold=0.51,
    inner_val_single_fold_performance_threshold=0.51,
)

These settings are identical to the single-modality tutorials. See Evaluation protocol for a description.
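To get a feel for the cost of this protocol, the fold counts can be multiplied out. The arithmetic below is a rough sketch that assumes a conventional nested cross-validation loop, in which every outer split runs a full inner loop and the whole procedure is repeated for each repeat. How mllabiome actually schedules its fits may differ.

```python
outer_folds = 5
inner_folds = 3
repeats = 2

# One held-out evaluation per outer fold, per repeat.
outer_evaluations = outer_folds * repeats

# Each outer training set is split again into inner folds,
# so every candidate model is fitted this many times during selection.
inner_fits_per_candidate = outer_evaluations * inner_folds

print(outer_evaluations)         # 10
print(inner_fits_per_candidate)  # 30
```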
Primary modality settings
The primary modality (microbiome) uses the same configuration types as the microbiome tutorial, narrowed down for speed:
TAXONOMIC_RESOLUTIONS_CONFIGS = [
    mll.TaxonomicProcessingConfig.filter_exact(
        level=mll.TaxonomicLevel.GENUS,
    ),
]
TRANSFORMS_CONFIGS = [
    mll.TransformationConfig(
        transform_type=mll.TransformationType.NONE,
        normalize_to_relative=False,
    ),
]
MODEL_CONFIGS = [
    mll.XGBoost(n_estimators=100),
    mll.RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=91),
]

A single taxonomic resolution (genus), one transformation (pass-through), and two learners keep the sweep small while still demonstrating the multimodal workflow.
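The size of the sweep follows from the lengths of the three lists. The snippet below uses plain strings as stand-ins for the configuration objects so that it runs without mllabiome, and it assumes the framework evaluates the full Cartesian product of resolutions, transformations, and models.

```python
from itertools import product

# String stand-ins for the configuration objects defined above.
resolutions = ["genus"]
transforms = ["none"]
models = ["XGBoost", "RandomForestClassifier"]

# Every (resolution, transformation, model) combination in the sweep.
combinations = list(product(resolutions, transforms, models))
for combo in combinations:
    print(combo)

print(len(combinations))  # 2
```

Adding a second taxonomic level or transformation would double the count, which is why the tutorial keeps each list short.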
Secondary modality: metabolomics
METABOLOMICS_MODALITY = mll.ModalityConfig(
    name="metabolomics",
    file=str(METABOLOMICS_FILE_PATH),
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    nan_handling=mll.NaNHandlingStrategy.DROP_SAMPLES,
    verbose_nan_handling=True,
)

| Parameter | Value | Purpose |
|---|---|---|
| name | "metabolomics" | Unique identifier for this modality. Appears in the results directory structure. |
| file | Path to metabolomics_data.tsv | The prepared feature matrix. |
| sample_id_column | "Sample" | Must match the sample identifier used by the primary modality and metadata. |
| nan_handling | DROP_SAMPLES | Removes any sample that contains missing values in this modality. |
| verbose_nan_handling | True | Prints a summary of dropped or imputed samples. |
For the full set of ModalityConfig parameters, see Modality configuration.
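The effect of DROP_SAMPLES can be illustrated without the library. The sketch below is not mllabiome's implementation. It assumes that "missing" means a NaN value, works on a dictionary of made-up measurements, and uses a hypothetical helper, drop_samples_with_nan.

```python
import math


def drop_samples_with_nan(table):
    """Keep only samples whose feature values are all non-missing.

    `table` maps a sample identifier to a list of float feature values.
    Returns the filtered table and the list of dropped identifiers.
    """
    kept, dropped = {}, []
    for sample_id, values in table.items():
        if any(math.isnan(v) for v in values):
            dropped.append(sample_id)
        else:
            kept[sample_id] = values
    return kept, dropped


# Made-up metabolite measurements; S2 has one missing value.
metabolomics = {
    "S1": [1.2, 0.4],
    "S2": [float("nan"), 0.9],
    "S3": [0.7, 1.1],
}

kept, dropped = drop_samples_with_nan(metabolomics)
print(sorted(kept))  # ['S1', 'S3']
print(dropped)       # ['S2']
```

A single missing value is enough to remove the whole sample, so this strategy trades sample count for a complete feature matrix.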
Fusion strategies
FUSION_CONFIGS = [
    FusionConfig(
        strategy=FusionStrategy.EARLY,
        fusion_type=EarlyFusionType.CONCAT,
    ),
    FusionConfig(
        strategy=FusionStrategy.LATE,
        fusion_type=LateFusionType.MEAN,
    ),
]

Each FusionConfig pairs a FusionStrategy (early, late, intermediate, or hybrid) with a concrete fusion type.
- Early / CONCAT: concatenates the microbiome and metabolomics feature columns into one matrix, then trains a single model on the combined features.
- Late / MEAN: trains one model per modality independently, then averages their predicted probabilities to produce the final output.
The experiment evaluates both strategies so results can be compared directly.
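The two strategies can be sketched in a few lines of plain Python. This illustrates the general idea rather than mllabiome's implementation: the feature values and class probabilities are made up, and the helper names early_concat and late_mean are hypothetical.

```python
def early_concat(microbiome_row, metabolomics_row):
    """Early fusion: join the feature vectors of one sample end to end."""
    return list(microbiome_row) + list(metabolomics_row)


def late_mean(*probability_vectors):
    """Late fusion: average per-class probabilities across modality models."""
    n_models = len(probability_vectors)
    return [sum(class_probs) / n_models for class_probs in zip(*probability_vectors)]


# Early fusion: one combined feature vector goes to a single model.
combined = early_concat([0.2, 0.8], [1.4, 0.7, 2.1])
print(combined)  # [0.2, 0.8, 1.4, 0.7, 2.1]

# Late fusion: each modality's model predicts separately (three classes here),
# and the per-class probabilities are averaged afterwards.
p_microbiome = [0.6, 0.3, 0.1]
p_metabolomics = [0.2, 0.5, 0.3]
fused = late_mean(p_microbiome, p_metabolomics)
print([round(p, 2) for p in fused])  # [0.4, 0.4, 0.2]
```

In the early case the model can learn interactions between microbiome and metabolomics features. In the late case each model sees only its own modality, and the modalities are combined only at the probability level.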
Multimodal configuration
MULTIMODAL_CONFIG = mll.MultimodalConfig(
    modality_configs={"metabolomics": METABOLOMICS_MODALITY},
    fusion_configs=FUSION_CONFIGS,
    exclude_incomplete_samples=mll.SampleExclusionStrategy.TASK_LEVEL,
    run_siso=False,
)

| Parameter | Value | Purpose |
|---|---|---|
| modality_configs | {"metabolomics": ...} | Dictionary mapping modality names to their ModalityConfig. |
| fusion_configs | List of FusionConfig | Fusion strategies to evaluate. |
| exclude_incomplete_samples | TASK_LEVEL | Each target task uses all samples available for that task, even if some are missing from one modality. |
| run_siso | False | Skip single-input single-output baselines. Set to True to also run microbiome-only baselines for comparison. |
Assembling the configuration
The differences from a single-modality microbiome experiment are highlighted in the comments:
config = mll.ExperimentConfiguration(
    microbiome_file=MICROBIOME_FILE_PATH,
    metadata_file=METADATA_FILE_PATH,
    experiment_dir=EXPERIMENT_DIR,
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    taxonomic_configs=TAXONOMIC_RESOLUTIONS_CONFIGS,
    transform_configs=TRANSFORMS_CONFIGS,
    target_configs=[
        mll.TargetConfig(
            column=TARGET_COLUMN_NAME,
            task_type=TASK_TYPE,
        )
    ],
    primary_modality_models=MODEL_CONFIGS,
    multimodal_config=MULTIMODAL_CONFIG,  # NEW: multimodal configuration
    nested_cv_config=NESTED_CV_CONFIG,
    hyperopt_config=mll.HyperoptConfig(enabled=False, ...),
    evaluation_thresholds=EVALUATION_THRESHOLDS,
    execution_config=mll.ExperimentExecutionConfig(...),
)

The only structural difference is multimodal_config=MULTIMODAL_CONFIG instead of multimodal_config=None. All other parameters remain the same as in a microbiome-only experiment.