
Modality configuration

Each secondary data source is described by a ModalityConfig. The primary modality is configured through the top-level ExperimentConfiguration fields (primary_data_file, taxonomic_configs, etc.). Secondary modalities are passed as a dictionary to MultimodalConfig.modality_configs.

ModalityConfig

    import mllabiome as mll

    modality = mll.ModalityConfig(
        name="modality_a",
        file="path/to/modality_a_data.tsv",
        sample_id_column="Sample",
    )

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Unique identifier for this modality. Used in the results directory structure. |
| file | str | required | Path to the TSV feature matrix. |
| sample_id_column | str | None | Column containing the sample identifier. Must match the primary modality and metadata. |
| exclude_columns | list[str] | None | Columns to drop before treating the remaining columns as features. |
| enable_categorical_encoding | bool | True | Automatically encode categorical columns via label encoding. |
| enable_numeric_scaling | bool | False | Apply StandardScaler to numeric columns. |
| categorical_threshold | int | 20 | A column with at most this many unique values is treated as categorical (when auto-detection is enabled). |
| force_categorical_columns | list[str] | None | Columns to treat as categorical regardless of auto-detection. |
| force_numeric_columns | list[str] | None | Columns to treat as numeric regardless of auto-detection. |
| nan_handling | NaNHandlingStrategy | DROP_SAMPLES | Strategy for missing values. See NaN handling. |
| nan_fill_value | float | None | Fill value when nan_handling=FILL_CONSTANT. |
| nan_knn_neighbors | int | 5 | Number of neighbours when nan_handling=KNN_IMPUTE. |
| verbose_nan_handling | bool | True | Print a summary of NaN handling to stdout. |
| model_configs | list[ModelSpec] | None | Per-modality model list for late fusion. When None, the primary modality’s models are used. |
| case_sensitive | bool | True | When False, sample IDs are normalised to lowercase before alignment. |
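The interaction between categorical_threshold and the two force_* lists can be sketched in plain Python. This is an illustration of the rule described in the table, not the library's implementation: classify_columns and label_encode are hypothetical helper names, and details such as how missing values are counted or the integer codes assigned to each category are assumptions.

```python
def classify_columns(columns, threshold=20, force_categorical=(), force_numeric=()):
    """Label each column "categorical" or "numeric".

    columns maps a column name to its list of values. Forced columns win
    over auto-detection; otherwise a column with at most `threshold`
    unique values is treated as categorical.
    """
    kinds = {}
    for name, values in columns.items():
        if name in force_categorical:
            kinds[name] = "categorical"
        elif name in force_numeric:
            kinds[name] = "numeric"
        else:
            # Assumption: missing values (None) do not count as a category.
            n_unique = len({v for v in values if v is not None})
            kinds[name] = "categorical" if n_unique <= threshold else "numeric"
    return kinds


def label_encode(values):
    """Map each distinct value to an integer code, in sorted order (assumed)."""
    mapping = {value: code for code, value in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values]


example_columns = {
    "site": ["a", "b", "a", "c"],
    "abundance": [0.1, 0.2, 0.3, 0.4],
    "batch": [1, 2, 1, 2],
}
# A small threshold is used here only to keep the example short.
example_kinds = classify_columns(example_columns, threshold=3, force_numeric=["batch"])
encoded_site = label_encode(example_columns["site"])
```

Here "batch" has only two unique values and would be auto-detected as categorical, but listing it in force_numeric keeps it numeric.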

MultimodalConfig

    multimodal_config = mll.MultimodalConfig(
        modality_configs={"modality_a": modality},
        fusion_configs=[...],
        exclude_incomplete_samples=mll.SampleExclusionStrategy.TASK_LEVEL,
        run_siso=True,
    )

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| modality_configs | dict[str, ModalityConfig] | required | Maps a modality name to its configuration. |
| fusion_configs | list[FusionConfig] | required | Fusion strategies to evaluate. See Fusion strategies. |
| exclude_incomplete_samples | SampleExclusionStrategy | TASK_LEVEL | How to handle samples missing from one modality. |
| run_siso | bool | True | Also run single-modality baselines for the primary modality. |

Sample exclusion strategies

| SampleExclusionStrategy | Behaviour |
| --- | --- |
| TASK_LEVEL | Each target task uses all samples available for that task, even if some are absent from one modality. This is the recommended default. |
| EXPERIMENT_LEVEL | Only samples present in every modality are used across all tasks. |
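The difference between the two strategies can be illustrated with plain sets. The sketch below is one possible reading of the table, not the library's code: it assumes that under TASK_LEVEL a task only requires a sample to be present in the modalities that task consumes, while EXPERIMENT_LEVEL requires presence in every configured modality. The enum is a local stand-in, and select_samples and its arguments are hypothetical names.

```python
from enum import Enum


class SampleExclusionStrategy(Enum):
    # Local stand-in for illustration; not imported from the library.
    TASK_LEVEL = "task_level"
    EXPERIMENT_LEVEL = "experiment_level"


def select_samples(labelled_ids, modality_ids, task_modalities, strategy):
    """Return the sample IDs a single task would use.

    labelled_ids: IDs that have a label for this task's target.
    modality_ids: dict of modality name -> set of IDs present in that modality.
    task_modalities: names of the modalities this task consumes (assumed input).
    """
    if strategy is SampleExclusionStrategy.EXPERIMENT_LEVEL:
        # Every modality must contain the sample, whatever the task uses.
        required = list(modality_ids)
    else:
        # Assumed reading of TASK_LEVEL: only the task's own modalities matter.
        required = list(task_modalities)
    selected = set(labelled_ids)
    for name in required:
        selected &= modality_ids[name]
    return selected


modality_ids = {
    "primary_modality": {"s1", "s2", "s3", "s4"},
    "modality_a": {"s1", "s2", "s3"},  # s4 is missing here
}
labelled = {"s1", "s2", "s4"}

# A primary-only baseline keeps s4 under TASK_LEVEL but loses it under EXPERIMENT_LEVEL.
task_level_ids = select_samples(
    labelled, modality_ids, ["primary_modality"], SampleExclusionStrategy.TASK_LEVEL
)
experiment_level_ids = select_samples(
    labelled, modality_ids, ["primary_modality"], SampleExclusionStrategy.EXPERIMENT_LEVEL
)
```

Under this reading, TASK_LEVEL retains more samples for tasks that do not need every modality, which fits its description as the recommended default.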

Per-modality models (late fusion)

In late fusion, each modality is trained independently. By default every modality uses the models specified in primary_modality_models. To override this on a per-modality basis, use model_configs on the ModalityConfig or on the FusionConfig:

    import mllabiome as mll

    # Option 1: via ModalityConfig
    modality = mll.ModalityConfig(
        name="modality_a",
        file="modality_a_data.tsv",
        model_configs=[
            mll.LogisticRegression(max_iter=100),
            mll.SVC(),
        ],
    )

    # Option 2: via FusionConfig (overrides per strategy)
    from mllabiome.ai_space.multimodal import FusionConfig, FusionStrategy, LateFusionType

    fusion = FusionConfig(
        strategy=FusionStrategy.LATE,
        fusion_type=LateFusionType.STACKING,
        model_configs={
            "primary_modality": [
                mll.RandomForestClassifier(n_estimators=100),
                mll.XGBoost(n_estimators=100),
            ],
            "modality_a": [mll.LogisticRegression(max_iter=100)],
        },
    )
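The fallback described above can be pictured as a small lookup. The sketch below is illustrative only: resolve_models is a hypothetical helper, model lists are represented by plain strings, and the precedence order (FusionConfig entry first, then the ModalityConfig list, then the primary modality's models) is an assumption inferred from the "overrides per strategy" comment rather than confirmed library behaviour.

```python
def resolve_models(modality_name, primary_models, modality_models=None, fusion_models=None):
    """Pick the model list for one modality in late fusion (assumed precedence)."""
    # Assumed: an entry for this modality in the FusionConfig wins for that strategy.
    if fusion_models is not None and modality_name in fusion_models:
        return fusion_models[modality_name]
    # Next, a model list set on the ModalityConfig itself.
    if modality_models is not None:
        return modality_models
    # Documented default: fall back to the primary modality's models.
    return primary_models


primary = ["RandomForestClassifier", "XGBoost"]

# No overrides: the primary modality's models are reused.
default_choice = resolve_models("modality_a", primary)

# ModalityConfig-level override only.
modality_choice = resolve_models("modality_a", primary, modality_models=["SVC"])

# Both levels set: the per-strategy FusionConfig entry is assumed to win.
fusion_choice = resolve_models(
    "modality_a",
    primary,
    modality_models=["SVC"],
    fusion_models={"modality_a": ["LogisticRegression"]},
)
```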