Running the experiment

Run this from an activated virtual environment in which mllabiome is installed. See the Installation guide if needed.

Run the metabolomics example from the repository root:

```shell
python example/IBD/ibd_franzosa_mtb.py
```

The script’s entry point calls run_experiment() directly:

```python
if __name__ == "__main__":
    evaluator, summary = run_experiment()
```

Assembling the configuration

run_experiment() packs all experiment constants into a single ExperimentConfiguration and passes it to run_evaluation():

```python
def run_experiment():
    config = mll.ExperimentConfiguration(
        primary_data_file=METABOLOMICS_FILE_PATH,
        metadata_file=METADATA_FILE_PATH,
        experiment_dir=EXPERIMENT_DIR,
        sample_id_column=SAMPLE_ID_COLUMN_NAME,
        features_are_rows=False,
        taxonomic_configs=[],  # no taxonomic-resolution axis in the sweep
        transform_configs=TRANSFORMS_CONFIGS,
        target_configs=[
            mll.TargetConfig(
                column=TARGET_COLUMN_NAME,
                task_type=TASK_TYPE,
            )
        ],
        primary_modality_models=MODEL_CONFIGS,
        multimodal_config=None,
        nested_cv_config=NESTED_CV_CONFIG,
        hyperopt_config=mll.HyperoptConfig(enabled=False, ...),  # remaining arguments elided
        evaluation_thresholds=EVALUATION_THRESHOLDS,
        execution_config=mll.ExperimentExecutionConfig(...),  # see Execution settings
    )
    return run_evaluation(config)
```

For a parameter-by-parameter breakdown, see Experiment configuration.
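The features_are_rows=False setting describes the orientation of the primary data file. As a minimal stdlib illustration, assuming that False means one row per sample (identified by sample_id_column) with one column per feature, and True means the transposed layout, the two orientations can be normalised to the same sample-to-feature mapping. load_samples is a hypothetical helper, not part of mllabiome.

```python
import csv
import io


def load_samples(text, sample_id_column, features_are_rows):
    """Return {sample_id: {feature: float}} from CSV text (hypothetical helper).

    Assumed layouts:
      features_are_rows=False -> one row per sample, one column per feature
      features_are_rows=True  -> one row per feature, one column per sample
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if features_are_rows:
        # First column holds feature names; the other column headers are sample IDs,
        # so sample_id_column is not needed in this orientation.
        samples = {sid: {} for sid in header[1:]}
        for row in body:
            feature = row[0]
            for sid, value in zip(header[1:], row[1:]):
                samples[sid][feature] = float(value)
        return samples
    # Samples are rows: locate the ID column and treat every other column as a feature.
    id_idx = header.index(sample_id_column)
    return {
        row[id_idx]: {
            name: float(value)
            for i, (name, value) in enumerate(zip(header, row))
            if i != id_idx
        }
        for row in body
    }
```

Both layouts of the same table should yield identical dictionaries, which is the property a loader needs before any transform or model sees the data.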

Execution settings

ExperimentExecutionConfig controls parallelism, memory management, checkpointing, and disk output. The metabolomics example uses:

```python
execution_config=mll.ExperimentExecutionConfig(
    n_jobs=1,
    use_threading=False,
    enable_early_termination=True,
    progress_backend="sqlite",
    progress_batch_size=10,
    consolidate_predictions=True,
    model_compression_level=3,
    consolidate_hyperparameters=True,
    deduplicate_nested_cv=False,
    incremental_results=True,
    stream_predictions=True,
    save_model_weights=True,
    save_predictions=True,
    save_processed_data=False,
    store_data_in_results=False,
    gc_between_configs=True,
    keep_original_data=False,
)
```

| Parameter | Default | Purpose |
|---|---|---|
| `n_jobs` | `1` | Number of parallel workers. `1` runs sequentially. |
| `enable_early_termination` | `True` | Skip outer evaluation for configurations that fail inner validation thresholds. |
| `progress_backend` | `"sqlite"` | Persistence backend for run state. Allows an interrupted run to resume. |
| `incremental_results` | `True` | Flush each completed result to disk immediately. |
| `save_model_weights` | `True` | Serialise fitted models. Required for ensemble search. |
| `save_predictions` | `True` | Write per-fold predictions to Parquet. Required for ensemble search. |
| `gc_between_configs` | `True` | Run the garbage collector after each configuration to limit memory growth. |
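To make the resume and memory settings more concrete, here is a minimal stdlib sketch of the pattern that `progress_backend="sqlite"`, `incremental_results=True`, and `gc_between_configs=True` describe: record each finished configuration in SQLite as soon as it completes, skip recorded configurations on restart, and run the garbage collector between configurations. `run_with_resume`, the `progress` table, and its schema are hypothetical; mllabiome's actual progress store and its batching (`progress_batch_size`) are not modelled here.

```python
import gc
import sqlite3


def run_with_resume(configs, evaluate, db_path, gc_between_configs=True):
    """Evaluate each config name once, persisting completions to SQLite.

    Hypothetical sketch: the 'progress' table and its schema are assumptions,
    not mllabiome's progress-backend format.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS progress (config TEXT PRIMARY KEY, score REAL)"
        )
        done = {row[0] for row in conn.execute("SELECT config FROM progress")}
        evaluated = []
        for config in configs:
            if config in done:
                continue  # already finished by an earlier, interrupted run
            score = evaluate(config)
            # Commit each result immediately (cf. incremental_results=True),
            # so an interruption loses at most the configuration in flight.
            with conn:
                conn.execute("INSERT INTO progress VALUES (?, ?)", (config, score))
            evaluated.append(config)
            if gc_between_configs:
                gc.collect()  # release memory held by the finished configuration
        return evaluated
    finally:
        conn.close()
```

Calling it again with the same `db_path` after an interruption evaluates only the configurations that were not recorded, which is the behaviour a resumable sweep needs.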

Running the evaluation

run_evaluation() constructs the evaluator, runs the sweep, and writes the results:

```python
def run_evaluation(config):
    evaluator = mll.Evaluator(config)
    evaluator.run_systematic_evaluation()
    summary = evaluator.save_results()
    return evaluator, summary
```

run_systematic_evaluation() iterates over every combination of transformation and model. Because taxonomic_configs=[], there is no taxonomic-resolution axis, so the sweep covers only (transform, model) pairs. Each pair is evaluated with nested cross-validation: the inner loop determines whether the configuration meets the evaluation thresholds (a qualifying configuration), and the outer loop estimates its generalisation performance.
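The sweep structure can be sketched in plain Python. This is a toy stand-in, not mllabiome's implementation: the two "models" (majority class and a 1-D nearest centroid), the accuracy metric, contiguous k-fold splits, and the choice to average inner scores across outer folds before applying a single threshold are all assumptions made for illustration. It does show the shape of the loop: `itertools.product` over transforms and models, an inner CV run only inside each outer training set, and the outer estimate skipped for failing configurations when early termination is on.

```python
from itertools import product
from statistics import mean


def k_fold(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = [j for j in range(n) if not bounds[i] <= j < bounds[i + 1]]
        yield train, test


def accuracy(fit, X, y, train, test):
    """Fit on the train indices and return accuracy on the test indices."""
    predict = fit([X[i] for i in train], [y[i] for i in train])
    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def inner_cv_score(fit, X, y, train, k):
    """Mean k-fold accuracy computed only within one outer training set."""
    return mean(
        accuracy(fit, X, y, [train[j] for j in itr], [train[j] for j in ite])
        for itr, ite in k_fold(len(train), k)
    )


def majority(X_train, y_train):
    label = max(set(y_train), key=y_train.count)
    return lambda x: label


def nearest_centroid(X_train, y_train):
    centroids = {
        c: mean([x for x, t in zip(X_train, y_train) if t == c]) for c in set(y_train)
    }
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


def sweep(X, y, transforms, models, threshold, k_outer=3, k_inner=3,
          early_termination=True):
    """Nested-CV sweep over every (transform, model) pair (toy sketch)."""
    results = []
    for (t_name, transform), (m_name, fit) in product(transforms.items(), models.items()):
        Xt = [transform(x) for x in X]
        folds = list(k_fold(len(y), k_outer))
        # Inner loop: does this configuration meet the threshold?
        inner = mean(inner_cv_score(fit, Xt, y, train, k_inner) for train, _ in folds)
        qualifies = inner >= threshold
        outer = None
        # Outer loop: generalisation estimate, skipped for failing
        # configurations when early termination is enabled.
        if qualifies or not early_termination:
            outer = mean(accuracy(fit, Xt, y, train, test) for train, test in folds)
        results.append({"transform": t_name, "model": m_name,
                        "inner": inner, "qualifies": qualifies, "outer": outer})
    return results
```

Adding a taxonomic axis would simply mean a third iterable in the `product(...)` call; with no taxonomic levels configured, the sweep stays two-dimensional.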

save_results() consolidates all completed results into a summary DataFrame, writes evaluation_results.csv to the experiment directory, and returns a dictionary of the form:

```
{
    "total_configs": <int>,       # configurations evaluated
    "qualifying_configs": <int>,  # configurations that passed thresholds
    "qualifying_targets": [str],  # targets with at least one qualifying result
}
```
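The relationship between the per-configuration results and this summary can be sketched with the standard library. This is a hedged illustration only: it writes the CSV with the `csv` module rather than a DataFrame, and the record fields (`target`, `transform`, `model`, `qualifies`) and the `summarise` helper are hypothetical, not mllabiome's result schema.

```python
import csv
from pathlib import Path


def summarise(results, experiment_dir):
    """Write results to CSV and return counts in the documented summary shape.

    Hypothetical record fields: 'target', 'transform', 'model', 'qualifies'.
    """
    path = Path(experiment_dir) / "evaluation_results.csv"
    fieldnames = ["target", "transform", "model", "qualifies"]
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    qualifying = [r for r in results if r["qualifies"]]
    return {
        "total_configs": len(results),
        "qualifying_configs": len(qualifying),
        # A target qualifies if at least one of its configurations passed.
        "qualifying_targets": sorted({r["target"] for r in qualifying}),
    }
```

Deriving `qualifying_targets` from the qualifying records, rather than from all records, is what makes a target appear only when at least one of its configurations passed the thresholds.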