Experiment configuration

The experiment is defined in example/IBD/ibd_franzosa_mtb.py. It follows the same structure as the microbiome tutorial, with three configuration changes specific to generic tabular data.

Imports


from pathlib import Path
import mllabiome as mll

File paths and task definition


METABOLOMICS_FILE_PATH = Path("example/IBD/data/metabolomics_data.tsv")
METADATA_FILE_PATH     = Path("example/IBD/data/metadata.tsv")
EXPERIMENT_DIR         = Path("results/ibd_metabolomics")
SAMPLE_ID_COLUMN_NAME  = "Sample"
TARGET_COLUMN_NAME     = "Study.Group"
TASK_TYPE              = mll.TaskType.MULTICLASS

| Constant | Purpose |
| --- | --- |
| `METABOLOMICS_FILE_PATH` | Feature TSV: first column `Sample` (sample IDs), remaining columns are metabolite abundances |
| `METADATA_FILE_PATH` | Sample metadata TSV with one row per sample |
| `EXPERIMENT_DIR` | Directory where all results and artefacts are written |
| `SAMPLE_ID_COLUMN_NAME` | Column in both the features file and the metadata file that identifies samples |
| `TARGET_COLUMN_NAME` | Column in the metadata file used as the prediction target |
| `TASK_TYPE` | Learning task: `mll.TaskType.MULTICLASS`, `mll.TaskType.BINARY`, or `mll.TaskType.REGRESSION` |
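Because the sample ID column must be present in both files, it can be worth checking the inputs before starting a long run. The following is a minimal sketch using only the Python standard library. The helper read_sample_ids is hypothetical and not part of mllabiome, and the inline TSV strings are made-up stand-ins for the real files.

```python
import csv
import io


def read_sample_ids(tsv_text: str, id_column: str) -> set[str]:
    """Return the set of sample IDs found in `id_column` of a TSV document.

    Hypothetical helper for illustration; not part of mllabiome.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise KeyError(f"column {id_column!r} not found")
    return {row[id_column] for row in reader}


# Toy stand-ins for the feature and metadata files (invented values).
features_tsv = "Sample\tmetabolite_a\tmetabolite_b\nS1\t0.1\t2.0\nS2\t0.3\t1.5\n"
metadata_tsv = "Sample\tStudy.Group\nS1\tA\nS2\tB\nS3\tC\n"

feature_ids = read_sample_ids(features_tsv, "Sample")
metadata_ids = read_sample_ids(metadata_tsv, "Sample")

# Samples usable for training must appear in both files.
shared_ids = feature_ids & metadata_ids
# Samples with metadata but no feature row.
missing_features = metadata_ids - feature_ids

print(sorted(shared_ids), sorted(missing_features))
```

For real inputs, the same helper could be fed the contents of the two files, for example via Path.read_text().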

Evaluation protocol


NESTED_CV_CONFIG = mll.NestedCVConfig(
    outer_folds=5,
    inner_folds=3,
    repeats=2,
    random_state=42,
    stratify=True,
    stratify_columns=[TARGET_COLUMN_NAME],
)

EVALUATION_THRESHOLDS = mll.EvaluationThresholds(
    inner_val_performance_threshold=0.51,
    inner_val_single_fold_performance_threshold=0.51,
)

These settings are identical to those in the microbiome tutorial. See Evaluation protocol for a description.
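To get a feel for the cost of this protocol, the number of train/validation splits can be estimated with a little arithmetic. The sketch below assumes conventional nested cross-validation semantics: each repeat reshuffles and reruns the whole outer loop, and every outer training set is split into the inner folds. The function nested_cv_split_counts is hypothetical, and the library's actual scheduling may differ.

```python
def nested_cv_split_counts(outer_folds: int, inner_folds: int, repeats: int) -> dict[str, int]:
    """Count splits under conventional nested CV (an assumption, not mllabiome's API)."""
    # Each repeat runs the full outer loop once.
    outer_splits = repeats * outer_folds
    # Each outer training set is further split into `inner_folds` parts.
    inner_splits = outer_splits * inner_folds
    return {"outer_splits": outer_splits, "inner_splits": inner_splits}


counts = nested_cv_split_counts(outer_folds=5, inner_folds=3, repeats=2)
print(counts)
```

Under these assumptions, the configuration above yields 10 outer splits and 30 inner splits for each candidate pipeline (one transformation and model combination).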

Transformation

Metabolomics data has no taxonomic hierarchy to filter, so taxonomic_configs is set to an empty list. A single pass-through transformation keeps the experiment fast:


TRANSFORMS_CONFIGS = [
    mll.TransformationConfig(
        transform_type=mll.TransformationType.NONE,
        normalize_to_relative=False,
    ),
]

For all available transformation types, see Transformation types.

Models


MODEL_CONFIGS = [
    mll.XGBoost(n_estimators=100),
    mll.RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=91),
    mll.LogisticRegression(max_iter=100),
]

For the full list of supported models, see Available base learners.

Assembling the configuration

The three differences from the microbiome tutorial are marked with inline comments:


config = mll.ExperimentConfiguration(
    primary_data_file=METABOLOMICS_FILE_PATH,  # not microbiome_file
    metadata_file=METADATA_FILE_PATH,
    experiment_dir=EXPERIMENT_DIR,
    sample_id_column=SAMPLE_ID_COLUMN_NAME,
    features_are_rows=False,   # samples are rows, features are columns
    taxonomic_configs=[],      # no taxonomy
    transform_configs=TRANSFORMS_CONFIGS,
    target_configs=[
        mll.TargetConfig(
            column=TARGET_COLUMN_NAME,
            task_type=TASK_TYPE,
        )
    ],
    primary_modality_models=MODEL_CONFIGS,
    ...
)

| Parameter | Microbiome tutorial | This tutorial |
| --- | --- | --- |
| `microbiome_file` / `primary_data_file` | `microbiome_file=...` | `primary_data_file=...` |
| `features_are_rows` | `True` (taxa are rows) | `False` (samples are rows) |
| `taxonomic_configs` | list of `TaxonomicProcessingConfig` | `[]` |

primary_data_file is the parameter for any non-microbiome primary modality. microbiome_file is the legacy name for the same concept and remains accepted for backward compatibility.

features_are_rows=False tells the loader that the file has one sample per row. The data is transposed internally before any transformation or model training, so the rest of the pipeline sees the standard features-as-rows layout.
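The effect of this transposition can be illustrated with a small standalone sketch. It is not the library's loader: the function samples_as_rows_to_features_as_rows and the "feature" corner label are hypothetical, and the toy values are invented. It only shows how a samples-as-rows table maps onto the features-as-rows layout.

```python
import csv
import io


def samples_as_rows_to_features_as_rows(tsv_text: str, id_column: str) -> list[list[str]]:
    """Transpose a samples-as-rows TSV into a features-as-rows table.

    Illustrative only; not mllabiome's internal implementation.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    if header[0] != id_column:
        raise ValueError(f"expected first column {id_column!r}, got {header[0]!r}")
    # zip(*rows) swaps the axes: each original column becomes a row.
    transposed = [list(column) for column in zip(header, *body)]
    # The corner cell now labels the feature-name column.
    transposed[0][0] = "feature"
    return transposed


# Toy input: one sample per row, one metabolite per column (invented values).
tsv_text = "Sample\tmetabolite_a\tmetabolite_b\nS1\t0.1\t2.0\nS2\t0.3\t1.5\n"
table = samples_as_rows_to_features_as_rows(tsv_text, "Sample")

for row in table:
    print("\t".join(row))
```

After the transposition, sample IDs run across the header row and each subsequent row holds one metabolite.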

taxonomic_configs=[] disables the taxonomic filtering step entirely. With no taxonomy to traverse, the transformation search runs directly on the full feature matrix.