
Experiment configuration

The experiment is defined in example/IBD/ibd_franzosa_mtb.py. It follows the same structure as the microbiome tutorial but with three configuration changes specific to generic tabular data.

Imports

    from pathlib import Path

    import mllabiome as mll

File paths and task definition

    METABOLOMICS_FILE_PATH = Path("example/IBD/data/metabolomics_data.tsv")
    METADATA_FILE_PATH = Path("example/IBD/data/metadata.tsv")
    EXPERIMENT_DIR = Path("results/ibd_metabolomics")

    SAMPLE_ID_COLUMN_NAME = "Sample"
    TARGET_COLUMN_NAME = "Study.Group"
    TASK_TYPE = mll.TaskType.MULTICLASS

| Constant | Purpose |
| --- | --- |
| METABOLOMICS_FILE_PATH | Feature TSV: the first column, Sample, holds the sample IDs; the remaining columns are metabolite abundances |
| METADATA_FILE_PATH | Sample metadata TSV with one row per sample |
| EXPERIMENT_DIR | Directory where all results and artefacts are written |
| SAMPLE_ID_COLUMN_NAME | Column in both the features file and the metadata file that identifies samples |
| TARGET_COLUMN_NAME | Column in the metadata file used as the prediction target |
| TASK_TYPE | Learning task: mll.TaskType.MULTICLASS, mll.TaskType.BINARY, or mll.TaskType.REGRESSION |
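Because the sample ID column must appear in both files and the target column must appear in the metadata file, it can be worth checking the inputs before launching an experiment. The following is a minimal sketch using only the standard library; `read_tsv` and `shared_sample_ids` are hypothetical helpers, not part of mllabiome, and the toy rows are made up for illustration.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file object and return (column names, rows)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames or [], rows


def shared_sample_ids(features_file, metadata_file, sample_id_column, target_column):
    """Return the sorted sample IDs present in both files.

    Raises ValueError if a required column is missing.
    """
    feature_columns, feature_rows = read_tsv(features_file)
    metadata_columns, metadata_rows = read_tsv(metadata_file)

    if sample_id_column not in feature_columns:
        raise ValueError(f"features file has no column {sample_id_column!r}")
    if sample_id_column not in metadata_columns:
        raise ValueError(f"metadata file has no column {sample_id_column!r}")
    if target_column not in metadata_columns:
        raise ValueError(f"metadata file has no column {target_column!r}")

    feature_ids = {row[sample_id_column] for row in feature_rows}
    metadata_ids = {row[sample_id_column] for row in metadata_rows}
    return sorted(feature_ids & metadata_ids)


# Toy stand-ins for the two TSV files (illustrative values only).
features_tsv = "Sample\tmetab_a\tmetab_b\nS1\t0.1\t2.0\nS2\t0.4\t1.5\nS3\t0.0\t3.2\n"
metadata_tsv = "Sample\tStudy.Group\nS1\tCD\nS2\tUC\nS4\tControl\n"

common = shared_sample_ids(
    io.StringIO(features_tsv),
    io.StringIO(metadata_tsv),
    sample_id_column="Sample",
    target_column="Study.Group",
)
print(common)  # ['S1', 'S2']
```

With real files, the same function can be called with handles from `open(METABOLOMICS_FILE_PATH, newline="")` and `open(METADATA_FILE_PATH, newline="")`.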

Evaluation protocol

    NESTED_CV_CONFIG = mll.NestedCVConfig(
        outer_folds=5,
        inner_folds=3,
        repeats=2,
        random_state=42,
        stratify=True,
        stratify_columns=[TARGET_COLUMN_NAME],
    )

    EVALUATION_THRESHOLDS = mll.EvaluationThresholds(
        inner_val_performance_threshold=0.51,
        inner_val_single_fold_performance_threshold=0.51,
    )

These settings are identical to the microbiome tutorial. See Evaluation protocol for a description.
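To give a feel for what these numbers imply, the sketch below enumerates stratified nested cross-validation splits in plain Python. It is not mllabiome's implementation: it assumes that `repeats` reruns the whole outer loop with a fresh shuffle and that stratification means dealing each class evenly across folds. `stratified_folds` and `nested_cv_splits` are hypothetical names, and the labels are toy data.

```python
import random
from collections import defaultdict


def stratified_folds(indices, labels, n_folds, rng):
    """Split indices into n_folds lists with roughly equal class proportions."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds.
        for position, i in enumerate(members):
            folds[position % n_folds].append(i)
    return folds


def nested_cv_splits(labels, outer_folds, inner_folds, repeats, random_state):
    """Yield (repeat, outer, inner, train, validation, test) index tuples."""
    rng = random.Random(random_state)
    all_indices = list(range(len(labels)))
    for repeat in range(repeats):
        outer = stratified_folds(all_indices, labels, outer_folds, rng)
        for o, test in enumerate(outer):
            # Everything outside the outer test fold is available for tuning.
            development = [i for fold in outer if fold is not test for i in fold]
            inner = stratified_folds(development, labels, inner_folds, rng)
            for k, validation in enumerate(inner):
                train = [i for fold in inner if fold is not validation for i in fold]
                yield repeat, o, k, train, validation, test


# Toy labels: 30 samples per class (illustrative only).
labels = ["CD"] * 30 + ["UC"] * 30 + ["Control"] * 30
splits = list(
    nested_cv_splits(labels, outer_folds=5, inner_folds=3, repeats=2, random_state=42)
)
print(len(splits))  # 30 = 2 repeats x 5 outer folds x 3 inner folds
```

Under these assumptions, each candidate configuration is fitted 30 times on inner training sets, and final performance is estimated on 10 outer test folds (5 per repeat).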

Transformation

Metabolomics data has no taxonomic hierarchy to filter, so taxonomic_configs is set to an empty list. To keep the experiment fast, a single pass-through transformation is used:

    TRANSFORMS_CONFIGS = [
        mll.TransformationConfig(
            transform_type=mll.TransformationType.NONE,
            normalize_to_relative=False,
        ),
    ]

For all available transformation types, see Transformation types.
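For context on what the pass-through configuration skips, the sketch below shows one common meaning of relative normalization: dividing each sample's values by that sample's total so they sum to 1. Whether `normalize_to_relative=True` does exactly this in mllabiome is an assumption here; `to_relative` is a hypothetical helper and the values are made up.

```python
def to_relative(sample_values):
    """Scale one sample's values so they sum to 1 (all zeros stay zeros)."""
    total = sum(sample_values)
    if total == 0:
        return [0.0 for _ in sample_values]
    return [value / total for value in sample_values]


# One toy sample with three metabolite abundances (illustrative only).
raw_sample = [2.0, 6.0, 2.0]
relative_sample = to_relative(raw_sample)
print(relative_sample)
```

With `normalize_to_relative=False` and a transformation type of NONE, the values reach the models as they appear in the input file.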

Models

    MODEL_CONFIGS = [
        mll.XGBoost(n_estimators=100),
        mll.RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=91),
        mll.LogisticRegression(max_iter=100),
    ]

For the full list of supported models, see Available base learners.

Assembling the configuration

The three differences from the microbiome tutorial are highlighted below:

    config = mll.ExperimentConfiguration(
        primary_data_file=METABOLOMICS_FILE_PATH,  # not microbiome_file
        metadata_file=METADATA_FILE_PATH,
        experiment_dir=EXPERIMENT_DIR,
        sample_id_column=SAMPLE_ID_COLUMN_NAME,
        features_are_rows=False,  # samples are rows, features are columns
        taxonomic_configs=[],  # no taxonomy
        transform_configs=TRANSFORMS_CONFIGS,
        target_configs=[
            mll.TargetConfig(
                column=TARGET_COLUMN_NAME,
                task_type=TASK_TYPE,
            )
        ],
        primary_modality_models=MODEL_CONFIGS,
        ...
    )

| Parameter | Microbiome tutorial | This tutorial |
| --- | --- | --- |
| microbiome_file / primary_data_file | microbiome_file=... | primary_data_file=... |
| features_are_rows | True (taxa are rows) | False (samples are rows) |
| taxonomic_configs | list of TaxonomicProcessingConfig | [] |

primary_data_file is the parameter for any non-microbiome primary modality. microbiome_file is the legacy name for the same concept and remains accepted for backward compatibility.

features_are_rows=False tells the loader that the file has one sample per row. The data is transposed internally before any transformation or model training, so the rest of the pipeline sees the standard features-as-rows layout.
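The transposition described above can be pictured with a small standalone sketch. It only illustrates the change of layout and makes no claim about how mllabiome performs it internally; `transpose_table` is a hypothetical helper and the values are toy data.

```python
def transpose_table(header, rows):
    """Turn a samples-as-rows table into a features-as-rows table.

    header: [id_column, feature_1, feature_2, ...]
    rows:   [[sample_id, value_1, value_2, ...], ...]
    """
    feature_names = header[1:]
    sample_ids = [row[0] for row in rows]
    values = [row[1:] for row in rows]
    columns = list(zip(*values))  # one tuple of values per feature
    new_header = ["feature", *sample_ids]
    new_rows = [[name, *column] for name, column in zip(feature_names, columns)]
    return new_header, new_rows


# Samples as rows, as in the metabolomics TSV (illustrative values only).
header = ["Sample", "metab_a", "metab_b"]
rows = [
    ["S1", 0.1, 2.0],
    ["S2", 0.4, 1.5],
    ["S3", 0.0, 3.2],
]

new_header, new_rows = transpose_table(header, rows)
print(new_header)
for row in new_rows:
    print(row)
```

After the call, each row holds one metabolite and each column one sample, which matches the features-as-rows layout used in the microbiome tutorial.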

taxonomic_configs=[] disables the taxonomic filtering step entirely. With no taxonomy to traverse, the transformation search runs directly on the full feature matrix.
