Ensemble sweep

After running the systematic evaluation, the individual trained models can be combined into an ensemble. The EnsembleSweep searches over a range of ensemble construction strategies and pool sizes to find the combination that best optimises a chosen metric on the inner-validation folds, then saves the winning configuration for deployment or further evaluation.

    import json
    from pathlib import Path

    from mllabiome.ensemble_sweep import EnsembleSweep, SweepConfig

    # Define the search space for the sweep.
    config = SweepConfig(
        experiment_dir="results/ibd_franzosa",
        optimize_metric="halo",
        mode="inner_validation",
        max_models=5,
        pool_sizes=[5, 10],
        diversity_weights=[0.2, 0.5],
        prob_models_only=True,
        exclude_transforms=["ILR", "ALR"],
        member_selection_strategies=[
            "top_k",
            "stratified",
            "diverse_families",
            "diverse",
            "borda_ranking",
            "clustered",
            "best_per_family",
            "maximal_diversity",
            "superlearner",
        ],
    )

    # Run the sweep over the configured combinations.
    sweep = EnsembleSweep(config)
    result = sweep.run()

    # Save the winning ensemble; the call returns the path of the predictions file.
    output_dir = Path("results/ibd_franzosa/ensemble")
    pred_file = sweep.save_best_ensemble(result, output_dir)

    # Walk up two levels from the predictions file to reach the ensemble directory.
    ensemble_dir = pred_file.parent.parent
    print(f"\nEnsemble saved to: {ensemble_dir}")

SweepConfig parameters

| Parameter | Description |
| --- | --- |
| experiment_dir | Root directory of the experiment, containing all individual model results. |
| optimize_metric | Metric used to rank ensembles. "halo" is the composite HALO score. Alternatives include "nmcc", "auc", "f1". |
| mode | Evaluation split to optimise on. "inner_validation" uses the inner CV folds produced during training, so no test data is touched. |
| max_models | Maximum number of member models per ensemble. |
| pool_sizes | List of pool sizes to search over. Each value controls how many candidate models are pre-selected before applying a selection strategy. |
| diversity_weights | List of diversity penalty weights to try. Higher values favour ensembles whose members disagree more with each other. |
| prob_models_only | When True, only models that output calibrated probabilities are eligible. Required for probability-averaging ensembles. |
| exclude_transforms | Transformation types to exclude from the candidate pool (e.g. "ILR" and "ALR" are excluded here as they are less suitable for tree-based members). |
| member_selection_strategies | List of ensemble construction strategies to evaluate. The sweep tries every combination of strategy, pool size, and diversity weight. |
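
To get a feel for how large the search is, the combinations implied by the configuration above can be enumerated with the standard library. This is only a sketch of the search space, not of how EnsembleSweep iterates internally; the variable names are illustrative.

```python
from itertools import product

# Values copied from the SweepConfig example above.
strategies = [
    "top_k", "stratified", "diverse_families", "diverse", "borda_ranking",
    "clustered", "best_per_family", "maximal_diversity", "superlearner",
]
pool_sizes = [5, 10]
diversity_weights = [0.2, 0.5]

# Cartesian product: one tuple per (strategy, pool size, diversity weight).
grid = list(product(strategies, pool_sizes, diversity_weights))

print(f"{len(grid)} combinations")  # 9 strategies * 2 pool sizes * 2 weights
for strategy, pool_size, weight in grid[:3]:
    print(strategy, pool_size, weight)
```

Each extra value in pool_sizes or diversity_weights multiplies the number of ensembles evaluated, so it is worth keeping these lists short on large experiments.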

Member selection strategies

| Strategy | Description |
| --- | --- |
| top_k | Selects the k highest-performing individual models. |
| stratified | Picks models proportionally across taxonomy levels and transformation types, ensuring diversity in the feature representation. |
| diverse_families | Maximises the number of distinct model families (e.g. tree-based, linear, neural). |
| diverse | Maximises pairwise disagreement between member predictions. |
| borda_ranking | Ranks candidates by a Borda count across multiple metrics, then takes the top k. |
| clustered | Clusters models by prediction similarity and selects one representative per cluster. |
| best_per_family | Selects the single best model from each model family. |
| maximal_diversity | Greedy search for the subset with the highest average pairwise diversity. |
| superlearner | Learns an optimal linear combination of member predictions (stacking with a meta-learner). |
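
The diversity-oriented strategies trade individual performance against disagreement between members. The sketch below shows one plausible way to do this greedily, in plain Python. It is an assumption made for illustration and not the library's implementation: the function names, the disagreement measure, and the way the diversity weight is blended with the score are all hypothetical.

```python
from typing import List, Sequence


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of samples on which two models predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_diverse(
    scores: Sequence[float],
    preds: Sequence[Sequence[int]],
    k: int,
    diversity_weight: float,
) -> List[int]:
    """Pick up to k model indices, balancing score against mean disagreement."""
    # Start from the single best-scoring model.
    chosen = [max(range(len(scores)), key=lambda i: scores[i])]
    while len(chosen) < min(k, len(scores)):
        remaining = [i for i in range(len(scores)) if i not in chosen]

        def gain(i: int) -> float:
            # Mean disagreement between candidate i and the members chosen so far.
            avg = sum(disagreement(preds[i], preds[j]) for j in chosen) / len(chosen)
            return (1 - diversity_weight) * scores[i] + diversity_weight * avg

        chosen.append(max(remaining, key=gain))
    return chosen


# Toy inputs: models 0 and 1 predict identically, model 2 always disagrees.
scores = [0.90, 0.89, 0.70]
preds = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]

print(greedy_diverse(scores, preds, k=2, diversity_weight=0.0))  # score only
print(greedy_diverse(scores, preds, k=2, diversity_weight=0.5))  # score + diversity
```

With a weight of zero the selection reduces to top_k. As the weight grows, a weaker model that disagrees with the current members can displace a stronger but redundant one.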

Reading the results

save_best_ensemble also writes an ensemble_config.json file alongside the predictions. It stores the chosen configuration and the outer-validation metrics:

    config_file = ensemble_dir / "ensemble_config.json"
    with open(config_file) as f:
        meta = json.load(f)

    # Fall back to NaN when a metric is missing from the file.
    m = meta.get("outer_metrics", {})
    print(f"AUC  {m.get('AUC', float('nan')):.4f}")
    print(f"nMCC {m.get('nMCC', float('nan')):.4f}")

outer_metrics are computed on the outer CV folds and reflect generalisation performance. They play no part in selecting the ensemble (selection is done entirely on the inner_validation folds), so they are not optimistically biased by the sweep itself.
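
The separation between selection and reporting can be illustrated with a toy example. The candidate names and scores below are invented purely for illustration and do not come from any real sweep; the point is only that the winner is chosen by its inner score, and its outer score is looked up afterwards.

```python
# Hypothetical candidates with made-up scores (illustration only).
candidates = [
    {"name": "ensemble_a", "inner_score": 0.81, "outer_score": 0.78},
    {"name": "ensemble_b", "inner_score": 0.84, "outer_score": 0.79},
    {"name": "ensemble_c", "inner_score": 0.80, "outer_score": 0.82},
]


def select_on_inner(cands):
    """Choose the candidate with the best inner-validation score only."""
    return max(cands, key=lambda c: c["inner_score"])


winner = select_on_inner(candidates)

# The outer score is reported after selection; it is never used to choose.
print(winner["name"], winner["outer_score"])
```

Here ensemble_c happens to have the highest outer score but is not selected. Picking it on that basis would let outer-fold information leak into the choice and make the reported metric optimistic.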

What is written to disk

All output is written under the output_dir passed to save_best_ensemble.

| File | Description |
| --- | --- |
| ensemble_predictions.parquet | Out-of-fold predictions produced by the winning ensemble. |
| ensemble_config.json | Winning configuration, member model paths, and outer-validation metrics. |
| sweep_results.parquet | Full table of every strategy/pool/weight combination evaluated, with their inner-validation scores. Useful for inspecting how sensitive results are to configuration choices. |