Ensemble sweep
After running the systematic evaluation, the individual trained models can be combined into an ensemble. `EnsembleSweep` searches over a range of ensemble construction strategies and pool sizes to find the combination that best optimises a chosen metric on the inner-validation folds, then saves the winning configuration for deployment or further evaluation.

```python
import json
from pathlib import Path

from mllabiome.ensemble_sweep import EnsembleSweep, SweepConfig

config = SweepConfig(
    experiment_dir="results/ibd_franzosa",
    optimize_metric="halo",
    mode="inner_validation",
    max_models=5,
    pool_sizes=[5, 10],
    diversity_weights=[0.2, 0.5],
    prob_models_only=True,
    exclude_transforms=["ILR", "ALR"],
    member_selection_strategies=[
        "top_k",
        "stratified",
        "diverse_families",
        "diverse",
        "borda_ranking",
        "clustered",
        "best_per_family",
        "maximal_diversity",
        "superlearner",
    ],
)

sweep = EnsembleSweep(config)
result = sweep.run()

output_dir = Path("results/ibd_franzosa/ensemble")
pred_file = sweep.save_best_ensemble(result, output_dir)
ensemble_dir = pred_file.parent.parent
print(f"\nEnsemble saved to: {ensemble_dir}")
```

SweepConfig parameters

| Parameter | Description |
|---|---|
| `experiment_dir` | Root directory of the experiment, containing all individual model results. |
| `optimize_metric` | Metric used to rank ensembles. `"halo"` is the composite HALO score. Alternatives include `"nmcc"`, `"auc"`, `"f1"`. |
| `mode` | Evaluation split to optimise on. `"inner_validation"` uses the inner CV folds produced during training, so no test data is touched. |
| `max_models` | Maximum number of member models per ensemble. |
| `pool_sizes` | List of pool sizes to search over. Each value controls how many candidate models are pre-selected before applying a selection strategy. |
| `diversity_weights` | List of diversity penalty weights to try. Higher values favour ensembles whose members disagree more with each other. |
| `prob_models_only` | When `True`, only models that output calibrated probabilities are eligible. Required for probability-averaging ensembles. |
| `exclude_transforms` | Transformation types to exclude from the candidate pool (e.g. `"ILR"` and `"ALR"` are excluded here as they are less suitable for tree-based members). |
| `member_selection_strategies` | List of ensemble construction strategies to evaluate. The sweep tries every combination of strategy, pool size, and diversity weight. |

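
To get a feel for how large the search is, the grid implied by the example configuration can be enumerated directly. This is only a sketch of the arithmetic: it assumes the sweep evaluates the full Cartesian product of the three lists, as described above, and does not call the library. Whether `EnsembleSweep` skips redundant combinations internally is not something this example can confirm.

```python
from itertools import product

# Values copied from the SweepConfig example above.
strategies = [
    "top_k", "stratified", "diverse_families", "diverse", "borda_ranking",
    "clustered", "best_per_family", "maximal_diversity", "superlearner",
]
pool_sizes = [5, 10]
diversity_weights = [0.2, 0.5]

# Assumption: every (strategy, pool size, diversity weight) triple is tried once.
grid = list(product(strategies, pool_sizes, diversity_weights))

print(len(grid))  # 9 strategies x 2 pool sizes x 2 weights = 36
print(grid[0])    # ('top_k', 5, 0.2)
```

Adding one more pool size or diversity weight multiplies the number of candidate ensembles, so it is worth keeping these lists short for a first run.
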
Member selection strategies

| Strategy | Description |
|---|---|
| `top_k` | Selects the k highest-performing individual models. |
| `stratified` | Picks models proportionally across taxonomy levels and transformation types, ensuring diversity in the feature representation. |
| `diverse_families` | Maximises the number of distinct model families (e.g. tree-based, linear, neural). |
| `diverse` | Maximises pairwise disagreement between member predictions. |
| `borda_ranking` | Ranks candidates by a Borda count across multiple metrics, then takes the top k. |
| `clustered` | Clusters models by prediction similarity and selects one representative per cluster. |
| `best_per_family` | Selects the single best model from each model family. |
| `maximal_diversity` | Greedy search for the subset with the highest average pairwise diversity. |
| `superlearner` | Learns an optimal linear combination of member predictions (stacking with a meta-learner). |

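
To make the diversity-based strategies more concrete, here is a minimal, library-independent sketch of a greedy diversity search. It is not the implementation used by `EnsembleSweep`: the disagreement measure (fraction of differing predicted labels), the choice to seed the subset with the best-scoring model, and the function and model names are all assumptions made for illustration.

```python
def disagreement(a, b):
    """Fraction of samples on which two label vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_diverse_subset(preds, scores, k):
    """Greedily pick up to k models.

    Starts from the best-scoring model, then repeatedly adds the candidate
    with the highest mean disagreement to the models already chosen.
    """
    names = list(preds)
    chosen = [max(names, key=lambda n: scores[n])]
    while len(chosen) < min(k, len(names)):
        remaining = [n for n in names if n not in chosen]
        best = max(
            remaining,
            key=lambda n: sum(disagreement(preds[n], preds[c]) for c in chosen)
            / len(chosen),
        )
        chosen.append(best)
    return chosen


# Toy predictions and scores, made up purely for illustration.
preds = {
    "rf":  [1, 0, 1, 1, 0, 0],
    "xgb": [1, 0, 1, 1, 0, 1],
    "lr":  [0, 0, 1, 0, 1, 0],
    "svm": [1, 1, 0, 1, 0, 0],
}
scores = {"rf": 0.90, "xgb": 0.88, "lr": 0.80, "svm": 0.82}

print(greedy_diverse_subset(preds, scores, k=3))  # ['rf', 'lr', 'svm']
```

In this toy case `xgb` is skipped despite its high score because it agrees with `rf` on five of six samples, which is the trade-off the diversity-oriented strategies are designed to explore.
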
Reading the results
After `save_best_ensemble`, an `ensemble_config.json` file is written alongside the predictions. It stores the chosen configuration and the outer-validation metrics:

```python
config_file = ensemble_dir / "ensemble_config.json"
with open(config_file) as f:
    meta = json.load(f)

m = meta.get("outer_metrics", {})
print(f"AUC {m.get('AUC', float('nan')):.4f}")
print(f"nMCC {m.get('nMCC', float('nan')):.4f}")
```

`outer_metrics` are computed on the outer CV folds and reflect generalisation performance. They are not used to select the ensemble (selection is done entirely on `inner_validation` folds), so these values are unbiased estimates of ensemble quality.
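
If more metrics are stored than the two printed above, a small helper can list whatever is present without hard-coding key names. The sketch below is an assumption-laden illustration: `load_outer_metrics` and `format_metrics` are hypothetical helpers, not part of the library, and the JSON written to the temporary file contains placeholder values rather than real results. Only the `outer_metrics` key and the `AUC` and `nMCC` names are taken from the snippet above.

```python
import json
import math
import tempfile
from pathlib import Path


def load_outer_metrics(config_file):
    """Return the outer_metrics mapping, or an empty dict if it is absent."""
    with open(config_file) as f:
        meta = json.load(f)
    return meta.get("outer_metrics", {})


def format_metrics(metrics):
    """One line per numeric metric, sorted by name; NaN and non-numeric values are skipped."""
    lines = []
    for name in sorted(metrics):
        value = metrics[name]
        if isinstance(value, (int, float)) and not math.isnan(value):
            lines.append(f"{name:<8}{value:.4f}")
    return lines


# Demonstration on a throwaway file with placeholder numbers (not real results).
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "ensemble_config.json"
    config_file.write_text(json.dumps({"outer_metrics": {"AUC": 0.5, "nMCC": 0.25}}))
    report = format_metrics(load_outer_metrics(config_file))

for line in report:
    print(line)
```

In practice `config_file` would point at `ensemble_dir / "ensemble_config.json"` instead of the temporary file.
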
What is written to disk
All output is written under the `output_dir` passed to `save_best_ensemble`.

| File | Description |
|---|---|
| `ensemble_predictions.parquet` | Out-of-fold predictions produced by the winning ensemble. |
| `ensemble_config.json` | Winning configuration, member model paths, and outer-validation metrics. |
| `sweep_results.parquet` | Full table of every strategy/pool/weight combination evaluated, with their inner-validation scores. Useful for inspecting how sensitive results are to configuration choices. |