Ensemble sweep

After the systematic evaluation has run, the individually trained models can be combined into an ensemble. `EnsembleSweep` searches over ensemble construction strategies, pool sizes, and diversity weights to find the combination that scores best on a chosen metric on the inner-validation folds, then saves the winning configuration for deployment or further evaluation.


import json
from pathlib import Path

from mllabiome.ensemble_sweep import EnsembleSweep, SweepConfig

config = SweepConfig(
    experiment_dir="results/ibd_franzosa",
    optimize_metric="halo",
    mode="inner_validation",
    max_models=5,
    pool_sizes=[5, 10],
    diversity_weights=[0.2, 0.5],
    prob_models_only=True,
    exclude_transforms=["ILR", "ALR"],
    member_selection_strategies=[
        "top_k",
        "stratified",
        "diverse_families",
        "diverse",
        "borda_ranking",
        "clustered",
        "best_per_family",
        "maximal_diversity",
        "superlearner",
    ],
)
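# Evaluate every strategy / pool size / diversity weight combination.
sweep = EnsembleSweep(config)
result = sweep.run()

# Save the winning ensemble and locate its directory from the returned path.
output_dir = Path("results/ibd_franzosa/ensemble")
pred_file = sweep.save_best_ensemble(result, output_dir)
ensemble_dir = pred_file.parent.parent  # two levels above the predictions file
print(f"\nEnsemble saved to: {ensemble_dir}")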

SweepConfig parameters

Parameter	Description
`experiment_dir`	Root directory of the experiment, containing all individual model results.
`optimize_metric`	Metric used to rank ensembles. `"halo"` is the composite HALO score. Alternatives include `"nmcc"`, `"auc"`, `"f1"`.
`mode`	Evaluation split to optimise on. `"inner_validation"` uses the inner CV folds produced during training, so no test data is touched.
`max_models`	Maximum number of member models per ensemble.
`pool_sizes`	List of pool sizes to search over. Each value controls how many candidate models are pre-selected before applying a selection strategy.
`diversity_weights`	List of diversity weights to try. Higher values favour ensembles whose members disagree more with each other.
`prob_models_only`	When `True`, only models that output calibrated probabilities are eligible. Required for probability-averaging ensembles.
`exclude_transforms`	Transformation types to exclude from the candidate pool. `"ILR"` and `"ALR"` are excluded here because they are less suitable for tree-based members.
`member_selection_strategies`	List of ensemble construction strategies to evaluate. The sweep tries every combination of strategy, pool size, and diversity weight.
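To get a feel for the size of the search, the grid implied by the example configuration can be enumerated by hand. This is only a sketch of the combinatorics, assuming a full Cartesian product as described above; it does not call `EnsembleSweep`, and whether the library skips redundant combinations internally is not stated here.

```python
from itertools import product

# Values copied from the SweepConfig example above.
strategies = [
    "top_k", "stratified", "diverse_families", "diverse", "borda_ranking",
    "clustered", "best_per_family", "maximal_diversity", "superlearner",
]
pool_sizes = [5, 10]
diversity_weights = [0.2, 0.5]

# One candidate ensemble per (strategy, pool size, diversity weight) triple.
grid = list(product(strategies, pool_sizes, diversity_weights))

print(len(grid))  # 9 * 2 * 2 = 36
print(grid[0])    # ('top_k', 5, 0.2)
```

Adding one more value to any of the three lists multiplies the number of candidate ensembles, so it can pay to keep the lists short on a first run.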

Member selection strategies

Strategy	Description
`top_k`	Selects the k highest-performing individual models.
`stratified`	Picks models proportionally across taxonomy levels and transformation types, ensuring diversity in the feature representation.
`diverse_families`	Maximises the number of distinct model families (e.g. tree-based, linear, neural).
`diverse`	Maximises pairwise disagreement between member predictions.
`borda_ranking`	Ranks candidates by a Borda count across multiple metrics, then takes the top k.
`clustered`	Clusters models by prediction similarity and selects one representative per cluster.
`best_per_family`	Selects the single best model from each model family.
`maximal_diversity`	Greedy search for the subset with the highest average pairwise diversity.
`superlearner`	Learns an optimal linear combination of member predictions (stacking with a meta-learner).
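The diversity-based strategies trade individual performance against disagreement between members. The sketch below illustrates that idea with a simple greedy rule on toy data. The functions `disagreement` and `greedy_diverse_select`, the scoring rule, and the model names are all hypothetical and written for this illustration; they are not the library's implementation of `diverse` or `maximal_diversity`.

```python
def disagreement(a, b):
    """Fraction of samples on which two predicted label vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_diverse_select(scores, preds, k, diversity_weight):
    """Start from the best-scoring model, then repeatedly add the candidate that
    maximises score + diversity_weight * mean disagreement with chosen members."""
    remaining = sorted(scores)  # sorted for deterministic tie-breaking
    first = max(remaining, key=lambda m: scores[m])
    chosen = [first]
    remaining.remove(first)

    while remaining and len(chosen) < k:
        def gain(m):
            div = sum(disagreement(preds[m], preds[c]) for c in chosen) / len(chosen)
            return scores[m] + diversity_weight * div

        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy validation scores and predicted labels (illustrative only).
scores = {"rf": 0.80, "xgb": 0.79, "logreg": 0.75}
preds = {
    "rf":     [1, 1, 0, 0, 1, 0],
    "xgb":    [1, 1, 0, 0, 1, 0],  # identical to rf, so adds no diversity
    "logreg": [1, 0, 0, 1, 1, 0],  # differs from rf on 2 of 6 samples
}

print(greedy_diverse_select(scores, preds, k=2, diversity_weight=0.0))
print(greedy_diverse_select(scores, preds, k=2, diversity_weight=0.5))
```

With a weight of 0 the rule reduces to picking the top scorers, so the redundant `xgb` is chosen; with a weight of 0.5 the weaker but less correlated `logreg` wins the second slot.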

Reading the results

After `save_best_ensemble` returns, an `ensemble_config.json` file is available in the ensemble directory (`ensemble_dir` above). It stores the chosen configuration and the outer-validation metrics:


config_file = ensemble_dir / "ensemble_config.json"
with open(config_file) as f:
    meta = json.load(f)

# Fall back to an empty dict here, and to NaN below, if a metric is missing.
m = meta.get("outer_metrics", {})
print(f"AUC   {m.get('AUC', float('nan')):.4f}")
print(f"nMCC  {m.get('nMCC', float('nan')):.4f}")

`outer_metrics` are computed on the outer CV folds and estimate generalisation performance. They play no part in selecting the ensemble, which is done entirely on the `inner_validation` folds, so they are not optimistically biased by the selection step.
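The separation between selection and reporting can be illustrated with a few made-up numbers. The snippet below is a toy sketch only: the configuration names and scores are invented for illustration and do not come from any experiment or from the library.

```python
# Toy numbers, purely illustrative: not results from any real experiment.
candidates = [
    {"name": "config_a", "inner": 0.82, "outer": 0.78},
    {"name": "config_b", "inner": 0.80, "outer": 0.81},
    {"name": "config_c", "inner": 0.77, "outer": 0.76},
]

# Selection looks only at the inner-validation score ...
best = max(candidates, key=lambda c: c["inner"])

# ... and the outer score of that same configuration is what gets reported.
reported = best["outer"]

# Choosing by outer score instead would reuse the evaluation data for
# selection and tend to overstate performance.
leaky = max(c["outer"] for c in candidates)

print(best["name"], reported, leaky)
```

The reported value belongs to the configuration chosen on inner folds, even when another configuration happens to score higher on the outer folds.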

What is written to disk

All output is written under the `output_dir` passed to `save_best_ensemble`.

File	Description
`ensemble_predictions.parquet`	Out-of-fold predictions produced by the winning ensemble.
`ensemble_config.json`	Winning configuration, member model paths, and outer-validation metrics.
`sweep_results.parquet`	Full table of every strategy/pool/weight combination evaluated, with their inner-validation scores. Useful for inspecting how sensitive results are to configuration choices.
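One way to inspect sensitivity is to group the sweep table by a single configuration column and compare score summaries. The sketch below works on plain dictionaries so that it runs without extra dependencies. The column names `strategy`, `pool_size`, and `score` are assumptions, as is the helper `summarise_by`; check the actual columns of `sweep_results.parquet` before adapting it. With pandas installed, `pd.read_parquet(path).to_dict("records")` would produce rows in this shape.

```python
from collections import defaultdict
from statistics import mean


def summarise_by(rows, key, score_key="score"):
    """Group sweep rows by one configuration column and summarise the score."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[score_key])
    return {
        name: {"n": len(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
        for name, vals in groups.items()
    }


# Toy rows standing in for the sweep table; names and values are assumed.
rows = [
    {"strategy": "top_k", "pool_size": 5, "score": 0.5},
    {"strategy": "top_k", "pool_size": 10, "score": 0.75},
    {"strategy": "diverse", "pool_size": 5, "score": 0.25},
    {"strategy": "diverse", "pool_size": 10, "score": 0.5},
]

summary = summarise_by(rows, "strategy")
for name, stats in summary.items():
    print(name, stats)
```

A wide gap between `min` and `max` within one group suggests that the result depends strongly on the other settings, which is worth knowing before committing to the winning configuration.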