MPMAs: ensembling
An ensemble combines the predictions of multiple trained MPMAs. EnsembleSweep evaluates every combination of member selection strategy, aggregation strategy, pool size, and diversity weight, ranks the resulting ensembles by optimize_metric, and returns the single best configuration. See the Ensemble sweep tutorial for a worked example.
SweepConfig parameters
| Parameter | Description |
|---|---|
experiment_dir | Root directory of the experiment, containing all individual model results. |
optimize_metric | Metric used to rank ensembles. "halo" is the composite HALO score. Alternatives include "nmcc", "auc", "f1". |
mode | Evaluation split to optimise on. "inner_validation" uses the inner CV folds produced during training, so no test data is touched. |
min_models | Minimum number of member models per ensemble (default 2). |
max_models | Maximum number of member models per ensemble. |
pool_sizes | List of candidate pool sizes to search over. Each value controls how many models are pre-selected before a selection strategy is applied. Defaults to [50, 100, 200]. |
diversity_weights | List of diversity penalty weights to try. Higher values favour ensembles whose members disagree more with each other. Defaults to [0.1, 0.3, 0.5, 0.7]. |
prob_models_only | When True, only models that output calibrated probabilities are eligible. Required by all probability-based aggregation strategies. |
include_transforms | Restrict the candidate pool to models trained with these transformation types. |
exclude_transforms | Exclude models trained with these transformation types (e.g. ["ILR", "ALR"]). |
must_include_models | Force specific model paths to always appear in the ensemble. |
use_calibrated | Use probability-calibrated model variants when available. |
member_selection_strategies | List of member selection strategies to evaluate. When None, all strategies are tried. See Member selection strategies below. |
aggregation_strategies | List of combination methods to evaluate. Prefix matching is supported ("power_mean" matches all variants). When None, all methods are tried. See Aggregation strategies below. |
compute_oracle | When True, also computes the oracle upper bound: the best score achievable by any (K, method) combination over all models. Useful as a ceiling reference. |
quiet | Suppress progress output. |
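Every list multiplies the size of the search, so it can help to count configurations before launching a sweep. The following is a minimal sketch, not the library's code. `sweep_grid` and `expand_prefixes` are hypothetical helpers. The sketch also assumes that pool sizes and diversity weights are only varied for strategies that take top_k_pool_size or diversity_weight, as listed in the Member selection strategies table.

```python
from itertools import product

# Assumption: only strategies that list these parameters in the
# "Member selection strategies" table are swept over them.
POOL_STRATEGIES = {"diverse", "diverse_families", "clustered", "iterative",
                   "maximal_diversity", "greedy_forward", "shapley_value"}
WEIGHT_STRATEGIES = {"diverse", "diverse_families", "iterative"}


def expand_prefixes(requested, available):
    """Prefix matching as documented for aggregation_strategies."""
    return [name for name in available
            if any(name.startswith(prefix) for prefix in requested)]


def sweep_grid(selections, aggregations,
               pool_sizes=(50, 100, 200), diversity_weights=(0.1, 0.3, 0.5, 0.7)):
    """Yield (selection, aggregation, pool_size, diversity_weight) tuples.

    None marks a parameter the selection strategy does not use.
    """
    for selection in selections:
        pools = pool_sizes if selection in POOL_STRATEGIES else (None,)
        weights = diversity_weights if selection in WEIGHT_STRATEGIES else (None,)
        for aggregation, pool, weight in product(aggregations, pools, weights):
            yield selection, aggregation, pool, weight


aggs = expand_prefixes(["voting", "power_mean"],
                       ["voting", "weighted_voting", "power_mean_p-1", "power_mean_p2"])
grid = list(sweep_grid(["top_k", "diverse"], aggs))
print(aggs)
print(len(grid))
```

Prefix matching is literal: "voting" matches voting but not weighted_voting. The example grid has 3 top_k configurations and 36 for diverse (3 aggregations × 3 pool sizes × 4 weights).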
Member selection strategies
Member selection determines which trained MPMAs are included in the ensemble.
| Strategy | Parameters | Description |
|---|---|---|
top_k | max_models | Sorts all models by their validation score and selects the top max_models. No diversity consideration. |
stratified | max_models | Groups models into strata by (transform_type, model_family), allocates an equal number of slots to each stratum, and fills them with the stratum's best scorers. Remaining slots are filled from the global ranking. |
diverse | top_k_pool_size, diversity_weight, max_models | Pre-selects the top top_k_pool_size models by validation score, then greedily adds models that maximise a combined score: (1 − w) × performance + w × (1 − mean_abs_correlation), where w is diversity_weight and mean_abs_correlation is the mean absolute Pearson correlation between the candidate's prediction vector and those of the members already selected, computed across all folds. |
diverse_families | top_k_pool_size, diversity_weight, max_models | Same greedy framework as diverse, but the diversity term combines prediction correlation (50%) and algorithm-family novelty (50%): a model from a family not yet in the ensemble gets full family-novelty credit. |
clustered | top_k_pool_size, max_models | Builds four separate candidate pools (top-K, diverse, stratified, clustering-based) and searches all of them. The clustering pool uses Ward-linkage hierarchical clustering of `1 − \|correlation\|` distances. |
iterative | top_k_pool_size, diversity_weight, max_models | Seeds an ensemble with diverse selection, then iteratively improves it: for each position, tries every other candidate as a replacement and keeps the swap if it raises the ensemble’s validation score. Repeats for up to 3 rounds or until no position improves. |
interpretable | (none) | Restricts the candidate pool to tree-based and linear model families (RandomForest, ExtraTrees, GradientBoosting, HistGradientBoosting, LogisticRegression, DecisionTree, AdaBoost, Bagging, Ridge). Picks the single best per family, hard-capped at 3 models total. |
maximal_diversity | top_k_pool_size, max_models | Maximises three-axis diversity: prediction disagreement (30%), algorithm-family novelty (40%), and transformation-type novelty (30%). Performance acts only as a 10% tiebreaker: 0.1 × performance + 0.9 × diversity. |
borda_ranking | max_models | Computes per-model ranks independently for several metrics (primary validation score, mean AUC, mean nMCC) and for their standard deviations (stability metrics ranked ascending). The final Borda score is the mean rank across all metrics. Models with the lowest average rank are selected. |
greedy_forward | top_k_pool_size, max_models | Starts with the single model with the best per-fold AUC, then greedily adds the candidate whose addition to the current ensemble yields the largest AUC gain (using probability averaging). Stops when no candidate gives a positive gain. |
superlearner | min_models, max_models | Trains a Lasso-regularised logistic regression meta-learner on the stacked per-fold probability outputs of up to 30 pre-selected models. Cross-validates over 9 regularisation strengths on the inner folds. Models whose absolute meta-learner coefficient exceeds 1% of the maximum are kept. |
shapley_value | top_k_pool_size, max_models, min_models | Estimates each model’s marginal contribution via Monte-Carlo Shapley values (600 random coalition permutations, seed 42). For each permutation the metric gain from adding each model to its current coalition is accumulated. Models with Shapley value ≤ 0 are excluded. |
best_per_family | max_models | Groups models by algorithm family and selects the single best-scoring representative per family, sorted by family score descending. |
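The greedy loop behind diverse can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation. `diverse_select` is a hypothetical name. The ensemble is seeded with the single best model, and raw validation scores serve as the performance term.

```python
import numpy as np


def diverse_select(preds, scores, pool_size, diversity_weight, max_models):
    """Greedy selection maximising (1 - w) * performance + w * (1 - mean |r|).

    preds:  (n_models, n_predictions) array, e.g. each model's predictions
            concatenated across all folds.
    scores: (n_models,) validation scores, used directly as "performance".
    """
    preds = np.asarray(preds, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Pre-select the top pool_size models by validation score.
    pool = [int(i) for i in np.argsort(scores)[::-1][:pool_size]]
    # Absolute Pearson correlation between every pair of models (rows).
    abs_corr = np.abs(np.corrcoef(preds))
    selected = [pool.pop(0)]  # assumption: seed with the best-scoring model
    while pool and len(selected) < max_models:
        combined = [(1 - diversity_weight) * scores[c]
                    + diversity_weight * (1 - abs_corr[c, selected].mean())
                    for c in pool]
        selected.append(pool.pop(int(np.argmax(combined))))
    return selected


preds = [[0, 1, 0, 1, 1, 0],   # model 0
         [0, 1, 0, 1, 1, 0],   # model 1: identical to model 0
         [1, 0, 0, 1, 0, 1]]   # model 2: |r| = 1/3 with model 0
scores = [0.90, 0.89, 0.80]
print(diverse_select(preds, scores, pool_size=3, diversity_weight=0.7, max_models=2))
print(diverse_select(preds, scores, pool_size=3, diversity_weight=0.0, max_models=2))
```

With diversity_weight=0.7, the near-duplicate model 1 is skipped in favour of model 2 despite its lower score. With diversity_weight=0, the procedure reduces to top_k. Constant prediction vectors would make `np.corrcoef` return NaN, which a real implementation would need to handle.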
Aggregation strategies
Aggregation determines how the selected members’ predictions are combined into a final ensemble prediction.
All probability-based strategies require prob_models_only=True (or calibrated models). When probabilities are not available, they return (None, None).
Vote-based
| Strategy | Description |
|---|---|
voting | Uniform majority vote. The most-predicted class wins. When probabilities are available, also returns the arithmetic mean probability vector. |
weighted_voting | Each model’s vote is weighted by its min-max normalised validation score. The class with the highest total weight wins. |
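The following is a minimal sketch of weighted voting that shows the effect of min-max normalisation. `weighted_vote` is a hypothetical helper. The fallback to uniform weights when all scores are identical is an assumption.

```python
import numpy as np


def weighted_vote(labels, scores, n_classes):
    """Weighted majority vote with min-max normalised validation scores.

    labels: (n_models, n_samples) integer class predictions.
    scores: (n_models,) validation scores.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    # Assumption: fall back to uniform weights when all scores are equal.
    weights = (scores - scores.min()) / span if span > 0 else np.ones_like(scores)
    n_samples = labels.shape[1]
    totals = np.zeros((n_samples, n_classes))
    for weight, row in zip(weights, labels):
        totals[np.arange(n_samples), row] += weight
    return totals.argmax(axis=1)


labels = [[0, 1],   # best model (score 0.9)  -> weight 1
          [1, 1],   # worst model (score 0.5) -> weight 0
          [1, 0]]   # middle model (score 0.7) -> weight 0.5
print(weighted_vote(labels, [0.9, 0.5, 0.7], n_classes=2))
```

Min-max normalisation gives the lowest-scoring member a weight of exactly 0. As a result, the best model alone outvotes the other two on the first sample, whereas an unweighted majority would predict class 1.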
Probability averaging
| Strategy | Description |
|---|---|
probability_averaging | Arithmetic mean of all members’ probability vectors (uniform weights). |
weighted_probability_averaging | Exponential weighting: w_i ∝ exp((score_i − min_j score_j) / 0.1), normalised to sum to 1. High-scoring models dominate sharply: a score gap of 0.1 multiplies the weight by e ≈ 2.7. |
geometric_mean | Weighted geometric mean in log-space: exp(Σ w_i · log(p_i)), renormalised. A single near-zero probability pulls the class probability down strongly. |
trimmed_mean | Arithmetic mean after trimming the bottom and top 10% of probability values per class (only applied when ≥ 4 models are present). |
median_probability | Element-wise median across all members’ probability vectors, renormalised to sum to 1. Has a 50% breakdown point: as long as fewer than half of the members are outliers, the median stays within the range of the remaining members' values. |
power_mean_p-1 | Weighted harmonic mean (p = −1): (Σ w_i / p_i)^{−1}. Conservative combiner: a single model assigning near-zero probability to a class suppresses that class strongly. |
power_mean_p2 | Weighted quadratic mean (p = 2): sqrt(Σ w_i · p_i^2). Members with high class probabilities dominate more than in the arithmetic mean. |
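power_mean_p-1, power_mean_p2 and geometric_mean all belong to the weighted power-mean family M_p = (Σ w_i · p_i^p)^{1/p}, with the geometric mean as the p → 0 limit. The following is a minimal NumPy sketch of the family. The function name and the clipping constant are assumptions. Renormalising every variant is also an assumption, since the table states it only for the geometric mean and the median.

```python
import numpy as np


def power_mean(probs, weights, p):
    """Weighted power mean of member probabilities, renormalised per sample.

    probs:   (n_models, n_samples, n_classes) probability vectors.
    weights: (n_models,) non-negative weights summing to 1.
    p = 1 arithmetic, p = 2 quadratic, p = -1 harmonic, p = 0 geometric (limit).
    """
    # Assumption: clip to avoid log(0) and division by zero.
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    w = np.asarray(weights, dtype=float)[:, None, None]
    if p == 0:
        combined = np.exp((w * np.log(probs)).sum(axis=0))
    else:
        combined = ((w * probs ** p).sum(axis=0)) ** (1.0 / p)
    return combined / combined.sum(axis=-1, keepdims=True)


probs = [[[0.5, 0.5]],   # uncertain member
         [[0.9, 0.1]]]   # confident member
for p in (-1, 0, 1, 2):
    print(p, power_mean(probs, [0.5, 0.5], p)[0, 1].round(3))
```

For two equally weighted members, the class-1 probability is about 0.206 for p = −1, 0.25 for the geometric mean, 0.3 for the arithmetic mean and about 0.331 for p = 2. The lower p is, the more strongly the confident member's low estimate pulls the class down.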
Stacking
| Strategy | Description |
|---|---|
stacking_lr | Trains a logistic regression meta-learner on the stacked probability outputs from inner-validation folds. Uses L2 regularisation, cross-validated over C ∈ {0.01, 0.1, 1, 10}. |
hierarchical_stacking | Pseudo-stacking: normalises member scores to [0,1], creates binary pseudo-labels (score > median), and fits a penalty-free logistic regression on those labels. Member weights are proportional to the fitted coefficient times the normalised score. |
bayesian_model_averaging | Posterior weights via Bayes' rule with a uniform prior: log w_i = precision × normalised_score_i + const, where precision = α × n_members (α = 1.0). The constant is set by log-sum-exp normalisation, which keeps the computation numerically stable. |
superlearner | Learns an optimal linear combination of member probabilities using a regularised logistic regression meta-learner (see the selection strategy of the same name; here it acts as a combiner rather than a selector). |
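The BMA weighting reduces to a softmax over scaled scores. The following is a minimal sketch. It assumes that normalised_score means min-max normalisation to [0, 1], and `bma_weights` is a hypothetical helper.

```python
import numpy as np


def bma_weights(scores, alpha=1.0):
    """Posterior weights: log w_i = precision * normalised_score_i + const."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    # Assumption: "normalised_score" means min-max normalisation to [0, 1].
    norm = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    precision = alpha * len(scores)          # precision = alpha * n_members
    log_w = precision * norm                  # uniform prior adds only a constant
    log_w -= np.logaddexp.reduce(log_w)       # log-sum-exp normalisation
    return np.exp(log_w)


w = bma_weights([0.70, 0.80, 0.90])
print(w.round(3), w.sum())
```

With three members, precision is 3, so the best member receives e^3 ≈ 20 times the weight of the worst. Because precision grows with n_members, larger ensembles concentrate weight more sharply on their top members.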
Sample-adaptive
| Strategy | Description |
|---|---|
confidence_weighted | Base weights come from raw validation scores. Per sample, each model’s confidence is max_prob × (1 − entropy / log(n_classes)), and its base weight is multiplied by (1 + confidence). |
temperature_scaled | Assigns a per-model temperature based on score bucket: ≥ 0.9 gives T = 0.8, ≥ 0.8 gives T = 1.0, ≥ 0.7 gives T = 1.2, lower gives T = 1.5. Probabilities are sharpened or softened via softmax(log(p) / T) before weighted averaging. |
adaptive_ensemble | Uses per-sample entropy to estimate difficulty. On hard samples (normalised_entropy > 0.5), models with consistently high scores (low estimated difficulty) receive an upward weight adjustment, at a rate set by adaptation_rate = 0.1. |
uncertainty_aware | Per-sample, per-model prediction uncertainty is 0.6 × (1 − confidence_margin) + 0.4 × normalised_entropy. Models whose uncertainty exceeds a threshold (0.3) have their weight multiplied by (1 − prediction_uncertainty) × (1 − model_uncertainty). |
dynamic_selection | Per-sample, each model’s competence is (0.5 × max_prob + 0.5 × margin) × base_score. Only models with competence ≥ 0.6 contribute. If fewer than 2 qualify, the top-2 by competence are forced in. |
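The competence rule of dynamic_selection can be sketched per sample as follows. `dynamic_selection` is a hypothetical function, and margin is taken to be the gap between the two largest class probabilities. Combining the chosen members by competence-weighted averaging is an assumption, because the table does not say how they are combined.

```python
import numpy as np


def dynamic_selection(probs, base_scores, threshold=0.6, min_members=2):
    """Per-sample selection of competent members.

    probs:       (n_models, n_samples, n_classes) probability vectors.
    base_scores: (n_models,) validation scores.
    """
    probs = np.asarray(probs, dtype=float)
    base_scores = np.asarray(base_scores, dtype=float)
    top2 = np.sort(probs, axis=-1)[..., -2:]           # two largest probabilities
    max_prob = top2[..., 1]
    margin = top2[..., 1] - top2[..., 0]
    competence = (0.5 * max_prob + 0.5 * margin) * base_scores[:, None]
    combined = np.empty(probs.shape[1:])
    for s in range(probs.shape[1]):
        chosen = np.flatnonzero(competence[:, s] >= threshold)
        if chosen.size < min_members:                  # force in the top-2
            chosen = np.argsort(competence[:, s])[::-1][:min_members]
        w = competence[chosen, s]
        # Assumption: members are combined by competence-weighted averaging.
        combined[s] = (w[:, None] * probs[chosen, s]).sum(axis=0) / w.sum()
    return combined


probs = [[[0.95, 0.05]],   # competence (0.475 + 0.45) * 0.9 = 0.8325
         [[0.90, 0.10]],   # competence (0.45 + 0.40) * 0.9 = 0.765
         [[0.40, 0.60]]]   # competence (0.30 + 0.10) * 0.9 = 0.36 -> excluded
print(dynamic_selection(probs, [0.9, 0.9, 0.9]))
```

The third member, with a competence of only 0.36, is excluded, so the combined class-0 probability stays above 0.9.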
Robust / adversarial
| Strategy | Description |
|---|---|
robust_consensus | Groups models by their predicted class and selects the class with the highest consensus score, Σ_weights × n_agreers / n_total, where Σ_weights is the summed weight of the agreeing models. A consensus requires a score above 0.7 and at least 3 agreeing models. The final probability is the weighted average over the agreeing models only. When no consensus is reached, the strategy falls back to the global weighted average. |
negative_correlation | Per-sample, models whose predictions deviate most from the ensemble mean receive a diversity bonus: `adjusted_weight = base_weight × (1 + λ × mean \|p_i − p_mean\|)`. |
trimmed_mean | See Probability averaging above. |
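The following is a minimal per-sample sketch of robust_consensus. It assumes the weights are normalised to sum to 1, so the consensus score lies in [0, 1]. It also assumes that Σ_weights means the summed weight of the agreeing models. Both are interpretations, not confirmed details, and `robust_consensus` is a hypothetical function.

```python
import numpy as np


def robust_consensus(probs, weights, threshold=0.7, min_agree=3):
    """Per-sample consensus over agreeing members, else global weighted average.

    probs:   (n_models, n_samples, n_classes) probability vectors.
    weights: (n_models,) non-negative member weights.
    """
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # assumption: weights normalised to sum to 1
    n_models = probs.shape[0]
    votes = probs.argmax(axis=-1)          # predicted class per model and sample
    combined = np.empty(probs.shape[1:])
    for s in range(probs.shape[1]):
        best_mask, best_score = None, threshold
        for c in np.unique(votes[:, s]):
            agree = votes[:, s] == c
            # Assumption: Σ_weights is the summed weight of the agreeing models.
            score = w[agree].sum() * agree.sum() / n_models
            if agree.sum() >= min_agree and score > best_score:
                best_mask, best_score = agree, score
        if best_mask is None:              # no consensus: global weighted average
            combined[s] = (w[:, None] * probs[:, s]).sum(axis=0)
        else:                              # weighted average over agreeing models only
            wa = w[best_mask]
            combined[s] = (wa[:, None] * probs[best_mask, s]).sum(axis=0) / wa.sum()
    return combined


probs = [[[0.8, 0.2]], [[0.8, 0.2]], [[0.8, 0.2]], [[0.0, 1.0]]]
print(robust_consensus(probs, [0.5, 0.3, 0.15, 0.05]))  # consensus on class 0
print(robust_consensus(probs, [1.0, 1.0, 1.0, 1.0]))    # no consensus: fallback
```

With weights 0.5, 0.3, 0.15 and 0.05, the three agreeing members score 0.95 × 3/4 ≈ 0.71, so the dissenting member is ignored. With four equal weights, the same votes score only 0.75 × 3/4 ≈ 0.56, so the global weighted average is returned instead.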
Rank / tournament
| Strategy | Description |
|---|---|
rank_aggregation | Borda-count aggregation over probability vectors: for each model, classes are ranked by probability (0 = lowest). Weighted rank totals across models determine the final class order. |
copeland | Pairwise weighted tournament over classes: for each pair of classes, each model’s weight counts towards whichever class it assigns the higher probability. A class's Copeland score is its number of pairwise wins (+0.5 per tie). When one class beats every other class pairwise (a Condorcet winner), Copeland always selects it. |
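A small example shows where rank_aggregation and copeland disagree. The following sketch computes raw weighted Borda totals and Copeland scores per class. The function names are hypothetical, and the sketch does not cover how the library converts these scores into a probability vector.

```python
import numpy as np
from itertools import combinations


def borda(probs, weights):
    """Weighted Borda totals: rank classes per model (0 = lowest), then sum."""
    probs = np.asarray(probs, dtype=float)            # (n_models, n_samples, n_classes)
    ranks = probs.argsort(axis=-1).argsort(axis=-1)   # ties broken arbitrarily
    return (np.asarray(weights, dtype=float)[:, None, None] * ranks).sum(axis=0)


def copeland(probs, weights):
    """Copeland scores: +1 per pairwise win, +0.5 per pairwise tie."""
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    scores = np.zeros(probs.shape[1:])
    for a, b in combinations(range(probs.shape[-1]), 2):
        for_a = (w * (probs[:, :, a] > probs[:, :, b])).sum(axis=0)
        for_b = (w * (probs[:, :, b] > probs[:, :, a])).sum(axis=0)
        scores[:, a] += np.where(for_a > for_b, 1.0, np.where(for_a == for_b, 0.5, 0.0))
        scores[:, b] += np.where(for_b > for_a, 1.0, np.where(for_a == for_b, 0.5, 0.0))
    return scores


# Two members with weights 3 and 2 over classes (A, B, C).
probs = [[[0.6, 0.3, 0.1]],   # A > B > C
         [[0.1, 0.6, 0.3]]]   # B > C > A
weights = [3.0, 2.0]
print(borda(probs, weights))      # B has the highest Borda total
print(copeland(probs, weights))   # A beats both B and C pairwise
```

Borda favours B (totals 6, 7, 2 for A, B, C), whereas Copeland picks A, which wins both of its pairwise contests (scores 2, 1, 0).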
Other
| Strategy | Description |
|---|---|
switch | Per-sample, selects the single most confident model (highest max_prob) and uses its prediction and probability vector directly. No averaging. |
minimax | Per-sample, scales each model’s probability vector by its weight, then takes the element-wise minimum across all models. Behaves like a unanimous-consent combiner: any model that strongly doubts a class suppresses it. |
taxonomic_aware | Assigns weights based on the taxonomic resolution in the model path. Higher-resolution models (genus, species) receive larger weights. Path keywords (strain/species/…/domain) are detected automatically. |
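switch and minimax are simple enough to sketch directly. Both functions below are hypothetical illustrations. Renormalising the minimax output is an assumption, because the table does not state it.

```python
import numpy as np


def switch(probs):
    """Per sample, return the probability vector of the most confident member."""
    probs = np.asarray(probs, dtype=float)               # (n_models, n_samples, n_classes)
    most_confident = probs.max(axis=-1).argmax(axis=0)   # (n_samples,)
    return probs[most_confident, np.arange(probs.shape[1])]


def minimax(probs, weights):
    """Element-wise minimum of weight-scaled probabilities, renormalised."""
    scaled = np.asarray(weights, dtype=float)[:, None, None] * np.asarray(probs, dtype=float)
    lowest = scaled.min(axis=0)
    # Assumption: renormalise so each row sums to 1 (undefined if a row is all zeros).
    return lowest / lowest.sum(axis=-1, keepdims=True)


probs = [[[0.9, 0.1]],
         [[0.6, 0.4]]]
print(switch(probs))               # member 0 is more confident
print(minimax(probs, [1.0, 1.0]))  # minimum [0.6, 0.1] renormalised
```

switch copies member 0's vector unchanged. minimax keeps the element-wise minimum [0.6, 0.1] and renormalises it to roughly [0.857, 0.143].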