The data

Microbiota profiles: sparse, compositional, high-dimensional

For instance, consider a cohort in which each data point is a faecal microbiota sample originating from a subject suffering from depression; the aim is to find meaningful associations between depression and the microbiota. Each data point can be represented as a vector of taxon abundances, either raw counts or proportions, at any taxonomic rank from domain to strain. The resulting abundance profiles are often sparse (many taxa are absent from any given sample), high-dimensional (often far more taxa than samples), and, when expressed as proportions, compositional (each profile sums to one, so only the relative abundances within a sample carry information). The number of features and the degree of sparsity vary by dataset, sequencing method, and the taxonomic resolution used.
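The sketch below illustrates these properties on toy data. It collapses lineages to a chosen rank, aligns samples on a shared feature set, measures sparsity, and derives two common representations: proportions and a centred log-ratio (CLR) transform. The lineage string format, the rank list, the pseudocount of 0.5, and all counts are assumptions for illustration. None of it is taken from the framework itself.

```python
import math

# Assumed rank order encoded in the (hypothetical) "rank__name" lineage strings.
RANKS = ["domain", "phylum", "genus"]

# Toy read counts, not real data: sample -> {lineage: count}. Taxa absent from a
# sample are simply missing, which is what makes the aligned matrix sparse.
counts = {
    "s1": {"d__Bacteria;p__Firmicutes;g__Faecalibacterium": 120,
           "d__Bacteria;p__Bacteroidota;g__Bacteroides": 60},
    "s2": {"d__Bacteria;p__Firmicutes;g__Roseburia": 30,
           "d__Bacteria;p__Bacteroidota;g__Bacteroides": 90},
}


def aggregate(profile, rank):
    """Collapse lineages to `rank` by summing counts that share the same prefix."""
    depth = RANKS.index(rank) + 1
    out = {}
    for lineage, n in profile.items():
        key = ";".join(lineage.split(";")[:depth])
        out[key] = out.get(key, 0) + n
    return out


def to_matrix(profiles):
    """Align samples on the union of features, filling absent taxa with 0."""
    features = sorted({f for p in profiles.values() for f in p})
    rows = {s: [p.get(f, 0) for f in features] for s, p in profiles.items()}
    return features, rows


def sparsity(rows):
    """Fraction of zero entries in the sample-by-feature matrix."""
    values = [x for row in rows.values() for x in row]
    return sum(1 for x in values if x == 0) / len(values)


def proportions(row):
    """Relative abundances (the compositional view): each row sums to 1."""
    total = sum(row)
    return [x / total for x in row] if total else [0.0] * len(row)


def clr(row, pseudocount=0.5):
    """Centred log-ratio; the pseudocount (an assumed choice) avoids log(0)."""
    logs = [math.log(x + pseudocount) for x in row]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


genus_features, genus_rows = to_matrix({s: aggregate(p, "genus") for s, p in counts.items()})
phylum_features, phylum_rows = to_matrix({s: aggregate(p, "phylum") for s, p in counts.items()})
```

In this toy case, one third of the genus-level matrix is zeros, while the phylum-level matrix has no zeros at all. This is a small illustration of how sparsity depends on the taxonomic resolution.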

data representation × machine learning model: the joint search begins

Step 1 — MPMA discovery

Data representation meets learner: the joint search

The framework automates the search for the optimal Microbiome Profile Modelling Algorithm (MPMA): the combination of a data representation and a machine learning model that ranks highest under user-specified criteria. Two mechanisms reduce computation. First, each data representation is prepared once and shared across all learners evaluated on it. Second, evaluation gates terminate unpromising configurations early. The user specifies which learners, transformations, taxonomic levels, and evaluation protocol settings to include in the search.
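The sketch below shows both cost-saving mechanisms in a hypothetical, minimal form:

* each representation is built once and reused by every learner;
* a gate stops a configuration once its running mean score after `gate_after` folds falls below `gate_threshold`.

The function names, the gate rule, and the toy scores are assumptions. The framework's actual gate criteria and ranking rules are not specified here. Placeholder scores stand in for real cross-validation.

```python
from typing import Callable, Dict, List, Tuple

Config = Tuple[str, str]  # (representation name, learner name)


def joint_search(
    raw,
    representations: Dict[str, Callable],
    learners: List[str],
    evaluate_fold: Callable[[str, str, object, int], float],
    n_folds: int = 5,
    gate_after: int = 2,
    gate_threshold: float = 0.6,
):
    """Score every representation x learner pair, reusing prepared data and gating early."""
    results: Dict[Config, dict] = {}
    for rep_name, prepare in representations.items():
        data = prepare(raw)  # prepared once, shared by every learner below
        for learner in learners:
            scores: List[float] = []
            for fold in range(n_folds):
                scores.append(evaluate_fold(rep_name, learner, data, fold))
                # Evaluation gate (assumed rule): stop a configuration whose running
                # mean after `gate_after` folds is below the threshold.
                if fold + 1 == gate_after and sum(scores) / len(scores) < gate_threshold:
                    break
            results[(rep_name, learner)] = {
                "mean": sum(scores) / len(scores),
                "folds_run": len(scores),
                "qualified": len(scores) == n_folds,
            }
    # Only configurations that completed every fold are ranked.
    ranking = sorted(
        (cfg for cfg, r in results.items() if r["qualified"]),
        key=lambda cfg: results[cfg]["mean"],
        reverse=True,
    )
    return ranking, results


# Toy usage: hypothetical representation and learner names, placeholder scores.
prep_calls: List[str] = []


def make_representation(name: str) -> Callable:
    def prepare(raw):
        prep_calls.append(name)  # record how often each view is built
        return raw
    return prepare


TOY_SCORES = {("genus_clr", "logreg"): 0.80, ("genus_clr", "knn"): 0.55,
              ("phylum_prop", "logreg"): 0.70, ("phylum_prop", "knn"): 0.50}

ranking, results = joint_search(
    raw=None,
    representations={n: make_representation(n) for n in ["genus_clr", "phylum_prop"]},
    learners=["logreg", "knn"],
    evaluate_fold=lambda rep, learner, data, fold: TOY_SCORES[(rep, learner)],
)
```

With four configurations, each representation is prepared only twice in total rather than four times. The two `knn` configurations are stopped after two of five folds.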

the qualified configurations can enter the ensemble sweep

Step 2 — Multi-view ensemble construction

Ensemble sweep: combining complementary views

An MPMA ensemble aggregates the predictions of multiple MPMAs, drawn from different data views and selected because they complement one another.
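One common way to aggregate predictions and reward complementarity is soft voting combined with greedy forward selection. Soft voting averages predicted probabilities. Greedy forward selection adds whichever member most improves the ensemble's validation accuracy. The sketch below uses this technique as an assumed stand-in, not as the framework's documented procedure. The member names and probabilities are toy values.

```python
from typing import Dict, List, Tuple


def soft_vote(prob_lists: List[List[float]]) -> List[float]:
    """Average per-sample positive-class probabilities across ensemble members."""
    n = len(prob_lists[0])
    return [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n)]


def accuracy(probs: List[float], labels: List[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded probability matches the label."""
    return sum((p >= threshold) == bool(y) for p, y in zip(probs, labels)) / len(labels)


def greedy_select(val_probs: Dict[str, List[float]], labels: List[int],
                  max_size: int = 3) -> Tuple[List[str], float]:
    """Forward selection: add the member that most improves ensemble accuracy, stop when none does."""
    chosen: List[str] = []
    best = 0.0
    while len(chosen) < max_size:
        candidates = [m for m in val_probs if m not in chosen]
        if not candidates:
            break
        scored = [(accuracy(soft_vote([val_probs[c] for c in chosen + [m]]), labels), m)
                  for m in candidates]
        score, member = max(scored, key=lambda t: t[0])  # first best in dict order
        if score <= best:
            break
        chosen.append(member)
        best = score
    return chosen, best


# Toy validation predictions from three hypothetical MPMAs.
labels = [1, 1, 0, 0, 1, 0]
val_probs = {
    "mpma_a": [0.9, 0.8, 0.2, 0.3, 0.4, 0.6],  # errs on the last two samples
    "mpma_b": [0.6, 0.4, 0.1, 0.4, 0.9, 0.2],  # errs on the second sample
    "mpma_c": [0.7, 0.6, 0.6, 0.7, 0.3, 0.7],
}
chosen, best = greedy_select(val_probs, labels)
```

`mpma_a` alone scores only 4/6, but it is still selected second. Its errors fall on different samples from those of `mpma_b`, and averaging the two classifies all six correctly.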

explainable AI (XAI) methods are applied to the selected algorithms

Step 3a — Global XAI

Group-level explanations: four methods, results compared feature by feature
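The four global methods are not named in this section, so the sketch below uses placeholders (`method_1` to `method_4`) and toy importance scores. It shows one plausible way to compare results feature by feature:

* convert each method's importances to ranks;
* report each feature's mean rank and rank spread across methods;
* measure pairwise top-k overlap with the Jaccard index.

```python
from typing import Dict, List, Tuple


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Map feature -> rank (1 = most important) from feature -> importance."""
    ordered = sorted(scores, key=lambda f: scores[f], reverse=True)
    return {f: i + 1 for i, f in enumerate(ordered)}


def compare_methods(importances: Dict[str, Dict[str, float]]
                    ) -> List[Tuple[str, float, int, List[int]]]:
    """Per-feature (name, mean rank, rank spread, ranks per method), sorted by mean rank."""
    per_method = {m: ranks(s) for m, s in importances.items()}
    features = sorted(next(iter(importances.values())))
    table = []
    for f in features:
        r = [per_method[m][f] for m in importances]
        table.append((f, sum(r) / len(r), max(r) - min(r), r))
    table.sort(key=lambda row: row[1])
    return table


def top_k_overlap(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Jaccard overlap of two methods' top-k feature sets."""
    ta = set(sorted(a, key=a.get, reverse=True)[:k])
    tb = set(sorted(b, key=b.get, reverse=True)[:k])
    return len(ta & tb) / len(ta | tb)


# Toy importance scores from four placeholder methods (not real results).
importances = {
    "method_1": {"Faecalibacterium": 0.40, "Bacteroides": 0.30, "Roseburia": 0.20, "Prevotella": 0.10},
    "method_2": {"Faecalibacterium": 0.50, "Bacteroides": 0.15, "Roseburia": 0.30, "Prevotella": 0.05},
    "method_3": {"Faecalibacterium": 0.33, "Bacteroides": 0.35, "Roseburia": 0.20, "Prevotella": 0.12},
    "method_4": {"Faecalibacterium": 0.45, "Bacteroides": 0.25, "Roseburia": 0.10, "Prevotella": 0.20},
}
table = compare_methods(importances)
overlap_1_2 = top_k_overlap(importances["method_1"], importances["method_2"], k=2)
```

A low mean rank with a small spread marks a feature that the methods agree on. A large spread flags a feature whose importance depends on the explanation method.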

top-ranked features can be traced to individual samples

Step 3b — Local XAI

Instance-level explanations: how features contributed to each individual prediction
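For a linear (e.g. logistic regression) model, an instance-level explanation can be computed exactly. Feature j contributes w_j (x_j − mean_j) to that sample's logit, relative to a background mean. These contributions sum to the difference between the sample's logit and the background logit, and they coincide with SHAP values for a linear model under a feature-independence assumption. The sketch assumes such a model. The weights, bias, background means, and sample values are hypothetical toy numbers, and the local methods the framework actually uses are not specified here.

```python
import math
from typing import Dict, Tuple


def local_contributions(weights: Dict[str, float], bias: float, x: Dict[str, float],
                        background_mean: Dict[str, float]
                        ) -> Tuple[float, Dict[str, float], float]:
    """Split one sample's logit into per-feature contributions for a linear model.

    contribution_j = w_j * (x_j - mean_j), so base + sum(contributions) == logit(x).
    """
    contributions = {f: w * (x[f] - background_mean[f]) for f, w in weights.items()}
    base = bias + sum(w * background_mean[f] for f, w in weights.items())
    logit = bias + sum(w * x[f] for f, w in weights.items())
    return base, contributions, logit


# Hypothetical fitted weights on CLR-transformed genus abundances (toy values).
weights = {"Faecalibacterium": -1.2, "Bacteroides": 0.4, "Roseburia": -0.5}
bias = 0.1
background_mean = {"Faecalibacterium": 0.5, "Bacteroides": 0.2, "Roseburia": 0.1}
sample = {"Faecalibacterium": -0.8, "Bacteroides": 1.1, "Roseburia": 0.3}

base, contributions, logit = local_contributions(weights, bias, sample, background_mean)
probability = 1 / (1 + math.exp(-logit))  # predicted probability of the positive class
# Features ordered by how strongly they moved this prediction, in either direction.
ordered = sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)
```

Here the sample's low Faecalibacterium value contributes the most. A negative weight times a below-average value pushes the logit up.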