The data
For instance, given a cohort in which the data points are faecal microbiota samples originating from subjects suffering from depression, the aim is to find meaningful associations between depression and the microbiota. Each data point can be represented as a vector of taxon abundance counts or proportions, defined at any taxonomic rank from domain to strain. The resulting abundance profiles are often sparse and high-dimensional. The number of features and the degree of sparsity vary with the dataset, the sequencing method, and the taxonomic resolution used.
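As a minimal sketch of this representation, the snippet below turns one sample's taxon counts into proportions and measures its sparsity as the fraction of zero entries. The taxon list, the counts, and the helper names `to_proportions` and `sparsity` are illustrative assumptions, not part of the framework.

```python
from typing import List


def to_proportions(counts: List[int]) -> List[float]:
    """Convert raw taxon counts for one sample into relative abundances."""
    total = sum(counts)
    if total == 0:
        # An empty sample has no meaningful proportions; return all zeros.
        return [0.0] * len(counts)
    return [c / total for c in counts]


def sparsity(counts: List[int]) -> float:
    """Fraction of taxa with a zero count in this sample."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c == 0) / len(counts)


# Made-up genus-level counts for a single faecal sample (illustration only).
taxa = ["Bacteroides", "Faecalibacterium", "Prevotella", "Akkermansia", "Coprococcus"]
sample_counts = [120, 60, 0, 20, 0]

proportions = to_proportions(sample_counts)
sample_sparsity = sparsity(sample_counts)

for name, p in zip(taxa, proportions):
    print(f"{name}: {p:.2f}")
print(f"sparsity: {sample_sparsity:.2f}")
```

Real profiles typically contain far more taxa than this toy vector, which is where the sparsity and dimensionality described above become a practical concern.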
Data representation × machine learning model: the joint search begins
Step 1 — MPMA discovery
The framework automates the search for the optimal Microbiome Profile Modelling Algorithm (MPMA), the combination of data representation and machine learning model that ranks highest under user-specified criteria. Two mechanisms reduce computation: each data representation is prepared once and shared across all learners evaluated on it, and evaluation gates terminate unpromising configurations early. The user specifies which learners, transformations, taxonomic levels, and evaluation protocol settings to include in the search.
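The two cost-saving mechanisms can be sketched as follows, under stated assumptions: every name here (`search`, `make_stub_learner`, `gate_after`, `gate_threshold`) is hypothetical, the learners are stubs that emit fixed per-fold scores, and the gate is a simple mean-score cut-off after a fixed number of folds. The framework's actual gating criteria are not specified in this section.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Matrix = List[List[float]]
# Hypothetical interface: a learner yields one score per cross-validation fold.
Learner = Callable[[Matrix, List[int]], Iterator[float]]

prepare_calls: Dict[str, int] = {}  # counts how often each representation is built


def relative_abundance(raw: List[List[int]]) -> Matrix:
    prepare_calls["relative_abundance"] = prepare_calls.get("relative_abundance", 0) + 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in raw]


def presence_absence(raw: List[List[int]]) -> Matrix:
    prepare_calls["presence_absence"] = prepare_calls.get("presence_absence", 0) + 1
    return [[1.0 if c > 0 else 0.0 for c in row] for row in raw]


def make_stub_learner(fold_scores: List[float]) -> Learner:
    """Stand-in for a real model: ignores the data and emits fixed fold scores."""
    def learner(features: Matrix, labels: List[int]) -> Iterator[float]:
        for score in fold_scores:
            yield score
    return learner


def search(raw, labels, representations, learners,
           gate_after: int = 2, gate_threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    results = []
    for rep_name, prepare in representations.items():
        features = prepare(raw)  # prepared once, then shared by every learner
        for learner_name, learner in learners.items():
            scores: List[float] = []
            gated = False
            for score in learner(features, labels):
                scores.append(score)
                # Evaluation gate: stop early if the first folds look unpromising.
                if len(scores) == gate_after and sum(scores) / len(scores) < gate_threshold:
                    gated = True
                    break
            if scores and not gated:
                results.append((rep_name, learner_name, sum(scores) / len(scores)))
    # Rank surviving configurations by mean score, best first.
    return sorted(results, key=lambda r: r[2], reverse=True)


raw_counts = [[120, 60, 0], [0, 30, 70], [10, 0, 90], [50, 50, 0]]
labels = [1, 0, 0, 1]
representations = {"relative_abundance": relative_abundance,
                   "presence_absence": presence_absence}
learners = {"stub_strong": make_stub_learner([0.8, 0.7, 0.9]),
            "stub_weak": make_stub_learner([0.4, 0.3, 0.9])}

ranking = search(raw_counts, labels, representations, learners)
for rep_name, learner_name, mean_score in ranking:
    print(rep_name, learner_name, round(mean_score, 3))
```

In this toy run the weak stub is stopped after two folds and never reaches the ranking, while each representation is built exactly once even though two learners consume it.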
The qualified configurations can enter the ensemble sweep
Step 2 — Multi-view ensemble construction
An MPMAs Ensemble aggregates predictions from multiple MPMAs selected for complementarity.
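One way to picture this step is the sketch below, which assumes that complementarity is measured as the disagreement rate between thresholded predictions and that aggregation is a plain average of predicted probabilities (soft voting). Both choices, along with the MPMA names and all numbers, are illustrative assumptions; the section does not state the selection or aggregation rule actually used.

```python
from typing import Dict, List


def disagreement(p: List[float], q: List[float], threshold: float = 0.5) -> float:
    """Share of samples on which two MPMAs predict different classes."""
    return sum((a >= threshold) != (b >= threshold) for a, b in zip(p, q)) / len(p)


def select_complementary(probs: Dict[str, List[float]],
                         scores: Dict[str, float], size: int) -> List[str]:
    """Greedy selection: start from the best-scoring MPMA, then repeatedly add
    the candidate that disagrees most, on average, with those already chosen."""
    selected = [max(scores, key=scores.get)]
    while len(selected) < min(size, len(probs)):
        remaining = [name for name in probs if name not in selected]
        best = max(
            remaining,
            key=lambda n: sum(disagreement(probs[n], probs[s]) for s in selected) / len(selected),
        )
        selected.append(best)
    return selected


def soft_vote(probs: Dict[str, List[float]], members: List[str]) -> List[float]:
    """Average the predicted probabilities of the selected members per sample."""
    n_samples = len(probs[members[0]])
    return [sum(probs[m][i] for m in members) / len(members) for i in range(n_samples)]


# Hypothetical per-sample probabilities from three MPMAs (made-up numbers).
probs = {
    "genus_clr_rf": [0.9, 0.8, 0.2, 0.1],
    "genus_clr_svm": [0.8, 0.9, 0.3, 0.2],
    "species_pa_lr": [0.7, 0.4, 0.6, 0.1],
}
scores = {"genus_clr_rf": 0.85, "genus_clr_svm": 0.83, "species_pa_lr": 0.78}

members = select_complementary(probs, scores, size=2)
ensemble_probs = soft_vote(probs, members)
print(members)
print([round(p, 2) for p in ensemble_probs])
```

Here the second-best individual model is passed over because it makes the same class calls as the best one, while the third adds a different view of the data.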
Explainable AI (XAI) methods are applied to the algorithms
Step 3a — Global XAI
Top-ranked features can be traced to individual samples
Step 3b — Local XAI