Overview

This tutorial shows how to take a trained ensemble from the IBD experiment and serve it through a REST API. It starts from the results/ibd_franzosa/ directory produced by the single-modality microbiome tutorial.

Prerequisites

A completed experiment with an ensemble/ensemble_summary.json file (produced by sweep_ibd.py).
The original training data (FRANZOSA_IBD_2019_profiles_hierarchical.tsv and metadata.tsv).
The app extras installed:


uv pip install -e ".[app]"
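Before running anything, it can help to confirm that the prerequisite files are in place. The sketch below assumes the summary lives at results/ibd_franzosa/ensemble/ensemble_summary.json (combining the two paths above). DATA_DIR is a hypothetical location for the TSV files, so adjust it to wherever they sit in your checkout.

```python
from pathlib import Path

# Summary path assumed from the experiment directory named above.
EXPERIMENT_DIR = Path("results/ibd_franzosa")
# Hypothetical location of the training data; change to match your setup.
DATA_DIR = Path("data/franzosa_ibd")

REQUIRED = [
    EXPERIMENT_DIR / "ensemble" / "ensemble_summary.json",
    DATA_DIR / "FRANZOSA_IBD_2019_profiles_hierarchical.tsv",
    DATA_DIR / "metadata.tsv",
]


def missing_prerequisites(paths=REQUIRED):
    """Return the subset of required files that do not exist yet."""
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        for p in missing:
            print(f"missing: {p}")
    else:
        print("all prerequisites found")
```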

What happens during deployment

During cross-validation, each model is trained on a subset of the data. Deploying an ensemble means retraining every member on the full training set, saving the resulting artefacts, and placing them where the backend can discover them.
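To make the difference concrete: under k-fold cross-validation each model only sees the samples outside its held-out fold, whereas the deployed model sees all of them. The sketch below assumes 5-fold CV purely for illustration; the scheme actually used by sweep_ibd.py is not stated here.

```python
def kfold_train_sizes(n_samples: int, k: int) -> list[int]:
    """Number of training samples seen by each of the k CV models.

    Samples are split into k folds as evenly as possible (the first
    n_samples % k folds get one extra sample); each model trains on
    every fold except the one it holds out.
    """
    base, extra = divmod(n_samples, k)
    fold_sizes = [base + 1 if i < extra else base for i in range(k)]
    return [n_samples - held_out for held_out in fold_sizes]


n = 220  # samples in the IBD training set
print("CV models train on:", kfold_train_sizes(n, k=5))  # 5-fold is an assumption
print("deployed model trains on:", n)
```

With 220 samples and 5 folds, each CV model would train on 176 samples, so every refit member sees 44 more samples than its CV counterpart.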

The pipeline:

1. Load the ensemble summary. ProductionPipeline reads ensemble_summary.json to identify the selected models and the aggregation strategy.
2. Retrain on the full data. Each model is retrained with its original preprocessing (taxonomic filtering and compositional transformation), but on all 220 samples rather than on the training portion of a single CV fold.
3. Save the artefacts. The serialised models, the ensemble configuration, and the per-model pipeline settings are written to a directory.
4. Serve via the REST API. The backend loads the artefacts and exposes the /api/inference/predict, /api/inference/predict/batch, and /api/inference/explain endpoints.
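The first three steps can be sketched in plain Python to show the load, refit, save sequence. This is not the ProductionPipeline implementation: the summary keys (members, aggregation), the MeanClassifier stand-in, and the output file names (model_<i>.pkl, ensemble_config.json) are all hypothetical.

```python
import json
import pickle
from pathlib import Path


class MeanClassifier:
    """Hypothetical stand-in model that predicts the training positive rate.

    In a real deployment this would be the member's XGBoost or
    random forest estimator.
    """

    def fit(self, X, y):
        self.rate_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [self.rate_ for _ in X]


def deploy(summary_path, X, y, out_dir):
    # Step 1: load the ensemble summary (key names are hypothetical).
    summary = json.loads(Path(summary_path).read_text())
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    members = []
    for i, member in enumerate(summary["members"]):
        # Step 2: refit on every sample. Real code would first apply the
        # member's taxonomic filter and transform to X.
        model = MeanClassifier().fit(X, y)
        # Step 3: serialise the fitted model and record its settings.
        model_file = out_dir / f"model_{i}.pkl"
        model_file.write_bytes(pickle.dumps(model))
        members.append({**member, "artefact": model_file.name})

    config = {"aggregation": summary["aggregation"], "members": members}
    (out_dir / "ensemble_config.json").write_text(json.dumps(config, indent=2))
    return config
```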

Ensemble used in this tutorial

The IBD experiment produced a 5-model Copeland ensemble with an inner-validation HALO score of 0.685:

Model	Taxonomy	Transform
XGBoost	genus (excl. Chloroplast)	none
Random forest	genus (excl. Chloroplast)	relative + arcsin-sqrt
XGBoost	genus	binary
XGBoost	order-family (no aggregation)	none
XGBoost	genus	relative + arcsin-sqrt
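The transforms in the table can be illustrated with their textbook definitions, applied to one sample's taxon counts ("none" leaves the values untouched). This is a sketch under assumptions: the order in which the library applies these steps relative to taxonomic aggregation, and how it treats empty samples, are not described here.

```python
import math


def relative(counts):
    """Convert one sample's counts to proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # empty sample: keep all zeros
    return [c / total for c in counts]


def arcsin_sqrt(proportions):
    """Variance-stabilising transform for proportions in [0, 1]."""
    return [math.asin(math.sqrt(p)) for p in proportions]


def binary(counts):
    """Presence/absence: 1 if the taxon was observed, else 0."""
    return [1 if c > 0 else 0 for c in counts]


sample = [30, 0, 10, 60]  # made-up genus counts for a single sample
print(relative(sample))
print(arcsin_sqrt(relative(sample)))
print(binary(sample))
```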

Tutorial structure

Page	Content
Packaging models	Retrain the ensemble on full data and save the artefacts.
Serving predictions	Start the backend and query the inference API.
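As a preview of querying the API, the sketch below uses only the Python standard library to POST to /api/inference/predict. The host and port (localhost:8000) and the request body (a JSON object with a single "features" mapping) are assumptions; check the backend's actual request schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical host and port


def build_predict_request(features: dict, base_url: str = BASE_URL):
    """Build a POST request for /api/inference/predict.

    The {"features": {...}} body shape is an assumption, not the
    documented schema.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/inference/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request to a running backend and decode the JSON reply."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once the backend is running, a call such as predict({"g__Bacteroides": 0.12}) would return the decoded response; the feature key there is purely illustrative.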

Complete script

The full deployment script is at example/IBD/deploy_ibd.py.