Overview

This tutorial demonstrates federated learning with mllabiome by splitting the IBD dataset (Franzosa et al., 2019) across three simulated client sites and training a global XGBoost model without any site sharing its raw data.

Prerequisites

The IBD data at example/IBD/data/ (the same files used in the single-modality tutorial).
The app extras installed:


uv pip install -e ".[app]"

Federated learning in mllabiome

In a standard experiment, all data is available in one place. Federated learning handles the case where data is distributed across multiple sites that cannot share raw samples. Instead of moving data, each site trains a local model and sends only its model parameters to a central coordinator. The coordinator aggregates these updates with federated averaging (FedAvg) and distributes the new global model back to the sites. After several rounds, the global model has learned from every site's data, yet no site has exposed its raw samples.
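
The aggregation step described above can be illustrated with a minimal sketch of the FedAvg rule: a weighted average of client parameter vectors, with each site weighted by its sample count. The function name `fedavg`, the flat-list parameter format, and the example values are assumptions made for illustration. How mllabiome serializes XGBoost models into parameters is not covered here, so this is not the library's implementation.

```python
from typing import List, Sequence, Tuple


def fedavg(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Return the sample-weighted average of client parameter vectors.

    Each update is a (parameters, num_samples) pair reported by one site.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(updates[0][0])
    global_params = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("all clients must report vectors of equal length")
        # Each site contributes in proportion to its share of the samples.
        for i, value in enumerate(params):
            global_params[i] += value * n / total
    return global_params


# Hypothetical updates from three sites (values are made up for illustration).
site_updates = [([1.0, 0.0], 73), ([2.0, 3.0], 73), ([4.0, 6.0], 74)]
global_params = fedavg(site_updates)
```

Only the parameter vectors and sample counts reach the coordinator in this sketch; the samples themselves never leave the site that holds them.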

mllabiome uses the Flower framework for this communication. The coordinator runs a Flower server, and each client site runs a Flower client that loads its local data, trains on it, and reports its updated parameters to the server.

Simulated setup

The Franzosa IBD cohort contains 220 samples (88 CD, 76 UC, 56 Control). The data-splitting script partitions these into three client sites with stratified random assignment:

Client	Samples	Purpose
`client_1`	~73	Site 1
`client_2`	~73	Site 2
`client_3`	~74	Site 3

Each client receives its own profile.tsv and metadata.tsv files. The class distribution is approximately preserved at each site.
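
The stratified assignment described above can be sketched as follows. This is an illustrative stand-in, not the code in example/IBD/split_ibd_federated.py: the function name `stratified_split`, the seed, and the synthetic sample IDs are assumptions. It shuffles the samples within each class and deals them out to the clients in turn, so per-class counts and overall client sizes each differ by at most one.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_split(
    sample_ids: Sequence[str],
    labels: Sequence[str],
    n_clients: int = 3,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign samples to clients so each class is spread evenly across sites."""
    rng = random.Random(seed)

    # Group sample IDs by class label.
    by_class: Dict[str, List[str]] = defaultdict(list)
    for sample_id, label in zip(sample_ids, labels):
        by_class[label].append(sample_id)

    names = [f"client_{i + 1}" for i in range(n_clients)]
    clients: Dict[str, List[str]] = {name: [] for name in names}

    # A running position carried across classes keeps total sizes balanced.
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for sample_id in members:
            clients[names[position % n_clients]].append(sample_id)
            position += 1
    return clients


# Synthetic IDs with the class counts of the cohort (88 CD, 76 UC, 56 Control).
labels = ["CD"] * 88 + ["UC"] * 76 + ["Control"] * 56
sample_ids = [f"S{i:03d}" for i in range(len(labels))]
splits = stratified_split(sample_ids, labels)
```

In a real split, the returned ID lists would then be used to subset profile.tsv and metadata.tsv for each client directory.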

Workflow

Split the data into per-client directories.
Start the backend (coordinator + Flower server).
Register clients through the web interface or API.
Create a training session from the server admin dashboard.
Submit consent from each client.
Run Flower clients locally, each pointing at its own data split.
Monitor training through the analytics dashboard.

Tutorial structure

Page	Content
Splitting data across clients	Create per-client data partitions from the IBD dataset.
Running the session	Register clients, create a session, run training, and view results.

Scripts

Data splitting: example/IBD/split_ibd_federated.py
Flower client runner: example/IBD/run_flower_client.py