Overview
This tutorial demonstrates federated learning with mllabiome by splitting the IBD dataset (Franzosa et al., 2019) across three simulated client sites and training a global XGBoost model without any site sharing its raw data.
Prerequisites
- The IBD data at example/IBD/data/ (the same files used in the single-modality tutorial).
- The app extras installed:
  uv pip install -e ".[app]"
Federated learning in mllabiome
In a standard experiment, all data is available in one place. Federated learning handles the case where data is distributed across multiple sites that cannot share raw samples. Instead of moving data, each site trains a local model and sends only the model parameters to a central coordinator. The coordinator aggregates the updates (FedAvg) and distributes the new global model back to the sites. After several rounds, the global model benefits from all sites without any site ever exposing its data.
mllabiome uses the Flower framework for this communication. The coordinator runs a Flower server, and each client site runs a Flower client that loads its local data, trains locally, and reports parameters.
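The aggregation step described above can be illustrated with a minimal sketch of FedAvg: each site's parameters are averaged, weighted by how many samples that site trained on. This is not mllabiome's or Flower's implementation. The function name fedavg, the flat-list parameter format, and the example values are assumptions made for illustration, and aggregating a tree model such as XGBoost may work differently in practice.

```python
from typing import List, Sequence, Tuple


def fedavg(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Each update is (parameters, n_samples). Hypothetical format for illustration.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("clients reported zero samples in total")
    dim = len(updates[0][0])
    global_params = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, value in enumerate(params):
            # A site with more samples contributes proportionally more.
            global_params[i] += value * n / total
    return global_params


# Made-up parameter vectors from three sites with 73, 73, and 74 samples.
updates = [([1.0, 2.0], 73), ([2.0, 4.0], 73), ([3.0, 6.0], 74)]
global_params = fedavg(updates)
print(global_params)
```

Only the parameter vectors and sample counts reach the coordinator in this sketch; the raw samples never leave the sites.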
Simulated setup
The Franzosa IBD cohort contains 220 samples (88 CD, 76 UC, 56 Control). The data-splitting script partitions these into three client sites with stratified random assignment:
| Client | Samples | Purpose |
|---|---|---|
| client_1 | ~73 | Site 1 |
| client_2 | ~73 | Site 2 |
| client_3 | ~74 | Site 3 |
Each client receives its own profile.tsv and metadata.tsv files. The class distribution is approximately preserved at each site.
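To show what stratified random assignment means here, the following is a minimal sketch that shuffles the samples of each class and deals them out to the clients in turn. It is not the code in example/IBD/split_ibd_federated.py. The function name stratified_split, the seed, and the synthetic label list are assumptions for illustration; only the class counts (88 CD, 76 UC, 56 Control) come from the text above.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, n_clients=3, seed=0):
    """Assign sample indices to clients so each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(n_clients)]
    offset = 0  # continue the round-robin across classes to balance totals
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within the class
        for j, idx in enumerate(indices):
            clients[(offset + j) % n_clients].append(idx)
        offset += len(indices)
    return clients


# Synthetic labels matching the cohort's class counts.
labels = ["CD"] * 88 + ["UC"] * 76 + ["Control"] * 56
clients = stratified_split(labels)

for number, indices in enumerate(clients, start=1):
    counts = Counter(labels[i] for i in indices)
    print(f"client_{number}: {len(indices)} samples, {dict(counts)}")
```

Because each class is dealt out separately, per-class counts differ by at most one sample between clients, which is why the class distribution is only approximately preserved.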
Workflow
- Split the data into per-client directories.
- Start the backend (coordinator + Flower server).
- Register clients through the web interface or API.
- Create a training session from the server admin dashboard.
- Submit consent from each client.
- Run Flower clients locally, each pointing at its own data split.
- Monitor training through the analytics dashboard.
Tutorial structure
| Page | Content |
|---|---|
| Splitting data across clients | Create per-client data partitions from the IBD dataset. |
| Running the session | Register clients, create a session, run training, and view results. |
Scripts
- Data splitting: example/IBD/split_ibd_federated.py
- Flower client runner: example/IBD/run_flower_client.py