Model deployment and inference
After running an experiment and identifying the best-performing ensemble via ensemble_sweep, the next step is to retrain those models on the full training set and package them for prototype deployment. mllabiome provides ProductionPipeline for this workflow and InferenceEngine / ModelLoader for loading and running predictions.
Note: The deployment tooling described here is designed for prototype and internal use. It is not hardened for public-facing production environments.
Installation
The app extras are required:
uv pip install -e ".[app]"
This adds FastAPI, uvicorn, python-multipart, httpx, and Flower.
From experiment to deployment
The ProductionPipeline takes an experiment directory that already contains ensemble_summary.json (produced by ensemble_sweep) and retrains the selected models on the full dataset.
from mllabiome.production import ProductionPipeline
pipeline = ProductionPipeline(
    experiment_dir="results/my_experiment",
)
ProductionPipeline searches for ensemble_summary.json in the experiment directory (under ensemble_results_inner_val/ or ensemble_results/). The summary file identifies which models to include and the aggregation strategy to use.
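The lookup described above can be sketched with the standard library. This is a minimal illustration, not mllabiome's implementation: `locate_ensemble_summary` is a hypothetical helper, and the order in which the two subdirectories are checked is an assumption.

```python
from pathlib import Path
from typing import Optional

# Subdirectories named in the text above; the search order is an assumption.
CANDIDATE_SUBDIRS = ("ensemble_results_inner_val", "ensemble_results")


def locate_ensemble_summary(experiment_dir: str) -> Optional[Path]:
    """Return the first ensemble_summary.json found, or None if absent.

    Hypothetical helper for illustration; not part of mllabiome.
    """
    root = Path(experiment_dir)
    for subdir in CANDIDATE_SUBDIRS:
        candidate = root / subdir / "ensemble_summary.json"
        if candidate.is_file():
            return candidate
    return None
```

Running a check like this before constructing the pipeline is a cheap way to confirm that ensemble_sweep has already been run for the experiment.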
Retraining on full data
During cross-validation, each model is trained on a subset of the data. For deployment, the same models are retrained on all available training samples:
pipeline.train_production_models(
    train_data_path="data/profiles.tsv",
    train_targets_path="data/metadata.tsv",
    sample_id_column="Sample",
    target_column="Study.Group",
    save_dir="production_models/",
)
This performs the following for each model in the ensemble:
- Loads the raw training data.
- Applies the same preprocessing (taxonomic filtering, compositional transformation) that was used during the experiment.
- Aligns features to the training feature set (missing features are zero-filled).
- Trains the model on the full dataset.
- Saves the trained model, ensemble configuration, and pipeline configuration to save_dir.
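The feature-alignment step in the list above can be illustrated with plain Python. This is a sketch under stated assumptions rather than the library's code: `align_features` is a hypothetical helper, a sample is modelled as a dict of feature name to abundance, and dropping features that were not seen during training is an assumption (the text only states that missing features are zero-filled).

```python
from typing import Dict, List, Sequence


def align_features(
    sample: Dict[str, float], training_features: Sequence[str]
) -> List[float]:
    """Order a sample's values by the training feature list.

    Features absent from the sample are zero-filled. Features not in the
    training list are dropped, which is an assumption for this sketch.
    """
    return [float(sample.get(name, 0.0)) for name in training_features]


# Example: "b" is missing from the sample and "z" was never seen in training.
aligned = align_features({"a": 1.0, "c": 3.0, "z": 9.0}, ["a", "b", "c"])
```

Aligning to a fixed, ordered feature list is what lets a model trained on one table accept a new table whose columns differ.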
Output files
After train_production_models completes, save_dir contains:
production_models/
    {model_name}.pkl        # serialized trained model (one per ensemble member)
    ensemble_config.json    # aggregation strategy, model list, score
    pipeline_configs.json   # preprocessing settings per model
Running inference
Once trained, the pipeline can predict on new data:
predictions = pipeline.predict(
    test_data_path="data/new_samples.tsv",
    sample_id_column="Sample",
    return_probabilities=True,
)
The returned DataFrame contains one row per sample with the ensemble prediction and, when return_probabilities=True, per-class probability estimates.
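To make the relationship between per-class probabilities and the ensemble prediction concrete, here is a minimal sketch of one common aggregation strategy, soft voting (averaging member probabilities). This is an assumption for illustration only: the strategy actually used is whatever ensemble_summary.json records, and `soft_vote` and the class labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def soft_vote(
    member_probs: Sequence[Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Average per-class probabilities across members and pick the top class.

    Illustrative only; not necessarily the strategy mllabiome applies.
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    classes = sorted(member_probs[0])
    mean = {
        label: sum(p[label] for p in member_probs) / len(member_probs)
        for label in classes
    }
    predicted = max(classes, key=lambda label: mean[label])
    return predicted, mean


# Two hypothetical members scoring one sample.
label, probs = soft_vote(
    [{"case": 0.8, "control": 0.2}, {"case": 0.4, "control": 0.6}]
)
```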
Loading models from an experiment
ModelLoader provides utilities for finding and loading individual models or ensembles directly from an experiment directory without the full ProductionPipeline.
Best single model
import mllabiome as mll
metadata = mll.ModelLoader.find_best_model(
    experiment_dir="results/my_experiment",
    metric="test_nMCC",
)
find_best_model reads evaluation_results.csv, ranks by the specified metric, and returns a ModelMetadata object containing the model name, path, score, and configuration.
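The ranking step can be approximated with the standard csv module. The sketch below is not the library's implementation: `best_row_by_metric` is a hypothetical helper, it assumes that higher metric values are better, and the layout of evaluation_results.csv beyond the metric column is not specified here.

```python
import csv
from typing import Dict, Optional


def best_row_by_metric(csv_path: str, metric: str) -> Optional[Dict[str, str]]:
    """Return the row with the highest value in the metric column.

    Assumes higher is better; rows whose metric is missing, empty,
    non-numeric, or NaN are skipped.
    """
    best_row, best_score = None, float("-inf")
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                score = float(row[metric])
            except (KeyError, TypeError, ValueError):
                continue
            # NaN never compares greater, so NaN rows are skipped here.
            if score > best_score:
                best_row, best_score = row, score
    return best_row
```

Inspecting the CSV this way can be useful for sanity-checking which model find_best_model is expected to select.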
Best ensemble
metadata = mll.ModelLoader.find_best_ensemble(
    experiment_dir="results/my_experiment",
    metric="test_nMCC",
)
find_best_ensemble returns an EnsembleMetadata object containing the aggregation method, number of models, constituent model names, and the ensemble score.
InferenceEngine
InferenceEngine wraps a loaded model or ensemble for batch prediction:
engine = mll.InferenceEngine(metadata, experiment_dir="results/my_experiment")
predictions = engine.predict_batch(test_data)
Serving via the REST API
The backend application loads deployed models from app/backend/production_models/. Each model is represented by a pair of files:
| File | Contents |
|---|---|
| {model_id}_config.json | Feature names, transformation settings, class labels, display name |
| {model_id}_models.joblib | Serialized model ensemble |
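The file-pair convention in the table above suggests how deployed models could be discovered on disk. The following is a sketch only: `discover_model_ids` is a hypothetical helper, and requiring both files to be present before listing a model is an assumption about how the backend behaves.

```python
from pathlib import Path
from typing import List

CONFIG_SUFFIX = "_config.json"
MODELS_SUFFIX = "_models.joblib"


def discover_model_ids(models_dir: str) -> List[str]:
    """Return sorted model IDs that have both a config and a models file."""
    root = Path(models_dir)
    ids = []
    for config_path in root.glob(f"*{CONFIG_SUFFIX}"):
        # Strip the suffix to recover the {model_id} part of the filename.
        model_id = config_path.name[: -len(CONFIG_SUFFIX)]
        if model_id and (root / f"{model_id}{MODELS_SUFFIX}").is_file():
            ids.append(model_id)
    return sorted(ids)
```

A check like this helps spot incomplete deployments, such as a config file copied without its serialized ensemble.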
The inference API exposes the following endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /api/inference/models | List available deployed models |
| POST | /api/inference/predict | Single-sample prediction |
| POST | /api/inference/predict/batch | Batch prediction from an uploaded TSV/CSV file |
| POST | /api/inference/explain | LIME feature-importance explanations |
| POST | /api/inference/explain/plot | Generate a LIME bar chart |
The batch endpoint applies the full mllabiome preprocessing pipeline (taxonomic filtering, transformation, feature alignment) before running the ensemble, which keeps predictions consistent with the training configuration stored in {model_id}_config.json.
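As a quick smoke test of a running backend, the model-listing endpoint can be queried with the standard library. This is a minimal sketch: the base URL `http://localhost:8000` is an assumption, `models_url` and `list_models` are hypothetical helpers, and the response is returned as parsed JSON without assuming its shape. Request bodies for the POST endpoints are not documented here, so they are left out.

```python
import json
from urllib.request import urlopen

MODELS_PATH = "/api/inference/models"  # path taken from the endpoint table


def models_url(base_url: str) -> str:
    """Build the model-listing URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + MODELS_PATH


def list_models(base_url: str = "http://localhost:8000", timeout: float = 10.0):
    """GET the list of deployed models and return the decoded JSON body.

    The default base URL is an assumption; adjust it to your deployment.
    """
    with urlopen(models_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `list_models()` requires the backend to be running; `models_url` alone has no network side effects.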