Model deployment and inference

After running an experiment and identifying the best-performing ensemble with ensemble_sweep, the next step is to retrain those models on the full training set and package them for prototype deployment. mllabiome provides ProductionPipeline for this workflow, ModelLoader for locating and loading trained models, and InferenceEngine for running predictions.

Note: The deployment tooling described here is designed for prototype and internal use. It is not hardened for public-facing production environments.

Installation

The app extras are required:


uv pip install -e ".[app]"

This adds FastAPI, uvicorn, python-multipart, httpx, and Flower.

From experiment to deployment

The ProductionPipeline takes an experiment directory that already contains ensemble_summary.json (produced by ensemble_sweep) and retrains the selected models on the full dataset.


from mllabiome.production import ProductionPipeline

# experiment_dir must already contain the output of ensemble_sweep
pipeline = ProductionPipeline(
    experiment_dir="results/my_experiment",
)

ProductionPipeline searches for ensemble_summary.json in the experiment directory (under ensemble_results_inner_val/ or ensemble_results/). The summary file identifies which models to include and the aggregation strategy to use.
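
The lookup described above can be approximated with a few lines of standard-library Python. This is a minimal sketch, not mllabiome's actual implementation: the helper name locate_ensemble_summary is hypothetical, and the search order (ensemble_results_inner_val/ first, then ensemble_results/) is an assumption based on the order in which the two directories are listed here.

```python
from pathlib import Path
from typing import Optional

# Subdirectories named in this guide; checking them in this order is an assumption.
CANDIDATE_SUBDIRS = ("ensemble_results_inner_val", "ensemble_results")


def locate_ensemble_summary(experiment_dir: str) -> Optional[Path]:
    """Return the first ensemble_summary.json found, or None (hypothetical helper)."""
    root = Path(experiment_dir)
    for subdir in CANDIDATE_SUBDIRS:
        candidate = root / subdir / "ensemble_summary.json"
        if candidate.is_file():
            return candidate
    return None
```

A check like this can confirm that ensemble_sweep has been run on an experiment directory before a ProductionPipeline is constructed for it.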

Retraining on full data

During cross-validation, each model is trained on a subset of the data. For deployment, the same models are retrained on all available training samples:


pipeline.train_production_models(
    train_data_path="data/profiles.tsv",
    train_targets_path="data/metadata.tsv",
    sample_id_column="Sample",
    target_column="Study.Group",
    save_dir="production_models/",
)

For each model in the ensemble, train_production_models performs the following steps:

Loads the raw training data.
Applies the same preprocessing (taxonomic filtering, compositional transformation) that was used during the experiment.
Aligns features to the training feature set (missing features are zero-filled).
Trains the model on the full dataset.
Saves the trained model, ensemble configuration, and pipeline configuration to save_dir.
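
The feature-alignment step is the easiest one to get wrong when reproducing the workflow by hand, so it may help to see it in isolation. The sketch below is an illustration under stated assumptions rather than mllabiome code: align_features is a hypothetical helper, the taxon names and values are made up, and dropping features that were not seen during training is an assumption (the guide only states that missing features are zero-filled).

```python
from typing import Dict, List, Sequence


def align_features(
    sample: Dict[str, float], training_features: Sequence[str]
) -> List[float]:
    """Order one sample's values by the training feature list (hypothetical helper).

    Features absent from the sample are zero-filled; features the model never
    saw during training are dropped (the dropping behaviour is an assumption).
    """
    return [float(sample.get(name, 0.0)) for name in training_features]


training_features = ["Bacteroides", "Prevotella", "Roseburia"]
new_sample = {"Prevotella": 0.4, "Akkermansia": 0.6}  # illustrative values only

aligned = align_features(new_sample, training_features)
print(aligned)  # [0.0, 0.4, 0.0]
```

Fixing the column order to the training feature list matters because most serialized models identify features by position, not by name.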

Output files

After train_production_models completes, save_dir contains:


production_models/
  {model_name}.pkl          # serialized trained model (one per ensemble member)
  ensemble_config.json      # aggregation strategy, model list, score
  pipeline_configs.json     # preprocessing settings per model
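
To verify that a training run produced the expected artifacts, the directory can be inspected with the standard library alone. The following is a rough sketch: summarize_save_dir is a hypothetical helper, and it makes no assumption about the keys inside the two JSON files. It deliberately does not unpickle the models, since pickle files should only be loaded from trusted sources.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_save_dir(save_dir: str) -> Dict[str, Any]:
    """List the artifacts found in a production model directory (hypothetical helper)."""
    root = Path(save_dir)
    summary: Dict[str, Any] = {
        # One .pkl file per ensemble member; the file stem is the model name.
        "models": sorted(p.stem for p in root.glob("*.pkl")),
    }
    for config_name in ("ensemble_config.json", "pipeline_configs.json"):
        config_path = root / config_name
        # Load each JSON file as-is; a missing file is reported as None.
        summary[config_name] = (
            json.loads(config_path.read_text(encoding="utf-8"))
            if config_path.is_file()
            else None
        )
    return summary
```

Calling summarize_save_dir("production_models/") after training should list one model name per ensemble member and return both configuration files as parsed JSON.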

Running inference

Once the production models are trained, the pipeline can generate predictions for new data:


predictions = pipeline.predict(
    test_data_path="data/new_samples.tsv",
    sample_id_column="Sample",
    return_probabilities=True,
)

The returned DataFrame contains one row per sample with the ensemble prediction and, when return_probabilities=True, per-class probability estimates.

Loading models from an experiment

ModelLoader provides utilities for finding and loading individual models or ensembles directly from an experiment directory without the full ProductionPipeline.

Best single model


import mllabiome as mll

# Rank the models in evaluation_results.csv by the chosen metric
metadata = mll.ModelLoader.find_best_model(
    experiment_dir="results/my_experiment",
    metric="test_nMCC",
)

find_best_model reads evaluation_results.csv, ranks by the specified metric, and returns a ModelMetadata object containing the model name, path, score, and configuration.
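
The ranking step can be reproduced independently to sanity-check which model would be selected. This is a minimal sketch and not the library's implementation: best_row_by_metric is a hypothetical helper, it assumes that evaluation_results.csv has one row per model with the metric as a column, and it assumes that a higher metric value is better.

```python
import csv
from typing import Dict, Optional


def best_row_by_metric(
    csv_path: str, metric: str = "test_nMCC"
) -> Optional[Dict[str, str]]:
    """Return the row with the highest value of `metric`, or None if no row has one."""
    best_row = None
    best_score = float("-inf")
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                score = float(row[metric])
            except (KeyError, TypeError, ValueError):
                continue  # skip rows with a missing or non-numeric metric
            if score > best_score:
                best_row, best_score = row, score
    return best_row
```

The returned dictionary holds the raw CSV fields of the winning row, which can be compared against the ModelMetadata object returned by find_best_model.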

Best ensemble


metadata = mll.ModelLoader.find_best_ensemble(
    experiment_dir="results/my_experiment",
    metric="test_nMCC",
)

find_best_ensemble returns an EnsembleMetadata object containing the aggregation method, the number of models, the constituent model names, and the ensemble score.

InferenceEngine

InferenceEngine wraps a loaded model or ensemble for batch prediction:


engine = mll.InferenceEngine(metadata, experiment_dir="results/my_experiment")

# test_data holds the feature profiles of the new samples, loaded beforehand
predictions = engine.predict_batch(test_data)

Serving via the REST API

The backend application loads deployed models from app/backend/production_models/. Each model is represented by a pair of files:

| File | Contents |
| --- | --- |
| `{model_id}_config.json` | Feature names, transformation settings, class labels, display name |
| `{model_id}_models.joblib` | Serialized model ensemble |
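
Because a deployed model is only usable when both files are present, it can be useful to check the directory for complete pairs. The sketch below shows one way to do that with the standard library; discover_model_ids is a hypothetical helper, and the backend's own discovery logic may differ.

```python
from pathlib import Path
from typing import List

CONFIG_SUFFIX = "_config.json"
MODELS_SUFFIX = "_models.joblib"


def discover_model_ids(models_dir: str) -> List[str]:
    """Return model IDs that have both a config file and a joblib file."""
    root = Path(models_dir)
    model_ids = []
    for config_path in sorted(root.glob(f"*{CONFIG_SUFFIX}")):
        # Strip the suffix to recover the {model_id} part of the file name.
        model_id = config_path.name[: -len(CONFIG_SUFFIX)]
        if model_id and (root / f"{model_id}{MODELS_SUFFIX}").is_file():
            model_ids.append(model_id)
    return model_ids
```

For example, discover_model_ids("app/backend/production_models") returns only the IDs whose configuration and serialized ensemble are both in place, skipping any half-copied model.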

The inference API exposes the following endpoints:

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/api/inference/models` | List available deployed models |
| `POST` | `/api/inference/predict` | Single-sample prediction |
| `POST` | `/api/inference/predict/batch` | Batch prediction from an uploaded TSV/CSV file |
| `POST` | `/api/inference/explain` | LIME feature-importance explanations |
| `POST` | `/api/inference/explain/plot` | Generate a LIME bar chart |

The batch endpoint applies the full mllabiome preprocessing pipeline (taxonomic filtering, transformation, feature alignment) before running the ensemble. This keeps inference consistent with the training configuration stored in {model_id}_config.json.
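
As a starting point for scripting against the API, the sketch below queries the model-listing endpoint using only the standard library. Several details are assumptions and not taken from this guide: the server address http://localhost:8000, the helper names endpoint_url and list_deployed_models, and the expectation that the endpoint returns a JSON body. The request and response schemas of the POST endpoints are not documented here, so they are left out.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed address of a locally running backend; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the server address with one of the documented API paths."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def list_deployed_models(base_url: str = BASE_URL):
    """Call GET /api/inference/models and return the decoded JSON body.

    Assumes the server is running and responds with JSON.
    """
    url = endpoint_url("/api/inference/models", base_url)
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

With the backend running, list_deployed_models() should return whatever structure the server uses to describe its deployed models. Since httpx is installed with the app extras, the same request could also be made with that library.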