Post-explain: local
Local explanations answer a different question from global ones. Instead of asking which features matter most across all samples, they ask: why did the model make this particular prediction for this particular sample? The entry point is run_local_lime from postexplain_local.py, which reuses the EnsembleExplainer data-loading pipeline and then runs LIME at the instance level.
XAI dependencies are not installed with the base package. Install them first:
uv pip install -e ".[xai]"
Configuration block
Edit the configuration block at the top of postexplain_local.py before running:
MODEL_DIR = (
    "results/ibd_franzosa/siso/target-Study.Group/..."
    ".../models/RandomForestClassifier_min_samples_leaf-5_n_estimators-1000_random_state-91"
)
FEATURES_FILE = "example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv"
TARGETS_FILE = "example/IBD/data/metadata.tsv"
TARGET_COLUMN = "Study.Group"
# Samples to explain. None triggers automatic selection.
SAMPLE_IDS: list[str] | None = None
N_LIME_SAMPLES = 2000 # LIME perturbation samples per instance
TOP_N = 5 # top-N features per direction (positive + negative)
Calling run_local_lime directly
from postexplain_local import run_local_lime

results = run_local_lime(
    model_dir="results/ibd_franzosa/siso/target-Study.Group/.../models/...",
    features_file="example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv",
    targets_file="example/IBD/data/metadata.tsv",
    target_column="Study.Group",
    sample_ids=None, # None triggers automatic selection
    n_lime_samples=2000,
    top_n=5,
)
Parameters
| Parameter | Default | Description |
|---|---|---|
| model_dir | required | Path to a trained model directory. |
| features_file | required | Path to the microbiome profiles TSV. |
| targets_file | required | Path to the metadata TSV. |
| target_column | required | Column in targets_file used as the prediction target. |
| sample_ids | None | List of sample IDs to explain. When None, samples are selected automatically (see "Automatic sample selection"). |
| n_lime_samples | 2000 | Number of perturbed neighbourhood samples LIME generates per instance. Higher values give more stable explanations at the cost of compute time. |
| top_n | 5 | Number of features to report per direction. The output shows the top_n most positively contributing features and the top_n most negatively contributing features. |
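To make the "per direction" meaning of top_n concrete, here is a minimal sketch. The function name top_features_per_direction and the example weights are hypothetical and are not taken from postexplain_local.py; the real script may rank or break ties differently.

```python
def top_features_per_direction(contributions, top_n=5):
    """Return the top_n strongest positive and top_n strongest negative features.

    contributions maps a feature name to its signed LIME weight.
    Features with a weight of exactly zero are reported in neither list.
    """
    positive = sorted(
        ((name, weight) for name, weight in contributions.items() if weight > 0),
        key=lambda item: item[1],
        reverse=True,  # largest positive weight first
    )[:top_n]
    negative = sorted(
        ((name, weight) for name, weight in contributions.items() if weight < 0),
        key=lambda item: item[1],  # most negative weight first
    )[:top_n]
    return positive, negative


# Made-up weights, for illustration only.
example = {"taxon_a": 0.30, "taxon_b": -0.12, "taxon_c": 0.05, "taxon_d": -0.40}
pos, neg = top_features_per_direction(example, top_n=1)
```

With top_n=5, up to ten features are reported in total: five per direction, or fewer when a sample has fewer non-zero contributions on one side.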
Automatic sample selection
When sample_ids is None, the script selects one representative sample per class. The selection strategy depends on whether a differential-abundance (DA) results file is available.
With a DA file (DA_FILE points to a CSV with FeatureID, Signif, and Enrichment columns):
- Loads features with q < 0.05.
- Matches them to columns present in the model’s feature space.
- For each sample, computes a DA-concordance score: correctly classified samples that most strongly exhibit the expected DA pattern score highest.
- Returns the top-scoring correctly classified sample per class.
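The first two steps of the DA path (filter by significance, then match to model columns) could look roughly like the sketch below. It assumes that the Signif column holds the q-value as a number, which this page implies but does not state. The function name and the example rows are hypothetical, and the concordance score itself is not sketched because its formula is not given here.

```python
import csv
import io


def significant_model_features(da_csv_text, model_columns, q_threshold=0.05):
    """Return {FeatureID: Enrichment} for significant DA features the model knows."""
    model_columns = set(model_columns)
    kept = {}
    for row in csv.DictReader(io.StringIO(da_csv_text)):
        # Assumption: Signif stores the q-value as a plain number.
        is_significant = float(row["Signif"]) < q_threshold
        if is_significant and row["FeatureID"] in model_columns:
            kept[row["FeatureID"]] = row["Enrichment"]
    return kept


# Made-up DA table, for illustration only.
DA_TEXT = """FeatureID,Signif,Enrichment
taxon_a,0.01,CD
taxon_b,0.20,Control
taxon_c,0.03,Control
"""

# taxon_b fails the q-value filter; taxon_c is not a model column.
selected = significant_model_features(DA_TEXT, ["taxon_a", "taxon_b"])
```

If this filter returns nothing, no DA features match the model and the fallback strategy applies.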
Fallback (no DA file or no matching features): picks the correctly classified sample per class whose predicted class probability is closest to the class-level median. This gives a sample that is “typical” of the model’s behaviour for that class rather than an outlier.
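The fallback rule can be sketched as follows. This is an illustration rather than the script's own code: the function name and record layout are hypothetical, and the sketch assumes the class-level median is taken over the correctly classified samples of each class.

```python
from statistics import median


def pick_typical_samples(records):
    """Pick one correctly classified sample per class, closest to the median.

    records is an iterable of tuples:
    (sample_id, true_class, predicted_class, predicted_class_probability).
    """
    by_class = {}
    for sample_id, true_class, predicted_class, probability in records:
        if predicted_class == true_class:  # drop misclassified samples
            by_class.setdefault(true_class, []).append((sample_id, probability))

    chosen = {}
    for label, members in by_class.items():
        class_median = median(prob for _, prob in members)
        # Keep the sample whose probability is nearest to the class median.
        chosen[label] = min(members, key=lambda m: abs(m[1] - class_median))[0]
    return chosen


# Made-up predictions, for illustration only.
records = [
    ("s1", "CD", "CD", 0.95),
    ("s2", "CD", "CD", 0.70),
    ("s3", "CD", "CD", 0.55),
    ("s4", "CD", "Control", 0.60),  # misclassified, never selected
    ("s5", "Control", "Control", 0.80),
    ("s6", "Control", "Control", 0.65),
    ("s7", "Control", "Control", 0.90),
]
typical = pick_typical_samples(records)
```

Selecting by distance to the median rather than by highest probability avoids explaining only the model's most confident predictions.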
How it works
- Loads the model and reconstructs the correct train/test splits for each CV fold via the EnsembleExplainer pipeline. Taxonomy filtering and transformation are applied exactly as they were during training.
- For each requested sample, identifies the CV fold(s) in which that sample was held out as test data.
- Runs LIMEExplainer.compute_local_lime on each such fold with the fold-specific transformed feature matrix.
- Aggregates LIME feature contributions across folds (mean over all folds in which the sample appeared as a test instance).
- Saves a per-sample JSON with the full contribution vector and a waterfall PNG showing the top contributing features.
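The aggregation step amounts to a per-feature mean over the folds in which the sample was test data. A minimal sketch, with a hypothetical function name and made-up weights; it assumes a feature missing from one fold counts as zero for that fold, which the actual implementation may handle differently:

```python
def mean_contributions(per_fold):
    """Average signed LIME weights per feature across folds.

    per_fold is a list with one {feature: weight} dict per fold in which
    the sample was held out as test data.
    """
    totals = {}
    for fold in per_fold:
        for feature, weight in fold.items():
            totals[feature] = totals.get(feature, 0.0) + weight
    # Assumption: a feature absent from a fold contributes 0 to that fold,
    # so every feature is divided by the total number of folds.
    return {feature: total / len(per_fold) for feature, total in totals.items()}


# Made-up weights from two folds, for illustration only.
folds = [
    {"taxon_a": 0.25, "taxon_b": -0.5},
    {"taxon_a": 0.75, "taxon_b": -0.25},
]
averaged = mean_contributions(folds)
```

With a single train/test split per repetition, a sample appears as test data once per repetition, so the mean mostly smooths out differences between the fold-specific models.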
What is written to disk
Output is written under {model_dir}/explainability/local/.
| File | Description |
|---|---|
| {sample_id}_lime.json | Full LIME feature contribution vector for the sample, plus predicted class, true class, and prediction probability. |
| {sample_id}_waterfall.png | Waterfall chart showing the top-N positive and top-N negative features for the sample. |
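To locate the outputs for a given sample from another script, the paths can be built from the naming pattern above. The helper name, model directory, and sample ID below are hypothetical. The key names inside the JSON are not documented on this page, so the sketch stops at path construction.

```python
from pathlib import Path


def expected_outputs(model_dir, sample_id):
    """Return the (JSON, PNG) paths that a run should produce for one sample."""
    out_dir = Path(model_dir) / "explainability" / "local"
    return (
        out_dir / f"{sample_id}_lime.json",
        out_dir / f"{sample_id}_waterfall.png",
    )


# Hypothetical model directory and sample ID, for illustration only.
json_path, png_path = expected_outputs("results/my_model", "S001")
```

Calling json_path.exists() after a run is a quick way to confirm that the sample was actually explained.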