Post-explain: local

Local explanations answer a different question from global ones. Instead of asking which features matter most across all samples, they ask: why did the model make this particular prediction for this particular sample? The entry point is `run_local_lime` from `postexplain_local.py`, which reuses the `EnsembleExplainer` data-loading pipeline and then runs LIME at the instance level.

XAI dependencies are not installed with the base package. Install them first:


uv pip install -e ".[xai]"

Configuration block

Edit the configuration block at the top of postexplain_local.py before running:


MODEL_DIR = (
    "results/ibd_franzosa/siso/target-Study.Group/..."
    ".../models/RandomForestClassifier_min_samples_leaf-5_n_estimators-1000_random_state-91"
)
 
FEATURES_FILE = "example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv"
TARGETS_FILE  = "example/IBD/data/metadata.tsv"
TARGET_COLUMN = "Study.Group"
 
# Samples to explain. None triggers automatic selection.
SAMPLE_IDS: list[str] | None = None
 
N_LIME_SAMPLES = 2000   # LIME perturbation samples per instance
TOP_N = 5               # top-N features per direction (positive + negative)

Calling `run_local_lime` directly


from postexplain_local import run_local_lime
 
results = run_local_lime(
    model_dir="results/ibd_franzosa/siso/target-Study.Group/.../models/...",
    features_file="example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv",
    targets_file="example/IBD/data/metadata.tsv",
    target_column="Study.Group",
    sample_ids=None,        # None triggers automatic selection
    n_lime_samples=2000,
    top_n=5,
)

Parameters

Parameter	Default	Description
`model_dir`	required	Path to a trained model directory.
`features_file`	required	Path to the microbiome profiles TSV.
`targets_file`	required	Path to the metadata TSV.
`target_column`	required	Column in `targets_file` used as the prediction target.
`sample_ids`	`None`	List of sample IDs to explain. When `None`, samples are selected automatically (see below).
`n_lime_samples`	`2000`	Number of perturbed neighbourhood samples LIME generates per instance. Higher values give more stable explanations at the cost of compute time.
`top_n`	`5`	Number of features to report per direction. The output shows the top-`top_n` positively contributing features and the top-`top_n` negatively contributing features.
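
The `top_n` behaviour in the table can be illustrated with a minimal sketch. The helper `top_contributions` and the feature names are hypothetical, and the actual selection code in `postexplain_local.py` may differ; the sketch only shows what "top-N per direction" means for a vector of signed LIME weights.

```python
def top_contributions(contributions: dict[str, float], top_n: int = 5):
    """Split signed contributions by direction and keep the top_n of each.

    Hypothetical helper for illustration; not part of postexplain_local.py.
    """
    # Largest positive weights first.
    positive = sorted(
        ((name, weight) for name, weight in contributions.items() if weight > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]
    # Most negative weights first.
    negative = sorted(
        ((name, weight) for name, weight in contributions.items() if weight < 0),
        key=lambda item: item[1],
    )[:top_n]
    return positive, negative


# Made-up weights; zero-weight features belong to neither direction.
weights = {
    "taxon_a": 0.30,
    "taxon_b": -0.12,
    "taxon_c": 0.05,
    "taxon_d": -0.40,
    "taxon_e": 0.0,
}
positive, negative = top_contributions(weights, top_n=1)
```

With `top_n=5`, up to ten features are reported per sample: at most five in each direction.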

Automatic sample selection

When `sample_ids` is `None`, the script selects one representative sample per class. The selection strategy depends on whether a differential-abundance (DA) results file is available.

With a DA file (`DA_FILE` points to a CSV with `FeatureID`, `Signif`, and `Enrichment` columns):

1. Loads features with q < 0.05.
2. Matches them to columns present in the model’s feature space.
3. For each sample, computes a DA-concordance score: correctly classified samples that most strongly exhibit the expected DA pattern score highest.
4. Returns the top-scoring correctly classified sample per class.
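
The first two steps (filtering by significance and matching against the model's features) can be sketched as follows. This is an illustration under stated assumptions, not the script's own code: it assumes the `Signif` column holds a numeric q-value, and the function name, taxon names, and class labels are made up. The concordance score itself is not reproduced here because its formula is not described above.

```python
import csv
import io


def significant_da_features(da_csv_text, model_features, q_threshold=0.05):
    """Return {FeatureID: Enrichment} for significant features the model uses.

    Hypothetical helper. Assumes `Signif` stores a numeric q-value.
    """
    model_set = set(model_features)
    selected = {}
    for row in csv.DictReader(io.StringIO(da_csv_text)):
        is_significant = float(row["Signif"]) < q_threshold
        if is_significant and row["FeatureID"] in model_set:
            selected[row["FeatureID"]] = row["Enrichment"]
    return selected


# Made-up DA table: taxon_b is not significant, taxon_x is not a model feature.
example_csv = """FeatureID,Signif,Enrichment
taxon_a,0.001,CD
taxon_b,0.200,Control
taxon_c,0.010,Control
taxon_x,0.003,CD
"""
matched = significant_da_features(example_csv, ["taxon_a", "taxon_b", "taxon_c"])
```

If `matched` came back empty, the situation would correspond to the "no matching features" case, in which the script uses the fallback strategy.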

Fallback (no DA file or no matching features): picks the correctly classified sample per class whose predicted class probability is closest to the class-level median. This gives a sample that is “typical” of the model’s behaviour for that class rather than an outlier.
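
A minimal sketch of the fallback rule, assuming the class-level median is taken over the correctly classified samples of each class (the text above does not say which samples enter the median). The function name, record layout, sample IDs, and class labels are hypothetical.

```python
from statistics import median


def pick_typical_samples(records):
    """Pick one correctly classified sample per class, closest to the median.

    `records` holds (sample_id, true_class, predicted_class, probability)
    tuples, where probability is that of the predicted class.
    Hypothetical helper for illustration only.
    """
    by_class = {}
    for sample_id, true_class, predicted_class, probability in records:
        if predicted_class == true_class:  # keep correct predictions only
            by_class.setdefault(true_class, []).append((sample_id, probability))

    chosen = {}
    for label, members in by_class.items():
        class_median = median(prob for _, prob in members)
        # Sample whose probability is nearest to the class median.
        chosen[label] = min(members, key=lambda m: abs(m[1] - class_median))[0]
    return chosen


records = [
    ("s1", "CD", "CD", 0.95),
    ("s2", "CD", "CD", 0.70),
    ("s3", "CD", "CD", 0.55),
    ("s4", "CD", "Control", 0.60),  # misclassified, ignored
    ("s5", "Control", "Control", 0.80),
    ("s6", "Control", "Control", 0.62),
    ("s7", "Control", "Control", 0.75),
]
typical = pick_typical_samples(records)
```

Here `s1` is the most confident CD prediction, but `s2` is selected because it sits at the median, which is the point of the fallback: a typical case rather than an extreme one.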

How it works

1. Loads the model and reconstructs the correct train/test splits for each CV fold via the `EnsembleExplainer` pipeline. Taxonomy filtering and transformation are applied exactly as they were during training.
2. For each requested sample, identifies the CV fold(s) in which that sample was held out as test data.
3. Runs `LIMEExplainer.compute_local_lime` on each such fold with the fold-specific transformed feature matrix.
4. Aggregates LIME feature contributions across folds (mean over all folds in which the sample appeared as a test instance).
5. Saves a per-sample JSON with the full contribution vector and a waterfall PNG showing the top contributing features.
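
The fold lookup and the cross-fold averaging can be sketched in plain Python. Both helpers are hypothetical and stand in for logic inside the pipeline; the sketch also assumes that a feature missing from one fold's LIME output counts as zero in that fold, which the description above does not specify.

```python
from collections import defaultdict


def held_out_folds(sample_id, fold_test_ids):
    """Indices of the CV folds whose test split contains sample_id."""
    return [i for i, test_ids in enumerate(fold_test_ids) if sample_id in test_ids]


def mean_contributions(per_fold):
    """Average per-feature LIME weights over the folds that explained a sample.

    Assumption: a feature absent from a fold contributes 0 for that fold.
    """
    if not per_fold:
        raise ValueError("sample was not held out in any fold")
    totals = defaultdict(float)
    for fold_weights in per_fold:
        for feature, weight in fold_weights.items():
            totals[feature] += weight
    return {feature: total / len(per_fold) for feature, total in totals.items()}


# Made-up test splits for three folds (e.g. from repeated CV).
fold_test_ids = [{"s1", "s2"}, {"s3"}, {"s1"}]
folds_for_s1 = held_out_folds("s1", fold_test_ids)

# Made-up LIME weights for s1 in the two folds where it was held out.
per_fold = [{"taxon_a": 0.5, "taxon_b": -0.25}, {"taxon_a": 0.25}]
aggregated = mean_contributions(per_fold)
```

Explaining a sample only in folds where it was held out means each explanation describes a model that never saw that sample during training.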

What is written to disk

Output is written under `{model_dir}/explainability/local/`.

File	Description
`{sample_id}_lime.json`	Full LIME feature contribution vector for the sample, plus predicted class, true class, and prediction probability.
`{sample_id}_waterfall.png`	Waterfall chart showing the top-N positive and top-N negative features for the sample.
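
To locate the two files for a given sample, the naming scheme above can be turned into paths. The helper name, model directory, and sample ID below are placeholders; the JSON's internal key names are not documented here, so the sketch stops at building the paths.

```python
from pathlib import Path


def local_output_paths(model_dir, sample_id):
    """Build the expected JSON and PNG paths for one explained sample.

    Hypothetical helper that only mirrors the documented naming scheme.
    """
    out_dir = Path(model_dir) / "explainability" / "local"
    return out_dir / f"{sample_id}_lime.json", out_dir / f"{sample_id}_waterfall.png"


# Placeholder values for illustration.
json_path, png_path = local_output_paths("results/my_model", "sample_001")
```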