
Post-explain: local

Local explanations answer a different question from global ones. Instead of asking which features matter most across all samples, they ask: why did the model make this particular prediction for this particular sample? The entry point is run_local_lime from postexplain_local.py, which reuses the EnsembleExplainer data-loading pipeline and then runs LIME at the instance level.
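
The mechanics behind an instance-level LIME explanation fit in a few lines of NumPy. The sample is perturbed and the model is queried on each perturbation. The perturbations are then weighted by their distance to the original sample, and a weighted linear surrogate is fitted whose coefficients act as local feature contributions. This is a simplified illustration under stated assumptions: Gaussian perturbations, an exponential distance kernel, a closed-form weighted ridge fit, and a binary classifier. The function lime_explain is hypothetical. It is not the implementation inside LIMEExplainer or the lime package, which among other things discretises continuous tabular features by default.

```python
import numpy as np


def lime_explain(predict_proba, x, feature_std, n_samples=2000, ridge=1.0, seed=0):
    """Hypothetical minimal LIME: local contributions to P(class 1) around x.

    predict_proba: callable mapping an (n, d) array to (n, 2) class probabilities.
    x: 1-D array of length d, the instance to explain.
    feature_std: per-feature scale used for perturbation and distances.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1. Perturb: Gaussian noise around x in standardised units.
    Zs = rng.normal(size=(n_samples, d))
    Zs[0] = 0.0  # keep the instance itself in the neighbourhood
    Z = x + Zs * feature_std

    # 2. Query the black-box model on every perturbed sample.
    y = predict_proba(Z)[:, 1]

    # 3. Proximity weights: exponential kernel on standardised distance to x.
    kernel_width = 0.75 * np.sqrt(d)
    dist = np.linalg.norm(Zs, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 4. Weighted ridge regression (intercept unpenalised), solved in closed form.
    A = np.hstack([np.ones((n_samples, 1)), Zs])
    penalty = ridge * np.eye(d + 1)
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + penalty, A.T @ (w * y))
    return coef[1:]  # one contribution per feature


# Toy model: depends positively on feature 0, negatively on feature 1,
# and not at all on feature 2.
def toy_predict_proba(Z):
    p = 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - 3.0 * Z[:, 1])))
    return np.column_stack([1.0 - p, p])


contrib = lime_explain(toy_predict_proba, np.zeros(3), np.ones(3))
```

Here n_samples plays the role of n_lime_samples. More perturbations reduce the variance of the surrogate fit, which is why higher values give more stable explanations.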

XAI dependencies are not installed with the base package. Install them first:

uv pip install -e ".[xai]"

Configuration block

Edit the configuration block at the top of postexplain_local.py before running:

MODEL_DIR = (
    "results/ibd_franzosa/siso/target-Study.Group/..."
    ".../models/RandomForestClassifier_min_samples_leaf-5_n_estimators-1000_random_state-91"
)
FEATURES_FILE = "example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv"
TARGETS_FILE = "example/IBD/data/metadata.tsv"
TARGET_COLUMN = "Study.Group"

# Samples to explain. None triggers automatic selection.
SAMPLE_IDS: list[str] | None = None

N_LIME_SAMPLES = 2000  # LIME perturbation samples per instance
TOP_N = 5              # top-N features per direction (positive + negative)

Calling run_local_lime directly

from postexplain_local import run_local_lime

results = run_local_lime(
    model_dir="results/ibd_franzosa/siso/target-Study.Group/.../models/...",
    features_file="example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv",
    targets_file="example/IBD/data/metadata.tsv",
    target_column="Study.Group",
    sample_ids=None,  # None triggers automatic selection
    n_lime_samples=2000,
    top_n=5,
)

Parameters

| Parameter | Default | Description |
|---|---|---|
| model_dir | required | Path to a trained model directory. |
| features_file | required | Path to the microbiome profiles TSV. |
| targets_file | required | Path to the metadata TSV. |
| target_column | required | Column in targets_file used as the prediction target. |
| sample_ids | None | List of sample IDs to explain. When None, samples are selected automatically (see below). |
| n_lime_samples | 2000 | Number of perturbed neighbourhood samples LIME generates per instance. Higher values give more stable explanations at the cost of compute time. |
| top_n | 5 | Number of features to report per direction: the output shows the top_n features with the largest positive contributions and the top_n features with the largest negative contributions. |
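
A minimal sketch makes the top_n semantics concrete by splitting a contribution vector into the two directions. The helper top_contributions is hypothetical. Dropping exact-zero contributions is an assumption, because the script's own handling of ties and zeros is not documented here.

```python
def top_contributions(contributions, top_n=5):
    """Hypothetical helper: top-N positive and top-N negative features.

    contributions: mapping of feature name -> LIME contribution.
    Zero contributions are dropped (an assumption).
    """
    positive = sorted(
        ((f, c) for f, c in contributions.items() if c > 0),
        key=lambda fc: fc[1],
        reverse=True,  # largest positive contribution first
    )[:top_n]
    negative = sorted(
        ((f, c) for f, c in contributions.items() if c < 0),
        key=lambda fc: fc[1],  # most negative contribution first
    )[:top_n]
    return positive, negative


example = {"taxon_A": 0.30, "taxon_B": -0.20, "taxon_C": 0.10,
           "taxon_D": -0.50, "taxon_E": 0.0}
pos, neg = top_contributions(example, top_n=1)
# pos == [("taxon_A", 0.3)], neg == [("taxon_D", -0.5)]
```

With top_n=5, at most 10 features are reported per sample. Fewer appear if a direction has fewer than top_n non-zero contributions.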

Automatic sample selection

When sample_ids is None, the script selects one representative sample per class. The selection strategy depends on whether a differential-abundance (DA) results file is available.

With a DA file (DA_FILE points to a CSV with FeatureID, Signif, and Enrichment columns):

  1. Loads the DA features with q < 0.05.
  2. Keeps those that match columns present in the model’s feature space.
  3. Computes a DA-concordance score for each sample. The score is highest for samples that most strongly exhibit the expected DA pattern.
  4. Returns the highest-scoring correctly classified sample per class.
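
The page does not define the DA-concordance score, so the following is only one plausible reading, and each part of it is an assumption. Signif is treated as a q-value, and Enrichment as the name of the class in which a feature is enriched. A sample's score for a class is the mean z-scored abundance across DA features, with the sign flipped for features enriched in the other class. select_by_da_concordance is a hypothetical name.

```python
import numpy as np


def select_by_da_concordance(X, feature_names, sample_ids, y_true, y_pred,
                             da_rows, q_threshold=0.05):
    """Hypothetical: pick one correctly classified sample per class by DA concordance.

    X: (n_samples, n_features) array in the model's transformed feature space.
    da_rows: iterable of (feature_id, q_value, enriched_class) tuples
        (assumed mapping of the FeatureID, Signif, Enrichment columns).
    Returns {class: sample_id}, or None if no significant DA feature matches.
    """
    col = {f: j for j, f in enumerate(feature_names)}
    # Steps 1-2: significant DA features that exist in the model's feature space.
    sig = [(col[f], cls) for f, q, cls in da_rows if q < q_threshold and f in col]
    if not sig:
        return None  # caller falls back to median-probability selection
    idx = np.array([j for j, _ in sig])
    enriched = np.array([cls for _, cls in sig], dtype=object)

    # z-score each DA feature across samples (guard against zero variance).
    sub = X[:, idx]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (sub - sub.mean(axis=0)) / sd

    y_true = np.asarray(y_true, dtype=object)
    y_pred = np.asarray(y_pred, dtype=object)
    chosen = {}
    for cls in dict.fromkeys(y_true):  # classes in order of first appearance
        # Step 3 (assumed): +1 for features enriched in cls, -1 otherwise.
        sign = np.where(enriched == cls, 1.0, -1.0)
        score = (Z * sign).mean(axis=1)
        # Step 4: best-scoring sample among the correctly classified ones of cls.
        mask = (y_true == cls) & (y_pred == cls)
        if mask.any():
            best = np.flatnonzero(mask)[np.argmax(score[mask])]
            chosen[cls] = sample_ids[best]
    return chosen
```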

Fallback (no DA file or no matching features): picks the correctly classified sample per class whose predicted class probability is closest to the class-level median. This gives a sample that is “typical” of the model’s behaviour for that class rather than an outlier.
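
A sketch of the fallback rule follows. It assumes that the class-level median is taken over the correctly classified samples of that class. It also assumes that proba_true_class holds each sample's predicted probability for its true class. The function name is hypothetical.

```python
import numpy as np


def select_by_median_probability(sample_ids, y_true, y_pred, proba_true_class):
    """Hypothetical: per class, the correctly classified sample whose
    predicted probability is closest to that class's median."""
    y_true = np.asarray(y_true, dtype=object)
    y_pred = np.asarray(y_pred, dtype=object)
    proba = np.asarray(proba_true_class, dtype=float)
    chosen = {}
    for cls in dict.fromkeys(y_true):  # classes in order of first appearance
        mask = (y_true == cls) & (y_pred == cls)
        if not mask.any():
            continue  # no correctly classified sample for this class
        idx = np.flatnonzero(mask)
        median = np.median(proba[idx])
        best = idx[np.argmin(np.abs(proba[idx] - median))]
        chosen[cls] = sample_ids[best]
    return chosen
```

Choosing the sample nearest the median, rather than the most confident one, avoids explaining a case where the model is unusually certain.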

How it works

  1. Loads the model and reconstructs the correct train/test splits for each CV fold via the EnsembleExplainer pipeline. Taxonomy filtering and transformation are applied exactly as they were during training.
  2. For each requested sample, identifies the CV fold(s) in which that sample was held out as test data.
  3. Runs LIMEExplainer.compute_local_lime on each such fold with the fold-specific transformed feature matrix.
  4. Aggregates LIME feature contributions across folds (mean over all folds in which the sample appeared as a test instance).
  5. Saves a per-sample JSON with the full contribution vector and a waterfall PNG showing the top contributing features.
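
Steps 2 and 4 amount to a lookup followed by an average. In this hypothetical sketch, fold_test_indices lists the held-out row indices of each CV fold. explain_fold is a stand-in for the per-fold LIMEExplainer.compute_local_lime call, whose signature is not shown on this page. A sample can appear in more than one test fold only if the CV scheme holds samples out repeatedly (for example, repeated k-fold).

```python
import numpy as np


def aggregate_over_test_folds(sample_index, fold_test_indices, explain_fold):
    """Hypothetical: mean LIME contributions over the folds where a sample is test data.

    fold_test_indices: list of arrays; entry k holds the row indices held out in fold k.
    explain_fold: callable(k, sample_index) -> 1-D contribution vector for fold k.
    Returns (list of fold numbers used, mean contribution vector).
    """
    # Step 2: folds in which this sample was held out as test data.
    folds = [k for k, test_idx in enumerate(fold_test_indices) if sample_index in test_idx]
    if not folds:
        raise ValueError(f"sample {sample_index} is never a test instance")
    # Steps 3-4: explain the sample in each such fold, then average.
    per_fold = np.vstack([explain_fold(k, sample_index) for k in folds])
    return folds, per_fold.mean(axis=0)
```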

What is written to disk

Output is written under {model_dir}/explainability/local/.

| File | Description |
|---|---|
| {sample_id}_lime.json | Full LIME feature contribution vector for the sample, plus predicted class, true class, and prediction probability. |
| {sample_id}_waterfall.png | Waterfall chart showing the top-N positive and top-N negative features for the sample. |
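
Below is a small, hypothetical loader for the per-sample JSON files. It relies only on the output directory and file names listed above. It does not assume any particular key names inside the JSON, because those are not documented here.

```python
import json
from pathlib import Path


def load_local_explanations(model_dir):
    """Hypothetical: {sample_id: parsed JSON} for every *_lime.json under
    {model_dir}/explainability/local/."""
    out_dir = Path(model_dir) / "explainability" / "local"
    results = {}
    for path in sorted(out_dir.glob("*_lime.json")):
        sample_id = path.name[: -len("_lime.json")]  # strip the suffix
        with path.open() as fh:
            results[sample_id] = json.load(fh)
    return results
```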