
Post-explain: global

run_post_explanation computes feature importance explanations for a trained model using SHAP, LIME, permutation importance, and ALE, then aggregates the scores across all CV folds and writes reports and visualisations to disk.

Figure: Global feature importance across XAI methods (IBD example)

The chart above shows the top features ranked by each XAI method for a Random Forest trained on the IBD Franzosa 2019 cohort. Stars mark features with cross-method consensus: an orange star marks strong consensus (3 or more methods) and a green star marks agreement between 2 methods.
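The star logic can be illustrated with a minimal sketch. It assumes consensus is the number of methods whose top-N list contains a feature; this page does not spell out the exact rule, so `consensus_level` and the feature names `feat_a` to `feat_e` are hypothetical and the rankings are made up.

```python
from collections import Counter


def consensus_level(top_features_by_method: dict[str, list[str]]) -> dict[str, str]:
    """Label features by how many methods place them in their top-N list.

    Assumption (not confirmed by the docs): a feature counts once per method
    whose top-N list contains it.
    """
    counts = Counter(
        feature
        for features in top_features_by_method.values()
        for feature in dict.fromkeys(features)  # de-duplicate within one method
    )
    labels = {}
    for feature, n_methods in counts.items():
        if n_methods >= 3:
            labels[feature] = "strong"      # orange star in the chart
        elif n_methods == 2:
            labels[feature] = "agreement"   # green star in the chart
    return labels


# Made-up top-N lists, one per XAI method.
top = {
    "shap": ["feat_a", "feat_b", "feat_c"],
    "lime": ["feat_a", "feat_c", "feat_d"],
    "permutation": ["feat_a", "feat_b"],
    "ale": ["feat_d", "feat_e"],
}
labels = consensus_level(top)
print(sorted(labels.items()))
# [('feat_a', 'strong'), ('feat_b', 'agreement'), ('feat_c', 'agreement'), ('feat_d', 'agreement')]
```

Features that appear in only one method's list (here `feat_e`) get no star.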

XAI dependencies are not installed with the base package. Install them first:

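```bash
uv pip install -e ".[xai]"
```

Then call run_post_explanation on a trained model directory:

```python
from mllabiome.xai_space.postexplain import run_post_explanation

explainer = run_post_explanation(
    # First positional argument: the trained model (or ensemble results) directory.
    "results/ibd_franzosa/siso/target-Study.Group/taxonomy-fel_genus_excl_chlo/transform-none/project-none/models/RandomForestClassifier_min_samples_leaf-5_n_estimators-1000_random_state-91",
    features_file="example/IBD/data/FRANZOSA_IBD_2019_profiles_hierarchical.tsv",
    targets_file="example/IBD/data/metadata.tsv",
    target_column="Study.Group",
    quick_run=False,   # process every CV fold, not just the first
    max_samples=None,  # use all test samples for SHAP and LIME
)
```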

Parameters

| Parameter | Default | Description |
|---|---|---|
| `directory` | required | Path to a trained model directory or an ensemble results directory. |
| `features_file` | required | Path to the microbiome profiles TSV. |
| `targets_file` | required | Path to the metadata TSV. |
| `target_column` | required | Column in `targets_file` to use as the prediction target. |
| `methods` | `["shap", "lime", "permutation", "ale"]` | XAI methods to run. Pass a subset to skip specific methods. |
| `top_n` | `20` | Number of top features to include in the summary report and visualisations. |
| `visualize` | `True` | Generate PNG visualisations alongside the reports. |
| `quick_run` | `False` | Process only the first CV fold per model. Useful for a fast smoke test. |
| `max_samples` | `None` | Maximum number of test samples used for SHAP and LIME. `None` uses all test samples. Does not affect permutation importance or ALE. |
| `top_n_prescreen` | `None` | For permutation importance only: pre-screen to this many features using the model’s built-in `feature_importances_` before running the full permutation loop. Reduces runtime on large feature spaces without discarding the full feature matrix passed to the model. |
| `verbose` | `True` | Print progress to stdout. |
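The `top_n_prescreen` behaviour can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `prescreened_permutation_importance`, `ToyModel` and `accuracy` are hypothetical names, the score is assumed to be higher-is-better, and importance is taken as the mean drop in score over `n_repeats` shuffles. Only the pre-screened columns are shuffled; the model still receives the full feature matrix. scikit-learn's `sklearn.inspection.permutation_importance` permutes every column, which is why the loop is written out here.

```python
import numpy as np


def prescreened_permutation_importance(model, X, y, score_fn,
                                       top_n_prescreen=None, n_repeats=5, seed=0):
    """Permutation importance, optionally limited to the features ranked highest
    by model.feature_importances_. The model always receives the full X."""
    rng = np.random.default_rng(seed)
    if top_n_prescreen is None:
        candidates = np.arange(X.shape[1])
    else:
        # Rank by the model's built-in importances (descending) and keep the top few.
        candidates = np.argsort(model.feature_importances_)[::-1][:top_n_prescreen]
    baseline = score_fn(y, model.predict(X))
    importances = {}
    for j in candidates:
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()  # full matrix; only column j is shuffled
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score_fn(y, model.predict(X_perm)))
        importances[int(j)] = float(np.mean(drops))
    return importances


class ToyModel:
    """Hypothetical stand-in for a fitted classifier: predicts from column 0 only."""
    feature_importances_ = np.array([0.6, 0.3, 0.1])

    def predict(self, X):
        return (X[:, 0] > 0).astype(int)


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
scores = prescreened_permutation_importance(ToyModel(), X, y, accuracy, top_n_prescreen=2)
print(sorted(scores))  # [0, 1]: feature 2 was screened out before permuting
```

Feature 1 is kept by the pre-screen but scores exactly zero, because shuffling it never changes the toy model's predictions.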

What is written to disk

All output is written under {model_dir}/explainability/.

| File | Description |
|---|---|
| `ensemble_feature_importance_report.parquet` | Aggregated feature importance scores across all folds and all methods, one row per feature per method. |
| `ensemble_top_features.txt` | Top-N features per method, as plain text. |
| `feature_interactions.csv` | ALE second-order interaction scores for feature pairs. Only written when ALE runs successfully. |
| `global_feature_importance.png` | Bar chart grid showing the top features per XAI method (example above). |
| `feature_importance_heatmap.png` | Normalised importance heatmap across methods and top features. |
| `consensus_feature_importance.png` | Bar chart highlighting features with cross-method consensus. |
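The aggregated report is described as one row per feature per method. A minimal sketch of reducing per-fold scores to that shape, and of picking the top-N per method, follows. It assumes the aggregate is the mean across folds (this page does not say which statistic is used); `aggregate_folds`, `top_n_per_method` and the toy scores are hypothetical. Writing the rows to Parquet could be done with pandas (`pd.DataFrame(rows).to_parquet(path)`, which needs pyarrow or fastparquet installed); that step is left out so the sketch runs with the standard library only.

```python
from collections import defaultdict
from statistics import mean


def aggregate_folds(fold_results):
    """Collapse per-fold scores into one row per (feature, method).

    fold_results: list with one dict per CV fold, shaped {method: {feature: score}}.
    Assumption: the aggregate is the mean over the folds that report a score.
    """
    scores = defaultdict(list)
    for fold in fold_results:
        for method, feature_scores in fold.items():
            for feature, score in feature_scores.items():
                scores[(feature, method)].append(score)
    return [
        {"feature": feature, "method": method,
         "mean_importance": mean(values), "n_folds": len(values)}
        for (feature, method), values in sorted(scores.items())
    ]


def top_n_per_method(rows, top_n=20):
    """Return the top_n features per method, highest mean importance first."""
    by_method = defaultdict(list)
    for row in rows:
        by_method[row["method"]].append(row)
    return {
        method: [r["feature"] for r in
                 sorted(group, key=lambda r: r["mean_importance"], reverse=True)[:top_n]]
        for method, group in by_method.items()
    }


# Two toy folds with made-up scores (not real IBD results).
folds = [
    {"shap": {"feat_a": 0.5, "feat_b": 0.125}, "permutation": {"feat_a": 0.0, "feat_b": 0.5}},
    {"shap": {"feat_a": 0.25, "feat_b": 0.375}, "permutation": {"feat_a": 0.5, "feat_b": 0.5}},
]
rows = aggregate_folds(folds)
print(top_n_per_method(rows, top_n=1))  # {'permutation': ['feat_b'], 'shap': ['feat_a']}
```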

Per-fold incremental results are cached under explainability/incremental/ during the run. If a run is interrupted, rerunning it resumes after the last completed fold instead of starting over.
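The resume behaviour can be sketched as a per-fold cache: each finished fold writes its own file, and a restarted run skips folds whose file already exists. The file naming (`fold_{i}.json`), the JSON format, `run_folds_with_cache` and the `explain_fold` callback are hypothetical; the actual layout of `explainability/incremental/` is not documented on this page.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_folds_with_cache(n_folds: int, cache_dir: Path,
                         explain_fold: Callable[[int], dict]) -> list[dict]:
    """Run explain_fold for each fold, reusing results cached by earlier runs."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for fold in range(n_folds):
        cache_file = cache_dir / f"fold_{fold}.json"
        if cache_file.exists():
            # Fold finished in an earlier run: reuse it instead of recomputing.
            results.append(json.loads(cache_file.read_text()))
            continue
        result = explain_fold(fold)
        # Write to a temporary file, then rename, so an interruption mid-write
        # never leaves a partial file that looks like a completed fold.
        tmp = cache_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(result))
        tmp.replace(cache_file)
        results.append(result)
    return results


calls = []


def fake_explain(fold: int) -> dict:
    calls.append(fold)
    return {"fold": fold, "scores": {"feat_a": 0.5}}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache = Path(tmp_dir) / "incremental"
    run_folds_with_cache(2, cache, fake_explain)  # "interrupted" run: folds 0 and 1 only
    calls.clear()
    results = run_folds_with_cache(3, cache, fake_explain)  # restart with all 3 folds
    print(calls)  # [2]: only the missing fold is computed
```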
