Resuming and extending an experiment
Resuming after an interruption
When progress_backend="sqlite" and incremental_results=True are set (both are the case in the IBD example), the framework writes a progress database to experiment_progress.db inside the target directory after every completed configuration. If the run is interrupted for any reason — an out-of-memory kill, a crash, or Ctrl+C — re-running the same script picks up from the last completed configuration automatically:
python example/IBD/ibd_franzosa.py
On startup the framework reports how many configurations are already completed and how many remain pending. Completed configurations are not re-evaluated.
Configurations that were marked as failed (because evaluation raised an unhandled exception) are also skipped on a plain re-run. To re-run a failed configuration, see "Re-running a failed configuration" below.
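The resume behaviour can be pictured with a small standalone sketch. It is not the framework's implementation: the table name progress, its columns, and the helper names open_progress and run_pending are assumptions made for illustration, and the real layout of experiment_progress.db may differ. The sketch only shows the idea described above: a status is committed as soon as each configuration finishes, and a re-run skips anything already recorded as completed or failed.

```python
import sqlite3


def open_progress(path):
    """Open (or create) a progress database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        "config_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def run_pending(conn, config_ids, evaluate):
    """Evaluate configurations with no recorded status; return the ids that ran."""
    recorded = {row[0] for row in conn.execute("SELECT config_id FROM progress")}
    ran = []
    for config_id in config_ids:
        if config_id in recorded:
            continue  # completed or failed earlier: skipped on a plain re-run
        try:
            evaluate(config_id)
            status = "completed"
        except Exception:
            # Ctrl+C (KeyboardInterrupt) is not caught here, so an interrupted
            # configuration is never recorded and runs again next time.
            status = "failed"
        conn.execute(
            "INSERT INTO progress (config_id, status) VALUES (?, ?)",
            (config_id, status),
        )
        conn.commit()  # persist after every configuration
        ran.append(config_id)
    return ran
```

Calling run_pending twice with the same identifiers evaluates nothing the second time, which mirrors what a plain re-run of the script does.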
Adding new configurations
New entries can be appended to any of the four search-space lists at any time. On the next run, only the new entries are evaluated; all previously completed configurations are recognised by their identifier and skipped.
For example, to add a new taxonomic resolution and a new learner to the IBD script:
TAXONOMIC_RESOLUTIONS_CONFIGS = [
    # ... existing entries ...
    mll.TaxonomicProcessingConfig.filter_exact(
        level=mll.TaxonomicLevel.SPECIES,
    ),
]
MODEL_CONFIGS = [
    # ... existing entries ...
    mll.LogisticRegression(C=1.0),
]
Re-running the script evaluates all combinations involving the new entries while leaving existing results untouched.
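To see why only combinations involving the new entries run, it can help to treat the search space as the Cartesian product of the four lists. The sketch below is an illustration under assumptions: it uses plain strings in place of the real configuration objects, and the function name and placeholder values are hypothetical rather than part of the framework.

```python
from itertools import product


def all_combinations(resolutions, transformations, projections, models):
    """Every configuration in the search space, as a set of 4-tuples."""
    return set(product(resolutions, transformations, projections, models))


# Placeholder strings standing in for real config objects (hypothetical values).
before = all_combinations(["genus"], ["clr"], ["none"], ["rf", "xgb"])
after = all_combinations(
    ["genus", "species"], ["clr"], ["none"], ["rf", "xgb", "logreg"]
)

# Combinations that did not exist before the lists were extended.
new_only = after - before
print(len(before), len(after), len(new_only))  # prints: 2 6 4
```

Every tuple in new_only contains either the new resolution or the new learner, so the two combinations evaluated earlier are left alone.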
Re-running a failed configuration
Each configuration is identified by a string derived from the target name, taxonomic resolution, transformation, projection, and the learner name together with its explicitly-set constructor arguments. Changing any of those arguments produces a different identifier, which the framework treats as a new, unevaluated configuration.
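A rough sketch of how such an identifier could be assembled is shown here. The exact format the framework uses is not documented in this section, so the separator, the argument ordering, and the function name make_identifier are assumptions; the placeholder values "genus", "clr", and "none" are likewise hypothetical. What the sketch demonstrates is only the stated property: any change to the explicitly-set arguments yields a different string.

```python
def make_identifier(target, resolution, transformation, projection,
                    learner, **explicit_args):
    """Join the parts and the explicitly-set learner arguments (hypothetical format)."""
    # Sort arguments so the identifier does not depend on keyword order.
    args = ",".join(f"{k}={v!r}" for k, v in sorted(explicit_args.items()))
    return "__".join(
        [target, resolution, transformation, projection, f"{learner}({args})"]
    )


a = make_identifier("Study.Group", "genus", "clr", "none",
                    "XGBoost", n_estimators=1000)
b = make_identifier("Study.Group", "genus", "clr", "none",
                    "XGBoost", n_estimators=1000, random_state=42)
print(a)
print(b)
print(a != b)
```

Because the two strings differ, a progress lookup keyed on the identifier would find a record for the first and none for the second.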
The most direct way to force a re-run of a configuration that previously failed is to add or change a parameter that becomes part of the identifier. Adding random_state to a learner that did not have it set is a common choice:
# Original — has been marked failed, will be skipped
mll.XGBoost(n_estimators=1000),
# New entry — different identifier, will be evaluated
mll.XGBoost(n_estimators=1000, random_state=42),
Both entries remain in the list. The original is skipped; the new one runs as a fresh configuration with its own output directory.
Any other explicitly-set argument works the same way. The argument does not need to affect the model meaningfully — its only role here is to distinguish the identifier. That said, random_state has the practical benefit of making the new run deterministic and directly comparable to others.
Starting fresh
To discard all progress and re-run the entire experiment from the beginning, delete the progress database:
rm results/ibd_franzosa/siso/target-Study.Group/experiment_progress.db
The database is recreated automatically on the next run.
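On systems without rm, or when a copy of the old progress record is worth keeping, the same reset can be done from Python. This is a convenience sketch using only the standard library; reset_progress and the .bak suffix are hypothetical and not part of the framework.

```python
from pathlib import Path


def reset_progress(target_dir, keep_backup=True):
    """Remove experiment_progress.db from target_dir; optionally keep a .bak copy.

    Returns True if a database was found, False otherwise.
    """
    db = Path(target_dir) / "experiment_progress.db"
    if not db.exists():
        return False
    if keep_backup:
        # Rename instead of deleting; an older backup is overwritten.
        db.replace(db.with_name(db.name + ".bak"))
    else:
        db.unlink()
    return True
```

For the IBD example this would be called as reset_progress("results/ibd_franzosa/siso/target-Study.Group").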