Taxonomic levels
Each row in a profiles file represents a clade at a particular rank in the taxonomic hierarchy. mllabiome recognises eight levels:
| Level | Abbreviation | Rank |
|---|---|---|
| DOMAIN | d | 0 |
| PHYLUM | p | 1 |
| CLASS | c | 2 |
| ORDER | o | 3 |
| FAMILY | f | 4 |
| GENUS | g | 5 |
| SPECIES | s | 6 |
| STRAIN | t | 7 |

The level of a clade is inferred from its name prefix (e.g. g__Bacteroides maps to GENUS). Rows whose level cannot be determined are ignored during processing.
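
The prefix rule can be illustrated with a small stand-alone sketch. It does not show mllabiome's own code: the `Level` enum and `infer_level` function are hypothetical stand-ins built from the table above, and the sketch assumes each name is a single clade label of the form `x__Name` rather than a full delimited lineage.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Hypothetical stand-in for the eight levels in the table (value = rank)."""
    DOMAIN = 0
    PHYLUM = 1
    CLASS = 2
    ORDER = 3
    FAMILY = 4
    GENUS = 5
    SPECIES = 6
    STRAIN = 7


# Abbreviation -> level, copied from the table above.
PREFIX_TO_LEVEL = {
    "d": Level.DOMAIN, "p": Level.PHYLUM, "c": Level.CLASS, "o": Level.ORDER,
    "f": Level.FAMILY, "g": Level.GENUS, "s": Level.SPECIES, "t": Level.STRAIN,
}


def infer_level(clade_name: str) -> Optional[Level]:
    """Return the level encoded by an 'x__' prefix, or None if unrecognised."""
    prefix, sep, rest = clade_name.partition("__")
    if not sep or not rest:
        # No '__' separator, or nothing after it: level cannot be determined.
        return None
    return PREFIX_TO_LEVEL.get(prefix)


rows = ["g__Bacteroides", "t__SGB1234", "unclassified"]
# Rows with an undetermined level are dropped, mirroring the rule above.
kept = [name for name in rows if infer_level(name) is not None]
```

Returning `None` instead of raising keeps the "ignore unrecognised rows" behaviour a one-line filter at the call site.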
Taxonomic resolution
A TaxonomicProcessingConfig specifies which subset of rows to include as features. Two factory methods cover the common cases:

- filter_exact(level, ...) keeps only rows whose level matches one of the specified levels exactly. Multiple levels can be passed to combine non-contiguous ranks in a single feature matrix.
- filter_range(start_level, end_level, aggregate) keeps all rows whose level falls within a contiguous range (inclusive on both ends). The aggregate parameter controls how intermediate levels are handled:
  - aggregate=True (default): child rows are summed to produce features at every level in the range. Use this when the input profiles contain only the most specific level (e.g. species-only profiles where family- and genus-level rows are absent).
  - aggregate=False: rows are kept as-is with no summation. Use this when the profiles already contain pre-computed abundances at every taxonomic level in the range.

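
To make the aggregate=True behaviour concrete, here is a minimal sketch of summing species rows up to their ancestors. It is an illustration under assumptions, not mllabiome's implementation: the profile layout (a lineage tuple mapped to an abundance), the taxon names, the abundance values, and the `aggregate_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical species-only profile: lineage (order -> species) mapped to abundance.
species_profile = {
    ("o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides",
     "s__Bacteroides_fragilis"): 0.25,
    ("o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides",
     "s__Bacteroides_ovatus"): 0.125,
    ("o__Bacteroidales", "f__Prevotellaceae", "g__Prevotella",
     "s__Prevotella_copri"): 0.5,
}


def aggregate_up(profile):
    """Add each species abundance to every ancestor prefix of its lineage."""
    totals = defaultdict(float)
    for lineage, abundance in profile.items():
        # Prefixes of length 1..n are the order, family, genus and species rows.
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += abundance
    return dict(totals)


features = aggregate_up(species_profile)
genus_key = ("o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides")
print(features[genus_key])  # 0.375, the sum of the two Bacteroides species
```

With aggregate=False this summation step would be skipped and the rows already present in the profile would be used unchanged.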
```python
import mllabiome as mll

# Single level
mll.TaxonomicProcessingConfig.filter_exact(mll.TaxonomicLevel.GENUS)

# Two non-contiguous levels
mll.TaxonomicProcessingConfig.filter_exact(
    mll.TaxonomicLevel.PHYLUM, mll.TaxonomicLevel.SPECIES
)

# Contiguous range: profiles already contain all levels, no aggregation needed
mll.TaxonomicProcessingConfig.filter_range(
    start_level=mll.TaxonomicLevel.ORDER,
    end_level=mll.TaxonomicLevel.SPECIES,
    aggregate=False,
)

# Contiguous range: aggregate species rows up to higher ranks
mll.TaxonomicProcessingConfig.filter_range(
    start_level=mll.TaxonomicLevel.ORDER,
    end_level=mll.TaxonomicLevel.SPECIES,
    aggregate=True,
)
```

Predefined configuration set
operations/taxons_configs.py provides an exhaustive, ready-to-use list of 283 configurations across all eight taxonomic levels:
- 255 filter_exact entries: every non-empty subset of the eight levels (2^8 - 1), from a single rank up to all eight combined
- 28 filter_range entries: every contiguous range between two distinct ranks (8 × 7 / 2 start/end pairs)

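
The two counts can be checked with a few lines of standard-library Python. This reproduces the arithmetic only. How operations/taxons_configs.py actually builds its list is not shown here, and the level names below are plain strings rather than mllabiome objects.

```python
from itertools import combinations

LEVELS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER",
          "FAMILY", "GENUS", "SPECIES", "STRAIN"]

# Every non-empty subset of levels: one candidate per filter_exact entry.
exact_subsets = [
    subset
    for size in range(1, len(LEVELS) + 1)
    for subset in combinations(LEVELS, size)
]

# Every (start, end) pair with start strictly above end in the hierarchy:
# one candidate per filter_range entry.
range_pairs = list(combinations(LEVELS, 2))

print(len(exact_subsets), len(range_pairs), len(exact_subsets) + len(range_pairs))
# prints: 255 28 283
```
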
All levels from DOMAIN through STRAIN are covered. The list can be passed directly as taxonomic_configs:
```python
import mllabiome as mll
from operations.taxons_configs import taxons_configs

config = mll.ExperimentConfiguration(
    # ... other arguments
    taxonomic_configs=taxons_configs,
    # ... other arguments
)
```

Using the full set is optional: any subset can be selected, and custom configurations can be added alongside it.