
Taxonomic levels

Each row in a profiles file represents a clade at a particular rank in the taxonomic hierarchy. mllabiome recognises eight levels:

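| Level   | Abbreviation | Rank |
|---------|--------------|------|
| DOMAIN  | d            | 0    |
| PHYLUM  | p            | 1    |
| CLASS   | c            | 2    |
| ORDER   | o            | 3    |
| FAMILY  | f            | 4    |
| GENUS   | g            | 5    |
| SPECIES | s            | 6    |
| STRAIN  | t            | 7    |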

The level of a clade is inferred from its name prefix (e.g. g__Bacteroides maps to GENUS). Rows whose level cannot be determined are ignored during processing.
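
The prefix rule can be illustrated with a minimal sketch. It is not mllabiome's own implementation: the `Level` enum, `PREFIX_TO_LEVEL`, and `infer_level` are hypothetical stand-ins built from the table above, and the sketch assumes each name is a single clade of the form `x__Name` rather than a full lineage string.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Hypothetical stand-in for the library's level enum; values mirror the Rank column."""
    DOMAIN = 0
    PHYLUM = 1
    CLASS = 2
    ORDER = 3
    FAMILY = 4
    GENUS = 5
    SPECIES = 6
    STRAIN = 7


# Abbreviation -> level, taken from the table of taxonomic levels.
PREFIX_TO_LEVEL = {
    "d": Level.DOMAIN, "p": Level.PHYLUM, "c": Level.CLASS, "o": Level.ORDER,
    "f": Level.FAMILY, "g": Level.GENUS, "s": Level.SPECIES, "t": Level.STRAIN,
}


def infer_level(clade_name: str) -> Optional[Level]:
    """Return the level encoded in an 'x__Name' prefix, or None if it is not recognised."""
    prefix, separator, rest = clade_name.partition("__")
    if not separator or not rest:
        return None  # no '__' delimiter, or nothing after it
    return PREFIX_TO_LEVEL.get(prefix)


rows = ["g__Bacteroides", "s__Bacteroides_fragilis", "unclassified", "x__Mystery"]

# Rows whose level cannot be determined are dropped, as described above.
kept = [name for name in rows if infer_level(name) is not None]
print(kept)
```

Here `unclassified` has no prefix and `x__Mystery` has an unknown one, so both are skipped.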

Taxonomic resolution

A TaxonomicProcessingConfig specifies which subset of rows to include as features. Two factory methods cover the common cases:

  • filter_exact(level, ...) keeps only rows whose level is one of the specified levels. Passing several levels combines non-contiguous ranks in a single feature matrix.
  • filter_range(start_level, end_level, aggregate) keeps all rows whose level falls within a contiguous range (inclusive on both ends). The aggregate parameter controls how intermediate levels are handled:
    • aggregate=True (default) — child rows are summed up to produce features at every level in the range. Use this when the input profiles contain only the most specific level (e.g. species-only profiles where family- and genus-level rows are absent).
    • aggregate=False — rows are kept as-is with no summation. Use this when the profiles already contain pre-computed abundances at every taxonomic level in the range.
```python
import mllabiome as mll

# Single level
mll.TaxonomicProcessingConfig.filter_exact(mll.TaxonomicLevel.GENUS)

# Two non-contiguous levels
mll.TaxonomicProcessingConfig.filter_exact(
    mll.TaxonomicLevel.PHYLUM, mll.TaxonomicLevel.SPECIES
)

# Contiguous range: profiles already contain all levels, no aggregation needed
mll.TaxonomicProcessingConfig.filter_range(
    start_level=mll.TaxonomicLevel.ORDER,
    end_level=mll.TaxonomicLevel.SPECIES,
    aggregate=False,
)

# Contiguous range: aggregate species rows up to higher ranks
mll.TaxonomicProcessingConfig.filter_range(
    start_level=mll.TaxonomicLevel.ORDER,
    end_level=mll.TaxonomicLevel.SPECIES,
    aggregate=True,
)
```
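
To make the difference between the two aggregate modes concrete, here is a rough sketch in plain Python that does not call mllabiome. It assumes a hypothetical profile format, a dict mapping a `|`-separated lineage string to an abundance, and the helpers `rank_of`, `select_exact`, and `select_range` are illustrative names only. The library's actual row format and summation logic may differ.

```python
from collections import defaultdict

# Rank index per abbreviation, following the table of levels (d=0 ... t=7).
RANK = {abbr: index for index, abbr in enumerate("dpcofgst")}


def rank_of(clade: str) -> int:
    """Rank index of a single clade name such as 'g__Bacteroides'."""
    return RANK[clade.split("__", 1)[0]]


def select_exact(profile: dict, *ranks: int) -> dict:
    """Keep rows whose own (last) clade sits at one of the given ranks."""
    return {
        lineage: abundance
        for lineage, abundance in profile.items()
        if rank_of(lineage.split("|")[-1]) in ranks
    }


def select_range(profile: dict, start: int, end: int, aggregate: bool = True) -> dict:
    """Keep or build features for every rank in [start, end], inclusive."""
    features = defaultdict(float)
    for lineage, abundance in profile.items():
        clades = lineage.split("|")
        if aggregate:
            # Credit the row's abundance to itself and to each ancestor in range.
            # This assumes the profile holds only most-specific rows; otherwise
            # pre-computed parent rows would be counted twice.
            for depth, clade in enumerate(clades):
                if start <= rank_of(clade) <= end:
                    features["|".join(clades[: depth + 1])] += abundance
        elif start <= rank_of(clades[-1]) <= end:
            features[lineage] = abundance  # keep the row as-is
    return dict(features)


# Species-only profile: no family- or genus-level rows are present.
profile = {
    "f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_fragilis": 0.25,
    "f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_ovatus": 0.5,
    "f__Lachnospiraceae|g__Roseburia|s__Roseburia_hominis": 0.25,
}

GENUS, SPECIES = 5, 6
aggregated = select_range(profile, GENUS, SPECIES, aggregate=True)
as_is = select_range(profile, GENUS, SPECIES, aggregate=False)
print(aggregated["f__Bacteroidaceae|g__Bacteroides"])  # 0.75
print(len(aggregated), len(as_is))  # 5 3
```

With the species-only input, `aggregate=True` yields two genus features plus the three species rows, while `aggregate=False` returns only the three species rows because no genus rows exist to keep.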

Predefined configuration set

operations/taxons_configs.py provides an exhaustive, ready-to-use list of 283 configurations across all eight taxonomic levels:

  • 255 filter_exact entries — every non-empty subset of levels from a single rank up to all eight combined
  • 28 filter_range entries — every contiguous start/end pair across the eight ranks
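
The two counts can be checked with a short enumeration. This sketch only reproduces the arithmetic and does not import `operations/taxons_configs.py`. It assumes the 28 range entries are the pairs with distinct start and end ranks, which is what the count of 28 suggests, since a single-level range would duplicate a `filter_exact` entry.

```python
from itertools import combinations

LEVELS = ["DOMAIN", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES", "STRAIN"]

# Every non-empty subset of the eight levels: 2**8 - 1 of them.
exact_subsets = [
    subset
    for size in range(1, len(LEVELS) + 1)
    for subset in combinations(LEVELS, size)
]

# Every (start, end) pair with start strictly above end in the hierarchy.
# combinations() preserves input order, so start always precedes end.
range_pairs = list(combinations(LEVELS, 2))

print(len(exact_subsets), len(range_pairs), len(exact_subsets) + len(range_pairs))
# 255 28 283
```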

All levels from DOMAIN through STRAIN are covered. The list can be passed directly as taxonomic_configs:

```python
import mllabiome as mll
from operations.taxons_configs import taxons_configs

config = mll.ExperimentConfiguration(
    # ... (other arguments elided)
    taxonomic_configs=taxons_configs,
    # ...
)
```

The full set does not need to be used. Any subset can be selected, and custom configurations can be added alongside it.
