Transformation types

A TransformationConfig specifies how raw abundance values are transformed before they are passed to a model. The transform_type field selects the mathematical operation applied to each sample, and normalize_to_relative controls whether counts are converted to relative abundances before that operation.


import mllabiome as mll
 
mll.TransformationConfig(
    transform_type=mll.TransformationType.ARCSIN_SQRT,
    normalize_to_relative=True,
)

Available transformations

Identity and basic

Type	Description
`NONE`	No transformation. Raw counts or relative abundances are used as-is.
`TSS`	Total sum scaling. Each sample is divided by its total, producing relative abundances.
`BINARY`	Presence/absence encoding. All non-zero values are set to 1.
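
As a rough illustration of the two non-trivial entries above, the sketch below applies total sum scaling and presence/absence encoding to a toy count matrix with NumPy. The helper names `tss` and `binary` are hypothetical and not part of mllabiome, and the guard for all-zero samples is an assumption, since the library's handling of that case is not described here.

```python
import numpy as np

def tss(counts: np.ndarray) -> np.ndarray:
    """Total sum scaling: divide each sample (row) by its row total."""
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: all-zero samples stay at zero instead of dividing by zero.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return counts / safe_totals

def binary(counts: np.ndarray) -> np.ndarray:
    """Presence/absence encoding: every non-zero value becomes 1."""
    return (counts != 0).astype(int)

counts = np.array([[10, 30, 60], [0, 5, 15]])
relative = tss(counts)     # each row now sums to 1
presence = binary(counts)  # [[1, 1, 1], [0, 1, 1]]
```

Both operations act on one sample at a time, so they need no information from other samples.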

Log-ratio transformations (compositional data)

These transformations are designed for compositional data where absolute values carry no meaning and only ratios between components are informative.

Type	Description
`CLR`	Centred log-ratio. Each value is divided by the geometric mean of the sample, then log-transformed.
`ALR`	Additive log-ratio. Each value is expressed as a log-ratio relative to a reference component.
`ILR`	Isometric log-ratio. Projects the composition into an unconstrained Euclidean space via an orthonormal basis.
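
The CLR and ALR definitions above can be sketched for a single strictly positive sample as follows. The function names are hypothetical, the choice of the last component as the ALR reference is an assumption, and zero handling (for example a pseudocount) is deliberately left out because the library's approach is not stated here.

```python
import numpy as np

def clr(x: np.ndarray) -> np.ndarray:
    """Centred log-ratio: log(x_i / g(x)), where g(x) is the geometric mean."""
    log_x = np.log(x)
    # Subtracting the mean of the logs equals dividing by the geometric mean.
    return log_x - log_x.mean()

def alr(x: np.ndarray, ref: int = -1) -> np.ndarray:
    """Additive log-ratio: log(x_i / x_ref) for every non-reference component."""
    log_x = np.log(x)
    # Assumption: the reference component is dropped from the output.
    return np.delete(log_x - log_x[ref], ref)

sample = np.array([0.1, 0.3, 0.6])  # strictly positive composition
clr_values = clr(sample)            # sums to zero by construction
alr_values = alr(sample)            # one component fewer than the input
```

Because only ratios enter the result, multiplying the sample by a constant leaves both outputs unchanged.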

Root and trigonometric transformations

Type	Description
`SQRT`	Square root of each value.
`HELLINGER`	Square root of relative abundances. Equivalent to the Hellinger transformation used in ordination.
`ARCSIN`	Arc-sine of each value.
`ARCSIN_SQRT`	Arc-sine of the square root of each value. A variance-stabilising transformation for proportions.
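
The sketch below shows the Hellinger and arc-sine square-root operations on a vector of proportions, and why values outside [0, 1] are a problem for the arc-sine variants. It uses plain NumPy and makes no claim about how mllabiome handles out-of-domain input.

```python
import numpy as np

proportions = np.array([0.0, 0.04, 0.25, 0.71])  # relative abundances in [0, 1]

hellinger = np.sqrt(proportions)
arcsin_sqrt = np.arcsin(np.sqrt(proportions))  # values lie between 0 and pi/2

# Raw counts above 1 fall outside the domain of arcsin and produce NaN.
with np.errstate(invalid="ignore"):
    out_of_domain = np.arcsin(np.sqrt(np.array([4.0])))
```

This is one reason to pair the arc-sine transformations with normalize_to_relative=True when the input consists of raw counts.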

Log transformations

Type	Description
`LOG`	Natural logarithm. A small pseudocount is added to handle zeros.
`LOG10`	Base-10 logarithm with pseudocount.
`LOG2`	Base-2 logarithm with pseudocount.
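
A minimal sketch of a log transformation with a pseudocount follows. The constant `PSEUDOCOUNT` and the helper `log_transform` are hypothetical: the value the library uses, and whether it adds the pseudocount to every entry or only replaces zeros, are not stated here.

```python
import numpy as np

PSEUDOCOUNT = 1e-6  # hypothetical value, not the library's documented default

def log_transform(x: np.ndarray, base: float = np.e) -> np.ndarray:
    """Log of (x + pseudocount) in the given base, via the change-of-base rule."""
    return np.log(x + PSEUDOCOUNT) / np.log(base)

values = np.array([0.0, 0.001, 0.1, 1.0])
natural = log_transform(values)          # LOG
base10 = log_transform(values, base=10)  # LOG10
base2 = log_transform(values, base=2)    # LOG2
```

The pseudocount determines where zeros land on the log scale, so it is worth checking which value a given pipeline uses before comparing results.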

Rank-based normalisation

These transformations were identified as strong performers in benchmarking studies of microbiome classification pipelines.

Type	Description
`RANK_STD`	Feature ranks followed by z-score standardisation.
`RANK_UNIT`	Feature ranks divided by the square root of the sum of squared ranks.
`LOG_STD`	Natural log followed by z-score standardisation.
`LOG_UNIT`	Natural log followed by unit-norm scaling.
`ZSCORE`	Z-score standardisation of raw values without prior rank or log step.
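
The two rank-based entries can be sketched as below. This assumes that ranks are computed within each sample across its features and that ties are broken by position; neither detail is confirmed by the descriptions above, and the function names are hypothetical.

```python
import numpy as np

def ordinal_ranks(x: np.ndarray) -> np.ndarray:
    """Rank values within one sample from 1 (smallest) to n (largest)."""
    # A double argsort yields ordinal ranks; ties are broken by position.
    return x.argsort(kind="stable").argsort() + 1.0

def rank_std(x: np.ndarray) -> np.ndarray:
    """Ranks followed by z-score standardisation."""
    r = ordinal_ranks(x)
    return (r - r.mean()) / r.std()

def rank_unit(x: np.ndarray) -> np.ndarray:
    """Ranks divided by the square root of the sum of squared ranks."""
    r = ordinal_ranks(x)
    return r / np.sqrt(np.sum(r ** 2))

sample = np.array([0.50, 0.05, 0.30, 0.15])
standardised = rank_std(sample)  # mean 0, standard deviation 1
unit_norm = rank_unit(sample)    # Euclidean norm 1
```

The rank step discards the magnitude of differences between features and keeps only their order.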

Variance-stabilising transformations

These transformations use batch-level statistics fitted on training data. They are applied after fitting a scaler across the training set and are not purely per-sample operations.

Type	Description
`POWER`	Yeo-Johnson power transformation. Stabilises variance and reduces skewness.
`BOXCOX`	Box-Cox transformation. Requires strictly positive values.
`ROBUST`	Robust scaling using median and interquartile range. Less sensitive to outliers than z-score.
`QUANTILE`	Quantile transformation mapping values to a uniform or normal distribution.
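
To make the fit-on-train, apply-to-both pattern concrete, here is a sketch of robust scaling written directly in NumPy. The helpers `fit_robust` and `apply_robust` are hypothetical and do not reflect how mllabiome implements `ROBUST` internally.

```python
import numpy as np

def fit_robust(X_train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Learn the per-feature median and interquartile range from training data only."""
    median = np.median(X_train, axis=0)
    q1, q3 = np.percentile(X_train, [25, 75], axis=0)
    iqr = q3 - q1
    iqr[iqr == 0] = 1.0  # avoid division by zero for constant features
    return median, iqr

def apply_robust(X: np.ndarray, median: np.ndarray, iqr: np.ndarray) -> np.ndarray:
    """Scale any data set with statistics learned on the training set."""
    return (X - median) / iqr

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]])
X_test = np.array([[3.0, 30.0]])

median, iqr = fit_robust(X_train)
X_train_scaled = apply_robust(X_train, median, iqr)
X_test_scaled = apply_robust(X_test, median, iqr)  # test data never influences the fit
```

The outlier of 100 in the first feature barely affects the fitted median and interquartile range, which is the sense in which this scaling is less sensitive to outliers than a z-score.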

Special-purpose transformations

Type	Description
`CHI_SQUARE`	Chi-square scaling, transforming each feature by its chi-square statistic relative to the sample total.
`MGM_ENCODING`	Neural encoder output from a pre-trained MGM model. Requires a compatible model checkpoint and the `mgm` optional dependency (install with `uv pip install -e ".[mgm]"`; see Optional extras). Not interchangeable with the transformations above.

Predefined configuration set

operations/transforms_configs.py provides a ready-to-use list covering 23 transformation types, each included both with and without prior relative-abundance normalisation. MGM_ENCODING is excluded because it requires a separate model checkpoint.


from operations.transforms_configs import transforms_configs
 
config = mll.ExperimentConfiguration(
    ...
    transformation_configs=transforms_configs,
    ...
)
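
The idea of pairing every type with both normalisation settings can be sketched with stand-in classes. The `TransformationType` enum and `TransformationConfig` dataclass below are simplified stand-ins defined only for this example; they are not the mllabiome classes, and the actual contents of transforms_configs.py may be built differently.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class TransformationType(Enum):  # stand-in with three members, not the library's enum
    NONE = "none"
    CLR = "clr"
    MGM_ENCODING = "mgm_encoding"

@dataclass(frozen=True)
class TransformationConfig:  # stand-in mirroring the two fields described earlier
    transform_type: TransformationType
    normalize_to_relative: bool

# Types left out of the sweep, mirroring the exclusion described above.
EXCLUDED = {TransformationType.MGM_ENCODING}

configs = [
    TransformationConfig(transform_type=t, normalize_to_relative=flag)
    for t, flag in product(TransformationType, (False, True))
    if t not in EXCLUDED
]
```

With the two remaining stand-in types this yields four configurations; applying the same pairing to 23 types would yield 46.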

Custom transformations

New transformations can be added at runtime using the @register_transformation decorator. Once registered, a custom transformation is available through TransformationConfig and participates in the experiment sweep alongside built-in types.

Row-wise transformations

A row-wise transformation processes each sample independently. Subclass BaseTransformer and implement transform:


import numpy as np
import pandas as pd
from mllabiome import register_transformation
from mllabiome.data_space.processing.compositional_transformation import BaseTransformer
 
@register_transformation(
    name="robust_clr",
    description="CLR with median centering instead of mean",
)
class RobustCLRTransformer(BaseTransformer):
    def transform(self, sample: pd.DataFrame) -> pd.DataFrame:
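        # The sample is a single row; work on a flat 1-D array of its values.
        values = sample.values.flatten()
        # Replace exact zeros with a small pseudocount so the log is defined.
        values = np.where(values == 0, 1e-10, values)
        log_values = np.log(values)
        # Centre on the median of the logs rather than the mean used by CLR.
        centered = log_values - np.median(log_values)
        # Rebuild a 1 x n_features frame with the original index and columns.
        return pd.DataFrame(
            centered.reshape(1, -1),
            index=sample.index,
            columns=sample.columns,
        )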

The sample argument is a single-row DataFrame (1 x n_features). The returned DataFrame must preserve the same index and columns.
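
The contract described above (single-row input, same index and columns in the output) can be checked with a small helper before a transformation is registered. `check_row_contract` and `sqrt_row` are hypothetical names used only for this sketch and are not part of mllabiome.

```python
import numpy as np
import pandas as pd

def check_row_contract(transform, n_features: int = 4) -> None:
    """Raise AssertionError if `transform` changes the shape, index or columns."""
    sample = pd.DataFrame(
        [np.linspace(0.0, 1.0, n_features)],
        index=["sample_1"],
        columns=[f"taxon_{i}" for i in range(n_features)],
    )
    result = transform(sample)
    assert result.shape == sample.shape
    assert result.index.equals(sample.index)
    assert result.columns.equals(sample.columns)

def sqrt_row(sample: pd.DataFrame) -> pd.DataFrame:
    # Applying a NumPy ufunc to a DataFrame keeps its index and columns.
    return np.sqrt(sample)

check_row_contract(sqrt_row)  # passes silently
```

The same helper could be given the bound transform method of a transformer instance, assuming the class can be constructed without arguments.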

Batch transformations

A batch transformation fits parameters on training data and applies the learned transformation to both train and test sets. Subclass BatchTransformer and implement fit_transform_batch:


import numpy as np
import pandas as pd
from mllabiome import register_transformation
from mllabiome.data_space.processing.compositional_transformation import BatchTransformer
 
@register_transformation(
    name="percentile_norm",
    description="Percentile-based normalization fitted on training data",
    is_batch=True,
)
class PercentileNormTransformer(BatchTransformer):
    def __init__(self, lower: float = 5, upper: float = 95):
        self.lower = lower
        self.upper = upper
 
    def fit_transform_batch(
        self,
        X_train: pd.DataFrame,
        X_test: pd.DataFrame,
    ) -> tuple[pd.DataFrame, pd.DataFrame]:
        p_low = np.percentile(X_train.values, self.lower, axis=0)
        p_high = np.percentile(X_train.values, self.upper, axis=0)
        range_ = p_high - p_low
        range_[range_ == 0] = 1.0
 
        X_train_norm = (X_train - p_low) / range_
        X_test_norm = (X_test - p_low) / range_
        return X_train_norm.clip(0, 1), X_test_norm.clip(0, 1)
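
To see what this normalisation does to values the training set never covered, the sketch below repeats the same arithmetic as a standalone function on toy data. The function `percentile_norm` is a hypothetical stand-in that needs neither mllabiome nor the registry.

```python
import numpy as np
import pandas as pd

def percentile_norm(X_train, X_test, lower=5, upper=95):
    """Scale both frames with percentiles computed on the training frame only."""
    p_low = np.percentile(X_train.values, lower, axis=0)
    p_high = np.percentile(X_train.values, upper, axis=0)
    range_ = p_high - p_low
    range_[range_ == 0] = 1.0  # constant features would otherwise divide by zero
    return (
        ((X_train - p_low) / range_).clip(0, 1),
        ((X_test - p_low) / range_).clip(0, 1),
    )

# Feature "a" spans 0..100 in training; feature "b" is constant.
X_train = pd.DataFrame({"a": np.arange(101.0), "b": 7.0})
X_test = pd.DataFrame({"a": [-10.0, 50.0, 200.0], "b": [7.0, 8.0, 6.0]})

X_train_norm, X_test_norm = percentile_norm(X_train, X_test)
```

Test values below the training 5th percentile or above the 95th are clipped to 0 and 1 respectively, so the test set cannot widen the range learned from training data.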

Decorator parameters

Parameter	Type	Default	Description
`name`	`str`	required	Unique identifier used in `TransformationConfig`.
`description`	`str`	`""`	Human-readable summary.
`is_batch`	`bool`	`False`	Whether the transform requires fitting on training data.
`formula`	`str`	`""`	Optional LaTeX formula for documentation.
`requires_positive`	`bool`	`False`	Whether input values must be positive.
`changes_dimensions`	`bool`	`False`	Whether the output has a different number of features than the input.

Using a custom transformation in an experiment

The registration module must be imported before the experiment runs so the decorator executes:


# my_transforms.py — defines and registers the transformation (as above)
 
# experiment.py
import my_transforms  # triggers registration
import mllabiome as mll
 
config = mll.ExperimentConfiguration(
    ...
    transformation_configs=[
        mll.TransformationConfig(transform_type="robust_clr", normalize_to_relative=True),
        mll.TransformationConfig(transform_type=mll.TransformationType.CLR),
    ],
    ...
)

Custom and built-in transformations can be mixed freely in the same experiment.

Listing registered transformations


from mllabiome import get_all_transformations, get_row_transformations, get_batch_transformations
 
print(get_all_transformations())    # all registered names
print(get_row_transformations())    # row-wise only
print(get_batch_transformations())  # batch only