
Transformation types

A TransformationConfig specifies how raw abundance values are transformed before they are passed to a model. The transform_type field selects the mathematical operation applied to each sample, and normalize_to_relative controls whether counts are converted to relative abundances before that operation.

    import mllabiome as mll

    mll.TransformationConfig(
        transform_type=mll.TransformationType.ARCSIN_SQRT,
        normalize_to_relative=True,
    )

Available transformations

Identity and basic

| Type | Description |
| --- | --- |
| NONE | No transformation. Raw counts or relative abundances are used as-is. |
| TSS | Total sum scaling. Each sample is divided by its total, producing relative abundances. |
| BINARY | Presence/absence encoding. All non-zero values are set to 1. |
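
As a rough illustration of what TSS and BINARY compute, here is a minimal NumPy sketch. It is not mllabiome's implementation: the function names are hypothetical, and leaving an all-zero sample as zeros is an assumption made only to avoid division by zero.

```python
import numpy as np

# Toy count table: rows are samples, columns are features (taxa).
counts = np.array([[10.0, 0.0, 30.0],
                   [5.0, 5.0, 0.0]])


def total_sum_scaling(x: np.ndarray) -> np.ndarray:
    """Divide each sample (row) by its total so that the row sums to 1."""
    totals = x.sum(axis=1, keepdims=True)
    # Assumption: an all-zero sample stays all-zero instead of dividing by 0.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return x / safe_totals


def presence_absence(x: np.ndarray) -> np.ndarray:
    """Set every non-zero value to 1 and keep zeros as 0."""
    return (x != 0).astype(float)


tss = total_sum_scaling(counts)
binary = presence_absence(counts)
```

After TSS every non-empty sample sums to 1, which is the "relative abundance" form that normalize_to_relative refers to.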

Log-ratio transformations (compositional data)

These transformations are designed for compositional data where absolute values carry no meaning and only ratios between components are informative.

| Type | Description |
| --- | --- |
| CLR | Centred log-ratio. Each value is divided by the geometric mean of the sample, then log-transformed. |
| ALR | Additive log-ratio. Each value is expressed as a log-ratio relative to a reference component. |
| ILR | Isometric log-ratio. Projects the composition into an unconstrained Euclidean space via an orthonormal basis. |
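
The CLR and ALR definitions can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: the function names, the pseudocount of 1e-6 used to handle zeros, and the choice of the last component as the ALR reference are all hypothetical.

```python
import numpy as np


def clr(x: np.ndarray, pseudocount: float = 1e-6) -> np.ndarray:
    """Centred log-ratio: log(x_i / g(x)), where g(x) is the geometric mean."""
    log_x = np.log(x + pseudocount)
    # Subtracting the mean of the logs equals dividing by the geometric mean.
    return log_x - log_x.mean(axis=-1, keepdims=True)


def alr(x: np.ndarray, reference: int = -1, pseudocount: float = 1e-6) -> np.ndarray:
    """Additive log-ratio: log(x_i / x_ref); the reference column is dropped."""
    log_x = np.log(x + pseudocount)
    ratios = log_x - log_x[..., [reference]]
    return np.delete(ratios, reference, axis=-1)


sample = np.array([0.2, 0.3, 0.5])
clr_values = clr(sample)  # three values that sum to zero
alr_values = alr(sample)  # two values: log(0.2 / 0.5) and log(0.3 / 0.5)
```

Multiplying the sample by a constant leaves both results essentially unchanged (up to the pseudocount), which is the sense in which only ratios between components are informative.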

Root and trigonometric transformations

| Type | Description |
| --- | --- |
| SQRT | Square root of each value. |
| HELLINGER | Square root of relative abundances. Equivalent to the Hellinger transformation used in ordination. |
| ARCSIN | Arc-sine of each value. |
| ARCSIN_SQRT | Arc-sine of the square root of each value. A variance-stabilising transformation for proportions. |
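
The following NumPy sketch shows these operations on a vector of proportions. It is illustrative only and does not use mllabiome. Because arc-sine is defined only on [-1, 1], ARCSIN and ARCSIN_SQRT are meaningful on proportions rather than raw counts, which is presumably why the opening example sets normalize_to_relative=True.

```python
import numpy as np

# Relative abundances for one sample (they sum to 1).
proportions = np.array([0.0, 0.04, 0.25, 0.71])

sqrt_values = np.sqrt(proportions)                     # SQRT; HELLINGER when the input is relative
arcsin_values = np.arcsin(proportions)                 # ARCSIN
arcsin_sqrt_values = np.arcsin(np.sqrt(proportions))   # ARCSIN_SQRT

# Squared Hellinger-transformed values recover the proportions,
# so they sum to 1 again.
hellinger_norm = float((sqrt_values ** 2).sum())
```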

Log transformations

| Type | Description |
| --- | --- |
| LOG | Natural logarithm. A small pseudocount is added to handle zeros. |
| LOG10 | Base-10 logarithm with pseudocount. |
| LOG2 | Base-2 logarithm with pseudocount. |
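
A short sketch of why the pseudocount matters: without it, zeros map to negative infinity. The pseudocount value below is an assumption chosen for illustration; the value mllabiome actually uses is not stated here.

```python
import numpy as np

PSEUDOCOUNT = 1e-6  # assumed value, for illustration only

values = np.array([0.0, 1.0, 9.0, 99.0])

log_e = np.log(values + PSEUDOCOUNT)     # LOG
log_10 = np.log10(values + PSEUDOCOUNT)  # LOG10
log_2 = np.log2(values + PSEUDOCOUNT)    # LOG2
```

The three variants differ only by a constant factor (for example, log_10 equals log_e divided by ln 10), so they affect scale-sensitive models differently but preserve the ordering of values.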

Rank-based normalisation

These transformations were identified as strong performers in benchmarking studies of microbiome classification pipelines.

| Type | Description |
| --- | --- |
| RANK_STD | Feature ranks followed by z-score standardisation. |
| RANK_UNIT | Feature ranks divided by the square root of the sum of squared ranks. |
| LOG_STD | Natural log followed by z-score standardisation. |
| LOG_UNIT | Natural log followed by unit-norm scaling. |
| ZSCORE | Z-score standardisation of raw values without prior rank or log step. |
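
The two rank-based variants can be sketched as follows. Several details are assumptions rather than documented behaviour: ranks are taken across the features of a single sample, ties receive their average rank, and the z-score uses the mean and standard deviation of that same rank vector. The function names are hypothetical.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # highest rank in each tie group
    mean_rank = last_rank - (counts - 1) / 2.0  # centre of each tie group
    return mean_rank[inverse]


def rank_std(x: np.ndarray) -> np.ndarray:
    """Ranks followed by z-score standardisation (mean 0, std 1)."""
    ranks = average_ranks(x)
    return (ranks - ranks.mean()) / ranks.std()


def rank_unit(x: np.ndarray) -> np.ndarray:
    """Ranks divided by the square root of the sum of squared ranks."""
    ranks = average_ranks(x)
    return ranks / np.sqrt((ranks ** 2).sum())


sample = np.array([0.0, 5.0, 1.0, 5.0])
ranks = average_ranks(sample)      # array([1. , 3.5, 2. , 3.5])
standardised = rank_std(sample)
unit_scaled = rank_unit(sample)
```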

Variance-stabilising transformations

These transformations use batch-level statistics. Their parameters are fitted on the training set and then applied to the data, so they are not purely per-sample operations.

| Type | Description |
| --- | --- |
| POWER | Yeo-Johnson power transformation. Stabilises variance and reduces skewness. |
| BOXCOX | Box-Cox transformation. Requires strictly positive values. |
| ROBUST | Robust scaling using median and interquartile range. Less sensitive to outliers than z-score. |
| QUANTILE | Quantile transformation mapping values to a uniform or normal distribution. |
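
To make the fit-then-apply pattern concrete, here is a minimal NumPy sketch of robust scaling in which the median and interquartile range are learned from training data only and then reused on test data. The helper names are hypothetical and this is not how mllabiome implements ROBUST; replacing a zero interquartile range with 1 is an assumption to avoid division by zero.

```python
import numpy as np


def fit_robust_scaler(x_train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Learn per-feature median and interquartile range from training data."""
    median = np.median(x_train, axis=0)
    q1, q3 = np.percentile(x_train, [25, 75], axis=0)
    iqr = q3 - q1
    # Assumption: constant features get a scale of 1 to avoid division by 0.
    iqr = np.where(iqr == 0, 1.0, iqr)
    return median, iqr


def apply_robust_scaler(x: np.ndarray, median: np.ndarray, iqr: np.ndarray) -> np.ndarray:
    """Apply previously fitted statistics; nothing is re-estimated here."""
    return (x - median) / iqr


x_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0],
                    [100.0, 50.0]])  # 100.0 is an outlier in the first feature
x_test = np.array([[3.0, 30.0]])

median, iqr = fit_robust_scaler(x_train)
x_train_scaled = apply_robust_scaler(x_train, median, iqr)
x_test_scaled = apply_robust_scaler(x_test, median, iqr)
```

The outlier barely moves the median and interquartile range, which is why this scaling is less sensitive to outliers than a z-score. Fitting on the training set alone keeps test-set information out of the learned parameters.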

Special-purpose transformations

| Type | Description |
| --- | --- |
| CHI_SQUARE | Chi-square scaling, transforming each feature by its chi-square statistic relative to the sample total. |
| MGM_ENCODING | Neural encoder output from a pre-trained MGM model. Requires a compatible model checkpoint and is not interchangeable with the transformations above. Requires the mgm optional dependency: install with uv pip install -e ".[mgm]" (see Optional extras). |

Predefined configuration set

operations/transforms_configs.py provides a ready-to-use list covering 23 transformation types, each evaluated both with and without prior relative-abundance normalisation. MGM_ENCODING is excluded as it requires a separate model checkpoint.

    import mllabiome as mll
    from operations.transforms_configs import transforms_configs

    config = mll.ExperimentConfiguration(
        ...
        transformation_configs=transforms_configs,
        ...
    )
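
The shape of such a configuration set can be pictured as a cross product of transformation names and the two normalisation settings. The sketch below uses plain dictionaries rather than TransformationConfig objects and assumes the set is a simple cross product; the actual contents and ordering of transforms_configs.py are not shown here and may differ.

```python
from itertools import product

# The 23 built-in types listed above, excluding MGM_ENCODING.
transform_names = [
    "NONE", "TSS", "BINARY",
    "CLR", "ALR", "ILR",
    "SQRT", "HELLINGER", "ARCSIN", "ARCSIN_SQRT",
    "LOG", "LOG10", "LOG2",
    "RANK_STD", "RANK_UNIT", "LOG_STD", "LOG_UNIT", "ZSCORE",
    "POWER", "BOXCOX", "ROBUST", "QUANTILE",
    "CHI_SQUARE",
]

# Hypothetical stand-in for a list of TransformationConfig objects:
# every type paired with normalize_to_relative False and True.
grid = [
    {"transform_type": name, "normalize_to_relative": flag}
    for name, flag in product(transform_names, (False, True))
]
```

Under that assumption the sweep contains 46 configurations, two per transformation type.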

Custom transformations

New transformations can be added at runtime using the @register_transformation decorator. Once registered, a custom transformation is available through TransformationConfig and participates in the experiment sweep alongside built-in types.

Row-wise transformations

A row-wise transformation processes each sample independently. Subclass BaseTransformer and implement transform:

    import numpy as np
    import pandas as pd

    from mllabiome import register_transformation
    from mllabiome.data_space.processing.compositional_transformation import BaseTransformer


    @register_transformation(
        name="robust_clr",
        description="CLR with median centering instead of mean",
    )
    class RobustCLRTransformer(BaseTransformer):
        def transform(self, sample: pd.DataFrame) -> pd.DataFrame:
            values = sample.values.flatten()
            # Replace zeros with a tiny value so the logarithm is defined.
            values = np.where(values == 0, 1e-10, values)
            log_values = np.log(values)
            # Centre on the median of the logs rather than their mean.
            centered = log_values - np.median(log_values)
            return pd.DataFrame(
                centered.reshape(1, -1),
                index=sample.index,
                columns=sample.columns,
            )

The sample argument is a single-row DataFrame (1 x n_features). The returned DataFrame must preserve the same index and columns.
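
Before registering a row-wise transformation, it can help to check this contract on a toy sample. The sketch below repeats the median-centred CLR logic as a plain function (hypothetical name, no mllabiome imports) so that it runs on its own, and then verifies that shape, index, and columns are preserved.

```python
import numpy as np
import pandas as pd


def median_centred_clr(sample: pd.DataFrame) -> pd.DataFrame:
    """Stand-alone copy of the transform logic, for testing the contract."""
    values = sample.to_numpy(dtype=float).flatten()
    values = np.where(values == 0, 1e-10, values)  # avoid log(0)
    log_values = np.log(values)
    centred = log_values - np.median(log_values)
    return pd.DataFrame(
        centred.reshape(1, -1),
        index=sample.index,
        columns=sample.columns,
    )


# A single-row DataFrame (1 x n_features), as a row-wise transformer receives.
sample = pd.DataFrame(
    [[0.5, 0.0, 0.25, 0.25]],
    index=["sample_1"],
    columns=["taxon_a", "taxon_b", "taxon_c", "taxon_d"],
)

result = median_centred_clr(sample)

# Contract checks: same shape, same index, same columns.
shape_ok = result.shape == sample.shape
index_ok = result.index.equals(sample.index)
columns_ok = result.columns.equals(sample.columns)
```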

Batch transformations

A batch transformation fits parameters on training data and applies the learned transformation to both train and test sets. Subclass BatchTransformer and implement fit_transform_batch:

    import numpy as np
    import pandas as pd

    from mllabiome import register_transformation
    from mllabiome.data_space.processing.compositional_transformation import BatchTransformer


    @register_transformation(
        name="percentile_norm",
        description="Percentile-based normalization fitted on training data",
        is_batch=True,
    )
    class PercentileNormTransformer(BatchTransformer):
        def __init__(self, lower: float = 5, upper: float = 95):
            self.lower = lower
            self.upper = upper

        def fit_transform_batch(
            self,
            X_train: pd.DataFrame,
            X_test: pd.DataFrame,
        ) -> tuple[pd.DataFrame, pd.DataFrame]:
            # Per-feature percentiles are estimated from the training set only.
            p_low = np.percentile(X_train.values, self.lower, axis=0)
            p_high = np.percentile(X_train.values, self.upper, axis=0)
            range_ = p_high - p_low
            # Guard against division by zero for constant features.
            range_[range_ == 0] = 1.0
            X_train_norm = (X_train - p_low) / range_
            X_test_norm = (X_test - p_low) / range_
            return X_train_norm.clip(0, 1), X_test_norm.clip(0, 1)

Decorator parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Unique identifier used in TransformationConfig. |
| description | str | "" | Human-readable summary. |
| is_batch | bool | False | Whether the transform requires fitting on training data. |
| formula | str | "" | Optional LaTeX formula for documentation. |
| requires_positive | bool | False | Whether input values must be positive. |
| changes_dimensions | bool | False | Whether the output has a different number of features than the input. |

Using a custom transformation in an experiment

The registration module must be imported before the experiment runs so the decorator executes:

    # my_transforms.py: defines and registers the transformation (as above)

    # experiment.py
    import my_transforms  # triggers registration
    import mllabiome as mll

    config = mll.ExperimentConfiguration(
        ...
        transformation_configs=[
            mll.TransformationConfig(transform_type="robust_clr", normalize_to_relative=True),
            mll.TransformationConfig(transform_type=mll.TransformationType.CLR),
        ],
        ...
    )

Custom and built-in transformations can be mixed freely in the same experiment.

Listing registered transformations

    from mllabiome import (
        get_all_transformations,
        get_batch_transformations,
        get_row_transformations,
    )

    print(get_all_transformations())    # all registered names
    print(get_row_transformations())    # row-wise only
    print(get_batch_transformations())  # batch only