
Domain Gap Metric

The Domain Gap metric measures the statistical distance between a source dataset and a target dataset, computed from the distributions of their embeddings. It is used to quantify drift or other differences between datasets.

What It Measures

Domain Gap measures how different two datasets are from each other. Use it to:

  • Compare train/test distributions — Ensure test data resembles training data
  • Detect data drift over time — Monitor distribution shifts in production
  • Validate data augmentation — Check if augmented data maintains original characteristics

Available Distance Metrics

Metric        Full Name                                 Best For
FID           Fréchet Inception Distance                Image embeddings
MMD           Maximum Mean Discrepancy                  General kernel-based
Wasserstein   Earth Mover's Distance                    1D distributions
KLMVN         Kullback-Leibler Multivariate Normal      Gaussian distributions
H-Divergence  Hypothesis-based Divergence               Binary classification
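For reference, the standard textbook definitions behind the closed-form distances in the table are given below. Here the source embeddings have mean $\mu_s$ and covariance $\Sigma_s$, the target embeddings have mean $\mu_t$ and covariance $\Sigma_t$, and $d$ is the embedding dimension. These are the usual formulas only; the exact estimators, kernel choices and KL direction used by the processor are assumptions here and may differ. The H-divergence is omitted because it is estimated by training a classifier rather than by a closed-form expression.

```latex
\begin{aligned}
\mathrm{FID} &= \lVert \mu_s - \mu_t \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_s + \Sigma_t - 2\,(\Sigma_s \Sigma_t)^{1/2}\right) \\
\mathrm{MMD}_k^2 &= \mathbb{E}[k(x,x')] + \mathbb{E}[k(y,y')] - 2\,\mathbb{E}[k(x,y)], \qquad x,x' \sim P_s,\; y,y' \sim P_t \\
\mathrm{MMD}_{\mathrm{linear}}^2 &= \lVert \mu_s - \mu_t \rVert_2^2 \qquad \text{with } k(x,y) = x^\top y \\
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_s,\Sigma_s)\,\Vert\,\mathcal{N}(\mu_t,\Sigma_t)\big) &= \tfrac{1}{2}\Big[\operatorname{Tr}(\Sigma_t^{-1}\Sigma_s) + (\mu_t-\mu_s)^\top \Sigma_t^{-1}(\mu_t-\mu_s) - d + \ln\tfrac{\det\Sigma_t}{\det\Sigma_s}\Big] \\
W_1(u,v) &= \int_{-\infty}^{\infty} \lvert F_u(z) - F_v(z) \rvert \, dz \qquad \text{(1D, } F \text{ = cumulative distribution function)}
\end{aligned}
```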

Use Cases

  • Validate train/test split quality
  • Monitor data drift in production pipelines
  • Compare different data versions
  • Evaluate data augmentation strategies
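As an illustration of the drift-monitoring use case, the sketch below compares each incoming batch of embeddings against a fixed reference set using the squared linear-kernel MMD (the squared distance between mean embeddings). The data is synthetic and the `drift_series` helper is hypothetical; it is not part of dqm-ml-pytorch, and the processor may compute or report the value differently.

```python
import numpy as np


def mmd_linear(reference: np.ndarray, batch: np.ndarray) -> float:
    """Squared linear-kernel MMD: squared Euclidean distance between mean embeddings."""
    diff = reference.mean(axis=0) - batch.mean(axis=0)
    return float(diff @ diff)


def drift_series(reference: np.ndarray, batches: list[np.ndarray]) -> list[float]:
    """Hypothetical helper: domain gap of each batch relative to the same reference set."""
    return [mmd_linear(reference, batch) for batch in batches]


rng = np.random.default_rng(42)
reference = rng.normal(size=(1000, 16))  # stands in for training-set embeddings
# Synthetic production batches whose mean moves further from the reference each step.
batches = [rng.normal(loc=0.1 * step, size=(200, 16)) for step in range(4)]

for step, gap in enumerate(drift_series(reference, batches)):
    print(f"batch {step}: mmd_linear gap = {gap:.4f}")
```

Because every batch is compared with the same reference, a steadily rising series points to gradual drift rather than batch-to-batch noise.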

Processor Information

  • Class: DomainGapProcessor
  • Package: dqm-ml-pytorch
  • Type Name: domain_gap

Supported Distance Metrics

  • fid: Fréchet Inception Distance.
  • klmvn_diag: KL divergence assuming a multivariate normal distribution with diagonal covariance.
  • mmd_linear: Maximum Mean Discrepancy with a linear kernel.
  • wasserstein_1d: 1D Wasserstein distance computed per embedding dimension and aggregated across dimensions.

Configuration Parameters

  • INPUT:
      ◦ embedding_col: Column containing pre-computed feature embeddings.
  • DELTA:
      ◦ metric: The distance metric to compute (one of the supported metrics listed above).

Example YAML Configuration

metrics_processor:
  domain_drift:
    type: domain_gap
    INPUT:
      embedding_col: "resnet_embeddings"
    DELTA:
      metric: "fid"
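A configuration like the one above can be sanity-checked before a run. The sketch below mirrors the YAML as the plain Python dict a YAML loader would produce and validates it against the four supported metric names. The `validate_domain_gap_config` helper is hypothetical and not part of dqm-ml-pytorch; it also assumes that `<metric_name>` in the output key is simply the configured `metric` value.

```python
SUPPORTED_METRICS = {"fid", "klmvn_diag", "mmd_linear", "wasserstein_1d"}


def validate_domain_gap_config(name: str, entry: dict) -> str:
    """Hypothetical check of one metrics_processor entry; returns the expected output key."""
    if entry.get("type") != "domain_gap":
        raise ValueError(f"{name}: type must be 'domain_gap'")
    embedding_col = entry.get("INPUT", {}).get("embedding_col")
    if not embedding_col:
        raise ValueError(f"{name}: INPUT.embedding_col is missing")
    metric = entry.get("DELTA", {}).get("metric")
    if metric not in SUPPORTED_METRICS:
        raise ValueError(
            f"{name}: unsupported metric {metric!r}; expected one of {sorted(SUPPORTED_METRICS)}"
        )
    # Assumption: the output key is "domain_gap_" followed by the metric value.
    return f"domain_gap_{metric}"


# Same structure as the YAML example, after parsing.
config = {
    "metrics_processor": {
        "domain_drift": {
            "type": "domain_gap",
            "INPUT": {"embedding_col": "resnet_embeddings"},
            "DELTA": {"metric": "fid"},
        }
    }
}

for name, entry in config["metrics_processor"].items():
    print(name, "->", validate_domain_gap_config(name, entry))
```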

Output

The processor returns the computed distance value:

  • domain_gap_<metric_name>: The calculated statistical distance.