DQM-ML CLI Wrapper

The main CLI entry point for DQM-ML. It consolidates the modular DQM-ML packages behind a single command-line interface.

Installation

# Basic installation (core only)
pip install dqm-ml

# Installation with optional components
pip install "dqm-ml[all]"        # everything
pip install "dqm-ml[job]"        # core + job
pip install "dqm-ml[pytorch]"    # core + pytorch
pip install "dqm-ml[images]"     # core + images
pip install "dqm-ml[notebooks]"  # core + Jupyter support

Quick Start

Process a Dataset

Run a data quality pipeline from a configuration file:

dqm-ml process -p config.yaml
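
When the pipeline is launched from a script or scheduled job, it can help to check the inputs before calling the CLI. The following is a minimal sketch that shells out to the `dqm-ml process -p` command shown above. The helper names `build_process_command` and `run_pipeline` are hypothetical, and the sketch assumes (it is not stated here) that the CLI follows the usual convention of a non-zero exit code on failure.

```python
import shutil
import subprocess
from pathlib import Path


def build_process_command(config_path):
    """Return the argv list for `dqm-ml process -p <config>`."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"config not found: {path}")
    return ["dqm-ml", "process", "-p", str(path)]


def run_pipeline(config_path):
    """Run the pipeline and return the exit code of the CLI process."""
    if shutil.which("dqm-ml") is None:
        raise RuntimeError("dqm-ml executable not found on PATH")
    # check=False: the caller decides how to react to a non-zero exit code
    # (assumed, not documented here, to signal a failed run).
    completed = subprocess.run(build_process_command(config_path), check=False)
    return completed.returncode
```

A caller would typically use `run_pipeline("config.yaml")` and stop the surrounding job when the returned code is not zero.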

List Available Plugins

Show all registered metrics and data loaders:

dqm-ml list

Check Version

dqm-ml version

Commands

Command    Description
process    Execute a data quality pipeline from a YAML config
list       Show all available plugins (metrics, loaders)
version    Display version information

Configuration

DQM-ML uses YAML configuration files to define:

- Data sources (dataloaders)
- Metrics to compute (metrics_processor)
- Output settings (outputs)
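
To make the three-section layout concrete, here is a small sketch that inspects an already parsed configuration (for example, the dictionary a YAML parser returns) and reports what each section contains. The function names are hypothetical, and the sketch does not assume that any section is mandatory or that DQM-ML rejects other top-level keys; it only flags them as possible typos.

```python
# Section names are taken from the list above; everything else is illustrative.
KNOWN_SECTIONS = ("dataloaders", "metrics_processor", "outputs")


def summarize_config(config):
    """Map each known top-level section to the sorted names of its entries."""
    if not isinstance(config, dict):
        raise TypeError("expected the parsed YAML document to be a mapping")
    summary = {}
    for section in KNOWN_SECTIONS:
        entries = config.get(section) or {}  # a missing section counts as empty
        summary[section] = sorted(entries)
    return summary


def unknown_sections(config):
    """Return top-level keys that are not one of the three known sections."""
    return sorted(key for key in config if key not in KNOWN_SECTIONS)


config = {
    "dataloaders": {"train": {"type": "parquet", "path": "data/train.parquet"}},
    "metrics_processor": {
        "completeness": {"type": "completeness", "input_columns": ["col_a", "col_b"]}
    },
}
print(summarize_config(config))
# {'dataloaders': ['train'], 'metrics_processor': ['completeness'], 'outputs': []}
```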

Completeness Example

dataloaders:
  train:
    type: parquet
    path: data/train.parquet

metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]
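
As an illustration of what a completeness score usually expresses, the sketch below computes the share of non-missing values per column. It assumes that "missing" means `None`; how DQM-ML treats NaN or empty strings is not covered here, and this is not the library's implementation.

```python
def completeness(rows, columns):
    """Fraction of non-missing values per column (1.0 means fully populated)."""
    if not rows:
        raise ValueError("rows must not be empty")
    scores = {}
    for col in columns:
        # A value counts as present when the key exists and is not None.
        present = sum(1 for row in rows if row.get(col) is not None)
        scores[col] = present / len(rows)
    return scores


rows = [
    {"col_a": 1, "col_b": "x"},
    {"col_a": None, "col_b": "y"},
    {"col_a": 3, "col_b": None},
    {"col_a": 4, "col_b": "z"},
]
print(completeness(rows, ["col_a", "col_b"]))
# {'col_a': 0.75, 'col_b': 0.75}
```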

Representativeness Example

dataloaders:
  train:
    type: parquet
    path: data/train.parquet

metrics_processor:
  representativeness:
    type: representativeness
    input_columns: [feature_x, feature_y]
    distribution: "normal"
    metrics: ["chi-square", "kolmogorov-smirnov"]
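
For intuition about the `kolmogorov-smirnov` option, the following sketch computes a one-sample Kolmogorov-Smirnov statistic against a normal distribution using only the standard library. It assumes the reference normal is fitted to the sample's own mean and standard deviation; whether DQM-ML estimates the parameters this way is not stated here, so treat it as an illustration of the statistic rather than of the tool.

```python
from statistics import NormalDist, fmean, stdev


def ks_statistic_normal(sample):
    """One-sample KS statistic against a normal fitted to the sample."""
    if len(sample) < 2:
        raise ValueError("need at least two values")
    sigma = stdev(sample)
    if sigma == 0:
        raise ValueError("sample has zero variance")
    reference = NormalDist(fmean(sample), sigma)
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        cdf = reference.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Evenly spaced normal quantiles give a sample that is close to normal,
# while powers of two give a strongly right-skewed one.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [2.0 ** i for i in range(12)]
print(ks_statistic_normal(normal_like), ks_statistic_normal(skewed))
```

The statistic lies between 0 and 1, and smaller values mean the empirical distribution is closer to the reference; the skewed sample yields a clearly larger value than the near-normal one.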

Domain Gap Example

dataloaders:
  source:
    type: parquet
    path: data/source.parquet
  target:
    type: parquet
    path: data/target.parquet

metrics_processor:
  domain_gap:
    type: domain_gap
    INPUT:
      embedding_col: "features"
    DELTA:
      metric: "mmd_linear"
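
The name `mmd_linear` suggests a maximum mean discrepancy with a linear kernel. Under that assumption, the (biased) squared MMD reduces to the squared Euclidean distance between the mean embeddings of the two datasets. The sketch below shows that computation on plain lists; it is not DQM-ML's code, and the library's exact estimator may differ.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def mmd_linear(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    if not source or not target:
        raise ValueError("both embedding sets must be non-empty")
    dim = len(source[0])
    if any(len(v) != dim for v in source + target):
        raise ValueError("all embeddings must have the same dimension")
    mean_s = mean_vector(source)
    mean_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


source = [[0.0, 0.0], [2.0, 2.0]]  # mean embedding (1, 1)
target = [[1.0, 1.0], [3.0, 3.0]]  # mean embedding (2, 2)
print(mmd_linear(source, target))  # 2.0
print(mmd_linear(source, source))  # 0.0
```

A value of zero means the two mean embeddings coincide; because a linear kernel only compares means, two datasets with different spreads but equal means also score zero.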

Visual Features Example

dataloaders:
  images:
    type: parquet
    path: data/images.parquet

metrics_processor:
  visual:
    type: visual_metric
    input_columns: ["image_data"]
    grayscale: true
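
To illustrate what a `grayscale: true` setting typically implies, the sketch below converts RGB pixels to a single luma channel and computes a simple brightness feature from it. Both the ITU-R BT.601 weights and the mean-brightness feature are assumptions chosen for illustration; which conversion and which features `visual_metric` uses are not described here.

```python
def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples to luma values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in pixels]


def mean_brightness(gray):
    """Average luma over all pixels of a grayscale image."""
    values = [v for row in gray for v in row]
    if not values:
        raise ValueError("image has no pixels")
    return sum(values) / len(values)


# A 2x2 image: one white pixel and three black pixels.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(0, 0, 0), (0, 0, 0)],
]
print(mean_brightness(to_grayscale(image)))  # approximately 63.75
```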

Multiple Metrics Example

dataloaders:
  train:
    type: parquet
    path: data/train.parquet

metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]

  representativeness:
    type: representativeness
    input_columns: [feature_x]
    distribution: "normal"

See Also