dqm_ml_job.outputwriter
Output writers module for DQM ML Job.
This module contains classes for writing pipeline results (features and metrics) to various storage backends.
Classes:
| Name | Description |
|---|---|
| `OutputWriter` | Protocol for output writer implementations. |
| `ParquetOutputWriter` | Writer that saves data to Parquet files. |
__all__ = ['OutputWriter', 'ParquetOutputWriter', 'dqml_outputs_registry']
module-attribute
dqml_outputs_registry = {'parquet': ParquetOutputWriter}
module-attribute
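A registry of this shape is commonly used to resolve a writer class from a configuration string. The sketch below illustrates that lookup pattern only; it is self-contained and uses stand-in names. `StubParquetWriter`, `stub_registry`, `build_writer`, and the `"type"` config key are all hypothetical and are not part of `dqm_ml_job`.

```python
from typing import Any, Optional


class StubParquetWriter:
    """Stand-in for ParquetOutputWriter (hypothetical, for illustration only)."""

    def __init__(self, name: str, config: Optional[dict[str, Any]] = None) -> None:
        self.name = name
        self.config = config or {}


# Mirrors the shape of dqml_outputs_registry: backend key -> writer class.
stub_registry: dict[str, type] = {"parquet": StubParquetWriter}


def build_writer(name: str, config: dict[str, Any]) -> Any:
    """Resolve a writer class from the registry and instantiate it.

    Assumes the backend is selected by a "type" key; that key is an assumption.
    """
    backend = config.get("type")
    if backend not in stub_registry:
        raise ValueError(f"Unknown output type: {backend!r}")
    return stub_registry[backend](name, config)


writer = build_writer("features_out", {"type": "parquet", "columns": ["a"]})
```

Keeping the mapping in one place means a new backend can be added by registering one more key, without touching the code that builds writers.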
OutputWriter
Bases: Protocol
Protocol for Output Writers.
Defines the interface for writing pipeline results (features or metrics) to storage.
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/__init__.py
columns: list[str]
instance-attribute
name: str
instance-attribute
write_metrics_dict(metrics_dict: dict[str, dict[str, Any]]) -> None
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/__init__.py
write_table(name: str, table: Any, part_index: int | None = None) -> None
Write a table (features or metrics) to the output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the dataset or metric. | required |
| `table` | `Any` | The data to write (usually a pyarrow Table or dict of arrays). | required |
| `part_index` | `int \| None` | Index of the data part (for chunked writing). | `None` |
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/__init__.py
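Because `OutputWriter` is a `Protocol`, any class with matching attributes and methods satisfies it without inheriting from it. The following is a minimal sketch under stated assumptions: `OutputWriterSketch` is a local copy of the documented interface, and `InMemoryWriter` and the `completeness` metric name are hypothetical, introduced only to show structural typing.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class OutputWriterSketch(Protocol):
    """Local copy of the documented interface (a stand-in, not the real class)."""

    name: str
    columns: list[str]

    def write_metrics_dict(self, metrics_dict: dict[str, dict[str, Any]]) -> None: ...

    def write_table(self, name: str, table: Any, part_index: Optional[int] = None) -> None: ...


class InMemoryWriter:
    """Hypothetical writer that keeps results in dictionaries instead of on disk."""

    def __init__(self, name: str, columns: list[str]) -> None:
        self.name = name
        self.columns = columns
        self.tables: dict[tuple[str, Optional[int]], Any] = {}
        self.metrics: dict[str, dict[str, Any]] = {}

    def write_metrics_dict(self, metrics_dict: dict[str, dict[str, Any]]) -> None:
        # Merge the per-selection metrics into the stored mapping.
        self.metrics.update(metrics_dict)

    def write_table(self, name: str, table: Any, part_index: Optional[int] = None) -> None:
        # Key each chunk by (name, part_index) so parts do not overwrite each other.
        self.tables[(name, part_index)] = table


writer = InMemoryWriter("debug", ["x"])
writer.write_table("train", {"x": [1, 2, 3]}, part_index=0)
writer.write_metrics_dict({"train": {"completeness": 1.0}})
```

A writer like this can be handy in tests, since it exercises the same interface without touching the filesystem.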
ParquetOutputWriter
Output writer that saves processed features to a Parquet file.
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
columns = config['columns']
instance-attribute
name = name
instance-attribute
path_pattern = config['path_pattern']
instance-attribute
__init__(name: str, config: dict[str, Any] | None = None)
Initialize a ParquetOutputWriter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Unique name for this output writer. | required |
| `config` | `dict[str, Any] \| None` | Configuration dictionary with keys: `path_pattern` (`str`), the output file path format string; `columns` (`list[str]`), the columns to save. | `None` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If required config keys are missing. |
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
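The documented contract (two required keys, `ValueError` when one is missing) can be sketched as a small validation helper. This is an illustration of the described behaviour, not the package's code: `validate_parquet_config` and `REQUIRED_KEYS` are hypothetical names, and the `{name}` placeholder in the example pattern is an assumption about the format string.

```python
from typing import Any, Optional

# The two keys named in the parameter description above.
REQUIRED_KEYS = ("path_pattern", "columns")


def validate_parquet_config(config: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return the config if all required keys are present, else raise ValueError."""
    config = config or {}
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing required config keys: {missing}")
    return config


# A complete config passes through unchanged.
ok = validate_parquet_config({"path_pattern": "out/{name}.parquet", "columns": ["a", "b"]})

# A config without path_pattern is rejected.
try:
    validate_parquet_config({"columns": ["a"]})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Note that under this sketch the default `config=None` also raises, since an empty configuration has neither required key.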
write_metrics_dict(metrics_dict: dict[str, dict[str, Any]]) -> None
Aggregate and write dataset-level metrics for all selections to a Parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metrics_dict` | `dict[str, dict[str, Any]]` | Map of selection names to their computed metric dictionaries. | required |
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
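One plausible way to aggregate a nested metrics mapping into a single table is to flatten it into columns, one row per selection. The sketch below uses only the standard library and makes several assumptions: `metrics_to_columns`, the `selection` column name, and the metric names are hypothetical, and filling absent metrics with `None` is a design choice of this example rather than documented behaviour.

```python
from typing import Any


def metrics_to_columns(metrics_dict: dict[str, dict[str, Any]]) -> dict[str, list[Any]]:
    """Flatten {selection: {metric: value}} into column-oriented lists."""
    # Collect metric names across all selections, preserving first-seen order.
    metric_names: list[str] = []
    for metrics in metrics_dict.values():
        for key in metrics:
            if key not in metric_names:
                metric_names.append(key)

    # One row per selection; the first column holds the selection name.
    columns: dict[str, list[Any]] = {"selection": list(metrics_dict)}
    for key in metric_names:
        # Missing metrics become None so every column has the same length.
        columns[key] = [metrics.get(key) for metrics in metrics_dict.values()]
    return columns


columns = metrics_to_columns(
    {"train": {"completeness": 0.98, "rows": 100}, "test": {"completeness": 0.95}}
)
```

A column-oriented mapping with equal-length lists is the shape that `pyarrow.Table.from_pydict` accepts, so the result could be converted to a table before being written.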
write_table(path_pattern: str, features_array: dict[str, Any], part: int | None = None) -> None
Write a table of features or metrics to a Parquet file on disk.
Handles directory creation if the target path doesn't exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path_pattern` | `str` | Identifier for the data destination (used in the filename pattern). | required |
| `features_array` | `dict[str, Any]` | Map of column names to pyarrow Arrays. | required |
| `part` | `int \| None` | Optional partition index for chunked output. | `None` |
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
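The directory handling described above can be sketched with `pathlib`. This is a minimal illustration under assumptions, not the writer's implementation: `resolve_output_path` is a hypothetical helper, the `{name}` and `{part}` placeholders are an assumed pattern syntax, and the zero-padded part label and the `all` fallback are choices made for this example. The Parquet write itself is omitted.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_output_path(path_pattern: str, name: str, part: Optional[int] = None) -> Path:
    """Format the pattern and make sure the parent directory exists."""
    # "{name}" and "{part}" are hypothetical placeholders for this sketch.
    part_label = "all" if part is None else f"{part:05d}"
    path = Path(path_pattern.format(name=name, part=part_label))
    # Create missing parent directories; do nothing if they already exist.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    pattern = str(Path(tmp) / "features" / "{name}-{part}.parquet")
    target = resolve_output_path(pattern, "train", part=3)
    parent_created = target.parent.is_dir()
    file_name = target.name
```

Using `exist_ok=True` keeps repeated calls for successive parts from failing once the directory has been created.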