dqm_ml_job.outputwriter.parquet
Parquet output writer for persisting pipeline results.
This module contains the ParquetOutputWriter class that writes metrics and features to Parquet files.
logger = logging.getLogger(__name__)
module-attribute
ParquetOutputWriter
Output writer that saves processed features to a Parquet file.
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
columns = config['columns']
instance-attribute
name = name
instance-attribute
path_pattern = config['path_pattern']
instance-attribute
__init__(name: str, config: dict[str, Any] | None = None)
Initialize a ParquetOutputWriter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Unique name for this output writer. | required |
| `config` | `dict[str, Any] \| None` | Configuration dictionary with keys: `path_pattern` (str), the output file path format string; `columns` (List[str]), the columns to save. | `None` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If required config keys are missing. |
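The constructor contract above (two required keys, `ValueError` when one is absent) can be illustrated with a minimal sketch. `ConfigCheckedWriter` and `REQUIRED_KEYS` are hypothetical names used only for illustration; this is not the library's implementation, and the real error message may differ.

```python
from __future__ import annotations

from typing import Any

# Keys named in the parameter table above; the tuple itself is an assumption.
REQUIRED_KEYS = ("path_pattern", "columns")


class ConfigCheckedWriter:
    """Hypothetical stand-in that mirrors the documented constructor contract."""

    def __init__(self, name: str, config: dict[str, Any] | None = None) -> None:
        config = config or {}
        # Report every missing key at once rather than failing on the first one.
        missing = [key for key in REQUIRED_KEYS if key not in config]
        if missing:
            raise ValueError(f"Output writer {name!r} is missing config keys: {missing}")
        self.name = name
        self.path_pattern = config["path_pattern"]
        self.columns = config["columns"]


writer = ConfigCheckedWriter(
    "features_out",
    {"path_pattern": "out/features.parquet", "columns": ["image_id", "luminosity"]},
)
```

Because `config` defaults to `None` while both keys are required, calling the constructor with only a name would raise `ValueError` under this reading of the documentation.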
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
write_metrics_dict(metrics_dict: dict[str, dict[str, Any]]) -> None
Aggregate and write dataset-level metrics for all selections to a Parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metrics_dict` | `dict[str, dict[str, Any]]` | Map of selection names to their computed metric dictionaries. | required |
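One plausible way to aggregate such a nested mapping into a single table is to pivot it into columns, with one row per selection. The sketch below shows that idea only; the helper `metrics_to_columns` and the `selection` column name are assumptions, and the actual method may lay out the file differently.

```python
from __future__ import annotations

from typing import Any


def metrics_to_columns(metrics_dict: dict[str, dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot {selection: {metric: value}} into {column: [values]}, one row per selection."""
    # Collect metric names in first-seen order so the column order is stable.
    metric_names: list[str] = []
    for metrics in metrics_dict.values():
        for metric in metrics:
            if metric not in metric_names:
                metric_names.append(metric)

    # "selection" is a hypothetical column name holding the outer dictionary keys.
    columns: dict[str, list[Any]] = {"selection": list(metrics_dict)}
    for metric in metric_names:
        # A selection that lacks a metric gets None for that column.
        columns[metric] = [metrics.get(metric) for metrics in metrics_dict.values()]
    return columns


columns = metrics_to_columns(
    {"train": {"completeness": 0.98, "n_rows": 1200}, "test": {"completeness": 0.91}}
)
```

A column mapping of this shape is the kind of input that `write_table` documents as `features_array`, once the lists are converted to pyarrow Arrays.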
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py
write_table(path_pattern: str, features_array: dict[str, Any], part: int | None = None) -> None
Write a table of features or metrics to a Parquet file on disk.
Handles directory creation if the target path doesn't exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path_pattern` | `str` | Identifier for the data destination (used in the filename pattern). | required |
| `features_array` | `dict[str, Any]` | Map of column names to pyarrow Arrays. | required |
| `part` | `int \| None` | Optional partition index for chunked output. | `None` |
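The path handling described above (a pattern, an optional part index, and directory creation) might look like the following sketch. How `part` is actually combined with `path_pattern` is not stated in this reference, so the `_partNNNN` suffix and the helper name `resolve_output_path` are assumptions made for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_output_path(path_pattern: str, part: int | None = None) -> Path:
    """Build the target path and create its parent directory if it does not exist."""
    path = Path(path_pattern)
    if part is not None:
        # Assumed naming scheme: zero-padded part index inserted before the suffix.
        path = path.with_name(f"{path.stem}_part{part:04d}{path.suffix}")
    # exist_ok=True makes repeated calls for later parts safe.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = resolve_output_path(f"{tmp}/run1/features.parquet", part=3)
    target_name = target.name
    parent_created = target.parent.is_dir()
```

With the directory in place, a mapping of column names to pyarrow Arrays could then be written with `pyarrow.parquet.write_table(pyarrow.table(features_array), target)`; whether the library does exactly this is not confirmed by this page.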
Source code in packages/dqm-ml-job/src/dqm_ml_job/outputwriter/parquet.py