dqm_ml_job.dataloaders.proto
Protocol definitions for data loaders and selections.
This module contains the DataLoader and DataSelection protocol classes that define the interface for data loading implementations.
DataLoader
Bases: Protocol
Protocol for Data Loader factories.
A DataLoader is responsible for scanning a source (disk, DB, S3) and discovering available DataSelections based on its configuration.
Source code in packages/dqm-ml-job/src/dqm_ml_job/dataloaders/proto.py
get_selections() -> list[DataSelection]
Discover and return the list of available selections for this loader.
Returns:
| Type | Description |
|---|---|
| list[DataSelection] | A list of initialized DataSelection instances. |
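
As a rough illustration of the discovery step, the sketch below scans a directory and returns one selection per matching file. `DirectoryLoader`, `FileSelection`, the `pattern` parameter, and the file names are all hypothetical and not part of dqm_ml_job; the selection's iterator is left empty where a real implementation would yield pyarrow.RecordBatch objects.

```python
import tempfile
from pathlib import Path
from typing import Any, Iterator


class FileSelection:
    """Hypothetical selection wrapping a single file on disk."""

    def __init__(self, path: Path) -> None:
        self.name = path.stem
        self.path = path
        self.columns: list[str] = []

    def bootstrap(self, columns_list: list[str]) -> None:
        # Record the requested columns; no data is read at this point.
        self.columns = list(columns_list)

    def get_nb_batches(self) -> int:
        # Assumption for this sketch: one file is treated as one batch.
        return 1

    def __iter__(self) -> Iterator[Any]:
        # Placeholder: a real selection would read self.path and yield
        # pyarrow.RecordBatch objects here.
        return iter(())


class DirectoryLoader:
    """Hypothetical loader that discovers one selection per matching file."""

    def __init__(self, root: Path, pattern: str = "*.csv") -> None:
        self.root = root
        self.pattern = pattern

    def get_selections(self) -> list[FileSelection]:
        # Sort the paths so that discovery order is deterministic.
        return [FileSelection(p) for p in sorted(self.root.glob(self.pattern))]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for filename in ("train.csv", "test.csv", "notes.txt"):
        (root / filename).write_text("id\n1\n")
    selection_names = [s.name for s in DirectoryLoader(root).get_selections()]
```

Because the protocols are structural, neither class needs to inherit from DataLoader or DataSelection; providing the documented members is enough for a type checker to accept them.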
DataSelection
Bases: Protocol
Protocol for a specific subset of data discovered by a DataLoader.
A DataSelection represents a concrete set of samples (e.g., a specific folder, a filtered view of a database, or a single file) and provides an iterator over data batches.
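
The following is a minimal sketch of a class that satisfies this protocol. It assumes a local stand-in for the protocol rather than the real import, and the `@runtime_checkable` decorator is an assumption made here so that `isinstance` can be used. To keep the example dependency-free, batches are plain column-oriented dicts standing in for pyarrow.RecordBatch objects. `InMemorySelection` and `batch_size` are hypothetical names.

```python
from typing import Any, Iterator, Protocol, runtime_checkable


@runtime_checkable
class DataSelection(Protocol):
    """Local stand-in mirroring the documented members (not the real import)."""

    name: str

    def bootstrap(self, columns_list: list[str]) -> None: ...

    def get_nb_batches(self) -> int: ...

    def __iter__(self) -> Any: ...


class InMemorySelection:
    """Hypothetical selection over a list of row dicts held in memory."""

    def __init__(self, name: str, rows: list[dict[str, Any]], batch_size: int = 2) -> None:
        self.name = name
        self._rows = rows
        self._batch_size = batch_size
        self._columns: list[str] = []

    def bootstrap(self, columns_list: list[str]) -> None:
        # Remember which columns the caller needs before iteration starts.
        self._columns = list(columns_list)

    def get_nb_batches(self) -> int:
        # Ceiling division: a partial final batch still counts as one batch.
        return -(-len(self._rows) // self._batch_size)

    def __iter__(self) -> Iterator[dict[str, list[Any]]]:
        # Yield column-oriented batches restricted to the requested columns.
        for start in range(0, len(self._rows), self._batch_size):
            chunk = self._rows[start:start + self._batch_size]
            yield {col: [row[col] for row in chunk] for col in self._columns}


selection = InMemorySelection(
    "demo",
    [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}, {"id": 3, "label": "c"}],
)
selection.bootstrap(["id"])
batches = list(selection)
```

Here `batches` contains two entries, the second one shorter than `batch_size`, and only the `id` column requested through `bootstrap` is present.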
name: str (instance attribute)
__iter__() -> Any
Iterate over the selection, yielding pyarrow.RecordBatch objects.
bootstrap(columns_list: list[str]) -> None
Perform initial setup for the selection before iteration starts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| columns_list | list[str] | List of column names that must be loaded for this selection. | required |
get_nb_batches() -> int
Return the estimated number of batches in this selection.
Used primarily for progress bar estimation.
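
One way a caller might combine the three methods is sketched below: bootstrap first, read the batch estimate, then iterate while reporting progress. `run_selection` and `CountingSelection` are hypothetical names introduced for illustration, and the plain dict batches stand in for pyarrow.RecordBatch objects.

```python
from typing import Any, Iterator


def run_selection(selection: Any, columns: list[str]) -> list[str]:
    """Hypothetical driver: bootstrap, then iterate and record progress."""
    selection.bootstrap(columns)
    # The count is an estimate, so it is shown with "~" and never used
    # as a loop bound; the iterator decides when the data ends.
    total = selection.get_nb_batches()
    progress = []
    for index, _batch in enumerate(selection, start=1):
        progress.append(f"{selection.name}: batch {index}/~{total}")
    return progress


class CountingSelection:
    """Toy selection yielding three fixed batches."""

    name = "toy"

    def bootstrap(self, columns_list: list[str]) -> None:
        self.columns = list(columns_list)

    def get_nb_batches(self) -> int:
        return 3

    def __iter__(self) -> Iterator[dict[str, list[int]]]:
        return iter([{"x": [1]}, {"x": [2]}, {"x": [3]}])


progress_lines = run_selection(CountingSelection(), ["x"])
```

Treating the count as display-only keeps the loop correct even when a selection cannot know its exact size in advance.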