OnDiskPlibData
- class OnDiskPlibData(data=None, data_sources=None, path_to_data_sources=None, columns=None)
Bases: PlibData[OutT], Generic[OutT]
Class for handling on-disk data, implemented with a pytables backend.

    +====================+
    |   OnDiskPlibData   |
    |                    | --- __getitem__ --->
    |                    | --- __iter__ ------>
    |                    |
    +====================+
              |
              v
      _iterate_shards (internal)
              |
              v
    +-------------------+
    |   ShuffleBuffer   |
    +-------------------+
              |
      __iter__ (produce batches)
              |
              v
    +-------------------+
    |    DataLoader     |
    +-------------------+

- apply_transform(transform)
Apply a transformation to the data.
- Return type: OnDiskPlibData[NewOutT]
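The return type OnDiskPlibData[NewOutT] indicates that applying a transform changes the output type of the container. The source does not say whether the transform is applied eagerly or lazily, so the following is only a minimal sketch, assuming a per-sample callable that is composed with any earlier transform and evaluated when samples are read. `LazyData` is a hypothetical stand-in, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Iterator, TypeVar

OutT = TypeVar("OutT")
NewOutT = TypeVar("NewOutT")


@dataclass(frozen=True)
class LazyData(Generic[OutT]):
    """Hypothetical container: a re-openable source of raw samples plus a
    per-sample transform that runs only when samples are read (assumption)."""

    source: Callable[[], Iterable[object]]
    transform: Callable[[object], OutT]

    def apply_transform(self, fn: Callable[[OutT], NewOutT]) -> LazyData[NewOutT]:
        # Compose fn after the existing transform and return a new object;
        # nothing is rewritten on disk and self is left unchanged.
        previous = self.transform
        return LazyData(self.source, lambda raw: fn(previous(raw)))

    def __iter__(self) -> Iterator[OutT]:
        # Transforms are evaluated here, one sample at a time.
        for raw in self.source():
            yield self.transform(raw)


# Usage: chain two transforms, changing the output type from int to str.
rows = LazyData(lambda: [1, 2, 3], lambda raw: raw)
doubled = rows.apply_transform(lambda x: x * 2)
as_text = doubled.apply_transform(str)
print(list(as_text))  # ['2', '4', '6']
print(list(rows))     # [1, 2, 3]
```

Returning a new object rather than mutating in place keeps the static type honest: `rows` still yields `int` while `as_text` yields `str`.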
- property columns: list[str]
The list of column names.
- property dtypes: dict
Dictionary of data types.
- get_data_loader(batch_size, num_workers=0, pin_memory=False, shuffle=False)
Fetch a torch-style data loader for batch sampling.
- Parameters:
  - batch_size (Optional[int]) – The size of a batch to fetch in each iteration. If None, shards are returned directly.
  - num_workers (int) – Number of PyTorch workers.
  - pin_memory (bool) – If True, copy tensors into device-pinned memory before returning them.
  - shuffle (bool) – If False, samples are taken sequentially to form batches. If True, samples are shuffled.
- Return type: DataLoader[OutT]
- Returns: an instance of torch.utils.data.DataLoader
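The class diagram shows shards being read by an internal `_iterate_shards` step, passed through a ShuffleBuffer, and then batched for the DataLoader. The sketch below illustrates that flow in pure Python under several assumptions: shards are in-memory sequences, shuffling uses a fixed-size buffer (samples are only approximately shuffled), workers split shards round-robin, and the names `iterate_shards`, `shuffle_buffer` and `batches` are hypothetical, not the library's internals. In a real PyTorch `IterableDataset`, a worker's id and the worker count can be obtained from `torch.utils.data.get_worker_info()`.

```python
import random
from typing import Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iterate_shards(shards: Sequence[Sequence[T]], worker_id: int = 0,
                   num_workers: int = 1) -> Iterator[Sequence[T]]:
    """Hypothetical shard reader: each worker takes every num_workers-th shard
    starting at worker_id, so workers never yield the same sample twice."""
    yield from shards[worker_id::num_workers]


def shuffle_buffer(samples: Iterable[T], buffer_size: int,
                   rng: random.Random) -> Iterator[T]:
    """Approximate shuffle: hold up to buffer_size samples in memory and, for
    each new arrival, emit a random buffered sample and store the new one."""
    buffer: List[T] = []
    for sample in samples:
        if len(buffer) < buffer_size:
            buffer.append(sample)
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = sample
    rng.shuffle(buffer)  # drain the remaining samples in random order
    yield from buffer


def batches(shards: Sequence[Sequence[T]], batch_size: Optional[int],
            shuffle: bool, buffer_size: int = 1024, seed: int = 0,
            worker_id: int = 0, num_workers: int = 1) -> Iterator[List[T]]:
    shard_iter = iterate_shards(shards, worker_id, num_workers)
    if batch_size is None:
        # As documented for batch_size=None: return whole shards unbatched.
        for shard in shard_iter:
            yield list(shard)
        return
    samples: Iterable[T] = (s for shard in shard_iter for s in shard)
    if shuffle:
        samples = shuffle_buffer(samples, buffer_size, random.Random(seed))
    batch: List[T] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # the last batch may be smaller than batch_size


# Usage
shards = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
print(list(batches(shards, batch_size=4, shuffle=False)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
print(list(batches(shards, batch_size=None, shuffle=False)))
# [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
```

A buffer-based shuffle trades shuffle quality for bounded memory: larger `buffer_size` values mix samples across more shards, while the whole dataset never has to be loaded at once.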