OnDiskPlibData
- class OnDiskPlibData(data=None, data_sources=None, path_to_data_sources=None, columns=None)
Bases: PlibData[OutT], Generic[OutT]

Class for handling on-disk data. Implemented using a pytables backend.

    +====================+
    |   OnDiskPlibData   |
    |                    | --- __getitem__ --->
    |                    | --- __iter__ ------>
    |                    |
    +====================+
              |
              v
    _iterate_shards (internal)
              |
              v
    +-------------------+
    |   ShuffleBuffer   |
    +-------------------+
              |
    __iter__ (produce batches)
              |
              v
    +-------------------+
    |    DataLoader     |
    +-------------------+

- apply_transform(transform)
Apply a transformation to the data.
- Return type:
OnDiskPlibData[TypeVar(NewOutT)]
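The return type above suggests that applying a transform yields a data object with a new output type. The sketch below illustrates that idea with a small in-memory stand-in. `TinyData` is a hypothetical class written for this example, not part of the library, and the lazy, composable behaviour it shows is an assumption: this page does not state when `OnDiskPlibData` runs the transform or what arguments the transform receives.

```python
from typing import Callable, Generic, Iterator, List, TypeVar

OutT = TypeVar("OutT")
NewOutT = TypeVar("NewOutT")


class TinyData(Generic[OutT]):
    """Hypothetical in-memory stand-in; NOT the real OnDiskPlibData."""

    def __init__(self, rows: List[dict], transform: Callable[[dict], OutT]) -> None:
        self._rows = rows
        self._transform = transform

    def apply_transform(
        self, transform: Callable[[OutT], NewOutT]
    ) -> "TinyData[NewOutT]":
        # Assumption: transforms compose and are evaluated lazily on access.
        previous = self._transform
        return TinyData(self._rows, lambda row: transform(previous(row)))

    def __getitem__(self, index: int) -> OutT:
        return self._transform(self._rows[index])

    def __iter__(self) -> Iterator[OutT]:
        return (self._transform(row) for row in self._rows)


# Start with an identity transform, then derive a view with a new output type.
raw = TinyData([{"x": 1, "y": 2}, {"x": 3, "y": 4}], transform=lambda row: row)
sums = raw.apply_transform(lambda row: row["x"] + row["y"])
```

In this sketch `raw` is left unchanged and `sums` is a separate object whose items are integers rather than dictionaries, which mirrors the change from `OutT` to `NewOutT` in the annotation.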
- property columns: list[str]
The list of column names.
- property dtypes: dict
Dictionary of data types.
- get_data_loader(batch_size, num_workers=0, pin_memory=False, shuffle=False)
Fetch a torch-style data loader for batch sampling.
- Parameters:
batch_size (Optional[int]) – The size of a batch to fetch in each iteration. If None, return shards directly.
num_workers (int) – Number of PyTorch workers.
pin_memory (bool) – If true, copy tensors into device pinned memory before returning them.
shuffle (bool) – If false, samples are drawn sequentially to form batches. If true, samples are shuffled.
- Return type:
DataLoader[TypeVar(OutT)]
- Returns:
An instance of torch.utils.data.DataLoader.
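To make the flow in the class diagram concrete (shards, then a shuffle buffer, then batches), here is a minimal pure-Python sketch. It is not the library's implementation. The helper names `iterate_shards`, `shuffle_buffer`, `batches` and `load`, the `buffer_size` and `seed` parameters, and the choice to keep a final partial batch are all assumptions made for illustration. Only the `batch_size=None` and `shuffle` semantics follow the parameter descriptions above.

```python
import random
from typing import Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iterate_shards(shards: Sequence[Sequence[T]]) -> Iterator[T]:
    """Yield samples shard by shard (stand-in for reading shards from disk)."""
    for shard in shards:
        yield from shard


def shuffle_buffer(
    samples: Iterable[T], buffer_size: int, rng: random.Random
) -> Iterator[T]:
    """Approximate shuffling with a bounded in-memory buffer."""
    buffer: List[T] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Flush whatever is left once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer


def batches(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group samples into lists of batch_size (assumption: keep the last partial batch)."""
    batch: List[T] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def load(
    shards: Sequence[Sequence[T]],
    batch_size: Optional[int],
    shuffle: bool = False,
    buffer_size: int = 4,  # hypothetical parameter
    seed: int = 0,         # hypothetical parameter
) -> Iterator[List[T]]:
    if batch_size is None:
        # As documented: with batch_size=None, shards are returned directly.
        for shard in shards:
            yield list(shard)
        return
    stream: Iterable[T] = iterate_shards(shards)
    if shuffle:
        stream = shuffle_buffer(stream, buffer_size, random.Random(seed))
    yield from batches(stream, batch_size)


shards = [[0, 1, 2], [3, 4, 5], [6, 7]]
sequential = list(load(shards, batch_size=4))
whole_shards = list(load(shards, batch_size=None))
shuffled = list(load(shards, batch_size=4, shuffle=True))
```

With `shuffle=False` the batches follow on-disk order and may span shard boundaries. With `shuffle=True` every sample still appears exactly once, but the order is only approximately random, because the buffer holds a bounded number of samples at a time.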