PlibData

class PlibData(data=None, data_sources=None, path_to_data_sources=None)

Bases: Dataset[OutT], Generic[OutT]

Data structure for hosting perturb-lib data.

Parameters:

data – Data to initialize the class with.
data_sources (UnionType[str, list[str], None]) – If data is None, data_sources can be used to specify the names of the data sources.

apply_transform(transform)

Apply a transformation to the data.

abstract get_data_loader(batch_size, num_workers=0, pin_memory=False, shuffle=False)

Fetch a torch-style data loader for batch sampling.

Parameters:

batch_size (Optional[int]) – The size of a batch to fetch in each iteration.
num_workers (int) – Number of PyTorch worker subprocesses used for data loading; 0 loads data in the main process.
pin_memory (bool) – If True, copy tensors into device pinned memory before returning them.
shuffle (bool) – If False, samples are drawn sequentially to form batches. If True, samples are shuffled.

Return type:

DataLoader[TypeVar(OutT)]

Returns:

An instance of torch.utils.data.DataLoader.
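The parameters above mirror those of torch.utils.data.DataLoader. The following is a minimal sketch of how they behave, using plain PyTorch rather than perturb-lib itself. ToyDataset and make_loader are hypothetical names introduced for illustration only; the way a concrete PlibData subclass actually builds its loader may differ.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for a concrete PlibData subclass."""

    def __init__(self, n_samples: int) -> None:
        # Samples are simply the numbers 0..n_samples-1 as floats.
        self.values = torch.arange(n_samples, dtype=torch.float32)

    def __len__(self) -> int:
        return len(self.values)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return self.values[idx]


def make_loader(dataset, batch_size, num_workers=0, pin_memory=False, shuffle=False):
    """Assumed shape of a get_data_loader implementation: forward the
    documented arguments to torch.utils.data.DataLoader unchanged."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
        shuffle=shuffle,
    )


# With shuffle=False, batches follow the dataset order.
loader = make_loader(ToyDataset(10), batch_size=4)
batches = [batch.tolist() for batch in loader]
```

With 10 samples and batch_size=4, the loader yields three batches, the last one holding the remaining two samples. In PyTorch, passing batch_size=None disables automatic batching; whether PlibData subclasses rely on that behavior is not stated here.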

abstract init_from_files(path_to_data_sources, data_sources)

Initialize PlibData from multiple files.

abstract subset_columnwise(columns)

Select a subset of existing columns.
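Because subset_columnwise is abstract, each concrete subclass has to supply it. The sketch below illustrates the pattern with a stand-in base class and a toy dict-of-lists table. TabularSketch and DictTable are hypothetical and are not part of perturb-lib. Two further assumptions are made: that subset_columnwise returns a new object rather than modifying the data in place, and that apply_transform replaces the stored data with the result of the transform.

```python
from abc import ABC, abstractmethod


class TabularSketch(ABC):
    """Hypothetical stand-in for the documented interface, not the perturb-lib class."""

    def __init__(self, data):
        self._data = data

    def apply_transform(self, transform):
        # Assumption: the transform maps the stored data to its replacement.
        self._data = transform(self._data)

    @abstractmethod
    def subset_columnwise(self, columns):
        """Select a subset of existing columns."""


class DictTable(TabularSketch):
    """Toy concrete implementation that stores columns as a dict of lists."""

    def subset_columnwise(self, columns):
        # Only existing columns may be selected.
        missing = [c for c in columns if c not in self._data]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        # Assumption: return a new object and leave the original untouched.
        return DictTable({c: self._data[c] for c in columns})


table = DictTable({"gene": ["A", "B"], "dose": [1.0, 2.0], "batch": [0, 1]})
subset = table.subset_columnwise(["gene", "dose"])
subset.apply_transform(lambda d: {**d, "dose": [x * 10 for x in d["dose"]]})
```

Instantiating TabularSketch directly raises TypeError because subset_columnwise has no implementation there, which is the same mechanism that prevents an abstract base class from being used without a concrete subclass.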