PlibData

class PlibData(data=None, data_sources=None, path_to_data_sources=None)

Bases: Dataset[OutT], Generic[OutT]

Data structure for hosting perturb-lib data.

Parameters:
  • data – Data to initialize the instance with.

  • data_sources (UnionType[str, list[str], None]) – If data is None, the name or names of the data sources to use instead.

apply_transform(transform)

Apply a transformation to the data.

Return type:

PlibData[TypeVar(NewOutT)]
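
As a rough illustration of the return type, the sketch below uses a hypothetical stand-in class, ToyData, rather than perturb-lib itself. It assumes, as the PlibData[NewOutT] return type suggests, that apply_transform yields a dataset whose items are the outputs of the transform. Whether the real method applies the transform lazily or eagerly is not stated here, so the lazy composition shown is only one possible design.

```python
from typing import Callable, Generic, TypeVar

OutT = TypeVar("OutT")
NewOutT = TypeVar("NewOutT")


class ToyData(Generic[OutT]):
    """Hypothetical stand-in for PlibData; not part of perturb-lib."""

    def __init__(self, items: list, transform: Callable = lambda x: x) -> None:
        self._items = items
        self._transform = transform

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: int) -> OutT:
        # The transform runs when an item is fetched, not when it is registered.
        return self._transform(self._items[idx])

    def apply_transform(
        self, transform: Callable[[OutT], NewOutT]
    ) -> "ToyData[NewOutT]":
        # Compose the new transform on top of the existing one and return a
        # new object, so the item type changes from OutT to NewOutT.
        previous = self._transform
        return ToyData(self._items, lambda x: transform(previous(x)))


rows = ToyData([{"gene": "A", "value": 1.0}, {"gene": "B", "value": 2.5}])
values = rows.apply_transform(lambda row: row["value"])  # items are now floats
doubled = values.apply_transform(lambda v: 2 * v)        # transforms can be chained
```

In this sketch the original rows object is left untouched, which keeps chained transforms easy to reason about.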

abstract property columns: list[str]

The list of column names.

abstract property dtypes: dict

Dictionary of data types.

abstract get_data_loader(batch_size, num_workers=0, pin_memory=False, shuffle=False)

Fetch a torch-style data loader for batch sampling.

Parameters:
  • batch_size (Optional[int]) – The size of a batch to fetch in each iteration.

  • num_workers (int) – Number of PyTorch worker processes used for data loading.

  • pin_memory (bool) – If true, copy tensors into device-pinned memory before returning them.

  • shuffle (bool) – If false, samples are drawn sequentially to form batches. If true, samples are shuffled first.

Return type:

DataLoader[TypeVar(OutT)]

Returns:

An instance of torch.utils.data.DataLoader.
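
To make the effect of batch_size and shuffle concrete, here is a dependency-free sketch of batch formation. The helper iter_batches and its seed argument are hypothetical and are not part of perturb-lib or torch. It mirrors the default behaviour of torch.utils.data.DataLoader, where the last batch may be smaller and batch_size=None disables automatic batching; whether a given PlibData subclass forwards these arguments unchanged is an assumption.

```python
import random
from typing import Iterator, Optional, Sequence


def iter_batches(
    samples: Sequence,
    batch_size: Optional[int],
    shuffle: bool = False,
    seed: int = 0,
) -> Iterator:
    """Hypothetical helper illustrating batch formation; not library code."""
    order = list(range(len(samples)))
    if shuffle:
        # Shuffle the visiting order, not the underlying data.
        random.Random(seed).shuffle(order)
    if batch_size is None:
        # No automatic batching: yield individual samples.
        for i in order:
            yield samples[i]
        return
    for start in range(0, len(order), batch_size):
        # The final batch may hold fewer than batch_size samples.
        yield [samples[i] for i in order[start:start + batch_size]]


data = list(range(10))
sequential = list(iter_batches(data, batch_size=4))
shuffled = list(iter_batches(data, batch_size=4, shuffle=True))
unbatched = list(iter_batches(data, batch_size=None))
```

With shuffle=False the batches follow the original sample order; with shuffle=True every sample still appears exactly once, only in a permuted order.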

abstract init_from_files(path_to_data_sources, data_sources)

Initialize PlibData from multiple files.

Return type:

DataFrame

abstract subset_columnwise(columns)

Select a subset of existing columns.

Parameters:

columns (list[str]) – The names of the columns to keep.

Return type:

Self
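
The following sketch shows how columns, dtypes, and subset_columnwise might fit together in a concrete subclass. ToyTable is a hypothetical, dictionary-backed stand-in and does not inherit from PlibData. The choice to raise KeyError for unknown column names and to report Python types in dtypes are assumptions made for illustration only.

```python
from typing import Any


class ToyTable:
    """Hypothetical column-oriented stand-in for a PlibData subclass."""

    def __init__(self, data: dict[str, list[Any]]) -> None:
        self._data = data

    @property
    def columns(self) -> list[str]:
        # Column names in insertion order.
        return list(self._data)

    @property
    def dtypes(self) -> dict:
        # Assumption: report the Python type of the first value in each column.
        return {
            name: type(values[0]) if values else None
            for name, values in self._data.items()
        }

    def subset_columnwise(self, columns: list[str]) -> "ToyTable":
        missing = [c for c in columns if c not in self._data]
        if missing:
            raise KeyError(f"Unknown columns: {missing}")
        # Return the same class as the caller, mirroring the Self return type.
        return type(self)({c: self._data[c] for c in columns})


table = ToyTable({"gene": ["A", "B"], "dose": [0.1, 1.0], "count": [3, 7]})
subset = table.subset_columnwise(["gene", "count"])
```

Returning type(self)(...) rather than a hard-coded class keeps the result type aligned with whichever subclass the method was called on.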