experanto.utils.MultiEpochsDataLoader

class MultiEpochsDataLoader(*args, **kwargs)

Bases: DataLoader

DataLoader that keeps workers alive across epochs.

Avoids the significant overhead of re-spawning worker processes at the start of each epoch. Workers are initialized once and reused throughout training.
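The general idea can be illustrated without PyTorch. The sketch below is a toy model, not this class's actual implementation: it assumes the approach resembles the repeating-sampler pattern in the timm loader listed under References. RepeatSampler, PersistentLoader, and iterator_starts are hypothetical names introduced only for this illustration. A finite sampler is wrapped so that it never ends, a single iterator is created once, and each epoch takes a fixed number of items from it.

```python
class RepeatSampler:
    """Hypothetical helper: replays a finite sampler forever."""

    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        while True:
            # Restart the wrapped sampler each time it is exhausted.
            yield from iter(self.sampler)


class PersistentLoader:
    """Toy stand-in for a DataLoader: one long-lived iterator, sliced into epochs."""

    def __init__(self, sampler):
        self.epoch_len = len(sampler)
        self.iterator_starts = 0
        self._iterator = self._start_iterator(RepeatSampler(sampler))

    def _start_iterator(self, repeat_sampler):
        # In a real DataLoader, this is the point where workers would be spawned.
        self.iterator_starts += 1
        return iter(repeat_sampler)

    def __len__(self):
        return self.epoch_len

    def __iter__(self):
        # One epoch is exactly epoch_len items from the shared iterator.
        for _ in range(self.epoch_len):
            yield next(self._iterator)


loader = PersistentLoader(range(4))
epochs = [list(loader) for _ in range(3)]
```

Iterating over the toy loader three times produces three full epochs while the underlying iterator is started only once, which is the property the class description relies on.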

Parameters:
  • *args – Positional arguments forwarded to torch.utils.data.DataLoader.

  • shuffle_each_epoch (bool, default=False) – If True and the underlying dataset has a shuffle_valid_screen_times method, that method is called at the start of every epoch.

  • **kwargs – Keyword arguments forwarded to torch.utils.data.DataLoader.
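The condition described for shuffle_each_epoch can be sketched as a small guard. This is an illustration under assumptions, not the library's code: ToyDataset, PlainDataset, and start_epoch are hypothetical names, and only the method name shuffle_valid_screen_times comes from the parameter description above.

```python
class ToyDataset:
    """Hypothetical dataset that provides the optional hook."""

    def __init__(self):
        self.shuffle_calls = 0

    def shuffle_valid_screen_times(self):
        # A real dataset would reshuffle here; the toy only counts calls.
        self.shuffle_calls += 1


class PlainDataset:
    """Hypothetical dataset without the hook."""


def start_epoch(dataset, shuffle_each_epoch=False):
    """Call the hook only if requested and available; return whether it ran."""
    hook = getattr(dataset, "shuffle_valid_screen_times", None)
    if shuffle_each_epoch and callable(hook):
        hook()
        return True
    return False


with_hook = ToyDataset()
ran = [start_epoch(with_hook, shuffle_each_epoch=True) for _ in range(3)]
skipped = start_epoch(with_hook, shuffle_each_epoch=False)
missing = start_epoch(PlainDataset(), shuffle_each_epoch=True)
```

In this sketch, a dataset that lacks the method is silently skipped rather than raising an error, which matches the "if True and the underlying dataset has" wording of the parameter description.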

References

https://discuss.pytorch.org/t/enumerate-dataloader-slow/87778

https://github.com/huggingface/pytorch-image-models/blob/d72ac0db259275233877be8c1d4872163954dfbb/timm/data/loader.py#L209-L238

Methods

__init__(*args[, shuffle_each_epoch])

__init__(*args, shuffle_each_epoch=False, **kwargs)