Dataset and dataloader configuration
This section describes the dataset configuration used to load experiments into the Experanto dataloaders. It includes global settings, modality-specific configurations, and dataloader parameters.
Default YAML configuration
dataset:
global_sampling_rate: null
global_chunk_size: null
add_behavior_as_channels: false
replace_nans_with_means: false
cache_data: false
out_keys:
- screen
- responses
- eye_tracker
- treadmill
modality_config:
screen:
keep_nans: false
sampling_rate: 30
chunk_size: 60
valid_condition:
tier: train
offset: 0
sample_stride: 1
include_blanks: true
transforms:
normalization: normalize
Resize:
_target_: torchvision.transforms.v2.Resize
size:
- 144
- 256
interpolation:
rescale: true
rescale_size:
- 144
- 256
responses:
keep_nans: false
sampling_rate: 8
chunk_size: 16
offset: 0.0
transforms:
normalization: normalize_variance_only
interpolation:
interpolation_mode: nearest_neighbor
filters:
nan_filter:
__target__: experanto.filters.common_filters.nan_filter
__partial__: true
vicinity: 0.05
eye_tracker:
keep_nans: false
sampling_rate: 30
chunk_size: 60
offset: 0
transforms:
normalization: normalize
interpolation:
interpolation_mode: nearest_neighbor
filters:
nan_filter:
__target__: experanto.filters.common_filters.nan_filter
__partial__: true
vicinity: 0.05
treadmill:
keep_nans: false
sampling_rate: 30
chunk_size: 60
offset: 0
transforms:
normalization: normalize
interpolation:
interpolation_mode: nearest_neighbor
filters:
nan_filter:
__target__: experanto.filters.common_filters.nan_filter
__partial__: true
vicinity: 0.05
dataloader:
batch_size: 16
shuffle: true
num_workers: 2
pin_memory: true
drop_last: true
prefetch_factor: 2
Viewing the configuration
from omegaconf import OmegaConf, open_dict
from experanto.configs import DEFAULT_CONFIG as cfg
print(OmegaConf.to_yaml(cfg))
Modifying the configuration
You can change parameters programmatically:
cfg.dataset.modality_config.screen.include_blanks = True
cfg.dataset.modality_config.screen.valid_condition = {"tier": "train"}
cfg.dataloader.num_workers = 8
Configuration options
Dataset options
global_sampling_rateOverride sampling rate for all modalities. Set to
Noneto use per-modality rates.global_chunk_sizeOverride chunk size (number of time steps/data points) for all modalities. Set to
Noneto use per-modality sizes.The time window covered by a chunk is
chunk_size / sampling_rate, so theglobal_sampling_rateshould be taken into account:With
global_sampling_rateset: all modalities share the same output rate, so a singleglobal_chunk_sizeunambiguously gives every modality the same time window.Without
global_sampling_rate(per-modality rates active): different modalities have different rates, so the same sample count produces different durations. In this case, leaveglobal_chunk_sizeasNoneand setchunk_sizeper modality instead.
add_behavior_as_channelsIf
True, concatenate behavioral data (e.g., eye tracker, treadmill) as additional channels to the screen data.replace_nans_with_meansIf
True, replace NaN values with the mean of non-NaN values.cache_dataIf
True, cache interpolated data in memory for faster access.out_keysList of modality keys to include in the output dictionary.
normalize_timestampsIf
True, normalize timestamps to start from 0.
Modality options
Each modality (e.g., screen, responses, eye_tracker, treadmill) supports:
keep_nansWhether to keep NaN values in the output.
sampling_rateControls the spacing of the time points that the dataset constructs and passes to
interpolate(). Concretely, each item in the dataset requests values at timesstart, start + 1/sampling_rate, start + 2/sampling_rate, …. The interpolator then interpolates the stored raw samples at those points.chunk_sizeNumber of time steps/data points returned per item for this modality. Internally,
sampling_ratedefines the spacing of the time points passed to the interpolator, so the covered time window is:\[\text{duration (s)} = \frac{\text{chunk\_size}}{\text{sampling\_rate}}\]Note that
sampling_ratehere controls the spacing of the time points requested from the underlying experiment (seesampling_rateabove). The native acquisition rate of the signal does not matter (the interpolator simply looks up the stored values closest to each requested time, e.g.).When per-modality output rates differ,
chunk_sizemust be set per modality to cover the same time window. The default configuration keeps all modalities at a 2-second window while using different output rates:Modality
sampling_rate
chunk_size
Duration
screen
30 Hz
60
2 s
eye_tracker
30 Hz
60
2 s
treadmill
30 Hz
60
2 s
responses
8 Hz
16
2 s
If you unify all rates with
global_sampling_rate, useglobal_chunk_sizeinstead and this per-modality value is ignored. In general:chunk_size = desired_duration_seconds * sampling_rate.offsetTime offset in seconds applied to the time points constructed for this modality. For example, if the screen is queried at times
[t, t + 1/sampling_rate, …], settingoffset = 0.1on responses means responses are queried at[t + 0.1, t + 0.1 + 1/sampling_rate, …]. Useful for aligning modalities with known temporal delays relative to the screen stimulus.transformsDictionary of transforms to apply at the dataset level. This is modality specific, i.e., not all modalities support the same set of transforms. Some examples include
"normalize"for sequences, such as eye_tracker, and"normalize_variance_only"for responses.To understand how transforms are loaded and applied internally, refer to
experanto.datasets.ChunkDataset.initialize_transforms(). If you need to implement a custom transform, we recommend following the same pattern used there. In particular, note how each entry in thetransformsdictionary is checked and, when it is a configdict, instantiated via Hydra before being added to the transform pipeline.You can point Experanto to any callable (function or class) by using Hydra’s
_target_key, which triggers hydra.utils.instantiate under the hood (e.g.,_target_: my_package.my_module.MyTransform).interpolationInterpolation settings. This is modality specific, i.e., not all modalities support the same set of interpolation methods. Some examples include
"rescale"for the screen and"interpolation_mode"(e.g.,"nearest_neighbor") for sequences.filtersDictionary of filter functions to apply to the data.
Dataloader options
All standard torch.utils.data.DataLoader options are supported. See the
PyTorch DataLoader documentation
for the full list of available parameters.