The Dutch EEG Speech Register Corpus

The Dutch EEG Speech Register Corpus contains 207 hours of EEG recordings from 48 participants listening to natural connected speech. The speech materials were sampled from spontaneous dialogues, news broadcasts, and read-aloud stories in Dutch. They contain 50,277 word tokens per participant, time-locked to the EEG signal. The EEG signal was cleaned automatically, and all identified artefacts were checked manually. The EEG recordings (raw and cleaned) contain 1.5 million word epochs.
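Because each word token is time-locked to the EEG, a word epoch is a fixed window of EEG samples cut around that word's onset. The corpus's own file layout and epoching parameters are not described on this page, so the following is only a minimal sketch under stated assumptions: the EEG is loaded as a NumPy array of shape (channels, samples), word onsets are given in seconds, and the function name `extract_word_epochs`, the window bounds, and the baseline correction are hypothetical choices for illustration, not the corpus's documented procedure.

```python
import numpy as np


def extract_word_epochs(eeg, sfreq, onsets_s, tmin=-0.1, tmax=0.8):
    """Cut fixed-length windows around word onsets (hypothetical helper).

    eeg      -- array of shape (n_channels, n_samples); assumed layout
    sfreq    -- sampling rate in Hz; assumed, check the corpus documentation
    onsets_s -- word onset times in seconds, relative to the start of eeg
    Returns (epochs, kept): epochs has shape (n_epochs, n_channels, n_times),
    kept lists the indices of words whose window fit inside the recording.
    """
    start_off = int(round(tmin * sfreq))  # samples before onset (negative)
    stop_off = int(round(tmax * sfreq))   # samples after onset
    n_times = stop_off - start_off

    epochs, kept = [], []
    for i, onset in enumerate(onsets_s):
        center = int(round(onset * sfreq))
        start, stop = center + start_off, center + stop_off
        # Skip words whose window would run past either end of the recording.
        if start < 0 or stop > eeg.shape[1]:
            continue
        epochs.append(eeg[:, start:stop])
        kept.append(i)

    if not epochs:
        return np.empty((0, eeg.shape[0], n_times)), kept

    epochs = np.stack(epochs)
    # Baseline correction (an assumed choice): subtract each channel's mean
    # over the pre-onset samples of its epoch.
    n_base = -start_off
    if n_base > 0:
        epochs = epochs - epochs[:, :, :n_base].mean(axis=2, keepdims=True)
    return epochs, kept
```

For example, with 4 channels sampled at a hypothetical 100 Hz for 10 s, onsets at 1.0 s and 5.0 s yield two epochs of 90 samples each, while onsets at 0.05 s and 9.5 s are dropped because their windows fall outside the recording. Returning the indices of kept words makes it possible to align each epoch with its word token afterwards.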

A detailed description of the corpus can be found [here].

The data and materials are available at: https://data.ru.nl/collections/ru/cls/dutch_eeg_speech_register_corpus_dsc_807

The DOI of the dataset is: https://doi.org/10.34973/97pv-jw72

An example of a study that uses the corpus and demonstrates its validity:

M. Bentum, L.F.M. ten Bosch, A. van den Bosch & M. Ernestus (2022). Speech register influences listeners' word expectations. Brain and Language, 235, 105197, pp. 1-11.