Ernestus Corpus of Spontaneous Dutch

The Ernestus Corpus of Spontaneous Dutch contains high-quality recordings of 10 conversations, each 90 minutes long, between friends or direct colleagues. The corpus was recorded between autumn 1995 and spring 1996 at the Institute of Phonetics of the University of Amsterdam.
Professional transcribers created an orthographic transcription of the corpus by hand, while a phonemic transcription was generated automatically. Both transcriptions are stored in Praat TextGrid format.
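
For readers who want to inspect the transcriptions outside Praat, the following is a minimal Python sketch of reading interval tiers from a TextGrid in Praat's long text format. It is an illustration under assumptions, not a tool distributed with the corpus: the function name parse_textgrid, the tier name "orthography", and the sample content are hypothetical, and the sketch ignores point tiers and the short TextGrid format.

```python
import re

# Matches the start of each tier block ("item [1]:"), but not the "item []:" header line.
ITEM_RE = re.compile(r'item \[\d+\]:')
# Praat escapes a double quote inside a string by doubling it ("").
NAME_RE = re.compile(r'name = "((?:[^"]|"")*)"')
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([-\d.eE+]+)\s*'
    r'xmax = ([-\d.eE+]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def parse_textgrid(content):
    """Return {tier name: [(start, end, label), ...]} for the interval tiers."""
    tiers = {}
    # The first chunk is the file header, so it is skipped.
    for chunk in ITEM_RE.split(content)[1:]:
        name_match = NAME_RE.search(chunk)
        if name_match is None:
            continue
        name = name_match.group(1).replace('""', '"')
        tiers[name] = [
            (float(start), float(end), text.replace('""', '"'))
            for start, end, text in INTERVAL_RE.findall(chunk)
        ]
    return tiers


# Hypothetical sample content; the real tier names in the corpus may differ.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "orthography"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "ja"
        intervals [2]:
            xmin = 0.5
            xmax = 1.5
            text = "dat is goed"
'''

for start, end, label in parse_textgrid(SAMPLE)["orthography"]:
    print(f"{start:.2f}-{end:.2f}\t{label}")
```

The function takes the file content as a string because TextGrid files may be saved in different encodings (for example UTF-8 or UTF-16), so the right encoding has to be chosen when the file is opened.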

The corpus is available to academic researchers. To obtain access, send a request by e-mail to the Radboud University Faculty of Arts data officer at dataofficer@let.ru.nl. The data officer will send you a data use agreement. Once the agreement has been signed, you will be granted access to the corpus.

A detailed description of the corpus is provided in:

M. Ernestus (2000). Voice assimilation and segment reduction in casual Dutch: A corpus-based study of the phonology-phonetics interface. Holland Institute of Generative Linguistics, Utrecht.

A description of the automatic generation of the phonemic transcription can be found in:

B. Schuppler, M. Ernestus, O. Scharenborg, & L. Boves (2011). Acoustic reduction in conversational Dutch: A quantitative analysis based on automatically generated segmental transcriptions. Journal of Phonetics 39, 96-109.