Nijmegen Corpus of Casual Spanish

Corpus contents

The Nijmegen Corpus of Casual Spanish was created to provide large amounts of high-quality recordings of casual speech suitable for phonetic analysis. The corpus has the following distinguishing features:

  • It contains around 30 hours of orthographically transcribed casual conversations, elicited following a thoroughly tested procedure (393,000 word tokens, 16,500 word types).
  • It contains high-quality recordings captured with head-mounted microphones in a sound-attenuated room.
  • It contains speech from 52 speakers (27 female and 25 male) sharing the same geographic and educational background. Because these factors are held constant, researchers can study inter-speaker variation.
  • It contains large amounts of data for every speaker (around 90 minutes of recorded conversation for every group of three speakers). This allows researchers to study within-speaker variability.
  • It contains audio as well as video data, which can be used to study facial and body gestures during verbal communication.