Corpus contents
The Nijmegen Corpus of Casual Spanish was created to provide large amounts of high-quality recordings of casual speech suitable for phonetic analysis. Our corpus is distinctive in the following respects:
- It contains around 30 hours of orthographically transcribed casual conversations, elicited following a thoroughly tested procedure (393,000 word tokens, 16,500 word types).
- It contains high-quality recordings captured with head-mounted microphones in a sound-attenuated room.
- It contains speech from 52 speakers (27 female and 25 male) sharing the same geographic and educational background. This allows researchers to study inter-speaker variation.
- It contains large amounts of data for every speaker (around 90 minutes of recorded conversation for every group of three speakers). This allows researchers to study within-speaker variation.
- It contains audio as well as video data, which can be used to study facial and body gestures during verbal communication.