Corpus transcription

The corpus was transcribed orthographically by two professional transcribers using the Transcriber software. The audio stream was manually segmented, separately for each speaker, into short chunks of a few seconds. The transcription guidelines stated that each chunk should contain speech with a clear degree of syntactic and semantic coherence and no long stretches of silence. Where possible, boundaries between chunks were placed during pauses. Each speaker was transcribed in a separate annotation file. Transcribers were asked to restore common elisions to their full orthographic forms. For instance, expressions characteristic of casual speech such as y a 'there is' or j'sais pas 'I don't know' were transcribed as il y a and je sais pas, respectively. Filled pauses were marked in the text with specific orthographic forms (e.g. euh, hum). Breathing, laughter and mouth noises were also indicated in the transcriptions.
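
The elision convention described above was applied by hand by the transcribers. As an illustration only, the following minimal Python sketch shows how such a convention could be checked or applied automatically to a transcribed utterance. The function name `restore_elisions` and the table `RESTORATIONS` are hypothetical; only the two elision pairs (y a / il y a, j'sais pas / je sais pas) come from the text, and a real guideline list would presumably be longer.

```python
import re

# Hypothetical restoration table: only these two pairs are attested in the text.
RESTORATIONS = [
    (re.compile(r"\bj'sais pas\b", re.IGNORECASE), "je sais pas"),
    # Skip "y a" that is already preceded by "il " or by an apostrophe
    # (e.g. "n'y a"), so full forms are not expanded a second time.
    (re.compile(r"(?<!il )(?<!')\by a\b", re.IGNORECASE), "il y a"),
]


def restore_elisions(utterance: str) -> str:
    """Replace the elided forms listed above with their full orthographic forms."""

    def keep_case(full_form):
        # Preserve an initial capital when the elided form starts the utterance.
        def repl(match):
            found = match.group(0)
            return full_form.capitalize() if found[0].isupper() else full_form
        return repl

    for pattern, full_form in RESTORATIONS:
        utterance = pattern.sub(keep_case(full_form), utterance)
    return utterance


if __name__ == "__main__":
    print(restore_elisions("euh y a un problème"))
    print(restore_elisions("J'sais pas"))
```

The sketch assumes straight apostrophes (') in the input; transcripts that use typographic apostrophes would need to be normalised first.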