Buckeye Corpus

The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt.[1][2] [3][4] It contains high-quality recordings from 40 speakers in Columbus, Ohio conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of these recordings. The sessions were conducted as Sociolinguistics interviews, and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is also available at the project web site. The corpus is available to researchers in academics and industry.

The project was funded by the National Institute on Deafness and Other Communication Disorders and the Office of Research at Ohio State University.

  1. ^ Pitt, Mark, Keith Johnson, Elizabeth Hume, Scott Kiesling, and William Raymond. (2005). The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability. Speech Communication, 45, 90-95.
  2. ^ Raymond, William D., Robin Dautricourt, and Elizabeth Hume. (2006). Word-medial /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change, 18(1), 55-97.
  3. ^ Eric Fosler-Lussier, Laura Dilley, Na’im Tyson, Mark Pitt (2007) The Buckeye Corpus of Speech: Updates and Enhancements. In Proceedings of Interspeech 2007, Antwerp, Belgium.
  4. ^ Dilley, L., & Pitt, M. (2007). A study of regressive place assimilation in spontaneous speech and its implications for spoken word recognition. Journal of the Acoustical Society of America, 122(4), 2340-2353.