The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English

Daniel McIntyre, Carol Bellard-Thompson, John Heywood, Tony McEnery, Elena Semino, Mick Short

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
Original language: English
Title of host publication: Proceedings of the Corpus Linguistics 2003 conference
Editors: Dawn Archer, Paul Rayson, Andrew Wilson, Tony McEnery
Pages: 513-522
Number of pages: 10
Volume: 16
Publication status: Published - 2003
Externally published: Yes

Publication series

Name: UCREL Technical Papers

Cite this

McIntyre, D., Bellard-Thompson, C., Heywood, J., McEnery, T., Semino, E., & Short, M. (2003). The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 conference (Vol. 16, pp. 513-522). (UCREL Technical Papers).