In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
| Title of host publication | Proceedings of the Corpus Linguistics 2003 conference |
| Editors | Dawn Archer, Paul Rayson, Andrew Wilson, Tony McEnery |
| Number of pages | 10 |
| Publication status | Published - 2003 |
| Publication series | UCREL Technical Papers |