TY - GEN
T1 - The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English
AU - McIntyre, Daniel
AU - Bellard-Thompson, Carol
AU - Heywood, John
AU - McEnery, Tony
AU - Semino, Elena
AU - Short, Mick
PY - 2003
Y1 - 2003
N2 - In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
AB - In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
UR - http://ucrel.lancs.ac.uk/publications/CL2003/contents.htm
UR - http://ucrel.lancs.ac.uk/tech_papers.html
M3 - Conference contribution
SN - 1862201315
VL - 16
T3 - UCREL Technical Papers
SP - 513
EP - 522
BT - Proceedings of the Corpus Linguistics 2003 conference
A2 - Archer, Dawn
A2 - Rayson, Paul
A2 - Wilson, Andrew
A2 - McEnery, Tony
ER -