TY - JOUR
T1 - Investigating the presentation of speech, writing and thought in spoken British English
T2 - a corpus-based approach
AU - McIntyre, Daniel
AU - Bellard-Thompson, Carol
AU - Heywood, John
AU - McEnery, Tony
AU - Semino, Elena
AU - Short, Mick
PY - 2004/1/1
Y1 - 2004/1/1
N2 - In this paper we describe the Lancaster Speech, Writing and Thought Presentation
(SW&TP) Spoken Corpus. We have constructed this corpus to investigate
the ways in which speakers present speech, thought and writing in contemporary
spoken British English, with the associated aim of comparing our findings with
the patterns revealed by the previous Lancaster corpus-based investigation of
SW&TP in written texts. We describe the structure of the corpus and the
archives from which its component texts are taken. These are the spoken section
of the British National Corpus, and archives currently housed in the Centre for
North West Regional Studies (CNWRS) at Lancaster University. We discuss the
decisions that we made concerning the selection of suitable extracts from the
archives, the re-transcription that was necessary in order to use the original
CNWRS archive texts in our corpus, and the problems associated with the original archived transcripts. Having described the sources of our corpus, we move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and the problems associated with capturing in electronic form the CNWRS archive material. We then explain the tagging format we adopted in
annotating our data for Speech, Writing and Thought Presentation and discuss
how this was developed from the earlier version used for tagging written texts.
We also discuss some preliminary analyses which point towards fruitful future
lines of investigation.
AB - In this paper we describe the Lancaster Speech, Writing and Thought Presentation
(SW&TP) Spoken Corpus. We have constructed this corpus to investigate
the ways in which speakers present speech, thought and writing in contemporary
spoken British English, with the associated aim of comparing our findings with
the patterns revealed by the previous Lancaster corpus-based investigation of
SW&TP in written texts. We describe the structure of the corpus and the
archives from which its component texts are taken. These are the spoken section
of the British National Corpus, and archives currently housed in the Centre for
North West Regional Studies (CNWRS) at Lancaster University. We discuss the
decisions that we made concerning the selection of suitable extracts from the
archives, the re-transcription that was necessary in order to use the original
CNWRS archive texts in our corpus, and the problems associated with the original archived transcripts. Having described the sources of our corpus, we move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and the problems associated with capturing in electronic form the CNWRS archive material. We then explain the tagging format we adopted in
annotating our data for Speech, Writing and Thought Presentation and discuss
how this was developed from the earlier version used for tagging written texts.
We also discuss some preliminary analyses which point towards fruitful future
lines of investigation.
M3 - Article
VL - 28
SP - 49
EP - 76
JO - ICAME Journal
JF - ICAME Journal
SN - 1502-5462
ER -