Investigating the presentation of speech, writing and thought in spoken British English: a corpus-based approach

Daniel McIntyre, Carol Bellard-Thompson, John Heywood, Tony McEnery, Elena Semino, Mick Short

Research output: Contribution to journal › Article

Abstract

In this paper we describe the Lancaster Speech, Writing and Thought Presentation (SW&TP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of SW&TP in written texts. We describe the structure of the corpus and the archives from which its composite texts are taken. These are the spoken section of the British National Corpus, and archives currently housed in the Centre for North West Regional Studies (CNWRS) at Lancaster University. We discuss the decisions that we made concerning the selection of suitable extracts from the archives, the re-transcription that was necessary in order to use the original CNWRS archive texts in our corpus, and the problems associated with the original archived transcripts. Having described the sources of our corpus, we move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and the problems associated with capturing in electronic form the CNWRS archive material. We then explain the tagging format we adopted in annotating our data for Speech, Writing and Thought Presentation and discuss how this was developed from the earlier version used for tagging written texts. We also discuss some preliminary analyses which point towards fruitful future lines of investigation.
Language: English
Pages: 49-76
Number of pages: 28
Journal: ICAME Journal
Volume: 28
Publication status: Published - 1 Jan 2004
Externally published: Yes

Fingerprint

Corpus-based
British English
Regional Studies
Tagging
British National Corpus
Transcription
Spoken Corpora

Cite this

McIntyre, D., Bellard-Thompson, C., Heywood, J., McEnery, T., Semino, E., & Short, M. (2004). Investigating the presentation of speech, writing and thought in spoken British English: a corpus-based approach. ICAME Journal, 28, 49-76.
@article{c583d8d00ebc4d70a3942f61bcbc7345,
title = "Investigating the presentation of speech, writing and thought in spoken British English: a corpus-based approach",
abstract = "In this paper we describe the Lancaster Speech, Writing and Thought Presentation (SW&TP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of SW&TP in written texts. We describe the structure of the corpus and the archives from which its composite texts are taken. These are the spoken section of the British National Corpus, and archives currently housed in the Centre for North West Regional Studies (CNWRS) at Lancaster University. We discuss the decisions that we made concerning the selection of suitable extracts from the archives, the re-transcription that was necessary in order to use the original CNWRS archive texts in our corpus, and the problems associated with the original archived transcripts. Having described the sources of our corpus, we move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and the problems associated with capturing in electronic form the CNWRS archive material. We then explain the tagging format we adopted in annotating our data for Speech, Writing and Thought Presentation and discuss how this was developed from the earlier version used for tagging written texts. We also discuss some preliminary analyses which point towards fruitful future lines of investigation.",
author = "Daniel McIntyre and Carol Bellard-Thompson and John Heywood and Tony McEnery and Elena Semino and Mick Short",
year = "2004",
month = "1",
day = "1",
language = "English",
volume = "28",
pages = "49--76",
journal = "ICAME Journal",

}
