The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English

Daniel McIntyre, Carol Bellard-Thompson, John Heywood, Tony McEnery, Elena Semino, Mick Short

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its constituent texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
Language: English
Title of host publication: Proceedings of the Corpus Linguistics 2003 conference
Editors: Dawn Archer, Paul Rayson, Andrew Wilson, Tony McEnery
Pages: 513-522
Number of pages: 10
Volume: 16
Publication status: Published - 2003
Externally published: Yes

Publication series

Name: UCREL Technical Papers

Cite this

McIntyre, D., Bellard-Thompson, C., Heywood, J., McEnery, T., Semino, E., & Short, M. (2003). The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 conference (Vol. 16, pp. 513-522). (UCREL Technical Papers).
@inproceedings{2c706443bb2142169fc940243e275c9b,
title = "The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken {British English}",
abstract = "In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.",
author = "Daniel McIntyre and Carol Bellard-Thompson and John Heywood and Tony McEnery and Elena Semino and Mick Short",
year = "2003",
language = "English",
isbn = "1862201315",
volume = "16",
series = "UCREL Technical Papers",
pages = "513--522",
editor = "Dawn Archer and Paul Rayson and Andrew Wilson and Tony McEnery",
booktitle = "Proceedings of the Corpus Linguistics 2003 conference",
}

TY  - GEN
T1  - The construction of a corpus to investigate the presentation of speech, thought and writing in written and spoken British English
AU  - McIntyre, Daniel
AU  - Bellard-Thompson, Carol
AU  - Heywood, John
AU  - McEnery, Tony
AU  - Semino, Elena
AU  - Short, Mick
PY  - 2003
Y1  - 2003
N2  - In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
AB  - In this paper we describe the Lancaster Speech, Thought and Writing Presentation (ST&WP) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous Lancaster corpus-based investigation of ST&WP in written texts. We describe the structure of the corpus, the archives from which its composite texts are taken, the decisions that we made concerning the selection of suitable extracts from the archives, and the problems associated with the original archived transcripts. We then move on to consider issues surrounding the mark-up of our data with TEI-conformant SGML, and explain the tagging format we adopted in annotating our data for ST&WP.
UR  - http://ucrel.lancs.ac.uk/publications/CL2003/contents.htm
UR  - http://ucrel.lancs.ac.uk/tech_papers.html
M3  - Conference contribution
SN  - 1862201315
VL  - 16
T3  - UCREL Technical Papers
SP  - 513
EP  - 522
BT  - Proceedings of the Corpus Linguistics 2003 conference
A2  - Archer, Dawn
A2  - Rayson, Paul
A2  - Wilson, Andrew
A2  - McEnery, Tony
ER  - 
