Building a Spanish Lexicon for Corpus Analysis

Ricardo Jiménez-Yáñez, H. Sanjurjo-González, Paul Rayson, Scott Piao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes the creation of a semantically annotated Spanish lexicon intended to support the analysis of larger Spanish-language corpora. The most widely used semantic resources today are WordNet, FrameNet, PDEV and USAS, but they have been employed mainly in research on English. A large Spanish lexicon will allow a greater number of corpus studies in Spanish to be undertaken. The paper describes the steps followed to construct the lexicon, the difficulties encountered during its creation, and the solutions adopted to overcome them. Finally, the lexicon will enable specific research tasks to be carried out, such as metaphor analysis, critical discourse analysis (CDA) and natural language processing (NLP) research.
Original language: English
Title of host publication: Proceedings of the 35th Edition of the International Conference of The Spanish Association of Applied Linguistics
Subtitle of host publication: Languages at the Crossroads: Training, Accreditation and Context of Use
Editors: Francisco Javier Díez Pérez, María Águeda Moreno Moreno
Place of publication: Jaén
Publisher: Publicaciones de la Universidad de Jaén
Pages: 227-239
Number of pages: 13
Volume: 1
ISBN (Print): 9788491591085
Publication status: Published - May 2017
Externally published: Yes
Event: International Conference of the Spanish Association of Applied Linguistics - Universidad de Jaén, Andalucía, Spain
Duration: 4 May 2017 to 6 May 2017
https://www.unebook.es/es/ebook/languages-at-the-crossroads-training-accreditation-and-context-of-use_E0002650393

Conference

Conference: International Conference of the Spanish Association of Applied Linguistics
Country: Spain
City: Jaén
Period: 4/05/17 to 6/05/17

Fingerprint

Lexicon
Corpus Analysis
WordNet
Natural Language Processing
Spanish Language
Resources
Language Research
Annotation

Cite this

Jiménez-Yáñez, R., Sanjurjo-González, H., Rayson, P., & Piao, S. (2017). Building a Spanish Lexicon for Corpus Analysis. In F. J. Díez Pérez, & M. Á. Moreno Moreno (Eds.), Proceedings of the 35th Edition of the International Conference of The Spanish Association of Applied Linguistics: Languages at the Crossroads: Training, Accreditation and Context of Use (Vol. 1, pp. 227-239). Jaén: Publicaciones de la Universidad de Jaén.
@inproceedings{a0a67ba5c99942f1b0876785cb3cc12f,
title = "Building a {Spanish} Lexicon for Corpus Analysis",
abstract = "This paper describes the creation of a semantically annotated Spanish lexicon intended to support the analysis of larger Spanish-language corpora. The most widely used semantic resources today are WordNet, FrameNet, PDEV and USAS, but they have been employed mainly in research on English. A large Spanish lexicon will allow a greater number of corpus studies in Spanish to be undertaken. The paper describes the steps followed to construct the lexicon, the difficulties encountered during its creation, and the solutions adopted to overcome them. Finally, the lexicon will enable specific research tasks to be carried out, such as metaphor analysis, critical discourse analysis (CDA) and natural language processing (NLP) research.",
keywords = "lexicon, Spanish, semantic tagging, discourse analysis",
author = "Ricardo Jim{\'e}nez-Y{\'a}{\~n}ez and H. Sanjurjo-Gonz{\'a}lez and Paul Rayson and Scott Piao",
year = "2017",
month = may,
language = "English",
isbn = "9788491591085",
volume = "1",
pages = "227--239",
editor = "{D{\'\i}ez P{\'e}rez}, {Francisco Javier} and {Moreno Moreno}, {Mar{\'\i}a {\'A}gueda}",
booktitle = "Proceedings of the 35th Edition of the International Conference of The Spanish Association of Applied Linguistics",
publisher = "Publicaciones de la Universidad de Ja{\'e}n",
address = "Ja{\'e}n",
}




