Creation of a high-quality, register-diversified parallel (English-Spanish) corpus for linguistic and computational investigations

Lavid López, María Julia; Arús Hita, Jorge; Hoste, Veronique; DeClerck, Bernard

doi:10.1016/j.sbspro.2015.07.443


Official URL

https://www.sciencedirect.com/science/article/pii/S1877042815044444

Publication date

2015

Authors

Lavid López, María Julia

Arús Hita, Jorge

Hoste, Veronique

DeClerck, Bernard

Publisher

Elsevier


URI

https://hdl.handle.net/20.500.14352/110262

Citation

Lavid, Julia, et al. "Creation of a High-quality, Register-diversified Parallel (English-Spanish) Corpus for Linguistic and Computational Investigations." Procedia - Social and Behavioral Sciences, vol. 198, 2015, pp. 249-256. ScienceDirect, https://doi.org/10.1016/j.sbspro.2015.07.443.

Abstract

This paper outlines current work on the construction of a high-quality, richly annotated and register-diversified parallel corpus for the English-Spanish language pair, carried out within the framework of the MULTINOT project. The corpus consists of original and translated texts in both translation directions and is designed as a multifunctional resource for a number of disciplines, such as corpus-based contrastive linguistic and translation studies, machine translation, computer-assisted translation, computer-assisted language learning and terminology extraction. The paper describes the structure of the corpus, which comprises four subcorpora: English originals (EO), Spanish originals (SO), English translations (Etrans) and Spanish translations (Strans); the registers selected for inclusion in the corpus; and the methodology used to guarantee the quality of the processing steps that enrich the corpus with linguistic information at different levels.
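As a rough illustration of the four-subcorpus design described in the abstract, the following Python sketch models each subcorpus by its language and by whether it holds originals or translations, and derives the two pairings that such a bidirectional design makes available: parallel pairs (originals with their translations into the other language, i.e. EO with Strans and SO with Etrans) and comparable pairs (originals and translations in the same language, i.e. EO with Etrans and SO with Strans). The names `Lang`, `Subcorpus`, `parallel_pairs` and `comparable_pairs` are hypothetical and are not taken from the MULTINOT project's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Lang(Enum):
    EN = "English"
    ES = "Spanish"


@dataclass(frozen=True)
class Subcorpus:
    code: str             # label used in the abstract, e.g. "EO"
    language: Lang        # language the texts are written in
    is_translation: bool  # False for originals, True for translations


# The four subcorpora named in the abstract (hypothetical representation).
EO = Subcorpus("EO", Lang.EN, False)
SO = Subcorpus("SO", Lang.ES, False)
ETRANS = Subcorpus("Etrans", Lang.EN, True)
STRANS = Subcorpus("Strans", Lang.ES, True)
SUBCORPORA = (EO, SO, ETRANS, STRANS)


def parallel_pairs():
    """Pair each original subcorpus with the translations into the other language."""
    return [
        (orig, trans)
        for orig in SUBCORPORA if not orig.is_translation
        for trans in SUBCORPORA if trans.is_translation
        if orig.language != trans.language
    ]


def comparable_pairs():
    """Pair each original subcorpus with the translated texts in the same language."""
    return [
        (orig, trans)
        for orig in SUBCORPORA if not orig.is_translation
        for trans in SUBCORPORA if trans.is_translation
        if orig.language == trans.language
    ]


if __name__ == "__main__":
    print([(a.code, b.code) for a, b in parallel_pairs()])
    print([(a.code, b.code) for a, b in comparable_pairs()])
```

Running the script prints `[('EO', 'Strans'), ('SO', 'Etrans')]` followed by `[('EO', 'Etrans'), ('SO', 'Strans')]`. Deriving the pairs from the language and original/translated attributes, rather than hard-coding them, keeps the sketch consistent if the subcorpus labels change.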

Description

The MULTINOT project is financed by the Spanish Ministry of Economy and Competitiveness under project grant FFI2012-32201.

UCM subjects

Linguistics, Translation and interpreting

Unesco subjects

57 Linguistics, 5701.13 Linguistics Applied to Translation and Interpreting

Collections

Articles
