Linguistic injustice in multilingual technologies: the TenTen Corpus Family as a case study

Bordonaba Plou, David; Jreis-Navarro, Laila M.

doi:10.4324/9781003393696-12

Official URL

https://doi.org/10.4324/9781003393696

Publication date

2023

Authors

Bordonaba Plou, David

Jreis-Navarro, Laila M.

Editors

Viola, Lorella

Spence, Paul

Publisher

Routledge

URI

https://hdl.handle.net/20.500.14352/129996

Citation

Bordonaba-Plou, D. and Jreis-Navarro, L.M. (2023) “Linguistic Injustice in Multilingual Technologies: The TenTen Corpus Family as a Case Study,” in Multilingual Digital Humanities. Taylor and Francis, pp. 129-144. Available at: https://doi.org/10.4324/9781003393696-12.

Abstract

The aim of this work is twofold. First, to distinguish a phenomenon that produces a new type of linguistic injustice, “the paradox of Anglocentric multilingualism.” This paradox arises when a multilingual philosophy is pursued in building complex systems of analysis for the digital environment, yet these systems end up favoring the study of English over other languages. The injustice stems from the poor precision of a technology’s output when it analyzes languages other than English. Second, to contend that multilingual digital humanities (DH) should address deficiencies in tool performance, in addition to those of language resources, because this disadvantage makes it difficult for any cross-linguistic study to provide reliable empirical data for (dis)proving linguistic intuitions. To illustrate some of the potential problems arising from this paradox, this work details the difficulties we faced in a cross-linguistic study of color terms using the Arabic corpus arTenTen and the Spanish corpus esTenTen in Sketch Engine. We compare the tool’s performance in Arabic and Spanish with its performance in English in order to identify the weaknesses of this tool in a multilingual setting, enabling its improvement and enriching the critical and inclusive framework of multilingual DH.

UCM subjects

Philosophy, Philology, Linguistics

Unesco subjects

72 Philosophy, 5505.10 Philology, 5505.10-1 Arabic Philology, 7202.07 Philosophy of Language

Collections

Book sections
