%0 Journal Article
%A Bordonaba Plou, David
%A Jreis-Navarro, Laila M.
%T Are the TenTen corpora really a corpus family? On linguistic tagging and corpora members’ kinship degrees
%D 2025
%@ 1753-8548
%U https://hdl.handle.net/20.500.14352/118738
%X Corpus linguistics is an essential tool in digital humanities, and multilingual corpora are valuable resources in cross-linguistic studies. In this article, we address the multilingual layout of the TenTen corpus family, questioning the rationale for calling it a family and advancing the idea of different degrees of kinship among its language members. The analysis compares the performance of the Sketch Engine Word Sketch tool in the English Web 2020 corpus (enTenTen20) with its performance in two Arabic corpora: the latest release of the arTenTen, the Arabic Web 2018 corpus (arTenTen18), which has been processed with CAMeL Tools, Arabic-specific software, and its previous version, the arTenTen12, tagged with Stanford CoreNLP. The study shows the challenges that the platform's tools and the tagged corpora pose, both in the dissimilarities between the available data and in the reliability of the tools' results for the two languages, as well as the efforts made to tackle these challenges. The concluding remarks point to the need for a better definition of multilingualism in the TenTen corpora and, by extension, in the digital humanities as a whole, grounded in the structural design of the resources and tools meant to serve such theoretical aspirations.
%~