¿Tienen GPT-3.5 y GPT-4 un estilo de escritura diferente del estilo humano? : un estudio exploratorio para el español

Alonso Simón, Lara; Fernández-Pampillón Cesteros, Ana María; Fernández Trinidad, Marianela; Márquez Cruz, Manuel

doi:10.58859/rael.v23i1.666

Official URL

https://doi.org/10.58859/rael.v23i1.666

Publication date

2024

Authors

Alonso Simón, Lara

Fernández-Pampillón Cesteros, Ana María

Fernández Trinidad, Marianela

Márquez Cruz, Manuel

Publisher

Asociación Española de Lingüística Aplicada (AESLA)

URI

https://hdl.handle.net/20.500.14352/130386

Citation

Alonso Simón, L., Fernández-Pampillón Cesteros, A.M., Fernández Trinidad, M. and Márquez Cruz, M. (2024). «¿Tienen GPT-3.5 y GPT-4 un estilo de escritura diferente del estilo humano? : un estudio exploratorio para el español». RAEL: Revista Electrónica de Lingüística Aplicada, 23, 34-54. https://doi.org/10.58859/rael.v23i1.666

Abstract

This study uses statistical methods to verify that the generative language models GPT-3.5 (free version) and GPT-4 (paid version) of ChatGPT have a writing style distinct from that of humans, and that they can be told apart by at least three types of features: lexical features, punctuation marks, and the syntactic structure of sentences. Determining whether large language models have a style of their own matters for detecting machine authorship of texts. In previous work, a comparable corpus of human-written and machine-generated texts in Spanish was built and, through a qualitative study, a set of linguistic and stylistic features characteristic of each author was identified. The present work confirms quantitatively that 17 linguistic variables show statistically significant differences between human authors and the GPT-3.5 and GPT-4 models.

Description

This publication is part of the R&D&I project ROBOT-TALK (PID2022-140897OB-I00), funded by MCIN/AEI/10.13039/501100011033/ and FEDER/UE.

UCM subjects

Linguistics, Artificial intelligence (Computer science)

Unesco subjects

5701.04 Computational Linguistics, 1203.04 Artificial Intelligence

Collections

Articles
