Publication: Annotating Expressions of Engagement in online book reviews: A contrastive (English-Spanish) corpus study for computational processing
This dissertation studies the expression of Engagement and alternative points of view in English and Spanish online book reviews, following the Appraisal model designed by Martin and White (2005). The study has three main aims: 1) to test two main aspects of the linguistic category of Engagement empirically, namely the identification of the spans realising Engagement and the classification of Engagement into different subtypes; 2) to extract relevant contrastive features of the use of Engagement in English and Spanish in online book reviews; 3) to create a bilingual (comparable) machine-readable annotated corpus with Engagement features in English and Spanish which can serve as a training corpus for machine learning algorithms and be offered to the scientific community for further research. Following standard methodologies in the field of Natural Language Processing, two agreement studies are carried out, designed to measure inter-annotator agreement based on an initial set of 10 reviews. A larger set of 28 reviews (14 English, 14 Spanish) is further annotated by a single human coder in order to extract relevant results on contrastive aspects and provide publicly available machine-readable annotated texts with Engagement categories. The findings reveal disagreement mainly on span length and the annotation of some specific categories, namely Pronounce and Counter. In addition, differences in the frequency of use of Engagement types were found between the two languages, although the expressions employed were formally similar.
Finally, the annotation of the larger data set showed that more expressions than initially expected can be annotated context-independently. For some other expressions, however, register and collocations were seen to have a decisive influence on interpretation, just as genre influences their frequency of use: resources aimed at emphasising the reviewer’s personal opinion were more frequent than those that acknowledged and evaluated external sources.
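As an illustration of how inter-annotator agreement on Engagement subtypes might be quantified, the following is a minimal sketch. The abstract does not state which agreement coefficient the study used; Cohen's kappa is assumed here purely for illustration. The function name `cohen_kappa` and the label sequences are hypothetical and are not data from the dissertation. The sketch also assumes both coders labelled the same spans, so it does not address the disagreement on span length reported above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items (illustrative)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's label proportions, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # so perfect agreement is reported by convention in this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six spans (invented for this example).
coder_1 = ["Pronounce", "Counter", "Deny", "Counter", "Pronounce", "Deny"]
coder_2 = ["Pronounce", "Counter", "Deny", "Pronounce", "Counter", "Deny"]

kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this invented example the two coders disagree only on spans labelled Pronounce or Counter, the two categories the findings single out as sources of disagreement.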