UNIVERSIDAD COMPLUTENSE DE MADRID
FACULTAD DE INFORMÁTICA

DOCTORAL THESIS

Mejorando la evaluación de juegos serios aplicando analíticas de aprendizaje y técnicas de minería de datos

Improving serious games evaluation by applying learning analytics and data mining techniques

DISSERTATION SUBMITTED FOR THE DEGREE OF DOCTOR BY
Cristina Alonso Fernández

SUPERVISORS
Baltasar Fernández Manjón
Manuel Freire Morán
Iván Martínez Ortiz

Madrid, 2021

© Cristina Alonso Fernández, 2021

“I am never so happy as when I am really engaged in good earnest, and it makes me most wonderfully cheerful and merry at other times, which is curious and very satisfactory”
Ada Lovelace

Acknowledgments

From the outside, it may seem that “the thesis” is mainly this document but it is, in fact, all the work carried out over at least the last four years. During this time, I have been lucky to have the support and company of many people who have made this sometimes confusing journey much more bearable.

Thanks to my classmates from the double degree (doblegradistas) and the master's, as well as to the teachers I had during those years, who helped me lay the groundwork for this adventure.
Thanks to Rafa for his help and co-supervision of my Master's Thesis.

Thanks to the Department of Software Engineering and Artificial Intelligence, in which I have carried out this work. Special mention to Lourdes, for her help and joy. Thanks to the members of the Doctoral Program, especially Narciso, Román and Daniel, for their help. Thanks to the staff of the Computer Science Faculty, especially Sánchez, for his red-hot coffees that woke us up even on the most tiring afternoons.

Thanks to my colleagues during the research stay at Florida State University: mainly Professor Valerie Shute, who always helped me, and her research group, with whom I was lucky to work for a few months: Ahmad, Lukas, Ginny, Xiaotong, Renata, Chi-Puh, Curt, Chen, Russell.

Thanks to my mates from room 16 and its surroundings for their company while working and, especially, while not working: Toni, Iván, Victorma, Cristian, Marta, Pablo, Miguel, Alicia, Jesús, Luisma, Joaquín, Dani, Rubén. Although we have repeated this many times, it is still true: without you, I would have finished this thesis much earlier, but it would have been a lot less fun.

Thanks to the other members of the e-UCM research group who helped me during this work: Ángel, Ana, Alma. Thanks to Julio for his work during his stay.

Thanks to my supervisors for their advice and constant help during these years. To Iván, Lord of the Machines, and Manu, Master of English, for their different perspectives that have enriched this work. And, of course, to Balta, for his guidance and help all these years, ever since I had the crazy idea of asking him to supervise my Final Degree Project.

Thanks to my friends and family who, although they did not always understand what I was doing, always supported me on this journey. And mostly, thanks to my mother and father, for cheering me up in the bad moments, for accompanying me in the good ones, and for everything else.
Agradecimientos

From the outside, it might seem that “the thesis” consists mainly of this document but, in reality, it is the sum of the work carried out over at least the last four years. During this time, I have had the support and company of many people who have made this path, confusing at times, much easier to walk.

Thanks to my fellow double-degree (doblegradistas) and master's classmates, as well as to the teachers I had during those years, who helped me lay the foundations in the stages leading up to this adventure. Thanks to Rafa for his help and co-supervision of my Master's Thesis.

Thanks to the team of the Department of Software Engineering and Artificial Intelligence, in which I carried out this work. Special mention to Lourdes, for her help and cheerfulness. Thanks to the team of the Doctoral Program, especially Narciso, Román and Daniel, for their help. Thanks to the staff of the Facultad de Informática, especially Sánchez, for his red-hot coffees that perked us up even on the most tiring afternoons.

Thanks to the people with whom I spent my stay at Florida State University: mainly Professor Valerie Shute, who helped me at every moment, and her whole research group, with whom I was lucky enough to work for a few months: Ahmad, Lukas, Ginny, Xiaotong, Renata, Chi-Puh, Curt, Chen, Russell.

Thanks to my mates from room 16 and its surroundings for their company in the moments of work and, above all, in those of no work: Toni, Iván, Victorma, Cristian, Marta, Pablo, Miguel, Alicia, Jesús, Luisma, Joaquín, Dani, Rubén. Although we have repeated it many times, it remains true: without you I would have finished this thesis much sooner, but it would have been much less fun.

Thanks also to the rest of the e-UCM group who helped me in this work: Ángel, Ana, Alma. Thanks to Julio for his collaboration during the months of his stay.

Thanks to my supervisors for their advice and constant help over these years.
To Iván, lord of the machines, and to Manu, master of English, for their highly complementary points of view that enriched the work. And, of course, to Balta, for his guidance and help all these years, ever since I had the crazy idea of asking him to supervise my Final Degree Project.

Thanks to my friends and my family who, even if they did not always understand what I was spending my time on, have always supported me along this path. And above all, thanks to my mother and my father, for cheering me up in the bad moments, for being with me in the good ones, and for everything else.

About this document

The thesis presented in this document was carried out as a compendium of publications. The publications included in the thesis are listed below, and their full text is included in Chapter 6.

Journal publications:

• Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2019): Applications of data science to game learning analytics data: a systematic literature review. Computers & Education, Volume 141, November 2019, 103612. DOI: 10.1016/j.compedu.2019.103612.
  o Impact metrics: JCR 2019, Impact Factor: 5.296, Q1 in Computer Science, Interdisciplinary Applications.
  o This paper presents the systematic literature review carried out about the applications of data mining techniques to game learning analytics data from serious games.
  o Details and results of the paper are described in Section 2.3 of this document, as part of the presentation of the state of the art, and in Section 4.1, as part of the results of the thesis.
• Cristina Alonso-Fernández, Iván Martínez-Ortiz, Rafael Caballero, Manuel Freire, Baltasar Fernández-Manjón (2020): Predicting students’ knowledge after playing a serious game based on learning analytics data: A case study. Journal of Computer Assisted Learning, vol. 36, no. 3, pp. 350-358, June 2020. DOI: 10.1111/jcal.12405.
  o Impact metrics: JCR 2019, Impact Factor: 2.126, Q2 in Education & Educational Research.
  o This paper presents the first case study carried out to test our assessment approach using learning analytics data and prediction models.
  o Details and results of the paper are described in Section 4.2 of this document.
• Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2020): Evidence-based evaluation of a serious game to increase bullying awareness. Interactive Learning Environments, 2020. DOI: 10.1080/10494820.2020.1799031.
  o Impact metrics: JCR 2019, Impact Factor: 1.938, Q2 in Education & Educational Research.
  o This paper presents the second case study carried out to further explore our assessment approach with a different serious game.
  o Details and results of the paper are described in Section 4.3 of this document.
• Cristina Alonso-Fernández, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2021): Improving evidence-based assessment of players using serious games. Telematics and Informatics. (in press) DOI: 10.1016/j.tele.2021.101583.
  o Impact metrics: JCR 2019, Impact Factor: 4.139, Q1 in Information Science & Library Science.
  o This paper presents the final evidence-based assessment process of serious games players based on game learning analytics data and prediction models.
  o Details and results of the paper are described in Section 4.4 of this document.
• Cristina Alonso-Fernández, Ana Rus Cano, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2019): Lessons learned applying learning analytics to assess serious games. Computers in Human Behavior, Volume 99, October 2019, Pages 301-309. DOI: 10.1016/j.chb.2019.05.036.
  o Impact metrics: JCR 2019, Impact Factor: 5.003, Q1 in Psychology, Experimental.
  o This paper presents the research carried out in this and two other theses exploring different applications of learning analytics data to assess serious games.
  o Details and results of the paper are described in Section 4.5 of this document.

Conference publications:

• Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2017): Systematizing game learning analytics for serious games. IEEE Global Engineering Education Conference (EDUCON), 25-28 April 2017, Athens, Greece.
  o This paper presents the first steps carried out in the systematization of the application of game learning analytics in serious games.
  o Details and results of the paper are described in Section 4.4 of this document.
  o This paper received a Best Paper Award of the Conference, in the “Area 3: Innovative Materials, Teaching and Learning Experiences in Engineering Education”.
• Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2021): Data science meets standardized game learning analytics. IEEE Global Engineering Education Conference (EDUCON), 21-23 April 2021, Vienna, Austria.
  o This paper presents the tool T-MON, an analysis and visualization tool to conduct exploratory analysis on the game interaction data collected.
  o Details and results of the paper are described in Section 4.4 of this document.
• Cristina Alonso-Fernández, Dan C. Rotaru, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2017): Full Lifecycle Architecture for Serious Games: Integrating Game Learning Analytics and a Game Authoring Tool. Joint Conference on Serious Games (JCSG), 23-24 November 2017, Polytechnic University of Valencia, Spain.
  o This paper presents the work to integrate game learning analytics data as part of a game authoring tool.
  o Details and results of the paper are described in Section 4.5 of this document.
• Cristina Alonso-Fernández, Ivan Perez-Colado, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2018): Improving serious games analyzing learning analytics data: lessons learned. Games and Learning Alliance conference (GALA Conf), December 5-7, 2018, Palermo, Italy.
  o This paper presents the initial lessons learned in some of the work carried out with learning analytics data in serious games.
  o Details and results of the paper are described in Section 4.5 of this document.
• Cristina Alonso-Fernández, Ana Rus Cano, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2018): Applications of learning analytics to assess serious games. 2nd Annual Learning & Student Analytics Conference (LSAC), October 22-23, 2018, Amsterdam, The Netherlands.
  o This paper explored the opportunities for assessment with learning analytics in serious games.
  o Details and results of the paper are described in Section 4.5 of this document.

Abstract

Title: Improving serious games evaluation by applying learning analytics and data mining techniques.

Serious games are highly motivational resources that are effective for teaching, raising awareness, or changing the perceptions of players. To foster their application in education, teachers and institutions require clear and formal evidence to assess students’ learning while they are playing the games. However, traditional assessment techniques rely on external questionnaires, typically carried out before and after playing, which fail to measure players’ learning while it is happening. The multiple interactions carried out by players in the games can provide more precise information about how players play, and can even be used to assess them. In this regard, game learning analytics techniques propose the collection and analysis of such interactions for multiple purposes, including assessment.
The potentially large volume of game learning analytics data collected can be further analyzed with data mining techniques to discover unexpected patterns and to provide measures to evaluate the effect of games on their players and assess their learning.

In this thesis, we propose a new approach to assess serious game players based on evidence collected from their gameplays. The interaction data collected is analyzed to derive game learning analytics variables that feed data mining models to predict players’ learning. These prediction models are created during the serious games’ validation phase and tested by comparing their predictions to the learning measured by the differences between the pre-post questionnaires. With our approach, once the games and the prediction models are validated, players’ assessment in large-scale deployment is simplified: players can be assessed solely based on their interaction data, and questionnaires are no longer required. The approach uses a standard data collection format for interactions with serious games, xAPI-SG, to systematize the interaction data collected, as well as the creation of the relevant game learning analytics variables. Finally, to support the essential step of creating those variables, we provide an exploratory tool, T-MON, to analyze and visualize the collected interaction data.

The approach is based on the lessons learned in two case studies with different serious games in which we conducted all the steps to assess students based on their interactions. By simplifying and improving players’ assessment, educators and institutions will have clearer evidence to support including serious games in their classes, without the additional costs of players completing the questionnaires and of educators then analyzing their results.
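The assessment approach summarized above (interaction traces, then game learning analytics variables, then a prediction model checked against questionnaire results) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the trace fields are a heavily simplified stand-in for real xAPI-SG statements, the player identifiers and post-questionnaire scores are invented toy values, and a one-variable least-squares line stands in for the data mining models used in the thesis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, simplified traces. Real xAPI-SG statements are richer
# JSON documents; only the fields needed for this sketch are kept.
TRACES = [
    {"player": "p1", "verb": "selected", "success": True},
    {"player": "p1", "verb": "selected", "success": True},
    {"player": "p1", "verb": "selected", "success": False},
    {"player": "p2", "verb": "selected", "success": True},
    {"player": "p2", "verb": "selected", "success": False},
    {"player": "p2", "verb": "selected", "success": False},
    {"player": "p3", "verb": "selected", "success": True},
    {"player": "p3", "verb": "selected", "success": True},
    {"player": "p3", "verb": "selected", "success": True},
]

# Hypothetical post-questionnaire scores collected during validation.
POST_SCORES = {"p1": 7.0, "p2": 4.0, "p3": 10.0}


def gla_variables(traces):
    """Derive one GLA variable per player: the ratio of correct choices."""
    correct, total = defaultdict(int), defaultdict(int)
    for trace in traces:
        if trace["verb"] == "selected":
            total[trace["player"]] += 1
            correct[trace["player"]] += int(trace["success"])
    return {player: correct[player] / total[player] for player in total}


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


# Validation phase: fit the model on players who answered the questionnaire.
features = gla_variables(TRACES)
players = sorted(features)
a, b = fit_line([features[p] for p in players],
                [POST_SCORES[p] for p in players])

# Compare predictions against the questionnaire-based scores.
predictions = {p: a * features[p] + b for p in players}
mae = mean(abs(predictions[p] - POST_SCORES[p]) for p in players)
print(f"slope={a:.2f} intercept={b:.2f} mean absolute error={mae:.2f}")
```

The toy scores were chosen to lie exactly on a line, so the error here is close to zero; with real data the model would be evaluated on players not used for fitting. In deployment, only `gla_variables` and the fitted coefficients would be needed to assess a new player, which is the sense in which questionnaires are no longer required.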
Moreover, with this approach, we aim to contribute to the systematization of players’ assessment, one of the gaps identified in the systematic literature review about data science applications to game learning analytics data.

The work carried out in this thesis builds on the work I performed as a member of the e-UCM group, including work conducted as part of the H2020 European projects RAGE and BEACONING. During those projects, I worked on improving the application and deployment of serious games and game learning analytics in large-scale deployment scenarios, dealing with both the technological part and the analysis of the interaction data collected. Moreover, in the final part of the thesis, during a research stay at Florida State University with Professor Valerie Shute’s research group on stealth assessment, I worked in an international environment using the hosting group’s game learning analytics and player assessment techniques, which allowed me to carry out an initial validation and contrast the work presented in this proposal.

Keywords: serious games, learning analytics, game-based learning, stealth assessment, data mining, standardization, e-learning

Resumen

Title: Mejorando la evaluación de juegos serios aplicando analíticas de aprendizaje y técnicas de minería de datos.

Serious games are highly motivating resources that are effective for teaching, raising awareness, or changing their players' perceptions. To foster their application in education, teachers and institutions need clear and automatic evidence with which to assess their students' learning while they use the games. Traditionally, assessment with serious games relies on external questionnaires, usually administered before and after playing, which do not measure players' learning during the process itself.
The multiple interactions that players carry out while playing can provide more precise information about how players play and can even be used to assess their learning. In this regard, game learning analytics propose techniques for collecting and analyzing these interactions for multiple purposes, including player assessment. The (potentially large) game learning analytics data can be analyzed in greater detail with data mining techniques that make it possible to discover patterns hidden to the naked eye and to provide better measures to study the effect of games on students and assess their learning.

In this thesis, we propose a new method to assess serious game players based on the evidence collected while they play. The interaction data collected are analyzed to extract learning analytics variables used by data mining prediction models to quantify players' learning. These prediction models are created during the validation phase of the serious games, and are validated by comparing their predictions with the learning measured by the differences between the pre-game and post-game questionnaires. With our proposal, once the games and the prediction models have been validated, player assessment is simplified during large-scale deployment, allowing players to be assessed automatically from their interaction data, without the need for questionnaires. The proposal uses a standard data collection format for interactions with serious games, xAPI-SG, which makes it possible to systematize both the interaction data collected and the creation of learning analytics variables.
Finally, to help in the essential stage of variable extraction, we provide an exploratory tool, T-MON, to analyze and visualize the interaction data collected.

The proposal is based on the lessons learned in two case studies with different serious games in which we carried out all the steps to assess students based on their interactions. By simplifying and improving player assessment, educators and institutions will have clearer evidence for including serious games in their classes, without the additional costs for players of completing those questionnaires, and for educators of analyzing them afterwards. Moreover, with this proposal we seek to advance the systematization of player assessment, one of the gaps identified in the systematic literature review on the applications of data science to game learning analytics.

The work carried out in this thesis builds on the work in which I have taken part as a member of the e-UCM group, including the research that formed part of the H2020 European projects RAGE and BEACONING. During these projects, I worked on improving the application and deployment of serious games and game learning analytics in large-scale environments, addressing both the applied technologies and the analysis of the interaction data collected. Furthermore, during the final part of the thesis, I carried out a research stay at Florida State University, with Professor Valerie Shute's stealth assessment research group, in which I worked in an international environment with learning analytics and student assessment techniques, which allowed me to carry out an initial validation and contrast the work I present in this proposal.
Keywords: serious games, learning analytics, game-based learning, assessment, data mining, standardization, e-learning

Table of Contents

Acknowledgments
Agradecimientos
About this document
Abstract
Resumen
List of Figures
List of Tables
List of Abbreviations
Chapter 1. Introduction
  1.1. Motivation
  1.2. Document structure
Chapter 2. State of the art
  2.1. Serious Games
    2.1.1. Players’ assessment using serious games
  2.2. Game Learning Analytics
    2.2.1. Data standardization: xAPI-SG
  2.3. Data mining techniques
  2.4. Applications of data mining to game learning analytics data
Chapter 3. Goals of the thesis
  3.1. Research goals
  3.2. Research process
Chapter 4. Results and discussion
  4.1. Study of the domain
  4.2. First case study
    4.2.1. The game: First Aid Game
    4.2.2. Data captured
    4.2.3. GLA variables
    4.2.4. Prediction models and results
    4.2.5. Discussion and conclusions
  4.3. Second case study
    4.3.1. The game: Conectado
    4.3.2. Data captured
    4.3.3. GLA variables
    4.3.4. Prediction models and results
    4.3.5. Discussion and conclusions
  4.4. Evidence-based assessment process of serious game players
    4.4.1. Collection of player data: pre-post questionnaires and game interaction data
    4.4.2. Feature extraction process: GLA variables from interaction data
      T-MON: Monitor of traces in xAPI-SG
    4.4.3. Assessment prediction with GLA evidences
    4.4.4. From game validation to game deployment
  4.5. Discussion
Chapter 5. Conclusions, contributions and future work
  5.1. Conclusions
  5.2. Contributions
  5.3. Future work
Chapter 6. Publications
  6.1. Journal publications
    6.1.1. Applications of data science to game learning analytics data: a systematic literature review
      Full citation
      Abstract
      Full publication
    6.1.2. Predicting students’ knowledge after playing a serious game based on learning analytics data: A case study
      Full citation
      Abstract
      Full publication
    6.1.3. Evidence-based evaluation of a serious game to increase bullying awareness
      Full citation
      Abstract
      Full publication
    6.1.4. Improving evidence-based assessment of players using serious games
      Full citation
      Abstract
      Full publication
    6.1.5. Lessons learned applying learning analytics to assess serious games
      Full citation
      Abstract
      Full publication
  6.2. Conference publications
    6.2.1. Systematizing game learning analytics for serious games
      Full citation
      Abstract
      Full publication
    6.2.2. Data science meets standardized game learning analytics
      Full citation
      Abstract
      Full publication
    6.2.3. Full lifecycle architecture for serious games: integrating game learning analytics and a game authoring tool
      Full citation
      Abstract
      Full publication
    6.2.4. Improving serious games analyzing learning analytics data: lessons learned
      Full citation
      Abstract
      Full publication
    6.2.5. Applications of learning analytics to assess serious games
      Full citation
      Abstract
      Full publication
Bibliography

List of Figures

Figure 1. Screenshot of the serious game Treefrog Treasure (left) retrieved from https://cool-math.co.uk/treefrog-treasure/, and of the serious game Darfur is Dying (right) retrieved from https://www.commonsense.org/education/game/darfur-is-dying.
Figure 2. Traditional formal assessment methodology with serious games: pre-post evaluation.
Figure 3. Learning Analytics diagram, retrieved from (Chatti et al., 2012).
Figure 4. Implementation process of learning analytics, adapted from https://es.slideshare.net/emadridnet/20201113-aplicando-analticas-de-aprendizaje-en-un-juego-serio-de-puzles-geomtricos-jos-a-ruiprez.
Figure 5. Game Learning Analytics model retrieved from (Hauge et al., 2014).
Figure 6. Example xAPI statement representing the event “John Doe initialized the example activity” generated with https://adlnet.github.io/xapi-lab/.
Figure 7. xAPI-SG sample statement generated when "John Doe selected a false response in a question", retrieved from (Serrano-Laguna, Martínez-Ortiz, et al., 2017).
Figure 8. Educational Data Mining (EDM) process, retrieved from (Vahdat et al., 2015).
Figure 9. Design Science Research Methodology (DSRM), retrieved from (Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007).
Figure 10. Process carried out to select the publications included in the systematic literature review.
Figure 11. Screenshots of the First Aid Game used in the first case study: the three game levels with a score (left) and visual choices in a level (right).
Figure 12. Example of an xAPI-SG statement captured from the First Aid Game: the player has selected the correct response (112) in the question about the emergency number.
Figure 13. Screenshots of the serious game Conectado, used in the second case study: dialogue with a non-playable character (left) and choices in a conversation in the in-game mobile phone (right).
Figure 14. Example of an xAPI-SG statement from Conectado: the player has interacted with the computer in the game. Additional information is encapsulated in the result field.
Figure 15. Evidence-based assessment process of players using serious games: the game interaction traces collected fill the pre-defined set of GLA variables to be used as input for the prediction models. The target variable used for prediction is based on pre-post results.
Figure 16. Four of the default visualizations included in T-MON presenting information about games completion, progress, completion times and scores in completables.
Figure 17. Four of the default visualizations included in T-MON presenting information about correct and incorrect responses in alternatives per player and per question, accessibles and interactions.
Figure 18. T-MON main GitHub repository page (left), and interface with configuration options (right).
Figure 19. Evidence-based assessment process of serious game players: after validating the game and the prediction models, during the game deployment, players are assessed solely based on their game interactions.

List of Tables

Table 1. Confusion matrix for classification algorithms.
Table 2. Main purposes of data science applications to game learning analytics data from serious games.
Table 3. Data science techniques applied to game learning analytics data from serious games.
Table 4. Game Learning Analytics variables derived from interaction data in the first case study (First Aid Game).
Table 5. Results of prediction models of first aid knowledge for the first case study (First Aid Game).
Table 6. GLA variables derived from interactions in the second case study (Conectado).
Table 7. Results of prediction models of bullying awareness increase for the second case study (Conectado).
Table 8. Correspondence of xAPI-SG traces (object type, verb and other fields) to derive GLA variables.

List of Abbreviations

ADL: Advanced Distributed Learning
AI: Artificial Intelligence
API: Application Programming Interface
CV: Cross Validation
DM: Data Mining
DSRM: Design Science Research Methodology
EDM: Educational Data Mining
EU: European Union
e-UCM: UCM research group on e-learning technologies
FN: False Negative
FP: False Positive
GA: Game Analytics
GDPR: General Data Protection Regulation
GLA: Game Learning Analytics
IEEE: Institute of Electrical and Electronics Engineers
JSON: JavaScript Object Notation
k-NN: K-Nearest Neighbors
LA: Learning Analytics
LAM: Learning Analytics Model
LMS: Learning Management System
MAE: Mean Absolute Error
MOOC: Massive Open Online Course
MR: Misclassification Rate
RQ: Research Question
SD: Standard Deviation
SG: Serious Game
SVM: Support Vector Machines
SVR: Support Vector machines for Regression
TN: True Negative
TP: True Positive
T-MON: Monitor of xAPI-SG traces
UCM: Universidad Complutense de Madrid
xAI: Explainable Artificial Intelligence
xAPI: Experience API
xAPI-SG: Experience API for Serious Games

Chapter 1. Introduction

This chapter presents the motivation for the work carried out in this thesis, namely the improvement of the assessment method of students playing serious games, taking advantage of game learning analytics data and its analysis with data mining techniques. The chapter further summarizes the structure of the following chapters of this document.

1.1. Motivation

Serious games are videogames that aim to cause an effect on players beyond simple entertainment, for example, increasing players’ knowledge or awareness about social issues. We consider that serious games offer new opportunities not only as tools for learning but also as tools to assess that learning, due to their multiple advantages, from their interactive and engaging nature to the possibility of testing complex scenarios in a safe manner. Nevertheless, the effect of serious games on players is still assessed mostly through external formal questionnaires, which merely assess players before and after the gameplay, missing the opportunity to perform the assessment while players are actually learning, i.e., while they are playing the game.
The innovative field of Game Learning Analytics proposes the collection, analysis and reporting of information extracted from the data obtained from players’ interactions with serious games. The application of Game Learning Analytics brings new opportunities to improve different aspects of the serious games lifecycle: further explore players’ behaviors in the gameplays, understand their progress and learning processes, improve the game and learning design, and adapt the learning experience to players’ characteristics and needs. The rich and potentially large amount of data gathered from the collection of game learning analytics can be further analyzed with complex analysis techniques to discover unexpected patterns. Data mining techniques like prediction models provide further opportunities to analyze the collected data and, together with the insight provided by game learning analytics, can make it possible to assess players using serious games in a precise and automatic way without relying on external questionnaires. Accurate prediction models could be trained to automatically assess players solely based on their game interaction data.

Such data-based assessment of players can greatly simplify the process of obtaining evidence on how much serious games are impacting their players. Simplifying students’ assessment can provide the tools required to verify that students are learning, thereby increasing teachers’ and institutions’ trust in serious games as tools for causing an actual effect on students/players. This, in turn, can contribute to expanding the application of serious games in real-life scenarios, beyond their current limited application as a complementary activity with no actual impact on students’ evaluations. Therefore, the simplification of the current assessment techniques of players using serious games can be one of the factors that foster the application of serious games.
In our previous work related to serious games, we had analyzed the potential of game learning analytics data gathered from players’ interactions. Our research group e-UCM was part of two H2020 European projects, RAGE and BEACONING, in which, among other tasks, we managed the game learning analytics collection, analysis and reporting. During those projects, we worked on improving the application and the large-scale deployment of serious games, collecting and analyzing large amounts of game learning analytics data. For that work, the application of an interaction data standard format (xAPI-SG) was essential to provide a clear definition of the data collected from the different serious games and to systematize the analysis and visualizations carried out. This was particularly important when working in an international project, with multiple partners that had to collaborate in the development of the tools. The work carried out in those projects laid the foundation for this work, providing valuable experience in the large-scale deployment of serious games in real scenarios.

Additionally, in the final stages of this work, I had the opportunity to carry out a research stay at Florida State University with Professor Valerie Shute and her group, a leading research group in the field of stealth assessment. During that stay, besides learning the process and methodology of the work that they carry out, we worked together on two studies applying different techniques to the game learning analytics data collected from a serious game that they were studying. This experience provided me with great insight into their techniques and methodology, as well as with a different perspective on the assessment of players with serious games.
In the studies conducted, we were able to apply some of the steps and processes of our assessment approach, presented in this thesis, in a different context (different serious game and interaction data format), validating some of the steps of our approach and obtaining additional results on the application of game learning analytics in serious games.

1.2. Document structure

The rest of this document is structured as follows:

• Chapter 2 reviews the state of the art on the three main topics that are central to the thesis work: Serious Games, including the current methods of assessment of their players; Game Learning Analytics, including the standardization of the data collected (and the xAPI-SG Profile); and data mining techniques, such as prediction models. The chapter concludes by analyzing the combination of the three previous topics, that is, the application of data mining techniques to game learning analytics data from serious games, including the results obtained in the systematic literature review conducted about this application.
• Chapter 3 states the research goals of the thesis and presents the research methodology carried out and the steps followed in the thesis.
• Chapter 4 presents the process, results and conclusions obtained in the systematic review of the literature, in the two case studies carried out with different serious games to conduct evidence-based assessment of their players, and the final evidence-based assessment process obtained, detailing all the steps needed and the data standard used, and presenting tools to support the process. The chapter concludes with a discussion of the work carried out and its limitations.
• Chapter 5 summarizes the conclusions obtained from our work, the main contributions of the thesis and some of the possible lines of future research.
• Finally, Chapter 6 contains the details and full text of the journal and conference publications that constitute the thesis.

Chapter 2. State of the art

This chapter summarizes the state of the art in relation to: serious games, including the techniques to assess their players; game learning analytics and data standardization, including xAPI-SG, the data standard for interactions with serious games; and data mining techniques, including prediction models. The chapter finally presents the combination of the three previous fields that compose the core of the thesis: the applications of data mining techniques to game learning analytics data collected from serious games, including the systematic literature review carried out about this topic, which constitutes one of the contributions of the thesis.

2.1. Serious Games

The application of innovative and more interactive tools in educational contexts has greatly spread in recent years. For instance, the field of e-learning proposes a learning experience through electronic tools including more means of interaction, although sometimes the experience is restricted to a digitalized version of traditional learning. Gamification techniques (i.e., the use of game techniques in non-gaming areas) are also being applied in education to benefit from their advantages compared to more traditional approaches. This interest has also reached videogames, and their application with serious purposes has greatly increased in recent years.

Serious Games (SGs) have been defined as games that “do not have entertainment, enjoyment or fun as their primary purpose” (Michael & Chen, 2005) and as digital games “created with the intention to entertain and to achieve at least one additional goal (e.g., learning or health)” (Dörner, Göbel, Effelsberg, & Wiemeyer, 2016). Although serious games should also be entertaining, their main purpose could be to teach some knowledge, create awareness of some issue, change players’ attitudes, etc.

Videogames provide an engaging, highly interactive environment with many possibilities for causing an effect on players.
Among their several advantages, they provide (Dörner et al., 2016):

• Motivation: games increase students’ motivation, allowing them to interact with the learning tool and overcome the proposed challenges.
• Engagement: games reach players on an emotional level, in an immersive experience, breaking the usual barrier of 10 minutes of attention. This way, serious games make it possible to link education and entertainment.
• Feedback and adaptation: within games, players can practice different strategies or choices, obtaining feedback on the consequences of their actions. Games also provide adaptability, as they can change according to players’ choices or profiles.
• Progress and completion: games provide a progressive increase in difficulty so players can train the skills or knowledge while they are progressing in the game. They also provide a means of completing the full game or its different levels or chapters, which gives a feeling of progress within the gameplay.
• Free and safe exploration: serious games allow players to test complex or risky scenarios in a safe manner, as players can freely explore the game areas and paths while training complex procedures.
• Active learning: the interactive nature of games helps students take an active role in learning, compared to the passive role typical of traditional lectures.

Examples of Serious Games can be found in multiple domains, such as: medicine (Evans et al., 2015; Standford Medicine, 2013), mathematics (Center for Game Science at the University of Washington, 2016), physics (V. J. Shute, Ventura, & Kim, 2013), literature (Iglesias, Fernandez-Vara, & Fernandez-Manjon, 2013), history (GTLHistory, 2020), computer science (Adamo-Villani, Haley-Hermiz, & Cutler, 2013), or military (United States Army, 2002).
Besides teaching knowledge, other serious games focus on raising awareness about social problems, such as humanitarian crises (interFUEL, 2006) or drug addiction (Asociación Servicio Interdisciplinar de Atención a las Drogodependencias (SIAD), 2014). Some examples of serious games are Treefrog Treasure to teach mathematics (Figure 1, left) and Darfur is Dying to raise awareness about the humanitarian crisis in Sudan (Figure 1, right).

Interest in applying games in education has increased in recent years, and not only in the education or research communities. Some commercial videogames have also launched educational versions, to be used by teachers in schools or high schools, such as: a version of SimCity (Electronic Arts Games, 2019) to teach about city management and pollution (Electronic Arts, 2013), a version of Civilization (Firaxis Games, 2016) to teach historical problem solving (Seppala, 2016), an Education Edition of Minecraft (Mojang Studios, 2011) to teach basic coding concepts (Mojang Studios, 2016), or a version of Portal 2 (Valve, 2011) to teach physics concepts (Valve, 2012).

Figure 1. Screenshot of the serious game Treefrog Treasure (left) retrieved from https://cool-math.co.uk/treefrog-treasure/, and of the serious game Darfur is Dying (right) retrieved from https://www.commonsense.org/education/game/darfur-is-dying.

Despite their multiple advantages, serious games are still rarely applied in education (Kato & Klerk, 2017). One of the reasons for their low adoption is the lack of evidence about the impact they have on players, as no clear evidence is given on how to use games for players’ assessment. Therefore, their application is usually limited to a simple complementary or additional activity with no real impact on students’ final evaluations (Pereira, De Souza, & De Menezes, 2016).
Some literature reviews have pointed out the potential positive impact of gaming with respect to learning, skill enhancement and engagement, finding that the most frequently occurring outcomes and impacts were knowledge acquisition/content understanding, and affective and motivational outcomes (Connolly, Boyle, MacArthur, Hainey, & Boyle, 2012). Besides, before assessing students by using serious games, the serious game itself must be evaluated. In this regard, there are few approaches to systematically evaluate educational games (Petri & Gresse von Wangenheim, 2017). In the following subsection, we describe the current assessment techniques of players using serious games, and their possible improvements to provide a more direct measure to assess players directly from their actions in the game.

2.1.1. Players’ assessment using serious games

Despite the multiple advantages of applying games in educational contexts, there is a lack of formal or systematic evaluation with games. Few empirical studies have investigated the effectiveness of SGs in learning (Girard, Ecalle, & Magnan, 2013).

The first step before applying serious games in educational contexts is to perform a formal validation of the games, to ensure that they produce the intended effect on players (teaching knowledge, raising awareness, etc.). To determine if serious games have the intended impact on their players, the most common validation technique is the pre-post experiment (Calderón & Ruiz, 2015). This game validation process is as follows (Figure 2):

1. Players complete a questionnaire before playing (pre-test) assessing the characteristic the game aims to change (e.g., knowledge, awareness).
2. Players play the serious game, from beginning to end.
3. Players complete a questionnaire after playing (post-test), again assessing the characteristic the game aims to change.

The results of the pre-test and the post-test are then compared: if a significant improvement is found between both results, we can say that players have learned, and the game is considered to be effective and, therefore, validated. The pre-test and the post-test have traditionally been carried out on paper and, typically, they contain the same questions, or at least questions similar enough so that they can be fairly compared.

Figure 2. Traditional formal assessment methodology with serious games: pre-post evaluation.

Notice that in Figure 2, and in other places in this document, we simplify the narrative regarding the purposes of serious games by saying that they increase players’ knowledge (i.e., games for learning); however, our explanations are equally valid for serious games that aim to cause a different effect on players, such as raising their awareness about some issue or changing their attitudes.

The comparison between pre-test and post-test results provides, for each player, the assessment of their learning and, globally, if results are positive, the validation of the serious game. Once the serious game has been formally validated with this approach, questionnaires are also used in deployment. Both pre and post questionnaires are needed to measure the exact learning of players. It is also possible to assess students only with post-questionnaires, if it is only required to measure players’ final knowledge.

The use of external questionnaires to evaluate players has several disadvantages, as they are error prone and increase the total time of the experiment (Clark, Martinez-Garza, Biswas, Luecht, & Sengupta, 2012; Frederick-Recascino, Liu, Doherty, Kring, & Liskey, 2013).
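Before turning to those drawbacks in detail, the pre-post comparison described above can be made concrete with a minimal sketch. The text only requires "a significant improvement" and does not name a statistical test, so the paired t statistic used here is an assumption, as are the function name `pre_post_summary` and the scores, which are hypothetical and not data from the thesis.

```python
from math import sqrt
from statistics import mean, stdev


def pre_post_summary(pre_scores, post_scores):
    """Return (mean gain, paired t statistic) for matched pre/post scores."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each player needs both a pre-test and a post-test score")
    # Per-player gain: post-test score minus pre-test score.
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_gain = mean(gains)
    # Standard error of the mean gain, using the sample standard deviation.
    # Note: this is zero (and the division fails) if every player gains the same amount.
    std_error = stdev(gains) / sqrt(len(gains))
    t_statistic = mean_gain / std_error
    return mean_gain, t_statistic


# Hypothetical questionnaire scores for five players (illustrative only).
pre = [4, 5, 6, 3, 5]
post = [6, 7, 7, 5, 8]

mean_gain, t_statistic = pre_post_summary(pre, post)
print(f"mean gain: {mean_gain:.2f}, paired t statistic: {t_statistic:.2f}")
```

In practice, the t statistic would be compared against the critical value for n - 1 degrees of freedom, or a statistics package would be used to obtain a p-value; a non-parametric alternative may be preferable for small or skewed samples.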
First of all, questionnaires have to be created and validated so that they provide an effective measure of the characteristic that they aim to evaluate. Once such questionnaires exist, they have to be prepared and distributed in advance, before and after the gameplay, thereby reducing the time left in the session to actually play the game. Finally, questionnaire results have to be digitalized (if they are paper based, as they traditionally have been) and analyzed to obtain the evaluation results for all players. Moreover, the complexity and requirements of assessment processes through pre-post questionnaires make them neither scalable nor generalizable outside a controlled experimental setup.

Game-based assessment offers better opportunities than traditional external questionnaires (de Klerk & Kato, 2017), as it provides rich interaction data that can more effectively measure the change in players as it is happening, that is, while they are playing the game. The field of stealth assessment captures evidence from gameplay without disrupting players’ progress and then compares the evidence gathered in the log files against an evidence model (V. J. Shute & Moore, 2017; V. Shute & Kim, 2014; V. Shute & Ventura, 2013).

Following this line of research, we consider that better-informed assessment of players can be obtained with data from game interactions. The rich and varied information that can be gathered from serious games is encapsulated in the field of Game Learning Analytics.

2.2. Game Learning Analytics

Learning Analytics (LA) was originally defined in the first Learning Analytics Knowledge (LAK) conference as the “measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs” (Phil Long & Siemens, 2011; Philip Long, Siemens, Gráinne, & Gašević, 2011).
LA can therefore contribute to both teaching and learning practice (Gašević, Dawson, & Siemens, 2015).

LA comprises multiple goals and techniques to collect and analyze data from different learning environments. For instance, the reference model for LA (Figure 3) presented by (Chatti, Dyckhoff, Schroeder, & Thüs, 2012) describes four dimensions:

• What? The variety of educational data that can be gathered from different learning environments to be used for the analysis.
• Why? The main goals of LA, including monitoring of students’ activities and reporting of results, prediction of knowledge, assessment, adaptation of the learning resources, or reflection about the effectiveness of the learning or teaching practice.
• How? The techniques applied, including visualizations and data mining.
• Who? The different stakeholders that can benefit from the application of LA, including students, teachers, institutions, and researchers.

Figure 3. Learning Analytics diagram, retrieved from (Chatti et al., 2012).

The application of LA in education has increased, mainly promoted by the large amount of interaction data that can be captured by two kinds of systems: Massive Open Online Courses (MOOCs), in which the main focus is to predict student success and dropout (Moreno-Marcos, Alario-Hoyos, Munoz-Merino, & Delgado Kloos, 2018), and Learning Management Systems (LMSs) in educational institutions, where the focus is likewise on detecting at-risk students early and taking the corresponding actions to prevent their failure in the module (Macfadyen & Dawson, 2010).

The work of (Ruiperez-Valiente, 2020) proposes an implementation process of LA in learning environments (e.g.
educational videogames) considering the steps of data collection, data cleaning and feature engineering, and analysis of the data (with exploration or prediction models) with different goals based on the stakeholder: visual dashboards for teachers, adaptive contents and recommendation systems for students, or educational reports for institutions (Figure 4).

Figure 4. Implementation process of learning analytics, adapted from https://es.slideshare.net/emadridnet/20201113-aplicando-analticas-de-aprendizaje-en-un-juego-serio-de-puzles-geomtricos-jos-a-ruiprez.

The collection and analysis of interaction data is not exclusive to the educational domain. In the field of entertainment games, the analysis of player interaction data is also becoming increasingly widespread. The field of Game Analytics (GA) (Seif El-Nasr, Drachen, & Canossa, 2013) aims to provide data-driven information to support decision-making in game development and research. With this information, the goal is to better understand players’ behavior, improve the game design, and ultimately enhance the commercial aspect of the game. Analytics data can improve all stages of the lifecycle of games, including design and development: informing game design based on players’ requirements and improving the player experience; discovering potential problems in development and reducing costs; optimizing the game for publishing; improving game user retention and increasing game revenue; and finally, extending the game’s life cycle (Su, Backlund, & Engström, 2020).

Moreover, there are some overlapping research areas such as Serious Game Analytics, which focuses on skills and performance improvement (Loh, Sheng, & Ifenthaler, 2015a). In addition, LA can also be applied in serious games, by focusing on the interactions that are meaningful to the learning process (Chaudy, Connolly, & Hainey, 2014).
Research applying LA to SGs has focused on learner performance and game design strategies (Liu, Kang, Liu, Zou, & Hodson, 2017). The information gathered can provide feedback to improve and validate the game design: to obtain comprehensive results, authors have stressed that interpretable data should be designed early on, selecting suitable analysis features (E. Owen & Baker, 2019).

The data that can be collected using LA can be classified as intensive or extensive (Shoukry, Göbel, & Steinmetz, 2014). Intensive data is obtained when the focus is on a limited number of students, for whom very detailed interactions are collected. Extensive data is obtained from a large number of users, when only a small amount of data is gathered about each user. A combination of both approaches is recommended, so that they complement each other and significant patterns are not missed.

The fields of LA and SGs also provide mutual benefits: while LA can improve SGs by providing information that can lead to better game designs or educational results of the game, SGs can also benefit LA by providing an innovative scenario for learning, enriching the opportunities of LA for more accurate and objective assessments (Petrov, Mustafina, Alloghani, Galiullin, & Tan, 2018).

Game Learning Analytics (GLA) is the combination of the educational goals of Learning Analytics with the tools and technologies of Game Analytics (Freire et al., 2016). An implementation of GLA needs to obtain evidence of players’ interactions in the game, storing detailed information for later analysis.
These analyses could include both real-time analytics, which allow targeted interventions while students are playing, and later batch analysis, to perform more complex analyses and aggregate results from multiple gameplays. In the conceptual GLA System and Model (Figure 5), the cycle starts when the game sends data to a collector for storage and aggregation, creating the needed reports and visualizations. The information obtained can also be used to assess students. Finally, the analytics system provides feedback to the game so that it can adapt to players’ characteristics (Hauge et al., 2014).

Figure 5. Game Learning Analytics model retrieved from (Hauge et al., 2014).

This way, the results obtained from GLA can be used for multiple purposes, including visualizations in dashboards for real-time intervention, offline reporting, adaptation of the game, assessment of players, etc.

The interaction data collected from serious games is very varied and dependent on the game. As the collected interaction data is commonly closely tied to the particularities of the game, it becomes particularly difficult to reuse the interaction data outside the particular environment where it was defined: the combination of data from multiple systems, the integration of tools and even the sharing of interaction data for research purposes are therefore limited. To simplify the collection, analysis and reporting of results from game interaction data, it is convenient for the collected data format to follow a standard. Doing so additionally simplifies its integration with other analytics tools that use the same standard.

2.2.1. Data standardization: xAPI-SG

Data standardization is an essential step that can help to simplify the full process of applying game learning analytics data for serious games: to define and simplify the collection of data from serious games, systematize the analysis and reporting of results, allow the replicability of the process and results, simplify the integration with other environments, and even share the collected data for other purposes (Kitto, Whitmer, Silvers, & Webb, 2020). Despite the convenience of applying standards, there is a lack of standardization around collecting, analyzing, and managing student learning data from educational games (Keehn & Claggett, 2019).

Currently there are two prominent standards related to the collection of analytical data: IMS Caliper and ADL xAPI. Although they share the same purpose, that is, to provide the means to collect and share analytical data between tools, their approaches are different. On the one hand, IMS Caliper defines both a general framework and a fixed set of specific metric profiles that model interactions with specific learning activities. On the other hand, the xAPI specification defines a general framework (API and data model) and the means to let the community define their own profiles. In our work, due to the availability of a specific profile for serious games, as described in detail below, we focus on xAPI.

The Experience Application Programming Interface (xAPI, for short) is a data specification created by a community led by the working group Advanced Distributed Learning (ADL) (ADL, 2012), a part of the Department of Defense of the United States of America. xAPI is based on activity streams (Snell et al., 2011), a standard to represent activities, and aims to provide a standard to communicate information about learners’ activities in learning systems. The main concepts of xAPI are verbs, activity types and extensions.
Data traces in xAPI (called statements) are JSON-based and represent learning activities. Each statement contains three main fields: actor, verb, and object. The actor represents the one who carries out the action, the verb is the action itself, and the object is the item that receives the action. Extensions may be included in the statements to provide further context, results, etc. An example xAPI statement can be seen in Figure 6, representing that a learner has started a new activity.

Figure 6. Example xAPI statement representing the event “John Doe initialized the example activity” generated with https://adlnet.github.io/xapi-lab/.

For domains with specific requirements that go beyond the ones defined in xAPI, Experience API Profiles can be created to capture the expertise of that topic area. An xAPI Profile is defined as “the human or machine-readable documentation of application-specific concepts, extensions, and statement templates used when implementing xAPI in a particular context” (ADL, 2017). xAPI Profiles provide specific sets of verbs, activity types and extensions to meet the needs of the topic area. Several domain-specific xAPI profiles have been authored (described in http://xapi.vocab.pub/browse/index.html), including profiles for open e-book tracking or healthcare training scenarios.

The xAPI Profile for Serious Games (xAPI-SG) was created to identify and standardize the common interactions that can be tracked in serious games. An interactions model for serious games was created and then validated and published with ADL (Serrano-Laguna, Martínez-Ortiz, et al., 2017) to be the official xAPI Profile for Serious Games, as part of Ángel Serrano’s thesis (Serrano Laguna, 2017).
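The actor, verb and object structure of a statement described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (`build_statement`, `has_core_fields`), the e-mail address and the activity identifier are hypothetical, and the verb identifier is assumed to follow the ADL vocabulary. It is not the statement shown in Figure 6.

```python
import json


def build_statement(name, mbox, verb_id, verb_label, object_id, object_name):
    """Assemble a minimal xAPI-style statement with actor, verb and object."""
    return {
        "actor": {"objectType": "Agent", "name": name, "mbox": mbox},
        "verb": {"id": verb_id, "display": {"en-US": verb_label}},
        "object": {
            "objectType": "Activity",
            "id": object_id,
            "definition": {"name": {"en-US": object_name}},
        },
    }


def has_core_fields(statement):
    """Check that the three main fields named in the text are present."""
    return all(key in statement for key in ("actor", "verb", "object"))


# Hypothetical values, chosen to mirror "John Doe initialized the example activity".
statement = build_statement(
    name="John Doe",
    mbox="mailto:john.doe@example.com",
    verb_id="http://adlnet.gov/expapi/verbs/initialized",  # assumed ADL verb IRI
    verb_label="initialized",
    object_id="http://example.com/activities/example-activity",
    object_name="Example activity",
)

# Statements are JSON-based, so the dictionary serializes directly.
serialized = json.dumps(statement, indent=2)
print(serialized)
```

Keeping the statement as a plain dictionary makes it straightforward to attach further optional fields (for example a result or extensions) before serialization.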
The xAPI-SG Profile vocabulary (http://xapi.e-ucm.es/vocab/seriousgames) defines a set of verbs (accessed, completed, initialized, interacted, pressed, progressed, released, selected, skipped, unlocked, used), activity types (area, controller, cutscene, dialog-tree, enemy, item, keyboard, level, menu, mouse, non-player-character, quest, question, screen, serious-game, touchscreen, zone) and extensions (health, position, progress) to collect the most common interactions in serious games. To define their use, the verbs and activity types are related and categorized based on the following higher-level target types:

• Completables define something that players can start, progress and complete. The activity types serious-game, level and quest are completable types, and the actions performed with them are initialized, to indicate the start; progressed, with the extension progress, to describe how far the player has advanced in the current completable; and completed, to signal its ending.
• Reachables or accessibles define virtual spaces in the game that players can access or skip. The activity types screen, area, zone and cutscene are types of reachables, and the verbs accessed and skipped are used to track the actions of entering those areas or skipping them, respectively.
• Alternatives define decisions that players face in the game. They include the types question, menu and dialog-tree, and the verbs used to track their actions are selected, to indicate a choice taken within an alternative, and unlocked, to indicate that a previously locked option can now be selected.
• Targets define game elements that the player can interact with. They include the types enemy, non-player-character and item. The verbs associated with them are interacted, for general interactions, and used, when the player has actually used the target.
• Devices define pieces of hardware that the player interacts with to control the game, including mouse, keyboard, controller, and touchscreen. The verbs pressed and released are used to describe interactions with such devices.

Figure 7 displays an example xAPI-SG statement, stating that John Doe (actor, name) has selected (verb) the incorrect (result, success) response Lisbon (result, response) in the question (object, definition, type) Capital_of_Spain (object). Additionally, the extensions in the result field record that the player has a health of 0.34.

Figure 7. xAPI-SG sample statement generated when "John Doe selected a false response in a question", retrieved from (Serrano-Laguna, Martínez-Ortiz, et al., 2017).

The use of the xAPI-SG Profile can simplify the collection of interaction data from serious games. The collected data can then be used for multiple purposes, such as the assessment of players. For that and other purposes, multiple techniques can be applied to the GLA data gathered, including data mining techniques.

2.3. Data mining techniques

The potentially large amount of GLA data collected from players’ interactions in serious games can be analyzed with multiple techniques, including data mining techniques. Data mining (DM) is defined as the “process of discovering interesting patterns and knowledge from large amounts of data” (Han, Kamber, & Pei, 2012). It is commonly one step in a larger process of knowledge discovery that involves the following:

1. Data cleaning: removes noise and inconsistent data.
2. Data integration: combines multiple data sources.
3. Data selection: chooses the relevant data for analysis.
4. Data transformation: performs aggregations and other operations to shape data in the appropriate forms.
5. Data mining: applies intelligent methods to extract data patterns.
6. Pattern evaluation: identifies the interesting patterns and knowledge.
7. Knowledge presentation: presents the results through visualization or representation techniques.

Prior to creating any DM models, data preprocessing is essential to clean the dataset by finding possible missing or incorrect values, integrate the data from the different data sources, remove any redundant data and select the data variables, and perform the required transformations on the data variables. After building the data mining models and evaluating the results, the last step of the DM process commonly includes the creation of visualizations to report information about the gathered data; in education, visualizations are usually put together in teacher or learner dashboards, providing an overview of the actions taken.

DM techniques mainly focus on the problems of supervised and unsupervised learning. Supervised learning comprises the techniques that classify new input data based on labeled training data points. Unsupervised learning, in contrast, comprises the techniques that create clusters of the dataset, where input examples are not labeled in classes. These learning techniques can be applied, for instance, to create prediction models of players’ learning, in the case of supervised learning, or to create different player profiles based on their actions, with unsupervised techniques.

In this work, we focus on supervised techniques: prediction models that can be created for both classification and regression problems, depending on whether the target variable is categorical (or binary) or continuous, respectively. Classification models include decision trees, Bayesian classification, or support vector machines (SVM), while regression models include linear regression or regression trees. Some models, like neural networks or k-nearest neighbors (k-NN), can be adapted to both classification and regression problems.
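The preprocessing steps described above (cleaning, selection and transformation) can be sketched on a toy set of gameplay records. This is a minimal sketch under stated assumptions: the record fields (`completion_time`, `score`, `device`) and the helper names are hypothetical, and min-max scaling is just one possible transformation, not necessarily the one used in this work.

```python
def clean(records, required):
    """Data cleaning: drop records with a missing value in any required variable."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


def select(records, variables):
    """Data selection: keep only the variables relevant for the analysis."""
    return [{k: r[k] for k in variables} for r in records]


def min_max_scale(records, variable):
    """Data transformation: rescale one variable to the [0, 1] range."""
    values = [r[variable] for r in records]
    lowest, highest = min(values), max(values)
    span = (highest - lowest) or 1.0  # avoid dividing by zero for constant columns
    return [{**r, variable: (r[variable] - lowest) / span} for r in records]


# Hypothetical gameplay records; the second one has a missing completion time.
raw = [
    {"player": "p1", "completion_time": 300, "score": 8, "device": "pc"},
    {"player": "p2", "completion_time": None, "score": 5, "device": "pc"},
    {"player": "p3", "completion_time": 500, "score": 6, "device": "tablet"},
    {"player": "p4", "completion_time": 400, "score": 9, "device": "pc"},
]

variables = ["completion_time", "score"]
prepared = min_max_scale(select(clean(raw, variables), variables), "completion_time")
print(prepared)
```

The resulting list of records, with a numeric target such as `score`, would correspond to a regression problem; replacing the target with a category (for example pass or fail) would turn it into a classification problem.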
Ensemble methods combine multiple models to improve performance: for instance, an ensemble classification model is made up of a combination of different classifiers, each of which casts one vote, and the final ensemble classification is based on the combination of the individual votes. Random forests and AdaBoost models are examples of ensemble models.

For the work carried out in this thesis, we selected a variety of prediction models based on the ones commonly reported in similar works in the literature (as later described in the results of the literature review), including both traditional models (e.g. regression and tree-based models) and some of the more complex and promising models (e.g. neural networks and ensemble methods) to try to improve the results. As a brief summary, the prediction models included in this thesis for classification problems, as defined in references like (Agarwal, 2014; Irizarry, 2019), are:

• Decision trees are flowchart-like tree structures, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf or terminal node holds a class label.
• Logistic regression is a simple linear model that converts probabilities into labels, where each outcome is assigned to one of the two numeric values 0 and 1. The model can be extended to several classes.

For regression problems, the prediction models considered in this thesis include:

• Regression trees are built exactly like decision trees, but the predicted value in each leaf node is not a class but the average value of the training observations that fall in that leaf.
• Linear regression is a simple linear model that finds the “best” line to fit two attributes (or variables) so that one can be used to predict the other, that is, the data are modeled to fit a straight line. This technique can be extended to more than two attributes, in what is called multiple linear regression.
• Support Vector Machines (SVM) use a nonlinear mapping to transform the original data into a higher dimension, in which it is possible to create the “best” hyperplane to separate the data. SVM has a corresponding version for regression, SVR, which can use both linear and non-linear kernels.
• Bayesian ridge regression is a version of a Bayesian regression model that includes regularization parameters.
• The k-nearest-neighbor (k-NN) algorithm represents data in a multidimensional space, and bases the prediction for each input on the values of the data points that are closest to it in that space (the nearest neighbors).
• Neural networks are a set of connections between input and output units, with an associated weight for each connection. During the learning phase, the network adjusts the weights so that it produces the best prediction for the input.
• Random forests are an ensemble method that combines multiple single regression or decision trees, whose predictions are combined to obtain a global prediction.
• AdaBoost (adaptive boosting) is a type of boosting algorithm that combines multiple methods and weighs their predictions, changing the weight of each algorithm based on its accuracy in the previous iterations.
• Gradient boosting is a type of boosting algorithm that ensembles multiple prediction models (typically trees).

Finally, to evaluate the predictive efficacy of the models, it is necessary to test them. For that purpose, we performed cross-validation (CV), a technique that divides the training data into multiple groups and performs as many iterations as groups to ensure that all data points participate in both the training and testing steps of the prediction models. Different metrics can be applied to test the models’ effectiveness.
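The cross-validation idea described above can be illustrated with a small pure-Python sketch that evaluates a k-NN classifier. It is only a hedged illustration: the toy data, the function names, the strided fold assignment and the use of the fraction of wrong predictions as the error measure are all assumptions made for the example, not the setup used in the thesis.

```python
from collections import Counter


def knn_predict(train, point, k=3):
    """Predict the label of `point` by majority vote among its k nearest neighbors."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], point))

    neighbours = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def k_fold_indices(n, folds):
    """Assign indices 0..n-1 to `folds` groups; each index lands in exactly one group."""
    return [list(range(start, n, folds)) for start in range(folds)]


def cross_validate(data, folds=3, k=3):
    """Return the error rate of each fold and the sorted list of tested indices."""
    errors, tested = [], []
    for test_idx in k_fold_indices(len(data), folds):
        held_out = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held_out]
        wrong = sum(knn_predict(train, data[i][0], k) != data[i][1] for i in test_idx)
        errors.append(wrong / len(test_idx))
        tested.extend(test_idx)
    return errors, sorted(tested)


# Hypothetical toy data: two well-separated groups of (features, label) pairs.
data = [
    ((0.0, 0.0), "low"), ((0.0, 1.0), "low"), ((1.0, 0.0), "low"),
    ((1.0, 1.0), "low"), ((0.5, 0.5), "low"), ((0.2, 0.8), "low"),
    ((5.0, 5.0), "high"), ((5.0, 6.0), "high"), ((6.0, 5.0), "high"),
    ((6.0, 6.0), "high"), ((5.5, 5.5), "high"), ((5.2, 5.8), "high"),
]

fold_errors, tested_indices = cross_validate(data, folds=3, k=3)
print(fold_errors, tested_indices)
```

Because every index belongs to exactly one test group, each data point is used for testing once and for training in all the remaining iterations, which is the property the text attributes to cross-validation.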
For the classification models considered in this thesis, we selected the metrics of precision and recall, to provide an indication of how successful the models were, as well as the misclassification rate (MR), to provide a measure of error. These metrics are based on the values of the confusion matrix (Table 1), which relates the actual positive and negative values to their predicted ones.

Table 1. Confusion matrix for classification algorithms.

                      Positive Value         Negative Value
Predicted Positive    True Positive (TP)     False Positive (FP)
Predicted Negative    False Negative (FN)    True Negative (TN)

In particular, the three metrics that we report for the classification algorithms used, precision, recall and misclassification rate (MR), are defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN} \qquad MR = \frac{FP + FN}{n}$$

where $n = TP + FP + FN + TN$ is the total number of classified instances.

For the regression algorithms, we chose the metric of Mean Absolute Error (MAE), as it is an interpretable measure of the error of the models. The MAE is defined as:

$$MAE = \frac{\sum_{i=1}^{n} |x_i - y_i|}{n} = \frac{\sum_{i=1}^{n} |e_i|}{n}$$

where the absolute error is $|e_i| = |x_i - y_i|$, with $x_i$ the true value and $y_i$ the predicted value.

Data mining techniques have been extensively applied in many fields, including education. Educational Data Mining (EDM) is defined as a discipline “concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in” (R. Baker & Yacef, 2009). EDM is related to LA as it applies techniques from the fields of statistics, machine learning, and data mining to analyze data gathered during teaching and learning activities (Bienkowski, Feng, & Means, 2012; ElAtia, Ipperciel, & Zaïane, 2016).
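As a worked illustration of the evaluation metrics defined earlier in this section (precision, recall, MR and MAE), the following sketch computes them on small hypothetical label and value lists. The function names and the sample data are assumptions made for the example, and the functions assume that at least one positive is predicted and at least one positive exists, so that no division by zero occurs.

```python
def confusion_counts(true_labels, predicted_labels, positive=1):
    """Count TP, FP, FN and TN as laid out in the confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def misclassification_rate(fp, fn, n):
    """MR = (FP + FN) / n, with n the total number of classified instances."""
    return (fp + fn) / n


def mean_absolute_error(true_values, predicted_values):
    """MAE = sum(|x_i - y_i|) / n."""
    pairs = list(zip(true_values, predicted_values))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


# Hypothetical classification results (1 = positive class, 0 = negative class).
truth = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(truth, predicted)

print(precision(tp, fp), recall(tp, fn), misclassification_rate(fp, fn, len(truth)))

# Hypothetical regression results (true versus predicted scores).
print(mean_absolute_error([4.0, 6.0, 9.0], [5.0, 6.0, 7.0]))
```

In this toy case there are three true positives, one false positive, one false negative and three true negatives, so precision and recall are both 3/4 and the misclassification rate is 2/8.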
The works on EDM have focused on the analysis and visualization of data, providing feedback to support instructors, providing recommendations for students, predicting student performance, student modeling, detecting student behaviors, studying the relationships between students, grouping students, and creating and planning the course (Romero & Ventura, 2010).

Figure 8 describes the full process of EDM (Vahdat et al., 2015), which includes the collection of data and its preprocessing, the data analysis to obtain metrics, and the postprocessing to report feedback and create interventions to optimize the learning process. Notice the similarities between this process and the GLA system depicted in Figure 5, including the collection of data from the learner/player, the analysis and reports, and the final feedback step to close the process.

Figure 8. Educational Data Mining (EDM) process, retrieved from (Vahdat et al., 2015).

2.4. Applications of data mining to game learning analytics data

In the previous sections we have described, separately, the three main topics of our thesis: serious games, game learning analytics, and data mining techniques. In this section, we focus on the combination of the three topics: the application of data mining techniques to game learning analytics data from serious games. As we did not find many studies that considered all three aspects, we carried out a specific systematic literature review, which constitutes one of the contributions of the thesis, as described below.

Data science and artificial intelligence (AI) have been applied in games to study game-playing, content generation and player modeling (Yannakakis & Togelius, 2018). The application of learning analytics to serious games for assessment has also been studied; authors found that SGs had a positive impact on learning and highlighted the importance of the game design (Liu et al., 2017).
Despite these and other works previously discussed, we did not find literature reviews or major review studies that combined the three areas of our interest: serious games, game learning analytics data, and data mining techniques.

To fill this gap identified in the literature, we carried out a specific systematic literature review focusing on the data science techniques applied to game learning analytics data from serious games. Specifically, we analyzed the purposes for which data science had been applied to GLA data from SGs, the data science techniques applied, the stakeholders that were the targets of such analysis, and the results obtained in the studies (Alonso-Fernández, Calvo-Morata, Freire, Martínez-Ortiz, & Fernández-Manjón, 2019). We additionally gathered information about the serious games used in the studies, the interaction data captured, and the participants included in the experiments.

The rest of the section describes the key aspects of the related work found in the literature review, with emphasis on the results obtained in the studies and the discussion and conclusions pointed out in such works. Further details about the results of the literature review, and the overview of the main results, are included in section 4.1 of this document, as part of the results and contributions of the thesis.

In the studies reviewed, the main purposes of applying data science to GLA data from SGs were player assessment and performance predictions, the study of players’ in-game behaviors and the validation of the games’ design. The data science algorithms and techniques used in the studies can be grouped into three main categories: regarding supervised algorithms, a majority of studies used linear and logistic regression, or regression and decision trees; the unsupervised algorithms most commonly included in the studies were correlation and clustering; and performance metrics were usually included in visualizations.
The applications included in the studies were mainly targeted at serious game designers and developers, researchers, or teachers and educators. The focus of the serious games used was to teach (in particular, mathematics and science) participants in primary and secondary school. The sample sizes included in most studies were low (fewer than 100 participants). The interaction data captured mainly focused on completion, scores and interactions in general, while the data format in which interactions were collected was mostly not stated.

The results obtained in the application of data science techniques to GLA data were varied and closely related to the purpose of application. To summarize the results, we defined three main groups: (1) studies focusing on players’ assessment and learning predictions; (2) studies focusing on serious game design and implementations; and (3) studies that propose frameworks to apply GLA in specific contexts.

In the first group of studies identified in the review, studies focused on assessment of players and learning predictions. These studies highlighted how GLA data can accurately predict and measure games’ impact (Kosmas, Ioannou, & Retalis, 2018; Mavridis, Katmada, & Tsiatsos, 2017), being useful both in real time and after the interventions are completed (Wiemeyer, Kickmeier-Rust, & Steiner, 2016), and for all stakeholders (Alonso-Fernández, Pérez-Colado, Freire, Martínez-Ortiz, & Fernández-Manjón, 2019). However, authors have pointed out that most data are still captured after the game (Smith, Blackmore, & Nesbitt, 2015) and that specific game learning analytics (Freire et al., 2016), or so-called serious games analytics (Loh, Sheng, & Ifenthaler, 2015b), are required for more precise information. Regarding learning predictors, the achievement system built into games may not be the most informative indicator of learning (Heeter, Lee, Medler, & Magerko, 2013); instead, predictions of player success should be based on log data (R.
S. Baker, Clarke-Midura, & Ocumpaugh, 2016; Rowe, Asbell-clarke, & Baker, 2015). The best learning predictors are based on the analysis of the player's exploration strategies (Horn et al., 2016; Kang, Liu, & Qu, 2017; Käser, Hallinen, & Schwartz, 2017; V. E. Owen, Anton, & Baker, 2016; Smith, Hickmott, Southgate, Bille, & Stephens, 2016), or on player failures (Halverson & Owen, 2014) and behaviors (Hernández-Lara, Perera-Lluna, & Serradell-López, 2019; Ketamo, 2013; Mayer, van Dierendonck, van Ruijven, & Wenzler, 2014; Rowe et al., 2017; Tellioglu, Xie, Rohrer, & Prince, 2014; Z. Xu & Woodruff, 2017). Authors also provided some recommendations to improve the learning predictions: perform feature engineering (V. E. Owen & Baker, 2018), include the domain structure and the weights of competencies (Kickmeier-Rust, 2018), and perform exploratory data analysis (DiCerbo et al., 2015) and dynamical analysis (Snow, Allen, & McNamara, 2015) to uncover unexpected patterns. Assessment can be further improved by combining generic game trace variables (Steiner, Kickmeier-Rust, & Albert, 2015) or basic sets of traces (Serrano-Laguna, Torrente, Moreno-Ger, & Fernández-Manjón, 2014).

Many of the studies also analyzed how performance was related to players’ characteristics: creating clusters of players’ performance based on their actions (Chung, 2015; Cutumisu, Blair, Chin, & Schwartz, 2017; Forsyth et al., 2012; Freitas & Gibson, 2014; Lazo, Anareta, Duremdes, & Red, 2018; Martin et al., 2015; Martinez-Garza & Clark, 2017; Polyak, von Davier, & Peterschmidt, 2017; Sharples & Domingue, 2016; Slimani, Elouaai, Elaachak, Yedri, & Bouhorma, 2018); or differentiating experts from novice users (Loh & Sheng, 2014, 2015a, 2015b). Once students are classified in a performance group, scores can be inferred by adding time or action sequences (Gibson & Clarke-Midura, 2015).
Learners’ characteristics such as age and gender (Wallner & Kriglstein, 2015), background (Jaccard, Hulaas, & Dumont, 2017), or exploration strategies (Martin et al., 2013) also influence their learning behaviors (Liu, Lee, Kang, & Liu, 2016). Modelling students is essential to effectively adapt learning (Koedinger, McLaughlin, & Stamper, 2012; Liu, Kang, Lee, Winzeler, & Liu, 2015; Sabourin, Shores, Mott, & Lester, 2013). GLA data can also be used to track students’ progress (Gweon et al., 2015), assess persistence (Dicerbo, 2013), or detect engagement (Ghergulescu & Muntean, 2016). The information gathered can also be presented in real time to teachers and students (Elaachak, Belahbibe, & Bouhorma, 2015) or parents (Ketamo, 2015; Roberts, Chung, & Parks, 2016).

The second group of studies focused on improving serious game design and implementation. First, their results showed that GLA data can be used to validate the serious game design (Cano, Fernández-Manjón, & García-Tejedor, 2018; Cheng, Rosenheck, Lin, & Klopfer, 2017; Harpstead, MacLellan, Aleven, & Myers, 2015; Ninaus, Kiili, Siegler, & Moeller, 2017; Serrano-Laguna, Torrente, Moreno-Ger, & Fernández-Manjón, 2012; Tlili, Essalmi, Jemni, & Kinshuk, 2016). Authors have stressed that assessment should be integrated early in serious game development and design (Ke, Shute, Clark, & Erlebacher, 2019; Ke & Shute, 2015), starting from an early definition of the game traces to be collected (Serrano-Laguna, Manero, Freire, & Fernández-Manjón, 2017; Tlili et al., 2016). To improve the design of games for assessment, studies have pointed out that assessment models should be reliable, providing meaningful educational information (Steiner et al., 2015). For that, it is necessary to explore how design decisions affect the learning outcomes (Plass et al., 2013) and to include adaptivity (Streicher & Smeddinck, 2016).
Learning has also been investigated in relation to some serious games’ characteristics: adaptive difficulty (Hicks et al., 2016; Käser et al., 2013; Martinez-Garza & Clark, 2017); engagement and motivation (Pareto, 2014; Stamper et al., 2012; Tlili et al., 2016); and feedback and interventions during play (DeFalco et al., 2018; McCarthy, Johnson, Likens, Martin, & McNamara, 2017).

The final group of studies included the proposal of frameworks to simplify serious game design in specific contexts: game analytics frameworks for people with intellectual disabilities (García-Tejedor, Cano, & Fernández-Manjón, 2016; Nguyen, Gardner, & Sheridan, 2018), a game-based assessment model (Halverson & Owen, 2014), a framework to integrate design of event-stream features for analysis (V. E. Owen & Baker, 2018), a framework to support tracking and analysis of learners’ in-game activities (Hauge et al., 2014), a framework to help designers model experts’ solving process almost automatically (Muratet, Yessad, & Carron, 2016), an interoperable adaptivity framework (Streicher & Roller, 2017), a framework for internet-scale experiments to inform and be informed by classroom and lab experiments (Stamper et al., 2012), an open-source SGs framework for sustainability (Y. Xu, Johnson, Lee, Moore, & Brewer, 2014) and a framework for a mobile game application for adults with cystic fibrosis (Vagg et al., 2018).

From the review we can highlight several conclusions. Most studies focused on assessment and learners’ behaviors. Games are indeed a useful tool for purposes beyond entertainment, so the interest now focuses on analyzing interaction data to measure how much impact serious games have on players (mainly focusing on learning), and how that impact relates to players’ in-game behaviors. Studies used visualization, supervised and unsupervised techniques, mainly linear models, correlations, and clustering techniques.
Newer, more complex and more powerful techniques, like neural networks, are experiencing an important surge in popularity, but they did not appear that frequently in the reviewed studies. One possible explanation is that further evidence is needed on how to widely and reliably apply these new complex techniques, as well as on how to explain the results obtained, which is part of an open debate about explainable AI (xAI) (Adadi & Berrada, 2018).

The main stakeholders considered are game designers/developers and researchers, followed by teachers/educators. This suggests that the analysis of data from games is used for several purposes, including research, improving or validating game design, and providing information when applying games in educational scenarios. Still, students are always indirect recipients of the results, as the research, improvement and adaptivity of games and assessment techniques will make the use of games more effective and efficient for those who play them, that is, the students/learners.

Most of the games used in the studies aim to teach science-related topics, in particular mathematics. This result shows the intention to benefit from games’ advantages to improve learning in a subject typically difficult for young students. It also aligns with previous research that identified mathematics and science as the main areas for games targeted at primary education (Hainey, Connolly, Boyle, Wilson, & Razak, 2016).

Sample sizes used in the studies are, in general, quite low (fewer than 100 participants). This may restrict the significance and generalization of their results, as well as the application of more complex algorithms (e.g. deep neural networks), which require larger amounts of data points. The low sample size used in experiments is an important issue, pointed out by authors (Petri & Gresse von Wangenheim, 2017).

Data collected from students’ interactions included mainly completion times, actions or interactions in general, and scores.
All these data can be collected from most games, but they provide basic information that does not take full advantage of the rich interactions produced in games, as described in works on game analytics in entertainment games (Seif El-Nasr et al., 2013). The data to be collected should be identified at early stages of the game development, to ensure that it provides information with educational value.

Most papers did not report the format in which they collected the data, so we cannot know if they were using a standard or relying on their own data formats. The latter scenario is less desirable, as it restricts the open sharing of the data for other purposes and requires an extra effort to replicate results with other techniques (Serrano-Laguna, Martínez-Ortiz, et al., 2017). We have not found reports of any open data set of game analytics data or learning analytics data from serious games; this hinders research in this area, as testing out new data science techniques requires not only choosing the techniques themselves, but also developing a serious game and performing the experiments to collect its interaction data.

The analysis of GLA data from serious games has yielded, as expected, wide and varied results. We can, however, extract some general findings from the conclusions and discussions of the studies analyzed:

• Predicting games’ impact with GLA data: raw data can be used to accurately predict impact (e.g. learning), including simple values from interactions (e.g. completion times, scores) but also more complex information such as kinds of failures or exploration strategies. Adding information about the context is also recommended, as it can improve the models’ accuracy. Also, the choice of data to analyze should ideally be made during game design, to ensure that as much educationally relevant data as possible is actually captured.
• Importance of student profiling: performance appears to be highly related to students’ characteristics and behaviors, so it is recommended to create students’ profiles or clusters to improve learning, including targeted feedback and adaptive learning experiences. The need to fit users’ needs has also led authors to propose user-specific frameworks (e.g. for users with intellectual disabilities).
• Designing serious games for assessment: assessment needs to be formally and reliably integrated in the development phase of SGs to provide meaningful educational information. This should not increase costs or harm entertainment, as games need to remain engaging and motivating while keeping an adequate difficulty. GLA data can then be used to validate the game design and assessment.

From the conclusions obtained in the studies reviewed, we can further point out that, although plenty of studies have started to investigate the opportunities that GLA data offers to improve the full lifecycle of serious games (from design and development, to deployment in real scenarios, and the assessment of players), there is still a gap in the literature regarding the systematization and generalization of players’ assessment. That is, the studies that investigate this (including stealth assessment techniques) are closely tied to the game they are applied to, turning players’ assessment into a process performed ad hoc for each specific case. More research is needed to generalize these data-based approaches, providing tools to systematize the steps required to effectively and accurately assess students based on their game interaction data. Additionally, such studies should consider large-enough sample sizes, and more so if they involve the application of complex data mining techniques (e.g. neural networks), to ensure that there is enough evidence to support their results and to replicate them in different scenarios.
Our evidence-based approach, presented in the following chapters, aims to contribute to these identified areas: creating easy-to-follow steps to assess players based on their interaction data with a serious game, using standards to support the process, generalizing each step as much as possible so the process can be applied to a wide range of serious games (at least, to serious games with similar features, like narrative-based games), and providing tools to support the process.

In conclusion, serious games have great potential to cause a positive effect on players. Combining the educational purposes of LA and the techniques of GA, the field of game learning analytics comprises the collection and analysis of interaction data from serious games, which can provide a richer insight into players’ actions, inform about their progress, and provide feedback to improve the serious game and the learning process. The information obtained from the analysis of GLA data can additionally be used for players’ assessment, improving the common method of assessment based on external paper-based questionnaires. This traditional technique does not base the assessment on data collected while learning is happening, that is, while players are playing the serious game. With the GLA evidence collected, new opportunities for assessment arise, as it can provide finer-grained information on which to assess players.

The GLA data can further be analyzed with data mining techniques, applying lessons from the field of EDM to serious games. Although some works have started to explore this idea, there is a lack of standardization and systematization to create evidence-based assessment of players based on their interactions with the serious game. To try
To try 26 to advance in this identified gap, we propose the goals of this thesis, described in detail in the following section, to improve the current assessment methods of players using serious games, combining the potential of the information gathered from game learning analytics data, with the richer analysis obtained from data mining techniques. 27 Chapter 3. Goals of the thesis This chapter presents the general research goals of the thesis, and how they were divided into specific goals to be achieved during the research; together with the research methodology followed and how the steps in the research process were carried out to achieve these goals. 3.1. Research goals The main goal of the thesis is to simplify the current method of assessment of players using serious games, taking advantage of the richer value provided by the game learning analytics data created by players’ interactions during the gameplay. The current assessment methodology is based on external, frequently paper-based, questionnaires. We aim to avoid the use of these external questionnaires after the formal game validation phase and, instead, collect game learning analytics to obtain richer information about students’ gameplays and analyze it with data mining techniques to predict players’ performance. The large-scale goal is that the simplification of players’ assessment will, in turn, provide teachers and educators clearer evidences of the impact of playing serious games. If the assessment is then simplified, we consider that this will also foster the application of serious games in a broader set of actual educational scenarios, as clearer evidences will be provided of their impact on players. This can increase the adoption of serious games as providing teachers and institutions with evidences of the effectiveness of using serious games will make them a more attractive choice. To reach these high-level goals, we proposed the following specific milestones: 1. 
Analyze the current assessment techniques of players using serious games, the application of Game Learning Analytics data in the context of serious games, with particular focus on their application for assessment, and the application of data mining techniques to GLA data for assessment of players using serious games.
2. Propose an application of data mining techniques to the GLA data collected from serious games to simplify assessment of players using serious games, relying as much as possible on the game interaction data collected.
3. Verify the suitability of the proposal in actual experiments, collecting player interaction data from serious games, and applying data mining techniques to the GLA data collected to create predictions that can effectively assess students, without requiring the use of pre-post questionnaires.
4. Iterate, as needed, the previous step to improve the evidence-based assessment process of players using serious games, analyzing the steps to move towards their generalization.
5. Create a final version of the evidence-based process to assess players using serious games based solely on the application of data mining techniques to the GLA data collected from their game interactions, providing the required tools and standards to support the process.

3.2. Research process

The process followed in the thesis was based on the design science research methodology (DSRM) (Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007), which consists of the following steps (Figure 9):
1. Identify and define the problem
   • Show the importance and motivation
2. Define the objectives of a solution
3. Design and develop a solution
4. Demonstrate the solution in a suitable context to solve the problem
5. Evaluate the effectiveness and efficiency of the solution
   • Iterate back to design
6. Communicate the solution
   • Publications

Figure 9. Design Science Research Methodology (DSRM), retrieved from (Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007).
The DSRM was adapted to our context of application and to our specific goals. To identify the problem, we started by studying the field to find areas for improvement and issues that other authors had identified as requiring further research. Once the review was finished and the problem clearly identified, we proposed a first solution and developed our proposal in two iterations. First, we prototyped the ideas and tested them by carrying out experiments in real settings, collecting actual data with which to test our approach, verify its suitability, refine it and improve it. Second, we synthesized the knowledge acquired in the experiments to distill our final approach, with all the required steps, based on standards and with the easy-to-use tools required to support the process. All the results were suitably communicated through publications in journals and conferences.

The research process conducted in the thesis was then as follows:
• Review the literature regarding the application of game learning analytics data and data mining techniques in the field of serious games.
• Explore in a first case study the validity of a new players’ assessment approach using game learning analytics data and prediction models.
• Conduct a second case study to verify the approach with another serious game with different goals, and different prediction models.
• Create an assessment process based on game learning analytics data and data mining models, as general as possible, describing each step of the process in detail. The process is based on the conclusions obtained in the two case studies and on the use of standards like the xAPI-SG Profile to collect the interaction data from serious games. It also includes the development of any tools required to support the process, like the tool T-MON, which supports a first exploration of the xAPI-SG interaction data collected to refine the selection of GLA variables.
• Publish all results in high-ranking journals and in conferences.

Additionally, we drew on our previous experience in the European H2020 projects RAGE and, particularly, BEACONING, in which we had worked on the full lifecycle of serious games deployed in real large-scale scenarios: from their design and development, to the collection and analysis of the game learning analytics collected from players’ actions. Finally, the research stay carried out at Florida State University allowed us to contrast our approach with the work of the research group of Professor Valerie Shute, leader in the field of stealth assessment.

Chapter 4. Results and discussion

This chapter presents the results obtained in the thesis, including the systematic literature review about the application of data mining techniques to game learning analytics data from serious games, and the two case studies conducted, with their main characteristics and the main analytical insights obtained from them. The final result is the evidence-based assessment process distilled from these case studies. It covers the game validation process, including the definition of the interaction data to be collected; the use of the standard data format xAPI-SG; the feature extraction process into game learning analytics variables, and the tool that supports that selection, T-MON; and the prediction models that simplify players’ assessment in game deployment. The chapter ends with a discussion of the results obtained and points out the limitations of our work.

4.1. Study of the domain

The study of the domain was conducted through a systematic literature review of the applications of data science techniques to game learning analytics data from serious games. For that review, we established the following research questions (RQ):
RQ1. What are the purposes for which data science has been applied to game analytics data and/or learning analytics data from serious games?
RQ2.
What data science algorithms or techniques have been applied to game analytics data and/or learning analytics data from serious games?
RQ3. What stakeholders are the intended recipients of the analysis results?
RQ4. What results and conclusions have been drawn from these applications?

We additionally included in the review the following information from the studies:
• The main purpose of the games (e.g. teaching, change behavior) and their domain (e.g. biology, math).
• The sample size of the studies, and the educational level of their participants.
• The general characteristics of the in-game interaction data collected, and the data format used.

After defining the suitable search terms and databases, we carried out the selection process depicted in Figure 10, obtaining a final set of 87 studies included in the review.

Figure 10. Process carried out to select the publications included in the systematic literature review.

Regarding the applications of data science to game learning analytics data from serious games (RQ1), the main purposes include: assessment of learning, or prediction of players’ performance; the study of players’ in-game behaviors; the validation of the game design; the profiling of students in different categories based on their actions and characteristics; the study of the effect of different in-game interventions on players’ performance; and the proposal of different frameworks of application for particular scenarios (Table 2). These results confirm the interest pointed out by other authors in the use of game learning analytics data to assess learning or predict players’ performance with serious games, with an additional high interest in studying how players behave in game.

Table 2. Main purposes of data science applications to game learning analytics data from serious games.
Purpose category | Definition of purpose | Example studies | Number of studies
Assessment | Assess learning, predict performance | (R. S. Baker et al., 2016; Ke & Shute, 2015) | 32
In-game behaviors | Study in-game players’ behaviors (e.g. persistence, engagement) | (Dicerbo, 2013; Kang et al., 2017) | 27
Game design | Validate game design | (Cano et al., 2018; Tlili et al., 2016) | 16
Student profiling | Establish categories of players’ profiles, differentiate players’ characteristics | (Denden, Tlili, Essalmi, & Jemni, 2018; Loh & Sheng, 2014) | 7
Interventions | Study effect of in-game interventions (e.g. feedback messages, notification of performance) | (DeFalco et al., 2018; McCarthy et al., 2017) | 4
Framework proposals | Propose a framework for specific contexts | (Halverson & Owen, 2014; Nguyen et al., 2018) | 10

The data science algorithms and techniques used in the reviewed studies (RQ2) can be grouped into three main categories (Table 3):
• Supervised algorithms: linear and logistic regression, regression and decision trees, support vector machines, Bayesian networks, neural networks, naive Bayes, and Bayesian knowledge tracing.
• Unsupervised algorithms: correlation, clustering, factor analysis.
• Visualization techniques: display of gameplay pathways, performance metrics, learning curves, heatmaps of interactions, use of in-game tools (frequency or duration).

Some studies used several of these techniques. From these results, we noticed that most studies used simple traditional algorithms (e.g. linear regression, clustering), while fewer studies used more complex techniques (e.g. neural networks).

Regarding the main stakeholders targeted by the studies (RQ3), serious game designers and developers were the main target (39 studies), closely followed by researchers, or studies carried out with research purposes (37 studies). Many studies (25) also focused on teachers and educators. Finally, some studies mainly focused on students and learners (8) and few mentioned parents (2).
Table 3. Data science techniques applied to game learning analytics data from serious games.
Data science technique | Number of papers using the technique
Supervised models | 31
  Linear/logistic regression | 18
  Regression/decision trees | 7
  Bayesian networks | 6
  Neural networks | 4
  Naïve Bayes | 3
  Bayesian knowledge tracing | 3
  Support vector machines | 2
Unsupervised models | 35
  Correlation | 17
  Clustering | 16
  Factor analysis | 2
Visualization | 36
  Performance metrics | 15
  Gameplay pathways | 7
  Use of in-game tools | 5
  Learning curves | 4
  Heatmaps of interactions | 2

We further analyzed the information that the studies provided regarding the serious games used, the participants included in the experiments, and the interaction data collected from their actions with the serious games. Most of the serious games applied in the studies focused on teaching some knowledge to the players (55 studies). Most games aimed to teach mathematics (20 studies) or science (10 studies). In line with those results, the participants were mainly primary and secondary school students, and undergraduates. The sample sizes of the studies were limited: around a third of the studies used fewer than 100 participants, and around another third used fewer than 1000 participants, although most of the latter had a sample size closer to a few hundred players. Finally, the interaction data collected from serious games mainly included completion times (30 studies), general interactions in the game (28) and scores (14). Most studies did not report the format in which they collected the interaction data.

The studies reviewed provided diverse results that allowed us to draw the following general and specific conclusions:
• Regarding players’ assessment and student profiling
o GLA data can accurately predict the impact of serious games; for instance, predictions can be created based on players’ exploration strategies, failures or interactions between players in collaborative games.
To improve the predictions, it is recommended to perform feature engineering and to combine basic sets of traces and variables. GLA information can be used both in real time and after the intervention, and for all stakeholders.
o Learning performance is related to players’ characteristics; therefore, players can effectively be clustered into performance groups based on their actions. Understanding their learning characteristics is essential to better predict learning and improve or adapt the learning process.
o Additionally, further information can be extracted from GLA data to study additional students’ characteristics, or to provide real-time information to players, students, and even parents.
• Regarding serious games design
o GLA data can effectively validate serious games design and mechanics.
o Assessment can and should be integrated in early stages of serious game design and development, and it should be transparent and reliable, based on models that are valid, easy to use, and provide meaningful educational information. The data to be collected should be specified early on so that meaningful GLA information can be obtained from it.
o Serious games characteristics, such as difficulty, engagement and motivation, and feedback and interventions during the gameplay, should also be considered as they affect learning.
o Specific frameworks have been proposed to apply GLA in different scenarios, simplifying the multiple tasks in serious games design.

In summary, despite the diversity of the studies, we found some notable common points: the main purpose when analyzing data from serious games is learning prediction or assessment, most commonly with linear prediction models, simple correlation or clustering techniques, or visual displays of performance information. The learning predictions obtained are quite accurate and may be improved with some of the recommendations pointed out (feature engineering, combining multiple traces).
The importance of student profiling and the recommendations for integrating assessment into early phases of game design and development also stand out among the conclusions of the studies. Future research should also consider sufficiently large sample sizes to ensure significant conclusions, and decide in advance which data are to be collected from the games. In this sense, as a baseline, typical data such as completion times, interactions, or scores can and should be included; but research can benefit from moving on to more complex data extracted from in-game interactions. Regarding algorithms, classical techniques should be compared with newer, more complex ones (e.g. neural networks), to determine which ones yield the best results in each case. Finally, authors have pointed out a clear need for specific game learning analytics, where the use of standards to collect GLA data is desirable, as it allows the creation of open data sets in standard formats, such as xAPI, for research purposes, and simplifies the reproducibility and improvement of results, as well as the testing of new techniques and the integration of analytics as a module of larger systems.

The process, results and details about this systematic literature review have been published in (Alonso-Fernández, Calvo-Morata, et al., 2019), a publication that is included as part of this thesis. For the full text and details of the publication, see subsection 6.1.1.

4.2. First case study

In this section, we describe the first case study carried out to test the approach to assess players automatically from their interactions with a serious game. In this case, we used a game to teach knowledge about first aid techniques. Besides verifying the suitability of our approach, in this first study we were additionally interested in verifying whether students’ previous knowledge (given by their pre-questionnaire results) was essential to accurately predict their knowledge after playing.
The following subsections describe the serious game used (4.2.1), the interaction data captured from it (4.2.2), the analysis of that data to create GLA variables (4.2.3), the prediction models used to assess players and the results obtained (4.2.4), and the discussion and conclusions of the case study (4.2.5).

4.2.1. The game: First Aid Game

The First Aid Game (Marchiori et al., 2012) is a game-like simulation with narrative structure that aims to teach basic life-support maneuvers in the situations of chest pain, unconsciousness and choking. The game is targeted at players aged 12 to 16. The three medical situations are depicted as different game levels. In each level or scenario, players can interact with several game elements: the main character, suffering from the medical condition depicted in that level, or a mobile phone that they can use to call the simulated emergency services.

In each of the levels, different multiple-choice situations (second screenshot of Figure 11) are presented for the player to choose the course of action from a set of visual or textual options. These situations include the specific first aid knowledge to be learnt through the game (e.g. the Heimlich maneuver to avoid choking). Players learn whether their decisions are appropriate or not: when they choose an incorrect answer, either the game reports the critical error and its consequences and lets them try again until they choose the correct answer, or the game allows them to continue and discover the consequences later (which is reflected in the final score). The game includes random elements to improve reflection and replayability (e.g. availability of a semi-automatic external defibrillator that players can also use).

The three levels can be replayed as many times as wanted during the allotted time. After each level is completed, a score is provided to indicate to players whether their actions were mostly correct or not (first screenshot of Figure 11).
The presented score does not directly measure players’ knowledge but challenges them to replay levels where they made many mistakes.

Figure 11. Screenshots of the First Aid Game used in the first case study: the three game levels with a score (left) and visual choices in a level (right).

The game was first developed and evaluated by the e-UCM research group and actual emergency physicians (Marchiori et al., 2012). The game was originally validated with an experiment that included pre-post questionnaires to measure players’ knowledge before and after playing, and a control group to compare the effect of the game against that of a theoretical-practical demonstration by a trained instructor. Players in the experimental group gained, on average, 2.07 points (on a 10-point scale), compared to control group learners, who gained 3.61 points. This validation showed that the game achieved its goal of making players learn first aid procedures. The game was later adapted and updated to the Unity 3D videogame engine using uAdventure (I. Pérez-Colado, Pérez-Colado, Martínez-Ortiz, Freire, & Fernández-Manjón, 2017), an authoring tool developed by e-UCM. The adaptation also incorporated the collection of game interaction data used for the present case study.

4.2.2. Data captured

The data captured for this case study was gathered in a set of experiments with N=227 high school students from a charter school in Madrid, Spain. We initially conducted a formative evaluation in two sessions with 28 students. With the feedback from these sessions, we tested the remote data collection in the school settings and prepared for the main experience. Out of the remaining participants (N=199), gender was not obtained for 15 students due to an error handling a questionnaire. For the other 184 students, the gender distribution was 46.7% males and 53.3% females. The median age was 14 years old.

Two questionnaires were used in the experiments.
The pre-questionnaire comprised demographic variables (players’ gender and age); the first aid knowledge questionnaire, with 15 multiple-choice questions about the game contents, also used in the original experiment to validate the game (Marchiori et al., 2012); and a questionnaire with 11 five-point Likert questions on game habits obtained from (Manero, Torrente, Freire, & Fernández-Manjón, 2016) and slightly adapted for this experiment. The post-questionnaire consisted of two parts: the same first aid knowledge questionnaire used in the pre-questionnaire (to compare results); and a questionnaire to evaluate the experience, with five 5-point Likert questions and optional free-text sections for feedback.

The score for the first aid knowledge questionnaires is defined as the total number of correct answers; therefore, possible scores ranged from 0 to 15 points. Internal consistency of the scale used was ensured when the test was created in the original validation experiment (Marchiori et al., 2012). These questionnaires were a simplification of the ones used in the medical domain and had been previously validated.

From the game, we captured players’ interactions including game level starts, game level endings and scores, selections in every multiple-choice situation and question, and interactions with game elements (character, phone, defibrillator). All interaction data traces were collected using the xAPI-SG Profile. Figure 12 represents an example xAPI-SG statement collected from the First Aid Game. The statement represents that the player (with the anonymous identifier given in the actor name field) has selected (verb) the correct response 112 (result) in the question about the emergency number (object), at the given timestamp.

Figure 12. Example of an xAPI-SG statement captured from the First Aid Game: the player has selected the correct response (112) in the question about the emergency number.

4.2.3.
GLA variables

The xAPI-SG statements were then analyzed to extract higher-level GLA variables by performing aggregations, retaining maximum and first values, or recording specific responses in some game selections. Table 4 presents the full list of GLA variables derived from the interaction data collected from the First Aid Game.

Table 4. Game Learning Analytics variables derived from interaction data in the first case study (First Aid Game).
Variable name | Type | Description
gameCompleted | Binary (true, false) | True if learner completed the game; False otherwise
score | Numerical in range [0,10] | Total score obtained in the game
maxScoreCP | Numerical in range [0,10] | Maximum score obtained in “chest pain” level
maxScoreU | Numerical in range [0,10] | Maximum score obtained in “unconsciousness” level
maxScoreCH | Numerical in range [0,10] | Maximum score obtained in “choking” level
firstScoreCP | Numerical in range [0,10] | First score obtained in “chest pain” level
firstScoreU | Numerical in range [0,10] | First score obtained in “unconsciousness” level
firstScoreCH | Numerical in range [0,10] | First score obtained in “choking” level
timesCP | Integer | Number of times student completed “chest pain” level
timesU | Integer | Number of times student completed “unconsciousness” level
timesCH | Integer | Number of times student completed “choking” level
int_patient | Integer | Number of interactions with patient (game character, NPC)
int_phone | Integer | Number of interactions with phone (game element)
int_saed | Integer | Number of interactions with defibrillator (game element)
failedEmergency | Binary (true, false) | True if learner failed, at least once, the question about the emergency number; False otherwise
failedThrusts | Binary (true, false) | True if learner failed, at least once, the question about the number of abdominal thrusts per minute; False otherwise
failedHName | Binary (true, false) | True if learner failed, at least once, the question about the name of Heimlich maneuver; False otherwise
failedHPosition | Binary (true, false) | True if learner failed, at least once, the question about the initial position for Heimlich maneuver; False otherwise
failedHHands | Binary (true, false) | True if learner failed, at least once, the question about the hand position for Heimlich maneuver; False otherwise

The GLA variables provide information about game completion, first and maximum scores per game level, number of tries per game level, number of interactions with the game elements, and whether the selections in specific in-game questions were correct or incorrect. These variables were subsequently used for the prediction models.

4.2.4. Prediction models and results

The input data for the prediction models included all the variables described in Table 4, for each player. The prediction models, built using RStudio, were created both with and without pre-test information as input, to determine whether the pre-test is essential to predict players’ knowledge after playing. The target variable of the predictions is the post-test score. Two types of models were created: regression models to predict the exact score in the range [0-15], and classification models to predict the pass/fail category (establishing pass as 8 points out of 15).

The prediction models selected included those widely used in the literature for data mining applied to learning analytics data: regression and decision trees, and linear and logistic regression. While trees can capture complex, non-linear relationships and provide easy-to-understand models, regression is useful when data are not extremely complex or when little data has been gathered. Additionally, these are white-box models, which allows us to relate the results obtained to our input data and obtain further information about the traces collected from the game. A priori, our dataset is not too large, so regression should still be viable; however, if complex relationships appear, trees are expected to be better at discovering them.
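As an illustration of the model inputs described above, the following Python sketch aggregates simplified, xAPI-SG-like statements from one player into a few of the Table 4 variables and derives the pass/fail target (pass at 8 points out of 15). The thesis built its models in RStudio; the dictionary layout, the verb and object names, and the helper functions used here are assumptions made for the example, not the actual xAPI-SG Profile vocabulary or the authors' code.

```python
from collections import defaultdict

# Pass mark on the 15-point post-test, as stated in the text.
# Treating the threshold as inclusive (>= 8) is an assumption.
PASS_THRESHOLD = 8

# Hypothetical, simplified statements for a single player.
# Field names and values are illustrative only.
statements = [
    {"actor": "p1", "verb": "completed", "object": "chest_pain", "score": 6.0},
    {"actor": "p1", "verb": "interacted", "object": "patient"},
    {"actor": "p1", "verb": "selected", "object": "emergency_number", "success": False},
    {"actor": "p1", "verb": "selected", "object": "emergency_number", "success": True},
    {"actor": "p1", "verb": "completed", "object": "chest_pain", "score": 9.0},
    {"actor": "p1", "verb": "interacted", "object": "patient"},
]


def extract_gla_variables(player_statements):
    """Aggregate one player's statements into a few Table 4 style variables."""
    level_scores = defaultdict(list)  # level -> scores in chronological order
    interactions = defaultdict(int)   # game element -> number of interactions
    failed = defaultdict(bool)        # question -> failed at least once
    for s in player_statements:
        if s["verb"] == "completed":
            level_scores[s["object"]].append(s["score"])
        elif s["verb"] == "interacted":
            interactions[s["object"]] += 1
        elif s["verb"] == "selected" and not s["success"]:
            failed[s["object"]] = True
    cp = level_scores["chest_pain"]
    return {
        "firstScoreCP": cp[0] if cp else None,
        "maxScoreCP": max(cp) if cp else None,
        "timesCP": len(cp),
        "int_patient": interactions["patient"],
        "failedEmergency": failed["emergency_number"],
    }


def pass_fail(post_test_score):
    """Map a post-test score in [0, 15] to the classification target."""
    return "pass" if post_test_score >= PASS_THRESHOLD else "fail"


features = extract_gla_variables(statements)
```

Each player would contribute one such feature row, paired either with the exact post-test score (regression) or with the `pass_fail` label (classification).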
We additionally included two methods commonly mentioned in the literature: Naïve Bayes for classification, and support vector machines for regression (SVR), testing different non-linear kernels (polynomial, radial basis and sigmoid) (Drucker, Burges, Kaufman, Smola, & Vapnik, 1997) and tuning the different parameters within the ranges recommended in the literature (Hsu, Chang, & Lin, 2016).

Models were compared using 10-fold cross validation. When predicting pass/fail, and since data were not balanced (169 students passed the post-test, while only 30 failed it), classification models were created with an undersample of 78 students (40% from the fail class, 60% from the pass class) and tested on the original sample. The results of the prediction models are presented in Table 5.

Not all the variables used as input for the models (listed in Table 4) had the same relevance for the predictions. When pre-test information was included, the pre-test score appeared among the most relevant variables, but so did the final game score (score), the number of times each situation was repeated (timesCP, timesU, timesCH), the maximum score achieved in the “chest pain” game level (maxScoreCP) and the number of interactions with the game character (int_patient). With game interactions alone, the most important variables for predicting both pass/fail categories and exact scores included the number of interactions with the game character, and the first and maximum scores achieved in the “chest pain” level. We hypothesized that, as players tend to play the game from left to right, the “chest pain” level was commonly played first; therefore, their results on the first level played may have a greater influence on the final knowledge acquisition. Regarding the interactions with the game character, as the game forces players to repeat some situations, a high number of interactions may point to a “trial-and-error” strategy.

4.2.5.
Discussion and conclusions

The highly accurate results obtained in this initial case study provide initial evidence that players can be effectively assessed based on their game interaction data. We obtained highly accurate predictions of knowledge (as post-test results) from previous information: although the models that included pre-test data found it useful for the predictions, we verified that similarly accurate results were also obtained when predicting post-test scores solely from in-game interactions, without pre-test information. Therefore, in what follows we focus on predicting players’ performance solely from in-game interactions.

The encouraging results obtained in this case study suggest that our approach may be generalized at least to other similar cases, such as games for procedural learning or game-like simulations with narrative structure, which are quite common in several domains (e.g. military, medicine). Both can provide similar interaction data and, therefore, by following the described steps, a similar approach could be applied.

This case study has some limitations, as the data used are from one serious game and a single school, which could potentially bias the results. However, we consider that the approach could be generalized to a wider range of games and students with similarly accurate results.

Table 5. Results of prediction models of first aid knowledge for the first case study (First Aid Game). Pass/fail prediction reports success measures (precision, recall) and error (MR); score prediction (scale [0-15]) reports error as mean (SD).
Pre-test? | Pass/fail model | Precision | Recall | MR | Score model | Error, mean (SD)
Yes (pre+game) | Decision tree | 81.6% | 94.2% | 16.2% | Regression tree | 2.22 (0.55)
Yes (pre+game) | Logistic regression | 89.8% | 98.3% | 10.5% | Linear regression | 1.68 (1.44)
Yes (pre+game) | Naïve Bayes classifier | 92.6% | 89.7% | 15.1% | SVR (non-linear kernels) | 1.47 (1.33)
No (game-only) | Decision tree | 88.6% | 92.4% | 17.3% | Regression tree | 2.38 (0.62)
No (game-only) | Logistic regression | 87.2% | 98.8% | 12.7% | Linear regression | 1.89 (1.54)
No (game-only) | Naïve Bayes classifier | 89.7% | 90.6% | 16.9% | SVR (non-linear kernels) | 1.56 (1.37)

From this first case study, we obtained some initial guidelines for our final evidence-based assessment process. First of all, the interaction data was based on the game design and learning design of the game, containing both game-independent information (game completion) and game-dependent information (number of times per level, based on the design decision to allow levels to be repeated). The definition of most GLA variables followed directly from the xAPI-SG interaction data (scores, responses, completion) but was also adapted to the game characteristics (one variable for each of the three game levels). The accurate results also confirmed our proposal and allowed us to continue with the process, aiming to explore it with different serious games. A final lesson learned is that we were able to relate specific assessment results to the game design only because the final prediction models were interpretable: this allowed us to obtain information about the relevance of the variables, infer players’ strategies and relate them to learning outcomes. With black-box models, such a discussion would not have been possible, as we would not have had that feedback.
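The pass/fail measures reported in Table 5 can be computed from lists of true and predicted labels. The Python sketch below is only illustrative: it treats "pass" as the positive class and reads MR as the misclassification rate, both of which are assumptions, and its undersampling helper (keep every minority-class player and sample the majority class down to a target share) is one plausible way to obtain a 40%/60% split rather than the procedure used in the thesis. All function names and the toy predictions are hypothetical.

```python
import random


def undersample(labels, minority="fail", minority_share=0.4, seed=0):
    """Return sorted indices keeping all minority cases and a random subset of the rest."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Majority cases needed so that the minority makes up `minority_share`.
    n_majority = round(len(minority_idx) * (1 - minority_share) / minority_share)
    n_majority = min(n_majority, len(majority_idx))
    return sorted(minority_idx + rng.sample(majority_idx, n_majority))


def classification_measures(y_true, y_pred, positive="pass"):
    """Precision and recall for the positive class, plus misclassification rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    errors = sum(t != p for t, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "mr": errors / len(pairs),
    }


# Class sizes from the case study: 169 students passed, 30 failed.
labels = ["pass"] * 169 + ["fail"] * 30
sample_idx = undersample(labels)

# Toy predictions (not results from the thesis): 8 of 10 labels are correct.
y_true = ["pass"] * 6 + ["fail"] * 4
y_pred = ["pass"] * 5 + ["fail"] + ["pass"] + ["fail"] * 3
measures = classification_measures(y_true, y_pred)
```

With the class sizes above, this helper keeps 75 players (30 fail, 45 pass) rather than the 78 reported, which underlines that it approximates the idea and does not reproduce the exact sampling used in the study.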
The process, results and details about this case study have been published in (Alonso-Fernández, Martínez-Ortiz, Caballero, Freire, & Fernández-Manjón, 2020), a publication that is included as part of this thesis. For the full text and details of the publication, see subsection 6.1.2.

4.3. Second case study

In this section, we describe the second case study, carried out to further test and refine the evidence-based process to assess players. To verify the generalization of the approach, we used a serious game with a different purpose (raising awareness about bullying and cyberbullying) and tested a wider set of prediction models. The following subsections describe the serious game used (4.3.1), the interaction data captured from it (4.3.2), the analysis of that data to create GLA variables (4.3.3), the prediction models used to assess players and the results obtained (4.3.4), and the discussion and conclusions of this case study (4.3.5).

4.3.1. The game: Conectado

The game Conectado (Calvo-Morata, Rotaru, et al., 2020) is a serious game to raise awareness about bullying and cyberbullying, developed as part of Antonio Calvo’s thesis (Calvo Morata, 2020). Conectado puts players, in the first person, in the role of a student who transfers to a new school and, during the first week, is increasingly bullied by classmates. The aggressions happen both at school and at home, where the bullying continues via the mobile phone and social media (i.e., it becomes cyberbullying). During the 5 in-game days, the player can interact with several in-game characters: parents, schoolmates (who represent different attitudes towards bullying and cyberbullying) and teachers. The game has a linear flow and, depending on the actions taken, such as mentioning the problem to the character’s parents or teachers, players will reach one of the three different game endings.
By design, players’ choices only have an immediate effect on the next dialogues and some of the following actions in the game, but do not affect the main storyline until just before the ending. This ensures that all players go through all the situations represented in the game, therefore having a similar awareness experience, while still experiencing their actions as meaningful, even though they have minimal effect on the overall flow of the game. Linear play also makes all playthroughs of comparable length and provides all players with a common experience for their in-class post-game discussions. Figure 13 depicts two screenshots of Conectado, showing the player in the school with classmates (left) and the in-game mobile phone, offering different options to answer (right).

Figure 13. Screenshots of the serious game Conectado, used in the second case study: dialogue with a non-playable character (left) and choices in a conversation in the in-game mobile phone (right).

4.3.2. Data captured

The data used in the case study was obtained from N = 1109 participants (ages 12-17) from 11 schools around Spain. In all experiments, participants completed a pre-test, a gameplay of Conectado, and a post-test, in that order (Calvo-Morata, Alonso-Fernández, Freire, Martinez-Ortiz, & Fernández-Manjón, 2020). Minimal time elapsed between the gameplay and either of the two tests, and the complete sessions lasted a total of around 50 minutes, fitting in an average-length class session in Spanish schools.

The pre-test and the post-test assess bullying and cyberbullying awareness before and after playing Conectado, respectively. The set of questions included in both tests derives from multiple formal and widely accepted questionnaires that have been shown to be effective in the school population of Spain (Álvarez-García, Núñez Pérez, & Dobarro González, 2013; Garaigordobil & Aliri, 2013; Ortega-Ruiz, Del Rey, & Casas, 2016).
The pre-test and the post-test each included 18 seven-point Likert questions, eliciting how much players agree with each of 18 statements on bullying and cyberbullying. The questionnaire has a Cronbach's alpha of 0.95. The score of each test is calculated as the mean of all answers; therefore, possible test scores range from 1 to 7.

In addition to the responses to both questionnaires, the game interaction traces were collected during the experiments, including the starts and ends of in-game days, scene changes, and interactions (with characters and objects). All traces were represented using the xAPI-SG Profile. Figure 14 depicts an example xAPI-SG statement collected from Conectado. The statement represents that the player (identified by the anonymous identifier in the actor name field) has interacted (verb) with the game object computer (object) at the given timestamp. Additionally, the result field contains information about the game day and hour, and indicates that the in-game mobile has messages.

Figure 14. Example of an xAPI-SG statement from Conectado: the player has interacted with the computer in the game. Additional information is encapsulated in the result field.

4.3.3. GLA variables

The xAPI-SG statements were then processed to derive the GLA variables to be used in the analysis. For each type of statement, we stored the following information:
• For “accessed” actions, an identifier for the target, such as “school_bathroom”.
• For “initialized” actions, an identifier for the object of the action, such as the full game or a specific in-game day, and a timestamp.
• For “completed” actions, the same information as for “initialized” and, if the full game has been finished, the specific ending reached, within the result field.
• For “interacted” actions, the target (which can be an in-game object, when using items, or a character, in the case of conversations).
• For “progressed” actions, an object identifier.
For example, when tracking the changes in variables that represent the level of friendship with other characters, the identifiers of those characters are used.
• For “selected” actions, the object and the results of the action, to track in-game decisions. For example, when players can choose to mention the ongoing bullying to parents, the results would include the player’s choice, and the object would identify the point where that choice was taken.

With this information, we can calculate the values of the GLA variables (Table 6), to be subsequently used in the prediction models.

Table 6. GLA variables derived from interactions in the second case study (Conectado).

| Variable name | Type | Description |
|---|---|---|
| accepted_c, c in [Alison, Guillermo, Jose] | true/false | Player has accepted a friendship request from character c on the in-game computer |
| accessed_bathroom | true/false | Player has accessed the school bathroom |
| confront_Alejandro | true/false | Player has confronted Alejandro |
| duration | continuous | Total time playing Conectado (in minutes) |
| duration_day_d, d in [1,2,3,4,5] | continuous | Total time playing day d of Conectado (in minutes) |
| ending_number | categorical | Ending reached by the player: 1 for worst ending, 2 for regular, and 3 for best ending |
| find_earring | true/false | Player has helped Alison to find her earring |
| friendship_decrease_c, c in [Alejandro, Alison, Ana, Guillermo, Jose, Maria, Parents] | discrete | Number of times the player has decreased the level of friendship with character c |
| friendship_increase_c, c in [Alejandro, Alison, Ana, Guillermo, Jose, Maria, Parents] | discrete | Number of times the player has increased the level of friendship with character c |
| gum_washed | true/false | Player has washed the gum from the clothes |
| has_ended_game | true/false | Player has ended the full Conectado game |
| interactions_c, c in [Alejandro, Alison, Ana, Guillermo, Jose, Maria, Mother, Father] | discrete | Number of interactions the player has carried out with character c |
| mock_Maria | true/false | Player has mocked Maria |
| shared_password | true/false | Player has shared the password with classmates |
| tattle_to_parents | true/false | Player has mentioned bullying to parents at home |
| tattle_to_teacher | true/false | Player has mentioned bullying to teacher at the school |
| used_computer | true/false | Player has used the computer at home |
| used_friends_app | true/false | Player has used social network app on smartphone |
| used_mobile_chat | true/false | Player has used instant messaging on the smartphone |

4.3.4. Prediction models and results

The prediction models in this case aim to predict the increase in bullying awareness as a result of playing the game. We define the bullying awareness increase as the difference between the post-test mean score and the pre-test mean score for each player. This continuous variable is therefore the target variable for the prediction models. The GLA variables described in Table 6 were used as input in all the prediction models.

We used different prediction models to predict the exact value of the increase in bullying awareness and compared the predicted results with those obtained in the pre-post tests. As prediction models, we chose linear regression, regression trees, Bayesian regression, Support Vector Machines for Regression (SVR), k-nearest neighbors (k-NN), neural networks, random forests, AdaBoost, and gradient boosting. All models were tested with 10-fold cross-validation. For all models, different parameters were tuned to find the best ones. For each of the 9 prediction models, Table 7 shows the mean absolute error (MAE) and the standard deviation (SD) (normalized to scale [0-10]) of the predictions with the best combination of parameters found for that model.

The model that provides the best results is a Bayesian regression, closely followed by a gradient boosting model, with random forests and AdaBoost models at very similar error levels, and all other models providing acceptable results. The difference between the best models is not significant.
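The evaluation procedure described above can be sketched in a few lines. The following is a minimal, standard-library illustration and not the code used in the study: the function names (`awareness_increase`, `k_fold_mae`, `fit_mean_baseline`), the toy data and the mean-value baseline predictor are all assumptions made for this example, and reporting the spread of the error across folds is only one possible reading of the reported SD.

```python
from statistics import mean, stdev


def awareness_increase(pre_answers, post_answers):
    """Target variable: post-test mean score minus pre-test mean score."""
    return mean(post_answers) - mean(pre_answers)


def k_fold_mae(rows, targets, fit, k=10):
    """Return the mean absolute error obtained on each of k folds.

    `fit(train_rows, train_targets)` must return a function that maps one
    row of GLA variables to a predicted value.
    """
    n = len(targets)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample is held out
        train_rows = [rows[i] for i in range(n) if i not in test_idx]
        train_targets = [targets[i] for i in range(n) if i not in test_idx]
        predict = fit(train_rows, train_targets)
        errors = [abs(predict(rows[i]) - targets[i]) for i in sorted(test_idx)]
        fold_errors.append(mean(errors))
    return fold_errors


def fit_mean_baseline(train_rows, train_targets):
    """Hypothetical baseline: always predict the mean training target."""
    average = mean(train_targets)
    return lambda row: average


# Invented toy data: one GLA variable per player and a made-up target.
rows = [[float(i % 5)] for i in range(20)]
targets = [0.1 * row[0] for row in rows]

fold_mae = k_fold_mae(rows, targets, fit_mean_baseline, k=10)
print("MAE:", round(mean(fold_mae), 3), "SD across folds:", round(stdev(fold_mae), 3))
```

Any of the nine model families listed above could be plugged in through the `fit` argument; the baseline is only there to keep the sketch self-contained.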
Table 7. Results of prediction models of bullying awareness increase for the second case study (Conectado). MAE and SD are normalized to scale [0-10].

| Prediction model | Mean Absolute Error (MAE) | Standard Deviation (SD) |
|---|---|---|
| Linear regression | 0.581 | 0.047 |
| Regression trees | 0.557 | 0.055 |
| Bayesian ridge regression | 0.540 | 0.053 |
| SVR | 0.556 | 0.051 |
| k-NN | 0.578 | 0.048 |
| Neural networks | 0.557 | 0.050 |
| Random forests | 0.551 | 0.052 |
| AdaBoost | 0.551 | 0.057 |
| Gradient boosting | 0.548 | 0.052 |

The variables that were most relevant in the best-performing models included:
• the number of interactions with the character Jose (interactions_Jose), possibly showing that a high number of interactions with any character may result from a high immersion of the player in the game;
• the ending reached (ending_number), which results from the in-game actions and decisions taken and therefore reflects players’ behavior in the game; we consider that adequate behavior shows higher awareness, which could be related to a higher inclination to be attentive in the game and thus to a further increase in awareness;
• the duration of in-game day 4 (duration_day_4), possibly showing that the specific content of that day (threats, theft, and identity theft) may be more relevant to the target group (12-17 years old), impacting their awareness increase; and
• the duration of in-game day 3 (duration_day_3), where above-average durations on this day, whose content has a higher presence of social media, may show players losing attention in the game because they are distracted by the in-game social media application.

4.3.5. Discussion and conclusions

In this second case study, we have replicated the accurate prediction results with an SG with a different purpose (raising awareness about bullying rather than teaching), a larger dataset and a wider range of prediction models, including some more complex ones.
From the similarities encountered, we consider that the approach used can be generalized to other serious games or, at least, to other linear, narrative serious games. From the steps taken, we consider that this process could be generalized to carry out other evidence-based evaluations of the effectiveness of serious games. The steps followed can be generalized by using a standard to track in-game interactions, such as the xAPI-SG Profile. Once interaction data are collected, a further step towards generalization is to gather an initial set of variables to derive from the xAPI-SG traces, based on available fields such as the duration of in-game activities and interactions with relevant in-game items and characters; this set can later be complemented with game-dependent information. An initial set of variables can be used as a baseline of what game learning analytics can conclude for the serious game, and can be extracted automatically if analytics traces are formatted using the xAPI-SG Profile representation. With those GLA variables, interpretable prediction models can provide information about the relevance of each variable, which can help to interpret and inform the evaluation process and its results. Moreover, using xAPI allows SG developers and researchers to build and reuse a tooling ecosystem for statement gathering, analysis and prediction.

This case study has some limitations to its generalizability. First, the fact that the videogame has a narrative, almost-linear structure and a short playing time restricts the variability of the interactions for players. Second, the discussion of the relevance of specific variables in our results is limited by the fact that the prediction model is not a white-box model.
Finally, the creation and selection of the GLA variables is not straightforward and could limit the generalization of our approach; however, we consider that most of the GLA variables used can be automatically created from the xAPI-SG fields, given some guidelines and recommendations.

From this second case study, we were able to extract further information for our evidence-based assessment process. This case study enabled us to test our evidence-based approach with a serious game that had a different purpose (raising awareness instead of teaching) and to replicate the accurate results obtained, additionally with a wider range of prediction models and a larger sample size, in line with our earlier finding that larger numbers of participants were required in experiments. In the process, we were able to collect most of the interaction data, and the GLA variables derived from it, simply based on the fields and types available in the xAPI-SG Profile. This showed us the suitability of this profile for different types of serious games. We additionally handcrafted some GLA variables (the specific ending reached) that were relevant based on the learning design of the game. Again, we noted the suitability of interpretable prediction models, as the information that they provide about the predictive relevance of the input variables allowed us to relate specific game actions to awareness increase.

The process, results and details about this case study have been published in (Alonso-Fernández, Calvo-Morata, Freire, Martínez-Ortiz, & Fernández-Manjón, 2020a), a publication that is included as part of this thesis. For the full text and details of the publication, see subsection 6.1.3.

4.4. Evidence-based assessment process of serious game players

In this section, we present the final evidence-based assessment process of serious game players, based on collecting interaction data to obtain GLA evidence and predicting players’ learning from such evidence with data mining techniques. The process is based on the lessons learned from the work carried out and the conclusions extracted in the case studies with the serious game First Aid Game (described in Section 4.2) and the serious game Conectado (described in Section 4.3).

Additionally, this assessment process derives from some previous work on GLA systematization, in which we started to explore how the application of GLA could be standardized (Alonso-Fernández, Calvo-Morata, Freire, Martínez-Ortiz, & Fernández-Manjón, 2017). This publication is included as part of this thesis. For the full text and details of the publication, see subsection 6.2.1. In that paper, we explored:
1. The use of a standard tracking model to exchange information between the serious game and the analytics platform. This allows reusable tracker components to be developed for each game engine or development platform.
2. The use of standardized analysis and visualization assets to provide general but useful information for any given serious game that sends data in the previously stated standard data format.

In that work we studied the importance of determining a suitable set of GLA variables to obtain rich information from a serious game: a complete set of game-independent variables is recommended to simplify and systematize the process, with the possibility of extending them with some game-dependent variables if needed.

The full evidence-based assessment process of players using serious games is a two-step process described in the following subsections:
• During serious game validation
  o The collection of player data, both pre-post questionnaires and interactions in the game (subsection 4.4.1).
  o The feature extraction process to derive GLA variables from the game interaction data (subsection 4.4.2).
  o The prediction models with GLA variables (subsection 4.4.3).
• During serious game deployment
  o The assessment of players based on interactions (subsection 4.4.4).

The process is carried out in two steps: during the game validation phase, the prediction models used to assess players are created and validated; in the game deployment phase, players can automatically be assessed based solely on their game interaction data. That is, questionnaires can be completely avoided, simplifying teachers’ tasks to obtain a measure of how much effect the game is having on players. The steps carried out during the game validation phase, explained further in the following subsections, are depicted in Figure 15.

Figure 15. Evidence-based assessment process of players using serious games: the game interaction traces collected fill the pre-defined set of GLA variables to be used as input for the prediction models. The target variable used for prediction is based on pre-post results.

4.4.1. Collection of player data: pre-post questionnaires and game interaction data

The first step in our process is the collection of the data to assess players with. For this purpose, we need to collect both pre-post questionnaires (or any other validated measure to be used as the target value for the predictions) and interaction data. Questionnaires should be formally validated by experts in the domain, to ensure that they provide a reliable measure of the characteristic that the game seeks to affect, such as awareness or knowledge.

Although the type of data that can be collected from a serious game will depend on its content, structure and features, there are some common interactions in game analytics (GA) and learning analytics (LA) that can be extrapolated to serious games. GA data will relate to the game design (e.g. number of clicks, avatar location in the game environment and characteristics, movements and changes of scenes or levels, items used, total time spent in the game, interactions with interface elements and non-player characters, points scored, in-game selections, and quest completions), while LA data will focus on the learning design (achievements, errors made, responses). GLA data comprises both the game design and the learning design of the games, reflecting information about the learning progress and process of players/learners.

To systematize the specific data to be collected from serious games, we propose the use of the validated and standardized Experience API for Serious Games Profile (xAPI-SG) (Serrano-Laguna, Martínez-Ortiz, et al., 2017), described in detail in section 2.2.1. The use of a standard data format, such as xAPI-SG, is a clear step towards systematizing the collection of traces and their analysis to derive relevant information from user gameplays. Standard formats facilitate the integration of tools from different providers and help to comply with personal data-protection laws: art. 20 of the EU GDPR requires data controllers to use a “structured, commonly used and machine-readable format” when users request access to their data, or transfer to other data controllers (European Commission, 2018). Additionally, as we found out in the systematic literature review detailed in section 4.1, standard data collection formats are not commonly reported in the literature, and their uptake would greatly assist in result replication and data sharing. Having a common interchange format also fosters the creation of a tool ecosystem created by different actors.

4.4.2. Feature extraction process: GLA variables from interaction data

Once the raw traces with user interaction data are collected, they can be analyzed to extract higher-level meaningful information about the actions of players within the game, in what is called a feature engineering process. Our process synthesizes the information available in the data traces (collected in xAPI-SG format) into a smaller set of GLA variables. Ideally, the definition of such variables should be described in the game’s Learning Analytics Model (LAM) (I. Pérez-Colado, Alonso-Fernández, Freire, Martínez-Ortiz, & Fernández-Manjón, 2018), cooperatively created by both educational experts and game designers. LAMs build on the game's learning design and game design, which define the educational goals of the game and how these goals are reflected in the specific game design choices. Based on both designs, a LAM determines the data to be collected from the game and how these data are to be analyzed into GLA variables and interpreted to provide meaningful information about the actions of a player in the game. It may also define any subsequent visualization, feedback or reporting based on the analysis results.

If such a LAM is not available, the definition of the GLA variables can be based on:
• Game designers’ suggestions about what information to obtain from the game and how to analyze it into GLA variables.
• Using the xAPI-SG as the data collection standard, we can provide a default set of GLA variables to be easily extracted from the fields available in the Profile. While game-specific variables as specified in a LAM or suggested by expert designer knowledge are of course preferable, a set of ready-to-use generic variables can be highly useful to complement game-specific variables, and allows the use of our process even when no LAM or designers are available.
Table 8 details the xAPI-SG fields and the GLA variables that can be obtained from the fields, providing a non-exhaustive set of pre-defined GLA variables for each player that can be easily derived from any set of traces that follow the xAPI-SG Profile. Such variables include the number of interactions with each in-game object and character (count of interacted traces per object), or the duration of each level/game (difference in timestamp of completed and initialized traces per object of type serious-game or level).
• Analysis and visualizations of the xAPI-SG traces can provide important insights on the data collected and guide the choice of some GLA variables. For the latter purpose, we have created our data science environment called T-MON (a trace monitor in xAPI-SG format), detailed below.

T-MON: Monitor of traces in xAPI-SG

To support the feature extraction process, we have developed T-MON (Alonso-Fernández, Calvo-Morata, Freire, Martínez-Ortiz, & Fernández-Manjón, 2021), a monitor of traces in the standard xAPI-SG. T-MON contains a set of Python Jupyter Notebooks that provide a default set of analyses and visualizations that can be applied to any given JSON file containing xAPI-SG traces: overall game progress; choices in alternatives and, if applicable, those considered correct and incorrect; progress, scores and times per game activity or subsection; content seen and skipped; and interactions with game items and areas and over time. The interactive interface allows users to filter the data and configure the visualizations to gain a more in-depth insight into the data (Figure 18, right).
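To illustrate how default variables such as interaction counts and durations could be derived from a list of traces, the sketch below processes a handful of simplified statements. It is an assumption-laden example rather than T-MON code: the statement layout is reduced to the actor name, verb id, object id and timestamp, the object identifiers under `http://example.com/` and the helper names (`trace`, `extract_gla_variables`) are invented, and verbs are matched on the last segment of their id.

```python
from datetime import datetime


def last_segment(iri):
    """Last path segment of an identifier, e.g. '.../interacted' -> 'interacted'."""
    return iri.rstrip("/").split("/")[-1]


def parse_time(statement):
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(statement["timestamp"].replace("Z", "+00:00"))


def extract_gla_variables(statements):
    """Derive per-player interaction counts, durations (minutes) and completion flags."""
    variables = {}
    started = {}  # (player, object) -> time of the 'initialized' trace
    for st in sorted(statements, key=parse_time):
        player = st["actor"]["name"]
        row = variables.setdefault(player, {})
        verb = last_segment(st["verb"]["id"])
        obj = last_segment(st["object"]["id"])
        if verb == "interacted":
            key = f"interactions_{obj}"
            row[key] = row.get(key, 0) + 1
        elif verb == "initialized":
            started[(player, obj)] = parse_time(st)
        elif verb == "completed" and (player, obj) in started:
            elapsed = parse_time(st) - started.pop((player, obj))
            row[f"duration_{obj}"] = elapsed.total_seconds() / 60
            row[f"completed_{obj}"] = True
    return variables


def trace(player, verb, obj, timestamp):
    """Build a simplified, hypothetical statement for the example."""
    return {
        "actor": {"name": player},
        "verb": {"id": f"http://adlnet.gov/expapi/verbs/{verb}"},
        "object": {"id": f"http://example.com/game/{obj}"},
        "timestamp": timestamp,
    }


traces = [
    trace("p1", "initialized", "day_1", "2019-01-01T10:00:00Z"),
    trace("p1", "interacted", "computer", "2019-01-01T10:01:00Z"),
    trace("p1", "interacted", "computer", "2019-01-01T10:02:00Z"),
    trace("p1", "completed", "day_1", "2019-01-01T10:05:30Z"),
    trace("p2", "interacted", "computer", "2019-01-01T10:03:00Z"),
]
gla = extract_gla_variables(traces)
print(gla["p1"])
```

Keeping one dictionary of variables per player makes it straightforward to add further counters (for example accessed or skipped traces) following the same pattern.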
T-MON is intended both to provide quick overviews of collected data and to allow in-depth exploratory analysis to refine the choice of GLA variables that will be used in subsequent steps: the Jupyter Notebooks (Project Jupyter, 2020) T-MON builds upon are a commonly used tool in data science to perform such analyses and provide access to an extensive and actively maintained collection of utilities to manipulate and explore data (Jupyter Team, 2020). The variables included in Table 8 also constitute a good starting point for refinement using T-MON.

Table 8. Correspondence of xAPI-SG traces (object type, verb and other fields) to derive GLA variables.

| Object type | Verb | Other fields | GLA variable name | GLA variable description |
|---|---|---|---|---|
| Accessible: area, cutscene, screen, zone | Accessed | Object id | Accessed_id | Number of times the accessible id has been accessed |
| Accessible: area, cutscene, screen, zone | Skipped | Object id (cutscene) | Skipped_id | Number of times the cutscene id has been skipped |
| Completable: serious-game, level, quest | Initialized | Object id, timestamp | Duration_id | Duration of completable id (calculated in combination with completed trace of same id) |
| Completable: serious-game, level, quest | Progressed | Object id, result progress, timestamp | Progress_id_time | Progress in completable id per timestamp time |
| Completable: serious-game, level, quest | Completed | Object id | Completed_id | True if completable id has been completed |
| Completable: serious-game, level, quest | Completed | Object id, timestamp | Duration_id | Duration of completable id (calculated in combination with initialized trace of same id) |
| Completable: serious-game, level, quest | Completed | Object id, result score | Score_id | Score obtained in completable id |
| Alternative: question, dialog-tree, menu | Selected | Object id (question), result success | Correct_id | True if question id has been successfully answered |
| Alternative: question, dialog-tree, menu | Selected | Object id (dialog), result response | Response_id | Response selected in dialog id |
| Alternative: question, dialog-tree, menu | Selected | Object id (menu), result response | Selection_id | Option selected in menu id |
| Target: non-player character, enemy, item | Interacted | Object id | Interactions_id | Number of interactions with target id |
| Target: non-player character, enemy, item | Used | Object id (item) | Uses_id | Number of uses of item id |

Some of the visualizations included in T-MON are depicted in Figure 16 (left to right, top to bottom): pie chart with percentage of serious games started and completed; line chart with progress (y-axis) of each player in the game over time (x-axis); bar chart with maximum and minimum completion times (y-axis) in each completable (x-axis), max and min times corresponding to each bar per completable; and bar chart with scores (y-axis) obtained by each player in each completable (x-axis), each bar per completable corresponding to one player; and in Figure 17 (left to right, top to bottom): bar chart with correct (in green) and incorrect (in red) number of responses (y-axis) in alternatives per player (x-axis); bar chart with correct (in green) and incorrect (in red) number of responses (y-axis) per alternative (x-axis); heatmap with times each accessible (y-axis) has been accessed per player (x-axis); and bar chart with number of interactions (y-axis) per player (x-axis) with an item.

Figure 16. Four of the default visualizations included in T-MON presenting information about games completion, progress, completion times and scores in completables.

Figure 17. Four of the default visualizations included in T-MON presenting information about correct and incorrect responses in alternatives per player and per question, accessibles and interactions.

T-MON is open-source and freely available in a GitHub repository (https://github.com/e-ucm/t-mon) providing information about the use of the tool and the analysis and visualizations provided (Figure 18, left).

Figure 18. T-MON main GitHub repository page (left), and interface with configuration options (right).

The publication describing T-MON is included as part of this thesis. For the full text and details of the publication, see subsection 6.2.2.

Once the GLA variables have been chosen, synthesizing the information obtained from users’ interactions, they can be used to predict the serious game's effect on players, as described in the following section.

4.4.3. Assessment prediction with GLA evidences

The next step is to create the prediction models to accurately measure the effect of the game on its players. Before a serious game can be considered effective in educational scenarios, it needs to be validated, ideally using a formal validation process (such as pre-post questionnaires). We use the formal validation step to create the prediction models that will be used in the deployment phase. During these experiments, we also collect relevant GLA data from players’ in-game interactions. The prediction models then have:
• As input, the GLA variables filled with the data collected from players’ interactions with the game.
• As target variable, the effect caused by the game on players. By default, that is the improvement in score (difference between pre- and post-questionnaire results, as in the second case study), but if we were only interested in measuring the final effect on players after playing, the post-questionnaire score alone could be used as the target variable (as in the first case study).

This process is experimental and can be iterated until accurate-enough models are created, by changing and refining the GLA variables according to their relevance as reported by the results of the prediction models. In our experiments, we have found accuracies above 90% to be achievable, and suggest this figure as a workable goal. Once an accurate-enough level is reached, the final prediction model is retained for the next step of deployment, where it will be used for automatic non-intrusive assessment of players based solely on their interaction data.

For the specific prediction models to be tested, an increasingly broad and varied range of options is available. At least in the first iterations, we recommend using interpretable models that provide information about the relevance of the input variables towards the predictions.
These xAI models (Adadi & Berrada, 2018) will provide feedback about the importance of specific GLA variables (and, therefore, about users’ interactions) for the predictions, making it possible to improve the process before moving to deployment. As exemplified in the two case studies, the use of such interpretable models also makes it possible to relate the assessment results to specific behaviors or choices made by players, to better understand their learning process and to provide feedback. Linear and tree-based prediction models are a simple baseline to start from. More complex models may improve the results: for instance, ensemble methods based on trees, such as random forests or gradient boosting. These complex models could provide more precise results while still giving feedback about how relevant the input variables are towards the prediction results. The models may then be reused and adapted for different contexts. Traces can be re-examined to generate additional GLA variables or change existing ones based on variable relevance as reported by such models.

As for the number of users to include in this validation phase, considering the reported number of users in other data-based research on serious games (as described in our systematic literature review, section 4.1), we recommend including at least 100 users. The information gathered during the validation phase can also be used to improve the game or adapt it to players’ characteristics for a better learning experience.

4.4.4. From game validation to game deployment

As we have described, our full evidence-based assessment process involves the serious game validation (collecting both questionnaires and GLA data) and the serious game deployment, where assessment is simplified and based solely on players’ interaction data. Once the serious game has been formally validated, the deployment phase can start, with the game applied in classrooms and other real-world educational settings.
To be able to gather information from users’ experience and to assess them based on their interactions, this application should include the collection of data from relevant interactions. The deployment process for large-scale scenarios reads as follows:

1. Gameplay: Students access the SG and play the game from beginning to end. We have used anonymous identifiers that allow only teachers to de-anonymize student data to ensure that privacy requirements are met while still linking questionnaire responses to each student’s game-interaction data.
2. Data collection: A tracking component integrated in the game sends the relevant traces generated from player interactions to the analytics tool while students are playing. The user interaction traces should follow a well-defined format (e.g. xAPI-SG), as required by the analytics tool that will receive them.
3. Feature extraction: The analytics tool takes interaction data as input, and uses it to fill the values of the pre-defined GLA variables. These variables are then used as input to the previously created prediction models, to derive prediction outputs for the students’ assessment.
4. Assessment: Once students have finished playing, teachers will receive the predicted score based on each student’s in-game interactions. They can then use this information, together with any other evaluation of their own, to obtain the students’ final assessment. Note that the prediction models provide the assessment output for students once they have finished playing the game, and therefore once all the input data required by the models is available.

The assessment obtained with this process is therefore automatic and non-intrusive, and simplified from both ends: teacher preparation and execution times typically required for post-game assessment are removed, and students will simply play a game without the added time, disruption, and pressure of completing the questionnaires.
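The data collection, feature extraction and assessment steps above can be sketched as follows. The traces are simplified dictionaries loosely modeled on xAPI statements, not exact xAPI-SG serializations. The GLA variables, the coefficients of the linear model and all function names are hypothetical; in practice, the model would be the one retained from the validation phase.

```python
# Simplified traces for one anonymized player (field names are illustrative,
# loosely modeled on xAPI statements; timestamps are seconds since the start).
traces = [
    {"actor": "ABCD", "verb": "initialized", "object": "game", "timestamp": 0},
    {"actor": "ABCD", "verb": "selected", "object": "question_1", "success": True, "timestamp": 40},
    {"actor": "ABCD", "verb": "selected", "object": "question_2", "success": False, "timestamp": 95},
    {"actor": "ABCD", "verb": "selected", "object": "question_2", "success": True, "timestamp": 120},
    {"actor": "ABCD", "verb": "completed", "object": "game", "timestamp": 600},
]

# Hypothetical linear model retained from the validation phase.
MODEL = {
    "intercept": 5.0,
    "weights": {"correct_ratio": 4.0, "game_completed": 1.0, "duration_minutes": 0.05},
}


def extract_gla_variables(player_traces):
    """Feature extraction: fill GLA variables from interaction traces."""
    selections = [t for t in player_traces if t["verb"] == "selected"]
    correct = sum(1 for t in selections if t.get("success"))
    times = [t["timestamp"] for t in player_traces]
    completed = any(
        t["verb"] == "completed" and t["object"] == "game" for t in player_traces
    )
    return {
        "correct_ratio": correct / len(selections) if selections else 0.0,
        "game_completed": 1.0 if completed else 0.0,
        "duration_minutes": (max(times) - min(times)) / 60 if times else 0.0,
    }


def predict_score(gla_variables, model=MODEL):
    """Assessment: apply the stored model to the GLA variables."""
    return model["intercept"] + sum(
        weight * gla_variables[name] for name, weight in model["weights"].items()
    )


features = extract_gla_variables(traces)
predicted = predict_score(features)
print(features, round(predicted, 2))
```

The prediction is computed only after the `completed` trace has arrived, which mirrors the note that the model needs all its input data before producing an assessment.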
Game-based assessment can also provide institutions and managers with means of evaluating the efficacy of games for education and simplify the assessment process of their students. The previously used pre-post questionnaires are no longer required during the large-scale deployment, which simplifies the application of SGs in real-world educational settings. This allows students to play the game for longer periods, and/or teachers to include additional activities related to the gameplay (e.g. discussion, post-game questions), instead of the traditional student assessment. Figure 19 depicts the deployment phase of a serious game using our evidence-based assessment process, once questionnaires are no longer required (note the differences with the game validation phase, depicted in Figure 15).

Figure 19. Evidence-based assessment process of serious game players: after validating the game and the prediction models, during the game deployment, players are assessed solely based on their game interactions.

The process, results and details about this full evidence-based assessment process have been published in (Alonso‐Fernández, Freire, Martínez-Ortiz, & Fernández-Manjón, 2021), a publication that is included as part of this thesis. For the full text and details of the publication, see subsection 6.1.4.

4.5. Discussion

The evidence-based assessment process presented, and the case studies that exemplify it, have some limitations. The full process is closely tied to the use of the xAPI-SG Profile. This standard may fail to capture the particularities of specific serious games; however, the standard was created after a study of the common interactions in serious games, so we consider it a suitable baseline to define general information that can be extracted from a majority of serious games. The games used in the case studies do not cover the variety and complexity of the serious games that exist.
To try to tackle these issues, we have selected two serious games that, at least, have different goals (teaching knowledge and raising awareness), with different mechanics. Still, both serious games are similar in their narrative structure and in the importance of the dialogues and of the choices made by players among alternatives. Therefore, we consider that our approach could yield similar results at least in similar narrative serious games. Further research is needed to explore this approach with different types of serious games.

As for the participants included in the experiments, we have tried to ensure larger sample sizes to tackle the identified gap in the literature. However, there are also some limitations regarding their characteristics: the participants in the first case study were from the same school, which could bias the results. In the second case study, participants were from different schools in the same country, Spain.

Regarding the steps of the evidence-based assessment process, we have highlighted what we consider to be the key step: the selection of GLA variables from game interaction data (feature extraction process). We consider this step essential because, as other authors have pointed out, the selection of variables is of higher importance for the results than that of the prediction models. To try to support this key step, we have provided both a default set of GLA variables (derived automatically from using the xAPI-SG standard) and the exploratory analysis tool T-MON, which gives a richer insight into the collected interaction data traces and helps to derive further game-dependent GLA variables if needed. For this step to yield the best results, it is of course preferable that analytics have been included from the early stages of the serious game’s design. This simplifies all the steps because, while the analytics are being designed, they may also prompt changes in the game design to obtain the desired information.
Finally, for the prediction models included, we have recommended the use of explainable models: black box models that provide no information could improve the accuracy of the assessment, but in earlier stages of the process it could be preferable to use xAI models with information about the predictive relevance of the included GLA variables to obtain feedback and improve the process.

The work carried out in the thesis is based on some previous exploratory work about the applications of GLA for assessment with serious games. In particular, we had explored how LA for SGs can provide insight to improve the application of serious games in different stages of the serious game lifecycle (Alonso-Fernández, Cano, et al., 2019; Alonso‐Fernández et al., 2018), including validation of the design of a serious game, improvement of the deployment of serious games, and assessment of players using serious games. In that work, which combined three different scenarios of application of LA in SGs, we obtained some lessons learnt that included:

• The importance of using a standard format to collect the data, to simplify integration with other systems (e.g. real-time information analytics system), compare information from different games and even reuse and share the collected data for research purposes.
• The wide range of purposes for which GLA data can be applied with serious games: validating game design, simplifying deployment and assessment, and providing further information about students’ in-game actions, engagement and motivation.
• The different stakeholders that can benefit from such applications: game designers and developers, to simplify validation of their designs; teachers and educators, to simplify application of games in classrooms; and students/learners, to be effectively assessed based on their actions.
The process, results and details about these initial exploratory works were initially published in (Alonso‐Fernández et al., 2018), a publication that is included as part of this thesis (for the full text and details of the publication, see subsection 6.2.5), and then extended in (Alonso-Fernández, Cano, et al., 2019), a publication that is also included as part of this thesis. For the full text and details of the publication, see subsection 6.1.5.

The application of GLA for serious games does not only benefit the assessment of players: the information obtained with GLA data can improve the serious games, in all steps of their life cycle (from design and development, to validation and deployment). We also explored these opportunities in some initial works of this thesis. The integration of GLA with a game authoring tool, including the collection of GLA data, and its analysis and visualization using a real-time information analytics system can simplify the deployment and application of serious games in educational scenarios (Alonso-Fernández, Rotaru, Freire, Martínez-Ortiz, & Fernández-Manjón, 2017). For that, we proposed the integration of GLA in the game authoring tool uAdventure, the use of the standard xAPI-SG to standardize the data collection, and a default set of analysis and visualizations for the main stakeholders involved. Additionally, these possibilities can help to improve the serious game lifecycle at two stages: while games are in play, real-time information provided to teachers and students in the form of dashboards or warning messages allows the deployment of games to be better adapted to each player’s needs and progress; after gameplays are finished, further analysis can provide deeper information from the gathered data (Alonso-Fernández, Pérez-Colado, et al., 2019). The latter includes, of course, the application of GLA data for assessment purposes.
The process, results and details about these works are published in (Alonso-Fernández, Rotaru, et al., 2017), a publication that is included as part of this thesis, and whose full text and details are included in subsection 6.2.3, and in (Alonso-Fernández, Pérez-Colado, et al., 2019), a publication that is included as part of this thesis, and whose full text and details are included in subsection 6.2.4.

The evidence-based assessment process includes the lessons learnt and conclusions obtained from those initial exploratory works. The described process aims to provide some guidelines to carry out assessment of players using serious games, based on standards (the data collection format xAPI-SG) and with tools to support the essential process of creation and selection of GLA variables (with the exploratory tool T-MON). With these guidelines, we hope to simplify the assessment process of players with serious games, which can be adapted and personalized for each serious game and scenario. Still, we consider that the described process can help to improve the assessment method of players with GLA data and data mining techniques. With this simplification, the application and deployment of serious games in educational scenarios could be fostered with clearer evidences on their impact on players.

Chapter 5. Conclusions, contributions and future work

This chapter presents the final conclusions of the thesis, a summary of all the contributions, and the identified limitations of this work. This chapter also introduces some of the possible lines for future research in fields related to the work of the thesis.

5.1. Conclusions

The assessment of serious games’ players has been traditionally conducted with external questionnaires that fail to assess players based on their actual actions while playing the games.
However, to increase the application of serious games in educational scenarios, traditionally limited to an additional motivational activity with no impact on the evaluation of students, we consider it essential to improve the use of serious games to assess players, providing accurate measures of how they affect their players.

Game Learning Analytics applications provide richer and deeper information about players’ actions, progress, and results within serious games, allowing the detection of players’ strategies and behaviors, the creation of player profiles and the adaptation of the game experience to each player. The potentially large-scale and rich information gathered from game learning analytics data can be further analyzed with complex data mining techniques. This combination offers new opportunities to improve serious games’ players assessment.

From the literature review of data science applications to game learning analytics data in serious games, we identified some challenges pointed out by authors: for instance, the few studies reporting evidences to effectively assess students, the lack of standardization in the assessment process, the lack of standards in the collection of interaction data and the limited number of participants in many studies.

Based on that background, we proposed an evidence-based assessment approach that combines the collection of game interaction data, analyzed into GLA data variables, with data mining techniques to obtain accurate predictions of serious games’ impact on players. The approach proposed aims to be a step forward to simplify players’ assessment using serious games.

The use of standards (in our case, xAPI-SG) is a clear benefit to simplify the collection of game interaction data, its analysis into GLA variables, and the sharing of data and replicability of the results.
By using the standard data format, it is possible to define a set of game-independent GLA variables, providing a baseline of the information that can be extracted from many serious games by default (as the xAPI-SG Profile was also created based on the most common interactions present in serious games). To further support the key step of feature extraction, we developed the exploratory tool T-MON to help in the definition and selection of further GLA variables, based on the collected interaction data in xAPI-SG format.

The complete evidence-based assessment process has been exemplified in two case studies with SGs that have different purposes (teaching first aid procedures, and raising awareness about bullying and cyberbullying) and with different prediction models. In each case study, we have showcased how the game design and learning goals were turned into game mechanics and interactions, the game interaction data collected and how it was encapsulated in the xAPI-SG statements, and how the default game-independent GLA variables, together with some defined game-dependent ones, were derived. In both cases, the different prediction results have been highly accurate in predicting the effect of each game on players, and we have been able to obtain measures about the relevance of the variables towards the predictions. This allowed us to discuss the results, and provide some plausible explanations of how each variable affected the assessment, based on players’ strategies and actions.

The case studies also show how the general evidence-based assessment process can be adapted for particular serious games, while keeping most of the steps as general as possible to increase replicability.
We consider that, although each serious game will have its particularities, the process is general enough so that it can provide a reliable baseline (also fostered by the application of the xAPI-SG standard) to use with different serious games, or at least with similar narrative-based serious games.

The approach has some limitations. The evidence-based assessment process aims to be general to increase replicability but this, of course, hinders the efficacy in each particular case as important game-dependent information may be omitted. For that purpose, we have further designed and developed T-MON, a tool to support the key step of selecting GLA variables, by showing particularities of the data that may have been overlooked by the default game-independent variables. The T-MON tool and the default GLA variables are based on the xAPI-SG Profile. If some relevant game interactions cannot be effectively collected with the types and fields available in the Profile, again they could be omitted. This should be fixed during the definition of the game interactions to be collected from the game: the extension field available in the Profile could, in many cases, suffice to collect other relevant information from the game. For more complex situations, it could be necessary to define game-specific verbs or activity types to collect the data. However, we consider that the xAPI-SG Profile is a good baseline, as it was defined based on the creation of a general interaction model stating the most common interactions in serious games. The serious games used in the case studies are also limited in their goals and characteristics, but being different enough, we consider that they provide some guidelines on how to apply the evidence-based assessment process in different scenarios.
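As a hedged illustration of the extension mechanism mentioned above, the sketch below attaches game-specific values to the `result.extensions` object of an xAPI-style statement and reads them back during feature extraction. xAPI defines extensions as maps keyed by IRIs. The IRIs under `example.org`, the extension keys (`health`, `hints_used`) and the helper names are placeholders invented for this example, not identifiers from the xAPI-SG Profile.

```python
import json

# Placeholder IRI prefix for game-specific extensions (an assumption, not
# an identifier defined by the xAPI-SG Profile).
EXT = "https://example.org/xapi/extensions/"


def make_statement(actor_id, verb_id, object_id, success, extensions):
    """Build an xAPI-style statement carrying game-specific extensions."""
    return {
        "actor": {"account": {"homePage": "https://example.org", "name": actor_id}},
        "verb": {"id": verb_id},
        "object": {"id": object_id},
        "result": {
            "success": success,
            # Extension keys must be IRIs, so each short key is prefixed.
            "extensions": {EXT + key: value for key, value in extensions.items()},
        },
    }


def get_extension(statement, key, default=None):
    """Read a game-specific extension value back from a statement."""
    return statement.get("result", {}).get("extensions", {}).get(EXT + key, default)


statement = make_statement(
    actor_id="ABCD",
    verb_id="https://example.org/xapi/verbs/selected",      # placeholder IRI
    object_id="https://example.org/games/demo/dialogue_3",  # placeholder IRI
    success=True,
    extensions={"health": 0.75, "hints_used": 2},
)

print(json.dumps(statement, indent=2))
```

A value read with `get_extension` could then feed a game-dependent GLA variable alongside the default, game-independent ones.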
The proposed approach further synthesizes the lessons learned in our previous work in two H2020 European projects, RAGE and BEACONING, regarding the large-scale deployment of serious games including the collection and analysis of game learning analytics data. That perspective has helped us to focus on the systematization of players’ assessment, using standards, and simplifying the tasks for teachers and educators to obtain evidences about how much players are learning with serious games.

The work carried out during the research stay at Florida State University has also allowed us to contrast our approach with an expert research group in the field of stealth assessment, emphasizing the need to systematize players’ assessment, and the opportunities that the collected game learning analytics data provide to validate different parts of the game design.

With the steps carried out to systematize the assessment of serious games’ players, we can contribute to ease the application of serious games in different educational contexts, providing all stakeholders involved with more accurate and data-based evidences on the effect of games on players.

5.2. Contributions

The main contributions of this thesis are centered around the systematization of the assessment of serious games’ players, using game learning analytics data, standards, and data mining techniques. With the systematization of this evidence-based assessment process, we aim to simplify the deployment and application of serious games in a wider range of educational scenarios, by providing accurate and data-based information that can better prove their efficacy. With this information, teachers, educators and institutions will have further evidences of the effect of games on players, and this can contribute to students/players being more effectively assessed while using these learning tools.
The application of serious games in educational scenarios could be fostered by following systematic approaches like the one presented, which would expand their currently limited role as simple motivational activities with no impact on students’ final evaluations.

The main specific contributions of the thesis are:

• A systematic literature review about data science applications of GLA in serious games: this review provides a detailed overview of the different purposes for which studies have applied data science to the game learning analytics data collected from serious games, the different data analysis techniques applied to such GLA data, the stakeholders targeted by such applications and the varied results obtained in the studies. The review additionally provides information about the serious games used and their purposes, the target participants and the sample sizes included in the studies, and the interaction data collected from such gameplays, and their format. After presenting the results of the studies reviewed, we further point out areas for improvement and recommendations for future research on the area, such as the need to increase the average sample size of participants and use standards for the interaction data collected to simplify replicability of results and data sharing.
• A full assessment process of players using serious games based on the collection of interaction data and its analysis with data mining techniques: the described process provides a step-by-step methodology to effectively assess players using serious games based solely on their game interaction data. The evidence-based assessment process details all the steps from game and learning design, the interaction data to be collected, and the standard data format xAPI-SG recommended to collect it, the feature extraction process into GLA variables, and the prediction models to simplify assessment.
The process comprises two steps: (1) the initial game validation phase is used to create and validate the prediction models, collecting both traditional formally-validated pre-post questionnaires and game interaction data; and (2) in the final game deployment phase, after the game and the prediction models are validated, assessment is simplified as players can be automatically assessed solely from their interaction data. This way, teachers are provided with simplified tools to assess their students, without relying on external questionnaires. The process also includes the use of standards: the xAPI-SG Profile is recommended to collect the interaction data, as it simplifies both the definition of the interaction data to be collected and the feature extraction process into GLA variables, a large number of which can be directly defined based on the fields and types available in the standard.
• T-MON, a monitor of xAPI-SG traces, to support the evidence-based assessment process: the T-MON exploratory tool helps in the essential step of the feature extraction and selection of GLA variables from the game interaction data, as it provides an exploratory interface with ready-to-use analysis and visualizations of the interaction data collected in the xAPI-SG format. The default set of analysis and visualizations included in T-MON provides a deeper insight into the data collected, to define additional GLA variables, beyond the ones created by default following the fields in the standard. Additionally, T-MON simplifies the work of data scientists with standard-based GLA, as it lowers the cost of learning these techniques, allowing them to work in a familiar data science environment, without needing to be experts in xAPI or GLA.
• Two case studies that exemplify the evidence-based assessment process: the case studies are based on two serious games with different goals (learning knowledge and raising awareness), and using different prediction models (from classic and simple prediction models to more complex models). The case studies describe in detail all the steps carried out from the game design and learning goals, the game interaction data collected using the xAPI-SG standard format, the creation of both by-default and game-specific GLA variables, the prediction models used, and the highly accurate results obtained. The discussion of the results also served to relate predictive relevance to the game design, uncovering how some players’ behaviors relate to learning.
• Additional publications that have resulted from the work conducted in this thesis: 5 journal publications (Alonso-Fernández, Cano, et al., 2019; Alonso-Fernández et al., 2020a; Alonso-Fernández, Calvo-Morata, et al., 2019; Alonso‐Fernández, Freire, et al., 2021; Alonso‐Fernández et al., 2020) and 5 conference publications (Alonso-Fernández, Calvo-Morata, et al., 2017; Alonso-Fernández, Pérez-Colado, et al., 2019; Alonso-Fernández, Rotaru, et al., 2017; Alonso‐Fernández et al., 2018; Alonso‐Fernández, Calvo-Morata, et al., 2021); as well as other related publications published during these years that do not belong to the core of the thesis (Alonso-Fernandez et al., 2020; Alonso-Fernández, Perez-Colado, et al., 2019; Alonso-Fernández, Calvo-Morata, Freire, Martínez-Ortiz, & Fernández-Manjón, 2020b).

5.3. Future work

To continue verifying the suitability of our evidence-based approach to assess players of serious games, a clear line of future research is to test the assessment approach in different and more varied contexts. First, the assessment process could be applied with different serious games: both with serious games that have different educational goals than the ones tested (e.g.
changing players’ attitudes or behaviors towards some issues) or serious games that have the same goals but in different contexts (e.g. games that aim to teach different knowledge, or raise awareness about other social issues). This application includes the integration of the assessment process with other already-developed serious games, to use our evidence-based approach to assess their players.

Regarding the contexts of application for other serious games, one of the fields that we have started to explore, and aim to continue exploring, is the application of serious games to educate about gender equality. In this area, we have started to study the applications of serious games to raise awareness about different gender inequalities, sexist behaviors, etc. As many of these inequalities and behaviors start during childhood, when games are commonly used by children and teenagers, serious games seem like a particularly suitable tool to raise awareness about these topics.

Additionally, the assessment process could include testing different prediction models: in the case studies, we have included some classical and simple prediction models, as well as some other more complex models. The field of data mining is in constant change, therefore new prediction models are currently being created and tested. For instance, some studies are exploring new prediction models that only include few training data points (Sucholutsky & Schonlau, 2020; Wang, Zhu, Torralba, & Efros, 2018). New prediction models could be tested to improve the assessment results.

A further step will be to embed the assessment process in the development of a new serious game: considering the assessment process while creating a new serious game, to fully integrate all the steps of the assessment process from scratch during the design and development of the game.
This way, all the suitable game interaction data to be collected, analyzed and used during the assessment will be clearly defined from the beginning of the design process (this aligns with the recommendation made by several authors, as described in the literature review, that the interaction data for assessment should be defined early on during the game design process). Moreover, these steps could be iterated as needed while the game is still being developed, extending the interaction data collected from the game or changing the game design to provide other educationally relevant data.

The assessment process could also be integrated with some of the current set of tools available in our research group: uAdventure and SIMVA. uAdventure (I. Pérez-Colado et al., 2017) is an authoring tool to create serious games, including geolocation capabilities and learning analytics. The steps of the assessment process presented could help game designers using uAdventure to improve the analytics metrics collected, defining their game-specific ones, besides the default set of analytics provided by uAdventure.

SIMVA (I. J. Pérez-Colado et al., 2019) is a tool to simplify the experiments with serious games, managing student groups, anonymization, questionnaires and interaction data. With the collected xAPI-SG interaction data in SIMVA, we could provide the prediction models to assess students: by integrating SIMVA with a data science tool such as Python Jupyter notebooks, loaded with the prediction models created during the game validation phase, the interaction data collected from each player could be analyzed, the GLA variables derived, and an assessment measure obtained for players after they have completed their gameplays. This would simplify and automate the assessment of players in educational scenarios.
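A notebook cell for this kind of integration might look like the sketch below, which groups a mixed stream of traces by anonymous player identifier and applies an assessment function to each player's traces. The trace fields, the identifiers and the placeholder `assess` function are hypothetical; they do not reflect SIMVA's actual export format or API.

```python
from collections import defaultdict


def group_by_player(traces):
    """Group interaction traces by the anonymous identifier of the player."""
    grouped = defaultdict(list)
    for trace in traces:
        grouped[trace["actor"]].append(trace)
    return dict(grouped)


def assess(player_traces):
    """Placeholder for GLA variable derivation plus a stored prediction model.

    Here it simply returns the share of successful 'selected' traces (0 to 1).
    """
    selections = [t for t in player_traces if t["verb"] == "selected"]
    if not selections:
        return 0.0
    return sum(1 for t in selections if t["success"]) / len(selections)


def assess_all(traces):
    """Return one predicted measure per anonymous player identifier."""
    return {player: assess(ts) for player, ts in group_by_player(traces).items()}


# Hypothetical mixed stream of traces from two players.
stream = [
    {"actor": "ABCD", "verb": "selected", "success": True},
    {"actor": "WXYZ", "verb": "selected", "success": False},
    {"actor": "ABCD", "verb": "selected", "success": False},
    {"actor": "WXYZ", "verb": "completed", "success": True},
]

report = assess_all(stream)
for player, measure in sorted(report.items()):
    print(player, round(measure, 2))
```

Because the report is keyed by anonymous identifier, only a teacher holding the mapping to real students could turn it into named assessments.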
Following some of the steps of our assessment approach, and those of the field of stealth assessment, the techniques applied can be used to further explore the relationship between players’ actions in specific contents of the game (e.g. learning supports and incentive systems) and learning outcomes and performance. In this regard, during the research stay carried out as part of this thesis at Florida State University from February to May 2020, we conducted two studies that have been published in two additional JCR journal publications (Rahimi et al., 2021; Yang et al., 2021).

The xAPI standard is currently being prepared and updated for IEEE standardization. An additional future line of work will be to adapt the process to this new version of xAPI as an IEEE standard once it is completed. Moreover, an extension of the xAPI-SG Profile for the particular case of geolocalized serious games was also proposed; the steps in the evidence-based assessment process could be adapted to include the types and fields of this version of the Profile to improve the assessment when using geolocalized games: these games are particularly adequate in the current pandemic situation, as they allow learning experiences to be conducted outdoors while keeping social distance.

The exploratory tool T-MON can also be extended with new functionalities: so far, the tool is highly tied to the xAPI-SG Profile, providing default analysis and visualizations that are restricted to the fields and types available in the standard. However, the tool could be further extended to include other game-specific information, with an interface that allows users to specify other fields and types that are included in their data traces, and how to analyze them (e.g. providing a default set of analysis and visualizations to select from). The current interface could also be improved by extending the current set of visualizations (e.g.
statistical visualizations like boxplots) or providing further configuration options.

Chapter 6. Publications

6.1. Journal publications

This section contains the journal publications included in this thesis. The following subsections present in detail each publication, full citation and impact metrics, abstract and full text of the publication.

As an overview, the journal publications included in the thesis are the following:

1. Applications of data science to game learning analytics data: a systematic literature review: this publication presents the systematic literature review carried out about the applications of data science techniques to game learning analytics data from serious games. The process and results of this work are included in this thesis as part of the related work section and in the results, in subsection 4.1.
2. Predicting students’ knowledge after playing a serious game based on learning analytics data: A case study: this publication presents the first case study carried out to test the evidence-based assessment process with the serious game First Aid Game. The process and results of this work are included in this thesis as part of the results, in subsection 4.2.
3. Evidence-based evaluation of a serious game to increase bullying awareness: this publication presents the second case study carried out to test the evidence-based assessment process with the serious game Conectado. The process and results of this work are included in this thesis as part of the results, in subsection 4.3.
4. Improving evidence-based assessment of players using serious games: this publication presents the final evidence-based assessment process obtained after the two case studies, to assess players based on their interaction data with serious games, using game learning analytics data and data mining techniques. The process and results of this work are included in this thesis as part of the results, in subsection 4.4.
5.
Lessons learned applying learning analytics to assess serious games: this publication presents an overview of lessons learned applying learning analytics to assess serious games in different contexts and with different purposes. The process and results of this work are included in this thesis as part of the results, in subsection 4.5.

6.1.1. Applications of data science to game learning analytics data: a systematic literature review

Full citation

Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2019): Applications of data science to game learning analytics data: a systematic literature review. Computers & Education, Volume 141, November 2019, 103612. DOI: 10.1016/j.compedu.2019.103612.

Impact metrics: JCR 2019, Impact Factor: 5.296, Q1 in Computer Science, Interdisciplinary Applications.

Abstract

Data science techniques, nowadays widespread across all fields, can also be applied to the wealth of information derived from student interactions with serious games. Use of data science techniques can greatly improve the evaluation of games, and allow both teachers and institutions to make evidence-based decisions. This can increase both teacher and institutional confidence regarding the use of serious games in formal education, greatly raising their attractiveness. This paper presents a systematic literature review on how authors have applied data science techniques on game analytics data and learning analytics data from serious games to determine: (1) the purposes for which data science has been applied to game learning analytics data, (2) which algorithms or analysis techniques are commonly used, (3) which stakeholders have been chosen to benefit from this information and (4) which results and conclusions have been drawn from these applications.
Based on the categories established after the mapping and the findings of the review, we discuss the limitations of the studies analyzed and propose recommendations for future research in this field.

Full publication

6.1.2. Predicting students’ knowledge after playing a serious game based on learning analytics data: A case study

Full citation
Cristina Alonso-Fernández, Iván Martínez-Ortiz, Rafael Caballero, Manuel Freire, Baltasar Fernández-Manjón (2020): Predicting students’ knowledge after playing a serious game based on learning analytics data: A case study. Journal of Computer Assisted Learning, vol. 36, no. 3, pp. 350-358, June 2020. DOI: 10.1111/jcal.12405.
Impact metrics: JCR 2019, Impact Factor: 2.126, Q2 in Education & Educational Research.

Abstract
Serious games have proven to be a powerful tool in education to engage, motivate, and help students learn. However, the change in student knowledge after playing games is usually measured with traditional (paper) prequestionnaires–postquestionnaires. We propose a combination of game learning analytics and data mining techniques to predict knowledge change based on in-game student interactions. We have tested this approach in a case study for which we have conducted preexperiments–postexperiments with 227 students playing a previously validated serious game on first aid techniques. We collected student interaction data while students played, using a game learning analytics infrastructure and the standard data format Experience API for Serious Games. After data collection, we developed and tested prediction models to determine whether knowledge, given as posttest results, can be accurately predicted. Additionally, we compared models both with and without pretest information to determine the importance of previous knowledge when predicting postgame knowledge.
The high accuracy of the obtained prediction models suggests that serious games can be used not only to teach but also to measure knowledge acquisition after playing. This will simplify serious games application for educational settings and especially in the classroom, easing teachers’ evaluation tasks.

Full publication

6.1.3. Evidence-based evaluation of a serious game to increase bullying awareness

Full citation
Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2020): Evidence-based evaluation of a serious game to increase bullying awareness. Interactive Learning Environments, 2020. DOI: 10.1080/10494820.2020.1799031.
Impact metrics: JCR 2019, Impact Factor: 1.938, Q2 in Education & Educational Research.

Abstract
Game Learning Analytics can be used to conduct evidence-based evaluations of the effect that serious games produce on their players by combining in-game user interactions and traditional evaluation methods. We illustrate this approach with a case-study where we conduct an evidence-based evaluation of a serious game’s effectiveness to increase awareness of bullying. In this paper, we describe: (1) the full process of tracking in-game interactions, analyzing the traces collected using the standard xAPI-SG format, and deriving game learning analytics variables (to be used as evidences); and (2) the use of those variables to predict the increase in bullying awareness. We consider that this process can be generalized and replicated to systematize, and therefore simplify, evidence-based evaluations for other serious games based on the interaction data of their players.

Full publication

6.1.4. Improving evidence-based assessment of players using serious games

Full citation
Cristina Alonso-Fernández, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2021): Improving evidence-based assessment of players using serious games. Telematics and Informatics (in press). DOI: 10.1016/j.tele.2021.101583.
Impact metrics: JCR 2019, Impact Factor: 4.139, Q1 in Information Science & Library Science.

Abstract
Serious games are highly interactive systems which can therefore capture large amounts of player interaction data. This data can be analyzed to provide a deep insight into the effect of the game on its players. However, traditional techniques to assess players of serious games make little use of interaction data, relying instead on costly external questionnaires. We propose an evidence-based process to improve the assessment of players by using their interaction data. The process first combines player interaction data and traditional questionnaires to derive and refine game learning analytics variables, which can then be used to predict the effects of the game on its players. Once the game is validated, and suitable prediction models have been built, the prediction models can be used in large-scale deployments to assess players solely based on their interactions, without the need for external questionnaires. We briefly describe two case studies where this combination of traditional questionnaires and data mining techniques has been successfully applied. The evidence-based assessment process proposed radically simplifies the deployment and application of serious games in real class settings.

Full publication

6.1.5. Lessons learned applying learning analytics to assess serious games

Full citation
Cristina Alonso-Fernández, Ana Rus Cano, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2019): Lessons learned applying learning analytics to assess serious games.
Computers in Human Behavior, Volume 99, October 2019, Pages 301-309. DOI: 10.1016/j.chb.2019.05.036.
Impact metrics: JCR 2019, Impact Factor: 5.003, Q1 in Psychology, Experimental.

Abstract
Serious Games have already proved their advantages in different educational environments. Combining them with Game Learning Analytics can further improve the life-cycle of serious games, by informing decisions that shorten development time and reduce development iterations while improving their impact, therefore fostering their adoption. Game Learning Analytics is an evidence-based methodology based on in-game user interaction data, and can provide insight about the game-based educational experience promoting aspects such as a better assessment of the learning process. In this article, we review our experiences and results applying Game Learning Analytics for serious games in three different scenarios: (1) validating and deploying a game to raise awareness about cyberbullying, (2) validating the design of a game to improve independent living of users with intellectual disabilities and (3) improving the evaluation of a game on first aid techniques. These experiences show different uses of game learning analytics in the context of serious games to improve their design, evaluation and deployment processes. Building up from these experiences, we discuss the results obtained and provide lessons learnt from these different applications, to provide an approach that can be generalized to improve the design and application of a wide range of serious games in different educational settings.

Full publication

6.2. Conference publications

This section contains the publications presented at conferences or congresses. The following subsections present each publication in detail: its full citation, its abstract, and its full text. As an overview, the conference publications included in the thesis are the following:

1. Systematizing game learning analytics for serious games: this publication presents the initial work conducted to systematize GLA for serious games with default analyses and visualizations. The process and results of this work are included in this thesis as part of the results, in subsection 4.5.
2. Data science meets standardized game learning analytics: this publication presents the tool T-Mon, an exploratory analysis tool for game interaction data that aims to help in the evidence-based assessment process of players using serious games. The process and results of this work are included in this thesis as part of the results, in subsection 4.4.
3. Full lifecycle architecture for serious games: integrating game learning analytics and a game authoring tool: this publication presents some of the earlier work to explore the improvements obtained applying GLA in the serious games’ life cycle. The process and results of this work are included in this thesis as part of the results, in subsection 4.5.
4. Improving serious games analyzing learning analytics data: lessons learned: this publication presents some of the earlier work to explore the improvements obtained applying GLA in the serious games’ life cycle. The process and results of this work are included in this thesis as part of the results, in subsection 4.5.
5. Applications of learning analytics to assess serious games: this publication presents some of the earlier work to explore the opportunities for assessment using GLA data with serious games. The process and results of this work are included in this thesis as part of the results, in subsection 4.5.

6.2.1. Systematizing game learning analytics for serious games

Full citation
Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2017): Systematizing game learning analytics for serious games. IEEE Global Engineering Education Conference (EDUCON), 25-28 April 2017, Athens, Greece.
This paper received a Best Paper Award of the Conference, in the “Area 3: Innovative Materials, Teaching and Learning Experiences in Engineering Education”.

Abstract
Applying games in education provides multiple benefits clearly visible in entertainment games: their engaging, goal-oriented nature encourages students to improve while they play. Educational games, also known as Serious Games (SGs), are video games designed with a main purpose other than pure entertainment; their main purpose may be to teach, to change an attitude or behavior, or to create awareness of a certain issue. As educators and game developers, the validity and effectiveness of these games towards their defined educational purposes needs to be both measurable and measured. Fortunately, the highly interactive nature of games makes the application of Learning Analytics (LA) perfect to capture students’ interaction data with the purpose of better understanding or improving the learning process. However, there is a lack of widely adopted standards to communicate information between games and their tracking modules. Game Learning Analytics (GLA) combines the educational goals of LA with technologies that are commonplace in Game Analytics (GA), and also suffers from a lack of standards adoption that would facilitate its use across different SGs. In this paper, we describe two key steps towards the systematization of GLA: 1), the use of a newly-proposed standard tracking model to exchange information between the SG and the analytics platform, allowing reusable tracker components to be developed for each game engine or development platform; and 2), the use of standardized analysis and visualization assets to provide general but useful information for any SG that sends its data in the aforementioned format. These analysis and visualizations can be further customized and adapted for particular games when needed.
We examine the use of this complete standard model in the GLA system currently under development for use in two EU H2020 SG projects.

Full publication

6.2.2. Data science meets standardized game learning analytics

Full citation
Cristina Alonso-Fernández, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2021): Data science meets standardized game learning analytics. IEEE Global Engineering Education Conference (EDUCON), 21-23 April 2021, Vienna, Austria.

Abstract
Data science applications in education are quickly proliferating, partially due to the use of LMSs and MOOCs. However, the application of data science techniques in the validation and deployment of serious games is still scarce. Among other reasons, obtaining and communicating useful information from the varied interaction data captured from serious games requires specific data analysis and visualization techniques that are out of reach of most non-experts. To mitigate this lack of application of data science techniques in the field of serious games, we present T-Mon, a monitor of traces for the xAPI-SG standard. T-Mon offers a default set of analysis and visualizations for serious game interaction data that follows this standard, with no other configuration required. The information reported by T-Mon provides an overview of the game interaction data collected, bringing analysis and visualizations closer to non-experts and simplifying the application of serious games.

Full publication

6.2.3. Full lifecycle architecture for serious games: integrating game learning analytics and a game authoring tool

Full citation
Cristina Alonso-Fernández, Dan C. Rotaru, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2017): Full Lifecycle Architecture for Serious Games: Integrating Game Learning Analytics and a Game Authoring Tool.
Joint Conference on Serious Games (JCSG), 23-24 November 2017, Polytechnic University of Valencia, Spain.

Abstract
The engaging and goal-oriented nature of serious games has been proven to increase student motivation. Games also allow learning assessment in a non-intrusive fashion. To increase adoption of serious games, their full lifecycle, including design, development, validation, deployment and iterative refinement must be made as simple and transparent as possible. Currently serious games impact analysis and validation is done on a case-by-case basis. In this paper, we describe a generic architecture that integrates a game authoring tool, uAdventure, with a standards-based Game Learning Analytics framework, providing a holistic approach to bring together development, validation, and analytics, that allows a systematic analysis and validation of serious games impact. This architecture allows game developers, teachers and students access to different analyses with minimal setup; and improves game development and evaluation by supporting an evidence-based approach to assess both games and learning. This system is currently being extended and used in two EU H2020 serious games projects.

Full publication

6.2.4. Improving serious games analyzing learning analytics data: lessons learned

Full citation
Cristina Alonso-Fernández, Ivan Perez-Colado, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2018): Improving serious games analyzing learning analytics data: lessons learned. Games and Learning Alliance conference (GALA Conf), December 5-7, 2018, Palermo, Italy.

Abstract
Serious games adoption is increasing, although their penetration in formal education is still surprisingly low. To improve their outcomes and increase their adoption in this domain, we propose new ways in which serious games can leverage the information extracted from player interactions, beyond the usual post-activity analysis.
We focus on the use of: (1) open data which can be shared for research purposes, (2) real-time feedback for teachers that apply games in schools, to maintain awareness and control of their classroom, and (3) once enough data is gathered, data mining to improve game design, evaluation and deployment; and allow teachers and students to benefit from enhanced feedback or stealth assessment. Having developed and tested a game learning analytics platform throughout multiple experiments, we describe the lessons that we have learnt when analyzing learning analytics data in the previous contexts to improve serious games.

Full publication

6.2.5. Applications of learning analytics to assess serious games

Full citation
Cristina Alonso-Fernández, Ana Rus Cano, Antonio Calvo-Morata, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón (2018): Applications of learning analytics to assess serious games. 2nd Annual Learning & Student Analytics Conference (LSAC), October 22-23, 2018, Amsterdam, The Netherlands.

Abstract
We summarize our experiences regarding three applications of Learning Analytics (LA) for Serious Games (SGs) with different purposes:
A. Validate and deploy games in schools. The SG Conectado has been designed to address social problems (bullying and cyberbullying).
B. Validate game design when information cannot be directly gathered from users. The SG Downtown was designed for improving independent life of users with Intellectual Disabilities (ID) who struggle with communication issues.
C. Improve evaluation and deployment of games. The SG First Aid Game was already validated and data mining models were applied to predict knowledge after playing.
All three games have been tested with target users in actual classrooms, as described in the following section. Results and implications of the use of analytics in those three scenarios are later explained.
171 Full publication 172 173 174 175 Bibliography Adadi, A., & Berrada, M. (2018). Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052 Adamo-Villani, N., Haley-Hermiz, T., & Cutler, R. (2013). Using a Serious Game Approach to Teach “Operator Precedence” to Introductory Programming Students. In 2013 17th International Conference on Information Visualisation (pp. 523–526). IEEE. https://doi.org/10.1109/IV.2013.70 ADL. (2012). Experience API. Retrieved March 20, 2016, from https://www.adlnet.gov/adl-research/performance-tracking-analysis/experience- api/ ADL. (2017). xAPI Profiles. Retrieved from https://adlnet.github.io/xapi-profiles/ Agarwal, S. (2014). Data mining: Data mining concepts and techniques. Proceedings - 2013 International Conference on Machine Intelligence Research and Advancement, ICMIRA 2013. https://doi.org/10.1109/ICMIRA.2013.45 Alonso-Fernández, C., Calvo-Morata, A., Freire, M., Martínez-Ortiz, I., & Fernández- Manjón, B. (2017). Systematizing game learning analytics for serious games. In 2017 IEEE Global Engineering Education Conference (EDUCON) (pp. 1111–1118). IEEE. https://doi.org/10.1109/EDUCON.2017.7942988 Alonso-Fernández, C., Calvo-Morata, A., Freire, M., Martínez-Ortiz, I., & Fernández- Manjón, B. (2019). Applications of data science to game learning analytics data: A systematic literature review. Computers & Education, 141, 103612. https://doi.org/10.1016/j.compedu.2019.103612 Alonso-Fernández, C., Calvo-Morata, A., Freire, M., Martínez-Ortiz, I., & Fernández- Manjón, B. (2020a). Evidence-based evaluation of a serious game to increase bullying awareness. Interactive Learning Environments, 1–11. https://doi.org/10.1080/10494820.2020.1799031 Alonso-Fernández, C., Calvo-Morata, A., Freire, M., Martínez-Ortiz, I., & Fernández- Manjón, B. (2020b). Simplifying the Validation and Application of Games with Simva. 
In Emerging Technologies for Education (pp. 337–346). https://doi.org/10.1007/978-3-030-38778-5_37 Alonso-Fernández, C., Cano, A. R., Calvo-Morata, A., Freire, M., Martínez-Ortiz, I., & Fernández-Manjón, B. (2019). Lessons learned applying learning analytics to assess serious games. Computers in Human Behavior, 99, 301–309. https://doi.org/10.1016/j.chb.2019.05.036 176 Alonso-Fernández, C., Pérez-Colado, I., Freire, M., Martínez-Ortiz, I., & Fernández- Manjón, B. (2019). Improving Serious Games Analyzing Learning Analytics Data: Lessons Learned. In Games and Learning Alliance: 7th International Conference, GALA 2018, Palermo, Italy, December 5–7, 2018, Proceedings (Vol. 10653, pp. 287–296). https://doi.org/10.1007/978-3-030-11548-7_27 Alonso-Fernández, C., Perez-Colado, I. J., Calvo-Morata, A., Freire, M., Martinez- Ortiz, I., & Fernández-Manjón, B. (2019). Using Simva to evaluate serious games and collect game learning analytics data. In LASI Spain 2019: Learning Analytics in Higher Education (pp. 22–34). Retrieved from https://pubman.e- ucm.es/drafts/e-UCM_draft_343.pdf Alonso-Fernandez, C., Perez-Colado, I. J., Calvo-Morata, A., Freire, M., Ortiz, I. M., & Manjon, B. F. (2020). Applications of Simva to Simplify Serious Games Validation and Deployment. IEEE Revista Iberoamericana de Tecnologias Del Aprendizaje, 15(3), 161–170. https://doi.org/10.1109/RITA.2020.3008117 Alonso-Fernández, C., Rotaru, D. C., Freire, M., Martínez-Ortiz, I., & Fernández- Manjón, B. (2017). Full Lifecycle Architecture for Serious Games: Integrating Game Learning Analytics and a Game Authoring Tool. In Lecture Notes in Computer Science (Vol. 10622 LNCS, pp. 73–84). https://doi.org/10.1007/978- 3-319-70111-0_7 Alonso‐Fernández, C., Calvo-Morata, A., Freire, M., Martínez-Ortiz, I., & Fernández‐ Manjón, B. (2021). Data science meets standardized game learning analytics. In 2021 IEEE Global Engineering Education Conference (EDUCON). 
Alonso‐Fernández, C., Freire, M., Martínez-Ortiz, I., & Fernández-Manjón, B. (2021). Improving evidence-based assessment of players using serious games. Telematics and Informatics. Alonso‐Fernández, C., Martínez‐Ortiz, I., Caballero, R., Freire, M., & Fernández‐ Manjón, B. (2020). Predicting students’ knowledge after playing a serious game based on learning analytics data: A case study. Journal of Computer Assisted Learning, 36(3), 350–358. https://doi.org/10.1111/jcal.12405 Alonso‐Fernández, C., Rus Cano, A., Calvo-Morata, A., Freire, M., Martínez-Ortiz, & Fernández-Manjón, B. (2018). Applications of Learning Analytics to assess Serious Games. In 2nd Annual Learning & Student Analytics Conference (LSAC). Amsterdam. Álvarez-García, D., Núñez Pérez, J. C., & Dobarro González, A. (2013). Cuestionarios para evaluar la violencia escolar en Educación Primaria y en Educación Secundaria: CUVE3-EP y CUVE3-ESO. Apuntes de Psicología, 31(2), 191–202. Retrieved from 177 http://www.apuntesdepsicologia.es/index.php/revista/article/view/322/296 Asociación Servicio Interdisciplinar de Atención a las Drogodependencias (SIAD). (2014). Aislados. Retrieved November 13, 2016, from http://www.aislados.es/zona-educadores/ Baker, R. S., Clarke-Midura, J., & Ocumpaugh, J. (2016). Towards general models of effective science inquiry in virtual performance assessments. Journal of Computer Assisted Learning, 32(3), 267–280. https://doi.org/10.1111/jcal.12128 Baker, R., & Yacef, K. (2009). The State of Educational Data Mining in 2009 : A Review and Future Visions. Journal of Educational Data Mining, 1(1), 3–16. https://doi.org/http://doi.ieeecomputersociety.org/10.1109/ASE.2003.1240314 Bienkowski, M., Feng, M., & Means, B. (2012). Enhancing teaching and learning through educational data mining and learning analytics: An issue brief. Washington, DC: SRI International, 1–57. Retrieved from https://tech.ed.gov/wp-content/uploads/2014/03/edm-la-brief.pdf Calderón, A., & Ruiz, M. (2015). 
A systematic literature review on serious games evaluation: An application to software project management. Computers & Education, 87, 396–422. https://doi.org/10.1016/j.compedu.2015.07.011 Calvo-Morata, A., Alonso‐Fernández, C., Freire, M., Martinez-Ortiz, I., & Fernández- Manjón, B. (2020). Creating awareness on bullying and cyberbullying among young people: validating the effectiveness and design of the serious game Conectado (submitted). Telematics and Informatics. Calvo-Morata, A., Rotaru, D. C., Alonso-Fernandez, C., Freire-Moran, M., Martinez- Ortiz, I., & Fernandez-Manjon, B. (2020). Validation of a Cyberbullying Serious Game Using Game Analytics. IEEE Transactions on Learning Technologies, 13(1), 186–197. https://doi.org/10.1109/TLT.2018.2879354 Calvo Morata, A. (2020). Uso de técnicas de learning analytics para la validación, mejora y aplicación de juegos serios en la clase aplicado al ciberbullying. Universidad Complutense de Madrid. Cano, A. R., Fernández-Manjón, B., & García-Tejedor, Á. J. (2018). Using game learning analytics for validating the design of a learning game for adults with intellectual disabilities. British Journal of Educational Technology, 49(4), 659– 672. https://doi.org/10.1111/bjet.12632 Center for Game Science at the University of Washington. (2016). Treefrog Treasure. Retrieved November 15, 2016, from http://centerforgamescience.org/blog/portfolio/treefrog-treasure/ 178 Chatti, M. A., Dyckhoff, A. L., Schroeder, U., & Thüs, H. (2012). A reference model for learning analytics. International Journal of Technology Enhanced Learning, 4(5/6), 318. https://doi.org/10.1504/IJTEL.2012.051815 Chaudy, Y., Connolly, T., & Hainey, T. (2014). Learning Analytics in Serious Games: a Review of the Literature. Ecaet 2014, (March 2016). Cheng, M.-T., Rosenheck, L., Lin, C.-Y., & Klopfer, E. (2017). Analyzing gameplay data to inform feedback loops in The Radix Endeavor. Computers & Education, 111, 60–73. 
https://doi.org/10.1016/j.compedu.2017.03.015 Chung, G. K. W. K. (2015). Guidelines for the Design and Implementation of Game Telemetry for Serious Games Analytics. In Serious Games Analytics (pp. 59–79). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319- 05834-4_3 Clark, D. B., Martinez-Garza, M. M., Biswas, G., Luecht, R. M., & Sengupta, P. (2012). Driving Assessment of Students’ Explanations in Game Dialog Using Computer- Adaptive Testing and Hidden Markov Modeling. In Assessment in Game-Based Learning (pp. 173–199). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-3546-4_10 Connolly, T. M., Boyle, E. A., MacArthur, E., Hainey, T., & Boyle, J. M. (2012). A systematic literature review of empirical evidence on computer games and serious games. Computers & Education, 59(2), 661–686. https://doi.org/10.1016/j.compedu.2012.03.004 Cutumisu, M., Blair, K. P., Chin, D. B., & Schwartz, D. L. (2017). Assessing Whether Students Seek Constructive Criticism: The Design of an Automated Feedback System for a Graphic Design Task. International Journal of Artificial Intelligence in Education, 27(3), 419–447. https://doi.org/10.1007/s40593-016-0137-5 de Klerk, S., & Kato, P. (2017). The Future Value of Serious Games for Assessment: Where Do We Go Now?. Journal of Applied Testing Technology, 18(February), 32–37. DeFalco, J. A., Rowe, J. P., Paquette, L., Georgoulas-Sherry, V., Brawner, K., Mott, B. W., … Lester, J. C. (2018). Detecting and Addressing Frustration in a Serious Game for Military Training. International Journal of Artificial Intelligence in Education, 28(2), 152–193. https://doi.org/10.1007/s40593-017-0152-1 Denden, M., Tlili, A., Essalmi, F., & Jemni, M. (2018). Implicit modeling of learners’ personalities in a game-based learning environment using their gaming behaviors. Smart Learning Environments, 5(1), 1–19. https://doi.org/10.1186/s40561-018- 0078-6 179 Dicerbo, K. E. (2013). Game-based assessment of persistence. 
Educational Technology and Society, 17(1), 17–28. DiCerbo, K. E., Bertling, M., Stephenson, S., Jia, Y., Mislevy, R. J., Bauer, M., & Jackson, G. T. (2015). An Application of Exploratory Data Analysis in the Development of Game-Based Assessments. In C. S. Loh, Y. Sheng, & D. Ifenthaler (Eds.), Serious Games Analytics (pp. 319–342). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4_14 Dörner, R., Göbel, S., Effelsberg, W., & Wiemeyer, J. (Eds.). (2016). Serious Games. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319- 40612-1 Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9(x), 155–161. https://doi.org/10.1.1.10.4845 Elaachak, L., Belahbibe, A., & Bouhorma, M. (2015). Towards a System of Guidance, Assistance and Learning Analytics Based on Multi Agent System Applied on Serious Games. International Journal of Electrical and Computer Engineering (IJECE) Journal, 5(2), 2088–8708. Retrieved from http://iaesjournal.com/online/index.php/IJECE ElAtia, S., Ipperciel, D., & Zaïane, O. R. (2016). Data Mining and Learning Analytics. (S. ElAtia, D. Ipperciel, & O. R. Zaïane, Eds.), Data Mining And Learning Analytics: Applications in Educational Research. Hoboken, NJ, USA: John Wiley & Sons, Inc. https://doi.org/10.1002/9781118998205 Electronic Arts. (2013). SimCityEDU: Pollution Challenge! Retrieved December 18, 2020, from http://www.simcityedu.org/ Electronic Arts Games. (2019). SimCity BuildIt. Retrieved December 18, 2020, from https://www.ea.com/es-es/games/simcity/simcity-buildit European Commission. (2018). 2018 reform of EU data protection rules. Retrieved from https://ec.europa.eu/commission/priorities/justice-and-fundamental- rights/data-protection/2018-reform-eu-data-protection-rules_en Evans, K. H., Daines, W., Tsui, J., Strehlow, M., Maggio, P., & Shieh, L. (2015). Septris. 
Academic Medicine, 90(2), 180–184. https://doi.org/10.1097/ACM.0000000000000611 Firaxis Games. (2016). Civilization. Retrieved December 18, 2020, from https://civilization.com Forsyth, C., Pavlik, P., Graesser, A., Cai, Z., Germany, M.-L., Millis, K., … Halpern, 180 D. (2012). Learning Gains for Core Concepts in a Serious Game on Scientific Reasoning. Proceedings of the 5th International Conference on Educational Data Mining, 1–4. Retrieved from http://w.optimallearning.org/people/Articles/edm2012_short_2.pdf Frederick-Recascino, C., Liu, D., Doherty, S., Kring, J., & Liskey, D. (2013). Articulating an Experimental Model for the Study of Game-Based Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8018 LNCS, pp. 25–32). https://doi.org/10.1007/978-3-642-39226-9_4 Freire, M., Serrano-Laguna, Á., Iglesias, B. M., Martínez-Ortiz, I., Moreno-Ger, P., & Fernández-Manjón, B. (2016). Game Learning Analytics: Learning Analytics for Serious Games. In Learning, Design, and Technology (pp. 1–29). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-17727-4_21-1 Freitas, S. de, & Gibson, D. (2014). Exploratory learning analytics methods from three case studies. Rhetoric and Reality: Critical Perspectives on Educational Technology. Proceedings of Ascilite Dunedin 2014, 383–388. Garaigordobil, M., & Aliri, J. (2013). Ciberacoso (“cyberbullying”) en el País Vasco: Diferencias de sexo en víctimas, agresores y observadores. Behavioral Psychology/ Psicologia Conductual, 21(3), 461–474. García-Tejedor, Á. J., Cano, A. R., & Fernández-Manjón, B. (2016). GLAID: Designing a Game Learning Analytics Model to Analyze the Learning Process in Users with Intellectual Disabilities. In 6th EAI International Conference on Serious Games, Interaction and Simulation. Porto, Portugal. 
Retrieved from http://sgamesconf.org/2016/show/technical-session Gašević, D., Dawson, S., & Siemens, G. (2015). Let’s not forget: Learning analytics are about learning. TechTrends, 59(1), 64–71. https://doi.org/10.1007/s11528-014- 0822-x Ghergulescu, I., & Muntean, C. H. (2016). ToTCompute: A Novel EEG-Based TimeOnTask Threshold Computation Mechanism for Engagement Modelling and Monitoring. International Journal of Artificial Intelligence in Education, 26(3), 821–854. https://doi.org/10.1007/s40593-016-0111-2 Gibson, D., & Clarke-Midura, J. (2015). Some Psychometric and Design Implications of Game-Based Learning Analytics. E-Learning Systems, Environments and Approaches, (CELDA), 247–261. https://doi.org/10.1007/978-3-319-05825-2_17 Girard, C., Ecalle, J., & Magnan, A. (2013). Serious games as new educational tools: how effective are they? A meta-analysis of recent studies. Journal of Computer 181 Assisted Learning, 29(3), 207–219. https://doi.org/10.1111/j.1365- 2729.2012.00489.x GTLHistory. (2020). Games to learn history. Retrieved October 22, 2020, from https://www.gtlhistory.com/ Gweon, G.-H., Lee, H.-S., Dorsey, C., Tinker, R., Finzer, W., & Damelin, D. (2015). Tracking student progress in a game-like learning environment with a Monte Carlo Bayesian knowledge tracing model. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge - LAK ’15 (pp. 166–170). New York, New York, USA: ACM Press. https://doi.org/10.1145/2723576.2723608 Hainey, T., Connolly, T. M., Boyle, E. A., Wilson, A., & Razak, A. (2016). A systematic literature review of games-based learning empirical evidence in primary education. Computers & Education, 102, 202–223. https://doi.org/10.1016/j.compedu.2016.09.001 Halverson, R., & Owen, V. E. (2014). Game-based assessment: an integrated model for capturing evidence of learning in play. International Journal of Learning Technology, 9(2), 111. https://doi.org/10.1504/ijlt.2014.064489 Han, J., Kamber, M., & Pei, J. 
(2012). Introduction. In Data Mining (pp. 1–38). Elsevier. https://doi.org/10.1016/B978-0-12-381479-1.00001-0
Harpstead, E., MacLellan, C. J., Aleven, V., & Myers, B. A. (2015). Replay Analysis in Open-Ended Educational Games. In Serious Games Analytics (pp. 381–399). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4_17
Hauge, J. B., Berta, R., Fiucci, G., Manjon, B. F., Padron-Napoles, C., Westra, W., & Nadolski, R. (2014). Implications of Learning Analytics for Serious Game Design. In 2014 IEEE 14th International Conference on Advanced Learning Technologies (pp. 230–232). IEEE. https://doi.org/10.1109/ICALT.2014.73
Heeter, C., Lee, Y.-H., Medler, B., & Magerko, B. (2013). Conceptually Meaningful Metrics: Inferring Optimal Challenge and Mindset from Gameplay. In Game Analytics (pp. 731–762). London: Springer London. https://doi.org/10.1007/978-1-4471-4769-5_32
Hernández-Lara, A. B., Perera-Lluna, A., & Serradell-López, E. (2019). Applying learning analytics to students’ interaction in business simulation games. The usefulness of learning analytics to know what students really learn. Computers in Human Behavior, 92, 600–612. https://doi.org/10.1016/j.chb.2018.03.001
Hicks, D., Eagle, M., Rowe, E., Asbell-Clarke, J., Edwards, T., & Barnes, T. (2016). Using game analytics to evaluate puzzle design and level progression in a serious game. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge - LAK ’16 (pp. 440–448). New York, New York, USA: ACM Press. https://doi.org/10.1145/2883851.2883953
Horn, B., Hoover, A. K., Barnes, J., Folajimi, Y., Smith, G., & Harteveld, C. (2016). Opening the Black Box of Play. In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play - CHI PLAY ’16 (pp. 142–153). New York, New York, USA: ACM Press. https://doi.org/10.1145/2967934.2968109
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2016). A Practical Guide to Support Vector Classification. Taipei.
Retrieved from https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Iglesias, B. M., Fernandez-Vara, C., & Fernandez-Manjon, B. (2013). E-Learning Takes the Stage: From La Dama Boba to a Serious Game. IEEE Revista Iberoamericana de Tecnologias Del Aprendizaje, 8(4), 197–204. https://doi.org/10.1109/RITA.2013.2285023
interFUEL, L. (2006). Darfur is Dying. Retrieved from http://www.gamesforchange.org/play/darfur-is-dying/
Irizarry, R. A. (2019). Introduction to Data Science. Chapman and Hall/CRC. https://doi.org/10.1201/9780429341830
Jaccard, D., Hulaas, J., & Dumont, A. (2017). Using Comparative Behavior Analysis to Improve the Impact of Serious Games on Students’ Learning Experience. (J. Dias, P. A. Santos, & R. C. Veltkamp, Eds.) (Vol. 10653). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-71940-5
Jupyter Team. (2020). Jupyter Projects. Retrieved November 1, 2020, from https://jupyter.readthedocs.io/en/latest/projects/content-projects.html
Kang, J., Liu, M., & Qu, W. (2017). Using gameplay data to examine learning behavior patterns in a serious game. Computers in Human Behavior, 72, 757–770. https://doi.org/10.1016/j.chb.2016.09.062
Käser, T., Busetto, A. G., Solenthaler, B., Baschera, G. M., Kohn, J., Kucian, K., … Gross, M. (2013). Modelling and optimizing mathematics learning in children. International Journal of Artificial Intelligence in Education, 23(1–4), 115–135. https://doi.org/10.1007/s40593-013-0003-7
Käser, T., Hallinen, N. R., & Schwartz, D. L. (2017). Modeling exploration strategies to predict student performance within a learning environment and beyond. In Proceedings of the Seventh International Learning Analytics & Knowledge Conference on - LAK ’17 (pp. 31–40). New York, New York, USA: ACM Press. https://doi.org/10.1145/3027385.3027422
Kato, P. M., & Klerk, S. De. (2017). Serious Games for Assessment: Welcome to the Jungle. Journal of Applied Testing Technology, 18, 1–6.
Ke, F., Shute, V., Clark, K. M., & Erlebacher, G. (2019). Interdisciplinary Design of Game-based Learning Platforms. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-04339-1
Ke, F., & Shute, V. J. (2015). Serious Games Analytics. (C. S. Loh, Y. Sheng, & D. Ifenthaler, Eds.), Serious Games Analytics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4
Keehn, S., & Claggett, S. (2019). Collecting Standardized Assessment Data in Games. Journal of Applied Testing Technology, 20, 43–51.
Ketamo, H. (2013). Agents and Analytics - A Framework for Educational Data Mining with Games based Learning. In Proceedings of the 5th International Conference on Agents and Artificial Intelligence (pp. 377–382). SciTePress - Science and Technology Publications. https://doi.org/10.5220/0004331403770382
Ketamo, H. (2015). User-Generated Character Behaviors in Educational Games. In Healthcare Informatics Research (Vol. 21, pp. 57–68). https://doi.org/10.1007/978-981-287-408-5_5
Kickmeier-Rust, M. D. (2018). Predicting Learning Performance in Serious Games. In S. Göbel, A. Garcia-Agundez, T. Tregel, M. Ma, J. Baalsrud Hauge, M. Oliveira, … P. Caserman (Eds.), Serious Games (Vol. 11243, pp. 133–144). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-02762-9_14
Kitto, K., Whitmer, J., Silvers, A. E., & Webb, M. (2020). Creating Data for Learning Analytics Ecosystems. SOLAR Position Paper, 1–43.
Koedinger, K., McLaughlin, E., & Stamper, J. (2012). Automated Student Model Improvement. Proceedings of the 5th International Conference on Educational Data Mining, 17–24. https://doi.org/10.978.17421/02764
Kosmas, P., Ioannou, A., & Retalis, S. (2018). Moving Bodies to Moving Minds: A Study of the Use of Motion-Based Games in Special Education. TechTrends, 62(6), 594–601. https://doi.org/10.1007/s11528-018-0294-5
Lazo, P. P. L., Anareta, C. L. Q., Duremdes, J. B. T., & Red, E. R. (2018).
Classification of public elementary students’ game play patterns in a digital game-based learning system with pedagogical agent. In Proceedings of the 6th International Conference on Information and Education Technology - ICIET ’18 (pp. 75–80). New York, New York, USA: ACM Press. https://doi.org/10.1145/3178158.3178160
Liu, M., Kang, J., Lee, J., Winzeler, E., & Liu, S. (2015). Examining Through Visualization What Tools Learners Access as They Play a Serious Game for Middle School Science. In Serious Games Analytics (pp. 181–208). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4_8
Liu, M., Kang, J., Liu, S., Zou, W., & Hodson, J. (2017). Learning Analytics as an Assessment Tool in Serious Games: A Review of Literature. In Serious Games and Edutainment Applications (pp. 537–563). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-51645-5_24
Liu, M., Lee, J., Kang, J., & Liu, S. (2016). What We Can Learn from the Data: A Multiple-Case Study Examining Behavior Patterns by Students with Different Characteristics in Using a Serious Game. Technology, Knowledge and Learning, 21(1), 33–57. https://doi.org/10.1007/s10758-015-9263-7
Loh, C. S., & Sheng, Y. (2014). Maximum Similarity Index (MSI): A metric to differentiate the performance of novices vs. multiple-experts in serious games. Computers in Human Behavior, 39, 322–330. https://doi.org/10.1016/j.chb.2014.07.022
Loh, C. S., & Sheng, Y. (2015a). Measuring Expert Performance for Serious Games Analytics: From Data to Insights. In Serious Games Analytics (pp. 101–134). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4_5
Loh, C. S., & Sheng, Y. (2015b). Measuring the (dis-)similarity between expert and novice behaviors as serious games analytics. Education and Information Technologies, 20(1), 5–19. https://doi.org/10.1007/s10639-013-9263-y
Loh, C. S., Sheng, Y., & Ifenthaler, D. (2015a). Serious Games Analytics. (C. S. Loh, Y.
Sheng, & D. Ifenthaler, Eds.). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4
Loh, C. S., Sheng, Y., & Ifenthaler, D. (2015b). Serious Games Analytics: Theoretical Framework. In Serious Games Analytics (pp. 3–29). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4_1
Long, P., & Siemens, G. (2011). Penetrating the Fog: Analytics in Learning and Education. EDUCAUSE Review, 46(5), 30–32. Retrieved from http://search.proquest.com.proxy.library.vanderbilt.edu/docview/964183308/13AF5BC47C138E29FF2/5?accountid=14816%5Cnhttps://login.proxy.library.vanderbilt.edu/login?url=http://search.proquest.com/docview/964183308/13AF5BC47C138E29FF2/5?accountid=14816
Long, P., Siemens, G., Gráinne, C., & Gašević, D. (2011). LAK ’11: proceedings of the 1st International Conference on Learning Analytics and Knowledge, February 27 - March 1, 2011, Banff, Alberta, Canada. In 1st International Conference on Learning Analytics and Knowledge (p. 195). Retrieved from https://dl.acm.org/citation.cfm?id=2090116
Macfadyen, L. P., & Dawson, S. (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers and Education. https://doi.org/10.1016/j.compedu.2009.09.008
Manero, B., Torrente, J., Freire, M., & Fernández-Manjón, B. (2016). An instrument to build a gamer clustering framework according to gaming preferences and habits. Computers in Human Behavior, 62, 353–363. https://doi.org/10.1016/j.chb.2016.03.085
Marchiori, E. J., Ferrer, G., Fernandez-Manjon, B., Povar-Marco, J., Suberviola, J. F., & Gimenez-Valverde, A. (2012). Video-game instruction in basic life support maneuvers. Emergencias, 24(6), 433–437.
Martin, T., Aghababyan, A., Pfaffman, J., Olsen, J., Baker, S., Janisiewicz, P., … Smith, C. P. (2013). Nanogenetic learning analytics. In Proceedings of the Third International Conference on Learning Analytics and Knowledge - LAK ’13 (p. 165).
New York, New York, USA: ACM Press. https://doi.org/10.1145/2460296.2460328
Martin, T., Petrick Smith, C., Forsgren, N., Aghababyan, A., Janisiewicz, P., & Baker, S. (2015). Learning Fractions by Splitting: Using Learning Analytics to Illuminate the Development of Mathematical Understanding. Journal of the Learning Sciences, 24(4), 593–637. https://doi.org/10.1080/10508406.2015.1078244
Martinez-Garza, M. M., & Clark, D. B. (2017). Investigating Epistemic Stances in Game Play with Data Mining. International Journal of Gaming and Computer-Mediated Simulations, 9(3), 1–40. https://doi.org/10.4018/ijgcms.2017070101
Mavridis, A., Katmada, A., & Tsiatsos, T. (2017). Impact of online flexible games on students’ attitude towards mathematics. Educational Technology Research and Development, 65(6), 1451–1470. https://doi.org/10.1007/s11423-017-9522-5
Mayer, I., van Dierendonck, D., van Ruijven, T., & Wenzler, I. (2014). Stealth Assessment of Teams in a Digital Game Environment. In Lecture Notes in Computer Science (Vol. 8605, pp. 224–235). https://doi.org/10.1007/978-3-319-12157-4_18
McCarthy, K. S., Johnson, A. M., Likens, A. D., Martin, Z., & McNamara, D. S. (2017). Metacognitive Prompt Overdose: Positive and Negative Effects of Prompts in iSTART. Grantee Submission, 404–405. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=eric&AN=ED577125&site=ehost-live
Michael, D. R., & Chen, S. L. (2005). Serious Games: Games That Educate, Train, and Inform. Education, October 31, 1–95. https://doi.org/10.1145/2465085.2465091
Mojang Studios. (2011). Minecraft. Retrieved from https://www.minecraft.net/
Mojang Studios. (2016). Minecraft Education Edition. Retrieved December 18, 2020, from https://education.minecraft.net/
Moreno-Marcos, P. M., Alario-Hoyos, C., Munoz-Merino, P. J., & Delgado Kloos, C. (2018). Prediction in MOOCs: A review and future research directions. IEEE Transactions on Learning Technologies, pp. 1–1.
https://doi.org/10.1109/TLT.2018.2856808
Muratet, M., Yessad, A., & Carron, T. (2016). Understanding Learners’ Behaviors in Serious Games. In F. W. B. Li, R. Klamma, M. Laanpere, J. Zhang, B. F. Manjón, & R. W. H. Lau (Eds.), Advances in Web-Based Learning - ICWL 2015 (Vol. 9412, pp. 195–205). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-47440-3_22
Nguyen, A., Gardner, L. A., & Sheridan, D. (2018). A framework for applying learning analytics in serious games for people with intellectual disabilities. British Journal of Educational Technology, 49(4), 673–689. https://doi.org/10.1111/bjet.12625
Ninaus, M., Kiili, K., Siegler, R. S., & Moeller, K. (2017). Data-Driven Design Decisions to Improve Game-Based Learning of Fractions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10653 LNCS, pp. 3–13). https://doi.org/10.1007/978-3-319-71940-5_1
Ortega-Ruiz, R., Del Rey, R., & Casas, J. A. (2016). Evaluar el bullying y el cyberbullying validación española del EBIP-Q y del ECIP-Q. Psicología Educativa, 22(1), 71–79. https://doi.org/10.1016/j.pse.2016.01.004
Owen, E., & Baker, R. (2019). Learning Analytics for Serious Games, (February). Retrieved from http://www.galanoe.eu/index.php/home/365-learning-analytics-for-serious-games
Owen, V. E., Anton, G., & Baker, R. (2016). Modeling User Exploration and Boundary Testing in Digital Learning Games. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization - UMAP ’16 (pp. 301–302). New York, New York, USA: ACM Press. https://doi.org/10.1145/2930238.2930271
Owen, V. E., & Baker, R. S. (2018). Fueling Prediction of Player Decisions: Foundations of Feature Engineering for Optimized Behavior Modeling in Serious Games. Technology, Knowledge and Learning, (123456789). https://doi.org/10.1007/s10758-018-9393-9
Pareto, L. (2014).
A teachable agent game engaging primary school children to learn arithmetic concepts and reasoning. International Journal of Artificial Intelligence in Education, 24(3), 251–283. https://doi.org/10.1007/s40593-014-0018-8
Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatterjee, S. (2007). A design science research methodology for information systems research. Journal of Management Information Systems. https://doi.org/10.2753/MIS0742-1222240302
Pereira, H. A., De Souza, A. F., & De Menezes, C. S. (2016). A computational architecture for learning analytics in game-based learning. Proceedings - IEEE 16th International Conference on Advanced Learning Technologies, ICALT 2016, 191–193. https://doi.org/10.1109/ICALT.2016.3
Pérez-Colado, I., Alonso-Fernández, C., Freire, M., Martínez-Ortiz, I., & Fernández-Manjón, B. (2018). Game learning analytics is not informagic! In 2018 IEEE Global Engineering Education Conference (EDUCON) (pp. 1729–1737). IEEE. https://doi.org/10.1109/EDUCON.2018.8363443
Pérez-Colado, I. J., Calvo-Morata, A., Alonso-Fernández, C., Freire, M., Martínez-Ortiz, I., & Fernández-Manjón, B. (2019). Simva: Simplifying the Scientific Validation of Serious Games. In 2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT) (pp. 113–115). IEEE. https://doi.org/10.1109/ICALT.2019.00033
Pérez-Colado, I., Pérez-Colado, V., Martínez-Ortiz, I., Freire, M., & Fernández-Manjón, B. (2017). uAdventure: The eAdventure reboot - Combining the experience of commercial gaming tools and tailored educational tools. In IEEE Global Engineering Education Conference (EDUCON) (pp. 1754–1761). Retrieved from http://www.e-ucm.es/drafts/e-UCM_draft_304.pdf
Petri, G., & Gresse von Wangenheim, C. (2017). How games for computing education are evaluated? A systematic literature review. Computers & Education, 107, 68–90. https://doi.org/10.1016/j.compedu.2017.01.004
Petrov, E. V., Mustafina, J., Alloghani, M., Galiullin, L., & Tan, S. Y. (2018).
Learning Analytics and Serious Games: Analysis of Interrelation. In 2018 11th International Conference on Developments in eSystems Engineering (DeSE) (Vol. 2018–Septe, pp. 153–156). IEEE. https://doi.org/10.1109/DeSE.2018.00037
Plass, J. L., Homer, B. D., Kinzer, C. K., Chang, Y. K., Frye, J., Kaczetow, W., … Perlin, K. (2013). Metrics in Simulations and Games for Learning. In Game Analytics (pp. 697–729). London: Springer London. https://doi.org/10.1007/978-1-4471-4769-5_31
Polyak, S. T., von Davier, A. A., & Peterschmidt, K. (2017). Computational psychometrics for the measurement of collaborative problem solving skills. Frontiers in Psychology, 8(NOV), 1–16. https://doi.org/10.3389/fpsyg.2017.02029
Project Jupyter. (2020). Jupyter. Retrieved November 1, 2020, from https://jupyter.org/
Rahimi, S., Shute, V., Kuba, R., Dai, C.-P., Yang, X., Smith, G., & Alonso Fernández, C. (2021). The use and effects of incentive systems on learning and performance in educational games. Computers & Education, 165, 104135. https://doi.org/10.1016/j.compedu.2021.104135
Roberts, J. D., Chung, G. K. W. K., & Parks, C. B. (2016). Supporting children’s progress through the PBS KIDS learning analytics platform. Journal of Children and Media, 10(2), 257–266. https://doi.org/10.1080/17482798.2016.1140489
Romero, C., & Ventura, S. (2010). Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), 601–618. https://doi.org/10.1109/TSMCC.2010.2053532
Rowe, E., Asbell-Clarke, J., & Baker, R. S. (2015). Serious Games Analytics to Measure Implicit Science Learning. In C. S. Loh, Y. Sheng, & D. Ifenthaler (Eds.), Serious Games Analytics (pp. 343–360). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4
Rowe, E., Asbell-Clarke, J., Baker, R. S., Eagle, M., Hicks, A. G., Barnes, T. M., … Edwards, T. (2017). Assessing implicit science learning in digital games.
Computers in Human Behavior, 76, 617–630. https://doi.org/10.1016/j.chb.2017.03.043
Ruiperez-Valiente, J. A. (2020). The Implementation Process of Learning Analytics. Ried-Revista Iberoamericana De Educacion a Distancia, 23(2), 88–101. https://doi.org/10.5944/ried.23.1.26283
Sabourin, J. L., Shores, L. R., Mott, B. W., & Lester, J. C. (2013). Understanding and predicting student self-regulated learning strategies in game-based learning environments. International Journal of Artificial Intelligence in Education, 23(1–4), 94–114. https://doi.org/10.1007/s40593-013-0004-6
Seif El-Nasr, M., Drachen, A., & Canossa, A. (2013). Game Analytics. (M. Seif El-Nasr, A. Drachen, & A. Canossa, Eds.). London: Springer London. https://doi.org/10.1007/978-1-4471-4769-5
Seppala, T. J. (2016). CivilizationEDU takes the strategy franchise to school. Retrieved December 18, 2020, from https://www.engadget.com/2016-06-24-civilizationedu-takes-the-strategy-franchise-to-school.html
Serrano-Laguna, Á., Manero, B., Freire, M., & Fernández-Manjón, B. (2017). A methodology for assessing the effectiveness of serious games and for inferring player learning outcomes. Multimedia Tools and Applications, 77(2), 2849–2871. https://doi.org/10.1007/s11042-017-4467-6
Serrano-Laguna, Á., Martínez-Ortiz, I., Haag, J., Regan, D., Johnson, A., & Fernández-Manjón, B. (2017). Applying standards to systematize learning analytics in serious games. Computer Standards & Interfaces, 50, 116–123. https://doi.org/10.1016/j.csi.2016.09.014
Serrano-Laguna, Á., Torrente, J., Moreno-Ger, P., & Fernández-Manjón, B. (2012). Tracing a little for big improvements: Application of learning analytics and videogames for student assessment. In Procedia Computer Science (Vol. 15, pp. 203–209). Elsevier.
Serrano-Laguna, Á., Torrente, J., Moreno-Ger, P., & Fernández-Manjón, B. (2014). Application of Learning Analytics in educational videogames. Entertainment Computing, 5(4), 313–322.
https://doi.org/10.1016/j.entcom.2014.02.003
Serrano Laguna, Á. (2017). Mejorando la evaluación de juegos serios mediante el uso de analíticas de aprendizaje. Universidad Complutense de Madrid.
Sharples, M., & Domingue, J. (2016). Adaptive and Adaptable Learning. Lecture Notes in Computer Science. Switzerland, 9891, 13–16. https://doi.org/10.1007/978-3-319-45153-4
Shoukry, L., Göbel, S., & Steinmetz, R. (2014). Learning Analytics and Serious Games: Trends and Considerations. In Proceedings of the 2014 ACM International Workshop on Serious Games. https://doi.org/10.1145/2656719.2656729
Shute, V. J., & Moore, G. R. (2017). Consistency and Validity in Game-Based Stealth Assessment. In Technology Enhanced Innovative Assessment: Development, Modeling, and Scoring From an Interdisciplinary Perspective.
Shute, V. J., Ventura, M., & Kim, Y. J. (2013). Assessment and Learning of Qualitative Physics in Newton’s Playground. The Journal of Educational Research, 106(6), 423–430. https://doi.org/10.1080/00220671.2013.832970
Shute, V., & Kim, Y. J. (2014). Formative and stealth assessment. In Handbook of Research on Educational Communications and Technology: Fourth Edition (pp. 311–321). https://doi.org/10.1007/978-1-4614-3185-5_3
Shute, V., & Ventura, M. (2013). Stealth Assessment. In The SAGE Encyclopedia of Educational Technology (p. 91). 2455 Teller Road, Thousand Oaks, California 91320: SAGE Publications, Inc. https://doi.org/10.4135/9781483346397.n278
Slimani, A., Elouaai, F., Elaachak, L., Yedri, O. B., & Bouhorma, M. (2018). Learning analytics through serious games: Data mining algorithms for performance measurement and improvement purposes. International Journal of Emerging Technologies in Learning, 13(1), 46–64. https://doi.org/10.3991/ijet.v13i01.7518
Smith, S. P., Blackmore, K., & Nesbitt, K. (2015). A Meta-Analysis of Data Collection in Serious Games Research. In Serious Games Analytics (pp. 31–55). Cham: Springer International Publishing.
https://doi.org/10.1007/978-3-319-05834-4_2
Smith, S. P., Hickmott, D., Southgate, E., Bille, R., & Stephens, L. (2016). Exploring Play-Learners’ Analytics in a Serious Game for Literacy Improvement. In T. Marsh, M. Ma, M. F. Oliveira, J. Baalsrud Hauge, & S. Göbel (Eds.), Serious Games (Vol. 9894, pp. 13–24). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-45841-0_2
Snell, J., Atkins, M., Norris, W., Messina, C., Wilkinson, M., & Dolin, R. (2011). JSON Activity Streams 1.0. Act. Streams Work., 22(8), 2013.
Snow, E. L., Allen, L. K., & McNamara, D. S. (2015). The Dynamical Analysis of Log Data Within Educational Games. In Serious Games Analytics (pp. 81–100). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-05834-4_4
Stamper, J. C., Lomas, D., Ching, D., Ritter, S., Koedinger, K. R., & Steinhart, J. (2012). The Rise of the Super Experiment. Proceedings of the 5th International Conference on Educational Data Mining, 196–199. https://doi.org/10.1177/0003122412458508
Stanford Medicine. (2013). SICKO. Retrieved April 24, 2018, from https://med.stanford.edu/news/all-news/2013/09/stanford-designed-game-teaches-surgical-decision-making.html
Steiner, C. M., Kickmeier-Rust, M. D., & Albert, D. (2015). Making sense of game-based user data: Learning analytics in applied games. International Conference E-Learning, 195–198. https://doi.org/10.1017/CBO9781107415324.004
Streicher, A., & Roller, W. (2017). Interoperable Adaptivity and Learning Analytics for Serious Games in Image Interpretation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10474 LNCS, pp. 598–601). https://doi.org/10.1007/978-3-319-66610-5_71
Streicher, A., & Smeddinck, J. D. (2016). Personalized and Adaptive Serious Games. In R. Dörner, S. Göbel, M. Kickmeier-Rust, M. Masuch, & K. Zweig (Eds.), Springer (Vol. 9970, pp. 332–377).
Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-46152-6_14
Su, Y., Backlund, P., & Engström, H. (2020). Comprehensive review and classification of game analytics. In Service Oriented Computing and Applications. https://doi.org/10.1007/s11761-020-00303-z
Sucholutsky, I., & Schonlau, M. (2020). “Less Than One”-Shot Learning: Learning N Classes From M