UNIVERSIDAD COMPLUTENSE DE MADRID
FACULTAD DE FILOLOGÍA

DOCTORAL THESIS

ATTRIBUTION OF AUTHORSHIP OF ARDEN OF FAVERSHAM: A FORENSIC LINGUISTIC STUDY OF WILLIAM SHAKESPEARE AND CHRISTOPHER MARLOWE

ATRIBUCIÓN DE AUTORÍA DE ARDEN OF FAVERSHAM: UN ESTUDIO LINGÜÍSTICO FORENSE DE WILLIAM SHAKESPEARE Y CHRISTOPHER MARLOWE

Thesis submitted for the degree of Doctor by
Juan Antonio Latorre García

Supervisors:
Dr. María Goicoechea de Jorge
Dr. Elena Martínez Caro

Madrid, 2021
© Juan Antonio Latorre García, 2022

ACKNOWLEDGEMENTS

As could not be otherwise, I want to begin this section with a few words for my supervisors, María Goicoechea de Jorge and Elena Martínez Caro. María, thank you for the brilliance of your ideas. If you had not told me about Arden of Faversham, this thesis would have been something completely different. I also want you to know that I will always be grateful for the human warmth and empathy you showed me at the most delicate moment of my academic life.
Elena, thank you for making me an infinitely better linguist than I was when we met. After these five years, I see you more as a friend than as a supervisor, and that reflects the way you care about the people around you.

Krzysztof Kredens, thank you for supervising my research mobility and for teaching me so much about forensic linguistics. When I look back, I realize I barely knew anything about the discipline when I got there. I admire you intellectually and I have enjoyed each of our conversations.

Rui Sousa-Silva and Gerardo Sierra, thank you for your commendable work as external reviewers of the thesis. The implementation of your suggestions has increased its quality substantially. I hope we can stay in touch after the oral defence, since I would love to keep learning from you.

Victoria Martín de la Rosa, thank you for being the first person at the Complutense who saw something special in me and for your constant interest throughout all these years. Although my shyness does not always allow me to be more expressive, I hope you know that I appreciate you enormously and that you can always count on me.

David Vallejo and Alekos Camino, thank you for having answered, some one thousand two hundred and twenty-seven times, the question “Does this sentence sound right to you?” Thank you also for helping me with the layout of the thesis and, above all, for your infinite patience.

Carlos Antón, thank you for embarking on the ALTXA Project with me. Who would have thought it when we met fifteen years ago. Do you remember when you grew your hair long and people picked on you at school? I do not know if I ever told you, but I always admired you for having enough personality to keep wearing it that way for as long as you wanted. You are a good friend and I am proud of the person you are.

Irene Mezquita, thank you for having made me so, so happy. You will always be “my little flower”.
If there are two people to whom I want to dedicate this thesis, they are my father, José Antonio Latorre, and my mother, Amalia García. You have taught me how to be a good father. Dad, thank you for instilling in me the importance of loving what I do. Mum, thank you for helping me develop my sensitivity. I hope I can enjoy your company for much longer (Dad, stop smoking).

I have written these acknowledgements a few days before submitting the thesis, and I feel like a Joyce character in the middle of an epiphany. Precisely for that reason, because I believe that right now I possess a vision of reality that will vanish sooner rather than later, I want to leave something in writing for myself. When you read this thesis again in a few months or a few years and you think of a way to improve it or find a typo (I know you will not stop until you do), do not be a jerk to yourself. You can go on being one when judging the other aspects of your life, but not this one. Do me that favour, mate.

And now, as Humbert Humbert said, “look at this tangle of thorns”.

TABLE OF CONTENTS

Table of contents
Abstract
Resumen
List of tables
List of figures

CHAPTER 1 | INTRODUCTION
1.1. Background and rationale for research
1.2. Objectives and hypotheses
1.3. Overview and organization of the thesis

CHAPTER 2 | HISTORICAL AND LITERARY BACKGROUND
2.1. William Shakespeare
2.2. Christopher Marlowe
2.3. The anonymous play Arden of Faversham
2.4. Summary

CHAPTER 3 | LINGUISTIC BACKGROUND: AN INTRODUCTION TO FORENSIC LINGUISTICS AND AUTHORSHIP ATTRIBUTION STUDIES
3.1. Definition of forensic linguistics
3.2. Historical development of forensic linguistics
3.3. Areas of forensic linguistics
3.3.1. The written language of the law
3.3.2. The spoken language of the law
3.3.3. The linguist as an expert witness
3.4. Authorship attribution studies
3.4.1. Attribution of authorship in cases of plagiarism
3.4.2. Attribution of authorship of criminal texts with an open set of suspects
3.4.3. Attribution of authorship of criminal texts with a close set of suspects
3.4.4. Attribution of authorship of historical texts
3.5. Summary

CHAPTER 4 | METHODOLOGY
4.1. Delimitation of the scope of the investigation
4.2. Data collection
4.3. Extraction and adaptation of the samples
4.4. Structure of the analysis
4.5. Selection of the authorship tests for the analysis and the role of ALTXA
4.5.1. Quantification of the relative frequency of keywords
4.5.2. Quantification of the average number of words per sentence
4.5.3. Quantification of the lexical richness
4.5.4. N-gram tracing
4.5.5. The Zeta test
4.6. Summary

CHAPTER 5 | PRE-STUDIES
5.1. Pre-study on the calculation of the average number of words per sentence (Pre-study 1)
5.1.1. Average number of words per sentence of scenes of between 100 and 450 words
5.1.2. Average number of words per sentence of scenes of between 500 and 950 words
5.1.3. Average number of words per sentence of scenes of between 1,100 and 1,700 words
5.1.4. Average number of words per sentence of scenes of almost 2,000 words or more
5.1.5. Conclusions derived from Pre-study 1
5.2. Pre-study on the calculation of the lexical richness (Pre-study 2)
5.2.1. Lexical richness of scenes of between 100 and 450 words
5.2.2. Lexical richness of scenes of between 500 and 950 words
5.2.3. Lexical richness of scenes of between 1,100 and 1,700 words
5.2.4. Lexical richness of scenes of almost 2,000 words or more
5.2.5. Conclusions derived from Pre-study 2
5.3. Pre-study on n-gram tracing (Pre-study 3)
5.3.1. N-gram tracing with scenes of between 100 and 450 words
5.3.2. N-gram tracing with scenes of between 500 and 950 words
5.3.3. N-gram tracing with scenes of between 1,100 and 1,700 words
5.3.4. N-gram tracing with scenes of almost 2,000 words or more
5.3.5. Conclusions derived from Pre-study 3
5.4. Pre-study on the conduction of the Zeta test (Pre-study 4)
5.4.1. Zeta test with scenes of almost 2,000 words or more
5.4.2. Interpretation of the results
5.4.3. Conclusions derived from Pre-study 4
5.5. Summary

CHAPTER 6 | CASE STUDY: ATTRIBUTION OF AUTHORSHIP OF THE SCENES OF ARDEN OF FAVERSHAM
6.1. Scene I.i (5,135 words)
6.2. Scene II.i (916 words)
6.3. Scene II.ii (1,694 words)
6.4. Scene III.i (822 words)
6.5. Scene III.ii (516 words)
6.6. Scene III.iii (357 words)
6.7. Scene III.iv (240 words)
6.8. Scene III.v (1,293 words)
6.9. Scene III.vi (1,265 words)
6.10. Scene IV.i (838 words)
6.11. Scene IV.ii (263 words)
6.12. Scene IV.iii (593 words)
6.13. Scene IV.iv (1,251 words)
6.14. Scene V.i (3,477 words)
6.15. Scene V.ii (106 words)
6.16. Scene V.iii (179 words)
6.17. Scene V.iv (117 words)
6.18. Scene V.v (321 words)
6.19. Epilogue or Scene V.vi (148 words)
6.20. Summary

CHAPTER 7 | DISCUSSION OF THE RESULTS

CHAPTER 8 | CONCLUSION AND FUTURE LINES OF RESEARCH
8.1. Summary and implications of the findings
8.2. Limitations and future lines of research

PRIMARY SOURCES

BIBLIOGRAPHY AND REFERENCES

APPENDICES
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Appendix 5

ABSTRACT

This research project sets out to accomplish two main objectives. The first is to determine the authorship of the Elizabethan play Arden of Faversham through a forensic linguistic analysis that considers William Shakespeare and Christopher Marlowe as the possible candidates. The second is to develop the computational program ALTXA, which can carry out authorship attribution tests within the disciplinary framework of forensic linguistics and has an intuitive interface, which will facilitate the work of other linguists and the spread of studies of this kind in educational contexts. Firstly, some biographical data on Shakespeare and Marlowe are offered to establish a connection between them that justifies their possible cooperation in the writing of Arden of Faversham, together with a historical and literary analysis of the play itself.
Afterwards, forensic linguistics is defined, and basic notions about its historical development and main areas of study are provided in order to narrow the scope of the thesis progressively until authorship attribution studies are presented and explained in more depth, with special emphasis on previous investigations into the authorship of Arden of Faversham. These sections are not merely descriptive, since they include theoretical contributions that anticipate the methodological approach selected for the subsequent analysis.

To study the authorship of Arden of Faversham, a corpus of undisputed plays was compiled for each of the two candidates, following the hypothesis that, if the idiolect of an author is a dynamic phenomenon, these reference corpora should consist of plays written in a period similar to that in which the disputed work was created, with which they should also share a tragic tone. In addition, on the assumption that the validity of each attribution method depends on the type of text and the authors to which it is applied, the thesis is divided into a series of pre-studies and a case study. The pre-studies evaluate which authorship attribution methods are highly effective at distinguishing between undisputed scenes by Shakespeare and Marlowe depending on their length. These scenes were divided into four groups according to their word count: from 100 to 450, from 500 to 950, from 1,100 to 1,700, and almost 2,000 or more. To carry out the pre-studies, five authorship attribution methods were selected and programmed as functionalities of ALTXA. These are based on the calculation of the relative frequency of a list of keywords selected by the researcher, the quantification of the average number of words per sentence of the texts and of their lexical richness, the tracing of common n-grams, and the Zeta test.
The first of these methods was eventually discarded because of its reliance on subjective criteria, whereas the others were included in the pre-studies. The identification of common n-grams proved effective in distinguishing between Shakespearean and Marlowian scenes from all four groups, whereas the Zeta test proved reliable for analysing scenes from the fourth group. Consequently, these were the methods employed in the case study, that is, in the attribution of authorship of the scenes of Arden of Faversham, which were studied independently, since the play may have been written in collaboration.

The results of the case study associate the authorship of 15 of the 19 scenes of the play with Marlowe, whereas only one of them bears a closer resemblance to the Shakespearean idiolect. The three remaining scenes present inconclusive results. Even though other Elizabethan playwrights need to be included as possible candidates in future research, this thesis provides sufficient evidence to suggest that the participation of Shakespeare in the writing of Arden of Faversham is minor or non-existent, which is already a significant finding that contradicts what other scholars have stated. Furthermore, it also suggests that the participation of Marlowe is undeniable, especially in the writing of Scene V.i, whose results are so overwhelming that it seems unthinkable that it could have been written by another author.

In sum, the present doctoral thesis attributes to Marlowe the authorship of a section of the Elizabethan play Arden of Faversham, which has been catalogued as anonymous for over four centuries. This breakthrough has been accomplished with the assistance of the software ALTXA, which will be used to build an educational project that aims to contribute to the development of the discipline, which has been evolving constantly over the last decades as a result of the emergence of new technologies.
viii RESUMEN Esta investigación pretende cumplir dos objetivos principales. Por un lado, determinar la autoría de la obra teatral isabelina Arden of Faversham mediante un análisis lingüístico forense con William Shakespeare y Christopher Marlowe como posibles candidatos. Por otro lado, desarrollar el programa informático ALTXA, capaz de llevar a cabo tareas de atribución de autoría comunes en el ámbito disciplinario de la lingüística forense a través de una interfaz intuitiva, lo que permitirá facilitar la labor de otros lingüistas y la difusión de este tipo de estudios en contextos docentes. Primeramente, se aportan datos biográficos de Shakespeare y Marlowe para establecer una conexión entre ellos que justifique su posible colaboración en la elaboración de Arden of Faversham, así como un breve análisis literario e histórico de la propia obra. Posteriormente, se define qué es la lingüística forense y se ofrecen una serie de nociones básicas acerca de su desarrollo histórico y principales áreas de estudio con el propósito de acotar progresivamente el foco de la tesis hasta que los estudios de atribución de autoría son presentados y explicados de forma más exhaustiva, con un énfasis especial en aquellos estudios previos sobre la autoría de Arden of Faversham. Estas secciones de la tesis no son puramente descriptivas, sino que incluyen contribuciones teóricas que anticipan el enfoque metodológico seleccionado para realizar el análisis posterior. Para estudiar la autoría de Arden of Faversham, se compiló un corpus de obras indubitadas para cada uno de los dos candidatos de la investigación bajo la hipótesis de que, si el idiolecto de un autor es un fenómeno dinámico, estos corpus de referencia deben estar formados únicamente por obras teatrales que fueron escritas en un período similar al de la obra disputada, con la que además deben compartir un tono trágico. 
Asimismo, con la creencia de que la validez de cada método de atribución depende del tipo de texto y los autores sobre los que se aplica, esta tesis está dividida en una serie de estudios previos y un estudio de caso. Los estudios previos tienen el propósito de evaluar qué métodos de atribución de autoría poseen un alto índice de efectividad para distinguir entre escenas indubitadas de Shakespeare y Marlowe en función de la longitud de estas, las cuales fueron divididas en cuatro grupos. El rango de palabras que presenta cada grupo de escenas es de entre 100 y 450, entre 500 y 950, entre 1.100 y 1.700 y casi 2.000 o más. Para la realización de estos estudios previos, se eligieron cinco métodos de atribución de autoría que fueron ix programados como funcionalidades de ALTXA. Estos se basan en el cálculo de la frecuencia relativa de una lista de palabras clave seleccionadas por el investigador, la cuantificación del número medio de palabras por frase de los textos y su riqueza léxica, la identificación de n-gramas en común y la conducción del Zeta test. El primero de estos métodos fue finalmente descartado por su carácter subjetivo, mientras que los demás sí formaron parte de los estudios previos. La identificación de n-gramas comunes demostró su efectividad para distinguir entre escenas de Shakespeare y Marlowe de los cuatro grupos, mientras que el Zeta test probó su efectividad con las escenas del cuarto. Por ello, estos fueron los métodos empleados en el estudio de caso, es decir, en la atribución de autoría de las escenas de Arden of Faversham, las cuales fueron estudiadas de forma independiente, puesto que la obra pudo haber sido escrita en colaboración. Los resultados del estudio de caso asocian la autoría de 15 de las 19 escenas de la obra con Marlowe, mientras que solo una de ellas guarda un índice de similitud mayor con el idiolecto shakespeareano. Las tres escenas restantes presentan resultados no concluyentes. 
A pesar de que existe la necesidad de incluir a otros dramaturgos isabelinos como posibles candidatos en futuras investigaciones, esta tesis ofrece pruebas suficientes para sugerir que la participación de Shakespeare en la elaboración de Arden of Faversham es menor o inexistente, lo cual ya es un hallazgo valioso que contradice lo expuesto por otros académicos. Asimismo, también sugiere que la participación de Marlowe es innegable, especialmente en la primera escena del quinto acto, donde los resultados son tan abrumadores que parece impensable que esta pueda haber sido escrita por otro autor. En suma, la presente tesis doctoral atribuye a Christopher Marlowe la autoría de una parte de la obra isabelina Arden of Faversham, la cual ha permanecido catalogada como anónima durante más de cuatro siglos. Este hallazgo ha sido posible gracias al software ALTXA, sobre el cual se pretende construir un proyecto docente que contribuya al desarrollo de la disciplina, que ha estado evolucionando de forma constante durante las últimas décadas como consecuencia de la irrupción de las nuevas tecnologías. x LIST OF TABLES Table 1 | Length of the scenes of Arden of Faversham .................................................. 84 Table 2 | Stage 1 of the pre-study on the average number of words per sentence ........ 108 Table 3 | Stage 2 of the pre-study on the average number of words per sentence ........ 110 Table 4 | Stage 3 of the pre-study on the average number of words per sentence ........ 111 Table 5 | Stage 4 of the pre-study on the average number of words per sentence ........ 113 Table 6 | Stage 1 of the pre-study on the lexical richness ............................................. 115 Table 7 | Stage 2 of the pre-study on the lexical richness ............................................. 117 Table 8 | Stage 3 of the pre-study on the lexical richness ............................................. 
119 Table 9 | Stage 4 of the pre-study on the lexical richness ............................................. 121 Table 10 | N-gram tracing with Scene II.iii from Richard III ....................................... 125 Table 11 | N-gram tracing with Scene III.iii from Richard III ..................................... 126 Table 12 | N-gram tracing with Scene V.ii from Richard III........................................ 127 Table 13 | N-gram tracing with Scene II.iv from Richard II ........................................ 128 Table 14 | N-gram tracing with Scene III.i from Richard II ......................................... 129 Table 15 | N-gram tracing with Scene II.iii from Edward II ........................................ 130 Table 16 | N-gram tracing with Scene III.i from Edward II ......................................... 131 Table 17 | N-gram tracing with Scene IV.i from Edward II ......................................... 132 Table 18 | N-gram tracing with Scene IV.iv from Edward II ....................................... 133 Table 19 | N-gram tracing with Scene III.i from The Jew of Malta.............................. 134 Table 20 | N-gram tracing with Scene II.iv from Richard III ....................................... 136 Table 21 | N-gram tracing with Scene III.iv from Richard III...................................... 137 Table 22 | N-gram tracing with Scene IV.ii from Richard III ...................................... 138 Table 23 | N-gram tracing with Scene III.iv from Richard II ....................................... 139 Table 24 | N-gram tracing with Scene V.i from Richard II .......................................... 140 Table 25 | N-gram tracing with Scene II.i from Edward II .......................................... 141 xi Table 26 | N-gram tracing with Scene III.iii from Edward II ....................................... 142 Table 27 | N-gram tracing with Scene III.iii from The Jew of Malta ........................... 
143 Table 28 | N-gram tracing with Scene IV.v from The Jew of Malta ............................ 144 Table 29 | N-gram tracing with Scene V.i from The Jew of Malta ............................... 145 Table 30 | N-gram tracing with Scene I.i from Richard III .......................................... 146 Table 31 | N-gram tracing with Scene II.ii from Richard III ........................................ 147 Table 32 | N-gram tracing with Scene I.i from Richard II............................................ 148 Table 33 | N-gram tracing with Scene II.ii from Richard II ......................................... 149 Table 34 | N-gram tracing with Scene II.iii from Richard II ........................................ 150 Table 35 | N-gram tracing with Scene I.i from Edward II ............................................ 151 Table 36 | N-gram tracing with Scene III.ii from Edward II ........................................ 152 Table 37 | N-gram tracing with Scene V.i from Edward II .......................................... 154 Table 38 | N-gram tracing with Scene I.i from The Jew of Malta ................................ 155 Table 39 | N-gram tracing with Scene IV.iv from The Jew of Malta ........................... 156 Table 40 | N-gram tracing with Scene I.iii from Richard III ........................................ 157 Table 41 | N-gram tracing with Scene IV.iv from Richard III ..................................... 158 Table 42 | N-gram tracing with Scene V.iii from Richard III ...................................... 159 Table 43 | N-gram tracing with Scene I.iii from Richard II ......................................... 160 Table 44 | N-gram tracing with Scene II.i from Richard II .......................................... 161 Table 45 | N-gram tracing with Scene I.iv from Edward II .......................................... 163 Table 46 | N-gram tracing with Scene II.ii from Edward II ......................................... 
164 Table 47 | N-gram tracing with Scene I.ii from The Jew of Malta ............................... 165 Table 48 | N-gram tracing with Scene II.iii from The Jew of Malta ............................. 166 Table 49 | N-gram tracing with Scene I.i from Arden of Faversham ........................... 182 Table 50 | N-gram tracing with Scene II.i from Arden of Faversham .......................... 186 Table 51 | N-gram tracing with Scene II.ii from Arden of Faversham ......................... 187 Table 52 | N-gram tracing with Scene III.i from Arden of Faversham......................... 189 xii Table 53 | N-gram tracing with Scene III.ii from Arden of Faversham ....................... 191 Table 54 | N-gram tracing with Scene III.iii from Arden of Faversham ...................... 192 Table 55 | N-gram tracing with Scene III.iv from Arden of Faversham....................... 193 Table 56 | N-gram tracing with Scene III.v from Arden of Faversham ........................ 194 Table 57 | N-gram tracing with Scene III.vi from Arden of Faversham....................... 195 Table 58 | N-gram tracing with Scene IV.i from Arden of Faversham ........................ 196 Table 59 | N-gram tracing with Scene IV.ii from Arden of Faversham ....................... 197 Table 60 | N-gram tracing with Scene IV.iii from Arden of Faversham ...................... 198 Table 61 | N-gram tracing with Scene IV.iv from Arden of Faversham ...................... 199 Table 62 | N-gram tracing with Scene V.i from Arden of Faversham .......................... 200 Table 63 | N-gram tracing with Scene V.ii from Arden of Faversham......................... 203 Table 64 | N-gram tracing with Scene V.iii from Arden of Faversham ....................... 204 Table 65 | N-gram tracing with Scene V.iv from Arden of Faversham ........................ 205 Table 66 | N-gram tracing with Scene V.v from Arden of Faversham ......................... 
Table 67 | N-gram tracing with Scene V.vi from Arden of Faversham
Table 68 | Summary of the results derived from the case study

LIST OF FIGURES

Figure 1 | Interface of ALTXA for text analysis
Figure 2 | Interface of ALTXA for n-gram tracing
Figure 3 | Interface of ALTXA for the Zeta test
Figure 4 | Zeta test with Scene I.ii from Richard III
Figure 5 | Zeta test with Scene I.iii from Richard III
Figure 6 | Zeta test with Scene V.iii from Richard III
Figure 7 | Zeta test with Scene I.iii from Richard II
Figure 8 | Zeta test with Scene IV.i from Richard II
Figure 9 | Zeta test with Scene I.iv from Edward II
Figure 10 | Zeta test with Scene II.ii from Edward II
Figure 11 | Zeta test with Scene I.ii from The Jew of Malta
Figure 12 | Zeta test with Scene II.iii from The Jew of Malta
Figure 13 | Zeta test with Scene I.i from Arden of Faversham
Figure 14 | Zeta test with Scene II.ii from Arden of Faversham
Figure 15 | Zeta test with Scene V.i from Arden of Faversham

CHAPTER 1 | INTRODUCTION

1.1. Background and rationale for research

The present doctoral thesis intends to conduct a forensic linguistic analysis of the authorship of the Elizabethan play Arden of Faversham, considering William Shakespeare and Christopher Marlowe as the possible candidates. This analysis will be carried out with a software tool named ALTXA, which has been specifically designed for this purpose and whose implementation in educational and professional contexts stands as the second main objective of the thesis.

My interest in the forensic analysis of Elizabethan texts developed during my MA dissertation, entitled Attribution of Authorship of “The Merchant of Venice” and “Henry VI” through Linguistic Parameters: A Contrastive Study between William Shakespeare and Christopher Marlowe. While The Merchant of Venice has been attributed to Shakespeare without major doubts for centuries, Henry VI, Part I had been attributed to Shakespeare and Marlowe as a collaborative play (see Section 2.2) only a few years before I started working on it. Given my lack of expertise in the subject at the time, the main objective of my MA dissertation was to work on the authorship of well-attributed plays to determine whether its approach could reach conclusions similar to those presented by experts in the field.

As suggested by one of my supervisors, Dr. María Goicoechea, I decided to focus on the authorship of Arden of Faversham in this doctoral thesis, given that it could constitute a natural continuation of my previous work. Arden of Faversham is an Elizabethan play that remains anonymous, and hence this project could move a step further than analysing already well-attributed plays and fill a gap in knowledge, since there is little research on this issue from a forensic linguistic perspective.
The play Arden of Faversham was written around 1592 and, despite the existence of studies that have attempted to link its authorship to Shakespeare, Marlowe and other playwrights (see Section 2.3), it is still considered anonymous due to a lack of conclusive evidence (Kinney, 2009). The play narrates the killing of a landowner from Faversham named Arden by his wife, his wife’s lover and two professional criminals. It was inspired by a real event that had been documented by Raphael Holinshed in his historical work entitled Chronicles of England, Scotland and Ireland (1577; second edition, 1587). The fact that the text has remained anonymous makes it suitable for a study of this kind, whose approach will be briefly described in the following paragraphs.

Forensic linguistics can be defined as a relatively recent branch of applied linguistics that focuses on legal cases in which the use of language is involved to some extent (Tiersma, 1993; McMenamin, 2002; Gibbons, 2003; Olsson, 2008; Momeni, 2011; Perkins & Grant, 2012; Correa, 2013; Udina, 2017). One of the many applications of this discipline, which will be addressed in detail in Chapter 3, is the attribution of authorship of anonymous or disputed texts, such as threatening notes, suicide letters and, as in the case of the present research, literary texts. Even though the establishment of the authorship of Arden of Faversham has no major legal implications, the forensic approach adopted for the thesis is justified by the development of computational tools over the last decades, which allows researchers to take into account statistical variables that could not be accessed before and thus produce more precise results than previous studies conducted from both literary and linguistic perspectives (Kinney, 2009).
In other words, the present investigation aims to fill a gap in knowledge that has persisted for over four centuries by using computational resources that facilitate the adoption of innovative empirical approaches that differ from more traditional ones, such as those that characterize the field of literary criticism.

It is probable that Arden of Faversham was written in collaboration, since most of the plays written during the Elizabethan period had more than one author (Kermode, 2005; Holland, 2007). Following this line of thought, the attribution of authorship of the disputed text, that is, Arden of Faversham, will consist of 19 distinct analyses, one for each of its scenes, given that if two or more playwrights were involved in the writing of the play, a reasonable possibility is that they divided it in terms of the thematic content of its scenes (see Section 4.4). This means that the scenes of Arden of Faversham will be analysed as independent texts to obtain results that may provide substantial evidence for the presence of more than one author involved in its creation, which would reflect more faithfully the reality of the time in which it was written.

As will be developed in Section 4.5, the methods with which the scenes of Arden of Faversham will be analysed are based on the quantification of linguistic variables, given that the study belongs to the disciplinary field of forensic linguistics. Studies of this kind are built on the notion of idiolect, understood as the variety of the language that each individual uses and that is reflected in their written or spoken production (Coulthard, 2004).
Hence, authorship attribution studies within the field of forensic linguistics are based on the study of the idiolectal features of the possible authors of the disputed text: their undisputed works, that is, those texts that have been attributed to them beyond any reasonable doubt, are analysed and subsequently compared with the disputed text to discern which of the idiolectal models it resembles more closely (Coulthard et al., 2010). The tests that will be considered for the attribution of authorship of Arden of Faversham will be presented in Section 1.2.

As can be inferred from the previous paragraph, the criteria for the compilation of the corpora of undisputed works of each candidate of the study, also known as the reference corpora, are of paramount importance for the development of the research and may have a crucial impact on its outcome. The present doctoral thesis proposes an approach to compiling these corpora that differs from that of previous studies on the same subject, which will be briefly discussed in Section 1.2 and addressed in depth in Section 4.2. In addition, a series of methodological decisions will be made when carrying out certain tests to increase their effectiveness, which will also be mentioned in Section 1.2 and discussed in more detail in Section 4.5.

As pointed out at the beginning of the chapter, this doctoral thesis has two main interrelated objectives. It seeks to investigate the authorship of Arden of Faversham and to develop a computational tool for authorship attribution studies, which might facilitate the work of the forensic linguist and the implementation of these studies in educational contexts.
With this purpose in mind, ALTXA, a program that presents an intuitive interface and supports a wide range of authorship tests that are common in the field of forensic linguistics, has been created in collaboration with computer programmer Carlos Antón and will be offered as free software to the academic community. The main reasons underlying the creation of this software will be addressed in the following section, whereas its functionalities and what makes it different from other computational tools for text analysis will be expounded in Section 4.5.

In sum, the present investigation aims to analyse the authorship of Arden of Faversham, a literary text that has remained anonymous since the Elizabethan period, considering Shakespeare and Marlowe as the candidates for such attribution. An innovative approach to the compilation of the reference corpora and the application of certain authorship tests will be adopted. The analysis of the scenes of Arden of Faversham will be carried out with the newly designed software ALTXA, which has been specifically programmed for this research and whose validity in authorship attribution studies within the framework of forensic linguistics the thesis seeks to demonstrate. These objectives and more specific questions will be expanded upon in the following section.

1.2. Objectives and hypotheses

The overall objectives of the investigation are to discern the likeliest authorship of the 19 scenes of the Elizabethan play Arden of Faversham, considering William Shakespeare and Christopher Marlowe as the potential candidates, and to develop the software ALTXA, which will be employed for this authorship analysis. To meet these objectives, the following subgoals and hypotheses need to be addressed.
The first subgoal consists in the compilation of a Shakespearean and a Marlowian reference corpus for subsequent comparison with Arden of Faversham to determine which of the two idiolectal models the play resembles more closely. This compilation will be built upon the most relevant hypothesis of the thesis, which concerns what can be considered a representative sample of an author’s idiolect. While many scholars have compiled the reference corpora of the candidates involved in the attribution of authorship of Arden of Faversham with texts that belong to distinct periods (see Kinney, 2009) and even to dissimilar literary genres (see Taylor, 2019), the Shakespearean and the Marlowian reference corpora of the present study will be formed by plays that were written no more than three years apart from Arden of Faversham and are not comedies. This decision derives from the belief that, when two authors with highly similar styles are compared, the most representative reference corpora are not the largest, but those that reflect more faithfully the conditions in which the disputed text was written. In other words, the present investigation puts forward the hypothesis that Shakespeare and Marlowe may have adopted a series of idiolectal features that were only present during a specific period of time and in plays with a tragic tone, so that identifying and classifying these features can be more useful for determining the likeliest authorship of the disputed text than relying on features that are present throughout their entire work, which were probably quite similar across many playwrights of the time. This issue will be discussed in depth in Chapter 4.

While Richard III and Richard II will be used for the compilation of the Shakespearean reference corpus, Edward II and The Jew of Malta will make up the Marlowian corpus (see Section 4.2 for an account of the reasons underlying the selection of these plays).
The next subgoal of the thesis is therefore to clean these works, as well as Arden of Faversham, with the purpose of making the subsequent analysis as precise as possible. To that end, every stage direction or linguistic element that is not part of the dialogue will be removed under the assumption that these constitute a distinct subgenre within the play where idiolectal features are less likely to be found. In other words, only the direct interventions of the characters will be taken into consideration in the authorship analysis.

The following subgoal consists in the selection of a series of authorship tests for the study. These will be based on the quantification of the relative frequency in the plays of a group of keywords chosen by the researcher, the calculation of their lexical richness and their average number of words per sentence, the analysis of the n-grams shared by the disputed text and the reference corpora, and the Zeta test (see Section 4.5 for a detailed explanation of these procedures).

It seems reasonable to test the effectiveness of these methods before applying them in the analysis of the scenes of Arden of Faversham. For this reason, a series of pre-studies will be carried out in which scenes taken from the Shakespearean and the Marlowian reference corpora are treated as texts to be attributed. The main reason behind these pre-studies is to include in the final case study on the authorship of Arden of Faversham only those tests that have proved reliable in distinguishing between samples written by Shakespeare and Marlowe.

Some of the methods selected for these pre-studies and the final case study will be applied in a slightly different way than in the works of other scholars, under the following hypotheses.
The first one is that word n-grams reflect more distinctive linguistic constructions than character n-grams (see Section 4.5 for a thorough explanation of the fundamentals of n-gram tracing). The second one is that a Zeta test should not compare an author with a group of authors, as has been done by other scholars (see Kinney, 2009; Elliott & Greatley-Hirsch, 2017), but should only compare candidates individually (see Section 4.5 for an account of the reasons that justify the adoption of this principle, as well as of the procedures underlying a Zeta test).

Lastly, a few basic notions about the computational tool that will be used to conduct the authorship tests of the pre-studies and the case study need to be provided to justify the selection of its development as one of the two main objectives of the thesis. As will be addressed more extensively in Section 4.5, the computer programs and programming languages that are currently available to carry out a forensic authorship analysis can be broadly divided into two groups: those that present an intuitive interface but lack some of the advanced functionalities that a study of this nature requires, as is the case with Voyant Tools, and those that include a broad range of functionalities but are only accessible to people with a solid IT background, as happens with the programming language R. This revealed the need for a tool that combines a wide catalogue of authorship tests that are common within the framework of forensic linguistic studies with an intuitive interface, for which reason I decided to design, with the assistance of computer programmer Carlos Antón, a software tool called ALTXA.
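The simpler measures named in this section (relative keyword frequency, lexical richness, average number of words per sentence and shared word n-grams) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the tokenisation rule, the choice of the type-token ratio as one possible measure of lexical richness, and the three toy sentences, which are invented and are not quotations from any play. The procedures actually followed in the thesis are those described in Section 4.5.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase word tokens (apostrophes kept inside words)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def relative_frequency(tokens, keywords):
    """Occurrences of each keyword per 1,000 tokens (assumes a non-empty text)."""
    counts = Counter(tokens)
    return {word: 1000 * counts[word] / len(tokens) for word in keywords}


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens: one simple richness measure."""
    return len(set(tokens)) / len(tokens)


def mean_sentence_length(text):
    """Average number of word tokens per sentence, splitting on . ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def word_ngrams(tokens, n):
    """Set of all sequences of n consecutive word tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(disputed, reference, n=2):
    """Word n-grams that occur in both the disputed text and a reference text."""
    return word_ngrams(tokenize(disputed), n) & word_ngrams(tokenize(reference), n)


# Invented toy texts standing in for a disputed scene and two reference corpora.
disputed = "The king is dead. Long live the king!"
reference_a = "The king is weary of the crown."
reference_b = "My lord, the queen is dead."

tokens = tokenize(disputed)
print(relative_frequency(tokens, ["king"]))
print(type_token_ratio(tokens))
print(mean_sentence_length(disputed))
print(len(shared_ngrams(disputed, reference_a)), len(shared_ngrams(disputed, reference_b)))
```

In this toy case the disputed text shares more word bigrams with the first reference text than with the second, which is the kind of comparison on which n-gram tracing relies. Raw counts of this sort would still need to be normalised for text length before any conclusion could be drawn, and the sketch makes no claim about the internals of ALTXA.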
This tool makes it possible to carry out the authorship tests selected for the research and presents an accessible interface so that it can be used by linguists without experience in programming (see Section 4.5 for a tutorial on how to use the software and an account of what makes it different from others of its kind).

There is a complementary relationship between the study of the authorship of the scenes of Arden of Faversham and the creation of a computational tool that can simplify the work of other forensic linguists and enhance the implementation of these studies in educational settings. Even though this discipline has generated growing interest among students, it is not part of the curriculum of many European universities due to a lack of experts in the field and/or educational tools, with a few exceptions such as Aston University and Cardiff University in the United Kingdom, or the Universidad Autónoma de Madrid and the Universitat Pompeu Fabra in Spain. In other words, the accomplishment of the two main objectives of this doctoral thesis can be seen as a contribution to the development of this relatively modern discipline in the academic community.

This section has presented the main objectives that this project seeks to accomplish, as well as a series of subgoals that allow for the fulfilment of these objectives and the main hypotheses on which the investigation is built. The following section will provide a general overview of the scope of the chapters into which the thesis is organized.

1.3. Overview and organization of the thesis

The present thesis is divided into eight chapters, whose thematic content will be briefly described in this section.
The previous sections of this chapter have explained the background and the rationale for the research, the main objectives and the subgoals that it sets out to attain and a series of hypotheses that allow for the adoption of an innovative approach to the authorship tests on which it is built. Following this introductory chapter, Chapters 2 and 3 will be devoted to providing the reader with a solid historical, literary and linguistic background that facilitates the understanding of the subsequent authorship analysis of Arden of Faversham.

Chapter 2 will focus on the historical and literary background of the thesis and will be divided into three sections, the first one being a simplified biography of William Shakespeare that aims to offer some basic notions about this historical figure and the possible ways in which he might have been involved in the writing of Arden of Faversham. Afterwards, the chapter will address the life of Christopher Marlowe and his connections with William Shakespeare in order to provide historical evidence that suggests a possible cooperation between both playwrights in the writing of the play. Lastly, Chapter 2 will offer an in-depth explanation of the story behind the play Arden of Faversham, its main literary features, the historical implications derived from its publication and the distinct approaches that have been adopted over the years to deal with the question of its disputed authorship.

Chapter 3 will provide the reader with a general overview of what forensic linguistics consists in and the three main branches into which it is divided, the first one being the so-called written language of the law, which focuses on the adaptation of legal documents to make them more accessible to those citizens who do not have a deep understanding of the law.
The second branch of the discipline is known as the spoken language of the law and focuses on the oral interactions underlying legal proceedings, such as police investigative interviews. Lastly, the many applications of the third branch of forensic linguistics, known as the forensic linguist as an expert witness, will be developed with a special focus on authorship attribution studies. The chapter will end with a critical review of previous studies on the attribution of authorship of literary texts in general and that of Arden of Faversham in particular, which will be of vital importance to justify the approach and the authorship tests selected for this investigation.

Chapter 4 will focus on the methodological aspects of the research. It will start by explaining the reasons why Shakespeare and Marlowe have been selected as the candidates for the attribution of authorship of Arden of Faversham, instead of other playwrights that have also been suggested as its potential authors in previous studies. A detailed explanation of the criteria underlying the selection of the plays to compile the Shakespearean and the Marlowian reference corpora will also be provided, together with the process by which these texts and Arden of Faversham will be cleaned to optimize the effectiveness of the subsequent authorship analysis. Afterwards, this chapter will address the methods selected for the research and the need to test them in a series of pre-studies that will analyse well-attributed scenes of Shakespeare and Marlowe as if they were disputed texts, not only to discern whether these methods are effective enough to distinguish between the two authors, but also to estimate what kind of results can be considered significant in the subsequent analysis of the scenes of Arden of Faversham.
This chapter will also provide an in-depth account of the creation of the software ALTXA, its functionalities and the niche that it could occupy in the academic community.

Chapter 5 will present the results derived from the pre-studies. As underlined earlier, these will analyse undisputed scenes of Shakespeare and Marlowe to determine which methods can be considered effective enough to be included in the final case study on the authorship of the scenes of Arden of Faversham. Chapter 6 will show the results of the case study, where the authorship of the 19 scenes into which Arden of Faversham is divided will be analysed independently. Only those methods that have proved reliable in distinguishing between scenes written by Shakespeare and Marlowe will be used.

The results of the case study will be discussed from a more holistic perspective in Chapter 7, which will allow for the attribution of certain groups of scenes of the play to Shakespeare or Marlowe. In addition, Chapter 7 will assess whether the objectives delineated in this introductory chapter have been accomplished, as well as the validity of the hypotheses formulated in the previous section of this chapter.

Finally, Chapter 8 will summarize the main findings of the doctoral thesis and how these relate to its objectives and hypotheses. It will also highlight the main limitations identified during the research and suggest possible lines of future research.

CHAPTER 2 | HISTORICAL AND LITERARY BACKGROUND

The present chapter offers a historical and literary introduction to the authors and the play that constitute the focus of the thesis. Considering that Arden of Faversham was written around 1592, William Shakespeare’s life events until the last decade of the sixteenth century and a complete biography of Christopher Marlowe will be provided, since the latter was murdered in the year 1593.
In other words, this chapter aims to offer a general idea of what both playwrights had accomplished before and during the period in which Arden of Faversham was created. Additionally, the play itself will be presented and discussed from a historical and literary perspective that will address the question of its disputed authorship, which will allow for the establishment of a connection between this chapter of the thesis and the following, where the fundamentals of forensic linguistics and authorship attribution studies will be expounded.

2.1. William Shakespeare

The main objective of this section is to present a simplified biography of William Shakespeare that will mainly focus on the events that occurred until the period in which Arden of Faversham was written. It must be borne in mind that his relationship with Christopher Marlowe will be discussed after the biography of the latter is presented in the next section.

William Shakespeare (1564-1616) was the son of John Shakespeare, a Catholic glover who managed to become a successful businessman by selling wool and, ultimately, a distinguished member of the political elite in Stratford-upon-Avon, although he ended up facing economic and legal issues during the last years of his life (Fallow, 2016). Wood (2016) states that, despite the fact that there is little historical record of Mary Shakespeare, it is known that she inherited lands from her father and married John Shakespeare, with whom she had eight children, three of whom died prematurely. As a result, William Shakespeare became the eldest of the five siblings who reached adulthood. Halliday (1964) suggests that, if the social status of his parents is taken into consideration, the likeliest possibility is that William Shakespeare had the opportunity to attend the local school in Stratford-upon-Avon, where he received a free education until the age of sixteen.
According to Schoenbaum (1985) and Honigman (2001), the Bard attended the New King’s School, where he primarily focused on Ovid’s Metamorphoses, as well as on the works of Virgil, Plautus and Cicero. Even though the Elizabethan dramatist Ben Jonson accused him of knowing “small Latin and less Greek” in his famous poem,1 Honigman argues that “Shakespeare probably read Latin as easily as most graduates with honours in Latin today” (2001, p. 2). He further adds that the Bard was acquainted with Greek tragedies, “either in the original or in Seneca’s adaptations” (2001, p. 3).

As previously mentioned, Shakespeare abandoned his studies at the age of sixteen, which marks the beginning of his so-called lost years (Holland, 2007), given the gap in historical knowledge regarding his whereabouts throughout the subsequent years. There is considerable speculation about his development as a playwright after he left the King’s School, but the most accepted theory is that Shakespeare worked as a country schoolmaster in Lancashire (Losey, 1927; Honigman, 2001; Holland, 2007; Potter, 2012). This theory is built upon the figure of John Cottom, one of Shakespeare’s teachers during his last year at school, who returned to his hometown in Lancashire with his brother, a Catholic who was eventually tried and executed. According to Holland, it was John Cottom who “encouraged Shakespeare, as a member of a recusant Catholic family, to be a schoolteacher in a staunchly Catholic household in the north of England” (2007, p. 8). The main piece of evidence that has led to this supposition can be found in the will of Alexander de Hoghton of Lea Hall, where he advises his neighbour in Lancashire, Sir Thomas Hesketh, to hire someone called William Shakeshaft as a servant (Honigman, 2001; Holland, 2007). Additionally, Potter explains that “Hoghton bequeathed his musical instruments and ‘play-clothes’ to his heir, in case he wanted to ‘keep players’” (2012, p. 48).
Hence, Shakespeare may have started to write and perform in plays at that time, given that “the performance of plays by boys was recommended by forward-looking schoolmasters” (Honigman, 2001, p. 3).

Regardless of what Shakespeare did during those years, historians agree that he was back in Stratford by November 1582, since the licence for his marriage to Anne Hathaway, who was already pregnant with their first daughter, Susanna, is still preserved (Honigman, 2001; Holland, 2007; Potter, 2012). Two years after their first daughter was born, William Shakespeare and Anne Hathaway had twins, named Judith and Hamnet, and Holland points out that “there are no records of further children” (2007, p. 10). It was arguably unusual for a couple to have only three children at that time, and hence Honigman suggests that “it may have been shortly thereafter that he left home for a career in the theatre” (2001, p. 3).

1 https://www.poetryfoundation.org/poems/44466/to-the-memory-of-my-beloved-the-author-mr-william-shakespeare

As Holland (2007) explains, what Shakespeare did between 1585 and 1592 remains unclear and has been an object of speculation for scholars. According to Potter (2012), it is probable that the Bard joined the Queen’s Men after they performed in Stratford in 1587. This theory is built upon the idea that Shakespeare was taken on as a replacement for William Knell, one of the leading actors of the company, who was killed in a fight that year.
Regarding the reasons behind the selection of Shakespeare for this position, Potter states the following:

The 23-year-old Shakespeare would have had to be very impressive to take over from the man who played the title role in The Famous Victories of Henry V; it would have been easier for this large and distinguished company to promote one of its own players. (2012, p. 54)

It is worth mentioning that, in support of this theory, Holland (2007) illustrated how some of the plays that were performed by the Queen’s Men may have had an influence on Shakespeare’s early plays.

The first solid piece of evidence of Shakespeare’s reputation as a playwright dates back to 1592: a written document in which Robert Greene, another dramatist, presented the Bard as an intruder who had undeservedly gained popularity among his contemporaries:

In his Groat’s Worth of Wit Robert Greene addressed three “gentlemen, his quondam acquaintance, that spend their wits in making plays” (Marlowe, Peele, Nashe) and denounced “an upstart crow, beautified with our feathers, that with his ‘Tiger’s heart wrapped in a player’s hide’ supposes he is as well able to bombast out [i.e. write] a blank verse as the best of you: and, being an absolute Johannes fac totum, is in his own conceit the only Shake-scene in a country.” (Honigman, 2001, pp. 3-4)

Honigman (2001) further states that Greene was clearly mocking the line “O tiger’s heart wrapped in a woman’s hide” from Henry VI, Part III and was trying to create a distance between Shakespeare and the rest of the Elizabethan dramatists like Marlowe and himself, who did attend university, in contrast to the Bard. As can be inferred from
As can be inferred from 26 the quote presented above, by the year 1592, William Shakespeare was already established as a prominent playwright in London, where he complemented the elaboration of his plays with interpreting his own characters on stage, as it could have been the case with Arden of Faversham (see Section 2.3). It seems impossible to discern the exact date in which the Bard’s early plays were elaborated and thus experts cannot determine with certainty whether Marlowe was Shakespeare’s predecessor or his contemporary (Honigman, 2001). As Holland explains, scholars have structured the chronology of these plays according to their own vision of the playwright, given that “each reordering produces a new narrative for Shakespeare’s contact with other plays and other dramatists, his reading, and his development as a dramatist” (2007, p. 14). In any case, Greene’s text in 1592 proves beyond reasonable doubt that Shakespeare had already written the three parts of Henry VI by that year, and the likeliest possibility is that he collaborated with Marlowe in their creation, as will be explained further on (see Section 2.2). Regardless of the exact date in which they were elaborated, scholars agree on the fact that Shakespeare also wrote, among other plays, The Two Gentlemen of Verona, The Taming of the Shrew, Titus Andronicus and Richard III during the first half of the decade, that is, when Arden of Faversham was produced, and that he was already established as a prestigious playwright in London. Furthermore, due to the closing of theatres in the city between 1592 and 1594 because of the plague, Shakespeare wrote the poems Venus and Adonis and The Rape of Lucrece, which he dedicated to the Earl of Southampton (Halliday, 1964; Schoenbaum, 1985; Honigman, 2001; Holland, 2007; Potter, 2012). As time went by, Shakespeare proved to have inherited his father’s talent for business. 
Honigman explains that “as he prospered, he took on new responsibilities, with four distinct roles in his company: ‘sharer’ […] of the company’s assets […], ‘house-holder’ […] of the Globe and Blackfriars theatres, dramatist [and] actor” (2001, p. 5). Even though the later years of his life do not constitute the focus of this biography, it is worth mentioning that, at the beginning of the seventeenth century, the Bard acquired more properties and experienced the most prolific period of his career as a playwright, in which he created literary masterpieces such as Hamlet and Othello, until his death in 1616 (Honigman, 2001).

In sum, this simplified biography has provided an insight into William Shakespeare’s early education, as well as the most plausible speculations concerning his development and establishment as a playwright, which differs from the traditional academic path followed by authors like Christopher Marlowe, who constitutes the focus of the following section.

2.2. Christopher Marlowe

This section intends to offer a brief biography of the dramatist Christopher Marlowe that will address his educational background, the details of his alternative life as a spy and his premature death at the age of 29. Afterwards, his relationship with William Shakespeare and the collaboration between both playwrights in the elaboration of Henry VI (Parts I, II and III) will be discussed. Lastly, the popular hypothesis about Marlowe’s allegedly fake death will be examined from a historical perspective with the aim of providing the reader with a background for the many speculations that have been created over the years concerning his figure.

Christopher Marlowe (1564-1593) was the son of a humble shoemaker in Canterbury (Riggs, 2004; Hopkins, 2008; Greenblatt & Logan, 2012; Nicholl, 2016). According to Riggs, his education began when he was six in petty school.
Such lessons had no permanent building assigned, but the likeliest possibility is that Marlowe learned how to read and write in the church of St. George the Martyr, where the syllabus was mainly based on “religious instruction rather than practical skills” (2004, p. 25), given that Queen Elizabeth regarded the education of children “as a way of fashioning obedient subjects” (2004, p. 27). In 1578, Marlowe obtained a scholarship to attend the King’s School in Canterbury, even though it is believed that he had already been studying there before he was given the scholarship. Two years later, in 1580, Marlowe moved to Cambridge, where he was admitted to Corpus Christi College once he was awarded the Parker Scholarship (Honan, 2006; Hopkins, 2008). Hopkins highlights the fact that the Parker Scholarship was “essentially designed to be held primarily by students intending to proceed to holy orders” (2008, p. 5) and she further suggests that it is unclear whether Marlowe really intended to pursue an ecclesiastical career or simply saw this scholarship as an opportunity to secure a high-quality education. In any case, the dramatist ended up being accused of atheism after he allegedly criticized and mocked Christianity in public, as will be discussed further on. Marlowe completed his BA degree and a Master of Arts degree in Cambridge, where he particularly focused on theology, philosophy and Greek and worked on the translation of the authors Ovid and Lucan (Riggs, 2004; Hopkins, 2008). It was during those years that the young dramatist established a close relationship with adult playwrights such as Robert Greene and Robert Sidney, who may have contributed to shaping his literary style (Tallent, 2007; Hopkins, 2008; Nicholl, 2016). As a matter of fact, Hopkins states that it is “highly probable that he had already written one or both of Dido and the first part of Tamburlaine while still at the university” (2008, p.
8). As can be inferred from the biographical notes presented above, Marlowe was a precocious talent that managed to stand out as a promising playwright from an early age. Furthermore, the dramatist apparently reconciled his life as a student with working as a spy for the Protestants (Honan, 2006; Hopkins, 2008; Greenblatt & Logan, 2012). The aforementioned theory about Marlowe’s collaboration with the Protestant regime as a spy becomes considerably feasible if the many controversies that arose when he applied for his MA degree in 1587 are noted. The university was reluctant to give Marlowe his degree on the ground that the young dramatist had the intention of going to Rheims, which “had been the home of the seminary to which young English Catholic gentlemen could go in secret to train for the priesthood, which they were forbidden to do in Elizabeth’s Protestant England” (Hopkins, 2008, p. 10). Nevertheless, the Privy Council contacted the university and demanded that Marlowe should be granted his degree under the principle that “it is not Her Majesty’s pleasure […] that anyone employed as he had been in matters touching the benefit of his country should be defamed by those that are ignorant in the affairs he went about” (Greenblatt & Logan, 2012, p. 1106), which makes it seem that Marlowe was sent to Rheims by the Protestants themselves to spy on the Catholics, according to the authors. Indeed, Tallent (2007) stresses the fact that his labour as a spy was crucial for his development as a playwright and that both professions were highly complementary. Despite his probable cooperation with the Protestant regime, Marlowe was accused of atheism by Thomas Kyd and Richard Baines, who testified that “it was the dramatist’s custom in table talk to jest at the Scriptures, gibe at the efficacy of prayer, and strive in argument to confute the sayings of prophets and holy men” (Kocher, 1948, p. 111). 
Hopkins (2008) states that Kyd probably gave such testimony under torture and that Richard Baines should not be considered a reliable witness, given that he and Marlowe had been arrested the previous year in Flushing for coining and each accused the other, which might reflect the hostility that previously existed between them. Greenblatt and Logan explain that these accusations could have been a relevant factor in his premature death:

On May 30, 1593, an informer named Richard Baines submitted a note to the Council on which, on the evidence on Marlowe’s own alleged utterances, branded him with atheism, sedition and homosexuality. Four days later, at an inn in the London suburb of Deptford, Marlowe was killed by a dagger thrust, purportedly in an argument over the bill. (2012, p. 1107)

Even though Hopkins (2008) doubts whether Richard Baines’ note was submitted on May 27 or June 2, she supports the theory that Marlowe was stabbed to death at an inn in Deptford by a man called Ingram Frizer. Regarding the reasons behind the murder, the author points out that there were three people with Marlowe at the crime scene: Ingram Frizer himself, who was known to be involved in “shady business dealings” (2008, p. 18) with Nicholas Skeres; Skeres, who was also present at the inn; and Robert Poley, a member of the intelligence services. She further suggests the following:

The fact that the men spent all day together before Marlowe died does not really suggest a premeditated killing; it perhaps indicates more negotiations that had gone wrong, or, as they themselves say, an unexpected disagreement, in which Marlowe was outnumbered. (2008, p. 19)

As can be observed in the quote presented above, the events that led to Marlowe’s assassination remain mysterious and it seems impossible to discern whether it was due to a simple argument about the bill or a negotiation that went wrong, or whether the Protestant regime ordered his execution as a result of the accusations of atheism.
It is of paramount importance to highlight the fact that “those who were arrested in connection with the murder were briefly held and then quietly released” (Greenblatt & Logan, 2012, p. 1107).

Now that the main events of Christopher Marlowe’s life have been outlined, it is time to discuss his relationship with William Shakespeare. Even though it cannot be stated with certainty that the two writers knew each other, this could be seen as a solid theory if certain factors are taken into consideration. First, their residences in London were considerably close (Hopkins, 2008) and, second, both playwrights were widely acknowledged in their guild (Astrana, 1964). Lastly, it must be pointed out that, given the strict deadlines that had to be met, cooperation between two or more playwrights in the production of their plays became a frequent practice during the Elizabethan period. Indeed, Holland suggests that “a minority of plays had a single author” (2007, p. 15) and Kermode (2005) further hypothesized that the five acts of some Elizabethan plays might have been written by five distinct playwrights due to the abovementioned time constraints. For this reason, there is a plethora of studies that seek to offer substantial evidence for the existence of collaboration in plays whose authorship has been traditionally attributed to Shakespeare. These studies have been conducted from historical, literary and, as in the case of this thesis, linguistic approaches (see Section 3.4.4). In the light of the findings provided by these lines of research, The New Oxford Shakespeare has credited Christopher Marlowe as the co-author of Henry VI (Parts I, II and III).2

Finally, the hypothesis that Marlowe’s assassination was a set-up in which he exchanged his clothes with a corpse to leave the country and cover up later clandestine activities will be briefly addressed.
Nicholl (2016) presented an extensive review of this conspiracy theory, which is based on the notion that Christopher Marlowe faked his own death with the purpose of escaping from the accusations of heresy and ran away to Europe, where he continued his labour as a playwright and sent his works back to England, which were ultimately signed by William Shakespeare. I will not dwell on the details of this hypothesis since, as Nicholl proves in his work, it lacks a solid historical basis, given that much of the information that has been presented as proof was in fact taken from fictional works. Consequently, the present thesis is built upon the idea that Marlowe was murdered in 1593, which allows it to focus on the authorship of Arden of Faversham, since the play was written before that year. The real events that inspired the elaboration of this play, as well as its main literary features and reception in the academic community, will be addressed in the following section.

2 https://www.theguardian.com/culture/2016/oct/23/christopher-marlowe-credited-as-one-of-shakespeares-co-writers

2.3. The anonymous play Arden of Faversham

The final section of this chapter aims to provide information about the plot of the play Arden of Faversham, its historical origin, the impact that it may have had on Elizabethan society and the wide range of approaches that have been adopted over the years to attribute to the play its likeliest authorship, given that it still remains unclear. Lastly, the main reasons underlying the selection of this text as the focus of the investigation will be pointed out.
In his Chronicles of England, Scotland and Ireland, Holinshed stated that “there was at Fa[v]ersham in Kent a gentleman named Arden, most cruell[y] murdered […] by the procurement of his own wife” (1587, p. 1062). The author further explained the whole story behind the assassination of Arden, a landowner from Faversham who was stabbed at his own residence while he was playing backgammon. This crime was perpetrated by Alice, who was his wife, Mosby, who was maintaining an adulterous relationship with Alice, and two criminals who were hired for this endeavour. Therefore, the idea for the play Arden of Faversham was inspired by a real event that had been depicted in Holinshed’s historical work, which was a common source of inspiration for dramatists (Barker & Hinds, 2003; Dudgeon, 2009). Before addressing Arden’s death, the play portrays a succession of attempts at killing his character that consistently fail, sometimes in a comical manner, for instance when Black Will and Shakebag, the two killers that were hired by Alice, desperately try to find Arden in the middle of the fog until Shakebag falls into a ditch. It must be pointed out that all the characters of the play preserved the original name of those who were originally described in Holinshed’s Chronicles of England, Scotland and Ireland except for the criminal Loosebag, who was portrayed as Shakebag in the play, which could be an indication of Shakespeare’s participation in its performance and, perhaps, its elaboration. The text is catalogued as a domestic play (Barker & Hinds, 2003; Richardson, 2006; Dudgeon, 2009; Christensen, 2017), which means that it deals with the life events of the middle classes, instead of narrating the misfortunes of kings and nobles, on whom Elizabethan tragedies were mainly focused. 
According to Barker and Hinds (2003), one of the most innovative aspects of the play is that it becomes hard for the audience to sympathize with any of the characters, given that even Arden, the victim of the crime, is presented as a sinner. Christensen points out the fact that, in addition to being a greedy landowner who shows no mercy throughout the play to those whose lands he took, the character of Arden “comes home only long enough to leave again, attending to a succession of business obligations, yet he is also unwilling to transfer power at home” (2017, p. 33). As a result, every character is punished with death at the end of the play, with the notable exception of Franklin, Arden’s best friend and one of the few relatable characters for the audience together with Bradshaw, a goldsmith who was neither involved in nor aware of the assassination plans but ended up being executed for delivering a letter from the criminals. Taking into consideration the abovementioned notions about the characters’ sins and their subsequent punishment, one could argue that the play intends to reinforce the traditional family values that characterized Protestant society. Nevertheless, Barker and Hinds state that if Arden of Faversham is not read as a fictional play, but as a historical text, the audience may switch their attention from the flaws of its characters to the economic, social and political agents affecting their actions. They further indicate that “[f]rom this perspective, Arden of Faversham becomes a play profoundly concerned with the deleterious impact of the Reformation, and the consequent transfers of land ownership, on social kinship bonds and responsibilities” (2003, p. 78). In sum, the authors suggest that the interpretation of the play as a historical document enables a social analysis that would otherwise have been overlooked.
Now that a summary of the play has been provided, together with a brief historical, literary and social analysis, the text will be approached from a legal perspective. Even though Arden of Faversham is still considered anonymous, it has been traditionally associated with Shakespeare, Marlowe and even Thomas Kyd (Barker & Hinds, 2003). The fact that the text was entered in the Register of the Stationers’ Company in 1592 and was printed that same year by Edward White, who also published William Shakespeare’s Titus Andronicus, Christopher Marlowe’s The Massacre at Paris and Thomas Kyd’s The Spanish Tragedy, could be considered an indication of a common link among the three authors and their possible cooperation in the elaboration of the play (M. Goicoechea, personal communication, June 7, 2020). Nevertheless, as can be seen in the works of Kinney (2009) and Taylor (2019), there is a plethora of alternative candidates for the attribution of authorship of the play other than the three abovementioned playwrights, such as Robert Greene, Anthony Munday, George Peele or Thomas Watson, among others (see Section 3.4.4 for a more detailed list of the possible authors and Section 4.1 for an explanation of the reasons underlying the selection of Shakespeare and Marlowe as the candidates for the present study).

Kinney (2009) distinguished three approaches that have been adopted over the years with the purpose of determining the likeliest authorship of Arden of Faversham. Firstly, between the sixteenth and the eighteenth century, attribution was based on paratextual parameters, such as the claims that appeared on the title pages. Secondly, the author mentions the existence of a period in which the attribution of authorship of the play relied on literary criteria, for instance “shared common words, parallel passages, and even commonality of tone” (2009, p. 81).
Lastly, he points out that the nineteenth century was the point of departure for a scientific approach to which the statistical procedures that currently characterise the field of forensic linguistics can be related. In fact, Kinney (2009) conducted a forensic linguistic analysis of the play where, although he did not achieve solid results for most of the scenes, he attributed the authorship of certain sections of the text to William Shakespeare using the Zeta test (see Section 3.4.4).

Finally, the two main reasons behind the selection of Arden of Faversham as the focus of this thesis will be briefly highlighted. Firstly, the fact that the play remains anonymous is highly convenient for a study of this kind. In addition, the computational resources that have been developed over the last decades allow for the adoption of innovative ways to analyse historical texts, which can complement the works of other scholars (see Section 1.1).

2.4. Summary

On the whole, this chapter has provided the reader with a historical and a literary approximation to the play whose authorship constitutes the focus of the thesis and the playwrights that will be considered as the potential candidates for such attribution. These candidates are William Shakespeare and Christopher Marlowe, who were active playwrights at the time when Arden of Faversham was published and are known to have worked together in the elaboration of Henry VI (Parts I, II and III). The play Arden of Faversham has been portrayed as a literary work with a historical origin, and its plot and main literary features have been discussed in the belief that they will contribute to a better understanding of the subsequent linguistic analysis. This study will be conducted from a forensic linguistic approach, which belongs to a modern branch of applied linguistics that will be explained in depth throughout the following chapter, which will stand as the linguistic background for the investigation.
CHAPTER 3 | LINGUISTIC BACKGROUND: AN INTRODUCTION TO FORENSIC LINGUISTICS AND AUTHORSHIP ATTRIBUTION STUDIES

The present thesis intends to conduct a forensic linguistic study of the play Arden of Faversham to discern its likeliest authorship, and therefore the first section of this chapter aims to expound what forensic linguistics is. Afterwards, an explanation of its historical development and applications will be provided, as well as a review of previous research on authorship attribution studies in general and, ultimately, on the authorship of Arden of Faversham in particular, with the objective of narrowing down progressively the scope of the investigation.

3.1. Definition of forensic linguistics

A series of complementary definitions of forensic linguistics will be presented to provide the reader with a basic notion of what this discipline is based on. The International Association for Forensic and Legal Linguistics (IAFLL) states on its website that the discipline “covers all areas where law and language intersect” (2020, About section). McMenamin defined it as “the scientific study of language as applied to forensic purposes and contexts” (2002, p. 67). Similarly, Perkins and Grant delineated it as “a branch of applied linguistics relating to the law and legal processes” (2012, p. 174) and Momeni stated that “forensic linguistics as a sub-branch of linguistics is a new-born science which makes a connection between linguistics and the law” (2011, p. 733). Gibbons presented a definition given by the AILA Scientific Commission on Forensic Linguistics, which was based on the idea that the objective of a forensic linguist is “to support the study of the link between language and law in all its forms” (2003, p. 12). Lastly, Olsson pointed out that there are two ways to describe what forensic linguistics is. On the one hand, it could be outlined “by considering the kinds of text forensic linguists are sometimes asked to examine.
If a text is somehow implicated in a legal or criminal context then it is a forensic text” (2008, p. 1). On the other hand, it could be labelled as “the application of linguistics to legal questions and issues” (2008, p. 3). Therefore, the presence of expert linguists in courts should be normalized, as pointed out by Shuy:

[…] specialists in any field often have something useful to contribute to lawyers as they try their cases. For many years, medical doctors, psychiatrists, engineers and others have been called on to testify many times in civil or criminal law cases. (2002, p. 24)

In other words, the emergence of the figure of the forensic linguist could be seen as necessary, considering that there is no law without language (Tiersma, 1993; Correa, 2013; Udina, 2017). In brief, forensic linguistics could be broadly designated as the intersection between language and law, and it is the inherent relationship between both which justifies the necessity of this discipline, given that the law is articulated with language.

3.2. Historical development of forensic linguistics

The birth of the term forensic linguistics dates back to the year 1968, when Professor Jan Svartvik published The Evans Statements: A Case for Forensic Linguistics. This investigation focused on four statements that Timothy John Evans, accused of murdering his wife and daughter, had allegedly dictated to police officers in 1949, incriminating himself in the homicides for which he was ultimately hanged a year later. In his work, Svartvik showed that those statements were unlikely to have been uttered by someone with Evans’s educational background and that they presented a series of idiolectal inconsistencies among them, for instance when giving time indications:

In the present case, it seems unlikely that the illiterate Evans would have said “the 12.55 a.m.
train”, particularly since in two previous statements and in the witness-box at the trial, he is recorded as saying “the five to one train” in describing the same event. (1968, p. 20)

After the execution of Evans, it was discovered that John Christie, who lived in the same building, was the one who had killed Evans’s wife and daughter, which reinforces Svartvik’s hypothesis that somebody edited the transcription of the statement provided by Evans.

Even though the term forensic linguistics was not used until 1968, it should be pointed out that “forensic linguistics is a new discipline with a long history” (Goustos, 1995, p. 99). As a matter of fact, the application of linguistic knowledge to legal contexts can be traced back to ancient Greece, where playwrights used to accuse each other of plagiarism (Olsson, 2008). Coulthard et al. explain that philosophers showed great interest in the relationship between language and law; Aristotle, for instance, wrote in the fourth century B.C. a “typology of rhetoric according to the occasions it served, distinguishing between political, ceremonial and forensic oratory, the latter associated with the courtroom” (2010, p. 529). Similarly, during the first century, Gaius Aelius Gallus compiled a monolingual dictionary in Latin with the purpose of providing accurate definitions for terms that were frequent in the legal contexts of the time (Coulthard et al., 2010). These authors further explain that there has been a plethora of historical moments in which laws have been enforced to have a direct effect on linguistic practices and name a few cases:

On a practical level, issues of language rights and language planning also have a long history.
In England, the Pleading in English Act of 1362 was enacted to replace French with English in legal proceedings, and the Blasphemy Act of 1650 penalized acts of, inter alia, “filthy and lascivious speaking”, although rather than being aimed at suppressing bad language, it was in fact an attempt to silence a Protestant sect known as the Ranters […]. A law with a significant impact on the linguistic situation in Spain was King Charles III’s 1768 decree giving the Castillian dialect priority in administration and education. (Coulthard et al., 2010, p. 530)

Continuing on the subject of the historical relationship between language and law, Olsson (2004, 2008) points out that it was in the first decades of the eighteenth century when the earliest controversy about the Bible’s authorship was documented. This arose when a priest from Germany called H. B. Witther suggested that the Pentateuch may have been written in collaboration by several unknown authors. These lines of thought were supported a century later by J. G. Eichhorn, a professor at the University of Jena, and by the end of the nineteenth century, the arrival of Darwinism generated a deeper interest in authorship attribution studies concerning the Bible. According to Olsson (2004, 2008), the attribution of authorship of Shakespearean texts has also been an object of speculation among scholars for over two centuries, especially after Reverend James Wilmot wrote in 1785 that Francis Bacon was the actual author of some of the plays whose authorship had been traditionally attributed to the Bard.

The author further suggests in his work that the first properly scientific paper on authorship attribution was that of Mendenhall in 1887, in which, based on a letter sent by Professor De Morgan thirty years before, he conducted a study that was built upon the following principle:

[…] every writer makes use of a vocabulary which is peculiar to himself […].
In the use of that vocabulary in composition, personal peculiarities in the construction of sentences will, in the long-run, recur with such regularity that short words, long words, and words of medium length, will occur with definite relative frequencies. (1887, pp. 238-239)

As can be inferred from the quote presented above, Mendenhall argued that the average number of letters per word could be a reliable discriminator to determine the likeliest authorship of a given text. To validate this hypothesis, he compared fragments of Charles Dickens’ works among themselves and, at the same time, with excerpts taken from other authors, and concluded that there was a high degree of resemblance among the extracts of Dickens and that these were different from the ones written by the other authors. Afterwards, he carried out similar procedures with the works of others, such as John Stuart Mill, in order to demonstrate that an author’s literary style remains stable and thus some of its features could be quantified for further studies on authorship attribution.

During the twentieth century, there were studies that analysed the intersection between language and law before Svartvik coined the term forensic linguistics, for instance Philbrick’s Language and the Law: The Semantics of Forensic English (1949), where he deconstructed the language used in courts by analysing its principles; and Mellinkoff’s The Language of the Law (1963), in which he gave an accurate description of the language used in British and American legal contexts, providing an exhaustive explanation of its development from the times before the Norman conquest until the twentieth century. Furthermore, he argued that legal language should be more intelligible, presenting cases where misunderstandings were prone to happen, which anticipated Gibbons’ current lines of research (see Section 3.3.1).
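The word-length principle attributed to Mendenhall above lends itself to a short computational illustration. The following is only a minimal sketch in Python, not a reconstruction of his procedure nor of the method applied in this thesis: the function names (word_lengths, word_length_profile, mean_word_length, profile_distance), the cap of twelve letters and the use of a summed absolute difference to compare two profiles are all assumptions introduced here for illustration.

```python
import re
from collections import Counter


def word_lengths(text):
    """Return the length of every word found in the text.

    Assumption: a word is any unbroken run of unaccented letters, so
    hyphens and apostrophes split words. This is a simplification.
    """
    return [len(word) for word in re.findall(r"[A-Za-z]+", text)]


def word_length_profile(text, max_len=12):
    """Relative frequency of words of 1..max_len letters.

    Words longer than max_len are counted in the last position.
    The default cap of 12 is an arbitrary, illustrative choice.
    """
    lengths = word_lengths(text)
    if not lengths:
        return [0.0] * max_len
    counts = Counter(min(n, max_len) for n in lengths)
    return [counts[n] / len(lengths) for n in range(1, max_len + 1)]


def mean_word_length(text):
    """Average number of letters per word (0.0 for an empty text)."""
    lengths = word_lengths(text)
    return sum(lengths) / len(lengths) if lengths else 0.0


def profile_distance(p, q):
    """Sum of absolute differences between two profiles.

    This distance is a hypothetical comparison measure chosen for
    simplicity; it is not taken from Mendenhall's paper.
    """
    return sum(abs(a - b) for a, b in zip(p, q))
```

Under these assumptions, two excerpts by the same author would be expected to yield a smaller profile_distance than excerpts by different authors, although any such comparison would require texts far longer than a few sentences to be meaningful.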
After the publication of The Evans Statements: A Case for Forensic Linguistics, which marked, as pointed out at the beginning of this section, the official birth of the discipline, Coulthard (2010) explains that there was little research in the field for the next decades, with the exception of Robert Shuy’s contributions in America (see Section 3.3.3). Nevertheless, Coulthard already highlighted over a decade ago that “during the past fifteen years there has been a rapid growth in the frequency with which legal professionals and courts in a number of countries have called upon the expertise of linguists” (2010, p. 15). Forensic linguistics has turned into a relatively well-established discipline with its own association, which is called the International Association for Forensic and Legal Linguistics (IAFLL)3 and organizes a biennial international conference, as well as its own specialized scientific journal, entitled The International Journal of Speech, Language and the Law, formerly known as Forensic Linguistics. In addition, there has been a marked growth in the number of specialized undergraduate courses on forensic linguistics, and the Universities of Aston and Cardiff offer their own MA in Forensic Linguistics. Research in this discipline is currently divided into three main fields of study, which will be presented in the following section.

3.3. Areas of forensic linguistics

This section aims to offer a brief overview of the main research areas that can be found within the framework of forensic linguistic studies to provide the reader with a general idea of the types of investigation which have been developed over the last decades. A more exhaustive explanation of the principles of each area and a review of their most famous cases will be presented in the following sub-sections.

Forensic linguistics is an interdisciplinary field, and this can be exemplified by the wide range of crimes that can be perpetrated through language to some extent.
As highlighted by Momeni, “[l]anguage crimes are insult, foul language, bribery, perjury, false advertisement, etc. Even crimes like larceny, kidnapping and murder which require language before realization can be considered as language crimes; therefore, they need linguistic analysis” (2011, p. 733). Despite the bewildering variety of crimes in which language can be involved and the many legal contexts in which the figure of a linguist could be of use, the forensic linguistic community has reached an agreement on the three main branches into which their research can be divided, which could be delineated as follows:

3 This name was selected in 2021 by the members of the association, who were given the opportunity to vote for it. Before that, it was called the International Association for Forensic Linguistics (IAFL).

A) The written language of the law: This area of forensic linguistics is focused on the deconstruction of written legal texts, such as contracts and courtroom instructions, with the purpose of finding the linguistic patterns that characterize them. The main objective underlying these lines of research is to address the problems that average citizens may encounter when they do not understand the content and, by extension, the implications of these documents due to the complexity of their vocabulary and/or grammatical constructions. In other words, what forensic linguists try to do is to make the law more accessible to the majority of the population, regardless of their educational background (Coulthard, 2010; Perkins & Grant, 2012).
B) The spoken language of the law: Forensic linguists are expected to examine the oral interactions that take place in legal contexts, for instance during the communication of rights at the moment of arrest, in police investigative interviews or in those courtroom interactions in which interpreters are required to assist someone who is directly involved in the case and does not speak the language of the court (Coulthard, 2010; Coulthard et al., 2010; Kredens, 2016). There are cases in which spoken and written legal discourse may overlap, as happens with police cautions, which are written statements uttered by police officers during an arrest and before investigative interviews to inform suspects about their rights. Hence, police cautions could be considered as “a written legal text that has to be performed as spoken interaction” (Perkins & Grant, 2012, p. 175). For that reason, these two branches of forensic linguistics could be unified and presented as “the language of the legal process” (Coulthard & Johnson, 2010, p. 11).

C) The linguist as expert witness: Forensic linguists can be called to the stand to provide expert testimony in court, as well as linguistic evidence that may have an impact on the outcome of the trial. In other words, this branch could be designated as “that portion of forensic linguistics which provides advice and opinions for investigative and evidential purposes” (Coulthard et al., 2010, p. 536). Even though one of the main areas of interest of this branch is authorship attribution, which is the focus of the present research, there are many other cases in which the expertise of a forensic linguist can determine the course of a trial. These research areas will be schematically presented below (Gibbons, 2011) and further explained in Sections 3.3.3 and 3.4:

• Cases of inappropriate communication and language crimes, such as vilification or harassment.
• Cases of legal disputes over trademarks.
• Cases of meaning transfer.
• Cases of disputed or anonymous authorship.

The criteria for the admissibility of evidence tend to differ depending on the legal system in which the trial takes place. More than a decade ago, Grant pointed out that “the United Kingdom jurisdictions do not yet use specific scientific criteria to decide on the acceptability of scientific evidence. It is the expert rather than the method that is approved” (2007, p. 3). Nevertheless, as the author predicted in his article, the British courts have progressively embraced the influence of the United States, where the admissibility of evidence in legal settings is determined by the so-called Daubert criteria. These were established after the resolution of a trial in which two people alleged that the serious defects with which they were born were the result of the Bendectin that their mothers had taken while pregnant (Daubert v. Merrell Dow Pharmaceuticals, Inc., 1993). Even though the plaintiffs presented the testimony of well-credentialed experts, the case was dismissed under the principle that the scientific techniques through which they implicated the pharmaceutical company had to be generally considered reliable by the scientific community to be admissible in court. The outcome of this trial established a precedent that determined the nature of the evidence that can be admitted in court, especially when novel sciences such as forensic linguistics are involved in the case (Howald, 2008). According to the Daubert criteria, the requirements for the admissibility of evidence in a trial could be described as follows:

Whether the theory or technique has been or can be tested; whether the technique has been subjected to peer review or publications; whether the technique is generally accepted in the scientific community; [and] whether the technique has a known or potential error rate. (Ishihara, 2014, p.
25)

Due to the influence of these criteria, there has been a tendency over the last decades towards the use of quantitative methods in forensic linguistics, given that statistical analyses allow for a properly scientific presentation of results, which increases the forensic expert’s credibility in court (Grant, 2007). For that reason, the present investigation will rely on statistical procedures (see Chapter 4). Now that this section has briefly introduced the three main branches into which forensic linguistics can be divided, these will be explained in depth in the following sub-sections.

3.3.1. The written language of the law

This section intends to describe the main features of the language of the law and the consequences derived from the difficulties that the average reader has in understanding it. Afterwards, the Plain English Movement will be presented and supported with some practical examples concerning jury instructions and police cautions. Even though the laws of a country apply to its entire population, the way in which they are written creates a distance between them and many citizens (Gibbons, 2003; Tiersma, 2009; Coulthard et al., 2010; Perkins & Grant, 2012; Correa, 2013). The irony behind this situation is the point of departure for the forensic linguist’s work, as Gibbons points out:

[…] the Common Law presumes that “ignorance of the law is no defence.” If the law is presented in language that cannot be understood by the people to whom it applies, this presumption can lead to grave injustice as well as logical absurdity. This means that legal language should be intelligible to the audience for that language, including the people affected by it […]. Perfect understanding of the law and the justice may prove unachievable, but its pursuit is imperative. (2003, pp.
162-163)

The most distinctive linguistic features of legal language will be schematically presented below with the purpose of illustrating the abovementioned difficulties that many citizens have to face when they are exposed to legal documents and, as a result, the need for linguistic expertise to build a bridge between both parties.

A) The use of extremely long sentences with complex syntactic structures, such as embedded sub-clauses, which are more complicated to process at a cognitive level (Gibbons, 2003; Alcaraz, 2005; Correa, 2013). Furthermore, legal contracts in English are usually characterized not only by the prominence of excessively long sentences, but also by a lack of punctuation (Coulthard et al., 2010; Perkins & Grant, 2012).

B) The abundance of archaisms like herein and self-referential terms, as well as lexical items which are only accessible to a specialized audience, as is the case with contingency (Gibbons, 2003; Coulthard et al., 2010; Perkins & Grant, 2012; Correa, 2013).

C) The profusion of passive constructions without an agent, which hinders the identification of the participants involved in the action to which the text refers. As a matter of fact, texts of this nature also contain a considerable number of expressions that make the identification of participants ambiguous, such as the party of the third part (Gibbons, 2003).

D) The frequency of impersonal sentences (Correa, 2013).

E) The coexistence of binomial terms in the same text, as in the case of will and testament, whose simultaneous use could give the reader the idea that they have dissimilar meanings (Perkins & Grant, 2012).

F) The use of polysemic words, such as enterprise (Alcaraz, 2005).

G) The tendency towards nominalization (Gibbons, 2003; Correa, 2013).

H) The presence of double negatives, some of them “including ‘hidden’ negatives such as unless, forbid and deny” (Gibbons, 2003, p. 171).
I) The use of formulaic or stereotyped expressions with a vague meaning, as in the case of beyond reasonable doubt (Dumas, 2002; Alcaraz, 2005).

With the purpose of exemplifying some of the features presented above, I will briefly analyse a sample taken from Article 84 of the Spanish Criminal Code:

Si se hubiera tratado de un delito cometido sobre la mujer por quien sea o haya sido su cónyuge, o por quien esté o haya estado ligado a ella por una relación similar de afectividad, aun sin convivencia, o sobre los descendientes, ascendientes o hermanos por naturaleza, adopción o afinidad propios o del cónyuge o conviviente, o sobre los menores o personas con discapacidad necesitadas de especial protección que con él convivan o que se hallen sujetos a la potestad, tutela, curatela, acogimiento o guarda de hecho del cónyuge o conviviente, el pago de la multa a que se refiere la medida 2.ª del apartado anterior solamente podrá imponerse cuando conste acreditado que entre ellos no existen relaciones económicas derivadas de una relación conyugal, de convivencia o filiación, o de la existencia de una descendencia común. (2015, p. 32)

As can be observed in the text, there is a lack of empathy for the reader in the drafting of this article, considering the length of the sentence (136 words), its syntactic complexity, the use of specialized terms in legal Spanish which are not accessible to an average reader, such as curatela, and the presence of the self-referential expression la medida 2.ª del apartado anterior. If the legal features presented above and the problems that may derive from them are taken into consideration, it seems that there is an urgent need to democratize law, that is, to make it accessible to all the citizens to whom it applies.
For this reason, the Plain English Movement is firmly established as one of the main areas of action of forensic linguists. It was defined by Felsenfeld as “the first effective effort to […] write legal documents, particularly those used by consumers, in a manner that can be understood not just by the legal technicians who draft them, but by the consumers who are bound by their terms” (1981, p. 408). Nevertheless, with the notable exceptions of Tiersma’s redraft of the Pattern Jury Instructions of California in 2005 and Gibbons’ improvement of the New South Wales Police Caution in 2001, the work conducted by forensic linguists in this area has had little acceptance (Coulthard, 2010). The main reasons underlying the need to reform jury instructions and police cautions will be presented below, together with real cases that illustrate the consequences of their linguistic flaws and a series of measures suggested by linguistic experts that could facilitate the understanding of these legal texts.

Peter M. Tiersma has devoted considerable research to describing the linguistic barrier between jury instructions and the average citizen and to suggesting ways in which these difficulties may be overcome. The author clarified the difference between the role of the jury and the judge by stating that “the jury’s function is to determine what happened, or the facts, as well as to reach a verdict. It has become the exclusive duty of judges to decide the rules of law that apply to those facts” (2009, p. 1). He further explained that the judge has to communicate these rules to the jury in the form of jury instructions. The word communicate is of paramount importance in this context, since, as Tiersma highlights, “communication […] requires not just that you speak or read to someone but also that the audience actually understand what you intended to communicate” (2009, p. 1).
Dumas (2002) reported a case in which the misunderstanding of jury instructions led to the execution of Bruce Charles Jacobs in Texas. Jacobs was accused of stabbing a sixteen-year-old boy to death in his own house, and police officers found a knife without recognisable prints close to the boy’s residence. A number of witnesses claimed to have seen Jacobs in the neighbourhood, for which he was arrested. As a result of the jury’s verdict, Jacobs was found guilty of capital murder and ultimately executed. Nevertheless, the defence counsel gathered a team of linguistic experts to discern whether jurors had fully understood their instructions, and they concluded not only that the jurors had misunderstood the term reasonable doubt, but also that the following legal conditions had not been properly explained to them:

That Jacobs could be convicted of a lesser included offence (murder or burglary of a habitation); that jurors needed to find that Jacobs committed a felony offence of burglary as well as one of murder; […] that the word deliberately does not mean the same as intentionally; that the word probability means something more than a possibility; [and] that the terms criminal acts of violence, continuing threat, reasonable expectation and society have legal definitions that may be different from ordinary meanings. (2002, pp. 246-247)

As underlined earlier, the experts determined that Jacobs’s punishment may have been imposed by the members of a jury that did not properly understand the instructions that they were given. This execution is not an isolated case and, indeed, many forensic linguists have devoted extensive research to assessing the comprehensibility of jury instructions in distinct communities. For instance, Levi had already described a similar situation in the previous decade when she acted as an expert witness in a case in which James P.
Free, who was sentenced to death for murder, argued that “his constitutional rights had been violated by the fundamental inadequacies of the instructions given to the jury in the sentencing phase of his trial” (1993, p. 20). With the purpose of preventing misunderstandings of this type and their consequences, Tiersma, who contributed to redrafting the Pattern Jury Instructions of California in 2005, suggests four maxims for drafting these legal documents effectively, which could be delineated as follows:

A) “Identify the parties clearly and consistently” (2009, p. 14). To illustrate the way in which the parties of a case tend to be introduced in certain jury instructions, Tiersma presented a real sample in which the three people involved in a rape were described as a person, another person and yet another person, which may be confusing for jurors. For that reason, he suggests that the best way to avoid ambiguity in the identification of participants is to use their names consistently or, at least, a descriptive term like the defendant.

B) “Use an example or illustration to clarify a difficult point” (2009, p. 15). According to Tiersma, this becomes particularly useful when the jury has to face abstract concepts.

C) “Develop a clear ‘template’ for the elements of a crime or cause of action” (2009, p. 15).

D) “Give the jurors clear guidance on how to go about their task” (2009, p. 16). The author insists on the idea that these instructions must include clear directions on how to reach a verdict and, additionally, how to fill out the verdict form.

In brief, poorly drafted jury instructions may have negative consequences for those involved in a trial, and thus the implementation of these measures when drafting them could become crucial for the correct functioning of the justice system. Making police cautions more comprehensible constitutes another relevant field of action for forensic linguists.
Gibbons (2003) points out that every person who is arrested or is about to face a police investigative interview must be properly informed about their rights, although there might be certain variations in the naming and the content of these instructions among countries:

In the USA these are generally referred to as “warnings” and the most widely used warnings are the “Miranda Warnings.” In most other Common Law countries, including England, Australia and Malaysia, they are known as “cautions”, and derive from an original English source which over time has evolved differently in these varied contexts. (2003, pp. 186-187)

The main problem behind many of these cautions is that, since they tend to be written in legal language, some citizens may find it hard to comprehend their actual meaning, and hence the cautions fail to achieve their objective (Rock, 2007). Despite the fact that police officers are allowed to change the original words of the cautions if that helps the detainee to understand their meaning, many studies prove the ineffectiveness of these cautions and the need to adopt a series of measures to ensure the fulfilment of their goal (Rock, 2007; Perkins & Grant, 2012). Even though there has been little acceptance of the work carried out by linguists in the legal community, John Gibbons successfully contributed to redrafting the New South Wales Police Caution. The stages of this process were described by the author as follows (2003, p. 188):

1) The police sent out the old versions and I and others suggested revisions in writing;
2) The police produced a draft revised Code of Practice which was sent out again for comment;
3) The police made some changes on the basis of the comments;
4) The revised draft was discussed at a large meeting involving many interested parties at Police Headquarters;
5) The police produced the final version of the Code of Practice without further consultation.
Gibbons explains that the original forty-one cautions included in the Code of Practice were reduced to five in the final version that was drawn up after his contribution. With the aim of exemplifying the kind of revisions implemented, the transformation of Caution 1, which was used at the initial stage of police investigative interviews, will be analysed. The original version was the following (2003, p. 189):

I am going to ask you certain questions which will be recorded on a videotape recorder. You are not obliged to answer or do anything unless you wish to do so, but whatever you say or do will be recorded and may later be used in evidence. Do you understand that?

Gibbons’ analysis of the text concluded that the syntactic subordination and coordination of the second sentence could be complicated to process at a cognitive level for some audiences because of the presence of two passive constructions without an agent and, especially, the fact that this caution forces the listener to deal with more than one concept at a time. In other words, the caution asks whether the interviewee has understood both the right to remain silent and the fact that s/he is being recorded, instead of asking about these concepts one by one. For those reasons, the revision suggested by Gibbons was as follows (2003, p. 191):

I am going to ask you some questions. You do not have to answer if you do not want to. Do you understand that? We will record what you say. We can use this recording in court/against you/against you in court. Do you understand that?

Gibbons states that the text presented above divides the caution into its two main issues, which facilitates its understanding. In addition, expressions such as are obliged and unless have been replaced by others which are more comprehensible (have to and if not, respectively). The author regrets the fact that he was never given the chance to sit with the police and work collectively on the abovementioned changes.
Nevertheless, it seems undeniable that this process constitutes a breakthrough for the forensic linguistic community and has set a precedent for future cooperation, even though Gibbons himself admits that it was merely consultative.

This section of the thesis has highlighted the linguistic difficulties that many citizens encounter when they are exposed to the written language of the law, with a special focus on the consequences derived from the lack of understanding of jury instructions and police cautions. In addition, a series of working methods suggested by relevant scholars in the field to pave the way for the development of the Plain English Movement have been presented.

3.3.2. The spoken language of the law

The forensic linguistic analysis of the spoken language of the law covers “from the moment of arrest and the first communication of rights through police interview, interrogation and charge, to the announcement of the verdict at the end of the trial” (Coulthard et al., 2010, p. 534). As mentioned earlier, the line between the spoken and the written language of the law is often blurred, given that there are many written documents which are meant to be performed orally, as is the case with police cautions, which have been addressed in the previous section. A case that reflects the importance of the oral aspect of police cautions will be presented and discussed. Tiersma (1993) explains that in Rhode Island v. Innis (1980), the police identified a suspect in the killing of a taxi driver with a firearm. At the moment of arrest, he was read the Miranda warning and he asked for a lawyer, which meant that the police officers were not allowed to interrogate him until the attorney was there. While they were taking the suspect to the police station, one of the officers told the other that there were many handicapped children in the area and added “God forbid one of them might find a weapon with shells and they might hurt themselves” (1993, p. 279).
It was at that moment that the suspect, who was worried about the children, told the officers where the gun was hidden. The debate that took place at the Supreme Court was whether those police officers had interrogated the suspect or not, considering that interrogation does not only include direct questioning, but also any functional equivalent of it. Ultimately, it was determined that the police officers had not questioned the suspect in any form. Nevertheless, Tiersma states that the utterance produced by the police officer “conveyed that something very bad might happen unless he provided the information” (1993, p. 280), so it was an indirect way of interrogating the suspect. In my view, the suspect was interrogated, as Tiersma argues. If the utterance is analysed from a pragmatic point of view, even though the locutionary force, that is, what was literally said, does not reflect a prototypical question, its illocutionary force, that is, the intention behind those words, was to obtain that information and, indeed, the perlocutionary effect was that the suspect told the officers where the gun was (see Austin, 1962).

The area of forensic linguistics that specializes in the spoken language of the law is characterized not only by the analysis of the oral interactions that take place at the moment of arrest, but also by a major focus on those that occur during police investigative interviews. Valero-Garcés (2018) points out that these interviews tend to have four main objectives: firstly, to discern whether a crime has been committed and, if so, to determine what the crime was; secondly, to discover evidence that leads to the identification of those who committed the crime; thirdly, to generate evidence that prevents the criminal from mounting an inappropriate defence in court; and lastly, to find out whether the witnesses are portraying the facts accurately or whether they are exaggerating or twisting them. Oxburgh et al.
reviewed the research that has been conducted over the years on the types of questions asked in police interviews and discussed how these could be categorized, as well as which types of questions allow interviewees to express their ideas more fully, as is the case with open questions, which can “generate free narratives and longer responses from witnesses compared with closed questions” (2010, p. 48). Regarding the psychological aspect of these interactions, Baldwin stated, after analysing 600 audio and video tapes of police interviews, that even though police officers often claim to apply complex psychological principles in their interviews, they tend to lack social skills, for instance when they repeatedly interrupt the interviewee or when they ask questions “in such quick-fire succession that suspects [are] not given the opportunity to put their versions of events coherently” (1993, p. 349).

It must be borne in mind that the evidence which is presented in court after these interviews is not a literal transcription of the dialogue that takes place between the police officer and the suspect or the witness, but a simplified report written by the officer that may contaminate the original narrative (Vázquez Maroño, 2014; Haworth, 2018). Haworth argued that this transformation of the interviewee’s speech into an official written document tends to have a negative impact on their defence in court:

[…] the credibility of a witness can be destroyed by counsel highlighting differences between what is said in court, and what was (recorded as being) said at interview […]. The effects can be devastating, especially for defendants, and so the accuracy of interview records must be crucial. (2018, p. 428)

Coulthard (1996) presents the Bentley case as an example of how police officers have the chance to manipulate the suspect’s words and facilitate their imprisonment.
In the 1950s, Derek Bentley and Chris Craig were arrested by the police while they were attempting to burgle a warehouse. At the moment of arrest, Derek allegedly told Chris, who was in possession of a revolver, “let him have it, Chris”, and immediately after those words were uttered, Chris fired the gun and killed a police officer. The interpretation of the jury was that Derek was indirectly asking Chris to shoot, for which he was sentenced to death. The most remarkable aspect of Derek’s defence was the fact that he continually insisted that he had not uttered such words at all, which was received with great scepticism. However, a few decades later, the case of Paul Dandy in 1989 changed the public’s opinion about the credibility of police officers, given that it was irrefutably proved by Electro-Static Deposition Analysis that the officers who conducted his interview had added a couple of incriminating sentences to its record some hours after they had drafted the rest of the document. In order to avoid manipulations of this kind, many of the interviews conducted today in some jurisdictions “are video-recorded and almost all of the rest are audio-taped using stereo tapes with a pre-recorded voice announcing the time at ten seconds intervals, in order to prevent subsequent editing” (Coulthard, 1996, p. 122).

Despite the many improvements in the field over the last decades, Haworth (2018) complains about the fact that physical evidence is carefully preserved to avoid the slightest contamination before it is presented in court, whereas the treatment given to interview data is still far from being equal.
For that reason, she suggests a series of measures to ensure the preservation of the original evidence extracted from investigative interviews and, as a result, to avoid the miscommunication of the interviewee’s words and its legal implications:

A) “All police recording equipment should be switched to digital rather than outdated audio cassette tapes, in order to ensure better data quality at source” (2018, p. 446).

B) Police officers should embrace the usage of video recordings to complement the verbal production of the interviewee.

C) There should be a common code of practice for the transcription of interviews that specifies what to do in cases of pauses, overlaps, etcetera. The author further explains that “this would ensure consistency in production and interpretation, which would be especially beneficial at the courtroom evidence stage” (2018, p. 446).

D) Transcribers should be properly trained. This means that they should learn some basic notions about legal language, the main differences between its written and spoken format and what principles they should follow during the editing process.

E) The people in charge of evaluating the interview as part of the evidence should not only take into consideration the official transcript, but also the original recording.

F) “The practice of reading aloud the interview transcript in court should be abandoned”, given that “it adds a further unnecessary layer of distortion, confusion and corruption to the interview data” (2018, p. 447). As pointed out earlier, this practice is usually beneficial for the prosecution, since the credibility of the witness is put into question if there is any minor difference between what they state in court and what they stated during the interview.
The cooperation between police forces and linguists can provide a route to improving the quality of investigative interviews, which would have a positive impact on the treatment received by suspects and witnesses and, by extension, on the legal system as a whole.

There is another type of interrogation that stands as a major concern for forensic linguists, which is the one that takes place in court between a lawyer and a hostile witness. Gibbons (2005) explains that when an interaction of this kind takes place, the lawyers are in a significantly better position than the witnesses, given that they have the advantage of asking the questions and, as a result, they are in control of the direction of the conversation. The author further suggests that lawyers tend to pursue four main objectives during the trial, which will be presented below together with the persuasive techniques through which they can be achieved:

A) To reinforce their own version of the facts. Gibbons states that this objective tends to be achieved in two different ways. Firstly, the attorney may formulate the questions and portray the facts in a way that does not allow the witness to provide an alternative narrative, for instance when they ask a yes/no question. Secondly, the lawyer might force the witness to accept their version of the story with certain linguistic practices, such as the use of assertions like “He came into the room”, instead of asking “Did he come into the room?” (2005, p. 196).⁴ The author refers to these techniques as “controlling the information” and “controlling the person”, respectively (2005, p. 194).⁵

B) To increase the credibility of the witness who is on their side by strengthening their social status during the interview, which is supposed to have a positive impact on the reliability of their story.

C) To challenge the veracity of the testimonies provided by the hostile witness and their defence.
The most common way to accomplish this is by finding contradictions between the witness’ latest statement and the previous one(s), as has been previously explained in this section of the thesis.

D) To cast doubt on the credibility of the hostile witness who has been interrogated. In this case, lawyers usually adopt the opposite strategy to the one they used with their own witnesses. In other words, they try to create in the judge and the rest of the audience the impression that the hostile witness “lacks intelligence, maturity, moral ethics, emotional control, the ability to reason and reliability” (2005, p. 194).⁶

⁴ My own translation.
⁵ My own translation.
⁶ My own translation.

The expertise of the forensic linguist is also required in situations with vulnerable witnesses, as is the case with those who cannot speak the language of the court. Kredens (2016) explains that the role of the public service interpreter (PSI) could be categorised into three main domains. The first one is associated with the translation of others’ utterances and, by extension, with problems of pragmatic and socio-pragmatic equivalence. The second one is associated with economic and political differences that make the interpreter a cultural mediator. Lastly, Kredens highlights the fact that “roles can arise spontaneously in any PSI setting; the interpreter can become […] a confidant, an expert witness […], an ally […], or even a messenger […], and this list is by no means extensive” (2016, p. 66).

Children are another type of vulnerable witness involved in legal disputes who need to be protected. Coulthard (2010) explains that there have been some major improvements in the way in which information is elicited from them, and a representative example of this progress is the fact that some judges are giving children permission to video-record their testimony before the trial takes place or allowing them to communicate from somewhere outside the courtroom.
This section has expounded the main areas of action of forensic linguists in oral legal contexts: the communication of rights at the moment of arrest and in police investigative interviews, as well as the spoken interactions that take place in such scenarios and in the courtroom. The analysis of these interactions, the implementation of the measures suggested by linguistic experts and the recognition of their work could be crucial to protecting certain witnesses and suspects and to ensuring a better functioning of the legal system.

3.3.3. The linguist as an expert witness

Forensic linguists can be called upon to testify in court as expert witnesses when language is involved in a case. Linguists have played a plethora of roles in this area and, due to space constraints, only a few will be illustrated before authorship attribution studies, which constitute the actual focus of the thesis, are presented and explained in more depth.

Linguists can be required in court to analyse legal cases that are considered language crimes. Among the many types of language crimes that can be examined by forensic linguists, vilification and performative crimes will be exemplified and briefly discussed. According to Gibbons, vilification “may target a specific individual or corporation, in which case it is handled in common law as ‘defamation’: slander if it is non-permanent form or libel if it is in a recorded form” (2011, p. 236). Shuy acted as an expert witness in a famous libel case in which Frank Celebrezze, a member of the Ohio Supreme Court, accused The Cleveland Plain Dealer of costing him his re-election after it published articles suggesting that “Celebrezze had cast his judicial vote in two criminal cases in exchange for campaign contributors” (2010, p. 99). On the other hand, offering a bribe or threatening someone are examples of performative crimes.
Regarding the latter, Fraser states that a successful threat expresses three concepts: “the intention to perform an act, the belief that the state of the world resulting from that act is unfavourable to the addressee [and] the intention to intimidate the addressee” (1998, p. 162). The author further explains that not all threats are illegal and provides examples of both situations: while threatening to tell someone’s current partner about an infidelity is legal, threatening to expose an infidelity to someone’s partner, or even to the press, unless a payment is made is illegal. As can be inferred from this example, the line between what is legal and what is not is sometimes blurred, and the expertise of a linguist may be required for a correct application of justice.

Legal disputes over trademarks constitute another relevant area of action for forensic linguists. Shuy points out that disputes of this kind tend to begin in two different ways: “with charges of trademark infringement and with charges of unfair competition” (2002, p. 44). He adds that, when one of these situations arises, “trademark attorneys tend to use accountants and other experts to deal with representations of actual damage and linguists to address the issues of linguistic similarities and the ways that language use can give clue to intentions” (2002, p. 45). To illustrate the nature of these cases, the contribution of Shuy to McDonald’s Corporation v. Quality Inns, International, Inc. (1988) will be described. The author, who was called upon by Quality Inns as an expert witness, described the origin of the conflict as follows:

In the fall of 1987, a large hotel chain, Quality Inns International, made public its plan to create a new chain of inexpensive hotels to complement its other market brands.
The name of this new hotel was to be McSleep Inns and they planned to open some 200 McSleep franchises within three years. Three days after this initial announcement, the McDonald’s corporation, the famous fast-food marketer, sent a letter to Quality Inns alleging trademark infringement and demanding that Quality Inns not use the proposed McSleep name. (2002, p. 95)

Quality Inns argued that people were unlikely to associate McDonald’s with McSleep, given that the two belong to different types of business. They asked Shuy to analyse the case from a linguistic perspective, and he compiled a corpus of words containing the prefix Mc- that were unrelated to McDonald’s and categorised their meaning according to the linguistic context in which they were found. These words included proper names, acronyms, products made by Macintosh, parodies of fast-food products and certain words that were intended to mean something “basic, convenient, inexpensive and standardized” (2002, p. 99). The last group was of great interest for the case. Two notable examples were McLaw, which was used in The California Law Review to describe cheap and accessible legal services, and McArt, which appeared in Forbes to describe the massive marketing campaigns that characterised certain art stores. In other words, the defence of Quality Inns was built upon the idea that the prefix had grown into a generalised lexical item and could therefore have, in some communicative contexts, a meaning separate from the one associated with McDonald’s. According to Shuy (2002), McDonald’s called on another linguist who stated that, as a theoretical linguist, he considered that the only way to determine the meaning of a word is to ask people directly what they think of it.
In my view, while Shuy’s contribution was well built, the statements of the McDonald’s linguist could be seen as biased, since they deny the validity of the findings provided by well-established disciplines of applied linguistics such as pragmatics. Nevertheless, the judge decided that the prefix Mc- was not generic and that it was possible to associate McDonald’s with McSleep, which meant a legal defeat for Quality Inns International.

Up until this point, this chapter has presented a definition of forensic linguistics, a summary of its historical development and an explanation of its main applications, with the aim of offering a general introduction to the discipline, especially for those readers who are not familiar with it. Due to space limitations, some of these applications have been discussed only briefly or omitted altogether, for which I apologise. Even though authorship attribution studies constitute another area of action of the linguist as an expert witness, and hence could have been included in this section, a separate section is devoted to discussing their theoretical foundations and methods in more depth, given that they constitute the actual focus of this investigation.

3.4. Authorship attribution studies

This section provides a description of the main principles of authorship attribution studies, as well as a critical review of relevant investigations in the field. According to Bozkurt et al., authorship attribution studies can be defined as follows:

Authorship attribution (AA) is the process of attempting to identify the likely authorship of a given document, given a collection of documents whose authorship is known. Applications of authorship attribution include plagiarism detection (e.g. college essays), deducing the writer of inappropriate communications that were sent anonymously or under a pseudonym (e.g. threatening or harassing letters), as well as resolving historical questions of unclear or disputed authorship. (2007, p.
1)

In other words, the goal of forensic authorship attribution is the identification of the author(s) of anonymous or disputed documents through the analysis of their linguistic features, under the assumption that each speaker has an individual variety of their native language, known as their idiolect, and that “this idiolect will manifest itself through distinctive and idiosyncratic choices in texts” (Coulthard, 2004, p. 431). As pointed out by Turell, “[t]he linguistic production of individual speakers and writers can sometimes reveal information about an individual’s age, gender, occupation, education, religion […], political background […], geographical origin [or] ethnicity” (2010, p. 212).

Even though the term idiolect will be used consistently in the thesis, it is worth mentioning that the concept is highly controversial. Turell highlighted that its generalised usage may derive from an idealised vision of language, since “it could be argued that it is impossible to determine whether a given feature observed in a recording or a written text is idiolectal, dialectal, sociolectal, genderlectal, constrained by age factors, etc.” (2010, p. 216). She therefore added that “idiolects can only be determined with countless amounts of data from each individual, something which never happens when dealing with real forensic linguistic data” (2010, p. 217), and suggested the use of the notion idiolectal style in forensic authorship contexts. This could be defined as “the set of options that writers take from the linguistic repertoire available to them as users of a specific language” (2010, p. 217), that is, the distinctive way in which an individual applies a linguistic system shared by many.

Studies in forensic authorship contexts may concern several types of data, and hence Coulthard et al.
(2010) make a distinction between single text and comparative authorship problems, or in other words, cases involving an open and a closed set of suspects, respectively. The authors explain that “a single text problem occurs where comparison texts are unavailable or where an investigation is not yet narrowly focused on a small pool of suspects” (2010, p. 536). They further suggest that these cases usually involve a set of texts that can be unified and analysed as a single document, and that the forensic linguist is then expected to provide information about the author of such a text by classifying its idiolectal features, as well as to clarify the possible meaning of ambiguous utterances.

In contrast, comparative authorship problems involve a disputed text whose linguistic features have to be compared with the idiolect of a suspect or a set of suspects in order to determine its likeliest authorship. Coulthard et al. (2010) explain that a considerable proportion of the cases in which the expertise of a linguist is required are of this nature. Indeed, the research conducted in the present thesis falls into this category, given that its main goal is to define Shakespeare’s and Marlowe’s idiolects through a linguistic analysis of their undisputed works in order to discern the likeliest authorship of a disputed play.

Queralt (2014) points out that, before any investigation takes place, the forensic linguist must decide whether the analysis of a given text will constitute a proper linguistic case, depending on its length and its quality. She explains that, even though the academic community has not reached an agreement on the minimum length necessary to carry out a reliable linguistic analysis, qualitative studies can be conducted with shorter texts of around 150 words, whereas quantitative studies tend to require larger samples, since they generally involve the use of computer programs to find linguistic patterns.
On the other hand, the quality of a text is related to whether the sample contains enough linguistic features to reflect the idiolect of its author. Regarding this issue, Kredens (personal communication, February 17, 2019) defends the idea that the genre of a text has a major impact on the number of idiolectal features that it includes. This means that, for instance, it would be almost impossible to determine the authorship of a shopping list, since a text of this genre leaves little room for idiolectal features.

Now that the most basic concepts on which this discipline is built have been explained, the main applications into which authorship attribution studies can be divided will be addressed, with a special focus on historical questions of disputed authorship in general, and that of Arden of Faversham in particular.

3.4.1. Attribution of authorship in cases of plagiarism

This section expounds the notion of plagiarism together with the role of the forensic linguist in this area and the tools that have been created to facilitate their work. Olsson states that plagiarists act in three distinct ways (2008, p. 108):

A) Archaeological plagiarists (the most common type) take an artefact and try to disguise its surface by substituting some of its parts and by re-arranging others.

B) Diachronic plagiarists take an artefact from an earlier period and try to disguise its chronicity by translating it into an artefact of their own time.

C) Cultural plagiarists transpose elements of their own culture onto a cultural artefact of another culture or, alternatively, try to take cultural artefacts from elsewhere and convert them into own culture substitutes.

Barrón-Cedeño et al. (2014) suggest that a text can be plagiarised in four different ways, according to the categories delineated by Martin (2004) to describe the nature of this crime. Firstly, someone can plagiarise another individual’s ideas or theories without giving them due recognition.
Secondly, a section of a text or even a whole text can be copied word for word or with slight modifications. Thirdly, the sources of a text can be plagiarised when an author cites the sources presented by another author without clarifying that they were taken from that author’s work. Finally, they mention authorship attribution issues, that is, cases in which someone claims to have written a text that was in fact produced by someone else. In addition to these modalities of plagiarism, Sousa-Silva uses the term translingual plagiarism to refer to those cases in which “plagiarists lift the text from one language, have it translated to another language, and subsequently use it as their own” (2014, p. 72). As the author further explains, the identification and demonstration of translingual plagiarism is particularly complicated, since the resulting texts have no apparent similarities with the originals.

The approaches adopted for the detection of plagiarism can be divided into external and intrinsic (Potthast et al., 2009, as cited in Sousa-Silva, 2013). Whereas the former compares the suspicious text with a corpus of original documents to find linguistic similarities, the latter “exclusively analyses the input document, i.e., does not perform comparisons to documents in a reference collection” (Foltýnek et al., 2019, p. 10). The purpose of the intrinsic approach is to find stylistic inconsistencies that might reflect an attempt at plagiarising an external source, which would then require a further external analysis.

A well-known case of plagiarism in the Spanish academic community took place when the Rector of the Universidad Rey Juan Carlos, Fernando Suárez, was found guilty of copying verbatim a text produced by the former Dean of the Faculty of Law at the Universitat de Barcelona, Miguel Ángel Aparicio.
When the news broke, professors and researchers from different universities signed a petition asking for the resignation of Fernando Suárez, who brought forward the elections to appoint a new Rector.7 This case illustrates the presence of plagiarism in academic settings and the need for linguistic expertise to protect the intellectual property rights of scholars. To this end, an increasing number of computer programs have been developed to detect plagiarism, such as Turnitin, Unicheck and Urkund. Nevertheless, these programs are only meant to assist linguists by giving them evidence with which to discern whether plagiarism has been committed. The final decision, as well as the legal measures that should be adopted, must be determined by the forensic expert (Barrón-Cedeño et al., 2014).

7 https://www.elmundo.es/madrid/2017/02/03/5894c721e2704e80678b4615.html

3.4.2. Attribution of authorship of criminal texts with an open set of suspects

Forensic linguists can be called upon to study the authorship of anonymous texts that cannot be associated with any possible suspect. In such cases, the forensic expert is expected to draw up a profile of the author based on their idiolectal features, which can provide crucial information for the development of the investigation (Coulthard et al., 2010).

One of the best-known cases in which the analysis of the idiolectal features of an anonymous text contributed to the conviction of a terrorist is that of the Unabomber. According to Coulthard and Johnson (2007), between 1978 and 1995, an American citizen who was later known as the Unabomber sent bombs through the post to people working at universities and airlines. In 1995, he sent a manuscript of 35,000 words called Industrial Society and its Future to six national publications and offered to stop sending bombs if his manuscript was released to the public.
The Washington Post agreed to publish the document and, a few months later, a man contacted the FBI claiming that the text contained a series of expressions that were commonly used by his brother, with whom he had not been in touch for more than ten years. He placed special emphasis on the fact that his brother used to repeat the expression cool-headed logician, which appeared in the manuscript and is a distinctive idiolectal feature that had a major impact on the subsequent analysis. When the FBI finally located and arrested his brother, they found a 300-word document that he had written more than a decade earlier, and its linguistic analysis revealed a high degree of resemblance to the 35,000-word manifesto, which was the ultimate proof of his guilt.

The anthrax case is also one of the best-known investigations involving the forensic linguistic analysis of criminal documents with an open set of suspects. Olsson (2004, 2008) reports that, after the attack on the Twin Towers on September 11, 2001, certain public figures received envelopes containing what were, allegedly, letters written by schoolchildren. However, these documents contained anthrax, a lethal biological agent that caused the death of five people and sickened another seventeen, according to the FBI’s official website.8 Olsson (2004, 2008) explains that the American authorities linked this terrorist attack to Al-Qaida and tried to discern whether the messages contained in these envelopes had been written by a native speaker of English or of Arabic. The author offers a transcription of the message contained in the envelope that was sent to Senator Daschle: “09-11-01. You can not stop us. We have this anthrax. You die now. Are you afraid? Death to America. Death to Israel. Allah is great” (2004, p. 104). He states that the style is considerably similar to that of the letter that was sent to Tom Brokaw: “09-11-01. This is next. Take penacilin now.
Death to America. Death to Israel. Allah is great” (2004, p. 104). Olsson drew the following conclusions from the study of the samples:

Note the terseness of the style. It is far from easy for a learner of English to use the language in this concise, precise way. Moreover, it is probably indicative of someone with a good education and —paradoxically— someone who is used to doing a lot of writing. The misspelling ‘penaciling’ and the pseudo-pidgin style ‘You die now’ are probably just red herrings and should be ignored. (2004, pp. 104-105)

8 https://www.fbi.gov/history/famous-cases/amerithrax-or-anthrax-investigation

The FBI states on its website that an exhaustive review of the case led to the incrimination of Dr. Bruce Ivins, a researcher at the United States Army Medical Research Institute of Infectious Diseases (USAMRIID), who killed himself before charges could be brought.

As the two cases presented above show, the role of the forensic linguist in drawing up a profile of the possible author of an anonymous criminal text may be crucial to narrowing down the scope of a police investigation, since certain idiolectal features can provide information about the individual’s gender, age, native language or educational background, among other details. The following section addresses the role of the linguist in those cases in which the criminal text can be associated with a series of possible authors.

3.4.3. Attribution of authorship of criminal texts with a closed set of suspects

In many of the cases in which forensic experts are required to analyse the authorship of a criminal text, there is already a list of possible authors and, as a result, the linguist is expected to determine which candidate’s idiolect the disputed text most closely resembles (Coulthard et al., 2010).
The protocols followed in cases involving a closed set of suspects can be illustrated by the case of Dulceliz Díaz, who had allegedly killed her 5-year-old daughter and committed suicide in 2007. James R. Fitzgerald, a former FBI agent who played a crucial part in the case, explains that the forensic linguistic analysis of suicide notes often seeks to discern whether the letter was indeed written by the victim or whether it was produced by someone else in an attempt to cover up a murder (2014). Therefore, he states that suicide notes should always be compared with undisputed texts of the victim and, at the same time, with texts produced by relatives or acquaintances who may be seen as potential suspects.

According to the author, an alleged suicide letter was sent to three members of Dulceliz’s family through an email account that she shared with her former boyfriend, Alberto Pérez, who was also the father of her daughter. The email that was sent from Díaz’s account (see Appendix 1) was considered the disputed document that needed to be compared with undisputed texts produced by Díaz herself and by the main suspect, Alberto Pérez. These undisputed texts were other emails, blog entries and forum posts. The linguistic analysis of the disputed email showed that it was highly probable that Pérez had written it and, at the same time, unlikely that Díaz had produced it. The email contains the form gonna, which was used multiple times by Pérez in his undisputed texts, whereas Díaz’s samples showed a preference for the form gunna. The expression peace out, which appears in the email, was also found in many of the suspect’s online posts and in none of Díaz’s. Similarly, there is an ellipsis in the email, a feature that Díaz used only once in her 438 undisputed texts, while it appeared 119 times in Pérez’s 393 reference samples.
Agent Fitzgerald was called upon as an expert witness during the trial and states that this forensic linguistic analysis had a major impact on the final verdict, in which Alberto Pérez was sentenced to death for a double homicide. In sum, the attribution of authorship of a criminal text with a closed set of suspects is based on the classification of the idiolectal features of every suspect for further comparison with those of the disputed document.

3.4.4. Attribution of authorship of historical texts

Forensic experts may be asked to examine the authorship of literary and other types of historical texts by tracing the idiolectal features of the possible authors through a linguistic analysis of their undisputed works, for further comparison with the features of the disputed sample. Although McMenamin listed the most salient authorship tests of his time almost three decades ago (1993), a bewildering variety of methods has been used to study the authorship of historical texts in recent years, owing to the emergence of new technologies and the possibilities that they offer. This section will therefore only discuss those procedures that will be taken into consideration for the present research or that have been used by other scholars to analyse Arden of Faversham.
Despite this, I would like to mention Canter and Chester’s investigation demonstrating the unreliability of the Cusum technique (1997), which was proposed by Morton and Michaelson (1990) to discriminate between texts written by one author and collaborative texts; Larner’s research on the usage of formulaic expressions as authorship discriminators, even though he did not reach conclusive results (2014); and, more importantly, Grant’s study to identify effective authorship markers combining discriminant function analysis with Bayesian likelihood measures, where he argued for the importance of designing a method that leads to no cases of misattribution, even if its success rate is not as high as that of other methods that do have the potential to misattribute samples (2007).

The first attempt at making a statistical description of an author’s literary style, in order to show that idiolectal features could be quantified for authorship attribution purposes, was that of Mendenhall (1887), which was based on the calculation of the average number of letters per word (see Section 3.2). This work set a precedent, and many subsequent studies have analysed the length and/or the frequency of words and other linguistic items for the same purpose. This approach can be observed in the research conducted by Moerk (1973), who focused on the samples provided by thirty American college students who were asked to write freely a short story that began with this sentence: “He (She — according to the sex of the subject) stood at the window, clasped his (her) hands behind his (her) back and stared out into the night” (1973, p. 51). According to the author, “this one sentence induces nearly all writers to adhere to an area of content concerning personal problems, social interactions, feelings and memories, so that content per se should produce no or minimal differences in style” (1973, p. 51).
With the purpose of providing a statistical description of the style of these texts, Moerk quantified the frequency of certain types of words according to their length, grammatical category or syntactic function, as well as the average number of words per sentence and other variables. Among these, the average number of words per sentence of a text, which can be calculated by dividing its total number of words by its number of sentences, seems a potentially distinctive idiolectal feature, given that it can reflect a preference for certain types of syntactic structures. In other words, authors with a low average number of words per sentence may tend towards simple sentences, whereas those whose average is higher may prefer more complex syntactic constructions. For that reason, the calculation of this parameter will be considered for the present study and programmed as one of the tasks that can be carried out by the software ALTXA (see Section 4.5.2).

Another simple but effective procedure in studies of this nature consists in calculating the relative frequency of a list of chosen keywords within a disputed sample for further comparison with their relative frequency in the reference texts of the possible authors. The relative frequency of a word in a text, expressed as a percentage, can be obtained by dividing the number of times that the word appears in the text by the total number of words of the text and multiplying the result by a hundred. Thomas Merriam (1996) compared a Shakespearean corpus formed by the 36 plays that appear in the First Folio with a corpus that included 7 plays by Christopher Marlowe in terms of the relative frequency of a series of words that he had identified as idiolectal markers of the latter due to their prominent presence in his play Tamburlaine the Great.
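The two calculations defined above (average number of words per sentence and relative frequency of a keyword) can be sketched as follows. This is only a minimal illustration and not the implementation used in ALTXA: the function names, the regular expressions used to split words and sentences, and the sample sentence are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Return lowercase word tokens (assumed rule: runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on ., ! and ? (a simplifying assumption) and drop empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def average_words_per_sentence(text):
    """Total number of words divided by number of sentences."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return len(tokenize(text)) / len(sentences)


def relative_frequency(text, keyword):
    """Occurrences of keyword divided by total words, times a hundred."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100 * tokens.count(keyword.lower()) / len(tokens)


# Invented sample: 8 words in 2 sentences, with "king" appearing twice.
sample = "The king is dead. Long live the king!"
print(average_words_per_sentence(sample))  # 4.0
print(relative_frequency(sample, "king"))  # 25.0
```

In an attribution setting, the same functions would be applied to the disputed sample and to each reference corpus so that the resulting values can be compared.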
When these two reference corpora were compared, the keywords that he had selected presented a considerably higher relative frequency in the corpus of Marlowe, which supported the reliability of the method. Afterwards, the author calculated the frequency of those Marlowian keywords in each of the plays that formed the two reference corpora individually and noticed that their frequency in Henry VI, Part I, which had traditionally been attributed to Shakespeare alone, was similar to the one that they had in the plays written by Marlowe, whereas their values in the rest of the Shakespearean texts were much lower. This is a remarkable result, given that this play was attributed to both authors as a collaborative text years later (see Section 2.2), which would explain the frequency of Marlowian keywords in the text. For this reason, the quantification of the relative frequency of a set of keywords selected by the researcher will be considered in this thesis and introduced as one of the functionalities of the software ALTXA (see Section 4.5.1).

A more complex methodology for the attribution of authorship of historical documents is based on the quantification of their lexical richness, expressed as a percentage, which can be obtained by dividing the number of distinct words that a text contains, also known as its types, by its total number of words, also known as its tokens, and multiplying the resulting number by a hundred. Baker (1988) conducted a study in which he compared the lexical richness9 of the plays and poems written by the two playwrights that constitute the focus of this doctoral thesis, William Shakespeare and Christopher Marlowe, to discern whether the results derived from the calculation of this parameter presented enough intra-author consistency and inter-author variation, that is, whether the Shakespearean values were similar among themselves and sufficiently different from the Marlowian ones, which were also expected to be consistent (see Turell, 2010 for a more detailed insight into the notions of intra-author consistency and inter-author variation in idiolectal studies). His analysis revealed that the lexical richness of the Shakespearean plays remained relatively stable, whereas that of the works of Marlowe presented more fluctuations, and Baker therefore suggested that Christopher Marlowe was able to adopt more registers than the Bard. In addition, the results showed that, despite the fluctuations, Marlowe could provide his texts with more lexical richness than Shakespeare.

Short texts are more likely to present a high lexical richness, given that the chances of repeating words are lower. Although Baker suggests that this parameter is not dependent on the length of the texts unless there are overwhelming differences among them, I would argue that it is not rigorous to compare works of different lengths in terms of it. Nevertheless, the fact that Baker’s study associated many of the works of Shakespeare and Marlowe among themselves using this discriminator seems a solid reason to consider its usage in the present thesis, although it needs to be applied differently to avoid the inconsistent results that may derive from the comparison of samples of dissimilar lengths, as will be developed in the following chapter. Therefore, the quantification of this parameter will be programmed as one of the tasks that can be carried out by ALTXA (see Section 4.5.3).

9 Baker referred to this as vocabulary richness.
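The calculation of lexical richness described above, and its sensitivity to text length, can be illustrated with a short sketch. The function names and the tokenisation rule are assumptions made for the example. The length control shown here (measuring only the first n tokens of each text) is merely one possible way of comparing samples of equal size, and it is not necessarily the procedure adopted in this thesis or in ALTXA.

```python
import re


def tokenize(text):
    """Return lowercase word tokens (assumed rule: runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_richness(tokens):
    """Number of types divided by number of tokens, times a hundred."""
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)


def lexical_richness_at_length(text, n_tokens):
    """Lexical richness of the first n_tokens tokens only (hypothetical length control)."""
    return lexical_richness(tokenize(text)[:n_tokens])


short = "to be or not to be"
longer = short + " that is the question to be or not to be"

# The shorter text scores higher simply because it has had fewer chances to repeat words.
print(lexical_richness(tokenize(short)))   # 4 types / 6 tokens
print(lexical_richness(tokenize(longer)))  # 8 types / 16 tokens, i.e. 50.0

# Measured over the same number of tokens, the two values coincide.
print(lexical_richness_at_length(longer, 6) == lexical_richness(tokenize(short)))  # True
```

The example shows why raw values for texts of different lengths are not directly comparable: the longer text contains the shorter one in full and still obtains a lower score.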
The main problem with the procedures described so far is their inaccuracy when they are used to analyse the authorship of short samples, which is one of the most difficult tasks in forensic linguistics, since it is harder to identify quantifiable idiolectal features in such texts (Queralt, 2014). Nevertheless, n-gram tracing, a more recent method than those previously described in this section, is known for its effectiveness in the attribution of authorship of short texts (Grieve et al., 2018). What n-grams are and how this method can be used for forensic linguistic purposes is explained as follows:

[A]n n-gram is defined as a sequence of one or more linguistic forms (e.g. 1-grams or 2-grams) at any level of linguistic analysis (e.g. words or characters) […]. The basic idea behind n-gram tracing is to calculate the percentage of n-grams that occur in a questioned document that also occur at least once in a possible author writing sample. This process is repeated for each possible author and the text is then attributed to the possible author whose writing sample contains the highest percentage of the n-grams from the questioned document. (Grieve et al., 2018, p. 6)

This relates to the conventional depiction of n-grams as combinations of consecutive characters or words that occur within the same sentence (see also Cheng et al., 2006; Ishihara, 2014). For instance, if the sample He looked at her. She seemed concerned. is analysed from this perspective, the word 2-grams of this short text would be He looked, looked at, at her, She seemed and seemed concerned, whereas her She would not constitute a word 2-gram, given that these words belong to different sentences. Grieve et al.
(2018) used n-gram tracing to carry out an authorship analysis of the Bixby Letter, which is known to be a short message of 139 words in which Abraham Lincoln gave his condolences to a widow called Lydia Bixby after the loss of her five sons in the American Civil War. According to the authors, this piece of correspondence allegedly written by Abraham Lincoln has raised substantial debate among linguists, given that some historians claim that it was written by John Hay, who was his personal secretary. For that reason, they compiled a series of Hay’s undisputed written documents for his reference corpus and, to compile the reference corpus of Lincoln, they selected a group of texts that he wrote before he hired John Hay, in case Hay himself had written other samples that have been traditionally attributed to Lincoln. Once these samples were gathered, they analysed the character and the word n-grams that the letter shared with the undisputed corpora of the two candidates and determined that John Hay was its likeliest author. It is worth mentioning that the authors had previously conducted a pre-study in which they analysed undisputed texts of Lincoln and Hay as if they were disputed to assess the reliability of n-gram tracing. Conducting a case study with methods that have already been tested in a pre-study is the approach that has been adopted for this doctoral thesis (see Chapter 4).

Other investigations that have proved the effectiveness of n-gram tracing in the attribution of authorship of small samples are those of Wright (2017) and Cicres and Queralt (2019). Wright worked with a set of emails extracted from the Enron Email Corpus and correctly attributed the authorship of most of the samples among the 176 possible authors. The success rate was especially high when the analysis traced shared word n-grams of between two and six words.
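The procedure quoted from Grieve et al. (2018) can be sketched in a few lines. The sketch below works with word n-grams only, restricts them to sentence boundaries as in the example “He looked at her. She seemed concerned.”, and splits sentences naively on full stops, question marks and exclamation marks. The function names and the tokenisation rules are assumptions made for illustration and do not reproduce the tools used in the cited studies or in ALTXA.

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams that do not cross a sentence boundary."""
    grams = set()
    # Naive sentence split on ., ! and ? (an assumption of this sketch).
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        for i in range(len(words) - n + 1):
            grams.add(tuple(words[i:i + n]))
    return grams


def overlap_percentage(questioned, reference, n):
    """Percentage of n-gram types of the questioned text found at least once in the reference."""
    questioned_grams = word_ngrams(questioned, n)
    if not questioned_grams:
        return 0.0
    reference_grams = word_ngrams(reference, n)
    return 100.0 * len(questioned_grams & reference_grams) / len(questioned_grams)


def attribute(questioned, candidates, n=2):
    """Return the candidate with the highest overlap, plus all scores.

    `candidates` maps an author name to that author's writing sample.
    """
    scores = {name: overlap_percentage(questioned, sample, n)
              for name, sample in candidates.items()}
    return max(scores, key=scores.get), scores
```

Because sets of n-gram types are compared, an n-gram counts once regardless of how often it recurs, which matches the “at least once” condition in the quotation.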
On the other hand, Cicres and Queralt analysed texts produced by a group of schoolchildren in Catalan and concluded that word 3-grams and 2-grams could effectively classify the samples in terms of the age of the authors, which ranged from 6 to 11 years old. Given that the scenes of the play Arden of Faversham will be analysed as independent texts for the present investigation, the selection of a method that seems to be effective for the attribution of authorship of short samples is crucial. For this reason, n-gram tracing will be used in the analysis and programmed as one of the functionalities of ALTXA (see Section 4.5.4).

As a matter of fact, n-gram tracing has already been used to analyse the play Arden of Faversham, since Taylor (2019) studied the authorship of the first 274 words of the tenth scene of the play, that is, Scene IV.i, using this method. The author states that he selected this excerpt because “(1) it owes nothing to the narrative sources of the play, and (2) it begins a long stretch of text that recent investigators […] agree was not written by Shakespeare” (2019, p. 859). The selection of such a specific portion of the text could be perceived as arbitrary, since analysing at least a complete scene of the play appears to be more rigorous than making an artificial cut of the manuscript. Taylor decided to consider 15 possible candidates for the attribution of authorship of the text. These were Munday, Greene, Nashe, Lodge, Shakespeare, Marlowe, Peele, Lyly, Kyd, Drayton, Wilson, Achelley, Chettle, Hathway and Thomas Watson, who was determined as the likeliest author of the text at the end of the study. To compile the reference corpora of the abovementioned authors, he decided to include dramatic and non-dramatic texts that were written between 1585 and 1594, although he further stated that “for some candidates, however, it has been necessary to extend the date range” (2019, p.
857), given the lack of undisputed works of authors like Achelley and Chettle from that period. Therefore, the approach adopted by Taylor was based on the hypothesis that the reference corpora of the candidates should be compiled with texts that belong to distinct literary genres, which is opposite to the idea on which the present thesis is based. The author explains that he identified the n-grams of two or more consecutive words that the disputed sample shared with the reference corpora of the possible candidates, which stands as a traditional approach towards n-gram tracing, but also that “searches were made for every collocation of two or more semantic words […] ten words before or after each other” (2019, p. 859), excluding the function words among them, which is a less conventional way of applying this method. Taylor concluded that, even though the disputed sample presented more unique matches with the Shakespearean reference corpus, the ratio of unique matches per word was superior with the corpus of Thomas Watson, and he therefore attributed its authorship to Watson. There were 14 unique n-grams in common between the 274-word sample and the 30,397 words that formed Watson’s corpus. The main differences between Taylor’s approach and that of the present investigation lie in the fact that this doctoral thesis intends to analyse all the scenes of Arden of Faversham, which could be considered as natural divisions of the play, whereas Taylor studied the authorship of a fragment that was artificially selected, as well as in the criteria for the compilation of the reference corpora of the possible candidates. This research is based on the hypothesis that an author’s idiolect is dynamic and hence the inclusion of plays from dissimilar periods and genres in a reference corpus will diminish the effectiveness of the study, which is an issue that will be addressed in detail in Chapter 4, where the methodological foundations of the thesis will be expounded.
The most advanced procedure that has been selected for this doctoral thesis is the variant of the Zeta test suggested by Craig and Kinney (2009) to analyse the authorship of Arden of Faversham and other Elizabethan plays, which could be explained as follows. The first step consists in the compilation of a reference corpus for each of the two candidates (or groups of candidates) of the study. Once these corpora have been compiled, the texts that they contain must be divided into fragments of 2,000 words and the residual words at the end of each text must be combined with its last fragment. The disputed sample should also be divided following the same criteria. The second step is to obtain a list of 500 words that are characteristic of each candidate (or group of candidates) not only by their prominence in that corpus, but also by their low frequency or lack of appearance in the corpus of the other candidate(s). The formula to obtain each of these 500 markers for both corpora is the following. The researcher is expected to identify how many fragments of 2,000 words (or more, in the case of those that include the last words of a text) of the corpus of the first candidate(s) contain a given word and how many fragments of the corpus of the second candidate(s) do not contain that word, regardless of how many times it appears in each fragment. The proportion of fragments of the first candidate(s) that contain that specific word is expressed as a number from 0 to 1 and added to another number from 0 to 1 that stands for the proportion of fragments of the second candidate(s) that do not contain the word. If the result is higher than 1, this word can become a marker of the first candidate(s). The 500 words with the highest scores that are superior to 1 following this procedure will be considered the markers of the first candidate(s). Afterwards, a list of 500 markers for the second candidate(s) must be obtained with the opposite procedure.
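The two steps described above, fragmenting the texts and scoring candidate markers, can be sketched as follows. This is a minimal illustration under stated assumptions: texts are already tokenised into lists of lower-case words, the stop list is supplied by the caller, and ties in the ranking are broken alphabetically. The function names are hypothetical and the sketch should not be read as the implementation used by Craig and Kinney (2009) or by ALTXA.

```python
def split_into_fragments(words, size=2000):
    """Split a tokenised text into fragments of `size` words.

    The residual words at the end are merged into the last fragment,
    so a text shorter than `size` yields a single fragment.
    """
    if not words:
        return []
    if len(words) <= size:
        return [words]
    n_full = len(words) // size
    fragments = [words[i * size:(i + 1) * size] for i in range(n_full)]
    fragments[-1] = fragments[-1] + words[n_full * size:]
    return fragments


def zeta_scores(fragments_a, fragments_b, stop_words=frozenset()):
    """Score every word as a potential marker of corpus A against corpus B.

    score = (proportion of A fragments containing the word)
          + (proportion of B fragments NOT containing the word)
    Only presence or absence per fragment matters, not frequency.
    """
    sets_a = [set(f) for f in fragments_a]
    sets_b = [set(f) for f in fragments_b]
    vocabulary = set().union(*sets_a, *sets_b) - set(stop_words)
    scores = {}
    for word in vocabulary:
        present_a = sum(word in s for s in sets_a) / len(sets_a)
        absent_b = sum(word not in s for s in sets_b) / len(sets_b)
        scores[word] = present_a + absent_b
    return scores


def top_markers(scores, k=500):
    """Return up to `k` words whose score exceeds 1, highest first."""
    ranked = sorted(((s, w) for w, s in scores.items() if s > 1),
                    key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in ranked[:k]]
```

The markers of the second candidate(s) are obtained with the same functions by swapping the two arguments, that is, `top_markers(zeta_scores(fragments_b, fragments_a))`.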
With the purpose of filling these lists with distinctive lexical items, most of the function words, as well as certain lexical words that are so closely related to the context of the play where they appear that they do not reflect an authorial pattern, are not considered for their elaboration, that is, they are ignored during the mathematical process described in this paragraph. The final step is to place on a coordinate axis the fragments into which the reference corpora of the two candidates (or groups of candidates) of the study have been divided, as well as the fragments into which the sample whose authorship is to be tested has been divided. The value of the horizontal axis for each fragment stands as the number of markers of the first candidate(s) that it contains divided by its number of distinct words. Such division is made to compensate for the greater length of those fragments that include residual words for being at the end of a text. Similarly, the value of the vertical axis for each fragment is the division of the number of markers of the second candidate(s) that it has by its number of distinct words. If the style of the two candidates (or groups of candidates) of the study is distinct enough, the fragments of the reference corpus of the first candidate(s) will occupy a specific area on the coordinate axis forming a cluster that is in a different position from the area occupied by the cluster created by the fragments of the reference corpus of the second candidate(s). Therefore, the proximity of the fragments of the disputed text to one cluster or the other will determine its likeliest authorship.

Kinney (2009) analysed the authorship of the play Arden of Faversham with the Zeta test comparing Shakespeare with a group of more than 15 Elizabethan playwrights like Marlowe, Kyd, Heywood and Chettle, whose plays were combined in one corpus. The Shakespearean corpus was formed by 27 undisputed plays, whereas the non-Shakespearean corpus included 109.
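The final step of the Zeta test, placing each fragment on the coordinate axis, can be sketched as below. Two points are assumptions of the sketch rather than statements of the source: markers are counted as distinct word types, and proximity to a cluster is measured as the Euclidean distance to the cluster centroid, whereas the studies discussed here assess proximity from the graphical representation. All function names are illustrative.

```python
import math


def fragment_coordinates(fragment, markers_a, markers_b):
    """Return (x, y) for a tokenised fragment.

    x = marker types of candidate A in the fragment / distinct words
    y = marker types of candidate B in the fragment / distinct words
    """
    distinct = set(fragment)
    if not distinct:
        return 0.0, 0.0
    x = len(distinct & set(markers_a)) / len(distinct)
    y = len(distinct & set(markers_b)) / len(distinct)
    return x, y


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def nearest_cluster(point, points_a, points_b):
    """Label a disputed fragment 'A' or 'B' by distance to each centroid.

    Centroid distance is an assumed stand-in for visual inspection
    of the clusters on the plot.
    """
    distance_a = math.dist(point, centroid(points_a))
    distance_b = math.dist(point, centroid(points_b))
    return "A" if distance_a <= distance_b else "B"
```

Dividing by the number of distinct words, rather than by the total number of words, keeps the coordinates comparable between a standard 2,000-word fragment and a longer final fragment that carries the residual words of a text.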
A relevant factor about these samples is that they belong to distinct subgenres and were elaborated between 1580 and 1619. Kinney delineated a list of 500 Shakespearean markers where the word “gentle” occupied the first position, given that it was present in 69% of the Shakespearean fragments and it did not appear in 55% of the non-Shakespearean fragments, which is a total score of 1.24 if these percentages are transformed into numbers from 0 to 1 and added. On the other hand, the word that appeared at the top of the list that included the 500 non-Shakespearean markers was “yes”, whose score was 1.27. Even though “yes” is a function word, it seems that the author decided to keep it as a potential marker because its usage reflects a choice made by the author, who also has the opportunity to write “yea” or “ay”. Nevertheless, a review of the literature on this issue shows that the selection of these linguistic forms in the Elizabethan period was more related to the dialect of the speakers and the linguistic context of the interaction than to their idiolect (see Culpeper, 2018). Neither the list of ignored words to obtain the 500 markers of each reference corpus, that is, the stop list, nor the complete lists of 500 markers themselves were revealed in the study, which would have been of use for other researchers.

The author then placed on a coordinate axis the Shakespearean fragments, the non-Shakespearean fragments, and those of the scenes of Arden of Faversham, which were analysed as independent texts (see Appendix 2). The value of every fragment on the horizontal axis stands as the number of Shakespearean markers that it contains divided by its number of distinct words, whereas the value of the vertical axis reflects the number of non-Shakespearean markers that it includes divided by its number of distinct words.
As can be observed in the graphical representation of the results presented in Appendix 2, Kinney attributed to Shakespeare the authorship of six scenes of the play, whereas the fragments of the rest of the scenes occupied the area of the non-Shakespearean cluster. The samples whose authorship was attributed to Shakespeare in the study were Scenes III.i, III.ii, III.iii, III.iv, III.vi and V.iii.

I would like to comment on a few aspects about Kinney’s investigation on the authorship of Arden of Faversham. Firstly, it does not seem sensible to me to compile the corpora of the candidates with plays that were written between 1580 and 1619 and without making a distinction among subgenres. If the idiolect is defined as a dynamic phenomenon, the inclusion of plays that were written in distant periods and have different tones in the reference corpora reflects the opposite, that is, that the idiolect of an author stays fossilized throughout their entire career. One could hypothesize that the author behind Arden of Faversham adopted certain idiolectal features during the period in which s/he elaborated the play, as well as when s/he was writing plays with a tragic tone that differs from that of comedies. Therefore, if the plays where these idiolectal features can be found are mixed with others that are so dissimilar, the effectiveness of the study might diminish. This stance reflects one of the main hypotheses on which this thesis is built (see Section 1.2) and will be addressed in more depth in the following chapter, where its methodological approach will be discussed. Secondly, I would suggest that authors should be compared individually with this method.
If, for instance, Marlowe had a tendency to write a highly distinctive word that could be of use to distinguish between his texts and those of Shakespeare, but his works are mixed with many of other playwrights who did not use it, the average values of the group would cause this solid marker to disappear from the study. This has also been suggested as a hypothesis at the beginning of the thesis (see Section 1.2) and will be developed in more detail in Section 4.5.5. Lastly, I would like to highlight that, since the reference corpora are divided into fragments of 2,000 words or more, it does not seem statistically rigorous to compare them with most of the scenes of Arden of Faversham in terms of the number of markers that they contain from the two lists of 500, even if these are then divided by the number of distinct words of the fragment, given that most of the scenes of Arden of Faversham do not even have 500 words. Maybe this method should only be applied to disputed fragments whose length is comparable to that of the fragments into which the reference corpora are divided (see Section 4.5.5). As a matter of fact, Kinney states that “some of the scenes are very short, and their placement [on the coordinate axis] cannot be regarded as reliable” (2009, p. 94). The solution that he adopted was to divide Arden of Faversham into four large segments that contained consecutive scenes of the play and conduct the Zeta test again, but these results do not seem to be reliable either, since, as he himself admits, this approach “carries a greater risk of combining more than one author’s work in a single segment” (2009, p. 94). Following this approach, he did compare Shakespeare with Marlowe individually and, even though the graphical representation of the results derived from this study is not shown, the author states that the four segments were attributed to Shakespeare.
Elliott and Greatley-Hirsch (2017) also analysed the authorship of Arden of Faversham using the variant of the Zeta test adopted by Kinney, among other tests. The most notable difference between their study and that of Kinney is in the criteria for the compilation of the reference corpora of the candidates. These authors used plays that were elaborated between the years 1580 and 1594, while Kinney compiled his reference corpora with plays that were written between 1580 and 1619. Neither of the two studies considered the subgenre of the plays, which is something that this doctoral thesis intends to do. It is also worth mentioning that neither the stop list with all the ignored words for the calculation of the 500 markers of each candidate nor such lists of markers are shown in the study of Elliott and Greatley-Hirsch, as happens in that of Kinney. The authors divided Arden of Faversham “into overlapping blocks of 2,000 words advancing in 500-word increments, so that the first segment holds words 1-2,000, the second segment holds words 501-2,500, the third 1,001-3,000, and so on” (2017, p. 151). This contrasts with one of the aims of this thesis, which is to divide the play into its original scenes and, if some of these are too short to be analysed with this method, to study their authorship with alternative procedures. Elliott and Greatley-Hirsch compared in every case an author versus a group of authors. The candidates for the study were Greene, Kyd, Lodge, Lyly, Marlowe, Nashe, Peele, Shakespeare and Wilson. Therefore, they compared the plays of Shakespeare with the plays written by the other candidates of the study as a group and then they carried out the same procedure with Marlowe, Kyd and the others, which is an approach that contradicts one of the hypotheses suggested in this doctoral thesis, as has been previously explained.
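The segmentation quoted from Elliott and Greatley-Hirsch (2017) corresponds to a rolling window over the running words of the play. A minimal sketch, assuming that the text is already tokenised and that any trailing words that do not fill a complete block are left out (the quotation does not specify how the end of the play was handled), could be:

```python
def overlapping_blocks(words, size=2000, step=500):
    """Return overlapping blocks of `size` words that advance by `step` words.

    With the defaults, the first block holds words 1-2,000, the second
    words 501-2,500, the third words 1,001-3,000, and so on. Trailing
    words that do not complete a block are dropped (an assumption).
    """
    if not words:
        return []
    if len(words) <= size:
        return [words]
    return [words[start:start + size]
            for start in range(0, len(words) - size + 1, step)]
```

Each word in the middle of the play therefore falls into up to four different blocks, which smooths the results but also means that a block may span several scenes.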
Their study concluded that “Shakespeare is the only authorial candidate to which it [i.e., the Zeta test] attributes any Arden of Faversham segments, and just six of them” (2017, p. 164). These fragments corresponded to the first part of Scene I.i and the totality of Scenes III.vi and IV.i.

Finally, this section will briefly discuss other studies on the authorship of Arden of Faversham whose methods have not been adopted for the conduction of this doctoral thesis but have had a relevant impact on the academic community. Craig and Kinney (2009) described another method for the analysis of Elizabethan plays in which the frequency of function words is used to discriminate between two authors (or groups of authors). This procedure, called Principal Component Analysis, does not take into consideration whether function words appear or not in certain segments as the Zeta test does with the words on which it focuses, given that the likeliest possibility is that almost every function word will be present in all the fragments. In contrast, this method “works in frequencies and combines them so as to bring out more subtle patterns of use” (2009, p. 28), and thus if the frequency of a function word is considered as a variable, this test will give “each word-frequency variable a weighting so as to highlight cumulative similarities and dissimilarities” (2009, p. 31). Kinney conducted a Principal Component Analysis to study the authorship of Arden of Faversham and the results coincided to some extent with those of his Zeta test, for which he suggested that Shakespeare participated in the elaboration of “the middle section of the play” (2009, p. 99), and that there was at least another author involved in the process.

Macdonald P. Jackson has also devoted considerable research to demonstrate the participation of Shakespeare in the creation of Arden of Faversham, especially in the Quarrel Scene, that is, Scene III.v.
According to the author, only Shakespeare could have written a scene with such poetic value and emotional intensity between two characters [10]. To prove this, he used a database where “words and phrases can be found, and so can instances of the proximity of one word or phrase to another” (2014, p. 17) called Literature Online (LION) to compare the play with the works of Elizabethan writers such as Shakespeare himself, Marlowe and Kyd. He searched for “[p]arallels in imagery and ideas […] only if passages had at least one prominent word in common”, as well as “[p]hrases and collocations rare enough to occur five or fewer times” (2014, p. 19). The results showed that the play with which Arden of Faversham shared more of these parallels was Henry VI, Part III, followed by The Two Gentlemen of Verona and Henry VI, Part II, which allowed Jackson to state that this is solid proof to attribute the authorship of the scene to Shakespeare. Nevertheless, according to recent research, the likeliest possibility is that the three parts of Henry VI were not only written by Shakespeare, but that he collaborated with Marlowe in their creation (see Section 2.2), and thus these results might also reflect the participation of the latter in the elaboration of the scene. Jackson reinforced the results of the abovementioned quantitative analysis by tracing images in Arden of Faversham that can be associated with others found in Shakespearean plays. He later used the LION database to see the frequency of these images in the works of other playwrights with the objective of assessing how rare they were.

[10] Scene III.v from Arden of Faversham portrays a heated argument between the characters of Mosby and Alice.

In contrast, Vickers has strongly argued that Arden of Faversham was written entirely by Thomas Kyd.
He used the software Pl@giarism, which was originally designed to detect cases of plagiarism among students, to find a series of word sequences in common between the play and the undisputed works of Kyd (2008). In addition, Vickers himself encountered parallel passages between Arden of Faversham and the works of this playwright and reinforced his argument in favour of Kyd by stating that the play “is far ahead of Shakespeare’s abilities at the beginning of his career” (2015, p. 11), even though he has been the preferred candidate by many scholars, as reflected in this section of the thesis. Nevertheless, these findings and the methods with which they were obtained have been heavily criticized by Jackson (2015, 2017) and Taylor to the point that the latter stated that “[…] it is surprising, and unfortunate, that the Times Literary Supplement [where Vickers published the two articles that have been referenced earlier] continues to give Vickers a platform” (2015, p. 6). Similarly, I would like to express my disagreement with the approach adopted by Vickers, who has directly attributed the authorship of Arden of Faversham to a single author. According to historical and literary sources, the likeliest possibility is that the play was written in collaboration (see Chapter 2), for which it seems more reasonable to divide it in smaller portions to study their authorship independently and only attribute it to a single author if the results derived from such analyses coincide. Furthermore, the technique of finding parallel passages between a disputed text and the reference corpus of a possible author, which is an approach followed by Jackson and Vickers, does not seem to be conclusive enough in studies involving Elizabethan playwrights, given that their styles tend to present a high degree of resemblance and thus it is possible to find similarities between any play and the reference corpus of any candidate, in my view. 
This is one of the reasons why this doctoral thesis will only rely on statistical criteria.

In sum, Section 3.4 has offered an overview of the fundamentals of authorship attribution studies and the distinct types of texts that can be analysed within this disciplinary framework, with a special emphasis on the study of the authorship of historical texts in general, and that of Arden of Faversham in particular, given the focus of the thesis. This review of previous studies allows for the establishment of a connection between this chapter and the following, where the approach and the methods adopted for the present investigation will be developed.

3.5. Summary

This chapter has provided the reader with a holistic perspective of forensic linguistics by commenting on its definition and historical development, as well as on its three main areas of study, known as the written language of the law, the spoken language of the law and the linguist as an expert witness. Firstly, the written language of the law has been presented as the branch of forensic linguistics that intends to make the laws more comprehensible through the Plain English Movement, which has been illustrated by classifying the prototypical features of legal documents and by analysing how to improve police cautions and jury instructions. Secondly, the spoken language of the law has been introduced as the area that examines the oral interactions that take place from the moment of arrest until a trial takes place, for which the manner in which oral evidence can be contaminated and the difficulties experienced by vulnerable participants involved in legal processes have been expounded. Lastly, the cases in which a forensic linguist is required to provide evidence for a case in which the use of language is involved or testify in court have been described with a special emphasis on authorship attribution studies, which have been addressed in a separate section.
The main goal underlying the selection of this structure for the chapter has been to narrow down progressively its scope until previous studies on the authorship of Arden of Faversham have been depicted, together with the authorship tests considered for the conduction of this research. The following chapter of the thesis will provide an in-depth account of the steps that will be taken to trace the idiolectal features of William Shakespeare’s and Christopher Marlowe’s undisputed plays for further comparison with those of Arden of Faversham.

CHAPTER 4 | METHODOLOGY

This chapter aims to provide the reader with a chronological explanation of the processes that have been followed for the conduction of the investigation. The criteria for the selection of William Shakespeare and Christopher Marlowe as the possible candidates for the attribution of authorship of Arden of Faversham will be addressed first. Afterwards, the manner in which the undisputed samples of each candidate have been compiled and adapted will be expounded, as well as the modifications that have been introduced in the disputed text, that is, Arden of Faversham, so that it can be compared with these reference corpora. This chapter will then present the tests that have been taken into consideration for the analysis and the need to evaluate their effectiveness in a series of pre-studies focused on the attribution of authorship of undisputed scenes of Shakespeare and Marlowe. Such pre-studies will be carried out with the aim of applying in the attribution of authorship of each scene of Arden of Faversham only those tests that have been proved to be reliable in a similar linguistic context. The lack of accessibility of the tools that can be used to conduct an analysis that includes some of the selected linguistic procedures generated the need to develop the software ALTXA.
Its functionalities will be addressed in depth throughout this chapter, given that one of the main goals of the thesis is to facilitate the implementation of forensic linguistics in educational settings by offering this computational tool with an accessible interface as a free software to the academic community. Lastly, an explanation of the way in which the distinct functionalities of the software will be applied in the study of each type of scene will be provided, together with some guidelines on how to interpret every kind of outcome either in the pre-studies or in the final case study.

4.1. Delimitation of the scope of the investigation

The analysis of the authorship of Arden of Faversham has been selected as the focus of the thesis for two main reasons. Firstly, it represents a continuity of the research that I developed in my previous work (see Section 1.1). Secondly, there is a scarcity of studies approaching this topic from a forensic linguistic perspective and those that have already been conducted could be considered inconclusive, since there is still much disagreement among scholars on which author(s) could have been involved in the elaboration of the text (see Section 3.4.4).

Given the considerable length of the analysis, which will be expounded further on, as well as the many possible candidates that have been suggested for the authorship of each scene, the scope of this thesis needed to be narrowed down to the selection of two candidates. This decision has been made because of the way in which certain procedures like the Zeta test will be applied and, most importantly, to put the focus on establishing a solid methodological and computational basis for further studies involving the rest of the candidates.
Even though there have been over fifteen playwrights considered as potential authors of the play in previous studies (see Section 3.4.4), scholars tend to agree on the fact that the three main candidates for the authorship of Arden of Faversham are William Shakespeare, Christopher Marlowe and Thomas Kyd (see Section 2.3). Many researchers who have studied the play from a linguistic point of view have suggested Shakespeare as its partial author (see Section 3.4.4), for which it seemed reasonable to select him for the study in the first place. The reasons why Marlowe has been selected over Kyd for the conduction of this research are that he is known to have collaborated with the Bard in the elaboration of Henry VI and that, if his life events are taken into consideration, one could ponder that he is prone to be associated with an anonymous play of this kind (see Section 2.2). As a matter of fact, his play Tamburlaine was published anonymously (Boas, 1940, as cited in Kinney, 2009). On the other hand, the studies that have supported Kyd’s authorship are far from being widely accepted in the academic community, as explained in Section 3.4.4, and there are only a few single-authored texts attributed to him, which would hinder a subsequent analysis. In any case, the fact that Shakespeare and Marlowe have been selected as the candidates for the attribution of authorship of Arden of Faversham in this study does not mean that the rest of the possible authors will not be taken into consideration in future lines of research, Thomas Kyd being the first one on the list (see Section 8.2). In sum, the objective of the thesis has not been delineated as determining the authorship of Arden of Faversham conclusively, since there are other candidates that have not been included in the analysis, but to discern if the likeliest author of each scene is Shakespeare or Marlowe in case that it was indeed written by one of them. 
Therefore, this thesis should be seen as the first step of a long-term academic project whose priorities are, for the moment, the determination of Shakespeare or Marlowe as the likeliest author of each scene of the play for future comparisons with Thomas Kyd and the rest of the Elizabethan playwrights, the creation of a solid methodology that can pave the way for those future studies and the development of an accessible computer program with a wide range of functionalities for forensic authorship attribution (see Section 1.2).

4.2. Data collection

This section seeks to present the criteria for the compilation of the reference corpora, that is, the plays that will be used to delineate Shakespeare’s and Marlowe’s idiolect for further comparison with Arden of Faversham. On the one hand, Taylor’s study represents an approach for the compilation of the reference corpora that is opposite to that of the present thesis, given that it was based on the notion that “attribution problems in that period can be better understood if plays are tested against authorial canons that include non-dramatic as well as dramatic works” (2019, p. 1). On the other hand, there has been a tendency in studies of this kind to limit the selection of the undisputed works of the candidates to those that belong to the same genre and were written during a similar period to the one in which the disputed text was created, given that an author’s idiolect is not fossilized and thus a play that someone wrote in 1590 may greatly differ from another text that the same person wrote in 1610, for instance. This approach can be observed if the study of Kinney (2009), which included plays that date from 1580 to 1619, is compared to that carried out by Elliott and Greatley-Hirsch (2017), where the selection of the plays for the analysis was restricted to those that were elaborated between 1580 and 1594.
In my view, the style of Shakespeare and Marlowe evolved constantly and hence there might be significant stylistic inconsistencies among plays that were elaborated with a difference of more than five years. For that reason, the first criterion for the selection of the Shakespearean and Marlowian texts for the present research is that they need to have been written approximately between the years 1590 and 1595, given that Arden of Faversham was published in 1592 and probably elaborated during that year or the year before (see Section 2.3). This stands as a continuation of the approach suggested by Elliott and Greatley-Hirsch in their study (2017).

In addition, I have noticed that the forensic linguistic studies of Elizabethan plays tend not to take into consideration something that seems to be crucial when these texts are analysed in the field of literature, which is their subgenre. The tone of a comedy seems considerably distinct from that of tragedies and history plays and, as a result, one could hypothesize that there might be notable idiolectal differences among subgenres that need to be taken into consideration in the compilation of the corpora. Therefore, the exclusion of comedies such as Shakespeare’s The Two Gentlemen of Verona will be considered as another criterion for the elaboration of the reference corpus of each author, given that Arden of Faversham is a domestic tragedy, despite the presence of a few comic scenes in the play.

Narrowing down the selection of the reference plays of each candidate to those that were approximately written between 1590 and 1595 and are not comedies is an innovative way of carrying out the attribution of authorship of Elizabethan plays in general and that of Arden of Faversham in particular.
This approach, which differs from those adopted by Kinney (2009), Elliott and Greatley-Hirsch (2017) and Taylor (2019), is built upon the hypothesis that the idiolect is such a dynamic phenomenon that significant changes may arise over short periods of time and among different subgenres and thus this question must be given maximum priority (see Section 1.2). Consequently, it is preferable to have shorter samples than other studies on this subject but to compile reference corpora that present almost identical idiolectal features to those that characterize the disputed text. It may seem that this approach has two main drawbacks, which are the inherent difficulties in finding sole-authored and undisputed plays that meet the abovementioned requirements and, most importantly, that the resulting corpora may not be considered representative enough to define Shakespeare’s and Marlowe’s idiolect due to an insufficient number of words, which is an issue that will be addressed further on.

The plays selected for the compilation of the Marlowian corpus are The Jew of Malta (1589) and Edward II (1592),11 since Arden of Faversham and these two works were written no more than three years apart and both are plays with a tragic tone attributed to Marlowe without major doubts. Examples of Marlowe’s plays that have been discarded from the analysis are Dr. Faustus, given that it contains scenes which he probably wrote in collaboration with other playwrights (Elliott & Greatley-Hirsch, 2017), and The Massacre at Paris, for being extremely short in comparison to other eligible texts.

11 These dates have been taken from the official webpage of The Marlowe Society: http://www.marlowe-society.org/christopher-marlowe/works/
The two plays selected for the compilation of the Shakespearean corpus are Richard III (1592-1594) and Richard II (1595-1596).12 These plays are not comedies; they were elaborated during the period that was established as acceptable for the conduction of the study and there seems to be consensus within the linguistic and the literary community about the fact that they were written only by Shakespeare. Examples of other plays that have been considered for the compilation of the corpus because they were elaborated between 1590 and 1595 but do not meet the rest of the established criteria for such selection are Henry VI, Part I, due to the presence of studies that suggest that he wrote this text in collaboration with Christopher Marlowe (see Section 2.2), and The Comedy of Errors, for being a comedy.

In sum, due to the period in which they were written, their subgenre and the fact that they are sole-authored and well-attributed plays, the selected texts for the compilation of the Marlowian reference corpus are The Jew of Malta and Edward II, while Richard III and Richard II have been chosen to delineate Shakespeare’s idiolect for further comparison with Arden of Faversham. Nevertheless, these undisputed works and Arden of Faversham itself needed to be carefully selected among many digital editions and later edited to make the posterior analysis as precise as possible, which is an issue that will be addressed in the following section.

4.3. Extraction and adaptation of the samples

This section will discuss the criteria followed to extract and adapt the five plays that constitute the focus of the analysis. During the process of selecting the most suitable digital edition of the texts, it has been of paramount importance to avoid major spelling inconsistencies by taking all of them from a single source with unified criteria.
For this reason, the samples have been extracted from the archives of Project Gutenberg,13 since they have published the five plays and claim to have prioritized the preservation of the original words used in the first quarto of Arden of Faversham from 1592, a quarto from 1598 of Marlowe’s Edward II, the first edition of The Jew of Malta, which is a quarto of 1633, and Shakespeare’s Richard III and Richard II from the 1623 edition of the First Folio (see Primary Sources for the links to access each edition). Even though no public access to a scanned version of the previously referenced Marlowian texts and the 1592 quarto of Arden of Faversham has been found, there is a scanned version of Shakespeare’s 1623 First Folio that can be accessed online,14 so I compared the editions of Richard III and Richard II published by Project Gutenberg with those of the Folio. The main goal behind this procedure was to ensure that the selection of words of the plays published by Project Gutenberg is faithful to that of the First Folio, even though Project Gutenberg has adapted their spelling slightly to be better understood by modern audiences. Afterwards, the five plays extracted from Project Gutenberg were compared among themselves to see if the spelling criteria are unified and no major differences could be found. This spelling adaptation and homogenization does not constitute a problem for the development of the research, given that its focus is on the selection of words of the plays rather than their spelling.

12 These dates have been taken from the official webpage of The Royal Shakespeare Company: https://www.rsc.org.uk/shakespeares-plays/timeline
13 https://www.gutenberg.org/
This is because spelling inconsistencies are frequent in the previously referenced original editions, since these were usually transcribed by more than one person (see Ryskina et al., 2017 for an introduction to compositor attribution studies with Elizabethan texts), which is why the authorship analysis of published works stands as such a complex task. In case that there are inconsistencies that have not been noted and edited as well as minor editorial modifications in the samples from Project Gutenberg, it seems that these are not significant enough to constitute a threat for the preciseness of the large-scale statistical analysis on which this investigation is built and can be seen as the low but inevitable error rate that characterizes studies of this kind.

14 A scanned version of Shakespeare’s First Folio can be found in the following webpage: https://internetshakespeare.uvic.ca/Library/facsimile/overview/book/F1.html

Once an explanation of the reasons underlying the selection of Project Gutenberg as the source for the compilation of the texts that constitute the focus of the research has been provided, the modifications that have been introduced in these plays to improve the quality of the subsequent analysis will be listed and exemplified below.

A) In order to put the focus on the dialogues of the characters exclusively, all kinds of external indications have been erased, as can be observed in this example taken from the beginning of Scene I.i from Richard III, where the original text was the following.

ACT I. SCENE I.
[Enter RICHARD, DUKE OF GLOUCESTER, solus.]
GLOUCESTER. Now is the winter of our discontent
Made glorious summer by this sun of York;
And all the clouds that lour'd upon our house
In the deep bosom of the ocean buried.

The piece of text presented above has been modified according to the criterion described earlier and this is the resulting sample.
Now is the winter of our discontent
Made glorious summer by this sun of York;
And all the clouds that lour'd upon our house
In the deep bosom of the ocean buried.

Those stage directions embedded in the dialogues of the characters, such as [aside], have also been erased. This decision has been made to avoid their contamination under the belief that idiolectal features are less likely to appear in indications of this kind. In other words, stage directions can be seen as a different subgenre within the play where playwrights are only expected to provide basic instructions that do not reflect significant linguistic choices in the same way that a dialogue does.

B) Given that one of the authorship tests selected for the conduction of the analysis is based on the average number of words per sentence of the texts (see Section 4.5.2), a decision on how to proceed in those cases in which a character interrupts another has been made, as can be seen in the extract from Scene IV.iv from Richard III provided below.

KING RICHARD. Then, by my self-
QUEEN ELIZABETH. Thy self is self-misus'd.

In this excerpt, the character of Queen Elizabeth interrupts that of King Richard and, if the names of the characters are erased, it could seem like it is a single sentence, when it is in fact a sentence that interrupts another. For this reason, interruptions of this kind have been divided with a period, so that the software ALTXA can count them as two separate sentences. Therefore, the resulting text is as follows.

Then, by my self. Thy self is self-misus'd.

C) The non-linguistic elements embedded in the texts, such as the footnotes’ numbers, have been removed.

D) Since the software ALTXA cannot recognize certain characters, these have been modified, and for instance Æ has been turned into ae and ë into e.

E) The samples included a few typos that have been corrected.
These could be found in sentences containing an opening bracket that was not followed by a closing bracket, in excerpts where the name of a character was mentioned within a dialogue but the whole name had been written in capital letters as if it was an indication of who was speaking and, more predominantly, in fragments where two hyphens appeared together instead of a dash.

F) The prologues and the epilogue written by Thomas Heywood in the 1633 quarto of The Jew of Malta (Elliott & Greatley-Hirsch, 2017) have been erased from the corpus.

All the changes described above have been introduced manually and the resulting texts have been revised twice. This adaptation of the plays allows for the conduction of the analysis, whose structure will be expounded in the following section.

4.4. Structure of the analysis

Given the possibility that Arden of Faversham was written in collaboration, its scenes will be analysed as independent texts. In cases of possible cooperation between two or more playwrights in an Elizabethan play, forensic linguists can adopt two approaches, the first one consisting in the division of the play in even fragments for further analysis, and the second one involving the analysis of the original scenes of the text.15 The latter has been selected for the conduction of the study, given that it seems more sensible that if Shakespeare and Marlowe had elaborated the play together, they would have probably assigned certain scenes to one or the other depending on its thematic content. For instance, one of them may have been in charge of the scenes with the characters of Black Will and Shakebag, while the other could have written the romantic scenes between Mosby and Alice. This approach has a major drawback, which is the disparity in the length of the scenes of the play.
This can be seen in Table 1, which details the number of words of the scenes of Arden of Faversham, once it has been edited under the principles described in the previous section of the chapter.

Table 1 | Length of the scenes of Arden of Faversham

Scene                    Length
Scene I.i                5,133 words
Scene II.i               916 words
Scene II.ii              1,694 words
Scene III.i              822 words
Scene III.ii             516 words
Scene III.iii            357 words
Scene III.iv             240 words
Scene III.v              1,293 words
Scene III.vi             1,265 words
Scene IV.i               838 words
Scene IV.ii              263 words
Scene IV.iii             593 words
Scene IV.iv              1,250 words
Scene V.i                3,477 words
Scene V.ii               106 words
Scene V.iii              179 words
Scene V.iv               117 words
Scene V.v                321 words
Scene V.vi (Epilogue)    148 words

15 It is worth mentioning that “[t]he 1592 Quarto, the only substantive text, is not divided into acts or scenes. Modern editions […] divide the play into eighteen scenes and an epilogue. Each of these scenes ends with […] ‘Exeunt’, and so there are clear-cut ‘natural’ divisions” (Kinney, 2009, p. 91).

Despite the plethora of studies that have analysed the effectiveness of distinct methods in authorship attribution studies, I believe that the effectiveness of every procedure depends on the linguistic context where it is applied. In other words, the validity of any given method depends on the type of text where it is applied, its length and the idiolectal features of its potential authors, and thus an authorship test that has been proved to be effective to distinguish between Shakespeare’s and Marlowe’s fragments of 2,000 words may not be useful to determine the authorship of their shortest scenes, and a test that works well for these two authors may not find significant idiolectal differences if, for instance, Marlowe’s texts are replaced by others written by Thomas Kyd.
The scenes of Arden of Faversham have been divided into four groups according to their size and a series of pre-studies will be conducted to analyse the authorship of undisputed scenes from the Shakespearean and the Marlowian reference corpora that have a similar length to those included in each of these four groups as if they were anonymous. Such analyses will be carried out to discern which are the most reliable authorship tests for each type of scene of Arden of Faversham. In other words, the pre-studies will ensure that only those procedures that have proved to be highly effective in the analysis of undisputed scenes of the two candidates are applied in the attribution of authorship of the scenes of Arden of Faversham. In addition, pre-studies of this kind give the researcher a reference for what kind of outcome can be considered valid in the subsequent case study.

The first group in which the scenes of Arden of Faversham have been divided, which is the largest, is formed by scenes that contain between 100 and 450 words, and the study of their authorship constitutes the most challenging task of the research. The second group includes scenes whose number of words ranges from 500 to 950. These samples are more representative than those of the first group, but they are still considerably short. The following group includes the four scenes in Table 1 whose length ranges from 1,100 to 1,700 words. Finally, the fourth group is constituted by the two largest scenes of the play, which have more than 2,000 words and seem to be the ones whose authorship can be attributed more easily, since idiolectal features are more likely to arise as the number of words of a sample increases.

In sum, this investigation will be divided into a series of pre-studies and a case study.
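The division of the scenes into four groups can be reproduced with a short script. The following is a minimal sketch in Python, given only as an illustration: the scene lengths are copied from Table 1 and the bands from the description above, whereas the names SCENE_LENGTHS, GROUPS, assign_group and group_scenes are hypothetical, and treating the fourth band as "2,000 words or more" is an assumption.

```python
# Scene lengths (in words) copied from Table 1.
SCENE_LENGTHS = {
    "I.i": 5133, "II.i": 916, "II.ii": 1694, "III.i": 822, "III.ii": 516,
    "III.iii": 357, "III.iv": 240, "III.v": 1293, "III.vi": 1265,
    "IV.i": 838, "IV.ii": 263, "IV.iii": 593, "IV.iv": 1250,
    "V.i": 3477, "V.ii": 106, "V.iii": 179, "V.iv": 117, "V.v": 321,
    "V.vi": 148,
}

# Length bands described in the text. None marks an open upper bound
# (assumption: the fourth band starts at 2,000 words).
GROUPS = [
    ("group 1", 100, 450),
    ("group 2", 500, 950),
    ("group 3", 1100, 1700),
    ("group 4", 2000, None),
]


def assign_group(n_words):
    """Return the name of the band that contains n_words, or None."""
    for name, low, high in GROUPS:
        if n_words >= low and (high is None or n_words <= high):
            return name
    return None


def group_scenes(lengths):
    """Map each band name to the list of scenes whose length falls in it."""
    grouped = {name: [] for name, _, _ in GROUPS}
    for scene, n_words in lengths.items():
        name = assign_group(n_words)
        if name is not None:
            grouped[name].append(scene)
    return grouped


if __name__ == "__main__":
    for name, scenes in group_scenes(SCENE_LENGTHS).items():
        print(name, scenes)
```

A length that falls between two bands (for instance, 470 words) is left unassigned in this sketch, which makes such gaps visible instead of forcing the scene into a neighbouring group.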
The objective of the pre-studies is to analyse a representative number of scenes from the undisputed plays of Shakespeare and Marlowe as if they were disputed texts, so that the case study, that is, the attribution of authorship of the scenes of Arden of Faversham, only applies those methods that have proved to be reliable with samples of a comparable size written by these candidates. There is a bewildering number of variables that can alter the results of an idiolectal study, which creates the need to control as many of them as possible (Kredens, personal communication, February 17, 2019). This justifies the pre-studies, the narrowing of the selection of the undisputed plays of Shakespeare and Marlowe to such a specific period of time and the exclusion of comedies from these reference corpora.

In response to the question raised a few pages ago about whether these Shakespearean and Marlowian reference corpora are representative enough to analyse the authorship of Arden of Faversham, given their limited number of words, I would suggest the following answer, which derives from one of the main hypotheses on which the investigation is built (see Section 1.2). The representativeness of a reference corpus is not only determined by its length, but also by the extent to which its texts are able to reflect the conditions in which the disputed sample was elaborated. The Shakespearean reference corpus contains 50,057 words, whereas that of Marlowe is constituted by a total of 38,434 words. Given that each scene of Arden of Faversham will be analysed as an independent text and most of them will not exceed 1,000 words, the length of these reference corpora seems sufficient to establish reliable comparisons.
Furthermore, they have been compiled with undisputed texts that are similar to the scenes of Arden of Faversham in terms of their subgenre and the period in which they were elaborated, so they are truly representative of the idiolectal features of this disputed text. The following section of the chapter will present the tests that have been taken into consideration for this study and the software that has been specifically developed for its conduction.

4.5. Selection of the authorship tests for the analysis and the role of ALTXA

This section will examine the authorship attribution methods selected for the analysis and how they can be accessed in the software ALTXA. These are based on the quantification of the relative frequency of a series of keywords in the plays, the calculation of their average number of words per sentence and their lexical richness, tracing common n-grams and the conduction of the Zeta test. A computational tool is required to carry out the abovementioned procedures, so the suitability of already available programs was assessed at the initial stage of the investigation. There are some with an intuitive interface, such as WordSmith Tools, Voyant Tools and AntConc, which are accessible to linguists and were programmed to conduct simple tasks like the calculation of the relative frequency of a keyword in a text. However, they cannot carry out some of the tests mentioned earlier: for instance, WordSmith Tools and Voyant Tools do not include n-gram tracing among their functionalities, and the Zeta test is not available in any of these programs.

WordSmith Tools is a computer program16 with three main functionalities.
These are to identify the concordances of a word selected by the user in a corpus, to generate a list of all its words according to their frequency and to calculate the number of appearances and relative frequency of a set of keywords, which is a functionality that it shares with ALTXA (see Smith, 2021 for a review of the latest version of WordSmith Tools). Voyant Tools is an online platform17 with a simplified interface where the user can upload a corpus of one or more texts, press the button Reveal and have instant access to all the parameters that it measures. The functionalities that it shares with ALTXA are the calculation of the relative frequency of a set of chosen keywords in the corpus, its average number of words per sentence and its lexical richness (see Alhudithi, 2021 for a detailed list of all the functionalities of Voyant Tools). AntConc is a computer program18 that shares with ALTXA the ability to calculate the relative frequency of keywords selected by the user in a corpus (although it can also generate its own list of keywords using the log-likelihood or the chi squared method, which are procedures that ALTXA cannot carry out), as well as its lexical richness. It also allows the user to conduct a customized search for n-grams in a corpus, whereas ALTXA has been programmed to detect the n-grams that two corpora share (see Smith, 2021 for a review of one of the latest versions of AntConc). On the other hand, more powerful tools like the software Sketch Engine and the programming language R can be found on the Internet. 
Their strength lies in the wide range of functionalities that they offer, but their usage might be complicated for those who do not have a solid IT background, which is not only my case, but the case of many linguists.

16 Available at https://www.lexically.net/wordsmith/
17 https://voyant-tools.org/
18 Available at https://www.laurenceanthony.net/software/antconc/

Sketch Engine is an online platform19 that offers many possibilities for the compilation and treatment of a corpus. Its main functionalities are presented on the interface as Word Sketch, Word Sketch Difference, Thesaurus, Concordance, Parallel Concordance, Wordlist, N-grams, Keywords, Trends and One-Click Dictionary, but these include a wide range of advanced settings (see Arias Rodríguez & Fernández-Pampillón Cesteros, 2020 for a thorough explanation of each of them). Its N-grams tool is more similar to the one that is present in AntConc than to that of ALTXA, given that it is mainly focused on providing the user with a customized search for n-grams within a corpus, rather than comparing those that two corpora share. Despite the many functionalities that Sketch Engine includes, the Zeta test is not programmed as one of them, as happens with the simpler tools presented earlier.

R is a programming language that was mainly created for statistical computing. As also happens with Python and other programming languages, its possibilities are almost endless, if the user knows how to divide complex authorship attribution methods into simple tasks and program them. Despite the attempts at making its usage accessible for linguists through specialized courses (see Análisis de textos y estilometría usando R, organized by the Universidad Nacional de Educación a Distancia20), this requires considerable time and effort for those who do not have experience in programming.
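To illustrate the distinction drawn above between a customized search for n-grams within one corpus and the detection of the n-grams that two corpora share, the latter can be sketched in a few lines of Python. This is only an illustrative sketch and not the Java code of ALTXA: the function names and the tokenization rule (lowercased runs of letters, allowing internal apostrophes and hyphens) are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (assumed rule)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(text_a, text_b, n):
    """Return the n-grams that appear in both texts."""
    return ngrams(tokenize(text_a), n) & ngrams(tokenize(text_b), n)


if __name__ == "__main__":
    a = "Now is the winter of our discontent"
    b = "It was the winter of our lives"
    print(sorted(shared_ngrams(a, b, 3)))
```

Representing each n-gram as a tuple of tokens allows the comparison to be expressed as a plain set intersection.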
Given the limitations of these tools, I decided to develop a piece of software that included a representative catalogue of authorship tests within the disciplinary field of forensic linguistics among its functionalities and presented a simplified interface so that it could be accessible to all linguists. With that purpose in mind, I contacted computer programmer Carlos Antón and we invested a couple of years in the creation of a Java program named ALTXA, which is compatible with all operating systems and admits texts in Spanish and English. The implementation of ALTXA in professional and educational settings to contribute to the development of this relatively modern discipline has been delineated as one of the main goals of this doctoral thesis (see Section 1.2). The following subsections will address the functionalities of ALTXA and the manner in which they will be applied in the pre-studies and, if these are successful, in the final case study.

19 https://www.sketchengine.eu/
20 Information available at https://formacionpermanente.uned.es/tp_actividad/idactividad/10010

4.5.1. Quantification of the relative frequency of keywords

The first method that was selected for the study is based on the calculation of the relative frequency of a series of keywords chosen by the researcher in the disputed texts, that is, the scenes of Arden of Faversham, and the reference corpora (see Section 3.4.4 for an account of previous research involving this procedure). This selection of keywords was based on my personal judgement after having read most of the plays written by Shakespeare and Marlowe and consisted of a list of words from Arden of Faversham that I thought to be more characteristic of one author than the other. This was the first function programmed in ALTXA and can be accessed in the following way.
The interface of the program includes tabs that allow for the conduction of distinct types of tests, which will be listed and explained throughout this chapter. When the user clicks the Text Analysis tab, they will find a file chooser called Text file where they are expected to upload a document in .txt format that contains the text where the analysis will be conducted. There will also be another file chooser called Keywords file where the user has the option to upload a .txt document with a list of keywords. Such keywords must be separated by single spaces, and the program will not make a distinction between capital and lowercase letters, so that the same word written with and without a capital letter, as in You and you, is not counted as two distinct lexical items. When the button Execute is clicked on the interface of ALTXA, the program will count the number of times that each keyword appears in the sample, divide the result by the total number of words, that is, the tokens, and multiply it by a hundred to establish a percentage that stands as the relative frequency of the keyword. The software will then detail on the blank space of its interface the relative frequency of all the keywords selected by the researcher, as well as the average number of words per sentence of the text, its lexical richness and other parameters that have not been considered for this study, such as the average number of letters per word (see Figure 1).

Figure 1 | Interface of ALTXA for text analysis

A few months after the inclusion of this functionality in the software, I came across a study on the authorship of the Bixby Letter conducted by Grieve et al.
(see Section 3.4.4) where they stated the following regarding the use of authorship methods that are based on the quantification of a set of features selected by the researcher:

In forensic linguistics, short texts are often attributed by manually selecting linguistic features from the questioned document that appear to be relatively distinctive or rare and then by searching for these forms in the writing samples of each possible author. Although this method is logical and regularly applied in casework, there are […] potential issues with its application. First, it is unclear how to select an exhaustive or at least an unbiased feature set […]. It is unclear how to judge whether differences in the use of forms in the possible author writing samples are sufficient in the aggregate to attribute the questioned document: because this approach relies on the judgement of the analyst and therefore cannot be consistently or mechanically applied, it is difficult to systematically evaluate the reliability of such methods. (2018, pp. 5-6)

The excerpt presented above made me question the nature of this test and eventually discard it from the analysis. I realized that most of the selected keywords were mainly influenced by the works of Shakespeare for the simple reason that he was the author that I had read the most. In other words, my selection of keywords would have provided Shakespeare with more chances to be selected as the likeliest author of the scenes of Arden of Faversham, so I decided that none of the tests involved in the analysis should rely in any way on my judgement. Nevertheless, this function has been kept in ALTXA for other linguists who may decide to carry out an analysis of this kind, since there might be linguistic contexts in which the calculation of this parameter could be useful.
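For readers who wish to reproduce this calculation outside ALTXA, the relative frequency described in this subsection (occurrences of the keyword divided by the total number of tokens and multiplied by a hundred, without distinguishing capital from lowercase letters) can be sketched in Python. This is only an illustration and not the Java code of ALTXA: the function names and the tokenization rule are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (assumed rule:
    runs of letters, allowing internal apostrophes and hyphens)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def parse_keywords(line):
    """Split a keywords line in which items are separated by spaces."""
    return line.split()


def relative_frequencies(text, keywords):
    """Return, for each keyword, its percentage over all tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {kw: 0.0 for kw in keywords}
    return {kw: 100 * counts[kw.lower()] / total for kw in keywords}


if __name__ == "__main__":
    sample = "You and you and I"
    print(relative_frequencies(sample, parse_keywords("You I")))
```

In the sample sentence, You and you are counted as the same lexical item, so the keyword accounts for two of the five tokens.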
In sum, the quantification of the relative frequency of keywords, which was initially selected as one of the tests for the conduction of the study, has been discarded because of its reliance on subjective criteria. The following subsections will show the fundamentals of those that will play a role in the pre-studies.

4.5.2. Quantification of the average number of words per sentence

The second test selected for the conduction of the study is based on the quantification of the average number of words per sentence of the samples. As pointed out in Section 3.4.4, it seems that this parameter could be effective in discerning which author tends to write more complex syntactic constructions. To measure this parameter, ALTXA has been programmed to count the total number of words of a given sample, that is, its tokens, and divide the result by its number of sentences by considering a period, an exclamation mark, a question mark and a colon as the end of a sentence.

This function can be accessed if the Text analysis tab is clicked on the interface of ALTXA, which is the same where the quantification of the relative frequency of keywords in a sample and the calculation of its lexical richness can be conducted (see Figure 1). As underlined in the previous section, the user will find a file chooser called Text file, where a document in .txt format containing the text that they wish to analyse can be uploaded. Once the user clicks the button Execute, ALTXA will detail on the blank space of its interface the average number of words per sentence of the text, its lexical richness and, if they previously uploaded a .txt document with keywords to the file chooser Keywords file, it will also indicate the relative frequency of such keywords.

This parameter will be included in a pre-study that will analyse undisputed scenes of Shakespeare and Marlowe to assess its effectiveness to distinguish between samples written by both authors.
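The calculation described above (tokens divided by sentences, with a period, an exclamation mark, a question mark and a colon marking the end of a sentence) can be sketched as follows. This is a minimal Python illustration and not the Java code of ALTXA: the function name, the word pattern and the decision to count a final fragment without closing punctuation as a sentence are assumptions.

```python
import re

# Punctuation marks that the text treats as the end of a sentence.
SENTENCE_END = re.compile(r"[.!?:]+")
# Assumed word pattern: letters, allowing internal apostrophes and hyphens.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def average_words_per_sentence(text):
    """Divide the number of words by the number of non-empty sentences."""
    sentences = [s for s in SENTENCE_END.split(text) if WORD.search(s)]
    if not sentences:
        return 0.0
    total_words = sum(len(WORD.findall(s)) for s in sentences)
    return total_words / len(sentences)


if __name__ == "__main__":
    # The interruption from Richard III discussed in Section 4.3.
    unedited = "Then, by my self- Thy self is self-misus'd."
    edited = "Then, by my self. Thy self is self-misus'd."
    print(average_words_per_sentence(unedited))
    print(average_words_per_sentence(edited))
```

The two calls show why interruptions were divided with a period in Section 4.3: without it, the two speeches are read as a single sentence of eight words, whereas the edited version yields two sentences of four words each.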
Four analyses, one for each of the four types of scenes that have been delineated earlier according to their length, will be conducted to determine if this authorship test can be used to analyse the authorship of Arden of Faversham. Firstly, five random scenes of the Shakespearean corpus whose length is between 100 and 450 words will be extracted and their average number of words per sentence will be calculated. Afterwards, the same procedure will be conducted with five random scenes from the Marlowian corpus whose length also ranges from 100 to 450 words under the assumption that, if this test is effective in this linguistic context, the Shakespearean scenes will present similar values among themselves and, at the same time, that these results will be different enough from those derived from the analysis of the Marlowian fragments, whose values should also be similar among themselves. If that is the case, a subsequent calculation of the average number of words per sentence of the scenes of this length from Arden of Faversham could allow for their association with the values of one of the candidates.

The same procedure will then be repeated with undisputed scenes of both playwrights from the other three groups, whose number of words is between 500 and 950, between 1,100 and 1,700, and similar or superior to 2,000, respectively. If the results derived from the analysis of the scenes of any group show enough intra-author consistency and inter-author variation, this method will be used to analyse the authorship of the scenes of Arden of Faversham that belong to the same group. Five scenes of each author from the second and the third group will be analysed, whereas, due to the scarcity of undisputed scenes of almost 2,000 words or more in the Marlowian reference corpus, the fourth stage of the pre-study will include five Shakespearean scenes and four of Marlowe.
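One possible way of organizing each stage of this pre-study is sketched below in Python. It is an assumption-laden illustration, not the procedure implemented in ALTXA: the thesis does not fix a numerical criterion for "enough" intra-author consistency and inter-author variation, so the sketch simply checks whether the ranges of values of the two authors overlap, and all function names and figures in it are hypothetical.

```python
import random


def sample_scenes(scene_lengths, low, high, k, seed=None):
    """Pick up to k random scenes whose length lies between low and high."""
    eligible = sorted(s for s, n in scene_lengths.items() if low <= n <= high)
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))


def value_range(values):
    """Return the lowest and highest value observed for one author."""
    return min(values), max(values)


def ranges_overlap(values_a, values_b):
    """True if the value ranges of the two authors share any point."""
    low_a, high_a = value_range(values_a)
    low_b, high_b = value_range(values_b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    # Hypothetical averages of words per sentence, for illustration only.
    author_a = [10.2, 11.0, 10.7, 11.4, 10.9]
    author_b = [13.1, 12.8, 13.6, 12.9, 13.3]
    print(ranges_overlap(author_a, author_b))
```

Under this simplified criterion, a test would only be carried over to the case study for a given group of scenes when the two ranges do not overlap; a stricter or more statistical criterion could be substituted without changing the overall structure.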
It is reasonable to believe that, as the size of the samples increases, so will the effectiveness of this method, given that a higher number of sentences will facilitate the stabilization of this value, which will allow for more intra-author consistency. Nevertheless, it is hard to predict whether the average number of words per sentence of Shakespeare and Marlowe will overlap, which would automatically exclude this test from the final case study, since it would be impossible to associate a scene from Arden of Faversham with one of the authors in terms of this parameter. In sum, four analyses will be conducted to determine whether the average number of words per sentence of the scenes of Shakespeare and Marlowe presents sufficient intra-author consistency and inter-author variation to be later used to determine the likeliest authorship of the scenes of Arden of Faversham (see Section 5.1).

4.5.3. Quantification of the lexical richness

The belief that the quantification of the lexical richness of the samples can be of use to discern which of the two candidates of the study handled a wider range of vocabulary has determined its inclusion in the research. To calculate this parameter, which can also be accessed in the Text analysis tab, ALTXA has been programmed to divide the number of distinct words of a sample, or types, by its total number of words, or tokens, and multiply the result by a hundred. As happens during the calculation of the relative frequency of keywords, the software does not distinguish between capital and lowercase letters while measuring the lexical richness of a sample, so that a word written with and without a capital letter is not counted as two different types. The results derived from its calculation will appear on the blank space of the interface together with the average number of words per sentence of the sample and the relative frequency of the selected keywords, among other parameters (see Figure 1).
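The type/token calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of ALTXA: the function name and the tokenization rule are hypothetical, while the case-insensitive comparison and the multiplication by a hundred follow the description in the text.

```python
import re

# Assumption: a word is a run of letters, optionally containing internal
# apostrophes or hyphens.
TOKEN = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def lexical_richness(text: str) -> float:
    """Return (types / tokens) * 100, ignoring capitalization."""
    # Lowercasing prevents "The" and "the" from counting as two types.
    tokens = [t.lower() for t in TOKEN.findall(text)]
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)


print(lexical_richness("The king and the queen"))          # 4 types / 5 tokens
print(lexical_richness("The king and the queen and the crown"))  # 5 types / 8 tokens
```

The second call adds three words but only one new type, so the percentage drops. This is the size effect that motivates comparing only samples of almost identical length in the pre-study on lexical richness.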
With the purpose of assessing the reliability of this parameter to analyse the authorship of Arden of Faversham, a pre-study divided into four stages, one for each of the four types of scenes in terms of their length, will be carried out with undisputed scenes taken from the Shakespearean and the Marlowian corpora. The objective of these analyses is the same as in those on the average number of words per sentence, that is, to discern whether there is enough intra-author consistency and inter-author variation to later associate with clarity the lexical richness of the scenes of Arden of Faversham with one of the two candidates. Five Shakespearean and five Marlowian scenes will be included in each stage of the pre-study, with the exception of the fourth, which will analyse five Shakespearean scenes and the only four scenes of almost 2,000 words or more that the Marlowian corpus contains, as in the previous pre-study. The scenes of each group will not be randomly selected as in the pre-study on the average number of words per sentence, given that, as the size of a sample becomes larger, the chances of repeating words are higher, and hence slight increases in the number of words of a sample may greatly lower its percentage of lexical richness. For that reason, the samples whose number of words is most similar will be selected for each stage of the pre-study, creating subgroups of scenes of almost identical length within each group to further optimize the results. This contrasts with the criterion behind the selection of the scenes for the pre-study on the average number of words per sentence, given that the latter parameter is not so heavily affected by the size of the samples and thus it is enough to compare random scenes that belong to the same group.
It is worth mentioning that the scenes of almost 2,000 words or more of the two reference corpora have disparate lengths and hence cannot be divided into subgroups like those of the other three groups. For that reason, after the calculation of their lexical richness, a projection of what these values would be if their sizes were more balanced will be established in order to evaluate more efficiently the results of the fourth stage of the pre-study (see Section 5.2.4). It could be hypothesized that this discriminator needs larger samples than those involved in the four stages of the pre-study to present intra-author consistency, but this needs to be proved with concrete studies, which will be carried out in Section 5.2. In brief, the pre-study on the calculation of the lexical richness of undisputed scenes of Shakespeare and Marlowe aims to investigate whether there is enough intra-author consistency and inter-author variation in any of the four types of scenes to later apply this test in the attribution of authorship of the scenes of Arden of Faversham.

4.5.4. N-gram tracing

A study of the common n-grams between the disputed text and the reference corpora stands as the next authorship test selected for the analysis. Taylor (2019) conducted a study of this kind on the authorship of a small fragment of Arden of Faversham, for which he compiled a reference corpus for each of his candidates that included texts written in relatively different periods and belonging to distinct literary genres (see Section 3.4.4). The approach of the present study differs from Taylor’s in what has been considered a representative idiolectal sample, since the Shakespearean and the Marlowian corpora of this research have been compiled with plays that were written in a period close to the year 1592, that is, when Arden of Faversham was first published, and are not comedies, which may have a notable impact on the results, as has been argued throughout this chapter.
Furthermore, this research follows the traditional vision of n-grams as combinations of linguistic forms that appear consecutively within the same sentence, which constitutes another difference from the analysis conducted by Taylor, who also traced certain types of non-consecutive combinations of linguistic forms.

There are character n-grams and word n-grams, as pointed out during the explanation of previous studies of this kind (see Section 3.4.4). Even though both types of n-grams have been proved to be useful in certain linguistic contexts, I have decided to focus on word n-grams under the hypothesis that these reflect more distinctive linguistic constructions (see Section 1.2). In addition, this research will only study word n-grams of at least two words, which is a similarity with the research conducted by Taylor. The main reasons underlying the selection of this approach are, firstly, that a combination of two or more words in common tends to be more distinctive than a combination of letters or a single word in common and, secondly, that the Zeta test, which will be expounded later, already identifies distinctive single words in common between the disputed text and the reference corpora, so a study of n-grams that also focused on single words would be redundant. As a matter of fact, most of the common words that n-gram tracing reveals are function words, which are not as distinctive as the lexical words on which the Zeta test mainly focuses, and thus the latter will present more significant results when studying single words in common. In any case, the identification of all types of word n-grams has been programmed in ALTXA, where the test can be accessed as follows. When the user clicks the tab called N-gram analysis (see Figure 2), they will find two file choosers called Text A file and Text B file, which only admit documents in .txt format.
The first file chooser is expected to store the shortest sample, since, when the button Execute is clicked, the program will make a list of all the word n-grams of the document stored as Text A and then look for coincidences with Text B, ignoring commas and other punctuation marks within the sentence. There is no problem in uploading the largest sample to the first file chooser, but the process will take longer for ALTXA. The software will then generate on the blank space of its interface a list of all the word n-grams shared between both samples. This list will not only indicate the number of n-grams of each type in common, but it will also detail which n-grams these are and how many times they appear in each of the two samples. The order in which the n-grams are listed will be determined by their length: for instance, 4-grams will appear before 3-grams, and so on.

Figure 2 | Interface of ALTXA for n-gram tracing

The main problem behind the comparison between a text and the reference corpora of Shakespeare and Marlowe was that their lengths were dissimilar. While the corpus of Christopher Marlowe had 38,434 words after the editing process that has been described earlier in this chapter, Shakespeare’s corpus contained 50,057 words. The Shakespearean corpus would always have had more chances to present more n-grams in common with a disputed sample than the Marlowian corpus, so a solution needed to be adopted. This has been to remove a similar number of words from Richard III and Richard II, that is, the two samples included in the Shakespearean corpus, to ensure that both candidates are in equal conditions to become the likeliest author of every scene whose authorship is tested. Removing words from the beginning or the end of the plays may seem biased, so the starting point of the removal has been determined by a randomly generated number that indicated a word number of the play.
To avoid leaving unfinished sentences in the corpus, the removal began in the sentence that followed the randomly generated word number. Following this procedure, 5,808 words have been removed from Richard III starting from the sentence that followed the excerpt “I lay it naked to the deadly stroke, and humbly beg the death upon my knee” in Scene I.ii, while 5,819 words have been erased from Richard II starting from the sentence that followed the excerpt “Then I must not say no” in Scene III.iv. The resulting reference corpus of the Bard is formed by 38,430 words, which is a similar number of words to those contained in the Marlowian corpus. This adaptation will allow for the conduction of unbiased studies in which the corpora of both candidates are in equal conditions to be compared with a smaller sample to determine its likeliest authorship.

Once the size of both reference corpora is balanced, a pre-study to evaluate the effectiveness of n-gram tracing in the attribution of authorship of scenes written by Shakespeare and Marlowe will be carried out. This pre-study will be based on the extraction of five scenes of each author from every group of scenes to quantify the n-grams that they share with the reference corpus from which they have been removed and with that of the other candidate, in order to discern whether the method can associate each sample with the corpus of the author from which it has been taken. Firstly, five Shakespearean and five Marlowian scenes whose length ranges from 100 to 450 words will be extracted from their corpora and analysed as disputed texts to estimate whether n-gram tracing is reliable enough to investigate the authorship of the scenes from Arden of Faversham that have a similar length. Each of these ten scenes will be extracted and compared with the two reference corpora independently, so this stage of the pre-study will be formed by ten different analyses.
The purpose of these studies is to discern how many of the ten undisputed scenes present more n-grams in common with the corpus from which they have been taken than with that of the other candidate. The same procedure will then be carried out with scenes taken from the three other groups, which contain between 500 and 950 words, between 1,100 and 1,700 words and almost 2,000 words or more, respectively. The fourth stage will include four Marlowian scenes instead of five, as in the two pre-studies described earlier.

Two criteria will be followed during the conduction of this pre-study. The first one derives from the fact that the undisputed scenes that will be analysed as disputed texts will present n-grams in common with the corpus from which they have been taken that include proper names which are exclusive to the play to which they belong. For instance, if Scene II.iii from Edward II is extracted from the Marlowian corpus and analysed with ALTXA, it will present a series of 2-grams in common with that corpus that include the names of characters and locations that only appear in that play, such as "Gaveston is" and "in Tynmouth". These circumstantial n-grams would help to attribute each undisputed scene to its author, but that will not occur if a scene taken from Arden of Faversham is analysed by this method, since it does not share specific characters and locations with either of the two reference corpora. For that reason, an exhaustive review of all the common n-grams of the pre-study will be made to eliminate the circumstantial ones manually from the lists provided by ALTXA and only take into consideration the type of n-grams that would be involved in the final case study. In contrast, those that include the names of characters and locations that are present in both reference corpora will be kept, for instance "King Henry" and "England", which are mentioned multiple times by Shakespeare and Marlowe.
This criterion ensures that the pre-study will offer a realistic outcome that enables the assessment of the validity of the method.

Even though n-gram tracing is often seen as a quantitative method, its results can be “noisy” on certain occasions, which requires a subsequent qualitative analysis (Kredens, personal communication, February 17, 2019). In other words, the fact that two samples coincide in the use of an n-gram that contains many words may not necessarily be significant if it is a highly common construction. Therefore, the second criterion that will be adopted for the conduction of the pre-study is that, if a disputed scene shares at least ten n-grams of a certain number of words with one of the reference corpora, these will be analysed quantitatively and the results will be presented in tables (see footnote 21), whereas the others will be analysed from a qualitative perspective. This qualitative analysis of the larger but less frequent n-grams is only meant to complement the quantitative analysis, which will be given more importance in the attribution process. It makes sense to establish a statistical comparison of the results derived from the study of a type of n-grams that are shared a certain number of times by the texts, but it would be illogical to conduct a quantitative analysis of, for instance, 5-grams that are only shared once or twice by the samples and consider them as significant as the number of common 2-grams, which are much more frequent. This means that a 5-gram like "and here he comes the" will not be given much importance if it is the only common 5-gram between a scene and a reference corpus, even if there are no 5-grams in common between that scene and the other reference corpus, since it is not a distinctive combination of words.
In contrast, a 5-gram like "this hell of grief is" will be considered a solid idiolectal marker that can complement the results of the quantitative study, since it is an unusual combination of five words that holds a metaphorical meaning. This criterion will be kept for the final case study on the authorship of the scenes of Arden of Faversham if this method proves to be effective in any of the stages of the pre-study.

21 For the sake of clarity, the results derived from the quantitative analyses will be expressed in absolute figures, instead of using the overlap coefficient or the Jaccard index (see Grieve et al., 2018).

In short, if a text whose authorship is being analysed shares at least ten n-grams of a certain length with one of the reference corpora, these will be analysed from a quantitative perspective and the results will be presented in tables. When these results are discussed, a qualitative analysis of the larger but less frequent n-grams in common will be provided.

In the elaboration of the final verdict on the authorship of a scene, three expressions will be used. The first one is that it seems highly probable that the scene was written by Shakespeare/Marlowe. This will be used on those occasions in which every type of n-gram clearly links the scene to a specific author or in those cases in which, even if the results derived from the analysis of one type of n-gram are inconclusive, the others associate it with one of the candidates with great certainty. The second expression is that it seems slightly probable that the scene was written by Shakespeare/Marlowe, which will be used when the results provided by the analysis link the authorship of the sample to a specific author by a narrow margin. Finally, when the results lack clarity, the expression it seems uncertain if the scene was written by Shakespeare or Marlowe will be employed.
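The tracing procedure and the ten-n-gram threshold described in this subsection can be sketched in Python as follows. This is a hedged illustration, not the code of ALTXA. Several details are assumptions made only for the example: the function names, the maximum n-gram length of five, the use of the period, exclamation mark, question mark and colon as sentence boundaries, the case-insensitive matching, and the reading of "at least ten n-grams" as ten distinct n-grams rather than ten occurrences.

```python
import re
from collections import Counter

# Assumption: words are runs of letters with optional internal apostrophes
# or hyphens; the text is lowercased before matching.
TOKEN = re.compile(r"[a-z]+(?:['’-][a-z]+)*")


def sentence_ngrams(text, n):
    """Count consecutive word n-grams that stay within one sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?:]+", text.lower()):
        # Commas and other sentence-internal punctuation are dropped here.
        words = TOKEN.findall(sentence)
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def shared_ngrams(text_a, text_b, max_n=5, min_n=2):
    """Map n -> {n-gram: (count in A, count in B)}, longest n-grams first."""
    result = {}
    for n in range(max_n, min_n - 1, -1):
        a, b = sentence_ngrams(text_a, n), sentence_ngrams(text_b, n)
        common = {gram: (a[gram], b[gram]) for gram in a if gram in b}
        if common:
            result[n] = common
    return result


def quantitative_lengths(shared, threshold=10):
    """N-gram lengths with enough shared n-grams for a quantitative analysis."""
    return [n for n, grams in shared.items() if len(grams) >= threshold]


scene = "The king, my lord, is here. He comes."
corpus = "My lord is here: the king comes."
shared = shared_ngrams(scene, corpus)
for n, grams in shared.items():
    print(n, len(grams), sorted(grams))
print(quantitative_lengths(shared))
```

In this toy comparison no n-gram length reaches the threshold of ten, so every shared n-gram would be left to the qualitative analysis. The manual removal of play-specific proper names is deliberately not automated in the sketch, since the text describes it as a manual review.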
In brief, the reliability of n-gram tracing will be assessed by extracting scenes from the two reference corpora, whose number of words has been balanced, and analysing whether they share more n-grams with the corpus from which they have been taken than with the other after the manual exclusion of those n-grams that include the names of characters and locations that are exclusive to the play to which they belong. If a scene that is being analysed shares at least ten n-grams of a certain type with one of the reference corpora, these will be analysed quantitatively, whereas the others will be examined from a qualitative perspective. If the success rate of n-gram tracing is sufficiently solid in any of the four stages of the pre-study, this method will be used to determine the likeliest authorship of the scenes of Arden of Faversham of that group following the same criteria that have been delineated for the conduction of the pre-study.

4.5.5. The Zeta test

The last method selected for the conduction of the pre-studies is the Zeta test (see Section 3.4.4 for a detailed explanation of the fundamentals of this procedure as well as of previous research involving its usage). This test can be accessed if the user clicks the ZTest tab on the interface of ALTXA, where they will find four file choosers called Text A file, Text B file, Text C file and Ignored words file (see Figure 3). The reference corpora of both candidates must be uploaded in .txt format to the first two file choosers with the combination of symbols @#@ written at the end of every text within a corpus. This will allow the software to divide the corpora properly, that is, into fragments of 2,000 words, with the residual words at the end of each play added to its last fragment. The disputed text is expected to be uploaded with the same format to the third file chooser.
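The segmentation rule described above can be sketched in Python as follows. The sketch is only an illustration of the rule as stated in the text: the function name is hypothetical, words are assumed to be separated by whitespace, and the treatment of a text shorter than one full fragment (kept as a single fragment) is an assumption, since the text does not specify it.

```python
def split_corpus(corpus: str, size: int = 2000, delimiter: str = "@#@"):
    """Split each text of a corpus into fragments of `size` words.

    Texts are separated by `delimiter`; the residual words at the end of
    each text are appended to its last full fragment.
    """
    fragments = []
    for play in corpus.split(delimiter):
        words = play.split()
        if not words:
            continue  # nothing after the final delimiter
        n_full = len(words) // size
        if n_full == 0:
            # Assumption: a text shorter than `size` forms one fragment.
            fragments.append(words)
            continue
        chunks = [words[i * size:(i + 1) * size] for i in range(n_full)]
        chunks[-1].extend(words[n_full * size:])  # residual words
        fragments.extend(chunks)
    return fragments


# Toy corpus with two "plays" and a fragment size of 2 words.
toy = "a a a a a @#@ b b b @#@"
print([len(f) for f in split_corpus(toy, size=2)])
```

With this rule the last fragment of each play can hold up to almost twice the nominal fragment size, which is why the number of markers in a fragment is later divided by its number of distinct words.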
To elaborate the lists of the 500 markers of each of the two reference corpora, ALTXA has been programmed to only take into consideration words that do not appear in a stop list that has to be uploaded in .txt format to the file chooser Ignored words list. This stop list needs to include the most common function words of the language in which the researcher is conducting their study (see footnote 22), which can be easily found on the Internet, as well as proper names and other lexical items that they wish to ignore on the grounds that they are “more closely related to local, play-specific contexts rather than indicative of any consistent authorial pattern” (Elliott & Greatley-Hirsch, 2017, p. 151). All these words must be introduced without capital letters and separated by single spaces in the .txt document. The idea of creating an editable stop list to adapt the conduction of each Zeta test to the specific needs of the researcher is one of the innovations introduced by the software ALTXA that differentiates it from other computational tools. Appendix 3 contains the stop list with all the words that have been ignored as potential markers for the conduction of the Zeta tests of this thesis.

22 Following the criterion of Kinney (2009), some function words with a similar meaning, like "yes" and "yea", whose usage may be seen as a choice of the author, have not been ignored for the conduction of this test, despite the fact that the abovementioned forms are mainly dialectal or context-dependent (see Section 3.4.4). The combinations of two function words in a contracted form, such as "I’ll", have not been ignored either, since they stand as idiolectal choices, in my view.
Figure 3 | Interface of ALTXA for the Zeta test

When the button Execute Zeta test analysis is clicked, ALTXA will quantify, for every word that is not present in the document Ignored words list, the proportion of fragments from the first reference corpus in which it appears and the proportion of fragments from the second reference corpus in which it does not appear. If the percentages of appearance and non-appearance of each of these words are transformed into numbers from 0 to 1 and these are added, a distinctive word must produce a result that is higher than 1. The 500 words of the first author with the highest results above 1 will be listed as their markers, and the opposite procedure will be simultaneously applied by the software to elaborate the list of 500 markers of the second candidate. In other words, the 500 markers of an author are not only chosen by their frequency in his/her corpus, but also by their low frequency or lack of appearance in the corpus of the other candidate. The lists will be generated by the software in an Excel document, together with a .png image file with the graphical representation of each fragment of 2,000 words or more on a coordinate axis and another Excel document that details these coordinates.

There is a problem with the conduction of the Zeta test with Elizabethan playwrights, which is that their samples include archaic forms like "thine" that tend not to be present on the lists of function words available on the Internet, and thus the software will identify them as potential markers if it is not ordered to ignore them. For that reason, the process by which the 500 markers of each author are obtained for the conduction of these Zeta tests needs to be repeated many times.
Every time it is carried out, it is necessary to add all the function words and proper names that appear on these two lists to the stop list and execute the Zeta test again, until there are only distinctive lexical items in both lists of markers, which is a laborious process (see Appendix 3 for the stop list with all the words ignored as potential markers during the conduction of the Zeta tests of this thesis). This would not be such a complicated task if the texts were modern, since standard lists of function words would already cover most of the forms to be ignored.

As pointed out earlier, ALTXA will generate an Excel document with the lists of 500 markers of each author, which need to be revised until they only include the kind of words that the researcher wishes to consider for their study, a .png image with the graphical representation of the results and an Excel document that includes the exact coordinates of every fragment on the coordinate axis. The latter document is generated in case the researcher wants to elaborate another graphical representation of the results on the Excel sheet itself or to export the coordinates to a distinct database easily. In the .png file generated by ALTXA, the fragments of the first reference corpus will be represented by blue dots, those of the second reference corpus by red squares, and the fragments of the disputed text by black triangles (see Sections 5.4.1, 6.1, 6.3 and 6.14 for examples of these representations). Their position on the coordinate axis will be determined as follows. The value on the horizontal axis is the number of markers of the first reference corpus that a fragment includes divided by its number of distinct words, whereas its position on the vertical axis is the number of markers of the second reference corpus that it contains divided by its number of distinct words.
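The selection of markers and the positioning of fragments described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of ALTXA: the function names are hypothetical, every fragment is assumed to be already reduced to the set of its distinct lowercase words, and ties between equally scored words are broken alphabetically, a detail the text does not specify.

```python
def zeta_markers(frags_a, frags_b, stop_words=frozenset(), top=500):
    """Return up to `top` markers of author A against author B.

    frags_a, frags_b: non-empty lists of sets of distinct words, one set
    per fragment. A word scores (share of A fragments containing it) +
    (share of B fragments lacking it); only scores above 1 qualify.
    """
    vocabulary = set().union(*frags_a, *frags_b) - set(stop_words)
    scores = {}
    for word in vocabulary:
        present_a = sum(word in f for f in frags_a) / len(frags_a)
        absent_b = sum(word not in f for f in frags_b) / len(frags_b)
        scores[word] = present_a + absent_b
    distinctive = [w for w in scores if scores[w] > 1]
    # Highest score first; alphabetical order as a tie-breaker (assumption).
    distinctive.sort(key=lambda w: (-scores[w], w))
    return distinctive[:top]


def coordinates(fragment_types, markers_a, markers_b):
    """Position of a fragment: markers of each author / distinct words."""
    n = len(fragment_types)
    x = len(fragment_types & set(markers_a)) / n
    y = len(fragment_types & set(markers_b)) / n
    return (x, y)


# Hypothetical toy fragments, invented only to exercise the functions.
frags_a = [{"gentle", "king", "the"}, {"gentle", "crown", "the"}]
frags_b = [{"gold", "king", "the"}, {"gold", "crown", "the"}]
markers_a = zeta_markers(frags_a, frags_b, stop_words={"the"})
markers_b = zeta_markers(frags_b, frags_a, stop_words={"the"})
print(markers_a, markers_b)
print(coordinates({"gentle", "sweet", "king", "lord"}, markers_a, markers_b))
```

In the toy data, "king" and "crown" appear in half of the fragments of each author, so they score exactly 1 and are not listed as markers of either candidate. Only words that are frequent in one set of fragments and rare in the other qualify.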
As explained in Section 3.4.4, the Zeta test does not take into account the number of times that a marker appears in a fragment, but whether it appears or not. The number of markers that a fragment contains is divided by its number of different words, or types, to compensate for the dissimilar size that some fragments have as a result of including a residual number of words for being at the end of a text. The likeliest authorship of the fragments of the disputed text will therefore be determined by their proximity to the centroid of each of the two clusters formed by the fragments into which the two reference corpora have been divided. In case it is not discernible at a glance which of the two clusters is closer to the fragments of the disputed text, the coordinates of the centroid of a cluster can be determined by calculating the average value of all its X and Y coordinates. Afterwards, the distance between the centroid of a cluster, $A = (x_1, y_1)$, and a fragment of the disputed text, $B = (x_2, y_2)$, can be calculated, according to Professor Elisa Isabel Lozano (personal communication, January 10, 2020), with the formula $|\overrightarrow{AB}| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.

The main differences between the Zeta tests conducted by Kinney (2009) and Elliott and Greatley-Hirsch (2017) to analyse the authorship of Arden of Faversham and those that will be applied in the present thesis lie in the fact that the latter will compare Shakespeare with Marlowe individually, instead of comparing one candidate with a group of many, and in what has been considered a representative reference corpus for each candidate. In addition, the type of scenes to which this method will be applied will also differ from those of the abovementioned studies, which is an issue that will be addressed further on.
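The centroid comparison described earlier in this section, in which the centroid of each cluster is the average of its X and Y coordinates and the distance to a disputed fragment is the Euclidean distance, can be sketched in Python as follows. The function names and the coordinate values are hypothetical and serve only to show the calculation; `math.dist` requires Python 3.8 or later.

```python
import math


def centroid(points):
    """Average of the X coordinates and of the Y coordinates of a cluster."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def nearest_cluster(point, clusters):
    """Return the label of the closest centroid and all distances.

    clusters: dict mapping a label to a list of (x, y) fragment positions.
    """
    distances = {
        label: math.dist(point, centroid(points))  # sqrt(dx^2 + dy^2)
        for label, points in clusters.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical coordinates, invented for illustration only.
clusters = {
    "first corpus": [(0.0, 0.0), (2.0, 0.0)],
    "second corpus": [(0.0, 4.0), (2.0, 4.0)],
}
label, distances = nearest_cluster((1.0, 1.0), clusters)
print(label, distances)
```

The sketch only ranks the two centroids by distance. How small the difference between the two distances may be before a result is treated as inconclusive is a matter for the pre-study and is not decided here.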
Firstly, I would like to suggest that using the Zeta test to compare a writer with a group of writers is not as efficient as comparing them individually, which could be exemplified in the following way. If the word "gentle" appears in many Shakespearean fragments and does not occur so often in the Marlowian fragments, it will be classified as a discriminator of the Bard if these candidates are compared individually. Nevertheless, if Shakespeare is compared with a group of writers that includes Marlowe, but whose majority uses the word "gentle" frequently, this word will not be selected as a marker because of the average values of the group and thus a reliable discriminator between these two authors will be lost. The idea of obtaining a set of markers from a group of writers in a Zeta test does not seem sensible to me, given that their corpus will not constitute a proper reflection of the idiolect of any of them, but a mixture of many idiolects that cannot fully represent any of its parts. For that reason, one of the main hypotheses suggested in this doctoral thesis is that the Zeta test should only compare authors individually (see Section 1.2), given that it is the only way of obtaining realistic discriminators among them. If a researcher wants to distinguish between Shakespeare and the rest of the Elizabethan authors, such comparisons should be made one by one. This is one of the reasons why the catalogue of candidates for the conduction of this thesis has been narrowed down to Shakespeare and Marlowe and why future studies will compare the likeliest author of every scene of Arden of Faversham according to this research with Kyd and the rest of possible candidates individually (see Section 8.2).

The second major difference between the Zeta tests of the present thesis and those applied by Kinney (2009) and Elliott and Greatley-Hirsch (2017) is the criteria for the compilation of the reference corpora of the candidates.
The corpora compiled by these authors included works of periods and subgenres that are different from those of Arden of Faversham, while the ones selected for the conduction of this doctoral thesis were written no more than three years apart from the creation of the play and none of them are comedies. This issue, which is associated with one of the most relevant hypotheses of the investigation (see Section 1.2), has been addressed in depth earlier in this chapter, for which no further explanations will be provided. The Zeta test places on a coordinate axis 2,000-word fragments according to the markers of the two reference corpora that they contain, for which it seems reasonable to only analyse scenes whose length is similar or superior to 2,000 words with this method. Despite the fact that Kinney (2009) also included the shortest scenes of Arden of Faversham in this procedure, I would say that it does not make much sense to compare fragments of 2,000 words with others of, for instance, 200, in terms of the number of markers that they have from two lists of 500, even if these numbers are then divided by the number of distinct words of each fragment. There are other methods, such as n-gram tracing, that can effectively analyse the authorship of scenes of this kind without making unbalanced comparisons. Therefore, only undisputed scenes whose length is similar or superior to 2,000 words will be analysed with the Zeta test to evaluate its validity. This pre-study will consist in the extraction of five Shakespearean and four Marlowian scenes of such length to be analysed independently by this method. If the pre-study shows solid results (see Section 5.4), the Zeta test will be used to determine the likeliest authorship of the scenes of Arden of Faversham of that group. In short, this doctoral thesis suggests that candidates should be compared individually during the Zeta test. 
It has also been suggested that the reference corpora of these candidates should be formed by texts which have similar characteristics to those of the disputed text. Lastly, the fragments into which the disputed text is divided should present a length that is at least close to that of the fragments into which the reference corpora are divided, for which reason only scenes from the fourth group will be included in the pre-study to assess the validity of the method.

4.6. Summary

This chapter has offered an exhaustive explanation of the distinct steps that have been and will be taken for the conduction of the study, which could be summarized as follows. The selection of the authorship of Arden of Faversham as the focus of this research has been determined by the topic of my previous work and the inconclusive results of the few studies that have been conducted on this subject from a forensic linguistic perspective. Given the thoroughness of the analysis and its inherent length, the first methodological decision has been to only take into consideration two candidates for the authorship of the play and consider this thesis as the first milestone of a long-term project where the rest of the Elizabethan playwrights will be involved. The selection of Shakespeare as one of the candidates has been due to the influence of the research conducted by other scholars, whereas the selection of Marlowe is mainly due to his biographical data, which seem to make him a suitable candidate for a play of this nature, and the fact that he is known to have collaborated with Shakespeare in the creation of Henry VI. The selection of the candidates for the attribution of authorship of Arden of Faversham has been followed by the delimitation of a series of criteria to compile their reference corpora. These have been to select plays that are not comedies and were written no more than three years apart from the creation of Arden of Faversham.
Such decisions respond to one of the main hypotheses on which the investigation is built, which is that authorship problems can be better addressed if the disputed text is compared to reference corpora that reflect faithfully the conditions in which it was created and thus are truly representative of its idiolectal features. The hypotheses that word n-grams reflect more distinctive constructions than character n-grams and that authors should be compared individually during a Zeta test have also been discussed in this chapter. Richard III and Richard II have been selected for the compilation of the Shakespearean corpus, whereas the corpus of Marlowe has been compiled with The Jew of Malta and Edward II. These plays and Arden of Faversham itself have been extracted from the archives of Project Gutenberg, which has prioritized the preservation of the word selection of the original manuscripts, on which the subsequent analysis will focus, rather than their spelling features. Afterwards, the texts have been adapted to optimize the results of the analysis following a series of criteria introduced by the researcher, for instance that only the direct interventions of the characters in the dialogues will be considered in the analysis, since they reflect more idiolectal features than stage directions.

As a result of the belief that the effectiveness of an authorship attribution method is determined by the context where it is applied, this investigation will be divided into a series of pre-studies and a case study. The pre-studies will analyse the authorship of scenes of distinct lengths taken from the Shakespearean and the Marlowian reference corpora as if they were disputed texts, so that only those procedures that have been proved to be solid enough in an identical linguistic context are applied in the case study, that is, the attribution of authorship of the scenes of Arden of Faversham.
In addition, these pre-studies will give the researcher a reference for what kind of outcomes can be considered conclusive in the case study. The functionalities of the software ALTXA and how they can be accessed on its interface have been thoroughly addressed in this chapter. These are the quantification of the relative frequency of a set of keywords selected by the researcher, which will not be included in the study because of its reliance on subjective criteria, the calculation of the average number of words per sentence of a text and of its lexical richness, n-gram tracing and the conduction of the Zeta test. This reflects the importance of ALTXA in the thesis, given that one of its main objectives is to prove the validity of this tool and establish a solid methodological basis for the conduction of future studies involving other possible candidates.

CHAPTER 5 | PRE-STUDIES

This chapter seeks to assess the reliability of the authorship tests that have been selected in Chapter 4 by applying them to the analysis of undisputed scenes taken from the Shakespearean and the Marlowian reference corpora. The only objective of these pre-studies is to ensure that the final case study, that is, the attribution of authorship of the scenes of Arden of Faversham, uses only those procedures that have proved to be effective in a similar linguistic context. The first pre-study (see Section 5.1) will address whether the Shakespearean and the Marlowian scenes can be effectively differentiated by calculating their average number of words per sentence, whereas the second one (see Section 5.2) will assess the reliability of the calculation of their lexical richness for the same purpose. The third pre-study (see Section 5.3) will evaluate the effectiveness of n-gram tracing to attribute the authorship of samples of both candidates and the fourth one (see Section 5.4) will do the same with the Zeta test.
These pre-studies will be divided into four distinct stages, one for each of the four types of scenes that have been delineated in the previous chapter in terms of their length, except for the pre-study on the Zeta test, which will only be conducted with samples whose length is close to or above 2,000 words (see Chapter 4 for a detailed explanation of the reasons underlying such decisions).

5.1. Pre-study on the calculation of the average number of words per sentence (Pre-study 1)

This pre-study, which will analyse the authorship of undisputed scenes of Shakespeare and Marlowe in terms of their average number of words per sentence, will be based on the following principle. The scenes that were written by the same author should present similar values among themselves and, simultaneously, a sufficient degree of differentiation from those written by the other author, which should also present intra-author consistency. Random scenes from the two reference corpora of between 100 and 450 words will be analysed first, followed by scenes of between 500 and 950, between 1,100 and 1,700 and, lastly, of almost 2,000 words or more. The objective of this pre-study is therefore to assess the extent to which the average number of words per sentence can be considered a reliable discriminator between samples written by Shakespeare and Marlowe in any of the four abovementioned contexts, in order to later apply it in the analysis of the scenes of Arden of Faversham.

5.1.1. Average number of words per sentence of scenes of between 100 and 450 words

The average number of words per sentence of five Shakespearean and five Marlowian random scenes of between 100 and 450 words has been calculated by ALTXA and the results can be observed in Table 2.
Table 2 | Stage 1 of the pre-study on the average number of words per sentence

| Shakespearean scenes | Words per sentence (w/s) | Marlowian scenes | Words per sentence (w/s) |
| --- | --- | --- | --- |
| Richard III, Scene II.iii (398 words) | 12.061 | Edward II, Scene II.iii (218 words) | 12.824 |
| Richard III, Scene III.iii (197 words) | 11.588 | Edward II, Scene IV.iii (426 words) | 10.923 |
| Richard III, Scene V.ii (188 words) | 17.091 | The Jew of Malta, Scene III.i (253 words) | 14.056 |
| Richard II, Scene III.i (342 words) | 21.375 | The Jew of Malta, Scene III.ii (288 words) | 8.727 |
| Richard II, Scene V.vi (411 words) | 17.125 | The Jew of Malta, Scene IV.iii (351 words) | 9.75 |

The table presented above shows that these ten scenes have neither intra-author consistency nor inter-author variation, as will be developed in the following paragraphs. The results of the Shakespearean scenes could be divided into those obtained by Scenes II.iii and III.iii from Richard III, which are relatively close (12.061 and 11.588, respectively), those of Scene V.ii from Richard III and Scene V.vi from Richard II, which are almost identical (17.091 and 17.125, respectively), and that of Scene III.i from Richard II, whose average number of words per sentence is 21.375. Hence, the Shakespearean samples present a maximum difference of 9.787 points, which can be found if Scene III.iii from Richard III (11.588) is compared to Scene III.i from Richard II (21.375). This reflects their lack of consistency.

The Marlowian scenes are also heterogeneous, since there is a difference of more than one point between each pair of adjacent samples if their values are ordered from the lowest to the highest (8.727, 9.75, 10.923, 12.824 and 14.056). If Scene III.ii from The Jew of Malta is compared to Scene III.i from the same play, there is a difference of 5.329 words per sentence between them, which is the highest within the Marlowian samples.
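The maximum differences quoted for Table 2 can be reproduced with a short script. The following is only a minimal sketch: the function names are hypothetical, and the sentence splitter is an assumption (a sentence is taken to end at a full stop, exclamation mark or question mark), since the segmentation rules actually implemented in ALTXA are not reproduced here.

```python
import re


def average_words_per_sentence(text: str) -> float:
    """Mean number of whitespace-separated words per sentence.

    Assumption: a sentence ends at '.', '!' or '?'. The segmentation
    used by ALTXA may differ from this naive rule.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def max_difference(values):
    """Largest gap between any two values of a group (maximum minus minimum)."""
    return round(max(values) - min(values), 3)


# Words per sentence reported in Table 2
shakespeare = [12.061, 11.588, 17.091, 21.375, 17.125]
marlowe = [12.824, 10.923, 14.056, 8.727, 9.75]

print(max_difference(shakespeare))  # 9.787
print(max_difference(marlowe))      # 5.329
```

A small maximum difference within each author, combined with non-overlapping ranges across authors, is what the principle stated at the beginning of this pre-study would require.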
Even though three of the scenes written by Shakespeare are the only ones that have more than 15 words per sentence and two of the Marlowian scenes are the only ones with fewer than 10, there are scenes of the two authors that present overlapping results. This is the case of Shakespeare’s Scene II.iii from Richard III, with 12.061 words per sentence, and Marlowe’s Scene II.iii from Edward II, which has 12.824, that is, almost the same. Similarly, the average number of words per sentence of Shakespeare’s Scene III.iii from Richard III (11.588) falls between the results derived from the analysis of Scenes II.iii and IV.iii from Marlowe’s Edward II (12.824 and 10.923, respectively). This means that if the average number of words per sentence of a scene from Arden of Faversham is calculated by ALTXA and the result is within those values, it would be impossible to associate its authorship with either of the two candidates of the study.

In conclusion, the quantification of the average number of words per sentence of undisputed scenes of between 100 and 450 words written by Shakespeare and Marlowe has shown barely any intra-author consistency, and the results of both authors tend to overlap. Consequently, this discriminator will not be used to determine the authorship of the scenes of Arden of Faversham of the same length.

5.1.2. Average number of words per sentence of scenes of between 500 and 950 words

The second stage of this pre-study consists in the quantification of the average number of words per sentence of five undisputed scenes of Shakespeare and five undisputed scenes of Marlowe whose length ranges from 500 to 950 words to evaluate if there is enough intra-author consistency and inter-author variation. The results derived from the analysis of these scenes, which have been randomly selected, can be observed in Table 3.
Table 3 | Stage 2 of the pre-study on the average number of words per sentence

| Shakespearean scenes | Words per sentence (w/s) | Marlowian scenes | Words per sentence (w/s) |
| --- | --- | --- | --- |
| Richard III, Scene II.iv (591 words) | 10.368 | Edward II, Scene III.iii (726 words) | 12.737 |
| Richard III, Scene III.iv (860 words) | 13.871 | Edward II, Scene V.iii (527 words) | 9.246 |
| Richard III, Scene IV.ii (920 words) | 8.364 | The Jew of Malta, Scene III.iii (521 words) | 8.683 |
| Richard II, Scene I.ii (579 words) | 16.543 | The Jew of Malta, Scene III.iv (847 words) | 10.329 |
| Richard II, Scene III.iv (856 words) | 16.151 | The Jew of Malta, Scene IV.v (532 words) | 10.231 |

Table 3 shows that there is a lack of intra-author consistency in the results derived from the analysis of the Shakespearean scenes. There is a dramatic difference of 8.179 points if the average number of words per sentence of Scene IV.ii from Richard III (8.364) is compared to that of Scene I.ii from Richard II (16.543). The results of Scenes II.iv and III.iv from Richard III lie in an intermediate position between the two abovementioned scenes, although there is a considerable difference of more than three words per sentence between them (10.368 and 13.871, respectively). The only ones that seem to have similar values are Scenes I.ii and III.iv from Richard II (16.543 and 16.151, respectively).

The Marlowian scenes present a maximum difference of 4.054 words per sentence, which can be found if Scene III.iii from Edward II is compared to Scene III.iii from The Jew of Malta (12.737 and 8.683, respectively). There is great intra-author consistency in the average number of words per sentence of Scenes III.iv and IV.v from The Jew of Malta (10.329 and 10.231, respectively), and the results of Scene V.iii from Edward II and Scene III.iii from The Jew of Malta are relatively close (9.246 and 8.683, respectively).
The average number of words per sentence of the latter Marlowian scene (8.683) is very close to that of Scene IV.ii from Shakespeare’s Richard III, which is 8.364. Furthermore, the results of Scenes III.iv and IV.v from Marlowe’s The Jew of Malta are between 10 and 11 words per sentence (10.329 and 10.231, respectively), as happens with Shakespeare’s Scene II.iv from Richard III (10.368).

This method has proved to be ineffective to distinguish between scenes of the second group written by Shakespeare and Marlowe, given the lack of intra-author consistency and, especially, the high frequency with which the average numbers of words per sentence of the two playwrights overlap.

5.1.3. Average number of words per sentence of scenes of between 1,100 and 1,700 words

The third stage of the pre-study focuses on the calculation of the average number of words per sentence of five Shakespearean and five Marlowian random scenes whose length ranges from 1,100 to 1,700 words. The results provided by the software ALTXA can be observed in the following table.
Table 4 | Stage 3 of the pre-study on the average number of words per sentence

| Shakespearean scenes | Words per sentence (w/s) | Marlowian scenes | Words per sentence (w/s) |
| --- | --- | --- | --- |
| Richard III, Scene I.i (1,243 words) | 15.538 | Edward II, Scene I.i (1,588 words) | 11.94 |
| Richard III, Scene II.ii (1,214 words) | 13.64 | Edward II, Scene III.ii (1,401 words) | 16.679 |
| Richard III, Scene III.i (1,580 words) | 11.704 | Edward II, Scene V.i (1,266 words) | 13.326 |
| Richard II, Scene I.i (1,605 words) | 21.4 | The Jew of Malta, Scene I.i (1,425 words) | 13.443 |
| Richard II, Scene II.iii (1,377 words) | 17.213 | The Jew of Malta, Scene IV.iv (1,135 words) | 11.823 |

Table 4 shows that the disparity of the average number of words per sentence of the Shakespearean scenes is evident, with a maximum difference of 9.696 points between Scene III.i from Richard III (11.704) and Scene I.i from Richard II (21.4), as well as a difference of almost two points or more between each pair of adjacent scenes if their values are ordered from the lowest to the highest (11.704, 13.64, 15.538, 17.213 and 21.4). The five Shakespearean scenes that contain between 1,100 and 1,700 words have shown no intra-author consistency.

The results derived from the study of the Marlowian scenes are slightly more consistent, given that there is great similarity between the results of Scene I.i from Edward II (11.94) and Scene IV.iv from The Jew of Malta (11.823), as well as between those of Scene V.i from Edward II (13.326) and Scene I.i from The Jew of Malta (13.443). Nevertheless, the average number of words per sentence of Scene III.ii from Edward II (16.679) is notably distinct from that of the other Marlowian scenes, creating a maximum difference of 4.856 points between it and Scene IV.iv from The Jew of Malta (11.823).

In addition to the lack of intra-author consistency, which is more evident in the case of the Shakespearean scenes, the results of the two candidates overlap.
This can be observed if the average number of words per sentence of Shakespeare’s Scene III.i from Richard III (11.704) is compared to that obtained by Marlowe’s Scene I.i from Edward II (11.94) and Scene IV.iv from The Jew of Malta (11.823), or if the average number of words per sentence of Shakespeare’s Scene II.ii from Richard III (13.64) is compared to that of Marlowe’s Scene V.i from Edward II (13.326) and Scene I.i from The Jew of Malta (13.443).

In brief, it seems that this discriminator is not consistent enough to be used in the authorship analysis of the scenes of Arden of Faversham that have between 1,100 and 1,700 words.

5.1.4. Average number of words per sentence of scenes of almost 2,000 words or more

The final stage of the pre-study aims to assess the effectiveness of this test with undisputed scenes of Shakespeare and Marlowe whose number of words is close to or above 2,000. While it is possible to include five Shakespearean scenes of more than 2,000 words in this study, there are only three Marlowian scenes that contain such a number of words. For this reason, Scene II.ii from Edward II, which contains 1,995 words, has been included in this analysis, whose results can be observed in Table 5.
Table 5 | Stage 4 of the pre-study on the average number of words per sentence

| Shakespearean scenes | Words per sentence (w/s) | Marlowian scenes | Words per sentence (w/s) |
| --- | --- | --- | --- |
| Richard III, Scene I.iii (2,845 words) | 13.678 | Edward II, Scene I.iv (3,330 words) | 11.767 |
| Richard III, Scene IV.iv (4,267 words) | 13.334 | Edward II, Scene II.ii (1,995 words) | 9.975 |
| Richard III, Scene V.iii (2,726 words) | 10.904 | The Jew of Malta, Scene I.ii (2,929 words) | 11.623 |
| Richard II, Scene I.iii (2,402 words) | 18.336 | The Jew of Malta, Scene II.iii (3,034 words) | 11.669 |
| Richard II, Scene II.i (2,372 words) | 17.701 | | |

The Shakespearean scenes present heterogeneous results, except for Scenes I.iii and IV.iv from Richard III, whose average number of words per sentence is quite similar (13.678 and 13.334, respectively). Scene V.iii from Richard III presents an average number of words per sentence of 10.904, which creates a considerable difference of 7.432 points if it is compared with Scene I.iii from Richard II (18.336), and of 6.797 points with Scene II.i from Richard II (17.701).

In contrast, the Marlowian scenes are highly homogeneous, since three of them present between 11 and 12 words per sentence, while the remaining one, which is Scene II.ii from Edward II, has 9.975. The main problem behind the homogeneity of the Marlowian samples, whose values range from 9.9 to 11.8, is that the average number of words per sentence of Scene V.iii from Shakespeare’s Richard III (10.904) overlaps with them. This means that if the average number of words per sentence of a scene from Arden of Faversham is calculated and the result is close to 11, the authorship of the disputed text could not be associated with Shakespeare or Marlowe with certainty.

In sum, this stage of the pre-study has shown highly consistent results in the analysis of the Marlowian scenes, but great intra-author variation in those of Shakespeare, which invalidates the test automatically.
In addition, even though the results of the Marlowian scenes are quite similar, they overlap with one of the results of the Shakespearean scenes, which would not allow for a reliable attribution of authorship of a scene from Arden of Faversham that has an average number of words per sentence within those values.

5.1.5. Conclusions derived from Pre-study 1

It has been shown that the calculation of the average number of words per sentence cannot distinguish with sufficient reliability a Shakespearean scene that belongs to a play that is not a comedy and was written between 1590 and 1595 from a Marlowian scene of the same characteristics. The pre-study has not achieved satisfactory results in any of the four types of scenes that have been analysed. Even though the intra-author consistency has improved as the size of the samples has increased, especially in the case of the Marlowian scenes, the overlapping results of both playwrights in the four categories have undermined the reliability of this discriminator.

Therefore, the calculation of the average number of words per sentence will not be used in the final case study. This does not mean that the method is ineffective, but that it is not effective enough in this specific linguistic context. In other words, the quantification of this discriminator could prove to be effective if the samples of one of the candidates were replaced with the works of a different playwright or if the samples of Shakespeare and Marlowe were taken from a different period, for instance. The following section will present and discuss the results derived from the second pre-study.

5.2. Pre-study on the calculation of the lexical richness (Pre-study 2)

The second pre-study of the thesis intends to evaluate the reliability of the calculation of the lexical richness to distinguish between Shakespearean and Marlowian scenes.
To this end, four distinct analyses will be conducted, that is, one for each of the four types of scenes according to their length, under the principle that if this discriminator is effective enough, the values of the scenes written by the same author should present a certain consistency and, at the same time, be sufficiently different from those of the other candidate. Scenes that contain between 100 and 450 words will be analysed first. Afterwards, scenes whose length ranges from 500 to 950 and from 1,100 to 1,700 words will be studied. Finally, the effectiveness of the discriminator will be assessed with scenes that contain almost 2,000 words or more.

The scenes involved in this pre-study will not be randomly selected as in that on the average number of words per sentence, given that this parameter is greatly affected by small differences in the size of the samples. For that reason, the scenes of the most similar length in the first three groups will be selected and classified into subgroups, which will optimize the results and allow for a realistic evaluation of the effectiveness of this test. Since there is a lack of scenes of a similar length in the fourth group, an estimation of what their lexical richness would be if their size were balanced will be taken into consideration to assess the reliability of the procedure (see Section 4.5.3 for a more detailed explanation of the reasons underlying these decisions). The results will be presented in tables and later discussed.

5.2.1. Lexical richness of scenes of between 100 and 450 words

The lexical richness of five Shakespearean and five Marlowian scenes that have between 100 and 450 words has been calculated by ALTXA to discern if there is sufficient intra-author consistency and inter-author variation. The results derived from this study can be observed in Table 6.
Table 6 | Stage 1 of the pre-study on the lexical richness

| Shakespearean scenes | Lexical richness (%) | Marlowian scenes | Lexical richness (%) |
| --- | --- | --- | --- |
| Richard III, Scene III.iii (197 words) | 64.975 | Edward II, Scene III.i (151 words) | 61.589 |
| Richard III, Scene III.vi (116 words) | 77.586 | Edward II, Scene IV.i (123 words) | 77.236 |
| Richard III, Scene V.iv (110 words) | 60.0 | The Jew of Malta, Scene III.i (252 words) | 64.032 |
| Richard III, Scene V.v (315 words) | 62.54 | The Jew of Malta, Scene III.ii (288 words) | 57.986 |
| Richard II, Scene III.i (342 words) | 56.725 | The Jew of Malta, Scene III.v (253 words) | 63.241 |

As underlined earlier, minor differences in the size of the samples could have a major impact on the results of this test, which is why the scenes of each author have been carefully selected in order to have two subgroups where their length is almost identical. For that reason, three of the Shakespearean scenes contain between 110 and 200 words (Scenes III.iii, III.vi and V.iv from Richard III), while the other two taken from his corpus present a length that ranges from 310 to 350 words (Scene V.v from Richard III and Scene III.i from Richard II).

The results of Scenes III.iii, III.vi and V.iv from Richard III differ considerably, given that their lexical richness is 64.975%, 77.586% and 60.0% respectively, which means that no consistency can be found in the subgroup of Shakespearean scenes of between 110 and 200 words. Similarly, if the lexical richness of the two Shakespearean scenes whose length is between 310 and 350 words is compared, there is a notable distance of more than five points between them, since the result of Scene V.v from Richard III is 62.54%, whereas that of Scene III.i from Richard II is 56.725%. Hence, it has been shown that the quantification of the lexical richness of Shakespearean scenes of such a short length leads to disparate results, even if their number of words is highly similar.
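The percentages in Table 6 appear consistent with a type-token ratio, that is, the number of distinct words divided by the total number of words (for instance, 128 distinct words out of 197 give 64.975%). The following minimal sketch illustrates that reading; the function names and the tokenisation rule (lowercasing and splitting on anything that is not a letter or an apostrophe) are assumptions and may not match the behaviour of ALTXA.

```python
import re


def richness_from_counts(distinct_words: int, total_words: int) -> float:
    """Type-token ratio expressed as a percentage, rounded to three decimals."""
    if total_words == 0:
        return 0.0
    return round(100 * distinct_words / total_words, 3)


def lexical_richness(text: str) -> float:
    """Type-token ratio of a text as a percentage.

    Assumption: words are lowercased and split on any character that is
    not a letter or an apostrophe; ALTXA's tokenisation may differ.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return richness_from_counts(len(set(tokens)), len(tokens))


sample = "Now is the winter of our discontent made glorious summer by this sun of York"
print(lexical_richness(sample))        # 93.333 (14 distinct words out of 15)
print(richness_from_counts(128, 197))  # 64.975
```

Because every repeated word lowers the ratio, longer samples tend to score lower, which is why the scenes are grouped by length before being compared.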
In the case of the Marlowian scenes, two of them present a length that ranges from 120 to 160 words (Scenes III.i and IV.i from Edward II), and three of them contain between 250 and 290 words (Scenes III.i, III.ii and III.v from The Jew of Malta). These have been selected to observe if the results derived from the analysis of each subgroup present intra-author consistency, which would allow for a subsequent comparison with the scenes of the other candidate.

Like the Shakespearean scenes of both subgroups, the Marlowian scenes whose number of words ranges from 120 to 160 present disparate results, since the lexical richness of Scene III.i from Edward II is 61.589% and the result achieved by Scene IV.i from the same play is 77.236%. The scenes from the other Marlowian subgroup present more consistency among them, since Scenes III.i and III.v from The Jew of Malta present a similar lexical richness (64.032% and 63.241%, respectively). Nevertheless, the third sample of this second subgroup, that is, Scene III.ii from The Jew of Malta, presents a lexical richness of 57.986%, which differs considerably from the results achieved by the other two scenes of the same subgroup.

In sum, the results derived from the analysis of undisputed scenes that contain between 100 and 450 words are erratic and present such intra-author variation that it is not necessary to compare the results of both playwrights to determine that this discriminator should not be applied in the analysis of the scenes of Arden of Faversham of a similar length. Even though the scenes have been carefully selected to create two subgroups per author where they have an almost identical number of words, it seems evident that the calculation of this parameter can only reach consistent results if the size of the texts increases dramatically.

5.2.2. Lexical richness of scenes of between 500 and 950 words

The second stage of the pre-study consists in the calculation of the lexical richness of five scenes of between 500 and 950 words from each of the two reference corpora. Even though the samples of this stage of the pre-study are larger than those of the previous one and hence differences in their number of words should not have such a large impact on the results, the decision to select scenes of a similar length to create two subgroups of scenes per author has been made again, as can be observed in the following table.

Table 7 | Stage 2 of the pre-study on the lexical richness

| Shakespearean scenes | Lexical richness (%) | Marlowian scenes | Lexical richness (%) |
| --- | --- | --- | --- |
| Richard III, Scene II.iv (591 words) | 48.9 | Edward II, Scene II.iv (529 words) | 48.582 |
| Richard III, Scene III.iv (860 words) | 44.651 | Edward II, Scene II.v (849 words) | 38.634 |
| Richard II, Scene I.ii (579 words) | 54.059 | The Jew of Malta, Scene III.iii (521 words) | 49.712 |
| Richard II, Scene III.iv (856 words) | 47.196 | The Jew of Malta, Scene III.iv (847 words) | 43.92 |
| Richard II, Scene V.i (836 words) | 47.608 | The Jew of Malta, Scene IV.v (532 words) | 43.609 |

Table 7 shows that the Shakespearean samples could be divided into those that contain between 550 and 600 words, that is, Scene II.iv from Richard III and Scene I.ii from Richard II, and those whose number of words ranges from 830 to 860, which are Scene III.iv from Richard III and Scenes III.iv and V.i from Richard II. The results of the scenes of the first subgroup lack consistency, given that their lexical richness is 48.9% and 54.059% and thus there are more than five points between them. The scenes of the second subgroup are more consistent, since Scenes III.iv and V.i from Richard II have an almost identical lexical richness (47.196% and 47.608%, respectively), although their results differ slightly from that of Scene III.iv from Richard III (44.651%).
The Marlowian scenes can be divided into a first subgroup whose length ranges from 500 to 550 words and which contains Scene II.iv from Edward II, as well as Scenes III.iii and IV.v from The Jew of Malta; and a second subgroup formed by Scene II.v from Edward II and Scene III.iv from The Jew of Malta, which have between 800 and 850 words. The length of the scenes of these two subgroups is similar to that of the Shakespearean subgroups in order to facilitate a subsequent comparison between both playwrights, if necessary.

There is a difference of more than six points if the lexical richness of Scene III.iii from The Jew of Malta (49.712%) is compared to that of Scene IV.v from the same play (43.609%) and hence it seems that the results of the first subgroup lack intra-author consistency, despite the fact that the lexical richness of Scene II.iv from Edward II (48.582%) is relatively close to that of Scene III.iii from The Jew of Malta. Similarly, there is a difference of more than five points if the lexical richness of the two scenes of the second subgroup, that is, Scene II.v from Edward II and Scene III.iv from The Jew of Malta, is compared (38.634% and 43.92%, respectively).

This lack of intra-author consistency makes an exhaustive comparison between the results obtained by the scenes of both playwrights in each of the two subgroups unnecessary. In any case, it is worth mentioning that the lexical richness of the scenes of Shakespeare and Marlowe overlaps. For instance, the results of Marlowe’s Scene II.iv from Edward II (48.582%) and Scene III.iii from The Jew of Malta (49.712%) are highly similar to that of Scene II.iv from Shakespeare’s Richard III (48.9%), whose length closely resembles that of the abovementioned Marlowian scenes.
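The overlap described above can be checked by comparing the ranges of values of the two authors within a subgroup of similar length. The sketch below is only illustrative: the helper name is hypothetical, the range-intersection test is a simplification chosen for this example rather than a procedure taken from ALTXA, and the values are those reported in Table 7 for the shorter subgroup of each author.

```python
def range_overlap(a, b):
    """Return the (low, high) interval shared by the ranges of a and b, or None."""
    low = max(min(a), min(b))
    high = min(max(a), max(b))
    return (low, high) if low <= high else None


# Lexical richness (%) from Table 7, shorter subgroup of each author
shakespeare_short = [48.9, 54.059]        # scenes of 579 and 591 words
marlowe_short = [48.582, 49.712, 43.609]  # scenes of 521 to 532 words

print(range_overlap(shakespeare_short, marlowe_short))  # (48.9, 49.712)
```

Any disputed scene of a comparable length whose lexical richness fell inside the returned interval could not be associated with either candidate on the basis of this discriminator alone.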
In sum, the undisputed scenes of between 500 and 950 words that have been analysed in this stage of the pre-study present a lack of intra-author consistency and inter-author variation that does not allow for the inclusion of this parameter in the analysis of the scenes of Arden of Faversham of this length.

5.2.3. Lexical richness of scenes of between 1,100 and 1,700 words

The lexical richness of five Shakespearean and five Marlowian scenes of between 1,100 and 1,700 words has been calculated by ALTXA. As in the previous stages of the pre-study, the scenes of both authors have been carefully selected to create two subgroups where they have a similar number of words. The results derived from this analysis can be observed in Table 8.

Table 8 | Stage 3 of the pre-study on the lexical richness

| Shakespearean scenes | Lexical richness (%) | Marlowian scenes | Lexical richness (%) |
| --- | --- | --- | --- |
| Richard III, Scene II.i (1,117 words) | 38.675 | Edward II, Scene I.i (1,588 words) | 38.854 |
| Richard III, Scene III.i (1,580 words) | 33.797 | Edward II, Scene III.ii (1,401 words) | 38.687 |
| Richard II, Scene I.i (1,605 words) | 40.561 | Edward II, Scene V.i (1,226 words) | 41.028 |
| Richard II, Scene II.ii (1,150 words) | 43.304 | The Jew of Malta, Scene I.i (1,425 words) | 40.0 |
| Richard II, Scene V.iii (1,163 words) | 42.304 | The Jew of Malta, Scene IV.iv (1,135 words) | 38.767 |

Table 8 shows that the Shakespearean scenes could be divided into those whose length ranges from 1,115 to 1,165 words, that is, Scene II.i from Richard III and Scenes II.ii and V.iii from Richard II; and those whose number of words is between 1,580 and 1,605, that is, Scene III.i from Richard III and Scene I.i from Richard II. The lexical richness of Scene II.i from Richard III, which belongs to the first subgroup, is 38.675%, which differs from the percentages achieved by the two other scenes of the subgroup, that is, Scenes II.ii and V.iii from Richard II, which are close to each other (43.304% and 42.304%, respectively).
The lexical richness of the two Shakespearean scenes of the second subgroup seems even more inconsistent, given that Scene III.i from Richard III has scored a result of 33.797% and this creates a dramatic difference of almost seven points if it is compared to that of Scene I.i from Richard II (40.561%).

The Marlowian scenes could be divided into those whose number of words is between 1,130 and 1,230, that is, Scene IV.iv from The Jew of Malta and Scene V.i from Edward II, and those whose length ranges from 1,400 to 1,590 words, that is, Scenes I.i and III.ii from Edward II and Scene I.i from The Jew of Malta. As can be observed in Table 8, these scenes present more consistency than those of the other candidate and, as a matter of fact, the values of the scenes of both subgroups remain quite uniform. The maximum difference among the five scenes can be found if the lexical richness of Scene III.ii from Edward II (38.687%) is compared to that of Scene V.i from the same play (41.028%), which implies a small distance of less than two and a half points. For the first time in this pre-study, there is barely any intra-author variation in the lexical richness of the Marlowian scenes of both subgroups.

However, the results of the Marlowian scenes overlap with those of the other candidate. For instance, the lexical richness of the scenes of the first subgroup of the Bard is between 38.675% and 43.304%, and that of the Marlowian scenes of the first subgroup, whose length is similar to that of the Shakespearean scenes that have just been mentioned, is between 38.767% and 41.028%. The same problem occurs if the scenes of the second subgroup of each candidate are compared: the lexical richness of Shakespeare’s Scene I.i from Richard II (40.561%), for instance, is almost identical to that of Marlowe’s Scene I.i from The Jew of Malta (40%).
This means that if the lexical richness of a scene from Arden of Faversham is calculated and the result is among these values, it would be impossible to associate its authorship with one of the candidates.

The Shakespearean scenes of the two subgroups have scored disparate values, whereas the Marlowian scenes present a high degree of homogeneity, even if they belong to distinct subgroups. In addition to the lack of consistency of the Shakespearean scenes, which does not allow for the inclusion of this discriminator in the case study, the results of both playwrights overlap, and hence it will not be used to study the authorship of the scenes of Arden of Faversham of between 1,100 and 1,700 words.

5.2.4. Lexical richness of scenes of almost 2,000 words or more

The last stage of the pre-study consists in the analysis of undisputed scenes whose length is almost 2,000 words or more. Since there are not five scenes in the Marlowian reference corpus of such length, only four will be included in the analysis. Due to this scarcity of Marlowian scenes of the fourth group, it is not possible to make a careful selection of them to create two subgroups of a specific range of words, as has been done with the Shakespearean samples.

The main difficulty behind the conduction of this analysis is that, unlike in the three previous stages of the pre-study, the length of the Shakespearean scenes is quite disparate from that of the Marlowian scenes. It is therefore necessary to estimate what the lexical richness of these scenes would be if their size were more balanced in order to compare both playwrights. Only such estimations will be considered to assess the reliability of this test when samples of dissimilar length are compared, instead of directly analysing the results derived from the calculation of their lexical richness. These results, which have been provided by ALTXA, can be observed in the table presented below.
Table 9 | Stage 4 of the pre-study on the lexical richness

Shakespearean scenes                      Lexical richness (%)    Marlowian scenes                                Lexical richness (%)
Richard III, Scene I.iii (2,845 words)    33.111%                 Edward II, Scene I.iv (3,330 words)             28.919%
Richard III, Scene V.iii (2,726 words)    33.786%                 Edward II, Scene II.ii (1,995 words)            37.644%
Richard II, Scene I.iii (2,402 words)     36.178%                 The Jew of Malta, Scene I.ii (2,929 words)      30.522%
Richard II, Scene II.i (2,372 words)      37.69%                  The Jew of Malta, Scene II.iii (3,034 words)    28.378%
Richard II, Scene IV.i (2,628 words)      32.078%

Table 9 shows that there are great differences between the length of the Shakespearean scenes and that of the scenes extracted from the Marlowian corpus, which justifies the distinct approach that will be adopted to analyse the results. The Shakespearean samples could be divided into a first subgroup formed by scenes that contain between 2,370 and 2,405 words, that is, Scenes I.iii and II.i from Richard II, and a second subgroup of scenes with a length of between 2,625 and 2,845 words, which includes Scenes I.iii and V.iii from Richard III, as well as Scene IV.i from Richard II.

In the case of the first subgroup, Scene I.iii from Richard II presents a lexical richness of 36.178%, whereas that of Scene II.i from the same play is 37.69%. The homogeneity of the values of the first subgroup can also be found in the second subgroup, since there is a distance of less than two points among the lexical richness of Scene I.iii from Richard III (33.111%), Scene V.iii from Richard III (33.786%) and Scene IV.i from Richard II (32.078%). Hence, the results derived from the calculation of the lexical richness of Shakespearean scenes whose length exceeds 2,000 words could be seen as highly consistent in the two subgroups.

The length of the Marlowian scenes is heterogeneous, since Scene II.ii from Edward II contains 1,995 words and therefore is considerably shorter than the other three.
The number of words of Scene I.ii from The Jew of Malta (2,929) is similar to that of Scene II.iii from the same play (3,034), which allows for a direct comparison between them to discern if there is sufficient intra-author consistency. Finally, Scene I.iv from Edward II contains 3,330 words, which makes it a slightly larger sample than the two previous ones. As the size of the samples increases, the chances of repeating words are higher and hence their lexical richness tends to be lower. Therefore, it makes sense that the Marlowian scene with the highest lexical richness is Scene II.ii from Edward II (37.644%). The two scenes with the most similar length, that is, Scenes I.ii and II.iii from The Jew of Malta, present a slight difference of more than two points if their lexical richness is compared (30.522% and 28.378%, respectively). Lastly, it is surprising that the result of Scene I.iv from Edward II, whose length is moderately superior to that of the two abovementioned scenes, is of 28.919%, which is in an intermediate position between the lexical richness of these two scenes. It could be said that the results of the Marlowian scenes are relatively consistent, but not as much as those of the other candidate. As pointed out at the beginning of the section, the disparity in the length of the scenes of both playwrights makes it necessary to establish an estimation of what the lexical richness of their scenes would be if their size was balanced to compare them. It seems that the Shakespearean scenes whose number of words ranges from 2,625 to 2,845 and have a lexical richness of between 32% and 33.8% would probably present a similar value to the Marlowian scenes that have between 2,920 and 3,330 words and a lexical richness of between 28.3% and 30.6% if their size increased to the point of being balanced with those of Marlowe. 
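The dependence of lexical richness on sample length described above can be illustrated with a minimal sketch. It assumes that lexical richness is a type-token ratio (distinct words divided by total words, expressed as a percentage), which is consistent with the reasoning about repeated words but is not confirmed here as the exact formula used by ALTXA. The function names and the toy sentence are hypothetical, and truncating a sample to a common length is only one possible way of balancing sizes, not the estimation procedure followed in this pre-study.

```python
import re


def tokenize(text):
    """Lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def lexical_richness(tokens):
    """Type-token ratio as a percentage (assumed definition)."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


def richness_at_length(tokens, n):
    """Richness of the first n tokens, to compare samples at equal size."""
    return lexical_richness(tokens[:n])


# Toy text, not taken from any of the plays in the reference corpora.
sample = "the king is dead and the king is gone and the crown is lost"
tokens = tokenize(sample)

short = richness_at_length(tokens, 4)  # first four words only
full = lexical_richness(tokens)        # whole sample

print(f"first 4 words: {short:.2f}%  whole sample: {full:.2f}%")
```

In this toy case the value drops as the sample grows, because longer samples repeat more words, which is the effect that makes scenes of dissimilar length difficult to compare directly.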
In other words, it seems evident that the lexical richness of both playwrights would overlap on many occasions if the size of these scenes were similar, so this method should not be considered highly effective to distinguish between both playwrights in this linguistic context.

In sum, even though the disparity of the length of the samples does not allow for such a precise evaluation of the results as in the three previous stages of the pre-study, it seems that the lexical richness of the scenes of both playwrights presents more intra-author consistency than in any other stage, which could mean that the stability of this parameter increases with the size of the samples. Nevertheless, the probability with which the results of the two candidates may overlap can be seen as a threat to the precision of the analysis of a disputed scene, so this test will not be applied to determine the likeliest authorship of the scenes of Arden of Faversham of this length.

5.2.5. Conclusions derived from Pre-study 2

This pre-study has analysed the effectiveness of the calculation of the lexical richness to distinguish between scenes taken from plays that are not comedies and were written approximately between 1590 and 1595 by Shakespeare and Marlowe. The first stages of the pre-study have shown many inconsistencies in the values of the samples written by the same author, even though they have been carefully selected to have a similar number of words and optimize their results. As the size of the samples has increased, the lexical richness of those written by the same author has presented more consistency, but not enough inter-author variation. Therefore, this parameter has achieved a higher degree of intra-author consistency than the calculation of the average number of words per sentence, but the frequency with which the results of both playwrights tend to overlap does not guarantee the presence of clear results if disputed scenes are put into analysis.
In conclusion, the quantification of the lexical richness will not be used for the attribution of authorship of Arden of Faversham, given that it has not been proved to be effective to distinguish between Shakespeare and Marlowe in any of the four types of scenes. Nevertheless, I would like to stress that this does not mean that the quantification of the lexical richness is ineffective in authorship attribution studies in general, but that it is not sufficiently reliable in this specific linguistic context.

5.3. Pre-study on n-gram tracing (Pre-study 3)

The objective of the third pre-study is to assess the effectiveness of n-gram tracing to distinguish between Shakespearean and Marlowian scenes. To this end, the authorship of ten undisputed scenes (five from each of the two reference corpora) whose number of words is between 100 and 450 will be analysed independently first. Afterwards, the same procedure will be followed with ten scenes that contain between 500 and 950 words and with ten scenes that contain between 1,100 and 1,700 words. Due to the scarcity of Marlowian scenes of more than 2,000 words, the last stage of the pre-study will analyse five Shakespearean and four Marlowian scenes of almost 2,000 words or more.

The reference corpora of both playwrights have been edited to have a similar number of words, so that both are in equal conditions to share n-grams with each sample whose authorship is analysed. The n-grams shared by a scene and the reference corpus from which it has been extracted that include the names of characters and locations exclusive to the play where that scene belongs will be manually discarded from the list provided by ALTXA. The outcome can thus reflect more faithfully the situation that would be faced if the authorship of a scene from Arden of Faversham was studied with this method.
If a scene shares at least 10 n-grams of a certain type with one of the reference corpora, these will be analysed quantitatively, whereas the others will be analysed from a qualitative perspective that can complement those results. Depending on the clarity with which a scene can be associated with one of the reference corpora, its attribution will be presented as “highly probable” or “slightly probable”, whereas the expression “it seems uncertain if this scene was written by Shakespeare/Marlowe” will be used if the results are inconclusive (see Section 4.5.4 for a thorough explanation of the reasons underlying the abovementioned decisions).

5.3.1. N-gram tracing with scenes of between 100 and 450 words

With the purpose of testing the effectiveness of n-gram tracing in determining the likeliest authorship of Shakespearean and Marlowian scenes that contain between 100 and 450 words, the authorship of five random scenes of each author will be analysed independently. The ten analyses will be conducted by removing each scene from the reference corpus where it belongs, identifying with ALTXA the n-grams that it shares with the two reference corpora and applying the methodological principles that have been pointed out at the beginning of this section, which were thoroughly explained in Section 4.5.4.

Scene II.iii from Shakespeare’s Richard III (398 words)

The first scene that has been randomly selected for this stage of the pre-study is Scene II.iii from Richard III. It has been removed from the Shakespearean reference corpus and ALTXA has identified the n-grams that it has in common with that corpus and with that of the other candidate of the study. The results of the quantitative analysis of the 3-grams and 2-grams that the scene shares with the Shakespearean and the Marlowian corpora will be presented in Table 10 and later commented. Afterwards, the common 4-grams will be mentioned and qualitatively analysed.
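The procedure outlined at the beginning of this section can be sketched in a few lines of code. This is only an illustrative approximation and not the implementation of ALTXA: the function names, the tokenisation rule and the toy sentences are hypothetical, and the manual removal of n-grams that contain play-specific names is imitated here with a simple list of excluded words. The reference corpus passed to the function is assumed to have already had the analysed scene removed from it.

```python
import re


def tokenize(text):
    """Lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples of words) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(scene, corpus, n, excluded_words=frozenset()):
    """N-grams found in both texts, minus those containing excluded words.

    `excluded_words` stands in for the manual discarding of names of
    characters and locations that are exclusive to one play.
    """
    common = ngrams(tokenize(scene), n) & ngrams(tokenize(corpus), n)
    return {g for g in common if not excluded_words.intersection(g)}


def analysis_mode(count_a, count_b, threshold=10):
    """Quantitative if at least `threshold` n-grams are shared with one corpus."""
    return "quantitative" if max(count_a, count_b) >= threshold else "qualitative"


# Toy texts, not taken from the reference corpora.
scene = "the king is dead my lord of york"
corpus_a = "the king is dead and my lord of york is fled"
corpus_b = "my lord the king is here"

shared_a = shared_ngrams(scene, corpus_a, 2, excluded_words={"york"})
shared_b = shared_ngrams(scene, corpus_b, 2, excluded_words={"york"})
print(len(shared_a), len(shared_b), analysis_mode(len(shared_a), len(shared_b)))
```

Counting distinct n-grams rather than every occurrence is a design choice of this sketch; whether ALTXA counts types or occurrences is not stated in this section.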
Table 10 | N-gram tracing with Scene II.iii from Richard III

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            17                                              13
2-grams            148                                             124

Table 10 shows the number of 3-grams and 2-grams that Scene II.iii from Richard III shares with the Shakespearean corpus from which it has been extracted and with the Marlowian corpus. The scene has 17 common 3-grams with the Shakespearean corpus after the elimination of those that include the names of characters and locations exclusive to the play where it belongs from the list provided by ALTXA, whereas it has 13 3-grams in common, that is, four fewer, with the Marlowian corpus. The difference of the 2-grams in common is more significant than that of the 3-grams, since the scene has 148 with the Shakespearean corpus and 124, that is, twenty-four fewer, with the Marlowian corpus.

As a result of the low number of words that it contains, Scene II.iii from Richard III shares no 4-grams with the Marlowian corpus, while the Shakespearean corpus only presents 2 4-grams in common with it after the removal of the circumstantial ones that include the names of characters and locations exclusive to the play where the scene belongs from the list provided by ALTXA. These 4-grams are “I fear I fear” and “the king is dead”. The expression “the king is dead” includes two lexical words that are common in texts of this nature, so it should not be seen as a solid idiolectal marker, whereas the 4-gram “I fear I fear” could be considered more distinctive, given that the repetition of the expression “I fear” seems like a conscious decision of the author who elaborated the dialogue.

According to the present study, it seems highly probable that Scene II.iii from Richard III was written by Shakespeare due to the clarity of the quantitative analysis of the common 3-grams and 2-grams, which has been slightly reinforced by the qualitative analysis of the 4-grams in common.
In other words, this study has determined the authorship of the sample correctly.

Scene III.iii from Shakespeare’s Richard III (197 words)

The second scene involved in this stage of the pre-study is Scene III.iii from Richard III, which will be analysed under the same principles that have been delineated earlier. The results of the quantitative analysis of the 2-grams that it shares with the Shakespearean and the Marlowian corpora will be presented in Table 11. Since the scene does not have at least 10 3-grams in common with either of the two reference corpora, these will be analysed from a qualitative perspective.

Table 11 | N-gram tracing with Scene III.iii from Richard III

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
2-grams            63                                              53

As can be observed in the table presented above, Scene III.iii from Richard III presents 63 2-grams in common with the Shakespearean corpus and 53, that is, ten fewer, with the Marlowian corpus, which is a considerable difference for such a short text.

Scene III.iii from Richard III also has 6 3-grams in common with the Shakespearean corpus from which it has been extracted. These 3-grams are “and for my”, “from all the”, “for them as”, “and I for”, “to death and” and “let me tell”, which do not seem to be particularly distinctive. On the other hand, the scene has 4 3-grams in common with the Marlowian corpus, that is, two fewer than with the corpus of the Bard. These 3-grams are “we meet again”, “and for my”, “we give to” and “and her princely”. The latter 3-gram seems to be the most distinctive of the group due to the presence of the relatively uncommon adjective “princely”.

No common n-grams of more than three words can be found between the scene and the reference corpora. This is not surprising, given that it only has 197 words and hence it has few chances of having larger constructions in common with them.
In brief, the quantitative analysis of the common 2-grams reveals that Shakespeare is the likeliest author of Scene III.iii from Richard III by a margin of ten points, which is notable for such a short sample. The analysis of the 3-grams in common shows that the scene also shares more with the Shakespearean corpus than with that of Marlowe, even though one of the 3-grams that it shares with the Marlowian corpus could be seen as the most distinctive. In any case, if the results of the study are observed from a holistic perspective, it seems highly probable that this scene was written by Shakespeare, so the study has achieved its goal successfully.

Scene V.ii from Shakespeare’s Richard III (189 words)

The results derived from the quantitative analysis of the 3-grams and 2-grams that Scene V.ii from Richard III shares with the reference corpora of Shakespeare and Marlowe can be observed in Table 12. After such results are commented, the number of 5-grams and 4-grams in common will be revealed and these constructions will be analysed from a qualitative perspective.

Table 12 | N-gram tracing with Scene V.ii from Richard III

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            12                                              5
2-grams            70                                              57

Scene V.ii from Richard III presents 12 common 3-grams with the corpus of the Bard, which is more than twice the number of 3-grams that it shares with the Marlowian corpus (5). Furthermore, there are 70 2-grams in common between the Shakespearean corpus and the scene, which shares 57 with the Marlowian corpus, that is, thirteen fewer than with the corpus of the Bard.

There is a 5-gram shared between Scene V.ii from Richard III and the Shakespearean corpus, which is “to reap the harvest of”. This 5-gram, which can be divided into 2 4-grams (“to reap the harvest” and “reap the harvest of”), seems to be distinctive, given that it contains two lexical words (“reap” and “harvest”) that are relatively uncommon.
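The division of a shared n-gram into the shorter n-grams that it contains can be made explicit with a short sketch. The function name is hypothetical and the snippet is only an illustration of the counting involved: a construction of m words contains m − k + 1 contiguous sequences of k words, so a 5-gram contains 2 4-grams.

```python
def sub_ngrams(ngram, k):
    """All contiguous k-word sequences contained in a longer n-gram."""
    words = ngram.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


five_gram = "to reap the harvest of"
four_grams = sub_ngrams(five_gram, 4)
print(four_grams)
```

This is the reason why the 4-grams derived from a longer shared construction are listed separately from the remaining common 4-grams in the analyses of this pre-study.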
The Marlowian corpus presents a 4-gram in common with the scene, which is “the bowels of the”. This 4-gram is not as distinctive as the 5-gram that has been previously commented, given that it only contains one lexical word.

According to this study, it seems highly probable that Scene V.ii from Richard III was written by Shakespeare, given the clarity of the quantitative analysis of the common 3-grams and 2-grams as well as the presence of a distinctive 5-gram in common between the scene and his reference corpus, which is unlikely to be found in the analysis of such a short sample. In other words, the study has linked the authorship of the text to its author successfully.

Scene II.iv from Shakespeare’s Richard II (192 words)

The fourth Shakespearean scene involved in this stage of the pre-study belongs to Richard II, which is the second play used for the compilation of his reference corpus. The results of the quantitative analysis of the 3-grams and 2-grams that Scene II.iv from Richard II shares with the two reference corpora will be presented in Table 13 and later discussed. This will be followed by the qualitative analysis of the common 4-grams.

Table 13 | N-gram tracing with Scene II.iv from Richard II

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            13                                              6
2-grams            67                                              59

Table 13 shows that Scene II.iv from Richard II has more 3-grams and 2-grams in common with the Shakespearean corpus from which it has been extracted than with the corpus of Marlowe. While the scene has 13 common 3-grams with the corpus of the Bard, it shares 6 with the Marlowian corpus, that is, less than half. In addition, it shares 67 2-grams with the Shakespearean corpus and eight fewer (59) with the reference corpus of the other candidate.

Scene II.iv from Richard II has no larger n-grams in common with the Marlowian corpus, while it shares 3 4-grams with the reference corpus of Shakespeare.
These are “friends are fled to”, “on the earth and” and “the king is dead”. The first of these 4-grams seems to be an uncommon expression.

According to this study, it seems highly probable that Scene II.iv from Richard II was written by Shakespeare. This is due to the clarity of the results of the quantitative analysis of the common 3-grams and 2-grams, which has been complemented by the presence of 3 4-grams in common between the scene and his reference corpus, one of which seems to be moderately distinctive. Therefore, the study has achieved its goal successfully.

Scene III.i from Shakespeare’s Richard II (342 words)

The last Shakespearean scene of between 100 and 450 words included in this stage of the pre-study is Scene III.i from Richard II. The results of the quantitative analysis of the common 3-grams and 2-grams between the scene and the two reference corpora will be offered first in the form of a table. Afterwards, such results will be discussed and complemented by the qualitative analysis of the common 4-grams.

Table 14 | N-gram tracing with Scene III.i from Richard II

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            21                                              19
2-grams            130                                             106

Table 14 shows that there is a small difference between the 3-grams that this scene shares with both reference corpora, given that it has 21 in common with the Shakespearean corpus and 19 with that of Marlowe. Scene III.i from Richard II also shares 130 2-grams with the Shakespearean corpus and 106 with that of the other candidate, which is a considerable difference of twenty-four points.

While the scene has no larger n-grams in common with the Marlowian corpus, it shares 3 4-grams with the corpus of the Bard, which are “and the hand of”, “I am a gentleman” and “to the king in”. None of these three constructions seems to be distinctive of an author’s idiolect, so they should not be seen as solid markers.
According to the study, it seems highly probable that Scene III.i from Richard II was written by Shakespeare, since it shares more 3-grams with his corpus than with the corpus of Marlowe by a low margin, as well as more 2-grams by a notable difference. Furthermore, while the scene has no common 4-grams with the Marlowian corpus, it shares a few with the corpus of the Bard, even though they are not particularly distinctive.

In sum, the five Shakespearean scenes of between 100 and 450 words that have been analysed as disputed texts with n-gram tracing have been successfully attributed to their author. The next five scenes will be extracted from the Marlowian corpus.

Scene II.iii from Marlowe’s Edward II (218 words)

The results of the quantitative analysis of the common 3-grams and 2-grams between Scene II.iii from Edward II and the reference corpora of the two candidates of the study will be presented in Table 15 and later discussed. Afterwards, the qualitative analysis of the common 5-grams and 4-grams will be provided.

Table 15 | N-gram tracing with Scene II.iii from Edward II

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            16                                              19
2-grams            86                                              96

Table 15 shows that the scene presents more 3-grams in common with the Marlowian corpus than with that of Shakespeare by a low margin (19 vs. 16). Furthermore, it presents 96 common 2-grams with the corpus of Marlowe, which are ten more than those that it shares with the Shakespearean corpus (86).

The scene presents a common 5-gram with the Shakespearean corpus, which is “hardy as to touch the”. This construction, which can be divided into the 4-grams “hardy as to touch” and “as to touch the”, seems to be unusual, so it could be seen as an idiolectal marker.
On the other hand, the scene shares the 4-gram “the earl of Lancaster” with the corpus of Marlowe, which does not seem to be distinctive, since it also shares the 2-gram “of Lancaster” with the corpus of Shakespeare. Therefore, it could be said that the qualitative analysis of the common 5-grams and 4-grams associates the scene with Shakespeare, given that it shares a 5-gram with his reference corpus that is more distinctive than the 4-gram that it has in common with the corpus of Marlowe. This contrasts with the results of the quantitative analysis of the common 3-grams and 2-grams.

According to this study, it seems slightly probable that Scene II.iii from Edward II was written by Marlowe. The scene shares more 3-grams with the Marlowian corpus than with the corpus of Shakespeare by a low margin and more 2-grams by a difference of ten points, which is significant if the length of the scene is taken into consideration. Nevertheless, the qualitative analysis of the larger n-grams, which should complement these results, links the scene to Shakespeare. In any case, this study has successfully associated the authorship of the scene with its author.

Scene III.i from Marlowe’s Edward II (151 words)

The results derived from the quantitative analysis of the 3-grams and 2-grams that Scene III.i from Edward II shares with the reference corpora of the two playwrights that constitute the focus of the study will be presented in Table 16. This will be followed by a discussion of such results and the qualitative analysis of the common 6-grams, 5-grams and 4-grams.

Table 16 | N-gram tracing with Scene III.i from Edward II

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            11                                              20
2-grams            61                                              74

Table 16 shows that the number of 3-grams in common between the scene and the Marlowian corpus is almost twice the number of 3-grams that it shares with the corpus of the Bard (20 vs. 11).
There is also a difference of more than ten points if the number of 2-grams that Scene III.i from Edward II shares with the reference corpus of Marlowe (74) is compared to those that it has in common with the Shakespearean corpus (61). These differences seem especially significant given that the scene only contains 151 words.

In addition, the scene has a common 6-gram with the Marlowian corpus, which is “shall I not see the king”. This 6-gram, which can be divided into 2 5-grams and 3 4-grams, seems to be distinctive, given that it is the largest construction in common found in this stage of the pre-study. The text also presents 2 more 4-grams in common with the Marlowian corpus, apart from those derived from the division of the 6-gram mentioned above. These are “tell him that I” and “of all my bliss”, which do not seem to be solid idiolectal markers. There is also a common 4-gram between the scene and the corpus of the Bard, which is “the king of heaven”. This expression appears to be relatively distinctive, given its metaphorical meaning.

The results derived from the quantitative analysis reveal that it is highly probable that Scene III.i from Edward II was written by Marlowe, given that it shares more 3-grams and 2-grams with his reference corpus than with that of the Bard by a considerable margin. This has been reinforced by the presence of a common 6-gram between the scene and the corpus of Marlowe that stands as a robust idiolectal marker, as well as a few 5-grams and 4-grams, and thus this study has linked the authorship of the scene to its author effectively.

Scene IV.i from Marlowe’s Edward II (123 words)

The attribution of authorship of this scene appears to be of great difficulty, given that it is the one with the lowest number of words and, consequently, there are fewer chances of finding common constructions between the text and the two reference corpora.
The results of the quantitative analysis of the common 2-grams will be presented in Table 17 and later discussed. Afterwards, the qualitative analysis of the 4-grams and 3-grams in common will be provided.

Table 17 | N-gram tracing with Scene IV.i from Edward II

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
2-grams            33                                              35

Table 17 shows that the quantification of the common 2-grams offers a very similar result for both playwrights. While Scene IV.i from Edward II has 33 2-grams in common with the Shakespearean corpus, it shares 35, that is, only two more, with the corpus of Marlowe. This low difference should not be surprising if the length of the scene is taken into consideration.

The software ALTXA has found no common 4-grams between Scene IV.i from Edward II and the Shakespearean corpus and only one between it and the corpus of Marlowe, which is “but hath your grace”. This 4-gram does not seem to be a distinctive construction for a play of this kind.

The scene has 2 3-grams in common with the Shakespearean corpus and 7 with the corpus of Marlowe. The 3-grams that Scene IV.i from Edward II shares with the Shakespearean corpus are “you my lord” and “hath my lord”, which seem to be frequent constructions. The 7 3-grams that the scene shares with the Marlowian corpus are “you my lord”, which can also be found in the Shakespearean corpus, “hath your grace”, “but hath your”, “to pass in”, “me leave to”, “my country’s cause” and “for England’s good”. Among these constructions, “my country’s cause” and “for England’s good” appear to be the most distinctive of the group due to the use of the genitive. This stands as an idiolectal choice, given that the author could have written “the cause of my country” and “for the good of England”.
According to the present study, it seems slightly probable that Scene IV.i from Edward II was written by Marlowe, given that it shares more 2-grams with his corpus than with that of Shakespeare by a very low margin and the qualitative analysis of the common 4-grams and 3-grams links its authorship to him with certain clarity. Although the results are not as clear as on other occasions, the study has been successful.

Scene IV.iv from Marlowe’s Edward II (228 words)

Table 18 shows the common 3-grams and 2-grams between Scene IV.iv from Edward II and the reference corpora of the two candidates of the study. After these results are commented, the qualitative analysis of the only common 4-gram that has been found by ALTXA will be presented.

Table 18 | N-gram tracing with Scene IV.iv from Edward II

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            11                                              17
2-grams            84                                              94

As can be observed in the table presented above, the scene has six more 3-grams in common with the Marlowian corpus (17) than with the corpus of the Bard (11). In addition, there is a difference of ten points if the number of 2-grams that Scene IV.iv from Edward II shares with the corpus of Marlowe (94) is compared to those that it has in common with the corpus of the other candidate (84).

The scene does not share any larger n-grams with the Shakespearean corpus, whereas it has a common 4-gram with the corpus of Marlowe, which is “England’s wealth and treasury”. This can be seen as a distinctive construction, given that it includes three lexical words and the use of the genitive, which stands as an idiolectal choice, as pointed out in the analysis of the previous scene.

In sum, it seems highly probable that this scene was written by Marlowe if the clarity of the results of the quantitative analysis of the common 3-grams and 2-grams is taken into account.
Furthermore, these results have been complemented by the presence of a distinctive 4-gram in common between the scene and the reference corpus of Marlowe, so the study has achieved its goal successfully.

Scene III.i from Marlowe’s The Jew of Malta (253 words)

The last scene included in this stage of the pre-study is Scene III.i from The Jew of Malta. The results of the quantitative analysis of the 3-grams and 2-grams that the scene shares with the two reference corpora will be presented in Table 19. This will be discussed and complemented by the qualitative analysis of the common 5-grams and 4-grams.

Table 19 | N-gram tracing with Scene III.i from The Jew of Malta

Type of n-grams    Common n-grams with the Shakespearean corpus    Common n-grams with the Marlowian corpus
3-grams            23                                              31
2-grams            96                                              104

Scene III.i from The Jew of Malta presents eight more 3-grams in common with the Marlowian corpus than with the corpus of Shakespeare (31 vs. 23). Similarly, there is a difference of eight points between the 2-grams that the scene shares with the corpus of Marlowe (104) and those that it has in common with the corpus of the other candidate (96).

There is a 5-gram in common between the scene and the Marlowian corpus. This is “or it shall go hard”, which does not seem to be especially distinctive. On the other hand, the scene does not share any 5-grams with the corpus of Shakespeare.

The analysis of the common 4-grams conducted by ALTXA reveals that Scene III.i from The Jew of Malta shares 2 with the corpus of Shakespeare and 8 with that of Marlowe. The 2 4-grams that it shares with the Shakespearean corpus are “I know she is” and “and here he comes”, which seem to be common combinations of words.
The 8 4-grams that Scene III.i from The Jew of Malta has in common with the Marlowian corpus are, apart from those derived from the division of the 5-gram “or it shall go hard”, “she is a courtesan”, “in such sort as”, “that ever I beheld”, “and in the night”, “and here he comes” and “and yet I know”. Among these, the 4-gram “that ever I beheld” appears to be the most distinctive construction of the group because of the inversion of the words “I” and “ever”, which takes place in an affirmative sentence both in the extracted scene and the remaining corpus.

In the light of the findings provided by the study, it seems highly probable that Scene III.i from The Jew of Malta was written by Marlowe. The clarity of the results of the quantitative analysis of the common 3-grams and 2-grams has been reinforced by the presence of a 5-gram and 8 4-grams in common between the scene and the Marlowian corpus, one of which seems to be distinctive.

Conclusions derived from the first stage of the pre-study

The authorship of five Shakespearean and five Marlowian scenes that contain between 100 and 450 words has been analysed as if they were disputed texts with the purpose of testing the validity of n-gram tracing to determine the authorship of the scenes of Arden of Faversham of the same length. The authorship of the ten scenes has been correctly attributed, with a high degree of certainty on eight of the occasions. Since the success rate of this method has been 100%, it will be used in the analysis of the scenes of Arden of Faversham that have between 100 and 450 words. The next stage of the pre-study will examine scenes whose length ranges from 500 to 950 words.

5.3.2. N-gram tracing with scenes of between 500 and 950 words

The second stage of this pre-study will analyse the authorship of five Shakespearean and five Marlowian scenes whose length ranges from 500 to 950 words using n-gram tracing.
The scenes will be randomly selected and the analyses will be built upon the same methodological principles that have been followed in the previous stage of the pre-study.

Scene II.iv from Shakespeare’s Richard III (591 words)

The first scene that has been randomly selected for this second stage of the pre-study is Scene II.iv from Richard III. The results of the quantitative analysis of the 3-grams and 2-grams that it shares with the reference corpora of the two candidates that constitute the focus of the study can be observed in Table 20. Afterwards, the discussion of these results and the qualitative analysis of the common 5-grams and 4-grams will be provided.

Table 20 | N-gram tracing with Scene II.iv from Richard III

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 43 | 29 |
| 2-grams | 208 | 208 |

Table 20 shows that Scene II.iv from Richard III has fourteen more 3-grams in common with the Shakespearean corpus than with the Marlowian corpus (43 vs. 29), although the number of 2-grams shared between the scene and the two reference corpora is identical (208).

There is a common 5-gram between the text and the Shakespearean corpus, although this 5-gram is “I will go with you”, which is a common expression that should not be seen as a solid idiolectal marker. Apart from the two 4-grams into which the previous 5-gram can be divided, the scene also shares another five 4-grams with the Shakespearean corpus, which are “with all my heart”, “you have no cause”, “me my gracious lord”, “how doth the prince” and “to-morrow or next day”. None of these constructions seems to be distinctive of an author’s idiolect.

The scene shares two 4-grams with the Marlowian corpus. These are “with all my heart”, which can also be found on the list of common 4-grams between the scene and the Shakespearean corpus, and “the ruin of my”, which contains three function words and hence does not seem to be a solid marker either.
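The kind of count reported in tables such as Table 20 can be illustrated with a short Python sketch. It is only a minimal approximation and should not be read as the implementation of ALTXA: the tokenisation rule, the decision to count each distinct n-gram once, and the names `tokenize`, `ngram_types` and `shared_ngram_counts` are assumptions made for this example, and the sample sentences are invented rather than quoted from the reference corpora.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words.

    Assumption: apostrophes and hyphens inside a word are kept
    (e.g. "say'st", "to-morrow"); all other punctuation is dropped.
    """
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())


def ngram_types(tokens, n):
    """Return the set of distinct n-grams (as tuples) found in the tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngram_counts(scene, corpus, sizes=(2, 3, 4, 5)):
    """Count the distinct n-grams that a scene shares with a corpus."""
    scene_tokens = tokenize(scene)
    corpus_tokens = tokenize(corpus)
    return {
        n: len(ngram_types(scene_tokens, n) & ngram_types(corpus_tokens, n))
        for n in sizes
    }


# Invented toy texts, used only to show the mechanics of the count.
scene = "I will go with you, with all my heart."
corpus_a = "Then I will go with you to the king."
corpus_b = "With all my heart I thank you."

print(shared_ngram_counts(scene, corpus_a))  # {2: 4, 3: 3, 4: 2, 5: 1}
print(shared_ngram_counts(scene, corpus_b))  # {2: 3, 3: 2, 4: 1, 5: 0}
```

In a leave-one-out setting such as the one used in the pre-study, the scene would first be removed from the reference corpus of its own author before the comparison is run.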
In brief, the quantitative analysis of the common 3-grams associates the authorship of the scene with Shakespeare, although the scene shares the same number of 2-grams with the two reference corpora. The scene also shares more 5-grams and 4-grams with the corpus of the Bard, but these are not distinctive. According to this study, it seems slightly probable that Scene II.iv from Richard III was written by Shakespeare. In other words, the study has successfully achieved its goal, although not with a high degree of certainty.

Scene III.iv from Shakespeare’s Richard III (860 words)

The results of the quantitative analysis of the 4-grams, 3-grams and 2-grams that Scene III.iv from Richard III shares with the two reference corpora will be presented in Table 21. After the results provided in that table are discussed, a qualitative analysis of the common 5-grams will be offered.

Table 21 | N-gram tracing with Scene III.iv from Richard III

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 11 | 3 |
| 3-grams | 57 | 45 |
| 2-grams | 325 | 294 |

Table 21 shows that the scene has more 4-grams, 3-grams and 2-grams in common with the Shakespearean corpus than with that of Marlowe. Scene III.iv from Richard III shares eleven 4-grams with the corpus of the Bard, which is a large number for a text of this length, and only three 4-grams with the Marlowian corpus. It also has 57 3-grams in common with the Shakespearean corpus, which is twelve more than it shares with the corpus of Marlowe (45). Furthermore, there is a difference of more than thirty 2-grams between those that the scene shares with the corpus of the Bard (325) and those that it has in common with the Marlowian corpus (294).

In addition, the scene shares two 5-grams with the Shakespearean corpus, which are “will my lord with all” and “time here comes the duke”.
The latter 5-gram could be seen as an idiolectal marker, given that the combination of words “time here” does not appear in the corpus of the other candidate.

According to the study, it seems highly probable that Scene III.iv from Richard III was written by Shakespeare, given that the quantitative analysis of the common 4-grams, 3-grams and 2-grams associates the authorship of the scene with his reference corpus with great clarity. Moreover, ALTXA has identified two 5-grams in common between the scene and the corpus of the Bard, one of which appears to be relatively distinctive. In sum, the study has effectively accomplished its goal.

Scene IV.ii from Shakespeare’s Richard III (920 words)

The results of the quantitative analysis of the 3-grams and 2-grams that this scene shares with the reference corpora of the two candidates will be presented in Table 22 and then discussed. Afterwards, the qualitative analysis of the common 5-grams and 4-grams will be provided.

Table 22 | N-gram tracing with Scene IV.ii from Richard III

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 66 | 50 |
| 2-grams | 319 | 292 |

Table 22 shows that Scene IV.ii from Richard III shares 66 3-grams with the Shakespearean corpus, that is, sixteen more than with the Marlowian corpus, with which it shares 50. Furthermore, there is a difference of more than twenty-five points if the number of 2-grams shared between the scene and the Shakespearean corpus (319) is compared to those that it has in common with the reference corpus of Marlowe (292).

There is a 5-gram in common between Scene IV.ii from Richard III and the Shakespearean corpus, which is “may it please to you”. This 5-gram, which can be divided into two 4-grams, does not seem to stand as a particularly distinctive construction and, as a matter of fact, the 3-gram “may it please” can also be found in the corpus of Marlowe.
Apart from the two 4-grams derived from the division of the abovementioned 5-gram, the scene also shares with the corpus of Shakespeare “upon the stroke of”, “my lord I have”, “but I had rather”, “me my gracious lord” and “and will no doubt”. Among the seven 4-grams that it has in common with the Shakespearean corpus, “upon the stroke of” appears to be the most unusual construction.

Scene IV.ii from Richard III shares three 4-grams with the Marlowian corpus, which are “no more but so”, “a friend of mine” and “what say’st thou now”. None of these constructions seems to be a reliable authorship marker.

The clarity of the quantitative analysis of the common 3-grams and 2-grams, combined with that of the qualitative analysis of the 5-grams and 4-grams in common, suggests that it is highly probable that this scene was written by Shakespeare, and so the study has attributed the scene to its author correctly.

Scene III.iv from Shakespeare’s Richard II (856 words)

Table 23 presents the 3-grams and 2-grams that Scene III.iv from Richard II shares with the corpora of Shakespeare and Marlowe. These results will be discussed and complemented by the qualitative analysis of the common 5-grams and 4-grams.

Table 23 | N-gram tracing with Scene III.iv from Richard II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 40 | 45 |
| 2-grams | 259 | 237 |

As can be observed in the table presented above, while the scene has more 3-grams in common with the corpus of Marlowe by a low margin (45 vs. 40), it presents more common 2-grams with the Shakespearean corpus by a considerable difference of twenty-two points (259 vs. 237).

According to the lists provided by ALTXA, the text presents three 4-grams in common with the Shakespearean corpus, which are “had he done so”, “in the remembrance of” and “the king shall be”. None of them seems to be distinctive.
The scene also presents two 5-grams in common with the Marlowian corpus, which are “what was I born to” and “how can’st thou by this”. The 5-gram “what was I born to” appears to be distinctive, given that it is part of a rhetorical question. These two 5-grams can be divided into four 4-grams, which are the only ones that the scene shares with the reference corpus of Marlowe.

According to the present study, it seems uncertain whether Scene III.iv from Richard II was written by Shakespeare or Marlowe. While the scene presents more 2-grams in common with the Shakespearean corpus by a notable difference of more than twenty points, it shares more 3-grams with the Marlowian corpus by a narrow margin. It also shares two 5-grams with the corpus of Marlowe, one of which is distinctive. This means that, for the first time in the course of the pre-study, the authorship of a scene has not been correctly associated with its author, although it has not been misattributed either.

Scene V.i from Shakespeare’s Richard II (836 words)

The last Shakespearean scene of between 500 and 950 words that has been randomly selected for this stage of the pre-study is Scene V.i from Richard II. Table 24 shows the 3-grams and 2-grams that it shares with the two reference corpora. After these results are discussed, the common 5-grams and 4-grams will be revealed and analysed from a qualitative perspective.

Table 24 | N-gram tracing with Scene V.i from Richard II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 33 | 33 |
| 2-grams | 259 | 241 |

Although Scene V.i from Richard II shares the same number of 3-grams (33) with the two reference corpora, it has 259 2-grams in common with the Shakespearean corpus, which is almost twenty more than it shares with the Marlowian corpus (241).
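The arithmetic behind the quantitative comparisons in Tables 23 and 24 can be restated in a few lines of Python. The sketch below is an illustration only: the function name `quantitative_margins` and the labels "mixed" and "tie" are assumptions introduced for the example, and the thesis does not state numeric thresholds, so the code reports the direction of each margin without turning it into a verdict. The final assessment of each scene also depends on the qualitative analysis of the longer n-grams, which this sketch does not model.

```python
def quantitative_margins(counts):
    """Summarise one n-gram tracing table.

    counts maps an n-gram size to a pair
    (count shared with the Shakespearean corpus,
     count shared with the Marlowian corpus).
    Returns the signed margins (positive favours Shakespeare) and
    the overall direction across all n-gram sizes.
    """
    margins = {n: s - m for n, (s, m) in counts.items()}
    leaders = {
        "Shakespeare" if diff > 0 else "Marlowe"
        for diff in margins.values()
        if diff != 0
    }
    if not leaders:
        direction = "tie"
    elif len(leaders) == 1:
        direction = leaders.pop()
    else:
        direction = "mixed"
    return margins, direction


# Counts copied from Table 23 (Scene III.iv from Richard II)
# and Table 24 (Scene V.i from Richard II).
table_23 = {3: (40, 45), 2: (259, 237)}
table_24 = {3: (33, 33), 2: (259, 241)}

print(quantitative_margins(table_23))  # ({3: -5, 2: 22}, 'mixed')
print(quantitative_margins(table_24))  # ({3: 0, 2: 18}, 'Shakespeare')
```

A "mixed" result corresponds to the situation described for Scene III.iv from Richard II, where the 3-grams and the 2-grams point to different candidates.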
The scene shares the 5-gram “and yet not so for” with the Shakespearean corpus, which does not seem to be especially distinctive, given that it does not include any uncommon lexical words and the 3-gram “yet not so” can be found in the corpus of Marlowe. In addition, the scene shares four 4-grams with the Shakespearean corpus, which are, apart from the two 4-grams that derive from the division of the abovementioned 5-gram, “on my head and” and “with a heavy heart”. The latter 4-gram holds a metaphorical meaning, and it therefore stands as a robust idiolectal marker.

The scene also has the 4-gram “I am dead and” in common with the Marlowian corpus, which is relatively distinctive, given that it seems unusual to claim such a thing in the first person.

If these analyses are observed from a holistic perspective, it seems highly probable that Scene V.i from Richard II was written by Shakespeare. Even though the scene shares the same number of 3-grams with the two reference corpora, it shares more 2-grams with the Shakespearean corpus by a notable margin, and the qualitative analysis of the common 5-grams and 4-grams also links the scene to him with clarity, and so this study has accomplished its objective effectively.

In sum, the authorship of four of the five Shakespearean scenes that have been included in this stage of the pre-study has been correctly attributed, whereas the remaining one could not be clearly associated with either of the two candidates. The following scenes will be extracted from the corpus of Marlowe.

Scene II.i from Marlowe’s Edward II (649 words)

The first Marlowian scene of between 500 and 950 words that has been randomly selected for this stage of the pre-study is Scene II.i from Edward II. The number of 3-grams and 2-grams that this scene shares with the Marlowian corpus from which it has been extracted and with that of Shakespeare will be presented in Table 25.
Afterwards, a qualitative analysis of the common 5-grams and 4-grams between the scene and these reference corpora will be conducted.

Table 25 | N-gram tracing with Scene II.i from Edward II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 38 | 49 |
| 2-grams | 239 | 238 |

Scene II.i from Edward II shares eleven more 3-grams with the corpus of Marlowe (49) than with that of the Bard (38). Nevertheless, the scene presents one more 2-gram in common with the Shakespearean corpus (239 vs. 238), which contrasts with the previous comparison.

The scene has seven 4-grams in common with the Marlowian corpus, which are “my lord when I”, “my lord the king”, “the king and he”, “I humbly thank your”, “it shall be done”, “and now and then” and “not so much as”. Among these 4-grams, the only one that appears to be distinctive is “I humbly thank your” because of the way in which the verb “thank” is modified by the adverb “humbly”, which stands as an idiolectal choice that cannot be found in the corpus of the other candidate as a 2-gram.

On the other hand, the text shares three 4-grams with the Shakespearean corpus, which are “my lord the king”, “a friend of mine” and “he loves me well”. None of these 4-grams seems to be a solid authorship marker.

Scene II.i from Edward II shares only one more 2-gram with the Shakespearean corpus than with that of Marlowe, whereas it has eleven more 3-grams in common with the corpus of the latter and the qualitative analysis of the common 4-grams also associates its authorship with him. It therefore seems slightly probable that it was written by Marlowe. In other words, the study has been successful, although not with the same degree of certainty as on other occasions.

Scene III.iii from Marlowe’s Edward II (726 words)

The results of the quantitative analysis of the 3-grams and 2-grams in common between Scene III.iii from Edward II and the two reference corpora will be presented in Table 26 and then discussed.
Afterwards, the qualitative analysis of the common 5-grams and 4-grams will be provided.

Table 26 | N-gram tracing with Scene III.iii from Edward II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 36 | 28 |
| 2-grams | 211 | 236 |

Table 26 shows that the scene has eight more 3-grams in common with the Shakespearean corpus than with the corpus of Marlowe (36 vs. 28) and twenty-five more 2-grams in common with the Marlowian corpus than with the corpus of the Bard (236 vs. 211). In other words, the study of the common 3-grams associates the scene with the corpus of Shakespeare with moderate certainty and the study of the 2-grams in common does the same with the Marlowian corpus, and so it could be said that the results of this quantitative analysis are inconclusive.

The scene shares a 5-gram with each of the reference corpora. The 5-gram that it shares with the corpus of the Bard is “the worst is death and”, while the 5-gram that it has in common with the Marlowian corpus is “have you no doubt my”. The first 5-gram seems distinctive because it conveys a negative view of death and thus stands as a specific vision of the world, whereas the second one includes the combination of words “have you” in an imperative sentence, which is something that cannot be found in the corpus of the other candidate.

The 4-grams that the scene shares with the Shakespearean corpus and that are not derived from the division of the 5-gram discussed above are “sound drums and trumpets”, which includes three lexical words and appears to be distinctive, “it may not be”, which seems to be a common construction in plays of this kind, and “are up in arms”, which can also be found in the Marlowian corpus and thus should not be considered a solid authorship marker.
The 4-grams that Scene III.iii from Edward II has in common with the Marlowian corpus and that do not come from the division of the 5-gram discussed above are “’gainst law of arms”, which appears to be a relatively unusual combination of words, “I doubt it not”, which seems to be a frequent construction in texts of this nature, and “are up in arms”, which can also be found in the corpus of the other candidate.

According to this study, it seems uncertain whether Scene III.iii from Edward II was written by Shakespeare or Marlowe, given that neither the quantitative analysis of the common 3-grams and 2-grams nor the qualitative analysis of the 5-grams and 4-grams in common links the scene to one of the reference corpora with sufficient clarity. Therefore, while the study has not attributed the sample to the wrong author, it has not been able to provide substantial evidence to attribute its authorship to Marlowe.

Scene III.iii from Marlowe’s The Jew of Malta (521 words)

The number of 3-grams and 2-grams that Scene III.iii from The Jew of Malta shares with the two reference corpora will be presented in Table 27, together with a brief discussion of these results and the qualitative analysis of the common 5-grams and 4-grams.

Table 27 | N-gram tracing with Scene III.iii from The Jew of Malta

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 17 | 23 |
| 2-grams | 138 | 169 |

Scene III.iii from The Jew of Malta presents six more 3-grams in common with the Marlowian corpus (23) than with that of the other candidate (17). In addition, there is a dramatic difference of more than thirty points between the number of 2-grams that the scene shares with the Marlowian reference corpus (169) and those that it shares with the corpus of the Bard (138).

The scene also shares the 5-gram “nay you shall pardon me” with the Marlowian corpus, which can be divided into two 4-grams.
The use of “nay” instead of other linguistic items could be seen as an idiolectal choice of the author or as a dialectal or context-dependent form, as happens with “ay” and “yea” (see Section 3.4.4). On the other hand, the scene does not share any 5-grams with the corpus of Shakespeare.

Since Scene III.iii from The Jew of Malta shares more 3-grams and 2-grams with the Marlowian corpus and there is also a relatively distinctive 5-gram in common between the scene and his corpus, it seems highly probable that it was written by him, and so the study has been effective.

Scene IV.v from Marlowe’s The Jew of Malta (532 words)

Table 28 shows the 3-grams and 2-grams that Scene IV.v from The Jew of Malta shares with the Marlowian reference corpus from which it has been extracted and with that of Shakespeare. After these results are discussed, the qualitative analysis of the common 4-grams will be presented.

Table 28 | N-gram tracing with Scene IV.v from The Jew of Malta

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 25 | 42 |
| 2-grams | 175 | 216 |

Scene IV.v from The Jew of Malta has seventeen more 3-grams in common with the Marlowian corpus (42) than with the corpus of the Bard (25). Furthermore, the quantification of the common 2-grams reveals that the scene shares 216 with the reference corpus of Marlowe and 175, that is, forty-one fewer, with the corpus of Shakespeare, which stands as a notable difference.

The scene also has four 4-grams in common with the corpus of Marlowe. These are “and I shall die”, “in my power to”, “and when he comes” and “send me three hundred”, which do not seem to be especially distinctive. On the other hand, it shares the 4-gram “I cannot do it” with the Shakespearean corpus, which is a highly frequent expression and, as a matter of fact, the 2-gram “I cannot” can be found multiple times in both reference corpora.
According to the study, it seems highly probable that Scene IV.v from The Jew of Malta was written by Marlowe, given the clarity of the quantitative analysis of the common 3-grams and 2-grams and the fact that it also shares more 4-grams with the corpus of Marlowe, even though they are not distinctive. In other words, this study has successfully attributed the text to its author.

Scene V.i from Marlowe’s The Jew of Malta (762 words)

The study of Scene V.i from The Jew of Malta is the last one of this stage of the pre-study. The results of the quantitative analysis of the common 4-grams, 3-grams and 2-grams between the scene and the two reference corpora will be presented in Table 29, which will then be discussed. This will be complemented by the qualitative analysis of the common 5-grams.

Table 29 | N-gram tracing with Scene V.i from The Jew of Malta

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 3 | 10 |
| 3-grams | 35 | 65 |
| 2-grams | 248 | 289 |

As can be observed in Table 29, while the scene shares three 4-grams with the corpus of Shakespeare, it has seven more in common, that is, ten, with the Marlowian corpus, which is a large number for a text of this length. The scene also has 35 3-grams in common with the Shakespearean corpus, whereas it presents 65, that is, thirty more, with the reference corpus of Marlowe. Lastly, there is a remarkable difference of more than forty points if the number of 2-grams that the scene shares with the corpus of Shakespeare (248) is compared to those that it has in common with the Marlowian corpus (289).

Moreover, while the scene does not have any 5-grams in common with the corpus of Shakespeare, it shares three with the Marlowian corpus. These are “my lord and here they”, “once more away with him” and “and I know not that”, which do not seem to be distinctive constructions.
In the light of the findings provided by the study, it seems highly probable that Scene V.i from The Jew of Malta was written by Marlowe, given the clarity of the results of the quantitative analysis of the 4-grams, 3-grams and 2-grams in common and the fact that the scene also shares three 5-grams with his reference corpus, even though they are not particularly distinctive. Therefore, this study has achieved its goal effectively.

Conclusions derived from the second stage of the pre-study

Five Shakespearean and five Marlowian scenes of between 500 and 950 words have been analysed as if they were disputed texts to evaluate whether n-gram tracing can determine their authorship correctly. Eight of these scenes have been successfully attributed to their author, whereas the results derived from the study of the two remaining scenes have been inconclusive.

This method has proved to be 80% effective in this linguistic context and, in those cases in which the scenes have not been associated with their author, they have not been misattributed either, which is of vital importance. For these two reasons, n-gram tracing will be used to analyse the scenes of Arden of Faversham that contain between 500 and 950 words.

5.3.3. N-gram tracing with scenes of between 1,100 and 1,700 words

This stage of the pre-study, which will be built upon the same methodological foundations as the previous ones, will analyse the authorship of five Shakespearean and five Marlowian random scenes whose length ranges from 1,100 to 1,700 words.

Scene I.i from Shakespeare’s Richard III (1,243 words)

The 4-grams, 3-grams and 2-grams that Scene I.i from Richard III shares with the two reference corpora can be observed in Table 30. After these results are discussed, the larger n-grams in common will be revealed and analysed from a qualitative perspective.
Table 30 | N-gram tracing with Scene I.i from Richard III

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 10 | 3 |
| 3-grams | 82 | 66 |
| 2-grams | 412 | 397 |

Table 30 shows that Scene I.i from Richard III shares more 4-grams, 3-grams and 2-grams with the Shakespearean reference corpus from which it has been extracted than with the corpus of Marlowe. The scene has ten 4-grams in common with the Shakespearean corpus and only three, that is, seven fewer, with the Marlowian corpus. Furthermore, while it has 82 3-grams in common with the corpus of the Bard, it presents sixteen fewer in common (66) with that of the other candidate. Similarly, there is a difference of fifteen points between the 2-grams that the scene shares with the Shakespearean corpus (412) and with the corpus of Marlowe (397).

There is also an 8-gram in common between the scene and the Shakespearean corpus, which is “I do beseech your grace to pardon me”. This is the largest n-gram in common that has been found in the pre-study so far and it seems to be a robust authorship marker not only because of its length, but also because of the inclusion of the auxiliary “do” between the subject and the verb to emphasize the construction, which reflects a conscious linguistic choice of the author. The division of this 8-gram generates two 7-grams, three 6-grams and four 5-grams in common between the scene and the corpus of the Bard.

The clarity of the quantitative analysis of the common 4-grams, 3-grams and 2-grams, combined with the presence of a highly distinctive 8-gram shared between the scene and the Shakespearean corpus, stands as solid evidence that it is highly probable that Scene I.i from Richard III was written by him, and so the study has achieved its goal effectively.
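The division of a long shared n-gram into shorter ones, as described above for the 8-gram, follows a simple rule: a sequence of k words contains k − n + 1 contiguous n-grams. The Python sketch below illustrates this under the assumption that words are separated by single spaces. The function name `sub_ngrams` is hypothetical and is not part of ALTXA.

```python
def sub_ngrams(ngram, n):
    """Return the contiguous n-word sequences contained in a longer n-gram.

    Assumption: the n-gram is given as a string of words separated by spaces.
    """
    words = ngram.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


phrase = "I do beseech your grace to pardon me"  # the 8-gram discussed above

for n in (7, 6, 5):
    print(n, len(sub_ngrams(phrase, n)))
# 7 2
# 6 3
# 5 4

print(sub_ngrams("or it shall go hard", 4))
# ['or it shall go', 'it shall go hard']
```

This is why every shared 5-gram reported in the pre-study automatically accounts for two of the shared 4-grams, which the qualitative analysis lists separately from the remaining ones.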
Scene II.ii from Shakespeare’s Richard III (1,214 words)

Table 31 includes the number of 3-grams and 2-grams that Scene II.ii from Richard III shares with the Shakespearean corpus from which it has been extracted and with the corpus of Marlowe. Afterwards, the 5-grams and 4-grams that the scene has in common with them will be qualitatively analysed.

Table 31 | N-gram tracing with Scene II.ii from Richard III

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 62 | 48 |
| 2-grams | 379 | 324 |

As can be observed in the table presented above, while the scene has 62 3-grams in common with the Shakespearean corpus, it shares fourteen fewer (48) with the corpus of Marlowe. Furthermore, there is a dramatic difference of more than fifty points between the number of 2-grams that the scene shares with the corpus of the Bard (379) and with the Marlowian corpus (324).

Scene II.ii from Richard III also has the 5-gram “to reap the harvest of” in common with the Shakespearean corpus, which includes the uncommon lexical words “reap” and “harvest” and therefore appears to be a distinctive construction. In addition, it has seven 4-grams in common with the corpus of Shakespeare. These are, apart from the two 4-grams that derive from the division of the abovementioned 5-gram, “will you go to”, “I hope the king”, “and make me die”, “God will revenge it” and “who shall hinder me”. Among these 4-grams, it is worth mentioning that “make me die” appears as an imperative construction both in the scene and the reference corpus, and “God will revenge it” seems to be highly distinctive because of the combination of the words “God” and “revenge”, which reflects a specific perception that the author has of God.

The scene presents three common 4-grams with the Marlowian corpus, which are “and so do I”, “and so will I” and “what noise is this”. None of these constructions seems to be a solid authorship marker.
Given the clarity of the results of the quantitative analysis of the common 3-grams and 2-grams, as well as the number of 5-grams and 4-grams that the scene shares with the Shakespearean corpus and how distinctive they are, it seems highly probable that it was written by him, and therefore this study has successfully achieved its objective.

Scene I.i from Shakespeare’s Richard II (1,605 words)

Scene I.i from Richard II has been removed from the reference corpus to which it belongs, and the n-grams that it shares with that corpus and with that of Marlowe have been identified by ALTXA. The number of 3-grams and 2-grams that the scene shares with the two reference corpora can be observed in Table 32, which will then be discussed. Afterwards, the 4-grams in common will be qualitatively analysed.

Table 32 | N-gram tracing with Scene I.i from Richard II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 70 | 58 |
| 2-grams | 461 | 415 |

Scene I.i from Richard II shares more 3-grams with the Shakespearean corpus than with that of the other candidate by a margin of twelve points (70 vs. 58). Furthermore, there is a dramatic difference of forty-six points if the number of 2-grams that the scene has in common with the corpus of the Bard (461) is compared to those that it shares with the Marlowian corpus (415).

The scene also has one more 4-gram in common with the corpus of Shakespeare than with that of the other candidate. The three 4-grams that Scene I.i from Richard II shares with the Shakespearean corpus are “against the duke of” and “thou art a traitor”, which do not seem to be particularly distinctive, and “the kindred of the”, which contains the word “kindred”, a word that is not present in the corpus of the other candidate.
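Part of the qualitative assessment relies on checking whether a word or a shorter sequence inside a shared n-gram also occurs in the corpus of the other candidate, as in the case of “kindred” above. A minimal Python sketch of such a check is given below. The helper names `words` and `absent_parts`, the tokenisation rule, and the sample corpus string are all assumptions made for the illustration, and the sample string is invented and is not a quotation from Marlowe.

```python
import re


def words(text):
    """Lowercase the text and split it into words (assumed tokenisation)."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())


def absent_parts(ngram, other_corpus, n):
    """Return the n-word parts of an n-gram that the other corpus lacks.

    Parts are returned in order of appearance, without duplicates.
    """
    tokens = words(ngram)
    corpus_tokens = words(other_corpus)
    corpus_ngrams = {
        tuple(corpus_tokens[i:i + n])
        for i in range(len(corpus_tokens) - n + 1)
    }
    missing = []
    for i in range(len(tokens) - n + 1):
        part = tuple(tokens[i:i + n])
        if part not in corpus_ngrams and part not in missing:
            missing.append(part)
    return [" ".join(part) for part in missing]


# Invented placeholder standing in for the other candidate's corpus.
other_corpus = "He is of the king and the court."

print(absent_parts("the kindred of the", other_corpus, 1))
# ['kindred']
print(absent_parts("the kindred of the", other_corpus, 2))
# ['the kindred', 'kindred of']
```

An empty result for a given n would indicate that every part of that length is also attested in the other corpus, which is the situation the thesis treats as weakening an n-gram's value as a marker.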
The two 4-grams that the scene shares with the corpus of Marlowe are “of the king and”, which seems to be a highly frequent construction in texts of this nature, and “be rul’d by me”, which should not be seen as a solid marker either, since the 2-gram “rul’d by” can be found multiple times in the corpora of the two candidates.

Taking into consideration the clarity of the quantitative analysis of the common 3-grams and 2-grams, which has been slightly reinforced by the qualitative analysis of the shared 4-grams, it seems highly probable that Scene I.i from Richard II was written by Shakespeare, and thus this study has effectively attributed the text to its author.

Scene II.ii from Shakespeare’s Richard II (1,150 words)

The results of the quantitative analysis of the 4-grams, 3-grams and 2-grams that Scene II.ii from Richard II shares with the two reference corpora will be presented in Table 33, which will then be discussed and complemented by the qualitative analysis of the common 6-grams and 5-grams.

Table 33 | N-gram tracing with Scene II.ii from Richard II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 15 | 7 |
| 3-grams | 82 | 67 |
| 2-grams | 372 | 332 |

The table shows that Scene II.ii from Richard II shares eight more 4-grams with the Shakespearean corpus from which it has been extracted than with the corpus of the other candidate (15 vs. 7). It also has fifteen more 3-grams in common with the Shakespearean corpus (82) than with the corpus of Marlowe (67), and there is a dramatic difference of forty points if the number of 2-grams that the scene shares with the corpus of the Bard (372) is compared to those that it has in common with the Marlowian corpus (332).

Scene II.ii from Richard II has the 6-gram “here comes the duke of York” in common with the Shakespearean corpus, which includes the frequent construction “here comes the duke” and the 2-gram “of York”, which can also be found in the corpus of Marlowe.
Apart from the two 5-grams that derive from the division of the abovementioned 6-gram, the scene also shares the 5-gram “myself I cannot do it” with the Shakespearean corpus, which does not seem to be a distinctive combination of words.

On the other hand, the scene shares the 5-gram “it may be so but” with the Marlowian corpus, which seems like an ordinary construction and therefore should not be seen as a solid authorship marker either.

According to this study, it seems highly probable that Scene II.ii from Richard II was written by Shakespeare, given the clarity of the results of the quantitative analysis of the common 4-grams, 3-grams and 2-grams, which has been reinforced by the presence of a 6-gram and a few 5-grams in common between the scene and his reference corpus, even though they are not particularly distinctive. Therefore, the study has effectively associated the scene with its author.

Scene II.iii from Shakespeare’s Richard II (1,377 words)

The last Shakespearean scene selected for this stage of the pre-study is Scene II.iii from Richard II. The number of 4-grams, 3-grams and 2-grams that it has in common with the reference corpora of the two candidates of the study can be observed in Table 34, which will then be discussed. Afterwards, the common 5-grams will be qualitatively analysed.

Table 34 | N-gram tracing with Scene II.iii from Richard II

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 14 | 10 |
| 3-grams | 97 | 84 |
| 2-grams | 494 | 440 |

The scene has more 4-grams in common with the Shakespearean corpus than with that of Marlowe by a narrow margin of four points (14 vs. 10). It also has 97 3-grams in common with the corpus of the Bard and thirteen fewer (84) with the Marlowian corpus. Lastly, there is a remarkable difference of more than fifty points if the number of 2-grams that the scene shares with the Shakespearean corpus (494) is compared to those that it has in common with the corpus of Marlowe (440).
According to the analysis conducted by ALTXA, Scene II.iii from Richard II shares a 5-gram with each of the two reference corpora. The one that it shares with the Shakespearean corpus is I will go with you, whereas what would you have me is the one that it has in common with the corpus of Marlowe. Neither of these constructions seems to be distinctive.

Even though the qualitative analysis of the shared 5-grams does not associate the authorship of Scene II.iii from Richard II with either of the two candidates of the study, the clarity of the quantitative analysis of the 4-grams, 3-grams and 2-grams in common suggests that it is highly probable that it was written by Shakespeare.

In conclusion, the five Shakespearean scenes of between 1,100 and 1,700 words that have been analysed using n-gram tracing have been correctly attributed to their author with a high degree of certainty. The next five scenes will be extracted from the corpus of the other candidate.

Scene I.i from Marlowe’s Edward II (1,588 words)

The number of 4-grams, 3-grams and 2-grams that Scene I.i from Edward II shares with the Marlowian corpus and with that of Shakespeare can be observed in Table 35, which will be discussed and complemented by the qualitative analysis of the 6-grams and 5-grams in common.

Table 35 | N-gram tracing with Scene I.i from Edward II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 9 | 14
3-grams | 78 | 117
2-grams | 489 | 551

There is a relatively narrow margin of five points between the 4-grams that the scene shares with the Marlowian corpus (14) and with the corpus of the Bard (9). The scene also has 117 3-grams in common with the corpus of Marlowe and 78 with the Shakespearean corpus, which is a notable difference of almost forty points.
Lastly, while the scene shares 551 2-grams with the Marlowian corpus, it has sixty-two fewer in common with the corpus of Shakespeare (489), which reflects the clarity with which this analysis associates the scene with its author.

Scene I.i from Edward II has more 6-grams and 5-grams in common with the Shakespearean corpus than with the corpus of Marlowe, which contrasts with the results of the quantitative analysis presented earlier. While the scene has no 6-grams in common with the Marlowian corpus, it shares the 6-gram I cannot nor I will not with the corpus of Shakespeare, which does not seem to be particularly distinctive. Apart from the 2 5-grams that derive from the division of the abovementioned 6-gram, the scene shares another 5-gram with the Shakespearean corpus, which is to be reveng’d on thee. This construction is almost identical to the 4-gram to be reveng’d on that the scene shares with the Marlowian corpus, so it should not be seen as a robust authorship marker. On the other hand, the scene presents the 5-gram the favourite of a king in common with the Marlowian corpus, which does not appear to be a distinctive combination of words either.

Scene I.i from Edward II shares a few more 6-grams and 5-grams with the corpus of Shakespeare than with that of Marlowe, but none of them seems to be a robust marker when analysed from a qualitative perspective. Nevertheless, the quantitative analysis of the shared 4-grams, 3-grams and 2-grams associates the authorship of the scene with Marlowe with such clarity that it seems highly probable that it was written by him, and hence this study has been successful.

Scene III.ii from Marlowe’s Edward II (1,401 words)

The number of 3-grams and 2-grams in common between Scene III.ii from Edward II and the two reference corpora can be observed in Table 36. After the results presented in the table are discussed, the common 5-grams and 4-grams will be presented and qualitatively analysed.
Table 36 | N-gram tracing with Scene III.ii from Edward II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 73 | 74
2-grams | 421 | 447

Table 36 shows that the number of 3-grams that Scene III.ii from Edward II shares with the two reference corpora is almost identical, although it has one more in common with the Marlowian corpus than with that of the other candidate (74 vs. 73). The scene also has more 2-grams in common with the corpus of Marlowe, but the difference is notable in this case, given that it shares 447 with his corpus and 421, that is, twenty-six fewer, with the corpus of the Bard.

ALTXA has identified a 5-gram in common between the scene and each of the two reference corpora. The 5-gram that it shares with the Marlowian corpus is with the king of France, which does not seem to be distinctive, since France is mentioned multiple times in the reference corpora of the two playwrights and, as a matter of fact, the 4-gram the king of France can also be found in the Shakespearean corpus. The 5-gram that the scene has in common with the corpus of Shakespeare is lord I take my leave, which seems like a conventional way of bidding farewell, so it does not appear to be a distinctive marker either.

Scene III.ii from Edward II also presents 9 4-grams in common with the Marlowian corpus and 7 with the corpus of the Bard. The 4-grams that it shares with the corpus of Marlowe and that are not derived from the division of the 5-gram discussed above are I here create thee, long live king Edward, drink your fill and, and by my father’s, ‘gainst law of arms, undertake to carry him and the king and do. Among these 4-grams, I here create thee stands out as a remarkable linguistic choice, given that it holds a certain metaphorical meaning, and ‘gainst law of arms seems to be an unusual combination of words.
The 7 4-grams that the scene shares with the Shakespearean corpus are, apart from those derived from the division of the 5-gram that has been mentioned earlier, the king of France, in the field and, the earl of Pembroke, my good lord for and yea my good lord. The 4-gram that seems to be the most distinctive of the group is yea my good lord, since the use of the word yea instead of yes can be seen as a linguistic choice of the author, although this linguistic form is more dialectal or context-dependent than idiolectal (see Section 3.4.4).

The quantitative analysis of the common 3-grams and 2-grams associates Scene III.ii from Edward II with the Marlowian corpus. In addition, the scene also shares more 4-grams with the corpus of Marlowe than with that of the other candidate, and some of these are relatively distinctive, so it seems highly probable that it was written by him. Hence, this study has been able to attribute the scene to its author effectively.

Scene V.i from Marlowe’s Edward II (1,226 words)

Table 37 shows the number of 3-grams and 2-grams that Scene V.i from Edward II shares with the reference corpora of the two candidates of the study. The discussion of these results will be followed by the qualitative analysis of the common 4-grams.

Table 37 | N-gram tracing with Scene V.i from Edward II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 68 | 80
2-grams | 389 | 397

The scene shares 80 3-grams with the corpus of Marlowe, which is twelve more than it has in common with the Shakespearean corpus (68). It also has eight more 2-grams in common with the Marlowian corpus (397) than with the corpus of Shakespeare (389), which stands as a narrow margin for a study of this kind.

Scene V.i from Edward II has one more 4-gram in common with the Shakespearean corpus, with which it has 9, than with the Marlowian corpus from which it has been extracted.
The 8 4-grams that the scene shares with the Marlowian corpus are I am a king, I know not but, my lord the king, my most gracious lord, man of noble birth, out of my sight, stay for rather than and what are you mov’d, which do not seem to stand as distinctive linguistic choices. The 4-grams that it shares with the corpus of the Bard are of my wrongs that, the name of king, upon my head and, we take our leave, me to my son, out of my sight, be guilty of so, my lord the king and I am a king. None of these constructions seems to be distinctive either and, as a matter of fact, some of them are similar or identical to those that the scene shares with the Marlowian corpus, so they should not be seen as robust markers.

Even though Scene V.i from Edward II shares one more 4-gram with the corpus of Shakespeare than with that of Marlowe, the qualitative analysis of these constructions does not clearly link the scene to either of the two playwrights. The quantitative analysis of the common 3-grams and 2-grams associates the authorship of the scene with the corpus of Marlowe, but not with the same degree of certainty as on other occasions. Taking all this into account, it seems slightly probable that the scene was written by Marlowe, and thus the study has achieved its goal.

Scene I.i from Marlowe’s The Jew of Malta (1,424 words)

The number of 3-grams and 2-grams that Scene I.i from The Jew of Malta shares with the corpus from which it has been extracted and with that of the other candidate can be observed in Table 38. After these results are discussed, the 5-grams and 4-grams in common will be presented and qualitatively analysed.
Table 38 | N-gram tracing with Scene I.i from The Jew of Malta

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 42 | 54
2-grams | 348 | 382

As can be seen in Table 38, while Scene I.i from The Jew of Malta presents 54 3-grams in common with the corpus of Marlowe, it shares twelve fewer (42) with that of Shakespeare. Furthermore, there is a notable difference of thirty-four points between the 2-grams that the scene has in common with the corpus of Marlowe (382) and those that it shares with the Shakespearean corpus (348).

Scene I.i from The Jew of Malta also has 4 4-grams in common with the Marlowian corpus, which are all the wealth in, but who comes here, it may be so and and here he comes. None of these 4-grams seems to contain a characteristic linguistic choice.

The scene shares the 5-gram serve as well as I, which seems to be relatively unusual, with the Shakespearean corpus. It also shares 8 4-grams with his corpus, which are, apart from those that derive from the division of the abovementioned 5-gram, but who comes here, it may be so, in the council-house to, and here he comes, oft have I heard and the bowels of the. Among these 4-grams, in the council-house to seems to be relatively rare due to the presence of the compound word council-house. Moreover, oft have I heard appears to be a solid marker if the use of the word oft is taken into account, as well as the inversion of the words have and I, given that this construction appears in an affirmative sentence both in the extracted scene and in the corpus of Shakespeare. The fact that the scene shares more 5-grams and 4-grams with the corpus of the Bard and that some of these are distinctive contrasts with the results of the quantitative analysis presented earlier.
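Three of the 4-grams listed above are shared with both reference corpora, which is one reason why they carry little weight as markers. A minimal sketch of how the shared n-grams could be separated into those exclusive to one candidate and those common to both is given below. The function name `split_shared` is hypothetical and is not part of ALTXA; the two sets reproduce the 4-grams reported above for Scene I.i from The Jew of Malta, leaving out the two that derive from the 5-gram serve as well as I.

```python
def split_shared(shared_with_a, shared_with_b):
    """Separate shared n-grams into those exclusive to each corpus and those in both."""
    in_both = shared_with_a & shared_with_b
    return {
        "only_a": shared_with_a - in_both,
        "only_b": shared_with_b - in_both,
        "both": in_both,
    }


# 4-grams the scene shares with each reference corpus, as listed in the text.
marlowian = {
    "all the wealth in",
    "but who comes here",
    "it may be so",
    "and here he comes",
}
shakespearean = {
    "but who comes here",
    "it may be so",
    "in the council-house to",
    "and here he comes",
    "oft have I heard",
    "the bowels of the",
}

result = split_shared(marlowian, shakespearean)
print(sorted(result["only_a"]))  # shared with the Marlowian corpus only
print(sorted(result["only_b"]))  # shared with the Shakespearean corpus only
print(sorted(result["both"]))    # shared with both corpora
```

Only the n-grams that are exclusive to one corpus remain as candidates for the qualitative assessment of distinctiveness, which still has to be carried out by the analyst.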
In sum, the quantitative analysis of the common 3-grams and 2-grams attributes the authorship of Scene I.i from The Jew of Malta to Marlowe with great clarity, but the qualitative analysis of the larger n-grams in common, which is meant to complement these results, associates the scene with the Shakespearean corpus. Even though this quantitative analysis has accomplished its objective effectively, the qualitative analysis of the common 5-grams and 4-grams undermines the certainty of the final verdict. Consequently, according to this study, it seems slightly probable that the scene was written by Marlowe.

Scene IV.iv from Marlowe’s The Jew of Malta (1,135 words)

The last sample included in this stage of the pre-study is Scene IV.iv from The Jew of Malta. Table 39 shows the 4-grams, 3-grams and 2-grams that this text shares with the reference corpora of Shakespeare and Marlowe.

Table 39 | N-gram tracing with Scene IV.iv from The Jew of Malta

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 4 | 12
3-grams | 52 | 90
2-grams | 345 | 416

While Scene IV.iv from The Jew of Malta has 12 4-grams in common with the corpus of Marlowe, which stands as a large number for a study of this kind, it shares only 4 with the corpus of the other candidate. It also presents 90 3-grams in common with the Marlowian corpus and 52 with the corpus of the Bard, which creates a remarkable difference of thirty-eight points. The difference in the shared 2-grams is even more significant, given that the scene has seventy-one more in common with the corpus of Marlowe (416) than with the Shakespearean corpus (345). There are no larger n-grams in common between the scene and either of the two reference corpora, so there will be no qualitative analysis in this study.
Given the clarity with which the quantitative analysis of the common 4-grams, 3-grams and 2-grams associates Scene IV.iv from The Jew of Malta with the corpus of Marlowe, it seems highly probable that it was written by him, and thus this study has successfully achieved its goal.

Conclusions derived from the third stage of the pre-study

Five Shakespearean and five Marlowian scenes of between 1,100 and 1,700 words have been analysed as if their authorship were disputed to assess the effectiveness of n-gram tracing in this linguistic context. The ten scenes have been correctly attributed to their author, eight of them with a high degree of certainty, and thus this method will be used to analyse the authorship of the scenes of Arden of Faversham of the same length.

5.3.4. N-gram tracing with scenes of almost 2,000 words or more

This last stage of the pre-study will analyse the authorship of scenes of almost 2,000 words or more applying the same methodological criteria that have been followed in the previous ones. Since there are only three Marlowian scenes of more than 2,000 words, Scene II.ii from Edward II, which has 1,995 words, will also be included. Therefore, five randomly selected Shakespearean scenes of more than 2,000 words and the only four Marlowian scenes of almost 2,000 words or more will be analysed independently to assess the validity of n-gram tracing in this linguistic context.

Scene I.iii from Shakespeare’s Richard III (2,845 words)

The first Shakespearean scene that has been randomly selected for this stage of the pre-study is Scene I.iii from Richard III. The number of 4-grams, 3-grams and 2-grams that it shares with the two reference corpora will be presented in Table 40, which will be discussed and complemented by the qualitative analysis of the common 5-grams.
Table 40 | N-gram tracing with Scene I.iii from Richard III

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 21 | 20
3-grams | 173 | 147
2-grams | 844 | 744

As can be observed in Table 40, Scene I.iii from Richard III presents only one more 4-gram in common with the corpus of the Bard (21) than with the Marlowian corpus (20). Nevertheless, if the number of 3-grams that it shares with the Shakespearean corpus is compared to those that it has in common with the corpus of Marlowe, a notable difference of twenty-six points can be found (173 vs. 147). There is also a dramatic difference of one hundred 2-grams between those that it shares with the Shakespearean corpus (844) and those that it has in common with the Marlowian corpus (744), which reflects the clarity with which this quantitative analysis associates the scene with the Bard.

Scene I.iii from Richard III has one more 5-gram in common with the Shakespearean corpus, with which it shares 5, than with the Marlowian corpus. The 5 5-grams that the scene shares with the corpus of Shakespeare are we wait upon your grace, will you go with me, good time of day unto, here come the lords of and vain flourish of my fortune. Among these 5-grams, vain flourish of my fortune appears to be a robust authorship marker, given that the description of a flourishing fortune holds a metaphorical meaning and therefore stands as a distinctive linguistic choice of the author. On the other hand, the 4 5-grams that the scene has in common with the Marlowian corpus are my lord we will not, I can no longer hold, in presence of the king and my lord as much as, which do not seem to be particularly distinctive.
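The margins quoted in the discussion of Table 40 can be reproduced with a few lines of code. The sketch below takes the counts from the table. The function name `margins` is hypothetical, and the relative margin (the difference as a percentage of the smaller, Marlowian count) is an illustrative addition that is not reported in this study.

```python
# Counts from Table 40: n-gram size -> (Shakespearean corpus, Marlowian corpus).
table_40 = {4: (21, 20), 3: (173, 147), 2: (844, 744)}


def margins(table):
    """Return, per n-gram size, the absolute margin and the margin relative to the second count."""
    result = {}
    for n, (first, second) in table.items():
        difference = first - second
        # Illustrative only: relative margin as a percentage, rounded to one decimal place.
        result[n] = (difference, round(100 * difference / second, 1))
    return result


table_40_margins = margins(table_40)
for n, (absolute, relative) in table_40_margins.items():
    print(f"{n}-grams: difference of {absolute} points ({relative}% of the Marlowian count)")
```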
Given the clarity of the quantitative analysis of the common 4-grams, 3-grams and 2-grams and the manner in which this has been reinforced by the qualitative analysis of the shared 5-grams, it seems highly probable that Scene I.iii from Richard III was written by Shakespeare, and thus this study has accomplished its objective successfully.

Scene IV.iv from Shakespeare’s Richard III (4,268 words)

The number of 4-grams, 3-grams and 2-grams that this scene shares with the reference corpora of the two candidates of the study can be observed in Table 41, which is discussed below. Afterwards, the common 5-grams between the scene and the two reference corpora will be listed and qualitatively analysed.

Table 41 | N-gram tracing with Scene IV.iv from Richard III

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 16 | 9
3-grams | 173 | 150
2-grams | 1,053 | 963

While the scene has 16 4-grams in common with the corpus of the Bard, it shares 9 with the Marlowian corpus, which stands as an acceptable margin of seven points for a study of this kind. It also shares 173 3-grams with the Shakespearean corpus and twenty-three fewer (150) with the corpus of the other candidate. The largest difference can be found in the number of 2-grams that the scene has in common with the two corpora, since there is a gap of ninety points between those that it shares with the corpus of Shakespeare (1,053) and those that it has in common with the Marlowian corpus (963).

Scene IV.iv from Richard III also shares 2 5-grams with the corpus of the Bard and 1 with that of Marlowe. The 2 5-grams that it has in common with the Shakespearean corpus are no my good lord therefore and will my lord with all, whereas the one that it shares with the corpus of the other candidate is I pray that I may. None of these 5-grams seems to stand as a distinctive linguistic choice.
Taking into consideration the results of the quantitative analysis of the common 4-grams, 3-grams and 2-grams, it seems highly probable that the scene was written by Shakespeare, even though the qualitative analysis of the common 5-grams has not reinforced these results. Therefore, this study has attributed the scene to its author correctly.

Scene V.iii from Shakespeare’s Richard III (2,726 words)

Table 42 shows the number of 4-grams, 3-grams and 2-grams that Scene V.iii from Richard III shares with the Shakespearean corpus from which it has been extracted and with that of the other candidate. After these results are discussed, the common 5-grams will be presented and qualitatively analysed.

Table 42 | N-gram tracing with Scene V.iii from Richard III

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 13 | 10
3-grams | 123 | 114
2-grams | 701 | 631

There is a narrow difference of three points between the 4-grams that the scene shares with the corpus of Shakespeare (13) and with that of Marlowe (10). Furthermore, the scene only shares nine more 3-grams with the Shakespearean corpus (123) than with the corpus of Marlowe (114), which also stands as a low margin for a study of this kind. In contrast, there is a remarkable difference of seventy points between the 2-grams that it shares with the corpus from which it has been extracted and with that of the other candidate (701 vs. 631).

Scene V.iii from Richard III shares the 2 5-grams of the house of Lancaster and upon the stroke of four with the Shakespearean corpus. The first of these 5-grams includes the 2-gram of Lancaster, which also appears multiple times in the corpus of Marlowe, so it should not be seen as a solid marker. Nevertheless, upon the stroke of four seems to be a more unusual construction. The scene has a 5-gram in common with the Marlowian corpus, which is I warrant you my lord.
This expression does not appear to be distinctive and, as a matter of fact, the 3-gram you my lord can be found many times in the reference corpora of both candidates.

It seems highly probable that the scene was written by Shakespeare, given that the quantitative analysis of the common 4-grams, 3-grams and 2-grams associates its authorship with him, and this has been slightly reinforced by the qualitative analysis of the shared 5-grams. Hence, the study has effectively linked the scene to the corpus from which it has been extracted.

Scene I.iii from Shakespeare’s Richard II (2,402 words)

The number of 4-grams, 3-grams and 2-grams that Scene I.iii from Richard II shares with the two reference corpora will be presented in Table 43 and then discussed. This will be complemented by the qualitative analysis of the 5-grams in common.

Table 43 | N-gram tracing with Scene I.iii from Richard II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 11 | 11
3-grams | 93 | 104
2-grams | 661 | 618

Scene I.iii from Richard II presents 11 4-grams in common with each of the two reference corpora and it shares eleven more 3-grams with the corpus of Marlowe (104) than with the Shakespearean corpus from which it has been extracted (93), which is surprising. In contrast, it has more 2-grams in common with the corpus of the Bard than with the corpus of Marlowe by a considerable margin of forty-three points (661 vs. 618). As a result of this difference in the number of 2-grams in common, it could be said that the quantitative analysis suggests that Shakespeare is slightly more likely to have written the scene, although the results are far from being as clear as on other occasions.

According to the analysis conducted by ALTXA, Scene I.iii from Richard II shares the 5-gram of you my noble cousin with the corpus of Shakespeare, which does not appear to be distinctive.
The scene also has 4 5-grams in common with the Marlowian corpus, that is, three more than with the corpus of Shakespeare. These are the duty that you owe, lord I take my leave, by the grace of God and hardy as to touch the. Among these, hardy as to touch the seems to be the most unusual linguistic choice of the group. Nevertheless, the 4-gram hardy as to touch can be found as well in the Shakespearean corpus, so this 5-gram should not be seen as a solid authorship marker.

In sum, the quantitative analysis shows that Scene I.iii from Richard II shares the same number of 4-grams with the two reference corpora, more 3-grams with the corpus of Marlowe by a relatively narrow margin of eleven points and more 2-grams with the Shakespearean corpus by a notable difference of forty-three points. The qualitative analysis of the common 5-grams reveals that, even though the scene shares a few more with the corpus of Marlowe, none of them is distinctive. According to the study, it is therefore uncertain whether the scene was written by Shakespeare or Marlowe. This means that, despite the fact that it has not attributed the sample to the wrong candidate, it has not been able to clearly associate it with its author either.

Scene II.i from Shakespeare’s Richard II (2,372 words)

The last Shakespearean scene that has been randomly selected for this stage of the pre-study is Scene II.i from Richard II. The number of 3-grams and 2-grams that it shares with the two reference corpora can be observed in Table 44. After these results are discussed, the qualitative analysis of the common 4-grams will be provided.
Table 44 | N-gram tracing with Scene II.i from Richard II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 81 | 75
2-grams | 634 | 576

There is a difference of seven points if the 3-grams that the scene shares with the corpus of Shakespeare (82) are compared to those that it has in common with the Marlowian corpus (75). In addition, it has 636 2-grams in common with the Shakespearean corpus and 576 with that of Marlowe, which creates a dramatic difference of sixty points and reflects the effectiveness with which this quantitative analysis associates the scene with the corpus from which it has been extracted.

Since Scene II.i from Richard II does not share 10 4-grams with either of the two reference corpora, these will be analysed from a qualitative perspective. The scene has 8 4-grams in common with the Shakespearean corpus, which are the earl of Wiltshire, I do beseech your, to the duke of, so much for that, commends him to your, on my life and, and the hand of and the king is not. Among these 4-grams, I do beseech your seems to be the most distinctive of the group because of the presence of the auxiliary do between the subject and the verb to emphasize the construction. The other 4-grams include combinations of words that are frequent in plays of this kind or similar to others that appear in the corpus of the other candidate. For instance, the 4-gram the earl of Wiltshire is almost identical to the 3-gram earl of Wiltshire that can be found in the corpus of Marlowe.

The scene only shares 2 4-grams with the Marlowian corpus. These are of the king for, which is mainly formed by function words, and if it be so, which includes the 2-gram it be, which also appears multiple times in the corpus of Shakespeare.
It seems highly probable that Scene II.i from Richard II was written by Shakespeare, given the clarity of the results of the quantitative analysis of the common 3-grams and 2-grams, which has been reinforced by the qualitative analysis of the shared 4-grams.

In brief, four of the five Shakespearean scenes that have been studied in this stage of the pre-study have been correctly attributed to their author, whereas the analysis of the remaining scene has led to inconclusive results. The next four scenes will be extracted from the corpus of Marlowe.

Scene I.iv from Marlowe’s Edward II (3,329 words)

The number of 4-grams, 3-grams and 2-grams that Scene I.iv from Edward II has in common with the Marlowian corpus and with that of Shakespeare will be presented in Table 45 and then discussed. Afterwards, the larger n-grams that the scene shares with the two reference corpora will be presented and qualitatively analysed.

Table 45 | N-gram tracing with Scene I.iv from Edward II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 22 | 39
3-grams | 181 | 267
2-grams | 881 | 986

The scene has seventeen more 4-grams in common with the Marlowian corpus (39) than with the corpus of the other candidate (22), which is a remarkable difference for a study of this kind. In addition, there is a gap of eighty-six points if the number of 3-grams that the scene shares with the corpus of Marlowe (267) is compared to those that it has in common with the Shakespearean corpus (181). There is also a dramatic difference of one hundred and five 2-grams if those that it shares with the Marlowian corpus (986) are compared to the ones that it has in common with the corpus of the Bard (881). The results of this quantitative analysis associate the scene with the corpus from which it has been extracted with a high degree of certainty.

There is a 7-gram in common between the scene and the Marlowian corpus.
This 7-gram, which is lord I come to bring you news, could be seen as a solid marker because of its length and the fact that it contains four lexical words. In addition to the 2 6-grams and 3 5-grams that derive from the division of this 7-gram, the scene also shares with the Marlowian corpus the 5-grams it shall be done my, whether I will or no and with the earl of Kent, which do not seem to be particularly distinctive.

Scene I.iv from Edward II has 4 5-grams in common with the Shakespearean corpus. These are I will not yield to, my gracious lord I come, what would you have me and the duty that you owe, which seem to be frequent expressions in texts of this nature.

Given the clarity of the quantitative analysis of the shared 4-grams, 3-grams and 2-grams, which has been reinforced by the qualitative analysis of the larger n-grams in common, it seems highly probable that Scene I.iv from Edward II was written by Marlowe, and hence this study has correctly attributed the sample to its author.

Scene II.ii from Marlowe’s Edward II (1,995 words)

Table 46 shows the number of 4-grams, 3-grams and 2-grams that Scene II.ii from Edward II shares with the corpus from which it has been extracted and with that of the other candidate. After these results are discussed, the larger n-grams in common will be listed and analysed from a qualitative perspective.

Table 46 | N-gram tracing with Scene II.ii from Edward II

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 10 | 31
3-grams | 119 | 164
2-grams | 569 | 649

There is a difference of twenty-one 4-grams between those that the scene shares with the corpus of Marlowe (31) and with the Shakespearean corpus (10). It is surprising that a text of 1,995 words has 31 4-grams in common with one of the corpora, which reflects the clarity with which the analysis links the scene to its author.
There is also a gap of forty-five points if the 3-grams that the scene shares with the corpus of Marlowe (164) are compared to those that it has in common with the corpus of Shakespeare (119). Lastly, the scene presents eighty more 2-grams in common with the Marlowian corpus (649) than with the corpus of the Bard (569).

While Scene II.ii from Edward II has no larger n-grams in common with the corpus of Shakespeare, it shares the 8-gram will be the ruin of the realm and with the corpus of Marlowe. This stands as the largest construction in common that has been found in the four stages of the pre-study. It also has 3 7-grams in common with the corpus of Marlowe, which are the 2 7-grams derived from the division of the abovementioned 8-gram and lord I come to bring you news. This construction could be seen as distinctive because of the number of words that it has and the fact that it is mainly formed by lexical words, as pointed out in the study of the previous scene, where this 7-gram was present as well. Apart from the 6-grams and 5-grams that derive from the division of the 8-gram and the 7-gram just discussed, the scene shares with the Marlowian corpus the 5-grams that I love thee well, which seems to be a frequent construction in texts of this nature, and I fear me he is, which appears to be relatively unusual. In total, Scene II.ii from Edward II shares 1 8-gram, 3 7-grams, 5 6-grams and 9 5-grams with the Marlowian corpus from which it has been extracted.

The clarity with which the quantitative analysis of the shared 4-grams, 3-grams and 2-grams associates Scene II.ii from Edward II with the corpus of Marlowe and the manner in which these results have been reinforced by the qualitative analysis of the larger n-grams in common suggest that it is highly probable that it was written by him, and thus this study has successfully accomplished its objective.
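The totals reported for Scene II.ii from Edward II (1 8-gram, 3 7-grams, 5 6-grams and 9 5-grams) follow from simple arithmetic: a shared n-gram of length n contains n - k + 1 contiguous n-grams of length k, all of which are necessarily shared as well. The sketch below checks this under the assumption that the sub-n-grams of the different maximal constructions do not coincide with one another. The function name `totals_by_size` is hypothetical and is not part of ALTXA.

```python
from collections import Counter


def totals_by_size(maximal_lengths, smallest=5):
    """Count shared n-grams of each size implied by a list of maximal shared n-grams."""
    totals = Counter()
    for n in maximal_lengths:
        for k in range(smallest, n + 1):
            # One n-gram of length n contains n - k + 1 contiguous k-grams.
            totals[k] += n - k + 1
    return dict(totals)


# Maximal constructions shared with the Marlowian corpus, by length:
# the 8-gram, the independent 7-gram and the two independent 5-grams.
scene_totals = totals_by_size([8, 7, 5, 5])
print(scene_totals)
```

For this scene, the 8-gram contributes 2 7-grams, 3 6-grams and 4 5-grams, and the 7-gram contributes 2 6-grams and 3 5-grams, which together with the two independent 5-grams yields the totals stated above.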
Scene I.ii from Marlowe’s The Jew of Malta (2,929 words)

The results of the quantitative analysis of the 4-grams, 3-grams and 2-grams that Scene I.ii from The Jew of Malta has in common with the two reference corpora will be presented in Table 47. Once the results of the table are discussed, the 6-grams and 5-grams in common will be presented and analysed from a qualitative perspective.

Table 47 | N-gram tracing with Scene I.ii from The Jew of Malta

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
4-grams | 9 | 23
3-grams | 120 | 163
2-grams | 726 | 817

There is a notable variation of fourteen points between the 4-grams that the scene shares with the Marlowian corpus (23) and with that of the other candidate (9). Furthermore, while the scene has 163 3-grams in common with the corpus of Marlowe, it shares 120, that is, forty-three fewer, with the Shakespearean corpus. There is also a dramatic difference of ninety-one points if the number of 2-grams that the scene shares with the corpus of Marlowe (817) is compared to those that it has in common with the corpus of the Bard (726), which stands as solid evidence that this quantitative analysis has been successful.

The scene also has a 6-gram in common with the Marlowian corpus, which is too or it shall go hard. This 6-gram, which can be divided into 2 5-grams, appears to be relatively distinctive because of the combination of words go hard, which cannot be found in the Shakespearean corpus. The 5 5-grams that the scene has in common with the Marlowian corpus are, apart from those that derive from the division of the aforementioned 6-gram, I must be forced to, o my lord we will and my lord and here they. Among these 5-grams, I must be forced to seems to stand out as the most distinctive of the group, whereas the others include the 2-gram my lord, which is highly frequent in texts of this nature.
On the other hand, the only 5-gram in common between Scene I.ii from The Jew of Malta and the Shakespearean corpus is "it may be so but", which seems to be a common combination of words.

Given the clarity of the results of the quantitative analysis of the common 4-grams, 3-grams and 2-grams, which has been reinforced by the qualitative analysis of the shared 6-grams and 5-grams, it seems highly probable that Scene I.ii from The Jew of Malta was written by Marlowe. Therefore, this study has attributed the scene to its author effectively.

Scene II.iii from Marlowe’s The Jew of Malta (3,034 words)

The last sample included in this stage of the pre-study is Scene II.iii from The Jew of Malta. The number of 4-grams, 3-grams and 2-grams that it shares with the two reference corpora can be observed in Table 48, which is discussed below. This will be complemented by the qualitative analysis of the shared 6-grams and 5-grams.

Table 48 | N-gram tracing with Scene II.iii from The Jew of Malta

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 6 | 31 |
| 3-grams | 144 | 212 |
| 2-grams | 785 | 898 |

Table 48 shows that there is a notable difference of twenty-five between the 4-grams that Scene II.iii from The Jew of Malta shares with the corpus from which it has been extracted (31) and with that of the other candidate (6). The scene also shares 212 3-grams with the corpus of Marlowe and 144, that is, sixty-eight fewer, with the corpus of Shakespeare. The largest difference can be found if the number of 2-grams that it has in common with the two reference corpora is compared, since it shares 898 with the Marlowian corpus and 785, that is, one hundred and thirteen fewer, with the corpus of the Bard.

While Scene II.iii from The Jew of Malta does not share any larger n-grams with the corpus of Shakespeare, it has the 6-gram "too or it shall go hard" in common with the Marlowian corpus.
This 6-gram, which can also be found in the scene that has been previously analysed, appears to be relatively distinctive because of the 2-gram "go hard" that it includes, which is not present in the Shakespearean corpus. The scene also shares four 5-grams with the corpus of Marlowe. These are, apart from those that derive from the division of the abovementioned 6-gram, "whether I will or no" and "o my lord we will", which do not seem to be unusual linguistic choices.

According to the study, it seems highly probable that Scene II.iii from The Jew of Malta was written by Marlowe, since the quantitative analysis of the common 4-grams, 3-grams and 2-grams and the qualitative analysis of the shared 6-grams and 5-grams clearly associate the scene with his reference corpus.

Conclusions derived from the fourth stage of the pre-study

This stage of the pre-study has used n-gram tracing to analyse five Shakespearean and four Marlowian scenes of almost 2,000 words or more as if their authorship were disputed. Eight of these scenes have been correctly attributed to their author with a high degree of certainty, whereas the results derived from the analysis of the remaining scene could not associate its authorship with either of the two candidates. This method has therefore had an effectiveness of 88.9% (eight of nine scenes) with the scenes of this group. In addition, on the only occasion on which it has not linked a scene to its author, the results have been inconclusive, which means that no cases of misattribution have been found. For these two reasons, n-gram tracing will be used to analyse the authorship of the scenes of Arden of Faversham of almost 2,000 words or more.

5.3.5. Conclusions derived from Pre-study 3

This pre-study has assessed the effectiveness of n-gram tracing to distinguish between Shakespearean and Marlowian scenes from plays that are not comedies and were written between 1590 and 1595, approximately.
The first stage of the pre-study has analysed the authorship of ten scenes of between 100 and 450 words independently and all of them have been correctly attributed to their author. The second stage has studied the authorship of ten scenes whose length ranges from 500 to 950 words and eight of them have been successfully associated with their author, whereas the results derived from the analysis of the other two scenes have been inconclusive. The third stage has analysed ten scenes of between 1,100 and 1,700 words and the method has had an effectiveness of 100%, as happened in the first stage. The last stage of the pre-study has analysed the authorship of nine scenes of almost 2,000 words or more, eight of which have been effectively linked to their author, whereas the remaining scene could not be attributed to either of the two candidates.

If these results are evaluated from a holistic point of view, the present pre-study has analysed the authorship of thirty-nine undisputed scenes with n-gram tracing and thirty-six of them have been successfully attributed to their author, which implies an effectiveness of 92.3%. In addition, there have not been any cases of misattribution, which also reflects the reliability of the method, according to the line of thought suggested by Grant in 2007 (see Section 3.4.4).

In sum, n-gram tracing has proved to be highly effective in the four stages of the pre-study, so all the scenes of Arden of Faversham will be analysed with this method. The following pre-study will test the reliability of the Zeta test.

5.4. Pre-study on the conduction of the Zeta test (Pre-study 4)

This last pre-study will evaluate the effectiveness of the Zeta test to distinguish between Shakespearean and Marlowian scenes of almost 2,000 words or more (see Section 4.5.5 for an account of the reasons why shorter scenes will not be included in the pre-study).
Five random scenes from the Shakespearean corpus and the only four scenes of the Marlowian corpus of such length will be extracted and analysed independently. If this method can correctly associate these scenes with the corpus from which they have been removed, it will have proved effective enough to be used later in the analysis of the scenes of Arden of Faversham.

The stop list with all the words ignored as potential markers in all the Zeta tests of this thesis has been included in Appendix 3. This list of ignored words has been completed gradually by repeating the process of obtaining the lists of 500 markers of the two authors in every Zeta test until these lists contained only distinctive words. This means that common function words whose usage does not reflect an idiolectal choice, proper names and lexical words that are heavily dependent on play-specific contexts, and therefore do not reflect an authorial pattern, have been discarded when applying this method (see Section 4.5.5).

If it is not discernible at a glance which centroid of the two clusters formed by the fragments of the reference corpora is closer to the fragments of the disputed text when these are placed on a coordinate axis, the formula $|\overrightarrow{AB}| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the two points, will be used to measure the exact distances.

Once the scenes have been analysed independently, the results derived from the pre-study will be interpreted from a holistic perspective and its main findings will be summarized.

5.4.1. Zeta test with scenes of almost 2,000 words or more

Scene I.ii from Shakespeare’s Richard III (2,062 words)

The first Shakespearean scene that has been randomly selected to test the effectiveness of the Zeta test is Scene I.ii from Richard III, which has been removed from the reference corpus where it belongs and introduced in ALTXA as a disputed text.
The software has calculated the 500 markers of each of the two reference corpora after the removal of the scene and placed every fragment into which these corpora and the scene itself have been divided on a coordinate axis, as can be observed in Figure 4 (see Section 4.5.5 for a detailed explanation of the procedures underlying the calculation of such markers and the application of this test in general).

Figure 4 | Zeta test with Scene I.ii from Richard III

The red squares in Figure 4 represent the fragments into which the remaining Shakespearean corpus has been divided, the blue circles stand for the fragments derived from the division of the corpus of Marlowe and Scene I.ii from Richard III is represented by the black triangle. All these fragments contain 2,000 words or 2,000 plus the residual words at the end of a text. The value on the horizontal axis of each fragment is obtained by dividing the number of Shakespearean markers that it contains by its number of distinct words, whereas its value on the vertical axis is obtained by dividing the number of Marlowian markers that it has by its number of distinct words. Dividing the number of markers in each fragment by its number of distinct words compensates for the different sizes of some fragments (see Section 4.5.5).

The black triangle that represents Scene I.ii from Richard III is clearly closer to the Shakespearean cluster, and thus this Zeta test has correctly linked the scene to the corpus from which it has been extracted.

Scene I.iii from Shakespeare’s Richard III (2,845 words)

Scene I.iii from Richard III has been introduced in ALTXA as a disputed text and, after calculating the 500 markers of the remaining Shakespearean corpus and those of the Marlowian corpus, the software has placed on a coordinate axis the fragments of 2,000 words or more into which the three samples have been divided.
As can be observed in Figure 5, the black triangle that represents Scene I.iii from Richard III is considerably closer to the cluster of red squares that stand for the fragments into which the Shakespearean corpus has been divided than to the blue circles that represent the fragments into which the corpus of Marlowe has been divided. Therefore, the Zeta test has successfully associated the authorship of the scene with the corpus from which it has been removed.

Figure 5 | Zeta test with Scene I.iii from Richard III

Scene V.iii from Shakespeare’s Richard III (2,726 words)

Scene V.iii from Richard III has been extracted from the Shakespearean corpus and analysed as a disputed text by ALTXA using the Zeta test. The graphical representation of the results derived from this study can be observed in Figure 6.

Figure 6 shows that Scene V.iii from Richard III, which is represented by the black triangle, is notably close to the cluster formed by the red squares that stand for the fragments into which the Shakespearean corpus has been divided, whereas the Marlowian cluster can be found in a distant position from them. Therefore, this Zeta test has correctly attributed the authorship of Scene V.iii from Richard III to Shakespeare.

Figure 6 | Zeta test with Scene V.iii from Richard III

Scene I.iii from Shakespeare’s Richard II (2,402 words)

The authorship of Scene I.iii from Richard II, which has been extracted from the corpus where it belongs, has been analysed with ALTXA using the Zeta test. Once the 500 markers of the remaining Shakespearean corpus and those of the corpus of Marlowe have been obtained, the fragments into which these corpora have been divided and the single fragment that constitutes the disputed scene have been placed on a coordinate axis that can be observed in Figure 7.
Figure 7 | Zeta test with Scene I.iii from Richard II

Figure 7 shows that there is great proximity between the black triangle that stands for Scene I.iii from Richard II and the cluster of red squares formed by the fragments into which the Shakespearean corpus has been divided. In contrast, the blue circles that represent the fragments into which the corpus of Marlowe has been divided are considerably far from the black triangle. According to this Zeta test, the scene was written by Shakespeare, so the test has been successful.

Scene IV.i from Shakespeare’s Richard II (2,628 words)

Scene IV.i from Richard II is the last Shakespearean scene included in this pre-study. After the calculation of the 500 markers of the Shakespearean corpus from which it has been extracted and those of the corpus of Marlowe, the samples have been divided into fragments of 2,000 words or more and placed on a coordinate axis according to the markers from the two lists that they contain, as can be observed in Figure 8.

Figure 8 | Zeta test with Scene IV.i from Richard II

The blue circles of Figure 8 that represent the fragments into which the Marlowian corpus has been divided are distant from the black triangle and the red squares, which are relatively close to each other and represent Scene IV.i from Richard II and the fragments into which the corpus of Shakespeare has been divided, respectively. It seems evident that this Zeta test associates the authorship of the scene with the Shakespearean corpus.

In sum, the Zeta test has effectively associated the five Shakespearean scenes that have been analysed as disputed texts with the corpus from which they have been extracted. The next four scenes of the pre-study will be taken from the corpus of Marlowe.

Scene I.iv from Marlowe’s Edward II (3,329 words)

The first Marlowian sample included in this pre-study is Scene I.iv from Edward II, which has been extracted from the reference corpus where it belongs and analysed as a disputed text using the Zeta test.
The software ALTXA has calculated the 500 markers of the remaining Marlowian corpus, as well as those of the corpus of the other candidate, and it has placed the fragments into which these corpora and the scene have been divided on a coordinate axis according to the markers of both lists that they contain.

Figure 9 | Zeta test with Scene I.iv from Edward II

Figure 9 shows that the black triangle that represents Scene I.iv from Edward II is notably closer to the cluster of blue circles formed by the fragments into which the Marlowian corpus has been divided than to the Shakespearean cluster. Therefore, this Zeta test has correctly linked the scene to the corpus of Marlowe.

Scene II.ii from Marlowe’s Edward II (1,995 words)

Scene II.ii from Edward II has been removed from the corpus of Marlowe and ALTXA has analysed its authorship using the Zeta test. The software has compiled a list of the 500 markers of the remaining Marlowian corpus and a list of the 500 markers of the corpus of Shakespeare. Afterwards, it has placed on a coordinate axis the fragments of 2,000 words or more into which these corpora and the scene have been divided according to the number of markers of each type that they contain.

Figure 10 | Zeta test with Scene II.ii from Edward II

As can be seen in Figure 10, the results derived from this Zeta test associate the scene with the corpus from which it has been extracted with more clarity than on any previous occasion. The black triangle that represents Scene II.ii from Edward II is almost within the same area occupied by the Marlowian cluster, which reflects the effectiveness of the method.

Scene I.ii from Marlowe’s The Jew of Malta (2,929 words)

Scene I.ii from The Jew of Malta has been extracted from the Marlowian corpus and analysed following the same methodological principles described in the study of previous scenes. The graphical representation of the results provided by ALTXA can be observed in Figure 11.
Figure 11 | Zeta test with Scene I.ii from The Jew of Malta

It is discernible at a glance that the black triangle that represents Scene I.ii from The Jew of Malta is notably close to the Marlowian cluster and that both are distant from the area occupied by the red squares that represent the Shakespearean fragments. The Zeta test conducted by ALTXA has determined the authorship of the scene correctly.

Scene II.iii from The Jew of Malta (3,034 words)

The last sample selected to assess the effectiveness of the Zeta test is Scene II.iii from The Jew of Malta, which has been analysed by ALTXA following the same process described during the study of previous scenes. The position on the coordinate axis of the fragment that represents this scene and that of the fragments into which the two reference corpora have been divided can be observed in Figure 12.

Figure 12 | Zeta test with Scene II.iii from The Jew of Malta

Figure 12 shows that the Zeta test has effectively accomplished its objective, since the black triangle that stands for Scene II.iii from The Jew of Malta is considerably closer to the centroid of the Marlowian cluster than to that of the Shakespearean cluster. As in previous studies, it is not necessary to calculate the exact distances, given that the results are discernible at a glance.

5.4.2. Interpretation of the results

The reader may be surprised by the fact that the fragments of the scenes that have been analysed as disputed texts are not within the same area occupied by the cluster of the reference corpus from which they have been extracted, but in a nearby position, which contrasts with studies such as Kinney’s (see Appendix 2). The reason why they occupy such positions on the coordinate axis is explained below.
The reference corpora used by Kinney (2009) and Elliott and Greatley-Hirsch (2017) to analyse the authorship of Arden of Faversham were compiled with many plays from distinct periods, some of which are comedies, and this contrasts with one of the main hypotheses suggested in this thesis, which is related to the criteria for the compilation of the reference corpora in studies of this kind (see Chapter 4). This investigation has compiled the reference corpora of the two candidates with plays that have a similar tone to that of Arden of Faversham and were written in a similar period, and thus they include fewer texts than the corpora of the studies referenced above. This means that the area of the clusters formed by their fragments will be smaller and, as a result, it is normal that the fragments of the disputed texts do not fall exactly within the same space.

The graphical representations of the results of the Zeta tests conducted by the authors mentioned above are more visually pleasing, given that almost the whole coordinate axis is filled with fragments. Nevertheless, most of these fragments belong to plays whose features are not closely related to those of the disputed text under analysis, as has been argued in Chapter 4, which may even lead to misleading conclusions about its authorship.

The guarantee that the approach adopted for the Zeta tests of this doctoral thesis is reliable is that the nine scenes from the reference corpora that have been analysed as disputed texts have been correctly attributed to their author, which means that the method has had a success rate of 100%. In addition, these studies give the researcher and the reader a reference for the type of outcome that can be considered acceptable for the attribution of authorship of the scenes of almost 2,000 words or more from Arden of Faversham that will be analysed with this method in Chapter 6.

5.4.3. Conclusions derived from Pre-study 4

This last pre-study has used the Zeta test to analyse the authorship of five Shakespearean and four Marlowian scenes of almost 2,000 words or more as if they were disputed texts. The nine samples have been successfully associated with the reference corpus from which they have been extracted, and thus this method will be used to determine the likeliest authorship of the scenes of Arden of Faversham of such length.

This pre-study has also been useful to observe the kind of outcome that can be expected in the analysis of the scenes of Arden of Faversham. Since the reference corpora of the candidates have been compiled considering certain variables that have been overlooked in other studies, there are differences in the graphical representation of the results that these Zeta tests provide in comparison with others. Nevertheless, those conducted following this approach have had an effectiveness of 100%, which reflects its reliability.

5.5. Summary

This chapter has presented a series of pre-studies about the effectiveness of four authorship attribution methods to distinguish between Shakespearean and Marlowian scenes from plays that were written approximately between 1590 and 1595 and are not comedies. These have been based on the calculation of the average number of words per sentence of the scenes (Pre-study 1), the calculation of their lexical richness (Pre-study 2), n-gram tracing (Pre-study 3) and the application of the Zeta test (Pre-study 4).

The first three pre-studies have included samples whose length ranges from 100 to 450 words, from 500 to 950, from 1,100 to 1,700 and samples of almost 2,000 words or more, and thus they have been divided into four stages. In contrast, the pre-study on the reliability of the Zeta test has only focused on scenes of almost 2,000 words or more (see Section 4.5.5 for a justification of this methodological decision).
The pre-studies on the quantification of the average number of words per sentence and the lexical richness of the scenes have assessed whether there is enough intra-author consistency and inter-author variation within the undisputed scenes, whereas the pre-studies on n-gram tracing and the Zeta test have extracted scenes from the reference corpora to later discern whether these methods could associate them with the corpus from which they have been removed.

Firstly, the quantification of the average number of words per sentence has shown great intra-author variation when it has been applied to the first groups of scenes. Even though intra-author consistency has increased slightly with the third and the fourth group of scenes, these have not presented sufficient inter-author variation, so this method has been discarded from the final case study.

Secondly, the results derived from the calculation of the lexical richness of the scenes have presented a similar tendency to that of the previous method. While the first groups of scenes have shown no intra-author consistency, this has increased with the size of the samples. In fact, the intra-author consistency of the third and the fourth group has been superior to that obtained by the previous method. Nevertheless, the overlapping results of the scenes of both playwrights have not allowed for the inclusion of this test in the final case study.

Thirdly, n-gram tracing has shown a high degree of effectiveness in the study of the four types of scenes. Furthermore, in the few cases in which this method could not associate a scene with its author, the results have been inconclusive and no scene has been misattributed. Consequently, n-gram tracing has been selected to carry out the authorship analysis of all the scenes of Arden of Faversham.
Finally, the Zeta test has proved to be effective in analysing the authorship of Shakespearean and Marlowian samples of almost 2,000 words or more, so this method has been selected to study the scenes of Arden of Faversham of this length.

The following chapter will attribute to each scene of Arden of Faversham its likeliest authorship using n-gram tracing, which will be complemented by the Zeta test in those cases in which the disputed sample contains almost 2,000 words or more.

CHAPTER 6 | CASE STUDY: ATTRIBUTION OF AUTHORSHIP OF THE SCENES OF ARDEN OF FAVERSHAM

This chapter will present the results derived from the study of the authorship of the scenes of Arden of Faversham. The methods that will be applied to each scene are those that have proved to be reliable in the pre-studies that have analysed undisputed scenes of distinct lengths written by Shakespeare and Marlowe (see Chapter 5). The scenes of Arden of Faversham of almost 2,000 words or more will be analysed with n-gram tracing and the Zeta test, whereas only n-gram tracing will be applied to the shorter ones. The results of these studies will be provided and discussed in the order in which the scenes appear in the play.

6.1. Scene I.i (5,135 words)

The first scene of Arden of Faversham is the longest of the play, with 5,135 words. Consequently, its authorship has been analysed with n-gram tracing, whose results will be provided first, and a Zeta test, which will be presented afterwards.

Attribution of authorship of Scene I.i from Arden of Faversham with n-gram tracing

Scene I.i from Arden of Faversham has been extracted from the play and ALTXA has identified the n-grams that it shares with the Shakespearean and the Marlowian corpora. The size of these reference corpora has been balanced so that both candidates have an equal chance of being identified as the likeliest author of the scene (see Section 4.5.4).
The results derived from the quantitative analysis of the 4-grams, 3-grams and 2-grams that it has in common with the reference corpora are provided in Table 49, which is discussed below. This will be complemented by the qualitative analysis of the larger n-grams in common (see Section 4.5.4 for an account of this methodological decision).

Table 49 | N-gram tracing with Scene I.i from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 4-grams | 24 | 43 |
| 3-grams | 259 | 322 |
| 2-grams | 1,279 | 1,394 |

Table 49 shows that Scene I.i from Arden of Faversham has more 4-grams, 3-grams and 2-grams in common with the Marlowian corpus than with that of Shakespeare. There is a significant difference of almost twenty 4-grams if those that the scene shares with the corpus of Marlowe (43) are compared with those that it has in common with the Shakespearean corpus (24). The scene also has sixty-three more 3-grams in common with the Marlowian corpus (322) than with the corpus of the Bard (259), which is a remarkable distance for a study of this kind. Furthermore, it shares 1,394 2-grams with the Marlowian corpus, which creates a marked difference of one hundred and fifteen if these are compared with the ones that it has in common with the corpus of Shakespeare (1,279).

Scene I.i from Arden of Faversham shares the 7-gram "I know he loves me well but" with the Shakespearean corpus. Even though this construction can be seen as relatively distinctive due to its length, it contains the expression "he loves me well", which can also be found in the Marlowian corpus as a 4-gram and is a common combination of words in texts of this nature. The scene also shares with the corpus of the Bard the two 6-grams and three 5-grams that derive from the division of the previously referenced 7-gram, as well as the 5-gram "tell him what you say".
This 5-gram appears to be a common expression, so it should not be seen as a reliable idiolectal marker either.

On the other hand, the scene shares the 6-gram "for I had rather die than" with the reference corpus of Marlowe. The expression "I had rather die", by which a character expresses the will to sacrifice their life, cannot be found in the Shakespearean corpus and hence it could be seen as distinctive. In other words, this 6-gram seems to contain a more unusual combination of words than the 7-gram that the scene has in common with the Shakespearean corpus. Scene I.i from Arden of Faversham also shares three 5-grams with the Marlowian corpus, which are the two 5-grams that derive from the division of the 6-gram that has just been discussed and "I have it for you", which does not seem to include a distinctive selection of words.

If this scene was written by Shakespeare or Marlowe, the present study suggests that it is highly probable that Marlowe is its author, given that the quantitative analysis of the common 4-grams, 3-grams and 2-grams clearly links the authorship of the scene to him, and these results have been reinforced by the qualitative analysis of the larger n-grams in common.

Attribution of authorship of Scene I.i from Arden of Faversham with the Zeta test

Given its length, the authorship of the scene has also been analysed with a Zeta test, whose graphical representation can be observed in Figure 13. The plays of the reference corpora of Shakespeare and Marlowe have been divided by ALTXA into fragments of 2,000 words and the residual ones at the end of each play have been added to its last fragment.
Afterwards, ALTXA has compiled a list of 500 markers that appear in a considerable proportion of the Shakespearean fragments and are not present in many of the fragments into which the Marlowian corpus has been divided, as well as another list of 500 Marlowian markers (see Section 4.5.5 for a detailed explanation of the formulas underlying the calculation of the 500 markers for each author). The stop list with all the words that have been ignored as potential markers can be observed in Appendix 3, and Appendix 4 includes the lists of 500 markers used in the Zeta tests of this chapter.

The blue circles that form a cluster on the upper left area of the coordinate axis represent the fragments into which the plays from the Marlowian reference corpus have been divided, and their coordinates can be explained as follows. The value on the vertical axis is the number of Marlowian markers that a fragment contains divided by its number of distinct words, whereas the value on the horizontal axis is the number of Shakespearean markers that it contains divided by its number of distinct words. The Shakespearean fragments have been placed following the same principles and are represented by the red squares that form another cluster on the lower right area.

Lastly, the black triangles represent the two fragments into which Scene I.i from Arden of Faversham has been divided. Since the scene has 5,135 words, one of the fragments contains 2,000 words, whereas the other one has 2,000 plus the residual 1,135 words at the end of the scene. Their coordinates have also been determined by the criteria described in the previous paragraph.
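The division into fragments and the computation of the coordinates described above can be sketched as follows. This is a minimal illustration and not the code of ALTXA: the function names are hypothetical, the marker sets in the usage example are invented, and it is assumed that a marker is counted once per fragment (as a word type) regardless of how often it occurs.

```python
def split_into_fragments(tokens, size=2000):
    """Split a token list into fragments of `size` words.

    Residual words at the end are added to the last full fragment;
    a text shorter than `size` is kept as a single fragment.
    """
    if not tokens:
        return []
    n_full = max(len(tokens) // size, 1)
    fragments = [tokens[i * size:(i + 1) * size] for i in range(n_full)]
    fragments[-1] = tokens[(n_full - 1) * size:]  # last fragment absorbs the residue
    return fragments


def zeta_coordinates(fragment, shakespeare_markers, marlowe_markers):
    """Return (x, y) for a fragment.

    x = Shakespearean markers present / distinct words in the fragment
    y = Marlowian markers present / distinct words in the fragment
    Assumption: each marker is counted once per fragment, as a word type.
    """
    distinct = set(fragment)
    if not distinct:
        raise ValueError("fragment is empty")
    x = len(distinct & set(shakespeare_markers)) / len(distinct)
    y = len(distinct & set(marlowe_markers)) / len(distinct)
    return x, y


# A text of 5,135 words yields one fragment of 2,000 and one of 3,135 words.
print([len(f) for f in split_into_fragments(["w"] * 5135)])  # [2000, 3135]

# Invented marker sets and fragment, for illustration only.
print(zeta_coordinates(["a", "b", "c", "a", "d"], {"a", "x"}, {"b", "c"}))  # (0.25, 0.5)
```

Dividing by the number of distinct words keeps the longer final fragment of a text comparable with the fragments of exactly 2,000 words.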
As can be observed in Figure 13, it is discernible at a glance that the fragments of Scene I.i from Arden of Faversham are considerably closer to the Marlowian cluster than to the area occupied by that of Shakespeare, which means that the scene contains more markers from the Marlowian list than from that of the other candidate.

Nevertheless, for the sake of clarity, the centroid of each cluster has been calculated by averaging the X and Y coordinates of all its fragments, and the formula $|\overrightarrow{AB}| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ has been applied to measure the distances between the two fragments into which Scene I.i from Arden of Faversham has been divided and these centroids. The distances between the two disputed fragments and the centroid of the Marlowian cluster on the coordinate axis are 0.07085 and 0.0757 points, whereas their distances from the centroid of the Shakespearean cluster are 0.1293 and 0.12066 points. Therefore, this Zeta test reveals that Marlowe is the likeliest author of the scene.

Figure 13 | Zeta test with Scene I.i from Arden of Faversham

In brief, the authorship of Scene I.i from Arden of Faversham has been analysed with n-gram tracing and a Zeta test and both methods have concluded with great certainty that Marlowe is more likely to have written it than Shakespeare, if it was indeed written by one of them.

6.2. Scene II.i (916 words)

Scene II.i from Arden of Faversham contains 916 words, so its authorship has only been analysed with n-gram tracing. The number of 3-grams and 2-grams that this scene shares with the Shakespearean and the Marlowian corpora is provided in Table 50. After these results have been discussed, the 4-grams that it shares with them will be revealed and qualitatively analysed.
Table 50 | N-gram tracing with Scene II.i from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 47 | 62 |
| 2-grams | 280 | 294 |

Table 50 shows that Scene II.i from Arden of Faversham has more 3-grams and 2-grams in common with the Marlowian corpus than with the corpus of the Bard. While the scene shares 62 3-grams with the corpus of Marlowe, it has fifteen fewer in common, that is, 47, with the Shakespearean corpus, which is a significant distance if the fact that the text contains 916 words is taken into consideration. Furthermore, there is a difference of fourteen between the 2-grams that it shares with the Marlowian corpus (294) and those that it has in common with the corpus of Shakespeare (280).

While the scene has no larger n-grams in common with the Shakespearean corpus, it shares six 4-grams with the corpus of Marlowe, which are "as if he had", "I know not but", "I must to the", "what wilt thou give", "and I am bound" and "me and I am". These 4-grams are mainly formed by function words and the few lexical words that they include, such as "know" or "give", appear to be quite frequent, so it seems that none of these combinations of words is distinctive.

If Scene II.i from Arden of Faversham was written by Shakespeare or Marlowe, it seems highly probable that the latter is its author. The quantitative analysis of the common 3-grams and 2-grams clearly associates the authorship of the scene with the Marlowian corpus, with which it also shares six 4-grams, even though they are not particularly distinctive.

6.3. Scene II.ii (1,694 words)

The authorship of Scene II.ii from Arden of Faversham has been analysed with n-gram tracing and, even though it does not have 2,000 words, with the Zeta test.
The main reason underlying this decision is that its length is close to that of the fragments into which the undisputed plays of Shakespeare and Marlowe are divided for the Zeta test, so it seems sensible to include the scene in the procedure.

Attribution of authorship of Scene II.ii from Arden of Faversham with n-gram tracing

The software ALTXA has identified the n-grams that Scene II.ii from Arden of Faversham has in common with the reference corpora of the two candidates of the study. The number of shared 3-grams and 2-grams can be observed in Table 51, which is discussed below. This will be complemented by the qualitative analysis of the 4-grams in common.

Table 51 | N-gram tracing with Scene II.ii from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 62 | 85
2-grams | 455 | 501

The quantitative analysis of the common 3-grams and 2-grams clearly links the authorship of Scene II.ii from Arden of Faversham to Marlowe. The scene has 85 3-grams in common with the Marlowian corpus, whereas it shares twenty-three fewer, that is, 62, with the corpus of the Bard. There is also a marked difference of forty-six between the 2-grams that the scene shares with the corpus of Marlowe (501) and those that it has in common with the Shakespearean corpus (455).

Scene II.ii from Arden of Faversham shares 5 4-grams with the Shakespearean corpus: "no more but this", "that you have ta’en", "and for her sake", "as well as I" and "that thou hast done". These 4-grams seem to be common combinations of words that should not be seen as solid authorship markers.

The scene shares 6 4-grams with the Marlowian corpus, that is, one more than with the corpus of the Bard. These are "not so much as", "I must have more", "what’s that to thee", "as well as I", "you will let my" and "if he be not".
Among these constructions, "if he be not" appears to be an uncommon combination of words that includes "be", the subjunctive form of the verb "to be", instead of "is", which could be seen as an idiolectal choice.

According to this study, if Scene II.ii from Arden of Faversham was written by Shakespeare or Marlowe, it seems highly probable that the latter wrote it, given that the clear results of the quantitative analysis of the common 3-grams and 2-grams have been slightly reinforced by the qualitative analysis of the 4-grams in common.

Attribution of authorship of Scene II.ii from Arden of Faversham with the Zeta test

As pointed out earlier, the fact that the scene contains almost 2,000 words makes it suitable for the Zeta test, since its size is comparable to that of the fragments into which the reference corpora of Shakespeare and Marlowe are divided for this test. Figure 14 shows the position on the coordinate axis of the fragments of the two reference corpora as well as that of the fragment that represents Scene II.ii from Arden of Faversham. The criteria for the division of these fragments and the determination of their coordinates are the same as in the study of Scene I.i (see Appendix 3 for the stop list with all the words ignored as potential markers and Appendix 4 for the lists of 500 markers of the two candidates).

The Marlowian fragments, which are represented by blue circles, form a cluster in the upper left area, whereas the Shakespearean fragments, which are represented by red squares, form a cluster in the lower right area. It is evident that the black triangle that represents the 1,694-word fragment of Scene II.ii from Arden of Faversham is considerably closer to the Marlowian cluster. The distance between this fragment and the centroid of the Marlowian cluster is 0.06293 points, whereas its distance from the centroid of the Shakespearean cluster is 0.13372 points.
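The centroid comparison described above can be sketched in a few lines. The coordinates below are hypothetical and are not read from Figure 14; they only mimic the layout of a Marlowian cluster in the upper left and a Shakespearean cluster in the lower right. The function names are likewise illustrative assumptions.

```python
from math import hypot

def centroid(points):
    """Average the X and Y coordinates of all fragments in a cluster."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def distance(a, b):
    """Euclidean distance between two points on the coordinate axis."""
    return hypot(b[0] - a[0], b[1] - a[1])

# Hypothetical fragment coordinates (NOT the values plotted in the thesis).
marlowe_fragments = [(0.10, 0.30), (0.12, 0.34), (0.08, 0.32)]
shakespeare_fragments = [(0.30, 0.10), (0.34, 0.12), (0.32, 0.08)]
disputed_fragment = (0.14, 0.28)

d_marlowe = distance(disputed_fragment, centroid(marlowe_fragments))
d_shakespeare = distance(disputed_fragment, centroid(shakespeare_fragments))
closer = "Marlowe" if d_marlowe < d_shakespeare else "Shakespeare"
print(round(d_marlowe, 5), round(d_shakespeare, 5), closer)
```

The smaller of the two distances determines which cluster the disputed fragment is assigned to in this sketch.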
According to this Zeta test, Christopher Marlowe is the likeliest author of the text. The two methods employed to analyse the authorship of the text have produced consistent results and thus it could be argued that, if Scene II.ii from Arden of Faversham was written by Shakespeare or Marlowe, the latter is more likely to have written it.

Figure 14 | Zeta test with Scene II.ii from Arden of Faversham

6.4. Scene III.i (822 words)

The first scene of the third act of Arden of Faversham contains 822 words, so its authorship has only been studied with n-gram tracing. The results of the quantitative analysis of the 3-grams and 2-grams that the scene shares with the two reference corpora can be observed in Table 52. After these results are discussed, the 4-grams that the scene shares with them will be qualitatively analysed.

Table 52 | N-gram tracing with Scene III.i from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 36 | 30
2-grams | 232 | 222

Table 52 shows that Scene III.i from Arden of Faversham has a few more 3-grams and 2-grams in common with the Shakespearean corpus than with that of Marlowe, which breaks with the trend of the previous scenes. There is a narrow difference of six between the 3-grams that the scene shares with the Shakespearean corpus (36) and those that it has in common with the Marlowian corpus (30). Furthermore, while it has 232 2-grams in common with the corpus of the Bard, it shares ten fewer, that is, 222, with the corpus of Marlowe, which is again a relatively small difference if the length of the scene is taken into consideration.

The analysis of the larger n-grams in common conducted by ALTXA reveals that Scene III.i from Arden of Faversham shares the 4-gram "the hour of death" with the Shakespearean corpus, which holds a metaphorical meaning and therefore can be seen as distinctive.
The scene also has 2 4-grams in common with the Marlowian corpus, which are "let us go to", which seems to be a frequent expression, and "I like not this". The negative construction "I like not" appears to be a conscious choice of the author, given that he could have written "I do not like", and it cannot be found in the corpus of Shakespeare. This means that the scene shares one distinctive four-word construction with each of the two candidates of the study.

The study suggests that if Scene III.i from Arden of Faversham was written by Shakespeare or Marlowe, the Bard is slightly more likely to be its author, given that the quantitative analysis of the shared 3-grams and 2-grams associates the scene with him by a narrow margin, while the qualitative analysis of the larger n-grams in common seems to be inconclusive.

6.5. Scene III.ii (516 words)

Scene III.ii from Arden of Faversham contains 516 words, so its authorship has only been studied with n-gram tracing. The results of the quantitative analysis of the 3-grams and 2-grams that the scene has in common with the two reference corpora are provided in Table 53, which is discussed below. This will be complemented by the qualitative analysis of the shared 4-grams.

Table 53 | N-gram tracing with Scene III.ii from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 25 | 37
2-grams | 171 | 175

As can be observed in Table 53, there is a difference of twelve between the 3-grams that the scene shares with the Marlowian corpus (37) and those that it has in common with the corpus of the Bard (25), which can be seen as relatively significant, given that the scene is only 516 words long. The number of 2-grams that it shares with the two reference corpora is more balanced, since it has only four more in common with the corpus of Marlowe (175) than with that of Shakespeare (171), which is a narrow margin.
The analysis conducted by ALTXA reveals that the only 4-gram that Scene III.ii from Arden of Faversham shares with the corpus of Shakespeare is "it will not be", which seems to be a frequent construction and, as a matter of fact, can also be found in the Marlowian corpus, as will be mentioned in the next paragraph.

The scene has the 8-gram "and then let me alone to handle him" in common with the Marlowian corpus, which seems to be highly distinctive not only because of its length, but also because of the combination of words "let me alone to handle", which cannot be found in the Shakespearean corpus. This 8-gram can be divided into 2 7-grams, 3 6-grams, 4 5-grams and 5 4-grams. In addition to these 5 4-grams, the scene also shares with the Marlowian corpus the 4-grams "it will not be", "the pleasures of the" and "of the day and", which do not seem to be distinctive.

According to this study, if Scene III.ii from Arden of Faversham was written by Shakespeare or Marlowe, it seems highly probable that the latter is its author. The results of the quantitative analysis of the shared 3-grams and 2-grams, which associate the authorship of the scene with Marlowe, have been reinforced by the presence of a highly distinctive 8-gram in common.

6.6. Scene III.iii (357 words)

Table 54 shows the number of 3-grams and 2-grams that Scene III.iii from Arden of Faversham shares with the reference corpora of the two candidates of the study. After these results are discussed, the qualitative analysis of the shared 4-grams will be conducted.

Table 54 | N-gram tracing with Scene III.iii from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 10 | 19
2-grams | 90 | 99

The scene has 19 3-grams in common with the Marlowian corpus, while it shares 10, that is, nine fewer, with the corpus of the Bard. It also shares nine more 2-grams with the corpus of Marlowe (99) than with the Shakespearean corpus (90).
These differences appear to be acceptable if the fact that the scene only contains 357 words is taken into consideration.

According to the analysis conducted by ALTXA, Scene III.iii from Arden of Faversham has 2 4-grams in common with the Shakespearean corpus, which are "I’ll bear you company" and "it may be so". The first one should not be considered distinctive because the 3-gram "bear you company" can also be found in the Marlowian corpus. Similarly, the 4-gram "it may be so" is also present in the Marlowian corpus, as will be shown in the following paragraph.

The scene shares the 5-gram "you shall go with me" with the corpus of Marlowe, which does not seem to be a particular selection of words. It also has 3 4-grams in common with his corpus, which are the two that derive from the division of the aforementioned 5-gram and "it may be so", which can also be found in the Shakespearean corpus, as pointed out earlier.

If Scene III.iii from Arden of Faversham was written by Shakespeare or Marlowe, it seems highly probable that the latter is its author, given that the quantitative analysis of the common 3-grams and 2-grams links the scene to him with clarity for such a short sample. In addition, the scene also shares more 5-grams and 4-grams with the Marlowian corpus than with that of Shakespeare, despite the fact that none of them seems to be distinctive.

6.7. Scene III.iv (240 words)

ALTXA has identified the n-grams that Scene III.iv from Arden of Faversham shares with the Shakespearean and the Marlowian reference corpora. The number of common 3-grams and 2-grams can be observed in Table 55, which is discussed below. The qualitative analysis of the shared 4-grams will be provided afterwards.
Table 55 | N-gram tracing with Scene III.iv from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 6 | 14
2-grams | 73 | 91

Scene III.iv from Arden of Faversham has more 3-grams and 2-grams in common with the Marlowian corpus than with that of the Bard. While the scene has 14 3-grams in common with the corpus of Marlowe, it shares eight fewer, that is, 6, with the Shakespearean corpus, which is an acceptable difference if the length of the scene is taken into consideration. In addition, there is a notable difference of eighteen between the 2-grams that the scene shares with the Marlowian corpus (91) and those that it has in common with the Shakespearean corpus (73).

Scene III.iv from Arden of Faversham has no larger n-grams in common with the Shakespearean corpus, but it shares 2 4-grams with the corpus of Marlowe. These are "this shall be your", which seems to be a common construction, and "hear what he can", which is part of an unusual request both in the Marlowian corpus and the disputed scene and therefore can be seen as distinctive.[23]

According to this study, it seems highly probable that Marlowe is the author of Scene III.iv from Arden of Faversham if it was indeed written by one of the two candidates. This verdict is derived from the clarity of the quantitative analysis of the common 3-grams and 2-grams, which has been slightly reinforced by the qualitative analysis of the shared 4-grams.

6.8. Scene III.v (1,293 words)

Scene III.v from Arden of Faversham contains 1,293 words and its authorship has only been studied with n-gram tracing, given that its length is far from the 2,000 words of the fragments into which the undisputed works of Shakespeare and Marlowe are divided for a Zeta test.
The number of 3-grams and 2-grams that the scene has in common with the Shakespearean and the Marlowian reference corpora can be observed in Table 56. After these results are discussed, a qualitative analysis of the shared 4-grams will be provided.

[23] The linguistic contexts where this construction appears are "my lord, hear what he can allege", in the case of the corpus of Marlowe, and "let’s hear what he can say", in the case of the disputed scene.

Table 56 | N-gram tracing with Scene III.v from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 47 | 63
2-grams | 351 | 370

Table 56 shows that the Marlowian corpus shares more 3-grams and 2-grams with Scene III.v from Arden of Faversham than the Shakespearean corpus. There is a difference of sixteen between the 3-grams that the scene has in common with the corpus of Marlowe (63) and those that it shares with the corpus of the other candidate (47). In addition, while the scene has 370 2-grams in common with the corpus of Marlowe, it shares 351, that is, nineteen fewer, with the corpus of the Bard, which is a considerable difference.

The analysis conducted by ALTXA reveals that Scene III.v from Arden of Faversham has 2 4-grams in common with the Shakespearean corpus. These are "too good to be", which seems to be a common combination of words, and "thou know’st it well". Since the 2-gram "thou know’st" can also be found in the corpus of Marlowe, the latter 4-gram should not be seen as a solid marker either.

The scene also shares 4 4-grams with the Marlowian corpus, that is, two more than with the corpus of the Bard. These are "to the gates of", "come let us in", "here she comes and" and "I’ll none of that". Among these constructions, "I’ll none of that" stands out as a solid marker due to the use of the word "none" immediately after "I’ll", which seems to be an unusual combination.
In sum, the clarity of the results of the quantitative analysis of the common 3-grams and 2-grams, which has been reinforced by the qualitative analysis of the shared 4-grams, suggests that it is highly probable that Marlowe is the author of Scene III.v from Arden of Faversham if it was written by one of the two candidates on whom the study focuses.

6.9. Scene III.vi (1,265 words)

The last scene of the third act of Arden of Faversham contains 1,265 words and its authorship has only been analysed with n-gram tracing for the same reason given in the study of the previous scene. The number of 3-grams and 2-grams that Scene III.vi from Arden of Faversham shares with the Shakespearean and the Marlowian reference corpora can be observed in Table 57, which is discussed below. This will be complemented by the qualitative analysis of the larger n-grams in common.

Table 57 | N-gram tracing with Scene III.vi from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 55 | 72
2-grams | 362 | 379

Table 57 shows that Scene III.vi from Arden of Faversham shares 72 3-grams with the Marlowian corpus, whereas it has seventeen fewer in common, that is, 55, with the corpus of the Bard, which is a notable difference. The scene also has seventeen more 2-grams in common with the corpus of Marlowe (379) than with the Shakespearean corpus (362).

The study conducted by ALTXA reveals that the scene has the 5-gram "ay my good lord and" in common with the Shakespearean corpus. The selection of the word "ay" instead of "yes" could be seen as a conscious choice of the author, although, as underlined in Section 3.4.4, this linguistic form is more dialectal or context-dependent than idiolectal. In any case, the scene also shares the 4-gram "ay my good lord" with the Marlowian corpus, as will be shown further on, so the abovementioned 5-gram should not be seen as a solid marker for this study.
Apart from the 2 4-grams that derive from the division of the previously referenced 5-gram, the scene has another 2 4-grams in common with the Shakespearean corpus. These are "to speak with you" and "that thou hast done", which are frequent constructions in texts of this kind.

On the other hand, Scene III.vi from Arden of Faversham has 6 4-grams in common with the Marlowian corpus, that is, two more than with the corpus of Shakespeare, even though they do not share any 5-grams. These 4-grams are "as thou hast done", "give him a crown", "I have made a", "I would you were", "ay my good lord" and "on the sudden is". The 4-gram "I would you were" stands out as one of the most distinctive constructions found so far in this research, since the word "would" is used as a synonym of "want" or "wish",[24] which is a characteristic linguistic choice that cannot be found in the Shakespearean corpus.

Therefore, if Scene III.vi from Arden of Faversham was written by one of the two playwrights on whom the study focuses, it seems highly probable that Marlowe is its author. The clarity of the results of the quantitative analysis of the shared 3-grams and 2-grams has been greatly reinforced by the presence of a distinctive 4-gram in common between the scene and his reference corpus.

6.10. Scene IV.i (838 words)

Scene IV.i from Arden of Faversham contains 838 words, so its authorship has only been studied with n-gram tracing. Table 58 shows the number of 3-grams and 2-grams that it has in common with the two reference corpora. The discussion of these results will be complemented by the qualitative analysis of the larger n-grams in common.
Table 58 | N-gram tracing with Scene IV.i from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 41 | 40
2-grams | 249 | 262

The number of 3-grams that Scene IV.i from Arden of Faversham shares with the reference corpora of the two candidates of the study is almost identical, although it has one more in common with the Shakespearean corpus (41) than with that of Marlowe (40). In contrast, the scene shares thirteen more 2-grams with the Marlowian corpus (262) than with the corpus of the Bard (249), which is an acceptable difference.

According to the analysis conducted by ALTXA, the scene shares the 5-gram "the time hath been would" with the Shakespearean corpus. This 5-gram appears to be distinctive, since the combination of words "time hath" cannot be found in the Marlowian corpus and it includes "hath", the archaic form of "has", which is a linguistic choice. Apart from the 2 4-grams that derive from the division of this 5-gram, the scene also shares with the corpus of Shakespeare "and all the rest" and "go along with us", which seem to be frequent constructions.

[24] This construction appears in "I would you were his father too", in the case of the Marlowian corpus, and in "I would you were in state to tell it out", in the case of Scene III.vi from Arden of Faversham.

The scene shares 2 4-grams with the Marlowian corpus, that is, two fewer than with the corpus of Shakespeare. These 4-grams are "I have lost my", which does not seem to be distinctive, and "these arms of mine". The latter reflects a linguistic choice of the author, given that the same idea could have been expressed with the construction "my arms". It is worth mentioning that, while "these arms of mine" can be found in the scene and the Marlowian corpus, the corpus of Shakespeare only includes the expression "my arms", and thus this 4-gram could be seen as a robust idiolectal marker.
Scene IV.i from Arden of Faversham shares almost the same number of 3-grams with the corpora of the two candidates of the study, although it has more 2-grams in common with the Marlowian corpus by an acceptable margin. The qualitative analysis of the larger n-grams shows that, even though the scene shares a relatively distinctive 5-gram and a few 4-grams with the corpus of Shakespeare, it has a highly characteristic 4-gram in common with the Marlowian corpus. Therefore, it seems that Marlowe is slightly more likely to have written the scene than Shakespeare if it was indeed written by one of them.

6.11. Scene IV.ii (263 words)

Table 59 shows the number of 3-grams and 2-grams that Scene IV.ii from Arden of Faversham, which only contains 263 words, shares with the two reference corpora. Once Table 59 has been discussed, the qualitative analysis of the shared 5-grams and 4-grams will be presented.

Table 59 | N-gram tracing with Scene IV.ii from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 14 | 12
2-grams | 90 | 93

The quantitative analysis of the common 3-grams and 2-grams shows inconclusive results, since the scene shares two more 3-grams with the Shakespearean corpus than with the corpus of Marlowe (14 vs. 12), but it has three more 2-grams in common with the Marlowian corpus than with that of the Bard (93 vs. 90).

While the scene has no larger n-grams in common with the Marlowian corpus, it shares the 5-gram "and I will follow you" with the corpus of the Bard, which can be divided into 2 4-grams and does not seem to include a particular combination of words.
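The division of a longer shared n-gram into its shorter components, mentioned above, follows a simple rule: a sequence of k words contains k - n + 1 contiguous n-grams. The sketch below is only an illustration of that rule, with a hypothetical helper name and whitespace tokenisation as an assumption.

```python
def sub_ngrams(phrase, n):
    """List the contiguous n-grams contained in a longer word sequence."""
    words = phrase.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

five_gram = "and I will follow you"
print(sub_ngrams(five_gram, 4))

# The 8-gram reported for Scene III.ii, divided into its shorter components.
eight_gram = "and then let me alone to handle him"
component_counts = {n: len(sub_ngrams(eight_gram, n)) for n in (7, 6, 5, 4)}
print(component_counts)
```

This is why a single long match also appears in the counts of every shorter n-gram size.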
According to this study, it remains uncertain whether Scene IV.ii from Arden of Faversham was written by Shakespeare or Marlowe. The number of common 3-grams and 2-grams does not clearly associate the scene with either of the two candidates and, even though there is a 5-gram in common between the scene and the Shakespearean corpus, it does not seem to be significant enough to alter the final verdict on the authorship of the scene after the inconclusive results of the quantitative analysis.

6.12. Scene IV.iii (593 words)

The number of 3-grams and 2-grams that Scene IV.iii from Arden of Faversham has in common with the two reference corpora is presented in Table 60 and discussed below. This will be complemented by the qualitative analysis of the shared 5-grams and 4-grams.

Table 60 | N-gram tracing with Scene IV.iii from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 25 | 35
2-grams | 187 | 204

Table 60 shows that there is a difference of ten between the 3-grams that the scene shares with the Marlowian corpus (35) and those that it has in common with the corpus of the Bard (25). Furthermore, there is a difference of seventeen between the 2-grams that the scene shares with the corpus of Marlowe (204) and those that it has in common with the Shakespearean corpus (187), which is significant if the fact that this scene only has 593 words is taken into consideration.

The analysis conducted by ALTXA reveals that, while Scene IV.iii from Arden of Faversham does not share any larger n-grams with the Shakespearean corpus, it has a 5-gram and 5 4-grams in common with the corpus of Marlowe. The 5-gram that they share is "ay for a while but", which contains the collocation "for a while", which cannot be found in the Shakespearean corpus.
The 5 4-grams that the scene has in common with the Marlowian corpus are, apart from the two that derive from the division of the abovementioned 5-gram, "I hope to see", "as we have done" and "my life for thine". The latter 4-gram appears to be the most unusual of the group.

The quantitative analysis of the common 3-grams and 2-grams and the qualitative analysis of the larger n-grams in common show that, if Scene IV.iii from Arden of Faversham was written by Shakespeare or Marlowe, it seems highly probable that the latter wrote it.

6.13. Scene IV.iv (1,251 words)

The last scene of the fourth act of Arden of Faversham contains 1,251 words, so its authorship has only been studied with n-gram tracing. Table 61 shows the number of 3-grams and 2-grams that it shares with the two reference corpora, which will be discussed and complemented by the qualitative analysis of the common 4-grams.

Table 61 | N-gram tracing with Scene IV.iv from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
3-grams | 48 | 66
2-grams | 370 | 376

There is a notable difference of eighteen between the 3-grams that Scene IV.iv from Arden of Faversham has in common with the corpus of Marlowe (66) and those that it shares with the Shakespearean corpus (48). Even though the scene also has more 2-grams in common with the Marlowian corpus, there is only a narrow difference of six between those that it shares with his corpus (376) and with that of the Bard (370), which is surprising if the fact that the scene contains 1,251 words is taken into consideration.

Scene IV.iv from Arden of Faversham shares 5 4-grams with the Shakespearean corpus, which are "I will perform it", "thee on thy way", "to show the world", "as I have heard" and "may do thee good". Among these, "to show the world" could be seen as an authorship marker, since it might be considered a metonymy.
The scene also shares 8 4-grams with the Marlowian corpus, and these are "what hast thou done", "what wilt thou do", "see where he comes", "know you what you", "with me and be", "thee on thy way", "such prayers as these" and "let them have it". Among these 4-grams, "such prayers as these" stands out as the most distinctive of the group, given that it seems to be an unusual combination of words and the 2-gram "such prayers" cannot be found in the Shakespearean corpus.

If Scene IV.iv from Arden of Faversham was written by Shakespeare or Marlowe, the study suggests that the latter is slightly more likely to be its author. The reason why Marlowe has not been suggested as the likeliest author with a high degree of certainty is that, even though the quantitative analysis of the common 3-grams and 2-grams links the authorship of the text to him, the difference in the number of common 2-grams is quite narrow for such a large sample. Furthermore, although the scene has more 4-grams in common with his corpus and one of them appears to be distinctive, it also shares a 4-gram with the Shakespearean corpus that could be seen as a solid idiolectal marker.

6.14. Scene V.i (3,477 words)

The study of Scene V.i from Arden of Faversham could be seen as the most important of the thesis, given that it links the authorship of the scene to Marlowe with a degree of certainty that has no precedent either in the analysis of the previous scenes of the play or in the pre-studies. Since the scene contains 3,477 words, its authorship has been tested with n-gram tracing, whose results will be provided first, and a Zeta test, which will be presented afterwards.

Attribution of authorship of Scene V.i from Arden of Faversham with n-gram tracing

The number of 5-grams, 4-grams, 3-grams and 2-grams that Scene V.i from Arden of Faversham shares with the two reference corpora can be observed in Table 62. After these results are discussed, a qualitative analysis of the larger n-grams in common will be provided.
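The reason why Table 62 lists 5-grams and 4-grams, unlike the tables for the shorter scenes, is the inclusion criterion from Section 4.5.4: a type of n-gram enters the quantitative analysis only if the disputed text shares at least ten of them with one of the reference corpora. A minimal sketch of that rule follows, using the counts reported in Table 62. Reading "one of the reference corpora" as "the larger of the two counts reaches ten" is an interpretation, and the function name and the second set of counts are hypothetical.

```python
THRESHOLD = 10  # minimum shared n-grams with at least one corpus (Section 4.5.4)

# Counts from Table 62: n-gram size -> (Shakespearean corpus, Marlowian corpus).
table_62 = {5: (2, 11), 4: (11, 35), 3: (146, 236), 2: (870, 973)}

def sizes_for_quantitative_analysis(counts, threshold=THRESHOLD):
    """Return the n-gram sizes whose shared count reaches the threshold for some corpus."""
    return [n for n, (shakespeare, marlowe) in sorted(counts.items(), reverse=True)
            if max(shakespeare, marlowe) >= threshold]

included = sizes_for_quantitative_analysis(table_62)
# Marlowian count minus Shakespearean count for each included size.
differences = {n: table_62[n][1] - table_62[n][0] for n in included}
print(included, differences)

# Hypothetical counts for a very short scene: the 3-grams fall below the threshold.
short_scene = {3: (9, 7), 2: (60, 62)}
print(sizes_for_quantitative_analysis(short_scene))
```

Under this reading, the 5-grams qualify for Scene V.i only because of the Marlowian count, which is the point made in the discussion of the table.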
Table 62 | N-gram tracing with Scene V.i from Arden of Faversham

Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus
5-grams | 2 | 11
4-grams | 11 | 35
3-grams | 146 | 236
2-grams | 870 | 973

As explained in Section 4.5.4, a type of n-gram is included in the quantitative analysis only if the disputed text and one of the reference corpora share at least ten constructions of that kind. The fact that the common 5-grams have been included in this quantitative analysis reflects the high degree of resemblance between Scene V.i from Arden of Faversham and the corpus of Marlowe, given that it is the only occasion on which this has occurred. The scene shares 11 5-grams with the Marlowian corpus and only 2 with that of Shakespeare. There is also a marked difference of twenty-four between the number of 4-grams that the scene has in common with the corpus of Marlowe (35) and the number that it shares with the Shakespearean corpus (11), which is the largest found so far. The difference in the shared 3-grams is also the largest found in the thesis, since the scene has 236 in common with the corpus of Marlowe and 146, that is, ninety fewer, with the Shakespearean corpus. Lastly, while Scene V.i from Arden of Faversham has 973 2-grams in common with the corpus of Marlowe, it shares one hundred and three fewer (870) with that of the other candidate.

The analysis conducted by ALTXA reveals that Scene V.i from Arden of Faversham does not have any larger n-grams in common with the corpus of Shakespeare. In contrast, it shares the 10-gram "I have my wish in that I joy thy sight" with the corpus of Marlowe. This is the largest construction in common found in the case study and the pre-study, which reflects how unlikely it is that two texts share a combination of words of this kind.
In addition, it contains such a particular selection of words that it seems hard to believe that two different authors may have chosen it. This is one of the main findings of the thesis and provides solid evidence for the participation of Marlowe in the writing of the play, as will be discussed in depth in Chapter 7.

One could argue not only that Marlowe is more likely than Shakespeare to have written Scene V.i from Arden of Faversham, but also that it is difficult to suggest that the scene could have been written by a different author if the numbers of 5-grams, 4-grams, 3-grams and 2-grams that it shares with his reference corpus are taken into consideration, as well as the presence of such a distinctive 10-gram in common.

Attribution of authorship of Scene V.i from Arden of Faversham with the Zeta test

Figure 15 shows the position on the coordinate axis of the fragments into which the two reference corpora and Scene V.i from Arden of Faversham have been divided. The division of these three samples, the calculation of the 500 markers of each reference corpus and the determination of the coordinates of the fragments have followed the same criteria applied in previous studies. The words ignored in the preparation of the two lists of 500 markers have been included in Appendix 3, while Appendix 4 contains the lists of markers.

Figure 15 | Zeta test with Scene V.i from Arden of Faversham

The blue circles that form a cluster in the upper left area represent the fragments into which the undisputed plays of Marlowe have been divided, whereas the red squares that form another cluster in the lower right area represent the fragments into which the undisputed plays of Shakespeare have been divided. The black triangle represents Scene V.i from Arden of Faversham, which is considerably closer to the centroid of the Marlowian cluster than to that of the Shakespearean cluster.
The exact distance between the centroid of the Marlowian cluster and the position of the fragment that represents the scene is 0.0749 points, whereas its distance from the centroid of the Shakespearean cluster is 0.12418 points. Therefore, this Zeta test suggests that Marlowe is the likeliest author of the scene, which coincides with the results of the study conducted with n-gram tracing.

The clarity with which the Zeta test and, especially, n-gram tracing have associated the authorship of Scene V.i from Arden of Faversham with Marlowe makes it difficult to suggest that the scene could have been written by a different playwright. This constitutes a crucial breakthrough in the investigation and will be thoroughly addressed in Chapter 7.

6.15. Scene V.ii (106 words)

The attribution of authorship of Scene V.ii from Arden of Faversham appears to be of great difficulty, since it only contains 106 words and hence the chances of finding common n-grams with the two reference corpora are lower than in the analysis of the previous scenes of the play. The number of 3-grams and 2-grams that the scene shares with the Shakespearean and the Marlowian reference corpora can be observed in Table 63, which is discussed below. This will be complemented by the qualitative analysis of the only shared 4-gram.

Table 63 | N-gram tracing with Scene V.ii from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 5 | 14 |
| 2-grams | 38 | 52 |

Table 63 shows that there is a difference of nine if the 3-grams that the scene shares with the Marlowian corpus (14) are compared to those that it has in common with the corpus of the Bard (5). It also shares fourteen more 2-grams with the corpus of Marlowe (52) than with the Shakespearean corpus (38). These differences are remarkable if the fact that the scene only contains 106 words is taken into consideration.
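The counts reported in tables such as Table 63 can be illustrated with a minimal sketch of n-gram tracing. This is not the ALTXA implementation: the tokenizer, the function names and the decision to count distinct n-gram types (rather than every occurrence) are assumptions made for illustration. The only element taken from the thesis is the inclusion criterion of Section 4.5.4, namely that a type of n-grams enters the quantitative analysis when the disputed text shares at least ten of them with one of the reference corpora.

```python
import re
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_set(tokens: List[str], n: int) -> Set[NGram]:
    """Return the distinct word n-grams of size n found in the tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(disputed: str, reference: str, n: int) -> Set[NGram]:
    """N-grams of size n present in both the disputed text and a corpus."""
    return ngram_set(tokenize(disputed), n) & ngram_set(tokenize(reference), n)


def trace(disputed: str,
          corpora: Dict[str, str],
          sizes: Tuple[int, ...] = (5, 4, 3, 2),
          min_shared: int = 10) -> Dict[int, Dict[str, int]]:
    """Count shared n-grams per corpus for each size.

    A size is kept only if at least one corpus reaches min_shared,
    mirroring the inclusion criterion described in Section 4.5.4.
    """
    results: Dict[int, Dict[str, int]] = {}
    for n in sizes:
        counts = {name: len(shared_ngrams(disputed, ref, n))
                  for name, ref in corpora.items()}
        if max(counts.values()) >= min_shared:
            results[n] = counts
    return results


if __name__ == "__main__":
    # Toy strings, not the reference corpora used in the thesis.
    toy = {"A": "what care i though", "B": "the world is wide"}
    print(trace("what care i though the world", toy, sizes=(3, 2), min_shared=3))
```

With real corpora the default threshold of ten would apply; the toy call lowers it only so that such short strings produce a visible result. Sizes that fall below the threshold would be left for qualitative inspection, as is done with the larger n-grams in this chapter.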
The software ALTXA has identified a common 4-gram between Scene V.ii from Arden of Faversham and the corpus of Marlowe, which is "what care I though". This 4-gram appears to be distinctive, given that the omission of the auxiliary "do" stands as an idiolectal choice of the author.

If Scene V.ii from Arden of Faversham was elaborated by Shakespeare or Marlowe, it seems highly probable that the latter is its author. The clarity of the quantitative analysis of the common 3-grams and 2-grams has been slightly reinforced by the presence of a distinctive 4-gram in common, which is surprising if the small size of the scene is considered.

6.16. Scene V.iii (179 words)

Scene V.iii from Arden of Faversham does not share 10 3-grams with either of the two reference corpora, so only the common 2-grams will be quantitatively analysed. This will be complemented by the qualitative analysis of the common 4-grams and 3-grams.

Table 64 | N-gram tracing with Scene V.iii from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 2-grams | 60 | 62 |

Table 64 shows that the scene presents almost the same number of 2-grams in common with the two reference corpora. While it shares 62 with the corpus of Marlowe, it has two fewer in common, that is, 60, with that of Shakespeare, which is an insufficient distance to consider the results conclusive.

The analysis conducted by ALTXA reveals that Scene V.iii from Arden of Faversham shares the 4-gram "what shall I say" with the corpus of Shakespeare, which does not seem to be a distinctive construction. As a matter of fact, the 3-gram "what shall I" can be found in the Marlowian corpus, as will be revealed further on. It also shares 9 3-grams with the Shakespearean corpus, which are those derived from the division of the abovementioned 4-gram, as well as "I have done", "I did not", "have done this", "and I have", "me when we", "on me when" and "me and in".
None of these 3-grams seems to be distinctive. On the other hand, the scene has 8 3-grams in common with the corpus of Marlowe, that is, one fewer than with the corpus of Shakespeare. These are "wherefore stay we", "what shall I", "and bear me", "I have done", "and I have", "not on me", "I did it" and "me and in". None of them appears to be a solid marker either.

It seems uncertain whether Scene V.iii from Arden of Faversham was written by Shakespeare or Marlowe, according to the study. The scene shares only two more 2-grams with the corpus of Marlowe and, even though none of the larger n-grams that have been qualitatively analysed seems to be distinctive, it has one more 4-gram and one more 3-gram in common with the corpus of Shakespeare. In other words, these results are not conclusive enough to link the authorship of the disputed text to either of the two candidates of the study.

6.17. Scene V.iv (117 words)

As happened in the study of the previous scene, only the common 2-grams can be included in the quantitative analysis of Scene V.iv from Arden of Faversham, which will be presented in Table 65. This will be followed by the qualitative analysis of the common 3-grams.

Table 65 | N-gram tracing with Scene V.iv from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 2-grams | 37 | 43 |

As can be observed in Table 65, Scene V.iv from Arden of Faversham shares six more 2-grams with the Marlowian corpus (43) than with the corpus of the Bard (37), which can be seen as an acceptable distance if the fact that the scene only has 117 words is taken into consideration.

The software ALTXA has identified 6 3-grams in common between the scene and the Shakespearean corpus, which are "there is no", "my head and", "I have done", "that I have", "that I can" and "but I am". None of them seems to be a solid authorship marker. The scene also shares 8 3-grams with the Marlowian corpus, that is, two more than with the corpus of Shakespeare.
These are "that I have", "there is no", "him and his", "I am sure", "my head and", "I have done", "and cries for" and "but I am", which do not seem to be distinctive either.

If Scene V.iv from Arden of Faversham was written by Shakespeare or Marlowe, it seems slightly probable that the latter elaborated it. Even though the qualitative analysis of the common 3-grams seems to be inconclusive, the quantitative analysis of the shared 2-grams associates the authorship of the scene with Marlowe with an acceptable degree of certainty for such a short sample.

6.18. Scene V.v (321 words)

Table 66 shows the number of 3-grams and 2-grams that Scene V.v from Arden of Faversham shares with the two reference corpora, which will be discussed below and complemented by the qualitative analysis of the 4-grams in common.

Table 66 | N-gram tracing with Scene V.v from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 3-grams | 13 | 18 |
| 2-grams | 126 | 138 |

While the scene shares 18 3-grams with the Marlowian corpus, it has five fewer in common, that is, 13, with the corpus of Shakespeare. In addition, there is a difference of twelve if the 2-grams that the scene has in common with the corpus of Marlowe (138) are compared to those that it shares with the Shakespearean corpus (126).

Scene V.v from Arden of Faversham shares 2 4-grams with each of the two reference corpora. The 2 4-grams that it has in common with the Shakespearean corpus are "how long shall I", which seems to be a common combination of words, and "and bring away the", which could be seen as relatively distinctive, since the combination of words "bring away" cannot be found in the corpus of Marlowe. The 2 4-grams that the scene has in common with the Marlowian corpus are "what should I say", which does not seem to be distinctive, and "this hell of grief", which is a metaphor that reflects the emotional pain of a character and therefore stands as a solid authorship marker.
It seems highly probable that, if Scene V.v from Arden of Faversham was written by Shakespeare or Marlowe, the latter is its author. The quantitative analysis of the shared 3-grams and 2-grams associates the authorship of the scene with him, and this has been reinforced by the presence of a highly distinctive 4-gram in common.

6.19. Epilogue or Scene V.vi (148 words)

The number of 2-grams that Scene V.vi from Arden of Faversham has in common with the two reference corpora will be presented in Table 67. This will be discussed and complemented by the qualitative analysis of the shared 4-grams and 3-grams.

Table 67 | N-gram tracing with Scene V.vi from Arden of Faversham

| Type of n-grams | Common n-grams with the Shakespearean corpus | Common n-grams with the Marlowian corpus |
| --- | --- | --- |
| 2-grams | 40 | 42 |

Table 67 shows that Scene V.vi from Arden of Faversham shares only two more 2-grams with the corpus of Marlowe (42) than with that of Shakespeare (40), so this quantitative analysis does not clearly associate its authorship with either of the two candidates.

The analysis conducted by ALTXA reveals that the scene shares the 4-gram "this above the rest" with the Marlowian corpus, which does not seem to be an unusual combination of words. It also has 6 3-grams in common with the Marlowian corpus, which are, apart from those that derive from the division of this 4-gram, "and in the", "as for the", "by force and" and "is to be". None of them seems to be distinctive. On the other hand, the scene has 4 3-grams in common with the corpus of Shakespeare, which are "and in the", "deed was done", "the Lord Protector" and "as for the". Among these, the only one that seems to be relatively uncommon is "the Lord Protector", but the 2-gram "Lord Protector" is also present in the corpus of Marlowe, so it cannot be seen as a solid marker for the study.

It seems uncertain whether Scene V.vi from Arden of Faversham was written by Shakespeare or Marlowe, according to the study.
It only shares two more 2-grams with the corpus of Marlowe than with that of Shakespeare and, even though it also has one more 4-gram and two more 3-grams in common with the Marlowian corpus, none of them seems to be distinctive. All these differences are so narrow that it does not seem fair to attribute the authorship of the text to Marlowe.

6.20. Summary

This chapter has analysed the authorship of the nineteen scenes of Arden of Faversham independently, considering the hypothesis that the play may have been written in collaboration and that William Shakespeare and Christopher Marlowe may have been involved in that process. Depending on the length of the scenes, these have been analysed with n-gram tracing only or with n-gram tracing complemented by the Zeta test, since these are the methods that proved successful in distinguishing between undisputed scenes of the two playwrights in Chapter 5. The attribution of authorship of the scenes has been presented as highly probable, slightly probable or uncertain depending on the degree of certainty with which n-gram tracing and, if applied, the Zeta test have associated them with one of the candidates. Nevertheless, these studies need to be addressed from a holistic perspective with the purpose of drawing general conclusions about the authorship of the play and whether the objectives of the research have been met or not, which constitutes the focus of the next chapter.

CHAPTER 7 | DISCUSSION OF THE RESULTS

The scenes of Arden of Faversham have been analysed as independent texts in Chapter 6 to discern their likeliest authorship considering William Shakespeare and Christopher Marlowe as the candidates for such attribution. These studies will be discussed from a holistic perspective in this chapter to extract a series of conclusions about the authorship of the play and the objectives and hypotheses delineated at the beginning of the thesis (see Section 1.2).
The results of the nineteen studies conducted in Chapter 6 will be presented in the form of a table to assess which groups of scenes have been more easily attributed and which ones could be seen as problematic. This table will include the title of the scene, its length, the method or methods with which its authorship has been analysed, to which candidate it has been attributed and, if it has indeed been attributed to one of them, the degree of certainty with which the attribution has taken place.

Table 68 | Summary of the results derived from the case study

| Title of the scene | Length of the scene | Methods involved in the attribution | Likeliest author | Certainty of the attribution |
| --- | --- | --- | --- | --- |
| I.i | 5,135 words | N-gram tracing and Zeta test | Marlowe | Highly probable |
| II.i | 916 words | N-gram tracing | Marlowe | Highly probable |
| II.ii | 1,694 words | N-gram tracing and Zeta test | Marlowe | Highly probable |
| III.i | 822 words | N-gram tracing | Shakespeare | Slightly probable |
| III.ii | 516 words | N-gram tracing | Marlowe | Highly probable |
| III.iii | 357 words | N-gram tracing | Marlowe | Highly probable |
| III.iv | 240 words | N-gram tracing | Marlowe | Highly probable |
| III.v | 1,293 words | N-gram tracing | Marlowe | Highly probable |
| III.vi | 1,265 words | N-gram tracing | Marlowe | Highly probable |
| IV.i | 838 words | N-gram tracing | Marlowe | Slightly probable |
| IV.ii | 263 words | N-gram tracing | Uncertain | - |
| IV.iii | 593 words | N-gram tracing | Marlowe | Highly probable |
| IV.iv | 1,251 words | N-gram tracing | Marlowe | Slightly probable |
| V.i | 3,477 words | N-gram tracing and Zeta test | Marlowe | Highly probable |
| V.ii | 106 words | N-gram tracing | Marlowe | Highly probable |
| V.iii | 179 words | N-gram tracing | Uncertain | - |
| V.iv | 117 words | N-gram tracing | Marlowe | Slightly probable |
| V.v | 321 words | N-gram tracing | Marlowe | Highly probable |
| V.vi | 148 words | N-gram tracing | Uncertain | - |

Table 68 shows that the only scene of Act I and the two scenes of Act II from Arden of Faversham have been attributed to Marlowe with a high degree of certainty.
Two of these three scenes (I.i and II.ii) are long enough to be analysed with n-gram tracing and the Zeta test and both methods have clearly associated their authorship with him. The remaining scene (II.i) has only been analysed with n-gram tracing, which has also linked its authorship to Marlowe with clarity. Therefore, the present study can conclude that, if the first two acts of Arden of Faversham were written by Shakespeare or Marlowe, the latter can be considered their author without major doubts.

In contrast, n-gram tracing has associated Scene III.i from Arden of Faversham with Shakespeare. The scene has been attributed to him with a low degree of certainty, given that the difference in the number of n-grams that it shares with the two reference corpora is narrow. The rest of the scenes of the third act of Arden of Faversham, that is, III.ii, III.iii, III.iv, III.v and III.vi, have followed the trend that can be found in the scenes of the first two acts of the play and have been attributed to Marlowe with a high degree of certainty using n-gram tracing. In other words, the scenes of Act III have been attributed to Marlowe without major doubts except for the first one, which has been associated with Shakespeare by a small margin. It is worth mentioning that this is the only scene of the play whose authorship has been attributed to him.

The fourth act could be seen as the most problematic of the study, given that it contains four scenes whose authorship has been analysed with n-gram tracing and only one has been attributed with a high degree of certainty. Scene IV.i has been linked to Marlowe by a narrow margin, whereas the study of the second scene has shown inconclusive results. Scene IV.iii is the one that has been attributed to Marlowe with a high degree of certainty and Scene IV.iv has also been attributed to him, but without great clarity.
Even though the authorship of three of the four scenes has been linked to Marlowe, it has occurred without great certainty on two occasions, for which one could ponder that there might be a different author involved in the elaboration of the fourth act of Arden of Faversham, as will be developed further on. The authorship of Scene V.i, which has been analysed with n-gram tracing and the Zeta test, has been associated with Marlowe with a degree of certainty that has no precedent in the thesis. This gives rise to the idea that it is extremely unlikely that the scene could have been written by a different playwright, which constitutes a significant breakthrough in the investigation that will be expounded further on. The five remaining scenes of Act V have only been analysed with n-gram tracing. While Scene V.ii has been linked to Marlowe with clarity, the results derived from the analysis of Scene V.iii have been inconclusive. The study of Scene V.iv suggests that Marlowe is slightly more likely than Shakespeare to have written it, and Scene V.v has also been attributed to Marlowe, but with a high degree of certainty. Lastly, the authorship of Scene V.vi has remained uncertain. In sum, if the first two scenes of Act V from Arden of Faversham were written by Shakespeare or Marlowe, it seems almost certain that the latter elaborated them. Nevertheless, the four remaining scenes are more problematic and, while two of them have been linked to Marlowe, one of which by a narrow margin, the other two have not been associated with any of the two candidates of the study. In total, Arden of Faversham contains 19 scenes and, when their authorship has been analysed independently considering William Shakespeare and Christopher Marlowe as the possible candidates, the latter has been suggested as the likeliest author on 15 occasions. 
On 3 of the 4 occasions on which Marlowe has not been selected as the likeliest author of a scene, the results have been inconclusive, whereas the remaining scene (III.i) has been attributed to Shakespeare by a narrow margin.

The first conclusion that can be inferred from the present research is that, if Arden of Faversham was written by Shakespeare and/or Marlowe, the latter elaborated most of the scenes of the play. Nevertheless, there have been many other playwrights considered as possible authors of the text (see Section 3.4.4), so this research should be perceived as the first milestone of a long-term project in which the candidate that has been designated as the likeliest author of every scene should be compared in future studies with the other candidates. These comparisons should start with Thomas Kyd, who has been suggested by scholars as the most solid alternative to Shakespeare and Marlowe (see Section 2.3). The inconclusive results derived from the analysis of certain scenes from the fourth and the fifth act may be interpreted as a consequence of their small size, which hinders the attribution of their authorship, or as a reflection of the participation of a playwright other than the two candidates of the study in the creation of the disputed text.

Even though one of the main objectives of the thesis is only to determine if Shakespeare is more likely than Marlowe to have written each scene of Arden of Faversham or vice versa, a conclusion that can be inferred from Chapter 6 is that the participation of Marlowe in the elaboration of the play seems quite probable for two main reasons. Firstly, it is possible that, if Marlowe had not been involved in the creation of the play, these authorship studies would have produced more balanced results between the two candidates of the study, instead of attributing 15 of the 19 scenes of the play to him with a high degree of certainty in almost every case.
Secondly, the manner in which the Zeta test and, especially, n-gram tracing have associated the authorship of Scene V.i from Arden of Faversham with Marlowe is so overwhelming that it seems difficult to suggest that a different playwright may have been involved in the elaboration of the scene. As pointed out in that analysis (see Section 6.14), the number of 5-grams, 4-grams, 3-grams and 2-grams in common between the scene and the Marlowian corpus is not matched in any of the other 57 studies that have been conducted using n-gram tracing in the thesis. Furthermore, the qualitative analysis of the larger n-grams in common has revealed that the scene and the Marlowian corpus share the 10-gram "I have my wish in that I joy thy sight". This common 10-gram, which stands as the largest found in the investigation, represents such a distinctive combination of words that it seems impossible that two different playwrights might have selected it, unless one of them was committing plagiarism. Therefore, this investigation has provided substantial evidence to suggest that, even though there is still the need to include more candidates in future studies about the authorship of Arden of Faversham, Christopher Marlowe was clearly involved in its creation, at least in that of Scene V.i, which constitutes a major breakthrough.

The results provided in Chapter 6 also allow a series of conclusions to be drawn about the participation of William Shakespeare in the elaboration of Arden of Faversham. It seems that, if he participated in it, he made a minor contribution, given that only Scene III.i has been attributed to him and such attribution has been defined as slightly probable.
This means that, if the authorship of Scene III.i is analysed by comparing the Shakespearean reference corpus with the corpora of playwrights other than Marlowe, there is a considerable chance of finding a candidate whose idiolect presents a higher degree of resemblance with that displayed in the scene. This finding can also be seen as significant, given that it differs from the results obtained by Kinney (2009), whose Zeta test reached the conclusion that Shakespeare was not only involved in the elaboration of Scene III.i, as suggested in this study, but also in that of Scenes III.ii, III.iii, III.iv, III.vi and V.iii, which have been attributed to Marlowe in this study with a high degree of certainty. It also differs from the results presented by Elliott and Greatley-Hirsch (2017), who associated with Shakespeare the first part of Scene I.i and the totality of Scenes III.vi and IV.i after conducting the Zeta test, whereas these scenes have been attributed to Marlowe in this thesis (see Section 3.4.4 for an account of these studies).

On the other hand, this thesis does not contradict the findings derived from Taylor’s investigation (2019), where he attributed the authorship of the first words of Scene IV.i to Thomas Watson (see Section 3.4.4), given that Watson has not been considered as a possible candidate here. As a matter of fact, Scene IV.i is one of the few whose attribution to Marlowe is only slightly probable, and thus this could reflect the participation of a different playwright in its elaboration. It is also worth mentioning that this thesis has suggested the hypothesis that word n-grams of at least two words can be more effective than character n-grams and word 1-grams in dealing with an authorship problem of this kind, which coincides with the approach followed by Taylor in his study.
The two hypotheses that have led to such distinct results in comparison with the previously referenced studies of Kinney (2009) and Elliott and Greatley-Hirsch (2017) are that authors should be compared individually during the Zeta test and that the reference corpora of the candidates should be compiled with undisputed works that have similar characteristics to those of the disputed text.

The comparison of a single author with a group of authors during a Zeta test does not seem statistically sensible, as explained in Section 4.5.5. If, for instance, Shakespeare was compared with a group of ten playwrights, many discriminators that would be useful to distinguish between Shakespeare and one of those authors would probably be lost due to the average values of the other candidates of the group, which could be illustrated as follows. If the word "beseech" was highly frequent in the Shakespearean corpus and it was barely present in the Marlowian corpus, it would become one of the 500 Shakespearean markers for the test if this only compared these two authors. Nevertheless, if Marlowe was included in a group with nine other playwrights and they all used the word "beseech" frequently, this word would not be included as a discriminator in the study and thus a great opportunity to compare the two playwrights would be missed. It seems that the combination of many idiolects in the same corpus might not represent any of its parts properly and that they could mean nothing as a group, statistically speaking. For that reason, studies of this kind should compare authors individually, as has been consistently suggested throughout the thesis.
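The "beseech" illustration above can be made concrete with a small numerical sketch. It uses one common formulation of a Zeta score (the proportion of base-author segments that contain a word plus the proportion of counter-set segments that lack it), which is an assumption here: the exact computation applied in the thesis is the one described in Section 4.5.5 and may differ. The segments are invented toy data, not fragments of the reference corpora.

```python
from typing import List, Set


def presence(word: str, segments: List[Set[str]]) -> float:
    """Proportion of segments in which the word occurs at least once."""
    return sum(word in segment for segment in segments) / len(segments)


def zeta(word: str, base: List[Set[str]], counter: List[Set[str]]) -> float:
    """Zeta score in the range 0..2 (assumed formulation, see lead-in).

    High when the word is consistently present in the base segments
    and consistently absent from the counter segments.
    """
    return presence(word, base) + (1.0 - presence(word, counter))


# Toy segments: "beseech" appears in every Shakespearean segment,
# in no Marlowian segment, and in every segment of nine hypothetical
# other playwrights (two segments each).
shakespeare = [{"beseech", "lord"}, {"beseech", "king"}]
marlowe = [{"lord", "gold"}, {"king", "crown"}]
nine_others = [{"beseech", "sir"} for _ in range(18)]

pairwise = zeta("beseech", shakespeare, marlowe)
grouped = zeta("beseech", shakespeare, marlowe + nine_others)

if __name__ == "__main__":
    print(f"Shakespeare vs Marlowe alone: {pairwise:.2f}")
    print(f"Shakespeare vs group of ten:  {grouped:.2f}")
```

In the pairwise comparison the word reaches the maximum score, whereas against the pooled group its score drops sharply even though Marlowe's own usage has not changed. Whether a word actually falls out of a list of 500 markers depends on the scores of all the other words, which this sketch does not model.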
As mentioned earlier, the most important hypothesis of the thesis is that, since the idiolect is a dynamic phenomenon, the style of an author may vary greatly depending on the period and the type of text that they write, and thus the reference corpora of the candidates should be narrowed down and only include plays that belong to a similar period to that in which the disputed text was created, with which they should also share a similar tone. Scholars state that there are certain idiolectal choices that remain uniform in all the creations of an author, which is something I agree with, but it must be borne in mind that playwrights were constantly imitating each other during the Elizabethan period and there are few stylistic differences among them, so those idiolectal choices that remain uniform in their entire work are probably shared by many. For that reason, it has been suggested that there might be a series of idiolectal features that can only be found in the plays that Shakespeare and Marlowe wrote during a certain period of time and that had a tragic tone, and thus their identification and quantification can be the key to discerning the likeliest authorship of a disputed text of similar characteristics. Following that premise, their reference corpora have been compiled with plays that were elaborated between 1590 and 1595, since Arden of Faversham was written in approximately 1592, and are not comedies, since this play is a tragedy. This approach differs from that adopted by Taylor (2019), who suggested that it is more effective to compile the reference corpora with texts that belong to dissimilar periods and even distinct genres, such as poetry. It also differs from those adopted by Kinney (2009) and Elliott and Greatley-Hirsch (2017), who compiled their reference corpora with plays written from 1580 to 1619 and from 1580 to 1594, respectively, including comedies in both cases.
It is impossible to validate or refute the hypotheses suggested for the conduction of this thesis until these are consistently tested in future research involving distinct types of texts, since there will never be studies able to attribute the authorship of Arden of Faversham beyond any reasonable doubt. Nevertheless, the fact that the adoption of these methodological principles has generated such a distinct outcome from that of previous studies could raise a debate on the reliability of each approach. I would suggest that the compilation of reference corpora that ignores variables such as the genre of the texts and the period in which they were written can be successful when two authors that have notably different idiolectal features are compared. In contrast, when facing the authorship of an Elizabethan text or any kind of sample whose potential authors present such similar styles, the most representative corpora are not the larger ones, but those that are able to reproduce more faithfully the conditions in which the disputed text was elaborated. The fact that forensic linguistics is a relatively new discipline and that many of the computational tools used for the conduction of these studies have only been available for the last few years explains the lack of consensus on the methodological approach that should be adopted. Therefore, this research could be seen as another contribution to the development of a discipline that has been constantly evolving over the last decades. One of the two main objectives of the thesis, which is to analyse the authorship of the nineteen scenes of Arden of Faversham independently, has been accomplished with the assistance of ALTXA, whose development as a free software stands as its other main objective (see Section 1.2). This computational tool can quantify the relative frequency of a keyword in a text, as well as its average number of words per sentence and lexical richness. 
It can also identify the word n-grams that two samples share and conduct a Zeta test, and all these functionalities can be accessed in an intuitive interface that seeks to facilitate the work of other forensic linguists and the spread of authorship attribution studies in educational contexts, where there is usually a lack of experts in the field. The two main objectives of the thesis have contributed reciprocally to their fulfillment, since the study of the play Arden of Faversham required the use of ALTXA, and ALTXA has proved its validity by being applied in the analysis of Arden of Faversham.

Chapter 8 will summarize the steps taken in this thesis, its main findings, how these can relate to the initial hypotheses and the manner in which its two main objectives have been accomplished.

CHAPTER 8 | CONCLUSION AND FUTURE LINES OF RESEARCH

This final chapter consists of two sections. The first one will summarize the objectives, hypotheses, theoretical foundations, methodology, results and main findings of the thesis, whereas the second one will highlight its limitations and the path that could be adopted in future research.

8.1. Summary and implications of the findings

This doctoral thesis had two main objectives. The first one was to study from a forensic linguistic perspective the authorship of the nineteen scenes of the Elizabethan play Arden of Faversham considering Shakespeare and Marlowe as the possible candidates. The other one was to develop a computer program that could allow such analyses to be conducted and pave the way for the spread of the discipline in professional and academic contexts.

A brief historical and literary introduction about the playwrights and the text that constitute the focus of the study was provided in Chapter 2.
This presented a general overview of William Shakespeare’s life events until the last decade of the sixteenth century, that is, when Arden of Faversham was created, as well as a complete biography of Christopher Marlowe, since he died during that period. Shakespeare only received basic education and seems to have forged his way as a playwright by working as a schoolmaster first and joining the Queen’s Men as an actor afterwards, which made him look like an intruder to Robert Greene and others who followed a more traditional path, as is the case of Marlowe. The latter received an exquisite education in Cambridge, where he had the opportunity to learn from already established playwrights like Greene himself, and seems to have combined his literary career with working as a spy, which might explain his mysterious death and the many subsequent speculations. These biographical notes also established a connection between the two playwrights, who have been credited as the co-authors of the three parts of Henry VI. This, combined with the testimony of literary experts suggesting that the cooperation between two or more playwrights in the elaboration of plays like Arden of Faversham was a common practice at the time, justifies the approach followed for the analysis. The play itself was presented as a literary text built upon a historical event, since the assassination on which it focuses was documented by Holinshed in the Chronicles of England, Scotland and Ireland in 1577. A summary of its plot and main literary features was presented under the belief that this would be of use to have a deeper understanding of the subsequent linguistic analysis, since both disciplines could be seen as complementary in a study of this kind.
Lastly, a description of the approaches adopted over the years to deal with the authorship of this anonymous play was provided to connect the contents of this chapter with those of the following one, which stands as the linguistic background of the thesis.

Chapter 3 provided the reader with a definition of forensic linguistics and an account of its historical development and main applications, with a special emphasis on authorship attribution studies. Even though forensic linguistics has not been acknowledged as a discipline until recently, since the term was first used in 1968 by Professor Jan Svartvik, there have been many legal cases throughout history in which the use of language has played a crucial role. Some of these were discussed in this chapter to illustrate the inherent relationship between law and language. The three main fields of study into which forensic linguistics is currently divided were presented and discussed. These are known as the written language of the law, which is related to the need to make legal documents accessible to the average citizen; the spoken language of the law, which analyses the oral interactions that take place in legal contexts such as police investigative interviews; and the linguist as an expert witness, which refers to those cases in which the linguist gives advice and provides evidence in legal processes. Each of these fields of study was carefully explained with practical cases in the belief that many linguists are not yet familiar with forensic linguistics and hence this thesis could be their first contact with the discipline. The chapter moved from the general to the specific, since authorship attribution studies, which correspond to one of the roles that the linguist may adopt when acting as an expert witness, were thoroughly explained in its last section.
This discussed the fundamentals of plagiarism detection, the analysis of criminal texts with an open and a closed set of suspects and, lastly, the study of historical and literary texts, which was addressed in depth by pointing out its methodological foundations and reviewing previous research on the authorship of Arden of Faversham. Chapter 3 was not merely descriptive in any of its sections, given that it provided theoretical contributions during the discussion of certain concepts and practical cases to anticipate the approach adopted for the research.

The methodological aspects of the thesis were developed in Chapter 4. In the belief that the idiolect of an author is such a dynamic phenomenon that it may vary depending on the period in which they write and the type of text that they produce, it was suggested that the reference corpora of Shakespeare and Marlowe should be compiled with plays that were written in a similar period to that of Arden of Faversham and that present a similar tone. This, which stands as the main hypothesis of the thesis, led to the selection of Richard III and Richard II to compile the Shakespearean corpus and Edward II and The Jew of Malta to compile that of Marlowe, given that they were written no more than three years apart from Arden of Faversham and are plays with a tragic tone. These four plays and Arden of Faversham were extracted from the archives of Project Gutenberg, whose editions preserve the word choices of the first published manuscripts. The subsequent analysis focused on these word choices rather than on spelling, since spelling is more likely to have been altered, even by those who transcribed the abovementioned manuscripts. After their extraction, the texts were cleaned to optimize the subsequent analysis by deleting every stage direction or linguistic element that was not part of a dialogue, under the assumption that these are less likely to reflect idiolectal features.
The decision to structure the analysis into a series of pre-studies and a case study was of paramount importance to ensure that the methods used for the analysis of Arden of Faversham are effective in a similar linguistic context. Initially, five tests were selected for the pre-studies, but the one based on the quantification of the relative frequency of a list of keywords in the samples was eventually discarded because of its reliance on subjective criteria. The four remaining methods, which consist in the quantification of the average number of words per sentence and the lexical richness of the texts, the identification of common n-grams and the Zeta test, were included in the pre-studies. These analysed samples taken from the Shakespearean and the Marlowian reference corpora as if they were disputed in order to assess the reliability of each method with each type of scene depending on its length (from 100 to 450 words, from 500 to 950, from 1,200 to 1,700 and almost 2,000 or more). Chapter 4 also showed the way in which the five linguistic procedures that have been referenced in this paragraph can be carried out with ALTXA and how the necessity to design it arose as a result of the lack of functionalities of some of the already existing computational tools and the lack of accessibility of others, which were mentioned to give the reader an idea of the niche that this software intends to occupy.

The results derived from the pre-studies were presented in Chapter 5. The first one evaluated the effectiveness of the calculation of the average number of words per sentence as an authorship discriminator between Shakespeare and Marlowe. It was divided into four stages, that is, one for each of the four types of scenes according to their length. The purpose of this pre-study was to discern if there is enough intra-author consistency and inter-author variation in the scenes of the two playwrights.
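The two stylometric measures discussed in these pre-studies can be illustrated with a minimal Python sketch. It is not the ALTXA implementation: the tokenization rules, the use of the type-token ratio as the measure of lexical richness and the range-overlap check for inter-author variation are all assumptions made here for illustration, and the function names are hypothetical.

```python
import re
from statistics import mean


def sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?'; a simplification.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words(text):
    # Assumption: a word is a run of letters or apostrophes, case-folded.
    return re.findall(r"[A-Za-z']+", text.lower())


def avg_words_per_sentence(text):
    # Mean number of words across the sentences of a scene.
    return mean(len(words(s)) for s in sentences(text))


def type_token_ratio(text):
    # Assumption: lexical richness measured as distinct words / total words.
    toks = words(text)
    return len(set(toks)) / len(toks)


def ranges_overlap(values_a, values_b):
    # True when the two authors' ranges of values intersect, which would
    # indicate too little inter-author variation for the feature to
    # discriminate between them (hypothetical criterion).
    return min(values_a) <= max(values_b) and min(values_b) <= max(values_a)
```

Under these assumptions, each function would be applied to every scene of a given length group, and `ranges_overlap` would then compare the lists of values obtained for the two playwrights.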
The average number of words per sentence of Shakespearean and Marlowian scenes from the four groups was calculated and the results showed that, even though the intra-author consistency increases with the size of the samples, especially that of Marlowe, the results of the scenes of both playwrights overlap frequently, so this parameter was not included in the case study.

The second pre-study, which was also structured into four stages, focused on the calculation of the lexical richness of scenes from both reference corpora with the same purpose of discerning if there is enough intra-author consistency and inter-author variation. Even though the results of the larger scenes presented more intra-author consistency than those derived from the calculation of the average number of words per sentence, they did not show enough inter-author variation to include the calculation of this parameter in the case study either.

The third pre-study evaluated the effectiveness of n-gram tracing by extracting scenes from the reference corpora of Shakespeare and Marlowe to discern if this method could associate them with the corpus from which they had been removed. As explained in Chapter 4, two methodological decisions were made and a hypothesis was formulated to optimize the reliability of these studies. The first decision was to balance the size of the two reference corpora by removing a similar number of words from the two plays that constitute that of Shakespeare, which was larger. The second one was to analyse from a quantitative perspective the type of n-grams that a disputed text shares at least ten times with one of the reference corpora, whereas the others could be qualitatively analysed and seen as a complement to the quantitative study. Lastly, the hypothesis that word n-grams of at least two words are more distinctive than character n-grams and word 1-grams was suggested for these studies.
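The core idea of n-gram tracing described above, namely counting the word n-grams that a disputed scene shares with each reference corpus, can be sketched as follows. This is an illustrative simplification and not the procedure implemented in ALTXA: the tokenizer and the function names are hypothetical, distinct n-grams are counted once each, and the ten-occurrence threshold and corpus balancing mentioned in the text are not modelled.

```python
import re


def word_ngrams(text, n):
    # Set of distinct word n-grams in a text (case-folded tokens).
    toks = re.findall(r"[a-z']+", text.lower())
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def shared_ngrams(disputed, reference, n):
    # N-grams that occur in both the disputed text and the reference corpus.
    return word_ngrams(disputed, n) & word_ngrams(reference, n)


def trace(disputed, references, n=2):
    # references: mapping from candidate author to reference corpus text.
    # Returns how many distinct n-grams the disputed text shares with each.
    return {author: len(shared_ngrams(disputed, ref, n))
            for author, ref in references.items()}
```

In a sketch of this kind, the candidate whose corpus shares more n-grams with the disputed scene would be regarded as the likelier author, and the longer shared n-grams returned by `shared_ngrams` could be inspected qualitatively.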
The results presented in Chapter 5 proved the high degree of effectiveness of n-gram tracing to determine the likeliest authorship of Shakespearean and Marlowian scenes from the four groups, which allowed for its inclusion in the case study.

The last pre-study evaluated the reliability of the Zeta test to analyse the authorship of scenes from the fourth group, that is, of almost 2,000 words or more, since the plays of the reference corpora are divided into segments of such length during the procedure and thus it seems sensible to only compare them with others that are similar. A controversial hypothesis was suggested in Chapter 4 for the Zeta test, which is that authors should be compared individually with this method, instead of comparing an author with a group of many, which does not seem to be rigorous from a statistical point of view, as illustrated in Section 4.5.5. All the Shakespearean and Marlowian scenes extracted from their corpus and analysed as disputed texts were correctly attributed to their author, so the Zeta test was used in the case study to analyse the scenes of almost 2,000 words or more.

Chapter 6 analysed the authorship of the nineteen scenes of Arden of Faversham independently and Chapter 7 discussed these studies from a holistic perspective, which allowed conclusions to be drawn about the authorship of the play, the validity of the hypotheses and the approach adopted for the thesis, and whether its main objectives have been accomplished or not. The results of the case study showed that Marlowe is more likely than Shakespeare to have written 15 of the 19 scenes of the play, which were attributed to him with a high degree of certainty on most occasions. Only Scene III.i was linked to Shakespeare, and this attribution was made with a low degree of certainty, whereas the results derived from the analysis of the three remaining scenes were inconclusive.
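The Zeta test summarized above compares two authors by dividing their reference texts into segments and scoring each word by how consistently it appears in one author's segments and is absent from the other's. The sketch below assumes one common formulation of the score (the proportion of segments of author A containing the word plus the proportion of segments of author B lacking it, giving values between 0 and 2); the thesis's exact computation in ALTXA may differ, and the function names and tokenizer are hypothetical.

```python
import re


def tokens(text):
    # Assumption: case-folded runs of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def segments(toks, size):
    # Full, non-overlapping segments stored as word sets;
    # a trailing remainder shorter than `size` is dropped.
    return [set(toks[i:i + size])
            for i in range(0, len(toks) - size + 1, size)]


def zeta_scores(segs_a, segs_b):
    # Score near 2: word typical of author A and avoided by author B.
    # Score near 0: word typical of author B and avoided by author A.
    vocab = set().union(*segs_a, *segs_b)
    scores = {}
    for w in vocab:
        prop_a = sum(w in s for s in segs_a) / len(segs_a)
        prop_b = sum(w in s for s in segs_b) / len(segs_b)
        scores[w] = prop_a + (1 - prop_b)
    return scores
```

With real data, `size` would be set to roughly 2,000 words to match the segment length mentioned in the text, and the highest- and lowest-scoring words would serve as marker lists against which a disputed scene of comparable length is measured.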
As discussed in Chapter 7, even though one of the main objectives of the investigation was only to discern if Shakespeare is more likely than Marlowe to have written each scene of the play or vice versa, the participation of Marlowe in its writing appears to be almost certain given the proportion of scenes that were attributed to him and the results derived from the study of Scene V.i. It is possible that, if neither of the two playwrights had been involved in the creation of the play, the number of scenes attributed to each in Chapter 6 would have been more balanced. In addition, n-gram tracing associated the authorship of Scene V.i with Marlowe with a degree of certainty that cannot be found in the rest of the thesis. The clarity of this quantitative analysis was superior to that of the other 57 conducted with this method, including those with undisputed scenes, and this was reinforced by the qualitative analysis of the larger n-grams in common, which revealed that Arden of Faversham and the reference corpus of Marlowe share the 10-gram "I have my wish in that I joy thy sight". This, which is the largest construction in common found in the thesis, stands as such a unique linguistic choice that it seems highly improbable that two different playwrights would have chosen it independently. The study of Scene V.i using n-gram tracing was complemented by a Zeta test that also attributed its authorship to Marlowe. In sum, these studies seem to have offered substantive evidence to confirm the presence of Marlowe in the writing of Arden of Faversham, regardless of the comparisons that need to be made with other possible candidates in future research, which stands as a major breakthrough.
They also suggest that the contribution of Shakespeare is minor or non-existent, given that only one of the scenes was attributed to him and this occurred with a low degree of certainty, while there are still other playwrights that need to be considered to examine the authorship of the play.

This thesis has reached conclusions that differ from those presented in the studies of Kinney (2009) and Elliott and Greatley-Hirsch (2017) as a result of the hypotheses formulated about the compilation of the reference corpora and the application of the Zeta test. While there has been a tendency in studies of this kind to compile the reference corpora of the candidates with plays that were written in distant periods and belong to different genres, this thesis has strongly advocated the necessity of taking into account as many linguistic variables as possible when comparing authors with similar styles, as is the case with Elizabethan playwrights. In addition, it has presented several arguments against the comparison of groups of authors in a Zeta test. Even though these hypotheses cannot be validated or refuted until they are tested in other contexts, the fact that they have led to results so distinct from those of other studies on the authorship of Arden of Faversham could raise a debate about which approach is more reliable. As mentioned in Chapter 7, this could be seen as another contribution to the development of a budding discipline that has been constantly evolving over the last decades due to the emergence of new technologies.

The two main objectives delineated at the beginning of the thesis have been accomplished. The authorship of the nineteen scenes of Arden of Faversham has been attributed considering Shakespeare and Marlowe as the candidates, and these analyses have been carried out with the newly designed software ALTXA.
This computational tool stands as the pillar of a future project that seeks to assist fellow researchers and facilitate the implementation of authorship attribution studies in educational contexts, as will be explained in the following section.

8.2. Limitations and future lines of research

This section will present the limitations of the thesis and suggest possible lines of future research. As pointed out in Chapter 4, restricting the scope of the investigation to Shakespeare and Marlowe as the only possible candidates for the attribution of authorship of Arden of Faversham was a necessity, given its methodological approach. Every authorship method selected to carry out the research was tested in a pre-study divided into four stages depending on the length of the scenes, with the exception of the pre-study about the Zeta test, which was only applied to scenes from the fourth group. In addition, the hypothesis that authors should be compared individually in the Zeta test was formulated. This means that the inclusion of more candidates in the pre-studies and the case study would have produced an unmanageable amount of work for the time I had been given or an excessively long thesis. Therefore, the main limitation of this work is that only Shakespeare and Marlowe have been considered as the possible candidates for the attribution of authorship of the disputed text and, for that reason, this should be seen as the first step of a long-term project in which other candidates need to be involved (see Section 4.1). The playwright designated as the likeliest author of every scene of Arden of Faversham in this thesis needs to be compared with others in future studies, where the authorship of each of these nineteen scenes must be analysed independently.
Thomas Kyd is the first one with whom these comparisons should be made, since he has been presented as the most solid alternative to Shakespeare and Marlowe in previous research (see Sections 2.3 and 3.4.4). Thomas Watson, who was suggested by Taylor (2019) as a contributor to the writing of the play, and other playwrights of the time who have been considered by scholars as potential candidates (see Section 3.4.4) should also be included in future studies to reach conclusive results about the authorship of the play.

In sum, the rigour of the approach followed in the thesis makes the attribution of authorship of Arden of Faversham an arduous task. Every time two playwrights are compared, the validity of each method to distinguish between their undisputed scenes of distinct lengths needs to be tested, given that what has been proved to be effective with Shakespeare and Marlowe may be useless if it is used to compare Marlowe and Kyd, for instance (see Chapter 4). If authors are compared individually, instead of comparing a single author with a group of many, as other scholars did when applying procedures like the Zeta test, there is a wide range of possible combinations. In other words, this approach is not compatible with immediate results.

This thesis is the first milestone of a project in which an approach that allows for a reliable comparison between two candidates to analyse the authorship of the scenes of Arden of Faversham has been designed. In addition, a computational tool on which future investigations and an educational project will be built has been developed. Even though the creation of the software ALTXA has consumed much time, it will be the key instrument to carry out the following studies quickly and effectively.
The program will be used to start an initiative in 2022 to make forensic linguistics in general, and authorship attribution studies in particular, accessible to all types of audiences, which could facilitate the spread of the discipline. If the user clicks the Help button of ALTXA, a drop-down menu with the section About us will open in its interface. There, the user will have access to the official email account of the software to send doubts and queries, its Twitter account to follow the latest updates and, most importantly, the YouTube channel Project ALTXA, where video tutorials in Spanish and English on the functionalities of ALTXA will be uploaded, as well as enjoyable talks about forensic linguistics and its main areas of study (see Appendix 5). The objective of this future project is to create an accessible learning environment where guest speakers and I will present brief videos addressing distinct topics related to the discipline that can be easily followed by students. These videos may discuss theoretical aspects, such as the Plain English Movement or the methodological foundations for plagiarism detection, or practical cases that can be solved either with or without the assistance of ALTXA, which can be of use for students starting their own investigations.

The software and this educational project will be promoted in academic journals, conferences and social media. Its goal is to facilitate the establishment of forensic linguistics in academic contexts, where there is still a scarcity of experts in the field and teaching tools, which is a niche that ALTXA intends to occupy. This doctoral thesis has been motivated by the desire to democratize knowledge and the aspiration to solve an authorship problem that has been present for centuries and has its focus on some of the most gifted authors in the history of literature.

PRIMARY SOURCES

Anonymous (1592). Arden of Faversham [eBook edition]. Project Gutenberg.
Retrieved on December 9, 2021, from https://www.gutenberg.org/files/43440/43440-h/43440-h.htm
Brandeis University (2018, December 31). Facsimile Viewer: First Folio (1623). Internet Shakespeare Editions. Retrieved on December 9, 2021, from https://internetshakespeare.uvic.ca/Library/facsimile/overview/book/F1.html
Marlowe, C. (1598). Edward II [eBook edition]. Project Gutenberg. Retrieved on December 9, 2021, from https://www.gutenberg.org/cache/epub/20288/pg20288.html
Marlowe, C. (1633). The Jew of Malta [eBook edition]. Project Gutenberg. Retrieved on December 9, 2021, from https://www.gutenberg.org/files/901/901-h/901-h.htm
Shakespeare, W. (1623). Richard III [eBook edition]. Project Gutenberg. Retrieved on December 9, 2021, from https://www.gutenberg.org/cache/epub/1103/pg1103-images.html
Shakespeare, W. (1623). Richard II [eBook edition]. Project Gutenberg. Retrieved on December 9, 2021, from https://www.gutenberg.org/files/1111/1111.txt

BIBLIOGRAPHY AND REFERENCES

Alcaraz, E. (2005). La lingüística legal: el uso, el abuso y la manipulación del lenguaje jurídico. In Turell, M. T. (Ed.), Lingüística forense, lengua y derecho: Conceptos, métodos y aplicaciones (pp. 49-63). Barcelona: Documenta Universitaria.
Alhudithi, E. (2021). Review of Voyant Tools: See through your Text. Language Learning & Technology, 25(3), pp. 43-50.
Anthony, L. (2022). AntConc (Version 4.0.3) [Computer software].
Retrieved on January 25, 2022, from https://www.laurenceanthony.net/software/antconc/
Arias Rodríguez, I., & Fernández-Pampillón Cesteros, A. M. [Área de Lingüística General UCM]. (2020, May 27). Taller de Sketch Engine [Video]. YouTube. Retrieved on January 26, 2022, from https://www.youtube.com/watch?v=rLNs2UUVHB8
Astrana, L. (1964). Vida inmortal de William Shakespeare. Barcelona: Editorial Atlántico.
Austin, J. L. (1962). How to Do Things with Words. Oxford: Oxford University Press.
Baker, J. C. (1988). Pace: A Test of Authorship Based on the Rate at which New Words Enter an Author’s Text. Literary and Linguistic Computing, 3(1), pp. 36-39.
Baldwin, J. (1993). Police Interview Techniques: Establishing Truth or Proof? British Journal of Criminology, 33(3), pp. 325-352.
Barker, S., & Hinds, H. (2003). The Routledge Anthology of Renaissance Drama. New York: Routledge.
Barrón-Cedeño, A., Vila, M., & Rosso, P. (2014). Detección automática de plagio: De la copia exacta a la paráfrasis. In Garayzábal, E., Jiménez, M., & Reigosa, M. (Eds.), Lingüística forense: La lingüística en el ámbito legal y policial (pp. 123-152). Madrid: Euphonia Ediciones.
Boas, F. S. (1940). Christopher Marlowe: A Biographical and Critical Study. Oxford: Oxford University Press.
Bozkurt, I. N., Baghoglu, O., & Uyar, E. (2007). Authorship Attribution. Performance of Various Features and Classification Methods. 22nd International Symposium on Computer and Information Sciences, pp. 1-5. Retrieved on January 17, 2019, from https://ieeexplore.ieee.org/abstract/document/4456854/citations#citations
Bryson, B. (2009). Shakespeare: El mundo como escenario. Barcelona: Editorial RBA.
Canter, D., & Chester, J. (1997). Investigation into the Claim of Weighted Cusum in Authorship Attribution Studies. Forensic Linguistics, 4(2), pp. 252-261.
Cheng, W., Greaves, C., & Warren, M. (2006). From N-gram to Skipgram to Concgram. International Journal of Corpus Linguistics, 11(4), pp. 411-433.
Christensen, A.
(2017). Separation Scenes: Domestic Drama in Early Modern England. Nebraska: University of Nebraska Press.
Cicres, J., & Queralt, S. (2019). An N-gram Based Approach to the Automatic Classification of Schoolchildren’s Writing. Vigo International Journal of Applied Linguistics, 16, pp. 53-80.
Clarke, I., & Kredens, K. (2018). “I consider myself to be a service provider”: Discursive Identity Construction of the Forensic Linguist Expert. The International Journal of Speech, Language and the Law, 25(1), pp. 79-107.
Correa, M. (2013). Forensic Linguistics. An Overview of the Intersection and Interaction of Language and Law. Studies About Languages, 23, pp. 5-13.
Coulthard, M. (1996). The Official Version. Audience Manipulation in Police Records of Interviews with Suspects. In Cmejrková, S., Hoffmannová, J., Müllerová, O., & Svetlá, J. (Eds.), Dialoganalyse VI: Proceedings of the VI Conference (pp. 121-132). Prague: Max Niemeyer Verlag.
Coulthard, M. (2004). Author Identification, Idiolect and Linguistic Uniqueness. Applied Linguistics, 25(4), pp. 431-447.
Coulthard, M. (2010). Forensic Linguistics: The Application of Language Description in Legal Contexts. Langage et société, 132(2), pp. 15-33.
Coulthard, M., Grant, T., & Kredens, K. (2010). Forensic Linguistics. In Wodak, R., Johnstone, B., & Kerswill, P. (Eds.), Handbook of Sociolinguistics (pp. 529-544). London: SAGE Publications.
Coulthard, M., & Johnson, A. (2007). An Introduction to Forensic Linguistics: Language in Evidence. New York: Routledge.
Coulthard, M., & Johnson, A. (2010). The Routledge Handbook of Forensic Linguistics. New York: Routledge.
Craig, H., & Kinney, A. F. (2009). Methods. In Craig, H., & Kinney, A. F. (Eds.), Shakespeare, Computers and the Mystery of Authorship (pp. 15-39). Cambridge: Cambridge University Press.
Culpeper, J. (2018). Affirmatives in Early Modern English: Yes, yea and ay. Journal of Historical Pragmatics, 19(2), pp. 243-264.
Daubert v. Merrell Dow Pharmaceuticals Inc., Volume 509 U.S. Page 579 (1993). Retrieved on May 31, 2019, from https://caselaw.findlaw.com/us-9th-circuit/1430422.html
Dudgeon, C. (2009). Forensic Performances: Evidentiary Narrative in Arden of Faversham. In Majeske, A., & Detmer-Goebel, E. (Eds.), Justice, Power and Women in English Renaissance Drama (pp. 98-117). New Jersey: Fairleigh Dickinson University Press.
Dumas, B. K. (2002). Reasonable Doubt about Reasonable Doubt: Assessing Jury Instruction Adequacy in a Capital Case. In Cotteril, J. (Ed.), Language in the Legal Process (pp. 245-259). Hampshire: Palgrave Macmillan.
Durant, A. (2010). Meaning in the Media: Discourse, Controversy and Debate. Cambridge: Cambridge University Press.
El Mundo (2017, February). El rector de la URJC “plagió literalmente” una obra de un ex decano de la UB. Retrieved on February 21, 2020, from https://www.elmundo.es/madrid/2017/02/03/5894c721e2704e80678b4615.html
Elliott, J., & Greatley-Hirsch, B. (2017). Arden of Faversham, Shakespearean Authorship, and “The Print of Many”. In Taylor, G., & Egan, G. (Eds.), The New Oxford Shakespeare: Authorship Companion (pp. 139-181). Oxford: Oxford University Press.
Fallow, D. (2016). Su padre, John Shakespeare. In Edmonson, P., & Wells, S. (Eds.), El círculo de Shakespeare (pp. 47-67). Barcelona: Stella Maris.
Federal Bureau of Investigation (n.d.). Amerithrax or Anthrax Investigation. Retrieved on February 15, 2020, from https://www.fbi.gov/history/famous-cases/amerithrax-or-anthrax-investigation
Felsenfeld, C. (1981). The Plain English Movement in the United States. FLASH: The Fordham Law Archive of Scholarship and History, 6, pp. 408-421.
Fitzgerald, J. R. (2014).
Atribución de autoría y supuestas notas de suicidio: Análisis lingüístico forense y su papel en los tribunales penales estadounidenses en dos crímenes violentos ocurridos en 2007. In Garayzábal, E., Jiménez, M., & Reigosa, M. (Eds.), Lingüística forense: La lingüística en el ámbito legal y policial (pp. 49-77). Madrid: Euphonia Ediciones.
Foltýnek, T., Meuschke, N., & Gipp, B. (2019). Academic Plagiarism Detection: A Systematic Literature Review. ACM Computing Surveys, 52(6), pp. 112:1-112:42.
Fraser, B. (1998). Threatening Revisited. Forensic Linguistics, 5(2), pp. 159-173.
French, P. (1994). An Overview of Forensic Phonetics with Particular Reference to Speaker Identification. International Journal of Speech, Language and the Law, 1(2), pp. 169-181.
Gibbons, J. (2003). Forensic Linguistics: An Introduction to Language in the Justice System. Oxford: Blackwell Publishing.
Gibbons, J. (2005). El entramado lingüístico de los interrogatorios. In Turell, M. T. (Ed.), Lingüística forense, lengua y derecho: Conceptos, métodos y aplicaciones (pp. 193-219). Barcelona: Documenta Universitaria.
Gibbons, J. (2011). Towards a Framework for Communication Evidence. The International Journal of Speech, Language and the Law, 18(2), pp. 233-260.
Gil, J., & San Segundo, E. (2014). La cualidad de voz en fonética judicial. In Garayzábal, E., Jiménez, M., & Reigosa, M. (Eds.), Lingüística forense: La lingüística en el ámbito legal y policial (pp. 153-197). Madrid: Euphonia Ediciones.
Goustos, D. (1995). Review of Forensic Stylistics, by G. McMenamin. Forensic Linguistics, 2(1), pp. 99-113.
Grant, T. (2007). Quantifying Evidence in Forensic Authorship Analysis. The International Journal of Speech, Language and the Law, 14(1), pp. 1-25.
Greenblatt, S., & Logan, G. (2012). The Sixteenth Century. In Greenblatt, S. (Ed.), The Norton Anthology of English Literature (pp. 531-1339). New York: Norton.
Grieve, J., Clarke, I., Chiang, E., Gideon, H., Heini, A., Nini, A., & Waibel, E. (2018). Attributing the Bixby Letter Using N-gram Tracing. Digital Scholarship in the Humanities, 34(3), pp. 493-512.
Halliday, F. E. (1964). Shakespeare: Biografía ilustrada. Barcelona: Ediciones Destino.
Haworth, K. (2018). Tapes, Transcripts and Trials: The Routine Contamination of Police Interview Evidence. The International Journal of Evidence and Proof, 22(4), pp. 428-450.
Holinshed, R. (1587). Chronicles of England, Scotland and Ireland. London: The British Library. Retrieved on January 19, 2020, from http://english.nsms.ox.ac.uk/holinshed/texts.php?text1=1587_8324#p14902
Holland, P. (2007). William Shakespeare. Oxford: Oxford University Press.
Honan, P. (2006). Christopher Marlowe: Poet and Spy. Oxford: Oxford University Press.
Honigman, E. (2001). Shakespeare’s Life. In De Grazia, M., & Wells, S. (Eds.), The Cambridge Companion to Shakespeare (pp. 1-12). Cambridge: Cambridge University Press.
Hopkins, L. (2008). Christopher Marlowe, Renaissance Dramatist. Edinburgh: Edinburgh University Press.
Howald, B. S. (2008). Authorship Attribution under the Rules of Evidence: Empirical Approaches – a Layperson’s Legal System. The International Journal of Speech, Language and the Law, 15(2), pp. 219-247.
International Association for Forensic and Legal Linguistics (2020). Forensic Linguistics. IAFLL. Retrieved on January 19, 2020, from https://www.iafl.org/forensic-linguistics/
Ishihara, S. (2014). A Likelihood Ratio Based Evaluation of Strength of Authorship Attribution Evidence in SMS Messages Using N-grams. The International Journal of Speech, Language and the Law, 21(1), pp. 23-49.
Jackson, M. P. (2014).
Determining the Shakespeare Canon: Arden of Faversham and A Lover’s Complaint. Oxford: Oxford University Press.
Jackson, M. P. (2017). Shakespeare, Arden of Faversham, and A Lover’s Complaint: A Review of Reviews. In Taylor, G., & Egan, G. (Eds.), The New Oxford Shakespeare: Authorship Companion (pp. 123-135). Oxford: Oxford University Press.
Jackson, M. P., & Taylor, G. (2015). Shakespearean Authorship. The Times Literary Supplement, 5849, p. 6.
Jessen, M. (2008). Forensic Phonetics. Language and Linguistics Compass, 2(4), pp. 671-711.
Jonson, B. (1623). To the Memory of my Beloved the Author, Mr. William Shakespeare. Poetry Foundation. Retrieved on May 6, 2021, from https://www.poetryfoundation.org/poems/44466/to-the-memory-of-my-beloved-the-author-mr-william-shakespeare
Kermode, F. (2005). El tiempo de Shakespeare. Madrid: Debate.
Kilgarriff, A., & Rychlý, P. (2003). Sketch Engine [Online tool]. Retrieved on January 26, 2022, from https://www.sketchengine.eu/
Kinney, A. F. (2009). Authoring Arden of Faversham. In Craig, H., & Kinney, A. F. (Eds.), Shakespeare, Computers and the Mystery of Authorship (pp. 78-99). Cambridge: Cambridge University Press.
Kocher, P. L. (1948). Christopher Marlowe, Individualist. University of Toronto Quarterly, 17(2), pp. 111-120.
Kredens, K. (2016). Conflict or Convergence? Interpreters’ and Police Officers’ Perceptions of the Role of the Public Service Interpreter. Language and Law, 3(2), pp. 65-77.
Larner, S. (2014). A Preliminary Investigation into the Use of Formulaic Sequences as a Marker of Authorship. The International Journal of Speech, Language and the Law, 21(1), pp. 1-22.
Latorre, J. A. (2017).
Attribution of Authorship of The Merchant of Venice and Henry VI through Linguistic Parameters: A Contrastive Study between William Shakespeare and Christopher Marlowe [Master’s dissertation, Universidad Complutense de Madrid]. Retrieved on February 11, 2020, from https://eprints.ucm.es/47400/ Levi, J. N. (1993). Evaluating Jury Comprehension of Illinois Capital-Sentencing Instructions. American Speech, 68(1), pp. 20-49. Ley Orgánica 10/1995, de 23 de noviembre, del Código Penal (2015). Boletín Oficial del Estado, 281, sec. I, de 24 de noviembre de 1995, 33987 a 34058. Retrieved on May 4, 2020, from https://www.boe.es/buscar/doc.php?id=BOE-A-1995-25444 Losey, F. D. (1927). The Kingsway Shakespeare. London: George G. Harrap & Co. Martin, B. (2004). Plagiarism: Policy Against Cheating or Policy Against Learning? University of Wollongong. Retrieved on January 4, 2020, from http://www.uow.edu.au/arts/sts/bmartin/ McDougall, K., & Duckworth, M. (2018). Individual Patterns of Disfluency Across Speaking Styles: A Forensic Phonetic Investigation of Standard Southern British English. International Journal of Speech, Language and the Law, 25(2), pp. 205- 230. McMenamin, G. (1993). Forensic Stylistics. Amsterdam: Elsevier. McMenamin, G. (2002). Advances in Forensic Stylistics. Florida: CRC Press. Mellinkoff, D. (1963). The Language of the Law. Oregon: Resource Publications. Mendelhall, T. C. (1887). The Characteristic Curves of Composition. Science, ns- 9(214S), pp. 237-246. Retrieved on May 12, 2019, from https://science.sciencemag.org/content/ns-9/214S/237/tab-pdf Merriam, T. (1996). Tamburlaine Stalks in Henry VI. Computers and the Humanities, 30(3), pp. 267-280. Moerk, E. L. (1973). An Objective, Statistical Description of Style. Linguistics: An Interdisciplinary Journal of the Language Sciences, 11(108), pp. 50-58. 
Momeni, N. (2011). Forensic Linguistics: A Conceptual Frame of Bribery with Linguistic and Legal Features (a Case Study in Iran). International Journal of Criminology and Social Theory, 4(2), pp. 733-744.
Morton, A. Q., & Michaelson, S. (1990). The Q-sum Plot. Edinburgh: Department of Computer Science, University of Edinburgh.
Nicholl, C. (2016). El caso de Marlowe. In Edmonson, P., & Wells, S. (Eds.), La verdad sobre Shakespeare (pp. 59-73). Barcelona: Stella Maris.
Nini, A., & Grant, T. (2013). Bridging the Gap between Stylistic and Cognitive Approaches to Authorship Analysis Using Systemic Functional Linguistics and Multidimensional Analysis. The International Journal of Speech, Language and the Law, 20(2), pp. 173-202.
Olsson, J. (2004). Forensic Linguistics: An Introduction to Language, Crime and the Law. London: Continuum International Publishing Group.
Olsson, J. (2008). Forensic Linguistics. New York: Continuum International Publishing Group.
Oxburgh, G. E., Myklebust, T., & Grant, T. (2010). The Question of Question Types in Police Interviews: A Review of Literature from a Psychological and Linguistic Perspective. International Journal of Speech, Language and the Law, 17(1), pp. 45-66.
Perkins, R., & Grant, T. (2012). Forensic Linguistics. In Siegel, J. A., & Saukko, P. J. (Eds.), Encyclopedia of Forensic Sciences, Second Edition (pp. 174-177). Amsterdam: Elsevier.
Perraudin, F. (2016, October 23). Christopher Marlowe Credited as one of Shakespeare’s Co-writers. The Guardian. Retrieved on January 21, 2020, from https://www.theguardian.com/culture/2016/oct/23/christopher-marlowe-credited-as-one-of-shakespeares-co-writers
Philbrick, F. A. (1949). Language and the Law. The Semantics of Forensic English. New York: The Macmillan Company.
Potter, L. (2012). The Life of William Shakespeare: A Critical Biography. Oxford: Wiley-Blackwell.
Potthast, M., Stein, B., Eiselt, A., Barrón-Cerdeño, A., & Rosso, P. (2009). Overview of the 1st International Competition on Plagiarism Detection. In Stein, B., Rosso, P., Stamatos, E., Koppel, M., & Agirre, E. (Eds.), Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 09) (pp. 1-9). CEUR-WS.org.
Quality Inns International, Inc. v. McDonald’s Corporation, Volume 695 F. Supp. 198 (1988). Retrieved on April 10, 2019, from https://law.justia.com/cases/federal/district-courts/FSupp/695/198/2346281/
Queralt, S. (2014). Acerca de la prueba lingüística en atribución de autoría hoy. Revista de Llengua i Dret, 62, pp. 35-48.
Rhode Island v. Innis, Volume 446 U.S. Page 291 (1980). Retrieved on February 2, 2019, from https://supreme.justia.com/cases/federal/us/446/291/
Richardson, C. (2006). Domestic Life and Domestic Tragedy in Early Modern England: The Material Life of the Household. Manchester: Manchester University Press.
Riggs, D. (2004). The World of Christopher Marlowe. London: Faber and Faber.
Rock, F. (2007). Communicating Rights: The Language of Arrest and Detention. Hampshire: Palgrave Macmillan.
Royal Shakespeare Company (2021). Timeline of Shakespeare’s Plays. Retrieved on June 7, 2021, from https://www.rsc.org.uk/shakespeares-plays/timeline
Ryskina, M., Alpert-Abrams, H., Garrette, D., & Berg-Kirkpatrick, T. (2017). Automatic Compositor Attribution in the First Folio of Shakespeare. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 411-416.
Schoenbaum, S. (1985). William Shakespeare: Una biografía documentada. Barcelona: Editorial Argos Vergara.
Scott, M. (2021). WordSmith Tools (Version 8) [Computer software]. Retrieved on January 25, 2022, from https://www.lexically.net/wordsmith/
Shuy, R. (2002). Linguistic Battles in Trademark Disputes. Hampshire: Palgrave Macmillan.
Shuy, R. (2010). The Language of Defamation Cases. New York: Oxford University Press.
Sinclair, S., & Rockwell, G. (2022). Voyant Tools (Version 2.5.3) [Online tool]. Retrieved on January 25, 2022, from https://voyant-tools.org/
Smith, E. L. (2021). A Review of the Computational Linguistics Tools WordSmith Tools (Version 8) and AntConc (Version 3.5.8). Renaissance and Reformation, 44(1), pp. 200-214.
Solan, L. M. (1993). The Language of Judges. Chicago: University of Chicago Press.
Sousa-Silva, R. (2013). Detecting Plagiarism in the Forensic Linguistics Turn [Doctoral dissertation, Aston University]. Retrieved on January 25, 2022, from https://research.aston.ac.uk/en/studentTheses/detecting-plagiarism-in-the-forensic-linguistics-turn
Sousa-Silva, R. (2014). Detecting Translingual Plagiarism and the Backlash against Translation Plagiarists. Language and Law, 1(1), pp. 70-94.
Svartvik, J. (1968). The Evans Statements: A Case for Forensic Linguistics. Goteborg: Elanders boktryckeri aktiebolag. Retrieved on May 7, 2019, from https://www.thetext.co.uk/Evans%20Statements%20Part%201.pdf
Tallent, L. (2007). Looking for Marlowe. College Literature, 34(1), pp. 213-222.
Taylor, G. (2019). Finding “Anonymous” in the Digital Archives: The Problem of Arden of Faversham. Digital Scholarship in the Humanities, 34(4), pp. 855-873.
The Marlowe Society (2021). Published Works. Retrieved on June 7, 2021, from http://www.marlowe-society.org/christopher-marlowe/works/
Tiersma, P. (1993). The Judge as Linguist. Loyola of Los Angeles Law Review, 27(1), pp. 269-283.
Tiersma, P. (2009). Communicating with Juries: How to Draft more Understandable Jury Instructions. Loyola-LA Legal Studies Paper, 2009-44. Retrieved on November 11, 2019, from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1507298
Turell, M. T. (2005). El plagio en la traducción literaria. In Turell, M. T. (Ed.), Lingüística forense, lengua y derecho: Conceptos, métodos y aplicaciones (pp. 275-298). Barcelona: Documenta Universitaria.
Turell, M. T. (2010). The Use of Textual, Grammatical and Sociolinguistic Evidence in Forensic Text Comparison. The International Journal of Speech, Language and the Law, 17(2), pp. 211-250.
Udina, N. (2017). Forensic Linguistics Implications for Legal Education: Creating the e-textbook on Language and Law. Procedia: Social and Behavioral Sciences, 237, pp. 1337-1340.
Universidad Nacional de Educación a Distancia (2017, September). Análisis de textos y estilometría usando R. Formación permanente UNED. Retrieved on January 15, 2022, from https://formacionpermanente.uned.es/tp_actividad/idactividad/10010
Valero-Garcés, C. (2018). Lingüística forense. Contextos, teoría y práctica. Madrid: Edisofer S.L.
Vázquez Maroño, M. L. (2014). La entrevista policial, un diálogo transformado en monólogo. In Garayzábal, E., Jiménez, M., & Reigosa, M. (Eds.), Lingüística forense: La lingüística en el ámbito legal y policial (pp. 341-356). Madrid: Euphonia Ediciones.
Vickers, B. (2008). Thomas Kyd, Secret Sharer. The Times Literary Supplement, 5481, pp. 13-15.
Vickers, B. (2015). No Shakespeare to Be Found. The Times Literary Supplement, 5487, pp. 9-11.
Wood, M. (2016). Su madre, Mary Shakespeare. In Edmonson, P., & Wells, S. (Eds.), El círculo de Shakespeare (pp. 27-45). Barcelona: Stella Maris.
Woolls, D., & Coulthard, M. (1998). Tools for the Trade. Forensic Linguistics, 5(1), pp. 33-57.
Wright, D. (2017). Using Word N-grams to Identify Authors and Idiolects. International Journal of Corpus Linguistics, 22(2), pp. 212-241.

APPENDICES

APPENDIX 1
Transcript of the email that was sent from Dulceliz Díaz’s email account to three members of her family on January 15, 2007 (Fitzgerald, 2014, p. 53)

Subject: do me a favor.
I’m doing something today thast will affe4ct us all, I weant uou to do me a favor, get jajaira and eddie and all 4 of their kids, he raped me when i went to their hous e and she watched, so I want you to kill thenm, ill be watchin to make sure you do this, leave albert alone though just ‘tell albert I love him and this ist his fault, and its not the familys faut either, I just deont weant to live anymore, mommy and poppi i liove you, mio I love you carlos I love you and Brenda I love you, please tell albert that I will always love him……. i sorry that it has to be this way everyone, but this is what iv wantred to do for a very long time, peace3 out and I’ll be keeping an eye on all of you, and even though we argued and fight over stupid things, you guys are always gonna bwe in my heart,

APPENDIX 2
Graphical representation of the results of the Zeta test conducted by Kinney (2009, pp. 92-93) to analyse the authorship of the scenes of Arden of Faversham, with a table of their coordinates showing which scenes have been attributed to Shakespeare

APPENDIX 3
Stop list, in alphabetical order, of all the words ignored as potential markers when conducting the Zeta tests of the thesis

a abigail about above across adieu after afterwards again against alexandria all almost along already also although always am among amongst amoungst an and anne another any anyhow anyone anything anyway anywhere are around art arundel as at aumerle back bagot baldock barabas barnardine be because been before beforehand behind being below beside besides between beyond bolingbroke bolingbroke's borsa both brandon britaine buckingham bushy but by calymath can cannot catesby chamberlain clarence clarence' cornwall could could'st crosby danae dare de despite did did'st didst do does don done dorset dost doth down during durst each edmund edmund's edward's edwardum eg egypt eight eighth either eleven elizabeth else elsewhere ely 'em england england's english enough est etc even ever every everyone everything everywhere except exeter ferneze ferneze's few fifth first five flint flint's florence for four fourth france frenchman from gaunt gaveston george george's glocester gloucester gloucester's gurney had hadst harry has hast hastings hath have he hebrew hebrews hence henry henry's her hercules here hereabouts hereafter hereby hereford hereford's herein hereinafter heretofore hereunder hereupon herewith hers herself him himself his how however hum hylas i if in indeed instead into ireland is isabel
isabella it italian ithamore its itself jacomo jerusalem jesu 'jesu jew jews jew's john jove jove's kent killingworth lancaster last latter latterly least leicester less levune lightborn like lodowick london longshanks' lot lots malta malta's many margaret margaret's marquis mathias matrevis may mayst me might mine more moreover mortimer mortimers mortimer's morton most mostly much must my myself namely near need ne'er neither never nevertheless next nine nineth no nobody nolite none noone nor norfolk normandy northumberland not nothing now nowhere o' occidere of off often oftentimes on once one o'er only onto or other others otherwise ought our ours ourselves out outside over paul paul's pembroke pembroke's per perhaps pilia plantagenet plantagenets pomfret quam ratcliff rather re rhodes richard richard's richmond richmond's rome rutland salisburg salisbury same second selim seven seventh several shall shalt she should sicily since six sixth so some somehow someone something sometime sometimes somewhat somewhere spain spenser spensers spenser's stanley still such 't ten tenth tewksbury th' than that the thee their theirs them themselves then thence there thereabouts thereafter thereby therefore therein thereof thereon thereupon these they thine third this thither thomas those thou though three through throughout thru thus thy timere tis to together too top toward towards turk turkey turks twas twelve twenty two tynmouth tyrrel under until up upon us used valois vaughan venice very via walter warwick warwickshire was wast we well welshmen were wert westminster what whatever when whenas whence whenever where whereafter whereas whereby wherein whereupon wherever whether which while whither who whoever whole whom whose why whyever will william wilt wiltshire winchester with within without would wouldst ye yet york york's you your yours yourself yourselves

APPENDIX 4
Lists of the 500 Shakespearean and Marlowian markers for the attribution of authorship of the scenes of Arden of Faversham with the Zeta test. The position of these markers on the lists is determined by their score according to the formula provided in Section 4.5.5.

Shakespearean markers
1. duke 2. god's 3. god 4. tongue 5. hours 6. wife 7. royal 8. deep 9. children 10. foul 11. dangerous 12. bloody 13. sons 14. eye 15. arm 16. holy 17. noble 18. days 19. ill 20. princes 21. fearful 22. seat 23. weeping 24. living 25. hour 26. just 27. heavy 28. cousin 29. happy 30. to-day 31. sorrow 32. bosom 33. king 34. right 35. high 36. rivers 37. beseech 38. grievous 39. woe 40. weary 41. kindred 42. bad 43. peace 44. thoughts 45. brother's 46. eyes 47. set 48. look'd 49. souls 50. truth 51. duty 52. guilty 53. princely 54. virtuous 55. beat 56. wail 57. tender 58. pluck 59. forth 60. amen 61. anointed 62. gracious 63. black 64. conscience 65. won 66. mortal 67. prove 68. thing 69. bid 70. widow 71. deadly 72. hearts 73. tedious 74. sour 75. dull 76. thrive 77. breath 78. womb 79. shame 80. slander 81. degree 82. coward 83. ancient 84. leisure 85. mother 86. grey 87. brief 88. deny 89. ear 90. counsel 91. joy 92. subjects 93. cry 94. knee 95. self 96. hand 97. patience 98. mighty 99. issue 100. bids 101. age 102. own 103. spent 104. pale 105. cold 106. title 107. liege 108. brothers 109. stabb'd 110. drown 111. destruction 112. joys 113. guess 114. husband 115. shortly 116. says 117. humble 118. sleeping 119. party 120. current 121. morrow 122. pluck'd 123. world's 124. sentence 125. doom 126. alack 127. loyal 128. bold 129. mother's 130. gain 131. glory 132. tidings 133. win 134. green 135. side 136. sad 137. breathing 138. virtue 139. remember 140. haste 141. heart 142. defend 143. years 144. hell 145. young 146. law 147. wrong 148. to-morrow 149. matter 150. loss 151. gentlemen 152. subject 153. saw 154. withal 155. play 156. touch 157. thought 158. light 159. rage 160. dreams 161. untimely 162. reverend 163. quoth 164. earnest 165. prime 166.
dispatch 167. outward 168. guilt 169. proportion 170. terror 171. heart's 172. woeful 173. yielded 174. bones 175. fathers 176. vantage 177. state 178. tempest 179. nurse 180. barren 181. withdraw 182. profane 183. designs 184. grace 185. model 186. bleeding 187. wound 188. height 189. boldly 190. sacrament 191. liest 192. upright 193. privilege 194. adversaries 195. treasons 196. dread 197. kinsman 198. record 199. looks 200. put 201. false 202. body 203. melancholy 204. woman 205. deed 206. pretty 207. rude 208. woes 209. dust 210. broken 211. bend 212. lands 213. buried 214. office 215. judge 216. urg'd 217. lend 218. book 219. jest 220. shed 221. lo 222. times 223. foe 224. battle 225. hot 226. devil 227. form 228. deserve 229. plain 230. faith 231. lies 232. purpose 233. country's 234. despair 235. course 236. tear 237. yea 238. told 239. clouds 240. loving 241. face 242. presence 243. seen 244. son 245. tyranny 246. direction 247. opposite 248. shallow 249. cousins 250. voice 251. prithee 252. allies 253. flourish 254. grandam 255. determin'd 256. bless'd 257. corse 258. blunt 259. beggar 260. aunt 261. creature 262. divided 263. plant 264. prepare 265. coronation 266. glass 267. beholding 268. prophesy 269. heinous 270. confound 271. contented 272. usurp 273. watch 274. prevent 275. scene 276. damn'd 277. angels 278. windows 279. loath 280. trees 281. yon 282. dispers'd 283. benefit 284. pieces 285. household 286. urge 287. intelligence 288. bay 289. heirs 290. merry 291. rights 292. commends 293. sickness 294. mock 295. envious 296. tardy 297. stopp'd 298. tale 299. awak'd 300. faces 301. sorrow's 302. snow 303. sullen 304. heavier 305. infant 306. dire 307. order 308. victory 309. reach 310. ceremonious 311. valour 312. empty 313. minister 314. deputy 315. spur 316. ripe 317. correction 318. saint 319. try 320. balm 321. grace's 322. father 323. lord 324. knightly 325. laid 326. swear 327. trembling 328. frozen 329. say 330. nought 331. blood 332.
devotion 333. befall 334. apparent 335. sun 336. comfort 337. traitor 338. wept 339. prey 340. dog 341. prayer 342. worthy 343. beg 344. behalf 345. kill'd 346. stroke 347. sign 348. manner 349. assur'd 350. keeps 351. spake 352. fellow 353. fool 354. affairs 355. doubt 356. sets 357. kingdom 358. number 359. forward 360. pain 361. depose 362. falls 363. large 364. summer 365. cut 366. end 367. vow 368. brother 369. rough 370. head 371. reverence 372. betwixt 373. divine 374. hate 375. other's 376. shadow 377. lest 378. ay 379. awhile 380. wash 381. quick 382. enemies 383. kill 384. nature 385. rebels 386. mild 387. taste 388. banishment 389. bed 390. justice 391. fight 392. move 393. precious 394. cause 395. full 396. old 397. dignity 398. leads 399. daughters 400. infer 401. lovel 402. boar 403. knocks 404. knot 405. conqueror 406. sanctuary 407. growth 408. uncles 409. ungovern'd 410. babes 411. lamentation 412. vice 413. red 414. wife's 415. zounds 416. remorse 417. fain 418. perpetual 419. smother'd 420. butcher'd 421. reported 422. meed 423. acquaint 424. warn 425. cheerfully 426. scarcely 427. crept 428. toad 429. homicide 430. murd'rous 431. devilish 432. fouler 433. fairer 434. ugly 435. holes 436. evil 437. lip 438. graces 439. victorious 440. contrary 441. endur'd 442. beggars 443. nails 444. wonders 445. consorted 446. conference 447. consequence 448. contempt 449. pass'd 450. cried 451. piece 452. humility 453. dew 454. sights 455. boon 456. looking-glass 457. usurp'd 458. sovereignty 459. duteous 460. glories 461. shook 462. ages 463. bond 464. surrey 465. revengeful 466. plants 467. sap 468. government 469. garden 470. unruly 471. saints 472. scope 473. bending 474. bent 475. cloudy 476. signify 477. slaughtered 478. woe's 479. women 480. joints 481. boys 482. blot 483. angel 484. worldly 485. blushing 486. wand'ring 487. outrage 488. lower 489. yields 490. double 491. fairly 492. lineaments 493. unrest 494. wither'd 495. pause 496.
accept 497. employ'd 498. foolish 499. bounty 500. process

Marlowian markers
1. gold 2. cast 3. wealth 4. hard 5. money 6. words 7. seeing 8. pass 9. yes 10. we'll 11. place 12. serve 13. hang 14. here's 15. governor 16. content 17. villains 18. sure 19. pay 20. crowns 21. got 22. dissemble 23. base 24. brave 25. tush 26. hale 27. soon 28. town 29. hundred 30. runs 31. bring 32. receive 33. there's 34. gone 35. nuns 36. sirrah 37. round 38. wonder 39. resolute 40. treasury 41. that's 42. follow 43. soldiers 44. realm 45. loved 46. christians 47. carry 48. appoint 49. cruel 50. they'll 51. what's 52. pull 53. droop 54. is't 55. seek 56. madam 57. lordship 58. fleet 59. sooner 60. minion 61. barons 62. sell 63. friend 64. force 65. he's 66. let's 67. banish 68. accursed 69. goods 70. wicked 71. fatal 72. knew 73. remains 74. forsake 75. return 76. earl 77. walk 78. leave 79. perish 80. friar 81. daughter 82. distress 83. happily 84. esteem 85. messenger 86. policy 87. commit 88. harbour 89. letters 90. saith 91. fire 92. left 93. fury 94. rule 95. sea 96. wind 97. moved 98. nun 99. abbess 100. ruled 101. looked 102. nunnery 103. ha' 104. price 105. poisoned 106. powder 107. written 108. fingers 109. winds 110. unkind 111. inflict 112. seized 113. weapons 114. amain 115. road 116. gates 117. slain 118. gets 119. requite 120. fetch 121. favour 122. lov'st 123. assure 124. question 125. hateful 126. walks 127. countenance 128. seize 129. cease 130. ship 131. anger 132. wish 133. nobles 134. speeches 135. bliss 136. answer 137. get 138. farewell 139. trouble 140. bought 141. grieves 142. shake 143. hardly 144. who's 145. life 146. spare 147. brook 148. unto 149. charge 150. means 151. christian 152. riches 153. device 154. advance 155. seest 156. salute 157. easily 158. grieve 159. remain 160. ease 161. levy 162. safe 163. reveng'd 164. fast 165. command 166. pearl 167. court 168. villainy 169. friars 170. resolved 171. tribute 172. gotten 173. coin 174.
paltry 175. sum 176. secret 177. rescue 178. sink 179. betray'd 180. promised 181. fail 182. asleep 183. betray 184. bravely 185. desires 186. message 187. sail 188. walls 189. where's 190. disdain 191. master 192. close 193. train 194. warrant 195. lofty 196. attempt 197. overthrow 198. fie 199. honours 200. pine 201. torments 202. yield 203. wills 204. wrathful 205. proudest 206. long 207. strange 208. peasant 209. use 210. rend 211. goes 212. guard 213. worth 214. conspire 215. keep 216. highness 217. highly 218. pride 219. earls 220. underneath 221. grant 222. lovely 223. city 224. sight 225. writes 226. begone 227. courtesan 228. nose 229. revenged 230. chaste 231. governor's 232. circumcised 233. tormented 234. prythee 235. i'd 236. turned 237. convert 238. forced 239. tribute-money 240. search 241. profession 242. discharged 243. galleys 244. entry 245. custom 246. costly 247. diamonds 248. pearls 249. entertained 250. bags 251. thirst 252. senses 253. rests 254. gallows 255. hapless 256. ope 257. multiply 258. escape 259. remove 260. despatch 261. fits 262. beard 263. drives 264. knights 265. threats 266. shipping 267. knell 268. groaning 269. alone 270. gather 271. bestow 272. window 273. silks 274. passions 275. forlorn 276. behoof 277. realm's 278. nephew 279. quickly 280. poison 281. ungentle 282. felicity 283. towers 284. run 285. casts 286. pope 287. certify 288. rent 289. ruin 290. channel 291. robes 292. street 293. envied 294. display 295. preach 296. expel 297. knees 298. comes 299. running 300. hope 301. soldier's 302. wait 303. favourite 304. greater 305. mean 306. home 307. villain 308. sir 309. further 310. unhappy 311. dies 312. bears 313. read 314. rich 315. chance 316. murderer 317. store 318. vain 319. nay 320. suffer 321. bear 322. longer 323. father's 324. aside 325. pierce 326. away 327. people 328. monstrous 329. thrust 330. misery 331. extreme 332. crave 333. say'st 334. room 335. clean 336. offended 337. unnatural 338.
request 339. sorrows 340. going 341. ring 342. trust 343. ships 344. fare 345. stays 346. looking 347. treasure 348. feast 349. quiet 350. having 351. dishonour 352. perform 353. cursed 354. sighs 355. distressed 356. loves 357. worst 358. speech 359. favours 360. prison 361. kingly 362. haughty 363. sake 364. gentle 365. die 366. hair 367. reward 368. peers 369. enmity 370. arms 371. dearest 372. servant 373. challenge 374. sums 375. new-made 376. girl 377. determined 378. demand 379. silly 380. brethren 381. provide 382. orient 383. ashore 384. trade 385. laws 386. naught 387. keys 388. door 389. friendly 390. vanish 391. rice 392. nation 393. trow 394. admit 395. heaven's 396. homage 397. bills 398. aid 399. mean'st 400. pledge 401. stony 402. spoil 403. dearly 404. sole 405. commands 406. uncontroll'd 407. jar 408. injuries 409. arriv'd 410. passionate 411. lead 412. humbly 413. company 414. argues 415. mere 416. chiefest 417. 'twixt 418. want 419. groom 420. slack 421. work 422. walking 423. rob 424. patiently 425. higher 426. banks 427. swell 428. pity 429. equally 430. discharge 431. ocean 432. write 433. subscribe 434. entreat 435. waits 436. violent 437. bound 438. regiment 439. equal 440. titles 441. create 442. poor 443. exile 444. know'st 445. parley 446. threaten 447. stay 448. 'tis 449. offend 450. cross 451. spite 452. parliament 453. worship 454. poverty 455. multitude 456. exil'd 457. banquet 458. shut 459. hanged 460. clothes 461. whipt 462. rogue 463. shirt 464. god-a 465. monastery 466. revealed 467. lived 468. lodging 469. intolerable 470. batter 471. island 472. sauced 473. juice 474. broth 475. proverb 476. rock 477. carried 478. neatly 479. reveal 480. counting-house 481. lik'st 482. locks 483. requisite 484. laughed 485. chambers 486. 'faith 487. usurer 488. physic 489. gallery 490. thou'lt 491. 'scape 492. critical 493. sessions 494. afire 495. doors 496. sacrifice 497. diamond 498. barefoot 499. market-place 500.
bullets

APPENDIX 5
About us section on the interface of ALTXA
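
Note on Appendices 3 and 4. As an illustration of how a stop list and a ranked list of Zeta markers relate to each other, the following Python sketch may be helpful. It is not the implementation used in the thesis. It assumes Craig's variant of Zeta, in which the score of a word is the proportion of one author's text segments that contain it plus the proportion of the other author's segments that lack it; the formula actually applied is the one given in Section 4.5.5, which is not reproduced here. The function names (`tokenize`, `zeta_scores`, `top_markers`) and the toy segments are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word forms, keeping apostrophes and hyphens."""
    return re.findall(r"[a-z'\-]+", text.lower())


def zeta_scores(base_segments, counter_segments, stop_words):
    """Score every non-stop word, assuming Craig's Zeta (an assumption, see the lead-in).

    score = (share of base segments containing the word)
          + (share of counter segments NOT containing the word)
    Scores range from 0.0 to 2.0; high scores mark the base author.
    """
    if not base_segments or not counter_segments:
        raise ValueError("both segment lists must be non-empty")
    stop = set(stop_words)
    # Each segment is reduced to the set of word types it contains, minus stop words.
    base_sets = [set(tokenize(s)) - stop for s in base_segments]
    counter_sets = [set(tokenize(s)) - stop for s in counter_segments]
    vocabulary = set().union(*base_sets, *counter_sets)
    scores = {}
    for word in vocabulary:
        in_base = sum(word in s for s in base_sets) / len(base_sets)
        absent_in_counter = sum(word not in s for s in counter_sets) / len(counter_sets)
        scores[word] = in_base + absent_in_counter
    return scores


def top_markers(scores, n):
    """Return the n highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


if __name__ == "__main__":
    # Toy segments, invented for illustration only (not taken from the corpus).
    base = ["the duke and the king", "god save the duke", "a royal duke"]
    counter = ["gold and wealth for the governor", "the king wants gold", "money and gold"]
    stop_list = {"the", "and", "a", "for"}
    print(top_markers(zeta_scores(base, counter, stop_list), 3))
```

Swapping the two segment lists yields the markers of the other author, which is how two separate lists, such as the Shakespearean and Marlowian ones in Appendix 4, could be obtained under this assumption. Words on the stop list never receive a score and therefore cannot appear on either list.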