Estadística multivariante aplicada al análisis y predicción de partidos de fútbol en las principales ligas europeas

Universidad Politécnica de Madrid
The purpose of this study is to analyse differences in the main game-related statistics across the major European leagues and to identify which factors are most decisive when predicting the result of a match. To do so, we use multivariate statistical techniques, including principal component analysis and logistic regression. The first two principal components explain around 70 % of the variance, and an accuracy of over 70 % is obtained when predicting away-team wins using these two principal components as predictor variables. The study also shows that matches in the English Premier League are less balanced.
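The two-step approach named in the abstract (principal component analysis followed by logistic regression on the first two components) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the six columns stand in for unspecified game-related statistics, and the helper names `pca_scores`, `fit_logistic` and `predict` are hypothetical. It is not the study's data, code or result, and the accuracy it computes is in-sample only.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Standardize X and project it onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return Z @ eigvecs[:, :n_components], explained

def fit_logistic(S, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent (intercept first)."""
    A = np.column_stack([np.ones(len(S)), S])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))        # predicted probabilities
        w -= lr * A.T @ (p - y) / len(y)          # gradient of the mean log-loss
    return w

def predict(S, w):
    """Classify as 1 when the predicted probability exceeds 0.5."""
    A = np.column_stack([np.ones(len(S)), S])
    return (A @ w > 0).astype(int)

# Synthetic stand-in data (assumption): two latent factors drive six statistics.
rng = np.random.default_rng(0)
n = 400
strength = rng.normal(size=n)                     # hypothetical away-team dominance
tempo = rng.normal(size=n)                        # hypothetical second factor
X = np.column_stack(
    [strength + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [tempo + 0.3 * rng.normal(size=n) for _ in range(3)]
)
y = (strength + 0.5 * rng.normal(size=n) > 0.5).astype(float)  # 1 = away win

scores, explained = pca_scores(X, n_components=2)
weights = fit_logistic(scores, y)
accuracy = float((predict(scores, weights) == y).mean())

print(f"variance explained by 2 components: {explained:.2f}")
print(f"in-sample accuracy for away wins:   {accuracy:.2f}")
```

Using only the component scores as predictors keeps the classifier low-dimensional and avoids collinearity among the raw statistics. A held-out test set would be needed before reading the accuracy as predictive performance.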