Técnicas de Big Data y Machine-Learning para Recomendador Bibliográfico (Big Data and Machine Learning Techniques for a Bibliographic Recommender)

This Master’s thesis proposes a way to improve book recommender systems. The first chapter reviews the main types of recommender systems, and the second covers natural language processing. The third and fourth chapters present the problem, our improvement hypothesis, and its implementation. The central idea is to build a classifier that assigns books to genres such as science fiction, historical fiction, and crime. This classification is the basis for an outline of a book recommender system that makes recommendations from users’ thematic (genre) profiles. To cope with the large size of the text data, the approach relies on Zipf’s Law as a key component.
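The abstract does not say how Zipf’s Law is applied or which classifier is used. The following is a hypothetical sketch of the pipeline it outlines, using only the Python standard library. It is not the thesis’s implementation. The sketch:

- estimates the Zipf exponent of the corpus’s rank-frequency curve;
- shrinks the vocabulary with Luhn-style cut-offs on that curve, dropping the few most frequent words and the long tail of rare ones;
- assigns genres with a nearest-centroid cosine classifier;
- ranks unread books by the share of the user’s reading history that falls in their predicted genre.

All function names, the cut-off values `skip_top` and `min_count`, and the toy texts are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lowercase and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def zipf_exponent(counts):
    """Least-squares slope of log(frequency) against log(rank), negated.
    Zipf's law predicts a value near 1 on large natural-language corpora.
    Requires at least two distinct words."""
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return -cov / var


def luhn_vocabulary(counts, skip_top=2, min_count=2):
    """Luhn-style cut-offs on the rank-frequency curve: drop the `skip_top`
    most frequent words (mostly function words) and every word seen fewer
    than `min_count` times (the long Zipfian tail)."""
    ranked = [word for word, _ in counts.most_common()]
    return {word for word in ranked[skip_top:] if counts[word] >= min_count}


def vectorize(tokens, vocab):
    """Term-frequency vector restricted to the pruned vocabulary."""
    return Counter(t for t in tokens if t in vocab)


def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labelled_texts, vocab):
    """Nearest-centroid training: sum the TF vectors of each genre.
    Cosine similarity ignores vector length, so a sum works like a mean."""
    centroids = {}
    for text, genre in labelled_texts:
        centroids.setdefault(genre, Counter()).update(vectorize(tokenize(text), vocab))
    return centroids


def classify(text, centroids, vocab):
    """Assign the genre whose centroid is most cosine-similar to the text."""
    vec = vectorize(tokenize(text), vocab)
    return max(centroids, key=lambda genre: cosine(vec, centroids[genre]))


def recommend(read_texts, catalogue, centroids, vocab, k=2):
    """Build the user's thematic profile (share of read books per predicted
    genre) and rank catalogue books by their predicted genre's share.
    Assumes a non-empty reading history."""
    profile = Counter(classify(t, centroids, vocab) for t in read_texts)
    total = sum(profile.values())
    scored = [(profile[classify(text, centroids, vocab)] / total, title)
              for title, text in catalogue.items()]
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep catalogue order
    return [title for _, title in scored[:k]]


# Hypothetical toy data standing in for real labelled books.
training = [
    ("the ship left the planet and the robot fired the laser at the alien", "science fiction"),
    ("the robot and the alien fled the ship near the planet", "science fiction"),
    ("the detective found the body and the police questioned the suspect", "crime"),
    ("the police and the detective arrested the suspect for the murder", "crime"),
]
counts = Counter(tok for text, _ in training for tok in tokenize(text))
vocab = luhn_vocabulary(counts)
centroids = train_centroids(training, vocab)

read_texts = ["the detective met the police", "the suspect fled from the police",
              "the robot repaired the ship"]
catalogue = {
    "Star Voyage": "the alien robot boarded the ship",
    "Dark Alley": "the detective and the police chased the suspect",
    "Moon Base": "a planet a ship and a robot",
}
print(sorted(vocab))
print(recommend(read_texts, catalogue, centroids, vocab))  # ['Dark Alley', 'Star Voyage']
```

In this toy run, two of the three books the user read are classified as crime, so the crime title ranks first. Word frequencies fall off roughly as a power law, and in real corpora a large share of the distinct words occur only once or twice. For that reason, the lower cut-off alone typically removes much of the vocabulary. On real data, both cut-offs would need to be tuned.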
Grade: 9.3 (out of 10)