RT Generic
T1 Testeando LLMs
T2 Testing LLMs
A1 Ramos González, Gonzalo
A1 Sampedro Mate, Marta
A1 De Hoyos Pino, Javier
AB Este proyecto se centra en el desarrollo de un sistema diseñado para evaluar y comparar la eficiencia de LLMs (modelos de lenguaje de gran tamaño), como GPT o Cohere, que son los utilizados en este proyecto. El objetivo principal fue crear una herramienta que permita interactuar con un LLM en prueba y utilice un LLM de referencia para evaluar sus respuestas. La herramienta cuenta con funcionalidades que permiten al usuario ajustar la dificultad y la temática de las preguntas, adaptando así la evaluación a diferentes necesidades y contextos. Este sistema nos aporta una herramienta útil para la evaluación de diferentes LLMs, que además puede utilizarse en otro tipo de estudios relacionados con la inteligencia artificial.
AB This project focuses on developing a system designed to evaluate and compare the efficiency of Large Language Models (LLMs), such as GPT and Cohere, which are the models used in this project. The main goal was to create a tool that lets the user interact with an LLM under test while a reference LLM evaluates its responses. The tool lets the user adjust the difficulty and topic of the questions, tailoring the evaluation to different needs and contexts. The resulting system is a useful tool for evaluating different LLMs, and it can also support other kinds of studies related to artificial intelligence.
YR 2024
FD 2024
LK https://hdl.handle.net/20.500.14352/106077
UL https://hdl.handle.net/20.500.14352/106077
LA spa
NO Trabajo de Fin de Grado en Ingeniería del Software, Facultad de Informática UCM, Departamento de Sistemas Informáticos y Computación, Curso 2023/2024
DS Docta Complutense
RD 7 abr 2025