NavigAItor: An LLM recommender tool based on a description of the application domain
Publication date: 2024
Abstract
Natural Language Processing (NLP) has been a technological challenge for decades. This study evaluates the performance of several large language models (LLMs) on specific NLP use cases, with the primary objective of developing a virtual assistant, NavigAItor, that offers recommendations based on the study's findings. Models from OpenAI, LLAMA, and Mistral are identified and compared in two contexts: the analysis of job interviews and the analysis of phone calls. The evaluation combines a questionnaire, in which users subjectively rate the outputs each model generates for each use-case task, with latency measurements of the time each model takes to execute those tasks. In addition, other relevant metrics were investigated in depth, such as performance on standardized benchmarks and price per token. The results aim to guide developers and AI practitioners in selecting language models for real-world applications, contributing to the advancement of NLP and improving the effectiveness of deployed solutions.
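As a rough illustration of the latency and price-per-token measurements described above, the Python sketch below times a single model call and estimates its token cost. The run_task stub, the model names, and the price table are placeholders invented for this example; they are not the thesis's actual evaluation harness or real provider pricing.

    import time

    # Hypothetical price table (USD per 1K output tokens); real provider
    # pricing is not stated in the abstract and changes over time.
    PRICE_PER_1K_TOKENS = {
        "openai-gpt": 0.03,
        "llama": 0.0,      # e.g. a self-hosted model with no per-token fee
        "mistral": 0.002,
    }

    def run_task(model: str, prompt: str) -> str:
        """Stand-in for a provider-specific API call; replace with a real client."""
        time.sleep(0.1)  # simulate network round-trip plus inference time
        return f"[{model}] summary of: {prompt[:40]}..."

    def evaluate(model: str, prompt: str, output_tokens: int) -> dict:
        """Measure wall-clock latency of one task and estimate its token cost."""
        start = time.perf_counter()
        run_task(model, prompt)
        latency = time.perf_counter() - start
        cost = PRICE_PER_1K_TOKENS[model] * output_tokens / 1000
        return {"model": model, "latency_s": round(latency, 3), "cost_usd": cost}

    if __name__ == "__main__":
        transcript = "Interviewer: Tell me about a project you led..."
        for model in PRICE_PER_1K_TOKENS:
            print(evaluate(model, transcript, output_tokens=500))

In a real comparison, run_task would wrap each provider's client, and each task would be repeated enough times to average out network variance before latencies are compared across models.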
Description
Undergraduate thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática, Universidad Complutense de Madrid (UCM), Department of Software Engineering and Artificial Intelligence, academic year 2023/2024.