Detección de vulnerabilidades de código en C y C++ mediante Redes Neuronales Recurrentes [Detection of Code Vulnerabilities in C and C++ Using Recurrent Neural Networks]
Publication date
2023
Abstract
Vulnerability detection has become critical for every organization, because an increasingly connected world demands a constant stream of new applications to meet users' needs. Static code analysis is a fundamental check that organizations should perform to verify the robustness of their code before putting a new product into production; it helps minimize security holes and thereby protect the organization's data. However, code analysis is currently carried out semi-automatically with commercial tools that stand out for their high precision but are applied at a stage when the code is almost fully developed, which leaves room for human error and for vulnerable pieces of software to be overlooked. Artificial intelligence could help produce increasingly secure code thanks to its capacity to process large amounts of information. Recurrent Neural Networks (RNNs) are a type of deep learning model that can capture patterns in sequential data, which makes them suitable for analyzing source code, itself structured as sequences of instructions and symbols. In the context of vulnerability detection, RNNs can take context into account, identify complex patterns, adapt to variations in each developer's programming style, and learn from large datasets, so that they can make correct predictions on files containing similar vulnerabilities. This final degree project focuses on the design of a model based on a Recurrent Neural Network, specifically a Long Short-Term Memory (LSTM) network, that aims to detect vulnerabilities in C/C++ code fragments.
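To make concrete the idea that source code is a sequence of symbols a recurrent model can read, the following minimal sketch turns a C fragment into a fixed-length sequence of integer ids. It is only an illustration and is not taken from the project: the tokenizer regex, the reserved `PAD`/`UNK` ids, and the helper names `tokenize`, `build_vocab`, and `encode` are all assumptions made for this example.

```python
import re

# Hypothetical tokenizer (an assumption, not the project's preprocessing):
# identifiers/keywords, integer literals, a few multi-character operators,
# then any remaining single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|->|\+\+|--|==|!=|<=|>=|&&|\|\||\S")

PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens (assumed convention)


def tokenize(code):
    """Split a C/C++ fragment into a flat list of token strings."""
    return TOKEN_RE.findall(code)


def build_vocab(snippets):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for snippet in snippets:
        for tok in tokenize(snippet):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(code, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(code)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# A classic unchecked copy into a fixed-size buffer, used as sample input.
snippet = "char buf[8]; strcpy(buf, input);"
vocab = build_vocab([snippet])
encoded = encode(snippet, vocab, max_len=16)
print(tokenize(snippet))
print(encoded)
```

A padded id sequence of this kind is the usual input to an embedding layer followed by an LSTM and a binary classifier (vulnerable or not); the network itself is omitted here because the record does not specify its architecture.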
Description
Final degree project for the Double Degree in Computer Science Engineering and Business Administration and Management, Facultad de Informática UCM, Departamento de Ingeniería de Software e Inteligencia Artificial, academic year 2022/2023.