Bagley

Automated tool for reconnaissance and vulnerability detection in Bug Bounty environments

Report submitted for the final degree project (Trabajo de Fin de Grado)
By Víctor Fresco Perales
Degree in Computer Engineering (Grado en Ingeniería Informática)
Supervised by José Luis Vázquez-Poletti
Facultad de Informática, Universidad Complutense de Madrid
Madrid, 2021–2022

Abstract

Bug Bounties are monetary rewards that companies pay to independent security researchers when they successfully find and report an exploitable vulnerability. A bounty for a critical vulnerability in a big company can reach the equivalent of a year’s salary in Spain, and this amount is not defined by the complexity of the bug, but by its impact. This means that bugs that are very simple to find and exploit, but that affect critical infrastructure, can yield a very large reward if the person who finds them is in the right place at the right time.

The goal of this project is to build and maintain an automated tool that runs on its own in a Virtual Private Server and is able to perform reconnaissance and detect these simple vulnerabilities in a target. It also implements a communication interface over Discord, so that the researcher can operate it at any moment from any device and find out immediately if something is discovered, which makes it well suited to assisting bug hunters.

Acknowledgements

Thanks to my parents, Pilar and Gelo, for always supporting me unconditionally. To my grandparents, Isabel and Felipe, for having worked more than a lifetime for all of us. To my aunt Vivi for teaching me how to enjoy things, and to my uncle Emilio for teaching me to see beyond them. Thanks to Patricia for putting up with me by her side throughout the whole process. To my friends, to whom I would dedicate much more than a simple paragraph of acknowledgements, for always being there.
Special thanks to José Luis Vázquez-Poletti and David Pacios Izquierdo for giving me the motivation I needed to set out on a path in computer security.

About TEFLON X

Teflon X (CC0 1.0 (documentation), MIT (code)) is a LaTeX template created by David Pacios Izquierdo in January 2018, released with CC0 terms of use. This template was developed to make it easier to create professional documentation for bachelor's theses (Trabajos de Fin de Grado), master's theses (Trabajos de Fin de Máster) or doctoral theses. The version used is X V:X Overleaf V2 with XeLaTeX, margin 1in, bib.

Contact
Author: David Pacios Izquierdo
Email: dpacios@ucm.es
ASCII: ascii@ucm.es
Office 110, Facultad de Informática

Contents

1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Document Structure
2 State of the Art
  2.1 Vulnerabilities
  2.2 Bug Bounties
    2.2.1 Methodology
    2.2.2 Automation
  2.3 Remote management
  2.4 Similar software
3 Architecture
  3.1 Database
  3.2 Main Script
    3.2.1 Database connection
    3.2.2 Entities
    3.2.3 Modules
    3.2.4 Discord connection
    3.2.5 Controller
  3.3 Deployment
  3.4 Virtual Private Server
4 Use Case
5 Conclusions and Future Work
Bibliography and reference links
A Operator Manual
  A.1 Deployment
  A.2 Configuring Discord
  A.3 Usage

Chapter 1 Introduction

1.1 Motivation

Nowadays, every medium-sized or large company must develop software at some point in its life, whether it is just a static web page or a complex multi-platform application running on millions of devices. These companies usually already have security experts who audit their code and recommend best practices to stay safe. However, the security field is very wide, and attackers only need a single mistake from the developers to inflict great harm. For this reason, some of these companies are willing to pay large amounts of money to anyone who finds and reports a vulnerability that could be exploited by a malicious actor in any of their products.

These payments are known as Bug Bounties, and they have spread so much in recent years that there are entire platforms dedicated to connecting companies and security researchers, acting as intermediaries between them. The best-known ones are HackerOne, Bugcrowd and Intigriti.

The amount paid by these companies usually varies between $50 and $10,000 per vulnerability, depending mostly on its impact and on the company itself¹, but there have been cases of much bigger amounts.
Many of these vulnerabilities are caused by very complex bugs in the code of the applications, which require days or even months of extensive research and a lot of previous knowledge of the technologies behind them. However, there are others which are not difficult to find or exploit at all, and only require the researcher to be in the right place at the right time to discover them. These vulnerabilities are known as low-hanging fruit and are usually found by the first person who pays attention to the place where the bug is. Some examples are subdomain takeovers, data leaks or even some easy SQL injections in URL parameters or POST data.

An important aspect to consider when doing Bug Bounties is reconnaissance, commonly known as recon. It is the first phase of the research process when targeting one of these companies, at least those offering web services, and it involves listing all available assets of a target, finding hidden services, etc. It is impossible to do this manually, so researchers use automated tools for it. It is crucial to perform reconnaissance in depth, since it expands the surface on which bugs can be found. It can lead to finding components that are not as exposed as others, so developers do not put as much effort into securing them and they can contain critical bugs. There is also a possibility of discovering something that should not be publicly accessible, such as an administration panel or API keys, which would also be considered low-hanging fruit.

¹ https://www.hackerone.com/resources/reporting/hacker-powered-security-report-industry-insights-21
Most of the reconnaissance and of the search for this low-hanging fruit can be automated to run 24 hours a day on a Virtual Private Server and notify the researcher when something interesting is found, so that he or she can focus on analyzing these results and finding more complex bugs. The payments received from reporting these bugs are expected to be greater than the cost of maintaining the tool (mainly the cost of the Virtual Private Server), so it can be a profitable project in the long run.

The name comes from the popular video game saga Watch Dogs, specifically Watch Dogs: Legion, in which the main characters are assisted by an AI named Bagley, which helps them along their adventures. Although this project does not contain any AI, we thought it was a good name.

1.2 Goals

The main goal of this project is to build and maintain an automated tool that can perform basic reconnaissance and vulnerability detection tasks on a set of given targets without any supervision, and notify the operator when a potential vulnerability is found. This project focuses only on web applications, so targets will be domains or subdomains which the operator has permission to test. The ultimate goal is to find as many vulnerabilities as possible with very little interaction from the person operating it.

The tool must be able to:

• Run indefinitely and independently from the operator, for example, in a VPS (Virtual Private Server).
• Communicate with the operator via an existing messaging app. It must be able to send different notifications for different events (errors, logs, vulnerabilities found, etc.) and the operator must be able to send predefined commands to perform basic actions in the tool (start, stop, add target, etc.).
• Crawl a target to find as much content as possible, imitating a real user. It must be able to render JavaScript so that modern web applications are crawlable too.
• Perform content discovery with alternative methods such as brute forcing or consulting third-party sources.
• Look for known vulnerabilities in the technologies used by the targets.
• Check if any subdomain can be claimed from an external hosting provider (subdomain takeover).
• Discover simple authorization bypasses.
• Analyze client-side JavaScript looking for vulnerabilities and endpoints.
• Analyze responses looking for patterns in order to find credentials or API keys.
• Discover injection vulnerabilities such as SQL injection, XSS or SSTI by brute-forcing parameters in an efficient way.

Some of these goals can be accomplished with existing free, open-source applications maintained by big communities. In those cases, the task will be to make them work together as if they were one big application.

1.3 Document Structure

The first chapter is the Introduction, which contains the motivation behind this project, its goals and the structure of the document, which is this section. The second chapter is the State of the Art, which presents every aspect surrounding this project. It contains a section explaining what a vulnerability is, a section digging into what Bug Bounties are, a section enumerating the different options available for remote management and a section about other applications related to this project. The third chapter explains everything about the architecture of the system, including its database, its main script, its deployment and the Virtual Private Server in which it is hosted. The fourth chapter presents the use case for the system and the fifth chapter explains the conclusions obtained from the project and its future. Lastly, there is an annex detailing how to use the system.

Chapter 2 State of the Art

2.1 Vulnerabilities

This project will deal extensively with vulnerabilities, so it is important to give them a formal definition before anything else.
A vulnerability is a weakness in an application that allows a malicious actor to perform unpermitted actions or gain access to information they should not otherwise be allowed to see [1]. A bug is not strictly the same as a vulnerability, since a bug can be a simple error with no security implications. However, the two terms will be treated as synonyms in this document, due to the wide usage of the word bug to refer to a vulnerability in the Bug Bounty world.

2.2 Bug Bounties

In order to better understand the goals and limitations of this project and the technologies surrounding it, we have to extend the earlier definition of Bug Bounties and explain some key concepts about them.

As previously mentioned, Bug Bounties are payments that security researchers receive when finding and reporting a valid bug or vulnerability that could be exploited by a malicious actor in a company’s service¹. This obviously includes web applications, but also desktop applications, mobile applications, smart contracts deployed on a blockchain, physical devices such as IoT devices, etc. Security researchers who are dedicated to this are usually called bug hunters.

Companies interested in offering this kind of reward may run a Bug Bounty program on their own (e.g. Google or Apple) or a program hosted on an independent platform, such as HackerOne, Bugcrowd or Intigriti (e.g. PayPal, IBM, Twitter, Uber...). These platforms help companies connect with researchers and vice versa, while also ensuring that everybody is treated fairly and that researchers receive the reward they deserve. They also provide transparency, allowing everybody to see very precise information about the programs, such as rewards paid in the last 90 days or average response time, and about the researchers, such as vulnerabilities found in the last 90 days or percentage of valid reports.
Companies can also enroll in these platforms with private Bug Bounty programs, which require an invitation to see their statistics, test their services and report vulnerabilities.

¹ https://www.hackerone.com/vulnerability-management/what-are-bug-bounties-how-do-they-work-examples

The amount of the payments varies greatly depending on the impact of the vulnerability and the company offering the bounty. The impact can be defined as the potential damage that a malicious actor can do to the company or its customers if the vulnerability is successfully exploited. HackerOne measures the impact as low, medium, high and critical, while Bugcrowd measures it as P1, P2, P3..., with P1 being the highest, but the idea is essentially the same. An example of a critical impact vulnerability is a Remote Code Execution in the backend server or an account takeover, while a low impact vulnerability can be an Open Redirect or a Captcha bypass.

In 2021, the average bounty paid by software companies for critical vulnerabilities was $7,000, while for low vulnerabilities it was less than $200². As an example, GitLab, which has one of the biggest programs in HackerOne, pays from $20,000 to $35,000 for critical vulnerabilities. Besides that, if a vulnerability is considered critical enough, the bounty will usually be bigger than what the company normally pays for other critical vulnerabilities. That is why it is important for a researcher to demonstrate the impact with easy-to-understand reports or Proof of Concept videos.

However, not every program offers bounties. Some programs offer some kind of recognition, such as a signed certificate or points inside a platform, or even company products, such as clothes or stickers (commonly known as swag).

Another important concept in Bug Bounties is the scope.
It is the set of services or assets that the company allows researchers to test, and therefore those for which it is willing to pay a bounty when a vulnerability affects them. If a researcher reports a vulnerability affecting a service which is out of scope, the program is not required to pay a bounty. In the case of web applications, the scope is usually given as a list of domains or subdomains, or even complete URLs. Sometimes, depending on the program, certain types of vulnerability are also out of scope. They follow the same logic: if a researcher reports an out-of-scope vulnerability, the program is not required to pay a bounty. Some examples are self XSS or a misconfiguration in the security attributes of a cookie.

Finally, each program has its own set of requirements, which are rules that the researcher must follow to participate in it. They define what the bug hunter is allowed to do and under what circumstances. If a valid vulnerability is reported but these requirements are not met, the program may choose not to pay the bounty, the same as with the scope. Some of the most common requirements are to limit the number of requests per second or to include special headers or cookies so that the company can tell which requests come from a researcher. Some programs do not allow automated tools at all.

A bug hunter has many options to choose from when doing Bug Bounties. However, this project will focus only on hunting for bugs in web applications. It is also important to mention that the author has more experience in HackerOne than in any other platform, so some concepts may be closely tied to it.
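The two most common program requirements mentioned above, a cap on requests per second and mandatory identifying headers, can be illustrated with a minimal Python sketch. It is not the implementation used in this project: the names `ProgramPolicy` and `Throttle`, and the header `X-Bug-Bounty`, are hypothetical and chosen only for illustration. The sketch performs no network traffic; it only computes how long to wait and which headers to attach.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ProgramPolicy:
    """Hypothetical container for the traffic rules a program imposes."""
    max_requests_per_second: float
    required_headers: Dict[str, str] = field(default_factory=dict)


class Throttle:
    """Spaces out calls so they never exceed the policy's request rate."""

    def __init__(
        self,
        policy: ProgramPolicy,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.policy = policy
        self._clock = clock
        self._sleep = sleep
        self._min_interval = 1.0 / policy.max_requests_per_second
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until enough time has passed since the previous request."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

    def headers(self, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        """Merge request headers; the program's required headers take priority."""
        merged = dict(extra or {})
        merged.update(self.policy.required_headers)
        return merged


# Example: at most 2 requests per second, with an assumed identifying header.
policy = ProgramPolicy(2.0, {"X-Bug-Bounty": "researcher"})
throttle = Throttle(policy)
```

A caller would invoke `throttle.wait()` before each request and pass `throttle.headers(...)` to whatever HTTP client is in use. The clock and sleep functions are injectable so that the spacing logic can be checked without actually waiting.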
² https://www.hackerone.com/resources/reporting/hacker-powered-security-report-industry-insights-21

2.2.1 Methodology

When hunting for bugs, each researcher has their own methodology, and each vulnerability has a unique path that the bug hunter has followed in order to find it. However, there are some common practices that the vast majority of researchers use to find those bugs. They do not guarantee finding anything, but they narrow the process and make it easier. This subsection explains these common practices, to later understand better how automated tools can really assist the bug hunter.

Once the researcher has chosen a target, which, as already said, can be a domain, a set of subdomains or complete URLs, the first step is to manually explore the web application. This basically means using it as a regular user: creating an account if an authentication system is implemented, using all available services, etc. The main goal is to learn how the application is designed, what its main functionalities are and how the developers expect the users to interact with it. This allows the researcher to understand how the application was built and where the developers might have missed something that can potentially be exploited, that is, a bug or a vulnerability.

While the researcher makes this initial contact, he or she will be looking at the requests that the browser is sending and the responses produced by the server. This is done with a proxy that sits between the browser and the server on the researcher's machine, such as the one integrated in Burp Suite or the one in OWASP ZAP. It lets the bug hunter analyze the cookies being sent, the headers, the type of requests, etc., but also intercept and change those requests, repeat them...
When something unusual happens, such as an unsigned cookie specifying the username, or many requests made to an endpoint resulting in strange errors, the researcher focuses on that part of the application to check if something can be exploited.

Often, if one of those unusual behaviors looks like it could be the product of a widely known vulnerability, such as an SQL error, which might mean there is an SQL injection, the researcher can use automated tools to try to exploit it and save time. The application may be protected by a WAF, in which case the researcher needs to work on a bypass for it, but if that is not the case, the tool will succeed and will probably take less time than the researcher. There are very powerful tools to exploit widely known vulnerabilities that do not require complex bypasses, such as sqlmap for SQL injections, dalfox for XSS and many more.

When everything has been inspected and nothing remarkable to focus on has been found, the researcher can still expand the area in which to look for bugs. This is called recon, short for reconnaissance. One of the methods to do recon is content discovery, which consists of finding resources that are not accessible just by navigating the application, for example, a subdomain that is used by employees but is not directly linked from the main web application. This is usually done by brute-forcing or by consulting third-party sources, such as Shodan or the Wayback Machine. Content discovery is usually aimed at finding subdomains and URL paths, and can uncover very interesting resources, such as functionalities that are no longer maintained but still affect the company, so they are likely to contain impactful bugs, or even content that is not supposed to be publicly accessible, such as credentials or administrator panels. To perform this technique, researchers use automated tools such as gobuster or ffuf for brute-forcing, or gau for consulting third-party sources.
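To make the brute-forcing side of content discovery more concrete, the following Python sketch shows only the candidate-generation step: combining a base URL with a wordlist and a set of extensions, and skipping URLs that are already known. It is an illustration under stated assumptions, not the code of gobuster, ffuf or this project; the function name `candidate_urls` and its parameters are hypothetical. It makes no requests: in practice each candidate would then be requested by a tool such as the ones named above, within the limits set by the program.

```python
from typing import Iterable, Iterator


def candidate_urls(
    base_url: str,
    words: Iterable[str],
    extensions: Iterable[str] = ("",),
    known: Iterable[str] = (),
) -> Iterator[str]:
    """Yield URLs to probe, built from a wordlist, skipping known or repeated ones."""
    seen = set(known)
    extensions = tuple(extensions)
    # Make sure the base ends with exactly one slash before appending words.
    base = base_url if base_url.endswith("/") else base_url + "/"
    for word in words:
        word = word.strip().lstrip("/")
        # Ignore blank lines and comment lines, which are common in wordlists.
        if not word or word.startswith("#"):
            continue
        for ext in extensions:
            url = base + word + ext
            if url not in seen:
                seen.add(url)
                yield url


# Example usage with a tiny in-memory wordlist.
for url in candidate_urls("https://example.com/app", ["admin", "login"], extensions=("", ".php")):
    print(url)
```

Keeping generation separate from the actual requests makes it easy to deduplicate candidates against what the crawler has already found before any traffic is sent to the target.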
All the techniques explained so far share a disadvantage, which is very common in cybersecurity: they all belong to black box testing, which means that they do not look at the internal workings of the technology being tested, so the researcher never knows for sure how the web application works in the backend. In contrast, white box testing means actually looking at the internal workings of the technology being tested [2], for example, through code analysis. In Bug Bounty environments, programs may have the backend code of the web application available as open source, although that is not common. Nonetheless, every modern web application runs client-side JavaScript, which can be inspected and even debugged. Reviewing this code can reveal many client-side vulnerabilities, such as DOM-based XSS or Open Redirects. A researcher can even find new endpoints in the web application, credentials, API keys, etc.

2.2.2 Automation

As already discussed, vulnerabilities can have different impacts, depending on the potential harm they can produce. However, this impact is not directly related to the difficulty of finding and exploiting them. This means that a critical vulnerability does not have to be hidden deep in the code, protected by multiple layers of security. Some very critical ones can be in plain sight, waiting to be discovered by the first researcher who focuses on that part of the application. An example of this was the CVE-2021-44228 vulnerability, commonly known as Log4Shell. This bug was one of the most impactful in recent years, due to the number of servers that were affected and its ease of exploitation [3], but the origin of the vulnerability itself was not obscure at all. There was even a conference talk, given a couple of years before it was discovered, about the same issue that produced this vulnerability in Java applications [4].
The easiest bugs to find are commonly known as low-hanging fruit or low-hanging vulnerabilities, and they only require the researcher to be the first one to report them. An example is the subdomain takeover, a type of vulnerability in which the researcher is able to claim a subdomain of a legitimate site, usually hosted on a third-party provider [1]. In this case, the researcher just has to notice that the subdomain is unclaimed and contact the provider in order to earn a bounty for a medium to high impact vulnerability. However, manual testing is often very slow, and it would be tedious for a single researcher to test every possible endpoint of an application. Low-hanging vulnerabilities may be easy to find, but they can be anywhere, so the difficulty lies in testing all potential points of failure.

The solution to this problem is automation. A researcher can create tools that perform all this tedious testing whenever there is no complex reasoning involved and the output is predictable. Vulnerability detection is not the only task that can be automated. The previous subsection described how researchers use existing tools to perform recon on a web application. These tools can be chained one after another so that the process is fully automated, even integrating the vulnerability detection. This automation allows the researcher to easily find low-hanging vulnerabilities without having to test everything manually, and to focus on more complex vulnerabilities that require some research and investigation.

2.3 Remote management

The most common way of managing a tool that is running on an external server, such as a VPS, is to connect directly to the server and then manage the tool from the inside. This connection is usually made over SSH (Secure Shell), which is a protocol for secure remote login over insecure networks³.
This is a good option if the operator wants to do maintenance, check the logs and state of the tool, modify its behavior through parameters, etc. However, sometimes the ideal situation is for the operator to be notified when a significant event happens, for example, an error in the tool or an important result. This is even more critical in the Bug Bounty environment, since there is always a possibility of another researcher finding the same vulnerability. In those cases, the first one to report it is usually the one who gets the bounty, so it is crucial for the researcher to know as soon as possible that a vulnerability has been discovered. This section explores the available options to implement a remote management system that fits this project.

A simple option to implement a management system with notifications is to use email. It would allow the tool to send emails to notify the operator of various events and to receive emails containing commands to manage it. To do so, many programming languages offer libraries to interact with the SMTP protocol, but an email address and a password must be hard-coded or provided every time the tool is initialized. A good alternative is to use a service like SendGrid, which is a cloud-based solution to send transactional and marketing emails⁴. It allows the use of API keys so that no password is needed, which adds extra security, since API keys can be easily revoked and regenerated and their access can be limited. The main inconvenience of this approach is that emails are not instantaneous and can sometimes be misidentified as spam. It would be tedious for the operator to manage the system while having to wait for the results of each command.

A better alternative to email is an instant messaging application that supports automation through an API. This is the case of Telegram, which is a free application that offers cross-platform, cloud-based instant messaging and file-sharing services, among others⁵.
It offers an API compatible with many programming languages so that the system can use it⁶. Since it is an instant messaging application, notifications would be delivered right away, and managing the system through commands would be a comfortable experience for the operator. Since the application is cross-platform, it can be used on any device, allowing the operator to use it anywhere.

³ https://datatracker.ietf.org/doc/html/rfc4251
⁴ https://sendgrid.com/
⁵ https://telegram.org/
⁶ https://core.telegram.org/

Although Telegram is already a valid solution, there are other options offering a few minor improvements, such as Discord or Slack. Both are platforms that allow users to communicate through instant messaging and VoIP, and also to send files, create servers with different channels, etc. The main difference between them lies in their original design goals: Discord was oriented to gaming environments, while Slack was oriented to the workplace⁷ ⁸. Both offer an API usable from many languages, which can be used to create bots or other kinds of automation on the platform. The main improvement that these applications provide over Telegram is the use of channels in a server. They can be used to classify notifications so that each channel corresponds to a different event. This way, channels with notifications regarding errors and findings can be active, while others dedicated to logs and to showing the state of the system can be muted. Discord was finally chosen for this project simply because the author uses Discord a lot, while he has never used Slack.

2.4 Similar software

As already stated, each researcher has a unique methodology, and although there are common practices, finding a bug is the result of making many decisions that depend greatly on the person behind the keyboard.
This means that building an automation tool for Bug Bounty environments is a very personal process, in which each researcher chooses to implement a different set of tools and techniques that he or she thinks will be more efficient for finding bugs.

Besides that, the set of techniques and methodologies implemented in an automation tool defines how the researcher will discover bugs in Bug Bounty programs and, as a result, earn money. It is easy to understand that researchers who successfully create and maintain their systems do not want to make them public, so that other hunters do not find and report the same vulnerabilities. Publishing the tool would result in the original developer earning less money than if it had remained private.

These two situations explain why there is not a single successful public tool with capabilities similar to those of this system. Although there are many recommendations for building automation, and many tools such as Nikto⁹ or Nuclei¹⁰ that can be used to scan entire web applications and look for some vulnerabilities, there is not a single tool that can do recon and vulnerability detection on a target, be managed remotely and be easily deployed anywhere.

⁷ https://discord.com/
⁸ https://slack.com/
⁹ https://github.com/sullo/nikto
¹⁰ https://github.com/projectdiscovery/nuclei

Chapter 3 Architecture

This chapter explains the architecture of this project. It consists of a main script written in Python that performs the recon and the vulnerability detection, and a MariaDB database which stores all the data produced, each running in a separate Docker container. These containers are deployed in a Linode VPS. The main script controls a Discord bot, through which it communicates with the operator. In case of an emergency or for maintenance, the operator can connect directly to the VPS via SSH.
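The interaction between the operator and the main script through Discord can be pictured with a small Python sketch. It is only an illustration and deliberately uses no Discord library: the channel names, the command syntax (`start`, `stop`, `add <target>`) and the names `channel_for`, `Command` and `parse_command` are assumptions made for this example, not the bot's actual interface. It shows two ideas from the text: routing each kind of event to its own channel, and turning an operator message into a predefined command.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from event kind to the Discord channel that receives it.
CHANNEL_FOR_EVENT = {
    "error": "errors",
    "log": "logs",
    "vulnerability": "findings",
}


def channel_for(event_kind: str) -> str:
    """Return the channel for an event, falling back to the log channel."""
    return CHANNEL_FOR_EVENT.get(event_kind, "logs")


@dataclass(frozen=True)
class Command:
    name: str
    argument: Optional[str] = None


def parse_command(message: str) -> Optional[Command]:
    """Parse an operator message into a Command, or None if it is not valid."""
    parts = message.strip().split(maxsplit=1)
    if not parts:
        return None
    name = parts[0].lower()
    argument = parts[1].strip() if len(parts) > 1 else None
    # 'start' and 'stop' take no argument in this sketch.
    if name in ("start", "stop") and argument is None:
        return Command(name)
    # 'add' requires the target (a domain or a group of subdomains).
    if name == "add" and argument:
        return Command(name, argument)
    return None


print(parse_command("add .example.com"))
print(channel_for("vulnerability"))
```

Rejecting anything outside the predefined set of commands keeps the bot's surface small, which matters because the messaging channel effectively controls a tool that sends traffic to third-party targets.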
Figure 3.1: Overall architecture of the project

3.1 Database

The database used is MariaDB, which is an open-source relational database. It was born as a fork of MySQL, in order to stay free and open-source, so it uses the same syntax as MySQL¹.

¹ https://mariadb.org/

The relational database is accessed directly with SQL instead of through an ORM (Object-Relational Mapper) for several reasons:

• The access to the database needs to be fast and reliable, taking into account that there are several threads writing and reading at the same time, and ORMs are not as efficient as directly accessing the database.
• The application uses objects and queries that may not work well with current ORMs due to their complexity, including some objects that combine storage in the database and in files.
• The author of this project was very comfortable with SQL databases but not with ORMs at the time of developing this project.

The main purpose of the database is to store a model that represents the targets of the tool. This model is built by the main script when doing recon and then queried by it to look for vulnerabilities or to do further recon. The database also stores some parameters used by the main script and the actual vulnerabilities found. At the conceptual level, this model can be represented as an Entity-Relationship model, as shown in the diagram in figure 3.2. It is composed of the following entities:

• Domain: Represents a domain or a group of subdomains that are in scope. Targets are specified to the tool as domains or groups of subdomains, so they are inserted here. Each domain has an ID, a domain name, a set of headers, a set of cookies and a set of excluded submodules. Groups of subdomains are stored with a leading dot, so, for example, the group of subdomains .example.com would contain subdomains such as www.example.com or api.example.com. If a port is specified, it is inserted as part of the domain name, for example example.com:80.
  The headers and cookies associated with a domain must be sent with every request made to it. For a group of subdomains, they also apply to every child subdomain inserted in the database after it. This is useful for scanning web applications that require authentication, or when a Bug Bounty program specifies headers or cookies that must be sent with every request. They are stored as references to header and cookie entities.

  The excluded submodules tell the main script which submodules not to execute for this domain. For a group of subdomains, the exclusion also applies to all of its children.

• Path: Represents a path to a resource inside a URL. Each path has an ID, a protocol, a domain, an element, a parent and a set of technologies. The protocol is the one used to access the resource, usually http or https. The domain is the domain in which the resource is located, and it is a reference to a domain entity. The element is the resource that the path points to. The parent is the directory that contains the resource, and it is a reference to another path entity. For example, for a path representing https://example.com/animals/cat, the element would be cat and the parent would be the path representing https://example.com/animals/. The set of technologies contains those used by the resource that the path represents. They are references to technology entities.

• Request: Represents a request that the browser has made to a website. Every request has an ID, a path, a method, parameters, some data, a response, headers and cookies. The path is the requested resource, and it is a reference to a path entity. The method is the HTTP method used, commonly GET or POST, although it can be one of many others[2]. The parameters of the request are the query of the URL, which is stored as a single string because, although URL queries usually consist of key/value pairs, the standard only specifies that the query must be a string[3].
  The data is what is sent as the body of the request when it is a POST request. The response, if any, is the response received back from the server, and it is a reference to a response entity. Headers and cookies represent those sent with the request, and both are references to the corresponding entities.

• Response: Represents a response received by the browser. Each response has an ID, a hash, a code, a body, a set of headers and a set of cookies. The hash is computed from the body and the code. It is used to check whether a response already exists in the database without having to compare the whole body and the code. The code is the HTTP response code received[4]. The body is the text included with the response, such as HTML or JSON. Headers and cookies represent those sent with the response, and both are references to the corresponding entities.

• Header: Represents an HTTP header. Each header has an ID, a name and a value.

• Cookie: Represents an HTTP cookie. Each cookie has an ID, a name, a value and every standard cookie attribute: domain, path, expires, maxage, httponly, secure and samesite[5].

• Script: Represents a client-side JavaScript file that is interpreted and executed by the browser. Each script has an ID, a hash, a set of paths and a set of responses. The hash is computed from the contents of the script itself. The paths are the locations of the script on the server, that is, the URLs at which the script can be found. They are references to path entities. The responses are all those in which the script is used. This includes scripts directly located between the