Extensión de CIRCOM con buses: expresividad y seguridad Adding buses to circom: expressiveness and security Trabajo de Fin de Grado Curso 2023–2024 Autor Lucas Cuesta Araujo Directores Clara Rodríguez Núñez Albert Rubio Gimeno Doble grado en Ingeniería Informática - Matemáticas Facultad de Informática Universidad Complutense de Madrid Extensión de CIRCOM con buses: expresividad y seguridad Adding buses to circom: expressiveness and security Trabajo de Fin de Grado en Ingeniería Informática Autor Lucas Cuesta Araujo Directores Clara Rodríguez Núñez Albert Rubio Gimeno Convocatoria: Junio 2024 Doble grado en Ingeniería Informática - Matemáticas Facultad de Informática Universidad Complutense de Madrid May 27, 2024 Acknowledgements To Clara Rodríguez and Albert Rubio, for the enthusiasm they have conveyed to me on the topic of this project. And to Miguel Isabel, for his patience and invaluable help. v Resumen Extensión de CIRCOM con buses: expresividad y seguridad A lo largo de este proyecto se ha desarrollado una extensión de circom, un len- guaje de dominio específico ampliamente extendido para la definición de circuitos aritméticos orientados a demostraciones de “conocimiento nulo” (protocolos crip- tográficos que permiten verificar la veracidad de un predicado sin revelar informa- ción sobre el mismo). El alcance de dicha extensión incluye la incorporación de sintaxis para poder definir conjuntos de señales personalizados llamados buses, los cuales hacen las veces de los structs de la mayoría de lenguajes de programación. Los buses permiten realizar el etiquetado de grupos de señales mediante tags, un elemento de circom gracias al cual se pueden anotar propiedades sobre las señales. Los tags funcionan como un tipado débil orientado a reforzar la corrección y seguri- dad de los circuitos aritméticos generados. Para añadir las nuevas funcionalidades mencionadas a circom se ha trabajado sobre un fork del compilador original con vistas a incorporar estos cambios a la versión oficial proximamente. También se han llevado a cabo mejoras sobre circuitos de la circomlib (la librería oficial del lenguaje circom) empleando los nuevos buses y tags de buses con el objetivo de mejorar la comprensión y seguridad de los programas. Estos circuitos actualizados serán incluidos en la nueva versión oficial de la circomlib. Además, se han real- izado pruebas de rendimiento con las que se ha comprobado que el uso de buses no tiene un impacto negativo en el tamaño de las pruebas generadas por circuitos de circom. Una versión extendida de este trabajo se tratará de publicar en una revista internacional de alto impacto, como IEEE Transactions on Dependable and Secure Computing. Palabras clave Circom, circomlib, compilador, pruebas de conocimiento nulo, circuitos aritméti- cos, criptografía, tags, buses, Rust. vii Abstract Adding buses to circom: expressiveness and security Throughout this project, an extension of circom has been developed. Circom is a widely used domain-specific language for defining arithmetic circuits aimed at zero-knowledge proofs (cryptographic protocols that allow the verification of a pred- icate’s truth without revealing information about it). The scope of this extension includes the incorporation of syntax to define custom signal sets called buses, which function similarly to the structs found in most programming languages. Buses allow the labelling of signal groups using tags, a feature of circom that enables the an- notation of properties on signals. Tags function as a weak typing mechanism aimed at reinforcing the correctness and security of the generated arithmetic circuits. To add the mentioned new functionalities to circom, work has been done on a fork of the original compiler in view of incorporating these changes into the official version soon. Improvements have also been made to circuits in the circomlib (the official library of the circom language) using the new buses and bus tags with the aim of improving the understanding and security of the programs. These updated circuits will be included in the new official version of the circomlib. Additionally, the performance tests conducted have shown that the use of buses does to not have a negative impact on the size of the proofs generated by circom circuits. An extended version of this work will be submitted for publication in a high-impact international journal, such as IEEE Transactions on Dependable and Secure Computing. Keywords Circom, circomlib, compiler, zero-knowledge proofs, arithmetic circuits, cryptog- raphy, tags, buses, Rust. ix Contents 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Work description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 In the following chapters . . . . . . . . . . . . . . . . . . . . . . . . . 7 2 State of Art 9 2.1 Zero-knowledge proofs . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Arithmetic circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Circom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.4 Circomlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5 The circom compiler . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3 Circom Development 21 3.1 Definition of buses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Implementation of buses: Parser . . . . . . . . . . . . . . . . . . . . . 23 3.3 Implementation of buses: AST . . . . . . . . . . . . . . . . . . . . . . 25 3.4 Implementation of buses: Static analyses . . . . . . . . . . . . . . . . 27 3.4.1 Symbol analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.2 Buses free of invalid statements . . . . . . . . . . . . . . . . . 28 3.4.3 Constant propagation . . . . . . . . . . . . . . . . . . . . . . . 29 3.4.4 Type analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.4.5 Unknown-known analysis . . . . . . . . . . . . . . . . . . . . . 32 4 Circomlib circuits 35 4.1 Binary numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 Elliptic curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5 Benchmarking 45 6 Conclusions and Future Work 49 6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 xi 6.2 Future lines of work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Bibliography 53 List of figures 1.1 Example of a circom program using tags. . . . . . . . . . . . . . . . 4 1.2 Example of how circom allows the addition of tags. . . . . . . . . . 4 1.3 Example of what circom tags cannot be used for. . . . . . . . . . . . 5 1.4 Example of a circom template using buses. . . . . . . . . . . . . . . 6 2.1 Representation of an arithmetic circuit C defined over the finite field F11 that outputs the result of the operation s1×s2×s3+s4×s5 mod 11. 13 2.2 Example to understand the purpose of tags in circom. . . . . . . . . 15 3.1 Template that checks if a number in binary format has an alias in Fp. 21 3.2 Bus type for binary numbers. . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Template that checks if a number in binary format has an alias in Fp using buses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.4 Example of type comparisons with buses. . . . . . . . . . . . . . . . . 31 3.5 Example of dot accesses in buses with different meanings. . . . . . . . 32 4.1 Template that checks if a number in binary format has an alias in Fp. 37 4.2 Template to transform a point in twisted Edwards form to its Mont- gomery form. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.3 Point bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4 Template to transform a point in twisted Edwards form to its Mont- gomery form using buses. . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.5 Modification of Edwards2Montgomery() solving the problem with the input point (0,−1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.6 Template to transform a point in Montgomery form to its twisted Edwards form using buses. . . . . . . . . . . . . . . . . . . . . . . . . 43 4.7 Modification of Montgomery2Edwards() solving the problem with the input point (0, 0). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 xiii List of tables 5.1 Test results on templates using the BinaryNumber bus type. . . . . . 47 5.2 Test results on templates using the Point bus type. . . . . . . . . . . 48 xv Chapter 1 Introduction We will use this first chapter to provide an overview of the project and its back- ground. The opening section will offer some context to introduce the underlying ideas behind the work done. Later, we will describe the objectives pursued and the tasks completed throughout the development of the project. 1.1 Motivation Cryptography is the study and practice of secure communication techniques in the presence of adversarial behaviour [1]. With the growing need to protect data, es- pecially with the widespread use of the Internet and the emergence of machine learning, new cryptographic methods are required to ensure robust security for all online activities taking place. In this regard, Zero-Knowledge protocols have been a significant breakthrough. A Zero-Knowledge (ZK) protocol is a cryptographic method that allows one party, called the prover, to convince another party, called the verifier, that a given statement is true without revealing any additional information beyond the state- ment’s validity [2, 3, 4]. ZK protocols must meet three essential criteria: soundness (ensuring that invalid proofs are accepted only with negligible probability), com- pleteness (guaranteeing that valid proofs are always accepted), and zero-knowledge (disclosing no information to the verifier beyond the fact that the statement is true). ZK protocols have become a favoured solution for boosting security and privacy in distributed ledgers, where transparency is a valued feature [5, 6, 7, 8]. Distributed ledgers are decentralised databases shared and synchronised across multiple locations or participants, eliminating the need for a central authority [9]. Each participant has an identical copy of the ledger, which is maintained through a consensus mechanism, ensuring consistency and transparency. Transactions recorded on the ledger are immutable and publicly visible to all participants, enhancing se- curity and trust. Distributed ledgers are widely used in applications like cryptocur- rencies, supply chain management, financial services, voting systems, and smart contracts, providing reliable and tamper-proof records across various industries. 1 2 Chapter 1. Introduction The privacy, transparency, and security features of ZK protocols make them highly attractive for use in distributed ledgers. When implemented in blockchain systems, ZK protocols are additionally required to be non-interactive, meaning the prover can send the proof to the verifier without any further communication, and succinct, fundamental for enhancing the performance, scalability, and privacy of blockchain technologies, making them more practical and efficient for widespread use. SNARK protocols [10, 11] fulfil these requirements, and have been successfully incorporated into blockchains due to their small proof sizes. However, SNARKs are dependent on an initial trusted setup phase [12], which raises security concerns. To address this problem, STARK protocols were introduced in 2018 [13]. They eliminate the need for a trusted setup, but in exchange generate larger proof sizes compared to those of SNARKs, which makes them difficult to implement into blockchains. Re- cently, recursive proofs [14] have emerged as a promising solution, aiming to combine the benefits of both SNARKs and STARKs while overcoming their respective limita- tions. They involve applying recursively one zero-knowledge proof (like a SNARK or a STARK) to verify the correctness of another zero-knowledge proof. This recursive application allows the construction of very efficient and scalable cryptographic proto- cols, which are particularly useful in blockchain technologies and privacy-preserving computation. Like most ZK protocols, SNARKs and STARKs function within the model of circuit satisfiability [15, 16, 17, 18]. This is a classical NP-complete problem in computer science where it has to be determined if an assignment to the inputs of a circuit exists that makes the output true. When that turns to be the case we call the circuit satisfiable, and otherwise the circuit is called unsatisfiable. SNARKs and STARKs work with a special kind of circuits called arithmetic circuits. The wires of arithmetic circuits carry values from a large prime finite field Fp, and its gates per- form the arithmetic operations of adding and multiplying. In a ZK proof, the prover must demonstrate knowledge of private inputs that, combined with public inputs, produce specific outputs within a certain arithmetic circuit. The proof statement is translated into a system of polynomial equations (constraints) within the finite field Fp symbolising the relationships between the signals of the circuit (operands and results of arithmetic gates). However, representing an arithmetic circuit correctly using constraints is a process that becomes increasingly complex the more intricate the proof statement gets, especially for recursive proofs requiring validation across multiple iterations. To help cryptographers in this task, high-level domain-specific languages like Zokrates and Leo have been developed [19, 20], providing classical programming instructions tailored for arithmetic circuits and compiling into the polynomial con- straints used in ZK proofs. These languages support standard arithmetic operations, logical operations, comparisons, control-flow statements, and custom data structures through classical structs for a more structured approach to complex data. Despite their easy usage, these kind of languages can lead to a loss of control 1.1. Motivation 3 over the final generated constraint system, potentially making it too large for effi- cient handling by the ZK protocol. Consequently, many cryptographers prefer other languages like circom [21, 22], which offers precise control over constraints but sac- rifices some expressiveness. Circom does not support custom types using structs and only provides signals, arrays, and tuples. Despite these limitations, circom is widely used in multiple projects due to its scalability. The use of languages like cir- com enhances efficiency but also increases the risk of failing to meet specifications or introducing security flaws. Bugs in the arithmetic circuit definition, especially in under-constrained circuits where more solutions than intended are allowed, pose significant security risks. Large circuits with thousands of constraints are prone to vulnerabilities, so cir- com simplifies the task of writing constraints by enabling the creation of small, modular circuits called templates that can be interconnected to form more complex systems. These templates can be custom-built by programmers or sourced from ZK libraries like circomlib (the official circom library) [23], which offers a variety of pre-audited arithmetic circuits, such as comparators, hash functions, digital signa- tures, and converters. Despite these safeguards, errors can still occur if templates are not used correctly. For example, a vulnerability in the Baby-Jubjub elliptic curve library has been identified involving the handling of private keys in nullifier computations. The Baby-Jubjub curve uses a prime number r for defining the order of a subgroup where the public-key computation from a private key x is performed. If a malicious user employs an alternative private key x′ related to the original pri- vate key x by x′ = x + i · r (where i is a number from the finite field Fp), they can generate identical public keys for different private keys and distinct nullifiers for the same transaction, potentially enabling undetected double spending. With such problems in mind, the circom 2.1.0 update introduced tags [24], a basic form of type annotations for circuit signals, allowing programmers to express the intended behaviour of a signal. For instance, by using tags we can assign a max value label to the input signal in so other programmers using the template can understand that the values of in have to be confined to the interval (0,max), as demonstrated in the babypbk template of figure 1.1. Although at first glance it may seem the code should work fine, a compilation error will arise because the component bby instantiated from the template babypbk receives as input the value of the signal key, which does not include the max tag. Thus we cannot be sure if key will have values greater than r. To solve this problem, we have to replace line 6 with the commented code in lines 7-9. What the template MaxValueCheck(r) will accomplish is adding the max tag to the key signal by im- posing the necessary constraints to guarantee that the signal adheres to a maximum value of r. Notwithstanding, circom will also allow us to attach the tag max to the private key directly, like in figure 1.2. That code will not issue any compilation error, but the underlying security flaw will still be there, as no additional constraints have been implemented to ensure that 4 Chapter 1. Introduction 1 template main() { // beginning of template 's definition 2 signal input pvt_key; //input signal of the circuit 3 signal output pbl_key; // output signal of the circuit 4 var r = ...; // normal variable storing the prime number for the curve 5 signal key; // intermediate signal 6 key <== pvt_key; // connection between key and pvt_key 7 // component maxvCheck = MaxValueCheck(r); 8 // maxvCheck.in <== pvt_key; 9 //key <== maxvCheck.out; 10 component bby = babyPbk(r); // instantiation of babypbk template circuit 11 bby.in <== key; // connection between key and the input signal of bby 12 pbl_key <== bby.out; // connection between pbl_key and the output signal of bby 13 } 14 15 template babypbk(r) { //this circuit computes the public key from the private key 16 signal input {max} in; //input signal with maximum value tag attached 17 assert(in.max < r); 18 //... standard code for the public -key computation 19 } Figure 1.1: Example of a circom program using tags. 1 signal {max} key; 2 key.max = r; 3 key <== pvt_key; Figure 1.2: Example of how circom allows the addition of tags. 1.1. Motivation 5 the key value is less that r. It is important to note that circom compiler only carries out syntactic checks. The onus is on the programmer to verify the semantics of tags. As mentioned, tags are an excellent tool to annotate the expected properties of individual signals. But despite their great potential of expressiveness, they cannot represent relationships between multiple signals. To show this idea, let us take a look at the template from the circomlib in figure 1.3. 1 template Edwards2Montgomery () { 2 signal input in[2]; //array of input signals 3 signal output out [2]; //array of output signals 4 5 //tells the compiler how to compute the value of signals out [0] and out[1] 6 out [0] <–- (1 + in[1]) / (1 - in[1]); 7 out [1] <–- out[0] / in[0]; 8 9 // declaration of constraints without associated computations 10 out [0] * (1-in[1]) === (1 + in[1]); 11 out [1] * in[0] === out [0]; 12 } Figure 1.3: Example of what circom tags cannot be used for. The Edwards2Montgomery template works on the assumption that the signal array in[2] represents a point of the Baby-Jubjub elliptic curve in twisted Ed- wards form [25, 26]. Therefore, the template will not function as intended if these signals do not conform to the expected format, compromising the security of the entire arithmetic circuit. Up to this moment circom did not offer any feature that could allow programmers express semantic relationships amongst multiple signals. Our goal in this project is trying to solve this problem with the implementation of buses, a set of related signals. Just like in real electronic circuits, a bus will extend the functionality of individual signals, allowing programmers to use them as inputs, outputs or intermediates in templates. Additionally, they will let the association of semantics embodied by tags to groups of signals, providing a tool to solve problems such as the one shown in the previous example. Let us see how that error could be avoided with the implementation of buses in figure 1.4. We have modified the template to now accept a bus of the type Point as input and return another Point as output. The input Point is required to implement the edwards_point tag as a way to ensure it will indeed be a point in twisted Edwards form, and the output Point will denote its Montgomery form through the mont- gomery_point tag. This way, if the input bus adheres to the specified requirements, the output will always be granted to meet its own specifications. 6 Chapter 1. Introduction 1 bus Point { 2 signal x; 3 signal y; 4 } 5 6 template Edwards2Montgomery () { 7 Point input {edwards_point} in; 8 Point output {montgomery_point} out; 9 10 out.x <–- (1 + in.y) / (1 - in.y); 11 out.y <–- out.x / in.x; 12 13 out.x * (1-in.y) === (1 + in.y); 14 out.y * in.x === out.x; 15 } Figure 1.4: Example of a circom template using buses. 1.2 Objectives The final goal of all the work presented here is implementing buses in the circom language as a means to provide two new features: the creation of custom signal types, improving the expressiveness of the language, and the bonding of tags to groups of signals with the purpose of annotating relationships between them. This will improve the safety of circuits developed in circom by allowing programmers to annotate expected properties involving multiple signals. We will accomplish this task by implementing the syntax to enable the use of the aforementioned buses. In addition, we want to apply the benefits of buses to the circuits in the circomlib, the circom official library, in view of the release of a new version. In order to do that, we will have to prove the correctness of the new modified circuits. Our targets can be broken down into the following specific objectives: I) Equipping the circom language with the necessary syntax to define groups of signals called buses, serving as the hardware equivalent of structs. II) Extending the use of tags to bus types as a way to express group attributes. III) Updating circomlib templates with the new circom add-ons to improve its clarity and reliability. IV) Verifying the correctness of the new template versions in the circomlib. V) Testing the impact of buses on the sizes of the ZK proofs generated by circuits using them. 1.3 Work description When I started working on this project, the first thing to do for me was learning the characteristics of the circom language, which are quite unique. After getting 1.4. In the following chapters 7 used to the syntax of circom, I began working with the base code of the circom compiler written in Rust. Rust is a general-purpose programming language designed for systems development, like OS and compilers, with a strong emphasis on memory safety. Its unique characteristics make the language have a very steep learning curve. It took me a while to get familiar with the Rust syntax and concepts like the borrow checker. The first part of the compiler that I worked on was the parser, where I defined the syntax of the new bus feature that would be added to circom. Then we moved to the definition of the AST nodes and other objects that would encapsulate all the information related to buses. After this stage, I developed a series of static analyses over the AST in order to ensure the correct use of buses. While the rest of the parts of the compiler were being modified, I focused on the revision of circuits in the circomlib to create the new official version of the library coming out. I adapted the templates to the addition of buses and the association of tags to them. To ensure everything would work as expected, I proved the correctness of some of the circuits affected by the modifications carried out, but found potential bugs that will be examined in the future. Finally, once a functional version of the compiler was complete, I tested the new updated versions of the circomlib circuits to have a first glance at the benefits and drawbacks brought by buses. The project’s code can be found in the forked repository https://github.com/ Sacul231/circom. The /tests/circomlib folder contains all the modified ver- sions of the circomlib templates, while the parts of the compiler developed during the project are distributed between the folders /parser, /program_structure and /type_analysis. 1.4 In the following chapters Having introduced the topic and goals of this project, the chapters that follow will be used to further elaborate in the background of ZK proofs and circom as well as to explain the details of the work done. In the second chapter we will see in depth what a zero-knowledge proof is and why is it so useful. We will describe how do we work with ZK proofs as arithmetic circuits and the features offered by circom and circomlib to encode and generate ZK proofs. Finally, an overview of the parts that make up the circom compiler and their purposes will be given. The third chapter will begin with a discussion on the need to introduce buses in circom, followed by an explanation of its implementation in the circom compiler. We will go step by step in the compilation process to show each modification needed. Some parts of the Rust code will be exposed to help the reader understand the ideas discussed. In the fourth chapter we will look at the updated versions of some of the files in the official circom library, the circomlib, the reasons why they had to be modified and some proofs of their correctness. Chapter 5 will be used to present the results in performance obtained with the updated versions of some circomlib templates. https://github.com/Sacul231/circom https://github.com/Sacul231/circom 8 Chapter 1. Introduction Finally, we will present our conclusions on the work carried out and suggest new lines of research. Chapter 2 State of Art The purpose of this chapter is to provide the reader with an understanding of the background surrounding circom and its practical applications. We will begin by introducing the concepts of Zero-Knowledge proofs and arithmetic circuits, followed by an overview of the circom language and its compiler. 2.1 Zero-knowledge proofs One of the main issues concerning the current cryptographic protocols is revealing sensitive information to prove some statement. Imagine you want to book your holidays in advance through an online website. Although the company may let you delay the time to pay for later, you will be required to provide your credit card number as “evidence” of having a bank account. This number will be stored in a central database, which is vulnerable to hacking attacks and information leaks. Zero-knowledge proofs represented a breakthrough in this regard, as they aim at improving security of information for individuals. A zero-knowledge proof offers a means to validate the veracity of a statement without disclosing the statement itself [2, 3, 4]. In our previous example, ZK protocols would empower the company to check the existence of your bank account but the proof would not reveal the credit card number. The ‘prover’ is the party trying to prove a claim, while the ‘verifier’ is responsible for validating the claim. To understand how this process works let’s take a look at the following classical example from “How to Explain Zero-Knowledge Protocols to Your Children” [27]. In this story Ali Baba, a merchant from the Eastern city of Baghdad long time ago, founds himself robbed in the bazaar. He follows the thief into a cave whose entryway forks into two paths, both of which end in a dead end, and loses track of the thief. For the next 39 days he keeps getting robbed at the bazaar, but the thieves mysteriously disappear at one of the caves’ ends. Finally he decides to un- veil the secret of the cave and hides under some sacks at the end of the right-hand passage. He discovers the thieves could open the walls of the cave by whispering the magic words ‘Open sesame’. Moreover, he finds out the secret door connects 9 10 Chapter 2. State of Art the left-hand passage with the right-hand passage. After some experimentation Ali Baba manages to change the magic words and writes down his discovery with clues about the new password in a manuscript. In the present the manuscript is found by US researchers who manage to re- cover the new magic words. The mysterious cave is also located, and one of the researchers, a certain Mick Ali, wants to demonstrate he knows the secret. How- ever, he has no intention of revealing the magic words, so he comes up with a plan. First, a television crew films a detailed tour of the cave with the two dead-end pas- sages. Later, everybody exits the cave, and Mick Ali goes back in alone through one of the passages. Then the reporter, accompanied by the camera, enters the cave only as far as the fork. There he flips a coin to choose between right and left. Depending on the result of the coin flip, he tells Mick to go out on the left or the right, and Mick does just that. If Mick Ali did not know the magic words and could not open the door, there would be a 50% chance of him being exposed. Each new test divides by two the chances of success for someone who ignores the secret. By repeating this experiment n times, the probability that Mick does not know the magic words but keeps coming from the correct passage is 2−n. On the other hand, the secret allows Mick to come out each time by the required exit. So after several tries, the television crew is convinced that Mick knows the code despite not knowing it themselves. A zero-knowledge protocol must satisfy the following principles: • Completeness: The protocol will always return ‘true’ if the input is valid. • Soundness: It will be almost impossible, with an arbitrary small margin of probability, for a dishonest prover to mislead an honest verifier into accepting an invalid statement. • Zero-knowledge: The verifier gains no additional information about the statement beyond its validity or falsity; they possess “zero knowledge” regard- ing the statement. This condition also prevents the verifier from deducing the original input (i.e., the contents of the statement) from the proof. In the previous example, interaction between the prover and the verifier was needed as well. While revolutionary, the practical utility of interactive proving was limited due to the requirement for both parties to be continuously available for in- teraction. Even if a verifier was convinced of a prover’s honesty, the proof lacked independent verification capability, as generating a new proof necessitated a fresh set of messages between the prover and verifier. In response, Manuel Blum, Paul Feldman, and Silvio Micali proposed the first non-interactive zero-knowledge proofs [10], where the prover and verifier share a key. This enables the prover to demon- strate his knowledge of certain information without providing the information itself [11]. 2.2. Arithmetic circuits 11 Unlike interactive proofs, non-interactive proofs entail only a single round of communication between the participants. Here, the prover employs a specialised al- gorithm to compute a zero-knowledge proof based on the secret information, which is then transmitted to the verifier. The verifier, in turn, employs another algorithm to confirm that the prover possesses the secret information. Non-interactive proving reduces communication between the prover and verifier, thereby enhancing the efficiency of ZK-proofs. Additionally, once generated, a proof becomes available for verification by anyone with access to the shared key and veri- fication algorithm. Amongst this type of zero-knowledge proofs we find the so-called ZK-SNARKs. ZK-SNARK stands for Zero-Knowledge Succinct Non-Interactive Argument of Knowledge. A ZK-SNARK protocol fulfils the same requirements as the rest of zero- knowledge protocols, with the addition of being non-interactive and having succinct proof size regardless of the size of the statement to prove. Most ZK-SNARKs also guarantee short verification times. However, not everything about ZK-SNARKs is perfect. The setting up of this kind of protocols requires an initial phase called trusted setup. This step is dependent on some values generated randomly, and the exposure of the random values could compromise the security of the whole scheme. To improve the security of the protocol, most implementations carry out this setup phase applying multi-party computation (MPC), in which multiple independent par- ties collaboratively to build up the trusted setup parameters. In this process, it is enough that one of the participants destroys its secret counterpart of the contribu- tion to keep the whole scheme secure. ZK-SNARK protocols are mostly employed to verify the correctness of a com- putation. In this context, computations can be represented by defining them as arithmetic circuits [15]. 2.2 Arithmetic circuits Arithmetic circuits work just like normal electronic circuits but with numbers from a finite field instead of electric signals. They consist of a set of wires connected to gates that perform operations of addition and multiplication modulo p, where p is usually a large prime number of approximately 254 bits [26]. Within the signals of the circuit we can distinguish between input signals, intermediate signals and output signals. Arithmetic circuits are used in the context of ZK-SNARK proofs because they provide a language to encode ZK proofs as statements involving the inputs and outputs of a circuit. Circuit satisfiability (determining whether the input signals of a given circuit have an assignment that results in a certain output) is a classical NP-complete problem of computability theory. Thus, finding a correct assignment for the inputs of a circuit that matches the given outputs is very expensive compu- tationally. This fact makes arithmetic circuits a fantastic option for ZK protocols [16, 17, 18]. 12 Chapter 2. State of Art Usually, in ZK-SNARK proofs intermediate signals are only known to the prover, while output signals are public both to the prover and verifier. Amongst the input signals, both public and private ones can be found. Therefore, the prover must provide evidence that, with the public information available, he knows a valid as- signment to the rest of signals that makes the circuit satisfiable, named a witness. The encoding of this arithmetic circuits has been an area of study in recent years [28]. A representation is required that both simulates accurately the underlying arithmetic circuit and does not add new possible solutions that could compromise the security of a cryptographic protocol using it. One of the most widespread ways of encoding is an algebraic form called Rank-1 Constraint System (R1CS). An R1CS encodes a program as a set of conditions or constraints over some variables, so that a correct execution of a circuit is equivalent to finding a satisfiable variable assign- ment. Due to the interchangeable use of arithmetic circuits and R1CS, programs specified in R1CS are often referred to as circuits, and their variables as signals. R1CS use quadratic constraints to represent arithmetic circuits. Formally, a quadratic constraint over a group of signals S = {s1, ..., sn} is an equation of the form (a1s1 + ...+ ansn)× (b1s1 + ...+ bnsn)− (c1s1 + ...+ cnsn) = 0 where ai, bi, ci ∈ Fp ∀i ∈ {1, ..., n}. We can summarise this equation as A×B−C = 0, with A, B and C representing linear combinations of the signals s1, ..., sn over Fp. A Rank-1 Constraint System (R1CS) is then defined as a finite collection of quadratic constraints over S. Before continuing, we are going to examine an example of an arithmetic circuit and its R1CS form to get familiar with both concepts. Example 2.2.1. Let C be the circuit from figure 2.1 where values of signals belong to the finite field F11. The circuit outputs the result of the operation s1 × s2 × s3 + s4 × s5 The arithmetic circuits we will be working with use gates which always have two input signals and one output signal. Therefore, to calculate the value of the output signal s9, C uses three multiplication gates and one addition gate, requiring three intermediate signals s6, s7 and s8. All in all, C is a circuit defined by the values of the set of signals S := {s1, s2, s3, s4, s5, s6, s7, s8, s9} 2.2. Arithmetic circuits 13 × × × + s 1 s2 s3 s 4 s5 s 6 s 7 s8 s9 Figure 2.1: Representation of an arithmetic circuit C defined over the finite field F11 that outputs the result of the operation s1 × s2 × s3 + s4 × s5 mod 11. An example of a witness for C could be the set of values w = {1, 2, 3, 4, 5, 2, 6, 9, 10}. We cannot take equation s1 × s2 × s3 + s4 × s5 mod 11 as the R1CS representation of circuit C because it does not have the shape of a quadratic constraint. To get a proper R1CS form of the circuit we need to split the equation in intermediate steps expressed as constraints.  s1 × s2 − s6 = 0 mod 11 s3 × s6 − s7 = 0 mod 11 s4 × s5 − s8 = 0 mod 11 s7 + s8 − s9 = 0 mod 11 This representation can be further simplified as follows s1 × s2 − s6 = 0 mod 11 s3 × s6 − s7 = 0 mod 11 s4 × s5 + s7 − s9 = 0 mod 11 Here we have joined two constraints by replacing the intermediate signal s8 with the equivalent expression s9 − s7. We can do this because the new constraint keeps the structure of a quadratic constraint. If that were not the case we would not have been able to reduce the number of constraints in the system. R1CS are a very natural way of describing arithmetic circuits since they rely on additions and multiplications. In fact, every transformation over a set of signals per- formed by an arithmetic circuit can be described with enough quadratic constrains. Therefore, if a signal assignment provides a solution for the constraint system, the same assignment will be valid for the circuit, making it satisfiable. Proving satisfi- ability in R1CS encoded form requires to check all gates of a circuit. With this in 14 Chapter 2. State of Art mind, most ZK protocols use aggregation techniques, such as quadratic arithmetic programs, to check all gates at once [15]. There exits a group of software tools developed for designing arithmetic circuits and creating ZK proofs. They can be divided in two main groups. On one hand we find libraries, also known as backends. Libraries are typically written in a general purpose programming language and offer features that allow users to define state- ments, as well as to generate and validate ZK proofs. On the other hand we have domain specific languages (DSL), in which users can express statements in a higher level language that is later translated by the corresponding DSL compiler to lower level functions provided by a library. DSLs provide a more natural manner of de- signing ZK proofs since statements are formulated at the level of abstraction of the problem domain. As a result, domain experts (in our context, cryptographers and developers) can more easily understand, validate, modify, and develop programs [21]. Two kinds of DSLs for ZK can be distinguished: program-based DSLs and constraint-based DSLs. In a program-based DSL, statements are defined as “programs” written in a reg- ular programming language. The instructions of the program are then converted into primitives by the backend system, and the frontend compiler transcribes the program into a circuit definition comprising a set of constraints. Meanwhile, a constraint-based DSL requires users to code their statements as circuits using arith- metic constraints. Writing large constraint systems is streamlined by enabling the creation of small circuits that function as modular components, which can be in- terconnected to form larger circuits as in any hardware description language like VHDL. The constraint-based DSL compiler may apply simplification techniques to the circuit’s constraint set, reducing the size of the statement generated. In general, constraint-based DSLs produce much smaller ZK statements than program-based DSLs as they provide programmers with a greater control over the constraint sys- tem [21]. 2.3 Circom Circom is a contraint-based domain specific language for designing arithmetic cir- cuits in R1CS form [21, 22]. Despite its low-level approach, letting programmers specify and modify each constraint of the statement to be proved, it aims at mak- ing the process of designing large circuits as simple as possible. In this regard, circom borrows from hardware description languages the idea of defining smaller pieces of the circuit as modules that can be interconnected to describe more com- plex behaviours. Circom’s modules are named templates, conveying the meaning of defining a circuit mould with some parameters that can later be instantiated with concrete values of its parameters. Circom refers to these template instantiations as components. 2.3. Circom 15 Amongst its unique characteristics is its capability to divide the circuit descrip- tion in two different parts: one for constraints and the other for witness computation. This division becomes necessary when witness computation involves operations be- yond quadratic constraints. The ability to split the two parts enables the compiler to automatically generate a program to efficiently compute the witness of a ZK proof. When compiling a circuit with circom, the compiler outputs the associated R1CS constraints and can additionally provide programs in C++ or WebAssembly for computing the witness. In addition, we mentioned in section 1.1 how circom 2.1.0 introduced tags as a mechanism to augment security measures and ensure type safety within arithmetic circuits [24]. Tags in circom serve as simple type annotations for input and output signals associated to semantic meanings that help define and constrain the data being processed. The integration of tags facilitates the compiler’s ability to verify the correctness of connections between signals. For instance, if a particular input in a subcircuit requires a tag, any signal connected to this input must also exhibit the same tag, ensuring type consistency across the circuit. This feature primarily adds a syntactic level of checking where the compiler ensures that the tags on connected signals match as specified. We will illustrate this idea with a simple example: 1 template Bits2Num(n) { 2 signal input {binary} in[n]; 3 signal output out; 4 var lc1 =0; 5 6 var e2 = 1; 7 for (var i = 0; i out; 13 } 14 15 tempate Example { 16 signal output num; 17 18 signal bits [2]; 19 // signal {binary} bits [2]; 20 component b2n = Bits2Num (2); 21 22 bits [0] <== 0; 23 bits [1] <== 1; 24 //(bits [0] - 1) * bits [0] === 0; 25 //(bits [1] - 1) * bits [1] === 0; 26 b2n.in <== bits; 27 num <== b2n.out; 28 } Figure 2.2: Example to understand the purpose of tags in circom. 16 Chapter 2. State of Art Template Bits2Num converts a number in binary form represented by an array of signals to its Fp counterpart (it is important to recall that the signals of an arithmetic circuit carry values from a large prime finite field). Therefore, Bits2Num assumes that the values of in[n] signals will be either 0 or 1. To communicate that assumption to other circuits using Bits2Num its input signal in[n] is defined with the tag binary. But as we can see the signals of bits[2] in the Example template do not exhibit the binary tag, so this code will result in a compilation error although the values we are assigning to bits[0] and bits[1] are 0 and 1 respectively. We could solve this issue by adding the tag binary to the declaration of bits[2], as done in line 19. However, the underlying semantic problem would persist since we have included no new constraints to check if the signals of bits[2] carry values other than 0 or 1. If we replaced the values assigned to bits[0] and bits[1] with 3 and 4, we would get no compilation errors despite not fulfilling the preconditions of Bits2Num. The constraints we should add to perform that verification can be seen in the commented lines 24 and 25. It is critical to note that while the compiler enforces syntactic correctness, it does not delve into verifying the semantic correctness of the tags. The responsibility to validate that the tags accurately represent the underlying semantics of the data and adhere to the intended logic of the circuit rests with the programmer. This approach underscores the importance of the programmer’s role in not only designing the cir- cuits but also in ensuring that the semantic integrity of the tags is maintained, thus preventing potential security vulnerabilities that could arise from improper imple- mentations. 2.4 Circomlib We have already explained how circom promotes the use of templates to define small individual circuits that later can be combined to form larger ones. In this regard, circom users have two different alternatives. On one hand, they can design their own templates. On the other hand, they can draw on circuits provided by libraries. And amongst those libraries, one of the most reliable and widely used ones is circomlib [23]. Circomlib is an open-source library that enhances the functionality of cir- com. This library includes a wide range of pre-built and rigorously audited circuits, allowing developers to construct complex cryptographic protocols more efficiently by leveraging existing components rather than creating each element from scratch. The circuits available in circomlib cover various common cryptographic opera- tions. For instance, it includes implementations of some of the most frequently used cryptographic hash functions like MiMC, Poseidon, and SHA-256. It also provides circuits for elliptic curve operations, such as point addition and scalar multiplication, which are essential for modern cryptographic schemes like digital signatures and key 2.5. The circom compiler 17 exchange protocols. Additionally, circomlib offers components for verifying dig- ital signatures, basic arithmetic operations in finite fields, and integer comparison circuits. One significant advantage of circomlib is the fact that all its circuits are thor- oughly tested and audited, ensuring a high level of security and reliability. This reduces the risk of introducing errors or vulnerabilities into cryptographic applica- tions, a crucial characteristic for maintaining the integrity and trustworthiness of these systems. Circomlib also promotes a modular approach to circuit design. Developers can combine smaller, well-defined components provided by the library to build larger and more complex circuits. This modularity not only simplifies the development process but also improves the maintainability and scalability of the circuits. Circomlib is commonly found in blockchain projects to implement ZK-SNARKs and other Zero-Knowledge proof systems. These applications often require privacy- preserving transaction protocols, and circomlib’s pre-built circuits make it easier to achieve these goals. Beyond blockchain, circomlib is also used for creating privacy solutions like secure voting systems, private identity verification, and confi- dential data sharing, as well as in scalability solutions that enhance the performance of blockchain networks by enabling efficient off-chain computations that can be ver- ified on-chain. In the final stages of this project we will look at some examples of circuits from the circomlib repository and how they could benefit from the use of buses. We will modify some of them and show the advantages of the updated templates implement- ing buses, which will be released with the new official version of the circomlib. Lastly, we will put the new circuits to the test to check if buses affect the performance of circom programs. 2.5 The circom compiler The goal of this project is to expand the circom language by adding the new bus feature, which will improve the language expressiveness and enhance the safety of all code generated. In order to do that some parts of the circom compiler will need to change so as to equip it with the necessary tools to recognise and compute buses cor- rectly [29]. To carry out this task we will first need to understand how the circom compiler works and what are its component parts. We will take a look at each stage of the compilation process and show the characteristics of the software involved [30]. The first thing the compiler must do is dividing the characters that make up a program into tokens, which are the units of meaning of the program. Afterwards, the compiler will assess the syntactical correctness of the program in a process called parsing. An incontextual grammar must be defined so as to create the pushdown automaton (PDA) that will perform the parsing. In the case of the circom com- 18 Chapter 2. State of Art piler, both of this stages are joined together in a file that directly defines the lexicon and the LR(1) grammar of the circom language using the tools provided by the Rust crate “lalrpop”. During the parsing, the compiler is also building a tree whose nodes correspond to the expressions, statements and declarations of the program, known as the Ab- stract Syntax Tree (AST). Each node contains the information of the part of the program it represents. After the parsing process is finished and the AST is assem- bled, a series of static analyses are performed on the nodes of the tree. The circom compiler includes the usual binding analysis, where every identifier is attached to its declaration, and typing analysis, helpful for finding type-mismatch errors. In addition, other more specific static analyses are carried out, including: • Return statements in functions: circom allows the declaration of usual functions to perform complex operations over a set of numerical values. All functions must return a value or array of values, so a return statement must be found in each of its possible paths of execution. • Functions free of template elements: As mentioned before, circom func- tions are only able to perform operations over numerical values. Therefore, it is not possible for a function to declare signals or components inside its body. This static analysis ensures that this restriction is met. • Signal declaration: A circuit structure cannot change over time or be de- pendent on the values of its signals. This is why circom does not allow the declaration of signals inside while statements. In circom, signals must be declared at the top scope of templates, and this analysis will check templates to not declare signals inside an inner block. The only exception are if-else blocks, where signals can be declared as long as the value of the condition can be known at compilation time. • Constant propagation: For performance reasons and other important is- sues, variables detected to be constant are computed and propagated through the AST. • Unknown-Known analysis: We brought up in the signal declaration anal- ysis the importance of keeping circuits’ structure stable. If a component were instantiated using the values of signals as arguments of its template, or the size of an array were determined by signals, we would not be able to guarantee that the structure of the circuit would not be altered. In this analysis, each expression in a circom program is classified as Known or Unknown at com- pile time. Hence the compiler will issue an error if a constraint declaration depends on an Unknown expression. The static analyses phase is followed by a rather special symbolic execution of the code. Here, the compiler will disassemble the component structure of the program to gather all constraints that form the ZK proof statement in a single list. To do that, it must first compute the concrete values of each expression used as a template 2.5. The circom compiler 19 parameter in a component instantiation so as to know the actual structure of the component. Notice that all of these expression have been previously checked to be known at compilation time thanks to the unknown-known analysis. At the same time this process takes place, the compiler will create the Directed Acyclic Graph (DAG). This graph will contain information on every component and its associated template. That will be of great help for the compiler to know which components have the same template and parameter values linked to them, since this information can be used to reduce the amount of memory needed for the instantiation of tem- plates. After the DAG is created, all of the constraints will be extracted from their components, and the compiler will try to simplify the system without changing its semantics to reduce the size of the statement. Finally, an intermediate represen- tation of the program is generated to later be used to compute the witness in the desired back-end language. Originally the circom compiler was written in JavaScript, a general-purpose programming language widely used in web software because of its multiple features but with some flaws related to performance and security. Through versions from 0.0 to 2.0 the compiler continued the use of this language for its software files [31, 32]. Then, in version 2.0, it was decided to rewrite the official circom compiler in Rust, a systems programming language focused on performance and safety. This is the language we will be working with. Rust has a lot of fancy characteristics that allow its usage in many different kinds of projects [33, 34]. Perhaps the most important of those characteristics is its concept of ownership and borrowing. Ownership is a set of rules that govern how a Rust program manages memory. With the aim of avoiding dangling references (pointers to locations in memory that may have already been given to someone else) the solution that Rust brings to the table is having a single owner for each piece of data. When that data has to be used by other part of the program, either the owner has to change or a borrowed reference has to be created. The rules regarding ownership and borrowing can resemble the algorithms applied in concurrent prob- lems such as the classic readers and writers. In fact, Rust also has strong concurrent applications. Since no garbage collector is needed to ensure the safety of references, Rust performance ranks higher than that of most of its peers. The Rust language offers many other interesting features, such as pattern match- ing and ideas from functional programming and the object-oriented paradigm. Per- haps it is not the easiest programming language to start working with, as its radically different ideas about references and types can sometimes seem little intuitive or ar- bitrary, but once they are internalised memory safety is almost 100% guaranteed. Chapter 3 Circom Development In this chapter, we will take a look at the implementation of the new bus data structure in the circom language. We will start by defining precisely what we want to add to the syntax of the language and why do we need this new feature. Afterwards, we will describe what changes have been made to each part of the compiler in order to achieve the desired result. Finally, we will cover the details of the implementation and the dificulties found during the whole process. 3.1 Definition of buses Usually, when we refer to a bus of signals we want to express the idea of a bunch of signals grouped together. This concept is very useful in many different ways. Not only it allows us to define a custom data type that lets us package together and name multiple related values that make up a meaningful group, as a struct or structure would do in most programming languages, but it also opens the door to defining semantic statements applied to the whole bus that can be used to ensure the circuit’s safety. 1 include "compconstant.circom"; 2 3 template AliasCheck () { 4 signal input {binary} in [254]; 5 6 component compConstant = CompConstant (-1); 7 for (var i=0; i <254; i++) in[i] ==> compConstant.in[i]; 8 compConstant.out === 0; 9 } Figure 3.1: Template that checks if a number in binary format has an alias in Fp. To further motivate the importance of defining buses, let us take a look at figure 3.1, which displays a circomlib circuit used in many projects. The purpose of this circuit is adding the necessary constraints to ensure a number N coded in binary 21 22 Chapter 3. Circom Development with 254 bits belongs to the finite field Fp. The question is, how could this not be possible? Imagine we are working with three-bit numbers and our finite field is F5. We will not have any problems with the numbers 0 = (000)2, 1 = (001)2, 2 = (010)2, 3 = (011)2 and 4 = (100)2. However, although we can represent number 5 with our three bits as (101)2, that number does not belong to our field, it has an ‘alias’, which is 0, the representative of the equivalence class 0 mod 5. To solve this issue, the template AliasCheck uses an instance of another template called CompConstant. This second template checks whether a number in binary no- tation is greater than the parameter of the component, and returns 1 if that happens to be the case or 0 otherwise. AliasCheck creates an instance of CompConstant with the parameter −1 to compare the input number N with p−1. If N > p−1 it means N is not a valid number of our field. In our example, since 5 > 4, AliasCheck will include a constraint compConstant.out === 0; that will not be met by the input 5 (remember we are working with numbers from Fp). The problem with this solution lies in its implementation. Considering we do not have a way of attaching a tag to a group of signals, we will be forced to undergo this test for every new number in its binary representation, although that number may come as the result of operations within Fp over other numbers that have been checked, and so the property of belonging to the field is preserved. We will be adding unnecessary constraints that will increase the size of the ZK proofs. In this context, the addition of buses to the syntax of circom shines. Let us see how we could change the above code using the concept of buses. First, we will have to establish the syntax used for defining bus types. We have chosen to work with the one shown in figure 3.2. 1 bus BinaryNumber { 2 signal {binary} bits [254]; 3 } Figure 3.2: Bus type for binary numbers. Our new bus will represent a number in binary notation. We have stated the bus to have an array of 254 signals implementing the tag binary (they should only carry the values 0 and 1). Right now it seems that we have not gained anything by defining this bus type BinaryNumber, as we could represent the binary number with the same array of signals we have declared inside the bus. But we now can attach a tag to the BinaryNumber bus, so everyone using it later will know the number represented by the binary code actually belongs to Fp. The unique tag added to the out declaration encapsulates the semantics of being the representative of its equivalent class in Fp and not having a different representa- tion. This characteristic of out is forced by the constrains of AliasCheck, and can 3.2. Implementation of buses: Parser 23 1 include "compconstant.circom"; 2 3 template AliasCheck () { 4 signal input {binary} in [254]; 5 BinaryNumber output {unique} out; 6 7 component compConstant = CompConstant (-1); 8 9 for (var i=0; i <254; i++) { 10 in[i] ==> compConstant.in[i]; 11 in[i] ==> out.bits[i]; 12 } 13 14 compConstant.out === 0; 15 } Figure 3.3: Template that checks if a number in binary format has an alias in Fp using buses. be later assumed by any other template using the value of out. Now that we have motivated the inclusion of buses into the circom language, it is high time to pore over the modifications of the circom compiler we need to make in order to allow their use. 3.2 Implementation of buses: Parser As mentioned in section 2.5, the circom compiler consists of multiple parts that work together to infer the semantics of the program, detect possible errors and pro- duce code in some backend language. We will go through each of those parts and the tweaks made to implement the bus construction. Let us begin with the parser. To enable circom users to define their own bus types and declare buses of signals for the defined types, the lexicon and grammar of the language must be extended with new rules. We will start by focusing on the rules for defining new bus types. A new keyword bus must be included to in- dicate the beginning of the definition of a bus type. The identifier of the bus type will be written after the bus keyword, followed by an optional list of parameters inside parenthesis. These parameters will allow the programmer to create different instantiations of the same bus type instead of having to define a new type for each instantiation. An example where this feature could be useful is when defining a bus type containing an array of signals. Perhaps we want to have objects of that bus type with arrays of different sizes, but without the aid of parameters we would be forced to define a new type for each length. Since buses function as signals, the compiler needs to know at compilation time their internal structure because the arithmetic circuit’s shape can not depend on the values of signals. Consequently, the values of bus parameters have to be known at compilation type. We will cover this fact 24 Chapter 3. Circom Development in more detail in section 3.4. For the purpose of making bus syntax less verbose we will allow the definition of buses without parenthesis with the same meaning as writing empty parenthesis. The last part of the bus definition will display the body of the bus, a list of signal, array and bus declarations inside brackets denoting the fields that the bus will encompass. The described rule looks like this: BusDefinition → ‘bus ’ Identifier (‘(’ ParametersList ‘)’ | ϵ) ‘{’ Body ‘}’ Our new rule will be connected to the definition rule in the sense that the non- terminal BusDefinition will be generated by the non-terminal Definition as happens with TemplateDefinition and FunctionDefinition as well. Definition → BusDefinition | TemplateDefinition | FunctionDefinition Another rule is needed for declaring buses of a certain type. When we declare signals in circom we use the keyword signal followed by the group to which the signal belongs (which can be input for input signals, output for output signals or none for intermediate signals), a list of tag names inside brackets (can be omitted if the signal has no tags attached) and the name of the signal. We will use the same syntax for bus declarations, but replace the signal keyword by the identifier of the bus type we want to instantiate and, if needed, the arguments passed as its parameters: SignalDeclaration → Identifier (‘input ’ | ‘output ’ | ϵ) (‘{’ TagsList ‘}’ | ϵ) Identifier BusDeclaration → Identifier (‘(’ ArgumentsList ‘)’ | ϵ) (‘input ’ | ‘output ’ | ϵ) (‘{’ TagsList ‘}’ | ϵ) Identifier Like we did with the definition rule, we have connected the new rule to the rule for declarations. Declaration → BusDeclaration | ComponentDeclaration | SignalDeclaration | VarDeclaration We will also need a rule for accesses to the fields of a bus variable. However, since we are using the same dot syntax as for accesses to input and output signals of components, we will use the already defined rules for component accesses, and thus 3.3. Implementation of buses: AST 25 we will not add any further rules on this matter. We have covered every possible case of use of our new circom feature. The circom compiler has all the necessary tools to build up the Abstract Symbol Tree (AST) with new nodes related to buses. That is our next stop in the compilation process. 3.3 Implementation of buses: AST The abstract syntax of a programming language refers to a simplified, structural representation of the program’s source code, stripped of unnecessary details such as specific punctuation and formatting that are present in the concrete syntax (the literal syntax used when writing the code). The abstract syntax focuses on the logical structure of the code, highlighting the relationships and hierarchy between different elements of the program. This syntax is encoded in the AST, so its rules will serve as the guidelines to build the nodes of the tree. In order to represent bus types and instances in the AST, we have to define new nodes that will contain all the information needed to later perform the desired static analyses and the rules to construct them. In other words, we need to define the abstract syntax of buses. When a bus type is defined, the information of its parameters and fields must be stored with a view to accessing it whenever we need to perform some analysis. For example, if we want to determine whether an access to a bus field is correct, we will check if the type of that bus contains a field with the same identifier as the one in the access studied. To keep that information, we will define a new AST node BusDefinition with attributes where we will store all the characteristics of the bus type. Since nodes like FunctionDefinition and TemplateDefinition store similar information for functions and templates respectively, we will copy their structure in BusDefinition. Apart from other attributes which are not important for our discussion, four are the features essential to store: the name of the bus type, the list of parameters of the bus type, the body with the declarations of the fields and the list of those fields. However, the list of bus fields is not assembled during the AST construction, so we will have to create it in a later step. Thus, the constructor of the node (which we will call buildBus) will receive the first three attributes as arguments and will build the corresponding BusDefinition node. buildBus : Identifier × ParametersList × Statement −→ BusDefinition We have only defined one rule for bus definitions in the non-contextual gram- mar of our compiler, so the abstract syntax BusDefinition node will always be constructed with the equivalent abstract rule: BusDefinition → Name ParametersList Body BusDefinition.AST = buildBus(Name,ParametersList,Body) 26 Chapter 3. Circom Development In the same way that we saw for the parser context-free grammar, the new rule will be connected to the definition rule because the BusDefinition node will be a type of Definition node, like TemplateDefinition and FunctionDefinition. Definition → BusDefinition Definition.AST = BusDefinition After the AST construction, we will have to call a function findFields over the Body statement that will put together the list of fields in our bus type. This function will do a tree traversal over the AST nodes starting at Body, and will add to the list each signal or bus declaration found in the process. Finally, the list will be stored as an attribute of our BusDefinition node. Internally, this list will also be stored as a hash map to enable constant time accesses to the fields of the bus, but we will not go any further into the details of implementation. Before continuing, it is worth mentioning that we have also be forced to change the abstract syntax of templates, as now they accept both buses and signals as in- puts and outputs. Although a minor change, during the process of finding the input and output lists of a template, the equivalent template function of findFields will now look for declarations of buses as well, and not just signals like it did previously. We will now describe how we are going to store information on bus declarations. Due to reasons of code reuse we have decided to internally break up declarations of buses of the form Point(3) input {tag } p; in two different statements bus p; and p = Point(3) input {tag };, similarly to how templates are instantiated as components. In fact, the code behind both types of declarations has many common parts. This approach has led us to the following abstract syntax rule: BusDeclaration → BusVariableDeclaration BusTypeInstantiation BusVariableDeclaration → Identifier Type Dimensions BusTypeInstantiation → Identifier ArgumentsList Here we have divided the declaration in two nodes of different types. The node BusVariableDeclaration is a declaration type node that will contain the usual information of this kind of nodes: type of variable, name of the variable, dimen- sions (in case it is an array), etc. The node BusTypeInstantiation corresponds to an expression node associated with the type of the bus instantiated with concrete parameters. Having decided the representation of bus definitions and declarations inside the AST, it is time to have a look at the static analyses required to ensure a correct and safe use of buses in circom programs. Throughout the next section we will examine each of those analyses and motivate their importance in the compilation process. 3.4. Implementation of buses: Static analyses 27 3.4 Implementation of buses: Static analyses Circom is a cryptography-oriented language; programmers using the language ex- pect their programs to have a high level of security. However, programmers make mistakes, and in the case of ZK-proofs those mistakes can cause huge problems. We cannot just let them rely on their ability to detect errors. Moreover, with the addition of new functionalities it is even more likely that those mistakes appear in the code when trying to get used to the latest add-ons. Therefore, the compiler has some responsibility towards programmers to help them detect errors as early as possible. Static analysis refers to the process in which the code of a program is analysed without its execution. Almost every modern compiler executes some sort of static analysis during compilation to help programmers detect errors and ease the task of debugging their code. In our case, those analyses are all the more necessary for the reasons mentioned above. With the addition of our new bus circom feature, up to four static analyses need to be implemented or modified to guarantee our code is free of the most common mistakes. We will have a look at each of them in order of execution. 3.4.1 Symbol analysis The first analysis run by the circom compiler is the symbol analysis. This anal- ysis is in charge of ensuring every identifier used in the program can be linked to an object visible in the current scope. When an attempt is made to use a symbol declared in an inner scope or apply a function that has not been previously defined, the symbol analysis will fail and an error message pointing to the identifier not de- fined will be thrown. The symbol analysis will also check that no identifier is used in two different definitions whatever the objects being defined so that coherence will be preserved and ambiguous identifiers will be avoided. Additionally, this analysis will ensure that all functions, templates and now buses are called or instantiated with the correct number of arguments. A reader familiarised with compilers might think the previous analysis is no dif- ferent thing than the usual binding static analysis found in most compilers. While both play a similar role, a normal binding analysis would link each identifier with its declaration or definition after checking it has been correctly used, but our symbol analysis has been relieved of that responsibility, which is performed by other pro- cesses. Nevertheless, the algorithm behind these two analyses is the same. To keep track of the identifiers already defined or declared previously in the code, we need to make use of an environment, which you can imagine as a stack contain- ing tables of symbols. Each time the algorithm enters an inner scope, for instance within a while loop, an empty table of symbols is pushed onto the environment stack to hold the identifiers of the declarations found in that scope. When a new identifier is declared, first the environment checks whether the symbol has already 28 Chapter 3. Circom Development been defined to throw an error message, and otherwise stores it in the current table of symbols. Then if a statement uses a certain identifier, the environment will search for that identifier in the current table of symbols, if it is not found there it will keep searching in the previous table of symbols, and this process will continue until the identifier is found in a table of symbols or the environment reaches the base of the stack. When the latter happens an error has been found because an identifier has been used without declaring it first, but the analysis will carry on in search of more possible errors. This whole process works because in a certain scope the identifiers allowed to be used are the ones declared within that scope or in outer scopes. Con- sequently the environment can search for an identifier in the symbol table at the top of the stack and all the ones below it. After the entire scope has been analysed the table of symbols representing that scope is popped from the top of the stack because those identifiers will no longer be available for use. In the case of our symbol analysis, apart from checking if an identifier can be linked to a declaration or definition, the compiler will compare the structure of the statement where the symbol was found with its definition to look for possible mistakes in the number of parameters of a function or the arguments to instantiate a template. This analysis will be also necessary for buses as their types can be specified with parameters as well. On top of that, there is a special case with buses that can cause errors which were previously impossible, and that is when a tag attached to a bus uses the same name as one of the bus’s fields. Remember that tags are labels used to tie semantics to signals, but we have not mentioned they can be associated with numbers to express properties such as carrying values smaller than a certain number. To access the value of a tag or assign a value to the tag we have to use the dot notation. It is clear that if we use an identifier for both the tag of a bus and one of its fields we could create ambiguous situations where the compiler would not know if we are referring to the value of the tag or the bus field. The symbol analysis is responsible for checking that no tag attached to a bus is called the same way as one of its fields to avoid this kind of situations. 3.4.2 Buses free of invalid statements The next analysis we will be covering is the detection of invalid statements inside the body of a bus definition. Our idea of a bus is a set of signals (alone or forming arrays and other buses) grouped together for the purpose of representing a certain concept that can have properties attached to it. Buses like signals are immutable objects, so it does not make sense to let them have operations as classes in object-oriented languages do. Since we do not want (at least for the moment) to allow buses to include conditional declarations, we should not let programmers include statements different from declarations in the bodies of bus definitions. Our first approach was to constrain the kind of statements allowed in the body of a bus definition through the parser. We created a new BusBody rule where the only possible statements to be used were declarations of signals, buses and arrays made up of elements from those two. Despite working, this idea had its drawbacks. 3.4. Implementation of buses: Static analyses 29 On one hand error messages would not have been very precise at pointing to the mistake causing the errors. On the other hand a lot of code from other parts of the compiler would have needed to be duplicated. In view of these problems we decided to use the same rule for the body of buses as for templates and functions, and detect any problem related to buses’ bodies in a later stage. In fact, a similar situation happens with functions, where signal declarations are not allowed but the verification of this fact takes place at a later phase of compilation, with a static analysis. Since the parser was now freed from any responsibility we had to implement a static analysis for detecting any forbidden statement inside a bus declaration. The analysis performs a tree traversal through the AST nodes of the tree with its root at the bus’s body statement. In each node, the analysis will decide if the syntax represented by that node is allowed to be used inside a bus definition, and fail otherwise. The elements that will cause the analysis to fail are non-declaration statements, such as while loops, conditionals or constraints, declarations of input or output signals (it does makes no sense to declare signals inside buses as input or output since buses are not circuits) and expressions containing function calls where the function identifier is unknown or anonymous component calls. 3.4.3 Constant propagation After the invalid statements analysis a constant propagation process is performed. During this process expressions in a bus definition will be examined to determine whether they maintain a constant value through the whole program. Those expres- sions found to be constant will be evaluated and their values will be broadcasted across the AST. This process has two objectives. On one side it allows for optimi- sations that will improve the performance of the final generated code. On the other side it facilitates the implementation of the unknown-known analysis that will be performed later. 3.4.4 Type analysis We now enter the stage of the type analysis, perhaps the most important of all the analyses we will discuss. The goal of the type analysis is to check that all the statements of the program follow the rules imposed by the type system of the lan- guage. A type system is a set of rules that allow a type to be assigned to every syntactic construction in a language in order to later verify the correct formation of the constructions found in a program. Types restrict the operations that can be applied to an object. Not every syntactically correct construction is valid. In spite of the loss in flexibility, the use of types offers many advantages. To begin with, they enable early error detection, shortening the debugging phase and avoiding errors in production. They can also help the programmer to better convey his intentions and allow for deferring and encapsulating decisions about the representation of complex types, easing program maintenance and modification. In addition, a code fragment can be assigned many different types, which fosters code reuse and reduces costs of 30 Chapter 3. Circom Development production. Circom is a strongly and statically typed programming language that uses man- ifest typing. In circom, the programmer needs to indicate the type of each object (numerical variable, signal, signal array, component, ...). The type analysis we are discussing at this moment is used by the circom compiler to enforce the use of operations on the types they are designed for, like the use of arrows <== and <-- on signals, and executing the type checking in search of errors. This characteristics make the type system of circom a very restrictive one, but the security and effi- ciency obtained are worth the cost. Before the inclusion of buses in the language, circom did not provide support for the creation of custom types, but that changes with the addition of buses. Ultimately buses work just like signals, so a programmer should be able to con- nect two buses of the same type directly, without having to worry about the con- nection of each of the signals that make up the whole bus. That is why we allow the constraint in line 17 of figure 3.4. In order to avoid type-related mistakes the compiler must check if the types of two connected buses are equal. We have decided to implement type equivalence by name, so two buses with different name types will be consider different even though they may display the same internal structure. As a result line 18 of figure 3.4 will produce a compilation error even though bus types A and B share the same internal structure. This type of equivalence is more restric- tive, but is less prone to errors that can go undetected, and that is a key factor in a language designed for generating ZK proofs. It should be noted that two bus types will be considered equal even if they are instantiations of a certain bus definition with different values used as parameters. For example, if we define a bus type bus BusT(N) {...} with a parameter N to indicate the size of a signal array inside the bus and later declare BusT(2) busA; and BusT(3) busB;, an statement like busA <== busB; will not result in a type error because the compiler considers the types of those two buses to be the same. The algorithm applied to perform the type checking traverses the AST bottom- up, inferring the type of a node from the types obtained for its children and the rules of the type system. The type inferred is attached to the node to be used when comparing the types of each expression looking for type mismatches. When all the children of a node have been typed, the algorithm will verify that the types of each expression are compatible and the global type structure makes sense. Therefore, if an expression like list[i+j] is found, the algorithm will first type the expressions list and i+j. Then it will verify that the type of list is in fact an array and the type of i+j is an arithmetic expression. Finally, it will attach the type of the objects inside the array list to the node representing list[i+j]. The algorithm will find that type looking at the declaration of list. Buses generate a new problem when it comes to the type analysis: they force us to treat expressions with accesses to their fields. Accesses may be concatenated, so the process of typing them has to be recursive. What is worse, since the syntax 3.4. Implementation of buses: Static analyses 31 1 bus A { 2 signal s1; 3 signal s2[2]; 4 } 5 6 bus B { 7 signal s1; 8 signal s2[2]; 9 } 10 11 template Example () { 12 A input a1; 13 B output b; 14 15 A a2; 16 17 a2 <== a1; 18 b <== a2; 19 } Figure 3.4: Example of type comparisons with buses. to get to a certain field is the same as to get to the input and output signals of a component or to provide a tag with some value, we are required to distinguish between those cases based on the types of each identifier. To better understand the hurdles found when trying to check correct typing in buses, let us look at the example in figure 3.5. In template Example() we have three accesses with the dot syntax applied to the bus out of type B, but each of them serves a different purpose. The access in line 14 is used to assign a value to the max_a tag associated with out. In line 15 we access the signal field b of out and in line 16 we first access the bus field busA of out to later access the signal field a of out.busA, which is a bus of type A. As you can see, an expression using dot accesses over buses could have various meanings, and is the duty of the type analysis to determine which of them is the correct one. When working with dot accesses in expressions, the type analysis algorithm first has to check that the variable type definition is that of a bus. After that, the algorithm is in charge of verifying the bus type includes a field by that name or has a tag named that way linked to it. It follows from this fact that there cannot be a field in a bus and a tag attached to it with the same identifier, but that situation was already prevented in the symbol analysis. If the identifier after the dot in the access expression is associated with a tag and no dot is found after the tag, the job is finished and the expression’s type will be tag. If it is a signal, either the whole expression is a signal or another dot access is found after the signal’s identifier, making it a tag type. Lastly, the identifier may belong to another bus variable, and the entire process has to be run again. 32 Chapter 3. Circom Development 1 bus A { 2 signal a; 3 } 4 5 bus B { 6 signal b; 7 A busA; 8 } 9 10 template Example () { 11 signal input in; 12 B output {max_a} out; 13 14 out.max_a = 3; 15 out.b <== in; 16 out.busA.a <== in; 17 } Figure 3.5: Example of dot accesses in buses with different meanings. 3.4.5 Unknown-known analysis Leaving behind the type analysis, there is only one static analysis left we need to discuss: the unknown-known analysis. This static analysis is fairly particular to circom. As mentioned in section 2.5, circuits in real life are fixed structures whose composition cannot be altered by the values of its wires. No matter what values we use as inputs of an OR gate, it will remain an OR gate. It is a common mistake however to apply the same logic used for normal programming languages to hard- ware description languages. An inexperienced programmer may try to connect an output signal to one input signal of a circuit or another depending on the value of a signal used as a flag. The goal of the unknown-known analysis is to detect those kind of errors and prevent them from happening. There are two main concerns that make it critical to detect these kind of mis- takes. The first one is to keep the circuit’s structure described by the R1CS of a program stable. If we allowed the structure of an arithmetic circuit to depend on the values of its signals, a program would no longer represent a certain circuit, but rather a group of them. The code would not be translated into an actual circuit until execution time. This is not the purpose of the circom language. The other problem is related to security. If we do not know what circuit a program represents until execution time, we cannot guarantee that a circuit generated that way will not have breaches that make it unsafe due to multiple witnesses that make the circuit satisfiable, and thus a deceitful prover could take advantage from it and fool the verifier. That is why a whole analysis is carried out to alert the programmer to such errors. The unknown-known analysis works as follows. Each expression in the AST whose value cannot be known until the execution of the code is tagged as Unknown. 3.4. Implementation of buses: Static analyses 33 Such expressions include operations over signals, components or variables whose values depend on signals. At this point our previous constant analysis where we found out what expressions remain constant over the computation of the program is of great help. Those expressions whose values are known at compilation time are tagged as Known. The analysis traverses the AST downwards tagging all expres- sions. The goal is to find a constraint or signal declaration in a code block whose execution depends on an Unknown expression. As we have seen, we should not allow this to happen, and hence an error message will be thrown pointing to the part of the program which depends on an Unknown value. Array size declarations with an Unknown expression also cause the analysis to fail. Since buses are treated like signals they and all their fields and tags must be labelled as Unknown. Another thing we have to pay attention to is bus types in- stantiations because the type of a bus signal should be known at compilation time in order to carry out type comparisons properly, so it cannot be dependent on Unknown arguments. And of course the expressions used in the bus definition to declare the sizes and arguments of each field have to be Known. Having verified all of these cases we can now be more confident about the security of the program. During the next phases of the compilation process other things will be addressed such as tags management, constraints optimisations and code generation. All of these parts of the compiler need to undergo certain changes to allow the use of buses in circom programs. However, the modifications that need to be done are beyond the scope of this project and the work done by its author. Instead, during the next chapter we will focus on the implementation of buses in circuits from the circomlib, the circom library, with the aim of enhancing safety and detection of errors for the next official version of the repository. Chapter 4 Circomlib circuits We will devote this section to present the changes made to the circuits of the cir- comlib library with the objective of taking advantage of the benefits brought by the new bus tool. Since circomlib is the official circom library, it is expected to grant a high level of reliability to its users. Buses can help achieve this purpose by eliminating some of the security flaws that programmers can introduce by accident in their codes associated with the supposition of preconditions on groups of signals that may not be met. In collaboration with tags, they can effectively be used to express relationships between elements of a circuit and therefore add an additional security check for detecting this kind of bugs. Moreover, buses provide a new means to represent complex ideas, the same way structs and classes allow the definition of new data structures from simpler data types. This can aid programmers at convey- ing ideas and all in all facilitating collaboration and code reuse. Two major modifications have been carried out in the templates of circomlib whose effects have spread through almost all of the files. One of the modifications has to do with the representation of numbers in binary form, and more precisely, the problems that conversions from numbers in Fp to numbers in binary form entail. A glimpse of the other major change was shown in section 1.1. Here we have de- fined two bus types for representing points in an elliptic curve. In combination with tags, this has allowed us to implement verifications of some properties regarding the coordinates of these points. Nonetheless, it is worth mentioning again that this implementation only adds a syntactic check, but no semantic test is performed, so that responsibility still belongs to the programmer. 4.1 Binary numbers Let us commence by looking at the first modification mentioned. Circom signals carry values from the finite field Fp, where p is a prime number which is usually very large. This format can sometimes be inconvenient to perform certain operations us- ing quadratic constraints, such as order comparisons. To overcome this issue, many circomlib templates first transform the signals involved into their binary repre- 35 36 Chapter 4. Circomlib circuits sentation, carry out the operation over the two numbers in binary format, and then undo the transformation. Therefore, a lot of circuits are implemented receiving as input an array of signals that is supposed to stand for the binary form of a number in A, where A can be Fp or a subset of Fp. However, a problem arises from this solution: if one of the signals from the array carries a value other than 0 or 1, or if the number represented by the array of signals falls outside of A, preconditions are not met and the behaviour of the circuit may not be the expected one. To better understand the second case, imagine that A = F5. We will need 3 bits to represent all numbers from A in binary format. However, not every number whose binary representation employs 3 bits will be contained in A. For example, if we consider the number (111)2 = 7, it does not belong to F5, so it is not supposed to be a valid input, but since 7 ≡ 2 mod 5 maybe the operations applied to (111)2 yield the same output as (010)2. Having two different inputs that make a circuit satisfiable is the main issue in ZK proofs we want to avoid, as it can compromise the security of the whole protocol. The problem with bit values outside {0, 1} can be partially addressed with tags, included in circom from version 2.1.0 onwards. A binary tag can be added to the declaration of the input array of signals to ensure those signals have been associ- ated with the binary tag by presumably implementing the necessary constraints to guarantee the values they carry are either 0 or 1. To tackle the other mentioned problem some templates like CompConstant and AliasCheck have been introduced. The CompConstant template must be instantiated with a number ct as parameter. It receives an array of input signals as the bits of some number n in Fp and returns an output signal that equals 1 if n > ct in binary or 0 otherwise. AliasCheck takes advantage of this behaviour, and employs a component of CompConstant to check if a number received in binary form through an array of 254 input signals is smaller than −1 ≡ p− 1 mod p. With both circuits, we can insert the necessary constraints in our circuits to make sure we are not using any number outside our desired range. Buses can help us improve the verifications performed by those two circuits. Imagine we had a manner of enveloping all the signals associated with the bits of a number in a certain data structure that we will call BinaryNumber. Then not only it would be far more clear what the idea behind those signals is, but we could modify AliasCheck to return a BinaryNumber signal out with a tag unique attached to it in order to inform other circuits manipulating out that the values carried by its bit signals have been checked to belong to a number in Fp. That is exactly what buses allow us to do, as shown in figure 4.1. The BinaryNumber(n) definition has been added to a different file of the cir- comlib called bitify.circom, where most of the templates involving transformations of numbers in Fp to numbers in binary format and and vice versa can be found. Those templates have also been modified to receive or return BinaryNumber buses so as to keep coherence through all the repository. 4.2. Elliptic curves 37 1 template AliasCheck () { 2 BinaryNumber (254) input in; 3 BinaryNumber (254) output {unique} out; 4 5 signal greater <== CompConstant (-1)(in); 6 greater === 0; 7 out <== in; 8 } Figure 4.1: Template that checks if a number in binary format has an alias in Fp. 4.2 Elliptic curves It is time to focus on the modifications made to represent points of elliptic curves. One of the most important protocols provided by the circomlib is public-private key encryption circuits. Public-private key encryption, also known as asymmetric cryptography, is a cryptographic system that uses pairs of keys: one public key that can be shared openly and one private key that is kept secret. In this system, any- thing encrypted with the public key can only be decrypted by the corresponding private key, and vice versa [35, 36]. This dual-key mechanism not only helps in en- crypting messages but also in verifying the identity of the communicators, ensuring that the data remains secure from unauthorized access. Asymmetric cryptography is foundational in numerous security protocols, including those used for secure email transmissions, digital signatures, and secure web browsing. In circom, those protocols are implemented using ECDSA (Elliptic Curve Digi- tal Signature Algorithm). ECDSA is a variant of the DSA algorithm, but instead of functioning with exponentiations and discrete logarithms it employs operations over points in elliptic curves, which reduces the sizes of the numbers needed to guarantee the same security level as DSA or RSA [37]. The circom circuits work with the points of a particular elliptic curve defined in [26], known as the Baby-Jubjub curve. In order to understand how this elliptic curve is defined and used, we first need to review some definitions extracted from [25]. Definition 4.2.1 (Montgomery curve). Take a field F whose characteristic sat- isfies char(F) ̸= 2. Fix two parameters A ∈ F \{−2, 2} and B ∈ F \{0}. The Montgomery curve with coefficients A and B over F is the elliptic curve EM,A,B : Bv2 = u3 + Au2 + u Definition 4.2.2 (Edwards curve). Take a field F whose characteristic satisfies char(F) ̸= 2. An Edwards curve over F is an elliptic curve E : x2 + y2 = 1 + dx2y2 where d ∈ F \{0, 1}. 38 Chapter 4. Circomlib circuits Definition 4.2.3 (Twisted Edwards curve). Take a field F whose characteristic satisfies char(F) ̸= 2. Fix distinct non-zero elements a, d ∈ F. The twisted Edwards curve with coefficients a and d over F is the elliptic curve EE,a,d : ax 2 + y2 = 1 + dx2y2 An Edwards curve is a twisted Edwards curve with a = 1. Theorem 3.2. of ?? demonstrates that every twisted Edwards curve EE,a,d over a field F with char(F) ̸= 2 is birationally equivalent to a Montgomery curve EM,A,B over F via the map (x, y) 7→ (u, v) = ( 1 + y 1− y , 1 + y (1− y)x ) (4.1) and its inverse (u, v) 7→ (x, y) = ( u v , u− 1 u+ 1 ) (4.2) where A = 2(a+ d)/(a− d) and B = 4/(a− d), or equivalently a = (A+ 2)/B and d = (A− 2)/B. The Baby-Jubjub curve was discovered using a deterministic algorithm designed by a group of researchers of IRTF in 2016 [38]. Given a prime number p, the al- gorithm returns the Montgomery elliptic curve defined over Fp with the smallest coefficient A such that A − 2 is a multiple of 4. The requirement for A − 2 to be divisible by 4 is useful to simplify the operations involving points of the curve re- turned by the algorithm because these operations rely heavily on the value A − 2. The Baby-Jubjub elliptic curve was the first curve returned by the algorithm satis- fying the criteria of SafeCurves, a project that performs some of the most common and known attacks on elliptic curves to verify their security level [39]. Definition 4.2.4 (Baby-Jubjub). Consider the field Fp where p is a huge prime number found in [26]. We define the Baby-Jubjub curve as the Montgomery elliptic curve over Fp given by the equation BJM : v2 = u3 + 168698u2 + u Its equivalent form as a twisted Edwards curve over Fp is described by the equation BJE : 168700x2 + y2 = 1 + 168696x2y2 The idea behind providing different equations to represent basically the same elliptic curve is not just mathematical curiosity. Each type of elliptic curve has its own laws to define the geometrical operations of adding two points or multiplying a point by an integer. As a result, some kinds of curves provide more efficient opera- tions than others, so by changing from one representation to another we can improve the performance of the algorithms working with points of the curve. 4.2. Elliptic curves 39 Circomlib offers templates to apply these operations over points of the Baby- Jubjub curve either in twisted Edwards form or in Montgomery form. Templates to change from one form to the other are also found. However, we face the same problem as when we were working with the binary representation of numbers in Fp: these templates operate on the assumption that their input signals represent the coordinates of a point of the Baby-Jubjub curve in the desired form, but there is no mechanism that guarantees that this precondition is met. By using buses in the circuits dealing with elliptic curves points we will add another layer of verification to detect possible mistakes that can lead to security flaws. To illustrate how we will be doing this, we present the template provided in circomlib to transform a point of the Baby-Jubjub curve in twisted Edwards form to its Montgomery form as it is in the current version of circomlib: 1 template Edwards2Montgomery () { 2 signal input in[2]; 3 signal output out [2]; 4 5 out [0] <–- (1 + in[1]) / (1 - in[1]); 6 out [1] <–- out[0] / in[0]; 7 8 out [0] * (1 - in[1]) === (1 + in[1]); 9 out [1] * in[0] === out [0]; 10 } Figure 4.2: Template to transform a point in twisted Edwards form to its Mont- gomery form. As the reader can appreciate, the Edwards2Montgomery template receives two input signals standing for the x and y coordinates of a twisted Edwards curve point, encodes the equations given by the map defined in 4.1 and returns the u and v coordinates of the same point in the equivalent Montgomery curve as two output signals. The template has to declare separately the instructions to compute the witness of the arithmetic circuit and the constraints representing it because the op- erations carried out do not have a quadratic form. Nonetheless, both parts express the same equations moving terms properly. Since the input of the circuit is a basic array of two signals, it is not difficult to wrongly wire them to other signals assuming they represent the coordinates of a point in twisted Edwards form but without implementing the necessary constraints to force that fact. To try to avoid these kind of situations we will replace the arrays of signals by a bus conveying the idea of a point in an elliptic curve. The bus will naturally have two signal fields x and y, one for each coordinate of the point. Using this new bus type we can attach tags to the inputs and outputs of the template in 4.2 to characterise points in twisted Edwards form or in Montgomery form. We will assign the tag babyedwards to points of the twisted Edwards version 40 Chapter 4. Circomlib circuits 1 bus Point { 2 signal x,y; 3 } Figure 4.3: Point bus. of the Baby-Jubjub elliptic curve, while we will use the tag babymontgomery for points of the Montgomery Baby-Jubjub curve. Therefore, the template will receive a Point bus with the tag babyedwards and will output a Point bus with the tag babymontgomery, which in theory should be equivalent to the point represented by the input. 1 template Edwards2Montgomery () { 2 Point input {babyedwards} pin; 3 Point output {babymontgomery} pout; 4 5 pout.x <–- (1 + pin.y) / (1 - pin.y); 6 pout.y <–- pout.x / pin.x; 7 8 pout.x * (1 - pin.y) === (1 + pin.y); 9 pout.y * pin.x === pout.x; 10 } Figure 4.4: Template to transform a point in twisted Edwards form to its Mont- gomery form using buses. With this new version of the circuit it is far more clear what the template is doing. Whoever employs the template will now automatically understand that the template receives a Baby-Jubjub point in twisted Edwards form and returns the same point in Montgomery form. Moreover, if the user forgets to add constraints to ensure the input bus is in fact a point of BJE and does not assign the tag babyed- wards to the bus, the compiler will issue an error due to a tag mismatch between two wires connected. The only possible scenario where a bug can be introduced is when the programmer inserts the tag explicitly but does not implement the nec- essary constraints to force the fulfilment of preconditions or ensures that they are always met. To show that the new version of the template guarantees that a point of the Montgomery Baby-Jubjub curve will always be returned if the precondition sug- gested by the babyedwards tag is fulfilled (i.e. the input point belongs to the twisted Edwards Baby-Jubjub curve) we will present a rigorous mathematical proof. Proposition 4.2.1. If the Point bus pin belongs to the elliptic curve BJE described by the equation 168700 (pin.x)2 + (pin.y)2 = 1 + 168696 (pin.x)2(pin.y)2 (4.3) 4.2. Elliptic curves 41 and is different from the point (0,−1), then the Point bus pout returned by the arithmetic circuit Edwards2Montgomery() will belong to the elliptic curve BJM de- fined by the equation (pout.y)2 = (pout.x)3 + 168698 (pout.x)2 + pout.x (4.4) Proof. Let us consider the first of the constraints declared by Edwards2Montgomery(). pout.x× (1− pin.y) = 1 + pin.y ⇔ pout.x− 1 = pin.y× (pout.x+ 1) Then taking a = 168700, d = 168696 and A = 168698 we can reason as follows on the twisted Edwards equation: a (pin.x)2 + (pin.y)2 = 1 + d (pin.x)2(pin.y)2 ⇔ ⇔ a (pin.x)2(pout.x+ 1)2 + (pin.y)2(pout.x+ 1)2 = = (pout.x+ 1)2 + d (pin.x)2(pin.y)2(pout.x+ 1)2 ⇔ ⇔ a (pin.x)2(pout.x+ 1)2 + (pout.x− 1)2 = = (pout.x+ 1)2 + d (pin.x)2(pout.x− 1)2 ⇔ ⇔ a (pin.x)2(pout.x)2 + 2a (pin.x)2pout.x+ a (pin.x)2+ + (pout.x)2 − 2 pout.x+ 1 = (pout.x)2 + 2 pout.x+ 1+ + d (pin.x)2(pout.x)2 − 2d (pin.x)2pout.x+ d (pin.x)2 ⇔ ⇔ (a− d) (pin.x)2(pout.x)2 + 2 (a+ d) (pin.x)2pout.x+ + (a− d) (pin.x)2 = 4 pout.x Notice that a − d = 4 and a + d = 2A. Now we can use the equivalence pout.y × pin.x = pout.x given by the second constraint of the template to multiply both terms of the equation by (pout.y)2 and obtain the formula 4 (pout.x)4 + 4A (pout.x)3 + 4 (pout.x)2 = 4 pout.x (pout.y)2 ⇔ ⇔ (pout.x)4 + A (pout.x)3 + (pout.x)2 = pout.x (pout.y)2 We are very close to getting the desired Montgomery curve equation 4.4. The only step left is dividing by pout.x, but in order to do that we first have to verify that pout.x can not be zero. Let us suppose that pout.x = 0. Then the first constraint of Edwards2Montgomery() forces 0 = 1 + pin.y, and thus pin.y = −1. Since the point pin fulfils the Baby-Jubjub twisted Edwards equation 4.3, the coordinate pin.x can only be 0. However, one of our premises was pin ̸= (0,−1), so with our initial assumptions it is not possible for pout.x to be 0, and that concludes the proof. It is worth mentioning that the exclusion of the point (0,−1) from the possible template inputs is not arbitrary. This point is one of the two exceptions of the map 4.1 (the other one being (0, 1)) because it would require a division by 0. Its corre- sponding point of the curve BJM is (0, 0) [25]. The problem of accepting (0,−1) as input of Edwards2Montgomery() is that taking pout.x = 0 all constraints are satisfied since the nullity of pin.x cancels out the left term of the second constraint, 42 Chapter 4. Circomlib circuits and that allows pout.y to take any possible value, but only for pout.y = 0 the point pin belongs to BJM . The discovery of this exception has led to an investiga- tion with the goal of determining if there exists a situation where pin could take the values (0,−1). If that is the case, we will have found a bug in the circuit that should be immediately fixed, as any circuit of the circomlib must have only one possible output in order to be secure, and it should meet the post-conditions required. On the other hand, if no circuit of the circomlib uses Edwards2Montgomery() with pin.x = 0 and pin.y = −1, the addition of a tag signalling the new precondition pin ̸= (0,−1) will be enough. We will not have problems with the point (0, 1) as it is ruled out by the constraints of the template. To address the possible bug without excluding the input (0,−1), a modification of Edwards2Montgomery() is proposed in figure 4.5. Lines 11-13 force pout.y to be 0 if pin.x = 0 ((0,−1) is the only point of the twisted Edwards Baby-Jubjub curve with zero as its first coordinate that fulfils the constraints of Edwards2Montgomery()). We have also changed the computation of the witness in lines 5-6 to return pout.x = 0 and pout.y = 0 if the inputs verify pin.x = 0 and pin.y = −1. The result would be the same if we replaced the constraint in line 12 by isz.in <== pin.y + 1. 1 template Edwards2Montgomery () { 2 Point input {babyedwards} pin; 3 Point output {babymontgomery} pout; 4 5 pout.x <–- pin.y != -1 ? (1 + pin.y) / (1 - pin.y) : 0; 6 pout.y <–- pin.x != 0 ? pout.x / pin.x : 0; 7 8 pout.x * (1 - pin.y) === (1 + pin.y); 9 pout.y * pin.x === pout.x; 10 11 component isz = isZero (); 12 isz.in <== pin.x; 13 isz.out * pout.y === 0; 14 } Figure 4.5: Modification of Edwards2Montgomery() solving the problem with the input point (0,−1). We will now turn our attention to the circuit of circomlib that performs the opposite operation. The template Montgomery2Edwards() receives a point of the Montgomery Baby-Jubjub curve and returns the equivalent point of the twisted Ed- wards Baby-Jubjub curve. Figure 4.6 displays the updated version of the template using buses to represent points. We will conduct a similar proof to the one for proposition 4.2.1 to verify that tags are correctly assigned to the output wires. Proposition 4.2.2. If the Point bus pin belongs to the elliptic curve BJM described by the equation (pin.y)2 = (pin.x)3 + 168698 (pin.x)2 + pin.x (4.5) 4.2. Elliptic curves 43 1 template Montgomery2Edwards () { 2 Point input {babymontgomery} pin; 3 Point output {babyedwards} pout; 4 5 pout.x <–- pin.x / pin.y; 6 pout.y <–- (pin.x - 1) / (pin.x + 1); 7 8 pout.x * pin.y === pin.x; 9 pout.y * (pin.x + 1) === pin.x - 1; 10 } Figure 4.6: Template to transform a point in Montgomery form to its twisted Ed- wards form using buses. and is different from the point (0, 0), then the Point bus pout returned by the arith- metic circuit Montgomery2Edwards() will belong to the elliptic curve BJM defined by the equation 168700 (pout.x)2 + (pout.y)2 = 1 + 168696 (pout.x)2(pout.y)2 (4.6) Proof. Let us consider the first of the constraints declared by Montgomery2Edwards(). pout.x× pin.y = pin.x If we take A = 168698 we can reason as follows on the Montgomery equation: (pin.y)2 = (pin.x)3 + A (pin.x)2 + pin.x ⇔ ⇔ (pout.x)2(pin.y)2 = (pout.x)2((pin.x)3 + A (pin.x)2 + pin.x) ⇔ ⇔ (pin.x)2 = (pout.x)2((pin.x)3 + A (pin.x)2 + pin.x) Now we have to multiply both terms of the equation by (1− pout.y)3 and simplify the resulting expression using the second constraint of Montgomery2Edwards(): pout.y× (pin.x+ 1) = pin.x− 1 ⇔ pin.x× (1− pout.y) = 1 + pout.y (pin.x)2 = (pout.x)2((pin.x)3 + A (pin.x)2 + pin.x) ⇔ ⇔ (pin.x)2(1− pout.y)3 = (pout.x)2(1− pout.y)3((pin.x)3+ + A (pin.x)2 + pin.x) ⇔ ⇔ (1 + pout.y)2(1− pout.y) = (pout.x)2((1 + pout.y)3+ + A (1 + pout.y)2(1− pout.y) + (1 + pout.y) (1− pout.y)2) ⇔ ⇔ (1 + pout.y) (1− (pout.y)2) = (pout.x)2(2 + A+ + (2 + A) pout.y+ (2− A) (pout.y)2 + (2− A) (pout.y)3) 44 Chapter 4. Circomlib circuits Since a = 168700 = A+ 2 and d = 168696 = A− 2 we obtain the next equivalence: (1 + pout.y) (1− (pout.y)2) = (pout.x)2(2 + A+ + (2 + A) pout.y+ (2− A) (pout.y)2 + (2− A) (pout.y)3) ⇔ ⇔ (1 + pout.y) (1− (pout.y)2) = = (pout.x)2(1 + pout.y) (a− d (pout.y)2) ⇔ ⇔ (1 + pout.y) (a (pout.x)2 + (pout.y)2) = = (1 + pout.y) (1 + d (pout.x)2(pout.y)2) If we could divide the whole equation by 1 + pout.y we would obtain the desired expression and conclude the proof. To do that we need to rule out the case when 1 + pout.y = 0, or in other words, when pout.y = −1. So let us suppose that in fact pout.y = −1. Then the second constraint of Montgomery2Edwards() forces pin.x = 0. Since the point pin fulfils the Baby-Jubjub Montgomery equation 4.5, the coordinate pin.y can only be 0. However, one of our premises was pin ̸= (0, 0), so we have reached a contradiction. Therefore, 1 + pout.y ̸= 0 and the proof is complete. As expected, the point (0, 0) of BJM , which is an exception of the map 4.2, causes the same problems as its equivalent point in twisted Edwards form (0,−1). The whole discussion of the (0,−1) point of BJE is also valid for this point, and the solution suggested in figure 4.7 is similar. 1 template Montgomery2Edwards () { 2 Point input {babymontgomery} pin; 3 Point output {babyedwards} pout; 4 5 pout.x <–- pin.y != 0 ? pin.x / pin.y : 0; 6 pout.y <–- (pin.x - 1) / (pin.x + 1); 7 8 pout.x * pin.y === pin.x; 9 pout.y * (pin.x + 1) === pin.x - 1; 10 11 component isz = isZero (); 12 isz.in <== pin.x; 13 isz.out * pout.x === 0; 14 } Figure 4.7: Modification of Montgomery2Edwards() solving the problem with the input point (0, 0). Chapter 5 Benchmarking With all the work done until this point we have first provided the circom language with great new functionalities that improve clarity and reliability, and then applied those desirable traits to circuits of circomlib. It is time to test whether the new features will have a negative impact on the sizes of the statements generated or if proof sizes will not be affected by the use of buses. For this purpose, we have selected a representative sample of the templates from circomlib that have been modified to implement buses. We will run a series of tests employing different functionalities of the circom compiler to compare the results with the old version of each template. The circom compiler is equipped with a set of tools that can be used to preform optimisations in the constraint systems described by a circom program. Specifi- cally, it has three different command line options for choosing the desired level of optimisation. The first option that we will use is the one designated by the flag --O0. When selecting this option, the compiler applies no simplification whatsoever. The second option that will be applied corresponds to the flag --O1. This option tells the compiler to only simplify constraints of the kind s1 === s2, which are very common in circom circuits. Specially, when connecting the output of a component as an input of another component, the developer needs to introduce this kind of constraints to capture the binding. The situation is worsened when working with large circuits, which can entail hundreds of millions of extra constraints. The third option we will be working with is selected with the flag --O2. This option activates the compiler’s full constraint simplification mode. Apart from the mentioned sim- plifications, with the --O2 option the compiler will carry out a fixed-point algorithm to eliminate as many linear constraints as possible. Linear constraints correspond to equations where no product of signals is found, so every term of the equation is linear. As explained in [21], to reduce the amount of linear constraints describing a circuit the compiler divides the set of constraints into clusters of related linear constraints and then applies Gauss-Jordan elimination to each of the clusters. This process is iterated until it is no longer possible to optimise more linear constraints. The R1CS generated with the activation of this option will be the best one possible with the current version of the circom compiler. 45 46 Chapter 5. Benchmarking We will divide the tests in two different groups: one for templates using the new BinaryNumber bus type (table 5.1) and the other for templates related to elliptic curves employing the Point bus type (table 5.2). For each of the circuits tested, we will apply the three levels of optimisation offered by the compiler and compare the results obtained by the old versions of the circuits not implementing buses and the updated versions using buses. We will examine three different metrics: the number of non-linear constraints of the final R1CS, the number of linear constraints and the total number of wires found in the arithmetic circuit. Looking at the results obtained, a few conclusions can be drawn. First it can be noticed that buses do not introduce extra quadratic constraints, which is good news. In the simpler circuits, the number of constraints generated with or without buses is the same even before any kind of simplification is carried out. For the more complex templates, like Num2Bits_strict or BabyPbk, although without any simplification the implementation of buses increases the number of linear constraints, activating the first simplification option of the compiler eliminates those extra constraints. Remember that this option applies simplifications only over constraints like s1 === s2, but it seems that level of simplification is enough to obtain the same results as when not employing buses in the circuit. There are even cases where activating option --O1 makes templates with buses generate systems with fewer constraints than those of the older versions. However, there are two exceptions to this rule, namely the templates AliasCheck and BabyCheck. The reason for this behaviour is the addition of outputs. While the old versions of the templates do not produce any outputs and just introduce constraints to force the signals fulfil certain properties, the newer versions output the input signals with tags linked to them so as to convey the semantics associated to those signals. In order to reach stronger conclusions, more tests should be carried out with dif- ferent templates. Nevertheless, the results shown in this section look very promising, as they suggest all the benefits that buses bring come with no drawbacks of bigger proof sizes. 47 CIRCUIT NON-LINEAR LINEAR WIRES Old New Old New Old New --O0 254 254 1 1 256 256 Num2Bits(254) --O1 254 254 1 1 256 256 --O2 254 254 1 1 256 256 --O0 516 516 769 1024 1284 1539 Num2Bits_strict() --O1 515 515 3 3 518 518 --O2 515 515 1 1 516 516 --O0 0 0 1 1 256 256 Bits2Num(254) --O1 0 0 1 1 256 256 --O2 0 0 1 1 256 256 --O0 262 262 769 1024 1284 1539 Bits2Num_strict() --O1 261 261 3 3 518 518 --O2 261 261 1 1 516 516 --O0 256 256 2 2 259 259 Num2BitsNeg(254) --O1 256 256 1 1 258 258 --O2 256 256 0 0 257 257 --O0 252 252 1 1 1269 1269 BinSum(254,4) --O1 252 252 1 1 1269 1269 --O2 252 252 1 1 1269 1269 --O0 255 255 1 1 764 764 BinSub(254) --O1 255 255 1 1 764 764 --O2 255 255 0 0 763 763 --O0 262 262 259 514 774 1029 AliasCheck() --O1 261 261 2 256 517 771 --O2 261 261 0 254 515 769 Table 5.1: Test results on templates using the BinaryNumber bus type. 48 Chapter 5. Benchmarking CIRCUIT NON-LINEAR LINEAR WIRES Old New Old New Old New --O0 2 2 0 0 5 5 Edwards2Montgomery() --O1 2 2 0 0 5 5 --O2 2 2 0 0 5 5 --O0 2 2 0 0 5 5 Montgomery2Edwards() --O1 2 2 0 0 5 5 --O2 2 2 0 0 5 5 --O0 3 3 0 0 8 8 MontgomeryAdd() --O1 3 3 0 0 8 8 --O2 3 3 0 0 8 8 --O0 4 4 0 0 7 7 MontgomeryDouble() --O1 4 4 0 0 7 7 --O2 4 4 0 0 7 7 --O0 6 6 0 0 11 11 BabyAdd() --O1 6 6 0 0 11 11 --O2 6 6 0 0 11 11 --O0 3 3 0 2 5 7 BabyCheck() --O1 3 3 0 2 5 7 --O2 3 3 0 2 5 7 --O0 3948 3948 6170 6173 10119 10122 BabyPbk() --O1 3939 3939 182 180 4122 4120 --O2 776 776 0 0 777 777 --O0 3695 3695 5909 5912 9859 9862 EscalarMulFix(254,B) --O1 3691 3691 176 174 4122 4120 --O2 524 524 0 0 779 779 Table 5.2: Test results on templates using the Point bus type. Chapter 6 Conclusions and Future Work In this final chapter, we will present the conclusions drawn from this project. We will summarise the work completed and the objectives that have been achieved. Lastly, we will suggest potential avenues for further development that may stem from this project. 6.1 Conclusions The initial phase of the project involved a thorough examination of an unfamiliar code base. This process was challenging due to the complexity typically associated with compiler code and the initial unfamiliarity with the programming language used. Effective strategies to navigate this included basic language tutorials followed by example-driven learning and active communication with the original developers. An important element of this project was learning to program in Rust, which can be challenging due to its unique memory safety features and pattern matching capabilities. Despite the initial difficulties, mastering Rust proved beneficial, partic- ularly due to the compiler’s helpful debugging hints and the language’s robust safety features, which ensure code reliability by preventing side effects and data races. The flexibility in creating structures, such as those used for AST components, and Rust’s growing popularity suggest that knowledge of this language will be advantageous for future endeavours. The primary goal of this project was the development of a tool in the circom language that could assist programmers in specifying properties of groups of signals. To that end, the characteristics of circom were explored, leading to the conception of signal buses. Buses allow for the application of existing tags to groups of signals, thereby helping clarify the semantics associated to these signals. Furthermore, the introduction of buses has enriched the circom language by providing the capability for defining custom signal types, thus enhancing the language’s expressiveness and improving the clarity of code within circuit designs. 49 50 Chapter 6. Conclusions and Future Work The implementation of buses required significant modifications to various com- ponents of the circom compiler, alongside the development of analyses to identify potential errors in their use. Each alteration was meticulously planned and executed to ensure optimal integration. Comprehensive testing was conducted to ensure the correct functionality of the updates and to address any undetected bugs. Finally, the compiler was updated to accurately interpret the bus syntax, fulfilling objectives I) and II) of section 1.2. To ascertain whether the use of buses affected the size of the generated Zero-Knowledge proofs, benchmark tests were conducted against code that did not use buses. These tests confirmed that, thanks to the compiler’s optimisation features, the inclusion of buses did not impact the proof sizes at all. With the purpose of taking advantage of the benefits brought by buses, many circuits within the official circom library (circomlib) were revised in preparation for the release of its new version. These adjustments not only have enhanced the clarity of the circuits but also significantly strengthen the reliability of the library’s templates thanks to enabling the attachment of tags to groups of signals expressing properties that have been enforced by the circuits of the circomlib. Hence, we have managed to achieve objective III). To make sure no property has been wrongly assigned, rigorous mathematical proofs have been employed to verify the correctness of these properties as stated in objective IV). All in all, the security level of the templates provided by the circomlib has been notably increased. 6.2 Future lines of work Although the work conducted has been exhaustive, there remain several aspects that we have not had time to explore, which could serve as valuable directions for future research: • The development of tools for automatically verifying the correctness of cir- cuits that utilise the new tags and buses added to circomlib. During the development of the project the idea came up to create a software that would verify if the templates of the circomlib are used correctly in a program. This software would take advantage of the new tags implemented in the templates of the circomlib and would check if no tags are added manually to a signal or bus, but rather obtained by means of another circomlib circuit. When no possible problematic points were found the programmer would receive a message confirming the program is (almost) guaranteed to be safe. • The extension of circom to admit conditional definitions on buses dependent on the parameters provided. We have explained in detail why buses are not like object-oriented languages’ classes and the reasons to forbid statements different from declarations inside buses. However, it could make sense to allow the inclusion of conditional statements inside bus definitions with the purpose of providing more flexibility to bus types and even enabling some recursive implementations. 6.2. Future lines of work 51 • The implementation of new statements to connect bus fields to a set of sig- nals with a single instruction. Looking at other general-purpose programming languages, many of them feature instructions to initialise the fields of a struct or an equivalent object with a single assignment of the form var a = b, c, d. Circom could add a similar statement to its repertoire as a less verbose manner of declaring linear constraints involving a bus. Bibliography [1] Ronald L Rivest. Cryptography. In Algorithms and complexity, pages 717–755. Elsevier, 1990. [2] Shafi oldwasser, Silvio Micali, and Chales Rackoff. The knowledge complexity of interactive proof-systems. In 17th Annual ACM Symposium on Theory of Computing, 1985. [3] Oded Goldreich, Silvio Micali, and Avi Wigderson. Proofs that yield nothing but their validity or all languages in np have zero-knowledge proof systems. Journal of the ACM (JACM), 38(3):690–728, 1991. [4] Oded Goldreich and Yair Oren. Definitions and properties of zero-knowledge proof systems. Journal of Cryptology, 7(1):1–32, 1994. [5] Vincent Gramoli, Len Bass, Alan Fekete, and Daniel W Sun. Rollup: Non- disruptive rolling upgrade with fast consensus-based dynamic reconfigurations. IEEE Transactions on Parallel and Distributed Systems, 27(9):2711–2724, 2015. [6] EthHub. Zk-rollups. Available at https://docs.ethhub.io/ ethereum-roadmap/layer-2-scaling/zk-rollups/. [7] Polygon. Polygon zk-evm. Available at https://zkevm.polygon. technology/. [8] Scroll. Scroll zk-evm. Available at https://scroll.io/blog/architecture. [9] Roger Maull, Phil Godsiff, Catherine Mulligan, Alan Brown, and Beth Kewell. Distributed ledger technology: Applications and implications. Strategic Change, 26(5):481–489, 2017. [10] Manuel Blum, Paul Feldman, and Silvio Micali. Non-interactive zero-knowledge and its applications. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 103–112, 1988. [11] Thomas Chen, Hui Lu, Teeramet Kunpittaya, and Alan Luo. A review of zk-snarks. arXiv preprint arXiv:2202.06877, 2022. 53 https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/zk-rollups/ https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/zk-rollups/ https://zkevm.polygon.technology/ https://zkevm.polygon.technology/ https://scroll.io/blog/architecture 54 BIBLIOGRAPHY [12] Ran Canetti. Universally composable security: A new paradigm for crypto- graphic protocols. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 136–145. IEEE, 2001. [13] Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev. Scalable, transparent, and post-quantum secure computational integrity. Cryptology ePrint Archive, 2018. [14] Sean Bowe, Jack Grigg, and Daira Hopwood. Recursive proof composition without a trusted setup. Cryptology ePrint Archive, 2019. [15] Bryan Parno, Jon Howell, Craig Gentry, and Mariana Raykova. Pinocchio: Nearly practical verifiable computation. Communications of the ACM, 59(2): 103–112, 2016. [16] Jonathan Bootle, Andrea Cerulli, Pyrros Chaidos, Jens Groth, and Christophe Petit. Efficient zero-knowledge arguments for arithmetic circuits in the dis- crete log setting. In Advances in Cryptology–EUROCRYPT 2016: 35th An- nual International Conference on the Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8-12, 2016, Proceedings, Part II 35, pages 327–357. Springer, 2016. [17] Abdelrahaman Aly, Tomer Ashur, Eli Ben-Sasson, Siemen Dhooghe, and Alan Szepieniec. Design of symmetric-key primitives for advanced cryptographic protocols. IACR Transactions on Symmetric Cryptology, pages 1–45, 2020. [18] Eli Ben-Sasson, Alessandro Chiesa, Eran Tromer, and Madars Virza. Succinct {Non-Interactive} zero knowledge for a von neumann architecture. In 23rd USENIX Security Symposium (USENIX Security 14), pages 781–796, 2014. [19] Jacob Eberhardt and Stefan Tai. Zokrates-scalable privacy-preserving off-chain computations. In 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pages 1084–1091. IEEE, 2018. [20] Collin Chin, Howard Wu, Raymond Chu, Alessandro Coglio, Eric McCarthy, and Eric Smith. Leo: A programming language for formally verified, zero- knowledge applications. Cryptology ePrint Archive, 2021. [21] Marta Bellés-Muñoz, Miguel Isabel, Jose Luis Muñoz-Tapia, Albert Rubio, and Jordi Baylina. Circom: A circuit description language for building zero- knowledge applications. IEEE Transactions on Dependable and Secure Com- puting, 2022. [22] iden3. Circom documentation, . Available at https://docs.circom.io/. [23] iden3. Circomlib repository, . Available at https://github.com/iden3/ circomlib/. https://docs.circom.io/ https://github.com/iden3/circomlib/ https://github.com/iden3/circomlib/ BIBLIOGRAPHY 55 [24] Miguel Isabel, Clara Rodríguez-Núñez, and Albert Rubio. Scalable verification of zero-knowledge protocols. In 2024 IEEE Symposium on Security and Privacy (SP), pages 133–133. IEEE Computer Society, 2024. [25] Daniel J Bernstein, Peter Birkner, Marc Joye, Tanja Lange, and Christiane Pe- ters. Twisted edwards curves. In Progress in Cryptology–AFRICACRYPT 2008: First International Conference on Cryptology in Africa, Casablanca, Morocco, June 11-14, 2008. Proceedings 1, pages 389–405. Springer, 2008. [26] Barry WhiteHat, Jordi Baylina, and Marta Bellés. Baby jubjub elliptic curve. Ethereum Improvement Proposal, EIP-2494, 29, 2020. [27] Jean-Jacques Quisquater, Myriam Quisquater, Muriel Quisquater, Michaël Quisquater, Louis Guillou, Marie Annick Guillou, Gaïd Guillou, Anna Guillou, Gwenolé Guillou, and Soazig Guillou. How to explain zero-knowledge protocols to your children. In Conference on the Theory and Application of Cryptology, pages 628–631. Springer, 1989. [28] Rosario Gennaro, Craig Gentry, Bryan Parno, and Mariana Raykova. Quadratic span programs and succinct nizks without pcps. In Advances in Cryptology– EUROCRYPT 2013: 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, May 26-30, 2013. Proceedings 32, pages 626–645. Springer, 2013. [29] iden3. Circom repository, . Available at https://github.com/iden3/circom. [30] Michael Scott. Programming language pragmatics. Morgan Kaufmann, 2000. [31] Hermenegildo García Navarro. Design and implementation of the circom 1.0 compiler. 2020. [32] iden3. Old circom repository, . Available at https://github.com/iden3/ circom_old. [33] Carol Nichols Steve Klabnik. The rust programming language. Available at https://doc.rust-lang.org/stable/book/. [34] Rust Foundation. Rust by example. Available at https://doc.rust-lang. org/rust-by-example/. [35] Alfred J Menezes, Paul C Van Oorschot, and Scott A Vanstone. Handbook of applied cryptography. CRC press, 2018. [36] Daniel J Bernstein and Tanja Lange. Post-quantum cryptography. Nature, 549 (7671):188–194, 2017. [37] Don Johnson and Alfred Menezes. The elliptic curve digital signature algorithm fecdsah 1 2. 1999. [38] Adam Langley, Mike Hamburg, and Sean Turner. Elliptic curves for security. Technical report, 2016. https://github.com/iden3/circom https://github.com/iden3/circom_old https://github.com/iden3/circom_old https://doc.rust-lang.org/stable/book/ https://doc.rust-lang.org/rust-by-example/ https://doc.rust-lang.org/rust-by-example/ 56 BIBLIOGRAPHY [39] Tanja Lange. Safecurves: choosing safe curves for elliptic-curve cryptography daniel j. bernstein university of illinois at chicago &. Página de Título Contents Table of Contents List of figures List of tables Introduction Motivation Objectives Work description In the following chapters State of Art Zero-knowledge proofs Arithmetic circuits Circom Circomlib The circom compiler Circom Development Definition of buses Implementation of buses: Parser Implementation of buses: AST Implementation of buses: Static analyses Symbol analysis Buses free of invalid statements Constant propagation Type analysis Unknown-known analysis Circomlib circuits Binary numbers Elliptic curves Benchmarking Conclusions and Future Work Conclusions Future lines of work Bibliography