This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TETC.2022.3187199, IEEE Transactions on Emerging Topics in Computing. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

PERCIVAL: Open-Source Posit RISC-V Core with Quire Capability

David Mallasén, Raul Murillo, Alberto A. Del Barrio, Senior Member, IEEE, Guillermo Botella, Senior Member, IEEE, Luis Piñuel and Manuel Prieto-Matias

Abstract—The posit representation for real numbers is an alternative to the ubiquitous IEEE 754 floating-point standard. In this work, we present PERCIVAL, an application-level posit RISC-V core based on CVA6 that can execute all posit instructions, including the quire fused operations. This solves the obstacle encountered by previous works, which included only partial posit support or had to emulate posits in software. In addition, Xposit, a RISC-V extension for posit instructions, is incorporated into LLVM. Therefore, PERCIVAL is the first work that integrates the complete posit instruction set in hardware. These elements allow for the native execution of posit instructions as well as of the standard floating-point ones, further permitting the comparison of these representations. FPGA and ASIC synthesis show the hardware cost of implementing 32-bit posits and highlight the significant overhead of including a quire accumulator. However, results show that the quire enables a more accurate execution of dot products. In general matrix multiplications, the accuracy error is reduced by up to 4 orders of magnitude.
Furthermore, performance comparisons show that these accuracy improvements do not hinder execution, as posits run as fast as single-precision floats and exhibit better timing than double-precision floats, thus potentially providing an alternative representation.

Index Terms—Arithmetic, Posit, IEEE-754, Floating point, RISC-V, CPU, CVA6, LLVM, Matrix multiplication.

1 INTRODUCTION

Representing real numbers and executing arithmetic operations on them in a microprocessor presents unique challenges. Compared with the simpler set of integers, working with reals introduces notions such as precision. For decades, the representation of real numbers in virtually all computers has followed the IEEE 754 standard for floating-point arithmetic [1]. However, this standard has some flaws, such as rounding and reproducibility issues, signed zero, or an excess of Not a Number (NaN) representations.

To face these challenges, alternative real number representations have been proposed in the literature. Posits [2] are a promising substitute proposed in 2017 that provide compelling benefits. They deliver a good trade-off between dynamic range and accuracy, encounter fewer exceptions when operating, and have tapered precision. This means that numbers near ±1 have more precision, while very large and very small numbers have less. The posit standard includes fused operations, which can be used to compute a series of multiplications and accumulations without intermediate rounding. Furthermore, posits are consistent between implementations, as they use a single rounding scheme and include only two special cases: a single 0 and ±∞. Therefore, they potentially simplify the hardware implementation [3]. Nonetheless, posits are still under development, and it is still not clear whether they could completely replace IEEE floats [4]. Including Posit Arithmetic Units (PAUs) into cores in hardware is a crucial step to study the efficiency of this representation further.
When designing such a core and its arithmetic operations, an important decision is which Instruction Set Architecture (ISA) to implement. RISC-V [5] is a promising open-source ISA that is gaining significant traction both in academia and in industry. Thanks to its openness and flexibility, multiple RISC-V cores have been developed targeting diverse purposes in recent years. In the case of studying the performance of posits, a core that can run application-level software is needed.

Some works have studied the use of posits by emulating their execution in software [6], [7], [8]. However, this approach has the significant drawback of requiring excessive execution times, thus limiting the scalability of the applications. To overcome these limitations, we propose to include native posit and quire support in hardware by leveraging a high-performance RISC-V core. A comparison of four of the leading open-source application-class RISC-V cores is presented in [9], CVA6 among them. In this work, we have extended the datapath of the CVA6 [10] RISC-V core with a 32-bit PAU with quire and a posit register file. Together with the Xposit compiler extension, this core allows the native hardware execution of high-level applications that leverage the posit number system. Therefore, the main contributions of this paper are the following:

• We present PERCIVAL, an oPEn-souRCe1 posIt risc-V core with quire cApabiLity based on the CVA6 that can execute all 32-bit posit instructions, including the quire fused operations.
• Compiler support for the Xposit RISC-V extension in LLVM. This allows posit instructions to be easily embedded into a C program that can be run natively on

All authors are with the Department of Computer Architecture and Automation, Complutense University of Madrid, 28040 Madrid, Spain. E-mails: {dmallase, ramuri01, abarriog, gbotella, lpinuel, mpmatias}@ucm.es. Manuscript received -; revised -.
1. https://github.com/artecs-group/PERCIVAL
arXiv:2111.15286v3 [cs.AR] 7 Jul 2022

PERCIVAL or any other core that implements these opcodes.
• To the best of our knowledge, the PERCIVAL core together with the Xposit extension is the first work that integrates in hardware standard posit addition, subtraction, and multiplication together with quire fused operations. It also includes posit logarithmic-approximate hardware for division and square root operations. Furthermore, all comparison operations and conversions to and from integer numbers are also included in PERCIVAL.
• Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) synthesis results showcasing the resource usage of posit arithmetic and quire capabilities on a RISC-V CPU. These results are compared with the native IEEE 754 Floating-Point Unit (FPU) available in the CVA6 and with previous works.
• Accuracy and timing performance of posit numbers and IEEE 754 floats are compared on PERCIVAL using General Matrix Multiplication (GEMM) and max-pooling benchmarks. Results show that 32-bit posits can be up to 4 orders of magnitude more accurate than 32-bit floats thanks to the quire register. Furthermore, this improvement does not imply a trade-off in execution time, as they can perform as fast as 32-bit floats, and thus execute faster than 64-bit floats.

The rest of the paper is organized as follows: Section 2 introduces the necessary background about the posit format, the RISC-V ISA and the CVA6 RISC-V core. Related works from the literature are surveyed in Section 3, both as standalone PAUs and at the core level. In Section 4 the PERCIVAL posit core is described and in Section 5 the necessary compiler support for the Xposit RISC-V extension is introduced. The FPGA and ASIC synthesis results of the core are presented, as well as compared with other implementations, in Section 6. Subsequently, in Section 7 posits and IEEE 754 floats are compared regarding accuracy and timing performance.
Finally, Section 8 concludes this work.

2 BACKGROUND

2.1 Posit Format

Posit numbers [2] were introduced in 2017 as an alternative to the predominant IEEE 754 floating-point standard to represent and operate with real numbers. Posits provide reproducible results across platforms and few special cases. Furthermore, they do not support overflow or underflow, which reduces the complexity of exception handling.

A posit number configuration is defined using two parameters as Posit〈n, es〉, where n is the total bit-width and es is the maximum bit-width of the exponent. Although in the literature [4], [6], [11] the most widespread posit formats have been Posit〈8, 0〉, Posit〈16, 1〉 and Posit〈32, 2〉, in the latest Posit Standard 4.12 Draft [12] the value of es is fixed to 2. This has the advantage of simplifying the hardware design and facilitating the conversion between different posit sizes.

Fig. 1. Posit format with sign, regime, exponent and fraction fields.

Posits only distinguish two special cases: zero and Not-a-Real (NaR), which are represented as 0···0 and 10···0 respectively. The rest of the representations are composed of four fields, as shown in Figure 1:
• The sign bit S;
• The variable-length regime field R, consisting of k bits equal to R0 followed by the complement of R0 or by the end of the posit. This field encodes a scaling factor r given by Equation (1);
• The exponent E, consisting of at most es bits, which encodes an unbiased integer value e. If any of its bits would be located after the least significant bit of the posit, that bit takes the value 0;
• The variable-length fraction field F, formed by the remaining m bits. Its value 0 ≤ f < 1 is given by dividing the unsigned integer F by 2^m.

    r = −k        if R0 = 0
    r = k − 1     if R0 = 1          (1)

The real value p of a generic posit is given by Equation (2). The main differences with the IEEE 754 floating-point format are the existence of the regime field, the use of an unbiased exponent, and the value of the fraction hidden bit.
Usually, in floating-point arithmetic, the hidden bit is considered to be 1. However, in the latest representation of posits, it is considered to be 1 if the number is positive, or −2 if the number is negative. This simplifies the decoding stage of the posit representation [3], [13].

    p = ((1 − 3s) + f) × 2^((1−2s)×(4r+e+s))          (2)

In posit arithmetic, NaR has a unique representation that maps to the most negative 2's complement signed integer. Consequently, if used in comparison operations, it results in less than all other posits and equal to itself. Moreover, the rest of the posit values follow the same ordering as their corresponding bit representations. These characteristics allow posit numbers to be compared as if they were 2's complement signed integers, eliminating additional hardware for posit comparison operations.

The variable-length regime field acts as a long-range dynamic exponent, as can be seen in Equation (2), where it is multiplied by 4 or, equivalently, shifted left by the two exponent bits. Since it is a dynamic field, it can occupy more bits to represent larger numbers or leave more bits to the fraction field when looking for accuracy in the neighborhoods of ±1. However, detecting these variable-sized fields adds some hardware overhead.

As an example, let 11101010 be the binary encoding of a Posit8, i.e. a Posit〈8, 2〉 according to the latest Posit Standard 4.12 Draft [12]. The first bit s = 1 indicates a negative number. The regime field 110 gives k = 2 and therefore r = 1. The next two bits 10 represent the exponent e = 2. Finally, the remaining m = 2 bits, 10, encode a fraction value of f = 2/2^2 = 0.5. Hence, from (2) we conclude that 11101010 ≡ (−2 + 0.5) × 2^(−(4+2+1)) = −0.01171875.

In addition to the standard representation, posits include fused operations using the quire, a 16n-bit fixed-point 2's complement register, where n is the posit bit-width.
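The worked Posit8 example above can be reproduced with a short script. The following is an illustrative software model of Equations (1) and (2) with es fixed to 2, written for this article; it is not the hardware decoder used in PERCIVAL:

```python
def decode_posit(bits, n=8, es=2):
    """Software model of Equation (2) for a Posit<n,es> bit pattern."""
    if bits == 0:
        return 0.0                      # the single zero
    if bits == 1 << (n - 1):
        return float("nan")             # NaR (10...0)
    s = bits >> (n - 1)                 # sign bit
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]  # bits after the sign
    r0 = body[0]
    k = 0                               # regime run length
    while k < len(body) and body[k] == r0:
        k += 1
    r = k - 1 if r0 == 1 else -k        # Equation (1)
    rest = body[k + 1:]                 # skip the regime-terminating bit
    e = 0
    for i in range(es):                 # exponent bits cut off by the posit end are 0
        e = (e << 1) | (rest[i] if i < len(rest) else 0)
    frac = rest[es:]
    m = len(frac)
    f = int("".join(map(str, frac)), 2) / (1 << m) if m else 0.0
    # Equation (2): p = ((1 - 3s) + f) * 2^((1 - 2s) * (2^es * r + e + s))
    return ((1 - 3 * s) + f) * 2.0 ** ((1 - 2 * s) * ((1 << es) * r + e + s))

print(decode_posit(0b11101010))  # -0.01171875, matching the worked example
```

Note how the negative hidden bit of the 2's complement formulation appears directly as the (1 − 3s) term, so no separate sign-magnitude conversion step is needed.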
This allows executing up to 2^31 − 1 Multiply-Accumulate (MAC) operations without intermediate rounding or accuracy loss. The quire can represent either NaR, similarly to regular posits, or the value given by 2^(16−8n) times the 2's complement signed integer represented by the 16n concatenated bits.

2.2 RISC-V ISA

The open-source RISC-V ISA [5] emanates from the ideas of Reduced Instruction Set Computers (RISCs). It is structured as a base integer ISA plus a set of optional standard and non-standard extensions to customize and specialize the final set of instructions. There are two main base integer ISAs, RV32I and RV64I, which establish the user address space as 32-bit or 64-bit respectively.

The RISC-V general standard extensions include, among others, functionality for integer multiply/divide (M), atomic memory operations (A) and single- (F) and double-precision (D) floating-point arithmetic following the IEEE 754 standard. This set of general-purpose standard extensions IMAFD, together with the instruction-fetch fence (Zifencei) and the control and status register (Zicsr) extensions, forms the general-purpose G abbreviation. In general, following the RISC principles, all extensions have fixed-length 32-bit instructions. However, there is also a compressed instruction extension (C) that provides 16-bit instructions.

Expanding the RISC-V ISA with specialized extensions is supported by the standard to allow for customized accelerators. Non-standard extensions can be added to the encoding space leveraging the four major opcodes reserved for custom extensions. A proposal of the changes that should be made to the F standard extension in order to have a 32-bit posit RISC-V extension is described in [14].

2.3 CVA6

The CVA6 [10] (formerly known as Ariane) is a 6-stage, in-order, single-issue CPU which implements the RV64GC RISC-V standard. The core implements three privilege levels and can run a Linux operating system.
The primary goal of its micro-architecture is to reduce the critical path length. It was initially developed as part of the PULP ecosystem, but it is currently maintained by the OpenHW Group, which is developing a complete, industrial-grade pre-silicon verification. CVA6 is written in SystemVerilog and is licensed under an open-source Solderpad Hardware License.

As execution units, its datapath includes an integer ALU, a multiply/divide unit and an IEEE 754 FPU [15]. This FPU claims to be IEEE 754-2008 compliant, except for some issues in the division and square root operations. For the sake of comparison, it is important that the FPU is IEEE 754 compliant instead of being limited to normal floats only, since in theory, posit hardware is slightly more expensive than floating-point hardware that does not take subnormal numbers into account [3].

3 RELATED WORK

There has been a great deal of interest in hardware implementations of posit arithmetic since its first appearance. Standalone PAUs with different degrees of capabilities or basic posit functional units have been described in the literature [11], [16], [17], [18]. These units provide the building blocks to execute posit arithmetic. However, on their own they cannot execute whole posit algorithms.

Recently, some works adding partial posit support to RISC-V cores have been presented. CLARINET [19] incorporates the quire into a RV64GC 5-stage in-order core. However, not all posit capabilities are included in this work. Most operations are performed in IEEE floating-point format, and the values are converted to posit when using the quire. The only posit functionalities added to the core are fused MAC with quire, fused divide and accumulate with quire, and conversion instructions. PERC [20] integrates a PAU into the Rocket Chip generator, replacing the 32 and 64-bit FPU.
However, this work does not include quire support, as it is constrained by the F and D RISC-V extensions for IEEE 754 floating-point numbers. More recently, PERI [21] added a tightly coupled PAU to the SHAKTI C-class core, a 5-stage in-order RV32IMAFC core. This proposal does not include quire support either, as it reuses the F extension instructions. Nonetheless, it allows dynamic switching between es=2 and es=3 posits. In [22], the authors include a PAU named POSAR in a RISC-V Rocket Chip core. Again, this proposal does not include quire support and replaces the FPU present in Rocket Chip to reuse the floating-point instructions.

A different approach is taken in [23], where the authors use the posit representation as a way to store IEEE floats in memory with a lower bit-width while performing the computations using the IEEE FPU. For this purpose, they include a light posit processing unit in the CVA6 core that converts between 8 or 16-bit posits and 32-bit IEEE floats. They also develop an extension of the RISC-V ISA to include these conversion instructions.

4 PERCIVAL POSIT CORE

In this work, we have integrated full posit capabilities, including quire and fused operations, into an application-level RISC-V core. In addition to the design of the functional units that execute the posit and quire operations, the novelty of our design is that it is fully compatible both at the software and hardware level with the F and D RISC-V extensions. Therefore, both posit and IEEE floating-point numbers can be used simultaneously on the same core. To the best of our knowledge, this is the first work that integrates practically all of the posit and quire operations specified in the posit standard into a core.

4.1 PAU Design

The Posit Arithmetic Unit (PAU) is in charge of executing most posit operations and also contains the quire register, as shown in Figure 2. Posit comparisons are executed in the integer Arithmetic Logic Unit (ALU).
As mentioned above, this is one of the benefits of the posit representation.

Fig. 2. Internal structure of the proposed Posit Arithmetic Unit (PAU).

When designing the micro-architecture of the PAU, our objective was to achieve a latency and throughput similar to those of the FPU operations, to obtain fair comparisons. The throughput is limited, as there is no pipeline in the FPU nor in the PAU. Nevertheless, all of the operations are multi-cycle. The latency of the PAU units is the following:
• PADD, PSUB, QMADD, and QMSUB: 2 cycles.
• PMUL, PDIV, PSQRT, and QROUND: 1 cycle.
All other operations have no latency, i.e. they output their result at the next clock cycle after receiving the inputs. As a comparison, the 32-bit FADD, FSUB, FMADD, FMSUB, and FMUL instructions in the FPU have a latency of 2 clock cycles, but the analogous 64-bit instructions have a latency of 3 cycles. It is noteworthy that the comparisons in the FPU have a latency of 1, while the posit comparisons that reuse the integer hardware have no latency. Conversions to and from integer values also take an extra clock cycle in the FPU.

Depending on the operation, the input operands are directed to the corresponding posit unit and the result is forwarded as an output of the PAU. There are three main blocks: computational operations (COMP), conversion operations (CONV), and operations that make use of the quire register (FUSED) (Figure 2). Regarding COMP, the ADD unit is used both for addition and subtraction, calculating the two's complement of the second operand when subtracting. In this group, all the modules use both operands except the square root, which uses only operand A. In addition, the operands and the result correspond to the posit register file. It must be noted that the posit division and square root units are approximate, as this type of arithmetic simplifies the designs and thus reduces the hardware cost of the system.
They are logarithm-approximate units based on Mitchell's Approximate Log Multipliers and our previous work [11]. These units have been demonstrated to have a maximum relative error of 11.11%, and have less impact on area/performance than the exact hardware operators. On the other hand, exact division and square root algorithms could be implemented in software leveraging the MAC unit, thus eliminating the need for dedicated hardware. However, this is out of the scope of this work. In the CONV group, only operand A is used for conversions. Depending on the operation, the input data and the result belong to the posit or the integer register file.

The quire register is the most singular addition to this number format. According to the posit standard, it must be an architectural register accessible by the programmer that is also allowed to be dumped into memory. However, being so wide, the cost of including this functionality into the core's datapath could be too high for the benefits it would add. In the vast majority of cases, the quire is used as an accumulator to avoid overflows in the MAC operations, and this does not require quire load and store operations. Instead, we can initialize the quire to zero (QCLR.S), negate it if needed (QNEG.S), accumulate the partial products in it without rounding or storing in memory (QMADD.S and QMSUB.S), and, when the whole operation is finished, round and output the result (QROUND.S). The necessary support for all of these operations related to the quire is included in our proposal (see Table 2 below). The hardware cost of including the quire as an internal register in the PAU is studied in Section 6.

4.2 Core Integration

The proposed PAU has been integrated into the CVA6 RV64GC core while maintaining compatibility with all existing extensions, including single- and double-precision floating point. Moreover, since we work with Posit32 numbers, i.e.
Posit〈32, 2〉, the core adds a 32-bit posit register file in addition to the integer and floating-point registers. The instruction decoder has been extended to support posit instructions. The inner workings of the decoder are described in Figure 3. As part of the decoding process, each posit instruction selects from which register file it must obtain its operands and to which register file it must forward its result.

Require: Instruction to decode instr.
Ensure: Scoreboard entry sc_instr, which contains the operation op and the destination functional unit fu.
switch (instr.opcode)
  ...
  case POSIT:
    switch (instr.func3)
      case 000: {Computational posit instruction}
        switch (instr.func5)
          case 00000: {PAU instruction}
            sc_instr.fu = PAU
            sc_instr.op = PADD
            ...
          case 00100: {ALU instruction}
            sc_instr.fu = ALU
            sc_instr.op = PMIN
            ...
        end switch
      case 001: {Posit load instruction}
        sc_instr.fu = LOAD
        sc_instr.op = PLW
      case 011: {Posit store instruction}
        sc_instr.fu = STORE
        sc_instr.op = PSW
    end switch
  ...
  default: {Instruction not decoded in any switch/case}
    illegal_instr = true
end switch

Fig. 3. Pseudocode describing the decoding of posit instructions.

The CVA6 core uses scoreboarding for dynamically scheduled instructions and allows out-of-order write-back of each functional unit. The scoreboard tracks which instructions are issued, their functional unit and in which register they will write back. Our design has enlarged the scoreboard to include posit registers and instructions. In this manner, we can discern whether the input data of posit operations are retrieved from a register or forwarded directly as a result of a previous operation.

As mentioned in Section 2.1, posit numbers have the benefit of being able to reuse the comparison hardware of 2's complement signed integers. Therefore, the integer ALU has also been extended to accept posit operands and to be able to forward the result of these instructions with minimal hardware overhead.
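This reuse of the integer comparators can be illustrated in software: reinterpreting the raw posit bit patterns as 2's complement signed integers and comparing them as plain integers yields the same ordering as comparing the decoded real values. A minimal sketch (the 8-bit patterns and decoded values follow the worked example of Section 2.1; this is a model written for this article, not the core's RTL):

```python
def as_signed(bits, n=8):
    """Reinterpret an n-bit posit pattern as a 2's complement signed integer."""
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits

def posit_lt(a, b, n=8):
    """PLT.S-style 'less than' computed with integer comparison only."""
    return as_signed(a, n) < as_signed(b, n)

# 0b11101010 decodes to -0.01171875 and 0b01000000 decodes to 1.0,
# so the integer comparison must agree that the first is smaller.
assert posit_lt(0b11101010, 0b01000000)
# NaR (10...0) maps to the most negative integer: less than everything else.
assert posit_lt(0b10000000, 0b11101010)
```

This is exactly why the decoder in Figure 3 can route PMIN, PMAX and the comparison instructions to the existing ALU instead of the PAU.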
Furthermore, the PAU has been integrated into the execution stage of the processor in parallel to the ALU and the FPU, connecting the issue module with the aforementioned scoreboard. Finally, the complete datapath has been adapted to include the posit signals and all necessary additional interconnections.

5 COMPILER SUPPORT: XPOSIT EXTENSION

The assembly output of a RISC-V compiler when processing programs that use floating-point arithmetic includes instructions from the corresponding F and D extensions. To produce a similar output targeting posit numbers, a new extension must be introduced that translates posit instructions and posit operators to binary code. Therefore, in this section, the Xposit RISC-V extension targeting posit arithmetic is presented. As part of this work, Xposit has been integrated into the LLVM 12 backend [24] to allow the compilation of high-level applications.

This modified version of LLVM can compile C code. However, posit instructions must be written at the assembly level, as there is currently no support for writing posit or quire operations directly in C. Therefore, previous codes can be reused in PERCIVAL, and only the computational kernels have to be manually written in assembly. An example of this is shown in Section 7.

The posit instruction set follows the structure of the F RISC-V standard extension for single-precision floating point [25]. The Xposit extension mostly follows the adaptation to the posit format proposed in [14]. The differences with this proposal are the following:
• We include 32 posit registers p0-p31, as in the F standard extension.
• Similarly to the integer operations in CVA6, there is no flag signaling division by zero.
• We do not include the possibility of loading and storing the quire in memory.

The Xposit extension uses the 0001011 opcode (custom-0), occupying the space indicated in Table 1 as POSIT.
If more operations were needed in the future, especially posit load and store instructions of other word lengths, the 0101011, 1011011, and 1111011 opcodes (custom-1, custom-2, and custom-3) could be leveraged. In this way, an approach similar to that of the F and D RISC-V extensions could be followed, which utilize the OP-FP, LOAD-FP and STORE-FP opcodes.

The format and fields of the Xposit instructions are described in Figure 4. Posit load and store use the same base+offset addressing as the corresponding floating-point instructions, with the base address in register rs1 and a signed 12-bit byte offset. Thus, the PLW instruction loads a posit value from memory into the rd posit register, and the PSW instruction stores a posit value from the rs2 posit register to memory. The rest of the Xposit operations keep the POSIT opcode and differ from the previous instructions by the funct3 field. Finally, it must be noted that the fmt field is fixed to 01, indicating that the instructions are for single-precision (32-bit) posits. The complete instruction set of the proposed Xposit RISC-V extension is detailed in Table 2.

An important addition of the Xposit extension is the quire instructions. Since the quire is a single internal register of the PAU, the instructions that operate with it do not have to specify a quire register number. For example, the quire clear instruction does not have any parameters. It is decoded and then executed internally by the PAU, which simply sets the quire register to 0. The quire fused operations only have to specify the posit registers of the two values that will be multiplied. Then, the accumulation is performed implicitly on the quire.

6 SYNTHESIS RESULTS

In this section, we present the FPGA and ASIC synthesis results of PERCIVAL. The details of its PAU and of the IEEE 754 FPU using 32 and 64-bit formats are also included. In this manner, the hardware cost of posit numbers and the quire is highlighted and compared with other implementations.
TABLE 1
RISC-V base opcode map + POSIT extension; inst[1:0]=11

inst[6:5] | 000    | 001      | 010      | 011      | 100    | 101      | 110            | 111 (>32b)
00        | LOAD   | LOAD-FP  | POSIT    | MISC-MEM | OP-IMM | AUIPC    | OP-IMM-32      | 48b
01        | STORE  | STORE-FP | custom-1 | AMO      | OP     | LUI      | OP-32          | 64b
10        | MADD   | MSUB     | NMSUB    | NMADD    | OP-FP  | reserved | custom-2/rv128 | 48b
11        | BRANCH | JALR     | reserved | JAL      | SYSTEM | reserved | custom-3/rv128 | ≥80b

Fig. 4. Internal structure and fields of Xposit instructions.

6.1 FPGA Synthesis

The FPGA synthesis was performed using Vivado v.2020.2 targeting a Genesys II (Xilinx Kintex-7 XC7K325T-2FFG900C) FPGA. Different configurations of FPU and PAU were tested, the results of which are shown in Table 3. Since the critical path does not traverse the arithmetic units of the core, in all of the cases the timing constraint of 20 ns was met and the timing slack was +0.177 ns.

The bare CVA6 without an FPU or PAU requires 28950 Lookup Tables (LUTs) and 19579 Flip-flops (FFs). Including support for 32-bit floating-point numbers increases the number of LUTs and FFs by 6452 and 2039 respectively. This difference grows to 12310 LUTs and 4366 FFs when also using the double-precision D extension. Note that these values are larger than the FPU area alone, since they also include other elements such as the floating-point register file, instruction decoding and interconnections. These other non-FPU elements require 2406 LUTs and 1066 FFs in the 32-bit case and 4147 LUTs and 2122 FFs in the 64-bit case.

Comparing the overall cost of including posit support with the cost of including IEEE floating-point support, a significant difference can be seen. Adding 32-bit posit operations and quire support to the CVA6 requires 15743 LUTs and 4057 FFs, which is comparable to the FD floating-point configuration. Out of this area, 3864 LUTs and 1072 FFs are occupied by the non-PAU elements mentioned in the previous floating-point analysis.
The synthesis results reveal that the PAU requires significantly more resources than the FPU available in the CVA6. In particular, the 32-bit PAU with quire occupies 2.94 times as many LUTs and 3.07 times as many FFs as the 32-bit FPU. To better understand these results, Table 4 presents the area requirements of the different modules inside the PAU. The most interesting value shown in this table is the area occupied by the posit MAC unit, which corresponds to almost half of the total area of the PAU.

When comparing with the floating-point units, which do not include an accumulation register, the area requirements of the quire can be separated out. Thus, the posit MAC and the quire rounding to posit can be subtracted from the total PAU area to obtain a value of 5346 LUTs and 1318 FFs. This outcome is much closer to the synthesis results of the FPU, as the PAU without quire occupies 1.32 times as many LUTs and 1.35 times as many FFs. These results match previous works [22], where the authors also report an increase of around 30% in FPGA resources when comparing their 32-bit PAU without quire with a 32-bit FPU. In our case, the actual cost of not including a quire would be even smaller, as the cost of allocating the 512-bit quire in the PAU and computing its 2's complement, which are included in the PAU top module, should also be subtracted. However, the synthesis tool does not provide this level of detail.

6.2 ASIC Synthesis

The 32-bit PAU with quire and the 32-bit FPU configuration present in PERCIVAL were synthesized targeting TSMC's 45 nm standard-cell library to further study their hardware cost in ASICs. The synthesis was performed using Synopsys Design Compiler with a timing constraint of 5 ns, which was met in both cases, and a toggle rate of 0.1. The 32-bit FPU within CVA6 requires an area of 30691 µm² and consumes 27.26 mW of power. On the other hand, the 32-bit PAU with quire requires an area of 76970 µm² and consumes 67.73 mW of power.
This follows the same trend shown in the FPGA synthesis, as the PAU with quire is significantly larger (2.51x) and consumes more power (2.48x) than the FPU. In addition, to better assess these values in comparison with other proposals, the PAU available in CLARINET [19] was also synthesized with the same parameters. We have chosen to evaluate this work because it integrates, to the best of our knowledge, the only other PAU that contains a quire. In this case, the 32-bit PAU with quire requires an area of 69920 µm² and consumes 68.31 mW of power. This is a decrease of around 10% in area and a slight increase in power compared to our proposal, although ours implements a much larger set of posit functionality.

TABLE 2
Instruction set of the proposed Xposit RISC-V extension.

31-27      26-25  24-20  19-15  14-12  11-7      6-0      Instruction
imm[11:0]                rs1    001    rd        0001011  PLW
imm[11:5]         rs2    rs1    011    imm[4:0]  0001011  PSW
00000      10     rs2    rs1    000    rd        0001011  PADD.S
00001      10     rs2    rs1    000    rd        0001011  PSUB.S
00010      10     rs2    rs1    000    rd        0001011  PMUL.S
00011      10     rs2    rs1    000    rd        0001011  PDIV.S
00100      10     rs2    rs1    000    rd        0001011  PMIN.S
00101      10     rs2    rs1    000    rd        0001011  PMAX.S
00110      10     00000  rs1    000    rd        0001011  PSQRT.S
00111      10     rs2    rs1    000    00000     0001011  QMADD.S
01000      10     rs2    rs1    000    00000     0001011  QMSUB.S
01001      10     00000  00000  000    00000     0001011  QCLR.S
01010      10     00000  00000  000    00000     0001011  QNEG.S
01011      10     00000  00000  000    rd        0001011  QROUND.S
01100      10     00000  rs1    000    rd        0001011  PCVT.W.S
01101      10     00000  rs1    000    rd        0001011  PCVT.WU.S
01110      10     00000  rs1    000    rd        0001011  PCVT.L.S
01111      10     00000  rs1    000    rd        0001011  PCVT.LU.S
10000      10     00000  rs1    000    rd        0001011  PCVT.S.W
10001      10     00000  rs1    000    rd        0001011  PCVT.S.WU
10010      10     00000  rs1    000    rd        0001011  PCVT.S.L
10011      10     00000  rs1    000    rd        0001011  PCVT.S.LU
10100      10     rs2    rs1    000    rd        0001011  PSGNJ.S
10101      10     rs2    rs1    000    rd        0001011  PSGNJN.S
10110      10     rs2    rs1    000    rd        0001011  PSGNJX.S
10111      10     00000  rs1    000    rd        0001011  PMV.X.W
11000      10     00000  rs1    000    rd        0001011  PMV.W.X
11001      10     rs2    rs1    000    rd        0001011  PEQ.S
11010      10     rs2    rs1    000    rd        0001011  PLT.S
11011      10     rs2    rs1    000    rd        0001011  PLE.S

TABLE 3
Comparison of FPGA synthesis results with different configurations of FPU, marked as F and D for 32 and 64-bit numbers respectively, and 32-bit PAU with quire.

                      With PAU                                                     Without PAU
                      F               D               FD              -            F               D               FD              -
Total core (LUT, FF)  (50318, 25727)  (55900, 27652)  (57129, 27996)  (44693, 23636)  (35402, 21618)  (40740, 23599)  (41260, 23945)  (28950, 19579)
FPU area (LUT, FF)    (3726, 1008)    (6352, 1905)    (7612, 2245)    -            (4046, 973)     (6626, 1905)    (8163, 2244)    -
PAU area (LUT, FF)    (11796, 2979)   (11810, 2979)   (11803, 2979)   (11879, 2985)   -               -               -               -

TABLE 4
FPGA synthesis area results of the PAU disaggregated into its individual components.

Name            LUTs   FFs
PAU top         593    1063
Posit Add       784    106
Posit Mult      736    73
Posit ADiv      413    43
Posit ASqrt     426    33
Posit MAC       5644   1541
Quire to Posit  889    126
Int to Posit    176    0
Long to Posit   331    0
ULong to Posit  425    0
Posit to Int    499    0
Posit to Long   379    0
Posit to UInt   228    0
Posit to ULong  358    0
PAU total       11879  2985
PAU w/o quire   5346   1318

Similarly to Section 6.1, the area and power results of the different elements inside the PAU are presented in Table 5. As can be seen, when subtracting the cost of the quire from the PAU, the outcome is still higher than the 32-bit FPU, but it becomes much closer. The 32-bit PAU occupies 1.32 times as much area and consumes 1.38 times as much power as the 32-bit IEEE FPU FPNew [15]. However, it is noteworthy that some aspects of posit arithmetic are not yet fully studied. For example, most of the works presenting posit units have tackled the decoding and encoding phases using sign-magnitude. Nonetheless, more recent studies show that a 2's complement approach is more efficient [13].

7 POSIT VS IEEE-754 BENCHMARKS

One of the benefits of PERCIVAL is that an accurate and fair comparison can be made between posit and IEEE floating point.
The main advantage of having support for native posit and IEEE floating point simultaneously on the same core is that identical benchmarks can be run on both number representations to compare them.

TABLE 5
ASIC synthesis area and power results of the 32-bit PAU with quire, broken down into its individual components.

Name             Area (µm²)  Power (mW)
PAU top           13462.15       12.69
Posit Add          4075.31        3.59
Posit Mult         8635.37        9.98
Posit ADiv         2540.87        2.41
Posit ASqrt        1722.84        1.61
Posit MAC         30419.12       26.07
Quire to Posit     6026.76        4.04
Int to Posit        905.99        0.68
Long to Posit      1423.43        0.96
UInt to Posit       869.77        0.66
ULong to Posit     1353.11        0.94
Posit to Int        966.67        0.71
Posit to Long      1810.33        1.38
Posit to UInt       958.44        0.68
Posit to ULong     1800.22        1.33
PAU total         76970.38       67.73
PAU w/o quire     40524.62       37.62
CLARINET PAU      69920.02       68.31

In this work, we have chosen to benchmark General Matrix Multiplication (GEMM) and the max-pooling layer, which is used to down-sample intermediate representations in neural networks. These examples showcase the use of the quire and of posits both in the PAU and in the ALU, loading and storing from memory and leveraging the posit register file. The GEMM and max-pooling codes for posits and IEEE floats have been written in C, including inline assembly for the required posit and float instructions. The floating-point code has also been written in inline assembly to provide exactly the same sequence of instructions to the core. The GEMM code for floats is shown in Figure 5, and the analogous version for posits using the quire is shown in Figure 6. These codes have been compiled using the modified version of LLVM with the Xposit RISC-V extension described in Section 5, and serve as an example of how this extension can be leveraged. The final target architecture is therefore RV64GCXposit. The -O2 optimization flag has been used to obtain optimized code in every case.

7.1 Accuracy

The accuracy differences between posits and floats are studied for the GEMM benchmark.
Furthermore, each arithmetic is executed with and without fused MAC operations, which in posit arithmetic include the quire. In the cases without quire or FMADD, each fused operation is substituted by a multiplication followed by an addition. The results obtained using the 64-bit IEEE 754 format are considered the golden solution and used to compute the Mean Squared Error (MSE) of 32-bit posits and of 32-bit IEEE 754 floating point. In all cases, the inputs are square matrices with the same random values. These input values are generated from a uniform distribution over intervals of the form [−10^i, 10^i], i ∈ {−1, 0, 1, 2, 3}, resulting in 5 different sets of inputs. These intervals allow for a study of the impact of the input data range on the GEMM.

Require: Float matrices a and b of size n×n.
Ensure: Float matrix c = ab.
for i = 0 to n-1 do
  for j = 0 to n-1 do
    asm("fmv.w.x ft0,zero":::);            {Set ft0 to 0}
    for k = 0 to n-1 do
      asm("flw ft1,0(%0)"                  {Load float a and b}
          "flw ft2,0(%1)"
          "fmadd.s ft0,ft1,ft2,ft0"        {Accumulate on ft0}
          :: "r"(&a[i * n + k]), "r"(&b[k * n + j]) :);
    end for
    asm("fsw ft0,0(%1)"                    {Store the result in c}
        : "=rm"(c[i * n + j]) : "r"(&c[i * n + j]) :);
  end for
end for
Fig. 5. 32-bit floating-point GEMM using the F RISC-V extension.

Require: Posit matrices a and b of size n×n.
Ensure: Posit matrix c = ab.
for i = 0 to n-1 do
  for j = 0 to n-1 do
    asm("qclr.s":::);                      {Clear the quire}
    for k = 0 to n-1 do
      asm("plw pt0,0(%0)"                  {Load posit a and b}
          "plw pt1,0(%1)"
          "qmadd.s pt0,pt1"                {Accumulate on the quire}
          :: "r"(&a[i * n + k]), "r"(&b[k * n + j]) :);
    end for
    asm("qround.s pt2"                     {Round the quire to a posit}
        "psw pt2,0(%1)"                    {Store the result in c}
        : "=rm"(c[i * n + j]) : "r"(&c[i * n + j]) :);
  end for
end for
Fig. 6. Posit GEMM using the Xposit RISC-V extension with the quire accumulator.

These random
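The accuracy benefit of deferring rounding, as the quire does in Fig. 6, can be illustrated without posit hardware. The sketch below is a simplified software analogue, not the paper's benchmark: binary32 floats stand in for posits, and Python's exactly rounded math.fsum stands in for the quire. It contrasts a dot product that rounds after every multiply and add with one that accumulates exactly and rounds once.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python double to the nearest IEEE 754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_step_rounded(a, b):
    """Binary32 rounding after every multiply and every add,
    analogous to the 'no quire / no FMADD' configurations."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(x * y))
    return acc

def dot_deferred(a, b):
    """Accumulate all products exactly and round once at the end,
    the role played in Fig. 6 by qmadd.s into the quire plus qround.s."""
    return to_f32(math.fsum(x * y for x, y in zip(a, b)))

# 1.0 followed by 64 tiny terms: each tiny addend is below half an ulp
# of 1.0 in binary32 (2^-24), so step-by-step rounding drops every one.
a = [1.0] + [2.0 ** -25] * 64
b = [1.0] * len(a)
assert dot_step_rounded(a, b) == 1.0            # all 64 small products lost
assert dot_deferred(a, b) == 1.0 + 2.0 ** -19   # deferred rounding keeps them
```

The adversarial input makes the effect exact rather than statistical; on random inputs the same mechanism produces the gradual error accumulation measured in Table 6.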
values are generated as 64-bit IEEE 754 numbers and then converted to the two other formats with the aid of the SoftPosit [26] library. The MSE results are shown in Table 6 for different matrix sizes and input ranges. Additionally, Figure 7 shows the MSE in the [−1, 1] case. We give slightly more attention to this case since many applications normalize their values. As can be seen, for 256×256 matrices, the difference between MSEs is around four orders of magnitude when using fused operations. This is reduced to two orders of magnitude if the quire is not used. Note that when using floats, the accuracy difference between employing fused FMADD operations or not is minimal. If we compare how the MSE scales when increasing the matrix size, posit numbers present a better behavior thanks to the quire register. This is true in all ranges of input values. Overall, the impact of the quire is significant across all test cases, and its extra cost is justified by the results. These results are in line with our previous work [27], where a similar benchmark was performed using hardware simulations with an input interval of [−2, 2]. The MSE results on 32-bit floats and posits follow the same trends shown in Table 6. When removing the quire, posits still have a lower MSE than floats except in the [−1000, 1000] case. This can be explained by posits' tapered precision. When the numbers' exponents are close to 0, they fall in the so-called "golden zone" of posits [4]. This is the region where posits have more accuracy bits than floats thanks to their variable-length fields. However, when the accumulated values are very large or very small, IEEE floats gain an advantage over posits without quire. In particular, this "golden zone" comprises values roughly in the interval [10^−6, 10^6]. In the test with input values in [−1000, 1000], the absolute value of the final outputs averages 1.2×10^6 in the 16×16 matrix and 4.3×10^6 in the 256×256 case.
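The "golden zone" can be made concrete by counting significand bits. In posit⟨32,2⟩, a value with regime k (covering scales 2^(4k+e), e ∈ {0..3}) spends 1 bit on the sign, k+2 bits (k ≥ 0) or −k+1 bits (k < 0) on the regime, and 2 bits on the exponent, leaving the rest for the fraction. The sketch below is our own illustration of this bookkeeping, not code from the paper, comparing that count against the fixed 23 stored fraction bits of binary32:

```python
def posit32_fraction_bits(k: int) -> int:
    """Fraction bits available in posit<32,2> for regime value k.

    Regime encoding: k >= 0 -> (k+1) ones then a zero (k+2 bits);
    k < 0 -> (-k) zeros then a one (-k+1 bits). es = 2 exponent bits.
    """
    regime_len = k + 2 if k >= 0 else -k + 1
    return max(0, 32 - 1 - regime_len - 2)

FLOAT32_FRACTION_BITS = 23

# Near 1.0 (k = 0, scales 2^0..2^3) posits carry 27 fraction bits,
# four more than binary32: the heart of the golden zone.
assert posit32_fraction_bits(0) == 27

# The advantage shrinks as |k| grows and disappears around scales of
# 2^(+-20), i.e. roughly 10^(+-6), matching the interval quoted above.
assert posit32_fraction_bits(4) == FLOAT32_FRACTION_BITS   # scales 2^16..2^19
assert posit32_fraction_bits(5) < FLOAT32_FRACTION_BITS    # scales 2^20..2^23
assert posit32_fraction_bits(-5) == FLOAT32_FRACTION_BITS  # scales 2^-20..2^-17
assert posit32_fraction_bits(-6) < FLOAT32_FRACTION_BITS   # scales 2^-24..2^-21
```

This is why floats overtake quire-less posits once the accumulated magnitudes climb past roughly 10^6, as observed in the [−1000, 1000] test.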
As a comparison, even in the 256×256 multiplication, the [−100, 100] input range only averages 4.3×10^4.

7.2 Performance

Besides the synthesis data presented in Section 6, execution time is a critical metric to study the hardware performance of posits and floats. The test has been performed by executing the same GEMM and max-pooling benchmarks described previously on PERCIVAL, avoiding cold misses and averaging over 10 executions to obtain more accurate measurements. The range of the input values does not affect performance. Thus, the values shown in Table 7 for GEMM are an average of the timings obtained in the 5 cases described previously, for a total of 50 executions of the GEMM operation. In this case, when using fused MAC operations and the quire, the execution time of 32-bit posits is practically the same as that of single-precision floats for the larger matrix sizes, where the overhead of the extra qround.s instruction becomes negligible (see Figure 6). This instruction is executed on the order of O(n²) times, compared with the O(n³) running time of the algorithm. This cost is noticeable for smaller values of n, where 32-bit posits are slightly slower than 32-bit and 64-bit floats. However, for larger matrix sizes, which are common in scientific applications and Deep Neural Networks (DNNs), 32-bit posits perform equally to 32-bit floats and outperform 64-bit floats, since 64-bit instructions require more clock cycles to compute. Furthermore, as seen in the previous accuracy benchmark, 32-bit posits are orders of magnitude more accurate than 32-bit floats when performing this calculation. Therefore, they provide an alternative solution for the execution of kernels that make use of the dot product. The quire and fused MAC operations have a positive impact on timing performance in all test cases.
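The qround.s overhead argument can be checked with a quick operation count. The model below is our own back-of-the-envelope sketch, not a measurement from the paper: it counts one qround.s per output element of an n×n GEMM against the n fused MACs each element requires.

```python
def qround_fraction(n: int) -> float:
    """Fraction of quire operations that are qround.s in an n x n GEMM:
    n^2 rounds (one per output element) vs. n^3 qmadd.s (one per i,j,k)."""
    macs = n ** 3
    rounds = n ** 2
    return rounds / (macs + rounds)

# The overhead falls as 1/(n+1): visible at n = 16, negligible at n = 256,
# consistent with the timing convergence seen in Table 7.
assert abs(qround_fraction(16) - 1 / 17) < 1e-12
assert qround_fraction(256) < 0.004
```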
Again, this performance increase stems from the extra clock cycles needed for a separate multiplication followed by an addition, compared to a single fused operation. Additionally, for the sake of completeness, we have performed the same GEMM timing test on a commercial core with support for posit arithmetic. RacEr, a GPGPU FPGA provided by VividSparks, supports computation with Posit32 but does not include quire support, so its accuracy results are the same as the Posit32 no-quire case. It has 512 CPUs running at 300 MHz with 32 GB of DDR4 RAM. Table 7 also includes the results of the GEMM benchmark on this platform. As can be seen, our proposal provides significantly faster results than this commercial accelerator.

Regarding the max-pooling layers, three different configurations have been tested, following common DNNs. In LeNet-5, the input of this layer is 28×28×6 and the 2×2 pooling kernel is applied with a stride of 2, creating a 14×14×6 output representation. In AlexNet, the input size is 54×54×96 and the 3×3 kernel is applied with a stride of 2, generating an output of size 26×26×96. Finally, ResNet-50 is the largest configuration we have tested: its input is 112×112×64 and the 3×3 pooling kernel is again applied with a stride of 2, creating a 55×55×64 output representation. The results of executing these layers on PERCIVAL using the 32- and 64-bit IEEE floating-point and Posit32 representations are shown in Table 8. Results show that 32-bit posits perform as fast as 32-bit floats, but without the need for extra hardware, as the posit maximum operation is carried out reusing the integer ALU. Double-precision floats are slower than 32-bit posits and floats by a factor of 1.4-1.7× due to the latency difference in the units, as seen in the GEMM benchmark.

8 CONCLUSIONS

This paper has presented PERCIVAL, an extension of the application-level CVA6 RISC-V core that includes all 32-bit posit instructions as well as the quire fused operations.
These capabilities, integrated into a Posit Arithmetic Unit together with a posit register file, are natively incorporated while preserving IEEE 754 single- and double-precision floats. Furthermore, the RISC-V ISA has been extended with Xposit, which includes support for all posit and quire instructions. This allows the compilation and execution on PERCIVAL of application-level programs that make use of posits and floats simultaneously. To the best of our knowledge, this is the first work that enables complete posit and quire capabilities in hardware. Synthesis results show that half of the area dedicated to the PAU is occupied by the quire and its operations. Compared with the only previous work that includes quire capabilities [19], our proposal consumes slightly less power and requires only 10% more area, while also providing full posit operation support. Focusing on the 32-bit PAU without the quire, our proposal requires 32% more area and 38% more power than the 32-bit FPU. This is in line with the results of recent works that reuse the F RISC-V extension [22], whose authors report a 30% increase in FPGA resources when comparing their PAU to the FPU. The posit vs. IEEE 754 benchmark results show that 32-bit posits are up to 4 orders of magnitude more accurate than 32-bit floats when calculating the GEMM

TABLE 6
GEMM MSE comparison between IEEE 754 floating-point and posit numbers.
Input values   Method             16×16         32×32         64×64         128×128       256×256
[-0.1, 0.1]    IEEE 754           1.385×10^-18  4.429×10^-18  1.523×10^-17  6.347×10^-17  2.407×10^-16
               Posit32            3.157×10^-21  6.110×10^-21  1.158×10^-20  2.014×10^-20  3.497×10^-20
               IEEE 754 no FMADD  1.515×10^-18  4.752×10^-18  1.566×10^-17  6.524×10^-17  2.432×10^-16
               Posit32 no quire   2.146×10^-20  6.726×10^-20  2.371×10^-19  7.805×10^-19  2.203×10^-18
[-1, 1]        IEEE 754           1.490×10^-14  4.251×10^-14  1.602×10^-13  6.019×10^-13  2.361×10^-12
               Posit32            1.138×10^-17  2.355×10^-17  4.729×10^-17  9.430×10^-17  1.937×10^-16
               IEEE 754 no FMADD  1.324×10^-14  4.637×10^-14  1.686×10^-13  6.246×10^-13  2.416×10^-12
               Posit32 no quire   5.028×10^-17  1.727×10^-16  6.457×10^-16  2.447×10^-15  9.870×10^-15
[-10, 10]      IEEE 754           1.371×10^-10  3.998×10^-10  1.581×10^-9   5.922×10^-9   2.378×10^-8
               Posit32            8.549×10^-13  1.475×10^-12  3.055×10^-12  6.355×10^-12  1.295×10^-11
               IEEE 754 no FMADD  1.300×10^-10  4.304×10^-10  1.708×10^-9   6.026×10^-9   2.447×10^-8
               Posit32 no quire   3.878×10^-12  1.341×10^-11  7.500×10^-11  3.282×10^-10  1.41×10^-9
[-100, 100]    IEEE 754           1.412×10^-6   4.206×10^-6   1.544×10^-5   6.402×10^-5   2.405×10^-4
               Posit32            4.819×10^-8   8.266×10^-8   1.760×10^-7   6.150×10^-7   1.506×10^-6
               IEEE 754 no FMADD  1.293×10^-6   5.052×10^-6   1.595×10^-5   6.503×10^-5   2.440×10^-4
               Posit32 no quire   3.077×10^-7   1.230×10^-6   4.295×10^-6   2.804×10^-5   1.569×10^-4
[-1000, 1000]  IEEE 754           1.503×10^-2   3.936×10^-2   1.509×10^-1   6.069×10^-1   2.391
               Posit32            5.293×10^-3   8.573×10^-3   1.900×10^-2   3.746×10^-2   8.265×10^-2
               IEEE 754 no FMADD  1.675×10^-2   4.815×10^-2   1.644×10^-1   6.323×10^-1   2.433
               Posit32 no quire   4.168×10^-2   1.570×10^-1   5.669×10^-1   2.365         9.586

[Bar chart omitted; the plotted values are the [−1, 1] rows of Table 6.]
Fig. 7. MSE results of posits and floats with respect to doubles in the GEMM test with input values in [−1, 1]. Note the logarithmic Y-axis. Blue (green) bars show the results with (without) fused MAC and quire operations.

TABLE 7
GEMM timing comparison between IEEE 754 floating-point and posit numbers.

Matrix size                   16×16     32×32    64×64    128×128  256×256
32-bit float                  0.978 ms  6.58 ms  52.1 ms  1.48 s   13.9 s
64-bit float                  0.920 ms  6.64 ms  69.4 ms  1.74 s   15.0 s
Posit32                       0.949 ms  7.30 ms  57.7 ms  1.48 s   13.9 s
32-bit float no FMADD         1.16 ms   8.69 ms  68.6 ms  1.61 s   15.0 s
64-bit float no FMADD         1.26 ms   9.36 ms  92.6 ms  1.92 s   16.7 s
Posit32 no quire              1.27 ms   9.63 ms  69.1 ms  1.61 s   15.0 s
VividSparks Posit32 no quire  7.95 ms   48.9 ms  345 ms   2.63 s   21.1 s

TABLE 8
Max-pooling timing comparison between IEEE 754 floating-point and posit numbers.

Max-pooling layer       32-bit float  64-bit float  Posit32
LeNet-5 (28×28×6)       0.715 ms      1.211 ms      0.688 ms
AlexNet (54×54×96)      0.115 ms      0.160 ms      0.116 ms
ResNet-50 (112×112×64)  0.337 ms      0.470 ms      0.340 ms

due to the quire. Moreover, they do not show a performance degradation compared with floats, thus providing a potential alternative when operating with real numbers. In addition, our proposal performs significantly better than available commercial solutions, obtaining up to 8× speedup when multiplying small matrices. Some known limitations concern the use of the quire. As it is a single internal register in the PAU, PERCIVAL cannot support parallel accumulation into different independent accumulators. This also prevents safe automatic context switches, as the value of the quire cannot be loaded from or stored to memory. Therefore, this must be taken into account when developing programs for PERCIVAL, so as not to overwrite the value of the quire.
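The hazard behind this limitation can be sketched in software terms. Below, a toy emulated accumulator (our own illustration; the hardware quire is of course not a Python object) shows how two logically independent accumulations sharing the single quire corrupt each other when interleaved, e.g. by a context switch:

```python
class SingleQuire:
    """Toy model of a single shared fused accumulator (not the RTL)."""
    def __init__(self):
        self.acc = 0.0

    def qclr(self):
        """Analogue of qclr.s: clear the one shared accumulator."""
        self.acc = 0.0

    def qmadd(self, x, y):
        """Analogue of qmadd.s: fused multiply-accumulate."""
        self.acc += x * y

    def qround(self):
        """Analogue of qround.s: read out the accumulated value."""
        return self.acc

quire = SingleQuire()

# Two independent dot products, interleaved as a context switch would:
quire.qclr()
quire.qmadd(1.0, 2.0)           # task A accumulates 2.0
quire.qclr()                    # task B starts and clears the shared quire
quire.qmadd(3.0, 4.0)           # task B accumulates 12.0
assert quire.qround() == 12.0   # task B is fine...
# ...but task A's partial sum is gone: with a single quire that cannot
# be saved or restored, such interleaving must be avoided by software.
```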
As future work, we plan to implement and evaluate on PERCIVAL large-scale scientific applications that make use of dot products, leveraging the accuracy gains of fused operations.

ACKNOWLEDGMENTS
This work was supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the BBVA Foundation, whose id is PR2003 20/01, by the EU (FEDER) and the Spanish MINECO under grant RTI2018-093684-B-I00, and by the CM under grant S2018/TCS-4423.

REFERENCES
[1] IEEE Computer Society, "IEEE Standard for Floating-Point Arithmetic," IEEE Std 754-2019 (Revision of IEEE 754-2008), pp. 1–84, Jul. 2019.
[2] J. L. Gustafson and I. T. Yonemoto, "Beating floating point at its own game: Posit arithmetic," Supercomputing Frontiers and Innovations, vol. 4, no. 2, pp. 71–86, Apr. 2017.
[3] A. Guntoro, C. De La Parra, F. Merchant, F. De Dinechin, J. L. Gustafson, M. Langhammer, R. Leupers, and S. Nambiar, "Next Generation Arithmetic for Edge Computing," in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). Grenoble, France: IEEE, Mar. 2020, pp. 1357–1365.
[4] F. de Dinechin, L. Forget, J.-M. Muller, and Y. Uguen, "Posits: The good, the bad and the ugly," in Proceedings of the Conference for Next Generation Arithmetic 2019, ser. CoNGA'19. New York, NY, USA: Association for Computing Machinery, 2019.
[5] A. Waterman, Y. Lee, D. A. Patterson, and K. Asanović, "The RISC-V instruction set manual, volume I: User-level ISA, version 2.0," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2014-54, May 2014.
[6] R. Murillo, A. A. Del Barrio, and G. Botella, "Deep PeNSieve: A deep learning framework based on the posit number system," Digital Signal Processing, vol. 102, p. 102762, Jul. 2020.
[7] G. Raposo, P. Tomás, and N. Roma, "PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2021, pp. 7908–7912.
[8] H. F.
Langroudi, V. Karia, Z. Carmichael, A. Zyarah, T. Pandit, J. L. Gustafson, and D. Kudithipudi, "ALPS: Adaptive Quantization of Deep Neural Networks with GeneraLized PositS," in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, TN, USA: IEEE, Jun. 2021, pp. 3094–3103.
[9] A. Dörflinger, M. Albers, B. Kleinbeck, Y. Guan, H. Michalik, R. Klink, C. Blochwitz, A. Nechi, and M. Berekovic, "A comparative survey of open-source application-class RISC-V processor implementations," in Proceedings of the 18th ACM International Conference on Computing Frontiers, ser. CF '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 12–20.
[10] F. Zaruba and L. Benini, "The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 11, pp. 2629–2640, Nov. 2019.
[11] R. Murillo, A. A. Del Barrio Garcia, G. Botella, M. S. Kim, H. Kim, and N. Bagherzadeh, "PLAM: A Posit Logarithm-Approximate Multiplier," IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2021.
[12] Posit Working Group, "Posit Standard Documentation Release 4.12-draft," Jul. 2021. [Online]. Available: https://posithub.org/posit_standard4.12.pdf
[13] R. Murillo, D. Mallasén, A. A. Del Barrio, and G. Botella, "Comparing Different Decodings for Posit Arithmetic," in Conference on Next Generation Arithmetic (CoNGA), 2022.
[14] J. L. Gustafson, "RISC-V Proposed Extension for 32-bit Posits," https://posithub.org/docs/RISC-V/RISC-V.htm, Jun. 2018.
[15] S. Mach, F. Schuiki, F. Zaruba, and L. Benini, "FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 4, pp. 774–787, Apr. 2021.
[16] R. Chaurasiya, J. Gustafson, R. Shrestha, J. Neudorfer, S.
Nambiar, K. Niyogi, F. Merchant, and R. Leupers, "Parameterized Posit Arithmetic Hardware Generator," in 2018 IEEE 36th International Conference on Computer Design (ICCD), Oct. 2018, pp. 334–341.
[17] M. K. Jaiswal and H. K.-H. So, "PACoGen: A Hardware Posit Arithmetic Core Generator," IEEE Access, vol. 7, pp. 74586–74601, 2019.
[18] R. Murillo, A. A. Del Barrio, and G. Botella, "Customized Posit Adders and Multipliers using the FloPoCo Core Generator," in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Oct. 2020, pp. 1–5.
[19] N. Sharma, R. Jain, M. Mohan, S. Patkar, R. Leupers, N. Rishiyur, and F. Merchant, "CLARINET: A RISC-V Based Framework for Posit Arithmetic Empiricism," arXiv:2006.00364 [cs], Oct. 2021.
[20] M. V. Arunkumar, S. G. Bhairathi, and H. G. Hayatnagarkar, "PERC: Posit Enhanced Rocket Chip," in 4th Workshop on Computer Architecture Research with RISC-V (CARRV'20), 2020, p. 8.
[21] S. Tiwari, N. Gala, C. Rebeiro, and V. Kamakoti, "PERI: A Configurable Posit Enabled RISC-V Core," ACM Transactions on Architecture and Code Optimization, vol. 18, no. 3, pp. 1–26, Jun. 2021.
[22] S. D. Ciocirlan, D. Loghin, L. Ramapantulu, N. Tapus, and Y. M. Teo, "The Accuracy and Efficiency of Posit Arithmetic," arXiv:2109.08225 [cs], Sep. 2021.
[23] M. Cococcioni, F. Rossi, E. Ruffaldi, and S. Saponara, "A Lightweight Posit Processing Unit for RISC-V Processors in Deep Neural Network Applications," IEEE Transactions on Emerging Topics in Computing, no. 01, pp. 1–1, Oct. 2021.
[24] C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis & transformation," in International Symposium on Code Generation and Optimization, 2004. CGO 2004., Mar. 2004, pp. 75–86.
[25] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213," Dec. 2019. [Online]. Available: https://riscv.org/technical/specifications/
[26] S. H. Leong, "SoftPosit," Mar. 2020. [Online].
Available: https://gitlab.com/cerlane/SoftPosit
[27] R. Murillo, D. Mallasén, A. A. Del Barrio, and G. Botella, "Energy-Efficient MAC Units for Fused Posit Arithmetic," in 2021 IEEE 39th International Conference on Computer Design (ICCD), Oct. 2021, pp. 138–145.

David Mallasén David Mallasén Quintana received a BSc Degree in Computer Science and a BSc Degree in Mathematics in 2020 from the Complutense University of Madrid (UCM). From 2020 to 2022 he completed an MSc Degree in Embedded Systems at KTH Royal Institute of Technology, specializing in embedded platforms. He is currently pursuing a Ph.D. in Computer Engineering at UCM. His main research areas include computer arithmetic, computer architecture, embedded systems, and high-performance computing.

Raul Murillo Raul Murillo studied Mathematics and Computer Science at Complutense University of Madrid (UCM), Spain, where he also received an MSc Degree in Computer Science in 2021. His main research interests include Approximate Computing, new Computer Arithmetic, and Deep Neural Networks (DNNs). He is currently pursuing a Ph.D. at UCM related to these areas.

Alberto A. Del Barrio Alberto A. Del Barrio received the Ph.D. degree in Computer Science from the Complutense University of Madrid (UCM), Madrid, Spain, in 2011. He has performed stays at Northwestern University, the University of California at Irvine and the University of California at Los Angeles. Since 2021, he has been an Associate Professor (tenure-track, civil servant) of Computer Science with the Department of Computer Architecture and System Engineering, UCM. His main research interests include Design Automation, Arithmetic and their application to the field of Artificial Intelligence. He is leading the PARNASO project, funded by the Leonardo Grants program of Fundación BBVA. Its main objective is to natively integrate the posit format in a hardware/software platform.
He has been an IEEE Senior Member since 2019 and an ACM Senior Member since December 2020.

Guillermo Botella Guillermo Botella received the M.A.Sc. degree in Physics (Fundamental) in 1998, the M.A.Sc. degree in Electronic Engineering in 2001, and the Ph.D. degree in Computer Engineering in 2007, all from the University of Granada, Spain. He was a research fellow funded by the EU working at the University of Granada, Spain, and at the Vision Research Laboratory of University College London, UK. He then joined the Department of Computer Architecture and Automation of Complutense University of Madrid, Spain, as an Assistant Professor, where he is currently an Associate Professor. From 2008 to 2012 he performed research stays, also acting as visiting professor, at the Department of Electrical and Computer Engineering, Florida State University, Tallahassee, USA. His current research interests include Image and Video Processing for VLSI, FPGAs, GPGPUs, Embedded Systems, and novel computing paradigms such as analog and quantum computing. He has been an IEEE Senior Member since 2019.

Luis Piñuel Luis Piñuel is an Associate Professor in the Department of Computer Architecture and Systems Engineering at the Universidad Complutense de Madrid (UCM), Spain. He received his M.Sc. and Ph.D. degrees in Computer Science from UCM in 1996 and 2003, respectively. His research interests include computer architecture, high-performance computing, embedded systems, and resource management for emerging computing systems. In these fields, he has co-authored more than 70 publications in prestigious journals and international conferences, as well as several book chapters, and he has advised or co-advised 5 PhD dissertations. Committed to improving knowledge transfer between research institutions and industry, he has directed more than 12 research contracts with different companies (Texas Instruments, Imagination Technologies, Indra, ...).
He has also served as an evaluator for several national agencies and has been a member of the Board of Directors of the Spanish Computer Architecture Society (SARTECO).

Manuel Prieto-Matias Manuel Prieto Matias obtained a Ph.D. degree from Complutense University of Madrid (UCM) in 2000. Since 2002, he has been a Professor at the Department of Computer Architecture at UCM, and a Full Professor since 2019. His research interests include high-performance computing, non-volatile memory technologies, accelerators, and code generation and optimization. His current focus is on effectively managing resources on emerging computing platforms, emphasizing the interaction between the system software and the underlying architecture. He has co-authored over 100 scientific publications in journals and conferences on parallel computing and computer architecture. He is a member of the ACM.