IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009 2091

A Methodology for Automatic Insertion of Selective
TMR in Digital Circuits Affected by SEUs

O. Ruano, J. A. Maestro, Member, IEEE, and P. Reviriego, Member, IEEE

Abstract—In this paper, a methodology to perform automatic se-
lective TMR insertion on digital circuits is presented, having as a
constraint the required reliability level. Such reliability is guar-
anteed while reducing the area compared to TMR. In addition,
a performance enhancement is proposed in order to guarantee a
computation time feasible for this automatic selective TMR inser-
tion methodology. It focuses on the choice of a starting point close
enough to an optimal solution. The method consists in the analysis
of the topological features of the target circuit which will help the
optimization engine to identify those flip-flops more susceptible to
be tripled depending on the showed sensitivity to SEUs.

Index Terms—Fault injection, optimization, single event upsets
(SEUs), triple modular redundancy (TMR).

I. INTRODUCTION

F AULT tolerance on semiconductor devices has been a
meaningful matter since upsets were first experienced

in space applications several years ago. Integrated circuits
operating in the space environment can be upset by charged
particles that generate errors in the system. Therefore, the
interest in studying fault-tolerant techniques in order to keep
integrated circuits operational has increased. In order to
guarantee the circuit reliability against Single Event Upsets
(SEU), some mitigation techniques have been proposed in
literature during the last few years. These techniques cover a
wide range of methods which can be classified as fabrication
process-based, design-based and recovery techniques. The
fabrication process-based, such as Epitaxial CMOS processes
[1] and Silicon-on-Insulator (SOI) [2], can reduce the radia-
tion effects of Total Ionization Dose (TID) and Single Event
Latch-up (SEL). However, they do not eliminate SEUs. The
design-based techniques, also called architectural techniques,
are based on logic redundancy such as Triple Modular Re-
dundancy (TMR) [3], Error detection and correction coding
(EDAC) like Hamming [4] or hardened memory cells like HIT
and DICE [5]. Finally, new techniques based on recovery have
been proposed for SRAM-Based FPGAs. The main idea is to
recover the original programmed information after an upset
[6]. Among all these methods, the well-known TMR has be-
come a common practice. In this case, the SEU sensitive logic
(memory cell) is tripled and voters are placed at the outputs to

Manuscript received September 05, 2008; revised January 09, 2009. Current
version published August 12, 2009. This work was supported by the Spanish
Ministry of Science and Education under Grant ESP-2006-04163, the Regional
Government of Madrid and the European Union FEDER programme.

The authors are with the Universidad Antonio de Nebrija, C/Pirineos, 55
E-28040 Madrid, Spain (e-mail: oruano@nebrija.es; jmaestro@nebrija.es; pre-
virie@nebrija.es).

Digital Object Identifier 10.1109/TNS.2009.2014563

Fig. 1. TMR memory cell with single voter.

identify the correct value (Fig. 1). In [7], the TMR technique
is shown to be a suitable solution to provide reliability against
SEUs, for an 8051 microcontroller. The fault injection results
showed that less than 1% of the bit upsets could provoke an
error in the output of the TMR design. However, the TMR
technique presents some overheads because of its full hardware
redundancy, such as area and power dissipation.

As a result, both area and power consumption are increased,
so many applications cannot accept this high cost.

In order to make the TMR technique valid for many applica-
tions, a heuristic method called selective TMR is proposed in
[8], [9] to make a selective insertion of TMR in the nodes that
present a bigger vulnerability to SEUs. This proposed method
is based on the logic masking analysis of each node. In [10],
[11], this technique is improved mixing the logic masking anal-
ysis with an innovative study called timing masking. It consists
in computing the error latching window and its aim is to have
equal delays to outputs from each gate in the circuit through gate
relocation, in order to achieve a better solution. Another tech-
nique for soft error mitigation is presented in [12]. It is based
on a selective addition of redundant wires that can prevent the
distorted signal from propagating to an output or a storage el-
ement. In [13], partial TMR is applied to the circuit sections
that affect structures which cause a persistent error. The rest of
works found in literature propose new fault tolerant techniques
that try to minimize the area cost. This is the case of the DWC
technique introduced in [14] that combines time and hardware
redundancy reducing the area compared to TMR. Other exam-
ples can be found in [15] and [16]. Focusing on the idea of the
selective TMR, and on systems that can allow a fixed error rate,
an analysis of an innovative method for selective TMR inser-
tion is presented in this paper, which provides an automatic se-
lective insertion based on an iterative optimization method. The
problem is addressed as an optimization task where two vari-
ables must keep a balance: the reliability level demanded by
the user must be assured in the final design, while, on the other

0018-9499/$26.00 © 2009 IEEE
Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


2092 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009

Fig. 2. Methodology workflow.

hand, the area cost due to the modular redundancy must be min-
imized. This paper is organized as follows. Section II introduces
the proposed methodology for TMR optimization and explains
in detail the major factors that are involved. In Section III, a
tri-state pipeline stage is used as a case study where the method-
ology is validated by fault injection experiments in VHDL. In
Section IV, the methodology is supplemented by an innovative
static analysis process which enhances the performance of the
optimization process. In Section V, the different criteria are an-
alyzed experimentally via fault injection. Section VI shows in
detail the previous case study with the enhancement integrated
in the method. Final remarks and future work are placed in Sec-
tion VII, followed by the references.

II. PROPOSED METHODOLOGY

Currently, the design of radiation tolerant circuits is very ex-
pensive in terms of area and power consumption if the Triple
Modular Redundancy (TMR) technique is applied. Therefore,
this work proposes a selective TMR insertion adapted to the
fault tolerance level required by the application. Although TMR
clearly provides a superior protection level, not all the applica-
tions demand the highest fault tolerance level. In some cases,
an intermediate protection could be enough, what makes TMR
excessive in terms of area cost. The proposed methodology is
based on a combinatorial optimization problem which makes
use of the reliability constraints that the target circuit has to
meet. The issue is addressed like a graph partitioning problem
which is included in the category of NP-complete problems. It
consists in partitioning a given initial graph which includes all
registers of the initial circuit, in two sub-graphs. One of them,
stores the registers that the system has chosen to protect with
TMR, and the second stores the remaining registers without

TMR. The purpose of this methodology is to find the best par-
tition that assures the minimum possible area in terms of the
number of registers on which TMR is inserted, while meeting
the reliability constrains specified for the circuit. Therefore, the
optimization process looks for a solution among a finite set of
alternative results which minimizes the area and assures the re-
quired reliability. The methodology workflow defined to drive
the process is described in Fig. 2.

The operational method is detailed next. First of all, data
inputs required by the methodology must include an original
VHDL description, a random initial partition of the registers
grouped into two sub-graphs and the fault tolerance constraints
required by the application. Once these data are available, the
methodology is able to obtain a valid partition which can sat-
isfy the specified reliability constraints. The first step consists
in measuring the fault tolerance for the random initial partition
in order to evaluate it and see if it is a valid solution or not. In
order to achieve this objective, the process is detailed next. The
original VHDL and the set of registers with TMR are given to
the Selective TMR insertion module in order to insert TMR on
the appropriate registers. Next, the modified VHDL is used to
perform a fault injection campaign using the SEU Simulation
Tool [17], [18]. The fault tolerance level of the modified VHDL
is reported by the fault injection results through software fault
injection [19]. These results are sent to the optimization engine
module which evaluates them against the reliability constraints
required by the application. Once the initial partition has been
measured, the above process is repeated with a new partition of
the registers selected by the optimization engine, which man-
ages the remaining possible partitions through an optimization
algorithm. For each new partition driven by the optimization al-
gorithm, the Selective TMR Insertion module modifies the orig-
inal VHDL and a new fault injection is run to evaluate the new
tolerance. The process is finished when the optimization engine

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


RUANO et al.: A METHODOLOGY FOR AUTOMATIC INSERTION OF SELECTIVE TMR IN DIGITAL CIRCUITS AFFECTED BY SEUS 2093

cannot find a new partition with less area cost that satisfies the
demanded reliability. In the following, some issues which can
have impact in the proposed methodology are discussed.

A. Optimization Algorithm

In order to guide the successive movements and the several
partitions which are tested through the fault injection process,
an iterative algorithm has been selected to perform the exper-
imental results. Fiduccia-Mattheyses [20] is the meta-heuristic
algorithm which has been chosen due to its widely use in liter-
ature. A final solution is achieved by moving one register at a
time, for one graph of the partition to the other, updating the par-
tition in each step. Another feature of this algorithm is that it re-
duces the chance that the minimization process becomes trapped
at local minima (hill-climbing) [20]. In an attempt to reduce the
number of registers with TMR while satisfying the reliability
constraints, it drives the intermediate partitions according to a
cost function which will determine if each partition can satisfy
the reliability constraints or not.

B. Cost Function

The cost function must drive the algorithm in order
to accept or deny each movement that is proposed by
Fiduccia-Mattheyses in the search space. This cost func-
tion is included in the minimization problem in the following
way.

— Given: a function .
— Sought: an element in such that for all

in .
The domain of is called the search space, while the ele-

ments of are called candidate solutions or valid solutions. A
valid solution that minimizes the objective function is called an
optimal solution. With these premises, the selective TMR inser-
tion issue can be characterized in the following way.

— represents all possible partitions which can be generated
from the whole set of registers that compose the design
under test.

— represents a valid solution, i.e., a partition composed
by two sub-graphs in which the first contains the selected
registers to protect with TMR and the second the rest of
registers without TMR that meets the specified reliability
constraints.

All valid solutions must satisfy a specified set of constraints. In
our case, a set of reliability constraints on the circuit outputs
are used. Consequently, in order to measure the quality of a so-
lution according to the parameters of the problem (number of
registers, reliability constraints and real error rate obtained by
the fault injection), the next cost function has been defined so
that combines all these parameters

# #

#

#

where #registers is the total number of register that com-
poses the currently partition, the number of errors

detected in the output through the fault injection process,
is the error threshold that must be

satisfied by each output and finally the coefficients, which
represent the influence of each output. Currently, the cost
function takes no account of the overhead of the majority voters
in an explicit way because this factor is directly proportional to
the number of tripled registers.

C. Reliability Constraints

Reliability constraints mentioned previously
, are defined as the tolerance

level which must be met by a valid solution in any case.
These constraints are measured as the maximum percentage
of wrong cycles allowed in each output of the target design.
The optimization engine and more specifically the F cost
function requires these parameters in order to accept or deny
the movement of each register. These factors must be specified
by the user as the tolerance thresholds which must be met
by each output of the circuit and therefore, their choice is
considered key for the optimization process. Depending on
these constraints, the circuit can tolerate more or less failures,
and as a result it will have a direct effect on the insertion of
selective TMR.

D. Optimization Performance

It is known that the runtime required to obtain a valid solu-
tion, for the chosen iterative algorithm, increases with the size
of the problem. In some cases, the computation time can be so
high that the heuristic method is not able to obtain accurate so-
lutions in reasonable runtimes (too many registers). According
to this issue, it is detected that the initial partition has a
major influence to reduce the execution time of the method as it
will be shown in the experimental results of the next section. In
this case, a previous knowledge or a static analysis which reports
information of possible weak points of the circuit (Section IV)
can improve the runtime performance. This would reduce the
number of iterations, initiating the process from a selected ini-
tial partition close to a valid solution instead of a random ini-
tial partition. Also, from this information, the cost function can
be customized through the coefficients in order to prioritize
those weak points identified as sensitive to SEUs, converging
earlier to a solution. Finally, other kinds of algorithms can be
used in order to make a performance comparison, like a greedy
heuristic. Moreover, two heuristics can be studied to address
the following two sources of complexity. The first one would
consist of avoiding exhaustive simulation of all states combina-
tions using sampling and the second one would be referred to
reduce the search space, considering the addition of TMR on
those flip-flops which reach a primary output with a high prob-
ability [21], [22].

For this work, all these points have been taken into account in
the next section in order to present experimental results which
optimize the insertion of TMR in a selective way ( , reli-
ability constraints and coefficients).

III. CASE STUDY

In order to verify the proposed methodology, a case study is
analyzed in this section. The circuit under test computes the sum

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


2094 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009

Fig. 3. Tri-state pipeline stage.

TABLE I
AUTOMATIC INSERTION OF SELECTIVE TMR FOR AREA EFFICIENCY

of three 8-bit data buses , and , that is to say
, and and supplies the results on a single 9-bit output

bus in three consecutive clock cycles. Also, an extra output con-
trol signal (OutDataReady) should be active for one clock cycle
to indicate when the three summed outputs are available. The
proposed design (54 flip-flops) is shown in Fig. 3 [23].

The proposed methodology has been applied to this circuit
in order to obtain results of the automatic insertion of selective
TMR. Experimental results are collected in Table I, where three
different scenarios regarding the requested tolerance (0, 0, 5, and
1 per cent respectively) have been proposed in order to show the

area decrease compared to TMR. Also, for each reliability sce-
nario, three different tests have been proposed where the initial
partitions are changed in order to study how the method perfor-
mance can vary in terms of the explored solutions. The relia-
bility constraint has been considered only for the Y output as all
registers of the system have effect on it, included those which
determine the value of the OutDataReady output. All tests have
run simulations of 100 000 clock cycles injecting random cam-
paigns of 10 000 SEUs with a minimum separation of 3 cycles
between two consecutive ones. These results allow a compar-
ison with the area cost of the standard TMR approach (Table I).

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


RUANO et al.: A METHODOLOGY FOR AUTOMATIC INSERTION OF SELECTIVE TMR IN DIGITAL CIRCUITS AFFECTED BY SEUS 2095

With these results, it can be noticed that in the cases where
the application does not demand the highest fault tolerance level
(for example 0,5% or 1%), an intermediate protection could be
enough to satisfy the reliability constraints, reducing the area
cost in terms of flip-flops (29.6% and 40.7% respectively) com-
pared to the standard TMR approach.

On the other hand, an interesting result offered by the method-
ology consists in the detection of critical nodes which show
a bigger vulnerability to SEUs like in the case of the control
flip-flops . In all tests,
these registers have been tripled with a high priority in order
to achieve the fault tolerance needed. In this way, the
illustrated in Table I for tolerances of 0, 5, and 1% has in-
cluded in all cases these control registers in the TMR sub-graph,
and in second place the and Y registers were also
added. Finally, it also should be noticed that the initial parti-
tion is a significant factor in order to improve the performance
of the process. In this way, an innovative analysis based on the
topology of the circuit will complete the optimization method-
ology in the next section, in order to select the best initial par-
tition. This will reduce the number of searches required by the
engine. Table I emphasizes that when beginning from a partition
near to the final solution, a small number of searches is required.
For example, in the case of 0% tolerance in order to achieve the
same (54-0), starting from a partition without any register
tripled (0-54) 103 explorations are required while using a better
partition, like (26-28) or the best (54-0), 58 and 12 are needed
respectively, improving the performance considerably.

These results can be contrasted with an analysis of the cir-
cuit. As it has been said, in all final partitions the control flip-
flops have been included. This means that these flip-flops are
considered more sensitive to SEUs by the process. An SEU
in these flip-flops can alter significantly the good behavior of
the circuit being able to propagate the error during three cycles
in the worst case (SEU on InDataReady flip-flop). The right
functionality is only recovered when the SEU goes out of the
delay line of the control path . On the other hand,

, and data registers are con-
sidered less sensitive because they refresh the useful data each
cycle and also are used of the cycles, so an SEU can be ob-
served at the output or it cannot affect the output value. So, these
registers usually have not been tripled by the system due to the
low probability of observing an SEU in the Y output. Neverthe-
less, an SEU in the remaining registers and
will be observed in all cases but only during one cycle. With this
analysis, it has been proved why the methodology firstly tripli-
cates control registers, secondly and Y, and finally, if
it does not satisfy the reliability constraints yet, the rest of reg-
isters.

IV. PERFORMANCE ENHANCEMENT: SELECTION OF THE

INITIAL PARTITION

In the previous section, it could be noticed the importance of a
good initial partition for a successful optimization process. This
factor is used as an input for the optimization engine and it has
been demonstrated that it has a direct influence on the perfor-
mance of the methodology. Depending on the initial partition,
the optimization process can reach the best solution exploring

more or less partial solutions, as it is shown in Table I. So, as the
computational time depends on the circuit size, this can cause
an unacceptable computational time. In order to mitigate this
weakness, an enhancement will be presented and studied in an
experimental way, validating it and later integrating it into the
workflow, as it is shown in Fig. 4.

In this figure, a new module has been included to improve the
performance. It drives the selection process in order to choose
the best starting point for the optimization engine. This method
is based on an intuitive topological analysis of the target circuit,
which is able to improve the computational time. The output of
this module is a suggested initial partition which is supposed to
be close enough to the final solution so that the computation time
is feasible, reducing the effort of the optimization algorithm. In
Fig. 5, this idea is illustrated. Given a search space, the goal of
the proposed enhancement is not to look for the optimal solu-
tion (S) (this is the task of the optimization engine), but an ap-
proximation (D) which can reduce the number of steps needed
to get the solution. With this, it is possible to partly reduce the
dependence between the computational time and the size of the
circuit.

As it has been said above, the way to obtain an initial partition
close to the optimal solution is based on the topological analysis
of the flip-flops in the circuit. In order to do this, some topo-
logical criteria have been chosen to study how they can influ-
ence the correct behavior of the circuit in the presence of SEUs.
These criteria have been analyzed experimentally for several cir-
cuit samples, allowing to draw conclusions about their relation
to SEU sensitiveness. In other words, by examining these cri-
teria, it will be possible to identify those flip-flops which are
‘hot spots’ against SEUs. Therefore, the proposed approach al-
lows labeling this group of flip-flops as candidates to be part of
Graph 1 of the start partition (tripled registers) (Fig. 4).

In order to characterize each flip-flop according to the criteria,
the circuit is considered as a finite directed graph on vertices
(flip-flops) where its adjacency matrix is used to represent the
connectivity of the different nodes (Fig. 6) [24].

The adjacency matrix is the matrix where each no-di-
agonal entry is a direct connection from flip-flop to flip-flop
, and each diagonal entry is the feedback or loop at flip-flop
. In the following, a set of topological criteria are chosen, al-

lowing the characterization of each flip-flop. They have been
described and finally analyzed, for a wide space of circuit sam-
ples, which allow estimating their relevance in a statistical way.

A. FAN-IN Input Parameter

This parameter refers to the number of flip-flops which have
influence on the state of the analyzed flip-flop. The main reason
to use this characteristic is the fact that the greater the number
of inputs which drive the flip-flop state is, the less likely that an
SEU can be stored in it. This is because the effect of the distorted
value can be masked by the rest of inputs. On the other hand, if
there is only one input that drives the target flip-flop and it is
distorted, the SEU will be stored always. So, depending on the
number of inputs of the implemented function, the flip-flop can
be more or less tolerant to SEUs [8].

Algorithm 1 is implemented in order to measure this attribute
on a given topology.

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


2096 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009

Fig. 4. Methodology workflow with the added enhancement.

Algorithm 1: input criterion (Adjacency Matrix)

Function: input criterion (Adjacency Matrix)
Input: Adjacency Matrix,

Foreach (R) do

Return C1

B. FAN-OUT Output Parameter

This parameter refers to the number of flip-flops which are af-
fected directly by the state of each analyzed flip-flop. The main
reason to decide on this characteristic is because an SEU in a
flip-flop which propagates its state to several paths has a high
probability of affecting any design output, and therefore its ob-
servability is increased. Algorithm 2 is implemented in order to
measure this attribute:

Algorithm 2: output criterion (Adjacency Matrix)

Function: output criterion (Adjacency Matrix)

Input: Adjacency Matrix,

Foreach (R) do

(Find

Return C2

C. Influence on Output Paths

This parameter refers to the number of primary outputs which
are affected by the analyzed flip-flop. The main reason to take
this issue into account, is the fact that an SEU in a flip-flop be-
longing to more than one output path, has a higher probability to
propagate the error to at least one of these outputs, increasing its
observability. Algorithm 3 is implemented in order to measure
this attribute, using the Dijkstra’s algorithm, which is often used
in computer networks [25], [26]. It is a graph search algorithm
that solves the single-source shortest path problem for a graph.
In this case Dijkstra’s algorithm is used to find the number of
outputs which are linked to each evaluated flip-flop. When Dijk-
stra returns a path between the flip-flop and the output, it means
that there is connectivity.

Algorithm 3: Influence output criterion (Adjacency
Matrix)

Function: Influence output criterion (Adjacency Matrix)
Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


RUANO et al.: A METHODOLOGY FOR AUTOMATIC INSERTION OF SELECTIVE TMR IN DIGITAL CIRCUITS AFFECTED BY SEUS 2097

Fig. 5. Selecting the initial partition: partition (A) All TMR partition; (B) None TMR partition; (C) Random partition; (D) Based on topological analysis; S)
Optimal Solution.

Fig. 6. Adjacency matrix example.

Input: Adjacency Matrix,

Foreach (R) do

Foreach (O) do

If (dijkstra

Return C3

D. Proximity to the Outputs

This parameter refers to the proximity of the evaluated
flip-flop to the outputs which are affected by it. The main
reason to take this criterion into account is the fact that de-
pending on the distance to the output, an SEU can have a lower
or higher probability to be masked. It is due to factors like
masking logic, timing logic and so on [10], [11]. An SEU in a
flip-flop with a greater distance to the output is more likely to
be masked. Algorithm 4 is implemented in order to measure
this attribute, using Dijkstra’s algorithm. In this case, it is used
to find the shortest path between each evaluated flip-flop and
all other circuit outputs affected by them.

Algorithm 4: proximity output criterion (Adjacency
Matrix)

Function: proximity output criterion (Adjacency Matrix)

Input: Adjacency Matrix,

Foreach (R) do

Foreach (O) do

// take the minimum distance among all affected
outputs

Return C4

E. Feedback Loops

This parameter refers to the situation in which the output
signal of a flip-flop is passed (fed back) to its input, directly
or through several previous stages. This is often used to control
the dynamic behavior of the system. The main reason to take
this criterion into account is the fact that an SEU in this type
of flip-flop (with feedback) can be stored in the loop, making
the effect of the SEU permanent. Algorithm 5 is implemented
in order to measure this attribute, using the same Dijkstra al-
gorithm. In this case, the direct outputs of the target flip-flop
are identified and from each one, Dijkstra algorithm looks for
the shortest path to the target flip-flop. If this path is found, it
means that there is a loop.

Algorithm 5: feedback criterion (Adjacency Matrix)

Function: feedback criterion (Adjacency Matrix)

Input: Adjacency Matrix,

Foreach (R) do

Foreach do
Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


2098 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009

Fig. 7. Criteria Summary. Dark grey node: More sensitive FF, Light grey node: Less sensitive FF. (A) Input parameter. (B) Output parameter. (C) Influence on
output paths.(D) Proximity to the output. (E) Feedback loop.

// Count the feedback paths

If

Return C5

In Fig. 7, a summary of the criteria which have been intro-
duced in this section is shown. In each example, the expected
behavior of the highlighted target flip-flops is estimated against
the injection of SEUs through a color code. Dark grey nodes are
expected to be more sensitive than light grey nodes according

to the explained criteria. These criteria will be analyzed ex-
perimentally through fault injection in the following section, in
order to validate the expected behaviors.

Finally, these algorithms have been implemented in order
to obtain, in an automatic way, a criteria matrix, where each
flip-flop is quantified for each of these criteria.

V. EXPERIMENTAL ANALYSIS OF THE CRITERIA

In this section, the behavior of each criterion is verified exper-
imentally, in order to guarantee the explained logic principles of
the previous section. A set of specific circuits have been used to
make the fault injection and prove the expected behavior. These
circuits have been selected in order to measure the influence of
each criterion independently (Fig. 7). Next, the obtained results
achieved through fault injection are shown in Figs. 8 to 12. The
results illustrate the number of error cycles observed at the out-
puts (Y axis) after having injected 1000 SEUs per register.

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


RUANO et al.: A METHODOLOGY FOR AUTOMATIC INSERTION OF SELECTIVE TMR IN DIGITAL CIRCUITS AFFECTED BY SEUS 2099

Fig. 8. Experimental analysis of FAN-IN.

Fig. 9. Experimental analysis of FAN-OUT.

All these graphics represent the value of the associated
criterion versus the measured error rate (wrong cycles/injected
SEUs). With these results, it can be noticed that in all cases
the behavior of the criteria corresponds with the expected one,
explained in the previous section.

For the first criterion (fan-in), the trend is to decrease the error
rate when the number of inputs that drive the flip-flop increases.
In this case, it can be concluded that a flip-flop with a higher
fan-in should not be tripled.

For criteria 2 and 3, the trend is to increase the error rate ac-
cording to the increment of the number of paths/outputs respec-
tively. In this case, it can be concluded that a flip-flop with a
larger fan-out is susceptible to be tripled.

For criterion 4, the experimental results meet the expected
behavior too. In this case, it can be observed that depending on
the distance of the injected flip-flop to one primary output, the
measured error rate increases. This is due to the filter capacity

of the masking logic. In this case, it can be concluded that a
flip-flop which is far from a primary output is not convenient
to be tripled. Finally for the last criterion (feedback loop), the
experimental results show a bigger vulnerability for those flip-
flops that have larger feedback loops against others with smaller
loops or the simple case without any loop. So it can be concluded
that a flip-flop with feedback loops is more convenient to be
tripled, depending on the size of the loop. As a result, taking
into account the knowledge of the behavior against SEUs for
all these common topological features, they can be used in an
automatic way to propose an initial partition which is able to
reduce the search space, thus improving the efficiency of the
methodology.

VI. QUALITY ANALYSIS OF THE ENHANCED TECHNIQUE

In order to verify the proposed enhancement, the same case
study of Section III, is analyzed adding the criteria analysis. The

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


2100 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009

Fig. 10. Experimental analysis of the influence on output paths.

Fig. 11. Experimental analysis of the proximity to the output.

objective is to prove how the selection of a good initial partition,
based on the mentioned criteria, can improve the efficiency of
the optimization method. Now, for each reliability scenario pro-
posed in Table I (0%, 0,5%, 1%) a similar test has been per-
formed, where the only difference is the initial partition which
is chosen according to the previous criteria set. To accomplish
this, the algorithm set introduced in Section IV has been imple-
mented in order to obtain a criteria matrix where the values of
the criteria for each flip-flop are calculated. On this criteria ma-
trix, once all flip-flops have been characterized, an algorithm has
been applied in order to select those flip-flops which meet the
next requirement. Any flip-flop is considered sensitive to SEUs
if

— at least one of the criteria C2, C3 or C5 has the maximum
value;

— the value of criterion C4 is high and has to affect at least a
high-priority output;

— the value of criterion C1 is not high.
Applying these constraints, the following initial partition has
been chosen by the automatic algorithm

With TMR

Without TMR .

Next, this initial partition will be justified following the pre-
vious criteria. For A hold, Indataready delay, Enb, and Enc, the
reason is because they have the maximum fan-out value. In ad-
dition, Indataready delay, Enb, and Enc affect the two primary
outputs of the system, and therefore they are triplicated. For
Ena delay, M delay and Out delay, the proximity to the primary

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


RUANO et al.: A METHODOLOGY FOR AUTOMATIC INSERTION OF SELECTIVE TMR IN DIGITAL CIRCUITS AFFECTED BY SEUS 2101

Fig. 12. Experimental analysis of the feedback loop.

TABLE II
COMPARISON OF PARTIAL SOLUTIONS EXPLORED WITH OR WITHOUT ENHANCEMENT

output has been useful. Ena delay is selected because its cost of
triplication is lower than A delay, which has the same value for
criterion C4, but it is formed by more flip-flops. The rest of reg-
isters make up the no-TMR sub-graph. For example Ena has a
higher fan-in so the first criterion (fan-in) makes it unnecessary
to be tripled.

Finally, this initial partition has been used to perform the op-
timization process again on the case study. In Table II the ex-
perimental results with this new partition (29–25) are shown, in
order to compare the use of the topological criteria versus the
use of trivial initial partitions.

With these results, it can be concluded that this method-
ology helps the optimization engine to improve its efficiency,
obtaining the same results in terms of cost as that in Table I
(final partition). The only case where the technique does not
improve the performance is the specific situation where 0% of
wrong cycles are required. In this case, it is clear that starting
with all registers in TMR would be faster. Nevertheless, if a
certain degree of fault tolerance is allowed, which is the usual
situation in real applications, using the topological criteria to
find a good initial partitioning is the most convenient approach.
For this example, the search space is rather small and therefore
performance has been improved by a small margin. But for
larger designs, this technique would be able to provide higher
time savings by finding an initial partition closer to the optimal
solution.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, a new methodology to insert selective TMR for
SEU mitigation has been presented. The benefits of applying the
automatic selective TMR methodology are clearly proved with
the experimental results that have been obtained: the technique
results in a lower circuit complexity than the traditional TMR
approach, while meeting the specified reliability level. Besides,
this methodology has been improved in terms of performance,
adding an enhancement based on an innovative topological anal-
ysis of the target circuit. The benefit of this approach has been
verified through fault injection experiments. This improvement
results in a reduction of the number of explored partial solutions
by the optimization engine, due to the calculation of an initial
partition close enough to the optimal solution.

The next steps will focus on the study of these criteria in
general purpose circuits and real space circuits like SpaceWire
[27]. Also, a more sophisticated model that studies all the cal-
culated criteria and automatically produces the set of conditions
to initially map or not any given flip-flop in TMR will be pro-
duced. Finally the study of other optimization algorithms (simu-
lated annealing, genetic, mimetic, adaptive), and alternative cost
functions that can be adapted prioritizing the influence of each
output will be carried out.

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply. 


2102 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 56, NO. 4, AUGUST 2009

REFERENCES

[1] Y. Moreau, S. Duzellier, and J. Gasiot, “Evaluation of the upset risk in
CMOS SRAM through full three dimensional simulation,” IEEE Trans.
Nucl. Sci., vol. 42, no. 6, pt. 1, pp. 1789–1796, Dec. 1995.

[2] F. Irom, F. F. Farmanesh, A. H. Johnston, G. M. Swift, and D. G.
Millward, “Single-event upset in commercial silicon-on-insulator Pow-
erPC microprocessors,” IEEE Trans. Nucl. Sci., vol. 49, no. 6, pt. 1, pp.
3148–3155, Dec. 2002.

[3] C. Carmichael, “Triple modular redundancy design techniques for
virtex series FPGA,” presented at the Application Notes 197, San Jose,
CA, 2000, Xilinx.

[4] R. Hentschke, F. Marques, F. Lima, L. Carro, A. Susin, and R. Reis,
“Analysing area an performance penalty of protecting different dig-
ital modules with hamming code and triple modular redundancy,” pre-
sented at the Symp. Integrated Circuits and Systems Design, Proceed-
ings, Sep. 2002.

[5] R. Velazco, D. Bessot, S. Duzellier, R. Ecoffet, and R. Koga, “Two
CMOS memory cells suitable for the design of SEU-tolerant VLSI cir-
cuits,” IEEE Trans. Nucl. Sci., vol. 41, no. 6, pt. 1, pp. 2229–2234, Dec.
1994.

[6] L. Sterpone and M. Violante, “A new partial reconfiguration-based
fault-injection system to evaluate SEU effects in SRAM-based
FPGAs,” IEEE Trans. Nucl. Sci., vol. 54, no. 4, pt. 2, pp. 965–970,
Aug. 2007.

[7] F. Lima, S. Rezgui, E. Cota, L. Carro, M. Lubaszewski, R. Reis, and R.
Velazco, “Designing a radiation hardened 8051-like micro-controller,”
in Proc. Symp. Integrated Circuits and Systems Design, Sep. 2000, pp.
255–260.

[8] P. K. Samudrala, J. Ramos, and S. Katkoori, “Selective triple modular
redundancy (STMR) based single-event upset (SEU) tolerant synthesis
for FPGAs,” IEEE Trans. Nucl. Sci., vol. 51, no. 10, pp. 2957–2969,
Oct. 2004.

[9] P. K. Samudrala, J. Ramos, and S. Katkoori, “Method and Apparatus
for Creating Circuit Redundancy in Programmable Logic Devices,”
U.S. Patent Pub. No. WO/2004/077260, Oct. 2004.

[10] S. Krishnaswamy, S. M. Plaza, I. L. Markov, and J. P. Hayes, “En-
hancing design robustness with realiability-aware resynthesis and logic
simulation,” in Proc. Int. Conf. Computer Aided Design, pp. 149–154.

[11] S. Krishnaswamy, I. L. Markov, and J. P. Hayes, “On the role of timing
masking in reliable logic circuit design,” presented at the Design Au-
tomation Conf., 2008.

[12] S. Almukhaizim and Y. Makris, “Soft error mitigation through selec-
tive addition of functionally redundant wires,” IEEE Trans. Rel., vol.
57, no. 1, pp. 23–31, Mar. 2008.

[13] B. Pratt, M. Caffrey, J. F. Carroll, P. Graham, K. Morgan, and M.
Wirthlin, “Fine-grain SEU mitigation for FPGAs using partial TMR,”
IEEE Trans. Nucl. Sci., vol. 55, no. 4, pt. 1, pp. 2274–2280, Aug. 2008.

[14] F. Lima, G. Neuberger, R. Hentschke, L. Carro, and R. Reis, “De-
signing fault-tolerant techniques for SRAM-based FPGAs,” IEEE Des.
Test Comput., vol. 21, no. 6, pp. 552–562, Nov.–Dec. 2004.

[15] P. Reyes, P. Reviriego, J. A. Maestro, and O. Ruano, “New protection
techniques against SEUs for moving average filters in a radiation en-
vironment,” IEEE Trans. Nucl. Sci., vol. 54, no. 4, pp. 957–964, Aug.
2007.

[16] O. Ruano, P. Reviriego, and J. A. Maestro, “A new EDAC technique
against soft errors based on pulse detectors,” presented at the IEEE Int.
Symp. Industrial Electronics, Cambridge, U.K., Jun. 2008.

[17] D. Gonzalez, Single Event Upset Simulation Tool Functional Descrip-
tion ESA Rep. TEC-EDM/DCC-SST2, Jul. 2004.

[18] O. Ruano, J. A. Maestro, P. Reyes, and P. Reviriego, “A simulation
platform for the study of soft errors on signal processing circuits
through software fault injection,” in Proc. IEEE Int. Symp. Industrial
Electronics, 2007, pp. 3316–3321.

[19] J. C. Baraza, J. Gracia, S. Blanc, D. Gil, and P. J. Gil, “Enhancement of
fault injection techniques based on the modification of VHDL code,”
IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 6, pp.
693–706, Jun. 2008.

[20] C. Fiduccia and R. Mattheyses, “A linear time heuristic for improving
network partitions,” in Proc. 19th IEEE Design Automation Conf.,
1982, pp. 175–181.

[21] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing
for Digital, Memory, and Mixed-Signal VLSI Circuits. New York:
Springer-Verlag, 2000, 978-0-7923-7991-1.

[22] H. Fujiwara and T. Shimono, “On the acceleration of test generation
algorithms,” IEEE Trans. Comput., vol. C-32, no. 12, pp. 1137–1144,
Dec. 1983.

[23] D. J. Smith, HDL Chip Design. A Practical Guide for Designing, Sythe-
sizing and Simulating ASICs and FPGAs Using VHDL or Verilog .
Madison, AL: Doone, 2001, ISBN 0-9651934-3-8.

[24] J. L. Gross and J. Yellen, Grahp Theory and Its Applications. Boca
Raton, FL: Chapman & Hall/CRC, ISBN: 158488505X.

[25] E. Dijkstra, “A note on the problems in connection with graphs,”
Numer. Math., Oct. 1959.

[26] W. Stallings, Data & Computer Communications. Upper Saddle
River, NJ: Pearson Prentice-Hall, 2007, ISBN-10: 0132433109.

[27] SpaceWire. [Online]. Available: http://spacewire.esa.int/con-
tent/Home/HomeIntro.php

Authorized licensed use limited to: Univ Complutense de Madrid. Downloaded on November 05,2024 at 12:11:25 UTC from IEEE Xplore.  Restrictions apply.