# Soft Error Resilient VLSI Architecture for Signal Processing

Dawood ALNAJJAR\*, Younghun KO\*, Takashi IMAGAWA<sup>†</sup>, Masayuki HIROMOTO<sup>†</sup>, Yukio MITSUYAMA\*, Masanori HASHIMOTO\*, Hiroyuki OCHI<sup>†</sup>, and Takao ONOYE\*

\* Dept. Information Systems Engineering, Osaka University, Japan & JST CREST

<sup>†</sup> Dept. Communications and Computer Engineering, Kyoto University, Japan & JST CREST

Abstract—This paper presents a reliability-configurable coarse-grained reconfigurable array for signal processing that offers flexible reliability against soft errors. A cluster is introduced as the basic element of the proposed reconfigurable array, and each cluster can select one of four operation modes with different levels of spatial redundancy and area efficiency. Evaluation of permanent error rates demonstrates that a cluster of the reconfigurable array can achieve four different reliability levels. A fault-tolerance evaluation of a Viterbi decoder mapped on the proposed reconfigurable array demonstrates a considerable trade-off between reliability and area overhead.

## I. INTRODUCTION

Signal processing in critical applications, such as medical services, financial services, transportation, and security systems, must be highly reliable, which in turn requires highly reliable VLSI. With aggressive process scaling, however, sustaining reliability has become a major concern in VLSI design. As devices are miniaturized, the critical charge, which is the minimum charge that causes a bit flip, becomes smaller, and functional correctness is increasingly threatened by soft errors. On the other hand, reliability requirements depend on the application and the operating environment, and hence a design scheme that can flexibly choose countermeasures against reliability degradation is needed.

To attain immunity to soft errors, soft error-tolerant designs have been developed, especially for aerospace applications. For this purpose, time redundancy, spatial redundancy, and error correction coding (ECC) have been widely studied and used to detect soft errors and avoid failures [1], [2]. For implementing an application with spatial redundancy in particular, a reconfigurable device is suitable, since redundant hardware such as triple modular redundancy (TMR) can easily be realized thanks to the regular array structure. Besides, voters and ECC circuits are essential elements for attaining immunity to soft errors. In fine-grained reconfigurable devices such as FPGAs, voters or ECC circuits can be implemented with LUTs anywhere in the device and in any number. In contrast, coarse-grained reconfigurable devices implement voters or ECC circuits inefficiently, since conventional coarse-grained reconfigurable architectures do not consider reliability and do not provide such functionality in their basic elements.

Motivated by these observations, the present paper proposes a reliability-configurable coarse-grained reconfigurable array for signal processing. The functionality and interconnect architecture of the reconfigurable device are based on our previous architecture [3], which offers media processing capabilities such as multi-standard video decoding. This paper focuses on the additional mechanism for changing the reliability level depending on applications and environments.

For reliability-oriented applications, the reconfigurable architecture achieves a sufficient level of reliability at the cost of area and power overhead, while for cost-oriented applications it provides area and power efficiency. It should also be noted that reliability requirements differ even among circuit modules within an application; e.g., the control parts of an application may require a higher reliability level than its datapath parts. We therefore devise a scheme in which the reliability level can be selected individually for each basic element of the reconfigurable architecture, which reduces area and power overhead by avoiding excessive reliability as much as possible.

To utilize the flexible reliability scheme, it is necessary to determine the required immunity to soft errors for each basic element. For that purpose, we developed a fault-tolerance evaluation method. In this paper, we also measure the number of sensitive bits in the configuration memory as an index of vulnerability, and demonstrate that it differs considerably among basic elements.

## II. FLEXIBLE RELIABILITY IN ARCHITECTURE DESIGN

In order to achieve flexible reliability with area efficiency on the proposed reconfigurable architecture, the reliability of each basic element should be alterable according to its sensitivity to soft errors. Throughout this paper, a single soft error in a memory element and a soft error in combinational logic will be referred to as a single event upset (SEU) and a single event transient (SET), respectively.

In the architecture design of basic elements, the required levels of reliability should first be classified. The reliability of the configuration memory is often considered more critical than that of the computed data, since an SEU in the configuration memory damages the functionality permanently, i.e., until the configuration data is reloaded; throughout this paper we refer to this as a permanent error. We thus assume the following four conditions C1-C4 for the basic elements of the reconfigurable architecture in this study:

- C1: functionality must be correct, and computed data must be correct as well,
- C2: functionality must be correct, and errors in computed data must be detectable, although only some of them can be corrected,
- C3: functionality must be correct, and errors in computed data are not considered,
- C4: no consideration for error detection and recovery is necessary.

Considering the conditions above, we define four operation modes corresponding to C1-C4: TMR, double modular redundancy (DMR), single modular with single context (SMS), and single modular with multi-context (SMM), as shown in Table I. The four operation modes offer different reliability levels (redundancy) and different capabilities of dynamic reconfiguration (number of contexts). They are explained in detail in Section III.

Fig. 1. Cluster and cluster interconnection.

## III. PROPOSED ARCHITECTURE FOR FLEXIBLE RELIABILITY

### A. Reconfigurable architecture for soft error tolerance

Figure 1 illustrates the overview of the proposed architecture. Since the architecture is designed independently of the data granularity, we denote the granularity (data width in bits) by the variable n. Clusters, the basic elements of our architecture, are placed repeatedly in a two-dimensional array. A cluster has a switch (CFGSM: configuration memory switching matrix) and four cells. Each cell consists of an execution module (EM) (in ALU and multiplier clusters) or a register module (RM) (in register clusters), three configuration memories (CFGs) for dynamic configuration of the EM/RM, and voters (VCs), all in a reconfigurable cell unit (RCU). To realize flexible dependability, the proposed architecture also introduces a redundancy control unit (RDU) and a comparing and voting unit (CVU). Inter-cluster interconnection has four tracks (Track0-3), through which each cell in a cluster is connected to the cells in adjacent clusters. Inside a cluster, each cell can be connected to adjacent cells via a diagonal intra-cluster connection. This overall interconnection enables application mapping in all four operation modes, TMR, DMR, SMS, and SMM, which correspond to conditions C1-C4 and are summarized in Table I.

In the case of TMR, DMR, and SMS, as shown in Fig. 2, each cell has three redundant CFGs (C0, C1, C2) holding a single context, three VCs, a selector CS (part of the CFGSM), and the EM/RM. An SEU in a CFG is repaired at the next clock edge applied to the CFGs, since the voted value is written back to the CFGs in every clock cycle. In SMM mode, on the other hand, the voters are disabled, and three contexts are stored in the CFGs of each cell.

Fig. 2. Operations in TMR, DMR, SMS and SMM modes.
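The per-cycle repair of the CFGs can be illustrated with a small software model. This is a minimal sketch under our own assumptions, not the authors' circuit: each CFG copy is modeled as an integer bit vector, `majority3` and `clock_cfgs` are hypothetical names, and the selector CS is not modeled.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote of three equal-width words."""
    return (a & b) | (a & c) | (b & c)


def clock_cfgs(cfgs: list[int]) -> list[int]:
    """Model one clock edge for a cell's three CFG copies: vote them and
    write the voted word back to every copy (per-cycle scrubbing)."""
    voted = majority3(*cfgs)
    return [voted, voted, voted]


# Usage: an SEU flips bit 3 of copy C1; the next clock edge repairs it.
context = 0b1010_0110
cfgs = [context, context ^ (1 << 3), context]
print(clock_cfgs(cfgs) == [context] * 3)  # True
```

In this model, if two copies of the same bit are upset within one clock period, the vote selects the wrong value and writes it back to all three copies, so the error becomes permanent; that coincidence is the residual failure case for TMR-protected CFGs.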

In TMR mode, shown in Fig. 2, the outputs of three EMs/RMs pass through three voters (VDs), while the fourth cell is reserved as a spare. An SET or SEU occurring in a VC, the CS, or an EM is corrected by the VDs. By prohibiting data feedback inside a cluster and enforcing voting at every cell output, the proposed architecture avoids error accumulation in the EM/RM without a rollback mechanism. In DMR mode, on the other hand, the outputs of the EMs, together with their parity bits, are directed to a comparator and selector (C&S). SEUs in the registers of an EM are detectable using the parity bits and can be recovered in the C&S by selecting the correct output. However, SETs in the VCs, CS, and EMs can only be detected in the C&S. In SMS and SMM, only SEUs in the registers of the EM can be detected, while SETs in the CS and EM propagate to subsequent clusters.
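The DMR compare-and-select behaviour described above can be sketched as follows. The selection rule (pass on agreement, pick the copy whose parity still checks, otherwise flag a detected-but-uncorrectable error) is our reading of the text; the names `EmOutput` and `compare_and_select`, and the use of even parity, are hypothetical.

```python
from typing import NamedTuple, Optional


def parity(word: int) -> int:
    """Even-parity bit of a non-negative integer word."""
    return bin(word).count("1") & 1


class EmOutput(NamedTuple):
    data: int        # register contents driven out of the EM
    parity_bit: int  # parity stored when the register was written


def compare_and_select(a: EmOutput, b: EmOutput) -> tuple[Optional[int], str]:
    """Hypothetical C&S for DMR mode: returns (selected data or None, status)."""
    if a.data == b.data:
        return a.data, "ok"
    a_bad = parity(a.data) != a.parity_bit
    b_bad = parity(b.data) != b.parity_bit
    if a_bad and not b_bad:
        return b.data, "corrected"   # SEU in copy A's register
    if b_bad and not a_bad:
        return a.data, "corrected"   # SEU in copy B's register
    # Mismatch with no single parity culprit (e.g. an SET): detect only.
    return None, "detected"


good = EmOutput(0x5A, parity(0x5A))
seu = EmOutput(0x5A ^ 0x01, parity(0x5A))   # bit flipped after parity stored
print(compare_and_select(good, seu))        # (90, 'corrected')
```

An SET that corrupts a value before its parity is computed leaves the parity consistent, so in this sketch such a mismatch can only be reported as `"detected"`, matching the limitation stated for DMR mode.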

The RDU holds the configuration that determines the operation mode of the cluster, which cells are used, and which of the contexts stored in the CFGs is selected. This configuration data is stored with bitwise TMR, and hence the RDU is SEU-tolerant. Dynamic context selection can be carried out simply by changing 2 bits in the RDU.

### B. Functionality of reconfigurable cells

The proposed architecture has three types of reconfigurable clusters: ALU, multiplier, and register clusters. In the ALU cluster, each EM in a cell has an *n*-bit ALU, a shifter, and a parity error detector (PED), as depicted in Fig. 3. The EM can be configured to perform addition and subtraction with or without the cooperation of neighboring cells. It can also be configured to perform logical operations such as AND and OR, multiplexing, and fixed or variable shifting. In SMS and SMM modes, the cluster can perform multi-byte operations through the cooperation of neighboring cells.
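Cooperation between neighboring cells for multi-byte arithmetic can be pictured as a carry chain of n-bit adders. The sketch below assumes that each cell passes its carry to the next cell; `cell_add` and `cooperative_addsub` are hypothetical helpers, not the EM's actual interface.

```python
N = 8                  # data width n of one cell (Fig. 3 shows n = 8)
MASK = (1 << N) - 1


def cell_add(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One n-bit ALU cell: returns (n-bit sum, carry out)."""
    total = a + b + carry_in
    return total & MASK, total >> N


def cooperative_addsub(a: int, b: int, cells: int, subtract: bool = False) -> int:
    """Add or subtract (cells * n)-bit operands on a chain of n-bit cells.

    Each cell handles one n-bit slice and passes its carry to the next
    cell. Subtraction uses two's complement: a - b = a + ~b + 1.
    The final carry out is dropped (result is modulo 2**(cells * n))."""
    result = 0
    carry = 1 if subtract else 0
    for i in range(cells):
        a_i = (a >> (i * N)) & MASK
        b_i = (b >> (i * N)) & MASK
        if subtract:
            b_i ^= MASK  # invert this slice of b
        s, carry = cell_add(a_i, b_i, carry)
        result |= s << (i * N)
    return result


print(hex(cooperative_addsub(0x12FF, 0x0001, cells=2)))  # 0x1300
```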



Fig. 3. Execution module (EM) for Multiplier and ALU clusters (n=8).

The EM in the multiplier cluster contains a multiplier, a PED, and a shifter. It can be configured to perform signed or unsigned $n \times n$-bit multiplication.

On the other hand, the RMs in register clusters each contain a 16-word register file of *n*-bit words. The register file can work not only as a register file but also as a delay unit, which outputs the input data after 1-16 cycles, or as a lookup table (LUT).
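The delay-unit mode can be modeled as a circular buffer over the 16 words. This is a minimal sketch assuming a write address that advances every cycle and a read address trailing it by the programmed delay; the class `DelayUnit` is hypothetical.

```python
class DelayUnit:
    """Hypothetical model of an RM's 16-word register file used as a
    programmable delay line: output(t) = input(t - delay)."""

    DEPTH = 16

    def __init__(self, delay: int, n: int = 8):
        if not 1 <= delay <= self.DEPTH:
            raise ValueError("delay must be between 1 and 16 cycles")
        self.delay = delay
        self.mask = (1 << n) - 1
        self.regs = [0] * self.DEPTH  # register file, reset to zero
        self.ptr = 0                  # write address, advances every cycle

    def step(self, x: int) -> int:
        """One clock cycle: read the word written `delay` cycles ago,
        then write the new input at the current address."""
        out = self.regs[(self.ptr - self.delay) % self.DEPTH]
        self.regs[self.ptr] = x & self.mask
        self.ptr = (self.ptr + 1) % self.DEPTH
        return out


line = DelayUnit(delay=3)
print([line.step(x) for x in range(1, 7)])  # [0, 0, 0, 1, 2, 3]
```

Reading before writing lets a 16-cycle delay reuse the same address. In LUT mode the same storage would simply be read at an address derived from the input.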

## IV. ARCHITECTURE EVALUATION

### A. Soft error reliability

1) Preparation: Let $\lambda_U$ denote the SEU rate of a 1-bit memory element. For SETs, we assume that the SET rate is proportional to the area of a combinational circuit, and we therefore calculate SET rates from $\lambda_T$, the SET rate of a single gate. Here, $\lambda_T$ includes only SETs that are captured in FFs; SETs filtered out by electrical, logical, and temporal masking are excluded. SEU rates in memory blocks such as CFG, EM/RM, and RDU are expressed as $\lambda_U$ multiplied by the number of memory bits. In contrast, SET rates in combinational logic, or in the combinational parts of blocks such as VC, CS, EM/RM, and VD, are calculated based on their area. We enumerated all cases in which a permanent error could occur, derived analytical expressions of the error rates corresponding to these cases, and then evaluated the error rate of each mode using the derived expressions.
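The first step of this preparation, turning block sizes into raw SEU and SET rates, is sketched below. The `Block` record and `raw_error_rates` are hypothetical; a block with both registers and logic (e.g. an EM) would be listed twice, once per kind. The sketch stops before redundancy is applied, since the authors' case-by-case expressions are not given here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    kind: str    # "memory" (SEU, per bit) or "logic" (SET, per gate)
    size: float  # number of memory bits, or area in gate equivalents


def raw_error_rates(blocks: list[Block], lambda_u: float,
                    lambda_t: float) -> dict[str, float]:
    """Raw error rate of each block (same unit as the inputs, e.g. FIT),
    before any redundancy is taken into account.

    lambda_u: SEU rate of a 1-bit memory element.
    lambda_t: per-gate rate of SETs that are actually latched into FFs."""
    rates = {}
    for b in blocks:
        if b.kind == "memory":
            rates[b.name] = lambda_u * b.size
        elif b.kind == "logic":
            rates[b.name] = lambda_t * b.size
        else:
            raise ValueError(f"unknown block kind: {b.kind}")
    return rates


# Placeholder sizes for illustration only, not taken from the paper.
print(raw_error_rates([Block("CFG", "memory", 100),
                       Block("VC", "logic", 50)], lambda_u=2.0, lambda_t=0.5))
```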

2) Discussion of reliability in the four operation modes: We now evaluate the permanent error rates of the four operation modes. Supposing that this device will be used in aerospace applications, we assume the SEU rate on a satellite orbit [4] and use $\lambda_U = 2.0$ FIT<sup>1</sup> for the evaluation. Since it is difficult to choose an appropriate value for the SET rate $\lambda_T$, we evaluated the permanent error rate for various values of $\lambda_T$. Figure 4 shows the results for the ALU cluster with a 100 MHz clock applied to the configuration memory and the EMs.
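To see why per-cycle voting and write-back makes the configuration memory so robust, consider a first-order textbook model, which is not the authors' analytical expression: a triplicated bit is lost only if at least two of its three copies are upset within the same clock period $T$. With $p = \lambda_U T \ll 1$ per copy, the failure probability per cycle is $3p^2(1-p) + p^3 \approx 3p^2$, i.e. a rate of roughly $3\lambda_U^2 T$. The function name below is hypothetical.

```python
FIT = 1e-9  # 1 FIT = 1e-9 errors per hour (footnote 1)


def tmr_scrubbed_bit_fit(lambda_u_fit: float, clock_hz: float) -> float:
    """First-order failure rate, in FIT, of one configuration bit stored as
    three copies that are voted and rewritten on every clock edge.

    The bit is lost only if at least two of its three copies are upset
    within the same clock period T."""
    lam = lambda_u_fit * FIT                # SEUs per hour for one copy
    period_h = 1.0 / clock_hz / 3600.0      # clock period T in hours
    p = lam * period_h                      # P(a given copy is upset in T)
    p_fail = 3 * p**2 * (1 - p) + p**3      # P(>= 2 of 3 copies upset in T)
    return p_fail / period_h / FIT          # failures per hour -> FIT


print(f"{tmr_scrubbed_bit_fit(2.0, 100e6):.2e}")  # 3.33e-20
```

The rate scales linearly with the clock period, so slower scrubbing raises it. A cluster-level figure would sum such terms over all protected bits and add the other failure cases the authors enumerated; this sketch does not reproduce their expressions.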

The permanent error rate of the ALU cluster in TMR mode is about $10^{-16}$ FIT, so high reliability is attained. The permanent error rate of DMR mode depends on $\lambda_T/\lambda_U$, because an SET in an EM is detectable but not correctable. When $\lambda_T$ is much smaller than $\lambda_U$, DMR provides a moderate reliability level between those of TMR and SMS. When $\lambda_T$ is comparable to $\lambda_U$, on the other hand, the permanent error rate of DMR is close to that of SMS. Thus, when DMR is used, the SET rate should be examined carefully.



Fig. 4. Permanent error rate of ALU cluster in four modes.

TABLE II. Area overhead in gate count (n = 8).

| Block        | ALU cluster area | ALU cluster overhead | Mult. cluster area | Mult. cluster overhead | Reg. cluster area | Reg. cluster overhead |
|--------------|------------------|----------------------|--------------------|------------------------|-------------------|-----------------------|
| CVU          | 378              | 378                  | 684                | 684                    | 260               | 260                   |
| RDU          | 194              | 194                  | 194                | 194                    | 194               | 194                   |
| CFGSM        | 2,053            | 793                  | 1,770              | 690                    | 1,532             | 512                   |
| EMs/RMs      | 3,054            | 196                  | 4,414              | 257                    | 8,717             | 323                   |
| CFGs         | 6,048            | -                    | 5,184              | -                      | 4,896             | -                     |
| VCs          | 4,186            | 4,186                | 3,611              | 3,611                  | 3,314             | 3,314                 |
| Interconnect | 2,955            | -                    | 3,446              | -                      | 3,032             | -                     |
| Total        | 18,868           | 5,747                | 19,303             | 5,436                  | 21,945            | 4,603                 |

As expected, the permanent error rate of SMS is higher than those of DMR and TMR. SMS and SMM might appear to have quite similar reliability, because all errors are treated equally in this evaluation. However, in SMM mode the configuration information is not protected, so a single SEU or SET can destroy the functionality of the circuit, whereas the configuration memory is protected in SMS. Thus, the reliability of SMS and SMM differs.

### B. Area overhead

We now show the area overhead introduced to attain immunity to soft errors and to realize the four operation modes. To analyze the overhead quantitatively, we compared the gate count of the proposed architecture with that of a baseline architecture that contains only the minimum hardware needed to perform dynamic reconfiguration properly but is not immune to SEUs and SETs. The gate counts of both architectures were estimated from RTL designs using an industrial 90 nm cell library and Synopsys Design Compiler, and are listed in Table II for n = 8. The additional circuits that provide flexible dependability occupy 21.0% to 30.5% of the total area, 26.6% on average. Most of the area overhead arises from the voters for the configuration memory (VCs); the contribution of the other blocks is limited. The overhead also depends on the data width of the architecture: for n = 16 and 32, the area overhead of the ALU cluster is reduced to 25.6% and 19.7%, respectively.
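The percentages quoted above follow directly from the Total row of Table II. The short check below uses only those table values; the dictionary name is ours.

```python
# Gate counts from the Total row of Table II (n = 8): (total area, overhead).
TABLE_II = {
    "ALU": (18_868, 5_747),
    "Mult.": (19_303, 5_436),
    "Reg.": (21_945, 4_603),
}


def overhead_percent(total: int, overhead: int) -> float:
    """Share of the cluster area spent on the reliability features."""
    return 100.0 * overhead / total


percentages = {k: round(overhead_percent(*v), 1) for k, v in TABLE_II.items()}
print(percentages)  # {'ALU': 30.5, 'Mult.': 28.2, 'Reg.': 21.0}
average = round(sum(percentages.values()) / len(percentages), 1)
print(average)      # 26.6
```

Averaging the rounded per-cluster percentages gives the 26.6% quoted in the text; averaging the unrounded values gives about 26.5%.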

## V. SIGNAL PROCESSING APPLICATION

We evaluate the trade-off between area and reliability of the proposed cluster array, on which an example application is implemented using both TMR and SMM modes. Reliability is evaluated in a similar way to [5], that is, by counting the number of "sensitive bits" in each cell. A configuration memory bit that affects the primary output of a particular design is called a "sensitive bit". To count the sensitive bits, a cycle-based simulation of the implementation is performed with each configuration bit flipped in turn. If the primary output becomes erroneous, the bit is classified as a sensitive bit.

<sup>1</sup> 1 FIT = $1 \times 10^{-9}$ errors/hour

Fig. 5. Implementation results of Viterbi decoders and the number of total sensitive bits and configuration information.
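The sensitive-bit classification is a single-bit-flip fault-injection loop. In this minimal sketch, `simulate` stands in for the cycle-based simulator and `toy_cell` is a hypothetical 1-bit cell used only for illustration; the real evaluation simulates the mapped Viterbi decoder.

```python
from typing import Callable, Sequence

Config = list[int]  # one 0/1 entry per configuration-memory bit


def count_sensitive_bits(
    simulate: Callable[[Config, Sequence[int]], list[int]],
    config: Config,
    stimulus: Sequence[int],
) -> list[int]:
    """Return indices of configuration bits whose single flip changes the
    primary output over the given stimulus ("sensitive bits")."""
    golden = simulate(config, stimulus)
    sensitive = []
    for i in range(len(config)):
        faulty = config.copy()
        faulty[i] ^= 1                       # inject one SEU into bit i
        if simulate(faulty, stimulus) != golden:
            sensitive.append(i)
    return sensitive


def toy_cell(config: Config, stimulus: Sequence[int]) -> list[int]:
    """Hypothetical 1-bit cell: config[0:2] selects AND/OR/XOR/pass with
    the constant config[2]; config[3] is an unused bit."""
    op, k = config[0] | (config[1] << 1), config[2]
    ops = {0: lambda x: x & k, 1: lambda x: x | k,
           2: lambda x: x ^ k, 3: lambda x: x}
    return [ops[op](x) for x in stimulus]


print(count_sensitive_bits(toy_cell, [0, 1, 1, 0], [0, 1]))  # [0, 1, 2]
```

The unused bit is never sensitive, and whether the other bits are sensitive depends on the rest of the configuration: with `[1, 1, 0, 0]` (pass-through, constant 0) no single flip changes the output, because OR and XOR with 0 are also identities. The result also depends on the stimulus, so coverage of the simulation matters.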

As a sample application, a Viterbi decoder with a constraint length of 3 was manually mapped onto the cluster array. Viterbi decoding is divided into three parts: branch metric, path metric, and path memory. Figure 5 shows two mappings: one with all clusters configured as TMR (All-TMR), and the other with all clusters configured as SMM (All-SMM). The figure shows that All-TMR has no sensitive bits at the cost of about twice the area.
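For readers unfamiliar with the three parts, a software Viterbi decoder with constraint length 3 is sketched below. The paper does not state the code rate or generator polynomials, so a rate-1/2 code with the common (7, 5) octal generators and hard-decision Hamming metrics is assumed; the sketch shows the algorithm, not its mapping onto clusters.

```python
G = (0b111, 0b101)   # assumed generators (7, 5 in octal), rate 1/2
K = 3                # constraint length, as in the mapped decoder
STATES = 1 << (K - 1)


def outputs(reg: int) -> list[int]:
    """Encoder output bits for a K-bit shift-register value."""
    return [bin(reg & g).count("1") & 1 for g in G]


def encode(bits: list[int]) -> list[int]:
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state   # newest input bit in the MSB
        out.extend(outputs(reg))
        state = reg >> 1
    return out


def branch_metric(expected: list[int], received: list[int]) -> int:
    """Branch metric part: Hamming distance (hard decision)."""
    return sum(e != r for e, r in zip(expected, received))


def viterbi_decode(symbols: list[int]) -> list[int]:
    inf = float("inf")
    pm = [0] + [inf] * (STATES - 1)    # path metrics; start in state 0
    path_memory = []                   # survivor (prev_state, bit) per step
    for t in range(0, len(symbols), 2):
        rx = symbols[t:t + 2]
        new_pm = [inf] * STATES
        survivors = [None] * STATES
        for s in range(STATES):
            if pm[s] == inf:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                ns = reg >> 1
                m = pm[s] + branch_metric(outputs(reg), rx)
                if m < new_pm[ns]:     # path metric part: add-compare-select
                    new_pm[ns] = m
                    survivors[ns] = (s, b)
        pm = new_pm
        path_memory.append(survivors)
    # Path memory part: trace back from the best final state.
    state = min(range(STATES), key=lambda s: pm[s])
    decoded = []
    for survivors in reversed(path_memory):
        state, b = survivors[state]
        decoded.append(b)
    return decoded[::-1]


msg = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # last K-1 zeros flush the encoder
rx = encode(msg)
rx[3] ^= 1                              # one channel bit error
print(viterbi_decode(rx) == msg)        # True
```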

We also analyzed the distribution of sensitive bits in All-SMM, as shown in Fig. 6. The figure shows that the reliability requirements of the cells differ depending on their functionality. The bottom row of the path metric part clearly contains few sensitive bits, because the clusters in that row are used only for flag interconnection. This result suggests that configuring the operation mode of each cluster individually can improve the area-reliability trade-off.

Finally, each part of the Viterbi decoder is mapped in two ways: with all clusters of the part configured in TMR mode (denoted by T), or with all clusters of the part configured in SMM mode (denoted by S). Combining the three parts yields eight patterns of Viterbi decoders. A pattern is denoted by three characters (e.g., T-S-T), where the first, second, and third characters correspond to the mode of the branch metric, path metric, and path memory, respectively.
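The eight patterns, and a simple way to estimate their cost, can be sketched as follows. The cost model is a hypothetical additive approximation: each part's SMM cost (clusters, sensitive bits) is taken as an input, since per-part figures are not reported here, and a TMR part is assumed to use twice the clusters with no sensitive bits, which is consistent with the All-SMM (29 clusters) and All-TMR (58 clusters, 0 sensitive bits) totals but is not confirmed per part.

```python
from itertools import product

PARTS = ("branch metric", "path metric", "path memory")


def patterns() -> list[str]:
    """All 2**3 = 8 mode assignments, e.g. 'T-S-T'."""
    return ["-".join(p) for p in product("TS", repeat=len(PARTS))]


def pattern_cost(pattern: str,
                 smm_cost: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Hypothetical additive model returning (clusters, sensitive bits).

    smm_cost maps each part to its (clusters, sensitive bits) in SMM mode;
    a TMR part is assumed to need twice the clusters and to have no
    sensitive bits."""
    clusters = bits = 0
    for mode, part in zip(pattern.split("-"), PARTS):
        c, s = smm_cost[part]
        if mode == "T":
            clusters += 2 * c
        else:
            clusters += c
            bits += s
    return clusters, bits


print(patterns())
# ['T-T-T', 'T-T-S', 'T-S-T', 'T-S-S', 'S-T-T', 'S-T-S', 'S-S-T', 'S-S-S']
```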

Figure 7 shows the number of required clusters and the number of sensitive bits for each configuration pattern; unused clusters are not counted. In "S-S-S", there are 803 sensitive bits with 29 clusters, while in "T-T-T" there are 0 sensitive bits with 58 clusters. Therefore, there is a considerable trade-off between area and reliability.

## VI. CONCLUSIONS

A reliability-configurable coarse-grained reconfigurable array, in which one of four operation modes with different reliability and area efficiency can be selected for each cluster, has been proposed. Evaluation results of permanent error rates show that the proposed reconfigurable architecture can realize flexible reliability against soft errors through the four operation modes. The area overhead required to mitigate soft errors considerably and to provide flexible reliability accounts for 26.6% on average of the proposed coarse-grained dynamically reconfigurable device. In addition, a fault-tolerance evaluation of a Viterbi decoder based on sensitive bits suggests that the variation in the number of sensitive bits among clusters could be exploited to improve the trade-off between reliability and area overhead.

Fig. 6. Distribution of the sensitive bits with All-SMM.

Fig. 7. Area-reliability trade-off of the Viterbi Decoder.

## VII. ACKNOWLEDGMENT

The authors would like to thank the project members of JST CREST of Kyoto University, Kyoto Institute of Technology, Nara Institute of Science and Technology, and ASTEM RI for their discussions.

## References

- [1] M. Nicolaidis, "Time redundancy based soft-error tolerance to rescue nanometer technologies," in *Proc. IEEE VLSI Test Symposium*, pp. 86–94, April 1999.
- [2] L. Anghel, D. Alexandrescu, and M. Nicolaidis, "Evaluation of a soft error tolerance technique based on time and/or space redundancy," in *Proc. Symposium on Integrated Circuits and Systems Design*, pp. 237–242, April 2007.
- [3] Y. Mitsuyama, K. Takahashi, R. Imai, M. Hashimoto, T. Onoye, and I. Shirakawa, "Area-efficient reconfigurable architecture for media processing," *IEICE Trans. Fundamentals*, Vol. E91-A, No. 12, pp. 3651–3662, Dec. 2008.
- [4] E. Fuller, M. Caffrey, A. Salazar, C. Carmichael, and J. Fabula, "Radiation testing update, SEU mitigation, and availability analysis of the Virtex FPGA for space reconfigurable computing," in *Proc. Military and Aerospace Programmable Logic Devices*, pp. 6.1–6.7, Sept. 2001.
- [5] B. Pratt, M. Caffrey, P. Graham, K. Morgan, and M. J. Wirthlin, "Improving FPGA design robustness with partial TMR," in *Proc. IEEE International Reliability Physics Symposium (IRPS)*, pp. 226–232, March 2006.