# Critical Path Isolation for Time-to-Failure Extension and Lower Voltage Operation

Yutaka Masuda, Department of Information Systems Engineering, Osaka University, masuda.yutaka@ist.osaka-u.ac.jp

Masanori Hashimoto, Department of Information Systems Engineering, Osaka University, hasimoto@ist.osaka-u.ac.jp

Takao Onoye, Department of Information Systems Engineering, Osaka University

# ABSTRACT

Device miniaturization due to technology scaling has made manufacturing variability and aging more significant, and lower supply voltage makes circuits sensitive to dynamic environmental fluctuation. These may unexpectedly shorten the time to failure (TTF) of fabricated chips. This paper focuses on critical path isolation, which increases the timing slack of non-intrinsic critical paths and decreases the timing error occurrence probability in the circuit, and proposes a design methodology of isolated circuits for TTF extension and/or lower voltage operation. The proposed methodology selects a set of FFs for isolation using ILP so that it maximally reduces the sum of gate-wise failure probabilities. We evaluated the MTTF (Mean Time To Failure) of circuits with/without critical path isolation and examined how much supply voltage could be reduced without MTTF degradation. Evaluation results show that circuits with the proposed critical path isolation achieved 25% supply voltage reduction with 1.4% area overhead. With the same supply voltage, MTTF was improved by 14 orders of magnitude.

## **Keywords**

critical path isolation; mean time to failure; supply voltage reduction; stochastic timing error rate estimation; integer linear programming

## **1. INTRODUCTION**

In a synchronous sequential circuit, a timing error occurs when the signal propagation time through the combinational circuit exceeds the clock cycle time. The signal propagation time varies depending on manufacturing variability, environmental fluctuation such as supply noise and temperature gradients, and aging effects. Manufacturing variability includes variations of threshold voltage, gate length, oxide thickness, etc., which vary gate delay and wiring delay. Aggressive device miniaturization due to technology scaling has made manufacturing variability more and more significant, and lower supply voltage makes circuits sensitive to supply noise and temperature variation. In addition, aging degrades performance, and one representative aging phenomenon is NBTI (Negative Bias Temperature Instability) [1] [2]. NBTI changes the PMOS threshold voltage due to the gate oxide degradation induced by negative bias voltage. Reference [3] reported that circuit speed could degrade by up to 80% after 10 years of operation. This report might be too pessimistic, but the important point here is that the above-mentioned delay variations directly lead to circuit performance degradation and shorten the time to failure (TTF) of the chip.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ICCAD '16, November 07-10, 2016, Austin, TX, USA

© 2016 ACM. ISBN 978-1-4503-4466-1/16/11...\$15.00

DOI: http://dx.doi.org/10.1145/2966986.2967019

To avoid timing errors due to manufacturing variability, environmental fluctuation and aging, design and operational margins are given at design time and in field operation. However, as the performance degradation becomes significant, such margins tend to be too painful for designers. Timing closure becomes more and more difficult and time-consuming, and sometimes it is infeasible. To set necessary and sufficient margins that take into account all the variation sources, including aging effects, we need to estimate the chip TTF at design time.

Fig. 1 illustrates the TTF variation originating from the stochastic properties of manufacturing variability and the aging process. Statistical characterization of manufacturing variability has been studied comprehensively over the last decades, and its statistical modeling is now common practice. In addition, it has been reported that the threshold shift due to NBTI also varies statistically. Due to these statistical properties, the time when the circuit delay exceeds the clock cycle time, which corresponds to TTF, varies as illustrated in Fig. 1. Very recently, a stochastic framework that estimates mean TTF (MTTF) was proposed in [5] [6]. This framework considers manufacturing variability, temporal environmental fluctuation and aging in the MTTF estimation. In addition, workload-dependent path activation probabilities are taken into account. With this framework, it becomes possible to know, for example, the trade-off between MTTF and supply voltage.

This work proposes a design methodology that minimizes supply voltage for power reduction while satisfying the given MTTF specification. The key idea of the proposed design methodology is critical path isolation. Critical path isolation gives timing slack to non-intrinsic critical paths (i.e. paths whose delay can be reduced and yet is close to the intrinsic critical path delay for the sake of power and area saving), and reduces the number of paths whose delays are very close to those of the intrinsic critical paths. We can therefore expect that circuits with critical path isolation have fewer paths where timing errors are likely to occur, and hence TTF extension can be anticipated. In this work, we construct a design methodology of critical-path-isolated (CPI) circuits. A path-based design methodology would need to choose the paths to be isolated and how much timing slack should be added to each path. In this work, on the other hand, we propose FF-based critical path isolation that assigns timing slack to each FF instead of path-based slack assignment. We develop an FF selection method using ILP that maximally reduces the sum of gate-wise failure probabilities. After the FF selection, we add the maximum achievable timing margins to those FFs individually. We evaluate the effectiveness of the proposed design methodology in terms of supply voltage and average lifetime, i.e. MTTF, using the stochastic timing error rate estimation method [5] [6]. Experimental results, in which manufacturing variability, supply voltage fluctuation and NBTI are considered, show that the proposed design methodology reduces supply voltage by 25.0% while keeping MTTF.

Figure 1: TTF variation due to manufacturing variability and aging variation.

This paper is organized as follows. Section 2 introduces the concept of critical path isolation and related works, and formulates the problem of CPI circuit design. Section 3 describes the proposed design methodology. Section 4 experimentally evaluates the improvement thanks to critical path isolation in terms of supply voltage reduction and MTTF extension. Lastly, concluding remarks are given in Section 5.

# 2. CRITICAL PATH ISOLATION AND PROBLEM FORMULATION

This section first explains the concept of critical path isolation and its related work. We then formulate critical path isolation for lowering supply voltage while keeping MTTF as an optimization problem.

# 2.1 Critical Path Isolation

Critical path isolation increases the timing slack of non-intrinsic critical paths. Fig. 2(a) illustrates the path delay distribution of a conventional circuit. In a conventional circuit design flow, cell instances that are included in non-critical paths are replaced with smaller cells and/or high-Vth cells to reduce power dissipation and area. Therefore, the number of paths whose delays are close to the critical path delay increases. At the same time, this replacement decreases the timing margin of the paths that go through the replaced instances and may increase the timing error occurrence probability under variations. In other words, more instances become sensitive to timing error occurrence. As mentioned in the previous section, each instance delay varies considerably due to manufacturing variability, supply noise and aging, and consequently the probability that an instance delay variation causes a path delay violation becomes higher.

On the other hand, in the CPI circuit, non-intrinsic critical paths have timing margin even when gate delay varies. Fig. 2(b) exemplifies the path delay distribution of the CPI circuit. In this case, the timing error occurrence probability in these paths is dramatically reduced compared to the conventional circuit, which is the main advantage of critical path isolation. However, from another point of view, the CPI circuit needs to give up the power and area reduction attained in the conventional circuit.

Figure 2: Path delay distributions of circuits (a) without critical path isolation, (b) with conventional isolation, and (c) with proposed isolation.

In this sense, we need to find a better trade-off between the timing error occurrence probability and power/area, and identify the best point that satisfies the TTF requirement and/or the power/area restriction. In the conventional CPI circuits in [4], timing slack is given to as many paths as possible, as shown in Fig. 2(b), and hence the area overhead is significant. To mitigate the area overhead of isolation while keeping its effectiveness for power saving, we focus on the paths that actually contribute to timing errors and isolate only those paths, as shown in Fig. 2(c). The proposed isolation methodology will be explained in Section 3.

# 2.2 Related Work

In this subsection, we introduce two related works. The first work is CRISTA (CRitical path ISolation for Timing Adaptiveness) [4], which is the only publication that utilizes the concept of critical path isolation as far as the authors know. The second work is a stochastic timing error rate estimation method for MTTF calculation [5] [6].

#### 2.2.1 CRISTA

Ghosh et al. proposed a design paradigm called CRISTA, which improves robustness with respect to timing failure and provides the opportunity for aggressive voltage scaling by critical path isolation. The key points of CRISTA [4] are:

- 1) isolate critical paths so that possible timing errors occurring at lower supply voltage under single-cycle operation can be avoided by a two-cycle operation.
- 2) mitigate the occurrence of the two-cycle operations by reducing the activation probability of critical paths.
- 3) increase the timing margin of non-critical paths by logic synthesis.

Reference [4] experimentally showed that circuits with CRISTA achieved an average of 60% improvement in power with 18% area overhead. However, the effectiveness of CRISTA for power saving is evaluated only under static manufacturing variability, and dynamic variability, such as environmental fluctuation and aging, is not considered. When the design and operational margins are minimized, we need to pay attention to dynamic variation as well to ensure device lifetime. Otherwise, the device can easily fail in the field.

In this paper, on the other hand, we propose an MTTF-aware design methodology that uses critical path isolation for power reduction, taking into account the failure probabilities of individual gates. The isolated circuits are evaluated in terms of circuit performance and MTTF under manufacturing variability, supply voltage noise and the NBTI aging effect.

#### 2.2.2 Stochastic Timing Error Rate Estimation Method

Iizuka et al. proposed a stochastic timing error rate estimation method [5] [6] that calculates circuit MTTF quickly. A naive approach to calculating the TTF is to execute gate-level simulation repeatedly. However, the probability that actual timing errors occur is quite low<sup>1</sup>, and hence the simulation time required to reproduce these errors is prohibitively long. For example, when we evaluate a timing error that occurs once per month, the simulation time exceeds $10^8$ years [5]. To overcome this problem, [5] models circuit operation under dynamic delay variation as a continuous-time Markov process. The continuous-time Markov process modeling makes it possible to estimate the MTTF in a reasonable time. In a test case, MTTF is estimated $10^{12}$ times faster than with a logic simulator [5]. Moreover, [6] extended the state assignment of the continuous-time Markov process to accommodate within-die random variation. This error rate estimation method is suitable for computing TTF taking into account both static and dynamic variability. Therefore, this work uses the MTTF calculation method in [6] to estimate the MTTF of the CPI circuits obtained by the proposed design methodology.

#### **2.3 Problem Formulation**

The concept of critical path isolation is explained using the path delay distribution depicted in Fig. 2, but path-based design optimization for CPI circuits is not efficient since the number of paths in a circuit is huge. Instead, we choose FF-based design optimization for CPI circuits. Fig. 3 explains the two-step FF-based critical path isolation: (1) artificially increase the setup time of the target *i*-th FF by $\Delta setup_i$ and re-synthesize the design as an ECO (engineering change order) process, and (2) restore the setup time for the successive analysis process. With this FF-based isolation, we force the paths ending at the target FF to have a slack of more than $\Delta setup_i$. Note that if there are intrinsic critical paths whose path delays cannot be shortened, such paths cannot have a slack of more than $\Delta setup_i$. After the critical path isolation, the circuit area increases since conventional designs exploit such slacks for area reduction. CPI circuits have more timing margin but larger area.

An important observation in this work is that not all FFs contribute equally to MTTF. Fig. 4 shows the failure probabilities, i.e., timing error occurrence probabilities, of FFs in an OR1200 OpenRISC processor, which will be used in our experiments. We evaluated them by calculating the joint probability of timing violation probability and activation probability. Fig. 4 shows that several FFs have high failure probabilities, which dominantly determine the MTTF. This result motivates us to select a small number of target FFs that really impact the MTTF. In this way, the area overhead of critical path isolation can be mitigated.

Based on the discussion above, we formulate the problem of CPI circuit design as follows.

- Objective
  - Minimize: $V_{dd}$
- Constraints
  - $MTTF \ge MTTF_{min}$
  - $Area \le Area_{max}$
- Variables
  - $\Delta setup_i \quad (1 \le i \le N_{FF})$

Figure 3: FF-based critical path isolation.

Figure 4: Failure probabilities of FFs are largely different.

The objective of this problem is to minimize $V_{dd}$, aiming at power minimization, while satisfying the constraints on MTTF ($MTTF_{min}$) and circuit area ($Area_{max}$). The variables $\Delta setup_i$ are the additional slacks given to FFs, where $\Delta setup_i$ is given to the ECO re-synthesis as an intentional increase in the setup time of the *i*-th FF. $N_{FF}$ is the number of FFs in the circuit. When $\Delta setup_i = 0$, the *i*-th FF is not included in the set of target FFs. Thus, the number of target FFs $N_{CPI}$ is expressed as the number of FFs whose $\Delta setup_i$ is larger than 0. Here, MTTF depends on $\Delta setup_i$ and $V_{dd}$, and this relation is evaluated by the stochastic error rate estimation method explained in the previous subsection. Area depends on $\Delta setup_i$, and it is given by the synthesis tool after ECO re-synthesis.

#### 3. PROPOSED DESIGN METHODOLOGY

In this section, we propose a design methodology to solve the problem described in the previous section.

#### 3.1 Overview

A difficulty in solving the formulated problem is the non-linear relations among MTTF, Area and $\Delta setup_i$. In addition, the evaluations of MTTF and Area need relatively long CPU time, and hence an explicit optimization is not efficient in terms of CPU time. Thus, we take the following approach.

Figure 5: Example to select target FFs.

<sup>1</sup>Otherwise, such circuits with frequent error occurrence are useless.

For various values of $N_{CPI}$, we determine the set of target FFs and their $\Delta setup_i$ aiming at MTTF maximization. Here, we expect that a circuit with longer MTTF has more room for $V_{dd}$ reduction and that $N_{CPI}$ is related to the area increase. Then, for each set of $\Delta setup_i$, we perform ECO re-synthesis to obtain Area and evaluate the trade-off between $V_{dd}$ and MTTF using the stochastic error rate estimation method. From the evaluation results, we find the set of $\Delta setup_i$ that minimizes $V_{dd}$ while satisfying the constraints on MTTF and Area.
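The final step of this flow, picking the design with the lowest feasible $V_{dd}$ from the evaluated candidates, can be sketched as follows. This is a minimal illustration and not the authors' tooling: the `Candidate` record, the function names, and all numbers in `candidates` are hypothetical placeholders, and the sketch assumes MTTF does not decrease as the supply voltage rises.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    n_cpi: int          # number of isolated FFs (N_CPI)
    area_ratio: float   # area after ECO re-synthesis / initial area
    mttf_by_vdd: dict   # {supply voltage [V]: estimated MTTF [cycles]}

def min_vdd(cand, mttf_min):
    """Lowest evaluated voltage whose MTTF meets mttf_min, or None."""
    ok = [v for v, m in cand.mttf_by_vdd.items() if m >= mttf_min]
    return min(ok) if ok else None

def pick_design(candidates, mttf_min, area_max):
    """Return (candidate, vdd) minimizing vdd under the MTTF and area constraints."""
    best = None
    for cand in candidates:
        if cand.area_ratio > area_max:
            continue  # violates Area <= Area_max
        v = min_vdd(cand, mttf_min)
        if v is None:
            continue  # violates MTTF >= MTTF_min at every evaluated voltage
        if best is None or v < best[1]:
            best = (cand, v)
    return best

# Placeholder numbers for illustration only; they are not results from the paper.
candidates = [
    Candidate(0, 1.000, {1.2: 1e17, 1.1: 1e10, 1.0: 1e4}),
    Candidate(10, 1.014, {1.2: 1e17, 1.0: 1e17, 0.9: 1e17, 0.85: 1e8}),
    Candidate(30, 1.035, {1.2: 1e17, 1.0: 1e17, 0.9: 1e17, 0.85: 1e8}),
]
best = pick_design(candidates, mttf_min=1e16, area_max=1.015)
print(best[0].n_cpi, best[1])
```

In the actual flow, each `mttf_by_vdd` table would come from the stochastic error rate estimation and each `area_ratio` from ECO re-synthesis.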

In this approach, for the given number of  $N_{CPI}$ , we need to select  $N_{CPI}$  FFs and determine  $\Delta setup_i$  of the selected FFs. Section 3.2 explains how to select  $N_{CPI}$  target FFs, and Section 3.3 presents how to determine  $\Delta setup_i$ .

#### **3.2 Target FF Selection**

First, we propose a selection method of target FFs aiming at MTTF maximization. The proposed selection method chooses a set of target FFs that maximally reduces the sum of gate-wise failure probabilities. Fig. 5 shows a simple example, where the circuit is composed of ten combinational logic cells and four FFs. The numbers attached to the gates are the gate-wise failure probabilities, whose computation is explained later. Let us suppose $N_{CPI} = 2$.

When the slack times of FF2 and FF4 are increased, the slack times of L1, L3, L4, L5, L6, L7, L9 and L10 are also increased. In this case, even if a delay variation occurs at one of L1, L3, L4, L5, L6, L7, L9 and L10, the variation might be concealed by the increased slack. The expected reduction in error probability corresponds to the sum of the gate-wise failure probabilities, which is 0.21 (= 0.02 + 0.02 + 0.02 + 0.03 + 0.03 + 0.03 + 0.03 + 0.03). On the other hand, if we choose FF1 and FF2 as in Fig. 5(b), the slack times of L1, L2, L3 and L4 are increased. In this case, the reduced failure probability is 0.08 ($= 0.02 \times 4$), which is smaller than in the previous case, and hence TTF tends to be shorter.
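The arithmetic of this example can be checked with a few lines of Python. The per-gate values below are inferred from the two sums quoted in the text (0.02 for L1 to L4, 0.03 for L5, L6, L7, L9 and L10); the value for L8 is not quoted and is therefore omitted. The names `gate_fail`, `covered` and `reduction` are illustrative only.

```python
# Gate-wise failure probabilities inferred from the sums quoted in the text.
gate_fail = {
    "L1": 0.02, "L2": 0.02, "L3": 0.02, "L4": 0.02,
    "L5": 0.03, "L6": 0.03, "L7": 0.03, "L9": 0.03, "L10": 0.03,
}

# Gates whose slack increases for each choice of two target FFs (from the text).
covered = {
    ("FF2", "FF4"): ["L1", "L3", "L4", "L5", "L6", "L7", "L9", "L10"],
    ("FF1", "FF2"): ["L1", "L2", "L3", "L4"],
}

def reduction(gates):
    """Sum of gate-wise failure probabilities over the covered gates."""
    return sum(gate_fail[g] for g in gates)

for ffs, gates in covered.items():
    print(ffs, round(reduction(gates), 2))
```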

We formulate this FF selection problem as an ILP (Integer Linear Programming) problem to derive the exact solution. Our ILP formulation is as follows:

Objective

- Maximize: $\sum_{k=1}^{N_{inst}} (inst\_fail_k \times inst_k)$

Constraints

- $0 \le inst_k \le 1 \quad (1 \le k \le N_{inst})$
- $0 \le FF_i \le 1 \quad (1 \le i \le N_{FF})$
- $\sum_{i=1}^{N_{FF}} FF_i \le N_{CPI}$
- $inst_k \le \sum_{i=1}^{N_{FF}} (FF_i \times FF\_inst_{i,k})$

Variables

- $FF_i \quad (1 \le i \le N_{FF})$

The number of instances in the circuit is $N_{inst}$. The objective of this ILP problem is to maximize the sum of $(inst\_fail_k \times inst_k)$, where $inst\_fail_k$ is the gate-wise failure probability, which represents how much the k-th instance contributes to timing errors, and $inst_k$ is a binary variable that becomes 1 when the k-th instance is included in the paths ending at the target FFs for isolation. Therefore, the sum of $inst\_fail_k \times inst_k$ represents the gate-wise failure probability reduction. In this problem, we assign binary variables $FF_i$, where $FF_i$ becomes 1 when the *i*-th FF is included in the set of target FFs for isolation.

The first and second constraints restrict $inst_k$ and $FF_i$ to binary values. The third constraint means that the number of target FFs for isolation should be less than or equal to $N_{CPI}$. The fourth constraint is the key constraint that defines the relation between $inst_k$ and $FF_i$. $FF\_inst_{i,k}$ is a binary constant determined by the circuit topology, and it becomes 1 when the k-th instance is included in the paths ending at the *i*-th FF. The product term $FF_i \times FF\_inst_{i,k}$ becomes 1 when both $FF_i$ and $FF\_inst_{i,k}$ are 1. $inst_k$ is forced to 0 only when the product of $FF_i$ and $FF\_inst_{i,k}$ is 0 for all the FFs. On the other hand, if the k-th instance is included in the paths ending at target FFs, at least one of the products of $FF_i$ and $FF\_inst_{i,k}$ becomes 1. In this case, $inst_k$ can be 1. Since this ILP formulation maximizes the sum of $(inst\_fail_k \times inst_k)$, $inst_k$ is then necessarily assigned 1.
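For a circuit as small as the Fig. 5 example, the same selection problem can be solved by exhaustive enumeration, which makes the role of each constraint easy to see. The sketch below is not the ILP solver setup used in the paper. The per-FF fan-in cones in `cones` are an assumption chosen to be consistent with the unions quoted in the text, and the failure probability of L8 is invented for illustration.

```python
from itertools import combinations

def select_ffs(inst_fail, cones, n_cpi):
    """Pick at most n_cpi FFs maximizing the covered gate-wise failure probability."""
    best_set, best_val = (), 0.0
    for r in range(n_cpi + 1):                   # sum_i FF_i <= N_CPI
        for chosen in combinations(sorted(cones), r):
            # inst_k = 1 iff instance k lies in the cone of some chosen FF
            covered = set().union(*(cones[f] for f in chosen))
            val = sum(inst_fail[g] for g in covered)
            if val > best_val + 1e-12:
                best_set, best_val = chosen, val
    return best_set, best_val

# Assumed fan-in cones (FF_inst); only the unions FF2+FF4 and FF1+FF2 are from the text.
cones = {
    "FF1": {"L1", "L2"},
    "FF2": {"L1", "L3", "L4"},
    "FF3": {"L8"},
    "FF4": {"L5", "L6", "L7", "L9", "L10"},
}
inst_fail = {
    "L1": 0.02, "L2": 0.02, "L3": 0.02, "L4": 0.02,
    "L5": 0.03, "L6": 0.03, "L7": 0.03, "L9": 0.03, "L10": 0.03,
    "L8": 0.01,  # invented value; not given in the text
}

chosen, value = select_ffs(inst_fail, cones, n_cpi=2)
print(chosen, round(value, 2))
```

Enumeration grows combinatorially with the number of FFs, so a circuit with thousands of FFs needs an ILP solver, as used in the paper.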

The remaining issue is the calculation of $inst\_fail_k$. When we calculate MTTF according to [6], the timing failure probabilities at individual FFs, $FF\_fail_i$, are computed. Therefore, we calculate $inst\_fail_k$ using $FF\_fail_i$ as follows.

$$inst\_fail_k = \max\left\{\frac{FF\_fail_i}{\sum_{k=1}^{k_{max}} (FF\_inst_{i,k})}\right\} \quad (1 \le i \le N_{FF}). \tag{1}$$

The above equation assumes that each instance included in the fan-in cone of $FF_i$ has the same contribution to timing errors, and hence $FF\_fail_i$ is divided by the number of instances in the fan-in cone of $FF_i$. An instance can be included in the fan-in cones of multiple FFs. To cope with this, the max operation is performed in Eq. (1).
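One way to read Eq. (1) is sketched below: each FF spreads its failure probability evenly over the instances in its fan-in cone, and an instance shared by several cones keeps the largest share. Restricting the max to the FFs whose cone actually contains the instance is this sketch's interpretation of the surrounding text. The function name and the input values are hypothetical.

```python
def inst_fail_from_ff(ff_fail, cones):
    """Distribute each FF's failure probability over its fan-in cone (Eq. (1))."""
    inst_fail = {}
    for ff, cone in cones.items():
        share = ff_fail[ff] / len(cone)      # equal contribution per instance
        for inst in cone:
            # an instance in several cones keeps the largest share
            inst_fail[inst] = max(inst_fail.get(inst, 0.0), share)
    return inst_fail

# Hypothetical inputs for illustration only.
ff_fail = {"FF1": 0.04, "FF2": 0.09}
cones = {"FF1": {"L1", "L2"}, "FF2": {"L1", "L3", "L4"}}

result = inst_fail_from_ff(ff_fail, cones)
print({k: round(v, 3) for k, v in sorted(result.items())})
```

Here L1 belongs to both cones, so it receives the FF2 share (0.09 / 3) rather than the smaller FF1 share (0.04 / 2).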

#### **3.3 $\Delta setup_i$ Determination**

Next, we determine $\Delta setup_i$ for the set of target FFs selected in the previous subsection. Fig. 6 exemplifies the relation between the actual slack increase at the *i*-th FF and $\Delta setup_i$. We can see that the actual slack increase has an upper bound $\Delta slack_i^{UB}$ since the circuit delay cannot be reduced without limit. For simplicity, the proposed method sets $\Delta setup_i$ to $\Delta slack_i^{UB}$. Note that the optimal value of $\Delta setup_i$ could lie between 0 and $\Delta slack_i^{UB}$. Exploring intermediate values is part of our future work.



Figure 6: Relation between actual timing slack after resynthesis and increased setup time  $\Delta setup$ .

# 4. EXPERIMENTAL RESULTS

#### 4.1 Evaluation Setup

In this work, we used the OR1200 OpenRISC processor, which is a 32-bit RISC microprocessor with five pipeline stages, as the target circuit for critical path isolation. The processor was designed by a commercial logic synthesizer with a 45nm Nangate standard cell library and SRAM whose delay is 1.7ns at 0.9V. The SRAM delay dependence on supply voltage is similar to that of a buffer cell. The synthesized circuit includes 24,000 standard cells and 2,500 FFs. Thus, $N_{inst}$ is 21,500 and $N_{FF}$ is 2,500. We used Gurobi Optimizer 6.5 to solve the ILP problem defined in the previous section. The solver was executed on a 2.4 GHz Xeon CPU machine under the Red Hat Enterprise Linux 6 operating system with 1024 GB memory. The required CPU time was at most 0.05 seconds.

To calculate meaningful MTTF, practical delay variations should be considered. Our evaluation considers the following variations.

- Dynamic supply noise
- Intra-die + inter-die manufacturing variability
- NBTI aging effect

The detailed information on those variations is described in the Appendix.

As for workload, we selected three benchmark programs (CRC32, SHA1 and Dijkstra) from MiBench [7] and 30 sets of input data for each program for MTTF estimation. In total, we used 90 (= $3 \times 30$) workloads.

With this setup, we evaluated the MTTF of the pre-isolated circuit, whose cycle time was 2.1 ns. In the MTTF calculation, we encountered cases where no timing error occurred, i.e., MTTF is $\infty$. To make this infinite MTTF visible in figures, we plotted it as an MTTF of $1.00 \times 10^{17}$ cycles. Fig. 7 shows the MTTF at various supply voltages. The MTTF at 1.2V was $1.00 \times 10^{17}$ cycles, which corresponds to 3.3 years. We set an MTTF of $1.00 \times 10^{16}$ cycles, i.e., 8.0 months, as $MTTF_{min}$ in the problem formulation. As for the area constraint, $Area_{max}$ was set to 101.5%, 103% and 103.5% of the initial circuit. $Slack_{min}$ was 1.1ns.
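The conversion between an MTTF in cycles and wall-clock time follows directly from the cycle time. The helper below is a small sketch with a hypothetical function name; it assumes continuous operation at the stated 2.1 ns cycle time and an average month of 365.25 / 12 days.

```python
def cycles_to_months(mttf_cycles, cycle_time_ns):
    """Convert an MTTF in clock cycles to months of continuous operation."""
    seconds = mttf_cycles * cycle_time_ns * 1e-9
    seconds_per_month = 365.25 * 24 * 3600 / 12   # average month length
    return seconds / seconds_per_month

# MTTF_min of 1.00e16 cycles at a 2.1 ns cycle time
print(round(cycles_to_months(1.00e16, 2.1), 1))
```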

#### 4.2 Evaluation Results

We designed three CPI circuits with the proposed methodology, where $N_{CPI}$ was set to 10, 20 and 30. Fig. 8 shows the relation between the number of target FFs for isolation $N_{CPI}$ and area overhead. As $N_{CPI}$ increases, the area overhead, which is the area increase from the initial circuit, becomes larger. When applying the proposed methodology, the area overhead exceeds 3.5% of the initial circuit area in case of $N_{CPI} > 30$. If the area overhead is restricted to 3% or 1.5%, $N_{CPI}$ must be less than or equal to 20 or 10, respectively.

Figure 8: Area overhead due to critical path isolation.

For each CPI circuit, we calculated MTTF. We gave eight supply voltages from 1.2V to 0.85V at 0.05V intervals, and fixed the cycle time to 2.1 ns. Fig. 9 shows the MTTF of CPI circuits generated with the proposed methodology. From Fig. 9, we can see that the CPI circuit with $N_{CPI} = 10$ achieved the same MTTF at 0.9V as that of the initial circuit at 1.2V, i.e. it satisfied $MTTF_{min}$. In other words, the supply voltage can be reduced from 1.2V to 0.9V, i.e., by 25.0%, without MTTF degradation. This 25% supply voltage reduction corresponds to 44% dynamic power reduction. The impact of critical path isolation on power reduction is significant.
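The 44% figure can be reproduced under the usual first-order model in which dynamic power scales with the square of the supply voltage, $P_{dyn} \propto C V_{dd}^2 f$. This model, and the assumption that switched capacitance $C$ and clock frequency $f$ are unchanged (the cycle time is fixed at 2.1 ns), are added here for illustration:

$$\frac{P_{dyn}(0.9\,\mathrm{V})}{P_{dyn}(1.2\,\mathrm{V})} = \left(\frac{0.9}{1.2}\right)^{2} = 0.5625, \qquad 1 - 0.5625 = 0.4375 \approx 44\%.$$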

Next, let us compare MTTFs of the pre-isolated and CPI circuits at 0.9V. With the proposed critical path isolation of  $N_{CPI} = 10$ , the MTTF is improved from  $1.38 \times 10^2$  cycles to  $1.00 \times 10^{17}$  cycles. The MTTF improvement ratio is  $7.24 \times 10^{14}$ .

Thus, the power reduction and MTTF improvement thanks to critical path isolation are remarkable while the area overhead is a few percent. The longer MTTF means fewer timing errors in the field, which is also desirable for resilient circuit designs, such as Razor [8] and TRC (Tunable Replica Circuit) [9], and error prediction techniques, e.g., TEP-FF (Timing Error Prediction Flip-Flop) [10]. With critical path isolation, the power dissipation of such resilient circuits could be reduced further and/or their reliability could be improved.

# 4.3 Comparison

Next, we compare the proposed methodology with the following four approaches.

- C1: Choose FFs using ILP so that the number of non-timing-critical cells is maximized.
- C2: Choose FFs for isolation in ascending order of slack time.
- C3: Choose FFs for isolation in descending order of activation probability.
- C4: Choose FFs for isolation in descending order of failure probability.

C1 assumes that maximizing the number of instances whose slack times are increased by isolation is highly related to MTTF extension. The main difference between the proposed approach and C1 is that C1 does not consider the failure probability of the selected FFs. C2 assumes that timing-critical FFs are most likely to cause timing errors. This approach needs only STA (or SSTA) timing reports and hence is easier to adopt. C3 places importance on activation probability. Even if some FFs are timing critical, timing errors never occur as long as they are not activated. To calculate the activation probability, we need to perform logic simulation with prospective workloads or to calculate signal transition rates mathematically (e.g. [11]). C4 is a hybrid that combines the second and third approaches: it uses the failure probability, which is defined as the joint probability of timing violation probability and activation probability.

In this paper, we calculated the timing violation probability by performing Monte Carlo SSTA (Statistical Static Timing Analysis) and derived the activation probability of each path by associating the signal transition time in logic simulation with the path delay in STA, as shown in Fig. 10.
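The three ranking-based baselines C2 to C4 differ only in the sort key, which the following sketch makes explicit. It is an illustration rather than the authors' implementation: the FF names and all numbers are hypothetical, and the failure probability is approximated as the product of violation and activation probabilities, which assumes the two events are independent.

```python
def rank_ffs(slack, viol_prob, act_prob, n_cpi):
    """Return the top-n_cpi FFs chosen by the C2, C3 and C4 orderings."""
    # Assumption: joint probability approximated as a product (independence).
    fail = {ff: viol_prob[ff] * act_prob[ff] for ff in slack}
    return {
        "C2": sorted(slack, key=slack.get)[:n_cpi],                     # ascending slack
        "C3": sorted(act_prob, key=act_prob.get, reverse=True)[:n_cpi], # descending activation
        "C4": sorted(fail, key=fail.get, reverse=True)[:n_cpi],         # descending failure
    }

# Hypothetical per-FF data for illustration only.
slack = {"FFa": 0.05, "FFb": 0.10, "FFc": 0.30}        # timing slack [ns]
viol_prob = {"FFa": 0.20, "FFb": 0.10, "FFc": 0.01}    # timing violation probability
act_prob = {"FFa": 0.001, "FFb": 0.50, "FFc": 0.90}    # activation probability

picks = rank_ffs(slack, viol_prob, act_prob, n_cpi=1)
print(picks)
```

With these placeholder values the three criteria pick three different FFs: the most timing-critical FF is rarely activated, and the most active FF rarely violates timing.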

Fig. 11 shows the MTTF comparison between the CPI circuits of the proposed methodology and C1, C2, C3 and C4 approaches, where  $N_{CPI} = 10$ . From this figure, we can see that the proposed methodology attained the best trade-off relation between MTTF and supply voltage and achieved the largest supply voltage reduction without MTTF degradation. The supply voltage reduction of the proposed methodology was 25.0%, whereas those of C1, C2, C3 and C4 were 8.3%, 0%, 8.3% and 12.5%, respectively.

Fig. 12 summarizes the supply voltage reduction achieved with the proposed methodology and the comparative approaches C1 to C4. $Area_{max}$ was set to 101.5%, i.e., $N_{CPI} = 10$, 103%, i.e., $N_{CPI} = 20$, and 103.5%, i.e., $N_{CPI} = 30$. Fig. 12 shows that the proposed methodology achieved the largest supply voltage reduction in every case ($N_{CPI} = 10$, 20 and 30). It should be noted that the C4 approach achieved the same largest supply voltage reduction in case of $N_{CPI} = 30$.



Figure 9: MTTF of CPI circuits (Proposed).

# 4.4 Discussion

The above results showed that the CPI circuits designed with the proposed methodology attained the largest supply voltage reduction. Let us investigate the reason. Fig. 13 shows the path delay distributions of the proposed, C4 and pre-isolated circuits. This figure also shows whether each path contributes to timing errors, similarly to Fig. 2. Fig. 13 indicates that the proposed methodology significantly reduces the number of paths actually contributing to timing errors, from 144 paths to 13 paths. Fig. 14 compares the failure probabilities of individual FFs. We can see that the proposed CPI circuit reduces the failure probability the most, i.e., it achieved the lowest failure probability and consequently the highest MTTF. Furthermore, the proposed methodology reduces the number of FFs contributing to timing errors the most: there are 12 such FFs in the proposed CPI circuit while there are 95 in the pre-isolated circuit. This reduction not only contributes to MTTF extension but also facilitates sensor-based adaptive circuits, such as TRC [9] and TEP-FF [10] based designs.

Next, we discuss the importance of activation probability information by comparing C2 and C4. From Fig. 12, we can see that C4 has a higher capability of supply voltage reduction than C2, e.g., 25.0% versus 0% in case of $N_{CPI} = 30$. This difference comes from the fact that paths whose timing slacks are small are not necessarily activated frequently. Fig. 15 shows such a tendency. The X-axis is the path timing slack and the Y-axis is the path activation probability. Each dot corresponds to a path in the pre-isolated circuit. We can see that the dots are scattered, which means the path timing slack and the path activation probability are not strongly correlated. Therefore, by exploiting the path activation probability, C4 achieved 25% supply voltage reduction. The information on path activation is thus worth obtaining.

Figure 10: Failure probability calculation.

Figure 12: Achieved supply voltage reduction. $Area_{max}$ was set to 101.5% ( $N_{CPI} = 10$ ), 103% ( $N_{CPI} = 20$ ) and 103.5% ( $N_{CPI} = 30$ ) of the initial circuit area.

Figure 15: Relation between path slack and activation probability. Each dot corresponds to a path in the pre-isolated circuit.

## 5. CONCLUSION

This paper proposed a design methodology that exploits critical path isolation for chip time to failure (TTF) extension and/or lower voltage operation. The proposed methodology selects a set of FFs for FF-based isolation using ILP so that it maximally reduces the sum of gate-wise failure probabilities. We evaluated the MTTF (Mean Time To Failure) of circuits with/without critical path isolation and examined how much supply voltage can be reduced without MTTF degradation. Evaluation results show that the circuits whose critical paths are isolated by the proposed methodology achieved 25% supply voltage reduction with 1.4% area overhead. At the same operating voltage, MTTF was improved by 14 orders of magnitude.

# ACKNOWLEDGEMENT

This work is supported by STARC and ICOM Foundation, Japan.

## 6. **REFERENCES**

- [1] B. Zang and M. Orshansky, "Modeling of NBTI-induced PMOS degradation under arbitrary dynamic temperature variation," *Proc. ISQED*, pp.774–779, 2008.
- [2] T. Wang and Q. Xu, "On the simulation of NBTI-Induced performance degradation considering arbitrary temperature and voltage variations," *Proc. DAC*, pp.1–6, 2014.
- [3] W. Wang, S. Yang, S. Bhardwaj, R. Vattikonda, S. Vrudhula, F. Liu, and Y. Cao, "The Impact of NBTI on the Performance of Combinational and Sequential Circuits," *Proc. DAC*, pp.364–369, 2007.
- [4] S. Ghosh, S. Bhunia, and K. Roy, "CRISTA: A New Paradigm for Low-Power, Variation-Tolerant, and Adaptive Circuit Synthesis Using Critical Path Isolation," *IEEE Trans. CAD*, vol.26, no.11, pp.1947–1956, Nov. 2007.
- [5] S. Iizuka, M. Mizuno, D. Kuroda, M. Hashimoto, and T. Onoye, "Stochastic error rate estimation for adaptive speed control with field delay testing," *Proc. ICCAD*, pp.107–114, 2013.
- [6] S. Iizuka, Y. Masuda, M. Hashimoto, and T. Onoye, "Stochastic Timing Error Rate Estimation under Process and Temporal Variations," *Proc. ITC*, 2015.
- [7] M.R. Guthaus, J.S. Ringenberg, D. Ernst, T.M. Austin, T. Mudge, and R.B. Brown, "MiBench: A free, commercially representative embedded benchmark suite," *Proc. Workload Characterization*, pp.3–14, 2001.
- [8] S. Das, D. Roberts, L. Seokwoo, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, "A self-tuning DVS processor using delay-error detection and correction," *IEEE Journal Solid-State Circuits*, vol.41, pp.792–804, 2006.
- [9] K. A. Bowman, J. W. Tschanz, S. L. Lu, P. A. Aseron, M. M. Khellah, A. Raychowdhury, B. M. Geuskens, C. Tokunaga, C. B. Wilkerson, T. Karnik, and K. D. Vivek, "A 45nm Resilient Microprocessor Core for Dynamic Variation Tolerance," *IEEE Journal Solid-State Circuits*, vol. 46, no. 1, 2011.
- [10] H. Fuketa, M. Hashimoto, Y. Mitsuyama, and T. Onoye, "Adaptive Performance Compensation With In-Situ Timing Error Predictive Sensors for Subthreshold Circuits," *IEEE Trans. VLSI*, vol. 20, no. 2, pp. 333-343, 2012.
- [11] F. N. Najm, "Transition density: a new measure of activity in digital circuits," *IEEE Trans. CAD*, vol. 12, no. 2, pp. 310–323, Feb 1993.
- [12] B. J. Velamala, K. B. Sutaria, H. Shimizu, H. Awano, T. Sato, G. Wirth, and Y. Cao, "Compact Modeling of Statistical BTI Under Trapping/Detrapping," *IEEE Trans. ED*, vol.60, no.11, pp.3645–3654, 2013.
- [13] H. Awano, M. Hiromoto, and T. Sato, "Variability in device degradations: Statistical observation of NBTI for 3996 transistors," *Proc. ESSDERC*, pp.218–221, 2014.

# APPENDIX

This appendix details the setup of MTTF evaluation in terms of variations.

# A. SUPPLY VOLTAGE NOISE AND MANUFACTURING VARIABILITY

The supply voltage is assumed to fluctuate between -50 mV and 50 mV in 10 mV steps, i.e., over eleven levels. Here, a stochastic fluctuation model expressed as the Markov chain in Fig. 16 is used, following [6]. The transition rate of the supply voltage fluctuation was set to 0.001 (p = 0.001).
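As a rough illustration of such a noise model, the following sketch simulates a discrete-time Markov chain over the eleven voltage levels. The chain structure is an assumption made for this example only (nearest-neighbor moves, each taken with probability p per cycle, and no move past the end levels); the actual structure is the one given in Fig. 16 and [6]. The names `LEVELS_MV`, `step` and `simulate` are hypothetical.

```python
import random

# Eleven noise levels: -50, -40, ..., 50 mV (from the text).
LEVELS_MV = list(range(-50, 51, 10))
P = 0.001  # transition rate p from the text


def step(idx, rng, p=P):
    """Advance the chain by one clock cycle.

    Assumption (not from the paper): the level moves up by one step with
    probability p, down by one step with probability p, and otherwise stays.
    Moves that would leave the range are dropped, so the level stays put.
    """
    r = rng.random()
    if r < p:
        return idx + 1 if idx + 1 < len(LEVELS_MV) else idx
    if r < 2 * p:
        return idx - 1 if idx > 0 else idx
    return idx


def simulate(cycles, seed=0):
    """Return the per-cycle supply noise trace in mV, starting from 0 mV."""
    rng = random.Random(seed)
    idx = LEVELS_MV.index(0)
    trace = []
    for _ in range(cycles):
        idx = step(idx, rng)
        trace.append(LEVELS_MV[idx])
    return trace


if __name__ == "__main__":
    trace = simulate(1_000_000)
    print("min/max noise [mV]:", min(trace), max(trace))
```

With p = 0.001 the level changes on average once every few hundred cycles, so the trace is piecewise constant over long runs of cycles.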

The manufacturing variability is assumed to consist of intra-die random variation and inter-die variation. The intra-die random variation includes NMOS and PMOS threshold voltage variations of $\sigma = 30$ mV and gate length variations of $\sigma = 2$ nm. As for the inter-die variations, $\sigma = 30$ mV and $\sigma = 1$ nm are given to the threshold voltage variations and the gate length variation, respectively.

### **B.** NBTI

The NBTI aging effect is assumed to be expressed as the Markov process depicted in Fig. 17. Threshold voltage degradation is expressed as state transitions. Here, we explain how we derived the model parameters from the measurement data reported in [13]. The derivation was carried out in the following three steps: (1) define discrete degradation states, (2) fit the measurement data to an NBTI model expression, and (3) calculate transition rates between degradation states.

The first step is the state definition, and we defined seven degradation states of 0mV, 0.5mV, 1mV, 5mV, 10mV, 15mV and 20mV. Next, we analyze the measured data in [13]. The threshold voltage degradation of 666 transistors under a particular stress voltage at some discrete stress times is available. On the other hand, to construct the model we need to know the threshold voltage degradation under various supply voltages at any stress time. For this purpose, we selected the Trapping/Detrapping model [12] below for fitting, which is one of the state-of-the-art physical models representing the NBTI aging effect.

$$\Delta V_{th}(t) = X e^{V_g} + Y e^{V_g} \log(1 + Zt), \tag{2}$$

where $\Delta V_{th}(t)$ represents threshold voltage degradation, $V_g$ is stress voltage, and $t$ is total stress time. $X$, $Y$ and $Z$ are fitting parameters. For each of the 666 transistors in [13], we obtained a set of fitting parameters and calculated the elapsed time to reach each degradation state, i.e., each threshold voltage degradation, at each operating voltage from 1.2 V to 0.85 V.

Figure 16: Stochastic model of dynamic supply noise.

Figure 17: Stochastic NBTI model used in evaluation.
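The elapsed time to reach a given degradation level follows from inverting Eq. (2) for $t$. The sketch below shows one way this could be computed. It assumes that $\log$ in Eq. (2) is the natural logarithm, and the parameter values used in the example are purely illustrative placeholders, not fitted values from [13]. The function names are hypothetical.

```python
import math


def delta_vth(t, vg, X, Y, Z):
    """Eq. (2): threshold voltage degradation after total stress time t.

    Assumption: log is the natural logarithm.
    """
    return X * math.exp(vg) + Y * math.exp(vg) * math.log(1.0 + Z * t)


def time_to_reach(dvth, vg, X, Y, Z):
    """Invert Eq. (2): stress time at which the degradation equals dvth.

    Solving dvth = X e^Vg + Y e^Vg log(1 + Z t) for t gives
    t = (exp((dvth - X e^Vg) / (Y e^Vg)) - 1) / Z.
    Returns 0.0 if the level is already reached at t = 0.
    """
    scale = math.exp(vg)
    arg = (dvth - X * scale) / (Y * scale)
    if arg <= 0.0:
        return 0.0
    return math.expm1(arg) / Z


if __name__ == "__main__":
    # Illustrative placeholder parameters (volts, seconds); NOT fitted values.
    X, Y, Z = 1e-4, 5e-4, 1e-3
    states_v = [0.0005, 0.001, 0.005, 0.010, 0.015, 0.020]
    for vg in (1.2, 0.85):
        times = [time_to_reach(s, vg, X, Y, Z) for s in states_v]
        print(vg, ["%.3g" % t for t in times])
```

For positive $X$, $Y$ and $Z$, the computed time to reach a given degradation level grows as the stress voltage drops.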

Lastly, we calculated the transition rates between degradation states. Let $t_i$, $p_i$ ($i = 1, \dots, 7$) and $T$ denote the time interval between degradation states $i$ and $i + 1$, the transition rate from degradation state $i$ to $i + 1$, and the cycle time, respectively. In degradation state $i$, the transition rate to state $i + 1$ is equal to the probability that the total staying time in state $i$ exceeds $t_i$ after a cycle of duration $T$ elapses. If an event of duration $T$ occurs $\frac{t_i}{T}$ times, the total staying time exceeds $t_i$, and hence we define the relation between $t_i$ and $p_i$ by the following equation.

$$p_i = \frac{T}{t_i} \tag{3}$$

In this work, we calculated transition rate according to Eq.(3) with  $T = 2.1 \times 10^{-9}$  s. This transition rate calculation was performed at each supply voltage. Using these voltage-dependent models, we can consider the effect that the aging process proceeds faster at higher supply voltage in the MTTF evaluation.
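Eq. (3) can be applied directly once the intervals $t_i$ are known. The following minimal sketch uses hypothetical interval values chosen only for illustration; `transition_rates` is a name introduced here, not one from the paper.

```python
T_CYCLE = 2.1e-9  # cycle time T in seconds (from the text)


def transition_rates(intervals_s, cycle_time=T_CYCLE):
    """Eq. (3): per-cycle transition rate p_i = T / t_i for each interval t_i."""
    rates = []
    for t in intervals_s:
        if t <= 0.0:
            raise ValueError("interval t_i must be positive")
        rates.append(cycle_time / t)
    return rates


if __name__ == "__main__":
    # Hypothetical intervals between consecutive degradation states, in seconds.
    intervals = [2.1, 21.0, 2.1e3]
    for t_i, p_i in zip(intervals, transition_rates(intervals)):
        # With a per-cycle transition probability p_i, the mean number of
        # cycles spent in the state is 1 / p_i, so the mean dwell time is
        # T / p_i, which equals t_i.
        print(f"t_i = {t_i:g} s, p_i = {p_i:.3g}, mean dwell = {T_CYCLE / p_i:g} s")
```

Because a per-cycle transition probability $p_i$ gives a mean stay of $1/p_i$ cycles, this choice makes the mean dwell time in state $i$ equal to $t_i$.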