# Stochastic Verification of Run-time Performance Adaptation with Field Delay Testing

Masanori Hashimoto

Dept. Information Systems Engineering, Osaka University, Japan

Abstract—Run-time performance adaptation with field delay testing is a promising approach for minimizing design margin while sustaining the necessary operational margin in the field. However, it has not been adopted in industrial designs because of serious concern about timing error occurrence. To put run-time performance adaptation into practical use, we need to verify and optimize the run-time adaptation system at design time. For this purpose, we have developed a stochastic framework for error rate estimation that models adaptive speed control as a continuous-time Markov process. In this paper, we evaluate the MTTF and power consumption of an embedded processor whose performance is adaptively controlled with online testing and offline testing. This evaluation quantitatively shows the power reduction and MTTF improvement achieved by run-time performance adaptation.

#### I. INTRODUCTION

Circuit delay fluctuation due to PVT (process, voltage, and temperature) variation is becoming more and more significant. In addition, unexpected timing errors can occur due to aging effects such as NBTI (negative bias temperature instability). To avoid timing errors, circuits are usually designed with guardbanding. However, a large timing margin makes timing closure difficult and increases area and power dissipation. Moreover, the supply voltage must be set higher than the necessary and sufficient voltage, which results in wasteful power dissipation.

To overcome this problem, adaptive speed control systems have been studied in which each chip self-adjusts its operating condition, such as supply voltage and body bias, in combination with a timing self-test [1], [2]. Generally, adaptive speed control is performed so that no paths have timing violations. On the other hand, voltage over-scaling, which accepts rare timing errors in pursuit of aggressive power reduction, has also been studied [3], [4]. Path activation probability heavily depends on the program running on a processor, and in some cases, significant power reduction can be achieved by exploiting this property.

To adopt run-time adaptive speed control, each circuit needs to regularly perform an online or offline test to check whether the current circuit performance satisfies the speed specification. Even so, run-time adaptive speed control cannot completely eliminate timing errors caused by unexpected delay increases, and this applies not only to voltage over-scaling but also to ordinary voltage scaling. In addition, biased circuit operation might mislead speed control in the case of online test, and the limited number of test patterns for offline test could miss timing errors. Meanwhile, the occurrence frequency of timing errors can be changed by modifying design parameters, and a long MTTF (mean time to failure), such as a year, is expected to be attainable via parameter optimization. However, timing error occurrence is very difficult to evaluate at design time, since simulation is too slow for rare errors, such as one error per month. To enable such design parameter optimization, we have developed a stochastic error rate estimation method [6]. It reduces the necessary computation time by twelve orders of magnitude, which makes it practical for guiding design optimization of run-time adaptive systems.

This paper investigates and compares tradeoffs between MTTF and power dissipation for different test strategies using the proposed stochastic error rate estimation method. In addition, we show a case study that exemplifies the power reduction achieved by voltage over-scaling.

## II. RUN-TIME SPEED ADAPTATION

## A. Online Test Based Adaptation using TEP-FF

Figure 1 shows a circuit that adaptively controls the speed and power dissipation using a warning signal generated by a timing-error predictive (TEP) FF [5]. The TEP-FF consists of a normal flip-flop, a delay buffer and a comparator (XOR gate). Because of the delay buffer, when the timing margin gradually decreases, a timing error occurs at the TEP-FF before the main FF captures a wrong value, which tells us that the timing margin of the main FF is no longer large enough. A warning signal is generated to predict timing errors, and it is monitored during a specified period. Note that timing errors are predicted, not detected, which is a distinct difference from Razor [3]. Once a warning signal is observed, the circuit is sped up; in other words, the circuit delay is reduced by voltage scaling and/or body biasing. The clock frequency is assumed to be fixed throughout this paper. If no warning signals are observed during the monitoring period, the circuit is slowed down for power reduction. This proactive speed control copes with the variation of the timing margin, which differs chip by chip and varies depending on operating conditions and aging.

Even when the TEP-FF is well configured to generate the warning signal, the error occurrence cannot be reduced to zero. This is because when critical paths are not activated for a long time in the circuit operation, the circuit might be slowed down excessively. If a critical path is activated in this condition, a timing error happens.

To reduce the error occurrence, we can tune the following design parameters: the number of TEP-FFs, the locations where TEP-FFs are inserted, the delay time of the delay buffer in each TEP-FF, the monitoring period, and the fineness of the speed control [5], [7].
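The warning-driven control described above can be sketched as a small state machine. This is only an illustrative model, not the controller of [5]: the class name, the one-level step size, and the convention that a higher index means a faster level are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TepController:
    """Toy model of warning-driven speed control (all names are hypothetical)."""
    num_levels: int = 10           # number of selectable speed levels
    monitoring_period: int = 1000  # warning-free cycles required before slowing down
    level: int = 9                 # current level; higher index = faster (assumption)
    quiet_cycles: int = 0          # cycles since the last warning or level change

    def step(self, warning: bool) -> int:
        """Advance one clock cycle and return the speed level to use next."""
        if warning:
            # A warning predicts a shrinking timing margin: speed up by one level.
            self.level = min(self.level + 1, self.num_levels - 1)
            self.quiet_cycles = 0
        else:
            self.quiet_cycles += 1
            if self.quiet_cycles >= self.monitoring_period:
                # No warning for a whole monitoring period: slow down to save power.
                self.level = max(self.level - 1, 0)
                self.quiet_cycles = 0
        return self.level


ctrl = TepController(monitoring_period=3, level=5)
trace = [ctrl.step(w) for w in [False, False, False, True, False, False, False]]
print(trace)  # [5, 5, 4, 5, 5, 5, 4]
```

In this toy model, the monitoring period and the step size per level change correspond to two of the tunable parameters listed above; a longer monitoring period makes the controller slower to scale down.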



Fig. 1. Run-time adaptive speed control with TEP-FF.



Fig. 2. Run-time adaptive speed control with offline test.

#### B. Offline Test Based Adaptation

We next explain an adaptive speed control system that repeatedly performs delay tests in idle times of the circuit (Fig. 2). While the circuit is idle, test patterns that were prepared beforehand and stored in an internal or external memory are loaded, and the circuit is checked for timing-violating paths. The patterns can be for scan test or SBST (software-based self-test). When a timing-violating path is detected, the minimum speed level that includes no timing-violating paths is selected for subsequent operation. Otherwise, the speed level is decremented. The scan test has higher freedom in applicable test patterns, and hence more accurate error detection, in other words a lower missing rate of timing-violating paths, can be expected.

Here, there are two strategies for test execution. One strategy forces the circuit to be idle at a fixed time interval, which guarantees the time interval between delay tests. This strategy helps make the timing error rate predictable in addition to mitigating the error rate. A drawback is the performance degradation due to the test, and in some real-time systems, this strategy could be difficult to adopt. The other strategy is to perform offline tests only in true idle time. While no performance degradation arises, the test interval is less predictable and consequently the error rate tends to be higher. In this paper, we assume the first strategy, in which delay tests are run at a fixed time interval.

In this offline test based adaptation, the test interval is a key parameter that determines the rate of timing error occurrence. If it is set to be long, the delay fluctuation between successive delay tests is likely to be large enough to cause timing violations. To mitigate timing errors, the test interval should be short. On the other hand, frequent tests induce performance overhead and degrade the system throughput. To reduce the error occurrence while coping with the overhead, we need to carefully tune the test interval and the number of test patterns.
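The level-selection rule of the offline test can be illustrated with a short sketch. It is one possible reading of the rule, not the implementation evaluated in this paper: the `passes_test` hook, the level indexing (0 is slowest), and the fallback to the fastest level are assumptions introduced for the example.

```python
from typing import Callable


def next_speed_level(level: int, num_levels: int,
                     passes_test: Callable[[int], bool]) -> int:
    """Return the speed level to use after one offline delay test.

    Levels are indexed 0 (slowest) to num_levels - 1 (fastest).
    passes_test(l) is a hypothetical hook that applies the stored test
    patterns at level l and returns True if no timing-violating path is found.
    """
    if passes_test(level):
        # No violation detected: decrement the speed level to save power.
        return max(level - 1, 0)
    # Violation detected: pick the slowest higher level that passes the test.
    for candidate in range(level + 1, num_levels):
        if passes_test(candidate):
            return candidate
    # Nothing passes: fall back to the fastest level (assumption).
    return num_levels - 1


def chip_needs_level_4(level: int) -> bool:
    """Hypothetical chip whose paths meet timing only at level 4 or above."""
    return level >= 4


level = 6
history = []
for _ in range(4):  # four test intervals
    level = next_speed_level(level, 10, chip_needs_level_4)
    history.append(level)
print(history)  # [5, 4, 3, 4]
```

The example shows why the test interval matters: after the decrement to level 3, the circuit runs with a timing-violating path until the next test restores level 4, so a shorter interval shortens this exposure.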



Fig. 3. Overview of Stochastic Error Rate Estimation.

### **III. STOCHASTIC VERIFICATION**

As discussed in the previous section, timing errors cannot be completely eliminated in circuits with adaptive speed control. Researchers working on any type of adaptive speed control claim that, by tuning some design parameters, the possibility of timing error occurrence can be reduced to almost zero and an MTTF of years can be easily attained with some overhead. For example, delay tests can be carried out more frequently, or earlier error prediction can be enforced. However, it is challenging to quantitatively estimate such a long MTTF and such an extremely low probability of error occurrence. A naive simulation is totally impractical: one year of processor operation includes, for example, $3 \times 10^{16}$ cycles, so obtaining 10,000 samples requires simulating $3 \times 10^{20}$ cycles. With a logic simulator processing $3 \times 10^3$ cycles per second, this takes $3 \times 10^9$ years, and hence an approach other than naive simulation is indispensable.
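The simulation-cost estimate above can be reproduced with a few lines of arithmetic. The cycle counts and simulator speed are the values quoted in the text; the variable names are chosen only for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # about 3.15e7 s

cycles_per_year = 3e16  # one year of processor operation (value from the text)
samples = 10_000        # number of one-year runs wanted
sim_speed = 3e3         # logic simulator throughput in cycles per second

total_cycles = cycles_per_year * samples
sim_years = total_cycles / sim_speed / SECONDS_PER_YEAR

print(f"cycles to simulate: {total_cycles:.1e}")
print(f"simulation time:    {sim_years:.1e} years")
```

The result is roughly $3 \times 10^9$ years, in line with the figure quoted above.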

For this purpose, we have developed a stochastic estimation method for timing errors that replaces simulation [6]. The proposed method, which is illustrated in Fig. 3, models the adaptive speed control under dynamic delay variation as a continuous-time Markov process, and stochastically estimates MTTF. Given a matrix of transition rates between states, the MTTF can be calculated via matrix computations, and the calculation time is independent of how long the MTTF is and how rarely timing errors happen, which is an excellent property for evaluating long-MTTF circuit operation. To construct the transition rate matrix, we developed a similarity database and a method that directly derives the matrix from the database. Thanks to this development, the proposed method computes MTTF $10^{12}$ times faster than a logic simulator in a test case.
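To illustrate why the cost of the matrix computation does not depend on the length of the MTTF, the sketch below applies the textbook result for absorbing continuous-time Markov chains: with $Q_{TT}$ the transition rate matrix restricted to the non-error states, the vector of mean times to the error state solves $-Q_{TT}\,\tau = \mathbf{1}$. This is a generic calculation under assumed toy rates, not the method of [6]; the similarity database and the matrix construction are not modeled, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mttf(rates, error_rates):
    """Mean time to the error state, starting from each operating state.

    rates[i][j]    : transition rate from operating state i to j (i != j)
    error_rates[i] : transition rate from state i to the absorbing error state
    """
    n = len(rates)
    # Build -Q_TT, the negated generator restricted to operating states.
    neg_q = [[-rates[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        neg_q[i][i] = sum(rates[i][j] for j in range(n) if j != i) + error_rates[i]
    return solve(neg_q, [1.0] * n)


# Toy chain: state 0 is a safe level, state 1 a risky level (rates are made up).
rates = [[0.0, 1.0],
         [100.0, 0.0]]
error_rates = [0.0, 0.001]
times = mttf(rates, error_rates)
print(times)  # approximately [101001.0, 101000.0]
```

Making the error rate a million times smaller changes only one matrix entry, so the solve takes the same time even though the resulting MTTF grows by the same factor.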

### IV. EXPERIMENTAL RESULTS

This section shows how MTTF and power consumption depend on design parameters, and gives a comparison between online and offline test.

# A. Setup

We used a MIPS R3000 microprocessor, which had a 5-stage pipeline and a 32-bit RISC instruction set, as a target of adaptive speed control. We synthesized an RTL description into a gate-level netlist with a commercial logic synthesizer and an industrial 65nm standard cell library. The number of standard cells is 6,813. The maximum clock frequency at 1.2V and 25°C is 147MHz, which corresponds to a critical path delay of 6.8ns. Ten speed levels, i.e. ten supply voltages (1.2V, 1.1V, 1.0V, 0.90V, 0.85V, 0.80V, 0.75V, 0.70V, 0.65V and 0.60V), could be selected in the adaptive speed control.

Fig. 4. Relation between MTTF and time constant of delay fluctuation.

Offline test needs test patterns, and here we explain the pattern preparation for scan test and SBST. The target fault model was the path delay fault. The paths under test were selected in the following three steps. We first selected the 10 longest paths for each end point (i.e. FF) with a commercial timing analyzer. We next merged them and sorted them in terms of timing slack. Finally, we selected the 20,000 most timing-critical paths in total.
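The three selection steps can be sketched as follows. The `Path` record and its field names are hypothetical stand-ins for what a timing analyzer would report; only the per-endpoint limit of 10 and the total of 20,000 come from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Path(NamedTuple):
    endpoint: str  # name of the capturing FF (hypothetical field)
    delay: float   # path delay in ns
    slack: float   # timing slack in ns (smaller = more critical)


def select_paths(paths, per_endpoint=10, total=20_000):
    # Step 1: keep the longest paths for each end point.
    by_endpoint = defaultdict(list)
    for p in paths:
        by_endpoint[p.endpoint].append(p)
    kept = []
    for group in by_endpoint.values():
        group.sort(key=lambda p: p.delay, reverse=True)
        kept.extend(group[:per_endpoint])
    # Step 2: merge all end points and sort by timing slack, most critical first.
    kept.sort(key=lambda p: p.slack)
    # Step 3: keep the most timing-critical paths overall.
    return kept[:total]


clock = 6.8  # ns, the critical path delay quoted in the setup
demo = [Path(ff, d, clock - d)
        for ff, delays in {"ff0": [6.0, 5.0, 4.0], "ff1": [6.5, 3.0]}.items()
        for d in delays]
selected = select_paths(demo, per_endpoint=2, total=3)
print([p.delay for p in selected])  # [6.5, 6.0, 5.0]
```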

A commercial ATPG tool was used for scan test pattern generation. All the 1,777 FFs were replaced with scan-FFs, and they composed a single scan path. LoC (Launch on Capture) was assumed.

As for SBST, we used a genetic algorithm to generate an instruction sequence. We used the test coverage of the generated instruction sequence as the fitness metric, where the test coverage was computed by fault simulation with a commercial tool. We adopted one-point crossover, where two sequences were split at a random place and the split sequences were exchanged. To introduce new instructions that were not included in the current population, we generated a random sequence at the rate of 20% as a mutation. The population size was 30, the number of solutions generated at each generation was 60, and the number of generations was 300.
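A minimal sketch of such a genetic algorithm is given below. The population size, offspring count, number of generations, mutation rate, and one-point crossover follow the text; everything else is an assumption for illustration, including the toy instruction set, the fixed sequence length, the elitist survivor selection, and in particular `toy_coverage`, which merely stands in for the fault-simulated test coverage used in the experiment.

```python
import random

POPULATION = 30      # individuals kept per generation (from the text)
OFFSPRING = 60       # solutions generated per generation (from the text)
GENERATIONS = 300    # number of generations (from the text)
MUTATION_RATE = 0.2  # rate at which a fresh random sequence is generated

INSTRUCTIONS = ["add", "sub", "and", "or", "lw", "sw", "beq", "sll"]  # toy subset
SEQ_LEN = 16         # assumed fixed sequence length


def random_sequence(rng):
    return [rng.choice(INSTRUCTIONS) for _ in range(SEQ_LEN)]


def one_point_crossover(a, b, rng):
    """Split both parents at one random place and exchange the tails."""
    cut = rng.randrange(1, SEQ_LEN)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def toy_coverage(seq):
    """Placeholder fitness: fraction of distinct adjacent instruction pairs.
    The experiment uses fault-simulated test coverage instead."""
    return len(set(zip(seq, seq[1:]))) / (SEQ_LEN - 1)


def evolve(fitness, seed=0, generations=GENERATIONS):
    rng = random.Random(seed)
    population = [random_sequence(rng) for _ in range(POPULATION)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < OFFSPRING:
            if rng.random() < MUTATION_RATE:
                # Mutation: inject a fresh random sequence.
                offspring.append(random_sequence(rng))
            else:
                a, b = rng.sample(population, 2)
                offspring.extend(one_point_crossover(a, b, rng))
        # Assumed selection: keep the fittest among parents and offspring.
        population = sorted(population + offspring[:OFFSPRING],
                            key=fitness, reverse=True)[:POPULATION]
    return population[0]


best = evolve(toy_coverage, generations=50)
print(best, toy_coverage(best))
```

Replacing `toy_coverage` with a call into a fault simulator would be the costly part of each generation, since 60 new sequences must be evaluated per generation.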

## B. Dependence of MTTF on parameters

Given the stochastic error estimation method, we can estimate MTTF on the fly for exploring the design parameter space. Here, dependences of MTTF on design parameters are exemplified.

We first varied the time constant of temporal delay fluctuation and evaluated the MTTF of adaptive speed control with TEP-FF. A detailed explanation of the delay fluctuation model and the time constant can be found in [6]. Figure 4 shows the relation between MTTF and the time constant. We can see that MTTF becomes longer for a larger time constant of delay fluctuation. This means that less frequent delay fluctuation attains a longer MTTF, which is consistent with our intuitive understanding.

We next varied the delay value of TEP-FF from 100ps to 4ns and evaluated MTTF. From here on, the time constant of delay fluctuation is fixed at 0.01 s. Figure 5 shows the result. When the delay value was set to 300 ps or less, MTTF suddenly became shorter. In this case, timing error prediction was not working well. On the other hand, even when the delay value was set to more than 3 ns, MTTF was not improved. This suggests that under the assumed delay fluctuation, an MTTF of $10 \times 10^9$ cycles cannot be achieved by adjusting only the delay value of TEP-FF.

Fig. 5. Relation between MTTF and delay value of TEP-FF.

Fig. 6. Relation between MTTF and monitoring time.

Fig. 7. Relation between MTTF and scan test interval.

Figure 6 shows the relation between MTTF and the monitoring time before voltage downscaling. Recall that the circuit is slowed down for power reduction when no warning signals are observed during the monitoring period. The monitoring time was varied from 100k to 100M cycles. As the monitoring time becomes longer, MTTF is extended.

We next evaluated the MTTF of adaptive speed control with offline scan test. Figure 7 shows the MTTF when the time interval of scan test was varied from 1k to 100k cycles. We can clearly see that MTTF was improved as offline scan test was performed more frequently, and this tendency was more significant beyond 10k cycles. This is reasonable since the delay fluctuation that occurs within the test interval tends to be small and timing errors become less likely to occur.

Fig. 8. MTTF and power dissipation of three adaptive speed controls.

MTTF for adaptive speed control with SBST was also evaluated. The achieved MTTF was about $4 \times 10^8$ cycles and it was not dependent on the test interval of SBST. This means that MTTF could not be improved even when SBST was performed more frequently, which indicates that the quality of the SBST test sequence was insufficient.

## C. Comparison of adaptive speed controls with online and offline test

We compare MTTF and power consumption of the following three adaptive speed controls.

- Adaptive speed control based on online test with TEP-FF
- Adaptive speed control based on offline scan test
- Adaptive speed control based on offline SBST

This comparison was performed based on the evaluation results in the previous subsection, more specifically Figs. 5 and 7.

Figure 8 shows the relation between MTTF and average power consumption. The figure also includes the MTTF and power consumption when the supply voltage was fixed to 1.2V or 0.6V. The MTTF at 1.2V is the maximum achievable MTTF in this experiment. Compared to the 1.2V case, the adaptive speed control with scan test reduced power dissipation by 64% while achieving a similar MTTF, so the adaptive speed control contributed to significant power reduction. On the other hand, compared to the 0.6V case, the adaptive speed control improved MTTF by 6X while keeping the power consumption at the same level. In this experiment, relatively frequent large delay fluctuation, which could cause timing errors even at 1.2V, was assumed, and hence MTTF was limited to 10G cycles. Future work includes further comparisons under various delay fluctuation models corresponding to more realistic environmental fluctuation and aging effects. Besides, the SBST results showed almost no spread, which was due to the poor quality of the test instruction sequence. A more sophisticated generation method for SBST would be necessary.

Finally, we carefully examine the difference between online test with TEP-FF and offline scan test. Figure 9 is a magnified version of Fig. 8. A fundamental difference between the two adaptive speed controls is that the former, online test with TEP-FF, exploits unbalanced path activation probabilities and performs voltage over-scaling, while the latter, speed control with offline scan test, does not allow timing violation in any paths. This effect of voltage over-scaling is found as a 19% power reduction in Fig. 9 in this example.

Fig. 9. Comparison of MTTF and power dissipation for clarifying voltage over-scaling effect. This figure is a magnified version of Fig. 8.

## V. CONCLUSION

This paper presented a case study that investigated and compared the MTTF and power dissipation of adaptive speed controls with different test approaches: online test with TEP-FF, offline scan test and offline SBST. Experimental results show that adaptive speed control based on online test with TEP-FF achieved six times longer MTTF without an increase in power dissipation compared to operation at the minimum supply voltage. Adaptive speed control with offline scan test reduced power dissipation by 64% without MTTF degradation compared to operation at the maximum supply voltage. Voltage over-scaling with TEP-FF attained up to 19% power reduction compared to ordinary voltage scaling with scan test. Future work includes more comprehensive evaluation in a wider design space with other design targets.

### **ACKNOWLEDGEMENTS**

This work was partly supported by NEDO and STARC.

# REFERENCES

- [1] M. Agarwal, B. C. Paul, Z. Ming, and S. Mitra, "Circuit Failure Prediction and Its Application to Transistor Aging," in *Proc. VTS*, pp.277–286, 2007.
- [2] Y. Li, S. Makar, and S. Mitra, "CASP: Concurrent Autonomous Chip Self-Test Using Stored Test Patterns," in *Proc. DATE*, pp.885–890, 2008.
- [3] S. Das, et al., "A self-tuning DVS processor using delay-error detection and correction," *IEEE JSSC*, vol.41, pp.792–804, Apr. 2006.
- [4] D. Blaauw, et al., "Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance," in *ISSCC Dig.*, pp.400–401, 2008.
- [5] H. Fuketa, M. Hashimoto, Y. Mitsuyama, and T. Onoye, "Adaptive Performance Compensation with In-Situ Timing Error Predictive Sensors for Subthreshold Circuits," *IEEE TVLSI*, vol. 20, no. 2, pp. 333–343, Feb. 2012.
- [6] S. Iizuka, M. Mizuno, D. Kuroda, M. Hashimoto and T. Onoye, "Stochastic Error Rate Estimation for Adaptive Speed Control with Field Delay Testing," *ICCAD*, 2013.
- [7] H. Fuketa, M. Hashimoto, Y. Mitsuyama, and T. Onoye, "Trade-Off Analysis between Timing Error Rate and Power Dissipation for Adaptive Speed Control with Timing Error Prediction," *IEICE Trans. Fundamentals*, vol. E92-A, no. 12, pp. 3094–3102, Dec. 2009.