# Clock Skew Reduction by Self-Compensating Manufacturing Variability with On-chip Sensors

Shinya Abe<sup>†‡</sup> Ken-ichi Shinkai<sup>†</sup> Masanori Hashimoto<sup>†‡</sup> Takao Onoye<sup>†‡</sup> <sup>†</sup>Dept. Information Systems Engineering, Osaka University <sup>‡</sup>JST, CREST e-mail: {shinkai.kenichi, hasimoto}@ist.osaka-u.ac.jp

## ABSTRACT

This paper presents a scheme that self-compensates manufacturing variability to reduce clock skew. In the proposed scheme, a clock distribution network (CDN) with embedded variability sensors tunes variable clock drivers to cancel the clock skew induced by manufacturing variability. We apply the proposed scheme to a mesh-style CDN in a 65nm technology and evaluate the deskewing effect as a function of the sensor performance. Experimental results show that the skew can be reduced by over 70%, and that the correlation coefficient between estimated and actual variabilities, which represents the sensor performance, must exceed 0.3 for the skew to be reduced.

## **Categories and Subject Descriptors**

B.7.1 [Integrated Circuits]: Types and Design Styles

## **General Terms**

Design, Performance

## **Keywords**

Clock distribution, Manufacturing variability, Self-compensation, On-chip sensors

## 1. INTRODUCTION

The influence of manufacturing variability on circuit performance has been increasing because of finer manufacturing processes and lowered supply voltages. Smaller clock skew is desirable not only for shortening the clock cycle but also for reducing the delay elements inserted to satisfy hold time constraints. The obstacles to satisfying the skew constraints are: (1) manufacturing variability and environmental fluctuation, such as power supply noise and temperature variation, and (2) design imperfection, such as differences in wire length between the clock source and flip-flops, and non-uniform flip-flop placement. To solve the second problem, CAD research has been intensively carried out [1].

*GLSVLSI'10*, May 16–18, 2010, Providence, Rhode Island, USA. Copyright 2010 ACM 978-1-4503-0012-4/10/06 ...\$5.00.

On the other hand, the first problem of manufacturing variability is becoming more prominent as the technology scales. Thus, the conventional worst-case design with guard-banding is becoming less and less efficient, since a large design margin is necessary and performance improvement might not be attained, especially for high-end designs, even though a finer process is used for implementation. To overcome this problem, two approaches have been studied [2].

- 1. A clock distribution topology which is robust to manufacturing variability, such as mesh structure, is adopted.
- 2. A clock distribution network (CDN) is tuned after fabrication.

Mesh-type CDNs are often adopted in high-end designs to reduce clock skew [3, 4]. Clock nodes are shorted by the mesh wires, so clock arrival times are averaged out, which contributes to reducing clock skew [5].

On the other hand, in current high-end microprocessor designs, variable delay elements are embedded in CDN and post-fabrication tuning is performed. Some designs detect the phase difference between trunk clock signals and adjust the clock timing. Recently, as another approach, post-fabrication clock deskewing methods using clock delay adjustment with at-speed test have been studied [6, 7]. However, these methods require prohibitively large tuning cost, because they adjust clock delay by repeating tests to check if timing constraints are met.

In this paper, we focus on mesh-style clock distribution (Fig. 1), which is believed to be effective for reducing clock skew, and we propose a clock deskewing method based on self-compensation of manufacturing variability, which requires less adjustment cost. Within-die variability mainly consists of spatially-correlated variability and random variability (Fig. 2). Our previous evaluation [8] pointed out that spatially-correlated variability has more impact on clock skew than random variability, because random variability is well canceled out by the shorted mesh wires. In the proposed scheme, the influence of spatially-correlated variability, which embedded variability sensors detect, is compensated by clock buffer adjustment. We evaluate and compare clock skews with and without the proposed method by Monte Carlo simulation, and reveal that clock skew can be reduced by self-compensating manufacturing variability. We also discuss the required performance of on-chip variability sensors.

Figure 1: Mesh-style CDN.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The rest of this paper is organized as follows. Section 2 introduces the proposed clock deskewing scheme. Section 3 describes how to tune variable clock drivers in the proposed scheme. Experimental evaluation is carried out in Section 4, and the discussion is concluded in Section 5.

## 2. PROPOSED SCHEME FOR CLOCK SKEW REDUCTION

#### 2.1 Overview

Figure 3 illustrates the overview of the proposed scheme. A CDN including variable clock drivers and on-chip variability sensors is implemented. Here, the on-chip variability sensors aim to detect spatially-correlated variability, since the mesh clock distribution structure is inherently robust to random variability, whereas it still suffers from spatially-correlated variability [8]. To further reduce the clock skew, the proposed scheme aims at sensing and compensating the spatially-correlated variability. The variable clock drivers are tuned on the basis of the detected variability information. The tuning is done by a small circuit or a simple processor.

The most significant feature of the proposed scheme is that the tuning is performed automatically without any tests using external equipment. In contrast, conventional approaches [6, 7] involve many trial-and-error at-speed tests to adjust clock driver performance.

#### 2.2 Requirement of variability sensors

To realize the proposed scheme for skew reduction, variability sensors that satisfy the following requirements are necessary.

- 1. detect within-die spatially-correlated variability
- 2. separate PMOS and NMOS variability
- 3. be a simple and small circuit



Figure 2: Within-die variability.





The proposed scheme focuses on the compensation of spatially-correlated variability. Die-to-die variability changes the performance of all clock drivers uniformly, and hence the clock skew does not change significantly. Random variability affects the clock skew; however, it is inherently well suppressed by the mesh structure [8]. Thus, variability sensors that can detect within-die spatially-correlated variability are necessary. To obtain the information on the spatially-correlated variability, we suppose that sensors that detect both within-die spatially-correlated and die-to-die variability are placed. We take the difference of the sensing results to obtain the within-die spatially-correlated variability.
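The differencing step can be sketched as follows. This is only a minimal illustration, assuming that the die-to-die component is approximated by the mean of all sensor readings on one die; the paper does not specify the exact operation, and the function name `within_die_component` and the readings are hypothetical.

```python
from statistics import mean


def within_die_component(readings):
    """Subtract the die-wide mean from each sensor reading.

    readings: per-sensor estimates of one parameter (e.g. delta V_th in mV),
    each containing die-to-die plus within-die spatially-correlated variation.
    Assumption (not from the paper): the die-to-die shift is approximated by
    the mean over all sensors on the die.
    """
    die_shift = mean(readings)
    return [r - die_shift for r in readings]


# Hypothetical readings from four sensors on one die (mV).
readings = [42.0, 38.0, 30.0, 34.0]
residuals = within_die_component(readings)
print(residuals)  # [6.0, 2.0, -6.0, -2.0]
```

The residuals sum to zero by construction, so a uniform shift of every reading leaves them unchanged.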

The second requirement comes from a clock design strategy that keeps a balance between rise and fall transition times. To compensate the unbalanced rise and fall transition times, we need to know the PMOS and NMOS variability information separately.

The third requirement aims to reduce the area overhead and design difficulty needed to integrate the self-compensation scheme.

#### **2.3** Problem discussed in this work

The accuracy of the sensing results depends on several factors. The most significant factor that may degrade the sensing results is random variability. The random variability is often eliminated by using many sensors or increasing the number of components in a sensor. Therefore, there is a trade-off between area and the degree to which random variability is eliminated. In addition, the sensors usually estimate only a few variational parameters, while other parameters also fluctuate, which deteriorates the sensor accuracy. When ring oscillator-based sensors are used and the oscillating frequency is measured by a counter, the accuracy and the length of the running time affect the sensing results.

In this work, we investigate the sensor accuracy needed for skew reduction in the proposed scheme. We focus on the correlation coefficient between actual and estimated variabilities and regard it as a metric of sensor performance. In the following, we execute a case study in an industrial 65nm process to evaluate how much skew reduction is obtainable as a function of the sensor performance, and reveal the required sensor performance to reduce clock skew. We assume that sensors give four variational parameters, namely PMOS/NMOS threshold voltage  $\Delta V_{th\_p}/\Delta V_{th\_n}$  and gate length  $\Delta L_p/\Delta L_n$ . The implementation of the data processing for sensor data and of the variable-driver adjustment is left as future work and is not discussed further in this paper.

## 3. ADJUSTMENT OF VARIABLE CLOCK DRIVERS

This section discusses tuning of variable clock drivers. We first describe how to adjust clock driver performance on the basis of the sensing results. Next, we experimentally show the appropriateness of the driver adjustment.

#### **3.1** Tuning of variable clock drivers

In the proposed scheme, the performance of clock drivers is adjusted using the variability information given by the variability sensors. The proposed scheme aims to minimize the clock skew that is caused by manufacturing variability, which means that the clock skew without within-die variation at the typical process corner is ideally obtainable in every chip. To remove the performance variability of each clock driver and obtain the typical performance, body biasing is assumed to be used for performance adjustment in this work. Another approach might be, for example, to adjust the number of working transistors/buffers in parallel.

We compensate NMOS variability  $\Delta V_{th\_n}$  and  $\Delta L_n$  by adjusting the P-well voltage  $\Delta V_{pw}$ , that is, the body voltage of NMOS, and PMOS variability  $\Delta V_{th\_p}$  and  $\Delta L_p$  by adjusting the N-well voltage  $\Delta V_{nw}$ . We here define the performance of a clock driver as its 50% propagation delay. We characterize the sensitivity of the driver performance with respect to  $V_{pw}$  and  $V_{nw}$  in addition to  $V_{th\_n}$ ,  $V_{th\_p}$ ,  $L_n$  and  $L_p$  in advance. The clock driver adjustment is supposed to be performed in the following procedure.

- 1. Sensing results of variability sensors are collected, and within-die spatially-correlated variability is estimated.
- 2. For each variable clock driver, the body voltages  $\Delta V_{pw}$  and  $\Delta V_{nw}$  necessary to compensate  $\Delta V_{th\_n}$ ,  $\Delta V_{th\_p}$ ,  $\Delta L_n$  and  $\Delta L_p$  are computed and given.

To cancel the variability, the following relations must be satisfied with a first-order approximation.

$$\frac{\partial Delay}{\partial V_{th\_n}} \Delta V_{th\_n} + \frac{\partial Delay}{\partial L_n} \Delta L_n + \frac{\partial Delay}{\partial V_{pw}} \Delta V_{pw} = 0, \quad (1)$$

$$\frac{\partial Delay}{\partial V_{th\_p}} \Delta V_{th\_p} + \frac{\partial Delay}{\partial L_p} \Delta L_p + \frac{\partial Delay}{\partial V_{nw}} \Delta V_{nw} = 0. \quad (2)$$

Thus, at the second step,  $\Delta V_{pw}$  and  $\Delta V_{nw}$  are computed by

$$\Delta V_{pw} = -\left(\frac{\partial Delay}{\partial V_{th\_n}}\Delta V_{th\_n} + \frac{\partial Delay}{\partial L_n}\Delta L_n\right) / \frac{\partial Delay}{\partial V_{pw}}, \quad (3)$$

$$\Delta V_{nw} = -\left(\frac{\partial Delay}{\partial V_{th\_p}}\Delta V_{th\_p} + \frac{\partial Delay}{\partial L_p}\Delta L_p\right) / \frac{\partial Delay}{\partial V_{nw}}. \quad (4)$$
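As an illustration of Eqs. (3) and (4), the following sketch computes the body-voltage shift for one driver from pre-characterized first-order sensitivities. The class `Sensitivities`, the function `body_bias_shift`, and all numeric values are hypothetical placeholders chosen for the example, not characterization data from this work.

```python
from dataclasses import dataclass


@dataclass
class Sensitivities:
    """First-order delay sensitivities, characterized in advance."""
    d_vth: float    # dDelay/dV_th   [ps/mV]
    d_len: float    # dDelay/dL      [ps/nm]
    d_vbody: float  # dDelay/dV_body [ps/mV] (V_pw for NMOS, V_nw for PMOS)


def body_bias_shift(s, delta_vth, delta_len):
    """Eq. (3)/(4): body-voltage shift that cancels the first-order delay change."""
    return -(s.d_vth * delta_vth + s.d_len * delta_len) / s.d_vbody


# Hypothetical NMOS-side sensitivities: a higher V_th or a longer L slows the
# driver, and a higher P-well voltage (forward bias) speeds it up.
nmos = Sensitivities(d_vth=0.05, d_len=2.0, d_vbody=-0.01)

# Sensed variability for one driver: +20 mV threshold shift, +0.5 nm length shift.
delta_vpw = body_bias_shift(nmos, delta_vth=20.0, delta_len=0.5)
print(f"delta V_pw = {delta_vpw:.1f} mV")  # delta V_pw = 200.0 mV
```

Substituting the result back into the left-hand side of Eq. (1) gives a first-order delay change of zero, which is the cancellation condition.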

#### 3.2 Case study

We here describe the constraint of body voltage range and examine the appropriateness of the first-order approximation. Then, we experimentally evaluate how much performance variation of clock drivers can be reduced.

#### *3.2.1 Body voltage range*

The usable body voltage is limited in practice. As a higher forward body bias is applied, the performance of clock drivers improves; beyond a certain voltage, however, the performance degrades [9]. In addition, when such a large forward body bias is applied, the PN junction is forward-biased and a large current flows [9, 10]. Figures 4 and 5 show the oscillation frequency and power dissipation of a 9-stage ring oscillator consisting of minimum-sized inverters. Here, the same amount of body bias is given to PMOSs and NMOSs. Figure 4 shows that forward body biasing up to 0.7V increases the oscillation frequency. However, above 0.7V, the frequency drops suddenly. Besides, Fig. 5 indicates that the leakage current dominates the power dissipation above 0.5V.

On the other hand, reverse body biasing decreases the frequency monotonically. However, the maximum reverse body voltage is also limited, since large reverse body biasing increases junction tunneling leakage current of P-well and N-well [9, 10].

Thus, the available range of body biasing is limited. In the following evaluation, body biasing from -1.2V to 0.4V is assumed in accordance with the above discussion.

#### 3.2.2 Driver performance versus variability and body voltage

In Eqs. (3) and (4), first-order sensitivity is assumed to compute  $\Delta V_{pw}$  and  $\Delta V_{nw}$ . The appropriateness of this assumption is experimentally validated.

Figure 6 shows the frequency variations of a 9-stage ring oscillator when  $V_{th\_n}$ ,  $L_n$  and  $V_{pw}$  are varied. The oscillation frequency varies almost linearly with the  $V_{th\_n}$  and  $L_n$  variations and with  $V_{pw}$ . A similar tendency is observed for  $V_{th\_p}$ ,  $L_p$  and  $V_{nw}$ . Therefore, the linear approximation used in the derivation of Eqs. (3) and (4) is concluded to be reasonable.

Figure 4: Relation between oscillation frequency and body voltage.

Figure 5: Relation between power dissipation and body voltage.

Figure 6: Oscillation frequency versus  $V_{th}$  and  $L$  variation and body voltage.

#### 3.2.3 Performance adjustment result

We here evaluate, using Monte Carlo simulation, how effectively Eqs. (3) and (4) compensate the variability. We adjusted the body voltages of a 9-stage ring oscillator. In this evaluation, only spatially-correlated variability is considered, and the magnitudes are  $\sigma_{\Delta V_{th\_n}} = \sigma_{\Delta V_{th\_p}} = 35 \text{ mV}$  and  $\sigma_{\Delta L_n} = \sigma_{\Delta L_p} = 1 \text{ nm}$ . Figures 7 and 8 show the distributions of the oscillation frequency before and after the performance compensation. The number of trials is 100. The performance variation becomes much smaller thanks to the compensation. Table 1 lists the average performance difference from the typical performance, and shows that the compensation by body biasing reduces the variation of driver performance by 70% (from 6.40% to 1.87%). Performance degradation due to body biasing happened three times in 100 trials. In two of these cases, the computed  $\Delta V_{nw}$  was larger than the maximum value of 0.4V, so the PMOS variability was not compensated enough. Meanwhile, the opposite variability happened to NMOS and was sufficiently compensated. Before compensation the two variations partly canceled each other; after compensation only the residual PMOS variability remained, so the performance difference increased. In the remaining case, the performance before compensation was very close to the typical performance, and the body biasing slightly increased the performance difference from the typical performance, by just 0.1%.
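The saturation mechanism described above can be illustrated with a purely linear delay model. The sketch below is not the transistor-level simulation used in this work and does not reproduce Table 1: the sensitivities `S_VTH`, `S_LEN` and `S_VB` are hypothetical, and a single parameter pair is used for brevity. Only the bias range of -1.2V to 0.4V and the sigma values are taken from the text.

```python
import random

# Hypothetical first-order sensitivities (percent delay change per unit).
S_VTH = 0.10    # % per mV of threshold-voltage shift
S_LEN = 3.0     # % per nm of gate-length shift
S_VB = -0.01    # % per mV of forward body bias
VB_MIN, VB_MAX = -1200.0, 400.0  # usable body-bias range [mV], from the text


def compensate(d_vth, d_len):
    """Return (delay shift before, delay shift after, saturated?) in percent."""
    before = S_VTH * d_vth + S_LEN * d_len
    ideal = -before / S_VB                     # Eq. (3)-style bias request
    applied = min(max(ideal, VB_MIN), VB_MAX)  # clamp to the usable range
    after = before + S_VB * applied
    return before, after, applied != ideal


def run(trials=100, seed=0):
    """Average absolute delay shift before/after, and number of clamped trials."""
    rng = random.Random(seed)
    results = [compensate(rng.gauss(0.0, 35.0), rng.gauss(0.0, 1.0))
               for _ in range(trials)]
    mean_before = sum(abs(b) for b, _, _ in results) / trials
    mean_after = sum(abs(a) for _, a, _ in results) / trials
    saturated = sum(1 for _, _, s in results if s)
    return mean_before, mean_after, saturated


mean_before, mean_after, saturated = run()
print(f"mean |shift| before: {mean_before:.2f} %, after: {mean_after:.2f} %, "
      f"saturated trials: {saturated}")
```

In this linear model the residual is zero whenever the requested bias fits in the range, so any remaining shift comes from clamping alone. The residual reported in Table 1 also includes effects that the model omits, such as nonlinearity.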



Figure 7: Distribution of driver performance (before compensation).



Figure 8: Distribution of driver performance (after compensation).

Table 1: Average performance difference from the typical performance.

| Condition           | Average difference |
|---------------------|--------------------|
| Before compensation | 6.40 %             |
| After compensation  | 1.87 %             |

## 4. EXPERIMENTAL EVALUATION

This section evaluates how much skew reduction can be achieved by the proposed scheme in mesh CDN. We first explain the experimental setup, and then show experimental results.

#### 4.1 Experimental setup

We designed a mesh CDN in an industrial 65nm process for simulation-based evaluation. The clock distribution structure used for the experiment is the hybrid structure with an H-tree and a mesh shown in Fig. 1. The design parameters are listed in Table 2. We here assume a clock distribution within a 1mm square clock domain. The mesh pitch is 100 $\mu$m and the mesh is constructed with intermediate wires. The wiring material is copper, and we calculate the wire capacitance and resistance from process information given by the foundry. FF distributions of FPU [11] and MeP [12] depicted in Figs. 9 and 10 were obtained by a commercial EDA tool. The input capacitance of a FF is 1.15fF. Note that, in this evaluation, we did not execute sizing of each driver and wire segment for simplicity, and hence the clock skew remains even at the typical process corner without within-die variability.

We assumed random and within-die spatially-correlated variability of  $V_{th}$  and L in NMOS and PMOS. Variances of random and within-die spatially-correlated variability are assumed to be the same, and the total variances of  $V_{th}$  and L are 35mV and 1nm. We assumed that the spatially-correlated component has the correlation coefficient expressed as  $f(x) = e^{-2x}$ , where x [mm] is the distance of two devices [13]. The clock skew is evaluated by a transistor-level circuit simulator [14].
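One common way to draw samples with the correlation function $f(x) = e^{-2x}$ is Cholesky factorization of the correlation matrix, sketched below. The paper does not state how its samples were generated, so this is an assumed construction. The driver coordinates are hypothetical, and the split of the 35 mV total into two equal-variance components (each with sigma 35/√2 mV) is one reading of the setup above.

```python
import math
import random


def correlation(distance_mm):
    """Correlation coefficient between two devices, f(x) = exp(-2x)."""
    return math.exp(-2.0 * distance_mm)


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(acc) if i == j else acc / lower[j][j]
    return lower


def sample_correlated(points_mm, sigma, rng):
    """Draw one spatially-correlated sample (one value per point)."""
    corr = [[correlation(math.dist(p, q)) for q in points_mm] for p in points_mm]
    lower = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in points_mm]
    return [sigma * sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(len(points_mm))]


rng = random.Random(1)
# Hypothetical driver locations inside the 1 mm x 1 mm clock domain [mm].
drivers_mm = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
# Assumed reading: equal variances for the two components of a 35 mV total.
sigma_corr = 35.0 / math.sqrt(2.0)
sigma_rand = sigma_corr
spatial = sample_correlated(drivers_mm, sigma_corr, rng)
delta_vth = [s + rng.gauss(0.0, sigma_rand) for s in spatial]
print([round(v, 1) for v in delta_vth])
```

Nearby drivers receive similar spatial components, while the independent random term is added per device.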

The proposed scheme determines body voltages given to variable clock drivers according to sensed spatially-correlated variability. If the spatially-correlated variability is misestimated and much different from the actual one, the appropriate body biasing cannot be applied and the variability cannot be compensated. The efficiency of variability compensation depends on the performance of the variability sensors. To clarify the relation between sensor performance and the amount of skew reduction, we vary the sensor performance and evaluate the clock skew. Here, the sensor performance is represented as the correlation coefficient between actual variability and estimated variability.

Figure 9: FF distribution (FPU, #FF 667).

Figure 10: FF distribution (MeP, #FF 4411).
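To make the sensor-performance metric concrete, the sketch below synthesizes an estimate that has a prescribed correlation coefficient with the actual variability. The paper does not state how its estimates were constructed, so the mixing formula `rho * actual + sqrt(1 - rho^2) * noise` is an assumption, and the names `estimate_with_correlation` and `pearson` are illustrative.

```python
import math
import random


def estimate_with_correlation(actual, rho, sigma, rng):
    """Mix the actual values with independent noise of the same sigma.

    For zero-mean Gaussian data with standard deviation sigma, the result has
    standard deviation sigma and correlation coefficient rho with `actual`.
    """
    noise_scale = math.sqrt(1.0 - rho * rho)
    return [rho * a + noise_scale * rng.gauss(0.0, sigma) for a in actual]


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
sigma = 35.0  # mV, as in the setup above
actual = [rng.gauss(0.0, sigma) for _ in range(20000)]
for rho in (0.0, 0.3, 1.0):
    estimate = estimate_with_correlation(actual, rho, sigma, rng)
    print(rho, round(pearson(actual, estimate), 2))
```

With a correlation coefficient of 1 the estimate equals the actual value, and with 0 it is independent noise of the same magnitude.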

#### 4.2 Results

We performed 100 Monte Carlo trials for FPU and MeP with and without the proposed scheme. Figure 11 shows the skew distribution of FPU without tuning the variable clock drivers. Figures 12 and 13 depict the skew distributions supposing that  $\Delta V_{th\_n}$ ,  $\Delta V_{th\_p}$ ,  $\Delta L_n$ , and  $\Delta L_p$  are estimated with correlation coefficients of 1 and 0, respectively.

Similarly, the skew distributions of MeP are shown in Figs. 14–16. The average  $\mu$  and standard deviation  $\sigma$  of the clock skew for FPU and MeP are summarized in Table 3.

Looking at Figs. 11, 12, 14 and 15 and Table 3, we can see that the skew distribution is concentrated and the standard deviation of clock skew is reduced by over 70% when the correlation coefficient of the sensor performance is 1. The reduction of the standard deviation helps to decrease design margin and consequently reduce design cost.

Table 2: Design parameters.

| Parameter                  | Value       |
|----------------------------|-------------|
| Process                    | 65 nm       |
| Clock distribution area    | 1 mm × 1 mm |
| Mesh pitch                 | 100 µm      |
| Metal layer                | M7          |
| Wire width in tree         | 0.4 µm      |
| Wire width in mesh         | 0.4 µm      |
| Wire width from mesh to FF | 0.2 µm      |

Table 3: Average  $\mu$  and standard deviation  $\sigma$  of clock skew.

| Circuit | Compensation | Corr. Coeff. | $\mu$ [ps] | $\sigma$ [ps] |
|---------|--------------|--------------|------------|---------------|
| FPU     | No           | -            | 16.09      | 1.76          |
|         | Yes          | 1            | 15.11      | 0.45          |
|         |              | 0            | 16.20      | 2.07          |
| MeP     | No           | -            | 14.08      | 2.57          |
|         | Yes          | 1            | 12.17      | 0.71          |
|         |              | 0            | 15.38      | 3.14          |
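The "over 70%" figure can be checked directly from the standard deviations in Table 3. The short calculation below uses only the table values; the helper name `reduction_percent` is illustrative.

```python
# Standard deviation of clock skew [ps] from Table 3:
# (no compensation, compensation with correlation coefficient 1).
sigma_ps = {"FPU": (1.76, 0.45), "MeP": (2.57, 0.71)}


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


for circuit, (before, after) in sigma_ps.items():
    print(f"{circuit}: {reduction_percent(before, after):.1f} % reduction in sigma")
# FPU: 74.4 % reduction in sigma
# MeP: 72.4 % reduction in sigma
```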





Figure 11: Skew distribution (w/o compensation, FPU).

Figure 12: Skew distribution (w/ compensation, corr. coeff. = 1, FPU).

Figure 13: Skew distribution (w/ compensation, corr. coeff. = 0, FPU).

Figure 14: Skew distribution (w/o compensation, MeP).

Figure 15: Skew distribution (w/ compensation, corr. coeff. = 1, MeP).



In contrast, when the correlation coefficient of the sensor performance is 0, the skew distribution is shifted to the right (Figs. 11, 13, 14 and 16), and the average and standard deviation increase (Table 3). In this case, the body voltages are computed according to the estimated variability which is totally uncorrelated with the actual one. This means that the variability is amplified and hence the skew is deteriorated. Therefore, to reduce clock skew, we need variability sensors whose correlation coefficient of the performance is over a certain value.

We next evaluate the relation between the average/standard deviation of clock skew and the correlation coefficient of the sensors. Figures 17 and 18 show the results. Here, it is assumed that the spatially-correlated variabilities of  $\Delta V_{th\_n}$ ,  $\Delta V_{th\_p}$ ,  $\Delta L_n$ , and  $\Delta L_p$  are estimated with the same correlation coefficient. Figures 17 and 18 indicate that the average and standard deviation of clock skew decrease as the correlation coefficient increases, as expected. When the correlation coefficient is larger than 0.3, the average and standard deviation of the clock skew become smaller than in the conventional non-compensation case (Table 3, Figs. 17 and 18). This result means that a variability sensor whose correlation coefficient is larger than 0.3 is necessary. We also see that improving the sensor performance directly helps to reduce clock skew in the proposed scheme.

#### 4.3 Discussion on Sensor Performance

We finally discuss the feasibility of variability sensors that are capable of skew reduction. We took up two sets of ring oscillators for estimating  $\Delta V_{th\_n}$ ,  $\Delta V_{th\_p}$ ,  $\Delta L_n$  and  $\Delta L_p$ .

- **Set A** standard inverters, inverters with 16x NMOS width and inverters with 16x PMOS width
- **Set B**  $V_{th\_n}$ -sensitive inverters (Fig. 5 in [15]),  $V_{th\_p}$ -sensitive inverters (complementary to Fig. 5 in [15]),  $L_n$ -sensitive inverters with pass-gated PMOS and NMOS loading,  $L_p$ -sensitive inverters with pass-gated PMOS and NMOS loading (similar loading structure is found in Fig. 5 in [15])

Using response surface method [16], we derived the following second-order polynomial expressions of four variation parameters. In this derivation, random variability is not considered.

$$\Delta V_{th\_n} = f(\mathbf{F}) \tag{5}$$

$$\Delta V_{th\_p} = g(\mathbf{F}) \tag{6}$$

$$\Delta L_n = h(\mathbf{F}) \tag{7}$$

$$\Delta L_p = i(\mathbf{F}) \tag{8}$$
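A second-order response surface such as Eq. (5) can be obtained by least-squares fitting of a quadratic polynomial to characterization data. The sketch below is a minimal stand-in using two inputs instead of the full vector $\mathbf{F}$; the data are synthetic, the coefficients in `true_coeffs` are hypothetical, and the helper names are not from the paper or from [16].

```python
def quadratic_features(f1, f2):
    """Terms of a full second-order polynomial in two variables."""
    return [1.0, f1, f2, f1 * f1, f1 * f2, f2 * f2]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_response_surface(samples, targets):
    """Least-squares coefficients via the normal equations (A^T A) c = A^T y."""
    rows = [quadratic_features(f1, f2) for f1, f2 in samples]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    return solve(ata, aty)


def predict(coeffs, f1, f2):
    """Evaluate the fitted polynomial at (f1, f2)."""
    return sum(c * t for c, t in zip(coeffs, quadratic_features(f1, f2)))


# Synthetic characterization data: scaled frequency deviations of two
# oscillators on a 5 x 5 grid, with a hypothetical quadratic response.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
samples = [(f1, f2) for f1 in grid for f2 in grid]
true_coeffs = [2.0, -30.0, 5.0, 4.0, -1.5, 0.5]
targets = [predict(true_coeffs, f1, f2) for f1, f2 in samples]

fitted = fit_response_surface(samples, targets)
print([round(c, 6) for c in fitted])
```

Because the synthetic data are noise-free, the fit recovers the generating coefficients up to rounding. With simulated or measured frequencies, the residual of the fit would indicate how well a second-order model captures the sensor response.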



Figure 17: Relation between correlation coefficient of variability sensor and average/standard deviation of clock skew (FPU).



Figure 18: Relation between correlation coefficient of variability sensor and average/standard deviation of clock skew (MeP).



Figure 19: Accuracy of  $\Delta V_{th.n}$  estimate (101-stage, Set B).

Here, in the case of **Set A**, we measured the oscillation frequencies of the three ring oscillators at a supply voltage of 1.2V and, in addition, that of the standard inverters at 0.6V. $\mathbf{F}$ consists of these measured frequencies normalized by the nominal frequency. As for **Set B**, $\mathbf{F}$ consists of the normalized frequencies of the four oscillators at 1.2V.

For sensor performance evaluation, both random and spatially-correlated variabilities are considered. We assume that the standard deviations of the random and spatially-correlated variability of  $V_{th}$  are the same, 35mV. Similarly, the standard deviations for  $L$  are 1nm. We performed Monte Carlo simulation for Set A and Set B with different numbers of stages. The number of trials was 100. Table 4 lists the correlation coefficient between actual spatially-correlated variability and estimated variability. We can see that ring oscillators with more stages are desirable for variability sensors, because random variability becomes relatively small thanks to the well-known averaging effect. With 101-stage Set B, a correlation coefficient above 0.3 is obtained for three of the four variation parameters, so skew reduction by the proposed scheme is expected. However, a high correlation coefficient, such as >0.8, is difficult to obtain. Figure 19 shows the scatter plot of  $\Delta V_{th\_n}$  in the case of 101-stage Set B. The dots are not concentrated on the diagonal line. Further study for improving on-chip variability sensors is necessary.
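The averaging effect mentioned above can be illustrated with a simple model in which every stage delay carries an independent random deviation. Under that assumption the random part of the normalized oscillation period shrinks roughly as $1/\sqrt{N}$ for $N$ stages. The per-stage sigma of 5% and the function names below are hypothetical, and the spatially-correlated component, which is common to all stages and is not averaged out, is omitted.

```python
import math
import random
import statistics


def ring_period(stages, sigma_stage, rng):
    """Normalized period: average of per-stage delays scattered around 1.0."""
    return sum(1.0 + rng.gauss(0.0, sigma_stage) for _ in range(stages)) / stages


def period_sigma(stages, sigma_stage, trials, rng):
    """Standard deviation of the normalized period over many random instances."""
    periods = [ring_period(stages, sigma_stage, rng) for _ in range(trials)]
    return statistics.stdev(periods)


rng = random.Random(0)
sigma_stage = 0.05  # hypothetical 5 % random delay variation per stage
s11 = period_sigma(11, sigma_stage, 2000, rng)
s101 = period_sigma(101, sigma_stage, 2000, rng)
print(f"11-stage sigma:  {s11:.4f} (model {sigma_stage / math.sqrt(11):.4f})")
print(f"101-stage sigma: {s101:.4f} (model {sigma_stage / math.sqrt(101):.4f})")
print(f"ratio: {s11 / s101:.2f} (model {math.sqrt(101 / 11):.2f})")
```

In this model, going from 11 to 101 stages reduces the random contribution by a factor of about three, which is consistent with the trend that the 101-stage columns of Table 4 show higher correlation coefficients.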

## 5. CONCLUSION

This paper investigated the feasibility of the self-deskewing method using on-chip variability sensors in a mesh CDN. We considered the correlation coefficient between actual and estimated variabilities as the sensor performance, and evaluated the skew reduction effect as a function of the sensor performance. The experimental results show that the standard deviation of the skew could be reduced by over 70%, and that higher sensor performance improved the deskewing effect. A correlation coefficient above 0.3 was necessary to reduce the clock skew. We also examined the feasibility of the variability sensors, and pointed out that state-of-the-art sensors would be capable of skew reduction; however, further study is necessary to enhance the deskewing efficiency.

Table 4: Correlation coefficients of variability sensors.

| Parameter          | Set A, 11-stage | Set A, 101-stage | Set B, 11-stage | Set B, 101-stage |
|--------------------|-----------------|------------------|-----------------|------------------|
| $\Delta V_{th\_n}$ | 0.57            | 0.80             | 0.60            | 0.65             |
| $\Delta V_{th\_p}$ | 0.25            | 0.57             | 0.63            | 0.63             |
| $\Delta L_n$       | 0.06            | 0.18             | 0.03            | 0.23             |
| $\Delta L_p$       | 0.09            | 0.25             | 0.13            | 0.53             |

## 6. ACKNOWLEDGMENT

This work was partly supported by NEDO.

## 7. REFERENCES

- [1] N. Sherwani, "Algorithms for VLSI Physical Design Automation Third Edition," Kluwer Academic Publishers, 1999.
- [2] Qing K. Zhu, "High-Speed Clock Network Design," Kluwer Academic Publishers, Jan. 2003.
- [3] R. Heald, *et al.*, "Implementation of a 3rd-Generation SPARC V9 64b Microprocessor," in *Proc. ISSCC*, pp. 412–413, Feb. 2000.
- [4] Phillip J. Restle, Craig A. Carter, James P. Eckhardt, Byron L. Krauter, Bradley D. McCredie, Keith A. Jenkins, Alan J. Weger and Anthony V. Mule, "The Clock Distribution of the Power4 Microprocessor," in *Proc. ISSCC*, pp. 144–145, Feb. 2002.
- [5] M. Mori, H. Chen, B. Yao and C. Cheng, "A Multiple Level Network Approach for Clock Skew Minimization with Process Variations," in *Proc. ASP-DAC*, pp. 263–268, Jan. 2004.
- [6] E. Takahashi, Y. Kasai, M. Murakawa and T. Higuchi, "Post-Fabrication Clock-Timing Adjustment Using Genetic Algorithms," *IEEE JSSC*, Vol. 39, No. 4, pp. 643–650, Apr. 2004.
- [7] J.-L. Tsai, D. Baik, C. C.-P. Chen and K. K. Saluja, "A Yield Improvement Methodology Using Pre- and Post-Silicon Statistical Clock Scheduling," in *Proc. ICCAD*, pp. 611–618, Nov. 2004.
- [8] S. Abe, M. Hashimoto, and T. Onoye, "Clock Skew Evaluation Considering Manufacturing Variability in Mesh-Style Clock Distribution," in *Proc. ISQED*, pp. 520–525, 2008.
- [9] K. Hamamoto, H. Fuketa, M. Hashimoto, Y. Mitsuyama, and T. Onoye, "An Experimental Study on Body-Biasing Layout Style Focusing on Area Efficiency and Speed Controllability," *IEICE Trans. on Electronics*, Vol. E92-C, No. 2, pp. 281–285, Feb. 2009.
- [10] S. Narendra, J. Tschanz, J. Kao, S. Borkar, A. Chandrakasan and V. De, "Leakage in Nanometer CMOS Technologies," Springer, pp. 141–162, 2006.
- [11] OPENCORES.ORG, http://www.opencores.org/.
- [12] TOSHIBA Semiconductor Company, http://www.semicon.toshiba.co.jp/eng/product/micro/mep/index.html.
- [13] H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, "Challenge: Variability Characterization and Modeling for 65- to 90-nm Processes," in *Proc. CICC*, pp. 593–599, 2005.
- [14] Synopsys, Inc., http://www.synopsys.com/Tools/Verification/AMSVerification/CircuitSimulation/Pages/NanoSim.aspx.
- [15] B. Wan, J. Wang, G. Keskin and L. T. Pileggi, "Ring Oscillators for Single Process-Parameter Monitoring," in *Proc. Workshop on Test Structure Design for Variability Characterization*, 2008.
- [16] R. H. Myers and D. C. Montgomery, "Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd Edition," Wiley-Interscience, Feb. 2002.