# Statistical Timing Analysis Considering Spatially and Temporally Correlated Dynamic Power Supply Noise

Takashi Enami, Student Member, IEEE, Shinyu Ninomiya, Student Member, IEEE, and Masanori Hashimoto, Member, IEEE

Abstract—Power supply noise is having increasingly more influence on timing, even though noise-aware timing analysis has not yet been fully established, because of several difficulties such as its dependence on input vectors and dynamic behavior. This paper proposes static timing analysis that takes power supply noise into consideration where the dependence of noise on input vectors and spatial and temporal correlations are handled statistically. We construct a statistical model of power supply voltage that dynamically varies with spatial and temporal correlation, and represent it as a set of uncorrelated variables. We demonstrate that power-voltage variations are highly correlated and adopting principal component analysis as an orthogonalization technique can effectively reduce the number of variables. Experiments confirmed the validity of our model and the accuracy of timing analysis. We also discuss the accuracy and CPU time in association with the reduced number of variables.

*Index Terms*—Gaussianization, power supply noise, principal component analysis, statistical timing analysis.

## I. INTRODUCTION

MANUFACTURING variability in the nanometer-technology era has caused significant fluctuations in circuit performance, and variation-aware timing analysis has been intensively studied [2]–[4]. In addition, timing verification taking power/ground noise into consideration has been eagerly anticipated. Power supply noise is expected to become an increasingly serious problem for timing because of increasing current consumption and decreased power supply voltage. A severe obstacle to noise-aware timing analysis is the difficulty of identifying the worst-case noise for timing. Power supply noise depends on given input signals and internal register states, and it changes within a clock cycle, as well as cycle by cycle. As the circuit scale increases, combinations of input signals and register states increase exponentially, which makes it prohibitively expensive to find the actual worst-case noise.

Dynamic timing simulation with a power/ground network and input patterns can provide timing information with noise.

Manuscript received May 14, 2008; revised July 23, 2008 and October 7, 2008. Current version published March 18, 2009. This work was supported in part by the Semiconductor Technology Academic Research Center (STARC), by the New Energy and Industrial Technology Development Organization (NEDO), and by the University of Tokyo in collaboration with Synopsys Corporation. An earlier version of this work was presented in [1]. This paper was recommended by Associate Editor D. Z. Pan.

The authors are with the Department of Information Systems Engineering, Osaka University, Osaka 565-0871, Japan (e-mail: enami.takashi@ist.osaka-u.ac.jp; ninomiya.shinyu@ist.osaka-u.ac.jp; hasimoto@ist.osaka-u.ac.jp).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2009.2013990

However, dynamic timing analysis cannot cover all paths, and only verifies part of the path delays, which is a well-known drawback. Even if a test pattern that maximizes voltage drop is found, the vector does not necessarily correspond to the worst case for timing, because the circuit structure and the layout are also associated with the timing. Preparing effective test vectors for verifying noise-aware dynamic timing is computationally expensive, and the problem cannot be solved within practical time limits.

To consider what impact power/ground noise has on timing, static timing analysis (STA) is commonly undertaken assuming that a constant (dc) voltage drop, e.g., the maximum voltage variation, is applied to all gates. This approach is computationally efficient, but there is no systematic way of determining the voltage drop without being optimistic or too pessimistic. When the maximum voltage drop is applied to all gates, the estimated timing is too pessimistic, which causes a timing convergence problem and overdesign. To solve this problem, timing analysis that takes dynamic voltage variations into account has been proposed [5], [6], and some commercial tools are available. However, it is necessary to obtain or assume worst-case noise, which means the problem of the dependence of power supply noise on the test pattern remains unsolved.

Although finding the exact worst-case noise for timing is extremely difficult, designers have to quantitatively ensure that the designed circuit will operate at the target frequency before they fabricate it. Therefore, a systematic technique that can estimate a realistic, if not exact, worst-case timing is necessary. Path-based methods of estimating the maximum delay have been proposed [7]–[9]. However, as these have to be applied to many potential critical paths, the computational cost could be extremely high. Recently, Pant and Blaauw [10] proposed an approach to estimating the effect of power supply noise on timing by solving an optimization problem. The problem was formulated as a nonlinear delay-maximization problem under the given constraints of current consumption. However, the circuit size they reported [10] was limited, and its application to larger circuits was not clarified. Statistical treatment has been introduced into power supply noise-aware timing analysis [11]–[13] as another approach. Pant et al. [11] estimated the voltage variations by convolving statistically modeled current consumption with the impulse response of a power/ground network. Jiang and Chen [12] first derived the average and standard deviation of all blocks and the correlation coefficients between the blocks, and they then estimated the delay. Kim and Walker [13] focused on the spatial correlation of power supply noise and proposed using principal component



Fig. 1. Overview of proposed approach.

analysis (PCA) to model power supply noise. The path-delay distribution was then computed with uncorrelated variables. We are familiar with the argument that timing failure due to power supply noise must be verified deterministically, since a certain input pattern inevitably causes problems. However, precise verification in a vast input pattern and register state space is impossible, and we thus believe that a statistical approach would help designers to estimate timing performance both quantitatively and systematically.

This paper proposes STA that takes dynamic power supply noise into account. Fig. 1 shows the overall flow of the proposed method, which statistically models power/ground noise. Spatially and temporally correlated power supply noise is transformed to uncorrelated variables by using orthogonalization techniques, such as PCA and independent component analysis (ICA). We then carry out statistical STA (SSTA) using the derived statistical power/ground noise model. STA with a statistical model of power supply noise with PCA has been proposed by Kim and Walker [13]; however, as theirs was a preliminary work, several important issues, such as the non-Gaussian distribution shape of variables and dynamic voltage fluctuations within the clock cycle, remain unsolved or have not been addressed. Furthermore, the reduction in the number of variables that PCA allows, owing to the tight correlation between them, has not been aggressively exploited to reduce CPU cost in STA.

We experimentally demonstrated that PCA-based statistical modeling with various distribution transformation techniques (e.g., a Box-Cox transformation), if necessary, works well, even though the distribution of power supply noise is not exactly Gaussian. To take dynamic noise behavior within a clock cycle into consideration, we propose discretizing the clock cycle into several time slots, and assigning a random variable to each time slot to construct a statistical model of dynamic power supply noise. We focus on the observation that power supply noise is highly correlated not only spatially but also temporally, and model power supply noise with a small set of random variables, which helps to reduce the CPU time to verify timing. We also demonstrate that adaptive spatial discretization for variable assignment reduces the PCA cost significantly. One might think that SSTA with PCA is a well-known approach, but this similarity is a huge advantage in retaining compatibility



Fig. 2. Different waveforms for power supply noise in space and time.

with conventional SSTA. The proposed method can easily be integrated into SSTA tools for manufacturing variability, i.e., SSTA that easily includes both manufacturing variability and power supply noise in a unified approach.

This paper assumes that information on power supply noise needed for statistical modeling, including input vectors, has been given. It is not generally easy to estimate power supply noise. However, we think that sophisticated methods, such as impulse response and convolution with logic simulation results for estimating power and verifying functions [10] will give us the information. Although the efficient preparation of information, which includes mutual dependence between power supply noise and manufacturing variability, is another interesting research topic, it is beyond the scope of this paper.

This paper is organized as follows. Section II discusses difficulties with timing analysis taking power supply noise into consideration. We describe how we statistically modeled power/ground voltage variations in Section III. Section IV explains the SSTA procedure with the proposed noise model. We present the experimental results in Section V, and Section VI concludes with a discussion.

## II. DIFFICULTIES WITH NOISE-AWARE TIMING ANALYSIS AND PROPOSED APPROACH

One problem when analyzing timing taking power supply noise into account is that the maximum voltage drop does not necessarily cause the worst-case delay. The supply voltage changes spatially and temporally within a clock cycle as well as cycle by cycle. The observation of power supply noise only may not necessarily detect the timing failure due to power supply noise, because the timing depends on the position of critical paths as mentioned by Pant and Blaauw [10].

Fig. 2 shows an example where maximum voltage drop does not always cause the worst delay. The solid lines represent the power supply noise of cycle #(c) and the broken ones represent those of cycle #(d). Let us suppose there is a critical path in area A. In that case, the delay in cycle #(c) would be worse than that in cycle #(d). However, if a critical path were located in area B, it is unclear which cycle would be the worst case for timing in this chip. In area B, the noise of cycle #(c) delays gate switching at the beginning of the clock cycle, whereas it has less effect on switching in the latter half of the clock cycle. However, in cycle #(d), switching in the latter half is greatly slowed down. Thus, voltage fluctuations within a clock cycle can influence gate delay to a greater or lesser extent, depending on the switching timing, where the switching timing is basically determined by the structure of the circuit.

The noise waveform shape varies according to given input vectors. As previously mentioned, the space in input vectors



Fig. 3. Spatial correlation of power supply voltage.



Fig. 4. Spatial correlation of current consumption.

and internal logic states is extremely large and cannot be thoroughly examined. We thus statistically modeled power supply noise while preserving the spatial and temporal correlation, and applied it to SSTA. The proposed approach can solve the problem described above, i.e., the position of critical paths and the spatial and temporal difference in power supply noise can be considered simultaneously. We report experimental results in Section V that indicate maximum noise does not necessarily involve worst-case delay.

PCA, which is an orthogonalization method, has a tremendous advantage: highly correlated Gaussian variables are transformed into a small set of Gaussian variables with only a small sacrifice of accuracy. Here, we present an example where power supply noise is highly correlated in space. We evaluated the power supply noise of a floating point unit (FPU) circuit in a $1 \times 1 \text{ mm}^2$ area [14], and set $10 \times 10$ variables associated with a spatially divided $10 \times 10$ grid. A $10 \times 10$ grid is a reasonable discretization to capture the spatial behavior: with it, the average error of the average delay estimated by SSTA is within 0.5% in our experiments, as will be shown in Section V-C. Each variable represented the cycle-average supply voltage on the $V_{\rm DD}$ side at each grid. The evaluation conditions were the same as those used in the experiments described in Section V. Fig. 3 is a histogram of the correlation coefficients between variables. We can see that the variables are highly correlated, and 36.2% of the coefficients are above 0.9. We thus expected that a compact statistical model with a small number of variables could be derived. A small number of variables enabled us to carry out SSTA and Monte Carlo simulations efficiently. However, when we chose current consumption as a variable instead of supply voltage, the correlation between the variables was weaker than for the power supply voltage, as shown in Fig. 4, and hence the number of variables could not be efficiently reduced. Although current consumptions at adjacent nodes were not greatly correlated, the impedance of the power network strengthened the spatial correlation of power supply voltage. In other words, a current drawn at a node flows through wire segments of the power supply network, and hence the power supply noises at those wire segments are correlated because of the common current component. As the wire segments become more distant, the correlation becomes weaker, since the current waveform changes due to an intrinsic RC filter and the portion of the common current component decreases. Thus, power supply noise has a local spatial correlation, which is shown in Fig. 5. We extracted the correlation coefficients between adjacent variables from Fig. 3 and made the histogram of Fig. 5. In this case, 92.8% of the coefficients are above 0.9.

Fig. 5. Spatial correlation between adjacent power supply voltages.

Fig. 6. Temporal correlation of power supply voltage.

Fig. 7. Example of dynamic noise waveform.
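The kind of statistic quoted for the spatial-correlation histograms (the share of correlation coefficients above 0.9) can be sketched as follows. This is a minimal illustration on synthetic data, not the evaluation flow used in the paper; the function name `fraction_above` and the toy noise model (one shared component plus small local components) are assumptions made here.

```python
import numpy as np

def fraction_above(samples, threshold=0.9):
    """Share of variable pairs whose correlation coefficient exceeds threshold.

    samples: array of shape (num_cycles, num_variables), one column per
    grid variable and one row per clock cycle (hypothetical layout).
    """
    corr = np.corrcoef(samples, rowvar=False)
    upper = np.triu_indices(corr.shape[0], k=1)  # count each pair once
    return float(np.mean(corr[upper] > threshold))

# Synthetic stand-in for a 10 x 10 grid: a common drop shared by all grids
# plus a small independent local term (assumed values, for illustration only).
rng = np.random.default_rng(0)
common = rng.normal(size=(2000, 1))
local = rng.normal(size=(2000, 100))
voltage = 1.0 - 0.05 * common - 0.01 * local
print(fraction_above(voltage))
```

A dominant shared component plays the role that the power-network impedance plays in the text: it makes the voltage variables far more correlated than the underlying local terms.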

Then, we evaluated the temporal correlation and the correlation between the power and ground sides of power supply noise. Power supply noise was spatially divided into a $10 \times 10$ grid and temporally divided within a clock cycle into ten spans, and a variable was assigned to each division. Temporal correlation was obtained by collecting the correlation coefficients between variables that belong to the same area but different time spans. A histogram of the temporal correlation is shown in Fig. 6, which reveals that power supply noise has strong temporal correlation. As shown in Fig. 7, once a voltage drop arises, the power supply voltage cannot promptly regain the nominal voltage. This is because the parasitic and decoupling capacitances, which supply their charge to neighboring switching gates, must be recharged, and the associated *RC* time constant is usually comparable to the clock cycle. In addition, current consumption has a certain amount of temporal correlation. Therefore,



Fig. 8. Correlation between power and ground.

temporal correlation exists between temporally adjoining variables. Then, correlation between power and ground was evaluated by assembling correlation coefficients between power and ground variables which correspond to identical area and identical time span. In CMOS digital circuits, charge supplied from power network is necessarily discharged to ground. Therefore, when switching current at a power node is large and the supply voltage fluctuates heavily, the current at the corresponding ground tends to be large and the ground voltage also fluctuates significantly. Consequently, power and ground voltages are correlated. Fig. 8 is a histogram of the correlation between power and ground, and also reveals strong correlation. Thus, power supply noise modeling with PCA is expected to operate efficiently.

When using PCA, we have to pay attention to the distribution shape of the variables, because PCA assumes a Gaussian distribution. One problem with applying PCA to power-supply-noise modeling is the non-Gaussian noise distribution, which may cause unwanted modeling error. Solutions to this problem include Gaussianizing the variables, e.g., by the Box-Cox transformation [15]. This transformation improves the Gaussianity of the variable. We experimentally demonstrated that PCA-based modeling is reasonable from the standpoint of practical use, even though, strictly speaking, the distribution is not Gaussian. When the variables are quite far from having a Gaussian distribution, another orthogonalization technique, such as ICA, should be applied, which is similar to that used by Singh and Sapatnekar [4].

An advantage of the proposed method using variable orthogonalization is its compatibility with SSTA, which was developed for manufacturing variability [2]–[4]. As the derived statistical model of power supply noise is expressed similarly to manufacturing variability, importing the noise effect into SSTA is straightforward, even though handling within-cycle voltage variations requires modifications. We can therefore undertake SSTA covering both the process and voltage variations in a unified way. The proposed method presents the possibility of providing new sign-off criteria that take both manufacturing and voltage variations into account, even though several other matters remain for further study, which will be touched on in Section V-F.

## III. PROPOSED STATISTICAL MODELING OF POWER SUPPLY NOISE

This section explains the proposed modeling of power supply noise. After this, we will assume the distributions of power supply voltage are Gaussian or they can be transformed into



Fig. 9. Spatial discretization (divided into partitions indicated by broken lines).

Gaussian form by using variable transformation techniques. We thus used PCA as an orthogonalization method in the research discussed in this paper. We experimentally demonstrated that the non-Gaussianity of the distribution was not significant, which is discussed in Section V. Note that even when the distribution of power supply noise is far from being Gaussian, the basic concept underlying the proposed method works by using ICA instead of PCA similar to that used by Singh and Sapatnekar [4].

#### A. Spatial and Temporal Discretization

Power supply noise varies continuously in space and time, and strictly speaking, all cells have different noise waveforms. However, the points for observing power supply noise are limited because of cost, and the number of points is reduced by clustering cells. We first set up observation points by spatially discretizing a chip. We also temporally discretize power supply variations within a clock cycle. We then assign a random variable to each time span at each spatial grid. For each assigned variable, we treat voltage values in different clock cycles as different samples in statistical modeling. For example, noise information of 2000 clock cycles corresponds to 2000 samples for each random variable.

Spatial discretization is undertaken by partitioning a chip/block area into a 2-D grid and choosing a representative value for each divided partition. For example, the voltage at the center point (Fig. 9) or the average voltage in each partition would be a candidate as a representative value. The voltages of all nodes in the same partition are assumed to be identical.

Fig. 9 has an example of uniform discretization, which is widely used for modeling manufacturing variability. More sophisticated discretization is desirable for power supply noise, since power/ground voltage occasionally fluctuates locally. Fine discretization should be applied to heavily fluctuating areas, whereas coarse discretization is acceptable for stable areas. In timing analysis, the average, standard deviation, and correlation coefficient of supply/ground voltage are used, which means their modeling error caused by discretization should be kept small. After discretization, all voltage observation nodes in a single partition are regarded to be identical, i.e., the average and standard deviation of all observation nodes in the same partition become the same, and the correlation coefficient becomes one. Therefore, we should put neighboring observation nodes whose average and standard deviation are similar and whose correlation is strong into the same partition.



Fig. 10. Temporal discretization (dividing a clock cycle into time spans).



Fig. 11. Variable assignment for statistical modeling.

The following is a simple adaptive discretization method that we used in the experiments as an example.

- 1) We divide a chip/block area into partitions, each of which includes only a single observation node.
- 2) We assess whether two partitions can be regarded as having the same voltage fluctuation, i.e., whether the differences in average and standard deviation are smaller than predefined threshold values (AVGth, SDth) and the correlation coefficient is larger than CCth. When the two partitions can be regarded as equivalent, we merge them into a single partition.
- 3) This operation continues until all initial partitions have been evaluated.

The modeling error and the number of partitions are controlled by three parameters: AVGth, SDth, and CCth. The SSTA accuracy, the number of partitions, and PCA cost are experimentally demonstrated in Section V.
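The three steps above can be sketched as a greedy merge over a grid of observation nodes. The paper does not fix the order in which partitions are compared or how a merged partition is represented, so the choices below are assumptions: only horizontally and vertically adjacent nodes are compared, the comparison uses the statistics of the individual nodes, and merges are tracked with a union-find structure. The function name `adaptive_partitions` is hypothetical.

```python
import numpy as np

def adaptive_partitions(samples, avg_th, sd_th, cc_th):
    """Merge adjacent observation nodes with similar voltage statistics.

    samples: array (num_cycles, ny, nx), one voltage sample per cycle and node.
    Returns an (ny, nx) integer array of partition labels.
    """
    num_cycles, ny, nx = samples.shape
    flat = samples.reshape(num_cycles, ny * nx)
    mean = flat.mean(axis=0)
    sd = flat.std(axis=0)
    corr = np.corrcoef(flat, rowvar=False)

    parent = list(range(ny * nx))  # step 1: every node is its own partition

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def similar(a, b):  # step 2: the AVGth / SDth / CCth test
        return (abs(mean[a] - mean[b]) < avg_th
                and abs(sd[a] - sd[b]) < sd_th
                and corr[a, b] > cc_th)

    for y in range(ny):  # step 3: visit every initial partition once
        for x in range(nx):
            a = y * nx + x
            neighbors = []
            if x + 1 < nx:
                neighbors.append(a + 1)
            if y + 1 < ny:
                neighbors.append(a + nx)
            for b in neighbors:
                if similar(a, b):
                    parent[find(b)] = find(a)

    roots = [find(i) for i in range(ny * nx)]
    label_of = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return np.array([label_of[r] for r in roots]).reshape(ny, nx)
```

Because merges are transitive in this sketch, two nodes at opposite ends of a chain of pairwise-similar neighbors can end up in one partition; a stricter variant would compare against a representative of the whole partition.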

Another important difference in power supply noise from that in manufacturing variability is its dynamic behavior. Temporal continuity also needs to be removed. We partition a clock cycle into several time spans and compute a representative voltage (e.g., average, as shown in Fig. 10).

We then treat the value at every clock cycle as a different sample. Fig. 11 shows an example where the voltage at position (x, y) is divided into three time spans, and its random variables are denoted as  $V_{x,y,1}$ ,  $V_{x,y,2}$ , and  $V_{x,y,3}$ . The number of time spans is determined according to the modeling requirements, i.e., when we need to accurately model dynamic variations within a clock cycle, the number of spans should be increased, otherwise a few spans are sufficient.
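As a minimal sketch of this variable assignment, the snippet below averages a sampled waveform over equal-length time spans and lays the result out as one row per clock cycle and one column per variable $V_{x,y,t}$. It assumes the spatial representative values are already available on a uniform grid and that each cycle is sampled at a number of time steps divisible by the number of spans; `assign_variables` and the array layout are hypothetical.

```python
import numpy as np

def assign_variables(waveforms, num_spans):
    """Build the sample matrix for statistical modeling.

    waveforms: array (num_cycles, steps_per_cycle, ny, nx) of voltages.
    Returns an array (num_cycles, num_spans * ny * nx); each row is one
    sample (one clock cycle) and each column one variable V_{x,y,t}.
    """
    cycles, steps, ny, nx = waveforms.shape
    if steps % num_spans != 0:
        raise ValueError("steps_per_cycle must be a multiple of num_spans")
    per_span = steps // num_spans
    # Split the time axis into spans and use the span average as the
    # representative voltage.
    spans = waveforms.reshape(cycles, num_spans, per_span, ny, nx).mean(axis=2)
    return spans.reshape(cycles, num_spans * ny * nx)
```

With 2000 simulated cycles, three spans, and a 10 x 10 grid, this yields a 2000 x 300 matrix, i.e., 2000 samples for each of 300 random variables.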

#### B. Variable Transformation With Orthogonalization

Given a set of variables, we translate these and derive a compact statistical model with Gaussianization and orthogonalization.

1) Gaussianization: The first step in the transformation of variables is to improve their Gaussianity. This step can be skipped when the supply-voltage distribution can be reasonably treated as Gaussian. A well-known transformation to improve Gaussianity is the Box-Cox transformation [15]. There are several equations for the Box-Cox transformation, and the one we have used in this paper is expressed as

$$\hat{z} = \begin{cases} \frac{z^{\Lambda} - 1}{\Lambda}, & (\Lambda \neq 0)\\ \log(z), & (\Lambda = 0) \end{cases} \tag{1}$$

where z is the original variable,  $\hat{z}$  is the transformed variable, and  $\Lambda$  is a parameter. In our modeling, z corresponds to a variable of power supply noise  $V_{x,y,t}$ . The optimum  $\Lambda$  that maximizes Gaussianity is individually computed for every variable  $V_{x,y,t}$  by using the maximum likelihood procedure, and is given to SSTA.
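A minimal sketch of (1) and of a maximum-likelihood choice of $\Lambda$ is given below. It uses the standard Box-Cox profile log-likelihood and a plain grid search; the search range, the grid resolution, and the function names are assumptions made for illustration, and the procedure actually used in the paper may differ. The variable must be strictly positive, which holds for a supply voltage but would require an offset for a ground-bounce variable.

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transformation of (1); z must be strictly positive."""
    z = np.asarray(z, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(z)
    return (z ** lam - 1.0) / lam

def box_cox_llf(z, lam):
    """Profile log-likelihood of lam, up to an additive constant."""
    z = np.asarray(z, dtype=float)
    transformed = box_cox(z, lam)
    return (lam - 1.0) * np.log(z).sum() - 0.5 * z.size * np.log(transformed.var())

def fit_lambda(z, lam_min=-5.0, lam_max=5.0, steps=1001):
    """Grid-search the lam that maximizes the log-likelihood (assumed range)."""
    grid = np.linspace(lam_min, lam_max, steps)
    scores = [box_cox_llf(z, lam) for lam in grid]
    return float(grid[int(np.argmax(scores))])
```

In the setting of the text, `fit_lambda` would be called once per variable $V_{x,y,t}$ on its samples, and the resulting $\Lambda$ values would be stored alongside the statistical model.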

2) Orthogonalization by PCA: PCA maps a given set of correlated random variables to a new set of uncorrelated random variables, which are called principal components (PCs). We here suppose Box-Cox transformation is not applied. Given a variance-covariance matrix, PCA transforms variable $z_i$ into (2), where $\lambda_j$ is the *j*th largest eigenvalue, $e_{ij}$ is the element of the *j*th eigenvector that corresponds to $z_i$, $\mu_i$ is the average of $z_i$, and $\sigma_i$ is the standard deviation of $z_i$. $k$ is the number of PCs and $pc_j$ is the *j*th PC. Principal component $pc_j$ is expressed as (4), which is a linear summation of the *n* original variables $z_i$. The PCs are random variables that are mutually uncorrelated, which simplifies the computation of correlations significantly in SSTA [2]. Moreover, $z_i$ is often approximated as (3) with a reduced number of PCs, $k'$ ($k' < k$), when the original variables $z_i$ are correlated. When Box-Cox transformation is applied to $z_i$ beforehand, we construct a variance-covariance matrix of the transformed variables, and perform PCA. Consequently, in this case, $z_i$, $\mu_i$, $\lambda_j$, $e_{ij}$, $pc_j$, and $\sigma_i$ in (2)–(4) are replaced with $\hat{z}_i$, $\hat{\mu}_i$, $\hat{\lambda}_j$, $\hat{e}_{ij}$, $\widehat{pc}_j$, and $\hat{\sigma}_i$, respectively

$$z_i = \mu_i + \left(\sum_{j=1}^k \sqrt{\lambda_j} e_{ij} p c_j\right) \sigma_i \tag{2}$$

$$\approx \mu_i + \left(\sum_{j=1}^{k'} \sqrt{\lambda_j} e_{ij} p c_j\right) \sigma_i \tag{3}$$

$$pc_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^n \left( e_{ij} \frac{z_i - \mu_i}{\sigma_i} \right). \tag{4}$$
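Equations (2)–(4) can be sketched with an eigendecomposition of the correlation matrix of the standardized variables. The snippet is an illustration under assumptions: the function names are hypothetical, population statistics (division by the number of samples) are used throughout so that each PC has exactly unit variance, and the retained eigenvalues are assumed to be strictly positive.

```python
import numpy as np

def pca_model(samples, num_pcs):
    """Fit the model of (2)-(4) to samples of shape (m, n), keeping num_pcs PCs."""
    mu = samples.mean(axis=0)
    sigma = samples.std(axis=0)
    z = (samples - mu) / sigma                # standardized variables
    corr = (z.T @ z) / samples.shape[0]       # correlation matrix
    lam, vec = np.linalg.eigh(corr)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:num_pcs]   # keep the largest num_pcs
    lam, vec = lam[order], vec[:, order]
    pcs = (z @ vec) / np.sqrt(lam)            # Eq. (4), one column per PC
    return mu, sigma, lam, vec, pcs

def reconstruct(mu, sigma, lam, vec, pcs):
    """Eq. (2) when all PCs are kept, Eq. (3) when only k' < k are kept."""
    return mu + ((pcs * np.sqrt(lam)) @ vec.T) * sigma
```

Keeping all PCs reproduces the samples exactly, whereas for highly correlated variables a small `num_pcs` already gives a close approximation, which is the property the proposed model relies on.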

3) Computational Complexity: Let m denote the number of samples and n denote that of variables. The optimal  $\Lambda$  for Box-Cox transformation in (1) is derived by using the likelihood function, and its complexity is O(m). The transformation of all n variables requires the effort of O(mn). However, the complexity of PCA is  $O(n^3)$  [2]. Consequently, the total cost of transforming the variables is  $O(n^3)$ . Although this complexity is not low, as the variables are only transformed once before SSTA, this computational cost is expected to be acceptable, which is similar to that of other SSTA methods [2], [3].

TABLE I COMPUTATIONAL TIME FOR PCA



Fig. 12. Power and ground level differences between driver and receiver.

Table I shows the execution time of PCA implemented in R [16] on a computer with a 2.4-GHz Opteron processor and 16 GB of memory. We can see that the CPU time increases superlinearly, as previously mentioned. Therefore, it is important to keep the number of variables small, and the adaptive discretization described in Section III-A helps reduce the PCA cost. Even when the PCA cost is unacceptable, the region that is modeled at one time can be made smaller at some sacrifice of accuracy. However, power-voltage variation has a property of locality [17], and the local region is smaller than a chip, and hence the accuracy loss is thought to be limited. On the other hand, the execution time of Box-Cox transformation for 2000 variables, each of which has 2000 samples, was 37.5 s. The complexity of Box-Cox transformation ($O(mn)$) is lower than that of PCA ($O(n^3)$), and hence the cost of Box-Cox transformation is not dominant even when the number of variables increases.

## IV. SSTA WITH STATISTICAL MODEL OF POWER SUPPLY NOISE

This section discusses the application of the statistical model of power supply noise to SSTA. The proposed model can be applied to both path-based and block-based SSTA.

Equation (5) is a common gate-delay model in canonical form that is widely used in SSTA implementations. We adopted this form, because it achieves fundamental *sum* and *max* operations in SSTA efficiently as long as the variables are Gaussian [2]

$$d_i = \mu_i + \sum_{j=1}^{k'} a_{i,j} p c_j. \tag{5}$$

Here,  $a_{i,j}$  is a sensitivity coefficient associated with  $pc_j$ .
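To illustrate why the canonical form of (5) makes the *sum* operation cheap, the sketch below adds two delays term by term and computes the standard deviation and covariance directly from the coefficients, which is valid because the PCs are uncorrelated with unit variance. The class name `Canonical` is hypothetical, and the *max* operation, which needs an additional approximation, is deliberately left out.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Canonical:
    """Delay in the form of (5): mean + sum_j coeffs[j] * pc_j."""
    mean: float
    coeffs: tuple

    def __add__(self, other):
        if len(self.coeffs) != len(other.coeffs):
            raise ValueError("both delays must use the same set of PCs")
        summed = tuple(a + b for a, b in zip(self.coeffs, other.coeffs))
        return Canonical(self.mean + other.mean, summed)

    def std(self):
        # PCs are uncorrelated with unit variance, so variances simply add.
        return math.sqrt(sum(a * a for a in self.coeffs))

    def cov(self, other):
        # Covariance of two delays that share the same PCs.
        return sum(a * b for a, b in zip(self.coeffs, other.coeffs))

# Two gate delays in series (values are illustrative only).
path = Canonical(10.0, (3.0, 0.0)) + Canonical(20.0, (0.0, 4.0))
print(path.mean, path.coeffs, path.std())
```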

The power and ground level differences between drivers and a receiver affect the switching delay of the receiver, as has been reported (e.g., [10], [18]). Fig. 12 shows the difference in levels between a driver and a receiver. Suppose the receiver is placed at the (x, y) grid and switching is done in the (t) time span. Similarly, the driver is placed at the  $(x_l, y_l)$  grid and switching is done in the  $(t_l)$  time span.  $V_{DD_r}/V_{SS_r}$  is the supply/ground voltage on the receiver side at the (x, y) grid in the (t) time span. Similarly,  $V_{DDd_l}/V_{SSd_l}$  is the supply/ground voltage on the *l*th driver side at the  $(x_l, y_l)$  grid in the  $(t_l)$  time span. To take the difference in levels into account, we use the following canonical delay form. We here focus on power supply noise, and hence only the sensitivity terms to the power/ground voltages are included. When process variation is also considered, the sensitivity terms to process parameters are added

$$d_{r} = \mu_{r} + \frac{\partial d_{r}}{\partial V_{\text{DD}_{r}}} \Delta V_{\text{DD}_{r}} + \frac{\partial d_{r}}{\partial V_{\text{SS}_{r}}} \Delta V_{\text{SS}_{r}} + \sum_{l} \left( \frac{\partial d_{r}}{\partial V_{\text{DD}d_{l}}} \Delta V_{\text{DD}d_{l}} + \frac{\partial d_{r}}{\partial V_{\text{SS}d_{l}}} \Delta V_{\text{SS}d_{l}} \right) \tag{6}$$

$$\approx \mu_r + \sum_{j=1}^{k'} \sqrt{\lambda_j} A_{r,j} p c_j \tag{7}$$

$$A_{r,j} = \sigma_{V_{\text{DD}_r}} \frac{\partial d_r}{\partial V_{\text{DD}_r}} e_{(V_{\text{DD}_r}),j} + \sigma_{V_{\text{SS}_r}} \frac{\partial d_r}{\partial V_{\text{SS}_r}} e_{(V_{\text{SS}_r}),j} + \sum_l \left( \sigma_{V_{\text{DD}d_l}} \frac{\partial d_r}{\partial V_{\text{DD}d_l}} e_{(V_{\text{DD}d_l}),j} + \sigma_{V_{\text{SS}d_l}} \frac{\partial d_r}{\partial V_{\text{SS}d_l}} e_{(V_{\text{SS}d_l}),j} \right). \tag{8}$$

The second and third terms in the right-hand side of (6) correspond to delay variations due to voltage variations at the receiver. The subsequent terms mean the delay variations caused by voltage variations at the driver. In multiple-input cells, there are several inputs. Even the voltages of stable (not switching) inputs affect the propagation delay [18], and hence we sum up terms with respect to all voltage variables at the drivers. By expressing the variation in (6) (each  $\Delta V$ ) with the second term of (3), we can obtain (7) and (8), where  $\lambda_j$  is the eigenvalue and  $e_{(V),j}$  is the element of the eigenvector, respectively.
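The step from (6) to (7) and (8) can be sketched numerically. For one gate, the snippet collects the voltage variables that appear in (6) (receiver $V_{\rm DD}$ and $V_{\rm SS}$, plus those of each driver), weights the matching eigenvector rows by sensitivity and standard deviation as in (8), and scales by $\sqrt{\lambda_j}$ as in (7). The function name, the index-based bookkeeping, and the numerical values in the usage lines are assumptions for illustration.

```python
import numpy as np

def delay_coefficients(var_idx, sens, sigma, lam, vec):
    """Coefficients sqrt(lambda_j) * A_{r,j} of (7) for one gate.

    var_idx: indices of the voltage variables appearing in (6).
    sens:    delay sensitivities to those variables, same order as var_idx.
    sigma:   standard deviations of all n voltage variables.
    lam:     the k' retained eigenvalues.
    vec:     (n, k') matrix whose column j is the j-th eigenvector.
    """
    idx = np.asarray(var_idx)
    weights = np.asarray(sens, dtype=float) * np.asarray(sigma, dtype=float)[idx]
    a_rj = weights @ vec[idx, :]   # Eq. (8): one value of A_{r,j} per PC
    return np.sqrt(lam) * a_rj     # coefficients multiplying pc_j in (7)

# Toy case: two correlated voltage variables (assumed numbers).
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
lam, vec = np.linalg.eigh(corr)
coeffs = delay_coefficients([0, 1], [-100.0, 50.0], [0.02, 0.03], lam, vec)
print(coeffs)
```

When all PCs are retained, the sum of the squared coefficients equals the delay variance obtained directly from the sensitivities and the voltage covariance matrix, which is a convenient sanity check.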

When Box-Cox transformation is applied,  $V_{DD}$  and  $V_{SS}$  are translated into  $\hat{V}_{DD}$  and  $\hat{V}_{SS}$ .  $\partial d/\partial \hat{V} (= (\partial d/\partial V) \cdot (\partial V/\partial \hat{V}))$ is the sensitivity of the delay to  $\hat{V}$ , and  $\partial V/\partial \hat{V}$  must be computed. An ordinary way is that the derivative  $\partial V/\partial \hat{V}$  at the nominal  $\hat{V}$  is used. However, Box-Cox transformation of (1) is a nonlinear function, and then a certain amount of inaccuracy arises when the variation is not small enough. To mitigate the inaccuracy, the term of  $\partial V/\partial \hat{V}$  is substituted with the standard deviation ratio of the original variable to the transformed variable  $\sigma_V/\sigma_{\hat{V}}$ . This substitution is based on an idea that  $\sigma_V(\partial d_r/\partial V)$  and  $\sigma_{\hat{V}}(\partial d_r/\partial \hat{V})$  should be identical to reduce the estimation error of the standard deviation. The form of (7) is compatible with (5), and hence we can easily take manufacturing variability and power supply noise into consideration by adding the sensitivity terms of process variation to (7).

Unlike process variations, the proposed method needs special consideration. In spatial discretization, a grid, i.e., a variable parameter, is definitely assigned to a gate. However, in temporal discretization, the correspondence with a variable is occasionally obscure, because a switching transition may occur at the boundary of the temporal division. Suppose an input transition happens before the boundary of the temporal division, and the corresponding output transition occurs after the boundary. In this case, if we choose a set of parameters in the canonical delay



Fig. 13. Weighted-average calculation based on switching term.

form  $(\mu_r \text{ and } a_{r,j})$  only from the former time span of the input transition timing, timing estimation error arises. Furthermore, when the temporal division is coarse, i.e., there are few time spans, the voltage difference between two successive time spans is large, which may cause large errors in timing estimates. To mitigate this error, we introduce a weighted-average calculation (Fig. 13) to cope with the problem that the input and output transition timings of a gate fall in different time spans. Let  $t_{\rm I}$  and  $t_{\rm O}$  represent the input and output transition timings, where the former belongs to Span #(m) and the latter to Span #(m+1). First, we estimate  $t_{\rm O}$  by using  $\mu_{r_m}$ , the average delay in Span #(m), i.e.,  $t_{\rm O} = t_{\rm I} + \mu_{r_m}$ , because  $t_{\rm O}$  is needed for the weighted-average calculation but the average delay necessary to compute  $t_{\rm O}$  is not yet available. Using these values, the average  $\mu'_r$  and the coefficient of (5),  $a'_{r,j}$ , are recalculated by

$$\mu_r' = \frac{\Delta t_{\rm I}}{\Delta t_{\rm I} + \Delta t_{\rm O}} \mu_{r_m} + \frac{\Delta t_{\rm O}}{\Delta t_{\rm I} + \Delta t_{\rm O}} \mu_{r_{m+1}} \tag{9}$$

$$a'_{r,j} = \frac{\Delta t_{\rm I}}{\Delta t_{\rm I} + \Delta t_{\rm O}} a_{r_m,j} + \frac{\Delta t_{\rm O}}{\Delta t_{\rm I} + \Delta t_{\rm O}} a_{r_{m+1},j} \tag{10}$$

where  $\Delta t_{\rm I}$  is the time from  $t_{\rm I}$  to the boundary time,  $\Delta t_{\rm O}$ is the time from the boundary time to  $t_{\rm O}$ , and  $\mu_{r_{m+1}}$  is the average delay of Span #(m+1). Here,  $a_{r_m,j}$  is the coefficient of (7) (=  $\sqrt{\lambda_j}A_{r,j}$ ) in Span #(m) and  $a_{r_{m+1},j}$  is that in Span #(m+1). Then, we regard  $\mu'_r$  and  $a'_{r,j}$  as  $\mu_i$  and  $a_{i,j}$ in (5), and we finally obtain gate delay.
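
The weighted-average calculation of (9) and (10) can be sketched as follows. This is an illustrative sketch only: the function name, argument layout, and numerical values are hypothetical, and the input transition is assumed to lie at or before the span boundary.

```python
def blend_canonical(t_in, boundary, mu_m, mu_m1, a_m, a_m1):
    """Blend the canonical-form parameters of Span #(m) and Span #(m+1).

    t_in     : input transition time (assumed <= boundary)
    boundary : time at which Span #(m) ends and Span #(m+1) begins
    mu_m, a_m   : average delay and PC coefficients in Span #(m)
    mu_m1, a_m1 : average delay and PC coefficients in Span #(m+1)
    """
    # First estimate of the output transition time, using Span #(m) only.
    t_out = t_in + mu_m
    if t_out <= boundary:
        # Both transitions fall in Span #(m); nothing to blend.
        return mu_m, list(a_m)

    dt_i = boundary - t_in   # time from the input transition to the boundary
    dt_o = t_out - boundary  # time from the boundary to the output transition
    w_m = dt_i / (dt_i + dt_o)
    w_m1 = dt_o / (dt_i + dt_o)

    mu = w_m * mu_m + w_m1 * mu_m1                        # (9)
    a = [w_m * x + w_m1 * y for x, y in zip(a_m, a_m1)]   # (10)
    return mu, a


# Hypothetical values in picoseconds: the input switches 10 ps before the boundary.
mu_blend, a_blend = blend_canonical(
    t_in=90.0, boundary=100.0,
    mu_m=40.0, mu_m1=50.0,
    a_m=[2.0, 1.0], a_m1=[4.0, 3.0],
)
print(mu_blend, a_blend)
```

In this example the output transition is estimated at 130 ps, so the weights are 0.25 for Span #(m) and 0.75 for Span #(m+1).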

## V. EXPERIMENTAL RESULTS

This section presents the experimental results. We first validated the statistical modeling of power supply noise, and then verified the accuracy of the proposed timing analysis.

#### A. Experimental Conditions

We used an FPU circuit and a Tiny64 processor [14] as noise generators to construct the proposed model for power supply noise. These circuits were synthesized by using a commercial logic synthesizer and placed and routed by utilizing a commercial tool with a 90-nm standard cell library. The FPU circuit had 39-k gates and the Tiny64 processor had 20-k gates. We attached the power/ground network shown in Fig. 14 to each noise generator circuit and simulated power supply noise. A flip-chip package with bump connections was assumed. Input vectors of 2000 clock cycles were applied to both circuits. There is a tradeoff between the number of samples (cycles)



Fig. 14. Power network for test circuit.

and statistical validity of the noise model. When higher validity is necessary, more input vectors are required. The simulation results were used for PCA including the correlation matrix calculations. Please note that other methods of estimating power noise can be used, even though we used a fast circuit simulator.

We implemented block-based SSTA and iterative STA (2000 runs) simulation in C++ and performed these for ISCAS85 benchmark circuits, a 64-b multiplier, an ALU circuit for vector operation, and an H-tree for clock distribution on a computer with a 2.4-GHz Opteron processor and 16 GB of memory. These circuits except the H-tree were synthesized, placed, and routed by using commercial tools. A single path with seven drivers was selected in the H-tree and its jitter was evaluated. The power supply noise of the FPU circuit or the Tiny64 processor described above was applied to the benchmark circuits.

#### B. Validation of Statistical Modeling of Power Supply Noise

1) Box-Cox Transformation for Power Supply Variables: Here, we discuss a distribution of power supply voltage as an example. We chose a distribution of power supply noise that was relatively far from the Gaussian (Fig. 15), whereas many variables are close to the Gaussian. Fig. 16 is the normal probability plot of Fig. 15. In the normal distribution, all closed circles are plotted along the diagonal line. When the closed circles are far from the diagonal line, the distribution is very different from the Gaussian. In Fig. 16, many closed circles are not along the diagonal line, which means the distribution is different from the Gaussian, as shown in Fig. 15.

However, the variable transformed by the Box-Cox transformation approaches the Gaussian (Fig. 17). In the normal probability plot of Fig. 18, the closed circles are closely plotted along the diagonal line, which means the Gaussianity is greatly improved.
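
The effect of the transformation on Gaussianity can be illustrated with a small sketch. It assumes the standard Box-Cox form, uses synthetic lognormal samples in place of the simulated supply voltages, and selects $\lambda$ from a coarse grid by minimizing the absolute sample skewness; this selection rule is a simple stand-in and is not necessarily how the optimum $\lambda$ was obtained in the experiments.

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Standard Box-Cox transform (assumed form); requires x > 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def skewness(xs):
    """Sample skewness; close to zero for a Gaussian sample."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


def pick_lambda(xs, candidates):
    """Choose the candidate lambda whose transformed sample is least skewed."""
    return min(candidates,
               key=lambda lam: abs(skewness([box_cox(x, lam) for x in xs])))


# Synthetic, clearly non-Gaussian positive samples (hypothetical data).
rng = random.Random(1)
raw = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]

candidates = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
best_lam = pick_lambda(raw, candidates)
skew_before = skewness(raw)
skew_after = skewness([box_cox(x, best_lam) for x in raw])
print(f"lambda = {best_lam}, skewness {skew_before:.3f} -> {skew_after:.3f}")
```

A skewness near zero after transformation plays the same role as the closed circles lying along the diagonal of the normal probability plot.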

2) Effect of Box-Cox Transformation on Sum Distribution: Box-Cox transformation improves the Gaussianity of variables; however, there may be a concern that important statistical parameters such as averages, standard deviations, and correlation coefficients are degraded. We examined the effect of Box-Cox transformation using several well-known distributions. We compared two sum distributions of two variables: one reconstructed with Box-Cox transformation and the other reconstructed without it. The evaluation procedure is as follows (steps marked with \* apply only when Box-Cox transformation is used).

- 1) Generate 5000 original samples of two correlated variables.
- 2) \*Perform Box-Cox transformation to the two variables.



Fig. 15. Supply-voltage distribution before Box-Cox transformation.



Fig. 16. Normal probability plot of Fig. 15.

- 3) Perform PCA to the two variables with the transformed (with Box-Cox)/original (without Box-Cox) samples.
- 4) Generate samples in accordance with PCs and obtain new samples of the two variables.
- 5) \*Perform inverse Box-Cox transformation to the new samples.
- 6) Compare sum distributions computed with the original samples (ideal) and the new samples (with and without inverse Box-Cox).

We here show an example using a Gamma distribution [(11): $\alpha = 2$ ,  $\beta = 1$ : Fig. 19] and a Weibull distribution [(12): $\alpha = 2$ ,  $\beta = 2$ ]. The correlation coefficient between two variables is set to about 0.7

$$y = \begin{cases} \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} e^{-\frac{x}{\beta}}, & (x \ge 0) \\ 0, & (\text{otherwise}) \end{cases} \qquad (\alpha > 0) \tag{11}$$

$$y = \begin{cases} \frac{\beta x^{\beta-1}}{\alpha^{\beta}} e^{-\left(\frac{x}{\alpha}\right)^{\beta}}, & (x \ge 0)\\ 0, & (\text{otherwise}) \end{cases} \qquad (\alpha, \beta > 0). \tag{12}$$

Fig. 20 shows the sum distributions of Gamma and Weibull distributions with and without Box-Cox transformation. By performing Box-Cox transformation before PCA, the sum distribution closely approaches the ideal distribution, which means that important statistical parameters, such as the average, standard deviation, and correlation coefficient, are well reproduced. We compared the error of the distribution with Box-Cox transformation to that without Box-Cox transformation. Here, the error is defined as the integral of the absolute difference of the cumulative density from 1% to 99%. This comparison showed that Box-Cox transformation reduced the error by 66.4%. Although the optimum $\lambda$s are different (Gamma: -3.42, Weibull: -2.05), Box-Cox transformation works well. We further examined other combinations of distribution shapes,



Fig. 17. Distribution after Box-Cox transformation corresponding to Fig. 15.



Fig. 18. Normal probability plot of Fig. 17.



Fig. 19. PDF of Gamma distribution ( $\alpha = 2, \beta = 1$ ).



Fig. 20. CDF of sum of Gamma and Weibull distributions.

such as uniform and triangular distributions as well as Gamma and Weibull distributions, and confirmed that Box-Cox transformation reduces the error considerably, as in the above result.

In the experiments, we found that the SSTA results were accurate, which will be discussed later in Section V-C. We thus concluded that orthogonalization with PCA for power supply noise is a realistic approach.

3) Variable Reduction Rate: When the correlation between random variables is high, the original distribution can be reproduced with a small number of PCs. This section discusses



Fig. 21. Proportion of first principal component.

how many PCs can be reduced. When we reduce the number of PCs, a metric called cumulative proportion is used [19]. The cumulative proportion is expressed as

$$\text{cumulative proportion}_{k'} = \frac{1}{n} \sum_{j=1}^{k'} \lambda_j \tag{13}$$

where n is the number of variables. As the cumulative proportion approaches one, the original distribution is well reproduced.
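
As an illustration of (13), the sketch below computes the cumulative proportion for every $k'$ and the smallest number of PCs that reaches a target. It assumes the eigenvalues come from a correlation matrix, so that they sum to $n$; the function names and the eigenvalues are hypothetical.

```python
def cumulative_proportions(eigenvalues):
    """Cumulative proportion of (13) for k' = 1..n.

    Assumes eigenvalues of an n-by-n correlation matrix, whose sum is n.
    """
    ordered = sorted(eigenvalues, reverse=True)
    n = len(ordered)
    proportions = []
    running = 0.0
    for value in ordered:
        running += value
        proportions.append(running / n)
    return proportions


def pcs_needed(eigenvalues, target):
    """Smallest k' whose cumulative proportion reaches the target."""
    for k, proportion in enumerate(cumulative_proportions(eigenvalues), start=1):
        if proportion >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to 4).
example = [3.2, 0.5, 0.2, 0.1]
print(cumulative_proportions(example))
print(pcs_needed(example, 0.90))
```

With these hypothetical eigenvalues the first PC alone carries 80% of the variance, and two PCs are needed to reach 90%.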

Fig. 21 shows the proportion of the first PC (i.e., cumulative proportion<sub>1</sub>) when the number of divisions is changed. The solid line plots the relationship between the number of spatial divisions and the proportion of variance when no temporal division within a cycle is performed. Here, the spatial division is performed uniformly. Because the variables are strongly correlated, the first PC maintains a high proportion. The broken line, in contrast, plots the results when the number of temporal divisions varies while the number of spatial divisions is kept unchanged. Increasing the number of temporal divisions does not affect the proportion very much, because the parasitic capacitance in the chip smooths power supply noise and increases temporal correlation. Furthermore, if intentional decoupling capacitance is inserted, the spatial and temporal correlation of power supply noise increases, and modeling efficiency improves further. Power noise is also correlated with ground noise. Therefore, even when there are numerous variables, a small number of PCs can achieve a high cumulative proportion. Let us discuss an example. Suppose the numbers of spatial and temporal divisions of the difference in potential between power and ground are  $10 \times 10$  and 10, respectively. We examined the number of PCs whose cumulative proportion exceeded 90%. Only six PCs were needed to attain the target value, even though the total number of variables was 1000. In this instance, more than 99% of the variables could be eliminated, which helps to reduce the computational cost of SSTA, because its complexity is proportional to the number of PCs [2].

4) Adaptive Spatial Discretization: We present an example where the adaptive spatial discretization explained in Section III-A was applied to the power supply noise of Tiny64. In this experiment, the threshold values of the average and standard deviation used for equivalent partition checking, AVGth and SDth, were set to 20% of the difference between the



Fig. 22. Adaptive spatial discretization.



Fig. 23. Comparison of proposed SSTA to iterative STA.

maximum and minimum values in the whole area, and the threshold of the correlation coefficient CCth was set to 0.8.

Fig. 22 shows the results of adaptive discretization where there are 92 divided areas. The region where voltage is fluctuating locally is finely discretized. If all the areas were divided with the finest resolution, there would be 840 divisions. As mentioned in Section III-B-3, the complexity of PCA is  $O(n^3)$ , and hence, the reduction in variables from 840 to 92 corresponds to over a 700× cost reduction in PCA. The effect of the adaptive spatial discretization on SSTA will be shown in Section V-E.
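
As a rough check of this figure, and assuming that the PCA cost scales exactly as $n^3$ with no lower-order terms, the ratio of the two costs is

$$\left(\frac{840}{92}\right)^{3} \approx 9.13^{3} \approx 761$$

which is consistent with the stated reduction of more than 700×.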

#### C. SSTA Results for Power Supply Noise

We first verified the accuracy of the proposed timing analysis method. In this experiment with uniform spatial discretization, the number of spatial divisions was set to  $10 \times 10$ and the number of temporal divisions was set to ten. As a reference, we performed STA simulation iteratively for every clock cycle by using the noise information from 2000 cycles, which is the same information given to PCA. An overview of the proposed SSTA and the iterative STA simulation is shown in Fig. 23. The noisy power-voltage waveforms of each cycle are given to all cells according to their placements. The delay of each cell was calculated with the voltage value corresponding to the cell position and switching timing. With these gate delays, conventional STA was carried out and the circuit delay of each cycle was obtained. Therefore, there were 2000 STA evaluations. The results of STA for 2000 cycles do not include errors originating from discretization, PCA for incompletely Gaussian distributions, or SSTA operations. The results of SSTA were therefore compared with those of STA (2000 runs), which served as the ideal solution.
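
The iterative STA reference can be pictured with the following toy sketch. It is not the implementation used in the experiments: the linear voltage-to-delay model, the three-gate netlist, and all numbers are hypothetical, and for brevity each gate sees a single voltage per cycle (its spatial region), so the dependence on switching timing within a cycle is omitted.

```python
import statistics


def gate_delay(nominal_ps, sens_ps_per_v, v_nominal, v_actual):
    """Assumed first-order model: delay increases as the supply voltage drops."""
    return nominal_ps + sens_ps_per_v * (v_nominal - v_actual)


def circuit_delay(gates, fanin, voltages, v_nominal=1.0):
    """Longest-path arrival time; `gates` must be in topological order."""
    arrival = {}
    for name, (nominal_ps, sens, region) in gates.items():
        start = max((arrival[p] for p in fanin.get(name, [])), default=0.0)
        arrival[name] = start + gate_delay(nominal_ps, sens, v_nominal,
                                           voltages[region])
    return max(arrival.values())


def iterative_sta(gates, fanin, voltage_per_cycle):
    """One conventional STA run per clock cycle, then summary statistics."""
    delays = [circuit_delay(gates, fanin, v) for v in voltage_per_cycle]
    return delays, statistics.fmean(delays), statistics.pstdev(delays)


# Hypothetical netlist: name -> (nominal delay in ps, sensitivity in ps/V, region).
gates = {"g1": (100.0, 50.0, 0), "g2": (80.0, 40.0, 1), "g3": (60.0, 30.0, 1)}
fanin = {"g3": ["g1", "g2"]}

# Hypothetical supply voltage of each region in three clock cycles.
voltage_per_cycle = [[1.0, 1.0], [0.9, 1.0], [1.0, 0.8]]

delays, avg, sd = iterative_sta(gates, fanin, voltage_per_cycle)
print(delays, avg, sd)
```

Because every cycle is evaluated with its own voltages, the resulting average and standard deviation carry no discretization or PCA error, which is why such runs can serve as a reference.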

Table II lists the average and standard deviation of the delay acquired by SSTA with and without Box-Cox transformation

 TABLE II

 Accuracy of Timing Estimation With and Without Box-Cox Transformation (FPU)

| circuit | # cells | SSTA w/o Box-Cox trans., avg (ps) | SSTA w/o Box-Cox trans., sd (ps) | (SSTA w/o − STA)/STA, avg (%) | (SSTA w/o − STA)/STA, sd (%) | SSTA w/ Box-Cox trans., avg (ps) | SSTA w/ Box-Cox trans., sd (ps) | SSTA w/ Box-Cox trans., CPU (ms) | (SSTA w/ − STA)/STA, avg (%) | (SSTA w/ − STA)/STA, sd (%) | STA (2000 runs), avg (ps) | STA (2000 runs), sd (ps) | delay w/o noise (ps) |
|------------|-------|-------|------|-------|------|-------|------|-------|-------|------|-------|------|-------|
| c432       | 232   | 843.1 | 11.1 | 0.522 | 7.58 | 843.1 | 11.2 | 67.0  | 0.522 | 8.17 | 838.7 | 10.4 | 716.1 |
| c1355      | 329   | 477.8 | 4.98 | 1.32  | 28.2 | 477.8 | 4.99 | 107   | 1.33  | 28.0 | 471.6 | 6.94 | 399.7 |
| c1908      | 387   | 737.7 | 15.0 | 0.548 | 27.3 | 737.7 | 15.1 | 119   | 0.548 | 28.0 | 733.6 | 11.8 | 619.3 |
| c6288      | 3382  | 2755  | 35.3 | 0.331 | 10.3 | 2755  | 35.3 | 1000  | 0.331 | 10.5 | 2764  | 32.0 | 2371  |
| c7552      | 2070  | 725.7 | 13.6 | 0.121 | 17.5 | 725.7 | 13.7 | 599   | 0.121 | 18.3 | 726.6 | 11.6 | 608.9 |
| multiplier | 41629 | 1839  | 19.9 | 0.102 | 8.45 | 1839  | 19.9 | 11700 | 0.102 | 8.71 | 1837  | 18.3 | 1590  |
| ALU        | 14655 | 1075  | 12.3 | 0.192 | 3.79 | 1075  | 12.3 | 4190  | 0.192 | 4.02 | 1077  | 11.8 | 907.0 |
| H-tree     | 7     | 194.2 | 1.53 | 0.584 | 11.8 | 194.2 | 1.54 | 1     | 0.584 | 12.0 | 193.0 | 1.37 | 171.7 |
| average    | -     | -     | -    | 0.465 | 14.4 | -     | -    | -     | 0.466 | 14.7 | -     | -    | -     |

 TABLE III

 ACCURACY AND #PCs (MULTIPLIER, TINY64)

| #PCs | c.prop. (%) | avg (ps) | sd (ps) | CPU time (ms) |
|------|-------------|----------|---------|---------------|
| 1    | 84.2        | 1843     | 0.384   | 164           |
| 2    | 92.9        | 1843     | 3.16    | 166           |
| 4    | 95.8        | 1843     | 3.71    | 180           |
| 8    | 98.4        | 1843     | 4.07    | 205           |
| 16   | 99.5        | 1843     | 4.09    | 238           |
| 2000 | 100         | 1843     | 4.09    | 11800         |

and STA (2000 runs). We can see that the proposed SSTA, both with and without Box-Cox transformation, estimates the timing accurately. The error in estimating the average delay is 0.465% and that of the standard deviation is 14.4%. The estimation error of the standard deviation seems large; however, the standard deviation is small relative to the average. In addition, voltage variation affects not only the standard deviation but also the average through the statistical *max* operation. The error of  $\mu + 3\sigma$  amounts to 0.570%, which means the statistical delay variation due to power supply noise is well estimated.

On the other hand, the effect of Box-Cox transformation is limited, because most variables were originally close to the Gaussian in this experiment. However, the statistical noise model is improved thanks to Box-Cox transformation so that non-Gaussian distribution can be reproduced. When the noise information from the Tiny64 processor is applied, the error for the average is 0.568% and that for the standard deviation is 19.5%. The proposed method should help designers to quantitatively know how circuit delays systematically fluctuate depending on input vectors.

The STA (2000 runs) results indicate worst-case delay does not always occur when power/ground noise is maximum. Even when the supply voltage, which is averaged temporally within a clock cycle and spatially within a block area, was minimum in circuit c1355, the circuit delay was not the longest. This situation in fact corresponds to the case of the 970th longest circuit delay in the 2000 cycles that were evaluated. Thus, finding the maximum power/ground noise is not sufficient for verifying timing.

Table III lists the relation between the number of PCs (cumulative proportion) and the accuracy of delay estimation for a 64-b multiplier. The number of spatial divisions is  $10 \times 10$ , the number of temporal divisions is ten, and the noise generator is a Tiny64 processor. Here, the result with only eight PCs is very close to that with all 2000 PCs, so the number of variables can be reduced considerably. The CPU time is reduced from 11 800 to 205 ms, i.e., by 98.3%.
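
The effect of dropping PCs on a single canonical delay form can be sketched as follows, assuming uncorrelated, unit-variance PCs so that the variance is the sum of the squared coefficients. The function names and coefficients are hypothetical, and the sketch covers one canonical form only, not the statistical *max* over a whole circuit.

```python
import math


def sd_first_k(coeffs, k):
    """Standard deviation of mu + sum_j a_j * p_j keeping only the first k PCs.

    Assumes the PCs p_j are uncorrelated with unit variance.
    """
    return math.sqrt(sum(a * a for a in coeffs[:k]))


def relative_sd_loss(coeffs, k):
    """Fraction of the full standard deviation lost by truncating to k PCs."""
    full = sd_first_k(coeffs, len(coeffs))
    return (full - sd_first_k(coeffs, k)) / full


# Hypothetical coefficients a_j in ps, ordered by PC index.
coeffs = [3.0, 2.0, 1.0, 0.5, 0.1, 0.05]
for k in (1, 3, len(coeffs)):
    print(k, round(sd_first_k(coeffs, k), 3), round(relative_sd_loss(coeffs, k), 4))
```

How quickly the loss shrinks depends on the delay sensitivities as well as on the eigenvalues: in Table III, a single PC covers 84.2% of the cumulative proportion yet yields a standard deviation of only 0.384 ps against 4.09 ps with all PCs.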

#### D. Discussion on the Number of Vectors

In this section, we evaluate the relation between the number of input vectors and the accuracy of delay estimation. We examined the delays estimated by iterative STA, which was used as the ideal solution in Section V-C, since here we focus on the number of vectors and intend to eliminate errors originating from SSTA. We first prepared the result of 20 k-runs STA as the ideal solution and compared it with those of 2000-runs STA. The input vectors of 2000-runs STA are subsets of those given to 20 k-runs STA, and 2000-runs STA was performed ten times using ten subsets of input vectors.

Each estimated delay of circuit c432 is listed in Table IV. On average, the error of the average is 0.00593% and that of the standard deviation is 1.68%. We also performed 200-runs STA ten times in the same manner. In this case, the error of the average is 0.0184% and that of the standard deviation is 5.72%; that is, both errors between 2000-runs and 200-runs STA are about three times as large as those between 20 k-runs and 2000-runs STA.

Similar evaluations were performed on the other eight benchmark circuits. The error of the average is 0.00543% and that of the standard deviation is 0.960% when we compared 20 k-runs STA with 2000-runs STA. These errors are smaller than the errors of SSTA discussed in Section V-C. On the other hand, the error of the average increases to 0.0166% and that of the standard deviation increases to 3.19% when we compared 2000-runs STA with 200-runs STA. The error of Monte Carlo simulation is generally represented as  $O(1/\sqrt{n})$ , where n is the number of trials [20], and our experiments follow this tendency. Considering the SSTA error, 2000-runs STA can be reasonably used as a reference in this paper.
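
The $O(1/\sqrt{n})$ tendency can be checked against the reported errors with a few lines of arithmetic. This sketch assumes that the error of each comparison is dominated by the run with fewer vectors, so that reducing the runs from 2000 to 200 should enlarge the error by about $\sqrt{10}$; the variable names are illustrative.

```python
import math

# Expected growth of the Monte Carlo error when n drops from 2000 to 200.
expected_ratio = math.sqrt(2000 / 200)

# Reported errors in percent: (2000 runs vs. 20 k runs, 200 runs vs. 2000 runs).
reported = {
    "c432, average": (0.00593, 0.0184),
    "c432, standard deviation": (1.68, 5.72),
    "other circuits, average": (0.00543, 0.0166),
    "other circuits, standard deviation": (0.960, 3.19),
}

observed_ratios = {name: coarse / fine for name, (fine, coarse) in reported.items()}

print(f"expected ratio: {expected_ratio:.2f}")
for name, ratio in observed_ratios.items():
    print(f"{name}: {ratio:.2f}")
```

All four observed ratios lie between about 3.0 and 3.5, close to the expected value of about 3.16.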

#### E. Discussion on Spatial Discretization

We then evaluated the accuracy of SSTA when power supply noise was modeled with the adaptive spatial discretization described in Section III-A. We first compared SSTA with uniform  $10 \times 10$  discretization to that with the adaptive discretization whose threshold parameters were set to the same values as in Section V-B4. In both cases, the number of temporal divisions is ten. The results are listed in Table V. The adaptive spatial

 TABLE IV

 Estimation Variation by Different Vector Subset (c432, Tiny64)

|                | avg (ps) | sd (ps) |
|----------------|----------|---------|
| STA (20k runs) | 823.9    | 1.43    |

| vector subset (2000 cycles) | STA (2000 runs), avg (ps) | STA (2000 runs), sd (ps) | (2000 runs − 20k runs)/20k runs, avg (%) | (2000 runs − 20k runs)/20k runs, sd (%) | vector subset (200 cycles) | STA (200 runs), avg (ps) | STA (200 runs), sd (ps) | (200 runs − 2000 runs)/2000 runs, avg (%) | (200 runs − 2000 runs)/2000 runs, sd (%) |
|---------|-------|------|----------|--------|---------|-------|------|----------|------|
| #1      | 823.8 | 1.43 | 0.00359  | 0.0542 | #1-1    | 824.0 | 1.28 | 0.0170   | 10.0 |
| #2      | 823.8 | 1.48 | 0.00914  | 3.52   | #1-2    | 823.8 | 1.37 | 0.00822  | 3.91 |
| #3      | 823.8 | 1.41 | 0.00459  | 0.870  | #1-3    | 823.6 | 1.50 | 0.0280   | 4.86 |
| #4      | 823.9 | 1.41 | 0.00925  | 1.19   | #1-4    | 823.6 | 1.46 | 0.0226   | 2.23 |
| #5      | 823.9 | 1.37 | 0.0111   | 4.08   | #1-5    | 823.7 | 1.47 | 0.0167   | 3.09 |
| #6      | 823.8 | 1.44 | 0.0105   | 1.26   | #1-6    | 824.1 | 1.33 | 0.0384   | 6.49 |
| #7      | 823.9 | 1.45 | 0.00189  | 1.43   | #1-7    | 823.9 | 1.31 | 0.0125   | 8.15 |
| #8      | 823.8 | 1.45 | 0.00181  | 1.74   | #1-8    | 823.8 | 1.47 | 0.000439 | 2.83 |
| #9      | 823.9 | 1.40 | 0.000418 | 1.60   | #1-9    | 824.0 | 1.38 | 0.02353  | 3.60 |
| #10     | 823.9 | 1.41 | 0.00697  | 1.06   | #1-10   | 823.7 | 1.60 | 0.0163   | 12.0 |
| average | -     | -    | 0.00593  | 1.68   | average | -     | -    | 0.0184   | 5.72 |

 TABLE V

 Accuracy of Timing Estimation With Uniform and Adaptive Spatial Discretization (Tiny64)

| circuit | SSTA uniform (10×10), avg (ps) | SSTA uniform (10×10), sd (ps) | SSTA uniform (10×10), CPU (ms) | (SSTA uni. − STA)/STA, avg (%) | (SSTA uni. − STA)/STA, sd (%) | SSTA adaptive, avg (ps) | SSTA adaptive, sd (ps) | SSTA adaptive, CPU (ms) | (SSTA adap. − STA)/STA, avg (%) | (SSTA adap. − STA)/STA, sd (%) | STA (2000 runs), avg (ps) | STA (2000 runs), sd (ps) |
|------------|-------|--------|-------|--------|------|-------|-------|-------|--------|-------|-------|-------|
| c432       | 837.4 | 1.66   | 66.1  | 1.65   | 16.4 | 827.2 | 1.40  | 63.3  | 0.409  | 1.98  | 823.8 | 1.43  |
| c1355      | 474.4 | 2.05   | 108   | 1.41   | 8.41 | 482.4 | 1.81  | 98.6  | 0.241  | 19.3  | 481.2 | 2.24  |
| c1908      | 725.8 | 1.44   | 118   | 0.195  | 3.43 | 719.4 | 1.50  | 109   | 0.689  | 8.27  | 724.4 | 1.39  |
| c6288      | 2726  | 2.72   | 993   | 0.242  | 7.25 | 2724  | 3.43  | 970   | 0.175  | 16.9  | 2719  | 2.93  |
| c7552      | 718.7 | 1.53   | 595   | 0.716  | 10.5 | 725.1 | 1.90  | 578   | 0.166  | 11.2  | 723.9 | 1.71  |
| multiplier | 1843  | 4.09   | 11800 | 0.0179 | 1.84 | 1842  | 4.21  | 11400 | 0.0391 | 0.833 | 1843  | 4.17  |
| ALU        | 1037  | 0.687  | 4050  | 0.198  | 22.0 | 1037  | 0.794 | 4040  | 0.153  | 9.72  | 1035  | 0.880 |
| H-tree     | 190.6 | 0.0919 | 1     | 0.111  | 86.3 | 189.8 | 0.430 | 1     | 0.351  | 35.9  | 190.4 | 0.672 |
| average    | -     | -      | -     | 0.568  | 19.5 | -     | -     | -     | 0.278  | 13.0  | -     | -     |



Fig. 24. Estimation error and number of spatial divisions.

discretization reduced the error of the average to 49% of its original value (from 0.568% to 0.278%) and the error of the standard deviation to 67% (from 19.5% to 13.0%) on average, even though the number of variables was also reduced to 90%. The adaptive discretization thus improves accuracy even though the number of variables decreases. This result also suggests that finer discretization could reduce the estimation error, although there is a tradeoff between estimation accuracy and computational time.

We next evaluated the relation between the accuracy and the number of spatial divisions. The number of temporal divisions is set to ten, and we tested various threshold values used in the adaptive spatial discretization. We also varied the number of uniform spatial divisions, and these results are shown in Fig. 24. In this experiment, the error of  $\mu + 3\sigma$  is used for the evaluation. The solid line with squares represents the error of the uniform discretization, and the squares correspond to  $4 \times 4$  to  $10 \times 10$  divisions. Closed circles represent the error of the adaptive discretization, and each circle corresponds to the SSTA result under one set of threshold values. Almost all circles are below the solid line, which means that the adaptive spatial discretization can derive a more precise statistical model when using the same number of divisions, or reduce the number of variables while keeping the same accuracy.

#### F. SSTA Result Both for Power Supply Noise and Manufacturing Variability

Finally, we demonstrate that the proposed method estimates delay distributions taking both dynamic power supply noise and static manufacturing variability into consideration in a unified way. In this experiment, the threshold voltage (Vth) was varied. Its variation consisted of a spatially correlated component and a random component. For the spatial correlation, we assumed that the correlation coefficient of Vth was given by the function  $f(x) = e^{-2x}$ , where $x$ (in mm) is the distance between two gates [21], [22]. We presumed that the magnitudes of both variational components were the same and the total standard deviation was 25 mV, which is a typical value in a 90-nm process [21]. For the sake of simplicity, intragate fluctuations were not considered in this experiment. We then added the sensitivity term corresponding to Vth to the canonical delay model of (7). We also assumed that



Fig. 25. CDF of delay distribution taking process and power supply fluctuations into consideration.

TABLE VI ESTIMATED DELAYS WITH POWER SUPPLY NOISE, Vth VARIATION, AND BOTH (MULTIPLIER, FPU)

| Consideration           | avg (ps) | sd (ps) |
|-------------------------|----------|---------|
| Power supply noise only | 1839     | 19.7    |
| Vth fluctuation only    | 1833     | 27.9    |
| Both fluctuations       | 1857     | 34.9    |

manufacturing variability and power supply noise were uncorrelated in this experiment, even though their mutual dependence has been analyzed [23]. In other words, power supply noise varies depending on process variation; however, in this experiment, the statistical noise model was constructed without considering process variation. Nevertheless, the mutual correlation could be modeled naturally by PCA, as long as statistical data that include the mutual dependence between process variation and power supply noise are available. This experiment was aimed at demonstrating the feasibility of the proposed method in coping with manufacturing variability and power supply noise in a unified way. The ideal solution was obtained as follows. First, 2000 sets of Vth variation were generated in a Monte Carlo manner. Then, one set of Vth variation and one cycle of power supply noise were given to STA. By iterating STA 2000 times, we obtained the delay distribution.
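
The Vth variation model described in this subsection can be written down as a covariance matrix. The sketch below is one possible reading and makes two assumptions that the text does not spell out: $f(x) = e^{-2x}$ is applied to the spatially correlated component only, and the random component is independent from gate to gate. The function names and gate positions are hypothetical.

```python
import math

SIGMA_TOTAL_MV = 25.0
# Equal magnitudes for the correlated and random components, so that
# sigma_corr**2 + sigma_rand**2 == SIGMA_TOTAL_MV**2.
SIGMA_COMPONENT_MV = SIGMA_TOTAL_MV / math.sqrt(2.0)


def spatial_correlation(distance_mm):
    """Correlation coefficient f(x) = exp(-2x) for gates x mm apart."""
    return math.exp(-2.0 * distance_mm)


def vth_covariance(positions_mm):
    """Covariance matrix (mV^2) of Vth for gates at the given (x, y) positions."""
    n = len(positions_mm)
    cov = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(positions_mm):
        for j, (xj, yj) in enumerate(positions_mm):
            distance = math.hypot(xi - xj, yi - yj)
            # Spatially correlated component (assumed to follow f(x)).
            cov[i][j] = SIGMA_COMPONENT_MV ** 2 * spatial_correlation(distance)
            if i == j:
                # Independent random component adds variance on the diagonal only.
                cov[i][j] += SIGMA_COMPONENT_MV ** 2
    return cov


# Hypothetical gate positions in mm.
positions = [(0.0, 0.0), (0.5, 0.0), (0.0, 2.0)]
cov = vth_covariance(positions)
print([[round(value, 1) for value in row] for row in cov])
```

Under these assumptions each gate has a total variance of 625 mV², i.e., a standard deviation of 25 mV, and the covariance between two gates decays with their distance.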

Fig. 25 shows the delay distribution of a 64-b multiplier where the number of spatial divisions is set to  $10 \times 10$  and the number of temporal divisions is set to ten. The power supply noise of an FPU was applied. The difference between the two distributions at 50% cumulative density is 4 ps, and this error is quite small, which means the proposed method copes well with both variations. Table VI lists the delay under power supply noise, Vth variation, and both. If the timing margin  $3\sigma$ is set individually for each variation, the total margin becomes 142.7 ps ($\approx (19.7 + 27.9) \times 3$). However, simultaneously taking all variations into consideration by using the proposed method reduces the timing margin to 104.8 ps ($\approx 34.9 \times 3$). This indicates the possibility that the new method can provide new sign-off criteria taking both manufacturing and supply-voltage fluctuations into account, even though several studies need to be carried out before applying it to a practical design. More importantly, the average delay considering both variabilities is 1857 ps, which is larger by 18 ps than the average delay considering only power supply noise. This result cannot be obtained without simultaneously taking both variabilities into consideration. Thus, it is requisite to treat manufacturing and supply voltage fluctuation in a unified manner.
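
The margin comparison can be reproduced from the rounded entries of Table VI, as in the sketch below. The root-sum-square line is an additional illustration, not a result reported above: it assumes two independent sources acting on a linear delay model. Because the inputs are rounded table values, the margins may differ from the figures in the text in the last digit.

```python
import math

# Standard deviations from Table VI (ps), as printed (rounded).
sd_noise = 19.7
sd_vth = 27.9
sd_both = 34.9

# 3-sigma margin when each variation is budgeted separately and then added.
margin_separate = 3.0 * (sd_noise + sd_vth)

# 3-sigma margin when both variations are analyzed together.
margin_joint = 3.0 * sd_both

# Root-sum-square of the two deviations: what the joint deviation would be
# for independent sources and a linear delay model (an assumption).
sd_rss = math.hypot(sd_noise, sd_vth)

saving = 1.0 - margin_joint / margin_separate
print(f"separate: {margin_separate:.1f} ps, joint: {margin_joint:.1f} ps")
print(f"margin reduction: {100.0 * saving:.1f}%, RSS sd: {sd_rss:.1f} ps")
```

The joint deviation of 34.9 ps is close to the root-sum-square value, which is in line with the two sources having been modeled as uncorrelated in this experiment.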

### VI. CONCLUSION

In this paper, we proposed SSTA that takes dynamic power supply noise into account by means of an orthogonalization technique. We confirmed that dynamic power/ground noise could be statistically modeled with PCA even though the distribution of power supply voltage was not rigidly Gaussian. The experiments revealed that the new method accurately estimated delay variations due to power supply noise. We experimentally demonstrated that, owing to the spatial and temporal correlation of power supply noise, a small number of PCs obtained by PCA were capable of accurately estimating delay.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers, whose comments provided excellent feedback, resulting in an improvement in the quality of this paper.

#### REFERENCES

- [1] T. Enami, S. Ninomiya, and M. Hashimoto, "Statistical timing analysis considering spatially and temporally correlated dynamic power supply noise," in *Proc. ISPD*, Apr. 2008, pp. 160–167.
- [2] H. Chang and S. Sapatnekar, "Statistical timing analysis under spatial correlations," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 24, no. 9, pp. 1467–1482, Sep. 2005.
- [3] C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, and S. Narayan, "First-order incremental block-based statistical timing analysis," in *Proc. DAC*, Jun. 2004, pp. 331–336.
- [4] J. Singh and S. Sapatnekar, "A scalable statistical static timing analyzer incorporating correlated non-Gaussian and Gaussian parameter variations," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 27, no. 1, pp. 160–173, Jan. 2008.
- [5] K. Shimazaki, M. Fukazawa, M. Nagata, S. Miyahara, M. Hirata, K. Sato, and H. Tsujikawa, "An integrated timing and dynamic supply noise verification for nano-meter CMOS SoC designs," in *Proc. CICC*, Sep. 2005, pp. 31–34.
- [6] M. Hashimoto, J. Yamaguchi, T. Sato, and H. Onodera, "Timing analysis considering temporal supply voltage fluctuation," in *Proc. ASP-DAC*, Jan. 2005, pp. 1098–1101.
- [7] J.-J. Liou, A. Krstic, Y.-M. Jiang, and K.-T. Cheng, "Path selection and pattern generation for dynamic timing analysis considering power supply noise effects," in *Proc. ICCAD*, Nov. 2000, pp. 493–497.
- [8] G. Bai, S. Bobba, and I. N. Hajj, "Static timing analysis including power supply noise effect on propagation delay in VLSI circuits," in *Proc. DAC*, Jun. 2001, pp. 295–300.
- [9] D. Kouroussis, R. Ahmadi, and F. N. Najm, "Worst-case circuit delay taking into account power supply variations," in *Proc. DAC*, Jun. 2004, pp. 652–657.
- [10] S. Pant and D. Blaauw, "Static timing analysis considering power supply variations," in *Proc. ICCAD*, Nov. 2005, pp. 365–371.
- [11] S. Pant, D. Blaauw, V. Zolotov, S. Sundareswaran, and R. Panda, "A stochastic approach to power grid analysis," in *Proc. DAC*, Jun. 2004, pp. 171–176.
- [12] Y. Jiang and K. Cheng, "Analysis of performance impact caused by power supply noise in deep submicron devices," in *Proc. DAC*, Jun. 1999, pp. 760–765.
- [13] H. S. Kim and D. M. H. Walker, "Statistical static timing analysis considering the impact of power supply noise in VLSI circuits," in *Proc. MTV*, Dec. 2006, pp. 76–82.
- [14] OPENCORES.ORG. [Online]. Available: http://www.opencores.org/
- [15] R. M. Sakia, "The Box-Cox transformation technique: A review," *Statistician*, vol. 41, pp. 169–178, 1992.
- [16] *The R Project for Statistical Computing*. [Online]. Available: http://www.r-project.org/
- [17] E. Chiprout, "Fast flip-chip power grid analysis via locality and grid shells," in *Proc. ICCAD*, Nov. 2004, pp. 485–488.
- [18] M. Hashimoto, J. Yamaguchi, and H. Onodera, "Timing analysis considering spatial power/ground level variation," in *Proc. ICCAD*, Nov. 2004, pp. 814–820.
- [19] I. T. Jolliffe, *Principal Component Analysis*, 2nd ed. New York: Springer-Verlag, Oct. 2002.
- [20] G. S. Fishman, *Monte Carlo: Concepts, Algorithms, and Applications*. New York: Springer-Verlag, Apr. 1996.
- [21] H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, "Challenge: Variability characterization and modeling for 65- to 90-nm processes," in *Proc. CICC*, Sep. 2005, pp. 593–599.
- [22] J. Xiong, V. Zolotov, and L. He, "Robust extraction of spatial correlation," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 26, no. 4, pp. 619–631, Apr. 2007.
- [23] P. Ghanta, S. Vrudhula, S. Bhardwaj, and R. Panda, "Stochastic variational analysis of large power grids considering intra-die correlations," in *Proc. DAC*, Jul. 2006, pp. 211–216.





**Shinyu Ninomiya** (S'06) received the B.E. degree from Osaka University, Osaka, Japan, in 2007, where he is currently working toward the M.E. degree in the Department of Information Systems Engineering.

His research interests include variability modeling and statistical timing analysis.

Mr. Ninomiya is a Student Member of the Institute of Electrical, Information and Communication Engineers.

**Masanori Hashimoto** (S'00–A'01–M'03) received the B.E., M.E., and Ph.D. degrees in communications and computer engineering from Kyoto University, Kyoto, Japan, in 1997, 1999, and 2001, respectively.

Since 2004, he has been an Associate Professor with the Department of Information Systems Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan. His research interests include computer-aided design for digital integrated circuits and high-speed circuit design.

Dr. Hashimoto served on the technical program committees for international conferences including the Design Automation Conference, the International Conference on Computer-Aided Design, the Asia and South Pacific Design Automation Conference (ASP-DAC), the International Conference on Computer Design, and the International Symposium on Quality Electronic Design. He is a member of the Institute of Electrical, Information and Communication Engineers and the Information Processing Society of Japan. He received the Best Paper Award at ASP-DAC 2004.



**Takashi Enami** (S'05) received the B.E. and M.E. degrees from Osaka University, Osaka, Japan, in 2006 and 2008, respectively, where he is currently working toward the Ph.D. degree in the Department of Information Systems Engineering.

His research interests include noise-aware timing analysis and power supply distribution networks.

Mr. Enami is a Student Member of the Institute of Electrical, Information and Communication Engineers.