### Automatic Generation of Standard Cell Library in VDSM Technologies

Masanori Hashimoto, Kazunori Fujimori and Hidetoshi Onodera Department of Communications and Computer Engineering, Kyoto University {hasimoto, onodera}@i.kyoto-u.ac.jp

### Abstract

We developed standard cell libraries for three technologies (130, 180 and 350 nm) using an automatic layout generation tool that we have developed. The tool was improved to cope with phase shift masks and with the mismatch between transistor pitch and wire pitch, both of which commonly appear in VDSM technologies. The developed libraries are as competitive as manually designed libraries in layout density and speed. After verifying the functionality of all cells and the speed of basic combinational cells on fabricated chips, we made the libraries publicly available to educational organizations in Japan.

### 1 Introduction

As technology advances, design rules for layout patterns have become more complicated. The life cycle of manufacturing technology, however, is getting shorter, and the design time for cell libraries must be shortened accordingly. Although layout generation of standard cells has been studied for some time, layout generation tools must be continuously improved to keep up with increasingly complicated design rules. As a result, manual design is still done in many fabs.

In current SoC/ASIC designs, the cell-based design methodology is widely used, and high-quality cell libraries are crucial for designing high-performance circuits. Cell-based design with cells of various driving strengths has recently been discussed, and it has been found that a library rich in driving strength brings circuit performance close to that of transistor-level optimized circuits [1, 2, 3]. Increasing the cell variety demands automatic cell generation with high layout density. Moreover, surrounding layout patterns now affect transistor characteristics because of OPC and layout density variation. Predictable layout generation is therefore crucial, because the transistor characteristics of a novel layout are uncertain.

We have developed a cell layout generation tool called VARDS that is based on the symbolic layout method [1]. VARDS can flexibly tune transistor sizes inside a cell while keeping the benefits of the symbolic layout method, so it can easily generate cells with a variety of driving strengths while keeping the generated cell layouts predictable.

In this paper, we describe cell layout generation that can cope with phase shift masks and with the difference between transistor pitch and wire pitch. These two issues are common in VDSM technologies, and hence an automatic cell generator must handle them. We also report the standard cell libraries we developed for 130, 180 and 350 nm technologies, which include a wide variation in driving strength. The functionality of these libraries was verified by measuring fabricated chips, and the libraries are widely used in Japanese educational organizations.

This paper is organized as follows. Section 2 explains cell layout generation. Section 3 describes the details of the developed libraries, and Section 4 explains the library verification on the fabricated chips. Section 5 concludes the paper.

### 2 Layout generation system: VARDS

We have developed a cell layout generation system called VARDS [1] and improved it for VDSM library generation. VARDS generates cell layouts from grid-based symbolic layouts [4] while managing design rules. Given process-independent symbolic layouts, the design rules of the manufacturing process, and a map file that describes, for example, restrictions on gate width, VARDS generates physical layouts in GDSII or CIF format that satisfy the design rules (Fig. 1). The features of VARDS are:

1. process-portable layout generation
2. dense layout
3. easy generation of cells with various driving strengths (cell height, transistor sizes inside a cell)
4. generation of phase shift masks
5. support for processes whose transistor and wire pitches differ.

Because the system is based on symbolic layout, the generated cell layouts are predictable when transistor sizes inside a cell are changed, whereas cell generators whose input files are SPICE netlists sometimes generate totally different cell layouts even for small differences in transistor sizes. VARDS can also easily generate cell layouts of different heights and tune transistor sizes inside a cell. Predictable layout generation is a key advantage of symbolic layout methods: in VDSM technologies, layout-dependent manufacturing variability is becoming significant, and predictable cell layouts help control this variability and contribute to design for manufacturability.

The fourth and fifth features were developed and implemented mainly to generate cell libraries in the 0.13 and 0.18 $\mu$m VDSM technologies. These features are explained in the following sections.

### 2.1 Symbolic layout

#### 2.1.1 Symbolic object and symbolic design rule

Symbolic layouts are created by placing symbolic objects on a virtual grid under a virtual design rule called the symbolic design rule. We hereafter call this virtual grid the cell grid. A symbolic object is a symbolized layout primitive that carries geometric information such as layer attributes and size, figure type, and so on. The cell grid is defined such that it corresponds to the pitch on the physical layout.

A symbolic design rule consists of spacing rules between symbolic objects and the regions where each symbolic object is allowed to be placed. VARDS calculates the grid spacings that satisfy the spacing rules between symbolic objects and selects the largest spacing as the pitch of the physical layout. P&R tools commonly route on a routing grid defined by the interconnect pitches, so the height and width of a standard cell layout should be multiples of the routing pitches.
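As a rough illustration of the two rules above, the sketch below picks the cell-grid pitch as the largest centre-to-centre spacing implied by a set of spacing rules, and rounds a cell dimension up to a whole number of routing pitches. The object names, widths, and spacing values are hypothetical (not taken from any process in this paper), and the centre-to-centre formula (half width + spacing + half width) is an assumed simplification of how VARDS derives grid spacings. Integer nanometres are used to avoid floating-point rounding.

```python
from typing import Dict, Tuple

# Hypothetical drawn widths (nm) of symbolic objects; not from a real process.
WIDTHS: Dict[str, int] = {"POLY": 180, "CONT": 220, "M1W": 230}

# Hypothetical edge-to-edge spacing rules (nm) between objects that may
# sit on adjacent cell-grid lines.
SPACING: Dict[Tuple[str, str], int] = {
    ("POLY", "CONT"): 160,
    ("M1W", "M1W"): 230,
    ("CONT", "CONT"): 250,
}


def centre_spacing(a: str, b: str, rule: int) -> int:
    """Centre-to-centre distance needed so that objects a and b obey the rule."""
    return WIDTHS[a] // 2 + rule + WIDTHS[b] // 2


def select_pitch() -> int:
    """Cell-grid pitch = the largest spacing required by any rule."""
    return max(centre_spacing(a, b, s) for (a, b), s in SPACING.items())


def snap_to_routing(length: int, routing_pitch: int) -> int:
    """Round a cell width or height up to a multiple of the routing pitch."""
    tracks = -(-length // routing_pitch)  # ceiling division on integers
    return tracks * routing_pitch


if __name__ == "__main__":
    print(select_pitch())              # 470 (the CONT-CONT rule dominates)
    print(snap_to_routing(5000, 560))  # 5040 = 9 routing pitches
```

Taking the maximum keeps every rule satisfied on a single uniform grid, at the cost of some area for the rules that needed less.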

#### 2.1.2 Cell layout

A cell layout generated by VARDS uses a one-dimensional transistor placement style. Each cell layout has two transistor regions: a p-transistor region and an n-transistor region. The p-transistor region is in the upper part of the cell layout, and the n-transistor region is in the lower part. The power line and n-well contacts are placed at the top of the cell layout, and the ground line and p-well contacts are placed at the bottom.

Figure 1. Overview of cell layout generation system (VARDS).

| @BEGIN AOI21 |               |               |
|--------------|---------------|---------------|
| BOUNDARY     | 0.50,0.00     | 4.50,9.00     |
| NWELL        | -0.50,10.00   | 5.50,5.00     |
| PMOS2        | 1.50,7.00     | 1.00          |
| NMOS2        | 1.50,2.00     | 1.00          |
| POLY         | 1.50,P_TOP    | 1.50,N_TOP    |
| PMOS2        | 2.50,7.00     | 1.00          |
| NMOS2        | 2.50,2.00     | 1.00          |
| POLY         | 2.50,P_TOP    | 2.50,N_TOP    |
| PMOS2        | 3.50,7.00     | 1.00          |
| NMOS2        | 3.50,2.00     | 1.00          |
| POLY         | 3.50,P_TOP    | 3.50,N_TOP    |
| PDCONT       | 1.00,P_BOTTOM | 1.00,P_TOP    |
| M1W          | 1.00,0.50     | 1.00,N_BOTTOM |
| NDCONT       | 1.00,N_BOTTOM | 1.00,N_TOP    |
| PDCONT       | 2.00,P_BOTTOM | 2.00,P_TOP    |
| NDCONT       | 2.00,N_BOTTOM | 2.00,N_TOP    |
| M1W          | 3.00,8.50     | 3.00,P_BOTTOM |
| @END         |               |               |

Figure 2. An example of symbolic layout: **AOI21**. (b) Description.

Fig. 2 shows an example of a symbolic layout: Fig. 2(a) is a graphical view, and Fig. 2(b) is part of the description of the symbolic layout. The numbers in Fig. 2(b) express placement coordinates. Some coordinates in Fig. 2(b) are represented not by numbers but by labels; these refer to hierarchical grids. A hierarchical virtual grid on a transistor is called MOSGRID. A transistor has three MOSGRID coordinates: TOP, MIDDLE, and BOTTOM. MOSGRID is explained in the next section.
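To make the description format of Fig. 2(b) concrete, the following sketch parses lines of the form `OBJECT x,y x,y` (or `OBJECT x,y value`) between `@BEGIN` and `@END`, and resolves label coordinates through a lookup table. The numeric values assigned to labels such as `P_TOP` are hypothetical stand-ins; in VARDS they would come from the hierarchical (MOSGRID) translation, and the real file syntax may differ from this reading of the excerpt.

```python
from typing import Dict, List, Tuple, Union

Arg = Union[float, Tuple[float, float]]


def resolve(token: str, labels: Dict[str, float]) -> float:
    """A number is a cell-grid coordinate; anything else is a hierarchical label."""
    try:
        return float(token)
    except ValueError:
        return labels[token]


def parse_symbolic(text: str, labels: Dict[str, float]) -> Tuple[str, List[Tuple[str, List[Arg]]]]:
    cell = ""
    objects: List[Tuple[str, List[Arg]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@BEGIN"):
            cell = line.split()[1]  # cell name, e.g. AOI21
            continue
        if line.startswith("@END"):
            break
        name, *fields = line.split()
        args: List[Arg] = []
        for field in fields:
            if "," in field:  # an x,y coordinate pair
                x, y = field.split(",")
                args.append((resolve(x, labels), resolve(y, labels)))
            else:             # a scalar parameter
                args.append(resolve(field, labels))
        objects.append((name, args))
    return cell, objects


EXAMPLE = """
@BEGIN AOI21
PMOS2 1.50,7.00 1.00
NMOS2 1.50,2.00 1.00
POLY 1.50,P_TOP 1.50,N_TOP
PDCONT 1.00,P_BOTTOM 1.00,P_TOP
@END
"""

# Hypothetical label values, for illustration only.
LABELS = {"P_TOP": 7.5, "P_BOTTOM": 6.5, "N_TOP": 2.5, "N_BOTTOM": 1.5}

if __name__ == "__main__":
    cell, objs = parse_symbolic(EXAMPLE, LABELS)
    print(cell)     # AOI21
    print(objs[2])  # ('POLY', [(1.5, 7.5), (1.5, 2.5)])
```

Keeping labels symbolic until a lookup table is supplied is what lets the same description be reused when transistor widths change.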



Figure 3. Translation into physical layout.

### 2.2 Tunability in transistor widths inside a cell

Conventional symbolic layout methods cannot assign transistor widths flexibly. VARDS introduces a hierarchical grid system to tune transistor widths flexibly while keeping the features of predictable and fast layout generation.

Symbolic layout methods generate layouts according to the cell grid, and the cell grid usually corresponds to the minimum transistor pitch. Conventionally, as in the upper case of Fig. 3, the transistor width is determined by the grid pitch. Transistor widths can be extended by increasing the grid pitch in the transistor region. However, because the grid pitch is already at its minimum, simply decreasing it to reduce transistor widths causes a design rule violation. With the conventional method, it is therefore impossible to reduce transistor widths while keeping the cell height.

VARDS therefore introduces a virtual grid that is hierarchically defined over the ordinary cell grid of the symbolic layout, which enables transistor downsizing with the cell height unchanged. The hierarchical virtual grid on MOSFETs is called MOSGRID. MOSGRID is mapped onto the cell grid of the symbolic layout according to the given transistor width, and the cell grid is then translated into actual coordinates (see the lower part of Fig. 3). Thanks to this two-step hierarchical translation, objects expressed in MOSGRID are placed properly, because the actual coordinates are not fixed in the symbolic layout.
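The two-step translation can be sketched as follows. Here each transistor is assumed to keep its MIDDLE coordinate fixed on the cell grid while TOP and BOTTOM move to fractional cell-grid positions derived from the requested width; the anchoring choice, the `Transistor` type, and the numbers are illustrative assumptions rather than the VARDS data model. Because the cell grid itself is not changed, shrinking a transistor does not change the cell height.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Transistor:
    middle: float    # cell-grid Y of the transistor centre line (assumed anchor)
    width_um: float  # requested gate width in µm, e.g. from the map file


def mosgrid_to_cell(t: Transistor, pitch_um: float) -> Dict[str, float]:
    """Step 1: map MOSGRID labels to (possibly fractional) cell-grid coordinates."""
    half = t.width_um / 2 / pitch_um  # half width expressed in grid units
    return {"BOTTOM": t.middle - half, "MIDDLE": t.middle, "TOP": t.middle + half}


def cell_to_physical(y_cell: float, pitch_um: float, origin_um: float = 0.0) -> float:
    """Step 2: map a cell-grid coordinate to a physical coordinate in µm."""
    return origin_um + y_cell * pitch_um


def place(t: Transistor, pitch_um: float) -> Dict[str, float]:
    """Apply both steps to every MOSGRID label of one transistor."""
    return {k: cell_to_physical(v, pitch_um) for k, v in mosgrid_to_cell(t, pitch_um).items()}


if __name__ == "__main__":
    pitch = 0.8  # hypothetical cell-grid pitch (µm) chosen from the design rules
    full = place(Transistor(middle=7.0, width_um=0.8), pitch)
    small = place(Transistor(middle=7.0, width_um=0.5), pitch)
    print(full["TOP"] - full["BOTTOM"])    # about 0.8
    print(small["TOP"] - small["BOTTOM"])  # about 0.5; MIDDLE is unchanged
```

Since the pitch enters only at translation time, the same MOSGRID description can be re-targeted to another process by supplying a different pitch.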

### 2.3 Phase shift mask

A phase shift mask changes the phase of the light that passes through the mask. By exploiting the interference of light shifted by 180 degrees, the effective resolution of layout patterns can be improved, and the phase shift mask technique is now commonly used. Figure 4 shows examples of layouts with phase shift masks. The boundary of the phase shift mask is placed at the center of the gate poly, and the phase alternates across successive boundaries. In general, there are layout patterns to which a phase shift mask cannot be applied, and hence symbolic layouts must be designed so that phase shift masks can be placed. In the right example of Fig. 4, the phase shift mask cannot be applied to the part with the horizontal interconnect.

Figure 4. Layout examples with phase shift mask. In the right example, phase shift mask cannot be applied.

Figure 5. Phase shift mask and poly interconnect.
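The alternating-phase requirement can be checked mechanically. A common way to express it (a generic technique, not necessarily the procedure used in VARDS) is as graph two-coloring: each transparent region next to a critical poly gate is a node, each gate contributes an edge between the regions on its two sides, and the phases 0 and 180 degrees must alternate across every edge. If the graph is not two-colorable, no valid phase assignment exists for that layout. The region names below are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple


def assign_phases(edges: Iterable[Tuple[str, str]]) -> Optional[Dict[str, int]]:
    """Return a 0/180-degree phase per region, or None on a phase conflict."""
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    phase: Dict[str, int] = {}
    for start in adj:
        if start in phase:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:  # breadth-first two-coloring of one connected group
            u = queue.popleft()
            for v in adj[u]:
                if v not in phase:
                    phase[v] = 180 - phase[u]
                    queue.append(v)
                elif phase[v] == phase[u]:
                    return None  # odd cycle: both sides of some gate share a phase
    return phase


if __name__ == "__main__":
    # Three gates in a row separate four regions: phases alternate.
    print(assign_phases([("S0", "S1"), ("S1", "S2"), ("S2", "S3")]))
    # Three regions pairwise separated by gates form an odd cycle.
    print(assign_phases([("A", "B"), ("B", "C"), ("C", "A")]))  # None
```

A conflict reported here would correspond to a symbolic layout that has to be redesigned before a phase shift mask can be placed.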

In cells where each PMOS/NMOS pair is connected by a straight poly line, a rectangular phase shift mask should be generated. In complex cells such as flip-flops, however, a PMOS/NMOS pair is not necessarily connected by a straight line, and a simple rectangular phase shift mask may then cause a design rule violation (Fig. 5). We therefore improved VARDS so that a polygonal phase shift mask can be generated according to the shape of the poly interconnects. When there is a horizontal poly interconnect, the phase shift mask needs to be placed on it, and VARDS was improved to place the mask in that way.
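The difference between a rectangular and a polygonal mask can be illustrated with a small geometric sketch: given a Manhattan poly centre line running from the top of the cell to the bottom, the shifter region on one side is closed off along a fixed right-hand edge, so the mask boundary follows the bend of the poly instead of a single straight line. The function names, the choice of the right-hand side, and the coordinates (in nm) are hypothetical; the actual polygon construction in VARDS is not described at this level of detail.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def is_manhattan(path: List[Point]) -> bool:
    """Every segment must be purely horizontal or purely vertical."""
    return all(x1 == x2 or y1 == y2 for (x1, y1), (x2, y2) in zip(path, path[1:]))


def shifter_polygon(centre_line: List[Point], x_right: int) -> List[Point]:
    """Region right of a top-to-bottom poly centre line, closed at x = x_right."""
    if not is_manhattan(centre_line):
        raise ValueError("poly centre line must be Manhattan")
    top, bottom = centre_line[0], centre_line[-1]
    return centre_line + [(x_right, bottom[1]), (x_right, top[1])]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula for a simple polygon."""
    s = 0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


if __name__ == "__main__":
    # Poly that jogs right by 200 nm halfway down the cell.
    line = [(100, 900), (100, 500), (300, 500), (300, 0)]
    mask = shifter_polygon(line, x_right=500)
    print(mask)
    print(polygon_area(mask))  # 260000.0
```

A rectangle spanning x = 100 to 500 over the full height would also cover the area left of the lower poly segment, which is the kind of overlap a polygonal outline avoids.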

### 2.4 Mismatch between transistor pitch and wire pitch

In recent technologies, there are processes whose transistor pitch (that is, the distance between the source contact and the drain contact) and wire pitch differ. Conventional symbolic layout methods assume that the interconnect pitch and the transistor pitch are the same, and hence the minimum interconnect pitch cannot be utilized (Fig. 6(a), where the wire pitch is smaller than the transistor pitch). To enable routing with the minimum wire pitch, we improved VARDS so that terminals are placed on wire tracks by off-grid processing (Fig. 6(b)).

Figure 6. Relation between wire track and cell grid. (a) Difference between wire pitch and transistor pitch; (b) off-grid processing of a terminal.

In the description of symbolic layouts, the position of an object is specified by its X and Y values on the cell grid, e.g., 2.0,2.0. To place terminals on the neighboring wire tracks, we add a suffix R, L, or C to the X value and a suffix U, D, or C to the Y value. For example, when 2.0R,2.0U is given, the terminal is placed at the intersection of the vertical wire track adjacent to the right of X=2.0 and the horizontal wire track adjacent above Y=2.0. The suffix C means the nearest position on a wire track. Simply moving the terminal may cause a design rule violation with neighboring interconnects. Therefore, when the suffix C is given, we first check whether a design rule violation occurs and move the terminal only if no violation occurs; if a violation occurs, another candidate is tried.
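The suffix rules can be sketched as a small resolver. Wire tracks are given as explicit coordinate lists, the cell-grid value is converted with the transistor (cell-grid) pitch, and `is_legal` stands in for the design-rule check against neighbouring interconnects; all three are hypothetical inputs. This sketch reads "adjacent" as the nearest track strictly to the right/left (or above/below), which is one interpretation of the text, and applies the design-rule check only to C candidates, as described above.

```python
from typing import Callable, List, Optional, Tuple

X_SUFFIXES = {"R", "L", "C", ""}
Y_SUFFIXES = {"U", "D", "C", ""}
Check = Callable[[str, float], bool]  # hypothetical DRC hook: (axis, position) -> ok?


def split_suffix(token: str) -> Tuple[float, str]:
    """'2.0R' -> (2.0, 'R'); '2.0' -> (2.0, '')."""
    token = token.strip()
    if token and token[-1].isalpha():
        return float(token[:-1]), token[-1]
    return float(token), ""


def resolve_axis(axis: str, value: float, suffix: str, grid_pitch: float,
                 tracks: List[float], is_legal: Check) -> Optional[float]:
    pos = value * grid_pitch  # cell-grid value -> physical coordinate
    if suffix == "":
        return pos  # no suffix: the terminal stays on the cell grid
    if suffix in ("R", "U"):  # nearest track to the right / above
        ahead = [t for t in tracks if t > pos]
        return min(ahead) if ahead else None
    if suffix in ("L", "D"):  # nearest track to the left / below
        behind = [t for t in tracks if t < pos]
        return max(behind) if behind else None
    # "C": try tracks from nearest to farthest; keep the first that passes the check
    for t in sorted(tracks, key=lambda trk: abs(trk - pos)):
        if is_legal(axis, t):
            return t
    return None  # every candidate violates a design rule


def resolve_terminal(spec: str, grid_pitch: float, x_tracks: List[float],
                     y_tracks: List[float],
                     is_legal: Check = lambda axis, pos: True
                     ) -> Tuple[Optional[float], Optional[float]]:
    x_tok, y_tok = spec.split(",")
    x, sx = split_suffix(x_tok)
    y, sy = split_suffix(y_tok)
    if sx not in X_SUFFIXES or sy not in Y_SUFFIXES:
        raise ValueError(f"bad suffix in {spec!r}")
    return (resolve_axis("x", x, sx, grid_pitch, x_tracks, is_legal),
            resolve_axis("y", y, sy, grid_pitch, y_tracks, is_legal))


if __name__ == "__main__":
    tracks = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical 0.5 µm wire pitch
    pitch = 0.6                                   # hypothetical 0.6 µm transistor pitch
    print(resolve_terminal("2.0R,2.0U", pitch, tracks, tracks))  # (1.5, 1.5)
    print(resolve_terminal("2.0C,2.0C", pitch, tracks, tracks))  # (1.0, 1.0)
```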

By assigning the proper suffixes in the symbolic layout description according to the cell layout, input and output terminals can be placed on wire tracks automatically.

### 3 Developed standard cell libraries

This section describes the cell libraries generated by VARDS. Table 1 lists the cell variety of our libraries in logical function and driving strength. The 0.18 and 0.35 $\mu$m library sets each contain 310 cells, and the organization of each library in logical function and driving strength is the same. For the 0.13 $\mu$m process, we generated only basic cells, 38 in total.

### Table 1. Variation in Functionality and Driving-Strength

| Functionality | Driving-Strength |
|---------------|------------------|
| INV, BUF | 005, 010, 015, 020, 030, 040, 050, 060, 080, 120, 160 |
| NAND2, NAND3, NOR2, NOR3, AOI21, AOI22, AOI211, OAI21, OAI22, OAI211 | 005, 010, 015, 020, 030, 030S, 040, 040S, 060, 060S, 080, 080S |
| NAND4, NOR4 | 005, 010, 015, 020, 030, 040, 040S, 060, 060S, 080, 080S |
| AND2, AND3, OR2, OR3 | 005, 010, 015, 020, 030, 040, 040S, 060, 080, 080S |
| AND4, OR4 | 005, 010, 015, 020, 030, 040, 040S, 060, 080 |
| XOR2, XNOR2 | 005, 010, 020, 030, 040, 060, 080 |
| HAD1, FAD1, MUX2, TINV, TBUF | 010, 020, 030, 040, 060, 080 |
| DF, DFR, DFSR, DFN, DFNR, DFNSR | 010, 020, 040, 060, 080, 120, 160 |

"010" represents the standard driving strength. The cells labeled "S" are composed of parallel-connected cells.

The principal features of our libraries are:

- The driving strength ranges widely from ×0.5 to ×16, which improves flexibility under various load conditions, from a very small load to a huge fanout load. This is an essential requirement for high-performance circuits, especially in DSM technologies [3].
- Cells with small driving strength (×0.5) are very effective in reducing power dissipation [2]. Small steps in driving strength enable fine-tuning in delay and power optimization.
- Two types of high-driving-strength cells are prepared: one is composed of parallel cells and is suitable for high-speed design; the other has a serially connected structure with a tapering ratio and is suited to compact, low-power design.
- The variety in functionality is not large, because a compact set is sufficient for circuit design [3, 5]. We chose the necessary and sufficient variety in functionality shown in Table 1 from the viewpoints of both circuit performance and library development cost.
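The driving-strength codes in Table 1 can be read as ten times the strength relative to the standard cell: "010" is the standard (×1) strength, so the codes "005" to "160" correspond to the ×0.5 to ×16 range quoted above, and a trailing "S" marks the parallel-connected variant. A minimal decoder under that reading (the function name is hypothetical):

```python
from typing import Tuple


def decode_strength(code: str) -> Tuple[float, bool]:
    """'040S' -> (4.0, True): strength relative to '010', and whether parallel."""
    parallel = code.endswith("S")
    digits = code[:-1] if parallel else code
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"unexpected strength code {code!r}")
    return int(digits) / 10, parallel


if __name__ == "__main__":
    for c in ("005", "010", "040", "040S", "160"):
        print(c, decode_strength(c))
```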

The cell height and PMOS/NMOS ratio, which are the basic specifications of standard cell libraries, are shown in Table 2. For the 0.13 and 0.35 $\mu$m processes, we developed high-performance (HP) and low-power (LP) libraries with different cell heights. The cell height of the HP library is fourteen interconnect pitches, and that of the LP library is nine interconnect pitches.

Table 3 shows the basic performance of the developed 0.35 $\mu$m libraries. For comparison, Table 3 also lists the performance of a library generated by a commercial module generator (MG Lib.) and that of a commercial library dedicated to the process (Fab Lib.). Information on Fab Lib. is restricted, so only the parameters known to us are listed. Because VARDS can cope with the difference between wire and transistor pitches, the interconnect pitch of the developed libraries is the same as that of Fab Lib. In addition, the area of the D flip-flop with reset in the LP library is the same as that of Fab Lib., which shows that the layout density of our library is competitive. Table 3 also shows the delay difference between the HP and LP libraries. The difference in the FO1 case comes from the PMOS/NMOS ratio, and the delay difference with a 200 $\mu$m interconnect load shows that the HP library is effective in reducing delay in large-scale high-speed designs.

Figure 7. Examples of Cell Layouts. From the left, DFP010 in 0.18 $\mu$m process, AOI21P010 with different cell heights in 0.35 $\mu$m process, and DFRP010 in 0.13 $\mu$m process.

### **Table 2. Basic Specification of Cell Libraries**

|                           | 0.35 $\mu$m (HP) | 0.35 $\mu$m (LP) | 0.18 $\mu$m | 0.13 $\mu$m (HP) | 0.13 $\mu$m (LP) |
|---------------------------|------------------|------------------|-------------|------------------|------------------|
| Cell height (int. pitches) | 14              | 9                | 12          | 14               | 9                |
| PN ratio (PMOS/NMOS)      | 1.23             | 1.00             | 1.33        | 1.23             | 1.00             |

### Table 3. Performance Comparison (0.35 $\mu$m technology; values normalized to LP Lib.)

|                                | HP Lib. | LP Lib. | MG Lib. | Fab Lib. |
|--------------------------------|---------|---------|---------|----------|
| Cell height                    | 1.56    | 1       | 1.31    | 1        |
| Int. pitch                     | 1       | 1       | 1.07    | 1        |
| Tr. width (NMOS)               | 1.82    | 1       | 1.31    | -        |
| DFR area                       | 1.56    | 1       | -       | 1        |
| INV delay (FO1)                | 0.91    | 1       | 0.99    | -        |
| INV delay (FO1, 200 $\mu$m int.) | 0.77  | 1       | 0.91    | -        |

### Table 4. Design results of RISC processor core.

| Constraint | Metric          | HP Lib. | LP Lib.    | MG Lib.    |
|------------|-----------------|---------|------------|------------|
| 100 MHz    | Area (mm$^2$)   | 1.62    | 1.11       | 1.18       |
|            | Tr. width (mm)  | 393.4   | 212.8      | 284.2      |
| 130 MHz    | Area (mm$^2$)   | 1.74    | impossible | impossible |
|            | Tr. width (mm)  | 424.5   | -          | -          |

Figure 8. Design results of DCT circuit.

To demonstrate the performance difference between the HP and LP libraries, we designed a 32-bit RISC processor core using the 0.35 $\mu$m libraries. The design constraints are 100 MHz and 130 MHz, and we compare delay, circuit area, and the sum of transistor widths, where the sum of transistor widths directly corresponds to power consumption. The circuit is designed with a commercial logic synthesis tool and a P&R tool. After P&R, we tune the driving strength of each instance according to the P&R results. The results are shown in Table 4. In the 100 MHz design, our LP library reduces area by 6% and total transistor width (power dissipation) by 25% compared with MG Lib. The HP library achieves the 130 MHz design, whereas the LP and MG libraries cannot satisfy the design constraint. We also designed a DCT circuit [6]; Figure 8 shows the delay-area trade-off. The HP library provides the fastest circuit implementation, whereas the LP library gives the minimum circuit area, as expected. Circuit designers can choose a suitable library according to the required circuit performance.
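Reading "our library" in the 100 MHz comparison as the LP library versus MG Lib., the quoted reductions follow directly from the Table 4 entries:

$$
\frac{1.18 - 1.11}{1.18} = \frac{0.07}{1.18} \approx 0.059 \approx 6\%, \qquad
\frac{284.2 - 212.8}{284.2} = \frac{71.4}{284.2} \approx 0.251 \approx 25\%.
$$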





Figure 9. Chip micrographs. From the left, 0.18 $\mu$m chip (2.8 mm square), 0.35 $\mu$m chip (4.8 mm square), and two blocks in the 0.13 $\mu$m chip (360×230 $\mu$m$^2$).

### 4 Verification

We designed TEG circuits to verify our libraries for every process; the chip micrographs are shown in Fig. 9. The TEG circuits consist of two groups: one verifies logical function, and the other evaluates cell delay. The TEG chip of the 0.13 $\mu$m process is intended only to evaluate cell delay. We first explain the TEG for logical verification. We construct chains of serially connected combinational cells and check whether a signal transition propagates through the chain, where all input pins except the one under examination are fixed high or low so that the transition propagates to the output. The designed circuit covers all combinational cells and all of their input-output combinations. Flip-flops (DFP040, DFRP040, DFSRP040, DFNP040, DFNRP040, DFNSRP040) are examined individually; i.e., all input signals are given independently and their outputs are observed. We build dividers using the remaining FFs and check the divider behavior. Tri-state buffers and inverters are tested for whether their enable inputs allow or block signal propagation. For cell delay evaluation, we construct ring oscillators using basic combinational cells and evaluate their oscillation frequencies.
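For the chain-based logic test, the side inputs must be set so that a transition on the examined pin actually reaches the output. The sketch below finds all such side-input assignments by brute force over a cell's truth table. The Boolean models of NAND2 and AOI21 are standard, while the function names and the way cells are represented are illustrative assumptions, not the TEG generator used for these chips.

```python
from itertools import product
from typing import Callable, List, Tuple

Cell = Callable[..., int]


def nand2(a: int, b: int) -> int:
    return int(not (a and b))


def aoi21(a1: int, a2: int, b: int) -> int:
    return int(not ((a1 and a2) or b))


def sensitizing_side_inputs(f: Cell, n_inputs: int, pin: int) -> List[Tuple[int, ...]]:
    """Side-input values (other pins, in order) for which toggling `pin` toggles the output."""
    result = []
    for side in product((0, 1), repeat=n_inputs - 1):
        low = list(side)
        low.insert(pin, 0)   # examined pin held at 0
        high = list(side)
        high.insert(pin, 1)  # examined pin held at 1
        if f(*low) != f(*high):
            result.append(side)
    return result


if __name__ == "__main__":
    print(sensitizing_side_inputs(nand2, 2, 0))  # [(1,)]
    print(sensitizing_side_inputs(aoi21, 3, 0))  # [(1, 0)]: A2 high, B low
    print(sensitizing_side_inputs(aoi21, 3, 2))  # [(0, 0), (0, 1), (1, 0)]
```

Enumerating every sensitizing assignment, rather than just one, is what allows a test plan to cover all input-output combinations of each cell.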

The library TEG chips of the 0.18 $\mu$m and 0.35 $\mu$m processes were measured with an HP83000 LSI tester. We confirmed that all combinational cells and FFs work correctly. We also observed that the cell delay of basic combinational cells is within the process variation range announced by the fabs. The chip of the 0.13 $\mu$m technology was measured using a probe card with twelve needles, and we confirmed that the measured cell delay and power dissipation are close to the simulation results.
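Extracting a per-stage delay from a ring-oscillator measurement is commonly done with the standard relation $T = 2 N t_{pd}$ for a ring of $N$ inverting stages ($N$ odd), i.e. $t_{pd} = 1/(2 N f)$. The paper does not give the ring lengths or measured frequencies, so the numbers below are purely hypothetical, and any frequency divider or output buffer in the measurement path would have to be accounted for separately.

```python
def stage_delay(freq_hz: float, n_stages: int) -> float:
    """Average propagation delay per stage (s) of an N-stage inverting ring oscillator."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring of inverting stages needs an odd count >= 3")
    return 1.0 / (2 * n_stages * freq_hz)


if __name__ == "__main__":
    # Hypothetical: a 51-stage ring oscillating at 100 MHz
    print(f"{stage_delay(100e6, 51) * 1e12:.1f} ps")  # 98.0 ps
```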

### 5 Conclusion

This paper described the automatic generation of standard cell layouts in VDSM technologies. We improved the generator so as to cope with phase shift masks and the mismatch between transistor and wire pitches. We also developed standard cell libraries that are rich in driving strength for 0.35 $\mu$m, 0.18 $\mu$m, and 0.13 $\mu$m processes. The developed libraries are competitive in layout density, and the 0.35 and 0.18 $\mu$m libraries are currently used in the periodic fabrication service for Japanese universities.

### Acknowledgement

The VLSI chip in this study has been fabricated in the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Corporation, Toppan Printing Corporation, Hitachi Ltd., Dai Nippon Printing Corporation and STARC (Semiconductor Technology Academic Research Center). This work is supported in part by the 21st Century COE Program (Grant No. 14213201).

### References

- [1] H. Onodera, M. Hashimoto and T. Hashimoto, "ASIC Design Methodology with On-Demand Library Generation," In *Proc. Symposium on VLSI Circuits*, pp.57-60, 2001.
- [2] M. Hashimoto and H. Onodera, "Post-Layout Transistor Sizing for Power Reduction in Cell-Base Design," *IEICE Trans. on Fundamentals*, Vol. E84-A, No. 11, pp.2769-2777, Nov. 2001.
- [3] G. A. Northrop and P.-F. Lu, "A Semi-Custom Design Flow in High-Performance Microprocessor Design," In *Proc. DAC*, pp.426-431, 2001.
- [4] N. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective," Addison-Wesley Publishing Company, 1985.
- [5] N. M. Duc and T. Sakurai, "Compact yet High-Performance (CyHP) Library for Short Time-to-Market with New Technologies," In *Proc. ASP-DAC*, pp.475-480, 2000.
- [6] Sherif Taher Eid, "Project: DCT- Discrete Cosine Transformer," http://www.opencores.org/.

