# DESIGN OF DIGITAL STRUCTURES TOLERANT TO LOCAL INTRA-DIE PROCESS VARIATIONS

Daniel Iparraguirre Cárdenas<sup>(1)</sup> and Víctor Champac Vilela<sup>(2)</sup>

(1) Freescale Semiconductor, Mexico Technology Center, Jalisco – Mexico.
(2) Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla – México.

daniel.iparraguirre@freescale.com, champac@inaoep.mx

# ABSTRACT

Process variations impose strong limits on the performance of digital circuits at gigascale integration. They are classified into two types: inter-die and intra-die variations. Whereas inter-die variations affect the deviation of the performance distribution in a lot of chips, intra-die variations affect its mean. This work proposes a new design methodology for digital circuits that reduces the impact of local intra-die process variations. It comprises a set of transistor structures and logic gates, as well as selection criteria for these structures. The methodology is applied by rebuilding critical paths with the new structures, reducing their propagation delay variability at the expense of circuit area. Simulation results show a significant reduction in performance variability for a moderate increase in area and power consumption.

# **1. INTRODUCTION**

A synchronous clocked digital system contains a number of critical paths whose propagation delays determine the system performance. Since process variability affects propagation delays, and therefore the final performance, a lot of chips manufactured from the same design exhibits a statistical distribution of performance. According to Bowman [1], inter-die variability directly affects the deviation of this distribution, whereas intra-die variability affects its mean. This has significant consequences for the digital designer, because intra-die variations, which were ignored in the past, have become increasingly significant and are now larger than inter-die variations.

The scope of the present work is to develop a methodology for robust digital design oriented to reducing the impact of local intra-die process variations. Since the performance of every chip is determined essentially by its critical paths, the methodology is applied only to these paths. The key observation behind this work is that intra-die variations are averaged in an array of components, so that the relative performance deviation of the array is reduced in comparison to that of a single component. This suggests implementing new digital structures with a library of new gates, in which single transistors are replaced with arrays of transistors in a parallel, serial or mixed fashion. Adequate structure selection and transistor sizing achieve a significant variability reduction, at the expense of more area for the given critical path.

This paper is organized as follows: Section 2 presents the fundamentals of the proposed methodology and its effectiveness when applied to critical paths. Section 3 describes the structure library. Section 4 applies the proposed methodology to the design of a generic critical path and reports the improvements achieved in simulation. Finally, Section 5 gives the conclusions of the present work.

# 2. THE PROPOSAL FUNDAMENTALS

### 2.1. Reduction of intra-die variability

An example of a critical path is illustrated in figure 1(a), with its electrical representation shown in figure 1(b). Since high-to-low and low-to-high transitions involve charging and discharging capacitances at each gate, the propagation delay depends on every electrical component (resistance and load capacitance) present in the model. Likewise, the propagation delay variability is a function of the resistance and capacitance variabilities at every stage.

According to Pelgrom's model [2], [3], local intra-die variations are uncorrelated from device to device, so the resistances and capacitances coming from different stages in figure 1(b) are expected to be uncorrelated as well. For this reason, the approach applied here to reduce the impact of local intra-die variations works on every individual component (that is, on the transistors of every gate). This leads to the idea of replacing transistors with components whose behavior is expected to be more tolerant to local intra-die variations.
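The averaging effect that motivates this replacement can be illustrated numerically. The following Python sketch is not part of the original methodology: it assumes a hypothetical array of n identical unit resistors in series, each with an independent Gaussian deviation of 10%, and compares the analytical 1/sqrt(n) reduction of the relative deviation with a small Monte Carlo estimate. The function names and the 10% figure are illustrative assumptions.

```python
import math
import random
import statistics


def relative_sigma_of_series_array(n, rel_sigma_single):
    """Analytical sigma/mean of n identical series resistors whose
    variations are independent: it falls as 1/sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return rel_sigma_single / math.sqrt(n)


def simulate_series_array(n, rel_sigma_single, samples=20000, seed=1):
    """Monte Carlo estimate of sigma/mean for the total series resistance.

    Each resistor has a nominal value of 1 (assumed) and an independent
    Gaussian deviation; the seed is fixed so the run is repeatable.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(1.0, rel_sigma_single) for _ in range(n))
        for _ in range(samples)
    ]
    return statistics.pstdev(totals) / statistics.fmean(totals)


if __name__ == "__main__":
    for n in (1, 2, 4):
        analytical = relative_sigma_of_series_array(n, 0.10)
        simulated = simulate_series_array(n, 0.10)
        print(f"n={n}: analytical={analytical:.4f} simulated={simulated:.4f}")
```

In the proposed structures the transistor widths are additionally resized so that the array matches the resistance of the transistor it replaces, which this sketch does not model.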

### 2.2. The proposal's core

The present design proposal consists of the use of transistor structures, which are intended to replace single transistors in certain gates of the critical path. The structures average the variation effects of single transistors, yielding more consistent performance characteristics. The design methodology involves identifying and replacing the elements inside the path that contribute most to the propagation delay variability; the applied structures are sized so that the new path matches the original propagation delay.

Fig. 1. A generic 3-stage logic path: (a) external description and (b) its electrical and timing characteristics

Figure 2 depicts the core of the design proposal. The inverter in the critical path shown has previously been selected for redesign, so the original topology is replaced by two structures, one for the NMOS section and one for the PMOS section: two serial transistors constitute the N structure and two parallel transistors make up the P structure, as shown in figure 2(a). A side effect of this replacement is a change in the input capacitance of the stage; this implies a further structure replacement or resizing at the previous stage in order to maintain the original path propagation delay, as portrayed in figure 2(b).

Arising from the above example, the main proposal characteristics are the following:

- The structures are essentially arrays in serial, parallel and mixed fashion, belonging to a structure library.
- Channel width W is the only parameter to be varied when the structure is sized; channel length L is restricted to the minimal value.
- Area and power are the cost to pay.
- It is up to the designer to specify the resource constraints to be afforded, as well as the desired variability reduction.



Fig. 2. Structure replacement process in a path: (a) stage replacement, and (b) additional replacement for compensating the new load capacitance. The new topologies provide more consistent performance characteristics that improve the path's robustness.

# 3. THE STRUCTURE LIBRARY

### 3.1. Definition

The library, which is the core of the current proposal, consists of a set of transistor structures characterized for their output (drain-source) resistance and input (gate) capacitance, together with the remaining electrical information about them; structures of up to 4 transistors have been considered in the present work. Figure 3 shows the structures of 1, 2 and 3 transistors in their schematic representations, as well as their layout geometries; figure 4 shows the proposed 4-transistor structures. All transistors inside a structure share the same W, which is called the "structure width".

Electrical characterization is essential in order to establish the structure equivalence, which allows determining the structure widths when a replacement is performed inside a gate. The procedure may be analytical (considering that structures are basically built from parallel and serial arrangements) or empirical (working with simulation results). Each structure is assigned a topology class: parallel (arrays 2A, 3A and 4A), quasi-parallel (arrays 3B, 4B and 4C), mixed (arrays 4D and 4E), quasi-serial (arrays 3C, 4F and 4G) and serial (arrays 2B, 3D and 4H).

Fig. 3. Nomenclature, schematics and layout for the 1, 2 and 3-transistor structures.

Fig. 4. Nomenclature, schematics and layout for the 4-transistor structures.
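For the analytical route, a first-order estimate of the structure width follows from treating each transistor as a resistance inversely proportional to W at fixed L. The Python sketch below covers only the purely parallel and purely serial arrays under that assumption (ideal resistor behavior, no body effect or other second-order effects); the helper name `ideal_structure_width` is hypothetical and does not come from the original work.

```python
def ideal_structure_width(n, topology, w_single=1.0):
    """Width of each transistor in an n-transistor array that matches the
    resistance of one transistor of width w_single.

    Assumes R is proportional to 1/W at fixed L (first-order model only).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if topology == "parallel":
        # n resistances in parallel: each may be n times larger.
        return w_single / n
    if topology == "serial":
        # n resistances in series: each must be n times smaller.
        return w_single * n
    raise ValueError("only 'parallel' and 'serial' arrays are modeled")


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n,
              round(ideal_structure_width(n, "parallel"), 3),
              round(ideal_structure_width(n, "serial"), 3))
```

Under this idealization the parallel arrays 2A, 3A and 4A need W/2, W/3 and W/4, which agrees with the equivalent widths listed in Table I, whereas the serial arrays 2B, 3D and 4H would need 2W, 3W and 4W, somewhat more than the 1.7W, 2.4W and 3.1W listed there. The ideal model is therefore only a starting point for sizing.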

### 3.2. Variability effects of parameters on resistance

The SMOS model [2] states that local intra-die variations are inversely proportional to transistor dimensions. Accordingly, for a fixed channel length, the resistance variability can be expressed as shown in (1).

$$\sigma_{local}(R_1) = \frac{1}{W^2} \sqrt{a_{R1}W + a_{R2}} \tag{1}$$

where $a_{R1}$ merges the effects of the L, $V_{to}$, $t_{ox}$ and $\mu_0$ variations, and $a_{R2}$ accounts only for the variations of W.

Equation (1) allows finding the resistance variability of every structure. Table I shows the structure resistance variabilities, considering W variations to be 20% of the total variations [4] ($a_{R1} = 0.8$ and $a_{R2} = 0.2$) and a unit resistance variability for a single transistor of unit W ($\sigma_{W=1}(R_1) = 1$). Results are arranged in order of decreasing resistance variability.

The most evident fact emerging from Table I is that variability increases slightly for equivalent parallel structures, because of the term  $a_{R2}$  that represents W-variations. On the other hand, variability is decreased for mixed, quasi-serial and serial structures.
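As a numerical check of (1), the following Python sketch evaluates the expression for a single transistor and for a few of the per-structure expressions of Table I at unit width. It is only an illustration: the function name `sigma_r` and the selection of rows are assumptions of this sketch, and the prefactors and constant terms are copied from Table I rather than derived.

```python
import math


def sigma_r(w, k=1.0, a_r1=0.8, a_r2=0.2):
    """Evaluate (k / W^2) * sqrt(a_r1 * W + a_r2).

    With k = 1 this is equation (1); k and a_r2 can be set to the
    prefactor and constant term printed for a structure in Table I.
    """
    if w <= 0:
        raise ValueError("W must be positive")
    return k / w ** 2 * math.sqrt(a_r1 * w + a_r2)


# (prefactor, constant term) as printed in Table I for selected structures.
TABLE_I_ROWS = {
    "1": (1.0, 0.2),
    "2A": (1.0, 0.4),
    "4A": (1.0, 0.8),
    "2B": (0.551, 0.118),
    "4H": (0.288, 0.065),
}

if __name__ == "__main__":
    for name, (k, const) in TABLE_I_ROWS.items():
        print(name, round(sigma_r(1.0, k, 0.8, const), 3))
    # Effect of widening a single transistor from W = 1 to W = 2.
    print("single transistor, W = 2:", round(sigma_r(2.0), 3))
```

The selected rows reproduce the W = 1 column of Table I to three decimals, and the last line shows how quickly the variability of a single transistor drops as W grows.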

TABLE I ABSOLUTE AND RELATIVE VARIABILITIES FOR THE EQUIVALENT WIDTH, FOR  $a_{R1}$ =0.8 AND  $a_{R2}$ =0.2

| Struc. | Eq. width | $\sigma(R)$                            | $\sigma(R)$ at W = 1 | $C_{in}$ at W = 1 |
|--------|-----------|----------------------------------------|----------------------|-------------------|
| 4A     | 0.25W     | $\frac{1}{W^2}\sqrt{0.8W+0.8}$         | 1.265                | 1                 |
| 3A     | 0.333W    | $\frac{1}{W^2}\sqrt{0.8W+0.6}$         | 1.183                | 1                 |
| 2A     | 0.5W      | $\frac{1}{W^2}\sqrt{0.8W+0.4}$         | 1.095                | 1                 |
| 4B     | 0.386W    | $\frac{0.917}{W^2}\sqrt{0.8W+0.518}$   | 1.052                | 1.54              |
| 1      | W         | $\frac{1}{W^2}\sqrt{0.8W+0.2}$         | 1                    | 1                 |
| 3B     | 0.63W     | $\frac{0.861}{W^2}\sqrt{0.8W+0.318}$   | 0.963                | 1.89              |
| 4C     | 0.706W    | $\frac{0.865}{W^2}\sqrt{0.8W+0.283}$   | 0.900                | 2.82              |
| 4F     | 1.033W    | $\frac{0.641}{W^2}\sqrt{0.8W+0.194}$   | 0.637                | 4.13              |
| 3C     | 1.2W      | $\frac{0.597}{W^2}\sqrt{0.8W+0.167}$   | 0.587                | 3.6               |
| 4D     | 0.85W     | $\frac{0.551}{W^2}\sqrt{0.8W+0.235}$   | 0.570                | 3.4               |
| 4E     | 0.85W     | $\frac{0.551}{W^2}\sqrt{0.8W+0.235}$   | 0.570                | 3.4               |
| 2B     | 1.7W      | $\frac{0.551}{W^2}\sqrt{0.8W+0.118}$   | 0.528                | 3.4               |
| 4G     | 1.9W      | $\frac{0.401}{W^2}\sqrt{0.8W+0.105}$   | 0.381                | 7.6               |
| 3D     | 2.4W      | $\frac{0.378}{W^2}\sqrt{0.8W+0.083}$   | 0.355                | 7.2               |
| 4H     | 3.1W      | $\frac{0.288}{W^2}\sqrt{0.8W+0.065}$   | 0.268                | 12.4              |

# 4. PATH DESIGN

### 4.1. Considerations and criteria

The main criteria and strategies to be applied for structure replacement are the following:

1. Local contributions to variability add quadratically [2], so the largest improvement in path variability is reached when the largest local variability in the path is reduced. Large contributions to variability come from:
   - Stages driving large load capacitances.
   - Stages with a large variability imbalance between the N and P sections (such as NOR gates).
2. Every stage has a "replacement tail". As figure 2(b) suggests, a replacement at a certain stage entails further replacements at previous stages in order to compensate for the changes in load capacitance. For this reason, the last stages in the path are the first to be considered for improvement. Further structure replacements are then performed at the first stages.
3. Every replacement must look for "transistor economy", leaving the costly solutions (e.g. those with 4-transistor structures) for last.
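The first criterion can be illustrated with a root-sum-square estimate. The Python sketch below assumes that the stage delay deviations are uncorrelated, so that their variances add, and takes as input the high-to-low stage deviations reported in Table II for the original path. The helper names are hypothetical, and halving one stage at a time is an arbitrary illustrative choice rather than a step of the methodology. Because the estimate is idealized, it is not expected to reproduce the simulated path total of Table II.

```python
import math


def path_sigma(stage_sigmas):
    """Root-sum-square of stage deviations (assumes uncorrelated stages)."""
    return math.sqrt(sum(s * s for s in stage_sigmas))


def best_stage_to_improve(stage_sigmas, factor=0.5):
    """Scale one stage deviation at a time by `factor` and report which
    stage yields the smallest estimated path deviation.

    Returns (index of the best stage, list of path sigmas per trial).
    """
    trials = []
    for i in range(len(stage_sigmas)):
        trial = list(stage_sigmas)
        trial[i] *= factor
        trials.append(path_sigma(trial))
    best = min(range(len(trials)), key=trials.__getitem__)
    return best, trials


# Stage deviations in ps for Thl1, Tlh2 and Thl3, taken from Table II.
THL_STAGE_SIGMAS_PS = [7.15, 12.4, 20.8]

if __name__ == "__main__":
    best, trials = best_stage_to_improve(THL_STAGE_SIGMAS_PS)
    print("estimated path sigma (ps):", round(path_sigma(THL_STAGE_SIGMAS_PS), 2))
    for i, value in enumerate(trials, start=1):
        print(f"halving stage {i}: {value:.2f} ps")
    print("most effective stage:", best + 1)
```

With these inputs the stage with the largest deviation, the third one, is also the one whose improvement reduces the estimate the most, which is what the quadratic addition predicts.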

### 4.2. Design example: A generic critical path

Figure 5 shows the gate-level schematics of the path to be improved. The path contains an inverter, a NAND and a NOR gate, as well as 3 external capacitances equivalent to 3, 5 and 4 symmetric inverters of double the minimal size ( $W_n = 0.92 \mu m$ ,  $W_p = 2.45 \mu m$ ), respectively.

Table II shows the performance characteristics of the original path, using the TSMC $0.18\mu m$ technology. The following conclusions can be drawn from it:

- The last stage, involving the NOR gate, presents the highest imbalance between rise and fall variability.
- The last stage also presents the highest local variability and mean.

TABLE II STRUCTURE SIZES AND PROPAGATION DELAY VARIABILITIES FOR THE CRITICAL PATH TO BE IMPROVED.

|                   | Stage 1<br>1 / 1 | Stage 2<br>1-1 / 1 | Stage 3<br>1 / 1-1 | Path     |
|-------------------|------------------|--------------------|--------------------|----------|
| NMOS size (µm)    | 0.92             | 1.35               | 0.53               |          |
| PMOS size (µm)    | 2.45             | 2.02               | 2.84               |          |
|                   | Thl1 (ps)        | Tlh2 (ps)          | Thl3 (ps)          | Thl (ps) |
| $\mu$             | 80.6             | 160.2              | 179.9              | 420.7    |
| $\sigma$          | 7.15             | 12.4               | 20.8               | 29.6     |
| $\sigma/\mu$      | 8.87%            | 7.72%              | 11.54%             |          |
| $\sigma$/Thl      | 1.70%            | 2.94%              | 4.94%              | 7.04%    |
|                   | Tlh1 (ps)        | Thl2 (ps)          | Tlh3 (ps)          | Tlh (ps) |
| $\mu$             | 82.6             | 134.6              | 184.6              | 401.7    |
| $\sigma$          | 5.40             | 9.60               | 10.6               | 18.2     |
| $\sigma/\mu$      | 6.54%            | 7.13%              | 5.76%              |          |
| $\sigma$/Tlh      | 1.34%            | 2.39%              | 2.65%              | 4.53%    |
| Power ($\mu W$)   |                  |                    |                    | 475      |

Fig. 5. Schematics for the path to be improved

Therefore, the design procedure works primarily on the NOR gate; the consequent changes in load capacitances are countered with replacements in the other gates. Working on the NOR gate involves replacing the N transistors with serial structures and the P transistors with parallel structures, in order to improve robustness for the fall transition at the expense of the rise transition.

The new input capacitance of the NOR gate is expected to be 2.5 times the original; however, the presence of the external capacitance (5 inverters) makes the new total capacitance only 1.25 times the former one. This increase can be easily managed by a moderate improvement of the NAND gate; a similar argument leads to a small change in the inverter. The resulting implementation is displayed in figure 6, and its performance is reported in Table III. A fairly high robustness has been reached (61% of the original variability), along with a moderate increase in power dissipation (27%).
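The load capacitance argument can be reproduced with simple arithmetic. The Python sketch below assumes that gate input capacitance is proportional to total gate width (NMOS plus PMOS) and that wiring capacitance is negligible; the widths are those given in Section 4.2 and Table II, and the function name `load_ratio` is hypothetical.

```python
def load_ratio(gate_cap, external_cap, gate_scale):
    """Ratio of new to old total load when only the gate input
    capacitance is multiplied by gate_scale."""
    old_total = gate_cap + external_cap
    new_total = gate_cap * gate_scale + external_cap
    return new_total / old_total


# Total widths in micrometers, used as a proxy for input capacitance.
UNIT_INVERTER_WIDTH_UM = 0.92 + 2.45   # external load inverter (Wn + Wp)
NOR_INPUT_WIDTH_UM = 0.53 + 2.84       # original NOR gate (Wn + Wp)

# The NOR input is expressed in units of one external inverter.
nor_in_inverter_units = NOR_INPUT_WIDTH_UM / UNIT_INVERTER_WIDTH_UM

# The NAND output drives the NOR input plus 5 external inverters.
ratio = load_ratio(nor_in_inverter_units, 5.0, 2.5)

if __name__ == "__main__":
    print("NOR input in inverter units:", round(nor_in_inverter_units, 3))
    print("new / old total load:", round(ratio, 3))
```

Since the original NOR input has the same total width as one external inverter, the ratio reduces to (2.5 + 5) / (1 + 5) = 1.25, in agreement with the figure quoted above.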

TABLE III STRUCTURE SIZES AND PROPAGATION DELAY VARIABILITIES FOR THE IMPROVED CRITICAL PATH.

|                   | Stage 1<br>1 / 1 | Stage 2<br>1-1 / 1 | Stage 3<br>1 / 1-1 | Path     |
|-------------------|------------------|--------------------|--------------------|----------|
| NMOS size (µm)    | 0.92             | 1.35               | 0.53               |          |
| PMOS size (µm)    | 2.45             | 2.02               | 2.84               |          |
|                   | Thl1 (ps)        | Tlh2 (ps)          | Thl3 (ps)          | Thl (ps) |
| $\mu$             | 80.6             | 160.2              | 179.9              | 420.7    |
| $\sigma$          | 7.15             | 12.4               | 20.8               | 29.6     |
| $\sigma/\mu$      | 8.87%            | 7.72%              | 11.54%             |          |
| $\sigma$/Thl      | 1.70%            | 2.94%              | 4.94%              | 7.04%    |
|                   | Tlh1 (ps)        | Thl2 (ps)          | Tlh3 (ps)          | Tlh (ps) |
| $\mu$             | 82.6             | 134.6              | 184.6              | 401.7    |
| $\sigma$          | 5.40             | 9.60               | 10.6               | 18.2     |
| $\sigma/\mu$      | 6.54%            | 7.13%              | 5.76%              |          |
| $\sigma$/Tlh      | 1.34%            | 2.39%              | 2.65%              | 4.53%    |
| Power ($\mu W$)   |                  |                    |                    | 475      |

Fig. 6. Schematics for the new implementation.

# 5. CONCLUSIONS

The above implementations and results have shown the effectiveness of the proposed methodology in reducing variability effects in critical paths.

Because of the number of degrees of freedom (stage propagation delays, structure sizes), there are several implementation alternatives for a given path. It is up to the user to establish the most important characteristics and constraints, such as area and power consumption, in order to select the adequate combination.

# 6. REFERENCES

[1] K. Bowman, S. Duvall and J. Meindl, "Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration", IEEE Journal of Solid-State Circuits, vol. 37, pp. 183-190, Feb. 2002.

[2] C. Michael and M. Ismail, "Statistical Modeling for Computer-Aided Design of MOS VLSI Circuits", Kluwer Academic Publishers, 1993.

[3] M. Pelgrom, A. Duinmaijer and A. Welbers, "Matching Properties of MOS Transistors", IEEE Journal of Solid-State Circuits, vol. 24, pp. 1433-1440, Oct. 1989.

[4] K. Bernstein et al., "High Speed CMOS Design Styles", Kluwer Academic Publishers, 1999.