# Wire-sizing for Delay Minimization and Ringing Control Using Transmission Line Model

Youxin Gao and D.F. Wong Department of Computer Sciences University of Texas at Austin Austin, Texas 78712

## Abstract

In this paper, we consider continuous wire-sizing optimization for delay minimization and ringing control. The optimization is based on a fast and accurate delay estimation method under a finite ramp input, where an analytical expression is also derived to estimate overshoot/under shoot voltage. In this paper, we specify the wire shape to be of the form  $f(x) = ae^{-bx}$ , since previous studies under the Elmore delay model suggest that exponential wire shape is effective for delay minimization. The relevant transmission line equations are solved by using Picard-Carson method. The transient response in the time domain is derived as a function of a and b. The coefficients a and b are then determined such that either the actual delay (50% delay) is minimized, or the wiring area is minimized subject to a delay bound. At the same time, the overshoot/undershoot voltage is bounded to prevent false switching. Our method for delay estimation is very efficient. In all the experiments we performed, it is far more accurate than the Elmore delay model and the estimated delay values are very close to SPICE's results. We also find that in determining the optimal shape which minimizes delay, the Elmore delay model performs as good as the our method in terms of the minimum actual delay it achieves, i.e. the Elmore delay model has high fidelity. However, in determining the optimal shape which minimizes area subject to a delay bound, the Elmore delay model performs much worse than our method. We also find that the constraint for overshoot/undershoot control does affect optimization results for both delay and area minimization objectives.

#### 1. Introduction

As the feature size of VLSI circuits is gradually scaled down to deep sub-micron, interconnect delay becomes the bottleneck in designing high performance microprocessors [17]. On the other hand, while the operating frequency approaches or even beyond Giga Hertz, not only the delay due to inductance is significant, but also the transmission line effects (e.g. reflection and dispersion) are no longer negligible. Wire-sizing has been found to be an effective way to reduce the interconnect delay, and one of the approaches is continuous wire-sizing. Previous studies on continuous wire-sizing make use of the Elmore delay model [4] because of its simple algebraic form. It is found that the optimal shape function is exponential [2, 5] or near-exponential [3, 7]. However, it is well known that the Elmore delay is only a rough estimate of the actual delay. The delay value estimated by the Elmore delay model may deviate from actual delay by 30% - 40%, and the situation becomes even worse when inductance is taken into consideration. Furthermore, the Elmore delay model can not provide enough transient information of the response, which becomes as important as delay estimation in deep sub-micron design. Only recently, [8] solves the continuous wire-sizing problem analytically for non-uniform wires under transmission line model. The transmission equations are solved in terms of sinh and cosh hyperbolic functions. The transient response in the time domain can thus be solved analytically using residue theorem. Unfortunately, fringing capacitance and inductance are not included in [8].

In this paper, we consider continuous wire-sizing optimization for non-uniform wires by taking fringing capacitance and inductance into consideration. The optimization is based on a fast and accurate delay estimation method under a finite ramp input. We also derive an analytical expression to estimate overshoot/undershoot voltage. In this paper, we specify the wire shape to be of the form  $f(x) = ae^{-bx}$ , since previous studies under the Elmore delay model suggest that exponential wire shape is effective for delay minimization. The relevant transmission line equations are solved by using Picard-Carson method. The transignt response in the time domain is derived as a function of a and b. The coefficients a and b are then determined such that either the actual delay (50% delay) is minimized, or the wiring area is minimized subject to a delay bound. At the same time, the overshoot/undershoot voltage is bounded to prevent false switching. Our method for delay estimation is very efficient. In all the experiments we performed, it is far more accurate than the Elmore delay model and the estimated delay values are very close to SPICE's results. We also find that in determining the optimal shape which minimizes delay, the Elmore delay model performs as good as the our method in terms of the minimum actual delay it achieves, i.e. the Elmore delay model has high fidelity. However, in determining the optimal shape which minimizes area subject to a delay bound, the Elmore delay model performs much worse than our method. We also find that the constraint for overshoot/undershoot control does affect optimization results for both delay and area minimization objectives.

We would like to point out that a continuous wire shape is not expensive to fabricate. It is not necessary to ultra-accurately reproduce the continuous shape on the silicon, because rounding the continuous shape to the nearest available litho width will give virtually no degradation in the wire delay. The staircase shape can be stamped out just as easily as any other mask shape.

## 2. Capacitance and Inductance Models

Consider a wire segment of width W and thickness T, Sakurai's formula [16] is a good approximation for calculating the unit length capacitance. The formula can be simplified as

$$C_{tot} = c_0 W + c_f \tag{1}$$

where  $c_0 = 1.15\epsilon_{ox}/T_{ox}$  is the unit length area capacitance,  $c_f = 2.80\epsilon_{ox}(T/T_{ox})^{0.222}$  is the unit length fringing capacitance, and  $\epsilon_{ox}$  is the dielectric constant of an insulator.

In this paper, we only consider self inductance of a wire. The unit length inductance can be calculated by [19]

$$L_{self} = \frac{\mu_0}{2\pi} \ln(\frac{8T_{tot}}{W} + \frac{W}{4T_{tot}})$$
(2)

where  $\mu_0 = 4\pi \times 10^{-9}$  [H/cm] is the magnetic permeability of the wire, and  $T_{tot}$  is the distance between metal line and ground plane. Usually  $T_{tot}$  is much larger than W especially for upper layer metal lines. The second term in (2) is thus much smaller

than the first one. For our purpose of wire-sizing, we neglect the second term. Therefore the unit length inductance is approximated by

$$L_{self} \approx 2 \ln \frac{8T_{tot}}{W} \quad [nH/cm].$$
 (3)

# 3. Voltage Response in *s*-Domain



Figure 1. A non-uniform wire and its corresponding cascaded two-port networks.

Consider a non-uniform wire shown in Fig.1(a), the driving voltage source is  $V_D$  and the voltage response at load end is  $V_L$ .  $Z_D$  and  $Y_L$  are driver impedance and load admittance respectively. In *s*-domain,  $Z_D = R_D$  and  $Y_L = sC_L$ . As it is shown in Fig.1(b), the whole system can be represented by three cascaded two-port networks. The relationship between  $V_L$  and  $V_D$  in *s*-domain can be derived as [8]:

$$V_L = V_D \frac{1}{\mathcal{A} s R_D C_L - \mathcal{B} s C_L - \mathcal{C} R_D + \mathcal{D}}$$
(4)

where the ABCD parameters describe the relation between  $(V_{\rm I}, I_{\rm I})$  and  $(V_{\rm II}, I_{\rm II})$ . These parameters rely on solving transmission line equations for the non-uniform wire. To describe the voltage and current behavior on the non-uniform transmission line, we have the well known Telegraph's equations:

$$\frac{dV(s,x)}{dx} = -Z(x,s)I(s,x)$$
(5)

$$\frac{dI(s,x)}{dx} = -Y(x,s)V(s,x) \tag{6}$$

where Z(s, x) = r(x) + l(x)s, Y(s, x) = c(x)s. r(x), l(x) and c(x) are unit length resistance, inductance and capacitance at position x respectively. For a uniform wire, r(x), l(x) and c(x) are position independent and thus are constants. However, for a nonuniform wire, r(x), l(x) and c(x) are functions of x. Since previous studies based on the Elmore delay model have found that exponential wire shape is effective for delay minimization [2, 5], we restrict the wire shape function to be of the form  $f(x) = ae^{-bx}$ . Using equations (1) and (3), c(x) and l(x) can be calculated as,

$$c(x) = c_f + c_0 a e^{-bx} ag{7}$$

$$l(x) = 2\ln(\frac{8T_{tot}}{a}e^{bx}) = l_0 + 2l_1bx$$
(8)

The unit length resistance r(x) is simply:

$$r(x) = \frac{r_0}{a} e^{bx} \tag{9}$$

In (7)-(9),  $c_0$  is unit length area capacitance,  $c_f$  is unit length fringing capacitance,  $l_0 = 2 \ln \frac{8T_{tot}}{a}$ ,  $l_1 = 10^{-13}$  [H/cm], and  $r_0$  is unit square resistance. Substitute (7)-(9) into Z(s, x) and Y(s, x), we get

$$Z(s,x) = \frac{r_0}{a}e^{bx} + s(l_0 + 2l_1bx)$$
(10)

$$Y(s,x) = (c_f + c_0 a e^{-bx})s$$
(11)

If we can solve equations (5) and (6), the solutions would look like, in terms of the  $\mathcal{ABCD}$  parameters,

$$\begin{bmatrix} V(s,x)\\I(s,x)\end{bmatrix} = \begin{bmatrix} \mathcal{A} & \mathcal{B}\\ \mathcal{C} & \mathcal{D}\end{bmatrix} \begin{bmatrix} V(s,0)\\I(s,0)\end{bmatrix}$$
(12)

Let x = L, and notice that  $V_{I} = V(s, 0)$ ,  $I_{I} = I(s, 0)$ ,  $V_{II} = V(s, L)$  and  $I_{II} = I(s, l)$ . The matrix thus describes the relation between  $(V_{I}, I_{I})$  and  $(V_{II}, I_{II})$ .

## 4. Solving the *ABCD* Parameters

Unfortunately, with fringing capacitance and inductance taken into consideration, it is very difficult to get analytical solutions to the transmission line equations (5) and (6), although the right hand sides are analytical functions. This is different from previous work in [8], where the ABCD parameters can be solved as analytical functions depending on circuit parameters. In this section, we show how to use a successive approximation method, Picard-Carson method [10, 15] to solve equations (5) and (6). Picard-Carson method is a powerful method in getting a power series solution for the distributed network. In fact, we prefer power series solution for the distributed network because it is easy to calculate poles and zeros.

Picard-Carson method solves ordinary differential equations by an iterative sequence. It starts with a crude guess as an approximation to the solution, then it refines the solution by successive iterations [15]. To solve equations (5) and (6), Picard-Carson method introduces the following iterative sequences:

$$V_n = V_{\rm I} - \int_0^x Z I_{n-1} dx$$
 (13)

$$I_n = I_{\rm I} - \int_0^x Y V_{n-1} dx$$
 (14)

Since the terms inside the integrals are continuous and bounded, the sequences will converge to the true solutions [15], i.e.

$$V(s,x) = \lim_{n \to \infty} V_n(x) \tag{15}$$

$$I(s,x) = \lim_{n \to \infty} I_n(x) \tag{16}$$

To start the iteration, we can choose  $V_1 = V_I$  and  $I_1 = I_I$  as initial guesses. Although they satisfy the boundary conditions, they do not have correct slopes. In applying Picard-Carson method, usually the more nearly correct a particular approximation, the better will be its successor. In practice, we find the iteration can be accelerated by choosing the following improved initial approximations,

$$V_{I} = V_{I} - (r_{0}/a + l_{0}s)xI_{I} = V_{I} - \alpha_{1}I_{I}$$
(17)  
$$I_{1} = I_{I} - (c_{f} + c_{0}a)sxV_{I} = I_{I} - \beta_{1}V_{I}$$
(18)

where we define  $\alpha_1 = (r_0/a + l_0s)x$  and  $\beta_1 = (c_f + c_0a)sx$ . (17) and (18) are chosen by taking slopes into consideration. By substituting (17) and (18) into (13) and (14), we can expand the iterative sequences one by one. By introducing iterative sequences

$$\begin{aligned}
\zeta_0 &= 1 & \varphi_0 = 1 \\
\zeta_1 &= \int_0^x \varphi_0 Z dx & \varphi_1 = \int_0^x \zeta_0 Y dx \\
\vdots &\vdots \\
\zeta_n &= \int_0^x \varphi_{n-1} Z dx & \varphi_n = \int_0^x \zeta_{n-1} Y dx
\end{aligned}$$
(19)

and

0

$$\begin{aligned} \alpha_1 &= (r_0/a + l_0 s) x \quad \beta_1 &= (c_f + c_0 a) s x \\ \alpha_2 &= \int_0^x \beta_1 Z dx \qquad \beta_2 &= \int_0^x \alpha_1 Y dx \\ & \cdots \end{aligned} \tag{20}$$

$$a_n = \int_0^x \beta_{n-1} Z dx \quad \beta_n = \int_0^x \alpha_{n-1} Y dx$$

the solutions at the *n*th iteration can be rewritten as

$$V_{n} = V_{\mathrm{I}} \sum_{k=0}^{\left\lceil \frac{n-1}{2} \right\rceil} \zeta_{2k} - I_{\mathrm{I}} \sum_{k=0}^{\left\lceil \frac{n-2}{2} \right\rceil} \zeta_{2k+1} + V_{c} (-1)^{n} \alpha_{n} \qquad (21)$$

$$I_n = I_{\rm I} \sum_{k=0}^{2} \varphi_{2k} - V_{\rm I} \sum_{k=0}^{2} \varphi_{2k+1} + I_c (-1)^n \beta_n \qquad (22)$$

where  $V_c$  and  $I_c$  are correction terms, and if n is even,  $V_c = V_I$ ,  $I_c = I_I$ ; otherwise,  $V_c = I_I$ ,  $I_c = V_I$ . It is easy to show that  $V_c$  and  $I_c$  are generated because we start with improved initial approximations. In other words, if we start with  $V_I$  and  $I_I$  as initial approximations, the final solutions similar to those in (21) and (22) will not contain such correction terms. According to equation (12), the ABCD parameters can be collected as

$$\mathcal{A} = \sum_{\substack{k=0\\ \bar{r}=2}}^{\left\lceil \frac{n-1}{2} \right\rceil} \zeta_{2k} + \mathcal{A}^{*}$$

$$\mathcal{B} = -\sum_{\substack{k=0\\ \bar{r}=2}}^{\left\lceil \frac{n-1}{2} \right\rceil} \zeta_{2k+1} + \mathcal{B}^{*}$$

$$\mathcal{C} = -\sum_{\substack{k=0\\ \bar{r}=2}}^{\left\lceil \frac{n-1}{2} \right\rceil} \varphi_{2k+1} + \mathcal{C}^{*}$$

$$\mathcal{D} = \sum_{\substack{k=0\\ \bar{r}=2}}^{\left\lceil \frac{n-1}{2} \right\rceil} \varphi_{2k} + \mathcal{D}^{*}$$
(23)

where  $\mathcal{A}^*$ ,  $\mathcal{B}^*$ ,  $\mathcal{C}^*$  and  $\mathcal{D}^*$  are other correction terms, and if *n* is even,  $\mathcal{A}^* = \alpha_n$ ,  $\mathcal{D}^* = \beta_n$ , and  $\mathcal{B}^* = \mathcal{C}^* = 0$ ; otherwise,  $\mathcal{B}^* = \alpha_n$ ,  $\mathcal{C}^* = \beta_n$ , and  $\mathcal{A}^* = \mathcal{D}^* = 0$ . Substituting these  $\mathcal{ABCD}$ parameters into (4) thus yields the voltage response in *s*-domain. Let  $D_e$  denote the denominator in (4), i.e.  $D_e = \mathcal{A}sR_DC_L - \mathcal{B}sC_L - \mathcal{C}R_D + \mathcal{D}$ . At each iteration shown in (19) and (20),  $\zeta$ ,  $\varphi$ ,  $\alpha$  and  $\beta$  can be expressed as polynomials in *s* of some degree. According to (23), if we carry out the iteration infinite times,  $D_e$ will thus contain infinite number of terms where each term is a product of a coefficient and a power of *s*. After we collect together all the coefficients of the same power of *s*,  $D_e$  can be written as an infinite power series of *s*. Let  $D_e = \sum_{k=0}^{\infty} a_k s^k$ . Coefficients  $a_k$ can be collected from the  $\mathcal{ABCD}$  parameters. These coefficients are directly related to the moments of the transfer function. The following lemma shows how to compute these coefficients exactly.

**Lemma 1** The coefficient  $a_k$  in  $D_e$  can be computed exactly in 2k iterations.

## 5. Fast and Accurate Delay Estimation

We develop a method to estimate delay fast and accurately. The method makes use of first three poles in the voltage transfer function  $V_L = V_D/D_e$ . It is different from other previous pole (or moment) based delay estimation methods in the past literature (e.g. [13, 18]). Our method is designed to deal with non-uniform wires, and in this paper we specify the wire shape function to be  $f(x) = ae^{-bx}$ . The delay model in [13] only works for uniform wires, and [18] can only deal with lumped RC(RLC) circuits.

Since we only make use of first three poles, the voltage response is approximated by

$$V_L = V_D \frac{1}{D_e} \approx V_D \frac{1}{1 + a_1 s + a_2 s^2 + a_3 s^3}$$
 (24)

where coefficients  $a_1$ ,  $a_2$  and  $a_3$  are collected from the ABCD parameters. Lemma 1 tells us that  $a_1$ ,  $a_2$  and  $a_3$  can be computed exactly in 6 iterations. In practice, we observe that using improved initial approximations only needs 5 iterations. We use Maple to carry out the iterations and collect coefficients  $a_1$ ,  $a_2$  and  $a_3$ . All these coefficients are functions of circuit parameters, such as  $R_D$ ,  $C_L$  etc. Furthermore,  $a_1$  is obviously the Elmore delay calculated for the exponential wire shape  $f(x) = ae^{-bx}$ .

Since a finite ramp input is more reasonable to model the driving signal than a step input, we derive the transient response in time domain under a finite ramp input instead of a step input. As in [8], for a finite ramp input,  $V_D(s) = \frac{1}{s^2 T_R} (1 - e^{-sT_R})$ , where  $T_R$  is the rising time. To get the voltage response in time-domain, we first get the voltage response for an infinite ramp input. Suppose it is u(t), then the voltage response in the time domain under a finite ramp input is

$$v_L(t) = \begin{cases} u(t) & \text{for } t < T_R \\ u(t) - u(t - T_R) & \text{for } t \ge T_R \end{cases}$$
(25)

We use the usual 50% delay as the delay metric and define it as the time when  $v_L$  reaches 0.5 for the first time. The delay can be calculated from the transient response in the time domain shown in (25). Since the delay depends on a and b, symbolically we can write it as  $T_{50\%}(a, b)$ .

## 6. Overshoot/Undershoot Voltage Estimation

Since we have taken inductance into our consideration, the voltage response in time-domain could have some amount of ringing (under-damped). It is advantageous that a small amount of ringing could help to decrease delay. But it is also very danger if the response voltage drops back below threshold voltage which will cause false switching. A good wire-sizing optimization which can utilize ringing is to minimize delay, and bound the overshooting/undershoot voltage at the same time.

We derive an analytical expression for estimating overshoot/undershoot voltage. We are interested in the case where two roots of algebraic equation  $D_e = 0$  are imaginary, because if all three roots are real, there will be no ringing occurs. Suppose three roots are  $\alpha \pm \beta i$  and  $\gamma$ . We can write an expression for overshoot/undershoot voltage by subtracting 1 from (25), because we assume a unit source input. Let  $\delta v$  be the overshoot/undershoot voltage, so  $\delta v$  is a function depending on t.

$$\delta v = -\frac{e^{\alpha t}}{T_R a_3 (\alpha^2 + \beta^2) \beta} \frac{\sqrt{C_1}}{\sqrt{C_2}}$$

$$\times \sin \left(\beta t + \eta + \zeta + \xi\right) + \frac{e^{\gamma t} (1 - e^{-\gamma T_R})}{T_R a_3 \gamma^2 C_2} \quad (26)$$

where

t.

$$C_1 = 1 - 2e^{-\alpha T_R} \cos\beta T_R + e^{-2\alpha T_R}$$
(27)

$$C_2 = (\alpha - \gamma)^2 + \beta^2 \tag{28}$$

$$an \eta = (\alpha^2 - \beta^2)/2\alpha\beta$$
<sup>(29)</sup>

$$\tan\zeta = e^{-\alpha T_R} \sin\beta T_R / (1 - e^{-\alpha T_R} \cos\beta T_R) \quad (30)$$

$$\tan \xi = (\alpha - \gamma)\beta \tag{31}$$

Under a finite ramp input, the input signal reaches its maximum at time  $t = T_R$ . As a result, the overshoot on output signal can not occur until  $t \ge T_R$  (more strictly speaking, it is  $t \ge Delay$ , where Delay is the 90% delay). To find when overshoot/undershoot occurs, we could take derivative on (26) with respect to t and let it be zero. The roots of such nonlinear equation correspond to the time when overshoot/undershoot occurs. However, solving the nonlinear equation is not an efficient way. We can actually get a good approximation. We start from (26), and take the derivative with respect to t as usual. Let  $\frac{dv_L(t)}{dt} = 0$ , i.e.

$$\frac{\sqrt{C_1}e^{\alpha t}}{\beta\sqrt{\alpha^2+\beta^2}}\sin\left(\beta t+\eta+\zeta+\xi+\phi\right) = \frac{e^{\gamma t}(1-e^{-\gamma T_R})}{\gamma\sqrt{C_2}}$$
(32)

where

$$\tan \phi = \beta / \alpha$$

In practice, we find the right hand side of (32) is not important in determining the time when overshoot occurs. Our approximation is to simply let the right hand side be zero and solve for *t*. Thus *t* is approximated by

$$t \approx (n\pi - \eta - \zeta - \xi - \phi)/\beta, \quad \mathbf{n} = 0, 1, 2, 3 \cdots$$
(33)

Note that (26) is valid only when  $t \ge Delay$ , so the maximum overshoot occurs at the first t when  $t \ge Delay$ . Let  $t_{over}$  and  $t_{under}$  denote the times when first overshoot and undershoot occur respectively, and obviously  $t_{under} = t_{over} + \pi/\beta$ . Substituting  $t_{over}$  and  $t_{under}$  back into (26), we can get expressions for estimating overshoot and undershoot voltages.

# 7. Wire-sizing Optimization

We consider two different optimization objectives. The first objective is to find the optimal wire shape in terms of a and b such that delay is minimized. Since ringing could help to decrease delay, it is possible that a minimum delay wire shape could have some amount of ringing which causes false switching. Therefore we add one constraint to control overshoot/undershoot. The optimization problem is stated as follows:

| Minimize   | $T_{50\%}(a,b)$          |
|------------|--------------------------|
| Subject to | $\delta v(a,b) \leq v_b$ |

where  $\delta v(a, b)$  is overshoot/undershoot voltage shown in (26), and  $v_b$  is predefined voltage bound.

Our second objective is to minimize the wiring area subject to a target delay bound. For the wire shape function considered in this paper, i.e.  $f(x) = ae^{-bx}$ , the area is  $\int_0^L f(x)dx = \frac{a}{b}(1-e^{-bL})$ , which is an analytical function depending on *a* and *b*. We can state this optimization problem formally as follows:

| Minimize   | $\frac{a}{b}(1-e^{-bL})$         |
|------------|----------------------------------|
| Subject to | $\check{T}_{50\%}(a,b) \leq T_B$ |
|            | $\delta v(a,b) \leq v_b$         |

where  $T_B$  is the delay bound, and the second constraint is to bound overshoot/undershoot voltage.

In our experiments, we observe that  $T_{50\%}$  is a smooth function in all experiments we have performed, although  $T_{50\%}$  does not have an analytical form. In practice, we use Conjugate Gradient method [1] to solve these two optimization problems.

## 8. Experimental Results

In this section, we present some experimental results. Our experiments are performed on some single interconnect wires, where we select circuit parameters in the following ranges,  $R_D \sim$  $20 - 30,000\Omega, C_L \sim 0.01 - 0.2pF, r_0 \sim 0.03 - 0.1\Omega/\Box,$  $c_0 \sim 0.03 - 0.13 fF/\mu m^2$ ,  $c_f \sim 0.08 - 0.2 fF/\mu m$ ,  $l_0 \sim 0.1 - 10 pH/\mu m$  and  $L \sim 4 - 40 mm$ . These circuit parameters are consistent with those in [17]. All circuit parameters as well as rising times are listed in Table 1. Each parameter has the same unit as shown in the ranges. They are divided into 8 test examples labeled from T1 to T8. We will use these parameters throughout our experiments, and in addition, we bound the undershoot voltage to be within 5%. Our experiments consist of four parts. In Experiment 1, we use SPICE to verify our delay estimation method. In Experiment 2, we find a and b such that delay is minimized. In Experiment 3, we find a and b such that area is minimized subject to a delay bound. For comparison, experiments 2 and 3 are performed without the undershoot voltage constraint. Only in Experiment 4, we add the constraint and explore how it affects the wire-sizing optimization results in two optimization objectives. In all experiments, we list b as b', where  $b' = b \times 10^{-4}$ .

| Test | $R_D$ | $C_L$ | $r_0$ | c0    | $^{c}f$ | $l_1$ | L  | $T_R$ |
|------|-------|-------|-------|-------|---------|-------|----|-------|
| T1   | 28.3  | 0.016 | 0.072 | 0.032 | 0.0877  | 0.1   | 40 | 1ns   |
| T2   | 283   | 0.016 | 0.072 | 0.032 | 0.0877  | 0.1   | 40 | 3ns   |
| T3   | 2830  | 0.016 | 0.072 | 0.032 | 0.0877  | 0.1   | 40 | 10ns  |
| T4   | 283   | 0.16  | 0.072 | 0.032 | 0.0877  | 0.1   | 20 | 50ps  |
| T5   | 283   | 0.16  | 0.072 | 0.032 | 0.0877  | 1.0   | 4  | 0.2ns |
| T6   | 283   | 0.16  | 0.032 | 0.072 | 0.1777  | 1.0   | 4  | 0.2ns |
| T7   | 183   | 0.016 | 0.032 | 0.042 | 0.1377  | 1.0   | 20 | 0.1ns |
| T8   | 283   | 0.16  | 0.032 | 0.072 | 0 1777  | 1.0   | 40 | 1ne   |

Table 1. Circuit parameters used in all experiments.

**Experiment 1:** First, by comparing the results with SPICE, we investigate how good our method for delay estimation is. Since SPICE can not deal with non-uniform wires directly, we approximate each wire by a large number of uniform width segments whose widths are calculated by  $f(x) = ae^{-bx}$ . Therefore a non-uniform wire is approximated by a large number of RLC segments. As the number of RLC segments approaches infinity, the SPICE-computed voltage response will be exact [12]. In all experiments, we divide each wire into 5,000 segments. In addition to

delays calculated by using our method and SPICE, we also calculate delays using the Elmore delay model (i.e. one-pole approximation). The Elmore delay under ramp input can be approximated by  $T_D = T_{ED} + \frac{1}{2}T_R$  [11], where  $T_{ED}$  is the Elmore delay under step input and  $T_R$  is the rising time. We carry out the experiments for each set of parameters from T1 to T8 in Table 1. The results are summarized in Table 2. In Table 2, columns 2-3 list some choices of *a* and *b*. Columns 4-6 list delays calculated by using the Elmore delay model, our delay estimation method and SPICE respectively. In all experiments we have done, our method is found far more accurate than the Elmore delay model and its delay value is very close to SPICE's result. However, SPICE is computationally expensive. For comparison, it takes less than 1ms to calculate a delay value in using our method. On the contrary, it takes SPICE about 10-60 minutes to calculate one delay value. All the times are measured in SUN Ultra Sparc I machines.

| Example | a     | ь′    | Elmore  | Our method | SPICE   |
|---------|-------|-------|---------|------------|---------|
| T1      | 40.35 | 1.303 | 3.126ns | 2.881ns    | 2.745ns |
| T2      | 8.103 | 1.010 | 7.942ns | 6.951ns    | 6.878ns |
| T3      | 1.987 | 0.823 | 28.66ns | 23.70ns    | 23.69ns |
| T4      | 5.447 | 1.097 | 1.928ns | 1.505ns    | 1.495ns |
| T5      | 2.389 | 2.526 | 376.5ps | 339.7ps    | 328.2ps |
| T6      | 1.329 | 3.001 | 505.4ps | 412.5ps    | 406.4ps |
| T7      | 4.430 | 1.734 | 1.738ns | 1.589ns    | 1.600ns |
| T8      | 4.699 | 0.733 | 8.250ns | 6.859ns    | 6.654ns |

 Table 2. Results for Experiment 1: Delays calculated by using the Elmore delay model, our method and SPICE based on circuit parameters in Table 1.

**Experiment 2:** To solve the first optimization objective, we will find the optimal a and b such that  $T_{50\%}$  is minimized. The optimal a and  $\overline{b}$  by using our method are summarized in Table 3. For comparison, we also list these optimal a and b found by using the Elmore delay model. In Table 3, columns 2-4 list the optimal a and b and minimum delay values by using the Elmore delay model. Columns 5-7 list similar results by using our method, and columns 8-9 list SPICE-computed delays for each optimal shape. We use "E-Shape" and "Our-Shape" to denote the optimal shapes determined by the Elmore delay model and our method respectively. By using Conjugate Gradient method, the optimal a and b can be found within 10-20 iterations, and the total running time is less than 0.1 second. Table 3 clearly shows some difference between two kinds of optimal shapes determined by the Elmore delay model and our method. Although the optimal shapes determined by the Elmore delay model differ from these determined by our method and the Elmore delay values are not accurate at all, the wire quality in terms of minimum actual delay is as good as the one determined by our method. Therefore despite its inaccuracy in estimating the actual delay, the Elmore delay model has high fidelity for the delay minimization objective.

| Test | Elmore Delay Model |       |         | Our met | hod   |         | SPICE Delay |           |
|------|--------------------|-------|---------|---------|-------|---------|-------------|-----------|
|      | a                  | ь′    | Delay   | a       | ь′    | Delay   | E-Shape     | Our-Shape |
| T1   | 48.41              | 0.867 | 2.345ns | 33.09   | 0.704 | 2.192ns | 2.059ns     | 2.068ns   |
| T2   | 9.796              | 0.588 | 6.383ns | 9.642   | 0.519 | 5.233ns | 5.246ns     | 5.229ns   |
| T3   | 2.171              | 0.372 | 24.39ns | 2.364   | 0.368 | 19.38ns | 19.42ns     | 19.40ns   |
| T4   | 6.213              | 0.913 | 1.876ns | 5.447   | 0.687 | 1.423ns | 1.426ns     | 1.421ns   |
| T5   | 2.614              | 2.524 | 376.1ps | 1.569   | 0.047 | 334.5ps | 318.8ps     | 320.1ps   |
| T6   | 1.393              | 2.572 | 0.504ns | 0.973   | 0.030 | 0.411ns | 0.408ns     | 0.426ns   |
| T7   | 5.261              | 0.959 | 1.388ns | 3.382   | 0.497 | 1.265ns | 1.049ns     | 1.102ns   |
| T8   | 5.063              | 0.469 | 6.983ns | 5.063   | 0.370 | 5.874ns | 5.788ns     | 5.800ns   |

| Table 3.  | Results for l | Experiment 2: | Optimal a an | d b that mini | mize delay |
|-----------|---------------|---------------|--------------|---------------|------------|
| using the | Elmore delay  | model and our | method.      |               |            |

**Experiment 3:** We also carry out the experiments for the second optimization objective, that is to minimize the wiring area subject to a delay bound. The optimal shapes which are determined based on the Elmore delay model and our method are summarized in Table 4. In Table 4, column 2 lists the delay bounds for each experiment. Columns 3-5 list the optimal a and b and minimized areas determined by using the Elmore delay model. Columns 6-8 list similar results determined by using our method. Columns 9-10 list SPICE-computed delay values for each optimal shape. It is shown in Table 4 that the optimal shapes determined by the Elmore

| Test | Delay   | Elmore Delay Model |       |        | Our method |       |        | SPICE Delay |           |
|------|---------|--------------------|-------|--------|------------|-------|--------|-------------|-----------|
|      | Bound   | a                  | ь′    | Area   | а          | ь′    | Area   | E-Shape     | Our-Shape |
| T1   | 2.5ns   | 22.85              | 0.739 | 293236 | 11.49      | 0.576 | 179528 | 2.169ns     | 2.425ns   |
| T2   | 6.5ns   | 5.007              | 0.457 | 91903  | 2.517      | 0.368 | 52665  | 5.502ns     | 6.414ns   |
| T3   | 26ns    | 1.168              | 0.365 | 24569  | 0.535      | 0.295 | 12557  | 20.93ns     | 25.82ns   |
| T4   | 2ns     | 3.251              | 0.768 | 33215  | 2.440      | 1.073 | 20089  | 1.560ns     | 2.000ns   |
| T5   | 0.45ns  | 0.873              | 2.393 | 2246   | 0.524      | 2.263 | 1380   | 0.371ns     | 0.445ns   |
| T6   | 0.535ns | 0.730              | 2.426 | 1869   | 0.332      | 2.132 | 894    | 0.421ns     | 0.518ns   |
| T7   | 1.455ns | 3.155              | 0.852 | 30290  | 1.246      | 0.332 | 18198  | 1.075ns     | 1.401ns   |
| T8   | 7.5ns   | 4.848              | 0.489 | 85160  | 2.240      | 0.514 | 38005  | 5.820ns     | 7.355ns   |

Table 4. Results for Experiment 3: The optimal shapes which minimize area subject to a delay bound using the Elmore delay model and our method. The wiring area has unit  $\mu m^2$ .

delay model have much worse performance than those determined by our method. On the one hand, these shapes have much bigger area. On the other hand, the actual delay values are not close to the delay bounds. Therefore, using more accurate delay model such as our method is necessary for this optimization objective.

**Experiment 4:** We carry out similar experiments as in experiments 2 and 3, but we add the undershoot voltage constraint. We explore how the constraint affects the optimization results. For the delay minimization objective, the results are summarized in Table 5. In Table 5, columns 6-9 list optimal results after considering the constraint. They are copied from columns 5-7 in Table 3, **Experiment 2**. T1 and T7 are not listed in Table 5, since the old results already satisfy undershoot voltage constraint.

Similarly, Table 6 list the results for the area minimization objective. Again, columns 2-5 list results before considering the undershoot voltage constraint, and last four columns list the results after considering the constraint. T1 and T4 are not listed, since the undershoot voltages are within the bound. From Table 5 and 6, we find that undershoot voltage constraint does affect the optimization results. Notice the difference between columns 5 and 9 in both tables for the optimizations before and after adding the constraint. Without considering the constraint, the optimal shapes could have undershoot voltages which exceed the bound. By taking the constraint into consideration, the optimal shapes may have a little bit larger delays or areas, but the undershoot voltages are safely bounded. It is thus necessary to take the undershoot voltage constraint into wire-sizing optimization.

| I | Test | No unde | rshoot conti | rol     |       | undershoot control |       |         |      |  |
|---|------|---------|--------------|---------|-------|--------------------|-------|---------|------|--|
| ſ |      | a       | ь′           | Delay   | δυ    | а                  | ь′    | Delay   | δυ   |  |
| ſ | T2   | 9.642   | 0.519        | 5.233ns | 0.078 | 13.32              | 0.987 | 5.909ns | 0.05 |  |
| I | T3   | 2.364   | 0.368        | 19.38ns | 0.157 | 3.576              | 0.904 | 21.67ns | 0.05 |  |
| I | T4   | 5.447   | 0.687        | 1.423ns | 0.098 | 1.532              | 1.532 | 1.595ns | 0.05 |  |
| I | T5   | 1.569   | 0.047        | 334.5ps | 0.057 | 2.388              | 1.312 | 337.0ps | 0.04 |  |
| I | T6   | 0.973   | 0.030        | 0.411ns | 0.121 | 3.158              | 8.333 | 0.470ps | 0.05 |  |
| I | T8   | 5.063   | 0.370        | 5.874ns | 0.136 | 6.578              | 0.780 | 6.529ns | 0.05 |  |

**Table 5. Results for Experiment 4:** Optimal *a* and *b* that minimize delay using our method before and after adding the undershoot voltage constraint.

| Test | No unde | rshoot conti | rol   |       | undershoot control |       |       |            |
|------|---------|--------------|-------|-------|--------------------|-------|-------|------------|
|      | a       | ь′           | Area  | δυ    | a                  | ь′    | Area  | $\delta v$ |
| T2   | 2.517   | 0.368        | 52665 | 0.053 | 5.406              | 0.777 | 66496 | 0.05       |
| T3   | 0.535   | 0.295        | 12557 | 0.102 | 0.849              | 0.552 | 13682 | 0.05       |
| T5   | 0.524   | 2.263        | 1380  | 0.118 | 1.646              | 6.360 | 2385  | 0.05       |
| T6   | 0.332   | 2.132        | 894   | 0.125 | 1.448              | 8.108 | 1716  | 0.05       |
| T7   | 1.246   | 0.332        | 18198 | 0.060 | 1.748              | 0.723 | 18487 | 0.04       |
| T8   | 2.240   | 0.514        | 38005 | 0.060 | 2.638              | 0.592 | 40376 | 0.05       |

**Table 6.** Results for Experiment 4: Optimal *a* and *b* that minimize area subject to a delay bound using our method before and after adding the undershoot voltage constraint. The wiring area has unit  $\mu m^2$ .

## References

- M.S. Bazaraa, H.D. Sherali and C.M. Shetty, *Nonlinear Programming: Theory and Algorithm*, 2nd Edition, John Wiley & Sons, Inc., 1993.
- [2] C.-P. Chen, Y.P. Chen, and D.F. Wong, Optimal Wire-sizing Formula under the Elmore Delay Model, *Proc. ACM/IEEE Design Automation Conf.*, pp.487-490, 1996.

- [3] C.-P. Chen and D.F. Wong, Optimal Wire-sizing Function with Fringing Capacitance Consideration, *Proc. ACM/IEEE Design Automation Conf.*, pp.604-607, 1997.
- [4] W.C. Elmore, The Transient Response of Damped Linear Network with Particular Regard to Wide-band Amplifier, *Journal of Applied Physics*, vol.19, pp.55-63, 1948.
- [5] J.P. Fishburn and C.A. Schevon, Shaping a Distributed RC Line to Minimize Elmore Delay, *IEEE Transactions on CAS-I*, vol.42, pp.1020-1022, 1995.
- [6] D.S. Gao and D. Zhou, Propagation Delay in RLC interconnection Networks, Proc. IEEE Int. Symp. on Circuits and Systems, Vol.3, pp.2125-2128, 1993.
- [7] Y. Gao and D.F. Wong, Optimal Shape Function for a Bi-directional Wire under Elmore Delay Model, *Proc. IEEE Int. Conf. on Computer Aided Design*, pp.622-627, 1997.
- [8] Y. Gao and D.F. Wong, Shaping a VLSI Wire to Minimize Delay Using Transmission Line Model, *Proc. IEEE Int. Conf. on Computer Aided Design*, pp.611-616, 1998.
- [9] N. Gopal, D.P. Neikirk and L.T. Pillage, Evaluating RC-Interconnect Using Moment Matching Approximations, *Proc. IEEE Int. Conf. on Computer Aided Design*, pp.74-77, 1991.
- [10] M.S. Ghausi and J.J. Kelly, *Introduction to Distributed-Parameter Networks: with Application to Integrated Circuits*, R.E. Krieger Publishing Co., Huntington, New York, 1977.
- [11] A.B. Kahng, K. Masuko, and S. Muddu, Analytical Delay Models for VLSI Interconnects under Ramp Input, *Proc. IEEE Int. Conf. on Computer Aided Design*, pp.30-35, 1996.
- [12] A.B. Kahng and S. Muddu, Two-pole Analysis of Interconnect Trees, IEEE Multi-Chip Module Conf., pp.105-110, 1995.
- [13] A.B. Kahng, K. Masuko and S. Muddu, Delay Models for MCM Interconnects When Response is Non-monotone, *IEEE Multi-Chip Module Conf.*, pp.102-107, 1997.
- [14] V.B. Rao, Delay Analysis of the Distributed RC Line, Proc. ACM/IEEE Design Automation Conf., pp.370-375, 1995.
- [15] E.D. Rainville, *Elementary Differential Equations*, 3rd Edition, The Macmillan Company, 1964.
- [16] T. Sakurai, Closed-Form Expressions for Interconnection Delay, Coupling, and Crosstalk in VLSI's, *IEEE Transactions on ED*, vol.40, No.1, 1993.
- [17] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.
- [18] B. Tutuianu, F. Dartu and L. Pileggi, An Explicit RC-Circuit Delay Approximation Based on the First Three Moments of the Impulse Response, ACM/IEEE Design Automation Conf., pp.611-616, 1996.
- [19] N.H.E. Weste and K. Eshraghian, *Principles of CMOS VLSI Design*, 2nd Edition, Addison-Wesley Publishing Company, 1993.