# Noise-Aware Power Optimization for On-Chip Interconnect\*

Ki-Wook Kim, Seong-Ook Jung, Unni Narayanan\*, C. L. Liu<sup>†</sup> and Sung-Mo Kang

Coordinated Science Laboratory, Univ. of Illinois at Urbana-Champaign, USA \* Design Technology, Intel Corporation, Santa Clara, USA † Dept. of Computer Science, National Tsing Hua University, Taiwan

### Abstract

Realization of high-performance domino logic depends strongly on energy-efficient and noise-tolerant interconnect design in ultra deep sub-micron processes. We characterize the cycle-averaged power model for interconnects accounting for switching statistics and dynamic behaviors. For the sake of signal integrity, cross-coupling effects are also characterized which reflect logical correlation between adjacent wires. Based on the new models for interconnect power and capacitive crosstalk, we optimize the coupling power consumed by interconnects with crosstalk constraints. Experimental results show that optimized designs save the power consumption significantly.

### 1 Introduction

The era of system-on-a-chip integrated with millions of transistors drives the technology to scale down to ultra deep sub-micron (UDSM) range,  $0.25\mu$ m or below. As feature size decreases, interconnect pitch also shrinks while packaging is becoming denser. Meanwhile, wire aspect ratio (namely, metal height/width) increases from 1.8 in 0.18  $\mu$ m processes to 2.7 in 0.07  $\mu$ m processes to keep resistance in control for high performance [11]. These apparent trends bring up two competitive issues: power efficiency and noise immunity on interconnect.

First, higher packaging density and wider chip area lead to relatively longer interconnects which will dominate both the speed and power dissipation. Hence, it is crucial to keep the effective wire capacitance small for high performance and low power dissipation. Coupling capacitance between adjacent wires becomes the dominant component of interconnect capacitance. The ratio between cross-coupling capacitance and total wire capacitance is as much as 0.8 in  $0.18\mu$ m technology with minimum spacing [2]. Based on these observations, we conclude that most of the power consumed by interconnects depend on the coupling capacitance, which is referred to as *coupling power*. Our approach to coupling power optimization is motivated by the fact that coupling power can be minimized with appropriate net ordering.

ISLPED '00, Rapallo, Italy.

Copyright 2000 ACM 1-58113-190-9/00/0007...\$5.00.



Figure 1: Interconnect models: (a) Physical capacitance for interconnect nets; (b) a  $\pi$  model of the interconnect; (c) an effective capacitance model accounting for signal probabilities

Another important issue is signal integrity which can be deteriorated by design-related noise sources, such as coupling effects, di/dt noise and IR drop. The noise margin and slew rate of a wire are strongly dependent on the Miller effects in UDSM which is referred to as crosstalk. There are two types of crosstalk due to cross-coupled capacitors or inductors. Inductive crosstalk becomes significant when the clock frequency goes high [6], which we shall not consider in our work. Capacitive crosstalk becomes far more hazardous as dynamic logic is now prevalently used in high-performance circuit design. The reason is that dynamic logic is more vulnerable to crosstalk noise, because its evaluation nodes are not able to recover from erroneous transitions.

First, we characterize the average power consumed by interconnects for domino logic. Our cycle-averaged power model for interconnects accounts for physical net topology, switching activities, and dynamic characteristics of the circuit implementation. Second, the maximum crosstalk model takes into account logical correlation on top of coupling capacitance and net spacing. The reason is that satisfiability and observability of crosstalk condition determine effective crosstalk between two signals. Third, based on the power and crosstalk model, we optimize the initial routing solution with minimizing coupling power as the objective with constraints on maximum crosstalk and area. The power optimization is performed via track assignment by using simulated annealing.

<sup>\*</sup>Supported in part by National Science Foundation under grant NSF MIP 96 12184 and by Intel under grant 5414.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

$$\underbrace{\frac{L}{C_x \downarrow 0}}_{(a) \text{ Type LL}} \quad \underbrace{\frac{L}{C_x \downarrow 0^+}}_{(b) \text{ Type LH}} \quad \underbrace{\frac{L}{C_x \downarrow 0^-}}_{(c) \text{ Type HH}} \quad \underbrace{\frac{L}{C_x \downarrow 0^-}}_{(c) \text{ Type HH}}$$

Figure 2: Switching activity types for domino logic

### 2 Cycle-Averaged Power Model for Interconnect

The average power consumed by a wire with a clock frequency  $f = \frac{1}{T_{clk}}$  can be evaluated by [8]

$$W_{av} = \lim_{n \to \infty} \frac{\int_{0}^{n \cdot T_{clk}} V_{dd} \cdot I_{c}(t) \cdot f dt}{n}$$
  
$$= V_{dd} \cdot \lim_{n \to \infty} \frac{\int_{0}^{n \cdot T_{clk}} I_{c}(t) dt}{n} \cdot f$$
  
$$= V_{dd} \cdot \Delta Q_{av} \cdot f \qquad (1)$$

where *n* is the number of clock cycles observed and  $I_c(t)$  represents the drawn current due to transitions during a clock period.  $\Delta Q_{av}$  is the cycle-averaged charge provided by the power supply to all the capacitance of interconnect which is given by

$$\Delta Q_{av} = \sum_{j=S} p_j \cdot C_j^{total} \cdot \Delta V_j$$
$$\triangleq \sum_{j=S} \widehat{C}_j \cdot V_{dd}$$
(2)

where S is the set of nodes connected to  $V_{dd}$ ,  $p_j$  denotes the switching probability, and  $C_j^{total}$  denotes lumped total capacitance on node j. The effective capacitance  $\hat{C}$  is characterized by both physical capacitance and switching activities.

#### 2.1 Effective Capacitance

The effective capacitance accounts for cycle-averaged charge stored in physical capacitance which is provided by the power supply during the evaluation stage.

One can model the physical capacitance of a wire with width w, length l, overlapping segment length y, and edge-to-edge distance d as shown in Figure 1(a)

$$C_{total} = (C_a \cdot w + 2C_f) \cdot l + C_x \cdot \left(\frac{y \cdot d_{min}}{d}\right) \quad (3)$$

where  $C_a$  is unit area capacitance  $(F/cm^2)$  to substrates,  $C_f$  is unit fringe capacitance (F/cm) and  $C_x$  is unit coupling capacitance (F/cm) with minimum spacing  $d_{min}$ . To be precise,  $C_f$  and  $C_x$  are not constant due to complex fringing effects [3], but for our purpose it is adequate. The area and fringe capacitance are barely independent of the distance d, which is referred to as the self capacitance  $C_s$ .

The distributed RC model for interconnects, called  $\pi$ -model, is illustrated in Figure 1(b). Corresponding to self and coupling capacitance, effective capacitance can be viewed as consisting of two components to reflect the switching activities as in Figure 1(c)

$$\widehat{C}_{total} = \widehat{C}_s + \widehat{C}_x. \tag{4}$$

The effective self capacitance  $\hat{C}_s$  is characterized by the switching activity of domino logic. Domino logic is free from spurious



Figure 3: Coupling power for type HH transition  $(P_{HH})$  normalized by the coupling power for the type LH transition  $(P_{LH})$  with various transition delay  $(\tau)$ 

transitions. At most one low-to-high transition can occur at the output of a domino logic cell during the evaluation stage. If the input combination to the cell results in a current path from the evaluation node to the ground rail, then discharge of load capacitance causes a low-to-high transition to take place at the output of the domino cell. Otherwise, charge is kept in the load capacitance, so the output logic value of the domino cell remains low. Therefore, only two types of waveforms are observable at the buffer output of a domino cell, namely, either stable low or low-to-high transition during the evaluation stage. The monotonicity of domino logic accounts for the effective self capacitance of a wire i, which is given by

$$\widehat{C}_s(i) = p(i_{=1}) \cdot (C_a \cdot w + 2C_f) \cdot l \tag{5}$$

where  $p(i_{=1})$  is the probability that the signal *i* is high, which is referred to as the on-probability of the signal *i*.

The effective coupling capacitance  $\widehat{C}_x$  is determined by signal activities of adjacent wires consisting of three waveform combinations shown in Figure 2. Type LL transition involves neither charging nor discharging of the coupling capacitor. Type LH transition results in charging  $C_x$  with  $V_{dd}$  during the evaluation stage. The charge stored in the coupling capacitance is reset to zero during the precharge stage. When both signals switch simultaneously in type HH, there is no charge distribution on  $C_x$  in principle. However, possible misalignment of the two transitions can cause substantial power consumption. The amount of power consumption due to misaligned transitions varies according to the dynamic characteristics. Each type of transition contributes to the effective capacitance between a wire i and a wire j.

$$\hat{C}_x(i,j) = (p(i_{=0} \land j_{=1}) + p(i_{=1} \land j_{=0}) \\
+ \omega \cdot p(i_{=1} \land j_{=1})) \cdot \left(\frac{C_x \cdot y \cdot d_{min}}{d}\right)$$
(6)

where  $p(i_{=q} \land j_{=r})$  denotes the correlated probability where  $q, r \in \{0, 1\}$ . The temporal correlation factor  $\omega$  captures the temporal behavior of type HH transitions. Discussion on signal and temporal correlations follows in the subsequent sections.

#### 2.2 Signal Correlation

To evaluate the on-probability  $p(i_{=1})$  and the correlated probability  $p(i_{=q} \land j_{=r})$ , we need to compute the signal correlation coefficients and the transition correlation coefficients. The *signal correlation coefficient* [4] between a signal *i* and a signal *j* is

$$\Phi_{qr}^{ij} \triangleq \frac{p(i_{=q} \land j_{=r})}{p(i_{=q}) \cdot p(j_{=r})}, \quad q, r \in \{0, 1\}$$
(7)



Figure 4: Temporal correlation: (a) Timing diagram; (b) temporal correlation factor  $\omega$  computation for uniform distribution of independent transitions

where  $p(i_{=q})$  and  $p(j_{=r})$  are signal probabilities that *i* has the value *q* and *j* has the value *r*, respectively.

The transition correlation coefficient [10] between a signal i which changes from q to u, and a signal j which changes from r to v is defined as

$$\Gamma_{qr,uv}^{ij} \triangleq \frac{p(i_{q \to u} \land j_{r \to v})}{p(i_{q \to u}) \cdot p(j_{r \to v})}, \quad q, r, u, v \in \{0, 1\}$$
(8)

where  $p(i_{q \to u})$  is the transitional probability that the signal value of *i* changes from *q* to *u*, which can be represented by the signal probability  $p(i_{=q})$  and the conditional probability  $p(i(t + \Delta t)_{=u} | i(t)_{=q})$ :

$$p(i_{q \to u}) = p(i(t)_{=q}) \cdot p(i(t + \Delta t)_{=u} | i(t)_{=q}), \quad q, u \in \{0, 1\}$$
(9)

Due to the monotonicity of domino logic, the transition correlation coefficient  $\Gamma_{qr,uv}^{ij}$  becomes  $\Gamma_{00,uv}^{ij}$ , which can be simply represented by the signal correlation coefficient as  $\Phi_{uv}^{ij}$ .

The signal correlation coefficient can be efficiently evaluated by using OBDD. If we assume that higher order correlations between two signals and a third signal are negligible, then the signal correlation coefficient for three signals can be represented as a product of the signal correlation coefficients of signal pairs:

$$\Phi_{qrs}^{ijk} = \Phi_{qr}^{ij} \cdot \Phi_{rs}^{jk} \cdot \Phi_{sq}^{ki} \tag{10}$$

Then the signal probability of a signal i to have a value  $\theta \in \{0,1\}$  is

$$p(i_{=\theta}) = \sum_{\pi \in \Pi_{\theta}} \prod_{k=1}^{n} \left[ p\left(x_{=\theta_{k}}^{k}\right) \cdot \prod_{1 \le k < l \le n} \Phi_{\theta_{k}\theta_{l}}^{x_{k}x_{l}} \right]$$
(11)

where  $\Pi_{\theta}$  is the set of path from the OBDD node representing the signal *i* to the terminal OBDD node denoting  $\theta \in \{0, 1\}$ .  $x^k$  is a primary input variable constituting *i*, and  $\theta_k$  is an assigned value to  $x^k$  such that the signal *i* should be evaluated  $\theta$  in a path  $\pi$ . The signal probability can be computed in O(N) time where N is the number of OBDD nodes [1].

The correlated probability in Equation (6) represents the correlation of two signals:

$$p(i_{=q} \wedge j_{=r}) = p(i_{=q}) \cdot p(j_{=r}) \cdot \Phi_{qr}^{ij}$$
(12)

Because the signal probability can be computed in O(N) time, evaluation of Equation (12) can also be performed in O(N) time.

#### 2.3 Temporal Correlation

Simultaneous type HH transitions on neighboring wires yield an insignificant amount of charge distribution in coupling capacitance.

However, in most cases, the transitions are misaligned due to unbalanced propagation delays through multiple signal paths to the logic component. The misalignment leads to multiple interactions between adjacent wires. The power consumed by coupling capacitance increases corresponding to the multiple charge distributions. Figure 3 shows the coupling power of type HH ( $P_{HH}$ ) normalized with respect to the coupling power of type LH ( $P_{LH}$ ) with various transition delay between two rising signals on adjacent wires. The timing dependency of coupling power is measured by HSPICE.

Let  $\tau$  denote the delay between two transitions. Then we have two boundary conditions for  $\tau$ . As  $\tau$  approaches toward 0, coincident transitions consume negligible coupling power which is represented by  $\beta_0$ . On the other extreme, as  $\tau$  increases within a functionally proper range, misaligned transitions result in significant power dissipation, which approaches a saturated ratio  $\beta_{\infty}$ . Now we define *coincident transition time*  $\tau_0$  as the time  $\tau$  needed for the normalized power to reach  $\beta_0$ . The *discrete transition time*  $\tau_f$  is defined as the time  $\tau$  required for the normalized power to reach  $\beta_0 + 0.9 * (\beta_{\infty} - \beta_0)$ . Let us assume that  $min(t_{r1}, t_{r2}) + \tau_f \leq T_{clk}$ . Then we approximate the intermediate cases using a linear interpolation between  $\tau_0$  and  $\tau_f$ .

$$\beta = \beta_0 + \frac{\beta_\infty - \beta_0}{\tau_f - \tau_0} \cdot (\tau - \tau_0)$$
(13)

Now we are ready to figure out the temporal correlation factor  $\omega$ . In principle, the signal transition on a wire can be temporally correlated to the signal transition on other adjacent wires. In other words, the transition on  $t_{r1}$  for the signal 1 is not an independent event of the transition on  $t_{r2}$  for the other signal 2 in Figure 4(a). The temporal correlation factor  $\omega$  can be evaluated by

$$\omega = p(|\tau| \ge \tau_f) \cdot \beta_{\infty} + p(|\tau| \le \tau_0) \cdot \beta_0 + \frac{\int_{\tau_0}^{\tau_f} \left[ p(|\tau|) \cdot \left( \beta_0 + \frac{\beta_{\infty} - \beta_0}{\tau_f - \tau_0} \cdot (\tau - \tau_0) \right) \right] d\tau}{\tau_f - \tau_0}$$
(14)

Equation (14) accounts for the temporal correlation between two signals. However, accurate evaluation of temporal correlation is known to be prohibitively expensive from a computational point of view. To approximate the problem, we assume that the transitions are uniformly distributed and independent of each other. Then, we can compute the temporal correlation factor as follows. Let  $\alpha_f$  denote the probability of  $p(|\tau| \geq \tau_f)$ ,  $\alpha_0$  denote the probability of  $p(|\tau| \leq \tau_0)$ , and  $\alpha_i$  denote the probability of  $p(\tau_0 < |\tau| < \tau_f)$  shown in Figure 4(b). In this particular case, the temporal correlation factor  $\omega$  is given by

$$\omega = \alpha_f \cdot \beta_{\infty} + \alpha_0 \cdot \beta_0$$

$$+ \alpha_i \cdot \frac{\int_{\tau_0}^{\tau_f} \left[ \beta_0 + \frac{\beta_{\infty} - \beta_0}{\tau_f - \tau_0} \cdot (\tau - \tau_0) \right] d\tau}{\tau_f - \tau_0} \qquad (15)$$

$$= \left( \frac{\beta_{\infty} + \beta_0}{2} \right) + (\alpha_f - \alpha_0) \cdot \left( \frac{\beta_{\infty} - \beta_0}{2} \right)$$

If  $\beta_0 = 0$ ,

$$\varphi = \frac{\beta_{\infty} \cdot (1 + \alpha_f - \alpha_0)}{2} \tag{16}$$

According to the uniform distribution and random arrival as in Poisson distribution, the coefficients  $\alpha_f$ ,  $\alpha_0$  and  $\alpha_i$  can readily be evaluated as the area proportion illustrated in Figure 4(b).

Statistical information for signal transition and dynamic behavior analysis enable us to compute both the effective self capacitance  $\hat{C}_s$  and the effective coupling capacitance  $\hat{C}_x$ . Based on the effective capacitance, we eventually compute the cycle-averaged power for interconnects in Equation (1).



Figure 5: Routing example for low power consumption: (a) Leftedge algorithm (temperature = 13.4); (b) Optimized routing solution (temperature = 10.7)

### 3 Crosstalk Model for Domino Logic

Crosstalk effects are determined by physical capacitance and logical characteristics of the wires under consideration. The physical capacitance is represented by unit-length cross-coupling capacitance  $C_x$  and spacing between wires.

Due to logical correlation between signals, some signals are free from crosstalk from other specific set of signals. The concepts of satisfiability and observability are introduced to capture such logical behavior.

The uni-directional transition of domino logic implies that a signal with a stable low is a potential victim signal. And the potential aggressor signal can be defined as the signal which experiences a transition causing cross-coupling with a victim signal. *Satisfiability* represents the existence of at least one primary input vector that satisfies both the victim signal condition and the aggressor signal condition. Satisfiability of a crosstalk between a victim net v and a aggressor net a is defined to be

$$S_{va} = \begin{cases} 1 & \text{if } (v_{=0} \land a_{=1}), \text{ for input vector } \mathbf{p} \\ 0 & \text{otherwise} \end{cases}$$
(17)

where  $\mathbf{p}$  is an instance of primary input vector which evaluates v to logic value 0 and a to logic value 1.

For a crosstalk to affect the functional or temporal behavior of a circuit, the crosstalk should be observable at primary outputs. The observability of a crosstalk on a victim signal v is given by

$$O_{va} = \begin{cases} 1 & \text{if } \sum_{q \in PO} (q |_{v=0} \oplus q |_{v=1}) |_{a=1} \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
(18)

where q is a primary output signal, and  $q \mid_{v=\theta}$  is a cofactor of q with respect to the variable v with the value  $\theta$ .

Then, the maximum crosstalk that a victim signal v can experience is given by

$$X_{v} = \sum_{a \in \mathcal{N}} \left( \frac{C_{x} \cdot y \cdot d_{min}}{d} \right) \cdot (S_{va} \cdot O_{va})$$
(19)

where  $\mathcal{N}$  denotes the whole set of nets in a circuit.

Notice that we can define logical crosstalk immunity based on satisfiability and observability according to Equation (19). For a potential victim signal v and a potential aggressor signal a, if  $S_{va} \cdot O_{va} = 0$  holds, then v is immune to crosstalk of a. In addition, if



Figure 6: Synthesis flow for low interconnect power

 $S_{av} \cdot O_{av} = 0$  also holds, then wire pair v and a are mutually free from crosstalk, being in the same *crosstalk immunity set* (CIS) [9]. Intuitively, the wire pair v and a can be placed as close as possible for compact design, since there will be no crosstalk between them.

#### 4 Cross-Coupling Power Optimization

#### 4.1 Problem Formulation

We assume that the design has already been placed and routed using gridded channel routing. In addition, the primary input statistics are known a priori. These assumptions are realistic for soft or firm intellectual property (IP) macroblock specification.

The effective capacitance consists of both self and coupling capacitance. Coupling power is the dominant component which is heavily dependent upon net spacing. Hence, we isolate the effective coupling capacitance and optimize the net spacing in a routing solution in order to minimize the coupling power. We formulate the power optimization problem as follows:

$$\begin{array}{ll} Minimize & \sum\limits_{i,j\in\mathcal{N}} \widehat{C}_x(i,j) \\ subject \ to & Max_{v\in\mathcal{N}} \ X_v \leq X_{spec} \\ & Routing \ area \leq Area_{spe} \end{array}$$

where  $\pi$  is a member of the set of signal paths from the primary inputs to the primary outputs. The constraints specify that maximum crosstalk and routing area should all be kept within pre-defined bounds.

c

#### 4.2 Coupling Power Optimization

Gridded channel routing is specifically considered. The objective of track assignment is to minimize the effective coupling capacitance (Equation (6)) in order to reduce coupling power, while satisfying all the constraints. Because it is hard to consider the objective function and constraints simultaneously during the routing process, a conventional channel routing approach, left-edge algorithm [5], is



Figure 7: HSPICE power estimation results for C880 benchmark circuits implemented by (a) the left-edge algorithm and (b) our optimization algorithm

first used to allocate the net segments such that the number of tracks used is minimum.

The segments in the initial routing solution are then perturbed by moving, swapping, and permuting so as to minimize the coupling power. Note that the perturbation cannot be performed freely because of the vertical constraints on their relative vertical positions. The track assignment problem is known to be NP-hard. So we use a simulated annealing approach to obtain an optimal solution [7].

The temperature function is defined as follows: Without loss of generality, we assume that unit-length cross-coupling capacitance  $C_x$  has unit value for minimum spacing. Then the temperature is computed as the sum of the effective capacitive couplings of each segment of adjacent nets.

$$Temp = \sum_{\substack{i,j \in \mathcal{N} \\ j_s \in seg(j)}} \sum_{\substack{i_s \in seg(i) \\ j_s \in seg(j)}} \widehat{C}_x(i_s, j_s) \tag{20}$$

where net *i* is adjacent to net *j*,  $i_s$  is a segment of the net *i*, and  $j_s$  is a segment of the net *j*. Each segment corresponds to a unit RC tile in Figure 1(b). The space between two adjacent net segments and switching activities determine the effective capacitive couplings of each segment as in Equation (6).

**Example 4.1**: Track assignment examples are illustrated in Figure 5. Each routing solution in Figure 5(a) and (b) is produced by the left edge algorithm and a track perturbation algorithm based on simulated annealing, respectively. The left edge algorithm gives us a minimum track routing, which is the basis for track perturbation.

Given switching probabilities and slack time, the temperature is computed to be 13.4 for the solution in Figure 5(a). Figure 5(b) shows the routing solution based on track permutation, net swapping, and net moving [13]. The average coupling power is computed to be 10.7 which is substantially improved in comparison to initial routing.  $\Box$ 

It is worthy noting that track perturbation does not incur any area overhead because we carry out permutation, moving and swapping within the initial routing area. During the annealing process, the maximum crosstalk constraints are guaranteed by evaluating Equation (19). The concept of crosstalk immunity set is applied to account for logical relationships between signals.

### 5 Experimental Results

Domino logic synthesis flow is shown in Figure 6, which is implemented on an Ultra SPARC workstation. Technology independent optimization is performed using SIS [12], then all the internal inverters are eliminated for a domino logic realization. Crosstalk immunity set information is extracted so that maximum crosstalk computation accurately represents the logical behavior of adjacent wires [9]. Identified CIS information is used to compute the maximum cross-coupling effect so that noise constraints are satisfied during optimization. The circuit is then mapped with a parameterized cell-library [14], which specifies the maximum number of allowable serial and parallel transistors in a cell. The mapped circuit is placed and routed using the left-edge algorithm, which provides the initial routing solution for track assignment. In each cooling step of the simulated annealing, the temperature in Equation (20) is evaluated based on switching probabilities and updated timing information. For each routing solution, interconnect parameters for distributed RC model are extracted under 0.18µm technology process. Eventually, we measure the power consumed by the circuit using HSPICE for given signal probabilities on the primary input signals. Figure 7 contains two curves of power consumption in the benchmark circuit C880 for given input patterns. The curve (a) in Figure 7 represents the running average for the circuit implementation with the left-edge algorithm, while the curve (b) represents the average power for optimized routing. Notice that in Figure 7, it takes about only 60 simulation cycles to converge its eventual level, and it takes about 10 to 40 cycles to reach the  $\pm 1\%$  error line.

Table 5 contains synthesis results and measured power in mWwhen the signal probabilities assigned to the primary inputs were 0.2, 0.5 and 0.8, respectively. The results under the column "P(LE)" refer to the power consumed by the circuit implementation using the left-edge algorithm. It should be noted that the power measured by HSPICE includes power consumed by gate capacitance and interconnect capacitance, power attributed to short circuit current and leakage power. The power consumed by the optimal implementation are reported under the column "P(OPT)". The optimal routing solutions save 14.2%, 12.6%, and 16.1% power compared to the initial routing as shown in the column "% Sav" denoting the percentage of the power saving, with respect to each signal probability. It can be noted that the impacts of optimization on power saving would become more significant in UDSM process, since power dissipation will more heavily dependent on coupling capacitance in UDSM than in  $0.18\mu$ m technology that we used.

Table 2 contains the results of maximum crosstalk computed by Equation (19) for each physical realization with unit value for coupling capacitance  $C_x$ . The results show that maximum crosstalk effects are reduced by 10.7%, 8.8%, and 12.1%, with respect to each signal probability.

## 6 Conclusion

Signal integrity and energy efficiency are becoming important in ultra deep sub-micron technology. Furthermore, as a premier candidate for high-performance system design, reliable domino logic implementation requires a well-structured interconnect network. The reason is that domino logic is more vulnerable to noise in comparison to static CMOS logic.

We presented a formulation for the average power dissipated by interconnect. The power model accounts for signal and temporal correlations when computing effective capacitance. Signal correlations are evaluated by using OBDD efficiently. Timing analysis provides for temporal correlation factor that accounts for charge redistribution on coupling capacitance due to misaligned transitions on adjacent nets. Then, maximum crosstalk effects between neighboring wires are modeled to reflect the logical aspects as well as electrical factors. Based on the coupling power model and the maximum crosstalk model, coupling power is optimized using track perturbation. Meanwhile, constraints on area and noises are assured to meet the specification. Experimental results show that over 12% of total power is saved and the maximum crosstalk is reduced by 8.8% eventually with on-probability of 0.5 for all primary input signals.

|         | Signal Prob $= 0.2$ |        |       | Signal Prob $= 0.5$ |        |       | Signal Prob = 0.8 |        |       |
|---------|---------------------|--------|-------|---------------------|--------|-------|-------------------|--------|-------|
| Circuit | P(LE)               | P(OPT) | % Sav | P(LE)               | P(OPT) | % Sav | P(LE)             | P(OPT) | % Sav |
| C432    | 1.5                 | 1.4    | 6.7   | 1.5                 | 1.4    | 6.7   | 1.5               | 1.4    | 6.7   |
| C499    | 3.6                 | 2.9    | 19.4  | 3.6                 | 3.0    | 16.7  | 3.6               | 2.8    | 22.2  |
| C880    | 2.3                 | 1.9    | 17.4  | 2.2                 | 1.8    | 18.2  | 2.1               | 1.6    | 23.8  |
| C1908   | 8.3                 | 6.8    | 18.1  | 8.2                 | 6.9    | 15.9  | 8.3               | 6.7    | 19.3  |
| C2670   | 6.7                 | 6.0    | 10.4  | 6.7                 | 6.2    | 7.5   | 6.8               | 6.2    | 8.8   |
| C3540   | 39.8                | 37.4   | 6.0   | 40.9                | 37.8   | 7.6   | 41.1              | 37.2   | 9.5   |
| C5315   | 33.4                | 30.3   | 9.3   | 33.2                | 29.8   | 10.2  | 33.5              | 29.5   | 11.9  |
| C6288   | 121.1               | 89.1   | 26.4  | 121.0               | 98.2   | 18.8  | 118.8             | 86.2   | 27.4  |
| C7552   | 97.7                | 83.4   | 14.6  | 97.4                | 86.3   | 11.4  | 96.7              | 81.1   | 16.1  |
| Avg.    |                     | •      | 14.2  |                     | •      | 12.6  |                   | •      | 16.1  |

Table 1: Impact of power optimization for routing results with various signal probabilities: power (mW) is measured using HSPICE

Table 2: Maximum crosstalk for routing solutions

|         | Signal Prob = $0.2$ |        |       | Signal Prob = $0.5$ |        |       | Signal Prob = $0.8$ |        |       |
|---------|---------------------|--------|-------|---------------------|--------|-------|---------------------|--------|-------|
| Circuit | X(LE)               | X(OPT) | % Red | X(LE)               | X(OPT) | % Red | X(LE)               | X(OPT) | % Red |
| C432    | 0.27                | 0.25   | 9.2   | 0.27                | 0.26   | 3.9   | 0.27                | 0.22   | 19.0  |
| C499    | 0.26                | 0.24   | 7.6   | 0.26                | 0.25   | 2.2   | 0.26                | 0.22   | 16.2  |
| C880    | 0.33                | 0.24   | 26.5  | 0.33                | 0.27   | 17.5  | 0.33                | 0.28   | 16.5  |
| C1908   | 0.43                | 0.42   | 2.5   | 0.43                | 0.39   | 10.7  | 0.43                | 0.42   | 2.0   |
| C2670   | 0.64                | 0.62   | 3.1   | 0.64                | 0.61   | 4.7   | 0.64                | 0.60   | 6.3   |
| C3540   | 1.63                | 1.43   | 12.6  | 1.63                | 1.49   | 8.7   | 1.63                | 1.58   | 3.3   |
| C5315   | 1.30                | 1.27   | 2.3   | 1.30                | 1.19   | 8.0   | 1.30                | 1.27   | 2.3   |
| C6288   | 3.16                | 2.70   | 14.5  | 3.16                | 2.89   | 8.5   | 3.16                | 2.00   | 36.9  |
| C7552   | 2.72                | 2.22   | 18.1  | 2.72                | 2.31   | 14.8  | 2.72                | 2.54   | 6.4   |
| Avg.    | •                   | •      | 10.7  | •                   | •      | 8.8   | •                   | •      | 12.1  |

### References

- [1] S. T. Chakravarty. On the complexity of using BDDs for the synthesis and analysis of boolean circuits. In *the 27th Annual Allerton Conf. on Communication, Control and Computing*, pages 730–739, 1989.
- J. Cong. Challenges and opportunities for design innovations in nanometer technologies. In SRC Design Science Concept, 1997.
- [3] J. Cong and L. He. Theory and algorithm of local-refinementbased optimization with application to device and interconnect sizing. *IEEE Trans. Computer-Aided Design*, 18(4):406– 420, Apr. 1999.
- [4] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco. Estimate of signal probability in combinational logic networks. *In First European Test Conf.*, pages 132–138, 1989.
- [5] A. Hashimoto and J. Stevens. Wire routing by optimizing channel assignment within large apertures. In *Proc. ACM/IEEE Design Automation Conf.*, pages 155–169, 1971.
- [6] Y. I. Ismail, E. G. Friedman, and J. L. Neves. Figures of merit to characterize the importance of on-chip inductance. In *Proc.* ACM/IEEE Design Automation Conf., pages 560–565, 1998.
- [7] K. S. Jhang, S. Ha, and C. S. Jhon. Simulated annealing approach to crosstalk minimization in gridded channel routing. VLSI Design: Special Issue: High Performance Design Automation of VLSI Interconnects, 7(1):85–95, 1998.

- [8] S. M. Kang and Y. Leblebici. CMOS Digital Integrated Circuits: Analysis and Design. McGraw-Hill, 2nd edition, 1998.
- [9] K. W. Kim, U. Narayanan, and S. M. Kang. Domino logic synthesis minimizing crosstalk. In *Proc. ACM/IEEE Design Automation Conf.*, 2000.
- [10] R. Marculescu, D. Marculescu, and M. Pedram. Switching activity analysis considering spatiotemporal correlations. In *Proc. IEEE/ACM Int. Conf. Computer Aided Design*, pages 294–299, Nov. 1994.
- [11] Semiconductor Industry Association. International technology roadmap for semiconductors, 1999.
- [12] E. M. Sentovich, J. K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli. SIS: A system for sequential circuit synthesis. Technical report, UCB/ERL M92/41, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, May 1992.
- [13] J. S. Yim and C. M. Kyung. Reducing cross-coupling among interconnect wires in deep-submicron datapath design. In *Proc. ACM/IEEE Design Automation Conf.*, pages 485–490, 1999.
- [14] M. Zhao and S. S. Sapatnekar. Technology mapping for domino logic. In *Proc. IEEE/ACM Int. Conf. Computer Aided Design*, pages 248–251, 1998.