## Post Routing Performance Optimization via Tapered Link Insertion and Wiresizing

Tianxiong Xue and Ernest S. Kuh Department of Electrical Engineering and Computer Sciences University of California at Berkeley Berkeley, CA 94720

#### Abstract

Most existing performance-driven and clock routing algorithms can not guarantee performance after all nets are routed. This paper proposes a new post routing approach which can reduce both maximum delay and skew of an existing routing topology by tapered link insertion and non-uniform wiresizing. It uses the Sequential Quadratic Programming method for constrained optimization. Experimental results show that our approach can improve performance significantly and consume less area than uniform wiresizing.

#### 1 Introduction

Interconnect routing problem is critical for high speed preformance-driven physical design and must be addressed properly during the layout process. In recent years, several performance-driven routing algorithms have been proposed [1, 2, 3, 6]. Unlike conventional routing approaches, their objective for routing tree construction is to minimize the maximum delay of the net. They all adopt Elmore delay or its upper bounds for delay estimation and various optimization algorithms have been developed.

Another category of problems closely related to performance-driven routing is clock routing which aims at minimizing the maximum delay skew among sinks of a net. There have been numerous approaches for zero-skew solutions. [7] uses a recursive bottom up algorithm to build a zero-skew clock tree.

These performance-driven and clock routing algorithms are in fact pre-routing methods, i.e., the optimal routing topology for each net is constructed individually without considering the routability of the entire circuit. Pre-routing methods can not guarantee performance after final routing because the net topologies are subject to significant modifications when nets are routed in order to generate a feasible solution. Thus, post routing performance optimization is necessary to satisfy the performance requirements for the nets.

Another limitation of existing approaches is that they restrict routing topologies to either trees or other fixed topologies. Such simplification sacrifices the flexibility in routing. For example, topologies which contain loops have not been studied.

This paper proposes a new approach for post routing performance optimization. Unlike [4], which greedily adds new edges into an existing tree on a geometric routing graph, we analyze in detail the impact of link insertion on maximum delay and delay skew of any arbitrary topology and demonstrate that a link inserted between the reference node and maximum delay node can achieve the best improvement in performance. We then design an approach which inserts a new link into an existing net topology and performs non-uniform wiresizing under routing resource constraint to improve the performance of the net. The objective is to satisfy the performance requirements with minimum routing area consumption. This approach has several advantages over previous ones:

1. It achieves reduction in both maximum delay and skew while previous clock routing algorithms often sacrifice delay for skew minimization.

2. It guarantees the routability of the net and treats the tree and mesh structures in the same manner without any restrictions on routing topology.

3. It reduces maximum delay and skew to satisfy any user specified bounds, while minimum delay and zero skew algorithms may generate over-constrained solutions in real world situations.

The advantage of our approach over previous wiresizing approaches can be demonstrated by the following example. Wiresizing on Topology 1(Fig. 1 (a)) alone can never reduce the skew between sink 1 and 2 to zero, but Topology 2 (Fig. 1 (b)) with a new link added between the source and sink 2 consumes less routing area than the double wiresizing of Topology 1 while achieving a larger reduction in maximum delay and zero skew between the sinks. This is because both the admittance at the nodes and the topology of the net are adjusted by our approach.



Fig. 1 (a) Topology 1 (b) Topology 2

# 2 Delay and Skew Analysis2.1 Definitions

The interconnect routing topology N is formulated as a network of uniform RC lines, each line is formulated by the  $\pi$  lumped RC model. Trees and meshes are treated without distinction in the following discussions.

We adopt the convention that the ground node in N is not numbered, the source consists of a voltage source in series with a driver resistance  $R_d$  which is inserted between the ground and node 0, as shown in Fig. 2. The remaining nodes in N are numbered from 1 to n = |N|. The loading capacitance at every sink node in N is denoted by  $C_s$ . The resistance matrix of N is denoted by  $\mathbf{R} = [R_{ij}]$ , where each entry  $R_{ij}$  in ohms, is equal to the potential in volts at node *i* if a 1A current were injected into node j while all nodes in Nother than j were open circuited [5]. Notice that the driver resistance  $R_d$  should be a part of every  $R_{ij}$ . In the special case where N is a tree,  $R_{ij}$  is simply the total resistance along the common path shared by node *i* and *j*. Denote  $\mathbf{c} = [C_j]$  as the capacitance vector for N, where  $C_j$  is the ground capacitance at node  $j \in N$ . We use Elmore delay which is the first order of moment of the impulse for delay and skew computation. It provides a good estimation for signal delay when the waveform is monotonic. The delay  $D_i$  at node iis:

$$D_i = \sum_j R_{ij} C_j \tag{1}$$

Define the maximum node delay in N as  $D_{max} = max\{D_i|i \in N\}$ . Denote the delay skew between nodes i, j by  $\Delta D_{ij} = D_i - D_j$ . The maximum delay skew in N,  $\Delta D_{max}$ , is then defined as  $\Delta D_{max} = max\{|D_{ij}||i, j \in N\}$ . Consider N as an undirected connected graph, which

Consider N as an undirected connected graph, which is not *bi-connected*, and let reference node 0 be an *articulation point* of N, i.e., N will be disconnected if node 0 is removed from it. Assume that the removal of node 0 results in the decomposition of N into k + 1 disjoint components  $N_1, N_2, \ldots, N_k$  and the source. There is no connection between any of these components (If 0 is not an articulation point of N, then k = 1). This decomposition facilitates the investigation of changes in net delay and skew caused by topology modification(Fig. 2).





To study the impact of changes in topology on  $D_{max}$ and  $\Delta D_{max}$ , we establish a new routing topology N'by adding an additional link e with parameters  $R_e$ and  $C_e$  between node 0 and a chosen node  $n \in N$ .  $\mathbf{R}'$ and  $\mathbf{c}'$  are the resistance matrix and capacitance vector of N', respectively.  $D'_{max}$  and  $\Delta D'_{max}$  denote the maximum delay and delay skew of N', respectively.

## 2.2 Changes in Node Delay

We first compare the entries in resistance matrices of N and N', i.e.,  $\mathbf{R} = [R_{ij}]$  and  $\mathbf{R'} = [R'_{ij}]$ .

For any chosen node  $n \in N_k$ , the additional link ewith parameters  $R_e, C_e$  between node n and 0 will increase the admittance value at node n and provide an extra path from every node in  $N_k$  to reference node 0. As each pair of nodes  $i, j \in N_k$  now share one more path to node 0 through  $n, R'_{ij}$  will decrease. For each pair of nodes i, j which do not belong to the same component in the decomposition,  $R_{ij}$  is not affected by this topology change in N' since node 0 is on every path p(i, j) between i and j. In this case,  $R'_{ij}$  and  $R_{ij}$  are the same as the driver resistance  $R_d$ . This observation leads to the following Lemma (The formal proofs of all lemmas, theorems and corollaries in this paper can be found in [8]).

**Lemma 1** Suppose  $n \in N_k$ . If both  $i, j \in N_k$ ,  $R'_{ij} < R_{ij}$ , and when  $R_e$  decreases,  $R'_{ij}$  also decreases; if  $i \notin N_k$  or  $j \notin N_k$ ,  $R'_{ij} = R_{ij}$ .

According to Eqn (1), the delay  $D_i$  at node *i* is determined by the product of  $\mathbf{R}_{,i}$  and  $\mathbf{c}$ , so decrease in  $R_{ij}$  alone can not guarantee a decrease in  $D_i$ .  $R_e$  and  $C_e$  must be set properly so that the decrease in  $R_{ij}$  can offset the increase in  $C_0$  and  $C_n$ .

**Theorem 1** With appropriate  $R_e$  and  $C_e$  values,  $D'_i < D_i$ ,  $\forall i \in N_k$ , and when  $R_e$  decreases,  $D'_i$  also decreases.

According to Theorem 1,  $D'_{max} < D_{max}$  holds for the following special case:

When node 0 is not an *articulation point* of N, i.e., k = 1 and  $N = N_k$ , according to Theorem 1,  $D'_{max}$  will decrease.

**Corollary 1** If node 0 is not an articulation point of N, then  $D'_{max} < D_{max}$  holds with proper values of  $R_{\epsilon}, C_{\epsilon}$ .

In most routing situations, it is easy to satisfy the condition that node 0 is not an articulation point of N by designing the routing topology properly. According to Corollary 1, the maximum delay can then always be reduced by adding new link into the existing topology with proper values of  $R_{e}, C_{e}$ .

#### 2.3 Changes in Delay Skew

We first investigate the change in  $\Delta R(i, j, m) = R_{im} - R_{jm}$  which is directly related to the changes in delay skew  $\Delta D_{ij}$ . Again, it is assumed that n can be any node in  $N_k$  (later in Corollary 2, n is set to be the node with maximum delay skew). To study the impact of topology change on  $\Delta R(i, j, m)$ , we further decompose  $N_k$  into two sub-components,  $N_{k1}$  and  $N_{k2}$ , if n is an *articulation point* of  $N_k$ .  $N_{k1}$ ,  $N_{k2}$  are connected to each other only at n, i.e., n is on every path from a node in  $N_{k2}$  to node 0 (Fig. 3).



## Fig. 3 Decomposition of $N_k$

N' is again formed by establishing a new link between 0 and n. If both  $i, j \in N_k$ , according to Lemma 1,  $R'_{ij} < R_{ij}$ . Intuitively, for each node  $m \in N$ , if  $j \in N_{k2}$ ,  $R_{nm}$  and  $R_{jm}$  will decrease by the same amount because each path from node j to node 0 must include n, i.e.,  $\Delta R'(n, j, m) = \Delta R(n, j, m)$ . If  $j \in N_{k1}, j \neq n$ ,  $R_{nm}$  and  $R_{jm}$  do not decrease by the same amount. Since the increase in admittance from node j to node m is caused by the insertion of a new link between nodes 0 and n, it is less than the admittance increase from n to m, i.e., the decrease in  $R_{jm}$  is less than the decrease in  $R_{nm}$ . If  $j \notin N_k$ ,  $R_{jm}$ will not change according to Lemma 1. In both cases,  $\Delta R'(n, j, m) < \Delta R(n, j, m)$  (if n is not an articulation point, i.e.,  $N_k = N_{k1}$ , the analysis above is still valid).

**Lemma 2**  $\forall m \in N$ , If n is an articulation point of  $N_k$  and  $j \in N_{k2}$ ,  $\Delta R'(n, j, m) = \Delta R(n, j, m)$ ; otherwise,  $\Delta R'(n, j, m) < \Delta R(n, j, m)$ , and when  $R_e$  decreases,  $\Delta R'(n, j, m)$  decreases.

Again, since delay skew relates to both **R** and **c** and  $C'_0, C'_n$  increase in N', reducing  $\Delta R(n, j, m)$  alone can not guarantee the decrease of  $\Delta D_{nj}$ . Appropriate values of  $R_e$  and  $C_e$  are necessary.

**Theorem 2** If n is not an articulation point or  $j \in N_{k1}$ , then  $\Delta D'_{nj} < \Delta D_{nj}$  holds with proper values of  $R_e$  and  $C_e$ . When  $R_e$  decreases,  $\Delta D'_{nj}$  also decreases.

Recall that  $\Delta D_{max} = max\{|\Delta D_{ij}| | \forall i, j \in N\}$ . Without the loss of generality, we can choose  $n \in N_k$  as the node with maximum delay skew in N. Suppose  $\Delta D_{max}$  exists between n and some node  $p \in N$ , i.e.,  $\Delta D_{max} = D_n - D_p = \Delta D_{np}$  (If  $\Delta D_{max}$  occurs between more than one pair of nodes, multiple links can be added). Since  $\Delta D_{nj} \geq 0, \forall j \in N, n$  is also the node with maximum delay in N and n can not be an *articulation point* (otherwise,  $\exists n' \in N_{k2}, \text{ s.t., } D_{n'} > D_n$  and  $\Delta D_{n'j} > \Delta D_{np} = \Delta D_{max}$ ). So n enjoys both maximum delay and skew and is denoted as the maximum delay (skew) node in the following context.

Let  $\Delta D_{m2}$  be the maximum skew among nodes other than n, i.e.,  $\Delta D_{m2} = max\{|\Delta D_{ij}|| i, j \neq n\}$ . Apparently,  $\Delta D_{m2} < \Delta D_{max}$ . After the link is established in N', both  $\Delta D'_{m2}$  and  $\Delta D'_{np}$  will be continuous functions of  $R_e$  and  $C_e$ . Since  $\Delta D'_{np}$  is less than  $\Delta D_{max}$ , it is always possible to determine values of  $R_e$  and  $C_e$  such that  $D'_{max} = max(\Delta D'_{np}, \Delta D'_{m2}) < \Delta D_{max}$ .

**Corollary 2** If n is the maximum delay (skew) node in N, then  $\Delta D'_{max} < \Delta D_{max}$  holds with proper values of  $R_e, C_e$ .

Corollary 2 implies that we can always reduce the maximum delay skew of an existing routing topology by adding a new link properly between the reference node and the node with maximum delay (skew). Recall that a similar approach is used to reduce the maximum delay of the topology. So unlike previous clock routing algorithms which increase delay to minimize delay skew, it is possible to reduce both maximum delay and maximum delay skew with our approach.

## 3 Tapered Link Insertion and Wiresizing

This section discusses a new post routing approach which constructs a tapered link and wiresizes it properly to improve the performance of the net in terms of both maximum delay and delay skew. This optimization process has two phases:

1. Find the route of the link under routing resource constraints.

2. Wiresize the link so that the performance requirements are satisfied.

#### 3.1 **Problem Formulation**

In the following analysis, we assume that reference node 0 is not an *articulation point* of N, i.e.,  $N = N_k$ (the situation where node 0 is an *articulation point* can be easily handled by adding a link for each  $N_i$ ). The given topology is not modified during the optimization process.

Suppose link e established between maximum delay(skew) node n and reference node 0 in N' has kline segments. Define length and width vectors of eas  $\mathbf{l} = (l_1, \ldots, l_k)$  and  $\mathbf{w} = (W_1, \ldots, W_k)$  and denote  $\mathbf{w}_{ub}$  as the vector of routing resource constraints on  $\mathbf{w}$ . The R, C values of segment  $i \in e, R_i, C_i$  satisfies:

$$R_i = r_o \frac{l_i}{W_i}, \quad C_i = c_0 l_i W_i \tag{2}$$

where  $r_0, c_0$  are the unit resistance and capacitance of the link with minimum width  $W_0$ , respectively. The total R, C values of link e,  $R_e$  and  $C_e$ , can then be expressed as:

$$R_e = r_0 \sum_{i=1}^k \frac{l_i}{W_i}, \quad C_e = c_0 \sum_{i=1}^k l_i W_i$$
(3)

The delay and skew of N' can be measured by  $K = (P - P_{ub})/P_{ub}$ , where P is the maximum delay or delay skew, and  $P_{ub}$  is a specified constraint.  $K_{max} = \max\{K_{delay}, K_{skew}\}$  measures the performance of the net and  $K_{max} \leq 0$  means that the requirements for maximum delay and skew are both satisfied.

#### 3.2 **Proposed Algorithms**

According to Eqn (3), the route of e should be as short as possible. Thus, a shortest path algorithm can be applied to obtain a feasible route of e between the reference node 0 and the maximum delay (skew) node n. Once the route is determined, the length vector of e,  $\mathbf{l}$ , is fixed. Unlike  $\mathbf{l}$ , the maximum delay and delay skew are not monotonic in  $\mathbf{w}$ , since  $C_i$  is proportional to  $W_i$ while  $R_i$  is inversely proportional to  $W_i(\text{Eqn}(2))$ . So the wiresizing problem requires an optimization process which adjusts the width vector  $\mathbf{w}$  in such a way that a satisfactory performance  $(K_{max} \leq 0)$  can be obtained under the routing resource constraint  $\mathbf{w}_{ub}$ .

#### 3.2.1 Tapered Link Insertion and Wiresizing Algorithm

- 1. Initialization:
  - 1.1 Input existing net topology N and its specified constraints.
  - 1.2 Identify maximum delay (skew) node n and  $K_{max}$ .
  - 1.3 Set initial delay bound  $D_{bound} = D_{max}$  and its decreasing rate p.
- 2. Link insertion:

Build new topology N' by establishing a shortest feasible link e of k segments between the reference node and n.

- 3. Tapered link wiresizing: While  $K_{max} > 0$  and solution is feasible:
  - 3.1 Reduce  $D_{bound}$  by p.
  - 3.2 Call tapered link wiresizing algorithm s.t.  $D'_{max}$  satisfies  $D_{bound}$ .
  - 3.3 Evaluate node delays and  $K_{max}$ .

When the algorithm terminates, either a feasible solution which satisfies the performance requirements has been obtained, or  $\mathbf{w}$  exceeds  $\mathbf{w}_{ub}$  implying that the specified requirements are unachievable.

#### 3.2.2 Taper Link Wiresizing Algorithm

The non-uniform wiresizing of link e is formulated as an optimization problem:

Wiresize link e of k segments under constraints on routing width s.t.  $D'_{max}$  satisfies specified constraint  $D_{bound}$  with minimum routing resource consumption. After the routing which fixes the length vector  $\mathbf{l}$ ,  $\mathbf{w}$ is the only adjustable vector in N', and the delay at each node can be expressed as a function of  $\mathbf{w}$ . Thus, the non-uniform wiresizing optimization problem can be formulated as:

Minimize Subject to:

$$egin{array}{rcl} D'_{max}(\mathbf{w}) &\leq D_{bound}\ 0 < \mathbf{w} &\leq \mathbf{w}_{ub} \end{array}$$

lw

This problem is solved by constrained optimization methods discussed below.

## 3.2.3 Sequential Quadratic Programming

As can be seen from Eqn (4) below,  $D'_{max}(\mathbf{w})$  is nonlinear with respect to w. Thus the wiresizing problem is a non-linear programming problem which can be solved effectively by the Sequential Quadratic Programming (SQP) method. SQP is an iterative procedure which establishes a direction of search at each major iteration. The principal idea is the formulation of a Quadratic Programming (QP) sub-problem at each iteration based on an approximation made of the Hessian of the Lagrangian function, using a quasi-Newton updating method. The Lagrangian function is expressed as  $L(\mathbf{w}, \lambda) = f(\mathbf{w}) + \sum_{i=1}^{m} \lambda_i g_i(\mathbf{w})$ , where f and the  $g_i$ s are the objective function and constraints of the problem, respectively. This sub-problem can be solved using any QP algorithm. Its solution is then used to form a search direction for a line search procedure which determines the step length of the variables in such a way that a sufficient decrease in a merit function is obtained.

#### 3.3 Optimality Analysis

The optimality of our proposed algorithm is based on the following three properties:

1. For a certain amount of reduction in  $D'_{max}$ , a link connecting to the maximum delay (skew) node  $n \in N$  consumes less area than a link connecting to any other node.

2. For a link connecting to n, the non-uniform wiresizing algorithm requires less routing area than uniform link sizing for the same amount of reduction in  $D'_{max}$ . 3. The reduction in maximum delay and maximum delay skew are consistent, i.e, larger reduction in  $D'_{max}$ leads to larger reduction in  $\Delta D'_{max}$ .

#### **3.3.1** Link to *n* vs Link to Other Node

Although maximum delay can be reduced by inserting a link between the reference node and any node  $i \in N$ , it is reduced the most if the link connects to the maximum delay (skew) node  $n \in N$ . This is because a link to n would reduce  $R'_{nj}$  more than a link to any other node, and therefore the delay at node n,  $D'_n$ , is reduced the most. Since  $D'_n$  provides a lower bound for  $D'_{max}$  (which may now exist at a node other than n), one would expect that  $D'_{max}$  would also be the lowest among all possible links. This is formally stated in the following lemma.

**Lemma 3** For an inserted link e with fixed values of  $R_e$  and  $C_e$ ,  $D'_{max}$  is minimized if e is connected to the maximum delay (skew) node  $n \in N$ .

Notice that if the  $C_e$  value of the tapered link is fixed, its routing area  $\sum_{i \in e} l_i W_i$  is also fixed. So Lemma 3 implies that for any amount of routing resource consumption, a link to node n will reduce the maximum delay more than a link to any other node.

When the width W of an uniformly sized link increases,  $D'_n$  decreases if W is below a certain value  $W_{limit}$  and increases when  $W > W_{limit}$ . Without the loss of generality, we can assume that the best sizing solution of each link will follow the same relationship (which is lower than the delay value by uniform sizing). Thus following corollary can be established using Lemma 3.

**Corollary 3** To achieve a certain amount of reduction in  $D'_{max}$ , a link connecting to n consumes less routing area than a link to any other node.

**3.3.2** Uniform vs Non-uniform Wiresizing For any two links with the same  $R_e$  value, the changes in  $R_{ij}$  should be the same regardless of the segment widths of the links since same amount of additional admittance is introduced at node n. But  $D'_{max}$  also depends on the distribution of capacitance along e, which can be adjusted by wiresizing. Intuitively, large capacitive loads should be placed close to the source so that  $R_{ij}, \forall i$ , is small for node  $j \in e$  with large  $C_j$ . This means that non-uniform wiresizing can yield a larger reduction in maximum delay than uniform wire sizing.

**Lemma 4** For any tapered link e inserted between node n and reference node 0 with fixed values of  $R_e$ ,  $C_e$ , non-uniform wiresizing of e can yield smaller  $D'_{max}$  than uniform wiresizing.

Lemma 4 implies that for any amount of routing resource consumption, non-uniform wiresizing can always achieve larger reduction in  $D'_{max}$  than uniform wiresizing. Analogous to the analysis for Corollary 3, the following corollary can be established.

**Corollary 4** To achieve a certain amount of reduction in  $D'_{max}$ , non-uniform wiresizing consumes less routing area than uniform wiresizing.

Corollary 4 also implies that it is possible that non-uniform wiresizing may generate a solution with smaller  $D'_{max}$  and less routing area than uniform wiresizing. This will be shown by our experimental results.

#### 3.3.3 Max Delay vs Skew Reduction

Corollary 2 in Section 2 implies that an inserted link at the maximum delay (skew) node can effectively reduce the maximum skew, which means that when the inserted link is connected to node n, maximum delay and skew reductions are consistent. This is formally stated in the following lemma.

**Lemma 5** If an inserted tapered link is connected to the maximum delay (skew) node  $n \in N$  and wiresized, the maximum delay and maximum skew reductions are consistent, i.e., larger reduction in  $D'_{max}$  leads to larger reduction in  $\Delta D'_{max}$ .

Lemma 5 implies that the conclusions for maximum delay reduction can also be extended to maximum delay skew reduction, so the following corollary can be established according to the definition of  $K_{max}$ .

**Corollary 5** Lemma 3,4 and Corollary 3, 4 are all valid for reduction of  $K_{max}$ .

Based on the analysis above, the optimality of the algorithm proposed in Sec. 3.2.1 can be established.

**Proposition 1** Among all tapered link insertion and wiresizing approaches which satisfy given performance constraints ( $K_{max} < 0$ ), the proposed algorithm is optimal in terms of routing resource consumption.

## 4 Experimental Results

The post routing performance optimization algorithm has been implemented in C and tested on a DEC3100 workstation. The following MCM parameters are used in the delay evaluation (courtesy of Prof. Wayne Dai of UC Santa Cruz from data provided by AT&T): driving resistance  $R_d = 25.0\Omega$ , unit resistance and capacitance  $R = 0.008\Omega/\mu m$ ,  $C = 0.06 fF/\mu m$ , loading capacitance  $C_s = 1000 fF$ .

|     |               | U             |               |          |
|-----|---------------|---------------|---------------|----------|
| Net | # of          | Total wire    | Inserted link | $W_{ub}$ |
| no. | $_{\rm pins}$ | length $(mm)$ | length $(mm)$ | $(xW_0)$ |
| 1   | 8             | 90            | 60            | 4        |
| 2   | 8             | 130           | 60            | 4        |
| 3   | 8             | 160           | 40            | 4        |
| 4   | 9             | 130           | 80            | 6        |
| 5   | 11            | 160           | 90            | 6        |
| 6   | 12            | 180           | 80            | 6        |
| 7   | 12            | 220           | 100           | 6        |
| 8   | 20            | 480           | 80            | 8        |

Table 1. Physical Characteristics of Nets

Table 1 lists the physical characteristics of eight test topologies, including both tree and mesh structures (See Fig. 4). In our experiments, the number of line segments of each link, k, is set to 6 and segments on each link share the same width upper bound  $W_{ub}$  ( $W_0$  is the minimum wire width). For nets 3, 5, 7, 8, the length of link e is longer than the Manhattan distance between node 0 and n, since the shortest path is either unroutable or can not be wiresized to  $W_{ub}$ .

Two different wiresizing approaches for the inserted link e are compared: one wiresizes e uniformly; the other approach applies the tapered link wiresizing algorithm discussed in Section 3.

Table 2 shows the performance improvement of each net achieved by link insertion and wiresizing. MD and MS denote Maximum Delay and Skew of the topology respectively. For each net, these quantities must be reduced by the specified reduction percentages in order to satisfy the performance requirements. The  $K_{max}$ values before optimization indicate that the original maximum delay and skew values are high above their specified bounds. After optimization, it can be observed that:

1. Both the uniform and non-uniform wiresizing approaches can improve net performance significantly. On average, maximum delay and skew are reduced by approximately 40% and 60% respectively by both approaches.  $K_{max}$  is reduced from an average of 1.54 to 0.031 and -0.004 by uniform and non-uniform methods, respectively. This shows that post routing link insertion and wiresizing is very effective at improving net performance.

2. The tapered link non-uniform wiresizing approach can achieve better maximum delay and delay skew than the uniform wiresizing approach. In all cases where the uniform wiresizing approach fails to reach the specified bounds for both maximum delay and skew  $(K_{max} > 0)$ , the non-uniform wiresizing approach can generate feasible solutions which satisfy the performance requirement  $(K_{max} < 0)$ . The optimization process stops when  $K_{max} < 0$ , better results

|     | Before Optimization |        |           | After Optimization |        |                 |        |        |           |
|-----|---------------------|--------|-----------|--------------------|--------|-----------------|--------|--------|-----------|
| Net | Spec. Reduction     |        |           | Uni. Sizing        |        | Non Uni. Sizing |        |        |           |
| No. | MD(-%)              | MS(-%) | $K_{max}$ | MD(-%)             | MS(-%) | $K_{max}$       | MD(-%) | MS(-%) | $K_{max}$ |
| 1   | 35                  | 56     | 1.27      | 33.72              | 56.44  | 0.020           | 35.53  | 57.00  | -0.008    |
| 2   | 47                  | 69     | 2.23      | 45.93              | 68.93  | 0.020           | 47.15  | 69.56  | -0.003    |
| 3   | 30                  | 61     | 1.56      | 26.12              | 59.22  | 0.055           | 30.81  | 61.36  | -0.003    |
| 4   | 51                  | 64     | 1.39      | 49.19              | 59.64  | 0.037           | 51.50  | 64.12  | -0.003    |
| 5   | 33                  | 62     | 1.63      | 30.05              | 60.40  | 0.044           | 33.34  | 62.11  | -0.004    |
| 6   | 35                  | 60     | 1.50      | 32.10              | 42.61  | 0.045           | 36.07  | 44.85  | -0.002    |
| 7   | 47                  | 58     | 1.38      | 45.97              | 57.43  | 0.019           | 47.01  | 58.03  | -0.001    |
| 8   | 43                  | 64     | 1.39      | 42.03              | 64.81  | 0.003           | 43.04  | 64.92  | -0.003    |
| avg | 40.13               | 61.75  | 1.54      | 38.14              | 60.67  | 0.031           | 40.56  | 62.14  | -0.004    |

Table 2. Net Performance Improvement by Link Insertion and Wiresizing

in performance are possible if it continues.

Table 3. Area Consumption by Link Insertionand Wiresizing

|     | Area Consumption |                 |        |  |  |  |  |
|-----|------------------|-----------------|--------|--|--|--|--|
| Net | Uni. Sizing      | Non Uni. Sizing | Saving |  |  |  |  |
| No. | $(xW_0)$         | $(xW_0)$        | (-%)   |  |  |  |  |
| 1   | 204              | 169             | 17.16  |  |  |  |  |
| 2   | 240              | 204             | 15.00  |  |  |  |  |
| 3   | 312              | 259             | 16.99  |  |  |  |  |
| 4   | 400              | 343             | 14.25  |  |  |  |  |
| 5   | 486              | 402             | 17.29  |  |  |  |  |
| 6   | 456              | 350             | 23.25  |  |  |  |  |
| 7   | 520              | 419             | 19.42  |  |  |  |  |
| 8   | 624              | 554             | 11.22  |  |  |  |  |

The major advantage of non-uniform wiresizing is demonstrated in Table 3 which compares the routing resource consumption by the two approaches. To achieve the performance listed in Table 3, the nonuniform wiresizing approach consumes an average of 16.83% less area than the uniform wiresizing approach. In summary, the tapered link insertion and wiresizing approach can achieve better maximum delay and skew with less area than the uniform wiresizing.

#### References

- K. Boese, A. Kahng, B. Mccoy and G. Robins, "Rectilinear Steiner Trees with Minimum Elmore Delay", Proc. DAC31, pp. 381-386, 1994.
- [2] J. Cong and K. Leung, "Optimal Wiresizing Under the Distributed Elmore Delay Model", Proc. ICCAD93, pp.634-639, 1993.
- [3] X. Hong, T. Xue, E. Kuh, C. Cheng, "Performance-Driven Steiner Tree Algorithms For Global Routing", Proc. DAC30, pp.17-181, 1993.
- [4] B. McCoy and G. Robins, "Non-Tree Routing", Proc. ED&T 94, pp. 430-434, 1994.
- [5] A. E. Ruehli, Circuit Analysis, Simulation and Design, Advances in CAD For VLSI, Vol.3, 11.2.
- [6] S. Sapatnekar, "Interconnect Optimization under the Elmore Delay Model", Proc. of DAC31, pp. 387-391, 1994.

- [7] R. S. Tsay, "Exact Zero Skew", Proc. ICCAD91, pp. 336-339, 1991.
- [8] T. Xue, E. S. Kuh, internal technical report.



