# Flip-Flop and Repeater Insertion for Early Interconnect Planning\*

Ruibing Lu, Guoan Zhong, Cheng-Kok Koh ECE, Purdue University West Lafayette, IN, 47907, USA {lur, zhongg, chengkok}@ecn.purdue.edu Kai-Yuan Chao Intel Corporation Hillsboro, OR 97124, USA kchao@ichips.intel.com

#### Abstract

We present a unified framework that considers flipflop and repeater insertion and the placement of flipflop/repeater blocks during RT or higher level design. We introduce the concept of independent feasible regions in which flip-flops and repeaters can be inserted in an interconnect to satisfy both delay and cycle time constraints. Experimental results show that, with flip-flop insertion, we greatly increase the ability of interconnects to meet timing constraints. Our results also show that it is necessary to perform interconnect optimization at early design steps as the optimization will have even greater impact on the chip layout as feature size continually scales down.

### 1 Introduction

Continued scaling of VLSI technologies has brought the issues of interconnect-limited designs to the forefront. In deep sub-micron circuits, global interconnect delay has been found to be comparatively larger than gate delay. This leads to the large increase in the number of timing violations encountered in the physical design stage of the conventional design flow, because accurate delay information for interconnects can be acquired only in physical design stage. Therefore, the backend of the design flow has become a slow iterative process with no guarantee of convergence.

In the conventional design flow, the interconnect optimization is performed during the physical design stage. Studies show that repeater insertion is an effective methods to minimize signal delays of global wires. However, repeater insertion has its limitations. Several papers on repeater planning [5, 10, 9] at the floorplanning step have clearly showed that not all timing requirements (delay and transition time) can be met even by using repeaters. The average completion rates or the percentages of all global nets that meet their timing constraints are 67.5% [5], 85% [10], and 79% [9] respectively. One reason for such low completion rates is that blockages of circuit blocks may prevent a repeater planner from finding feasible locations for repeaters. Alpert *et al.* [2] proposed distributed buffer sites in IP blocks to solve blockage problem and improve the routability. Moreover, the timing constraints may be so tight that they are beyond the maximum performance that repeater insertion can reach. In fact, the wire delay can be as long as about ten clock cycles in the near future [7], thus making pipelined signal transmission and the insertion of flip-flops or latches necessary.

Recent studies [8, 11] have proposed new design flows in which physical design was done partially during high level design stages. This makes more accurate delay information available at early design steps, so that lengthy iterations of the whole design flow can be avoided. Ref. [11], for example, proposed a design flow for architectural level retiming with module selection to consider the long delay of global wires. Through architectural floorplanning, the clock cycles needed for each global interconnect are estimated. Then a new minimal area retiming method is defined to deal with the area delay tradeoff of circuit blocks under the clock cycle constraint of global interconnects. However, the paper did not consider the effect on the floorplan of the inserted flip-flops (FFs) and repeaters, which are usually assumed to be placed at their optimal positions. Since the number of FFs and repeaters increases dramatically when the clock frequency increases and the die size becomes larger, the floorplan may change significantly. In addition, IP reuses and pre-defined hard blocks may prevent flip-flops and repeaters from finding their locations. For these reasons, many timing violations may still be found at physical design level, making it very difficult to avoid lengthy design iterations.

This paper proposes a flip-flop/repeater block planning method during architectural floorplanning/interconnect planning stage. The addition of flip-flops (FFs) or latches to global interconnects allows signals to be transferred from

<sup>\*</sup>This work was supported in part by NSF under contract number CCR-9984553, SRC under contract number 99-TJ-689, and a grant from Intel Corporation.

the driver blocks to receiver blocks in pipeline stages; that overcomes the limitation of repeater insertion. The result can be fed back to early design stages such that proper adjustments to a design can be made. We propose a unified framework that considers flip-flop and repeater insertion, and introduce the concept of independent feasible regions in which flip-flops and repeaters can be inserted in an interconnect to satisfy both delay and cycle time constraints. This approach can also be combined with the architectural retiming [11] to improve the convergence speed and quality of the circuit. With the consideration of physical locations of FFs and repeaters, our approach can generate more practical and feasible clock cycle constraints and consider their effects on the chip area.

The rest of this paper is organized as follows: In Section 2, we formulate the flip-flop and repeater block planning problem. In Section 3, we present the enabling concept to the proposed flip-flop and repeater block planning solution: independent feasible regions for the placement of flip-flops and repeaters. In Section 4, we briefly introduce our FF/repeater block planning algorithm and then present the experimental results on MCNC benchmark circuits, and in Section 5, we conclude the paper.

### 2 Problem formulation

When the timing constraint for a given interconnect is too stringent for repeaters to overcome, it is necessary to insert FFs to break the long interconnect into several segments. When *n* FFs are inserted, we divide the wire into n+1 segments. With the insertion of FFs, we also introduce three timing constraints; one for the first segment, one for the middle segments, and one for the last segment. The timing constraints for the middle segments is the clock period minus the setup time and the flip-flop propagation delay. The timing constraint for the first segment should ensure that the delays from those source registers to the first FF are less than one clock period minus the setup time and the register propagation delay. Similarly, we have to consider all sink registers to which the sink of the net sends data. The timing constraint for the last segment should ensure that the delays from the last FF to the sink registers are less than one clock period minus the setup time and the flip-flop propagation delay. We denote the target delays for the first segment, the middle segments, and the last segment by  $D_{tgt,F}$ ,  $D_{tgt,M}$ , and  $D_{tgt,L}$ , respectively.

In this paper, we study the problem of simultaneously inserting FFs and repeaters into a net to meet the given timing constraints  $D_{tgt}$  and the clock frequency constraint. Note that the given timing constraints are in general smaller than the clock period. As we do not know the actual implementations before the net driver and after the net sink, we use the given timing constraint for  $D_{tgt,F}$  and  $D_{tgt,L}$ ; otherwise, the exact values of  $D_{tgt,F}$  and  $D_{tgt,L}$  can be easily derived as discussed earlier.

Given the timing constraints and clock period, we find suitable locations along a net for the insertion of FFs and repeaters. Usually, the suitable location for the insertion of an FF or a repeater is not unique. We are interested in finding an *independent feasible region* along the net in which that FF or repeater can be inserted to meet the required timing constraints. All FFs or repeaters can be placed arbitrarily in their independent feasible regions without violating the timing constraints. In this study, we consider only two-pin nets.

#### **3** Flip-flop and repeater insertion

In this section, we study the problem of finding independent feasible regions of FFs and repeaters inserted along a net. Here, all drivers of each segment, and the repeaters inserted in each segment are modeled as switch-level RC circuits. We use Elmore delay model for delay computation.

First, we investigate when flip-flops are required. The result allows us to calculate the number of FFs required for each net to meet the timing constraints. Next, we show that the independent feasible regions (IFRs) for both FFs and repeaters can be derived analytically. For the results to be useful in our proposed flip-flop/repeater block planning algorithm, we should ensure that the number of FFs and repeaters are minimal (or nearly minimal), and the width of IFRs of all the FFs and repeaters are balanced.

The notation for the physical parameters of the interconnects, repeaters, drivers, loads, and flip-flops are as follows:

- $R_d$ : driver resistance;  $C_l$ : load capacitance of sink;
- *r*: wire resistance per unit length;
- *c*: wire capacitance per unit length;
- $T_b$ : intrinsic repeater delay;
- *T<sub>b</sub>*. Intrinsic repeater deray,
- $C_b$ : repeater input capacitance;
- $R_b$ : repeater output resistance;
- $C_F$ : Flip-Flop input capacitance;
- $R_F$ : Flip-Flop output resistance.

#### 3.1 When are flip-flops needed?

Flip-flops are needed only when the delay constraints are beyond the maximum performance that repeater insertion can reach. Here, we derive the formula for the maximum length that a wire with repeaters can be before it exceeds the given target delay. To do that, we have to find the optimal number of repeaters that achieves that.

Consider a wire with length L, driver  $R_d$ , and sink  $C_l$ . If we insert *n* repeaters into the wire, based on the optimal



Figure 1. Relationship between the wire length and the ideal and actual delays.

positions of these repeaters for delay minimization given in [1], the optimal delay  $D_N$  can be expressed as:

$$D_N(n,L) = \frac{rc}{2(n+1)} (L + l_r + l_d)^2 + (n+1)(R_bC_b + T_b) + L(R_bc + rC_b) + l_rrC_b + l_cR_bc - \frac{rc}{2}(l_r^2 + l_c^2) - T_b,$$
(1)

where  $l_r = \frac{R_d - R_b}{r}$  and  $l_c = \frac{C_l - C_b}{c}$ .

From this equation, it is easy to get the maximal length  $L_N(n)$  for an interconnect with *n* repeaters under a given timing requirement  $D_{tgt}$ .

From Eqn. (1), we can also obtain the optimal delay for an interconnect properly inserted with an *ideal optimal number* of repeaters:

$$D'_{opt}(L) = \left( R_b c + rC_b + \sqrt{2rc(R_bC_b + T_b)} \cdot L + (l_r + l_c) \cdot \sqrt{2rc(R_bC_b + T_b)} + l_r rC_b + l_c R_b c - \frac{rc}{2}(l_r^2 + l_c^2) - T_b,$$
(2)

where the ideal optimal number of repeaters is:

$$n'_{opt}(L) = \sqrt{\frac{rc}{2(R_b C_b + T_b)}} \cdot (L + l_r + l_c) - 1, \quad (3)$$

which may not be an integer. Therefore, under a given delay constraint  $D_{tgt}$ , the maximum wire length of a signal inserted with the ideal optimal number of repeaters is:

$$L'_{max}(R_d, C_l, D_{tgt}) = \frac{D_{tgt} + T_b + \frac{rc}{2}(l_r^2 + l_c^2) - l_r r C_b - l_c R_b c}{R_b c + r C_b + \sqrt{2rc(R_b C_b + T_b)}} - \frac{(l_r + l_c)\sqrt{2rc(R_b C_b + T_b)}}{R_b c + r C_b + \sqrt{2rc(R_b C_b + T_b)}}.$$
(4)

However,  $L'_{max}$  may be overestimated because the ideal optimal number of repeaters  $n'_{opt}$  may not be an integer, which is not realizable. Figure 1 plots the ideal and the actual optimal delays, denoted respectively by  $D'_{opt}$  and  $D_{opt}$ , as the length of an interconnect varies. We use the plots to derive the longest wire length that satisfies a given delay constraint:

**Theorem 1** For an interconnect N with driver resistance  $R_d$  and load capacitance  $C_l$ , the maximum wire length of the interconnect inserted with repeaters under a given target delay  $D_{tgt}$  is:

$$L_{max}(R_d, C_l, D_{tgt}) = max\{L_N(\lfloor n' \rfloor), L_N(\lceil n' \rceil)\}$$

where  $n' = n'_{opt}(L'_{max}(R_d, C_l, D_{tgt})).$ 

*Proof:* Let us consider an interconnect *N* with length *L*. By definition, the minimal delay of *N* when inserted with optimal number of repeaters is:

$$D_{opt}(L) = min\{D_N(0,L), D_N(1,L), D_N(2,L), \dots\}.$$

As  $D_N(n,L)$  for any  $n \ge 0$  is continuous and monotonically increasing with respect to L,  $D_{opt}(L)$  is continuous and monotonically increasing. From [1], we also know that the optimal number of repeaters (for delay minimization) is non-decreasing as wire length increases.

From Eqns. (1) and (2), we can deduce the line  $D'_{min}(L)$  is tangent to every curve  $D_N(n,L)$  at one and only one point. Let  $(LT_n, D'_{opt}(LT_n))$  denote that point at which  $D'_{min}(L)$  is tangent to  $D_N(n,L)$ . Then, equating Eqns. (1) and (2), we obtain:

$$LT_n = (n+1) \cdot \sqrt{\frac{2(R_bC_b + T_b)}{rc}} - l_r - l_c$$

Therefore,  $LT_n$  increases as *n* increases. In fact,  $LT_n$  is the wire length that makes the ideal optimal number of repeaters  $n'_{opt}(LT_n) = n$ . As  $D_{opt} \ge D'_{opt}$ , but  $D_{opt}$  is not larger than  $D_N(n,L)$  for any n,  $(LT_n, D'_{opt}(LT_n))$  is also on the curve  $D_{opt}(L)$ , or  $D_{opt}(LT_n) = D'_{opt}(LT_n)$ .

From Eqn. (3), the ideal optimal number of repeaters  $n'_{opt}$  linearly increases as the wire length *L* increases. Therefore, we have

$$LT_{\lfloor n' \rfloor} \leq L'_{max} \leq LT_{\lceil n' \rceil}$$

where  $n' = n'_{opt}(L'max)$ . Furthermore, as the ideal optimal delay is monotonically increasing with respect to wire length,

$$D_{tgt} \ge D'_{opt}(LT_{\lfloor n' \rfloor}) = D_{opt}(LT_{\lfloor n' \rfloor}),$$
  
$$D_{tgt} \le D'_{opt}(LT_{\lceil n' \rceil}) = D_{opt}(LT_{\lceil n' \rceil}).$$

In addition, the actual optimal delay is also monotonically increasing. Therefore,

$$LT_{\lfloor n' \rfloor} \leq L_{max} \leq LT_{\lceil n' \rceil}$$

Therefore, the actual optimal number of repeaters of the interconnect is either  $\lfloor n' \rfloor$  or  $\lceil n' \rceil$ .

For an interconnect whose length is less than  $L_{max}$ , inserting only repeaters is sufficient to meet the delay constraints. However, for interconnects with length longer than  $L_{max}$ , it is necessary to insert flip-flops. The following corollary computes the least number of flip-flops required to meet the delay and clock period constraints:

**Corollary 1** Given an interconnect of length L, a delay constraint  $D_{tgt}$ , and a clock period constraint, the minimal number of flip-flops required is:

$$N_{FF} = \begin{cases} 0 & \text{if } L \leq L_{max,0}, \\ 1 & \text{if } L_{max,0} < L \leq L_L + L_F, \\ \left\lceil \frac{L - L_F - L_L}{L_M} \right\rceil + 1 & \text{otherwise,} \end{cases}$$
(5)

where  $L_{max,0} = L_{max}(R_d, C_l, D_{tgt}), \quad L_F = L_{max}(R_d, C_F, D_{tgt,F}), \quad L_L = L_{max}(R_F, C_l, D_{tgt,L}), \quad and \\ L_M = L_{max}(R_F, C_F, D_{tgt,M}).$ 

#### 3.2 Feasible region for flip-flops

If *n* flip-flops are required to be inserted in an interconnect, we denote the location of the *i*-th ( $1 \le i \le n$ ) flip-flop by  $f_i$ . We define the feasible region (FR) for the *i*-th flip-flop as

$$FR_i = (f_i^* - W_{FR}/2, f_i^* + W_{FR}/2) \cap (0, L),$$

such that  $(f_1, f_2, \dots, f_i, \dots, f_n) \in FR_1 \times FR_2 \times \dots \times FR_n$  and  $f_1 \leq L_F$ ,  $f_i - f_{i-1} \leq L_M$  for  $2 \leq i \leq n$ , and  $L - f_n \leq L_L$ . Here,  $f_i^*$  can be viewed as the central location of the *i*-th flip-flop in its feasible region, and  $W_{FR}$  denotes the width of the feasible region.

For a flip-flop solution to be feasible, it must satisfy the following inequalities:

$$f_1^{\star} + W_{FR}/2 \leq L_F,$$
  

$$f_i^{\star} - f_{i-1}^{\star} + W_{FR} \leq L_M, \text{ for } 2 \leq i \leq n,$$
  

$$L - f_n^{\star} + W_{FR}/2 \leq L_L.$$

Summing the n + 1 inequalities, we obtain:

$$L + nW_{FR} \leq L_F + (n-1)L_M + L_L$$

In order to maximize  $W_{FR}$ , we get:

$$W_{FR} = (L_F + L_L + (n-1)L_M - L)/n.$$
(6)

We can also obtain the central locations  $f_i^{\star}$ :

$$f_i^{\star} = L_F + (i-1)L_M - (i-1/2)W_{FR}.$$
 (7)

#### **3.3 IFRs for flip-flops and repeaters**

In the preceding section, the feasible region of the flipflops are calculated without consideration of the feasible region of repeaters between the flip-flops. If the first flip-flop is inserted at location  $f_1^* + W_{FR}$ , then the locations of the repeaters in the first segment is fixed. In fact, as  $L_F$ ,  $L_M$ , and  $L_L$  are derived based on the optimal placement of repeaters, the locations of the repeaters between the flip-flops are fixed if two consecutive flip-flops are placed as far apart as possible but still within their feasible regions. That imposes a very severe restriction on the placement of repeaters in a floorplan.

In this section, we consider finding the independent feasible regions of flip-flops and repeaters such that they satisfy the following condition: if every flip-flops or repeater is in its independent feasible region, then that constitutes a valid solution. The goal here is to balance the widths of the IFRs of flip-flops and repeaters.

Consider the first segment of the interconnect. We assume that the "length" of the segment is defined by on the central position of the first flip-flop. Note that the actual segment length is determined by the final placement of the flip-flop. We can easily calculate the smallest number of repeaters, denoted by  $n_F$ , required to satisfy the timing constraint  $D_{tgt,F}$  based on Theorem 3 in [5]. We define the independent feasible region of the *i*-th repeater

$$IFR_r(i) = (x_i^* - W_F/2, x_i^* + W_F/2) \cap (0, f_1^*)$$

and that of the first flip-flop as

$$IFR_f(1) = (f_1^{\star} - W_F/2, f_1^{\star} + W_F/2)$$

such that  $(x_1, x_2, ..., x_i, ..., x_{n_F}, f_1) \in IFR_r(1) \times IFR_r(2) \times ... \times IFR_r(n_F) \times IFR_f(1)$ , and  $D(x_1, x_2, ..., x_{n_F}, f_1) \leq D_{tgt,F}$ , where  $D(x_1, x_2, ..., x_{n_F}, f_1)$  denotes the Elmore delay of the first segment. Here,  $W_F$  denotes the width of the independent feasible regions in the first segment. Similarly, we denote  $n_M$  and  $n_L$  as the numbers of repeaters required for the middle segments and last segment, respectively. We also denote  $W_M$  and  $W_L$  as the widths of the independent feasible regions of the repeaters and flip-flops in the middle segments and the last segment, respectively. Based on Lemma 1 and Theorem 2 in [10], we obtain the following theorem to compute the widths  $W_F$ ,  $W_M$ , and  $W_L$ :



Figure 2. IFRs for repeaters and Flip-Flops.

**Theorem 2** Given an interconnect, the number, n, and the central positions,  $f_i^*$ , of flip-flops inserted, and the number of repeaters for each segment,  $n_F$ ,  $n_M$ , and  $n_L$ , the widths  $W_F$ ,  $W_M$  and  $W_L$  of the respective independent feasible regions are:

$$\begin{split} W_F &= 2(\sqrt{K_{t,F}^2 + 2rc(4n_F + 1)(D_{tgt,F} - D_{opt,F})} - K_{t,F})/(rc(4n_F + 1)), \\ W_M &= (\sqrt{K_{t,M}^2 + 2rc(n_M + 1)(D_{tgt,M} - D_{opt,M})} - K_{t,M})/(rc(n_M + 1)), \\ W_L &= 2(\sqrt{K_{t,L}^2 + 2rc(4n_L + 1)(D_{tgt,L} - D_{opt,L})} - K_{t,L})/(rc(4n_L + 1)), \end{split}$$

where  $K_{t,F} = rcx_{f_1}^* + R_dc + rC_b$ ,  $K_{t,M} = rcx_{f_2^*-f_1^*}^* + R_Fc + rC_b$ ,  $K_{t,L} = rcx_{L-f_n^*}^* + R_Fc + rC_b$ , and  $D_{opt}$  is the optimal delay for the corresponding segment. Note that  $x^*$  is the wire length from the central position of the segment source to the optimal position of the first repeater in the segment.  $\Box$ 

We may obtain two different widths  $W_F$  and  $W_M$  for the IFR of the first flip-flops and two different widths  $W_M$  and  $W_L$ for the IFR of the last flip-flop. Figure 2 shows an example IFR for the first flip-flop. As it turns out, the IFR to the left of the first flip-flop is constrained by the second segment, and the IFR to its right is constrained by the first segment. Therefore the actual width of the independent feasible region of the first flip-flop is  $(W_F + W_M)/2$ . Similarly, the width of the independent feasible region of the last flip-flop is  $(W_M + W_L)/2$ .

## 4 FF/repeater block planning and experiment results

The FF/repeater block planning algorithm takes the initial floorplan, and delay constraints on the global nets as inputs. It determines the locations, assignments and sizes of FF/repeater blocks to be inserted in the channel areas and

feature R C $T_b$ С r (MHz)  $fF/\mu m$  $(\Omega \mu m)$ (Ω) (fF) size (ps) 0.25 400 0.139 0.050 228 15.8 84 0.18 600 0.147 0.076 238 9.7 57 0.13 800 0.148 0.220 294 5.0 31 0.10 1100 0.167 0.350 311 3.1 20

Table 1. Parameter values for experiments.

dead areas (or white spaces) between the circuit modules such that the delay constraints can be satisfied.

We adapt the algorithms from [5, 10, 9] to perform FF/repeater block planning. First, the dead areas and channel areas are divided into a set of rectangular candidate FF/repeater blocks (CFRBs), where the planning algorithm may insert FFs/repeaters to meet timing constraints. Next, we compute the required numbers of FFs and repeaters required for each global net to meet its timing requirement based on the physical information, and the IFRs of these FFs/repeaters. Then, a bipartite graph G = (R, B, E) is constructed, where an edge  $(r \in R, b \in B) \in E$  means CFRB *b* is in the IFR of FF/repeater *r*. Finally, the iterative deletion [6] on the bipartite graph *G* is used to assign a CFRB for each FF or repeater .

We have implemented our FF/repeater block planning algorithm using C++ on a Pentium 3 machine. In this section, we present the details of our experimental set up and the results obtained. The chip frequency and interconnect and repeater parameters shown in Table 1 are obtained from [4] and ITRS'99 roadmap [3]. We assume that the input capacitance and output resistance of a flip-flop are same as those of a repeater, while its area is four times larger. The output resistance ( $R_d$ ,  $R_b$  and  $R_F$ ) are and the input capacitance ( $C_l$ ,  $C_b$  and  $C_F$ ) are denoted by R and C in Table 1 respectively.

The experiments are based on MCNC benchmark circuits, under different technology generations from  $0.25\mu m$  to  $0.10\mu m$ . We use the same initial floorplans as those in [5, 10] even under different technology generations, assuming that the chip area for each circuit remains the same even when the technology scaled because each block in the circuit became more complex in the newer generation.

We use the target frequency published in the Roadmap (see Table 1) to determine the clock period for  $0.18\mu m$  technology. Assuming that the skew, setup time, and hold time are negligible, the difference between the clock period and a target delay are attributed to the logic delay. Using the repeater intrinsic delay as the intrinsic delay of a logic gate, we estimated the logic depth. To generate the target delays for a different technology generation, we use the estimated logic depth to calculate the logic delay under that technology.

We summarize the experimental results in in Table 2. We report the following results: (i) the ratio of the number of

| circuit | feature | Without flip-flops |      | With flip-flops |      |
|---------|---------|--------------------|------|-----------------|------|
|         | size    | MET<br>NOMET       | δΑ   | MET<br>NOMET    | δΑ   |
|         | 0.25    | 167/5              | 0.04 | 167/5           | 0.04 |
| apte    | 0.18    | 124/48             | 0.30 | 153/19          | 1.20 |
|         | 0.13    | 76/96              | 0.17 | 122/50          | 0.92 |
|         | 0.10    | 41/131             | 0.05 | 83/89           | 0.51 |
|         | 0.25    | 215/11             | 0.54 | 223/3           | 0.98 |
| hp      | 0.18    | 175/51             | 0.15 | 219/7           | 2.19 |
|         | 0.13    | 135/91             | 0.20 | 200/26          | 1.55 |
|         | 0.10    | 82/144             | 0.05 | 151/75          | 0.63 |
|         | 0.25    | 452/3              | 0.37 | 455/0           | 0.67 |
| xerox   | 0.18    | 297/158            | 1.45 | 453/2           | 2.23 |
|         | 0.13    | 144/311            | 0.34 | 405/50          | 2.72 |
|         | 0.10    | 95/360             | 0.04 | 257/198         | 2.04 |
|         | 0.25    | 361/2              | 0.43 | 361/2           | 0.43 |
| ami33   | 0.18    | 237/126            | 2.00 | 351/12          | 3.58 |
|         | 0.13    | 110/253            | 0.32 | 340/23          | 5.09 |
|         | 0.10    | 47/316             | 0.05 | 231/132         | 2.17 |
|         | 0.25    | 539/6              | 0.08 | 540/2           | 0.38 |
| ami49   | 0.18    | 460/85             | 2.38 | 533/12          | 5.55 |
|         | 0.13    | 262/283            | 0.86 | 523/22          | 6.06 |
|         | 0.10    | 130/415            | 0.19 | 405/140         | 3.90 |

**Table 2. Experimental Results** 

nets for which the delay constraint is met to the number of nets for which the delay constraint is not satisfied, denoted by  $\frac{MET}{NOMET}$ ; (ii) the chip area increase,  $\delta A$  expressed as a percentage of the original chip area. We compare the results for repeater block planning with and without FF insertion.

Comparing the data in Table 2, we observe that the number of nets requiring FFs increases greatly as the feature size decreases. As expected, using FFs together with repeaters increases the ability to meet timing constraints drastically, from 23.6% to 61.9% on the average for  $0.10\mu m$  technology, and from 59.3% to 86.1% on the average for all four feature sizes. The results also show that using FFs and repeaters consume much more chip area than using repeaters only. There are two main reasons: First, those nets requiring FFs are usually very long, and to meet their timing constraint, usually quite a few repeaters are also required in addition to the flip-flops. Therefore, more repeaters are required when we consider FF insertion. Second, usually the area of an FF is much bigger than that of a repeater. This strongly supports our claim that it is important to consider the insertion of both FFs and repeaters simultaneously at early design stages; otherwise, it will be very difficult to introduce additional FFs and repeaters at a later stage, without significantly affecting the floorplan and placement.

The results, however, show that not all the interconnect timing delay constraints can be met. One reason is that we consider only monotonic path for all two-pin interconnects, which means that the wire length of every interconnect is optimal. While such an approach works for most nets, it is overly restrictive for a few nets that have their shortest routes obstructed by other circuit blocks (see discussions earlier). Therefore, some "post processing", in which detour path should be considered, may be necessary. The result can also be improved if there are distributed "buffer sites" proposed in [2]. Here, the buffer sites can also be used for FF insertion.

#### 5 Conclusion

In this paper, we have presented a flip-flop/repeater insertion method to meet stringent timing constraints at early design stage for global interconnects. The concept of independent feasible region is used for simultaneous planning of flip-flops and repeaters. Experimental results show that our technique can improve the ability to meet timing given any floorplan and the necessity of FF/repeater planning during RTL or higher design stages.

#### References

- C. J. Alpert and A. Devgan. Wire segmenting for improved buffer insertion. In *Proc. Design Automation Conf*, pages 588–593, 1997.
- [2] C. J. Alpert, J. Hu, S. S. Sapatnekar, and P. G. Villarrubia. A practical methodology for early buffer and wire resource allocation. In *Proc. Design Automation Conf*, pages 189– 193, 2001.
- [3] S. I. Association. International Technology Roadmap for Semiconductors. 1999.
- [4] J. Cong, L. He, K.-Y. Khoo, C.-K. Koh, and Z. Pan. Interconnect design for deep submicron ICs. In *Proc. Int. Conf.* on Computer Aided Design, pages 478–485, 1997.
- [5] J. Cong, T. Kong, and D. Z. Pan. Buffer block planning for interconnect-driven floorplanning. In *Proc. Int. Conf. on Computer Aided Design*, pages 358–363, 1999.
- [6] P. H. Madden. Partitioning by iterative deletion. In Proc. Int. Symp. on Physical Design, pages 83–89, 1999.
- [7] D. Matzke. Will physical scalability sabotage performance gains? *IEEE Computer*, 8:37–39, Sept. 1997.
- [8] R. H. J. M. Otten and R. K. Brayton. Performance planning. *Integration, the VLSI Journal*, 29:1–24, 2000.
- [9] P. Sarkar and C.-K. Koh. Repeater block planning under simultaneous delay and transition time constraints. In *Design*, *Automation and Test in Euro. Conf.*, pages 540–544, 2001.
- [10] P. Sarkar and C.-K. Koh. Routability-driven repeater block planning for interconnect-centric floorplanning. *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 20(5):660–671, 2001.
- [11] A. Tabara, B. Tabbara, R. K. Brayton, and A. R. Newton. Integration of retiming with architectural floorplanning. *Integration, the VLSI Journal*, 29:25–43, 2000.