# Partitioning and Reduction of RC Interconnect Networks Based on Scattering Parameter Macromodels

Haifang Liao<sup>\*</sup> and Wayne Wei-Ming Dai

Computer Engineering, University of California, Santa Cruz, CA 95064

# Abstract

This paper presents a linear time algorithm to reduce a large RC interconnect network into subnetworks which are approximated with lower order equivalent RC circuits. The number of RC elements can be reduced between 50% and 90%. Instead of increasing approximation order for a network with large number of ports which will result in inefficiency at best and ill-conditioned matrices at worst, we partition the original network into several subnetworks, each of which is approximated by lower order model. The reduced circuits are guaranteed to be stable. The experiment results show that the number of circuit elements in a reduced network is O(N) for typical clock networks, where N is the number of external ports. The simulation time can be reduced by two to three orders of magnitude while the response of the reduced circuits is within five percent of that of the original circuits.

### **1. Introduction**

As fast timing analysis is desired for today's complex interconnect structures, the moment matching technique is widely used to approximate a higher order linear network using the waveforms generated by its lower order moments [11, 14, 16]. These approaches can also be used to improve the efficiency of existing time-domain circuit simulators by incorporating macromodels of interconnect networks with recursive convolution [12, 14, 17], or generating an equivalent circuit based on a numerical integration method [9].

The macromodels generated by moment matching technique are approximations of the port behavior of the interconnects. In order to incorporate the macromodels into SPICE-like simulators, they are usually described by admittance matrices (Y-matrices) [17, 9]. The Y-matrices are generally full matrices. When the number of external ports of an interconnect network is large, the number of entries of the matrix may be larger than the number of circuit elements of the original interconnect network. For example, a clock network with 500 fanouts will create more than 250,000 entries of the corresponding Y-matrix. This huge full matrix may even increase the computational burden to circuit simulators.

On the other hand, trying to model the huge full matrix by lower order approximation with moment matching techniques is impractical. Severe numerical problems may result in extremely ill-conditioned matrices for computing high order moments. The complex frequency hopping method [2] and the PVL (Padé approximation via the Lanczos process) [4] are efforts to compute high order moments while avoiding the ill-condition problem. However, increasing the approximation order will increase the modeling and simulation time. Especially, when a large circuit with large number of ports is terminated with nonlinear drivers and loads, there is no efficient method to obtain reduced-order approximate models of linear networks. Furthermore, Padé approximation may create a non-positivereal Y-matrix. The positive-real Y-matrix is the necessary and sufficient condition for the corresponding network to be passive [8]. Non-positive-real Y-matrix may result in unstable simulation.

The difficulty of reducing a large linear network with large number of ports can be dealt with by partitioning the network into smaller subnetworks. In this paper, we present a new reduced-order macromodel of RC interconnect networks for transient analysis. Instead of modeling a single large interconnect network using higher order approximation, we develop a linear-time algorithm to partition a complex interconnect network into subnetworks to reduce the total number entries of Y-matrices. We synthesize a lower order equivalent circuit for each subnetwork. Since all parameters of reduced circuits are non-negative, the circuit is always stable. Simulators only need to analyze the synthesized low order equivalent circuit which will greatly reduce the running-time. Because networks are partitioned into smaller subnetworks, high accuracy can be achieved using lower order models approximating those subnetworks.

<sup>\*</sup> Haifang Liao is now with Ultima Interconnect Technology, Inc., Santa Clara, CA.

# 2. Network reduction

Scattering parameters (S-parameters) is a powerful way to describe and model interconnects. They can be measured directly at high frequencies and they exists for all distributed-lumped circuit elements including open and short circuits. Scattering parameters represent the interrelationship of a set of incoming and outgoing power waves of a multi-port system. To develop the macromodel, we begin with the description of all elements in the distributedlumped network in terms of their scattering parameters. Some basic elements are utilized to characterize a general interconnect network, e.g., one-port impedance, two-port impedance, single transmission line and multiport interconnect node. These basic elements are shown in Fig. 1. S-



(c) Single transmission line (d) Multiport interconnect node Figure 1. Basic elements.

parameters of the first three elements can be found in [3].

The interconnect node as shown in Fig. 1(d) can be used to connect elements described in Fig. (1a) through (1c) to construct a generalized interconnection network. The S-matrix of the n-port interconnect node is [10]:

$$S = \frac{1}{n} \begin{bmatrix} 2-n & 2 & \dots & 2\\ 2 & 2-n & \dots & 2\\ \dots & \dots & \dots & \dots\\ 2 & 2 & \dots & 2-n \end{bmatrix}$$
(1)

Combining these basic elements and all multiport elements described by S-parameters, one can represent a variety of distributed-lumped network topology including capacitive cutsets, inductive loops, and lossy transmission lines. Given the individual component scattering parameters, a systematic reduction algorithm has been developed in [11]. Two merging rules are employed to reduce the original arbitrary distributed-lumped linear network into a multiport component (See Fig. 2). The Adjoined Merging



Figure 2. S-parameter based macromodel.

Rule (See Fig. 3) is used to merge all internal components, and the Self Merging Rule (See Fig. 4) is applied to eliminate the self loops introduced by the Adjoined Merging process. Each step of the merging process amounts to contracting an edge in a circuit graph G(V, E), where vertices V consists of circuit elements shown in Fig. 1 and edges E represent the adjacency between elements in the circuit.



Figure 4. Self Merging.

### 3. Circuit partitioning

The description of an N-port component is generally a full S-matrix or Y-matrix. When N is very large, the synthesized equivalent circuit could have a very large number of elements. In order to minimize the number of elements of the synthesized circuit, we propose to partition a multiport component into several subcomponents with a smaller number of ports (See Fig. 5). Since the number of elements



#### Figure 5. Several smaller components are more efficient than one large component to model complex interconnects.

of the reduced circuit is proportional to the number of the entries of S-matrix, the objective of the partitioning problem is to minimize the total number of entries of the Smatrix. More precisely,

$$ninimize \qquad \sum_{i=1}^{\chi} N_i^2 \tag{2}$$

where  $\chi$  is the number of subcomponents and  $N_i$  the number of ports of the component *i*. We propose a linear-time heuristic algorithm to partition the interconnect network into several multiport components.

Consider the circuit graph of a network G(V, E). Define the weight W(e) of an edge  $e \in E$  as the number of ports of the resultant component after contracting e with Adjoined Merging (Fig. 3) or Self Merging (Fig. 4). We bound the weight of "contractible" edges by  $W_{max}$ . When an edge has the weight more than  $W_{max}$ , it will not be contracted during network reduction, and when weights of all edges are greater than  $W_{max}$ , the reduction process terminates. Here  $W_{max}$  is chosen in such a way that the number of matrix entries does not increase too much when contracting the edges. Let  $N_x$  and  $N_y$  be the port numbers of the two different components connected by an edge e. According to the definition,  $W(e) = N_x + N_y - 2$ .  $W_{max}$  is chosen such that for any e, the following inequality holds,

$$N_x^2 + N_y^2 > \beta (N_x + N_y - 2)^2$$
(3)

where  $N_x^2 + N_y^2$  is the number of entries of S-matrices before contracting *e*,  $(N_x + N_y - 2)^2$  is that after contracting *e*, and  $\beta$  the contracting factor.

Now we have two problems: one is how to select efficiently an edge with minimum weight, and the other is how to update the edge weights after contracting an edge. Those operations can be done in constant time with a bucket list structure shown in Fig. 6. This structure includes a bucket



Figure 6. Bucket list structure.

array, whose *i*th entry contains a doubly-linked list of edges with weights equal to *i*. For the bucket array, a *minweight* index is used to keep track of the bucket containing edges with the minimum weight. Since only neighboring edges change their weights, these weights can be updated in constant time [6].

#### 4. Circuit synthesis

Given the scattering matrix S of a multiport component, the admittance matrix is

$$Y = Z_0^{-1} (E + S)^{-1} (E - S)$$
(4)

where  $Z_0$  is the reference impedance, and E identity matrix. The synthesis of a general Y-matrix without using a transformer is still a open-problem [1, 8, 15]. Here we propose a new synthesis method without using transformers. As we know, the admittance to ground of the *i* th port is given by the sum of the *i* th row (or column) of its Ymatrix. The admittance connecting port-*i* and port-*j* is the value of  $-y_{ij}$ . Thus, the circuit synthesis problem amounts to synthesizing admittances between a port and the ground and between pairs of ports with lumped R and C elements. Unlike the method proposed in [18] where the connection between any two ports is synthesized with a fixed circuit configuration and the values of circuit elements are determined based on the admittance at two frequency points, we use moment matching technique to synthesize the admittance matrices of an RC network. With Taylor series expansion,  $y_{ij}$  can be described as

$$y_{ij} = m_0^{(ij)} + m_1^{(ij)}s + \dots$$
 (5)

To synthesize the circuits between pairs of ports, the T-model shown in Fig. 7(a) is used, and the circuit parameters are determined by matching the moments of  $y_{ij}$ . The T-model together with the capacitors at two ports, to be synthesized with  $y_{ii}$ , forms a 2 $\Pi$  model which provides an accurate description of connection between pairs of ports. The admittance matrix of the T-model is

$$\tilde{y}_{ii} = \frac{1}{R_{ij1}} - \frac{R_{ij2}}{R_{ij1}(R_{ij1} + R_{ij2} + sC_{ij}R_{ij1}R_{ij2})}$$

$$\tilde{y}_{jj} = \frac{1}{R_{ij2}} - \frac{R_{ij1}}{R_{ij2}(R_{ij1} + R_{ij2} + sC_{ij}R_{ij1}R_{ij2})}$$

$$\tilde{y}_{ii} = \tilde{y}_{ii} = \frac{-1}{R_{ij1} + R_{ij2} + sC_{ij}R_{ij1}R_{ij2}}$$
(6)



Figure 7. Circuit models between pairs of ports.

By expanding the Eq. (6) into Taylor series, we have

$$\tilde{m}_{0}^{(ii)} = \tilde{m}_{0}^{(jj)} = -\tilde{m}_{0}^{(ij)} = 1/(R_{ij1} + R_{ij2})$$

$$\tilde{m}_{1}^{(ii)} = C_{ij}R_{ij2}^{2}/(R_{ij1} + R_{ij2})^{2}$$

$$\tilde{m}_{1}^{(jj)} = C_{ij}R_{ij1}^{2}/(R_{ij1} + R_{ij2})^{2}$$

$$\tilde{m}_{1}^{(ij)} = C_{ij}R_{ij1}R_{ij2}/(R_{ij1} + R_{ij2})^{2}$$
(7)

To determine the values of  $R_{ij1}$ ,  $R_{ij2}$  and  $C_{ij}$ , let

$$\widetilde{m}_{0}^{(ij)} = m_{0}^{(ij)} 
\widetilde{m}_{1}^{(ij)} = m_{1}^{(ij)} 
\widetilde{m}_{1}^{(ii)} / \widetilde{m}_{1}^{(jj)} = m_{1}^{(ii)} / m_{1}^{(jj)}$$
(8)

The reason why we match  $m_1^{(ii)}/m_1^{(jj)}$ , and not match  $m_1^{(ii)}$  and  $m_1^{(jj)}$  directly, is that  $m_1^{(ii)}$  and  $m_1^{(jj)}$  are the second moments of  $y_{ii}$  and  $y_{jj}$  which are total contribution of the circuit, not just a branch between port-*i* and port-*j*.

Solving the Eqs. (6), (7) and (8), we have

$$R_{ij1} = \frac{-\sqrt{m_1^{(ij)}}}{m_0^{(ij)} (\sqrt{m_1^{(ii)}} + \sqrt{m_1^{(jj)}})}$$

$$R_{ij2} = \frac{-\sqrt{m_1^{(ii)}}}{m_0^{(ij)} (\sqrt{m_1^{(ii)}} + \sqrt{m_1^{(jj)}})}$$

$$C_{ij} = \frac{(\sqrt{m_1^{(ii)}} + \sqrt{m_1^{(jj)}})^2 m_1^{(ij)}}{\sqrt{m_1^{(ii)} m_1^{(jj)}}}$$
(9)

The above formulas could be used to synthesize all off-diagonal elements of Y-matrix with  $m_1^{(ij)} \ge 0$ . However, if there are floating capacitors,  $m_1^{(ij)}$  may be negative. In this case, we can also use a floating capacitor to model the  $y_{ij}$  between port-*i* and port-*j* (see Fig. 7(b)). The admittance matrix of the model is

$$\tilde{y}_{ii} = \tilde{y}_{jj} = -\tilde{y}_{ij} = 1/R_{ij} + sC_{ij}$$
 (10)

Thus, we have

$$R_{ij} = -1/m_0^{(ij)} \qquad C_{ij} = -m_1^{(ij)} \tag{11}$$

For the diagonal elements  $y_{ii}$ , we need to synthesize  $y_{i0}$ , where

$$y_{i0} = y_{ii} - \sum \tilde{y}_{ii} = m_0^{(i0)} + m_1^{(i0)}s + \dots$$
 (12)

The  $y_{i0}$  is modelled with a parallel RC model as shown in Fig. 8. If there is no DC path from port-*i* to ground,  $m_0^{(i0)} = 0$ . The  $y_{i0}$  is modelled with a single capacitor. In general, the parameters of the model are

$$R_{ii} = 1/m_0^{(i0)} \qquad C_{ii} = m_1^{(i0)} \tag{13}$$

In case the capacitor  $C_{ii}$  computed in Eq. (13) is negative, we can set  $C_{ii} = 0$  and scale up all  $C_{ij}$  to keep total capacitance unchanged.



Figure 8. The RC parallel model of a one-port.

## 5. Stability of synthesized circuits

The sufficient condition for the reduced circuits to be stable is that all synthesized circuit parameter are non-negative [8, 15, 18]. In this section, we prove indeed that the reduced circuits are stable. See [13] for the proofs of the lemmas and theorems.

**Lemma 1:** The matrix obtained by replacing each element of the Y-matrix by its first moment is diagonal dominant and all off-diagonal elements are nonpositive. That is,

$$m_0^{(ii)} \ge \sum_{i \ne j} \left| m_0^{(ij)} \right|$$
, and  $m_0^{(ij)} \le 0$  for  $i \ne j$  (14)

**Lemma 2:** For RC circuits, the second moments  $m_1^{(ii)}$  of the diagonal elements  $y_{ii}$  (i = 1, ..., N) of the Y-matrix are positive.

**Theorem 1:** The resistances  $R_{ij1}$  and  $R_{ij2}$  computed by Eq. (9),  $R_{ij}$  by Eq. (11) and  $R_{ii}$  by Eq. (13) are nonnegative.

**Theorem 2:** The capacitances  $C_{ij}$  computed by Eq. (9) or Eq. (11) and  $C_{ii}$  by Eq. (13) are non-negative.

Therefore, all parameters of the reduced circuits are non-negative. According to [8, 15, 18], we have

Theorem 3: The reduced RC circuits are stable.

# 6. Efficiency and accuracy of synthesized circuits

The common method to increase accuracy of modeling a large interconnect network with moment matching techniques is to increase approximation order [2, 4, 9]. This will increase the modeling and simulation time. The problem becomes evident when the number of port is very large. Only recently, an improved PVL method [5] was proposed to compute the reduced-order approximate models of linear networks with multiple inputs and outputs. However, the huge full Y-matrix with high order of approximation will still be a heavy burden on simulation.

Instead of modeling a single large interconnect networks with higher order approximation, we partition the network and use lower order models to approximate multiple small subnetworks. High accuracy can be achieved using lower order models approximating the all subnetworks. The partitioning reduces the simulation time significantly. Furthermore, since partitioning chooses the optimal reduction order and does not merge large components, the reduction time actually decreases.

In addition, partitioning greatly reduces the number of elements in the reduced circuits. When using a Y-matrix to describe the port behavior of a network, the number of y-elements is  $O(N^2)$ , where N is the number of external ports. However, as we will show in experimental results, for typical clock networks, the number of circuit elements in reduced circuits with partitioning is O(N).

#### 7. Experimental results

To illustrate the efficiency and generality of the proposed approach, some industry circuits were tested. We compare results of the original circuits and the reduced circuits using SPICE3e2 [7] on a Sun Sparc 20 workstation. The first circuit is a clock network which consists of 26747 RC elements and has 547 external ports. The number of RC elements is reduced to 2084 which is less than 10% of that of the original circuit. Consequently, the simu-

| # of ports | # of elements | reduction<br>time (sec.) | simulation<br>time (sec.) |  |
|------------|---------------|--------------------------|---------------------------|--|
| 2          | 11            | 39.1                     | 0.2                       |  |
| 12         | 43            | 40.6                     | 0.3                       |  |
| 52         | 198           | 36.8                     | 0.5                       |  |
| 102        | 387           | 38.6                     | 0.8                       |  |
| 202        | 765           | 44.0                     | 1.5                       |  |
| 402        | 1537          | 45.3                     | 3.3                       |  |
| 547        | 2084          | 55.7                     | 4.4                       |  |
| Original   | 26747         | _                        | 202.8                     |  |

Table 1. The clock network reduction for different number of ports

lation time (SPICE3e2) is reduced from 202.8 seconds to 4.4 seconds. The number of elements and simulation time could be further reduced if fewer external ports were of interest. For example, only ports on critical paths may be of interest. Table 1 shows the reduction results for different number of external ports. The number of circuit elements in reduced circuits versus the number of external ports is plotted in Fig. 9. The figure shows that the number of reduced circuit elements is O(N), where N is the number of external ports. The simulation time versus the number of external ports is plotted in Fig. 10. Simulation results at the same port of those reduced circuits with different number of ports are almost identical (See Fig. 11). The error of the simulation results is less than 2%.

The reduction results of a loop-circuit and a mesh-circuit are shown in Table 2. After reduction, the simulation time of the loop-circuit and the mesh-circuit was reduced 7.7 times and 20 times, respectively. The error of the simulation results is less than 2%, compared with that of the



Figure 9. Number of the reduced circuit elements versus number of ports.



Figure 10. Simulation time of the reduced circuits versus number of ports.



Figure 11. Simulation results of the clock network.

| circuits | # of<br>ports | # of elements |      |         | simulation time |          |         |       |
|----------|---------------|---------------|------|---------|-----------------|----------|---------|-------|
|          |               | original      |      | reduced |                 | (second) |         | error |
|          |               | R             | С    | R       | С               | original | reduced |       |
| loop     | 102           | 1100          | 1000 | 210     | 111             | 5.4      | 0.7     | < 2%  |
| mesh     | 102           | 2560          | 2413 | 254     | 107             | 17.9     | 0.9     | < 2%  |

Table 2. The loop circuit and the mesh circuit reduction results.

original circuits.

The circuit partitioning plays a very important role in the circuit reduction. Another clock circuit consists of 7159 resistors and 6875 capacitors with 1953 external ports (7.19 elements per port). If we use only a simple multiport macromodel, the number of elements in the synthesized circuit would be much more than that of the original circuit. However, since the efficient combination of partitioning and reduction, the reduced circuit which keeps all 1953 ports external has only 5546 resistors and 2460 capacitors (4.10 elements per port).

The circuit reduction provides a new strategy to handle nonlinear circuits. A seven-port clock network with nonlinear drivers and loads is shown in Fig. 12. The interconnect consists of 9427 RC elements. After reduction, the circuit has only 73 lumped elements. The simulation time for this reduced circuit is 0.6 second, while simulation time for the original circuit is 43.0 seconds. Again, the results



Figure 12. A clock network with nonlinear drivers and loads.



Figure 13. Comparison of results of the clock network system and its equivalent circuit.

(See Fig. 13) show excellent agreement in response waveforms.

#### 8. Conclusions

The efficient combination of circuit partitioning and reduction provides an efficient circuit analysis strategy. Based on scattering parameter macromodel, the reducedorder models of linear networks with multiple inputs and outputs can be obtained in one pass of reduction. Circuit partitioning speeds up the reduction process, decreases the total number of circuit elements, and achieves the same accuracy using lower order approximation. The proposed circuit synthesis method guarantees the reduced circuit to be stable. The resultant equivalent circuits are fully compatible with general-purpose simulators. Since no assumption is made to port voltages and currents, the proposed approach can be applied to the simulation of analog circuits and mixed-signal circuits.

#### Acknowledgment

The authors would like to thank Zhong L. Mo of EPIC and John Chow of Archer for providing testing benchmarks and evaluating simulation results. This work was partially supported by the National Science Foundation Presidential Young Investigator Award under Grant MIP-9058100.

### References

- B. D. O. Anderson and S. Vongpanitlerd, *Network Analysis and Synthesis: A Modern Systems Theory Approach*, Englewood Cliffs, N.J., Prentice-Hall, 1973.
- [2] E. Chiprout and M.S. Nakhla, "Analysis of Interconnect Networks Using Complex Frequency Hopping (CFH)," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, pp. 186-200 Feb. 1995.
- [3] J. Dobrowolski, Introduction to Computer Methods for Microwave Circuit Analysis and Design, Artech House, 1991.
- [4] P. Feldmann and R. W. Freund, "Efficient Linear Circuit Analysis by Padé Approximation via the Lanczos Process," *Proceedings of the Euro-DAC*, Sept. 1994.
- [5] P. Feldmann and R. W. Freund, "Reduced-Order Modeling of Large Linear Subcircuits via a Block Lanczos Algorithm," *Proceedings of the 32th ACM/IEEE Design Automation Conference*, June 1995.
- [6] C. M. Fiduccia and R. M. Mattheyses, "A Linear-Time Heuristic Improving Network Partitions," *Proceedings of the* 19th Design Automation Conference, pp. 241-247, 1982.
- [7] B. Johnson, T. Quarles, A. R. Newton, D. O. Pederson and A. Sangiovanni-Vincentelli, "SPICE3 Version 3e User's Manual," University of California, Berkeley, April 1991.
- [8] S. Karni, *Network Theory: Analysis and Synthesis*, Allyn and Bacon, Inc., 1966.
- [9] S. Y. Kim, N. Gopal and L. T. Pillage, "Time-Domain Macromodels for VLSI Interconnect Analysis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 13, pp. 1257-1270, Oct. 1994.
- [10] H. Liao and W. Dai, "Wave Spreading Evaluation of Interconnect Systems," *Proceedings of the 3rd IEEE Multichipmodule Conference*, pp. 128-133, 1993.
- [11] H. Liao, W. Dai, R. Wang, and F. Y. Chang, "S-Parameter Based Macro Model of Distributed-Lumped Networks Using Exponentially Decayed Polynomial Function," *Proceedings of the 30th ACM/IEEE Design Automation Conference*, pp. 726-731, June 1993.
- [12] H. Liao and W. Dai, "Scattering Parameter Transient Analysis of Interconnect Networks with Nonlinear Terminations Using Recursive Convolution," *Proceedings of the Synthesis* and Simulation Meeting and International Interchange, pp. 295-304, Oct. 1993.
- [13] H. Liao, "Scattering-Parameter-Based Macromodel for Transient Analysis of Interconnect Networks with Nonlinear Terminations," Ph.D Thesis, UCSC-CRL-95-24, University of California at Santa Cruz, 1995.
- [14] S. Lin, and E. S. Kuh, "Transient Simulation of Lossy Interconnects Based on the Recursive Convolution Formulation," *IEEE Trans. on Circuits and Systems I: Fundamental Theory* and Applications, vol. 39, pp. 879-892, Nov. 1992.
- [15] R. W. Newcomb, *Linear Multiport Synthesis*, New York, McGraw-Hill, 1966.
- [16] L. T. Pillage and R. A. Rohrer, "Asymptotic Waveform Evaluation for Timing Analysis," *IEEE Trans. on CAD*, vol. CAD-9, pp. 352-366, April 1990.
- [17] J. V. Raghavan, J. E. Bracken and R. A. Rohrer, "AWESpice: A General Tool for the Accurate and Efficient Simulation of Interconnect Problems," *Proceedings of the 29th ACM/IEEE Design Automation Conference*, pp. 87-92, June 1992.
- [18] J. C. Rautio, "Synthesis of Lumped Models from N-Port Scattering Parameter Data," *IEEE Transactions Microwave Theory and Techniques*, vol. 42, pp. 535-537, March 1994.