# Robustness Enhancement through Chip-Package Co-Design for High-Speed Electronics

Meigen Shen, Li-Rong Zheng, Hannu Tenhunen Laboratory of Electronics and Computer Systems, IMIT Royal Institute of Technology (KTH) Electrum 229, SE-164 40 Kista-Stockholm, Sweden E-mail: {mgshen, lrzheng, hannu}@imit.kth.se

#### Abstract

The interaction between chip and package increasingly limits system performance. In this paper, a chip-package co-design flow is presented. We address robustness enhancement under package and interconnection constraints by using impedance control, optimal package pin assignment, and transmitter equalization. A high-speed transmitter design example shows that co-design can reduce signal integrity problems, enhance bandwidth, and improve the robustness of high-speed electronic systems.

## **1. Introduction**

The International Technology Roadmap for Semiconductors (ITRS) [12] shows that the on-chip local frequency and the number of pads are continually increasing. For example, in 2007, the on-chip local clock frequency will be 9.285GHz and the number of chip (microprocessor) pads will reach 3072. At the same time, manufacturing technology is also undergoing rapid change. The System-on-Package (SoP) approach will be used to fabricate electrical systems in order to optimize chip and package integration for cost, performance, size and reliability. With higher I/O frequencies, higher I/O densities and tighter geometries, signal integrity issues such as simultaneous switching noise (SSN), crosstalk and resonance present increasing challenges to the design community. System architecture is also changing with rapidly scaled technology. System interconnect topologies have been moving away from multi-point buses toward high-speed point-to-point links because the bandwidth requirements of communication systems at the box-to-box, board-to-board, and chip-to-chip levels are increasing. Currently, off-chip communications in various applications are far beyond Gbps data rates. These applications include Fibre Channel, Ethernet, 3GIO, InfiniBand, and RapidIO. According to the ITRS, the high-performance-level serial data rate will be 40Gbit/s in 2007. To realize this goal, more attention should be paid to the package and interconnection, because package parasitics, interconnection attenuation and noise increasingly limit system performance. These parasitics translate into noise that degrades signal quality on and off the chip. Once the signals are corrupted, catastrophic system failures may occur.

To increase the potential of electronic systems, advanced package technologies such as flip-chip ball grid array (BGA) and multichip-module (MCM) configurations are often used to reduce package parasitics. There has also been a great deal of interest in multiple-level signaling [3,7] and equalization techniques [5,6] to overcome the bandwidth limitations of interconnections. A robust system should also be independent of process, voltage, and temperature (PVT) variation. Therefore, on-chip PVT-compensation circuits and corner or Monte Carlo analysis should be used to make systems robust against all kinds of parameter variations.

In this paper, the robustness of high-speed electronics (5Gbps) is enhanced under package and interconnection constraints by using chip-package co-design, co-simulation and co-optimization. The paper is organized as follows. The chip and package co-design flow is first established. Package and interconnection models are created, and the off-chip signaling circuit is designed under these constraints. Then the chip and package are co-simulated and co-optimized. Finally, the simulation results are discussed.

## 2. Chip and package co-design flow

#### 2.1 Package and interconnection constraints

To address package and interconnection constraints, the first step is to set up proper equivalent circuit models and extract the parasitic parameters of the off-chip package. Fig.1 shows the package parasitics extraction procedure. Package discontinuities such as the bondwire, through via, and solder bump are simulated using a three-dimensional electromagnetic (EM) solver, HP HFSS. These parameters are also obtained from the equivalent circuits shown in Table 1. Because these structures are small compared with the wavelength at the operating frequency (2.5GHz in our case), lumped models are used. Otherwise, distributed models and high-frequency effects such as the skin effect must be considered. The EM solver results and the equivalent circuit simulation results are compared by curve fitting to obtain the parasitic parameters. In Table 1, the capacitances of the bonding pad and the ESD structure scale with pad area and PN junction area, respectively.



Fig.1 Package parasitics extraction procedure

Table 1 Equivalent circuits of package discontinuities and trace

| Discontinuities     | ESD, Pad | Bondwire, Ball | Through Via | Microstrip Line |
|---------------------|----------|----------------|-------------|-----------------|
| Equivalent Circuits | C        |                |             |                 |

After extracting the package parasitics, the second step is to create a wideband lossy transmission line model for the interconnection. Wideband resistor, capacitor, and inductor models are first established, as shown in [9]. Once they have been obtained, the next step is to solve the multiconductor transmission line (MTL) equations using these component models. Since the components are frequency dependent, wideband distributed equivalent circuit models [13,14] should be used for general-purpose circuit simulators such as SPICE. Another common method is to solve the MTL equations in the frequency domain and then use the inverse Fourier transform to obtain time-domain solutions.

Under the sinusoidal steady-state assumption, the first-order coupled MTL equations can be decomposed into a set of second-order, uncoupled, ordinary differential equations

$$\frac{\mathrm{d}^2}{\mathrm{d}z^2}\boldsymbol{V}(z) = \boldsymbol{Z}\boldsymbol{Y}\boldsymbol{V}(z) \qquad (1)$$

$$\frac{\mathrm{d}^2}{\mathrm{d}z^2}\boldsymbol{I}(z) = \boldsymbol{Y}\boldsymbol{Z}\boldsymbol{I}(z) \qquad (2)$$

where $\boldsymbol{Z}=\boldsymbol{R}+j\omega \boldsymbol{L}$ and $\boldsymbol{Y}=\boldsymbol{G}+j\omega \boldsymbol{C}$ denote the per-unit-length impedance and admittance matrices, respectively. To obtain closed-form results, a mode decomposition procedure is used, as shown in [9]. For a uniform MTL of length $\lambda$, we can set z=0 at the near end and z=$\lambda$ at the far end. Consequently, the far-end and near-end voltages and currents are related by a chain parameter matrix, $\boldsymbol{\varphi}(\lambda)$:

$$\begin{bmatrix} \boldsymbol{V}(\lambda) \\ \boldsymbol{I}(\lambda) \end{bmatrix} = \boldsymbol{\varphi}(\lambda) \begin{bmatrix} \boldsymbol{V}(0) \\ \boldsymbol{I}(0) \end{bmatrix} \quad (3)$$

The elements of this chain matrix can be derived as in [9]. This expression is very useful for computing non-uniform MTLs, because a non-uniform line can be broken into many short uniform segments with lengths $\lambda_1$, $\lambda_2$, $\lambda_3$, ..., $\lambda_n$. As a result, we have

$$\begin{aligned} \begin{bmatrix} \boldsymbol{V}(\lambda) \\ \boldsymbol{I}(\lambda) \end{bmatrix} &= \boldsymbol{\varphi}_{n} \left( \lambda_{n} \right) \begin{bmatrix} \boldsymbol{V}(\lambda_{n-1}) \\ \boldsymbol{I}(\lambda_{n-1}) \end{bmatrix} = \boldsymbol{\varphi}_{n} \left( \lambda_{n} \right) \boldsymbol{\varphi}_{n-1} \left( \lambda_{n-1} \right) \begin{bmatrix} \boldsymbol{V}(\lambda_{n-2}) \\ \boldsymbol{I}(\lambda_{n-2}) \end{bmatrix} \\ &= \dots = \prod_{k=1}^{n} \boldsymbol{\varphi}_{k} \left( \lambda_{k} \right) \begin{bmatrix} \boldsymbol{V}(0) \\ \boldsymbol{I}(0) \end{bmatrix} \end{aligned} \qquad (4)$$
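The cascading in Eq. (4) can be illustrated with a minimal sketch for the simplest case of a single signal conductor, where each chain matrix reduces to a 2x2 matrix of hyperbolic functions. The propagation constant, characteristic impedance, and segment lengths below are illustrative assumptions, not values from this paper, and the function names are hypothetical. The sign convention assumes the current is directed toward the far end at both ports.

```python
import cmath

def chain_matrix(gamma, z0, length):
    """2x2 chain-parameter matrix of one uniform single-conductor segment.

    gamma: complex propagation constant (1/m), z0: characteristic impedance (ohm).
    """
    gl = gamma * length
    ch, sh = cmath.cosh(gl), cmath.sinh(gl)
    return [[ch, -z0 * sh],
            [-sh / z0, ch]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cascade(segments):
    """Overall chain matrix phi_n ... phi_2 phi_1 for segments listed near end first."""
    total = [[1, 0], [0, 1]]
    for gamma, z0, length in segments:
        # Each new segment multiplies from the left, as in Eq. (4).
        total = matmul2(chain_matrix(gamma, z0, length), total)
    return total

# Illustrative (assumed) line parameters.
gamma = complex(0.5, 30.0)
z0 = 50.0

# Two uniform pieces of the same line must behave like one piece of the summed length.
two_pieces = cascade([(gamma, z0, 0.02), (gamma, z0, 0.03)])
one_piece = chain_matrix(gamma, z0, 0.05)
```

For a real non-uniform channel, each tuple in `segments` would carry its own propagation constant and impedance; the identical-segment case is used here only because its result is easy to check.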

The signal channel consists of the bondwire BGA package and the PCB trace. The package is shown in Fig.2, and the interconnection model (or signal channel) for point-to-point on-board communication is shown in Fig.3. There are 5 segments in this model. Segment 1 is the bondwire and pad. Segments 2 and 5 are wideband lossy microstrip transmission lines. Segments 3 and 4 are the via and the solder ball, respectively. The mutual inductance and capacitance are not shown in Fig.3. Off-chip and on-chip decoupling capacitors are used to reduce SSN and power supply noise.



Fig.2 Package discontinuities such as bondwire, through via, ball can be found in bondwire BGA package

#### 2.2 Off-chip signaling circuit design and optimization





After establishing the interconnection models, their bandwidth, noise and crosstalk are estimated [1]. Under these constraints, the optimal signaling circuit is chosen. Current-mode differential signaling is used because it can reject common-mode noise, minimize EMI, and reduce ground/supply bounce. It makes the system more robust and better suited to high-speed electronics.

Output impedance should be accurately controlled to reduce signal integrity problems. We use a PMOS transistor as the termination resistor. For the PMOS transistor to work well as a resistor, the output swing should be kept well inside its linear region. In our implementation, for example, to avoid more than 5% resistance variation, the signal swing should be less than 400mV. Process, voltage and temperature (PVT) variation can add up to $\pm 50\%$ variability in the output impedance, so an output impedance compensation circuit should be used to reduce the variability from $\pm 50\%$ down to $\pm 10\%$. In our design, a digitally compensated output impedance is used, as shown in Fig.4. In Fig.4, R<sub>0</sub>, R<sub>1</sub>, and R<sub>z</sub> are precision ($\pm 1\%$) off-chip resistors. The control logic is similar to that in [11]. The output impedance variation is compensated using a 4-bit binary-weighted digital compensation technique with one leg always on. The maximum total PMOS transistor width is therefore $W_{p}+(W_{p0}+2W_{p0}+4W_{p0}+8W_{p0})$.
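The effect of the binary-weighted legs can be sketched as follows. This is only an illustration of the selection outcome, not the control logic of [11]: it assumes the PMOS resistance is inversely proportional to the enabled width, and it searches all 16 codes exhaustively. The widths and the resistance-width products are assumed round numbers chosen for readability, and all names are hypothetical.

```python
def total_width(wp, wp0, code):
    """Enabled PMOS width: always-on leg wp plus code * wp0 (code is 0..15)."""
    if not 0 <= code <= 15:
        raise ValueError("code must fit in 4 bits")
    return wp + wp0 * code

def pick_code(r_times_w, wp, wp0, target_ohm=50.0):
    """Return (code, resistance) closest to target_ohm.

    r_times_w is the resistance-width product (ohm * um) at one PVT corner,
    under the simplifying assumption R = r_times_w / W.
    """
    def resistance(code):
        return r_times_w / total_width(wp, wp0, code)

    best = min(range(16), key=lambda c: abs(resistance(c) - target_ohm))
    return best, resistance(best)

# Assumed widths in micrometres, not the values used in the actual design.
WP, WP0 = 100.0, 10.0

for label, r_times_w in [("fast", 5000.0), ("typical", 7600.0), ("slow", 12500.0)]:
    code, r = pick_code(r_times_w, WP, WP0)
    print(f"{label}: enabled units = {code:2d}, R = {r:.2f} ohm")
```

In this sketch `code` counts enabled unit widths; the polarity of the actual control bits depends on how the PMOS gates are driven and is not modeled here.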

Channel distortion caused by skin effect, dispersion, dielectric loss and reflections leads to inter-symbol interference (ISI) [15]. A pre-emphasis (equalization) filter is used in the off-chip signaling circuit to undo the channel distortion. In the frequency domain, this can be interpreted as a high-pass pre-emphasis filter that cancels the low-pass filtering effect of the channel, resulting in a flat frequency response. In the time domain, it can be interpreted as transmitting additional bits to cancel the inter-symbol interference, resulting in a delta function. In effect, equalization introduces a filter E that cancels the distortion H (i.e. $E = H^{-1}$).



Fig.4 Digital compensation output impedance

A symbol-spaced finite-impulse-response (FIR) filter is used as the pre-emphasis filter in our circuit and it is described by the following equation

$$V_o(n) = V_i(n) + \alpha_1 \cdot V_i(n-1) + \alpha_2 \cdot V_i(n-2) + \dots$$

where $\alpha_1, \alpha_2, \dots$ are the filter tap coefficients.

Fig.5 shows the frequency response of one tap filter.





Fig.5 Illustration of the transfer function of the one-tap pre-emphasis filter; the symbol period is 200ps and $\alpha_1 = -0.3$
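The high-pass behavior of the one-tap filter can be checked numerically. The sketch below evaluates $|1 + \alpha_1 e^{-j 2\pi f T}|$ for the symbol period and tap coefficient quoted in the Fig.5 caption, and applies the FIR equation to a short symbol sequence. The function names are hypothetical, and samples before the start of the sequence are assumed to be zero.

```python
import cmath
import math

def preemphasis_gain(freq_hz, alpha1, symbol_period_s):
    """Magnitude response of V_o(n) = V_i(n) + alpha1 * V_i(n-1)."""
    h = 1 + alpha1 * cmath.exp(-2j * math.pi * freq_hz * symbol_period_s)
    return abs(h)

def apply_fir(samples, taps):
    """Symbol-spaced FIR filter; taps = [1, alpha1, alpha2, ...]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the sequence start are taken as zero
                acc += tap * samples[n - k]
        out.append(acc)
    return out

T = 200e-12      # symbol period from the Fig.5 caption
ALPHA1 = -0.3    # tap coefficient from the Fig.5 caption

dc_gain = preemphasis_gain(0.0, ALPHA1, T)
nyquist_gain = preemphasis_gain(1.0 / (2.0 * T), ALPHA1, T)
boost_db = 20.0 * math.log10(nyquist_gain / dc_gain)

# A run of identical symbols is attenuated, while a transition is emphasized.
shaped = apply_fir([1.0, 1.0, 1.0, -1.0], [1.0, ALPHA1])
```

Repeated symbols are scaled to 0.7 of their original level while a transition is driven to 1.3, which is the time-domain counterpart of the low-frequency attenuation relative to the 2.5GHz Nyquist frequency.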

Fig.6 shows the transmitter block diagram. It consists of a PRBS generator, a resynchronization circuit, a 4:1 multiplexer and an output driver. Before the data can be fed into the multiplexer from the single clock domain (for example, from the PRBS generator), they are resynchronized to multi-phase (4-phase in our design) clock domains to ensure proper timing margins. Pseudo-NMOS logic and current-mode logic (CML) are used for the multiplexer and the output buffer, respectively. The schematics are shown in Fig.7 and Fig.8, respectively.



Fig.6 Transmitter block diagram



Fig.7 4:1 multiplexer

Fig.8 CML output buffer

#### 2.3 Package and off-chip interconnections

An impedance-controlled package plays an important role in high-speed off-chip communications. It should be carefully designed to avoid signal integrity problems such as reflection, overshoot, undershoot, and ringing [2]. The bondwire with pad, the via, and the ball can each be modeled as a single $\pi$-section lumped model, as shown in Fig.9. The input impedance $Z_i$ is designed to equal $Z_0$, the characteristic impedance of the transmission line.



Fig.9 Single  $\pi$ -section lumped model for package

As an example, the bondwire inductance and the capacitances of the output buffer, ESD structure, and pad can be combined into a single $\pi$-section segment. If the inductance L is fixed, the capacitance C must be chosen according to Eq. (5), where $\omega$ is the operating frequency and $\omega_c = 2/\sqrt{LC}$. For the input impedance $Z_i$ to stay within 5% of $Z_0$, the operating frequency $\omega$ should be less than $0.62/\sqrt{LC}$. The output buffer, ESD structure, and pad should therefore be carefully designed.

$$Z_{i} = \sqrt{\frac{L}{C}} \sqrt{1 - \left(\frac{\omega}{\omega_{c}}\right)^{2}} \qquad (5)$$
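The 5% bound quoted above follows directly from Eq. (5): requiring $\sqrt{1-(\omega/\omega_c)^2} \ge 0.95$ gives $\omega \le \omega_c\sqrt{1-0.95^2} \approx 0.62/\sqrt{LC}$. The sketch below evaluates Eq. (5) as written and reproduces that bound. The inductance and target impedance are illustrative assumptions rather than extracted package values, and the function names are hypothetical.

```python
import math

def cutoff_omega(l_h, c_f):
    """Cutoff angular frequency omega_c = 2 / sqrt(L*C)."""
    return 2.0 / math.sqrt(l_h * c_f)

def input_impedance(l_h, c_f, omega):
    """Eq. (5): Z_i = sqrt(L/C) * sqrt(1 - (omega/omega_c)^2), valid below cutoff."""
    ratio = omega / cutoff_omega(l_h, c_f)
    if ratio >= 1.0:
        raise ValueError("omega must be below the cutoff frequency")
    return math.sqrt(l_h / c_f) * math.sqrt(1.0 - ratio ** 2)

def max_omega(l_h, c_f, tolerance=0.05):
    """Largest omega for which Z_i stays within `tolerance` of sqrt(L/C)."""
    return cutoff_omega(l_h, c_f) * math.sqrt(1.0 - (1.0 - tolerance) ** 2)

def capacitance_for(l_h, z0):
    """Capacitance that gives sqrt(L/C) = z0 for a fixed inductance."""
    return l_h / z0 ** 2

# Assumed example: 1 nH section matched to a 50 ohm line.
L_SECTION = 1e-9
C_SECTION = capacitance_for(L_SECTION, 50.0)

omega_limit = max_omega(L_SECTION, C_SECTION)
f_limit_hz = omega_limit / (2.0 * math.pi)
normalized_limit = omega_limit * math.sqrt(L_SECTION * C_SECTION)
```

With these assumed values the capacitance comes out at 0.4pF, and `normalized_limit` is the coefficient that appears in front of $1/\sqrt{LC}$ in the text.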

Advanced packages, such as flip-chip BGAs with over 2,000 pins and I/Os in the GHz range, are emerging. Chip I/O planning therefore plays an increasingly important role in optimizing the placement of I/Os at the chip level to reduce noise such as SSN and crosstalk.

The effective inductance of a power or signal pin [10] is given by the following equations

$$L_{eq} = L_{p} \pm M \qquad (6)$$

$$\frac{1}{L_{total}} = \frac{1}{L_{eq1}} + \frac{1}{L_{eq2}} + \dots + \frac{1}{L_{eqN}} \qquad (7)$$

where $L_{eq}$ is the effective inductance of one pin, $L_p$ and M are the self and mutual inductance, respectively, and $L_{total}$ is the total effective inductance of N pins.

From Eqs. (6) and (7), we can see that the number and locations of the pins should be optimized to obtain minimum self-inductance and maximum mutual inductance between pins carrying complementary currents. Mutual inductance between two pins carrying the same function on the same side of a package increases the effective inductance, whereas mutual inductance between two pins carrying currents for complementary functions decreases it. Self- and mutual inductances increase with the length of the interconnects. Therefore, shorter paths are preferred for power and signal pins. All power pins are placed close to ground pins in the optimized design. Increasing the mutual inductance between the pins of a differential pair also decreases the effective inductance and hence increases the interconnection bandwidth.
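Eqs. (6) and (7) can be exercised with a short sketch. The self and mutual inductances used here are the bondwire values reported in Section 3.2 of this paper; the choice of four parallel pins in the last step is an assumption made only to illustrate Eq. (7), and the function names are hypothetical.

```python
def effective_inductance(l_self, mutual, complementary):
    """Eq. (6): L_eq = L_p - M for complementary currents, L_p + M otherwise."""
    return l_self - mutual if complementary else l_self + mutual

def total_inductance(pin_inductances):
    """Eq. (7): parallel combination of the effective inductances of N pins."""
    return 1.0 / sum(1.0 / value for value in pin_inductances)

# Bondwire values from Section 3.2, in nH.
l_eff_no_plane = effective_inductance(2.59, 1.33, complementary=True)
l_eff_with_plane = effective_inductance(2.079, 1.27, complementary=True)

# The same pair carrying currents in the same direction would see a larger inductance.
l_eff_same_direction = effective_inductance(2.59, 1.33, complementary=False)

# Assumed illustration of Eq. (7): four identical pins in parallel.
l_total_four_pins = total_inductance([l_eff_no_plane] * 4)
```

The first two results correspond to the effective inductances quoted in Section 3.2 for the cases without and with a ground plane.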

## 3. Co-simulation and co-optimization

#### **3.1 Digital compensation analysis**

Different worst-case corners are used to analyze the variation of the output termination resistance caused by PVT variation. The corners are shown in Table 2. The temperatures are 0°C (min), 25°C (typ), and 85°C (max), and the supply voltages are 3V (min), 3.3V (typ), and 3.6V (max). In the table, TM means the typical condition, WP means fast NMOS and fast PMOS, WS means slow NMOS and slow PMOS, WO means fast NMOS and slow PMOS, and WZ means slow NMOS and fast PMOS.

Table 2 Worst-case corners

| Corner | MOS | RES | CAP | Temp | Supply |
|--------|-----|-----|-----|------|--------|
| 1      | TM  | TM  | TM  | TYP  | TYP    |
| 2      | WP  | WP  | WP  | MIN  | MAX    |
| 3      | WS  | WS  | WS  | MIN  | MIN    |
| 4      | WS  | WS  | WS  | MAX  | MIN    |
| 5      | WO  | WP  | WP  | MIN  | MAX    |
| 6      | WO  | WS  | WS  | MAX  | MIN    |
| 7      | WZ  | WP  | WP  | MIN  | MAX    |
| 8      | WZ  | WS  | WS  | MAX  | MIN    |



Fig.10 Corner analysis of PMOS resistor

If only one PMOS transistor is used as the termination resistor, its size is 194u/0.3u and its resistance is 50 $\Omega$ under typical conditions. Fig.10 shows a maximum variation of 37% across the PVT corners: the termination resistance changes from 37.6 $\Omega$ (corner 2) to 68.46 $\Omega$ (corner 4). In our digital compensation circuit, $W_p$ and $W_{p0}$ are 127u/0.3u and 5.538u/0.3u, respectively, so the total transistor size is 322u. The output control signal depends on the PVT variation. For example, the control signal C[0..3] is 0000 for corner 4 and 1111 for corner 2, while the resistance is kept at 50.3 $\Omega$. If the nonlinearity of the PMOS transistor is not considered, the variation is within 5% under worst-case PVT variation.

#### 3.2 I/O assignment

Because differential signaling is used in our design, the pins of each differential pair are placed close together to decrease the effective inductance and hence improve the signal bandwidth. In our design, the pin assignment is G P S S G P S S G P S S G P. Power and ground pins surround each signal pair. They act as return paths and shields, which minimizes the effective inductance and improves signal integrity. The bondwire is a main cause of crosstalk and SSN, and it limits the signal bandwidth because it forms a low-pass filter with the pad. Using a power or ground plane can decrease the effective inductance of the bondwire. In our design, we assume that the bondwire diameter is 25µm, the die pad pitch is 75µm, the bond pad pitch is 160µm, the length is 3mm, and the height is 200µm. If there is no ground plane under the bondwire, the self and mutual inductances are 2.59nH and 1.33nH, respectively. Hence the coupling factor is 0.5136 and L<sub>eff</sub> is 1.26nH. If a ground plane is used, the self and mutual inductances are 2.079nH and 1.27nH, respectively. Hence the coupling factor is 0.6098 and L<sub>eff</sub> is 0.809nH.

#### 3.3 Impedance controlled interconnect design

The parameters of the differential microstrip (copper) are: width=250µm, thickness=20µm, distance=500µm, length=240cm. The dielectric thickness, permittivity ($\varepsilon_r$), and loss tangent (tan$\delta$) are 500$\mu$m, 4.5, and 0.013, respectively. The odd mode characteristic impedance of the differential microstrip is $100\Omega$. Three differential microstrips exist in our system, as shown in Fig.3. The parasitic capacitor C1 includes the output buffer, bond pad, and ESD capacitances, and C2 is the pad capacitance. The bondwire inductance L is 3.55nH and the coupling factor is 0.6, so C1 and C2 are set to 300fF for impedance matching. The other parameters are: through via (C1=C2=300fF, L=1.42nH) and ball (C1=C2=300fF, L=1.42nH). The transfer function of the signal channel is shown in Fig.11. From the $S_{21}$ curve, the attenuation is about -12.3dB at 2.5GHz. The crosstalk can be reduced by 10dB at 2.5GHz by placing ground and power bondwires adjacent to the signal pins.
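To put the quoted decibel figures in linear terms, the following sketch converts them to voltage ratios. It assumes the usual convention that $S_{21}$ and crosstalk in dB are 20 times the base-10 logarithm of a voltage ratio; the function name is hypothetical.

```python
def db_to_voltage_ratio(db):
    """Convert a level in dB to a linear voltage ratio (20*log10 convention)."""
    return 10.0 ** (db / 20.0)

# Channel attenuation at 2.5GHz quoted from the S21 curve.
channel_ratio = db_to_voltage_ratio(-12.3)

# A 10dB crosstalk reduction expressed as the remaining fraction of crosstalk voltage.
crosstalk_ratio = db_to_voltage_ratio(-10.0)
```

Under this convention, roughly a quarter of the transmitted amplitude at 2.5GHz reaches the receiver, which is consistent with the need for transmitter equalization discussed in Section 2.2.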

#### 3.4 Design example analysis

For a robust design, the worst-case noise sources should also be considered; they are listed in Table 3. The total independent noise is about 40mV.

Table 3. Independent noise sources

| Symbol | Noise source                   | Value                 |
|--------|--------------------------------|-----------------------|
| Rx_O   | Receiver input offset          | 0.01                  |
| Rx_s   | Receiver sensitivity           | 0.01                  |
| PS     | Unrelated power supply noise   | 5% of V<sub>DD</sub>  |
| Atnps  | Power supply noise attenuation | 0.05                  |
| Tx_O   | Transmitter offset             | 0.01                  |

Worst-case V<sub>NI</sub> = Rx_O + Rx_s + Atnps·PS + Tx_O
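The worst-case sum in Table 3 can be reproduced with a few lines. This sketch assumes that the table entries are in volts and that V<sub>DD</sub> is the typical 3.3V supply from Section 3.1; neither assumption is stated explicitly in the table, and the function name is hypothetical.

```python
def worst_case_independent_noise(rx_offset, rx_sensitivity, ps_noise,
                                 ps_attenuation, tx_offset):
    """V_NI = Rx_O + Rx_s + Atnps * PS + Tx_O (all voltages in volts)."""
    return rx_offset + rx_sensitivity + ps_attenuation * ps_noise + tx_offset

VDD = 3.3  # assumed typical supply voltage, in volts

v_ni = worst_case_independent_noise(
    rx_offset=0.01,
    rx_sensitivity=0.01,
    ps_noise=0.05 * VDD,      # unrelated supply noise: 5% of VDD
    ps_attenuation=0.05,
    tx_offset=0.01,
)
print(f"worst-case independent noise = {v_ni * 1e3:.2f} mV")
```

Under these assumptions the sum is a little below 40mV, in line with the figure quoted in the text.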



Fig.12 shows the eye diagram at 5Gbps at the receiver end. Independent noise was not considered in this simulation. A $2^7$-1 pseudorandom bit sequence (PRBS) is used as the input data stream. Without transmitter equalization, the eye is closed at the receiver end, so the unequalized transmitter is not suitable for this bandwidth-limited channel, as shown in Fig.12 (a). When one-tap transmitter equalization is used (tap coefficient -0.3), the eye diagram is as shown in Fig.12 (b). It has a maximum eye height of 90mV and an eye width of 226ps.
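A $2^7$-1 PRBS of the kind used as the stimulus can be generated with a 7-bit linear feedback shift register. The paper does not state which generator polynomial was used; the sketch below assumes the common choice $x^7 + x^6 + 1$, and the function name and seed are hypothetical.

```python
def prbs7(seed=0x7F, nbits=127):
    """Generate nbits of a 2^7-1 PRBS using the assumed polynomial x^7 + x^6 + 1."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the register locks up")
    bits = []
    for _ in range(nbits):
        # Feedback is the XOR of register stages 7 and 6 (bit indices 6 and 5).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits

sequence = prbs7(nbits=254)
first_period = sequence[:127]
second_period = sequence[127:]
ones_per_period = sum(first_period)
```

The sequence repeats every 127 bits and contains runs of up to seven identical bits, which is what stresses the channel with both low-frequency and high-frequency content.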



Fig.11 Transfer function of the signal channel



Fig.12 Eye-diagram of 5Gb/s at the receiver end

## 4. Conclusions

A chip-package co-design approach was used to enhance high-speed electronics robustness. The chip and package were co-simulated and co-optimized under package and interconnection constraints by using on-chip and off-chip impedance control, optimal pin assignment, and transmitter equalization. It is shown that the system performance can be improved through chip and package co-design.

## **References:**

[1] A. Deutsch, et al., "Bandwidth Prediction for High-Performance Interconnections," in *Proc. 50th Electronic Components and Technology conference*, May 2000, pp.256-266.

[2] B. Young, Digital Signal Integrity: Modeling and Simulation with Interconnects and Packages, Prentice Hall PTR, 2001. ISBN 0-13-028904-3.

[3] J. L. Zerbe et al., "1.6 Gb/s/pin 4-PAM Signaling and Circuits for a Multidrop Bus," *IEEE J. of Solid-State Circuits*, Vol.36, No.5, May 2001, pp.752-760.

[4] S. Afonso, et al., "Modeling and Electrical Analysis of Seamless High Off-Chip Connectivity (SHOCC) Interconnects," *IEEE Trans. Adv. Package*, Vol.22 No.3, Aug. 1999, pp.309-320.

[5] J. T. Stonick and Gu-Yeon Wei et al., "An Adaptive PAM-4 5-Gb/s Backplane Transceiver in 0.25-µm CMOS," *IEEE J. of Solid-State Circuits*, Vol.38, No.3, March 2003, pp.436-443.

[6] R. Farjad-Rad, et al., "A 0.4μm CMOS 10-Gb/s 4-PAM Pre-Emphasis Serial Link Transmitter," *IEEE J. of Solid-State Circuits*, Vol.34, No.5, May 1999, pp.580-58

[7] H. Johnson, "Multilevel Signaling," presented at the DesignCon, Feb. 2000. [Online.] Available: http://www.sigcon.com/Pubs/misc/mls.htm.

[8] O. Ziv and James H. Constable, "Interconnection Channel Capacity Under Crosstalk Noise," in *IEEE Trans. on Electromagnetic Compatibility*, Vol.41, No.4, Nov. 1999, pp.361-365.

[9] Li-Rong Zheng, Design, Analysis and Integration of Mixed-Signal Systems for Signal and Power Integrity. Ph.D. thesis, pp.54-56, Royal Institute of Technology, ISSN 1104-8697.

[10] U. A. Shrivastava and B. Lan Bui, "Inductance Calculation and Optimal Pin Assignment for the Design of Pin-Grid-Array and Chip Carrier Packages," in *IEEE Trans. on Components, Hybrids, and Manufacturing Technology*, Vol.13, No.1, March 1990, pp.147-153.

[11] T. J. Gabara and Scott C. Knauer, "Digitally Adjustable Resistors in CMOS for High-Performance Application," in *IEEE J. of Solid-State Circuits*, Vol.27, No.8, August 1992, pp.1176-1185.

[12] International Technology Roadmap for Semiconductors, 2003 edition. http://public.itrs.net

[13] Alina Deutsch, Paul W. Coteus, et al., "On-Chip Wiring Design Challenges for Gigahertz Operation," in *Proceedings of the IEEE*, Vol.89, No.4, April 2001, pp.529-555.

[14] C. H. Yen, Z. Fazarinc, and R. L. Wheeler, "Time-domain skin-effect model for transient analysis of lossy transmission line," *Proc. IEEE*, Vol.70, No.7, July 1982, pp.750-757.

[15] W. J. Dally, J. W. Poulton, Digital System Engineering, Cambridge University Press, 1998. ISBN 0-521-59292-5

