# **Design Tools for Reliability Analysis**

Zhihong Liu, Bruce W. McGaughy and James Z. Ma, Cadence Design Systems, Inc., 2655 Seely Ave., San Jose, CA 95134, USA. zliu@cadence.com

# ABSTRACT

Recent progress in EDA tools allows IC designs to be accurately verified with consequent improvements in yield and performance through reduced guard bands. This paper will present a tools perspective, including the primary effects such as HCI, NBTI and EM for which EDA tools are available, types of tools (dynamic simulation vs. static rule checking) and necessary reliability infrastructure and flows that have been working in practice. Finally, developing areas and future opportunities will be addressed.

#### **Categories and Subject Descriptors**

B.7.2 [Hardware]: IC; B.8.2 [Hardware]: Performance and Reliability; J.6 [Computer Applications]: CAD

### **General Terms**

Design, Performance, Algorithms, Reliability

#### Keywords

HCI, NBTI, EM, Reliability simulation, Design-in Reliability.

# **1. INTRODUCTION**

As CMOS technologies advance to nanometer feature sizes, various circuit reliability issues, such as HCI (Hot-Carrier Instability), NBTI (Negative-Bias Temperature Instability), and EM (ElectroMigration), become serious problems [1]. Many of these reliability issues lead to threshold voltage and drain current shifts over time, causing circuit performance degradation. Certain degradation mechanisms such as NBTI also reduce the yield during the high-temperature burn-in process, so reliability affects not only long-term circuit lifetime but also profitability directly. Furthermore, current and voltage overshoots may magnify some reliability problems. EM problems for both signal and power nets are also of increasing concern due to higher power density and current density as chips get smaller [2]. Higher temperature due to integration density worsens the EM and NBTI issues. On the other hand, IC design margins have become so small that circuit designers cannot afford to add more design guard band for these reliability constraints, nor do they want to simply trade off circuit performance or time to tape-out. Therefore, accurate reliability simulation and efficient, realistic reliability analysis at the design stage are necessary for circuit performance, reliability and yield. With the leading CMOS technologies now in the 65nm regime, the need is even more pressing.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DAC 2006, July 24-28, 2006, San Francisco, California, USA.

Copyright 2006 ACM 1-59593-381-6/06/0007 ... \$5.00.

In this paper, circuit-level design tools for simulating HCI, NBTI and EM effects will be reviewed. The simulation flows will also be discussed with some useful practices as examples.

# 2. HCI AND NBTI MODELING

A typical reliability modeling and simulation flow is shown in Fig. 1. A fresh circuit simulation is first run with the original (fresh) device models using a circuit simulator such as SPICE. Following the fresh circuit simulation, circuit activities and their impact on device lifetimes are evaluated. To determine the circuit performance after a certain operating time, new device models must be generated, according to the degradation mechanisms, for each degraded device to replace the fresh device models. A second pass of the circuit simulation is then performed. Generating the degraded models from the reliability models and circuit activities is therefore the key to accurate reliability simulation.

HCI and NBTI modeling involves several key steps including transistor accelerated stress, reliability model parameter extraction and the subsequent model verification which will be discussed in more detail in the next few sections [3].



Fig. 1 A typical reliability modeling and simulation flow.

# 2.1 Physical Reliability Effects

Various reliability effects can happen for MOSFETs under circuit operating conditions where one particular effect usually dominates over the others in certain operating regions. They include HCI, NBTI, PBTI (Positive Bias Temperature Instability) and NCS (Non-conductive stress) as depicted in Fig.2. The most important phenomena are HCI and NBTI since they are two major contributors to the device degradation in advanced technologies.

As shown in Fig.3, the HCI effect is the result of electron trapping and/or interface-state generation induced by the impact ionization of channel carriers near the drain region [4]. It strongly depends on the channel electric field and thus becomes much more severe at the shortest channel lengths and the highest drain voltages. In addition, the HCI effect can be enhanced by secondary impact ionization when a body bias is present. Substrate current serves as a good monitor for NMOS HCI, while both substrate current and gate current are good monitors for PMOS HCI. The NBTI effect, on the other hand, is driven by a high vertical electric field and accelerated by high temperatures; it can cause fixed oxide charging in the gate oxide and interface-state generation at the Si/SiO2 interface. NBTI occurs even in long-channel PMOS devices where the HCI effect is negligible. NBTI gets worse for the ultra-thin gate oxides used in nanometer technologies, especially at high temperatures. The device degradation resulting from the HCI and NBTI effects typically manifests itself as shifts in the threshold voltage and drain current.



Fig. 2 Operating regions for HCI, NBTI, PBTI and NCS.



Fig. 3 Physical pictures of the HCI and NBTI effects.

Degradation is a cumulative process. This holds not only for one particular reliability effect, but also for multiple reliability effects acting on the same device. This cumulative behavior provides the physical basis for de-embedding different reliability effects from each other in reliability modeling and simulation.

Reliability effects develop under normal operating conditions over a long period of time, e.g. years. To model these effects, as shown in Fig.4, multiple accelerated stresses at high voltages and high temperatures must be performed, and the reliability models are then extracted and validated. Simulating the reliability effects under normal operating conditions rests on the fundamental assumption that the accelerated stress results can be extrapolated back to those conditions. Therefore, it is important to choose accelerated stress conditions that cause sufficient device degradation within a reasonable length of time while still being close enough to the real operating conditions to allow systematic extrapolation. This requires a good understanding of the worst-case reliability effects and of how the degradation depends on the device characteristics. It is also important to be able to de-embed the combined reliability effects from each other, by making one particular effect more severe than the rest under certain bias and temperature conditions.

# 2.2 HCI Modeling

Device degradation is a function of bias and temperature, and experimentally its relationship to stress time appears as multiple straight lines on a log-log scale, one per stress condition. A unified variable called age can be used to transform the time axis, so that when the degradation under different stress conditions is plotted against the age, a single straight line is obtained.
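Because the degradation follows a straight line on a log-log plot, a power law of the form $\Delta D = C \cdot t^n$ can be fitted by linear regression on the logarithms. The sketch below illustrates this with synthetic data; the function name `fit_power_law`, the prefactor `C` and all numbers are illustrative assumptions, not values from this paper.

```python
import math


def fit_power_law(times, degradations):
    """Least-squares fit of log(dD) = log(C) + n*log(t); returns (C, n)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(d) for d in degradations]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # the time exponent n
    intercept = y_mean - slope * x_mean    # log of the prefactor C
    return math.exp(intercept), slope


# Synthetic stress data generated from dD = 0.02 * t**0.4 (made-up numbers).
stress_times = [10.0, 100.0, 1000.0, 10000.0]
measured = [0.02 * t ** 0.4 for t in stress_times]

C, n = fit_power_law(stress_times, measured)
print(f"C = {C:.4f}, n = {n:.3f}")
```

With real measurements, one such fit per stress condition gives the family of straight lines described above; the age transformation then collapses them onto a single line.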

A commonly used HCI lifetime model can be described in expression (1) which correlates the NMOS device degradation to the substrate current under DC stress conditions [5]. Here H, m and n are the model parameters. The parameter age, defined as the product of the degradation rate and stress time, is used to quantify the degree of device degradation. Furthermore, this lifetime model assumes that the device degradation is an accumulative process, and the device degradation rate is independent of previous stresses or the present age. This is the basis of device lifetime calculated at the circuit level under AC stress conditions.

$$\Delta D = \left(\frac{Ids}{HW} \left(\frac{Isub}{Ids}\right)^m \bullet t\right)^n \tag{1}$$
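Since the model treats degradation as cumulative and independent of previous stress, the age under a time-varying (AC) bias can be sketched as the integral of the age rate in expression (1) over the simulated waveforms. The code below is a minimal illustration under that quasi-static assumption; the function names, the trapezoidal integration and all parameter values are hypothetical placeholders rather than the method of any particular tool.

```python
def hci_age_rate(ids, isub, width, h_param, m):
    """Age accumulated per unit time, from expression (1): Ids/(H*W) * (Isub/Ids)^m."""
    if ids <= 0.0 or isub <= 0.0:
        return 0.0  # assumption: no HCI stress without channel and substrate current
    return ids / (h_param * width) * (isub / ids) ** m


def hci_age(times, ids_wave, isub_wave, width, h_param, m):
    """Accumulate age over a sampled waveform by trapezoidal integration."""
    rates = [hci_age_rate(i, s, width, h_param, m)
             for i, s in zip(ids_wave, isub_wave)]
    age = 0.0
    for k in range(1, len(times)):
        age += 0.5 * (rates[k - 1] + rates[k]) * (times[k] - times[k - 1])
    return age


def hci_degradation(age, n):
    """Expression (1): delta_D = age ** n."""
    return age ** n


# DC sanity check: a constant bias must reproduce expression (1) directly.
H_PARAM, M_EXP, N_EXP, WIDTH = 10.0, 3.0, 0.5, 1e-6  # placeholder values
T_STRESS = 1000.0
dc_age = hci_age([0.0, T_STRESS], [1e-3, 1e-3], [1e-6, 1e-6],
                 WIDTH, H_PARAM, M_EXP)
closed_form = (1e-3 / (H_PARAM * WIDTH) * (1e-6 / 1e-3) ** M_EXP * T_STRESS) ** N_EXP
print(hci_degradation(dc_age, N_EXP), closed_form)
```

For a periodic waveform, the age accumulated over one simulated period can be scaled by the ratio of the target operating time to the period, which again relies on the cumulative assumption.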



Fig. 4 HCI, NBTI, PBTI and NCS accelerated stresses.

Expression (1) requires accurate modeling of the $I_{sub}$ characteristics across all applicable device geometries, operating voltages and temperatures. The physics-based $I_{sub}$ model in equation (2) is much more capable in that regard than the $I_{sub}$ equations in the BSIM3 or BSIM4 models [3]. Special de-embedding techniques may be needed because any measured $I_{sub}$ may be contaminated by parasitic current components such as junction leakage, gate oxide leakage and gate-induced drain leakage.

$$Isub = \frac{A_i}{B_i} (Vds - Vdsat) \bullet Ids \bullet \exp\left(-\frac{B_i l_c}{Vds - Vdsat}\right) \tag{2}$$
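Equation (2) can be evaluated directly once its parameters are known. The following is a minimal sketch; the function name, the treatment of the region below saturation and the numerical values of $A_i$, $B_i$ and $l_c$ are illustrative assumptions, not extracted model parameters.

```python
import math


def substrate_current(vds, vdsat, ids, a_i, b_i, l_c):
    """Equation (2): Isub = (Ai/Bi) * (Vds - Vdsat) * Ids * exp(-Bi*lc / (Vds - Vdsat))."""
    excess = vds - vdsat
    if excess <= 0.0:
        return 0.0  # assumption: negligible impact ionization below saturation
    return (a_i / b_i) * excess * ids * math.exp(-b_i * l_c / excess)


# Placeholder parameters chosen only to exercise the formula.
A_I, B_I, L_C = 2.0e6, 1.7e6, 5.0e-6
for vds in (0.3, 1.0, 1.2):
    isub = substrate_current(vds, vdsat=0.4, ids=1e-3, a_i=A_I, b_i=B_I, l_c=L_C)
    print(f"Vds = {vds:.1f} V -> Isub = {isub:.3e} A")
```

The exponential term makes the result very sensitive to the excess drain voltage, which is why the peak voltages seen in the circuit matter so much for HCI.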

# 2.3 NBTI Modeling

The NBTI lifetime model must capture all major physical dependences, namely those on $V_{gs}$, $V_{ds}$, temperature and channel length. An NBTI degradation model was developed and enhanced based on the commonly accepted models [3]:

$$\frac{\Delta Ids}{Ids(0)} = A \bullet \exp(-\Delta H/KT) \bullet \exp(\gamma/Vg) \bullet t^n \tag{3}$$
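As a rough illustration, equation (3) can be evaluated, and inverted for the time at which a chosen degradation criterion is reached, as sketched below. The function names are hypothetical, the formula is used exactly as written above, and every parameter value is a placeholder; in particular, passing the gate voltage as a magnitude together with a negative $\gamma$ is an assumption made only so that the example shows more degradation at higher voltage.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def nbti_degradation(t, vg, temp_k, a, delta_h, gamma, n):
    """Equation (3): dIds/Ids(0) = A * exp(-dH/(K*T)) * exp(gamma/Vg) * t**n."""
    thermal = math.exp(-delta_h / (K_BOLTZMANN_EV * temp_k))
    field = math.exp(gamma / vg)
    return a * thermal * field * t ** n


def nbti_lifetime(criterion, vg, temp_k, a, delta_h, gamma, n):
    """Solve equation (3) for the time t at which the degradation equals criterion."""
    prefactor = nbti_degradation(1.0, vg, temp_k, a, delta_h, gamma, n)
    return (criterion / prefactor) ** (1.0 / n)


# Placeholder parameters, not extracted from any silicon data.
PARAMS = dict(a=5.0, delta_h=0.15, gamma=-2.0, n=0.25)
life_hot = nbti_lifetime(0.10, vg=1.2, temp_k=398.15, **PARAMS)
life_cool = nbti_lifetime(0.10, vg=1.2, temp_k=298.15, **PARAMS)
print(f"time to 10% shift: {life_hot:.3e} s at 125 C, {life_cool:.3e} s at 25 C")
```

Because of the small time exponent, modest changes in the prefactor translate into large changes in the predicted lifetime, so accurate parameter extraction matters.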

The key is to find an NBTI age formula such that the device degradation due to the NBTI effect and the NBTI age can be physically established and modeled.

Although NBTI accelerated stresses mostly occur in the off-state, the NBTI effects are always combined with the HCI effects in the on-state. Therefore, it is necessary to de-embed the HCI and NBTI effects from each other so that the HCI and NBTI reliability model parameters can be extracted independently.

Under AC operating conditions, the NBTI device degradation can show observable recovery when the NBTI stress is removed. This recovery effect must be modeled to avoid overestimating the NBTI effect. The recovery has been found to be universally independent of stress voltage, stress time and temperature [6]. Furthermore, in the extreme case, the recovery can completely reset the degraded device to its original state. NBTI models that do not include the recovery effect can lead to an overly conservative guard band. Therefore, it is critical to be able to physically measure the NBTI recovery effect, and to model and simulate NBTI by partially or completely subtracting the recovery in a realistic way.

# 3. HCI and NBTI SIMULATION

With the extracted HCI and NBTI reliability models, it is now possible to perform the HCI and NBTI reliability simulation in an improved design flow as depicted in Fig.5 [7], [8].



Fig. 5 Reliability tools in the circuit design flow including transistor level and gate level reliability simulation.

# 3.1 Transistor Level HCI and NBTI Simulation

Transistor level HCI/NBTI simulators such as RelXpert depend on circuit simulators like SPICE to calculate fresh and aged voltage waveforms. They can achieve excellent accuracy with respect to silicon measurements. Some fast SPICE simulators such as UltraSim can also account for HCI and NBTI effects [3].

# 3.2 HCI and NBTI Reliability Analysis

Traditionally, rule-based techniques have been used by process technology engineers and/or circuit designers to address the reliability concerns of advanced CMOS technologies. Unfortunately, these techniques consider only the worst-case DC or static AC operating conditions for the shortest-channel devices, which is often too conservative. Therefore, dynamic circuit-level reliability simulation is needed. To simulate the HCI and NBTI effects, as shown in Fig.1, reliability simulators are linked to a SPICE simulator, which reads in the SPICE circuit netlist and generates the fresh voltage and current waveforms for the terminals of all MOSFETs. The reliability simulator then uses these waveforms to calculate the substrate current and the corresponding age for each device. Once the device ages are known, degraded models are created for the aged circuit, which is read in again by the SPICE simulator for the second-pass circuit simulation. The final reliability simulation results include the device degradation table, the Isub table, the simulated fresh and aged waveforms, and so on.

For a simple circuit reliability study, designers can use the estimated lifetime parameters to perform reliability checks. This helps identify potential hot spots in the circuit so that the design can be improved to relieve the reliability effects. In a more advanced reliability design flow, designers can run the reliability simulators repeatedly to study how each design scheme affects circuit reliability and performance, and then quantify the reliability severity of the chosen designs. Process development engineers can also use the reliability simulators to determine the optimal screening stress and burn-in conditions. Since the burn-in process may significantly reduce the device lifetime, it needs to be considered in the circuit design phase.

| Factors                 | Degree of Approximation |
|-------------------------|-------------------------|
| (1) Voltage Waveform    | (a) Worst-case DC       |
|                         | (b) Worst-case AC       |
|                         | (c) Device-dependent AC |
| (2) Switching Activity  | (a) 100% Duty Cycle     |
|                         | (b) Clock Frequency     |
|                         | (c) Real Activity       |
| (3) Circuit Sensitivity | (a) Device Ids          |
|                         | (b) Stage Delay         |
|                         | (c) Path Delay          |

Fig. 6 Key factors for the HCI and NBTI degradation.

Fig.6 summarizes some key factors for HCI and NBTI degradation, including the circuit voltage waveforms, switching activity and circuit sensitivity. The importance of these factors is further illustrated in Fig.7, where the impact of voltage overshoot is clearly shown. Fig.8 demonstrates how the configuration and application of a 2-input NAND gate circuit can affect its reliability [9], [10]. First, the MNA transistor degrades much faster than the MNB transistor, mainly due to the secondary impact ionization effect caused by the body effect. In addition, the temperature behavior of the degradation differs considerably between the NMOS and PMOS transistors. The degradation of the PMOS transistors is significantly higher at 125 °C due to the contribution from NBTI effects.



Fig. 7 Possible physical causes for an enhanced HCI degradation due to voltage overshoot.



Fig. 8 A comparative study on a 2-input NAND gate in terms of the HCI/NBTI effects at 25 °C and 125 °C.

# 4. Gate Level Reliability Simulation and Analysis

While the transistor level models and tools discussed in the previous section provide excellent accuracy, they are not convenient or efficient enough for digital design flows. Transistor reliability effects such as HCI can be incorporated into a standard cell design flow through degraded timing ratios [11] as shown in Fig.9. A degraded timing library is built for all of the standard cells using a reliability cell characterization tool.



Fig. 9 Gate level reliability analysis in the digital design flow. Key components are the reliability characterizer & age calculator based on circuit activities and sensitivity of gates to degradation.

During timing analysis, the degraded SDF can be generated based on the degraded timing library, the delay calculator, and the stress conditions (in the form of switching frequency for HCI and logic high/low probabilities for NBTI). In the case of HCI, each of the standard cells is characterized as follows:

$$\alpha(T_{slew}, C_L, N_{sw}) = \frac{T_{aged}}{T_{fresh}} \tag{4}$$

where  $\alpha$  is the ratio of the aged gate delay ( $T_{aged}$ ) to fresh gate delay ( $T_{fresh}$ ),  $T_{slew}$  is the input slew rate,  $C_L$  is the output load of the gate, and  $N_{sw}$  is the number of switching events that occur at the input. When there are multiple inputs, the delay from one input pin to the output pin can be represented as a linear combination of the ratios of each transistor in the gate, as follows:

$$\alpha = \sum_{i=1}^{n} \alpha_i - (n-1) \tag{5}$$

where n is the total number of series connected transistors and  $\alpha_i$  is the contribution from transistor i. The transistor level models and tools introduced in the previous sections are used during the cell characterization process to generate equivalent degraded timing ratios for each gate. Fig.10 shows the fresh and degraded timing waveforms resulting from degraded timing characterization for an inverter cell under various input slew rate and output loading conditions. The gate level delay degradation macro model shows excellent agreement with the transistor level reliability simulation, as shown in Fig.11.
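Equations (4) and (5) amount to simple arithmetic once the per-transistor ratios are available. Note that equation (5) is equivalent to $\alpha - 1 = \sum_i (\alpha_i - 1)$, i.e. the fractional delay increases of the series transistors add up. The sketch below uses hypothetical function names and made-up ratios for a two-transistor stack.

```python
def combined_ratio(alphas):
    """Equation (5): alpha = sum(alpha_i) - (n - 1) for n series-connected transistors."""
    if not alphas:
        raise ValueError("at least one transistor ratio is required")
    return sum(alphas) - (len(alphas) - 1)


def aged_delay(fresh_delay, alpha):
    """Equation (4) rearranged: T_aged = alpha * T_fresh."""
    return alpha * fresh_delay


# Made-up ratios for the two series transistors of a NAND pull-down stack.
alpha_stack = combined_ratio([1.03, 1.01])
print(f"alpha = {alpha_stack:.3f}")
print(f"aged delay = {aged_delay(100e-12, alpha_stack):.3e} s")
```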

The flow for other transistor reliability phenomena can follow a similar gate-level verification flow. For example, NBTI degradation is only sensitive to the probability of a low gate input voltage and the power supply voltage. The impact that NBTI has on the timing of the cell would still be sensitive to the loading, input slew rates, and relative position in the switching path for series connected transistors, so a similar table of data as for hot carrier could be built with  $P_{low}$  or  $P_{high}$  replacing  $N_{sw}$ .
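The tables of degraded timing ratios described above can be queried during delay calculation in the same way as ordinary timing tables. The sketch below assumes one small table of $\alpha$ values per stress condition (a given $N_{sw}$ for HCI, or a given $P_{low}$ for NBTI), indexed by input slew and output load, and uses bilinear interpolation with clamping at the table edges. The table layout, the interpolation scheme and all numbers are assumptions for illustration only.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (lo, hi, w) so that x is approximated by axis[lo]*(1-w) + axis[hi]*w."""
    x = min(max(x, axis[0]), axis[-1])  # clamp to the characterized range
    hi = min(max(bisect_right(axis, x), 1), len(axis) - 1)
    lo = hi - 1
    w = (x - axis[lo]) / (axis[hi] - axis[lo])
    return lo, hi, w


def lookup_alpha(table, slews, loads, slew, load):
    """Bilinear interpolation of alpha in a table indexed as table[slew_idx][load_idx]."""
    i0, i1, u = _bracket(slews, slew)
    j0, j1, v = _bracket(loads, load)
    return ((1 - u) * (1 - v) * table[i0][j0] + (1 - u) * v * table[i0][j1]
            + u * (1 - v) * table[i1][j0] + u * v * table[i1][j1])


# Hypothetical characterization data for one cell at one stress condition.
SLEWS_NS = [0.1, 0.5]          # input slew axis
LOADS_FF = [10.0, 50.0]        # output load axis
ALPHA_TABLE = [[1.02, 1.03],   # rows follow SLEWS_NS, columns follow LOADS_FF
               [1.04, 1.06]]

alpha_mid = lookup_alpha(ALPHA_TABLE, SLEWS_NS, LOADS_FF, slew=0.3, load=30.0)
print(f"interpolated alpha = {alpha_mid:.4f}")
```

A full library would hold one such table per cell, timing arc and stress condition, with the stress axis selected by the switching frequency or logic probability of the driving net.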

As CMOS scaling continues, the effect of digital noise caused by tightly coupled wires, and increased leakage current caused by degradation will become bigger concerns. The impact of digital noise caused by coupling can be incorporated into the digital design flow by linking the gate reliability tool with a signal integrity verification tool which determines the maximum overshoots and undershoots. The accelerated degradation can be approximated as a proportionately higher power supply voltage. Accordingly, degraded timing tables can be built for multiple power supply voltages. Finally, leakage currents can increase due to the degradation, so degraded leakage current can also be characterized as part of the cell library reliability characterization.

Fig. 10 Fresh and degraded timing waveforms for characterization of an inverter standard cell.

# 5. Electromigration Simulation and Analysis

Fig. 11 Correlation of the ratio of aged to fresh gate delay between the gate level hot carrier simulations (using GLACIER) and the transistor level hot carrier simulations (using BERT) for a variety of logic gates.

Fig. 12 Color-coded overlay of current densities on a layout showing areas of the chip with higher risks of EM failure.

Electromigration is the transport of metal ions driven by the force exerted on them by the flow of electrons during current conduction. The mean time to failure of a wire due to electromigration can be modeled by Black's equation [2]:

$$MTTF = A \bullet J^{-n} \bullet \exp\left(\frac{E_a}{k \bullet T}\right) \tag{6}$$

where A is a material constant dependent on structural and geometric properties of the conductor, J is the current density, n is the current density exponent, $E_a$ is the activation energy, k is the Boltzmann constant, and T is the temperature. Independent limits are set for the average current density $J_{avg}$ (under DC or pulsed DC conditions), the rms current density $J_{rms}$ (under Joule self-heating stress) and the peak current density $J_{peak}$ (under high current pulse stress). While copper is well known for having better resistance against electromigration due to its significantly higher activation energy and higher conductivity [12], continued scaling of wires and increased power density will cause electromigration effects to worsen due to higher current densities and Joule self-heating.
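One common use of Black's equation is to relate lifetimes at two different conditions, since the constant A cancels in the ratio. The sketch below computes equation (6) and the resulting acceleration factor between a stress condition and a use condition; the function names, the exponent n = 2 and the activation energy are placeholder assumptions, not values for any particular metallization.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(a, j, n, ea_ev, temp_k):
    """Equation (6): MTTF = A * J**(-n) * exp(Ea / (k*T))."""
    return a * j ** (-n) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))


def acceleration_factor(j_stress, temp_stress_k, j_use, temp_use_k, n, ea_ev):
    """MTTF(use) / MTTF(stress); the material constant A cancels out."""
    return (black_mttf(1.0, j_use, n, ea_ev, temp_use_k)
            / black_mttf(1.0, j_stress, n, ea_ev, temp_stress_k))


# Placeholder conditions: current densities in A/cm^2, temperatures in kelvin.
af = acceleration_factor(j_stress=2.0e6, temp_stress_k=573.15,
                         j_use=5.0e5, temp_use_k=378.15,
                         n=2.0, ea_ev=0.9)
print(f"acceleration factor = {af:.3e}")
```

The strong exponential dependence on temperature is the reason Joule self-heating and higher chip temperatures are singled out in the text as aggravating factors.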

In power supply wires, the current is usually pulsed DC, while in signal net wires the current is often bi-directional leading to low average current density. However, portions of signal nets can also experience pulsed DC currents, in particular buses which are often speed sensitive and carry correspondingly higher currents. Therefore, the average, rms and peak current densities must all be controlled within their respective safe limits to ensure an EM safe design [13]. The areas of the design with highest current densities relative to their respective EM susceptibility are typically highlighted on the layout as color coded results, with red areas showing the highest risk (as shown in Fig. 12).
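The three quantities can be derived from a sampled current waveform for a wire segment, as sketched below. The sketch assumes uniformly spaced samples over one period and a rectangular wire cross-section; the function names and the limit values are hypothetical, since real limits come from the foundry's design rules.

```python
import math


def current_density_metrics(currents, width, thickness):
    """Average, rms and peak current density for uniformly sampled currents."""
    area = width * thickness
    count = len(currents)
    avg = abs(sum(currents) / count)                    # net (DC) component
    rms = math.sqrt(sum(i * i for i in currents) / count)
    peak = max(abs(i) for i in currents)
    return {"avg": avg / area, "rms": rms / area, "peak": peak / area}


def em_violations(metrics, limits):
    """Names of the metrics that exceed their respective limits."""
    return [name for name in ("avg", "rms", "peak") if metrics[name] > limits[name]]


WIDTH_CM, THICKNESS_CM = 1.0e-5, 2.0e-5                 # 0.1 um x 0.2 um wire
LIMITS = {"avg": 1.0e6, "rms": 1.0e7, "peak": 1.0e8}    # hypothetical, in A/cm^2

bidirectional = current_density_metrics([1e-3, -1e-3, 1e-3, -1e-3],
                                        WIDTH_CM, THICKNESS_CM)
pulsed_dc = current_density_metrics([1e-3, 0.0, 1e-3, 0.0],
                                    WIDTH_CM, THICKNESS_CM)
print("signal net:", em_violations(bidirectional, LIMITS))
print("pulsed DC :", em_violations(pulsed_dc, LIMITS))
```

The bidirectional waveform has zero average current density but the same peak as the pulsed DC waveform, which shows why all three limits need to be checked separately.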

Static and dynamic techniques exist for determining the current densities. In static methods, the challenge is to determine a realistic worst-case value of the current densities without input stimulus to the circuit. Static methods are generally applicable only to digital circuits, where the function of each logic gate and its corresponding current densities can be determined without stimulus. Together with switching probabilities, the current drawn by each gate can be calculated approximately [14]:

$$I = \frac{C_L \bullet V_{dd}}{t} \tag{7}$$

where $C_L$ is the load capacitance, $V_{dd}$ is the power supply voltage and t is either half the rise time of the gate's output for peak current calculation or the average period between low-to-high switching events for average current calculation. EM on signal nets can also be characterized statically for digital logic gates, where resistive shielding effects due to interconnect load are important to avoid pessimistic EM predictions [13]. Finally, the flow of charge through an interconnect network can be efficiently calculated from the nodal equations for the linear system and the total charge transfer at the driving point, avoiding costly transient simulations [15].
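The two uses of equation (7) can be sketched as follows. The function names are hypothetical, and expressing the average period between low-to-high events as the clock period divided by a per-cycle rising-transition probability is an assumption made for this example.

```python
def gate_current(c_load, vdd, t):
    """Equation (7): I = C_L * Vdd / t."""
    return c_load * vdd / t


def peak_current(c_load, vdd, rise_time):
    """Peak current estimate: t is half the output rise time."""
    return gate_current(c_load, vdd, 0.5 * rise_time)


def average_current(c_load, vdd, clock_period, rise_probability):
    """Average current estimate: t is the mean period between low-to-high events."""
    mean_period = clock_period / rise_probability  # assumed activity model
    return gate_current(c_load, vdd, mean_period)


# Made-up gate: 20 fF load, 1.2 V supply, 50 ps rise time, 1 ns clock,
# output rises in 10% of the clock cycles.
i_peak = peak_current(20e-15, 1.2, 50e-12)
i_avg = average_current(20e-15, 1.2, 1e-9, 0.1)
print(f"peak = {i_peak:.3e} A, average = {i_avg:.3e} A")
```

Dividing these currents by the wire cross-section gives the peak and average current densities that are compared against the EM limits.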



Fig. 13 Electromigration analysis in the design flow at the floorplanning, block design, full-chip design, and sign-off stages.

Dynamic methods are useful for obtaining a more accurate estimate and for circuits that are not amenable to static analysis, such as memory, analog and RF circuits. The main challenges in dynamic methods are determining the input stimulus that triggers the worst-case current densities and meeting the high demands that unreduced post-layout netlists place on transient simulation engines.

With interconnect playing a central role throughout the design flow for advanced processes, EM-aware design and analysis must also be included at each stage. Furthermore, to deal with the complexity of designing very large, highly integrated chips, hierarchical design is becoming mainstream. Consequently, the EM analysis itself should also be carried out hierarchically, as shown in Fig. 13. In hierarchical EM analysis, the current densities for each block are carried to the top-level EM analysis at both the floorplanning and detailed chip implementation stages. To complement EM analysis, EM-aware physical design tools can be used to avoid violations during place and route, and to fix violations during chip finishing by techniques such as current decompaction [16].

# 6. SUMMARY

The common design tools for HCI, NBTI and EM analysis have been reviewed, along with how these tools fit into common design flows. Smaller geometries, new materials, and higher power densities will result in tighter reliability margins. To achieve optimal yield and performance and ensure adequate product lifetimes, it will be critical to have more detailed analysis and control of reliability degradation phenomena throughout the design flow.

# 7. REFERENCES

- [1] G. L. Rosa, et al., "New phenomena in device reliability physics of advanced CMOS submicron technologies," IRPS Tutorial, 2001.
- [2] J. R. Black, "Electromigration failure models in aluminum metallization for semiconductor devices," Proceedings of the IEEE, vol. 57(9), p.1587, 1969.
- [3] Users' Manuals for BSIMPro<sup>+</sup>/RelXpert/UltraSim tools, www.cadence.com.
- [4] C. Hu, et al., "Hot-electron induced MOSFET degradation – model, monitor, improvement," IEEE Trans. Electron Devices, vol. ED-32, p.375, 1985.
- [5] C. Hu, IEEE J. of Solid-State Circuits, p.241, 1992.
- [6] S. Rangan, et al., "Universal recovery behavior of negative bias temperature instability," IEDM Tech. Dig., p.341, 2003.
- [7] B. W. McGaughy, et al., "Meeting the challenges of hot carrier modeling and simulation for the VDSM era," TCAD, 1999.
- [8] Zhihong Liu, et al., "Build-in reliability analysis for circuit design in the nanometer technology era," ICICDT, 2004.
- [9] N. K. Jha, et al., "NBTI degradation and its impact for analog circuit reliability," IEEE Trans. Electron Devices, vol.52, p.2609, 2005.
- [10] K. R. Mistry, et al., "Circuit design guidelines for n-channel MOSFET hot carrier robustness," IEEE Trans. Electron Devices, vol.40, p.1284, 1993.
- [11] H. Yonezawa, et al, "Ratio based hot-carrier degradation modeling for aged timing simulation of millions of transistors digital circuits" IEDM Tech. Dig., p.93, 1998.
- [12] C. S. Hau-Riege, "An introduction to Cu electromigration", Microelectronics Reliability, Volume 44, Number 2, February 2004, pp. 195-205(11).
- [13] N.S. Nagaraj, et al, "A practical approach to static signal electromigration analysis", DAC, p.572, 1998.
- [14] S. Steele, et al, "Full chip verification methods for DSM power distribution systems", DAC, p.744, 1998.
- [15] D. Blaauw, et al, "Static electromigration analysis for on-chip signal interconnects", IEEE Trans. on CAD, vol. 22(1), p.39, Jan, 2003.
- [16] J. Lienig, et al., "Electromigration-aware physical design of integrated circuits", IEEE International Conf. on VLSI Design, 2005.