### Minimizing Power across Multiple Technology and Design Levels

Takayasu Sakurai

Center for Collaborative Research, and Institute of Industrial Science, University of Tokyo

E-mail: tsakurai@iis.u-tokyo.ac.jp

#### Abstract

Approaches to achieving low-power, high-speed VLSI's are described, with emphasis on techniques that span multiple technology and design levels. To suppress leakage current in standby mode, Boosted Gate MOS (BGMOS), which is based on cooperation between the technology level and the circuit level, is effective. To reduce power in active mode, $V_{DD}$ hopping and $V_{TH}$ hopping, which are cooperative approaches between circuit and software, are promising. Power consumed in the interconnect system can be reduced by a cooperative approach between application and layout, such as bus shuffling. Other low-power design approaches are also discussed.

### 1. Introduction

Power consumption of VLSI's is ever increasing, and various effective techniques to mitigate the power problem have been proposed at the system, algorithm, software, CAD, circuit, technology and assembly levels. A new trend in low-power design, however, is to exploit cooperation across multiple technology and design levels. New EDA tools are required to support this paradigm. Some of the cooperative schemes are tabulated in Fig.1.

### 2. Cooperation between technology and circuit: BGMOS

To mitigate the leakage problem in standby mode, it is effective to insert a non-leaking power switch in series with a leaky but high-speed logic block made of low-$V_{TH}$ MOSFET's (BGMOS in Fig.2, [1]). The basic idea is the same as that of MTCMOS [19], but MTCMOS becomes slow when $V_{DD}$ falls below 1 V and stops operating when $V_{DD}$ falls below 0.5 V. In BGMOS, the non-leaking power switch is a high-$V_{TH}$ MOSFET (0.6 V, for example), but its gate is driven to a voltage higher than $V_{DD}$ to ensure high drivability. The gate oxide of the power switch must be thicker than that of normal transistors to withstand the higher gate voltage.

To realize the scheme, the technology side provides a thicker-oxide transistor, while the design side exploits the different transistor types; the scheme can therefore be called cooperation between the technology level and the circuit level. Design tools that can handle various types of transistors for low power are needed. MOSFET's tuned for the higher voltage are also helpful in SRAM, I/O and analog designs, as shown in Fig.2.
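The drivability argument can be illustrated numerically. The toy model below assumes an alpha-power-law on-current with $\alpha = 1.3$ and a subthreshold swing of 100 mV/decade; the 0.6 V switch threshold comes from the text, while the 0.2 V logic threshold and the 0.5 V gate boost are hypothetical values. It is a qualitative sketch, not the design procedure of [1].

```python
# Toy comparison of a high-V_TH power switch with and without a boosted gate.
# Assumptions (not taken from the paper): alpha-power-law on-current with
# alpha = 1.3, subthreshold swing of 100 mV/decade, all currents normalized.

ALPHA = 1.3      # velocity-saturation index (assumed)
SWING_V = 0.1    # subthreshold swing in volts per decade (assumed)


def on_current(v_gs: float, v_th: float, alpha: float = ALPHA) -> float:
    """Normalized saturation current (V_GS - V_TH)^alpha; zero when the switch is off."""
    overdrive = v_gs - v_th
    return overdrive ** alpha if overdrive > 0 else 0.0


def off_current(v_th: float, swing: float = SWING_V) -> float:
    """Normalized subthreshold leakage at V_GS = 0: one decade less per `swing` volts of V_TH."""
    return 10.0 ** (-v_th / swing)


if __name__ == "__main__":
    V_TH_SWITCH = 0.6   # high-V_TH power switch (example value from the text)
    V_TH_LOGIC = 0.2    # low-V_TH logic transistor (hypothetical)
    BOOST_V = 0.5       # gate drive above V_DD (hypothetical)

    ratio = off_current(V_TH_SWITCH) / off_current(V_TH_LOGIC)
    print(f"switch leakage relative to a low-V_TH device: {ratio:.1e}")

    for v_dd in (1.0, 0.7, 0.5):
        gate_at_vdd = on_current(v_dd, V_TH_SWITCH)             # MTCMOS-like: gate at V_DD
        gate_boosted = on_current(v_dd + BOOST_V, V_TH_SWITCH)  # BGMOS-like: gate above V_DD
        print(f"V_DD = {v_dd:.1f} V: gate at V_DD -> {gate_at_vdd:.3f}, "
              f"boosted -> {gate_boosted:.3f}")
```

In this toy model the un-boosted switch loses its overdrive entirely once $V_{DD}$ reaches the switch's own $V_{TH}$, whereas the boosted gate keeps the switch conducting, which mirrors the qualitative MTCMOS/BGMOS contrast described above.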

### 3. Cooperation between circuit and software: $V_{DD}$ hopping and $V_{TH}$ hopping

In active mode, changing $V_{DD}$ and $V_{TH}$ over time according to the required performance is effective for power reduction. If $V_{DD}$ is lowered or $V_{TH}$ is raised, the power decreases but the speed degrades, as shown in Fig.3. The difficulty is finding when the speed can be lowered. Only software knows when processor performance can be decreased without sacrificing system performance, while hardware provides the means to change $V_{DD}$ and/or $V_{TH}$. The scheme has been shown to be effective even for real-time multimedia applications.
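The trend of Fig.3 can be reproduced qualitatively with textbook models. The sketch assumes alpha-power-law gate delay ($\alpha = 1.3$), dynamic power $C V_{DD}^2 f$ with the clock set to the inverse of the delay, and exponential subthreshold leakage with a 100 mV/decade swing; all constants are normalized and hypothetical, so only the directions of change are meaningful.

```python
# Qualitative power/delay trade-off versus V_DD and V_TH.
# Assumed models (not from the paper): alpha-power-law delay, dynamic power
# C * V_DD^2 * f with the clock running at 1/delay, subthreshold leakage with
# a 100 mV/decade swing. C and all prefactors are normalized to 1.

ALPHA = 1.3
SWING_V = 0.1
LEAK_SCALE = 1.0   # hypothetical weight of leakage relative to switching power


def gate_delay(v_dd: float, v_th: float) -> float:
    """Normalized delay C * V_DD / I_on with I_on proportional to (V_DD - V_TH)^alpha."""
    if v_dd <= v_th:
        raise ValueError("V_DD must exceed V_TH")
    return v_dd / (v_dd - v_th) ** ALPHA


def total_power(v_dd: float, v_th: float) -> float:
    """Dynamic power at the highest clock the delay allows, plus subthreshold leakage."""
    freq = 1.0 / gate_delay(v_dd, v_th)
    dynamic = v_dd ** 2 * freq
    leakage = LEAK_SCALE * v_dd * 10.0 ** (-v_th / SWING_V)
    return dynamic + leakage


if __name__ == "__main__":
    for v_dd, v_th in [(1.0, 0.3), (0.8, 0.3), (1.0, 0.4)]:
        print(f"V_DD={v_dd} V_TH={v_th}: delay={gate_delay(v_dd, v_th):.2f} "
              f"power={total_power(v_dd, v_th):.3f}")
```

Lowering $V_{DD}$ and raising $V_{TH}$ both lengthen the delay and reduce power in this model, which is the trade-off the software has to manage.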

In $V_{DD}$ hopping, $V_{DD}$ is changed according to the software's decision [14, 6] (see Fig.4). The scheme has been applied to an MPEG4 codec system, and the measured processor power was reduced to one fourth of that of a conventional fixed-$V_{DD}$ processor. The video codec system guarantees real-time operation for any input data, but the highest performance is needed only 6% of the time.

The algorithm that adaptively changes $V_{DD}$ according to the workload is important. Since the workload depends strongly on the input data, the control should be dynamic at run time rather than static at compile time. By chopping a real-time task into slices and monitoring the current time against the deadline for each slice, $V_{DD}$ can be controlled successfully to reduce power; this forms a software feedback loop.
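A minimal sketch of the slice-and-check loop follows. It assumes each slice has a known worst-case cycle count, that energy per cycle scales with $V_{DD}^2$, and that voltage transitions are instantaneous; the two (frequency, $V_{DD}$) levels and the workload are hypothetical. It illustrates run-time feedback in general rather than reproducing the algorithm of [14].

```python
# Sketch of run-time V_DD hopping driven by a software feedback loop.
# Hypothetical setup: the task is cut into slices with known worst-case cycle
# counts; before each slice the controller compares the time left until the
# deadline with the remaining worst-case work and picks the slowest safe level.
# The voltage-transition time is ignored here for simplicity (assumption).
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Level:
    freq_hz: float   # clock frequency at this supply level
    v_dd: float      # supply voltage; energy per cycle taken as proportional to V_DD^2


def pick_level(levels: Sequence[Level], remaining_wc_cycles: float, time_left_s: float) -> Level:
    """Slowest level that finishes the remaining worst-case work before the deadline."""
    for level in sorted(levels, key=lambda lv: lv.freq_hz):
        if remaining_wc_cycles / level.freq_hz <= time_left_s:
            return level
    return max(levels, key=lambda lv: lv.freq_hz)   # nothing is safe: run flat out


def run_task(levels: Sequence[Level], wc_cycles: Sequence[float],
             actual_cycles: Sequence[float], deadline_s: float) -> Tuple[float, float, List[float]]:
    """Simulate one task; returns (finish time, normalized energy, frequency chosen per slice)."""
    now, energy, chosen = 0.0, 0.0, []
    remaining_wc = sum(wc_cycles)
    for wc, actual in zip(wc_cycles, actual_cycles):
        # Feedback step: compare the current time with the deadline before each slice.
        level = pick_level(levels, remaining_wc, deadline_s - now)
        now += actual / level.freq_hz          # slices often finish early, leaving slack
        energy += actual * level.v_dd ** 2
        remaining_wc -= wc
        chosen.append(level.freq_hz)
    return now, energy, chosen


if __name__ == "__main__":
    levels = [Level(50e6, 0.9), Level(100e6, 1.3)]   # hypothetical two-level processor
    finish, energy, chosen = run_task(levels, [1e6] * 10, [0.5e6] * 10, deadline_s=0.15)
    print(f"finished at {finish * 1e3:.1f} ms, energy {energy:.3g}, levels {chosen}")
```

Because each level is chosen so that the remaining worst-case work still fits in the remaining time, the deadline holds even if every slice hits its worst case, provided the task is feasible at full speed; when slices finish early, the accumulated slack lets later slices drop to the low level.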

Note that the $V_{DD}$ hopping algorithm works well for every multimedia application we tried, including MPEG2 and VSELP, even though switching between voltage levels takes 0.2 ms, which is extraordinarily long compared with the processor clock period. This long transition time is due to charging and discharging the huge capacitance on the $V_{DD}$ nodes on the board and in the LSI. In a multimedia application, however, the real-time requirement exists for human users, and humans are slow; this is why $V_{DD}$ hopping works well despite the long transition time between voltage levels. Another point of interest is that as few as two voltage levels suffice, as shown in Fig.5.
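Both observations can be checked with simple arithmetic. The sketch assumes energy per cycle proportional to $V_{DD}^2$, hypothetical (frequency, $V_{DD}$) levels, and a 30 frame/s real-time period; only the 0.2 ms transition time comes from the text. Splitting each period between the two levels reaches any average speed between them, and the transition overhead is small relative to a human-scale period.

```python
# Why two voltage levels can be enough: any average speed between the two
# levels is reached by splitting each period between them.
# Hypothetical numbers: the (f, V_DD) levels are made up, energy per cycle is
# taken as proportional to V_DD^2, and a 30 frame/s period is assumed.

TRANSITION_S = 0.2e-3      # voltage transition time quoted in the text
FRAME_S = 1.0 / 30.0       # assumed real-time period (30 frames per second)


def two_level_split(cycles: float, period_s: float, f_low: float, f_high: float):
    """Time at each level so that exactly `cycles` complete within `period_s`."""
    if not f_low * period_s <= cycles <= f_high * period_s:
        raise ValueError("workload cannot be met by mixing these two levels")
    t_high = (cycles - f_low * period_s) / (f_high - f_low)
    return t_high, period_s - t_high


def split_energy(cycles, period_s, low, high):
    """Normalized energy when the period is split between (f_low, V_low) and (f_high, V_high)."""
    (f_low, v_low), (f_high, v_high) = low, high
    t_high, t_low = two_level_split(cycles, period_s, f_low, f_high)
    return f_high * t_high * v_high ** 2 + f_low * t_low * v_low ** 2


if __name__ == "__main__":
    low, high = (50e6, 0.9), (100e6, 1.3)   # hypothetical levels
    cycles = 0.6 * high[0] * FRAME_S         # a frame needing 60% of peak speed
    print("energy, two levels:", split_energy(cycles, FRAME_S, low, high))
    print("energy, fixed high:", cycles * high[1] ** 2)
    # Two transitions per frame (up and down) relative to the frame period.
    print(f"transition overhead: {2 * TRANSITION_S / FRAME_S:.1%}")
```

With these assumed numbers, two 0.2 ms transitions per frame take about 1.2% of the frame period, so the long transition time barely matters at human time scales.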

The $V_{DD}$ hopping scheme can also be applied to a multitasking real-time operating system [20] (see Fig.6). Since the OS has higher-level information on the time slots available for assignment to each application, higher efficiency can be achieved than in the application-only case, as shown in Fig.16. In one example we tried, a modified, power-conscious µ-ITRON OS ran FFT and MPEG4 concurrently; the observed power reduction was 75%, while the saving for FFT alone was only 50%.
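As a hedged illustration of what the OS's global view makes possible, the sketch below uses a textbook technique rather than the CVS algorithm of [20]: for periodic tasks whose deadlines equal their periods, preemptive EDF scheduling is feasible at clock $f$ exactly when $\sum_i W_i / T_i \le f$, where $W_i$ is the worst-case cycle count and $T_i$ the period. An OS that knows all running tasks can therefore pick one clock level for the whole task set. The task parameters and clock levels are hypothetical.

```python
# Textbook EDF-based frequency selection (not necessarily the method of [20]).
# Periodic tasks with deadline = period, preemptive EDF on one processor:
# schedulable at clock f iff sum(cycles_i / period_i) <= f.
from typing import Sequence, Tuple


def required_frequency(tasks: Sequence[Tuple[float, float]]) -> float:
    """Total cycles per second demanded by (worst_case_cycles, period_s) tasks."""
    return sum(cycles / period for cycles, period in tasks)


def os_pick_level(tasks: Sequence[Tuple[float, float]], levels_hz: Sequence[float]) -> float:
    """Slowest available clock at which the whole task set remains EDF-schedulable."""
    demand = required_frequency(tasks)
    for f in sorted(levels_hz):
        if demand <= f:
            return f
    raise ValueError("task set infeasible even at the fastest level")


if __name__ == "__main__":
    tasks = [(2e6, 0.010), (1e6, 0.020)]   # hypothetical (cycles, period) pairs
    levels = [100e6, 200e6, 300e6, 400e6]  # hypothetical clock levels
    print(f"demand {required_frequency(tasks) / 1e6:.0f} MHz -> "
          f"run at {os_pick_level(tasks, levels) / 1e6:.0f} MHz")
```

An application that sees only its own deadline cannot account for the other tasks' demand and slack in this way, which is one way to see why OS-level control can do better.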

When subthreshold leakage becomes dominant in the future, as shown in Fig.7, the same software control mechanism can be used in a $V_{TH}$ hopping scheme, where $V_{TH}$ is changed over time according to the required performance [5] (see Figs.8-9). About 80% power reduction is possible for a real-time multimedia application.
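A back-of-the-envelope estimate shows why raising $V_{TH}$ in the slow phases pays off once leakage dominates. It assumes the usual exponential subthreshold dependence with swing $S$, and lets $r$ be the fraction of time spent in the fast, low-$V_{TH}$ state; $S = 100$ mV/decade and $\Delta V_{TH} = 0.2$ V are illustrative assumptions, not figures from the paper:

$$
I_{\text{off}} \propto 10^{-V_{TH}/S}
\;\Rightarrow\;
\frac{I_{\text{off}}(V_{TH,\text{high}})}{I_{\text{off}}(V_{TH,\text{low}})} = 10^{-\Delta V_{TH}/S} = 10^{-0.2/0.1} = \frac{1}{100},
\qquad
\bar{P}_{\text{leak}} \approx \left[\, r + (1-r)\,10^{-\Delta V_{TH}/S} \right] P_{\text{leak,low}}
$$

For example, $r = 0.5$ gives $\bar{P}_{\text{leak}} \approx 0.505\,P_{\text{leak,low}}$. This simple estimate ignores the longer execution time in the slow state and the dynamic power, so it only indicates the direction of the effect, not the measured figure of [5].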

### 4. Cooperation between application and layout: Bus shuffling

Power consumed in interconnects is another issue. In recent technologies, a significant part of the interconnect power is consumed in the coupling capacitance between adjacent wires. Bus shuffling, which reduces the power consumed by the coupling capacitance, achieves low power through cooperation between the application level and the layout level. The bus lines are simply reordered in the layout, with no encoder or decoder, yet the scheme achieves about 40% power reduction [2] (see Figs.10-11).
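The essence of bus shuffling is choosing which logical bit is routed on which physical track so that adjacent tracks rarely switch in opposite directions. The sketch below assumes a simple coupling cost of $(\Delta b_a - \Delta b_b)^2$ per adjacent track pair and transition (0 for no relative change, 1 when one line switches, 4 for opposite switching) and uses exhaustive search over permutations, which is only practical for narrow buses; [2] uses a heuristic instead. The trace is a hypothetical toy example; in practice the statistics of real application traces would be used.

```python
# Sketch of coupling-aware bus shuffling (assumed cost model, exhaustive search).
from itertools import permutations
from typing import Sequence, Tuple


def coupling_cost(trace: Sequence[int], order: Sequence[int]) -> int:
    """Coupling cost of a trace when logical bit order[k] is routed on physical track k."""
    cost = 0
    for prev, curr in zip(trace, trace[1:]):
        # Signed transition (-1, 0, +1) of each track, listed in physical order.
        delta = [((curr >> bit) & 1) - ((prev >> bit) & 1) for bit in order]
        # Adjacent tracks: 0 if quiet or switching together, 1 if one switches, 4 if opposite.
        cost += sum((a - b) ** 2 for a, b in zip(delta, delta[1:]))
    return cost


def best_shuffle(trace: Sequence[int], width: int) -> Tuple[int, ...]:
    """Bit-to-track permutation with the lowest coupling cost (exhaustive, small widths only)."""
    return min(permutations(range(width)), key=lambda order: coupling_cost(trace, order))


if __name__ == "__main__":
    trace = [0b101, 0b010] * 4     # toy trace: bit 1 always toggles against bits 0 and 2
    identity = tuple(range(3))
    best = best_shuffle(trace, 3)
    print("identity order:", coupling_cost(trace, identity))
    print("shuffled order:", best, coupling_cost(trace, best))
```

Because only the wiring order changes, the shuffled bus needs no extra circuitry, in line with the "no encoder and decoder" property described above.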

### 5. Other approaches

Another important low-power consideration concerns I/O's. 3D integration using System in Package (SiP) will be effective in reducing I/O power. Designing an SiP will require co-design of the LSI itself and the assembly structure. The voltage drop across power lines caused by the high currents expected in the low-$V_{DD}$ regime can also be mitigated by using the thicker metal layers on an interposer and area pads on the LSI (see Figs.12-14). Design tools for SiP remain to be investigated.
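Why low $V_{DD}$ makes supply-line drop harder to manage can be seen from a lumped estimate, assuming the chip draws power $P$ through an effective power-network resistance $R$ (a simplification of a distributed grid):

$$
I = \frac{P}{V_{DD}}, \qquad
\Delta V = I R = \frac{P R}{V_{DD}}, \qquad
\frac{\Delta V}{V_{DD}} = \frac{P R}{V_{DD}^{2}}
$$

At constant power, halving $V_{DD}$ therefore quadruples the relative drop unless $R$ is reduced by the same factor of four, which is the kind of reduction thicker interposer metal and area pads help to provide.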

In the future, as devices are scaled further, the power consumption of LSI's will tend to increase because of growing leakage, including subthreshold, gate-tunneling, and junction leakage. One important way to mitigate the active leakage problem is to adopt memory-rich architectures [7] (see Fig.15).

Some of the approaches for low-power LSI's in the active leakage dominant regime are summarized in Fig.16.

#### References

1. T.Inukai, M.Takamiya, K.Nose, H.Kawaguchi, T.Hiramoto, and T.Sakurai, "Boosted Gate MOS (BGMOS): Device/Circuit Cooperation Scheme to Achieve Leakage-Free Giga-Scale Integration," Proc. of CICC, pp.409-412, May 2000.
2. Y.Shin and T.Sakurai, "Coupling-Driven Bus Design for Low-Power Application-Specific Systems," Proc. of Design Automation Conference, pp.750-753, June 2001.
3. S.Heo and K.Asanovic, "Leakage-Biased Domino Circuits for Dynamic Fine-Grain Leakage Reduction," Symp. on VLSI Circ., pp.316-319, June 2002.
4. S.Tang, S.Hsu, Y.Ye, J.Tschanz, D.Somasekhar, S.Narendra, S.Lu, R.Krishnamurthy, and V.De, "A Leakage-Tolerant Dynamic Register File Using Leakage Bypass with Stack Forcing (LBSF) and Source Follower NMOS (SFN) Techniques," Symp. on VLSI Circ., pp.320-321, June 2002.
5. K.Nose, M.Hirabayashi, H.Kawaguchi, S.Lee, and T.Sakurai, "VTH-hopping scheme for 82% power saving in low-voltage processors," Proc. of CICC, pp.93-96, May 2001.
6. K.Aisaka, T.Aritsuka, S.Misaka, K.Toyama, K.Uchiyama, K.Ishibashi, H.Kawaguchi, and T.Sakurai, "Design Rule for Frequency-Voltage Cooperative Power Control and Its Application to an MPEG-4 Decoder," Symp. on VLSI Circ., pp.216-219, June 2002.
7. K.Nose and T.Sakurai, "Optimization of VDD and VTH for Low-Power and High-Speed Applications," Proc. of Asia and South Pacific Design Automation Conference, pp.469-474, Jan. 2000.
8. H.Kawaguchi, Y.Itaka, and T.Sakurai, "Dynamic Leakage Cut-off Scheme for Low-Voltage SRAM's," Symp. on VLSI Circ., pp.140-141, June 1998.
9. K.Kanda, T.Miyazaki, M.K.Sik, H.Kawaguchi, and T.Sakurai, "Two Orders of Magnitude Leakage Power Reduction of Low Voltage SRAM's by Row-By-Row Dynamic VDD Control (RRDV) Scheme," 15th Annual IEEE Int. ASIC/SOC Conf., Rochester, NY, Sept. 2002 (to be published).
10. "Scaling Limit in a Power Limited Environment, Architecture versus Circuit Design," Rump Session, Symp. on VLSI Circ., p.96, June 2002.
11. T.Ishihara and K.Asada, "An Architectural Level Energy Reduction Technique for Deep-Submicron Cache Memories," Proc. of ASP-DAC, pp.282-287, Jan. 2002.
12. T.Douseki, J.Yamada, and H.Kyuragi, "Ultra Low-Power CMOS/SOI LSI Design for Future Mobile Systems," Symp. on VLSI Circ., pp.6-9, June 2002.
13. H.Kawaguchi, K.Nose, and T.Sakurai, "A Super Cut-off CMOS (SCCMOS) Scheme for 0.5-V Supply Voltage with Picoampere Stand-by Current," IEEE JSSC, vol.35, no.10, pp.1498-1501, Oct. 2000.
14. S.Lee and T.Sakurai, "Run-Time Voltage Hopping for Low-Power Real-Time Systems," Proc. of Design Automation Conference, pp.806-809, June 2000.
15. S.Geissler et al., "A Low-Power RISC Microprocessor using Dual PLLs in a 0.13-µm SOI Technology with Copper Interconnect and Low-k BEOL Dielectric," ISSCC, pp.148-149, Feb. 2002.
16. M.Horiguchi, T.Sakata, and K.Itoh, "Switched-Source-Impedance CMOS Circuit for Low Standby Subthreshold Current Giga-Scale LSI's," Symp. on VLSI Circ., pp.47-48, June 1993.
17. T.Kuroda, K.Suzuki, S.Mita, T.Fujita, F.Yamane, F.Sano, A.Chiba, Y.Watanabe, K.Masuda, T.Maeda, T.Sakurai, and T.Furuyama, "Variable Supply-Voltage Scheme for Low-Power High-Speed CMOS Digital Design," IEEE JSSC, vol.33, no.3, pp.454-462, Mar. 1998.
18. T.Kuroda, T.Fujita, S.Mita, T.Nagamatsu, S.Yoshioka, K.Suzuki, F.Sano, M.Norishima, M.Murota, M.Kako, M.Kinugawa, M.Kakumu, and T.Sakurai, "A 0.9-V, 150-MHz, 10-mW, 4-mm², 2-D Discrete Cosine Transform Core Processor with Variable Threshold-Voltage (VT) Scheme," IEEE JSSC, vol.31, no.11, pp.1770-1779, Nov. 1996.
19. S.Mutoh, T.Douseki, Y.Matsuya, T.Aoki, S.Shigematsu, and J.Yamada, "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS," IEEE JSSC, vol.30, no.8, pp.847-854, 1995.
20. Y.S.Shin, H.Kawaguchi, and T.Sakurai, "Cooperative Voltage Scaling (CVS) between OS and Applications for Low-Power Real-Time Systems," Proc. of CICC, pp.553-556, May 2001.

Fig.3 Power & Delay Dependence on $V_{DD}$ & $V_{TH}$

Fig.4 Application slicing and software feedback loop in $V_{DD}$ hopping



Fig.5 V<sub>DD</sub> hopping reduces power in multimedia applications





Fig.6 Power Conscious OS & Application Slicing



Estimated based on ITRS2001 (International Technology Roadmap for Semiconductors) parameters. $V_{TH}$ is extracted from on- and off-current. Power is at 100°C.

Fig.7 Increasing leakage power



Fig.8 Schematic of V<sub>TH</sub>-hopping



Fig.10 Bus shuffling concept. On-chip bus coding: area, delay and power overhead from extra circuits. Bus shuffling: just shuffling, with no overhead; considers coupling among lines; requires the statistical characteristics of the signals.

Fig.11 Power reduction by bus shuffling. Result of the heuristic for 7 data address sets: 40% power saving compared with unshuffled buses.



K.Ohsawa, H.Odaira, M.Ohsawa, S.Hirade, T.Iijima, and S.G.Pierce, "3-D Assembly Interposer Technology for Next-Generation Integrated Systems," ISSCC Digest of Tech. Papers, pp.272-273, Feb. 2001.

Fig.12 System in Package





Fig.14 Reduction of I/O power by superconnect



Fig.15 Demand for memory-rich architecture

|                       | Active Dynamic        | Active Leakage                | Standby     | Memory               |
|-----------------------|-----------------------|-------------------------------|-------------|----------------------|
| Multiple $V_{TH}$     |                       | Dual-$V_{TH}$ [15], LBD [3]   | MTCMOS [19] | Dual-$V_{TH}$ [12]   |
| Variable $V_{TH}$     |                       | $V_{TH}$ hopping [5]          | VTCMOS [18] | DLC [8]              |
| Multiple $V_{DD}$     | Dual-$V_{DD}$ [17]    |                               | BGMOS [1]   | Dual-$t_{ox}$ [1]    |
| Variable $V_{DD}$     | $V_{DD}$ hopping [14] | $V_{DD}$ hopping [14]         |             | RRDV [9]             |
| $V_{GS}$ reverse bias |                       | LBSF [4]                      | SCCMOS [13] | SSICMOS [16]         |
| Variable $\alpha f$   | Gated clock           |                               |             |                      |
| $\alpha C$            | Bus shuffling         |                               |             |                      |

LBSF: Leakage Bypass with Stack Forcing; LBD: Leakage-Biased Domino Circuits; MTCMOS: Multi-Threshold CMOS; VTCMOS: Variable Threshold CMOS; BGMOS: Boosted Gate MOS; SCCMOS: Super Cut-off CMOS; DLC: Dynamic Leakage Cut-off; RRDV: Row-by-Row Dynamic $V_{DD}$ Control; SSICMOS: Switched-Source-Impedance CMOS.

Fig.16 Detailed table for low-power techniques