SIGDA Super Compendium, ISLPED 2000, Abstracts

<BODY> <H3><A HREF="../../../../../main_ss.htm">Main Page</A></BR> ISLPED 2000 : [ <A HREF="lpedtoc.htm">Table of Contents</A> ] [ <A HREF="lpedsess.htm">Session Index</A> ] [ <A HREF="lpedai.htm">Author Index</A> ]</H3> <HR> <H2>ISLPED 2000 Abstracts</H2> <B> Sessions: [<A HREF="#invit1">Invited Talk 1</A>] [<A HREF="#sess1">1</A>] [<A HREF="#sess2">2</A>] [<A HREF="#sess3">3</A>] [<A HREF="#sess4">4</A>] [<A HREF="#sess5">5</A>] [<A HREF="#sess6">6</A>] [<A HREF="#evepanel">Evening Panel</A>] [<A HREF="#invit2">Invited Talk 2</A>] [<A HREF="#sess7">7</A>] [<A HREF="#sess8">8</A>] [<A HREF="#poster1">Poster Session 1</A>] [<A HREF="#poster2">Poster Session 2</A>] [<A HREF="#sess9">9</A>] [<A HREF="#sess10">10</A>] [<A HREF="#tut1">Embedded Tutorial 1</A>] [<A HREF="#tut2">Embedded Tutorial 2</A>] </B> <P> <HR> <H2>Opening and Welcome <BR> Wednesday, July 26th, 2000, 9:00 - 9:15</H2> <I>Welcome: David Blaauw (Motorola, General Co-Chair of ISLPED '00)<BR> Symposium Highlights: Enrico Macii (Politecnico di Torino, Tech. Prog. Co-Chair of ISLPED '00)</I><P> <HR> <H2> Keynote Speech<BR> Wednesday, July 26th, 2000, 9:15 - 10:00 </H2> <DL> <DT> Transmeta's Crusoe: Low-Power x86-Compatible Microprocessors Built with Software [p. xiv]<BR> <I>Speaker: David R. Ditzel (Transmeta Corp.)<BR> Session Chair: E. Macii (Politecnico di Torino)</I><P> </DL> <HR> <H2><A NAME="invit1"> Invited Talk 1:<BR> </A> Wednesday, July 26th, 2000, 10:00 - 10:45 </H2> <DL> <DT> <A HREF="../../pdffiles/invit1.pdf" NAME="invit1"><IMG SRC="../../../../../icons/pdficon.gif"></A> Low-Power Micromachined Microsystems [p. 1] <DD><I>Speaker: Khalil Najafi (University of Michigan)<BR> Session Chair: D. Blaauw (Motorola)</I><P> Micromachined microsystems and Micro Electro Mechanical Systems (MEMS) have made possible the development of highly accurate and portable sensors and instrument for a variety of applications in the health care, industrial, consumer products, avionics, and defense. 
Design of low-power circuits for these applications, and use of micromachined sensors and actuators in combination with integrated circuits to implement even lower power microinstruments has now become possible and the focus of attention. This paper reviews the state of the art in the development of micromachined microsystems and MEMS, discusses low-power design approaches for microsystems, and reviews some recent development in power generation and energy harvesting from the environment.<BR> Keywords: MEMS, Micromachining, Low-Power, Microsystems, Power Sources, Energy Harvesting<P> </DL> <HR> <H2>Session 1: <A NAME="sess1"> Power Efficient Digital Circuits and Technologies I<BR> </A> Wednesday, July 26th, 2000, 11:15 - 12:45 </H2> <I>Session Chair: B. Athas (House Ear Institute)<BR> Session Organizer: J. Burr (Sun Microsystems)</I><P> <DL> <DT> <A HREF="../../pdffiles/01_1.pdf" NAME="01_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 1.1: Design Issues for Dynamic Voltage Scaling [p. 9] <DD><I> Thomas D. Burd, Robert W. Brodersen (University of California at Berkeley) </I><P> Processors in portable electronic devices generally have a computational load which has time-varying performance requirements. Dynamic Voltage Scaling is a method to vary the processor's supply voltage so that it consumes the minimal amount of energy by operating at the minimum performance level required by the active software processes. A dynamically varying supply voltage has implications on the processor circuit design and design flow, but with some minimal constraints it is straightforward to design a processor with this capability.<BR> Keywords: Energy efficient, variable voltage, processor, circuit design.<P> <DT> <A HREF="../../pdffiles/01_2.pdf" NAME="01_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 1.2: Dual-V<SUB>T</SUB> SRAM Cells with Full-Swing Single-Ended Bit Line Sensing for High-Performance On-Chip Cache in 0.13 µm Technology Generation [p. 
15] <DD><I> Fatih Hamzaoglu, Yibin Ye, Ali Keshavarzi, Kevin Zhang, Siva Narendra, Shekhar Borkar, Vivek De (Intel), Mircea Stan (Univ. of Virginia) </I><P> Comparisons among different dual-VT design choices for a large on-chip cache with single-ended sensing show that the design using a dual-VT cell and low-VT peripheral circuits is the best, and provides 10% performance gain with 1.2x larger active leakage power, and 1.6% larger cell area compared to the best design using high-VT cells.<BR> Keywords: Dual-VT, SRAM, Single-Ended Sensing.<P> <DT> <A HREF="../../pdffiles/01_3.pdf" NAME="01_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 1.3s: An Adaptive On-Chip Voltage Regulation Technique for Low-Power Applications [p. 20] <DD><I> Nicola Dragone (ST Microelectronics), Akshay Aggarwal, L. Richard Carley (Carnegie Mellon) </I><P> In this paper we present a completely on-chip voltage regulation technique which promises to adjust the degree of voltage regulation in a digital logic chip in the face of process induced delay variations so as to minimize energy dissipation while always guaranteeing the target operating frequency. For this purpose the delay of a critical path replica of the circuit being regulated is constantly compared with the target delay to provide the regulator with the information needed to select the optimum voltage levels. The proposed solution is even more attractive in that no external components are required. Based on this scheme, a completely on-chip voltage regulator has been fabricated in a commercial 0.5 um CMOS process and used to generate the inner rail voltages for a DSP multiplier-accumulator (MAC) implemented in mixed swing QuadRail. 
Measured results indicate that the voltages generated by the regulator offer a very high degree of load regulation, thus verifying the fast response time of the on-chip output buffer.<P> <DT> <A HREF="../../pdffiles/01_4.pdf" NAME="01_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 1.4s: Robust Ultra-Low Power Sub-threshold DTMOS Logic [p. 25] <DD><I> Hendrawan Soeleman, Kaushik Roy, Bipul Paul (Purdue University) </I><P> Digital sub-threshold logic circuits have recently been proposed for applications in the ultra-low power end of the design spectrum, where the performance is of secondary importance. To improve switching performance of the sub-threshold logic family with comparable energy/switching, we propose the use of sub-DTMOS (sub-threshold Dynamic Threshold MOS) transistors. The stability of sub-threshold DTMOS logic to temperature and process variations eliminates the need for an additional stabilization scheme that may be required for regular sub-threshold MOS logic families to ensure proper operation in the sub-threshold region.<P> </DL> <HR> <H2>Session 2: <A NAME="sess2"> Algorithmic and Systems Issues<BR></A> Wednesday, July 26th, 2000, 11:15 - 12:45 </H2> <I>Session Chair: M. Martonosi (Princeton University)<BR> Session Organizer: K. Usami (Toshiba)</I><P> <DL> <DT> <A HREF="../../pdffiles/02_1.pdf" NAME="02_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 2.1: Algorithmic Transforms for Efficient Energy Scalable Computation [p. 31] <DD><I> Amit Sinha, Alice Wang, Anantha P. Chandrakasan (Massachusetts Institute of Technology) </I><P> We introduce the notion of energy scalable computation on general purpose processors. The principal idea is to maximize computational quality for a given energy constraint. The desirable energy-quality behavior of algorithms is discussed. Subsequently the energy-quality scalability of three distinct categories of commonly used signal processing algorithms (viz. 
filtering, frequency domain transforms and classification) is analyzed on the StrongARM SA-1100 processor and transformations are described which obtain significant improvements in the energy-quality scalability of the algorithm.<P> <DT> <A HREF="../../pdffiles/02_2.pdf" NAME="02_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 2.2: Operating-System Directed Power Reduction [p. 37] <DD><I>Yung-Hsiang Lu, Giovanni De Micheli (Stanford University), Luca Benini (Università di Bologna)</I><P> This paper presents a new approach for power reduction by taking a global, software-centric view. It analyzes the sources of power consumption: tasks that require services from hardware components. When a component is not used by any task, it can enter a sleeping state to save power. Operating systems have detailed information about tasks; therefore, the OS is the best place for power reduction. Our technique is effective in identifying hardware idleness and shutting down unused components. We implement this technique in Linux and show that it can save more than 50% power compared to traditional hardware-centric shutdown techniques.<P> <DT> <A HREF="../../pdffiles/02_3.pdf" NAME="02_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 2.3s: Energy Minimization with Guaranteed Quality of Service [p. 43] <DD><I>Gang Qu, Miodrag Potkonjak (University of California at Los Angeles)</I><P> Quality of service (QoS) is one of the key features for new Internet-based multimedia and other applications. Meanwhile, energy remains a big concern for systems that perform such applications. We address the issue of combining system design concerns and QoS requirements to design systems that can deliver QoS guarantees. In this paper, we discuss how to satisfy QoS requirements and minimize the system's energy consumption. 
Specifically, we consider the following problem: Given a set of applications each specifying its required amount of computation and service time, how we allocate CPU time and determine the voltage profile on a variable voltage system, such that all the applications' requirements are satisfied and the system's total energy consumption is minimized. We optimally solve several basic cases and propose a dynamic programming procedure for the general case. Simulation shows that the new approach saves 38.75% energy over the system shut-down technique.<P> <DT> <A HREF="../../pdffiles/02_4.pdf" NAME="02_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 2.4s Energy Efficient Design of Portable Wireless Systems [p. 49] <DD><I>Tajana Simunic, Haris Vikalo, Peter Glynn, Giovanni De Micheli (Stanford University)</I><P> Portable wireless systems require long battery lifetime while still delivering high performance. The major contribution of this work is combining new power management (PM) and power control (PC) algorithms to trade off performance for power consumption at the system level in portable devices. First we present the formulation for the solution of the PM policy optimization based on renewal theory. Next we present the formulation for power control (PC) of the wireless link that enables us to obtain further energy savings when the system is active. Finally, we discuss the measurements obtained for a set of PM and PC algorithms implemented for the WLAN card on a laptop. The PM policy we developed based on our renewal model consumes three times less power as compared to the default PM policy for the WLAN card with still high performance. Power control saves additional 53% in energy at same bit error rate. With both power control and power management algorithms in place, we observe on average a factor of six in power savings.<P> </DL> <HR> <H2>Session 3: <A NAME="sess3"> Mixed Signal Design <BR></A> Wednesday, July 26th, 2000, 14:30-16:00 </H2> <I>Session Chair: M. 
Pelgrom (Philips)<BR> Session Organizer: S. Mukherjee (Philips) </I><P> <DL> <DT> <A HREF="../../pdffiles/03_1.pdf" NAME="03_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 3.1 Power Consumption Reduction in High-Speed Sigma-Delta Bandpass Modulators [p. 55] <DD><I>P. Cusinato, F. Stefani (STMicroelectronics), A. Baschirotto (University of Lecce)</I><P> Power consumption is a key point in the design of high-speed switched capacitor (SC) circuits, which allow a number of analog functions to be implemented efficiently. Among them, SC Sigma-Delta modulators are very popular for A/D conversion: in this kind of circuit, operational amplifiers are the most power-consuming cells because of their requirements in terms of DC gain and unity-gain frequency. A new amplifier with 110dB DC gain and a unity-gain frequency of 250MHz is presented. The large power consumption (20mW) makes its use in commercial applications critical: however, combining this cell with a fast adaptive biasing circuit, high performance may be achieved with a reasonable dissipation. This approach has been used in the design of a 6th-order bandpass Sigma-Delta modulator featuring 73dB DR and suitable for the conversion at IF (10.7MHz) of the FM radio signal.<P> <DT> <A HREF="../../pdffiles/03_2.pdf" NAME="03_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 3.2 Low Power Mixed Analog-Digital Signal Processing [p. 61] <DD><I>Mattias Duppils, Christer Svensson (Linköpings Universitet)</I><P> The power consumption of mixed-signal systems featuring an analog front-end, a digital back-end, and signal processing tasks that can be computed with multiplications and accumulations, is analyzed. An implementation is proposed, composed of switched-capacitor mixed analog/digital multiply-accumulate units in the analog front-end, followed by an A/D converter. 
This implementation is shown to be superior in respect of power consumption compared to an equivalent implementation with a high-speed A/D converter in the front-end, to execute signal processing tasks that include decimation. The power savings are only due to the relaxed requirement on A/D conversion rate, as a direct consequence of the decimation. In a case study of a narrowband FIR filter, realized with four multiply-accumulate units and with a decimation factor of 100, the power saving is a factor of 54. Implementation details are given, and the power consumption and thermal noise are analyzed.<P> <DT> <A HREF="../../pdffiles/03_3.pdf" NAME="03_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 3.3s A Low-Power Clock and Data Recovery Circuit for 2.5 Gb/s SDH Receivers [p. 67] <DD><I>Andrea Pallotta (Siemens ICN), Francesco Centurelli, Alessandro Trifiletti (Univ. di Roma "La Sapienza")</I><P> A low power monolithic Clock and Data Recovery IC for 2.5 Gb/s SDH STM-16 systems has been designed and fabricated using Maxim GST-2 27 GHz-fT Silicon bipolar technology. The circuit performs the following functions: signal amplification and limitation, clock recovery and decision; a single 3.3 V supply voltage is required, and power consumption is below 350 mW. This IC and a previously presented transimpedance amplifier thus allow a chip set for the receiver to be composed with a total power dissipation below 0.5 W. Preliminary measurements under a 2<SUP>23</SUP>-1 PRBS data stream have shown an input sensitivity below 20 mVpp and an rms jitter of 10 ps.<BR> Keywords: Clock recovery, optical communications, SDH, low power.<P> <DT> <A HREF="../../pdffiles/03_4.pdf" NAME="03_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 3.4s A Micro-Power Mixed Signal IC for Battery-Operated Burglar Alarm Systems [p. 
73] <DD><I>Silvio Bolliri, Luigi Raffo (University of Cagliari), Paolo Porcu (ELSA SPA)</I><P> The design of the standard CMOS IC core of a commercial wireless burglar alarm system is presented as an example of a very low-power analog VLSI design for battery-operated systems. The main constraint is battery life, which must be at least five years (with a standard camera battery). The chip is composed of a digital (decision) part and an analog interface with sensors. The entire chip absorbs 10 uA. Measurements on each component and tests in the working environment show full functionality and compliance with specifications. Even though the example is application specific, the design solutions and each single element can also be utilized in many other battery-operated low-frequency devices (e.g. environmental parameter monitoring).<P> </DL> <HR> <H2>Session 4: <A NAME="sess4"> Architecture and Memory Design <BR></A> Wednesday, July 26th, 2000, 14:30 - 16:00 </H2> <I>Session Chair: B. Moyer (Motorola)<BR> Session Organizer: C. Piquet (CSEM) </I><P> <DL> <DT> <A HREF="../../pdffiles/04_1.pdf" NAME="04_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 4.1 A Recursive Algorithm for Low-Power Memory Partitioning [p. 78] <DD><I>Luca Benini (Università di Bologna), Alberto Macii, Massimo Poncino (Politecnico di Torino)</I><P> Memory-processor integration offers new opportunities for reducing the energy of a system. In the case of embedded systems, one solution consists of mapping the most frequently accessed addresses onto the on-chip SRAM to guarantee power and performance efficiency. This option is especially effective when memory access patterns can be profiled and studied at design time (as in typical real-time embedded systems). In this work, we propose an algorithm for the automatic partitioning of on-chip SRAM in multiple banks that can be independently accessed. 
Starting from the dynamic execution profile of an embedded application running on a given processor core, we synthesize a multi-banked SRAM architecture optimally fitted to the execution profile. The algorithm provides a globally optimum solution to the problem under realistic assumptions on the power cost metrics, and with constraints on the number of memory banks. Results, collected on a set of embedded applications for the ARM processor, have shown average energy savings around 42%.<P> <DT> <A HREF="../../pdffiles/04_2.pdf" NAME="04_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 4.2 Optimization of High-Performance Superscalar Architectures for Energy Efficiency [p. 84] <DD><I>V. Zyuban, P. Kogge (University of Notre-Dame)</I><P> In recent years reducing power has become a critical design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phase of high-performance microprocessor development. We propose a methodology for power-optimization at the micro-architectural level. First, major targets for power reduction are identified within superscalar microarchitecture, then an optimization of a superscalar micro-architecture is performed that generates a set of energy-efficient configurations forming a convex hull in the power-performance space. The energy-efficient families are then compared to find configurations that dissipate the lowest power given a performance target, or, conversely, deliver the highest performance given a power budget. Application of the developed methodology to a superscalar micro-architecture shows that at the architectural level there is a potential for reducing power up to 50%, given a performance requirement, and for up to 15% performance improvement, given a power budget.<P> <DT> <A HREF="../../pdffiles/04_3.pdf" NAME="04_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 4.3s Gated-V<SUB>dd</SUB>: A Circuit Technique to Reduce Leakage in Deep-Submicron Cache Memories [p. 
90] <DD><I>Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, T. N. Vijaykumar (Purdue University)</I><P> Deep-submicron CMOS designs have resulted in large leakage energy dissipation in microprocessors. While SRAM cells in on-chip cache memories always contribute to this leakage, there is a large variability in active cell usage both within and across applications. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy dissipation in instruction caches. We propose gated-V<SUB>dd</SUB>, a circuit-level technique to gate the supply voltage and reduce leakage in unused SRAM cells. Our results indicate that gated-V<SUB>dd</SUB> together with a novel resizable cache architecture reduces energy-delay by 62% with minimal impact on performance.<P> <DT> <A HREF="../../pdffiles/04_4.pdf" NAME="04_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 4.4s Voltage Scheduling in the lpARM Microprocessor System [p. 96] <DD><I>Trevor Pering, Thomas Burd, Robert Brodersen (University of California at Berkeley)</I><P> Microprocessors represent a significant portion of the energy consumed in portable electronic devices. Dynamic Voltage Scaling (DVS) allows a device to reduce energy consumption by lowering its processor speed at run-time, allowing a corresponding reduction in processor voltage and energy. A voltage scheduler determines the appropriate operating voltage by analyzing application constraints and requirements. A complete software implementation, including both applications and the underlying operating system, shows that DVS is effective at reducing the energy consumed without requiring extensive software modification.<BR> Keywords: Low-power, energy-efficient, RTOS, operating systems.<P> </DL> <HR> <H2>Session 5: <A NAME="sess5"> Digital Logic <BR></A> Wednesday, July 26th, 2000, 11:15-12:45 </H2> <I>Session Chair: V. Germini (Compaq)<BR> Session Organizer: D. 
Garrett (Lucent) </I><P> <DL> <DT> <A HREF="../../pdffiles/05_1.pdf" NAME="05_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 5.1 MOS Current Mode Logic for Low Power, Low Noise CORDIC Computation in Mixed-Signal Environments [p. 102] <DD><I>Jason M. Musicer, Ja. Rabaey (University of California at Berkeley)</I><P> In this work, MOS Current Mode Logic (MCML) is analyzed for application to low power, mixed signal environments. A small MCML cell library is developed and optimized for several different performance requirements. The cells are then applied to the generation of pipelined CORDIC structures and compared with equivalent CMOS circuits. MCML CORDICs are designed which can operate from 125MHz to 310MHz with power consumption varying between 4.3mW and 18.6mW. These power results are up to 1.5 times less than CMOS CORDICs with equivalent propagation delays. Design was done in a 0.25µm standard CMOS process from ST Microelectronics.<BR> Keywords: Current mode logic, CORDIC, Low-energy design, Digital logic.<P> <DT> <A HREF="../../pdffiles/05_2.pdf" NAME="05_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 5.2 Noise-Aware Power Optimization for On-Chip Interconnect [p. 108] <DD><I>Ki-Wook Kim, Seong-Ook Jung, Sung-Mo Kang (University of Illinois at Urbana-Champaign), Unni Narayanan (Intel Corp.), C. L. Liu (National Tsing Hua University)</I><P> Realization of high-performance domino logic depends strongly on energy-efficient and noise-tolerant interconnect design in ultra deep submicron processes. We characterize the cycle-averaged power model for interconnects accounting for switching statistics and dynamic behaviors. For the sake of signal integrity, cross-coupling effects are also characterized which reflect logical correlation between adjacent wires. Based on the new models for interconnect power and capacitive crosstalk, we optimize the coupling power consumed by interconnects with crosstalk constraints. 
Experimental results show that optimized designs significantly reduce power consumption.<P> <DT> <A HREF="../../pdffiles/05_3.pdf" NAME="05_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 5.3s New Clock-Gating Techniques for Low Power Flip-Flops [p. 114] <DD><I>A. G. M. Strollo, E. Napoli, D. De Caro (University of Naples "Federico II")</I><P> Two novel low power flip-flops are presented in the paper. The proposed flip-flops use new gating techniques that reduce power dissipation by deactivating the clock signal. The presented circuits overcome the clock duty-cycle limitation of previously reported gated flip-flops. Circuit simulations with the inclusion of parasitics show that a significant reduction in power dissipation is possible if the input signal has reduced switching activity. A 16-bit counter is presented as a simple low power application.<BR> Keywords: CMOS digital integrated circuits, flip-flops, low-power circuits, transition probability.<P> <DT> <A HREF="../../pdffiles/05_4.pdf" NAME="05_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 5.4s An Improved Pass Transistor Synthesis Method for Low Power, High Speed CMOS Circuits [p. 120] <DD><I>Tudor Vinereanu, Sverre Lidholm (Cork University College)</I><P> A synthesis method for generating hybrid pass gate circuits is presented. These circuits combine features from both complementary CMOS and pass-gate architectures. The simulation results using a 0.7 um technology show that circuits synthesized according to the proposed method may achieve significant improvements in terms of area, power and delay over traditional full swing pass transistor logic and complementary CMOS.<P> </DL> <HR> <H2>Session 6: <A NAME="sess6"> Synthesis and Optimization<BR></A> Wednesday, July 26th, 2000, 11:15 - 12:45 </H2> <I>Session Chair: D. Marculescu (Carnegie Mellon)<BR> Session Organizer: M. 
Poncino (Politecnico di Torino)</I><P> <DL> <DT> <A HREF="../../pdffiles/06_1.pdf" NAME="06_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 6.1: Achieving Utility Arbitrarily Close to the Optimal with Limited Energy [p. 125] <DD><I> Gang Qu, Miodrag Potkonjak (University of California at Los Angeles) </I><P> Energy is one of the limited resources for modern systems, especially the battery-operated devices and personal digital assistants. The backlog in new technologies for more powerful battery is changing the traditional system design philosophies. For example, due to the limitation on battery life, it is more realistic to design for the optimal benefit from limited resource rather than design to meet all the applications' requirement. We consider the following problem: a system achieves a certain amount of utility from a set of applications by providing them certain levels of quality of service (QoS). We want to allocate the limited system resources to get the maximal system utility. We formulate this utility maximization problem, which is NP-hard in general, and propose heuristic algorithms that are capable of finding solutions provably arbitrarily close to the optimal. We have also derived explicit formulae to guide the allocation of resources to actually achieve such solutions. Simulation shows that our approach can use 99.9% of the given resource to achieve 25.6% and 32.17% more system utilities over two other heuristics, while providing QoS guarantees to the application program.<P> <DT> <A HREF="../../pdffiles/06_2.pdf" NAME="06_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 6.2: Power Minimization of Functional Units by Partially Guarded Computation [p. 131] <DD><I> Junghwan Choi, Jinhwan Jeon, Kiyoung Choi (Seoul National University) </I><P> This paper deals with power minimization problem for data-dominated applications based on a novel concept called partially guarded computation. 
We divide a functional unit into two parts - MSP (Most Significant Part) and LSP (Least Significant Part) - and allow the functional unit to perform only the LSP computation if the range of output data can be covered by LSP. We dynamically disable MSP computation to remove unnecessary transitions thereby reducing power consumption. We also propose a systematic approach for determining optimal location of the boundary between the two parts during high-level synthesis. Experimental results show about 10~44% power reduction with about 30~36% area overhead and less than 3% delay overhead in functional units.<BR> Keywords: Low Power, Partially Guarded Computation<P> <DT> <A HREF="../../pdffiles/06_3.pdf" NAME="06_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 6.3s: Systematic Cycle Budget versus System Power Trade-off: a New Perspective on System Exploration of Real-Time Data-Dominated Applications [p. 137] <DD><I> Erik Brockmeyer, Arnout Vandecappelle, Francky Catthoor (IMEC) </I><P> In contrast to current design practice for (programmable) processor mapping, which mainly targets performance, we focus on a systematic trade-off between cycle budget and energy consumed in the background memory organization. The latter is a crucial component in many of today's designs, including multi-media, network protocols and telecom signal processing. We have a systematic way and tool to explore both freedoms and to arrive at Pareto charts, in which for a given application the lowest cost implementation of the memory organization is plotted against the available cycle budget per submodule. This by making optimal usage of a parallelized memory architecture. We indicate, with results on a digital audio broadcasting receiver and an image compression demonstrator, how to effectively use the Pareto plot to gain significantly in overall system energy consumption within the global real-time constraints. 
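<P> As a minimal illustration of the Pareto chart mentioned in abstract 6.3s (a sketch only, not the authors' tool): the design points below are hypothetical (cycle budget, memory energy) pairs in arbitrary units, and <CODE>pareto_front</CODE> is an assumed helper name. It keeps only the points for which no alternative needs at most as many cycles while also consuming at most as much energy.<P>
<PRE>
```python
from typing import List, Tuple

# Hypothetical design point: (cycle budget, memory energy), arbitrary units.
Point = Tuple[int, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated in both cycle budget and energy.

    Both quantities are treated as costs to be minimized.
    """
    front: List[Point] = []
    best_energy = float("inf")
    # Visit candidates by increasing cycle budget (ties broken by energy).
    for cycles, energy in sorted(points):
        # A point joins the front only if it strictly lowers the energy
        # reached by every point with a smaller or equal cycle budget.
        if best_energy > energy:
            front.append((cycles, energy))
            best_energy = energy
    return front


if __name__ == "__main__":
    candidates = [(100, 9.0), (120, 7.5), (120, 8.0),
                  (150, 7.5), (200, 4.0), (180, 6.0)]
    for cycles, energy in pareto_front(candidates):
        print(cycles, energy)
```
</PRE>
Read from the smallest cycle budget upward, each retained point shows how much memory energy could be saved by relaxing the cycle budget to that value; the discarded points cost more energy without saving any cycles.<P>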
<DT> <A HREF="../../pdffiles/06_4.pdf" NAME="06_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 6.4s: Low Power Sequential Circuit Design by Using Priority Encoding and Clock Gating [p. 143] <DD><I> Xunwei Wu (Ningbo University), Massoud Pedram (University of Southern California) </I><P> This paper presents a state assignment technique called priority encoding, which uses multi-code assignment plus clock gating to reduce power dissipation in sequential circuits. The basic idea is to assign multiple codes to states so as to enable more effective clock gating in the sequential circuit. Practical design examples are studied and simulated by PSPICE. Experimental results demonstrate that the priority encoding technique can result in sizable power saving. <P> </DL> <HR> <H2><A NAME="evepanel"> Evening Panel<BR></A> Wednesday, July 26th, 2000, 18:30 - 20:00 </H2> <DL> <DT> Do Our Low-Power Tools Have Enough Horse Power? [p. 149]<BR> <I>Moderator: Giovanni De Micheli (Stanford University)<BR> Panelists: Tony Correale (IBM), Hugo De Man (IMEC), Pietro Erratico (ST), Jerry Frankil (Sente), Srini Raghvendra (Synopsys), Vivek Tiwari (Intel)</I><P> </DL> <HR> <H2><A NAME="invit2"> Invited Talk 2:<BR> </A> Thursday, July 27th, 2000, 9:00 - 9:45 </H2> <DL> <DT> <A HREF="../../pdffiles/invit2.pdf" NAME="invit2"><IMG SRC="../../../../../icons/pdficon.gif"></A> Low-Power Considerations in the Design of Bluetooth [p. 151] <DD> <I>Speaker: Sven Mattisson (Ericsson)<BR> Session Chair: T. Gabara (Lucent)</I><P> In this paper, we review the Bluetooth technology, a new universal radio interface enabling electronic devices to connect and communicate wirelessly via short-range connections. Motivations for the radio requirements are given, and the implications of system parameters like operating modes, frequency hopping, interference resistance are discussed from a low-power perspective. 
Specific characteristics enabling low-cost single-chip implementations and supporting low power consumption are outlined.<P> </DL> <HR> <H2>Session 7: <A NAME="sess7"> Power Efficient Digital Circuits and Technologies II<BR></A> Thursday, July 27th, 2000, 10:15 - 11:45 </H2> <I>Session Chair: K. Roy (Purdue University)<BR> Session Organizer: B. Barton (TI)</I><P> <DL> <DT> <A HREF="../../pdffiles/07_1.pdf" NAME="07_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 7.1: High-Speed Dynamic Logic Styles for Scaled-Down CMOS and MTCMOS Technologies [p. 155] <DD><I> Mohamed W. Allam, Mohab H. Anis, Mohamed I. Elmasry (University of Waterloo) </I><P> A new high-speed Domino circuit, called HS-Domino is developed. HS-Domino resolves the trade-off between performance and noise margins in conventional CD-Domino logic while dissipating low dynamic power with minimal area overhead. A dual-threshold (MTCMOS) implementation of HS-Domino and DDCVS logic is also devised. This implementation achieves low leakage values during standby, while maintaining high performance and low dynamic power during the active mode.<P> <DT> <A HREF="../../pdffiles/07_2.pdf" NAME="07_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 7.2: A Three-Port nRERL Register File for Ultra-Low-Energy Applications [p. 161] <DD><I> Jun-Ho Kwon, Soo-Ik Chae (Seoul National University), Joonho Lim (Global Communication Technology) </I><P> In this paper, we propose an adiabatic register file for ultra-low-energy applications, which uses a new reversible adiabatic logic, nRERL [1]. The nRERL register file discards garbage information with minimal energy dissipation. We designed a 16x8b three-port nRERL register file. From SPICE simulations, we found that the nRERL register file consumes less than 10% of the energy consumed in the conventional register file at the frequency of lower than 1MHz. 
We also describe how to design a RAM, a large array of the storage cells.<P> <DT> <A HREF="../../pdffiles/07_3.pdf" NAME="07_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 7.3s: Minimum Power And Area N-Tier Multilevel Interconnect Architectures Using Optimal Repeater Insertion [p. 167] <DD><I> Raguraman Venkatesan, Jeffrey A. Davis, Keith A. Bowman, James D. Meindl (Georgia Institute of Technology) </I><P> Minimum power CMOS ASIC macrocells are designed by minimizing the macrocell area using a new methodology to optimally insert repeaters for n-tier multilevel interconnect architectures. The minimum macrocell area and power dissipation are projected for the 100, 70 and 50 nm technology generations and compared with an n-tier design without repeaters. Repeater insertion and a novel interconnect geometry scaling technique decrease the power dissipation by 58-68% corresponding to a macrocell area reduction of 70-78% for the global clock frequency designs of these three technology generations.<P> <DT> <A HREF="../../pdffiles/07_4.pdf" NAME="07_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 7.4s: Practical Considerations of Clock-Powered Logic [p. 173] <DD><I> William Athas (House Ear Institute) </I><P> Recovering and reusing circuit energies that would otherwise be dissipated as heat can reduce the power dissipated by a VLSI chip. Accomplishing this requires a power source that can efficiently inject and extract energy, and an efficient power delivery system to connect the power source to the circuit nodes. The additional circuitry and timing required to support this process can readily exceed the power-savings benefit. Clock-powered logic is a circuit-level, energy-recovery approach that has been implemented in two generations of small-scale microprocessor experiments. The results have shown that it is possible and practical to extract useful amounts of power savings by leveraging the additional circuitry for other compatible purposes. 
The capabilities and limitations of clock-powered logic as a competitive low-power approach are presented and discussed in this paper.<BR> Keywords: Energy-recovery CMOS, clock-powered logic, adiabatic charging, microprocessors, ER-CMOS, supply-voltage scaling.<P> </DL> <HR> <H2>Session 8: <A NAME="sess8"> Modeling and Simulation <BR></A> Thursday, July 27th, 2000, 10:15-11:45 </H2> <I>Session Chair: V. Tiwari (Intel)<BR> Session Organizer: M. Nemani (Intel)</I><P> <DL> <DT> <A HREF="../../pdffiles/08_1.pdf" NAME="08_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 8.1 Model and Analysis for Combined Package and On-Chip Power Grid Simulation [p. 179] <DD><I>Rajendran Panda, David Blaauw, Rajat Chaudhry, Vladimir Zolotov, Brian Young, Ravi Ramaraju (Motorola)</I><P> We present new modeling and simulation techniques to improve the accuracy and efficiency of transient analysis of large power distribution grids. These include an accurate model for the inherent decoupling capacitance of non-switching devices, as well as a statistical switching current model for the switching devices. Moreover, three new simulation techniques are presented for problem size-reduction and speed-up. Results of applying these techniques to three PowerPC&trade; microprocessors are also presented.<P> <DT> <A HREF="../../pdffiles/08_2.pdf" NAME="08_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 8.2 Cycle-Accurate Energy Consumption Measurement and Analysis: Case Study of ARM7TDMI [p. 185] <DD><I>Naehyuck Chang, Kwanho Kim, Hyung Gyu Lee (Seoul National University) </I><P> We introduce an energy consumption analysis of complex digital systems through a case study of the ARM7TDMI RISC processor using a new energy measurement technique. We developed a cycle-accurate energy consumption measurement system based on charge transfer which is robust to spiky noise and is capable of collecting a range of power consumption profiles in real time. 
The relative energy variation of the RISC core is measured by changing the opcode, the instruction fetch address, the register number, the register value, the data fetch address, and the immediate operand value in each pipeline stage, respectively. We demonstrated energy characterization of a pipelined RISC processor for high-level power reduction.<P> <DT> <A HREF="../../pdffiles/08_3.pdf" NAME="08_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 8.3s Speeding up Power Estimation of Embedded Software [p. 191] <DD><I>Akshaye Sama, J. F. M. Theeuwen (Philips Research), M. Balakrishnan (Indian Institute of Technology)</I><P> Power is increasingly becoming a design constraint for embedded systems. A processor is responsible for energy consumption on account of the software component of the embedded system. The power estimation of this component is a major concern due to the rising complexities of processors and the slow estimation tools. This work attempts to estimate the energy dissipation of the PR1900 processor based on an instruction-set model with improved accuracy. The model is integrated in a simulation framework and validated. A speedup of over 200 times has been obtained with an average 1.4% loss in accuracy relative to gate-level estimation. Analysis of the energy dissipated by the instruction vis-a-vis the processor architecture has been carried out and a substantial reduction in the measurement effort to build the processor energy model has been achieved.<P> <DT> <A HREF="../../pdffiles/08_4.pdf" NAME="08_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 8.4s High-Level Power Estimation with Interconnect Effects [p. 197] <DD><I>Kavel M. Büyüksahin (University of Illinois at Urbana-Champaign), Farid N. Najm (University of Toronto)</I><P> We extend earlier work on high-level average power estimation to include the power due to interconnect loading. 
The resulting technique is a combination of an RTL-level gate count prediction method and average interconnect estimation based on Rent's rule. The method can be adapted to be used with different place and route engines and standard cell libraries. For a number of benchmark circuits, the method is verified by extracting wire lengths from a layout of each circuit and then comparing the predicted (at RTL) power against that measured using SPICE. An average error of 14.4% is obtained for the average interconnect length, and an average error of 25.8% is obtained for average power estimation including interconnect effects.<BR> Categories and Subject Descriptors: B.7.2 [Hardware]: Integrated Circuits - Design Aids<BR> General Terms: High-level power estimation, Register transfer level (RTL) power estimation, Interconnect capacitance estimation<P> </DL> <HR> <H2> <A NAME="poster1"> Poster Session 1 <BR></A> Thursday, July 27th, 2000, 11:45-12:30 </H2> <I>Session Chair: W. Troutman (Lucent)<BR> Session Organizer: A. Chandrakasan (MIT)</I><P> <DL> <DT> <A HREF="../../pdffiles/p1_1.pdf" NAME="p1_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.1 "Cool Low Power" 1 GHz Multi-Port Register File and Dynamic Latch in 1.8V, 0.25um SOI and Bulk Technology [p. 203] <DD><I>R. V. Joshi, W. Hwang, S. C. Wilson, C. T. Chuang (T. J. Watson Research Center)</I><P> This paper describes power analysis at sub-zero temperatures for a high performance dynamic multiport register file (6 Read and 2 Write ports, 32 wordlines x 64 bitlines) fabricated in 0.25 um Silicon on Insulator (SOI) and bulk technologies. Based on the hardware it is shown that the performance of both register file and latch improves by 2-3.5% per 10 &deg;C reduction in temperature. The standby power for SOI reduces by 1.5% to 3% per 10 &deg;C temperature drop down to -30 &deg;C. 
The SOI chip is shown to have more significant performance improvement at low temperatures compared to the bulk chip due to the floating body effect, which partially offsets the increase in the threshold voltages (Vt). The low temperature performance gain is attributed to reduction in capacitance (around 7-8%) and the rest is due to dynamic threshold voltages. At -30 &deg;C the register file is capable of functioning close to 1.02 GHz for read and write operations in a single cycle.<P> <DT> <A HREF="../../pdffiles/p1_2.pdf" NAME="p1_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.2 Low-Power Digital Filtering Using Multiple Voltage Distribution and Adaptive Voltage Scaling [p. 207] <DD><I>Sandeep Dhar, Dragan Maksimovic (University of Colorado at Boulder)</I><P> This paper describes an adaptive power management architecture to reduce power consumption in digital filters. The proposed approach combines two low-power techniques which utilize supply voltage reduction. The first technique, multiple voltage distribution (MVD), attempts to reduce power consumption by assigning reduced supply voltages to circuit modules while satisfying timing constraints. The second technique, adaptive voltage scaling (AVS), dynamically adjusts these multiple voltages to meet throughput requirements resulting in further power reduction. An FIR filter application using the combined MVD-AVS power management scheme for two adaptively scaled supply voltages is shown to consume one-third the power of a fixed supply voltage scheme, and half the power consumed with a single supply AVS.<P> <DT> <A HREF="../../pdffiles/p1_3.pdf" NAME="p1_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.3 Low Power Self-Timed Radix-2 Division [p. 210] <DD><I>Jae-Hee Won, Kiyoung Choi (Seoul National University)</I><P> A self-timed radix-2 division scheme for low power consumption is proposed. 
By replacing dual-rail dynamic circuits in non-critical data paths with single-rail static circuits, power dissipation is decreased, yet performance is maintained by speculative remainder computation. SPICE simulation results show that the proposed design can achieve 33.8-ns latency for 56-bit mantissa division and 47% energy reduction compared to a fully dual-rail version.<BR> Keywords: Low power, radix-2 division, self-timed, RSD.<P> <DT> <A HREF="../../pdffiles/p1_4.pdf" NAME="p1_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.4 A Rate Selection Algorithm for Quantized Undithered Dynamic Supply Voltage Scaling [p. 213] <DD><I>Lama H. Chandrasena, Michael J. Liebelt (The University of Adelaide) </I><P> In this paper we propose a novel rate calculation algorithm called Quantized Rate Selection (QRS) for quantized undithered dynamic supply voltage scaling (DSVS) systems. The algorithm monitors the total buffered workload, and where possible selects a rate value equal to a quantized rate value. At quantized rate values, energy dissipation of quantized DSVS systems approaches that of continuous voltage level DSVS systems. Our experimental work on FMIDCT computation using nine video sequences and a 4-level quantized undithered system shows that additional energy savings of 1.4% to 18.5% can be achieved from QRS, compared to the existing averaging technique.<BR> General Terms: Dynamic Supply Voltage Scaling, Averaging, Quantization and Dithering<P> <DT> <A HREF="../../pdffiles/p1_5.pdf" NAME="p1_5"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.5 Low-Power Sensing and Digitization of Cardiac Signals based on Sigma-Delta Conversion [p. 216] <DD><I>Andrea Gerosa, Arianna Novo, Andrea Neviani (University of Padova)</I><P> In this work we propose an architecture for the acquisition and digitization of cardiac signals in a pacemaker, based on Sigma-Delta modulation. 
Due to the characteristics of such an application, the proposed system presents the typical design challenges of low-voltage, low-power circuits. The work demonstrates that, thanks to the narrow bandwidth typical of biological signals (50-150 Hz), oversampling conversion techniques can be advantageous in terms of power dissipation at a given dynamic range. The converter is designed in a 0.8um CMOS technology using the switched Op-Amp technique. The Sigma-Delta converter is a third order modulator with an oversampled frequency of about 8kHz and the circuit can operate at a minimum supply voltage of 2 V, while dissipating 2 uW at most. According to simulation results the dynamic range is larger than 50 dB.<P> <DT> <A HREF="../../pdffiles/p1_6.pdf" NAME="p1_6"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.6 A 1.5V Low-Power Third Order Continuous-Time Lowpass Sigma-Delta A/D Converter [p. 219] <DD><I>Friedel Gerfers, Yiannos Manoli (University of Saarland)</I><P> This paper presents the design of a 3rd-order lowpass Sigma-Delta analog-to-digital (A/D) converter using a continuous-time (CT) loopfilter. The loopfilter has been implemented by using active RC-integrators. The influence of the low supply voltage on the building blocks such as the amplifier and the common mode feedback as well as on the overall Sigma-Delta modulator is discussed. Simulation results of the 1.5V CT Sigma-Delta A/D converter show a 75 dB dynamic range in a bandwidth of 25kHz. The expected power consumption is less than 300uW.<P> <DT> <A HREF="../../pdffiles/p1_7.pdf" NAME="p1_7"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.7 Design of a Low-Power CMOS Baseband Circuit for Wideband CDMA Testbed [p. 222] <DD><I>Chunlei Shi, Yue Wu, Mohammed Ismail (The Ohio State University)</I><P> In this paper, the design and performance of a CMOS baseband circuit for a WCDMA direct-conversion receiver are presented. 
Consisting of one 5th-order anti-aliasing filter, one 4th-order tunable channel filter, and three variable gain amplifier (VGA) stages, the baseband chain provides 72dB gain range with 2dB gain step and is tunable to select three different bandwidths (from 5MHz to 20MHz radio-frequency spacing). It dissipates only 18mW from a single 3V supply. The input IP3 is 10dBm, and the input-referred noise in the passband is 41nV/&radic;Hz.<BR> Keywords: Wideband CDMA, Baseband, Filter, VGA<P> <DT> P1.8 WITHDRAWN<P> <DT> <A HREF="../../pdffiles/p1_9.pdf" NAME="p1_9"><IMG SRC="../../../../../icons/pdficon.gif"></A> P1.9 A Low-Voltage CMOS Multiplier for RF Applications [p. 225] <DD><I>Carl James Debono, Franco Maloberti (University of Pavia), Joseph Micallef (University of Malta)</I><P> A low-voltage analog multiplier operating at 1.2V is presented. The multiplier core consists of four MOS transistors operating in the saturation region. The circuit exploits the quadratic relation between current and voltage of the MOS transistor in saturation. The circuit was designed using standard 0.6um CMOS technology. Simulation results indicate an IP3 of 4.9dBm and a spur free dynamic range of 45dB.<BR> Keywords: Low-voltage, RF, CMOS, analog multiplier.<P> </DL> <HR> <H2><A NAME="poster2"> Poster Session 2<BR></A> Thursday, July 27th, 2000, 11:45 - 12:30 </H2> <I>Session Chair: G. Stamoulis (Intel)<BR> Session Organizer: R. Panda (Motorola)</I><P> <DL> <DT> <A HREF="../../pdffiles/p2_1.pdf" NAME="p2_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.1: Voltage Dependent Gate Capacitance and its Impact in Estimating Power and Delay of CMOS Digital Circuits with Low Supply Voltage [p. 
228] <DD><I> Koichi Nose, Takayasu Sakurai (University of Tokyo), Soo-Ik Chae (Seoul National University) </I><P> Gate capacitance has a complex dependency on terminal voltages, but the impact of this voltage dependency on power and delay has not been fully investigated, especially in low-voltage, low-power designs. By introducing an effective gate capacitance, CG,eff, it is shown that the power and delay of CMOS digital circuits can be estimated accurately. CG,eff is a strong function of VTH/VDD, and VTH/VDD tends to increase in the low-voltage region. Hence, the effective capacitance relative to oxide capacitance, COX, is decreasing in low-voltage, low-power designs. Therefore, considering CG,eff in accurate power and delay estimation will become more important in the future.<BR> Keywords: Gate capacitance, low supply voltage, low-power design.<P> <DT> <A HREF="../../pdffiles/p2_2.pdf" NAME="p2_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.2: Reducing Energy Requirements for Instruction Issue and Dispatch in Superscalar Microprocessors [p. 231] <DD><I> Kanad Ghose (State University of New York) </I><P> Recent studies [MGK 98, Tiw 98] have confirmed that a significant amount of energy is dissipated in the process of instruction dispatching and issue in modern superscalar microprocessors. We propose a model for the energy dissipated by instruction dispatching and issuing logic in modern superscalar microprocessors and validate it through register level simulations and SPICE-measured dissipation coefficients from 0.5 micron CMOS layouts of relevant circuits. 
Alternative organizations are studied for instruction window buffers that result in energy savings of about 47% over traditional designs.<BR> Keywords: power minimization, superscalar processor, instruction dispatching, instruction issue, window buffer<P> <DT> <A HREF="../../pdffiles/p2_3.pdf" NAME="p2_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.3: Low Power Synthesis of Sum-of-Products Computation [p. 234] <DD><I> K. Masselos, S. Theoharis, P. K. Merakos, T. Stouraitis, C. E. Goutis (University of Patras) </I><P> Novel techniques for the power efficient synthesis of sum-of-product computations are presented. Simple and efficient heuristics for scheduling and assignment are described. Different partly static cost functions are proposed to drive the synthesis tasks. The proposed cost functions target the power consumption either in the buses connecting the functional units with the storage elements or inside the functional units. The partly static nature of the proposed cost functions reduces the time of the synthesis procedure. Experimental results from different relevant digital signal processing algorithmic kernels show that the proposed synthesis techniques lead to significant power savings.<P> <DT> <A HREF="../../pdffiles/p2_4.pdf" NAME="p2_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.4: A Spatially-Adaptive Bus Interface for Low-Switching Communication [p. 238] <DD><I> Andrea Acquaviva (Universita di Bologna), Riccardo Scarsi (Politecnico di Torino) </I><P> Adaptive encoding has been shown to be an effective approach to bus power minimization in situations where characterization of the input statistics is not available. In this paper, we propose a novel technique for adaptive bus encoding that, unlike existing solutions, exploits spatial correlations in the input data being transmitted to increase the accuracy in the dynamic selection of the encoding function. 
We discuss the encoding algorithm and we describe an architecture for its implementation as a bus interface. We present experimental data collected in a realistic simulation framework on a number of meaningful benchmarks, and we compare them to those obtained through the application of existing encoding schemes.<P> <DT> <A HREF="../../pdffiles/p2_5.pdf" NAME="p2_5"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.5: A Low Power Unified Cache Architecture Providing Power and Performance Flexibility [p. 241] <DD><I> Afzal Malik, Bill Moyer, Dan Cermak (Motorola, Inc.) </I><P> Advances in technology have allowed portable electronic devices to become smaller and more complex, placing stringent power and performance requirements on the device's components. The M.CORE M3 architecture was developed specifically for these embedded applications. To address the growing need for longer battery life and higher performance, an 8-Kbyte, 4-way set-associative, unified (instruction and data) cache with programmable features was added to the M3 core. These features allow the architecture to be optimized based on the application's requirements. In this paper, we focus on the features of the M340 cache sub-system and illustrate the effect on power and performance through benchmark analysis and actual silicon measurements.<P> <DT> <A HREF="../../pdffiles/p2_6.pdf" NAME="p2_6"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.6: Memory System Energy: Influence of Hardware-Software Optimizations [p. 244] <DD><I> G. Esakkimuthu, N. Vijaykrishnan, M. Kandemir, M. J. Irwin (Pennsylvania State University) </I><P> The memory system usually consumes a significant amount of energy in many battery-operated devices. In this paper, we provide a quantitative comparison and evaluation of the interaction of two hardware cache optimization mechanisms (block buffering and sub-banking) and three widely used compiler optimization techniques (linear loop transformation, loop tiling, and loop unrolling). 
Our results show that the pure hardware optimizations (eight block buffers and four sub-banks in a 4K, 2-way cache) provided up to 4% energy saving, with an average saving of 2% across all benchmarks. In contrast, the pure software optimization approach that uses all three compiler optimizations provided at least 23% energy saving, with an average of 62%. However, a closer observation reveals that hardware optimization becomes more critical for on-chip cache energy reduction when executing optimized codes.<P> <DT> <A HREF="../../pdffiles/p2_7.pdf" NAME="p2_7"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.7: Energy-efficient Code Generation for DSP56000 family [p. 247] <DD><I> Sathishkumar Udayanarayanan, Chaitali Chakrabarti (Arizona State University) </I><P> This paper presents a procedure to generate energy-efficient code for the Motorola DSP56K processor based on increasing the packing efficiency and minimizing the number of address instructions. The key features are a novel scheduling algorithm that reduces the dependencies between instructions, a register allocation algorithm that spills variables based on their packability, and an address code generation algorithm that minimizes the number of additional instructions. The size of the code generated by this procedure is on average 45% (25%) smaller than that generated by Motorola's g56K (SPAM).<BR> Categories and Subject Descriptors: 2.3 [Software and System Design]: Compilers, DSP and embedded systems<BR> General Terms: Code Generation, Low Power<P> <DT> <A HREF="../../pdffiles/p2_8.pdf" NAME="p2_8"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.8: Power-Optimal Encoding for DRAM Address Bus [p. 250] <DD><I> Wei-Chung Cheng, Massoud Pedram (University of Southern California at Los Angeles) </I><P> This paper presents Pyramid code, an optimal code for transmitting sequential addresses over a DRAM bus. 
Constructed by finding an Eulerian cycle on a complete graph, this code is optimal for conventional DRAM in the sense that it minimizes the switching activity on the time-multiplexed address bus from CPU to DRAM. Experimental results on a large number of testbenches with different characteristics (i.e. sequential vs. random memory access behaviors) are reported and demonstrate a reduction of bus activity by as much as 50%.<P> <DT> <A HREF="../../pdffiles/p2_9.pdf" NAME="p2_9"><IMG SRC="../../../../../icons/pdficon.gif"></A> P2.9: Profile-Driven Code Execution for Low Power Dissipation [p. 253] <DD><I> Diana Marculescu (Carnegie Mellon University) </I><P> This paper proposes a novel technique for power-performance trade-off based on profile-driven code execution. Specifically, we show that there is an optimal level of parallelism for energy consumption and propose a compiler-assisted technique for code annotation that can be used at run-time to adaptively trade off power and performance. As shown by experimental results, our approach is up to 23% better than clock throttling and is as efficient as voltage scaling (up to 10% better in some cases). The technique proposed in this paper can be used by an ACPI-compliant power manager for prolonging battery life or as a passive cooling feature for thermal management.<P> </DL> <HR> <H2>Session 9: <A NAME="sess9"> Synthesis and Optimization<BR></A> Thursday, July 27th, 2000, 14:30 - 16:00 </H2> <I>Session Chair: V. De (Intel)<BR> Session Organizer: D. Xanthopoulus (ATI)</I><P> <DL> <DT> <A HREF="../../pdffiles/09_1.pdf" NAME="09_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 9.1: An Asynchronous Matrix-Vector Multiplier for Discrete Cosine Transform [p. 256] <DD><I> Kyeounsoo Kim (Ilryung Telesys, Inc.), Peter A. 
Beerel (University of Southern California at Los Angeles), Youpyo Hong (Dongguk University) </I><P> This paper proposes an efficient asynchronous hardwired matrix-vector multiplier for the two-dimensional discrete cosine transform and inverse discrete cosine transform (DCT/IDCT). The design achieves low power and high performance by taking advantage of the typically large fraction of zero and small-valued data in DCT and IDCT applications. In particular, it skips multiplication by zero and dynamically activates/deactivates required bit-slices of fine-grain bit-partitioned adders using simplified, static-logic-based speculative completion sensing. The results extracted by both bit-level analysis and HSPICE simulations indicate significant improvements compared to traditional designs.<BR> Keywords: Asynchronous matrix-vector multiplier, discrete cosine transform<P> <DT> <A HREF="../../pdffiles/09_2.pdf" NAME="09_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 9.2: Low Power Techniques and Design Tradeoffs in Adaptive FIR Filtering for PRML Read Channels [p. 262] <DD><I> Khurram Muhammad, Robert B. Staszewski (Texas Instruments, Inc.), Poras T. Balsara (University of Texas at Dallas) </I><P> In this paper, we describe area and power reduction techniques for a low-latency adaptive finite-impulse response filter for magnetic recording read channel applications. Various techniques are used to reduce area and power dissipation while speed remains the main performance criterion for the target application. A parallel transposed direct form architecture operates on real-time input data samples and employs a fast, low-area multiplier based on selection of radix-8 pre-multiplied coefficients in conjunction with a one-hot encoded bus, leading to a very compact layout and reduced power dissipation. Area, speed and power comparisons with other low-power implementation options are also shown. 
The proposed filter has been fabricated using a 0.18 um L-effective CMOS technology and operates at 550 MSamples/s.<P> <DT> <A HREF="../../pdffiles/09_3.pdf" NAME="09_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 9.3: Energy-Efficient 32 x 32-bit Multiplier in Tunable Near-Zero Threshold CMOS [p. 268] <DD><I> Vjekoslav Svilan (Stanford University), Masataka Matsui (Toshiba Corp.), James B. Burr (Sun Microsystems Labs) </I><P> An 80,000 transistor, low swing, 32 x 32-bit multiplier was fabricated in a standard 0.35 um, Vth =0.5 V CMOS process and in a 0.35 um, back-bias tunable, near-zero Vth process. While standard CMOS at Vdd =3.3 V runs at 136 MHz, the same performance can be achieved in the low-Vth version at Vdd =1.3 V, resulting in more than 5 times lower power. Similar power reductions are obtained for frequencies down to 10 MHz. In addition, the low-Vth version is able to run at 188 MHz, which is 38% faster than standard CMOS.<P> </DL> <HR> <H2>Session 10: <A NAME="sess10"> RF Design<BR></A> Thursday, July 27th, 2000, 14:30 - 16:00 </H2> <I>Session Chair: S. Heine (Infineon)<BR> Session Organizer: Q. Huang (ETHZ)</I><P> <DL> <DT> <A HREF="../../pdffiles/10_1.pdf" NAME="10_1"><IMG SRC="../../../../../icons/pdficon.gif"></A> 10.1: Tradeoffs and Design of an Ultra Low Power UHF Transceiver Integrated in a Standard Digital CMOS Process [p. 273] <DD><I> Alain-Serge Porret, Thierry Melly (Swiss Federal Institute of Technology), E. A. Vittoz, C. C. Enz (Swiss Center for Electronics & Microtechnology) </I><P> A broad range of high-volume consumer applications require low-power, battery operated, wireless microsystems and sensors. These systems should reconcile a sufficient battery lifetime with reduced dimensions, low cost and versatility. The design of such systems highlights many tradeoffs between performance, lifetime, cost and power consumption. Also, special circuit and design techniques are needed to comply with the reduced supply voltage (down to 1V). 
These considerations are illustrated by design examples taken from a transceiver chip realized in a standard 0.5um digital CMOS process. The chip is dedicated to a distributed sensors network and is based on a direct-conversion architecture. The circuit prototype operates in the 434 MHz ISM band and consumes only 1mW in receive mode. It achieves a -95dBm sensitivity for a data rate of 24kbit/s. The transmitter section is designed for 0dBm output power under the minimum 1V supply, with a global efficiency higher than 15%.<BR> Keywords: RF, Transceiver, Low-Power, Low-Voltage, CMOS.<P> <DT> <A HREF="../../pdffiles/10_2.pdf" NAME="10_2"><IMG SRC="../../../../../icons/pdficon.gif"></A> 10.2: An 8mA, 3.8dB NF, 40dB gain CMOS Front-End for GPS Applications [p. 279] <DD><I> F. Svelto (Università di Bergamo), S. Deantoni, G. Montagna (STMicroelectronics), R. Castello (Università di Pavia) </I><P> A fully differential 0.35um CMOS LNA plus mixer, tailored to a double conversion architecture, for GPS applications has been realized. The LNA makes use of an inductively degenerated input stage and a resonant LC load, featuring 12% frequency tuning, accomplished by an MOS varactor. The mixer is Gilbert-cell-like, with an NMOS and a PMOS differential pair, shunted together, realizing the input stage. This topology saves power for a given mixer gain and linearity. The measured front-end performance is 40dB gain, 3.8dB NF, -25.5dBm IIP3, 1.3 GHz input frequency, 140MHz output frequency, with 8mA from a 2.8V voltage supply.<P> <DT> <A HREF="../../pdffiles/10_3.pdf" NAME="10_3"><IMG SRC="../../../../../icons/pdficon.gif"></A> 10.3s: Bias Boosting Technique for a 1.9GHz Class AB RF Amplifier [p. 284] <DD><I> Tirdad Sowlati, Sifen Luo (Philips Research) </I><P> A bias boosting technique for a 3.2V, 1.9GHz Class AB RF amplifier designed in a 30GHz BiCMOS process is presented in this paper. 
In a Class AB amplifier, the average current drawn from the supply depends on the input signal level. As the output power increases so do the average currents in both the emitter and the base of the power transistor. The increased average current causes an increased voltage drop in the biasing circuitry and the ballast resistor. This reduces the conduction angle in the amplifier, pushing it deep into Class B and even Class C operation, reducing the maximum output power by 25%. To avoid the power reduction, the amplifier should have a larger bias, which inevitably causes larger power dissipation at low output power levels. The proposed bias boosting circuitry dynamically increases the bias of the power transistor as the output power increases. The amplifier has less power dissipation at low power levels with an increased maximum output power.<P> <DT> <A HREF="../../pdffiles/10_4.pdf" NAME="10_4"><IMG SRC="../../../../../icons/pdficon.gif"></A> 10.4s: Analysis and Design of Low-Phase-Noise Ring Oscillators [p. 289] <DD><I> Liang Dai, Ramesh Harjani (University of Minnesota) </I><P> This paper presents a framework for CMOS ring oscillator phase noise analysis for given power consumption specifications. This model considers both linear and nonlinear operations. It indicates that fast rail-to-rail switching has to be achieved for low phase noise and that the up-conversion of low-frequency noise from the current bias/control circuit can be significant. Our phase noise model is validated via simulation and measurement results. We also present a coupled-ring oscillator whose phase noise is -114dBc/Hz at a 600kHz offset from the 960MHz carrier frequency.<P> </DL> <HR> <H2><A NAME="tut1"> Embedded Tutorial 1<BR></A> Thursday, July 27th, 2000, 16:30 - 17:30 </H2> <DL> <DT> <A HREF="../../pdffiles/tut1.pdf" NAME="tut1"><IMG SRC="../../../../../icons/pdficon.gif"></A> Reliable Low-Power Design in the Presence of Deep Submicron Noise [p. 
295] <DD><I>Speakers: Naresh Shanbhag (University of Illinois at Urbana-Champaign), Krishnamurthy Soumyanath (Intel Corporation), Samuel Martin (Lucent Technologies)<BR> Session Chair: N. Sivan (Motorola)</I><P> Scaling of feature sizes in semiconductor technology has been responsible for increasingly higher computational capacity of silicon. This has been the driver for the revolution in communications and computing. However, questions regarding the limits of scaling (and hence Moore's Law) have arisen in recent years due to the emergence of deep submicron noise. The tutorial describes noise in deep submicron CMOS and its impact on digital as well as analog circuits. In particular, noise-tolerance is proposed as an effective means for achieving energy and performance efficiency in the presence of DSM noise.<P> </DL> <HR> <H2><A NAME="tut2"> Embedded Tutorial 2<BR></A> Thursday, July 27th, 2000, 16:00 - 17:30 </H2> <DL> <DT> <A HREF="../../pdffiles/tut2.pdf" NAME="tut2"><IMG SRC="../../../../../icons/pdficon.gif"></A> Low-Power DSP's for Wireless Communications [p. 303] <DD><I>Speakers: Ingrid Verbauwhede (University of California at Los Angeles), Chris Nicol (Lucent Technologies)<BR> Session Chair: L. Benini (Università di Bologna)</I><P> Wireless communications, and more specifically the fast-growing penetration of cellular phones and cellular infrastructure, are the major drivers for the development of new programmable Digital Signal Processors (DSP's). In this tutorial, an overview will be given of recent developments in DSP processor architectures that make them well suited to execute computationally intensive algorithms typically found in communications systems. DSP processors have adapted instruction sets, memory architectures and data paths to execute compute-intensive communications algorithms efficiently and in a low-power fashion. 
Basic building blocks include convolutional decoders (mainly the Viterbi algorithm), turbo coding algorithms, FIR filters, speech coders, etc. This is illustrated with examples of different commercial and research processors. Please note that the authors do not endorse the processors used in this tutorial. These processors are used to illustrate how different solutions are proposed for the same problem.<BR> Keywords: Digital Signal Processing, architectures, programmable processors, wireless communications.<P> </DL>