# A Low-Power Graphics LSI integrating 29Mb Embedded DRAM for Mobile Multimedia Applications

Ramchan Woo, Sungdae Choi, Ju-Ho Sohn, Seong-Jun Song, Young-Don Bae and Hoi-Jun Yoo

Department of EECS, Korea Advanced Institute of Science and Technology, 373-1, Guseong-Dong, Yuseong-Ku, Daejeon 305-701, KOREA. Tel: +82-42-869-8068, E-Mail: ural@eeinfo.kaist.ac.kr

Abstract – A low-power graphics LSI is designed and implemented for mobile multimedia applications. The LSI contains a 32bit RISC processor with an enhanced MAC, a 3D rendering engine, a programmable power optimizer, and 29Mb of embedded DRAM. A full 3D graphics pipeline featuring 264Mtexels/s texture-mapped 3D graphics, as well as 2D MPEG-4 video decoding, is realized while consuming less than 210mW and occupying 121mm² of chip area. The chip is implemented in a 0.16μm pure DRAM process to reduce the fabrication cost. Real-time 3D graphics applications are successfully demonstrated by the fabricated chip on two PDA system boards.

### I. Introduction

As the mobile electronics market matures, 3G multimedia terminals such as PDAs and smart cell phones are becoming popular, and their applications are already migrating to real-time multimedia and even to 3D gaming [1]. In this work, we designed and implemented a graphics LSI [2-4] in a pure DRAM technology to reduce the fabrication cost while keeping a very high memory bandwidth. Using the DRAM process also lets us reduce the power consumption further, because off-chip loading for the rendering memory is completely eliminated. We optimize the circuits and architecture so that the full 3D pipeline is realized with less than 210mW at a drawing speed of 66Mpixels/s with 264Mtexels/s bilinear MIPMAP texturing and antialiasing, while satisfying the battery-lifetime and physical-dimension requirements of mobile terminals.

### II. Architecture

The architecture of the designed graphics LSI is shown in Fig. 1. The chip contains a 32b RISC processor with an enhanced MAC, a 3D rendering engine (3DRE), 29Mb of embedded DRAM, a bandwidth equalizer (BEQ) and a programmable power optimizer (PPO). The ARM9-compatible RISC with 4KB I/D caches operates at 132MHz. The RISC accelerates the 3D geometry operations so that it can transform 1Mvertices/s when running a hand-optimized fixed-point graphics library, which is a 43% improvement over a conventional ARM9 processor. Inside the 3DRE, SlimShader performs the main rendering operations such as texturing, shading, blending and depth comparison with two pixel processors and two texture units. Memory Programmer post-processes the special rendering effects such as antialiasing, motion blur, and fog. The embedded 29Mb DRAM provides sufficient bandwidth and capacity for the 3D rendering operations. The dedicated hardware engines and the 1.6GByte/s bandwidth of the 416b-wide DRAM lower the operating frequency of the 3DRE to as little as 33MHz, while the RISC operates at 132MHz. To compensate for the difference in processing speed and data width between the RISC and the 3DRE, the BEQ buffers the vertex data in a 1kB dual-ported SRAM. The BEQ activates only the SRAM banks needed for the required buffer size, saving 20% of the SRAM power. The PPO reduces the power consumption of the chip by controlling four separate clock domains. Each clock can be selectively gated, and its frequency can be scaled by software to adjust the frame rate at run time.
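The throughput figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes one pixel per pixel processor per 3DRE clock, four texel fetches per bilinear-filtered pixel, a DRAM clocked at the 3DRE frequency, and that "GByte" means 2^30 bytes. None of these assumptions is stated explicitly in the text, and the variable names are ours.

```python
# Cross-check of the quoted FAST-mode throughput figures.
RISC_CLOCK_HZ = 132_000_000   # RISC clock, from the text
RE_CLOCK_HZ = 33_000_000      # 3DRE clock, from the text (DRAM assumed equal)
DRAM_BUS_BITS = 416           # embedded-DRAM data width, from the text
PIXEL_PROCESSORS = 2          # from the text
TEXELS_PER_PIXEL = 4          # assumption: one 2x2 bilinear footprint per pixel
VERTEX_RATE = 1_000_000       # 1 Mvertices/s, from the text

# Assumption: each pixel processor retires one pixel per 3DRE clock.
pixel_rate = PIXEL_PROCESSORS * RE_CLOCK_HZ
texel_rate = pixel_rate * TEXELS_PER_PIXEL

# Peak DRAM bandwidth: bus width in bytes times clock rate.
dram_bytes_per_s = (DRAM_BUS_BITS // 8) * RE_CLOCK_HZ
dram_gib_per_s = dram_bytes_per_s / 2**30   # assumption: GByte = 2**30 bytes

# RISC cycle budget available for each transformed vertex.
cycles_per_vertex = RISC_CLOCK_HZ / VERTEX_RATE

print(f"pixel rate        : {pixel_rate / 1e6:.0f} Mpixels/s")
print(f"texel rate        : {texel_rate / 1e6:.0f} Mtexels/s")
print(f"DRAM bandwidth    : {dram_gib_per_s:.2f} GByte/s")
print(f"cycles per vertex : {cycles_per_vertex:.0f}")
```

Under these assumptions the 66Mpixels/s, 264Mtexels/s and 1.6GByte/s figures are mutually consistent, and the RISC has a budget of 132 cycles per vertex at 1Mvertices/s.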

### III. Implementation and Measurements

The graphics LSI is implemented in a typical 0.16µm 1-W 3-Al DRAM process to integrate both the logic and the memory into a single chip at low fabrication cost. The chip contains 1M logic transistors, 29Mb DRAM, 72kb SRAM and a PLL. The logic components, SRAM and analog blocks are drawn with the design rules of the DRAM peripheral transistors. They meet the performance requirements of mobile applications with little leakage current, although they show relatively large gate delay and routing area compared with a pure logic process. The implemented chip consumes 210mW during continuous calculation of texture-mapped 3D graphics applications in FAST mode. The embedded DRAM drastically reduces the power consumption since the external I/Os for 3D rendering are completely eliminated, and an additional 22% reduction is obtained from the low-energy texturing unit and pipeline clock gating. Non-textured 3D applications and MPEG-4 video decoding consume 145mW and 85mW, respectively. Textured 3D rendering consumes 110mW in NORMAL mode and 65mW in SLOW mode. Fig. 2 shows the micrograph of the chip and Table 1 summarizes its features. The die area including I/O pads is 121mm². Fig. 3 shows the measured waveform as the clock frequencies are varied from SLOW mode to FAST mode. The first silicon works successfully, and real-time 3D graphics applications are demonstrated on the system evaluation boards, each of which is equipped with the graphics LSI, 32MB SRAM (1T-SRAM or Ut-SRAM), a USB interface, and a memory controller, as shown in Fig. 4 and Fig. 5.
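Since software can rescale the clocks at run time, the measured figures suggest a simple policy: run in the slowest mode that still meets the required fill rate. The sketch below illustrates this under the assumption that peak pixel throughput scales linearly with the 3DRE clock (66Mpixels/s at 33MHz). The `MODES` list, the `pick_mode` function and the example workload are hypothetical and are not part of the chip's software.

```python
from typing import NamedTuple, Optional


class Mode(NamedTuple):
    name: str
    re_clock_mhz: float  # 3DRE clock, from Table 1
    power_mw: float      # measured textured-3D power, from the text


# Ordered from lowest to highest power.
MODES = [
    Mode("SLOW", 8.25, 65.0),
    Mode("NORMAL", 16.5, 110.0),
    Mode("FAST", 33.0, 210.0),
]

PIXELS_PER_CLOCK = 2  # assumption: 66 Mpixels/s at 33 MHz, scaling linearly


def pixel_rate_mpix(mode: Mode) -> float:
    """Assumed peak fill rate of a mode in Mpixels/s."""
    return mode.re_clock_mhz * PIXELS_PER_CLOCK


def energy_per_pixel_nj(mode: Mode) -> float:
    """Energy per peak-rate pixel; mW / (Mpixels/s) equals nJ per pixel."""
    return mode.power_mw / pixel_rate_mpix(mode)


def pick_mode(required_mpix: float) -> Optional[Mode]:
    """Return the lowest-power mode meeting the fill rate, or None."""
    for mode in MODES:
        if pixel_rate_mpix(mode) >= required_mpix:
            return mode
    return None


# Hypothetical workload: 320x240 pixels, depth complexity 3, 30 frames/s.
required = 320 * 240 * 3 * 30 / 1e6
chosen = pick_mode(required)
print(f"required fill rate: {required:.3f} Mpixels/s -> {chosen.name}")
for m in MODES:
    print(f"{m.name:6s} {pixel_rate_mpix(m):5.1f} Mpixels/s "
          f"{energy_per_pixel_nj(m):.2f} nJ/pixel")
```

Under the linear-scaling assumption, the energy per peak-rate pixel comes out slightly lower in FAST mode than in SLOW mode, so a policy like this lowers average power rather than energy per pixel. Real workloads rarely sustain the peak fill rate, so the figures are indicative at best.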

### IV. Conclusions

A graphics LSI is implemented in a typical 0.16µm pure DRAM process for low-cost, low-power mobile multimedia applications. The LSI contains a 32bit RISC processor with an enhanced MAC, a 3D rendering engine, a programmable power optimizer, and 29Mb of embedded DRAM. The chip consumes 210mW and occupies a 121mm² die. 3D graphics images are successfully demonstrated by the fabricated chip on the PDA system boards.

### References

- [1] Khronos Group, "Bring 3D Gaming to Cell Phones," Game Developers Conference, 2003.
- [2] Ramchan Woo, et al., "A 210mW Graphics LSI implementing Full 3D Pipeline with 264Mtexels/s Texturing for Mobile Multimedia Applications," ISSCC Digest of Technical Papers, pp. 44-45, 2003.
- [3] Ramchan Woo, et al., "A Low-Power and High-Performance 2D/3D Graphics Accelerator for Mobile Multimedia Applications," *Accepted for Presentation*, Hot Chips, 2003.
- [4] Ramchan Woo, et al., "A Low-Power 3D Rendering Engine with Two Texture Units and 29Mb Embedded DRAM for 3G Multimedia Terminals," *Accepted for Presentation*, ESSCIRC, 2003.



Fig. 1: Block diagram of Graphics LSI



Fig. 2: Chip Microphotograph

Table 1: Chip Characteristics

| Item | Specification |
|------|---------------|
| Process Technology | 0.16µm CMOS DRAM with 1-W 3-Al |
| Power Supply | 2.0V (DRAM Core), 2.5V (Logic), 3.3V (I/O) |
| Operating Frequency<br>(RISC, BEQ / 3DRE, DRAM) | FAST : 132MHz / 33MHz<br>NORMAL : 66MHz / 16.5MHz<br>SLOW : 33MHz / 8.25MHz |
| Power Consumption | < 210mW |
| Transistor Counts | 1M Logic<br>29Mbit DRAM<br>72kbit SRAM (9KByte) |
| Die Size | 11mm x 11mm |
| Package | 240pin PGA |
| Target<br>Applications | Realtime 2D/3D Graphics Pipeline<br>MPEG-4 SP@L1 Decoding<br>MP3 Audio Decoding |
| 3D Rendering<br>Performance | 66Mpixels/s, 264Mtexels/s<br>Triangle Setup Engine<br>Perspective-Correct Bilinear MIPMAP Texturing<br>Gouraud Shading, Alpha Blending, Texture Blending<br>Embedded 5Mb Double Depth/Frame Buffer<br>Embedded 24Mb Texture Memory<br>Antialiasing, Motion Blur, Fog, Special Effects |



Fig. 3: Measured Waveform



Fig. 4: PDA System Prototype I (REMY-I)



Fig. 5: PDA System Prototype II (REMY-II)