ISSS'99 ABSTRACTS
Chair: N. Dutt
-
Design of a Set-Top Box System on a Chip [p. 2]
- E. Foster
This presentation will review system-level issues associated
with integrating the major blocks of a Set-Top Box onto
a single die. In addition to the challenges of merging several
powerful functions into a single chip, the goal of integration
is to yield a composite design that is not only more cost effective
but also provides more function than the sum of discrete
parts. This is accomplished through consolidated and
shared memory, improved system bandwidth and efficiency,
and additional inter-macro signals to facilitate improved
communication.
-
On the Rapid Prototyping and Design of a Wireless Communication System on a
Chip [p. 3]
- B. Kelley
The evolutionary convergence of computing, integrated
circuit technology, and advances in wireless communications
has led to an explosive growth of personal communication
devices and services (PCS). In fact, the dramatic
"Moore's Law" shrinkage of IC devices, itself, has lead to an
unprecedented ability to place increasingly complex systems
on a chip (SoC).
In a wireless communication environment, the integration
task is made more difficult by the need to integrate RF, mixed
signal, and digital systems. Furthermore, the digital system
design task generally requires a mapping of heterogeneous
stacks of software processes onto a similarly diverse collection
of digital signal processors, microprocessors and
application-specific integrated circuits.
In this presentation, we give an overview of a modern
wireless communication device and describe advanced system-level
design methodologies utilized for rapid prototyping
and design of current and next generation systems.
-
Embedded Java: Techniques and Applications [p. 6]
- B. Barry, J. Duimovich
Java is an ideal language for developing embedded applications.
However, most Java implementations and tools
were designed for workstations and have limitations due to
that heritage. Special tools are required to support deployment
and effect better integration with target hardware.
This talk will be in two parts. The first part will provide
an overview of pervasive computing with a special focus
on embedded Java, and describe typical applications drawn
from several different market segments.
The second half will delve more deeply into the architecture
of an embedded Java runtime and discuss technical
issues relating to dynamic compilation, optimization, and
deployment.
Organizers/Moderators: D. Gajski, R. Bergamaschi
Panelists:
M. Franz, G. Hellestrand, A. Horak, J. Kunkel, W. Lee, G. Martin, K. Vissers
-
Panel Statement [p. 8]
System level design has brought together a number of
formidable challenges, such as methodology, software and
hardware design and design automation, to name a few.
More than ever, the successful design of a system requires
all these challenges to be addressed - by both the designers
and the design automation tools.
Designers, better than anyone else, know what the problems
are. Design automation companies claim to know how
to solve them and have the products to prove it. Is this really
true? Are the design automation tools really solving
the hard problems, or skimming over the real challenges?
This panel addresses exactly that by confronting the views
of distinguished designers and tools developers.
The panelists belong to two teams. The designer team will
present the main problems in doing system design including
verification, IP use, integration and synthesis among others,
and try to show that many of the real problems are not being
addressed by current tools. The tools team will explain how
the tools are indeed tackling the real problems and how the
designers can make the best use of them.
The attendees can expect a very interesting, informative
and technical debate. At the end, the audience will be the
judge and a verdict will be passed on what the real problems
are, which ones can be solved with existing tools, and what
needs to be done in the future to address the system design
challenges.
-
Microelectromechanical Systems (MEMS): Miniaturization Beyond Microelectronics
[p. 10]
- N. Maluf
The concept of a "system-on-a-chip" immediately brings to mind integrated
microelectronic circuits. It took the semiconductor industry over 30 years to
reach this level of integration, putting many millions of transistors on the
same chip to perform extremely complex digital functions. Throughout this
"system-level" revolution, the core element remained the MOS transistor, and
the interface between the electronics and society has changed little. The same
microfabrication methods of the electronics industry are now being adapted to
design components and systems that integrate multiple physical functions
including mechanics, fluid flow, optics, biology, etc., on the same substrate.
The net result is systems for complex non-digital applications, and
sophisticated interfacing with the "real world." This technology is only
beginning to emerge and the level of integration is in its early stages, yet
the enabled functionality has already been phenomenal. For example, the
integration of a small accelerometer and gyroscope with electronics is now at
the center of modern vehicle stability systems. In another example, the DMD™
display contains nearly one million little mirrors to control the intensity of
individual pixels. In biology and biochemistry, on-going efforts aim at
miniaturizing genetic analysis and diagnostic systems. In this presentation,
we will review the basic fundamentals of microelectromechanical systems. The
presentation will also include a brief survey of existing microsystems and
provide a peek into the future.
-
Middleware Techniques and Optimizations for Real-Time, Embedded Systems [p. 12]
- D. Schmidt
Due to constraints on footprint, performance, and
weight/power consumption, real-time, embedded system software
development has historically lagged mainstream software
development methodologies. As a result, real-time, embedded
software systems are costly to evolve and maintain.
Moreover, they are often so specialized that they cannot adapt
readily to meet new market opportunities or technology innovations.
To further exacerbate matters, a growing class of real-time,
embedded systems require end-to-end support for various
quality of service (QoS) aspects, such as bandwidth, latency,
jitter, and dependability. These applications include
telecommunication systems (e.g., call processing and switching),
avionics control systems (e.g., operational flight programs
for fighter aircraft), and multimedia (e.g., Internet
streaming video and wireless PDAs). In addition to requiring
support for stringent QoS requirements, these systems are often
targeted at highly competitive markets, where deregulation
and global competition are motivating the need for increased
software productivity and quality.
Requirements for increased software productivity and quality
motivate the use of Distributed Object Computing (DOC)
middleware [1]. Middleware resides between client and server
applications and services in complex software systems. The
goal of middleware is to integrate reusable software components
to decrease the cycle-time and effort required to develop
high-quality real-time and embedded applications and
services.
Chair: P. Chou
-
Event-Driven Power Management of Portable Systems [p. 18]
- T. Simunic, G. De Micheli, L. Benini
The policy optimization problem for dynamic power
management has received considerable attention in the recent
past. We formulate policy optimization as a constrained
optimization problem on continuous-time Semi-Markov
decision processes (SMDP). SMDPs generalize the
stochastic optimization approach based on discrete-time
Markov decision processes (DTMDP) presented in the earlier
work by relaxing two limiting assumptions. In SMDPs,
decisions are made at each event occurrence instead of at
each discrete time interval as in DTMDP, thus saving power
and giving higher performance. In addition, SMDPs can
have general inter-state transition time distributions, allowing
for greater generality and accuracy in modeling real-life
systems where transition times between power states are
not geometrically distributed.
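To illustrate the event-driven flavor of such policies, the sketch below chooses a power state at each event occurrence rather than at fixed time intervals. The state names, break-even times, and threshold rule are hypothetical and do not reproduce the paper's SMDP policy optimization.

```python
# Minimal sketch of an event-driven power-management policy (illustrative only;
# the SMDP policy optimization described in the abstract is not reproduced).
# States, break-even times, and the decision rule are hypothetical.

BREAK_EVEN = {"sleep": 0.5, "standby": 0.05}  # seconds of idleness that justify each state

def on_event(event, predicted_idle):
    """Called at each event occurrence (request arrival, service completion, ...).

    Returns the power state to enter until the next event.
    """
    if event == "request_arrival":
        return "active"                      # must serve the request
    # Device just went idle: choose the deepest state whose wake-up overhead
    # is amortized by the predicted idle interval.
    if predicted_idle >= BREAK_EVEN["sleep"]:
        return "sleep"
    if predicted_idle >= BREAK_EVEN["standby"]:
        return "standby"
    return "idle"

if __name__ == "__main__":
    print(on_event("service_done", predicted_idle=0.8))   # -> sleep
    print(on_event("service_done", predicted_idle=0.1))   # -> standby
    print(on_event("request_arrival", predicted_idle=0))  # -> active
```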
-
Real-Time Task Scheduling for a Variable Voltage Processor [p. 24]
- T. Okuma, T. Ishihara, H. Yasuura
This paper presents a real-time task scheduling technique
with a variable voltage processor which can vary
its supply voltage dynamically. Using such a processor,
running tasks with a low supply voltage leads to drastic
power reduction. However, reducing the supply voltage
may violate real-time constraints. In this paper, we propose
a scheduling technique which simultaneously assigns both
CPU time and a supply voltage to each task so as to minimize
total energy consumption while satisfying all real-time
constraints. Experimental results demonstrate the effectiveness
of the proposed technique.
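A minimal sketch of the energy/timing tradeoff that such scheduling exploits, assuming a simplified normalized model in which energy per cycle scales with V^2 and clock frequency scales roughly linearly with V; the numbers are hypothetical and the paper's actual assignment algorithm is not shown.

```python
# Sketch of the energy/timing tradeoff behind variable-voltage scheduling.
# Simplified model only (E per cycle ~ V^2, delay per cycle ~ 1/V); numbers
# are hypothetical.

def energy(cycles, v):
    return cycles * v ** 2                 # normalized switching energy

def exec_time(cycles, v, f_max=1.0, v_max=1.0):
    return cycles / (f_max * v / v_max)    # clock scales ~linearly with voltage

def lowest_feasible_voltage(cycles, deadline, v_max=1.0):
    """Lowest (normalized) supply voltage that still meets the deadline."""
    v = cycles / deadline                  # from exec_time(cycles, v) <= deadline
    return min(max(v, 0.0), v_max)

if __name__ == "__main__":
    cycles, deadline = 6e6, 10e6           # task needs 6M cycles, 10M time units available
    v = lowest_feasible_voltage(cycles, deadline)
    print(f"execution time {exec_time(cycles, v):.3g} vs deadline {deadline:.3g}")
    print(f"run at V={v:.2f}: energy {energy(cycles, v):.2e} "
          f"vs {energy(cycles, 1.0):.2e} at full voltage")
```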
-
Path-Based Edge Activation for Dynamic Run-Time Scheduling [p. 30]
- V. Mooney III
We present a tool that performs real-time analysis
and dynamic execution of software tasks in a mixed
hardware-software system with a custom run-time scheduler.
The tasks in hardware and software have control-flow constraints
(precedence and alternative execution),
resource constraints, relative timing constraints, and a rate constraint.
The custom run-time scheduler dynamically executes tasks in different
orders, based on the conditional execution path, such that a hard
real-time rate constraint can be predictably met.
We describe the task modelling, run-time scheduler
implementation, and real-time analysis. We introduce
the concept of path-based edge activation utilizing conditional
edges. We show how our approach fits into an
overall tool flow and target architecture. Finally, we conclude
with a sample application of the system to a design example.
Chair: Loganath Ramachandran
-
Optimized System Synthesis of Complex RT Level Building Blocks from
Multirate Dataflow Graphs [p. 38]
- J. Horstmannshoff, H. Meyr
In order to cope with the ever increasing complexity of
today's application specific integrated circuits, a building
block based design methodology is established. The system
is composed of high level building blocks of which some
are reused from previous designs while others might have
been created by behavioral synthesis. In data flow oriented
designs, these blocks usually have complex non-matching
interface properties, making it necessary to generate additional
interfacing and controlling hardware to integrate
them into an operable system.
In this paper, RTL-HDL code generation from synchronous
data flow representations is introduced that efficiently
automates the generation of the required additional
hardware. While existing code generation approaches impose
strong limitations on the building block interfacing
properties, our method enables the integration of
components that access their ports periodically with arbitrary
patterns. In order to reduce interface register cost,
a minimum-area retiming approach, which is known to have
polynomial time complexity, is taken to determine optimum
building block activation times. The code generation
methodology is compared to an existing approach using a
simple case study.
-
RTGEN: An Algorithm for Automatic Generation of Reservation Tables from
Architectural Descriptions [p. 44]
- P. Grun, A. Halambi, N. Dutt, A. Nicolau
Reservation Tables (RTs) have long been used to detect
conflicts between operations that simultaneously access the
same architectural resource. Traditionally, these RTs have
been specified explicitly by the designer. However, the increasing
complexity of modern processors makes the manual specification
of RTs cumbersome and error-prone. Furthermore,
manual specification of such conflict information is infeasible
for supporting rapid architectural exploration. In this paper
we present an algorithm to automatically generate RTs from
a high-level processor description, with the goal of avoiding
manual specification of RTs, resulting in more concise architectural
specifications and also supporting faster turn-around
time in Design Space Exploration. We demonstrate the utility
of our approach on a set of experiments using the TI C6201
VLIW DSP and DLX processor architectures, and a suite of
multimedia and scientific applications.
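As a rough illustration of how reservation tables expose structural hazards (the RT generation algorithm itself is not shown), the sketch below checks whether two operations issued a given number of cycles apart reserve the same resource in the same cycle; the resource names and usage patterns are hypothetical.

```python
# Illustrative use of reservation tables for conflict detection.
# RT: maps each cycle offset (relative to issue) to the resources reserved then.
# Resource names and usage patterns are hypothetical.
RT_MUL = {0: {"decode"}, 1: {"mul_unit"}, 2: {"mul_unit"}, 3: {"writeback"}}
RT_ADD = {0: {"decode"}, 1: {"alu"},      2: {"writeback"}}

def conflicts(rt_a, rt_b, issue_gap):
    """True if op B issued `issue_gap` cycles after op A collides on any resource."""
    for cyc_a, res_a in rt_a.items():
        res_b = rt_b.get(cyc_a - issue_gap, set())
        if res_a & res_b:
            return True
    return False

if __name__ == "__main__":
    print(conflicts(RT_MUL, RT_MUL, issue_gap=1))  # True: mul_unit busy in the same cycle
    print(conflicts(RT_MUL, RT_ADD, issue_gap=1))  # True: both hit writeback together
```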
-
Pre-fetching for Improved Core Interfacing [p. 51]
- R. Lysecky, F. Vahid, R. Patel, T. Givargis
Reuse of cores can reduce design time for systems-on-a-chip.
Such reuse is dependent on being able to easily interface a core
to any bus. To enable such interfacing, many propose
separating a core's interface from its internals. However, this
separation can lead to a performance penalty when reading a core's
internal registers. We introduce pre-fetching, which is
analogous to caching, as a technique to reduce or eliminate
this performance penalty, involving a tradeoff with power and
size. We describe the pre-fetching technique, classify different
types of registers, describe our initial pre-fetching
architectures and heuristics for certain classes of registers,
and highlight experiments demonstrating the performance
improvements and size/power tradeoffs.
Keywords:
Cores, system-on-a-chip, interfacing, on-chip bus, intellectual
property.
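A minimal sketch of the pre-fetching idea, assuming a hypothetical bus wrapper that keeps shadow copies of selected core registers so that a bus read avoids the slow internal access; the class and register names are invented for illustration and do not reflect the paper's architectures.

```python
# Sketch of register pre-fetching at a core's bus interface: a wrapper keeps a
# shadow copy so a bus read is served immediately, at the cost of extra storage
# and update traffic. Names are hypothetical.

class CoreWrapper:
    def __init__(self, core):
        self.core = core
        self.shadow = {}                     # pre-fetched register values

    def prefetch(self, reg):
        """Run in the background (e.g., when the internal bus is idle)."""
        self.shadow[reg] = self.core.slow_read(reg)

    def bus_read(self, reg):
        """Fast read if the value was pre-fetched, slow path otherwise."""
        if reg in self.shadow:
            return self.shadow[reg]          # fast: no internal access needed
        return self.core.slow_read(reg)      # penalty: go through core internals

class DummyCore:
    def __init__(self):
        self.regs = {"status": 0x1, "data": 0xAB}
    def slow_read(self, reg):                # stands in for a multi-cycle access
        return self.regs[reg]

if __name__ == "__main__":
    w = CoreWrapper(DummyCore())
    w.prefetch("status")
    print(hex(w.bus_read("status")))  # served from the shadow copy
    print(hex(w.bus_read("data")))    # not pre-fetched: pays the penalty
```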
-
Compressed Code Execution on DSP Architectures [p. 56]
- P. Centoducatte, R. Pannain, G. Araujo
Decreasing the program size has become an important
goal in the design of embedded systems targeted at mass production.
This problem has led to a number of efforts aimed
at designing processors with shorter instruction formats
(e.g. ARM Thumb and MIPS16), or that can execute compressed
code (e.g. IBM CodePack PowerPC). Much of this
work has been directed towards RISC architectures though.
This paper proposes a solution to the problem of executing
compressed code on embedded DSPs. The experimental
results reveal an average compression ratio of 75% for typical
DSP programs running on the TMS320C25 processor.
This number includes the size of the decompression engine.
Decompression is performed by a state machine that translates
codewords into instruction sequences during program
execution. The decompression engine is synthesized using
the AMS standard cell library and a 0.6 µm, 5 V technology.
Gate level simulation of the decompression engine reveals
minimum operation frequencies of 150MHz.
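A rough sketch of dictionary-based decompression, in which codewords fetched from program memory are expanded into instruction sequences at run time; the dictionary contents and mnemonics are hypothetical and do not reflect the paper's actual encoding.

```python
# Sketch of dictionary-based code decompression: stored codewords are expanded
# into instruction sequences at run time. Dictionary contents are hypothetical.

DICTIONARY = {
    0x0: ["MOV  A, #0"],
    0x1: ["LT   *AR0+", "MPY  *AR1+", "APAC"],   # e.g., a common MAC idiom
    0x2: ["B    LOOP"],
}

def decompress(codewords):
    """Expand a compressed stream into the instruction sequence to execute."""
    program = []
    for cw in codewords:
        program.extend(DICTIONARY[cw])
    return program

if __name__ == "__main__":
    compressed = [0x0, 0x1, 0x1, 0x2]            # 4 codewords ...
    expanded = decompress(compressed)            # ... stand for 8 instructions
    print(f"{len(compressed)} codewords -> {len(expanded)} instructions")
    print("\n".join(expanded))
```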
Chair: Walid Najjar
-
Loop Scheduling and Partitions for Hiding Memory Latencies [p. 64]
- F. Chen, E. Sha
Partition Scheduling with Prefetching (PSP) is a memory latency
hiding technique which combines the loop pipelining
technique with data prefetching. In PSP, the iteration space
is first divided into regular partitions. Then two parts of
the schedule, the ALU part and the memory part, are produced
and balanced to produce an overall schedule with high
throughput. These two parts are executed simultaneously,
and hence the remote memory latency is overlapped. We
study the optimal partition shape and size so that a well balanced
overall schedule can be obtained. Experiments on
DSP benchmarks show that the proposed methodology consistently
produces optimal or near optimal solutions.
-
Loop Alignment for Memory Accesses Optimization [p. 71]
- A. Fraboulet, G. Huard, A. Mignotte
Portable and embedded systems today support increasingly
complex applications such as multimedia. These applications
and submicron technologies have made the power
consumption criterion crucial. We propose new techniques
for optimizing the behavioral description
of an integrated system before the hardware/software partitioning
(codesign). These transformations are performed
on the "for" loops that constitute the array-handling
core of multimedia code. We present in this paper
two new (polynomial) techniques for minimizing memory
accesses in loop nests by data temporal locality optimization.
-
A Buffer Merging Technique for Reducing Memory Requirements of
Synchronous Dataflow Specifications [p. 78]
- P. Murthy, S. Bhattacharyya
Synchronous Dataflow, a subset of dataflow, has proven to
be a good match for specifying DSP programs. Because of
the limited amount of memory in embedded DSPs, a key
problem during software synthesis from SDF specifications
is the minimization of the memory used by the target
code. We develop a powerful formal technique called
buffer merging that attempts to overlay buffers in the SDF
graph systematically in order to drastically reduce data
buffering requirements. We give a polynomial-time algorithm
based on this formalism, and show that code synthesized
using this technique results in more than a 60%
reduction of the buffering memory consumption compared
to existing techniques.
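The sketch below only hints at the intuition behind overlaying: two buffers whose live ranges are disjoint can occupy the same memory. The sizes and live ranges are hypothetical, and the paper's merging formalism and algorithm are not reproduced.

```python
# Illustrative sketch of buffer overlaying in software synthesis: buffers whose
# live ranges do not overlap may share the same memory. Data are hypothetical.

def can_share(buf_a, buf_b):
    """Two buffers may be overlaid if their live ranges are disjoint."""
    (start_a, end_a), (start_b, end_b) = buf_a["live"], buf_b["live"]
    return end_a <= start_b or end_b <= start_a

if __name__ == "__main__":
    a = {"size": 64, "live": (0, 10)}    # live during schedule steps 0..10
    b = {"size": 48, "live": (12, 20)}   # live during schedule steps 12..20
    separate = a["size"] + b["size"]
    merged = max(a["size"], b["size"]) if can_share(a, b) else separate
    print(f"separate buffers: {separate} words, after overlaying: {merged} words")
```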
-
Exploration and Synthesis of Dynamic Data Sets in Telecom Network Applications
[p. 85]
- C. Ykman-Couvreur, J. Lambrecht, D. Verkest, F. Catthoor, H. De Man
We present a new exploration and optimization
method to select customized implementations for dynamic
data sets, as encountered in telecom network,
database and multimedia applications. Our method fits
the context of embedded system synthesis for such applications,
and makes it possible to further raise the abstraction
level of the initial specification, where dynamic data sets
can be specified without low-level details. Our method is
suited for hardware and software implementations. In
this paper, it mainly aims at minimizing the memory
power consumption, although it can also be driven by
other cost functions such as area or performance. Compared
with existing methods, it can save up to 2/3 of
the memory power consumption and 3/4 of the memory
area.
Chair: Lev Markov
-
A Graph Theoretic Approach for Design and Synthesis of Multiplierless FIR
Filters [p. 94]
- K. Muhammad, K. Roy
We present a novel approach which can be used to
obtain multiplierless implementations of finite impulse response
(FIR) digital filters. The main idea is to reorder filter coefficients
such that an implementation based on differential coefficients
requires only a few adders. We represent this problem
using a graph in which vertices represent the coefficients and
edges represent the resources required when the differential
coefficient corresponding to the edge is used in a computation.
We also present a graph model for an implementation based on
second-order coefficient differences. Finding the optimal solution to
the coefficient reordering problem amounts to the well-known problem
of finding the Hamiltonian path of smallest weight in this graph.
We use two approaches to find the smallest-weight Hamiltonian
cycle: a greedy approach and the heuristic algorithm
proposed by Lin and Kernighan. The power and potential of
this approach is demonstrated by presenting results for large
filters (with lengths of over 300) which show that, in general, for
16-bit coefficients, the total number of adders required per
coefficient is less than 2. Hence, high performance and/or
low power filters can be designed and synthesized using the
proposed approach.
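As a rough illustration of the reordering idea (not the paper's exact method), the sketch below builds a nearest-neighbor Hamiltonian path using the number of nonzero bits in the quantized coefficient difference as a crude proxy for adder cost; the coefficient values are hypothetical.

```python
# Sketch of coefficient reordering for differential-coefficient FIR filters:
# order coefficients so consecutive differences are cheap in shift-and-add terms.
# The cost metric (nonzero bits of the quantized difference) is a simplified
# proxy for adder count; coefficient values are hypothetical.

def adder_cost(c1, c2, bits=16):
    """Proxy cost: nonzero bits in the fixed-point difference."""
    diff = abs(int(round((c1 - c2) * (1 << (bits - 1)))))
    return bin(diff).count("1")

def greedy_order(coeffs):
    """Nearest-neighbor Hamiltonian path over the coefficient graph."""
    remaining = list(coeffs)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda c: adder_cost(order[-1], c))
        remaining.remove(nxt)
        order.append(nxt)
    return order

if __name__ == "__main__":
    coeffs = [0.125, 0.126953125, 0.5, 0.25, 0.250030517578125, 0.0625]
    order = greedy_order(coeffs)
    total = sum(adder_cost(a, b) for a, b in zip(order, order[1:]))
    print(order)
    print("total adders (proxy):", total)
```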
-
Efficient Scheduling of DSP Code on Processors with Distributed Register Files
[p. 100]
- B. Mesman, C. Alba Pinto, K. van Eijk
Code generation methods for digital signal processors
are increasingly hampered by the combination of tight timing
constraints imposed by the algorithms and the limited
capacity of the available register files. Traditional methods
that schedule spill code to satisfy storage capacity have
difficulty satisfying the timing constraints. The method presented
in this paper analyses the combination of limited register
file capacity, resource- and timing constraints during
scheduling. Value lifetimes are serialized until all capacity
constraints are guaranteed to be satisfied after scheduling.
Experiments in the FACTS environment show that we efficiently
obtain high quality instruction schedules for inner-most
loops of DSP algorithms.
-
Automatic Architectural Synthesis of VLIW and EPIC Processors [p. 107]
- S. Aditya, B. Ramakrishna Rau, V. Kathail
This paper describes a mechanism for automatic design
and synthesis of very long instruction word (VLIW), and
its generalization, explicitly parallel instruction computing
(EPIC) processor architectures starting from an abstract
specification of their desired functionality. The architecture design process makes concrete decisions regarding the
number and types of functional units, number of read/write
ports on register files, the datapath interconnect, the instruction
format, its decoding hardware, and the instruction
unit datapath. The processor design is then automatically
synthesized into a detailed RTL-level structural model in VHDL, along with an estimate of its area. The system also
generates the corresponding detailed machine description
and instruction format description that can be used to re-target
a compiler and an assembler, respectively. All this is part of an overall design system, called Program-In-Chip-Out
(PICO), which has the ability to perform automatic exploration
of the architectural design space while customizing
the architecture to a given application and making intelligent,
quantitative, cost-performance tradeoffs.
-
Bit-Width Selection for Data-Path Implementations [p. 114]
- C. Carreras, J. López, O. Nieto-Taladriz
Specifications of data computations may not necessarily
describe the ranges of the intermediate results that can
be generated. However, such information is critical to determine
the bit-widths of the resources required for a data-path
implementation. In this paper, we present a novel approach
based on interval computations that provides not
only guaranteed range estimates that take into account dependencies
between variables, but also estimates of their probability
density functions that can be used when some truncation
must be performed due to constraints in the specification.
Results show that interval-based estimates are obtained
in reasonable times and are more accurate than those
provided by independent range computation, thus leading
to substantial reductions in area and latency of the corresponding
data-path implementation.
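A minimal sketch of plain interval propagation through a toy data path; the dependency handling and probability-density estimation that distinguish the paper's approach are not reproduced, and the declared input ranges are hypothetical.

```python
# Sketch of interval-based range propagation for bit-width selection.
# Only basic interval arithmetic is shown; the toy data path is hypothetical.
import math

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

def bits_needed(iv):
    """Integer bits for a signed value known to lie in the interval iv."""
    m = max(abs(iv[0]), abs(iv[1]))
    return 1 + math.ceil(math.log2(m + 1))   # sign bit + magnitude bits

if __name__ == "__main__":
    x, y = (-100, 100), (0, 50)              # declared input ranges
    t = mul(x, y)                            # intermediate: x*y in [-5000, 5000]
    z = add(t, (-200, 200))                  # result: x*y + k
    for name, iv in [("t", t), ("z", z)]:
        print(f"{name}: range {iv}, needs {bits_needed(iv)} bits")
```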
Chair: Giovanni De Micheli
-
Catalyst: A DSIP Design Flow Development in Industry [p. 122]
- W. De Rammelaere, K. Eckert, E. Hilkens, T. Lawell, R. McGarity,
P. Le Moenner, F. Steininger
The Motorola System on Chip Design Technologies
(SoCDT) team aims at providing a system design
environment for its customers. The Toulouse branch
concentrates on design efforts incorporating DSP
functionality. This is referred to as the Catalyst
methodology. We have found that in current systems the
software development cycle is very often longer than the
silicon development cycle. To ease the software burden, we have
changed the silicon architecture and its flow to permit the
DSP software to be written in the C language instead of
assembler code, as is normally done. The resulting
architecture is domain specific; it is smaller, has a reduced
design cycle and is simpler to implement because it is tuned
to the application software we are providing. This paper
describes the methodology we are developing to create
domain-specific architectures, shows one example
architecture, and discusses aspects that are critical for industry
acceptance.
-
System Synthesis of Synchronous Multimedia Applications [p. 128]
- G. Qu, M. Mesarina, M. Potkonjak
Modern system design is being increasingly driven by
applications such as multimedia and wireless sensing and
communications, which all have intrinsic quality of service
(QoS) requirements, such as throughput, error-rate, and resolution.
One of the most crucial QoS guarantees that the
system has to provide is the timing constraints among the
interacting media (synchronization) and within each media
(latency). We have developed the first framework for
system design with timing QoS guarantees: latency and
synchronization. In particular, we address how to design
system-on-chip with minimal silicon area to meet timing
constraints. We propose a two-phase design methodology.
In the first phase, we select an architecture that serves
the needs of synchronous, low-latency applications well. In
the second phase, for a given processor configuration, we
use our new scheduler in such a way that storage requirements
are minimized. We have developed scheduling algorithms
that solve the problem optimally for a priori specified
applications. The algorithms have been implemented
and their effectiveness demonstrated on a set of simulated
MPEG streams from popular movies.
-
A Framework for Scheduling and Context Allocation in Reconfigurable Computing
[p. 134]
- R. Maestre, M. Fernandez, R. Hermida, N. Bagherzadeh
Reconfigurable computing is emerging as a viable design
alternative to implement a wide range of computationally
intensive applications. The scheduling problem becomes a
critical issue in achieving the high performance that
these kinds of applications demand. This paper describes
the different aspects regarding the scheduling problem in a
reconfigurable architecture. We also propose a general
strategy for performing, at compilation time, a
scheduling that includes all possible optimizations
regarding context (configuration) and data transfers. In
particular, we focus especially on the methodology and
mechanisms to solve the context scheduling problem. Some
experimental results are presented to validate our
assumptions. Finally, the problem of data transfers is only
formulated and will be addressed in future work.