CODES'01 ABSTRACTS
"s" indicates short paper
-
The Usage of Stochastic Processes in Embedded System Specifications
[p. 5]
-
Axel Jantsch, Ingo Sander, Wenbiao Wu
We review the use of nondeterminism and identify two different purposes.
The descriptive purpose handles uncertainties in the behaviour of existing
entities. The constraining purpose is used in specifications to constrain
implementations. For the specification of embedded systems we suggest
a stochastic process sigma instead of nondeterminism. It serves
mostly the descriptive purpose but can also be used to constrain the system.
We carefully distinguish how the different design activities (simulation,
synthesis and verification) interpret these concepts.
-
Modeling and Evaluation of Hardware/Software Designs
[p. 11]
-
Neal K. Tibrewala, JoAnn M. Paul, Donald E. Thomas
We introduce the foundation of a system modeling environment targeted at
capturing the anticipated interactions of hardware and
software behaviors - not just their co-execution. Key to our
approach is the separation of external and internal design
testbenches. We use a frequency interleaved scheduling foundation
ideally suited to our approach because it allows unrestricted
hardware and software modeling, a mix of untimed and timed
software, and a layered approach using software schedulers and
protocols to resolve software to resource time budgets. We illustrate
our approach by discussing how architectural corner cases that arise
due to interacting hardware and software behaviors can be a
meaningful digital modeling concept. In addition to characterizing
the response of a system when viewed as a black box, we characterize the
response of the design to anticipated design changes. We include examples and
simulation results.
Keywords: Hardware/Software Codesign, Computer System Modeling and
Simulation, Digital System Design
-
SystemC: A Homogenous Environment to Test Embedded Systems
[p. 17]
-
Alessandro Fin, Franco Fummi, Maurizio Martignano, Mirko Signoretto
The SystemC language is becoming a new standard in
the EDA field and many designers are starting to use
it to model complex systems. SystemC has been mainly
adopted to define abstract models of hardware/software
components, since they can be easily integrated for rapid
prototyping. However, it can also be used to describe
modules at a higher level of detail, e.g., RT-level hardware
descriptions and assembly software modules. Thus, it
would be possible to imagine a SystemC-based design
flow, where the system description is translated from one
abstraction level to the next by always using
SystemC representations. The adoption of a SystemC-based
design flow would be particularly efficient for testing purposes,
as shown in this paper. In fact, it allows the definition
of a homogeneous testing procedure, applicable to all design
phases, based on the same error model and on the same test
generation strategy. Moreover, test patterns are applied
indifferently to hardware and software components, thus making the
proposed testing methodology particularly suitable for embedded
systems. Test patterns are generated on the SystemC description
modeling the system at one abstraction level; they are then used
to validate the translation of the system to a lower abstraction
level. New test patterns are then generated for the lower abstraction
level to improve the quality of the test set and this process is
iterated for each translation (synthesis) step.
Keywords: functional testing, C++ models, embedded systems verification
-
"s" Embedded UML: a merger of real-time UML and co-design
[p. 23]
-
Grant Martin, Luciano Lavagno, Jean Louis-Guerin
In this paper, we present a proposal for a UML profile called
"Embedded UML". Embedded UML represents a synthesis of
various ideas in the real-time UML community, and concepts
drawn from the Hardware-Software co-design field. Embedded
UML first selects from among the competing real-time UML
proposals, the set of ideas which best allow specification and
analysis of mixed HW-SW systems. It then adds the necessary
concept of underlying deployment architecture that UML
currently lacks in complete form, using the notion of an
embedded HW-SW 'platform'. It supplements this with the
concept of a 'mapping', which is a platform-dependent
refinement mechanism that allows efficient generation of an
optimised implementation of the executable specification in both
HW and SW. Finally, it provides an approach which supports
the development of automated analysis, simulation, synthesis and
code generation tool capabilities which can be provided for
design usage even while the embedded UML standardisation
process takes place.
Keywords: UML, embedded systems, real-time systems, HW-SW co-design,
function-architecture co-design, platforms
-
Hardware/Software Partitioning of embedded system in OCAPI-xl
[p. 30]
-
G. Vanmeerbeeck, P. Schaumont, S. Vernalde, M. Engels, I. Bolsens
The implementation of embedded networked appliances requires
a mix of processor cores and HW accelerators on a
single chip. When designing such complex and heterogeneous
SoCs, the HW / SW partitioning decision needs to be made
prior to refining the system description. With OCAPI-xl, we
developed a methodology in which the partitioning decision
can be made anywhere in the design flow, even just prior to
doing code-generation for both HW and SW. This is made
possible thanks to a refinable, implementable, architecture
independent system description. The OCAPI-xl model was
used to develop a stand alone, networked camera, with onboard
GIF engine and network layer.
-
HW/SW Partitioning of an Embedded Instruction Memory Decompressor
[p. 36]
-
Shlomo Weiss, Shay Beren
We introduce a new PLA-based decoder architecture for
random-access run-time decompression of compressed instruction
memory in embedded systems. The compression
method employs class-based coding. We show that this
new decoder architecture can be extended to provide high
throughput decompression. The design of the decompressor
is based on the following HW/SW tradeoff: decoding is done
in hardware to provide high throughput, yet the codebook
used for decompression is fully programmable.
Keywords: embedded systems, compressed instruction memory.
-
"s" MAGELLAN: Multiway Hardware-Software Partitioning and Scheduling for Latency
Minimization of Hierarchical Control Dataflow Task Graphs
[p. 42]
-
Karam S. Chatha, Ranga Vemuri
The paper presents MAGELLAN, a heuristic technique for mapping hierarchical
control-dataflow task graph specifications onto heterogeneous architecture
templates. The architecture can consist of multiple hardware and software
processing elements as specified by the user. The objective of the technique
is to minimize the worst case latency of the task graph subject to the
area constraints on the architecture. The technique uses an iterative
approach consisting of a closely linked hardware-software partitioner and
scheduler. Both the partitioner and scheduler operate on the task graph
in a hierarchical top down manner. The technique optimizes deterministic
loop constructs by applying clustering, unrolling and pipelining. The
technique considers speculative execution for conditional constructs. The
number of actual hardware/software implementations of a function in the
task graph is also optimized by the technique. The effectiveness of the
technique is demonstrated by a case study of an image compression algorithm.
-
A Practical Toolbox for System Level Communication Synthesis
[p. 48]
-
Denis Hommais, Frédéric Pétrot, Ivan Augé
This paper presents a practical approach to communication synthesis
for hardware/software systems specified as tasks communicating
through lossless blocking channels. It relies on a limited set of
templates that characterize the way data are exchanged between
tasks realized either in software or in hardware. The templates
are highly portable because their software part is implemented using
the POSIX thread functions, and their hardware part is a
hand-crafted synthesizable module with a System VCI interface.
These Interface Modules allow simple Virtual Component reuse
since they not only implement protocol compatibility through the
use of the System VCI/OCB standard but also system level communications
through semantics widely accepted in the design community.
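As a rough illustration of the software side of such a template, tasks that exchange data through a lossless blocking channel can be sketched with threads and a bounded queue. This is a minimal Python sketch and not the paper's POSIX-thread implementation; the names `producer`, `consumer`, `run_pipeline`, the `SENTINEL` end marker, and the channel capacity of 2 are assumptions made only for this example.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker, not from the paper


def producer(channel, items):
    """Write every item to the channel; put() blocks while the channel is full."""
    for item in items:
        channel.put(item)
    channel.put(SENTINEL)


def consumer(channel, results):
    """Read until the end marker; get() blocks while the channel is empty."""
    while True:
        item = channel.get()
        if item is SENTINEL:
            break
        results.append(item * 2)  # stand-in for the task's real computation


def run_pipeline(items, capacity=2):
    """Run one producer and one consumer task connected by a bounded channel."""
    channel = queue.Queue(maxsize=capacity)  # bounded FIFO: blocking, no data loss
    results = []
    tasks = [
        threading.Thread(target=producer, args=(channel, items)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()
    return results
```

Because both `put` and `get` block, no item is dropped or reordered regardless of how the two tasks are scheduled, which is the property the channel semantics above rely on.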
-
"s" System Canvas: A New Design Environment for Embedded DSP and Telecommunication
Systems
[p. 54]
-
Praveen K. Murthy, Etan G. Cohen, Steve Rowland
We present a new design environment, called System Canvas, targeted
at DSP and telecommunication system designs. Our environment
uses an easy-to-use block-diagram syntax to specify systems
at a very high level of abstraction. The block diagram syntax is
based on formal semantics, and uses a number of different models
of computation including cyclo-static dataflow, dynamic dataflow,
and a discrete-event model. A key feature of our tool is that the
user does not need to have an awareness of which model is being
used; the models can be freely mixed and matched and a simulation
can consist of an arbitrary combination of models. The blocks
are written in 'C' or 'C++' and it is straightforward to write custom
blocks and incorporate them into custom libraries. Other key
features include the ability to control simulations via language-neutral
scripts, and a powerful optimization engine that enables
optimization of the system over arbitrarily specified parameters,
constraints, and cost functions. Fixed-point analysis capability
allows any signal or variable in the system to be set to any type of
number system before the simulation proceeds. The tool is available
on the Windows NT platform and incorporates the modern,
ubiquitous Windows GUI look and feel.
-
Designing Domain-Specific Processors
[p. 61]
-
Marnix Arnold, Henk Corporaal
We present a semi-automated method for the detection and
exploitation of application domain specific instruction set
extensions for embedded (VLIW) processors. It consists of
three steps: the first step detects frequently occurring
operation patterns; in the second step, the patterns are grouped
and implemented in a number of Special Function Units (SFUs); and the
third step incorporates the custom operations into the code generation
process.
Experiments show that the SFUs generated and exploited with our
methodology can result in architectures that perform up to 30% better
than architectures of the same cost without SFUs.
Keywords: Instruction Set Synthesis, Design Space Exploration
-
RS-FDRA: A Register Sensitive Software Pipelining Algorithm for Embedded VLIW
Processors
[p. 67]
-
Cagdas Akturan, Margarida F. Jacome
The paper proposes a novel software-pipelining algorithm,
Register Sensitive Force Directed Retiming Algorithm (RS-FDRA),
suitable for optimizing compilers targeting embedded
VLIW processors. The key difference between RS-FDRA and
previous approaches is that our algorithm can handle code size
constraints along with latency and resource constraints. This
capability enables the exploration of Pareto "optimal" points with
respect to code size and performance. RS-FDRA can also
minimize the increase in "register pressure" typically incurred by
software pipelining. This ability is critical since the need to insert
spill code may result in significant performance degradation.
Extensive experimental results are presented demonstrating that
the extended set of optimization goals and constraints supported
by RS-FDRA enables a thorough compiler-assisted exploration of
trade-offs among performance, code size, and register
requirements, for time critical segments of embedded software
components.
Keywords
Software pipelining, optimizing compilers, embedded systems,
VLIW processors, retiming
-
A Novel Parallel Deadlock Detection Algorithm and Architecture
[p. 73]
-
Pun Hang Shiu, Yudong Tan, Vincent John Mooney III
A novel deadlock detection algorithm and its hardware implementation
are presented in this paper. The hardware deadlock detection algorithm
has a run time complexity of O_hw(min(m,n)), where m and n
are the number of processors and resources, respectively. Previous algorithms
based on a Resource Allocation Graph have O_sw(m x n) run time
complexity for the worst case. We simulate a realistic example in which
the hardware deadlock detection unit is applied, and demonstrate that the
hardware implementation of the novel deadlock detection algorithm reduces
deadlock detection time by 99.5%. Furthermore, in a realistic example,
total execution time is reduced by 68.9%.
Keywords: Deadlock Detection, Parallel Algorithm, Hardware/Software
Codesign, Real-time Operating System.
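For orientation, deadlock detection on a resource allocation matrix can be sketched sequentially in software. The sketch below is a generic matrix-reduction check for single-instance resources, written as an assumption-laden illustration; it is not claimed to be the paper's parallel hardware algorithm, and the encoding (`'r'` for request, `'g'` for grant, `None` for no edge) and the name `has_deadlock` are hypothetical.

```python
def has_deadlock(matrix):
    """Return True if the resource allocation matrix contains a cycle.

    matrix[i][j] describes resource i and process j:
      'g'  -> resource i is granted to process j
      'r'  -> process j requests resource i
      None -> no relation
    Assumes single-instance resources, so a cycle implies deadlock.
    """
    work = [list(row) for row in matrix]  # working copy, input stays untouched
    n_rows = len(work)
    n_cols = len(work[0]) if work else 0

    changed = True
    while changed:
        changed = False
        # A row holding only requests or only grants cannot lie on a cycle.
        for i in range(n_rows):
            kinds = {work[i][j] for j in range(n_cols)} - {None}
            if len(kinds) == 1:
                work[i] = [None] * n_cols
                changed = True
        # The same holds for a column (a process that only waits or only holds).
        for j in range(n_cols):
            kinds = {work[i][j] for i in range(n_rows)} - {None}
            if len(kinds) == 1:
                for i in range(n_rows):
                    work[i][j] = None
                changed = True

    # Any edge that survives the reduction belongs to, or connects, cycles.
    return any(cell is not None for row in work for cell in row)
```

For example, two processes that each hold one resource and request the other leave a matrix that cannot be reduced, so the function reports a deadlock; removing one of the requests lets the reduction empty the matrix.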
-
Towards Effective Embedded Processors in Codesigns: Customizable Partitioned
Caches
[p. 79]
-
Peter Petrov, Alex Orailoglu
This paper explores an application-specific customization technique
for the data cache, one of the foremost area/power consuming and
performance determining microarchitectural features of modern embedded
processors. The automated methodology for customizing
the processor microarchitecture that we propose results in increased
performance, reduced power consumption and improved determinism
of critical system parts while the fixed design ensures processor
standardization. The resulting improvements help to enlarge
the significant role of embedded processors in modern hardware/
software codesign techniques by leading to increased processor
utilization and reduced hardware cost. A novel methodology
for static analysis and a field-reprogrammable implementation of a
customizable cache controller that implements a partitioned cache
structure is proposed. The simulation results show a significant decrease
in miss ratio compared to traditional cache organizations.
Keywords: embedded processors, data cache, reprogrammable customizations
-
Development Cost and Size Estimation Starting from High-Level Specifications
[p. 86]
-
William Fornaciari, Fabio Salice, Umberto Bondi, Edi Magini
This paper addresses the problem of estimating cost and
development effort of a system, starting from its complete or
partial high-level description. In addition, some modifications to
evaluate the cost-effectiveness of reusing VHDL-based designs
are presented. The proposed approach has been formalized using
an approach similar to the COCOMO analysis strategy, enhanced
by a project size prediction methodology based on a VHDL
function point metric. The proposed design size estimation
methodology has been validated through a significant benchmark,
the LEON-1 microprocessor, whose VHDL description is in the
public domain.
Categories for Codes'01 reviewers
System development processes, Applications
Keywords
Concurrent engineering, process management, project size
estimation, design reuse, VHDL
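To make the COCOMO reference concrete, the basic COCOMO effort equation has the form effort = a * size^b. The sketch below uses the published basic COCOMO coefficients with size in thousands of source lines; the paper's own model, its coefficients, and its VHDL function point size metric are not given in the abstract, so this is only an illustration of the general shape of such an estimate. The names `COCOMO_BASIC` and `estimate_effort` are assumptions.

```python
# Basic COCOMO coefficients (a, b) per project mode.
# These are the classic software values, not values from the paper.
COCOMO_BASIC = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def estimate_effort(size_kloc, mode="organic"):
    """Return estimated effort in person-months for a project of the given size.

    size_kloc: project size in thousands of source lines (assumed unit; the
               paper predicts size from a VHDL function point metric instead).
    """
    a, b = COCOMO_BASIC[mode]
    return a * size_kloc ** b
```

Because the exponent b is greater than 1, doubling the size more than doubles the estimated effort, which is why an accurate size prediction matters for this family of models.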
-
Exploring Design Space of Parallel Realizations:MPEG-2 Decoder Case Study
[p. 92]
-
Basant K. Dwivedi, Jan Hoogerbrugge, Paul Stravers, M. Balakrishnan
Many applications lend themselves to parallelism at different levels of
granularity. We first identify the key issues involved in creating
a parallel model of an application. This is done with a view to
estimating performance and exploring the "parallel" design space to select
a suitable design point. The framework presented provides an opportunity
to perform this exploration in both a target-architecture-independent
and a target-architecture-dependent manner. An MPEG-2 decoder model in
YAPI is presented which has more parallelism and improved performance.
This model has further been mapped onto the SpaceCAKE architecture to study its
architectural parameters. Detailed results obtained with YAPI simulation
(target architecture independent) and TSS simulation (after process-processor
binding) on the MPEG-2 decoder application establish the effectiveness of
our approach.
Keywords: MPEG-2 Decoder, YAPI, Parallel realization, Process, Thread,
FIFO
-
"s" Source-Level Execution Time Estimation of C Programs
[p. 98]
-
Carlo Brandolese, William Fornaciari, Fabio Salice, Donatella Sciuto
In this paper a comprehensive methodology for software execution
time estimation is presented. The methodology is supported by
rigorous mathematical models of C statements in terms of elementary
operations. The deterministic contribution is combined with a
statistical term accounting for all those aspects that cannot be
quantified exactly. The methodology has been validated by realizing
a complete prototype toolset, used to carry out the experiments.
-
"s" STARS of MPEG decoder: a case study in worst-case analysis of discrete-event
systems
[p. 104]
-
Felice Balarin
STARS (STatic Analysis of Reactive Systems) is a methodology for worst-case
analysis of discrete systems. Theoretical foundations of STARS have been
laid down [1,2,3], but no implementation has been presented so far. We
introduce an implementation of STARS as an extension of YAPI, a programming
interface used to model signal processing applications as process
networks [7]. We apply STARS to a YAPI model of an MPEG decoder. We show
that worst-case bounds computed by STARS are quite close to simulated values
(within 15%). We also show that additional effort by the designer
required to build STARS models is very small compared to the effort of building
the YAPI simulation model, and that the run times of STARS are negligible
compared to the simulation run times.
KEY WORDS: system verification, worst-case analysis, static analysis
-
"s" Evaluating Register File Size in ASIP Design
[p. 109]
-
Manoj Kumar Jain, Lars Wehmeyer, Stefan Steinke, Peter Marwedel,
M. Balakrishnan
Interest in synthesis of Application Specific Instruction Set Processors
or ASIPs has increased considerably and a number of methodologies
have been proposed for ASIP design. A key step in ASIP
synthesis involves deciding architectural features based on application
requirements and constraints. In this paper we observe the
effect of changing register file size on the performance as well as
power and energy consumption. Detailed data is generated and
analyzed for a number of application programs. Results indicate
that choice of an appropriate number of registers has a significant
impact on performance.
Keywords
Register file, Synthesis, Instruction set, Instruction power model,
Register spill, Application specific instruction set processor
-
Generating Mixed Hardware/Software Systems from SDL Specifications
[p. 116]
-
Frank Slomka, Matthias Dörfel, Ralf Münzenberger
A new approach for the translation of SDL specifications
to a mixed hardware/software system is presented.
Based on the computational model of communicating
extended finite state machines (EFSM) the control flow
is separated from data flow of the SDL process. Hence
for the first time it is possible to generate a mixed hardware/
software implementation of an SDL process. This
technique also reduces the complexity for high-level and
register-transfer synthesis tools for the hardware parts
of the system. The advantage of this methodology is
shown by a design example of a wireless communication
chip.
-
Area-Efficient Buffer Binding Based on a Novel Two-Port FIFO Structure
[p. 122]
-
Kyoungseok Rha, Kiyoung Choi
In this paper, we address the problem of minimizing buffer storage
requirement in buffer binding for SDF (Synchronous Dataflow) graphs.
First, we propose a new two-port FIFO buffer structure that can be
efficiently shared by two producer/consumer pairs. Then we propose a
buffer binding algorithm based on this two-port buffer structure for
minimizing the buffer size requirement. Experimental results
demonstrate 9.8% to 37.8% improvement in buffer requirement
compared to the conventional approaches.
Keywords
Buffer binding, buffer sharing, SDF, scheduling
-
Deriving Hard Real-Time Embedded Systems Implementations directly from SDL
Specifications
[p. 128]
-
J.M. Alvarez, M. Diaz, L. Llopis, E. Pimentel, J.M. Troya
Object-Oriented methodologies together with Formal Description Techniques
(FDT) are a promising way to deal with the increasing complexity of hard
real-time embedded systems. However, FDTs do not take into account
non-functional aspects such as real-time constraints. Based on a new real-time
execution model for the FDT SDL proposed in previous work, a way to derive
implementations of hard real-time embedded systems directly from SDL
specifications is presented. To this end, we propose a middleware
that supports this model to organize the execution of the tasks generated
from the SDL system specification. Additionally, a worst-case real-time
analysis, including the middleware overhead, is presented. Finally, an
example of generating the implementation from the SDL specification and a
performance study are presented.
Keywords: SDL, real-time, scheduler, embedded system
-
A Trace Transformation Technique for Communication Refinement
[p. 134]
-
Paul Lieverse, Pieter van der Wolf, Ed Deprettere
Models of computation like Kahn and dataflow process networks
provide convenient means for modeling signal processing applications.
This is partly due to the abstract primitives that these models
offer for communication between concurrent processes. However,
when mapping an application model onto an architecture, these
primitives need to be mapped onto architecture level communication
primitives. We present a trace transformation technique that
supports a system architect in performing this communication refinement.
We discuss the implementation of this technique in a tool
for architecture exploration named SPADE and present examples.
-
"s" A Systematic Approach to Software Peripherals for Embedded Systems
[p. 140]
-
Dimitrios Lioupis, Apostolos Papagiannis, Dionysia Psihogiou
The continued growth of microprocessors' performance and the
need for better CPU utilization have led to the introduction of the
"software peripherals" approach. By this term we refer to software
modules that can successfully emulate peripherals that, until now,
were traditionally implemented in hardware. Software
implementations offer great flexibility in product design and in
functional upgrades, while contributing substantially to optimizing the
cost/performance ratio. We focus on embedded
applications, where the cost and the short time to market are the
leading issues. In this paper, we study the hardware and software
requirements for developing a generic microprocessor with
support for software peripherals. Additionally, we present three
software peripherals, a Universal Asynchronous Receiver
Transmitter, a keypad controller and a dot matrix LCD controller,
and we analyze their impact on CPU occupation. Finally, we
explore the impact of using a software UART on system power dissipation.
Keywords
Software peripherals, embedded processors, reconfigurable
architectures
-
A Constructive Algorithm for Memory-Aware Task Assignment and Scheduling
[p. 147]
-
Radoslaw Wlodzimierz Szymanek, Krzysztof Kuchcinski
This paper presents a constructive algorithm for memory-aware
task assignment and scheduling, which is a part of the prototype
system MATAS. The algorithm is well suited for image and video
processing applications which have hard memory constraints as
well as constraints on cost, execution time, and resource usage.
Our algorithm takes into account code and data memory constraints
together with the other constraints. It can create pipelined
implementations. The algorithm finds a task assignment, a schedule,
and a placement of data and code in memory. Infeasible
solutions caused by memory fragmentation are avoided. The
experiments show that our memory-aware algorithm reduces
memory utilization compared to a greedy scheduling algorithm
with a time-minimization objective. Moreover, the memory-aware
algorithm is able to find a task assignment and schedule when the time
minimization algorithm fails. MATAS can create pipelined implementations,
thereby increasing the design throughput.
Keywords: task scheduling, task assignment, memory constraints,
constraint programming
-
A Constraint-based Application Model and Scheduling Techniques for Power-aware
Systems
[p. 153]
-
Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh, Fadi Kurdahi
New embedded systems must be power-aware, not just low-power.
That is, they must track their power sources and the changing power
and performance constraints imposed by the environment. Moreover,
they must fully explore and integrate many novel power management
techniques. Unfortunately, these techniques are often incompatible
with each other due to overspecialized formulations or
they fail to consider system-wide issues. This paper proposes a
new graph-based model to integrate novel power management techniques
and facilitate design-space exploration of power-aware embedded
systems. It captures min/max timing and min/max power
constraints on computation and non-computation tasks through a
new constraint classification and enables derivation of flexible system-level
schedules. We demonstrate the effectiveness of this model
with a power-aware scheduler on real mission-critical applications.
Experimental results show that our automated techniques can improve
performance and reduce energy cost simultaneously. The
application model and scheduling tool presented in this paper form
the basis of the IMPACCT system-level framework that will enable
designers to aggressively explore many power-performance tradeoffs
with confidence.
Keywords
constraint modeling, power-aware real-time scheduling, embedded
systems software, system-level design
-
"s" Optimal Acyclic Fine-Grain Scheduling with Cache Effects for Embedded and
Real Time Systems
[p. 159]
-
Sid-Ahmed-Ali Touati
To sustain the increases in processor performance, embedded and
real-time systems need to find the best total schedule
time when compiling their application. The optimal
acyclic scheduling problem is a classical challenge which
has been formulated using integer programming in many
works. In this paper, we give a new formulation of the acyclic
instruction scheduling problem under register and resource
constraints in multiple-issue processors with
cache effects. Given a directed acyclic graph G = (V,E),
the complexity of our integer linear programming model is
bounded by O(|V|^2) variables and O(|E|+|V|^2)
constraints.
This complexity is better than the complexity of existing
techniques, which include a worst total schedule time
factor.
Keywords: optimal acyclic schedule, register constraints, resource
constraints, cache effects, integer programming
-
"s" Scheduling-based Code Size Reduction in Processors with Indirect Addressing
Mode
[p. 165]
-
Sungtaek Lim, Jihong Kim, Kiyoung Choi
DSPs are typically equipped with indirect addressing modes with
auto-increment and auto-decrement, which provide efficient
address arithmetic calculations. Such an addressing mode is
maximally utilized by careful placement of variables in storage,
thereby reducing the number of address arithmetic instructions.
Finding a proper placement of variables in storage is called the storage
assignment problem, and the result depends heavily on the access
sequence of variables. This paper suggests statement scheduling as
a compiler optimization step to generate a better access sequence.
Experimental results show a 3.6% improvement on average over
naive storage assignment.
Keywords
Code generation, indirect addressing mode, storage assignment,
code size reduction
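The dependence of address arithmetic on the access sequence can be illustrated with a simple cost model. The sketch below assumes a single address register whose auto-increment and auto-decrement cover a step of at most one memory location, so any larger step costs one explicit address instruction. This is a common textbook simplification rather than the paper's cost model, and the name `address_arithmetic_cost` is hypothetical.

```python
def address_arithmetic_cost(access_sequence, layout):
    """Count the explicit address instructions needed for an access sequence.

    access_sequence: variable names in the order they are accessed.
    layout: variable names in memory order (index = address offset).
    A step of 0 or +/-1 between consecutive accesses is assumed to be free
    (handled by auto-increment/decrement); any larger step costs one instruction.
    """
    address = {var: offset for offset, var in enumerate(layout)}
    cost = 0
    for previous, current in zip(access_sequence, access_sequence[1:]):
        if abs(address[current] - address[previous]) > 1:
            cost += 1
    return cost
```

With the layout `["a", "b", "c", "d"]`, the access sequence `a, c, b, d` needs two explicit address instructions, whereas the reordered sequence `a, b, c, d` needs none. When data dependences permit such a reordering, scheduling statements to change the access sequence lowers the cost for the same storage assignment.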
-
"s" Task concurrency management methodology to schedule the MPEG4 IM1 player on
a highly parallel processor platform
[p. 170]
-
Chun Wong, Paul Marchal, Peng Yang, Aggeliki Prayati,
Francky Catthoor, Rudy Lauwereins, Diederik Verkest, Hugo De Man
This paper addresses the concurrent task management of complex
multi-media systems, like the MPEG4 IM1 player, with emphasis
on how to derive energy-cost vs time-budget curves through
task scheduling on a multi-processor platform. Starting from the
original "standard" specification, we extract the concurrency originally
hidden by implementation decisions in a "grey-box" model.
Then we have applied two high-level transformations on this model
to improve the task-level concurrency. Finally, by scheduling the
transformed task-graph, we have derived energy-cost vs time-budget
curves. These curves will be used to get globally optimized design
decisions when combining subsystems into one complete system or
to be used by a dynamic scheduler. The results on the MPEG4 IM1
player confirm the validity of our assumptions and the usefulness
of our approach.
Keywords
concurrency, scheduling, MPEG-4, embedded system, cost-efficiency
-
Parameterized System Design Based on Genetic Algorithms
[p. 177]
-
Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi
A recent reduction in the time to market has led to the development
of a new approach to IP-based design in which a highly parametric pre-designed
system-on-a-chip is configured according to the application it will have to
execute. The greatest problems in this area regard exploration of the
range of possible system configurations in search of the optimal configuration
for a given system. There are, in fact, a number of parameters involved
(bus sizes, cache configurations, software algorithms, etc.), each of which
has a great impact on design constraints such as area, power and performance.
An exhaustive analysis of all possible configurations is thus computationally
infeasible. In this paper we propose using genetic algorithms to determine
the optimal configuration for a highly parametric system. The approach
is applied to the search for the optimal configuration (in terms of area,
power and mean access time) of a memory hierarchy involved in a given
application.
Keywords: Parameterised Systems, Genetic algorithms, Exploration of
system configurations
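A genetic search over a parameterized configuration space can be sketched as follows. Everything specific here is an assumption for illustration: the parameter names and value ranges in `PARAMETER_SPACE`, the scalar `toy_cost` function (a real flow would call area, power and access-time estimators, possibly as separate objectives), and the operator choices (truncation selection, uniform crossover, per-parameter mutation). It is not the paper's algorithm or encoding.

```python
import random

# Hypothetical configuration space; the values are made up for the example.
PARAMETER_SPACE = {
    "cache_size_kb": [1, 2, 4, 8, 16, 32],
    "line_size_b": [8, 16, 32, 64],
    "associativity": [1, 2, 4, 8],
}


def toy_cost(config):
    """Made-up scalar cost trading off area against a miss penalty."""
    area = config["cache_size_kb"] * (1 + 0.1 * config["associativity"])
    penalty = (64.0 / (config["cache_size_kb"] * config["associativity"])
               + 32.0 / config["line_size_b"])
    return area + 4.0 * penalty


def random_config(rng):
    return {name: rng.choice(values) for name, values in PARAMETER_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is taken from one of the two parents."""
    return {name: rng.choice((a[name], b[name])) for name in PARAMETER_SPACE}


def mutate(config, rng, rate=0.2):
    """Re-draw each parameter with probability `rate`."""
    return {name: rng.choice(PARAMETER_SPACE[name]) if rng.random() < rate else value
            for name, value in config.items()}


def genetic_search(cost, generations=30, population_size=20, seed=0):
    """Return the lowest-cost configuration found; optimality is not guaranteed."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: population_size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=cost)
```

Calling `genetic_search(toy_cost)` evaluates a few hundred configurations instead of enumerating the whole space; for the small space used here exhaustive search would also be feasible, so the benefit only appears once the number of parameters grows.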
-
"s" Minimizing System Modification in an Incremental Design Approach
[p. 183]
-
Paul Pop, Petru Eles, Traian Pop, Zebo Peng
In this paper we present an approach to mapping and scheduling
of distributed embedded systems for hard real-time applications,
aiming at minimizing the system modification cost. We consider an
incremental design process that starts from an already existing system
running a set of applications. We aim to implement
new functionality so that the already running applications are disturbed
as little as possible and there is a good chance that, later,
new functionality can easily be added to the resulting system. The
mapping and scheduling problems are considered in the context of a
realistic communication model based on a TDMA protocol.
Keywords: design space exploration, design reuse, distributed
real-time systems, process mapping and scheduling, methodology.
-
"s" High-level architectural co-simulation using Esterel and C
[p. 189]
-
Andre Chatelain, Yves Mathys, Giovanni Placido, Alberto La Rosa,
Luciano Lavagno
This paper introduces an architectural simulation environment, aimed at
defining the best SOC architecture for complex system-level applications.
The application is modeled using an abstract Timing Modeling Language, that
describes the requests (e.g., memory accesses, I/Os, etc.) that the application
makes to the architecture. The abstract architecture is modeled at the
cycle-accurate level using a mixture of Esterel (a synchronous language)
and C. We discuss the results of the application of this tool to a
GSM/GPRS application, including a dramatic speed-up of the architectural
exploration phase.
-
"s" A Generic Wrapper Architecture for Multi-Processor SoC Cosimulation and
Design
[p. 195]
-
Sungjoo Yoo, Gabriela Nicolescu, Damien Lyonnard, Amer Baghdadi, Ahmed A.
Jerraya
In communication refinement with multiple communication protocols
and abstraction levels, the system specification is described by
heterogeneous components in terms of communication protocols
and abstraction levels. To adapt each heterogeneous component to
the rest of the system, we present a generic wrapper architecture
that can adapt different protocols or different abstraction levels, or
both. In this paper, we give a detailed explanation of applying the
generic wrapper architecture to mixed-level cosimulation. As preliminary
experiments, we applied it to mixed-level cosimulation of
an IS-95 CDMA cellular phone system.
-
"s" The TACO Protocol Processor Simulation Environment
[p. 201]
-
Seppo Virtanen, Johan Lilius
Network hardware design is becoming increasingly challenging
because more and more demands are put on network
bandwidth and throughput requirements, and on the speed
with which new devices can be put on the market. Using
current standard techniques (general purpose microprocessors,
ASICs), these goals are difficult to reach simultaneously.
One solution to this problem that has recently attracted
interest is the design of programmable processors
with network-optimized hardware, that is, network or protocol
processors. In this paper a simulation framework for a
family of TTA protocol processor architectures is proposed.
The protocol processors consist of a number of buses with
functional units that encapsulate protocol specific operations.
The TACO protocol processor simulator is a C++
framework based on SystemC. Functional units are created
as C++ classes, which makes it easy to experiment with
different configurations of the processor to see its performance.
Keywords: microprocessor, protocol, simulation, codesign
-
Formal Synthesis and Code Generation of Embedded Real-Time Software
[p. 208]
-
Pao-Ann Hsiung
Due to rapidly increasing system complexity, shortening time-to-market,
and growing demand for hard real-time systems, formal
methods are becoming indispensable in the synthesis of embedded
systems, which must satisfy stringent temporal, memory, and environment
constraints. There is a general lack of practical formal
methods that can synthesize complex embedded real-time software
(ERTS). In this work, a formal method based on Time Free-Choice
Petri Nets (TFCPN) is proposed for ERTS synthesis. The synthesis
method employs quasi-static data scheduling for satisfying
limited embedded memory requirements and uses dynamic real-time
scheduling for satisfying hard real-time constraints. Software
code is then generated from a set of quasi-statically and dynamically
scheduled TFCPNs. Finally, an application example is given
to illustrate the feasibility of the proposed TFCPN-based formal
method for ERTS synthesis.
Keywords: Embedded real-time software, Petri Nets, scheduling, code generation
-
Whole program compilation for embedded software: the ADSL experiment
[p. 214]
-
Johan Cockx
The increasing complexity and decreasing time-to-market of embedded software
force designers to write more modular and reusable code, using for example
object-oriented techniques and languages such as C++. The resulting memory
and runtime overhead cannot be removed by traditional optimizing compilers;
a global, whole program analysis is required. To evaluate the potential
of whole program optimization techniques, we have manually optimized the
embedded software of a commercial ADSL modem. Using only techniques
that can be automated, a memory footprint reduction of nearly 60% has been
achieved. We conclude that a consistent and aggressive use of whole
system optimization techniques is feasible and worthwhile, and that the
implementation of such techniques in a compiler for embedded software will
allow software designers to write more modular and reusable code without
suffering the associated implementation overhead.
Keywords: Whole program compilation, embedded software, interprocedural
optimization, C++
-
Compiler-Directed Selection of Dynamic Memory Layouts
[p. 219]
-
Mahmut Taylan Kandemir, Ismail Kadayif
Compiler technology is becoming a key component in
the design of embedded systems, mostly due to increasing participation
of software in the design process. Meeting system-level objectives
usually requires flexible and retargetable compiler optimizations
that can be ported across a wide variety of architectures. In
particular, source-level compiler optimizations aiming at increasing
locality of data accesses are expected to improve the quality of the
generated code. Previous compiler-based approaches to improving
locality have mainly focused on determining optimal memory layouts
that remain in effect for the entire execution of an application. For
large embedded codes, however, such static layouts may be insufficient
to obtain acceptable performance. The selection of memory
layouts that dynamically change over the course of a program's execution
adds another dimension to data locality optimization. This
paper presents a technique that can be used to automatically determine
which layouts are most beneficial over specific regions of a
program while taking into account the added overhead of dynamic
(runtime) layout changes. The results obtained using two benchmark
codes show that such a dynamic approach brings significant
benefits over a static state-of-the-art technique.
Keywords: Software Compilation, Data Locality, Memory
Layout Optimizations, Array Reuse, Data Dependence.
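
The trade-off this abstract describes, between a layout that suits each program region and the runtime cost of changing layouts, can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the authors' technique: the per-region costs, the single fixed `switch_cost`, and the function name `select_layouts` are all hypothetical.

```python
def select_layouts(region_costs, switch_cost):
    """Pick one layout per program region, minimizing total cost.

    region_costs: list (in program order) of dicts mapping a layout name
                  to the assumed execution cost of that region under it.
    switch_cost:  assumed fixed cost of converting the data between two
                  different layouts at a region boundary.
    Returns (total_cost, [layout chosen for each region]).
    """
    layouts = list(region_costs[0])
    # best[l] = (cost, path) of the cheapest plan that ends in layout l
    best = {l: (region_costs[0][l], [l]) for l in layouts}
    for costs in region_costs[1:]:
        new_best = {}
        for l in layouts:
            # Either keep layout l from the previous region (no overhead)
            # or convert from another layout and pay switch_cost.
            prev_cost, prev_path = min(
                ((c + (0 if p == l else switch_cost), path)
                 for p, (c, path) in best.items()),
                key=lambda t: t[0],
            )
            new_best[l] = (prev_cost + costs[l], prev_path + [l])
        best = new_best
    return min(best.values(), key=lambda t: t[0])


# Hypothetical costs for three regions under two candidate layouts.
regions = [
    {"row": 10, "col": 30},
    {"row": 40, "col": 10},
    {"row": 10, "col": 35},
]
cheap = select_layouts(regions, switch_cost=5)
costly = select_layouts(regions, switch_cost=100)
```

With a cheap conversion the plan changes layout at each boundary; when conversion is expensive the same routine falls back to a single static layout, which is the overhead-aware behaviour the abstract refers to.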
-
"s" Logic Optimization and Code Generation for Embedded Control Applications
[p. 225]
-
Yunjian Jiang, Robert Brayton
We address software optimization for embedded control
systems. The Esterel language is used as the front-end
specification; Esterel compiler v6 is used to partition
the control circuit and data path; the resulting intermediate
representation of the design is a control-data network. This
paper emphasizes the optimization of the control circuit portion
and the code generation of the logic network. The new control-data
network representation has four types of nodes: control, multiplexer,
predicate and data expression; the control portion is a multi-valued
logic network. A multi-valued logic network optimization package called
MVSIS is used for the control optimization. It includes algebraic methods
to perform multi-valued algebraic division, factorization and decomposition,
and logic simplification methods based on observability don't cares. We have
developed methods to evaluate a control-data network based on both
an MDD and sum-of-products representation of the multi-valued
logic functions. The MDD-based approach uses multi-valued intermediate
variables and generates code according to the internal BDD structure.
The SOP-based code is proportional to the number of cubes in the logic
network. Preliminary results compare the two approaches and the optimization
effectiveness.
Keywords: Esterel, Logic optimization, MDD, Code generation, Multiple-valued.
-
"s" Empirical Comparison of Software-Based Error Detection and Correction
Techniques for Embedded Systems
[p. 230]
-
R.H.L. Ong, M.J. Pont
"Function Tokens" and "NOP Fills" are two methods proposed by
various authors to deal with Instruction Pointer corruption in
microcontrollers, especially in the presence of high
electromagnetic interference levels. An empirical analysis to
assess and compare these two techniques is presented in this
paper.
Two main conclusions are drawn: [1] NOP Fills are a powerful
technique for improving the reliability of embedded applications
in the presence of EMI, and [2] the use of Function Tokens can
lead to a reduction in overall system reliability.
Keywords: Instruction Pointer Corruption, Electromagnetic Interference,
EMI, Function Token, NOP Fill, Software-based Error Detection
Techniques, Embedded systems
-
Dynamic I/O Power Management for Hard Real-time Systems
[p. 237]
-
Vishnu Swaminathan, Krishnendu Chakrabarty, S. S. Iyengar
Power consumption is an important design parameter for
embedded and portable systems. Software-controlled (or
dynamic) power management (DPM) has recently emerged
as an attractive alternative to inflexible hardware solutions.
DPM for hard real-time systems has received relatively little
attention. In particular, energy-driven I/O device scheduling
for real-time systems has not been considered before.
We present the first online DPM algorithm, which we call
Low Energy Device Scheduler (LEDES), for hard real-time
systems. LEDES takes as inputs a predetermined task
schedule and a device-usage list for each task and it generates
a sequence of sleep/working states for each device. It
guarantees that real-time constraints are not violated and
it also minimizes the energy consumed by the I/O devices
used by the task set. LEDES is energy-optimal under the
constraint that the start times of the tasks are fixed. We
present a case study to show that LEDES can reduce energy
consumption by almost 50%.
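
To make the inputs and output named in this abstract concrete (a fixed task schedule, a per-task device-usage list, and a sequence of sleep/working states per device), here is a deliberately simplified sketch. It is not the LEDES algorithm: it uses a plain break-even rule, and the `break_even` thresholds (assumed to already include any wake-up latency) and the function name `device_states` are hypothetical.

```python
def device_states(schedule, device_usage, break_even):
    """Derive a sleep/working timeline for each I/O device.

    schedule:     list of (task, start, end), sorted by start time, with
                  fixed start times and no overlap between tasks.
    device_usage: dict mapping task -> set of devices the task uses.
    break_even:   dict mapping device -> shortest idle interval for which
                  sleeping is assumed to save energy (hypothetical values).
    Returns dict mapping device -> list of (start, end, state).
    """
    devices = set().union(*device_usage.values())
    plan = {}
    for dev in sorted(devices):
        # Intervals during which some task needs this device.
        busy = [(s, e) for task, s, e in schedule if dev in device_usage[task]]
        states = []
        for i, (s, e) in enumerate(busy):
            states.append((s, e, "working"))
            if i + 1 < len(busy):
                gap_start, gap_end = e, busy[i + 1][0]
                if gap_end > gap_start:
                    idle = gap_end - gap_start
                    # Sleep only if the idle gap is long enough to pay off.
                    state = "sleep" if idle >= break_even[dev] else "working"
                    states.append((gap_start, gap_end, state))
        plan[dev] = states
    return plan


# Hypothetical task set: four tasks alternating between two devices.
schedule = [("t1", 0, 2), ("t2", 2, 5), ("t3", 5, 6), ("t4", 6, 10)]
usage = {"t1": {"disk"}, "t2": {"net"}, "t3": {"disk"}, "t4": {"net"}}
plan = device_states(schedule, usage, break_even={"disk": 2, "net": 2})
```

In this example the disk sleeps through its three-unit idle gap, while the network device stays powered across a one-unit gap that is shorter than its assumed break-even time.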
-
"s" Hybrid Global/Local Search Strategies for Dynamic Voltage Scaling in
Embedded Multiprocessors
[p. 243]
-
Neal K. Bambha, Shuvra S. Bhattacharyya, Jürgen Teich, Eckart Zitzler
In this paper, we explore a hybrid global/local search optimization
framework for dynamic voltage scaling in embedded multiprocessor
systems. The problem is to find, for a multiprocessor system in
which the processors are capable of dynamically varying their core
voltages, the optimum voltage levels for all the tasks in order to
minimize the average power consumption under a given performance
constraint. An effective local search approach for static
voltage scaling based on the concept of a period graph has been
demonstrated in [1]. To make use of it in an optimization problem,
the period graph must be integrated into a global search algorithm.
Simulated heating, a general optimization framework developed in
[19], is an efficient method for precisely this purpose of integrating
local search into global search algorithms. However, little is
known about the management of computational (compile-time)
resources between global search and local search in hybrid algorithms,
such as those coordinated by simulated heating. In this
paper, we explore various hybrid search management strategies for
power optimization under the framework of simulated heating. We
demonstrate that careful search management leads to significant
power consumption improvement over ad hoc global search /
local search integration, and explore alternative approaches to performing
hybrid search management for dynamic voltage scaling.
Keywords: simulated heating, dynamic voltage scaling
-
"s" Processor Frequency Setting for Energy Minimization of Streaming
Multimedia Applications
[p. 249]
-
Andrea Acquaviva, Luca Benini, Bruno Ricco
In this paper, we describe a software-controlled approach for adaptively
minimizing energy in embedded systems for real-time multimedia processing.
Energy is optimized by clock speed setting: the software controller
dynamically adjusts processor clock speed to the frame rate requirements
of the incoming multimedia stream. The speed-setting policy is based on a
system model that correlates clock speed with best-case, average-case and
worst-case sustainable frame rate, accounting for data-dependency in
multimedia streams. Experiments on an MP3 decoding application show that
computational energy can be drastically reduced with respect to fixed-frequency
operation.
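
The speed-setting idea in this abstract can be illustrated as follows. This is a minimal sketch, not the authors' controller or system model: it assumes the sustainable frame rate is simply the clock frequency divided by a hypothetical worst-case cycle count per frame, and the function name `pick_frequency` and all numbers are made up for illustration.

```python
def pick_frequency(frequencies_hz, cycles_per_frame, required_fps):
    """Return the lowest available clock frequency whose sustainable frame
    rate (frequency / cycles_per_frame, an assumed linear model) meets
    required_fps, or None if no available frequency is fast enough."""
    for f in sorted(frequencies_hz):
        if f / cycles_per_frame >= required_fps:
            return f
    return None


# Hypothetical frequency steps and worst-case decoding cost per frame.
steps = [100_000_000, 150_000_000, 200_000_000]
chosen = pick_frequency(steps, cycles_per_frame=3_000_000, required_fps=38)
too_fast = pick_frequency(steps, cycles_per_frame=3_000_000, required_fps=100)
```

Using a worst-case cycle count keeps the choice conservative; substituting an average-case or best-case count would select lower frequencies at the risk of missing frames on demanding input.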
-
"s" Retargetable Compilation for Low Power
[p. 254]
-
Wen-Tsong Shiue
Most research to date on energy minimization in DSP processors
has focused on hardware solutions. This paper examines the
software-based factors affecting performance and energy
consumption for architecture-aware compilation. In this paper, we
focus on providing support for one architectural feature of DSPs
that makes code generation difficult, namely the use of multiple
data memory banks. This feature increases memory bandwidth by
permitting multiple data memory accesses to occur in parallel
when the referenced variables belong to different data memory
banks and the registers involved conform to a strict set of
conditions. We present novel instruction scheduling algorithms
that attempt to maximize the performance, minimize the energy,
and therefore, maximize the benefit of this architectural feature.
Experimental results demonstrate that our algorithms generate high
performance, low energy codes for the DSP architectural features
with multiple data memory banks. Our algorithm led to
improvements in performance and energy consumption of 48.3%
and 66.6% respectively in our benchmark examples.
Keywords: Architecture-aware compiler design, high performance and
low power design, instruction scheduling, register
allocation.
-
"s" A Design Framework to Efficiently Explore Energy-Delay Tradeoffs
[p. 260]
-
William Fornaciari, Donatella Sciuto, Cristina Silvano, Vittorio Zaccaria
Comprehensive exploration of the design space parameters
at the system-level is a crucial task to evaluate architectural
tradeoffs accounting for both energy and performance
constraints. In this paper, we propose a system-level design
methodology for the efficient exploration of the memory
architecture from the energy-delay combined perspective.
The aim is to find a sub-optimal configuration of the
memory hierarchy without performing the exhaustive analysis
of the parameters space. The target system architecture
includes the processor, separate instruction and data level-one
caches, the main memory, and the system buses. The
methodology is based on the sensitivity analysis of the optimization
function with respect to the tuning parameters
of the cache architecture (mainly cache size, block size and
associativity). The effectiveness of the proposed methodology
has been demonstrated through the design space exploration
of a real-world example: a MicroSPARC2-based
system running the Mediabench suite. Experimental results
have shown an optimization speedup of 329 times with respect
to the full search, while the near-optimal system-level
configuration is characterized by a distance from the optimal
full search configuration in the band of 10%.
|