There is an obvious bus bottleneck when multiple CPUs within a many-core architecture share the same physical off-chip memory (e.g. DDR / DRAM). Worst-Case Execution Time (WCET) analysis of application tasks will inevitably include the effects of sharing the memory bus amongst CPUs; likewise, average-case execution times will include the effects of individual memory accesses being slowed by interference from the memory requests of other CPUs. One approach to mitigating this is to use a hardware prefetcher to move instructions and data from memory to the CPU cache before a cache miss instigates a memory request. However, in a real-time system there is a trade-off: issuing prefetch requests to off-chip memory reduces the bandwidth available for serving CPU cache misses, but the prefetches avoid some CPU cache misses, so the memory system sees fewer memory requests. In this paper we propose, analyse and show the implementation of a hardware prefetcher designed so that the WCETs of application tasks are not affected by the run-time behaviour of the prefetcher, i.e. it utilises spare time within the memory system to issue prefetch requests and forward them to the appropriate CPU. As well as leaving WCETs unaffected, the prefetcher enables a significant reduction in the average-case execution times of application tasks, showing the efficacy of the approach.

BibTeX Entry

@inproceedings{Garside2014,
 acmid = {2659824},
 address = {New York, NY, USA},
 articleno = {193},
 author = {Jamie Garside and Neil C. Audsley},
 booktitle = {Proceedings of the 22nd International Conference on Real-Time Networks and Systems},
 doi = {10.1145/2659787.2659824},
 isbn = {978-1-4503-2727-5},
 link = {http://doi.acm.org/10.1145/2659787.2659824},
 location = {Versailles, France},
 numpages = {10},
 pages = {193:193--193:202},
 publisher = {ACM},
 series = {RTNS '14},
 title = {WCET Preserving Hardware Prefetch for Many-Core Real-Time Systems},
 year = {2014}
}