This paper provides further evaluation of the proposed hardware prefetch unit for the Blueshell NoC. Blueshell uses a separate shared memory tree (Bluetree) to connect CPUs to external memory. The tree is supplemented with a Prefetch Unit next to external memory. Prefetching is carried out in a streaming manner, with the prefetch distance varied between 1 and 4. Whilst previous work has suggested that prefetching is an appropriate architectural technique within NoCs, enabling better system performance, this paper provides further behavioural insight, particularly into the degree to which the bottleneck of external DDR (single-port access) eventually dominates performance. Evaluation via traffic generators (hosted on MicroBlaze CPUs in the NoC) shows improvements of over 100% for certain memory loads and prefetch distances. In all cases, prefetching is shown to have a beneficial effect up to the point at which the memory system is flooded by CPU requests. The evaluation is supported by an MP3 case study, which shows improvements of around 10% for up to 4 CPU cores, with the performance improvement falling as the number of CPUs increases (to 8 or 16) due to the memory system being flooded.

BibTeX Entry

@inproceedings{Garside2013,
 address = {Montreal},
 author = {Jamie Garside and Neil C Audsley},
 booktitle = {Memory Architecture and Organisation Workshop 2013},
 title = {Investigating Shared Memory Tree Prefetching within Multimedia NoC Architectures},
 year = {2013}
}