Typical Big Data frameworks do not consider the architecture of the servers that make up the cluster. However, these servers are increasingly heterogeneous and are based on a ccNUMA architecture. In such architectures, main memory access times differ depending on the core that issues the access. Hence, in addition to locality of data access across a cluster of servers, locality of memory access within individual servers can affect performance. Java is a commonly used language for Big Data applications (through the popularity of Hadoop), and the newly released Java 8 introduces streams to simplify data-parallel programming. However, this paper argues that there are no built-in parallel stream sources that can efficiently operate on very large datasets and take data locality into account. This paper details recent work from the JUNIPER project, an EU Framework 7 project, which is investigating how the Java 8 platform (augmented by the Real-Time Specification for Java) can be used for real-time Big Data applications. JUNIPER introduces architecture-aware stream sources that are suitable for Big Data systems and that preserve locality of data. Our results show that when reading data from disk, thread affinity can seriously degrade the performance of standard Java streams, but JUNIPER's architecture-aware streams maintain their performance.

BibTeX Entry

@inproceedings{Chan2014,
 acmid = {2661028},
 address = {New York, NY, USA},
 articleno = {20},
 author = {Yu Chan and Andy Wellings and Ian Gray and Neil Audsley},
 booktitle = {Proceedings of the 12th International Workshop on Java Technologies for Real-time and Embedded Systems},
 doi = {10.1145/2661020.2661028},
 isbn = {978-1-4503-2813-5},
 link = {http://doi.acm.org/10.1145/2661020.2661028},
 location = {Niagara Falls, NY, USA},
 numpages = {9},
 pages = {20:20--20:28},
 publisher = {ACM},
 series = {JTRES '14},
 title = {On the Locality of Java 8 Streams in Real-Time Big Data Applications},
 year = {2014}
}