CORTEX-1
CORTEX-1 was the first implementation of a distributed neural processor. It was designed as a high-performance development system for AURA hardware, software, and applications. A major factor in the motivation and design of the system was to demonstrate the scalability of the AURA methods and technology, which was also a key aim of the AURA II project. The system has since been retired and now forms part of the James Austin Computer Collection.
In addition, four of the nodes are equipped with Dolphin Scali high-speed (80 MBytes/s) interconnects, and all nodes have 100 Mbit Ethernet. The system can be maintained from a single point using KVM switching, and external access to the cluster is provided via a primary gateway node.
Cortex-1 Physical Equipment Configuration
Details of the neural accelerator cards can be found on the Presence hardware page. The client-server system is a distributed form of the AURA library described elsewhere. It is currently designed around a stripped-down remote method invocation (RMI) model built on the open-source Adaptive Communication Environment (ACE) toolkit. Sockets provide high-speed communication between any part of the Cortex-1 machine and a client system outside it, allowing Cortex-1 to be configured as a flexible compute server for AURA-enabled applications, for both software and hardware-accelerated tasks. The software interface is designed to be virtually transparent to a programmer familiar with the standard AURA library.
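The wire format of the client-server layer is not documented here, but a socket-based RMI-style layer typically frames each request as a small fixed header plus a payload. The sketch below illustrates that pattern with an invented OP_RECALL opcode and header layout (a connected local socket pair stands in for the client-server TCP link); it is not the actual AURA protocol:

```python
# Sketch of the opcode + payload framing an RMI-style socket layer might use.
# OP_RECALL and the header layout are invented for illustration; the real
# AURA client-server wire format is not documented here.
import socket
import struct

OP_RECALL = 1                      # hypothetical opcode
HEADER = struct.Struct("!II")      # (opcode, payload length in bytes)

def send_request(sock, opcode, payload):
    """Frame a request as a fixed header followed by the raw payload."""
    sock.sendall(HEADER.pack(opcode, len(payload)) + payload)

def recv_request(sock):
    """Read one framed request: the header first, then exactly the payload."""
    opcode, length = HEADER.unpack(sock.recv(HEADER.size, socket.MSG_WAITALL))
    return opcode, sock.recv(length, socket.MSG_WAITALL)

# A connected pair of sockets stands in for the client/server link.
client, server = socket.socketpair()
send_request(client, OP_RECALL, struct.pack("!4I", 1, 0, 1, 1))
opcode, payload = recv_request(server)
print(opcode, struct.unpack("!4I", payload))   # 1 (1, 0, 1, 1)
```

Framing every request this way is what lets a single server loop dispatch both software and hardware-accelerated operations from the same socket.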
Cortex-1 Client-Server Relationships
Cortex-1: Main Components of the Client-Server System
Further Developments
Future developments of the software are likely to include a full implementation of futures and load balancing; ultimately, it is intended that the software should be GRID-enabled.
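The "futures" mentioned above are a standard concurrency pattern rather than anything AURA-specific. In the minimal sketch below, fake_recall stands in for a remote CMM recall (the node numbering and return value are invented for illustration); futures let a client issue recalls to several nodes concurrently and gather the results as they complete:

```python
# Sketch of the "futures" idea: issue recalls to several nodes at once and
# collect the results. fake_recall is a stand-in for a remote CMM recall.
from concurrent.futures import ThreadPoolExecutor

def fake_recall(node, vector):
    # Placeholder for a remote recall on the given node; here it just
    # returns the number of set bits in the input vector.
    return (node, sum(vector))

vectors = {0: [1, 0, 1], 1: [1, 1, 1], 2: [0, 0, 1]}
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_recall, n, v) for n, v in vectors.items()]
    results = dict(f.result() for f in futures)
print(results)   # {0: 2, 1: 3, 2: 1}
```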
Presence is the current family of hardware designs which accelerate the core CMM computations needed in AURA applications. This section looks at how the functionality of PRESENCE has been seamlessly incorporated into the AURA library, and how multiple PRESENCE cards are used to scale up CMM size. Single-card PRESENCE recall performance is proportional to the CMM's output (separator) width; therefore, by striping an output vector across multiple cards and executing simultaneous recalls in parallel, we can also scale performance. We have identified two levels of scalable AURA:
PRESENCE Device Driver
The test-bed environment for Scalable AURA is the Cortex-1 cluster. Each node of the cluster runs RedHat Linux, so a low-level device driver (/dev/presdrv) was written for the PCI PRESENCE card. The device driver is inserted into the Linux kernel as a kernel-space module. A static library (hw_ops) enables the user to access the driver via the ioctl() system routine. A list of the library functions is available on-line. The driver has been extended to allow parallel operation of multiple cards (a maximum of 5) in a system.

Multi-PRESENCE Scalability
A HardwareCMM class was added to the AURA library that addresses one or more PRESENCE cards on a system's PCI bus. A maximum of 5 PCI cards can be attached to a node, boosting the available CMM weights memory to 640 MBytes. The diagram below illustrates how the HardwareCMM is constructed.

Multi-Node Scalability
A client-server framework was written that allows a HardwareCMM to be accessed remotely over the cluster. The framework uses the object-oriented Adaptive Communication Environment (ACE), an open-source networking toolkit that is portable across platforms. A DistributedCMM class was created that utilises the client-server framework to distribute a CMM over multiple HardwareCMMs, and hence multiple PRESENCE cards.
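The column-striping scheme that underlies both levels of scaling can be sketched in a few lines. This is a pure-software stand-in, not the hw_ops or HardwareCMM API: the weight matrix's output (separator) columns are split across hypothetical "cards", each stripe is recalled independently, and the partial outputs are concatenated:

```python
# Sketch of output striping: a binary CMM's output columns are split across
# several "cards", each card recalls its stripe independently, and the
# stripes are concatenated. Pure-Python illustration, not the real hw_ops API.

def recall_stripe(weights_stripe, input_vec):
    """Recall on one stripe: sum the active input rows into each column."""
    cols = len(weights_stripe[0])
    return [sum(weights_stripe[r][c] for r, bit in enumerate(input_vec) if bit)
            for c in range(cols)]

def striped_recall(weights, input_vec, n_cards):
    """Split the weight matrix column-wise over n_cards and recall each stripe."""
    cols = len(weights[0])
    width = (cols + n_cards - 1) // n_cards          # columns per card
    out = []
    for card in range(n_cards):
        stripe = [row[card * width:(card + 1) * width] for row in weights]
        if stripe[0]:                                # skip empty trailing stripes
            out.extend(recall_stripe(stripe, input_vec))
    return out

# 4 input lines, 6 output columns, striped over 3 "cards" of 2 columns each.
weights = [[1, 0, 1, 0, 1, 0],
           [0, 1, 0, 1, 0, 1],
           [1, 1, 0, 0, 1, 1],
           [0, 0, 1, 1, 0, 0]]
x = [1, 0, 1, 0]
print(striped_recall(weights, x, 3))   # [2, 1, 1, 0, 2, 1]
```

Because each stripe depends only on its own columns, the striped result is identical to a single-card recall of the full matrix, which is what makes parallel recall across cards (and across nodes, via the DistributedCMM) transparent to the caller.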