CORTEX-1

CORTEX-1 was the first implementation of a distributed neural processor. It was designed as a high-performance development system for AURA hardware, software, and applications. A major motivation behind its design was to demonstrate the scalability of the AURA methods and technology, which is also a key aim of the AURA II project.

This system has now been retired and forms part of the James Austin Computer Collection.



[Image: Cortex-1 rack system]

The system (shown above) comprises four main components:

  • 7 networked PCs connected in a local cluster
  • 28 Presence neural processor cards (each with 128 hardware neurons and 10⁹ weights)
  • Client-server software allowing the system to be used as a "compute farm" for neural networks and AURA applications.
  • Sun Grid Engine

Each PC node in the cluster is equipped according to the following specification:

  • Pentium III (700/500 MHz)
  • 768MB PC133 RAM
  • 5 Presence-1 neural network accelerator cards
  • 10GB hard disk
  • Red Hat Linux 6.2 operating system

In addition, four of the nodes are equipped with Dolphin Scali high-speed (80 MBytes/s) interconnections, and all nodes have 100Mbit ethernet. The system can be maintained from a single point using KVM switching, and external access to the cluster is provided via a primary gateway node.

[Image: Cortex-1 physical equipment configuration]

Details of the neural accelerator cards can be found on the Presence hardware page. The client-server system is a distributed form of the AURA library described elsewhere. It is currently designed around a stripped-down remote method invocation (RMI) model built on the Adaptive Communication Environment (ACE). It uses sockets for high-speed communication between any part of the Cortex-1 machine and a client system outside it, allowing Cortex-1 to be configured as a flexible compute server for AURA-enabled applications, covering both software and hardware-accelerated tasks. The software interface is designed to be virtually transparent to a programmer used to the standard AURA library.
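As a concrete illustration of this socket-based RMI style, the sketch below shows a blocking remote call using ACE's connector and stream wrappers. The host, port, and request/reply layout are illustrative assumptions, not the actual Cortex-1 wire protocol.

    // Minimal sketch of a blocking remote invocation over ACE sockets.
    // Host, port, and message layout are assumptions for illustration.
    #include "ace/SOCK_Connector.h"
    #include "ace/SOCK_Stream.h"
    #include "ace/INET_Addr.h"
    #include <cstddef>

    int remote_invoke(const char *host, unsigned short port,
                      const char *request, std::size_t req_len,
                      char *reply, std::size_t reply_len)
    {
      ACE_SOCK_Connector connector;
      ACE_SOCK_Stream stream;
      ACE_INET_Addr server(port, host);

      // Connect to a Cortex-1 node acting as an AURA compute server.
      if (connector.connect(stream, server) == -1)
        return -1;

      // Send the marshalled method invocation, then block on the result,
      // mirroring the stripped-down RMI model described above.
      int status = 0;
      if (stream.send_n(request, req_len) != (ssize_t) req_len ||
          stream.recv_n(reply, reply_len) <= 0)
        status = -1;

      stream.close();
      return status;
    }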

[Image: Cortex-1 client-server relationships]

[Image: Main components of the Cortex-1 client-server system]

Further Developments

Future developments of the software are likely to include a full implementation of futures and load balancing; ultimately, it is intended that the software will be GRID-enabled.


PRESENCE is the current family of hardware designs that accelerates the core CMM computations needed in AURA applications. This section looks at how the functionality of PRESENCE has been seamlessly incorporated into the AURA library, and how multiple PRESENCE cards are used to scale up CMM size. Single-PRESENCE recall performance is proportional to the CMM's output (separator) width; therefore, by striping an output vector across multiple cards and executing simultaneous recalls in parallel, we can also scale performance.
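The sketch below shows the striping idea in isolation, assuming a card_recall callback that stands in for a per-card hardware recall; it is an illustration of the technique, not the AURA library API.

    // Hedged sketch of output striping: a recall over the full separator
    // width is split into independent per-card recalls, whose partial
    // outputs are concatenated to rebuild the full output vector.
    #include <cstddef>
    #include <vector>

    std::vector<bool> striped_recall(
        const std::vector<bool> &input, int n_cards, int stripe_width,
        std::vector<bool> (*card_recall)(int, const std::vector<bool> &))
    {
      std::vector<bool> output;
      output.reserve((std::size_t) n_cards * stripe_width);

      // Each card owns one contiguous stripe of the separator, so the
      // recalls are independent and can run simultaneously in hardware.
      for (int card = 0; card < n_cards; ++card) {
        std::vector<bool> part = card_recall(card, input);
        output.insert(output.end(), part.begin(), part.end());
      }
      return output;
    }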

We have identified two levels of scalable AURA:

  • Multiple PRESENCE cards in a machine (node);
  • Multiple nodes in a cluster.

PRESENCE Device Driver

The test-bed environment for Scalable AURA is the Cortex-1 cluster. Each node of the cluster runs Red Hat Linux, so a low-level device driver (/dev/presdrv) was written for the PCI PRESENCE card. The device driver is inserted into the Linux kernel as a kernel-space module. A static library (hw_ops) enables the user to access the driver via the ioctl() system routine. A list of the library functions is available on-line. The driver has been extended to allow parallel operation of multiple cards (maximum of 5) in a system.
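A minimal sketch of this access path is shown below. The request codes and their arguments are hypothetical; the real function list is the one documented on-line for hw_ops.

    // Sketch of user-space access to the PRESENCE driver through
    // ioctl(), in the style of the hw_ops library.
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define PRES_SELECT 0x5000  /* hypothetical ioctl request codes */
    #define PRES_RESET  0x5001  /* select and reset one of the cards */

    int presence_open(int card)
    {
      int fd = open("/dev/presdrv", O_RDWR);  // driver node from the text
      if (fd < 0)
        return -1;

      // All card operations are multiplexed through the one driver via
      // ioctl(); here we select one of the (up to 5) cards and reset it.
      if (ioctl(fd, PRES_SELECT, card) < 0 || ioctl(fd, PRES_RESET, 0) < 0) {
        close(fd);
        return -1;
      }
      return fd;  // caller issues further ioctl()s through hw_ops
    }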

Multi-PRESENCE Scalability

A HardwareCMM class was added to the AURA library that addresses one or more PRESENCE cards on a system's PCI bus. A maximum of 5 PCI cards can be attached to a node, boosting the available CMM weights memory to 640MBytes. The diagram below illustrates how the HardwareCMM is constructed.

[Image: ioctl call (HardwareCMM construction)]
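The following is a hedged sketch of the idea behind HardwareCMM, not the actual AURA class: one object addressing up to five PRESENCE cards on a node's PCI bus, each contributing 128MBytes of weights memory (5 x 128MBytes giving the 640MBytes quoted above).

    // Illustrative HardwareCMM: aggregates the weights memory of up to
    // five PRESENCE cards attached to one node.
    #include <cstddef>

    class HardwareCMM {
    public:
      explicit HardwareCMM(int n_cards)
          : cards_(n_cards > 5 ? 5 : n_cards) {}  // driver limit: 5 cards

      // Total CMM weights memory available across the attached cards.
      std::size_t weights_bytes() const {
        return (std::size_t) cards_ * 128u * 1024u * 1024u;
      }

      int cards() const { return cards_; }

    private:
      int cards_;  // PRESENCE cards addressed on this node's PCI bus
    };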

Multi-Node Scalability

A client-server framework was written that allows a HardwareCMM to be accessed remotely over the cluster. The framework uses the object-oriented Adaptive Communication Environment (ACE), an open-source networking toolkit that is portable across platforms. A DistributedCMM class was created that utilises the client-server framework to distribute a CMM over multiple HardwareCMMs, and hence multiple PRESENCE cards.

[Image: Distributed CMM]
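The sketch below illustrates the DistributedCMM idea: a CMM striped over several remote HardwareCMMs, one per cluster node, reached through the client-server framework. All names and signatures here are assumptions for illustration, not the AURA interface.

    // Hedged sketch of a CMM distributed over remote HardwareCMMs.
    #include <string>
    #include <utility>
    #include <vector>

    struct RemoteNode {
      std::string host;  // cluster node running a HardwareCMM server
      int stripe_width;  // separator bits assigned to this node
    };

    class DistributedCMM {
    public:
      explicit DistributedCMM(std::vector<RemoteNode> nodes)
          : nodes_(std::move(nodes)) {}

      // A recall fans the input out to every node; each node recalls
      // its stripe on local PRESENCE cards and the partial outputs are
      // concatenated, as in the single-node striping scheme.
      std::vector<bool> recall(const std::vector<bool> &input) const {
        std::vector<bool> output;
        for (const RemoteNode &node : nodes_) {
          std::vector<bool> part = remote_recall(node, input);
          output.insert(output.end(), part.begin(), part.end());
        }
        return output;
      }

    private:
      // Stub: the real call would marshal the input, do the ACE socket
      // round trip to node.host, and unmarshal the stripe bits.
      std::vector<bool> remote_recall(const RemoteNode &node,
                                      const std::vector<bool> &) const {
        return std::vector<bool>(node.stripe_width, false);
      }

      std::vector<RemoteNode> nodes_;
    };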
