University of York - Department of Computer Science

AURAmol technical data


Overview

AURAmol is structure-based search software that uses the 3D structure of molecules when searching molecular databases for similar examples. This reliable, easy-to-use system is designed to give meaningful similarity rankings, providing valuable enrichment of the virtual screening process.

Since 1998, Cybula, in collaboration with the University of York, has been developing AURAmol to be the leading "shape-based" search tool used in the drug discovery and chemistry community. AURAmol provides a feature-rich software platform which enables drug discovery professionals to rapidly assimilate sets of candidate molecules to create screening libraries, search for similar compounds during IP protection or simply look for alternative compounds for chemical trials. AURAmol integrates seamlessly with the molecule file format you are using now.

Using AURAmol

Following installation, a typical search environment can be set up and maintained as follows:

The basic AURAmol software was designed to fit in with existing molecular display systems and so does not have its own graphical user interface. Search results can be retrieved from any standard database and then displayed using any molecule viewing software. For this application, a web interface has been added to the system.

3D Similarity Match

AURAmol describes molecules as attributed graphs and then uses a graph similarity search method to find similar examples (see references). Because the graph representation encodes the 3D structure of the molecule explicitly, rather than implicitly as in other systems, the approach returns matches that are more relevant to the structure of the query molecule. The method has been extensively evaluated with a number of companies and shown to give good results compared with string-based search methods. It can also be combined with other feature information (e.g. surface charge, hydrophobicity) if required, allowing selection on features as well as structure.
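To make the idea of comparing attributed graphs concrete, here is a deliberately simple Python sketch. It is not the neural graph matcher described in the references: it treats each molecule as a complete graph whose nodes carry element symbols and whose edges carry binned interatomic distances, and it scores two molecules by the Tanimoto overlap of their labelled edge multisets. The names `Atom`, `attributed_edges` and `graph_similarity`, and the 0.5 Å bin width, are hypothetical choices for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Atom:
    element: str   # node attribute: element symbol
    xyz: tuple     # 3D position (x, y, z), e.g. in angstroms

def attributed_edges(atoms, bin_width=0.5):
    """Labelled edge multiset of the complete graph over `atoms`.

    Each edge is labelled with its sorted element pair and the interatomic
    distance rounded to the nearest multiple of `bin_width`.
    """
    edges = Counter()
    for a, b in combinations(atoms, 2):
        pair = tuple(sorted((a.element, b.element)))
        distance_bin = round(math.dist(a.xyz, b.xyz) / bin_width)
        edges[(pair, distance_bin)] += 1
    return edges

def graph_similarity(atoms_a, atoms_b, bin_width=0.5):
    """Tanimoto overlap of the two labelled edge multisets, in [0, 1]."""
    ea = attributed_edges(atoms_a, bin_width)
    eb = attributed_edges(atoms_b, bin_width)
    shared = sum((ea & eb).values())  # element-wise minimum counts
    total = sum((ea | eb).values())   # element-wise maximum counts
    return shared / total if total else 1.0
```

Because edge labels depend only on element types and interatomic distances, the score does not change when a molecule is translated or rotated (apart from rounding at bin boundaries), so the comparison reflects 3D shape rather than raw coordinates.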

Operating System

The AURAmol software library is compatible with MS Windows, Linux and Solaris programming environments and is available as an object library on all of these platforms. It is a non-visual environment with simple data types and a limited number of configuration parameters. As a result, the complexity of the system is hidden from the end user, and search results can be obtained with minimal training. For enquiries about obtaining the software, contact Cybula Ltd.

User Interface

There are three principal function calls to the library, handling configuration, training and querying of molecular databases respectively. Header files are currently available in C/C++ and can easily be ported to a number of other languages on request.

Performance and requirements

AURAmol delivers a ranked list of close matches to a query from any database. The processing rate is approximately 1.5 seconds per 1,000 molecule comparisons on a 2.2 GHz machine. Results can be delivered incrementally, so you can view the results of comparing the query with the first 1,000 molecules in the database whilst AURAmol searches subsequent data blocks.
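The quoted rate gives a quick way to estimate search times, and incremental delivery can be modelled as a generator that yields the best matches found so far after each block of 1,000 molecules. The Python sketch below assumes a generic `similarity(query, molecule)` callable; the function names are hypothetical and do not describe AURAmol's actual interface.

```python
import heapq

def estimated_search_seconds(n_molecules, seconds_per_1000=1.5):
    """Linear estimate from the quoted rate of ~1.5 s per 1,000 comparisons."""
    return n_molecules / 1000 * seconds_per_1000

def incremental_search(query, database, similarity, block_size=1000, top_k=10):
    """Yield (molecules_searched, best_matches) after each data block.

    best_matches is a list of (score, index) pairs, highest score first.
    """
    best = []  # min-heap holding the top_k (score, index) pairs seen so far
    for start in range(0, len(database), block_size):
        stop = min(start + block_size, len(database))
        for i in range(start, stop):
            item = (similarity(query, database[i]), i)
            if len(best) < top_k:
                heapq.heappush(best, item)
            else:
                heapq.heappushpop(best, item)  # drop the weakest match
        yield stop, sorted(best, reverse=True)
```

For example, `estimated_search_seconds(100_000)` gives 150.0, i.e. about two and a half minutes for a 100,000-molecule database on the reference 2.2 GHz machine, assuming the rate scales linearly with database size.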

A special feature of AURAmol is that the system can be tuned to deliver results within specific processing times, which is useful when very large databases need to be searched. If very fast processing is required, AURAmol can use a coarse representation of molecules to quickly obtain a ranking that closely approximates the more precise ranking obtained with a more thorough search.
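One generic way to meet a processing-time budget is a two-pass, coarse-to-fine search: score every molecule with a cheap coarse similarity, then re-score a shortlist with a precise similarity until the budget runs out. The Python sketch below illustrates that idea under assumed names (`coarse_sim`, `fine_sim`, `budget_s`, `shortlist`); it is not AURAmol's actual tuning mechanism.

```python
import time

def budgeted_search(query, database, coarse_sim, fine_sim, budget_s,
                    shortlist=100, top_k=10):
    """Return indices of the top_k matches, refining as far as the budget allows."""
    deadline = time.monotonic() + budget_s
    # Pass 1: cheap coarse scores for every molecule (stable sort keeps ties in order).
    coarse = sorted(range(len(database)),
                    key=lambda i: coarse_sim(query, database[i]), reverse=True)
    # Pass 2: precise scores for the shortlist, stopping when time runs out.
    refined = {}
    for i in coarse[:shortlist]:
        if time.monotonic() >= deadline:
            break
        refined[i] = fine_sim(query, database[i])
    # Refined candidates first, by precise score; the rest keep their coarse order.
    ranked = sorted(refined, key=refined.get, reverse=True)
    ranked += [i for i in coarse if i not in refined]
    return ranked[:top_k]
```

Raising `budget_s` or `shortlist` moves the result towards the full precise ranking; lowering them falls back towards the coarse ranking, mirroring the speed/accuracy trade-off described above.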

The software libraries occupy 16 MB of RAM. Data libraries occupy 60 MB per 1,000 molecules. Whilst a query is being processed, RAM usage rises by an additional 2 MB. The minimum recommended hardware specification for searching databases of 100,000 molecules with an average of 60 atoms per molecule is as follows:

The current system also runs on the Cortex II parallel processor, supported by PRESENCE II FPGA-based processor cards. More information on this can be supplied on request.
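The memory figures quoted above for the software and data libraries can be combined into a rough RAM estimate. This sketch assumes that the data libraries are held entirely in memory during a search, which the text does not state, and that usage scales linearly with the number of molecules; the function name is hypothetical.

```python
def estimated_ram_mb(n_molecules, software_mb=16.0, data_mb_per_1000=60.0,
                     query_overhead_mb=2.0):
    """Rough RAM estimate in MB from the figures quoted for AURAmol.

    Assumes (not stated in the text) that the data libraries are fully
    resident in RAM, plus the per-query overhead while a query runs.
    """
    data_mb = n_molecules / 1000 * data_mb_per_1000
    return software_mb + data_mb + query_overhead_mb
```

For the standard 100,000-molecule case this gives 16 + 6,000 + 2 = 6,018 MB, or roughly 6 GB, under the in-memory assumption.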

OpenBabel Interface

AURAmol is commercial software and has its own internal representation of molecules. We are happy to provide translation from over 30 molecule formats using the OpenBabel software, which is available under the GPL. Cybula can provide libraries and source code that integrate with the AURAmol software to perform transparent conversion to and from the file format you are currently using. Compatible formats include SDF/MOL, Sybyl mol2, PDB, SMILES, XYZ and CML, amongst many others.

ODBC compatible

AURAmol searches on raw molecule shape information, which is commonly held in proprietary databases. AURAmol can be configured to return ODBC-compatible indexes into these databases, which means you can use shape to search your own database and retrieve all the information on the matches, just as you would with an SQL query.
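The workflow described above, where a shape search returns keys and the full records are then pulled from your own database, can be sketched as follows. Python's built-in `sqlite3` module stands in for an ODBC connection (ODBC drivers for Python such as pyodbc follow the same DB-API 2.0 cursor pattern), and the `molecules` table, its columns and the list of ranked IDs are hypothetical.

```python
import sqlite3

def fetch_ranked_records(conn, ranked_ids):
    """Fetch full records for shape-search hits, preserving the ranking order."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" * len(ranked_ids))
    cur = conn.cursor()
    cur.execute(
        f"SELECT id, name, mol_weight FROM molecules WHERE id IN ({placeholders})",
        list(ranked_ids),
    )
    by_id = {row[0]: row for row in cur.fetchall()}
    # SQL IN does not preserve order, so re-impose the similarity ranking.
    return [by_id[i] for i in ranked_ids if i in by_id]
```

The IDs are passed as bound parameters rather than pasted into the SQL string, so the same pattern works unchanged whichever ranked keys the search returns.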

Standard System

The standard AURAmol system is designed to accommodate databases of fewer than 100,000 molecules with an average size of no more than 60 atoms per molecule. The above specifications assume that the search facility is installed for datasets that meet these criteria. Larger databases with more complex molecules can also be searched using AURAmol; please contact us to discuss any search requirements you may have for databases outside these limits.
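Whether a dataset falls inside these standard limits can be checked directly from per-molecule atom counts. A minimal Python sketch; the function name and the choice to pass a list of atom counts are assumptions for illustration.

```python
def within_standard_limits(atom_counts, max_molecules=100_000, max_mean_atoms=60):
    """True if the dataset fits the standard system: fewer than max_molecules
    molecules and a mean size of no more than max_mean_atoms atoms."""
    n = len(atom_counts)
    if n == 0:
        return True
    return n < max_molecules and sum(atom_counts) / n <= max_mean_atoms
```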

Hardware Implementation

AURAmol is compatible with hardware acceleration using PRESENCE-2 technology. PRESENCE-2 (PCI64-NP) is a unique PCI-based FPGA/DSP hardware platform for solving high-performance embedded computing problems. The card is targeted at applications that require large amounts of RAM and in which complex systems must be implemented alongside standard signal processing tasks.

The card uniquely supports up to 4 GB of RAM interfaced directly to the FPGA. In addition, the FPGA is presented directly to the PCI bus via a sophisticated bridge device. Extensive expansion capability is provided, including a dedicated LVDS multi-card link.

The card is available in 3, 4 or 6 million system-gate Virtex II FPGA versions. The FPGA interfaces to a fast 32-bit integer Digital Signal Processor, up to 4 GB of PC133 SDRAM, two independent fast Zero-Bus-Turnaround memories, dual high-speed 4-bit LVDS data channels, a Sundance SDB-compatible digital I/O header and a mezzanine expansion card connector. Because each on-board resource has an independent interface to the FPGA, the designer is free to implement bus structures of choice, rather than having the board impose a fixed wiring scheme. Additional host system resources (system memory, I/O devices etc.) are accessible via the PCI bus, with the card acting as bus master. Multiple off-board communication channels and digital I/O allow the hardware to be parallelised with other cards to support tightly coupled task-sharing configurations.

References

S. Klinger and J. Austin (2005), "Chemical Similarity Searching Using a Neural Graph Matcher", European Symposium on Artificial Neural Networks, pp. 479-484.

S. Klinger and J. Austin (2005), "A Neural Supergraph Matching Architecture", International Joint Conference on Neural Networks.

Cybula Ltd Contact: Aaron Turner, ACAG, Department of Computer Science,
the University of York, Heslington, York YO1 5GH, UK
e-mail:
tel. +44 (0)1904 325630