RNA Graph Classification Dataset

This database was compiled as part of a PhD project investigating the classification of RNA molecules using graph methods. The dataset comprises 3178 graphs representing RNA structures, each assigned one of eight broad classes. The original strands are extracted from the RNA Bricks database [1] and are processed into graphs by (a) creating a vertex for each RNA base, labelled by its residue type (‘A’, ‘C’, ‘G’, ‘U’ or ‘O’ = other); (b) creating an edge between consecutive bases in the RNA sequence; and (c) creating edges between paired bases in the strand. The base pairings were found using the 3DNA software ‘find-pairs’ [2]. The classes are curated from a variety of sources relating to each molecule. More details of the dataset are given in our paper below. The format of the data is described in the file ‘README.txt’. The database is freely provided for non-commercial research.
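
The graph construction described above can be illustrated with a minimal Python sketch. It is not the code used to build the dataset: the function name `build_rna_graph`, the 0-based indexing, the tuple representation of edges and the ‘backbone’/‘pair’ tags are all assumptions made for illustration, and the example sequence and base pairs are invented. The actual file format is the one described in ‘README.txt’.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not the dataset's actual format (see README.txt for that).
STANDARD_RESIDUES = {"A", "C", "G", "U"}


def build_rna_graph(sequence, base_pairs):
    """Build a labelled graph from an RNA strand.

    sequence   -- string with one residue code per base
    base_pairs -- iterable of (i, j) 0-based indices of paired bases
    Returns (labels, edges): a list of vertex labels and a set of
    (i, j, kind) tuples with i < j.
    """
    # (a) one vertex per base, labelled by residue type; anything that
    #     is not A, C, G or U is mapped to 'O' (other).
    labels = [r if r in STANDARD_RESIDUES else "O" for r in sequence.upper()]

    edges = set()
    # (b) an edge between consecutive bases in the sequence.
    for i in range(len(labels) - 1):
        edges.add((i, i + 1, "backbone"))
    # (c) an edge between each pair of paired bases.
    for i, j in base_pairs:
        a, b = sorted((i, j))
        edges.add((a, b, "pair"))
    return labels, edges


# Hypothetical 8-base strand with two base pairs.
labels, edges = build_rna_graph("GGCAXGCC", [(0, 7), (6, 1)])
print(labels)
print(sorted(edges))
```

In this sketch the edge kind is kept so that sequence edges and base-pair edges can be told apart; whether the dataset itself distinguishes them should be checked against ‘README.txt’.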

[1] G. Chojnowski, T. Walen, and J. M. Bujnicki, “RNA Bricks: a database of RNA 3D motifs and their interactions”, Nucleic Acids Research, vol. 42, no. D1, pp. D123-D131, 2014. [Online]. Available: http://dx.doi.org/10.1093/nar/gkt1084

[2] x3dna.org

Citation

If you use this data in your research, please cite the following paper:

Enes Algul and Richard C. Wilson, “A Database and Evaluation for Classification of RNA Molecules using Graph Methods”, Proc. GbR, 2019.

Resources

York RNA Graph Dataset.

Examples

Example structure of rna37 (1CQ5_A). Generated by RNApdbee 2.0.

Example structure of rna37 (1CQ5_A). Generated by Matlab molviewer.