Cybula Ltd provides a commercialisation route for ACAG research.
Text Matching Technologies
MinerTaur
Due to the proliferation of information in databases
and on the Internet, users are overwhelmed, leading to information
overload. It is impossible for humans to index and search such
a vast amount of information by hand, so automated indexing and searching
techniques are required. A method is needed to: process documents
without supervision and generate a multi-level, compact index; overcome
spelling mistakes in the user's query by suggesting alternative spellings
for the query terms; and finally, calculate query-to-document
similarities from statistics available in the text corpus.
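The spelling requirement can be illustrated with a simple n-gram matcher. The sketch below is a swapped-in, non-neural stand-in, not the AURA binary spell checker described in the publications. It assumes a small in-memory lexicon and ranks candidate words by the Dice coefficient of their padded character bigrams against the misspelt query term. The function names `bigrams` and `suggest` are hypothetical.

```python
def bigrams(word):
    """Character bigrams of a word, padded with '#' so the first and last letters count."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def suggest(query, lexicon, k=3):
    """Return the k lexicon words whose bigram sets best match the query (Dice coefficient)."""
    q = bigrams(query)
    scored = []
    for word in lexicon:
        w = bigrams(word)
        dice = 2 * len(q & w) / (len(q) + len(w))
        scored.append((dice, word))
    # Highest score first; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


# Hypothetical toy lexicon for illustration only.
lexicon = ["retrieval", "retrieve", "neural", "network", "thesaurus", "document"]
print(suggest("retreival", lexicon, k=2))  # ['retrieval', 'retrieve']
```

Bigrams tolerate a transposition such as "retreival" because most of the word's letter pairs survive the error. A practical checker would combine this evidence with other cues, such as phonetic codes.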
Specifically, we are incorporating the Information
Retrieval process into a modular Neural Network architecture [Hodge_THESIS,
Hodge_ESANN01,
Weeks+Hodge_PDP02, Weeks+Hodge_HPDC02].
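Integration aims to exploit the benefits of the incorporated techniques
whilst overcoming their respective limitations. Our system comprises
three modules:
- a spell checking pre-processor;
- a thesaurus;
- a word-document index, which provides the key for the Information Retrieval system.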
The system autonomously generates the modules from
unstructured textual information. We use the AURA
modular neural system for the spell checker and index, and we
employ our hierarchical, neural clustering algorithm to autonomously induce
a hierarchical thesaurus of synonym clusters from corpus statistics.
Each query word input by the user passes through each module in
turn.
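Inducing a thesaurus from corpus statistics can be sketched as clustering words by the contexts in which they occur. The code below swaps in plain average-linkage agglomerative clustering over co-occurrence counts. It is not the hierarchical neural clustering algorithm (TreeGCS) that the system uses, and the window size, toy corpus and function names are assumptions for illustration. Words that share neighbours are merged early, and the nested tuples record the resulting hierarchy.

```python
import math
from collections import defaultdict


def context_vectors(sentences, window=1):
    """Count, for every word, the words occurring within `window` positions of it."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, words):
    """Average-linkage agglomerative clustering; returns the merge tree as nested tuples."""
    clusters = [((w,), w) for w in words]  # (member words, subtree)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i][0], clusters[j][0]
                # Mean pairwise similarity between the members of the two clusters.
                sim = sum(cosine(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]


# Hypothetical toy corpus for illustration only.
corpus = [s.split() for s in [
    "the car drove fast", "the automobile drove fast",
    "the cat slept well", "the kitten slept well",
]]
vecs = context_vectors(corpus)
print(cluster(vecs, ["car", "automobile", "cat", "kitten"]))
# (('car', 'automobile'), ('cat', 'kitten'))
```

The tree groups "car" with "automobile" and "cat" with "kitten" before joining the two clusters, which gives a two-level synonym hierarchy.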
Publications
Thesis
- Victoria J. Hodge [Hodge_THESIS]. Integrating Information Retrieval & Neural Networks. PhD Thesis, Department of Computer Science, The University of York, Heslington, York, YO10 5DD, United Kingdom, 2001.
Journals, Proceedings, Reports
Unfortunately, copyright restrictions prevent some of my publications from being made available online. However, reprints are available on request.
- Victoria J. Hodge & Jim Austin [Hodge_TKDE03]. An Evaluation of Standard Spell Checking Algorithms and a Binary Neural Approach. IEEE Transactions on Knowledge and Data Engineering 15(5): pp. 1073–1081, IEEE Computer Society, Sept/Oct 2003. Full text available from White Rose Research Online.
- Victoria J. Hodge & Jim Austin [Hodge_PR02]. A Comparison of a Novel Spell Checker and Standard Spell Checking Algorithms. Pattern Recognition 35(11): pp. 2571–2580, Elsevier Science, 2002. Full text available from Elsevier Science Journals (pdf).
- Victoria J. Hodge & Jim Austin [Hodge_NC02]. Hierarchical Word Clustering – Automatic Thesaurus Generation. Neurocomputing 48(1–4): pp. 819–846, Elsevier Science, 2002. Full text available from Elsevier Science Journals (pdf).
- M. Weeks, Victoria J. Hodge & Jim Austin [Weeks+Hodge_HPDC02]. Scalability of a Distributed Neural Information Retrieval System. Accepted for presentation at HPDC-2002, 11th IEEE International Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 24–26, 2002.
- M. Weeks, Victoria J. Hodge & Jim Austin [Weeks+Hodge_PDP02]. A Hardware Accelerated Novel IR System. In Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing (PDP-2002), Las Palmas de Gran Canaria, Canary Islands, January 9–11, 2002. IEEE Computer Society, Los Alamitos, CA.
- Victoria J. Hodge & Jim Austin [Hodge_TR01]. An Evaluation of Phonetic Spell Checkers. Technical Report YCS 338 (2001), Department of Computer Science, University of York.
- Victoria J. Hodge & Jim Austin [Hodge_ICANN01]. A Novel Binary Spell Checker. In Proceedings of the International Conference on Artificial Neural Networks (ICANN'2001), Vienna, Austria, 25–29 August 2001. Dorffner, Bischof & Hornik (Eds), Lecture Notes in Computer Science (LNCS) 2130, Springer Verlag, Berlin.
- Victoria J. Hodge & Jim Austin [Hodge_ESANN01]. An Integrated Neural IR System. In M. Verleysen (Ed.), Proceedings of the 9th European Symposium on Artificial Neural Networks (ESANN'2001), Bruges, Belgium, 25–27 April 2001, D-Facto public., ISBN 2-930307-01-3, pp. 265–270.
- Victoria J. Hodge & Jim Austin [Hodge_NN01]. An Evaluation of Standard Retrieval Algorithms and a Binary Neural Approach. Neural Networks 14(3): pp. 287–303, Elsevier Science, 2001. Full text available from Elsevier Science – Neural Networks (pdf).
- Victoria J. Hodge & Jim Austin [Hodge_TKDE01]. Hierarchical Growing Cell Structures: TreeGCS. IEEE Transactions on Knowledge and Data Engineering, Special Issue on Connectionist Models for Learning in Structured Domains, 13(2): pp. 207–218, 2001. Full text available from White Rose Research Online.
- Victoria J. Hodge & Jim Austin [Hodge_KES00]. Hierarchical Growing Cell Structures: TreeGCS. In Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems (KES'2000), Brighton, UK, August 30th to September 1st, 2000.
- Victoria J. Hodge & Jim Austin [Hodge_IJCNN00]. An Evaluation of Standard Retrieval Algorithms and a Weightless Neural Approach. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'2000), Italy, 24–27 July 2000.
MinerTaur: AURA Implementation
The system comprises three modules: a spell checking pre-processor, a thesaurus, and an indexing data structure, the word-document index, which provides the key for the Information Retrieval system. The modules are generated autonomously from unstructured textual information: the AURA modular neural system provides the spell checker and the index, and the hierarchical thesaurus of synonym clusters is induced from corpus statistics. Each query word input by the user passes through each module in turn, and the word-document index then identifies and ranks the matching documents using rapid, efficient searching.
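The word-document index can be pictured as a binary correlation matrix memory (CMM). Each word has a row, each document has a column, and a bit is set wherever the word occurs in the document. The pure-Python sketch below assumes one-hot codes for both words and documents and ranks documents by how many query words they match. It is a simplified illustration of the idea, not the AURA library or its API, and the class name `BinaryCMM` is hypothetical.

```python
class BinaryCMM:
    """Hypothetical, simplified binary CMM used as a word-document index.

    Assumes one-hot codes: each word owns one row and each document one column,
    so training sets bit (word, doc) and recall sums the rows of the query words.
    """

    def __init__(self):
        self.rows = {}  # word -> int used as a bit row over document columns
        self.docs = []  # column index -> document id

    def train(self, doc_id, words):
        col = len(self.docs)
        self.docs.append(doc_id)
        for w in set(words):
            # Binary (OR) weight update: a bit is set, never incremented.
            self.rows[w] = self.rows.get(w, 0) | (1 << col)

    def recall(self, query):
        """Return (doc_id, matches) for documents sharing at least one query word, best first."""
        counts = [0] * len(self.docs)
        for w in set(query):
            row = self.rows.get(w, 0)
            for col in range(len(self.docs)):
                if (row >> col) & 1:
                    counts[col] += 1
        hits = [(self.docs[c], n) for c, n in enumerate(counts) if n > 0]
        # Rank by number of matching query words; ties broken by document id.
        return sorted(hits, key=lambda h: (-h[1], h[0]))


# Hypothetical toy documents for illustration only.
cmm = BinaryCMM()
cmm.train("d1", ["neural", "network", "retrieval"])
cmm.train("d2", ["spell", "checker", "neural"])
cmm.train("d3", ["thesaurus", "clustering"])
print(cmm.recall(["neural", "retrieval"]))  # [('d1', 2), ('d2', 1)]
```

Because the weights are single bits set by OR, storing the same word-document pair twice changes nothing. Recall only needs to sum the rows selected by the query, which is what makes this kind of index cheap to search.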