The FEDAURA Project - Objectives

The major objectives of the FEDAURA Project are:

Establish a baseline by evaluating currently used detection methods.
Identify standards and produce an evaluation framework for fraud detection systems.
Develop and evaluate a selected range of new techniques using AURA technology.
Show that the AURA methods can scale to process large-scale data sets within the demanding processing time limits exemplified by the DWP fraud application.
Evaluate other Neural Network, Statistical and Data Mining methods for fraud detection and compare these with AURA-based methods for similar tasks.
Develop a framework for using the DWP data based on Case-Based Reasoning.
Assess the accuracy of the AURA technology in identifying anomalies (e.g. fraud).
Show how the technology can be incorporated into existing DWP systems operated by EDS and others.

The primary benefit of the project will be a reduction in benefit fraud, achieved through a follow-on implementation of the technology in an operational environment. The project will also allow a new technology to be transferred to leading software consultancy companies and developed ready for market. The technology is applicable in many similar fraud environments, including insurance, banking and e-business.

The AURA techniques developed at York have demonstrated powerful pattern-matching abilities in a range of domains. Two major features of the technology are:

its ability to perform an initial fast (but approximate) search followed by a more detailed analysis; and
its rapid, one-pass training.

The initial fast search removes all data that is clearly inapplicable to the problem, leaving a small residual data set that can be analysed by very powerful, but possibly slower, methods.
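The two-stage idea described above can be illustrated with a minimal sketch. It is not the AURA implementation: the toy case base, the feature tokens, the overlap threshold and the choice of Jaccard similarity for the detailed stage are all assumptions made purely for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy case base: case id -> set of binary feature tokens.
CASES: Dict[str, FrozenSet[str]] = {
    "c1": frozenset({"postcode:YO10", "benefit:housing", "age:30-39"}),
    "c2": frozenset({"postcode:YO10", "benefit:housing", "age:40-49"}),
    "c3": frozenset({"postcode:LS1", "benefit:income", "age:20-29"}),
}


def coarse_filter(query: FrozenSet[str],
                  cases: Dict[str, FrozenSet[str]],
                  min_overlap: int) -> List[str]:
    """Stage 1 (fast, approximate): keep cases sharing enough tokens with the query."""
    return [cid for cid, feats in cases.items()
            if len(query & feats) >= min_overlap]


def detailed_score(query: FrozenSet[str], feats: FrozenSet[str]) -> float:
    """Stage 2 (slower, more detailed): Jaccard similarity, an assumed stand-in."""
    union = query | feats
    return len(query & feats) / len(union) if union else 0.0


def two_stage_search(query: FrozenSet[str],
                     cases: Dict[str, FrozenSet[str]],
                     min_overlap: int = 2) -> List[Tuple[str, float]]:
    """Filter cheaply first, then score only the small residual set."""
    survivors = coarse_filter(query, cases, min_overlap)
    scored = [(cid, detailed_score(query, cases[cid])) for cid in survivors]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    new_claim = frozenset({"postcode:YO10", "benefit:housing", "age:30-39"})
    print(two_stage_search(new_claim, CASES))
    # -> [('c1', 1.0), ('c2', 0.5)]
```

In this sketch the detailed scoring runs only on the cases that survive the cheap filter, which is the property the text relies on when it refers to a small residual data set.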

A framework is needed to apply AURA within the problem domain of fraud identification. The research intends to explore the use of a Case-Based Reasoning (CBR) framework, one of the major Machine Learning frameworks for pattern matching. CBR provides a methodology for identification and updating of cases. The main advantages of CBR are:

it makes explicit the information within each case, facilitating explanation;
it provides for on-line updating of information; and
it allows case updating.

As a basis for applying the AURA technology to benefit claim fraud identification, CBR appears to offer a suitable and very promising framework within which to structure the problem. Essentially, the data concerning each individual submitting a claim is considered to form a case, with accumulated historical claims data being matched against any new benefit claim. However, with very large numbers of cases, run-times can become a serious problem.
"Case-based reasoning will be ready for large-scale problems only when retrieval algorithms are efficient at handling thousands of cases." (I. Watson and F. Marir, 1994, Case-Based Reasoning: A Review, The Knowledge Engineering Review, Vol. 9, No. 4.)

The DWP application is at least an order of magnitude larger than this in terms of case numbers. The difficulty arises because case retrieval has time complexity linear in the number of cases (as opposed to neural network approaches, which achieve fast run times at the expense of lengthy training times). With over 5 million new cases per year, the pattern-matching problem can be immense.

The major issues to be investigated involve how AURA can be used for case matching, how cases can be updated and how explanations can be produced from such a system. These issues are discussed in more detail below:

(1) Case matching

The project will build on previous research into methods of using AURA for k-NN classification. In the DWP application, AURA will be used to identify a suitably small subset of the k most similar benefit claims from a database of known fraudulent and non-fraudulent claims. AURA will rapidly identify these, permitting a statistical analysis to assess similarity and ultimately the risk that a claim is fraudulent. The key feature within AURA is that the class-density modelling is done on-line using the subset of data returned. This allows the case-base to be continually updated while in use. Other methods typically perform the probabilistic modelling off-line, so that updates become a time-consuming process. Some inaccuracies in the modelling may occur, but we expect their impact to be negligible when compared to the gains in training speed and usability.
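As a rough illustration of the k-NN step, the sketch below retrieves the k stored claims closest to a new claim and estimates risk as the fraction of those neighbours known to be fraudulent. The binary encoding, the Hamming distance and the simple fraction used as a risk estimate are assumptions for illustration only; the text does not specify the statistical analysis the project will use.

```python
from typing import List, Sequence, Tuple

# A case is (binary feature vector, known-fraud label); this encoding is hypothetical.
Case = Tuple[Sequence[int], bool]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_fraud_risk(query: Sequence[int], case_base: List[Case], k: int = 3) -> float:
    """Estimate risk from the k nearest stored cases only.

    The estimate is computed at query time from the returned subset, so
    appending a case to case_base takes effect on the next query.
    """
    nearest = sorted(case_base, key=lambda case: hamming(query, case[0]))[:k]
    if not nearest:
        return 0.0
    return sum(1 for _, is_fraud in nearest if is_fraud) / len(nearest)


if __name__ == "__main__":
    case_base: List[Case] = [
        ([1, 1, 0, 0], True),
        ([1, 1, 1, 0], True),
        ([0, 0, 1, 1], False),
        ([1, 0, 0, 0], False),
    ]
    print(knn_fraud_risk([1, 1, 0, 0], case_base, k=3))
```

Because nothing is fitted in advance, there is no separate retraining step in this sketch; the linear scan used here for retrieval is the part that a fast matching engine would replace.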

This approach is only possible because AURA allows very fast access to the large number of cases. The approach is highly novel and is clearly applicable widely beyond this project.
In addition to the simple one-stage process described above, more complex relationships may be identified through repeated searching of the case-base, thus allowing a form of on-line reasoning to be attempted. For example, relatives of a client (claimant) may also be claiming benefits, and these would need to be identified by a further search.

To be successful, the approach taken must exploit existing knowledge of fraud methods identified by DWP staff. The project will identify ways to incorporate this information within the matching framework in conjunction with the similarity-based search provided by the main AURA matching engine. The research will assess whether this knowledge can be coded into the system as rules. Alternatively, conventional expert systems may be considered.

To be effective, the approach must scale with the large number of cases. Implementing such a large application requires very large resources. Sun Microsystems have joined the project to provide the computing infrastructure needed for the work; they will be donating a 16-processor "Throughput Engine" and the necessary storage for the project, to be used in addition to the extensive facilities at the University of York. Assessment of the system on a commercial platform will be achieved through the support provided by Sun Microsystems.

(2) Adding new cases

Case update must allow new claims to be added to the system. Cases are encoded and stored as individual items in the AURA system. The simplicity of this operation is one of the main strengths of the proposed approach. Rule data extracted from experts will supplement this within the inference engine. It is not intended to look at methods for extracting rules automatically from the data, as this would exceed the time available for the project, but such methods may be considered if time becomes available. All new knowledge will be based on actual claims and expert domain knowledge.
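The one-pass nature of adding a case can be sketched with a small binary matrix store, in the spirit of a binary correlation matrix memory. The text does not describe how AURA encodes or stores cases, so the class name BinaryCMM, its methods and the one-column-per-case layout are hypothetical choices made for illustration.

```python
from typing import List, Sequence


class BinaryCMM:
    """Toy binary matrix store: one row per input bit, one column per stored case."""

    def __init__(self, n_inputs: int, n_cases: int) -> None:
        self.n_inputs = n_inputs
        self.n_cases = n_cases
        self.weights: List[List[int]] = [[0] * n_cases for _ in range(n_inputs)]
        self.stored = 0

    def add_case(self, features: Sequence[int]) -> int:
        """Store a case in a single pass by setting bits; no iterative training."""
        if len(features) != self.n_inputs:
            raise ValueError("feature vector has the wrong length")
        if self.stored >= self.n_cases:
            raise ValueError("memory is full")
        column = self.stored
        for row, bit in enumerate(features):
            if bit:
                self.weights[row][column] = 1
        self.stored += 1
        return column

    def match_counts(self, query: Sequence[int]) -> List[int]:
        """For each stored case, count the active query bits it shares."""
        counts = [0] * self.stored
        for row, bit in enumerate(query):
            if bit:
                for column in range(self.stored):
                    counts[column] += self.weights[row][column]
        return counts


if __name__ == "__main__":
    memory = BinaryCMM(n_inputs=4, n_cases=3)
    memory.add_case([1, 1, 0, 0])
    memory.add_case([0, 1, 1, 0])
    print(memory.match_counts([1, 1, 0, 0]))  # -> [2, 1]
    memory.add_case([1, 0, 0, 1])             # available to the very next query
    print(memory.match_counts([1, 1, 0, 0]))  # -> [2, 1, 1]
```

Adding a case in this sketch touches only the bits for that case and leaves all previously stored cases unchanged, which is why updates can happen while the store is in use.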

(3) Explanation

As identified in the problem requirements, the system must have an explanation capability. The use of cases provides a possible means to achieve this. In the case-matching system, the AURA methods will provide a subset of cases that are "close" to the new claim. These cases will be analysed to determine whether there is a high likelihood of the claim being fraudulent. The same set of cases can then be used to present an explanation. In its simplest form, this could be presented just as a set of cases, to be used to determine the claim status. However, some summarising of the data can be undertaken to make this more reportable. In addition, information from the rule-based system will be incorporated (for example, those rules active in identifying possible fraud).
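One possible shape for such a summarised explanation is sketched below. The record layout (dictionaries with "id", "features" and "fraud" keys), the explain function and the rule names are hypothetical; the sketch only shows how matched cases and active rules could be condensed into a short report.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def explain(query_features: Set[str],
            matched_cases: List[Dict],
            fired_rules: Iterable[str] = ()) -> Dict:
    """Summarise the matched cases behind a decision.

    Reports how many matched cases were fraudulent, which of the new
    claim's features they share, and which rules were active.
    """
    fraud_cases = [case for case in matched_cases if case["fraud"]]
    shared = Counter()
    for case in fraud_cases:
        shared.update(query_features & case["features"])
    # Sort by descending count, then by name, so the report is deterministic.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return {
        "matched": len(matched_cases),
        "fraudulent": len(fraud_cases),
        "supporting_case_ids": [case["id"] for case in fraud_cases],
        "shared_with_fraud_cases": ranked,
        "rules": list(fired_rules),
    }


if __name__ == "__main__":
    matched = [
        {"id": "c1", "features": {"a", "b"}, "fraud": True},
        {"id": "c2", "features": {"a", "d"}, "fraud": True},
        {"id": "c3", "features": {"c"}, "fraud": False},
    ]
    print(explain({"a", "b", "c"}, matched, fired_rules=["rule-7"]))
```

The supporting case identifiers keep the explanation traceable back to the individual cases, while the ranked feature list gives the more reportable summary.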

Evaluation of neural network and other methods

To complement the work based on AURA, the project will build on the work undertaken at Sema into the use of MLP neural networks for the identification of fraud. Although AURA-based methods have the advantages of speed, on-line updating and an ability to explain how the classification was achieved, they can suffer from reduced accuracy compared to other neural, machine learning and statistical methods that may iteratively search for the best model of the data. The work will allow us to answer the question:
"How much accuracy is lost by using AURA-based methods?"
That is, what is the cost of the speed and flexibility offered by the AURA methods?

The combination of CBR and AURA methods is an exciting possibility and may allow CBR systems to be built that scale to very large problems, such as the one addressed in the current work. The work also aims to show that certain classes of neural network based systems (such as AURA) can be used on very large symbolic problems. A number of issues need to be addressed to show the benefits of this coupling, and these form the basis of the research to be undertaken.