Machine Learning

Machine Learning research at York focuses on statistical relational learning, reinforcement learning, Bayesian networks and other graphical models, and natural language learning.

Bayesian networks and other graphical models

Graphical models use graphs to represent probabilistic interactions between variables modelling a given situation. The most popular sort of graphical model is a Bayesian network, which uses a directed graph; in certain circumstances Bayesian networks can represent causal relations between variables. A key problem is how to obtain a graphical model in the first place. Machine learning offers a solution: graphical models are inferred from data drawn from the situation being modelled. For this task we have developed the GOBNILP software for exact Bayesian network learning.
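To give a flavour of score-based structure learning (the approach exact learners such as GOBNILP build on), the sketch below brute-forces every DAG over three binary variables and returns the one with the highest BIC score. This is a minimal toy, not GOBNILP's algorithm: the dataset, names and exhaustive search are illustrative assumptions, and real exact solvers use far cleverer search.

```python
import itertools
import math
from collections import Counter

def bic_score(data, parents):
    """BIC score of a DAG, given as {node: tuple of parents}, on binary data."""
    n = len(data)
    score = 0.0
    for child, pa in parents.items():
        counts, pa_counts = Counter(), Counter()
        for row in data:
            key = tuple(row[p] for p in pa)
            counts[key, row[child]] += 1
            pa_counts[key] += 1
        for (key, _), c in counts.items():         # log-likelihood term
            score += c * math.log(c / pa_counts[key])
        score -= 0.5 * math.log(n) * 2 ** len(pa)  # complexity penalty
    return score

def is_acyclic(parents):
    state = {}
    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "visiting":
            return False
        state[v] = "visiting"
        ok = all(visit(p) for p in parents[v])
        if ok:
            state[v] = "done"
        return ok
    return all(visit(v) for v in parents)

def all_dags(nodes):
    """Every assignment of parent sets to nodes that forms a DAG."""
    def subsets(xs):
        return itertools.chain.from_iterable(
            itertools.combinations(xs, k) for k in range(len(xs) + 1))
    options = [list(subsets([m for m in nodes if m != v])) for v in nodes]
    for choice in itertools.product(*options):
        parents = dict(zip(nodes, choice))
        if is_acyclic(parents):
            yield parents

# Toy data: variable 1 copies variable 0; variable 2 is independent of both.
data = [(a, a, b) for a in (0, 1) for b in (0, 1) for _ in range(50)]

best = max(all_dags((0, 1, 2)), key=lambda g: bic_score(data, g))
# The highest-scoring DAG links 0 and 1 and leaves 2 unconnected.
```

The BIC penalty is what stops the search from simply connecting everything: an extra parent doubles a node's parameter count, so an edge survives only when the data support it.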

Recent publications in Bayesian networks and other graphical models

Research topics in Bayesian networks and other graphical models

Graphical model learning using constrained optimisation algorithms

A common approach to graphical model learning is to define a score measuring how well any given graphical model is supported by observed data. The problem then reduces to finding a graphical model with the highest score, which is known to be NP-hard. This motivates using optimisation approaches such as integer linear programming or weighted MAX-SAT solvers to perform the search. An MRC project, 'A graphical model approach to pedigree construction using constrained optimisation', started in October 2011. This is a joint project with Leicester and Bristol which uses integer linear programming to induce Bayesian networks representing 'pedigrees' (family trees).
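In outline, GOBNILP-style integer linear programs use a family-variable encoding: a binary variable x_{v,W} for each candidate parent set W of each node v, with local scores c(v,W) precomputed from the data. A sketch of the formulation (details and constraint tightenings vary across papers):

```latex
\begin{align*}
\max \quad & \sum_{v \in V} \sum_{W \subseteq V \setminus \{v\}} c(v, W)\, x_{v,W} \\
\text{s.t.} \quad & \sum_{W \subseteq V \setminus \{v\}} x_{v,W} = 1
  && \forall v \in V \quad \text{(each node picks exactly one parent set)} \\
& \sum_{v \in C} \; \sum_{W :\, W \cap C = \emptyset} x_{v,W} \ge 1
  && \forall C \subseteq V,\ |C| \ge 2 \quad \text{(cluster constraints rule out cycles)} \\
& x_{v,W} \in \{0, 1\}
\end{align*}
```

The cluster constraints say that in every subset of nodes, at least one node takes all its parents from outside the subset, which is exactly the condition for acyclicity; in practice they are added lazily as cutting planes rather than enumerated up front.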

Graphical models for worst-case execution time

In collaboration with the Real-time systems group, we have been learning Bayesian networks (BNs) to model worst-case execution times. A particular focus has been on modelling cache hits and misses.

Students working on Bayesian networks and other graphical models

Statistical relational learning

Statistical relational learning (SRL) is a branch of machine learning which aims to combine the strengths of probabilistic and relational representations. At York the focus is on first-order logic to represent relations, and so our brand of SRL is probabilistic inductive logic programming (PILP).

Despite the great current interest in SRL, there remains considerable debate about how best to achieve this integration. One useful avenue of research is to approach it from a Bayesian point of view, since this reduces statistical inference to probabilistic inference.
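A small illustration of that reduction (a hypothetical toy, not any of the group's systems): treating an unknown parameter as just another random variable turns parameter estimation, a statistical inference problem, into ordinary conditioning, a probabilistic inference problem. Here a coin's bias is inferred by discretising the parameter and conditioning on observed flips.

```python
# Grid approximation of a Beta-Bernoulli model: the parameter theta becomes
# a random variable, and "learning" is just Bayes' rule over the grid.
thetas = [i / 100 for i in range(1, 100)]   # candidate coin biases
prior = [1 / len(thetas)] * len(thetas)     # uniform prior over theta

data = [1, 1, 0, 1, 1, 1, 0, 1]             # observed flips: 6 heads, 2 tails

def posterior(prior, thetas, data):
    """Condition on the data: P(theta | data) is proportional to
    P(data | theta) * P(theta)."""
    weights = []
    for p, t in zip(prior, thetas):
        lik = 1.0
        for x in data:
            lik *= t if x == 1 else (1 - t)
        weights.append(p * lik)
    z = sum(weights)
    return [w / z for w in weights]

post = posterior(prior, thetas, data)
mean = sum(t * p for t, p in zip(thetas, post))   # posterior mean, about 0.7
```

With a uniform prior this grid approximates a Beta(7, 3) posterior, whose mean is 0.7; the point is only that no special-purpose estimator is needed once the parameter sits inside the probabilistic model.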

Publications in SRL

Research topics in SRL

Using algebraic statistics to analyse conditional independence in SRL formalisms

Graphical models such as Bayesian networks represent relations of conditional independence via the structure of a graph. SRL formalisms, such as the PRISM system, are capable of representing more complex conditional independence relations, and so graphs aren't enough. One promising avenue is to use algebraic statistics to analyse these more complex relations. Algebraic statistics is a sub-branch of algebraic geometry, so a nice bonus is the geometrical intuitions which it provides.
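To make the algebraic view concrete, here is a toy sketch (names and numbers invented for illustration): for binary variables, the statement X ⊥ Y | Z is equivalent to a polynomial constraint on the joint distribution, namely the vanishing of the 2×2 determinants of the fixed-z slices of P(x, y, z), which is exactly the kind of determinantal variety that algebraic statistics studies.

```python
import itertools

# A joint distribution over binary (X, Y, Z) factorised as P(Z) P(X|Z) P(Y|Z),
# which makes X and Y conditionally independent given Z by construction.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def x_indep_y_given_z(joint, tol=1e-12):
    """X is independent of Y given Z iff each fixed-z slice of P(x, y, z),
    viewed as a 2x2 matrix, has rank 1 -- i.e. its determinant (a polynomial
    in the probabilities) vanishes."""
    for z in (0, 1):
        det = (joint[0, 0, z] * joint[1, 1, z]
               - joint[0, 1, z] * joint[1, 0, z])
        if abs(det) > tol:
            return False
    return True

broken = dict(joint)
broken[0, 0, 0] += 0.05   # perturbing one cell destroys the independence
```

The geometrical intuition mentioned above comes from reading such determinants as defining a surface in the space of distributions: models satisfying the independence lie on the surface, and formalisms like PRISM can impose constraints that no single graph, only a larger system of polynomials, can express.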

Fast implementations for SRL

SRL is a computationally demanding form of machine learning requiring both symbolic and numerical computation. An interesting route would be to implement SRL algorithms in the Mercury language. Roughly speaking, Mercury is strongly-typed Prolog which compiles to C (and thus to native code). Encouraging results have already been achieved for non-probabilistic ILP.

Reinforcement learning

Reinforcement learning (RL) is a highly popular machine learning technique, mainly due to its natural fit to the agent paradigm (i.e. learning by repeatedly acting and sensing in an environment) and its resulting wide application potential. Despite these advantages, RL suffers from scalability problems which have prevented its successful use in many complex real-world domains. Our research focuses on knowledge-based reinforcement learning, which is concerned with the use of domain knowledge to scale up and improve reinforcement learning and to support transfer learning. Conversely, reinforcement learning is also used to revise domain knowledge. Our research considers both single-agent and multi-agent learning. Our application areas include mobile sensor management, robotics, and network intrusion detection and prevention. The research has been partially sponsored by QinetiQ and the MoD.
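As a flavour of how domain knowledge can be injected into RL, the toy below (invented for illustration, not the group's code) uses potential-based reward shaping, a standard way to add a heuristic "closer to the goal is better" signal without changing the optimal policy, on a five-state corridor learned with tabular Q-learning.

```python
import random

# Five-state corridor: start at state 0, reward 1 for reaching state 4.
# Potential-based shaping adds F(s, s') = gamma * phi(s') - phi(s) to the
# reward; any such term provably leaves the optimal policy unchanged.
N, GOAL, GAMMA, ALPHA, EPS = 5, 4, 0.9, 0.5, 0.1
phi = lambda s: s          # heuristic potential encoding domain knowledge

def step(s, a):            # actions: -1 (left), +1 (right)
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def train(shaping, episodes=300, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < EPS:                       # explore
                a = rng.choice((-1, 1))
            else:                                        # exploit
                a = max((-1, 1), key=lambda a: q[s, a])
            s2, r = step(s, a)
            if shaping:
                r += GAMMA * phi(s2) - phi(s)            # inject knowledge
            q[s, a] += ALPHA * (r + GAMMA * max(q[s2, -1], q[s2, 1]) - q[s, a])
            s = s2
    return q

q = train(shaping=True)
# Greedy policy should move right in every state short of the goal.
policy = {s: max((-1, 1), key=lambda a: q[s, a]) for s in range(GOAL)}
```

The shaping term gives informative feedback on every step rather than only at the goal, which is the basic mechanism by which heuristic knowledge speeds up learning in much larger state spaces.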

Students working on reinforcement learning

Natural language learning

Please see our page on natural language processing for information on machine learning of natural language.