Reinforcement Learning Research | Department of Computer Science, University of York


Knowledge-Based Reinforcement Learning

One promising way to speed up the reinforcement learning process and improve the results is to exploit expert knowledge about the respective application domain. For many real-world tasks, human expert knowledge is available. However, this expert knowledge is often rough and incomplete. When the expert knowledge is used directly (e.g. in the form of pre-programmed behaviour rules), the solutions are often sub-optimal, due to the incompleteness of the knowledge, the uncertainty of environments, and the possibility of encountering unexpected situations.

Our research on Knowledge-based RL focuses on three areas where expert knowledge and RL can help each other:

  1. Knowing general procedures for solving tasks in an application (or even parts of such procedures) will help the RL agent to focus its exploration of the environment on more promising avenues for task solution. Thus, the learning process is sped up, as the agent does not waste time on obviously bad action choices.
  2. While the RL agent is exploring the environment, it is gaining experience that will potentially uncover mistakes and weaknesses in the expert domain knowledge. Therefore, the RL agent can help to revise this general knowledge and inform experts of potential weaknesses in their high-level procedures and heuristics.
  3. Often, an agent needs to solve tasks in more than one application scenario, where the environments and goals differ to various degrees. In these cases, it is beneficial to transfer experience, rather than start learning from scratch. Domain knowledge, being general and heuristic in nature, is very suitable for use in a new application scenario and can help the agent learn more quickly and produce better results.
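
One common way to feed heuristic domain knowledge into an RL agent, and the technique named in several of the publications listed below, is potential-based reward shaping: the agent receives an extra reward F(s, s') = γΦ(s') − Φ(s), where the potential Φ encodes how promising the expert considers each state. The following is only a minimal sketch of that idea and not the group's implementation. The corridor task, the `potential` heuristic, and the values of `GAMMA` and `GOAL` are assumptions made for illustration.

```python
GAMMA = 0.9   # discount factor (assumed value)
GOAL = 9      # hypothetical corridor with states 0..9 and the goal at state 9


def potential(state: int) -> float:
    """Hypothetical expert heuristic: higher potential closer to the goal."""
    return -float(abs(GOAL - state))


def shaping_reward(state: int, next_state: int, gamma: float = GAMMA) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * potential(next_state) - potential(state)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=GAMMA):
    """One tabular Q-learning update on the shaped reward r + F(s, s').

    Unseen (state, action) pairs default to 0.0, so a terminal next_state
    that is never updated contributes nothing to the bootstrap term.
    """
    shaped = reward + shaping_reward(state, next_state, gamma)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (shaped + gamma * best_next - old)
    return q[(state, action)]


# A step towards the goal earns a larger shaping bonus than a step away.
towards = shaping_reward(4, 5)
away = shaping_reward(4, 3)
```

Because the extra reward is a difference of potentials, it steers exploration towards states the expert rates highly without changing which policies are optimal in the single-agent case, so rough or partly wrong knowledge can slow learning down but does not alter the task being solved.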

Our work studies both single-agent and multi-agent domains.

Selected Publications:

  • S. Devlin, M. Grzes, D. Kudenko (2011): "An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems", Advances in Complex Systems 14(2).
  • S. Devlin, D. Kudenko (2011): "Theoretical Considerations of Reward Shaping for Multi-Agent Systems", Tenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
  • M. Grzes, D. Kudenko (2010): "PAC-MDP Learning with Knowledge-based Admissible Models", Ninth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
  • M. Grzes, D. Kudenko (2010): "Online Learning of Shaping Rewards in Reinforcement Learning", Neural Networks 23(4).
  • M. Grzes, D. Kudenko (2008): "Plan-based reward shaping for reinforcement learning", Fourth IEEE International Conference on Intelligent Systems (IS), IEEE.


Network Security

Current research involves the application of reinforcement learning (RL) for intrusion response. Specifically, we investigate the application of multi-agent reinforcement learning (MARL) for a distributed defence against distributed denial-of-service (DDoS) attacks.

Previous work within the group involved the application of RL for intrusion detection. A novel MARL approach was proposed for anomaly-based DDoS detection. More specifically, distributed network agents cooperate in a hierarchical way in order to learn to identify flooding DDoS intrusions by distinguishing normal from abnormal network states.

  1. A. Servin, D. Kudenko (2008): "Multi-Agent Reinforcement Learning for Intrusion Detection: A case study and evaluation", Sixth German Conference on Multi-Agent System Technologies (MATES).
  2. A. Servin, D. Kudenko (2008): "Multi-Agent Reinforcement Learning for Intrusion Detection: A case study and evaluation", Eighth Workshop on Adaptive Agents and Multi-Agent Systems (ALAMAS-ALAg).
  3. A. Servin, D. Kudenko (2007): "Multi-Agent Reinforcement Learning for Intrusion Detection", Seventh Symposium on Adaptive and Learning Agents and Multi-Agent Systems (ALAMAS), also in Springer LNAI 4865.

Distributed Mobile Sensor Management

The Distributed Mobile Sensor Management (DMSM) software provides a stochastic simulation of a realistic problem in sensor domains. It simulates a moving sensor and a set of points of interest (POIs), with the goal of optimizing the sensor's coverage of the POIs. The simulation is representative of the following research problem:

'What is a near-optimal steering control law for a gimballed sensor, to view a number of widely spread objects of uncertain location, when the sensor is moving and has limited range?'

Current research involves:

  • Application of reinforcement learning algorithms.
  • Impact of different reward functions, e.g. preferring objects that are closer.
  • Impact of modifying the dynamics of the sensor control.
  • Impact of altering the number of POIs and their speed.
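
As a rough illustration of what "preferring objects that are closer" could mean for a reward function, the sketch below scores one time step of a simulation. It is not taken from the DMSM software: the function name `coverage_reward`, the 2-D coordinates, the circular range model, and the linear distance weighting are all assumptions made for this example.

```python
import math


def coverage_reward(sensor_xy, pois, max_range, prefer_closer=True):
    """Hypothetical per-step reward: sum over POIs inside the sensor's range.

    With prefer_closer=True each covered POI contributes 1 - d / max_range,
    so nearer POIs count for more; otherwise every covered POI contributes 1.
    """
    total = 0.0
    for poi in pois:
        d = math.dist(sensor_xy, poi)  # Euclidean distance, Python 3.8+
        if d <= max_range:
            total += (1.0 - d / max_range) if prefer_closer else 1.0
    return total


# Example: two POIs in range (at distances 5 and 1), one out of range.
pois = [(3.0, 4.0), (0.0, 1.0), (10.0, 10.0)]
weighted = coverage_reward((0.0, 0.0), pois, max_range=10.0)
uniform = coverage_reward((0.0, 0.0), pois, max_range=10.0,
                          prefer_closer=False)
```

Comparing the weighted and uniform variants on the same scenario is one simple way to study how the choice of reward function changes the steering behaviour an agent learns.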

Robotic Soccer

The RoboCup initiative aims to advance the development of intelligent robots through encouraging cross-discipline competition in a number of exciting areas ranging from football/soccer to search and rescue.

Our research has focussed on designing agents to play Keepaway and in doing so has generated novel contributions to reinforcement learning in partially observable problem domains and multi-agent reward shaping.

  1. Devlin S., Grzes M., and Kudenko D.: An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems. In Advances in Complex Systems (ACS), World Scientific Publishing Co. Pte. Ltd.
  2. Devlin S., Grzes M., and Kudenko D.: Reinforcement Learning in RoboCup KeepAway with Partial Observability. In Proceedings of the IEEE/WIC/ACM International Conferences on Intelligent Agent Technology (IAT 2009), Milan, Italy, 2009. IEEE Computer Society.

Student Projects:

We have had many students doing final-year projects in the area of RL. You can find the complete list of project topics here (LINK TO APPEAR SOON).