These are the annual awards for the best papers produced by group members.
There is one award for the best paper with a student as first author and one award for the best paper with an RA as first author.
Reinforcement learning suffers from scalability problems due to state-space explosion and the temporal credit assignment problem. Knowledge-based approaches have received significant attention in this area. Reward shaping is one approach to incorporating domain knowledge into reinforcement learning. The theoretical and empirical analysis in this paper reveals important properties of this principle, in particular the influence on performance of the reward type, the MDP discount factor, and the way the potential function is evaluated.
This paper addresses an important question, speeding up the convergence of reinforcement learning (RL), and does so with extensive theoretical and empirical analysis. The writing is also concise and clear.
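To make the reward-shaping principle concrete, here is a minimal sketch of potential-based shaping on a toy ten-state chain. The shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, a form known to leave the optimal policy unchanged; the chain layout and the potential function `phi` are illustrative assumptions, not taken from the paper.

```python
GOAL = 9      # terminal state of a 10-state chain (illustrative)
GAMMA = 0.99  # MDP discount factor

def phi(state):
    """Heuristic potential encoding domain knowledge: closeness to the goal."""
    return -abs(GOAL - state)

def shaped_reward(state, next_state, env_reward):
    """Environment reward plus the potential-based shaping term."""
    return env_reward + GAMMA * phi(next_state) - phi(state)
```

A step towards the goal receives a positive shaping bonus and a step away a penalty, which is exactly the extra guidance the paper analyses the effects of.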
Static analysis can be used to determine safe estimates of Worst-Case Execution Time (WCET). However, overestimation of the number of loop iterations, particularly in nested loops, can result in substantial pessimism in the overall estimate. This paper presents a method of determining exact parametric values for the number of loop iterations in a particular class of arbitrarily deeply nested loops. It is proven that these values are guaranteed to be correct using information obtainable from a finite and quantifiable number of program traces. Using the results of this proof, a tool is constructed and its scalability assessed.
The paper presents an application of machine learning to a problem in the field of Real-Time Systems. More specifically, it utilises inductive logic programming to help calculate the maximum execution time of programs. It is notable for its cross-disciplinary nature.
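As a toy illustration of what an exact parametric loop bound looks like, consider a triangular nest whose inner body executes exactly n(n−1)/2 times, a closed form in the parameter n rather than a pessimistic constant. The example and its formula are standard textbook material used here for illustration only, not the paper's actual analysis.

```python
def inner_iterations(n):
    """Count inner-body executions of a triangular loop nest directly."""
    count = 0
    for i in range(n):
        for j in range(i):  # inner bound depends on the outer index
            count += 1
    return count

def parametric_bound(n):
    """Exact closed-form count n*(n-1)/2: no overestimation for any n."""
    return n * (n - 1) // 2
```

Deriving such closed forms automatically, and proving them correct from finitely many traces, is what distinguishes the paper's approach from conservative constant bounds.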
Reinforcement learning, while a highly popular learning technique for agents and multi-agent systems, has so far encountered difficulties in more complex domains due to scaling-up problems. This paper focuses on the use of domain knowledge to improve the convergence speed and optimality of various RL techniques. Specifically, we propose the use of high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping, when it is used in its basic form. We show that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications that can overcome the problems encountered. The STRIPS-based method we propose allows the same domain knowledge to be expressed in a different way, so the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluate the robustness of the proposed STRIPS-based technique to errors in the plan knowledge.
The paper improves reinforcement learning by incorporating domain knowledge through a technique known as 'reward shaping', with high-level knowledge expressed as STRIPS operators. The empirical section compares the proposed approach with many others, and the proposed approach comes out on top. More importantly, there is considerable analysis of why the empirical results are the way they are. The paper is notable for its mix of theory and experiment.
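The core idea of plan-based shaping can be sketched as follows: a high-level plan defines the potential of a state as progress along the plan, so transitions that advance the plan earn a shaping bonus. The plan below is a hand-written list of abstract states standing in for the effects of STRIPS operators; the state names and plan are illustrative assumptions, not the paper's actual domain.

```python
GAMMA = 0.99
# Abstract plan, e.g. extracted from a STRIPS planner's output (illustrative).
PLAN = ["start", "has_key", "door_open", "goal"]

def phi(abstract_state):
    """Potential = how far along the plan the agent has progressed."""
    return PLAN.index(abstract_state) if abstract_state in PLAN else 0

def shaping_reward(s, s_next):
    """Potential-based shaping term added to the environment reward."""
    return GAMMA * phi(s_next) - phi(s)
```

Progress along the plan yields a positive shaping signal while regressing is penalised, focusing exploration on plan-consistent behaviour without hard-coding the low-level policy.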