UoY - CS - AIG: 2008 Group Seminars

Date: Wednesday 17^th December 2008, CS103 Computer Science, Univ. of York

13:00

Speaker: Waleed Alsanie, The University of York, Computer Science

Literature Review Seminar

Title: Relation Extraction from Text

Extracting relations from large unstructured text is demanding as a standalone application, and as a middle tier for other natural language applications, e.g. question answering, text summarization, etc. The complexity of such task arises from the data sparsity and differences in narratives from a domain to another. Different techniques have been followed to solve this problem ranging from extracting relations in domain specific texts to extracting relations in open domain corpus. In this presentation, we show the major techniques developed to solve this problem with highlighting the limitations of each technique. We also present potential future directions.

13:30

Speaker: Rania Hodhod, The University of York, Computer Science

Thesis Seminar

Title: Adaptive Narrative Based Educational System to Teach in Ill Defined Domains

Ethics and Citizenship is an ill defined domain. It offerers many challenges for both computer scientists and educationalists. Since computer systems can detect and respond to student knowledge gaps, misconceptions, affective states and other attributes, we think it would be helpful to have a computer system that helps in teaching in this domain. AEINS is an inquiry-based exploratory teaching system taht allows the student to be an active participant in the learning process. It adopts and uses the educational theories and the classroom teaching strategies such as the Socratic Method in order to teach ethics and citizenship. AEINS targets chldren aged 10-12. The idea is centered around presenting and involving students in different moral dilemmas (called teaching moments) within which the Socratic Method is the used pedagogy . AEINS monitors and analyzes the student’s actions in order to provide an individualized story-path and an individualized learning process.

Date: Friday 28^th November 2008, 12:15, CS202J Computer Science, Univ. of York

Speaker: Ioannis Korkontzelos, The University of York

Title: Reviewing and Evaluating Automatic Term Recognition Techniques (GoTAL 2008)

Automatic Term Recognition (ATR) is defined as the task of identifying domain specific terms from technical corpora. Termhood-based approaches measure the degree that a candidate term refers to a domain specific concept. Unithood-based approaches measure the attachment strength of a candidate term constituents. These methods have been evaluated using different, often incompatible evaluation schemes and datasets. This paper provides an overview and a thorough evaluation of state-of-the-art ATR methods, under a common evaluation framework, i.e. corpora and evaluation method. Our contributions are two-fold: (1) We compare a number of different ATR methods, showing that termhood-based methods achieve in general superior performance. (2) We show that the number of independent occurrences of a candidate term is the most effective source for estimating term nestedness, improving ATR performance.

Date: Friday 14^st November 2008, 12:15, CS202 Computer Science, Univ. of York

Speaker: Maria Arinbjarnar, The University of York

Title: Schemas in Directed Emergent Drama

A common problem in creating interactive drama is the authoring bottleneck. If pre-authored stories are directly incorporated into an interactive virtual environment then there is a need to consider all possible interactions and story twists, which for a sizeable drama is infeasible.

One proposed solution to this problem is to use search and planning algorithms along with narrative structures. This leads to a huge state space and planning becomes intractable for real-time solutions.

A way to address this problem is to distribute the story planning to autonomous characters so that the drama emerges from their interactions. However without predefined structure or directives the drama is unlikely to emerge into the intended story or even the intended genre.

We propose to divide the drama into narrative episodes which we call schemas. Schemas are used by a director and a set of actors to structure the drama so that it emerges into a fully developed drama. The schemas are pre-authored in an abstract way such that they can be deployed multiple times in the same drama, which removes the authoring bottleneck. In this article, we define the structure of the schemas and how the director and actors use schemas in Directed Emergent Drama (DED).

Speaker: Heather Barber, The University of York

Title: Generation of Adaptive Dilemma-based Interactive Narratives

This talk is a slightly abbreviated version of my thesis seminar. It presents a system for the Generation of Adaptive Dilemma-based Interactive Narratives (GADIN). An interactive narrative is a game world in which the user controlled character(s) can physically and mentally interact with ideally perceived total freedom while experiencing a dramatically interesting narrative which is fundamentally different on nearly every play – dependent on the user’s actions. GADIN automatically generates interactive narratives which are focused on dilemmas to create dramatic tension. The system is provided with knowledge of generic story actions and dilemmas based on those cliches encountered in many storytelling domains. The storyworld creator is only required to provide genre specific storyworld knowledge, such as information on characters and their relationships, locations and actions. These dilemmas and story actions are instantiated for the given storyworld and a planner creates sequences of actions that each lead to a dilemma for a character (who can be the user). The user interacts with the storyworld by making decisions on relevant dilemmas and by freely choosing their own actions. Using this input the system chooses and adapts future storylines according to the user’s past behaviour. To evaluate the success of an interactive narrative a series of criteria known as the compellingness criteria are used. These criteria are: interestingness; immersion; scalability; domain independence; agency; and replayability. A successful system will have a high degree of compellingness. This has not been achieved by any previous interactive narrative systems but is by GADIN.

Date: Friday 7^st November 2008, 12:15, CS202 Computer Science, Univ. of York

Speaker: Ahmad Shahid - The University of York

Title:Multilingual Lexicon Generation and Category Translations

Authors: Ahmad Shahid, Dimitar Kazakov

This paper proposes a method for creating a multilingual dictionary by taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages. The creation of such multilingual dictionaries has become possible as a result of exponential increase in the size of multilingual information on the web. Wikipedia is a prime example of such multilingual source of information on any conceivable topic in the world, which is edited by the readers. Here, a web crawler has been used to traverse Wikipedia following the links on a given page. The crawler takes out the title along with the titles of the corresponding pages in other targeted languages. The result is a set of words and phrases that are translations of each other. Three case studies have been done: (1) a lexicon has been constructed which contains 7-tuples corresponding to 7 different languages, namely: English, German, French, Polish, Bulgarian, Greek and Chinese, (2) a Computer Science domain specific dictionary has been created in 37 different languages, containing around 2,500 entries, and (3) Category translations for Computer Science and Artificial Intelligence.

Speaker: Shengping Xia - The University of York

Title : Scalable object indexing and retrieval using local invariant feature based central clustering tree and pairwise graph clustering

Given a query image of an object, we want develop a system to retrieve all instances of that object from a large corpus of image data set with the ease, speed and accuracy with which Google retrieves text documents containing particular word.

In the up-to-date "Bag-of-features" based methods (University of Oxford), all descriptors are quantized to form a bag of visual words and each local invariant feature corresponds to a visual word, which is used for indexing and retrieving. However, we regard a bag of features as a whole corresponding to a keyword in text retrieval. Hence we represent each image with a graph, which is constructed from a group of selected SIFT features. The graphs are then used to index object images.

In the text retrieval literature a standard method for improving performance is query expansion. Each of the highly ranked documents from the original query is iteratively reissued as a new query. Similarly, we generalize such query expansion strategy and propose a pairwise clustering method, based on the graphs and their pairwise graph matching (PGM) similarity metrics, to find query response set from a large corpus.

We illustrate these ideas on a database containing over 70K images of more than 500 kinds of objects. We show that the average precision is substantially improved, achieving 1 for many sufficiently or "continuously" sampled objects.

Date: Tuesday 22^nd July 2008, 12:00, CS122 Computer Science, Univ. of York

Speaker: Jimmy H.M. Lee, Chinese University of Hong Kong.

Title: VISOLE: A Game-based Constructivist Learning Paradigm.

VISOLE (Virtual Interactive Student-Oriented Learning Environment) is a constructivist pedagogical approach to empower game-based learning. Briefly speaking, it encompasses the creation of a near-real life online interactive world modeled upon a set of multi-disciplinary domains. In this virtual world, each student plays a role to take part in shaping its development competitively and collaboratively. With sophisticated multi-player simulation contexts and teacher facilitation (scaffolding and debriefing), VISOLE aims to provide opportunities for students to acquire subject-specific knowledge in a multi-disciplinary fashion, and sharpen their generic skills for problem-solving.

Farmtasia is the first online game following the learning paradigm of VISOLE. It involves the subject areas of geography (natural environment and hazards, as well as environmental problems), biology, economics (including government, and production system), and technology. The "virtual world" of Farmtasia consists of interacting farming systems covering the domains of cultivation, horticulture, and pasturage.

A study was conducted in Hong Kong to investigate the educational realization of VISOLE, in which there were 256 secondary-4 (K-10 equivalent) students and 28 teachers participated. In the study, we realized the pedagogical ideas of VISOLE empirically.

Date: 6^th June 2008, 12:15, CS103 Computer Science, Univ. of York (Fri week 7)

Talk 1 :

Speaker: Pierre Andrews, Computer Science Department, The University of York.

Title: Argumentative human computer dialogue for automated persuasion.

Argumentation is an emerging topic in the field of human computer dialogue. In this paper we describe a novel approach to dialogue management that has been developed to achieve persuasion using a textual argumentation dialogue system. The paper introduces a layered management architecture that mixes task oriented dialogue techniques with chatbot techniques to achieve better persuasiveness in the dialogue.

This is a draft run for a 25min talk at the sigDIAL conference at the end of the month.

Talk 2 :

Speaker: Charles Blundell, M.Eng. Student, Computer Science Department, The University of York.

Title: Learning Causal Bayesian Networks (30 min. M.Eng. Project presentation).

Causal Bayesian networks provide a richer source of data than Bayesian networks in the form of the outcomes of interventions. It is shown how a stochastic, a constraint-based, and a gradient ascent algorithm can be improved and then adapted to successfully exploit interventional data to learn the structure of causal Bayesian networks.

By this construction, it is shown that interventional data can often improve the structure hypothesis of a causal Bayesian network beyond the best hypothesis entertained by observational data allowing superior predictions of the outcomes of future interventions. A topological order of the causal Bayesian network, either known a priori or from data, plays a central role in several of the algorithms achieving this result.

Date: 30^th May 2008, 12:15, CS202J Computer Science, Univ. of York (Fri week 6)

Speaker: James Cussens, Computer Science Department, The University of York.

Title: Bayesian network learning by compiling to weighted MAX-SAT (associated paper accepted for UAI '08).

The problem of learning discrete Bayesian networks from data is encoded as a weighted MAX-SAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents (`family scores') are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a `soft' weighted single-literal clause. Acyclicity is encoded using hard clauses involving atoms representing ancestor relations. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs. The largest dataset has 10,000 datapoints and 60 variables producing a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the `true' BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search.

Date: 23^th May 2008, 12:15, CS202J Computer Science, Univ. of York (Fri week 5)

Speaker: Dimitar Kazakov, Computer Science Department, The University of York.

Title: New Directions in Worst-case Execution Time Analysis.

Authors: Iain Bate and Dimitar Kazakov.

To be presented in: IEEE World Congress on Computational Intelligence, June 2008, Hong Kong.

In recent years there has been a considerable move towards the use of measurement-based approaches to timing analysis and specifically to the Worst-Case Execution Time (WCET) analysis problem. The key reasons for this move are that the hardware available for embedded real-time systems is becoming increasingly complex to model and consequently analysis is proving difficult. In our previous work, we showed how machine learning can be used to derive loop bounds as part of the Worst-Case Execution Time analysis problem. Here we build on this work by investigating the issue of branch prediction. We address this significant problem using a combination of static analysis, measurement-based testing and machine learning.

To demonstrate the potential of the approach relatively simple predictors are analysed. These are chosen as they have reasonable levels of complexity but not that complex that the principles cannot be demonstrated in the space available. Also in our experience, and that of others, more complex branch predictors have a negative effect on the WCET and its analysis.

Date: 22^th February 2008, 12:15, CS202J Computer Science, Univ. of York (Fri week 7)

Speaker: Arturo Servin, Computer Science Department, The University of York.

Title: Multi-Agent Reinforcement Learning for Intrusion Detection: A case study and evaluation

In this seminar I will present an architecture of distributed sensor and decision agents that learn how to identify normal and abnormal states of the network using Reinforcement Learning (RL). Sensor agents extract network state information using tile-coding as a function approximation technique and send communication signals in the form of actions to decision agents. These in turn generate actions in the form of alarms to the network operator. By means of an on-line process, sensor and decision agents learn the semantics of the communication actions without any previous knowledge. In this presentation I will describe the learning process, the operation of the agent architecture and the evaluation results of our research work.

Date: 8^th February 2008, 12:15, CS202J Computer Science, Univ. of York (Fri week 5)

B.Eng. Student Talks.

Speaker: Oliver Brennan, Computer Science Department, The University of York.

Title: Dialect Formation in Relationship to Cooperation

Background: In recent years computer simulation has come to be used to as a technique for simulating the evolution of language. One particular issue that has been studied is the coordination of a lexicon in a population. Through a series of language games in which a pair of agents communicate, updating their knowledge of the language based the success of the conversation, a common coordinated system emerges in the whole population. The aim of this project is to extend the current research on language games to study the effect of distribution of speakers and their ability to understand each other on the degree of cooperation and the distribution of various dialects in the population.

Speaker: David Griffin, Computer Science Department, The University of York.

Title: Declarative vs Object-Orientated Programming

Background: It is often assumed that the nowadays predominant OOP paradigm has proven advantages over alternatives, such as Declarative Programming (DP). This project compares two aspects of DP and OOP, (1) their ease of use for fast prototyping, which here takes the form of a quick implementation of common-case algorithms, and (2) the ability to reuse code when implementing a subsequent task of similar nature.

Speaker: Chris Savvopoulos, Computer Science Department, The University of York.

Title: Using Machine Learning for Empirical WCET Analysis

Background: Theory of computing and real-time systems are both interested in worst-case execution time (WCET), providing an upper bound for the time needed to run a piece of code. While much attention has been paid to the code content and structure, the choice of input parameters and their impact on WCET has been less studied. This is a serious omission, as it is well known that changing the input can sway an exponential algorithm towards an area where the execution time is almost always polynomial, as in the case of the propositional formula satisfiability (SAT) problem. Data pairing inputs with execution time, as well as logs of the number of times a given portion (elementary block) of the code has been executed, could provide information to a machine learning algorithm to come up with rules about the expected WCET as a function of the inputs.

Date: 25^th January 2008, 12:15, CS202J Computer Science, Univ. of York (Fri week 3)

Speaker: Burcu Can, Computer Science Department, The University of York.

Title: A Syllable-Based Speech Recognition System for the Turkish Language

Methods used in speech recognition systems change with the language which is aimed to recognize. Since the words in Turkish language are mostly produced by derivational and infectional affixes to roots, it is very hard to develop a word-based Turkish speech recognition system. For this reason, most of the Turkish speech recognition systems developed up to now have been phoneme-based. It is advantageous to develop phoneme-based speech recognition systems because the number of phonemes in every language is limited.

The unit used by speech recognition systems effects the system's success directly. As the unit's size grows, it becomes easier to discriminate the units from each other. For this reason, isolated word recognition systems are very successful. But it is impossible to implement it for Turkish language.When the disadvantages of these two units were considered, we decided to recognize syllables which are larger than phoneme in size and which is limited in number in Turkish language.

Within the implementation of the thesis work, multi-layer perceptrons were used to classify the syllables, an experimental application and an experiment platform were implemented for a syllable-based Turkish speech recognition system.

Date: 18^th January 2008, 12:15, CS202J Computer Science, Univ. of York (Fri week 2)

Speaker: Rayner Alfred, Computer Science Department, The University of York.

Title: DARA: Data Summarisation with Feature Construction

Feature construction methods are mostly related to classification problems, where the learning algorithms involve classifying objects based on their characteristics. In this case, the predictive accuracy can often be significantly improved by constructing new features which are more relevant used to predict the class of an object. On the other hand, the data summarisation in DARA algorithm deals with descriptive accuracy where this algorithm summarises symbolic representations of objects by clustering them into groups that share similar characteristics. This paper addresses the question whether or not the descriptive accuracy of the DARA algorithm benefits from the feature construction process. This involves solving the problem of constructing relevant set of features used to generate terms representing objects in the TF-IDF weighted matrix in order to cluster these objects. In this work, feature construction is used to enhance the results of the data summarisation approach in learning data stored in multiple tables. It is expected that, by improving the descriptive accuracy of the data summarisation approach, the predictive accuracy of a classfication problem can also be improved, provided that the summarised data is fed as one of the features for the classification task.