Unsupervised Bayesian and Deep Learning Models of Morphology

Wednesday 4 November 2020, 1.30pm to 2.30pm

Speaker(s): Burcu Can

In agglutinating languages, words are made up of a sequence of morphemes. In languages such as English the number of morphemes per word is small, but in Turkish and other agglutinating languages it is usually three or more. It is very common for a single word to carry all the relevant constituents for tense, person, plural, etc. Morphological segmentation of such words is one of the challenges in natural language processing, especially for low-resource languages.

One of the topics that I will discuss in this talk is our recent work on unsupervised morphological segmentation using non-parametric Bayesian models. The model is tree-structured: all words are organized into hierarchical morphological paradigms within a forest of trees.

In the last few years, representation learning has become standard in natural language processing thanks to its superior performance in almost any NLP task. With representation learning, a unit (usually a word) can be represented by a low-dimensional vector that encodes the relevant syntactic and semantic features of that unit. Those features are learned from the distributional and contextual information of words in a very large corpus. However, if a word does not occur in the corpus, or is not frequent enough, how should we represent it in the same representation space?

Most recent work handles this problem by processing each word as a set of characters, so that the representation is obtained through the word's characters. Here I will question whether a word should be represented by its characters or its morphemes. I will conclude by returning to the question of how to represent words in agglutinating languages.

Burcu Can is a Reader in Computational Linguistics at the Research Institute in Information and Language Processing at the University of Wolverhampton. Her research interests focus mainly on natural language processing, computational linguistics, and a range of machine learning topics such as Bayesian learning and, currently, deep neural networks.

Burcu Can received a Ph.D. degree in Computer Science from the University of York in 2011, as a member of the Artificial Intelligence Group. She worked as a Research Assistant (2011-2015) and as an Assistant Professor (2015-2020) at the Department of Computer Engineering, Hacettepe University, Turkey. She led the Natural Language Processing Research Group (HUNLP) from 2015 to 2020. Meanwhile, she was the principal investigator of the project Unsupervised Joint Learning of Morphology and Syntax in Turkish, which was funded by the Scientific and Technological Research Council of Turkey.

She was a Visiting Scholar in the Artificial Intelligence Group, University of York, from 2014 to 2015, and a Visiting Researcher at the Institute of Statistical Mathematics, Tokyo, Japan, in 2019.

She has served on the program committees of major conferences and workshops in computational linguistics, including ACL, AAAI, EMNLP, NAACL, IJCNLP, and COLING, and on the organizing committee of the Workshop on Representation Learning 2019, ACL. She is an associate editor of ACM Transactions on Asian and Low Resource Language Information Processing and a member of the editorial board of the Turkish Journal of Electrical Engineering and Computer Science.

Join the seminar

Location: Online