Automated Software Engineering Group

We are a group of 36 researchers in the Department of Computer Science at the University of York, developing ground-breaking methods and tools for automated analysis, design, development, deployment, and management of complex software-intensive systems. We collaborate closely with companies such as Rolls-Royce, IBM, Altran, and Volkswagen on projects co-funded by the European Commission, RCUK, InnovateUK and DSTL.

Members

Professor Dimitris Kolovos
Model-based software engineering, software repository mining and big-data persistence and processing architectures.
Professor Richard Paige
Model-based software engineering, agile development, service-oriented architectures, formal methods, object-oriented programming, systems engineering.
Dr Radu Calinescu
Formal methods for adaptive, autonomic, secure and dependable IT systems, automated, model- and metadata-driven software engineering, formal specification, modelling and verification. Leading the Trustworthy Adaptive and Autonomous Systems & Processes team.
Dr Javier Camara Moreno
Software engineering, self-adaptive systems, software architectures, applied formal methods, cyber-physical systems.
Dr Nicholas Matragkas
Model-based software engineering, software repository mining and software testing.
Dr Simos Gerasimou
Self-adaptive and autonomous systems with a focus on methods that enable dependable system adaptation, runtime quantitative verification, search-based software engineering, model-driven engineering, robotics and artificial intelligence.
Dr Thanos Zolotas
Model-based software engineering, big data architectures.
Dr Kostas Barmpis
Model-based software engineering, mining software repositories.
Dr Colin Paterson
Tool-supported formal approaches for engineering of adaptive and autonomous systems and processes, probabilistic model checking.
Patrick Neubauer
Model-based software engineering, mining software repositories.
Dr Alfa Yohannis
Model-based software engineering, change-based model persistence.
Justin Cooper
Domain-specific languages, embedded at Rolls-Royce.
Jon Co
Model-based spreadsheet analysis, embedded at IBM.
Betty Sanchez
Model-based software engineering, Simulink, reactive modelling workflows.
Sultan Almutairi
Model-based software engineering, model-to-text transformation.
Nikos Fountoulakis
Software repository mining, code repository indexing.
Qurat Ul Ain Ali
Low-code software engineering.
Sorour Jahanbin
Low-code software engineering.
Panagiotis Kourouklidis
Low-code software engineering for machine learning.
Emad Alharbi
Metaheuristics for protein model synthesis from electron-density maps.
Ana Markovic
Multi-language distributed stream processing.
Premathas Somasekaram
Autonomous systems, cloud computing, high availability cluster and grid computing, machine learning, statistical analysis, Bayesian networks.
Ioannis Stefanakos
Formal methods, model-driven software engineering.
Saud Yonbawi
Self-adaptation in distributed systems, runtime quantitative verification.

Recent Publications

Towards Model-Based Development of Decentralised Peer-to-Peer Data Vaults

Yohannis, A., De La Vega, A., Kahrobaei, D. & Kolovos, D., 18 Oct 2020, ACM / IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS). 8 p.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: ACM / IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS)
Date: Accepted/In press - 2020
Date: Published (current) - 18 Oct 2020
Number of pages: 8
Original language: English

Abstract

Using centralised data storage systems has been the standard practice followed by online service providers when managing the personal data of their users.
This approach requires users to trust these providers and, to some extent, leaves users without full control of their data.
The development of applications around decentralised data vaults, i.e., encrypted storage systems located in user-managed devices, can give this control back to the users as sole owners of the data.
However, the development of such applications is not effort-free, and it requires developers to have specialised knowledge, such as how to deploy secure and peer-to-peer communication systems.
We present Vaultage, a model-based framework that can simplify the development of data vault applications.
We demonstrate its core features through a social network application case study and include some initial evaluation results, showing Vaultage's code generation capabilities and some profiling analysis of the generated network components.

Bibliographical note

© 2020 Association for Computing Machinery. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Polyglot and Distributed Software Repository Mining with Crossflow

Matragkas, N., Kolovos, D., Barmpis, K., Neubauer, P. & Paige, R., Oct 2020, MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories. p. 374-384 11 p.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
Date: Published - Oct 2020
Pages: 374-384
Number of pages: 11
Original language: English

Publication details

Journal: Software and Systems Modeling
Date: Accepted/In press - 10 Jun 2020
Date: Published (current) - 11 Aug 2020
Number of pages: 24
Original language: English

Abstract

UML profiles offer an intuitive way for developers to build domain-specific modelling languages by reusing and extending UML concepts. Eclipse Papyrus is a powerful open-source UML modelling tool which supports UML profiling. However, with power comes complexity: implementing non-trivial UML profiles and their supporting editors in Papyrus typically requires developers to handcraft and maintain a number of interconnected models through a loosely guided, labour-intensive and error-prone process. We demonstrate how metamodel annotations and model transformation techniques can help manage the complexity of Papyrus in the creation of UML profiles and their supporting editors. We present Jorvik, an open-source tool that implements the proposed approach. We illustrate its functionality with examples, and we evaluate our approach by comparing it against manual UML profile specification and editor implementation using a non-trivial enterprise modelling language (ArchiMate) as a case study. We also perform a user study in which developers are asked to produce identical editors using both Papyrus and Jorvik, demonstrating the substantial productivity and maintainability benefits that Jorvik delivers.

Bibliographical note

© The Author(s) 2020

Efficient Generation of Graphical Model Views via Lazy Model-to-Text Transformation

Kolovos, D., De La Vega, A. & Cooper, J., 13 Jul 2020, (Accepted/In press) ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS ’20).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS ’20)
Date: Accepted/In press - 13 Jul 2020
Original language: English

Abstract

Producing graphical views from software and system models is often desirable for communication and comprehension purposes, even when graphical model editing capabilities are not required -- because the preferred editable concrete syntax of the models is text-based, or for models extracted via reverse engineering. To support such scenarios, we present a novel approach for efficient rule-based generation of transient graphical views from models using lazy model-to-text transformation, and an implementation of the proposed approach in the form of an open-source Eclipse plugin named Picto. Picto builds on top of mature visualisation software such as Graphviz and PlantUML and supports, among others, composite views, layers, and multi-model visualisation. We illustrate how Picto can be used to produce various forms of graphical views such as node-edge diagrams, tables and sequence-like diagrams, and we demonstrate the efficiency benefits of the lazy view generation approach over batch model-to-text transformation for generating views from large models.
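
The contrast between lazy and batch view generation described in this abstract can be illustrated with a minimal Python sketch. It is not Picto's implementation or API: the `LazyView` class, the `make_renderer` helper and the element names are hypothetical, and the renderer simply returns a one-node Graphviz DOT string. The point shown is only that a view's text is produced the first time it is requested, and reused afterwards.

```python
from typing import Callable, Dict, List, Optional


class LazyView:
    """A view whose text is generated on first access and then cached.

    Hypothetical illustration only; not part of Picto.
    """

    def __init__(self, title: str, render: Callable[[], str]) -> None:
        self.title = title
        self._render = render
        self._content: Optional[str] = None

    @property
    def content(self) -> str:
        # Run the (potentially expensive) transformation only when asked.
        if self._content is None:
            self._content = self._render()
        return self._content


render_calls: List[str] = []  # records which views were actually rendered


def make_renderer(name: str) -> Callable[[], str]:
    """Build a renderer that emits a one-node Graphviz DOT graph."""
    def render() -> str:
        render_calls.append(name)
        return f'digraph "{name}" {{ "{name}" }}'
    return render


# Creating the views is cheap: no DOT text is generated at this point.
views: Dict[str, LazyView] = {
    name: LazyView(name, make_renderer(name))
    for name in ["Engine", "Wheel", "Chassis"]
}

dot = views["Wheel"].content    # renders only the requested view
again = views["Wheel"].content  # served from the cache, no second render
```

A batch approach would call every renderer up front; here only the view that the user opens is rendered, which is where the saving on large models would come from.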

Bibliographical note

This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Supporting Robotic Software Migration Using Static Analysis and Model-Driven Engineering

Gerasimou, S., Wood, S., Matragkas, N., Kolovos, D. & Paige, R. F., 13 Jul 2020, (Accepted/In press) ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS ’20).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS ’20)
Date: Accepted/In press - 13 Jul 2020
Original language: English

Abstract

The wide use of robotic systems has contributed to the development of robotic software that is highly coupled to the hardware platform running the robotic system. Due to increased maintenance cost or changing business priorities, the robotic hardware is infrequently upgraded, thus increasing the risk of technology stagnation. Reducing this risk entails migrating the system and its software to a new hardware platform. Conventional software engineering practices such as complete re-development and code-based migration, albeit useful in mitigating these obsolescence issues, are time-consuming and overly expensive. Our RoboSMi model-driven approach supports the migration of the software controlling a robotic system between hardware platforms. First, RoboSMi executes static analysis on the robotic software of the source hardware platform to identify platform-dependent and platform-agnostic software constructs. By analysing a model that expresses the architecture of robotic components on the target platform, RoboSMi establishes the hardware configuration of those components and suggests software libraries for each component whose execution will enable the robotic software to control the components. Finally, RoboSMi produces software for the target platform through code generation and indicates areas that require manual intervention by robotic engineers to complete the migration. We evaluate the applicability of RoboSMi and analyse the level of automation and performance provided by its use by migrating two robotic systems, deployed for an environmental monitoring mission and a line-following mission, from a Propeller Activity Board to an Arduino Uno.

Empirical Analysis of 1-edit Degree Patches in Syntax-Based Automatic Program Repair

Dziurzanski, P., Gerasimou, S., Kolovos, D. & Matragkas, N., 20 Mar 2020, (Accepted/In press) IEEE Congress on Evolutionary Computation.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: IEEE Congress on Evolutionary Computation
Date: Accepted/In press - 20 Mar 2020
Original language: English

Abstract

In this paper, software patches modifying a single line (aka 1-edit degree patches) of buggy Java open-source projects have been generated automatically using computational search and experimentally evaluated. We carried out what is presumably the largest experiment to date related to 1-edit degree patches, consisting of almost 27,000 computational jobs upper-bounded by 107,000 computational hours. Our experiments show the benefits and drawbacks of this kind of patch. In particular, the search space size has been shown to be reduced by several orders of magnitude. The volume of tests that can be filtered out without any negative impact while generating 1-edit degree patches has been increased by about 97%.
Finally, the effectiveness of finding 1-edit plausible patches is compared with multi-line plausible patches found with state-of-the-art syntax-based Automatic Program Repair tools. It is shown that, despite patching fewer bugs in total, 1-edit degree patches have the potential to patch some extra bugs.

Publication details

Journal: Software and Systems Modeling
Date: Accepted/In press - 1 Jan 2020
Date: Published (current) - 18 May 2020
Original language: English

Publication details

Journal: Software and Systems Modeling
Date: Accepted/In press - 4 Dec 2019
Date: Published (current) - 1 Jan 2020
Issue number: 1
Volume: 19
Number of pages: 9
Pages (from-to): 5-13
Original language: English

Abstract

In 2017 and 2018, two events were held—in Marburg, Germany, and San Vigilio di Marebbe, Italy, respectively—focusing on an analysis of the state of research, state of practice, and state of the art in model-driven engineering (MDE). The events brought together experts from industry, academia, and the open-source community to assess what has changed in research in MDE over the last 10 years, what challenges remain, and what new challenges have arisen. This article reports on the results of those meetings, and presents a set of grand challenges that emerged from discussions and synthesis. These challenges could lead to research initiatives for the community going forward.

Bibliographical note

© The Author(s) 2020

Safety Controller Synthesis for Collaborative Robots

Gleirscher, M. & Calinescu, R., 28 Oct 2020, Proceedings of the 25th International Conference on Engineering of Complex Computer Systems (ICECCS).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: Proceedings of the 25th International Conference on Engineering of Complex Computer Systems (ICECCS)
Date: Published - 28 Oct 2020
Original language: English

Publication details

Journal: Acta Crystallographica Section D: Structural Biology
Date: Accepted/In press - 31 Jul 2020
Date: E-pub ahead of print - 19 Aug 2020
Date: Published (current) - 1 Sep 2020
Issue number: 9
Volume: 76
Number of pages: 10
Pages (from-to): 814-823
Early online date: 19/08/20
Original language: English

Abstract

For the last two decades, researchers have worked independently to automate protein model building, and four widely used software pipelines have been developed for this purpose: ARP/wARP, Buccaneer, Phenix AutoBuild and SHELXE. Here, the usefulness of combining these pipelines to improve the built protein structures by running them in pairwise combinations is examined. The results show that integrating these pipelines can lead to significant improvements in structure completeness and Rfree. In particular, running Phenix AutoBuild after Buccaneer improved structure completeness for 29% and 75% of the data sets that were examined at the original resolution and at a simulated lower resolution, respectively, compared with running Phenix AutoBuild on its own. In contrast, Phenix AutoBuild alone produced better structure completeness than the two pipelines combined for only 7% and 3% of these data sets.

Towards Formal Verification of Control Algorithms for Autonomous Marine Vehicles

Foster, S. D., Gleirscher, M. & Calinescu, R., 2 Aug 2020, (Accepted/In press) Proceedings of the 25th International Conference on Engineering of Complex Computer Systems (ICECCS 2020). 6 p.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: Proceedings of the 25th International Conference on Engineering of Complex Computer Systems (ICECCS 2020)
Date: Accepted/In press - 2 Aug 2020
Number of pages: 6
Original language: English

Abstract

The use of autonomous vehicles in real-world applications is often precluded by the difficulty of providing safety guarantees for their complex controllers. Simulation-based testing of these controllers cannot deliver sufficient safety guarantees, and the use of formal verification is very challenging due to the hybrid nature of autonomous vehicles. Our work-in-progress paper introduces a formal verification approach that addresses this challenge by integrating the numerical computation of such a system (in GNU Octave) with its hybrid system verification by means of a proof assistant (Isabelle). To show the effectiveness of our approach, we use it to verify differential invariants of an Autonomous Marine Vehicle with a controller switching between multiple modes.

Analysis and Refactoring of Software Systems Using Performance Antipattern Profiles

Calinescu, R., Cortellessa, V., Stefanakos, I. & Trubiani, C., 17 Mar 2020, 23rd International Conference on Fundamental Approaches to Software Engineering. p. 357-377 21 p.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: 23rd International Conference on Fundamental Approaches to Software Engineering
Date: Accepted/In press - 20 Dec 2019
Date: E-pub ahead of print (current) - 17 Mar 2020
Pages: 357-377
Number of pages: 21
Original language: English

Bibliographical note

© The Author(s) 2020.

Assurance Argument Patterns and Processes for Machine Learning in Safety-Related Systems

Picardi, C., Paterson, C., Hawkins, R. D., Calinescu, R. & Habli, I., 27 Feb 2020, Proceedings of the Workshop on Artificial Intelligence Safety (SafeAI 2020). CEUR Workshop Proceedings, p. 23-30 (CEUR Workshop Proceedings; vol. 2560).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: Proceedings of the Workshop on Artificial Intelligence Safety (SafeAI 2020)
Date: Published - 27 Feb 2020
Pages: 23-30
Publisher: CEUR Workshop Proceedings
Original language: English

Publication series

Name: CEUR Workshop Proceedings
Volume: 2560
ISSN (Electronic): 1613-0073

Abstract

Machine Learnt (ML) components are now widely accepted for use in a range of applications with results that are reported to exceed, under certain conditions, human performance. The adoption of ML components in safety-related domains is restricted, however, unless sufficient assurance can be demonstrated that the use of these components does not compromise safety. In this paper, we present patterns that can be used to develop assurance arguments for demonstrating the safety of the ML components. The argument patterns provide reusable templates for the types of claims that must be made in a compelling argument. On their own, the patterns neither detail the assurance artefacts that must be generated to support the safety claims for a particular system, nor provide guidance on the activities that are required to generate these artefacts. We have therefore also developed a process for the engineering of ML components in which the assurance evidence can be generated at each stage in the ML lifecycle in order to instantiate the argument patterns and create the assurance case for ML components. The patterns and the process could help provide a practical and clear basis for a justifiable deployment of ML components in safety-related systems.

Bibliographical note

© 2020 for this paper by its authors.

Interval Change-Point Detection for Runtime Probabilistic Model Checking

Zhao, X., Calinescu, R., Gerasimou, S., Robu, V. & Flynn, D., 2020, 35th IEEE/ACM International Conference on Automated Software Engineering.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: 35th IEEE/ACM International Conference on Automated Software Engineering
Date: Accepted/In press - 30 Jul 2020
Date: Published (current) - 2020
Original language: English

Importance-Driven Deep Learning System Testing

Gerasimou, S., Eniser, H. F. & Sen, A., 2020, 42nd International Conference on Software Engineering.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: 42nd International Conference on Software Engineering
Date: Accepted/In press - 9 Dec 2019
Date: Published (current) - 2020
Original language: English

Abstract

Deep Learning (DL) systems are key enablers for engineering intelligent applications due to their ability to solve complex tasks such as image recognition and machine translation. Nevertheless, using DL systems in safety- and security-critical applications requires providing testing evidence for their dependable operation. Recent research in this direction focuses on adapting testing criteria from traditional software engineering as a means of increasing confidence in their correct behaviour. However, these criteria are inadequate in capturing the intrinsic properties exhibited by these systems. We bridge this gap by introducing DeepImportance, a systematic testing methodology accompanied by an Importance-Driven (IDC) test adequacy criterion for DL systems. Applying IDC makes it possible to establish a layer-wise functional understanding of the importance of DL system components and to use this information to guide the generation of semantically-diverse test sets. Our empirical evaluation on several DL systems, across multiple DL datasets and with state-of-the-art adversarial generation techniques, demonstrates the usefulness and effectiveness of DeepImportance and its ability to guide the engineering of more robust DL systems.

Publication details

Journal: CEUR Workshop Proceedings
Date: Published - 6 Dec 2019
Volume: 2513
Number of pages: 14
Pages (from-to): 67-80
Original language: English

Abstract

Domain-specific languages enable concise and precise formalization of domain concepts and promote direct employment by domain experts. Therefore, syntactic constructs are introduced to empower users to associate concepts and relationships with visual textual symbols. Model-based language engineering facilitates the description of concepts and relationships in an abstract manner. However, concrete representations are commonly attached to abstract domain representations, such as annotations in metamodels, or directly encoded into language grammar, and thus introduce redundancy between metamodel elements and grammar elements. In this work we propose an approach that enables autonomous development and maintenance of domain concepts and textual language notations in a distinctive and metamodel-agnostic manner by employing style models containing grammar rule templates and injection-based property selection. We provide an implementation and showcase the proposed notation-specification language in a comparison with state-of-the-art practices during the creation of notations for an executable domain-specific modeling language based on the Eclipse Modeling Framework and Xtext.

Bibliographical note

© 2019 The Authors.

Publication details

Journal: Information and Software Technology
Date: Accepted/In press - 25 May 2019
Date: Published (current) - 1 Nov 2019
Volume: 115
Number of pages: 22
Pages (from-to): 97-118
Original language: English

Abstract

Context: Model-driven engineering (MDE) promotes the active use of models in all phases of software development. Even though models are at a high level of abstraction, large or complex systems still require building monolithic models that prove to be too big for their processing by existing tools, and too difficult to comprehend by users. While modularization techniques are well-known in programming languages, they are not the norm in MDE. Objective: Our goal is to ease the modularization of models to allow their efficient processing by tools and facilitate their management by users. Method: We propose five patterns that can be used to extend a modelling language with services related to modularization and scalability. Specifically, the patterns allow defining model fragmentation strategies, scoping and visibility rules, model indexing services, and scoped constraints. Once the patterns have been applied to the meta-model of a modelling language, we synthesize a customized modelling environment enriched with the defined services, which become applicable to both existing monolithic legacy models and new models. Results: Our proposal is supported by a tool called EMF-Splitter, combined with the Hawk model indexer. Our experiments show that this tool improves the validation performance of large models. Moreover, the analysis of 224 meta-models from OMG standards, and a public repository with more than 300 meta-models, demonstrates the applicability of our patterns in practice. Conclusions: Modularity mechanisms typically employed in programming IDEs can be successfully transferred to MDE, leading to more scalable and structured domain-specific modelling languages and environments.

Bibliographical note

© 2019 Elsevier B.V. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy.

Publication details

Journal: Journal of Object Technology
Date: Accepted/In press - 20 Apr 2019
Date: Published (current) - 1 Jul 2019
Issue number: 2
Volume: 18
Number of pages: 21
Pages (from-to): 1-21
Original language: English

Abstract

Comparison of large models can be time-consuming since every element has to be visited, matched, and compared with its respective element in other models. This can result in bottlenecks in collaborative modelling environments, where identifying differences between two versions of a model is desirable. Reducing the comparison process to only the elements that have been modified since a previous known state (e.g., previous version) could significantly reduce the time required for large model comparison. This paper presents how change-based persistence can be used to localise the comparison of models so that only elements affected by recent changes are compared and to substantially reduce comparison and differencing time (up to 90% in some experiments) compared to state-based model comparison.
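
The idea of localising comparison to recently changed elements can be sketched in a few lines of Python. This is an illustrative assumption-laden sketch, not the paper's implementation: models are represented as plain dictionaries keyed by element id, a change log is a list of hypothetical `(element id, attribute, new value)` tuples, and the names `apply_changes`, `state_based_diff` and `change_based_diff` are invented for the example.

```python
from typing import Dict, List, Set, Tuple

Model = Dict[str, Dict[str, object]]  # element id -> attribute map
Change = Tuple[str, str, object]      # (element id, attribute, new value)


def apply_changes(model: Model, changes: List[Change]) -> Model:
    """Return a copy of the model with the recorded changes replayed."""
    result = {eid: dict(attrs) for eid, attrs in model.items()}
    for eid, attr, value in changes:
        result.setdefault(eid, {})[attr] = value
    return result


def state_based_diff(left: Model, right: Model) -> Set[str]:
    """Visit every element of both models and report those that differ."""
    ids = set(left) | set(right)
    return {eid for eid in ids if left.get(eid) != right.get(eid)}


def change_based_diff(left: Model, right: Model,
                      left_changes: List[Change],
                      right_changes: List[Change]) -> Set[str]:
    """Compare only the elements touched since the shared base version."""
    touched = {eid for eid, _, _ in left_changes}
    touched |= {eid for eid, _, _ in right_changes}
    return {eid for eid in touched if left.get(eid) != right.get(eid)}


# A shared base version with 1000 elements, then two diverging versions.
base: Model = {f"e{i}": {"name": f"n{i}"} for i in range(1000)}
left_changes: List[Change] = [("e3", "name", "renamed")]
right_changes: List[Change] = [("e7", "name", "other"),
                               ("e3", "name", "renamed")]
left = apply_changes(base, left_changes)
right = apply_changes(base, right_changes)

full = state_based_diff(left, right)
local = change_based_diff(left, right, left_changes, right_changes)
```

Both functions report the same differing element, but the state-based version inspects all 1000 elements while the change-based version inspects only the two that appear in the change logs.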

Bibliographical note

© 2019, The Author(s).

Publication details

Journal: Journal of Object Technology
Date: Published - 1 Jul 2019
Issue number: 2
Volume: 18
Original language: English

Abstract

The growing size of software models poses significant scalability challenges. Amongst these challenges is the execution time of queries and transformations. In many cases, model management programs are (or can be) expressed as chains and combinations of core fundamental operations. Most of these operations are pure functions, making them amenable to parallelisation, lazy evaluation and short-circuiting. In this paper we show how all three of these optimisations can be combined in the context of Epsilon: an OCL-inspired family of model management languages. We compare our solutions with both interpreted and compiled OCL as well as hand-written Java code. Our experiments show a significant improvement in the performance of queries, especially on large models.
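
Two of the three optimisations named in this abstract, lazy evaluation and short-circuiting, can be illustrated with a minimal Python sketch (parallelisation is not shown). The code is written in Python rather than an Epsilon language, and the `select` and `exists` helpers are hypothetical stand-ins for collection operations, not Epsilon's API. The `visited` list exists only to make the evaluation order observable; a real predicate would be a pure function.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def select(items: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Lazily yield the items that satisfy the predicate."""
    return (item for item in items if predicate(item))


def exists(items: Iterable[T], predicate: Callable[[T], bool]) -> bool:
    """Short-circuit: stop consuming items at the first match."""
    return any(predicate(item) for item in items)


visited: List[int] = []  # demonstration only: records evaluated elements


def is_large(size: int) -> bool:
    visited.append(size)
    return size > 100


sizes = [5, 250, 7, 900, 12]

large = select(sizes, is_large)  # lazy: no element has been tested yet
nothing_yet = list(visited)      # snapshot taken before any evaluation

# Chained query: is there a large element that is also above 800?
found = exists(large, lambda size: size > 800)
```

Because the chain is lazy and `exists` stops at the first match, the final element of `sizes` is never evaluated.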

On-the-fly Translation and Execution of OCL-like Queries on Simulink Models

Sanchez Pina, B. A., Zolotas, A., Hoyos Rodriguez, H., Kolovos, D. & Paige, R. F., 19 Jun 2019, (Accepted/In press) Proceedings of the ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: Proceedings of the ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems
Date: Accepted/In press - 19 Jun 2019
Original language: English

Publication details

Journal: Software and Systems Modeling
Date: Accepted/In press - 12 Apr 2019
Date: E-pub ahead of print (current) - 11 May 2019
Number of pages: 37
Early online date: 11/05/19
Original language: English

Abstract

While the majority of research on Model-Based Software Engineering revolves around open-source modelling frameworks such as the Eclipse Modelling Framework (EMF), the use of commercial and closed-source modelling tools such as RSA, Rhapsody, MagicDraw and Enterprise Architect appears to be the norm in industry at present. This technical gap can prohibit industrial users from reaping the benefits of state-of-the-art research-based tools in their practice. In this paper, we discuss an attempt to bridge a proprietary UML modelling tool (PTC Integrity Modeller), which is used for model-based development of safety-critical systems at Rolls-Royce, with an open-source family of languages for automated model management (Epsilon). We present the architecture of our solution, the challenges we encountered in developing it, and a performance comparison against the tool's built-in scripting interface. In addition, we use the bridge in a real-world industrial case study that involves the co-ordination with other bridges between proprietary tools and Epsilon.

Bibliographical note

© The Author(s) 2019

Crossflow: A framework for distributed mining of software repositories

Kolovos, D., Neubauer, P., Barmpis, K., Matragkas, N. & Paige, R., 1 May 2019, Proceedings - 2019 IEEE/ACM 16th International Conference on Mining Software Repositories, MSR 2019. IEEE Computer Society Press, p. 155-159 5 p. 8816734. (IEEE International Working Conference on Mining Software Repositories; vol. 2019-May).

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Publication details

Title of host publication: Proceedings - 2019 IEEE/ACM 16th International Conference on Mining Software Repositories, MSR 2019
Date: Published - 1 May 2019
Pages: 155-159
Number of pages: 5
Publisher: IEEE Computer Society Press
Original language: English
ISBN (Electronic): 9781728134123

Publication series

Name: IEEE International Working Conference on Mining Software Repositories
Volume: 2019-May
ISSN (Print): 2160-1852
ISSN (Electronic): 2160-1860

Abstract

Large-scale software repository mining typically requires substantial storage and computational resources, and often involves a large number of calls to (rate-limited) APIs such as those of GitHub and StackOverflow. This creates a growing need for distributed execution of repository mining programs to which remote collaborators can contribute computational and storage resources, as well as API quotas (ideally without sharing API access tokens or credentials). In this paper we introduce Crossflow, a novel framework for building distributed repository mining programs. We demonstrate how Crossflow can delegate mining jobs to remote workers and cache their results, and how workers can implement advanced behaviour such as load balancing and rejecting jobs they cannot perform (e.g. due to lack of space or of credentials for a specific API).

Publication details

Journal: International Journal on Software & Systems Modelling
Date: Accepted/In press - 11 Jan 2018
Date: E-pub ahead of print - 23 Jan 2018
Date: Published (current) - 8 Feb 2019
Issue number: 1
Volume: 18
Number of pages: 23
Pages (from-to): 345-366
Early online date: 23/01/18
Original language: English

Abstract

Flexible or bottom-up model-driven engineering (MDE) is an emerging approach to domain and systems modelling. Domain experts, who have detailed domain knowledge, typically lack the technical expertise to transfer this knowledge using traditional MDE tools. Flexible MDE approaches tackle this challenge by promoting the use of simple drawing tools to increase the involvement of domain experts in the language definition process. In such approaches, no metamodel is created upfront; instead, the process starts with the definition of example models that will be used to infer the metamodel. Pre-defined metamodels created by MDE experts may miss important concepts of the domain and thus restrict their expressiveness. However, the lack of a metamodel that encodes the semantics of conforming models has some drawbacks, among them that of having models with elements that are unintentionally left untyped. In this paper, we propose the use of classification algorithms to help with the inference of the types of such untyped elements. We evaluate the proposed approach on a number of randomly generated example models from various domains. The rate of correct type prediction varies from 23 to 100% depending on the domain, the proportion of elements that were left untyped and the prediction algorithm used.

Publication details

Journal: Acta crystallographica. Section D, Structural biology
Date: Accepted/In press - 4 Nov 2019
Date: Published (current) - 1 Dec 2019
Issue number: Pt 12
Volume: 75
Number of pages: 10
Pages (from-to): 1119-1128
Original language: English

Abstract

A comparison of four protein model-building pipelines (ARP/wARP, Buccaneer, PHENIX AutoBuild and SHELXE) was performed using data sets from 202 experimentally phased cases, both with the data as observed and truncated to simulate lower resolutions. All pipelines were run using default parameters. Additionally, an ARP/wARP run was completed using models from Buccaneer. All pipelines achieved nearly complete protein structures and low Rwork/Rfree at resolutions between 1.2 and 1.9 Å, with PHENIX AutoBuild and ARP/wARP producing slightly lower R factors. At lower resolutions, Buccaneer leads to significantly more complete models.

Bibliographical note

© 2019 International Union of Crystallography. Uploaded with permission of the publisher/copyright holder. Further copying may not be permitted; contact the publisher for details

Towards systematic engineering of collaborative heterogeneous robotic systems

Gerasimou, S., Matragkas, N. & Calinescu, R., 27 May 2019, 2019 IEEE/ACM 2nd International Workshop on Robotics Software Engineering (RoSE). IEEE, p. 25-28 4 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: 2019 IEEE/ACM 2nd International Workshop on Robotics Software Engineering (RoSE)
Date: Published - 27 May 2019
Pages: 25-28
Number of pages: 4
Publisher: IEEE
Original language: English
ISBN (Electronic): 9781728122496

Publication details

Date: Published - 10 May 2019
Original language: Undefined/Unknown

Abstract

Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research.

Publication details

Journal: IEEE Transactions on Software Engineering
Date: Accepted/In press - 20 Apr 2019
Date: E-pub ahead of print (current) - 25 Apr 2019
Early online date: 25/04/19
Original language: English

Abstract

We introduce an efficient parametric model checking (ePMC) method for the analysis of reliability, performance and other quality-of-service (QoS) properties of software systems. ePMC speeds up the analysis of parametric Markov chains modelling the behaviour of software by exploiting domain-specific modelling patterns for the software components (e.g., patterns modelling the invocation of functionally-equivalent services used to jointly implement the same operation within service-based systems, or the deployment of the components of multi-tier software systems across multiple servers). To this end, ePMC precomputes closed-form expressions for key QoS properties of such patterns, and uses these expressions in the analysis of whole-system models. To evaluate ePMC, we show that its application to service-based systems and multi-tier software architectures reduces the analysis time by several orders of magnitude compared to current parametric model checking methods.
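
To make the idea of a precomputed closed-form expression concrete, here is a minimal sketch for one pattern of the kind the abstract mentions: several functionally-equivalent services invoked for the same operation, where one success is enough. It assumes the services fail independently, which is an assumption of this sketch rather than a statement about the models used in the paper. The function names are hypothetical and are not part of any ePMC tooling.

```python
from math import prod


def parallel_pattern_reliability(success_probs):
    """Closed-form reliability of one operation served by equivalent services.

    Assumes independent failures and that a single successful service
    is enough: R = 1 - prod(1 - p_i).
    """
    if not all(0.0 <= p <= 1.0 for p in success_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in success_probs)


def system_reliability(operations):
    """Reliability of a workflow in which every operation must succeed.

    `operations` is a list of lists; each inner list holds the success
    probabilities of the services implementing one operation. The
    per-pattern closed form is reused for every operation.
    """
    return prod(parallel_pattern_reliability(op) for op in operations)


if __name__ == "__main__":
    # Two operations: the first has two redundant services, the second has one.
    print(system_reliability([[0.9, 0.8], [0.5]]))
```

Evaluating such a formula for new parameter values is a handful of multiplications, which is the intuition behind reusing pattern-level expressions instead of re-analysing the full Markov chain each time.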

Bibliographical note

© Copyright 2019 IEEE - All rights reserved. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Socio-Cyber-Physical Systems: Models, Opportunities, Open Challenges

Calinescu, R. C., Camara Moreno, J. & Paterson, C., 2019, (Accepted/In press) 5th International Workshop on Software Engineering for Smart Cyber-Physical Systems.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: 5th International Workshop on Software Engineering for Smart Cyber-Physical Systems
Date: Accepted/In press - 2019
Original language: English

Abstract

Almost without exception, cyber-physical systems operate alongside, for the benefit of, and supported by humans. Unsurprisingly, disregarding their social aspects during development and operation renders these systems ineffective. In this paper, we explore approaches to modelling and reasoning about the human involvement in socio-cyber-physical systems (SCPS). To provide an unbiased perspective, we describe both the opportunities afforded by the presence of human agents, and the challenges associated with ensuring that their modelling is sufficiently accurate to support decision making during SCPS development and, if applicable, at run-time. Using SCPS examples from emergency management and assisted living, we illustrate how recent advances in stochastic modelling, analysis and synthesis can be used to exploit human observations about the impact of natural and man-made disasters, and to support the efficient provision of assistive care.

On Learning in Collective Self-adaptive Systems: State of Practice and a 3D Framework

Gerasimou, S., D’Angelo, M., Ghahremani, S., Grohmann, J., Nunes, I., Pournaras, E. & Tomforde, S., 22 Mar 2019, (Accepted/In press) 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems
Date: Accepted/In press - 22 Mar 2019
Original language: English

Abstract

Collective self-adaptive systems (CSAS) are distributed and interconnected systems composed of multiple agents that can perform complex tasks such as environmental data collection, search and rescue operations, and discovery of natural resources. By providing individual agents with learning capabilities, CSAS can cope with challenges related to distributed sensing and decision-making and operate in uncertain environments. This unique characteristic of CSAS enables the collective to exhibit robust behaviour while achieving system-wide and agent-specific goals. Although learning has been explored in many CSAS applications, selecting suitable learning models and techniques remains a significant challenge that is heavily influenced by expert knowledge. We address this gap by performing a multifaceted analysis of existing CSAS with learning capabilities reported in the literature. Based on this analysis, we introduce a 3D framework that illustrates the learning aspects of CSAS considering the dimensions of autonomy, knowledge access, and behaviour, and facilitates the selection of learning techniques and models. Finally, using example applications from this analysis, we derive open challenges and highlight the need for research on collaborative, resilient and privacy-aware mechanisms for CSAS.

DeepFault: Fault Localization for Deep Neural Networks

Gerasimou, S., Eniser, H. F. & Sen, A., 15 Feb 2019, 22nd International Conference on Fundamental Approaches to Software Engineering. Springer-Verlag

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: 22nd International Conference on Fundamental Approaches to Software Engineering
Date: E-pub ahead of print - 15 Feb 2019
Publisher: Springer-Verlag
Original language: English

Abstract

Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications including autonomous vehicles and medical diagnostics. To reduce the residual risk for unexpected DNN behaviour and provide evidence for their trustworthy operation, DNNs should be thoroughly tested. The DeepFault white box DNN testing approach presented in our paper addresses this challenge by employing suspiciousness measures inspired by fault localization to establish the hit spectrum of neurons and identify suspicious neurons whose weights have not been calibrated correctly and thus are considered responsible for inadequate DNN performance. DeepFault also uses a suspiciousness-guided algorithm to synthesize new inputs, from correctly classified inputs, that increase the activation values of suspicious neurons. Our empirical evaluation on several DNN instances trained on MNIST and CIFAR-10 datasets shows that DeepFault is effective in identifying suspicious neurons. Also, the inputs synthesized by DeepFault closely resemble the original inputs, exercise the identified suspicious neurons and are highly adversarial.
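
The sketch below illustrates how a neuron hit spectrum and a suspiciousness score can be computed. It uses Tarantula, a classic spectrum-based fault localization measure, as an assumed example, since the abstract does not name the specific measures. The activation threshold, the function names and the toy data are all hypothetical and are not taken from the DeepFault tool.

```python
def hit_spectrum(activations, correct, threshold=0.0):
    """Count, per neuron, how often it was (in)active on failed/passed inputs.

    activations: one list of neuron activation values per input.
    correct: one bool per input, True if the DNN classified it correctly.
    Returns a list of (a_f, a_s, n_f, n_s) tuples, one per neuron:
    active-failed, active-passed, inactive-failed, inactive-passed.
    """
    spectrum = []
    for j in range(len(activations[0])):
        a_f = a_s = n_f = n_s = 0
        for acts, ok in zip(activations, correct):
            active = acts[j] > threshold  # assumed activation criterion
            if active and ok:
                a_s += 1
            elif active:
                a_f += 1
            elif ok:
                n_s += 1
            else:
                n_f += 1
        spectrum.append((a_f, a_s, n_f, n_s))
    return spectrum


def tarantula(a_f, a_s, n_f, n_s):
    """Tarantula score: share of failing activity relative to all activity."""
    fail_rate = a_f / (a_f + n_f) if (a_f + n_f) else 0.0
    pass_rate = a_s / (a_s + n_s) if (a_s + n_s) else 0.0
    total = fail_rate + pass_rate
    return fail_rate / total if total else 0.0


def most_suspicious(activations, correct, k=1, threshold=0.0):
    """Indices of the k neurons with the highest Tarantula score."""
    scores = [tarantula(*c) for c in hit_spectrum(activations, correct, threshold)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: 4 inputs, 2 neurons; the first two inputs are misclassified.
    acts = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    ok = [False, False, True, True]
    print(most_suspicious(acts, ok, k=1))
```

In this toy data the first neuron is active only on misclassified inputs, so it receives the highest score. A suspiciousness-guided input synthesis step, as described in the abstract, would then target the neurons ranked highest.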

Publication details

Journal: Journal of Cloud Computing: Advances, Systems and Applications (JoCCASA)
Date: Accepted/In press - 1 Mar 2018
Date: E-pub ahead of print - 15 Mar 2018
Date: Published (current) - 1 Dec 2018
Issue number: 1
Volume: 7
Early online date: 15/03/18
Original language: English

Abstract

A key challenge in porting enterprise software systems to the cloud is the migration of their database. Choosing a cloud provider and service option (e.g., a database-as-a-service or a manually configured set of virtual machines) typically requires the estimation of the cost and migration duration for each considered option. Many organisations also require this information for budgeting and planning purposes. Existing cloud migration research focuses on the software components, and therefore does not address this need. We introduce a two-stage approach which accurately estimates the migration cost, migration duration and cloud running costs of relational databases. The first stage of our approach obtains workload and structure models of the database to be migrated from database logs and the database schema. The second stage performs a discrete-event simulation using these models to obtain the cost and duration estimates. We implemented software tools that automate both stages of our approach. An extensive evaluation compares the estimates from our approach against results from real-world cloud database migrations.

Bibliographical note

© The Author(s). 2018

Event-Driven Bandwidth Allocation with Formal Guarantees for Camera Networks

Seetanadi, G. N., Camara Moreno, J., Almeida, L., Arzen, K. E. & Maggio, M., 31 Jan 2018, Proceedings - 2017 IEEE Real-Time Systems Symposium, RTSS 2017. Institute of Electrical and Electronics Engineers Inc., Vol. 2018-January. p. 243-254 12 p. (IEEE Real-Time Systems Symposium (RTSS)).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: Proceedings - 2017 IEEE Real-Time Systems Symposium, RTSS 2017
Date: Accepted/In press - 31 Jul 2017
Date: Published (current) - 31 Jan 2018
Pages: 243-254
Number of pages: 12
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: 2018-January
Original language: English
ISBN (Electronic): 9781538614143

Publication series

Name: IEEE Real-Time Systems Symposium (RTSS)
Publisher: IEEE
ISSN (Electronic): 2576-3172

Abstract

Modern computing systems are often formed by multiple components that interact with each other through the use of shared resources (e.g., CPU, network bandwidth, storage). In this paper, we consider a representative scenario of one such system in the context of an Internet of Things application. The system consists of a network of self-adaptive cameras that share a communication channel, transmitting streams of frames to a central node. The cameras can modify a quality parameter to adapt the amount of information encoded and to affect their bandwidth requirements and usage. A critical design choice for such a system is scheduling channel access, i.e., how to determine the amount of channel capacity that should be used by each of the cameras at any point in time. Two main issues have to be considered for the choice of a bandwidth allocation scheme: (i) camera adaptation and network access scheduling may interfere with one another, (ii) bandwidth distribution should be triggered only when necessary, to limit additional overhead. This paper proposes the first formally verified event-triggered adaptation scheme for bandwidth allocation, designed to minimize additional overhead in the network. Desired properties of the system are verified using model checking. The paper also describes experimental results obtained with an implementation of the scheme.

Structuring Clinical Decision Support rules for drug safety using Natural Language Processing

Despotou, G., Korkontzelos, I., Matragkas, N., Bilici, E. & Arvanitis, T. N., 3 Jun 2018, Data, Informatics and Technology: An Inspiration for Improved Healthcare. Hasman, A., Gallos, P., Liaskos, J., Househ, M. S. & Mantas, J. (eds.). IOS Press, p. 89-92 4 p. (Studies in Health Technology and Informatics (HTI); vol. 251).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Publication details

Title of host publication: Data, Informatics and Technology
Date: Published - 3 Jun 2018
Pages: 89-92
Number of pages: 4
Publisher: IOS Press
Editors: Arie Hasman, Parisis Gallos, Joseph Liaskos, Mowafa S. Househ, John Mantas
Original language: English
ISBN (Print): 9781614998808

Publication series

Name: Studies in Health Technology and Informatics (HTI)
Publisher: IOS Press
Volume: 251
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Publication details

Journal: IEEE Transactions on Software Engineering
Date: Accepted/In press - 19 Jul 2017
Date: E-pub ahead of print - 11 Aug 2017
Date: Published (current) - 1 Nov 2018
Issue number: 11
Volume: 44
Number of pages: 31
Pages (from-to): 1039-1069
Early online date: 11/08/17
Original language: English

Abstract

Building on concepts drawn from control theory, self-adaptive software handles environmental and internal uncertainties by dynamically adjusting its architecture and parameters in response to events such as workload changes and component failures. Self-adaptive software is increasingly expected to meet strict functional and non-functional requirements in applications from areas as diverse as manufacturing, healthcare and finance. To address this need, we introduce a methodology for the systematic ENgineering of TRUstworthy Self-adaptive sofTware (ENTRUST). ENTRUST uses a combination of (1) design-time and runtime modelling and verification, and (2) industry-adopted assurance processes to develop trustworthy self-adaptive software and assurance cases arguing the suitability of the software for its intended application. To evaluate the effectiveness of our methodology, we present a tool-supported instance of ENTRUST and its use to develop proof-of-concept self-adaptive software for embedded and service-based systems from the oceanic monitoring and e-finance domains, respectively. The experimental results show that ENTRUST can be used to engineer self-adaptive software systems in different application domains and to generate dynamic assurance cases for these systems.

Bibliographical note

(c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Publication details

Journal: Journal of Systems and Software
Date: Accepted/In press - 9 May 2018
Date: E-pub ahead of print - 16 May 2018
Date: Published (current) - Sep 2018
Volume: 143
Number of pages: 19
Pages (from-to): 140-158
Early online date: 16/05/18
Original language: English

Abstract

We describe a tool-supported method for the efficient synthesis of parametric continuous-time Markov chains (pCTMC) that correspond to robust designs of a system under development. The pCTMCs generated by our RObust DEsign Synthesis (RODES) method are resilient to changes in the system’s operational profile, satisfy strict reliability, performance and other quality constraints, and are Pareto-optimal or nearly Pareto-optimal with respect to a set of quality optimisation criteria. By integrating sensitivity analysis at designer-specified tolerance levels and Pareto optimality, RODES produces designs that are potentially slightly suboptimal in return for less sensitivity—an acceptable trade-off in engineering practice. We demonstrate the effectiveness of our method and the efficiency of its GPU-accelerated tool support across multiple application domains by using RODES to design a producer-consumer system, a replicated file system and a workstation cluster system.

Bibliographical note

© 2018 The Authors.

Funded Projects

Responsible Data Science by Design, EUR 956,754.00

Kahrobaei, D., Kolovos, D. & Matragkas, N.

1/01/20 → 31/12/22

Project: Research project (funded) › Research

Description

York Maastricht Partnership Investment Fund
Status: Active
Effective start/end date: 1/01/20 → 31/12/22

Description

Marie Skłodowska-Curie training network of 15 Early Stage Researchers across Europe investigating aspects of scalability in low-code software engineering platforms. Network members include British Telecom, Intecs, B2T Concept, CLMS, IncQuery Labs and the Universities of Nantes (IMT), Madrid (Autonoma), L'Aquila and (TU) Wien.
Status: Active
Effective start/end date: 1/01/19 → 31/12/22

KTP with Rolls Royce 2 - Industry Funding

Kolovos, D.

1/10/18 → 30/09/21

Project: Research project (funded) › Research

Description

Knowledge Transfer Partnership with Rolls-Royce on Model-Based Development of Aerospace Systems, co-funded by InnovateUK
Status: Active
Effective start/end date: 1/10/18 → 30/09/21

KTP With IBM (Innovate)

Paige, R. F., Kolovos, D. & Manandhar, S.

1/04/18 → 31/03/21

Project: Research project (funded) › Research

Description

Knowledge Transfer Partnership with IBM UK on automated knowledge extraction and re-engineering of financial planning spreadsheets, co-funded by InnovateUK
Status: Active
Effective start/end date: 1/04/18 → 31/03/21

TYPHON - Polyglot Persistence and Processing of Big Data

Kolovos, D.

EUROPEAN COMMISSION

1/01/18 → 31/12/20

Project: Research project (funded) › Research

Description

Horizon 2020 project on polyglot (relational/document/graph) data persistence and processing architectures with Volkswagen, GMV, Alpha Bank, OTE, the Open Group, and the Universities of L'Aquila, Edge Hill, Namur and Amsterdam (CWI)
Status: Active
Effective start/end date: 1/01/18 → 31/12/20

Status: Active
Effective start/end date: 1/11/20 → 30/04/24

Engineering Assured Autonomous Systems

Calinescu, R. & Gerasimou, S.

EPSRC

19/11/19 → 28/02/21

Project: Research project (funded) › Research

Status: Active
Effective start/end date: 19/11/19 → 28/02/21

Description

Horizon 2020 project on knowledge mining from open-source software repositories with the Eclipse Foundation, the Open Group, OW2, Bitergia, FrontEndArt, Softeam, Unparallel Innovation, Castalia and the Universities of L'Aquila, Athens (AUEB), Amsterdam (CWI), and Edge Hill
Status: Finished
Effective start/end date: 1/01/17 → 31/12/19

Acronym: Scalable Modelling and Model Management on the Cloud
Status: Finished
Effective start/end date: 1/11/13 → 30/04/16

OSSMETER (EU ICT Bid)

Paige, R. F. & Kolovos, D.

EUROPEAN COMMISSION

1/10/12 → 30/03/15

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/10/12 → 30/03/15

Bridging the Gap Between Programming and Modelling

Paige, R. F.

THE ROYAL SOCIETY

1/03/18 → 29/02/20

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/03/18 → 29/02/20

CyPhERS

McDermid, J. A. & Paige, R. F.

EUROPEAN COMMISSION

1/07/13 → 28/02/15

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/07/13 → 28/02/15

DSTL PhD Studentship - Radu Calinescu

Calinescu, R. & Paige, R. F.

1/10/12 → 30/09/16

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/10/12 → 30/09/16

COMPASS: Automated Safety Warnings (SESAR)

Paige, R. F.

SESAR JOINT UNDERTAKING

1/04/11 → 30/11/13

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/04/11 → 30/11/13

Status: Finished
Effective start/end date: 1/02/10 → 31/07/12

Development of Collaborations with the Weizmann Institute of Science and IBM Haifa

Paige, R. F.

EPSRC

1/11/07 → 31/10/08

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/11/07 → 31/10/08

DSTL TDS Studentship: Assured Reinforcement Learning

Calinescu, R. & Kudenko, D.

1/10/13 → 30/09/17

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/10/13 → 30/09/17

Cloud Computing for LSCITS

Calinescu, R.

EPSRC

1/05/12 → 31/03/14

Project: Research project (funded) › Research

Status: Finished
Effective start/end date: 1/05/12 → 31/03/14

Automatic Repair Of Natural Source Code (MANATEE)

Matragkas, N.

Project: Research project (funded) › Research

Status: Not started

Secure and Safe Multi-Robot Systems

Matragkas, N. & Gerasimou, S.

Project: Research project (funded) › Research

Short title: SESAME
Status: Not started