MARRYing the analyses of feasible algorithms and problems (MARRY)
Start: January 1st, 2023
Duration: 4 years
Royal Society International Exchanges Grant, IES\R3\223051
Wider Research Context
Pure foundational research provides seeds for successful applications. A notable example in the context of our proposal is a tool, RaML: While based on implicit characterisations of feasible computation by Martin Hofmann to provide elegant descriptions, RaML is nowadays the most advanced tool to estimate resource consumptions of OCaml programs. Also the direction from applications to theory is significant. Deeply rooted in practical questions, Ashish Tiwari defined a restrictive class of linear loop programs and initiated an ongoing study into their termination. It emerged that this links intimately to Skolem's problem, a deep problem in number theory that attracted comments from Terence Tao.
Objectives
Inspired by these examples, we aim to study results between application-oriented areas in program analysis and foundational aspects in proof theory. More precisely, the analysis of feasible algorithms is concerned with the development of (often) automated methods to infer the runtime costs of programs. On the other hand, complexity theory, the study of the inherent complexity of given computational problems, is naturally couched in theories of Bounded Arithmetic (BA). Thus, questions of feasibility of algorithms and problems become questions of provability in BA.
Methods
In an initial phase of the project, we aim to validate our approach by investigating the transfer of methods in the study of rewriting systems and the analysis of proofs in BA. A second phase will explore methods used in the study of BA and program analysis with the aim to develop a long-term programme that can be sustained by follow-up funding.
November 17th to 26th, 2023 | |
University of Innsbruck, AT |
Automated Sublinear Amortised Resource Analysis of Data Structures (AUTOSARD)
Start: April 1st, 2023
Duration: 4 years
FWF Project Number: P 36623
Wider Research Context
Amortised analysis is a method for the worst-case cost analysis of (probabilistic) data structures, where a single data structure operation is considered as part of a larger sequence of operations. The cost analysis of sophisticated data structures, such as self-adjusting binary search trees, has been a main objective already in the initial proposal of amortised analysis. Analysing these data structures requires sophisticated potential functions, which typically contain sublinear expressions (such as the logarithm). Apart from our pilot project, the analysis of these data structures has so far eluded automation.
Objectives
We target an automated analysis of the most common data structures with good, ie. sublinear, complexity, such as balanced trees, Fibonacci heaps, randomised search trees, skip lists, skew heaps, Union-find, etc. Our goals are the verification of textbook data structures, the confirmation and improvement (on coefficients) of previously reported complexity bounds, as well as the automated analysis of realistic data structure implementations. Initially, we will only consider strict (first-order) functional programming languages. Later on, we will extend our research towards lazy evaluation in order to allow for the analysis of persistent data structures. Moreover, we will also consider probabilistic data structures.
Methods
Based on the success of our pilot study, we envision the following steps for automatising amortised analysis: (i) Fix a parametrised potential function; (ii) derive a (linear) constraint system over the function parameters from the AST of the program; (iii) capture the required non-linear reasoning trough explicit background knowledge integrable into the constraint solving mechanism; and finally (iv) find values for the parameters by an (optimising) constraint solver.
Machine Translation for Ladin
Start: September 1st, 2021
Duration: 3 years
Wider Research Context
Traditionally, state-of-the-art approaches for the development of effective machine translation systems require a large amount of training data. However, the limited availability of resources and texts in smaller languages prevents the effective application of these algorithms. In recent years, this area of research has increasingly focused on low-resource scenarios and thus also on the machine translation of smaller languages. This has led to the development of various methods that circumvent the problem of limited data availability or make more efficient use of the available data. Furthermore, with the publication of ChatGPT and other Large Language Models (LLMs), an entirely new branch of machine translation research has emerged. LLMs, as language models, have an impressive understanding of natural language and are able to process complex contexts. With just a few training samples, they can be adapted to new problems. In combination with the methods developed, these capabilities could now also be used for the improvement of machine translation for low-resource languages and pave the way for well-functioning systems in such languages too. The question remains how this can best be achieved.
Goals
The aim of this project is to develop, in cooperation with the Ladin Cultural Institute 'Micurá de Rü', a machine translation system for the Ladin language. The Ladin language or Dolomite Ladin is an officially recognised minority language in Italy and spoken by around 30,000 people in the five valleys around the Sella massif. The language varies from valley to valley and a distinction is made between the following idioms Val Badia (Gadertal), Val Gardena (Grödnertal), Fassa (Fassa), Ampezzo (Anpezo) and Buchenstein (Fodom). These variants must be treated separately in this problem, which creates an additional challenge. For the time being, we will limit ourselves to the Val Badia variant and try to develop approaches that have proven to work well for this variant and that we can then extend to the other variants.
Impact
The successful development of a well-functioning machine translation system for Ladin would create new opportunities for the use and research of this language. On the one hand, it could facilitate access to this language and culture, and on the other, it would provide a valuable tool for the composition of Ladin texts, which would also be of benefit to the Ladin community. Therefore, it would also contribute to the preservation of this language, at least in the digital world.
Title | Start | Duration | # of Publications | Next Meeting |
---|---|---|---|---|
MARRYing the analyses of feasible algorithms and problems (MARRY) | January 1st, 2023 | 4 years | (none yet) | (none planned) |
Automated Sublinear Amortised Resource Analysis of Data Structures (AUTOSARD) | April 1st, 2023 | 4 years | 3 | (none planned) |
Machine Translation for Ladin | September 1st, 2021 | 3 years | 2 | (none planned) |