no code implementations • ACL 2022 • George Pantazopoulos, Alessandro Suglia, Arash Eshghi
Compositionality – the ability to combine simpler concepts to understand & generate arbitrarily more complex conceptual structures – has long been thought to be the cornerstone of human language capacity.
no code implementations • SIGDIAL (ACL) 2022 • Alessandro Suglia, Bhathiya Hemanthage, Malvina Nikandrou, George Pantazopoulos, Amit Parekh, Arash Eshghi, Claudio Greco, Ioannis Konstas, Oliver Lemon, Verena Rieser
We demonstrate EMMA, an embodied multimodal agent which has been developed for the Alexa Prize SimBot challenge.
no code implementations • EURALI (LREC) 2022 • Irene Sucameli, Michele De Quattro, Arash Eshghi, Alessandro Suglia, Maria Simi
Since the advent of Transformer-based, pretrained language models (LMs) such as BERT, Natural Language Understanding (NLU) components in the form of Dialogue Act Recognition (DAR) and Slot Recognition (SR) for dialogue systems have become both more accurate and easier to create for specific application domains.
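To make the two NLU sub-tasks concrete, the sketch below shows their standard framing: DAR assigns one label to the whole utterance, while SR assigns a BIO tag to each token. The utterance, act label, and slot tags are invented for illustration and are not taken from the paper's dataset.

```python
# Illustrative framing of the two NLU sub-tasks mentioned above:
# Dialogue Act Recognition (one label per utterance) and
# Slot Recognition (one BIO tag per token). All labels are invented.

utterance = "book a table for two at eight".split()

dialogue_act = "request_booking"                  # DAR: utterance-level label

slot_tags = ["O", "O", "O", "O",                  # SR: token-level BIO tags
             "B-party_size", "O", "B-time"]

assert len(slot_tags) == len(utterance)
for token, tag in zip(utterance, slot_tags):
    print(f"{token:12s} {tag}")
```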
no code implementations • ReInAct 2021 • Angus Addlesee, Arash Eshghi
The next generation of conversational AI systems needs to: (1) process language incrementally, token-by-token, to be more responsive and to enable handling of conversational phenomena such as pauses, restarts and self-corrections; (2) reason incrementally, allowing meaning to be established beyond what is said; (3) be transparent and controllable, allowing designers as well as the system itself to easily establish reasons for particular behaviour and to tailor the system to particular user groups or domains.
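As a rough illustration of point (1), the sketch below is a toy incremental processor that consumes one token at a time, keeps a running partial interpretation, and rolls back on a hypothetical self-repair marker. It is not the authors' system; the state representation and marker are made up for the example.

```python
# Illustrative sketch only: a toy incremental processor that consumes one
# token at a time and keeps a running partial interpretation. The repair
# marker and the "semantics" are hypothetical stand-ins.

from dataclasses import dataclass, field


@dataclass
class IncrementalState:
    tokens: list = field(default_factory=list)            # tokens committed so far
    partial_meaning: list = field(default_factory=list)   # placeholder semantics


def process_token(state: IncrementalState, token: str) -> IncrementalState:
    """Update the state with one token; roll back on a toy self-repair marker."""
    if token == "uh,":                   # toy self-correction signal
        if state.tokens:
            state.tokens.pop()           # retract the last committed token
            state.partial_meaning.pop()
        return state
    state.tokens.append(token)
    state.partial_meaning.append(f"sem({token})")   # stand-in for real semantics
    return state


state = IncrementalState()
for tok in "book a table for two uh, three people".split():
    state = process_token(state, tok)
    print(state.tokens)                  # an interpretation is available after every token
```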
no code implementations • 26 Oct 2024 • Houman Mehrafarin, Arash Eshghi, Ioannis Konstas
Evaluating Large Language Models (LLMs) on reasoning benchmarks demonstrates their ability to solve compositional questions.
1 code implementation • 21 Sep 2024 • Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi
In dialogue, the addressee may initially misunderstand the speaker and respond erroneously, often prompting the speaker to correct the misunderstanding in the next turn with a Third Position Repair (TPR).
1 code implementation • 9 Sep 2024 • Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, Arash Eshghi
We explore two hypotheses to explain this phenomenon: 1) the effect of task-agnostic visual encoding on the updates of the hidden states, and 2) the difficulty in performing visual grounding from the perspective of in-context multimodal retrieval.
1 code implementation • 19 Jun 2024 • Alessandro Suglia, Claudio Greco, Katie Baker, Jose L. Part, Ioannis Papaioannou, Arash Eshghi, Ioannis Konstas, Oliver Lemon
First, we introduce the Egocentric Video Understanding Dataset (EVUD) for training VLMs on video captioning and question answering tasks specific to egocentric videos.
1 code implementation • 21 Apr 2024 • Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi
In this paper, we use diagnostic classifiers to measure the extent to which the visual prompt produced by the resampler encodes spatial information.
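For readers unfamiliar with the technique, a diagnostic classifier is typically a simple probe trained on frozen representations. The sketch below assumes a matrix of visual-prompt embeddings and spatial labels (both random stand-ins here) and fits a linear probe with scikit-learn; it illustrates the general probing recipe, not the paper's code.

```python
# Illustrative sketch of a diagnostic (probing) classifier. The embeddings and
# labels below are random stand-ins for resampler outputs and spatial labels.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))     # stand-in for visual-prompt embeddings
labels = rng.integers(0, 2, size=1000)        # stand-in spatial labels (e.g. left/right)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)     # a simple linear probe
probe.fit(X_train, y_train)

# If the probe beats a majority-class baseline, the representations plausibly
# encode the property being tested; here the data is random, so it will not.
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```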
no code implementations • 7 Nov 2023 • Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia
Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation.
1 code implementation • 22 Aug 2023 • Arash Eshghi, Arash Ashrafzadeh
We further do a zero-shot evaluation of the ability of the same model to generate self-repairs when the generation goal changes mid-utterance.
1 code implementation • 31 Jul 2023 • Vevake Balaraman, Arash Eshghi, Ioannis Konstas, Ioannis Papaioannou
We demonstrate the usefulness of the data by training and evaluating strong baseline models for executing TPRs.
1 code implementation • 28 Jul 2023 • Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi, Helen Hastie
Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee.
no code implementations • 25 May 2023 • Sabrina Chiesurin, Dimitris Dimakopoulos, Marco Antonio Sobrevilla Cabezudo, Arash Eshghi, Ioannis Papaioannou, Verena Rieser, Ioannis Konstas
Large language models are known to produce output which sounds fluent and convincing, but is also often wrong, e.g. "unfaithful" with respect to a rationale retrieved from a knowledge base.
Conversational Question Answering
Open-Domain Question Answering
2 code implementations • 25 Feb 2022 • Javier Chiyah-Garcia, Alessandro Suglia, José Lopes, Arash Eshghi, Helen Hastie
Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns, as well as the immediate visual environment.
1 code implementation • EACL 2021 • Miruna Clinciu, Arash Eshghi, Helen Hastie
As transparency becomes key for robotics and AI, it will be necessary to evaluate the methods through which transparency is provided, including automatically generated natural language (NL) explanations.
2 code implementations • COLING 2020 • Angus Addlesee, Yanchao Yu, Arash Eshghi
Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous, with several options currently available as a service (e.g. Google, IBM, and Microsoft).
Automatic Speech Recognition (ASR)
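One standard way to compare such services on a given domain is word error rate (WER) against a reference transcript. The sketch below computes WER from scratch via word-level Levenshtein distance; the hypothesis strings are invented and this is not the paper's evaluation pipeline.

```python
# Illustrative sketch: comparing ASR outputs against a reference transcript
# with word error rate (WER), computed via Levenshtein distance over words.
# The hypothesis strings are made up; this is not the paper's evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


reference = "book a table for three people at seven"
hypotheses = {
    "service_a": "book a table for three people at seven",
    "service_b": "book table for free people at seven",
}
for name, hyp in hypotheses.items():
    print(name, round(wer(reference, hyp), 3))
```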
no code implementations • IJCNLP 2019 • Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon
Our main dataset is the Stanford Multi-Domain dialogue corpus.
no code implementations • 14 Sep 2019 • Angus Addlesee, Arash Eshghi, Ioannis Konstas
Dialogue technologies such as Amazon's Alexa have the potential to transform the healthcare industry.
no code implementations • WS 2019 • Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon
Learning with minimal data is one of the key challenges in the development of practical, production-ready goal-oriented dialogue systems.
9 code implementations • 13 Mar 2019 • Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.
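At their core, such NLU services map an utterance string to an intent label. The sketch below is a deliberately simple baseline for that mapping (TF-IDF features plus logistic regression) with invented utterances and intents; it is not one of the benchmarked toolkits.

```python
# Illustrative sketch: a toy intent classifier mapping utterances to intent
# labels, in the spirit of the NLU services being benchmarked. The utterances
# and intent labels are invented examples, not drawn from the released dataset.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "set an alarm for seven am",
    "wake me up at six tomorrow",
    "what is the weather like today",
    "will it rain this afternoon",
    "play some jazz music",
    "put on my workout playlist",
]
intents = ["alarm_set", "alarm_set", "weather_query",
           "weather_query", "play_music", "play_music"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(utterances, intents)

print(clf.predict(["wake me at five thirty"]))   # likely: ['alarm_set']
```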
no code implementations • 8 Oct 2018 • Igor Shalyminov, Arash Eshghi, Oliver Lemon
To test the model's generalisation potential, we evaluate the same model on the bAbI+ dataset, without any additional training.
no code implementations • WS 2017 • Yanchao Yu, Arash Eshghi, Oliver Lemon
We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data.
no code implementations • WS 2017 • Yanchao Yu, Arash Eshghi, Gregory Mills, Oliver Joseph Lemon
We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner.
no code implementations • WS 2016 • Yanchao Yu, Arash Eshghi, Oliver Lemon
We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor.
1 code implementation • 22 Sep 2017 • Igor Shalyminov, Arash Eshghi, Oliver Lemon
Results show that the semantic accuracy of the MemN2N model drops drastically, and that although it is in principle able to learn to process the constructions in bAbI+, it needs an impractical amount of training data to do so.
no code implementations • EMNLP 2017 • Arash Eshghi, Igor Shalyminov, Oliver Lemon
Our experiments show that our model can process 74% of the Facebook AI bAbI dataset even when trained on only 0.13% of the data (5 dialogues).
no code implementations • WS 2017 • Yanchao Yu, Arash Eshghi, Oliver Lemon
We present VOILA: an optimised, multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human user.
no code implementations • 1 Dec 2016 • Dimitrios Kalatzis, Arash Eshghi, Oliver Lemon
We present a method for inducing new dialogue systems from very small amounts of unannotated dialogue data, showing how word-level exploration using Reinforcement Learning (RL), combined with an incremental and semantic grammar - Dynamic Syntax (DS) - allows systems to discover, generate, and understand many new dialogue variants.
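Purely as an illustration of what word-level exploration looks like in the abstract, the sketch below runs an epsilon-greedy loop over a toy vocabulary with a made-up reward signal; it does not implement Dynamic Syntax or the paper's induction method.

```python
# Illustrative sketch of word-level exploration with a toy reward signal.
# This is NOT the paper's Dynamic Syntax grammar; it only shows the shape of
# an epsilon-greedy loop that picks the next word and learns from task success.

import random
from collections import defaultdict

vocab = ["hello", "which", "food", "price", "ok", "bye"]
q_values = defaultdict(float)          # value of each word in a single toy "state"
epsilon, alpha = 0.2, 0.1


def toy_reward(word: str) -> float:
    """Stand-in for task success; a real system would score whole dialogues."""
    return 1.0 if word in {"food", "price"} else 0.0


for episode in range(500):
    if random.random() < epsilon:                       # explore
        word = random.choice(vocab)
    else:                                               # exploit
        word = max(vocab, key=lambda w: q_values[w])
    reward = toy_reward(word)
    q_values[word] += alpha * (reward - q_values[word])

print(sorted(q_values.items(), key=lambda kv: -kv[1])[:3])
```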