Search Results for author: Frank Keller

Found 49 papers, 17 papers with code

Investigating Negation in Pre-trained Vision-and-language Models

1 code implementation EMNLP (BlackboxNLP) 2021 Radina Dobreva, Frank Keller

Pre-trained vision-and-language models have achieved impressive results on a variety of tasks, including ones that require complex reasoning beyond object recognition.

Negation Object Recognition

Select and Summarize: Scene Saliency for Movie Script Summarization

no code implementations 4 Apr 2024 Rohit Saxena, Frank Keller

Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models.

Abstractive Text Summarization

Efficient Pre-training for Localized Instruction Generation of Videos

no code implementations 27 Nov 2023 Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller

Understanding such videos is challenging, involving the precise localization of steps and the generation of textual instructions.

Semi-supervised multimodal coreference resolution in image narrations

1 code implementation 20 Oct 2023 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image.

coreference-resolution Descriptive

Visual Storytelling with Question-Answer Plans

no code implementations 8 Oct 2023 Danyang Liu, Mirella Lapata, Frank Keller

Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.

Visual Storytelling

Dynamic Planning with a LLM

1 code implementation 11 Aug 2023 Gautier Dagan, Frank Keller, Alex Lascarides

While Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, applications involving embodied agents remain problematic.

Meta-learning For Vision-and-language Cross-lingual Transfer

no code implementations 24 May 2023 Hanxu Hu, Frank Keller

Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets.

Cross-Lingual Transfer Meta-Learning

Learning the Effects of Physical Actions in a Multi-modal Environment

1 code implementation 27 Jan 2023 Gautier Dagan, Frank Keller, Alex Lascarides

However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal.

Physical Commonsense Reasoning

Who are you referring to? Coreference resolution in image narrations

no code implementations ICCV 2023 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing.

coreference-resolution

Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

no code implementations CVPR 2022 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding.

Graph Generation Informativeness +2

Film Trailer Generation via Task Decomposition

no code implementations 16 Nov 2021 Pinelopi Papalampidi, Frank Keller, Mirella Lapata

Movie trailers perform multiple functions: they introduce viewers to the story, convey the mood and artistic style of the film, and encourage audiences to see the movie.

A Temporal Variational Model for Story Generation

3 code implementations 14 Sep 2021 David Wilmot, Frank Keller

Recent language models can generate interesting and grammatically correct text in story generation but often lack plot development and long-term coherence.

Story Generation

A New Split for Evaluating True Zero-Shot Action Recognition

1 code implementation 27 Jul 2021 Shreyank N Gowda, Laura Sevilla-Lara, Kiyoon Kim, Frank Keller, Marcus Rohrbach

We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.

Few-Shot action recognition Few Shot Action Recognition +2

CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition

no code implementations 18 Jan 2021 Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach

The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes.

Action Recognition Clustering +4

Movie Summarization via Sparse Graph Construction

1 code implementation 14 Dec 2020 Pinelopi Papalampidi, Frank Keller, Mirella Lapata

We summarize full-length movies by creating shorter videos containing their most informative scenes.

graph construction Turning Point Identification +1

Screenplay Summarization Using Latent Narrative Structure

2 code implementations ACL 2020 Pinelopi Papalampidi, Frank Keller, Lea Frermann, Mirella Lapata

Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront.

Document Summarization Extractive Summarization +3

Movie Plot Analysis via Turning Point Identification

no code implementations IJCNLP 2019 Pinelopi Papalampidi, Frank Keller, Mirella Lapata

According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a screenplay: they define the plot structure, determine its progression and segment the screenplay into thematic units (e.g., setup, complications, aftermath).

Position Sentence +1

An Imitation Learning Approach to Unsupervised Parsing

1 code implementation ACL 2019 Bowen Li, Lili Mou, Frank Keller

In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by the PRPN to a Tree-LSTM model with discrete parsing actions.

Imitation Learning Language Modelling +1

Cross-lingual Visual Verb Sense Disambiguation

1 code implementation NAACL 2019 Spandana Gella, Desmond Elliott, Frank Keller

We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs.

Machine Translation Translation

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error

no code implementations 2 Feb 2019 Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.

Dependency Grammar Induction with a Neural Variational Transition-based Parser

no code implementations 14 Nov 2018 Bowen Li, Jianpeng Cheng, Yang Liu, Frank Keller

Transition-based models enable faster inference with $O(n)$ time complexity, but their performance still lags behind.

Dependency Grammar Induction Variational Inference

Modeling Task Effects in Human Reading with Neural Network-based Attention

no code implementations 31 Jul 2018 Michael Hahn, Frank Keller

Research on human reading has long documented that reading behavior shows task-specific effects, but it has been challenging to build general models predicting what reading behavior humans will show in a given task.

Question Answering Reading Comprehension

Extreme clicking for efficient object annotation

no code implementations ICCV 2017 Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

We crowd-source extreme point annotations for PASCAL VOC 2007 and 2012 and show that (1) annotation time is only 7s per box, 5x faster than the traditional way of drawing boxes [62]; (2) the quality of the boxes is as good as the original ground-truth drawn the traditional way; (3) detectors trained on our annotations are as accurate as those trained on the original ground-truth.

Object

Image Pivoting for Learning Multilingual Multimodal Representations

no code implementations EMNLP 2017 Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding.

Image Retrieval Semantic Textual Similarity

An Analysis of Action Recognition Datasets for Language and Vision Tasks

no code implementations ACL 2017 Spandana Gella, Frank Keller

A large amount of recent research has focused on tasks that combine language and vision, resulting in a proliferation of datasets and methods.

Action Recognition Image Retrieval +2

Cross-lingual Transfer of Correlations between Parts of Speech and Gaze Features

no code implementations COLING 2016 Maria Barrett, Frank Keller, Anders Søgaard

Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models.

Cross-Lingual Transfer POS +2

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

1 code implementation NAACL 2016 Spandana Gella, Mirella Lapata, Frank Keller

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image.

Image Retrieval Retrieval +1

We don't need no bounding-boxes: Training object class detectors using only human verification

1 code implementation CVPR 2016 Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes.

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations 15 Jan 2016 Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.

Retrieval
