Search Results for author: Gül Varol

Found 33 papers, 19 papers with code

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

no code implementations 16 May 2024 Charles Raude, K R Prajwal, Liliane Momeni, Hannah Bull, Samuel Albanie, Andrew Zisserman, Gül Varol

To this end, we introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in a joint embedding space between signed language and spoken language text.

Learning text-to-video retrieval from image captioning

no code implementations 26 Apr 2024 Lucas Ventura, Cordelia Schmid, Gül Varol

In this paper, we make use of this progress and instantiate the image experts from two types of models: a text-to-image retrieval model to provide an initial backbone, and image captioning models to provide supervision signal into unlabeled videos.

Image Captioning Image Retrieval +4

AutoAD III: The Prequel -- Back to the Pixels

no code implementations 22 Apr 2024 Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

no code implementations 10 Oct 2023 Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling Text Generation

CoVR: Learning Composed Video Retrieval from Web Video Captions

1 code implementation 28 Aug 2023 Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol

Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image.

Composed Video Retrieval (CoVR) Language Modelling +3

TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis

no code implementations ICCV 2023 Mathis Petrovich, Michael J. Black, Gül Varol

We show that maintaining the motion generation loss, along with the contrastive training, is crucial to obtain good performance.

Moment Retrieval Motion Synthesis +3

SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

no code implementations ICCV 2023 Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol

Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?"

Action Generation

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

1 code implementation ICCV 2023 Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

We contribute Synthetic Visual Concepts (SyViC) - a million-scale synthetic dataset and data generation codebase allowing to generate additional suitable data to improve VLC understanding and compositional reasoning of VL models.

Sentence Visual Reasoning

AutoAD: Movie Description in Context

1 code implementation CVPR 2023 Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.

Image Captioning Text Generation

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

1 code implementation 16 Nov 2022 K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman

Through extensive evaluations, we verify our method for automatic annotation and our model architecture.

TEACH: Temporal Action Composition for 3D Humans

1 code implementation 9 Sep 2022 Nikos Athanasiou, Mathis Petrovich, Michael J. Black, Gül Varol

In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.

Motion Synthesis Sentence

Automatic dense annotation of large-vocabulary sign language videos

no code implementations 4 Aug 2022 Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman

Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data.

Scaling up sign spotting through sign language dictionaries

no code implementations 9 May 2022 Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

TEMOS: Generating diverse human motions from textual descriptions

1 code implementation 25 Apr 2022 Mathis Petrovich, Michael J. Black, Gül Varol

In contrast to most previous work which focuses on generating a single, deterministic, motion from a textual description, we design a variational approach that can produce multiple diverse human motions.

Motion Synthesis

Sign Language Video Retrieval with Free-Form Textual Queries

no code implementations CVPR 2022 Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol

Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology.

Retrieval Sentence +2

BBC-Oxford British Sign Language Dataset

no code implementations 5 Nov 2021 Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman

In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).

Sign Language Translation Translation

Action-Conditioned 3D Human Motion Synthesis with Transformer VAE

2 code implementations ICCV 2021 Mathis Petrovich, Michael J. Black, Gül Varol

By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action.

Action Recognition Denoising +2

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

5 code implementations ICCV 2021 Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman

Our objective in this work is video-text retrieval - in particular a joint embedding that enables efficient text-to-video retrieval.

Ranked #4 on Video Retrieval on QuerYD (using extra training data)

Retrieval Text Retrieval +4

Read and Attend: Temporal Localisation in Sign Language Videos

no code implementations CVPR 2021 Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.

Sign Language Recognition

Sign language segmentation with temporal convolutional networks

1 code implementation 25 Nov 2020 Katrin Renz, Nicolaj C. Stache, Samuel Albanie, Gül Varol

The objective of this work is to determine the location of temporal boundaries between signs in continuous sign language videos.

Watch, read and lookup: learning to spot signs from multiple supervisors

1 code implementation 8 Oct 2020 Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

1 code implementation ECCV 2020 Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman

Recent progress in fine-grained gesture and action classification, and machine translation, point to the possibility of automated sign language recognition becoming a reality.

Action Classification Keyword Spotting +2

Synthetic Humans for Action Recognition from Unseen Viewpoints

1 code implementation 9 Dec 2019 Gül Varol, Ivan Laptev, Cordelia Schmid, Andrew Zisserman

Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored.

Action Classification Action Recognition +2

Learning from Synthetic Humans

2 code implementations CVPR 2017 Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid

In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data.

2D Human Pose Estimation 3D Human Pose Estimation +2
