Search Results for author: Samuel Albanie

Found 73 papers, 46 papers with code

A Practitioner's Guide to Continual Multimodal Pretraining

1 code implementation26 Aug 2024 Karsten Roth, Vishaal Udandarao, Sebastian Dziadzio, Ameya Prabhu, Mehdi Cherti, Oriol Vinyals, Olivier Hénaff, Samuel Albanie, Matthias Bethge, Zeynep Akata

In this work, we complement current perspectives on continual pretraining through a research test bed as well as provide comprehensive guidance for effective continual model updates in such scenarios.

Continual Learning Continual Pretraining +1

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

no code implementations21 Aug 2024 Jonathan Roberts, Kai Han, Samuel Albanie

As such, there is a pressing need for a new generation of benchmarks challenging enough for the next generation of LMMs.

HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

no code implementations5 Jun 2024 Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie, Philip Torr, João F. Henriques, Jakob N. Foerster

Verifying whether an X note is helpful or whether a Wikipedia edit should be accepted are hard tasks that require grounding by querying the web.

Inverse Constitutional AI: Compressing Preferences into Principles

1 code implementation2 Jun 2024 Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier, Samuel Albanie, Robert Mullins

In constitutional AI, a set of principles (or constitution) is used to provide feedback and fine-tune AI models.

Chatbot Language Modelling +1

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

no code implementations16 May 2024 Charles Raude, K R Prajwal, Liliane Momeni, Hannah Bull, Samuel Albanie, Andrew Zisserman, Gül Varol

To this end, we introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in a joint embedding space between signed language and spoken language text.

Retrieval Sign Language Recognition +1

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

1 code implementation4 Apr 2024 Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H. S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation.

Benchmarking Image Generation +1

A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

no code implementations29 Feb 2024 Andreea-Maria Oncescu, João F. Henriques, Andrew Zisserman, Samuel Albanie, A. Sophia Koepke

Furthermore, we show that using the same prompts, we can successfully employ LLMs to improve the retrieval on EpicSounds, compared to using the original audio class labels of the dataset.

Retrieval

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

1 code implementation29 Feb 2024 Ameya Prabhu, Vishaal Udandarao, Philip Torr, Matthias Bethge, Adel Bibi, Samuel Albanie

However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies.

Benchmarking

InstructVideo: Instructing Video Diffusion Models with Human Feedback

1 code implementation CVPR 2024 Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni

To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning.

Video Generation

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

1 code implementation24 Nov 2023 Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie

Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response.

Disaster Response

Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models

1 code implementation12 Oct 2023 Vishaal Udandarao, Max F. Burg, Samuel Albanie, Matthias Bethge

This finding points to a blind spot in current frontier VLMs: they excel in recognizing semantic content but fail to acquire an understanding of visual data-types through scaling.

Simple Baselines for Interactive Video Retrieval with Questions and Answers

1 code implementation ICCV 2023 Kaiqu Liang, Samuel Albanie

To date, the majority of video retrieval systems have been optimized for a "single-shot" scenario in which the user submits a query in isolation, ignoring previous interactions with the system.

Question Answering Retrieval +1

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

3 code implementations ICCV 2023 Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao

In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data.

 Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Graph Generation Human-Object Interaction Detection +6

arXiVeri: Automatic table verification with GPT

1 code implementation13 Jun 2023 Gyungin Shin, Weidi Xie, Samuel Albanie

In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources.

GPT4GEO: How a Language Model Sees the World's Geography

no code implementations30 May 2023 Jonathan Roberts, Timo Lüddecke, Sowmen Das, Kai Han, Samuel Albanie

Large language models (LLMs) have shown remarkable capabilities across a broad range of tasks involving question answering and the generation of coherent text and code.

Disaster Response Language Modelling +2

Zero-shot Unsupervised Transfer Instance Segmentation

1 code implementation27 Apr 2023 Gyungin Shin, Samuel Albanie, Weidi Xie

Segmentation is a core computer vision competency, with applications spanning a broad range of scientifically and economically valuable domains.

Instance Segmentation Segmentation +1

SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models

no code implementations23 Apr 2023 Jonathan Roberts, Kai Han, Samuel Albanie

In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN.

Classification Diversity +1

Can GPT-4 Perform Neural Architecture Search?

1 code implementation21 Apr 2023 Mingkai Zheng, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu, Samuel Albanie

We investigate the potential of GPT-4~\cite{gpt4} to perform Neural Architecture Search (NAS) -- the task of designing effective neural architectures.

Navigate Neural Architecture Search

Large Language Models are Few-shot Publication Scoopers

no code implementations2 Apr 2023 Samuel Albanie, Liliane Momeni, João F. Henriques

Driven by recent advances AI, we passengers are entering a golden age of scientific discovery.

scientific discovery

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation15 Mar 2023 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which involves extra supervisions to the intermediate features of a neural network, was widely used in image classification in the early deep learning era since it significantly reduces the training difficulty and eases the optimization like avoiding gradient vanish over the vanilla training.

Image Classification object-detection +2

Moment Detection in Long Tutorial Videos

1 code implementation ICCV 2023 Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie, Yang Liu, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin, Trung Bui

To study this problem, we propose the first dataset of untrimmed, long-form tutorial videos for the task of Moment Detection called the Behance Moment Detection (BMD) dataset.

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

2 code implementations ICCV 2023 Vishaal Udandarao, Ankush Gupta, Samuel Albanie

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models.

Retrieval Zero-Shot Learning

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

1 code implementation16 Nov 2022 K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman

Through extensive evaluations, we verify our method for automatic annotation and our model architecture.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

7 code implementations9 Nov 2022 BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Decoder Language Modelling +1

NamedMask: Distilling Segmenters from Complementary Foundation Models

1 code implementation22 Sep 2022 Gyungin Shin, Weidi Xie, Samuel Albanie

Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images.

Data Augmentation Object +1

RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection

3 code implementations5 Sep 2022 Hangjie Yuan, Jianwen Jiang, Samuel Albanie, Tao Feng, Ziyuan Huang, Dong Ni, Mingqian Tang

The task of Human-Object Interaction (HOI) detection targets fine-grained visual parsing of humans interacting with their environment, enabling a broad range of applications.

Human-Object Interaction Detection Relation +1

Automatic dense annotation of large-vocabulary sign language videos

no code implementations4 Aug 2022 Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman

Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data.

ReCo: Retrieve and Co-segment for Zero-shot Transfer

2 code implementations14 Jun 2022 Gyungin Shin, Weidi Xie, Samuel Albanie

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment.

Retrieval Segmentation +1

Scaling up sign spotting through sign language dictionaries

no code implementations9 May 2022 Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is $\textit{sign spotting}$ - given a video of an isolated sign, our task is to identify $\textit{whether}$ and $\textit{where}$ it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

A 23 MW data centre is all you need

no code implementations31 Mar 2022 Samuel Albanie, Dylan Campbell, João F. Henriques

The field of machine learning has achieved striking progress in recent years, witnessing breakthrough results on language modelling, protein folding and nitpickingly fine-grained dog breed classification.

Board Games Language Modelling +1

Unsupervised Salient Object Detection with Spectral Cluster Voting

1 code implementation23 Mar 2022 Gyungin Shin, Samuel Albanie, Weidi Xie

In this paper, we tackle the challenging task of unsupervised salient object detection (SOD) by leveraging spectral clustering on self-supervised features.

Clustering Object +5

Sign Language Video Retrieval with Free-Form Textual Queries

no code implementations CVPR 2022 Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol

Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology.

Retrieval Sentence +2

Cross Modal Retrieval with Querybank Normalisation

1 code implementation CVPR 2022 Simion-Vlad Bogolin, Ioana Croitoru, Hailin Jin, Yang Liu, Samuel Albanie

In this work we first show that, despite their effectiveness, state-of-the-art joint embeddings suffer significantly from the longstanding "hubness problem" in which a small number of gallery embeddings form the nearest neighbours of many queries.

Cross-Modal Retrieval Metric Learning +3

Audio Retrieval with Natural Language Queries: A Benchmark Study

1 code implementation17 Dec 2021 A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

AudioCaps Audio captioning +4

BBC-Oxford British Sign Language Dataset

no code implementations5 Nov 2021 Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman

In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) dataset, a large-scale video collection of British Sign Language (BSL).

Sign Language Translation Translation

Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval

no code implementations CVPR 2021 Yang Liu, Qingchao Chen, Samuel Albanie

In this paper, we study the task of visual-text retrieval in the highly practical setting in which labelled visual data with paired text descriptions are available in one domain (the "source"), but only unlabelled visual data (without text descriptions) are available in the domain of interest (the "target").

Inductive Bias Text Retrieval

TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval

1 code implementation ICCV 2021 Ioana Croitoru, Simion-Vlad Bogolin, Marius Leordeanu, Hailin Jin, Andrew Zisserman, Samuel Albanie, Yang Liu

In recent years, considerable progress on the task of text-video retrieval has been achieved by leveraging large-scale pretraining on visual and audio datasets to construct powerful video encoders.

Retrieval Video Retrieval

All you need are a few pixels: semantic segmentation with PixelPick

2 code implementations13 Apr 2021 Gyungin Shin, Weidi Xie, Samuel Albanie

A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training.

Active Learning Segmentation +1

On the Origin of Species of Self-Supervised Learning

no code implementations31 Mar 2021 Samuel Albanie, Erika Lu, Joao F. Henriques

In the quiet backwaters of cs. CV, cs. LG and stat. ML, a cornucopia of new learning systems is emerging from a primordial soup of mathematics-learning systems with no need for external supervision.

Self-Supervised Learning

Read and Attend: Temporal Localisation in Sign Language Videos

no code implementations CVPR 2021 Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.

Sign Language Recognition

Quantum Self-Supervised Learning

2 code implementations26 Mar 2021 Ben Jaderberg, Lewis W. Anderson, Weidi Xie, Samuel Albanie, Martin Kiffner, Dieter Jaksch

The resurgence of self-supervised learning, whereby a deep learning model generates its own supervisory signal from the data, promises a scalable way to tackle the dramatically increasing size of real-world data sets without human annotation.

Self-Supervised Learning

Sign language segmentation with temporal convolutional networks

1 code implementation25 Nov 2020 Katrin Renz, Nicolaj C. Stache, Samuel Albanie, Gül Varol

The objective of this work is to determine the location of temporal boundaries between signs in continuous sign language videos.

Watch, read and lookup: learning to spot signs from multiple supervisors

1 code implementation8 Oct 2020 Liliane Momeni, Gül Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

The focus of this work is sign spotting - given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video.

Multiple Instance Learning

Seeing wake words: Audio-visual Keyword Spotting

1 code implementation2 Sep 2020 Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.

Lip Reading Visual Keyword Spotting

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

1 code implementation ECCV 2020 Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman

Recent progress in fine-grained gesture and action classification, and machine translation, point to the possibility of automated sign language recognition becoming a reality.

Action Classification Keyword Spotting +2

Iterative Averaging in the Quest for Best Test Error

no code implementations2 Mar 2020 Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen Roberts

We analyse and explain the increased generalisation performance of iterate averaging using a Gaussian process perturbation model between the true and batch risk surface on the high dimensional quadratic.

Diversity Image Classification

Disentangled Speech Embeddings using Cross-modal Self-supervision

no code implementations20 Feb 2020 Arsha Nagrani, Joon Son Chung, Samuel Albanie, Andrew Zisserman

The objective of this paper is to learn representations of speaker identity without access to manually annotated data.

Self-Supervised Learning Speaker Recognition

Unsupervised Learning of Landmarks by Descriptor Vector Exchange

1 code implementation ICCV 2019 James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi

Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision.

Object Unsupervised Facial Landmark Detection

Use What You Have: Video Retrieval Using Representations From Collaborative Experts

3 code implementations31 Jul 2019 Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman

The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge.

Natural Language Queries Retrieval +2

Deep Industrial Espionage

no code implementations1 Apr 2019 Samuel Albanie, James Thewlis, Sebastien Ehrhardt, Joao Henriques

The theory of deep learning is now considered largely solved, and is well understood by researchers and influencers alike.

Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

9 code implementations NeurIPS 2018 Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi

We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently-introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes to the CNN feature activation statistics.

Emotion Recognition in Speech using Cross-Modal Transfer in the Wild

no code implementations16 Aug 2018 Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

We make the following contributions: (i) we develop a strong teacher network for facial emotion recognition that achieves the state of the art on a standard benchmark; (ii) we use the teacher to train a student, tabula rasa, to learn representations (embeddings) for speech emotion recognition without access to labelled audio data; and (iii) we show that the speech emotion embedding can be used for speech emotion recognition on external benchmark datasets.

Facial Emotion Recognition Facial Expression Recognition (FER) +1

Small steps and giant leaps: Minimal Newton solvers for Deep Learning

6 code implementations ICLR 2019 João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration.

PyTorch CurveBall - A second-order optimizer for deep networks

1 code implementation21 May 2018 João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

We propose a fast second-order method that can be used as a drop-in replacementfor current deep learning solvers.

Substitute Teacher Networks: Learning with Almost No Supervision

1 code implementation1 Apr 2018 Samuel Albanie, James Thewlis, Joao F. Henriques

Learning through experience is time-consuming, inefficient and often bad for your cortisol levels.

Seeing Voices and Hearing Faces: Cross-modal biometric matching

no code implementations CVPR 2018 Arsha Nagrani, Samuel Albanie, Andrew Zisserman

We make the following contributions: (i) we introduce CNN architectures for both binary and multi-way cross-modal face and audio matching, (ii) we compare dynamic testing (where video information is available, but the audio is not from the same video) with static testing (where only a single still image is available), and (iii) we use human testing as a baseline to calibrate the difficulty of the task.

Face Recognition Speaker Identification

Squeeze-and-Excitation Networks

82 code implementations CVPR 2018 Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu

Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2. 251%, surpassing the winning entry of 2016 by a relative improvement of ~25%.

Image Classification Object Detection

Stopping GAN Violence: Generative Unadversarial Networks

1 code implementation7 Mar 2017 Samuel Albanie, Sébastien Ehrhardt, João F. Henriques

While the costs of human violence have attracted a great deal of attention from the research community, the effects of the network-on-network (NoN) violence popularised by Generative Adversarial Networks have yet to be addressed.

Unknowable Manipulators: Social Network Curator Algorithms

no code implementations17 Jan 2017 Samuel Albanie, Hillary Shakespeare, Tom Gunter

For a social networking service to acquire and retain users, it must find ways to keep them engaged.

Decision Making

Learning Grimaces by Watching TV

no code implementations7 Oct 2016 Samuel Albanie, Andrea Vedaldi

As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos.

Emotion Recognition Face Verification +2

Cannot find the paper you are looking for? You can Submit a new open access paper.