Search Results for author: Yusuf Aytar

Found 36 papers, 12 papers with code

Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

no code implementations 13 Jun 2024 Ziyi Wu, Yulia Rubanova, Rishabh Kabra, Drew A. Hudson, Igor Gilitschenski, Yusuf Aytar, Sjoerd van Steenkiste, Kelsey R. Allen, Thomas Kipf

By fine-tuning a pre-trained text-to-image diffusion model with this information, our approach enables fine-grained 3D pose and placement control of individual objects in a scene.


FlexCap: Generating Rich, Localized, and Flexible Captions in Images

no code implementations 18 Mar 2024 Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar

The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions.

Attribute, Dense Captioning, +8

Learning from One Continuous Video Stream

no code implementations CVPR 2024 João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling.

Data Augmentation, Future prediction

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

no code implementations 30 Aug 2023 Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz

For robots to be useful outside labs and specialized factories, we need a way to teach them new useful behaviors quickly.

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

no code implementations 13 Apr 2023 Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar

We demonstrate that appropriate placement of our parameter-efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning, without changing the original representation and thus preserving the original capabilities of the pretrained model.

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations 7 Nov 2022 Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation, Point Tracking

Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies

no code implementations ICLR 2022 Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, Raia Hadsell

We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model.

Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation

no code implementations ICLR 2022 Todor Davchev, Oleg Sushkov, Jean-Baptiste Regli, Stefan Schaal, Yusuf Aytar, Markus Wulfmeier, Jon Scholz

In this work, we extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations.

Continuous Control, Reinforcement Learning (RL)

Learning rich touch representations through cross-modal self-supervision

1 code implementation 21 Jan 2021 Martina Zambelli, Yusuf Aytar, Francesco Visin, Yuxiang Zhou, Raia Hadsell

The sense of touch is fundamental in several manipulation tasks, but it is rarely used in robot manipulation.

Self-Supervised Learning, Robotics

Offline Learning from Demonstrations and Unlabeled Experience

no code implementations 27 Nov 2020 Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations.

Continuous Control, Imitation Learning

Large-scale multilingual audio visual dubbing

no code implementations 6 Nov 2020 Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Yu Zhang, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, Nando de Freitas

The visual content is translated by synthesizing lip movements for the speaker to match the translated audio, creating a seamless audiovisual experience in the target language.


Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation

no code implementations 21 Oct 2019 Rae Jeong, Yusuf Aytar, David Khosid, Yuxiang Zhou, Jackie Kay, Thomas Lampe, Konstantinos Bousmalis, Francesco Nori

In this work, we learn a latent state representation implicitly with deep reinforcement learning in simulation, and then adapt it to the real domain using unlabeled real robot data.

Domain Adaptation, reinforcement-learning, +1

Visual Imitation with a Minimal Adversary

no code implementations ICLR 2019 Scott Reed, Yusuf Aytar, Ziyu Wang, Tom Paine, Aäron van den Oord, Tobias Pfaff, Sergio Gomez, Alexander Novikov, David Budden, Oriol Vinyals

The proposed agent can solve a challenging robot manipulation task of block stacking from only video demonstrations and sparse reward, in which the non-imitating agents fail to learn completely.

Imitation Learning, Robot Manipulation

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

no code implementations ICLR 2019 Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden, Nando de Freitas

MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators.

Playing hard exploration games by watching YouTube

1 code implementation NeurIPS 2018 Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator.

Montezuma's Revenge

Exploiting Convolution Filter Patterns for Transfer Learning

no code implementations 23 Aug 2017 Mehmet Aygün, Yusuf Aytar, Hazim Kemal Ekenel

In this paper, we introduce a new regularization technique for transfer learning.

Transfer Learning

See, Hear, and Read: Deep Aligned Representations

1 code implementation 3 Jun 2017 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We capitalize on large amounts of readily available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language.

Cross-Modal Retrieval, Representation Learning, +1

Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media

no code implementations 9 Mar 2017 Enes Kocabey, Mustafa Camurcu, Ferda Ofli, Yusuf Aytar, Javier Marin, Antonio Torralba, Ingmar Weber

A person's weight status can have profound implications on their life, ranging from mental health, to longevity, to financial income.

Body Mass Index (BMI) Prediction

Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health

no code implementations 21 Feb 2017 Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba

Studying how food is perceived in relation to what it actually is typically involves a laboratory setup.


Cross-Modal Scene Networks

no code implementations 27 Oct 2016 Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.


SoundNet: Learning Sound Representations from Unlabeled Video

6 code implementations NeurIPS 2016 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.

General Classification

How Transferable are CNN-based Features for Age and Gender Classification?

no code implementations 1 Oct 2016 Gökhan Özbulak, Yusuf Aytar, Hazim Kemal Ekenel

The domain-specific VGG-Face CNN model was found to be more useful and to perform better on both age and gender classification tasks than a generic AlexNet-like model, which shows that transferring from a closer domain is more useful.

Age And Gender Classification, Classification, +3

Learning Aligned Cross-Modal Representations from Weakly Aligned Data

no code implementations CVPR 2016 Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.


Immediate, Scalable Object Category Detection

no code implementations CVPR 2014 Yusuf Aytar, Andrew Zisserman

The objective of this work is object category detection in large scale image datasets in the manner of Video Google — an object category is specified by a HOG classifier template, and retrieval is immediate at run time.

Object Retrieval
