Search Results for author: Shuang Ma

Found 22 papers, 9 papers with code

Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training

1 code implementation • ICCV 2023 • Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma

We introduce DualMind, a generalist agent designed to tackle various decision-making tasks, addressing challenges posed by current methods such as overfitting behaviors and dependence on task-specific fine-tuning.

Decision Making

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

1 code implementation22 Jun 2023 Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang

Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.

Continuous Control • Contrastive Learning • +3

SMART: Self-supervised Multi-task pretrAining with contRol Transformers

no code implementations • 24 Jan 2023 • Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor

Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels.

Imitation Learning • Reinforcement Learning (RL)

Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022

1 code implementation • 18 Nov 2022 • Jiachen Lei, Shuang Ma, Zhongjie Ba, Sai Vemprala, Ashish Kapoor, Kui Ren

In this report, we present our approach and empirical results for applying masked autoencoders to two egocentric video understanding tasks of the Ego4D Challenge 2022, namely Object State Change Classification and PNR Temporal Localization.

Object State Change Classification • Temporal Localization • +1

PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training

no code implementations • 22 Sep 2022 • Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor

Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge.

LATTE: LAnguage Trajectory TransformEr

2 code implementations • 4 Aug 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, Rogerio Bonatti

Natural language is one of the most intuitive ways to express human intent.

Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers

no code implementations • 25 Mar 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti

However, it is seldom easy for humans to express their intent to robots through language, since most current language interfaces require rigid templates with a static set of action targets and commands.

Imitation Learning • Text Generation

Contrastive Learning of Global and Local Video Representations

no code implementations • NeurIPS 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local, fine-grained spatio-temporal information (e.g., localization).

Classification • Contrastive Learning • +4

A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations

no code implementations • 4 Sep 2021 • Mingzhi Yu, Diane Litman, Shuang Ma, Jian Wu

We then use the model to measure similarity in a corpus-based entrainment analysis.

Contrastive Learning of Global-Local Video Representations

1 code implementation • 7 Apr 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local, fine-grained spatio-temporal information (e.g., localization).

Classification • Contrastive Learning • +6

Active Contrastive Learning of Audio-Visual Video Representations

1 code implementation • ICLR 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing the lower bound on the mutual information (MI) between different views of an instance.

Contrastive Learning • Representation Learning • +1
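
Since the abstract above refers to maximizing a lower bound on the mutual information (MI) between views of an instance, the following is a minimal sketch of a standard InfoNCE-style objective that instantiates such a bound. The encoder outputs, in-batch negative scheme, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(audio_emb: torch.Tensor, visual_emb: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: a lower bound on the mutual information between two
    views of the same instance (here, audio and visual clip embeddings).

    Both inputs are (batch, dim); row i of each tensor is assumed to come
    from the same clip, forming the positive pair.
    """
    # Normalize so dot products are cosine similarities.
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)

    # Pairwise similarity matrix: diagonal entries are positives,
    # off-diagonal entries act as in-batch negatives.
    logits = a @ v.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)

    # Symmetric cross-entropy over audio->visual and visual->audio matching.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```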

Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

no code implementations • 25 Oct 2019 • Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song

We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion.

Emotion Classification • Style Transfer

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck

1 code implementation • ICCV 2019 • Shuang Ma, Daniel McDuff, Yale Song

We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).

Image Generation • Speech Synthesis

M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

no code implementations • 9 Jul 2019 • Shuang Ma, Daniel McDuff, Yale Song

Generative adversarial networks have led to significant advances in cross-modal/domain translation.

Dialogue Generation • Image Captioning • +5

Neural TTS Stylization with Adversarial and Collaborative Games

no code implementations • ICLR 2019 • Shuang Ma, Daniel McDuff, Yale Song

The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud.

Disentanglement • Style Transfer

DA-GAN: Instance-Level Image Translation by Deep Attention Generative Adversarial Networks

no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, through which instance-level correspondences can then be discovered by attending to the learned instances.

Data Augmentation • Deep Attention • +2

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials)

no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, through which instance-level correspondences can then be discovered by attending to the learned instance pairs.

Data Augmentation • Deep Attention • +2
