1 code implementation • 9 Feb 2024 • Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang
We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks.
1 code implementation • ICCV 2023 • Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma
We introduce DualMind, a generalist agent designed to tackle various decision-making tasks while addressing challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning.
1 code implementation • 22 Jun 2023 • Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.
no code implementations • 24 Jan 2023 • Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor
Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels.
1 code implementation • 18 Nov 2022 • Jiachen Lei, Shuang Ma, Zhongjie Ba, Sai Vemprala, Ashish Kapoor, Kui Ren
In this report, we present our approach and empirical results of applying masked autoencoders to two egocentric video understanding tasks of the Ego4D Challenge 2022, namely Object State Change Classification and PNR Temporal Localization.
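The random patch masking at the core of a masked autoencoder can be sketched as follows. This is a minimal illustration of the standard masking step, not the authors' implementation; the `random_masking` helper name and the 75% mask ratio are assumptions for illustration:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patch tokens, as in masked autoencoders.

    patches: (num_patches, dim) array of patch embeddings.
    Returns the kept (visible) patches, their indices, and a boolean
    mask where True marks patches hidden from the encoder.
    """
    rng = np.random.default_rng(rng)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False  # False = visible, True = masked
    return patches[keep_idx], keep_idx, mask
```

The encoder then sees only the visible patches, and a lightweight decoder is trained to reconstruct the masked ones.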
no code implementations • 22 Sep 2022 • Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor
Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge.
2 code implementations • 4 Aug 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, Rogerio Bonatti
Natural language is one of the most intuitive ways to express human intent.
no code implementations • 25 Mar 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti
However, using language to express intent towards robots is seldom easy, since most current language interfaces require rigid templates with a static set of action targets and commands.
no code implementations • NeurIPS 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local fine-grained spatio-temporal information (e.g., localization).
no code implementations • 4 Sep 2021 • Mingzhi Yu, Diane Litman, Shuang Ma, Jian Wu
We then use the model to perform similarity measurement in a corpus-based entrainment analysis.
no code implementations • 25 Jun 2021 • Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor
The ability to perform causal and counterfactual reasoning is a central property of human intelligence.
1 code implementation • 7 Apr 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local fine-grained spatio-temporal information (e.g., localization).
no code implementations • 1 Jan 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
Contrastive self-supervised learning has delivered impressive results in many audio-visual recognition tasks.
1 code implementation • ICLR 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing the lower bound on the mutual information (MI) between different views of an instance.
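The lower bound on mutual information referenced here is commonly estimated with the InfoNCE objective. A minimal NumPy sketch of that standard estimator follows; this is an illustration, not the paper's training code, and the `info_nce_loss` name and temperature value are assumptions:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss over two views of a batch of instances.

    z_a, z_b: (batch, dim) embeddings where row i of each array comes
    from the same instance. Minimizing this loss maximizes a lower
    bound on the mutual information between the two views.
    """
    # L2-normalize each view's embeddings
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives are the matching pairs on the diagonal
    return -np.mean(np.diag(log_probs))
```

Each instance's other-view embedding serves as the positive, while the rest of the batch serves as negatives, so the loss drops as matched views become more similar than mismatched ones.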
no code implementations • 25 Oct 2019 • Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song
We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion.
1 code implementation • ICCV 2019 • Shuang Ma, Daniel McDuff, Yale Song
We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).
no code implementations • 9 Jul 2019 • Shuang Ma, Daniel McDuff, Yale Song
Generative adversarial networks have led to significant advances in cross-modal/domain translation.
1 code implementation • NeurIPS 2019 • Daniel McDuff, Shuang Ma, Yale Song, Ashish Kapoor
Models that are learned from real-world data are often biased because the data used to train them is biased.
no code implementations • ICLR 2019 • Shuang Ma, Daniel McDuff, Yale Song
The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud.
no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei
Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences can consequently be discovered by attending to the learned instances.
no code implementations • CVPR 2017 • Shuang Ma, Jing Liu, Chang Wen Chen
However, the performance of these deep CNN methods is often compromised by the constraint that the neural network accepts only fixed-size inputs.
Ranked #2 on Aesthetics Quality Assessment on AVA