1 code implementation • 9 Feb 2024 • Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang
We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks.
1 code implementation • ICCV 2023 • Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma
We introduce DualMind, a generalist agent designed to tackle various decision-making tasks while addressing challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning.
1 code implementation • 22 Jun 2023 • Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.
no code implementations • 24 Jan 2023 • Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor
Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels.
1 code implementation • 18 Nov 2022 • Jiachen Lei, Shuang Ma, Zhongjie Ba, Sai Vemprala, Ashish Kapoor, Kui Ren
In this report, we present our approach and empirical results of applying masked autoencoders to two egocentric video understanding tasks of the Ego4D Challenge 2022, namely Object State Change Classification and PNR Temporal Localization.
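The random patch masking at the core of a masked autoencoder can be sketched as follows. This is a minimal illustration of the standard masking step, not the authors' implementation; the `random_masking` helper name and the 75% mask ratio are assumptions for illustration:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patch tokens, as in masked autoencoders.

    patches: (num_patches, dim) array of patch embeddings.
    Returns the kept (visible) patches, their indices, and a boolean
    mask where True marks patches hidden from the encoder.
    """
    rng = np.random.default_rng(rng)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False  # False = visible, True = masked
    return patches[keep_idx], keep_idx, mask
```

The encoder then sees only the visible patches, and a lightweight decoder is trained to reconstruct the masked ones.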
no code implementations • 22 Sep 2022 • Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor
Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge.
2 code implementations • 4 Aug 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, Rogerio Bonatti
Natural language is one of the most intuitive ways to express human intent.
no code implementations • 25 Mar 2022 • Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti
However, using language to express intent towards robots is seldom easy, since most current language interfaces require rigid templates with a static set of action targets and commands.
no code implementations • NeurIPS 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local fine-grained spatio-temporal information (e.g., localization).
no code implementations • 4 Sep 2021 • Mingzhi Yu, Diane Litman, Shuang Ma, Jian Wu
We then use the model to perform similarity measurement in a corpus-based entrainment analysis.
no code implementations • 25 Jun 2021 • Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor
The ability to perform causal and counterfactual reasoning is a central property of human intelligence.
1 code implementation • 7 Apr 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local fine-grained spatio-temporal information (e.g., localization).
no code implementations • 1 Jan 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
Contrastive self-supervised learning has delivered impressive results in many audio-visual recognition tasks.
1 code implementation • ICLR 2021 • Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing the lower bound on the mutual information (MI) between different views of an instance.
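The lower bound on mutual information referenced here is commonly estimated with the InfoNCE objective. A minimal NumPy sketch of that standard estimator follows; this is an illustration, not the paper's training code, and the `info_nce_loss` name and temperature value are assumptions:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss over two views of a batch of instances.

    z_a, z_b: (batch, dim) embeddings where row i of each array comes
    from the same instance. Minimizing this loss maximizes a lower
    bound on the mutual information between the two views.
    """
    # L2-normalize each view's embeddings
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives are the matching pairs on the diagonal
    return -np.mean(np.diag(log_probs))
```

Each instance's other-view embedding serves as the positive, while the rest of the batch serves as negatives, so the loss drops as matched views become more similar than mismatched ones.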
no code implementations • 25 Oct 2019 • Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song
We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion.
1 code implementation • ICCV 2019 • Shuang Ma, Daniel McDuff, Yale Song
We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).
no code implementations • 9 Jul 2019 • Shuang Ma, Daniel McDuff, Yale Song
Generative adversarial networks have led to significant advances in cross-modal/domain translation.
1 code implementation • NeurIPS 2019 • Daniel McDuff, Shuang Ma, Yale Song, Ashish Kapoor
Models that are learned from real-world data are often biased because the data used to train them is biased.
no code implementations • ICLR 2019 • Shuang Ma, Daniel McDuff, Yale Song
The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud.
no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei
Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences can consequently be discovered by attending to the learned instances.
no code implementations • CVPR 2017 • Shuang Ma, Jing Liu, Chang Wen Chen
However, the performance of these deep CNN methods is often compromised by the constraint that the neural network accepts only fixed-size inputs.
Ranked #2 on Aesthetics Quality Assessment on AVA