Search Results for author: Shuang Ma

Found 27 papers, 12 papers with code

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

no code implementations 2 Oct 2024 Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, Yiming Yang

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge.

Math

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

1 code implementation 8 Aug 2024 Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, ZiRui Wang, Ruoming Pang

Recent advancements in large language models (LLMs) have sparked growing research interest in tool-assisted LLMs that solve real-world challenges, which calls for comprehensive evaluation of tool-use capabilities.

Apple Intelligence Foundation Language Models

no code implementations 29 Jul 2024 Tom Gunter, ZiRui Wang, Chong Wang, Ruoming Pang, Aonan Zhang, BoWen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, Sam Wiseman, Syd Evans, Tao Lei, Vivek Rathod, Xiang Kong, Xianzhi Du, Yanghao Li, Yongqiang Wang, Yuan Gao, Zaid Ahmed, Zhaoyang Xu, Zhiyun Lu, Al Rashid, Albin Madappally Jose, Alec Doane, Alfredo Bencomo, Allison Vanderby, Andrew Hansen, Ankur Jain, Anupama Mann Anupama, Areeba Kamal, Bugu Wu, Carolina Brum, Charlie Maalouf, Chinguun Erdenebileg, Chris Dulhanty, Dominik Moritz, Doug Kang, Eduardo Jimenez, Evan Ladd, Fangping Shi, Felix Bai, Frank Chu, Fred Hohman, Hadas Kotek, Hannah Gillis Coleman, Jane Li, Jeffrey Bigham, Jeffery Cao, Jeff Lai, Jessica Cheung, Jiulong Shan, Joe Zhou, John Li, Jun Qin, Karanjeet Singh, Karla Vega, Kelvin Zou, Laura Heckman, Lauren Gardiner, Margit Bowler, Maria Cordell, Meng Cao, Nicole Hay, Nilesh Shahdadpuri, Otto Godwin, Pranay Dighe, Pushyami Rachapudi, Ramsey Tantawi, Roman Frigg, Sam Davarnia, Sanskruti Shah, Saptarshi Guha, Sasha Sirovica, Shen Ma, Shuang Ma, Simon Wang, Sulgi Kim, Suma Jayaram, Vaishaal Shankar, Varsha Paidi, Vivek Kumar, Xin Wang, Xin Zheng, Walker Cheng, Yael Shrager, Yang Ye, Yasu Tanaka, Yihao Guo, Yunsong Meng, Zhao Tang Luo, Zhi Ouyang, Alp Aygar, Alvin Wan, Andrew Walkingshaw, Andy Narayanan, Antonie Lin, Arsalan Farooq, Brent Ramerth, Colorado Reed, Chris Bartels, Chris Chaney, David Riazati, Eric Liang Yang, Erin Feldman, Gabriel Hochstrasser, Guillaume Seguin, Irina Belousova, Joris Pelemans, Karen Yang, Keivan Alizadeh Vahid, Liangliang Cao, Mahyar Najibi, Marco Zuliani, Max Horton, Minsik Cho, Nikhil Bhendawade, Patrick Dong, Piotr Maj, Pulkit Agrawal, Qi Shan, Qichen Fu, Regan Poston, Sam Xu, Shuangning Liu, Sushma Rao, Tashweena Heeramun, Thomas Merth, Uday Rayala, Victor Cui, Vivek Rangarajan Sridhar, Wencong Zhang, Wenqi Zhang, Wentao Wu, Xingyu Zhou, Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute.

Language Modeling

Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training

1 code implementation ICCV 2023 Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma

We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning.

All Decision Making

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

1 code implementation 22 Jun 2023 Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang

Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.

Continuous Control +4

SMART: Self-supervised Multi-task pretrAining with contRol Transformers

no code implementations 24 Jan 2023 Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor

Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels.

Imitation Learning Reinforcement Learning (RL) +1

Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022

1 code implementation 18 Nov 2022 Jiachen Lei, Shuang Ma, Zhongjie Ba, Sai Vemprala, Ashish Kapoor, Kui Ren

In this report, we present our approach and empirical results of applying masked autoencoders in two egocentric video understanding tasks, namely, Object State Change Classification and PNR Temporal Localization, of Ego4D Challenge 2022.

Object State Change Classification Temporal Localization +1

PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training

no code implementations 22 Sep 2022 Rogerio Bonatti, Sai Vemprala, Shuang Ma, Felipe Frujeri, Shuhang Chen, Ashish Kapoor

Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge.

Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers

no code implementations 25 Mar 2022 Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti

However, using language is seldom an easy task when humans need to express their intent towards robots, since most of the current language interfaces require rigid templates with a static set of action targets and commands.

Imitation Learning Text Generation

Contrastive Learning of Global and Local Video Representations

no code implementations NeurIPS 2021 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

In this work, we propose to learn video representations that generalize to both the tasks which require global semantic information (e.g., classification) and the tasks that require local fine-grained spatio-temporal information (e.g., localization).

Classification Contrastive Learning +4

A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations

no code implementations 4 Sep 2021 Mingzhi Yu, Diane Litman, Shuang Ma, Jian Wu

Then we use the model to measure similarity in a corpus-based entrainment analysis.

Contrastive Learning of Global-Local Video Representations

1 code implementation 7 Apr 2021 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

In this work, we propose to learn video representations that generalize to both the tasks which require global semantic information (e.g., classification) and the tasks that require local fine-grained spatio-temporal information (e.g., localization).

Classification Contrastive Learning +6

Active Contrastive Learning of Audio-Visual Video Representations

1 code implementation ICLR 2021 Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing the lower bound on the mutual information (MI) between different views of an instance.

Contrastive Learning Representation Learning +1

Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

no code implementations 25 Oct 2019 Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song

We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion.

Emotion Classification Style Transfer +2

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck

1 code implementation ICCV 2019 Shuang Ma, Daniel McDuff, Yale Song

We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text).

Image Generation Speech Synthesis

M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

no code implementations 9 Jul 2019 Shuang Ma, Daniel McDuff, Yale Song

Generative adversarial networks have led to significant advances in cross-modal/domain translation.

Dialogue Generation Image Captioning +7

Neural TTS Stylization with Adversarial and Collaborative Games

no code implementations ICLR 2019 Shuang Ma, Daniel McDuff, Yale Song

The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud.

Decoder Disentanglement +1

DA-GAN: Instance-Level Image Translation by Deep Attention Generative Adversarial Networks

no code implementations CVPR 2018 Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instances.

Data Augmentation Deep Attention +2

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials)

no code implementations CVPR 2018 Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instance pairs.

Data Augmentation Deep Attention +2
