Search Results for author: Manli Shu

Found 25 papers, 14 papers with code

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

1 code implementation7 Dec 2024 Zixian Ma, JianGuo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese

While open-source multi-modal language models perform well on simple question answering tasks, they often fail on complex questions that require multiple capabilities, such as fine-grained recognition, visual grounding, and reasoning, and that demand multi-step solutions.

Depth Estimation Mathematical Reasoning +4

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

no code implementations12 Nov 2024 Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, Etash Guha, Silvio Savarese, Ludwig Schmidt, Yejin Choi, Caiming Xiong, ran Xu

We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text.

Descriptive Image Captioning

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

no code implementations21 Oct 2024 Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, ran Xu, Caiming Xiong, Juan Carlos Niebles

We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames.

Language Modeling Language Modelling +2

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

1 code implementation17 Jun 2024 Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, ran Xu, Yejin Choi, Ludwig Schmidt

Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs).

Coercing LLMs to do and reveal (almost) anything

1 code implementation21 Feb 2024 Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein

It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements.

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

1 code implementation5 Feb 2024 Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang

Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, but their versatility raises security concerns.

Data Augmentation Data Poisoning +3

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

2 code implementations NeurIPS 2023 Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more.

Benchmarking object-detection +2

On the Exploitability of Instruction Tuning

1 code implementation NeurIPS 2023 Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior.

Data Poisoning Instruction Following

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

1 code implementation23 Jun 2023 Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative.

Chatbot Language Modeling +1

On the Reliability of Watermarks for Large Language Models

1 code implementation7 Jun 2023 John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.

Hierarchical Point Attention for Indoor 3D Object Detection

no code implementations6 Jan 2023 Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Caiming Xiong, Tom Goldstein, Juan Carlos Niebles, ran Xu

By plugging our proposed modules into the state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.

3D Object Detection Object +1

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

1 code implementation15 Sep 2022 Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.

Image Classification Zero-shot Generalization

Gradient-Free Adversarial Training Against Image Corruption for Learning-based Steering

no code implementations NeurIPS 2021 Yu Shen, Laura Zheng, Manli Shu, Weizi Li, Tom Goldstein, Ming Lin

We introduce a simple yet effective framework for improving the robustness of learning algorithms against image corruptions for autonomous driving.

Autonomous Driving Self-Driving Cars

Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability

1 code implementation3 Aug 2021 Roman Levin, Manli Shu, Eitan Borgnia, Furong Huang, Micah Goldblum, Tom Goldstein

We find that samples which cause similar parameters to malfunction are semantically similar.

Improving Robustness of Learning-based Autonomous Steering Using Adversarial Images

no code implementations26 Feb 2021 Yu Shen, Laura Zheng, Manli Shu, Weizi Li, Tom Goldstein, Ming C. Lin

For safety of autonomous driving, vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments.

Autonomous Driving Data Augmentation +1

Driving through the Lens: Improving Generalization of Learning-based Steering using Simulated Adversarial Examples

no code implementations1 Jan 2021 Yu Shen, Laura Yu Zheng, Manli Shu, Weizi Li, Tom Goldstein, Ming Lin

To ensure the wide adoption and safety of autonomous driving, the vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments.

Autonomous Driving Data Augmentation +2

Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer

no code implementations14 Oct 2020 Chen Zhu, Zheng Xu, Ali Shafahi, Manli Shu, Amin Ghiasi, Tom Goldstein

Further, we demonstrate that the compact structure and corresponding initialization from the Lottery Ticket Hypothesis can also help in data-free training.

Data Free Quantization Transfer Learning

Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization

no code implementations28 Sep 2020 Manli Shu, Zuxuan Wu, Micah Goldblum, Tom Goldstein

Adversarial training is the industry standard for producing models that are robust to small adversarial perturbations.

Semantic Segmentation

Encoding Robustness to Image Style via Adversarial Feature Perturbations

1 code implementation NeurIPS 2021 Manli Shu, Zuxuan Wu, Micah Goldblum, Tom Goldstein

We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce models that are robust to various unseen distributional shifts.

Data Augmentation Semantic Segmentation

Cannot find the paper you are looking for? You can Submit a new open access paper.