no code implementations • 7 Nov 2023 • Fengqing Jiang, Zhangchen Xu, Luyao Niu, Boxin Wang, Jinyuan Jia, Bo Li, Radha Poovendran
Successful exploits of the identified vulnerabilities result in the users receiving responses tailored to the intent of a threat initiator.
1 code implementation • 11 Oct 2023 • Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro
After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction-tuned GPT on a wide range of zero-shot tasks.
no code implementations • NeurIPS 2023 • Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li
Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly.
no code implementations • 20 May 2023 • Boxin Wang, Yibo Jacky Zhang, Yuan Cao, Bo Li, H. Brendan McMahan, Sewoong Oh, Zheng Xu, Manzil Zaheer
We study (differentially) private federated learning (FL) of language models.
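The snippet above does not describe the training procedure. As background only, a common way to make federated averaging differentially private, in the style of DP-FedAvg (McMahan et al., 2018), is a server step that clips each client update to an L2 norm bound, sums the clipped updates, adds Gaussian noise scaled to that bound, and averages. The sketch below shows only that step. It is not necessarily this paper's method. The function name `dp_fedavg_aggregate` and its parameters are hypothetical, and no privacy accounting is included.

```python
import math
import random
from typing import List, Sequence


def dp_fedavg_aggregate(
    client_updates: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    seed: int = 0,
) -> List[float]:
    """Clip each client update, sum, add Gaussian noise, and average (hypothetical helper)."""
    rng = random.Random(seed)
    m = len(client_updates)
    dim = len(client_updates[0])
    total = [0.0] * dim
    for update in client_updates:
        norm = math.sqrt(sum(x * x for x in update))
        # Scale down any update whose L2 norm exceeds clip_norm; leave smaller ones unchanged.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for d in range(dim):
            total[d] += update[d] * scale
    # Noise std is proportional to the clipping bound (the per-client sensitivity of the sum).
    std = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, std)) / m for t in total]
```

With `noise_multiplier=0.0` the function reduces to averaging clipped updates, which makes the clipping behavior easy to inspect before noise is turned on.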
1 code implementation • 13 Apr 2023 • Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro
Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval?
no code implementations • 21 Jul 2022 • Wenda Chu, Chulin Xie, Boxin Wang, Linyi Li, Lang Yin, Arash Nourian, Han Zhao, Bo Li
However, due to the heterogeneous nature of local data, it is challenging to optimize or even define fairness of the trained global model for the agents.
1 code implementation • Findings (NAACL) 2022 • Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li
In particular, SemAttack optimizes the generated perturbations constrained to generic semantic spaces, including typo space, knowledge space (e.g., WordNet), contextualized semantic space (e.g., the embedding space of BERT clusterings), or the combination of these spaces.
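To illustrate what "perturbations constrained to a semantic space" means for the simplest case, the typo space, here is a minimal sketch under stated assumptions. Candidates for a token are limited to single adjacent-character swaps and single deletions, and the attack picks the candidate that most increases a caller-supplied objective. The candidate rules, the functions `typo_neighbors` and `best_perturbation`, and the `score` callback (standing in for an attack loss on a victim model) are all hypothetical. SemAttack's actual spaces and optimization are not reproduced here.

```python
from typing import Callable, List, Sequence, Set


def typo_neighbors(word: str) -> Set[str]:
    """Hypothetical typo space: adjacent-character swaps and single-character deletions."""
    candidates: Set[str] = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:  # swapping identical letters is not a perturbation
            candidates.add(swapped)
    if len(word) > 1:
        for i in range(len(word)):
            candidates.add(word[:i] + word[i + 1:])
    return candidates


def best_perturbation(
    tokens: Sequence[str],
    position: int,
    score: Callable[[List[str]], float],
) -> List[str]:
    """Replace tokens[position] with the typo candidate that maximizes score (greedy, one token)."""
    best_tokens = list(tokens)
    best_score = score(best_tokens)
    # Sorting makes the search order deterministic across runs.
    for cand in sorted(typo_neighbors(tokens[position])):
        trial = list(tokens)
        trial[position] = cand
        s = score(trial)
        if s > best_score:
            best_tokens, best_score = trial, s
    return best_tokens
```

Restricting the search to `typo_neighbors` is what keeps the perturbation inside the chosen space; swapping in a different candidate generator (for example, synonyms) changes the space without changing the search loop.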
1 code implementation • 8 Feb 2022 • Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro
In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models.
1 code implementation • 3 Feb 2022 • Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, Ce Zhang
As a result, the wider application of these techniques is currently limited by their scalability and flexibility: these techniques often do not scale to large datasets with modern deep neural networks, or cannot handle non-smooth loss functions such as the 0-1 loss.
1 code implementation • 4 Nov 2021 • Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li
In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
Ranked #1 on Adversarial Robustness on AdvGLUE
1 code implementation • Findings (EMNLP) 2021 • Wei Wang, Boxin Wang, Ning Shi, Jinfeng Li, Bingyu Zhu, Xiangyu Liu, Rong Zhang
Deep learning models exhibit a preference for statistical fitting over logical reasoning.
1 code implementation • 12 Jun 2021 • Ning Shi, Wei Wang, Boxin Wang, Jinfeng Li, Xiangyu Liu, Zhouhan Lin
Punctuation restoration is an important post-processing step in automatic speech recognition.
2 code implementations • 20 Mar 2021 • Boxin Wang, Fan Wu, Yunhui Long, Luka Rimanic, Ce Zhang, Bo Li
In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model DATALENS.
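One way to see how gradient sparsity can help a private generative model is to compress each teacher's gradient before aggregation: keep only its top-k coordinates by magnitude, reduce them to signs, then sum the votes, add Gaussian noise, and keep only coordinates whose noisy vote clears a threshold. The sketch below shows that idea under these assumptions. It is a generic illustration, not necessarily DataLens's exact mechanism. The functions `top_k_sign` and `aggregate_sparse_votes` and their parameters are hypothetical, and no privacy accounting is performed.

```python
import random
from typing import List, Sequence


def top_k_sign(grad: Sequence[float], k: int) -> List[int]:
    """Keep the k largest-magnitude coordinates as signs in {-1, 0, +1}; zero out the rest."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    out = [0] * len(grad)
    for i in top:
        out[i] = (grad[i] > 0) - (grad[i] < 0)
    return out


def aggregate_sparse_votes(
    teacher_grads: Sequence[Sequence[float]],
    k: int,
    sigma: float,
    threshold: float,
    seed: int = 0,
) -> List[int]:
    """Sum sparsified sign votes across teachers, add Gaussian noise, then threshold."""
    rng = random.Random(seed)
    dim = len(teacher_grads[0])
    votes = [0.0] * dim
    for g in teacher_grads:
        for d, v in enumerate(top_k_sign(g, k)):
            votes[d] += v
    out = []
    for v in votes:
        noisy = v + rng.gauss(0.0, sigma)
        # Coordinates without a strong noisy consensus are dropped (set to 0).
        out.append(0 if abs(noisy) < threshold else (1 if noisy > 0 else -1))
    return out
```

Because each teacher contributes at most k nonzero entries of magnitude 1, the influence of any single teacher on the vote vector is bounded, which is what makes adding calibrated noise meaningful.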
2 code implementations • ICLR 2021 • Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Ranked #3 on Natural Language Inference on ANLI test (using extra training data)
2 code implementations • 25 Jun 2020 • Kaizhao Liang, Jacky Y. Zhang, Boxin Wang, Zhuolin Yang, Oluwasanmi Koyejo, Bo Li
Knowledge transferability, or transfer learning, has been widely adopted to allow a pre-trained model in the source domain to be effectively adapted to downstream tasks in the target domain.
no code implementations • 7 Apr 2020 • Boxin Wang, Boyuan Pan, Xin Li, Bo Li
Recent advances in large-scale language representation models such as BERT have improved state-of-the-art performance in many NLP tasks.
2 code implementations • 14 Mar 2020 • Ning Shi, Boxin Wang, Wei Wang, Xiangyu Liu, Zhouhan Lin
Humans can systematically generalize to novel compositions of existing concepts.
1 code implementation • 28 Feb 2020 • Zhuolin Yang, Zhikuan Zhao, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlas, Ji Liu, Heng Guo, Ce Zhang, Bo Li
Recently, intensive algorithmic efforts have driven rapid improvements in certified robustness for complex ML models.
1 code implementation • 9 Feb 2020 • Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Jun Xiao, Bo Li
Our framework aims to address two unique challenges in financial portfolio management (PM): (1) data heterogeneity: the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty: the financial market is versatile and non-stationary.
3 code implementations • EMNLP 2020 • Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li
In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.
no code implementations • 25 Sep 2019 • Boxin Wang, Hengzhi Pei, Han Liu, Bo Li
In particular, we propose a tree-based autoencoder to encode discrete text data into a continuous vector space, upon which we optimize the adversarial perturbation.
3 code implementations • 22 Aug 2019 • Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song
The most surprising result is that for unweighted $K$NN classifiers and regressors, the exact Shapley values of all $N$ data points can be computed in $O(N\log N)$ time -- an exponential improvement in computational complexity!
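The $O(N\log N)$ claim above can be illustrated for a single test point with the known recursion for unweighted $K$NN classification. Sort the training points from nearest to farthest as $\alpha_1,\dots,\alpha_N$, and assume the utility of a subset is the fraction of its $K$ nearest points that share the test label. Then $s_{\alpha_N} = \mathbb{1}[y_{\alpha_N}=y_{\text{test}}]/N$ and $s_{\alpha_i} = s_{\alpha_{i+1}} + \frac{\mathbb{1}[y_{\alpha_i}=y_{\text{test}}]-\mathbb{1}[y_{\alpha_{i+1}}=y_{\text{test}}]}{K}\cdot\frac{\min(K,i)}{i}$. The sort costs $O(N\log N)$ and the recursion $O(N)$. The sketch below is a minimal illustration under that utility assumption, assuming no distance ties. The function names are hypothetical, and `brute_force_shapley` is included only to check the recursion on small inputs.

```python
from itertools import combinations
from math import factorial
from typing import List, Sequence


def knn_shapley(dists: Sequence[float], labels: Sequence[int], y_test: int, k: int) -> List[float]:
    """Exact Shapley values of training points for one test point under unweighted KNN.

    Assumed utility: U(S) = (1/k) * number of the k nearest points of S whose label == y_test.
    """
    n = len(dists)
    # alpha_1, ..., alpha_N: indices sorted from nearest to farthest (O(N log N)).
    order = sorted(range(n), key=lambda j: dists[j])
    match = [1.0 if labels[j] == y_test else 0.0 for j in order]
    by_rank = [0.0] * n
    # Base case: the farthest training point.
    by_rank[n - 1] = match[n - 1] / n
    # Recurse from the farthest point back to the nearest (O(N)).
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based rank of the current point
        by_rank[i] = by_rank[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank
    # Map values from rank order back to original training indices.
    values = [0.0] * n
    for r, j in enumerate(order):
        values[j] = by_rank[r]
    return values


def brute_force_shapley(dists, labels, y_test, k):
    """Exponential-time reference that enumerates every subset (small N only)."""
    n = len(dists)

    def utility(subset):
        nearest = sorted(subset, key=lambda j: dists[j])[:k]
        return sum(labels[j] == y_test for j in nearest) / k

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi += weight * (utility(subset + (i,)) - utility(subset))
        values.append(phi)
    return values
```

On a handful of points the two functions agree, and the values sum to the utility of the full training set, as the Shapley efficiency property requires.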
2 code implementations • NeurIPS 2021 • Yunhui Long, Boxin Wang, Zhuolin Yang, Bhavya Kailkhura, Aston Zhang, Carl A. Gunter, Bo Li
In particular, we train a student data generator with an ensemble of teacher discriminators and propose a novel private gradient aggregation mechanism to ensure differential privacy on all information that flows from teacher discriminators to the student generator.
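The teacher-to-student gradient aggregation described above can be illustrated, under simplifying assumptions, with a PATE-style noisy vote. Each teacher's gradient is quantized to signs, the votes for each coordinate are counted, Gaussian noise is added to each count, and the noisy winner is released to the student. This is a simplified illustration only. The paper's actual discretization details and privacy accounting are not reproduced, and `noisy_sign_vote` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def noisy_sign_vote(teacher_grads: Sequence[Sequence[float]], sigma: float, seed: int = 0) -> List[int]:
    """Per-coordinate noisy majority vote over sign-quantized teacher gradients."""
    rng = random.Random(seed)
    dim = len(teacher_grads[0])
    aggregated = []
    for d in range(dim):
        # Each teacher votes +1 if its gradient coordinate is positive, otherwise -1.
        pos = sum(1 for g in teacher_grads if g[d] > 0)
        neg = len(teacher_grads) - pos
        # Gaussian noise on each vote count hides any single teacher's contribution.
        noisy_pos = pos + rng.gauss(0.0, sigma)
        noisy_neg = neg + rng.gauss(0.0, sigma)
        aggregated.append(1 if noisy_pos >= noisy_neg else -1)
    return aggregated
```

Only the noisy vote outcome reaches the student, so the student never sees raw teacher gradients; setting `sigma=0.0` recovers a plain majority vote for debugging.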
1 code implementation • 27 Feb 2019 • Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos
In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory.
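For a general utility function, the Shapley value of a data point is its marginal contribution averaged over all orderings of the dataset. A standard way to approximate it is Monte Carlo permutation sampling, sketched below. This is the common baseline estimator, not necessarily one of the more efficient algorithms this paper proposes. The function `monte_carlo_shapley` and the `utility` callback (for example, validation accuracy of a model trained on the subset) are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List


def monte_carlo_shapley(
    n: int,
    utility: Callable[[FrozenSet[int]], float],
    num_permutations: int,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions over random permutations."""
    rng = random.Random(seed)
    values = [0.0] * n
    for _ in range(num_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        prefix: set = set()
        prev = utility(frozenset(prefix))
        # Add points one at a time in permutation order and credit each with its marginal gain.
        for i in perm:
            prefix.add(i)
            cur = utility(frozenset(prefix))
            values[i] += cur - prev
            prev = cur
    return [v / num_permutations for v in values]
```

For an additive utility every marginal contribution equals the point's own weight, so the estimate is exact regardless of the sampled permutations; for realistic utilities the error shrinks as `num_permutations` grows, at the cost of one utility evaluation per point per permutation.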