no code implementations • EMNLP (BlackboxNLP) 2021 • Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Zayd Hammoudeh, Daniel Lowd, Sameer Singh
Adversarial attacks crafted against NLP models are increasingly becoming practical threats.
1 code implementation • 21 Oct 2022 • Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, Jonathan Brophy, Sameer Singh, Daniel Lowd
In addition to the primary tasks of detecting and labeling attacks, TCAB (Text Classification Attack Benchmark) can also be used for attack localization, attack target labeling, and attack characterization.
1 code implementation • 23 May 2022 • Jonathan Brophy, Daniel Lowd
We also find that IBUG can achieve improved probabilistic performance by using different base GBRT models, and can more flexibly model the posterior distribution of a prediction than competing methods.
1 code implementation • 30 Apr 2022 • Jonathan Brophy, Zayd Hammoudeh, Daniel Lowd
To better understand GBDT predictions and improve these models more generally, we adapt recent, popular influence-estimation methods designed for deep learning models to GBDTs.
no code implementations • 21 Jan 2022 • Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Sameer Singh, Daniel Lowd
The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed every year and many of them available in standard toolkits, such as TextAttack and OpenAttack.
1 code implementation • 11 Sep 2020 • Jonathan Brophy, Daniel Lowd
The weights in the kernel expansion of the surrogate model are used to define the global or local importance of each training example.
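The snippet describes reading training-example importance off the weights of a kernel-expansion surrogate. Below is a minimal sketch of that idea. It assumes a surrogate of the form f(x) = Σ_i α_i k(x_i, x) with an RBF kernel. It takes |α_i| as the hypothetical global importance and α_i k(x_i, x) as the hypothetical local importance for a test point x. The kernel, the made-up weights, and the helper names are illustrative assumptions, not the paper's exact definitions or code.

```python
import math
from typing import List, Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """RBF similarity exp(-gamma * ||a - b||^2); the kernel choice is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def global_importance(alphas: Sequence[float]) -> List[float]:
    """Hypothetical global importance: magnitude of each training example's weight."""
    return [abs(a) for a in alphas]


def local_importance(x, train_X, alphas, gamma: float = 1.0) -> List[float]:
    """Hypothetical local importance: each example's additive contribution alpha_i * k(x_i, x)."""
    return [a * rbf_kernel(xi, x, gamma) for xi, a in zip(train_X, alphas)]


def surrogate_predict(x, train_X, alphas, gamma: float = 1.0) -> float:
    """Surrogate output f(x) = sum_i alpha_i * k(x_i, x), i.e. the sum of local contributions."""
    return sum(local_importance(x, train_X, alphas, gamma))


# Usage with made-up data and weights (for illustration only).
train_X = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
alphas = [0.5, -1.2, 2.0]
x = [1.0, 1.0]

contribs = local_importance(x, train_X, alphas)
local_rank = sorted(range(len(contribs)), key=lambda i: abs(contribs[i]), reverse=True)
global_rank = sorted(range(len(alphas)), key=lambda i: global_importance(alphas)[i], reverse=True)
print(local_rank)   # [1, 0, 2]
print(global_rank)  # [2, 1, 0]
```

The two rankings differ in the example. Example 2 carries the largest weight overall, but it is far from x, so it contributes almost nothing to this particular prediction.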
3 code implementations • 11 Sep 2020 • Jonathan Brophy, Daniel Lowd
The upper levels of DaRE trees use random nodes, which choose split attributes and thresholds uniformly at random.
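Below is a minimal sketch of a random node as the snippet describes it, where both the split attribute and the threshold are chosen uniformly at random. The following details are assumptions for illustration, not taken from the DaRE implementation. The threshold is drawn from the attribute's observed [min, max) range. The attribute is drawn uniformly among attributes that are not constant at the node. The helper names `random_split` and `partition` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_split(X: Sequence[Sequence[float]], rng: random.Random) -> Optional[Tuple[int, float]]:
    """Choose a split attribute uniformly at random, then a threshold uniformly at
    random within that attribute's observed range (both range and filtering are assumptions)."""
    n_features = len(X[0])
    # Assumption: only attributes with at least two distinct values can separate examples.
    candidates = [j for j in range(n_features)
                  if min(r[j] for r in X) < max(r[j] for r in X)]
    if not candidates:
        return None  # nothing to split on at this node
    attr = rng.choice(candidates)
    lo = min(r[attr] for r in X)
    hi = max(r[attr] for r in X)
    threshold = rng.uniform(lo, hi)  # uniform draw, ignores the labels entirely
    return attr, threshold


def partition(X: Sequence[Sequence[float]], attr: int, threshold: float) -> Tuple[List[int], List[int]]:
    """Send example indices left if x[attr] <= threshold, otherwise right."""
    left = [i for i, r in enumerate(X) if r[attr] <= threshold]
    right = [i for i, r in enumerate(X) if r[attr] > threshold]
    return left, right


# Usage: attribute 1 is constant, so only attribute 0 can be chosen.
rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
attr, thr = random_split(X, rng)
left, right = partition(X, attr, thr)
print(attr, left, right)
```

Unlike a greedy split, this choice never scores candidate splits against the labels, which is what makes the node "random."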
no code implementations • 14 Jan 2020 • Jonathan Brophy, Daniel Lowd
In this paper, we present Extended Group-based Graphical models for Spam (EGGS), a general-purpose method for classifying spam in online social networks.