SST-2
25 papers with code • 0 benchmarks • 0 datasets
Benchmarks
These leaderboards are used to track progress in SST-2
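SST-2 (the binary split of the Stanford Sentiment Treebank) is scored by accuracy under GLUE. A minimal sketch of that scoring, where the example predictions and gold labels below are hypothetical:

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels (the GLUE metric for SST-2)."""
    assert len(preds) == len(labels)
    return sum(p == g for p, g in zip(preds, labels)) / len(labels)

# Hypothetical model outputs vs. gold SST-2-style labels (0 = negative, 1 = positive)
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 0]
print(f"accuracy = {accuracy(pred, gold):.2f}")  # 4 of 5 correct -> 0.80
```

Leaderboard entries for SST-2 report exactly this number on the held-out evaluation set.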
Most implemented papers
Leveraging QA Datasets to Improve Generative Data Augmentation
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Spiking neural networks (SNNs) have attracted great attention due to their high efficiency and accuracy.
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
We develop statistically rigorous methods to address this, and after accounting for pretraining and finetuning noise, we find that BERT-Large is worse than BERT-Mini on at least 1-4% of instances across MNLI, SST-2, and QQP, compared to the overall accuracy improvement of 2-10%.
STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available.
General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
We match or exceed the scores of ELMo for all tasks of the GLUE benchmark except for the sentiment analysis task SST-2 and the linguistic acceptability task CoLA.
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task
Despite their success, modern language models are fragile.
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks.
A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis
Our evaluation results on the single-task polarity prediction show that our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in few-shot and full-shot settings.
Improving the Adversarial Robustness of NLP Models by Information Bottleneck
Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive, but can be easily manipulated by adversaries to fool NLP models.
ELECTRA is a Zero-Shot Learner, Too
Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large has an average of about 8.4% and 13.7% improvement on all 15 tasks.