Here, we present a large-scale empirical study of active learning (AL) techniques for BERT-based classification, covering a diverse set of AL strategies and datasets.
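As a minimal sketch of one common AL strategy such studies evaluate, the snippet below implements pool-based uncertainty sampling with a BERT classifier: score each unlabeled example by prediction entropy and pick the most uncertain ones for annotation. The model name, batch size of one, and selection size are illustrative choices, not the paper's exact setup.

```python
# Uncertainty sampling (prediction entropy) over an unlabeled pool,
# using a BERT sequence classifier. Illustrative sketch only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
model.eval()

def select_batch(unlabeled_texts, k=50):
    """Return the k pool examples the model is least confident about."""
    scores = []
    with torch.no_grad():
        for text in unlabeled_texts:
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
            entropy = -(probs * probs.log()).sum().item()  # higher = less sure
            scores.append(entropy)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [unlabeled_texts[i] for i in ranked[:k]]
```

The selected examples would then be labeled by a human and added to the training set before the next AL iteration.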
1 code implementation • 2 Aug 2022 • Eyal Shnarch, Alon Halfon, Ariel Gera, Marina Danilevsky, Yannis Katsis, Leshem Choshen, Martin Santillan Cooper, Dina Epelboim, Zheng Zhang, Dakuo Wang, Lucy Yip, Liat Ein-Dor, Lena Dankin, Ilya Shnayderman, Ranit Aharonov, Yunyao Li, Naftali Liberman, Philip Levin Slesarev, Gwilym Newton, Shila Ofek-Koifman, Noam Slonim, Yoav Katz
Text classification is useful in many real-world scenarios and can save end users a great deal of time.
We observe that different heuristics are associated with summaries written from different perspectives, and exploit these heuristics to create weakly labeled data for intermediate training of the models before fine-tuning them on the scarce human-annotated summaries.
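To make the weak-labeling idea concrete, here is a toy illustration under an assumed heuristic (the lead sentence serves as a pseudo-summary of the rest of the document); the heuristic and field names are hypothetical stand-ins, not the paper's actual ones.

```python
# Toy heuristic weak labeling for summarization: treat the first
# sentence of a document as a pseudo-summary of the remainder.
# Heuristic and field names are hypothetical, for illustration only.
def weak_label(documents):
    pairs = []
    for doc in documents:
        sentences = doc.split(". ")
        if len(sentences) < 3:
            continue  # too short to split into source/summary
        summary = sentences[0]              # heuristic: lead sentence
        source = ". ".join(sentences[1:])
        pairs.append({"source": source, "summary": summary})
    return pairs
```

Such weakly labeled pairs can drive an intermediate fine-tuning stage before training on the small human-annotated set.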
In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce.
Data exploration is an important step of every data science and machine learning project, including those involving textual data.
In such low-resource scenarios, we suggest performing an unsupervised classification task prior to fine-tuning on the target task.
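One plausible instantiation of an unsupervised intermediate task, sketched below under assumptions: cluster the unlabeled corpus and use the cluster IDs as pseudo-labels for an interim classification stage. K-means over TF-IDF features is just one reasonable choice here; the paper's exact clustering setup may differ.

```python
# Clustering-based pseudo-labeling: cluster unlabeled texts and use
# cluster IDs as targets for an intermediate classification task
# before fine-tuning on the real (scarce) labels. Sketch only.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def pseudo_label(texts, n_clusters=50, seed=0):
    vectors = TfidfVectorizer(max_features=20000).fit_transform(texts)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(vectors)
    return list(zip(texts, labels))  # (text, pseudo-label) pairs
```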
Approaching new data can be quite daunting: you do not know how your categories of interest are realized in it, there is typically no labeled data at hand, and the performance of domain adaptation methods is often unsatisfactory.
no code implementations • 25 Nov 2019 • Liat Ein-Dor, Eyal Shnarch, Lena Dankin, Alon Halfon, Benjamin Sznajder, Ariel Gera, Carlos Alzate, Martin Gleize, Leshem Choshen, Yufang Hou, Yonatan Bilu, Ranit Aharonov, Noam Slonim
One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic.
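A simple baseline for such topic-focused retrieval is to rank corpus sentences by embedding similarity to the topic, as in the hedged sketch below; the model name is illustrative and an actual argument-retrieval pipeline would involve more than raw similarity.

```python
# Baseline topic-focused retrieval: rank sentences by cosine
# similarity of sentence embeddings to the topic. Illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(topic, sentences, k=10):
    topic_emb = model.encode(topic, convert_to_tensor=True)
    sent_embs = model.encode(sentences, convert_to_tensor=True)
    scores = util.cos_sim(topic_emb, sent_embs).squeeze(0)
    top = scores.topk(min(k, len(sentences)))
    return [(sentences[i], scores[i].item()) for i in top.indices.tolist()]
```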
With the advancement in argument detection, we suggest paying more attention to the challenging task of identifying the more convincing arguments.
We propose a methodology to blend high-quality but scarce strongly labeled data with noisy but abundant weakly labeled data during the training of neural networks.
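One plausible blending schedule, sketched under assumptions: every epoch trains on all strongly labeled examples plus a shrinking random slice of the weak pool, so the noisy weak labels dominate early and fade out later. The linear annealing scheme and the `train_one_epoch` callback are illustrative, not the paper's exact recipe.

```python
# Hedged sketch of blending strong and weak labeled data: anneal the
# weak pool out across epochs while always keeping the strong data.
import random

def blended_epochs(strong, weak, n_epochs, train_one_epoch, seed=0):
    rng = random.Random(seed)
    for epoch in range(n_epochs):
        keep = 1.0 - epoch / max(n_epochs - 1, 1)  # 1.0 -> 0.0 over epochs
        weak_slice = rng.sample(weak, int(len(weak) * keep))
        batch = strong + weak_slice
        rng.shuffle(batch)
        train_one_epoch(batch)  # user-supplied training step (hypothetical)
```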
A human observer may notice the following underlying common structure, or pattern: [someone][argue/suggest/state][that][topic term][sentiment term].
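A pattern like this lends itself to a simple lexico-syntactic matcher for weak labeling. The sketch below is a toy version: the verb list and sentiment lexicon are tiny illustrative stand-ins for the real resources such a system would use.

```python
# Toy matcher for the [someone][argue/suggest/state][that][topic term]
# [sentiment term] pattern. Lexicons are illustrative stand-ins.
import re

CLAIM_VERBS = r"(?:argued?|argues|suggest(?:ed|s)?|stated?|states)"
SENTIMENT = {"good", "bad", "harmful", "beneficial", "dangerous"}

def matches_pattern(sentence, topic_term):
    pattern = rf"\b\w+\s+{CLAIM_VERBS}\s+that\b.*\b{re.escape(topic_term)}\b"
    if not re.search(pattern, sentence, flags=re.IGNORECASE):
        return False
    # require a sentiment term somewhere after the topic mention
    tail = sentence.lower().split(topic_term.lower(), 1)[1]
    return any(term in tail for term in SENTIMENT)

print(matches_pattern(
    "Critics argued that gambling is harmful to society.", "gambling"))  # True
```

Sentences matched this way can be weakly labeled as argumentative and blended with the scarce strongly labeled data, as described above.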