no code implementations • 7 Mar 2024 • Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar
Large language models (LLMs) have shown strong results on a range of applications, including regression and scoring tasks.
no code implementations • 13 Oct 2023 • Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a trade-off.
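For squared-error losses, the decomposition this entry refers to is the standard textbook identity below; the notation (a model f_D trained on a random dataset D, evaluated at input x against a noiseless target y) is illustrative and not taken from the paper.

```latex
% Standard bias-variance decomposition of the expected squared error of a
% model f_D trained on a random dataset D, at input x with noiseless target y.
\mathbb{E}_D\big[(f_D(x) - y)^2\big]
  \;=\; \underbrace{\big(\mathbb{E}_D[f_D(x)] - y\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}_D\Big[\big(f_D(x) - \mathbb{E}_D[f_D(x)]\big)^2\Big]}_{\text{variance}}
```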
no code implementations • 9 Oct 2023 • Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.
no code implementations • 9 Nov 2022 • Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar
When the provided context is relevant, the model should rely on it; by contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.
no code implementations • 1 Nov 2022 • Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar
Pretrained large language models (LLMs) are general-purpose problem solvers applicable to a diverse set of tasks with prompts.
no code implementations • 13 Jun 2022 • Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon
We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also yield a Pareto improvement in the trade-off between overall and worst-class performance compared to other baseline methods.
no code implementations • 28 Oct 2021 • Wittawat Jitkrittum, Michal Lukasik, Ananda Theertha Suresh, Felix Yu, Gang Wang
In this paper, we study training and inference of neural networks under the MPC setup.
1 code implementation • 13 Oct 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
Pairwise dot-product attention allows Transformers to exchange information between tokens in an input-dependent way and is key to their success across diverse applications in language and vision.
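As a reminder of the mechanism this entry builds on, here is a minimal NumPy sketch of single-head, unbatched pairwise dot-product attention; the function name and shapes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of pairwise dot-product attention (illustrative only).
import numpy as np

def dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d). Returns an array of shape (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq_len, seq_len) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # input-dependent mixing of token values
```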
no code implementations • 19 Jun 2021 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.
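For context, a minimal NumPy sketch of a temperature-scaled distillation loss; the function names, the default temperature, and the use of cross-entropy between softened distributions are assumptions for illustration, not the paper's exact objective.

```python
# Illustrative distillation loss: cross-entropy between temperature-softened
# teacher and student predictions (a common formulation, not the paper's code).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Both logit arrays have shape (batch, num_classes); returns a scalar loss."""
    p_teacher = softmax(teacher_logits / temperature)
    log_p_student = np.log(softmax(student_logits / temperature) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean()
```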
no code implementations • 16 Jun 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.
no code implementations • EMNLP 2020 • Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.
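Label smoothing typically mixes the one-hot target with a uniform distribution over classes; below is a minimal NumPy sketch, where the helper name and the mixing coefficient alpha are illustrative, not taken from the paper.

```python
# Illustrative label smoothing: blend one-hot targets with a uniform distribution.
import numpy as np

def smooth_labels(labels, num_classes, alpha=0.1):
    """labels: int array of shape (batch,). Returns (batch, num_classes) soft targets."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - alpha) * one_hot + alpha / num_classes
```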
2 code implementations • 3 Jul 2020 • Aleksandar Bojchevski, Johannes Gasteiger, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, Stephan Günnemann
Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks.
1 code implementation • EMNLP 2020 • Michal Lukasik, Boris Dadachev, Gonçalo Simões, Kishore Papineni
Document and discourse segmentation are two fundamental NLP tasks that break text up into constituents and are commonly used to support downstream tasks such as information retrieval and text summarization.
no code implementations • ICML 2020 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.
Ranked #13 on Learning with noisy labels on CIFAR-10N-Random3
no code implementations • EMNLP 2018 • Michal Lukasik, Richard Zens
Background research is an essential part of document writing.
no code implementations • 6 Dec 2017 • Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein
We show that sequential classifiers that exploit discourse properties of social media conversations, while using only local features, outperform non-sequential classifiers.
no code implementations • COLING 2016 • Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik
Rumour stance classification, the task of determining whether each tweet in a collection discussing a rumour supports, denies, questions or simply comments on the rumour, has attracted substantial interest.
no code implementations • 7 Sep 2016 • Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Arkaitz Zubiaga, Maria Liakata, Rob Procter
Social media tend to be rife with rumours while new reports are released piecemeal during breaking news.
no code implementations • EMNLP 2015 • Michal Lukasik, Trevor Cohn, Kalina Bontcheva
Social media is a rich source of rumours and corresponding community reactions.