no code implementations • 27 Nov 2024 • Seungyeon Kim, Wheesung Lee, Sung-Ho Ahn, Do-Eun Lee, Tae-Rin Lee
Accurate prediction of cerebral blood flow is essential for the diagnosis and treatment of cerebrovascular diseases.
no code implementations • 28 Oct 2024 • Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster
In this work, we revisit "layer tying" as a form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller "Recursive Transformers" that share parameters across layers, with minimal loss of performance.
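A minimal sketch of the layer-tying idea, assuming hypothetical module and hyperparameter names (this is an illustration of parameter sharing across depth, not the paper's Recursive Transformer implementation): a single Transformer block is reused at every "layer" by looping over it.

```python
import torch
import torch.nn as nn

class TiedDepthEncoder(nn.Module):
    """Sketch of layer tying: one shared block applied num_loops times."""
    def __init__(self, d_model=256, nhead=4, num_loops=6):
        super().__init__()
        # A single block's parameters stand in for all "layers".
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_loops = num_loops

    def forward(self, x):
        for _ in range(self.num_loops):
            x = self.shared_block(x)  # same weights reused at every depth
        return x

x = torch.randn(2, 16, 256)            # (batch, sequence, features)
print(TiedDepthEncoder()(x).shape)      # torch.Size([2, 16, 256])
```

Compared with an untied six-layer encoder, the parameter count here is roughly that of a single layer, which is the memory saving that motivates converting existing models into this recursive form.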
no code implementations • 24 Oct 2024 • Ankit Singh Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, Vladimir Feinberg, Seungyeon Kim, Hrayr Harutyunyan, Nikunj Saunshi, Zachary Nado, Rakesh Shivanna, Sashank J. Reddi, Aditya Krishna Menon, Rohan Anil, Sanjiv Kumar
In particular, this paradigm relies on a small language model (SLM) to both (1) provide soft labels as additional training supervision, and (2) select a small subset of valuable ("informative" and "hard") training examples.
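A minimal sketch of the two roles described above, assuming generic classifier-style models and made-up hyperparameters (keep_frac, alpha); it is illustrative of the paradigm, not the paper's exact recipe: the small model's per-example losses pick the "hard" subset, and its softened predictions supplement the ground-truth labels when training the large model.

```python
import torch
import torch.nn.functional as F

def select_and_distill(slm, llm, inputs, labels, keep_frac=0.25, alpha=0.5):
    """Sketch: small model selects hard examples and supplies soft labels."""
    with torch.no_grad():
        slm_logits = slm(inputs)
        per_example_loss = F.cross_entropy(slm_logits, labels, reduction="none")
    # Keep the examples the small model finds hardest.
    k = max(1, int(keep_frac * len(labels)))
    hard_idx = per_example_loss.topk(k).indices

    llm_logits = llm(inputs[hard_idx])
    ce = F.cross_entropy(llm_logits, labels[hard_idx])          # hard labels
    kl = F.kl_div(F.log_softmax(llm_logits, dim=-1),            # soft labels
                  F.softmax(slm_logits[hard_idx], dim=-1),
                  reduction="batchmean")
    return alpha * ce + (1 - alpha) * kl
```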
no code implementations • 20 Aug 2024 • Ameya Godbole, Nicholas Monath, Seungyeon Kim, Ankit Singh Rawat, Andrew McCallum, Manzil Zaheer
In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge.
no code implementations • 29 Jul 2024 • Yonghyeon LEE, Byeongho Lee, Seungyeon Kim, Frank C. Park
Effective movement primitives should be capable of encoding and generating a rich repertoire of trajectories -- typically collected from human demonstrations -- conditioned on task-defining parameters such as vision or language inputs.
no code implementations • 29 May 2024 • Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar
Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in parallel verification mode.
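A minimal sketch of the cascade side of this comparison, assuming a confidence-threshold deferral rule and a made-up threshold value (not the paper's formulation): the small model answers when it is confident, and only "hard" inputs are routed to the large model.

```python
import torch
import torch.nn.functional as F

def cascade_predict(small_model, large_model, x, threshold=0.9):
    """Sketch of a confidence-based deferral rule for a two-model cascade."""
    small_probs = F.softmax(small_model(x), dim=-1)
    confidence, small_pred = small_probs.max(dim=-1)
    defer = confidence < threshold        # "hard" inputs go to the large model
    preds = small_pred.clone()
    if defer.any():
        preds[defer] = large_model(x[defer]).argmax(dim=-1)
    return preds
```

Speculative decoding differs in that the large model is always consulted, but in parallel, to verify tokens the small model has already drafted, rather than being invoked only on deferred inputs.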
no code implementations • 28 Jan 2023 • Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar
Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps.
no code implementations • 27 Jan 2023 • Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
no code implementations • 14 Aug 2022 • Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar
In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data.
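A minimal sketch of the idea behind using a pretrained generative model during compact-model training, with an assumed `generator.sample` interface and standard distillation-style losses; it illustrates the general pattern, not the TGT framework's actual procedure.

```python
import torch
import torch.nn.functional as F

def teacher_guided_step(generator, teacher, student, optimizer, batch_size=32):
    """Sketch: sample synthetic inputs from a pretrained generator, label them
    with the teacher, and fit the compact student to those labels."""
    with torch.no_grad():
        synthetic_x = generator.sample(batch_size)            # assumed interface
        teacher_probs = F.softmax(teacher(synthetic_x), dim=-1)
    student_log_probs = F.log_softmax(student(synthetic_x), dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```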
no code implementations • 29 Sep 2021 • Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query.
no code implementations • 29 Sep 2021 • Yaodong Yu, Heinrich Jiang, Dara Bahri, Hossein Mobahi, Seungyeon Kim, Ankit Singh Rawat, Andreas Veit, Yi Ma
Concretely, we show that larger models and larger datasets need to be simultaneously leveraged to improve OOD performance.
no code implementations • 29 Sep 2021 • Yonghyeon LEE, Seungyeon Kim, Jinwon Choi, Frank C. Park
The only requirement on the part of the user is the choice of a meaningful underlying probability distribution, which is more intuitive and natural to make than what is required in existing ad hoc formulations.
no code implementations • 19 May 2021 • Seungyeon Kim, Daniel Glasner, Srikumar Ramalingam, Cho-Jui Hsieh, Kishore Papineni, Sanjiv Kumar
It is generally believed that robust training of extremely large networks is critical to their success in real-world applications.
no code implementations • AISTATS 2021 • Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar
Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.
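For reference, the standard classification distillation loss that the entry alludes to (a background sketch with an illustrative temperature and mixing weight, not the ranking-oriented formulation this paper develops):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Classic distillation: cross-entropy on hard labels mixed with KL
    divergence to the teacher's temperature-softened predictions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft
```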
no code implementations • 5 Feb 2021 • Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar
By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction.
no code implementations • EMNLP 2020 • Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.
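For reference, standard label smoothing (a background sketch with an illustrative epsilon, not this paper's contribution): the one-hot target is mixed with the uniform distribution, so the true class receives 1 - epsilon + epsilon / num_classes and every other class receives epsilon / num_classes.

```python
import torch

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Mix the one-hot target with the uniform distribution over classes."""
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

print(smooth_labels(torch.tensor([2]), num_classes=4, epsilon=0.1))
# tensor([[0.0250, 0.0250, 0.9250, 0.0250]])
```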
no code implementations • ICLR 2021 • Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh
In this paper, we establish a novel set of evaluation criteria for such feature based explanations by robustness analysis.
no code implementations • 21 May 2020 • Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar
In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.
no code implementations • NeurIPS 2020 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.
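A minimal sketch of the clipped SGD update mentioned above, assuming illustrative values for the learning rate and clipping threshold (not the paper's experimental settings): the full gradient is rescaled whenever its global norm exceeds the threshold, then a plain SGD step is taken.

```python
import torch

def clipped_sgd_step(params, lr=0.1, clip_norm=1.0):
    """Globally clipped SGD: rescale the gradient if its norm exceeds clip_norm."""
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr * float(scale))
```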
no code implementations • 25 Sep 2019 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.
no code implementations • NeurIPS 2012 • Joonseok Lee, Mingxuan Sun, Seungyeon Kim, Guy Lebanon
Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings.
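A minimal sketch of the model-based collaborative filtering setup this entry refers to, using generic low-rank matrix factorization with made-up hyperparameters (a common baseline, not this paper's method): user and item factors are fit on the observed ratings and their product predicts the missing ones.

```python
import numpy as np

def factorize_ratings(R, mask, rank=5, lr=0.01, reg=0.1, epochs=200, seed=0):
    """Low-rank factorization of a partially observed rating matrix R.
    mask is 1 where a rating is observed and 0 where it is missing."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(epochs):
        err = mask * (R - U @ V.T)        # error only on observed entries
        U += lr * (err @ V - reg * U)     # gradient steps on the squared error
        V += lr * (err.T @ U - reg * V)
    return U @ V.T                        # dense prediction fills missing ratings
```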
no code implementations • 8 Feb 2012 • Seungyeon Kim, Fuxin Li, Guy Lebanon, Irfan Essa
Sentiment analysis predicts the presence of positive or negative emotions in a text document.
no code implementations • 6 Mar 2010 • Seungyeon Kim, Guy Lebanon
Unlike static documents, version controlled documents are continuously edited by one or more authors.