no code implementations • 11 Mar 2025 • Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Iyer, Tianlong Chen, Huan Liu, Chen-Yu Lee, Tomas Pfister
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization.
no code implementations • 10 Mar 2025 • Fan Yin, Zifeng Wang, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee, Hamid Palangi, Tomas Pfister
To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans.
no code implementations • 22 Feb 2025 • Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, Hamid Palangi
Recent agent frameworks and inference-time algorithms often struggle with complex planning problems due to limitations in verifying generated plans or reasoning, and due to the varying complexity of instances within a single task.
no code implementations • 12 Feb 2025 • Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Zifeng Wang, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar
Significant advances in the capabilities of large language models have been accompanied by significant increases in their inference costs.
no code implementations • 6 Feb 2025 • Shangbin Feng, Wenxuan Ding, Alisa Liu, Zifeng Wang, Weijia Shi, Yike Wang, Zejiang Shen, Xiaochuang Han, Hunter Lang, Chen-Yu Lee, Tomas Pfister, Yejin Choi, Yulia Tsvetkov
This position paper argues that in many realistic (i.e., complex, contextualized, subjective) scenarios, one LLM is not enough to produce a reliable output.
no code implementations • 6 Feb 2025 • Shangbin Feng, Zifeng Wang, Palash Goyal, Yike Wang, Weijia Shi, Huang Xia, Hamid Palangi, Luke Zettlemoyer, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister
We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights.
1 code implementation • 12 Dec 2024 • Yi-Lun Lee, Chen-Yu Lee, Wei-Chen Chiu, Yi-Hsuan Tsai
Specifically, non-important tokens are masked based on the attention weights and the correlation across different modalities, significantly reducing the storage size of each exemplar and consequently allowing more exemplars to be saved under the same memory buffer.
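A minimal sketch of the token-masking idea above, assuming per-token importance scores have already been derived from the attention weights (the exact scoring and cross-modal correlation terms are described in the paper and not reproduced here):

```python
import torch

def compress_exemplar(tokens, importance, keep_ratio=0.25):
    """Keep only the most important tokens of a stored exemplar.

    tokens:     (seq_len, dim) token embeddings of one exemplar
    importance: (seq_len,) per-token score, e.g. attention received from
                the other modality (assumed precomputed)
    """
    k = max(1, int(tokens.size(0) * keep_ratio))
    keep_idx = torch.topk(importance, k).indices.sort().values  # keep original token order
    return tokens[keep_idx], keep_idx  # store k tokens plus their positions
```

With keep_ratio = 0.25, each compressed exemplar occupies roughly a quarter of its original size, so about four times as many exemplars fit in the same memory buffer.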
no code implementations • 29 Nov 2024 • Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
Reverse thinking plays a crucial role in human reasoning.
no code implementations • 15 Oct 2024 • Shangbin Feng, Zifeng Wang, Yike Wang, Sayna Ebrahimi, Hamid Palangi, Lesly Miculicich, Achin Kulshrestha, Nathalie Rauschmayr, Yejin Choi, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister
Extensive experiments demonstrate that Model Swarms could flexibly adapt LLM experts to a single task, multi-task domains, reward models, as well as diverse human interests, improving over 12 model composition baselines by up to 21.0% across tasks and contexts.
no code implementations • 15 Oct 2024 • Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei LI, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution.
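A hedged sketch of interleaved student/teacher decoding for generating on-policy distillation data, assuming Hugging Face-style causal LMs; the paper's exact acceptance rule and filtering details are not reproduced here:

```python
import torch

@torch.no_grad()
def interleaved_generate(student, teacher, prompt_ids, max_new_tokens=64, top_k=25):
    """The student proposes each next token; if the proposal falls outside the
    teacher's top-k, the teacher's sample is used instead (illustrative only)."""
    ids = prompt_ids
    for _ in range(max_new_tokens):
        s_logits = student(ids).logits[:, -1, :]
        t_logits = teacher(ids).logits[:, -1, :]
        proposal = torch.multinomial(torch.softmax(s_logits, -1), 1)
        teacher_topk = torch.topk(t_logits, top_k, dim=-1).indices
        accepted = (teacher_topk == proposal).any(dim=-1, keepdim=True)
        fallback = torch.multinomial(torch.softmax(t_logits, -1), 1)
        next_id = torch.where(accepted, proposal, fallback)
        ids = torch.cat([ids, next_id], dim=-1)
    return ids
```

Sequences generated this way track the student's own inference-time distribution while the teacher keeps token quality in check.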
1 code implementation • 7 Oct 2024 • Si-An Chen, Lesly Miculicich, Julian Martin Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister
Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables.
no code implementations • 3 Aug 2024 • Yanfei Chen, Jinsung Yoon, Devendra Singh Sachan, Qingze Wang, Vincent Cohen-Addad, Mohammadhossein Bateni, Chen-Yu Lee, Tomas Pfister
Recent advances in large language models (LLMs) have enabled autonomous agents with complex reasoning and task-fulfillment capabilities using a wide range of tools.
no code implementations • 11 Jul 2024 • Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses.
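A minimal retrieve-then-generate sketch of the generic RAG pattern described above; `retrieve` and `llm` are hypothetical helpers, and the specific enhancement proposed in the paper is not shown:

```python
def rag_answer(question, retrieve, llm, k=5):
    """Generic RAG loop: fetch passages from an external corpus, then condition
    the generator on them.

    retrieve(question, k) -> list of passage strings
    llm(prompt)           -> generated answer string
    """
    passages = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the passages below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```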
no code implementations • 23 Jun 2024 • Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input.
no code implementations • 8 Jun 2024 • I-Hung Hsu, Zifeng Wang, Long T. Le, Lesly Miculicich, Nanyun Peng, Chen-Yu Lee, Tomas Pfister
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources.
no code implementations • 8 Apr 2024 • Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister
To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs.
1 code implementation • 16 Feb 2024 • Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku
In this new knowledge acquisition and extraction setting, we find the intriguing fact that LLMs can accurately answer questions about the first sentence of a document, but they struggle to extract information described in the middle or at the end of the documents used for fine-tuning.
2 code implementations • 9 Jan 2024 • Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Ranked #4 on Table-based Fact Verification on TabFact
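A hedged sketch of the iterative pattern behind the Chain-of-Table entry above: a language model plans one table operation at a time, the operation is executed on the current table, and the transformed table is carried forward as the intermediate "thought". `plan_op`, `execute_op`, and `answer` are hypothetical helpers; the paper's actual operation set and prompting details differ.

```python
def chain_of_table(question, table, plan_op, execute_op, answer, max_steps=5):
    """Iteratively transform the table until the model decides to answer."""
    chain = [table]                              # the chain of intermediate tables
    for _ in range(max_steps):
        op = plan_op(question, chain)            # e.g. "select_rows", "add_column", or "stop"
        if op == "stop":
            break
        chain.append(execute_op(op, chain[-1]))  # apply the planned operation
    return answer(question, chain[-1])           # answer from the final intermediate table
```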
no code implementations • 19 Sep 2023 • Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Zifeng Wang, Jiaqi Mu, Hao Zhang, Chen-Yu Lee, Nan Hua
The main obstacles to adopting LLMs for this task include the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism to localize the predicted entities within the document.
no code implementations • 1 Aug 2023 • Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage.
no code implementations • 4 May 2023 • Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister
In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss.
1 code implementation • 3 May 2023 • Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of the available data on a benchmark, whereas standard finetuning of the same T5 model struggles to match it even when using 100% of the dataset.
2 code implementations • CVPR 2023 • Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when modalities are missing during either training or testing in real-world situations; and 2) when the computational resources are not available to finetune heavy transformer models.
1 code implementation • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister
Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.
Ranked #1 on Zero-shot Image Retrieval on ImageNet-R
no code implementations • 12 Jan 2023 • Ruoxi Sun, Chun-Liang Li, Sercan O. Arik, Michael W. Dusenberry, Chen-Yu Lee, Tomas Pfister
Accurate estimation of output quantiles is crucial in many use cases, where modeling the range of possible outcomes is desired.
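For background, the standard pinball (quantile) loss that underlies most quantile-estimation objectives; this is context for the entry above, not the method it proposes:

```python
import torch

def pinball_loss(pred, target, q):
    """Pinball loss for training a model to predict the q-th quantile:
    under-predictions are penalized with weight q, over-predictions with 1 - q."""
    err = target - pred
    return torch.mean(torch.maximum(q * err, (q - 1) * err))
```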
no code implementations • 15 Nov 2022 • Zilong Wang, Yichao Zhou, Wei Wei, Chen-Yu Lee, Sandeep Tata
Understanding visually-rich business documents to extract structured data and automate business workflows has been receiving attention in both academia and industry.
no code implementations • 14 Nov 2022 • Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister
Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities.
no code implementations • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister
In experiments, we show that this simple technique improves zero-shot image recognition accuracy and robustness to image-level distribution shift.
3 code implementations • 10 Apr 2022 • Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting.
no code implementations • ACL 2022 • Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister
Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks.
no code implementations • 10 Jan 2022 • Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister
Such a requirement is impractical in situations where the data labeling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information.
2 code implementations • 21 Dec 2021 • Kihyuk Sohn, Jinsung Yoon, Chun-Liang Li, Chen-Yu Lee, Tomas Pfister
We define a distance function between images, each of which is represented as a bag of embeddings, by the Euclidean distance between weighted averaged embeddings.
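A direct sketch of the distance just described; how the per-embedding weights are obtained is part of the method and is assumed given here:

```python
import numpy as np

def bag_distance(emb_a, w_a, emb_b, w_b):
    """Distance between two images, each a bag of embeddings, computed as the
    Euclidean distance between their weighted averaged embeddings.

    emb_*: (n_i, d) embeddings of image i; w_*: (n_i,) non-negative weights.
    """
    mean_a = (w_a[:, None] * emb_a).sum(axis=0) / w_a.sum()
    mean_b = (w_b[:, None] * emb_b).sum(axis=0) / w_b.sum()
    return np.linalg.norm(mean_a - mean_b)
```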
5 code implementations • CVPR 2022 • Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge.
no code implementations • 29 Sep 2021 • Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister
Such a requirement is impractical in situations where the data labeling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information.
no code implementations • 29 Sep 2021 • Justin Lazarow, Kihyuk Sohn, Chun-Liang Li, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister
While remarkable progress in imbalanced supervised learning has been made recently, less attention has been given to the setting of imbalanced semi-supervised learning (SSL), where not only is little labeled data provided, but the underlying data distribution can also be severely imbalanced.
no code implementations • ACL 2021 • Chen-Yu Lee, Chun-Liang Li, Chu Wang, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister
Natural reading orders of words are crucial for information extraction from form-like documents.
no code implementations • 11 Jun 2021 • Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Chen-Yu Lee, Tomas Pfister
We demonstrate our method on various unsupervised AD tasks with image and tabular data.
no code implementations • 11 Jan 2021 • Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Zhichao Lu, Yun Fu, Tomas Pfister
To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.
no code implementations • 1 Jan 2021 • Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Yun Fu, Tomas Pfister
To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.
no code implementations • ICML 2020 • Pengsheng Guo, Chen-Yu Lee, Daniel Ulbricht
Training multiple tasks jointly in one deep network yields reduced latency during inference and better performance than the single-task counterparts by sharing certain layers of a network.
7 code implementations • 10 May 2020 • Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister
Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data.
Ranked #13 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
2 code implementations • CVPR 2019 • Chen-Yu Lee, Tanmay Batra, Mohammad Haris Baig, Daniel Ulbricht
In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary and the Wasserstein metric.
Ranked #19 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
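A hedged sketch of a sliced 1-Wasserstein discrepancy between the softmax outputs of two task classifiers, in the spirit of the entry above: each output distribution is pushed through random 1-D projections, which are sorted and compared. The paper's exact formulation may differ.

```python
import torch

def sliced_wasserstein_discrepancy(p1, p2, n_proj=128):
    """p1, p2: (batch, num_classes) classifier outputs treated as samples."""
    d = p1.size(1)
    theta = torch.randn(d, n_proj, device=p1.device)
    theta = theta / theta.norm(dim=0, keepdim=True)  # unit-norm projection directions
    proj1 = torch.sort(p1 @ theta, dim=0).values     # sort each 1-D projection over the batch
    proj2 = torch.sort(p2 @ theta, dim=0).values
    return (proj1 - proj2).abs().mean()              # average the 1-D Wasserstein distances
```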
4 code implementations • ICML 2018 • Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly.
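A minimal sketch of the architecture described above, one shared trunk feeding several task-specific heads; the training-balancing method the paper proposes is not shown:

```python
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """One shared trunk produces a representation; each head makes one task's prediction."""

    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, d) for d in task_out_dims])

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]  # one output per task
```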
2 code implementations • ICCV 2017 • Chen-Yu Lee, Vijay Badrinarayanan, Tomasz Malisiewicz, Andrew Rabinovich
This paper focuses on the task of room layout estimation from a monocular RGB image.
no code implementations • CVPR 2016 • Chen-Yu Lee, Simon Osindero
We present recursive recurrent neural networks with attention modeling (R²AM) for lexicon-free optical character recognition in natural scene images.
2 code implementations • 30 Sep 2015 • Chen-Yu Lee, Patrick W. Gallagher, Zhuowen Tu
We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures.
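One simple way to generalize pooling is a learned mixture of max and average pooling; this is a sketch under the assumption of a single learned scalar mixing proportion, and the full family of generalized pooling operators studied in the paper is richer than this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPool2d(nn.Module):
    """Learned convex combination of max pooling and average pooling."""

    def __init__(self, kernel_size=2, stride=2):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.mix_logit = nn.Parameter(torch.zeros(1))  # learned mixing proportion

    def forward(self, x):
        a = torch.sigmoid(self.mix_logit)  # mixing weight in (0, 1)
        return a * F.max_pool2d(x, self.kernel_size, self.stride) + \
               (1 - a) * F.avg_pool2d(x, self.kernel_size, self.stride)
```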
1 code implementation • 11 May 2015 • Liwei Wang, Chen-Yu Lee, Zhuowen Tu, Svetlana Lazebnik
One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of convolutional layers.
1 code implementation • 18 Sep 2014 • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.
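A hedged sketch of a deeply-supervised objective in the spirit of the entry above: the usual output-layer classification loss plus weighted companion losses attached to hidden layers. Cross-entropy is used for every term here purely for illustration; the paper's companion objective may differ.

```python
import torch.nn.functional as F

def deeply_supervised_loss(final_logits, hidden_feats, companion_heads, targets, alpha=0.3):
    """companion_heads: one linear classifier per supervised hidden layer;
    alpha weights the companion (hidden-layer) terms."""
    loss = F.cross_entropy(final_logits, targets)
    for feat, head in zip(hidden_feats, companion_heads):
        pooled = feat.mean(dim=(2, 3)) if feat.dim() == 4 else feat  # global average pool conv maps
        loss = loss + alpha * F.cross_entropy(head(pooled), targets)
    return loss
```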
no code implementations • CVPR 2014 • Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
We present a new feature representation method for the scene text recognition problem, particularly focusing on improving scene character recognition.