Search Results for author: Chen-Yu Lee

Found 49 papers, 17 papers with code

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

no code implementations • 11 Mar 2025 • Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Iyer, Tianlong Chen, Huan Liu, Chen-Yu Lee, Tomas Pfister

Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization.

Management • Reinforcement Learning (RL) • +1

Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation

no code implementations • 10 Mar 2025 • Fan Yin, Zifeng Wang, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee, Hamid Palangi, Tomas Pfister

To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans.

Large Language Model

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

no code implementations • 22 Feb 2025 • Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, Hamid Palangi

Recent agent frameworks and inference-time algorithms often struggle with complex planning problems due to limitations in verifying generated plans or reasoning, and due to the varying complexity of instances within a single task.

When One LLM Drools, Multi-LLM Collaboration Rules

no code implementations • 6 Feb 2025 • Shangbin Feng, Wenxuan Ding, Alisa Liu, Zifeng Wang, Weijia Shi, Yike Wang, Zejiang Shen, Xiaochuang Han, Hunter Lang, Chen-Yu Lee, Tomas Pfister, Yejin Choi, Yulia Tsvetkov

This position paper argues that in many realistic (i.e., complex, contextualized, subjective) scenarios, one LLM is not enough to produce a reliable output.

Diversity

Exemplar Masking for Multimodal Incremental Learning

1 code implementation • 12 Dec 2024 • Yi-Lun Lee, Chen-Yu Lee, Wei-Chen Chiu, Yi-Hsuan Tsai

Specifically, the non-important tokens are masked based on the attention weights and the correlation across different modalities, significantly reducing the storage size of an exemplar and consequently saving more exemplars under the same memory buffer.

Data Augmentation • Incremental Learning
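A minimal sketch of the masking idea in the excerpt above, assuming per-token importance scores come from attention (e.g., a CLS-to-token attention row); the `keep_ratio` parameter and all names are illustrative, not the released implementation:

```python
import torch

def mask_exemplar(tokens: torch.Tensor, attn_weights: torch.Tensor,
                  keep_ratio: float = 0.25) -> torch.Tensor:
    """Keep only the highest-scoring tokens of one stored exemplar.

    tokens:       (seq_len, dim) token embeddings of an exemplar
    attn_weights: (seq_len,) importance scores, e.g. CLS-to-token attention
    """
    k = max(1, int(keep_ratio * tokens.size(0)))
    top_idx = attn_weights.topk(k).indices.sort().values  # keep original order
    return tokens[top_idx]

# A masked exemplar uses keep_ratio of the original memory, so the same
# buffer holds roughly 1/keep_ratio times more exemplars.
compressed = mask_exemplar(torch.randn(196, 768), torch.rand(196))  # (49, 768)
```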

Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence

no code implementations • 15 Oct 2024 • Shangbin Feng, Zifeng Wang, Yike Wang, Sayna Ebrahimi, Hamid Palangi, Lesly Miculicich, Achin Kulshrestha, Nathalie Rauschmayr, Yejin Choi, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister

Extensive experiments demonstrate that Model Swarms could flexibly adapt LLM experts to a single task, multi-task domains, reward models, as well as diverse human interests, improving over 12 model composition baselines by up to 21.0% across tasks and contexts.

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

no code implementations • 15 Oct 2024 • Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei LI, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution.

Instruction Following • Knowledge Distillation • +2
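A heavily hedged sketch of the interleaved-sampling idea described above: the student proposes the next token and the teacher replaces proposals it ranks poorly, so the training data stays close to the student's inference-time distribution. The acceptance rule, `top_k` threshold, and all names here are illustrative assumptions, not the paper's API:

```python
import torch

def skd_next_token(student_logits, teacher_logits, top_k=25):
    """Keep the student's proposal if the teacher ranks it in its top_k;
    otherwise resample the token from the teacher."""
    proposal = torch.multinomial(student_logits.softmax(-1), 1).item()
    if proposal in teacher_logits.topk(top_k).indices:
        return proposal                                   # accepted proposal
    return torch.multinomial(teacher_logits.softmax(-1), 1).item()

vocab_size = 32000
tok = skd_next_token(torch.randn(vocab_size), torch.randn(vocab_size))
```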

TableRAG: Million-Token Table Understanding with Language Models

1 code implementation • 7 Oct 2024 • Si-An Chen, Lesly Miculicich, Julian Martin Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables.

RAG • Retrieval • +1

Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval

no code implementations • 3 Aug 2024 • Yanfei Chen, Jinsung Yoon, Devendra Singh Sachan, Qingze Wang, Vincent Cohen-Addad, Mohammadhossein Bateni, Chen-Yu Lee, Tomas Pfister

Recent advances in large language models (LLMs) have enabled autonomous agents with complex reasoning and task-fulfillment capabilities using a wide range of tools.

Retrieval

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

no code implementations • 11 Jul 2024 • Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses.

ARC • RAG • +3

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

no code implementations • 23 Jun 2024 • Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input.

RAG • Retrieval-augmented Generation

CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

no code implementations • 8 Jun 2024 • I-Hung Hsu, Zifeng Wang, Long T. Le, Lesly Miculicich, Nanyun Peng, Chen-Yu Lee, Tomas Pfister

Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources.

Open-Domain Question Answering

CodecLM: Aligning Language Models with Tailored Synthetic Data

no code implementations • 8 Apr 2024 • Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister

To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs.

Instruction Following

Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction

1 code implementation • 16 Feb 2024 • Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku

In this knowledge acquisition and extraction setting, we find an intriguing fact: LLMs can accurately answer questions about the first sentence, but they struggle to extract information described in the middle or end of the documents used for fine-tuning.

Denoising • Language Modeling • +4

LMDX: Language Model-based Document Information Extraction and Localization

no code implementations • 19 Sep 2023 • Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Zifeng Wang, Jiaqi Mu, Hao Zhang, Chen-Yu Lee, Nan Hua

The main obstacles to adopting LLMs for this task include the absence of layout encoding within LLMs, which is critical for high quality extraction, and the lack of a grounding mechanism to localize the predicted entities within the document.

Language Modeling • Language Modelling

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

1 code implementation • 3 May 2023 • Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of available data on a benchmark, whereas standard finetuning of the same T5 model struggles to match it even when using 100% of the dataset.

Multimodal Prompting with Missing Modalities for Visual Recognition

2 code implementations • CVPR 2023 • Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when a modality is missing during either training or testing in real-world situations; and 2) when computation resources are not available to finetune heavy transformer models.

Prompt Learning

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

1 code implementation • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.

Attribute • Retrieval • +2

Neural Spline Search for Quantile Probabilistic Modeling

no code implementations • 12 Jan 2023 • Ruoxi Sun, Chun-Liang Li, Sercan O. Arik, Michael W. Dusenberry, Chen-Yu Lee, Tomas Pfister

Accurate estimation of output quantiles is crucial in many use cases where it is desirable to model the range of possible outcomes.

Attribute • quantile regression • +2
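For readers unfamiliar with quantile modeling, the standard pinball (quantile) loss below grounds what "estimating output quantiles" means; it is a generic sketch, not the paper's neural-spline method:

```python
import torch

def pinball_loss(pred, target, q):
    """Asymmetric loss whose minimizer is the q-th conditional quantile."""
    err = target - pred
    return torch.mean(torch.maximum(q * err, (q - 1) * err))

# q = 0.9 penalizes under-prediction 9x more than over-prediction,
# pulling the model toward the 90th percentile of the target.
loss = pinball_loss(torch.randn(8), torch.randn(8), q=0.9)
```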

VRDU: A Benchmark for Visually-rich Document Understanding

no code implementations • 15 Nov 2022 • Zilong Wang, Yichao Zhou, Wei Wei, Chen-Yu Lee, Sandeep Tata

Understanding visually-rich business documents to extract structured data and automate business workflows has been receiving attention both in academia and industry.

document understanding

QueryForm: A Simple Zero-shot Form Entity Query Framework

no code implementations • 14 Nov 2022 • Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities.

document understanding • Form • +1

Prefix Conditioning Unifies Language and Label Supervision

no code implementations • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.

Classification • Contrastive Learning • +3

Towards Group Robustness in the presence of Partial Group Labels

no code implementations • 10 Jan 2022 • Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister

Such a requirement is impractical in situations where the data labeling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information.

Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types

2 code implementations • 21 Dec 2021 • Kihyuk Sohn, Jinsung Yoon, Chun-Liang Li, Chen-Yu Lee, Tomas Pfister

We define a distance function between images, each of which is represented as a bag of embeddings, by the Euclidean distance between weighted averaged embeddings.

Anomaly Detection • Clustering • +2
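The distance in the excerpt transcribes almost directly into code. The sketch below assumes uniform patch weights as a default (the paper derives the weights; only the function and argument names are invented here):

```python
import numpy as np

def image_distance(bag_a, bag_b, w_a=None, w_b=None):
    """bag_*: (num_patches, dim) embeddings; w_*: (num_patches,) weights."""
    w_a = np.full(len(bag_a), 1 / len(bag_a)) if w_a is None else w_a
    w_b = np.full(len(bag_b), 1 / len(bag_b)) if w_b is None else w_b
    mean_a = w_a @ bag_a                             # weighted average embedding
    mean_b = w_b @ bag_b
    return float(np.linalg.norm(mean_a - mean_b))    # Euclidean distance

d = image_distance(np.random.randn(49, 512), np.random.randn(49, 512))
```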

Learning to Prompt for Continual Learning

5 code implementations • CVPR 2022 • Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge.

class-incremental learning • Class Incremental Learning • +2

Invariant Learning with Partial Group Labels

no code implementations • 29 Sep 2021 • Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister

Such a requirement is impractical in situations where the data labelling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information.

Unifying Distribution Alignment as a Loss for Imbalanced Semi-supervised Learning

no code implementations • 29 Sep 2021 • Justin Lazarow, Kihyuk Sohn, Chun-Liang Li, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister

While remarkable progress in imbalanced supervised learning has been made recently, less attention has been given to the setting of imbalanced semi-supervised learning (SSL), where not only are few labeled examples provided, but the underlying data distribution can also be severely imbalanced.

Pseudo Label

Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts

no code implementations • 11 Jan 2021 • Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Zhichao Lu, Yun Fu, Tomas Pfister

To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.

Action Recognition • Pseudo Label • +1

Exploring Sub-Pseudo Labels for Learning from Weakly-Labeled Web Videos

no code implementations • 1 Jan 2021 • Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Yun Fu, Tomas Pfister

To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.

Action Recognition • Pseudo Label • +1

Learning to Branch for Multi-Task Learning

no code implementations • ICML 2020 • Pengsheng Guo, Chen-Yu Lee, Daniel Ulbricht

Training multiple tasks jointly in one deep network yields reduced latency during inference and better performance than the single-task counterpart by sharing certain layers of a network.

Multi-Task Learning

A Simple Semi-Supervised Learning Framework for Object Detection

7 code implementations • 10 May 2020 • Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister

Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data.

Ranked #13 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Data Augmentation • image-classification • +5

Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation

2 code implementations • CVPR 2019 • Chen-Yu Lee, Tanmay Batra, Mohammad Haris Baig, Daniel Ulbricht

In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary and the Wasserstein metric.

General Classification • image-classification • +5
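A minimal sliced Wasserstein distance between two feature batches, the core quantity the discrepancy above builds on. This sketch assumes equal batch sizes and omits the task-specific classifier outputs the method actually aligns:

```python
import torch

def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor,
                       num_projections: int = 128) -> torch.Tensor:
    """x, y: (batch, dim), equal batch sizes.
    1-D Wasserstein-2 averaged over random unit directions."""
    dim = x.size(1)
    theta = torch.randn(dim, num_projections)
    theta = theta / theta.norm(dim=0, keepdim=True)   # unit directions
    x_proj = (x @ theta).sort(dim=0).values           # sorted 1-D projections
    y_proj = (y @ theta).sort(dim=0).values
    return ((x_proj - y_proj) ** 2).mean()            # closed-form 1-D OT

swd = sliced_wasserstein(torch.randn(64, 256), torch.randn(64, 256))
```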

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

4 code implementations • ICML 2018 • Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich

Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly.
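A compressed, illustrative sketch of the gradient-normalization idea named in the title: probe each task's gradient norm at a shared layer and rebalance the loss weights so all tasks train at similar rates. The actual method updates the weights by gradient descent on an L1 objective and uses loss ratios relative to initial losses; the one-shot rescaling and loss-ratio proxy below are simplifications:

```python
import torch
import torch.nn as nn

def gradnorm_weights(task_losses, shared_param, weights, alpha=1.5):
    """Return rebalanced loss weights (one gradient-norm probe per task)."""
    norms = torch.stack([
        torch.autograd.grad(w * L, shared_param, retain_graph=True)[0].norm()
        for w, L in zip(weights, task_losses)
    ])
    losses = torch.stack(task_losses).detach()
    rate = losses / losses.mean()                 # proxy for relative progress
    target = norms.mean() * rate ** alpha         # slower tasks get larger norms
    new_w = weights * (target / norms)
    return new_w * len(new_w) / new_w.sum()       # renormalize to sum to T

shared = nn.Linear(4, 4)
x = torch.randn(8, 4)
losses = [shared(x).pow(2).mean(), shared(x).abs().mean()]
weights = gradnorm_weights(losses, shared.weight, torch.ones(2))
```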

Recursive Recurrent Nets with Attention Modeling for OCR in the Wild

no code implementations • CVPR 2016 • Chen-Yu Lee, Simon Osindero

We present recursive recurrent neural networks with attention modeling (R²AM) for lexicon-free optical character recognition in natural scene images.

Language Modeling • Language Modelling • +2

Training Deeper Convolutional Networks with Deep Supervision

1 code implementation • 11 May 2015 • Liwei Wang, Chen-Yu Lee, Zhuowen Tu, Svetlana Lazebnik

One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of convolutional layers.

General Classification

Deeply-Supervised Nets

1 code implementation • 18 Sep 2014 • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.

Classification • General Classification • +1
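An illustrative PyTorch sketch of the deep-supervision idea: companion classifiers attached to hidden layers, whose losses are added to the final objective so intermediate features receive a direct training signal. The layer sizes and the 0.3 companion weight are arbitrary choices for this example, not the paper's:

```python
import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
        self.aux1 = nn.Linear(256, num_classes)   # companion head on block1
        self.aux2 = nn.Linear(128, num_classes)   # companion head on block2
        self.head = nn.Linear(128, num_classes)   # final classifier

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.head(h2), self.aux1(h1), self.aux2(h2)

def dsn_loss(outputs, target, companion_weight=0.3):
    """Final cross-entropy plus weighted companion losses."""
    final, *aux = outputs
    ce = nn.functional.cross_entropy
    return ce(final, target) + companion_weight * sum(ce(a, target) for a in aux)

model = DeeplySupervisedNet()
x, y = torch.randn(4, 784), torch.randint(0, 10, (4,))
loss = dsn_loss(model(x), y)
```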
