Search Results for author: Yujia Xie

Found 27 papers, 10 papers with code

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation 27 Nov 2023 Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

2 code implementations 7 Sep 2023 Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.

Predictive Sparse Manifold Transform

no code implementations 27 Aug 2023 Yujia Xie, Xinhui Li, Vince D. Calhoun

PSMT incorporates two layers where the first sparse coding layer represents the input sequence as sparse coefficients over an overcomplete dictionary and the second manifold learning layer learns a geometric embedding space that captures topological similarity and dynamic temporal linearity in sparse coefficients.

Interactive Editing for Text Summarization

1 code implementation 5 Jun 2023 Yujia Xie, Xun Wang, Si-Qing Chen, Wayne Xiong, Pengcheng He

Summarizing lengthy documents is a common and essential task in our daily lives.

Text Summarization

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

no code implementations 22 May 2023 Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.

Summarization with Precise Length Control

no code implementations 9 May 2023 Lesly Miculicich, Yujia Xie, Song Wang, Pengcheng He

Many applications of text generation such as summarization benefit from accurately controlling the text length.

Text Generation

Personalized Abstractive Summarization by Tri-agent Generation Pipeline

1 code implementation 4 May 2023 Wen Xiao, Yujia Xie, Giuseppe Carenini, Pengcheng He

The inference-only large language model (ChatGPT) serves as both the generator and editor, with a smaller model acting as the instructor to guide output generation.

Abstractive Text Summarization Language Modelling +1

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

no code implementations 24 Feb 2023 Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering.

Informativeness Open-Domain Question Answering

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

no code implementations CVPR 2023 Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.

Instance Segmentation Segmentation +3

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

1 code implementation CVPR 2023 Shuquan Ye, Yujia Xie, Dongdong Chen, Yichong Xu, Lu Yuan, Chenguang Zhu, Jing Liao

Through our analysis, we find one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL-models from the data perspective.

Data Augmentation Retrieval

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

no code implementations 15 Sep 2022 Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.

Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Action Classification Action Recognition +13

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

no code implementations 3 Jun 2022 Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e.g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model.

Image Paragraph Captioning Language Modelling +1

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

1 code implementation 2 Jun 2022 Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding window manner for retrieving knowledge, and the important relationship within/among object regions is neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent.

Question Answering Retrieval +1

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations 20 Apr 2022 Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

Differentiable Top-k with Optimal Transport

no code implementations NeurIPS 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

Finding the k largest or smallest elements from a collection of scores, i.e., top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining.

Information Retrieval Retrieval

A Hypergradient Approach to Robust Regression without Correspondence

no code implementations ICLR 2021 Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha

Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and limited to linear regression models.

Multi-Object Tracking regression

Differentiable Top-$k$ with Optimal Transport

no code implementations NeurIPS Workshop LMCA 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-$k$ operation, i.e., finding the $k$ largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval Retrieval

Conditional Self-Attention for Query-based Summarization

no code implementations 18 Feb 2020 Yujia Xie, Tianyi Zhou, Yi Mao, Weizhu Chen

Thereby, the contextual dependencies modeled by CSA will be highly relevant to the query.

Differentiable Top-k Operator with Optimal Transport

no code implementations 16 Feb 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval Retrieval

Meta Learning with Relational Information for Short Sequences

1 code implementation NeurIPS 2019 Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha

This paper proposes a new meta-learning method, named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences), for learning heterogeneous point process models from short event sequence data along with a relational network.

Meta-Learning

On Scalable and Efficient Computation of Large Scale Optimal Transport

no code implementations ICLR Workshop DeepGenStruct 2019 Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use.

Domain Adaptation

Active Image Synthesis for Efficient Labeling

no code implementations 5 Feb 2019 Jialei Chen, Yujia Xie, Kan Wang, Chuck Zhang, Mani A. Vannan, Ben Wang, Zhen Qian

The great success achieved by deep neural networks attracts increasing attention from the manufacturing and healthcare communities.

Image Generation Small Data Image Classification

A Fast Proximal Point Method for Computing Exact Wasserstein Distance

1 code implementation 12 Feb 2018 Yujia Xie, Xiangfeng Wang, Ruijia Wang, Hongyuan Zha

However, as we will demonstrate, regularized variations with a large regularization parameter will degrade the performance in several important machine learning applications, and a small regularization parameter will fail due to numerical stability issues with existing algorithms.

BIG-bench Machine Learning
