Search Results for author: Yujie Lu

Found 40 papers, 22 papers with code

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

1 code implementation26 May 2025 Hui Chen, Miao Xiong, Yujie Lu, Wei Han, Ailin Deng, Yufei He, Jiaying Wu, Yibo Li, Yue Liu, Bryan Hooi

Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery.

scientific discovery

VITED: Video Temporal Evidence Distillation

no code implementations CVPR 2025 Yujie Lu, Yale Song, William Wang, Lorenzo Torresani, Tushar Nagarajan

We investigate complex video question answering via chain-of-evidence reasoning -- identifying sequences of temporal spans from multiple relevant parts of the video, together with visual evidence within them.

Question Answering Video Question Answering

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

no code implementations16 Jun 2024 Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions.

Benchmarking Spatial Reasoning

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

1 code implementation12 Jun 2024 Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, JianFeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics.

counterfactual Future prediction +1

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

no code implementations11 Jun 2024 Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that align with commonsense in real life, which we call Commonsense-T2I.

Adversarial Text Text to Image Generation +2

From Text to Pixel: Advancing Long-Context Understanding in MLLMs

1 code implementation23 May 2024 Yujie Lu, Xiujun Li, Tsu-Jui Fu, Miguel Eckstein, William Yang Wang

The rapid progress in Multimodal Large Language Models (MLLMs) has significantly advanced their ability to process and understand complex visual and textual information.

Language Modeling Language Modelling +4

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

1 code implementation5 Apr 2024 Michael Saxon, Fatima Jahara, Mahsa Khoshnoodi, Yujie Lu, Aditya Sharma, William Yang Wang

With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness -- the semantic coherence of generated images to the prompts they were conditioned on.

Benchmarking

Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

1 code implementation CVPR 2024 Yujie Lu, Long Wan, Nayu Ding, Yulong Wang, Shuhan Shen, Shen Cai, Lin Gao

However, common distance-field-based implicit representations, specifically the signed distance field (SDF) for watertight shapes and the unsigned distance field (UDF) for arbitrary shapes, routinely suffer from degraded reconstruction accuracy when converted to explicit surface points and meshes.
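
For readers unfamiliar with the two representations the abstract contrasts, here is a minimal NumPy sketch (not the paper's method) of a signed versus unsigned distance field for an analytic sphere; the sphere and query points are illustrative assumptions:

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def sphere_udf(points, center=np.zeros(3), radius=1.0):
    """Unsigned distance: magnitude only, so open (non-watertight) surfaces
    are representable, at the cost of losing inside/outside information."""
    return np.abs(sphere_sdf(points, center, radius))

# Surface points are where the field is (near) zero.
pts = np.array([[0.0, 0.0, 0.5],    # inside
                [0.0, 0.0, 1.0],    # on the surface
                [0.0, 0.0, 2.0]])   # outside
print(sphere_sdf(pts))  # [-0.5  0.   1. ]
print(sphere_udf(pts))  # [ 0.5  0.   1. ]
```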

Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?

1 code implementation29 Nov 2023 Xiujun Li, Yujie Lu, Zhe Gan, Jianfeng Gao, William Yang Wang, Yejin Choi

Recent multimodal large language models (MLLMs) have shown promising instruction following capabilities on vision-language tasks.

In-Context Learning MM-Vet +1

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

no code implementations2 Nov 2023 Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold

Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.

Image Generation Image to text

Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting

1 code implementation11 Oct 2023 Zhiyu Chen, Yujie Lu, William Yang Wang

Mental illness remains one of the most critical public health issues of our time, due to the severe scarcity of professionals and the limited access to them.

ImagenHub: Standardizing the evaluation of conditional image generation models

2 code implementations2 Oct 2023 Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

Recently, a myriad of conditional image generation and editing models have been developed to serve different downstream tasks, including text-to-image generation, text-guided image editing, subject-driven image generation, control-guided image generation, etc.

Conditional Image Generation text-guided-image-editing +1

Learning Concise and Descriptive Attributes for Visual Recognition

3 code implementations ICCV 2023 An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley

Recent advances in foundation models present new opportunities for interpretable visual recognition -- one can first query Large Language Models (LLMs) to obtain a set of attributes that describe each class, then apply vision-language models to classify images via these attributes.

Descriptive
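
As a rough illustration of the pipeline this abstract describes, the sketch below scores an image against per-class attribute lists with an off-the-shelf CLIP model; the attribute lists and the mean-pooling aggregation are simplifying assumptions (in the paper the attributes come from querying an LLM):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical attribute lists standing in for LLM-generated ones.
attributes = {
    "zebra": ["black and white stripes", "four legs", "a mane"],
    "flamingo": ["pink feathers", "long thin legs", "a curved beak"],
}

image = Image.open("photo.jpg")
scores = {}
for cls, attrs in attributes.items():
    inputs = processor(text=attrs, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (1, num_attrs)
    scores[cls] = logits.mean().item()              # aggregate attribute evidence

print(max(scores, key=scores.get))  # predicted class
```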

Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

1 code implementation23 May 2023 Vaishnavi Himakunthala, Andy Ouyang, Daniel Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Yang Wang

Despite exciting recent results showing vision-language systems' capacity to reason about images using natural language, their capacity for video reasoning remains under-explored.

Descriptive Video Prediction

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

1 code implementation NeurIPS 2023 Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang

Existing automatic evaluation on text-to-image synthesis can only provide an image-text matching score, without considering the object-level compositionality, which results in poor correlation with human judgments.

Attribute Image Generation +2
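
To make the criticized baseline concrete: a single image-text matching score is typically a cosine similarity between global embeddings, as in this CLIP-based sketch (the model choice and file name are assumptions), and such a score cannot localize which object or attribute is wrong:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def matching_score(image_path: str, prompt: str) -> float:
    """One global image-text similarity -- blind to object-level composition."""
    inputs = processor(text=[prompt], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    return torch.cosine_similarity(img, txt).item()

print(matching_score("generated.png", "a red cube on top of a blue sphere"))
```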

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations18 May 2023 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation Text to Image Generation +1

Multimodal Procedural Planning via Dual Text-Image Prompting

1 code implementation2 May 2023 Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

The key challenges of multimodal procedural planning (MPP) are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.

Image to text Informativeness +2

Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks

1 code implementation27 Oct 2022 Edwin Zhang, Yujie Lu, Shinda Huang, William Wang, Amy Zhang

Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and generalization to novel tasks.

reinforcement-learning Reinforcement Learning (RL)

WikiWhy: Answering and Explaining Cause-and-Effect Questions

no code implementations21 Oct 2022 Matthew Ho, Aditya Sharma, Justin Chang, Michael Saxon, Sharon Levy, Yujie Lu, William Yang Wang

As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging.

Question Answering

ULN: Towards Underspecified Vision-and-Language Navigation

1 code implementation18 Oct 2022 Weixi Feng, Tsu-Jui Fu, Yujie Lu, William Yang Wang

Vision-and-Language Navigation (VLN) is the task of guiding an embodied agent to a target position using language instructions.

Vision and Language Navigation

Structured Knowledge Grounding for Question Answering

no code implementations17 Sep 2022 Yujie Lu, Siqi Ouyang, Kairui Zhou

In this paper, we propose to rely solely on language models (LMs) to combine language and knowledge for knowledge-based question answering, with flexibility, breadth of coverage, and structured reasoning.

Knowledge Graphs Open-Ended Question Answering +1

Anticipating the Unseen Discrepancy for Vision and Language Navigation

no code implementations10 Sep 2022 Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang

In this paper, we propose Unseen Discrepancy Anticipating Vision and Language Navigation (DAVIS), which learns to generalize to unseen environments by encouraging test-time visual consistency.

Data Augmentation Decision Making +3

Few-Shot Document-Level Event Argument Extraction

1 code implementation6 Sep 2022 Xianjun Yang, Yujie Lu, Linda Petzold

To fill this gap, we present FewDocAE, a Few-Shot Document-Level Event Argument Extraction benchmark, based on the existing document-level event extraction dataset.

Document-level Event Extraction Event Argument Extraction +2

Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation

1 code implementation17 Aug 2022 Shengyu Zhang, Lingxiao Yang, Dong Yao, Yujie Lu, Fuli Feng, Zhou Zhao, Tat-Seng Chua, Fei Wu

Specifically, Re4 encapsulates three backward flows, i.e., 1) Re-contrast, which drives each interest embedding to be distinct from other interests using contrastive learning; 2) Re-attend, which ensures that the interest-item correlation estimated in the forward flow is consistent with the criterion used in the final recommendation; and 3) Re-construct, which ensures that each interest embedding can semantically reflect the information of representative items related to the corresponding interest.

Contrastive Learning Recommendation Systems
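
A minimal sketch of the intuition behind the Re-contrast flow, assuming K interest embeddings per user; this is not the paper's exact loss (Re4 contrasts each interest against attended item representations), only the "drive interests apart" idea:

```python
import torch
import torch.nn.functional as F

def interest_separation_loss(interests: torch.Tensor) -> torch.Tensor:
    """Push a user's K interest embeddings apart by penalizing the positive
    off-diagonal entries of their pairwise cosine-similarity matrix."""
    z = F.normalize(interests, dim=-1)              # (K, d)
    sim = z @ z.t()                                 # (K, K) cosine similarities
    k = z.size(0)
    off_diag = sim[~torch.eye(k, dtype=torch.bool)]
    return off_diag.clamp(min=0).mean()

interests = torch.randn(4, 64, requires_grad=True)  # K=4 interests, d=64
loss = interest_separation_loss(interests)
loss.backward()  # gradients flow back into the interest extractor
```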

Neuro-Symbolic Procedural Planning with Commonsense Prompting

no code implementations6 Jun 2022 Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Procedural planning aims to implement complex high-level goals by decomposition into sequential simpler low-level steps.

Graph Sampling

Imagination-Augmented Natural Language Understanding

1 code implementation NAACL 2022 Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations.

Natural Language Understanding

AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees

no code implementations20 Jan 2022 Rong Liang, Tiehua Zhang, Yujie Lu, Yuze Liu, Zhen Huang, Xin Chen

Specifically, we collect a large number of source code files (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, which interpret and integrate the AST information of the source code.

Clone Detection Code Search +3
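
The snippet below shows, using Python's standard ast module, the kind of syntactic structure a code parser can surface for such a model; the example function is hypothetical and this is not AstBERT's actual integration code:

```python
import ast

source = """
def transfer(account, amount):
    if amount > account.balance:
        raise ValueError("insufficient funds")
    account.balance -= amount
"""

tree = ast.parse(source)

# Walk the AST, collecting node types and any identifier attached to them --
# the syntactic signal that can be fed to the model alongside raw tokens.
for node in ast.walk(tree):
    name = getattr(node, "name", None) or getattr(node, "id", None)
    print(type(node).__name__, name or "")
```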

High-fidelity 3D Model Compression based on Key Spheres

1 code implementation19 Jan 2022 Yuanzhan Li, Yuqi Liu, Yujie Lu, Siyu Zhang, Shen Cai, Yanting Zhang

Compared to previous works, our method achieves high-fidelity, high-compression 3D object coding and reconstruction.

Model Compression Object +1

MIC: Model-agnostic Integrated Cross-channel Recommenders

no code implementations22 Oct 2021 Yujie Lu, Ping Nie, Shengyu Zhang, Ming Zhao, Ruobing Xie, William Yang Wang, Yi Ren

However, existing work is primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Embedding-based Retrieval (U2I), and thus captures only limited user-item correlations derived from partial information about latent interactions.

model Recommendation Systems +3

Federated Natural Language Generation for Personalized Dialogue System

no code implementations13 Oct 2021 Yujie Lu, Chao Huang, Huanli Zhan, Yong Zhuang

FedNLG first pre-trains the parameters of a standard neural conversational model on a large dialogue corpus, and then fine-tunes the model parameters and persona embeddings on specific datasets in a federated manner.

Text Generation
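
A minimal FedAvg-style sketch of what "fine-tune in a federated manner" typically means: clients train locally and a server averages their parameters. This is a generic illustration, not FedNLG's published code, and it assumes floating-point parameters throughout:

```python
import copy
import torch

def federated_average(client_states, weights=None):
    """Weighted average of client state_dicts (FedAvg-style aggregation)."""
    n = len(client_states)
    weights = weights or [1.0 / n] * n
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(w * sd[key] for w, sd in zip(weights, client_states))
    return avg

# After each round, clients would load the averaged weights; per the paper's
# description, persona embeddings stay local to each client.
clients = [torch.nn.Linear(8, 2).state_dict() for _ in range(3)]
global_state = federated_average(clients)
```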

Multi-trends Enhanced Dynamic Micro-video Recommendation

no code implementations8 Oct 2021 Yujie Lu, Yingxuan Huang, Shengyu Zhang, Wei Han, Hui Chen, Zhou Zhao, Fei Wu

In this paper, we propose the DMR framework to explicitly model the dynamic multi-trends of users' current preferences and make predictions based on both the history and potential future trends.

Recommendation Systems

RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms

1 code implementation3 Nov 2020 Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, Ji-Rong Wen

In this library, we implement 73 recommendation models on 28 benchmark datasets, covering the categories of general recommendation, sequential recommendation, context-aware recommendation and knowledge-based recommendation.

Collaborative Filtering Sequential Recommendation
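
RecBole exposes a one-call quick start; a typical invocation looks like the following (check the repository for the exact interface in your installed version):

```python
# pip install recbole
from recbole.quick_start import run_recbole

# Train and evaluate one of the implemented models on a bundled benchmark,
# e.g. BPR for general recommendation on MovieLens-100k.
run_recbole(model="BPR", dataset="ml-100k")
```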

Future-Aware Diverse Trends Framework for Recommendation

1 code implementation1 Nov 2020 Yujie Lu, Shengyu Zhang, Yingxuan Huang, Luyao Wang, Xinyao Yu, Zhou Zhao, Fei Wu

Supposing that future preferences can be diverse, we propose a diverse trends extractor and a time-aware mechanism to represent the possible preference trends of a given user with multiple vectors.

Representation Learning Sequential Recommendation

CLOUD: Contrastive Learning of Unsupervised Dynamics

no code implementations23 Oct 2020 Jianren Wang, Yujie Lu, Hang Zhao

Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently.

Contrastive Learning
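
A generic sketch of contrastive dynamics learning, assuming a latent forward model whose predicted next-state embedding should match the encoded true next state (InfoNCE with in-batch negatives); this illustrates the idea, not CLOUD's exact objective:

```python
import torch
import torch.nn.functional as F

def dynamics_infonce(pred_next: torch.Tensor, true_next: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over a batch of transitions: row i's positive is the true
    next-state embedding i; all other rows serve as negatives."""
    p = F.normalize(pred_next, dim=-1)   # (B, d) predictions f(s_t, a_t)
    t = F.normalize(true_next, dim=-1)   # (B, d) encoded s_{t+1}
    logits = p @ t.t() / temperature     # (B, B) similarity matrix
    labels = torch.arange(p.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = dynamics_infonce(torch.randn(32, 128), torch.randn(32, 128))
```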
