Search Results for author: Tat-Seng Chua

Found 303 papers, 191 papers with code

Re-examining the Role of Schema Linking in Text-to-SQL

no code implementations EMNLP 2020 Wenqiang Lei, Weixin Wang, Zhixin Ma, Tian Gan, Wei Lu, Min-Yen Kan, Tat-Seng Chua

By providing a schema linking corpus based on the Spider text-to-SQL dataset, we systematically study the role of schema linking.

Text-To-SQL

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation

1 code implementation 13 Jan 2025 Han Liu, Yinwei Wei, Fan Liu, Wenjie Wang, Liqiang Nie, Tat-Seng Chua

In this paper, we develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF), which dynamically assigns parameters to the multimodal fusion function for each micro-video during its representation learning.

Meta-Learning Multimodal Recommendation +1

How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

no code implementations 10 Jan 2025 Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua, Jimmy Xiangji Huang

With the advancement of large language models (LLMs), intelligent models have evolved from mere tools to autonomous agents with their own goals and strategies for cooperating with humans.

Aligning Large Language Models for Faithful Integrity Against Opposing Argument

1 code implementation 2 Jan 2025 Yong Zhao, Yang Deng, See-Kiong Ng, Tat-Seng Chua

In this work, we propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation (AFICE), which aims to align the LLM responses with faithful integrity.

Towards Modality Generalization: A Benchmark and Prospective Analysis

1 code implementation 24 Dec 2024 Xiaohao Liu, Xiaobo Xia, Zhuo Huang, Tat-Seng Chua

Multi-modal learning has achieved remarkable success by integrating information from various modalities, achieving superior performance in tasks like recognition and retrieval compared to uni-modal approaches.

Length Controlled Generation for Black-box LLMs

no code implementations 19 Dec 2024 Yuxuan Gu, Wenjie Wang, Xiaocheng Feng, Weihong Zhong, Kun Zhu, Lei Huang, Tat-Seng Chua, Bing Qin

Large language models (LLMs) have demonstrated impressive instruction-following capabilities, while still struggling to accurately manage the length of the generated text, which is a fundamental requirement in many real-world applications.

Abstractive Text Summarization Instruction Following

Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence

no code implementations 18 Dec 2024 Jinghan He, Kuan Zhu, Haiyun Guo, Junfeng Fang, Zhenglin Hua, Yuheng Jia, Ming Tang, Tat-Seng Chua, Jinqiao Wang

Large vision-language models (LVLMs) have made substantial progress in integrating large language models (LLMs) with visual inputs, enabling advanced multimodal reasoning.

Hallucination Multimodal Reasoning

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

1 code implementation 18 Dec 2024 YiPeng Zhang, Yifan Liu, Zonghao Guo, Yidan Zhang, Xuesong Yang, Chi Chen, Jun Song, Bo Zheng, Yuan YAO, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

To address this issue, we present LLaVA-UHD v2, an advanced MLLM centered around a Hierarchical window transformer that enables capturing diverse visual granularity by constructing and integrating a high-resolution feature pyramid.

Attribute Text Generation

Knowledge Boundary of Large Language Models: A Survey

no code implementations 17 Dec 2024 Moxin Li, Yong Zhao, Yang Deng, Wenxuan Zhang, Shuaiyi Li, Wenya Xie, See-Kiong Ng, Tat-Seng Chua

Although large language models (LLMs) store a vast amount of knowledge in their parameters, they still have limitations in the memorization and utilization of certain knowledge, leading to undesired behaviors such as generating untruthful and inaccurate responses.

Memorization Survey

Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning

no code implementations 15 Dec 2024 Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua

Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge, ensuring more reliable outputs.

Hallucination

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

no code implementations 8 Dec 2024 Leigang Qu, Haochuan Li, Wenjie Wang, Xiang Liu, Juncheng Li, Liqiang Nie, Tat-Seng Chua

To adapt SILMM to LMMs with continuous features, we propose a diversity mechanism to obtain diverse representations and a kernel-based continuous DPO for alignment.

Diversity Prompt Engineering +1

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

no code implementations 29 Nov 2024 Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks, such as captioning and coarse-grained question answering, but struggle with compositional reasoning that requires multi-step spatio-temporal inference across object relations, interactions, and events.

Question Answering Video Understanding

Headache to Overstock? Promoting Long-tail Items through Debiased Product Bundling

no code implementations 28 Nov 2024 Shuo Xu, Haokai Ma, Yunshan Ma, Xiaohao Liu, Lei Meng, Xiangxu Meng, Tat-Seng Chua

The inherent popularity bias in the pre-extracted user feedback features and the insufficient utilization of other popularity-independent knowledge may force the conventional bundling methods to find more popular items, thereby struggling with this long-tail bundling scenario.

Knowledge Distillation Navigate +1

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

1 code implementation 18 Nov 2024 Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua

Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications.

Response Generation

Self-Calibrated Listwise Reranking with Large Language Models

no code implementations 7 Nov 2024 Ruiyang Ren, Yuhao Wang, Kun Zhou, Wayne Xin Zhao, Wenjie Wang, Jing Liu, Ji-Rong Wen, Tat-Seng Chua

Large language models (LLMs), with advanced linguistic capabilities, have been employed in reranking tasks through a sequence-to-sequence approach.

Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection

1 code implementation 31 Oct 2024 Ke Li, Fuyu Dong, Di Wang, Shaofeng Li, Quan Wang, Xinbo Gao, Tat-Seng Chua

Furthermore, we present VisTA, a simple yet effective baseline method that unifies the tasks of question answering and grounding by delivering both visual and textual answers.

Change Detection Question Answering +1

Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation

1 code implementation 30 Oct 2024 Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua

Recent advancements in recommender systems have focused on leveraging Large Language Models (LLMs) to improve user preference modeling, yielding promising outcomes.

counterfactual Counterfactual Reasoning +1

Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector

1 code implementation 30 Oct 2024 Youcheng Huang, Fengbin Zhu, Jingkun Tang, Pan Zhou, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

With the new RADAR dataset, we further develop a novel and effective iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of VLMs, which we call the attacking direction, to distinguish adversarial images from benign ones in the input.
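A detector built on a single direction in hidden-state space can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the NEARSIDE implementation. We assume the direction is the normalized difference between the mean hidden states of adversarial and benign inputs, and that an input is flagged when its projection onto that direction exceeds a threshold. The hidden states below are synthetic arrays standing in for a VLM's activations, and the function names and threshold rule are our own choices.

```python
import numpy as np


def attacking_direction(h_adv: np.ndarray, h_benign: np.ndarray) -> np.ndarray:
    """Assumed construction: unit vector from the benign mean to the adversarial mean."""
    direction = h_adv.mean(axis=0) - h_benign.mean(axis=0)
    return direction / np.linalg.norm(direction)


def detect(hidden: np.ndarray, direction: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask: True where the projection onto the direction exceeds the threshold."""
    return hidden @ direction > threshold


# Synthetic stand-ins for VLM hidden states (rows = inputs, columns = hidden units).
rng = np.random.default_rng(0)
dim = 16
shift = np.zeros(dim)
shift[0] = 4.0  # in this toy setup, adversarial inputs are displaced along one axis
h_benign = rng.normal(size=(200, dim))
h_adv = rng.normal(size=(200, dim)) + shift

v = attacking_direction(h_adv, h_benign)
# Threshold halfway between the mean projections of the two groups.
thr = 0.5 * ((h_adv @ v).mean() + (h_benign @ v).mean())
print("adversarial flagged:", detect(h_adv, v, thr).mean())
print("benign flagged:", detect(h_benign, v, thr).mean())
```

Detection then costs one dot product per input, which matches the efficiency motivation in the abstract. A real system would extract the hidden states from the model and calibrate the threshold on held-out data.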

Large Language Models Empowered Personalized Web Agents

no code implementations 22 Oct 2024 Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua

To overcome the limitation, we first formulate the task of LLM-empowered personalized Web agents, which integrate personalized data and user instructions to personalize instruction comprehension and action execution.

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

no code implementations 18 Oct 2024 Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua

The recent advancements in large language models (LLMs) and pre-trained vision models have accelerated the development of vision-language large models (VLLMs), enhancing the interaction between visual and linguistic modalities.

Preference Diffusion for Recommendation

1 code implementation 17 Oct 2024 Shuo Liu, An Zhang, Guoqing Hu, Hong Qian, Tat-Seng Chua

Recommender systems predict personalized item rankings based on user preference distributions derived from historical behavior data.

Sequential Recommendation Variational Inference

Addressing Heterogeneity and Heterophily in Graphs: A Heterogeneous Heterophilic Spectral Graph Neural Network

no code implementations 17 Oct 2024 Kangkang Lu, Yanhua Yu, Zhiyong Huang, Jia Li, Yuling Wang, Meiyu Liang, Xiting Qin, Yimeng Ren, Tat-Seng Chua, Xidian Wang

Specifically, we propose a Heterogeneous Heterophilic Spectral Graph Neural Network (H2SGNN), which employs a dual-module approach: local independent filtering and global hybrid filtering.

Graph Neural Network

PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning

no code implementations 15 Oct 2024 Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

Generalized zero-shot learning (GZSL) endeavors to identify the unseen categories using knowledge from the seen domain, necessitating the intrinsic interactions between the visual features and attribute semantic features.

Attribute Diversity +1

Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

no code implementations 8 Oct 2024 Hao Fei, Shengqiong Wu, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

Recent vision large language models (LLMs) have made remarkable progress, yet they still face challenges in becoming multimodal generalists, such as coarse-grained instance-level understanding, lack of unified support for both images and videos, and insufficient coverage across various vision tasks.

Efficient Inference for Large Language Model-based Generative Recommendation

no code implementations 7 Oct 2024 Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

To alleviate this, we consider 1) boosting top-K sequence alignment between the draft model and the target LLM, and 2) relaxing the verification strategy to reduce trivial LLM calls.

Language Modeling Language Modelling +1

Temporal Relational Reasoning of Large Language Models for Detecting Stock Portfolio Crashes

no code implementations 7 Oct 2024 Kelvin J. L. Koa, Yunshan Ma, Ritchie Ng, Huanhuan Zheng, Tat-Seng Chua

Stock portfolios are often exposed to rare consequential events (e.g., the 2007 global financial crisis and the 2020 COVID-19 stock market crash), which offer too little historical information to learn from.

Relational Reasoning

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

2 code implementations 3 Oct 2024 Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Xiang Wang, Xiangnan He, Tat-Seng Chua

To address this, we introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters.

knowledge editing
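The null-space idea can be illustrated with a small NumPy sketch. This is our own illustration under stated assumptions, not AlphaEdit's released code. `k0` is a hypothetical matrix whose columns are keys of knowledge that must be preserved, and `delta` is an arbitrary weight update. We use the pseudoinverse form of the projector, P = I - K0 K0^+. The paper may construct its projector differently, for example via an eigendecomposition. Because P K0 = 0, the projected update leaves the layer's outputs on the preserved keys unchanged.

```python
import numpy as np


def null_space_projector(k0: np.ndarray) -> np.ndarray:
    """Return P with P @ k0 == 0: projector onto the orthogonal complement of col(k0).

    k0 has shape (d, n): n preserved-knowledge keys of dimension d (hypothetical input).
    """
    d = k0.shape[0]
    # k0 @ pinv(k0) is the orthogonal projector onto the column space of k0.
    return np.eye(d) - k0 @ np.linalg.pinv(k0)


def apply_projected_edit(w: np.ndarray, delta: np.ndarray, k0: np.ndarray) -> np.ndarray:
    """Apply the perturbation only after projecting it onto the null space of k0."""
    return w + delta @ null_space_projector(k0)


rng = np.random.default_rng(0)
d_in, d_out, n_keys = 8, 5, 3
w = rng.normal(size=(d_out, d_in))
k0 = rng.normal(size=(d_in, n_keys))    # keys whose outputs must be preserved
delta = rng.normal(size=(d_out, d_in))  # stand-in for an update from some editing objective

w_edited = apply_projected_edit(w, delta, k0)
print(np.allclose(w_edited @ k0, w @ k0))  # True: preserved keys map to the same outputs
```

The guarantee follows from (W + ΔP)K0 = W K0 + Δ(P K0) = W K0. Only the components of Δ outside the span of the preserved keys survive the projection.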

Grammar Induction from Visual, Speech and Text

no code implementations 1 Oct 2024 Yu Zhao, Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua

Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics.

MASKDROID: Robust Android Malware Detection with Masked Graph Representations

1 code implementation 29 Sep 2024 Jingnan Zheng, Jiaohao Liu, An Zhang, Jun Zeng, Ziqi Yang, Zhenkai Liang, Tat-Seng Chua

Among the various tools employed in malware detection, graph representations (e.g., function call graphs) have played a pivotal role in characterizing the behaviors of Android apps.

Android Malware Detection Graph Neural Network +1

Scene-Text Grounding for Text-Based Video Question Answering

1 code implementation 22 Sep 2024 Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua

In this paper, we propose to study Grounded TextVideoQA by forcing models to answer questions and spatio-temporally localize the relevant scene-text regions, thus decoupling QA from scene-text recognition and promoting research towards interpretable QA.

2k Contrastive Learning +3

Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

1 code implementation 22 Sep 2024 Peixin Qin, Chen Huang, Yang Deng, Wenqiang Lei, Tat-Seng Chua

With the aid of large language models, current conversational recommender systems (CRSs) have gained strong abilities to persuade users to accept recommended items.

Explanation Generation Recommendation Systems

ExpLLM: Towards Chain of Thought for Facial Expression Recognition

no code implementations 4 Sep 2024 Xing Lan, Jian Xue, Ji Qi, Dongmei Jiang, Ke Lu, Tat-Seng Chua

Specifically, we have designed the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion.

Facial Expression Recognition Facial Expression Recognition (FER)

MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

1 code implementation 8 Aug 2024 Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua

We study an emerging and intriguing problem of multimodal temporal event forecasting with large language models.

Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation

no code implementations 24 Jul 2024 Yongqi Li, Hongru Cai, Wenjie Wang, Leigang Qu, Yinwei Wei, Wenjie Li, Liqiang Nie, Tat-Seng Chua

Despite its great potential, existing generative approaches are limited due to the following issues: insufficient visual information in identifiers, misalignment with high-level semantics, and learning gap towards the retrieval target.

Avg Cross-Modal Retrieval +2

Fine-tuning Multimodal Large Language Models for Product Bundling

no code implementations 16 Jul 2024 Xiaohao Liu, Jie Wu, Zhulin Tao, Yunshan Ma, Yinwei Wei, Tat-Seng Chua

Recent advances in product bundling have leveraged multimodal information through sophisticated encoders, but remain constrained by limited semantic understanding and a narrow scope of knowledge.

In-Context Learning Multiple-choice

Disentangling Masked Autoencoders for Unsupervised Domain Generalization

1 code implementation 10 Jul 2024 An Zhang, Han Wang, Xiang Wang, Tat-Seng Chua

Domain Generalization (DG), designed to enhance out-of-distribution (OOD) generalization, is all about learning invariance against domain shifts utilizing sufficient supervision signals.

Domain Generalization

Language Representations Can be What Recommenders Need: Findings and Potentials

1 code implementation 7 Jul 2024 Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, Tat-Seng Chua

Contrary to the prevailing understanding that LMs and traditional recommenders learn two distinct representation spaces due to the huge gap between language and behavior modeling objectives, this work re-examines that understanding and explores extracting a recommendation space directly from the language representation space.

Collaborative Filtering Contrastive Learning +4

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

no code implementations 27 Jun 2024 Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

Then, an SG-based framework is built, where the textual SG (TSG) is encoded with a graph Transformer, while the video dynamic SG (DSG) and the HSG are modeled with a novel recurrent graph Transformer for spatial and temporal feature propagation.

LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation

1 code implementation 20 Jun 2024 Rebecca Salganik, Xiaohao Liu, Yunshan Ma, Jian Kang, Tat-Seng Chua

As online music consumption increasingly shifts towards playlist-based listening, the task of playlist continuation, in which an algorithm suggests songs to extend a playlist in a personalized and musically cohesive manner, has become vital to the success of music streaming.

Collaborative Filtering Contrastive Learning +1

Ask-before-Plan: Proactive Language Agents for Real-World Planning

1 code implementation 18 Jun 2024 Xuan Zhang, Yang Deng, Zifeng Ren, See-Kiong Ng, Tat-Seng Chua

In this work, we introduce a new task, Proactive Agent Planning, which requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction, invoke external tools to collect valid information, and generate a plan to fulfill the user's demands.

Decision Making valid

On Softmax Direct Preference Optimization for Recommendation

1 code implementation 13 Jun 2024 Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders, which is extended from the traditional full-ranking Plackett-Luce (PL) model to partial rankings and connected to softmax sampling strategies.

Language Modeling Language Modelling +2
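A multi-negative, softmax-style DPO objective can be written as a small pure-Python function. This is a minimal sketch, not the authors' code. It assumes the log-probabilities of the preferred item and of each negative item, under both the trained policy and the frozen reference model, are already computed; the function names and β value are our own. The loss takes the form -log σ(-log Σ_j exp(β(r_j - r_p))), where r = log π_θ - log π_ref is the implicit reward, p is the preferred item and j ranges over the sampled negatives. We believe this matches the form reported for the method, but treat it as an illustration. With a single negative it reduces to standard pairwise DPO, which the example checks.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax_dpo_loss(pos_policy: float, pos_ref: float,
                     neg_policy: Sequence[float], neg_ref: Sequence[float],
                     beta: float = 1.0) -> float:
    """Multi-negative DPO-style loss for one user (sketch).

    Inputs are log-probabilities of the preferred item and of each negative item
    under the policy being trained and under the frozen reference model.
    """
    r_pos = beta * (pos_policy - pos_ref)  # implicit reward of the preferred item
    margins = [beta * (lp - lr) - r_pos for lp, lr in zip(neg_policy, neg_ref)]
    return -log_sigmoid(-logsumexp(margins))


def dpo_loss(pos_policy: float, pos_ref: float, neg_policy: float, neg_ref: float,
             beta: float = 1.0) -> float:
    """Standard pairwise DPO loss, used here only for comparison."""
    return -log_sigmoid(beta * ((pos_policy - pos_ref) - (neg_policy - neg_ref)))


one_negative = softmax_dpo_loss(-1.0, -1.5, [-2.0], [-1.8], beta=0.5)
pairwise = dpo_loss(-1.0, -1.5, -2.0, -1.8, beta=0.5)
two_negatives = softmax_dpo_loss(-1.0, -1.5, [-2.0, -2.5], [-1.8, -2.4], beta=0.5)
print(math.isclose(one_negative, pairwise))  # True: one negative recovers DPO
print(two_negatives > one_negative)          # True: each extra negative adds to the loss
```

The log-sum-exp couples all negatives at once. The gradient is therefore spread over them in proportion to their softmax weights, so the negatives the policy currently ranks highest are pushed down hardest.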

Unified Text-to-Image Generation and Retrieval

no code implementations 9 Jun 2024 Leigang Qu, Haochuan Li, Tan Wang, Wenjie Wang, Yongqi Li, Liqiang Nie, Tat-Seng Chua

Subsequently, we unify generation and retrieval in an autoregressive manner and propose an autonomous decision module that chooses the better match between the generated and retrieved images as the response to the text query.

Image Retrieval Retrieval +1

Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

1 code implementation 9 Jun 2024 Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

Crucial to addressing this real-world need are event summary and persona management, which enable reasoning for appropriate long-term dialogue responses.

Response Generation Retrieval

Towards Semantic Equivalence of Tokenization in Multimodal LLM

no code implementations 7 Jun 2024 Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.

Visual Question Answering

Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

no code implementations 5 Jun 2024 Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Tat-Seng Chua, Yao Zhao

Zero-shot learning (ZSL) endeavors to transfer knowledge from seen categories to recognize unseen categories, which mostly relies on the semantic-visual interactions between image and attribute tokens.

Attribute Domain Generalization +4

Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

no code implementations 31 May 2024 Shengyu Zhang, Ziqi Jiang, Jiangchao Yao, Fuli Feng, Kun Kuang, Zhou Zhao, Shuo Li, Hongxia Yang, Tat-Seng Chua, Fei Wu

The emerging causal recommendation methods achieve this by modeling the causal effect between user behaviors, but potentially neglect unobserved confounders (e.g., friend suggestions) that are hard to measure in practice.

Recommendation Systems

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

1 code implementation 23 May 2024 Jingnan Zheng, Han Wang, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua

Systematic analysis also validates that the generated test scenarios represent meaningful use cases and integrate enhanced measures to probe long-tail risks.

ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

1 code implementation 23 May 2024 Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

To resolve the challenges above, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction.

Molecule Captioning Retrosynthesis

ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

1 code implementation 21 May 2024 Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation.

Property Prediction Question Answering +2

Co-Matching: Towards Human-Machine Collaborative Legal Case Matching

no code implementations 16 May 2024 Chen Huang, Xinwei Yang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

However, successful legal case matching requires the tacit knowledge of legal practitioners, which is difficult to verbalize and encode into machines.

Learnable Item Tokenization for Generative Recommendation

1 code implementation 12 May 2024 Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

Utilizing powerful Large Language Models (LLMs) for generative recommendation has attracted much attention.

Diversity World Knowledge

A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

no code implementations 10 May 2024 Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li

Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the generation quality of LLMs.

Information Retrieval RAG +1

Auto-Encoding Morph-Tokens for Multimodal LLM

1 code implementation 3 May 2024 Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.

Image Reconstruction MORPH

A Taxation Perspective for Fair Re-ranking

1 code implementation 27 Apr 2024 Chen Xu, Xiaopeng Ye, Wenjie Wang, Liang Pang, Jun Xu, Tat-Seng Chua

From a taxation perspective, we theoretically demonstrate that most previous fair re-ranking methods can be reformulated as an item-level tax policy.

Ethics Fairness +1

A Survey of Generative Search and Recommendation in the Era of Large Language Models

no code implementations 25 Apr 2024 Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, Tat-Seng Chua

With the information explosion on the Web, search and recommendation have become foundational infrastructure for satisfying users' information needs.

Towards Human-centered Proactive Conversational Agents

no code implementations 19 Apr 2024 Yang Deng, Lizi Liao, Zhonghua Zheng, Grace Hui Yang, Tat-Seng Chua

Recent research on proactive conversational agents (PCAs) mainly focuses on improving the system's capabilities in anticipating and planning action sequences to accomplish tasks and achieve goals before users articulate their requests.

Information Retrieval Retrieval

Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales

no code implementations 17 Apr 2024 Minghe Gao, Shuang Chen, Liang Pang, Yuan YAO, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

Their ability to execute intricate compositional reasoning tasks is also constrained, culminating in a stagnation of learning progression for these models.

Hallucination

Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors

1 code implementation 4 Apr 2024 Chen Huang, Peixin Qin, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia.

Conversational Recommendation Recommendation Systems

A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning

1 code implementation 22 Mar 2024 Changmeng Zheng, Dayong Liang, WengYu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images.

Multimodal Reasoning

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

1 code implementation 18 Mar 2024 Ruyi Xu, Yuan YAO, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

Long-Context Understanding TextVQA

Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection

no code implementations 15 Mar 2024 Moxin Li, Wenjie Wang, Fuli Feng, Fengbin Zhu, Qifan Wang, Tat-Seng Chua

Self-detection for Large Language Models (LLMs) seeks to evaluate the trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the issue of output hallucination.

Hallucination Language Modelling +1

Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation

no code implementations 11 Mar 2024 Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua

We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives.

User Simulation

FashionReGen: LLM-Empowered Fashion Report Generation

1 code implementation 11 Mar 2024 Yujuan Ding, Yunshan Ma, Wenqi Fan, Yige Yao, Tat-Seng Chua, Qing Li

Fashion analysis refers to the process of examining and evaluating trends, styles, and elements within the fashion industry to understand and interpret its current state, generating fashion reports.

Discriminative Probing and Tuning for Text-to-Image Generation

no code implementations CVPR 2024 Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua

We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment.

Text-to-Image Generation

Contrastive Pre-training for Deep Session Data Understanding

no code implementations 5 Mar 2024 Zixuan Li, Lizi Liao, Yunshan Ma, Tat-Seng Chua

In this work, we delve into deep session data understanding by scrutinizing the various clues contained in the rich information of user sessions.

Contrastive Learning

Uplift Modeling for Target User Attacks on Recommender Systems

1 code implementation 5 Mar 2024 Wenjie Wang, Changsheng Wang, Fuli Feng, Wentao Shi, Daizong Ding, Tat-Seng Chua

UBA estimates the treatment effect on each target user and optimizes the allocation of fake user budgets to maximize the attack performance.

Recommendation Systems

Learning to Ask Critical Questions for Assisting Product Search

no code implementations 5 Mar 2024 Zixuan Li, Lizi Liao, Tat-Seng Chua

In this paper, we propose a dual-learning model that combines the best of both implicit session feedback and proactive clarification with users on the most critical questions.

Information Retrieval

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

no code implementations CVPR 2024 Jianwu Fang, Lei-Lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

This model involves a contrastive interaction loss to learn the pairwise co-occurrence of normal, near-accident, and accident frames with the corresponding text descriptions, such as accident reasons, prevention advice, and accident categories.

Object object-detection +3

A Survey on Neural Question Generation: Methods, Applications, and Prospects

no code implementations 28 Feb 2024 Shasha Guo, Lizi Liao, Cuiping Li, Tat-Seng Chua

In this survey, we present a detailed examination of the advancements in Neural Question Generation (NQG), a field leveraging neural network techniques to generate relevant questions from diverse inputs like knowledge bases, texts, and images.

Question Generation Question-Generation +1

Prospect Personalized Recommendation on Large Language Model-based Agent Platform

1 code implementation 28 Feb 2024 Jizhi Zhang, Keqin Bao, Wenjie Wang, Yang Zhang, Wentao Shi, Wanhong Xu, Fuli Feng, Tat-Seng Chua

Additionally, we envision the evolution of Rec4Agentverse and conceptualize it into three stages based on the enhancement of the interaction and information exchange among Agent Items, Agent Recommender, and the user.

Language Modeling Language Modelling +2

On the Multi-turn Instruction Following for Conversational Web Agents

1 code implementation 23 Feb 2024 Yang Deng, Xuan Zhang, Wenxuan Zhang, Yifei Yuan, See-Kiong Ng, Tat-Seng Chua

Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks.

Conversational Web Navigation Instruction Following

Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations

1 code implementation 23 Feb 2024 Yang Deng, Yong Zhao, Moxin Li, See-Kiong Ng, Tat-Seng Chua

Despite the remarkable abilities of Large Language Models (LLMs) to answer questions, they often display a considerable level of overconfidence even when the question does not have a definitive answer.

GraphEdit: Large Language Models for Graph Structure Learning

1 code implementation 23 Feb 2024 Zirui Guo, Lianghao Xia, Yanhua Yu, Yuling Wang, Zixuan Yang, Wei Wei, Liang Pang, Tat-Seng Chua, Chao Huang

Graph Structure Learning (GSL) focuses on capturing intrinsic dependencies and interactions among nodes in graph-structured data by generating novel graph structures.

Graph structure learning

Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond

no code implementations 16 Feb 2024 Yongqi Li, Wenjie Wang, Leigang Qu, Liqiang Nie, Wenjie Li, Tat-Seng Chua

Building upon this capability, we propose to enable multimodal large language models (MLLMs) to memorize and recall images within their parameters.

Cross-Modal Retrieval Retrieval

Distillation Enhanced Generative Retrieval

1 code implementation 16 Feb 2024 Yongqi Li, Zhen Zhang, Wenjie Wang, Liqiang Nie, Wenjie Li, Tat-Seng Chua

Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target.

Text Retrieval
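Generative retrieval usually needs every generated string to name a real passage. A common way to enforce this is to decode under a prefix trie built from the valid identifiers. The sketch below is a generic illustration of that constraint, not this paper's distillation method. The identifiers are hypothetical, and the keyword scorer is a toy stand-in for a language model's next-token scores. Real systems apply the same constraint inside beam search over model logits rather than greedy search.

```python
from typing import Callable, List, Sequence

END = None  # marker key meaning "a complete identifier ends here"


def build_trie(identifiers: Sequence[Sequence[str]]) -> dict:
    """Build a nested-dict prefix trie over tokenized identifiers."""
    root: dict = {}
    for ident in identifiers:
        node = root
        for tok in ident:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def greedy_constrained_decode(trie: dict, score: Callable[[List[str], str], float]) -> List[str]:
    """Greedily extend the prefix with the best-scoring token the trie allows.

    Assumes no identifier is a strict prefix of another, so decoding stops only
    when the sole remaining option is the end marker.
    """
    prefix: List[str] = []
    node = trie
    while True:
        allowed = [tok for tok in node if tok is not END]
        if not allowed:
            return prefix
        best = max(allowed, key=lambda tok: score(prefix, tok))
        prefix.append(best)
        node = node[best]


# Hypothetical passage identifiers, e.g. short keyword titles split into tokens.
identifiers = [
    ["solar", "panel", "efficiency"],
    ["solar", "wind", "hybrid"],
    ["wind", "turbine", "noise"],
]
trie = build_trie(identifiers)


def keyword_score(query_terms: set) -> Callable[[List[str], str], float]:
    """Toy scorer standing in for a language model: 1.0 if the token is in the query."""
    return lambda prefix, tok: 1.0 if tok in query_terms else 0.0


print(greedy_constrained_decode(trie, keyword_score({"solar", "panel"})))
# ['solar', 'panel', 'efficiency']
print(greedy_constrained_decode(trie, keyword_score({"wind", "efficiency"})))
# ['wind', 'turbine', 'noise']: "efficiency" is unreachable once "wind" is chosen
```

The second call shows the point of the constraint. Even when the scorer prefers a token, decoding cannot produce an identifier that does not exist in the corpus.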

LLM-based Federated Recommendation

no code implementations 15 Feb 2024 Jujia Zhao, Wenjie Wang, Chen Xu, Zhaochun Ren, See-Kiong Ng, Tat-Seng Chua

Nevertheless, applying Fed4Rec to LLM-based recommendation presents two main challenges: first, a growing performance imbalance across clients, which degrades the system's efficiency over time; and second, high demands on clients' computational and storage resources for local training and inference of LLMs.

Federated Learning Language Modelling +2

Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models

1 code implementation • 6 Feb 2024 • Kelvin J. L. Koa, Yunshan Ma, Ritchie Ng, Tat-Seng Chua

The training samples for the PPO trainer are also the responses generated during the reflective process, which eliminates the need for human annotators.

Stock Prediction

Data-efficient Fine-tuning for LLM-based Recommendation

1 code implementation • 30 Jan 2024 • Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, Tat-Seng Chua

To pursue the two objectives, we propose a novel data pruning method based on two scores, i.e., an influence score and an effort score, to efficiently identify the influential samples.

Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction

no code implementations • 28 Jan 2024 • Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, Tat-Seng Chua

Moreover, we theoretically establish that the number of distinguishable eigenvalues plays a pivotal role in determining the expressive power of spectral graph neural networks.

Node Classification

Towards 3D Molecule-Text Interpretation in Language Models

1 code implementation • 25 Jan 2024 • Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM integrates a 3D molecular encoder with an LM.

Instruction Following Language Modeling +3

TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data

no code implementations • 24 Jan 2024 • Fengbin Zhu, Ziyang Liu, Fuli Feng, Chao Wang, Moxin Li, Tat-Seng Chua

In this work, we address question answering (QA) over hybrid tabular and textual data, a very common form of content on the Web (e.g., SEC filings), where discrete reasoning capabilities are often required.

Language Modeling Language Modelling +1

Instilling Multi-round Thinking to Text-guided Image Generation

no code implementations • 16 Jan 2024 • Lidong Zeng, Zhedong Zheng, Yinwei Wei, Tat-Seng Chua

This paper delves into the text-guided image editing task, focusing on modifying a reference image according to user-specified textual feedback to embody specific attributes.

Image Generation text-guided-generation +1

Denoising Diffusion Recommender Model

1 code implementation • 13 Jan 2024 • Jujia Zhao, Wenjie Wang, Yiyan Xu, Teng Sun, Fuli Feng, Tat-Seng Chua

To achieve this goal, the key lies in offering appropriate guidance to steer the reverse denoising process and in providing a proper starting point for the forward-reverse process during inference.

Denoising Recommendation Systems +1

GOODAT: Towards Test-time Graph Out-of-Distribution Detection

1 code implementation • 10 Jan 2024 • Luzhi Wang, Dongxiao He, He Zhang, Yixin Liu, Wenjie Wang, Shirui Pan, Di Jin, Tat-Seng Chua

To identify and reject OOD samples with GNNs, recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN.

Out-of-Distribution Detection

LASO: Language-guided Affordance Segmentation on 3D Object

1 code implementation • CVPR 2024 • Yicong Li, Na Zhao, Junbin Xiao, Chun Feng, Xiang Wang, Tat-Seng Chua

In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO), which challenges a model to segment the part of a 3D object that is relevant to a given affordance question.

Object Segmentation

Temporally and Distributionally Robust Optimization for Cold-Start Recommendation

1 code implementation • 15 Dec 2023 • Xinyu Lin, Wenjie Wang, Jujia Zhao, Yongqi Li, Fuli Feng, Tat-Seng Chua

They learn a feature extractor on warm-start items to align feature representations with interactions, and then leverage the feature extractor to extract the feature representations of cold-start items for interaction prediction.

Collaborative Filtering

Towards Goal-oriented Intelligent Tutoring Systems in Online Education

no code implementations • 3 Dec 2023 • Yang Deng, Zifeng Ren, An Zhang, Wenqiang Lei, Tat-Seng Chua

In this work, we investigate a new task, named Goal-oriented Intelligent Tutoring Systems (GITS), which aims to enable the student's mastery of a designated concept by strategically planning a customized sequence of exercises and assessments.

cognitive diagnosis Representation Learning

SCTc-TE: A Comprehensive Formulation and Benchmark for Temporal Event Forecasting

1 code implementation • 2 Dec 2023 • Yunshan Ma, Chenchen Ye, Zijian Wu, Xiang Wang, Yixin Cao, Liang Pang, Tat-Seng Chua

Temporal complex event forecasting aims to predict future events given the events observed in history.

MultiCBR: Multi-view Contrastive Learning for Bundle Recommendation

1 code implementation • 28 Nov 2023 • Yunshan Ma, Yingzhi He, Xiang Wang, Yinwei Wei, Xiaoyu Du, Yuyangzi Fu, Tat-Seng Chua

It does, however, have two limitations: 1) the two-view formulation does not fully exploit all the heterogeneous relations among users, bundles and items; and 2) the "early contrast and late fusion" framework is less effective in capturing user preference and difficult to generalize to multiple views.

Contrastive Learning Representation Learning

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching

2 code implementations • 21 Nov 2023 • Meng Chu, Zhedong Zheng, Wei Ji, Tingyu Wang, Tat-Seng Chua

Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data.

Drone navigation geo-localization +4

A Study of Implicit Ranking Unfairness in Large Language Models

1 code implementation • 13 Nov 2023 • Chen Xu, Wenjie Wang, Yuxin Li, Liang Pang, Jun Xu, Tat-Seng Chua

Worse still, in this paper, we identify a subtler form of discrimination in LLMs, termed implicit ranking unfairness, where LLMs exhibit discriminatory ranking patterns based solely on non-sensitive user profiles, such as user names.

Data Augmentation Fairness +3

NExT-Chat: An LMM for Chat, Detection and Segmentation

1 code implementation • 8 Nov 2023 • Ao Zhang, Yuan YAO, Wei Ji, Zhiyuan Liu, Tat-Seng Chua

The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).

Referring Expression Referring Expression Segmentation +1

Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

1 code implementation • 1 Nov 2023 • Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat-Seng Chua

Proactive dialogue is a practical yet challenging problem in the era of large language models (LLMs), where dialogue policy planning is the key to improving the proactivity of LLMs.

Language Modeling Language Modelling +1

Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss

1 code implementation • NeurIPS 2023 • An Zhang, Leheng Sheng, Zhibo Cai, Xiang Wang, Tat-Seng Chua

To bridge the gap, we delve into the reasons underpinning the success of contrastive loss in CF, and propose a principled Adversarial InfoNCE loss (AdvInfoNCE), which is a variant of InfoNCE, specially tailored for CF methods.

Collaborative Filtering Contrastive Learning +3

Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction

1 code implementation • 28 Oct 2023 • Yunshan Ma, Xiaohao Liu, Yinwei Wei, Zhulin Tao, Xiang Wang, Tat-Seng Chua

Specifically, we use self-attention modules to combine the multimodal and multi-item features, and then leverage both item- and bundle-level contrastive learning to enhance representation learning, thereby countering the problems of missing modalities, noise, and sparsity.

Contrastive Learning Representation Learning

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

1 code implementation • NeurIPS 2023 • Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.

Decoder Representation Learning +1

A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction

1 code implementation • 18 Oct 2023 • Ruihao Shui, Yixin Cao, Xiang Wang, Tat-Seng Chua

Large language models (LLMs) have demonstrated great potential for domain-specific applications, such as the law domain.

Information Retrieval Legal Reasoning +1

On Generative Agents in Recommendation

1 code implementation • 16 Oct 2023 • An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, Tat-Seng Chua

Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development.

Collaborative Filtering Movie Recommendation +1

Robust Collaborative Filtering to Popularity Distribution Shift

1 code implementation • 16 Oct 2023 • An Zhang, Wenchang Ma, Jingnan Zheng, Xiang Wang, Tat-Seng Chua

Popularity shortcut tricks benefit in-distribution (ID) performance but generalize poorly to out-of-distribution (OOD) data, i.e., when the popularity distribution of the test data shifts w.r.t.

Collaborative Filtering

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

1 code implementation • 11 Oct 2023 • Liang Chen, Yang Deng, Yatao Bian, Zeyu Qin, Bingzhe Wu, Tat-Seng Chua, Kam-Fai Wong

Large language models (LLMs) outperform information retrieval techniques on downstream knowledge-intensive tasks when prompted to generate world knowledge.

Information Retrieval Informativeness +4

Bridging Items and Language: A Transition Paradigm for Large Language Model-Based Recommendation

no code implementations • 10 Oct 2023 • Xinyu Lin, Wenjie Wang, Yongqi Li, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

Harnessing Large Language Models (LLMs) for recommendation is a rapidly emerging approach that relies on two fundamental steps to bridge the recommendation item space and the language space: 1) item indexing, which uses identifiers to represent items in the language space, and 2) generation grounding, which associates the token sequences generated by LLMs with in-corpus items.

Attribute Language Modeling +3

Progressive Text-to-3D Generation for Automatic 3D Prototyping

1 code implementation • 26 Sep 2023 • Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-Seng Chua

We aspire for our work to pave the way for automatic 3D prototyping via natural language descriptions.

3D Generation Text to 3D

NExT-GPT: Any-to-Any Multimodal LLM

1 code implementation • 11 Sep 2023 • Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua

While Multimodal Large Language Models (MM-LLMs) have recently made exciting strides, most are limited to input-side multimodal understanding and lack the ability to produce content in multiple modalities.

AI Agent

I3: Intent-Introspective Retrieval Conditioned on Instructions

no code implementations • 19 Aug 2023 • Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, Tat-Seng Chua, Siliang Tang

Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents.

Retrieval Text-to-Image Generation

Diffusion Variational Autoencoder for Tackling Stochasticity in Multi-Step Regression Stock Price Prediction

1 code implementation • 18 Aug 2023 • Kelvin J. L. Koa, Yunshan Ma, Ritchie Ng, Tat-Seng Chua

The hierarchical VAE allows us to learn the complex and low-level latent variables for stock prediction, while the diffusion probabilistic model trains the predictor to handle stock price stochasticity by progressively adding random noise to the stock data.

regression Stock Prediction +1

Context-aware Event Forecasting via Graph Disentanglement

1 code implementation • 12 Aug 2023 • Yunshan Ma, Chenchen Ye, Zijian Wu, Xiang Wang, Yixin Cao, Tat-Seng Chua

The task of event forecasting aims to model the relational and temporal patterns in historical events and to forecast what will happen in the future.

Disentanglement Link Prediction

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

1 code implementation • 9 Aug 2023 • Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua

Afterward, we propose a fine-grained object-interaction diffusion method to synthesize high-faithfulness images conditioned on the prompt and the automatically generated layout.

In-Context Learning Text-to-Image Generation

Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling

no code implementations • 9 Aug 2023 • Yu Zhao, Hao Fei, Yixin Cao, Bobo Li, Meishan Zhang, Jianguo Wei, Min Zhang, Tat-Seng Chua

A scene-event mapping mechanism is first designed to bridge the gap between the underlying scene structure and the high-level event semantic structure, resulting in an overall hierarchical scene-event (termed ICE) graph structure.

Semantic Role Labeling

Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition

no code implementations • 8 Aug 2023 • Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li

On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively.

Contrastive Learning Disentanglement +2

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation • 8 Aug 2023 • Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang

This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.

Caption Generation Image Captioning +2

SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning

2 code implementations • 3 Aug 2023 • Keyu Duan, Qian Liu, Tat-Seng Chua, Shuicheng Yan, Wei Tsang Ooi, Qizhe Xie, Junxian He

More recently, with the rapid development of language models (LMs), researchers have focused on leveraging LMs to facilitate the learning of TGs, either by jointly training them in a computationally intensive framework (merging the two stages), or designing complex self-supervised training tasks for feature extraction (enhancing the first stage).

Feature Engineering Graph Learning +4

XNLP: An Interactive Demonstration System for Universal Structured NLP

no code implementations • 3 Aug 2023 • Hao Fei, Meishan Zhang, Min Zhang, Tat-Seng Chua

Structured Natural Language Processing (XNLP) is an important subset of NLP that entails understanding the underlying semantic or syntactic structure of texts, which serves as a foundational component for many downstream applications.