Search Results for author: Zhihong Zhu

Found 26 papers, 9 papers with code

Enhancing Image Generation Fidelity via Progressive Prompts

1 code implementation • 13 Jan 2025 • Zhen Xiong, Yuqi Li, Chuanguang Yang, Tiao Tan, Zhihong Zhu, Siyuan Li, Yue Ma

We find that deeper layers are always responsible for high-level content control, while shallow layers handle low-level content control.

Diversity Image Generation +3

VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification

1 code implementation • 11 Jan 2025 • Xianwei Zhuang, Zhihong Zhu, Yuxin Xie, Liming Liang, Yuexian Zou

In this work, we propose an efficient plug-and-play decoding algorithm via Visual-Aware Sparsification (VASparse) that mitigates visual hallucination (VH) from the perspective of token sparsity.

Hallucination Language Modeling +1
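
As a rough illustration of the token-sparsity idea (a minimal sketch under our own assumptions, not the authors' algorithm: the attention-based score and function names are invented for illustration), decoding can prune the least visually relevant cached tokens before proceeding:

```python
import torch

def sparsify_visual_tokens(kv_states, attn_scores, keep_ratio=0.5):
    """Keep only the most-attended visual tokens' cached states.

    kv_states: (num_visual_tokens, d) cached key/value states
    attn_scores: (num_visual_tokens,) attention mass each token received
    """
    k = max(1, int(keep_ratio * kv_states.size(0)))
    top = torch.topk(attn_scores, k).indices.sort().values  # keep original order
    return kv_states[top], top
```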

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

1 code implementation • 24 Dec 2024 • Derong Xu, Xinhang Li, Ziheng Zhang, Zhenxi Lin, Zhihong Zhu, Zhi Zheng, Xian Wu, Xiangyu Zhao, Tong Xu, Enhong Chen

The Amar framework comprises two key sub-components: 1) a self-alignment module that aligns commonalities among entities, relations, and subgraphs to enhance the retrieved text, thereby reducing noise interference; 2) a relevance gating module that employs a soft gate to learn a relevance score between the question and the multi-aspect retrieved data, determining which information should be used to enhance the LLM's output or be filtered out altogether.

Graph Question Answering Hallucination +3
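
The relevance gating idea can be sketched as follows (a hypothetical PyTorch formulation; the bilinear scorer, names, and shapes are our assumptions, not the Amar implementation):

```python
import torch
import torch.nn as nn

class RelevanceGate(nn.Module):
    """Soft gate scoring each retrieved aspect against the question."""
    def __init__(self, d):
        super().__init__()
        self.score = nn.Bilinear(d, d, 1)  # question-aspect compatibility

    def forward(self, q_emb, r_embs):
        # q_emb: (d,) question embedding; r_embs: (n_aspects, d) retrieved aspects
        q = q_emb.expand_as(r_embs)
        gates = torch.sigmoid(self.score(q, r_embs)).squeeze(-1)  # (n_aspects,)
        return gates.unsqueeze(-1) * r_embs  # softly weight or suppress each aspect
```

A gate near zero effectively filters an aspect out, matching the "filtered out altogether" behavior described above.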

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

1 code implementation • 12 Dec 2024 • Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen

Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.

Image Animation
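
A purely illustrative sketch of the dense-from-sparse step (this simple Gaussian spreading stands in for the paper's learned module; all names and the kernel are assumptions):

```python
import torch
import torch.nn.functional as F

def densify_motion(sparse_flow, mask, ksize=15, sigma=5.0):
    """Spread sparse keypoint motion vectors into a dense field.

    sparse_flow: (2, H, W) motion vectors, zero where undefined
    mask: (1, H, W) 1.0 at keypoint locations, 0.0 elsewhere
    """
    coords = (torch.arange(ksize) - ksize // 2).float()
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = (g[:, None] * g[None, :]).view(1, 1, ksize, ksize)

    def blur(x):  # depthwise Gaussian smoothing
        w = kernel.expand(x.size(0), 1, -1, -1)
        return F.conv2d(x.unsqueeze(0), w, padding=ksize // 2,
                        groups=x.size(0)).squeeze(0)

    num = blur(sparse_flow * mask)           # smoothed weighted vectors
    den = blur(mask.expand_as(sparse_flow))  # smoothed weights
    return num / den.clamp_min(1e-6)         # normalized dense field
```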

Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding

no code implementations • 21 Oct 2024 • Derong Xu, Ziheng Zhang, Zhihong Zhu, Zhenxi Lin, Qidong Liu, Xian Wu, Tong Xu, Xiangyu Zhao, Yefeng Zheng, Enhong Chen

In this paper, we address these hallucination issues in the context of Medical Information Extraction (MIE) tasks by introducing ALternate Contrastive Decoding (ALCD).

Hallucination

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

no code implementations • 16 Sep 2024 • Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou

To this end, we present a diffusion-based ATR framework (DiffATR), which models ATR as an iterative procedure that progressively generates the joint distribution from noise.

AudioCaps Text Retrieval
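
The iterative-generation idea reduces to a standard reverse-diffusion loop; the sketch below is generic (the denoiser, schedule, and what the refined tensor represents are all placeholders, not DiffATR's actual networks):

```python
import torch

@torch.no_grad()
def iterative_generation(denoiser, shape, steps=50):
    """Refine pure noise step by step; denoiser(x, t) is assumed to
    return a slightly less noisy estimate of x."""
    x = torch.randn(shape)                 # start from Gaussian noise
    for t in reversed(range(steps)):
        x = denoiser(x, torch.tensor(t))   # one refinement step
    return x
```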

Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective

1 code implementation • 11 Sep 2024 • Guimin Hu, Yi Xin, Weimin Lyu, Haojian Huang, Chang Sun, Zhihong Zhu, Lin Gui, Ruichu Cai, Erik Cambria, Hasti Seifi

The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks, offering a comprehensive report on the recent progress in multimodal affective computing from an NLP perspective.

Aspect-Based Sentiment Analysis Emotion Recognition in Conversation +3

XMeCap: Meme Caption Generation with Sub-Image Adaptability

no code implementations • 24 Jul 2024 • Yuyan Chen, Songzhou Yan, Zhihong Zhu, Zhixu Li, Yanghua Xiao

Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines.

Caption Generation Meme Captioning

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

1 code implementation • 5 Jul 2024 • Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.

Hallucination Text-to-Image Generation

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

1 code implementation • 26 Jun 2024 • Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan

Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference, as the growth of their multimodal Key-Value (KV) cache with increasing input lengths challenges memory and time efficiency.

multimodal interaction

D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

no code implementations • 18 Jun 2024 • Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences.

Text Generation

UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

no code implementations • 30 Mar 2024 • Guimin Hu, Zhihong Zhu, Daniel Hershcovich, Lijie Hu, Hasti Seifi, Jiayuan Xie

In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality between emotion and emotion cause.

Emotion Cause Extraction Emotion-Cause Pair Extraction +2

Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models

1 code implementation • 4 Mar 2024 • Derong Xu, Ziheng Zhang, Zhenxi Lin, Xian Wu, Zhihong Zhu, Tong Xu, Xiangyu Zhao, Yefeng Zheng, Enhong Chen

Knowledge graph completion (KGC) is a widely used method to tackle incompleteness in knowledge graphs (KGs) by making predictions for missing links.

Link Prediction Relation +1

Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models

1 code implementation • 28 Feb 2024 • Derong Xu, Ziheng Zhang, Zhihong Zhu, Zhenxi Lin, Qidong Liu, Xian Wu, Tong Xu, Wanyu Wang, Yuyang Ye, Xiangyu Zhao, Enhong Chen, Yefeng Zheng

To evaluate the editing impact on the behaviours of LLMs, we propose two model editing studies for the medical domain: (1) editing factual knowledge for medical specialization and (2) editing the explanatory ability for complex knowledge.

Benchmarking Hallucination +1

Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models

no code implementations • 27 Feb 2024 • Yunpeng Huang, Yaonan Gu, Jingwei Xu, Zhihong Zhu, Zhaorun Chen, Xiaoxing Ma

As foundation models (FMs) continue to shape the landscape of AI, the in-context learning (ICL) paradigm thrives but also encounters issues such as toxicity, hallucination, disparity, adversarial vulnerability, and inconsistency.

Hallucination In-Context Learning

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

no code implementations • 18 Feb 2024 • Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment.

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

no code implementations • 19 Nov 2023 • Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
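
The mutual-learning part can be sketched as each model adding a KL term that pulls its predictions toward its peer's (a hedged sketch of a common mutual-learning loss, not necessarily ML-LMCL's exact objective; the large-margin contrastive term is omitted):

```python
import torch.nn.functional as F

def mutual_learning_loss(logits_clean, logits_asr, ce_clean, ce_asr, alpha=0.5):
    """Sum of both models' CE losses plus symmetric KL between their predictions."""
    log_p_clean = F.log_softmax(logits_clean, dim=-1)
    log_p_asr = F.log_softmax(logits_asr, dim=-1)
    # Each KL term treats the peer's distribution as a fixed target.
    kl_clean = F.kl_div(log_p_clean, log_p_asr.exp().detach(), reduction="batchmean")
    kl_asr = F.kl_div(log_p_asr, log_p_clean.exp().detach(), reduction="batchmean")
    return ce_clean + ce_asr + alpha * (kl_clean + kl_asr)
```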

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

no code implementations • ICCV 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou

Vanilla contrastive learning suffers from two annoying issues in video grounding: (1) some visual entities co-exist in both the ground truth and other moments, i.e., semantic overlapping; (2) only a few moments in the video are annotated, i.e., the sparse annotation dilemma. As a result, it is unable to model the correlations between temporally distant moments and learns inconsistent video representations.

Contrastive Learning Video Grounding

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.

Sentence Triplet

Improve Retrieval-based Dialogue System via Syntax-Informed Attention

no code implementations • 12 Mar 2023 • Tengtao Song, Nuo Chen, Ji Jiang, Zhihong Zhu, Yuexian Zou

Since incorporating syntactic information, such as dependency structures, into neural models can promote better sentence understanding, such methods have been widely used in NLP tasks.

Retrieval Sentence

Exploiting Auxiliary Caption for Video Grounding

no code implementations • 15 Jan 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou

Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.

Contrastive Learning Dense Video Captioning +2

A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

no code implementations • 8 Nov 2022 • Zhihong Zhu, Weiyuan Xu, Xuxin Cheng, Tengtao Song, Yuexian Zou

Multi-intent detection and slot filling joint models are gaining increasing traction since they are closer to complicated real-world scenarios.

Intent Detection Semantic Frame Parsing +3
