Search Results for author: Jiacheng Zhang

Found 26 papers, 10 papers with code

Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding

1 code implementation 3 Jan 2025 Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li

In this method, we design a Cross-Modal Value-Enhanced Decoding (CMVED) module to alleviate hallucination by a novel contrastive decoding mechanism.

Hallucination Language Modeling +2

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

1 code implementation 19 Dec 2024 Yatai Ji, Jiacheng Zhang, Jie Wu, Shilong Zhang, Shoufa Chen, Chongjian Ge, Peize Sun, Weifeng Chen, Wenqi Shao, Xuefeng Xiao, Weilin Huang, Ping Luo

Text-to-video models have made remarkable advancements through optimization on high-quality text-video pairs, where the textual prompts play a pivotal role in determining quality of output videos.

Video Generation

OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization

no code implementations 19 Dec 2024 Jiacheng Zhang, Jie Wu, Weifeng Chen, Yatai Ji, Xuefeng Xiao, Weilin Huang, Kai Han

To tackle these issues, we introduce OnlineVPO, a more efficient preference learning approach tailored specifically for video diffusion models.

Video Quality Assessment Visual Question Answering (VQA)

An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training

no code implementations 18 Dec 2024 Haiming Zhang, Ying Xue, Xu Yan, Jiacheng Zhang, Weichao Qiu, Dongfeng Bai, Bingbing Liu, Shuguang Cui, Zhen Li

Experiments demonstrate the effectiveness of our approach, showcasing its state-of-the-art performance on the nuScenes and OpenScene benchmarks for 4D occupancy forecasting, end-to-end motion planning and point cloud forecasting.

Autonomous Driving Motion Planning

Minimax-optimal trust-aware multi-armed bandits

no code implementations 4 Oct 2024 Changxiao Cai, Jiacheng Zhang

Multi-armed bandit (MAB) algorithms have achieved significant success in sequential decision-making applications, under the premise that humans perfectly implement the recommended policy.

Decision Making Multi-Armed Bandits +1

EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

no code implementations 25 Sep 2024 Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang

To effectively instruct an MLLM, in addition to conventional language expressions, the practice of referring to objects by painting with brushes on images has emerged as a prevalent tool (referred to as "referring visual prompts") due to its efficacy in aligning the user's intention with specific image regions.

Instruction Following

EventHallusion: Diagnosing Event Hallucinations in Video LLMs

1 code implementation 25 Sep 2024 Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Na Zhao, Jingjing Chen

To mitigate this gap, we propose EventHallusion, a novel benchmark that focuses on assessing the VideoLLMs' hallucination toward event, the crux of video analysis.

Hallucination Instruction Following

Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

1 code implementation 2 Jun 2024 Jiacheng Zhang, Feng Liu, Dawei Zhou, Jingfeng Zhang, Tongliang Liu

However, in this paper, we discover that not all pixels contribute equally to the accuracy on AEs (i.e., robustness) and accuracy on natural images (i.e., accuracy).

Robust classification

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

no code implementations CVPR 2024 Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li

Specifically, we devise three modules: Background Category-specific Prompt, Background Object Discovery, and Inference Probability Rectification, to empower the detector to discover, represent, and leverage implicit object knowledge explored from background proposals.

Knowledge Distillation Object +3

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

no code implementations 21 Apr 2024 Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao

Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation.

Image Generation

UniFL: Improve Latent Diffusion Model via Unified Feedback Learning

no code implementations 8 Apr 2024 Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Weilin Huang, Shilei Wen, Lean Fu, Guanbin Li

Latent diffusion models (LDM) have revolutionized text-to-image generation, leading to the proliferation of various advanced models and diverse downstream applications.

Text-to-Image Generation

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

no code implementations CVPR 2024 Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Additionally, we present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients.

Monocular 3D Object Detection object-detection +1

Generating Visual Scenes from Touch

no code implementations ICCV 2023 Fengyu Yang, Jiacheng Zhang, Andrew Owens

An emerging line of work has sought to generate plausible imagery from touch.

A Survey of Time Series Anomaly Detection Methods in the AIOps Domain

no code implementations 1 Aug 2023 Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei

Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series.

Anomaly Detection Time Series +1

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

3 code implementations CVPR 2023 Jiacheng Zhang, Xiangru Lin, Wei Zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage.

Object object-detection +3

On time-consistent equilibrium stopping under aggregation of diverse discount rates

no code implementations 15 Feb 2023 Shuoqing Deng, Xiang Yu, Jiacheng Zhang

When the sufficient condition of the attitude function is violated, we can illustrate by various examples that the characterization of the optimal equilibrium may differ significantly from some existing results for an individual agent.

Decision Making Diversity

A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition

no code implementations 24 Nov 2022 Jiacheng Zhang, Wenyi Yan, Ye Zhang

In this paper, a new speech feature fusion method is proposed for speaker recognition on the basis of the cross gate parallel convolutional neural network (CG-PCNN).

Speaker Recognition

Touch and Go: Learning from Human-Collected Vision and Touch

no code implementations 22 Nov 2022 Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, Andrew Owens

The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world.

Image Stylization

Modeling Voting for System Combination in Machine Translation

1 code implementation 14 Jul 2020 Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu

System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance.

Machine Translation Translation

Neural Machine Translation with Explicit Phrase Alignment

no code implementations 26 Nov 2019 Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Yang Liu

The lack of alignment in NMT models leads to three problems: it is hard to (1) interpret the translation process, (2) impose lexical constraints, and (3) impose structural constraints.

Machine Translation NMT +1

Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization

1 code implementation ACL 2017 Jiacheng Zhang, Yang Liu, Huanbo Luan, Jingfang Xu, Maosong Sun

Although neural machine translation has made significant progress recently, how to integrate multiple overlapping, arbitrary prior knowledge sources remains a challenge.

Machine Translation Translation

Improving the Transformer Translation Model with Document-Level Context

3 code implementations EMNLP 2018 Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Min Zhang, Yang Liu

Although the Transformer translation model (Vaswani et al., 2017) has achieved state-of-the-art performance in a variety of translation tasks, how to use document-level context to deal with discourse phenomena problematic for Transformer still remains a challenge.

Decoder Sentence +1

THUMT: An Open Source Toolkit for Neural Machine Translation

6 code implementations 20 Jun 2017 Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu

This paper introduces THUMT, an open-source toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University.

Decoder Machine Translation +2
