Search Results for author: Zhendong Mao

Found 60 papers, 29 papers with code

EmRel: Joint Representation of Entities and Embedded Relations for Multi-triple Extraction

1 code implementation · NAACL 2022 · Benfeng Xu, Quan Wang, Yajuan Lyu, Yabing Shi, Yong Zhu, Jie Gao, Zhendong Mao

Multi-triple extraction is a challenging task due to the existence of informative inter-triple correlations, and consequently rich interactions across the constituent entities and relations. While existing works only explore entity representations, we propose to explicitly introduce relation representation, jointly represent it with entities, and align the two in a novel way to identify valid triples. We perform comprehensive experiments on document-level relation extraction and joint entity and relation extraction, along with ablations, to demonstrate the advantage of the proposed method.

Document-level Relation Extraction, Joint Entity and Relation Extraction (+2 more)

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

no code implementations · 13 Jun 2025 · Mingxuan Du, Benfeng Xu, Chiwei Zhu, Xiaorui Wang, Zhendong Mao

However, a comprehensive benchmark for systematically evaluating the capabilities of these agents remains absent.

Retrieval

Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing

no code implementations · 31 May 2025 · Yang Zheng, Mengqi Huang, Nan Chen, Zhendong Mao

In this study, we argue that ideal consistent 3D editing can be achieved through a progressive-views paradigm, which propagates editing semantics from the editing-salient view to other editing-sparse views.

Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability

1 code implementation · 30 May 2025 · Chiwei Zhu, Benfeng Xu, An Yang, Junyang Lin, Quan Wang, Chang Zhou, Zhendong Mao

The results lead to several key findings that add new insights to existing understanding: 1) Rationales can, at times, deteriorate model performance; 2) Rationales can, at times, improve model reliability, even outperforming their untrained counterparts; 3) A linear correspondence exists between the performance and reliability improvements, while both are driven by the intrinsic difficulty of the task.

MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning

no code implementations · 27 May 2025 · Zikang Guo, Benfeng Xu, Xiaorui Wang, Zhendong Mao

Complex tasks involving tool integration pose significant challenges for Large Language Models (LLMs), leading to the emergence of multi-agent workflows as a promising solution.

Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking

no code implementations · 26 May 2025 · Yihan Chen, Benfeng Xu, Xiaorui Wang, Yongdong Zhang, Zhendong Mao

We synthesize self-reflected trajectories that include reflections and corrections of error steps, which enhance the effectiveness of LLM agents in learning from teacher models, enabling them to become agents capable of self-reflecting and correcting.

Prompt Engineering

Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

no code implementations · 26 May 2025 · Yi Liu, Dianqing Liu, Mingye Zhu, Junbo Guo, Yongdong Zhang, Zhendong Mao

To address this limitation, we propose a novel Residual Alignment Model (RAM) that formalizes the alignment process as a type of importance sampling.

Domain Adaptation, Instruction Following

CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning

no code implementations · 15 May 2025 · Shaohan Wang, Licheng Zhang, Zheren Fu, Zhendong Mao

Inspired by human cognitive learning, curriculum learning trains models on samples ordered from easy to difficult, thus enhancing their generalization ability; we integrate this effective paradigm into the training of the RAG system.

RAG, Retrieval (+1 more)

HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models

no code implementations · 10 May 2025 · Shuhan Zhuang, Mengqi Huang, Fengyi Fu, Nan Chen, Bohan Lei, Zhendong Mao

Visual text rendering, which aims to accurately integrate specified textual content within generated images, is critical for various applications such as commercial design.

Text Generation

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

no code implementations · 4 May 2025 · Wenchuan Wang, Mengqi Huang, Yijing Tu, Zhendong Mao

To address this, we introduce DualReal, a novel framework that employs adaptive joint training to collaboratively construct interdependencies between dimensions.

Denoising, Text-to-Video Generation (+1 more)

HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression

no code implementations · 14 Apr 2025 · Chen Zhang, Bo Hu, Weidong Chen, Zhendong Mao

While large language models (LLMs) have proven effective in leveraging textual data for recommendations, their application to multimodal recommendation tasks remains relatively underexplored.

Multimodal Recommendation

Leveraging Robust Optimization for LLM Alignment under Distribution Shifts

no code implementations · 8 Apr 2025 · Mingye Zhu, Yi Liu, Zheren Fu, Yongdong Zhang, Zhendong Mao

Preference alignment methods are increasingly critical for steering large language models (LLMs) to generate outputs consistent with human values.

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models

no code implementations · 13 Mar 2025 · Yijing Lin, Mengqi Huang, Shuhan Zhuang, Zhendong Mao

RealGeneral demonstrates effectiveness in multiple important visual generation tasks, e.g., it achieves a 14.5% improvement in subject similarity for customized generation and a 10% enhancement in image quality for the canny-to-image task.

Image Generation, In-Context Learning

On-the-fly Preference Alignment via Principle-Guided Decoding

1 code implementation · 20 Feb 2025 · Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao

These limitations significantly constrain the scope and efficacy of both task-specific and general preference alignment methods.

Dragin3D: Image Editing by Dragging in 3D Space

no code implementations · CVPR 2025 · Weiran Guang, Xiaoguang Gu, Mengqi Huang, Zhendong Mao

This modification helps improve the model's ability to ignore background interference when editing real images with complex backgrounds.

3D Object Reconstruction, continuous-control (+2 more)

FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation

no code implementations · CVPR 2025 · Fengyi Fu, Lei Zhang, Mengqi Huang, Zhendong Mao

Previous works mainly follow the multi-step denoising diffusion paradigm, which adopts a fixed text guidance intensity (i.e., editing intensity) to inject textual features, while ignoring the step-specific editing requirements.

Denoising, Text-based Image Editing

A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models

no code implementations · CVPR 2025 · Keyu Tu, Mengqi Huang, Zhuowei Chen, Zhendong Mao

During Coupling Space Projection, all attention features of the pretrained adapter are aggregated to fully capture the coupling relationship before being projected into a unified space.

All, Image Generation (+1 more)

SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation

no code implementations · CVPR 2025 · Hao Du, Bo Wu, Yan Lu, Zhendong Mao

Vision-language temporal alignment is a crucial capability for human dynamic recognition and cognition in real-world scenarios.

Benchmarking, Diagnostic (+1 more)

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

no code implementations · CVPR 2025 · Weinan Jia, Mengqi Huang, Nan Chen, Lei Zhang, Zhendong Mao

To address these limitations, we propose dynamically compressing different image regions by recognizing the importance of different regions, and introduce a novel two-stage framework designed to enhance the effectiveness and efficiency of image generation: (1) Dynamic VAE (DVAE) at the first stage employs a hierarchical encoder to encode different image regions at different downsampling rates, tailored to their specific information densities, thereby providing more accurate and natural latent codes for the diffusion process.

Image Generation

FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization

no code implementations · 1 Oct 2024 · Mingye Zhu, Yi Liu, Quan Wang, Junbo Guo, Zhendong Mao

Recent breakthroughs in preference alignment have significantly improved Large Language Models' ability to generate texts that align with human preferences and values.

regression

CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization

no code implementations · 9 Sep 2024 · Nan Chen, Mengqi Huang, Zhuowei Chen, Yang Zheng, Lei Zhang, Zhendong Mao

This misconstruction leads to both overfitting and underfitting of irrelevant and intrinsic attributes of the subject, i.e., these attributes are over-represented or under-represented simultaneously, causing a trade-off between similarity and controllability.

Contrastive Learning

RealCustom++: Representing Images as Real-Word for Real-Time Customization

no code implementations · 19 Aug 2024 · Zhendong Mao, Mengqi Huang, Fei Ding, Mingcong Liu, Qian He, Yongdong Zhang

Text-to-image customization, which takes given texts and images depicting given subjects as inputs, aims to synthesize new images that align with both text semantics and subject appearance.

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

1 code implementation · 19 Aug 2024 · Jiaang Li, Quan Wang, Zhongnan Wang, Yongdong Zhang, Zhendong Mao

To overcome this challenge, we propose ELDER, a novel approach to create a continuous association between data and adapters.

Model Editing

Dual-path Collaborative Generation Network for Emotional Video Captioning

1 code implementation · 6 Aug 2024 · Cheng Ye, Weidong Chen, Jingyu Li, Lei Zhang, Zhendong Mao

Emotional Video Captioning is an emerging task that aims to describe factual content with the intrinsic emotions expressed in videos.

Caption Generation, Video Captioning

LIRE: listwise reward enhancement for preference alignment

1 code implementation · 22 May 2024 · Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao

Recently, tremendous strides have been made to align the generation of Large Language Models (LLMs) with human values to mitigate toxic or unhelpful content.

Feature-Adaptive and Data-Scalable In-Context Learning

1 code implementation · 17 May 2024 · Jiahao Li, Quan Wang, Licheng Zhang, Guoqing Jin, Zhendong Mao

In this paper, we propose a feature-adaptive and data-scalable in-context learning framework (FADS-ICL), which can leverage task-adaptive features to promote inference on the downstream task, with the supervision of beyond-context samples.

In-Context Learning

Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting

1 code implementation · 19 Apr 2024 · Fengyi Fu, Shancheng Fang, Weidong Chen, Zhendong Mao

Furthermore, a batch attention module is also proposed in this paper to alleviate the problem of missing sentimental samples, caused by the data imbalance, which is common in live videos as the popularity of videos varies.

Diversity

Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

1 code implementation · 5 Apr 2024 · Tianqi Zhong, Zhaoyi Li, Quan Wang, Linqi Song, Ying WEI, Defu Lian, Zhendong Mao

Compositional generalization, representing the model's ability to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text generation (MCTG) methods.

Attribute, Benchmarking (+2 more)

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

1 code implementation · CVPR 2024 · Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang

However, the inherent entangled influence scope of pseudo-words with the given text results in a dual-optimum paradox, i.e., the similarity of the given subjects and the controllability of the given text could not be optimal simultaneously.

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

no code implementations · 22 Feb 2024 · Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao

GAN-based image attribute editing first leverages GAN Inversion to project real images into the latent space of GAN and then manipulates corresponding latent codes.

Attribute

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

1 code implementation · 1 Jan 2024 · Yihan Chen, Benfeng Xu, Quan Wang, Yi Liu, Zhendong Mao

While large language models (LLMs) have exhibited impressive instruction-following capabilities, it is still unclear whether and to what extent they can respond to explicit constraints that might be entailed in various instructions.

Benchmarking, Instruction Following (+1 more)

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

1 code implementation · CVPR 2024 · Zheren Fu, Lei Zhang, Hou Xia, Zhendong Mao

We propose a novel Linguistic-Aware Patch Slimming (LAPS) framework for fine-grained alignment which explicitly identifies redundant visual patches with language supervision and rectifies their semantic and spatial information to facilitate more effective and consistent patch-word alignment.

cross-modal alignment, Cross-Modal Retrieval (+6 more)

E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation

no code implementations · 25 Nov 2023 · Fengyi Fu, Lei Zhang, Quan Wang, Zhendong Mao

Then we propose an emotion correlation enhanced decoder, with a novel correlation-aware aggregation and soft/hard strategy, respectively improving the emotion perception and response generation.

Decoder, Dialogue Generation (+1 more)

Grammatical Error Correction via Mixed-Grained Weighted Training

no code implementations · 23 Nov 2023 · Jiahao Li, Quan Wang, Chiwei Zhu, Zhendong Mao, Yongdong Zhang

In this paper, the inherent discrepancies are manifested in two aspects, namely, accuracy of data annotation and diversity of potential annotations.

Diversity, Grammatical Error Correction (+1 more)

On the Calibration of Large Language Models and Alignment

no code implementations · 22 Nov 2023 · Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

As large language models attract increasing attention and find widespread application, challenges of reliability arise concurrently.

Improving Image Captioning via Predicting Structured Concepts

no code implementations · 14 Nov 2023 · Ting Wang, Weidong Chen, Yuanhe Tian, Yan Song, Zhendong Mao

Because bridging the semantic gap between images and texts is difficult in the image captioning task, conventional studies in this area have treated semantic concepts as a bridge between the two modalities and improved captioning performance accordingly.

Image Captioning

Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation

1 code implementation · 24 Oct 2023 · Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao

We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization.

Knowledge Graphs, Quantization (+1 more)

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

1 code implementation · 23 Oct 2023 · Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao

Then we design a novel attribute distribution reconstruction method to balance the obtained distributions and use the reconstructed distributions to guide language models for generation, effectively avoiding the issue of Attribute Collapse.

Attribute, Text Generation

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

no code implementations · 1 Jul 2023 · Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images.

Image Generation

ExpertPrompting: Instructing Large Language Models to be Distinguished Experts

2 code implementations · 24 May 2023 · Benfeng Xu, An Yang, Junyang Lin, Quan Wang, Chang Zhou, Yongdong Zhang, Zhendong Mao

The answering quality of an aligned large language model (LLM) can be drastically improved with properly crafted prompts.

In-Context Learning, Instruction Following (+3 more)

Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

1 code implementation · CVPR 2023 · Mengqi Huang, Zhendong Mao, Quan Wang, Yongdong Zhang

Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.

All, Image Generation (+2 more)

Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization

1 code implementation · CVPR 2023 · Mengqi Huang, Zhendong Mao, Zhuowei Chen, Yongdong Zhang

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook.

Image Generation, Position (+1 more)

kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference

1 code implementation · 24 Mar 2023 · Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang

In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing utilization of LLMs.

In-Context Learning

Learning Semantic Relationship Among Instances for Image-Text Matching

1 code implementation · CVPR 2023 · Zheren Fu, Zhendong Mao, Yan Song, Yongdong Zhang

Image-text matching, a bridge connecting image and language, is an important task, which generally learns a holistic cross-modal embedding to achieve a high-quality semantic alignment between the two modalities.

Image Retrieval, Image-text matching (+7 more)

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations · CVPR 2023 · Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization, Image Captioning (+1 more)

Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning

1 code implementation · 29 Nov 2022 · Zheren Fu, Zhendong Mao, Bo Hu, An-An Liu, Yongdong Zhang

They have overlooked the wide variation in characteristics across classes and cannot model abundant intra-class variations for generation.

Image Augmentation, Image Retrieval (+5 more)

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

1 code implementation · 19 Nov 2022 · Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang

In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input.

Blocking, Language Modeling (+3 more)

UniRel: Unified Representation and Interaction for Joint Relational Triple Extraction

1 code implementation · 16 Nov 2022 · Wei Tang, Benfeng Xu, Yuyue Zhao, Zhendong Mao, Yifeng Liu, Yong Liao, Haiyong Xie

Relational triple extraction is challenging because of the difficulty of capturing rich correlations between entities and relations.

Relation Extraction

Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity

1 code implementation · 20 Oct 2022 · Jiahao Li, Quan Wang, Zhendong Mao, Junbo Guo, Yanyan Yang, Yongdong Zhang

In this paper, we consider introducing an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC, and, for the first time, systematically discuss the adaptivity and granularity of this auxiliary task.

Lesion-Aware Transformers for Diabetic Retinopathy Grading

no code implementations · CVPR 2021 · Rui Sun, Yihao Li, Tianzhu Zhang, Zhendong Mao, Feng Wu, Yongdong Zhang

First, to the best of our knowledge, this is the first work to formulate lesion discovery as a weakly supervised lesion localization problem via a transformer decoder.

Decoder, Diabetic Retinopathy Grading (+1 more)

Image Captioning with Context-Aware Auxiliary Guidance

no code implementations · 10 Dec 2020 · Zeliang Song, Xiaofei Zhou, Zhendong Mao, Jianlong Tan

Image captioning is a challenging computer vision task, which aims to generate a natural language description of an image.

Decoder, Image Captioning

Curriculum Learning for Natural Language Understanding

no code implementations · ACL 2020 · Benfeng Xu, Licheng Zhang, Zhendong Mao, Quan Wang, Hongtao Xie, Yongdong Zhang

With the great success of pre-trained language models, the pretrain-finetune paradigm has undoubtedly become the dominant solution for natural language understanding (NLU) tasks.

Natural Language Understanding

Graph Structured Network for Image-Text Matching

1 code implementation · CVPR 2020 · Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang

The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows it to learn the correspondence of object, relation and attribute separately, but also helps it learn the fine-grained correspondence of structured phrases.

Attribute, Image-text matching (+3 more)
