Search Results for author: Gang Xiong

Found 24 papers, 18 papers with code

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

1 code implementation • 21 Apr 2025 • Jie Cheng, Ruixi Qiao, Lijun Li, Chao Guo, Junle Wang, Gang Xiong, Yisheng Lv, Fei-Yue Wang

In this paper, we identify the main cause of PRM-induced reward hacking: the canonical summation-form credit assignment in reinforcement learning (RL), which defines the value as cumulative gamma-decayed future rewards, easily induces LLMs to hack steps with high rewards.

All Form +2

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

1 code implementation • 21 Mar 2025 • Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Gaopeng Gou, Qi Wu

The key challenge for ZS-CIR tasks is to modify a reference image according to manipulation text to accurately retrieve a target image, especially when the reference image is missing essential target content.

Attribute Image Retrieval +1

MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification

no code implementations • 10 Mar 2025 • Xiangyan Qu, Jing Yu, Jiamin Zhuang, Gaopeng Gou, Gang Xiong, Qi Wu

Zero-shot learning (ZSL) aims to train a model on seen classes and recognize unseen classes by knowledge transfer through shared auxiliary information.

Attribute Image Classification +3

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

1 code implementation • 27 Feb 2025 • Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang, Jing Yu, Kun Song, Qihao Wang, Yili Li, Gang Xiong

While recent methods show that visual descriptions generated by large language models (LLMs) enhance the generalization of VLMs, class-specific prompts may be inaccurate or lack discrimination due to the hallucination in LLMs.

Classification Hallucination +1

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

1 code implementation • 15 Dec 2024 • Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan, Dongmei Zhang, Qi Wu

Existing training-free zero-shot CIR (ZS-CIR) methods often employ a two-stage process: they first generate a caption for the reference image and then use Large Language Models for reasoning to obtain a target description.

Image Retrieval Retrieval +1

Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval

no code implementations • 22 Oct 2024 • Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gaopeng Gou, Gang Xiong, Qi Wu

Then, a pseudo-composed mapping module maps the pseudo-reference image to a pseudo-word token and combines it with the pseudo-manipulation text with manipulation intention.

Attribute Denoising +4

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

1 code implementation • 1 Oct 2024 • Jie Cheng, Ruixi Qiao, Gang Xiong, Qinghai Miao, Yingwei Ma, Binhua Li, Yongbin Li, Yisheng Lv

Experimental results indicate that our largest agent, with 150 million parameters, achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on average.

Atari Games model +3

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

1 code implementation • 21 Aug 2024 • Yili Li, Jing Yu, Keke Gai, Bang Liu, Gang Xiong, Qi Wu

To enhance retrieval efficiency, in this paper, we introduce a model-based video indexer named T2VIndexer, which is a sequence-to-sequence generative model directly generating video identifiers and retrieving candidate videos with constant time complexity.

Retrieval Video Retrieval

IIU: Independent Inference Units for Knowledge-based Visual Question Answering

1 code implementation • 15 Aug 2024 • Yili Li, Jing Yu, Keke Gai, Gang Xiong

In this paper, we propose Independent Inference Units (IIU) for fine-grained multi-modal reasoning to decompose intra-modal information by the functionally independent units.

Question Answering Visual Question Answering

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

1 code implementation • 22 Jul 2024 • Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu

In this work, we propose a novel network to extract multi-view semantic concepts from documents and images and align the matching rather than entire concepts.

Diversity Zero-Shot Learning

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

1 code implementation • CVPR 2024 • Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu

In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects and subsequently utilize these captions to accurately re-identify the objects in a closed-loop process.

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

1 code implementation • 27 Feb 2024 • Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method.

reinforcement-learning Reinforcement Learning

Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

1 code implementation • 10 Nov 2023 • Yuanmin Tang, Jing Yu, Keke Gai, Xiangyan Qu, Yue Hu, Gang Xiong, Qi Wu

Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and robust against model extraction attacks.

Model extraction

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

1 code implementation • 28 Sep 2023 • Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Yue Hu, Qi Wu

Unlike the Composed Image Retrieval task, which requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intents that may relate to domain, scene, object, and attribute.

Attribute Image Retrieval +4

Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

1 code implementation • 28 Sep 2023 • Yuanmin Tang, Jing Yu, Keke Gai, Yujing Wang, Yue Hu, Gang Xiong, Qi Wu

Conventional research mainly studies from the view of modeling the implicit correlations between images and texts for query-ads matching, ignoring the alignment of detailed product information and resulting in suboptimal search performance. In this work, we propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text, which leverages the co-occurrence structure consistency between vision and language spaces without requiring expensive labeled training data.

cross-modal alignment Image-text matching +2

Evaluate Geometry of Radiance Fields with Low-frequency Color Prior

1 code implementation • 10 Apr 2023 • Qihang Fang, Yafei Song, Keqiang Li, Li Shen, Huaiyu Wu, Gang Xiong, Liefeng Bo

From this insight, given a reconstructed density field and observation images, we design a closed-form method to approximate the color field with low-frequency spherical harmonics, and compute the inverse mean residual color.

3D Reconstruction Novel View Synthesis

GraphFit: Learning Multi-scale Graph-Convolutional Representation for Point Cloud Normal Estimation

1 code implementation • 23 Jul 2022 • Keqiang Li, Mingyang Zhao, Huaiyu Wu, Dong-Ming Yan, Zhen Shen, Fei-Yue Wang, Gang Xiong

We propose a precise and efficient normal estimation method that can deal with noise and nonuniform density for unstructured 3D point clouds.

Surface Normals Estimation

TTAGN: Temporal Transaction Aggregation Graph Network for Ethereum Phishing Scams Detection

no code implementations • 28 Apr 2022 • Sijia Li, Gaopeng Gou, Chang Liu, Chengshang Hou, Zhenzhen Li, Gang Xiong

In this paper, we propose a Temporal Transaction Aggregation Graph Network (TTAGN) to enhance phishing scams detection performance on Ethereum.

Representation Learning

6GAN: IPv6 Multi-Pattern Target Generation via Generative Adversarial Nets with Reinforcement Learning

1 code implementation • 21 Apr 2022 • Tianyu Cui, Gaopeng Gou, Gang Xiong, Chang Liu, Peipei Fu, Zhen Li

6GAN forces multiple generators to train with a multi-class discriminator and an alias detector to generate non-aliased active targets with different addressing pattern types.

Decision Making reinforcement-learning +2

6GCVAE: Gated Convolutional Variational Autoencoder for IPv6 Target Generation

no code implementations • 20 Apr 2022 • Tianyu Cui, Gaopeng Gou, Gang Xiong

IPv6 scanning has always been a challenge for researchers in the field of network measurement.

ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

1 code implementation • 13 Feb 2022 • Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, Jing Yu

In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representation from large-scale unlabeled data.

Classification Management +1

6VecLM: Language Modeling in Vector Space for IPv6 Target Generation

no code implementations • 5 Aug 2020 • Tianyu Cui, Gang Xiong, Gaopeng Gou, Junzheng Shi, Wei Xia

Fast IPv6 scanning is challenging in the field of network measurement, as it requires exploring the whole IPv6 address space but is limited by current computational power.

Language Modeling Language Modelling

Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks

no code implementations • 6 Oct 2016 • Qinglong Wang, Wenbo Guo, Alexander G. Ororbia II, Xinyu Xing, Lin Lin, C. Lee Giles, Xue Liu, Peng Liu, Gang Xiong

Deep neural networks have proven to be quite effective in a wide variety of machine learning tasks, ranging from improved speech recognition systems to advancing the development of autonomous vehicles.

Autonomous Vehicles Dimensionality Reduction +2
