Search Results for author: Xiaodong Gu

Found 56 papers, 22 papers with code

Building Joint Relationship Attention Network for Image-Text Generation

no code implementations COLING 2022 Changzhi Wang, Xiaodong Gu

Specifically, different from the previous relationship based approaches that only explore the single relationship in the image, our JRAN can effectively learn two relationships, the visual relationships among region features and the visual-semantic relationships between region features and semantic features, and further make a dynamic trade-off between them during outputting the relationship representation.

Text Generation

MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models

1 code implementation 15 May 2025 Yuncheng Guo, Xiaodong Gu

During training, both class and representation features are jointly optimized: a trainable projection layer is applied to representation tokens for task adaptation, while the projection layer for class token remains frozen to retain pre-trained knowledge.

General Knowledge Representation Learning +1

Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs

no code implementations 21 Apr 2025 Chen Xie, Mingsheng Jiao, Xiaodong Gu, Beijun Shen

In this paper, we propose a novel planning-guided code generation method, DLCodeGen, tailored for generating deep learning projects.

Code Generation Deep Learning

AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation

no code implementations 13 Mar 2025 Yixiong Fang, Tianran Sun, Yuling Shi, Xiaodong Gu

The core idea of AttentionRAG lies in its attention focus mechanism, which reformulates RAG queries into a next-token prediction paradigm.

RAG Retrieval

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

1 code implementation 13 Mar 2025 Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, GuanYing Chen, Zilong Dong, Liefeng Bo

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.

3D Human Reconstruction

MMRL: Multi-Modal Representation Learning for Vision-Language Models

1 code implementation 11 Mar 2025 Yuncheng Guo, Xiaodong Gu

For inference, a decoupling strategy is employed, wherein both representation and class features are utilized for base classes, while only the class features, which retain more generalized knowledge, are used for new tasks.

Prompt Engineering Representation Learning +1

LAM: Large Avatar Model for One-shot Animatable Gaussian Head

no code implementations 25 Feb 2025 Yisheng He, Xiaodong Gu, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, Liefeng Bo

The centerpiece of our framework is the canonical Gaussian attributes generator, which utilizes FLAME canonical points as queries.

MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation

no code implementations 18 Dec 2024 Shenhao Zhu, Lingteng Qiu, Xiaodong Gu, Zhengyi Zhao, Chao Xu, Yuxiao He, Zhe Li, Xiaoguang Han, Yao Yao, Xun Cao, Siyu Zhu, Weihao Yuan, Zilong Dong, Hao Zhu

In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, where both the specially designed multi-branch DiT and reference-based DiT blocks adopt a global attention mechanism to promote feature interaction and fusion between different views, thereby improving multi-view consistency.

MVImgNet2.0: A Larger-scale Dataset of Multi-view Images

no code implementations 2 Dec 2024 Xiaoguang Han, Yushuang Wu, Luyue Shi, Haolin Liu, Hongjie Liao, Lingteng Qiu, Weihao Yuan, Xiaodong Gu, Zilong Dong, Shuguang Cui

This paper constructs the MVImgNet2.0 dataset that expands MVImgNet into a total of ~520k objects and 515 categories, which derives a 3D dataset with a larger scale that is more comparable to ones in the 2D domain.

3D Reconstruction Object Reconstruction

LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues

no code implementations 21 Nov 2024 Yalan Lin, Yingwei Ma, Rongyu Cao, Binhua Li, Fei Huang, Xiaodong Gu, Yongbin Li

Reproducing buggy code is the first and crucially important step in issue resolving, as it aids in identifying the underlying problems and validating that generated patches resolve the problem.

Just-In-Time Software Defect Prediction via Bi-modal Change Representation Learning

1 code implementation 15 Oct 2024 Yuze Jiang, Beijun Shen, Xiaodong Gu

The prevailing approaches train models to represent code changes in history commits and utilize the learned representations to predict the presence of defects in the latest commit.

Representation Learning

CodeCipher: Learning to Obfuscate Source Code Against LLMs

no code implementations 8 Oct 2024 Yalan Lin, Chengcheng Wan, Yixiong Fang, Xiaodong Gu

CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code.

Code Completion valid

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

1 code implementation 2 Oct 2024 Yuling Shi, Songsong Wang, Chengcheng Wan, Xiaodong Gu

While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems.

Auto Debugging Bug fixing +3

MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling

no code implementations 26 Sep 2024 Weihao Yuan, Weichao Shen, Yisheng He, Yuan Dong, Xiaodong Gu, Zilong Dong, Liefeng Bo, QiXing Huang

Motion generation from discrete quantization offers many advantages over continuous regression, but at the cost of inevitable approximation errors.

Motion Generation Quantization

Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction

1 code implementation 5 Sep 2024 Rui Peng, Shihe Shen, Kaiqiang Xiong, Huachen Gao, Jianbo Jiao, Xiaodong Gu, Ronggang Wang

To this end, we propose SuRF, a new Surface-centric framework that incorporates a new Region sparsification based on a matching Field, achieving good trade-offs between performance, efficiency and scalability.

Surface Reconstruction

HIVE: HIerarchical Volume Encoding for Neural Implicit Surface Reconstruction

no code implementations 3 Aug 2024 Xiaodong Gu, Weihao Yuan, Heng Li, Zilong Dong, Ping Tan

To better represent 3D shapes, we introduce a volume encoding to explicitly encode the spatial information.

Surface Reconstruction

StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

no code implementations 24 Jun 2024 Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

The effectiveness of StableNormal is demonstrated through competitive performance in standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement.

Surface Normal Estimation Surface Reconstruction

GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

1 code implementation NeurIPS 2023 Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang

Meanwhile, we introduce a multi-scale feature-metric consistency to impose the multi-view consistency in a more discriminative multi-scale feature space, which is robust to the failures of the photometric consistency.

Surface Reconstruction

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes

no code implementations 22 Mar 2024 Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, QiXing Huang

In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts.

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

no code implementations 18 Mar 2024 Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, QiXing Huang

Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency.

Denoising

Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

1 code implementation 12 Jan 2024 Yuling Shi, Hongyu Zhang, Chengcheng Wan, Xiaodong Gu

Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code.

Code Generation Diversity

JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models

1 code implementation CVPR 2024 Yuncheng Guo, Xiaodong Gu

Our comprehensive experiments confirm that JoAPR substantially enhances the robustness of prompt learning for Vision-Language Models against label noise, offering a promising direction for future research.

Prompt Engineering Prompt Learning

GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors

no code implementations CVPR 2024 Yuan Dong, Qi Zuo, Xiaodong Gu, Weihao Yuan, Zhengyi Zhao, Zilong Dong, Liefeng Bo, QiXing Huang

The key to our approach is a new diffusion procedure that combines the discrete empirical data distribution and a continuous distribution induced by the quality checker.

Denoising

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

no code implementations CVPR 2024 Lingteng Qiu, GuanYing Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han

Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images.

3D Generation Text to 3D

InfeRE: Step-by-Step Regex Generation via Chain of Inference

1 code implementation 8 Aug 2023 Shuai Zhang, Xiaodong Gu, Yuting Chen, Beijun Shen

Particularly, InfeRE outperforms the popular tree-based generation approach by 18.1% and 11.3% on both datasets, respectively, in terms of DFA@5 accuracy.

Text Matching

3D Former: Monocular Scene Reconstruction with 3D SDF Transformers

1 code implementation 31 Jan 2023 Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu

In this work, we propose an SDF transformer network, which replaces the role of 3D CNN for better 3D feature aggregation.

Dense RGB SLAM with Neural Implicit Maps

no code implementations 21 Jan 2023 Heng Li, Xiaodong Gu, Weihao Yuan, Luwei Yang, Zilong Dong, Ping Tan

To reach this challenging goal without depth input, we introduce a hierarchical feature volume to facilitate the implicit map decoder.

Decoder Simultaneous Localization and Mapping

Influential Recommender System

no code implementations 18 Nov 2022 Haoren Zhu, Hao Ge, Xiaodong Gu, Pengfei Zhao, Dik Lun Lee

Traditional recommender systems are typically passive in that they try to adapt their recommendations to the user's historical interests.

Recommendation Systems

Continuous Decomposition of Granularity for Neural Paraphrase Generation

1 code implementation COLING 2022 Xiaodong Gu, Zhaowei Zhang, Sang-Woo Lee, Kang Min Yoo, Jung-Woo Ha

While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information.

Paraphrase Generation Sentence

Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks

no code implementations 20 Aug 2022 Qingrong Cheng, Keyu Wen, Xiaodong Gu

To address this issue, we propose a novel Vision-Language Matching strategy for text-to-image synthesis, named VLMGAN*, which introduces a dual vision-language matching mechanism to strengthen the image quality and semantic consistency.

Image Generation

Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval

no code implementations 2 Jul 2022 Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu

Concretely, the first module is a weight-sharing transformer that builds on the head of the visual and textual encoders, aiming to semantically align text and image.

Contrastive Learning Cross-Modal Retrieval +5

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

no code implementations 23 May 2022 Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.

Motion Estimation Point Cloud Registration +1

UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog

no code implementations CVPR 2022 Cheng Chen, Yudong Zhu, Zhenshan Tan, Qingrong Cheng, Xin Jiang, Qun Liu, Xiaodong Gu

In this paper, we propose a contrastive learning-based framework UTC to unify and facilitate both discriminative and generative tasks in visual dialog with a single model.

Contrastive Learning Representation Learning +1

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

1 code implementation CVPR 2022 Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization.

Decoder Depth Prediction +1

RCP: Recurrent Closest Point for Point Cloud

1 code implementation CVPR 2022 Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.

Motion Estimation Point Cloud Registration +1

DRO: Deep Recurrent Optimizer for Video to Depth

1 code implementation 24 Mar 2021 Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan

There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.

Learning Dual Semantic Relations with Graph Attention for Image-Text Matching

1 code implementation 22 Oct 2020 Keyu Wen, Xiaodong Gu, Qingrong Cheng

Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network (DSRAN) is proposed, which mainly consists of two modules: the separate semantic relations module and the joint semantic relations module.

Graph Attention Image-text matching +1

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

4 code implementations CVPR 2020 Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan

The deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity.

3D Reconstruction Point Clouds +1

Landmark Assisted CycleGAN for Cartoon Face Generation

no code implementations 2 Jul 2019 Ruizheng Wu, Xiaodong Gu, Xin Tao, Xiaoyong Shen, Yu-Wing Tai, Jiaya Jia

In this paper, we are interested in generating a cartoon face of a person by using unpaired training data between real faces and cartoon ones.

Face Generation

Attribute-Driven Spontaneous Motion in Unpaired Image Translation

1 code implementation ICCV 2019 Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia

Current image translation methods, albeit effective to produce high-quality results in various applications, still do not consider much geometric transform.

Attribute Motion Estimation +1

Multiple Encoder-Decoders Net for Lane Detection

no code implementations ICLR 2019 Yuetong Du, Xiaodong Gu, Junqin Liu, Liwen He

For semantic image segmentation and lane detection, nets with a single spatial pyramid structure or encoder-decoder structure are usually exploited.

Decoder Image Segmentation +2

Batch DropBlock Network for Person Re-identification and Beyond

5 code implementations ICCV 2019 Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan

In this paper, we propose the Batch DropBlock (BDB) Network which is a two branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch.

Image Retrieval Metric Learning +1

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

no code implementations 25 Apr 2017 Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings.

Deep API Learning

no code implementations 27 May 2016 Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query.

Decoder Information Retrieval +4

Towards Dropout Training for Convolutional Neural Networks

no code implementations 1 Dec 2015 Haibing Wu, Xiaodong Gu

However, its effect in convolutional and pooling layers is still not clear.

Data Augmentation

Aspect-based Opinion Summarization with Convolutional Neural Networks

no code implementations 30 Nov 2015 Haibing Wu, Yiwei Gu, Shangdi Sun, Xiaodong Gu

To tackle aspect mapping and sentiment classification, we propose two Convolutional Neural Network (CNN) based methods, cascaded CNN and multitask CNN.

Aspect Extraction Classification +6
