no code implementations • COLING 2022 • Changzhi Wang, Xiaodong Gu
Specifically, unlike previous relationship-based approaches that explore only a single relationship in the image, our JRAN can effectively learn two relationships, the visual relationships among region features and the visual-semantic relationships between region features and semantic features, and further makes a dynamic trade-off between them when outputting the relationship representation.
1 code implementation • 15 May 2025 • Yuncheng Guo, Xiaodong Gu
During training, both class and representation features are jointly optimized: a trainable projection layer is applied to representation tokens for task adaptation, while the projection layer for class token remains frozen to retain pre-trained knowledge.
no code implementations • 21 Apr 2025 • Chen Xie, Mingsheng Jiao, Xiaodong Gu, Beijun Shen
In this paper, we propose a novel planning-guided code generation method, DLCodeGen, tailored for generating deep learning projects.
no code implementations • 13 Mar 2025 • Yixiong Fang, Tianran Sun, Yuling Shi, Xiaodong Gu
The core idea of AttentionRAG lies in its attention focus mechanism, which reformulates RAG queries into a next-token prediction paradigm.
1 code implementation • 13 Mar 2025 • Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, GuanYing Chen, Zilong Dong, Liefeng Bo
Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.
1 code implementation • 11 Mar 2025 • Yuncheng Guo, Xiaodong Gu
For inference, a decoupling strategy is employed, wherein both representation and class features are utilized for base classes, while only the class features, which retain more generalized knowledge, are used for new tasks.
Ranked #2 on Prompt Engineering on ImageNet-A
no code implementations • 25 Feb 2025 • Yisheng He, Xiaodong Gu, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, Liefeng Bo
The centerpiece of our framework is the canonical Gaussian attributes generator, which utilizes FLAME canonical points as queries.
no code implementations • 18 Dec 2024 • Shenhao Zhu, Lingteng Qiu, Xiaodong Gu, Zhengyi Zhao, Chao Xu, Yuxiao He, Zhe Li, Xiaoguang Han, Yao Yao, Xun Cao, Siyu Zhu, Weihao Yuan, Zilong Dong, Hao Zhu
In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, where both the specially designed multi-branch DiT and reference-based DiT blocks adopt a global attention mechanism to promote feature interaction and fusion between different views, thereby improving multi-view consistency.
no code implementations • 3 Dec 2024 • Lingteng Qiu, Shenhao Zhu, Qi Zuo, Xiaodong Gu, Yuan Dong, Junfei Zhang, Chao Xu, Zhe Li, Weihao Yuan, Liefeng Bo, GuanYing Chen, Zilong Dong
Generating animatable human avatars from a single image is essential for various digital human modeling applications.
no code implementations • 2 Dec 2024 • Xiaoguang Han, Yushuang Wu, Luyue Shi, Haolin Liu, Hongjie Liao, Lingteng Qiu, Weihao Yuan, Xiaodong Gu, Zilong Dong, Shuguang Cui
This paper constructs the MVImgNet2.0 dataset that expands MVImgNet into a total of ~520k objects and 515 categories, which derives a 3D dataset with a larger scale that is more comparable to ones in the 2D domain.
no code implementations • 21 Nov 2024 • Yalan Lin, Yingwei Ma, Rongyu Cao, Binhua Li, Fei Huang, Xiaodong Gu, Yongbin Li
Reproducing buggy code is the first and crucially important step in issue resolving, as it aids in identifying the underlying problems and validating that generated patches resolve the problem.
1 code implementation • 15 Oct 2024 • Yuze Jiang, Beijun Shen, Xiaodong Gu
The prevailing approaches train models to represent code changes in history commits and utilize the learned representations to predict the presence of defects in the latest commit.
no code implementations • 9 Oct 2024 • Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang
For captioning, we finetune a large language model with the language-informative motion features to develop a strong motion captioning model.
no code implementations • 8 Oct 2024 • Yalan Lin, Chengcheng Wan, Yixiong Fang, Xiaodong Gu
CodeCipher transforms the LLM's embedding matrix so that each row corresponds to a different word in the original matrix, forming a token-to-token confusion mapping for obfuscating source code.
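The token-to-token confusion mapping described above can be illustrated with a minimal sketch. It assumes the mapping is a plain random permutation of token ids, which the excerpt does not state; the function names and the toy embedding values are hypothetical and are not taken from the CodeCipher code.

```python
import random


def build_confusion_mapping(vocab_size, seed=0):
    """Return a permutation: original token id -> obfuscated token id.

    Assumption: a random permutation stands in for whatever mapping
    the actual method produces.
    """
    rng = random.Random(seed)
    mapping = list(range(vocab_size))
    rng.shuffle(mapping)
    return mapping


def permute_embeddings(embeddings, mapping):
    """Build a matrix whose row mapping[t] holds the original row t."""
    permuted = [None] * len(embeddings)
    for original_id, obfuscated_id in enumerate(mapping):
        permuted[obfuscated_id] = embeddings[original_id]
    return permuted


def obfuscate(token_ids, mapping):
    """Replace every token id by its obfuscated counterpart."""
    return [mapping[t] for t in token_ids]


# Toy 5-token vocabulary with 2-dimensional embeddings (made-up values).
embeddings = [[float(i), float(i) * 0.5] for i in range(5)]
mapping = build_confusion_mapping(len(embeddings), seed=42)
permuted = permute_embeddings(embeddings, mapping)

tokens = [0, 3, 3, 1]
obfuscated = obfuscate(tokens, mapping)

# Looking up obfuscated ids in the permuted matrix yields the same vectors
# as looking up the original ids in the original matrix.
recovered = [permuted[t] for t in obfuscated]
original = [embeddings[t] for t in tokens]
```

In this sketch, a model that uses the permuted matrix sees the same input vectors for the obfuscated sequence as it would for the original one, while the token ids themselves no longer spell out the source code.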
1 code implementation • 2 Oct 2024 • Yuling Shi, Songsong Wang, Chengcheng Wan, Xiaodong Gu
While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems.
Ranked #18 on Code Generation on MBPP
no code implementations • 26 Sep 2024 • Weihao Yuan, Weichao Shen, Yisheng He, Yuan Dong, Xiaodong Gu, Zilong Dong, Liefeng Bo, QiXing Huang
Motion generation from discrete quantization offers many advantages over continuous regression, but at the cost of inevitable approximation errors.
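The approximation error mentioned above can be made concrete with a small sketch of nearest-neighbour vector quantization. This is a generic illustration, not the method of the paper; the codebook, the input vector, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return (index, code) of the codebook entry closest to the vector."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-entry codebook and a continuous feature vector.
codebook = [[0.0, 0.0], [1.0, 1.0]]
vector = [0.9, 0.8]

index, code = quantize(vector, codebook)
# The residual is what is lost by replacing the vector with a discrete code.
error = squared_distance(vector, code)
```

Unless the vector coincides with a codebook entry, `error` is non-zero, which is the approximation cost that a discrete representation pays.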
1 code implementation • 5 Sep 2024 • Rui Peng, Shihe Shen, Kaiqiang Xiong, Huachen Gao, Jianbo Jiao, Xiaodong Gu, Ronggang Wang
To this end, we propose SuRF, a new Surface-centric framework that incorporates a new Region sparsification based on a matching Field, achieving good trade-offs between performance, efficiency and scalability.
no code implementations • 3 Aug 2024 • Xiaodong Gu, Weihao Yuan, Heng Li, Zilong Dong, Ping Tan
To better represent 3D shapes, we introduce a volume encoding to explicitly encode the spatial information.
no code implementations • 24 Jun 2024 • Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han
The effectiveness of StableNormal is demonstrated through competitive performance in standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement.
1 code implementation • NeurIPS 2023 • Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang
Meanwhile, we introduce a multi-scale feature-metric consistency to impose the multi-view consistency in a more discriminative multi-scale feature space, which is robust to the failures of the photometric consistency.
no code implementations • 22 Mar 2024 • Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, QiXing Huang
In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts.
no code implementations • 18 Mar 2024 • Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, QiXing Huang
Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency.
1 code implementation • 12 Jan 2024 • Yuling Shi, Hongyu Zhang, Chengcheng Wan, Xiaodong Gu
Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code.
1 code implementation • CVPR 2024 • Yuncheng Guo, Xiaodong Gu
Our comprehensive experiments confirm that JoAPR substantially enhances the robustness of prompt learning for Vision-Language Models against label noise, offering a promising direction for future research.
no code implementations • CVPR 2024 • Yuan Dong, Qi Zuo, Xiaodong Gu, Weihao Yuan, Zhengyi Zhao, Zilong Dong, Liefeng Bo, QiXing Huang
The key to our approach is a new diffusion procedure that combines the discrete empirical data distribution and a continuous distribution induced by the quality checker.
no code implementations • CVPR 2024 • Lingteng Qiu, GuanYing Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han
Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images.
1 code implementation • 8 Aug 2023 • Shuai Zhang, Xiaodong Gu, Yuting Chen, Beijun Shen
In particular, InfeRE outperforms the popular tree-based generation approach by 18.1% and 11.3% on the two datasets, respectively, in terms of DFA@5 accuracy.
1 code implementation • 15 Mar 2023 • Youcai Zhang, Yuzhuo Qin, Hengwei Liu, Yanhao Zhang, Yaqian Li, Xiaodong Gu
Knowledge distillation (KD) has been extensively studied in single-label image classification.
1 code implementation • 31 Jan 2023 • Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu
In this work, we propose an SDF transformer network, which replaces the role of 3D CNN for better 3D feature aggregation.
no code implementations • 21 Jan 2023 • Heng Li, Xiaodong Gu, Weihao Yuan, Luwei Yang, Zilong Dong, Ping Tan
To reach this challenging goal without depth input, we introduce a hierarchical feature volume to facilitate the implicit map decoder.
no code implementations • 18 Nov 2022 • Haoren Zhu, Hao Ge, Xiaodong Gu, Pengfei Zhao, Dik Lun Lee
Traditional recommender systems are typically passive in that they try to adapt their recommendations to the user's historical interests.
1 code implementation • COLING 2022 • Xiaodong Gu, Zhaowei Zhang, Sang-Woo Lee, Kang Min Yoo, Jung-Woo Ha
While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information.
no code implementations • 20 Aug 2022 • Qingrong Cheng, Keyu Wen, Xiaodong Gu
To address this issue, we propose a novel Vision-Language Matching strategy for text-to-image synthesis, named VLMGAN*, which introduces a dual vision-language matching mechanism to strengthen the image quality and semantic consistency.
no code implementations • 13 Aug 2022 • Zhenshan Tan, Cheng Chen, Keyu Wen, Yuzhuo Qin, Xiaodong Gu
With the design of negative samples, the noise objects are suppressed.
no code implementations • 2 Jul 2022 • Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu
Concretely, the first module is a weight-sharing transformer that builds on the head of the visual and textual encoders, aiming to semantically align text and image.
no code implementations • 23 May 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan
In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.
no code implementations • CVPR 2022 • Cheng Chen, Yudong Zhu, Zhenshan Tan, Qingrong Cheng, Xin Jiang, Qun Liu, Xiaodong Gu
In this paper, we propose a contrastive learning-based framework UTC to unify and facilitate both discriminative and generative tasks in visual dialog with a single model.
1 code implementation • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization.
Ranked #1 on Depth Prediction on Matterport3D
no code implementations • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
Estimating the accurate depth from a single image is challenging since it is inherently ambiguous and ill-posed.
1 code implementation • CVPR 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan
In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.
no code implementations • 4 Nov 2021 • Xiaodong Gu, Kang Min Yoo, Sang-Woo Lee
Pre-trained language models (PLM) have marked a huge leap in neural dialogue modeling.
1 code implementation • 24 Mar 2021 • Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan
There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.
1 code implementation • 3 Dec 2020 • Xiaodong Gu, Kang Min Yoo, Jung-Woo Ha
Recent advances in pre-trained language models have significantly improved neural response generation.
1 code implementation • 22 Oct 2020 • Keyu Wen, Xiaodong Gu, Qingrong Cheng
Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network (DSRAN) is proposed, which mainly consists of two modules: the separate semantic relations module and the joint semantic relations module.
4 code implementations • CVPR 2020 • Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan
The deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity.
Ranked #13 on Point Clouds on Tanks and Temples
no code implementations • 2 Jul 2019 • Ruizheng Wu, Xiaodong Gu, Xin Tao, Xiaoyong Shen, Yu-Wing Tai, Jiaya Jia
In this paper, we are interested in generating a cartoon face of a person by using unpaired training data between real faces and cartoon ones.
1 code implementation • ICCV 2019 • Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia
Current image translation methods, albeit effective at producing high-quality results in various applications, still give little consideration to geometric transformations.
no code implementations • ICLR 2019 • Yuetong Du, Xiaodong Gu, Junqin Liu, Liwen He
For semantic image segmentation and lane detection, nets with a single spatial pyramid structure or encoder-decoder structure are usually exploited.
5 code implementations • ICCV 2019 • Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan
In this paper, we propose the Batch DropBlock (BDB) Network, which is a two-branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch.
Ranked #8 on Person Re-Identification on Market-1501-C
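As a rough illustration of what a feature dropping branch for the Batch DropBlock entry above might do, the sketch below zeroes one randomly placed rectangle at the same position in every feature map of a batch. The function name, the nested-list layout `[channel][row][col]`, and the ratio parameters are assumptions made for this example; the excerpt itself does not specify them.

```python
import random


def batch_drop_block(batch, h_ratio, w_ratio, rng):
    """Zero the same rectangular region in every feature map of the batch.

    batch: list of feature maps, each indexed as [channel][row][col].
    h_ratio, w_ratio: fraction of height and width to erase (assumed names).
    Returns the new batch and the erased region (top, left, height, width).
    """
    height = len(batch[0][0])
    width = len(batch[0][0][0])
    drop_h = round(h_ratio * height)
    drop_w = round(w_ratio * width)
    # One region is sampled per batch, not per sample.
    top = rng.randint(0, height - drop_h)
    left = rng.randint(0, width - drop_w)

    def inside(r, c):
        return top <= r < top + drop_h and left <= c < left + drop_w

    dropped = [
        [
            [
                [0.0 if inside(r, c) else v for c, v in enumerate(row)]
                for r, row in enumerate(channel)
            ]
            for channel in fmap
        ]
        for fmap in batch
    ]
    return dropped, (top, left, drop_h, drop_w)


# Two single-channel 4x4 feature maps filled with ones.
rng = random.Random(0)
batch = [[[[1.0] * 4 for _ in range(4)]] for _ in range(2)]
dropped, region = batch_drop_block(batch, 0.5, 0.5, rng)
```

Sampling the region once per batch means every sample loses the same spatial area, so the remaining features have to carry the discriminative information; the input batch is left unmodified.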
3 code implementations • ICLR 2019 • Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, Sunghun Kim
Variational autoencoders (VAEs) have shown promise in data-driven conversation modeling.
no code implementations • 25 Apr 2017 • Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim
They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings.
no code implementations • 27 May 2016 • Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim
We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query.
no code implementations • 4 Dec 2015 • Haibing Wu, Xiaodong Gu
Recently, dropout has seen increasing use in deep learning.
no code implementations • 1 Dec 2015 • Haibing Wu, Xiaodong Gu
However, its effect in convolutional and pooling layers is still not clear.
no code implementations • 30 Nov 2015 • Haibing Wu, Yiwei Gu, Shangdi Sun, Xiaodong Gu
To tackle aspect mapping and sentiment classification, we propose two Convolutional Neural Network (CNN) based methods, cascaded CNN and multitask CNN.