no code implementations • 3 Mar 2025 • Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, Houqiang Li
In this paper, we present DesignDiffusion, a simple yet effective framework for the novel task of synthesizing design images from textual descriptions.
1 code implementation • 28 Feb 2025 • Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, Yan Lu
In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational operational costs, such as memory I/O and the number of function calls.
no code implementations • 19 Feb 2025 • Yan Yu, Wengang Zhou, Yaodong Yang, Wanxuan Lu, Yingyan Hou, Houqiang Li
To this end, we propose a Model Evolution framework with Genetic Algorithm (MEGA), which enables the model to evolve during training according to the difficulty of the tasks.
1 code implementation • 25 Jan 2025 • Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li
Sign language pre-training has gained increasing attention for its ability to enhance performance across various sign language understanding (SLU) tasks.
Ranked #1 on Gloss-free Sign Language Translation on OpenASL (using extra training data)
no code implementations • 14 Jan 2025 • Longtao Jiang, Zhendong Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Lei Shi, Dong Chen, Houqiang Li
This paradigm retains the masked region in the input, using it as guidance for the removal process.
no code implementations • 7 Jan 2025 • Shang Chai, Zihang Lin, Min Zhou, Xubin Li, Liansheng Zhuang, Houqiang Li
To solve this issue, this paper proposes a novel framework named SceneBooth for subject-preserved text-to-image generation, which takes a subject image, object phrases, and text prompts as input.
1 code implementation • 19 Dec 2024 • Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, Houqiang Li
Knowledge distillation (KD) has become a prevalent technique for compressing large language models (LLMs).
no code implementations • 17 Dec 2024 • Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Houqiang Li, Yanyong Zhang
For the latter, we initially incorporate a radar-guided depth head to refine the transformation from image view to BEV.
1 code implementation • 16 Dec 2024 • Junjie Lin, Jian Zhao, Lin Liu, Yue Deng, Youpeng Zhao, Lanxiao Huang, Xia Lin, Wengang Zhou, Houqiang Li
After iterative refinements, our curling AI based on the decision tree ranks first on the Jidi platform among 34 curling AIs in total, which demonstrates that LLMs can significantly enhance the robustness and adaptability of decision trees, representing a substantial advancement in the field of Game AI.
no code implementations • 27 Nov 2024 • Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, Ran Zhang
3D characters are essential to modern creative industries, but making them animatable often demands extensive manual work in tasks like rigging and skinning.
2 code implementations • 24 Nov 2024 • Yonghui Wang, Shi-Yong Chen, Zhenxing Zhou, Siyi Li, Haoran Li, Wengang Zhou, Houqiang Li
To train our SceneVLM, we collect over 610,000 images from various public indoor datasets and implement a scene data generation pipeline with a semi-automated technique to establish relationships and estimate distances among indoor objects.
no code implementations • 19 Nov 2024 • Zongmeng Zhang, Jinhua Zhu, Wengang Zhou, Xiang Qi, Peng Zhang, Houqiang Li
Dense retrieval, which aims to encode the semantic information of arbitrary text into dense vector representations or embeddings, has emerged as an effective and efficient paradigm for text retrieval, consequently becoming an essential component in various natural language processing systems.
1 code implementation • 22 Oct 2024 • Zongmeng Zhang, Yufeng Shi, Jinhua Zhu, Wengang Zhou, Xiang Qi, Peng Zhang, Houqiang Li
Trustworthiness is an essential prerequisite for the real-world application of large language models.
no code implementations • 9 Oct 2024 • Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks and aligning them with human preferences.
no code implementations • 6 Oct 2024 • Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Houqiang Li
Reconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning.
no code implementations • 17 Sep 2024 • Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li
Embodied Everyday Task is a popular task in the embodied AI community, requiring agents to make a sequence of actions based on natural language instructions and visual observations.
no code implementations • 3 Sep 2024 • Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye
Recent works propose employing supervised fine-tuning (SFT) to mitigate the sycophancy issue, but this typically degrades the general capability of LLMs.
1 code implementation • 30 Aug 2024 • Yonghui Wang, Wengang Zhou, Hao Feng, Houqiang Li
We hypothesize that the requisite number of visual tokens for the model is contingent upon both the resolution and content of the input image.
1 code implementation • 25 Aug 2024 • Keyi Zhou, Li Li, Wengang Zhou, Yonghui Wang, Hao Feng, Houqiang Li
In this work, we propose LaneTCA to bridge the individual video frames and explore how to effectively aggregate the temporal context.
no code implementations • 16 Aug 2024 • Wengang Zhou, Weichao Zhao, Hezhen Hu, Zecheng Li, Houqiang Li
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied in recent years, including isolated/continuous sign language recognition (ISLR/CSLR), gloss-free sign language translation (GF-SLT) and sign language retrieval (SL-RT).
1 code implementation • 7 Aug 2024 • Yonghui Wang, Shaokai Liu, Li Li, Wengang Zhou, Houqiang Li
In this case, when the color of the object is similar to that of the shadow, existing methods struggle to achieve accurate detection.
1 code implementation • 23 Jul 2024 • Longtao Jiang, Min Wang, Zecheng Li, Yao Fang, Wengang Zhou, Houqiang Li
Furthermore, existing RGB-based sign retrieval works suffer from the huge memory cost of dense visual data embedding in end-to-end training and adopt an offline RGB encoder instead, leading to suboptimal feature representation.
no code implementations • 7 Jul 2024 • Qi Sun, Hang Zhou, Wengang Zhou, Li Li, Houqiang Li
Synthesizing realistic 3D indoor scenes is a challenging task that traditionally relies on manual arrangement and annotation by expert designers.
no code implementations • 27 Jun 2024 • Zhaokang Liao, Hao Feng, Shaokai Liu, Wengang Zhou, Houqiang Li
Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification.
no code implementations • 25 Jun 2024 • Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian
Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising.
no code implementations • 20 Jun 2024 • Xihua Sheng, Li Li, Dong Liu, Houqiang Li
With these filters, our codec can adapt to different reference qualities, making it easier to achieve the target reconstruction quality and reduce the reconstruction error propagation.
1 code implementation • 15 Jun 2024 • Weichao Zhao, Wengang Zhou, Hezhen Hu, Min Wang, Houqiang Li
On one hand, since the semantics of sign language are expressed by the cooperation of fine-grained hands and coarse-grained trunks, we utilize both granularity information and encode them into latent spaces.
1 code implementation • 12 Jun 2024 • Huijie Yao, Wengang Zhou, Hao Zhou, Houqiang Li
Spoken language glossification (SLG) aims to translate the spoken language text into the sign language gloss, i.e., a written record of sign language.
1 code implementation • 6 Jun 2024 • Lin Liu, Jian Zhao, Cheng Hu, Zhengtao Cao, Youpeng Zhao, Zhenbin Ye, Meng Meng, Wenjun Wang, Zhaofeng He, Houqiang Li, Xia Lin, Lanxiao Huang
To address these issues, we introduce the first publicly available map editor for the popular mobile game Honor of Kings and design a lightweight environment, Mini Honor of Kings (Mini HoK), for researchers to conduct experiments.
1 code implementation • 3 Jun 2024 • Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, YongJie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang
In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts.
1 code implementation • 31 May 2024 • Weichao Zhao, Hezhen Hu, Wengang Zhou, Yunyao Mao, Min Wang, Houqiang Li
To this end, we propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information in a self-supervised learning paradigm for SLR.
1 code implementation • 28 May 2024 • Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li
In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects.
no code implementations • 24 May 2024 • Yunyao Mao, Xiaoyang Liu, Wengang Zhou, Zhenbo Lu, Houqiang Li
Text-driven human motion generation, as one of the vital tasks in computer-aided content creation, has recently attracted increasing attention.
1 code implementation • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li
Initialization is responsible for encoding images and text using a VLM, followed by a feature filter that selects text features similar to the image.
1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li
The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.
no code implementations • 19 Mar 2024 • Xiaoyu Qiu, Yuechen Wang, Jiaxin Shi, Wengang Zhou, Houqiang Li
To efficiently transfer soft prompts, we propose a novel framework, Multilingual Prompt Translator (MPT), in which a multilingual prompt translator is introduced to properly process the crucial knowledge embedded in a prompt by changing language knowledge while retaining task knowledge.
no code implementations • 18 Mar 2024 • Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li
To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS.
1 code implementation • 18 Mar 2024 • Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
1 code implementation • 18 Mar 2024 • Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner.
no code implementations • 3 Mar 2024 • Yongchao Du, Min Wang, Wengang Zhou, Shuping Hui, Houqiang Li
To tackle the above problems, we propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
no code implementations • CVPR 2023 • Hui Wu, Min Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li
Then, a dynamic mixer is introduced to aggregate these features into a compact embedding for efficient search.
1 code implementation • 1 Mar 2024 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li
The centroid vectors in the quantizer serve as anchor points in the embedding space of the gallery model to characterize its structure.
1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li
In this work, we present DeepEraser, an effective deep network for generic text removal.
1 code implementation • 27 Feb 2024 • Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li
We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions.
1 code implementation • CVPR 2024 • Xiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li
As a new embodied vision task, Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment.
no code implementations • 29 Jan 2024 • Xihua Sheng, Li Li, Dong Liu, Houqiang Li
With the SDD-based motion model and long short-term temporal contexts fusion, our proposed learned video codec can obtain more accurate inter prediction.
no code implementations • 15 Jan 2024 • Qi Sun, Xiao Cui, Wengang Zhou, Houqiang Li
In this study, we tackle the challenge of classifying the object category in point clouds, which previous works like PointCLIP struggle to address due to the inherent limitations of the CLIP architecture.
no code implementations • CVPR 2024 • Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu
To address this issue we introduce a Generative Latent Coding (GLC) architecture which performs transform coding in the latent space of a generative vector-quantized variational auto-encoder (VQ-VAE) instead of in the pixel space.
1 code implementation • 26 Dec 2023 • Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu
In this work, we propose NLOS-LTM, a novel passive NLOS imaging method that effectively handles multiple light transport conditions with a single network.
2 code implementations • 21 Dec 2023 • Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen
Many follow-up works have developed various applications based on the pre-trained SAM and achieved impressive performance on downstream vision tasks.
1 code implementation • 5 Dec 2023 • Youpeng Zhao, Yudong Lu, Jian Zhao, Wengang Zhou, Houqiang Li
The utilization of artificial intelligence (AI) in card games has been a well-explored subject within AI research for an extensive period.
1 code implementation • 22 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.
no code implementations • 20 Nov 2023 • Hao Feng, Qi Liu, Hao Liu, Jingqun Tang, Wengang Zhou, Houqiang Li, Can Huang
This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution.
no code implementations • 8 Nov 2023 • Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li
Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID).
no code implementations • 1 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li
To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).
no code implementations • 24 Oct 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li
Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.
no code implementations • 18 Oct 2023 • Yufei Kuang, Xijun Li, Jie Wang, Fangzhou Zhu, Meng Lu, Zhihai Wang, Jia Zeng, Houqiang Li, Yongdong Zhang, Feng Wu
Specifically, we formulate the routine design task as a Markov decision process and propose an RL framework with adaptive action sequences to generate high-quality presolve routines efficiently.
no code implementations • 8 Oct 2023 • Rusheng Zhang, Depu Meng, Shengyin Shen, Zhengxia Zou, Houqiang Li, Henry X. Liu
As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception emerges as a pivotal tool for connected automated vehicle (CAV) applications.
no code implementations • 7 Oct 2023 • Yuchen Yang, Houqiang Li, Yanfeng Wang, Yu Wang
In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
no code implementations • ICCV 2023 • Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li
Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement.
Ranked #6 on Sign Language Translation on CSL-Daily
no code implementations • 19 Aug 2023 • Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang
However, existing advanced algorithms are limited in how effectively they utilize the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks within the context of text-rich scenarios have not been sufficiently explored.
no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li
To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.
no code implementations • 17 Aug 2023 • Yuechen Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li
Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation.
1 code implementation • 16 Aug 2023 • Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan
Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.
1 code implementation • ICCV 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li
To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.
1 code implementation • ICCV 2023 • Yufei Yin, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li
These inaccurate high-scoring region proposals will mislead the training of subsequent refinement modules and thus hamper the detection performance.
no code implementations • 8 Aug 2023 • Weichao Zhao, Hezhen Hu, Wengang Zhou, Li Li, Houqiang Li
Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g., self- and mutual occlusion and similar textures.
1 code implementation • CVPR 2023 • Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li
In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection.
no code implementations • 19 Jun 2023 • Xihua Sheng, Li Li, Dong Liu, Houqiang Li
Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms.
no code implementations • 9 Jun 2023 • Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian
Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.
1 code implementation • 3 Jun 2023 • Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li
Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings.
1 code implementation • 26 May 2023 • Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li
Segment anything model (SAM) has achieved great success in the field of natural image segmentation.
1 code implementation • 16 May 2023 • Zongmeng Zhang, Wengang Zhou, Jiaxin Shi, Houqiang Li
In a passage retrieval system, the initial passage retrieval results may be unsatisfactory, which can be refined by a reranking scheme.
no code implementations • 8 May 2023 • Hezhen Hu, Weichao Zhao, Wengang Zhou, Houqiang Li
In our framework, the hand pose is regarded as a visual token, which is derived from an off-the-shelf detector.
Ranked #2 on Sign Language Recognition on MSASL-1000 (using extra training data)
1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.
Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)
1 code implementation • 20 Apr 2023 • Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu
Tremendous efforts have been made on document image rectification, but how to learn effective representation of such distorted images is still under-explored.
1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li
To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images.
Ranked #1 on Local Distortion on DocUNet
1 code implementation • 12 Apr 2023 • Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian
Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images achieves better performance on downstream reID tasks than pre-training on ImageNet.
no code implementations • CVPR 2023 • Zhiyang Guo, Wengang Zhou, Min Wang, Li Li, Houqiang Li
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands, enabling the rendering of photo-realistic images and videos for gesture animation from arbitrary views.
no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan
In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.
1 code implementation • CVPR 2023 • Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu
Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings.
Ranked #1 on Pose Estimation on MPII Human Pose
2 code implementations • ICCV 2023 • Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li
We find that existing detectors struggle to detect images generated by diffusion models, even if we include generated images from a specific diffusion model in their training data.
no code implementations • ICCV 2023 • Xinyue Huo, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian
Currently, a popular UDA framework lies in self-training which endows the model with two-fold abilities: (i) learning reliable semantics from the labeled images in the source domain, and (ii) adapting to the target domain via generating pseudo labels on the unlabeled images.
1 code implementation • 1 Mar 2023 • Depu Meng, Owen Sayer, Rusheng Zhang, Shengyin Shen, Houqiang Li, Henry X. Liu
With the traffic conflict data collected, we discover that failure to yield to circulating vehicles when entering the roundabout is the largest contributing reason for traffic conflicts.
no code implementations • 10 Feb 2023 • Weichao Zhao, Hezhen Hu, Wengang Zhou, Jiaxin Shi, Houqiang Li
In this work, we are dedicated to leveraging the BERT pre-training success and modeling the domain-specific statistics to fertilize the sign language recognition (SLR) model.
1 code implementation • 21 Jan 2023 • Hao Feng, Keyi Zhou, Wengang Zhou, Yufei Yin, Jiajun Deng, Qi Sun, Houqiang Li
It maintains a single estimate of the contour that is progressively deformed toward the object boundary.
Ranked #1 on Semantic Contour Prediction on SBD val
no code implementations • 13 Jan 2023 • Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang Li, Yanyong Zhang
To this end, we propose OA-BEV, a network that can be plugged into the BEV-based 3D object detection framework to bring out the objects by incorporating object-aware pseudo-3D features and depth features.
no code implementations • CVPR 2023 • Linfeng Qi, Jiahao Li, Bin Li, Houqiang Li, Yan Lu
Meanwhile, besides assisting frame coding at the current time step, the feature from context generation will be propagated as motion condition when coding the subsequent motion latent.
no code implementations • 15 Dec 2022 • Yuandong Ding, Mingxiao Feng, Guozi Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Houqiang Li, Yan Jin, Jiang Bian
In this paper, we consider the inventory management (IM) problem where we need to make replenishment decisions for a large number of stock keeping units (SKUs) to balance their supply and demand.
no code implementations • 28 Nov 2022 • Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li
In this work, we are dedicated to a new task, i.e., hand-object interaction image generation, which aims to conditionally generate the hand-object image under the given hand, object and their interaction status.
no code implementations • 28 Nov 2022 • YiXuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li
The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network.
1 code implementation • 22 Nov 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
Ranked #1 on Image Generation on Places50
no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo
However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?
no code implementations • 31 Oct 2022 • Yudong Lu, Jian Zhao, Youpeng Zhao, Wengang Zhou, Houqiang Li
We compare it with 8 baseline AI programs which are based on heuristic rules and the results reveal the outstanding performance of DanZero.
no code implementations • Findings (EMNLP) 2021 • Yuechen Wang, Wengang Zhou, Houqiang Li
In this work, we propose a novel candidate-free framework: Fine-grained Semantic Alignment Network (FSAN), for weakly supervised TLG.
2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li
In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.
1 code implementation • 15 Oct 2022 • Yonghui Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li
To this end, we propose UDoc-GAN, the first framework to address the problem of document illumination correction under the unpaired setting.
1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo
and 2) how to mitigate the impact of these factors?
Ranked #2 on Video Retrieval on MSR-VTT-1kA (using extra training data)
1 code implementation • 26 Aug 2022 • Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li
In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem.
no code implementations • 23 Aug 2022 • Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian
Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photography and autonomous driving.
no code implementations • 14 Jul 2022 • Zhaoyang Jia, Yan Lu, Houqiang Li
Since the current frame is not available in video frame synthesis, NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
Ranked #4 on Video Frame Interpolation on X4K1000FPS
1 code implementation • 14 Jul 2022 • Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation.
4 code implementations • 30 Jun 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).
no code implementations • 16 Jun 2022 • Xueliang Wang, Jianyu Cai, Shuiwang Ji, Houqiang Li, Feng Wu, Jie Wang
A major novelty of SALA is the task-adaptive metric, which can learn the metric adaptively for different tasks in an end-to-end fashion.
1 code implementation • 14 Jun 2022 • Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang
For another, we devise a Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.
1 code implementation • 8 Jun 2022 • Minrui Wang, Mingxiao Feng, Wengang Zhou, Houqiang Li
Using MARL algorithms to coordinate multiple control units in the grid, an approach that can handle rapid changes in power systems, has recently been widely studied for the active voltage control task.
no code implementations • 23 May 2022 • Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan
To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertisements applications.
1 code implementation • 8 May 2022 • Qing Li, Wengang Zhou, Zhenbo Lu, Houqiang Li
Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks.
no code implementations • 7 May 2022 • Zheng Chen, Jian Zhao, Mingyu Yang, Wengang Zhou, Houqiang Li
In this work, we are dedicated to multi-target active object tracking (AOT), where there are multiple targets as well as multiple cameras in the environment.
no code implementations • 5 May 2022 • Mingyu Yang, Jian Zhao, Xunhan Hu, Wengang Zhou, Jiangcheng Zhu, Houqiang Li
In this way, agents dealing with the same subtask share their learning of specific abilities and different subtasks correspond to different specific abilities.
1 code implementation • 25 Apr 2022 • Junshan Hu, Chaoxu Guo, Liansheng Zhuang, Biao Wang, Tiezheng Ge, Yuning Jiang, Houqiang Li
For the region perspective, we introduce the Region Evaluate Module (REM), which uses a new and efficient sampling method for proposal feature representation; this representation contains more contextual information than point features and is used to refine the category score and proposal boundary.
no code implementations • CVPR 2022 • Xinyue Huo, Lingxi Xie, Hengtong Hu, Wengang Zhou, Houqiang Li, Qi Tian
Unsupervised domain adaptation (UDA) is an important topic in the computer vision community.
1 code implementation • 6 Apr 2022 • Youpeng Zhao, Jian Zhao, Xunhan Hu, Wengang Zhou, Houqiang Li
Recent years have witnessed the great breakthrough of deep reinforcement learning (DRL) in various perfect and imperfect information games.
2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen
Since these ID labels automatically derived from tracklets inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.
Ranked #8 on
Person Re-Identification
on CUHK03
no code implementations • 21 Mar 2022 • Xiaodong Cun, Zhendong Wang, Chi-Man Pun, Jianzhuang Liu, Wengang Zhou, Xu Jia, Houqiang Li
Color constancy aims to restore the constant colors of a scene under different illuminants.
1 code implementation • 16 Mar 2022 • Jian Zhao, Youpeng Zhao, Weixun Wang, Mingyu Yang, Xunhan Hu, Wengang Zhou, Jianye Hao, Houqiang Li
To the best of our knowledge, this work is the first to study the unexpected crashes in the multi-agent system.
Multi-agent Reinforcement Learning
reinforcement-learning
+4
1 code implementation • 16 Mar 2022 • Jian Zhao, Xunhan Hu, Mingyu Yang, Wengang Zhou, Jiangcheng Zhu, Houqiang Li
In this way, CTDS balances the full utilization of global observation during training and the feasibility of decentralized execution for online inference.
Multi-agent Reinforcement Learning
reinforcement-learning
+3
no code implementations • 11 Mar 2022 • Lin Liu, Lingxi Xie, Xiaopeng Zhang, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Qi Tian
In this paper, we propose a novel approach that embeds a task-agnostic prior into a transformer.
no code implementations • 10 Mar 2022 • Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian
Recently, masked image modeling (MIM) has become a promising direction for visual pre-training.
1 code implementation • 22 Feb 2022 • Zeyu Fang, Jian Zhao, Mingyu Yang, Wengang Zhou, Zhenbo Lu, Houqiang Li
In our approach, we regard each camera as an agent and address AMOT with a multi-agent reinforcement learning solution.
1 code implementation • 21 Feb 2022 • Jian Zhao, Mingyu Yang, Youpeng Zhao, Xunhan Hu, Wengang Zhou, Jiangcheng Zhu, Houqiang Li
Specifically, we model both individual Q-values and global Q-value with categorical distribution.
no code implementations • 9 Feb 2022 • Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li
In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.
1 code implementation • 3 Feb 2022 • Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu
Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology.
no code implementations • CVPR 2021 • Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale by learning a Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition.
no code implementations • CVPR 2022 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li, Qi Tian
To this end, we propose a flexible contextual similarity distillation framework to enhance the small query model and keep its output feature compatible with that of the large gallery model, which is crucial for asymmetric retrieval.
no code implementations • 20 Dec 2021 • Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li
Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators.
no code implementations • 16 Dec 2021 • Chengrun Qiu, Dongheng Zhang, Yang Hu, Houqiang Li, Qibin Sun, Yan Chen
In this paper, we propose a radio-assisted human detection framework by incorporating radio information into the state-of-the-art detection methods, including anchor-based one-stage detectors and two-stage detectors.
1 code implementation • 12 Dec 2021 • Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li
Next, a refinement block is introduced to enhance the visual tokens with self-attention and cross-attention.
Ranked #3 on
Image Retrieval
on RParis (Medium)
no code implementations • 29 Nov 2021 • Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, Yanyong Zhang
However, this approach often suffers from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance.
no code implementations • NeurIPS 2021 • Chaoqun Wang, Shaobo Min, Xuejin Chen, Xiaoyan Sun, Houqiang Li
This enables DPPN to produce visual representations with accurate attribute localization ability, which benefits the semantic-visual alignment and representation transferability.
1 code implementation • 29 Oct 2021 • Yiheng Liu, Wengang Zhou, Qiaokang Xie, Houqiang Li
To this end, we propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling, in which we only need to know the locations of the cameras.
3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li
The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.
1 code implementation • NeurIPS 2021 • Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li
Since our re-ranking model is not directly involved with the visual feature used in the initial retrieval, it is ready to be applied to retrieval result lists obtained from various retrieval algorithms.
2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li
Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.
no code implementations • 11 Oct 2021 • Weiming Liu, Huacong Jiang, Bin Li, Houqiang Li
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms for Online Convex Optimization (OCO); they are mathematically elegant but less practical in solving Extensive-Form Games (EFGs).
no code implementations • ICCV 2021 • Hezhen Hu, Weichao Zhao, Wengang Zhou, Yuechen Wang, Houqiang Li
To validate the effectiveness of our method on SLR, we perform extensive experiments on four public benchmark datasets, i.e., NMFs-CSL, SLR500, MSASL and WLASL.
Ranked #3 on
Sign Language Recognition
on WLASL100
(using extra training data)
no code implementations • 29 Sep 2021 • Mingxiao Feng, Guozi Liu, Li Zhao, Lei Song, Jiang Bian, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
In this paper, we consider the inventory management (IM) problem for a single store with a large number of SKUs (stock keeping units), where we need to make replenishment decisions for each SKU to balance its supply and demand.
no code implementations • 26 Sep 2021 • Minghong Yao, Zhiguang Liu, Liangwei Wang, Houqiang Li, Liansheng Zhuang
However, collecting and labeling a large dataset is time-consuming and is not a user-friendly requirement for many cloud platforms.
no code implementations • 6 Sep 2021 • Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.
no code implementations • Findings (EMNLP) 2021 • Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li, Ming Zhou, Nan Duan
Then we cluster all the target languages into multiple groups and name each group as a representation sprachbund.
no code implementations • 25 Aug 2021 • Xiao Cui, Wengang Zhou, Yang Hu, Weilun Wang, Houqiang Li
The main idea is to disentangle the latent space of a pre-trained generation model and precisely control the face attributes of child images with clear semantics.
4 code implementations • ICCV 2021 • Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang
Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention.
Ranked #109 on
Object Detection
on COCO minival
no code implementations • ICCV 2021 • Weilun Wang, Wengang Zhou, Jianmin Bao, Dong Chen, Houqiang Li
In this paper, we uncover that the negative examples play a critical role in the performance of contrastive learning for image translation.
1 code implementation • ICCV 2021 • Yunyao Mao, Ning Wang, Wengang Zhou, Houqiang Li
In this work, we propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.
1 code implementation • 30 Jul 2021 • Jiajun Deng, Wengang Zhou, Yanyong Zhang, Houqiang Li
To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection.
1 code implementation • 3 Jul 2021 • Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu
Inspired by the two observations, in this work, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies based on supervised learning by leveraging off-policy data and policies with known performance.
1 code implementation • CVPR 2021 • Zhen Huang, Xu Shen, Jun Xing, Tongliang Liu, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-Sheng Hua
The inheritance part is learned with a similarity loss to transfer the existing learned knowledge from the teacher model to the student model, while the exploration part is encouraged to learn representations different from the inherited ones with a dis-similarity loss.
1 code implementation • 30 Jun 2021 • Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li
To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.
no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.
no code implementations • 24 Jun 2021 • Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Houqiang Li, Yanyong Zhang
In this survey, we first introduce the background of popular sensors used for self-driving, their data properties, and the corresponding object detection algorithms.
no code implementations • CVPR 2021 • Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian
Semi-supervised learning is a useful tool for image segmentation, mainly due to its ability to extract knowledge from unlabeled data to assist learning from labeled data.
no code implementations • CVPR 2021 • Hezhen Hu, Weilun Wang, Wengang Zhou, Weichao Zhao, Houqiang Li
Then, a transformation flow is calculated based on the correspondence of the source and target topology map.
1 code implementation • 17 Jun 2021 • Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.
Ranked #1 on
Molecular Property Prediction
on HIV dataset
4 code implementations • CVPR 2022 • Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li
Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration.
Ranked #3 on
Deblurring
on RealBlur-R (trained on GoPro)
no code implementations • 1 Jun 2021 • Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian
By simply pulling the different augmented views of each image together or other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models.
no code implementations • CVPR 2021 • Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li
Finally, the synthetic parallel data serves as a strong supplement for the end-to-end training of the encoder-decoder SLT framework.
Ranked #5 on
Sign Language Translation
on CSL-Daily
no code implementations • 19 Apr 2021 • Chenyi Lei, Shixian Luo, Yong Liu, Wanggui He, Jiamang Wang, Guoxin Wang, Haihong Tang, Chunyan Miao, Houqiang Li
Pre-trained neural models have recently achieved impressive performance in understanding multimodal content.
2 code implementations • ICCV 2021 • Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li
In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region of an image.
Ranked #18 on
Referring Expression Comprehension
on RefCOCO
3 code implementations • ACL 2021 • Weizhen Qi, Yeyun Gong, Yu Yan, Can Xu, Bolun Yao, Bartuer Zhou, Biao Cheng, Daxin Jiang, Jiusheng Chen, Ruofei Zhang, Houqiang Li, Nan Duan
ProphetNet is a pre-training based natural language generation method which shows powerful performance on English text summarization and question generation tasks.
no code implementations • 5 Apr 2021 • Chaoqun Wang, Xuejin Chen, Shaobo Min, Xiaoyan Sun, Houqiang Li
First, DCEN leverages task labels to cluster representations of the same semantic category by cross-modal contrastive learning and exploring semantic-visual complementarity.
2 code implementations • CVPR 2021 • Jialun Peng, Dong Liu, Songcen Xu, Houqiang Li
We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.
1 code implementation • ICLR 2021 • Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.
1 code implementation • ICCV 2021 • Zhen Huang, Dixiu Xue, Xu Shen, Xinmei Tian, Houqiang Li, Jianqiang Huang, Xian-Sheng Hua
Second, different body parts possess different scales, and even the same part in different frames can appear at different locations and scales.
Ranked #4 on
Gait Recognition
on OUMVLP
1 code implementation • ICCV 2021 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li
To this end, we propose a novel deep local feature learning architecture to simultaneously focus on multiple discriminative local patterns in an image.
no code implementations • 1 Jan 2021 • Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang
The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.
5 code implementations • 31 Dec 2020 • Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, Houqiang Li
In this paper, we take a slightly different viewpoint: we find that precise positioning of raw points is not essential for high-performance 3D object detection and that the coarse voxel granularity can also offer sufficient detection accuracy.
Ranked #4 on
3D Object Detection
on KITTI Cars Moderate val
1 code implementation • 31 Dec 2020 • Weizhen Qi, Yeyun Gong, Jian Jiao, Yu Yan, Weizhu Chen, Dayiheng Liu, Kewen Tang, Houqiang Li, Jiusheng Chen, Ruofei Zhang, Ming Zhou, Nan Duan
In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation.
1 code implementation • 9 Dec 2020 • Ning Wang, Wengang Zhou, Houqiang Li
It is worth mentioning that our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against the recent fully-supervised algorithms designed for the specific tasks (e.g., VOT and VOS).
1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen
In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.
Ranked #1 on
Person Re-Identification
on Market-1501
(mAP metric)
no code implementations • NeurIPS 2020 • Qi Zhou, Yufei Kuang, Zherui Qiu, Houqiang Li, Jie Wang
However, in continuous action spaces, integrating entropy regularization with expressive policies is challenging and usually requires complex inference procedures.
1 code implementation • 26 Nov 2020 • Zhen Huang, Xu Shen, Xinmei Tian, Houqiang Li, Jianqiang Huang, Xian-Sheng Hua
The topology of the adjacency graph is a key factor for modeling the correlations of the input skeletons.
no code implementations • 19 Nov 2020 • Xinyue Huo, Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Hao Li, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian
Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignored spatial information which is often crucial for visual representation.
no code implementations • 17 Nov 2020 • Longhui Wei, Lingxi Xie, Jianzhong He, Jianlong Chang, Xiaopeng Zhang, Wengang Zhou, Houqiang Li, Qi Tian
Recently, contrastive learning has largely advanced the progress of unsupervised visual representation learning.
1 code implementation • 21 Oct 2020 • Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, Ming Zhou
We build a dataset from a real-world sponsored search engine and carry out experiments to analyze different generative retrieval models.
1 code implementation • 15 Oct 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wengang Zhou, Tao Qin, Houqiang Li
During inference, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded.
no code implementations • 11 Oct 2020 • Junfu Pu, Wengang Zhou, Hezhen Hu, Houqiang Li
Continuous sign language recognition (SLR) deals with unaligned video-text pairs and uses the word error rate (WER), i.e., edit distance, as the main evaluation metric.
no code implementations • 21 Sep 2020 • Dengpan Fu, Bo Xin, Jingdong Wang, Dong-Dong Chen, Jianmin Bao, Gang Hua, Houqiang Li
Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with latest advanced re-ranking methods.
no code implementations • 24 Aug 2020 • Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li
Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), i.e., facial expression, mouth shapes, etc.
1 code implementation • 10 Aug 2020 • Yiheng Liu, Wengang Zhou, Mao Xi, Sanjing Shen, Houqiang Li
Existing person re-identification methods rely on the visual sensor to capture the pedestrians.
1 code implementation • 22 Jul 2020 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Wei Liu, Houqiang Li
The advancement of visual tracking has continuously been brought by deep learning models.
1 code implementation • 7 Jul 2020 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei
Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.
1 code implementation • 21 Jun 2020 • Hengrui Zhao, Dong Liu, Houqiang Li
Considering the tradeoff between activation quantization error and network learning ability, we set an empirical rule to tune the bound of each Bounded ReLU.
no code implementations • 18 Jun 2020 • Ning Wang, Wengang Zhou, Qi Tian, Houqiang Li
In the second stage, a discrete sampling based ridge regression is designed to double-check the remaining ambiguous hard samples, which serves as an alternative of fully-connected layers and benefits from the closed-form solver for efficient learning.
1 code implementation • CVPR 2020 • Jianping Lin, Dong Liu, Houqiang Li, Feng Wu
To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, making use of the multiple reference frames as well.
no code implementations • 31 Mar 2020 • Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei
It has been well recognized that modeling human-object or object-object relations would be helpful for detection task.
3 code implementations • ICLR 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, in NMT, our preliminary exploration finds that using BERT as a contextual embedding works better than using it for fine-tuning.
no code implementations • 8 Feb 2020 • Hao Zhou, Wengang Zhou, Yun Zhou, Houqiang Li
Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module.
2 code implementations • 6 Feb 2020 • Qiwei He, Liansheng Zhuang, Houqiang Li
However, due to the brittleness of deterministic methods, HER and its variants typically suffer from a major challenge for stability and convergence, which significantly affects the final performance.
1 code implementation • 28 Nov 2019 • Qi Zhou, Houqiang Li, Jie Wang
In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve the asymptotic performance using the uncertainty in Q-values.
Model-based Reinforcement Learning
reinforcement-learning
+2
no code implementations • 28 Nov 2019 • Guanhua Zheng, Jitao Sang, Houqiang Li, Jian Yu, Changsheng Xu
The derived generalization bound based on the ITID assumption identifies the significance of hypothesis invariance in guaranteeing generalization performance.
1 code implementation • CVPR 2019 • Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-Sheng Hua
The proposed quantization function can be learned in a lossless and end-to-end manner and works for any weights and activations of neural networks in a simple and uniform way.
no code implementations • 16 Nov 2019 • Feng Lin, Haohang Xu, Houqiang Li, Hongkai Xiong, Guo-Jun Qi
For this reason, we should use the geodesic to characterize how an image transforms along the manifold of a transformation group, and adopt its length to measure the deviation between transformations.
no code implementations • 25 Oct 2019 • Yiheng Liu, Wengang Zhou, Jianzhuang Liu, Guo-Jun Qi, Qi Tian, Houqiang Li
By presenting a target attention loss, the pedestrian features extracted from the foreground branch become more insensitive to the backgrounds, which greatly reduces the negative impacts of changing backgrounds on matching the same identity across different camera views.
1 code implementation • 25 Oct 2019 • Qiaokang Xie, Wengang Zhou, Guo-Jun Qi, Qi Tian, Houqiang Li
In our approach, we first collect tracklet data within each camera by automatic person detection and tracking.
no code implementations • 25 Sep 2019 • Minghong Yao, Liansheng Zhuang, Houqiang Li, Jian Yang, Shafei Wang
Results show that our model can outperform the dominant models consistently in these tasks.
2 code implementations • ICCV 2019 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei
In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.
1 code implementation • 23 Jul 2019 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li
In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.
no code implementations • 28 May 2019 • Zhengguang Zhou, Wengang Zhou, Richang Hong, Houqiang Li
Pruning filters is an effective method for accelerating deep neural networks (DNNs), but most existing approaches prune filters on a pre-trained network directly, which limits the achievable acceleration.
no code implementations • 28 May 2019 • Zhengguang Zhou, Wengang Zhou, Xutao Lv, Xuan Huang, Xiaoyu Wang, Houqiang Li
Recent years have witnessed the great advance of deep learning in a variety of vision tasks.