no code implementations • 20 Apr 2025 • Siyi Jiao, Wenzheng Zeng, Yerong Li, Huayu Zhang, Changxin Gao, Nong Sang, Mike Zheng Shou
In this work, we address this by introducing MP-Mat, a novel 3D-and-instance-aware matting framework with multiplane representation, where the multiplane concept is designed from two different perspectives: scene geometry level and instance level.
no code implementations • 15 Apr 2025 • Minghui Lin, Shu Wang, Xiang Wang, Jianhua Tang, Longbin Fu, Zhengrong Zuo, Nong Sang
Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (i.e., ViT) have displayed remarkable progress and achieved excellent performance.
no code implementations • 15 Apr 2025 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Yujie Wei, Yingya Zhang, Changxin Gao, Yuehuan Wang, Nong Sang
Recent advancements in human image animation have been propelled by video diffusion models, yet their reliance on numerous iterative denoising steps results in high inference costs and slow speeds.
1 code implementation • 15 Apr 2025 • Xiang Wang, Shiwei Zhang, Longxiang Tang, Yingya Zhang, Changxin Gao, Yuehuan Wang, Nong Sang
Furthermore, we adopt a simple concatenation operation to integrate the reference appearance into the model and incorporate the pose information of the reference image for enhanced pose alignment.
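The concatenation-based conditioning described above can be pictured with a minimal PyTorch sketch. This is not the paper's actual interface; every tensor shape, the channel-wise concatenation, and the prepended reference pose are assumptions used purely for illustration.

```python
import torch

# Hypothetical shapes: B = batch, C = latent channels, T = frames, H x W = spatial size
B, C, T, H, W = 1, 4, 16, 32, 32

noisy_video = torch.randn(B, C, T, H, W)   # noisy video latent to be denoised
ref_latent  = torch.randn(B, C, 1, H, W)   # latent of the reference image (appearance)
pose_seq    = torch.randn(B, 3, T, H, W)   # driving pose maps
ref_pose    = torch.randn(B, 3, 1, H, W)   # pose extracted from the reference image

# Broadcast the reference latent over time and concatenate along channels,
# so every frame "sees" the reference appearance.
ref_rep = ref_latent.expand(-1, -1, T, -1, -1)
denoiser_input = torch.cat([noisy_video, ref_rep], dim=1)   # (B, 2C, T, H, W)

# Prepend the reference pose to the driving poses for tighter pose alignment.
pose_input = torch.cat([ref_pose, pose_seq], dim=2)         # (B, 3, T+1, H, W)

print(denoiser_input.shape, pose_input.shape)
```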
5 code implementations • 14 Apr 2025 • Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc van Gool, Kaijin Zhang, Qingpeng Nong, Xiugang Dong, Hong Gao, Xiangsheng Zhou, Jiancheng Pan, Yanxing Liu, Xiao He, Jiahao Li, Yuze Sun, Xiaomeng Huang, Zhenyu Zhang, Ran Ma, YuHan Liu, Zijian Zhuang, Shuai Yi, Yixiong Zou, Lingyi Hong, Mingxi Chen, Runze Li, Xingdong Sheng, Wenqiang Zhang, Weisen Chen, Yongxin Yan, Xinguo Chen, Yuanjie Shao, Zhengrong Zuo, Nong Sang, Hao Wu, Haoran Sun, Shuming Hu, Yan Zhang, Zhiguang Shi, Yu Zhang, Chao Chen, Tao Wang, Da Feng, Linhai Zhuo, Ziming Lin, Yali Huang, Jie Me, Yiming Yang, Mi Guo, Mingyuan Jiu, Mingliang Xu, Maomao Xiong, Qunshu Zhang, Xinyu Cao, Yuqing Yang, Dianmo Sheng, Xuanpu Zhao, Zhiyu Li, Xuyang Ding, Wenqian Li
Cross-Domain Few-Shot Object Detection (CD-FSOD) poses significant challenges to existing object detection and few-shot detection models when applied across domains.
Cross-Domain Few-Shot • Cross-Domain Few-Shot Object Detection (+3 more)
1 code implementation • 27 Mar 2025 • Minghui Lin, Xiang Wang, Yishan Wang, Shu Wang, Fengqi Dai, Pengxiang Ding, Cunxiang Wang, Zhengrong Zuo, Nong Sang, Siteng Huang, Donglin Wang
Video generation has witnessed significant progress in recent years, especially with the rapid advancement of diffusion models.
no code implementations • 13 Mar 2025 • Nannan Wu, Zengqiang Yan, Nong Sang, Li Yu, Chang Wen Chen
In this paper, we attribute this competition to the homogeneity in loss patterns exhibited by rare and mislabeled data clients, preventing existing loss-based fair and robust FL methods from effectively distinguishing and handling these two distinct client types.
no code implementations • 3 Mar 2025 • Huayu Zhang, Dongyue Wu, Yuanjie Shao, Nong Sang, Changxin Gao
Recently, trimap-free methods have drawn increasing attention in human video matting due to their promising performance.
1 code implementation • 5 Feb 2025 • Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang
In web data, advertising images are crucial for capturing user attention and improving advertising effectiveness.
no code implementations • 17 Dec 2024 • Guilin Zhu, Dongyue Wu, Changxin Gao, Runmin Wang, Weidong Yang, Nong Sang
However, they overlook a critical issue: in CISS, the representation of class knowledge is updated continuously through incremental learning, whereas prototype replay methods maintain fixed prototypes.
Class-Incremental Semantic Segmentation • Incremental Learning
1 code implementation • 17 Dec 2024 • Dongyue Wu, Zilin Guo, Li Yu, Nong Sang, Changxin Gao
Within this framework, we introduce a spatial-aware redundancy metric based on feature maps, thus endowing the pruning process with location sensitivity to better adapt to pruning segmentation networks.
1 code implementation • 9 Dec 2024 • Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Shanjun Zhang, Li Yu, Nong Sang
For efficient anomaly detection in long videos, we propose the Anomaly-focused Temporal Sampler (ATS).
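A hedged sketch of what an anomaly-focused sampler could do: keep only the frames whose anomaly scores are highest before passing them to a heavier model. The top-k rule and the function name are assumptions, not the ATS algorithm itself.

```python
import torch

def anomaly_focused_sampling(frame_scores: torch.Tensor, num_keep: int) -> torch.Tensor:
    """Return the (temporally ordered) indices of the frames with the highest anomaly response."""
    num_keep = min(num_keep, frame_scores.numel())
    topk = torch.topk(frame_scores, k=num_keep).indices
    return torch.sort(topk).values  # restore temporal order

scores = torch.rand(1000)                      # e.g., one score per frame of a long video
keep = anomaly_focused_sampling(scores, num_keep=32)
print(keep[:10])
```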
no code implementations • 17 Oct 2024 • Yanpeng Sun, Huaxin Zhang, Qiang Chen, Xinyu Zhang, Nong Sang, Gang Zhang, Jingdong Wang, Zechao Li
QLadder employs a learnable "ladder" structure to deeply aggregate the intermediate representations from the frozen pretrained visual encoder (e.g., the CLIP image encoder).
Ranked #165 on Visual Question Answering on MM-Vet
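To make the "ladder" idea concrete, here is a minimal sketch of learnable queries that aggregate intermediate features tapped from a frozen encoder via cross-attention. The module name, the residual update, and all dimensions are assumptions; the actual QLadder design may differ.

```python
import torch
import torch.nn as nn

class LadderAggregator(nn.Module):
    """Aggregate intermediate features of a frozen visual encoder with learnable queries."""
    def __init__(self, dim: int, num_queries: int = 16, num_layers: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=8, batch_first=True) for _ in range(num_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_layers))

    def forward(self, intermediate_feats):
        # intermediate_feats: list of (B, N, dim) tensors tapped from the frozen encoder
        q = self.queries.expand(intermediate_feats[0].size(0), -1, -1)
        for attn, norm, feat in zip(self.blocks, self.norms, intermediate_feats):
            out, _ = attn(q, feat, feat)   # cross-attend to one tapped layer
            q = norm(q + out)              # residual "ladder" update
        return q                           # extra visual tokens passed to the language model

feats = [torch.randn(2, 196, 768) for _ in range(4)]   # e.g., four frozen CLIP-ViT layers
print(LadderAggregator(768)(feats).shape)              # (2, 16, 768)
```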
1 code implementation • 13 Oct 2024 • Siyi Jiao, Wenzheng Zeng, Changxin Gao, Nong Sang
(2) Existing works are limited to a single type of user input, which is ineffective for intention understanding and also inefficient for user operation.
no code implementations • 30 Sep 2024 • Xiang Wang, Changxin Gao, Yuehuan Wang, Nong Sang
Recent advancements in controllable human-centric video generation, particularly with the rise of diffusion models, have demonstrated considerable progress.
1 code implementation • 27 Sep 2024 • Jialong Zuo, Ying Nie, Hanyu Zhou, Huaxin Zhang, Haoyu Wang, Tianyu Guo, Nong Sang, Changxin Gao
For example, compared with the previous state-of-the-art ISR, CION with the same ResNet50-IBN achieves higher mAP of 93.3% and 74.3% on Market1501 and MSMT17, respectively, while only utilizing 8% of the training samples.
1 code implementation • 18 Jun 2024 • Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang
We train a lightweight temporal sampler to select frames with high anomaly response and fine-tune a multimodal large language model (LLM) to generate explanatory content.
1 code implementation • CVPR 2024 • Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao
To learn a consistent semantic structure from CLIP, the SSC Loss aligns the inter-class affinity in the image feature space with that in the text feature space of CLIP, thereby improving the generalization ability of our model.
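One way to picture such an affinity-alignment objective is below: build the inter-class affinity matrices in both spaces and penalize their discrepancy. The L1 distance and the tensor shapes are assumptions; the paper's exact SSC Loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def ssc_style_loss(img_class_embeds: torch.Tensor, txt_class_embeds: torch.Tensor) -> torch.Tensor:
    """Align inter-class affinities between image-space and CLIP text-space class embeddings.

    img_class_embeds: (K, D) class embeddings pooled from image features.
    txt_class_embeds: (K, D) CLIP text embeddings of the class names.
    """
    img = F.normalize(img_class_embeds, dim=-1)
    txt = F.normalize(txt_class_embeds, dim=-1)
    aff_img = img @ img.t()    # (K, K) affinity among classes in image space
    aff_txt = txt @ txt.t()    # (K, K) affinity among classes in text space
    return F.l1_loss(aff_img, aff_txt)

print(ssc_style_loss(torch.randn(20, 512), torch.randn(20, 512)).item())
```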
2 code implementations • 3 Jun 2024 • Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang
First, to reduce the optimization difficulty and ensure temporal coherence, we map the reference image along with the posture guidance and noise video into a common feature space by incorporating a unified video diffusion model.
no code implementations • 26 Apr 2024 • Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao
To generate coherent motions, we first leverage the Kalman filter to construct smooth crops in the focus tunnel and inject the position embedding of the tunnel into attention layers to improve the continuity of the generated videos.
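Below is a hedged sketch of Kalman-style smoothing applied to one crop coordinate over time, which is the kind of filtering that could turn jittery detections into a smooth focus tunnel. The constant-position model and the noise values are assumptions; the paper's filter setup is not specified in this snippet.

```python
import numpy as np

def kalman_smooth_1d(z, q=1e-3, r=1e-1):
    """Causal Kalman filtering of a 1-D measurement sequence (e.g., crop center x).
    q: assumed process noise, r: assumed measurement noise."""
    x, p = z[0], 1.0              # state estimate and its variance
    out = [x]
    for meas in z[1:]:
        p = p + q                 # predict (constant-position model)
        k = p / (p + r)           # Kalman gain
        x = x + k * (meas - x)    # correct with the new measurement
        p = (1 - k) * p
        out.append(x)
    return np.asarray(out)

# Noisy detected crop centers over time -> smoother tunnel trajectory
cx = np.cumsum(np.random.randn(100)) + 200 + 5 * np.random.randn(100)
print(kalman_smooth_1d(cx)[:5])
```

In practice each box coordinate (center, width, height) would be filtered the same way, possibly with a constant-velocity state instead.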
no code implementations • 13 Mar 2024 • Ruochen Zheng, Jiahao Hong, Changxin Gao, Nong Sang
Unfortunately, obtaining precise annotations in the multimodal field is expensive, which has prompted some methods to tackle the mismatched data pair issue in cross-modal matching contexts, termed noisy correspondence.
1 code implementation • 10 Mar 2024 • Huaxin Zhang, Xiang Wang, Xiaohao Xu, Xiaonan Huang, Chuchu Han, Yuehuan Wang, Changxin Gao, Shanjun Zhang, Nong Sang
In recent years, video anomaly detection has been extensively investigated in both unsupervised and weakly supervised settings to alleviate costly temporal labeling.
no code implementations • 1 Mar 2024 • Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang
We introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method to address these challenges.
1 code implementation • CVPR 2024 • Ziwen Li, Feng Zhang, Meng Cao, Jinpu Zhang, Yuanjie Shao, Yuehuan Wang, Nong Sang
Specifically, the global transformation adjusts the overall appearance using image-adaptive 3D LUTs to provide decent global contrast and sharp details, while the pixel transformation compensates for local context.
1 code implementation • 28 Dec 2023 • Zhengze Xu, Dongyue Wu, Changqian Yu, Xiangxiang Chu, Nong Sang, Changxin Gao
Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context.
1 code implementation • CVPR 2024 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang
Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach.
Ranked #7 on Text-to-Video Generation on MSR-VTT
2 code implementations • 14 Dec 2023 • Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang
Consistency models have demonstrated powerful capability in efficient image generation and allowed synthesis within a few sampling steps, alleviating the high computational cost in diffusion models.
1 code implementation • CVPR 2024 • Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang
At the structure level, we decompose the T2V task into two steps, including spatial reasoning and temporal reasoning, using a unified denoiser.
Ranked #6 on Text-to-Video Generation on MSR-VTT
1 code implementation • CVPR 2024 • Jialong Zuo, Hanyu Zhou, Ying Nie, Feng Zhang, Tianyu Guo, Nong Sang, Yunhe Wang, Changxin Gao
Firstly, we construct a new dataset named UFine6926.
1 code implementation • NeurIPS 2023 • Feng Zhang, Ming Tian, Zhiqiang Li, Bin Xu, Qingbo Lu, Changxin Gao, Nong Sang
Furthermore, we utilize local Laplacian filters to refine the edge details in the high-frequency components in an adaptive manner.
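The high-/low-frequency split that such refinement relies on can be illustrated with a plain Laplacian pyramid; this sketch only shows the decomposition and its reconstruction, not the adaptive local Laplacian filtering used in the paper, and all sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(img: torch.Tensor, levels: int = 3):
    """Split an image into high-frequency (edge/detail) bands plus a low-frequency base."""
    bands, cur = [], img
    for _ in range(levels):
        down = F.avg_pool2d(cur, kernel_size=2)
        up = F.interpolate(down, size=cur.shape[-2:], mode="bilinear", align_corners=False)
        bands.append(cur - up)   # high-frequency residual at this scale
        cur = down
    return bands, cur

img = torch.rand(1, 3, 256, 256)
high_bands, low = laplacian_pyramid(img)
# A refinement module could adaptively enhance each high-frequency band here
# (e.g., refined = band * predicted_gain) before rebuilding the image.
recon = low
for band in reversed(high_bands):
    recon = F.interpolate(recon, size=band.shape[-2:], mode="bilinear", align_corners=False) + band
print((recon - img).abs().max())   # ~0: the pyramid reconstructs the input exactly
```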
no code implementations • 16 Oct 2023 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang
In this paper, we develop an effective plug-and-play framework called CapFSAR to exploit the knowledge of multimodal models without manually annotating text.
1 code implementation • ICCV 2023 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang
When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.
1 code implementation • 24 Aug 2023 • Huaxin Zhang, Xiang Wang, Xiaohao Xu, Zhiwu Qing, Changxin Gao, Nong Sang
For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class.
Ranked #1 on Weakly Supervised Action Localization on BEOID
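An online, per-class prototype memory of the kind described in the entry above can be sketched with a simple exponential-moving-average update; the momentum value, cosine scoring, and class-wise averaging are assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

class PrototypeMemory:
    """Keeps one prototype per class, updated online by exponential moving average."""
    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        self.protos = torch.zeros(num_classes, dim)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor):
        # feats: (N, dim) reliable snippet features; labels: (N,) class ids
        for c in labels.unique():
            mean_c = feats[labels == c].mean(dim=0)
            self.protos[c] = self.momentum * self.protos[c] + (1 - self.momentum) * mean_c

    def similarity(self, feats: torch.Tensor) -> torch.Tensor:
        # Cosine similarity of snippet features to every class prototype.
        return F.normalize(feats, dim=-1) @ F.normalize(self.protos, dim=-1).t()

mem = PrototypeMemory(num_classes=20, dim=256)
mem.update(torch.randn(64, 256), torch.randint(0, 20, (64,)))
print(mem.similarity(torch.randn(8, 256)).shape)   # (8, 20)
```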
1 code implementation • ICCV 2023 • Feng Zhang, Bin Xu, Zhiqiang Li, Xinran Liu, Qingbo Lu, Changxin Gao, Nong Sang
To address this issue, we introduce a new perspective to synthesize the signal-independent noise by a generative model.
Ranked #2 on Image Denoising on SID SonyA7S2 x300
1 code implementation • IEEE Transactions on Image Processing 2023 • Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao
We conduct extensive experiments on ADE20K, Cityscapes, and Pascal Context, and the results show that applying the CBL to various popular segmentation networks can significantly improve the mIoU and boundary F-score performance.
Ranked #25 on Semantic Segmentation on Cityscapes val
1 code implementation • 15 May 2023 • Jialong Zuo, Jiahao Hong, Feng Zhang, Changqian Yu, Hanyu Zhou, Changxin Gao, Nong Sang, Jingdong Wang
To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP.
Ranked #7 on Text based Person Retrieval on ICFG-PEDES
1 code implementation • CVPR 2023 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang
To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.
1 code implementation • 6 Mar 2023 • Xiang Wang, Shiwei Zhang, Jun Cen, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang
Learning from large-scale contrastive language-image pre-training like CLIP has shown remarkable success in a wide range of downstream tasks recently, but it is still under-explored on the challenging few-shot action recognition (FSAR) task.
no code implementations • 12 Jan 2023 • Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Changxin Gao, Nong Sang
Under this novel view, we propose a Class Center Similarity layer (CCS layer) to address the above-mentioned challenges by generating adaptive class centers conditioned on different scenes and supervising the similarities between class centers.
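A minimal sketch of the idea of scene-adaptive class centers: pool pixel features per class for each image (here with ground-truth masks) and measure pixel-to-center similarities. The masked average pooling and cosine similarity are assumptions used for illustration, not the exact CCS layer.

```python
import torch
import torch.nn.functional as F

def class_centers_and_similarity(feat: torch.Tensor, label: torch.Tensor, num_classes: int):
    """feat: (B, D, H, W) pixel features; label: (B, H, W) class map at the same resolution."""
    B, D, H, W = feat.shape
    onehot = F.one_hot(label, num_classes).permute(0, 3, 1, 2).float()   # (B, K, H, W)
    flat_feat = feat.flatten(2)                                          # (B, D, HW)
    flat_mask = onehot.flatten(2)                                        # (B, K, HW)
    centers = flat_mask @ flat_feat.transpose(1, 2)                      # (B, K, D) per-image sums
    centers = centers / flat_mask.sum(-1, keepdim=True).clamp(min=1)     # masked average -> adaptive centers
    sim = torch.einsum("bkd,bdn->bkn",
                       F.normalize(centers, dim=-1),
                       F.normalize(flat_feat, dim=1))                    # pixel-to-center similarities
    return centers, sim.reshape(B, num_classes, H, W)

centers, sim = class_centers_and_similarity(torch.randn(2, 64, 32, 32),
                                            torch.randint(0, 19, (2, 32, 32)), 19)
print(centers.shape, sim.shape)   # (2, 19, 64) and (2, 19, 32, 32)
```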
1 code implementation • 9 Jan 2023 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang
To be specific, HyRSM++ consists of two key components, a hybrid relation module and a temporal set matching metric.
no code implementations • 9 Jan 2023 • Huan Peng, Fenggang Liu, Yangguang Li, Bin Huang, Jing Shao, Nong Sang, Changxin Gao
Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2022 • Chuchu Han, Zhedong Zheng, Kai Su, Dongdong Yu, Zehuan Yuan, Changxin Gao, Nong Sang, Yi Yang
Person search aims at localizing and recognizing query persons from raw video frames, which is a combination of two sub-tasks, i.e., pedestrian detection and person re-identification.
Ranked #3 on Person Search on PRW
no code implementations • 2 Nov 2022 • Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian
The former is to reduce the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate the lost spatio-temporal details in the Frame Condensing stage.
1 code implementation • 24 Jul 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yuehuan Wang, Yiliang Lv, Changxin Gao, Nong Sang
Inspired by this, we propose Masked Action Recognition (MAR), which reduces the redundant computation by discarding a proportion of patches and operating only on a part of the videos.
Ranked #11 on Action Recognition on Something-Something V2
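The patch-discarding step can be pictured as random token sub-sampling before the backbone; the uniform-random masking below is a simplification (MAR's actual masking strategy and the token layout are assumptions here).

```python
import torch

def mask_video_patches(tokens: torch.Tensor, keep_ratio: float = 0.5):
    """Randomly keep a subset of patch tokens so only part of the video is processed.

    tokens: (B, N, D) patch tokens of a video clip.
    """
    B, N, D = tokens.shape
    num_keep = max(1, int(N * keep_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]                       # random subset per sample
    visible = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep_idx

tokens = torch.randn(4, 1568, 768)   # e.g., 8 frames x 14 x 14 patches (assumed layout)
visible, idx = mask_video_patches(tokens, keep_ratio=0.5)
print(visible.shape)                 # (4, 784, 768)
```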
no code implementations • 18 Jun 2022 • Xiang Wang, Huaxin Zhang, Shiwei Zhang, Changxin Gao, Yuanjie Shao, Nong Sang
This technical report presents our first-place winning solution for the temporal action detection task in the CVPR 2022 ActivityNet Challenge.
1 code implementation • CVPR 2022 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang
To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.
no code implementations • CVPR 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang
In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos.
no code implementations • 12 Mar 2022 • Fusen Wang, Kai Liu, Fei Long, Nong Sang, Xiaofeng Xia, Jun Sang
However, the transformer directly partitions the crowd images into a series of tokens, which may not be a good choice since each pedestrian is an independent individual; moreover, the network has a very large number of parameters.
no code implementations • 22 Dec 2021 • Yuhang Wu, Tengteng Huang, Haotian Yao, Chi Zhang, Yuanjie Shao, Chuchu Han, Changxin Gao, Nong Sang
First, we present a Domain-Specific Contrastive Learning (DSCL) mechanism to fully explore intradomain information by comparing samples only from the same domain.
Contrastive Learning • Domain Adaptive Person Re-Identification (+2 more)
1 code implementation • 15 Dec 2021 • Zongheng Huang, Yifan Sun, Chuchu Han, Changxin Gao, Nong Sang
By combining two fundamental learning approaches in DML, i.e., classification training and pairwise training, we set up a strong baseline for ZS-SBIR.
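Combining classification and pairwise training usually means optimizing a cross-entropy head and a metric (e.g., triplet) loss on the same embedding. The sketch below shows that pattern; the loss weighting, margin, and network sizes are assumptions, not the paper's baseline configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DMLBaseline(nn.Module):
    """Embedding network trained jointly with a classification head and a pairwise loss."""
    def __init__(self, in_dim: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.Linear(in_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        z = F.normalize(self.embed(x), dim=-1)
        return z, self.classifier(z)

model = DMLBaseline(in_dim=2048, embed_dim=512, num_classes=100)
feats, labels = torch.randn(32, 2048), torch.randint(0, 100, (32,))
anchor, positive, negative = torch.randn(8, 2048), torch.randn(8, 2048), torch.randn(8, 2048)

z, logits = model(feats)
cls_loss = F.cross_entropy(logits, labels)                         # classification training
trip_loss = F.triplet_margin_loss(model(anchor)[0], model(positive)[0],
                                  model(negative)[0], margin=0.3)  # pairwise training
print((cls_loss + trip_loss).item())                               # equal weighting is an assumption
```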
no code implementations • 5 Dec 2021 • Moran Li, Haibin Huang, Yi Zheng, Mengtian Li, Nong Sang, Chongyang Ma
In this work, we present a new method for 3D face reconstruction from sparse-view RGB images.
1 code implementation • 3 Dec 2021 • Feng Zhang, Yuanjie Shao, Yishi Sun, Kai Zhu, Changxin Gao, Nong Sang
We introduce a Noise Disentanglement Module (NDM) to disentangle the noise and content in the reflectance maps with the reliable aid of unpaired clean images.
Ranked #3 on Low-Light Image Enhancement on MEF
1 code implementation • 25 Nov 2021 • Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang
Image manipulation with StyleGAN has been an increasing concern in recent years. Recent works have achieved tremendous success in analyzing several semantic latent spaces to edit the attributes of the generated images. However, due to the limited semantic and spatial manipulation precision in these latent spaces, the existing endeavors are defeated in fine-grained StyleGAN image manipulation, i.e., local attribute translation. To address this issue, we discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles.
2 code implementations • 21 Sep 2021 • Changqian Yu, Yuanjie Shao, Changxin Gao, Nong Sang
The last layer of an FCN is typically a global classifier (1x1 convolution) that assigns each pixel to a semantic label.
Ranked #21 on Semantic Segmentation on PASCAL Context
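For reference, the global classifier described in the entry above is just a 1x1 convolution shared across all images; this tiny sketch shows that standard head (sizes and class count are arbitrary examples).

```python
import torch
import torch.nn as nn

feat = torch.randn(2, 512, 64, 64)                      # (B, D, H, W) backbone features
global_classifier = nn.Conv2d(512, 19, kernel_size=1)   # one fixed weight per class, e.g. 19 Cityscapes classes
logits = global_classifier(feat)                        # (B, 19, H, W), later upsampled to full resolution
print(logits.shape)
```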
no code implementations • ICCV 2021 • Chuchu Han, Kai Su, Dongdong Yu, Zehuan Yuan, Changxin Gao, Nong Sang, Yi Yang, Changhu Wang
Large-scale labeled training data is often difficult to collect, especially for person identities.
1 code implementation • 24 Aug 2021 • Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang
The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and the learned change in the disparity along the training process is beneficial to learning a strong representation.
no code implementations • 24 Jun 2021 • Zhiwu Qing, Xiang Wang, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang
Temporal action localization aims to localize the start and end times of actions along with their categories.
1 code implementation • ICCV 2021 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang
Most recent approaches for online action detection tend to apply Recurrent Neural Network (RNN) to capture long-range temporal structure.
Ranked #8 on Online Action Detection on THUMOS'14
no code implementations • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang
Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).
Weakly-supervised Learning • Weakly-supervised Temporal Action Localization (+1 more)
1 code implementation • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang
We calculate the detection results by assigning the proposals with corresponding classification results.
Ranked #3 on Temporal Action Localization on ActivityNet-1.3 (using extra training data)
1 code implementation • 13 Jun 2021 • Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang
This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in the CVPR 2021 workshop.
1 code implementation • 9 Jun 2021 • Ziyuan Huang, Zhiwu Qing, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Zhurong Xia, Mingqian Tang, Nong Sang, Marcelo H. Ang Jr
In this paper, we present empirical results for training a stronger video vision transformer on the EPIC-KITCHENS-100 Action Recognition dataset.
no code implementations • 4 Jun 2021 • Fusen Wang, Jun Sang, Zhongyuan Wu, Qi Liu, Nong Sang
In this paper, we propose a Hybrid Attention Network (HAN) by employing Progressive Embedding Scale-context (PES) information, which enables the network to simultaneously suppress noise and adapt head scale variation.
15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.
Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)
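A simplified picture of conditional channel weighting is a lightweight squeeze-and-excitation-style module that predicts per-channel weights from pooled context instead of running a dense 1x1 convolution. This is a sketch under that simplification; Lite-HRNet's actual cross-resolution weighting is more involved.

```python
import torch
import torch.nn as nn

class ConditionalChannelWeighting(nn.Module):
    """Rescale channels with weights predicted from global context (simplified sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3), keepdim=True))   # (B, C, 1, 1) conditional weights
        return x * w                                    # cheap reweighting instead of a dense 1x1 conv

x = torch.randn(2, 40, 64, 48)
print(ConditionalChannelWeighting(40)(x).shape)
```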
1 code implementation • CVPR 2021 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Changxin Gao, Nong Sang
In this paper, we focus on applying the power of self-supervised methods to improve semi-supervised action proposal generation.
Ranked #2 on Semi-Supervised Action Detection on ActivityNet-1.3
1 code implementation • CVPR 2021 • Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang
In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement.
Ranked #10 on Temporal Action Localization on ActivityNet-1.3
1 code implementation • 22 Feb 2021 • Chuchu Han, Zhedong Zheng, Changxin Gao, Nong Sang, Yi Yang
Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework.
Ranked #8 on Person Search on PRW
1 code implementation • ICCV 2021 • Shizhen Zhao, Changxin Gao, Yuanjie Shao, Wei-Shi Zheng, Nong Sang
Specifically, to alleviate the intra-class variations, a clustering method is utilized to generate pseudo labels for both visual and textual instances.
1 code implementation • 17 Dec 2020 • Moran Li, Yuan Gao, Nong Sang
This is different from the previous methods where all the joints are considered holistically and share the same feature.
1 code implementation • ECCV 2020 • Shizhen Zhao, Changxin Gao, Jun Zhang, Hao Cheng, Chuchu Han, Xinyang Jiang, Xiaowei Guo, Wei-Shi Zheng, Nong Sang, Xing Sun
In the conventional person Re-ID setting, it is widely assumed that each cropped person image contains a single individual.
no code implementations • ECCV 2020 • Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang
In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.
no code implementations • 7 Aug 2020 • Xiang Wang, Changxin Gao, Shiwei Zhang, Nong Sang
By this means, the proposed MLTPN can learn rich and discriminative features for different action instances with different durations.
1 code implementation • ECCV 2020 • Yanrui Bin, Xuan Cao, Xinya Chen, Yanhao Ge, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Changxin Gao, Nong Sang
Human pose estimation is the task of localizing body keypoints from still images.
1 code implementation • 13 Jun 2020 • Xiang Wang, Baiteng Ma, Zhiwu Qing, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang
In this report, we present our solution for the task of temporal action localization (detection) (task 1) in ActivityNet Challenge 2020.
no code implementations • 13 Jun 2020 • Zhiwu Qing, Xiang Wang, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang
This technical report analyzes a temporal action localization method we used in the HACS competition hosted in the ActivityNet Challenge 2020. The goal of our task is to locate the start time and end time of the action in the untrimmed video and to predict the action category. Firstly, we utilize the video-level feature information to train multiple video-level action classification models.
no code implementations • 20 May 2020 • Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang
The module builds a fully connected directed graph between the regions of different density, where each node (region) is represented by a weighted, globally pooled feature, and a GCN is learned to map this region graph to a set of relation-aware region representations.
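The region-graph reasoning above boils down to one message-passing step per GCN layer: mix region features according to edge weights, then project. Below is a minimal sketch; the layer structure, normalization, and shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step over a fully connected region graph."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, region_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # region_feats: (B, R, D) pooled region features; adj: (B, R, R) directed edge weights
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)   # row-normalise the graph
        return F.relu(self.proj(adj @ region_feats))                # aggregate neighbours, then project

regions = torch.randn(2, 9, 256)   # e.g., a 3x3 grid of density regions, globally pooled
adj = torch.rand(2, 9, 9)          # pairwise relation weights between regions
print(SimpleGCNLayer(256, 256)(regions, adj).shape)   # (2, 9, 256) relation-aware region features
```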
1 code implementation • CVPR 2020 • Yuanjie Shao, Lerenhan Li, Wenqi Ren, Changxin Gao, Nong Sang
By training image translation and dehazing network in an end-to-end manner, we can obtain better effects of both image translation and dehazing.
Ranked #5 on Image Dehazing on RESIDE-6K
7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation.
Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff
2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang
Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.
Ranked #1 on Scene Understanding on ADE20K val
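The ideal affinity map built from ground truth is simply a binary pixel-pair matrix: 1 when two pixels share a class, 0 otherwise. The sketch below constructs it and supervises a placeholder predicted prior map with binary cross-entropy; resolutions and the exact loss weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def ideal_affinity_map(label: torch.Tensor, num_classes: int) -> torch.Tensor:
    """label: (B, H, W) class indices resized to the prior-map resolution."""
    onehot = F.one_hot(label, num_classes).float()   # (B, H, W, K)
    onehot = onehot.flatten(1, 2)                    # (B, HW, K)
    return onehot @ onehot.transpose(1, 2)           # (B, HW, HW), 1 iff same class

label = torch.randint(0, 19, (2, 32, 32))
target_affinity = ideal_affinity_map(label, num_classes=19)
pred_affinity = torch.rand(2, 32 * 32, 32 * 32)      # placeholder for the predicted Context Prior map
loss = F.binary_cross_entropy(pred_affinity, target_affinity)
print(target_affinity.shape, loss.item())
```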
1 code implementation • 19 Jan 2020 • Shizhen Zhao, Changxin Gao, Yuanjie Shao, Lerenhan Li, Changqian Yu, Zhong Ji, Nong Sang
FFU and BFU add the IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively.
no code implementations • ICCV 2019 • Chuchu Han, Jiacheng Ye, Yunshan Zhong, Xin Tan, Chi Zhang, Changxin Gao, Nong Sang
The state-of-the-art methods train the detector individually, and the detected bounding boxes may be sub-optimal for the following re-ID task.
21 code implementations • ECCV 2018 • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
Semantic segmentation requires both rich spatial information and sizeable receptive field.
Ranked #4 on Semantic Segmentation on SkyScapes-Dense
Dichotomous Image Segmentation • Real-Time Semantic Segmentation (+2 more)
3 code implementations • CVPR 2018 • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
Most existing methods of semantic segmentation still suffer from two aspects of challenges: intra-class inconsistency and inter-class indistinction.
Ranked #5 on Semantic Segmentation on PASCAL VOC 2012 test
no code implementations • CVPR 2018 • Lerenhan Li, Jinshan Pan, Wei-Sheng Lai, Changxin Gao, Nong Sang, Ming-Hsuan Yang
We present an effective blind image deblurring method based on a data-driven discriminative prior. Our work is motivated by the fact that a good image prior should favor clear images over blurred images. In this work, we formulate the image prior as a binary classifier which can be achieved by a deep convolutional neural network (CNN). The learned prior is able to distinguish whether an input image is clear or not. Embedded into the maximum a posteriori (MAP) framework, it helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, it is difficult to optimize the deblurring method with the learned image prior as it involves a non-linear CNN. Therefore, we develop an efficient numerical approach based on the half-quadratic splitting method and gradient descent algorithm to solve the proposed model. Furthermore, the proposed model can be easily extended to non-uniform deblurring. Both qualitative and quantitative experimental results show that our method performs favorably against state-of-the-art algorithms as well as domain-specific image deblurring approaches.
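For orientation, a MAP-style blind deblurring objective with a learned classifier prior typically takes the form sketched below, where B is the blurred input, I the latent sharp image, k the blur kernel, and f(I) the CNN prior scoring how "clear" I looks; the exact regularizers and weights used in the paper may differ.

```latex
\min_{I,\,k}\; \|I \otimes k - B\|_2^2 \;+\; \gamma \|k\|_2^2 \;+\; \lambda\, f(I) \;+\; \mu \|\nabla I\|_0
```

Half-quadratic splitting introduces auxiliary variables to decouple the non-linear CNN prior term, which can then be handled by gradient descent, from the remaining sub-problems.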
no code implementations • 9 Feb 2018 • Jun Xiang, Guoshuai Zhang, Jianhua Hou, Nong Sang, Rui Huang
Designing a robust affinity model is the key issue in multiple target tracking (MTT).
no code implementations • 12 Nov 2016 • Dapeng Luo, Zhipeng Zeng, Nong Sang, Xiang Wu, Longsheng Wei, Quanzheng Mou, Jun Cheng, Chen Luo
In this paper, the proposed framework takes a remarkably different direction to resolve the multi-scene detection problem in a bottom-up fashion.
no code implementations • 29 Jun 2016 • Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, Zhimin Cao
Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge.
Ranked #6 on Scene Text Detection on COCO-Text
no code implementations • 24 Feb 2014 • Changxin Gao, Feifei Chen, Jin-Gang Yu, Rui Huang, Nong Sang
However, the task in tracking is to search for a specific object, rather than an object category as in detection.