no code implementations • 12 Sep 2024 • Ling Xing, Hongyu Qu, Rui Yan, Xiangbo Shu, Jinhui Tang
ii) To better aggregate such audio and visual features, we further customize a Cross-modal Dynamic Perception (CDP) layer in the cross-modal feature pyramid to understand local temporal patterns of audio-visual events by imposing local consistency within multimodal features in a data-driven manner.
1 code implementation • IEEE Transactions on Image Processing 2024 • Jiahui Wang, Qin Xu, Bo Jiang, Bin Luo, Jinhui Tang
In this paper, we propose a novel Multi-Granularity Part Sampling Attention (MPSA) network for fine-grained visual classification.
Ranked #2 on Fine-Grained Image Classification on Stanford Dogs
1 code implementation • 17 Jul 2024 • Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang
Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience.
no code implementations • 7 Jun 2024 • Peng Xing, Dong Zhang, Jinhui Tang, Zechao Li
Specifically, in Case-1, we found that the main factor detrimental to current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, which leads to normal areas not being recovered, and abnormal areas being recovered, into their original state.
2 code implementations • 25 Apr 2024 • Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, Chengjian Zheng, Diankai Zhang, Ning Wang, Xintao Qiu, Yuanbo Zhou, Kongxian Wu, Xinwei Dai, Hui Tang, Wei Deng, Qingquan Gao, Tong Tong, Jae-Hyeon Lee, Ui-Jin Choi, Min Yan, Xin Liu, Qian Wang, Xiaoqian Ye, Zhan Du, Tiansen Zhang, Long Peng, Jiaming Guo, Xin Di, Bohao Liao, Zhibo Du, Peize Xia, Renjing Pei, Yang Wang, Yang Cao, ZhengJun Zha, Bingnan Han, Hongyuan Yu, Zhuoyuan Wu, Cheng Wan, Yuqing Liu, Haodong Yu, Jizhe Li, Zhijuan Huang, Yuan Huang, Yajun Zou, Xianyu Guan, Qi Jia, Heng Zhang, Xuanwu Yin, Kunlong Zuo, Hyeon-Cheol Moon, Tae-hyun Jeong, Yoonmo Yang, Jae-Gon Kim, Jinwoo Jeong, Sunjei Kim
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs.
no code implementations • 24 Apr 2024 • Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li
This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models.
1 code implementation • 21 Apr 2024 • Gensheng Pei, Yazhou Yao, Jianbo Jiao, Wenguan Wang, Liqiang Nie, Jinhui Tang
To achieve this objective, we present a unified self-supervised approach to learn visual representations of static-dynamic feature similarity.
3 code implementations • 16 Apr 2024 • Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi
In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.
1 code implementation • 9 Apr 2024 • Yixin Yang, Jiangxin Dong, Jinhui Tang, Jinshan Pan
To explore this property for better spatial and temporal feature utilization, we develop a local attention module to aggregate the features from adjacent frames in a spatial-temporal neighborhood.
1 code implementation • 6 Apr 2024 • Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan
However, inaccurate alignment usually leads to aligned features with significant artifacts, which will be accumulated during propagation and thus affect video restoration.
Ranked #5 on Video Super-Resolution on Vid4 - 4x upscaling
no code implementations • 15 Mar 2024 • Qin Xu, Sitong Li, Jiahui Wang, Bo Jiang, Jinhui Tang
To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC.
Ranked #4 on Fine-Grained Image Classification on CUB-200-2011
Fine-Grained Image Classification • Fine-Grained Visual Categorization
1 code implementation • CVPR 2024 • Fengyun Wang, Qianru Sun, Dong Zhang, Jinhui Tang
Semantic scene completion (SSC) aims to predict complete 3D voxel occupancy and semantics from a single-view RGB-D image, and recent SSC methods commonly adopt multi-modal inputs.
1 code implementation • 20 Jan 2024 • Tao Chen, Yazhou Yao, Xingguo Huang, Zechao Li, Liqiang Nie, Jinhui Tang
In this paper, we propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion.
no code implementations • 10 Jan 2024 • Luanyuan Dai, Xiaoyu Du, Hanwang Zhang, Jinhui Tang
To obtain information that integrates implicit and explicit local graphs, we construct local graphs from both implicit and explicit aspects and combine them effectively; the combined graph is then used to build a global graph.
1 code implementation • 3 Jan 2024 • Wei Yao, Hongwen Zhang, Yunlian Sun, Jinhui Tang
This method can remarkably improve the smoothness of recovery results from video.
Ranked #50 on 3D Human Pose Estimation on MPI-INF-3DHP (using extra training data)
1 code implementation • 19 Dec 2023 • Wei Tang, Liang Li, Xuejing Liu, Lu Jin, Jinhui Tang, Zechao Li
In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding to handle both scenes.
1 code implementation • 29 Nov 2023 • Wei Yao, Hongwen Zhang, Yunlian Sun, Yebin Liu, Jinhui Tang
Our contributions include a novel weakly supervised camera calibration technique, an effective orientation correction module, and a decoupling strategy that significantly improves the generalizability and accuracy of human motion recovery in both camera and world coordinates.
Ranked #1 on 3D Human Pose Estimation on SPEC-MTP (using extra training data)
1 code implementation • 7 Nov 2023 • Neng Dong, Shuanglin Yan, Hao Tang, Jinhui Tang, Liyan Zhang
Moreover, as multiple images with the same identity are not accessible in the testing stage, we devise an Information Propagation (IP) mechanism to distill knowledge from the comprehensive representation to that of a single occluded image.
no code implementations • 17 Oct 2023 • Shuanglin Yan, Neng Dong, Jun Liu, Liyan Zhang, Jinhui Tang
Since the support set is unavailable during inference, we propose to distill the knowledge learned by the "richer" model into a lightweight model for inference with a single image/text as input.
no code implementations • 5 Oct 2023 • Xiang Chen, Jinshan Pan, Jiangxin Dong, Jinhui Tang
In this paper, we provide a comprehensive review of existing image deraining methods and a unified evaluation setting for assessing their performance.
no code implementations • 6 Aug 2023 • Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang
Due to the scarcity of manually annotated data required for fine-grained video understanding, few-shot fine-grained (FS-FG) action recognition has gained significant attention, with the aim of classifying novel fine-grained action categories with only a few labeled instances.
1 code implementation • 14 Jul 2023 • Neng Dong, Liyan Zhang, Shuanglin Yan, Hao Tang, Jinhui Tang
Occlusion perturbation presents a significant challenge in person re-identification (re-ID), and existing methods that rely on external visual cues require additional computational resources and only consider the issue of missing information caused by occlusion.
no code implementations • 12 Jun 2023 • Changguang Wu, Jiangxin Dong, Jinhui Tang
To further speed up inference, a lookup table method is employed for fast retrieval.
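As a generic illustration of why a lookup table speeds up inference, the sketch below precomputes a per-intensity mapping once and then replaces each per-pixel computation with a single index operation. The gamma curve, the 256-level input range, and the names `build_lut` and `apply_lut` are assumptions made for this example; the listing does not say what the paper's table actually stores.

```python
def gamma_curve(value, gamma=2.2):
    """Hypothetical per-intensity mapping (not taken from the paper)."""
    return round(255 * (value / 255) ** (1.0 / gamma))


def build_lut(fn, levels=256):
    """Evaluate fn once for every possible input level."""
    return [fn(level) for level in range(levels)]


def apply_lut(lut, pixels):
    """Map each pixel through the table: one index operation per pixel."""
    return [lut[p] for p in pixels]


lut = build_lut(gamma_curve)
pixels = [0, 16, 128, 255, 16, 128]
enhanced = apply_lut(lut, pixels)
```

The table costs 256 evaluations up front; after that, the cost per pixel no longer depends on how expensive the mapping is.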
1 code implementation • 9 May 2023 • Tao Chen, Yazhou Yao, Jinhui Tang
Weakly supervised semantic segmentation (WSSS) models relying on class activation maps (CAMs) have achieved desirable performance compared to their non-CAM-based counterparts.
1 code implementation • 18 Apr 2023 • Chunyan Wang, Dong Zhang, Liyan Zhang, Jinhui Tang
Specifically, a flexible context aggregation module is proposed to capture the global object context in different granular spaces.
Weakly supervised Semantic Segmentation • Weakly-Supervised Semantic Segmentation
1 code implementation • 17 Apr 2023 • Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu
Different from widely-studied vision-language pretraining models, VALOR jointly models relationships of vision, audio and language in an end-to-end manner.
Ranked #1 on Video Captioning on VATEX (using extra training data)
1 code implementation • CVPR 2023 • Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun
SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF).
1 code implementation • ICCV 2023 • Long Sun, Jiangxin Dong, Jinhui Tang, Jinshan Pan
Although numerous solutions have been proposed for image super-resolution, they are usually incompatible with low-power devices with many computational and memory constraints.
Ranked #51 on Image Super-Resolution on Set14 - 4x upscaling
1 code implementation • 23 Jan 2023 • Fei Shen, Xiaoyu Du, Liyan Zhang, Xiangbo Shu, Jinhui Tang
To address this problem, in this paper, we propose a simple Triplet Contrastive Representation Learning (TCRL) framework which leverages cluster features to bridge the part features and global features for unsupervised vehicle re-identification.
1 code implementation • IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 2023 • Cairong Zhao, Chutian Wang, Guosheng Hu, Haonan Chen, Chun Liu, Jinhui Tang
To address these two challenges, in this paper, we propose an Interpretable Spatial-Temporal Video Transformer (ISTVT), which consists of a novel decomposed spatial-temporal self-attention and a self-subtract mechanism to capture spatial artifacts and temporal inconsistency for robust Deepfake detection.
no code implementations • ICCV 2023 • Xiang Li, Jinshan Pan, Jinhui Tang, Jiangxin Dong
We develop a hybrid dynamic-Transformer block (HDTB) that integrates the MHDLSA and SparseGSA for both local and global feature exploration.
1 code implementation • CVPR 2023 • Jinshan Pan, Boming Xu, Jiangxin Dong, Jianjun Ge, Jinhui Tang
In contrast to existing methods that directly align adjacent frames without discrimination, we develop a deep discriminative spatial and temporal network to facilitate the spatial and temporal feature exploration for better video deblurring.
no code implementations • ICCV 2023 • Jiangxin Dong, Jinshan Pan, Zhongbao Yang, Jinhui Tang
We present a simple and effective Multi-scale Residual Low-Pass Filter Network (MRLPFNet) that jointly explores the image details and main structures for image deblurring.
no code implementations • 5 Dec 2022 • Yixin Yang, Zhongzheng Peng, Xiaoyu Du, Zhulin Tao, Jinhui Tang, Jinshan Pan
To overcome this problem, we further develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process for better performance.
no code implementations • 21 Nov 2022 • Mingye Ju, Chuheng Chen, Charles A. Guo, Jinshan Pan, Jinhui Tang, DaCheng Tao
Effectively exploring semantic features is vital for low-light image enhancement (LLE).
1 code implementation • 19 Oct 2022 • Shuanglin Yan, Neng Dong, Liyan Zhang, Jinhui Tang
Secondly, cross-grained feature refinement (CFR) and fine-grained correspondence discovery (FCD) modules are proposed to establish the cross-grained and fine-grained interactions between modalities, which can filter out non-modality-shared image patches/words and mine cross-modal correspondences from coarse to fine.
no code implementations • 19 Oct 2022 • Peng Xing, Hao Tang, Jinhui Tang, Zechao Li
However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged.
1 code implementation • 5 Oct 2022 • Yu Quan, Dong Zhang, Liyan Zhang, Jinhui Tang
To address this problem, in this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.
1 code implementation • 4 Oct 2022 • Zican Zha, Hao Tang, Yunlian Sun, Jinhui Tang
To address this challenging task, we propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric.
3 code implementations • 23 Sep 2022 • Gangwei Xu, Yun Wang, Junda Cheng, Jinhui Tang, Xin Yang
In this paper, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume.
1 code implementation • 21 Sep 2022 • Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang Ting Cheng
Over the past few years, the rapid development of deep learning technologies for computer vision has significantly improved the performance of medical image segmentation (MedISeg).
no code implementations • 20 Sep 2022 • Dong Zhang, Jinhui Tang, Kwang-Ting Cheng
In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
no code implementations • 30 Aug 2022 • Shuanglin Yan, Hao Tang, Liyan Zhang, Jinhui Tang
Moreover, existing methods seldom consider the information inequality problem between modalities caused by image-specific information.
1 code implementation • 18 Jul 2022 • Gensheng Pei, Fumin Shen, Yazhou Yao, Guo-Sen Xie, Zhenmin Tang, Jinhui Tang
Optical flow is an easily conceived and precious cue for advancing unsupervised video object segmentation (UVOS).
1 code implementation • 31 May 2022 • Fei Shen, Zhe Wang, Zijun Wang, Xiaode Fu, Jiayi Chen, Xiaoyu Du, Jinhui Tang
Vision-based pattern identification (such as face, fingerprint, iris etc.)
1 code implementation • 30 May 2022 • Long Sun, Jinshan Pan, Jinhui Tang
We propose a simple and effective approach, ShuffleMixer, for lightweight image super-resolution that explores large convolution and channel split-shuffle operation.
no code implementations • CVPR 2022 • Zeren Sun, Fumin Shen, Dan Huang, Qiong Wang, Xiangbo Shu, Yazhou Yao, Jinhui Tang
Label noise has been a practical challenge in deep learning due to the strong capability of deep neural networks in fitting all training data.
no code implementations • IEEE Transactions on Image Processing 2021 • Xinguang Xiang, YaJie Zhang, Lu Jin, Zechao Li, Jinhui Tang
Specifically, to localize diverse local regions, a sub-region localization module is developed to learn discriminative local features by locating the peaks of non-overlap sub-regions in the feature map.
1 code implementation • 2 Dec 2021 • Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs via aligning the semantics between visual and textual information.
no code implementations • 11 Nov 2021 • Xiu-Shen Wei, Yi-Zhe Song, Oisin Mac Aodha, Jianxin Wu, Yuxin Peng, Jinhui Tang, Jian Yang, Serge Belongie
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications.
no code implementations • CVPR 2021 • Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo
Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.
1 code implementation • 20 Apr 2021 • Zechao Li, Yanpeng Sun, Jinhui Tang
Specifically, the Spatial Contextual Module (SCM) is leveraged to uncover the spatial contextual dependency between pixels by exploring the correlation between pixels and categories.
Ranked #72 on Semantic Segmentation on ADE20K val
1 code implementation • 10 Dec 2020 • Rui Yan, Lingxi Xie, Xiangbo Shu, Jinhui Tang
To understand a complex action, multiple sources of information, including appearance, positional, and semantic features, need to be integrated.
5 code implementations • CVPR 2021 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality.
Ranked #26 on Object Detection on COCO-O
1 code implementation • NeurIPS 2020 • Dong Zhang, Hanwang Zhang, Jinhui Tang, Xian-Sheng Hua, Qianru Sun
We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS).
Ranked #37 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
1 code implementation • ECCV 2020 • Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua, Qianru Sun
Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.
no code implementations • ECCV 2020 • Rui Yan, Lingxi Xie, Jinhui Tang, Xiangbo Shu, Qi Tian
This paper presents a new task named weakly-supervised group activity recognition (GAR) which differs from conventional GAR tasks in that only video-level labels are available, yet the important persons within each frame are not provided even in the training data.
7 code implementations • NeurIPS 2020 • Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent an arbitrary distribution of box locations.
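One way to picture a vector that represents a distribution of box locations is a set of logits over discrete offset bins, normalized with a softmax and reduced to a scalar offset by taking the expectation. The sketch below is only an illustration under those assumptions: the bin values, the softmax normalization, and the function names are not stated in this listing.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits, bin_values):
    """Expected box-side offset under the discrete distribution."""
    probs = softmax(logits)
    return sum(p * y for p, y in zip(probs, bin_values))


# Hypothetical bins: candidate distances from an anchor point to one box side.
bin_values = [0.0, 1.0, 2.0, 3.0, 4.0]

# A sharply peaked distribution gives an estimate close to its peak (bin 3).
sharp = expected_offset([0.0, 0.0, 0.0, 10.0, 0.0], bin_values)

# A flat distribution gives the midpoint of the bins.
flat = expected_offset([0.0, 0.0, 0.0, 0.0, 0.0], bin_values)
```

Because the full distribution is kept, its shape (sharp versus flat) is available as a signal alongside the scalar estimate.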
Ranked #105 on Object Detection on COCO test-dev
1 code implementation • CVPR 2020 • Jinshan Pan, Haoran Bai, Jinhui Tang
The proposed algorithm mainly consists of optical flow estimation from intermediate latent frames and latent frame restoration steps.
Ranked #2 on Deblurring on Beam-Splitter Deblurring (BSD)
2 code implementations • ICCV 2021 • Jinshan Pan, Songsheng Cheng, Jiawei Zhang, Jinhui Tang
Existing video super-resolution (SR) algorithms usually assume that the blur kernels in the degradation process are known and do not model the blur kernels in the restoration.
no code implementations • ICCV 2019 • Jun Fu, Jing Liu, Yuhang Wang, Yong Li, Yongjun Bao, Jinhui Tang, Hanqing Lu
Recent works attempt to improve scene parsing performance by exploring different levels of contexts, and typically train a well-designed convolutional network to exploit useful contexts across all pixels equally.
Ranked #73 on Semantic Segmentation on ADE20K val
no code implementations • 29 Sep 2019 • Xiangbo Shu, Liyan Zhang, Guo-Jun Qi, Wei Liu, Jinhui Tang
To this end, we propose a novel Skeleton-joint Co-attention Recurrent Neural Networks (SC-RNN) to capture the spatial coherence among joints, and the temporal evolution among skeletons simultaneously on a skeleton-joint co-attention feature map in spatiotemporal space.
1 code implementation • 18 Aug 2019 • Jinshan Pan, Yang Liu, Deqing Sun, Jimmy Ren, Ming-Ming Cheng, Jian Yang, Jinhui Tang
We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution.
1 code implementation • 6 Aug 2019 • Longteng Guo, Jing Liu, Jinhui Tang, Jiangwei Li, Wei Luo, Hanqing Lu
Image captioning attempts to generate a sentence composed of several linguistic words, which are used to describe objects, attributes, and interactions in an image, denoted as visual semantic units in this paper.
no code implementations • 1 Aug 2019 • Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei
A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure.
1 code implementation • 26 Jun 2019 • Xiaoyu Du, Xiangnan He, Fajie Yuan, Jinhui Tang, Zhiguang Qin, Tat-Seng Chua
In this work, we emphasize modeling the correlations among embedding dimensions in neural networks to pursue higher effectiveness for CF.
no code implementations • 25 May 2019 • Zhao Zhang, Yan Zhang, Guangcan Liu, Jinhui Tang, Shuicheng Yan, Meng Wang
To enrich prior knowledge to enhance the discrimination, RS2ACF clearly uses class information of labeled data and more importantly propagates it to unlabeled data by jointly learning an explicit label indicator for unlabeled data.
no code implementations • CVPR 2016 • Xiaojuan Wang, Ting Zhang, Guo-Jun Q, Jinhui Tang, Jingdong Wang
In this paper, we address the problem of searching for semantically similar images from a large database.
no code implementations • 9 Jan 2019 • Lu Jin, Zechao Li, Jinhui Tang
In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval.
no code implementations • NeurIPS 2018 • Longquan Dai, Liang Tang, Yuan Xie, Jinhui Tang
Over the decades, people took a handmade approach to design fast algorithms for the Gaussian convolution.
1 code implementation • 11 Nov 2018 • Xiangnan He, Jinhui Tang, Xiaoyu Du, Richang Hong, Tongwei Ren, Tat-Seng Chua
This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal.
no code implementations • 1 Nov 2018 • Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Wei Liu, Jian Yang
In a Co-LSTM unit, each sub-memory unit stores individual motion information, while this Co-LSTM unit selectively integrates and stores inter-related motion information between multiple interacting persons from multiple sub-memory units via the cell gate and co-memory cell, respectively.
Ranked #1 on Human Interaction Recognition on UT
1 code implementation • 19 Sep 2018 • Jinhui Tang, Xiaoyu Du, Xiangnan He, Fajie Yuan, Qi Tian, Tat-Seng Chua
To this end, we propose a novel solution named Adversarial Multimedia Recommendation (AMR), which can lead to a more robust multimedia recommender model by using adversarial learning.
Information Retrieval • Multimedia
1 code implementation • 12 Aug 2018 • Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang, Tat-Seng Chua
In this work, we contribute a new multi-layer neural network architecture named ONCF to perform collaborative filtering.
Ranked #1 on Recommendation Systems on Gowalla
no code implementations • 2 Aug 2018 • Jinshan Pan, Jiangxin Dong, Yang Liu, Jiawei Zhang, Jimmy Ren, Jinhui Tang, Yu-Wing Tai, Ming-Hsuan Yang
We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, image deraining, etc.).
no code implementations • CVPR 2018 • Runde Li, Jinshan Pan, Zechao Li, Jinhui Tang
In contrast, we solve this problem based on a conditional generative adversarial network (cGAN), where the clear image is estimated by an end-to-end trainable neural network.
no code implementations • CVPR 2018 • Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, Ming-Hsuan Yang
These problems usually involve the estimation of two components of the target signals: structures and details.
no code implementations • 7 May 2018 • Lu Jin, Xiangbo Shu, Kai Li, Zechao Li, Guo-Jun Qi, Jinhui Tang
However, most existing deep hashing methods directly learn the hash functions by encoding the global semantic information, while ignoring the local spatial information of images.
no code implementations • 12 Apr 2018 • Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, Qi Tian
Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by constructing and exploring an image-tag-user graph.
no code implementations • CVPR 2017 • Longquan Dai, Mengke Yuan, Zechao Li, Xiaopeng Zhang, Jinhui Tang
In this paper, we propose a hardware-efficient Guided Filter (HGF), which solves the efficiency problem of multichannel guided image filtering and yields competent results when applied to multi-label problems with synthesized polynomial multichannel guidance.
no code implementations • 14 Jun 2017 • Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang
More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.
no code implementations • 4 Jun 2017 • Xiangbo Shu, Jinhui Tang, Zechao Li, Hanjiang Lai, Liyan Zhang, Shuicheng Yan
Basically, for each age group, we learn an aging dictionary to reveal its aging characteristics (e.g., wrinkles), where the dictionary bases corresponding to the same index yet from two neighboring aging dictionaries form a particular aging pattern across these two age groups, and a linear combination of all these patterns expresses a particular personalized aging process.
no code implementations • 3 Jun 2017 • Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Yan Song, Zechao Li, Liyan Zhang
To this end, we propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) to model the long-term inter-related dynamics between two interacting people on the bounding boxes covering people.
Ranked #2 on Human Interaction Recognition on BIT
no code implementations • 27 Jan 2016 • Siyu Huang, Xi Li, Zhongfei Zhang, Zhouzhou He, Fei Wu, Wei Liu, Jinhui Tang, Yueting Zhuang
The highly effective visual representation and deep context models ensure that our framework makes a deep semantic understanding of the scene and motion pattern, consequently improving the performance of the visual path prediction task.
no code implementations • ICCV 2015 • Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, Shuicheng Yan
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which well integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network.
no code implementations • 23 Oct 2015 • Canyi Lu, Jinhui Tang, Shuicheng Yan, Zhouchen Lin
The nuclear norm is widely used as a convex surrogate of the rank function in compressive sensing for low rank matrix recovery with its applications in image recovery and signal processing.
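For reference, the standard definitions behind this statement can be written as follows, where $X \in \mathbb{R}^{m \times n}$ and $\sigma_i(X)$ denotes its $i$-th singular value (the notation is chosen here for illustration and is not taken from the paper). The rank counts the nonzero singular values, while the nuclear norm sums them, which makes it convex in $X$.

```latex
\operatorname{rank}(X) = \#\{\, i : \sigma_i(X) > 0 \,\},
\qquad
\|X\|_{*} = \sum_{i=1}^{\min(m,n)} \sigma_i(X)
```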
no code implementations • ICCV 2015 • Xiangbo Shu, Jinhui Tang, Hanjiang Lai, Luoqi Liu, Shuicheng Yan
Second, it is challenging or even impossible to collect faces of all age groups for a particular subject, yet much easier and more practical to get face pairs from neighboring age groups.
no code implementations • CVPR 2015 • Ting Zhang, Guo-Jun Qi, Jinhui Tang, Jingdong Wang
The benefit is that the distance evaluation between the query and the dictionary element (a sparse vector) is accelerated using efficient sparse vector operations, and thus the cost of distance table computation is greatly reduced.
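A minimal sketch of the idea, assuming each dictionary element is stored as a mapping from index to nonzero value: the squared distance expands to $\|q\|^2 - 2\,q \cdot d + \|d\|^2$, and only the nonzero entries of $d$ are touched. The storage format and function names are hypothetical and not taken from the paper.

```python
def sparse_dot(query, sparse_vec):
    """Inner product that visits only the nonzero entries of sparse_vec."""
    return sum(value * query[index] for index, value in sparse_vec.items())


def distance_table(query, dictionary):
    """Squared Euclidean distance from a dense query to each sparse element.

    Uses ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2, so the per-element cost
    depends on the number of nonzeros in d, not on the full dimension.
    """
    query_norm_sq = sum(x * x for x in query)  # computed once per query
    table = []
    for elem in dictionary:
        elem_norm_sq = sum(v * v for v in elem.values())
        table.append(query_norm_sq - 2.0 * sparse_dot(query, elem) + elem_norm_sq)
    return table


query = [1.0, 0.0, 2.0, 0.0, 3.0]
dictionary = [{0: 1.0, 4: 3.0}, {2: 2.0}, {}]  # {index: nonzero value}
table = distance_table(query, dictionary)
```

In practice the element norms could also be precomputed offline, since they do not depend on the query.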
no code implementations • 18 Jan 2015 • Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin
In this paper, we study the robust subspace clustering problem, which aims to cluster the given possibly noisy data points into their underlying subspaces.
no code implementations • 3 Sep 2014 • Liansheng Zhuang, Shenghua Gao, Jinhui Tang, Jingjing Wang, Zhouchen Lin, Yi Ma
This paper aims at constructing a good graph for discovering intrinsic data structures in a semi-supervised learning setting.
no code implementations • CVPR 2014 • Canyi Lu, Jinhui Tang, Shuicheng Yan, Zhouchen Lin
We observe that all the existing nonconvex penalty functions are concave and monotonically increasing on $[0,\infty)$.
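The observation can be checked numerically for two commonly used nonconvex penalties, the $\ell_p$ penalty $x^p$ with $0 < p < 1$ and the logarithm penalty $\log(1 + x/\gamma)$. These two functions, the parameter values, and the grid-based check below are chosen for illustration only and are not the paper's own analysis.

```python
import math


def lp_penalty(x, p=0.5):
    """l_p penalty with 0 < p < 1."""
    return x ** p


def log_penalty(x, gamma=1.0):
    """Logarithm penalty with gamma > 0."""
    return math.log(1.0 + x / gamma)


def is_increasing_and_concave(g, grid, tol=1e-12):
    """Check monotonicity and concavity of g on a uniformly spaced grid.

    Concavity is tested through second differences, which are non-positive
    for a concave function sampled at equally spaced points.
    """
    values = [g(x) for x in grid]
    increasing = all(b >= a for a, b in zip(values, values[1:]))
    concave = all(
        values[i - 1] - 2.0 * values[i] + values[i + 1] <= tol
        for i in range(1, len(values) - 1)
    )
    return increasing and concave


grid = [0.1 * i for i in range(101)]  # samples of [0, 10]
lp_ok = is_increasing_and_concave(lp_penalty, grid)
log_ok = is_increasing_and_concave(log_penalty, grid)
```

A convex function such as `lambda x: x * x` fails the same check, which confirms that the test actually distinguishes the two cases.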
no code implementations • CVPR 2013 • Yang Liu, Jing Liu, Zechao Li, Jinhui Tang, Hanqing Lu
In this paper, we propose a novel Weakly-Supervised Dual Clustering (WSDC) approach for image semantic segmentation with image-level labels, i.e., collaboratively performing image segmentation and tag alignment with those regions.