no code implementations • 14 Mar 2023 • Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan
To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.
no code implementations • 5 Mar 2023 • Yixin Liu, Chenrui Fan, Pan Zhou, Lichao Sun
While the use of graph-structured data in various fields is becoming increasingly popular, it also raises concerns about the potential unauthorized exploitation of personal data for training commercial graph neural network (GNN) models, which can compromise privacy.
no code implementations • 2 Mar 2023 • Daizong Liu, Pan Zhou
Temporal sentence localization in videos (TSLV) aims to retrieve the segment of an untrimmed video that is most relevant to a given sentence query.
no code implementations • 27 Feb 2023 • Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua
CoVGT's uniqueness and superiority are threefold: 1) it proposes a dynamic graph transformer module that encodes video by explicitly capturing the visual objects, their relations, and their dynamics for complex spatio-temporal reasoning.
no code implementations • 21 Feb 2023 • Zeyu Xiong, Daizong Liu, Pan Zhou, Jiahao Zhu
Temporal sentence grounding (TSG) aims to localize the temporal segment of an untrimmed video that is semantically aligned with a natural language query. Most existing methods extract frame-level or object-level features with a 3D ConvNet or a detection network under a conventional TSG framework, failing to capture the subtle differences between frames or to model the spatio-temporal behavior of core persons/objects.
no code implementations • 8 Jan 2023 • Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan
For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy-sensitive information from a spatio-temporal perspective.
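A minimal NumPy sketch of what treating a video as a tubelet sequence can look like: the video is cut into non-overlapping t×p×p spatio-temporal blocks and each block is flattened into one token. The function name, the tubelet sizes, and the channels-last (T, H, W, C) layout are illustrative assumptions; the sparsification and anonymization mechanisms are not shown.

```python
import numpy as np


def video_to_tubelets(video, t, p):
    """Split a (T, H, W, C) video into non-overlapping t x p x p tubelets.

    Returns a token sequence of shape (num_tubelets, t * p * p * C).
    Illustrative sketch only; sizes and layout are assumptions.
    """
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0, "dims must divide evenly"
    x = video.reshape(T // t, t, H // p, p, W // p, p, C)
    # Move the tubelet-grid axes to the front and the within-tubelet axes after them.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * p * p * C)


video = np.random.rand(8, 32, 32, 3)
tokens = video_to_tubelets(video, t=2, p=16)
print(tokens.shape)  # (16, 1536)
```

Each token is then what a vision Transformer would embed and attend over; choosing t > 1 is what makes the tokens spatio-temporal rather than per-frame patches.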
no code implementations • 5 Jan 2023 • Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng
Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.
no code implementations • 2 Jan 2023 • Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong
All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.
1 code implementation • 19 Dec 2022 • Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan
In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP.
Ranked #1 on Zero-Shot Cross-Modal Retrieval on COCO 2014
1 code implementation • 13 Dec 2022 • Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou
Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.
5 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
By simply applying depthwise separable convolutions as the token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model, CAFormer, sets a new record on ImageNet-1K: it achieves 85.5% accuracy at 224x224 resolution under normal supervised training, without external data or distillation.
Ranked #57 on Image Classification on ImageNet (using extra training data)
1 code implementation • 20 Oct 2022 • ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan
In this work, we explore a sustainable SSL framework that faces two major challenges: i) learning a stronger new SSL model from an existing pretrained SSL model, called the "base" model, in a cost-friendly manner; and ii) allowing the training of the new model to be compatible with various base models.
Ranked #1 on Semantic Segmentation on ImageNet-S
no code implementations • 3 Oct 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo
For better effectiveness, we divide prompts into two groups: 1) a shared prompt for the whole long-tailed dataset, which learns general features and adapts a pretrained model to the target domain; and 2) group-specific prompts, which gather group-specific features for samples with similar features and also give the pretrained model discrimination ability.
Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=100) (using extra training data)
no code implementations • 23 Sep 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu
In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.
no code implementations • 31 Aug 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li
To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.
3 code implementations • 13 Aug 2022 • Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan
Adan first reformulates vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing the gradient at the extrapolation point.
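The loop below is a sketch of an Adan-style update written from the published description; it is not the official implementation. It illustrates the NME idea: the difference g_k - g_{k-1} of two successive, already computed gradients stands in for the gradient at an extrapolated point, so each step needs only one gradient evaluation. The function name, the default hyperparameters (written in the convention where each moving average puts weight beta on the newest term), and the quadratic test problem are assumptions.

```python
import numpy as np


def adan_minimize(grad_fn, theta0, lr=0.01, beta1=0.02, beta2=0.08, beta3=0.01,
                  weight_decay=0.0, eps=1e-8, steps=3000):
    """Adan-style loop (sketch). Each moving average puts weight beta on the newest term."""
    theta = np.array(theta0, dtype=float)
    g_prev = np.array(grad_fn(theta), dtype=float)
    m = g_prev.copy()            # EMA of gradients (first moment)
    v = np.zeros_like(theta)     # EMA of gradient differences
    n = g_prev ** 2              # EMA of squared corrected gradients (second moment)
    for _ in range(steps):
        g = np.array(grad_fn(theta), dtype=float)
        diff = g - g_prev        # reuses stored gradients: no gradient at an extrapolated point
        m = (1 - beta1) * m + beta1 * g
        v = (1 - beta2) * v + beta2 * diff
        n = (1 - beta3) * n + beta3 * (g + (1 - beta2) * diff) ** 2
        step = lr / (np.sqrt(n) + eps)   # per-coordinate step size
        # Nesterov-style direction m + (1 - beta2) * v, then decoupled weight decay.
        theta = (theta - step * (m + (1 - beta2) * v)) / (1 + weight_decay * lr)
        g_prev = g
    return theta


# Usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = adan_minimize(lambda th: th, [3.0, -2.0])
print(0.5 * np.sum(theta ** 2))
```

Storing one previous gradient is the only extra state the difference term needs, which is the overhead saving the abstract refers to.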
2 code implementations • 12 Jul 2022 • Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan
VGT's uniqueness is twofold: 1) it designs a dynamic graph transformer module that encodes video by explicitly capturing the visual objects, their relations, and their dynamics for complex spatio-temporal reasoning; and 2) it exploits disentangled video and text Transformers to compare the relevance between the video and text for QA, instead of an entangled cross-modal Transformer for answer classification.
Ranked #2 on Video Question Answering on NExT-QA (using extra training data)
no code implementations • 12 Jul 2022 • Yuhua Sun, Tailai Zhang, Xingjun Ma, Pan Zhou, Jian Lou, Zichuan Xu, Xing Di, Yu Cheng, Lichao
In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) that cause the model to produce arbitrarily large or small density estimates.
no code implementations • 2 Jul 2022 • Zeyu Xiong, Daizong Liu, Pan Zhou
Spatial-Temporal Video Grounding (STVG) is a challenging task that aims to localize the spatio-temporal tube of the object of interest described by a natural language query.
no code implementations • 8 Jun 2022 • Jiachun Pan, Pan Zhou, Shuicheng Yan
To solve these problems, we first show theoretically that, for an auto-encoder with a two-layer convolutional encoder and a one-layer convolutional decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset.
2 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan
Recent studies show that Transformers are strong at building long-range dependencies but poor at capturing high frequencies, which predominantly convey local information.
1 code implementation • CVPR 2022 • Binghui Wang, Youqi Li, Pan Zhou
We then propose an online attack based on bandit optimization, which is proven to be sublinear in the query number $T$, i.e., $\mathcal{O}(\sqrt{N}T^{3/4})$, where $N$ is the number of nodes in the graph.
1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan
It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local groups apart to increase discriminability.
Ranked #2 on Self-Supervised Image Classification on ImageNet
Contrastive Learning
1 code implementation • 14 Mar 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo
The few-shot learning ability of vision transformers (ViTs) is highly desirable but rarely investigated.
no code implementations • 6 Mar 2022 • Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
no code implementations • 18 Feb 2022 • Zhengyi Zhang, Pan Zhou
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
no code implementations • 14 Jan 2022 • Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou
Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.
no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu
Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.
no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou
To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes rarely appearing content in TSG tasks.
1 code implementation • 9 Dec 2021 • Yuxuan Liang, Pan Zhou, Roger Zimmermann, Shuicheng Yan
While transformers have shown great potential for video recognition thanks to their strong capability of capturing long-range dependencies, they often suffer from high computational costs induced by self-attention over the huge number of 3D tokens.
no code implementations • 8 Dec 2021 • Wenbo Gou, Wen Shi, Jian Lou, Lijie Huang, Pan Zhou, Ruixuan Li
Natural language video localization (NLVL) is an important task in vision-language understanding, which calls for an in-depth understanding not only of computer vision and natural language individually but, more importantly, of the interplay between the two.
no code implementations • NeurIPS 2021 • Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan
Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error to achieve a smaller excess risk than vanilla SGD on (strongly) convex problems and on nonconvex problems satisfying the Polyak-Łojasiewicz condition, which has been observed/proved in neural networks.
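For readers unfamiliar with lookahead, the sketch below shows the standard scheme with SGD as the inner-loop optimizer: k fast SGD steps starting from the slow weights, after which the slow weights move a fraction alpha toward the fast weights. Exact gradients replace stochastic ones to keep the example deterministic; the function name and hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lookahead_sgd(grad_fn, phi0, lr=0.1, k=5, alpha=0.5, outer_steps=100):
    """Lookahead wrapped around (here: full-gradient) SGD. Sketch only."""
    phi = np.array(phi0, dtype=float)        # slow weights
    for _ in range(outer_steps):
        theta = phi.copy()                   # fast weights restart from the slow weights
        for _ in range(k):
            theta -= lr * grad_fn(theta)     # inner loop: plain SGD steps
        phi += alpha * (theta - phi)         # slow weights interpolate toward fast weights
    return phi


# Usage on f(theta) = 0.5 * ||theta||^2 (gradient = theta).
print(lookahead_sgd(lambda t: t, [3.0, -2.0]))
```

On this quadratic, one outer step with lr = 0.1, k = 5, alpha = 0.5 scales the slow weights by 0.5 * (1 + 0.9**5) ≈ 0.795, so the slow iterate contracts toward the minimizer more conservatively than five raw SGD steps would.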
no code implementations • 28 Nov 2021 • Yang Peng, Ping Liu, Yawei Luo, Pan Zhou, Zichuan Xu, Jingen Liu
Unsupervised domain adaptive person re-identification has received significant attention due to its high practical value.
Domain Adaptive Person Re-Identification
Person Re-Identification
12 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.
Ranked #9 on Semantic Segmentation on DensePASS
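To illustrate how simple a token mixer can be under this hypothesis, the sketch below implements a pooling token mixer of the kind associated with PoolFormer: stride-1 average pooling over each spatial neighbourhood, minus the input itself. The loop-based NumPy version, the (H, W, C) layout, and the edge handling (padded positions excluded from the average) are assumptions chosen for readability, not details given in the abstract.

```python
import numpy as np


def pooling_token_mixer(x, pool_size=3):
    """x: (H, W, C) feature map. Returns AvgPool(x) - x with stride 1.

    Border windows are simply truncated, so padding never enters the average.
    Illustrative sketch, not an official implementation.
    """
    H, W, C = x.shape
    r = pool_size // 2
    out = np.empty_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            window = x[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = window.mean(axis=(0, 1))   # mix information across nearby tokens
    return out - x


feat = np.random.rand(8, 8, 16)
print(pooling_token_mixer(feat).shape)  # (8, 8, 16)
```

The operator has no learnable parameters; in a MetaFormer-style block it would sit where self-attention normally goes, with normalization, residual connections, and the channel MLP unchanged.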
no code implementations • 29 Sep 2021 • Binghui Wang, Youqi Li, Pan Zhou
However, many recent works have demonstrated that an attacker can mislead GNN models by slightly perturbing the graph structure.
no code implementations • 29 Sep 2021 • Qiming Wu, Xiaohan Chen, Yifan Jiang, Pan Zhou, Zhangyang Wang
Drawing inspiration from recent research on the lottery ticket hypothesis (LTH), we conjecture and study a novel “lottery image prior” (LIP), stated as: given an (untrained or trained) DNN-based image prior, it will have a sparse subnetwork that can be trained in isolation to match the original DNN’s performance when applied as a prior to various image inverse problems.
1 code implementation • 22 Sep 2021 • Zeyuan Yin, Ye Yuan, Panfeng Guo, Pan Zhou
Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center.
no code implementations • Findings (EMNLP) 2021 • Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin
Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework.
no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Pan Zhou
The key to temporal sentence grounding (TSG) lies in learning an effective alignment between vision and language features extracted from an untrimmed video and a sentence description.
no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
However, the performance of bottom-up models is inferior to that of their top-down counterparts, as they fail to exploit segment-level interaction.
no code implementations • 27 Jul 2021 • Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye
Specifically, at the coarse-grained stage, we design a dual-discriminator strategy that uses adversarial learning to bring the source domain close to the target domain from the perspectives of both the global and local feature spaces.
no code implementations • NeurIPS 2021 • Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi
Although intuitive, such a naive label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and it impairs performance, since some negatives are semantically similar to the query or even share the same semantic class as the query.
1 code implementation • 17 Jun 2021 • Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang
However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation.
no code implementations • 6 Jun 2021 • Wei Wei, Jiayi Liu, Xianling Mao, Guibing Guo, Feida Zhu, Pan Zhou, Yuchong Hu
The consistency of a response to a given post at the semantic and emotional levels is essential for a dialogue system to deliver human-like interactions.
no code implementations • 2 Jun 2021 • Zanbo Wang, Wei Wei, Xianling Mao, Shanshan Feng, Pan Zhou, Zhiyong He, Sheng Jiang
To this end, we propose a model called Global Context enhanced Document-level NER (GCDoc) that leverages global contextual information at two levels, i.e., word and sentence.
no code implementations • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li
To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.
no code implementations • 9 May 2021 • Huming Qiu, Hua Ma, Zhi Zhang, Yifeng Zheng, Anmin Fu, Pan Zhou, Yansong Gao, Derek Abbott, Said F. Al-Sarawi
To this end, a 1-bit quantized DNN model, or deep binary neural network (BNN), maximizes memory efficiency, since each parameter in a BNN model occupies only 1 bit.
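The memory argument can be checked with a short NumPy sketch: each float32 weight is replaced by its sign and the signs are packed eight to a byte, a 32x reduction. The sign convention (zero maps to +1) and the helper name are assumptions; practical BNNs typically also keep scaling factors and full-precision latent weights during training, which are not shown.

```python
import numpy as np


def binarize_and_pack(weights):
    """Map each weight to +1/-1 and store the signs as packed bits (sketch)."""
    signs = np.where(weights >= 0, 1, -1).astype(np.int8)  # 1-bit value per weight
    bits = (signs > 0).astype(np.uint8)                     # +1 -> 1, -1 -> 0
    packed = np.packbits(bits.ravel())                      # 8 weights per byte
    return signs, packed


w = np.random.randn(1024).astype(np.float32)
signs, packed = binarize_and_pack(w)
print(w.nbytes, packed.nbytes)  # 4096 128
```

Recovering the signs is the inverse mapping: `np.where(np.unpackbits(packed)[:w.size] == 1, 1, -1)`.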
1 code implementation • 22 Apr 2021 • Qiming Wu, Zhikang Zou, Pan Zhou, Xiaoqing Ye, Binghui Wang, Ang Li
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
no code implementations • 8 Apr 2021 • Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen
Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn increasing attention in the field of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
1 code implementation • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li
To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.
1 code implementation • CVPR 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie
This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
1 code implementation • 2 Mar 2021 • Wenxiao Wang, Tianhao Wang, Lun Wang, Nanqing Luo, Pan Zhou, Dawn Song, Ruoxi Jia
Deep learning techniques have achieved remarkable performance in wide-ranging tasks.
no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang
The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.
no code implementations • 1 Jan 2021 • Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin
To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.
1 code implementation • 22 Dec 2020 • Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin
Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.
no code implementations • 22 Dec 2020 • Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin
When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language.
Automatic Speech Recognition (ASR)
no code implementations • 10 Dec 2020 • Daizong Liu, Shuangjie Xu, Xiao-Yang Liu, Zichuan Xu, Wei Wei, Pan Zhou
To capture temporal information from previous frames, we use a memory network to refine the mask of current frame by retrieving historic masks in a temporal graph.
One-shot visual object segmentation
Semantic Segmentation
no code implementations • 4 Dec 2020 • Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou
Specifically, our proposed network consists of three main parts: Siamese Encoder Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.
Ranked #5 on Unsupervised Video Object Segmentation on FBMS test
Semantic Segmentation
no code implementations • COLING 2020 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.
1 code implementation • 23 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
Inspired by the variation and the heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively.
1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views).
Incomplete multi-view clustering
Multi-view Subspace Clustering
1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
In these scenarios, original image data often contain missing instances and noise, which most multi-view clustering methods ignore.
no code implementations • 16 Nov 2020 • Ziyang Wang, Wei Wei, Xian-Ling Mao, Guibing Guo, Pan Zhou, Shanshan Feng
Due to the huge commercial interests behind online reviews, a tremendous number of spammers manufacture spam reviews to manipulate product reputation.
no code implementations • 15 Nov 2020 • Wei Wei, Jiayi Liu, Xianling Mao, Guibin Guo, Feida Zhu, Pan Zhou, Yuchong Hu, Shanshan Feng
The consistency of a response to a given post at the semantic and emotional levels is essential for a dialogue system to deliver human-like interactions.
no code implementations • 26 Oct 2020 • Daizong Liu, Hongting Zhang, Pan Zhou
For the video-based FER task, it is sensible to capture the dynamic variation of expression across frames to recognize facial expressions.
no code implementations • 23 Oct 2020 • Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing
Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.
no code implementations • NeurIPS 2020 • Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E
The result shows that (1) the escaping time of both SGD and ADAM depends positively on the Radon measure of the basin and negatively on the heaviness of gradient noise; (2) for the same basin, SGD enjoys a smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM via adaptively scaling each gradient coordinate diminishes the anisotropic structure in gradient noise and results in a larger Radon measure of a basin; and (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD.
no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong
A common practice in meta-learning is to perform a train-validation split (the train-val method), where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.
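A minimal sketch of the train-val method for a single task, in a linear-regression setting assumed purely for illustration: the prior is adapted on the first split by ridge regression shrunk toward the prior, and the adapted predictor is scored on the held-out split. The adaptation rule, the regularization weight `lam`, and the function names are hypothetical choices, not the paper's algorithm.

```python
import numpy as np


def adapt(prior, X, y, lam):
    """Hypothetical adaptation: argmin_w ||X w - y||^2 + lam * ||w - prior||^2."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * prior)


def train_val_loss(prior, X, y, n_train, lam=1.0):
    """Adapt on the first n_train samples, evaluate mean squared error on the rest."""
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train:], y[n_train:]
    w = adapt(prior, X_tr, y_tr, lam)
    return np.mean((X_val @ w - y_val) ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
print(train_val_loss(np.zeros(3), X, y, n_train=10))
```

Meta-training would average this loss over many sampled tasks and update the prior to minimize it; the contrasting train-train method would instead adapt and evaluate on the same samples.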
no code implementations • ICML 2020 • Pan Zhou, Xiao-Tong Yuan
Particularly, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$, which is of the order of the intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are $\mathcal{O}(n^{0.875}\log^{1.5}(n))$ and $\mathcal{O}(n^{0.875}\log^{2.25}(n))$, respectively, which, to the best of our knowledge, for the first time achieve optimal generalization in less than a single pass over the data.
no code implementations • 1 Sep 2020 • Houxiang Fan, Binghui Wang, Pan Zhou, Ang Li, Meng Pang, Zichuan Xu, Cai Fu, Hai Li, Yiran Chen
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc.
no code implementations • 1 Sep 2020 • Binghui Wang, Tianxiang Zhou, Minhua Lin, Pan Zhou, Ang Li, Meng Pang, Cai Fu, Hai Li, Yiran Chen
Next, we reformulate the evasion attack against GNNs to be related to calculating label influence on LP, which is applicable to multi-layer GNNs and does not need to know the GNN model.
no code implementations • 12 Aug 2020 • Zichuan Xu, Jiangkai Wu, Qiufen Xia, Pan Zhou, Jiankang Ren, HuiZhi Liang
In this paper, we design novel models for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system.
no code implementations • 6 Aug 2020 • Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou
In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.
1 code implementation • 4 Aug 2020 • Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu
To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
1 code implementation • NeurIPS 2020 • Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi
Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to incite search exploration for deep architectures that often converge slower than shallow ones as shown in our theory and are not well explored during the search.
1 code implementation • 27 Jun 2020 • Tao Shen, Jie Zhang, Xinkang Jia, Fengda Zhang, Gang Huang, Pan Zhou, Kun Kuang, Fei Wu, Chao Wu
The experiments show that FML can achieve better performance than alternatives in the typical FL setting, and that clients with different models and tasks can benefit from FML.
1 code implementation • NeurIPS 2020 • Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu
Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation.
Ranked #2 on Text Generation on EMNLP2017 WMT
2 code implementations • ICLR 2021 • Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.
Ranked #5 on Contrastive Learning on imagenet-1k
no code implementations • 6 May 2020 • Ao Liu, Beibei Li, Tao Li, Pan Zhou, Rui Wang
In this paper, we first generalize the formulation of edge-perturbing attacks and rigorously prove the vulnerability of GCNs to such attacks in node classification tasks.
no code implementations • 19 Apr 2020 • Yang Hu, Xiaying Bai, Pan Zhou, Fanhua Shang, ShengMei Shen
Pedestrian attribute recognition is an important multi-label classification problem.
no code implementations • 7 Mar 2020 • Zhikang Zou, Yifan Liu, Shuangjie Xu, Wei Wei, Shiping Wen, Pan Zhou
Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO'10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches.
no code implementations • 26 Feb 2020 • Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu
In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in graph structure to improve the diagnosis accuracy.
no code implementations • 6 Feb 2020 • Zeyue Xue, Shuang Luo, Chao Wu, Pan Zhou, Kaigui Bian, Wei Du
Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method since it could accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning.
no code implementations • 3 Feb 2020 • Huawei Huang, Kangying Lin, Song Guo, Pan Zhou, Zibin Zheng
In a dynamic environment, the mobile devices selected by existing reactive candidate-selection algorithms are likely to fail to complete the training and reporting phases of FL, because the FL parameter server only knows the currently observed resources of all candidates.
no code implementations • NeurIPS 2019 • Pan Zhou, Xiao-Tong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng
We address the problem of meta-learning, which learns a prior over hypotheses from a sample of meta-training tasks for fast adaptation on meta-testing tasks.
no code implementations • 1 Nov 2019 • Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
Transformers have recently shown promising results in many sequence-to-sequence transformation tasks.
no code implementations • 14 Oct 2019 • Shuangjie Xu, Feng Xu, Yu Cheng, Pan Zhou
In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.
no code implementations • 25 Sep 2019 • Hongting Zhang, Qiben Yan, Pan Zhou
We then impose a constraint on the perturbation at the positions with lower sound intensity across the time domain to eliminate the perceptible noise during the silent periods or pauses.
Automatic Speech Recognition (ASR)
no code implementations • 24 Aug 2019 • Wei Wei, Zanbo Wang, Xian-Ling Mao, Guangyou Zhou, Pan Zhou, Sheng Jiang
Sequence labeling is a fundamental task in natural language processing and has been widely studied.
no code implementations • 12 Aug 2019 • Zhikang Zou, Huiliang Shao, Xiaoye Qu, Wei Wei, Pan Zhou
Recently, convolutional neural networks (CNNs) have become the de facto leading method for crowd counting.
no code implementations • 7 Aug 2019 • Zhikang Zou, Yu Cheng, Xiaoye Qu, Shouling Ji, Xiaoxiao Guo, Pan Zhou
ACM-CNN consists of three types of modules: a coarse network, a fine network, and a smooth network.
no code implementations • 16 Jul 2019 • Canyi Lu, Pan Zhou
This work studies the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum.
8 code implementations • 17 Jun 2019 • Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang
Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?
no code implementations • NAACL 2019 • Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou
Cross-domain sentiment classification aims to predict sentiment polarity on a target domain utilizing a classifier learned from a source domain.
1 code implementation • CVPR 2019 • Shuangjie Xu, Daizong Liu, Linchao Bao, Wei Liu, Pan Zhou
Extensive experiments on challenging datasets demonstrate the effectiveness of the proposed method, especially in the case of object missing.
Ranked #36 on Semi-Supervised Video Object Segmentation on DAVIS 2016
no code implementations • ICLR 2020 • Zebang Shen, Pan Zhou, Cong Fang, Alejandro Ribeiro
We target the problem of finding a local minimum in non-convex finite-sum minimization.
no code implementations • 21 Feb 2019 • Chengjie Li, Ruixuan Li, Haozhao Wang, Yuhua Li, Pan Zhou, Song Guo, Keqin Li
Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models.
no code implementations • NeurIPS 2018 • Pan Zhou, Xiao-Tong Yuan, Jiashi Feng
To address these deficiencies, we propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds.
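The hard thresholding step can be sketched as follows: keep the k largest-magnitude entries and zero the rest, applied after a minibatch gradient step on a sparsity-constrained least-squares loss. Letting the caller grow `batch_size` over iterations is our reading of "hybrid" stochastic gradient; the loss, the batch-size schedule, and the helper names are assumptions, and the exact step-size and batch rules of HSG-HT are not reproduced.

```python
import numpy as np


def hard_threshold(x, k):
    """Keep the k entries of x with largest magnitude and zero the rest."""
    out = np.zeros_like(x)
    if k > 0:
        idx = np.argpartition(np.abs(x), -k)[-k:]
        out[idx] = x[idx]
    return out


def hybrid_sg_ht_step(theta, X, y, k, lr, batch_size, rng):
    """Hypothetical step: minibatch least-squares gradient, then hard thresholding."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ theta - yb) / batch_size
    return hard_threshold(theta - lr * grad, k)


rng = np.random.default_rng(0)
n, d, k = 200, 20, 3
X = rng.normal(size=(n, d))
theta_true = np.zeros(d)
theta_true[:k] = [2.0, -1.5, 1.0]
y = X @ theta_true
theta = np.zeros(d)
for t in range(1, 51):
    batch = min(n, 8 * t)   # hypothetical growing-batch schedule
    theta = hybrid_sg_ht_step(theta, X, y, k, lr=0.3, batch_size=batch, rng=rng)
print(np.round(theta, 2))
```

Growing the batch lets early iterations stay cheap while later ones approach full-gradient accuracy, which is the intuition behind sample-size-independent complexity bounds.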
no code implementations • NeurIPS 2018 • Pan Zhou, Xiao-Tong Yuan, Jiashi Feng
In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining comparable sample-size-independent incremental first-order oracle complexity to stochastic gradient descent.
1 code implementation • 19 Nov 2018 • Haoran You, Yu Cheng, Tianheng Cheng, Chunliang Li, Pan Zhou
We evaluate the proposed Bayesian CycleGAN on multiple benchmark datasets, including Cityscapes, Maps, and Monet2photo.
no code implementations • 13 Nov 2018 • Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu
In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 13 Nov 2018 • Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie
End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.
Automatic Speech Recognition (ASR)
no code implementations • 13 Nov 2018 • Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia
In this paper, we propose a novel multimodal attention-based method for audio-visual speech recognition that automatically learns a fused representation of both modalities, weighted by their importance.
Audio-Visual Speech Recognition
Robust Speech Recognition
no code implementations • 7 Oct 2018 • Lixing Chen, Jie Xu, Shaolei Ren, Pan Zhou
To solve this problem and optimize the edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm.
no code implementations • 16 Jul 2018 • Xinxing Su, Yingtian Zou, Yu Cheng, Shuangjie Xu, Mo Yu, Pan Zhou
We present a novel method, the Spatial-Temporal Synergic Residual Network (STSRN), for this problem.
no code implementations • CVPR 2018 • Pan Zhou, Yunqing Hou, Jiashi Feng
To solve this issue, we propose a novel deep adversarial subspace clustering (DASC) model, which learns more favorable sample representations by deep learning for subspace clustering, and more importantly introduces adversarial learning to supervise sample representation learning and subspace clustering.
Ranked #2 on Image Clustering on coil-40
no code implementations • ICML 2018 • Pan Zhou, Jiashi Feng
Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk.
3 code implementations • 5 Apr 2018 • Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou
The results show that our framework can smoothly synthesize pedestrians on background images with diverse variations and different levels of detail.
Ranked #2 on Scene Text Recognition on MSDA
no code implementations • ICLR 2018 • Pan Zhou, Jiashi Feng
This work aims to provide a comprehensive landscape analysis of the empirical risk in deep neural networks (DNNs), including the convergence of its gradient, its stationary points and the empirical risk itself to their corresponding population counterparts, which reveals how various network parameters determine the convergence performance.
no code implementations • 23 Oct 2017 • Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang
Parameter pruning and quantization methods are described first, followed by the other techniques.
1 code implementation • ICCV 2017 • Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou
Person Re-Identification (person re-id) is a crucial task due to its applications in visual surveillance and human-computer interaction.
no code implementations • CVPR 2017 • Pan Zhou, Jiashi Feng
Low-rank tensor analysis is important for various real applications in computer vision.
no code implementations • 19 May 2017 • Pan Zhou, Jiashi Feng
For an $l$-layer linear neural network, we prove that its empirical risk uniformly converges to its population risk at the rate of $\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})$, where $n$ is the training sample size, $d$ is the total weight dimension, and $r$ is the magnitude bound on the weights of each layer.
no code implementations • 11 Oct 2016 • Yifan Hou, Pan Zhou, Ting Wang, Li Yu, Yuchong Hu, Dapeng Wu
In this respect, the key challenge is how to realize personalized course recommendation while reducing the computing and storage costs of the tremendous volume of course data.
no code implementations • 21 Feb 2016 • Chencheng Li, Pan Zhou, Yingxue Zhou, Kaigui Bian, Tao Jiang, Susanto Rahardja
An increasing number of people participate in social networks, generating massive amounts of online social data.
no code implementations • 1 Sep 2015 • Pan Zhou, Yingxue Zhou, Dapeng Wu, Hai Jin
In addition, none of them has considered both the privacy of users' contexts (e.g., social status, age and hobbies) and that of video service vendors' repositories, which are extremely sensitive and of significant commercial value.
no code implementations • 25 May 2015 • Chencheng Li, Pan Zhou
Thus, we use differential privacy to preserve the privacy of learners, and study the influence of guaranteeing differential privacy on the utility of the distributed online learning algorithm.