Search Results for author: Pan Zhou

Found 149 papers, 51 papers with code

Diffusion Time-step Curriculum for One Image to 3D Generation

1 code implementation • 6 Apr 2024 • Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang

Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image.

3D Generation Image to 3D +1

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

no code implementations • 27 Mar 2024 • Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang

We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines.

3D Generation 3D Reconstruction +1

Optimization-based Prompt Injection Attack to LLM-as-a-Judge

no code implementations • 26 Mar 2024 • Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs).

Decision Making

Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models

no code implementations • 20 Mar 2024 • Chengzhe Feng, Yanan sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu

We conduct GenAP on three popular code intelligence PLMs with three canonical code intelligence tasks including defect prediction, code summarization, and code translation.

Code Summarization Code Translation

Friendly Sharpness-Aware Minimization

1 code implementation • 19 Mar 2024 • Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang

By decomposing the adversarial perturbation in SAM into full gradient and stochastic gradient noise components, we discover that relying solely on the full gradient component degrades generalization while excluding it leads to improved performance.

What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

1 code implementation • 15 Mar 2024 • Wanfang Su, Lixing Chen, Yang Bai, Xi Lin, Gaolei Li, Zhe Qu, Pan Zhou

The core philosophy of CMiMC is to preserve discriminative information of individual views in the collaborative view by maximizing mutual information between pre- and post-collaboration features while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks.

Contrastive Learning Philosophy

Few-shot Learner Parameterization by Diffusion Time-steps

1 code implementation • 5 Mar 2024 • Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun

To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent.

Few-Shot Learning Inductive Bias

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

1 code implementation • 7 Feb 2024 • Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Pan Zhou, Yao Wan, Lichao Sun

Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence.

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

1 code implementation • 17 Jan 2024 • Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang

To solve this issue, we first deeply analyze the SDS and find that its distillation sampling process indeed corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample which then serves as a guidance to optimize a 3D model.

3D Generation Text to 3D

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

2 code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, engaging in the fixed and open tracks of Single-Speaker VSR Task, and the open track of Multi-Speaker VSR Task.

speech-recognition Visual Speech Recognition

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.

Audio-Visual Speech Recognition Automatic Speech Recognition +4

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

no code implementations • 15 Dec 2023 • Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest.

Keyword Spotting

Towards Inductive Robustness: Distilling and Fostering Wave-induced Resonance in Transductive GCNs Against Graph Adversarial Attacks

no code implementations • 14 Dec 2023 • Ao Liu, Wenshan Li, Tao Li, Beibei Li, Hanyuan Huang, Pan Zhou

We then prove that merely three MP iterations within GCNs can induce signal resonance between nodes and edges, manifesting as a coupling between nodes and their distillable surrounding local subgraph.

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator

no code implementations • 11 Dec 2023 • Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou

Additionally, to validate the efficacy of the generated data quantitatively, we add the instruction tuning data produced by Genixer to the training of two representative MLLMs and observe consistent improvements on various VQA tasks and multimodal benchmarks.

Image Captioning Question Answering +1

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

1 code implementation • 5 Dec 2023 • Shanshan Zhong, Zhongzhan Huang, ShangHua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou

To this end, we study LLMs on the popular Oogiri game, which requires participants to have good creativity and strong associative thinking to respond unexpectedly and humorously to a given image, text, or both, and is thus suitable for LoT study.

Logical Reasoning

203

Exploring the Robustness of Decentralized Training for Large Language Models

no code implementations • 1 Dec 2023 • Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan sun, Pan Zhou

Decentralized training of large language models has emerged as an effective way to democratize this technology.

Federated Learning

MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

1 code implementation • 22 Nov 2023 • Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun

To solve these challenges, we propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework with an additional transformation sampling process to craft transferable and robust perturbation.

Bilevel Optimization Denoising +1

Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts

no code implementations • 15 Nov 2023 • Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun

This finding indicates potential exploitable security risks in MLLMs; 2) Based on the acquired system prompts, we propose a novel MLLM jailbreaking attack method termed SASP (Self-Adversarial Attack via System Prompt).

Adversarial Attack

Instant3D: Instant Text-to-3D Generation

no code implementations • 14 Nov 2023 • Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu

Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network.

3D Generation Negation +1

F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

no code implementations • 23 Oct 2023 • Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou

We propose a Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it enforces the model to focus on the core features from natural patterns and reduce the impact of spurious features from perturbed patterns.

Adversarial Robustness Disentanglement +2

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

2 code implementations • NeurIPS 2023 • Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin

We also observe theoretical benefits of scaling the LSC coefficients of UNet for the stability of hidden features and gradients, as well as for robustness.

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

1 code implementation • 4 Oct 2023 • Yue Huang, Jiawen Shi, Yuan Li, Chenrui Fan, Siyuan Wu, Qihui Zhang, Yixin Liu, Pan Zhou, Yao Wan, Neil Zhenqiang Gong, Lichao Sun

However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate decision-making processes that involve deciding whether to employ a tool and selecting the most suitable tool(s) from a collection of available tools to fulfill user requests.

Decision Making

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

no code implementations • ICCV 2023 • Yunbo Tao, Daizong Liu, Pan Zhou, Yulai Xie, Wei Du, Wei Hu

With the maturity of depth sensors, the vulnerability of 3D point cloud models has received increasing attention in various applications such as autonomous driving and robot navigation.

Autonomous Driving Robot Navigation

Graph Agent Network: Empowering Nodes with Decentralized Communications Capabilities for Adversarial Resilience

no code implementations • 12 Jun 2023 • Ao Liu, Wenshan Li, Tao Li, Beibei Li, Hanyuan Huang, Guangquan Xu, Pan Zhou

In this paper, we propose the Graph Agent Network (GAgN) to address the aforementioned vulnerabilities of GNNs.

Classification Node Classification

Fast Diffusion Model

1 code implementation • 12 Jun 2023 • Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang

In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling.

Image Generation

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

no code implementations • 6 May 2023 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Zichuan Xu, Haozhao Wang, Xing Di, Weining Lu, Yu Cheng

This paper addresses temporal sentence grounding (TSG).

Sentence Temporal Sentence Grounding

InceptionNeXt: When Inception Meets ConvNeXt

9 code implementations • 29 Mar 2023 • Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution.

Image Classification Semantic Segmentation

29,671

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

1 code implementation • ICCV 2023 • ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

To solve this issue, we propose a Masked Diffusion Transformer (MDT) that introduces a mask latent modeling scheme to explicitly enhance the DPMs' ability to learn contextual relations among object semantic parts in an image.

Ranked #1 on Image Generation on ImageNet 256x256

Image Generation

425

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

no code implementations • CVPR 2023 • Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan

To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.

Sentence Temporal Sentence Grounding

Unlearnable Graph: Protecting Graphs from Unauthorized Exploitation

no code implementations • 5 Mar 2023 • Yixin Liu, Chenrui Fan, Pan Zhou, Lichao Sun

While the use of graph-structured data in various fields is becoming increasingly popular, it also raises concerns about the potential unauthorized exploitation of personal data for training commercial graph neural network (GNN) models, which can compromise privacy.

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

no code implementations • 2 Mar 2023 • Daizong Liu, Pan Zhou

Temporal sentence localization in videos (TSLV) aims to retrieve the segment of most interest in an untrimmed video according to a given sentence query.

Representation Learning Sentence +1

Contrastive Video Question Answering via Video Graph Transformer

1 code implementation • 27 Feb 2023 • Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning.

Ranked #11 on Video Question Answering on NExT-QA (using extra training data)

Contrastive Learning Question Answering +1

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

no code implementations • 21 Feb 2023 • Zeyu Xiong, Daizong Liu, Pan Zhou, Jiahao Zhu

Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video. Most existing methods extract frame-grained features or object-grained features by 3D ConvNet or detection network under a conventional TSG framework, failing to capture the subtle differences between frames or to model the spatio-temporal behavior of core persons/objects.

Sentence Temporal Sentence Grounding

BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT

no code implementations • 21 Feb 2023 • Jiawen Shi, Yixin Liu, Pan Zhou, Lichao Sun

Recently, ChatGPT has gained significant attention in research due to its ability to interact with humans effectively.

Backdoor Attack Language Modelling +2

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

no code implementations • ICCV 2023 • Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan

For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.

Action Recognition Facial Expression Recognition (FER) +2

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

no code implementations • 5 Jan 2023 • Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.

Sentence

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations • 2 Jan 2023 • Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning.

Sentence Temporal Sentence Grounding

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

no code implementations • CVPR 2023 • Zenghui Yuan, Pan Zhou, Kai Zou, Yu Cheng

Vision Transformers (ViTs), which made a splash in the field of computer vision (CV), have shaken the dominance of convolutional neural networks (CNNs).

Backdoor Attack

Position-guided Text Prompt for Vision-Language Pre-training

1 code implementation • CVPR 2023 • Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan

In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP.

Ranked #5 on Zero-Shot Cross-Modal Retrieval on COCO 2014

Cross-Modal Retrieval Image Captioning +6

142

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

1 code implementation • 13 Dec 2022 • Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.

named-entity-recognition Named Entity Recognition +1

MetaFormer Baselines for Vision

7 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

By simply applying depthwise separable convolutions as the token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)

Domain Generalization Image Classification

29,671

Towards Sustainable Self-supervised Learning

1 code implementation • 20 Oct 2022 • ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

In this work, we explore a sustainable SSL framework with two major challenges: i) learning a stronger new SSL model based on the existing pretrained SSL model, also called the "base" model, in a cost-friendly manner, and ii) allowing the training of the new model to be compatible with various base models.

Ranked #1 on Semantic Segmentation on ImageNet-S

Object Detection Relation +3

LPT: Long-tailed Prompt Tuning for Image Classification

1 code implementation • 3 Oct 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo

For better effectiveness, we divide prompts into two groups: 1) a shared prompt for the whole long-tailed dataset to learn general features and to adapt a pretrained model into target domain; and 2) group-specific prompts to gather group-specific features for the samples which have similar features and also to empower the pretrained model with discrimination ability.

Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=100) (using extra training data)

Classification Image Classification +1

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

no code implementations • 23 Sep 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.

Information Retrieval Moment Retrieval +1

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

no code implementations • 31 Aug 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.

Sentence Temporal Sentence Grounding

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

4 code implementations • 13 Aug 2022 • Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan

Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing gradient at the extrapolation point.

726

Video Graph Transformer for Video Question Answering

1 code implementation • 12 Jul 2022 • Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan

VGT's uniqueness is two-fold: 1) it designs a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal reasoning; and 2) it exploits disentangled video and text Transformers for relevance comparison between the video and text to perform QA, instead of an entangled cross-modal Transformer for answer classification.

Ranked #18 on Video Question Answering on NExT-QA (using extra training data)

Question Answering Relation +2

Backdoor Attacks on Crowd Counting

1 code implementation • 12 Jul 2022 • Yuhua Sun, Tailai Zhang, Xingjun Ma, Pan Zhou, Jian Lou, Zichuan Xu, Xing Di, Yu Cheng, Lichao

In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) to attack the model to produce arbitrarily large or small density estimations.

Backdoor Attack Crowd Counting +3

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

no code implementations • 2 Jul 2022 • Zeyu Xiong, Daizong Liu, Pan Zhou

Spatial-Temporal Video Grounding (STVG) is a challenging task which aims to localize the spatio-temporal tube of the object of interest semantically according to a natural language query.

Spatio-Temporal Video Grounding Video Grounding

Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

no code implementations • 8 Jun 2022 • Jiachun Pan, Pan Zhou, Shuicheng Yan

To solve these problems, we first theoretically show that on an auto-encoder of a two/one-layered convolution encoder/decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset.

Inception Transformer

3 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information.

Image Classification

554

Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical Guarantees

1 code implementation • CVPR 2022 • Binghui Wang, Youqi Li, Pan Zhou

We then propose an online attack based on bandit optimization which is proven to be sublinear in the query number $T$, i.e., $\mathcal{O}(\sqrt{N}T^{3/4})$, where $N$ is the number of nodes in the graph.

Graph Classification Node Classification

Mugs: A Multi-Granular Self-Supervised Learning Framework

1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan

It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.

Ranked #13 on Self-Supervised Image Classification on ImageNet

Contrastive Learning Self-Supervised Image Classification +3

Self-Promoted Supervision for Few-Shot Transformer

1 code implementation • 14 Mar 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo

The few-shot learning ability of vision transformers (ViTs) is rarely investigated though heavily desired.

Data Augmentation Few-Shot Learning +1

Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding

no code implementations • 6 Mar 2022 • Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou

Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.

Object object-detection +4

End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system

no code implementations • 18 Feb 2022 • Zhengyi Zhang, Pan Zhou

End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.

speech-recognition Speech Recognition

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

no code implementations • 14 Jan 2022 • Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.

Clustering Sentence +1

Exploring Motion and Appearance Information for Temporal Sentence Grounding

no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu

Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.

Object object-detection +3

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appeared content in TSG tasks.

Sentence Temporal Sentence Grounding

DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition

1 code implementation • 9 Dec 2021 • Yuxuan Liang, Pan Zhou, Roger Zimmermann, Shuicheng Yan

While transformers have shown great potential on video recognition with their strong capability of capturing long-range dependencies, they often suffer from high computational costs induced by the self-attention to the huge number of 3D tokens.

Video Recognition

SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

no code implementations • 8 Dec 2021 • Wenbo Gou, Wen Shi, Jian Lou, Lijie Huang, Pan Zhou, Ruixuan Li

Natural language video localization (NLVL) is an important task in the vision-language understanding area, which calls for an in-depth understanding of not only computer vision and natural language side alone, but more importantly the interplay between both sides.

Adversarial Attack Adversarial Robustness

Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond

1 code implementation • NeurIPS 2021 • Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan

Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error to achieve smaller excess risk error than vanilla SGD on (strongly) convex problems and nonconvex problems with Polyak-Łojasiewicz condition which has been observed/proved in neural networks.

Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation

no code implementations • 28 Nov 2021 • Yang Peng, Ping Liu, Yawei Luo, Pan Zhou, Zichuan Xu, Jingen Liu

Unsupervised domain adaptive person re-identification has received significant attention due to its high practical value.

Domain Adaptive Person Re-Identification Person Re-Identification

MetaFormer Is Actually What You Need for Vision

14 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.

Ranked #9 on Semantic Segmentation on DensePASS

Image Classification Object Detection +1

124,527

Lottery Image Prior

no code implementations • 29 Sep 2021 • Qiming Wu, Xiaohan Chen, Yifan Jiang, Pan Zhou, Zhangyang Wang

Drawing inspiration from the recently prosperous research on lottery ticket hypothesis (LTH), we conjecture and study a novel “lottery image prior” (LIP), stated as: given an (untrained or trained) DNN-based image prior, it will have a sparse subnetwork that can be trained in isolation, to match the original DNN’s performance when being applied as a prior to various image inverse problems.

Compressive Sensing Image Reconstruction +1

Bandits for Black-box Attacks to Graph Neural Networks with Structure Perturbation

no code implementations • 29 Sep 2021 • Binghui Wang, Youqi Li, Pan Zhou

However, many recent works have demonstrated that an attacker can mislead GNN models by slightly perturbing the graph structure.

Graph Classification Node Classification

Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis

1 code implementation • 22 Sep 2021 • Zeyuan Yin, Ye Yuan, Panfeng Guo, Pan Zhou

Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center.

Backdoor Attack Federated Learning +1

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

no code implementations • Findings (EMNLP) 2021 • Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework.

Language Modelling Representation Learning +2

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

However, the performance of the bottom-up model is inferior to the top-down counterpart as it fails to exploit the segment-level interaction.

Sentence

Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding

no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Pan Zhou

A key solution to temporal sentence grounding (TSG) lies in learning effective alignment between vision and language features extracted from an untrimmed video and a sentence description.

Sentence Temporal Sentence Grounding

Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network

no code implementations • 27 Jul 2021 • Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye

Specifically, at the coarse-grained stage, we design a dual-discriminator strategy that adapts the source domain to be close to the target domain from the perspectives of both global and local feature spaces via adversarial learning.

Crowd Counting Transfer Learning

Paper
Add Code

A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning

no code implementations • NeurIPS 2021 • Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi

Although intuitive, such a native label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and impairs performance, since some negatives are semantically similar to the query or even share the same semantic class as the query.

Contrastive Learning Representation Learning +2

Paper
Add Code

Prototypical Graph Contrastive Learning

1 code implementation • 17 Jun 2021 • Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since the negatives of a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation.

Clustering Contrastive Learning +1

Paper
Code

Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction

no code implementations • 6 Jun 2021 • Wei Wei, Jiayi Liu, Xianling Mao, Guibing Guo, Feida Zhu, Pan Zhou, Yuchong Hu

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions.

Response Generation

Paper
Add Code

Exploiting Global Contextual Information for Document-level Named Entity Recognition

no code implementations • 2 Jun 2021 • Zanbo Wang, Wei Wei, Xianling Mao, Shanshan Feng, Pan Zhou, Zhiyong He, Sheng Jiang

To this end, we propose a model called Global Context enhanced Document-level NER (GCDoc) to leverage global contextual information from two levels, i.e., both word and sentence.

named-entity-recognition Named Entity Recognition +2

Paper
Add Code

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

no code implementations • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

Paper
Add Code

RBNN: Memory-Efficient Reconfigurable Deep Binary Neural Network with IP Protection for Internet of Things

no code implementations • 9 May 2021 • Huming Qiu, Hua Ma, Zhi Zhang, Yifeng Zheng, Anmin Fu, Pan Zhou, Yansong Gao, Derek Abbott, Said F. Al-Sarawi

To this end, a 1-bit quantized DNN model, or deep binary neural network (BNN), maximizes memory efficiency, as each parameter in a BNN model has only 1 bit.

Quantization

Paper
Add Code

Towards Adversarial Patch Analysis and Certified Defense against Crowd Counting

1 code implementation • 22 Apr 2021 • Qiming Wu, Zhikang Zou, Pan Zhou, Xiaoqing Ye, Binghui Wang, Ang Li

Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.

Adversarial Attack Adversarial Robustness +2

Paper
Code

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

no code implementations • 8 Apr 2021 • Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness

1 code implementation • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li

Paper
Code

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation • CVPR 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence Temporal Sentence Grounding

Paper
Code

DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing

2 code implementations • 2 Mar 2021 • Wenxiao Wang, Tianhao Wang, Lun Wang, Nanqing Luo, Pan Zhou, Dawn Song, Ruoxi Jia

Deep learning techniques have achieved remarkable performance in wide-ranging tasks.

Paper
Code

Progressive Localization Networks for Language-based Moment Localization

no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang

The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.

Paper
Add Code

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations • 1 Jan 2021 • Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.

Question Answering Self-Supervised Learning +1

Paper
Add Code

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

no code implementations • 22 Dec 2020 • Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

1 code implementation • 22 Dec 2020 • Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.

Dialogue Generation Meta-Learning

Paper
Code

Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation

no code implementations • 10 Dec 2020 • Daizong Liu, Shuangjie Xu, Xiao-Yang Liu, Zichuan Xu, Wei Wei, Pan Zhou

To capture temporal information from previous frames, we use a memory network to refine the mask of the current frame by retrieving historical masks in a temporal graph.

Ranked #11 on Semi-Supervised Video Object Segmentation on DAVIS (no YouTube-VOS training)

Object One-shot visual object segmentation +2

Paper
Add Code

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

no code implementations • 4 Dec 2020 • Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

Specifically, our proposed network consists of three main parts: Siamese Encoder Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.

Ranked #6 on Unsupervised Video Object Segmentation on FBMS test

Semantic Segmentation Unsupervised Video Object Segmentation +1

Paper
Add Code

Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network

no code implementations • COLING 2020 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.

Sentence

Paper
Add Code

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

1 code implementation • 23 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

Inspired by the variation and the heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively.

Clustering Incomplete multi-view clustering

Paper
Code

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views).

Clustering Incomplete multi-view clustering +1

Paper
Code

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

In these scenarios, original image data often contain missing instances and noise, which are ignored by most multi-view clustering methods.

Clustering Incomplete multi-view clustering +1

Paper
Code

User-based Network Embedding for Collective Opinion Spammer Detection

no code implementations • 16 Nov 2020 • Ziyang Wang, Wei Wei, Xian-Ling Mao, Guibing Guo, Pan Zhou, Shanshan Feng

Due to the huge commercial interests behind online reviews, a tremendous number of spammers manufacture spam reviews for product reputation manipulation.

Network Embedding Relation

Paper
Add Code

Target Guided Emotion Aware Chat Machine

no code implementations • 15 Nov 2020 • Wei Wei, Jiayi Liu, Xianling Mao, Guibin Guo, Feida Zhu, Pan Zhou, Yuchong Hu, Shanshan Feng

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions.

Paper
Add Code

Video-based Facial Expression Recognition using Graph Convolutional Networks

no code implementations • 26 Oct 2020 • Daizong Liu, Hongting Zhang, Pan Zhou

For the video-based FER task, it is sensible to capture the dynamic expression variation among frames to recognize facial expressions.

Facial Expression Recognition Facial Expression Recognition (FER)

Paper
Add Code

Iterative Graph Self-Distillation

no code implementations • 23 Oct 2020 • Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.

Contrastive Learning Graph Learning +1

Paper
Add Code

How Important is the Train-Validation Split in Meta-Learning?

no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (the train-val method) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

Meta-Learning

Paper
Add Code

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

no code implementations • NeurIPS 2020 • Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E

The result shows that (1) the escaping time of both SGD and ADAM depends on the Radon measure of the basin positively and the heaviness of gradient noise negatively; (2) for the same basin, SGD enjoys smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM via adaptively scaling each gradient coordinate well diminishes the anisotropic structure in gradient noise and results in larger Radon measure of a basin; (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD.

Paper
Add Code

Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

no code implementations • ICML 2020 • Pan Zhou, Xiao-Tong Yuan

Particularly, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$ which is at the order of intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively $\mathcal{O} (n^{0.875}\log^{1.5}(n))$ and $\mathcal{O} (n^{0.875}\log^{2.25}(n))$, which to our best knowledge, for the first time achieve optimal generalization in less than a single pass over data.

Paper
Add Code

Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function

1 code implementation • 1 Sep 2020 • Binghui Wang, Tianxiang Zhou, Minhua Lin, Pan Zhou, Ang Li, Meng Pang, Hai Li, Yiran Chen

Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively.

Node Classification

Paper
Code

Reinforcement Learning-based Black-Box Evasion Attacks to Link Prediction in Dynamic Graphs

no code implementations • 1 Sep 2020 • Houxiang Fan, Binghui Wang, Pan Zhou, Ang Li, Meng Pang, Zichuan Xu, Cai Fu, Hai Li, Yiran Chen

Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc.

Graph Embedding Link Prediction +2

Paper
Add Code

Identity-Aware Attribute Recognition via Real-Time Distributed Inference in Mobile Edge Clouds

no code implementations • 12 Aug 2020 • Zichuan Xu, Jiangkai Wu, Qiufen Xia, Pan Zhou, Jiankang Ren, HuiZhi Liang

In this paper, we design novel models for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system.

Attribute Pedestrian Attribute Recognition +2

Paper
Add Code

Fine-grained Iterative Attention Network for Temporal Language Localization in Videos

no code implementations • 6 Aug 2020 • Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou

In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.

Sentence

Paper
Add Code

Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

1 code implementation • 4 Aug 2020 • Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu

To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.

Graph Attention Sentence

Paper
Code

Theory-Inspired Path-Regularized Differential Network Architecture Search

1 code implementation • NeurIPS 2020 • Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi

Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to incite search exploration for deep architectures that often converge slower than shallow ones as shown in our theory and are not well explored during the search.

Image Classification

Paper
Code

Federated Mutual Learning

3 code implementations • 27 Jun 2020 • Tao Shen, Jie Zhang, Xinkang Jia, Fengda Zhang, Gang Huang, Pan Zhou, Kun Kuang, Fei Wu, Chao Wu

The experiments show that FML can achieve better performance than alternatives in a typical FL setting, and that clients can benefit from FML with different models and tasks.

Federated Learning

Paper
Code

Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

1 code implementation • NeurIPS 2020 • Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu

Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation.

Ranked #2 on Text Generation on EMNLP2017 WMT

Image Generation Style Transfer +1

Paper
Code

Prototypical Contrastive Learning of Unsupervised Representations

2 code implementations • ICLR 2021 • Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi

This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.

Ranked #5 on Contrastive Learning on imagenet-1k

Clustering Contrastive Learning +4

Paper
Code

AN-GCN: An Anonymous Graph Convolutional Network Defense Against Edge-Perturbing Attack

no code implementations • 6 May 2020 • Ao Liu, Beibei Li, Tao Li, Pan Zhou, Rui Wang

In this paper, we first generalize the formulation of edge-perturbing attacks and strictly prove the vulnerability of GCNs to such attacks in node classification tasks.

Adversarial Attack Classification +4

Paper
Add Code

Data Augmentation Imbalance For Imbalanced Attribute Classification

no code implementations • 19 Apr 2020 • Yang Hu, Xiaying Bai, Pan Zhou, Fanhua Shang, ShengMei Shen

Pedestrian attribute recognition is an important multi-label classification problem.

Attribute Classification +4

Paper
Add Code

Crowd Counting via Hierarchical Scale Recalibration Network

no code implementations • 7 Mar 2020 • Zhikang Zou, Yifan Liu, Shuangjie Xu, Wei Wei, Shiping Wen, Pan Zhou

Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO'10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches.

Crowd Counting

Paper
Add Code

Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels

no code implementations • 26 Feb 2020 • Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu

In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in graph structure to improve the diagnosis accuracy.

Multi-Label Classification

Paper
Add Code

Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach

no code implementations • 6 Feb 2020 • Zeyue Xue, Shuang Luo, Chao Wu, Pan Zhou, Kaigui Bian, Wei Du

Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method since it could accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning.

Transfer Learning

Paper
Add Code

Prophet: Proactive Candidate-Selection for Federated Learning by Predicting the Qualities of Training and Reporting Phases

no code implementations • 3 Feb 2020 • Huawei Huang, Kangying Lin, Song Guo, Pan Zhou, Zibin Zheng

In a dynamic environment, the mobile devices selected by existing reactive candidate-selection algorithms are very likely to fail to complete the training and reporting phases of FL, because the FL parameter server only knows the currently-observed resources of all candidates.

Federated Learning

Paper
Add Code

Efficient Meta Learning via Minibatch Proximal Update

no code implementations • NeurIPS 2019 • Pan Zhou, Xiao-Tong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng

We address the problem of meta-learning which learns a prior over hypothesis from a sample of meta-training tasks for fast adaptation on meta-testing tasks.

Few-Shot Learning

Paper
Add Code

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding

no code implementations • 1 Nov 2019 • Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia

The Transformer has recently shown promising results in many sequence-to-sequence transformation tasks.

speech-recognition Speech Recognition

Paper
Add Code

Tell-the-difference: Fine-grained Visual Descriptor via a Discriminating Referee

no code implementations • 14 Oct 2019 • Shuangjie Xu, Feng Xu, Yu Cheng, Pan Zhou

In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.

Image Captioning

Paper
Add Code

Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping

no code implementations • 25 Sep 2019 • Hongting Zhang, Qiben Yan, Pan Zhou

We then impose a constraint on the perturbation at the positions with lower sound intensity across the time domain to eliminate the perceptible noise during the silent periods or pauses.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Position-Aware Self-Attention based Neural Sequence Labeling

no code implementations • 24 Aug 2019 • Wei Wei, Zanbo Wang, Xian-Ling Mao, Guangyou Zhou, Pan Zhou, Sheng Jiang

Sequence labeling is a fundamental task in natural language processing and has been widely studied.

Chunking named-entity-recognition +7

Paper
Add Code

Enhanced 3D convolutional networks for crowd counting

no code implementations • 12 Aug 2019 • Zhikang Zou, Huiliang Shao, Xiaoye Qu, Wei Wei, Pan Zhou

Recently, convolutional neural networks (CNNs) have become the leading de facto method for crowd counting.

Crowd Counting

Paper
Add Code

Attend To Count: Crowd Counting with Adaptive Capacity Multi-scale CNNs

no code implementations • 7 Aug 2019 • Zhikang Zou, Yu Cheng, Xiaoye Qu, Shouling Ji, Xiaoxiao Guo, Pan Zhou

ACM-CNN consists of three types of modules: a coarse network, a fine network, and a smooth network.

Crowd Counting Density Estimation

Paper
Add Code

Joint Coverage and Power Control in Highly Dynamic and Massive UAV Networks: An Aggregative Game-theoretic Learning Approach

no code implementations • 19 Jul 2019 • Zhuoying Li, Pan Zhou, Yanru Zhang, Lin Gao

An unmanned aerial vehicle (UAV) ad-hoc network is a significant contingency plan for communication after a natural disaster, such as a typhoon or an earthquake.

Paper
Add Code

Exact Recovery of Tensor Robust Principal Component Analysis under Linear Transforms

no code implementations • 16 Jul 2019 • Canyi Lu, Pan Zhou

This work studies the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum.

Paper
Add Code

EnlightenGAN: Deep Light Enhancement without Paired Supervision

8 code implementations • 17 Jun 2019 • Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?

Ranked #1 on Low-Light Image Enhancement on AFLW (Zhang CVPR 2018 crops)

Generative Adversarial Network Image Restoration +1

Paper
Code

Adversarial Category Alignment Network for Cross-domain Sentiment Classification

no code implementations • NAACL 2019 • Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou

Cross-domain sentiment classification aims to predict sentiment polarity on a target domain utilizing a classifier learned from a source domain.

Classification General Classification +2

Paper
Add Code

MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation

1 code implementation • CVPR 2019 • Shuangjie Xu, Daizong Liu, Linchao Bao, Wei Liu, Pan Zhou

Extensive experiments on challenging datasets demonstrate the effectiveness of the proposed method, especially in the case of missing objects.

Ranked #40 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (test-dev)

Decision Making Object +3

Paper
Code

A Stochastic Trust Region Method for Non-convex Minimization

no code implementations • ICLR 2020 • Zebang Shen, Pan Zhou, Cong Fang, Alejandro Ribeiro

We target the problem of finding a local minimum in non-convex finite-sum minimization.

Paper
Add Code

Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

no code implementations • 21 Feb 2019 • Chengjie Li, Ruixuan Li, Haozhao Wang, Yuhua Li, Pan Zhou, Song Guo, Keqin Li

Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models.

Scheduling

Paper
Add Code

Efficient Stochastic Gradient Hard Thresholding

no code implementations • NeurIPS 2018 • Pan Zhou, Xiao-Tong Yuan, Jiashi Feng

To address these deficiencies, we propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds.

Computational Efficiency

Paper
Add Code

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

no code implementations • NeurIPS 2018 • Pan Zhou, Xiao-Tong Yuan, Jiashi Feng

In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining comparable sample-size-independent incremental first-order oracle complexity to stochastic gradient descent.

Open-Ended Question Answering

Paper
Add Code

Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling

1 code implementation • 19 Nov 2018 • Haoran You, Yu Cheng, Tianheng Cheng, Chunliang Li, Pan Zhou

We evaluate the proposed Bayesian CycleGAN on multiple benchmark datasets, including Cityscapes, Maps, and Monet2photo.

Image-to-Image Translation Semantic Segmentation +1

Paper
Code

Modality Attention for End-to-End Audio-visual Speech Recognition

no code implementations • 13 Nov 2018 • Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia

In this paper, we propose a novel multimodal attention based method for audio-visual speech recognition which could automatically learn the fused representation from both modalities based on their importance.

Audio-Visual Speech Recognition Robust Speech Recognition +2

Paper
Add Code

Exploring RNN-Transducer for Chinese Speech Recognition

no code implementations • 13 Nov 2018 • Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

An Online Attention-based Model for Speech Recognition

no code implementations • 13 Nov 2018 • Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu

In previous work, researchers have shown that such architectures can acquire comparable results to state-of-the-art ASR systems, especially when using a bidirectional encoder and global soft attention (GSA) mechanism.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Spatio-temporal Edge Service Placement: A Bandit Learning Approach

no code implementations • 7 Oct 2018 • Lixing Chen, Jie Xu, Shaolei Ren, Pan Zhou

To solve this problem and optimize the edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm.

Decision Making Edge-computing

Paper
Add Code

Spatial-Temporal Synergic Residual Learning for Video Person Re-Identification

no code implementations • 16 Jul 2018 • Xinxing Su, Yingtian Zou, Yu Cheng, Shuangjie Xu, Mo Yu, Pan Zhou

We present a novel method, the Spatial-Temporal Synergic Residual Network (STSRN), for this problem.

Video-Based Person Re-Identification

Paper
Add Code

Deep Adversarial Subspace Clustering

no code implementations • CVPR 2018 • Pan Zhou, Yunqing Hou, Jiashi Feng

To solve this issue, we propose a novel deep adversarial subspace clustering (DASC) model, which learns more favorable sample representations by deep learning for subspace clustering, and more importantly introduces adversarial learning to supervise sample representation learning and subspace clustering.

Ranked #2 on Image Clustering on coil-40

Clustering Image Clustering +1

Paper
Add Code

Understanding Generalization and Optimization Performance of Deep CNNs

no code implementations • ICML 2018 • Pan Zhou, Jiashi Feng

Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk.

Paper
Add Code

Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond

2 code implementations • 5 Apr 2018 • Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou

The results show that our framework can smoothly synthesize pedestrians on varied background images with different levels of detail.

Ranked #2 on Scene Text Recognition on MSDA

Generative Adversarial Network Pedestrian Detection +1

Paper
Code

Empirical Risk Landscape Analysis for Understanding Deep Neural Networks

no code implementations • ICLR 2018 • Pan Zhou, Jiashi Feng

This work aims to provide comprehensive landscape analysis of empirical risk in deep neural networks (DNNs), including the convergence behavior of its gradient, its stationary points and the empirical risk itself to their corresponding population counterparts, which reveals how various network parameters determine the convergence performance.

Generalization Bounds

Paper
Add Code

A Survey of Model Compression and Acceleration for Deep Neural Networks

no code implementations • 23 Oct 2017 • Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang

Methods of parameter pruning and quantization are described first; after that, the other techniques are introduced.

Benchmarking Knowledge Distillation +2

Paper
Add Code

Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification

1 code implementation • ICCV 2017 • Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou

Person Re-Identification (person re-id) is a crucial task because of its applications in visual surveillance and human-computer interaction.

Video-Based Person Re-Identification

Paper
Code

Outlier-Robust Tensor PCA

no code implementations • CVPR 2017 • Pan Zhou, Jiashi Feng

Low-rank tensor analysis is important for various real applications in computer vision.

Clustering Outlier Detection

Paper
Add Code

The Landscape of Deep Learning Algorithms

no code implementations • 19 May 2017 • Pan Zhou, Jiashi Feng

For an $l$-layer linear neural network, we prove its empirical risk uniformly converges to its population risk at the rate of $\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})$ with training sample size of $n$, the total weight dimension of $d$ and the magnitude bound $r$ of weight of each layer.

Generalization Bounds

Paper
Add Code

Context-Aware Online Learning for Course Recommendation of MOOC Big Data

no code implementations • 11 Oct 2016 • Yifan Hou, Pan Zhou, Ting Wang, Li Yu, Yuchong Hu, Dapeng Wu

In this respect, the key challenge is how to realize personalized course recommendation as well as to reduce the computing and storage costs for the tremendous course data.

Recommendation Systems

Paper
Add Code

Distributed Private Online Learning for Social Big Data Computing over Data Center Networks

no code implementations • 21 Feb 2016 • Chencheng Li, Pan Zhou, Yingxue Zhou, Kaigui Bian, Tao Jiang, Susanto Rahardja

An increasing number of people participate in social networks and massive online social data are obtained.

Cloud Computing Privacy Preserving

Paper
Add Code

Differentially Private Online Learning for Cloud-Based Video Recommendation with Multimedia Big Data in Social Networks

no code implementations • 1 Sep 2015 • Pan Zhou, Yingxue Zhou, Dapeng Wu, Hai Jin

In addition, none of them has considered both the privacy of users' contexts (e.g., social status, ages and hobbies) and video service vendors' repositories, which are extremely sensitive and of significant commercial value.

Privacy Preserving Recommendation Systems

Paper
Add Code

Differentially Private Distributed Online Learning

no code implementations • 25 May 2015 • Chencheng Li, Pan Zhou

Thus, we use differential privacy to preserve the privacy of learners, and study the influence of guaranteeing differential privacy on the utility of the distributed online learning algorithm.

Paper
Add Code
