Search Results for author: Pan Zhou

Found 149 papers, 51 papers with code

Diffusion Time-step Curriculum for One Image to 3D Generation

1 code implementation · 6 Apr 2024 · Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang

Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image.

3D Generation, Image to 3D

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

no code implementations · 27 Mar 2024 · Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang

We tackle the challenge of efficiently reconstructing a 3D asset from a single image, amid growing demand for automated 3D content creation pipelines.

3D Generation, 3D Reconstruction

Optimization-based Prompt Injection Attack to LLM-as-a-Judge

no code implementations · 26 Mar 2024 · Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs).

Decision Making

Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models

no code implementations · 20 Mar 2024 · Chengzhe Feng, Yanan Sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu

We apply GenAP to three popular code intelligence PLMs on three canonical code intelligence tasks: defect prediction, code summarization, and code translation.

Code Summarization, Code Translation

Friendly Sharpness-Aware Minimization

1 code implementation · 19 Mar 2024 · Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang

By decomposing the adversarial perturbation in SAM into full gradient and stochastic gradient noise components, we discover that relying solely on the full gradient component degrades generalization while excluding it leads to improved performance.
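The decomposition above can be made concrete on a toy problem. The following is a minimal NumPy sketch, not the authors' code: it assumes the full-gradient component is estimated by an exponential moving average (EMA) `m` of past minibatch gradients, so the ascent perturbation is built from the residual `g - m` only. The names `friendly_sam_step` and `loss_grad`, the hyperparameters `rho` and `lam`, and the least-squares objective are illustrative choices.

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the least-squares loss 0.5 * mean((X @ w - y) ** 2)."""
    return X.T @ (X @ w - y) / len(y)

def friendly_sam_step(w, m, X, y, lr=0.1, rho=0.05, lam=0.9):
    """One SAM-style step whose ascent perturbation excludes the estimated full gradient.

    m is an EMA of minibatch gradients, used here (an assumption) as a cheap
    estimate of the full gradient, so g - m approximates the stochastic noise.
    """
    g = loss_grad(w, X, y)
    m = lam * m + (1 - lam) * g                  # running full-gradient estimate
    noise = g - m                                # stochastic-gradient-noise component
    eps = rho * noise / (np.linalg.norm(noise) + 1e-12)
    g_adv = loss_grad(w + eps, X, y)             # gradient at the perturbed weights
    return w - lr * g_adv, m

# Toy usage: noiseless linear regression trained with minibatches.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
w_true = np.arange(5.0)
y = X @ w_true
w, m = np.zeros(5), np.zeros(5)
for _ in range(500):
    idx = rng.choice(256, size=32, replace=False)
    w, m = friendly_sam_step(w, m, X[idx], y[idx])
print(np.round(w, 2))
```

Replacing the perturbation with `rho * g / ||g||` recovers ordinary SAM, which makes it easy to compare the two variants on the same toy problem.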

What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

1 code implementation · 15 Mar 2024 · Wanfang Su, Lixing Chen, Yang Bai, Xi Lin, Gaolei Li, Zhe Qu, Pan Zhou

The core philosophy of CMiMC is to preserve discriminative information of individual views in the collaborative view by maximizing mutual information between pre- and post-collaboration features while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks.
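Mutual-information maximization of this kind is commonly implemented through a contrastive lower bound. Below is a minimal NumPy sketch of the generic InfoNCE estimator, assuming that row i of `pre` and row i of `post` are the pre- and post-collaboration features of the same agent. It is a standard bound offered for illustration, not necessarily CMiMC's exact objective, and `info_nce` and `temperature` are hypothetical names.

```python
import numpy as np

def info_nce(pre, post, temperature=0.1):
    """InfoNCE loss for paired rows of `pre` and `post`.

    Row i of `pre` and row i of `post` form the positive pair; the other rows of
    `post` serve as negatives. log(batch) - loss lower-bounds the mutual information.
    """
    pre = pre / np.linalg.norm(pre, axis=1, keepdims=True)
    post = post / np.linalg.norm(post, axis=1, keepdims=True)
    logits = pre @ post.T / temperature              # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # cross-entropy with diagonal targets

rng = np.random.default_rng(0)
pre = rng.normal(size=(8, 16))
aligned = info_nce(pre, pre + 0.01 * rng.normal(size=(8, 16)))   # features preserved
mismatched = info_nce(pre, np.roll(pre, 1, axis=0))              # pairs broken
print(aligned, mismatched)
```

A lower loss on the aligned pairs corresponds to a tighter mutual-information bound, which is the quantity such an objective pushes up.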

Contrastive Learning, Philosophy

Few-shot Learner Parameterization by Diffusion Time-steps

1 code implementation · 5 Mar 2024 · Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun

To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent.
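The inductive bias relies on the closed-form forward process of a DDPM-style diffusion model, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε. The sketch below assumes a linear β schedule from 1e-4 to 0.02 over 1000 steps (a common default, not necessarily the paper's setting) and prints how the signal-to-noise ratio and the correlation with the clean image fall as t grows. Intuitively, low-energy details are swamped by the noise sooner than large, high-contrast structures.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)          # alpha_bar[t] = prod over s <= t of (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form at 0-indexed time-step t."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def snr(t):
    """Signal-to-noise ratio alpha_bar[t] / (1 - alpha_bar[t]); it shrinks as t grows."""
    return alpha_bar[t] / (1.0 - alpha_bar[t])

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(32, 32))   # stand-in for a normalized grayscale image
for t in (0, 100, 500, 999):
    xt = q_sample(x0, t, rng)
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]
    print(f"t={t:4d}  SNR={snr(t):10.4f}  corr(x0, x_t)={corr:+.3f}")
```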

Few-Shot Learning, Inductive Bias

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

1 code implementation · 7 Feb 2024 · Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Pan Zhou, Yao Wan, Lichao Sun

Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence.

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

1 code implementation · 17 Jan 2024 · Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang

To solve this issue, we first analyze SDS in depth and find that its distillation sampling process indeed corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample, which then serves as guidance to optimize a 3D model.
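For reference, the SDS gradient is commonly written as below, where θ are the 3D model parameters, x = g(θ) is a rendered view, ε is Gaussian noise, ε̂_φ is the pretrained denoiser conditioned on the text prompt y, ᾱ_t is the cumulative noise schedule, and w(t) is a time-step weighting. This is the standard form from the SDS literature, included to make the guidance step concrete; the notation is ours rather than the paper's.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\,y,\,t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right],
\qquad x = g(\theta), \quad x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon .
```

Intuitively, the residual ε̂_φ − ε points from the noised render toward a less noisy estimate, which is the guidance signal the snippet describes.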

3D Generation, Text to 3D

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

2 code implementations · 7 Jan 2024 · He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

This paper delineates the visual speech recognition (VSR) system introduced by NPU-ASLP-LiAuto (Team 237) for the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, which participated in the fixed and open tracks of the Single-Speaker VSR Task and the open track of the Multi-Speaker VSR Task.

speech-recognition, Visual Speech Recognition

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code implementations · 7 Jan 2024 · He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.

Audio-Visual Speech Recognition, Automatic Speech Recognition

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations · 7 Jan 2024 · He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition, Automatic Speech Recognition (ASR)

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

no code implementations · 15 Dec 2023 · Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest.

Keyword Spotting

Towards Inductive Robustness: Distilling and Fostering Wave-induced Resonance in Transductive GCNs Against Graph Adversarial Attacks

no code implementations · 14 Dec 2023 · Ao Liu, Wenshan Li, Tao Li, Beibei Li, Hanyuan Huang, Pan Zhou

We then prove that merely three message-passing (MP) iterations within GCNs can induce signal resonance between nodes and edges, manifesting as a coupling between nodes and their distillable surrounding local subgraph.

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator

no code implementations · 11 Dec 2023 · Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou

Additionally, to validate the efficacy of the generated data quantitatively, we add the instruction-tuning data produced by Genixer to the training of two representative MLLMs and observe consistent improvements on various VQA tasks and multimodal benchmarks.

Image Captioning, Question Answering

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

1 code implementation · 5 Dec 2023 · Shanshan Zhong, Zhongzhan Huang, ShangHua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou

To this end, we study LLMs on the popular Oogiri game, which requires participants to be creative and to think associatively in order to respond unexpectedly and humorously to a given image, text, or both, and is thus suitable for studying Leap-of-Thought (LoT).

Logical Reasoning

Exploring the Robustness of Decentralized Training for Large Language Models

no code implementations · 1 Dec 2023 · Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou

Decentralized training of large language models has emerged as an effective way to democratize this technology.

Federated Learning

MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

1 code implementation · 22 Nov 2023 · Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun

To solve these challenges, we propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework and an additional transformation sampling process to craft transferable and robust perturbations.

Bilevel Optimization, Denoising

Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts

no code implementations · 15 Nov 2023 · Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun

This finding indicates potentially exploitable security risks in MLLMs; 2) Based on the acquired system prompts, we propose a novel MLLM jailbreaking attack method termed SASP (Self-Adversarial Attack via System Prompt).

Adversarial Attack

Instant3D: Instant Text-to-3D Generation

no code implementations · 14 Nov 2023 · Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu

Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network.

3D Generation, Negation

F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

no code implementations · 23 Oct 2023 · Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou

We propose Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it forces the model to focus on the core features from natural patterns and reduces the impact of spurious features from perturbed patterns.

Adversarial Robustness, Disentanglement

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

2 code implementations · NeurIPS 2023 · Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin

Besides, we also identify theoretical benefits of scaling the long skip connection (LSC) coefficients of UNet for the stability of hidden features and gradients, as well as for robustness.

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

1 code implementation · 4 Oct 2023 · Yue Huang, Jiawen Shi, Yuan Li, Chenrui Fan, Siyuan Wu, Qihui Zhang, Yixin Liu, Pan Zhou, Yao Wan, Neil Zhenqiang Gong, Lichao Sun

However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate decision-making processes that involve deciding whether to employ a tool and selecting the most suitable tool(s) from a collection of available tools to fulfill user requests.

Decision Making

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

no code implementations · ICCV 2023 · Yunbo Tao, Daizong Liu, Pan Zhou, Yulai Xie, Wei Du, Wei Hu

With the maturity of depth sensors, the vulnerability of 3D point cloud models has received increasing attention in various applications such as autonomous driving and robot navigation.

Autonomous Driving, Robot Navigation

Fast Diffusion Model

1 code implementation · 12 Jun 2023 · Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang

In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling.

Image Generation

InceptionNeXt: When Inception Meets ConvNeXt

9 code implementations · 29 Mar 2023 · Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution.

Image Classification, Semantic Segmentation

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

1 code implementation · ICCV 2023 · ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

To solve this issue, we propose a Masked Diffusion Transformer (MDT) that introduces a mask latent modeling scheme to explicitly enhance the ability of diffusion probabilistic models (DPMs) to learn contextual relations among object semantic parts in an image.

Image Generation

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

no code implementations · CVPR 2023 · Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan

To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.

Sentence, Temporal Sentence Grounding

Unlearnable Graph: Protecting Graphs from Unauthorized Exploitation

no code implementations · 5 Mar 2023 · Yixin Liu, Chenrui Fan, Pan Zhou, Lichao Sun

While the use of graph-structured data in various fields is becoming increasingly popular, it also raises concerns about the potential unauthorized exploitation of personal data for training commercial graph neural network (GNN) models, which can compromise privacy.

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

no code implementations · 2 Mar 2023 · Daizong Liu, Pan Zhou

Temporal sentence localization in videos (TSLV) aims to retrieve, from an untrimmed video, the segment most relevant to a given sentence query.

Representation Learning, Sentence

Contrastive Video Question Answering via Video Graph Transformer

1 code implementation · 27 Feb 2023 · Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning.

Ranked #11 on Video Question Answering on NExT-QA (using extra training data)

Contrastive Learning, Question Answering

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

no code implementations · 21 Feb 2023 · Zeyu Xiong, Daizong Liu, Pan Zhou, Jiahao Zhu

Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video. Most existing methods extract frame-grained features or object-grained features by 3D ConvNet or detection network under a conventional TSG framework, failing to capture the subtle differences between frames or to model the spatio-temporal behavior of core persons/objects.

Sentence, Temporal Sentence Grounding

BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT

no code implementations · 21 Feb 2023 · Jiawen Shi, Yixin Liu, Pan Zhou, Lichao Sun

Recently, ChatGPT has gained significant attention in research due to its ability to interact with humans effectively.

Backdoor Attack, Language Modelling

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

no code implementations · ICCV 2023 · Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan

For the first time, we introduce vision Transformers into privacy-preserving action recognition (PPAR) by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.
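Treating a video as a tubelet sequence means cutting the clip into non-overlapping space-time blocks and flattening each block into one token. A minimal NumPy sketch, assuming a (T, H, W, C) array and a tubelet size of 2 × 16 × 16 (a common choice in video Transformers, not necessarily STPrivacy's); the learned linear projection to the embedding dimension is omitted, and `video_to_tubelets` is a hypothetical helper.

```python
import numpy as np

def video_to_tubelets(video, t=2, p=16):
    """Split a (T, H, W, C) video into flattened, non-overlapping t x p x p tubelets.

    Returns an array of shape (num_tubelets, t * p * p * C), one row per token.
    """
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0, "dims must divide evenly"
    x = video.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)        # (nT, nH, nW, t, p, p, C)
    return x.reshape(-1, t * p * p * C)

video = np.random.default_rng(0).random((8, 64, 64, 3))
tokens = video_to_tubelets(video)
print(tokens.shape)   # (64, 1536): 4 * 4 * 4 tubelets, each 2 * 16 * 16 * 3 values
```

In this token view, a sparsification step can be pictured as keeping only a subset of the rows of `tokens`.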

Action Recognition, Facial Expression Recognition (FER)

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

no code implementations · 5 Jan 2023 · Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.

Sentence

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations · 2 Jan 2023 · Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning.

Sentence, Temporal Sentence Grounding

You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?

no code implementations · CVPR 2023 · Zenghui Yuan, Pan Zhou, Kai Zou, Yu Cheng

Vision Transformers (ViTs), which made a splash in the field of computer vision (CV), have shaken the dominance of convolutional neural networks (CNNs).

Backdoor Attack

Position-guided Text Prompt for Vision-Language Pre-training

1 code implementation · CVPR 2023 · Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan

In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP.

Cross-Modal Retrieval, Image Captioning

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

1 code implementation · 13 Dec 2022 · Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.

named-entity-recognition, Named Entity Recognition

MetaFormer Baselines for Vision

7 code implementations · 24 Oct 2022 · Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

By simply applying depthwise separable convolutions as the token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)

Domain Generalization, Image Classification

Towards Sustainable Self-supervised Learning

1 code implementation · 20 Oct 2022 · ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

In this work, we explore a sustainable SSL framework with two major challenges: i) learning a stronger new SSL model based on an existing pretrained SSL model, also called the "base" model, in a cost-friendly manner, and ii) making the training of the new model compatible with various base models.

Object Detection, Relation

LPT: Long-tailed Prompt Tuning for Image Classification

1 code implementation · 3 Oct 2022 · Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo

For better effectiveness, we divide prompts into two groups: 1) a shared prompt for the whole long-tailed dataset to learn general features and to adapt a pretrained model into target domain; and 2) group-specific prompts to gather group-specific features for the samples which have similar features and also to empower the pretrained model with discrimination ability.

Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=100) (using extra training data)

Classification, Image Classification

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

no code implementations · 23 Sep 2022 · Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.

Information Retrieval, Moment Retrieval

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

no code implementations · 31 Aug 2022 · Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.

Sentence, Temporal Sentence Grounding

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

4 code implementations · 13 Aug 2022 · Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan

Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing the gradient at the extrapolation point.
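The resulting update keeps three moving averages: of the gradient, of the gradient difference g_k − g_{k−1} (which stands in for the extrapolation-point gradient), and of the square of the corrected gradient. A minimal NumPy sketch of the update as we read it from the Adan paper, written with decay-rate betas (0.98, 0.92, 0.99) and decoupled weight decay. Bias correction and other details of the official implementation are omitted, so treat it as an illustration to check against the released code, not a drop-in optimizer.

```python
import numpy as np

def adan_step(theta, grad, state, lr=0.02, betas=(0.98, 0.92, 0.99),
              eps=1e-8, weight_decay=0.0):
    """One Adan-style update without bias correction.

    `state` is a dict holding the moments m, v, n and the previous gradient.
    """
    b1, b2, b3 = betas
    prev = state.get("prev_grad")
    diff = np.zeros_like(grad) if prev is None else grad - prev
    m = b1 * state.get("m", 0.0) + (1 - b1) * grad            # gradient moment
    v = b2 * state.get("v", 0.0) + (1 - b2) * diff            # gradient-difference moment
    nesterov = grad + b2 * diff                               # NME-style corrected gradient
    n = b3 * state.get("n", 0.0) + (1 - b3) * nesterov ** 2   # second moment
    update = lr * (m + b2 * v) / (np.sqrt(n) + eps)
    theta = (theta - update) / (1.0 + lr * weight_decay)      # decoupled weight decay
    state.update(m=m, v=v, n=n, prev_grad=grad)
    return theta, state

# Toy usage: minimize 0.5 * ||theta - c||^2, whose gradient is theta - c.
c = np.array([3.0, -2.0])
theta, state = np.zeros(2), {}
for _ in range(3000):
    theta, state = adan_step(theta, theta - c, state)
print(np.round(theta, 3))
```

Only one gradient is evaluated per step; the Nesterov-style look-ahead comes from the stored previous gradient, which is the overhead saving the snippet refers to.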

Video Graph Transformer for Video Question Answering

1 code implementation · 12 Jul 2022 · Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan

VGT's uniqueness is two-fold: 1) it designs a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal reasoning; and 2) it exploits disentangled video and text Transformers for relevance comparison between the video and text to perform QA, instead of an entangled cross-modal Transformer for answer classification.

Ranked #18 on Video Question Answering on NExT-QA (using extra training data)

Question Answering, Relation

Backdoor Attacks on Crowd Counting

1 code implementation · 12 Jul 2022 · Yuhua Sun, Tailai Zhang, Xingjun Ma, Pan Zhou, Jian Lou, Zichuan Xu, Xing Di, Yu Cheng, Lichao

In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) that make the attacked model produce arbitrarily large or small density estimations.

Backdoor Attack, Crowd Counting

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

no code implementations · 2 Jul 2022 · Zeyu Xiong, Daizong Liu, Pan Zhou

Spatial-Temporal Video Grounding (STVG) is a challenging task which aims to localize the spatio-temporal tube of the object of interest described by a natural language query.

Spatio-Temporal Video Grounding, Video Grounding

Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

no code implementations · 8 Jun 2022 · Jiachun Pan, Pan Zhou, Shuicheng Yan

To solve these problems, we first theoretically show that on an auto-encoder of a two/one-layered convolution encoder/decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset.

Inception Transformer

3 code implementations · 25 May 2022 · Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

Recent studies show that the Transformer has a strong capability for building long-range dependencies, yet struggles to capture the high frequencies that predominantly convey local information.

Image Classification

Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical Guarantees

1 code implementation · CVPR 2022 · Binghui Wang, Youqi Li, Pan Zhou

We then propose an online attack based on bandit optimization, which is proven to be sublinear in the query number $T$, i.e., $\mathcal{O}(\sqrt{N}T^{3/4})$, where $N$ is the number of nodes in the graph.

Graph Classification, Node Classification

Mugs: A Multi-Granular Self-Supervised Learning Framework

1 code implementation · 27 Mar 2022 · Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan

It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.

Contrastive Learning, Self-Supervised Image Classification

Self-Promoted Supervision for Few-Shot Transformer

1 code implementation · 14 Mar 2022 · Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo

The few-shot learning ability of vision transformers (ViTs) is rarely investigated though heavily desired.

Data Augmentation, Few-Shot Learning

Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding

no code implementations · 6 Mar 2022 · Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou

Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.

Object, object-detection

End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system

no code implementations · 18 Feb 2022 · Zhengyi Zhang, Pan Zhou

End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.

speech-recognition, Speech Recognition

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

no code implementations · 14 Jan 2022 · Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.

Clustering, Sentence

Exploring Motion and Appearance Information for Temporal Sentence Grounding

no code implementations · 3 Jan 2022 · Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu

Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.

Object, object-detection

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

no code implementations · 3 Jan 2022 · Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes rarely appearing content in TSG tasks.

Sentence, Temporal Sentence Grounding

DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition

1 code implementation · 9 Dec 2021 · Yuxuan Liang, Pan Zhou, Roger Zimmermann, Shuicheng Yan

While transformers have shown great potential in video recognition thanks to their strong capability of capturing long-range dependencies, they often suffer from high computational costs induced by applying self-attention to a huge number of 3D tokens.

Video Recognition

SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

no code implementations · 8 Dec 2021 · Wenbo Gou, Wen Shi, Jian Lou, Lijie Huang, Pan Zhou, Ruixuan Li

Natural language video localization (NLVL) is an important task in vision-language understanding, which calls for an in-depth understanding not only of computer vision and natural language individually but, more importantly, of the interplay between the two.

Adversarial Attack, Adversarial Robustness

Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond

1 code implementation · NeurIPS 2021 · Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan

Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error to achieve smaller excess risk error than vanilla SGD on (strongly) convex problems and nonconvex problems with Polyak-Łojasiewicz condition which has been observed/proved in neural networks.
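Lookahead wraps an inner optimizer: the fast weights take k inner steps, then the slow weights move a fraction α of the way toward them, and the fast weights restart from the slow weights. A minimal NumPy sketch with plain SGD as the inner loop (the setting analyzed above), assuming a noisy quadratic objective; `k`, `alpha`, and the toy problem are illustrative values, not settings from the paper.

```python
import numpy as np

def lookahead_sgd(grad_fn, w0, rng, lr=0.1, k=5, alpha=0.5, outer_steps=100):
    """Lookahead wrapped around plain SGD.

    The fast weights take k SGD steps starting from the slow weights; the slow
    weights then move a fraction alpha of the way toward the fast weights.
    """
    slow = np.array(w0, dtype=float)
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):                      # inner loop: k fast SGD steps
            fast -= lr * grad_fn(fast, rng)
        slow += alpha * (fast - slow)           # outer loop: interpolate toward fast weights
    return slow

target = np.array([1.0, -2.0])

def noisy_grad(w, rng):
    """Stochastic gradient of 0.5 * ||w - target||^2 with additive Gaussian noise."""
    return (w - target) + 0.1 * rng.normal(size=w.shape)

w = lookahead_sgd(noisy_grad, np.zeros(2), np.random.default_rng(0))
print(np.round(w, 2))
```

The interpolation step averages out part of the inner-loop gradient noise, which gives some intuition for how lookahead can trade optimization error against generalization error.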

MetaFormer Is Actually What You Need for Vision

14 code implementations · CVPR 2022 · Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.
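MetaFormer abstracts a Transformer block as x + TokenMixer(Norm(x)) followed by x + MLP(Norm(x)), leaving the token mixer unspecified. A minimal NumPy sketch, assuming tokens laid out on an H × W grid, normalization over channels without learned parameters, a ReLU MLP (GELU is the more common choice), and average pooling minus identity as the token mixer, i.e. the PoolFormer instance described in the same paper. The weights are random and only illustrate shapes; the helper names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over the channel (last) axis; no learned scale or shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pool_mixer(x, size=3):
    """Average pooling (stride 1, same size, padding excluded) minus identity on an (H, W, C) grid."""
    H, W, _ = x.shape
    r = size // 2
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            window = x[max(i - r, 0):i + r + 1, max(j - r, 0):j + r + 1]
            out[i, j] = window.mean(axis=(0, 1))
    return out - x

def metaformer_block(x, w1, w2, token_mixer=pool_mixer):
    """x + TokenMixer(Norm(x)), then x + MLP(Norm(x)) with a ReLU MLP."""
    x = x + token_mixer(layer_norm(x))
    hidden = np.maximum(layer_norm(x) @ w1, 0.0)
    return x + hidden @ w2

rng = np.random.default_rng(0)
C = 8
x = rng.normal(size=(4, 4, C))
w1 = rng.normal(size=(C, 4 * C)) * 0.1
w2 = rng.normal(size=(4 * C, C)) * 0.1
y = metaformer_block(x, w1, w2)
print(y.shape)   # (4, 4, 8)
```

Because `token_mixer` is just a parameter, swapping in attention or an MLP over tokens leaves the rest of the block unchanged, which is the point of the MetaFormer abstraction.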

Image Classification, Object Detection

Lottery Image Prior

no code implementations · 29 Sep 2021 · Qiming Wu, Xiaohan Chen, Yifan Jiang, Pan Zhou, Zhangyang Wang

Drawing inspiration from the recently prosperous research on the lottery ticket hypothesis (LTH), we conjecture and study a novel “lottery image prior” (LIP), stated as: given an (untrained or trained) DNN-based image prior, it will have a sparse subnetwork that can be trained in isolation to match the original DNN’s performance when applied as a prior to various image inverse problems.

Compressive Sensing, Image Reconstruction

Bandits for Black-box Attacks to Graph Neural Networks with Structure Perturbation

no code implementations · 29 Sep 2021 · Binghui Wang, Youqi Li, Pan Zhou

However, many recent works have demonstrated that an attacker can mislead GNN models by slightly perturbing the graph structure.

Graph Classification, Node Classification

Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis

1 code implementation · 22 Sep 2021 · Zeyuan Yin, Ye Yuan, Panfeng Guo, Pan Zhou

Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center.

Backdoor Attack, Federated Learning

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

no code implementations · EMNLP 2021 · Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

However, the performance of bottom-up models is inferior to that of their top-down counterparts, as they fail to exploit segment-level interaction.

Sentence

Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding

no code implementations · EMNLP 2021 · Daizong Liu, Xiaoye Qu, Pan Zhou

The key to temporal sentence grounding (TSG) lies in learning an effective alignment between the vision and language features extracted from an untrimmed video and a sentence description.

Sentence, Temporal Sentence Grounding

Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network

no code implementations · 27 Jul 2021 · Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye

Specifically, at the coarse-grained stage, we design a dual-discriminator strategy that uses adversarial learning to bring the source domain close to the target domain in both the global and local feature spaces.

Crowd Counting, Transfer Learning

A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning

no code implementations · NeurIPS 2021 · Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi

Although intuitive, such a naive label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and it impairs performance, since some negatives are semantically similar to the query or even share its semantic class.

Contrastive Learning, Representation Learning

Prototypical Graph Contrastive Learning

1 code implementation · 17 Jun 2021 · Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang

However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias, i.e., the negatives are likely to share the query's semantic structure, which degrades performance.
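One way to counter this sampling bias is to draw a query's negatives only from clusters other than its own. A minimal NumPy sketch, assuming cluster (prototype) assignments are already available, for example from k-means on graph embeddings; `sample_negatives` is a hypothetical helper, and PGCL's full method, including any reweighting of negatives, is not reproduced here.

```python
import numpy as np

def sample_negatives(query_idx, assignments, num_neg, rng):
    """Pick negatives for one query uniformly among items in *other* clusters."""
    candidates = np.flatnonzero(assignments != assignments[query_idx])
    if len(candidates) == 0:
        return candidates                       # every item shares the query's cluster
    return rng.choice(candidates, size=min(num_neg, len(candidates)), replace=False)

rng = np.random.default_rng(0)
assignments = np.array([0, 0, 1, 1, 2, 2, 0, 1])   # cluster id per graph (toy data)
neg = sample_negatives(0, assignments, num_neg=4, rng=rng)
print(neg)
```

Uniform sampling over all graphs would instead allow indices 1 and 6, which share the query's cluster, to be drawn as negatives.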

Clustering, Contrastive Learning

Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction

no code implementations · 6 Jun 2021 · Wei Wei, Jiayi Liu, Xianling Mao, Guibing Guo, Feida Zhu, Pan Zhou, Yuchong Hu

The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions.

Response Generation

Exploiting Global Contextual Information for Document-level Named Entity Recognition

no code implementations · 2 Jun 2021 · Zanbo Wang, Wei Wei, Xianling Mao, Shanshan Feng, Pan Zhou, Zhiyong He, Sheng Jiang

To this end, we propose a model called Global Context enhanced Document-level NER (GCDoc) to leverage global contextual information at two levels, i.e., word and sentence.

named-entity-recognition, Named Entity Recognition

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

no code implementations · NeurIPS 2021 · Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

RBNN: Memory-Efficient Reconfigurable Deep Binary Neural Network with IP Protection for Internet of Things

no code implementations · 9 May 2021 · Huming Qiu, Hua Ma, Zhi Zhang, Yifeng Zheng, Anmin Fu, Pan Zhou, Yansong Gao, Derek Abbott, Said F. Al-Sarawi

To this end, a 1-bit quantized DNN model, or deep binary neural network (BNN), maximizes memory efficiency, since each parameter in a BNN model occupies only 1 bit.
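The memory argument is simple arithmetic: a float32 weight occupies 32 bits, while a binarized weight keeps only its sign, so eight weights fit in one byte and storage shrinks 32×. A minimal NumPy sketch, assuming sign binarization with one per-tensor scaling factor (an XNOR-Net-style convention, not necessarily RBNN's scheme) and packing with `np.packbits`; the helper names are hypothetical.

```python
import numpy as np

def binarize_and_pack(w):
    """Binarize weights to {-1, +1} (sign, with 0 mapped to +1) and pack 8 per byte."""
    bits = (w >= 0).astype(np.uint8)            # 1 encodes +1, 0 encodes -1
    scale = np.abs(w).mean()                    # per-tensor scaling factor (assumed)
    return np.packbits(bits), scale, w.shape

def unpack(packed, scale, shape):
    """Recover the scaled binary weights from the packed representation."""
    n = int(np.prod(shape))
    bits = np.unpackbits(packed, count=n)
    return (bits.astype(np.float32) * 2 - 1).reshape(shape) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
packed, scale, shape = binarize_and_pack(w)
print(w.nbytes, packed.nbytes)   # 16384 512 (float32 vs 1 bit per weight: 32x smaller)
```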

Quantization
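
As a minimal illustration of why 1-bit parameters maximize memory efficiency, the sketch below assumes plain sign binarization (a weight of zero or more maps to +1, a negative weight to -1) and packs eight weights per byte. The helper names are hypothetical, and practical BNN schemes may add per-layer scaling factors or other details that are omitted here.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> List[int]:
    """Sign binarization: bit 1 encodes +1 (w >= 0), bit 0 encodes -1 (w < 0)."""
    return [1 if w >= 0 else 0 for w in weights]


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, 8 weights per byte, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def unpack_signs(packed: bytes, n: int) -> List[int]:
    """Recover the first n binary weights as +1/-1 values."""
    return [1 if (packed[i // 8] >> (7 - i % 8)) & 1 else -1 for i in range(n)]


def memory_bytes(n_params: int) -> Tuple[int, int]:
    """Bytes needed for n_params weights as float32 vs. packed 1-bit values."""
    return 4 * n_params, (n_params + 7) // 8


if __name__ == "__main__":
    w = [0.3, -1.2, 0.0, -0.05, 2.1, -0.7, 0.9, -3.3, 0.4]
    packed = pack_bits(binarize(w))
    print(packed.hex())                  # aa80
    print(unpack_signs(packed, len(w)))  # [1, -1, 1, -1, 1, -1, 1, -1, 1]
    print(memory_bytes(1_000_000))       # (4000000, 125000): 32x smaller
```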

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

no code implementations 8 Apr 2021 Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn increasing attention in the field of automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness

1 code implementation NeurIPS 2021 Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation CVPR 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify, in an untrimmed video, the temporal boundaries of the specific segment described by a sentence query.

Sentence Temporal Sentence Grounding

Progressive Localization Networks for Language-based Moment Localization

no code implementations 2 Feb 2021 Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang

The language-based setting of this task allows for an open set of target activities, resulting in large variation in the temporal lengths of video moments.

Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning

no code implementations 1 Jan 2021 Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin

To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) method, which adaptively erases redundant and artifactual clues in the context and questions in order to learn and establish the correct correspondences between the questions and their clues.

Question Answering Self-Supervised Learning +1

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

1 code implementation 22 Dec 2020 Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

In addition, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.

Dialogue Generation Meta-Learning

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

no code implementations 4 Dec 2020 Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

Specifically, our proposed network consists of three main parts: Siamese Encoder Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.

Semantic Segmentation Unsupervised Video Object Segmentation +1

Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network

no code implementations COLING 2020 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.

Sentence

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

1 code implementation 23 Nov 2020 Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

Inspired by variation and heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix shared by all views, which represent the unique information and the consistent information, respectively.

Clustering Incomplete multi-view clustering

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

1 code implementation 20 Nov 2020 Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

However, different views often have different degrees of incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views).

Clustering Incomplete multi-view clustering +1

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

1 code implementation 20 Nov 2020 Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

In these scenarios, the original image data often contain missing instances and noise, which most multi-view clustering methods ignore.

Clustering Incomplete multi-view clustering +1

User-based Network Embedding for Collective Opinion Spammer Detection

no code implementations 16 Nov 2020 Ziyang Wang, Wei Wei, Xian-Ling Mao, Guibing Guo, Pan Zhou, Shanshan Feng

Due to the huge commercial interests behind online reviews, a tremendous number of spammers manufacture spam reviews to manipulate product reputation.

Network Embedding Relation

Target Guided Emotion Aware Chat Machine

no code implementations 15 Nov 2020 Wei Wei, Jiayi Liu, Xianling Mao, Guibin Guo, Feida Zhu, Pan Zhou, Yuchong Hu, Shanshan Feng

The consistency of a response with a given post at both the semantic and emotional levels is essential for a dialogue system to deliver human-like interactions.

Video-based Facial Expression Recognition using Graph Convolutional Networks

no code implementations 26 Oct 2020 Daizong Liu, Hongting Zhang, Pan Zhou

For the video-based facial expression recognition (FER) task, it is sensible to capture the dynamic variation of expressions across frames in order to recognize the facial expression.

Facial Expression Recognition Facial Expression Recognition (FER)

Iterative Graph Self-Distillation

no code implementations 23 Oct 2020 HANLIN ZHANG, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.

Contrastive Learning Graph Learning +1

How Important is the Train-Validation Split in Meta-Learning?

no code implementations 12 Oct 2020 Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (the train-val method), where the prior adapts to the task on one split of the data and the resulting predictor is evaluated on another split.

Meta-Learning
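
The train-val method can be made concrete with a toy one-dimensional regression sketch. It assumes each task is a list of (x, y) pairs, that "adapting the prior" means ridge-style shrinkage of a scalar weight toward the prior weight, and that the prior is chosen by grid search; all of these choices and names are illustrative assumptions, not the setting analyzed in the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # one (x, y) example


def adapt(prior_w: float, data: Sequence[Pair], lam: float) -> float:
    """Closed-form argmin_w sum (y - w*x)^2 + lam * (w - prior_w)^2."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return (sxy + lam * prior_w) / (sxx + lam)


def mse(w: float, data: Sequence[Pair]) -> float:
    return sum((y - w * x) ** 2 for x, y in data) / len(data)


def train_val_loss(prior_w: float, tasks: Sequence[List[Pair]],
                   n_train: int, lam: float = 1.0) -> float:
    """Train-val meta-objective: adapt on each task's first n_train points,
    then score the adapted predictor on that task's remaining points."""
    total = 0.0
    for task in tasks:
        train, val = task[:n_train], task[n_train:]
        total += mse(adapt(prior_w, train, lam), val)
    return total / len(tasks)


if __name__ == "__main__":
    tasks = [[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)],   # slope 2.0
             [(1.0, 2.4), (2.0, 4.8), (3.0, 7.2), (4.0, 9.6)]]   # slope 2.4
    grid = [i / 10 for i in range(41)]
    best = min(grid, key=lambda p: train_val_loss(p, tasks, n_train=2))
    print(best)  # 2.2, midway between the two task slopes
```

The key design point is that the loss used to select the prior is never computed on the points used for adaptation; evaluating on the training split instead (the train-train alternative) would reward a prior that fits the adaptation data rather than one that generalizes.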

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

no code implementations NeurIPS 2020 Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E

The result shows that (1) the escaping time of both SGD and ADAM depends positively on the Radon measure of the basin and negatively on the heaviness of the gradient noise; and (2) for the same basin, SGD enjoys a smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM, which adaptively scales each gradient coordinate, effectively diminishes the anisotropic structure of the gradient noise and results in a larger Radon measure of the basin; and (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than in SGD.
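
Point (b) can be illustrated in isolation. Adam's first-moment estimate is an exponential moving average, m_t = beta1 * m_{t-1} + (1 - beta1) * g_t, so a single heavy-tailed spike in the stochastic gradient enters the update direction scaled by (1 - beta1). The sketch below ignores Adam's bias correction and second-moment scaling, and the noise sequence is made up for illustration; it is not the paper's analysis.

```python
from typing import List, Sequence


def ema(grads: Sequence[float], beta1: float = 0.9) -> List[float]:
    """Adam-style first moment m_t = beta1*m_{t-1} + (1-beta1)*g_t (no bias correction)."""
    m, out = 0.0, []
    for g in grads:
        m = beta1 * m + (1.0 - beta1) * g
        out.append(m)
    return out


if __name__ == "__main__":
    noise = [0.0] * 5 + [100.0] + [0.0] * 4  # one heavy-tailed spike at step 5
    smoothed = ema(noise)
    print(max(abs(g) for g in noise))               # 100.0
    print(round(max(abs(m) for m in smoothed), 6))  # 10.0
```

The spike is not removed but spread over later steps (the smoothed value decays geometrically by beta1 each step), which is why the smoothed noise has a lighter tail while its cumulative effect persists.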

Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

no code implementations ICML 2020 Pan Zhou, Xiao-Tong Yuan

In particular, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$, which is at the order of the intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are $\mathcal{O}(n^{0.875}\log^{1.5}(n))$ and $\mathcal{O}(n^{0.875}\log^{2.25}(n))$, respectively, which, to the best of our knowledge, achieve optimal generalization in less than a single pass over the data for the first time.

Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function

1 code implementation 1 Sep 2020 Binghui Wang, Tianxiang Zhou, Minhua Lin, Pan Zhou, Ang Li, Meng Pang, Hai Li, Yiran Chen

Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, which are defined on GNNs and label propagation (LP), respectively.

Node Classification

Reinforcement Learning-based Black-Box Evasion Attacks to Link Prediction in Dynamic Graphs

no code implementations 1 Sep 2020 Houxiang Fan, Binghui Wang, Pan Zhou, Ang Li, Meng Pang, Zichuan Xu, Cai Fu, Hai Li, Yiran Chen

Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc.

Graph Embedding Link Prediction +2

Fine-grained Iterative Attention Network for Temporal Language Localization in Videos

no code implementations 6 Aug 2020 Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou

In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.

Sentence

Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

1 code implementation 4 Aug 2020 Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu

To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.

Graph Attention Sentence

Theory-Inspired Path-Regularized Differential Network Architecture Search

1 code implementation NeurIPS 2020 Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi

Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to encourage search exploration of deep architectures, which, as shown in our theory, often converge more slowly than shallow ones and are not well explored during the search.

Image Classification

Federated Mutual Learning

3 code implementations 27 Jun 2020 Tao Shen, Jie Zhang, Xinkang Jia, Fengda Zhang, Gang Huang, Pan Zhou, Kun Kuang, Fei Wu, Chao Wu

The experiments show that FML can achieve better performance than alternatives in the typical FL setting, and that clients with different models and tasks can benefit from FML.

Federated Learning

Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

1 code implementation NeurIPS 2020 Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu

Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation.

Image Generation Style Transfer +1

Prototypical Contrastive Learning of Unsupervised Representations

2 code implementations ICLR 2021 Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi

This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.

Clustering Contrastive Learning +4

AN-GCN: An Anonymous Graph Convolutional Network Defense Against Edge-Perturbing Attack

no code implementations 6 May 2020 Ao Liu, Beibei Li, Tao Li, Pan Zhou, Rui Wang

In this paper, we first generalize the formulation of edge-perturbing attacks and rigorously prove the vulnerability of GCNs to such attacks in node classification tasks.

Adversarial Attack Classification +4

Crowd Counting via Hierarchical Scale Recalibration Network

no code implementations 7 Mar 2020 Zhikang Zou, Yifan Liu, Shuangjie Xu, Wei Wei, Shiping Wen, Pan Zhou

Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO'10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches.

Crowd Counting

Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels

no code implementations 26 Feb 2020 Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu

In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that offers a novel way to investigate the inter-dependency among different diseases by using a dynamic, learnable adjacency matrix in the graph structure to improve diagnosis accuracy.

Multi-Label Classification

Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach

no code implementations 6 Feb 2020 Zeyue Xue, Shuang Luo, Chao Wu, Pan Zhou, Kaigui Bian, Wei Du

Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method since it could accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning.

Transfer Learning

Prophet: Proactive Candidate-Selection for Federated Learning by Predicting the Qualities of Training and Reporting Phases

no code implementations 3 Feb 2020 Huawei Huang, Kangying Lin, Song Guo, Pan Zhou, Zibin Zheng

In a dynamic environment, the mobile devices selected by existing reactive candidate-selection algorithms are quite likely to fail to complete the training and reporting phases of FL, because the FL parameter server only knows the currently observed resources of all candidates.

Federated Learning

Efficient Meta Learning via Minibatch Proximal Update

no code implementations NeurIPS 2019 Pan Zhou, Xiao-Tong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng

We address the problem of meta-learning, which learns a prior over hypotheses from a sample of meta-training tasks for fast adaptation on meta-testing tasks.

Few-Shot Learning

Tell-the-difference: Fine-grained Visual Descriptor via a Discriminating Referee

no code implementations 14 Oct 2019 Shuangjie Xu, Feng Xu, Yu Cheng, Pan Zhou

In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.

Image Captioning

Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping

no code implementations 25 Sep 2019 Hongting Zhang, Qiben Yan, Pan Zhou

We then impose a constraint on the perturbation at the positions with lower sound intensity across the time domain to eliminate the perceptible noise during the silent periods or pauses.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
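
A minimal sketch of an iterative proportional clipping loop, assuming the constraint bounds each perturbation sample by a fixed fraction `alpha` of the clean signal's magnitude at that position, so quiet or silent samples can receive little or no perturbation. The parameter `alpha`, the signed-gradient update, and `grad_fn` (a stand-in for the gradient of an attack loss from a real ASR model) are all hypothetical; the paper's exact constraint and update rule may differ.

```python
from typing import Callable, List, Sequence


def proportional_clip(delta: Sequence[float], x: Sequence[float],
                      alpha: float) -> List[float]:
    """Clip each perturbation sample to [-alpha*|x_i|, alpha*|x_i|]."""
    out = []
    for d, xi in zip(delta, x):
        bound = alpha * abs(xi)
        out.append(max(-bound, min(bound, d)))
    return out


def iterative_attack(x: Sequence[float],
                     grad_fn: Callable[[List[float]], List[float]],
                     steps: int = 10, lr: float = 0.01,
                     alpha: float = 0.05) -> List[float]:
    """Signed-gradient steps on the perturbation, re-clipped after every step.

    grad_fn is a hypothetical callable returning the gradient of an attack
    loss with respect to the adversarial input.
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        adv = [xi + di for xi, di in zip(x, delta)]
        g = grad_fn(adv)
        # Descend the attack loss with a sign step, then re-apply the constraint.
        delta = [di - lr * ((gi > 0) - (gi < 0)) for di, gi in zip(delta, g)]
        delta = proportional_clip(delta, x, alpha)
    return [xi + di for xi, di in zip(x, delta)]


if __name__ == "__main__":
    clean = [0.0, 0.5, -1.0]  # a silent sample, a quiet one, and a loud one
    adv = iterative_attack(clean, lambda a: [-1.0] * len(a))
    print(adv)  # the silent sample is untouched; louder samples move further
```

Because the bound scales with the local amplitude, the perturbation vanishes during silent periods, which is the perceptual motivation described above.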

Enhanced 3D convolutional networks for crowd counting

no code implementations 12 Aug 2019 Zhikang Zou, Huiliang Shao, Xiaoye Qu, Wei Wei, Pan Zhou

Recently, convolutional neural networks (CNNs) have become the de facto leading method for crowd counting.

Crowd Counting

Joint Coverage and Power Control in Highly Dynamic and Massive UAV Networks: An Aggregative Game-theoretic Learning Approach

no code implementations 19 Jul 2019 Zhuoying Li, Pan Zhou, Yanru Zhang, Lin Gao

Ad-hoc networks of unmanned aerial vehicles (UAVs) are an important contingency plan for communication after natural disasters such as typhoons and earthquakes.

Exact Recovery of Tensor Robust Principal Component Analysis under Linear Transforms

no code implementations 16 Jul 2019 Canyi Lu, Pan Zhou

This work studies the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum.

EnlightenGAN: Deep Light Enhancement without Paired Supervision

8 code implementations 17 Jun 2019 Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?

Generative Adversarial Network Image Restoration +1

Adversarial Category Alignment Network for Cross-domain Sentiment Classification

no code implementations NAACL 2019 Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou

Cross-domain sentiment classification aims to predict sentiment polarity on a target domain utilizing a classifier learned from a source domain.

Classification General Classification +2

A Stochastic Trust Region Method for Non-convex Minimization

no code implementations ICLR 2020 Zebang Shen, Pan Zhou, Cong Fang, Alejandro Ribeiro

We target the problem of finding a local minimum in non-convex finite-sum minimization.

Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

no code implementations 21 Feb 2019 Chengjie Li, Ruixuan Li, Haozhao Wang, Yuhua Li, Pan Zhou, Song Guo, Keqin Li

Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models.

Scheduling

Efficient Stochastic Gradient Hard Thresholding

no code implementations NeurIPS 2018 Pan Zhou, Xiao-Tong Yuan, Jiashi Feng

To address these deficiencies, we propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds.

Computational Efficiency
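
For readers unfamiliar with gradient hard thresholding, the core operator keeps the k largest-magnitude entries and zeroes the rest; a hybrid stochastic variant evaluates the gradient on a minibatch that grows over iterations. The sketch below applies this to a toy least-squares loss. The geometric batch schedule, step size, and function names are hypothetical and are not taken from the HSG-HT paper.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (feature vector a, target b)


def hard_threshold(w: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of w and zero out the rest."""
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def minibatch_grad(w: Sequence[float], batch: Sequence[Example]) -> List[float]:
    """Gradient of 0.5 * mean((a.w - b)^2) over the minibatch."""
    g = [0.0] * len(w)
    for a, b in batch:
        r = sum(ai * wi for ai, wi in zip(a, w)) - b
        for j, aj in enumerate(a):
            g[j] += r * aj / len(batch)
    return g


def hybrid_sg_ht(data: List[Example], dim: int, k: int, lr: float = 0.1,
                 iters: int = 20, batch0: int = 2, growth: float = 1.5,
                 seed: int = 0) -> List[float]:
    """Gradient hard thresholding with a geometrically growing minibatch:
    early steps are cheap and stochastic, later ones approach full gradients."""
    rng = random.Random(seed)
    w = [0.0] * dim
    size = float(batch0)
    for _ in range(iters):
        batch = rng.sample(data, min(len(data), int(size)))
        g = minibatch_grad(w, batch)
        w = hard_threshold([wi - lr * gi for wi, gi in zip(w, g)], k)
        size *= growth
    return w
```

For example, with data generated from a 2-sparse target such as `[2.0, 0.0, 0.0, -1.0]`, calling `hybrid_sg_ht(data, dim=4, k=2)` returns an iterate with at most two nonzero entries, on the target's support.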

New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

no code implementations NeurIPS 2018 Pan Zhou, Xiao-Tong Yuan, Jiashi Feng

In this paper, we answer this open question affirmatively by showing that under without-replacement sampling (WoRS), for both convex and non-convex problems, it is still possible for HSGD (with a constant step size) to match full gradient descent in rate of convergence, while maintaining a sample-size-independent incremental first-order oracle complexity comparable to that of stochastic gradient descent.

Open-Ended Question Answering
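
Without-replacement sampling (WoRS) differs from with-replacement sampling in that each data point is visited exactly once per pass. The generators below are a generic sketch of the two sampling patterns (reshuffle once per epoch and slice, versus independent uniform draws); they illustrate the sampling scheme only, not HSGD itself, and the names are hypothetical.

```python
import random
from typing import Iterator, List


def wors_batches(n: int, batch_size: int, epochs: int,
                 seed: int = 0) -> Iterator[List[int]]:
    """Without-replacement sampling: reshuffle once per epoch, then slice, so
    every index appears exactly once per epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        for start in range(0, n, batch_size):
            yield perm[start:start + batch_size]


def wrs_batches(n: int, batch_size: int, steps: int,
                seed: int = 0) -> Iterator[List[int]]:
    """With-replacement baseline: each index is drawn independently and
    uniformly, so duplicates and omissions are possible."""
    rng = random.Random(seed)
    for _ in range(steps):
        yield [rng.randrange(n) for _ in range(batch_size)]


if __name__ == "__main__":
    for batch in wors_batches(n=10, batch_size=4, epochs=1):
        print(batch)  # three batches of sizes 4, 4, 2 covering 0..9 once
```

Samples within a WoRS epoch are dependent (drawing one index rules it out for the rest of the pass), which is why convergence analyses written for independent with-replacement draws do not carry over directly.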

Bayesian Cycle-Consistent Generative Adversarial Networks via Marginalizing Latent Sampling

1 code implementation 19 Nov 2018 Haoran You, Yu Cheng, Tianheng Cheng, Chunliang Li, Pan Zhou

We evaluate the proposed Bayesian CycleGAN on multiple benchmark datasets, including Cityscapes, Maps, and Monet2photo.

Image-to-Image Translation Semantic Segmentation +1

Modality Attention for End-to-End Audio-visual Speech Recognition

no code implementations 13 Nov 2018 Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia

In this paper, we propose a novel multimodal attention-based method for audio-visual speech recognition, which automatically learns a fused representation from both modalities based on their importance.

Audio-Visual Speech Recognition Robust Speech Recognition +2

Exploring RNN-Transducer for Chinese Speech Recognition

no code implementations 13 Nov 2018 Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

An Online Attention-based Model for Speech Recognition

no code implementations 13 Nov 2018 Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu

In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Spatio-temporal Edge Service Placement: A Bandit Learning Approach

no code implementations 7 Oct 2018 Lixing Chen, Jie Xu, Shaolei Ren, Pan Zhou

To solve this problem and optimize the edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm.

Decision Making Edge-computing

Deep Adversarial Subspace Clustering

no code implementations CVPR 2018 Pan Zhou, Yunqing Hou, Jiashi Feng

To solve this issue, we propose a novel deep adversarial subspace clustering (DASC) model, which learns more favorable sample representations by deep learning for subspace clustering, and more importantly introduces adversarial learning to supervise sample representation learning and subspace clustering.

Clustering Image Clustering +1

Understanding Generalization and Optimization Performance of Deep CNNs

no code implementations ICML 2018 Pan Zhou, Jiashi Feng

Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk.

Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond

2 code implementations 5 Apr 2018 Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou

The results show that our framework can smoothly synthesize pedestrians on varied background images and at different levels of detail.

Generative Adversarial Network Pedestrian Detection +1

Empirical Risk Landscape Analysis for Understanding Deep Neural Networks

no code implementations ICLR 2018 Pan Zhou, Jiashi Feng

This work aims to provide comprehensive landscape analysis of empirical risk in deep neural networks (DNNs), including the convergence behavior of its gradient, its stationary points and the empirical risk itself to their corresponding population counterparts, which reveals how various network parameters determine the convergence performance.

Generalization Bounds

A Survey of Model Compression and Acceleration for Deep Neural Networks

no code implementations 23 Oct 2017 Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang

Parameter pruning and quantization methods are described first, followed by the other techniques.

Benchmarking Knowledge Distillation +2

Outlier-Robust Tensor PCA

no code implementations CVPR 2017 Pan Zhou, Jiashi Feng

Low-rank tensor analysis is important for various real applications in computer vision.

Clustering Outlier Detection

The Landscape of Deep Learning Algorithms

no code implementations 19 May 2017 Pan Zhou, Jiashi Feng

For an $l$-layer linear neural network, we prove that its empirical risk uniformly converges to its population risk at the rate of $\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})$, where $n$ is the training sample size, $d$ is the total weight dimension, and $r$ is the magnitude bound on the weights of each layer.

Generalization Bounds
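
To make the dependence in this rate concrete, the helper below evaluates $r^{2l}\sqrt{d\log(l)}/\sqrt{n}$ with the constant hidden by $\mathcal{O}(\cdot)$ set to 1 and the natural logarithm; both are assumptions, since the snippet specifies neither. It is an illustration of the expression, not of the proof.

```python
import math


def linear_net_rate(r: float, l: int, d: int, n: int) -> float:
    """Evaluate r^(2l) * sqrt(d * log l) / sqrt(n), taking the O(.) constant as 1."""
    if l < 2:
        # log(1) = 0 would make the expression vanish, which is not informative.
        raise ValueError("use l >= 2")
    return r ** (2 * l) * math.sqrt(d * math.log(l)) / math.sqrt(n)


if __name__ == "__main__":
    base = linear_net_rate(r=1.0, l=4, d=10_000, n=1_000)
    print(base / linear_net_rate(r=1.0, l=4, d=10_000, n=4_000))  # ~2.0: 4x data halves it
    print(linear_net_rate(r=1.2, l=4, d=10_000, n=1_000) / base)  # 1.2**8, about 4.3
```

The two printed ratios show the two competing effects: the bound shrinks only as $1/\sqrt{n}$ in the sample size, while for $r > 1$ the factor $r^{2l}$ grows exponentially with depth.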

Context-Aware Online Learning for Course Recommendation of MOOC Big Data

no code implementations 11 Oct 2016 Yifan Hou, Pan Zhou, Ting Wang, Li Yu, Yuchong Hu, Dapeng Wu

In this respect, the key challenge is how to realize personalized course recommendation as well as to reduce the computing and storage costs for the tremendous course data.

Recommendation Systems

Differentially Private Online Learning for Cloud-Based Video Recommendation with Multimedia Big Data in Social Networks

no code implementations 1 Sep 2015 Pan Zhou, Yingxue Zhou, Dapeng Wu, Hai Jin

In addition, none of them has considered both the privacy of users' contexts (e.g., social status, age, and hobbies) and that of video service vendors' repositories, which are extremely sensitive and of significant commercial value.

Privacy Preserving Recommendation Systems

Differentially Private Distributed Online Learning

no code implementations 25 May 2015 Chencheng Li, Pan Zhou

Thus, we use differential privacy to preserve the privacy of learners, and study the influence of guaranteeing differential privacy on the utility of the distributed online learning algorithm.
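
The snippet does not state which privacy mechanism is used. A standard way to make a shared numeric quantity epsilon-differentially private is the Laplace mechanism, which adds Laplace noise with scale sensitivity / epsilon, where the sensitivity is the L1 change a single learner's data can cause. The sketch below assumes that mechanism purely for illustration; it is not necessarily the one used in the paper, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(params: Sequence[float], sensitivity: float, epsilon: float,
              rng: random.Random) -> List[float]:
    """Laplace mechanism: perturb every coordinate with scale sensitivity/epsilon.

    Assumes `sensitivity` bounds the L1 change of the whole parameter vector
    when one learner's data changes.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return [p + laplace_noise(scale, rng) for p in params]


if __name__ == "__main__":
    shared = privatize([0.5, -1.0, 2.0], sensitivity=0.1, epsilon=1.0,
                       rng=random.Random(0))
    print(shared)  # noisy copy of the parameters that a learner would share
```

Smaller epsilon means a larger noise scale and stronger privacy, which is the privacy-utility trade-off the paper studies for distributed online learning.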
