no code implementations • 8 May 2025 • Wanjiang Weng, Xiaofeng Tan, Hongsong Wang, Pan Zhou
To address these challenges, we propose BiHumanML3D, a novel bilingual human motion dataset, which establishes a crucial benchmark for bilingual text-to-motion generation models.
no code implementations • 3 May 2025 • Jie Liu, Pan Zhou, Zehao Xiao, Jiayi Shen, Wenzhe Yin, Jan-Jakob Sonke, Efstratios Gavves
Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks.
no code implementations • 14 Mar 2025 • Xueyang Zhou, Guiyao Tie, Guowen Zhang, Weidong Wang, Zhigang Zuo, Di Wu, DuanFeng Chu, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning.
no code implementations • 13 Mar 2025 • Qi Zhao, Zhan Ma, Pan Zhou
Recent developments in generative diffusion models have turned many dreams into realities.
no code implementations • 8 Mar 2025 • Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao
The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.
no code implementations • 8 Mar 2025 • Yinuo Liu, Zenghui Yuan, Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) by dynamically accessing information from external knowledge bases.
1 code implementation • 3 Mar 2025 • Yisen Li, Lingfeng Yang, Wenxuan Shen, Pan Zhou, Yao Wan, Weiwei Lin, Dongping Chen
Therefore, we investigate more diverse signals to capture comprehensive instruction-response pair characteristics and propose three foundational metrics that leverage Multi-LLM wisdom, informed by (1) diverse LLM responses and (2) reward model assessment.
no code implementations • 24 Feb 2025 • Wenzhe Yin, Zehao Xiao, Pan Zhou, Shujian Yu, Jiayi Shen, Jan-Jakob Sonke, Efstratios Gavves
In this paper, to overcome the limitation, we propose CS-Aligner, a novel and straightforward framework that performs distributional vision-language alignment by integrating Cauchy-Schwarz (CS) divergence with mutual information.
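For reference, the Cauchy-Schwarz divergence between densities p and q is D_CS(p, q) = -log[(∫pq)² / (∫p² ∫q²)], which is zero exactly when p = q. Below is a minimal NumPy sketch of its common Gaussian-kernel (Parzen) estimate from two sample sets; the function names and the kernel width `sigma` are illustrative assumptions, this is not CS-Aligner's implementation, and the mutual-information term is omitted.

```python
import numpy as np

def gaussian_gram(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise Gaussian kernel values between the rows of a and the rows of b."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

def cs_divergence(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Kernel estimate of D_CS(p, q) from samples x ~ p (n, d) and y ~ q (m, d).

    Kernel normalization constants cancel in the ratio, so they are dropped.
    """
    cross = gaussian_gram(x, y, sigma).mean()   # proportional to the estimate of ∫ p q
    self_x = gaussian_gram(x, x, sigma).mean()  # proportional to the estimate of ∫ p²
    self_y = gaussian_gram(y, y, sigma).mean()  # proportional to the estimate of ∫ q²
    return float(-np.log(cross**2 / (self_x * self_y)))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
print(cs_divergence(x, x))        # zero for identical samples
print(cs_divergence(x, x + 3.0))  # grows as the two sample clouds separate
```

Because the Gaussian kernel is positive definite, the estimate obeys the Cauchy-Schwarz inequality and is never negative, which is what makes it usable as an alignment loss between, say, image and text embedding distributions.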
no code implementations • 20 Feb 2025 • Xiandong Zou, WanYu Lin, Yuchen Li, Pan Zhou
Aligning Large Language Model (LLM) responses with human preferences is vital for building safe and controllable AI systems.
1 code implementation • 16 Feb 2025 • Shijing Hu, Jingyang Li, Xingyu Xie, Zhihui Lu, Kim-Chuan Toh, Pan Zhou
Speculative decoding accelerates inference in large language models (LLMs) by generating multiple draft tokens simultaneously.
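In the usual setup, a cheap draft model proposes several tokens and the large target model verifies them, keeping the longest agreeing prefix. The following is a minimal greedy-verification sketch; `draft_next`, `target_next`, and `k` are hypothetical stand-ins, the target is queried position by position only for readability (real systems score all drafted positions in one batched forward pass), and the paper's own drafting scheme is not reproduced.

```python
from typing import Callable, List

# Maps a token prefix to the model's greedy next token (hypothetical interface).
NextToken = Callable[[List[int]], int]

def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[int]:
    """One round of greedy speculative decoding; returns the extended sequence."""
    # 1) Draft: autoregressively propose k tokens with the cheap model.
    draft: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2) Verify: accept drafted tokens while the target model agrees.
    out = list(prefix)
    for tok in draft:
        expected = target_next(out)
        if tok != expected:
            out.append(expected)  # first mismatch: take the target's token and stop
            return out
        out.append(tok)
    # Every draft accepted: the target's next prediction comes for free.
    out.append(target_next(out))
    return out

target = lambda ctx: (ctx[-1] + 1) % 10
print(speculative_step([0], target, target))  # [0, 1, 2, 3, 4, 5]
```

With greedy verification the output is identical to what the target model alone would produce; the speed-up comes from accepting several tokens per expensive target pass.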
no code implementations • 13 Feb 2025 • Jingyang Li, Jiachun Pan, Kim-Chuan Toh, Pan Zhou
In this work, we present a unified theoretical framework that elucidates how data augmentation enhances generalization through two key effects: partial semantic feature removal and feature mixing.
no code implementations • 7 Feb 2025 • Lin Zhang, Wenshuo Dong, Zhuoran Zhang, Shu Yang, Lijie Hu, Ninghao Liu, Pan Zhou, Di Wang
In this paper, we revisit existing gradient-based circuit identification methods and find that their performance is affected by either the zero-gradient problem or saturation effects, where edge attribution scores become insensitive to input changes, resulting in noisy and unreliable attribution evaluations for circuit components.
no code implementations • 25 Jan 2025 • Zhongzhan Huang, Shanshan Zhong, Pan Zhou, ShangHua Gao, Marinka Zitnik, Liang Lin
This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity.
no code implementations • 20 Dec 2024 • Xiang Fang, Wanlong Fang, Changshuo Wang, Daizong Liu, Keke Tang, Jianfeng Dong, Pan Zhou, Beibei Li
Given some video-query pairs with untrimmed videos and sentence queries, temporal sentence grounding (TSG) aims to locate query-relevant segments in these videos.
no code implementations • 14 Dec 2024 • Jingyang Li, Kuangyu Ding, Kim-Chuan Toh, Pan Zhou
To our knowledge, this is the first quantization approach applied to Cholesky factors of preconditioners.
no code implementations • 11 Dec 2024 • Ao Liu, Wenshan Li, Beibei Li, Wengang Ma, Tao Li, Pan Zhou
Recent studies have revealed the vulnerability of graph neural networks (GNNs) to adversarial poisoning attacks on node classification tasks.
no code implementations • 8 Dec 2024 • Zhiguang Wu, Fengbin Zhu, Xuequn Shang, Yupei Zhang, Pan Zhou
In the first stage, agents analyze their respective schema and communicate with each other to collect the schema information relevant to the question.
1 code implementation • 6 Dec 2024 • Xiaofeng Tan, Hongsong Wang, Xin Geng, Pan Zhou
This method leverages both online and offline DPO, allowing each to compensate for the other's limitations.
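For reference, the standard DPO objective for one preference pair is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). A minimal sketch computing it from summed sequence log-probabilities follows; `beta = 0.1` is an assumed default, and how this work combines online and offline preference pairs is not shown.

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is a summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference, prefers the chosen response over the rejected one.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # log(2) when policy == reference
```

The loss starts at log 2 when the policy equals the reference and falls as the policy shifts probability mass toward the chosen response.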
no code implementations • 30 Nov 2024 • Daizong Liu, Yunbo Tao, Pan Zhou, Wei Hu
With the maturity of depth sensors in various 3D safety-critical applications, 3D point cloud models have been shown to be vulnerable to adversarial attacks.
no code implementations • 28 Nov 2024 • Yutong Zhang, Lixing Chen, Shenghong Li, Nan Cao, Yang Shi, Jiaxin Ding, Zhe Qu, Pan Zhou, Yang Bai
The former retrieves question-relevant domain knowledge from the DKG and uses it to prompt the LLM, enhancing its reasoning capability on domain-specific tasks; the latter leverages the LLM to generate new domain knowledge from processed tasks and uses it to evolve the DKG.
no code implementations • 26 Nov 2024 • Dongping Chen, Ruoxi Chen, Shu Pu, Zhaoyi Liu, Yanru Wu, Caixi Chen, Benlin Liu, Yue Huang, Yao Wan, Pan Zhou, Ranjay Krishna
While compositional approaches that combine separate language and image models show a 111% improvement over unified models at the holistic level, their performance remains suboptimal at both block and image levels.
no code implementations • 20 Nov 2024 • Zhi Luo, Xiyuan Yang, Pan Zhou, Di Wang
Manipulating the interaction trajectories between the intelligent agent and the environment can control the agent's training and behavior, exposing the potential vulnerabilities of reinforcement learning (RL).
1 code implementation • 7 Nov 2024 • Jie Liu, Pan Zhou, Yingjun Du, Ah-Hwee Tan, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves
To solve this issue, we propose Cooperative Plan Optimization (CaPo) to enhance the cooperation efficiency of LLM-based embodied agents.
1 code implementation • 30 Oct 2024 • Youcheng Huang, Fengbin Zhu, Jingkun Tang, Pan Zhou, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua
With the new RADAR dataset, we further develop a novel and effective iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of VLMs, which we call the attacking direction, to distinguish adversarial input images from benign ones.
1 code implementation • 29 Oct 2024 • Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Bo Li, Yang Tang, Pan Zhou
To address this issue, we propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task, which utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
1 code implementation • 25 Oct 2024 • Wei Han, Pan Zhou, Soujanya Poria, Shuicheng Yan
The limited context window of contemporary large language models (LLMs) remains a huge barrier to their broader application across various domains.
no code implementations • 15 Oct 2024 • Jingyang Li, Jiachun Pan, Vincent Y. F. Tan, Kim-Chuan Toh, Pan Zhou
Semi-supervised learning (SSL), exemplified by FixMatch (Sohn et al., 2020), has shown significant generalization advantages over supervised learning (SL), particularly in the context of deep neural networks (DNNs).
1 code implementation • 11 Oct 2024 • Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Hua Huang
Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks.
1 code implementation • 9 Oct 2024 • Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Qianru Sun, Yang Tang, Bo Li, Pan Zhou
For training objectives, the proposed regularization and trimap loss aim to retain the prior from the pre-trained model and push the matting logits extracted from the mask decoder to contain trimap-based semantic information.
no code implementations • 20 Sep 2024 • Mingmeng Geng, Caixi Chen, Yanru Wu, Dongping Chen, Yao Wan, Pan Zhou
Large language models (LLMs) are increasingly impacting human society, particularly in textual information.
no code implementations • 19 Sep 2024 • Falguni Roy, Xiaofeng Ding, K. -K. R. Choo, Pan Zhou
This study explores fairness, bias, threats, and privacy in recommender systems.
no code implementations • 17 Sep 2024 • Bowen Dong, Pan Zhou, WangMeng Zuo
We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble.
1 code implementation • 7 Aug 2024 • Shanshan Zhong, ShangHua Gao, Zhongzhan Huang, Wushao Wen, Marinka Zitnik, Pan Zhou
To solve this issue, we introduce MoExtend, an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
no code implementations • 23 Jul 2024 • Yuanwei Wu, Yue Huang, Yixin Liu, Xiang Li, Pan Zhou, Lichao Sun
In our study, we introduce AutoJailbreak, an innovative automatic jailbreak technique inspired by prompt optimization.
1 code implementation • 10 Jul 2024 • Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu
Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to multi-resource real-world applications and the complexity of multi-modal processing.
1 code implementation • 5 Jul 2024 • Xingyu Xie, Zhijie Lin, Kim-Chuan Toh, Pan Zhou
Experimental results show that across large-scale model training frameworks like Megatron-LM and PyTorch's FSDP, LoCo significantly improves communication efficiency, e.g., improving Adam's training speed by 14% to 40% without performance degradation on large language models such as LLaMA and MoE models.
no code implementations • 1 Jul 2024 • Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun
Additionally, we explore the utility and trustworthiness of LLMs in the self-cognition state, revealing that the self-cognition state enhances some specific tasks such as creative writing and exaggeration.
no code implementations • 18 Jun 2024 • Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Hongru Xiao, Mengdi Li, Pan Zhou, Muhammad Asif Ali, Di Wang
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs).
1 code implementation • 16 Jun 2024 • Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun
We evaluate the capabilities of current state-of-the-art MLLMs, including Image LLMs and Video LLMs, in understanding various types of GUI content, especially dynamic and sequential content.
1 code implementation • 13 Jun 2024 • Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min Zhang
Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world.
2 code implementations • 10 Jun 2024 • Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang
Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in under a second by integrating multi-view diffusion models with scalable multi-view reconstructors.
1 code implementation • 28 May 2024 • Sike Wang, Pan Zhou, Jia Li, Hua Huang
In this paper, we propose the first 4-bit second-order optimizers, exemplified by 4-bit Shampoo, maintaining performance similar to that of 32-bit ones.
1 code implementation • 23 May 2024 • Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
Our results demonstrate consistent performance gains, underscoring the critical role of these additional tasks in fostering comprehensive intelligence in MLLMs.
Ranked #158 on Visual Question Answering on MM-Vet
no code implementations • CVPR 2024 • Wen Yin, Jian Lou, Pan Zhou, Yulai Xie, Dan Feng, Yuhua Sun, Tailai Zhang, Lichao Sun
In the digital realm, we evaluate our approach using benchmark datasets for TIOD, achieving an Attack Success Rate (ASR) of up to 98.21%.
1 code implementation • 24 Apr 2024 • Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Pan Zhou, Lichao Sun
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP in watermarking LLMs for code generation while maintaining the syntactical correctness of code.
1 code implementation • 22 Apr 2024 • Yao Wan, Guanghua Wan, Shijie Zhang, Hongyu Zhang, Pan Zhou, Hai Jin, Lichao Sun
Subsequently, the membership classifier can be effectively employed to deduce the membership status of a given code sample based on the output of a target code completion model.
1 code implementation • CVPR 2024 • Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang
Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image.
2 code implementations • 27 Mar 2024 • Qiuhong Shen, Zike Wu, Xuanyu Yi, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed.
1 code implementation • 26 Mar 2024 • Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
In this work, we propose JudgeDeceiver, an optimization-based prompt injection attack to LLM-as-a-Judge.
no code implementations • 20 Mar 2024 • Chengzhe Feng, Yanan Sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu
We conduct GenAP on three popular code intelligence PLMs with three canonical code intelligence tasks including defect prediction, code summarization, and code translation.
1 code implementation • CVPR 2024 • Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang
By decomposing the adversarial perturbation in SAM into full gradient and stochastic gradient noise components, we discover that relying solely on the full gradient component degrades generalization while excluding it leads to improved performance.
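For reference, assuming the standard SAM formulation with mini-batch loss $L_{\mathcal{B}}$, full-data loss $L$, and perturbation radius $\rho$, the adversarial perturbation and the decomposition described above can be written as follows (a sketch of the notation only, not the paper's derivation):

```latex
$$
\hat{\epsilon} = \rho\,\frac{\nabla L_{\mathcal{B}}(w)}{\lVert \nabla L_{\mathcal{B}}(w) \rVert},
\qquad
\nabla L_{\mathcal{B}}(w) =
\underbrace{\nabla L(w)}_{\text{full gradient}}
+ \underbrace{\big(\nabla L_{\mathcal{B}}(w) - \nabla L(w)\big)}_{\text{stochastic gradient noise}}
$$
```

The finding above amounts to building $\hat{\epsilon}$ from the second term instead of the full mini-batch gradient.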
1 code implementation • 15 Mar 2024 • Wanfang Su, Lixing Chen, Yang Bai, Xi Lin, Gaolei Li, Zhe Qu, Pan Zhou
The core philosophy of CMiMC is to preserve discriminative information of individual views in the collaborative view by maximizing mutual information between pre- and post-collaboration features while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks.
1 code implementation • CVPR 2024 • Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun
To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent.
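For context, under the standard DDPM forward process (an assumption here, since the snippet does not state the noise schedule), the noisy image at time-step $t$ has the closed form below. The coefficient $\sqrt{\bar{\alpha}_t}$ on the clean image shrinks monotonically with $t$, so low-energy details are drowned by noise before large, high-contrast structures are, which is consistent with the observation above.

```latex
$$
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
$$
```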
1 code implementation • 7 Feb 2024 • Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Yao Wan, Pan Zhou, Lichao Sun
Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking.
1 code implementation • CVPR 2024 • Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang
To solve this issue, we first deeply analyze the SDS and find that its distillation sampling process indeed corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample which then serves as a guidance to optimize a 3D model.
2 code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, engaging in the fixed and open tracks of Single-Speaker VSR Task, and the open track of Multi-Speaker VSR Task.
no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Pan Zhou, Lei Xie
While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.
no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.
no code implementations • 15 Dec 2023 • Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest.
no code implementations • 14 Dec 2023 • Ao Liu, Wenshan Li, Tao Li, Beibei Li, Hanyuan Huang, Pan Zhou
We then prove that merely three MP iterations within GCNs can induce signal resonance between nodes and edges, manifesting as a coupling between nodes and their distillable surrounding local subgraph.
1 code implementation • 11 Dec 2023 • Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
Multimodal Large Language Models (MLLMs) demonstrate exceptional problem-solving capabilities, but few studies have assessed their ability to generate visual instruction tuning data.
1 code implementation • CVPR 2024 • Shanshan Zhong, Zhongzhan Huang, ShangHua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou
To this end, we study LLMs on the popular Oogiri game, which requires participants to have good creativity and strong associative thinking to respond unexpectedly and humorously to a given image, text, or both, and is thus suitable for LoT study.
no code implementations • 1 Dec 2023 • Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou
Decentralized training of large language models has emerged as an effective way to democratize this technology.
1 code implementation • CVPR 2024 • Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun
To solve these challenges, we propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework with an additional transformation sampling process to craft transferable and robust perturbation.
no code implementations • 15 Nov 2023 • Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun
This finding indicates potential exploitable security risks in MLLMs; 2) Based on the acquired system prompts, we propose a novel MLLM jailbreaking attack method termed SASP (Self-Adversarial Attack via System Prompt).
1 code implementation • 14 Nov 2023 • Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu
We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt.
no code implementations • 23 Oct 2023 • Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou
We propose Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it forces the model to focus on the core features from natural patterns and reduces the impact of spurious features from perturbed patterns.
2 code implementations • NeurIPS 2023 • Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin
Besides, we also observe theoretical benefits of the LSC coefficient scaling of UNet for the stability of hidden features and gradients, as well as for robustness.
1 code implementation • 4 Oct 2023 • Yue Huang, Jiawen Shi, Yuan Li, Chenrui Fan, Siyuan Wu, Qihui Zhang, Yixin Liu, Pan Zhou, Yao Wan, Neil Zhenqiang Gong, Lichao Sun
However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate decision-making processes that involve deciding whether to employ a tool and selecting the most suitable tool(s) from a collection of available tools to fulfill user requests.
no code implementations • ICCV 2023 • Yunbo Tao, Daizong Liu, Pan Zhou, Yulai Xie, Wei Du, Wei Hu
With the maturity of depth sensors, the vulnerability of 3D point cloud models has received increasing attention in various applications such as autonomous driving and robot navigation.
no code implementations • 12 Jun 2023 • Ao Liu, Wenshan Li, Tao Li, Beibei Li, Guangquan Xu, Pan Zhou, Wengang Ma, Hanyuan Huang
In this paper, we propose the Graph Agent Network (GAgN) to address the aforementioned vulnerabilities of GNNs.
1 code implementation • 12 Jun 2023 • Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling.
no code implementations • 6 May 2023 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Zichuan Xu, Haozhao Wang, Xing Di, Weining Lu, Yu Cheng
This paper addresses temporal sentence grounding (TSG).
13 code implementations • CVPR 2024 • Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang
Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution.
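To see why large depthwise kernels such as ConvNeXt's 7x7 stay affordable, compare weight counts: a depthwise k x k convolution over C channels has k²C weights, while a dense k x k convolution with C input and C output channels has k²C². A quick check, ignoring biases; the helper name and C = 96 are illustrative choices only:

```python
def conv_weights(kernel: int, channels: int, depthwise: bool) -> int:
    """Weight count of a kernel x kernel conv with `channels` inputs and outputs (no bias)."""
    per_filter = kernel * kernel
    # Depthwise: one k x k filter per channel. Dense: every output sees every input.
    return per_filter * channels if depthwise else per_filter * channels * channels

C = 96
print(conv_weights(7, C, depthwise=True))   # 4704
print(conv_weights(7, C, depthwise=False))  # 451584
```

The dense version is C times larger, which is why widening the receptive field is typically done with depthwise rather than full convolutions.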
1 code implementation • ICCV 2023 • ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan
To solve this issue, we propose a Masked Diffusion Transformer (MDT) that introduces a mask latent modeling scheme to explicitly enhance the ability of DPMs to learn contextual relations among object semantic parts in an image.
Ranked #23 on Image Generation on ImageNet 256x256
no code implementations • CVPR 2023 • Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan
To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.
no code implementations • 5 Mar 2023 • Yixin Liu, Chenrui Fan, Pan Zhou, Lichao Sun
While the use of graph-structured data in various fields is becoming increasingly popular, it also raises concerns about the potential unauthorized exploitation of personal data for training commercial graph neural network (GNN) models, which can compromise privacy.
no code implementations • 2 Mar 2023 • Daizong Liu, Pan Zhou
Temporal sentence localization in videos (TSLV) aims to retrieve the segment of interest in an untrimmed video according to a given sentence query.
1 code implementation • 27 Feb 2023 • Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua
CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning.
Ranked #34 on Video Question Answering on NExT-QA (using extra training data)
no code implementations • 21 Feb 2023 • Zeyu Xiong, Daizong Liu, Pan Zhou, Jiahao Zhu
Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video. Most existing methods extract frame-grained features or object-grained features by 3D ConvNet or detection network under a conventional TSG framework, failing to capture the subtle differences between frames or to model the spatio-temporal behavior of core persons/objects.
no code implementations • 21 Feb 2023 • Jiawen Shi, Yixin Liu, Pan Zhou, Lichao Sun
Recently, ChatGPT has gained significant attention in research due to its ability to interact with humans effectively.
no code implementations • ICCV 2023 • Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan
For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.
no code implementations • 5 Jan 2023 • Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng
Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.
no code implementations • 2 Jan 2023 • Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong
All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.
no code implementations • CVPR 2023 • Zenghui Yuan, Pan Zhou, Kai Zou, Yu Cheng
Vision Transformers (ViTs), which made a splash in the field of computer vision (CV), have shaken the dominance of convolutional neural networks (CNNs).
1 code implementation • CVPR 2023 • Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan
In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP.
Ranked #5 on Zero-Shot Cross-Modal Retrieval on COCO 2014
1 code implementation • 13 Dec 2022 • Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou
Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.
8 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.
Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)
1 code implementation • 20 Oct 2022 • ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan
In this work, we explore a sustainable SSL framework with two major challenges: i) learning a stronger new SSL model based on the existing pretrained SSL model, also called the "base" model, in a cost-friendly manner, ii) allowing the training of the new model to be compatible with various base models.
Ranked #1 on Semantic Segmentation on ImageNet-S
1 code implementation • 3 Oct 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo
For better effectiveness, we divide prompts into two groups: 1) a shared prompt for the whole long-tailed dataset to learn general features and to adapt a pretrained model into target domain; and 2) group-specific prompts to gather group-specific features for the samples which have similar features and also to empower the pretrained model with discrimination ability.
Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=100) (using extra training data)
no code implementations • 23 Sep 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu
In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.
no code implementations • 31 Aug 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li
To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.
9 code implementations • 13 Aug 2022 • Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan
Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing the gradient at the extrapolation point.
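A simplified NumPy sketch of an Adan-style step follows, written from the update rule as we understand it from the paper: the term g_k + (1 - β₂)(g_k - g_{k-1}) stands in for the gradient at the Nesterov extrapolation point, so only one gradient is evaluated per step. The β values use the "weight on the new term" convention (0.02, 0.08, 0.01) and are assumptions, as are the class name and defaults; bias correction and restarts are omitted, so treat this as an illustration rather than the reference optimizer.

```python
import numpy as np

class AdanSketch:
    """Simplified Adan-style update (no bias correction, no restarts)."""

    def __init__(self, lr=0.1, beta1=0.02, beta2=0.08, beta3=0.01, eps=1e-8, weight_decay=0.0):
        self.lr, self.b1, self.b2, self.b3 = lr, beta1, beta2, beta3
        self.eps, self.wd = eps, weight_decay
        self.m = self.v = self.n = self.prev_grad = None

    def step(self, theta: np.ndarray, grad: np.ndarray) -> np.ndarray:
        if self.m is None:  # first call: zero state; gradient difference is 0
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
            self.n = np.zeros_like(theta)
            self.prev_grad = grad.copy()
        diff = grad - self.prev_grad
        self.m = (1 - self.b1) * self.m + self.b1 * grad  # gradient momentum
        self.v = (1 - self.b2) * self.v + self.b2 * diff  # gradient-difference momentum
        # NME: estimate of the extrapolation-point gradient from g_k and g_{k-1} only.
        nesterov_grad = grad + (1 - self.b2) * diff
        self.n = (1 - self.b3) * self.n + self.b3 * nesterov_grad**2
        step_size = self.lr / (np.sqrt(self.n) + self.eps)  # per-coordinate step
        self.prev_grad = grad.copy()
        update = theta - step_size * (self.m + (1 - self.b2) * self.v)
        return update / (1 + self.wd * self.lr)  # decoupled weight decay

opt = AdanSketch()
theta = np.array([1.0, -2.0])
theta = opt.step(theta, grad=theta.copy())  # gradient of 0.5 * ||theta||^2
print(theta)  # approximately [0.98, -1.98]
```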
1 code implementation • 12 Jul 2022 • Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan
VGT's uniqueness is two-fold: 1) it designs a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal reasoning; and 2) it exploits disentangled video and text Transformers for relevance comparison between the video and text to perform QA, instead of entangled cross-modal Transformer for answer classification.
Ranked #5 on Video Question Answering on IntentQA
1 code implementation • 12 Jul 2022 • Yuhua Sun, Tailai Zhang, Xingjun Ma, Pan Zhou, Jian Lou, Zichuan Xu, Xing Di, Yu Cheng, Lichao
In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) that cause the model to produce arbitrarily large or small density estimations.
no code implementations • 2 Jul 2022 • Zeyu Xiong, Daizong Liu, Pan Zhou
Spatial-Temporal Video Grounding (STVG) is a challenging task which aims to localize the spatio-temporal tube of the object of interest semantically according to a natural language query.
no code implementations • 8 Jun 2022 • Jiachun Pan, Pan Zhou, Shuicheng Yan
To solve these problems, we first theoretically show that on an auto-encoder of a two/one-layered convolution encoder/decoder, MRP can capture all discriminative features of each potential semantic class in the pretraining dataset.
4 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan
Recent studies show that Transformers have a strong capability for building long-range dependencies, yet are poor at capturing high frequencies that predominantly convey local information.
1 code implementation • CVPR 2022 • Binghui Wang, Youqi Li, Pan Zhou
We then propose an online attack based on bandit optimization which is proven to be sublinear in the query number $T$, i.e., $\mathcal{O}(\sqrt{N}T^{3/4})$ where $N$ is the number of nodes in the graph.
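Reading the $\mathcal{O}(\sqrt{N}T^{3/4})$ bound as cumulative regret over $T$ queries (the usual meaning of such online bounds, assumed here), sublinearity means the average cost per query vanishes as the query budget grows:

```latex
$$
\frac{\mathcal{O}\big(\sqrt{N}\,T^{3/4}\big)}{T}
= \mathcal{O}\big(\sqrt{N}\,T^{-1/4}\big) \;\longrightarrow\; 0
\quad \text{as } T \to \infty .
$$
```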
1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan
It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.
Ranked #13 on Self-Supervised Image Classification on ImageNet
Contrastive Learning
Self-Supervised Image Classification
1 code implementation • 14 Mar 2022 • Bowen Dong, Pan Zhou, Shuicheng Yan, WangMeng Zuo
The few-shot learning ability of vision transformers (ViTs) has rarely been investigated, although it is highly desired.
no code implementations • 6 Mar 2022 • Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
no code implementations • 18 Feb 2022 • Zhengyi Zhang, Pan Zhou
End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model.
no code implementations • 14 Jan 2022 • Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou
Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.
no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu
Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.
no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou
To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appearing content in TSG tasks.
1 code implementation • 9 Dec 2021 • Yuxuan Liang, Pan Zhou, Roger Zimmermann, Shuicheng Yan
While transformers have shown great potential on video recognition with their strong capability of capturing long-range dependencies, they often suffer from high computational costs induced by self-attention over the huge number of 3D tokens.
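The quadratic cost mentioned above can be made concrete with a little arithmetic. The sketch below assumes an illustrative input (16 frames at 224x224, 16x16 patches, 2-frame tubelets) that is not taken from the paper, and compares full spatio-temporal self-attention with attention restricted to each temporal slice, only as a reference point.

```python
def num_tokens(frames: int, height: int, width: int, patch: int, tubelet: int) -> int:
    """Number of 3D (spatio-temporal) tokens from non-overlapping tubelet patches."""
    return (frames // tubelet) * (height // patch) * (width // patch)


def joint_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over all tokens (per head, per layer)."""
    return n_tokens * n_tokens


def per_slice_attention_pairs(slices: int, tokens_per_slice: int) -> int:
    """Pairs scored if attention were restricted to tokens in the same temporal slice."""
    return slices * tokens_per_slice * tokens_per_slice


# Illustrative (assumed) setting: 16 frames, 224x224, 16x16 patches, 2-frame tubelets.
n = num_tokens(frames=16, height=224, width=224, patch=16, tubelet=2)
joint = joint_attention_pairs(n)
local = per_slice_attention_pairs(slices=8, tokens_per_slice=14 * 14)
print(n, joint, local, joint // local)  # 1568 2458624 307328 8
```

The gap grows linearly with the number of temporal slices, which is why long clips make full 3D self-attention expensive.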
no code implementations • 8 Dec 2021 • Wenbo Gou, Wen Shi, Jian Lou, Lijie Huang, Pan Zhou, Ruixuan Li
Natural language video localization (NLVL) is an important task in the vision-language understanding area, which calls for an in-depth understanding of not only the computer vision and natural language sides alone, but, more importantly, the interplay between both sides.
1 code implementation • NeurIPS 2021 • Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan
Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error to achieve smaller excess risk error than vanilla SGD on (strongly) convex problems and on nonconvex problems with the Polyak-Łojasiewicz condition, which has been observed/proved in neural networks.
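For readers unfamiliar with lookahead, a minimal sketch of the generic scheme (k fast SGD steps, then the slow weights pulled a fraction alpha toward the fast weights) follows. The toy quadratic objective, the deterministic gradients, and the values of lr, k and alpha are illustrative assumptions, not the paper's experimental setup.

```python
from typing import Callable, List


def lookahead_sgd(
    grad: Callable[[List[float]], List[float]],
    phi: List[float],
    lr: float = 0.1,
    k: int = 5,
    alpha: float = 0.5,
    outer_steps: int = 20,
) -> List[float]:
    """Lookahead wrapped around plain SGD (deterministic gradients for illustration)."""
    phi = list(phi)  # slow weights
    for _ in range(outer_steps):
        theta = list(phi)  # fast weights restart from the slow weights
        for _ in range(k):  # k inner-loop SGD steps
            g = grad(theta)
            theta = [t - lr * gi for t, gi in zip(theta, g)]
        # Slow weights move a fraction alpha toward the final fast weights.
        phi = [p + alpha * (t - p) for p, t in zip(phi, theta)]
    return phi


def quad_grad(w: List[float]) -> List[float]:
    """Gradient of the toy objective f(w) = 0.5 * sum((w_i - 3)^2), minimized at w = 3."""
    return [wi - 3.0 for wi in w]


w = lookahead_sgd(quad_grad, [0.0, 10.0])
print([round(x, 4) for x in w])  # both coordinates end up close to 3
```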
no code implementations • 28 Nov 2021 • Yang Peng, Ping Liu, Yawei Luo, Pan Zhou, Zichuan Xu, Jingen Liu
Unsupervised domain adaptive person re-identification has received significant attention due to its high practical value.
Domain Adaptive Person Re-Identification
Person Re-Identification
18 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.
Ranked #9 on Semantic Segmentation on DensePASS
no code implementations • 29 Sep 2021 • Qiming Wu, Xiaohan Chen, Yifan Jiang, Pan Zhou, Zhangyang Wang
Drawing inspiration from the recently prosperous research on the lottery ticket hypothesis (LTH), we conjecture and study a novel “lottery image prior” (LIP), stated as: given an (untrained or trained) DNN-based image prior, it will have a sparse subnetwork that can be trained in isolation to match the original DNN’s performance when applied as a prior to various image inverse problems.
no code implementations • 29 Sep 2021 • Binghui Wang, Youqi Li, Pan Zhou
However, many recent works have demonstrated that an attacker can mislead GNN models by slightly perturbing the graph structure.
1 code implementation • 22 Sep 2021 • Zeyuan Yin, Ye Yuan, Panfeng Guo, Pan Zhou
Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center.
no code implementations • Findings (EMNLP) 2021 • Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin
Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework.
no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
However, the performance of the bottom-up model is inferior to that of its top-down counterpart, as it fails to exploit segment-level interaction.
no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Pan Zhou
A key solution to temporal sentence grounding (TSG) lies in how to learn effective alignment between vision and language features extracted from an untrimmed video and a sentence description.
no code implementations • 27 Jul 2021 • Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye
Specifically, at the coarse-grained stage, we design a dual-discriminator strategy to adapt the source domain to be close to the target from the perspectives of both the global and local feature spaces via adversarial learning.
no code implementations • NeurIPS 2021 • Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi
Although intuitive, such a naive label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and it impairs performance, since some negatives are semantically similar to the query or even share the same semantic class as the query.
1 code implementation • 17 Jun 2021 • Shuai Lin, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, Xiaodan Liang
However, since the negatives for a query are uniformly sampled from all graphs, existing methods suffer from a critical sampling bias issue, i.e., the negatives are likely to have the same semantic structure as the query, leading to performance degradation.
no code implementations • 6 Jun 2021 • Wei Wei, Jiayi Liu, Xianling Mao, Guibing Guo, Feida Zhu, Pan Zhou, Yuchong Hu
The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions.
no code implementations • 2 Jun 2021 • Zanbo Wang, Wei Wei, Xianling Mao, Shanshan Feng, Pan Zhou, Zhiyong He, Sheng Jiang
To this end, we propose a model called Global Context enhanced Document-level NER (GCDoc) to leverage global contextual information at two levels, i.e., word and sentence.
1 code implementation • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li
To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.
no code implementations • 9 May 2021 • Huming Qiu, Hua Ma, Zhi Zhang, Yifeng Zheng, Anmin Fu, Pan Zhou, Yansong Gao, Derek Abbott, Said F. Al-Sarawi
To this end, a 1-bit quantized DNN model, or deep binary neural network (BNN), maximizes memory efficiency, since each parameter in a BNN model takes only 1 bit.
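As a rough illustration of why 1-bit parameters save memory, the sketch below binarizes weights by sign and packs eight of them per byte. Practical BNNs usually also keep per-layer scaling factors and may use other encodings; those details are assumptions left out here.

```python
import struct
from typing import List


def binarize(weights: List[float]) -> List[int]:
    """Map each real-valued weight to one bit via its sign (1 for >= 0, else 0)."""
    return [1 if w >= 0 else 0 for w in weights]


def pack_bits(bits: List[int]) -> bytes:
    """Pack bits 8 per byte, most significant bit first; zero-pad the last byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


weights = [0.7, -1.2, 0.05, -0.3, 2.1, -0.01, 0.4, 0.9, -2.5, 1.1]
packed = pack_bits(binarize(weights))
float32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
print(float32_bytes, len(packed))  # 40 2
```

At scale, the ratio approaches 32x (32-bit floats versus 1-bit codes).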
1 code implementation • 22 Apr 2021 • Qiming Wu, Zhikang Zou, Pan Zhou, Xiaoqing Ye, Binghui Wang, Ang Li
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
no code implementations • 8 Apr 2021 • Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen
Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
1 code implementation • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li
To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.
no code implementations • CVPR 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie
This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
2 code implementations • 2 Mar 2021 • Wenxiao Wang, Tianhao Wang, Lun Wang, Nanqing Luo, Pan Zhou, Dawn Song, Ruoxi Jia
Deep learning techniques have achieved remarkable performance in wide-ranging tasks.
no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang
The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.
no code implementations • 1 Jan 2021 • Fuyu Wang, Pan Zhou, Xiaodan Liang, Liang Lin
To solve this issue, we propose a novel DynamIc Self-sUperviSed Erasure (DISUSE) which adaptively erases redundant and artifactual clues in the context and questions to learn and establish the correct corresponding pair relations between the questions and their clues.
no code implementations • 22 Dec 2020 • Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin
When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
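The snippet does not say how AMS computes its probabilities. Purely as a hypothetical illustration, one simple way to turn per-language scores (for example, recent meta-validation losses, an assumption here) into task sampling probabilities is a temperature softmax; the language names are placeholders.

```python
import math
import random
from typing import Dict


def sampling_probs(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Hypothetical: softmax over per-language scores; higher score -> sampled more often."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lang: math.exp((s - top) / temperature) for lang, s in scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


def sample_language(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one source language according to the given probabilities."""
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=1)[0]


probs = sampling_probs({"lang_a": 2.0, "lang_b": 1.0, "lang_c": 1.0})
rng = random.Random(0)
print(probs, sample_language(probs, rng))
```

A lower temperature concentrates sampling on the highest-scoring languages; a higher one approaches uniform sampling.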
1 code implementation • 22 Dec 2020 • Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin
In addition, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues.
no code implementations • 10 Dec 2020 • Daizong Liu, Shuangjie Xu, Xiao-Yang Liu, Zichuan Xu, Wei Wei, Pan Zhou
To capture temporal information from previous frames, we use a memory network to refine the mask of current frame by retrieving historic masks in a temporal graph.
no code implementations • 4 Dec 2020 • Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou
Specifically, our proposed network consists of three main parts: Siamese Encoder Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.
Ranked #10 on Unsupervised Video Object Segmentation on FBMS test
Semantic Segmentation
Unsupervised Video Object Segmentation
no code implementations • COLING 2020 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.
1 code implementation • 23 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
Inspired by the variation and the heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively.
1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views).
1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
In these scenarios, original image data often contain missing instances and noise, which are ignored by most multi-view clustering methods.
no code implementations • 16 Nov 2020 • Ziyang Wang, Wei Wei, Xian-Ling Mao, Guibing Guo, Pan Zhou, Shanshan Feng
Due to the huge commercial interests behind online reviews, a tremendous number of spammers manufacture spam reviews for product reputation manipulation.
no code implementations • 15 Nov 2020 • Wei Wei, Jiayi Liu, Xianling Mao, Guibin Guo, Feida Zhu, Pan Zhou, Yuchong Hu, Shanshan Feng
The consistency of a response to a given post at semantic-level and emotional-level is essential for a dialogue system to deliver human-like interactions.
no code implementations • 26 Oct 2020 • Daizong Liu, Hongting Zhang, Pan Zhou
For the video-based FER task, it is sensible to capture the dynamic expression variation among frames to recognize facial expressions.
Facial Expression Recognition
Facial Expression Recognition (FER)
no code implementations • 23 Oct 2020 • HANLIN ZHANG, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing
Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs.
no code implementations • NeurIPS 2020 • Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E
The result shows that (1) the escaping time of both SGD and ADAM depends positively on the Radon measure of the basin and negatively on the heaviness of gradient noise; (2) for the same basin, SGD enjoys a smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM, via adaptively scaling each gradient coordinate, diminishes the anisotropic structure in gradient noise and results in a larger Radon measure of a basin; and (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD.
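The two Adam ingredients named above, per-coordinate rescaling and exponential gradient averaging, are visible in the textbook update. This sketch (standard default hyperparameters, assumed) contrasts one SGD step with one Adam step on a badly scaled gradient; it does not reproduce the paper's escaping-time analysis.

```python
import math
from typing import List


def sgd_step(w: List[float], g: List[float], lr: float) -> List[float]:
    """Plain SGD: every coordinate moves in proportion to its raw gradient."""
    return [wi - lr * gi for wi, gi in zip(w, g)]


class Adam:
    """Textbook Adam with bias-corrected first and second moment estimates."""

    def __init__(self, dim: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim  # exponential average of gradients (smoothing)
        self.v = [0.0] * dim  # exponential average of squared gradients
        self.t = 0

    def step(self, w: List[float], g: List[float]) -> List[float]:
        self.t += 1
        self.m = [self.beta1 * m + (1 - self.beta1) * gi for m, gi in zip(self.m, g)]
        self.v = [self.beta2 * v + (1 - self.beta2) * gi * gi for v, gi in zip(self.v, g)]
        out = []
        for wi, m, v in zip(w, self.m, self.v):
            m_hat = m / (1 - self.beta1 ** self.t)
            v_hat = v / (1 - self.beta2 ** self.t)
            # Per-coordinate rescaling by sqrt(v_hat): the "geometry adaptation".
            out.append(wi - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


w0 = [0.0, 0.0]
g = [100.0, 0.01]  # badly scaled gradient coordinates
s = sgd_step(w0, g, lr=1e-3)
a = Adam(dim=2).step(w0, g)
print(s)  # step sizes differ by four orders of magnitude
print(a)  # both coordinates move by about lr
```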
no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong
A common practice in meta-learning is to perform a train-validation split (train-val method) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.
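A minimal sketch of the train-val method for a single task, under illustrative assumptions: the prior is a scalar weight, within-task adaptation is ridge regression on y = w*x shrunk toward that prior, and the validation-split loss is what a meta-learner would average over tasks to update the prior.

```python
import random
from typing import List, Tuple

Pair = Tuple[float, float]


def train_val_split(data: List[Pair], n_train: int, seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle one task's samples and split them into train and validation parts."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def adapt(prior_w: float, train: List[Pair], lam: float = 1.0) -> float:
    """Closed-form ridge regression on y = w*x, shrunk toward the prior weight prior_w."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return (sxy + lam * prior_w) / (sxx + lam)


def val_loss(w: float, val: List[Pair]) -> float:
    """Mean squared error of the adapted predictor on the held-out split."""
    return sum((y - w * x) ** 2 for x, y in val) / len(val)


# One toy task: y = 2x exactly, 10 samples.
task = [(float(x), 2.0 * x) for x in range(1, 11)]
train, val = train_val_split(task, n_train=5)
w = adapt(prior_w=0.0, train=train)
print(round(w, 3), round(val_loss(w, val), 4))
```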
no code implementations • ICML 2020 • Pan Zhou, Xiao-Tong Yuan
Particularly, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$, which is at the order of the intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively $\mathcal{O}(n^{0.875}\log^{1.5}(n))$ and $\mathcal{O}(n^{0.875}\log^{2.25}(n))$, which, to the best of our knowledge, for the first time achieve optimal generalization in less than a single pass over data.
no code implementations • 1 Sep 2020 • Houxiang Fan, Binghui Wang, Pan Zhou, Ang Li, Meng Pang, Zichuan Xu, Cai Fu, Hai Li, Yiran Chen
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc.
1 code implementation • 1 Sep 2020 • Binghui Wang, Tianxiang Zhou, Minhua Lin, Pan Zhou, Ang Li, Meng Pang, Hai Li, Yiran Chen
Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively.
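For context on the label propagation (LP) side, the classic clamped iteration is sketched below: labeled nodes keep their labels, and each unlabeled node repeatedly averages its neighbors' label distributions. This is the standard LP algorithm on a toy graph, not the paper's influence functions.

```python
from typing import Dict, List


def label_propagation(
    adj: Dict[int, List[int]],
    seeds: Dict[int, int],
    num_classes: int,
    iters: int = 50,
) -> Dict[int, List[float]]:
    """Iterative label propagation with clamped seed labels."""
    # One-hot vectors for seed nodes, uniform vectors elsewhere.
    dist: Dict[int, List[float]] = {}
    for v in adj:
        if v in seeds:
            dist[v] = [1.0 if c == seeds[v] else 0.0 for c in range(num_classes)]
        else:
            dist[v] = [1.0 / num_classes] * num_classes
    for _ in range(iters):
        new: Dict[int, List[float]] = {}
        for v, nbrs in adj.items():
            if v in seeds:  # labeled nodes stay clamped
                new[v] = dist[v]
            else:  # unlabeled nodes average their neighbors' distributions
                new[v] = [sum(dist[u][c] for u in nbrs) / len(nbrs) for c in range(num_classes)]
        dist = new
    return dist


# Path graph 0-1-2-3-4: node 0 is labeled class 0, node 4 is labeled class 1.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
out = label_propagation(graph, seeds={0: 0, 4: 1}, num_classes=2)
print({v: round(out[v][0], 3) for v in graph})  # {0: 1.0, 1: 0.75, 2: 0.5, 3: 0.25, 4: 0.0}
```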
no code implementations • 12 Aug 2020 • Zichuan Xu, Jiangkai Wu, Qiufen Xia, Pan Zhou, Jiankang Ren, HuiZhi Liang
In this paper, we design novel models for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system.
no code implementations • 6 Aug 2020 • Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou
In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.
1 code implementation • 4 Aug 2020 • Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu
To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
1 code implementation • NeurIPS 2020 • Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi
Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to incite search exploration for deep architectures that often converge slower than shallow ones as shown in our theory and are not well explored during the search.
3 code implementations • 27 Jun 2020 • Tao Shen, Jie Zhang, Xinkang Jia, Fengda Zhang, Gang Huang, Pan Zhou, Kun Kuang, Fei Wu, Chao Wu
The experiments show that FML can achieve better performance than alternatives in a typical FL setting, and that clients can benefit from FML with different models and tasks.
1 code implementation • NeurIPS 2020 • Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu
Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation.
Ranked #2 on Text Generation on EMNLP2017 WMT
2 code implementations • ICLR 2021 • Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.
Ranked #5 on Contrastive Learning on imagenet-1k
no code implementations • 6 May 2020 • Ao Liu, Beibei Li, Tao Li, Pan Zhou, Rui Wang
In this paper, we first generalize the formulation of edge-perturbing attacks and rigorously prove the vulnerability of GCNs to such attacks in node classification tasks.
no code implementations • 19 Apr 2020 • Yang Hu, Xiaying Bai, Pan Zhou, Fanhua Shang, ShengMei Shen
Pedestrian attribute recognition is an important multi-label classification problem.
no code implementations • 7 Mar 2020 • Zhikang Zou, Yifan Liu, Shuangjie Xu, Wei Wei, Shiping Wen, Pan Zhou
Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO'10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches.
no code implementations • 26 Feb 2020 • Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu
In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in graph structure to improve the diagnosis accuracy.
no code implementations • 6 Feb 2020 • Zeyue Xue, Shuang Luo, Chao Wu, Pan Zhou, Kaigui Bian, Wei Du
Peer-to-peer knowledge transfer in distributed environments has emerged as a promising method since it could accelerate learning and improve team-wide performance without relying on pre-trained teachers in deep reinforcement learning.
no code implementations • 3 Feb 2020 • Huawei Huang, Kangying Lin, Song Guo, Pan Zhou, Zibin Zheng
In the dynamic environment, the mobile devices selected by existing reactive candidate-selection algorithms may well fail to complete the training and reporting phases of FL, because the FL parameter server only knows the currently observed resources of all candidates.
no code implementations • NeurIPS 2019 • Pan Zhou, Xiao-Tong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng
We address the problem of meta-learning, which learns a prior over hypotheses from a sample of meta-training tasks for fast adaptation on meta-testing tasks.
no code implementations • 1 Nov 2019 • Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
Recently, the Transformer has shown promising results in many sequence-to-sequence transformation tasks.
no code implementations • 14 Oct 2019 • Shuangjie Xu, Feng Xu, Yu Cheng, Pan Zhou
In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.
no code implementations • 25 Sep 2019 • Hongting Zhang, Qiben Yan, Pan Zhou
We then impose a constraint on the perturbation at the positions with lower sound intensity across the time domain to eliminate the perceptible noise during the silent periods or pauses.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
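The perturbation constraint described above can be illustrated, hypothetically, by scaling the perturbation in each frame by that frame's RMS intensity relative to the loudest frame, so silent frames receive no perturbation. The frame length and the RMS-based scaling are assumptions for illustration, not the paper's exact constraint.

```python
import math
from typing import List


def frame_rms(signal: List[float], frame_len: int) -> List[float]:
    """Root-mean-square intensity of each non-overlapping frame."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    return [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]


def mask_perturbation(signal: List[float], delta: List[float], frame_len: int) -> List[float]:
    """Hypothetical: scale delta per frame by that frame's RMS relative to the loudest frame.

    Silent frames (RMS 0) receive no perturbation at all.
    """
    rms = frame_rms(signal, frame_len)
    peak = max(rms) or 1.0  # avoid dividing by zero on an all-silent input
    return [d * rms[i // frame_len] / peak for i, d in enumerate(delta)]


signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5]  # a silent frame followed by a loud one
delta = [0.1] * 8
print(mask_perturbation(signal, delta, frame_len=4))
# [0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1, 0.1]
```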
no code implementations • 24 Aug 2019 • Wei Wei, Zanbo Wang, Xian-Ling Mao, Guangyou Zhou, Pan Zhou, Sheng Jiang
Sequence labeling is a fundamental task in natural language processing and has been widely studied.
no code implementations • 12 Aug 2019 • Zhikang Zou, Huiliang Shao, Xiaoye Qu, Wei Wei, Pan Zhou
Recently, convolutional neural networks (CNNs) have become the leading, de facto method for crowd counting.
no code implementations • 7 Aug 2019 • Zhikang Zou, Yu Cheng, Xiaoye Qu, Shouling Ji, Xiaoxiao Guo, Pan Zhou
ACM-CNN consists of three types of modules: a coarse network, a fine network, and a smooth network.
no code implementations • 19 Jul 2019 • Zhuoying Li, Pan Zhou, Yanru Zhang, Lin Gao
An unmanned aerial vehicle (UAV) ad-hoc network is an important contingency plan for communication after a natural disaster, such as a typhoon or an earthquake.
no code implementations • 16 Jul 2019 • Canyi Lu, Pan Zhou
This work studies the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum.
8 code implementations • 17 Jun 2019 • Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang
Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?
no code implementations • NAACL 2019 • Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou
Cross-domain sentiment classification aims to predict sentiment polarity on a target domain utilizing a classifier learned from a source domain.
1 code implementation • CVPR 2019 • Shuangjie Xu, Daizong Liu, Linchao Bao, Wei Liu, Pan Zhou
Extensive experiments on challenging datasets demonstrate the effectiveness of the proposed method, especially in the case of object missing.
Ranked #40 on Semi-Supervised Video Object Segmentation on DAVIS 2016
no code implementations • ICLR 2020 • Zebang Shen, Pan Zhou, Cong Fang, Alejandro Ribeiro
We target the problem of finding a local minimum in non-convex finite-sum minimization.
no code implementations • 21 Feb 2019 • Chengjie Li, Ruixuan Li, Haozhao Wang, Yuhua Li, Pan Zhou, Song Guo, Keqin Li
Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models.
no code implementations • NeurIPS 2018 • Pan Zhou, Xiao-Tong Yuan, Jiashi Feng
In this paper, we affirmatively answer this open question by showing that under WoRS and for both convex and non-convex problems, it is still possible for HSGD (with constant step-size) to match full gradient descent in rate of convergence, while maintaining comparable sample-size-independent incremental first-order oracle complexity to stochastic gradient descent.
no code implementations • NeurIPS 2018 • Pan Zhou, Xiao-Tong Yuan, Jiashi Feng
To address these deficiencies, we propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds.
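Hard thresholding itself is a simple operator: keep the k largest-magnitude entries and zero the rest. The sketch below shows it together with one full-gradient thresholding step; HSG-HT's hybrid stochastic gradient estimator is not reproduced, and the function names are hypothetical.

```python
from typing import List


def hard_threshold(x: List[float], k: int) -> List[float]:
    """H_k(x): keep the k entries of largest magnitude, zero out the rest."""
    if k <= 0:
        return [0.0] * len(x)
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def gradient_hard_threshold_step(w: List[float], grad: List[float], lr: float, k: int) -> List[float]:
    """One step of w <- H_k(w - lr * grad), which keeps the iterate k-sparse."""
    return hard_threshold([wi - lr * gi for wi, gi in zip(w, grad)], k)


print(hard_threshold([0.5, -3.0, 0.1, 2.0, -0.2], k=2))  # [0.0, -3.0, 0.0, 2.0, 0.0]
print(gradient_hard_threshold_step([0.0] * 4, [-1.0, 0.2, -3.0, 0.5], lr=1.0, k=1))
# [0.0, 0.0, 3.0, 0.0]
```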
1 code implementation • 19 Nov 2018 • Haoran You, Yu Cheng, Tianheng Cheng, ChunLiang Li, Pan Zhou
We evaluate the proposed Bayesian CycleGAN on multiple benchmark datasets, including Cityscapes, Maps, and Monet2photo.
no code implementations • 13 Nov 2018 • Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie
End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 13 Nov 2018 • Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia
In this paper, we propose a novel multimodal attention based method for audio-visual speech recognition which could automatically learn the fused representation from both modalities based on their importance.
Audio-Visual Speech Recognition
Robust Speech Recognition
no code implementations • 13 Nov 2018 • Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu
In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
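Global soft attention, as mentioned above, lets each decoder step attend over all encoder positions. A minimal sketch follows; dot-product scoring is an assumption here (attention-based ASR models also use additive or location-aware scores).

```python
import math
from typing import List

Vector = List[float]


def global_soft_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Attend over every encoder position: softmax(q . k_j) weights, then a weighted sum."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Toy decoder query against three encoder states; states 0 and 2 match the query.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
context = global_soft_attention([1.0, 0.0], keys=enc, values=enc)
print(context)  # about [0.845, 0.155]
```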
no code implementations • 7 Oct 2018 • Lixing Chen, Jie Xu, Shaolei Ren, Pan Zhou
To solve this problem and optimize the edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm.
no code implementations • 16 Jul 2018 • Xinxing Su, Yingtian Zou, Yu Cheng, Shuangjie Xu, Mo Yu, Pan Zhou
We present a novel method, the Spatial-Temporal Synergic Residual Network (STSRN), for this problem.
no code implementations • CVPR 2018 • Pan Zhou, Yunqing Hou, Jiashi Feng
To solve this issue, we propose a novel deep adversarial subspace clustering (DASC) model, which learns more favorable sample representations by deep learning for subspace clustering, and more importantly introduces adversarial learning to supervise sample representation learning and subspace clustering.
Ranked #2 on Image Clustering on coil-40
no code implementations • ICML 2018 • Pan Zhou, Jiashi Feng
Besides, we prove that for an arbitrary gradient descent algorithm, the computed approximate stationary point by minimizing empirical risk is also an approximate stationary point to the population risk.
2 code implementations • 5 Apr 2018 • Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou
The results show that our framework can smoothly synthesize pedestrians on varied background images with different levels of detail.
Ranked #2 on Scene Text Recognition on MSDA
no code implementations • ICLR 2018 • Pan Zhou, Jiashi Feng
This work aims to provide comprehensive landscape analysis of empirical risk in deep neural networks (DNNs), including the convergence behavior of its gradient, its stationary points and the empirical risk itself to their corresponding population counterparts, which reveals how various network parameters determine the convergence performance.
no code implementations • 23 Oct 2017 • Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang
Methods of parameter pruning and quantization are described first; the other techniques are introduced after that.
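As a small illustration of the quantization family the survey covers, the sketch below applies per-tensor affine 8-bit quantization, which stores each weight in 1 byte instead of 4. This is one common variant chosen for illustration, not a method prescribed by the survey.

```python
from typing import List, Tuple


def quantize_uint8(x: List[float]) -> Tuple[List[int], float, int]:
    """Per-tensor affine quantization of floats to integer codes in [0, 255]."""
    lo, hi = min(min(x), 0.0), max(max(x), 0.0)  # keep 0.0 inside the range
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = [max(0, min(255, int(round(v / scale)) + zero_point)) for v in x]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 8-bit codes back to approximate float values."""
    return [(c - zero_point) * scale for c in q]


weights = [-0.62, 0.0, 0.13, 0.48, 1.05, -0.27]
codes, scale, zp = quantize_uint8(weights)
approx = dequantize(codes, scale, zp)
print(codes, zp)
print(max(abs(a - w) for a, w in zip(approx, weights)))  # on the order of one step (scale)
```

Choosing the zero point so that 0.0 is exactly representable keeps zero weights (for example, pruned ones) exactly zero after quantization.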
1 code implementation • ICCV 2017 • Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou
Person Re-Identification (person re-id) is a crucial task due to its applications in visual surveillance and human-computer interaction.
no code implementations • CVPR 2017 • Pan Zhou, Jiashi Feng
Low-rank tensor analysis is important for various real applications in computer vision.
no code implementations • 19 May 2017 • Pan Zhou, Jiashi Feng
For an $l$-layer linear neural network, we prove its empirical risk uniformly converges to its population risk at the rate of $\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})$, where $n$ is the training sample size, $d$ is the total weight dimension, and $r$ is the magnitude bound on the weights of each layer.
no code implementations • 11 Oct 2016 • Yifan Hou, Pan Zhou, Ting Wang, Li Yu, Yuchong Hu, Dapeng Wu
In this respect, the key challenge is how to realize personalized course recommendation as well as to reduce the computing and storage costs for the tremendous course data.
no code implementations • 21 Feb 2016 • Chencheng Li, Pan Zhou, Yingxue Zhou, Kaigui Bian, Tao Jiang, Susanto Rahardja
An increasing number of people participate in social networks, and massive amounts of online social data are being collected.
no code implementations • 1 Sep 2015 • Pan Zhou, Yingxue Zhou, Dapeng Wu, Hai Jin
In addition, none of them has considered both the privacy of users' contexts (e.g., social status, age and hobbies) and video service vendors' repositories, which are extremely sensitive and of significant commercial value.
no code implementations • 25 May 2015 • Chencheng Li, Pan Zhou
Thus, we use differential privacy to preserve the privacy of learners, and study the influence of guaranteeing differential privacy on the utility of the distributed online learning algorithm.
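The standard way to obtain epsilon-differential privacy for a numeric quantity is the Laplace mechanism, sketched below. Which quantity the paper privatizes (for example, gradients or model parameters) and how its sensitivity is bounded are not stated in the snippet, so the bounded-mean example is an assumption.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two Exponential(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_release(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: adding Laplace(sensitivity / epsilon) noise gives epsilon-DP."""
    return value + laplace_noise(sensitivity / epsilon, rng)


# Example: privately release the mean of n values known to lie in [0, 1].
# Replacing one value shifts the mean by at most 1/n, so the sensitivity is 1/n.
rng = random.Random(0)
data = [0.2, 0.9, 0.4, 0.7, 0.1, 0.6, 0.8, 0.3, 0.5, 0.5]
true_mean = sum(data) / len(data)
print(true_mean, private_release(true_mean, sensitivity=1 / len(data), epsilon=1.0, rng=rng))
```

A smaller epsilon (stronger privacy) means a larger noise scale and therefore lower utility, which is the trade-off the entry refers to.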