Search Results for author: Keze Wang

Found 40 papers, 16 papers with code

Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions

no code implementations EMNLP 2021 Arjun Akula, Spandana Gella, Keze Wang, Song-Chun Zhu, Siva Reddy

Our model outperforms the state-of-the-art NMN model on the CLEVR-Ref+ dataset with +8.1% improvement in accuracy on the single-referent test set and +4.3% on the full test set.

On Training Data Influence of GPT Models

1 code implementation 11 Apr 2024 Qingyi Liu, Yekun Chai, Shuohuan Wang, Yu Sun, Qiwei Peng, Keze Wang, Hua Wu

This paper presents GPTfluence, a novel approach that leverages a featurized simulation to assess the impact of training examples on the training dynamics of GPT models.

Natural Language Understanding

NeRF-VPT: Learning Novel View Representations with Neural Radiance Fields via View Prompt Tuning

1 code implementation 2 Mar 2024 Linsheng Chen, Guangrun Wang, Liuchun Yuan, Keze Wang, Ken Deng, Philip H. S. Torr

Furthermore, the cascading learning of NeRF-VPT introduces adaptability to scenarios with sparse inputs, resulting in a significant enhancement of accuracy for sparse-view novel view synthesis.

Novel View Synthesis

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

1 code implementation 12 Jan 2024 Xingyu Zhou, Leheng Zhang, Xiaorui Zhao, Keze Wang, Leida Li, Shuhang Gu

The core of MIA-VSR is leveraging feature-level temporal continuity between adjacent frames to reduce redundant computations and make more rational use of previously enhanced SR features.

Video Super-Resolution

Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation

no code implementations 18 Dec 2023 Hui Fu, Zeqing Wang, Ke Gong, Keze Wang, Tianshui Chen, Haojie Li, Haifeng Zeng, Wenxiong Kang

Moreover, to facilitate disentangled representation learning, we introduce four well-designed constraints: an auxiliary style classifier, an auxiliary inverse classifier, a content contrastive loss, and a pair of latent cycle losses, which can effectively contribute to the construction of the identity-related style space and semantic-related content space.

Disentanglement

SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting

1 code implementation 16 Nov 2023 Hefeng Wu, Yandong Chen, Lingbo Liu, Tianshui Chen, Keze Wang, Liang Lin

In the localization stage, the Scale-aware Multi-head Localization (SAML) module utilizes the query tensor to predict the confidence, location, and size of each potential object.

A Continual Learning Paradigm for Non-differentiable Visual Programming Frameworks on Visual Reasoning Tasks

no code implementations 18 Sep 2023 Wentao Wan, Nan Kang, Zeqing Wang, Zhuojie Yang, Liang Lin, Keze Wang

Specifically, our CLVP distills the capabilities of well-trained task-specific models into the visual sub-modules in a stepwise and anti-forgetting manner.

Continual Learning Visual Reasoning

TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

2 code implementations 7 Dec 2021 Yang Liu, Keze Wang, Lingbo Liu, Haoyuan Lan, Liang Lin

To overcome these limitations, we take advantage of the multi-scale temporal dependencies within videos and propose a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL), which jointly models the inter-snippet and intra-snippet temporal dependencies for temporal representation learning with a hybrid graph contrastive learning strategy.

Action Recognition Contrastive Learning +5

Solving Inefficiency of Self-supervised Representation Learning

1 code implementation ICCV 2021 Guangrun Wang, Keze Wang, Guangcong Wang, Philip H. S. Torr, Liang Lin

In this paper, we reveal two contradictory phenomena in contrastive learning that we call under-clustering and over-clustering problems, which are major obstacles to learning efficiency.

Clustering Contrastive Learning +4

Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval

no code implementations 4 Jan 2021 Yang Liu, Keze Wang, Haoyuan Lan, Liang Lin

To model multi-scale temporal dependencies, our TCGL integrates the prior knowledge about the frame and snippet orders into graph structures, i.e., the intra-/inter-snippet temporal contrastive graphs.

Action Recognition Contrastive Learning +5

Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering

no code implementations ICCV 2021 Qingxing Cao, Wentao Wan, Keze Wang, Xiaodan Liang, Liang Lin

The experimental results show that our proposed method can improve current VQA models on OOD split without losing performance on the in-domain test data.

Novel Concepts Question Answering +1

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

1 code implementation 14 Dec 2020 Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Specifically, we generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs to disentangle the knowledge from other biases.

Question Answering Visual Question Answering

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

1 code implementation 30 Nov 2020 Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it's still challenging to apply it to real-world tasks, due to the poor sample efficiency.

Continuous Control Reinforcement Learning (RL)

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

1 code implementation 1 Sep 2020 Yang Liu, Keze Wang, Guanbin Li, Liang Lin

In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors.

Action Recognition Image Generation +3

Linguistically Driven Graph Capsule Network for Visual Question Reasoning

no code implementations 23 Mar 2020 Qingxing Cao, Xiaodan Liang, Keze Wang, Liang Lin

Inspired by the property of a capsule network that can carve a tree structure inside a regular convolutional neural network (CNN), we propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network", where the compositional process is guided by the linguistic parse tree.

Question Answering Visual Question Answering

Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis

1 code implementation 14 Mar 2020 Junfan Lin, Keze Wang, Ziliang Chen, Xiaodan Liang, Liang Lin

To eliminate this bias and inspired by the propensity score matching technique with causal diagram, we propose a propensity-based patient simulator to effectively answer unrecorded inquiry by drawing knowledge from the other records; Bias (ii) inherently comes along with the passively collected data, and is one of the key obstacles for training the agent towards "learning how" rather than "remembering what".

Medical Diagnosis

Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning

no code implementations 4 May 2019 Yukai Shi, Guanbin Li, Qingxing Cao, Keze Wang, Liang Lin

Face hallucination is a domain-specific super-resolution problem that aims to generate a high-resolution (HR) face image from a low-resolution (LR) input.

Face Hallucination Hallucination +3

Adaptively Connected Neural Networks

1 code implementation CVPR 2019 Guangrun Wang, Keze Wang, Liang Lin

This paper presents a novel adaptively connected neural network (ACNet) to improve the traditional convolutional neural networks (CNNs) in two aspects.

Document Classification Image Classification +1

3D Human Pose Machines with Self-supervised Learning

2 code implementations arXiv.org 2019 Keze Wang, Liang Lin, Chenhan Jiang, Chen Qian, Pengxu Wei

Driven by recent computer vision and robotic applications, recovering 3D human poses has become increasingly important and attracted growing interest.

3D Human Pose Estimation Self-Supervised Learning

Teaching to Teach by Structured Dark Knowledge

no code implementations 27 Sep 2018 Ziliang Chen, Keze Wang, Liang Lin

We evaluate T2T across different learners, teachers, and tasks, which significantly demonstrates that structured knowledge can be inherited by the teachers to further benefit learners' training.

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria

1 code implementation 30 Jun 2018 Keze Wang, Liang Lin, Xiaopeng Yan, Ziliang Chen, Dongyu Zhang, Lei Zhang

The proposed process can be compatible with mini-batch based training (i.e., using a batch of unlabeled or partially labeled data as a one-time input) for object detection.

Active Learning object-detection +2

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

no code implementations CVPR 2018 Keze Wang, Xiaopeng Yan, Dongyu Zhang, Lei Zhang, Liang Lin

Though quite challenging, leveraging large-scale unlabeled or partially labeled images in a cost-effective way has attracted increasing interest for its great importance to computer vision.

Active Learning Object +2

Recurrent 3D Pose Sequence Machines

no code implementations CVPR 2017 Mude Lin, Liang Lin, Xiaodan Liang, Keze Wang, Hui Cheng

Recovering 3D articulated human pose from monocular image sequences is very challenging due to diverse appearances, viewpoints, and occlusions, and because human 3D pose is inherently ambiguous in monocular imagery.

3D Human Pose Estimation 3D Pose Estimation

Deep Co-Space: Sample Mining Across Feature Transformation for Semi-Supervised Learning

no code implementations 28 Jul 2017 Ziliang Chen, Keze Wang, Xiao Wang, Pai Peng, Ebroul Izquierdo, Liang Lin

Aiming at improving performance of visual classification in a cost-effective manner, this paper proposes an incremental semi-supervised learning paradigm called Deep Co-Space (DCS).

Classification General Classification +1

Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning

no code implementations 26 Jul 2017 Yukai Shi, Keze Wang, Chongyu Chen, Li Xu, Liang Lin

Single image super resolution (SR), which refers to reconstructing a higher-resolution (HR) image from the observed low-resolution (LR) image, has received substantial attention due to its tremendous application potential.

Computational Efficiency Image Restoration +2

Active Self-Paced Learning for Cost-Effective and Progressive Face Identification

no code implementations 13 Jan 2017 Liang Lin, Keze Wang, Deyu Meng, WangMeng Zuo, Lei Zhang

By naturally combining two recently rising techniques: active learning (AL) and self-paced learning (SPL), our framework is capable of automatically annotating new instances and incorporating them into training under weak expert re-certification.

Active Learning Face Identification

Cost-Effective Active Learning for Deep Image Classification

3 code implementations 13 Jan 2017 Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, Liang Lin

In this paper, we propose a novel active learning framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner.

Active Learning Classification +5

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning

no code implementations 13 Aug 2016 Keze Wang, Shengfu Zhai, Hui Cheng, Xiaodan Liang, Liang Lin

In this paper, we propose a novel inference-embedded multi-task learning framework for predicting human pose from still depth images, which is implemented with a deep architecture of neural networks.

Multi-Task Learning Pose Estimation +1

Local- and Holistic- Structure Preserving Image Super Resolution via Deep Joint Component Learning

no code implementations 25 Jul 2016 Yukai Shi, Keze Wang, Li Xu, Liang Lin

Recently, machine learning based single image super resolution (SR) approaches focus on jointly learning representations for high-resolution (HR) and low-resolution (LR) image patch pairs to improve the quality of the super-resolved images.

Image Super-Resolution Representation Learning

3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks

no code implementations 26 Jan 2015 Keze Wang, Xiaolong Wang, Liang Lin, Meng Wang, WangMeng Zuo

Our model thus advances existing approaches in two aspects: (i) it acts directly on the raw inputs (grayscale-depth data) to conduct recognition instead of relying on hand-crafted features, and (ii) the model structure can be dynamically adjusted accounting for the temporal variations of human activities, i.e., the network configuration is allowed to be partially activated during inference.

Human Activity Recognition
