Cross-modal Representation Learning for Zero-shot Action Recognition

no code implementations • 3 May 2022 • Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.

Action Recognition Representation Learning +1

An Intriguing Property of Geophysics Inversion

no code implementations • 28 Apr 2022 • Yinan Feng, Yinpeng Chen, Shihang Feng, Peng Jin, Zicheng Liu, Youzuo Lin

In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels.


ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

no code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Yong Jae Lee, Houdong Hu, Zicheng Liu, Jianfeng Gao

A variety of evaluation metrics are used, including sample-efficiency (zero-shot and few-shot) and parameter-efficiency (linear probing and full model fine-tuning).

Fairness Image Classification +1

MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images

no code implementations • 31 Mar 2022 • Xiangjun Gao, Jiaolong Yang, Jongyoo Kim, Sida Peng, Zicheng Liu, Xin Tong

For this task, we propose a simple yet effective method to train a generalizable NeRF with multiview images as conditional input.

Novel View Synthesis

Deep Frequency Filtering for Domain Generalization

no code implementations • 23 Mar 2022 • Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

Improving the generalization capability of Deep Neural Networks (DNNs) is critical for their practical use and has been a longstanding challenge.

Domain Generalization

Decoupled Mixup for Data-efficient Learning

1 code implementation • 21 Mar 2022 • Zicheng Liu, Siyuan Li, Ge Wang, Cheng Tan, Lirong Wu, Stan Z. Li

Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.

Data Augmentation Semi-Supervised Image Classification
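
The mixing step that the abstract above refers to can be illustrated with a minimal sketch of standard (vanilla) mixup, not the decoupled variant proposed in the paper. The function name `mixup`, the `alpha` default, and the toy inputs are assumptions made for illustration only.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two samples and their one-hot labels with one shared weight."""
    # lam ~ Beta(alpha, alpha); alpha=1.0 is an illustrative default,
    # not a value taken from the paper.
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Two toy samples from different classes (hypothetical data).
x_mix, y_mix, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Because the inputs and the labels are interpolated with the same weight, the mixed label stays a valid probability distribution, which is what softens the decision boundary between the two classes.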

The Overlooked Classifier in Human-Object Interaction Recognition

no code implementations • 10 Mar 2022 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu

Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image.

Classification Human-Object Interaction Detection +2

Exploring Multi-physics with Extremely Weak Supervision

no code implementations • 3 Feb 2022 • Shihang Feng, Peng Jin, Yinpeng Chen, Xitong Zhang, Zicheng Liu, Youzuo Lin

Our results show that we are able to invert for properties without explicit governing equations.


SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

no code implementations • 25 Jan 2022 • Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality representations.

Question Answering Visual Question Answering +1

Improving Vision Transformers for Incremental Learning

no code implementations • 12 Dec 2021 • Pei Yu, Yinpeng Chen, Ying Jin, Zicheng Liu

This paper proposes a working recipe of using Vision Transformer (ViT) in class incremental learning.

class-incremental learning Incremental Learning

Injecting Semantic Concepts into End-to-End Image Captioning

no code implementations • 9 Dec 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed as ViTCAP, in which grid representations are used without extracting the regional features.

Image Captioning

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang

Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?

Language Modelling Visual Question Answering +1

Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup

1 code implementation • 30 Nov 2021 • Siyuan Li, Zicheng Liu, Di Wu, Zihan Liu, Stan Z. Li

Mixup is a popular data-dependent augmentation technique for deep neural networks, which contains two sub-tasks, mixup generation and classification.

Data Augmentation Image Classification +2

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation • 25 Nov 2021 • Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering).

Frame Question Answering +3

Scaling Up Vision-Language Pre-training for Image Captioning

no code implementations • 24 Nov 2021 • Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.

Image Captioning

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation • 24 Nov 2021 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Frame Question Answering +4

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

no code implementations • 23 Nov 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose UNICORN, a vision-language (VL) model that unifies text generation and bounding box prediction into a single architecture.

Image Captioning Language Modelling +5

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Action Classification Action Recognition +11

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

no code implementations • 19 Nov 2021 • JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.

Image Captioning Language Modelling +6

Physics-guided Loss Functions Improve Deep Learning Performance in Inverse Scattering

no code implementations • 13 Nov 2021 • Zicheng Liu, Mayank Roy, Dilip K. Prasad, Krishna Agarwal

Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost.

Unsupervised Learning of Full-Waveform Inversion: Connecting CNN and Partial Differential Equation in a Loop

no code implementations • ICLR 2022 • Peng Jin, Xitong Zhang, Yinpeng Chen, Sharon Xiaolei Huang, Zicheng Liu, Youzuo Lin

In particular, we use finite difference to approximate the forward modeling of PDE as a differentiable operator (from velocity map to seismic data) and model its inversion by CNN (from seismic data to velocity map).
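
As a rough illustration of the forward direction described above (velocity map to seismic data), the following is a minimal sketch of an explicit second-order finite-difference scheme for the 1D wave equation. It is not the paper's implementation: the function names, grid spacing, time step, impulse source, and receiver position are all assumptions chosen for illustration, and the paper's differentiable, higher-dimensional operator is not reproduced here.

```python
def wave_step(u_prev, u_curr, velocity, dt, dx):
    """Advance the 1D wave field by one time step with central differences."""
    n = len(u_curr)
    u_next = [0.0] * n  # end points stay at zero (fixed boundaries)
    for i in range(1, n - 1):
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        courant_sq = (velocity[i] * dt / dx) ** 2
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + courant_sq * laplacian
    return u_next


def simulate(velocity, steps, dt=0.001, dx=10.0, receiver_index=1):
    """Map a velocity profile to a recorded trace (toy 'seismic data')."""
    n = len(velocity)
    u_prev = [0.0] * n
    u_curr = [0.0] * n
    u_curr[n // 2] = 1.0  # hypothetical impulse source at the grid center
    trace = []
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, velocity, dt, dx)
        trace.append(u_curr[receiver_index])
    return trace


# Homogeneous 2000 m/s medium; velocity * dt / dx = 0.2 keeps the scheme stable.
trace = simulate([2000.0] * 21, steps=50)
```

Every operation in `wave_step` is an addition or multiplication, which is why such a scheme can in principle be expressed in an autodiff framework and differentiated with respect to the velocity values.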


Improving Discriminative Visual Representation Learning via Automatic Mixup

no code implementations • 29 Sep 2021 • Siyuan Li, Zicheng Liu, Di Wu, Stan Z. Li

In this paper, we decompose mixup into two sub-tasks of mixup generation and classification and formulate it for discriminative representations as class- and instance-level mixup.

Data Augmentation Representation Learning

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

no code implementations • 10 Sep 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.

Image Captioning Question Answering +2

Mobile-Former: Bridging MobileNet and Transformer

2 code implementations • 12 Aug 2021 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu

This structure leverages the advantages of MobileNet at local processing and transformer at global interaction.

Object Detection

MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).

Probabilistic Model Distillation for Semantic Correspondence

1 code implementation • CVPR 2021 • Xin Li, Deng-Ping Fan, Fan Yang, Ao Luo, Hong Cheng, Zicheng Liu

We address this problem with the use of a novel Probabilistic Model Distillation (PMD) approach which transfers knowledge learned by a probabilistic teacher model on synthetic data to a static student model with the use of unlabeled real image pairs.

Representation Learning Semantic correspondence

A data-based comparative review and AI-driven symbolic model for longitudinal dispersion coefficient in natural streams

no code implementations • 17 Jun 2021 • Yifeng Zhao, Zicheng Liu, Pei Zhang, S. A. Galindo-Torres, Stan Z. Li

Whereas implicit ML-driven methods are black-boxes in nature, explicit ML-driven methods have more potential in prediction of LDC.

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.

Image Captioning Knowledge Distillation +3

Mesh Graphormer

1 code implementation • ICCV 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu

We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.

3D Hand Pose Estimation 3D Human Pose Estimation

TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

no code implementations • 1 Apr 2021 • Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu

TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs.

Ranked #2 on Multi-Object Tracking on MOT16 (using extra training data)

Multi-Object Tracking Multiple Object Tracking +1

Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.

Disentanglement Domain Adaptive Person Re-Identification +1

Unveiling the Power of Mixup for Stronger Classifiers

1 code implementation • 24 Mar 2021 • Zicheng Liu, Siyuan Li, Di Wu, Zihan Liu, ZhiYuan Chen, Lirong Wu, Stan Z. Li

Specifically, AutoMix reformulates the mixup classification into two sub-tasks (i.e., mixed sample generation and mixup classification) with corresponding sub-networks and solves them in a bi-level optimization framework.

Classification Data Augmentation +3

Revisiting Dynamic Convolution via Matrix Decomposition

1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.

Dimensionality Reduction
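
The K-fold growth in weights noted above follows from how dynamic convolution is usually described: K static kernels are combined by attention weights into a single kernel. The sketch below illustrates that aggregation under stated assumptions; the flattened 3x3 kernels, the uniform attention scores, and all names are hypothetical, and the matrix-decomposition method proposed in the paper is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, scores):
    """Return sum_k pi_k * W_k, where pi = softmax(scores)."""
    attention = softmax(scores)
    size = len(kernels[0])
    return [sum(p * k[i] for p, k in zip(attention, kernels)) for i in range(size)]


K = 4                                         # number of static kernels (assumed)
kernels = [[float(k)] * 9 for k in range(K)]  # toy flattened 3x3 kernels
scores = [0.0] * K                            # uniform here; input-dependent in practice
mixed = aggregate_kernels(kernels, scores)

static_params = 9                             # one 3x3 kernel
dynamic_params = K * static_params            # K kernels must be stored
```

The aggregated kernel has the same shape as a single static kernel, so the extra cost sits in storage (K sets of weights) rather than in the convolution itself.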

Stronger NAS with Weaker Predictors

1 code implementation • NeurIPS 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.

Neural Architecture Search

Weak NAS Predictor Is All You Need

no code implementations • 1 Jan 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.

Neural Architecture Search

3D Human motion anticipation and classification

no code implementations • 31 Dec 2020 • Emad Barsoum, John Kender, Zicheng Liu

Our model learns to predict multiple future sequences of human poses from the same input sequence.

Action Recognition Classification +5

MiniVLM: A Smaller and Faster Vision-Language Model

no code implementations • 13 Dec 2020 • JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by 95%, compared to a baseline model.

Language Modelling

MicroNet: Towards Image Recognition with Extremely Low FLOPs

no code implementations • 24 Nov 2020 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e.g., 6 MFLOPs on ImageNet classification).

Semantic Change Detection with Asymmetric Siamese Networks

no code implementations • 12 Oct 2020 • Kunping Yang, Gui-Song Xia, Zicheng Liu, Bo Du, Wen Yang, Marcello Pelillo, Liangpei Zhang

Given two multi-temporal aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries.

Change Detection

Deep Clustering and Representation Learning that Preserves Geometric Structures

no code implementations • 28 Sep 2020 • Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan Z. Li

To overcome the problem that clustering-oriented losses may deteriorate the geometric structure of embeddings in the latent space, an isometric loss is proposed for preserving intra-manifold structure locally and a ranking loss for inter-manifold structure globally.

Deep Clustering Representation Learning

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Image Captioning TAG

Generalized Clustering and Multi-Manifold Learning with Geometric Structure Preservation

1 code implementation • 21 Sep 2020 • Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan Z. Li

Though manifold-based clustering has become a popular research topic, we observe that one important factor has been omitted by these works, namely that the defined clustering loss may corrupt the local and global structure of the latent space.

Deep Clustering Representation Learning

Dynamic ReLU

2 code implementations • ECCV 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Rectified linear units (ReLU) are commonly used in deep neural networks.

Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes

no code implementations • 28 Feb 2020 • Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun

Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.

Dynamic Convolution: Attention over Convolution Kernels

4 code implementations • CVPR 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability.

Image Classification Keypoint Detection

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

3 code implementations • 11 Jul 2019 • Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun

On the other hand, if part labels are also available in the real-images during training, our method outperforms the supervised state-of-the-art methods by a large margin.

Ranked #1 on Human Part Segmentation on PASCAL-Part (using extra training data)

Domain Adaptation Human Part Segmentation +2

Large Scale Incremental Learning

2 code implementations • CVPR 2019 • Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Yun Fu

We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes.

Incremental Learning

Rethinking Classification and Localization for Object Detection

2 code implementations • CVPR 2020 • Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, Yun Fu

Two head structures (i.e., fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks.

Classification General Classification +1

Incremental Classifier Learning with Generative Adversarial Networks

no code implementations • 2 Feb 2018 • Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Zhengyou Zhang, Yun Fu

To address these problems, we propose (a) a new loss function to combine the cross-entropy loss and distillation loss, (b) a simple way to estimate and remove the imbalance between the old and new classes, and (c) using Generative Adversarial Networks (GANs) to generate historical data and select representative exemplars during generation.

General Classification
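
One common way to combine a cross-entropy loss with a distillation loss in incremental learning is a weighted sum, sketched below. This is an assumed formulation for illustration, not necessarily the loss defined in the paper: the `weight` and `temperature` parameters, the function names, and the toy logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy against a single integer class label."""
    return -math.log(softmax(logits)[label])


def distillation(new_logits, old_logits, temperature=2.0):
    """Soft-target cross-entropy between old-model and new-model outputs."""
    teacher = softmax(old_logits, temperature)
    student = softmax(new_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def combined_loss(new_logits, old_logits, label, weight=0.5):
    """Weighted sum; distillation only covers the classes the old model knew."""
    n_old = len(old_logits)
    ce = cross_entropy(new_logits, label)
    kd = distillation(new_logits[:n_old], old_logits)
    return weight * kd + (1.0 - weight) * ce


# Old model knew 2 classes; the new model adds a third (toy logits).
loss = combined_loss([2.0, 0.5, 1.0], [1.5, 0.2], label=2)
```

The distillation term keeps the new model's outputs on old classes close to the old model's, while the cross-entropy term fits the labels of the current data.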

HP-GAN: Probabilistic 3D human motion prediction via GAN

3 code implementations • 27 Nov 2017 • Emad Barsoum, John Kender, Zicheng Liu

Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses.

Autonomous Vehicles Human motion prediction +4

A Tube-and-Droplet-based Approach for Representing and Analyzing Motion Trajectories

no code implementations • 10 Sep 2016 • Weiyao Lin, Yang Zhou, Hongteng Xu, Junchi Yan, Mingliang Xu, Jianxin Wu, Zicheng Liu

Our approach first leverages the complete information from given trajectories to construct a thermal transfer field which provides a context-rich way to describe the global motion pattern in a scene.

3D Action Recognition Anomaly Detection

HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences

no code implementations • CVPR 2013 • Omar Oreifej, Zicheng Liu

In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates.

Activity Recognition

Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

no code implementations • CVPR 2013 • Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen

Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels.

Semantic Segmentation Superpixels

Tensor-Based Human Body Modeling

no code implementations • CVPR 2013 • Yinpeng Chen, Zicheng Liu, Zhengyou Zhang

In this paper, we present a novel approach to model 3D human body with variations on both human shape and pose, by exploring a tensor decomposition technique.

3D Reconstruction Tensor Decomposition

