Search Results for author: Xiaolong Wang

Found 126 papers, 38 papers with code

Test-Time Training for Generalization under Distribution Shifts

no code implementations ICML 2020 Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, Moritz Hardt

We introduce a general approach, called test-time training, for improving the performance of predictive models when training and test data come from different distributions.

Image Classification Self-Supervised Learning

CATNet: Cross-event Attention-based Time-aware Network for Medical Event Prediction

no code implementations 29 Apr 2022 Sicen Liu, Xiaolong Wang, Yang Xiang, Hui Xu, Hui Wang, Buzhou Tang

It is a time-aware, event-aware and task-adaptive method with the following advantages: 1) modeling heterogeneous information and temporal information in a unified way and considering temporal irregular characteristics locally and globally respectively, 2) taking full advantage of correlations among different types of events via cross-event attention.

Time Series

From One Hand to Multiple Hands: Imitation Learning for Dexterous Manipulation from Single-Camera Teleoperation

no code implementations 26 Apr 2022 Yuzhe Qin, Hao Su, Xiaolong Wang

We propose to perform imitation learning for dexterous manipulation with a multi-finger robot hand from human demonstrations, and transfer the policy to the real robot hand.

Imitation Learning

GIFS: Neural Implicit Function for General Shape Representation

no code implementations 14 Apr 2022 Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang

This limitation leads to tedious data processing (converting non-watertight raw data to watertight) as well as the incapability of representing general object shapes in the real world.

3D Shape Reconstruction

Learning Generalizable Dexterous Manipulation from Human Grasp Affordance

no code implementations 5 Apr 2022 Yueh-Hua Wu, Jiashun Wang, Xiaolong Wang

In this paper, we propose to learn dexterous manipulation using large-scale demonstrations with diverse 3D objects in a category, which are generated from a human grasp affordance model.

Imitation Learning Representation Learning

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos

no code implementations 4 Apr 2022 Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang

To tackle this task, we first provide an automatic way to collect trajectory and hotspots labels on large-scale data.

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

no code implementations 30 Mar 2022 Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation.

Disentanglement Frame

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image

no code implementations 17 Mar 2022 Xuanchi Ren, Xiaolong Wang

Novel view synthesis from a single image has recently attracted a lot of attention, and it has been primarily advanced by 3D deep learning and rendering techniques.

Frame Novel View Synthesis

Temporal Difference Learning for Model Predictive Control

1 code implementation 9 Mar 2022 Nicklas Hansen, Xiaolong Wang, Hao Su

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases.

Continuous Control

GroupViT: Semantic Segmentation Emerges from Text Supervision

1 code implementation 22 Feb 2022 Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.

Object Detection Scene Understanding +2

Multimodal data matters: language model pre-training over structured and unstructured electronic health records

1 code implementation 25 Jan 2022 Sicen Liu, Xiaolong Wang, Yongshuai Hou, Ge Li, Hui Wang, Hui Xu, Yang Xiang, Buzhou Tang

The massive amount of electronic health records (EHRs) has created enormous potential for improving healthcare, among which clinical codes (structured data) and clinical narratives (unstructured data) are two important textual modalities.

Decision Making Language Modelling

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

no code implementations 19 Jan 2022 Rishabh Jangir, Nicklas Hansen, Sambaran Ghosal, Mohit Jain, Xiaolong Wang

We propose a setting for robotic manipulation in which the agent receives visual feedback from both a third-person camera and an egocentric camera mounted on the robot's wrist.

NovelD: A Simple yet Effective Exploration Criterion

1 code implementation NeurIPS 2021 Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

We analyze NovelD thoroughly in MiniGrid and find that empirically it helps the agent explore the environment more uniformly with a focus on exploring beyond the boundary.

Efficient Exploration Montezuma's Revenge +1

Learning Continuous Environment Fields via Implicit Functions

no code implementations ICLR 2022 Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

We propose a novel scene representation that encodes reaching distance -- the distance from any position in the scene to a goal along a feasible trajectory.

Trajectory Prediction

Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild

no code implementations 24 Nov 2021 Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang

Tracking and reconstructing 3D objects from cluttered scenes are key components of computer vision, robotics and autonomous driving systems.

3D Shape Reconstruction Autonomous Driving +1

Multi-Person 3D Motion Prediction with Multi-Range Transformers

no code implementations NeurIPS 2021 Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang

Thus, instead of predicting each human pose trajectory in isolation, we introduce a Multi-Range Transformers model which consists of a local-range encoder for individual motion and a global-range encoder for social interactions.

motion prediction Trajectory Prediction

Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization

no code implementations 29 Sep 2021 Chieko Sarah Imai, Minghao Zhang, Yuchen Zhang, Marcin Kierebinski, Ruihan Yang, Yuzhe Qin, Xiaolong Wang

While Reinforcement Learning (RL) provides a promising paradigm for agile locomotion skills with vision inputs in simulation, it is still very challenging to deploy the RL policy in the real world.

DexMV: Imitation Learning for Dexterous Manipulation from Human Videos

no code implementations 12 Aug 2021 Yuzhe Qin, Yueh-Hua Wu, Shaowei Liu, Hanwen Jiang, Ruihan Yang, Yang Fu, Xiaolong Wang

We design a platform with: (i) a simulation system for complex dexterous manipulation tasks with a multi-finger robot hand and (ii) a computer vision system to record large-scale demonstrations of a human hand conducting the same tasks.

Imitation Learning motion retargeting +1

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

no code implementations ICLR 2022 Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, Xiaolong Wang

Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead.

Sentence-level Online Handwritten Chinese Character Recognition

no code implementations 4 Jul 2021 Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding

Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.

Word Embeddings

GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph

no code implementations 1 Jul 2021 Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma

Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

2 code implementations NeurIPS 2021 Nicklas Hansen, Hao Su, Xiaolong Wang

Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals.

Data Augmentation Q-Learning

Single RGB-D Camera Teleoperation for General Robotic Manipulation

no code implementations 28 Jun 2021 Quan Vuong, Yuzhe Qin, Runlin Guo, Xiaolong Wang, Hao Su, Henrik Christensen

We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device.

Frame

DAIR: Disentangled Attention Intrinsic Regularization for Safe and Efficient Bimanual Manipulation

no code implementations 10 Jun 2021 Minghao Zhang, Pingcheng Jian, Yi Wu, Huazhe Xu, Xiaolong Wang

We address the problem of safely solving complex bimanual robot manipulation tasks with sparse rewards.

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

no code implementations CVPR 2021 Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang

Estimating 3D hand and object pose from a single image is an extremely challenging problem: hands and objects are often self-occluded during interactions, and the 3D annotations are scarce as even humans cannot directly label the ground-truths from a single image perfectly.

Hand Pose Estimation

Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency

no code implementations ICCV 2021 Haiping Wu, Xiaolong Wang

In this paper, we propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.

Action Recognition Contrastive Learning +3

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation ICCV 2021 Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Frame Out-of-Distribution Generalization +1

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

1 code implementation ICCV 2021 Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

To deal with the large shape variance, we introduce Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, where we have separate codes for encoding shape and articulation.

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

no code implementations ICCV 2021 Hanwen Jiang, Shaowei Liu, Jiashun Wang, Xiaolong Wang

Based on the hand-object contact consistency, we design novel objectives in training the human grasp generation model and also a new self-supervised task which allows the grasp generation network to be adjusted even during test time.

Grasp Generation

Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective

5 code implementations ICCV 2021 Jiarui Xu, Xiaolong Wang

To learn generalizable representations for correspondence at large scale, a variety of self-supervised pretext tasks have been proposed to explicitly perform object-level or patch-level similarity learning.

Contrastive Learning Frame +4

Region Similarity Representation Learning

1 code implementation ICCV 2021 Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object Detection +3

Solving Compositional Reinforcement Learning Problems via Task Reduction

1 code implementation ICLR 2021 Yunfei Li, Yilin Wu, Huazhe Xu, Xiaolong Wang, Yi Wu

We propose a novel learning paradigm, Self-Imitation via Reduction (SIR), for solving compositional reinforcement learning problems.

Continuous Control reinforcement-learning

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

2 code implementations ICLR 2021 Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Du, Yu Wang, Yi Wu

We propose a simple, general and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games.

BeBold: Exploration Beyond the Boundary of Explored Regions

3 code implementations 15 Dec 2020 Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).

Efficient Exploration NetHack +1

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

1 code implementation CVPR 2021 Jiashun Wang, Huazhe Xu, Jingwei Xu, Sifei Liu, Xiaolong Wang

Synthesizing 3D human motion plays an important role in many graphics applications as well as understanding human activity.

motion synthesis

Online Adaptation for Consistent Mesh Reconstruction in the Wild

no code implementations NeurIPS 2020 Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.

3D Reconstruction Frame

MedWriter: Knowledge-Aware Medical Text Generation

no code implementations COLING 2020 Youcheng Pan, Qingcai Chen, Weihua Peng, Xiaolong Wang, Baotian Hu, Xin Liu, Junying Chen, Wenxiu Zhou

Exploiting domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially for highly specialized domains such as medicine.

Text Generation

Generalization in Reinforcement Learning by Soft Data Augmentation

1 code implementation 26 Nov 2020 Nicklas Hansen, Xiaolong Wang

Extensive efforts have been made to improve the generalization ability of Reinforcement Learning (RL) methods via domain randomization and data augmentation.

Data Augmentation reinforcement-learning

Multi-Agent Collaboration via Reward Attribution Decomposition

2 code implementations 16 Oct 2020 Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian

In this work, we propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge and supports ad hoc team play.

Dota 2 Multi-agent Reinforcement Learning +2

Reducing Class Collapse in Metric Learning with Easy Positive Sampling

no code implementations 28 Sep 2020 Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning

Hierarchical Style-based Networks for Motion Synthesis

no code implementations ECCV 2020 Jingwei Xu, Huazhe Xu, Bingbing Ni, Xiaokang Yang, Xiaolong Wang, Trevor Darrell

Generating diverse and natural human motion is one of the long-standing goals for creating intelligent characters in the animated world.

motion synthesis

Fast ORB-SLAM without Keypoint Descriptors

no code implementations 22 Aug 2020 Qiang Fu, Hongshan Yu, Xiaolong Wang, Zhengeng Yang, Hong Zhang, Ajmal Mian

ORB-SLAM2 is a benchmark method in this domain; however, it consumes significant time for computing descriptors that never get reused unless a frame is selected as a keyframe.

Robotics Computational Geometry I.4.0; I.4.9

What Should Not Be Contrastive in Contrastive Learning

no code implementations ICLR 2021 Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.

Contrastive Learning

Self-Supervised Policy Adaptation during Deployment

1 code implementation ICLR 2021 Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal.

Deep Isometric Learning for Visual Recognition

1 code implementation ICML 2020 Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.

Compositional Video Synthesis with Action Graphs

1 code implementation 27 Jun 2020 Amir Bar, Roei Herzig, Xiaolong Wang, Anna Rohrbach, Gal Chechik, Trevor Darrell, Amir Globerson

Our generative model for this task (AG2Vid) disentangles motion and appearance features, and by incorporating a scheduling mechanism for actions facilitates timely and coordinated video generation.

Video Generation Video Prediction +1

Rethinking preventing class-collapsing in metric learning with margin-based losses

no code implementations ICCV 2021 Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning

Class-Aware Domain Adaptation for Improving Adversarial Robustness

no code implementations 10 May 2020 Xianxu Hou, Jingxin Liu, Bolei Xu, Xiaolong Wang, Bozhi Liu, Guoping Qiu

To improve the adversarial robustness of neural networks, adversarial training has been proposed to train networks by injecting adversarial examples into the training data.

Adversarial Attack Adversarial Defense +2

State-Only Imitation Learning for Dexterous Manipulation

no code implementations 7 Apr 2020 Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations.

Imitation Learning reinforcement-learning

Multi-Task Reinforcement Learning with Soft Modularization

1 code implementation NeurIPS 2020 Ruihan Yang, Huazhe Xu, Yi Wu, Xiaolong Wang

While training multiple tasks jointly allows the policies to share parameters across different tasks, the optimization problem becomes non-trivial: it remains unclear what parameters in the network should be reused across tasks, and how the gradients from different tasks may interfere with each other.

Meta-Learning Multi-Task Learning +1

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

1 code implementation ICLR 2020 Qian Long, Zihan Zhou, Abhinav Gupta, Fei Fang, Yi Wu, Xiaolong Wang

In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large.

Multi-agent Reinforcement Learning reinforcement-learning

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

3 code implementations ICCV 2021 Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The edge between these two lines of work remains underexplored, and the effectiveness of meta-learning in few-shot learning remains unclear.

Classification Few-Shot Learning +1

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation CVPR 2020 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition

A Deep Learning-Based System for PharmaCoNER

no code implementations WS 2019 Ying Xiong, Yedan Shen, Yuanhang Huang, Shuai Chen, Buzhou Tang, Xiaolong Wang, Qingcai Chen, Jun Yan, Yi Zhou

The Biological Text Mining Unit at BSC and CNIO organized the first shared task on chemical & drug mention recognition from Spanish medical texts, called PharmaCoNER (Pharmacological Substances, Compounds and proteins and Named Entity Recognition track), in 2019, which includes two tracks: one for NER offset and entity classification (track 1) and the other one for concept indexing (track 2).

General Classification Named Entity Recognition +1

Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

3 code implementations 29 Sep 2019 Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt

In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions.

CARLA MAP Leaderboard Image Classification +3

Joint-task Self-supervised Learning for Temporal Correspondence

2 code implementations NeurIPS 2019 Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames.

Frame Object Tracking +3

Is the Label Trustful: Training Better Deep Learning Model via Uncertainty Mining Net

no code implementations 25 Sep 2019 Yang Sun, Abhishek Kolagunda, Steven Eliuk, Xiaolong Wang

During the training stage, we utilize all the available data (labeled and unlabeled) to train the classifier via a semi-supervised generative framework.

Test-Time Training for Out-of-Distribution Generalization

no code implementations 25 Sep 2019 Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt

We introduce a general approach, called test-time training, for improving the performance of predictive models when test and training data come from different distributions.

Image Classification Out-of-Distribution Generalization +1

Accelerating Proposal Generation Network for Fast Face Detection on Mobile Devices

no code implementations 27 Apr 2019 Heming Zhang, Xiaolong Wang, Jingwen Zhu, C.-C. Jay Kuo

In this work, we present a proposal generation acceleration framework for real-time face detection.

Face Detection

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

no code implementations CVPR 2019 Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

In order to predict valid affordances and learn possible 3D human poses in indoor scenes, we need to understand the semantic and geometric structure of a scene as well as its potential interactions with a human.

MICIK: MIning Cross-Layer Inherent Similarity Knowledge for Deep Model Compression

no code implementations 3 Feb 2019 Jie Zhang, Xiaolong Wang, Dawei Li, Shalini Ghosh, Abhishek Kolagunda, Yalin Wang

State-of-the-art deep model compression methods exploit the low-rank approximation and sparsity pruning to remove redundant parameters from a learned hidden layer.

Knowledge Distillation Model Compression

Spatio-Temporal Action Graph Networks

1 code implementation 4 Dec 2018 Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +2

Visual Semantic Navigation using Scene Priors

1 code implementation ICLR 2019 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over years to efficiently search and navigate?

reinforcement-learning

Interpretable Intuitive Physics Model

no code implementations ECCV 2018 Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta

In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, spheres etc.)

Videos as Space-Time Region Graphs

no code implementations ECCV 2018 Xiaolong Wang, Abhinav Gupta

These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects.

Ranked #27 on Action Classification on Charades (using extra training data)

Action Classification Action Recognition

LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics

no code implementations NAACL 2018 Zhen Xu, Nan Jiang, Bingquan Liu, Wenge Rong, Bowen Wu, Baoxun Wang, Zhuoran Wang, Xiaolong Wang

The experimental results have shown that our proposed corpus can be taken as a new benchmark dataset for the NRG task, and the presented metrics are promising to guide the optimization of NRG models by quantifying the diversity of the generated responses reasonably.

Machine Translation Response Generation

Talking Face Generation by Conditional Recurrent Adversarial Network

1 code implementation 13 Apr 2018 Yang Song, Jingwen Zhu, Dawei Li, Xiaolong Wang, Hairong Qi

Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate a talking face video with accurate lip synchronization while maintaining smooth transition of both lip and facial movement over the entire video clip.

Constrained Lip-synchronization Video Generation

Binge Watching: Scaling Affordance Learning from Sitcoms

no code implementations CVPR 2017 Xiaolong Wang, Rohit Girdhar, Abhinav Gupta

In this paper, we tackle the challenge of creating one of the biggest datasets for learning affordances.

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations CVPR 2018 Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild.

Monocular 3D Human Pose Estimation

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

3 code implementations CVPR 2018 Xiaolong Wang, Yufei Ye, Abhinav Gupta

Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing visual category).

Knowledge Graphs Zero-Shot Learning

Boundary-sensitive Network for Portrait Segmentation

no code implementations 22 Dec 2017 Xianzhi Du, Xiaolong Wang, Dawei Li, Jingwen Zhu, Serafettin Tasci, Cameron Upright, Stephen Walsh, Larry Davis

Compared to the general semantic segmentation problem, portrait segmentation has a higher precision requirement in the boundary area.

Portrait Segmentation Semantic Segmentation

Non-local Neural Networks

25 code implementations CVPR 2018 Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Ranked #7 on Action Classification on Toyota Smarthome dataset (using extra training data)

Action Classification Action Recognition +3

Neural Response Generation via GAN with an Approximate Embedding Layer

no code implementations EMNLP 2017 Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang, Zhuoran Wang, Chao Qi

This paper presents a Generative Adversarial Network (GAN) to model single-turn short-text conversations, which trains a sequence-to-sequence (Seq2Seq) network for response generation simultaneously with a discriminative classifier that measures the differences between human-produced responses and machine-generated ones.

Machine Translation Response Generation

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

no code implementations 16 Aug 2017 Dawei Li, Xiaolong Wang, Deguang Kong

As observed in the experiment, DeepRebirth achieves more than 3x speed-up and 2.5x run-time memory saving on GoogLeNet with only 0.4% drop of top-5 accuracy on ImageNet.

Model Compression

Transitive Invariance for Self-supervised Visual Representation Learning

no code implementations ICCV 2017 Xiaolong Wang, Kaiming He, Abhinav Gupta

The objects are connected by two types of edges which correspond to two types of invariance: "different instances but a similar viewpoint and category" and "different viewpoints of the same instance".

Multi-Task Learning Object Detection +2

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

no code implementations ICCV 2017 Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-Yan Yeung, Abhinav Gupta

A common issue, however, is that objects of interest that are not involved in human actions are often absent from global action descriptions, a problem known as "missing label".

Object Recognition Video Object Detection +1

Incorporating Label Dependency for Answer Quality Tagging in Community Question Answering via CNN-LSTM-CRF

1 code implementation COLING 2016 Yang Xiang, Xiaoqiang Zhou, Qingcai Chen, Zhihui Zheng, Buzhou Tang, Xiaolong Wang, Yang Qin

In community question answering (cQA), the quality of answers is determined by the matching degree between question-answer pairs and the correlation among the answers.

Community Question Answering

Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM

1 code implementation 17 May 2016 Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang

Modeling human conversations is essential for building satisfying chatbots with multi-turn dialog ability.

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

no code implementations 6 Apr 2016 Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.

Action Recognition

Generative Image Modeling using Style and Structure Adversarial Networks

no code implementations 17 Mar 2016 Xiaolong Wang, Abhinav Gupta

Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution.

Image Generation

Actions ~ Transformations

1 code implementation CVPR 2016 Xiaolong Wang, Ali Farhadi, Abhinav Gupta

In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).

Action Recognition

Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering

no code implementations IJCNLP 2015 Xiaoqiang Zhou, Baotian Hu, Qingcai Chen, Buzhou Tang, Xiaolong Wang

In this paper, the answer selection problem in community question answering (CQA) is regarded as an answer sequence labeling task, and a novel approach is proposed based on the recurrent architecture for this problem.

Answer Selection Community Question Answering

In Defense of the Direct Perception of Affordances

no code implementations 5 May 2015 David F. Fouhey, Xiaolong Wang, Abhinav Gupta

The field of functional recognition or affordance estimation from images has seen a revival in recent years.

Learning Contour-Fragment-based Shape Model with And-Or Tree Representation

no code implementations 3 Feb 2015 Liang Lin, Xiaolong Wang, Wei Yang, Jian-Huang Lai

This paper proposes a simple yet effective method to learn the hierarchical object shape model consisting of local contour fragments, which represents a category of shapes in the form of an And-Or tree.

Edge Detection

Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

no code implementations NeurIPS 2012 Xiaolong Wang, Liang Lin

A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined with optimizing the multi-layer parameters during the iteration.

Graph Learning

Deep Joint Task Learning for Generic Object Extraction

no code implementations NeurIPS 2014 Xiaolong Wang, Liliang Zhang, Liang Lin, Zhujin Liang, WangMeng Zuo

We present a general joint task learning framework, in which each task (either object localization or object segmentation) is tackled via a multi-layer convolutional neural network, and the two networks work collaboratively to boost performance.

Object Localization Semantic Segmentation

Discriminatively Trained And-Or Graph Models for Object Shape Detection

no code implementations 2 Feb 2015 Liang Lin, Xiaolong Wang, Wei Yang, Jian-Huang Lai

In this paper, we investigate a novel reconfigurable part-based model, namely And-Or graph model, to recognize object shapes in images.

Object Detection

An Expressive Deep Model for Human Action Parsing from A Single Image

no code implementations 2 Feb 2015 Zhujin Liang, Xiaolong Wang, Rui Huang, Liang Lin

This paper aims at one newly raising task in vision and multimedia research: recognizing human actions from still images.

Action Parsing Action Understanding +1

3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks

no code implementations 26 Jan 2015 Keze Wang, Xiaolong Wang, Liang Lin, Meng Wang, WangMeng Zuo

Our model thus advances existing approaches in two aspects: (i) it acts directly on the raw inputs (grayscale-depth data) to conduct recognition instead of relying on hand-crafted features, and (ii) the model structure can be dynamically adjusted accounting for the temporal variations of human activities, i.e., the network configuration is allowed to be partially activated during inference.

Activity Recognition

Designing Deep Networks for Surface Normal Estimation

no code implementations CVPR 2015 Xiaolong Wang, David F. Fouhey, Abhinav Gupta

We show by incorporating several constraints (man-made, manhattan world) and meaningful intermediate representations (room layout, edge labels) in the architecture leads to state of the art performance on surface normal estimation.

Scene Understanding

Radical-Enhanced Chinese Character Embedding

no code implementations 18 Apr 2014 Yaming Sun, Lei Lin, Duyu Tang, Nan Yang, Zhenzhou Ji, Xiaolong Wang

We present a method to leverage radical for learning Chinese character embedding.

Chinese Word Segmentation
