Search Results for author: Chenliang Xu

Found 67 papers, 28 papers with code

Monocular 3D Object Detection via Feature Domain Adaptation

no code implementations ECCV 2020 Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, Chenliang Xu

Monocular 3D object detection is a challenging task due to unreliable depth, resulting in a distinct performance gap between monocular and LiDAR-based approaches.

Domain Adaptation Foreground Segmentation +2

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations 31 May 2023 Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.

Counterfactual Inference Question Answering +1

Training Neural Networks without Backpropagation: A Deeper Dive into the Likelihood Ratio Method

no code implementations 15 May 2023 Jinyang Jiang, Zeliang Zhang, Chenliang Xu, Zhaofei Yu, Yijie Peng

We develop the likelihood ratio (LR) method, a new gradient estimation method, for training a broad range of neural network architectures, including convolutional neural networks, recurrent neural networks, graph neural networks, and spiking neural networks, without recursive gradient computation.

Egocentric Audio-Visual Object Localization

1 code implementation CVPR 2023 Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can be created when wearers shift their attention.

Object Localization

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

no code implementations 4 Feb 2023 Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Human perception of the complex world relies on a comprehensive analysis of multi-modal signals, and the co-occurrences of audio and video signals provide humans with rich cues.

Audio Generation Data Augmentation

Improving Adversarial Transferability with Scheduled Step Size and Dual Example

no code implementations 30 Jan 2023 Zeliang Zhang, Peihan Liu, Xiaosen Wang, Chenliang Xu

Motivated by this finding, we argue that the information carried by adversarial perturbations near the benign sample, especially their direction, contributes more to transferability.

Adversarial Attack

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

1 code implementation CVPR 2023 Zhiheng Li, Ivan Evtimov, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim

Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others.

Domain Generalization Image Classification +1

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks

1 code implementation 20 Jul 2022 Zhiheng Li, Anthony Hoogs, Chenliang Xu

By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer.

Action Recognition Facial Attribute Classification +1

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis

1 code implementation CVPR 2022 Zhiheng Li, Martin Renqiang Min, Kai Li, Chenliang Xu

Based on the identified latent directions of attributes, we propose Compositional Attribute Adjustment to adjust the latent code, resulting in better compositionality of image synthesis.

Fairness Image Generation +1

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

1 code implementation CVPR 2022 Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin

However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures.


Learning to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation CVPR 2022 Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

audio-visual learning Audio-visual Question Answering +4

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

no code implementations 18 Jan 2022 Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo

In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.

Knowledge Distillation

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing

no code implementations CVPR 2022 Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu

Recently, large pretrained models (e.g., BERT, StyleGAN, CLIP) show great knowledge transfer and generalization capability on various downstream tasks within their domains.

Image-to-Image Translation Retrieval +1

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing

no code implementations 30 Nov 2021 Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu

Recently, large pretrained models (e.g., BERT, StyleGAN, CLIP) have shown great knowledge transfer and generalization capability on various downstream tasks within their domains.

Image-to-Image Translation Retrieval +1

Space-Time Memory Network for Sounding Object Localization in Videos

no code implementations 10 Nov 2021 Sizhe Li, Yapeng Tian, Chenliang Xu

Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects.

Object Localization

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

no code implementations ICCV 2021 Jing Bi, Jiebo Luo, Chenliang Xu

In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos.

Action Recognition Bayesian Inference +1

rQdia: Regularizing Q-Value Distributions With Image Augmentation

no code implementations 29 Sep 2021 Samuel Lerman, Jing Bi, Chenliang Xu

rQdia (pronounced “Arcadia”) regularizes Q-value distributions with augmented images in pixel-based deep reinforcement learning.

Continuous Control Image Augmentation +2

Learning to Generate Scene Graph from Natural Language Supervision

1 code implementation ICCV 2021 Yiwu Zhong, Jing Shi, Jianwei Yang, Chenliang Xu, Yin Li

To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.

Graph Generation Scene Graph Generation

Learning by Planning: Language-Guided Global Image Editing

1 code implementation CVPR 2021 Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang Xu

Recently, language-guided global image editing has drawn increasing attention owing to its growing application potential.

Discover the Unknown Biased Attribute of an Image Classifier

1 code implementation ICCV 2021 Zhiheng Li, Chenliang Xu

To help human experts better find the AI algorithms' biases, we study a new problem in this work -- for a classifier that predicts a target attribute of the input image, discover its unknown biased attribute.


High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

no code implementations CVPR 2021 Lele Chen, Chen Cao, Fernando de la Torre, Jason Saragih, Chenliang Xu, Yaser Sheikh

This paper addresses previous limitations by learning a deep lighting model that, in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar.

Vocal Bursts Intensity Prediction

A Simple Baseline for Weakly-Supervised Scene Graph Generation

no code implementations ICCV 2021 Jing Shi, Yiwu Zhong, Ning Xu, Yin Li, Chenliang Xu

We investigate weakly-supervised scene graph generation, which is a challenging task since no correspondence between labels and objects is provided.

Contrastive Learning Graph Generation +2

A Benchmark and Baseline for Language-Driven Image Editing

no code implementations 5 Oct 2020 Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations.

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences

no code implementations 3 Oct 2020 Jing Shi, Jing Bi, Yingru Liu, Chenliang Xu

The marriage of recurrent neural networks and neural ordinary differential equations (ODE-RNN) is effective in modeling irregularly-observed sequences.

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing

1 code implementation ECCV 2020 Yapeng Tian, Dingzeyu Li, Chenliang Xu

In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both.

Multiple Instance Learning

Talking-head Generation with Rhythmic Head Motion

1 code implementation 16 Jul 2020 Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, Chenliang Xu

When people deliver a speech, they naturally move heads, and this rhythmic head motion conveys prosodic information.

Talking Head Generation

What comprises a good talking-head video generation?: A Survey and Benchmark

1 code implementation 7 May 2020 Lele Chen, Guofeng Cui, Ziyi Kou, Haitian Zheng, Chenliang Xu

In this work, we present a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.

Talking Head Generation Video Generation

Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection

no code implementations CVPR 2020 Jie Chen, Zhiheng Li, Jiebo Luo, Chenliang Xu

Instead of blindly trusting quality-inconsistent PAs, WS^2 employs a learning-based selection to select effective PAs and a novel region integrity criterion as a stopping condition for weakly-supervised training.

Action Segmentation Semantic Segmentation +2

Deep Grouping Model for Unified Perceptual Parsing

no code implementations CVPR 2020 Zhiheng Li, Wenxuan Bao, Jiayang Zheng, Chenliang Xu

The perceptual-based grouping process produces a hierarchical and compositional image representation that helps both human and machine vision systems recognize heterogeneous visual concepts.

Image Segmentation Semantic Segmentation

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

3 code implementations CVPR 2020 Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu

Rather than synthesizing missing LR video frames as VFI networks do, we first temporally interpolate LR frame features of the missing LR video frames, capturing local temporal contexts with the proposed feature temporal interpolation network.

Space-time Video Super-resolution Video Frame Interpolation +1

TailorGAN: Making User-Defined Fashion Designs

2 code implementations 17 Jan 2020 Lele Chen, Justin Tian, Guo Li, Cheng-Haw Wu, Erh-Kan King, Kuan-Ting Chen, Shao-Hang Hsieh, Chenliang Xu

To overcome those limitations, we propose a novel self-supervised model to synthesize garment images with disentangled attributes (e.g., collar and sleeves) without paired data.

Deep Audio Prior

1 code implementation 21 Dec 2019 Yapeng Tian, Chenliang Xu, Dingzeyu Li

We are interested in applying deep networks in the absence of a training dataset.

Texture Synthesis

Learning from Interventions using Hierarchical Policies for Safe Learning

no code implementations 4 Dec 2019 Jing Bi, Vikas Dhiman, Tianyou Xiao, Chenliang Xu

The recently proposed Learning from Interventions (LfI) overcomes this limitation by using an expert overseer.

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation

no code implementations 17 Nov 2019 Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, Chenliang Xu

In this paper, we propose a confidence segmentation (ConfSeg) module that builds confidence score for each pixel in CAM without introducing additional hyper-parameters.

Weakly-Supervised Object Localization

Unsupervised Pose Flow Learning for Pose Guided Synthesis

no code implementations 30 Sep 2019 Haitian Zheng, Lele Chen, Chenliang Xu, Jiebo Luo

Pose guided synthesis aims to generate a new image in an arbitrary target pose while preserving the appearance details from the source image.

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

1 code implementation 9 May 2019 Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.

An Attempt towards Interpretable Audio-Visual Video Captioning

no code implementations 7 Dec 2018 Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu

To achieve this, we propose a multimodal convolutional neural network-based audio-visual video captioning framework and introduce a modality-aware module for exploring modality selection during sentence generation.

Audio captioning Audio-Visual Video Captioning +2

TDAN: Temporally Deformable Alignment Network for Video Super-Resolution

2 code implementations 7 Dec 2018 Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu

Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames).

Optical Flow Estimation Video Super-Resolution

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

no code implementations 2 Dec 2018 Shaojie Wang, Wentian Zhao, Ziyi Kou, Chenliang Xu

Furthermore, we study multiple modalities, including descriptions and transcripts, for the purpose of boosting video understanding.

Logical Reasoning Question Answering +1

GAN-EM: GAN based EM learning framework

no code implementations 2 Dec 2018 Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu

To overcome such limitation, we propose a GAN based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity.

Dimensionality Reduction General Classification +1

Navigation by Imitation in a Pedestrian-Rich Environment

no code implementations 1 Nov 2018 Jing Bi, Tianyou Xiao, Qiuyue Sun, Chenliang Xu

Deep neural networks trained on demonstrations of human actions give robots the ability to drive autonomously on the road.

Imitation Learning Navigate

Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment

1 code implementation CVPR 2018 Li Ding, Chenliang Xu

In this work, we address the task of weakly-supervised human action segmentation in long, untrimmed videos.

Action Segmentation

Lip Movements Generation at a Glance

1 code implementation ECCV 2018 Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

In this paper, we consider the following task: given an arbitrary audio speech and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech.

Generating Talking Face Landmarks from Speech

no code implementations 26 Mar 2018 Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan

In this paper, we present a system that can generate landmark points of a talking face from acoustic speech in real time.

MRI Tumor Segmentation with Densely Connected 3D CNN

3 code implementations 18 Jan 2018 Lele Chen, Yue Wu, Adora M. DSouza, Anas Z. Abidin, Axel Wismuller, Chenliang Xu

The major difficulty for our segmentation model comes from the fact that the location, structure, and shape of gliomas vary significantly among different patients.

Tumor Segmentation

Video Action Segmentation with Hybrid Temporal Networks

no code implementations ICLR 2018 Li Ding, Chenliang Xu

Action segmentation as a milestone towards building automatic systems to understand untrimmed videos has received considerable attention in the recent years.

Action Segmentation

Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking

no code implementations CVPR 2017 Yan Yan, Chenliang Xu, Dawen Cai, Jason J. Corso

However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions.

Action Classification Action Segmentation +1

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

no code implementations 22 May 2017 Li Ding, Chenliang Xu

Action segmentation as a milestone towards building automatic systems to understand untrimmed videos has received considerable attention in the recent years.

Action Segmentation

Action Understanding with Multiple Classes of Actors

no code implementations 27 Apr 2017 Chenliang Xu, Caiming Xiong, Jason J. Corso

Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call an actor (a human adult), ignoring the diversity of actions performed by other actors.

Action Recognition Action Segmentation +3

Deep Cross-Modal Audio-Visual Generation

no code implementations 26 Apr 2017 Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, Chenliang Xu

Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments.

Towards Automatic Learning of Procedures from Web Instructional Videos

1 code implementation 28 Mar 2017 Luowei Zhou, Chenliang Xu, Jason J. Corso

To answer this question, we introduce the problem of procedure segmentation: to segment a video procedure into category-independent procedure segments.

Dense Video Captioning Procedure Learning

Watch What You Just Said: Image Captioning with Text-Conditional Attention

1 code implementation 15 Jun 2016 Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso

Attention mechanisms have attracted considerable interest in image captioning due to their powerful performance.

Image Captioning Language Modelling

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

no code implementations 30 Dec 2015 Chenliang Xu, Jason J. Corso

Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis.

Boundary Detection Superpixels

Actor-Action Semantic Segmentation with Grouping Process Models

no code implementations CVPR 2016 Chenliang Xu, Jason J. Corso

Actor-action semantic segmentation is an important step toward advanced video understanding problems: what action is happening, who is performing the action, and where the action is in space-time.

Semantic Segmentation Video Understanding

Can Humans Fly? Action Understanding With Multiple Classes of Actors

no code implementations CVPR 2015 Chenliang Xu, Shao-Hang Hsieh, Caiming Xiong, Jason J. Corso

There is no work we know of on simultaneously inferring actors and actions in video, not to mention a dataset to experiment with.

Action Recognition Action Understanding +2

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation

no code implementations 13 Nov 2013 Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, Jason J. Corso

In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation.

Object Detection
