1 code implementation • ECCV 2020 • Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia
Techniques for manipulating images are advancing rapidly; while these are helpful for many useful tasks, they also pose a threat to society through their ability to create believable misinformation.
Ranked #7 on Image Manipulation Detection on CocoGlide
no code implementations • 17 Jun 2024 • Xuefeng Hu, Ke Zhang, Min Sun, Albert Chen, Cheng-Hao Kuo, Ram Nevatia
Large-scale pretrained vision-language models like CLIP have demonstrated remarkable zero-shot image classification capabilities across diverse domains.
1 code implementation • 2 Apr 2024 • Wanrong Zheng, Haidong Zhu, Zhaoheng Zheng, Ram Nevatia
We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods without extra annotations.
Ranked #2 on Multiview Gait Recognition on CASIA-B
no code implementations • CVPR 2024 • Haidong Zhu, Pranav Budhwant, Zhaoheng Zheng, Ram Nevatia
When recognizing an individual's identity, existing methods primarily rely on appearance, which can be influenced by the background environment due to a lack of body-shape awareness.
1 code implementation • CVPR 2024 • Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu, Haidong Zhu, Ram Nevatia
Thus, we propose LLaMP, Large Language Models as Prompt learners, which produces adaptive prompts for the CLIP text encoder, establishing the LLM as the connecting bridge.
1 code implementation • 27 Nov 2023 • Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang
Generalizability and few-shot learning are key challenges in Neural Radiance Fields (NeRF), often due to the lack of a holistic understanding in pixel-level rendering.
no code implementations • 24 Oct 2023 • Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia
PSE encodes the body shape via binarized silhouettes, skeleton motions, and 3-D body shape, while AAE provides two levels of temporal appearance feature aggregation: attention-based feature aggregation and averaging aggregation.
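The two temporal aggregation levels AAE describes can be sketched in numpy (the function names and the learned query vector are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(frame_feats, query):
    """Attention-based aggregation: weight each per-frame feature
    by its similarity to a (hypothetical) learned query vector."""
    scores = frame_feats @ query          # (T,) similarity per frame
    weights = softmax(scores)             # attention distribution over frames
    return weights @ frame_feats          # (D,) weighted temporal sum

def average_aggregate(frame_feats):
    """Averaging aggregation: a uniform temporal mean."""
    return frame_feats.mean(axis=0)

T, D = 8, 4
rng = np.random.default_rng(0)
feats = rng.normal(size=(T, D))           # T per-frame appearance features
query = rng.normal(size=D)

att = attention_aggregate(feats, query)
avg = average_aggregate(feats)
```

Attention pooling lets informative frames dominate the clip-level feature, while the plain average provides a stable baseline view of the same sequence.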
1 code implementation • 4 Aug 2023 • Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia
Large-scale pre-trained vision-language models such as CLIP have demonstrated outstanding performance in zero-shot classification, e.g., achieving 76.3% top-1 accuracy on ImageNet without seeing any examples, which brings potential benefits to many tasks that have no labeled data.
2 code implementations • 26 May 2023 • Zhaoheng Zheng, Haidong Zhu, Ram Nevatia
In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which is to recognize novel attribute-object combinations with pre-existing concepts.
1 code implementation • 16 Apr 2023 • Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia
Two common modalities used for representing the walking sequence of a person are silhouettes and joint skeletons.
Ranked #5 on Multiview Gait Recognition on CASIA-B
1 code implementation • 16 Apr 2023 • Haidong Zhu, Zhaoheng Zheng, Wanrong Zheng, Ram Nevatia
This paper addresses the problem of human rendering in videos with temporal appearance constancy.
2 code implementations • 21 Mar 2023 • Zhuoming Liu, Xuefeng Hu, Ram Nevatia
We propose a new setting for detecting unseen objects called Zero-shot Annotation object Detection (ZAD).
no code implementations • 18 Dec 2022 • Haidong Zhu, Zhaoheng Zheng, Ram Nevatia
Gait recognition, which identifies individuals based on their walking patterns, is an important biometric technique since it can be observed from a distance and does not require the subject's cooperation.
no code implementations • 5 Jul 2022 • Ke Xu, Yao Xiao, Zhaoheng Zheng, Kaijie Cai, Ram Nevatia
Despite the diversity in attack patterns, adversarial patches tend to be highly textured and different in appearance from natural images.
no code implementations • 21 Oct 2021 • Xuefeng Hu, Gokhan Uzunbas, Sirius Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim
We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.
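A minimal numpy sketch of the idea — re-estimate batch-norm statistics from the current test batch and blend them with the stored source statistics (the blending rule and the `momentum` value are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def batchnorm_infer(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch-norm inference using the supplied statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def adapt_bn_stats(source_mean, source_var, test_batch, momentum=0.1):
    """Blend stored source-domain statistics with statistics estimated
    from the test batch, so normalization better matches the target data."""
    test_mean = test_batch.mean(axis=0)
    test_var = test_batch.var(axis=0)
    new_mean = (1 - momentum) * source_mean + momentum * test_mean
    new_var = (1 - momentum) * source_var + momentum * test_var
    return new_mean, new_var

rng = np.random.default_rng(1)
source_mean, source_var = np.zeros(3), np.ones(3)
test_batch = rng.normal(loc=2.0, size=(64, 3))  # target data with shifted mean
m, v = adapt_bn_stats(source_mean, source_var, test_batch)
out = batchnorm_infer(test_batch, m, v)
```

The appeal of this family of methods is that only the normalization statistics change at test time; no weights are updated and no target labels are needed.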
no code implementations • 29 Sep 2021 • Zhengyu Yang, Zijian Hu, Xuefeng Hu, Ram Nevatia
With both entropy and rank maximization, our method surpasses the state-of-the-art on CIFAR-10 and Mini-ImageNet under the standard linear evaluation protocol.
no code implementations • 29 Sep 2021 • Xuefeng Hu, Mustafa Uzunbas, Bor-Chun Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim
We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.
no code implementations • 25 Aug 2021 • Zhaoheng Zheng, Arka Sadhu, Ram Nevatia
We explore object detection with two attributes: color and material.
no code implementations • NAACL 2021 • Arka Sadhu, Kan Chen, Ram Nevatia
Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases.
1 code implementation • CVPR 2021 • Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi
We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.
1 code implementation • CVPR 2021 • Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia
Combining the Pair Loss with the techniques developed by the MixMatch family, our proposed SimPLE algorithm shows significant performance gains over previous algorithms on CIFAR-100 and Mini-ImageNet, and is on par with the state-of-the-art methods on CIFAR-10 and SVHN.
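An illustrative numpy version of a pair loss in this spirit (the thresholds, similarity measure, and loss form here are assumptions for the sketch, not the paper's exact definition):

```python
import numpy as np

def pair_loss(probs, conf_thresh=0.95, sim_thresh=0.9):
    """For each ordered pair of unlabeled predictions: if the anchor is
    confident and the pair is already similar enough, penalize the
    remaining disagreement, pushing similar samples together."""
    n = len(probs)
    loss, count = 0.0, 0
    for i in range(n):
        if probs[i].max() < conf_thresh:   # anchor must be confident
            continue
        for j in range(n):
            if i == j:
                continue
            sim = float(probs[i] @ probs[j])  # inner-product similarity
            if sim > sim_thresh:
                loss += 1.0 - sim             # push the pair closer
                count += 1
    return loss / max(count, 1)
```

Pairs that are dissimilar or whose anchor is uncertain contribute nothing, so the loss only sharpens agreement where the model is already confident.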
no code implementations • 5 Nov 2020 • Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia
The annotated language queries available during training are limited, which also limits the variations of language combinations that a model can see during training.
no code implementations • 1 Sep 2020 • Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia
We present a novel framework, Spatial Pyramid Attention Network (SPAN) for detection and localization of multiple types of image manipulations.
2 code implementations • EMNLP 2020 • Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren
To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes.
no code implementations • 17 Apr 2020 • Chuanzi He, Haidong Zhu, Jiyang Gao, Kan Chen, Ram Nevatia
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>.
1 code implementation • CVPR 2020 • Arka Sadhu, Kan Chen, Ram Nevatia
We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions.
1 code implementation • ECCV 2020 • Yueqi Duan, Haidong Zhu, He Wang, Li Yi, Ram Nevatia, Leonidas J. Guibas
When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions.
1 code implementation • ICCV 2019 • Arka Sadhu, Kan Chen, Ram Nevatia
A phrase grounding system localizes a particular object in an image referred to by a natural language query.
no code implementations • 24 Jul 2019 • Feng-Ju Chang, Xiang Yu, Ram Nevatia, Manmohan Chandraker
We address the challenging problem of generating facial attributes using a single image in an unconstrained pose.
no code implementations • CVPR 2019 • Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan
Weakly supervised object detection aims at reducing the amount of supervision required to train detection models.
Ranked #1 on Weakly Supervised Object Detection on Charades
no code implementations • 7 Dec 2018 • Rama Kovvuri, Ram Nevatia
Phrase Grounding aims to detect and localize objects in images that are referred to by natural language phrases.
no code implementations • ICCV 2019 • Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia
Compared to standard Faster R-CNN, it has three highlights: an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and improve mining precision; masking the negative-sample loss in the box predictor to avoid harm from false-negative labels; and training the box-regression head only on seed annotations to eliminate harm from the inaccurate boundaries of mined bounding boxes.
3 code implementations • 21 Nov 2018 • Runzhou Ge, Jiyang Gao, Kan Chen, Ram Nevatia
Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, an approach that ignores rich semantic cues about activities in the videos and queries.
1 code implementation • 14 Oct 2018 • Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia, Alan Yuille
Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods.
1 code implementation • ECCV 2018 • Jiyang Gao, Kan Chen, Ram Nevatia
Temporal action proposal generation is an important task; akin to object proposals, temporal action proposals are intended to capture "clips" or temporal intervals in videos that are likely to contain an action.
Ranked #10 on Temporal Action Proposal Generation on ActivityNet-1.3
no code implementations • 27 Jun 2018 • Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia
The four types of information, i.e., 2D flow, camera pose, segmentation masks, and depth maps, are integrated into a differentiable holistic 3D motion parser (HMP), which recovers per-pixel 3D motion for both the rigid background and moving objects.
8 code implementations • 5 May 2018 • Jiyang Gao, Ram Nevatia
Although many methods for temporal modeling have been proposed, it is hard to compare them directly, because the choices of feature extractor and loss function also have a large impact on the final performance.
no code implementations • CVPR 2018 • Jiyang Gao, Runzhou Ge, Kan Chen, Ram Nevatia
Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network to generate multi-level contextual facts; (3) a dynamic fact ensemble method to construct temporal representation dynamically for different questions.
Ranked #31 on Visual Question Answering (VQA) on MSRVTT-QA
1 code implementation • CVPR 2018 • Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia
In our framework, the predicted depths, normals and edges are forced to be consistent all the time.
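The depth-normal part of such a consistency constraint can be sketched by deriving surface normals from a depth map via finite differences — the quantity that predicted normals would be compared against (an orthographic-camera simplification; a perspective model would additionally use camera intrinsics):

```python
import numpy as np

def normals_from_depth(depth):
    """Finite-difference surface normals from a depth map.
    For each pixel, the local depth gradients define a tangent plane;
    the (normalized) normal is (-dz/dx, -dz/dy, 1)."""
    dzdx = np.gradient(depth, axis=1)   # horizontal depth gradient
    dzdy = np.gradient(depth, axis=0)   # vertical depth gradient
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

A consistency loss would then penalize the angular difference between these depth-derived normals and the network's predicted normals, with the penalty relaxed across predicted edges.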
no code implementations • CVPR 2018 • Kan Chen, Jiyang Gao, Ram Nevatia
In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding.
1 code implementation • 2 Feb 2018 • Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni
Our ExpNet CNN is applied directly to the intensities of a face image and regresses a 29D vector of 3D expression coefficients.
Ranked #1 on 3D Facial Expression Recognition on 2017_test set (using extra training data)
no code implementations • 21 Nov 2017 • Jiyang Gao, Zijian Guo, Zhen Li, Ram Nevatia
To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into one single model (one student network) to classify 100K object categories.
5 code implementations • 24 Aug 2017 • Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni
Instead, we compare our FPN with existing methods by evaluating how they affect face recognition accuracy on the IJB-A and IJB-B benchmarks: using the same recognition pipeline, but varying the face alignment method.
Ranked #1 on Facial Landmark Detection on 300W (Mean Error Rate metric)
no code implementations • ICCV 2017 • Kan Chen, Rama Kovvuri, Ram Nevatia
Given a textual description of an image, phrase grounding localizes the objects in the image referred to by the query phrases in the description.
no code implementations • 31 Jul 2017 • Zhenheng Yang, Jiyang Gao, Ram Nevatia
In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos.
1 code implementation • 16 Jul 2017 • Jiyang Gao, Zhenheng Yang, Ram Nevatia
RED takes multiple history representations as input and learns to anticipate a sequence of future representations.
12 code implementations • ICCV 2017 • Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia
For evaluation, we adopt TaCoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA.
no code implementations • 2 May 2017 • Jiyang Gao, Zhenheng Yang, Ram Nevatia
CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows.
Ranked #6 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)
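The boundary-refinement step can be sketched in a few lines of numpy (the offset parameterization below — a center shift plus a log-length scale — is a common detection-style choice used here for illustration, not necessarily CBR's exact formulation):

```python
import numpy as np

def refine_window(window, offsets):
    """Apply regressed (shift, scale) offsets to a sliding window's
    temporal coordinates: shift the center proportionally to the window
    length, and rescale the length via an exponentiated offset."""
    start, end = window
    center = (start + end) / 2.0
    length = end - start
    new_center = center + offsets[0] * length   # length-normalized shift
    new_length = length * np.exp(offsets[1])    # log-length rescale
    return (new_center - new_length / 2.0, new_center + new_length / 2.0)
```

With zero offsets the window is unchanged; the regressor only has to learn small corrections around each sliding-window hypothesis, which is what makes cascaded refinement effective.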
2 code implementations • CVPR 2017 • Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia
According to the intent of the query, an attention mechanism can be introduced to adaptively balance the importance of the different modalities.
1 code implementation • ICCV 2017 • Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia
Temporal Action Proposal (TAP) generation is an important problem, as the fast and accurate extraction of semantically important segments (e.g., human actions) from untrimmed videos is a key step for large-scale video analysis.
Ranked #8 on Action Recognition on THUMOS’14
no code implementations • 12 Sep 2016 • Zhenheng Yang, Ram Nevatia
The number of proposals is decreased after each level, and the areas of regions are decreased to more precisely fit the face.
no code implementations • 8 Sep 2016 • Jiyang Gao, Ram Nevatia
Moreover, the action categories in such datasets are pre-defined and the vocabularies are fixed.
no code implementations • 16 Apr 2016 • Jiyang Gao, Chen Sun, Ram Nevatia
It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images.
no code implementations • 23 Mar 2016 • Wael Abd-Almageed, Yue Wu, Stephen Rawls, Shai Harel, Tal Hassner, Iacopo Masi, Jongmoo Choi, Jatuporn Toy Leksut, Jungyeon Kim, Prem Natarajan, Ram Nevatia, Gerard Medioni
In our representation, a face image is processed by several pose-specific deep convolutional neural network (CNN) models to generate multiple pose-specific features.
Ranked #14 on Face Verification on IJB-A
no code implementations • 18 Nov 2015 • Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia
ABC-CNN determines an attention map for an image-question pair by convolving the image feature map with configurable convolutional kernels derived from the question's semantics.
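This question-configured convolution can be sketched with a 1x1-kernel simplification in numpy (the projection matrix `W`, the softmax normalization, and all shapes are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def question_configured_attention(feat_map, q_embed, W):
    """Derive a channel-space kernel from the question embedding via a
    (hypothetical) learned projection W, take per-pixel dot products with
    the image feature map — a 1x1 convolution — and normalize the scores
    into a spatial attention map."""
    kernel = W @ q_embed                        # (C,) question-derived kernel
    scores = np.einsum('hwc,c->hw', feat_map, kernel)
    e = np.exp(scores - scores.max())           # softmax over all locations
    return e / e.sum()
```

Because the kernel is computed from the question, the same image yields different attention maps for different questions, which is the core of the mechanism.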
no code implementations • CVPR 2016 • Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia, Lubomir Bourdev
This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations.
Ranked #5 on Weakly Supervised Object Detection on MS COCO
no code implementations • ICCV 2015 • Chen Sun, Chuang Gan, Ram Nevatia
Humans connect language and vision to perceive the world.
1 code implementation • 4 Apr 2015 • Chen Sun, Sanketh Shetty, Rahul Sukthankar, Ram Nevatia
To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.
no code implementations • CVPR 2014 • Chen Sun, Ram Nevatia
Our goal is to find the important segments and capture their information for event classification and recounting.
no code implementations • CVPR 2013 • Pramod Sharma, Ram Nevatia
In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets.