no code implementations • 5 Dec 2024 • Nefeli Andreou, Varsha Vivek, Ying Wang, Alex Vorobiov, Tiffany Deng, Raja Bala, Larry Davis, Betty Mohler Tesch
In particular, we use BodyMetric to benchmark the ability of text-to-image models to generate realistic human bodies.
no code implementations • 18 Apr 2022 • Hanyu Wang, Kamal Gupta, Larry Davis, Abhinav Shrivastava
We present Neural Space-filling Curves (SFCs), a data-driven approach to infer a context-based scan order for a set of images.
no code implementations • 31 Jan 2022 • Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava
Video compression is a central feature of the modern internet powering technologies from social media to video conferencing.
no code implementations • ICCV 2021 • Soubhik Sanyal, Alex Vorobiov, Timo Bolkart, Matthew Loper, Betty Mohler, Larry Davis, Javier Romero, Michael J. Black
Synthesizing images of a person in novel poses from a single image is a highly ambiguous task.
no code implementations • 20 May 2021 • Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas
Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to their capability of learning fine-grained relevance across different modalities.
no code implementations • 16 May 2021 • Arthita Ghosh, Max Ehrlich, Larry Davis, Rama Chellappa
Urban material recognition in remote sensing imagery is a highly relevant, yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low-resolution satellite images.
1 code implementation • ICCV 2021 • Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis
In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.
1 code implementation • 24 Apr 2021 • Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha
We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids.
Ranked #1 on 3D Object Detection on KITTI Cyclist Moderate val
no code implementations • CVPR 2021 • Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry Davis, Heng Wang
The standard way of training video models entails sampling at each iteration a single clip from a video and optimizing the clip prediction with respect to the video-level label.
1 code implementation • ICCV 2021 • Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, Mario Fritz
Lastly, we study different attention architectures in the discriminator, and propose a reference attention mechanism.
1 code implementation • CVPR 2021 • Ahmed Taha, Abhinav Shrivastava, Larry Davis
We evaluate KE using relatively small datasets (e.g., CUB-200) and randomly initialized deep networks.
1 code implementation • 4 Mar 2021 • Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis
The SVMax regularizer supports both supervised and unsupervised learning.
1 code implementation • ICLR 2022 • Ning Yu, Vladislav Skripniuk, Dingfan Chen, Larry Davis, Mario Fritz
Over the past years, deep generative models have achieved a new level of performance.
1 code implementation • CVPR 2021 • Sharath Girish, Shishira R. Maiya, Kamal Gupta, Hao Chen, Larry Davis, Abhinav Shrivastava
The recently proposed Lottery Ticket Hypothesis (LTH) states that deep neural networks trained on large datasets contain smaller subnetworks that achieve performance on par with the dense networks.
no code implementations • 17 Nov 2020 • Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava
We show that there is a significant penalty on common performance metrics for high compression.
no code implementations • 20 Jul 2020 • Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz
Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and become more discriminative for action recognition.
1 code implementation • 19 Jul 2020 • Rohun Tripathi, Vasu Singla, Mahyar Najibi, Bharat Singh, Abhishek Sharma, Larry Davis
The widely adopted sequential variant of Non-Maximum Suppression (or Greedy-NMS) is a crucial module for object-detection pipelines.
2 code implementations • ECCV 2020 • Ahmed Taha, Xitong Yang, Abhinav Shrivastava, Larry Davis
Compared to classification networks, attention visualization for retrieval networks is hardly studied.
2 code implementations • ICCV 2021 • Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava
Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.
1 code implementation • ECCV 2020 • Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava
The JPEG image compression algorithm is the most popular method of image compression because it can achieve large compression ratios.
1 code implementation • ECCV 2020 • Ning Yu, Ke Li, Peng Zhou, Jitendra Malik, Larry Davis, Mario Fritz
Generative Adversarial Networks (GANs) have brought about rapid progress towards generating photorealistic images.
no code implementations • IJCNLP 2019 • Mingfei Gao, Larry Davis, Richard Socher, Caiming Xiong
We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.
2 code implementations • ECCV 2020 • Zuxuan Wu, Ser-Nam Lim, Larry Davis, Tom Goldstein
We present a systematic study of adversarial attacks on state-of-the-art object detection frameworks.
no code implementations • 27 Sep 2019 • Xiaonan Zhao, Huan Qi, Rui Luo, Larry Davis
We address the problem of distance metric learning in visual similarity search, defined as learning an image embedding model which projects images into Euclidean space where semantically and visually similar images are closer and dissimilar images are further from one another.
no code implementations • 25 Sep 2019 • Moustafa Meshry, Yixuan Ren, Ricardo Martin-Brualla, Larry Davis, Abhinav Shrivastava
Then we train a generator to transform an input image along with a style-code to the output domain.
1 code implementation • CVPR 2019 • Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz
In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos.
Ranked #7 on Action Detection on UCF101-24
no code implementations • 7 Feb 2019 • Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis
We introduce an unsupervised formulation to estimate heteroscedastic uncertainty in retrieval systems.
1 code implementation • 24 Jan 2019 • Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis
We employ triplet loss as a feature embedding regularizer to boost classification performance.
no code implementations • 23 Jan 2019 • Ahmed Taha, Yi-Ting Chen, Xitong Yang, Teruhisa Misu, Larry Davis
We cast visual retrieval as a regression problem by posing triplet loss as a regression loss.
1 code implementation • ICCV 2019 • Max Ehrlich, Larry Davis
We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input.
2 code implementations • ICCV 2019 • Ning Yu, Larry Davis, Mario Fritz
Our experiments show that (1) GANs carry distinct model fingerprints and leave stable fingerprints in their generated images, which support image attribution; (2) even minor differences in GAN training can result in different fingerprints, which enables fine-grained model authentication; (3) fingerprints persist across different image frequencies and patches and are not biased by GAN artifacts; (4) fingerprint finetuning is effective in immunizing against five types of adversarial image perturbations; and (5) comparisons also show our learned fingerprints consistently outperform several baselines in a variety of setups.
no code implementations • 16 Jun 2018 • Ahmed Taha, Moustafa Meshry, Xitong Yang, Yi-Ting Chen, Larry Davis
The effectiveness of the self-supervised pre-trained weights is validated on the action recognition task.
no code implementations • 2 May 2018 • Xianzhi Du, Mostafa El-Khamy, Vlad I. Morariu, Jungwon Lee, Larry Davis
The classification system further classifies the generated candidates based on opinions of multiple deep verification networks and a fusion network which utilizes a novel soft-rejection fusion method to adjust the confidence in the detection results.
1 code implementation • NAACL 2018 • Varun Manjunatha, Mohit Iyyer, Jordan Boyd-Graber, Larry Davis
Automatic colorization is the process of adding color to greyscale images.
no code implementations • 13 Apr 2018 • Xiaoqing Yin, Xiyang Dai, Xinchao Wang, Maojun Zhang, DaCheng Tao, Larry Davis
In this paper, we propose the first dedicated end-to-end deep learning approach for motion boundary detection, which we term MoBoNet.
no code implementations • 30 Mar 2018 • Varun Manjunatha, Srikumar Ramalingam, Tim K. Marks, Larry Davis
To accomplish this, we use a submodular set function to model the accuracy achievable on a new task when the features have been learned on a given subset of classes of the source dataset.
1 code implementation • 14 Mar 2018 • Pouya Samangouei, Mahyar Najibi, Larry Davis, Rama Chellappa
In this paper, we introduce the Face Magnifier Network (Face-MageNet), a face detector based on the Faster-RCNN framework which enables the flow of discriminative information of small scale faces to the classifier without any skip or residual connections.
no code implementations • 22 Dec 2017 • Xianzhi Du, Xiaolong Wang, Dawei Li, Jingwen Zhu, Serafettin Tasci, Cameron Upright, Stephen Walsh, Larry Davis
Compared to the general semantic segmentation problem, portrait segmentation has a higher precision requirement in the boundary area.
6 code implementations • ICCV 2017 • Mahyar Najibi, Pouya Samangouei, Rama Chellappa, Larry Davis
Surprisingly, with a headless VGG-16, SSH beats the ResNet-101-based state-of-the-art on the WIDER dataset.
3 code implementations • CVPR 2017 • Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daumé III, Larry Davis
While computers can now describe what is explicitly depicted in natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels.
no code implementations • 12 Jul 2016 • Sohil Shah, Kuldeep Kulkarni, Arijit Biswas, Ankit Gandhi, Om Deshmukh, Larry Davis
Typical textual descriptions that accompany online videos are 'weak', i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations.
no code implementations • 3 Feb 2016 • Zhuolin Jiang, Yaming Wang, Larry Davis, Walt Andrews, Viktor Rozgic
Deep Convolutional Neural Networks (CNNs) enforce supervised information only at the output layer, and hidden layers are trained by back-propagating the prediction error from the output layer without explicit supervision.
no code implementations • 9 Jul 2015 • Ran He, Tieniu Tan, Larry Davis, Zhenan Sun
This paper presents a structured ordinal measure method for video-based face recognition that simultaneously learns ordinal filters and structured ordinal features.