no code implementations • 12 Dec 2024 • Yihong Sun, Hao Zhou, Liangzhe Yuan, Jennifer J. Sun, Yandong Li, Xuhui Jia, Hartwig Adam, Bharath Hariharan, Long Zhao, Ting Liu
We explore a novel video creation experience, namely Video Creation by Demonstration.
no code implementations • 5 Oct 2024 • Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
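As a quick illustration of the autoencoder framing described above, here is a minimal sketch (hypothetical module name and layer sizes, not the paper's architecture) of an encoder that compresses images into a latent grid and a decoder that reconstructs them:

```python
import torch
import torch.nn as nn

class TinyVisualAutoencoder(nn.Module):
    """Minimal encoder/decoder pair: images -> latent grid -> reconstruction."""

    def __init__(self, latent_dim: int = 8):
        super().__init__()
        # Encoder: downsample 3x64x64 images to a latent_dim x 16 x 16 grid.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_dim, kernel_size=4, stride=2, padding=1),
        )
        # Decoder: upsample the latent grid back to image resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)          # latent representation
        return self.decoder(z), z    # reconstruction and latents

model = TinyVisualAutoencoder()
images = torch.randn(2, 3, 64, 64)
recon, latents = model(images)
loss = nn.functional.mse_loss(recon, images)  # reconstruction objective
```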
no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.
no code implementations • CVPR 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
1 code implementation • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #4 on Text-to-Video Generation on MSR-VTT
1 code implementation • 9 Nov 2023 • Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen
Despite this shift, methods based on the per-pixel prediction paradigm still dominate the benchmarks on the other dense prediction tasks that require continuous outputs, such as depth estimation and surface normal prediction.
Ranked #2 on Surface Normals Estimation on NYU Depth v2
no code implementations • 21 Sep 2023 • Sagar M. Waghmare, Kimberly Wilber, Dave Hawkey, Xuan Yang, Matthew Wilson, Stephanie Debats, Cattalyya Nuengsigkapian, Astuti Sharma, Lars Pandikow, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko
To overcome this obstacle, we present SANPO, a large-scale egocentric video dataset designed for dense prediction in outdoor human navigation environments.
1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring an FM for downstream tasks.
no code implementations • NeurIPS 2023 • Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam
We conduct extensive empirical studies and reveal the following key insights: 1) Performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model.
Ranked #2 on Zero-Shot Action Recognition on UCF101 (using extra training data)
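To make the alternating-update insight concrete, the sketch below cycles gradient steps across two hypothetical tasks with different loss functions; the tasks, heads, and data are illustrative stand-ins, not the paper's setup:

```python
import torch
import torch.nn as nn

# Hypothetical per-task heads sharing a common backbone.
backbone = nn.Linear(128, 64)
heads = {"classify": nn.Linear(64, 10), "regress": nn.Linear(64, 1)}
losses = {"classify": nn.CrossEntropyLoss(), "regress": nn.MSELoss()}

params = list(backbone.parameters()) + [p for h in heads.values() for p in h.parameters()]
optimizer = torch.optim.SGD(params, lr=0.01)

def fake_batch(task):
    x = torch.randn(8, 128)
    y = torch.randint(0, 10, (8,)) if task == "classify" else torch.randn(8, 1)
    return x, y

# Alternate: each step optimizes one task/loss at a time instead of a mixed objective.
for step in range(6):
    task = ["classify", "regress"][step % 2]
    x, y = fake_batch(task)
    optimizer.zero_grad()
    loss = losses[task](heads[task](backbone(x)), y)
    loss.backward()
    optimizer.step()
```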
1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).
Human-Object Interaction Detection, Relationship Detection, +2
1 code implementation • CVPR 2023 • Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao
We also show that our method improves across ImageNet shifted datasets, four other datasets, and other model architectures such as LiT.
2 code implementations • 4 Oct 2022 • Chenglin Yang, Siyuan Qiao, Qihang Yu, Xiaoding Yuan, Yukun Zhu, Alan Yuille, Hartwig Adam, Liang-Chieh Chen
The tiny-MOAT family is also benchmarked on downstream tasks, serving as a baseline for the community.
Ranked #1 on Object Detection on MS COCO
1 code implementation • 21 Jul 2022 • Grant van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie
We thoroughly benchmark audiovisual classification performance and conduct modality fusion experiments using state-of-the-art transformer methods.
3 code implementations • 8 Jul 2022 • Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features.
Ranked #2 on Panoptic Segmentation on COCO test-dev
2 code implementations • CVPR 2022 • Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering.
Ranked #6 on Panoptic Segmentation on COCO test-dev
no code implementations • CVPR 2022 • Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen
We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner.
no code implementations • 23 Mar 2022 • Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang
While recent face anti-spoofing methods perform well under the intra-domain setups, an effective approach needs to account for much larger appearance variations of images acquired in complex scenes with different sensors for robust performance.
2 code implementations • ICLR 2022 • Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu
Instead, we define a surrogate gap, a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (used to derive the perturbed loss) is small.
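Roughly, the surrogate gap is the difference between the worst-case loss in a small neighborhood of the weights and the loss at the weights themselves. The sketch below estimates it with a one-step, SAM-style perturbation along the gradient direction; the model, data, and radius are hypothetical, and this is not the paper's full procedure:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 16), torch.randn(32, 1)
rho = 0.05  # neighborhood radius

# Loss and gradient at the current weights.
loss = loss_fn(model(x), y)
grads = torch.autograd.grad(loss, list(model.parameters()))
grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))

# One-step approximation of the worst-case perturbation within radius rho.
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p.add_(rho * g / (grad_norm + 1e-12))
    perturbed_loss = loss_fn(model(x), y)
    # Surrogate gap: perturbed loss minus original loss (a small gap suggests a flat minimum).
    surrogate_gap = perturbed_loss - loss
    # Restore the original weights.
    for p, g in zip(model.parameters(), grads):
        p.sub_(rho * g / (grad_norm + 1e-12))

print(float(surrogate_gap))
```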
1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.
no code implementations • 8 Dec 2021 • Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui
The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.
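A minimal sketch of that two-part objective, assuming per-timestep embeddings for the short clip and the temporally aligned segment of the long clip are already available from some video encoder (names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# Hypothetical embeddings: T timesteps x D dims for the short clip and the
# temporally aligned segment of the long clip.
short_emb = torch.randn(8, 128)   # per-timestep embeddings of the short clip
long_emb = torch.randn(8, 128)    # corresponding timesteps from the long clip

# Fine-grained objective: match embeddings timestep by timestep.
fine_grained = -F.cosine_similarity(short_emb, long_emb, dim=-1).mean()

# Persistent objective: pull together the clips' global (pooled) embeddings.
persistent = -F.cosine_similarity(short_emb.mean(0, keepdim=True),
                                  long_emb.mean(0, keepdim=True), dim=-1).mean()

total_loss = fine_grained + persistent
```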
4 code implementations • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.
1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.
1 code implementation • 23 Feb 2021 • Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljoša Ošep, Laura Leal-Taixé, Liang-Chieh Chen
The task of assigning semantic classes and track identities to every pixel in a video is called video panoptic segmentation.
1 code implementation • CVPR 2021 • Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
We name this joint task Depth-aware Video Panoptic Segmentation and propose a new evaluation metric along with two derived datasets for it, which will be made available to the public.
Ranked #1 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)
Depth-aware Video Panoptic Segmentation, Monocular Depth Estimation, +2
1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu
To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.
3 code implementations • CVPR 2021 • Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time.
Ranked #12 on Panoptic Segmentation on COCO test-dev
2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam
Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.
1 code implementation • ECCV 2020 • Liang-Chieh Chen, Raphael Gontijo Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens
We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences and extra images to surpass state-of-the-art performance on core computer vision tasks.
8 code implementations • ECCV 2020 • Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie
In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes).
5 code implementations • ECCV 2020 • Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions.
Ranked #4 on Panoptic Segmentation on Cityscapes val (using extra training data)
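The factorization into two 1D self-attentions can be sketched as follows, attending first along the height axis and then along the width axis; this simplified version omits the positional terms the paper uses and relies on a generic multi-head attention layer:

```python
import torch
import torch.nn as nn

def axial_attention(x, height_attn, width_attn):
    """Factorized 2D self-attention: attend along H, then along W.

    x: (B, H, W, C) feature map; height_attn/width_attn: nn.MultiheadAttention.
    """
    b, h, w, c = x.shape
    # 1D attention along the height axis: treat each column as a sequence.
    cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
    cols, _ = height_attn(cols, cols, cols)
    x = cols.reshape(b, w, h, c).permute(0, 2, 1, 3)
    # 1D attention along the width axis: treat each row as a sequence.
    rows = x.reshape(b * h, w, c)
    rows, _ = width_attn(rows, rows, rows)
    return rows.reshape(b, h, w, c)

c = 32
height_attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
width_attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
out = axial_attention(torch.randn(2, 16, 16, c), height_attn, width_attn)
```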
1 code implementation • 15 Jan 2020 • Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad
The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation.
2 code implementations • ECCV 2020 • Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu
Depictions of similar human body configurations can vary with changing viewpoints.
Ranked #1 on Pose Retrieval on MPI-INF-3DHP
9 code implementations • CVPR 2020 • Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen
In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed.
Ranked #6 on Panoptic Segmentation on Cityscapes test (using extra training data)
2 code implementations • 10 Oct 2019 • Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen
The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression.
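A toy sketch of the class-agnostic grouping step, assuming the instance branch has already produced candidate centers and a per-pixel offset map (sizes and values here are made up): each pixel votes for a center by adding its offset to its coordinate and is assigned to the nearest center.

```python
import torch

# Hypothetical outputs of a class-agnostic instance branch on a small H x W grid.
H, W = 4, 4
centers = torch.tensor([[1.0, 1.0], [2.0, 3.0]])   # (K, 2) detected center coordinates
offsets = torch.randn(2, H, W)                     # (2, H, W) predicted (dy, dx) per pixel

# Each pixel votes for a center location: its own coordinate plus its offset.
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")
coords = torch.stack([ys, xs]) + offsets           # (2, H, W)

# Assign every pixel to the nearest center -> class-agnostic instance ids.
flat = coords.reshape(2, -1).T                     # (H*W, 2)
dists = torch.cdist(flat, centers)                 # (H*W, K)
instance_ids = dists.argmin(dim=1).reshape(H, W)
```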
1 code implementation • 13 Jun 2019 • Sheng Guo, Weilin Huang, Xiao Zhang, Prasanna Srikhanta, Yin Cui, Yuan Li, Matthew R. Scott, Hartwig Adam, Serge Belongie
The dataset is constructed from over one million fashion images with a label space that includes 8 groups of 228 fine-grained attributes in total.
1 code implementation • 4 Jun 2019 • Grace Chu, Brian Potetz, Weijun Wang, Andrew Howard, Yang song, Fernando Brucher, Thomas Leung, Hartwig Adam
By leveraging geolocation information, we improve top-1 accuracy on iNaturalist from 70.1% to 79.0% over a strong image-only baseline model.
63 code implementations • ICCV 2019 • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
We achieve new state-of-the-art results for mobile classification, detection, and segmentation.
Ranked #9 on Dichotomous Image Segmentation on DIS-TE1
3 code implementations • CVPR 2019 • Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen
Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use.
Ranked #1 on Semi-Supervised Video Object Segmentation on YouTube
12 code implementations • CVPR 2019 • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei
Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space.
Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 val
1 code implementation • NeurIPS 2018 • Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens
Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks.
Ranked #1 on Human Part Segmentation on PASCAL-Person-Part
4 code implementations • ECCV 2018 • Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam
This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget.
78 code implementations • ECCV 2018 • Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam
The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 val (mIoU (Syn) metric)
20 code implementations • CVPR 2018 • Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.
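The core idea behind such on-device inference schemes is to simulate integer quantization during training so the network adapts to the rounding error it will see at inference time. Below is a minimal fake-quantization sketch with a straight-through estimator; it is a generic illustration, not the paper's exact scheme:

```python
import torch

def fake_quantize(x, num_bits=8):
    """Simulate uniform integer quantization in the forward pass."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Affine quantization parameters derived from the tensor's range.
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(x.min() / scale)
    # Quantize to integers, then dequantize back to floats.
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    dequant = (q - zero_point) * scale
    # Straight-through estimator: identity gradient through the rounding.
    return x + (dequant - x).detach()

weights = torch.randn(64, 64, requires_grad=True)
out = fake_quantize(weights)   # quantized-then-dequantized weights
out.sum().backward()           # gradients flow as if quantization were identity
```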
no code implementations • CVPR 2018 • Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam
Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction.
Ranked #87 on Instance Segmentation on COCO test-dev (using extra training data)
1 code implementation • 1 Dec 2017 • Hee Jung Ryu, Hartwig Adam, Margaret Mitchell
We demonstrate an approach to face attribute detection that retains or improves attribute detection accuracy across gender and race subgroups by learning demographic information prior to learning the attribute detection task.
19 code implementations • CVPR 2018 • Grant Van Horn, Oisin Mac Aodha, Yang song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie
Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories.
Ranked #13 on Image Classification on iNaturalist
no code implementations • 19 Jul 2017 • Yang Song, Yuan Li, Bo Wu, Chao-Yeh Chen, Xiao Zhang, Hartwig Adam
To ease the training difficulty, a novel learning scheme is proposed by using the output from specialized models as learning targets so that L2 loss can be used instead of triplet loss.
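A minimal sketch of that scheme, assuming a frozen specialized (teacher) model whose outputs serve as regression targets for the student under an L2 loss instead of a triplet loss; the models and data here are hypothetical:

```python
import torch
import torch.nn as nn

# Frozen specialized model provides embedding targets; the student regresses to them.
teacher = nn.Linear(128, 64).eval()
student = nn.Linear(128, 64)
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

inputs = torch.randn(16, 128)
with torch.no_grad():
    targets = teacher(inputs)            # outputs of the specialized model

optimizer.zero_grad()
loss = nn.functional.mse_loss(student(inputs), targets)   # L2 regression to targets
loss.backward()
optimizer.step()
```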
no code implementations • CVPR 2017 • Bohyung Han, Jack Sim, Hartwig Adam
We propose an extremely simple but effective regularization technique of convolutional neural networks (CNNs), referred to as BranchOut, for online ensemble tracking.
77 code implementations • 17 Jun 2017 • Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates.
Ranked #3 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)
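The parallel variant of multi-rate atrous convolution can be sketched as follows: several 3x3 convolutions share an input but use increasing dilation rates, and their outputs are concatenated and fused with a 1x1 convolution. Channel counts and rates are illustrative, not the paper's exact module:

```python
import torch
import torch.nn as nn

class ParallelAtrous(nn.Module):
    """Parallel atrous (dilated) convolutions at multiple rates."""

    def __init__(self, in_ch=64, out_ch=32, rates=(1, 6, 12, 18)):
        super().__init__()
        # Same kernel size, increasing dilation -> increasing effective field-of-view.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate the multi-rate responses, then fuse with a 1x1 convolution.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

features = torch.randn(2, 64, 33, 33)
multi_scale = ParallelAtrous()(features)   # -> (2, 32, 33, 33)
```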
158 code implementations • 17 Apr 2017 • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
We present a class of efficient models called MobileNets for mobile and embedded vision applications.
Ranked #247 on Object Detection on COCO test-dev
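MobileNets build on depthwise separable convolutions, which factorize a standard convolution into a per-channel depthwise convolution followed by a 1x1 pointwise convolution. A minimal sketch with illustrative channel sizes:

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """MobileNet-style block: per-channel 3x3 conv followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1, groups=in_ch),
        nn.BatchNorm2d(in_ch), nn.ReLU(),
        # Pointwise: 1x1 conv mixes channels and sets the output width.
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(),
    )

block = depthwise_separable(32, 64, stride=2)
out = block(torch.randn(1, 32, 56, 56))   # -> (1, 64, 28, 28)
```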