76 code implementations • ECCV 2018 • Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam
Spatial pyramid pooling networks encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while encoder-decoder networks capture sharper object boundaries by gradually recovering the spatial information.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)
75 code implementations • 17 Jun 2017 • Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates.
Ranked #3 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)
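To make "atrous convolution in parallel with multiple atrous rates" concrete, here is a minimal sketch under simplifying assumptions: a single-channel 1-D signal, no bias or activation, zero padding, and cross-correlation (as deep learning frameworks compute "convolution"). The function names `atrous_conv1d` and `parallel_atrous` are hypothetical illustrations, not the DeepLab codebase. Each branch pairs its own kernel with its own rate, and the branch outputs are concatenated per position.

```python
from typing import List, Sequence, Tuple


def atrous_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """1-D atrous (dilated) cross-correlation with zero 'same' padding.

    Kernel taps are spaced `rate` samples apart, so a k-tap kernel covers
    (k - 1) * rate + 1 input samples without adding any parameters.
    """
    k = len(w)
    pad = ((k - 1) * rate) // 2  # centre the dilated kernel on each output position
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i - pad + j * rate
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def parallel_atrous(
    x: Sequence[float], branches: Sequence[Tuple[Sequence[float], int]]
) -> List[List[float]]:
    """Run several (kernel, rate) branches in parallel and concatenate per position.

    Each output position gets one feature per branch, i.e. one feature per
    effective field-of-view.
    """
    outputs = [atrous_conv1d(x, w, rate) for w, rate in branches]
    return [list(feats) for feats in zip(*outputs)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    print(atrous_conv1d(impulse, [1, 2, 3], rate=1))
    print(atrous_conv1d(impulse, [1, 2, 3], rate=2))
    print(parallel_atrous(impulse, [([1, 2, 3], 1), ([1, 2, 3], 2)]))
```

Feeding an impulse shows the effect of the rate directly: with rate 1 the kernel response occupies three adjacent positions, while with rate 2 the same three taps are spread over five positions, which is how larger rates enlarge the field-of-view at no extra parameter cost.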
1 code implementation • NeurIPS 2018 • Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens
Recent progress has demonstrated that meta-learning methods which automatically search for network architectures may exceed scalable human-invented architectures on image classification tasks.
Ranked #1 on Human Part Segmentation on PASCAL-Person-Part
12 code implementations • CVPR 2019 • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei
We propose to search the network-level structure in addition to the cell-level structure, which forms a hierarchical architecture search space.
Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 val
3 code implementations • CVPR 2019 • Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen
Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use.
Ranked #1 on Semi-Supervised Video Object Segmentation on YouTube
1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.
1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate existing foundation models' video understanding capabilities using a carefully designed experimental protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight widely used datasets, and four adaptation methods for tailoring a foundation model (FM) to a downstream task.
2 code implementations • ECCV 2020 • Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu
Depictions of similar human body configurations can vary with changing viewpoints.
Ranked #1 on Pose Retrieval on MPI-INF-3DHP
2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam
Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.
1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu
To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.
182 code implementations • CVPR 2015 • Florian Schroff, Dmitry Kalenichenko, James Philbin
On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.
Ranked #1 on Disguised Face Verification on MegaFace
1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection that leverages vision and language models (VLMs).
4 code implementations • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.
1 code implementation • CVPR 2023 • Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan
Small on-device models have been successfully trained with user-level differential privacy (DP) for next word prediction and image classification tasks in the past.
1 code implementation • 30 Sep 2018 • Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher
Instance embeddings are an efficient and versatile image representation that facilitates applications like recognition, verification, retrieval, and clustering.
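As an illustration of why one embedding per instance serves several applications, the sketch below shows verification (thresholded similarity between two embeddings) and retrieval (ranking a gallery by similarity to a query). It assumes cosine similarity, a hypothetical threshold of 0.8, and toy 2-D vectors; the helpers `verify` and `retrieve` are hypothetical and not taken from the cited paper, which studies probabilistic embeddings rather than this plain point-embedding baseline.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the unit-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def verify(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    """Verification: declare 'same instance' if similarity clears a threshold."""
    return cosine(a, b) >= threshold


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Retrieval: return the names of the k gallery items most similar to the query."""
    ranked = sorted(gallery.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(retrieve([0.9, 0.1], gallery, k=2))
    print(verify([1.0, 0.0], [1.0, 0.1]), verify([1.0, 0.0], [0.0, 1.0]))
```

Recognition (nearest labeled neighbor) and clustering can reuse the same `cosine` function, which is the practical appeal of a single embedding space.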
1 code implementation • 15 Jan 2020 • Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad
The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation.
no code implementations • CVPR 2018 • Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam
Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction.
Ranked #85 on Instance Segmentation on COCO test-dev (using extra training data)
no code implementations • ICLR 2019 • Seong Joon Oh, Kevin P. Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew C. Gallagher
Instance embeddings are an efficient and versatile image representation that facilitates applications like recognition, verification, retrieval, and clustering.
no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan
Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning, but neglect the rich fine-grained local information in both videos and text that is important for downstream tasks requiring temporal localization and semantic reasoning.
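For reference, "instance-level alignment via global contrastive learning" usually means a symmetric InfoNCE loss over a batch in which the i-th clip and the i-th caption form the positive pair and every other pairing is a negative. The sketch below is that generic baseline objective, not the method proposed in the cited work. It assumes one pooled, already L2-normalized embedding per clip and per caption, and a hypothetical temperature of 0.07; `clip_caption_infonce` is an illustrative name.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _log_softmax_at(logits: Sequence[float], i: int) -> float:
    """log(softmax(logits)[i]), computed stably via the max-shift trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[i] - log_sum_exp


def clip_caption_infonce(
    videos: List[Sequence[float]], texts: List[Sequence[float]], temperature: float = 0.07
) -> float:
    """Symmetric InfoNCE over a batch of (clip, caption) embedding pairs.

    Row i of the similarity matrix scores clip i against every caption; the
    diagonal entries are the positives. The loss averages the video-to-text
    and text-to-video cross-entropies.
    """
    n = len(videos)
    sim = [[_dot(v, t) / temperature for t in texts] for v in videos]
    v2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    t2v = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return (v2t + t2v) / 2


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_caption_infonce(clips, [[1.0, 0.0], [0.0, 1.0]]))  # matched captions
    print(clip_caption_infonce(clips, [[0.0, 1.0], [1.0, 0.0]]))  # swapped captions
```

Because each clip and each caption is reduced to a single global vector before scoring, the loss only rewards whole-clip/whole-caption agreement; this is the limitation the abstract points to, since nothing in the objective ties individual moments of a video to individual words.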
no code implementations • 11 Jan 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.