Search Results for author: Florian Schroff

Found 21 papers, 16 papers with code

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

76 code implementations • ECCV 2018 • Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam

The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information.

Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Image Classification Image Segmentation +2

76,591


Rethinking Atrous Convolution for Semantic Image Segmentation

75 code implementations • 17 Jun 2017 • Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates.

Ranked #3 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Dichotomous Image Segmentation Image Segmentation +3

76,591


Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

1 code implementation • NeurIPS 2018 • Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens

Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks.

Ranked #1 on Human Part Segmentation on PASCAL-Person-Part

Image Classification Image Segmentation +5

76,589


Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

12 code implementations • CVPR 2019 • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei

Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space.

Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 val

Image Classification Image Segmentation +3

76,589


FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation

3 code implementations • CVPR 2019 • Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen

Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use.

Ranked #1 on Semi-Supervised Video Object Segmentation on YouTube

Object Segmentation +3

76,589


Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.

Action Recognition Contrastive Learning +4

76,589


VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

76,589


View-Invariant Probabilistic Embedding for Human Pose

2 code implementations • ECCV 2020 • Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu

Depictions of similar human body configurations can vary with changing viewpoints.

Ranked #1 on Pose Retrieval on MPI-INF-3DHP

Action Recognition Pose Retrieval +2

32,806


View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose

2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam

Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.

3D Pose Estimation Action Recognition +2

32,806


Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu

To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.

Action Recognition Contrastive Learning +1

32,798


FaceNet: A Unified Embedding for Face Recognition and Clustering

182 code implementations • CVPR 2015 • Florian Schroff, Dmitry Kalenichenko, James Philbin

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.

Ranked #1 on Disguised Face Verification on MegaFace

Clustering Disguised Face Verification +2

13,498


Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

2,995


DeepLab2: A TensorFlow Library for Deep Labeling

4 code implementations • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.

988


Learning to Generate Image Embeddings with User-level Differential Privacy

1 code implementation • CVPR 2023 • Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan

Small on-device models have been successfully trained with user-level differential privacy (DP) for next word prediction and image classification tasks in the past.

Federated Learning Image Classification

647


Modeling Uncertainty with Hedged Instance Embedding

1 code implementation • 30 Sep 2018 • Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher

Instance embeddings are an efficient and versatile image representation that facilitates applications like recognition, verification, retrieval, and clustering.

Clustering Metric Learning +1


EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video

1 code implementation • 15 Jan 2020 • Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad

The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation.

Recommendation Systems Transfer Learning +1


MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

no code implementations • CVPR 2018 • Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam

Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction.

Ranked #85 on Instance Segmentation on COCO test-dev (using extra training data)

Instance Segmentation Object +4


Modeling Uncertainty with Hedged Instance Embeddings

no code implementations • ICLR 2019 • Seong Joon Oh, Kevin P. Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew C. Gallagher

Instance embeddings are an efficient and versatile image representation that facilitates applications like recognition, verification, retrieval, and clustering.

Clustering Metric Learning +1


Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7


Distilling Vision-Language Models on Millions of Videos

no code implementations • 11 Jan 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan

Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.

Language Modelling Retrieval +2


VideoPrism: A Foundational Visual Encoder for Video Understanding

no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.

Question Answering Video Question Answering +1

