no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.
1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate the video understanding capabilities of existing foundation models using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods for tailoring a foundation model (FM) to a downstream task.
no code implementations • 2 Jun 2022 • Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand
We showcase its application to the landmark recognition domain, presenting a detailed analysis and the final fairer landmark rankings.
no code implementations • 19 Aug 2021 • Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand
To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population.
1 code implementation • CVPR 2021 • Quin Thames, Arjun Karpur, Wade Norris, Fangting Xia, Liviu Panait, Tobias Weyand, Jack Sim
Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health.
1 code implementation • CVPR 2020 • Tobias Weyand, Andre Araujo, Bingyi Cao, Jack Sim
GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
4 code implementations • 3 Apr 2020 • Tobias Weyand, Andre Araujo, Bingyi Cao, Jack Sim
GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
Ranked #1 on Landmark Recognition on Google Landmarks Dataset v2 (recognition, validation) (using extra training data)
no code implementations • ECCV 2018 • Paul Hongsuck Seo, Tobias Weyand, Jack Sim, Bohyung Han
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information.
Ranked #1 on Photo geolocation estimation on Im2GPS (Reference images metric)
153 code implementations • 17 Apr 2017 • Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
We present a class of efficient models called MobileNets for mobile and embedded vision applications.
Ranked #238 on Object Detection on COCO test-dev
12 code implementations • ICCV 2017 • Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature).
Ranked #2 on Image Retrieval on Oxf105k
1 code implementation • 17 Feb 2016 • Tobias Weyand, Ilya Kostrikov, James Philbin
Is it possible to build a system to determine the location where a photo was taken using just its pixels?
Ranked #1 on Photo geolocation estimation on Im2GPS (Reference images metric)
no code implementations • 18 Sep 2014 • Tobias Weyand, Bastian Leibe
We evaluate how different choices of methods and parameters for the individual pipeline steps affect overall system performance and examine their effects for different query categories such as buildings, paintings or sculptures.