no code implementations • 15 Mar 2024 • Xiaohan Wang, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy
Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences.
Ranked #1 on Zero-Shot Video Question Answer on NExT-QA
no code implementations • 10 Dec 2023 • Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Kuan-Chieh Wang
We exploit the inherent connection between classes in application-driven datasets and introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
1 code implementation • NeurIPS 2023 • Orr Zohar, Shih-Cheng Huang, Kuan-Chieh Wang, Serena Yeung
As the number of open-source VLM variants increases, there is a need for an efficient model selection strategy that does not require access to a curated evaluation dataset.
1 code implementation • CVPR 2023 • Orr Zohar, Kuan-Chieh Wang, Serena Yeung
The resulting Probabilistic Objectness transformer-based open-world detector, PROB, integrates our framework into traditional object detection models, adapting them for the open-world setting.