no code implementations • 5 Dec 2024 • Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan
Furthermore, by pre-training on CA-1M, CuTR can outperform point-based methods on a more diverse variant of SUN RGB-D, supporting the notion that while inductive biases in 3D are useful at the smaller sizes of existing datasets, they fail to scale to the data-rich regime of CA-1M.
no code implementations • 30 Sep 2024 • Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, BoWen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, ZiRui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang
We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning.
Ranked #51 on Visual Question Answering on MM-Vet
1 code implementation • 22 Jul 2024 • Mingze Xu, Mingfei Gao, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan
As a result, this design allows us to adequately capture both spatial and temporal features that are beneficial for detailed video understanding.
no code implementations • 2 Jul 2024 • Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, ZiRui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch
A primary objective of alignment for MLLMs is to encourage these models to align responses more closely with image information.
1 code implementation • 13 Jun 2024 • Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir
In this paper, we expand upon their capabilities by training a single model on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora.
1 code implementation • NeurIPS 2023 • David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir
Current machine learning models for vision are often highly specialized and limited to a single modality and task.
1 code implementation • 27 Jul 2022 • Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
1 code implementation • 17 Nov 2021 • Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, Elad Shulman
It is not only the first RGB-D dataset captured with a now widely available depth sensor but, to the best of our knowledge, also the largest indoor scene understanding dataset released.
no code implementations • 21 Mar 2017 • Syed Zain Masood, Guang Shu, Afshin Dehghan, Enrique G. Ortiz
This work details Sighthound's fully automated license plate detection and recognition system.
2 code implementations • 14 Feb 2017 • Afshin Dehghan, Enrique G. Ortiz, Guang Shu, Syed Zain Masood
This paper describes the details of Sighthound's fully automated age, gender and emotion recognition system.
1 code implementation • 6 Feb 2017 • Afshin Dehghan, Syed Zain Masood, Guang Shu, Enrique G. Ortiz
The backbone of our system is a deep convolutional neural network that is not only computationally inexpensive, but also provides state-of-the-art results on several competitive benchmarks.
no code implementations • 30 Mar 2016 • Afshin Dehghan, Mubarak Shah
In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently.
no code implementations • 13 Dec 2015 • Meera Hahn, Si Chen, Afshin Dehghan
In this paper, we study a discriminatively trained deep convolutional network for the task of visual tracking.
no code implementations • CVPR 2015 • Afshin Dehghan, Yicong Tian, Philip H. S. Torr, Mubarak Shah
In this paper, we show that multiple object tracking (MOT) can be formulated in a framework where detection and data association are performed simultaneously.
no code implementations • CVPR 2015 • Afshin Dehghan, Shayan Modiri Assari, Mubarak Shah
Data association is the backbone of many multiple object tracking (MOT) methods.
no code implementations • CVPR 2014 • Afshin Dehghan, Enrique G. Ortiz, Ruben Villegas, Mubarak Shah
Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks.
Ranked #4 on Kinship Verification on KinFaceW-II
no code implementations • CVPR 2014 • Afshin Dehghan, Haroon Idrees, Mubarak Shah
A video captures a sequence of concepts, and the interactions among them, that can be static (for instance, objects or scenes) or dynamic (such as actions).
no code implementations • CVPR 2013 • Guang Shu, Afshin Dehghan, Mubarak Shah
In general, our method takes the bounding boxes produced by a generic detector as input and outputs detections with higher average precision and more precise object regions.