1 code implementation • 10 Oct 2024 • Maxwell Horton, Qingqing Cao, Chenfan Sun, Yanzi Jin, Sachin Mehta, Mohammad Rastegari, Moin Nabi
In our method, a small auxiliary model processes the prompt and produces an approximation of the KV cache used by a base model.
no code implementations • 13 Sep 2024 • Arnav Kundu, Yanzi Jin, Mohammad Sekhavat, Max Horton, Danny Tormoen, Devang Naik
This paper delves into the challenging task of Active Speaker Detection (ASD), where the system needs to determine in real time whether a person is speaking in a series of video frames.
Active Speaker Detection
Audio-Visual Active Speaker Detection
4 code implementations • 22 Apr 2024 • Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari
To this end, we release OpenELM, a state-of-the-art open language model.
no code implementations • 5 Oct 2023 • Elvis Nunez, Yanzi Jin, Mohammad Rastegari, Sachin Mehta, Maxwell Horton
Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations.
no code implementations • 18 Nov 2020 • Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari
We also show how to precondition the network to improve the accuracy of our layer-wise compression method.