no code implementations • 28 Mar 2024 • Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh
Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks.
Automatic Speech Recognition (ASR)
no code implementations • 7 Jan 2024 • Tahseen Rabbani, Jiahao Su, Xiaoyu Liu, David Chan, Geoffrey Sangston, Furong Huang
Modern ConvNets continue to achieve state-of-the-art results over a vast array of vision and image classification tasks, but at the cost of increasing parameters.
no code implementations • CVPR 2024 • Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell
Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary language grounding and segmentation but can suffer under false premises when queries imply the existence of something that is not actually present in the image.
no code implementations • 13 Dec 2023 • Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell
Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary language grounding and segmentation but can suffer under false premises when queries imply the existence of something that is not actually present in the image.
no code implementations • 19 Oct 2023 • David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny
The evaluation of machine-generated image captions poses an interesting yet persistent challenge.
no code implementations • 4 Apr 2023 • Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza
The currently popular approach of mining video-text data via automatic speech recognition (ASR), as used in HowTo100M, provides low-quality captions that often do not refer to the video content.
Automatic Speech Recognition (ASR)
no code implementations • 16 Jul 2022 • Sumanth Gurram, Andy Fang, David Chan, John Canny
Generating representations of video data is of key importance in advancing the field of machine perception.
no code implementations • 15 Feb 2022 • Kehan Wang, David Chan, Seth Z. Zhao, John Canny, Avideh Zakhor
With the growing adoption of short-form video by social media platforms, reducing the spread of misinformation through video posts has become a critical challenge for social media providers.
no code implementations • 31 Aug 2021 • Anthony Frazier, Joethi Silva, Rachel Meilak, Indranil Sahoo, David Chan, Michael Broda
For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important in predicting academic achievement.
no code implementations • 5 Jun 2020 • Bofan Xue, David Chan, John Canny
We present a new publicly available dataset with the goal of advancing multi-modality learning by offering vision and language data within the same context.
2 code implementations • 26 Oct 2019 • Daniel Seita, David Chan, Roshan Rao, Chen Tang, Mandi Zhao, John Canny
Learning from demonstrations is a popular tool for accelerating and reducing the exploration requirements of reinforcement learning.
no code implementations • 23 Dec 2018 • Pooran Singh Negi, David Chan, Mohammad Mahoor
Traditionally, artificial neural networks (ANNs) are trained by minimizing the cross-entropy between a provided ground-truth delta distribution (encoded as a one-hot vector) and the ANN's predictive softmax distribution.
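The conventional objective described in this abstract can be illustrated with a minimal sketch. It shows only the standard one-hot cross-entropy baseline, not the method proposed in the paper, and the function names (`softmax`, `cross_entropy`, `cross_entropy_one_hot`) are hypothetical. Because the target is a delta distribution, the general sum over classes collapses to the negative log-probability of the ground-truth class.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    # General form H(p, q) = -sum_k p_k * log(q_k); classes with p_k == 0 contribute nothing.
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)

def cross_entropy_one_hot(target_index, logits):
    # With a one-hot (delta) target, H(p, q) reduces to -log q_target.
    probs = softmax(logits)
    return -math.log(probs[target_index])

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    one_hot = [1.0, 0.0, 0.0]
    # Both computations give the same loss for a one-hot target.
    print(cross_entropy(one_hot, softmax(logits)))
    print(cross_entropy_one_hot(0, logits))
```

Computing the loss directly from the target index avoids materializing the one-hot vector, which is why training code typically takes class indices rather than full target distributions.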
no code implementations • 9 Jul 2018 • Aiyou Chen, David Chan, Mike Perry, Yuxue Jin, Yunting Sun, Yueqing Wang, Jim Koehler
Evaluating the return on ad spend (ROAS), the causal effect of advertising on sales, is critical to advertisers for understanding the performance of their existing marketing strategy as well as how to improve and optimize it.
Applications
no code implementations • 11 May 2016 • Ali Mollahosseini, Behzad Hassani, Michelle J. Salvador, Hojjat Abdollahi, David Chan, Mohammad H. Mahoor
In fact, the Internet is a World Wild Web of facial images with expressions.
Facial Expression Recognition (FER)
no code implementations • 12 Nov 2015 • Ali Mollahosseini, David Chan, Mohammad H. Mahoor
Despite the efforts made to develop various methods for FER, existing approaches traditionally lack generalizability when applied to unseen images or to images captured in the wild.
Facial Expression Recognition (FER)