no code implementations • 25 Nov 2024 • Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, Xin Luna Dong
We hypothesize that a user's visual history, with images reflecting their daily life, offers valuable insight into their interests and preferences and can be leveraged for personalization.
no code implementations • 7 Oct 2024 • Deqing Fu, Tong Xiao, Rui Wang, Wang Zhu, Pengchuan Zhang, Guan Pang, Robin Jia, Lawrence Chen
Although reward models have been successful in improving multimodal large language models, the reward models themselves remain coarse and convey minimal information.
no code implementations • 5 Jun 2024 • Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia
This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain.
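As a rough illustration of what such a Fourier feature looks like, the toy probe below treats one hidden-state dimension as a function of the input integer and inspects its spectrum. The cosine activation is synthetic, standing in for a value that would in practice be read from a real model's residual stream.

```python
import numpy as np

# Toy probe for Fourier features: treat one hidden-state dimension as a
# function of the input integer n. A Fourier feature shows up as a sparse
# spike in the frequency domain. The cosine "activation" below is synthetic;
# a real probe would read this value from the model's hidden states.
ns = np.arange(256)
hidden_dim = np.cos(2 * np.pi * ns / 10.0)  # stand-in for a real activation

spectrum = np.abs(np.fft.rfft(hidden_dim))
top_freq = int(np.argmax(spectrum))
print(f"dominant frequency index: {top_freq}, magnitude: {spectrum[top_freq]:.1f}")
# A sparse spectrum (one dominant spike, up to leakage) is the signature
# of a number represented via features sparse in the frequency domain.
```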
no code implementations • 1 Apr 2024 • Deqing Fu, Ruohao Guo, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger
Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs.
no code implementations • 11 Mar 2024 • Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan
Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of their inductive biases, and of how those biases differ from those of other neural network architectures, remains elusive.
no code implementations • 4 Feb 2024 • Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger
The potential of large language models (LLMs) as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often involve challenging decision-making under uncertainty.
no code implementations • 29 Nov 2023 • Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.
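A minimal sketch of that selection step is below, assuming hypothetical `vqa_alignment_score` and `aesthetic_score` callables standing in for the two VLMs; combining the scores by multiplication is an illustrative choice, not necessarily the paper's rule.

```python
# Hedged sketch of the two-scorer selection loop described above. The two
# scorer callables are hypothetical stand-ins for the VQA-based alignment
# model and the aesthetic-quality model; `candidates` would be the images
# sampled from the generator for the same prompt.
def select_best_image(prompt, candidates, vqa_alignment_score, aesthetic_score):
    def combined(image):
        return vqa_alignment_score(prompt, image) * aesthetic_score(image)
    return max(candidates, key=combined)
```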
1 code implementation • 26 Oct 2023 • Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
Transformers excel at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they do so remains a mystery.
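For readers unfamiliar with the setup, the snippet below builds a minimal ICL prompt: input-output demonstrations followed by a query, with the mapping (here y = 3x + 1) left for the model to infer at inference time. The model call itself is omitted.

```python
# Minimal illustration of in-context learning: the model receives
# input-output demonstrations in its prompt and must infer the mapping
# for a new query, with no parameter updates.
demos = [(1, 4), (2, 7), (5, 16)]  # pairs following y = 3x + 1
query = 10

prompt = "\n".join(f"Input: {x} -> Output: {y}" for x, y in demos)
prompt += f"\nInput: {query} -> Output:"
print(prompt)  # a capable model should continue with 31
```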
1 code implementation • 13 May 2023 • Deqing Fu, Ameya Godbole, Robin Jia
In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples.
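A loose sketch of the self-labeling loop in the spirit of SCENE appears below; `perturb_question` and `model_answer` are hypothetical helpers, and the paper specifies the actual perturbation and self-labeling criteria.

```python
# Loose sketch of a self-labeled counterfactual loop. The two helpers are
# hypothetical: `perturb_question` edits a question (e.g. a mask-and-infill
# edit), and `model_answer` runs the model being trained. If the model's
# own prediction no longer matches the gold answer, the perturbed example
# is kept as a synthesized negative.
def synthesize_negatives(examples, perturb_question, model_answer):
    negatives = []
    for context, question, gold in examples:
        new_q = perturb_question(question)
        pred = model_answer(context, new_q)
        if pred != gold:  # the model's own output supplies the label
            negatives.append((context, new_q, "unanswerable"))
    return negatives
```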
no code implementations • 22 Nov 2021 • Deqing Fu, Bradley J. Nelson
Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision with a concrete topological description: partitioning an image into connected components, or estimating a function whose small number of local extrema correspond to objects in the image.
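To make the two topological notions concrete, the toy snippet below counts connected components of a small segmentation mask and local extrema of a synthetic depth map; both arrays are made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Concrete reading of the topological description above: a segmentation
# mask partitions the image into connected components, and objects in a
# depth map show up as local extrema. Both counts are cheap to compute.
mask = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 1],
                 [0, 0, 1, 1]])
_, n_components = ndimage.label(mask)
print("connected components:", n_components)  # 2 for this mask

depth = np.random.default_rng(0).random((32, 32))  # synthetic depth map
local_max = depth == ndimage.maximum_filter(depth, size=5)
print("local maxima:", int(local_max.sum()))
```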
no code implementations • ICCV 2021 • Cooper Nederhood, Nicholas Kolkin, Deqing Fu, Jason Salavon
Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image (e.g., layout, semantics, or geometry) and inherits everything else (e.g., texture, lighting, sometimes even semantics) from a 'style' image.
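Schematically, that split can be pictured as two encoder streams feeding one decoder; the sketch below is an illustrative stand-in, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Schematic sketch of the content/style split described above (not the
# paper's architecture): one encoder keeps localized spatial structure,
# another collapses the style image to a global code, and a decoder
# fuses the two.
class TwoStreamTranslator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.content_enc = nn.Conv2d(3, ch, 3, padding=1)  # keeps layout
        self.style_enc = nn.Sequential(                    # global style code
            nn.Conv2d(3, ch, 3, padding=1), nn.AdaptiveAvgPool2d(1))
        self.decoder = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, content_img, style_img):
        c = self.content_enc(content_img)
        s = self.style_enc(style_img)  # broadcast style over all positions
        return self.decoder(c + s)

out = TwoStreamTranslator()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```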