Search Results for author: Serena Yeung-Levy

Found 10 papers, 5 papers with code

Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models

1 code implementation · 19 Mar 2024 · Elaine Sui, Xiaohan Wang, Serena Yeung-Levy

Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting.

Prompt Engineering · Zero-shot Generalization +1

Depth-guided NeRF Training via Earth Mover's Distance

no code implementations · 19 Mar 2024 · Anita Rau, Josiah Aklilu, F. Christopher Holsinger, Serena Yeung-Levy

This work proposes a novel approach to uncertainty in depth priors for NeRF supervision.

Denoising

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

no code implementations · 15 Mar 2024 · Xiaohan Wang, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy

Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences.

Language Modelling · Large Language Model +2

Multi-Human Mesh Recovery with Transformers

no code implementations · 26 Feb 2024 · Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy

Conventional approaches to human mesh recovery predominantly employ a region-based strategy.

Human Mesh Recovery

Revisiting Active Learning in the Era of Vision Foundation Models

1 code implementation · 25 Jan 2024 · Sanket Rajan Gupte, Josiah Aklilu, Jeffrey J. Nirschl, Serena Yeung-Levy

Foundation vision and vision-language models are trained on large-scale unlabeled or noisy data and learn robust representations that achieve impressive zero- or few-shot performance on diverse tasks.

Active Learning · Image Classification

Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM

no code implementations · 22 Jan 2024 · Zhenzhen Weng, Jingyuan Liu, Hao Tan, Zhan Xu, Yang Zhou, Serena Yeung-Levy, Jimei Yang

We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.

Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data

1 code implementation · 16 Jan 2024 · Yuhui Zhang, Elaine Sui, Serena Yeung-Levy

However, this assumption is under-explored due to the poorly understood geometry of the multi-modal contrastive space, where a modality gap exists.

Text-to-Image Generation · Video Captioning

Describing Differences in Image Sets with Natural Language

1 code implementation · 5 Dec 2023 · Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

To aid in this discovery process, we explore the task of automatically describing the differences between two **sets** of images, which we term Set Difference Captioning.

Language Modelling

Diffusion-HPC: Synthetic Data Generation for Human Mesh Recovery in Challenging Domains

1 code implementation · 16 Mar 2023 · Zhenzhen Weng, Laura Bravo-Sánchez, Serena Yeung-Levy

Recent text-to-image generative models have exhibited remarkable abilities in generating high-fidelity and photo-realistic images.

Human Mesh Recovery · Synthetic Data Generation
