Search Results for author: Xiuye Gu

Found 15 papers, 7 papers with code

VideoPoet: A Large Language Model for Zero-Shot Video Generation

no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.

Ranked #3 on Text-to-Video Generation on MSR-VTT

Language Modelling Large Language Model +2

Pixel Aligned Language Models

no code implementations • 14 Dec 2023 • Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.

Language Modelling

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

no code implementations • 12 Dec 2023 • Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li

Mask labels are labor-intensive, which limits the number of categories in segmentation datasets.

Image Segmentation Segmentation +1

Photorealistic Video Generation with Diffusion Models

no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama

We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.

Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64

Text-to-Video Generation Video Generation +1

PolyMaX: General Dense Prediction with Mask Transformer

1 code implementation • 9 Nov 2023 • Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen

Despite this shift, methods based on the per-pixel prediction paradigm still dominate the benchmarks on the other dense prediction tasks that require continuous outputs, such as depth estimation and surface normal prediction.

Ranked #1 on Surface Normals Estimation on NYU Depth v2

Monocular Depth Estimation Semantic Segmentation +2

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.

Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64

Action Recognition Image Generation +4

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

no code implementations • 13 Feb 2023 • James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Xiuye Gu, Yin Cui, Dustin Tran, Jeremiah Zhe Liu, Balaji Lakshminarayanan

In particular, we ask "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?".

Prompt Engineering Zero-Shot Learning

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

no code implementations • 20 Dec 2022 • Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross

Detecting actions in untrimmed videos should not be limited to a small, closed set of classes.

Action Detection Optical Flow Estimation

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

1 code implementation • 30 Sep 2022 • Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova

We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models.

Knowledge Distillation object-detection +1

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

1 code implementation • 22 Dec 2021 • Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin

We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions.

Image Segmentation Segmentation +1

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

4 code implementations • ICLR 2022 • Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui

On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.

Ranked #2 on Open Vocabulary Object Detection on Objects365

Image Classification Knowledge Distillation +4

Password-conditioned Anonymization and Deanonymization with Face Identity Transformers

1 code implementation • 26 Nov 2019 • Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee

Cameras are prevalent in our daily lives, and enable many useful systems built upon computer vision technologies such as smart cameras and home robots for service applications.

HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds

2 code implementations • CVPR 2019 • Xiuye Gu, Yijie Wang, Chongruo Wu, Yong Jae Lee, Panqu Wang

We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds.

Scene Flow Estimation

A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval

no code implementations • 16 Nov 2017 • Deng Cai, Xiuye Gu, Chaoqi Wang

However, there are serious flaws in the evaluations of existing deep hashing papers: (1) The datasets they used are too small and simple to simulate the real CBIR situation.

Content-Based Image Retrieval Deep Hashing

Interspecies Knowledge Transfer for Facial Keypoint Detection

1 code implementation • CVPR 2017 • Maheen Rashid, Xiuye Gu, Yong Jae Lee

Instead of directly finetuning a network trained to detect keypoints on human faces to animal faces (which is sub-optimal since human and animal faces can look quite different), we propose to first adapt the animal images to the pre-trained human detection network by correcting for the differences in animal and human face shape.

Human Detection Keypoint Detection +1
