Search Results for author: Lijun Yu

Found 13 papers, 2 papers with code

Improving and Unifying Discrete&Continuous-time Discrete Denoising Diffusion

no code implementations • 6 Feb 2024 • Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu

Discrete diffusion models have seen a surge of attention with applications on naturally discrete data such as language and graphs.

Denoising


VideoPoet: A Large Language Model for Zero-Shot Video Generation

no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.

Ranked #3 on Text-to-Video Generation on MSR-VTT

Language Modelling, Large Language Model, +2


Photorealistic Video Generation with Diffusion Models

no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama

We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.

Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64

Text-to-Video Generation, Video Generation, +1


Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.

Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64

Action Recognition, Image Generation, +4


SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning, multimodal generation


DocumentNet: Bridging the Data Gap in Document Pre-Training

no code implementations • 15 Jun 2023 • Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei

Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI.

document understanding, Entity Retrieval, +3


MAGVIT: Masked Generative Video Transformer

1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.

Ranked #1 on Video Prediction on Something-Something V2

Multi-Task Learning, Text-to-Video Generation, +2


Score-based Continuous-time Discrete Diffusion Models

no code implementations • 30 Nov 2022 • Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data.


Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions

1 code implementation • ECCV 2022 • Yijun Qian, Lijun Yu, Wenhe Liu, Alexander G. Hauptmann

Due to the complexity of actions, it remains challenging to transfer knowledge learned from source to target action domains.

Ranked #10 on Zero-Shot Action Recognition on Kinetics

Action Recognition, Zero-Shot Action Recognition


Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals

no code implementations • 14 Jan 2022 • Lijun Yu, Yijun Qian, Wenhe Liu, Alexander G. Hauptmann

Activity detection is one of the attractive computer vision tasks to exploit the video streams captured by widely installed cameras.

Action Detection, Activity Detection


Training-free Monocular 3D Event Detection System for Traffic Surveillance

no code implementations • 1 Feb 2020 • Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann

In this paper, we propose a training-free monocular 3D event detection system for traffic surveillance.

Event Detection


Traffic Danger Recognition With Surveillance Cameras Without Training Data

no code implementations • 29 Nov 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Alexander Hauptmann

We developed a model to predict and identify car crashes from surveillance cameras based on a 3D reconstruction of the road plane and prediction of trajectories.

3D Reconstruction, Position


MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games

no code implementations • 22 Jul 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Xing Xie

In this paper, we introduce MOBA-Slice, a time slice based evaluation framework of relative advantage between teams in MOBA games.

