Search Results for author: Lijun Yu

Found 16 papers, 3 papers with code

Towards Multi-Task Multi-Modal Models: A Video Generative Perspective

no code implementations • 26 May 2024 • Lijun Yu

This thesis chronicles our endeavor to build multi-task models for generating videos and other modalities under diverse conditions, as well as for understanding and compression applications.

Paper
Add Code

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

no code implementations • 22 May 2024 • Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process.

Paper
Add Code

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

no code implementations • 15 May 2024 • Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

Our approach relaxes the discrete jailbreak optimization into a continuous optimization and progressively increases the sparsity of the optimizing vectors.

Paper
Add Code

Improving and Unifying Discrete&Continuous-time Discrete Denoising Diffusion

2 code implementations • 6 Feb 2024 • Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu

Discrete diffusion models have seen a surge of attention with applications on naturally discrete data such as language and graphs.

Denoising

Paper
Code

VideoPoet: A Large Language Model for Zero-Shot Video Generation

no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.

Ranked #7 on Text-to-Video Generation on UCF-101

Decoder Language Modelling +3

Paper
Add Code

Photorealistic Video Generation with Diffusion Models

no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama

We present W. A. L. T, a transformer-based approach for photorealistic video generation via diffusion modeling.

Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64

Text-to-Video Generation Video Generation +1

Paper
Add Code

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.

Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64

Action Recognition Image Generation +4

Paper
Add Code

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning multimodal generation

Paper
Add Code

DocumentNet: Bridging the Data Gap in Document Pre-Training

no code implementations • 15 Jun 2023 • Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei

Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI.

document understanding Entity Retrieval +3

Paper
Add Code

MAGVIT: Masked Generative Video Transformer

1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.

Ranked #1 on Video Prediction on Something-Something V2

Multi-Task Learning Text-to-Video Generation +2

862

Paper
Code

Score-based Continuous-time Discrete Diffusion Models

no code implementations • 30 Nov 2022 • Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data.

Paper
Add Code

Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions

1 code implementation • ECCV 2022 • Yijun Qian, Lijun Yu, Wenhe Liu, and Alexander G. Hauptmann

However, due to the complexity of actions, it remains challenging to transfer knowledge learned from source to target action domains.

Ranked #10 on Zero-Shot Action Recognition on Kinetics

Action Recognition Zero-Shot Action Recognition

Paper
Code

Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals

no code implementations • 14 Jan 2022 • Lijun Yu, Yijun Qian, Wenhe Liu, Alexander G. Hauptmann

Activity detection is one of the attractive computer vision tasks to exploit the video streams captured by widely installed cameras.

Action Detection Activity Detection

Paper
Add Code

Training-free Monocular 3D Event Detection System for Traffic Surveillance

no code implementations • 1 Feb 2020 • Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann

To deal with the aforementioned problems, in this paper, we propose a training-free monocular 3D event detection system for traffic surveillance.

Event Detection

Paper
Add Code

Traffic Danger Recognition With Surveillance Cameras Without Training Data

no code implementations • 29 Nov 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Alexander Hauptmann

Therefore, we developed a model to predict and identify car crashes from surveillance cameras based on a 3D reconstruction of the road plane and prediction of trajectories.

3D Reconstruction Position

Paper
Add Code

MOBA-Slice: A Time Slice Based Evaluation Framework of Relative Advantage between Teams in MOBA Games

no code implementations • 22 Jul 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Xing Xie

In this paper, we introduce MOBA-Slice, a time slice based evaluation framework of relative advantage between teams in MOBA games.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.