no code implementations • 26 May 2024 • Lijun Yu
This thesis chronicles our endeavor to build multi-task models for generating videos and other modalities under diverse conditions, as well as for understanding and compression applications.
no code implementations • 22 May 2024 • Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli
Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process.
no code implementations • 15 May 2024 • Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson
Our approach relaxes the discrete jailbreak optimization into a continuous optimization and progressively increases the sparsity of the optimizing vectors.
2 code implementations • 6 Feb 2024 • Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu
Discrete diffusion models have seen a surge of attention with applications on naturally discrete data such as language and graphs.
no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #7 on Text-to-Video Generation on UCF-101
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W. A. L. T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.
no code implementations • 15 Jun 2023 • Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei
Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI.
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Prediction on Something-Something V2
no code implementations • 30 Nov 2022 • Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai
Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data.
1 code implementation • ECCV 2022 • Yijun Qian, Lijun Yu, Wenhe Liu, and Alexander G. Hauptmann
However, due to the complexity of actions, it remains challenging to transfer knowledge learned from source to target action domains.
Ranked #10 on Zero-Shot Action Recognition on Kinetics
no code implementations • 14 Jan 2022 • Lijun Yu, Yijun Qian, Wenhe Liu, Alexander G. Hauptmann
Activity detection is one of the attractive computer vision tasks to exploit the video streams captured by widely installed cameras.
no code implementations • 1 Feb 2020 • Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann
To deal with the aforementioned problems, in this paper, we propose a training-free monocular 3D event detection system for traffic surveillance.
no code implementations • 29 Nov 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Alexander Hauptmann
Therefore, we developed a model to predict and identify car crashes from surveillance cameras based on a 3D reconstruction of the road plane and prediction of trajectories.
no code implementations • 22 Jul 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Xing Xie
In this paper, we introduce MOBA-Slice, a time slice based evaluation framework of relative advantage between teams in MOBA games.