no code implementations • 19 Jul 2024 • Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, Kwonjoon Lee
We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models.
1 code implementation • 14 Jul 2024 • Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo
In the induction stage, the LLM is fed with few-shot normal reference samples and then summarizes these normal patterns to induce a set of rules for detecting anomalies.
Ranked #2 on Video Anomaly Detection on UBnormal
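A minimal sketch of how such a rule-induction stage could be wired up, assuming a hypothetical `chat(prompt)` helper standing in for the LLM call; the prompt wording, rule format, and deduction-style check are illustrative, not the paper's exact pipeline:

```python
def induce_rules(chat, normal_samples):
    """Induction: summarize few-shot normal references into textual rules."""
    prompt = (
        "These are descriptions of normal events:\n"
        + "\n".join(f"- {s}" for s in normal_samples)
        + "\nSummarize their shared patterns as a numbered list of rules "
          "that a normal event must satisfy."
    )
    return chat(prompt)  # induced rules, as text


def check_anomaly(chat, rules, test_sample):
    """Deduction: judge a test sample against the induced rules."""
    prompt = (
        f"Rules for normal events:\n{rules}\n\n"
        f"Event: {test_sample}\n"
        "Does this event violate any rule? Answer 'anomaly' or 'normal'."
    )
    return chat(prompt).strip().lower()
```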
no code implementations • CVPR 2024 • Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee
To address this limitation, we explore the generative capability of a large video-language model and further develop an understanding of plausibility in action sequences by introducing two objective functions: a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss.
Ranked #1 on Action Anticipation on EPIC-KITCHENS-100
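As one illustration of the second term, a long-horizon repetition penalty can be written as a simple overlap between consecutive predicted action distributions; this sketch only illustrates the idea named in the abstract, not the paper's actual loss formulation:

```python
import torch

def repetition_penalty(action_probs: torch.Tensor) -> torch.Tensor:
    """action_probs: (T, num_actions) predicted distributions over a horizon.

    The penalty is large when the model keeps predicting the same action
    at consecutive future steps, discouraging degenerate repetition.
    """
    overlap = (action_probs[1:] * action_probs[:-1]).sum(dim=-1)  # (T-1,)
    return overlap.mean()
```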
no code implementations • CVPR 2024 • Hongji Guo, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee, Qiang Ji
The objective is to make the two decoupled tasks assist each other and eventually improve the action anticipation task.
Ranked #1 on Action Anticipation on 50-Salads
1 code implementation • 22 Nov 2023 • Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
To interpret the important text evidence for question answering, we generalize the concept bottleneck model to work with tokens and nonlinear models, using hard attention to select a small subset of tokens from the free-form text as inputs to the LLM reasoner.
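A minimal sketch of hard (top-k) token selection before the LLM reasoner; the linear scoring head and the value of k are illustrative assumptions, not the paper's exact parameterization:

```python
import torch
import torch.nn as nn

def select_evidence_tokens(token_embeddings: torch.Tensor,
                           score_head: nn.Linear,
                           k: int = 16) -> torch.Tensor:
    """token_embeddings: (num_tokens, dim) -> indices of the selected tokens."""
    scores = score_head(token_embeddings).squeeze(-1)   # (num_tokens,)
    k = min(k, scores.numel())
    topk = torch.topk(scores, k=k)
    return torch.sort(topk.indices).values              # keep original text order
```

The selected indices would then slice the free-form text handed to the LLM reasoner, making the evidence it uses explicit.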
1 code implementation • 31 Oct 2023 • Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun
To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.
Ranked #4 on Long Term Action Anticipation on Ego4D
no code implementations • 9 Oct 2023 • Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang
For problems where the goal is to infer conclusions beyond the image content, which we term visual commonsense inference (VCI), VLMs face difficulties, while LLMs, given sufficient visual evidence, can use commonsense knowledge to infer the answer well.
1 code implementation • 31 Jul 2023 • Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.
Ranked #2 on Long Term Action Anticipation on Ego4D
no code implementations • CVPR 2023 • Hyung-gun Chi, Kwonjoon Lee, Nakul Agarwal, Yi Xu, Karthik Ramani, Chiho Choi
SALF is challenging because it requires understanding the underlying physics of video observations to predict future action locations accurately.
3 code implementations • ICLR 2022 • Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #76 on Image Generation on CIFAR-10
no code implementations • CVPR 2021 • Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu
Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging set-level fidelity for the reconstruction/synthesis), both being contradistinctive.
Ranked #2 on Image Generation on LSUN Bedroom 128 x 128
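A minimal sketch of combining the two terms, assuming `encoder`, `decoder`, and a `discriminator` with a `features()` method are defined elsewhere; the InfoNCE-style instance term and the non-saturating adversarial term are illustrative stand-ins, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def dcvae_losses(x, encoder, decoder, discriminator, temperature=0.1):
    mu, logvar = encoder(x)                                  # assumed interface
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization
    x_rec = decoder(z)

    # Instance-level term: each reconstruction should match its own input
    # more than any other sample in the batch (contrastive / InfoNCE style).
    f_x = discriminator.features(x)          # (B, D) feature embeddings
    f_rec = discriminator.features(x_rec)
    logits = f_rec @ f_x.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(x.size(0), device=x.device)
    instance_loss = F.cross_entropy(logits, labels)

    # Set-level adversarial term (non-saturating generator loss).
    adv_loss = F.binary_cross_entropy_with_logits(
        discriminator(x_rec), torch.ones(x.size(0), 1, device=x.device))

    # Standard VAE KL regularizer on the latent code.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return instance_loss, adv_loss, kl
```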
no code implementations • ICLR 2020 • Siyang Wang, Justin Lazarow, Kwonjoon Lee, Zhuowen Tu
We tackle the problem of modeling sequential visual phenomena.
1 code implementation • CVPR 2020 • Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu
Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.
Ranked #22 on Panoptic Segmentation on COCO test-dev
7 code implementations • CVPR 2019 • Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto
We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks.
Ranked #12 on Few-Shot Image Classification on FC100 5-way (1-shot)
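A minimal sketch of one simple closed-form, differentiable base learner of this kind (ridge regression on top of the embedding); the regularization strength and one-hot targets are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def ridge_base_learner(support_feats, support_labels, query_feats,
                       num_classes, lam=1.0):
    """support_feats: (Ns, D); support_labels: (Ns,); query_feats: (Nq, D)."""
    y = F.one_hot(support_labels, num_classes).float()       # (Ns, C)
    d = support_feats.size(1)
    # Closed-form ridge solution W = (X^T X + lam I)^{-1} X^T Y; everything is
    # differentiable w.r.t. the features, so the embedding can be meta-learned
    # end to end across few-shot episodes.
    gram = support_feats.t() @ support_feats + lam * torch.eye(
        d, device=support_feats.device)
    w = torch.linalg.solve(gram, support_feats.t() @ y)      # (D, C)
    return query_feats @ w                                    # query logits
```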
no code implementations • 6 Dec 2017 • Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.
1 code implementation • CVPR 2018 • Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.
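A minimal sketch of the introspective synthesis step behind such models, where pseudo-negative samples are refined by gradient ascent on the network's own score so that a single network both discriminates and generates; the step size, iteration count, and value clamping are illustrative choices, not the paper's settings:

```python
import torch

def synthesize_pseudo_negatives(net, shape, steps=50, step_size=0.05,
                                device="cpu"):
    """net maps an image batch to a per-sample 'realness' score."""
    x = torch.randn(shape, device=device, requires_grad=True)
    for _ in range(steps):
        score = net(x).sum()                 # higher score = more "real"
        grad, = torch.autograd.grad(score, x)
        with torch.no_grad():
            x += step_size * grad            # move samples toward the data manifold
            x.clamp_(-1.0, 1.0)              # keep pixels in a valid range
    return x.detach()
```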