Search Results for author: Kwonjoon Lee

Found 16 papers, 8 papers with code

M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

no code implementations · 19 Jul 2024 · Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, Kwonjoon Lee

We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models.

Denoising · Motion Generation
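
The excerpt above does not spell out the diffusion formulation; as a rough, non-authoritative sketch of the absorbing-state discrete diffusion idea it builds on, the following masks motion tokens as the forward process (the vocabulary size, schedule, and [MASK] token are illustrative assumptions, not the paper's choices).

```python
import numpy as np

# Hypothetical setup: a motion clip is a sequence of discrete motion tokens
# 0..VOCAB_SIZE-1, plus one extra absorbing [MASK] state, as in masked
# (absorbing-state) discrete diffusion.
VOCAB_SIZE = 512
MASK_ID = VOCAB_SIZE     # index reserved for the mask state
NUM_STEPS = 100          # number of diffusion steps T

def mask_prob(t: int, T: int = NUM_STEPS) -> float:
    """Linear schedule: fraction of tokens absorbed into [MASK] by step t."""
    return t / T

def q_sample(x0: np.ndarray, t: int, rng: np.random.Generator) -> np.ndarray:
    """Forward (noising) step: each token is independently replaced by [MASK]
    with probability mask_prob(t); a denoiser conditioned on the text would
    be trained to invert this corruption."""
    keep = rng.random(x0.shape) >= mask_prob(t)
    return np.where(keep, x0, MASK_ID)

rng = np.random.default_rng(0)
x0 = rng.integers(0, VOCAB_SIZE, size=32)   # toy token sequence for two actions
xt = q_sample(x0, t=60, rng=rng)            # roughly 60% of tokens masked
```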

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

1 code implementation · 14 Jul 2024 · Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo

In the induction stage, the LLM is fed with few-shot normal reference samples and then summarizes these normal patterns to induce a set of rules for detecting anomalies.

Anomaly Detection · Video Anomaly Detection
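
A minimal sketch of the induction stage described above, with `call_llm` as a hypothetical stand-in for whatever LLM client is actually used; the prompt wording is an assumption, not the paper's.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical helper: replace with your LLM client of choice.
    raise NotImplementedError("plug in your LLM client here")

def induce_rules(normal_samples: list[str]) -> str:
    """Induction stage: summarize few-shot normal references into rules."""
    examples = "\n".join(f"- {s}" for s in normal_samples)
    prompt = (
        "The following are descriptions of NORMAL frames from a video:\n"
        f"{examples}\n\n"
        "Summarize the normal patterns, then write a numbered list of rules "
        "whose violation would mark a frame as ANOMALOUS."
    )
    return call_llm(prompt)

# Usage (once call_llm is wired up):
# rules = induce_rules(["pedestrians walking on the sidewalk",
#                       "cars waiting at the traffic light"])
```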

Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

no code implementations · CVPR 2024 · Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee

To address this limitation, we explore the generative capability of a large video-language model and further develop an understanding of plausibility in an action sequence by introducing two objective functions: a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss.

Action Anticipation · counterfactual +2
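
The excerpt does not define the two losses; the sketch below shows one plausible reading of the long-horizon action repetition loss, penalizing probability mass re-assigned to the previously predicted action (the exact formulation is an assumption, not the paper's).

```python
import torch

def repetition_loss(logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical long-horizon repetition penalty.

    logits: (T, num_actions) predicted scores for T future steps.
    Penalizes the probability each step assigns to the action predicted at
    the previous step, discouraging degenerate repeated-action rollouts.
    """
    probs = logits.softmax(dim=-1)                              # (T, A)
    prev_pred = probs[:-1].argmax(dim=-1)                       # (T-1,)
    repeat_prob = probs[1:].gather(1, prev_pred.unsqueeze(1))   # (T-1, 1)
    return repeat_prob.mean()

# Usage: total_loss = sequence_loss + lambda_rep * repetition_loss(future_logits)
```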

Vamos: Versatile Action Models for Video Understanding

1 code implementation · 22 Nov 2023 · Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

To interpret the important text evidence for question answering, we generalize the concept bottleneck model to work with tokens and nonlinear models, using hard attention to select a small subset of tokens from the free-form text as inputs to the LLM reasoner.

Hard Attention · Language Modelling +3
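
A minimal sketch of hard top-k token selection in the spirit of the token-level concept bottleneck described above; the linear scorer and the straight-through estimator are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class HardTokenSelector(nn.Module):
    """Score free-form text tokens and keep only the top-k as LLM inputs.

    A straight-through estimator lets gradients reach the scorer even
    though the selection itself is discrete.
    """
    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)
        self.k = k

    def forward(self, tok_emb: torch.Tensor):
        # tok_emb: (num_tokens, dim) embeddings of the free-form text tokens
        scores = self.scorer(tok_emb).squeeze(-1)            # (num_tokens,)
        soft = scores.softmax(dim=-1)
        topk = scores.topk(self.k).indices
        hard = torch.zeros_like(soft).scatter(0, topk, 1.0)  # 0/1 selection mask
        mask = hard + soft - soft.detach()                   # straight-through
        return topk, tok_emb * mask.unsqueeze(-1)
```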

Object-centric Video Representation for Long-term Action Anticipation

1 code implementation · 31 Oct 2023 · Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun

To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.

Action Anticipation · Human-Object Interaction Detection +4
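
The "retrieval" of relevant objects can be pictured as cross-attention from anticipation queries to object features; the sketch below uses PyTorch's MultiheadAttention, with the dimensions and query design as assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

dim = 256
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

# Hypothetical inputs: one anticipation query per future time scale,
# and object-centric features detected in the observed video.
queries = torch.randn(1, 4, dim)      # (batch, num_time_scales, dim)
objects = torch.randn(1, 50, dim)     # (batch, num_object_tokens, dim)

# Cross-attention "retrieves" the objects most relevant to each query.
retrieved, weights = attn(query=queries, key=objects, value=objects)
print(retrieved.shape, weights.shape)  # (1, 4, 256), (1, 4, 50)
```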

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

no code implementations · 9 Oct 2023 · Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang

For problems where the goal is to infer conclusions beyond the image content, which we term visual commonsense inference (VCI), VLMs face difficulties, while LLMs, given sufficient visual evidence, can use commonsense to infer the answer well.

Image Captioning · Visual Commonsense Reasoning

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

1 code implementation · 31 Jul 2023 · Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.

Action Anticipation · counterfactual +1
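
A rough sketch of the bottom-up perspective described above: rolling out future actions autoregressively from recognized past actions with an LLM. `call_llm` is a hypothetical stand-in and the prompt format is an assumption, not the paper's interface.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical helper: replace with your LLM client of choice.
    raise NotImplementedError("plug in your LLM client here")

def bottom_up_anticipate(observed_actions: list[str], horizon: int = 5) -> list[str]:
    """Autoregressively roll out the next `horizon` actions, feeding each
    prediction back into the context (modeling temporal dynamics)."""
    history = list(observed_actions)
    for _ in range(horizon):
        prompt = (
            "Actions so far: " + ", ".join(history) + ".\n"
            "Predict the single most likely next action (verb + noun):"
        )
        history.append(call_llm(prompt).strip())
    return history[len(observed_actions):]

# Usage (once call_llm is wired up):
# future = bottom_up_anticipate(["crack egg", "whisk egg"], horizon=3)
```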

AdamsFormer for Spatial Action Localization in the Future

no code implementations · CVPR 2023 · Hyung-gun Chi, Kwonjoon Lee, Nakul Agarwal, Yi Xu, Karthik Ramani, Chiho Choi

Spatial action localization in the future (SALF) is challenging because it requires understanding the underlying physics of video observations to predict future action locations accurately.

Action Localization

ViTGAN: Training GANs with Vision Transformers

3 code implementations · ICLR 2022 · Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.

Image Generation

Dual Contradistinctive Generative Autoencoder

no code implementations · CVPR 2021 · Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu

Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging set-level fidelity for the reconstruction/synthesis), both being contradistinctive.

Image Generation · Image Reconstruction +1
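
The precise losses are defined in the paper; as a loose illustration only, a training objective combining a per-instance fidelity term with a set-level adversarial term might look like the following (the weights and the plain GAN loss form are assumptions).

```python
import torch
import torch.nn.functional as F

def dcvae_style_loss(x, x_rec, feat, feat_rec, d_fake, w_inst=1.0, w_adv=0.1):
    """Loose sketch: instance-level fidelity term + set-level adversarial term.

    x, x_rec       : images and their reconstructions
    feat, feat_rec : embeddings of x and x_rec (instance-level matching)
    d_fake         : discriminator logits on the reconstructions
    """
    instance_term = F.mse_loss(x_rec, x) + F.mse_loss(feat_rec, feat)
    adversarial_term = F.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake))   # generator/decoder side of the GAN loss
    return w_inst * instance_term + w_adv * adversarial_term
```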

Learning Instance Occlusion for Panoptic Segmentation

1 code implementation · CVPR 2020 · Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu

Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.

Instance Segmentation · Panoptic Segmentation +2

Meta-Learning with Differentiable Convex Optimization

7 code implementations · CVPR 2019 · Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto

We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks.

Few-Shot Image Classification · Few-Shot Learning
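
The paper's base learner is a multi-class SVM solved with a differentiable QP; as a simpler stand-in with the same structure (a convex learner whose closed-form solution is differentiable with respect to the features), the sketch below fits a ridge-regression classifier on the support set of each episode.

```python
import torch

def ridge_base_learner(support_feat, support_onehot, query_feat, lam=1.0):
    """Closed-form ridge regression as a differentiable base learner.

    support_feat   : (n_support, d) embedded support examples
    support_onehot : (n_support, n_way) one-hot support labels
    query_feat     : (n_query, d) embedded query examples
    Returns query logits; gradients flow through the solve back into the
    embedding network, mirroring the differentiable-optimization idea.
    """
    d = support_feat.shape[1]
    gram = support_feat.t() @ support_feat + lam * torch.eye(d, device=support_feat.device)
    W = torch.linalg.solve(gram, support_feat.t() @ support_onehot)   # (d, n_way)
    return query_feat @ W
```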

Controllable Top-down Feature Transformer

no code implementations · 6 Dec 2017 · Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.

Data Augmentation · Style Transfer

Wasserstein Introspective Neural Networks

1 code implementation · CVPR 2018 · Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu

We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.

General Classification
