Search Results for author: Kwonjoon Lee

Found 12 papers, 5 papers with code

Vamos: Versatile Action Models for Video Understanding

no code implementations · 22 Nov 2023 · Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

What makes good video representations for video understanding, such as anticipating future activities or answering video-conditioned questions?

Language Modelling · Large Language Model · +2

Object-centric Video Representation for Long-term Action Anticipation

1 code implementation · 31 Oct 2023 · Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun

To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.

Action Anticipation · Human-Object Interaction Detection · +4
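The "retrieval" described above maps naturally onto cross-attention: action queries attend over detected object features. A minimal PyTorch sketch of that idea, with all shapes and names hypothetical (this is not the paper's released architecture):

```python
import torch
import torch.nn as nn

class ObjectRetrievalBlock(nn.Module):
    """Hypothetical cross-attention block: action queries attend over
    per-frame object features to 'retrieve' relevant objects."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, action_queries, object_feats):
        # action_queries: (B, num_queries, dim) -- one query per anticipation horizon
        # object_feats:   (B, num_objects, dim) -- detected object embeddings
        retrieved, attn_weights = self.attn(action_queries, object_feats, object_feats)
        return self.norm(action_queries + retrieved), attn_weights

# toy usage: 4 anticipation time scales attending over 20 objects
block = ObjectRetrievalBlock()
queries, objects = torch.randn(2, 4, 256), torch.randn(2, 20, 256)
out, weights = block(queries, objects)
print(out.shape, weights.shape)  # (2, 4, 256) and (2, 4, 20)
```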

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

no code implementations · 31 Jul 2023 · Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.

Action Anticipation · counterfactual · +1
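The two formulations can be sketched as plain control flow: the bottom-up path rolls a next-action predictor forward autoregressively, while the top-down path first names a goal and then plans toward it. A toy Python sketch, with `next_action`, `infer_goal`, and `plan` as hypothetical stand-ins for the paper's LLM-backed predictors:

```python
from typing import Callable, List

def bottom_up_lta(observed: List[str], next_action: Callable[[List[str]], str],
                  horizon: int) -> List[str]:
    """Bottom-up: autoregressively roll out future actions from the history."""
    history, future = list(observed), []
    for _ in range(horizon):
        a = next_action(history)  # model predicts one step ahead
        future.append(a)
        history.append(a)         # feed the prediction back in
    return future

def top_down_lta(observed, infer_goal, plan, horizon):
    """Top-down: first infer the actor's goal, then plan toward it."""
    goal = infer_goal(observed)           # e.g., an LLM names the likely goal
    return plan(observed, goal, horizon)  # procedure to accomplish the goal

# toy stand-in for a next-action predictor
demo_next = lambda h: {"crack egg": "whisk egg", "whisk egg": "pour egg"}.get(h[-1], "stir")
print(bottom_up_lta(["crack egg"], demo_next, horizon=3))
# ['whisk egg', 'pour egg', 'stir']
```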

AdamsFormer for Spatial Action Localization in the Future

no code implementations · CVPR 2023 · Hyung-gun Chi, Kwonjoon Lee, Nakul Agarwal, Yi Xu, Karthik Ramani, Chiho Choi

Spatial action localization in the future (SALF) is challenging because it requires understanding the underlying physics of video observations to predict future action locations accurately.

Action Localization
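The name suggests an Adams-style numerical predictor for extrapolating dynamics. Purely as an illustration (the paper's actual architecture is not shown here), a two-step Adams-Bashforth update extrapolates a state, such as a predicted box center, from its recent time derivatives:

```python
import numpy as np

def adams_bashforth2(z, f_prev, f_curr, dt):
    """Two-step Adams-Bashforth update: advance state z one step using its
    current and previous time derivatives."""
    return z + dt * (1.5 * f_curr - 0.5 * f_prev)

# toy example: extrapolate a 2-D box center moving with slowly changing velocity
z = np.array([10.0, 5.0])      # current center (x, y)
f_prev = np.array([1.0, 0.4])  # velocity estimate at t-1
f_curr = np.array([1.2, 0.5])  # velocity estimate at t
print(adams_bashforth2(z, f_prev, f_curr, dt=1.0))  # predicted center at t+1
```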

ViTGAN: Training GANs with Vision Transformers

3 code implementations · ICLR 2022 · Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.

Image Generation

Dual Contradistinctive Generative Autoencoder

no code implementations · CVPR 2021 · Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu

Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.

Image Generation · Image Reconstruction · +1
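The interplay of the two losses can be sketched schematically: the instance-level term is contrastive (each reconstruction must match its own input against the rest of the batch), while the set-level term is a standard adversarial loss. A simplified PyTorch sketch with a toy discriminator (`TinyDisc` and all shapes are hypothetical; this is not the authors' exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDisc(nn.Module):
    """Toy discriminator with a feature head and a real/fake logit head
    (hypothetical stand-in for a convolutional discriminator)."""
    def __init__(self, in_dim=784, feat_dim=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 1)

    def features(self, x):
        return self.body(x.flatten(1))

    def logit(self, x):
        return self.head(self.features(x))

def dc_vae_losses(x, x_rec, disc, temperature=0.1):
    """Schematic dual losses: an instance-level contrastive term tying each
    reconstruction to its own input, plus a set-level adversarial term."""
    B = x.size(0)
    h_x = F.normalize(disc.features(x), dim=1)       # (B, D) input features
    h_r = F.normalize(disc.features(x_rec), dim=1)   # (B, D) recon features
    logits = h_r @ h_x.t() / temperature             # (B, B) similarities
    inst = F.cross_entropy(logits, torch.arange(B))  # i-th recon matches i-th input
    adv = F.binary_cross_entropy_with_logits(
        disc.logit(x_rec), torch.ones(B, 1))         # recons should look real
    return inst, adv

disc = TinyDisc()
x, x_rec = torch.randn(8, 784), torch.randn(8, 784)
print(dc_vae_losses(x, x_rec, disc))
```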

Learning Instance Occlusion for Panoptic Segmentation

1 code implementation · CVPR 2020 · Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu

Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.

Instance Segmentation · Panoptic Segmentation · +2
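The merging step that instance occlusion informs can be pictured as a greedy paster that resolves overlaps between instance masks via pairwise occlusion decisions. A toy NumPy sketch, where `occludes(i, j)` stands in for the paper's learned occlusion head:

```python
import numpy as np

def merge_instances(masks, scores, occludes):
    """Greedy panoptic paste: higher-score instances go first; where two masks
    overlap, occludes(i, j) decides which instance stays on top."""
    H, W = masks[0].shape
    canvas = np.full((H, W), -1, dtype=int)   # -1 = unassigned (later: stuff)
    for i in np.argsort(scores)[::-1]:        # paste in descending confidence
        region = masks[i] & (canvas == -1)
        # contested pixels: already owned by j, but i is judged to occlude j
        for j in np.unique(canvas[masks[i] & (canvas >= 0)]):
            if occludes(i, int(j)):
                region |= masks[i] & (canvas == j)
        canvas[region] = i
    return canvas

# toy example: two overlapping squares; instance 1 occludes instance 0
m0 = np.zeros((6, 6), bool); m0[1:4, 1:4] = True
m1 = np.zeros((6, 6), bool); m1[2:5, 2:5] = True
print(merge_instances([m0, m1], scores=[0.9, 0.8], occludes=lambda i, j: i == 1))
```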

Meta-Learning with Differentiable Convex Optimization

7 code implementations · CVPR 2019 · Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto

We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks.

Few-Shot Image Classification · Few-Shot Learning
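The paper's base learners are linear SVMs solved with a differentiable QP solver; ridge regression is a closely related convex base learner whose closed form makes the end-to-end idea easy to see. A simplified sketch (ridge in place of the paper's SVM) showing gradients flowing through the solver back into the embedding:

```python
import torch

def ridge_base_learner(support_x, support_y, num_classes, lam=1.0):
    """Closed-form ridge regression on the support set; every step is a
    differentiable torch op, so gradients reach the embedding network."""
    Y = torch.nn.functional.one_hot(support_y, num_classes).float()  # (N, C)
    X = support_x                                                    # (N, D)
    A = X.t() @ X + lam * torch.eye(X.size(1))                       # (D, D)
    return torch.linalg.solve(A, X.t() @ Y)                          # (D, C)

# episode: 5-way 1-shot with a (toy) 16-dim learnable embedding
emb = torch.randn(5, 16, requires_grad=True)   # embedded support set
W = ridge_base_learner(emb, torch.arange(5), num_classes=5)
query = torch.randn(3, 16)
meta_loss = torch.nn.functional.cross_entropy(query @ W, torch.tensor([0, 1, 2]))
meta_loss.backward()                           # gradients flow through the solver
print(emb.grad.shape)  # torch.Size([5, 16])
```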

Controllable Top-down Feature Transformer

no code implementations · 6 Dec 2017 · Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.

Data Augmentation · Style Transfer

Wasserstein Introspective Neural Networks

1 code implementation · CVPR 2018 · Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu

We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.

General Classification
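Introspective networks use the classifier generatively: pseudo-negative samples are synthesized by gradient ascent of the discriminator's score with respect to the input itself. A minimal sketch of that sampling step with a toy discriminator (the Wasserstein objective and the Langevin noise term are omitted for brevity):

```python
import torch
import torch.nn as nn

def introspective_sample(disc, x, steps=30, step_size=0.1):
    """Turn the discriminator into a generator: ascend its real/fake score
    with respect to the input image itself."""
    x = x.clone().requires_grad_(True)
    for _ in range(steps):
        score = disc(x).sum()
        grad, = torch.autograd.grad(score, x)
        x = (x + step_size * grad).detach().requires_grad_(True)
    return x.detach()

# toy discriminator and a batch of noise images to refine
disc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))
samples = introspective_sample(disc, torch.randn(4, 1, 28, 28))
print(samples.shape)  # torch.Size([4, 1, 28, 28])
```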
