Search Results for author: Liangzhe Yuan

Found 18 papers, 13 papers with code

EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras

2 code implementations • 19 Feb 2018 • Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis

Event-based cameras have shown great promise in a variety of situations where frame based cameras suffer, such as high speed motions and high dynamic range scenes.

Optical Flow Estimation

162

Paper
Code

Zoom-In-to-Check: Boosting Video Interpolation via Instance-level Discrimination

no code implementations • CVPR 2019 • Liangzhe Yuan, Yibo Chen, Hantian Liu, Tao Kong, Jianbo Shi

We propose a light-weight video frame interpolation algorithm.

Image Generation object-detection +3

Paper
Add Code

Unsupervised Event-based Learning of Optical Flow, Depth, and Egomotion

2 code implementations • CVPR 2019 • Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis

In this work, we propose a novel framework for unsupervised learning for event cameras that learns motion information from only the event stream.

Optical Flow Estimation

110

Paper
Code

View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose

2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam

Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.

3D Pose Estimation Action Recognition +2

32,904

Paper
Code

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu

To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.

Action Recognition Contrastive Learning +1

32,904

Paper
Code

MoViNets: Mobile Video Networks for Efficient Video Recognition

3 code implementations • CVPR 2021 • Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.

Ranked #3 on Action Classification on Charades

Action Classification Action Recognition +4

76,628

Paper
Code

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations • NeurIPS 2021 • Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #3 on Zero-Shot Video Retrieval on YouCook2 (text-to-video Mean Rank metric)

Action Classification Action Recognition In Videos +9

32,907

Paper
Code

DeepLab2: A TensorFlow Library for Deep Labeling

4 code implementations • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.

990

Paper
Code

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations • 8 Dec 2021 • Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning

Paper
Add Code

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.

Action Recognition Contrastive Learning +4

76,628

Paper
Code

Surrogate Gap Minimization Improves Sharpness-Aware Training

1 code implementation • ICLR 2022 • Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu

Instead, we define a \textit{surrogate gap}, a measure equivalent to the dominant eigenvalue of Hessian at a local minimum when the radius of the neighborhood (to derive the perturbed loss) is small.

9,330

Paper
Code

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

3,022

Paper
Code

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

Paper
Add Code

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

76,628

Paper
Code

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

1 code implementation • ICCV 2023 • Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng

To facilitate the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, to fully leverage the semantic knowledge to improve video representations.

Multiview Learning Video Recognition

Paper
Code

PolyMaX: General Dense Prediction with Mask Transformer

1 code implementation • 9 Nov 2023 • Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen

Despite this shift, methods based on the per-pixel prediction paradigm still dominate the benchmarks on the other dense prediction tasks that require continuous outputs, such as depth estimation and surface normal prediction.

Ranked #2 on Surface Normals Estimation on NYU Depth v2