Search Results for author: Long Zhao

Found 37 papers, 18 papers with code

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

no code implementations · 2 Sep 2023 · Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas

The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects.

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

1 code implementation · 22 Aug 2023 · Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng

To improve the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, fully leveraging semantic knowledge to improve video representations.

Multiview Learning · Video Recognition

Improving Pseudo Labels for Open-Vocabulary Object Detection

no code implementations · 11 Aug 2023 · Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

Second, a split-and-fusion (SAF) head is designed to remove the noise in the localization of pseudo labels (PLs), which existing methods usually ignore.

Object Detection · Open Vocabulary Object Detection

VideoGLUE: Video General Understanding Evaluation of Foundation Models

no code implementations · 6 Jul 2023 · Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models' video understanding capabilities using a carefully designed experimental protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods for tailoring a foundation model (FM) to a downstream task.

Action Recognition · Temporal Localization (+1 more)

Spatiotemporally Discriminative Video-Language Pre-Training with Text Grounding

no code implementations · 28 Mar 2023 · Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Most existing video-language pre-training methods focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information, which is important for downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition · Contrastive Learning (+6 more)

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation · 16 Mar 2023 · Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection · Relationship Detection (+2 more)

Steering Prototype with Prompt-tuning for Rehearsal-free Continual Learning

no code implementations · 16 Mar 2023 · Zhuowei Li, Long Zhao, Zizhao Zhang, Han Zhang, Di Liu, Ting Liu, Dimitris N. Metaxas

Prototypes, as representations of class embeddings, have been explored to reduce the memory footprint or mitigate forgetting in continual learning scenarios.

Class-Incremental Learning (+2 more)

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

1 code implementation · 20 Jul 2022 · Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

Despite the success of fully-supervised human skeleton sequence modeling, self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scale is difficult.

Action Detection · Action Recognition (+3 more)

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

1 code implementation · 18 Jul 2022 · Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.

Ranked #7 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Object Detection · Open Vocabulary Object Detection (+2 more)

Limits of Semistatic Trading Strategies

no code implementations · 26 Apr 2022 · Marcel Nutz, Johannes Wiesel, Long Zhao

We show that pointwise limits of semistatic trading strategies in discrete time are again semistatic strategies.

Martingale Schrödinger Bridges and Optimal Semistatic Portfolios

no code implementations · 26 Apr 2022 · Marcel Nutz, Johannes Wiesel, Long Zhao

In a two-period financial market where a stock is traded dynamically and European options at maturity are traded statically, we study the so-called martingale Schrödinger bridge Q*; that is, the minimal-entropy martingale measure among all models calibrated to option prices.

Are Multimodal Transformers Robust to Missing Modality?

no code implementations · CVPR 2022 · Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng

Based on these findings, we propose a principled method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy depending on the input data.

Global Matching with Overlapping Attention for Optical Flow Estimation

1 code implementation · CVPR 2022 · Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas

In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet.

Optical Flow Estimation · Regression

Min-Max Latency Optimization Based on Sensed Position State Information in Internet of Vehicles

no code implementations · 19 Mar 2022 · Pengzun Gao, Long Zhao, Kan Zheng, Pingzhi Fan

Dual-function radar communication (DFRC) is an essential technology in the Internet of Vehicles (IoV).

Out-of-Domain Generalization from a Single Source: An Uncertainty Quantification Approach

no code implementations · 5 Aug 2021 · Xi Peng, Fengchun Qiao, Long Zhao

We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while only a single domain is available for training.

Domain Generalization · Image Classification (+4 more)

Improved Transformer for High-Resolution GANs

1 code implementation · NeurIPS 2021 · Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but suffer from the quadratic complexity of the self-attention operation, making them difficult to adopt for high-resolution image generation based on Generative Adversarial Networks (GANs).

Ranked #2 on Image Generation on CelebA 256x256 (FID metric)

Image Generation · Vocal Bursts Intensity Prediction

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

6 code implementations · 26 May 2021 · Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister

Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well.

Image Classification · Image Generation

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

no code implementations · 20 May 2021 · Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas

Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to their capability of learning fine-grained relevance across different modalities.

Contrastive Learning · Image Captioning (+4 more)

SMIL: Multimodal Learning with Severely Missing Modality

1 code implementation · 9 Mar 2021 · Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng

A common assumption in multimodal learning is the completeness of training data, i.e., all modalities are available in all training examples.

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

1 code implementation · CVPR 2021 · Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu

To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.

Action Recognition · Contrastive Learning (+1 more)

Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge

no code implementations · CVPR 2020 · Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas

Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student.

3D Hand Pose Estimation · Knowledge Distillation

Learning to Learn Single Domain Generalization

1 code implementation · CVPR 2020 · Fengchun Qiao, Long Zhao, Xi Peng

We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while only a single domain is available for training.

Domain Generalization · Meta-Learning

Short-term Road Traffic Prediction based on Deep Cluster at Large-scale Networks

no code implementations · 25 Feb 2019 · Lingyi Han, Kan Zheng, Long Zhao, Xianbin Wang, Xuemin Shen

Therefore, this paper develops a framework that incorporates a deep clustering (DeepCluster) module for STTP at large-scale networks.

Clustering · Deep Clustering (+3 more)

A Driving Intention Prediction Method Based on Hidden Markov Model for Autonomous Driving

no code implementations · 25 Feb 2019 · Shiwen Liu, Kan Zheng, Long Zhao, Pingzhi Fan

Experimental results show that HMMs trained with the continuous characterization of mobility features achieve higher accuracy when predicting driving intentions.

Autonomous Driving

Learning to Forecast and Refine Residual Motion for Image-to-Video Generation

1 code implementation · ECCV 2018 · Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas

We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object.

Human Pose Forecasting · Image-to-Video Generation (+1 more)

CR-GAN: Learning Complete Representations for Multi-view Generation

1 code implementation · 28 Jun 2018 · Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris N. Metaxas

Generating multi-view images from a single-view input is an essential yet challenging problem.

Self-Supervised Learning

Cartoonish sketch-based face editing in videos using identity deformation transfer

no code implementations · 25 Mar 2017 · Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas

We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame.

Face Model

Object Proposal by Multi-Branch Hierarchical Segmentation

no code implementations · CVPR 2015 · Chaoyang Wang, Long Zhao, Shuang Liang, Liqing Zhang, Jinyuan Jia, Yichen Wei

Hierarchical segmentation-based object proposal methods have become an important step in the modern object detection paradigm.

Object Detection
