Search Results for author: Long Zhao

Found 43 papers, 22 papers with code

$ε$-VAE: Denoising as Visual Decoding

no code implementations 5 Oct 2024 Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu

We hope this work offers new insights into integrating iterative generation and autoencoding for improved compression and generation.

Decoder Denoising

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

no code implementations 18 Jul 2024 Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding.

3D Semantic Segmentation Visual Grounding

Generating Enhanced Negatives for Training Language-Based Object Detectors

1 code implementation CVPR 2024 Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations.

Object object-detection +1

Unwinding Stochastic Order Flow: When to Warehouse Trades

no code implementations 22 Oct 2023 Marcel Nutz, Kevin Webster, Long Zhao

We study how to unwind stochastic order flow with minimal transaction costs.

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

no code implementations 2 Sep 2023 Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas

The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects.

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

1 code implementation ICCV 2023 Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng

To facilitate the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, to fully leverage the semantic knowledge to improve video representations.

Multiview Learning Video Recognition

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation 6 Jul 2023 Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring an FM for downstream tasks.

Action Recognition Temporal Localization +1

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations 28 Mar 2023 Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation ICCV 2023 Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

Steering Prototypes with Prompt-tuning for Rehearsal-free Continual Learning

2 code implementations 16 Mar 2023 Zhuowei Li, Long Zhao, Zizhao Zhang, Han Zhang, Di Liu, Ting Liu, Dimitris N. Metaxas

In the context of continual learning, prototypes, as representative class embeddings, offer advantages in memory conservation and the mitigation of catastrophic forgetting.

class-incremental learning Class Incremental Learning +2

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

1 code implementation 20 Jul 2022 Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult.

Action Detection Action Recognition +3

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

1 code implementation 18 Jul 2022 Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.

Ranked #19 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Object object-detection +3

Limits of Semistatic Trading Strategies

no code implementations 26 Apr 2022 Marcel Nutz, Johannes Wiesel, Long Zhao

We show that pointwise limits of semistatic trading strategies in discrete time are again semistatic strategies.

Martingale Schrödinger Bridges and Optimal Semistatic Portfolios

no code implementations 26 Apr 2022 Marcel Nutz, Johannes Wiesel, Long Zhao

In a two-period financial market where a stock is traded dynamically and European options at maturity are traded statically, we study the so-called martingale Schrödinger bridge Q*; that is, the minimal-entropy martingale measure among all models calibrated to option prices.

Position

Are Multimodal Transformers Robust to Missing Modality?

no code implementations CVPR 2022 Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng

Based on these findings, we propose a principled method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding input data.

Global Matching with Overlapping Attention for Optical Flow Estimation

1 code implementation CVPR 2022 Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas

In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet.

Optical Flow Estimation regression

Min-Max Latency Optimization Based on Sensed Position State Information in Internet of Vehicles

no code implementations 19 Mar 2022 Pengzun Gao, Long Zhao, Kan Zheng, Pingzhi Fan

The dual-function radar communication (DFRC) is an essential technology in Internet of Vehicles (IoV).

Position

Out-of-Domain Generalization from a Single Source: An Uncertainty Quantification Approach

no code implementations 5 Aug 2021 Xi Peng, Fengchun Qiao, Long Zhao

We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.

Domain Generalization Image Classification +5

Improved Transformer for High-Resolution GANs

1 code implementation NeurIPS 2021 Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but suffer from the quadratic complexity of the self-attention operation, making them difficult to adopt for high-resolution image generation based on Generative Adversarial Networks (GANs).

Ranked #2 on Image Generation on CelebA 256x256 (FID metric)

Image Generation Vocal Bursts Intensity Prediction

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

6 code implementations 26 May 2021 Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister

Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well.

Decoder Image Classification +1

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

no code implementations 20 May 2021 Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas

Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to their capability of learning fine-grained relevance across different modalities.

Contrastive Learning Image Captioning +4

SMIL: Multimodal Learning with Severely Missing Modality

1 code implementation 9 Mar 2021 Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng

A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples.

Meta-Learning

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

1 code implementation CVPR 2021 Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu

To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.

Action Recognition Contrastive Learning +1

Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge

no code implementations CVPR 2020 Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas

Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student.

3D Hand Pose Estimation Knowledge Distillation

Learning to Learn Single Domain Generalization

1 code implementation CVPR 2020 Fengchun Qiao, Long Zhao, Xi Peng

We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.

Domain Generalization Meta-Learning

A Driving Intention Prediction Method Based on Hidden Markov Model for Autonomous Driving

no code implementations 25 Feb 2019 Shiwen Liu, Kan Zheng, Long Zhao, Pingzhi Fan

Experimental results show that the HMMs trained with the continuous characterization of mobility features can give a higher prediction accuracy when they are used for predicting driving intentions.

Autonomous Driving

Short-term Road Traffic Prediction based on Deep Cluster at Large-scale Networks

no code implementations 25 Feb 2019 Lingyi Han, Kan Zheng, Long Zhao, Xianbin Wang, Xuemin Shen

Therefore, a framework incorporating a deep clustering (DeepCluster) module is developed in this paper for STTP in large-scale networks.

Clustering Deep Clustering +4

Learning to Forecast and Refine Residual Motion for Image-to-Video Generation

1 code implementation ECCV 2018 Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas

We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object.

Human Pose Forecasting Image to Video Generation +1

Cartoonish sketch-based face editing in videos using identity deformation transfer

no code implementations 25 Mar 2017 Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas

We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame.

Face Model

Object Proposal by Multi-Branch Hierarchical Segmentation

no code implementations CVPR 2015 Chaoyang Wang, Long Zhao, Shuang Liang, Liqing Zhang, Jinyuan Jia, Yichen Wei

Hierarchical-segmentation-based object proposal methods have become an important step in the modern object detection paradigm.

Object object-detection +2
