no code implementations • 5 Oct 2024 • Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu
We hope this work offers new insights into integrating iterative generation and autoencoding for improved compression and generation.
no code implementations • 18 Jul 2024 • Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher
In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding.
no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.
no code implementations • CVPR 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
1 code implementation • CVPR 2024 • Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter
The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations.
no code implementations • 22 Oct 2023 • Marcel Nutz, Kevin Webster, Long Zhao
We study how to unwind stochastic order flow with minimal transaction costs.
no code implementations • 2 Sep 2023 • Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas
The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects.
1 code implementation • ICCV 2023 • Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng
To facilitate the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, to fully leverage the semantic knowledge to improve video representations.
2 code implementations • CVPR 2024 • Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas
This work identifies two challenges of using self-training in open-vocabulary detection (OVD): noisy pseudo labels (PLs) from vision and language models (VLMs) and frequent distribution changes of PLs.
1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring an FM for downstream tasks.
no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan
Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.
1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).
2 code implementations • 16 Mar 2023 • Zhuowei Li, Long Zhao, Zizhao Zhang, Han Zhang, Di Liu, Ting Liu, Dimitris N. Metaxas
In the context of continual learning, prototypes, as representative class embeddings, offer advantages in memory conservation and the mitigation of catastrophic forgetting.
1 code implementation • 20 Jul 2022 • Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas
Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult.
1 code implementation • 18 Jul 2022 • Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.
Ranked #19 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
no code implementations • 26 Apr 2022 • Marcel Nutz, Johannes Wiesel, Long Zhao
We show that pointwise limits of semistatic trading strategies in discrete time are again semistatic strategies.
no code implementations • 26 Apr 2022 • Marcel Nutz, Johannes Wiesel, Long Zhao
In a two-period financial market where a stock is traded dynamically and European options at maturity are traded statically, we study the so-called martingale Schrödinger bridge Q*; that is, the minimal-entropy martingale measure among all models calibrated to option prices.
no code implementations • CVPR 2022 • Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng
Based on these findings, we propose a principled method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding input data.
1 code implementation • CVPR 2022 • Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas
In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet.
Ranked #4 on Optical Flow Estimation on KITTI 2015
no code implementations • 19 Mar 2022 • Pengzun Gao, Long Zhao, Kan Zheng, Pingzhi Fan
Dual-function radar communication (DFRC) is an essential technology in the Internet of Vehicles (IoV).
1 code implementation • 11 Dec 2021 • Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf
Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects.
Ranked #2 on Group Activity Recognition on Collective Activity
no code implementations • 5 Aug 2021 • Xi Peng, Fengchun Qiao, Long Zhao
We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.
1 code implementation • NeurIPS 2021 • Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang
Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but suffer from the quadratic complexity of the self-attention operation, making them difficult to adopt for high-resolution image generation based on Generative Adversarial Networks (GANs).
Ranked #2 on Image Generation on CelebA 256x256 (FID metric)
6 code implementations • 26 May 2021 • Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister
Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well.
Ranked #89 on Image Classification on CIFAR-10
no code implementations • 20 May 2021 • Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas
Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to their capability of learning fine-grained relevance across different modalities.
1 code implementation • 9 Mar 2021 • Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng
A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples.
no code implementations • 1 Feb 2021 • WeiJie Chen, Yilu Guo, Shicai Yang, Zhaoyang Li, Zhenxin Ma, Binbin Chen, Long Zhao, Di Xie, ShiLiang Pu, Yueting Zhuang
Therefore, we turn our attention to suppressing false positives in each target domain in an unsupervised way.
1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu
To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.
2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam
Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.
1 code implementation • NeurIPS 2020 • Long Zhao, Ting Liu, Xi Peng, Dimitris Metaxas
In this paper, we propose a novel and effective regularization term for adversarial data augmentation.
no code implementations • 10 Aug 2020 • Kuan Fang, Long Zhao, Zhan Shen, RuiXing Wang, RiKang Zhour, LiWen Fan
Search engines have become a fundamental component in various web and mobile applications.
no code implementations • CVPR 2020 • Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas
Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student.
1 code implementation • CVPR 2020 • Fengchun Qiao, Long Zhao, Xi Peng
We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.
1 code implementation • NeurIPS 2019 • Yu Tian, Long Zhao, Xi Peng, Dimitris N. Metaxas
Graph kernels are kernel methods measuring graph similarity and serve as a standard tool for graph classification.
Ranked #8 on Link Prediction on Cora
1 code implementation • 20 Jul 2019 • Yuxiao Chen, Long Zhao, Xi Peng, Jianbo Yuan, Dimitris N. Metaxas
We propose a Dynamic Graph-Based Spatial-Temporal Attention (DG-STA) method for hand gesture recognition.
Ranked #4 on Hand Gesture Recognition on SHREC 2017
5 code implementations • CVPR 2019 • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris N. Metaxas
In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression.
Ranked #26 on Monocular 3D Human Pose Estimation on Human3.6M
no code implementations • 25 Feb 2019 • Shiwen Liu, Kan Zheng, Long Zhao, Pingzhi Fan
Experimental results show that the HMMs trained with the continuous characterization of mobility features can give a higher prediction accuracy when they are used for predicting driving intentions.
no code implementations • 25 Feb 2019 • Lingyi Han, Kan Zheng, Long Zhao, Xianbin Wang, Xuemin Shen
Therefore, a framework combined with a deep clustering (DeepCluster) module is developed in this paper for STTP in large-scale networks.
1 code implementation • ECCV 2018 • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas
We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object.
1 code implementation • 28 Jun 2018 • Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris N. Metaxas
Generating multi-view images from a single-view input is an essential yet challenging problem.
no code implementations • 25 Mar 2017 • Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas
We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame.
no code implementations • 3 Mar 2017 • Dingwen Zhang, Deyu Meng, Long Zhao, Junwei Han
Weakly-supervised object detection (WOD) is a challenging problem in computer vision.
Ranked #40 on Weakly Supervised Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2015 • Chaoyang Wang, Long Zhao, Shuang Liang, Liqing Zhang, Jinyuan Jia, Yichen Wei
Hierarchical segmentation-based object proposal methods have become an important step in the modern object detection paradigm.