no code implementations • 2 Sep 2023 • Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas
The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects.
1 code implementation • 22 Aug 2023 • Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng
To facilitate the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, to fully leverage the semantic knowledge to improve video representations.
no code implementations • 11 Aug 2023 • Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas
Second, a split-and-fusion (SAF) head is designed to remove the noise in localization of PLs, which is usually ignored in existing methods.
no code implementations • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.
no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan
Most existing video-language pre-training methods focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information, which is important to downstream tasks requiring temporal localization and semantic reasoning.
1 code implementation • 16 Mar 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).
Human-Object Interaction Detection
Relationship Detection
no code implementations • 16 Mar 2023 • Zhuowei Li, Long Zhao, Zizhao Zhang, Han Zhang, Di Liu, Ting Liu, Dimitris N. Metaxas
Prototypes, as representations of class embeddings, have been explored to reduce memory footprint or mitigate forgetting in continual learning scenarios.
1 code implementation • 20 Jul 2022 • Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas
Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult.
1 code implementation • 18 Jul 2022 • Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection.
Ranked #7 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
no code implementations • 26 Apr 2022 • Marcel Nutz, Johannes Wiesel, Long Zhao
We show that pointwise limits of semistatic trading strategies in discrete time are again semistatic strategies.
no code implementations • 26 Apr 2022 • Marcel Nutz, Johannes Wiesel, Long Zhao
In a two-period financial market where a stock is traded dynamically and European options at maturity are traded statically, we study the so-called martingale Schrödinger bridge Q*; that is, the minimal-entropy martingale measure among all models calibrated to option prices.
no code implementations • CVPR 2022 • Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng
Based on these findings, we propose a principled method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding input data.
1 code implementation • CVPR 2022 • Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas
In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet.
Ranked #2 on Optical Flow Estimation on KITTI 2015
no code implementations • 19 Mar 2022 • Pengzun Gao, Long Zhao, Kan Zheng, Pingzhi Fan
Dual-function radar communication (DFRC) is an essential technology in the Internet of Vehicles (IoV).
1 code implementation • 11 Dec 2021 • Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf
Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects.
Ranked #2 on Group Activity Recognition on Collective Activity
no code implementations • 5 Aug 2021 • Xi Peng, Fengchun Qiao, Long Zhao
We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.
1 code implementation • NeurIPS 2021 • Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang
Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but suffer from the quadratic complexity of the self-attention operation, making them difficult to adopt for high-resolution image generation based on Generative Adversarial Networks (GANs).
Ranked #2 on Image Generation on CelebA 256x256 (FID metric)
6 code implementations • 26 May 2021 • Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister
Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well.
Ranked #87 on Image Classification on CIFAR-10
no code implementations • 20 May 2021 • Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas
Cross-modal attention mechanisms have been widely applied to the image-text matching task and have achieved remarkable improvements thanks to their capability of learning fine-grained relevance across different modalities.
1 code implementation • 9 Mar 2021 • Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng
A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples.
no code implementations • 1 Feb 2021 • WeiJie Chen, Yilu Guo, Shicai Yang, Zhaoyang Li, Zhenxin Ma, Binbin Chen, Long Zhao, Di Xie, ShiLiang Pu, Yueting Zhuang
Therefore, we turn our attention to suppressing false positives in each target domain in an unsupervised way.
1 code implementation • CVPR 2021 • Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu
To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition.
2 code implementations • 23 Oct 2020 • Ting Liu, Jennifer J. Sun, Long Zhao, Jiaping Zhao, Liangzhe Yuan, Yuxiao Wang, Liang-Chieh Chen, Florian Schroff, Hartwig Adam
Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.
1 code implementation • NeurIPS 2020 • Long Zhao, Ting Liu, Xi Peng, Dimitris Metaxas
In this paper, we propose a novel and effective regularization term for adversarial data augmentation.
no code implementations • 10 Aug 2020 • Kuan Fang, Long Zhao, Zhan Shen, RuiXing Wang, RiKang Zhour, LiWen Fan
Search engines have become a fundamental component in various web and mobile applications.
no code implementations • CVPR 2020 • Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas
Our key idea is to generalize the distilled cross-modal knowledge learned from a Source dataset, which contains paired examples from both modalities, to the Target dataset by modeling knowledge as priors on parameters of the Student.
1 code implementation • CVPR 2020 • Fengchun Qiao, Long Zhao, Xi Peng
We are concerned with a worst-case scenario in model generalization, in the sense that a model aims to perform well on many unseen domains while there is only one single domain available for training.
1 code implementation • NeurIPS 2019 • Yu Tian, Long Zhao, Xi Peng, Dimitris N. Metaxas
Graph kernels are kernel methods measuring graph similarity and serve as a standard tool for graph classification.
Ranked #7 on Link Prediction on Cora
1 code implementation • 20 Jul 2019 • Yuxiao Chen, Long Zhao, Xi Peng, Jianbo Yuan, Dimitris N. Metaxas
We propose a Dynamic Graph-Based Spatial-Temporal Attention (DG-STA) method for hand gesture recognition.
Ranked #3 on Hand Gesture Recognition on SHREC 2017
4 code implementations • CVPR 2019 • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris N. Metaxas
In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression.
Ranked #21 on Monocular 3D Human Pose Estimation on Human3.6M
no code implementations • 25 Feb 2019 • Lingyi Han, Kan Zheng, Long Zhao, Xianbin Wang, Xuemin Shen
Therefore, a framework combined with a deep clustering (DeepCluster) module is developed in this paper for STTP in large-scale networks.
no code implementations • 25 Feb 2019 • Shiwen Liu, Kan Zheng, Long Zhao, Pingzhi Fan
Experimental results show that the HMMs trained with the continuous characterization of mobility features can give a higher prediction accuracy when they are used for predicting driving intentions.
1 code implementation • ECCV 2018 • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas
We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object.
1 code implementation • 28 Jun 2018 • Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris N. Metaxas
Generating multi-view images from a single-view input is an essential yet challenging problem.
no code implementations • 25 Mar 2017 • Long Zhao, Fangda Han, Xi Peng, Xun Zhang, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris N. Metaxas
We first recover the facial identity and expressions from the video by fitting a face morphable model for each frame.
no code implementations • 3 Mar 2017 • Dingwen Zhang, Deyu Meng, Long Zhao, Junwei Han
Weakly-supervised object detection (WOD) is a challenging problem in computer vision.
Ranked #34 on Weakly Supervised Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2015 • Chaoyang Wang, Long Zhao, Shuang Liang, Liqing Zhang, Jinyuan Jia, Yichen Wei
Hierarchical segmentation-based object proposal methods have become an important step in the modern object detection paradigm.