no code implementations • 22 Dec 2023 • Zhenyang Li, Fan Liu, Yinwei Wei, Zhiyong Cheng, Liqiang Nie, Mohan Kankanhalli
To obtain robust and independent representations for each factor associated with a specific attribute, we first disentangle the representations of features both within and across different modalities.
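As a rough illustration of within- and cross-modality disentanglement (a minimal sketch, not the paper's implementation; all module names, dimensions, and the penalty form are hypothetical), one can project each modality's features into attribute-specific factor subspaces and penalize correlation between factors:

```python
import torch
import torch.nn as nn


class FactorDisentangler(nn.Module):
    """Projects a modality's features into per-attribute factor subspaces."""

    def __init__(self, in_dim: int, factor_dim: int, num_factors: int):
        super().__init__()
        # One linear head per attribute factor (a hypothetical design choice).
        self.heads = nn.ModuleList(
            nn.Linear(in_dim, factor_dim) for _ in range(num_factors)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> (batch, num_factors, factor_dim)
        return torch.stack([h(x) for h in self.heads], dim=1)


def independence_penalty(factors: torch.Tensor) -> torch.Tensor:
    """Pushes factors toward pairwise decorrelation within a batch."""
    b, k, d = factors.shape
    flat = factors.reshape(b, k * d)
    flat = flat - flat.mean(dim=0, keepdim=True)
    cov = (flat.T @ flat) / max(b - 1, 1)          # empirical covariance
    off_diag = ~torch.eye(k * d, dtype=torch.bool)
    return cov[off_diag].pow(2).mean()             # penalize cross-correlations


visual, textual = (FactorDisentangler(512, 64, 4) for _ in range(2))
v = visual(torch.randn(32, 512))                   # within-modality factors
t = textual(torch.randn(32, 512))
# Within-modality plus cross-modality independence terms:
loss = independence_penalty(v) + independence_penalty(torch.cat([v, t], dim=1))
```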
no code implementations • 29 Mar 2023 • Mingqing Wang, Jiawei Li, Zhenyang Li, Chengxiao Luo, Bin Chen, Shu-Tao Xia, Zhi Wang
In this work, the VQVAE focuses on feature extraction and reconstruction of images, while the transformers fit the manifold and locate anomalies in the latent space.
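A minimal sketch of this two-stage idea, under the assumption of a simple nearest-codebook quantizer and a small transformer scorer (module names and hyperparameters here are illustrative, not the paper's): tokens whose observed code has high negative log-likelihood under the transformer are flagged as anomalous.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Maps encoder features to nearest-codebook indices (sketch only)."""

    def __init__(self, num_codes: int = 256, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, tokens, dim) -> (batch, tokens) codebook indices.
        book = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        return torch.cdist(z, book).argmin(dim=-1)


class LatentScorer(nn.Module):
    """Transformer that scores how likely each latent token is."""

    def __init__(self, num_codes: int = 256, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)

    def anomaly_score(self, idx: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.encoder(self.embed(idx)))
        # Negative log-likelihood of each observed code; high = anomalous.
        return F.cross_entropy(logits.transpose(1, 2), idx, reduction="none")


vq, scorer = VectorQuantizer(), LatentScorer()
feats = torch.randn(2, 49, 64)    # e.g., a flattened 7x7 encoder feature map
scores = scorer.anomaly_score(vq(feats))    # (2, 49) per-token scores
```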
no code implementations • 4 Feb 2023 • Zhenyang Li, Yangyang Guo, Kejie Wang, Fan Liu, Liqiang Nie, Mohan Kankanhalli
Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning.
no code implementations • 30 Sep 2022 • Yizhou Zhao, Zhenyang Li, Xun Guo, Yan Lu
Temporal modeling is crucial for various video learning tasks.
no code implementations • 14 Jul 2022 • Boming Zhao, Bangbang Yang, Zhenyang Li, Zuoyue Li, Guofeng Zhang, Jiashu Zhao, Dawei Yin, Zhaopeng Cui, Hujun Bao
Expanding an existing tourist photo from a partially captured scene to a full scene is one of the desired experiences for photography applications.
1 code implementation • 21 Jun 2022 • Yikang Ding, Zhenyang Li, Dihe Huang, Zhiheng Li, Kai Zhang
Learning-based multi-view stereo (MVS) methods have made impressive progress and surpassed traditional methods in recent years.
1 code implementation • 25 Feb 2022 • Zhenyang Li, Yangyang Guo, Kejie Wang, Yinwei Wei, Liqiang Nie, Mohan Kankanhalli
Given that our framework is model-agnostic, we apply it to existing popular baselines and validate its effectiveness on the benchmark dataset.
1 code implementation • Findings (ACL) 2021 • Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo, Zhenhua Liu, Zhenyang Li, Jianbo Tang
Language model pre-training based on large corpora has achieved tremendous success in terms of constructing enriched contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) tasks.
no code implementations • 29 Jun 2020 • Long Chen, Lei Tong, Feixiang Zhou, Zheheng Jiang, Zhenyang Li, Jialin Lv, Junyu Dong, Huiyu Zhou
To investigate how underwater image enhancement methods influence subsequent underwater object detection tasks, we provide in this paper a large-scale underwater object detection dataset, named the OUC dataset, with both bounding box annotations and high-quality reference images.
1 code implementation • CVPR 2018 • Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek
This paper strives for pixel-level segmentation of actors and their actions in video content.
Ranked #13 on Referring Expression Segmentation on J-HMDB
no code implementations • CVPR 2017 • Zhenyang Li, Ran Tao, Efstratios Gavves, Cees G. M. Snoek, Arnold W. M. Smeulders
This paper strives to track a target object in a video.
Ranked #17 on Referring Expression Segmentation on J-HMDB
1 code implementation • 6 Jul 2016 • Zhenyang Li, Efstratios Gavves, Mihir Jain, Cees G. M. Snoek
We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM.
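For intuition, a bare-bones LSTM over per-frame CNN features looks as follows; note this is a generic sequence-learning baseline, not the VideoLSTM architecture itself (which adds convolutional structure and attention), and the dimensions are illustrative:

```python
import torch
import torch.nn as nn


class FrameSequenceClassifier(nn.Module):
    """Generic LSTM over per-frame features; not VideoLSTM itself."""

    def __init__(self, feat_dim: int = 512, hidden: int = 256,
                 num_actions: int = 101):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_actions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim) per-frame CNN features.
        out, _ = self.lstm(frames)
        # Classify from the last time step (one design choice among many).
        return self.classifier(out[:, -1])


logits = FrameSequenceClassifier()(torch.randn(4, 30, 512))  # 4 clips, 30 frames
```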
no code implementations • 21 Apr 2016 • Roeland De Geest, Efstratios Gavves, Amir Ghodrati, Zhenyang Li, Cees Snoek, Tinne Tuytelaars
Third, the start of the action is unknown, so it is unclear over what time window the information should be integrated.
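One common way around the unknown start is to score every frame causally rather than committing to a fixed integration window; the sketch below (our illustration, not the paper's method, with hypothetical names and sizes) emits per-frame action logits from a recurrent model:

```python
import torch
import torch.nn as nn


class OnlineActionScorer(nn.Module):
    """Emits an action score at every frame; no fixed window needed."""

    def __init__(self, feat_dim: int = 512, hidden: int = 128,
                 num_actions: int = 10):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim); the GRU is causal, so the
        # score at step t depends only on frames up to t.
        out, _ = self.gru(frames)
        return self.head(out)    # (batch, time, num_actions)


scores = OnlineActionScorer()(torch.randn(2, 100, 512))  # per-frame logits
```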