Learning high-quality sentence representations is a fundamental problem of natural language processing which could benefit a wide range of downstream tasks.
However, the inherent complexity of 3D models and the ambiguous text description lead to the challenge in this task.
The image enhancement module learns a coil-combined image prior to suppress noise-like artifacts, while the k-space restoration module explores multi-coil k-space correlations to recover high-frequency details.
To bring digital avatars into people's lives, it is highly demanded to efficiently generate complete, realistic, and animatable head avatars.
Our MAC aims to reduce video representation's spatial and temporal redundancy in the VidLP model by a mask sampling mechanism to improve pre-training efficiency.
Ranked #31 on Video Retrieval on MSR-VTT-1kA (using extra training data)
With the help of first pass predicted RF and corresponding actual quality as feedback, the second pass prediction will be highly accurate.
The marginal distribution model (MDM) is one such model, that requires only the specification of marginal distributions of the random utilities.
In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2022.
Ranked #2 on Natural Language Queries on Ego4D
Skeleton extraction is a task focused on providing a simple representation of an object by extracting the skeleton from the given binary or RGB image.
In 3D face reconstruction, orthogonal projection has been widely employed to substitute perspective projection to simplify the fitting process.
In this paper, we reveal and address the disadvantages of the conventional query-driven HOI detectors from the two aspects.
Ranked #11 on Human-Object Interaction Detection on HICO-DET
Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents.
The development of online economics arouses the demand of generating images of models on product clothes, to display new clothes and promote sales.
Ranked #2 on Virtual Try-on on MPV
To this end, we propose a novel one-stage framework with disentangling human-object detection and interaction classification in a cascade manner.
Ranked #7 on Human-Object Interaction Detection on V-COCO
Video affective understanding, which aims to predict the evoked expressions by the video content, is desired for video creation and recommendation.
This paper introduces a dual-critic reinforcement learning (RL) framework to address the problem of frame-level bit allocation in HEVC/H. 265.
no code implementations • 25 Sep 2020 • Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, WangMeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Tongtong Zhao, Shanshan Zhao, Yoseob Han, Byung-Hoon Kim, JaeHyun Baek, HaoNing Wu, Dejia Xu, Bo Zhou, Wei Guan, Xiaobo Li, Chen Ye, Hao Li, Yukai Shi, Zhijing Yang, Xiaojun Yang, Haoyu Zhong, Xin Li, Xin Jin, Yaojun Wu, Yingxue Pang, Sen Liu, Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Marie-Paule Cani, Wan-Chi Siu, Yuanbo Zhou, Rao Muhammad Umer, Christian Micheloni, Xiaofeng Cong, Rajat Gupta, Keon-Hee Ahn, Jun-Hyuk Kim, Jun-Ho Choi, Jong-Seok Lee, Feras Almasri, Thomas Vandamme, Olivier Debeir
This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020.
It is also a challenging task due to the lack of high-quality datasets that can fuel current deep learning-based methods.
Finally, the model learns to compare global and local features separately, i. e., in two paths, before merging the similarities.