no code implementations • 27 Dec 2022 • Bo Chen, Zhiwei Hu, Zhilong Ji, Jinfeng Bai, WangMeng Zuo
The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image.
1 code implementation • 27 Dec 2022 • Zhiwei Hu, Bo Chen, Yuan Gao, Zhilong Ji, Jinfeng Bai
The task of referring video object segmentation aims to segment the object in the frames of a given video to which the referring expressions refer.
Referring Video Object Segmentation
Semantic Segmentation
1 code implementation • 20 Oct 2022 • Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan
We investigate the knowledge graph entity typing task which aims at inferring plausible entity types.
no code implementations • 24 Aug 2022 • Qi Lv, Ziqiang Cao, Wenrui Xie, Derui Wang, Jingwen Wang, Zhiwei Hu, Tangkun Zhang, Ba Yuan, Yuanhang Li, Min Cao, Wenjie Li, Sujian Li, Guohong Fu
Furthermore, based on the similarity between video outlines and textual outlines, we use a large number of articles with chapter headings to pretrain our model.
1 code implementation • 2 May 2022 • Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, XiaoLi Li, Ru Li, Jeff Z. Pan
Multi-hop reasoning over real-life knowledge graphs (KGs) is a highly challenging problem as traditional subgraph matching methods are not capable of dealing with noise and missing information.
no code implementations • 30 Mar 2022 • Guang Feng, Lihe Zhang, Zhiwei Hu, Huchuan Lu
To address this task, we first design a two-stream encoder to extract CNN-based visual features and transformer-based linguistic features hierarchically, and a vision-language mutual guidance (VLMG) module is inserted into the encoder multiple times to promote the hierarchical and progressive fusion of multi-modal features.
Ranked #1 on Referring Expression Segmentation on J-HMDB
no code implementations • EMNLP 2020 • Rongsheng Zhang, Xiaoxi Mao, Le Li, Lin Jiang, Lin Chen, Zhiwei Hu, Yadong Xi, Changjie Fan, Minlie Huang
In the lyrics generation process, Youling supports a traditional one-pass full-text generation mode as well as an interactive generation mode, which allows users to select satisfactory sentences from generated candidates conditioned on the preceding context.
no code implementations • CVPR 2021 • Guang Feng, Zhiwei Hu, Lihe Zhang, Huchuan Lu
In this work, we propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network, and uses language to refine the multi-modal features progressively.
no code implementations • 5 Oct 2020 • Andrea Amorese, Andrea Marino, Martin Sundermann, Kai Chen, Zhiwei Hu, Thomas Willers, Fadi Choukani, Philippe Ohresser, Javier Herrero-Martin, Stefano Agrestini, Chien-Te Chen, Hong-Ji Lin, Maurits W. Haverkort, Silvia Seiro, Christoph Geibel, Frank Steglich, Liu Hao Tjeng, Gertrud Zwicknagl, Andrea Severing
The crystal-field ground state wave function of CeCu$_2$Si$_2$ has been investigated with linearly polarized $M$-edge x-ray absorption spectroscopy from 250 mK to 250 K, thus covering the superconducting ($T_{\text{c}} = 0.6$ K), the Kondo ($T_{\text{K}} \approx 20$ K) as well as the Curie-Weiss regime.
Strongly Correlated Electrons
no code implementations • CVPR 2020 • Zhiwei Hu, Guang Feng, Jiayu Sun, Lihe Zhang, Huchuan Lu
Combining with the language-guided visual attention, a bi-directional cross-modal attention module (BCAM) is built to learn the relationship between multi-modal features.
Ranked #11 on Referring Expression Segmentation on RefCOCO testB