Invariant Grounding for Video Question Answering

1 code implementation CVPR 2022 Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua

At its core is understanding the alignments between visual scenes in video and linguistic semantics in question to yield the answer.

Question Answering Video Question Answering

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models

1 code implementation 23 May 2022 Yuan YAO, Qianyu Chen, Ao Zhang, Wei Ji, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

We show that PEVL enables state-of-the-art performance of detector-free VLP models on position-sensitive tasks such as referring expression comprehension and phrase grounding, and also improves the performance on position-insensitive tasks with grounded inputs.

Language Modelling Phrase Grounding +3

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

no code implementations 27 Apr 2022 Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

In particular, to solve the inherent ambiguity among four implicit variables, i.e., camera position, shape, texture, and illumination, we study existing works and introduce an explainable structural causal map (SCM) to build our model.

3D Reconstruction Self-Supervised Learning

Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization

no code implementations CVPR 2022 Jingjing Li, Tianyu Yang, Wei Ji, Jue Wang, Li Cheng

Inspired by recent success in unsupervised contrastive representation learning, we propose a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting.

Contrastive Learning Denoising +3

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

1 code implementation 12 Dec 2021 Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua

To align with the multi-granular essence of linguistic concepts in language queries, we propose to model video as a conditional graph hierarchy which weaves together visual facts of different granularity in a level-wise manner, with the guidance of corresponding textual cues.

Question Answering Video Question Answering

Rethinking the Two-Stage Framework for Grounded Situation Recognition

1 code implementation 10 Dec 2021 Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Tat-Seng Chua

Since each verb is associated with a specific set of semantic roles, all existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.

Object Recognition

Meeting Summarization with Pre-training and Clustering Methods

1 code implementation 16 Nov 2021 Andras Huebner, Wei Ji, Xiang Xiao

Lastly, we compare the performance of our baseline models with BART, a state-of-the-art language model that is effective for summarization.

Language Modelling Meeting Summarization +1

Decoupling Strategy and Surface Realization for Task-oriented Dialogues

no code implementations 29 Sep 2021 Chenchen Ye, Lizi Liao, Fuli Feng, Wei Ji, Tat-Seng Chua

The core is to construct a latent content space for strategy optimization and disentangle the surface style from it.

Style Transfer Task-Oriented Dialogue Systems

Advancing biological super-resolution microscopy through deep learning: a brief review

no code implementations 24 Jun 2021 Tianjie Yang, Yaoru Luo, Wei Ji, Ge Yang

We conclude with an outlook on how deep learning could shape the future of this new generation of light microscopy technology.


Calibrated RGB-D Salient Object Detection

1 code implementation CVPR 2021 Wei Ji, Jingjing Li, Shuang Yu, Miao Zhang, Yongri Piao, Shunyu Yao, Qi Bi, Kai Ma, Yefeng Zheng, Huchuan Lu, Li Cheng

Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD).

object-detection RGB-D Salient Object Detection +1

Deconfounded Video Moment Retrieval with Causal Intervention

1 code implementation 3 Jun 2021 Xun Yang, Fuli Feng, Wei Ji, Meng Wang, Tat-Seng Chua

To fill the research gap, we propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of query and video content on the prediction.

Moment Retrieval

Conditional Hyper-Network for Blind Super-Resolution with Multiple Degradations

1 code implementation 8 Apr 2021 Guanghao Yin, Wei Wang, Zehuan Yuan, Wei Ji, Dongdong Yu, Shouqian Sun, Tat-Seng Chua, Changhu Wang

We extract a degradation prior at the task level with the proposed ConditionNet, which is then used to adapt the parameters of the basic SR network (BaseNet).

Blind Super-Resolution Image Super-Resolution +1

Boundary Proposal Network for Two-Stage Natural Language Video Localization

no code implementations 15 Mar 2021 Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao

State-of-the-art NLVL methods are almost all one-stage and typically fall into two categories: 1) anchor-based approaches, which first pre-define a series of video segment candidates (e.g., by sliding window) and then classify each candidate; 2) anchor-free approaches, which directly predict, for each video frame, the probability of it being a boundary or an intermediate frame inside the positive segment.

ChemistryQA: A Complex Question Answering Dataset from Chemistry

no code implementations 1 Jan 2021 Zhuoyu Wei, Wei Ji, Xiubo Geng, Yining Chen, Baihua Chen, Tao Qin, Daxin Jiang

We notice that some real-world QA tasks are more complex and cannot be solved by end-to-end neural networks or translated into any kind of formal representation.

Machine Reading Comprehension Question Answering

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

no code implementations 30 Apr 2020 Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller

In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety.

Sleep Quality

Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images

no code implementations 6 Oct 2018 Yiming Wu, Wei Ji, Xi Li, Gang Wang, Jianwei Yin, Fei Wu

As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate the hand joint locations from depth images.

Hand Pose Estimation

