Search Results for author: Yulei Niu

Found 27 papers, 17 papers with code

RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos

no code implementations • 27 Mar 2024 • Ali Zare, Yulei Niu, Hammad Ayyubi, Shih-Fu Chang

(3) Annotation cost: Annotating instructional videos with step-level labels (i.e., timestamps) or sequence-level labels (i.e., action categories) is demanding and labor-intensive, which limits generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or pre-determined.

Relation, Retrieval +1
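
The key phrase above is that the procedure length is "not fixed or pre-determined". A minimal way to picture that setting (a toy sketch, not the authors' retrieval-augmented RAP model) is a planner that decodes action steps until it emits a learned end-of-plan token:

```python
import torch
import torch.nn as nn

# Hedged sketch: a toy autoregressive planner that emits action steps until it
# predicts an end-of-plan token, so the procedure length is decided at inference
# time. This illustrates the adaptive-length setting only; it is NOT the
# authors' RAP architecture (which adds retrieval augmentation).
NUM_ACTIONS, EOP = 16, 16  # 16 action classes + 1 end-of-plan token

class ToyAdaptivePlanner(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_ACTIONS + 1, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, NUM_ACTIONS + 1)

    @torch.no_grad()
    def plan(self, start_goal_feat, max_steps=10):
        # start_goal_feat: (1, dim) feature of the start/goal observations
        h = start_goal_feat.unsqueeze(0)            # initial hidden state (1, 1, dim)
        step = torch.zeros(1, 1, dtype=torch.long)  # toy BOS re-uses action id 0
        plan = []
        for _ in range(max_steps):
            out, h = self.rnn(self.embed(step), h)
            step = self.head(out).argmax(-1)        # greedy next action
            if step.item() == EOP:                  # the planner decided to stop
                break
            plan.append(step.item())
        return plan

planner = ToyAdaptivePlanner()
print(planner.plan(torch.randn(1, 64)))  # e.g. [3, 11, 7] -- length not fixed
```

Training such a planner still needs step labels, which is exactly the annotation cost the snippet highlights.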

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

no code implementations • 3 Mar 2024 • Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, Shih-Fu Chang

We study the problem of procedure planning in instructional videos, which aims to generate a goal-oriented sequence of action steps given partial visual state observations.

Contrastive Learning

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

no code implementations • 7 Apr 2023 • Hung-Ting Su, Yulei Niu, Xudong Lin, Winston H. Hsu, Shih-Fu Chang

Causal Video Question Answering (CVidQA) queries not only association and temporal relations but also causal relations in a video.

Question Answering, Question Generation +3

DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

1 code implementation • CVPR 2023 • Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data.

Few-Shot Object Detection, object-detection

Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

no code implementations • 29 Jan 2023 • Beier Zhu, Yulei Niu, Saeil Lee, Minhoe Hur, Hanwang Zhang

We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
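
The snippet names the method but not the regularizer. One common pattern for this kind of debiased fine-tuning, sketched here purely as an assumption about the general idea rather than the paper's exact loss, is to keep the tuned model's predictions close to the frozen pre-trained model's prompt-based zero-shot predictions:

```python
import torch
import torch.nn.functional as F

# Hedged sketch (an assumption, not ProReg's published objective): regularize
# fine-tuning so the tuned model stays close to the frozen pre-trained model's
# prompted zero-shot prediction, trading task fit against pretrained knowledge.
def proreg_style_loss(student_logits, zeroshot_logits, labels, lam=0.5):
    ce = F.cross_entropy(student_logits, labels)            # downstream task fit
    reg = F.kl_div(F.log_softmax(student_logits, dim=-1),   # stay near zero-shot
                   F.softmax(zeroshot_logits, dim=-1),
                   reduction="batchmean")
    return (1 - lam) * ce + lam * reg

logits = torch.randn(8, 10, requires_grad=True)   # fine-tuned model's outputs
zs = torch.randn(8, 10)                           # frozen model's prompted outputs
y = torch.randint(0, 10, (8,))
proreg_style_loss(logits, zs, y).backward()
```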

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction

no code implementations • 6 Jan 2023 • Andrew Lu, Xudong Lin, Yulei Niu, Shih-Fu Chang

Understanding event relationships in videos requires a model to understand the underlying structures of events (i.e., the event type, the associated argument roles, and corresponding entities) and factual knowledge for reasoning.

Relation

Respecting Transfer Gap in Knowledge Distillation

no code implementations • 23 Oct 2022 • Yulei Niu, Long Chen, Chang Zhou, Hanwang Zhang

The network's response serves as additional supervision to formulate the machine domain, which uses data collected from the human domain as a transfer set.

Knowledge Distillation
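
For readers new to the area, the baseline objective that "network response as additional supervision" refers to is the standard (Hinton-style) distillation loss below; the paper's actual contribution, correcting for the human-machine transfer gap, is not reproduced here.

```python
import torch
import torch.nn.functional as F

# Standard knowledge-distillation loss, shown for context only: the transfer
# set is scored by the teacher, and the student matches the teacher's
# temperature-softened response alongside the ground-truth labels.
def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # T^2 keeps gradients scaled
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```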

Weakly-Supervised Temporal Article Grounding

1 code implementation • 22 Oct 2022 • Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang

Specifically, given an article and a relevant video, WSAG aims to localize all "groundable" sentences to the video, and these sentences may be at different semantic scales.

Natural Language Queries, Sentence +1

Explicit Image Caption Editing

1 code implementation • 20 Jul 2022 • Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao

Given an image and a reference caption, the image caption editing task aims to correct misalignment errors and generate a refined caption.

Sentence
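
What makes this editing "explicit" is that the model outputs edit operations over the reference caption rather than rewriting it from scratch. Below is a minimal sketch of applying such operations, with a KEEP/DELETE/REPLACE vocabulary assumed for illustration (the paper's actual operation set may differ):

```python
# Minimal illustration of *explicit* caption editing: predict edit operations
# over the reference tokens instead of regenerating the caption. The
# KEEP/DELETE/REPLACE operation set here is an illustrative assumption.
def apply_edits(ref_tokens, ops):
    out = []
    for token, (op, arg) in zip(ref_tokens, ops):
        if op == "KEEP":
            out.append(token)
        elif op == "REPLACE":        # fix a misaligned word in place
            out.append(arg)
        elif op == "DELETE":
            pass                     # drop the token entirely
    return out

ref = "a dog sitting on the grass".split()
ops = [("KEEP", None), ("REPLACE", "cat"), ("KEEP", None),
       ("KEEP", None), ("KEEP", None), ("REPLACE", "snow")]
print(" ".join(apply_edits(ref, ops)))   # -> a cat sitting on the snow
```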

On Non-Random Missing Labels in Semi-Supervised Learning

1 code implementation • ICLR 2022 • Xinting Hu, Yulei Niu, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang

Our method is three-fold: 1) We propose Class-Aware Propensity (CAP) that exploits the unlabeled data to train an improved classifier using the biased labeled data.

Imputation, Missing Labels +1
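
The "propensity" in CAP is, roughly, each class's chance of having its label observed. A hedged sketch of the inverse-propensity idea follows; the estimator details almost certainly differ from the paper's:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the inverse-propensity idea behind CAP (not the paper's
# exact estimator): if class c has probability p_c of being labeled, reweight
# each labeled example by 1/p_c so the biased labeled set behaves like an
# unbiased one on average.
def cap_style_loss(logits, labels, propensity):
    # propensity: (num_classes,) estimated P(labeled | class), e.g. by comparing
    # class frequencies in labeled data vs. pseudo-labeled unlabeled data
    weights = 1.0 / propensity.clamp_min(1e-3)
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights[labels] * per_example).mean()

logits = torch.randn(16, 5)
labels = torch.randint(0, 5, (16,))
propensity = torch.tensor([0.9, 0.5, 0.3, 0.1, 0.05])  # head classes labeled more
print(cap_style_loss(logits, labels, propensity))
```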

Prompt-aligned Gradient for Prompt Tuning

1 code implementation • ICCV 2023 • Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang

Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt": e.g., the confidence score of an image being "[CLASS]" can be obtained from the VLM-provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]".

Domain Adaptation, Few-Shot Learning +2
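
The prompt-based zero-shot classifier described in the snippet can be written down directly. The sketch below uses random stand-ins for the VLM's image and text encoders, so only the scoring rule itself is real:

```python
import torch

# Sketch of the zero-shot classifier described above: the confidence that an
# image is "[CLASS]" is the (temperature-scaled) similarity between the image
# embedding and the embedding of the prompt "a photo of a [CLASS]".
# image_emb / text_embs are random stand-ins for a real VLM's encoders.
def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    img = image_emb / image_emb.norm(dim=-1, keepdim=True)
    txt = text_embs / text_embs.norm(dim=-1, keepdim=True)
    return (img @ txt.T / temperature).softmax(dim=-1)

classes = ["cat", "dog", "car"]
prompts = [f"a photo of a {c}" for c in classes]   # the prompt template
image_emb = torch.randn(1, 512)                    # stand-in for encode_image(x)
text_embs = torch.randn(len(prompts), 512)         # stand-in for encode_text(prompts)
print(dict(zip(classes, zero_shot_probs(image_emb, text_embs)[0].tolist())))
```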

Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification

1 code implementation • 29 Dec 2021 • Beier Zhu, Yulei Niu, Xian-Sheng Hua, Hanwang Zhang

We address the unbiasedness overlooked by existing long-tailed classification methods: their overall improvement is mostly attributed to a biased preference for tail over head classes, since the test distribution is assumed to be balanced. However, when the test set is as imbalanced as the long-tailed training data (i.e., the test also respects Zipf's law of nature), the tail bias is no longer beneficial overall, because it hurts the head majorities.

Classification
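
A tiny numerical illustration of the paper's evaluation point: the same two models rank differently under a balanced test set and a Zipf-distributed one (all per-class accuracies below are made up):

```python
import numpy as np

# Under a Zipf (long-tailed) test distribution, overall accuracy is dominated
# by head classes, so trading head accuracy for tail accuracy can hurt.
num_classes = 10
zipf = 1.0 / np.arange(1, num_classes + 1)   # p_c proportional to 1/rank
zipf /= zipf.sum()

tail_friendly = np.linspace(0.6, 0.9, num_classes)  # better on tail classes
head_friendly = np.linspace(0.9, 0.5, num_classes)  # better on head classes

for name, acc in [("tail-friendly", tail_friendly),
                  ("head-friendly", head_friendly)]:
    print(name,
          "| balanced test:", round(acc.mean(), 3),
          "| Zipf test:", round(float(zipf @ acc), 3))
# The tail-friendly model wins on the balanced test but loses on the Zipf test.
```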

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs

1 code implementation • CVPR 2022 • Kaifeng Gao, Long Chen, Yulei Niu, Jian Shao, Jun Xiao

To this end, we propose a new classification-then-grounding framework for VidSGG, which avoids all three overlooked drawbacks.

Predicate Classification

Introspective Distillation for Robust Question Answering

1 code implementation • NeurIPS 2021 • Yulei Niu, Hanwang Zhang

Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension.

counterfactual, Inductive Bias +3

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

1 code implementation • 3 Oct 2021 • Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao

Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST).

counterfactual, Question Answering +1

COSY: COunterfactual SYntax for Cross-Lingual Understanding

1 code implementation • ACL 2021 • Sicheng Yu, Hao Zhang, Yulei Niu, Qianru Sun, Jing Jiang

Pre-trained multilingual language models, e.g., multilingual-BERT, are widely used in cross-lingual tasks, yielding state-of-the-art performance.

counterfactual, Natural Language Inference +3

Counterfactual Variable Control for Robust and Interpretable Question Answering

1 code implementation • 12 Oct 2020 • Sicheng Yu, Yulei Niu, Shuohang Wang, Jing Jiang, Qianru Sun

We then apply two novel CVC inference methods (on trained models) to capture the effect of comprehensive reasoning as the final prediction.

Causal Inference, counterfactual +3

Counterfactual VQA: A Cause-Effect Look at Language Bias

1 code implementation • CVPR 2021 • Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen

VQA models tend to rely on language bias as a shortcut and thus fail to sufficiently learn multi-modal knowledge from both vision and language.

counterfactual, Counterfactual Inference +2
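
The cause-effect view in the title can be caricatured in a few lines: treat the question-only prediction as the language shortcut and subtract it at inference time. This is a simplification of the paper's counterfactual formulation (CF-VQA's actual fusion is more elaborate):

```python
import torch

# Hedged sketch, much simplified from CF-VQA: the question-only branch is the
# pure language effect; subtracting it keeps only the part of the answer that
# vision actually contributes.
def counterfactual_debias(logits_vq, logits_q_only):
    # logits_vq: prediction from vision + question
    # logits_q_only: same question with vision blocked (the counterfactual)
    return logits_vq - logits_q_only

logits_vq = torch.tensor([[2.0, 1.2, 0.3]])   # answers: yellow / green / red
logits_q  = torch.tensor([[1.9, 0.2, 0.1]])   # language prior favors "yellow"
print(counterfactual_debias(logits_vq, logits_q).softmax(dim=-1))
# After the subtraction, "green" wins: the language-prior shortcut is removed.
```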

Domain-Adaptive Few-Shot Learning

1 code implementation • 19 Mar 2020 • An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo

Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.

Domain Adaptation, Few-Shot Learning

Unbiased Scene Graph Generation from Biased Training

6 code implementations • CVPR 2020 • Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang

Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach".

Causal Inference, counterfactual +2
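
The unbiasing machinery this paper introduced is Total Direct Effect (TDE) inference, which can be sketched as a subtraction between factual and counterfactual predictions. The toy linear "model" below is only a stand-in for a real scene-graph head:

```python
import torch

# Hedged sketch of TDE inference for unbiased SGG: compare predicate logits
# from the real input against a counterfactual whose pair features are wiped
# (e.g. zeroed), so context- and frequency-driven bias cancels out. The real
# model is far richer; only the subtraction is shown here.
W_pair, W_ctx = torch.randn(64, 50), torch.randn(64, 50)   # toy weights, 50 predicates
model = lambda pair, ctx: pair @ W_pair + ctx @ W_ctx      # stand-in predicate head

def tde_logits(pair_feat, context_feat, wiped_feat):
    factual = model(pair_feat, context_feat)          # what the model actually saw
    counterfactual = model(wiped_feat, context_feat)  # "had it seen no pair content"
    return factual - counterfactual                   # keep only the direct effect

pair, ctx = torch.randn(1, 64), torch.randn(1, 64)
print(tde_logits(pair, ctx, torch.zeros(1, 64)).argmax(-1))  # debiased predicate id
```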

Two Causal Principles for Improving Visual Dialog

1 code implementation • CVPR 2020 • Jiaxin Qi, Yulei Niu, Jianqiang Huang, Hanwang Zhang

This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for the Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial).

Visual Dialog, Vocal Bursts Valence Prediction

Mobile Video Action Recognition

no code implementations • 27 Aug 2019 • Yuqi Huo, Xiaoli Xu, Yao Lu, Yulei Niu, Zhiwu Lu, Ji-Rong Wen

In addition to motion vectors, we also provide a temporal fusion method to explicitly induce the temporal context.

Action Recognition, Temporal Action Localization

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

no code implementations • 8 Jul 2019 • Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang

Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., either of them influences estimation of the posterior distribution of the other, thereby greatly reducing the search space of context.

Multiple Instance Learning, Referring Expression
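
The "variational context" of the title suggests a lower bound of the following general shape, reconstructed here from the snippet (notation mine, not necessarily the paper's: y is the grounding of the referent, x the image plus expression, z the latent context):

```latex
\log p(y \mid x)
  = \log \sum_{z} p(y \mid z, x)\, p(z \mid x)
  \;\ge\; \mathbb{E}_{q(z \mid x, y)}\big[\log p(y \mid z, x)\big]
          - \mathrm{KL}\big(q(z \mid x, y) \,\|\, p(z \mid x)\big)
```

The reciprocity mentioned above is visible here: q(z | x, y) estimates the context from the referent, while p(y | z, x) grounds the referent given the context, which is what shrinks the context search space.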

Recursive Visual Attention in Visual Dialog

1 code implementation • CVPR 2019 • Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

Question Answering, Visual Dialog +1

Grounding Referring Expressions in Images by Variational Context

1 code implementation • CVPR 2018 • Hanwang Zhang, Yulei Niu, Shih-Fu Chang

This is a general yet challenging vision-language task, since it requires not only the localization of objects but also the multimodal comprehension of context: visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.

Multiple Instance Learning, Referring Expression

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

no code implementations • 5 Sep 2017 • Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, Shih-Fu Chang

In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts, ranging from objects and scenes to abstract concepts; 2) how to annotate an image with the optimal number of class labels.
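
For the second issue, the simplest contrast is fixed top-k labeling versus adaptive thresholding, where each image keeps however many labels clear a confidence bar. The toy example below (made-up scores, and not necessarily the paper's actual mechanism) shows the adaptive variant:

```python
import numpy as np

# Illustration of the "optimal number of labels" issue: instead of a fixed
# top-k, keep every label whose predicted probability clears a threshold, so
# each image receives a different number of labels. Scores are made up.
probs = np.array([0.92, 0.81, 0.40, 0.07, 0.02])   # per-label predictions
labels = ["dog", "grass", "ball", "car", "night"]
keep = [l for l, p in zip(labels, probs) if p >= 0.5]
print(keep)   # ['dog', 'grass'] -- two labels here, perhaps four for another image
```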
