Search Results for author: Huijuan Xu

Found 29 papers, 15 papers with code

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

10 code implementations • ICCV 2021 • Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The edge between these two lines of work remains underexplored, and the effectiveness of meta-learning in few-shot learning remains unclear.

Few-Shot Learning General Classification

587

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

3 code implementations • ICCV 2017 • Huijuan Xu, Abir Das, Kate Saenko

We address the problem of activity detection in continuous, untrimmed video streams.

Ranked #1 on Action Recognition In Videos on THUMOS’14

Action Detection Action Recognition In Videos +2

254

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation • CVPR 2020 • Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition Object

137

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

1 code implementation • HLT 2015 • Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko

Solving the visual symbol grounding problem has long been a goal of artificial intelligence.

Sentence Text Generation +1

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

Learning Canonical Representations for Scene Graph to Image Generation

2 code implementations • ECCV 2020 • Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images.

Ranked #3 on Layout-to-Image Generation on Visual Genome 256x256

Layout-to-Image Generation Scene Generation

Joint Event Detection and Description in Continuous Video Streams

1 code implementation • 28 Feb 2018 • Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense Captioning Dense Video Captioning +2

Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

1 code implementation • 24 Jul 2022 • Zhi Li, Lu He, Huijuan Xu

Action understanding has evolved into the era of fine granularity, as most human behaviors in real life have only minor differences.

Ranked #1 on Weakly Supervised Action Localization on FineAction

Action Understanding Fine-Grained Action Detection +1

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation • 3 Dec 2018 • Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Video Generation

Spatio-Temporal Action Graph Networks

1 code implementation • 4 Dec 2018 • Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +3

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

1 code implementation • 17 Nov 2015 • Huijuan Xu, Kate Saenko

We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop.

Ranked #12 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 open ended

Image Captioning Question Answering +1

Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning

1 code implementation • ECCV 2020 • Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu

Weakly-supervised action localization requires training a model to localize the action segments in the video given only the video-level action label.

Ranked #9 on Weakly Supervised Action Localization on THUMOS’14

Action Localization Multiple Instance Learning +2

Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

1 code implementation • 17 Mar 2024 • Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu

Meanwhile, our RebQ leverages extensive multi-modal knowledge from pre-trained LMMs to reconstruct the data of missing modality.

Continual Learning

Syntax Controlled Knowledge Graph-to-Text Generation with Order and Semantic Consistency

1 code implementation • Findings (NAACL) 2022 • Jin Liu, Chongfeng Fan, Fengyu Zhou, Huijuan Xu

Knowledge graph-to-text (KG-to-text) generation aims to generate easy-to-understand sentences from the KG while maintaining semantic consistency between the generated sentences and the KG.

KG-to-Text Generation POS +2

Less is More: Toward Zero-Shot Local Scene Graph Generation via Foundation Models

1 code implementation • 2 Oct 2023 • Shu Zhao, Huijuan Xu

To fill this gap, we present a new task called Local Scene Graph Generation.

Graph Generation Scene Graph Generation

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

no code implementations • 28 Jan 2018 • Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem

In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.

Action Detection Activity Detection

A Multi-scale Multiple Instance Video Description Network

no code implementations • 21 May 2015 • Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko

Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video.

Image Segmentation Multiple Instance Learning +3

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

Two-Stream Region Convolutional 3D Network for Temporal Activity Detection

no code implementations • 5 Jun 2019 • Huijuan Xu, Abir Das, Kate Saenko

We address the problem of temporal activity detection in continuous, untrimmed video streams.

Ranked #4 on Action Recognition on THUMOS’14

Action Detection Action Recognition +4

Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing

no code implementations • 19 Jun 2019 • Lei Lei, Huijuan Xu, Xiong Xiong, Kan Zheng, Wei Xiang, Xianbin Wang

By leveraging the concept of mobile edge computing (MEC), the massive amount of data generated by a large number of Internet of Things (IoT) devices could be offloaded to an MEC server at the edge of the wireless network for further computationally intensive processing.

Edge-computing reinforcement-learning +2

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations • 27 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval Retrieval

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations • 1 Apr 2020 • Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection +2

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Temporal Action Detection with Multi-level Supervision

no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval

no code implementations • 25 Sep 2019 • Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval Retrieval +1

Disentangled Action Recognition with Knowledge Bases

no code implementations • NAACL 2022 • Zhekun Luo, Shalini Ghosh, Devin Guillory, Keizo Kato, Trevor Darrell, Huijuan Xu

In this paper, we aim to improve the generalization ability of the compositional action recognition model to novel verbs or novel nouns that are unseen during training time, by leveraging the power of knowledge graphs.

Action Recognition Knowledge Graphs

CircleNet: Reciprocating Feature Adaptation for Robust Pedestrian Detection

no code implementations • 12 Dec 2022 • Tianliang Zhang, Zhenjun Han, Huijuan Xu, Baochang Zhang, Qixiang Ye

In this paper we propose a novel feature learning model, referred to as CircleNet, that achieves feature adaptation by mimicking how humans look at low-resolution and occluded objects: focusing on the object again, at a finer scale, if it cannot be identified clearly the first time.

object-detection Object Detection +1

NEUCORE: Neural Concept Reasoning for Composed Image Retrieval

no code implementations • 2 Oct 2023 • Shu Zhao, Huijuan Xu

Specifically, considering that the text modifier may refer to semantic concepts that do not exist in the reference image and need to be added to the target image, we learn the multi-modal concept alignment between the text modifier and the concatenation of reference and target images, under a multiple-instance learning framework with image- and sentence-level weak supervision.

Concept Alignment Image Retrieval +3
