Search Results for author: Huijuan Xu

Found 22 papers, 11 papers with code

Temporal Action Detection with Multi-level Supervision

no code implementations ICCV 2021 Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection

Auxiliary Task Reweighting for Minimum-data Learning

no code implementations NeurIPS 2020 Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.

Domain Adaptation Multi-Label Classification

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations 1 Apr 2020 Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations 31 Mar 2020 Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

4 code implementations ICCV 2021 Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, Xiaolong Wang

The edge between these two lines of work has so far been underexplored, and the effectiveness of meta-learning in few-shot learning remains unclear.

Classification Few-Shot Learning +1

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation CVPR 2020 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval

no code implementations 27 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

However, while such approaches tend to focus on identifying relationships between elements of the video and language modalities, there is less emphasis on modeling relational context between video frames given the semantic context of the query.

Moment Retrieval

wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval

no code implementations 25 Sep 2019 Reuben Tan, Huijuan Xu, Kate Saenko, Bryan A. Plummer

Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training.

Moment Retrieval

Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing

no code implementations 19 Jun 2019 Lei Lei, Huijuan Xu, Xiong Xiong, Kan Zheng, Wei Xiang, Xianbin Wang

By leveraging the concept of mobile edge computing (MEC), the massive amount of data generated by a large number of Internet of Things (IoT) devices could be offloaded to an MEC server at the edge of the wireless network for further computationally intensive processing.

Edge-computing reinforcement-learning

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations 25 Dec 2018 Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

Spatio-Temporal Action Graph Networks

1 code implementation 4 Dec 2018 Roei Herzig, Elad Levi, Huijuan Xu, Hang Gao, Eli Brosh, Xiaolong Wang, Amir Globerson, Trevor Darrell

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance.

Activity Recognition Autonomous Driving +2

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation 3 Dec 2018 Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Video Generation

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation 13 Apr 2018 Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Joint Event Detection and Description in Continuous Video Streams

1 code implementation 28 Feb 2018 Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense Video Captioning Event Detection +1

Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

no code implementations 28 Jan 2018 Yancheng Bai, Huijuan Xu, Kate Saenko, Bernard Ghanem

In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection.

Action Detection Activity Detection +1

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

1 code implementation 17 Nov 2015 Huijuan Xu, Kate Saenko

We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop.

Image Captioning Question Answering +1

A Multi-scale Multiple Instance Video Description Network

no code implementations 21 May 2015 Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko

Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video.

Multiple Instance Learning Semantic Segmentation +1
