Search Results for author: Jiyang Gao

Found 20 papers, 8 papers with code

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

5 code implementations CVPR 2020 Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cong-Cong Li, Cordelia Schmid

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g., pedestrians and vehicles) and road context information (e.g., lanes, traffic lights).

Self-Driving Cars

STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

no code implementations CVPR 2020 Zhishuai Zhang, Jiyang Gao, Junhua Mao, Yukai Liu, Dragomir Anguelov, Cong-Cong Li

On the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, which establishes the state-of-the-art for both tasks.

Autonomous Driving Object Detection +2

CPARR: Category-based Proposal Analysis for Referring Relationships

no code implementations17 Apr 2020 Chuanzi He, Haidong Zhu, Jiyang Gao, Kan Chen, Ram Nevatia

The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>.

Visual Relationship Detection

End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

no code implementations15 Oct 2019 Yin Zhou, Pei Sun, Yu Zhang, Dragomir Anguelov, Jiyang Gao, Tom Ouyang, James Guo, Jiquan Ngiam, Vijay Vasudevan

In this paper, we aim to synergize the bird's-eye view and the perspective view and propose a novel end-to-end multi-view fusion (MVF) algorithm, which can effectively learn to utilize the complementary information from both.
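The core fusion idea above can be illustrated with a toy sketch (not the paper's network; feature sizes and the concatenation-based fusion are assumptions for illustration): each LiDAR point carries one feature vector from the bird's-eye view and one from the perspective view, and the two are combined point-wise so the views complement each other.

```python
import numpy as np

def fuse_views(bev_feats, persp_feats):
    """Point-wise fusion of per-point features from two views.

    bev_feats:   (N, Db) per-point features from the bird's-eye view
    persp_feats: (N, Dp) per-point features from the perspective view
    Returns a (N, Db + Dp) fused feature per point (toy concat fusion).
    """
    assert bev_feats.shape[0] == persp_feats.shape[0]
    return np.concatenate([bev_feats, persp_feats], axis=1)

# 100 points, 32-dim BEV features and 16-dim perspective features (toy sizes)
fused = fuse_views(np.zeros((100, 32)), np.ones((100, 16)))
```

A learned fusion (e.g. an MLP over the concatenated features) would replace the plain concatenation in practice; the point-wise pairing of the two views is the part the snippet illustrates.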

3D Object Detection

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection

no code implementations ICCV 2019 Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia

Compared to standard Faster RCNN, it contains three highlights: (1) an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and improve the mining precision; (2) masking the negative-sample loss in the box predictor to avoid the harm of false negative labels; and (3) training the box regression head only on seed annotations to eliminate the harm from inaccurate boundaries of mined bounding boxes.
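The negative-loss masking highlight can be sketched in a few lines (illustration only, not the authors' code; the function name and shapes are assumptions): when labels come from mining and may contain false negatives, the classifier's per-sample losses for negative samples are simply zeroed out.

```python
import numpy as np

def masked_classification_loss(losses, is_negative):
    """Average per-sample losses over positives only; negatives are masked.

    losses:      per-sample classification losses
    is_negative: boolean flags; True means the (possibly false) negative label
    """
    losses = np.asarray(losses, dtype=float)
    keep = ~np.asarray(is_negative, dtype=bool)   # mask out negatives
    return float((losses * keep).sum() / max(keep.sum(), 1))

# Three samples; the middle one is a mined "negative" whose loss is ignored.
loss = masked_classification_loss([0.9, 0.2, 1.5], [False, True, False])
```

The design rationale: a mined negative may actually contain the object, so its loss signal is untrustworthy and is dropped rather than risked.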

Semi-Supervised Object Detection Weakly Supervised Object Detection

MAC: Mining Activity Concepts for Language-based Temporal Localization

3 code implementations21 Nov 2018 Runzhou Ge, Jiyang Gao, Kan Chen, Ram Nevatia

Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, an approach that ignores rich semantic cues about activities in videos and queries.

Language-Based Temporal Localization

CTAP: Complementary Temporal Action Proposal Generation

1 code implementation ECCV 2018 Jiyang Gao, Kan Chen, Ram Nevatia

Temporal action proposal generation is an important task: akin to object proposals, temporal action proposals are intended to capture "clips" or temporal intervals in videos that are likely to contain an action.

Temporal Action Proposal Generation

Revisiting Temporal Modeling for Video-based Person ReID

8 code implementations5 May 2018 Jiyang Gao, Ram Nevatia

Although many methods for temporal modeling have been proposed, it is hard to compare them directly, because the choices of feature extractor and loss function also have a large impact on the final performance.

Motion-Appearance Co-Memory Networks for Video Question Answering

no code implementations CVPR 2018 Jiyang Gao, Runzhou Ge, Kan Chen, Ram Nevatia

Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network to generate multi-level contextual facts; (3) a dynamic fact ensemble method to construct temporal representation dynamically for different questions.
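The first component, co-memory attention, can be sketched with a toy numpy example (this is an illustration of the cross-modal cueing idea, not the paper's memory network; summaries via mean-pooling and dot-product attention are assumptions): attention weights over motion features are generated from an appearance cue, and vice versa.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def co_attend(motion, appearance):
    """Each modality's attention weights come from the other's summary.

    motion, appearance: (T, D) per-frame feature sequences.
    Returns one attended D-dim vector per modality.
    """
    a_cue = appearance.mean(axis=0)        # appearance summary cues motion
    m_cue = motion.mean(axis=0)            # motion summary cues appearance
    w_motion = softmax(motion @ a_cue)     # (T,) attention over motion frames
    w_app = softmax(appearance @ m_cue)    # (T,) attention over appearance
    return w_motion @ motion, w_app @ appearance

rng = np.random.default_rng(1)
m_att, a_att = co_attend(rng.standard_normal((5, 8)),
                         rng.standard_normal((5, 8)))
```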

Question Answering Video Question Answering +1

Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

no code implementations CVPR 2018 Kan Chen, Jiyang Gao, Ram Nevatia

In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding.

Phrase Grounding

Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN

no code implementations 21 Nov 2017 Jiyang Gao, Zijian Guo, Zhen Li, Ram Nevatia

To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into one single model (one student network) to classify 100K object categories.
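The teacher-to-student transfer can be sketched as a standard distillation loss restricted to each specialist's label slice (an illustration under assumed names and shapes, not the paper's training code): the student's probabilities over a teacher's classes are pushed toward that teacher's soft predictions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, class_slice):
    """Cross-entropy from a specialist teacher's soft labels to the
    student's probabilities, restricted to the teacher's label slice."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits[class_slice])
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12)))

rng = np.random.default_rng(0)
student_logits = rng.standard_normal(10)   # student over 10 toy classes
teacher_logits = rng.standard_normal(4)    # specialist over classes 0..3
loss = distill_loss(student_logits, teacher_logits, slice(0, 4))
```

Summing such losses over dozens of specialists is what concentrates their knowledge into the single student; temperature scaling, common in distillation, is omitted here for brevity.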

General Classification Knowledge Distillation

Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation

no code implementations31 Jul 2017 Zhenheng Yang, Jiyang Gao, Ram Nevatia

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos.

Action Detection Frame +1

RED: Reinforced Encoder-Decoder Networks for Action Anticipation

1 code implementation16 Jul 2017 Jiyang Gao, Zhenheng Yang, Ram Nevatia

RED takes multiple history representations as input and learns to anticipate a sequence of future representations.
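The anticipation interface can be sketched as follows (a minimal numpy stand-in, not the paper's LSTM encoder-decoder or its reinforcement objective; the transition matrix and mean-pooled encoding are assumptions): a history of feature vectors is summarized into a state, which is then unrolled to produce a sequence of future representations.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1   # toy decoder transition (assumed)

def encode(history):
    """Summarize a sequence of history representations into one state."""
    return np.mean(history, axis=0)

def anticipate(state, n_steps):
    """Unroll the state to predict a sequence of future representations."""
    out = []
    for _ in range(n_steps):
        state = np.tanh(W @ state)   # one anticipation step
        out.append(state)
    return np.stack(out)

history = rng.standard_normal((6, 4))   # six past 4-dim feature vectors
future = anticipate(encode(history), n_steps=3)
```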

Action Anticipation

TALL: Temporal Activity Localization via Language Query

8 code implementations ICCV 2017 Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia

For evaluation, we adopt the TACoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA.

Temporal Localization

Cascaded Boundary Regression for Temporal Action Detection

no code implementations2 May 2017 Jiyang Gao, Zhenheng Yang, Ram Nevatia

CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows.
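The refinement step can be sketched as a cascade of offset predictors (an illustration only; the toy predictors below stand in for learned regressors, and the target segment is a made-up example): each stage predicts start/end offsets that move the sliding window closer to the action boundaries, and stages are applied in sequence.

```python
def refine(window, offsets):
    """Apply predicted (start, end) offsets to a temporal window."""
    s, e = window
    ds, de = offsets
    return (s + ds, e + de)

def cascaded_regression(window, stages):
    """Run the window through a cascade of boundary-offset predictors."""
    for predict in stages:
        window = refine(window, predict(window))
    return window

# Toy predictors standing in for learned regressors: each nudges the
# window halfway toward a ground-truth segment (10.0, 25.0).
target = (10.0, 25.0)
def make_stage(step):
    def predict(window):
        s, e = window
        return (step * (target[0] - s), step * (target[1] - e))
    return predict

refined = cascaded_regression((8.0, 30.0), [make_stage(0.5), make_stage(0.5)])
```

Each pass tightens the boundaries further, which is the point of cascading the regression rather than applying it once.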

Action Detection

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

1 code implementation ICCV 2017 Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia

Temporal Action Proposal (TAP) generation is an important problem, as fast and accurate extraction of semantically important segments (e.g., human actions) from untrimmed videos is an important step for large-scale video analysis.

Temporal Action Localization

ACD: Action Concept Discovery from Image-Sentence Corpora

no code implementations16 Apr 2016 Jiyang Gao, Chen Sun, Ram Nevatia

It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images.
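The verb-object extraction step can be sketched with a deliberately naive example (the paper would rely on proper linguistic parsing; the tiny verb lexicon and last-word-as-object rule here are toy assumptions purely for illustration):

```python
# Toy verb lexicon standing in for real verb identification (assumed).
VERBS = {"rides", "play", "throws"}

def extract_verb_object(sentence):
    """Return a (verb, object) candidate action concept, or None."""
    words = sentence.lower().rstrip(".").split()
    for i, w in enumerate(words):
        if w in VERBS and i + 1 < len(words):
            # Naive assumption: take the last word as the object.
            return (w, words[-1])
    return None

pairs = [extract_verb_object(s) for s in
         ["A man rides a horse.", "Kids play football."]]
```

In the full pipeline each such pair would then be checked for visualness against the images associated with its source sentences.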

Action Classification Classification +1
