no code implementations • CVPR 2021 • Lu Mi, Hang Zhao, Charlie Nash, Xiaohan Jin, Jiyang Gao, Chen Sun, Cordelia Schmid, Nir Shavit, Yuning Chai, Dragomir Anguelov
To address this issue, we introduce a new and challenging task: HD map generation.
4 code implementations • 19 Aug 2020 • Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp, Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai, Cordelia Schmid, Cong-Cong Li, Dragomir Anguelov
Our key insight is that for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states.
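Read concretely, this suggests a two-stage pipeline: score a discrete set of candidate targets, then decode one trajectory per selected target. Below is a minimal PyTorch sketch of the target-scoring stage; the module name, sizes, and the use of raw (x, y) candidates are our illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TargetScorer(nn.Module):
    """Scores a set of candidate target states (x, y) given an
    agent-context embedding. Illustrative sketch, not the paper's code."""
    def __init__(self, context_dim=128, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(context_dim + 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, context, targets):
        # context: (B, context_dim); targets: (B, N, 2) candidate end states
        ctx = context.unsqueeze(1).expand(-1, targets.size(1), -1)
        logits = self.mlp(torch.cat([ctx, targets], dim=-1)).squeeze(-1)
        return logits.softmax(dim=-1)  # one mode probability per target

# usage: keep the top-k targets, then decode one trajectory per chosen target
scorer = TargetScorer()
probs = scorer(torch.randn(4, 128), torch.randn(4, 64, 2))
topk = probs.topk(k=6, dim=-1).indices  # (4, 6) indices of likely modes
```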
no code implementations • CVPR 2020 • Zhishuai Zhang, Jiyang Gao, Junhua Mao, Yukai Liu, Dragomir Anguelov, Cong-Cong Li
For the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, establishing the state of the art for both tasks.
3 code implementations • CVPR 2020 • Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cong-Cong Li, Cordelia Schmid
Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g., pedestrians and vehicles) and road context information (e.g., lanes, traffic lights).
no code implementations • 17 Apr 2020 • Chuanzi He, Haidong Zhu, Jiyang Gao, Kan Chen, Ram Nevatia
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>.
no code implementations • 15 Oct 2019 • Yin Zhou, Pei Sun, Yu Zhang, Dragomir Anguelov, Jiyang Gao, Tom Ouyang, James Guo, Jiquan Ngiam, Vijay Vasudevan
In this paper, we aim to synergize the bird's-eye view and the perspective view and propose a novel end-to-end multi-view fusion (MVF) algorithm, which can effectively learn to utilize the complementary information from both.
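The fusion step can be pictured as gathering, for every LiDAR point, its feature from each view's feature map and combining them. The sketch below assumes precomputed integer pixel coordinates per point in each view; the function and argument names are hypothetical.

```python
import torch

def fuse_point_features(bev_feat, persp_feat, bev_idx, persp_idx):
    """Gather each LiDAR point's feature from both views and fuse them.
    Sketch under simplified indexing assumptions: *_feat are (C, H, W)
    view feature maps; *_idx are (N, 2) integer pixel coordinates of
    the N points projected into each view."""
    f_bev = bev_feat[:, bev_idx[:, 0], bev_idx[:, 1]]          # (C, N)
    f_persp = persp_feat[:, persp_idx[:, 0], persp_idx[:, 1]]  # (C, N)
    return torch.cat([f_bev, f_persp], dim=0).t()              # (N, 2C)
```

In a full pipeline these fused per-point features would then be pooled back into a grid for the detection head; that step is omitted here.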
no code implementations • ICCV 2019 • Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia
Compared with standard Faster R-CNN, it has three highlights: (1) an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and to improve mining precision; (2) masking the negative-sample loss in the box predictor to avoid the harm of false-negative labels; and (3) training the box-regression head only on seed annotations to eliminate the harm of inaccurate boundaries in mined bounding boxes.
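The second and third highlights amount to two small changes in the detection losses. A hedged sketch, with our own helper names and shapes (not the authors' code):

```python
import torch
import torch.nn.functional as F

def classification_loss(logits, labels, is_seed):
    """Mask the background (negative) loss for mined boxes, since an
    unmatched mined box may still cover a true object (false negative).
    Hypothetical helper; shapes: logits (N, K+1), labels (N,), is_seed (N,)."""
    per_box = F.cross_entropy(logits, labels, reduction="none")
    keep = is_seed | (labels > 0)    # drop negatives that come from mined boxes
    return (per_box * keep.float()).sum() / keep.float().sum().clamp(min=1)

def regression_loss(pred, target, is_seed):
    """Train the box-regression head only on seed (human) annotations,
    so inaccurate mined boundaries cannot corrupt it."""
    if not is_seed.any():
        return pred.sum() * 0.0      # keep the graph, contribute nothing
    return F.smooth_l1_loss(pred[is_seed], target[is_seed])
```

The design choice in both cases is the same: never let an unreliable mined label push the model in a direction a trusted seed label would contradict.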
3 code implementations • 21 Nov 2018 • Runzhou Ge, Jiyang Gao, Kan Chen, Ram Nevatia
Previous methods address the problem by taking features from video sliding windows and language queries and learning a subspace to encode their correlation; this ignores rich semantic cues about the activities in both videos and queries.
1 code implementation • ECCV 2018 • Jiyang Gao, Kan Chen, Ram Nevatia
Temporal action proposal generation is an important task: akin to object proposals, temporal action proposals are intended to capture "clips", or temporal intervals, in videos that are likely to contain an action.
Ranked #10 on Temporal Action Proposal Generation on ActivityNet-1.3
8 code implementations • 5 May 2018 • Jiyang Gao, Ram Nevatia
Although many methods for temporal modeling have been proposed, it is hard to compare them directly, because the choice of feature extractor and loss function also has a large impact on the final performance.
no code implementations • CVPR 2018 • Jiyang Gao, Runzhou Ge, Kan Chen, Ram Nevatia
Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network to generate multi-level contextual facts; (3) a dynamic fact ensemble method to construct temporal representation dynamically for different questions.
Ranked #28 on Visual Question Answering (VQA) on MSRVTT-QA
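A simplified reading of the co-memory attention in point (1): attention over one modality's features is generated from a query that mixes the other modality's summary with the question. Everything below (class name, single-query form, dimensions) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Attend over appearance features using a query built from BOTH the
    current motion summary and the question embedding; the symmetric
    direction (motion attended via appearance) would mirror this."""
    def __init__(self, dim=256):
        super().__init__()
        self.query = nn.Linear(2 * dim, dim)

    def forward(self, appearance, motion_summary, question):
        # appearance: (T, dim); motion_summary, question: (dim,)
        q = self.query(torch.cat([motion_summary, question]))  # (dim,)
        weights = (appearance @ q).softmax(dim=0)              # (T,)
        return weights @ appearance                            # attended (dim,)

attn = CoAttention()
out = attn(torch.randn(8, 256), torch.randn(256), torch.randn(256))
```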
no code implementations • CVPR 2018 • Kan Chen, Jiyang Gao, Ram Nevatia
In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding.
no code implementations • 21 Nov 2017 • Jiyang Gao, Zijian Guo, Zhen Li, Ram Nevatia
To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into one single model (one student network) to classify 100K object categories.
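Mechanically, this is multi-teacher distillation in which each specialist supervises only its own slice of the student's 100K-way output. A minimal sketch, assuming each teacher's label set maps to a contiguous block of student logits (the function name and `slices` argument are ours):

```python
import torch
import torch.nn.functional as F

def concentration_loss(student_logits, teacher_logits_list, slices, T=2.0):
    """Distill several specialist teachers into one student. Each teacher
    covers a slice of the full label space; `slices` maps teacher i to the
    student logit columns it owns (e.g., slice(0, 1000)). Hedged sketch."""
    loss = 0.0
    for t_logits, sl in zip(teacher_logits_list, slices):
        s = student_logits[:, sl] / T          # student view of this slice
        t = (t_logits / T).softmax(dim=-1)     # soft targets from the teacher
        loss = loss + F.kl_div(s.log_softmax(dim=-1), t, reduction="batchmean")
    return loss * T * T                        # usual temperature scaling
```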
no code implementations • 31 Jul 2017 • Zhenheng Yang, Jiyang Gao, Ram Nevatia
In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos.
1 code implementation • 16 Jul 2017 • Jiyang Gao, Zhenheng Yang, Ram Nevatia
RED takes multiple history representations as input and learns to anticipate a sequence of future representations.
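A minimal encoder-decoder sketch of that idea: encode the history, then roll the decoder forward, feeding each predicted representation back in as the next input. The architecture choices here (GRU, sizes, seeding with the last observed frame) are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class RED(nn.Module):
    """Reads a history of frame representations and regresses a sequence
    of future representations; a minimal sketch of the idea."""
    def __init__(self, dim=512, steps=5):
        super().__init__()
        self.steps = steps
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRUCell(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, history):                # history: (B, T, dim)
        _, h = self.encoder(history)           # h: (1, B, dim)
        h, inp = h.squeeze(0), history[:, -1]  # seed decoder with last frame
        futures = []
        for _ in range(self.steps):            # anticipate step by step
            h = self.decoder(inp, h)
            inp = self.out(h)                  # predicted next representation
            futures.append(inp)
        return torch.stack(futures, dim=1)     # (B, steps, dim)
```

A classifier applied to each anticipated representation can then predict actions before the corresponding frames are observed.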
12 code implementations • ICCV 2017 • Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia
For evaluation, we adopt the TACoS dataset and build a new dataset for this task, called Charades-STA, on top of Charades by adding sentence temporal annotations.
no code implementations • 2 May 2017 • Jiyang Gao, Zhenheng Yang, Ram Nevatia
CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows.
Ranked #6 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)
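The boundary regression above can be pictured as a small head that predicts start/end offsets for a window's features, applied in a cascade: the refined window is re-featurized and regressed again. The sketch below fakes the feature step with random tensors to stay self-contained; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class BoundaryRegressor(nn.Module):
    """Predicts (start, end) offsets for a window and applies them.
    Sketch only; feature extraction and offset units are omitted."""
    def __init__(self, feat_dim=500):
        super().__init__()
        self.head = nn.Linear(feat_dim, 2)   # offsets for start and end

    def forward(self, feats, windows):
        # feats: (N, feat_dim) per window; windows: (N, 2) start/end seconds
        return windows + self.head(feats)    # refined boundaries

reg = BoundaryRegressor()
win = torch.tensor([[10.0, 14.0]])
for _ in range(3):                           # cascaded refinement
    # in practice, features are re-extracted from the refined window;
    # random features here only keep the sketch runnable
    win = reg(torch.randn(1, 500), win)
```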
1 code implementation • ICCV 2017 • Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia
Temporal Action Proposal (TAP) generation is an important problem, as the fast and accurate extraction of semantically significant (e.g., human-action) segments from untrimmed videos is a key step in large-scale video analysis.
Ranked #8 on Action Recognition on THUMOS’14
no code implementations • 8 Sep 2016 • Jiyang Gao, Ram Nevatia
Moreover, the action categories in such datasets are pre-defined and their vocabularies are fixed.
no code implementations • 16 Apr 2016 • Jiyang Gao, Chen Sun, Ram Nevatia
It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images.
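The verb-object extraction step can be approximated with an off-the-shelf dependency parser. A rough sketch using spaCy (assuming the `en_core_web_sm` model is installed); the visualness check against the associated images would be a separate classifier, omitted here:

```python
import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def verb_object_pairs(sentence):
    """Extract (verb, direct-object) pairs as candidate action concepts;
    a rough approximation of the mining step described above."""
    doc = nlp(sentence)
    return [(tok.head.lemma_, tok.lemma_)
            for tok in doc
            if tok.dep_ == "dobj" and tok.head.pos_ == "VERB"]

print(verb_object_pairs("A man is riding a horse and walking a dog."))
# e.g. [('ride', 'horse'), ('walk', 'dog')]
```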