2 code implementations • CVPR 2017 • Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang
That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.
no code implementations • 31 Jan 2017 • Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang
In this paper we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame.
no code implementations • CVPR 2018 • Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang
Video captioning is the task of automatically generating a textual description of the actions in a video.
Hierarchical Reinforcement Learning, Reinforcement Learning
1 code implementation • NAACL 2018 • Xin Wang, Yuan-Fang Wang, William Yang Wang
Furthermore, for the first time, we validate the superior performance of the deep audio features on the video captioning task.
2 code implementations • ACL 2018 • Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang
Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem.
Ranked #13 on Visual Storytelling on VIST
1 code implementation • 21 Jul 2018 • Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang
In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network.
no code implementations • 7 Aug 2018 • Da Zhang, Xiyang Dai, Yuan-Fang Wang
(3) We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and demonstrate that both local and global temporal contexts are important.
no code implementations • CVPR 2019 • Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.
Ranked #2 on Vision-Language Navigation on Room2Room
no code implementations • CVPR 2019 • Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis
In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network.
2 code implementations • ICCV 2019 • Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.
no code implementations • 8 Jan 2022 • Jian-wei Liu, Yuan-Fang Wang, Run-kun Lu, Xionglin Luo
But not all of this information is useful for classification tasks.
no code implementations • 9 Jan 2022 • Run-kun Lu, Jian-wei Liu, Yuan-Fang Wang, Hao-jie Xie, Xin Zuo
The auto-encoder is a well-known deep learning method that learns latent features of raw data by reconstructing its input. Building on this, we propose a novel algorithm called Auto-encoder based Co-training Multi-View Learning (ACMVL), which exploits both complementarity and consistency to find a joint latent feature representation of multiple views.
no code implementations • 28 Sep 2022 • Haotian Xia, Rhys Tracy, Yun Zhao, Erwan Fraisse, Yuan-Fang Wang, Linda Petzold
The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and apply the language to our dataset.
no code implementations • 22 Aug 2023 • Rhys Tracy, Haotian Xia, Alex Rasla, Yuan-Fang Wang, Ambuj Singh
Our results show that the use of GNNs with our graph encoding yields a much more advanced analysis of the data, which noticeably improves prediction results overall.
1 code implementation • 26 Sep 2023 • Haotian Xia, Rhys Tracy, Yun Zhao, Yuqing Wang, Yuan-Fang Wang, Weining Shen
Our frameworks combine setting ball trajectory recognition with a novel set trajectory classifier to generate comprehensive and advanced statistical data.
no code implementations • 24 Feb 2024 • Haotian Xia, Zhengbang Yang, Yuqing Wang, Rhys Tracy, Yun Zhao, Dongdong Huang, Zezhi Chen, Yan Zhu, Yuan-Fang Wang, Weining Shen
A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP).