no code implementations • 5 Jun 2024 • Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu
However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully exploit their emergent capabilities.
no code implementations • 11 May 2024 • Phoebe Jing, Yijing Gao, Yuanhang Zhang, Xianlong Zeng
In the realm of predictive analytics, the nuanced domain knowledge of investigators often remains underutilized, confined largely to subjective interpretations and ad hoc decision-making.
no code implementations • 15 Mar 2024 • Yuanhang Zhang, Zhidi Lin, Yiyong Sun, Feng Yin, Carsten Fritsche
Deep state-space models (DSSMs) have gained popularity in recent years due to their potent modeling capacity for dynamic systems.
no code implementations • CVPR 2024 • Yuanhang Zhang, Shuang Yang, Shiguang Shan, Xilin Chen
While many recent approaches for this task primarily rely on guiding the learning process using the audio modality alone to capture information shared between audio and video, we reframe the problem as the acquisition of shared, unique (modality-specific), and synergistic speech information to address the inherent asymmetry between the modalities.
no code implementations • 13 Jul 2023 • Yuanhang Zhang, Jundong Liu
Path planning plays a crucial role in various autonomy applications, and RRT* is one of the leading solutions in this field.
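For context, RRT* (the algorithm this entry builds on) incrementally grows a tree of collision-free states toward randomly sampled points, and improves on plain RRT by re-parenting and rewiring nearby nodes to lower path cost. The following is a minimal generic 2D sketch of that idea, not the paper's method; the function names, the point-only collision check, and the omission of edge collision checks and descendant cost propagation are simplifying assumptions.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def steer(a, b, step):
    """Move from a toward b by at most `step`."""
    d = dist(a, b)
    if d <= step:
        return b
    t = step / d
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def rrt_star(start, goal, is_free, bounds, iters=2000, step=0.5, radius=1.0, seed=0):
    """Simplified RRT*: sample, steer, choose cheapest nearby parent, rewire.

    `is_free(p)` checks a single point for collision (edges are not checked
    here, a simplification); `bounds` is ((xmin, xmax), (ymin, ymax)).
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {start: None}
    cost = {start: 0.0}
    for _ in range(iters):
        sample = (rng.uniform(*bounds[0]), rng.uniform(*bounds[1]))
        nearest = min(nodes, key=lambda n: dist(n, sample))
        new = steer(nearest, sample, step)
        if not is_free(new) or new in parent:
            continue
        # Choose-parent step: cheapest node within `radius` of the new state.
        near = [n for n in nodes if dist(n, new) <= radius]
        best = min(near, key=lambda n: cost[n] + dist(n, new))
        parent[new] = best
        cost[new] = cost[best] + dist(best, new)
        nodes.append(new)
        # Rewire step: reroute neighbours through `new` when that is cheaper.
        for n in near:
            c = cost[new] + dist(new, n)
            if c < cost[n]:
                parent[n] = new
                cost[n] = c
    # Extract a path by walking parents back from the node nearest the goal.
    nearest = min(nodes, key=lambda n: dist(n, goal))
    path = [goal]
    n = nearest
    while n is not None:
        path.append(n)
        n = parent[n]
    return path[::-1]
```

The rewiring loop is what distinguishes RRT* from RRT: as the tree densifies, path costs converge toward optimal, which is also the behaviour that learned or heuristic variants of RRT* typically try to accelerate.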
no code implementations • 26 May 2023 • Zaibin Zhang, Yuanhang Zhang, Lijun Wang, Yifan Wang, Huchuan Lu
At the core of our method is the newly-designed instance occupancy prediction (IOP) module, which aims to infer point-level occupancy status for each instance in the frustum space.
no code implementations • 22 Jun 2022 • Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan
This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022.
2 code implementations • 24 Apr 2022 • Zhuohao Li, Fandi Gou, Qixin De, Leqi Ding, Yuanhang Zhang, Yunze Cai
The innovation of our method is the use of information fusion to compensate for the insufficient frame rate of the output image and to improve the robustness of object detection and depth estimation under monocular vision. Object detection is based on YOLO-v5.
no code implementations • 5 Aug 2021 • Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen
Our solution is a novel, unified framework that focuses on jointly modeling multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties.
no code implementations • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan
This report presents a brief description of our method for the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2021.
no code implementations • 15 Feb 2020 • Nicolas K. Fontaine, Yuanhang Zhang, Haoshuo Chen, Roland Ryf, David T. Neilson, Guifang Li, Mark Cappuzzo, Rose Kopf, Al Tate, Hugo Safar, Cristian Bolle, Mark Earnshaw, Joel Carpenter
We designed, fabricated and tested an optical hybrid that supports an octave of bandwidth (900-1800 nm) and below 4-dB insertion loss using multiplane light conversion.
no code implementations • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2019 • Yuanhang Zhang, Jingyun Xiao, Shuang Yang, Shiguang Shan
This report describes the approach underlying our submission to the active speaker detection task (task B-2) of ActivityNet Challenge 2019.
Ranked #18 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker (using extra training data)