no code implementations • 5 Jun 2024 • Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu
However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully exploit their emergent capabilities.
no code implementations • 11 May 2024 • Phoebe Jing, Yijing Gao, Yuanhang Zhang, Xianlong Zeng
In the realm of predictive analytics, the nuanced domain knowledge of investigators often remains underutilized, confined largely to subjective interpretations and ad hoc decision-making.
no code implementations • 15 Mar 2024 • Yuanhang Zhang, Zhidi Lin, Yiyong Sun, Feng Yin, Carsten Fritsche
Deep state-space models (DSSMs) have gained popularity in recent years due to their potent modeling capacity for dynamic systems.
no code implementations • CVPR 2024 • Yuanhang Zhang, Shuang Yang, Shiguang Shan, Xilin Chen
While many recent approaches for this task primarily rely on guiding the learning process using the audio modality alone to capture information shared between audio and video, we reframe the problem as the acquisition of shared, unique (modality-specific), and synergistic speech information to address the inherent asymmetry between the modalities.
no code implementations • 13 Jul 2023 • Yuanhang Zhang, Jundong Liu
Path planning plays a crucial role in various autonomy applications, and RRT* is one of the leading solutions in this field.
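For context, RRT* (the algorithm this entry builds on) incrementally grows a tree of collision-free states toward randomly sampled points, and improves on plain RRT by re-parenting and rewiring nearby nodes to lower path cost. The following is a minimal generic 2D sketch of that idea, not the paper's method; the function names, the point-only collision check, and the omission of edge collision checks and descendant cost propagation are simplifying assumptions.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def steer(a, b, step):
    """Move from a toward b by at most `step`."""
    d = dist(a, b)
    if d <= step:
        return b
    t = step / d
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def rrt_star(start, goal, is_free, bounds, iters=2000, step=0.5, radius=1.0, seed=0):
    """Simplified RRT*: sample, steer, choose cheapest nearby parent, rewire.

    `is_free(p)` checks a single point for collision (edges are not checked
    here, a simplification); `bounds` is ((xmin, xmax), (ymin, ymax)).
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {start: None}
    cost = {start: 0.0}
    for _ in range(iters):
        sample = (rng.uniform(*bounds[0]), rng.uniform(*bounds[1]))
        nearest = min(nodes, key=lambda n: dist(n, sample))
        new = steer(nearest, sample, step)
        if not is_free(new) or new in parent:
            continue
        # Choose-parent step: cheapest node within `radius` of the new state.
        near = [n for n in nodes if dist(n, new) <= radius]
        best = min(near, key=lambda n: cost[n] + dist(n, new))
        parent[new] = best
        cost[new] = cost[best] + dist(best, new)
        nodes.append(new)
        # Rewire step: reroute neighbours through `new` when that is cheaper.
        for n in near:
            c = cost[new] + dist(new, n)
            if c < cost[n]:
                parent[n] = new
                cost[n] = c
    # Extract a path by walking parents back from the node nearest the goal.
    nearest = min(nodes, key=lambda n: dist(n, goal))
    path = [goal]
    n = nearest
    while n is not None:
        path.append(n)
        n = parent[n]
    return path[::-1]
```

The rewiring loop is what distinguishes RRT* from RRT: as the tree densifies, path costs converge toward optimal, which is also the behaviour that learned or heuristic variants of RRT* typically try to accelerate.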
no code implementations • 26 May 2023 • Zaibin Zhang, Yuanhang Zhang, Lijun Wang, Yifan Wang, Huchuan Lu
At the core of our method is the newly-designed instance occupancy prediction (IOP) module, which aims to infer point-level occupancy status for each instance in the frustum space.
no code implementations • 22 Jun 2022 • Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan
This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022.
2 code implementations • 24 Apr 2022 • Zhuohao Li, Fandi Gou, Qixin De, Leqi Ding, Yuanhang Zhang, Yunze Cai
The innovation of our method is the use of information fusion to compensate for the insufficient frame rate of the output image and to improve the robustness of object detection and depth estimation under monocular vision. Object detection is based on YOLO-v5.
no code implementations • 5 Aug 2021 • Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen
Our solution is a novel, unified framework that focuses on jointly modeling multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties.
no code implementations • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan
This report presents a brief description of our method for the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2021.
no code implementations • 15 Feb 2020 • Nicolas K. Fontaine, Yuanhang Zhang, Haoshuo Chen, Roland Ryf, David T. Neilson, Guifang Li, Mark Cappuzzo, Rose Kopf, Al Tate, Hugo Safar, Cristian Bolle, Mark Earnshaw, Joel Carpenter
We designed, fabricated and tested an optical hybrid that supports an octave of bandwidth (900-1800 nm) and below 4-dB insertion loss using multiplane light conversion.
no code implementations • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2019 • Yuanhang Zhang, Jingyun Xiao, Shuang Yang, Shiguang Shan
This report describes the approach underlying our submission to the active speaker detection task (task B-2) of ActivityNet Challenge 2019.
Ranked #18 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker (using extra training data)