Search Results for author: Hao Shao

Found 22 papers, 15 papers with code

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

no code implementations15 Dec 2024 Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li

Our approach effectively addresses key challenges in video face swapping, mitigating temporal flickering while improving identity preservation and robustness to occlusions and pose variations.

3D Reconstruction Attribute +3

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

no code implementations12 Dec 2024 Zhuofan Zong, Dongzhi Jiang, Bingqi Ma, Guanglu Song, Hao Shao, Dazhong Shen, Yu Liu, Hongsheng Li

To effectively exploit consistent visual elements within multiple images, we leverage the multi-image comprehension and instruction-following capabilities of the multimodal large language model (MLLM), prompting it to capture these consistent elements based on the instruction.

Image Comprehension Image Generation +4
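As a rough illustration of how such a multi-image reference prompt might be assembled, the Python sketch below interleaves several reference images with the user instruction before handing them to an MLLM. The message format and field names are assumptions made for illustration, not EasyRef's actual interface.

# Hypothetical multi-image reference prompt; not EasyRef's real API.
from dataclasses import dataclass
from typing import List

@dataclass
class ReferencePrompt:
    image_paths: List[str]   # group of reference images
    instruction: str         # user instruction guiding generation

    def to_messages(self) -> list:
        # Present all images together so the MLLM can attend to every
        # reference jointly and extract their shared visual elements.
        content = [{"type": "image", "path": p} for p in self.image_paths]
        content.append({
            "type": "text",
            "text": ("Identify the visual elements consistent across all "
                     f"images, then follow this instruction: {self.instruction}"),
        })
        return [{"role": "user", "content": content}]

prompt = ReferencePrompt(
    image_paths=["ref_0.png", "ref_1.png", "ref_2.png"],
    instruction="generate the same character riding a bicycle",
)
messages = prompt.to_messages()  # feed to a multimodal LLM of your choice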

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

1 code implementation11 Oct 2024 Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu

Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits and main metrics.

Autonomous Vehicles motion prediction +3

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

1 code implementation19 Apr 2024 Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts.

Language Modelling Large Language Model
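A minimal sketch of what context-aware expert routing could look like, assuming a learned context embedding and one expertise embedding per vision expert; the cosine-similarity scoring and top-k selection below are illustrative stand-ins for the paper's routing strategy, not its exact method.

# Hedged sketch of coarse-grained expert routing in the spirit of MoVA.
import torch
import torch.nn.functional as F

def route_experts(context_emb: torch.Tensor,
                  expert_embs: torch.Tensor,
                  top_k: int = 2) -> torch.Tensor:
    """Pick the vision experts whose expertise embedding best matches
    the fused (instruction + image) context embedding.

    context_emb: (d,)  embedding of the user instruction and input image
    expert_embs: (num_experts, d)  one embedding per expert's expertise
    returns: indices of the top_k selected experts
    """
    scores = F.cosine_similarity(context_emb.unsqueeze(0), expert_embs, dim=-1)
    return scores.topk(top_k).indices

context = torch.randn(256)      # toy fused context embedding
experts = torch.randn(5, 256)   # e.g. OCR, grounding, chart experts
selected = route_experts(context, experts)  # dynamically chosen experts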

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

1 code implementation25 Mar 2024 Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions.

Visual Question Answering (VQA)
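For intuition, a Visual CoT-style training sample pairs a question with both the final answer and an intermediate bounding box over the key region; the record layout below uses assumed field names, not the released dataset's exact schema.

# Illustrative sample record; field names are assumptions.
import json

sample = {
    "image": "images/000123.jpg",
    "question": "What is written on the sign behind the cyclist?",
    "bbox": [412, 105, 588, 190],   # x1, y1, x2, y2 of the key region
    "answer": "No parking",
}

# A model trained with visual chain-of-thought first predicts the box,
# attends to that region, then produces the final answer.
print(json.dumps(sample, indent=2))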

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

1 code implementation CVPR 2024 Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu

Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction.

Autonomous Vehicles motion prediction +1

MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention

1 code implementation14 Dec 2023 Hao Shao, Quansheng Zeng, Qibin Hou, Jufeng Yang

To handle the significant variations in the size and shape of lesion regions or organs, we also use multiple strip-shaped convolution kernels of different sizes in each axial attention path, improving the efficiency of the proposed MCA in encoding spatial information.

Image Segmentation Lesion Segmentation +4
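A hedged PyTorch sketch of one axial path's multi-scale strip convolutions: several depthwise 1 x k kernels are applied in parallel and summed, so elongated regions of different sizes are covered cheaply. The kernel sizes and the depthwise design are assumptions for illustration, not MCANet's exact configuration.

# Multi-scale horizontal strip convolutions; a vertical (k x 1) twin
# would serve the other axial attention path.
import torch
import torch.nn as nn

class StripConvBranch(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(7, 11, 21)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(1, k),
                      padding=(0, k // 2), groups=channels)  # depthwise
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the responses of all strip scales.
        return sum(branch(x) for branch in self.branches)

feat = torch.randn(1, 64, 56, 56)   # toy feature map
out = StripConvBranch(64)(feat)     # same shape, multi-scale context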

Polyper: Boundary Sensitive Polyp Segmentation

1 code implementation14 Dec 2023 Hao Shao, Yang Zhang, Qibin Hou

We present a new boundary-sensitive framework for polyp segmentation, called Polyper.

Segmentation

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

2 code implementations CVPR 2024 Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li

On the other hand, previous autonomous driving methods tend to rely on limited-format inputs (e.g., sensor data and navigation waypoints), restricting the vehicle's ability to understand language information and interact with humans.

Autonomous Driving Instruction Following

ReasonNet: End-to-End Driving with Temporal and Global Reasoning

no code implementations CVPR 2023 Hao Shao, Letian Wang, RuoBing Chen, Steven L. Waslander, Hongsheng Li, Yu Liu

The large-scale deployment of autonomous vehicles is yet to come, and one of the major remaining challenges lies in dense urban traffic scenarios.

Autonomous Driving

Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors

1 code implementation8 May 2023 Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, RuoBing Chen, Yu Liu, Steven L. Waslander

Inspired by this, we propose ASAP-RL, an efficient reinforcement learning algorithm for autonomous driving that simultaneously leverages motion skills and expert priors.

Autonomous Driving reinforcement-learning

Blending Anti-Aliasing into Vision Transformer

no code implementations NeurIPS 2021 Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

In this work, we analyze the previously uncharted problem of aliasing in vision transformers and explore how to incorporate anti-aliasing properties.
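One common way to inject anti-aliasing into a downsampling step is a BlurPool-style low-pass filter applied before striding; the sketch below shows that general idea on ViT tokens reshaped to a 2D map, and should not be read as the paper's exact module.

# Aliasing-aware downsampling: depthwise binomial (roughly Gaussian)
# blur, then stride-2 subsampling, so high frequencies are attenuated
# before sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurDownsample(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        kernel = torch.outer(k, k)
        kernel = kernel / kernel.sum()
        self.register_buffer("kernel",
                             kernel.expand(channels, 1, 3, 3).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.conv2d(x, self.kernel, padding=1, groups=x.shape[1])
        return x[..., ::2, ::2]  # subsample after low-pass filtering

tokens_as_map = torch.randn(1, 192, 28, 28)  # ViT tokens reshaped to 2D
out = BlurDownsample(192)(tokens_as_map)     # (1, 192, 14, 14)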

Learning Compact and Representative Features for Cross-Modality Person Re-Identification

1 code implementation26 Mar 2021 Guangwei Gao, Hao Shao, Fei Wu, Meng Yang, Yi Yu

This paper pays close attention to the cross-modality visible-infrared person re-identification (VI Re-ID) task, which aims to match pedestrian samples across the visible and infrared modalities.

Cross-Modality Person Re-identification Knowledge Distillation +2

Self-supervised Temporal Learning

no code implementations1 Jan 2021 Hao Shao, Yu Liu, Hongsheng Li

Inspired by spatial-based contrastive SSL, we show that significant improvement can be achieved by the proposed temporal-based contrastive learning approach, which includes three novel and efficient modules: temporal augmentations, a temporal memory bank, and the SSTL loss.

Contrastive Learning Retrieval +3
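As a simplified illustration of temporal-based contrastive learning, the toy loss below treats two temporally augmented clips of the same video as a positive pair and other videos in the batch as negatives; the full SSTL loss and the temporal memory bank are not reproduced here.

# Toy temporal InfoNCE loss; a simplification, not the paper's SSTL loss.
import torch
import torch.nn.functional as F

def temporal_info_nce(z_a: torch.Tensor, z_b: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two temporally augmented
    clips per video; row i of z_a matches row i of z_b."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))    # diagonal entries = positives
    return F.cross_entropy(logits, targets)

loss = temporal_info_nce(torch.randn(8, 128), torch.randn(8, 128))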

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020

no code implementations20 Jul 2020 Haisheng Su, Jinyuan Feng, Hao Shao, Zhenyu Jiang, Manyuan Zhang, Wei Wu, Yu Liu, Hongsheng Li, Junjie Yan

Specifically, in order to generate high-quality proposals, we consider several factors, including the video feature encoder, the proposal generator, proposal-proposal relations, scale imbalance, and the ensemble strategy.

Diversity Temporal Action Localization

1st place solution for AVA-Kinetics Crossover in ActivityNet Challenge 2020

2 code implementations16 Jun 2020 Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.

Relation Network Spatio-Temporal Action Localization +1

Top-1 Solution of Multi-Moments in Time Challenge 2019

1 code implementation12 Mar 2020 Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan

In this technical report, we briefly introduce the solutions of our team 'Efficient' for the Multi-Moments in Time challenge in ICCV 2019.

Action Recognition Video Understanding

Temporal Interlacing Network

4 code implementations17 Jan 2020 Hao Shao, Shengju Qian, Yu Liu

In this way, a heavy temporal model is replaced by a simple interlacing operator.

Optical Flow Estimation Video Understanding
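For intuition, a fixed-offset cousin of the interlacing operator can be written in a few lines: slices of channels are shifted one step forward or backward in time, mixing information across frames with zero extra parameters. TIN itself learns its shift offsets, so the version below is a deliberate simplification.

# Fixed-offset temporal interlacing sketch; TIN learns the offsets.
import torch

def interlace(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (batch, time, channels, h, w). Shift the first C/fold_div
    channels one step back in time, the next C/fold_div one step
    forward, and leave the rest untouched."""
    c = x.shape[2]
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift toward past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift toward future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # no shift
    return out

clip = torch.randn(2, 8, 64, 14, 14)  # batch of 8-frame feature maps
mixed = interlace(clip)               # temporal mixing, no new weights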

CCKS 2019 Shared Task on Inter-Personal Relationship Extraction

1 code implementation29 Aug 2019 Haitao Wang, Zhengqiu He, Tong Zhu, Hao Shao, Wenliang Chen, Min Zhang

In this paper, we present the task definition, the description of data and the evaluation methodology used during this shared task.

Sentence
