X-VILA: Cross-Modality Alignment for Large Language Model

no code implementations · 29 May 2024 · Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.

Instruction Following, Language Modelling

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

no code implementations · CVPR 2024 · Hanrong Ye, Dan Xu

DiffusionMTL designs a joint diffusion and denoising paradigm to model the potentially noisy distribution of task predictions or feature maps and to generate rectified outputs for different tasks.

Denoising, Scene Understanding

SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis

no code implementations · 6 Nov 2023 · Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu

On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation.

Image Generation, Image Segmentation

TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts

1 code implementation · ICCV 2023 · Hanrong Ye, Dan Xu

To model long-range dependencies among the task-specific representations from different layers of TaskExpert, we design a multi-task feature memory that is updated at each layer and acts as an additional feature expert for dynamic, task-specific feature decoding.

Long-range modeling, Multi-Task Learning
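As a toy illustration of the TaskExpert idea of a feature memory that is updated layer by layer and acts as one more expert when decoding task-specific features, the Python sketch below mixes expert outputs and the memory with a softmax gate. The gate scores, the helper names (`decode_task_feature`, `update_memory`) and the momentum-style memory update are hypothetical choices made for this sketch; they are not the paper's formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(vectors, weights):
    """Element-wise weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def decode_task_feature(expert_outputs, memory, gate_scores):
    """Mix expert outputs with the memory, which is treated as one extra expert.

    gate_scores needs len(expert_outputs) + 1 entries; the last one scores
    the memory. Here they are passed in directly (hypothetical); a trained
    model would predict them per task from the input.
    """
    candidates = expert_outputs + [memory]
    if len(gate_scores) != len(candidates):
        raise ValueError("need one gate score per expert plus one for the memory")
    return weighted_sum(candidates, softmax(gate_scores))

def update_memory(memory, layer_feature, momentum=0.5):
    """Hypothetical per-layer update: blend the old memory with the new feature."""
    return [momentum * m + (1.0 - momentum) * f for m, f in zip(memory, layer_feature)]

# Usage: two layers, two experts per layer, 2-D features, uniform gate scores.
memory = [0.0, 0.0]
for layer_experts in ([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]):
    feature = decode_task_feature(layer_experts, memory, [0.0, 0.0, 0.0])
    memory = update_memory(memory, feature)
```

Because the memory enters the gate like any other expert, features from earlier layers can influence decoding at later layers without extra wiring; how the real model scores and updates the memory is not specified here.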

Contrastive Multi-Task Dense Prediction

no code implementations · 16 Jul 2023 · Siwei Yang, Hanrong Ye, Dan Xu

A core design objective is to model cross-task interactions effectively, so that the inherent complementarity and consistency among tasks yield a comprehensive improvement across all of them.

Contrastive Learning, Representation Learning

InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding

1 code implementation · 8 Jun 2023 · Hanrong Ye, Dan Xu

We then design a transformer decoder that establishes global spatial and cross-task interaction, together with a novel UP-Transformer block that gradually increases the resolution of the multi-task features and establishes cross-task interaction at different scales.

Decoder, Multi-Task Learning

Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation

1 code implementation · 3 Apr 2023 · Hanrong Ye, Dan Xu

TaskPrompter introduces a new multi-task benchmark based on the Cityscapes-3D dataset, which requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation.

3D Object Detection, Autonomous Driving

InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding

1 code implementation · 15 Mar 2022 · Hanrong Ye, Dan Xu

Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction.

Boundary Detection, Human Parsing

MOFA: Modular Factorial Design for Hyperparameter Optimization

no code implementations · 18 Nov 2020 · Bo Xiong, Yimin Huang, Hanrong Ye, Steffen Staab, Zhenguo Li

MOFA runs several rounds of hyperparameter optimization (HPO), where each round alternates between exploring the hyperparameter space by factorial design and exploiting the evaluation results by factorial analysis.

Hyperparameter Optimization, Model Selection
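The explore/exploit round described for MOFA can be illustrated with a generic design-of-experiments sketch: enumerate a full factorial design over a few hyperparameter levels, score every point, then use factorial analysis (main effects, i.e. the mean score at each level of each factor) to pick promising levels. This is a minimal sketch assuming a full factorial design and a toy scoring function; the function names are hypothetical, and it is not MOFA's actual algorithm, which may use other designs and range-narrowing rules.

```python
import itertools
from statistics import mean

def full_factorial(levels):
    """Exploration: enumerate every combination of levels.

    levels maps hyperparameter name -> list of candidate values.
    Returns one dict per design point.
    """
    names = list(levels)
    grids = (levels[n] for n in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*grids)]

def main_effects(design, scores):
    """Factorial analysis: mean score observed at each level of each factor."""
    effects = {}
    for name in design[0]:
        by_level = {}
        for point, score in zip(design, scores):
            by_level.setdefault(point[name], []).append(score)
        effects[name] = {level: mean(vals) for level, vals in by_level.items()}
    return effects

def best_levels(effects):
    """Exploitation: per factor, keep the level with the highest mean score."""
    return {name: max(per_level, key=per_level.get) for name, per_level in effects.items()}

def toy_score(point):
    """Stand-in for a validation score (hypothetical); higher is better."""
    return -((point["lr"] - 0.01) ** 2) * 1e4 - (point["depth"] - 4) ** 2

design = full_factorial({"lr": [0.001, 0.01, 0.1], "depth": [2, 4, 8]})
scores = [toy_score(p) for p in design]
best = best_levels(main_effects(design, scores))  # best == {"lr": 0.01, "depth": 4}
```

In a multi-round scheme, the selected levels would seed a narrower set of levels for the next round; a full factorial grid grows exponentially with the number of factors, which is why practical methods often use fractional designs instead.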

An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization

1 code implementation · 11 Jul 2020 · Yimin Huang, Yu-Jun Li, Hanrong Ye, Zhenguo Li, Zhihua Zhang

Evaluating hyperparameters, neural architectures, or data augmentation policies becomes a critical model selection problem in advanced deep learning, where the hyperparameter search space is large.

Bayesian Optimization, Data Augmentation

Bi-directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification

1 code implementation · 1 Jun 2020 · Hanrong Ye, Hong Liu, Fanyang Meng, Xia Li

Because an angularly discriminative feature space is important for classifying human images by their embedding vectors, we propose a novel ranking loss, the Bi-directional Exponential Angular Triplet Loss, which helps learn an angularly separable common feature space by explicitly constraining the included angles between embedding vectors.

Person Re-Identification
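To make "constraining the included angles between embedding vectors" concrete, here is a plain angular triplet loss in Python: it penalizes an anchor-positive angle that is not smaller than the anchor-negative angle by at least a margin. The symmetric RGB-to-IR / IR-to-RGB averaging is an assumed reading of "bi-directional", the margin value is arbitrary, and the exponential term of the named loss is deliberately omitted because its exact form is not given here; treat this as a generic sketch, not the paper's loss.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angle(u, v):
    """Included angle (radians) between two non-zero vectors."""
    u, v = normalize(u), normalize(v)
    cos = sum(a * b for a, b in zip(u, v))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.acos(cos)

def angular_triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on angles: angle(anchor, positive) should be smaller than
    angle(anchor, negative) by at least `margin` radians."""
    return max(0.0, angle(anchor, positive) - angle(anchor, negative) + margin)

def bidirectional_angular_loss(rgb, ir, rgb_neg, ir_neg, margin=0.3):
    """Hypothetical symmetric form: RGB anchor vs IR positive/negative,
    plus IR anchor vs RGB positive/negative, averaged.
    rgb and ir are embeddings of the same identity."""
    rgb_to_ir = angular_triplet_loss(rgb, ir, ir_neg, margin)
    ir_to_rgb = angular_triplet_loss(ir, rgb, rgb_neg, margin)
    return 0.5 * (rgb_to_ir + ir_to_rgb)
```

Working with angles rather than Euclidean distances makes the loss invariant to embedding norm, which matches the stated goal of an angularly separable feature space.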

Video Logo Retrieval based on local Features

1 code implementation · 11 Aug 2018 · Bochen Guan, Hanrong Ye, Hong Liu, William A. Sethares

Estimating how often and for how long logos appear in videos is an important and challenging problem in the advertising industry, since it is a way to gauge the impact of ad purchases.

Image Retrieval, Retrieval

