Search Results for author: Wanli Ouyang

Found 300 papers, 140 papers with code

GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI

no code implementations2 Sep 2024 Xiangyuan Xue, Zeyu Lu, Di Huang, Wanli Ouyang, Lei Bai

Much previous AI research has focused on developing monolithic models to maximize their intelligence and capability, with the primary goal of enhancing performance on specific tasks.

MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling

no code implementations20 Aug 2024 Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

In an era of frequent extreme weather and global warming, obtaining precise, fine-grained near-surface weather forecasts is increasingly essential for human activities.

Super-Resolution

NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction

no code implementations19 Aug 2024 Yifan Wang, Di Huang, Weicai Ye, Guofeng Zhang, Wanli Ouyang, Tong He

Signed Distance Function (SDF)-based volume rendering has demonstrated significant capabilities in surface reconstruction.

Surface Reconstruction

Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning

no code implementations13 Aug 2024 Tianning Zhang, Feng Liu, Yuming Yuan, Rui Su, Wanli Ouyang, Lei Bai

FisH is designed to process real-time streaming seismic data and generate simultaneous results for phase picking, location estimation, and magnitude estimation in an end-to-end fashion.

Event Detection

VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting

no code implementations17 Jul 2024 Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

By capturing the uncertainties in vegetation changes and modeling the complex influence of relevant variables, VegeDiff outperforms existing deterministic methods, providing clear and accurate forecasting results of future vegetation states.

TCFormer: Visual Recognition via Token Clustering Transformer

1 code implementation16 Jul 2024 Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

Our dynamic tokens possess two crucial characteristics: (1) Representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and represent them using fine tokens.

Clustering Image Classification +4

Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

no code implementations13 Jul 2024 Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high-quality pseudo labels for the student.

3D Object Detection Data Augmentation +2

PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

1 code implementation11 Jul 2024 Zidong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai

In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks.

Benchmarking

VEnhancer: Generative Space-Time Enhancement for Video Generation

no code implementations10 Jul 2024 Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffusion model.

Data Augmentation Video Generation +1

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

no code implementations17 Jun 2024 YuAn Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu

In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space.

Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level

no code implementations17 Jun 2024 Jie Liu, Zhanhui Zhou, Jiaheng Liu, Xingyuan Bu, Chao Yang, Han-sen Zhong, Wanli Ouyang

In this work, we identify a pitfall of vanilla iterative DPO - improved response quality can lead to increased verbosity.

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

1 code implementation14 Jun 2024 Yuchen Ren, ZhiYuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan, Dong Yuan, Wanli Ouyang, Xihui Liu

RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms.

Language Modelling

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

1 code implementation11 Jun 2024 Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks.

Decision Making GSM8K +2

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

1 code implementation5 Jun 2024 Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions.

Point Cloud Generation Text-to-Image Generation

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

no code implementations5 Jun 2024 Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang

Parameter-efficient fine-tuning (PEFT) has become increasingly important as foundation models continue to grow in both popularity and size.

3D Classification parameter-efficient fine-tuning

FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation

no code implementations3 Jun 2024 Kun Chen, Tao Chen, Peng Ye, Hao Chen, Kang Chen, Tao Han, Wanli Ouyang, Lei Bai

Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations.

Weather Forecasting

ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

no code implementations24 May 2024 Zijie Guo, Pumeng Lyu, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

Hindcasts of key oceanic variables demonstrate ORCA's remarkable prediction skills in predicting ocean variations compared with state-of-the-art numerical OGCMs and abilities in capturing occurrences of extreme events at the subsurface ocean and ENSO vertical patterns.

EMR-Merging: Tuning-Free High-Performance Model Merging

1 code implementation23 May 2024 Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang

In this paper, we rethink and analyze the existing model merging paradigm.

Dense Connector for MLLMs

1 code implementation22 May 2024 Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang

We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs.

Video Understanding

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

1 code implementation22 May 2024 Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting.

Weather Forecasting

DocReLM: Mastering Document Retrieval with Language Model

no code implementations19 May 2024 Gengchen Wei, Xinle Pang, Tianning Zhang, Yu Sun, Xun Qian, Chen Lin, Han-sen Zhong, Wanli Ouyang

With over 200 million published academic documents and millions of new documents being written each year, academic researchers face the challenge of searching for information within this vast corpus.

Language Modelling Retrieval

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

1 code implementation15 May 2024 Renqi Chen, Wenwei Han, Haohao Zhang, Haoyang Su, Zhefan Wang, Xiaolei Liu, Hao Jiang, Wanli Ouyang, Nanqing Dong

Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis.

Taming Stable Diffusion for Text to 360° Panorama Image Generation

1 code implementation11 Apr 2024 Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

Generative models, e. g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts.

Denoising Image Generation

How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks

no code implementations4 Apr 2024 Dongang Wang, Peilin Liu, Hengrui Wang, Heidi Beadnall, Kain Kyle, Linda Ly, Mariano Cabezas, Geng Zhan, Ryan Sullivan, Weidong Cai, Wanli Ouyang, Fernando Calamante, Michael Barnett, Chenyu Wang

This paper focuses on an early stage phase of deep learning research, prior to model development, and proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.

MRI segmentation

RS-Mamba for Large Remote Sensing Image Dense Prediction

1 code implementation3 Apr 2024 Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

RSM is specifically designed to capture the global context of remote sensing images with linear complexity, facilitating the effective processing of large VHR images.

Building change detection for remote sensing images Change Detection +1

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

no code implementations28 Mar 2024 Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

The ultimate goals of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments.

Motion Planning

GVGEN: Text-to-3D Generation with Volumetric Representation

no code implementations19 Mar 2024 Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization.

3D Generation 3D geometry +2

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

1 code implementation19 Mar 2024 Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Philip Torr, Jian Wu

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object object-detection +3

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

1 code implementation18 Mar 2024 Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised man- ner.

Knowledge Distillation NER +1

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations18 Mar 2024 Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest

no code implementations14 Mar 2024 Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid

In particular, our PoIFusion follows the paradigm of query-based object detection, formulating object queries as dynamic 3D boxes and generating a set of PoIs based on each query box.

3D Object Detection Object +1

LOCR: Location-Guided Transformer for Optical Character Recognition

no code implementations4 Mar 2024 Yu Sun, Dongzhan Zhou, Chen Lin, Conghui He, Wanli Ouyang, Han-sen Zhong

Academic documents are packed with texts, equations, tables, and figures, requiring comprehensive understanding for accurate Optical Character Recognition (OCR).

Marketing Optical Character Recognition +1

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

no code implementations2 Mar 2024 Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu

We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation.

Auxiliary Learning Multi-Label Image Classification +5

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

1 code implementation22 Feb 2024 Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

We project the freely available 3D segmentation annotations onto the 2D plane and leverage the corresponding 2D semantic maps as the supervision signal, significantly enhancing the semantic awareness of multi-view detectors.

Depth Estimation Depth Prediction +1

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

1 code implementation22 Feb 2024 Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs).

Math Mathematical Reasoning

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

1 code implementation22 Feb 2024 Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang

By conducting a detailed analysis of real multi-turn dialogue data, we construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.

FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model

2 code implementations19 Feb 2024 Zidong Wang, Zeyu Lu, Di Huang, Cai Zhou, Wanli Ouyang, Lei Bai

We have released all the codes and models at https://github. com/whlzy/FiT to promote the exploration of diffusion transformer models for arbitrary-resolution image generation.

Computational Efficiency Image Cropping +1

Self-consistent Validation for Machine Learning Electronic Structure

no code implementations15 Feb 2024 Gengyuan Hu, Gengchen Wei, Zekun Lou, Philip H. S. Torr, Wanli Ouyang, Han-sen Zhong, Chen Lin

Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems.

Active Learning

Revealing Decurve Flows for Generalized Graph Propagation

no code implementations13 Feb 2024 Chen Lin, Liheng Ma, Yiyang Chen, Wanli Ouyang, Michael M. Bronstein, Philip H. S. Torr

\textbf{Secondly}, we propose the {\em Continuous Unified Ricci Curvature} (\textbf{CURC}), an extension of celebrated {\em Ollivier-Ricci Curvature} for directed and weighted graphs.

Graph Learning

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

no code implementations6 Feb 2024 Junchao Gong, Lei Bai, Peng Ye, Wanghan Xu, Na Liu, Jianhua Dai, Xiaokang Yang, Wanli Ouyang

Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management.

Management

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

1 code implementation4 Feb 2024 Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks.

Zero-shot Generalization

Integration of cognitive tasks into artificial general intelligence test for large models

no code implementations4 Feb 2024 Youzhi Qu, Chen Wei, Penghui Du, Wenxin Che, Chi Zhang, Wanli Ouyang, Yatao Bian, Feiyang Xu, Bin Hu, Kai Du, Haiyan Wu, Jia Liu, Quanying Liu

During the evolution of large models, performance evaluation is necessarily performed to assess their capabilities and ensure safety before practical application.

ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

1 code implementation2 Feb 2024 Wanghan Xu, Kang Chen, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models.

Value prediction

A Comprehensive Survey on 3D Content Generation

1 code implementation2 Feb 2024 Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, WangMeng Zuo, Junjun Jiang, Xianming Liu

Recent years have witnessed remarkable advances in artificial intelligence generated content(AIGC), with diverse input modalities, e. g., text, image, video, audio and 3D.

Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

1 code implementation22 Jan 2024 Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

In this paper, we extend meteorological downscaling to arbitrary scattered station scales, establish a brand new benchmark and dataset, and retrieve meteorological states at any given station location from a coarse-resolution meteorological field.

Super-Resolution Weather Forecasting

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

1 code implementation7 Jan 2024 Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Camouflaged Object Segmentation Dichotomous Image Segmentation +3

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

no code implementations25 Dec 2023 Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang

We first validate eight manually-defined partial fine-tuning strategies across kinds of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e. g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and selecting appropriate layers is critical to partial fine-tuning.

parameter-efficient fine-tuning

Merging Vision Transformers from Different Tasks and Domains

no code implementations25 Dec 2023 Peng Ye, Chenyu Huang, Mingzhu Shen, Tao Chen, Yongqi Huang, Yuning Zhang, Wanli Ouyang

This work targets to merge various Vision Transformers (ViTs) trained on different tasks (i. e., datasets with different object categories) or domains (i. e., datasets with the same categories but different environments) into one unified model, yielding still good performance on each task or domain.

Efficient Architecture Search via Bi-level Data Pruning

no code implementations21 Dec 2023 Chongjun Tu, Peng Ye, Weihao Lin, Hancheng Ye, Chong Yu, Tao Chen, Baopu Li, Wanli Ouyang

Improving the efficiency of Neural Architecture Search (NAS) is a challenging but significant task that has received much attention.

Neural Architecture Search

Towards an end-to-end artificial intelligence driven global weather forecasting system

1 code implementation18 Dec 2023 Kun Chen, Lei Bai, Fenghua Ling, Peng Ye, Tao Chen, Jing-Jia Luo, Hao Chen, Yi Xiao, Kang Chen, Tao Han, Wanli Ouyang

Initial states are typically generated by traditional data assimilation components, which are computational expensive and time-consuming.

Weather Forecasting

ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks

no code implementations16 Dec 2023 Pumeng Lyu, Tao Tang, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

Recent studies have shown that deep learning (DL) models can skillfully predict the El Ni\~no-Southern Oscillation (ENSO) forecasts over 1. 5 years ahead.

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

no code implementations14 Dec 2023 Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang

Recent advancements in text-to-3D generation technology have significantly advanced the conversion of textual descriptions into imaginative well-geometrical and finely textured 3D objects.

3D Generation Text to 3D

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

1 code implementation12 Dec 2023 Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.

Offline RL

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations4 Dec 2023 Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e. g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

3D Human Pose Estimation Action Recognition +8

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

2 code implementations27 Nov 2023 Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training.

Zero-Shot Learning

Point Cloud Pre-training with Diffusion Models

no code implementations CVPR 2024 Xiao Zheng, Xiaoshui Huang, Guofeng Mei, Yuenan Hou, Zhaoyang Lyu, Bo Dai, Wanli Ouyang, Yongshun Gong

This generator aggregates the features extracted by the backbone and employs them as the condition to guide the point-to-point recovery from the noisy point cloud, thereby assisting the backbone in capturing both local and global geometric priors as well as the global point density distribution of the object.

Point Cloud Pre-training

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

1 code implementation5 Nov 2023 Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).

Decoder Zero-shot Generalization

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation24 Oct 2023 Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection object-detection

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

no code implementations24 Oct 2023 Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.

Contrastive Learning Representation Learning

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation12 Oct 2023 Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #2 on Semantic Segmentation on S3DIS (using extra training data)

3D Object Detection 3D Reconstruction +5

Toward Understanding BERT-Like Pre-Training for DNA Foundation Models

no code implementations11 Oct 2023 Chaoqi Liang, Lifeng Qiao, Peng Ye, Nanqing Dong, Jianle Sun, Weiqiang Bai, Yuchen Ren, Xinzhu Ma, Hongliang Yan, Chunfeng Song, Wanli Ouyang, WangMeng Zuo

However, existing pre-training methods for DNA sequences largely rely on direct adoptions of BERT pre-training from NLP, lacking a comprehensive understanding and a specifically tailored approach.

Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection

no code implementations ICCV 2023 Xinzhu Ma, Yongtao Wang, Yinmin Zhang, Zhiyi Xia, Yuan Meng, Zhihui Wang, Haojie Li, Wanli Ouyang

In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection.

3D Object Detection Object +1

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

2 code implementations5 Oct 2023 Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao

A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences.

Language Modelling Long Form Question Answering +1

Understanding Masked Autoencoders From a Local Contrastive Perspective

no code implementations3 Oct 2023 Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.

Contrastive Learning Data Augmentation +2

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

2 code implementations1 Oct 2023 Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng

The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters.

Benchmarking

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

2 code implementations29 Aug 2023 Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Wanli Ouyang, Yu Qiao, Chao Dong

We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework.

Blind Face Restoration Image Denoising +2

Boosting Residual Networks with Group Knowledge

1 code implementation26 Aug 2023 Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.

Knowledge Distillation

STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning

1 code implementation ICCV 2023 Tao Han, Lei Bai, Lingbo Liu, Wanli Ouyang

Scale variation is a deep-rooted problem in object counting, which has not been effectively addressed by existing scale-aware algorithms.

feature selection Object Counting

Masked Motion Predictors are Strong 3D Action Representation Learners

1 code implementation ICCV 2023 Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.

motion prediction Skeleton Based Action Recognition

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

no code implementations11 Aug 2023 Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang

As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost?

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

1 code implementation6 Aug 2023 Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens.

Object Localization Weakly supervised Semantic Segmentation +1

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations24 Jul 2023 Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control Model-based Reinforcement Learning +1

Meta-Transformer: A Unified Framework for Multimodal Learning

1 code implementation20 Jul 2023 Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

Multimodal learning aims to build models that can process and relate information from multiple modalities.

Time Series

What Can Simple Arithmetic Operations Do for Temporal Modeling?

2 code implementations ICCV 2023 Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

Action Classification Action Recognition +1

UniG3D: A Unified 3D Object Generation Dataset

no code implementations19 Jun 2023 Qinghong Sun, Yangguang Li, Zexiang Liu, Xiaoshui Huang, Fenggang Liu, Xihui Liu, Wanli Ouyang, Jing Shao

However, the quality and diversity of existing 3D object generation methods are constrained by the inadequacies of existing 3D object datasets, including issues related to text quality, the incompleteness of multi-modal data representation encompassing 2D rendered images and 3D assets, as well as the size of the dataset.

Autonomous Driving Object

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

no code implementations19 Jun 2023 Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans.

Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting

no code implementations15 Jun 2023 YiRong Chen, Ziyue Li, Wanli Ouyang, Michael Lepech

In this work, we propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to promote traffic forecasting by exploiting the spatial hierarchy and modeling multi-scale spatial correlations.

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

1 code implementation CVPR 2024 Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan

This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions.

Person Re-Identification Triplet

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

1 code implementation NeurIPS 2023 Zhenfei Yin, Jiong Wang, JianJian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

1 code implementation CVPR 2023 Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.

object-detection Object Detection

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

1 code implementation CVPR 2023 Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang

The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.

Autonomous Driving Quantization

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

no code implementations4 May 2023 Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

1 code implementation6 Apr 2023 Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang

We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI).

Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility

no code implementations22 Mar 2023 Zhilong Liang, Zhenzhi Tan, Ruixin Hong, Wanli Ouyang, Jinying Yuan, ChangShui Zhang

Computer image recognition with machine learning method can make up the defects of artificial judging, giving accurate and quantitative judgement.

Transfer Learning

HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

1 code implementation CVPR 2023 Shixiang Tang, Cheng Chen, Qingsong Xie, Meilin Chen, Yizhou Wang, Yuanzheng Ci, Lei Bai, Feng Zhu, Haiyang Yang, Li Yi, Rui Zhao, Wanli Ouyang

Specifically, we propose a \textbf{HumanBench} based on existing datasets to comprehensively evaluate on the common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.

 Ranked #1 on Pedestrian Attribute Recognition on PA-100K (using extra training data)

Attribute Autonomous Driving +5

UniHCP: A Unified Model for Human-Centric Perceptions

1 code implementation CVPR 2023 Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

When adapted to a specific task, UniHCP achieves new SOTAs on a wide range of human-centric tasks, e. g., 69. 8 mIoU on CIHP for human parsing, 86. 18 mA on PA-100K for attribute prediction, 90. 3 mAP on Market1501 for ReID, and 85. 8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.

2D Pose Estimation Attribute +8

Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase

no code implementations3 Mar 2023 Lintao Wang, Kun Hu, Lei Bai, Yu Ding, Wanli Ouyang, Zhiyong Wang

As past poses often contain useful auxiliary hints, in this paper, we propose a task-agnostic deep learning method, namely Multi-scale Control Signal-aware Transformer (MCS-T), with an attention based encoder-decoder architecture to discover the auxiliary information implicitly for synthesizing controllable motion without explicitly requiring auxiliary information such as phase.

Decoder Feature Engineering +1

Saliency Guided Contrastive Learning on Scene Images

no code implementations22 Feb 2023 Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Haiyang Yang, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

Despite being feasible, recent works largely overlooked discovering the most discriminative regions for contrastive learning to object representations in scene images.

Contrastive Learning Linear evaluation +2

Learning from pseudo-labels: deep networks improve consistency in longitudinal brain volume estimation

no code implementations8 Feb 2023 Geng Zhan, Dongang Wang, Mariano Cabezas, Lei Bai, Kain Kyle, Wanli Ouyang, Michael Barnett, Chenyu Wang

An accurate and robust quantitative measurement of brain volume change is paramount for translational research and clinical applications.

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

1 code implementation29 Jan 2023 Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Our Fast-BEV consists of five parts, We novelly propose (1) a lightweight deployment-friendly view transformation which fast transfers 2D image feature to 3D voxel space, (2) an multi-scale image encoder which leverages multi-scale information for better performance, (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference.

Data Augmentation

$β$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search

1 code implementation16 Jan 2023 Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.

Neural Architecture Search

Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization

no code implementations CVPR 2023 Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

Weakly supervised dense object localization (WSDOL) relies generally on Class Activation Mapping (CAM), which exploits the correlation between the class weights of the image classifier and the pixel-level features.

Object Localization Representation Learning +2

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations CVPR 2023 Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator

no code implementations CVPR 2023 Shijie Wang, Jianlong Chang, Haojie Li, Zhihui Wang, Wanli Ouyang, Qi Tian

PLEor could leverage pre-trained CLIP model to infer the discrepancies encompassing both pre-defined and unknown subcategories, called category-specific discrepancies, and transfer them to the backbone network trained in the close-set scenarios.

Knowledge Distillation Retrieval +1

Ponder: Point Cloud Pre-training via Neural Rendering

no code implementations ICCV 2023 Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.

3D Reconstruction Image Generation +2

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification Action Recognition +3

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations CVPR 2023 Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Data Augmentation Retrieval +2

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

no code implementations CVPR 2023 Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao

Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, requiring to learn the consistent representations from unmasked areas.

object-detection Object Detection +2

EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder

2 code implementations8 Dec 2022 Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang

These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation.

Few-Shot Learning Segmentation +1

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds

1 code implementation CVPR 2023 Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang

In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.

Decoder

Reconstructing Hand-Held Objects from Monocular Video

no code implementations30 Nov 2022 Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.

Hand Pose Estimation Object

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

1 code implementation29 Nov 2022 Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Decision Making Q-Learning +2

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation

no code implementations17 Nov 2022 Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang

Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and the tendency towards a lack of robustness.

3D Instance Segmentation Segmentation +1

Boosting Semi-Supervised 3D Object Detection with Semi-Sampling

no code implementations14 Nov 2022 Xiaopei Wu, Yang Zhao, Liang Peng, Hua Chen, Xiaoshui Huang, Binbin Lin, Haifeng Liu, Deng Cai, Wanli Ouyang

When training a teacher-student semi-supervised framework, we randomly select gt samples and pseudo samples to both labeled frames and unlabeled frames, making a strong data augmentation for them.

3D Object Detection Data Augmentation +2

Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing

1 code implementation9 Oct 2022 Peng Ye, Shengji Tang, Baopu Li, Tao Chen, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training strategy to strengthen the performance of residual networks.

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training

1 code implementation ICCV 2023 Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, WangMeng Zuo

To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification.

Contrastive Learning Few-Shot Learning +5

Towards Frame Rate Agnostic Multi-Object Tracking

1 code implementation23 Sep 2022 Weitao Feng, Lei Bai, Yongqiang Yao, Fengwei Yu, Wanli Ouyang

In this paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.

Multi-Object Tracking Object

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

1 code implementation23 Aug 2022 Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

2D Human Pose Estimation Neural Architecture Search +1

Fine-grained Retrieval Prompt Tuning

no code implementations29 Jul 2022 Shijie Wang, Jianlong Chang, Zhihui Wang, Haojie Li, Wanli Ouyang, Qi Tian

In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation.

Retrieval

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

1 code implementation22 Jul 2022 Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo

Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.

3D Interacting Hand Pose Estimation Hand Pose Estimation