no code implementations • 19 Dec 2024 • Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, Mannat Singh
For cross-modal tasks such as text-to-image generation, the same mapping from noise to image is learned while incorporating a conditioning mechanism into the model.
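One common way such a conditioning mechanism is used at sampling time is classifier-free guidance, where the denoiser is queried with and without the text condition and the two noise estimates are blended. The sketch below is a minimal, hypothetical NumPy illustration of that idea only; the `denoise` function is a toy stand-in, not the paper's model, and all names are assumptions.

```python
import numpy as np

def denoise(x_t, t, cond):
    # Toy stand-in for a learned denoiser eps_theta(x_t, t, cond),
    # kept linear so the sketch is runnable end to end.
    out = 0.1 * x_t + 0.01 * t
    if cond is not None:
        out = out + 0.05 * cond  # conditioning signal enters the prediction
    return out

def guided_noise_estimate(x_t, t, text_emb, guidance_scale=7.5):
    """Classifier-free guidance: blend conditional and unconditional
    noise predictions to steer sampling toward the text condition."""
    eps_uncond = denoise(x_t, t, None)
    eps_cond = denoise(x_t, t, text_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

x = np.zeros((4, 4))                       # noisy latent at timestep t
eps = guided_noise_estimate(x, t=10, text_emb=np.ones((4, 4)))
```

Larger `guidance_scale` values push the estimate further toward the conditional prediction, trading sample diversity for prompt adherence.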
no code implementations • 30 Oct 2024 • Jin Huang, Xinyu Li, Liang Gao, Qihao Liu, Yue Teng
To enhance the capability of LLMs in automatic HDR design, this paper proposes a novel population self-evolutionary method (SeEvo), a general search framework inspired by the self-reflective design strategies of human experts.
no code implementations • 18 Jul 2024 • Wufei Ma, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Häne, Alan Yuille
In order to narrow the gap between video-text models and human performance on RCAD, we identify a key limitation of current contrastive approaches on video-text data and introduce LLM-teacher, a more effective approach to learn action semantics by leveraging knowledge obtained from a pretrained large language model.
1 code implementation • 13 Jun 2024 • Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille
A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images.
Image Captioning, Linear Probing, Object-Level 3D Awareness, +2
no code implementations • 13 Jun 2024 • Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.
Ranked #12 on Image Generation on ImageNet 256x256
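Time-dependent layer normalization is commonly realized in an adaLN style: the usual LayerNorm statistics, but with scale and shift predicted from a timestep embedding rather than fixed learned parameters. The sketch below is a hypothetical NumPy illustration of that general pattern, not this paper's implementation; `W_scale` and `W_shift` are assumed names for illustrative projection matrices.

```python
import numpy as np

def timestep_embedding(t, dim):
    # Standard sinusoidal embedding of the diffusion timestep.
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.cos(angles), np.sin(angles)])

def time_dependent_layernorm(x, t, W_scale, W_shift):
    """adaLN-style normalization: per-token LayerNorm statistics,
    with gain and bias predicted from the timestep embedding."""
    emb = timestep_embedding(t, W_scale.shape[0])
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + 1e-5)
    scale = 1.0 + emb @ W_scale   # predicted per-channel gain
    shift = emb @ W_shift         # predicted per-channel bias
    return x_hat * scale + shift

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))                 # (tokens, channels)
W_scale = rng.normal(size=(16, 8)) * 0.01
W_shift = rng.normal(size=(16, 8)) * 0.01
y = time_dependent_layernorm(x, 25, W_scale, W_shift)
```

Because the gain and bias depend on `t`, the same activations are normalized differently at different points of the diffusion trajectory.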
1 code implementation • CVPR 2024 • Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille
Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge (i.e., data scarcity) in large-scale 3D generation.
1 code implementation • 15 Dec 2023 • Qian Wang, Yaoyao Liu, Hefei Ling, Yingwei Li, Qihao Liu, Ping Li, Jiazhong Chen, Alan Yuille, Ning Yu
In response to adversarial attacks against visual classifiers that evolve on a monthly basis, numerous defenses have been proposed to generalize against as many known attacks as possible.
1 code implementation • CVPR 2024 • Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai
In this work, we present GLEE, an object-level foundation model for locating and identifying objects in images and videos.
Ranked #1 on Referring Video Object Segmentation on Refer-YouTube-VOS (using extra training data)
Long-tail Video Object Segmentation, Multi-Object Tracking, +8
no code implementations • ICCV 2023 • Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, Artur Jesslen, Pengliang Ji, Qixin Hu, Jiehua Zhang, Qihao Liu, Jiahao Wang, Wei Ji, Chen Wang, Xiaoding Yuan, Prakhar Kaushik, Guofeng Zhang, Jie Liu, Yushan Xie, Yawen Cui, Alan Yuille, Adam Kortylewski
Animal3D consists of 3379 images collected from 40 mammal species, with high-quality annotations of 26 keypoints and, importantly, the pose and shape parameters of the SMAL model.
Ranked #1 on Animal Pose Estimation on Animal3D
no code implementations • 13 Jun 2023 • Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille
With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically.
no code implementations • 1 Jun 2023 • Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
(2) We find regions in the latent space that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured.
1 code implementation • CVPR 2023 • Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai
A common solution is to use optical flow to provide motion information, but optical flow essentially captures only pixel-level motion, which still relies on appearance similarity and is therefore often inaccurate under occlusion and fast movement.
1 code implementation • CVPR 2023 • Qihao Liu, Adam Kortylewski, Alan Yuille
We introduce a learning-based testing method, termed PoseExaminer, that automatically diagnoses HPS algorithms by searching over the parameter space of human pose images to find the failure modes.
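The idea of diagnosing a model by searching its input parameter space can be sketched in a few lines. The toy below is only an illustration of that search pattern under stated assumptions; `evaluate_error`, the uniform sampling, and the fixed threshold are all hypothetical stand-ins, not PoseExaminer's actual adversarial search.

```python
import numpy as np

def find_failure_modes(evaluate_error, dim, threshold=0.5, iters=200, seed=0):
    """Toy parameter-space search: sample pose parameters at random and
    keep those on which the model under test exceeds an error threshold."""
    rng = np.random.default_rng(seed)
    failures = []
    for _ in range(iters):
        params = rng.uniform(-1.0, 1.0, size=dim)  # candidate pose parameters
        if evaluate_error(params) > threshold:     # model under test fails here
            failures.append(params)
    return failures

# Hypothetical model under test: error grows with how "extreme" the pose is.
failures = find_failure_modes(lambda p: np.linalg.norm(p) / np.sqrt(len(p)), dim=4)
```

A practical tester would replace the random sampling with a guided search (e.g., gradient-based or evolutionary) and cluster the discovered failures into interpretable failure modes.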
no code implementations • 18 Nov 2022 • Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai
This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge.
no code implementations • 29 Jul 2022 • Qihao Liu, Yi Zhang, Song Bai, Alan Yuille
Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions.
Ranked #10 on 3D Multi-Person Pose Estimation (absolute) on MuPoTS-3D
3D Human Pose Estimation, 3D Multi-Person Pose Estimation (absolute), +2
2 code implementations • 21 Jul 2022 • Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai
In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance.
Ranked #14 on Video Instance Segmentation on YouTube-VIS 2021
1 code implementation • CVPR 2022 • Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille
We believe our dataset provides a rich testbed to study UDA for part segmentation and will help to significantly push forward research in this area.
no code implementations • 30 Nov 2020 • Qihao Liu, Weichao Qiu, Weiyao Wang, Gregory D. Hager, Alan L. Yuille
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori, and then adapt it to the task of category-independent articulated object pose estimation.
no code implementations • 26 Nov 2018 • Qihao Liu, Yujia Wang, Xiaofeng Liu
To balance exploration and exploitation, the Novelty Search (NS) is employed in every chief agent to encourage policies with high novelty while maximizing per-episode performance.
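The standard Novelty Search score is the mean distance from a policy's behavior descriptor to its k nearest neighbors in an archive of previously seen behaviors; policies far from everything seen so far score high. The snippet below is a minimal sketch of that scoring rule with made-up 2-D behavior descriptors, not the paper's multi-agent setup.

```python
import numpy as np

def novelty_score(behavior, archive, k=3):
    """Novelty of a behavior descriptor: mean Euclidean distance to its
    k nearest neighbors in the archive of previously seen behaviors."""
    dists = np.linalg.norm(archive - behavior, axis=1)
    nearest = np.sort(dists)[:k]
    return nearest.mean()

# Archive of behavior descriptors already visited by the population.
archive = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
score = novelty_score(np.array([0.0, 0.0]), archive, k=2)  # (0 + 1) / 2
```

Selecting for high novelty alongside per-episode return is what lets such methods keep exploring instead of collapsing onto a single locally optimal policy.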