no code implementations • 11 Dec 2024 • Jihao Liu, Zhiding Yu, Shiyi Lan, Shihao Wang, Rongyao Fang, Jan Kautz, Hongsheng Li, Jose M. Alvare
This paper presents StreamChat, a novel approach that enhances the interaction capabilities of Large Multimodal Models (LMMs) with streaming video content.
2 code implementations • 28 Jun 2024 • Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li
This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs).
Ranked #135 on Visual Question Answering on MM-Vet
1 code implementation • 30 May 2024 • Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan
To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models.
Ranked #1 on Visual Question Answering on V*bench
1 code implementation • 29 May 2024 • Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li
Contrastive pre-training on image-text pairs, exemplified by CLIP, has become a standard technique for learning multi-modal visual-language representations.
no code implementations • CVPR 2024 • Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li
This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks.
1 code implementation • 28 Feb 2024 • Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.
1 code implementation • CVPR 2024 • Xingzhong Hou, Boxiao Liu, Yi Zhang, Jihao Liu, Yu Liu, Haihang You
Generative models are gaining popularity, and the demand for precise image generation is on the rise.
1 code implementation • ICCV 2023 • Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu, Hongsheng Li
In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving the multi-view camera-based 3D detection.
1 code implementation • 18 Jul 2022 • Jihao Liu, Boxiao Liu, Hang Zhou, Hongsheng Li, Yu Liu
In this paper, we propose a novel data augmentation technique TokenMix to improve the performance of vision transformers.
2 code implementations • 12 Jul 2022 • Jihao Liu, Xin Huang, Guanglu Song, Hongsheng Li, Yu Liu
Finally, we integrate configurable operators and DSMs into a unified search space and explore it with a reinforcement learning-based search algorithm to find the optimal combination of operators.
Ranked #12 on Neural Architecture Search on ImageNet
1 code implementation • CVPR 2023 • Jihao Liu, Xin Huang, Jinliang Zheng, Yu Liu, Hongsheng Li
In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers.
Ranked #2 on Image Classification on Places205
no code implementations • 16 Feb 2022 • Jihao Liu, Boxiao Liu, Hongsheng Li, Yu Liu
Recent studies have pointed out that knowledge distillation (KD) suffers from two degradation problems, the teacher-student gap and incompatibility with strong data augmentations, which makes it inapplicable to training state-of-the-art models that rely on advanced augmentations.
Ranked #144 on Image Classification on ImageNet
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovation over the past several years, marked by advances in AI technologies, are profoundly reshaping industry and society.
no code implementations • 8 Oct 2021 • Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu
Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks.
Ranked #254 on Image Classification on ImageNet
no code implementations • 25 May 2021 • Jihao Liu, Ming Zhang, Yangting Sun, Boxiao Liu, Guanglu Song, Yu Liu, Hongsheng Li
Further, an architecture knowledge pool together with a block similarity function is proposed to utilize parameter knowledge, which halves the search time.
no code implementations • 1 Jan 2021 • Jihao Liu, Lingyao Xie
Let $(X\ni x, B)$ be an lc surface germ.
Algebraic Geometry
no code implementations • 1 Jan 2021 • Jihao Liu, Yangting Sun, Ming Zhang, Boxiao Liu, Yu Liu
Further, a lifelong knowledge pool together with a block similarity function is proposed to utilize lifelong parameter knowledge, which halves the search time.
no code implementations • 3 Jul 2020 • Jingjun Han, Jihao Liu
For $\epsilon$-lc Fano type varieties $X$ of dimension $d$ and a given finite set $\Gamma$, we show that there exists a positive integer $m_0$, depending only on $\epsilon$, $d$ and $\Gamma$, such that both $|-mK_X-\sum_i\lceil mb_i\rceil B_i|$ and $|-mK_X-\sum_i\lfloor mb_i\rfloor B_i|$ define birational maps for any $m\ge m_0$, provided that the $B_i$ are pseudo-effective Weil divisors, $b_i\in\Gamma$, and $-(K_X+\sum_ib_iB_i)$ is big.
Algebraic Geometry
1 code implementation • CVPR 2020 • Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang
Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods.
1 code implementation • ECCV 2020 • Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan
Transferring existing image-based detectors to video is non-trivial, since frame quality is always degraded by partial occlusion, rare poses, and motion blur.
Ranked #24 on Video Object Detection on ImageNet VID
1 code implementation • 2 Sep 2019 • Yu Liu, Guanglu Song, Manyuan Zhang, Jihao Liu, Yucong Zhou, Junjie Yan
Large scale face recognition is challenging especially when the computational budget is limited.